OpenAI (Realtime)

Use OpenAI's Realtime API for native audio-to-audio conversations with ultra-low latency.

Setup

  • Set OPENAI_API_KEY
# .env
OPENAI_API_KEY=...

Example

from siphon.agent import Agent
from siphon.plugins import openai

agent = Agent(
    agent_name="RealtimeAssistant",
    llm=openai.Realtime(
        model="gpt-realtime",
        voice="alloy",
        temperature=0.3
    ),
    system_instructions="You are a helpful voice assistant.",
)

if __name__ == "__main__":
    agent.dev()

Common options

  • model (default: gpt-realtime)
  • voice (default: alloy)
    • Available voices: alloy, echo, shimmer
  • temperature (default: 0.3)
  • api_key (default: None; falls back to the OPENAI_API_KEY environment variable)

Notes

  • OpenAI Realtime API provides native audio input/output
  • Handles the complete voice pipeline in a single model
  • Optimized for conversational latency
  • Supports natural interruptions and turn-taking
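
Under the hood, a Realtime client configures the session by sending a session.update event over the WebSocket. The sketch below shows the shape of such an event built from the options above; field names follow OpenAI's Realtime API client-event schema, and the exact payload siphon sends may differ.

```python
import json

# Sketch of a session.update client event configuring voice, sampling
# temperature, and instructions for the Realtime session.
session_update = {
    "type": "session.update",
    "session": {
        "voice": "alloy",
        "temperature": 0.3,
        "instructions": "You are a helpful voice assistant.",
    },
}

# Serialized for transmission over the WebSocket connection.
payload = json.dumps(session_update)
```

Because configuration travels as events on the same connection as audio, the session can be reconfigured mid-conversation without reconnecting.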

When to Use

Use OpenAI Realtime when:

  • You need ultra-low latency voice conversations
  • You want simplified architecture (one component instead of LLM + STT + TTS)
  • Your use case benefits from OpenAI's conversational AI optimizations

Alternative: Traditional Pipeline

If you need more flexibility or want to mix providers, use the traditional approach:

from siphon.agent import Agent
from siphon.plugins import openai, deepgram, cartesia

agent = Agent(
    agent_name="Assistant",
    llm=openai.LLM(),
    stt=deepgram.STT(),
    tts=cartesia.TTS(),
)