# Realtime Models Overview
Realtime models provide native audio-to-audio conversation capabilities with end-to-end latency optimizations. These models handle the complete voice pipeline (speech-to-text, reasoning, and text-to-speech) in a single integrated system.
## What are Realtime Models?
Realtime models are designed specifically for voice conversations and offer:
- Native Audio Processing: Direct audio input and output without separate STT/TTS components
- Ultra-Low Latency: Optimized for real-time conversations with minimal delay
- Integrated Pipeline: Single model handles listening, reasoning, and speaking
- Natural Interruptions: Better handling of conversational dynamics
## When to Use Realtime Models
Use realtime models when:
- You need the absolute lowest latency for voice conversations
- You want simplified architecture (one model instead of LLM + STT + TTS)
- Your use case benefits from native audio understanding
- You're building interactive voice experiences
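The architectural simplification in the list above can be sketched with plain Python stand-ins. These classes are illustrative placeholders, not SIPHON APIs: a traditional agent chains three components, while a realtime agent uses one.

```python
# Illustrative sketch only -- placeholder classes, not SIPHON APIs.

class STT:
    def transcribe(self, audio: bytes) -> str:
        return "hello"  # pretend transcription


class LLM:
    def complete(self, text: str) -> str:
        return f"reply to: {text}"


class TTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode()


class RealtimeModel:
    """Single integrated component: audio in, audio out."""

    def respond(self, audio: bytes) -> bytes:
        return b"reply audio"


def traditional_pipeline(audio: bytes) -> bytes:
    # Three hops (STT -> LLM -> TTS), each adding its own latency.
    text = STT().transcribe(audio)
    reply = LLM().complete(text)
    return TTS().synthesize(reply)


def realtime_pipeline(audio: bytes) -> bytes:
    # One hop: the model handles listening, reasoning, and speaking.
    return RealtimeModel().respond(audio)
```

The point of the sketch is the shape of the call graph, not the internals: the realtime path removes two component boundaries (and two serialization hops) from every conversational turn.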
## Available Providers
SIPHON supports the following realtime model providers:

- OpenAI (via the `openai` plugin)
- Google Gemini (via the `gemini` plugin)
## Usage Pattern
```python
from siphon.agent import Agent
from siphon.plugins import openai  # or gemini

# Use a realtime model instead of separate LLM/STT/TTS components
agent = Agent(
    agent_name="RealtimeAssistant",
    llm=openai.Realtime(
        model="gpt-realtime",
        voice="alloy",
        temperature=0.3,
    ),
    system_instructions="You are a helpful voice assistant.",
)

if __name__ == "__main__":
    agent.dev()
```
## Key Differences from Traditional Pipeline
| Traditional (LLM + STT + TTS) | Realtime Model |
|---|---|
| Three separate components | Single integrated component |
| Higher overall latency | Ultra-low latency |
| More configuration options | Simplified configuration |
| Flexible provider mixing | Provider-specific |
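To make the latency row concrete: end-to-end delay in the traditional pipeline is roughly the sum of its sequential stages, while a realtime model incurs a single round trip. The numbers below are invented for illustration, not measured SIPHON or provider figures:

```python
# Hypothetical per-stage latencies in milliseconds -- illustrative only.
stt_ms = 300  # speech-to-text
llm_ms = 500  # text reasoning
tts_ms = 200  # text-to-speech

# Stages run sequentially, so their latencies add up.
traditional_total_ms = stt_ms + llm_ms + tts_ms

# A realtime model makes one integrated round trip (assumed value).
realtime_total_ms = 600

print(traditional_total_ms)  # 1000
print(realtime_total_ms)     # 600
```

Even with generous per-stage numbers, the additive cost of three hops is what the "ultra-low latency" row in the table is pointing at.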
## Notes
- Realtime models typically require specific API access or preview features
- Configuration options may vary by provider
- Some providers may have usage limits or pricing differences for realtime APIs