# Gemini (Realtime)
Use Google's Gemini native audio preview models for real-time audio-to-audio conversations.
## Setup

- Set the `GEMINI_API_KEY` environment variable:

```bash
# .env
GEMINI_API_KEY=...
```
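As a sketch of the fallback behavior described under "Common options" (when `api_key` is not passed explicitly, the plugin reads it from the environment), the lookup is equivalent to the following; the value shown is a stand-in, not a real key:

```python
import os

# Stand-in value for illustration only; in practice the key comes
# from your shell environment or a loaded .env file.
os.environ.setdefault("GEMINI_API_KEY", "example-key")

# What the plugin effectively does when api_key is None:
api_key = os.environ.get("GEMINI_API_KEY")
print(api_key is not None)
```

If the variable is unset and no `api_key` is passed, the plugin cannot authenticate, so set one or the other before starting the agent.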
## Example

```python
from siphon.agent import Agent
from siphon.plugins import gemini

agent = Agent(
    agent_name="RealtimeAssistant",
    llm=gemini.Realtime(
        model="gemini-2.5-flash-native-audio-preview-12-2025",
        voice="Puck",
        temperature=0.3,
        max_output_tokens=150,
    ),
    system_instructions="You are a helpful voice assistant.",
)

if __name__ == "__main__":
    agent.dev()
```
## Common options

- `model` (default: `gemini-2.5-flash-native-audio-preview-12-2025`)
- `voice` (default: `Puck`) - Available voices: `Puck`, `Charon`, `Kore`, `Fenrir`, `Aoede`
- `temperature` (default: `0.3`)
- `max_output_tokens` (default: `150`)
- `api_key` (default: `None` - reads from `GEMINI_API_KEY` env var)
## Notes
- Gemini Realtime uses Google's native audio preview models
- Provides end-to-end audio processing without separate STT/TTS components
- Optimized for conversational latency and natural dialogue
- Preview models may have access restrictions or usage limits
## When to Use
Use Gemini Realtime when:
- You need Google's latest conversational AI capabilities
- You want native audio understanding and generation
- Your use case benefits from Gemini's multimodal reasoning
- You prefer simplified architecture over separate components
## Alternative: Traditional Pipeline
If you need more flexibility, or want to use Gemini for only the LLM or TTS stage, combine plugins in a traditional pipeline:

```python
from siphon.agent import Agent
from siphon.plugins import gemini, deepgram

# Gemini LLM with separate STT/TTS
agent = Agent(
    agent_name="Assistant",
    llm=gemini.LLM(),
    stt=deepgram.STT(),
    tts=gemini.TTS(),
)
```
## Voice Options
Gemini Realtime supports multiple voice options for natural-sounding speech:
- `Puck` (default): Balanced, professional voice
- `Charon`: Deep, authoritative voice
- `Kore`: Clear, articulate voice
- `Fenrir`: Dynamic, energetic voice
- `Aoede`: Warm, friendly voice
Choose the voice that best matches your use case and brand identity.
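For instance, to switch from the default to a deeper voice, pass a different `voice` value to the same `gemini.Realtime` constructor shown earlier. This is a configuration sketch; the agent name and instructions are hypothetical:

```python
from siphon.agent import Agent
from siphon.plugins import gemini

# Same constructor as the earlier example; only the voice differs.
agent = Agent(
    agent_name="NarratorAssistant",  # hypothetical name
    llm=gemini.Realtime(voice="Charon"),  # deep, authoritative voice
    system_instructions="You narrate content in a calm, measured tone.",
)
```

All other options (`model`, `temperature`, `max_output_tokens`) keep their defaults when omitted.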