Gemini (Realtime)

Use Google's Gemini native audio preview models for real-time audio-to-audio conversations.

Setup

  • Set GEMINI_API_KEY in your environment, for example in a .env file:
# .env
GEMINI_API_KEY=...
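
A missing key usually surfaces only once the agent tries to connect, so it can help to check for it at startup. The following is a minimal sketch, not part of Siphon: the helper name require_gemini_key is hypothetical, and it assumes the variable has already been loaded into the process environment (whether Siphon reads .env files on its own is not stated here).

```python
import os


def require_gemini_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; add it to .env or export it in your shell"
        )
    return key
```

Calling require_gemini_key() before constructing the agent turns a late connection failure into an immediate, readable error.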

Example

from siphon.agent import Agent
from siphon.plugins import gemini

agent = Agent(
    agent_name="RealtimeAssistant",
    # One realtime model handles speech in and speech out (no separate STT/TTS).
    llm=gemini.Realtime(
        model="gemini-2.5-flash-native-audio-preview-12-2025",
        voice="Puck",
        temperature=0.3,
        max_output_tokens=150,
    ),
    system_instructions="You are a helpful voice assistant.",
)

if __name__ == "__main__":
    agent.dev()

Common options

  • model (default: gemini-2.5-flash-native-audio-preview-12-2025)
  • voice (default: Puck)
    • Available voices: Puck, Charon, Kore, Fenrir, Aoede
  • temperature (default: 0.3)
  • max_output_tokens (default: 150)
  • api_key (default: None; falls back to the GEMINI_API_KEY environment variable)
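
The defaults listed above can be collected in one place and overridden selectively. The following is a rough sketch under stated assumptions: the helper realtime_options and the constants DEFAULTS and AVAILABLE_VOICES are hypothetical names, not part of Siphon, and the values simply mirror this list. Whether gemini.Realtime validates these options itself is not documented here, so the checks below are only a local convenience.

```python
# Defaults copied from the option list above (not read from the library).
DEFAULTS = {
    "model": "gemini-2.5-flash-native-audio-preview-12-2025",
    "voice": "Puck",
    "temperature": 0.3,
    "max_output_tokens": 150,
    "api_key": None,  # None means: fall back to GEMINI_API_KEY
}
AVAILABLE_VOICES = ("Puck", "Charon", "Kore", "Fenrir", "Aoede")


def realtime_options(**overrides):
    """Merge overrides into the documented defaults and check the voice name."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise TypeError(f"unknown option(s): {', '.join(unknown)}")
    options = {**DEFAULTS, **overrides}
    if options["voice"] not in AVAILABLE_VOICES:
        raise ValueError(
            f"voice must be one of {', '.join(AVAILABLE_VOICES)}; "
            f"got {options['voice']!r}"
        )
    return options


# Assumed usage with the plugin shown in the example above:
# llm = gemini.Realtime(**realtime_options(voice="Kore", temperature=0.5))
```

Keeping the overrides in a plain dict makes it easy to log or test the configuration without opening a realtime session.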

Notes

  • Gemini Realtime uses Google's native audio preview models
  • Provides end-to-end audio processing without separate STT/TTS components
  • Optimized for conversational latency and natural dialogue
  • Preview models may have access restrictions or usage limits

When to Use

Use Gemini Realtime when:

  • You need Google's latest conversational AI capabilities
  • You want native audio understanding and generation
  • Your use case benefits from Gemini's multimodal reasoning
  • You prefer a single-model architecture over separate STT, LLM, and TTS components

Alternative: Traditional Pipeline

If you need more flexibility, or want to use Gemini only for the LLM or TTS stage, combine separate components:

from siphon.agent import Agent
from siphon.plugins import gemini, deepgram

# Gemini LLM with separate STT/TTS
agent = Agent(
    agent_name="Assistant",
    llm=gemini.LLM(),
    stt=deepgram.STT(),
    tts=gemini.TTS(),
)

Voice Options

Gemini Realtime supports five voices:

  • Puck (default): Balanced, professional voice
  • Charon: Deep, authoritative voice
  • Kore: Clear, articulate voice
  • Fenrir: Dynamic, energetic voice
  • Aoede: Warm, friendly voice

Choose the voice that best matches your use case and brand identity.
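
If the voice is chosen from configuration rather than hard-coded, a small lookup keyed on the descriptions above can do the mapping. This is an illustrative sketch only: VOICE_TRAITS and pick_voice are hypothetical names, and the trait words are taken directly from the list above rather than from any Siphon or Gemini API.

```python
# Trait words copied from the voice descriptions above.
VOICE_TRAITS = {
    "Puck": {"balanced", "professional"},
    "Charon": {"deep", "authoritative"},
    "Kore": {"clear", "articulate"},
    "Fenrir": {"dynamic", "energetic"},
    "Aoede": {"warm", "friendly"},
}


def pick_voice(trait, default="Puck"):
    """Return the voice whose description contains `trait`, else the default."""
    wanted = trait.strip().lower()
    for voice, traits in VOICE_TRAITS.items():
        if wanted in traits:
            return voice
    return default
```

The returned name can then be passed as the voice option, for example voice=pick_voice("warm").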