Is SIPHON really free?

Yes. SIPHON is 100% open-source under the Apache 2.0 license. You never pay us a platform fee. You only pay for your own infrastructure and the AI/Telephony providers you choose (e.g., OpenAI, Twilio). Zero markups, zero per-minute fees.

How much can I save compared to Vapi, Retell, or Bland?

Managed platforms charge a platform fee of $0.05-$0.30/min on top of your AI provider costs. With SIPHON, you pay only the direct provider costs (typically $0.01-$0.03/min). At 10,000 minutes/month, avoiding the platform fee saves $500-$3,000 per month.
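The savings figure is plain multiplication, so you can re-run it for your own volume. A minimal sketch, assuming the fee range quoted above; the function name is illustrative and not part of SIPHON:

```python
def monthly_platform_fee(minutes: int, fee_per_minute: float) -> float:
    """Platform fee a managed service would add on top of provider costs."""
    return minutes * fee_per_minute


minutes = 10_000  # monthly call volume from the example above

# Low and high ends of the quoted $0.05-$0.30/min platform fee range.
low = monthly_platform_fee(minutes, 0.05)
high = monthly_platform_fee(minutes, 0.30)

summary = f"Estimated monthly savings: ${low:,.0f}-${high:,.0f}"
print(summary)
```

Change `minutes` to your own monthly volume to get a comparable estimate.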

Who owns my call data with SIPHON?

You do. SIPHON runs on your infrastructure. All recordings, transcripts, and metadata stay in your storage (S3, MongoDB, SQL, etc.). No third-party platform has access to your customer conversations.

Do I need to be a VoIP expert?

Not at all. SIPHON abstracts away the complex SIP signaling and media handling. If you're comfortable with Python, you can build production-grade voice agents.

Which AI models can I use with SIPHON?

You have total freedom. SIPHON supports OpenAI, Anthropic, Gemini, Groq, DeepSeek, Cerebras, and open-source models. You can swap providers with a single config change, so there is no vendor lock-in.

How does SIPHON handle latency?

SIPHON is optimized for sub-500ms latency using WebRTC (LiveKit). By running on your infrastructure and choosing fast models (Groq, Cerebras, GPT-4o), you achieve natural conversation speeds without platform routing overhead.

Can I scale SIPHON to thousands of calls?

Yes. SIPHON is built on LiveKit, which powers massive-scale streaming. The worker architecture scales horizontally: you add workers on your own infrastructure as call volume grows.

Is SIPHON HIPAA/SOC2 compliant?

SIPHON is a framework that runs on your infrastructure. Compliance depends on your deployment. Since you control all data and infrastructure, you can deploy SIPHON in HIPAA-compliant or air-gapped environments - something not possible with managed platforms.

Gemini (Realtime)

Use Google's Gemini native audio preview models for real-time audio-to-audio conversations.

Setup

Set the GEMINI_API_KEY environment variable:

# .env
GEMINI_API_KEY=...
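A missing key is easier to diagnose at startup than in the middle of a call. The sketch below checks the variable before the agent is created. `require_env` is a hypothetical helper, not part of SIPHON, and it assumes the variable has already been loaded into the process environment (for example, exported in your shell); how the .env file gets loaded is not covered here.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it in your shell"
        )
    return value
```

Calling `require_env("GEMINI_API_KEY")` at the top of your entry point fails fast with a readable message if the key is missing.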

Example

from siphon.agent import Agent
from siphon.plugins import gemini

agent = Agent(
    agent_name="RealtimeAssistant",
    llm=gemini.Realtime(
        model="gemini-2.5-flash-native-audio-preview-12-2025",
        voice="Puck",
        temperature=0.3,
        max_output_tokens=150
    ),
    system_instructions="You are a helpful voice assistant.",
)

if __name__ == "__main__":
    agent.dev()

Common options

model (default: gemini-2.5-flash-native-audio-preview-12-2025)
voice (default: Puck)
- Available voices: Puck, Charon, Kore, Fenrir, Aoede
temperature (default: 0.3)
max_output_tokens (default: 150)
api_key (default: None - reads from GEMINI_API_KEY env var)
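
If you build these options from user input or a config file, it can help to validate them before constructing the model. A minimal sketch: the defaults and voice names are copied from the list above, while `realtime_options` is a hypothetical helper and not part of SIPHON.

```python
# Defaults and voice names copied from the option list above.
DEFAULTS = {
    "model": "gemini-2.5-flash-native-audio-preview-12-2025",
    "voice": "Puck",
    "temperature": 0.3,
    "max_output_tokens": 150,
    "api_key": None,  # None means: read from the GEMINI_API_KEY env var
}
AVAILABLE_VOICES = {"Puck", "Charon", "Kore", "Fenrir", "Aoede"}


def realtime_options(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")

    options = {**DEFAULTS, **overrides}
    if options["voice"] not in AVAILABLE_VOICES:
        raise ValueError(
            f"Unknown voice {options['voice']!r}; "
            f"choose one of {sorted(AVAILABLE_VOICES)}"
        )
    return options
```

The result can be unpacked into the constructor shown in the example, e.g. `gemini.Realtime(**realtime_options(voice="Kore"))`.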

Notes

Gemini Realtime uses Google's native audio preview models
Provides end-to-end audio processing without separate STT/TTS components
Optimized for conversational latency and natural dialogue
Preview models may have access restrictions or usage limits

When to Use

Use Gemini Realtime when:

You need Google's latest conversational AI capabilities
You want native audio understanding and generation
Your use case benefits from Gemini's multimodal reasoning
You prefer simplified architecture over separate components

Alternative: Traditional Pipeline

If you need more flexibility, or want to use Gemini only for the LLM or TTS stage, combine separate components:

from siphon.agent import Agent
from siphon.plugins import gemini, deepgram

# Gemini LLM with separate STT/TTS
agent = Agent(
    agent_name="Assistant",
    llm=gemini.LLM(),
    stt=deepgram.STT(),
    tts=gemini.TTS(),
)

Voice Options

Gemini Realtime supports multiple voice options for natural-sounding speech:

Puck (default): Balanced, professional voice
Charon: Deep, authoritative voice
Kore: Clear, articulate voice
Fenrir: Dynamic, energetic voice
Aoede: Warm, friendly voice

Choose the voice that best matches your use case and brand identity.