Is SIPHON really free?

Yes. SIPHON is 100% open-source under the Apache 2.0 license. You never pay us a platform fee. You pay only for your own infrastructure and the AI and telephony providers you choose (e.g., OpenAI, Twilio). Zero markups, zero per-minute fees.

How much can I save compared to Vapi, Retell, or Bland?

Managed platforms charge $0.05-$0.30/min on top of your AI provider costs. With SIPHON, you pay only direct provider costs (typically $0.01-$0.03/min). For 10,000 minutes/month, avoiding that platform fee saves $500-$3,000 per month.
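As a quick check of that arithmetic, here is a small sketch. The function name is illustrative, and the rates are simply the ranges quoted above, not measured prices.

```python
def monthly_platform_fee(minutes: int, fee_per_minute: float) -> float:
    """Platform fee a managed service would add on top of provider costs."""
    return minutes * fee_per_minute


minutes = 10_000
low = monthly_platform_fee(minutes, 0.05)   # low end of the quoted fee range
high = monthly_platform_fee(minutes, 0.30)  # high end of the quoted fee range

print(f"Avoided platform fees: ${low:,.0f}-${high:,.0f} per month")
```

Swap in your own call volume to estimate the fee you would avoid. Provider costs are the same in both cases, so they cancel out of the comparison.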

Who owns my call data with SIPHON?

You do. SIPHON runs on your infrastructure. All recordings, transcripts, and metadata stay in your storage (S3, MongoDB, SQL, etc.). No third-party platform has access to your customer conversations.

Do I need to be a VoIP expert?

Not at all. SIPHON abstracts away the complex SIP signaling and media handling. If you're comfortable with Python, you can build production-grade voice agents.

Which AI models can I use with SIPHON?

You have total freedom. SIPHON supports OpenAI, Anthropic, Gemini, Groq, DeepSeek, Cerebras, and open-source models. You can swap providers with a single config change - no vendor lock-in.

How does SIPHON handle latency?

SIPHON is optimized for sub-500ms latency using WebRTC (via LiveKit). By running on your own infrastructure and choosing fast models (Groq, Cerebras, GPT-4o), you can reach natural conversation speeds without platform routing overhead.

Can I scale SIPHON to thousands of calls?

Yes. SIPHON is built on LiveKit, which powers massive-scale streaming. The worker architecture scales horizontally, so you can add workers on your infrastructure as call volume grows.

Is SIPHON HIPAA/SOC2 compliant?

SIPHON is a framework that runs on your infrastructure. Compliance depends on your deployment. Since you control all data and infrastructure, you can deploy SIPHON in HIPAA-compliant or air-gapped environments - something not possible with managed platforms.

Voice & Turn-Taking Settings

These settings control how aggressively the agent detects user speech and decides when to respond.

Interruptions

allow_interruptions - Allows the user to interrupt the agent mid-speech.
min_interruption_duration - Minimum duration (in seconds) of detected user speech to count as an interruption.

Defaults:

allow_interruptions=True
min_interruption_duration=0.08
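
One way to picture how these two settings interact is the sketch below. It is a simplified model, not SIPHON's implementation: the function name and the idea of passing in a measured speech duration are assumptions made for illustration.

```python
def counts_as_interruption(
    speech_duration: float,
    allow_interruptions: bool = True,
    min_interruption_duration: float = 0.08,
) -> bool:
    """Return True if user speech heard while the agent talks should stop the agent.

    speech_duration is the length (in seconds) of the detected user speech.
    """
    if not allow_interruptions:
        return False  # interruptions disabled: the agent always finishes speaking
    return speech_duration >= min_interruption_duration


print(counts_as_interruption(0.03))   # short blip, below the minimum duration
print(counts_as_interruption(0.25))   # sustained speech, above the minimum
print(counts_as_interruption(0.25, allow_interruptions=False))
```

Raising min_interruption_duration makes the agent ignore short noises such as coughs or clicks, at the cost of reacting slightly later to genuine interruptions.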

Turn-taking / endpointing

These parameters control Voice Activity Detection (VAD) and determine when the agent decides the user has finished speaking:

min_silence_duration - How long (in seconds) the user must be silent before the system considers they might be done speaking.
activation_threshold - Confidence level (0.0-1.0) required to detect speech. Higher values reduce false positives from background noise.
prefix_padding_duration - Amount of audio (in seconds) to capture before detected speech starts. Helps catch the beginning of words.
min_endpointing_delay - Minimum time (in seconds) to wait before ending the user's turn.
max_endpointing_delay - Maximum time (in seconds) to wait before forcefully ending the turn.
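
As a rough mental model of how the two endpointing delays bound the wait, consider the sketch below. It is an assumption-laden simplification, not SIPHON's actual logic: the function name and the user_seems_finished signal (for example, from an end-of-utterance model) are hypothetical, and the default values are the ones listed in this section.

```python
def should_end_turn(
    silence_elapsed: float,
    user_seems_finished: bool,
    min_endpointing_delay: float = 0.45,
    max_endpointing_delay: float = 3.0,
) -> bool:
    """Decide whether to end the user's turn after silence_elapsed seconds of silence.

    user_seems_finished is a hypothetical hint; it is not a documented parameter.
    """
    if silence_elapsed >= max_endpointing_delay:
        return True   # upper bound reached: end the turn regardless of the hint
    if silence_elapsed < min_endpointing_delay:
        return False  # never end the turn before the minimum delay has passed
    return user_seems_finished  # in between, follow the hint


print(should_end_turn(0.2, user_seems_finished=True))
print(should_end_turn(1.0, user_seems_finished=True))
print(should_end_turn(1.0, user_seems_finished=False))
print(should_end_turn(3.5, user_seems_finished=False))
```

In this model, min_endpointing_delay sets how snappy the agent can be at best, and max_endpointing_delay sets how long a thinking pause can last before the agent takes its turn.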

Defaults:

min_silence_duration=2.0
activation_threshold=0.4
prefix_padding_duration=0.5
min_endpointing_delay=0.45
max_endpointing_delay=3.0

These defaults are optimized to handle natural conversation pauses and prevent the agent from cutting off users who are thinking or formulating their thoughts.
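
To keep these values in one place and catch obvious mistakes early, you could wrap them in a small settings object. The sketch below is illustrative only: TurnTakingSettings is a hypothetical class, not part of SIPHON, and the validation rules are assumptions inferred from the parameter descriptions above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TurnTakingSettings:
    """Hypothetical container for the documented defaults (not a SIPHON class)."""

    allow_interruptions: bool = True
    min_interruption_duration: float = 0.08
    min_silence_duration: float = 2.0
    activation_threshold: float = 0.4
    prefix_padding_duration: float = 0.5
    min_endpointing_delay: float = 0.45
    max_endpointing_delay: float = 3.0

    def __post_init__(self) -> None:
        # activation_threshold is documented as a confidence level in 0.0-1.0.
        if not 0.0 <= self.activation_threshold <= 1.0:
            raise ValueError("activation_threshold must be between 0.0 and 1.0")
        # Assumed rule: the minimum delay cannot exceed the maximum delay.
        if self.min_endpointing_delay > self.max_endpointing_delay:
            raise ValueError("min_endpointing_delay must not exceed max_endpointing_delay")
        # Assumed rule: durations are in seconds and cannot be negative.
        for name in (
            "min_interruption_duration",
            "min_silence_duration",
            "prefix_padding_duration",
            "min_endpointing_delay",
            "max_endpointing_delay",
        ):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must not be negative")


defaults = TurnTakingSettings()
print(asdict(defaults))
```

How the values are then handed to SIPHON depends on its API, which this sketch deliberately does not assume.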

Practical guidance

If the agent responds too quickly: Increase min_silence_duration or max_endpointing_delay.
If the agent cuts off users mid-thought: Increase min_endpointing_delay and max_endpointing_delay.
If users experience silence detection issues: The stream may be closing prematurely. Test this by increasing max_endpointing_delay to 4.0-5.0 seconds.
If the agent gets interrupted too easily: Set allow_interruptions=False or increase min_interruption_duration.
If speech is missed at the beginning: Increase prefix_padding_duration.
For noisy environments: Increase activation_threshold to 0.65-0.7 to reduce false speech detection.
For users who speak slowly or thoughtfully: Increase max_endpointing_delay (to 4.0-5.0) and make sure min_silence_duration is at least 1.0-1.2.
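
The guidance above can be captured as named overrides layered on top of the defaults. This is a minimal sketch under stated assumptions: the preset names and the settings_for helper are hypothetical, and each override uses the low end of a range suggested in this section.

```python
# Documented defaults for the VAD and endpointing parameters.
DEFAULTS = {
    "min_silence_duration": 2.0,
    "activation_threshold": 0.4,
    "prefix_padding_duration": 0.5,
    "min_endpointing_delay": 0.45,
    "max_endpointing_delay": 3.0,
}

# Hypothetical presets; values come from the ranges suggested in the guidance.
PRESETS = {
    "noisy_environment": {"activation_threshold": 0.65},
    "slow_speakers": {"max_endpointing_delay": 4.0},
}


def settings_for(*preset_names: str) -> dict:
    """Return a copy of DEFAULTS with the named presets applied in order."""
    settings = dict(DEFAULTS)  # copy so DEFAULTS is never mutated
    for name in preset_names:
        settings.update(PRESETS[name])
    return settings


tuned = settings_for("noisy_environment", "slow_speakers")
print(tuned)
```

Changing one parameter at a time and testing with real calls makes it easier to tell which adjustment actually fixed the behavior.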