SIPHON: The Telephony-First Framework for Calling AI

Building an AI that can chat via text is now a solved problem. Building an AI that can handle a real phone call in production is still a nightmare of fragmented infrastructure.
Developers typically face a "plumbing" crisis: they must manually stitch together SIP trunks, manage real-time media buffers, handle signal interruptions, and wire up multiple AI providers, all while keeping latency low enough for a natural human conversation.
SIPHON was built to end the plumbing. It is an open-source Python framework that provides a unified, telephony-first abstraction for building and operating Calling AI agents.
The Problem: The Telephony Gap
Traditional IVR systems and contact center platforms were never designed for the era of Large Language Models (LLMs). When developers try to build modern calling agents, they encounter three primary friction points:
- Complexity: Each project requires re-implementing SIP trunk provisioning, room orchestration, and VAD (Voice Activity Detection) tuning.
- Vendor Lock-in: Most "Voice AI" platforms tie you to a single stack, making it impossible to swap LLM or TTS providers as technology evolves.
- Operational Hurdles: Features like call recording, metadata persistence, and transcription handling are often treated as afterthoughts rather than first-class features.
The Solution: A Unified Calling Abstraction
SIPHON sits between the telephony world and the AI world, providing a coherent framework for developers to build production-ready agents in ~30 lines of code.
How it Works: The SIPHON Architecture
Unlike generic voice frameworks, SIPHON is built on top of LiveKit, leveraging its real-time media and SIP layer to ensure high reliability.

The diagram illustrates how SIPHON orchestrates the entire call lifecycle:
- PSTN/SIP Provider ↔ LiveKit SIP Domain (Real-time media)
- LiveKit Rooms ↔ SIPHON Agent Worker (Orchestration)
- SIPHON Worker ↔ AI Providers (OpenAI, Deepgram, Cartesia, etc.)
The framework is divided into two core modules:
- siphon.telephony: Handles inbound Dispatch rules and outbound Call initiation, allowing you to bind phone numbers to agents programmatically.
- siphon.agent: The runner that manages the LiveKit Agent worker, entrypoint orchestration, and the dynamic construction of AI components.
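To make the idea of binding phone numbers to agents concrete, here is a minimal, self-contained sketch of a dispatch table in plain Python. It does not use SIPHON's actual API: `DispatchRule`, `DispatchTable`, and the agent name are hypothetical stand-ins chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class DispatchRule:
    """Hypothetical rule: calls to `number` are routed to `agent_name`."""
    number: str
    agent_name: str


@dataclass
class DispatchTable:
    """Hypothetical in-memory registry of inbound dispatch rules."""
    _rules: Dict[str, DispatchRule] = field(default_factory=dict)

    def bind(self, number: str, agent_name: str) -> DispatchRule:
        # Store the rule under a normalized number so lookups are format-insensitive.
        rule = DispatchRule(self._normalize(number), agent_name)
        self._rules[rule.number] = rule
        return rule

    def resolve(self, dialed: str) -> Optional[str]:
        # Return the bound agent name, or None when no rule matches.
        rule = self._rules.get(self._normalize(dialed))
        return rule.agent_name if rule else None

    @staticmethod
    def _normalize(number: str) -> str:
        # Keep a leading "+" and the digits only.
        digits = "".join(ch for ch in number if ch.isdigit())
        return ("+" if number.strip().startswith("+") else "") + digits


table = DispatchTable()
table.bind("+1 (555) 010-0000", "receptionist")
print(table.resolve("+15550100000"))
```

A real deployment would persist these rules in the telephony layer rather than in process memory. The sketch only shows the lookup shape: one dialed number maps to one agent.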
Key Pillars of SIPHON
1. Production-Ready at Scale
SIPHON is not a "toy" framework. It is designed for horizontal scalability; you can run your agent on one server or a thousand, and the architecture automatically balances the load.
2. Sub-500ms Latency
By using WebRTC for real-time media, SIPHON keeps voice interactions feeling natural. It handles audio packet loss and caller interruptions to maintain a human-like conversational flow.
3. Total Vendor Flexibility
Through its plugin architecture, SIPHON is agnostic to your AI stack. You can swap between OpenAI, Gemini, Deepgram, or ElevenLabs with minimal code changes.
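One common way to get this kind of provider independence is a small interface plus a registry keyed by provider name. The sketch below illustrates that general pattern and is not SIPHON's plugin API: `TextToSpeech`, `FakeVendorA`, `FakeVendorB`, and `build_tts` are hypothetical names, and the fake vendors return tagged bytes instead of calling any real service.

```python
from typing import Callable, Dict, Protocol


class TextToSpeech(Protocol):
    """Hypothetical interface every TTS plugin would satisfy."""
    def synthesize(self, text: str) -> bytes: ...


class FakeVendorA:
    def synthesize(self, text: str) -> bytes:
        # Stand-in for a real provider call; returns tagged bytes.
        return b"A:" + text.encode("utf-8")


class FakeVendorB:
    def synthesize(self, text: str) -> bytes:
        return b"B:" + text.encode("utf-8")


# Registry mapping a provider name to a factory for that provider.
TTS_REGISTRY: Dict[str, Callable[[], TextToSpeech]] = {
    "vendor_a": FakeVendorA,
    "vendor_b": FakeVendorB,
}


def build_tts(provider: str) -> TextToSpeech:
    """Construct the TTS plugin selected by name."""
    try:
        return TTS_REGISTRY[provider]()
    except KeyError:
        raise ValueError(f"unknown TTS provider: {provider!r}") from None


def speak(tts: TextToSpeech, text: str) -> bytes:
    # Agent code depends only on the interface, never on a concrete vendor.
    return tts.synthesize(text)


print(speak(build_tts("vendor_a"), "hello"))
print(speak(build_tts("vendor_b"), "hello"))
```

Because `speak` only sees the interface, swapping vendors means changing the single string passed to `build_tts`.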
4. Integrated Data Persistence
Recording, transcription, and metadata persistence are enabled via simple environment flags. Whether you need to save to S3, Postgres, or Redis, SIPHON handles the storage logic for you.
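As a rough illustration of flag-driven persistence, the sketch below parses boolean environment flags into a typed config object. The variable names (`EXAMPLE_CALL_RECORDING`, `EXAMPLE_SAVE_TRANSCRIPTION`, `EXAMPLE_SAVE_METADATA`) are made up for this example and are not SIPHON's documented flags; check the project documentation for the real ones.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Values treated as "enabled"; anything else counts as disabled.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class PersistenceConfig:
    """Hypothetical config derived from environment flags."""
    record_calls: bool
    save_transcripts: bool
    save_metadata: bool


def _flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


def load_persistence_config(env: Mapping[str, str] = os.environ) -> PersistenceConfig:
    # The flag names below are placeholders, not real SIPHON settings.
    return PersistenceConfig(
        record_calls=_flag(env, "EXAMPLE_CALL_RECORDING"),
        save_transcripts=_flag(env, "EXAMPLE_SAVE_TRANSCRIPTION"),
        save_metadata=_flag(env, "EXAMPLE_SAVE_METADATA"),
    )


cfg = load_persistence_config({"EXAMPLE_CALL_RECORDING": "true", "EXAMPLE_SAVE_METADATA": "0"})
print(cfg)
```

Passing the environment as a mapping keeps the parser easy to test without mutating `os.environ`.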
Getting Started
SIPHON is now open source under the Apache 2.0 license.
Developers can install the framework and spin up a functioning agent worker in minutes:
pip install siphon-ai
Whether you are building an AI receptionist, an outbound notification system, or a contact-center triage bot, SIPHON provides the infrastructure to get you to production faster.
Stop building the plumbing. Start building the agent.
📚 Read the Documentation | ⭐ Star us on GitHub
A BLACKDWARF Initiative.
