Horizontal Scaling

One of Siphon's most powerful features is its ability to scale horizontally with zero configuration. Because Siphon uses a worker-based architecture, you can handle higher call volumes simply by running more copies of your agent code.

How it works

When you run agent.start(), your script acts as a Worker. It connects to the Siphon/LiveKit infrastructure and waits for jobs (calls).
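
This "connect and wait for jobs" pattern can be sketched with a toy model in plain Python. The names here (`worker`, `jobs`) are illustrative, not Siphon's API; a real worker receives calls from the Siphon/LiveKit infrastructure rather than a local queue:

```python
import queue
import threading

def worker(jobs, results, name):
    """Toy worker: blocks waiting for jobs, like a script after agent.start()."""
    while True:
        call = jobs.get()
        if call is None:          # shutdown sentinel
            break
        results.append((name, call))  # "handle" the call
        jobs.task_done()

jobs = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(jobs, results, "worker-1"))
t.start()

# The infrastructure dispatches incoming calls; here we enqueue them by hand.
for call in ["call-1", "call-2"]:
    jobs.put(call)
jobs.join()      # wait until all calls are handled
jobs.put(None)   # signal shutdown
t.join()

print(results)  # [('worker-1', 'call-1'), ('worker-1', 'call-2')]
```

The key point is that the worker is idle until a job arrives, so starting more copies of the same script adds capacity without any coordination code on your side.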

  • One Worker: Can handle N concurrent calls, where N depends on available CPU and memory.
  • Multiple Workers: Multiply your capacity by simply starting the same script on additional machines.

All workers join a shared pool. When a new call comes in, it is automatically routed to an available worker with capacity.
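
The pool-routing behavior can be illustrated with a small simulation. This is a first-fit sketch under assumed names (`Worker`, `route_call`); the real scheduler may use a different placement strategy, such as least-loaded:

```python
class Worker:
    """Toy stand-in for one running copy of agent.py."""
    def __init__(self, name, max_calls):
        self.name = name
        self.max_calls = max_calls  # capacity N for this machine
        self.active = 0

    @property
    def has_capacity(self):
        return self.active < self.max_calls

def route_call(pool):
    """Route an incoming call to any worker with spare capacity."""
    for worker in pool:
        if worker.has_capacity:
            worker.active += 1
            return worker.name
    return None  # pool saturated: the call must queue or be rejected

# Two workers with capacity 2 each: the pool absorbs 4 concurrent calls.
pool = [Worker("server-a", 2), Worker("server-b", 2)]
assignments = [route_call(pool) for _ in range(5)]
print(assignments)  # ['server-a', 'server-a', 'server-b', 'server-b', None]
```

Adding a third `Worker` to the pool would absorb the fifth call, which is exactly what starting another copy of your agent script does in production.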

Scaling Out

To scale out, you do not need to change any code.

  1. Server A: Run python agent.py
  2. Server B: Run python agent.py
  3. Server C: Run python agent.py

That's it. You now have 3x the capacity. Siphon handles the load balancing and distribution of calls across these servers automatically.

Deployment Best Practices

For production environments, we recommend:

  • Containerize your Agent: Wrap your agent.py in a Docker container.
  • Orchestration: Use Kubernetes or Docker Swarm to manage the number of replicas.
  • Auto-scaling: Configure your orchestrator to scale the number of pods based on CPU usage or custom metrics (e.g., active calls).
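
The containerization step could look like the following minimal Dockerfile. The file names (`requirements.txt`, `agent.py`) and base image are assumptions; adapt them to your project:

```dockerfile
# Illustrative Dockerfile for a Siphon agent worker (names are assumptions)
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY agent.py .

# Each container runs exactly one worker; credentials are injected
# via environment variables (e.g. from a Kubernetes Secret)
CMD ["python", "agent.py"]
```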
# Example K8s deployment snippet
apiVersion: apps/v1
kind: Deployment
metadata:
  name: siphon-worker
spec:
  replicas: 3 # Start with 3 workers
  selector:
    matchLabels:
      app: siphon-worker
  template:
    metadata:
      labels:
        app: siphon-worker
    spec:
      containers:
      - name: agent
        image: my-agent-image:latest
        envFrom:
        - secretRef:
            name: siphon-secrets
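
For the auto-scaling recommendation, a standard Kubernetes HorizontalPodAutoscaler can manage the replica count of the Deployment above. The replica bounds and CPU target here are illustrative; scaling on a custom metric such as active calls requires a metrics adapter:

```yaml
# Example HPA: scale workers between 3 and 10 replicas on CPU usage
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: siphon-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: siphon-worker
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```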