Stop Paying for Vapi/Retell: Run your own AI Voice Agent in Python

Published: May 3, 2026 at 11:11 AM EDT
2 min read
Source: Dev.to

Building AI calling agents without commercial licenses

This guide shows Python developers how to run a voice agent on their own machine, targeting under 500 ms of response latency. It introduces Siphon, an open-source (Apache 2.0) Python framework that bridges SIP trunks to LLMs.

Prerequisites

  • Python 3.10+
  • A Twilio or Telnyx SIP trunk
  • LiveKit credentials
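  • An OpenAI API key
  • Deepgram and Cartesia API keys (the example agent uses them for speech-to-text and text-to-speech)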

Step 1: Installation & Setup

Install the Siphon package with pip:

pip install siphon-ai

Create a .env file in your project root with your provider keys. Because Siphon is self-hosted, you pay providers such as OpenAI and LiveKit directly, with no middleman fees.

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_key
LIVEKIT_API_SECRET=your_livekit_secret
OPENAI_API_KEY=sk-yourkey
DEEPGRAM_API_KEY=yourkey
FROM_NUMBER=+15551234567
SIP_USERNAME=your_sip_user
SIP_PASSWORD=your_sip_pass
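
A missing key usually surfaces only when the first call fails, so it can help to check the environment up front. The following is a minimal sketch, not part of Siphon: the helper name `missing_vars` is hypothetical, and the variable list is simply copied from the .env example above.

```python
import os
from typing import Mapping, Sequence

# Variable names copied from the .env example; adjust to match your setup.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "FROM_NUMBER",
    "SIP_USERNAME",
    "SIP_PASSWORD",
)


def missing_vars(env: Mapping[str, str],
                 required: Sequence[str] = REQUIRED_VARS) -> list[str]:
    """Return the names in `required` that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]
```

After calling `load_dotenv()`, `missing_vars(os.environ)` returns an empty list when everything is set, or the names that still need values.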

Step 2: Defining the Agent

Siphon abstracts away complex WebRTC pipelines and Voice Activity Detection (VAD). Define your agent using Siphon's plugin architecture, which lets you pick the LLM, TTS, and STT providers independently:

from siphon.agent import Agent
from siphon.plugins import openai, cartesia, deepgram

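# Define the agent: each plugin handles one stage of the voice pipeline
agent = Agent(
    agent_name="Receptionist",  # the outbound call in Step 3 refers to this name
    llm=openai.LLM(),           # generates the replies
    tts=cartesia.TTS(),         # turns replies into speech
    stt=deepgram.STT(),         # transcribes the caller's speech
    system_instructions="You are a helpful dental receptionist. Help the user book an appointment."
)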
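
To make the plugin idea concrete, here is a toy sketch of how three interchangeable stages can be composed into one conversational turn. It is an illustration only and does not reflect Siphon's internals: `ToyAgent`, `handle_turn`, and the fake stages are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; real plugins stream audio rather than
# passing whole byte strings around.
Transcribe = Callable[[bytes], str]    # audio in -> text
Respond = Callable[[str, str], str]    # (system prompt, user text) -> reply
Synthesize = Callable[[str], bytes]    # reply text -> audio out


@dataclass
class ToyAgent:
    stt: Transcribe
    llm: Respond
    tts: Synthesize
    system_instructions: str

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Run one caller utterance through STT, then LLM, then TTS."""
        text = self.stt(audio_in)
        reply = self.llm(self.system_instructions, text)
        return self.tts(reply)


# Fake stages so the sketch runs without any provider keys.
toy_agent = ToyAgent(
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda system, user: f"You said: {user}",
    tts=lambda text: text.encode("utf-8"),
    system_instructions="You are a helpful dental receptionist.",
)
```

Because each stage is passed in as a value, swapping one provider for another leaves the rest of the pipeline untouched, which is the property a plugin architecture is meant to give you.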

Step 3: Triggering an Outbound Call

Outbound SIP signaling is straightforward. If you don’t have a trunk ID set up, you can trigger a call using SIP credentials; Siphon will reuse or create an outbound trunk as needed.

import os
from dotenv import load_dotenv
from siphon.telephony.outbound import Call

load_dotenv()

call = Call(
    agent_name="Receptionist",
    sip_trunk_setup={
        "name": "telnyx-primary",
        "sip_address": "sip.telnyx.com",
        "sip_number": os.getenv("FROM_NUMBER"),
        "sip_username": os.getenv("SIP_USERNAME"),
        "sip_password": os.getenv("SIP_PASSWORD"),
    },
    number_to_call="+15550200",
)

# Execute the asynchronous dial and bridge to the LiveKit WebRTC room
call.start()
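
The numbers in this example are written with a leading + and a country code, which is the E.164 shape that SIP providers commonly expect. As a small sketch, a format check before dialing might look like the following. The helper `is_e164` is hypothetical and not part of Siphon, and it only checks the shape of the string, not whether the number exists or is reachable.

```python
import re

# E.164: a plus sign, a non-zero leading digit, at most 15 digits in total.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` has the E.164 shape, e.g. +15551234567."""
    return E164_PATTERN.fullmatch(number) is not None
```

Calling `is_e164(os.getenv("FROM_NUMBER", ""))` before constructing `Call` would catch a number stored with spaces, dashes, or a missing +.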

Step 4: Handling State and Interruptions

Handling interruptions (barge-ins) is one of the hardest parts of voice AI: the agent has to stop talking the moment the caller starts. Siphon relies on LiveKit's WebRTC engine to stop TTS output as soon as human speech is detected, which keeps conversations natural and low-latency while running entirely on your infrastructure.
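
The core of barge-in handling can be sketched with plain asyncio: run playback as a task and cancel it when a speech-detected signal fires. This is a simplified illustration under stated assumptions, not how Siphon or LiveKit implement it. The function names are hypothetical, `asyncio.sleep` stands in for writing audio out, and an `asyncio.Event` stands in for the VAD.

```python
import asyncio


async def speak(chunks, played, chunk_seconds=0.02):
    """Pretend to play TTS audio one chunk at a time."""
    for chunk in chunks:
        await asyncio.sleep(chunk_seconds)  # stands in for sending audio
        played.append(chunk)


async def speak_with_barge_in(chunks, speech_detected, played):
    """Play chunks until done or until speech_detected is set.

    Returns True if playback was cut short by the caller speaking.
    """
    playback = asyncio.create_task(speak(chunks, played))
    interrupt = asyncio.create_task(speech_detected.wait())
    _, pending = await asyncio.wait(
        {playback, interrupt}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = playback in pending
    for task in pending:
        task.cancel()  # stop whichever task is still running
    await asyncio.gather(*pending, return_exceptions=True)
    return interrupted


async def demo():
    played = []
    speech = asyncio.Event()
    # Simulate the caller starting to talk 50 ms into a 200 ms reply.
    asyncio.get_running_loop().call_later(0.05, speech.set)
    chunks = [f"chunk-{i}" for i in range(10)]
    interrupted = await speak_with_barge_in(chunks, speech, played)
    return interrupted, played
```

Running `asyncio.run(demo())` returns a flag showing playback was interrupted, along with the chunks that were played before the cut-off. A production system also has to discard audio already buffered downstream, which this sketch leaves out.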

Further Resources

  • GitHub repository:
  • Documentation:

If Siphon saves you money, consider starring the repository!
