Stop Paying for Vapi/Retell: Run Your Own AI Voice Agent in Python
Source: Dev.to
Building AI calling agents without commercial licenses
If you are a Python developer, you can spin up a voice agent with sub-500 ms latency on your own machine. This guide introduces Siphon, an open-source (Apache 2.0) Python framework that bridges SIP trunks to LLMs.
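To make the latency target concrete, here is a minimal sketch of a per-turn latency budget. The stage names and the `TurnLatency` class are illustrative assumptions, not part of Siphon. The model simply adds the stages together, which overestimates the total when streaming stages overlap.

```python
from dataclasses import dataclass


@dataclass
class TurnLatency:
    """Hypothetical per-turn latency breakdown, in milliseconds.

    Covers the gap between the caller finishing a sentence and hearing
    the first audio of the agent's reply.
    """

    endpointing_ms: float      # silence the VAD waits for before ending the turn
    stt_final_ms: float        # time until the final transcript arrives
    llm_first_token_ms: float  # LLM time-to-first-token
    tts_first_audio_ms: float  # TTS time-to-first-audio
    network_ms: float = 0.0    # SIP/WebRTC transport overhead

    def total_ms(self) -> float:
        # Treat the stages as strictly sequential (a conservative upper bound).
        return (
            self.endpointing_ms
            + self.stt_final_ms
            + self.llm_first_token_ms
            + self.tts_first_audio_ms
            + self.network_ms
        )

    def within_budget(self, budget_ms: float = 500.0) -> bool:
        return self.total_ms() <= budget_ms


if __name__ == "__main__":
    # Illustrative numbers only; replace them with timings from your own calls.
    turn = TurnLatency(endpointing_ms=150, stt_final_ms=50,
                       llm_first_token_ms=150, tts_first_audio_ms=80,
                       network_ms=40)
    print(f"{turn.total_ms():.0f} ms, within budget: {turn.within_budget()}")
```

Timing each stage separately shows which provider to swap when a call feels slow.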
Prerequisites
- Python 3.10+
- A Twilio or Telnyx SIP trunk
- LiveKit credentials
- API keys for OpenAI (LLM), Deepgram (speech-to-text), and Cartesia (text-to-speech)
Step 1: Installation & Setup
Install the Siphon package from PyPI, along with python-dotenv (used in Step 3 to load your .env file):
pip install siphon-ai python-dotenv
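Before going further, a quick preflight check can confirm the interpreter version and that the packages are installed. This is a hypothetical helper that uses only the standard library (`sys`, `platform`, `importlib.metadata`). It is not part of Siphon, and it also checks for python-dotenv because Step 3 imports it.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON = (3, 10)
# siphon-ai from the install step; python-dotenv is imported in Step 3.
REQUIRED_PACKAGES = ("siphon-ai", "python-dotenv")


def preflight(
    packages: tuple[str, ...] = REQUIRED_PACKAGES,
    min_python: tuple[int, int] = MIN_PYTHON,
) -> list[str]:
    """Return human-readable problems; an empty list means ready to go."""
    problems: list[str] = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {platform.python_version()}"
        )
    for name in packages:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            problems.append(f"missing package: {name} (pip install {name})")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "Environment looks ready.")
```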
Create a .env file in your project root with your provider keys. Because Siphon is self-hosted, you pay providers such as OpenAI and LiveKit directly, with no middleman fees.
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_livekit_key
LIVEKIT_API_SECRET=your_livekit_secret
OPENAI_API_KEY=sk-yourkey
DEEPGRAM_API_KEY=yourkey
FROM_NUMBER=+15551234567
SIP_USERNAME=your_sip_user
SIP_PASSWORD=your_sip_pass
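A missing or mistyped key usually surfaces only when the first call fails, so it can help to validate the file up front. The following is a minimal sketch that assumes the key names above. `parse_env` and `check_env` are hypothetical helpers; in a real project, python-dotenv's `dotenv_values()` can do the parsing. The sketch also checks that `FROM_NUMBER` is in E.164 format (a `+`, then at most 15 digits with a non-zero first digit).

```python
import re
from pathlib import Path

# Key names taken from the .env example above.
REQUIRED_KEYS = (
    "LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY", "DEEPGRAM_API_KEY",
    "FROM_NUMBER", "SIP_USERNAME", "SIP_PASSWORD",
)

# E.164: "+", a non-zero leading digit, at most 15 digits in total.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the file looks complete."""
    problems = [f"missing or empty: {key}" for key in REQUIRED_KEYS if not values.get(key)]
    number = values.get("FROM_NUMBER", "")
    if number and not E164.match(number):
        problems.append(f"FROM_NUMBER is not E.164: {number!r}")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        for problem in check_env(parse_env(env_file.read_text())):
            print("-", problem)
    else:
        print(".env not found in the current directory")
```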
Step 2: Defining the Agent
Siphon abstracts away the complex WebRTC pipeline and Voice Activity Detection (VAD). You define an agent by plugging an LLM, a text-to-speech (TTS) engine, and a speech-to-text (STT) engine into Siphon’s plugin architecture:
from siphon.agent import Agent
from siphon.plugins import openai, cartesia, deepgram

# Define the agent: one plugin each for the LLM, TTS, and STT stages.
# agent_name is how outbound calls (Step 3) refer to this agent.
agent = Agent(
    agent_name="Receptionist",
    llm=openai.LLM(),
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    system_instructions="You are a helpful dental receptionist. Help the user book an appointment.",
)
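To illustrate what the plugin architecture in Step 2 buys you, here is a toy, turn-based version of the same idea. Every name in it (the `STT`/`LLM`/`TTS` protocols, `ToyAgent`, and the fake providers) is hypothetical and not Siphon's API. Siphon streams audio and handles VAD for you, whereas this sketch processes one complete utterance at a time.

```python
from dataclasses import dataclass
from typing import Protocol


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, system: str, user_text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class ToyAgent:
    """Hypothetical agent: any object with the right method fits each slot."""

    agent_name: str
    llm: LLM
    tts: TTS
    stt: STT
    system_instructions: str

    def handle_turn(self, caller_audio: bytes) -> bytes:
        text = self.stt.transcribe(caller_audio)                 # speech -> text
        answer = self.llm.reply(self.system_instructions, text)  # text -> reply
        return self.tts.synthesize(answer)                       # reply -> speech


# Fake providers standing in for deepgram.STT, openai.LLM and cartesia.TTS.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is UTF-8 text


class FakeLLM:
    def reply(self, system: str, user_text: str) -> str:
        return f"Sure, let's book: {user_text}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


if __name__ == "__main__":
    agent = ToyAgent(
        agent_name="Receptionist",
        llm=FakeLLM(),
        tts=FakeTTS(),
        stt=FakeSTT(),
        system_instructions="You are a helpful dental receptionist.",
    )
    print(agent.handle_turn(b"a cleaning on Friday"))
```

Because each slot only depends on a method signature, swapping one provider for another is a one-argument change.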
Step 3: Triggering an Outbound Call
Outbound SIP signaling is straightforward. If you don’t have an outbound trunk ID set up yet, you can trigger a call by passing your SIP credentials in sip_trunk_setup; Siphon will reuse or create an outbound trunk as needed.
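import os

from dotenv import load_dotenv
from siphon.telephony.outbound import Call

load_dotenv()  # read the .env file from Step 1 into os.environ

call = Call(
    agent_name="Receptionist",  # must match agent_name from Step 2
    sip_trunk_setup={
        "name": "telnyx-primary",
        # Telnyx SIP endpoint; with Twilio, use your Elastic SIP Trunk's termination URI
        "sip_address": "sip.telnyx.com",
        "sip_number": os.getenv("FROM_NUMBER"),
        "sip_username": os.getenv("SIP_USERNAME"),
        "sip_password": os.getenv("SIP_PASSWORD"),
    },
    number_to_call="+15555550100",  # placeholder; use the destination in E.164 format
)

# Execute the asynchronous dial and bridge the call into the LiveKit WebRTC room
call.start()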
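One caveat with the snippet above: os.getenv() returns None for a missing variable, so a typo in .env is only discovered when the SIP provider rejects the call. A small fail-fast helper can catch that before dialing. This is a hypothetical sketch. Only the sip_trunk_setup keys come from the example above, and `build_trunk_setup` and `require_e164` are illustrative names, not Siphon functions.

```python
import os
import re
from typing import Mapping

# E.164: "+", a non-zero leading digit, at most 15 digits in total.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


def require_e164(number: str) -> str:
    """Return the number unchanged if it looks like E.164, else raise ValueError."""
    if not E164.match(number):
        raise ValueError(f"not an E.164 number: {number!r}")
    return number


def build_trunk_setup(
    env: Mapping[str, str] = os.environ,
    name: str = "telnyx-primary",
    sip_address: str = "sip.telnyx.com",
) -> dict[str, str]:
    """Build the sip_trunk_setup dict, failing fast on missing variables."""
    required = ("FROM_NUMBER", "SIP_USERNAME", "SIP_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    return {
        "name": name,
        "sip_address": sip_address,
        "sip_number": require_e164(env["FROM_NUMBER"]),
        "sip_username": env["SIP_USERNAME"],
        "sip_password": env["SIP_PASSWORD"],
    }
```

In the Step 3 script you would call build_trunk_setup() after load_dotenv() and wrap the destination number in require_e164() before passing both to Call.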
Step 4: Handling State and Interruptions
Handling interruptions (barge-ins) is one of the hardest parts of voice AI. Siphon uses LiveKit’s WebRTC engine to stop TTS playback as soon as the caller's speech is detected, so conversations stay natural and low-latency while running entirely on your own infrastructure.
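The core idea behind barge-in can be shown with a deterministic toy: play TTS audio one chunk at a time, check the microphone alongside it, and stop playback once the caller has been speaking for a few consecutive frames. This sketch swaps in a simple energy-threshold detector as a stand-in for a real VAD. `play_with_barge_in`, `threshold`, and `min_speech_frames` are hypothetical names and values, and this is not how Siphon or LiveKit implement interruption internally.

```python
import math
from typing import Iterable, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def play_with_barge_in(
    tts_chunks: Iterable[str],
    mic_frames: Iterable[Sequence[float]],
    threshold: float = 0.1,
    min_speech_frames: int = 2,
) -> list[str]:
    """Return the TTS chunks actually played before the caller barged in.

    One microphone frame is checked per TTS chunk. Playback stops once
    min_speech_frames consecutive frames reach the energy threshold, so a
    single click or cough does not cut the agent off.
    """
    mic = iter(mic_frames)
    played: list[str] = []
    speech_run = 0
    for chunk in tts_chunks:
        frame = next(mic, ())  # treat a missing mic frame as silence
        speech_run = speech_run + 1 if rms(frame) >= threshold else 0
        if speech_run >= min_speech_frames:
            break  # barge-in: drop the rest of the queued TTS audio
        played.append(chunk)
    return played


if __name__ == "__main__":
    silence, speech = [0.0] * 160, [0.5] * 160
    chunks = ["Hello,", "thanks", "for", "calling", "the", "office"]
    mic = [silence, silence, speech, speech, speech, speech]
    print(play_with_barge_in(chunks, mic))
```

The min_speech_frames guard is the usual trade-off: requiring more consecutive speech frames ignores more background noise but makes the agent slower to yield.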
Further Resources
- GitHub repository:
- Documentation:
If Siphon saves you money, consider starring the repository!