We Open-Sourced Our AI Calling Framework (So You Don't Waste 2-3 Months)

Published: January 17, 2026 at 05:32 AM EST

Source: Dev.to

Siphon

Three months.
That’s how long many teams spend building telephony infrastructure before writing a single line of actual conversation logic for an AI voice agent.

Not because the AI is hard.
Because telephony is brutal.

Today, we’re open‑sourcing the solution so you don’t have to go through the same pain.

The Hidden Problem with AI Calling Agents

Building an AI calling agent sounds straightforward:

Use an LLM
Add speech‑to‑text
Add text‑to‑speech
Connect it to a phone number

In reality, that’s where most teams hit a wall. To make real phone calls, you end up dealing with:

SIP trunks & PSTN providers
Low‑latency, bidirectional audio
Real‑time orchestration of STT, LLM, and TTS
Call state, interruptions, transfers
Scaling, monitoring, recordings, persistence
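To make the orchestration item in the list above concrete, here is a minimal sketch of a single conversational turn: caller audio goes through speech-to-text, the transcript goes to an LLM, and the reply is streamed back as speech. Everything in it is a stand-in invented for illustration (`fake_stt`, `fake_llm`, `fake_tts`, `caller_audio`, `handle_turn`); it does not use Siphon's API or any real provider, and "audio frames" are just encoded words.

```python
import asyncio
from typing import AsyncIterator


async def fake_stt(audio_frames: AsyncIterator[bytes]) -> str:
    """Hypothetical speech-to-text stub: joins the incoming frames into text."""
    chunks = [frame.decode("utf-8") async for frame in audio_frames]
    return " ".join(chunks)


async def fake_llm(transcript: str) -> str:
    """Hypothetical LLM stub: echoes back what it heard."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return f"You said: {transcript}"


async def fake_tts(text: str) -> AsyncIterator[bytes]:
    """Hypothetical text-to-speech stub: emits one 'audio' frame per word."""
    for word in text.split():
        yield word.encode("utf-8")


async def caller_audio() -> AsyncIterator[bytes]:
    """Simulated inbound audio from the caller."""
    for word in ("book", "an", "appointment"):
        yield word.encode("utf-8")


async def handle_turn(audio_in: AsyncIterator[bytes]) -> list[bytes]:
    """One turn: STT -> LLM -> TTS, returning the outbound frames."""
    transcript = await fake_stt(audio_in)
    reply = await fake_llm(transcript)
    return [frame async for frame in fake_tts(reply)]


frames = asyncio.run(handle_turn(caller_audio()))
print(b" ".join(frames).decode("utf-8"))
```

A real pipeline would run these stages concurrently on streaming audio and would also have to cope with interruptions, transfers, and call state, which is where much of the infrastructure effort described here tends to go.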

The result? Most teams spend weeks or months on infrastructure before they ever touch the conversation itself.

We did too. And eventually asked:

“Why is building voice AI still this hard?”

Introducing Siphon

Siphon is an open‑source Python framework that handles the telephony complexity for you, so you can focus on building great conversations.

Here’s what a complete AI receptionist looks like with Siphon:

from siphon.agent import Agent
from siphon.plugins import openai, cartesia, deepgram

agent = Agent(
    agent_name="receptionist",
    llm=openai.LLM(model="gpt-4"),
    tts=cartesia.TTS(voice="helpful-assistant"),
    stt=deepgram.STT(model="nova-2"),
    system_instructions="""
    You are a friendly receptionist for Acme Corp.
    Help callers schedule appointments or route them correctly.
    """
)

if __name__ == "__main__":
    agent.start()

Run this, and your agent can answer real phone calls via any SIP provider (Twilio, Telnyx, etc.).

What Siphon Handles for You

🔌 SIP & PSTN connectivity – Works with any SIP provider, no FreeSWITCH pain.
⚡ Real‑time audio pipeline – Built on LiveKit with streaming audio and sub‑500 ms voice‑to‑voice latency.
🤖 AI orchestration – Plug‑and‑play support for LLMs, STT, and TTS.

Swap providers by changing a single argument (with the corresponding plugin imported):

llm=anthropic.LLM(model="claude-3-5-sonnet")

📈 Production‑ready by default – Auto‑scaling, call recordings, transcripts, state handling, and observability.
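The provider swap mentioned above is typically made possible by having every provider implement the same small interface. The sketch below shows that general pattern with Python's `typing.Protocol`. The names `LLM`, `ProviderA`, `ProviderB`, and `MiniAgent` are hypothetical and are not Siphon's classes; how Siphon implements its plugins internally is not described in this post.

```python
from dataclasses import dataclass
from typing import Protocol


class LLM(Protocol):
    """Hypothetical interface every LLM provider is expected to satisfy."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class ProviderA:
    model: str = "model-a"

    def complete(self, prompt: str) -> str:
        # A real provider would call its API here; this stub tags the prompt.
        return f"[{self.model}] {prompt}"


@dataclass
class ProviderB:
    model: str = "model-b"

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


@dataclass
class MiniAgent:
    """Toy agent that depends only on the LLM interface, not a vendor."""

    llm: LLM
    system_instructions: str

    def reply(self, user_text: str) -> str:
        return self.llm.complete(f"{self.system_instructions} | {user_text}")


agent = MiniAgent(llm=ProviderA(), system_instructions="Be brief.")
print(agent.reply("Hello"))

# Swapping vendors changes one argument; the agent code stays the same.
agent = MiniAgent(llm=ProviderB(), system_instructions="Be brief.")
print(agent.reply("Hello"))
```

Because `MiniAgent` only relies on `complete`, adding another provider means writing one more class with that method rather than touching the agent logic.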

Quick Start

Install the package:

pip install siphon-ai

Create an agent:

from siphon.agent import Agent
from siphon.plugins import openai, cartesia, deepgram

agent = Agent(
    agent_name="my_first_agent",
    llm=openai.LLM(),
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    system_instructions="You are a helpful assistant.",
)

agent.start()

That’s it. Your agent is live and answering phone calls. (Full setup, outbound calling, and advanced examples are in the docs.)

Why We Open‑Sourced It

We could have kept Siphon proprietary or turned it into a closed SaaS, but we believe voice AI shouldn’t be locked behind massive infrastructure effort.

Siphon is:

Apache 2.0 licensed
Provider‑agnostic
Fully self‑hostable
Free of vendor lock‑in

Use it commercially, modify it, or build on top of it.

What You Can Build

📞 Customer support agents
📅 Appointment scheduling
💼 Sales qualification
📊 Surveys & feedback collection
🏥 Healthcare intake systems

If it involves phone calls and conversations, Siphon handles the hard parts.

Get Involved

⭐ GitHub:
📖 Docs:
🐛 Issues & feature requests welcome
🤝 PRs encouraged

We’re building Siphon in public and would love community feedback. If you’ve ever thought “I wish building AI calling agents was simpler,” give Siphon a try.

