Bridging Worlds: Building a SIP-to-WebRTC Gateway with Python and Drachtio
Source: Dev.to
The Billion‑Dollar Bridge
In the rush toward AI agents and browser‑based communication, it’s easy to forget where the money is. It isn’t just in peer‑to‑peer video; it’s in the Public Switched Telephone Network (PSTN). Every major cloud contact‑center, conferencing platform, and voice‑AI startup eventually faces the same requirement:
- “We need to let users dial in from a phone.”
- “Our AI agent needs to call a customer’s mobile.”
This requires bridging SIP (Session Initiation Protocol)—the 1990s‑era standard that powers the telecom world—with WebRTC, the modern browser standard.
While they share common ancestors (SDP, RTP), they are practically alien to each other. SIP is text-based and transactional, and runs over UDP, TCP, or TLS. WebRTC defines no signaling protocol of its own (applications typically use WebSockets), encrypts all media with DTLS-SRTP, and relies on ICE to establish connectivity. Building a gateway to bridge them is one of the hardest infrastructure challenges in real-time engineering.
For a more detailed explanation, check my YouTube channel: The Lalit Official.
SIP vs. WebRTC – A Quick Comparison
| Aspect | SIP | WebRTC |
|---|---|---|
| Signaling | Transactional (INVITE → 100 Trying → 180 Ringing → 200 OK). Handles retransmissions, hop‑by‑hop routing, and Via‑header manipulation. | Not defined by the spec – typically a JSON payload sent over a WebSocket. |
| Security | Legacy trunks often send plain-text SIP and unencrypted RTP (typically carrying G.711 audio). | Mandatory encryption: DTLS for key exchange and SRTP for media. Browsers reject plain RTP. |
| NAT Traversal | Assumes relatively static IPs or simple NATs. | Designed for hostile networks; requires ICE (STUN/TURN) to punch holes. |
| Transport | UDP/TCP with text‑based messages. | UDP/TLS (DTLS‑SRTP) for media; WebSocket/HTTP for signaling. |
Note: Trying to terminate a SIP trunk directly in a Python script using a raw socket means re‑implementing the entire SIP transaction state machine and header parsing logic. In practice, you should use a dedicated SIP stack or library rather than building it from scratch.
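To make the security and transport rows of the table concrete, here is a minimal sketch that inspects an SDP body and reports whether it looks like a plain SIP trunk offer or a WebRTC offer. The function name `classify_sdp` and both sample SDP bodies are made up for illustration; a real gateway would leave this inspection to the SIP stack and media proxy.

```python
def classify_sdp(sdp: str) -> dict:
    """Summarise the first audio media section of an SDP body."""
    info = {
        "proto": None,
        "payload_types": [],
        "has_ice": False,
        "has_dtls_fingerprint": False,
    }
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=audio") and info["proto"] is None:
            # Format: m=audio <port> <proto> <payload type> [<payload type> ...]
            parts = line.split()
            info["proto"] = parts[2]
            info["payload_types"] = parts[3:]
        elif line.startswith("a=ice-ufrag:"):
            info["has_ice"] = True
        elif line.startswith("a=fingerprint:"):
            info["has_dtls_fingerprint"] = True
    # A browser offer uses the DTLS-SRTP profile and carries ICE credentials
    # plus a certificate fingerprint; a legacy trunk usually has none of these.
    info["looks_like_webrtc"] = (
        info["proto"] == "UDP/TLS/RTP/SAVPF"
        and info["has_ice"]
        and info["has_dtls_fingerprint"]
    )
    return info


# Illustrative (shortened) SDP bodies, not captured from real traffic.
TRUNK_SDP = """v=0
o=- 1 1 IN IP4 1.2.3.4
s=-
c=IN IP4 1.2.3.4
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""

BROWSER_SDP = """v=0
o=- 2 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=ice-ufrag:abcd
a=ice-pwd:0123456789abcdef0123456789
a=fingerprint:sha-256 AA:BB:CC
a=setup:actpass
a=rtcp-mux
a=rtpmap:111 opus/48000/2
"""

if __name__ == "__main__":
    print(classify_sdp(TRUNK_SDP))
    print(classify_sdp(BROWSER_SDP))
```

The gap between the two results (profile, ICE credentials, DTLS fingerprint, codec) is the work the gateway has to do on every call.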
Choosing a SIP Stack
Drachtio (Node.js)
- Core – High‑performance C++ parser.
- API – Node.js signaling-resource framework (drachtio-srf).
For a Python‑centric team, using a Node.js tool may feel counter‑intuitive, but the Python ecosystem lacks a SIP stack with Drachtio’s maturity or Kamailio’s raw power.
Kamailio (C) + KEMI (Python)
- Performance – Tens of thousands of calls per second.
- Learning curve – Steep; you must master SIP routing blocks, memory management, and C‑style configuration syntax.
- Debugging – Embedded Python crashes can be hard to trace.
| Feature | Drachtio | Kamailio + KEMI |
|---|---|---|
| Performance | Thousands of calls / sec | Tens of thousands of calls / sec |
| Developer friendliness | Very high (Node.js API) | Low (C‑style config) |
| Time‑to‑market | Fast | Longer |
| Scale | Sufficient for most WebRTC gateways | Carrier‑grade switching |
Bottom line: For most modern WebRTC gateways (cloud PBX, AI voice agents), Drachtio offers a quicker time‑to‑market with ample scale. Kamailio is best suited for massive carrier‑grade deployments.
The Sidecar Pattern
+----------------+ +-------------------+ +-----------------+
| Drachtio | HTTP | Flask / Quart | Redis | Browser (WebRTC)|
| (SIP Edge) |<-------->| (Business Logic) |<-------->| Agent UI |
+----------------+ +-------------------+ +-----------------+
- Drachtio handles low‑level SIP “noise”: parsing headers, managing transaction timers, keep‑alives.
- Flask/Quart (or any Python web framework) decides what to do: “Is this user active?”, “Which AI agent should handle this call?”, “Record this call?”.
- The two services communicate via high‑speed HTTP webhooks or a shared Redis bus.
Flow: Drachtio receives the INVITE, pauses processing, asks Python what to do, then executes the signaling instruction.
Call Flow Example
- Incoming SIP INVITE (via a SIP trunk) arrives at Drachtio (listening on port 5060):

      INVITE sip:+15550199@sip.myapp.com SIP/2.0

- Drachtio parses the INVITE and fires a webhook (or Redis message) to the Flask app:

      POST /webhook/voice/incoming HTTP/1.1
      Content-Type: application/json

      { "caller": "+1234567890", "callee": "+15550199", "call_id": "a1b2c3d4" }

- Python app checks the database: “Is +15550199 assigned to an active agent?” → finds agent-42 online.
- Python responds to Drachtio: “Bridge this call to the WebRTC session for agent-42.”
- Media translation
  - SIP trunk offers G.711 (PCMU) over RTP.
  - Browser requires Opus over SRTP.

  Drachtio commands RTPEngine (a kernel-space media proxy) to allocate endpoints:

      Side A (SIP)    → IP 1.2.3.4, Codec PCMU, Proto RTP/AVP
      Side B (WebRTC) → IP 5.6.7.8, Codec Opus, Proto UDP/TLS/RTP/SAVPF (DTLS-SRTP)

  RTPEngine performs real-time transcoding and terminates encryption.
- Application state remains in Python, while Drachtio manages the SIP state machine.
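The decision step in this flow (look up the callee, then answer with a signaling instruction) can be sketched without any web framework. This is a minimal illustration, assuming the webhook payload shape shown in the flow (`caller`, `callee`, `call_id`). The `Agent` class, the `route_incoming_call` function, and the instruction fields are hypothetical names, not part of Drachtio.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    agent_id: str
    did: str               # phone number (DID) assigned to this agent
    online: bool
    webrtc_endpoint: str   # hypothetical handle for the agent's browser session


def route_incoming_call(payload: dict, agents: list[Agent]) -> dict:
    """Turn a webhook payload into a signaling instruction for the SIP edge."""
    call_id = payload["call_id"]
    callee = payload["callee"]

    # Find the agent who owns the dialed number, if any.
    agent: Optional[Agent] = next((a for a in agents if a.did == callee), None)

    if agent is None:
        # Nobody owns this number: answer with SIP 404.
        return {"action": "reject", "code": 404, "reason": "Not Found", "call_id": call_id}
    if not agent.online:
        # Known number, but the browser session is not connected: SIP 480.
        return {
            "action": "reject",
            "code": 480,
            "reason": "Temporarily Unavailable",
            "call_id": call_id,
        }
    return {"action": "bridge", "target": agent.webrtc_endpoint, "call_id": call_id}


if __name__ == "__main__":
    directory = [Agent("agent-42", "+15550199", True, "webrtc:agent-42")]
    invite = {"caller": "+1234567890", "callee": "+15550199", "call_id": "a1b2c3d4"}
    print(route_incoming_call(invite, directory))
```

Keeping the decision a pure function of the payload and the agent directory makes it easy to unit-test apart from the SIP edge.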
Flask Route – Conceptual Example
from flask import Flask, request, jsonify

# user_repo and socket_manager are application-specific helpers (agent lookup
# and WebSocket push). They are not defined in this conceptual snippet.

app = Flask(__name__)


@app.route('/hooks/sip-invite', methods=['POST'])
def handle_sip_invite():
    data = request.json
    from_number = data['sip']['from']
    to_number = data['sip']['to']

    # 1️⃣ Lookup the WebRTC user
    agent = user_repo.find_agent_by_did(to_number)
    if not agent or not agent.is_online:
        return jsonify({
            "action": "reject",
            "code": 480,
            "reason": "Temporarily Unavailable"
        })

    # 2️⃣ (Optional) Get media parameters (SDP) from RTPEngine
    # In a real app Drachtio handles the RTPEngine interaction,
    # but Python may dictate codec policy.

    # 3️⃣ Notify the browser via WebSocket
    socket_manager.emit(agent.id, 'incoming_call', {
        'caller': from_number,
        'sdp': data.get('sdp')  # forward SDP if needed
    })

    # 4️⃣ Tell Drachtio to bridge the call
    return jsonify({
        "action": "bridge",
        "target": agent.webrtc_endpoint,
        "codec": "opus"
    })
While Drachtio manages the SIP state machine, your Python code manages the application logic.
TL;DR
- SIP – legacy, transactional, often unencrypted.
- WebRTC – modern, encrypted, ICE‑driven.
- Bridging – requires a SIP stack (Drachtio or Kamailio) + a media proxy (RTPEngine) + application logic (Python).
- Sidecar pattern – gives you the best of both worlds: high‑performance SIP handling + flexible Python business logic.
Happy bridging! 🚀
Other webhook responses follow the same pattern. Two fragments from the handler:

    # Return the transcodable SDP from RTPEngine
    return jsonify({'sdp': data['sdp']})

    # Example of a simple ringing response
    return jsonify({"action": "ringing"})
Note: Python never touches a SIP header or a raw UDP packet.
It works with high‑level concepts: Users, Status, and Signaling instructions.
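One way to keep those signaling instructions explicit is to model them as small typed objects and serialize them to JSON at the boundary. The sketch below is an assumption about how that could look. The `Reject` and `Bridge` classes and the `to_wire` helper are hypothetical, and the field names simply mirror the JSON used in the Flask example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Reject:
    code: int
    reason: str
    action: str = "reject"

    def __post_init__(self):
        # SIP failure responses live in the 4xx-6xx range.
        if not 400 <= self.code <= 699:
            raise ValueError(f"{self.code} is not a SIP failure status code")


@dataclass(frozen=True)
class Bridge:
    target: str
    codec: str = "opus"
    action: str = "bridge"


def to_wire(instruction) -> str:
    """Serialize an instruction to the JSON body returned to the SIP edge."""
    return json.dumps(asdict(instruction), sort_keys=True)


if __name__ == "__main__":
    print(to_wire(Reject(480, "Temporarily Unavailable")))
    print(to_wire(Bridge("webrtc:agent-42")))
```

Validating the status code when the object is built means a malformed instruction fails inside Python rather than midway through a SIP transaction.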
You cannot realistically build a production SIP-to-WebRTC gateway without a specialized media proxy. RTPEngine is a common choice because it can forward packets in kernel space, minimizing latency and jitter.
RTPEngine responsibilities in this architecture
- ICE termination – Acts as an ICE (or ICE-lite) endpoint, so the browser can complete its connectivity checks even when it is behind NAT.
- DTLS handshake – Performs the cryptographic handshake with the browser to establish SRTP keys.
- Transcoding – Converts the 8 kHz G.711 stream from the PSTN into the 48 kHz Opus stream for the browser (and vice‑versa).
- RTCP feedback – Generates the WebRTC keep‑alives that browsers expect, which plain SIP trunks do not provide.
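To give a feel for what asking RTPEngine for a WebRTC-facing leg involves, here is a rough sketch of building an `offer` request for its ng control protocol, which sends a cookie followed by a bencoded dictionary over UDP. In this architecture Drachtio issues the request, not Python, so this is purely illustrative. The option names (`transport-protocol`, `ICE`, `rtcp-mux`, `codec`/`transcode`) are written as I understand them from the rtpengine documentation and should be checked against the version you deploy. The helper names `bencode`, `build_webrtc_offer`, and `frame_ng_message` are hypothetical.

```python
import secrets


def bencode(value) -> bytes:
    """Minimal bencode encoder (ints, strings, lists, dicts)."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Bencode dictionaries are emitted with keys in sorted byte order.
        items = sorted(
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_webrtc_offer(call_id: str, from_tag: str, sip_sdp: str) -> dict:
    """Describe the leg facing the browser, given the SDP from the SIP trunk."""
    return {
        "command": "offer",
        "call-id": call_id,
        "from-tag": from_tag,
        "sdp": sip_sdp,
        # Assumed option names, see the lead-in: verify before relying on them.
        "transport-protocol": "UDP/TLS/RTP/SAVPF",  # DTLS-SRTP toward the browser
        "ICE": "force",                              # add ICE candidates
        "rtcp-mux": ["offer"],                       # browsers expect rtcp-mux
        "codec": {"transcode": ["opus"]},            # offer Opus, transcode to PCMU
    }


def frame_ng_message(payload: dict, cookie: str = "") -> bytes:
    """Prefix the bencoded payload with a cookie used to match the reply."""
    cookie = cookie or secrets.token_hex(4)
    return cookie.encode("ascii") + b" " + bencode(payload)


if __name__ == "__main__":
    offer = build_webrtc_offer("a1b2c3d4", "tag-1", "v=0\r\n")
    print(frame_ng_message(offer))
```

The reply to such a request would carry the rewritten SDP to hand to the browser. Sending the datagram and parsing the response are left out on purpose, since the control address and port are deployment-specific.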
Building a SIP gateway used to require deep C++ knowledge and months of debugging race conditions. By leveraging the Sidecar Pattern with Drachtio and Python, you encapsulate that complexity:
- Drachtio handles the rigid, archaic rules of SIP.
- RTPEngine handles the heavy lifting of media encryption and transcoding.
- Your Python backend stays clean, modern, and focused on what matters: the experience of the user on the other end of the line.