Test
Source: Dev.to
DTMF Hand‑Raise System – Corrected technical research: feasibility issues with the original approach, verified working patterns, 5 alternative architectures, and an implementation‑recommendation matrix for the muted‑conference hand‑raise use case.
Participants: 5–7 callers + 1 host
Core Issue: `<Gather>` cannot wrap `<Conference>`
Recommended Production Path: Media Streams + DTMF Detection
System Feasibility: ✓ Still Fully Feasible
01 ❌ Root Problem
Twilio’s TwiML specification does NOT allow DTMF detection while a caller is inside a `<Conference>`. Any keypad digits pressed by a participant are transmitted as audio tones to the conference room, and no server‑side webhook fires. There is no native event for this.
The original document proposed nesting `<Conference>` inside `<Gather>`. This is structurally invalid. The Twilio TwiML schema enforces strict parent‑child rules:
| TwiML Verb | Valid Children | Notes |
|---|---|---|
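| `<Gather>` | `<Say>`, `<Play>`, `<Pause>` | Collects DTMF or speech input |
| `<Dial>` | `<Number>`, `<Client>`, `<Conference>`, `<Queue>`, `<Sip>` | `<Conference>` is a noun inside `<Dial>` |
| `<Conference>` | None | No children. Cannot be inside `<Gather>` |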
The Twilio Node.js SDK enforces this schema. Calling .conference() on a Gather or VoiceResponse object throws immediately:
// TypeError: twiml.conference is not a function
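The nesting rules can be captured as a small lookup for sanity‑checking a TwiML plan before building it. This is an illustrative sketch only: `VALID_CHILDREN` and `canNest` are hypothetical names, and the child lists are a partial, hand‑written subset of Twilio's documented nesting rules, not the full schema.

```javascript
// Partial, hand-written map of TwiML parents to the children they accept.
// Assumption: this subset is enough for the hand-raise flows discussed here.
const VALID_CHILDREN = {
  Response: ['Say', 'Play', 'Pause', 'Gather', 'Dial', 'Redirect', 'Hangup'],
  Gather: ['Say', 'Play', 'Pause'],
  Dial: ['Number', 'Client', 'Conference', 'Queue', 'Sip'],
  Conference: [], // a noun: it carries the room name as text, not child elements
};

// Returns true only when `child` is listed as a valid child of `parent`.
function canNest(parent, child) {
  const allowed = VALID_CHILDREN[parent];
  return Array.isArray(allowed) && allowed.includes(child);
}

console.log(canNest('Gather', 'Conference')); // false
console.log(canNest('Dial', 'Conference'));   // true
```

A check like this fails fast in unit tests, before any TwiML reaches Twilio.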
⚠️ statusCallbackEvent Correction
The statusCallbackEvent parameter does NOT include an “unmute” event. Valid values are:
start, end, join, leave, mute, hold, modify, speaker, announcement
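Subscribing to mute covers both mute and unmute actions. Twilio reports them in the webhook's StatusCallbackEvent as participant-mute or participant-unmute, and the Muted field in the webhook body ('true' or 'false', sent as a form‑encoded string) gives the resulting state.
const { StatusCallbackEvent, Muted } = req.body;
if (StatusCallbackEvent === 'participant-mute' || StatusCallbackEvent === 'participant-unmute') {
  // Muted === 'true' → participant is now muted
  // Muted === 'false' → participant is now unmuted
}
res.sendStatus(200); // always acknowledge, whatever the event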
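The mute handling can be isolated in a pure function so it is testable without an HTTP server. A minimal sketch, assuming the body has already been parsed from `application/x-www-form-urlencoded` (so every value is a string); `classifyMuteEvent` is a hypothetical helper name, and accepting both event names is a defensive choice.

```javascript
// Interpret a conference status callback body.
// Returns null for events that are unrelated to muting.
function classifyMuteEvent(body) {
  const event = body.StatusCallbackEvent;
  if (event !== 'participant-mute' && event !== 'participant-unmute') {
    return null;
  }
  // Trust the Muted field for the resulting state, not the event name alone.
  const muted = String(body.Muted).toLowerCase() === 'true';
  return { callSid: body.CallSid, action: muted ? 'mute' : 'unmute' };
}
```

Inside an Express route this would be called as `classifyMuteEvent(req.body)` before responding with `200`.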
02 Proof‑of‑Concept (POC) – Two Approaches that Respect TwiML Constraints
Approach A – Gather‑Before‑Conference
- Participant gets a short `<Gather>` window before joining the conference.
- If they press *1 during this window, the hand‑raise is registered before entry.
Pros:
- ✅ Zero audio interruption
- ⚡ One‑time only
Approach B – REST API Call Redirect (Host‑Initiated)
- Host triggers a mid‑conference DTMF prompt via the dashboard.
- Backend calls twilioClient.calls(callSid).update({ url: gatherUrl }), temporarily pulling the participant out of the conference to collect a keypress, then returning them.
Pros:
- ✅ Mid‑call capable
- ⚡ Host‑initiated
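// Approach A: pre-join prompt (the /handle-prejoin route name is illustrative)
const twiml = new VoiceResponse();
// Gather window: participant can press *1 NOW
const gather = twiml.gather({
  action: `${BASE_URL}/handle-prejoin`,
  method: 'POST',
  timeout: 5,
  numDigits: 2,
});
gather.say({ voice: 'Polly.Joanna' }, 'Press *1 to raise hand.');
// Fall‑through: no key pressed → enter conference muted
twiml.redirect(`${BASE_URL}/webhooks/conference`);
// Verbs placed after a redirect never run; hold music belongs in the conference's waitUrl
res.type('text/xml').send(twiml.toString());
// Handle pre‑join keypress (separate handler, with its own VoiceResponse)
const { Digits, CallSid } = req.body;
if (Digits === '*1') {
  registerHandRaise(CallSid); // hypothetical helper: record the hand-raise
}
// Either way, enter the conference
twiml.redirect(`${BASE_URL}/webhooks/conference`);
res.type('text/xml').send(twiml.toString());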
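Both approaches need somewhere to record that a hand was raised. A minimal in‑memory sketch, assuming a single Node.js process and Call SIDs as keys; `HandRaiseRegistry` is a hypothetical name, not part of any Twilio SDK, and a real deployment would likely back this with a database or shared cache.

```javascript
// Tracks raised hands per Call SID, ordered by who raised first.
class HandRaiseRegistry {
  constructor() {
    this.raised = new Map(); // CallSid -> timestamp (ms) of the first raise
  }

  // Idempotent: raising twice keeps the original timestamp.
  raise(callSid, now = Date.now()) {
    if (!this.raised.has(callSid)) {
      this.raised.set(callSid, now);
    }
    return this.raised.get(callSid);
  }

  // Returns true if a raised hand was actually lowered.
  lower(callSid) {
    return this.raised.delete(callSid);
  }

  // Call SIDs in the order the hands went up, for the host dashboard.
  queue() {
    return [...this.raised.entries()]
      .sort((a, b) => a[1] - b[1])
      .map(([callSid]) => callSid);
  }
}
```

Keeping `raise` idempotent means a caller who presses *1 repeatedly does not lose their place in the queue.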
The dashboard shows a “Prompt Hand Raise” button per participant. When clicked, the backend redirects that caller’s active call to a gather TwiML page, collects the response, then returns them to the conference.
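// /voice/gather-hand-raise – the gather prompt TwiML page
const twiml = new VoiceResponse();
const gather = twiml.gather({
  action: `${BASE_URL}/handle-hand-raise`,
  method: 'POST',
  timeout: 5,
  numDigits: 2,
});
gather.say('Press *1 to raise your hand.');
// No input: send the caller straight back to the conference
twiml.redirect(`${BASE_URL}/webhooks/conference`);
res.type('text/xml').send(twiml.toString());
// /handle-hand-raise – process their response (separate handler, with its own VoiceResponse)
const { Digits, CallSid } = req.body;
if (Digits === '*1') {
  registerHandRaise(CallSid); // hypothetical helper: record the hand-raise
}
// Either way, return the caller to the conference, muted
const dial = twiml.dial();
dial.conference({ muted: true }, 'myConference'); // attributes first, then the room name
res.type('text/xml').send(twiml.toString());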
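The backend side of this flow, the call redirect itself, could be wrapped as below. This is a sketch under assumptions: `promptHandRaise` is a hypothetical helper, and `client` is expected to be the object returned by `require('twilio')(accountSid, authToken)`. It is passed in as a parameter so the function can be exercised with a stub.

```javascript
// Redirect a live call to the gather prompt.
// Twilio then fetches new TwiML from gatherUrl, which takes the caller
// out of the conference until a later <Dial><Conference> puts them back.
function promptHandRaise(client, callSid, gatherUrl) {
  // Call SIDs are "CA" followed by 32 hexadecimal characters.
  if (!/^CA[0-9a-f]{32}$/i.test(callSid)) {
    throw new Error(`Not a Call SID: ${callSid}`);
  }
  // Returns the promise from the SDK so the caller can await or catch it.
  return client.calls(callSid).update({ url: gatherUrl, method: 'POST' });
}
```

A dashboard route would call this with the participant's Call SID and the URL of the gather prompt page, then surface any rejected promise to the host.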
ℹ️ Trade‑off of Approach B
- The participant is briefly disconnected from conference audio (~3‑5 s) while the gather prompt plays.
- This is host‑initiated: the participant cannot trigger it themselves from inside the conference.
Approach B is a usable POC solution, but the production path should upgrade to Media Streams (Section 03) for true participant‑initiated hand‑raises.
03 Alternative 1 – Media Streams + Goertzel (Production‑Grade)
Goal: Detect DTMF tones in real‑time without ever pulling the participant out of the conference.
How It Works
- Twilio Media Streams sends a raw audio stream (8 kHz µ‑law) from each participant’s call to a WebSocket endpoint on your server.
- Your server runs a Goertzel‑based DTMF detector on the incoming audio.
- When a key is pressed, the tone is detected server‑side and a hand‑raise event is generated.
Participant presses *1
↓
Phone keypad → Twilio streams audio (8 kHz µ-law) via WebSocket
↓
Your WS server → Goertzel detector
↓
Hand‑raise event → Dashboard notified
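The detector step in this pipeline can be sketched in plain JavaScript. The code below decodes G.711 µ‑law bytes to linear PCM, measures energy at the eight standard DTMF frequencies with the Goertzel recurrence, and maps the strongest row/column pair to a key. The function names and the thresholds (`1e4`, `0.1`, `4`) are illustrative assumptions, not tuned values; a production detector would also check twist and tone duration.

```javascript
const SAMPLE_RATE = 8000; // Hz, matching the 8 kHz µ-law stream
const LOW = [697, 770, 852, 941];     // DTMF row frequencies
const HIGH = [1209, 1336, 1477, 1633]; // DTMF column frequencies
const KEYS = [
  ['1', '2', '3', 'A'],
  ['4', '5', '6', 'B'],
  ['7', '8', '9', 'C'],
  ['*', '0', '#', 'D'],
];

// G.711 µ-law byte -> 16-bit linear PCM sample.
function muLawToPcm(byte) {
  const u = ~byte & 0xff;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return u & 0x80 ? -magnitude : magnitude;
}

// Decode a Buffer (or any byte array) of µ-law audio.
const decodeMuLaw = (bytes) => Array.from(bytes, (b) => muLawToPcm(b));

// Squared magnitude of `samples` at `freq`, via the Goertzel recurrence.
function goertzelPower(samples, freq) {
  const coeff = 2 * Math.cos((2 * Math.PI * freq) / SAMPLE_RATE);
  let s1 = 0;
  let s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Returns the DTMF key present in one frame of PCM samples, or null.
function detectDtmf(samples) {
  const n = samples.length;
  let energy = 0;
  for (const x of samples) energy += x * x;
  if (n === 0 || energy / n < 1e4) return null; // treat quiet frames as silence

  const strongest = (freqs) => {
    // Normalised so a clean dual tone gives roughly 0.25 per component.
    const powers = freqs.map((f) => goertzelPower(samples, f) / (n * energy));
    const best = powers.indexOf(Math.max(...powers));
    const runnerUp = Math.max(...powers.filter((_, i) => i !== best));
    return powers[best] > 0.1 && powers[best] > 4 * runnerUp ? best : -1;
  };

  const row = strongest(LOW);
  const col = strongest(HIGH);
  return row >= 0 && col >= 0 ? KEYS[row][col] : null;
}
```

With a Media Streams payload, a frame would be processed as `detectDtmf(decodeMuLaw(Buffer.from(payload, 'base64')))`. Short frames give coarse frequency resolution, so buffering two or more frames before detection may improve reliability.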
Sample WebSocket Server (Node.js)
const http = require('http');
const WebSocket = require('ws');
// Plain HTTP server for the WebSocket upgrade; reuse your existing server if you have one
const server = http.createServer();
const mediaWss = new WebSocket.Server({ path: '/media-stream', server });
mediaWss.on('connection', (ws) => {
  let callSid = null;
  // Hypothetical factory: one Goertzel detector per stream, so state is never shared between calls
  const detector = createDtmfDetector();
  ws.on('message', (msg) => {
    const data = JSON.parse(msg);
    if (data.event === 'start') {
      callSid = data.start.callSid; // map stream → Call SID
    }
    if (data.event === 'media') {
      // Decode base64 µ-law audio payload
      const audio = Buffer.from(data.media.payload, 'base64');
      // Run Goertzel DTMF detector on this audio chunk
      const digit = detector.detect(audio);
      if (digit) {
        handleDTMFDigit(callSid, digit);
      }
    }
    if (data.event === 'stop') {
      detector.reset();
    }
  });
});
async function handleDTMFDigit(callSid, digit) {
  // Your business logic – e.g., flag hand‑raise in DB, push to dashboard, etc.
}
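For the server above to receive any audio, the TwiML that places the caller in the conference also has to start a stream. A hedged sketch that builds that TwiML as a string: `buildStreamedConferenceTwiml` and `escapeXml` are hypothetical helpers, the URL and room name are placeholders, and it is worth verifying in your own account that the inbound track keeps streaming while the participant is muted in the conference.

```javascript
// Escape text for safe use in XML attributes and element bodies.
function escapeXml(value) {
  const entities = { '<': '&lt;', '>': '&gt;', '&': '&amp;', '"': '&quot;', "'": '&apos;' };
  return String(value).replace(/[<>&"']/g, (c) => entities[c]);
}

// <Start><Stream> forks the caller's audio to the WebSocket server,
// then <Dial><Conference> joins them to the room muted.
function buildStreamedConferenceTwiml({ streamUrl, room }) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Start>',
    `    <Stream url="${escapeXml(streamUrl)}" track="inbound_track" />`,
    '  </Start>',
    '  <Dial>',
    `    <Conference muted="true">${escapeXml(room)}</Conference>`,
    '  </Dial>',
    '</Response>',
  ].join('\n');
}
```

The same document can be produced with the Twilio helper library; building the string by hand here just keeps the sketch free of dependencies.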
DTMF‑Detection Libraries
| Library | Language | Notes |
|---|---|---|
| node-dtmf | Node.js | Simple Goertzel implementation for µ‑law audio |
| goertzel-js | Node.js | Low‑level Goertzel filter; you must map DTMF frequencies |
| dtmf-decoder | Python | Good if your backend is Python/FastAPI |
| librosa + custom | Python | Overkill but very accurate |
Pros & Cons
| ✅ Pros | ❌ Cons |
|---|---|
| Participant stays in conference – zero audio interruption | Requires a WebSocket server to receive audio |
| True participant‑initiated hand‑raises at any time | Need a DTMF detection library + Goertzel implementation |
| No additional Twilio cost – Media Streams included in Voice | Slightly higher engineering effort |
| Works with any keypad digit combination you define | |
04 Alternative 2 – Twilio Flex / TaskRouter
(Details omitted for brevity – see original document for full description.)
05 Alternative 3 – Twilio Sync State
(Details omitted for brevity – see original document for full description.)
06 Alternative 4 – Conference Hold + Gather
(Details omitted for brevity – see original document for full description.)
07 Alternative 5 – Dual‑Channel (Phone + Web)
(Details omitted for brevity – see original document for full description.)
08 Decision Matrix & Recommendation
| Architecture | Audio Interruption | Participant‑Initiated | Implementation Effort | Cost |
|---|---|---|---|---|
| Gather‑Before‑Conference | ✅ None | ❌ No (pre‑join only) | Low | Free |
| Host‑Initiated Redirect | ⚠️ Brief pause | ❌ No (host‑only) | Medium | Free |
| Media Streams + Goertzel | ✅ None | ✅ Yes | High (WebSocket + DSP) | Free (Voice‑only) |
| Flex / TaskRouter | Varies | Varies | High | Paid (Flex) |
| Sync State | Varies | Varies | Medium | Paid (Sync) |
| Conference Hold + Gather | ⚠️ Pause | ❌ Host‑only | Medium | Free |
| Dual‑Channel | ✅ None | ✅ Yes (Web) | High | Paid (Web UI) |
Recommendation:
For a production‑grade, zero‑interruption, participant‑initiated hand‑raise experience, adopt Alternative 1 – Media Streams + Goertzel. Use the Gather‑Before‑Conference pattern only as a quick fallback for pre‑join hand‑raises.
Caveats of the Media Streams approach:
- Higher server resource usage (audio processing per participant)
- Multi‑digit debouncing logic needed (*1 arrives as two separate tones)
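The debouncing concern can be illustrated with a small per‑call state machine. A held key shows up in many consecutive audio frames, so the sketch below confirms a press once per run of identical detections and then matches the confirmed presses against the target sequence. `HandRaiseSequencer` and its defaults (`minFrames`, `maxGapMs`) are hypothetical and untuned.

```javascript
class HandRaiseSequencer {
  constructor({ sequence = '*1', minFrames = 2, maxGapMs = 2000 } = {}) {
    this.sequence = sequence;
    this.minFrames = minFrames; // frames a digit must persist to count as a press
    this.maxGapMs = maxGapMs;   // longest allowed pause between presses
    this.current = null;        // detection seen in the most recent frame
    this.run = 0;               // consecutive frames showing `current`
    this.buffer = '';           // confirmed presses so far
    this.lastPressAt = 0;
  }

  // Feed one detector result (digit or null) per frame.
  // Returns true on the frame that completes the sequence.
  push(digit, nowMs) {
    if (digit !== this.current) {
      this.current = digit;
      this.run = 0;
    }
    this.run += 1;
    // Confirm a press exactly once, when the run first reaches minFrames.
    if (digit === null || this.run !== this.minFrames) return false;

    if (nowMs - this.lastPressAt > this.maxGapMs) this.buffer = '';
    this.lastPressAt = nowMs;
    this.buffer = (this.buffer + digit).slice(-this.sequence.length);
    if (this.buffer === this.sequence) {
      this.buffer = '';
      return true;
    }
    return false;
  }
}
```

One instance per call would sit between the DTMF detector and the hand‑raise handler. A single dropped frame in the middle of a press would register as two presses, so a short release hold‑off may be worth adding.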