Test

Published: March 8, 2026 at 06:04 PM EDT
6 min read
Source: Dev.to

DTMF Hand‑Raise System – Corrected technical research: feasibility issues with the original approach, verified working patterns, 5 alternative architectures, and an implementation‑recommendation matrix for the muted‑conference hand‑raise use case.

Participants: 5–7 callers + 1 host
Core Issue: <Gather> cannot wrap <Dial><Conference>
Recommended Production Path: Media Streams + DTMF Detection
System Feasibility: ✓ Still Fully Feasible

01 ❌ Root Problem

Twilio’s TwiML specification does NOT allow DTMF detection while a caller is inside a <Conference>. Any keypad digits a participant presses are passed to the conference room as audio tones, and no server‑side webhook fires. There is no native event for this.

The original document proposed nesting <Dial><Conference> inside <Gather>. This is structurally invalid, because the TwiML schema enforces strict parent‑child rules:

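TwiML Verb | Valid Children | Notes
<Gather> | <Say>, <Play>, <Pause> | Collects DTMF or speech input
<Dial> | <Number>, <Client>, <Conference>, <Queue>, <Sip> | <Conference> is a noun inside <Dial>
<Conference> | No children | Cannot be inside <Gather>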

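The Twilio Node.js helper library mirrors this schema: only the object returned by dial() exposes .conference(), so calling .conference() on a Gather or VoiceResponse object throws immediately:

const twiml = new (require('twilio').twiml.VoiceResponse)();
twiml.conference('myConference'); // TypeError: twiml.conference is not a function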
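The same rule can be checked before any TwiML is sent. Below is a minimal, dependency‑free sketch that walks a proposed verb tree and reports illegal parent‑child pairs. ALLOWED_CHILDREN and findNestingErrors are hypothetical names, and the map only covers the verbs discussed in this article, not Twilio's full schema.

```javascript
// Allowed children for the verbs discussed here (an illustration, not the full TwiML schema).
const ALLOWED_CHILDREN = {
  Response: ['Gather', 'Dial', 'Say', 'Play', 'Pause', 'Redirect', 'Start'],
  Gather: ['Say', 'Play', 'Pause'],
  Dial: ['Number', 'Client', 'Conference', 'Queue', 'Sip'],
  Start: ['Stream'],
  Conference: [],
};

// Walk a { name, children } tree and collect every illegal parent → child pair.
function findNestingErrors(node, errors = []) {
  const allowed = ALLOWED_CHILDREN[node.name] || [];
  for (const child of node.children || []) {
    if (!allowed.includes(child.name)) {
      errors.push(`<${child.name}> cannot be nested inside <${node.name}>`);
    }
    findNestingErrors(child, errors);
  }
  return errors;
}

// The originally proposed structure: Gather wrapping Dial/Conference.
const proposed = {
  name: 'Response',
  children: [{ name: 'Gather', children: [{ name: 'Dial', children: [{ name: 'Conference' }] }] }],
};

// A valid alternative: Gather first, then Dial/Conference as a sibling.
const valid = {
  name: 'Response',
  children: [
    { name: 'Gather', children: [{ name: 'Say' }] },
    { name: 'Dial', children: [{ name: 'Conference' }] },
  ],
};

console.log(findNestingErrors(proposed)); // [ '<Dial> cannot be nested inside <Gather>' ]
console.log(findNestingErrors(valid)); // []
```

This is exactly why both POC approaches below place <Gather> before or outside the conference rather than around it.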

⚠️ statusCallbackEvent Correction

The statusCallbackEvent parameter does NOT include an “unmute” event. Valid values are:

start, end, join, leave, mute, hold, modify, speaker, announcement

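Subscribing to mute reports both mute and unmute actions. The Muted field in the webhook body tells you which occurred. Twilio posts form‑encoded parameters, so Muted arrives as the string 'true' or 'false', not a boolean.

// Illustrative route path; app is your Express instance
app.post('/conference-events', express.urlencoded({ extended: false }), (req, res) => {
  const { StatusCallbackEvent, Muted, CallSid } = req.body;
  // The mute subscription covers both directions; Muted says which one happened
  if (StatusCallbackEvent === 'participant-mute' || StatusCallbackEvent === 'participant-unmute') {
    const isMuted = Muted === 'true'; // form-encoded → string, not boolean
    console.log(`${CallSid} is now ${isMuted ? 'muted' : 'unmuted'}`);
  }
  res.sendStatus(200); // acknowledge every callback, not only mute events
});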

02 Proof‑of‑Concept (POC) – Two Approaches that Respect TwiML Constraints

Approach A – Gather‑Before‑Conference

  1. Participant gets a short <Gather> window before joining the conference.
  2. If they press *1 during this window, the hand‑raise is registered before entry.

Pros:

  • ✅ Zero audio interruption
  • ⚡ One‑time only

Approach B – REST API Call Redirect (Host‑Initiated)

  1. Host triggers a mid‑conference DTMF prompt via the dashboard.
  2. Backend calls twilioClient.calls(callSid).update({ url: gatherUrl }), temporarily pulling the participant out to collect a keypress, then returning them.

Pros:

  • ✅ Mid‑call capable
  • ⚡ Host‑initiated
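
// Approach A – incoming-call webhook: short pre-join Gather window
const { VoiceResponse } = require('twilio').twiml;
const twiml = new VoiceResponse();
const gather = twiml.gather({
  action: `${BASE_URL}/handle-pre-join`, // hypothetical keypress-handler path
  timeout: 5,
  numDigits: 2, // "*1" is two key presses
});
gather.say({ voice: 'Polly.Joanna' }, 'Press *1 to raise hand.');
// Fall-through: no key pressed → enter the conference muted
twiml.redirect(`${BASE_URL}/webhooks/conference`);
res.type('text/xml').send(twiml.toString());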

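// Handle pre-join keypress (the Gather action URL)
const { Digits, CallSid } = req.body;
if (Digits === '*1') {
  raiseHand(CallSid); // hypothetical helper: record the hand-raise
}
// Either way, enter the conference muted
const twiml = new VoiceResponse();
twiml.dial().conference({ muted: true }, 'myConference');
res.type('text/xml').send(twiml.toString());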

The dashboard shows a “Prompt Hand Raise” button per participant. When clicked, the backend redirects that caller’s active call to a gather TwiML page, collects the response, then returns them to the conference.

// /voice/gather-hand-raise – the gather prompt TwiML page
const twiml = new VoiceResponse();
const gather = twiml.gather({
  action: `${BASE_URL}/handle-hand-raise`,
  method: 'POST',
  timeout: 5,
  numDigits: 2, // "*1" is two key presses
});
gather.say('Press *1 to raise your hand.');
// No key pressed → send the caller straight back to the conference
twiml.redirect(`${BASE_URL}/webhooks/conference`);
res.type('text/xml').send(twiml.toString());

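// /handle-hand-raise – process their response
const { Digits, CallSid } = req.body;
if (Digits === '*1') {
  raiseHand(CallSid); // hypothetical helper: persist + notify the dashboard
}
// Either way, return the caller to the conference, still muted
const twiml = new VoiceResponse();
twiml.dial().conference({ muted: true }, 'myConference');
res.type('text/xml').send(twiml.toString());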
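Step 2 of Approach B relies on redirecting a live call. As a dependency‑free sketch, an equivalent of twilioClient.calls(callSid).update({ url: gatherUrl }) can be built by hand against Twilio's Calls REST endpoint: form‑encoded Url and Method parameters, authenticated with HTTP Basic auth using the Account SID and Auth Token. buildRedirectRequest and the example SIDs, token and URL are hypothetical placeholders; sending the request is left to fetch (Node 18+) or the official SDK.

```javascript
// Build the REST request that redirects a live call to new TwiML.
function buildRedirectRequest({ accountSid, authToken, callSid, gatherUrl }) {
  const url = `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Calls/${callSid}.json`;
  // Twilio expects form-encoded parameters; Url/Method tell it where to fetch the new TwiML.
  const body = new URLSearchParams({ Url: gatherUrl, Method: 'POST' }).toString();
  const auth = Buffer.from(`${accountSid}:${authToken}`).toString('base64');
  return {
    url,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Basic ${auth}`,
        'Content-Type': 'application/x-www-form-urlencoded',
      },
      body,
    },
  };
}

// Example with placeholder credentials; to send it: await fetch(redirect.url, redirect.options)
const redirect = buildRedirectRequest({
  accountSid: 'ACxxxxxxxx',
  authToken: 'secret',
  callSid: 'CAxxxxxxxx',
  gatherUrl: 'https://example.com/voice/gather-hand-raise',
});
console.log(redirect.url);
console.log(redirect.options.body);
```

The dashboard's "Prompt Hand Raise" button would call something like this with the selected participant's Call SID.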

ℹ️ Trade‑off of Pattern B

  • The participant is briefly disconnected from conference audio (~3‑5 s) while the gather prompt plays.
  • This is host‑initiated – the participant cannot trigger it themselves from inside the conference.

Pattern B is a usable POC solution, but the production path should upgrade to Media Streams (Section 03) for true participant‑initiated hand‑raises.

03 Alternative 1 – Media Streams + Goertzel (Production‑Grade)

Goal: Detect DTMF tones in real‑time without ever pulling the participant out of the conference.

How It Works

  1. Twilio Media Streams sends a raw audio stream (8 kHz µ‑law) from each participant’s call to a WebSocket endpoint on your server.
  2. Your server runs a Goertzel‑based DTMF detector on the incoming audio.
  3. When a key is pressed, the tone is detected server‑side and a hand‑raise event is generated.

Participant presses *1 → phone keypad tones → Twilio streams audio (8 kHz µ‑law) via WebSocket → your WS server → Goertzel detector → hand‑raise event → dashboard notified
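For this flow to start, each participant's call needs a stream forked before it joins the conference. A minimal sketch that emits that TwiML as a plain string, assuming a unidirectional <Start><Stream> fork (which leaves call control with the conference) and track="inbound_track" so that only the caller's own audio, where their keypad tones originate, is streamed. participantTwiml and xmlEscape are hypothetical helpers, the wss:// URL is a placeholder, and myConference is the conference name used in the earlier examples.

```javascript
// Escape characters that are special in XML attribute values and text.
function xmlEscape(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// TwiML for one participant: fork inbound audio to the WebSocket, then join muted.
function participantTwiml(streamUrl, conferenceName) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    `<Start><Stream url="${xmlEscape(streamUrl)}" track="inbound_track"/></Start>`,
    `<Dial><Conference muted="true">${xmlEscape(conferenceName)}</Conference></Dial>`,
    '</Response>',
  ].join('');
}

console.log(participantTwiml('wss://example.com/media-stream', 'myConference'));
```

Ordering matters here: <Start> returns immediately and the stream keeps running while <Dial><Conference> holds the call.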

Sample WebSocket Server (Node.js)

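const http = require('http');
const WebSocket = require('ws');

// Reuse your existing HTTP/Express server here if you have one.
const server = http.createServer();
const mediaWss = new WebSocket.Server({ path: '/media-stream', server });

mediaWss.on('connection', (ws) => {
  let callSid = null;
  // One detector per stream: Goertzel state must never be shared between callers.
  // createDtmfDetector() is a placeholder for your DTMF library or implementation.
  const detector = createDtmfDetector();

  ws.on('message', (msg) => {
    const data = JSON.parse(msg);

    if (data.event === 'start') {
      callSid = data.start.callSid; // map stream → Call SID
    }

    if (data.event === 'media') {
      // Base64 payload → raw 8-bit µ-law bytes; the detector must decode them to PCM
      const audio = Buffer.from(data.media.payload, 'base64');

      // Run the Goertzel DTMF detector on this audio chunk
      const digit = detector.detect(audio);

      if (digit) {
        handleDTMFDigit(callSid, digit);
      }
    }

    if (data.event === 'stop') {
      detector.reset();
    }
  });
});

server.listen(8080); // example port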

async function handleDTMFDigit(callSid, digit) {
  // Your business logic – e.g., flag hand‑raise in DB, push to dashboard, etc.
}
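The detector.detect() call above is left abstract. One possible implementation, sketched without any library: decode each G.711 µ‑law byte to 16‑bit PCM, run the Goertzel algorithm at the eight DTMF frequencies, and report the key whose strongest row tone and strongest column tone both carry a large share of the frame's energy. The function names and both thresholds (MIN_RELATIVE_POWER, MIN_RMS) are untuned assumptions, and production detectors usually add twist and harmonic checks that this sketch omits.

```javascript
const SAMPLE_RATE = 8000; // Media Streams audio: 8 kHz mono µ-law
const ROW_FREQS = [697, 770, 852, 941];
const COL_FREQS = [1209, 1336, 1477, 1633];
const KEYS = [
  ['1', '2', '3', 'A'],
  ['4', '5', '6', 'B'],
  ['7', '8', '9', 'C'],
  ['*', '0', '#', 'D'],
];
// Untuned assumptions: a clean two-tone digit scores about 0.5 per tone.
const MIN_RELATIVE_POWER = 0.15;
const MIN_RMS = 100; // on the 16-bit PCM scale; skips near-silent frames

// G.711 µ-law byte → 16-bit linear PCM sample.
function muLawToLinear(byte) {
  const u = ~byte & 0xff;
  let t = ((u & 0x0f) << 3) + 0x84;
  t <<= (u & 0x70) >> 4;
  return u & 0x80 ? 0x84 - t : t - 0x84;
}

// Goertzel algorithm: signal power at one target frequency.
function goertzelPower(samples, freq) {
  const coeff = 2 * Math.cos((2 * Math.PI * freq) / SAMPLE_RATE);
  let s1 = 0;
  let s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Index and value of the largest entry.
function argMax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) if (values[i] > values[best]) best = i;
  return [best, values[best]];
}

// Detect a DTMF key in one frame of linear PCM samples; returns the key or null.
function detectDtmfPcm(samples) {
  const n = samples.length;
  let energy = 0;
  for (const x of samples) energy += x * x;
  if (n === 0 || Math.sqrt(energy / n) < MIN_RMS) return null;

  // Normalise so that a lone on-frequency sinusoid scores 1.0.
  const relative = (f) => (2 * goertzelPower(samples, f)) / (n * energy);
  const [row, rowPower] = argMax(ROW_FREQS.map(relative));
  const [col, colPower] = argMax(COL_FREQS.map(relative));

  if (rowPower < MIN_RELATIVE_POWER || colPower < MIN_RELATIVE_POWER) return null;
  return KEYS[row][col];
}

// Entry point for a Media Streams payload already base64-decoded into a Buffer.
function detectDtmfMuLaw(buffer) {
  return detectDtmfPcm(Array.from(buffer, muLawToLinear));
}
```

In the WebSocket handler, detectDtmfMuLaw(audio) could stand in for detector.detect(audio). It judges each frame in isolation, so a single key press is still reported on several consecutive frames and needs debouncing.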

DTMF‑Detection Libraries

Library | Language | Notes
node-dtmf | Node.js | Simple Goertzel implementation for µ‑law audio
goertzel-js | Node.js | Low‑level Goertzel filter; you must map DTMF frequencies
dtmf-decoder | Python | Good if your backend is Python/FastAPI
librosa + custom | Python | Overkill but very accurate

Pros & Cons

✅ Pros | ❌ Cons
Participant stays in conference – zero audio interruption | Requires a WebSocket server to receive audio
True participant‑initiated hand‑raises at any time | Needs a DTMF detection library or your own Goertzel implementation
No additional Twilio cost – Media Streams included in Voice | Slightly higher engineering effort
Works with any keypad digit combination you define |

04 Alternative 2 – Twilio Flex / TaskRouter

(Details omitted for brevity – see original document for full description.)

05 Alternative 3 – Twilio Sync State

(Details omitted for brevity – see original document for full description.)

06 Alternative 4 – Conference Hold + Gather

(Details omitted for brevity – see original document for full description.)

07 Alternative 5 – Dual‑Channel (Phone + Web)

(Details omitted for brevity – see original document for full description.)

08 Decision Matrix & Recommendation

Architecture | Audio Interruption | Participant‑Initiated | Implementation Effort | Cost
Gather‑Before‑Conference | ✅ None | ❌ No (pre‑join only) | Low | Free
Host‑Initiated Redirect | ⚠️ Brief pause | ❌ No (host‑only) | Medium | Free
Media Streams + Goertzel | ✅ None | ✅ Yes | High (WebSocket + DSP) | Free (Voice‑only)
Flex / TaskRouter | Varies | Varies | High | Paid (Flex)
Sync State | Varies | Varies | Medium | Paid (Sync)
Conference Hold + Gather | ⚠️ Pause | ❌ Host‑only | Medium | Free
Dual‑Channel | ✅ None | ✅ Yes (Web) | High | Paid (Web UI)

Recommendation:
For a production‑grade, zero‑interruption, participant‑initiated hand‑raise experience, adopt Alternative 1 – Media Streams + Goertzel. Use the Gather‑Before‑Conference pattern only as a quick fallback for pre‑join hand‑raises.

ℹ️ Trade‑offs of Media Streams + Goertzel

  • Higher server resource usage (audio processing per participant)
  • Multi‑digit debouncing logic needed (*1 = two signals)
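The debouncing point can be made concrete: a detector reports the same digit on every frame the tone lasts, and *1 arrives as two separate tones. Below is a minimal sketch, assuming one detection result (a digit or null) per audio frame, that collapses each run of identical frames into a single key press and fires a callback when '*' is followed by '1'. createHandRaiseMatcher, MIN_FRAMES and SEQUENCE_TIMEOUT_MS are hypothetical, untuned choices.

```javascript
const MIN_FRAMES = 2; // a tone must persist for at least 2 consecutive frames
const SEQUENCE_TIMEOUT_MS = 2000; // max gap between '*' and '1'

function createHandRaiseMatcher(onHandRaise) {
  let current = null; // digit seen in the previous frame
  let runLength = 0; // consecutive frames carrying that digit
  let emitted = false; // whether this run was already reported as a key press
  let pendingStarAt = null; // timestamp of the last confirmed '*'

  // Handles one confirmed key press and tracks the '*' → '1' sequence.
  function onKeyPress(digit, timestampMs) {
    if (digit === '*') {
      pendingStarAt = timestampMs;
    } else if (
      digit === '1' &&
      pendingStarAt !== null &&
      timestampMs - pendingStarAt <= SEQUENCE_TIMEOUT_MS
    ) {
      pendingStarAt = null;
      onHandRaise(timestampMs);
    } else {
      pendingStarAt = null; // any other key, or a late '1', breaks the sequence
    }
  }

  // Call once per audio frame with the detector result (or null) and its timestamp.
  return function pushFrame(digit, timestampMs) {
    if (digit !== current) {
      current = digit;
      runLength = 0;
      emitted = false;
    }
    if (digit === null) return;
    runLength++;
    if (!emitted && runLength >= MIN_FRAMES) {
      emitted = true;
      onKeyPress(digit, timestampMs);
    }
  };
}
```

One matcher per WebSocket connection would be fed each frame's detection together with a timestamp, for example Number(data.media.timestamp), which Media Streams reports in milliseconds since the stream started. The onHandRaise callback is where handleDTMFDigit‑style business logic would run.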