How to Enable DTMF Events in a Telephony AI Agent

Published: 3 weeks ago (January 5, 2026, 16:49 GMT+8)

6 min read

Source: Dev.to


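Overview

Not every caller wants to talk to a voice agent. In many call scenarios, users expect to press keys to make a selection, confirm an action, or move through the call flow. This is especially common in menu-based systems, for short responses, and wherever speech recognition may be unreliable.

DTMF (Dual-Tone Multi-Frequency) input gives a voice agent a clear, predictable way to handle these interactions. When a caller presses a key on the phone, the agent receives that input immediately and can use it to control the call flow or trigger application logic.

In this article, we look at how to use DTMF events in a VideoSDK-based voice agent, starting with common interaction patterns and then walking through how the system handles keypad input in real time.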

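How It Works

DTMF event detection – The agent detects the caller's key presses (0–9, *, #) during the call session.
Real-time processing – Each key press generates a DTMF event that is delivered to the agent immediately.
Callback integration – A user-defined callback function handles incoming DTMF events.
Action execution – The agent performs an action or triggers a workflow based on the DTMF input it receives (for example, building an IVR flow, collecting user input, or calling application logic).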
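
The four stages above can be sketched without any SDK at all. The following is a minimal simulation, not VideoSDK code: `FakeDTMFDispatcher`, `simulate_call`, and the `route:...` action strings are hypothetical names invented for illustration, and keys are assumed to arrive as single-character strings.

```python
import asyncio
from typing import Awaitable, Callable


class FakeDTMFDispatcher:
    """Hypothetical stand-in for an SDK handler; it is NOT the VideoSDK class."""

    VALID_KEYS = set("0123456789*#")

    def __init__(self, callback: Callable[[str], Awaitable[None]]):
        self._callback = callback

    async def emit(self, key: str) -> bool:
        """Deliver one key press to the callback; drop anything that is not a keypad key."""
        if key not in self.VALID_KEYS:
            return False
        await self._callback(key)
        return True


async def simulate_call(keys: str) -> list[str]:
    """Feed a sequence of key presses through the dispatcher and record the actions taken."""
    actions: list[str] = []

    async def on_dtmf(key: str) -> None:
        # Action execution: map each key to a piece of application logic.
        if key == "1":
            actions.append("route:sales")
        elif key == "2":
            actions.append("route:support")
        else:
            actions.append(f"ignored:{key}")

    dispatcher = FakeDTMFDispatcher(on_dtmf)
    for key in keys:
        await dispatcher.emit(key)  # one event per key press
    return actions


if __name__ == "__main__":
    print(asyncio.run(simulate_call("1#2")))
```

Keeping the key-to-action mapping inside the callback means the dispatcher never needs to know what the keys mean, which mirrors the callback-based design described in the list.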

Step 1 – Enable DTMF Events

DTMF event detection can be enabled in two ways:

1️⃣ Via the Dashboard

When creating or editing a SIP gateway in the VideoSDK dashboard, check the DTMF option.


2️⃣ Via the API

When creating or updating a SIP gateway through the API, set the enableDtmf parameter to true.

curl -H 'Authorization: $YOUR_TOKEN' \
     -H 'Content-Type: application/json' \
     -d '{
           "name": "Twilio Inbound Gateway",
           "enableDtmf": true,
           "numbers": ["+0123456789"]
         }' \
     -X POST https://api.videosdk.live/v2/sip/inbound-gateways

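Once enabled, DTMF events are detected and published for every call routed through this gateway.

To set up inbound calls, outbound calls, and routing rules, see the Quick Start example.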
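
For readers who prefer to script the gateway setup, the same request can be assembled in Python. This is a sketch that only mirrors the curl command above using the standard library: the URL, headers, and JSON fields are copied from that command, while the function name `build_gateway_request` is made up for this example. It builds the request without sending it.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.videosdk.live/v2/sip/inbound-gateways"


def build_gateway_request(
    token: str, name: str, numbers: list[str], enable_dtmf: bool = True
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an inbound gateway."""
    payload = {"name": name, "enableDtmf": enable_dtmf, "numbers": numbers}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_gateway_request("YOUR_TOKEN", "Twilio Inbound Gateway", ["+0123456789"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it (requires a valid token):
    #     urllib.request.urlopen(req)
```

Separating request construction from sending makes it easy to inspect the payload, and in particular the `enableDtmf` flag, before anything reaches the API.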

Step 2 – Implementation

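from videosdk.agents import AgentSession, DTMFHandler, JobContext

async def entrypoint(ctx: JobContext):
    # `agent` is assumed to be the Agent instance created earlier in this
    # function, as in the complete example below.

    async def dtmf_callback(digit: int):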
        if digit == 1:
            agent.instructions = (
                "You are a Sales Representative. Your goal is to sell our products."
            )
            await agent.session.say(
                "Routing you to Sales. Hi, I'm from Sales. How can I help you today?"
            )
        elif digit == 2:
            agent.instructions = (
                "You are a Support Specialist. Your goal is to help customers with technical issues."
            )
            await agent.session.say(
                "Routing you to Support. Hi, I'm from Support. What issue are you facing?"
            )
        else:
            await agent.session.say(
                "Invalid input. Press 1 for Sales or 2 for Support."
            )

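    dtmf_handler = DTMFHandler(dtmf_callback)

    # Other AgentSession arguments (agent, pipeline, conversation_flow) are
    # omitted here; the complete example below shows them.
    session = AgentSession(
        dtmf_handler=dtmf_handler,
    )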
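
The callback above reacts to one key at a time. For the "collecting user input" case mentioned earlier, such as an account number, key presses usually need to be buffered until the caller signals they are done. Below is one possible sketch of that pattern. `DigitCollector` is a hypothetical helper, not part of VideoSDK, and treating `#` as "submit" and `*` as "start over" is a design choice made for this example rather than SDK behavior.

```python
from typing import Optional


class DigitCollector:
    """Buffers keypad presses until a terminator key or a maximum length is reached."""

    def __init__(self, terminator: str = "#", max_digits: int = 16):
        self.terminator = terminator
        self.max_digits = max_digits
        self._buffer: list[str] = []

    def feed(self, key) -> Optional[str]:
        """Add one key press. Returns the completed input, or None while still collecting."""
        key = str(key)  # accept 1 or "1"; the payload type is an assumption
        if key == self.terminator:
            return self._flush()
        if key == "*":
            self._buffer.clear()  # "*" restarts entry (example convention only)
            return None
        if len(key) == 1 and key in "0123456789":
            self._buffer.append(key)
            if len(self._buffer) >= self.max_digits:
                return self._flush()
        return None

    def _flush(self) -> str:
        """Return the buffered digits as one string and reset the buffer."""
        result = "".join(self._buffer)
        self._buffer.clear()
        return result


if __name__ == "__main__":
    collector = DigitCollector()
    for key in [4, 2, "*", 1, 2, 3, "#"]:
        done = collector.feed(key)
        if done is not None:
            print("Collected:", done)
```

Inside a DTMF callback, each incoming key would be passed to `collector.feed(...)`, and the agent would act only when a non-None value comes back.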

Complete Working Example

import logging
from videosdk.agents import (
    Agent,
    AgentSession,
    CascadingPipeline,
    ConversationFlow,
    DTMFHandler,
    JobContext,
    Options,
    RoomOptions,
    WorkerJob,
)
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)

# Download the turn-detector model ahead of time so it is available locally
pre_download_model()


class VoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant that can answer questions."
        )

    async def on_enter(self) -> None:
        await self.session.say("Hello, how can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")


async def entrypoint(ctx: JobContext):
    agent = VoiceAgent()
    conversation_flow = ConversationFlow(agent)

    pipeline = CascadingPipeline(
        stt=DeepgramSTT(),
        llm=OpenAILLM(),
        tts=ElevenLabsTTS(),
        vad=SileroVAD(),
        turn_detector=TurnDetector(),
    )

    async def dtmf_callback(message):
        print("DTMF message received:", message)

    dtmf_handler = DTMFHandler(dtmf_callback)

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow,
        dtmf_handler=dtmf_handler,
    )

    await session.start(wait_for_participant=True, run_until_shutdown=True)

def make_context() -> JobContext:
    room_options = RoomOptions(name="DTMF Agent Test", playground=True)
    return JobContext(room_options=room_options) 

if __name__ == "__main__":
    job = WorkerJob(
        entrypoint=entrypoint,
        jobctx=make_context,
        options=Options(
            agent_id="YOUR_AGENT_ID",
            max_processes=2,
            register=True,
            host="localhost",
            port=8081,
        ),
    )
    job.start()
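
The complete example prints the raw `message`, while the Step 2 snippet compares an integer `digit`. The article does not spell out the payload shape, so a defensive normalizer can help while you confirm it from your own logs. The sketch below is an assumption-laden illustration: `extract_key` is a hypothetical helper, and the `"digit"` dictionary field is a guess, not a documented VideoSDK field.

```python
from typing import Any, Optional

KEYPAD = set("0123456789*#")


def extract_key(message: Any) -> Optional[str]:
    """Return a single keypad character from a DTMF payload, or None if unrecognized.

    Handles three assumed shapes: an int (1), a str ("1"), or a dict
    with a hypothetical "digit" field ({"digit": "1"}).
    """
    if isinstance(message, bool):
        return None  # bool is a subclass of int; never treat True/False as keys
    if isinstance(message, int):
        candidate = str(message)
    elif isinstance(message, str):
        candidate = message.strip()
    elif isinstance(message, dict):
        value = message.get("digit")  # hypothetical field name
        return None if isinstance(value, dict) else extract_key(value)
    else:
        return None
    return candidate if candidate in KEYPAD else None


if __name__ == "__main__":
    for payload in [1, "#", {"digit": "2"}, 12, None]:
        print(repr(payload), "->", extract_key(payload))
```

Normalizing to a one-character string up front lets the rest of the callback compare against `"1"`, `"2"`, `"*"`, and `"#"` without caring how the event was delivered.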


Benefits of Enabling DTMF Detection

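Build predictable call flows
Guide users through menus
Trigger application logic without interrupting the call experience

Combined with voice input, DTMF gives you finer control over how users interact with the agent. That makes it a practical addition to any voice agent that needs explicit, deterministic user input during a call.

Resources and Next Steps

Explore dtmf-event-implementation-example for the complete code implementation.
To set up inbound calls, outbound calls, and routing rules, see the Quick Start example.
Learn how to deploy your AI agent.
Explore more features in the VideoSDK documentation.

👉 Share your thoughts, roadblocks, or success stories in the comments, or join our Discord community. We're excited to hear about your journey and to help you build better AI-powered communication tools!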
