Real-Time Streaming with VAPI: Building Voice Applications

Published: December 11, 2025, 06:26 GMT+8
6 min read

Source: Dev.to

TL;DR

Most voice applications break when network jitter exceeds 200 ms or a user interrupts mid-sentence. This guide shows how to build a production-grade streaming voice application with VAPI's WebRTC voice integration and Twilio call routing. You'll learn real-time audio processing, proper barge-in detection, and race-free session state management. The result: sub-500 ms response latency with graceful interruption handling.

API Access & Authentication

  • VAPI API key – obtain from dashboard.vapi.ai
  • Twilio Account SID & Auth Token – available in the Twilio Console
  • Twilio phone number – a voice-enabled number

Development Environment

  • Node.js 18+ (native fetch is required for the streaming API calls)
  • A public HTTPS endpoint for webhooks (e.g., ngrok, Railway, or a production domain)
  • A valid SSL certificate (mandatory for WebRTC voice integration)

Network Requirements

  • Outbound HTTPS (port 443) for VAPI/Twilio API calls
  • An inbound webhook receiver that responds within the 5 s timeout
  • WebSocket support for real-time voice streaming connections

Technical Knowledge

  • Async/await patterns (streaming audio processing is non-blocking)
  • Webhook signature validation (security is non-negotiable)
  • Basic PCM audio format (16 kHz, 16-bit) used by voice applications – frame math sketched below
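
For reference, a quick frame-math sketch for that PCM format (assuming mono audio and a typical 20 ms frame):

// 16 kHz, 16-bit mono PCM frame math
const sampleRate = 16000;                                // samples per second
const bytesPerSample = 2;                                // 16-bit = 2 bytes
const frameMs = 20;                                      // common streaming frame duration
const samplesPerFrame = (sampleRate * frameMs) / 1000;   // 320 samples
const bytesPerFrame = samplesPerFrame * bytesPerSample;  // 640 bytes per frame
console.log(`One ${frameMs} ms frame = ${bytesPerFrame} bytes`); // ~32 KB/s sustained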

Cost Awareness

  • VAPI bills per minute of voice streaming
  • Twilio bills per call plus per-minute usage for interactive voice response (IVR) systems

Streaming Implementation Details

Most streaming implementations fail because they treat VAPI like a traditional REST API. VAPI requires a stateful WebSocket connection carrying a bidirectional audio stream.

// Server‑side assistant configuration – production grade
const assistantConfig = {
  model: {
    provider: "openai",
    model: "gpt-4",
    temperature: 0.7,
    messages: [
      {
        role: "system",
        content: "You are a voice assistant. Keep responses under 2 sentences."
      }
    ]
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM",
    stability: 0.5,
    similarityBoost: 0.75
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2",
    language: "en-US"
  },
  firstMessage: "How can I help you today?",
  endCallMessage: "Thanks for calling. Goodbye.",
  recordingEnabled: true
};

Note: The transcriber config is critical. Default models add 200‑400 ms latency; Deepgram’s nova-2 reduces this to 80‑120 ms at a higher cost.
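
Once the config is defined, it can be registered server-side. A minimal sketch, assuming VAPI's POST /assistant endpoint with Bearer authentication (verify the endpoint and payload shape against the current VAPI docs):

const response = await fetch("https://api.vapi.ai/assistant", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`, // private server key, not the public key
    "Content-Type": "application/json"
  },
  body: JSON.stringify(assistantConfig)
});

const assistant = await response.json();
console.log("Assistant created:", assistant.id); // persist this ID for later calls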

Architecture diagram

flowchart LR
    A[User Browser] -->|WebSocket| B[VAPI SDK]
    B -->|Audio Stream| C[VAPI Platform]
    C -->|STT| D[Deepgram]
    C -->|LLM| E[OpenAI]
    C -->|TTS| F[ElevenLabs]
    C -->|Events| G[Your Webhook Server]
    G -->|Function Results| C

Audio flows entirely inside the VAPI platform and never passes through your backend; only events reach your webhook server. Proxying audio through your own server adds 500 ms+ of latency and breaks streaming.

Client‑Side Setup

import Vapi from "@vapi-ai/web";

const vapi = new Vapi(process.env.VAPI_PUBLIC_KEY);

// Shared guard flag used by the handlers below to prevent duplicate LLM requests
let isProcessing = false;

// Set up event handlers **before** starting the stream
vapi.on("call-start", () => {
  console.log("Stream active");
  isProcessing = false; // reset race‑condition guard
});

vapi.on("speech-start", () => {
  console.log("User speaking – cancel any queued TTS");
});

vapi.on("message", (message) => {
  if (message.type === "transcript" && message.transcriptType === "partial") {
    // Show live transcription – do NOT act on it yet
    updateUI(message.transcript); // updateUI: your app-specific render function
  }
});

vapi.on("error", (error) => {
  console.error("Stream error:", error);
  // Implement retry logic for mobile network drops
});

// Start the streaming call
await vapi.start(assistantConfig);

Race‑condition warning: act only on transcriptType === "final" to avoid duplicate LLM requests.
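
A minimal sketch of that guard, reusing the isProcessing flag reset in the call-start handler (handleFinalTranscript is a hypothetical, app-specific function):

vapi.on("message", async (message) => {
  if (message.type !== "transcript" || message.transcriptType !== "final") return;
  if (isProcessing) return;            // a request is already in flight – drop the duplicate
  isProcessing = true;
  try {
    await handleFinalTranscript(message.transcript); // hypothetical app logic
  } finally {
    isProcessing = false;              // always release the guard
  }
});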

Server‑Side Webhook

const express = require('express');
const crypto = require('crypto');
const app = express();

app.use(express.json());

// Validate webhook signature – mandatory.
// Caveat: hashing a re-serialized body only works if serialization is
// byte-exact; in production, HMAC the raw request body (e.g. via express.raw).
function validateSignature(req) {
  const signature = req.headers['x-vapi-signature'];
  const payload = JSON.stringify(req.body);
  const hash = crypto
    .createHmac('sha256', process.env.VAPI_SERVER_SECRET)
    .update(payload)
    .digest('hex');
  return signature === hash;
}

app.post('/webhook/vapi', async (req, res) => {
  if (!validateSignature(req)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  const { message } = req.body;

  // Handle function calls from the assistant
  if (message.type === 'function-call') {
    const { functionCall } = message;
    try {
      // **Timeout trap:** VAPI expects a response within 5 seconds.
      // If a function needs more time, return immediately and use a callback
      // mechanism (see the sketch after this block); otherwise the call drops.
      const result = await handleFunctionCall(functionCall); // your app-specific dispatcher
      return res.status(200).json({ result });
    } catch (err) {
      console.error('Function call error:', err);
      return res.status(500).json({ error: 'Function call failed' });
    }
  }

  res.status(200).json({ received: true });
});
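
For the slow-function case flagged above, a generic "acknowledge fast, finish later" sketch; slowLookup and deliverResultToCaller are hypothetical helpers, and the follow-up delivery channel is application-specific rather than a documented VAPI API:

// Inside the function-call branch, when the work cannot finish in 5 seconds:
if (message.type === 'function-call') {
  // Acknowledge inside the 5 s window with a holding response...
  res.status(200).json({ result: 'Working on it – one moment.' });

  // ...then run the slow work outside the request/response cycle.
  setImmediate(async () => {
    try {
      const data = await slowLookup(message.functionCall.parameters); // hypothetical
      await deliverResultToCaller(message.call.id, data);             // hypothetical
    } catch (err) {
      console.error('Deferred function work failed:', err);
    }
  });
  return;
}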

Audio Processing Pipeline

graph LR
    Mic[Microphone Input] --> AudioBuf[Audio Buffering]
    AudioBuf --> VAD[Voice Activity Detection]
    VAD -->|Detected| STT[Speech‑to‑Text]
    VAD -->|Not Detected| Error[Error Handling]
    STT --> NLU[Intent Recognition]
    NLU --> API[API Integration]
    API --> LLM[Response Generation]
    LLM --> TTS[Text‑to‑Speech]
    TTS --> Speaker[Speaker Output]
    Error -->|Retry| AudioBuf
    Error -->|Fail| Speaker

Local Testing

Run your server and expose it

# Terminal 1 – start the webhook server
node server.js

# Terminal 2 – expose via ngrok
ngrok http 3000

# Terminal 3 – forward VAPI webhooks to the public URL
vapi webhooks forward https://<your-subdomain>.ngrok.io/webhook/vapi

Add debug logging to the webhook

app.post('/webhook', (req, res) => {
  const { message } = req.body;

  console.log('Event received:', {
    type: message.type,
    timestamp: new Date().toISOString(),
    callId: message.call?.id,
    payload: JSON.stringify(message, null, 2)
  });

  // Signature check happens after logging so rejected events are still visible
  const isValid = validateSignature(req);
  if (!isValid) {
    console.error('Invalid signature – potential security issue');
    return res.status(401).json({ error: 'Invalid signature' });
  }

  res.status(200).json({ received: true });
});

Verify signature validation with curl

# Expected to fail with 401 Unauthorized
curl -X POST http://localhost:3000/webhook \
  -H "Content-Type: application/json" \
  -H "x-vapi-signature: invalid_signature" \
  -d '{"message":{"type":"status-update"}}'

Monitor response times to stay within the 5 s webhook timeout, and log every signature-validation failure; these usually indicate a configuration mismatch or a replay attack.
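
A minimal Express sketch of that response-time monitoring (the 4 s warning threshold is an arbitrary margin under the 5 s timeout):

// Register before the webhook routes so every request is timed
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const elapsed = Date.now() - start;
    if (elapsed > 4000) {
      console.warn(`Slow webhook response: ${elapsed} ms on ${req.path}`);
    }
  });
  next();
});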

With these patterns in place, you can deploy a robust, low-latency streaming voice application that handles interruptions and network jitter gracefully.
