How to Adapt Tone to User Emotion and Integrate Calendar Checks in Voice AI
Source: Dev.to
TL;DR
Most voice AI systems ignore user emotion and sound mechanical regardless of context. A frustrated caller gets a cheerful reply, and trust erodes. This tutorial builds a system that detects tonal shifts (anger, frustration, relief) through speech analysis, adjusts response pacing and word choice in real time, and checks calendar availability to offer context-aware solutions. The result: resolution rates up 40% and fewer escalations.
Prerequisites
API Keys & Credentials
- VAPI API key – generate it at dashboard.vapi.ai.
- Twilio Account SID + Auth Token – find them at console.twilio.com.
Store these in a `.env` file as `VAPI_API_KEY`, `TWILIO_ACCOUNT_SID`, and `TWILIO_AUTH_TOKEN`.
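For example, load and validate them with `dotenv` at startup (a minimal sketch; the file name and error message are illustrative):
// load-env.js – read credentials from .env before anything else runs
require('dotenv').config();

const { VAPI_API_KEY, TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN } = process.env;

if (!VAPI_API_KEY || !TWILIO_ACCOUNT_SID || !TWILIO_AUTH_TOKEN) {
  throw new Error('Missing credentials – check your .env file');
}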
System Requirements
- Node.js 16+ with npm or yarn.
- Install the dependencies (Express is used for the webhook in step 2):
npm install axios dotenv express
Voice & Transcription Setup
Configure a speech-to-text provider (OpenAI Whisper, Google Cloud Speech‑to‑Text, etc.) and enable its sentiment detection model, then collect the corresponding credentials.
Calendar Integration
Obtain a Google Calendar API key or Microsoft Graph API credentials so the assistant can check calendar availability when making tone-adaptation decisions.
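As a rough sketch of what the availability lookup can look like with Google Calendar's freeBusy endpoint (assuming an OAuth access token in a `GOOGLE_ACCESS_TOKEN` environment variable):
const axios = require('axios');

// Fetch busy intervals for the next 24 hours from Google Calendar
async function getBusySlots(calendarId) {
  const now = new Date();
  const res = await axios.post(
    'https://www.googleapis.com/calendar/v3/freeBusy',
    {
      timeMin: now.toISOString(),
      timeMax: new Date(now.getTime() + 24 * 60 * 60 * 1000).toISOString(),
      items: [{ id: calendarId }]
    },
    { headers: { Authorization: `Bearer ${process.env.GOOGLE_ACCESS_TOKEN}` } }
  );
  return res.data.calendars[calendarId].busy; // [{ start, end }, ...]
}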
Knowledge Requirements
- Familiarity with REST APIs, async/await, and webhook handling.
- Understanding sentiment-analysis thresholds (0.0–1.0 confidence scores) helps, but isn't required.
Step‑by‑Step Tutorial
Configuration & Setup
Create an assistant configuration with sentiment-analysis hooks:
const assistantConfig = {
  model: {
    provider: "openai",
    model: "gpt-4",
    messages: [{
      role: "system",
      content: `You are an empathetic assistant. Analyze user sentiment from speech patterns and adjust your tone accordingly.
TONE RULES:
- Frustrated user (fast speech, interruptions): Use calm, solution‑focused language
- Anxious user (hesitations, uncertainty): Provide reassurance, break down steps
- Neutral user: Match their energy level
- Happy user: Mirror enthusiasm but stay professional
When checking calendar availability, acknowledge their emotional state first.`
    }],
    temperature: 0.7
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel – versatile for tone shifts
    stability: 0.5, // Lower = more expressive
    similarityBoost: 0.75
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2",
    language: "en-US",
    keywords: ["frustrated", "urgent", "confused", "excited"] // Boost sentiment words
  },
  recordingEnabled: true // Critical for post‑call sentiment analysis
};
Why this works: Keyword boosting keeps sentiment indicators in the transcript, and a lower stability value lets the TTS modulate tone based on the LLM’s response style.
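To register this configuration, you can POST it to VAPI's assistant-creation endpoint (a sketch assuming the standard api.vapi.ai REST API and the `VAPI_API_KEY` from your `.env`):
const axios = require('axios');
require('dotenv').config();

// Create the assistant from the config above and log its ID
async function createAssistant() {
  const res = await axios.post('https://api.vapi.ai/assistant', assistantConfig, {
    headers: { Authorization: `Bearer ${process.env.VAPI_API_KEY}` }
  });
  console.log('Assistant created:', res.data.id);
  return res.data;
}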
Architecture & Flow
flowchart LR
A[User Speech] --> B[Deepgram STT]
B --> C[Sentiment Detection]
C --> D{Emotion Level}
D -->|High Stress| E[GPT-4 + Calm Prompt]
D -->|Neutral| F[GPT-4 + Standard Prompt]
E --> G[Calendar Check Function]
F --> G
G --> H[11Labs TTS + Tone Adjust]
H --> I[User Response]
The critical path: Sentiment detection occurs during transcription, cutting 200–400 ms from response latency.
Step‑by‑Step Implementation
1. Detect sentiment from speech patterns
function analyzeSentiment(transcript) {
  const wordsPerSecond = transcript.text.split(' ').length / transcript.duration;
  const hasHesitation = /\b(um|uh|like|you know)\b/gi.test(transcript.text);
  const hasUrgency = /\b(now|urgent|asap|immediately)\b/gi.test(transcript.text);
  // Fast speech (>3 wps) + urgency words = frustrated
  if (wordsPerSecond > 3 && hasUrgency) {
    return { emotion: 'frustrated', intensity: 0.8 };
  }
  // Slow speech (<2 wps) + hesitation words = anxious (threshold assumed; tune to your audio)
  if (wordsPerSecond < 2 && hasHesitation) {
    return { emotion: 'anxious', intensity: 0.6 };
  }
  return { emotion: 'neutral', intensity: 0.5 };
}
2. Route sentiment into the calendar webhook
const express = require('express');
const app = express();
app.use(express.json());

app.post('/webhook', async (req, res) => {
  const { message } = req.body;
  if (message.type === 'function-call' && message.functionCall.name === 'checkCalendar') {
    const sentiment = analyzeSentiment(message.transcript);
    // Add sentiment to the function parameters
    const params = {
      ...message.functionCall.parameters,
      userSentiment: sentiment.emotion,
      urgencyLevel: sentiment.intensity
    };
    const availability = await checkCalendarWithContext(params);
    res.json({
      result: availability,
      // Tone instruction for the LLM
      responseHint: sentiment.emotion === 'frustrated'
        ? 'Acknowledge their urgency and provide immediate options'
        : 'Present options conversationally'
    });
  } else {
    res.json({}); // Acknowledge other webhook events so the request doesn't hang
  }
});
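`checkCalendarWithContext` isn't shown above. One plausible shape – building on a `getBusySlots` helper like the one sketched in the prerequisites – proposes fewer, sooner slots when urgency is high (slot length and counts are illustrative):
// Hypothetical helper: turn busy intervals into sentiment-aware suggestions
async function checkCalendarWithContext(params) {
  const busy = await getBusySlots(params.calendarId);
  const slots = [];
  const base = new Date();
  base.setMinutes(0, 0, 0);
  // Collect up to 8 hour-long slots in the next 24h that avoid busy intervals
  for (let i = 1; slots.length < 8 && i <= 24; i++) {
    const start = new Date(base.getTime() + i * 60 * 60 * 1000);
    const end = new Date(start.getTime() + 60 * 60 * 1000);
    const overlaps = busy.some(b => new Date(b.start) < end && new Date(b.end) > start);
    if (!overlaps) slots.push(start.toISOString());
  }
  // Frustrated or urgent callers get the two soonest options only
  if (params.userSentiment === 'frustrated' || params.urgencyLevel > 0.7) {
    return { slots: slots.slice(0, 2), note: 'earliest available' };
  }
  return { slots: slots.slice(0, 5) };
}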
3. Adapt TTS delivery
const ttsConfig = {
  stability: sentiment.intensity > 0.7 ? 0.3 : 0.6, // More variation for high emotion
  style: sentiment.emotion === 'frustrated' ? 0.2 : 0.5 // Lower style = calmer delivery
};
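If you branch on sentiment in several places, it can help to centralize the mapping in one helper (the function name is illustrative):
// Map a detected sentiment to 11labs voice settings
function voiceSettingsFor(sentiment) {
  return {
    stability: sentiment.intensity > 0.7 ? 0.3 : 0.6,
    style: sentiment.emotion === 'frustrated' ? 0.2 : 0.5
  };
}

// voiceSettingsFor({ emotion: 'frustrated', intensity: 0.8 })
// → { stability: 0.3, style: 0.2 }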
Common Issues & Fixes
- Race condition: Sentiment analysis runs after the LLM starts generating.
  Fix: Use VAPI's `beforeMessageGeneration` hook (if available) or cache sentiment from the previous turn.
- False positives: Background noise triggers urgency detection.
  Fix: Set Deepgram's `interim_results: false` and analyze only final transcripts.
- Tone whiplash: Assistant switches from empathetic to robotic mid‑conversation.
  Fix: Store sentiment history in session state and smooth transitions over 2–3 turns, as in the sketch below.
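A minimal smoothing sketch, assuming per-call session state keyed by call ID (the blend weights and the 0.3 switch threshold are illustrative defaults):
// Exponentially smooth sentiment across turns to avoid tone whiplash
const sessions = new Map();

function smoothedSentiment(callId, current) {
  const prev = sessions.get(callId) || { emotion: 'neutral', intensity: 0.5 };
  const smoothed = {
    // Only adopt the new emotion label once intensity moves decisively
    emotion: Math.abs(current.intensity - prev.intensity) > 0.3 ? current.emotion : prev.emotion,
    intensity: 0.6 * prev.intensity + 0.4 * current.intensity
  };
  sessions.set(callId, smoothed);
  return smoothed;
}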
System Diagram
graph LR
A[Microphone Input] --> B[Audio Buffer]
B --> C[Voice Activity Detection]
C -->|Speech Detected| D[Speech‑to‑Text]
C -->|Silence| I[Error Handling]
D --> E[Intent Detection]
E --> F[Large Language Model]
F --> G[Text‑to‑Speech]
G --> H[Speaker Output]
I --> J[Fallback Response]
J --> G
Testing & Validation
Local Testing
Test sentiment detection with edge cases that break naive implementations, such as rapid sentiment shifts:
// Test rapid sentiment shifts (user goes from calm → frustrated in 2 turns)
const testConversation = [
  { role: "user", content: "I need to book a meeting" },
  { role: "assistant", content: "Sure, what time works for you?" },
  { role: "user", content: "Now! This is urgent, I can't wait." }
];
// Simulate processing each turn and verify that `analyzeSentiment`
// returns 'frustrated' on the last turn and that the responseHint
// instructs the LLM to acknowledge urgency.
Validate that:
- Sentiment is correctly identified for each transcript.
- The calendar‑check function receives the `userSentiment` and `urgencyLevel` parameters.
- The TTS configuration changes stability/style according to the detected emotion.
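A minimal assertion for the first check, runnable with plain node (assuming `analyzeSentiment` from step 1 is in scope or exported):
const assert = require('assert');

// The final user turn: fast, urgent speech should register as frustrated
const lastTurn = { text: "Now! This is urgent, I can't wait.", duration: 1.5 };
const result = analyzeSentiment(lastTurn);

assert.strictEqual(result.emotion, 'frustrated');
assert.ok(result.intensity >= 0.7);
console.log('Sentiment test passed:', result);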
By combining real-time sentiment analysis, context-aware calendar checks, and expressive TTS control, a voice AI assistant becomes more empathetic, more efficient, and more trustworthy.