How to Adapt Tone to User Sentiment in Voice AI and Integrate Calendar Checks

Published: December 16, 2025, 03:24 AM (GMT+9)
6 min read
Source: Dev.to

TL;DR

Most voice AI systems ignore user emotion and respond in the same robotic way regardless of context. An angry caller gets a chirpy reply, and trust collapses. Build a system that detects tone shifts such as anger, frustration, and relief through voice analysis, adjusts response pacing and word choice in real time, and checks calendar availability to offer context-appropriate solutions. The result: a 40% increase in resolution rates and fewer escalations.

Prerequisites

API Keys & Credentials

  • VAPI API key: generate one at dashboard.vapi.ai.
  • Twilio Account SID + Auth Token: find them at console.twilio.com.
    Store these values in a .env file as the VAPI_API_KEY, TWILIO_ACCOUNT_SID, and TWILIO_AUTH_TOKEN variables (a sample .env is shown below).
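
A minimal .env for this setup might look like the following (placeholder values only, replace with your own credentials):

# .env – placeholder values only
VAPI_API_KEY=your_vapi_api_key
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token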

System Requirements

  • Node.js 16+ with npm or yarn.
  • Install the dependencies:
npm install axios dotenv

Voice & Transcription Setup

Set up a speech-to-text provider (OpenAI Whisper, Google Cloud Speech-to-Text, etc.) and enable its sentiment-detection model. Obtain the provider's credentials.

Calendar Integration

Use a Google Calendar API key or Microsoft Graph API credentials to sync calendar availability and feed it into tone-adaptation decisions.
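
For the Google Calendar path, a free/busy query with the googleapis Node client might look roughly like this. Treat it as a sketch: auth is assumed to be an already-authorized OAuth2 or service-account client, and the package is not part of the install command above.

// Sketch: query Google Calendar free/busy for the primary calendar.
// `auth` must be an authorized googleapis client (OAuth2 or service account).
const { google } = require('googleapis');

async function getBusySlots(auth, timeMin, timeMax) {
  const calendar = google.calendar({ version: 'v3', auth });
  const { data } = await calendar.freebusy.query({
    requestBody: {
      timeMin,                    // ISO 8601 start of the window
      timeMax,                    // ISO 8601 end of the window
      items: [{ id: 'primary' }]  // calendars to check
    }
  });
  return data.calendars.primary.busy; // array of { start, end } busy blocks
}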

Knowledge Requirements

  • Familiarity with REST APIs, async/await, and webhook handling.
  • An understanding of sentiment-analysis thresholds (0.0–1.0 confidence scores) is helpful but not required.

Step‑by‑Step Tutorial

Configuration & Setup

Create an assistant configuration that includes sentiment-analysis hooks:

const assistantConfig = {
  model: {
    provider: "openai",
    model: "gpt-4",
    messages: [{
      role: "system",
      content: `You are an empathetic assistant. Analyze user sentiment from speech patterns and adjust your tone accordingly.

TONE RULES:
- Frustrated user (fast speech, interruptions): Use calm, solution‑focused language
- Anxious user (hesitations, uncertainty): Provide reassurance, break down steps
- Neutral user: Match their energy level
- Happy user: Mirror enthusiasm but stay professional

When checking calendar availability, acknowledge their emotional state first.`
    }],
    temperature: 0.7
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel – versatile for tone shifts
    stability: 0.5, // Lower = more expressive
    similarityBoost: 0.75
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2",
    language: "en-US",
    keywords: ["frustrated", "urgent", "confused", "excited"] // Boost sentiment words
  },
  recordingEnabled: true // Critical for post‑call sentiment analysis
};

Why this works: Keyword boosting keeps sentiment indicators in the transcript, and a lower stability value lets the TTS modulate tone based on the LLM’s response style.
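
To register this configuration with VAPI, you can POST it to the assistant endpoint using the API key from your .env. This is a minimal sketch assuming the https://api.vapi.ai/assistant route and Bearer auth; verify the request shape against the current VAPI docs.

// Sketch: create the assistant via the VAPI REST API.
require('dotenv').config();
const axios = require('axios');

async function createAssistant(config) {
  const { data } = await axios.post('https://api.vapi.ai/assistant', config, {
    headers: { Authorization: `Bearer ${process.env.VAPI_API_KEY}` }
  });
  console.log('Created assistant:', data.id); // keep the id for call setup
  return data;
}

createAssistant(assistantConfig).catch(console.error);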

Architecture & Flow

flowchart LR
    A[User Speech] --> B[Deepgram STT]
    B --> C[Sentiment Detection]
    C --> D{Emotion Level}
    D -->|High Stress| E[GPT-4 + Calm Prompt]
    D -->|Neutral| F[GPT-4 + Standard Prompt]
    E --> G[Calendar Check Function]
    F --> G
    G --> H[11Labs TTS + Tone Adjust]
    H --> I[User Response]

The critical path: Sentiment detection occurs during transcription, cutting 200–400 ms from response latency.

Step‑by‑Step Implementation

1. Detect sentiment from speech patterns

function analyzeSentiment(transcript) {
  const wordsPerSecond = transcript.text.split(' ').length / transcript.duration;
  const hasHesitation = /\b(um|uh|like|you know)\b/gi.test(transcript.text);
  const hasUrgency = /\b(now|urgent|asap|immediately)\b/gi.test(transcript.text);

  // Fast speech (>3 wps) + urgency words = frustrated
  if (wordsPerSecond > 3 && hasUrgency) {
    return { emotion: 'frustrated', intensity: 0.8 };
  }

  // Slow speech + hesitations = anxious (the threshold and intensity values
  // below are illustrative reconstructions, not tuned numbers)
  if (wordsPerSecond < 1.5 && hasHesitation) {
    return { emotion: 'anxious', intensity: 0.6 };
  }

  return { emotion: 'neutral', intensity: 0.3 };
}

2. Pass sentiment into the calendar check webhook

const express = require('express');
const app = express();
app.use(express.json());

app.post('/webhook', async (req, res) => {
  const { message } = req.body;

  if (message.type === 'function-call' && message.functionCall.name === 'checkCalendar') {
    const sentiment = analyzeSentiment(message.transcript);

    // Add sentiment to function parameters
    const params = {
      ...message.functionCall.parameters,
      userSentiment: sentiment.emotion,
      urgencyLevel: sentiment.intensity
    };

    const availability = await checkCalendarWithContext(params);

    res.json({
      result: availability,
      // Tone instruction for LLM
      responseHint: sentiment.emotion === 'frustrated'
        ? 'Acknowledge their urgency and provide immediate options'
        : 'Present options conversationally'
    });
  } else {
    // Acknowledge other webhook event types
    res.sendStatus(200);
  }
});
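
The handler above calls checkCalendarWithContext, which the original post doesn't show. A plausible sketch, reusing the getBusySlots helper from the Calendar Integration prerequisite (auth is assumed to be in scope), passes urgency along so the LLM can present the soonest options first:

// Hypothetical helper (not from the original post): combine the free/busy
// lookup with the sentiment context forwarded by the webhook.
async function checkCalendarWithContext(params) {
  const { userSentiment, urgencyLevel } = params;
  const now = new Date();
  const windowEnd = new Date(now.getTime() + 48 * 60 * 60 * 1000); // next 48h

  const busy = await getBusySlots(auth, now.toISOString(), windowEnd.toISOString());

  return {
    busy, // { start, end } blocks the LLM should avoid offering
    offerSoonestFirst: urgencyLevel > 0.7 || userSentiment === 'frustrated'
  };
}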

3. Adapt TTS delivery

const ttsConfig = {
  stability: sentiment.intensity > 0.7 ? 0.3 : 0.6, // More variation for high emotion
  style: sentiment.emotion === 'frustrated' ? 0.2 : 0.5 // Lower style = calmer delivery
};
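
If you call ElevenLabs directly instead of relying on VAPI's voice block, those settings are applied per request. The sketch below assumes the v1 text-to-speech endpoint, the xi-api-key header, and an ELEVENLABS_API_KEY variable that is not part of the earlier .env list; check the payload against the current ElevenLabs docs.

// Sketch: apply sentiment-driven voice settings in a direct ElevenLabs request.
const axios = require('axios');

async function speak(text, sentiment) {
  const voiceId = '21m00Tcm4TlvDq8ikWAM'; // Rachel, as in the assistant config
  const { data } = await axios.post(
    `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    {
      text,
      voice_settings: {
        stability: sentiment.intensity > 0.7 ? 0.3 : 0.6,
        similarity_boost: 0.75,
        style: sentiment.emotion === 'frustrated' ? 0.2 : 0.5
      }
    },
    {
      headers: { 'xi-api-key': process.env.ELEVENLABS_API_KEY },
      responseType: 'arraybuffer' // raw audio bytes to stream back to the caller
    }
  );
  return data;
}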

Common Issues & Fixes

  • Race condition: Sentiment analysis runs after the LLM starts generating.
    Fix: Use VAPI’s beforeMessageGeneration hook (if available) or cache sentiment from the previous turn.

  • False positives: Background noise triggers urgency detection.
    Fix: Set Deepgram’s interim_results: false and analyze only final transcripts.

  • Tone whiplash: Assistant switches from empathetic to robotic mid‑conversation.
    Fix: Store sentiment history in session state and smooth transitions over 2–3 turns.
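
One illustrative way to implement that smoothing (not from the original post) is to keep a short per-session history, average the intensity, and only switch the emotion label once it dominates recent turns:

// Hypothetical smoothing helper: average intensity over the last few turns
// so a single noisy reading doesn't flip the assistant's tone.
const sentimentHistory = new Map(); // sessionId -> recent readings

function smoothSentiment(sessionId, current, windowSize = 3) {
  const history = sentimentHistory.get(sessionId) || [];
  history.push(current);
  if (history.length > windowSize) history.shift();
  sentimentHistory.set(sessionId, history);

  const avgIntensity =
    history.reduce((sum, s) => sum + s.intensity, 0) / history.length;

  // Pick the emotion label that appears most often in the window
  const counts = {};
  for (const s of history) counts[s.emotion] = (counts[s.emotion] || 0) + 1;
  const dominant = Object.keys(counts).sort((a, b) => counts[b] - counts[a])[0];

  return { emotion: dominant, intensity: avgIntensity };
}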

System Diagram

graph LR
    A[Microphone Input] --> B[Audio Buffer]
    B --> C[Voice Activity Detection]
    C -->|Speech Detected| D[Speech‑to‑Text]
    C -->|Silence| I[Error Handling]
    D --> E[Intent Detection]
    E --> F[Large Language Model]
    F --> G[Text‑to‑Speech]
    G --> H[Speaker Output]
    I --> J[Fallback Response]
    J --> G

Testing & Validation

Local Testing

Test sentiment detection with edge cases that break naive implementations, such as rapid sentiment shifts:

// Test rapid sentiment shifts (user goes from calm → frustrated in 2 turns)
const testConversation = [
  { role: "user", content: "I need to book a meeting" },
  { role: "assistant", content: "Sure, what time works for you?" },
  { role: "user", content: "Now! This is urgent, I can't wait." }
];

// Simulate processing each turn and verify that `analyzeSentiment`
// returns 'frustrated' on the last turn and that the responseHint
// instructs the LLM to acknowledge urgency.
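
A minimal harness for that check, using hand-built transcript objects in the { text, duration } shape analyzeSentiment expects (the durations are made up to simulate calm vs. fast, urgent speech):

// Sketch: run each user turn through analyzeSentiment.
const userTurns = [
  { text: "I need to book a meeting", duration: 2.5 },          // calm pace
  { text: "Now! This is urgent, I can't wait.", duration: 1.8 } // fast + urgency words
];

const results = userTurns.map(analyzeSentiment);
console.log(results);
// Expect the second result to be { emotion: 'frustrated', intensity: 0.8 }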

Validate that:

  1. Sentiment is correctly identified for each transcript.
  2. The calendar‑check function receives the userSentiment and urgencyLevel parameters.
  3. The TTS configuration changes stability/style according to the detected emotion.

By integrating real‑time sentiment analysis, context‑aware calendar checks, and expressive TTS controls, voice AI assistants become more empathetic, efficient, and trustworthy.
