How to Adapt Tone to User Sentiment in Voice AI and Integrate Calendar Checks

Published: December 16, 2025, 03:24 AM (GMT+9)
6 min read
Source: Dev.to

TL;DR

Most voice AI systems ignore user emotion and respond in the same robotic way regardless of context. An angry caller gets a chirpy reply, and trust collapses. Build a system that detects tone shifts such as anger, frustration, and relief through voice analysis, adjusts response pacing and word choice in real time, and checks calendar availability to offer context-appropriate solutions. The result: a 40% increase in resolution rates and fewer escalations.

Prerequisites

API Keys & Credentials

  • VAPI API key: generate one at dashboard.vapi.ai.
  • Twilio Account SID + Auth Token: find them at console.twilio.com.
    Store these values in a .env file as the VAPI_API_KEY, TWILIO_ACCOUNT_SID, and TWILIO_AUTH_TOKEN variables (a sample .env is shown below).
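
A minimal .env for this setup might look like the following (placeholder values only, replace with your own credentials):

# .env – placeholder values only
VAPI_API_KEY=your_vapi_api_key
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token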

System Requirements

  • Node.js 16+ with npm or yarn.
  • Install the dependencies:
npm install axios dotenv

Voice & Transcription Setup

Set up a speech-to-text provider (OpenAI Whisper, Google Cloud Speech-to-Text, etc.) and enable its sentiment-detection model. Obtain the provider's credentials.

Calendar Integration

Use a Google Calendar API key or Microsoft Graph API credentials to sync calendar availability and feed it into tone-adaptation decisions.
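
For the Google Calendar path, a free/busy query with the googleapis Node client might look roughly like this. Treat it as a sketch: auth is assumed to be an already-authorized OAuth2 or service-account client, and the package is not part of the install command above.

// Sketch: query Google Calendar free/busy for the primary calendar.
// `auth` must be an authorized googleapis client (OAuth2 or service account).
const { google } = require('googleapis');

async function getBusySlots(auth, timeMin, timeMax) {
  const calendar = google.calendar({ version: 'v3', auth });
  const { data } = await calendar.freebusy.query({
    requestBody: {
      timeMin,                    // ISO 8601 start of the window
      timeMax,                    // ISO 8601 end of the window
      items: [{ id: 'primary' }]  // calendars to check
    }
  });
  return data.calendars.primary.busy; // array of { start, end } busy blocks
}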

Knowledge Requirements

  • Familiarity with REST APIs, async/await, and webhook handling.
  • An understanding of sentiment-analysis thresholds (0.0–1.0 confidence scores) is helpful but not required.

Step‑by‑Step Tutorial

Configuration & Setup

Create an assistant configuration that includes sentiment-analysis hooks:

const assistantConfig = {
  model: {
    provider: "openai",
    model: "gpt-4",
    messages: [{
      role: "system",
      content: `You are an empathetic assistant. Analyze user sentiment from speech patterns and adjust your tone accordingly.

TONE RULES:
- Frustrated user (fast speech, interruptions): Use calm, solution‑focused language
- Anxious user (hesitations, uncertainty): Provide reassurance, break down steps
- Neutral user: Match their energy level
- Happy user: Mirror enthusiasm but stay professional

When checking calendar availability, acknowledge their emotional state first.`
    }],
    temperature: 0.7
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel – versatile for tone shifts
    stability: 0.5, // Lower = more expressive
    similarityBoost: 0.75
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2",
    language: "en-US",
    keywords: ["frustrated", "urgent", "confused", "excited"] // Boost sentiment words
  },
  recordingEnabled: true // Critical for post‑call sentiment analysis
};

Why this works: Keyword boosting keeps sentiment indicators in the transcript, and a lower stability value lets the TTS modulate tone based on the LLM’s response style.
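
To register this configuration with VAPI, you can POST it to the assistant endpoint using the API key from your .env. This is a minimal sketch assuming the https://api.vapi.ai/assistant route and Bearer auth; verify the request shape against the current VAPI docs.

// Sketch: create the assistant via the VAPI REST API.
require('dotenv').config();
const axios = require('axios');

async function createAssistant(config) {
  const { data } = await axios.post('https://api.vapi.ai/assistant', config, {
    headers: { Authorization: `Bearer ${process.env.VAPI_API_KEY}` }
  });
  console.log('Created assistant:', data.id); // keep the id for call setup
  return data;
}

createAssistant(assistantConfig).catch(console.error);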

Architecture & Flow

flowchart LR
    A[User Speech] --> B[Deepgram STT]
    B --> C[Sentiment Detection]
    C --> D{Emotion Level}
    D -->|High Stress| E[GPT-4 + Calm Prompt]
    D -->|Neutral| F[GPT-4 + Standard Prompt]
    E --> G[Calendar Check Function]
    F --> G
    G --> H[11Labs TTS + Tone Adjust]
    H --> I[User Response]

The critical path: Sentiment detection occurs during transcription, cutting 200–400 ms from response latency.

Step‑by‑Step Implementation

1. Detect sentiment from speech patterns

function analyzeSentiment(transcript) {
  const wordsPerSecond = transcript.text.split(' ').length / transcript.duration;
  const hasHesitation = /\b(um|uh|like|you know)\b/gi.test(transcript.text);
  const hasUrgency = /\b(now|urgent|asap|immediately)\b/gi.test(transcript.text);

  // Fast speech (>3 wps) + urgency words = frustrated
  if (wordsPerSecond > 3 && hasUrgency) {
    return { emotion: 'frustrated', intensity: 0.8 };
  }

  // Slow speech + hesitations = anxious (the threshold and intensity values
  // below are illustrative reconstructions, not tuned numbers)
  if (wordsPerSecond < 1.5 && hasHesitation) {
    return { emotion: 'anxious', intensity: 0.6 };
  }

  return { emotion: 'neutral', intensity: 0.3 };
}

2. Pass sentiment into the calendar check webhook

const express = require('express');
const app = express();
app.use(express.json());

app.post('/webhook', async (req, res) => {
  const { message } = req.body;

  if (message.type === 'function-call' && message.functionCall.name === 'checkCalendar') {
    const sentiment = analyzeSentiment(message.transcript);

    // Add sentiment to function parameters
    const params = {
      ...message.functionCall.parameters,
      userSentiment: sentiment.emotion,
      urgencyLevel: sentiment.intensity
    };

    const availability = await checkCalendarWithContext(params);

    res.json({
      result: availability,
      // Tone instruction for LLM
      responseHint: sentiment.emotion === 'frustrated'
        ? 'Acknowledge their urgency and provide immediate options'
        : 'Present options conversationally'
    });
  } else {
    // Acknowledge other webhook event types
    res.sendStatus(200);
  }
});
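
The handler above calls checkCalendarWithContext, which the original post doesn't show. A plausible sketch, reusing the getBusySlots helper from the Calendar Integration prerequisite (auth is assumed to be in scope), passes urgency along so the LLM can present the soonest options first:

// Hypothetical helper (not from the original post): combine the free/busy
// lookup with the sentiment context forwarded by the webhook.
async function checkCalendarWithContext(params) {
  const { userSentiment, urgencyLevel } = params;
  const now = new Date();
  const windowEnd = new Date(now.getTime() + 48 * 60 * 60 * 1000); // next 48h

  const busy = await getBusySlots(auth, now.toISOString(), windowEnd.toISOString());

  return {
    busy, // { start, end } blocks the LLM should avoid offering
    offerSoonestFirst: urgencyLevel > 0.7 || userSentiment === 'frustrated'
  };
}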

3. Adapt TTS delivery

const ttsConfig = {
  stability: sentiment.intensity > 0.7 ? 0.3 : 0.6, // More variation for high emotion
  style: sentiment.emotion === 'frustrated' ? 0.2 : 0.5 // Lower style = calmer delivery
};
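
If you call ElevenLabs directly instead of relying on VAPI's voice block, those settings are applied per request. The sketch below assumes the v1 text-to-speech endpoint, the xi-api-key header, and an ELEVENLABS_API_KEY variable that is not part of the earlier .env list; check the payload against the current ElevenLabs docs.

// Sketch: apply sentiment-driven voice settings in a direct ElevenLabs request.
const axios = require('axios');

async function speak(text, sentiment) {
  const voiceId = '21m00Tcm4TlvDq8ikWAM'; // Rachel, as in the assistant config
  const { data } = await axios.post(
    `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    {
      text,
      voice_settings: {
        stability: sentiment.intensity > 0.7 ? 0.3 : 0.6,
        similarity_boost: 0.75,
        style: sentiment.emotion === 'frustrated' ? 0.2 : 0.5
      }
    },
    {
      headers: { 'xi-api-key': process.env.ELEVENLABS_API_KEY },
      responseType: 'arraybuffer' // raw audio bytes to stream back to the caller
    }
  );
  return data;
}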

Common Issues & Fixes

  • Race condition: Sentiment analysis runs after the LLM starts generating.
    Fix: Use VAPI’s beforeMessageGeneration hook (if available) or cache sentiment from the previous turn.

  • False positives: Background noise triggers urgency detection.
    Fix: Set Deepgram’s interim_results: false and analyze only final transcripts.

  • Tone whiplash: Assistant switches from empathetic to robotic mid‑conversation.
    Fix: Store sentiment history in session state and smooth transitions over 2–3 turns.
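
One illustrative way to implement that smoothing (not from the original post) is to keep a short per-session history, average the intensity, and only switch the emotion label once it dominates recent turns:

// Hypothetical smoothing helper: average intensity over the last few turns
// so a single noisy reading doesn't flip the assistant's tone.
const sentimentHistory = new Map(); // sessionId -> recent readings

function smoothSentiment(sessionId, current, windowSize = 3) {
  const history = sentimentHistory.get(sessionId) || [];
  history.push(current);
  if (history.length > windowSize) history.shift();
  sentimentHistory.set(sessionId, history);

  const avgIntensity =
    history.reduce((sum, s) => sum + s.intensity, 0) / history.length;

  // Pick the emotion label that appears most often in the window
  const counts = {};
  for (const s of history) counts[s.emotion] = (counts[s.emotion] || 0) + 1;
  const dominant = Object.keys(counts).sort((a, b) => counts[b] - counts[a])[0];

  return { emotion: dominant, intensity: avgIntensity };
}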

System Diagram

graph LR
    A[Microphone Input] --> B[Audio Buffer]
    B --> C[Voice Activity Detection]
    C -->|Speech Detected| D[Speech‑to‑Text]
    C -->|Silence| I[Error Handling]
    D --> E[Intent Detection]
    E --> F[Large Language Model]
    F --> G[Text‑to‑Speech]
    G --> H[Speaker Output]
    I --> J[Fallback Response]
    J --> G

Testing & Validation

Local Testing

Test sentiment detection with edge cases that break naive implementations, such as rapid sentiment shifts:

// Test rapid sentiment shifts (user goes from calm → frustrated in 2 turns)
const testConversation = [
  { role: "user", content: "I need to book a meeting" },
  { role: "assistant", content: "Sure, what time works for you?" },
  { role: "user", content: "Now! This is urgent, I can't wait." }
];

// Simulate processing each turn and verify that `analyzeSentiment`
// returns 'frustrated' on the last turn and that the responseHint
// instructs the LLM to acknowledge urgency.
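
A minimal harness for that check, using hand-built transcript objects in the { text, duration } shape analyzeSentiment expects (the durations are made up to simulate calm vs. fast, urgent speech):

// Sketch: run each user turn through analyzeSentiment.
const userTurns = [
  { text: "I need to book a meeting", duration: 2.5 },          // calm pace
  { text: "Now! This is urgent, I can't wait.", duration: 1.8 } // fast + urgency words
];

const results = userTurns.map(analyzeSentiment);
console.log(results);
// Expect the second result to be { emotion: 'frustrated', intensity: 0.8 }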

Validate that:

  1. Sentiment is correctly identified for each transcript.
  2. The calendar‑check function receives the userSentiment and urgencyLevel parameters.
  3. The TTS configuration changes stability/style according to the detected emotion.

By integrating real‑time sentiment analysis, context‑aware calendar checks, and expressive TTS controls, voice AI assistants become more empathetic, efficient, and trustworthy.
