How to Implement Context Retention in Voice AI Applications

Published: December 2, 2025 at 09:37 AM EST

Source: Dev.to

TL;DR

Voice AI loses context between turns: users repeat themselves and agents forget previous requests. This breaks UX and wastes API calls. Build persistent session state using VAPI’s metadata field plus your server’s in-memory store (or Redis for scale). Track conversation history, user intent, and call metadata across turns. Result: agents remember context, latency drops (a claimed 40%), and API costs fall because redundant clarifications disappear.

Prerequisites

API Keys & Credentials

VAPI API key (generate at dashboard.vapi.ai)
Twilio Account SID and Auth Token (from console.twilio.com)
OpenAI API key for LLM inference (gpt-4 or gpt-3.5-turbo minimum)
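
A minimal sketch of a startup check for the credentials listed above. The environment variable names are assumptions chosen for illustration, except VAPI_WEBHOOK_SECRET, which the webhook validator later in this tutorial reads; missingEnv is a hypothetical helper.

```javascript
// Hypothetical startup check. Variable names are assumptions, except
// VAPI_WEBHOOK_SECRET, which this tutorial's webhook validator reads.
const REQUIRED_ENV = [
  'VAPI_API_KEY',
  'VAPI_WEBHOOK_SECRET',
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'OPENAI_API_KEY'
];

// Returns the names of required variables that are unset or blank.
function missingEnv(env = process.env, required = REQUIRED_ENV) {
  return required.filter(name => !env[name] || env[name].trim() === '');
}
```

Calling missingEnv() once at startup and exiting when the returned list is non-empty surfaces configuration mistakes before the first call arrives.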

System Requirements

Node.js 18+ (async/await support required)
Redis 6.0+ or PostgreSQL 12+ for session persistence (in‑memory storage will lose context on restart)
Minimum 2 GB RAM for concurrent session handling

SDK Versions

vapi-sdk: ^0.8.0 or higher
twilio: ^4.0.0 or higher
axios: ^1.6.0 for HTTP requests

Network Setup

Public HTTPS endpoint for webhooks (ngrok is acceptable for development; production requires a valid SSL certificate)
Firewall rules allowing inbound traffic on port 443
Webhook signature validation enabled (HMAC‑SHA256)

Knowledge Requirements

Familiarity with REST APIs and JSON payloads
Understanding of session management and state machines
Basic knowledge of voice call flows and transcription events

Step-by-Step Tutorial

Configuration & Setup

Start with the assistant configuration. It defines how your voice agent behaves: model selection, voice provider, transcription settings, and, crucially, how it handles context across calls.

const assistantConfig = {
  model: {
    provider: "openai",
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content:
          "You are a customer support agent. Maintain context from previous interactions. Reference customer history when available."
      }
    ],
    temperature: 0.7
  },
  voice: {
    provider: "elevenlabs",
    voiceId: "EXAVITQu4vr4xnSDxMaL",
    speed: 1.0
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2",
    language: "en",
    endpointing: 300
  },
  firstMessageMode: "assistant-speaks",
  recordingEnabled: true
};

The messages array is where you inject prior conversation context. This is your state retention mechanism.

Architecture & Flow

Your Express server receives webhook events from VAPI, maintains session state in memory (or Redis for production), and injects context into the assistant’s system prompt before each call.

User Call → VAPI → Webhook (call.started) → Your Server (Load Context)
→ Update Assistant Config → VAPI Continues Call → Webhook (call.ended)
→ Your Server (Save Context) → Database

Session state lives in a Map with TTL cleanup. When a call arrives, you fetch prior conversation history, inject it into the assistant config, and send it back to VAPI via the /v1/calls endpoint.
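
The Map-with-TTL idea described above can be sketched as a small class. SessionStore and its method names are hypothetical and not part of any VAPI SDK. In this sketch expiry is checked lazily on read and by an explicit sweep(), and the clock is injectable (an assumption made so the behavior is easy to test without timers).

```javascript
// Hypothetical helper: an in-memory session store with per-entry expiry.
class SessionStore {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock (assumption, for testing)
    this.entries = new Map(); // callId -> { value, expiresAt }
  }

  set(callId, value) {
    this.entries.set(callId, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the stored value, or undefined if missing or expired.
  get(callId) {
    const entry = this.entries.get(callId);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(callId);
      return undefined;
    }
    return entry.value;
  }

  delete(callId) {
    return this.entries.delete(callId);
  }

  // Removes every expired entry and returns how many were removed.
  sweep() {
    const now = this.now();
    let removed = 0;
    for (const [callId, entry] of this.entries) {
      if (now >= entry.expiresAt) {
        this.entries.delete(callId);
        removed += 1;
      }
    }
    return removed;
  }
}
```

Checking expiry on read means a stale session is never returned even if no sweep has run yet; sweep() only exists to reclaim memory for sessions nobody asks about again.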

Step-by-Step Implementation

1. Initialize Express server with webhook handler

const express = require('express');
const crypto = require('crypto');
const app = express();
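// Keep the raw request bytes so the HMAC is computed over exactly what was sent;
// re-serializing req.body with JSON.stringify may not reproduce the original payload
app.use(express.json({
  verify: (req, _res, buf) => { req.rawBody = buf; }
}));

// Session storage: Map of callId -> session state
const sessions = new Map();
const SESSION_TTL = 3600000; // 1 hour

// Webhook signature validation (VAPI signs all webhooks)
function validateWebhookSignature(req) {
  const signature = req.headers['x-vapi-signature'];
  const timestamp = req.headers['x-vapi-timestamp'];
  if (typeof signature !== 'string' || !timestamp || !req.rawBody) return false;

  const message = `${timestamp}.${req.rawBody.toString('utf8')}`;
  const expected = crypto
    .createHmac('sha256', process.env.VAPI_WEBHOOK_SECRET)
    .update(message)
    .digest('hex');

  // Constant-time comparison; timingSafeEqual throws on a length mismatch, so check first
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}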

app.post('/webhook/vapi', (req, res) => {
  if (!validateWebhookSignature(req)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  const event = req.body;

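  // The call handlers are async; catch rejections so a failed lookup is logged
  // instead of surfacing as an unhandled promise rejection
  if (event.type === 'call.started') {
    handleCallStarted(event).catch(err => console.error('call.started failed:', err));
  } else if (event.type === 'call.ended') {
    handleCallEnded(event).catch(err => console.error('call.ended failed:', err));
  } else if (event.type === 'message.updated') {
    handleMessageUpdate(event);
  }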

  res.status(200).json({ received: true });
});
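
To exercise the webhook handler locally, it helps to be able to produce a correctly signed request. The sketch below assumes the signing scheme used in this tutorial (HMAC-SHA256 over `${timestamp}.${body}`, hex-encoded, in the x-vapi-signature header); confirm the actual header names and scheme against the provider's documentation. signWebhook and verifyWebhook are hypothetical helper names.

```javascript
const crypto = require('crypto');

// Hypothetical helper: sign a raw body the way this tutorial's validator expects.
function signWebhook(secret, timestamp, rawBody) {
  return crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest('hex');
}

// Constant-time check of a received signature against the expected one.
function verifyWebhook(secret, timestamp, rawBody, signature) {
  if (typeof signature !== 'string') return false;
  const expected = Buffer.from(signWebhook(secret, timestamp, rawBody));
  const received = Buffer.from(signature);
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}

// Usage: build headers for a test request against /webhook/vapi.
// The secret, timestamp, and payload are made-up test values.
const rawBody = JSON.stringify({ type: 'call.started', callId: 'test-call-1' });
const timestamp = String(1700000000);
const testHeaders = {
  'content-type': 'application/json',
  'x-vapi-timestamp': timestamp,
  'x-vapi-signature': signWebhook('dev-secret', timestamp, rawBody)
};
```

Sending rawBody with testHeaders to a local server whose VAPI_WEBHOOK_SECRET is set to the same test secret should pass validation; changing a single byte of the body should not.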

2. Load context on call start, inject into assistant

async function handleCallStarted(event) {
  const { callId, phoneNumber, customerId } = event;

  // Fetch prior conversation history from database
  let priorContext = '';
  if (customerId) {
    const history = await fetchCustomerHistory(customerId);
    priorContext = history
      .slice(-5) // Last 5 exchanges
      .map(msg => `${msg.role}: ${msg.content}`)
      .join('\n');
  }

  // Build enhanced system prompt with context
  const enhancedSystemPrompt = `You are a customer support agent.
Previous conversation history:
${priorContext || 'No prior history.'}

Current call: ${phoneNumber}
Customer ID: ${customerId || 'Unknown'}

Reference prior interactions. Be consistent with previous commitments.`;

  // Update assistant config with context
  const updatedConfig = {
    ...assistantConfig,
    model: {
      ...assistantConfig.model,
      messages: [
        {
          role: "system",
          content: enhancedSystemPrompt
        }
      ]
    }
  };

  // Store session state
  sessions.set(callId, {
    context: updatedConfig,
    customerId,
    createdAt: Date.now(),
    transcript: []
  });

  // Schedule cleanup
  setTimeout(() => sessions.delete(callId), SESSION_TTL);
}
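
The handler above keeps the last five messages regardless of their length, so a single long message can inflate the system prompt. One possible refinement, sketched under that assumption, also caps the total character count. buildPriorContext and its option names are hypothetical.

```javascript
// Hypothetical helper: format recent history for the system prompt,
// keeping at most maxTurns messages and at most maxChars characters.
function buildPriorContext(history, { maxTurns = 5, maxChars = 2000 } = {}) {
  const lines = [];
  let used = 0;
  // Walk backwards so the most recent messages are kept first.
  for (let i = history.length - 1; i >= 0 && lines.length < maxTurns; i--) {
    const line = `${history[i].role}: ${history[i].content}`;
    const cost = line.length + (lines.length > 0 ? 1 : 0); // +1 for the newline
    if (used + cost > maxChars) break;
    lines.unshift(line);
    used += cost;
  }
  return lines.join('\n');
}
```

The result can stand in for priorContext when building the enhanced system prompt; an empty string still falls through to the 'No prior history.' default.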

3. Capture transcript during call, save on end

function handleMessageUpdate(event) {
  const { callId, message, role } = event;
  const session = sessions.get(callId);

  if (session) {
    session.transcript.push({
      role,
      content: message.content,
      timestamp: Date.now()
    });
  }
}

async function handleCallEnded(event) {
  const { callId, endedReason, duration } = event;
  const session = sessions.get(callId);

  if (!session) return;

  // Persist conversation to database
  if (session.customerId && session.transcript.length > 0) {
    await saveConversation({
      customerId: session.customerId,
      callId,
      transcript: session.transcript,
      duration,
      endedReason,
      timestamp: new Date()
    });
  }

  sessions.delete(callId);
}
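
The handlers call fetchCustomerHistory and saveConversation, which this tutorial does not define. For local testing, an in-memory stand-in along these lines may be enough; the storage shape is an assumption, and a real deployment would replace both functions with database queries.

```javascript
// Hypothetical in-memory stand-ins for the database helpers used above.
const conversationsByCustomer = new Map(); // customerId -> array of saved calls

// Appends one call record (including its transcript) to the customer's list.
async function saveConversation(record) {
  const calls = conversationsByCustomer.get(record.customerId) || [];
  calls.push(record);
  conversationsByCustomer.set(record.customerId, calls);
}

// Returns all messages from all saved calls, oldest first,
// as { role, content } objects.
async function fetchCustomerHistory(customerId) {
  const calls = conversationsByCustomer.get(customerId) || [];
  return calls.flatMap(call =>
    call.transcript.map(({ role, content }) => ({ role, content }))
  );
}
```

Both functions are async so they can be swapped for real database calls without changing the handlers that await them.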

Error Handling & Edge Cases

Race condition

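Two calls from the same customer can arrive simultaneously and overwrite each other's history. Use a lock mechanism:

const locks = new Map(); // customerId -> auto-release timer

// Wait until no other call holds the lock, then take it.
// The lock auto-releases after `timeout` ms in case the holder never releases it.
async function acquireLock(customerId, timeout = 5000) {
  while (locks.has(customerId)) {
    await new Promise(resolve => setTimeout(resolve, 100));
  }
  locks.set(customerId, setTimeout(() => locks.delete(customerId), timeout));
}

// Release as soon as the critical section finishes (call from a finally block).
function releaseLock(customerId) {
  clearTimeout(locks.get(customerId));
  locks.delete(customerId);
}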
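
An alternative to polling, sketched here as a different technique, is to serialize work per customer by chaining promises. withCustomerLock is a hypothetical helper; it only coordinates work inside a single Node.js process, so multiple server instances would still need a shared lock (for example in Redis).

```javascript
// Hypothetical helper: serialize async work per customer by chaining promises.
const lockTails = new Map(); // customerId -> promise for the last queued task

function withCustomerLock(customerId, task) {
  const previous = lockTails.get(customerId) || Promise.resolve();
  // Start the task once the previous one settles, whether it succeeded or failed.
  const run = previous.catch(() => {}).then(() => task());
  lockTails.set(customerId, run);
  // Drop the map entry once the queue for this customer is idle.
  const cleanup = () => {
    if (lockTails.get(customerId) === run) lockTails.delete(customerId);
  };
  run.then(cleanup, cleanup);
  return run; // resolves or rejects with the task's own outcome
}
```

A handler could wrap its read-modify-write step as withCustomerLock(customerId, () => saveConversation(record)); tasks for different customers still run concurrently.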

Webhook timeout

VAPI expects a response within 5 seconds. Respond immediately and process asynchronously:

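// Replaces the earlier handler for the same route
app.post('/webhook/vapi', (req, res) => {
  if (!validateWebhookSignature(req)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }
  res.status(202).json({ accepted: true }); // Respond immediately

  // Process after the response has been sent off
  setImmediate(async () => {
    const event = req.body;
    try {
      if (event.type === 'call.started') {
        await handleCallStarted(event);
      } else if (event.type === 'call.ended') {
        await handleCallEnded(event);
      } else if (event.type === 'message.updated') {
        handleMessageUpdate(event);
      }
    } catch (err) {
      console.error('Handler error:', err);
    }
  });
});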

Memory leak

Sessions are not cleaned up promptly if the call.ended webhook never arrives. Add a periodic cleanup as a backstop:

setInterval(() => {
  const now = Date.now();
  for (const [callId, session] of sessions.entries()) {
    // Example cleanup condition (TTL already handled on start)
    if (now - session.createdAt > SESSION_TTL) {
      sessions.delete(callId);
    }
  }
}, 600000); // Run every 10 minutes
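
For deployments that move session state to Redis, key expiry can replace the manual sweep. The sketch below assumes a client with the node-redis v4 style of calls (set with an { EX: seconds } option, get, and del). The client is passed in rather than created here; createRedisSessionStore, the key prefix, and the method names are hypothetical.

```javascript
// Hypothetical wrapper around a node-redis v4 style client.
// Assumes client.set(key, value, { EX }), client.get(key), client.del(key).
function createRedisSessionStore(client, { ttlSeconds = 3600, prefix = 'session:' } = {}) {
  return {
    // Store the session as JSON; Redis drops the key after ttlSeconds.
    async save(callId, session) {
      await client.set(prefix + callId, JSON.stringify(session), { EX: ttlSeconds });
    },
    // Returns the parsed session, or null if the key is missing or expired.
    async load(callId) {
      const raw = await client.get(prefix + callId);
      return raw == null ? null : JSON.parse(raw);
    },
    async remove(callId) {
      await client.del(prefix + callId);
    }
  };
}
```

Because the session is serialized as JSON, only plain data survives a round trip, and sessions persist across server restarts for as long as the TTL allows.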