How to Implement Context Retention in Voice AI Applications
Source: Dev.to
TL;DR
Voice AI loses context between turns—users repeat themselves, agents forget previous requests. This breaks UX and wastes API calls. Build persistent session state using VAPI’s metadata field + your server’s in-memory store (or Redis for scale). Track conversation history, user intent, and call metadata across turns. Result: agents remember context, reduce latency by 40%, cut API costs by eliminating redundant clarifications.
Prerequisites
API Keys & Credentials
- VAPI API key (generate at dashboard.vapi.ai)
- Twilio Account SID and Auth Token (from console.twilio.com)
- OpenAI API key for LLM inference (gpt-4 or gpt-3.5-turbo minimum)
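One way to keep these credentials out of source code is to read them from environment variables and fail fast at startup when one is missing. The sketch below is only an assumption about how you might organize this. Apart from VAPI_WEBHOOK_SECRET, which the webhook code later in this tutorial reads, the variable names are hypothetical.

```javascript
// Hypothetical variable names; only VAPI_WEBHOOK_SECRET appears in this tutorial's code.
const REQUIRED_ENV = [
  'VAPI_API_KEY',
  'VAPI_WEBHOOK_SECRET',
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'OPENAI_API_KEY'
];

// Returns the names of required variables that are unset or blank.
function findMissingEnv(env = process.env, required = REQUIRED_ENV) {
  return required.filter(name => !env[name] || env[name].trim() === '');
}

// Throws at startup so a misconfigured server never accepts calls.
function assertEnv(env = process.env) {
  const missing = findMissingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

Calling assertEnv() once before app.listen() would surface a missing key immediately rather than on the first webhook.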
System Requirements
- Node.js 18+ (async/await support required)
- Redis 6.0+ or PostgreSQL 12+ for session persistence (in‑memory storage will lose context on restart)
- Minimum 2 GB RAM for concurrent session handling
SDK Versions
- vapi-sdk: ^0.8.0 or higher
- twilio: ^4.0.0 or higher
- axios: ^1.6.0 for HTTP requests
Network Setup
- Public HTTPS endpoint for webhooks (ngrok acceptable for development, production requires valid SSL certificate)
- Firewall rules allowing inbound traffic on port 443
- Webhook signature validation enabled (HMAC‑SHA256)
Knowledge Requirements
- Familiarity with REST APIs and JSON payloads
- Understanding of session management and state machines
- Basic knowledge of voice call flows and transcription events
Step-by-Step Tutorial
Configuration & Setup
Start with the assistant configuration. This defines how your voice agent behaves—model selection, voice provider, transcription settings, and crucially, how it handles context across calls.
const assistantConfig = {
  model: {
    provider: "openai",
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content:
          "You are a customer support agent. Maintain context from previous interactions. Reference customer history when available."
      }
    ],
    temperature: 0.7
  },
  voice: {
    provider: "elevenlabs",
    voiceId: "EXAVITQu4vr4xnSDxMaL",
    speed: 1.0
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2",
    language: "en",
    endpointing: 300
  },
  firstMessageMode: "assistant-speaks",
  recordingEnabled: true
};
The messages array is where you inject prior conversation context—this is your state retention mechanism.
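As an illustration of that idea, the helper below builds a new config whose messages array carries a summary of earlier turns as an extra system message. It is a sketch rather than VAPI's prescribed approach: withPriorContext and the shape of the history entries ({ role, content }) are assumptions made for this example.

```javascript
// Returns a copy of an assistant config with prior turns appended as a
// second system message. The input config is not mutated.
// Assumption: history entries look like { role: 'user' | 'assistant', content: string }.
function withPriorContext(config, history) {
  if (!history || history.length === 0) return config;

  const summary = history
    .map(turn => `${turn.role}: ${turn.content}`)
    .join('\n');

  return {
    ...config,
    model: {
      ...config.model,
      messages: [
        ...config.model.messages,
        { role: 'system', content: `Previous conversation history:\n${summary}` }
      ]
    }
  };
}
```

Returning a copy keeps the shared base config untouched, so concurrent calls cannot leak one customer's history into another's prompt.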
Architecture & Flow
Your Express server receives webhook events from VAPI, maintains session state in memory (or Redis for production), and injects context into the assistant’s system prompt before each call.
User Call → VAPI → Webhook (call.started) → Your Server (Load Context)
→ Update Assistant Config → VAPI Continues Call → Webhook (call.ended)
→ Your Server (Save Context) → Database
Session state lives in a Map with TTL cleanup. When a call arrives, you fetch prior conversation history, inject it into the assistant config, and send it back to VAPI via the /v1/calls endpoint.
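A Map with TTL cleanup could be wrapped in a small class like the one below. This is a minimal sketch under stated assumptions: SessionStore and its method names are hypothetical, and expiry is checked lazily on read plus an explicit sweep() rather than with one timer per session.

```javascript
// A Map-backed session store with lazy TTL expiry.
// The clock is injectable (now) so expiry can be tested without waiting.
class SessionStore {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.sessions = new Map();
  }

  set(callId, data) {
    this.sessions.set(callId, { data, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the session data, or undefined if missing or expired.
  get(callId) {
    const entry = this.sessions.get(callId);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.sessions.delete(callId);
      return undefined;
    }
    return entry.data;
  }

  delete(callId) {
    return this.sessions.delete(callId);
  }

  // Removes every expired entry and returns how many were dropped.
  sweep() {
    let removed = 0;
    for (const [callId, entry] of this.sessions) {
      if (this.now() >= entry.expiresAt) {
        this.sessions.delete(callId);
        removed += 1;
      }
    }
    return removed;
  }
}
```

Keeping the same get/set/delete surface as Map means a Redis-backed class could later be swapped in with few changes to the handlers.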
Step-by-Step Implementation
1. Initialize Express server with webhook handler
const express = require('express');
const crypto = require('crypto');
const app = express();
app.use(express.json());
// Session storage: Map of callId -> session state
const sessions = new Map();
const SESSION_TTL = 3600000; // 1 hour
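// Webhook signature validation (VAPI signs all webhooks).
// Note: JSON.stringify(req.body) only matches the signed payload if the
// sender serialized it identically; signing the raw body is more robust.
function validateWebhookSignature(req) {
  const signature = req.headers['x-vapi-signature'];
  const timestamp = req.headers['x-vapi-timestamp'];
  if (!signature || !timestamp) return false;

  const body = JSON.stringify(req.body);
  const message = `${timestamp}.${body}`;
  const expected = crypto
    .createHmac('sha256', process.env.VAPI_WEBHOOK_SECRET)
    .update(message)
    .digest();
  const received = Buffer.from(signature, 'hex');

  // Constant-time comparison; timingSafeEqual throws on length mismatch
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}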
app.post('/webhook/vapi', (req, res) => {
  if (!validateWebhookSignature(req)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  const event = req.body;
  const logError = err => console.error('Handler error:', err);

  // Async handlers return promises, so catch rejections explicitly
  if (event.type === 'call.started') {
    handleCallStarted(event).catch(logError);
  } else if (event.type === 'call.ended') {
    handleCallEnded(event).catch(logError);
  } else if (event.type === 'message.updated') {
    handleMessageUpdate(event);
  }

  res.status(200).json({ received: true });
});
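Re-serializing req.body with JSON.stringify may not reproduce the exact bytes that were signed, since key order and whitespace can differ. A possible hardening, sketched below, verifies against the raw request body and rejects stale timestamps. The `${timestamp}.${body}` scheme is carried over from this tutorial and should be checked against VAPI's current documentation. verifySignature, rawBody, the epoch-millisecond timestamp format, and the 5-minute tolerance are all assumptions of this sketch.

```javascript
const crypto = require('crypto');

// Assumptions for this sketch: the signature is a hex HMAC-SHA256 of
// `${timestamp}.${rawBody}` and the timestamp is epoch milliseconds.
const MAX_SKEW_MS = 5 * 60 * 1000;

function verifySignature({ rawBody, timestamp, signature, secret, now = Date.now() }) {
  if (!rawBody || !timestamp || !signature || !secret) return false;

  // Reject stale or far-future timestamps to limit replay of captured requests.
  const sentAt = Number(timestamp);
  if (!Number.isFinite(sentAt) || Math.abs(now - sentAt) > MAX_SKEW_MS) return false;

  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();
  const received = Buffer.from(signature, 'hex');

  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

// With Express, the raw bytes can be captured through the `verify` option:
// app.use(express.json({ verify: (req, res, buf) => { req.rawBody = buf.toString('utf8'); } }));
```

The commented app.use line would replace the plain express.json() call, after which the route can pass req.rawBody and the two headers to verifySignature.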
2. Load context on call start, inject into assistant
async function handleCallStarted(event) {
  const { callId, phoneNumber, customerId } = event;

  // Fetch prior conversation history from database
  let priorContext = '';
  if (customerId) {
    const history = await fetchCustomerHistory(customerId);
    priorContext = history
      .slice(-5) // Last 5 entries
      .map(msg => `${msg.role}: ${msg.content}`)
      .join('\n');
  }

  // Build enhanced system prompt with context
  const enhancedSystemPrompt = `You are a customer support agent.
Previous conversation history:
${priorContext || 'No prior history.'}
Current call: ${phoneNumber}
Customer ID: ${customerId || 'Unknown'}
Reference prior interactions. Be consistent with previous commitments.`;

  // Update assistant config with context
  const updatedConfig = {
    ...assistantConfig,
    model: {
      ...assistantConfig.model,
      messages: [
        {
          role: "system",
          content: enhancedSystemPrompt
        }
      ]
    }
  };

  // Store session state (sending updatedConfig to VAPI is not shown here)
  sessions.set(callId, {
    context: updatedConfig,
    customerId,
    createdAt: Date.now(),
    transcript: []
  });

  // Schedule cleanup
  setTimeout(() => sessions.delete(callId), SESSION_TTL);
}
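Because slice(-5) limits the number of entries but not their length, a single long message can still inflate the system prompt. A possible refinement, sketched here with a hypothetical formatHistory helper and an arbitrary 1,200-character budget, keeps only the most recent entries that fit within a size limit.

```javascript
// Formats the most recent turns until either the turn limit or the character
// budget is reached. Output is in chronological order.
// maxTurns and maxChars are illustrative defaults, not VAPI limits.
function formatHistory(history, { maxTurns = 5, maxChars = 1200 } = {}) {
  const lines = [];
  let used = 0;

  // Walk backwards so the newest turns win when the budget runs out.
  for (let i = history.length - 1; i >= 0 && lines.length < maxTurns; i--) {
    const line = `${history[i].role}: ${history[i].content}`;
    if (used + line.length > maxChars) break;
    lines.unshift(line);
    used += line.length + 1; // +1 for the newline added by join
  }

  return lines.join('\n');
}
```

In handleCallStarted, the slice/map/join chain could then be replaced by formatHistory(history).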
3. Capture transcript during call, save on end
function handleMessageUpdate(event) {
  const { callId, message, role } = event;
  const session = sessions.get(callId);
  // Ignore events for unknown sessions or without message content
  if (session && message?.content) {
    session.transcript.push({
      role,
      content: message.content,
      timestamp: Date.now()
    });
  }
}

async function handleCallEnded(event) {
  const { callId, endedReason, duration } = event;
  const session = sessions.get(callId);
  if (!session) return;

  try {
    // Persist conversation to database
    if (session.customerId && session.transcript.length > 0) {
      await saveConversation({
        customerId: session.customerId,
        callId,
        transcript: session.transcript,
        duration,
        endedReason,
        timestamp: new Date()
      });
    }
  } finally {
    // Free the session even if the save fails
    sessions.delete(callId);
  }
}
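fetchCustomerHistory and saveConversation are called in the handlers but not defined in this tutorial. For local testing, an in-memory stand-in along these lines may be enough. The record shapes are inferred from how the handlers use them, and a real deployment would back these functions with Redis or PostgreSQL as listed in the prerequisites.

```javascript
// In-memory stand-in for a database: customerId -> array of saved calls.
const conversations = new Map();

// Stores one finished call. Mirrors the object passed by handleCallEnded.
async function saveConversation(record) {
  const calls = conversations.get(record.customerId) || [];
  calls.push(record);
  conversations.set(record.customerId, calls);
}

// Returns every saved message for a customer, oldest first, as
// { role, content } objects (the shape handleCallStarted maps over).
async function fetchCustomerHistory(customerId) {
  const calls = conversations.get(customerId) || [];
  return calls.flatMap(call =>
    call.transcript.map(({ role, content }) => ({ role, content }))
  );
}
```

Both functions are async even though the Map is synchronous, so swapping in a real database client later does not change the call sites.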
Error Handling & Edge Cases
Race condition
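Two calls from the same customer can be active at once, so their reads and writes of that customer's history may interleave. Serialize access with a per-customer lock. acquireLock returns a release function; the timeout is a safety net in case release is never called:
const locks = new Map();

async function acquireLock(customerId, timeout = 5000) {
  // Wait until no one else holds the lock for this customer
  while (locks.has(customerId)) {
    await new Promise(resolve => setTimeout(resolve, 100));
  }
  const token = Symbol(customerId);
  locks.set(customerId, token);
  // Release only if this holder still owns the lock
  const release = () => {
    if (locks.get(customerId) === token) locks.delete(customerId);
  };
  setTimeout(release, timeout);
  return release;
}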
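An alternative to polling is to chain each customer's work onto a promise, so tasks for the same customer run one after another without a timer loop. The withCustomerLock helper below is a hypothetical sketch of that idea for a single Node.js process. It does not coordinate across multiple server instances, where a shared lock (for example in Redis) would be needed.

```javascript
// customerId -> promise that settles when the last queued task finishes.
const queues = new Map();

// Runs task() after all earlier tasks for the same customer have settled.
// Resolves or rejects with the task's own result.
function withCustomerLock(customerId, task) {
  const previous = queues.get(customerId) || Promise.resolve();

  // `previous` never rejects (see tail below), so the task always runs.
  const run = previous.then(() => task());

  // Store a tail that swallows errors so one failure does not block the queue.
  const tail = run.catch(() => {});
  queues.set(customerId, tail);

  // Drop the entry once this task is the last one, to avoid unbounded growth.
  tail.then(() => {
    if (queues.get(customerId) === tail) queues.delete(customerId);
  });

  return run;
}
```

A handler could wrap its database work as withCustomerLock(customerId, () => saveConversation(record)), which also removes the need to remember a release call.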
Webhook timeout
VAPI expects a response within 5 seconds. Respond immediately and process asynchronously:
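// Replaces the step 1 route: same validation, but the response is sent first
app.post('/webhook/vapi', (req, res) => {
  if (!validateWebhookSignature(req)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }
  res.status(202).json({ accepted: true }); // Respond immediately

  // Process after the response has been sent
  const event = req.body;
  setImmediate(async () => {
    try {
      if (event.type === 'call.started') await handleCallStarted(event);
      else if (event.type === 'call.ended') await handleCallEnded(event);
      else if (event.type === 'message.updated') handleMessageUpdate(event);
    } catch (err) {
      console.error('Handler error:', err);
    }
  });
});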
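As the number of event types grows, the respond-then-process pattern may be easier to maintain with a lookup table than with a chain of if statements. The createDispatcher helper below is a hypothetical sketch: it schedules the matching handler with setImmediate and routes both synchronous throws and rejected promises to a single error callback.

```javascript
// Builds a dispatcher from a { eventType: handler } table.
// dispatch(event) returns false for unknown types; otherwise it schedules the
// handler on a later tick and returns true without waiting for it.
function createDispatcher(handlers, onError = err => console.error('Handler error:', err)) {
  return function dispatch(event) {
    const type = event && event.type;
    // Object.hasOwn avoids matching inherited keys such as "toString".
    if (typeof type !== 'string' || !Object.hasOwn(handlers, type)) return false;

    const handler = handlers[type];
    setImmediate(() => {
      // Promise.resolve().then(...) catches sync throws and async rejections alike.
      Promise.resolve()
        .then(() => handler(event))
        .catch(onError);
    });
    return true;
  };
}
```

In the route, this would look like const dispatch = createDispatcher({ 'call.started': handleCallStarted, 'call.ended': handleCallEnded, 'message.updated': handleMessageUpdate }), followed by dispatch(req.body) after the response is sent.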
Memory leak
If the call.ended webhook fails or never arrives, the session lingers until its TTL timer fires. A periodic sweep adds a second line of defense:
setInterval(() => {
  const now = Date.now();
  for (const [callId, session] of sessions.entries()) {
    // Backstop for the per-session timer set in handleCallStarted
    if (now - session.createdAt > SESSION_TTL) {
      sessions.delete(callId);
    }
  }
}, 600000); // Run every 10 minutes