AWS re:Invent 2025 - Create hyper-personalized voice interactions with Amazon Nova Sonic (AIM374)

Published: December 5, 2025 at 08:46 PM EST
3 min read
Source: Dev.to

Overview

In this session, Veerdhawal Pande (Principal Product Manager, Amazon Nova Sonic) and Ankur Gandhe (Principal Scientist, Amazon General Intelligence) introduce Amazon Nova 2 Sonic, a speech‑to‑speech foundation model designed for real‑time, human‑like conversational AI. The presentation covers core features, benchmark results, architecture, key use cases, developer tools, and a live demo of a customer implementation.

Core Features

  • Bidirectional streaming API on Amazon Bedrock with low user‑perceived latency, enabling real‑time audio input and output (see the event sketch after this list).
  • Best‑in‑class speech understanding, robust to diverse speaking styles, accents, and background noise.
  • Turn‑taking controllability: developers can configure pause durations that define the end of a turn, allowing natural interruptions and pauses.
  • Sentiment awareness: the model detects tonality and voice‑based sentiment, adapting responses to mirror user emotions.
  • Tool calling & knowledge grounding: responses can be backed by external knowledge bases or invoke APIs (e.g., reservations, membership upgrades) while maintaining factual correctness.
  • Privacy‑first design: built with responsible AI safeguards and compliance features.
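
To make the bidirectional streaming API concrete, here is a rough sketch of the JSON events a client sends when opening a session. The event and field names follow the event-stream pattern published for the first-generation Nova Sonic, so treat them as illustrative rather than authoritative, and the model ID and voice ID are placeholders:

```python
# Sketch: opening a Nova 2 Sonic session over Bedrock's bidirectional stream.
# Event names follow the first-generation Nova Sonic event-stream pattern;
# treat field names and the model ID as illustrative, not authoritative.
import json

MODEL_ID = "amazon.nova-2-sonic-v1:0"  # placeholder; check the Bedrock console
PROMPT_NAME = "demo-prompt"

session_start = {
    "event": {
        "sessionStart": {
            "inferenceConfiguration": {"maxTokens": 1024, "topP": 0.9, "temperature": 0.7}
        }
    }
}

prompt_start = {
    "event": {
        "promptStart": {
            "promptName": PROMPT_NAME,
            "audioOutputConfiguration": {
                "mediaType": "audio/lpcm",
                "sampleRateHertz": 24000,
                "voiceId": "matthew",  # assumed voice ID
            },
        }
    }
}

def audio_input_event(b64_chunk: str) -> bytes:
    """Wrap one base64-encoded microphone chunk as an audioInput event."""
    return json.dumps(
        {"event": {"audioInput": {"promptName": PROMPT_NAME, "content": b64_chunk}}}
    ).encode("utf-8")
```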

New Capabilities in Nova 2 Sonic

Expanded Language Support

  • Seven languages, each with masculine and feminine voices:
    • English (US, GB, India, Australia) – new Indian and Australian variants
    • Spanish
    • French
    • Italian
    • German
    • Hindi
    • Portuguese

Language Switching

  • Users can switch languages mid‑session; the model responds in the newly selected language without restarting the conversation.

Asynchronous Task Completion

  • Long‑running tool calls (e.g., 8–10 seconds) no longer block the dialogue. Users can continue the conversation or invoke other tools while previous calls complete in the background.
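
The session doesn't show the wire-level details, but the application-side pattern is plain asyncio: dispatch slow tool calls as background tasks so the audio loop never blocks. A minimal sketch, where `run_tool` is a stand-in for a slow backend call and `send_tool_result` is a hypothetical callback that writes the result back onto the stream:

```python
import asyncio

async def run_tool(name: str, payload: dict) -> dict:
    """Stand-in for a slow backend call (e.g., a reservation API taking ~8-10 s)."""
    await asyncio.sleep(8)
    return {"status": "ok", "tool": name, "input": payload}

async def handle_tool_use(event: dict, send_tool_result) -> None:
    """Dispatch a tool call without blocking the live audio loop.

    `send_tool_result` is a hypothetical async callback that writes a tool
    result event back onto the bidirectional stream.
    """
    async def run_and_report():
        result = await run_tool(event["toolName"], event["input"])
        await send_tool_result(event["toolUseId"], result)

    asyncio.create_task(run_and_report())  # fire-and-forget; the dialogue continues
```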

Cross‑Modal Input/Output

  • Supports speech ↔ speech, speech ↔ text, and text ↔ text within the same session, preserving conversational context across modalities.
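
As a rough illustration, a text turn inside an otherwise voice session could look like the following. The `textInput` event mirrors the `audioInput` event from the earlier sketch; the exact shape is an assumption based on the first-generation event stream:

```python
# Hypothetical text turn in the same session: textInput mirrors audioInput
# but carries plain text; treat the exact shape as an assumption.
text_turn = {
    "event": {
        "textInput": {
            "promptName": "demo-prompt",  # same prompt as the audio turns
            "content": "Switch to text for the confirmation number, please.",
        }
    }
}
```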

Turn‑Taking Controllability

  • Developers can set the maximum pause length before the model assumes the user’s turn has ended, improving fluidity in multi‑turn dialogues.
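
The talk names this capability but not the exact API field, so the following shape is purely hypothetical; both the configuration key and its placement are placeholders to show where such a knob might live:

```python
# Purely hypothetical shape: the session describes a configurable pause
# threshold, but the field name and placement below are placeholders.
turn_taking_config = {
    "turnDetectionConfiguration": {
        "endOfTurnSilenceThresholdMs": 800,  # silence length that ends the user's turn
    }
}
```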

Architectural Highlights

Ankur Gandhe explains that the unified speech‑to‑speech model replaces traditional cascaded pipelines (ASR → NLU → TTS) with a single end‑to‑end network. Benefits include:

  • Reduced latency and error propagation.
  • Consistent voice characteristics across input and output.
  • Simplified deployment and scaling via Bedrock’s managed service.

Benchmark Results

  • Speech understanding accuracy improves by ~50% on alphanumeric content compared to the previous generation.
  • Achieves state‑of‑the‑art ASR performance on public benchmarks.
  • Demonstrates a competitive quality‑to‑price ratio, positioning Nova 2 Sonic as a cost‑effective solution for enterprise workloads.

Key Use Cases

  • Customer service automation (e.g., call center agents, Amazon Connect).
  • Voice assistants for consumer and enterprise applications.
  • Education apps that require interactive, multimodal dialogue.
  • AI receptionists and appointment booking systems.

Integration Options

Developers can integrate Nova 2 Sonic through:

  • LiveKit and Pipecat SDKs for real‑time streaming (see the Pipecat sketch after this list).
  • Amazon Connect for contact‑center workflows.
  • Telephony partners such as Twilio for PSTN connectivity.
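
As a sketch of the Pipecat route: the import path and constructor arguments below follow Pipecat's Nova Sonic integration as of this writing, so verify them against the current Pipecat docs before depending on them; the voice ID and region are assumptions:

```python
# Sketch: wiring Nova Sonic into a Pipecat pipeline. Import path and
# constructor arguments follow Pipecat's Nova Sonic integration as of this
# writing; verify against the current Pipecat docs.
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService


def build_pipeline(transport) -> Pipeline:
    """Assemble a speech-to-speech pipeline; `transport` is any Pipecat
    transport (LiveKit, Daily, WebRTC, ...) supplying mic/speaker frames."""
    llm = AWSNovaSonicLLMService(
        access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        region="us-east-1",          # assumed region
        voice_id="matthew",          # assumed voice ID; confirm available voices
    )
    return Pipeline([transport.input(), llm, transport.output()])
```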

Customer Demo: AI Receptionist

Amma Pandekar from Cisco showcases a real‑world implementation for a tire retail chain:

  • An AI receptionist handles appointment scheduling, answers queries, and switches between voice and text as needed.
  • Demonstrates multi‑modal dialogue and language switching in a live call.

Getting Started

  1. Enable Nova 2 Sonic in a supported Bedrock Region (IAD, PDX, ARN, NRT — i.e., us‑east‑1, us‑west‑2, eu‑north‑1, and ap‑northeast‑1).
  2. Use the bidirectional streaming API to send audio or text and receive real‑time responses.
  3. Configure turn‑taking parameters and language preferences via the API payload.
  4. Leverage tool‑calling by defining callable functions in your application logic.
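
For step 4, a tool is declared as a JSON schema the model can invoke. A hedged sketch of a toolSpec, mirroring the Converse-style tool configuration used by the first-generation Nova Sonic; the `book_appointment` tool and its fields are hypothetical:

```python
import json

# Hedged sketch of step 4: declaring a callable tool. The toolSpec shape
# mirrors Bedrock's Converse-style tool configuration as used by the
# first-generation Nova Sonic; book_appointment is a hypothetical tool.
tool_configuration = {
    "tools": [
        {
            "toolSpec": {
                "name": "book_appointment",
                "description": "Book a service appointment for a given date and time.",
                "inputSchema": {
                    "json": json.dumps(
                        {
                            "type": "object",
                            "properties": {
                                "date": {"type": "string", "description": "ISO 8601 date"},
                                "time": {"type": "string", "description": "24-hour HH:MM"},
                            },
                            "required": ["date", "time"],
                        }
                    )
                },
            }
        }
    ]
}
```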

For detailed integration guides, see the Amazon Bedrock documentation and the SDK references for LiveKit and Pipecat.

This content reflects the material presented at AWS re:Invent 2025 (session AIM374).
