OpenAI launches new voice intelligence features in its API
Source: TechCrunch
New Voice Models
GPT‑Realtime‑2
- A voice model designed to produce realistic‑sounding speech and hold a conversation with users.
- Incorporates GPT‑5‑class reasoning to handle more complex user requests, improving on its predecessor (GPT‑Realtime‑1.5).
GPT‑Realtime‑Translate
- Provides real‑time translation services that “keep pace” with the user in a conversational flow.
- Supports 70+ input languages (languages it can understand) and 13 output languages (languages it can speak).
GPT‑Realtime‑Whisper
- Offers live speech‑to‑text, transcribing speech as the conversation happens.
“Together, the models we are launching move real‑time audio from simple call‑and‑response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” the company said.
Potential Use Cases
These updates are valuable for:
- Expanding customer‑service capabilities
- Education platforms
- Media production
- Event management
- Creator platforms
- Other applications that benefit from real‑time voice interaction
Safety Measures
OpenAI has implemented guardrails to prevent misuse, such as spam, fraud, or other forms of online abuse. Specific triggers can halt conversations that violate the company’s harmful‑content guidelines.
Availability and Pricing
All new voice models are available through OpenAI’s Realtime API.
- Translate and Whisper are billed by the minute.
- GPT‑Realtime‑2 is billed based on token consumption.
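To make the two billing schemes concrete, here is a minimal Python sketch of how a developer might estimate costs under each one. The article gives no prices, so every rate below is a made-up placeholder, and the names `Rates`, `per_minute_cost`, and `token_cost` are hypothetical helpers rather than part of any OpenAI SDK. The sketch also assumes per-minute billing is prorated by the second, which the article does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Placeholder values for illustration only; NOT OpenAI's published prices.
    per_minute_usd: float
    per_million_tokens_usd: float


def per_minute_cost(seconds: float, rates: Rates) -> float:
    """Estimate cost for a minute-billed model (Translate or Whisper).

    Assumes usage is prorated by the second; actual rounding rules may differ.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return (seconds / 60.0) * rates.per_minute_usd


def token_cost(tokens: int, rates: Rates) -> float:
    """Estimate cost for a token-billed model (GPT-Realtime-2).

    Uses one flat rate; real pricing may separate input and output tokens.
    """
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return (tokens / 1_000_000) * rates.per_million_tokens_usd


if __name__ == "__main__":
    example = Rates(per_minute_usd=0.5, per_million_tokens_usd=10.0)
    print(f"90 s of audio:  ${per_minute_cost(90, example):.2f}")
    print(f"250,000 tokens: ${token_cost(250_000, example):.2f}")
```

The practical difference is that minute-billed costs scale with how long a session lasts, while token-billed costs scale with how much is said and generated, so a long but quiet call could cost differently under each scheme.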