OpenAI has new voice models that reason, translate, and transcribe as you speak
Source: 9to5Mac

Developers can build new app experiences with OpenAI’s 3 new voice models
OpenAI announced three new voice models designed for reasoning, translation, and transcription:
- GPT‑Realtime‑2 – the first voice model with GPT‑5‑class reasoning, capable of handling complex requests and maintaining natural conversation flow.
- GPT‑Realtime‑Translate – a live translation model that converts speech from 70+ input languages into 13 output languages in real time.
- GPT‑Realtime‑Whisper – a streaming speech‑to‑text model that transcribes speech live as the speaker talks.
GPT‑Realtime‑2
Built for live voice interactions, GPT‑Realtime‑2 keeps conversations moving while it:
- Reasons through requests
- Calls tools
- Handles corrections or interruptions
- Responds in context‑appropriate ways
GPT‑Realtime‑Translate
GPT‑Realtime‑Translate supports 70+ input languages and 13 output languages, enabling real‑time multilingual conversations.
GPT‑Realtime‑Whisper
GPT‑Realtime‑Whisper is a low‑latency streaming transcription model that transcribes audio as people speak, which makes live products such as instant captions and real‑time meeting notes feel faster and more natural.
Pricing
All three models are available through OpenAI’s Realtime API:
- GPT‑Realtime‑2: $32 per 1M audio input tokens ($0.40 for cached input tokens) and $64 per 1M audio output tokens.
- GPT‑Realtime‑Translate: $0.034 per minute.
- GPT‑Realtime‑Whisper: $0.017 per minute.
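To get a feel for what these rates mean in practice, here is a rough cost estimator in Python. It is only a sketch built on the prices quoted above: the function and constant names are made up for illustration, and it assumes the $0.40 cached-input rate is also charged per 1M tokens, which the pricing line does not spell out. It does not call any OpenAI API.

```python
from decimal import Decimal

# Prices as quoted in the article, in US dollars.
REALTIME_2_INPUT_PER_M = Decimal("32")          # per 1M audio input tokens
REALTIME_2_CACHED_INPUT_PER_M = Decimal("0.40")  # ASSUMPTION: also per 1M tokens
REALTIME_2_OUTPUT_PER_M = Decimal("64")         # per 1M audio output tokens
TRANSLATE_PER_MINUTE = Decimal("0.034")
WHISPER_PER_MINUTE = Decimal("0.017")

MILLION = Decimal(1_000_000)


def realtime_2_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate GPT-Realtime-2 cost in dollars from token counts (hypothetical helper)."""
    total = (
        Decimal(input_tokens) * REALTIME_2_INPUT_PER_M
        + Decimal(cached_input_tokens) * REALTIME_2_CACHED_INPUT_PER_M
        + Decimal(output_tokens) * REALTIME_2_OUTPUT_PER_M
    )
    return total / MILLION


def per_minute_cost(minutes, rate_per_minute):
    """Estimate cost in dollars for the per-minute models (hypothetical helper)."""
    # Going through str() avoids binary floating-point noise for inputs like 1.5.
    return Decimal(str(minutes)) * rate_per_minute


if __name__ == "__main__":
    # One million input tokens plus half a million output tokens on GPT-Realtime-2.
    print("GPT-Realtime-2:", realtime_2_cost(1_000_000, 500_000))
    # One hour of live translation and one hour of live transcription.
    print("Translate, 60 min:", per_minute_cost(60, TRANSLATE_PER_MINUTE))
    print("Whisper, 60 min:", per_minute_cost(60, WHISPER_PER_MINUTE))
```

At the quoted rates, an hour of live translation works out to $2.04 and an hour of live transcription to $1.02. Token counts for GPT‑Realtime‑2 depend on how much audio goes in and out, so actual bills will vary with usage.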
Try them out
Test the new realtime voice models in the Playground. If you have Codex installed, OpenAI also provides a prompt you can submit to add GPT‑Realtime‑2 to an existing app or create a new one with it.
Learn more about OpenAI’s latest voice models and see how companies are already using the technology here.