Introducing GPT-5.3-Codex-Spark
Source: OpenAI Blog
Research Preview: GPT‑5.3‑Codex‑Spark
A smaller, real‑time coding model built in partnership with Cerebras.
📢 What’s new?
- Codex‑Spark is the first model designed for instant‑feedback coding.
- Optimized for ultra‑low‑latency hardware, it can generate more than 1,000 tokens per second while staying highly capable on real‑world programming tasks.
- Available now as a research preview for ChatGPT Pro users.
🤝 Partnership with Cerebras
- This launch marks the first milestone of the OpenAI × Cerebras partnership announced in January.
- We are working with Cerebras to:
- Scale datacenter capacity.
- Harden the end‑to‑end user experience.
- Deploy larger frontier models in the future.
🛠️ Model capabilities
| Feature | Details |
|---|---|
| Context window | 128 k tokens |
| Output type | Text‑only |
| Primary use‑case | Real‑time code edits, logic reshaping, UI refinements with immediate results |
| Long‑running tasks | Still supported – Codex‑Spark complements existing models that can run autonomously for hours/days/weeks. |
🚀 How to access
- Who can use it? ChatGPT Pro users (research preview).
- Rate limits: Codex‑Spark has its own limits; usage does not count toward your standard ChatGPT quotas.
- Potential throttling: When demand spikes, you may encounter limited access or temporary queuing as we balance reliability across all users.
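When you do hit temporary throttling, the standard client-side response is exponential backoff with jitter. A minimal sketch, assuming your client surfaces a rate-limit error of some kind (the `RateLimitError` class and the parameter values here are illustrative, not part of any Codex‑Spark SDK):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for whatever throttling error your client raises."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a throttled call with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            # Double the wait each attempt; jitter spreads out retries
            # so clients don't all hammer the server in lockstep.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)
    raise RateLimitError("out of retries")
```

Jitter matters under demand spikes: without it, every throttled client retries at the same instant and recreates the spike.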
📋 What we’re looking for
- Developer feedback on real‑time coding workflows.
- Insights on how the model performs for both instant edits and long‑running projects.
- Suggestions for future improvements and feature expansions.
Speed and Intelligence
Codex‑Spark is optimized for interactive work where latency matters as much as intelligence. You can collaborate with the model in real time—interrupting or redirecting it as it works—and rapidly iterate with near‑instant responses.
Because it’s tuned for speed, Codex‑Spark keeps its default working style lightweight:
- Minimal, targeted edits – only the changes you need.
- No automatic test runs – tests are executed only when you request them.
Coding
Codex‑Spark is a highly capable, small model optimized for fast inference. On SWE‑Bench Pro and Terminal‑Bench 2.0—two benchmarks that evaluate agentic software‑engineering capability—GPT‑5.3‑Codex‑Spark demonstrates strong performance while completing the tasks in a fraction of the time compared to GPT‑5.3‑Codex.
Latency Improvements for All Models
While training Codex‑Spark, we discovered that model speed alone isn’t enough for real‑time collaboration. Reducing latency across the entire request‑response pipeline became essential. The following end‑to‑end enhancements have been added to our harness and will benefit all models:
What We Changed
- Streaming pipeline – Optimized how responses flow from client ↔ server.
- Inference stack – Rewrote critical components for faster execution.
- Session initialization – Made the first visible token appear sooner, keeping Codex responsive during iteration.
- Persistent WebSocket connection – Introduced a dedicated, long‑lived channel for communication (enabled by default for Codex‑Spark and soon for every model).
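The gain from a persistent channel comes from paying connection setup once instead of once per message. A rough illustration with raw TCP sockets (not the actual Codex transport, just the underlying idea): both paths below return identical responses, but the persistent one reuses a single long‑lived connection across all requests.

```python
import socket
import threading

def start_echo_server():
    """Start a local TCP echo server on an ephemeral port; return the port."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen()
    port = srv.getsockname()[1]

    def handle(conn):
        with conn:
            while data := conn.recv(1024):
                conn.sendall(data)

    def serve():
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=serve, daemon=True).start()
    return port

def per_request(port, msgs):
    # Fresh connection per message: one handshake for every request.
    out = []
    for m in msgs:
        with socket.create_connection(("127.0.0.1", port)) as s:
            s.sendall(m)
            out.append(s.recv(1024))
    return out

def persistent(port, msgs):
    # One long-lived connection (WebSocket-style): a single handshake,
    # then every message reuses the open channel.
    out = []
    with socket.create_connection(("127.0.0.1", port)) as s:
        for m in msgs:
            s.sendall(m)
            out.append(s.recv(1024))
    return out
```

On top of the TCP handshake, a real per-request path typically also repeats TLS negotiation and authentication, which is why eliminating it moves round‑trip overhead so much.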
Quantitative Gains
| Metric | Improvement |
|---|---|
| Client/Server round‑trip overhead | ‑80 % |
| Per‑token processing overhead | ‑30 % |
| Time‑to‑first‑token (TTFT) | ‑50 % |
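To see how the three reductions compound, consider a single streamed response. With illustrative baseline figures (the numbers below are assumptions for the sketch, not measurements from this post), the table's percentages apply roughly like this:

```python
# Illustrative baseline figures (assumptions, not OpenAI's measurements)
ROUND_TRIP_MS = 100.0   # client/server round-trip overhead
TTFT_MS = 400.0         # time-to-first-token
PER_TOKEN_MS = 10.0     # per-token processing overhead

def total_latency_ms(round_trip, ttft, per_token, n_tokens):
    # One round trip to start the request, wait for the first token,
    # then stream the output at the per-token rate.
    return round_trip + ttft + n_tokens * per_token

before = total_latency_ms(ROUND_TRIP_MS, TTFT_MS, PER_TOKEN_MS, 500)
# Apply the table's reductions: -80% round trip, -50% TTFT, -30% per token.
after = total_latency_ms(ROUND_TRIP_MS * 0.2, TTFT_MS * 0.5, PER_TOKEN_MS * 0.7, 500)
print(before, after)  # 5500.0 3720.0
```

Note that for long responses the per‑token term dominates, while for short interactive exchanges the round‑trip and TTFT cuts are what you actually feel.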
What This Means for You
- Faster feedback – The first token shows up much sooner, improving the interactive feel.
- Smoother iterations – Reduced per‑token latency makes continuous editing feel seamless.
- Unified experience – The WebSocket path will become the default for all models, ensuring consistent performance across the platform.
Powered by Cerebras
Codex‑Spark runs on Cerebras’ Wafer Scale Engine 3 — a purpose‑built AI accelerator for high‑speed inference that gives Codex a latency‑first serving tier. We partnered with Cerebras to add this low‑latency path to the same production serving stack as the rest of our fleet, so it works seamlessly across Codex and sets us up to support future models.
“What excites us most about GPT‑5.3‑Codex‑Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible — new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning.”
— Sean Lie, CTO and Co‑Founder of Cerebras
- GPUs remain foundational across our training and inference pipelines and deliver the most cost‑effective tokens for broad usage.
- Cerebras complements that foundation by excelling at workflows that demand extremely low latency, tightening the end‑to‑end loop so Codex feels more responsive as you iterate.
- GPUs and Cerebras hardware can be combined within a single workload when that yields the best performance.
Availability & Details
Codex‑Spark is rolling out today as a research preview for ChatGPT Pro users in the latest versions of:
- the Codex app
- the CLI
- the VS Code extension
Because it runs on specialized low‑latency hardware, usage is governed by a separate rate limit that may adjust based on demand during the preview.
API Access
- Currently available to a small set of design partners.
- Goal: understand how developers want to integrate Codex‑Spark into their products.
- Wider access will be expanded over the coming weeks as we tune the integration under real workloads.
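If you are integrating a streaming endpoint, time‑to‑first‑token is worth measuring on your side too. A transport‑agnostic sketch that works with any iterator of text chunks (how you obtain the iterator is up to your client library; nothing here is specific to the Codex‑Spark API):

```python
import time

def consume_stream(token_iter):
    """Drain a token stream, recording time-to-first-token (TTFT).

    Returns the assembled text and the TTFT in seconds
    (None if the stream was empty).
    """
    start = time.monotonic()
    ttft = None
    chunks = []
    for tok in token_iter:
        if ttft is None:
            # First chunk arrived: this is what "feels fast" to a user.
            ttft = time.monotonic() - start
        chunks.append(tok)
    return "".join(chunks), ttft
```

Tracking TTFT separately from total completion time makes it easy to tell whether slowness comes from session setup or from generation throughput.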
Model Capabilities
- Text‑only with a 128 k token context window.
- First model in a family of ultra‑fast models.
- Future enhancements (based on developer feedback) may include:
- Larger models
- Longer context lengths
- Multimodal input
Safety & Evaluation
- Includes the same safety training as our mainline models, covering cyber‑relevant scenarios.
- Evaluated through our standard deployment process, which includes baseline assessments for cybersecurity and other capabilities.
- Determined not to meet the Preparedness Framework threshold for high capability in cybersecurity or biology.
What’s Next
Codex‑Spark is the first step toward a Codex with two complementary modes:
- Longer‑horizon reasoning and execution
- Real‑time collaboration for rapid iteration
Over time, these modes will blend. Codex can keep you in a tight interactive loop while delegating longer‑running work to sub‑agents in the background, or it can fan out tasks to many models in parallel when you need breadth and speed. This means you won’t have to choose a single mode up front.
As models become more capable, interaction speed becomes a clear bottleneck. Ultra‑fast inference tightens that loop, making Codex feel more natural to use and expanding what’s possible for anyone turning an idea into working software.