GPT‑5.3‑Codex‑Spark Research Preview
We’re excited to release a research preview of GPT‑5.3‑Codex‑Spark, a smaller, real‑time‑oriented version of GPT‑5.3‑Codex. This marks the first milestone in our partnership with Cerebras, announced in January.
Why Codex‑Spark?
- Near‑instant responses on ultra‑low‑latency hardware (over 1,000 tokens per second).
- Optimized for real‑time coding: quick edits, logic reshaping, UI refinements, and immediate feedback.
- Complements our frontier models, which excel at long‑running, autonomous tasks (hours, days, weeks).
Key Features
| Feature | Details |
|---|---|
| Context window | 128 k tokens |
| Output type | Text‑only (no images or audio) |
| Performance | > 1,000 tokens/second on Cerebras hardware |
| Use‑case focus | Real‑time code editing and rapid iteration |
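As rough arithmetic, the throughput figure in the table translates directly into wall‑clock streaming time. The token counts below are illustrative examples, not published numbers:

```python
# Back-of-the-envelope: how long a response streams at a steady decode rate.
# The 1,000 tokens/second figure comes from the table above; the token
# counts are made up for illustration.

def stream_seconds(tokens: int, tokens_per_second: float = 1000.0) -> float:
    """Wall-clock time to stream `tokens` at a constant decode rate."""
    return tokens / tokens_per_second

# A ~300-token targeted edit streams in under a third of a second;
# a long 2,000-token response still finishes in about two seconds.
short_edit = stream_seconds(300)
long_reply = stream_seconds(2000)
```

At these speeds, decode time stops being the bottleneck for short edits, which is why the rest of the request pipeline (discussed below) starts to matter.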
Availability
- Who can access? ChatGPT Pro users via the Cerebras platform (research preview).
- Purpose: Enable early experimentation while we scale datacenter capacity, harden the end‑to‑end experience, and prepare larger frontier models for broader release.
Rate Limits & Usage
- Codex‑Spark has its own rate limits separate from standard ChatGPT limits.
- Usage does not count toward your regular quota.
- During periods of high demand you may encounter limited access or temporary queuing as we balance reliability across users.
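The post doesn’t specify a client‑side strategy for limited access or queuing, but a common pattern is to retry with exponential backoff and jitter. A minimal sketch, with all names hypothetical and a simulated endpoint standing in for a real API call:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the error a client library raises on HTTP 429 / queuing."""

def call_with_backoff(request_fn, max_retries=5, base_delay=0.05):
    """Retry `request_fn` on rate-limit errors, doubling the delay each
    attempt and adding jitter so concurrent clients don't retry in lockstep."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)

# Simulated endpoint: rate-limited twice, then succeeds.
calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

result = call_with_backoff(flaky_request)
```

The jitter term keeps many clients that were queued at the same moment from retrying simultaneously and re‑creating the spike that triggered the limit.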
Next Steps
- Experiment: Try Codex‑Spark on your coding tasks and share feedback.
- Iterate: We’ll incorporate developer insights to improve performance, reliability, and feature set.
- Scale: Work with Cerebras to expand capacity and eventually roll out larger models for real‑time coding.
We look forward to seeing what you build with Codex‑Spark!
Speed and Intelligence
Codex‑Spark is optimized for interactive work where latency matters as much as intelligence. You can collaborate with the model in real time—interrupting or redirecting it as it works—and rapidly iterate with near‑instant responses.
Key characteristics
- Fast, lightweight workflow – By default, Codex‑Spark makes minimal, targeted edits.
- User‑controlled testing – It won’t automatically run tests unless you explicitly ask it to.
- Real‑time collaboration – You can pause, steer, or adjust the model on the fly, keeping the development loop tight and responsive.
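One way to picture the pause/steer loop described above: a client consumes the token stream but can be interrupted at any point, keeping whatever has arrived so far. A minimal sketch, with `threading.Event` standing in for the real interrupt mechanism (not the actual Codex client):

```python
import threading

def consume_until_interrupted(tokens, stop: threading.Event) -> str:
    """Accumulate streamed tokens, stopping as soon as the user interrupts."""
    out = []
    for tok in tokens:
        if stop.is_set():
            break  # user hit "stop" or redirected the model
        out.append(tok)
    return "".join(out)

# Demo: a fake token stream that gets interrupted after the third token.
stop = threading.Event()

def fake_stream():
    for i in range(10):
        yield f"t{i} "
        if i == 2:          # simulate the user interrupting mid-stream
            stop.set()

partial = consume_until_interrupted(fake_stream(), stop)
```

The point of the pattern is that interruption is cheap: the partial output is kept, and the next request can build on it instead of starting over.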
Coding
Codex‑Spark is a highly capable small model optimized for fast inference.
On SWE‑Bench Pro and Terminal‑Bench 2.0—two benchmarks evaluating agentic software‑engineering capability—GPT‑5.3‑Codex‑Spark demonstrates strong performance while completing tasks in a fraction of the time required by GPT‑5.3‑Codex.
Latency Improvements for All Models
While training Codex‑Spark, we discovered that model speed alone isn’t enough for real‑time collaboration. Reducing latency across the entire request‑response pipeline became a priority. The following end‑to‑end enhancements have been added to our harness and will benefit all models:
What We Changed
- Streaming pipeline – Optimized how responses stream between client and server.
- Inference stack – Rewrote key components for faster processing.
- Session initialization – Made the first visible token appear sooner, keeping Codex responsive during iteration.
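A simple way to observe the effect of changes like these is to instrument a token stream for time‑to‑first‑token and total time. A sketch using a stand‑in generator rather than the actual Codex client:

```python
import time

def measure_stream(tokens):
    """Consume a token stream, returning (text, time_to_first_token, total_time)."""
    start = time.perf_counter()
    first = None
    parts = []
    for tok in tokens:
        if first is None:
            first = time.perf_counter() - start  # time-to-first-token
        parts.append(tok)
    total = time.perf_counter() - start
    return "".join(parts), first if first is not None else 0.0, total

# Stand-in stream: 20 ms of "session startup" before the first token arrives.
def fake_stream():
    time.sleep(0.02)
    yield "Hello"
    yield ", world"

text, ttft, total = measure_stream(fake_stream())
```

Time‑to‑first‑token is the number the session‑initialization work targets; total time divided by token count approximates per‑token overhead plus decode time.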
Technical Highlights
| Improvement | Impact |
|---|---|
| Persistent WebSocket connection (enabled by default for Codex‑Spark) | Reduces client/server round‑trip overhead by ≈ 80 % |
| Optimizations inside the Responses API | Cuts per‑token overhead by ≈ 30 % |
| Faster session start‑up | Decreases time‑to‑first‑token by ≈ 50 % |
Note: The WebSocket path will become the default for all models shortly.
These changes collectively lower latency throughout the request‑response lifecycle, delivering a smoother, more responsive experience for developers using any of our models.
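To see how the three percentage figures in the table compose, here is a toy end‑to‑end latency model. The component costs are invented for illustration; only the reductions (≈80% round‑trip, ≈30% per‑token, ≈50% startup) come from the table above:

```python
# Toy latency model for one streamed response:
#   total = startup + round_trip + n_tokens * (per_token_overhead + decode)
# All absolute costs below (seconds) are made-up illustrative values.

def request_latency(n_tokens, startup, round_trip, per_token, decode):
    return startup + round_trip + n_tokens * (per_token + decode)

before = request_latency(500, startup=0.40, round_trip=0.10,
                         per_token=0.0010, decode=0.001)

# Apply the table's reductions: startup x0.5, round-trip x0.2, per-token x0.7.
after = request_latency(500, startup=0.40 * 0.5, round_trip=0.10 * 0.2,
                        per_token=0.0010 * 0.7, decode=0.001)
```

Under these assumed costs, the same 500‑token request drops from 1.50 s to about 1.07 s even though raw decode speed is unchanged, which is the post’s point: pipeline overhead, not just model speed, determines how responsive the loop feels.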
Powered by Cerebras
Codex‑Spark runs on Cerebras’ Wafer Scale Engine 3 — a purpose‑built AI accelerator for high‑speed inference that gives Codex a latency‑first serving tier. We partnered with Cerebras to add this low‑latency path to the same production serving stack as the rest of our fleet, so it works seamlessly across Codex and sets us up to support future models.
“What excites us most about GPT‑5.3‑Codex‑Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible — new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning.”
— Sean Lie, CTO and Co‑Founder of Cerebras
- GPUs remain foundational across our training and inference pipelines and deliver the most cost‑effective tokens for broad usage.
- Cerebras complements that foundation by excelling at workflows that demand extremely low latency, tightening the end‑to‑end loop so Codex feels more responsive as you iterate.
- Hybrid deployments: GPUs and Cerebras hardware can be combined within a single workload to achieve the best overall performance.
Availability & Details
Codex‑Spark is rolling out today as a research preview for ChatGPT Pro users in the latest versions of:
- the Codex app
- the CLI
- the VS Code extension
Rate Limits
Because it runs on specialized low‑latency hardware, usage is governed by a separate rate limit that may adjust based on demand during the preview.
API Access
We are also making Codex‑Spark available via the API for a small set of design partners. This will help us understand how developers want to integrate the model into their products. Access will be expanded over the coming weeks as we continue tuning the integration under real workloads.
Model Capabilities
- Text‑only with a 128 k token context window
- First model in a family of ultra‑fast models
As we learn more from the developer community about where fast models shine for coding, we’ll introduce additional capabilities, including:
- Larger models
- Longer context lengths
- Multimodal input
Safety & Evaluation
Codex‑Spark includes the same safety training as our mainline models, covering cyber‑relevant scenarios. It was evaluated as part of our standard deployment process, which includes baseline assessments for cybersecurity and other capabilities. The model does not meet the Preparedness Framework threshold for high capability in cybersecurity or biology.
What’s Next
Codex‑Spark is the first step toward a Codex with two complementary modes:
- Long‑horizon reasoning & execution – handling complex, time‑consuming tasks.
- Real‑time collaboration – enabling rapid, interactive iteration.
Over time these modes will blend:
- Tight interactive loops keep you in the moment while background sub‑agents handle longer‑running work.
- Parallel task fanning lets many models work simultaneously, giving you breadth and speed without committing to a single mode up front.
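“Parallel task fanning” can be sketched as a plain asyncio fan‑out, where several independent tasks are dispatched to model workers concurrently. Here `run_task` is a hypothetical stand‑in for a real model call, and the task names are invented:

```python
import asyncio

async def run_task(name: str) -> str:
    """Stand-in for dispatching one coding task to a model worker."""
    await asyncio.sleep(0.01)  # placeholder for model latency
    return f"{name}: done"

async def fan_out(tasks):
    """Run all tasks concurrently and collect results in input order."""
    return await asyncio.gather(*(run_task(t) for t in tasks))

results = asyncio.run(fan_out(["fix-tests", "refactor-ui", "write-docs"]))
```

Because the tasks run concurrently, total wall‑clock time is governed by the slowest task rather than the sum — the “breadth and speed” the post describes.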
As models become more capable, interaction speed emerges as a clear bottleneck. Ultra‑fast inference tightens the feedback loop, making Codex feel more natural to use and expanding what’s possible for anyone turning an idea into working software.