GPT‑5.3‑Codex‑Spark Research Preview
We’re excited to release a research preview of GPT‑5.3‑Codex‑Spark, a smaller, real‑time‑oriented version of GPT‑5.3‑Codex. This marks the first milestone in our partnership with Cerebras, announced in January.
Why Codex‑Spark?
- Near‑instant responses on ultra‑low‑latency hardware (over 1,000 tokens per second).
- Optimized for real‑time coding: quick edits, logic reshaping, UI refinements, and immediate feedback.
- Complements our frontier models, which excel at long‑running, autonomous tasks (hours, days, weeks).
Key Features
| Feature | Details |
|---|---|
| Context window | 128 k tokens |
| Output type | Text‑only (no images or audio) |
| Performance | > 1,000 tokens/second on Cerebras hardware |
| Use‑case focus | Real‑time code editing and rapid iteration |
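As rough arithmetic, the throughput figure in the table translates directly into wall‑clock streaming time. The token counts below are illustrative examples, not published numbers:

```python
# Back-of-the-envelope: how long a response streams at a steady decode rate.
# The 1,000 tokens/second figure comes from the table above; the token
# counts are made up for illustration.

def stream_seconds(tokens: int, tokens_per_second: float = 1000.0) -> float:
    """Wall-clock time to stream `tokens` at a constant decode rate."""
    return tokens / tokens_per_second

# A ~300-token targeted edit streams in under a third of a second;
# a long 2,000-token response still finishes in about two seconds.
short_edit = stream_seconds(300)
long_reply = stream_seconds(2000)
```

At these speeds, decode time stops being the bottleneck for short edits, which is why the rest of the request pipeline (discussed below) starts to matter.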
Availability
- Who can access? ChatGPT Pro users via the Cerebras platform (research preview).
- Purpose: Enable early experimentation while we scale datacenter capacity, harden the end‑to‑end experience, and prepare larger frontier models for broader release.
Rate Limits & Usage
- Codex‑Spark has its own rate limits separate from standard ChatGPT limits.
- Usage does not count toward your regular quota.
- During periods of high demand you may encounter limited access or temporary queuing as we balance reliability across users.
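The post doesn’t specify a client‑side strategy for limited access or queuing, but a common pattern is to retry with exponential backoff and jitter. A minimal sketch, with all names hypothetical and a simulated endpoint standing in for a real API call:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the error a client library raises on HTTP 429 / queuing."""

def call_with_backoff(request_fn, max_retries=5, base_delay=0.05):
    """Retry `request_fn` on rate-limit errors, doubling the delay each
    attempt and adding jitter so concurrent clients don't retry in lockstep."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)

# Simulated endpoint: rate-limited twice, then succeeds.
calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

result = call_with_backoff(flaky_request)
```

The jitter term keeps many clients that were queued at the same moment from retrying simultaneously and re‑creating the spike that triggered the limit.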
Next Steps
- Experiment: Try Codex‑Spark on your coding tasks and share feedback.
- Iterate: We’ll incorporate developer insights to improve performance, reliability, and feature set.
- Scale: Work with Cerebras to expand capacity and eventually roll out larger models for real‑time coding.
We look forward to seeing what you build with Codex‑Spark!
Speed and Intelligence
Codex‑Spark is optimized for interactive work where latency matters as much as intelligence. You can collaborate with the model in real time—interrupting or redirecting it as it works—and rapidly iterate with near‑instant responses.
Key characteristics
- Fast, lightweight workflow – By default, Codex‑Spark makes minimal, targeted edits.
- User‑controlled testing – It won’t automatically run tests unless you explicitly ask it to.
- Real‑time collaboration – You can pause, steer, or adjust the model on the fly, keeping the development loop tight and responsive.
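One way to picture the pause/steer loop described above: a client consumes the token stream but can be interrupted at any point, keeping whatever has arrived so far. A minimal sketch, with `threading.Event` standing in for the real interrupt mechanism (not the actual Codex client):

```python
import threading

def consume_until_interrupted(tokens, stop: threading.Event) -> str:
    """Accumulate streamed tokens, stopping as soon as the user interrupts."""
    out = []
    for tok in tokens:
        if stop.is_set():
            break  # user hit "stop" or redirected the model
        out.append(tok)
    return "".join(out)

# Demo: a fake token stream that gets interrupted after the third token.
stop = threading.Event()

def fake_stream():
    for i in range(10):
        yield f"t{i} "
        if i == 2:          # simulate the user interrupting mid-stream
            stop.set()

partial = consume_until_interrupted(fake_stream(), stop)
```

The point of the pattern is that interruption is cheap: the partial output is kept, and the next request can build on it instead of starting over.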
Coding
Codex‑Spark is a highly capable small model optimized for fast inference.
On SWE‑Bench Pro and Terminal‑Bench 2.0—two benchmarks evaluating agentic software‑engineering capability—GPT‑5.3‑Codex‑Spark demonstrates strong performance while completing tasks in a fraction of the time required by GPT‑5.3‑Codex.
Latency Improvements for All Models
While training Codex‑Spark, we discovered that model speed alone isn’t enough for real‑time collaboration. Reducing latency across the entire request‑response pipeline became a priority. The following end‑to‑end enhancements have been added to our harness and will benefit all models:
What We Changed
- Streaming pipeline – Optimized how responses stream between client and server.
- Inference stack – Rewrote key components for faster processing.
- Session initialization – Made the first visible token appear sooner, keeping Codex responsive during iteration.
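A simple way to observe the effect of changes like these is to instrument a token stream for time‑to‑first‑token and total time. A sketch using a stand‑in generator rather than the actual Codex client:

```python
import time

def measure_stream(tokens):
    """Consume a token stream, returning (text, time_to_first_token, total_time)."""
    start = time.perf_counter()
    first = None
    parts = []
    for tok in tokens:
        if first is None:
            first = time.perf_counter() - start  # time-to-first-token
        parts.append(tok)
    total = time.perf_counter() - start
    return "".join(parts), first if first is not None else 0.0, total

# Stand-in stream: 20 ms of "session startup" before the first token arrives.
def fake_stream():
    time.sleep(0.02)
    yield "Hello"
    yield ", world"

text, ttft, total = measure_stream(fake_stream())
```

Time‑to‑first‑token is the number the session‑initialization work targets; total time divided by token count approximates per‑token overhead plus decode time.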
Technical Highlights
| Improvement | Impact |
|---|---|
| Persistent WebSocket connection (enabled by default for Codex‑Spark) | Reduces client/server round‑trip overhead by ≈ 80 % |
| Optimizations inside the Responses API | Cuts per‑token overhead by ≈ 30 % |
| Faster session start‑up | Decreases time‑to‑first‑token by ≈ 50 % |
Note: The WebSocket path will become the default for all models shortly.
These changes collectively lower latency throughout the request‑response lifecycle, delivering a smoother, more responsive experience for developers using any of our models.
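To see how the three percentage figures in the table compose, here is a toy end‑to‑end latency model. The component costs are invented for illustration; only the reductions (≈80% round‑trip, ≈30% per‑token, ≈50% startup) come from the table above:

```python
# Toy latency model for one streamed response:
#   total = startup + round_trip + n_tokens * (per_token_overhead + decode)
# All absolute costs below (seconds) are made-up illustrative values.

def request_latency(n_tokens, startup, round_trip, per_token, decode):
    return startup + round_trip + n_tokens * (per_token + decode)

before = request_latency(500, startup=0.40, round_trip=0.10,
                         per_token=0.0010, decode=0.001)

# Apply the table's reductions: startup x0.5, round-trip x0.2, per-token x0.7.
after = request_latency(500, startup=0.40 * 0.5, round_trip=0.10 * 0.2,
                        per_token=0.0010 * 0.7, decode=0.001)
```

Under these assumed costs, the same 500‑token request drops from 1.50 s to about 1.07 s even though raw decode speed is unchanged, which is the post’s point: pipeline overhead, not just model speed, determines how responsive the loop feels.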
Powered by Cerebras
Codex‑Spark runs on Cerebras’ Wafer Scale Engine 3 — a purpose‑built AI accelerator for high‑speed inference that gives Codex a latency‑first serving tier. We partnered with Cerebras to add this low‑latency path to the same production serving stack as the rest of our fleet, so it works seamlessly across Codex and sets us up to support future models.
“What excites us most about GPT‑5.3‑Codex‑Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible — new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning.”
— Sean Lie, CTO and Co‑Founder of Cerebras
- GPUs remain foundational across our training and inference pipelines and deliver the most cost‑effective tokens for broad usage.
- Cerebras complements that foundation by excelling at workflows that demand extremely low latency, tightening the end‑to‑end loop so Codex feels more responsive as you iterate.
- Hybrid deployments: GPUs and Cerebras hardware can be combined within a single workload to achieve the best overall performance.
Availability & Details
Codex‑Spark is rolling out today as a research preview for ChatGPT Pro users in the latest versions of:
- the Codex app
- the CLI
- the VS Code extension
Rate Limits
Because it runs on specialized low‑latency hardware, usage is governed by a separate rate limit that may adjust based on demand during the preview.
API Access
We are also making Codex‑Spark available via the API for a small set of design partners. This will help us understand how developers want to integrate the model into their products. Access will be expanded over the coming weeks as we continue tuning the integration under real workloads.
Model Capabilities
- Text‑only with a 128 k token context window
- First model in a family of ultra‑fast models
As we learn more from the developer community about where fast models shine for coding, we’ll introduce additional capabilities, including:
- Larger models
- Longer context lengths
- Multimodal input
Safety & Evaluation
Codex‑Spark includes the same safety training as our mainline models, covering cyber‑relevant scenarios. It was evaluated as part of our standard deployment process, which includes baseline assessments for cybersecurity and other capabilities. The model does not meet the Preparedness Framework threshold for high capability in cybersecurity or biology.
What’s Next
Codex‑Spark is the first step toward a Codex with two complementary modes:
- Long‑horizon reasoning & execution – handling complex, time‑consuming tasks.
- Real‑time collaboration – enabling rapid, interactive iteration.
Over time these modes will blend:
- Tight interactive loops keep you in the moment while background sub‑agents handle longer‑running work.
- Parallel task fanning lets many models work simultaneously, giving you breadth and speed without committing to a single mode up front.
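“Parallel task fanning” can be sketched as a plain asyncio fan‑out, where several independent tasks are dispatched to model workers concurrently. Here `run_task` is a hypothetical stand‑in for a real model call, and the task names are invented:

```python
import asyncio

async def run_task(name: str) -> str:
    """Stand-in for dispatching one coding task to a model worker."""
    await asyncio.sleep(0.01)  # placeholder for model latency
    return f"{name}: done"

async def fan_out(tasks):
    """Run all tasks concurrently and collect results in input order."""
    return await asyncio.gather(*(run_task(t) for t in tasks))

results = asyncio.run(fan_out(["fix-tests", "refactor-ui", "write-docs"]))
```

Because the tasks run concurrently, total wall‑clock time is governed by the slowest task rather than the sum — the “breadth and speed” the post describes.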
As models become more capable, interaction speed emerges as a clear bottleneck. Ultra‑fast inference tightens the feedback loop, making Codex feel more natural to use and expanding what’s possible for anyone turning an idea into working software.