Introducing GPT-5.4 mini and nano
Source: OpenAI Blog
📦 Model Highlights
| Model | Size | Speed & Cost | Key Strengths |
|---|---|---|---|
| GPT‑5.4 mini | “mini” (xhigh) | More than 2× faster than GPT‑5 mini | Coding, reasoning, multimodal understanding, tool use; approaches GPT‑5.4 performance on many benchmarks |
| GPT‑5.4 nano | “nano” (xhigh) | Smallest and cheapest GPT‑5.4 variant | Classification, data extraction, ranking, simple coding sub‑agents |
Both models are designed for latency‑sensitive product experiences: coding assistants, sub‑agents that finish supporting tasks quickly, computer‑using systems that interpret screenshots, and real‑time multimodal applications.
Note – In many settings the best model isn’t the biggest one; it’s the one that can respond quickly, use tools reliably, and still handle complex professional tasks.
🛠️ When to Use Which Model
- GPT‑5.4 mini – Ideal for:
- Fast‑iteration coding workflows (targeted edits, code‑base navigation, front‑end generation, debugging loops)
- Systems that combine models of different sizes (e.g., larger GPT‑5.4 does planning, mini handles narrow sub‑tasks)
→ Available in the API, Codex, and ChatGPT
- GPT‑5.4 nano – Ideal for:
- Classification, data extraction, ranking
- Simple coding sub‑agents that handle supporting tasks
→ Available only via the API
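The guidance above can be sketched as a simple task router that picks the smallest model able to handle each task category. This is a minimal illustration: the categories and the routing logic are my own assumptions, and the model identifiers follow the naming in this post rather than confirmed API model strings.

```python
# Illustrative router: pick the smallest model that fits the task category.
# Task categories and the mapping are assumptions based on the guidance above.
NANO_TASKS = {"classification", "data_extraction", "ranking", "simple_subagent"}
MINI_TASKS = {"code_edit", "codebase_navigation", "frontend_generation", "debugging"}

def choose_model(task_category: str) -> str:
    """Return a model name (hypothetical identifiers) for a task category."""
    if task_category in NANO_TASKS:
        return "gpt-5.4-nano"
    if task_category in MINI_TASKS:
        return "gpt-5.4-mini"
    # Fall back to the full model for planning or open-ended work.
    return "gpt-5.4"

print(choose_model("classification"))  # gpt-5.4-nano
print(choose_model("debugging"))       # gpt-5.4-mini
```

In a real system the fallback branch is where the larger model's planning happens, matching the mixed-size composition described above.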
📊 Benchmark Performance
1️⃣ Core Benchmarks (All Models)
| Model | SWE‑Bench Pro (Public) | Terminal‑Bench 2.0 | Toolathon | GPQA Diamond | OSWorld‑Verified |
|---|---|---|---|---|---|
| GPT‑5.4 (xhigh) | 57.7 % | 75.1 % | 54.6 % | 93.0 % | 75.0 % |
| GPT‑5.4 mini (xhigh) | 54.4 % | 60.0 % | 42.9 % | 88.0 % | 72.1 % |
| GPT‑5.4 nano (xhigh) | 52.4 % | 46.3 % | 35.5 % | 82.8 % | 39.0 % |
| GPT‑5 mini (high¹) | 45.7 % | 38.2 % | 26.9 % | 81.6 % | 42.0 % |
¹ The highest `reasoning_effort` available for GPT‑5 mini is `high`.
2️⃣ Additional Benchmarks
| Benchmark | GPT‑5.4 (xhigh) | GPT‑5.4 mini (xhigh) | GPT‑5.4 nano (xhigh) | GPT‑5 mini (high¹) |
|---|---|---|---|---|
| MCP Atlas | 67.2 % | 57.7 % | 56.1 % | 47.6 % |
| τ2‑bench (telecom) | 98.9 % | 93.4 % | 92.5 % | 74.1 % |
| GPQA Diamond (re‑listed) | 93.0 % | 88.0 % | 82.8 % | 81.6 % |
| HLE w/ tool | 52.1 % | 41.5 % | 37.7 % | 31.6 % |
| HLE w/o tools | 39.8 % | 28.2 % | 24.3 % | 18.3 % |
| OSWorld‑Verified (re‑listed) | 75.0 % | 72.1 % | 39.0 % | 42.0 % |
| MMMU‑Pro w/ Python | 81.5 % | 78.0 % | 69.5 % | 74.1 % |
| MMMU‑Pro | 81.2 % | 76.6 % | 66.1 % | 67.5 % |
| OmniDocBench 1.5 (no tools)² (lower = better) | 0.109 | 0.126 | 0.241 | 0.179 |
| OpenAI MRCR v2 8‑needle 64K‑128K | 86.0 % | 47.7 % | 44.2 % | 35.1 % |
| OpenAI MRCR v2 8‑needle 128K‑256K | 79.3 % | 33.6 % | 33.1 % | 19.4 % |
| Graphwalks BFS 0K‑128K | 93.1 % | 76.3 % | 73.4 % | 73.4 % |
| Graphwalks parents 0‑128K (accuracy) | 89.8 % | 71.5 % | 50.8 % | 64.3 % |
² Overall edit distance. OmniDocBench was run with `reasoning_effort` set to `none` to reflect a pure “no‑reasoning” baseline.
📚 How the Models Fit Into Your Stack
API
- Inputs: Text & image
- Capabilities: Tool use, function calling, web search, file search, computer use, skills
- Context window: 400 k tokens
- Pricing:
- GPT‑5.4 mini – $0.75 / 1 M input tokens, $4.50 / 1 M output tokens
- GPT‑5.4 nano – $0.20 / 1 M input tokens, $1.25 / 1 M output tokens
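With per‑million‑token prices, the cost of a request is straightforward arithmetic. The sketch below uses the rates listed above; the function name and structure are my own, not part of any SDK.

```python
# Per-million-token prices (USD) from the list above.
PRICES = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g. a 10k-token prompt with a 2k-token reply on nano:
print(round(estimate_cost("gpt-5.4-nano", 10_000, 2_000), 6))  # → 0.0045
```

At these rates the same request on mini would cost $0.0165, so routing narrow tasks to nano is roughly a 3.7× saving per call.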
Codex
- Available across the Codex app, CLI, IDE extension, and web.
- Consumes only 30 % of the rate‑limit quota that GPT‑5.4 uses, making simpler coding tasks roughly one‑third the cost.
- Can delegate to GPT‑5.4 mini sub‑agents for low‑reasoning work.
ChatGPT
- Free & Go users: “Thinking” feature (via the + menu) uses GPT‑5.4 mini.
- All other users: GPT‑5.4 mini serves as a rate‑limit fallback for GPT‑5.4 Thinking.
🤖 Sub‑Agents & System Design
“Instead of using one model for everything, developers can compose systems where larger models decide what to do and smaller models execute quickly at scale.”
- Example: In Codex, GPT‑5.4 handles planning & final judgment, while GPT‑5.4 mini sub‑agents perform parallel tasks such as searching a codebase, reviewing large files, or processing supporting documents.
- Learn more about sub‑agents in the Codex docs.
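The plan/execute/judge pattern described above can be sketched with a thread pool. Here `plan`, `run_subagent`, and `synthesize` are hypothetical stand‑ins (simple string operations) for calls to the larger and smaller models; the point is the shape of the orchestration, not the model calls themselves.

```python
from concurrent.futures import ThreadPoolExecutor

def plan(goal: str) -> list[str]:
    # Stand-in for the large model (GPT-5.4) breaking a goal into sub-tasks.
    return [f"{goal}: search codebase", f"{goal}: review large files"]

def run_subagent(task: str) -> str:
    # Stand-in for a fast GPT-5.4 mini sub-agent executing one narrow task.
    return f"done({task})"

def synthesize(results: list[str]) -> str:
    # Stand-in for the large model's final judgment over sub-agent output.
    return "; ".join(results)

def solve(goal: str) -> str:
    tasks = plan(goal)
    with ThreadPoolExecutor() as pool:  # sub-agents run in parallel
        results = list(pool.map(run_subagent, tasks))
    return synthesize(results)

print(solve("fix bug"))
```

Because the sub‑agents are independent, latency is bounded by the slowest sub‑task rather than the sum of all of them.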
📈 Latency & Cost Disclaimer
Latency estimates are derived from production‑behavior simulations that include tool‑call duration, sampled tokens, and input tokens. Real‑world latency can vary substantially based on many factors not captured in the simulation. Costs are based on current API pricing and may change in the future. Reasoning efforts were swept from low to xhigh.
GPT‑5.4 mini and GPT‑5.4 nano are now live, offering low‑cost, low‑latency performance. Choose the model that best balances speed, cost, and capability for your workload!