Introducing GPT-5.4 mini and nano
Source: OpenAI Blog
📦 Model Highlights
| Model | Size | Speed & Cost | Key Strengths |
|---|---|---|---|
| GPT‑5.4 mini | “mini” (xhigh) | More than 2× faster than GPT‑5 mini | Coding, reasoning, multimodal understanding, tool use; approaches GPT‑5.4 performance on many benchmarks |
| GPT‑5.4 nano | “nano” (xhigh) | Smallest and cheapest GPT‑5.4 variant | Classification, data extraction, ranking, simple coding sub‑agents |
Both models are designed for latency‑sensitive product experiences: coding assistants, sub‑agents that finish supporting tasks quickly, computer‑using systems that interpret screenshots, and real‑time multimodal applications.
Note – In many settings the best model isn’t the biggest one; it’s the one that can respond quickly, use tools reliably, and still handle complex professional tasks.
🛠️ When to Use Which Model
- GPT‑5.4 mini – Ideal for:
- Fast‑iteration coding workflows (targeted edits, code‑base navigation, front‑end generation, debugging loops)
- Systems that combine models of different sizes (e.g., larger GPT‑5.4 does planning, mini handles narrow sub‑tasks)
→ Available in the API, Codex, and ChatGPT
- GPT‑5.4 nano – Ideal for:
- Classification, data extraction, ranking
- Simple coding sub‑agents that handle supporting tasks
→ Available only via the API
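The guidance above can be sketched as a simple task router that picks the smallest model able to handle each task category. This is a minimal illustration: the categories and the routing logic are my own assumptions, and the model identifiers follow the naming in this post rather than confirmed API model strings.

```python
# Illustrative router: pick the smallest model that fits the task category.
# Task categories and the mapping are assumptions based on the guidance above.
NANO_TASKS = {"classification", "data_extraction", "ranking", "simple_subagent"}
MINI_TASKS = {"code_edit", "codebase_navigation", "frontend_generation", "debugging"}

def choose_model(task_category: str) -> str:
    """Return a model name (hypothetical identifiers) for a task category."""
    if task_category in NANO_TASKS:
        return "gpt-5.4-nano"
    if task_category in MINI_TASKS:
        return "gpt-5.4-mini"
    # Fall back to the full model for planning or open-ended work.
    return "gpt-5.4"

print(choose_model("classification"))  # gpt-5.4-nano
print(choose_model("debugging"))       # gpt-5.4-mini
```

In a real system the fallback branch is where the larger model's planning happens, matching the mixed-size composition described above.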
📊 Benchmark Performance
1️⃣ Core Benchmarks (All Models)
| Model | SWE‑Bench Pro (Public) | Terminal‑Bench 2.0 | Toolathon | GPQA Diamond | OSWorld‑Verified |
|---|---|---|---|---|---|
| GPT‑5.4 (xhigh) | 57.7 % | 75.1 % | 54.6 % | 93.0 % | 75.0 % |
| GPT‑5.4 mini (xhigh) | 54.4 % | 60.0 % | 42.9 % | 88.0 % | 72.1 % |
| GPT‑5.4 nano (xhigh) | 52.4 % | 46.3 % | 35.5 % | 82.8 % | 39.0 % |
| GPT‑5 mini (high¹) | 45.7 % | 38.2 % | 26.9 % | 81.6 % | 42.0 % |
¹ The highest `reasoning_effort` available for GPT‑5 mini is `high`.
2️⃣ Additional Benchmarks
| Benchmark | GPT‑5.4 (xhigh) | GPT‑5.4 mini (xhigh) | GPT‑5.4 nano (xhigh) | GPT‑5 mini (high¹) |
|---|---|---|---|---|
| MCP Atlas | 67.2 % | 57.7 % | 56.1 % | 47.6 % |
| τ2‑bench (telecom) | 98.9 % | 93.4 % | 92.5 % | 74.1 % |
| GPQA Diamond (re‑listed) | 93.0 % | 88.0 % | 82.8 % | 81.6 % |
| HLE w/ tool | 52.1 % | 41.5 % | 37.7 % | 31.6 % |
| HLE w/o tools | 39.8 % | 28.2 % | 24.3 % | 18.3 % |
| OSWorld‑Verified (re‑listed) | 75.0 % | 72.1 % | 39.0 % | 42.0 % |
| MMMU‑Pro w/ Python | 81.5 % | 78.0 % | 69.5 % | 74.1 % |
| MMMU‑Pro | 81.2 % | 76.6 % | 66.1 % | 67.5 % |
| OmniDocBench 1.5 (no tools)² (lower = better) | 0.109 | 0.126 | 0.241 | 0.179 |
| OpenAI MRCR v2 8‑needle 64K‑128K | 86.0 % | 47.7 % | 44.2 % | 35.1 % |
| OpenAI MRCR v2 8‑needle 128K‑256K | 79.3 % | 33.6 % | 33.1 % | 19.4 % |
| Graphwalks BFS 0K‑128K | 93.1 % | 76.3 % | 73.4 % | 73.4 % |
| Graphwalks parents 0‑128K (accuracy) | 89.8 % | 71.5 % | 50.8 % | 64.3 % |
² Overall edit distance. OmniDocBench was run with `reasoning_effort` set to `none` to reflect a pure “no‑reasoning” baseline.
📚 How the Models Fit Into Your Stack
API
- Inputs: Text & image
- Capabilities: Tool use, function calling, web search, file search, computer use, skills
- Context window: 400 k tokens
- Pricing:
- GPT‑5.4 mini – $0.75 / 1 M input tokens, $4.50 / 1 M output tokens
- GPT‑5.4 nano – $0.20 / 1 M input tokens, $1.25 / 1 M output tokens
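With per‑million‑token prices, the cost of a request is straightforward arithmetic. The sketch below uses the rates listed above; the function name and structure are my own, not part of any SDK.

```python
# Per-million-token prices (USD) from the list above.
PRICES = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g. a 10k-token prompt with a 2k-token reply on nano:
print(round(estimate_cost("gpt-5.4-nano", 10_000, 2_000), 6))  # → 0.0045
```

At these rates the same request on mini would cost $0.0165, so routing narrow tasks to nano is roughly a 3.7× saving per call.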
Codex
- Available across the Codex app, CLI, IDE extension, and web.
- Consumes only 30 % of the rate‑limit quota that GPT‑5.4 uses, making simpler coding tasks roughly one‑third the cost.
- Can delegate to GPT‑5.4 mini sub‑agents for low‑reasoning work.
ChatGPT
- Free & Go users: “Thinking” feature (via the + menu) uses GPT‑5.4 mini.
- All other users: GPT‑5.4 mini serves as a rate‑limit fallback for GPT‑5.4 Thinking.
🤖 Sub‑Agents & System Design
“Instead of using one model for everything, developers can compose systems where larger models decide what to do and smaller models execute quickly at scale.”
- Example: In Codex, GPT‑5.4 handles planning & final judgment, while GPT‑5.4 mini sub‑agents perform parallel tasks such as searching a codebase, reviewing large files, or processing supporting documents.
- Learn more about sub‑agents in the Codex docs.
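The plan/execute/judge pattern described above can be sketched with a thread pool. Here `plan`, `run_subagent`, and `synthesize` are hypothetical stand‑ins (simple string operations) for calls to the larger and smaller models; the point is the shape of the orchestration, not the model calls themselves.

```python
from concurrent.futures import ThreadPoolExecutor

def plan(goal: str) -> list[str]:
    # Stand-in for the large model (GPT-5.4) breaking a goal into sub-tasks.
    return [f"{goal}: search codebase", f"{goal}: review large files"]

def run_subagent(task: str) -> str:
    # Stand-in for a fast GPT-5.4 mini sub-agent executing one narrow task.
    return f"done({task})"

def synthesize(results: list[str]) -> str:
    # Stand-in for the large model's final judgment over sub-agent output.
    return "; ".join(results)

def solve(goal: str) -> str:
    tasks = plan(goal)
    with ThreadPoolExecutor() as pool:  # sub-agents run in parallel
        results = list(pool.map(run_subagent, tasks))
    return synthesize(results)

print(solve("fix bug"))
```

Because the sub‑agents are independent, latency is bounded by the slowest sub‑task rather than the sum of all of them.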
📈 Latency & Cost Disclaimer
Latency estimates are derived from production‑behavior simulations that include tool‑call duration, sampled tokens, and input tokens. Real‑world latency can vary substantially based on many factors not captured in the simulation. Costs are based on current API pricing and may change in the future. Reasoning efforts were swept from low to xhigh.
GPT‑5.4 mini and GPT‑5.4 nano are now live, offering low‑cost, low‑latency performance. Choose the model that best balances speed, cost, and capability for your workload!