Claude Moves Fast. Codex Ships.

Published: May 4, 2026 at 08:00 PM EDT
9 min read
Source: Dev.to

Summary

I gave two big coding tasks to both Claude and Codex.

  • Claude finished in about one hour.
  • Codex took about eight hours.

At first glance that looks like a clean win for Claude, but the results tell a different story.

What actually happened

Claude

  • Fast, confident, and useless.
  • Bad assumptions, broken code, missing integration points.
  • Only half‑followed the rules.
  • Produced a shape that would make the codebase harder to maintain even if I patched it into working order.

I threw it away.

Codex

  • Took a full workday, but read the repo, followed the local instructions, and reused existing patterns.
  • Added tests, ran checks, and the first run worked.

That changed how I think about AI coding tools.

Context

  • I’m not writing this as a casual weekend test.
  • In the last four months I shipped three products where AI performed a serious amount of the implementation work.
  • I burn through the weekly limits on a $200/month subscription.
  • This is still a personal opinion, not a benchmark, but it’s built from long hours in real repos where broken code costs me time.

Key Takeaways

  1. Claude optimizes for motion.
  2. Codex optimizes for grounded work.
  3. Fast output is not fast delivery if you spend the next day cleaning it up.
  4. Codex’s friction is annoying, but most of that friction is it reading, verifying, and respecting project rules.
  5. Superpowers improves both tools, but each agent follows the workflow differently.
  6. Claude’s sub‑agent defaults are useful, but delegation does not save you when the main agent is willing to invent around your codebase.
  7. For work I actually ship, I now trust the slower agent more.

Personality Analogy

  • Claude: feels like an engineer who wants to finish the ticket before lunch.
  • Codex: feels like an engineer who annoys you by reading every linked file before touching the code, then quietly hands you something that passes tests.

The Claude Trap

  1. Progress appears quickly – it edits, assumes, finds a path through your rules, writes what it thinks should exist, and may use sub‑agents aggressively.
  2. First failure is usually small (type mismatch, missing import, non‑existent component API, wrong hook pattern).
  3. You ask for a fix → Claude patches the symptom → something else breaks.
  4. You ask why the original thing failed → instead of tracing the root cause, it often jumps into another patch.

Result: a codebase littered with bandaids. The more you accept, the harder it becomes to tell what is necessary and what is agent residue. This fragility comes not from one bad commit, but from many plausible commits that never proved they matched the system.

Why Codex Feels Slow

  • Even for simple work it reads first: opens files, checks instructions, searches for existing patterns, and tries to understand why the current code is shaped the way it is.
  • It is far less eager to introduce a new abstraction just because it can.

My setup wasn’t optimal

  • The eight‑hour run was not an optimized Codex setup.
  • I tested Codex almost as soon as I installed it, with only Superpowers installed.
  • I hadn’t yet tuned my AGENTS.md, wired a stronger agent workflow, or built the muscle memory for telling Codex when to split work.
  • Much of the time was self‑inflicted setup friction.
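For a sense of what that tuning looks like, here is a hypothetical AGENTS.md fragment (the file name comes from the Codex docs; the specific rules are invented for this example, not my real file):

```markdown
# AGENTS.md — illustrative fragment

## Rules
- Do not edit env files, including .env.example.
- Do not add scripts unless explicitly asked.
- Reuse existing helpers before introducing a new abstraction.

## Workflow
- For tasks touching more than ~3 files, split independent pieces
  into sub-agents and review the merged result before declaring done.
- Run the narrowest relevant check after every meaningful change.
```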

What actually consumed time

  • Codex tested after almost every meaningful change – edit, run the relevant check, read the failure, patch, run again, and repeat.
  • This is slower than an agent that writes a big diff and says “done,” but it’s why the final result worked.
  • My Codex setup lacked default sub‑agents. Claude tends to fan out work more aggressively; without explicit sub‑agent instructions, more work stayed in the main thread, lengthening the run.
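The loop above can be sketched in plain shell. Everything here is illustrative: the file, the deliberate typo, and `run_check` are stand-ins for a real repo's targeted test command.

```shell
#!/bin/sh
# Sketch of the loop: edit, run the relevant check, read the failure,
# patch, run again. run_check stands in for a real test invocation.
run_check() {
  grep -q "return total" calc.py   # the fix must actually be present
}

# A deliberately broken first edit: the agent's diff contains a typo.
printf 'def add(a, b):\n    total = a + b\n    return tota\n' > calc.py

runs=1
while ! run_check; do
  echo "run $runs: check failed, patching"
  # The "patch" step: correct the typo, then loop back to the check.
  sed 's/return tota$/return total/' calc.py > calc.tmp && mv calc.tmp calc.py
  runs=$((runs + 1))
done
echo "check passed on run $runs"
```

In a real repo, `run_check` would be the narrowest test or lint command that covers the change; the point is that the check runs after every edit, not once at the end.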

The Value of Patience

The time spent isn’t wasted; it’s spent on the things that usually decide whether code survives first contact with the repo:

  • Does this match the surrounding modules?
  • Is there already a helper for this?
  • Are there project rules that override the obvious solution?
  • Is the failure the root cause, or just the first visible symptom?
  • What is the smallest check that proves this change worked?

What I Want From a Coding Agent

I do not want an agent that wins the stopwatch and loses the branch. I want the branch.

Claude’s limitation

  • Claude works from model knowledge unless you explicitly push it to fetch current information.
  • For stable things that’s fine, but for modern SDKs, AI tools, product docs, pricing, CLI behavior, framework releases, or anything that may have changed last month, it’s dangerous.
  • I don’t want an agent confidently writing against a six‑month‑old mental model of a library.

Codex’s strength

  • Much more willing to say “this might have changed, search first.”
  • The installed CLI even exposes --search, and the current Codex docs have first‑class pages for rules and sub‑agents.
  • The feature surface changes fast, which is the point: for fast‑moving tools, searching is not optional polish – it’s correctness.

That is one of the reasons Codex feels slow: it spends time making sure the premise is still true.

Final Complaint About Claude

My biggest complaint isn’t that Claude is dumb. Claude is powerful.
The problem is that it is too willing to route around constraints.

  • If a repo says “do not edit env files,” I don’t want the agent to decide that .env.example is close enough to documentation.
  • If a repo says “do not add scripts unless asked,” I don’t want the agent to add a convenience script just because it makes its own workflow nicer.
  • If I ask “why is this not working,” I don’t want three patch attempts before the root cause is found.

Codex’s approach

  • Slower in part because it keeps the project rules in mind the whole time.
  • It treats instructions as part of the job, not as decoration.

That matters in real repos, where correctness outweighs raw speed. The fastest fix is often the fix that breaks a migration, a hook, a deploy job, or another engineer’s in‑progress work.

Superpowers Workflow

I also use Superpowers with both Claude and Codex.

Superpowers is a workflow plugin by Jesse Vincent. The pitch is simple: do not let the agent jump straight into coding. Make it

  1. brainstorm,
  2. plan,
  3. route work,
  4. use TDD when appropriate,
  5. review the result, and
  6. verify before claiming success.

The official Claude marketplace page describes it as a skills framework for brainstorming, sub‑agent development, debugging, TDD, and skill authoring. The GitHub repo also documents Codex installation, so this is not a Claude‑only idea.

Claude + Superpowers

Claude does pretty well with Superpowers. It understands the workflow and can move fast once the plan exists.

The problem is that Claude still feels like it is deciding when the workflow matters.

  • Sometimes it invokes the right skill.
  • Sometimes it treats the skill like advice.
  • Sometimes it decides the task is obvious and starts moving.

Codex + Superpowers

Codex behaves differently. If the rules say “use Superpowers to route this,” Codex is much more likely to actually do that. It will

  1. invoke the routing step,
  2. check which skill applies, and
  3. follow the workflow even when I personally think the task could have been done in two edits.

That strictness was part of why my first Codex run took so long: Superpowers wanted routing and planning, and Codex treated that as a rule, not a suggestion.

That is both good and bad.

  • For big tasks, I want that discipline: route first, plan first, use sub‑agents when the work is independent, review before declaring done. That is exactly where Superpowers shines.
  • For small tasks, it can be overkill. If I ask for a copy tweak or a one‑line bug fix, I do not always need a methodology. This is the same trade‑off as Codex itself: the discipline that makes it reliable on large work can feel heavy on tiny work.

Still, if I have to choose which failure mode I prefer, I will take “too much process” over “fast wrong code” almost every time.

Sub‑Agents

  • Claude defaults harder toward sub‑agents.
  • Codex can use sub‑agents too, but in my workflow I usually need to ask explicitly or encode it in AGENTS.md. That lines up with the Codex sub‑agents docs, which describe them as explicit helpers for larger tasks. On that first run, I had not done the tuning yet, so Codex paid the cost of doing too much sequentially.

That is a real advantage for Claude: sub‑agents can isolate context, split work, and make large tasks move faster.

But delegation only helps if the delegated work is bounded and grounded.

  • If the parent agent makes bad assumptions, you get multiple workers implementing bad assumptions in parallel. That is not leverage; that is multiplication.

  • Codex’s default is more conservative. It does not fan work out as aggressively unless the task clearly benefits from it. Sometimes that costs time; sometimes it prevents a mess.

Metric I Care About

How long from prompt to shippable branch?

  • Not prompt → diff.
  • Not prompt → “I made the changes.”
  • Not prompt → a confident summary.
  • Prompt → code I can ship without feeling like I need to audit every line for agent damage.

On that metric, the one‑hour Claude run was slower than the eight‑hour Codex run because the Claude output had no path to trust—every line needed suspicion. The Codex output had already done the boring work: inspect, edit, test, verify.

That is why I keep using Codex even when it annoys me.

I do not need the fastest agent.
I need the one whose output I can ship with confidence.

Originally published at harryy.me.
