I asked Opus 4.6 and Codex 5.2 to build the same thing, and I was surprised by the results

Published: February 6, 2026 at 06:47 PM EST
4 min read
Source: Dev.to

It seems like most of the dev world is now using Opus 4.6 and Codex 5.2

A little over 24 hours after their release, I was finally able to test them myself. Would I need to change my workflow?

I built an application a while back for my own Discord server that ingests content for a RAG system while also helping moderate by auto‑assigning roles. It’s a fun side project.

Before I hop back in and start messing around with these new models, I wanted to test them first to see which one I should use. I settled on a data‑visualization test, so consider this frontend‑heavy with minor logic implementation. I’ll run separate tests for backend‑category projects soon.

In this test I’m measuring two main things: speed and accuracy. Let’s begin.


Setting up the test

  1. Connection – For both models I ensured they had a connection to my Xano.com workspace and could read the data. (There’s another discussion to be had about MCP, but that’s how the models communicate with the platform; a minimal config sketch follows after this list.)

  2. Environments

    • I used Cursor to test Codex.
    • I used Claude (Opus 4.6) to test Opus.

    (Critically, these are different environments, and the behind‑the‑scenes handling differs enough to skew results; I have environment‑controlled tests planned for the future.)

  3. Prompt – Both models received the same prompt:

    Please take the data from workspace 11 inside Xano (MCP); I want you to create a visual representation of all of my data as it relates to one another. By this, I want you to show me an isometric view of all the relationships between tables, data, and functions; this includes middlewares, authentication systems, tasks, and anything else. Please go through the entire application and assess all functions, tables, endpoints, tasks, and more to create a map.

    1. Scan through the necessary .XS files. Use MCP to assist with both application flow and data‑storage.
    2. Create an HTML page with CSS and JS that shows, in isometric view, the landscape of the application, with a way to visualize how everything is interconnected. This should be mildly video‑game‑like, but with emphasis on readability and accessibility.
    3. To assist with readability: query all data, persist as files within local.

    Your prompts may look different, which will 100% impact the outcome of this experiment. However, I want to test the models on their ability to extrapolate from what I provide.
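For anyone reproducing step 1: Cursor reads MCP servers from a `.cursor/mcp.json` file, so a minimal sketch looks like the block below. The Xano endpoint URL is a placeholder I made up, not the real one; check Xano’s MCP documentation for the actual value and auth setup. Claude has its own equivalent MCP configuration.

```json
{
  "mcpServers": {
    "xano": {
      "url": "https://YOUR-INSTANCE.xano.io/mcp"
    }
  }
}
```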


And we’re off to the races!

Side‑by‑side building

Pressing Enter on both, I watched Codex whiz through the tasks while Claude was left flibbergasting for several minutes at a time. There isn’t a ton of variability in the decision‑making, but the speed at which they execute is notably different.

  • Codex finished in 5 minutes 55 seconds.
  • Claude (Opus 4.6) wrapped up shortly after the 8‑minute mark.

Winner of Development Speed: Codex


The awaited outputs

Opus 4.6

[Image: Opus 4.6 output]

I started with the Opus 4.6 output. No big surprise: when I opened the page, it worked, it was accessible, and it matched the visual model I had in my head.

  • Auto‑zoom, drag‑to‑pan, click a node to open a sidebar with connectivity information, click off to dismiss it (sketched below).
  • Not blown away, but I fully expected Opus 4.6 to do a great job. The standard was upheld.
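Here’s a minimal sketch of that interaction pattern; this is my own illustrative code with made‑up table names, not what Opus actually generated:

```html
<!-- Illustrative only: a tiny version of the click-a-node sidebar pattern. -->
<div id="map" style="position: relative; height: 400px;
     transform: rotateX(55deg) rotateZ(45deg); transform-style: preserve-3d;"></div>
<aside id="sidebar"></aside>

<script>
  // Hypothetical sample data standing in for the real Xano schema.
  const nodes = [
    { id: "users",      x: 60,  y: 40,  links: ["roles", "messages"] },
    { id: "roles",      x: 220, y: 40,  links: ["users"] },
    { id: "messages",   x: 140, y: 160, links: ["users", "rag_chunks"] },
    { id: "rag_chunks", x: 300, y: 160, links: ["messages"] },
  ];

  const map = document.getElementById("map");
  const sidebar = document.getElementById("sidebar");

  for (const node of nodes) {
    const el = document.createElement("button");
    el.textContent = node.id;
    el.style.cssText = `position: absolute; left: ${node.x}px; top: ${node.y}px;`;
    // Clicking a node opens the sidebar with that node's connections.
    el.addEventListener("click", () => {
      sidebar.innerHTML =
        `<h2>${node.id}</h2><p>Connected to: ${node.links.join(", ")}</p>`;
    });
    map.appendChild(el);
  }

  // Clicking empty canvas clears the sidebar (the "click-off" behavior).
  map.addEventListener("click", (e) => {
    if (e.target === map) sidebar.innerHTML = "";
  });
</script>
```

The real version also had auto‑zoom and drag‑to‑pan, which you’d typically get from a pan/zoom transform on the container or a small library; the sketch only covers the click interactions.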

Codex 5.2

[Image: Codex 5.2 empty build]

This was not expected. I’ve only heard good things about Codex, so seeing nothing load was disappointing.

I can’t necessarily blame Codex; within the constraints of the given task it overreached, assuming the page would be served from a server rather than opened locally. Since I wanted to keep this local, I copied the error messages, threw them into Cursor like a proper vibe‑coder, and hit refresh.
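My best guess at the failure mode, judging from the errors: the page tried to fetch() its data over HTTP, which browsers block when the file is opened directly via file://, so the map rendered empty. A hedged reconstruction of what the fix amounted to (file and element names are hypothetical):

```js
// Hypothetical reconstruction of the bug and fix, not Codex's actual code.
async function loadSchema() {
  try {
    // Works when the page is served over HTTP (what Codex assumed)...
    const res = await fetch("./data/schema.json");
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return await res.json();
  } catch {
    // ...but fails under file://, so fall back to data embedded in the page,
    // e.g. a <script type="application/json" id="schema-data"> block.
    return JSON.parse(document.getElementById("schema-data").textContent);
  }
}
```

The other common fix is simply serving the folder locally (e.g. `python -m http.server`) instead of double‑clicking the HTML file.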

[Image: Codex 5.2 fixed build]

The visualization was clunky, and I had little idea what I was looking at. The entire UX required additional prompting before it was feasible to use.

Winner of Development Accuracy: Opus


Summary

Ultimately, I wasn’t too disappointed with the outcome: Claude has always seemed to perform a little better for me on the frontend.

Factoring in the environment differences, Codex is still aching to be field‑tested in the CLI with some proper backend development.

But was I surprised? Yes. I truly did expect similar results from the two.

It doesn’t seem that I’ll need to change my workflow much for the time being, as Claude really does have an affinity for reading between the lines, extrapolating the user’s intentions, and delivering.

But it does invite a debate about model personality and which one suits your building style best: interpretive vs. executional.

With that, and given my building style, I’d name Opus 4.6 the winner of this test. Codex delivers speed, but accuracy and outcome are still the deciding factors.

Leave me a comment if you want me to test anything in particular. More tests to come!
