Software factories and the agentic moment
Source: Hacker News
The Software Factory
We built a Software Factory – a non‑interactive development pipeline where specifications and scenarios drive autonomous agents that:
- Write code
- Run test harnesses
- Converge on a solution
All of this happens without any human‑written code or human review.
Narrative Overview
Below is a concise narrative of the approach. If you prefer to start from first principles, the following constraints and guidelines can be applied iteratively to guide any team toward the same intuitions, convictions1, and ultimately a self‑sustaining factory2.
Kōan / Mantra
Why am I doing this?
(Implied answer: the model should be doing this instead.)
Rules
- Code must not be written by humans.
- Code must not be reviewed by humans.
Practical Metric
- If you haven’t spent at least $1,000 on tokens today per human engineer, your software factory still has room for improvement.
Footnotes
The StrongDM AI Story
On July 14 2025, Jay Taylor and Navan Chauhan joined me—Justin McCarthy, co‑founder & CTO—in founding the StrongDM AI team.
The Catalyst
In late 2024 we observed a pivotal shift: with the second revision of Claude 3.5 (October 2024), long‑horizon, agentic coding workflows began to compound correctness rather than error.

By December 2024, the model’s long‑horizon coding performance was unmistakable, as demonstrated in Cursor’s YOLO mode.
Why This Matters
Before this improvement, iteratively applying LLMs to coding tasks accumulated a wide range of errors:
- Misunderstandings of intent
- Hallucinated APIs or data
- Syntax mistakes
- DRY violations across versions
- Library incompatibilities
These errors caused applications to decay and eventually collapse—a classic “death by a thousand cuts.”
The New Paradigm
Combined with YOLO mode, Anthropic’s updated model gave us the first glimpse of what we now call non‑interactive development or grown software—software that evolves correctly without constant human correction.
Find Knobs, Turn To Eleven

“These go to 11”
In the first hour of the first day of our AI team, we established a charter that set us on a path toward a series of findings (which we refer to as our “unlocks.”) In retrospect, the most important line in the charter document was the following:

Hands off!
Initially it was just a hunch—an experiment. How far could we get without writing any code by hand?
Not very far! At least, not very far, until we added tests. However, the agent, obsessed with the immediate task, soon began to take shortcuts:
return trueis a great way to pass narrowly written tests, but probably won’t generalize to the software you want.
Tests alone were not enough. What about:
- Integration tests?
- Regression tests?
- End‑to‑end tests?
- Behavior tests?
From Tests to Scenarios and Satisfaction
One recurring theme of the agentic moment is the need for new language.
For example, the word test has proven insufficient and ambiguous:
- A test stored in the codebase can be lazily rewritten to match the code.
- The code can be rewritten to trivially pass the test.
We repurposed the word scenario to represent an end‑to‑end user story, often stored outside the codebase (similar to a holdout set in model training). A scenario can be intuitively understood and flexibly validated by an LLM.

Because much of the software we grow itself has an agentic component, we transitioned from Boolean definitions of success (“the test suite is green”) to a probabilistic and empirical one. We use the term satisfaction to quantify this validation:
Of all the observed trajectories through all the scenarios, what fraction of them likely satisfy the user?
Validating Scenarios in the Digital Twin Universe
In earlier development cycles we relied on integration tests, regression tests, and UI‑automation to answer the simple question “Is it working?”
Why the old approach fell short
| Issue | Why it mattered |
|---|---|
| Tests are too rigid | Our code now uses agents and LLM‑driven loops as core design primitives. Determining success often required an LLM‑as‑judge rather than a static assertion. |
| Tests can be reward‑hacked | Models can learn to “game” the test harness, producing superficially correct outputs without truly solving the problem. We needed validation that was resistant to such cheating. |
Introducing the Digital Twin Universe (DTU)
The DTU provides behavioral clones of the third‑party services our software depends on. We have built twins for:
- Okta
- Jira
- Slack
- Google Docs
- Google Drive
- Google Sheets
Each twin replicates the service’s API surface, edge‑case handling, and observable behavior.
Benefits of the DTU
- Massive scale – run tests at volumes and request rates far beyond production limits.
- Safe failure injection – simulate dangerous or impossible failure modes without affecting real services.
- Cost‑free execution – avoid rate‑limit throttling, abuse‑detection blocks, and API usage fees.
- Rapid iteration – execute thousands of scenarios per hour, enabling exhaustive scenario coverage.
Digital Twin Universe: behavioral clones of Okta, Jira, Google Docs, Slack, Drive, and Sheets
(click to enlarge)
With the DTU we can now validate complex, LLM‑driven workflows reliably, ensuring that our software behaves correctly even under extreme or pathological conditions.
Unconventional Economics
Our success with DTU illustrates one of the many ways the Agentic Moment has profoundly changed the economics of software.
- Creating a high‑fidelity clone of a significant SaaS application was always technically possible, but never economically feasible.
- Generations of engineers may have wanted a full in‑memory replica of their CRM to test against, yet they self‑censored the proposal. They didn’t even bring it to their manager because they knew the answer would be “no.”
The Software‑Factory Mindset
Those of us building software factories must practice a deliberate naivete:
- Identify the habits, conventions, and constraints of Software 1.0.
- Systematically remove or replace them.
The DTU is our proof that what was unthinkable six months ago is now routine.
Software 1.0 – Introductory Video
Read Next
- Principles – what we believe is true about building software with agents.
- Techniques – repeated patterns for applying those principles.
- Products – tools we use daily and believe others will benefit from.
Thank you for reading. We wish you the best of luck constructing your own Software Factory.