I Built a Thing That Builds Things: Tres Comas Scrum
Source: Dev.to
The Setup
The architecture is dead simple:
- CEO holds the product backlog, plans sprints, reviews deliveries and decides when the project is “done”.
- Coder receives a ticket, reads the existing codebase, and delivers code in XML format.
- Tester wakes up every 4 sprints, inspects what was built, and generates new stories based on what’s broken or missing.
Each delivery gets tested in an isolated sandbox (bwrap on Linux) before being written to disk. If tests fail, the Coder gets the error and tries again — up to 3 times.
That’s it. No orchestration framework, no LangChain, no AutoGen. Just a SQLite database, a few hundred lines of Python, and a lot of time.sleep(4) to stay under the rate limit.
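The delivery loop described above can be sketched in a few lines of Python. Everything here is illustrative: `ask_coder`, `run_in_sandbox`, and `write_to_disk` are hypothetical stand-ins for the project's internals, not its actual API.

```python
import time

MAX_ATTEMPTS = 3  # the Coder gets the error and tries again, up to 3 times

def deliver_ticket(ticket, ask_coder, run_in_sandbox, write_to_disk):
    """Ask the Coder for code, test it in a sandbox, retry on failure."""
    error = None
    for attempt in range(MAX_ATTEMPTS):
        time.sleep(4)  # crude rate limiting between LLM calls
        files = ask_coder(ticket, previous_error=error)  # XML-delivered files
        result = run_in_sandbox(files)  # bwrap-isolated test run
        if result["success"]:
            write_to_disk(files)  # only persist deliveries that pass
            return True
        error = result["error"]  # feed the failure back to the Coder
    return False
```

Passing the collaborators in as functions keeps the loop trivially testable: you can swap the sandbox for a stub without touching the retry logic.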
What Went Wrong (A Highlight Reel)
- The Coder kept wrapping its deliveries in `` ```python `` markdown fences inside the XML tags. Every sprint. We added a regex to strip the backticks, then added a visual example to the system prompt, but it still happened occasionally.
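A fence-stripping pass of that kind might look like this (a minimal sketch, not the project's actual regex):

```python
import re

# Match an opening fence like ```python (language tag optional) and the
# closing ``` fence, keeping only the code in between.
FENCE_RE = re.compile(r"^```[a-zA-Z0-9_+-]*\n(.*?)\n?```\s*$", re.DOTALL)

def strip_markdown_fences(code: str) -> str:
    """Remove markdown code fences the Coder sneaks into XML tags."""
    match = FENCE_RE.match(code.strip())
    return match.group(1) if match else code
```

Unfenced input passes through untouched, so the pass is safe to run on every delivery.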
- `code.py` is a reserved name. Our test runner was named `code.py`, and Python kept importing the stdlib `code` module instead. Renaming it to `test_runner.py` fixed two hours of debugging.
- `sys.exit()` makes bwrap angry. The sandbox intercepted pytest’s exit call and threw a fit. Removing the `sys.exit()` wrapper solved the issue instantly.
- The Coder delivered 0 files and the tests passed. No delivery = no test files = the runner returned `success: True`. Added checks: “no files = failure” and “no test files = failure”. The Coder has been slightly more motivated since then.
- `argparse` isn’t in the stdlib imports list. Our system strips external imports before inlining code into the test runner, and we forgot about many stdlib modules. Added `argparse`, `subprocess`, `random`, `zipfile`, `gzip`, `statistics`, `decimal`, etc., expanding the list as needed.
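The "empty delivery passes" fix boils down to two guard clauses before anything runs. This is a sketch: the `test_` filename convention and the exact allowlist contents beyond the modules named above are assumptions.

```python
# Stdlib modules whose imports must survive the "strip external imports"
# pass. This allowlist kept growing as the Coder used more of the stdlib.
STDLIB_ALLOWLIST = {
    "argparse", "subprocess", "random", "zipfile",
    "gzip", "statistics", "decimal",
}

def validate_delivery(files: dict[str, str]) -> tuple[bool, str]:
    """Reject deliveries that would trivially 'pass' by containing nothing."""
    if not files:
        return False, "no files = failure"
    if not any(name.startswith("test_") for name in files):
        return False, "no test files = failure"
    return True, "ok"
```

Running the guards before the sandbox means an unmotivated Coder gets a real error message to retry against instead of a hollow green check.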
What Went Right
- The CEO figured out the right build order on its own. It started with foundations (Agent, LLM, Memory, Tools), then the run loop, then inter‑agent communication, then the tools library, then docs. That’s exactly what a senior developer would do, without any explicit instruction.
- The Tester’s feedback at Sprint 8 was genuinely useful:

  “Tests pass but you can’t actually use the framework in practice. The bus pattern is implemented but agents don’t use it. No CLI, no examples, no error handling.”

  The CEO turned that into six actionable stories, all fixed by Sprint 12.
- By Sprint 12, the generated framework had:
  - An `Agent` class with `run()` and `run_autonomous()`
  - A `MessageBus` with pub/sub between agents
  - Persistent `Memory` with search and tagging
  - A `Tools` registry with 16 pre‑configured tools
  - Error handling in the LLM provider
  - A `ConfigManager` that loads agents from YAML
  - A README
Not bad for something that started with zero lines of code.
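A pub/sub bus of the kind listed above fits in a dozen lines. This is an illustration of the pattern, not the generated framework's actual code:

```python
from collections import defaultdict
from typing import Callable

class MessageBus:
    """Minimal pub/sub bus: agents subscribe to topics, anyone can publish."""

    def __init__(self):
        self._subscribers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(message)
```

The Tester's Sprint 8 complaint, in these terms: the bus existed, but no agent ever called `subscribe`.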
The Meta Thing
The whole point of this experiment was to see if agents could build the very framework you’d use to build agents. They mostly did. The generated code has Pydantic V2 warnings everywhere and `Message` doesn’t extend `BaseModel`, so `model_dump()` will crash in production — but the architecture is sound.
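The `Message` bug is a one-line fix once the class moves onto Pydantic V2's `BaseModel` (a sketch with hypothetical fields; the generated class may carry different ones):

```python
from pydantic import BaseModel

class Message(BaseModel):
    """Extending BaseModel is what gives Message a real model_dump()."""
    sender: str
    recipient: str
    content: str
```

With that change, `Message(sender="ceo", recipient="coder", content="ticket").model_dump()` returns a plain dict instead of raising.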
The next step is to use that framework to rewrite this builder: agents orchestrated by their own creation, running on Ollama locally with qwen3:8b. Turtles all the way down.
Try It
The project is on GitHub: Tres Comas Scrum
You’ll need Linux (bwrap for sandboxing), Python 3.10+, and an OpenRouter API key. The free tier is enough to run a full build — just set the model to `arcee-ai/trinity-large-preview:free` and wait.
Fair warning: it will take a few hours, the Coder will occasionally deliver empty XML, and you will at some point stare at an `IndentationError` that makes no sense. That’s part of the experience.
Named after the Tres Comas tequila from Silicon Valley. Three agents, three commas. Russ Hanneman would not be impressed, but he’d probably try to invest anyway.