I Built a Thing That Builds Things: Tres Comas Scrum

Published: February 19, 2026 at 09:36 AM EST
Source: Dev.to

The Setup

The architecture is dead simple:

  • CEO holds the product backlog, plans sprints, reviews deliveries and decides when the project is “done”.
  • Coder receives a ticket, reads the existing codebase, and delivers code in XML format.
  • Tester wakes up every 4 sprints, inspects what was built, and generates new stories based on what’s broken or missing.

Each delivery gets tested in an isolated sandbox (bwrap on Linux) before being written to disk. If tests fail, the Coder gets the error and tries again — up to 3 times.
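
The test-then-write loop described above can be sketched roughly like this. It is a minimal sketch, not the project's actual code: `deliver_with_retries`, `SandboxResult`, and the three injected callables are hypothetical names, and "up to 3 times" is read here as three attempts in total.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # assumption: "up to 3 times" means three attempts in total


@dataclass
class SandboxResult:
    """Outcome of one sandboxed test run (hypothetical shape)."""
    success: bool
    error: str = ""


def deliver_with_retries(
    generate: Callable[[Optional[str]], str],
    run_in_sandbox: Callable[[str], SandboxResult],
    write_to_disk: Callable[[str], None],
    max_attempts: int = MAX_ATTEMPTS,
) -> bool:
    """Ask for a delivery, test it, and persist it only if the tests pass."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        delivery = generate(feedback)      # hypothetical call into the Coder agent
        result = run_in_sandbox(delivery)  # hypothetical bwrap-backed test run
        if result.success:
            write_to_disk(delivery)        # only tested code reaches the disk
            return True
        feedback = result.error            # the next attempt sees the failure
    return False
```

Passing the three steps in as callables keeps the retry logic testable without a real LLM or a real sandbox.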

That’s it. No orchestration framework, no LangChain, no AutoGen. Just a SQLite database, a few hundred lines of Python, and a lot of time.sleep(4) to stay under the rate limit.
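
For the rate-limit part, here is one way the fixed sleep could be wrapped up. This is a variant, not what the project does: instead of always sleeping 4 seconds, the hypothetical `Throttle` class below sleeps only for whatever is left of the interval since the previous call.

```python
import time


class Throttle:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 4.0) -> None:
        self.min_interval = min_interval
        self._last_call = None  # monotonic timestamp of the previous call

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds slept."""
        now = time.monotonic()
        slept = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_call = time.monotonic()
        return slept
```

Calling `throttle.wait()` right before each API request gives the same ceiling as a bare `time.sleep(4)`, without paying the full delay when the previous step was already slow.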

What Went Wrong (A Highlight Reel)

  • The Coder kept writing ```python inside XML tags. Every sprint. We added a regex to strip markdown backticks, then added a visual example to the system prompt, but it still happened occasionally.
  • code.py collides with a stdlib module name. Our test runner was named code.py and Python kept importing the stdlib code module instead. Renaming it to test_runner.py fixed it, after two hours of debugging.
  • sys.exit() makes bwrap angry. The sandbox intercepted pytest’s exit call and threw a fit. Removing the sys.exit() wrapper solved the issue instantly.
  • The Coder delivered 0 files and the tests passed. No delivery = no test files = the runner returned success: True. Added checks: “no files = failure” and “no test files = failure”. The Coder has been slightly more motivated since then.
  • argparse isn’t in the stdlib imports list. Our system strips external imports before inlining code into the test runner. We forgot about many stdlib modules. Added argparse, subprocess, random, zipfile, gzip, statistics, decimal, etc., expanding the list as needed.

What Went Right

  • The CEO figured out the right build order on its own. It started with foundations (Agent, LLM, Memory, Tools), then the run loop, then inter‑agent communication, then the tools library, then docs. That’s exactly what a senior developer would do, without any explicit instruction.

  • The Tester’s feedback at Sprint 8 was genuinely useful:

    “Tests pass but you can’t actually use the framework in practice. The bus pattern is implemented but agents don’t use it. No CLI, no examples, no error handling.”

    The CEO turned that into six actionable stories, all fixed by Sprint 12.

  • By Sprint 12, the generated framework had:

    • An Agent class with run() and run_autonomous()
    • A MessageBus with pub/sub between agents
    • Persistent Memory with search and tagging
    • A Tools registry with 16 pre‑configured tools
    • Error handling in the LLM provider
    • A ConfigManager that loads agents from YAML
    • A README

    Not bad for something that started with zero lines of code.

The Meta Thing

The whole point of this experiment was to see if agents could build the very framework you’d use to build agents. They mostly did. The generated code has Pydantic V2 warnings everywhere and Message doesn’t extend BaseModel, so model_dump() will crash in production — but the architecture is sound.

The next step is to use that framework to rewrite this builder: agents orchestrated by their own creation, running on Ollama locally with qwen3:8b. Turtles all the way down.

Try It

The project is on GitHub: Tres Comas Scrum

You’ll need Linux (bwrap for sandboxing), Python 3.10+, and an OpenRouter API key. The free tier is enough to run a full build — just set the model to arcee-ai/trinity-large-preview:free and wait.

Fair warning: it will take a few hours, the Coder will occasionally deliver empty XML, and you will at some point stare at an IndentationError that makes no sense. That’s part of the experience.

Named after the Tres Comas tequila from Silicon Valley. Three agents, three commas. Russ Hanneman would not be impressed, but he’d probably try to invest anyway.
