How do you know the software is working?

Published: February 3, 2026 at 12:54 PM EST

6 min read

Source: Dev.to

What We’ll Cover

Mindset – the many roles you’ll take on: product designer, project manager, tech lead, and quality‑assurance engineer.
Brainstorming – turning a feature idea into a concrete specification.
Managing coding agents – how to enforce rules, why they matter, and how to keep the agents on track.
Shipping – confidently deploying AI‑generated code to production (yes, we’ll do some “vibe coding” in prod).

The order may vary. Buckle up!

Recap

In the previous post we showed how to make Claude stick to conventions (tl;dr – skills + hooks fix it). Claude now follows the rules, but…

“Marcin, all tasks are complete.”

I open a browser and see:

NoMethodError: undefined method 'hallucinated_method' for an instance of User (NoMethodError)

“Great job, Claude! High‑five… let’s ship it to production… NOT.”

That raises a fundamental question:

How do you know the software is actually working?

A Real‑World Story (2017)

Company: Paladin Software
Task: Process massive YouTube‑earnings CSV files (gigabyte‑sized).
Context: I was the on‑call engineer ensuring timely, reliable processing.

One beautiful Thursday afternoon I was at my favourite spot in Kraków – Dolnych Młynów. Friday was a day off. A client uploaded their spreadsheet on Friday; when I returned on Monday, the earnings still hadn’t been processed.

I opened Sidekiq’s failed jobs queue and found:

NoMethodError: undefined method 'hallucinated_method' for an instance of User (NoMethodError)

Was I an LLM before it was a thing? I was certainly shipping code like one.

Why LLMs Need Strict Guardrails

Anterograde amnesia: LLMs can’t form new long‑term memories. They remember their training data, but whatever they learn during one session is gone when the next one starts.
Non‑determinism: An AI agent may produce brilliant code one run and broken code the next.

Therefore we must give the model strict rules each time it writes code (see the previous post on enforcing rules) and add checks and reviews.

Deterministic Checks in Your Workflow

Below is my opinion on what should be included in local CI:

Tool	Purpose	Command
RuboCop	Static code analyser & linter (`-A` autocorrects all offenses, including unsafe ones)	`bundle exec rubocop -A`
Prettier	Formatter for ERB, CSS, JS	`yarn prettier --config .prettierrc.json app/packs app/components --write`
Brakeman	Security‑vulnerability static analysis	`bundle exec brakeman --quiet --no-pager --except=EOLRails`
RSpec	Testing framework (use your favourite)	`bundle exec parallel_rspec --serialize-stdout --combine-stderr`
SimpleCov	Coverage reporting (used by RSpec)	(no separate command; runs with RSpec)
Undercover	Warns about changed code without tests (uses git diffs, code structure, SimpleCov)	`bundle exec undercover --lcov coverage/lcov/app.lcov --compare origin/master`
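
To make the last row more concrete, here is a simplified, line‑based sketch of a diff‑aware coverage check. It is not Undercover’s implementation (the real tool also looks at code structure such as methods and blocks); the method name `untested_changes` and the data shapes are assumptions made for illustration.

```ruby
# Simplified illustration of a diff-aware coverage check.
# NOT Undercover's implementation; all names here are hypothetical.

# coverage: file path => array of hit counts per line (index 0 is line 1);
#           nil means the line is not executable (blank line, comment, `end`).
# changed:  file path => array of 1-based line numbers touched by the diff.
# Files missing from the coverage data are ignored in this sketch.
def untested_changes(coverage, changed)
  changed.each_with_object({}) do |(file, lines), report|
    hits = coverage.fetch(file, [])
    missed = lines.select do |line|
      count = hits[line - 1]
      !count.nil? && count.zero? # executable, but no test ever ran it
    end
    report[file] = missed unless missed.empty?
  end
end

coverage = { "app/models/user.rb" => [nil, 1, 1, 0, 0, nil, nil] }
changed  = { "app/models/user.rb" => [3, 4, 5, 6] }

report = untested_changes(coverage, changed)
report.each do |file, lines|
  puts "#{file}: untested changed lines #{lines.join(', ')}"
end
# prints "app/models/user.rb: untested changed lines 4, 5"
```

Line 3 is covered and line 6 is not executable, so only lines 4 and 5 are reported.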

What This Gives Us

Single code style – consistent formatting.
Security – known vulnerabilities flagged early.
Test coverage – new and changed code is exercised by tests.
Fewer runtime surprises – errors like the `NoMethodError` above surface in CI, not in prod.
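
The last point can be illustrated with a minimal sketch. In a real project this would be an RSpec example; plain Ruby is used here to keep it self‑contained, and `greeting_for` and `run_check` are hypothetical names. The only thing it shows is that Ruby cannot report a missing method until the line actually runs, so a check that executes the code path finds the bug before a user does.

```ruby
# Hypothetical stand-in for the real app's model.
User = Struct.new(:name)

def greeting_for(user)
  # Bug: User has no such method. Ruby only finds out when this line runs.
  "Hello, #{user.hallucinated_method}"
end

# A tiny "test runner": executes the block and reports what happened.
def run_check(description)
  yield
  puts "PASS #{description}"
  true
rescue StandardError => e
  puts "FAIL #{description} (#{e.class})"
  false
end

result = run_check("greets the user by name") do
  greeting = greeting_for(User.new("Marcin"))
  raise "unexpected greeting" unless greeting == "Hello, Marcin"
end
# prints "FAIL greets the user by name (NoMethodError)"
```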

“How many times has an AI agent told you a test failure is unrelated to its changes?”

Boilerplate Concerns?

No. Unit tests are fast, and with coding agents they become more maintainable than ever. The reliability payoff far outweighs the modest amount of boilerplate.

Sample CI Output

Continuous Integration
Running checks...

Rubocop
bundle exec rubocop -A
✅ Rubocop passed in 1.98s

Prettier
yarn prettier --config .prettierrc.json app/packs app/components --write
✅ Prettier passed in 1.57s

Brakeman
bundle exec brakeman --quiet --no-pager --except=EOLRails
✅ Brakeman passed in 8.76s

RSpec
bundle exec parallel_rspec --serialize-stdout --combine-stderr
✅ RSpec passed in 1m32.45s

Undercover
bundle exec undercover --lcov coverage/lcov/app.lcov --compare origin/master
✅ Undercover passed in 0.94s

✅ Continuous Integration passed in 1m45.70s
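
A runner that prints output in roughly this shape does not need much code. The following is a minimal sketch in plain Ruby, not the implementation used for the output above; `Step`, `run_ci`, and the two demo steps are assumptions for illustration.

```ruby
# Minimal local CI runner sketch (hypothetical names throughout).
Step = Struct.new(:name, :command)

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Runs every step in order, prints one timed result line per step, and
# returns true only if all steps succeeded. `runner` is injectable so the
# loop can be exercised without shelling out; the default uses Kernel#system.
def run_ci(steps, runner: ->(command) { system(command) })
  started = monotonic_now
  results = steps.map do |step|
    step_started = monotonic_now
    # Kernel#system returns true on exit status 0, false on a non-zero
    # status, and nil when the command could not be started.
    ok = runner.call(step.command) == true
    puts format("%s %s %s in %.2fs", ok ? "✅" : "❌", step.name,
                ok ? "passed" : "failed", monotonic_now - step_started)
    ok
  end
  passed = results.all?
  puts format("%s Continuous Integration %s in %.2fs", passed ? "✅" : "❌",
              passed ? "passed" : "failed", monotonic_now - started)
  passed
end

steps = [
  Step.new("Lint", "bundle exec rubocop"),
  Step.new("Tests", "bundle exec rspec")
]

# Demo with a stub runner so nothing is executed; drop the `runner:`
# argument to run the real commands.
stub_runner = ->(command) { command.include?("rubocop") }
ci_passed = run_ci(steps, runner: stub_runner)
```

With the stub, the lint step passes and the test step fails, so `run_ci` returns `false`. Returning a boolean makes it easy to turn the result into an exit code for hooks or agents.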

If you’re on Rails 8.1+, the runner for these checks is already baked into the framework (`bin/ci`, configured in `config/ci.rb`). For Rails 8.0 and earlier you can use my ported implementation or roll your own.
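
On Rails 8.1 the commands from the table could be wired up roughly like this. This is a sketch that assumes the `CI.run` / `step` DSL from the generated `config/ci.rb`; the step names are my own choice and the paths and branch name come from this post, so adjust them to your app.

```ruby
# config/ci.rb (run with bin/ci)
# Sketch only: assumes the Rails 8.1 local CI DSL and reuses the commands
# listed earlier in this post.
CI.run do
  step "Rubocop", "bundle exec rubocop -A"
  step "Prettier", "yarn prettier --config .prettierrc.json app/packs app/components --write"
  step "Brakeman", "bundle exec brakeman --quiet --no-pager --except=EOLRails"
  step "RSpec", "bundle exec parallel_rspec --serialize-stdout --combine-stderr"
  step "Undercover", "bundle exec undercover --lcov coverage/lcov/app.lcov --compare origin/master"
end
```

Undercover goes last because it reads the coverage report that the RSpec step writes.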

Lessons from My Early Career (2014)

Job: Junior Rails developer at Netguru.
Onboarding: “The Netguru way” – specific libraries and patterns.
Feedback: My models were “fat”. I was pointed to Code Climate’s article “7 Ways to Decompose Fat ActiveRecord Models.”

I heard the rules but was too focused on business logic to apply them.

“The CLAUDE.md says ‘ALWAYS STOP and ask for clarification rather than making assumptions,’ and I violated that repeatedly. I got caught up in the momentum of the Rails 8 upgrade and stopped being careful.”

This isn’t a new problem for the software industry. The remedy? Code review. Every pull request must be checked by another developer, giving less‑experienced developers a safety net and ensuring higher quality overall.

…

Three‑Stage Review Process

Goals:

Teach good practices.
Enable experienced developers to mentor others and pass on their knowledge.
Enforce the rules so everybody wins.

Remember: Never let the developer review their own code. The same applies to AI agents.

Why a Three‑Stage Review?

1. Specification compliance – Verify that the implementation matches the functional specifications exactly (neither more nor less).
2. Rails & project‑specific conventions – Load all conventions (see the previous post) and check them:
   - Are the interfaces clean?
   - Are view components used instead of partials?
   - Are jobs idempotent and thin?
   - Do the tests verify behaviour?
3. General code‑quality review – Assess architecture, design, documentation, standards, and maintainability.
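
As an illustration of one checklist item, here is what “idempotent and thin” can look like. It is a plain‑Ruby sketch with hypothetical class names; in a Rails app the job would inherit from `ApplicationJob` and the “already processed” state would live in the database rather than in an in‑memory set.

```ruby
require "set"

# Hypothetical service object that owns the business logic.
class ProcessEarningsReport
  def initialize(processed_ids)
    @processed_ids = processed_ids
  end

  # Returns :processed the first time and :skipped on any repeat, so a retry
  # or a duplicate enqueue cannot apply the same report twice.
  def call(report_id)
    return :skipped if @processed_ids.include?(report_id)

    # Real work (parsing the CSV, writing rows) would go here.
    @processed_ids.add(report_id)
    :processed
  end
end

# Thin job: no logic of its own, it only delegates to the service.
class ProcessEarningsReportJob
  def initialize(service)
    @service = service
  end

  def perform(report_id)
    @service.call(report_id)
  end
end

job = ProcessEarningsReportJob.new(ProcessEarningsReport.new(Set.new))
first = job.perform(42)
second = job.perform(42)
puts "first run: #{first}, retry: #{second}"
# prints "first run: processed, retry: skipped"
```

A reviewing agent can check both properties mechanically: the job body is a single delegation, and running it twice with the same argument changes nothing the second time.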

Each stage is performed by a different agent with a fresh perspective and no attachment to the feature, giving a comprehensive overview of the implementation and any possible deviations.

Example Full Report

1. Spec compliance – line‑by‑line verification

Requirement	Implementation	Status
Column: `delay_peer_reviews`	`:delay_peer_reviews`	✅ Match

2. Rails conventions – checklist

Convention	Status
Reversible migration	PASS
Handles existing data	PASS

3. Code quality – structured report

Strengths
Critical / Important / Minor issues (with references)
Merge assessment

Final summary table

Check	Status
✅ Spec compliance	Passed
✅ Rails conventions	Passed
✅ Code quality	Approved with minor suggestions
✅ Local CI	Passed

Ready for merge.

When Issues Are Found

The review consolidates them into a single, actionable list, making it easy for the author to address each point before the final merge.
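
One way to picture that consolidation step is the sketch below. The `Finding` struct and the `consolidate` method are assumptions for illustration, not the format the agents actually emit; the severity levels mirror the Critical / Important / Minor split from the code‑quality report.

```ruby
# Hypothetical shape of a single review finding.
Finding = Struct.new(:stage, :severity, :location, :message)

SEVERITY_ORDER = { critical: 0, important: 1, minor: 2 }.freeze

# Merges the findings from all stages, drops repeats of the same
# location + message, and sorts the most severe items first.
def consolidate(findings_by_stage)
  findings_by_stage
    .flatten(1)
    .uniq { |f| [f.location, f.message] }
    .sort_by { |f| [SEVERITY_ORDER.fetch(f.severity), f.location] }
end

spec = [
  Finding.new(:spec, :important, "app/models/course.rb:10", "Extra column not in the spec")
]
conventions = [
  Finding.new(:conventions, :minor, "app/views/courses/_row.html.erb:1", "Partial used instead of a view component"),
  Finding.new(:conventions, :critical, "app/jobs/sync_job.rb:8", "Job is not idempotent")
]
quality = [
  Finding.new(:quality, :critical, "app/jobs/sync_job.rb:8", "Job is not idempotent")
]

list = consolidate([spec, conventions, quality])
list.each_with_index do |f, i|
  puts "#{i + 1}. [#{f.severity}] #{f.location} - #{f.message}"
end
```

The duplicate finding reported by two stages appears once, and the critical item comes first, so the author can work through the list from top to bottom.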