I Built ac-trace to Check What Tests Actually Protect

Published: March 9, 2026 at 10:45 AM EDT
5 min read
Source: Dev.to


# AI‑assisted coding is making one part of software development much faster than another

It is now easier than ever to generate implementation code, unit tests, fixtures, mocks, and even test structure.  
But while output is getting faster, confidence is not automatically getting deeper. In fact, the opposite can happen: the more quickly code and tests are produced, the easier it becomes to confuse visible testing activity with real protection.

That gap is exactly why I built **[ac‑trace](https://github.com/DmytroHuzz/ac-trace)**, a new open‑source tool.

---

## The core problem

Passing tests are often a weaker signal than teams think. Coverage is not enough either. A test suite can be green, a code path can be exercised, and the intended behavior can still be only weakly defended.

What I care about is not just whether code ran, or whether assertions passed. The harder question is this:

> **Are the acceptance criteria actually protected?**

### The problem: green tests do not prove much by themselves

In many teams, these ideas get blended together:

- tests are passing  
- code is covered  
- therefore the requirement is safe  

But those are different signals.

- A passing test tells you that **some expectation held in one scenario**.  
- Coverage tells you that **code executed**.  

Neither one, by itself, proves that the important business behavior is strongly defended against breakage.

This becomes more important with AI‑assisted coding.

AI is good at producing plausible implementations and plausible tests very quickly. That is useful, but it also lowers the cost of producing code that *looks* well‑tested. You get more test files, more green checks, more visible structure — and sometimes only shallow confidence underneath.
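As a toy illustration (hypothetical code written for this post, not from any real project), a generated test can execute the code, pass, and still assert almost nothing:

```python
def apply_discount(price: float, rate: float) -> float:
    """Apply a fractional discount to a price."""
    return price * (1 - rate)

def test_apply_discount():
    # Green, and it "covers" the function -- but it never checks
    # the actual discounted value, so almost any implementation passes.
    result = apply_discount(100.0, 0.2)
    assert result is not None
```

This test contributes coverage and a green check while defending essentially nothing about the discount behavior.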

---

## A concrete example

Imagine a billing service with this acceptance criterion:

> **Premium users must never be charged above their contractual monthly cap.**

Now imagine the code has:

- tests for invoice creation  
- tests for the premium‑user billing flow  
- good coverage around the billing function  

The relevant lines all execute. The pipeline is green. Looks fine.

But now remove the cap check, flip the comparison, or mutate the billing logic in a way that breaks the intended behavior.

**Do the tests fail?**  

If they do not, then the acceptance criterion was never really protected. The system had tests. The code was covered. But the thing that mattered was still weakly defended.
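To make the gap concrete, here is a hypothetical sketch (function names, tests, and the cap value are illustrative, not from a real codebase). The first test exercises the billing function and passes, but only with an amount below the cap, so removing the cap check entirely would not make it fail:

```python
MONTHLY_CAP = 100.0  # illustrative contractual cap

def charge_premium_user(amount: float) -> float:
    """Charge a premium user, never exceeding the monthly cap."""
    if amount > MONTHLY_CAP:
        return MONTHLY_CAP  # the cap check the criterion depends on
    return amount

def test_premium_billing_flow():
    # Exercises the billing function and passes, but only with an
    # amount below the cap. Delete the cap check entirely and this
    # test is still green.
    assert charge_premium_user(50.0) == 50.0

def test_cap_is_enforced():
    # This is the test that actually defends the criterion:
    # it fails the moment the cap check is removed.
    assert charge_premium_user(250.0) == MONTHLY_CAP
```

Only the second test gives the acceptance criterion teeth; the first one only gives it coverage.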

That is the gap I wanted to make more visible.

![schema](https://media2.dev.to/dynamic/image/width=800,height=,fit=scale-down,gravity=auto,format=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3mrdyj9j1754j74laft5.jpg)

---

## That is why I built **ac‑trace**

**ac‑trace** ([repo](https://github.com/DmytroHuzz/ac-trace)) is an open‑source tool that:

1. Maps acceptance criteria to code and tests.  
2. Mutates the mapped code to verify whether the tests actually catch the breakage.  

In plain terms:

> **It tries to answer whether the tests defend the behavior they are supposed to defend.**

This is not just traceability for documentation. The point is not only to show links between requirements, code, and tests, but also to test whether those links have *teeth*.

---

## How it works

The current workflow is intentionally simple:

1. **Define acceptance criteria**  
2. **Map them to relevant source code and tests**  
3. **Infer some links from annotated tests**  
4. **Mutate the mapped implementation**  
5. **Run the relevant tests**  
6. **Generate a report showing what failed and what survived**
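As a sketch of what the mapping step could look like, here is an illustrative YAML shape I made up for this post — it is **not** ac‑trace's actual manifest schema (see the repo for the real format):

```yaml
# Illustrative manifest only -- field names are hypothetical,
# not the actual ac-trace schema.
acceptance_criteria:
  - id: AC-1
    description: Premium users are never charged above their monthly cap.
    code:
      - billing/charges.py::charge_premium_user
    tests:
      - tests/test_billing.py::test_premium_cap_enforced
```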

So the flow is roughly:

acceptance criteria → code → tests → mutation → report


- If the mapped code is changed **and the linked tests fail**, that is a useful sign.  
- If the mapped code is changed **and the linked tests still pass**, that is also useful — it reveals a confidence gap that might otherwise stay hidden behind a green suite.
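The mutate-and-rerun loop is essentially mutation testing scoped to the mapped code. Here is a self-contained Python sketch of the idea — not ac‑trace's implementation, just the billing example from above reduced to its core:

```python
# Sketch of the mutate-and-rerun idea, not ac-trace's actual code.

def charge(amount: float, cap: float) -> float:
    """Original mapped implementation: enforces the cap."""
    return cap if amount > cap else amount

def charge_mutant(amount: float, cap: float) -> float:
    """Mutated implementation: the cap check has been removed."""
    return amount

def shallow_suite(fn) -> bool:
    """Linked tests that only exercise the below-cap path."""
    try:
        assert fn(50.0, 100.0) == 50.0
        return True
    except AssertionError:
        return False

def protecting_suite(fn) -> bool:
    """Linked tests that also defend the cap itself."""
    try:
        assert fn(50.0, 100.0) == 50.0
        assert fn(250.0, 100.0) == 100.0
        return True
    except AssertionError:
        return False

def verdict(suite, mutant) -> str:
    """'killed' means the tests caught the mutation; 'survived' is the gap."""
    return "killed" if not suite(mutant) else "survived"

print(verdict(shallow_suite, charge_mutant))     # survived -> confidence gap
print(verdict(protecting_suite, charge_mutant))  # killed -> criterion defended
```

Both suites are green against the original code; only the mutation run tells them apart.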

---

## Why this matters more now

AI‑assisted coding is not the problem by itself.  
The problem is that AI increases output faster than it increases justified confidence.

When implementation and tests both become cheap to generate, teams need better ways to distinguish between:

- code that looks tested  
- code that is covered  
- code whose important behavior is actually defended  

Without that distinction, it becomes very easy to over‑trust green pipelines. That is the broader reason for **ac‑trace**: a practical tool that pushes on this exact point.

---

## Current scope

**ac‑trace** is still early and intentionally narrow. Right now it focuses on:

- Python  
- pytest  
- YAML manifests  
- Inferred links from annotated tests  
- Generated reports  

I kept the scope small on purpose. I would rather build a narrow tool around one precise question than make broad claims too early. This is an experiment in making one software‑quality problem more concrete.
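The "inferred links from annotated tests" part suggests that tests carry a marker naming the criterion they defend. Here is a plain-Python stand-in for that idea — the `ac` decorator and its mechanism are mine for illustration, not ac‑trace's actual annotation syntax:

```python
# Plain-Python stand-in for a test annotation; ac-trace's real
# annotation mechanism may differ -- this only illustrates the idea.
def ac(criterion_id: str):
    def mark(test_fn):
        test_fn.acceptance_criterion = criterion_id
        return test_fn
    return mark

MONTHLY_CAP = 100.0

def charge_premium_user(amount: float) -> float:
    return min(amount, MONTHLY_CAP)

@ac("AC-1")  # a tool can infer the AC-1 -> test link from a tag like this
def test_premium_cap_enforced():
    assert charge_premium_user(250.0) == MONTHLY_CAP
```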

---

## Launch note

This post is also the announcement: **ac‑trace** is now open source → [github.com/DmytroHuzz/ac-trace](https://github.com/DmytroHuzz/ac-trace).

If you work on backend systems, care about software quality, or are thinking seriously about how AI changes testing and confidence, I think this problem is worth exploring.

I built **ac‑trace** because I kept coming back to the same thought:

> **Passing tests are useful, but they do not necessarily mean the acceptance criteria are protected.**

I want a more direct way to inspect that gap.

---

## Conclusion

**ac‑trace** is my open‑source attempt to make the gap between green tests and justified confidence more visible.

---

## Call to Action

Check out the repo, try it on a small Python project, and tell me where the idea is useful, naive, or worth pushing further.
