We benchmarked an 84% token reduction. Then we open sourced the protocol.
Source: Dev.to
I was watching an agent answer a simple question.
The question was small. Three sentences would have covered it. The agent loaded the page, parsed the HTML, walked through nav bars, footer links, cookie banners, a sticky “subscribe to our newsletter” modal, three paragraphs of preamble, and finally found the part it needed.
Twenty thousand tokens.
For three sentences.
This is happening everywhere right now. Quietly. Constantly. Every agent, every query, every page. We’ve handed agents a web that was built for human eyeballs and asked them to make it work.
It does… expensively.
The shape is wrong
The web was built for browsers. Humans scroll, scan, skip the boilerplate. Our eyes know what nav bars look like.
Agents don’t get that for free.
They read the whole thing, scaffolding and all: every header, every analytics script, every footer link in twelve languages. There’s no “give me the relevant part” channel. The cost is paid in tokens, latency, and the slightly absurd reality that an agent might burn more compute parsing your nav menu than thinking about your content.
This is a shape problem. No amount of optimization fixes a shape mismatch.
ACP: a shape, not a framework
Atomic Content Protocol (ACP) is an open spec for structured content envelopes.
- Not a framework.
- Not a platform.
- A shape.
You pre‑compute a compact, enriched representation of your content, persist it, and serve it first. The envelope sits in front of the body; it doesn’t replace it. The body is still there if anyone needs it—most of the time, agents don’t.
Built on top of MCP. Open spec, MIT licensed, npm package shipped. Designed to complement protocols already in motion, not replace them.
Example envelope
```json
{
  "id": "atom_7f3a...",
  "summary": "AI is the capability of computational systems to perform tasks associated with human intelligence...",
  "classification": "reference",
  "language": "en",
  "tags": [
    "artificial-intelligence",
    "machine-learning",
    "deep-learning",
    "neural-networks"
  ],
  "key_entities": [
    "OpenAI",
    "Google DeepMind",
    "Transformer Architecture",
    "AGI",
    "NLP"
  ],
  "confidence": 0.85,
  "provenance": {
    "tool": "acp-enricher",
    "version": "0.4.2",
    "generated_at": "2026-05-14T09:12:33Z"
  },
  "agent_discoverable": true,
  "body_ref": "https://en.wikipedia.org/wiki/Artificial_intelligence"
}
```
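For readers who want to consume envelopes from code, here is a minimal TypeScript sketch of the shape shown above, with a runtime check. The interface and the `isEnvelope` helper are assumptions inferred from this one example, not the official ACP type definitions; the real spec may define more fields or different constraints (for instance, the 0 to 1 range for `confidence` is a guess).

```typescript
// Hypothetical types inferred from the example envelope above.
// The actual ACP spec may differ.
interface Provenance {
  tool: string;
  version: string;
  generated_at: string; // ISO 8601 timestamp
}

interface Envelope {
  id: string;
  summary: string;
  classification: string;
  language: string;
  tags: string[];
  key_entities: string[];
  confidence: number; // assumed to lie in [0, 1]
  provenance: Provenance;
  agent_discoverable: boolean;
  body_ref: string; // URL of the full body
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Runtime check that an unknown JSON value has the shape shown above.
function isEnvelope(value: unknown): value is Envelope {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const p = v.provenance as Record<string, unknown> | null | undefined;
  return (
    typeof v.id === "string" &&
    typeof v.summary === "string" &&
    typeof v.classification === "string" &&
    typeof v.language === "string" &&
    isStringArray(v.tags) &&
    isStringArray(v.key_entities) &&
    typeof v.confidence === "number" &&
    v.confidence >= 0 &&
    v.confidence <= 1 &&
    typeof p === "object" &&
    p !== null &&
    typeof p.tool === "string" &&
    typeof p.version === "string" &&
    typeof p.generated_at === "string" &&
    !Number.isNaN(Date.parse(p.generated_at)) &&
    typeof v.agent_discoverable === "boolean" &&
    typeof v.body_ref === "string"
  );
}
```

An agent-side client could run `isEnvelope(JSON.parse(responseText))` before trusting any field, and fall back to the body when the check fails.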
Content gets broken into atoms—discrete units with stable IDs. An agent that needs one specific atom asks for that atom, not the whole page. That’s the whole idea. The simplicity does a lot of the work.
The pipeline
The enrichment runs asynchronously; we’re not blocking the write path or paying a real‑time tax. The flow:
- Content changes → trigger flips a `dirty` flag on the row.
- Queue worker picks it up out‑of‑band.
- Enricher generates the envelope (summary, tags, entities, classification).
- Envelope is persisted to the database.
- Agent requests come in → envelope served from cache; body fetched only if asked.
By the time an agent shows up, the envelope is waiting. No on‑demand computation, no LLM call in the request path—just a read.
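The flow above can be sketched in memory as follows. This is an illustration under stated assumptions, not Stacklist's implementation: `EnvelopePipeline`, `firstSentence`, and the row and envelope types are hypothetical names, the maps stand in for database tables, and the enricher is a trivial stub where a real one would generate summaries, tags, entities, and a classification.

```typescript
// In-memory sketch of: dirty flag -> queue -> enrich -> persist -> read.
// All names are hypothetical stand-ins for a database, a job queue, and an enricher.
interface Row {
  id: string;
  body: string;
  dirty: boolean;
}

interface StoredEnvelope {
  id: string;
  summary: string;
  generated_at: string;
}

class EnvelopePipeline {
  private rows = new Map<string, Row>();
  private queue: string[] = [];
  private envelopes = new Map<string, StoredEnvelope>();
  private enrich: (body: string) => string;

  constructor(enrich: (body: string) => string) {
    this.enrich = enrich;
  }

  // Write path: store the body, flip the dirty flag, enqueue. No enrichment here.
  write(id: string, body: string): void {
    this.rows.set(id, { id, body, dirty: true });
    if (!this.queue.includes(id)) this.queue.push(id);
  }

  // Worker: runs out-of-band, drains the queue, and persists envelopes.
  drain(): number {
    let processed = 0;
    let id: string | undefined;
    while ((id = this.queue.shift()) !== undefined) {
      const row = this.rows.get(id);
      if (!row || !row.dirty) continue;
      this.envelopes.set(id, {
        id,
        summary: this.enrich(row.body),
        generated_at: new Date().toISOString(),
      });
      row.dirty = false;
      processed++;
    }
    return processed;
  }

  // Read path: a plain lookup. The body is returned only when asked for.
  read(
    id: string,
    includeBody = false
  ): { envelope: StoredEnvelope | undefined; body: string | undefined } {
    const envelope = this.envelopes.get(id);
    const body = includeBody ? this.rows.get(id)?.body : undefined;
    return { envelope, body };
  }
}

// Stub enricher: keeps only the first sentence. A real enricher would do far more.
const firstSentence = (body: string): string => {
  const end = body.indexOf(". ");
  return end === -1 ? body : body.slice(0, end + 1);
};

const pipeline = new EnvelopePipeline(firstSentence);
pipeline.write("card_1", "ACP is an open spec. It describes envelopes.");
pipeline.drain(); // in production this would run on a worker, not inline
console.log(pipeline.read("card_1").envelope?.summary);
```

The point of the split is that `write` and `read` never call the enricher; only `drain` does, so neither the write path nor the request path waits on enrichment.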
Query modes
| Mode | Cost (tokens) | What you get |
|---|---|---|
| aco | 619 | Envelope only |
| full | 3,043 | Envelope + scraped body |
| both | 3,043 | Same as full |
Envelope‑only mode costs ~80 % less than full for the same query (619 vs. 3,043 tokens); put the other way, full costs nearly five times as much. Most of that extra cost is paying for HTML the agent didn’t need.
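The relationship between the figures in the table can be checked with a few lines of arithmetic. The helper names (`reduction`, `ratio`, `costTokens`) are hypothetical; the token counts are the ones from the table.

```typescript
// Token costs per query mode, taken from the table above.
type Mode = "aco" | "full" | "both";

const costTokens: Record<Mode, number> = { aco: 619, full: 3043, both: 3043 };

// Fraction of tokens saved by reading `smaller` instead of `larger`.
function reduction(larger: number, smaller: number): number {
  return 1 - smaller / larger;
}

// How many times as expensive `larger` is compared with `smaller`.
function ratio(larger: number, smaller: number): number {
  return larger / smaller;
}

const saved = reduction(costTokens.full, costTokens.aco);
const multiple = ratio(costTokens.full, costTokens.aco);

console.log(`${(saved * 100).toFixed(1)}% fewer tokens`); // 79.7% fewer tokens
console.log(`${multiple.toFixed(1)}x the cost`); // 4.9x the cost
```

Both numbers describe the same gap: reading only the envelope saves about 80 % of the tokens, which is the same as saying the full response costs roughly 4.9 times as much.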
And then we rebuilt our product around it
We didn’t just publish the spec.
We rewrote Stacklist around it.
Stacky, our MCP server, now serves ACO envelopes by default. Every card in Stacklist has an envelope sitting in front of it. The dirty‑flag → queue‑worker → persist pipeline runs in production. By the time an agent queries Stacky, the envelope is ready.
We did this because we needed to feel it. A spec describes a shape. A product has one. Those are different things, and you only learn the difference when you’re staring at a migration deciding whether the envelope is one column or its own table (it’s its own table—we tried both).
So Stacky now talks to agents the way we wish the web talked to agents. And we can actually measure what that costs—or doesn’t.
The numbers
Query: “Stacky, give me Wikipedia’s article on Artificial Intelligence.”
| What was read | Approx. tokens |
|---|---|
| Full body | ~25,000 |
| ACO envelope | ~350 |
Savings: ~99 %

That’s not a benchmark run in a notebook; it’s a real query against a real page through the real product, right now.
The pattern holds across content. On a broader 13‑item set:
- Full bodies: ~65,000 tokens
- Envelopes: ~2,800 tokens
Reduction: 84 %–93 % (depending on the document)
The savings aren’t marginal. They’re not a 10 % win you have to graph to see. They’re the kind of difference where the question stops being “is this worth doing?” and becomes “why weren’t we doing this all along?”
The part we haven’t solved
Here’s where I have to be honest.
Every envelope is stamped—tool, version, timestamp. You can see what produced an envelope and when. The provenance layer is real and it’s working.
But the envelope claims to faithfully represent the content underneath it. And “faithfully represents” is partly a technical statement and partly a social one.
What stops someone from publishing an envelope that says one thing while the body says another? What does adversarial enrichment look like? Who watches the enrichers? When an agent reads the envelope and skips the body—which is exactly the efficiency we want—what happens when the envelope is lying?
I don’t have a clean answer. There are partial ones: signed envelopes, verifiable enrichment chains, reputation layers. Each of those is real work, and each shifts the problem rather than solving it.
The honest version: we built a shape that makes the agent web meaningfully more efficient. We did not solve trust. We made it more visible, which is something, but visible isn’t the same as solved.
That’s the part I keep sitting with.
The efficiency is real. The shape works. The numbers hold up in benchmarks and in our own product. And underneath all of it is a question — what does “faithfully represents” mean when the reader has stopped checking? — that I think is the actual hard problem of the agent web, and I don’t think any of us have answered it yet.
So we’re going to keep building.
And keep sitting with it.
Both at the same time.
