We benchmarked an 84% token reduction. Then we open sourced the protocol.

Published: 3 weeks ago (May 18, 2026 at 06:12 PM EDT)

Source: Dev.to

I was watching an agent answer a simple question.

The question was small. Three sentences would have covered it. The agent loaded the page, parsed the HTML, walked through nav bars, footer links, cookie banners, a sticky “subscribe to our newsletter” modal, three paragraphs of preamble, and finally found the part it needed.

Twenty thousand tokens.

For three sentences.

This is happening everywhere right now. Quietly. Constantly. Every agent, every query, every page. We’ve handed agents a web that was built for human eyeballs and asked them to make it work.

It does… expensively.

The shape is wrong

The web was built for browsers. Humans scroll, scan, skip the boilerplate. Our eyes know what nav bars look like.

Agents don’t get that for free.

They read the whole thing, scaffolding and all: every header, every analytics script, every footer link in twelve languages. There’s no “give me the relevant part” channel. The cost is paid in tokens, latency, and the slightly absurd reality that an agent might burn more compute parsing your nav menu than thinking about your content.

This is a shape problem. No amount of optimization fixes a shape mismatch.

ACP: a shape, not a framework

The Atomic Content Protocol (ACP) is an open spec for structured content envelopes.

Not a framework.
Not a platform.
A shape.

You pre‑compute a compact, enriched representation of your content, persist it, and serve it first. The envelope sits in front of the body; it doesn’t replace it. The body is still there if anyone needs it—most of the time, agents don’t.

Built on top of MCP. Open spec, MIT licensed, npm package shipped. Designed to complement protocols already in motion, not replace them.

Example envelope

{
  "id": "atom_7f3a...",
  "summary": "AI is the capability of computational systems to perform tasks associated with human intelligence...",
  "classification": "reference",
  "language": "en",
  "tags": [
    "artificial-intelligence",
    "machine-learning",
    "deep-learning",
    "neural-networks"
  ],
  "key_entities": [
    "OpenAI",
    "Google DeepMind",
    "Transformer Architecture",
    "AGI",
    "NLP"
  ],
  "confidence": 0.85,
  "provenance": {
    "tool": "acp-enricher",
    "version": "0.4.2",
    "generated_at": "2026-05-14T09:12:33Z"
  },
  "agent_discoverable": true,
  "body_ref": "https://en.wikipedia.org/wiki/Artificial_intelligence"
}

Content gets broken into atoms—discrete units with stable IDs. An agent that needs one specific atom asks for that atom, not the whole page. That’s the whole idea. The simplicity does a lot of the work.
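To make the atom idea concrete, here is a minimal sketch of the envelope as a typed record plus a lookup by stable ID. The field names come from the example envelope above; the `Envelope`, `Provenance`, and `AtomStore` names and the in-memory store are assumptions for illustration, not part of the published spec or npm package.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    tool: str
    version: str
    generated_at: str  # ISO 8601 timestamp, kept as a plain string here


@dataclass(frozen=True)
class Envelope:
    id: str
    summary: str
    classification: str
    language: str
    tags: tuple
    key_entities: tuple
    confidence: float
    provenance: Provenance
    agent_discoverable: bool
    body_ref: str


def parse_envelope(raw: str) -> Envelope:
    """Build an Envelope from JSON text; raises KeyError if a field is missing."""
    data = json.loads(raw)
    return Envelope(
        id=data["id"],
        summary=data["summary"],
        classification=data["classification"],
        language=data["language"],
        tags=tuple(data["tags"]),
        key_entities=tuple(data["key_entities"]),
        confidence=float(data["confidence"]),
        provenance=Provenance(**data["provenance"]),
        agent_discoverable=bool(data["agent_discoverable"]),
        body_ref=data["body_ref"],
    )


class AtomStore:
    """Hypothetical in-memory index: one envelope per stable atom ID."""

    def __init__(self) -> None:
        self._atoms = {}

    def put(self, envelope: Envelope) -> None:
        self._atoms[envelope.id] = envelope

    def get(self, atom_id: str) -> Optional[Envelope]:
        return self._atoms.get(atom_id)


# Values copied from the example envelope; the elided ID is kept as written.
RAW = """{
  "id": "atom_7f3a...",
  "summary": "AI is the capability of computational systems to perform tasks associated with human intelligence...",
  "classification": "reference",
  "language": "en",
  "tags": ["artificial-intelligence", "machine-learning"],
  "key_entities": ["OpenAI", "Google DeepMind"],
  "confidence": 0.85,
  "provenance": {"tool": "acp-enricher", "version": "0.4.2", "generated_at": "2026-05-14T09:12:33Z"},
  "agent_discoverable": true,
  "body_ref": "https://en.wikipedia.org/wiki/Artificial_intelligence"
}"""

store = AtomStore()
store.put(parse_envelope(RAW))
atom = store.get("atom_7f3a...")  # one atom, not the whole page
```

The agent only ever touches `body_ref` if the envelope turns out not to be enough.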

The pipeline

The enrichment runs asynchronously; we’re not blocking the write path or paying a real‑time tax. The flow:

Content changes → trigger flips a dirty flag on the row.
Queue worker picks it up out‑of‑band.
Enricher generates the envelope (summary, tags, entities, classification).
Envelope is persisted to the database.
Agent requests come in → envelope served from cache; body fetched only if asked.

By the time an agent shows up, the envelope is waiting. No on‑demand computation, no LLM call in the request path—just a read.
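The five steps above can be sketched end to end with SQLite from the Python standard library. This is an illustration under assumptions, not the production pipeline: the table and column names, the `enrich` placeholder (which truncates the body instead of calling a model), and the single-process worker are all hypothetical.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE content (
    id    TEXT PRIMARY KEY,
    body  TEXT NOT NULL,
    dirty INTEGER NOT NULL DEFAULT 1   -- new rows start dirty
);
CREATE TABLE envelope (
    content_id TEXT PRIMARY KEY REFERENCES content(id),
    payload    TEXT NOT NULL           -- serialized envelope JSON
);
-- Step 1: a body change flips the dirty flag on the row.
CREATE TRIGGER content_dirty AFTER UPDATE OF body ON content
BEGIN
    UPDATE content SET dirty = 1 WHERE id = NEW.id;
END;
"""


def enrich(content_id: str, body: str) -> dict:
    """Placeholder enricher; a real one would produce summary, tags, entities."""
    return {"id": content_id, "summary": body[:80], "tags": [], "key_entities": []}


def run_worker(db: sqlite3.Connection) -> int:
    """Steps 2-4: pick up dirty rows, enrich them, persist, clear the flag."""
    rows = db.execute("SELECT id, body FROM content WHERE dirty = 1").fetchall()
    for content_id, body in rows:
        payload = json.dumps(enrich(content_id, body))
        db.execute(
            "INSERT OR REPLACE INTO envelope (content_id, payload) VALUES (?, ?)",
            (content_id, payload),
        )
        db.execute("UPDATE content SET dirty = 0 WHERE id = ?", (content_id,))
    db.commit()
    return len(rows)


def read_envelope(db: sqlite3.Connection, content_id: str):
    """Step 5: the request path is a plain read, with no enrichment call."""
    row = db.execute(
        "SELECT payload FROM envelope WHERE content_id = ?", (content_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO content (id, body) VALUES (?, ?)", ("atom_1", "First draft."))
processed_first = run_worker(db)

db.execute("UPDATE content SET body = ? WHERE id = ?", ("Second draft.", "atom_1"))
processed_second = run_worker(db)   # the trigger marked the row dirty again
current = read_envelope(db, "atom_1")
```

Because the trigger only fires on changes to `body`, clearing the flag in the worker does not mark the row dirty again. A real worker would also have to handle a body that changes while enrichment is in flight, which this sketch ignores.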

Query modes

Mode	Cost (tokens)	What you get
`aco`	619	Envelope only
`full`	3,043	Envelope + scraped body
`both`	3,043	Same as `full`

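The envelope-only mode costs ~80% less than the full version for the same query (619 vs. 3,043 tokens); put the other way, `full` costs nearly five times as much. Most of that extra cost is paying for HTML the agent didn’t need.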
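A rough sketch of how a server might dispatch on the three modes, with a quick check of the arithmetic behind the table. Only the mode names and the two token counts come from the table; the `respond` function, its response shape, and the lazy `fetch_body` callback are assumptions for illustration.

```python
from typing import Callable


def respond(mode: str, envelope: dict, fetch_body: Callable[[], str]) -> dict:
    """Return the envelope alone, or the envelope plus the body on request."""
    if mode == "aco":
        return {"envelope": envelope}           # body is never fetched
    if mode in ("full", "both"):                # the table lists these as equivalent
        return {"envelope": envelope, "body": fetch_body()}
    raise ValueError(f"unknown mode: {mode!r}")


def reduction(full_tokens: int, compact_tokens: int) -> float:
    """Fraction of tokens saved by reading the compact form instead."""
    return 1 - compact_tokens / full_tokens


calls = []


def fetch_body() -> str:
    calls.append("fetched")                     # records each body fetch
    return "<html>...</html>"


envelope = {"id": "atom_1", "summary": "Short summary."}
compact = respond("aco", envelope, fetch_body)
full = respond("full", envelope, fetch_body)

savings = reduction(3043, 619)   # roughly 0.80: the envelope costs about 80% less
ratio = 3043 / 619               # roughly 4.9: full costs nearly five times as much
```

Passing the body as a callback rather than a value keeps the `aco` path free of any fetch, which is where the saving comes from.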

And then we rebuilt our product around it

We didn’t just publish the spec.

We rewrote Stacklist around it.

Stacky, our MCP server, now serves ACO envelopes by default. Every card in Stacklist has an envelope sitting in front of it. The dirty‑flag → queue‑worker → persist pipeline runs in production. By the time an agent queries Stacky, the envelope is ready.

We did this because we needed to feel it. A spec describes a shape. A product has one. Those are different things, and you only learn the difference when you’re staring at a migration deciding whether the envelope is one column or its own table (it’s its own table—we tried both).

So Stacky now talks to agents the way we wish the web talked to agents. And we can actually measure what that costs—or doesn’t.

The numbers

Query: “Stacky, give me Wikipedia’s article on Artificial Intelligence.”

What was read	Approx. tokens
Full body	~25,000
ACO envelope	~350

Savings: ~99%

That’s not a benchmark run in a notebook; it’s a real query against a real page through the real product, right now.

The pattern holds across content. On a broader 13‑item set:

Full bodies: ~65,000 tokens
Envelopes: ~2,800 tokens

Reduction: 84%–93% (depending on the document)

The savings aren’t marginal. They’re not a 10% win you have to graph to see. They’re the kind of difference where the question stops being “is this worth doing?” and becomes “why weren’t we doing this all along?”

The part we haven’t solved

Here’s where I have to be honest.

Every envelope is stamped—tool, version, timestamp. You can see what produced an envelope and when. The provenance layer is real and it’s working.

But the envelope claims to faithfully represent the content underneath it. And “faithfully represents” is partly a technical statement and partly a social one.

What stops someone from publishing an envelope that says one thing while the body says another? What does adversarial enrichment look like? Who watches the enrichers? When an agent reads the envelope and skips the body—which is exactly the efficiency we want—what happens when the envelope is lying?

I don’t have a clean answer. There are partial ones: signed envelopes, verifiable enrichment chains, reputation layers. Each of those is real work, and each shifts the problem rather than solving it.
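As one illustration of the "signed envelopes" idea, here is a minimal sketch that binds an envelope to a hash of its body and signs the result. Everything in it is an assumption: the `body_sha256` and `signature` fields are hypothetical and not part of the spec, and HMAC with a shared secret stands in for the public-key signatures a real deployment would more likely need, since the Python standard library has no asymmetric signing.

```python
import hashlib
import hmac
import json


def canonical(envelope: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")


def body_digest(body: str) -> str:
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def sign_envelope(envelope: dict, body: str, key: bytes) -> dict:
    """Return a copy with a body hash and an HMAC over everything else."""
    signed = dict(envelope)
    signed["body_sha256"] = body_digest(body)   # hypothetical field
    signed["signature"] = hmac.new(key, canonical(signed), hashlib.sha256).hexdigest()
    return signed


def verify_envelope(signed: dict, body: str, key: bytes) -> bool:
    """Check that the envelope is untampered and still matches this body."""
    unsigned = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(key, canonical(unsigned), hashlib.sha256).hexdigest()
    sig_ok = hmac.compare_digest(signed.get("signature", ""), expected)
    body_ok = hmac.compare_digest(unsigned.get("body_sha256", ""), body_digest(body))
    return sig_ok and body_ok


key = b"demo-secret"   # placeholder key for the example only
body = "Full article text."
signed = sign_envelope({"id": "atom_1", "summary": "A short summary."}, body, key)

ok = verify_envelope(signed, body, key)
tampered = verify_envelope({**signed, "summary": "Something else."}, body, key)
drifted = verify_envelope(signed, body + " Edited later.", key)
```

This catches an envelope edited after signing and a body that has changed since enrichment. It says nothing about whether the summary was faithful when it was signed, which is the shifted problem described above, and checking the body hash means fetching the body, which gives back part of the savings.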

The honest version: we built a shape that makes the agent web meaningfully more efficient. We did not solve trust. We made it more visible, which is something, but visible isn’t the same as solved.

That’s the part I keep sitting with.

The efficiency is real. The shape works. The numbers hold up in benchmarks and in our own product. And underneath all of it is a question — what does “faithfully represents” mean when the reader has stopped checking? — that I think is the actual hard problem of the agent web, and I don’t think any of us have answered it yet.

So we’re going to keep building.

And keep sitting with it.

Both at the same time.
