Context Engineering Has a Blind Spot
Source: Dev.to
The biggest shift in agent design over the past year has come from context engineering, not from better models.
Most of the published guidance focuses on codebases, documentation, and structured knowledge bases, and it’s good guidance.
But there’s a category of enterprise data that breaks every standard context‑engineering pattern, and almost nobody is writing about it: email.
Why email is different from everything else
When Google’s ADK team writes about context engineering (link), they describe a pipeline:
- Ingest data
- Compile a view
- Serve it to the model
When Anthropic describes it (link), they talk about curating tokens for maximum utility.
Both assume the source data has some structural integrity to work with, because:
| Data type | Structural cues |
|---|---|
| Codebase | Files, functions, imports |
| Knowledge base | Documents, authors, dates |
| Slack | Channels, timestamps |
Email has none of that.
- A 20‑reply business thread contains the same quoted text duplicated up to 20 times.
- Every email client quotes differently (Gmail uses `>` prefixes, Outlook uses indentation, Apple Mail wraps quotes in HTML blockquotes).
- Forwarded chains collapse three separate conversations into a single message body with no structural separator.
- Inline replies break every deduplication pattern because someone typed new content between quoted blocks.
- The most critical information—e.g., the PDF with contract terms or the invoice that needs reconciling—is sitting in an attachment that most context pipelines never touch.
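Even the simplest of these problems, stripping `>`-prefixed quoted text while preserving inline replies, takes care. The sketch below is a deliberately naive heuristic (invented example body, plain-text Gmail-style quoting only); it already fails on Outlook's indentation and HTML blockquotes, which is the point:

```python
def strip_quoted(body: str) -> str:
    """Naive quoted-text stripper for plain-text, '>'-quoted replies.

    Keeps lines the sender actually typed, drops '>'-prefixed quotes and
    the 'On ... wrote:' attribution line. Illustrative only: it does not
    handle Outlook-style indentation or HTML blockquote quoting.
    """
    kept = []
    for line in body.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(">"):
            continue  # quoted history
        if stripped.startswith("On ") and stripped.rstrip().endswith("wrote:"):
            continue  # quote attribution header
        kept.append(line)
    return "\n".join(kept).strip()

body = """Can we move the call?
On Tue, Mar 4, 2025, Dana wrote:
> Let's meet Thursday.
> Agenda attached.
Inline: Thursday works, but only after 2pm."""
print(strip_quoted(body))
```

Note what survives: the new message and the inline reply typed between quoted blocks, which is exactly the content a pipeline must not throw away.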
This is where a huge amount of enterprise context actually lives: not in the CRM fields or the wiki, but in the messy, unstructured communication data where business actually happens.
What breaks at enterprise scale
The reason this matters isn’t that one agent can’t parse one email thread. It’s what happens when you try to run context engineering across an organization’s entire communication history.
Finance
- A finance team closing the books at month‑end needs to reconcile invoices against purchase‑order approvals across hundreds of vendors.
- Invoices arrive as PDF attachments; approvals live in email threads scattered across 15 people’s inboxes, often buried in a reply that says “approved, go ahead” with no formal record in any system.
- An agent running multi‑hop search over this data makes one retrieval call, gets a fragment, reformulates, searches again, and by hop 5 it’s burning 40 k tokens on a single vendor reconciliation.
Multiply that by 300 vendors and you’ve spent more on token costs than the finance team’s monthly payroll, with accuracy degrading on every query because each hop compounds the noise from the previous one.
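The compounding is easy to model. Under the illustrative assumption that each hop retrieves the right fragment 90% of the time, and every hop must succeed for the final answer to be grounded:

```python
# Illustrative error-compounding model for multi-hop retrieval.
# per_hop_precision is an assumed figure, not a measured benchmark.
per_hop_precision = 0.9

for hops in (1, 3, 5):
    end_to_end = per_hop_precision ** hops
    print(f"{hops} hops -> {end_to_end:.2f} end-to-end precision")
```

By hop 5 the loop is ungrounded roughly four times in ten, before accounting for the quoted-text noise each hop feeds into the next.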
Compliance
- A compliance team monitoring regulatory commitments must scan 50 k threads per month for obligations that were agreed to in email and never entered into a tracking system.
- Commitments aren’t labeled; they’re buried in sentences like “we can do that by Q3” inside a 30‑reply thread where the first 20 messages were about something else entirely.
- A multi‑hop agent searching for “regulatory commitments” returns threads that mention regulations, not threads that contain actual commitments.
The semantic gap between what the agent searches for and what the data looks like structurally is exactly where context engineering is supposed to help—and where standard approaches fail on email.
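A toy illustration of that gap, with two invented thread snippets and a deliberately crude regex standing in for "promise-shaped language" detection (not a production detector):

```python
import re

threads = [
    # Mentions the topic words, contains no commitment:
    "Attached is the new regulatory commitments tracking template.",
    # Contains a real commitment, never uses the topic words:
    "Per the audit discussion: we can have the remediation done by Q3.",
]

# What a naive topical query effectively does: match the topic words.
keyword_hits = [t for t in threads if "regulatory commitment" in t.lower()]

# What the compliance team actually needs: promise-shaped language.
promise = re.compile(r"\bwe (can|will|plan to|commit to)\b.*\bby q[1-4]\b", re.I)
promise_hits = [t for t in threads if promise.search(t)]

print(keyword_hits)  # finds the template email, misses the real commitment
print(promise_hits)  # finds the real commitment
```

Real commitment extraction needs far more than a regex, but the mismatch it demonstrates is the structural one: topical similarity is not the same thing as the speech act you are searching for.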
Sales
A sales organization running deal‑risk scoring across 200 active opportunities needs to detect signals that only exist in email patterns:
- The champion going quiet for two weeks
- Procurement entering a thread where they weren’t before
- Reply latency increasing
- Tone shifting from collaborative to transactional

None of this shows up in the CRM, which says the deal is “Stage 3, on track” while the email thread says the deal is dying.
An agent that can’t reason over the full communication history with participant attribution, temporal ordering, and cross‑thread awareness will miss every one of these signals—and miss them confidently.
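The first two signals are mechanical once that attribution and ordering exist. A toy sketch with invented addresses and dates, assuming the pipeline already supplies a clean (timestamp, sender) log per opportunity:

```python
from datetime import datetime, timedelta

# Hypothetical message log for one opportunity: (timestamp, sender).
# Addresses and dates are invented for illustration.
messages = [
    (datetime(2025, 3, 3), "champion@buyer.example"),
    (datetime(2025, 3, 5), "ae@seller.example"),
    (datetime(2025, 3, 6), "champion@buyer.example"),
    (datetime(2025, 3, 20), "procurement@buyer.example"),
    (datetime(2025, 3, 21), "ae@seller.example"),
]
today = datetime(2025, 3, 28)

# Signal 1: the champion has gone quiet for two weeks or more.
last_champion = max(t for t, s in messages if s.startswith("champion"))
champion_quiet = (today - last_champion) >= timedelta(days=14)

# Signal 2: procurement has appeared in the thread.
procurement_entered = any(s.startswith("procurement") for _, s in messages)

print(champion_quiet, procurement_entered)
```

The detection logic is trivial; what is not trivial is producing that clean log from raw, quoted, cross-threaded email, which is the whole argument of this piece.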
The architectural gap
Standard context engineering assumes you can compile a useful view of your data at query time. For email at enterprise scale, this doesn’t hold because the preprocessing required to make email useful is too expensive and too complex to do per‑query.
- Thread reconstruction
- Quoted‑text deduplication
- Participant attribution
- Attachment extraction
- Temporal ordering across threads that reference each other
All of this work needs to happen once at index time, not repeatedly inside an agent loop.
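Thread reconstruction, for instance, leans on RFC 5322's `Message-ID` and `In-Reply-To` headers when they survive. A minimal index-time sketch with invented message IDs (real pipelines also walk the `References` header and fall back to subject heuristics, because clients drop these headers):

```python
from collections import defaultdict

# Invented messages with RFC 5322-style threading headers.
emails = [
    {"id": "<a1@x>", "in_reply_to": None,     "subject": "Q1 invoices"},
    {"id": "<b2@x>", "in_reply_to": "<a1@x>", "subject": "Re: Q1 invoices"},
    {"id": "<c3@x>", "in_reply_to": "<b2@x>", "subject": "Re: Q1 invoices"},
    {"id": "<d4@x>", "in_reply_to": None,     "subject": "PO approval"},
]

def thread_roots(emails):
    """Group every message under the root of its reply chain."""
    parent = {e["id"]: e["in_reply_to"] for e in emails}

    def root(mid):
        while parent.get(mid):
            mid = parent[mid]
        return mid

    threads = defaultdict(list)
    for e in emails:
        threads[root(e["id"])].append(e["id"])
    return dict(threads)

print(thread_roots(emails))
# Two threads: one rooted at <a1@x>, one at <d4@x>
```

Run once at index time, this gives every downstream query a stable thread ID instead of forcing the agent to rediscover the chain on every hop.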
Index‑time vs. query‑time
| Aspect | Index‑time processing | Query‑time (multi‑hop) processing |
|---|---|---|
| Latency | Predictable, single retrieval call | Variable (10–60 s) depending on thread complexity |
| Cost | Fixed (one-time) | Scales with messiness; hardest queries are most expensive |
| Accuracy | Consistent (same query → same result) | Variable; error compounds across hops |
| Agent workload | Receives pre‑assembled context | Must reconstruct conversation, attribute speakers, separate current vs. quoted text, and answer the question—all in one loop |
The agent is simultaneously trying to:
- Reconstruct the conversation
- Figure out who said what
- Determine what’s current versus quoted history
- Answer the actual question
That’s four hard problems stacked together.
What index‑time context engineering looks like
The work that makes email usable for agents boils down to a few things that need to happen once, not per‑query:
- Reconstruct threads – link related messages across inboxes and folders.
- Strip quoted text – remove duplicated content while preserving any new inline replies.
- Attribute speakers – map each sentence or paragraph to the correct participant.
- Read attachments – extract text (and optionally tables, images, etc.) from PDFs, Word docs, spreadsheets, etc.
Then index all of it with semantic and structural metadata, scoped per‑user so one person’s agent can’t surface another person’s data.
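The output of that pipeline is, roughly, a set of index records carrying the cleaned text plus the structural metadata. The field names below are invented for this sketch, not iGPT's actual schema:

```python
# Illustrative shape of an index-time record; every field name and
# value here is invented for the sketch, not a real product schema.
record = {
    "owner": "user_123",           # per-user scope, enforced at query time
    "thread_id": "thr_0042",
    "message_id": "<b2@x>",
    "speaker": "dana@vendor.example",
    "sent_at": "2025-03-05T14:02:00Z",
    "text": "Approved, go ahead with the March invoice.",
    "is_quoted": False,            # survived quoted-text stripping
    "attachment_texts": ["Invoice #8841, total $12,400.00"],
}

def visible_to(records, user):
    """Scope filter: a user's agent only ever sees its own records."""
    return [r for r in records if r["owner"] == user]

print(visible_to([record], "user_123"))  # returns the record
print(visible_to([record], "user_456"))  # returns nothing
```

The point of the `owner` field is that scoping is a property of the index, not a prompt-level instruction the agent might ignore.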
Most teams skip this and go straight to multi‑hop search, which works in demos and breaks in production at exactly the scale where the business case justifies the investment.
We build this infrastructure at iGPT, where a developer sends one API call and gets back structured, reasoning‑ready context with source citations—no loops, retries, or per‑query preprocessing.
```python
from igptai import IGPT

client = IGPT(api_key="...", user="user_123")

result = client.recall.ask(
    input="Reconcile Q1 invoices from Apex Logistics, flag PO mismatches",
    quality="cef-1-normal",
    output_format="json",
)
# Structured JSON: vendor, invoice amounts, PO deltas, source email citations
```
The industry is right to focus on context, but most implementations assume the data is already usable, and email isn’t.
If your agent is reasoning over email without fixing that first, it’s not failing because the model is weak—it’s failing because the context never made sense in the first place.
Docs:
SDK: pip install igptai