Context Engineering Has a Blind Spot

Published: March 18, 2026 at 06:27 AM EDT
6 min read
Source: Dev.to

The biggest shift in agent design over the past year has been context engineering, not improved models.

Most of the published guidance focuses on codebases, documentation, and structured knowledge bases, and it’s good guidance.

But there’s a category of enterprise data that breaks every standard context‑engineering pattern, and almost nobody is writing about it: email.


Why email is different from everything else

When Google’s ADK team writes about context engineering (link), they describe a pipeline:

  1. Ingest data
  2. Compile a view
  3. Serve it to the model

When Anthropic describes it (link), they talk about curating tokens for maximum utility.

Both assume the source data has some structural integrity to work with, because:

| Data type | Structural cues |
| --- | --- |
| Codebase | Files, functions, imports |
| Knowledge base | Documents, authors, dates |
| Slack | Channels, timestamps |

Email has none of that.

  • A 20‑reply business thread contains the same quoted text duplicated up to 20 times.
  • Every email client quotes differently (Gmail uses > prefixes, Outlook uses indentation, Apple Mail wraps quotes in `<blockquote>` HTML).
  • Forwarded chains collapse three separate conversations into a single message body with no structural separator.
  • Inline replies break every deduplication pattern because someone typed new content between quoted blocks.
  • The most critical information—e.g., the PDF with contract terms or the invoice that needs reconciling—is sitting in an attachment that most context pipelines never touch.
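
The quoting problem can be made concrete with a minimal sketch that strips Gmail-style `>`-prefixed quotes while keeping newly typed inline text. The function name and heuristics are illustrative, not a production parser; each client convention (Outlook indentation, Apple Mail HTML quoting) needs its own handling.

```python
def strip_gmail_quotes(body: str) -> str:
    """Remove '>'-prefixed quoted lines, keeping new inline replies.

    Naive sketch: only handles the Gmail plain-text convention.
    """
    kept = []
    for line in body.splitlines():
        # Quoted lines start with one or more '>' characters.
        if line.lstrip().startswith(">"):
            continue
        # Drop the "On <date>, <sender> wrote:" attribution line too.
        if line.lstrip().startswith("On ") and line.rstrip().endswith("wrote:"):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


reply = (
    "Sounds good, ship it.\n\n"
    "On Mon, Jan 5, Dana wrote:\n"
    "> Can we ship Friday?\n"
    "> > Original plan was next week."
)
print(strip_gmail_quotes(reply))  # -> Sounds good, ship it.
```

Note what inline replies do to this: a new sentence typed *between* two quoted blocks survives, but so does the ambiguity of who wrote it, which is why attribution has to be a separate step.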

This is where a huge amount of enterprise context actually lives: not in the CRM fields or the wiki, but in the messy, unstructured communication data where business actually happens.


What breaks at enterprise scale

The reason this matters isn’t that one agent can’t parse one email thread. It’s what happens when you try to run context engineering across an organization’s entire communication history.

Finance

  • A finance team closing the books at month‑end needs to reconcile invoices against purchase‑order approvals across hundreds of vendors.
  • Invoices arrive as PDF attachments; approvals live in email threads scattered across 15 people’s inboxes, often buried in a reply that says “approved, go ahead” with no formal record in any system.
  • An agent running multi‑hop search over this data makes one retrieval call, gets a fragment, reformulates, searches again, and by hop 5 it’s burning 40 k tokens on a single vendor reconciliation.

Multiply that by 300 vendors and you’ve spent more on token costs than the finance team’s monthly payroll, with accuracy degrading on every query because each hop compounds the noise from the previous one.

Compliance

  • A compliance team monitoring regulatory commitments must scan 50 k threads per month for obligations that were agreed to in email and never entered into a tracking system.
  • Commitments aren’t labeled; they’re buried in sentences like “we can do that by Q3” inside a 30‑reply thread where the first 20 messages were about something else entirely.
  • A multi‑hop agent searching for “regulatory commitments” returns threads that mention regulations, not threads that contain actual commitments.

The semantic gap between what the agent searches for and what the data looks like structurally is exactly where context engineering is supposed to help—and where standard approaches fail on email.
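
A toy lexical-retrieval sketch shows the gap: the thread that merely *mentions* regulations matches the query, while the thread containing the actual commitment shares no tokens with it. The thread text is invented for illustration.

```python
query = {"regulatory", "commitments"}

threads = {
    "t1": "Summary of new regulatory commitments guidance from the FCA.",
    "t2": "Re: audit scope, we can do that by Q3.",
}

# Naive keyword overlap, standing in for lexical retrieval.
matches = {
    tid: bool(query & set(text.lower().replace(",", "").replace(".", "").split()))
    for tid, text in threads.items()
}
print(matches)  # -> {'t1': True, 't2': False}
# t1 is a mention, not a commitment; t2 is the commitment, and it's missed.
```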

Sales

  • A sales organization running deal‑risk scoring across 200 active opportunities needs to detect signals that only exist in email patterns:

    • The champion going quiet for two weeks
    • Procurement entering a thread where they weren’t before
    • Reply latency increasing
    • Tone shifting from collaborative to transactional
  • None of this shows up in the CRM, which says the deal is “Stage 3, on track” while the email thread says the deal is dying.

An agent that can’t reason over the full communication history with participant attribution, temporal ordering, and cross‑thread awareness will miss every one of these signals—and miss them confidently.
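
One of those signals, rising reply latency, is easy to sketch once the thread is reconstructed with temporal ordering. The function and threshold logic are illustrative; real scoring would segment by participant and normalize for weekends and time zones.

```python
from datetime import datetime, timedelta


def reply_latency_trend(timestamps: list[datetime]) -> float:
    """Ratio of the average reply gap in the later half of a thread
    to the earlier half. A ratio well above 1 means replies are
    slowing down. Sketch only."""
    gaps = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    mid = len(gaps) // 2
    early = sum(gaps[:mid]) / mid
    late = sum(gaps[mid:]) / len(gaps[mid:])
    return late / early


t0 = datetime(2026, 3, 2)
# Replies daily at first, then every four days.
ts = [t0 + timedelta(days=d) for d in (0, 1, 2, 6, 10, 14)]
print(round(reply_latency_trend(ts), 1))  # -> 4.0
```

None of this is computable without the index-time work: the timestamps only mean something after threads are reconstructed across inboxes.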


The architectural gap

Standard context engineering assumes you can compile a useful view of your data at query time. For email at enterprise scale, this doesn’t hold because the preprocessing required to make email useful is too expensive and too complex to do per‑query.

  • Thread reconstruction
  • Quoted‑text deduplication
  • Participant attribution
  • Attachment extraction
  • Temporal ordering across threads that reference each other

All of this work needs to happen once at index time, not repeatedly inside an agent loop.

Index‑time vs. query‑time

| Aspect | Index-time processing | Query-time (multi-hop) processing |
| --- | --- | --- |
| Latency | Predictable, single retrieval call | Variable (10–60 s) depending on thread complexity |
| Cost | Fixed (one-time) | Scales with messiness; hardest queries are most expensive |
| Accuracy | Consistent (same query → same result) | Variable; error compounds across hops |
| Agent workload | Receives pre-assembled context | Must reconstruct the conversation, attribute speakers, separate current vs. quoted text, and answer the question, all in one loop |

The agent is simultaneously trying to:

  1. Reconstruct the conversation
  2. Figure out who said what
  3. Determine what’s current versus quoted history
  4. Answer the actual question

That’s four hard problems stacked together.


What index‑time context engineering looks like

The work that makes email usable for agents boils down to a few things that need to happen once, not per‑query:

  1. Reconstruct threads – link related messages across inboxes and folders.
  2. Strip quoted text – remove duplicated content while preserving any new inline replies.
  3. Attribute speakers – map each sentence or paragraph to the correct participant.
  4. Read attachments – extract text (and optionally tables, images, etc.) from PDFs, Word docs, spreadsheets, etc.

Then index all of it with semantic and structural metadata, scoped per‑user so one person’s agent can’t surface another person’s data.
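
A minimal sketch of what an index-time record might carry after those four steps. The field names are illustrative, not any particular product's schema; the point is that thread linkage, attribution, clean text, and attachment text are resolved once, before any agent query runs.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedMessage:
    """One deduplicated message after index-time processing."""
    thread_id: str        # reconstructed thread, linked across inboxes/folders
    message_id: str
    speaker: str          # attributed participant, not just a From: header
    sent_at: str          # ISO timestamp for temporal ordering
    clean_body: str       # quoted text stripped, inline replies preserved
    attachment_text: list[str] = field(default_factory=list)  # extracted PDFs etc.
    owner_user: str = ""  # per-user scoping so agents can't cross inboxes


msg = IndexedMessage(
    thread_id="t-apex-q1",
    message_id="m-102",
    speaker="dana@apex.example",
    sent_at="2026-03-02T09:14:00Z",
    clean_body="Approved, go ahead.",
    owner_user="user_123",
)
print(msg.speaker, "->", msg.clean_body)
```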

Most teams skip this and go straight to multi‑hop search, which works in demos and breaks in production at exactly the scale where the business case justifies the investment.

We build this infrastructure at iGPT, where a developer sends one API call and gets back structured, reasoning‑ready context with source citations—no loops, retries, or per‑query preprocessing.

```python
from igptai import IGPT

client = IGPT(api_key="...", user="user_123")
result = client.recall.ask(
    input="Reconcile Q1 invoices from Apex Logistics, flag PO mismatches",
    quality="cef-1-normal",
    output_format="json",
)
# Structured JSON: vendor, invoice amounts, PO deltas, source email citations
```

The industry is right to focus on context, but most implementations assume the data is already usable, and email isn’t.

If your agent is reasoning over email without fixing that first, it’s not failing because the model is weak—it’s failing because the context never made sense in the first place.

Docs:
SDK: pip install igptai

