Architecting efficient context-aware multi-agent framework for production

Published: December 4, 2025 at 12:16 PM EST
4 min read

Source: Google Developers Blog

The landscape of AI agent development is shifting fast. We’ve moved beyond prototyping single‑turn chatbots. Today, organizations are deploying sophisticated, autonomous agents to handle long‑horizon tasks: automating workflows, conducting deep research, and maintaining complex codebases.

That ambition immediately runs into a bottleneck: context.

As agents run longer, the amount of information they need to track—chat history, tool outputs, external documents, intermediate reasoning—explodes. The prevailing “solution” has been to lean on ever‑larger context windows in foundation models. But simply giving agents more room to paste text cannot be the only scaling strategy.

To build production‑grade agents that are reliable, efficient, and debuggable, the industry is exploring a new discipline:

Context engineering — treating context as a first‑class system with its own architecture, lifecycle, and constraints.

Based on our experience scaling complex single‑ and multi‑agent systems, we designed and evolved the context stack in the Google Agent Development Kit (ADK) to support that discipline. ADK is an open‑source, multi‑agent‑native framework built to make active context engineering achievable in real systems.

The scaling bottleneck

A large context window helps with some context‑related problems, but it won’t address all of them. In practice, the naive pattern of appending everything into one giant prompt collapses under three compounding pressures:

  • Cost and latency spirals: Model cost and time‑to‑first‑token grow quickly with context size. “Shoveling” raw history and verbose tool payloads into the window makes agents prohibitively slow and expensive.
  • Signal degradation (“lost in the middle”): A context window flooded with irrelevant logs, stale tool outputs, or deprecated state can distract the model, causing it to fixate on past patterns rather than the immediate instruction. To ensure robust decision‑making, we must maximize the density of relevant information.
  • Physical limits: Real‑world workloads—involving full RAG results, intermediate artifacts, and long conversation traces—eventually overflow even the largest fixed windows.

Throwing more tokens at the problem buys time, but it doesn’t change the shape of the curve. To scale, we need to change how context is represented and managed, not just how much of it we can cram into a single call.

The design thesis: context as a compiled view

In the previous generation of agent frameworks, context was treated like a mutable string buffer. ADK is built around a different thesis:

Context is a compiled view over a richer stateful system.

In that view:

  • Sessions, memory, and artifacts (files) are the sources – the full, structured state of the interaction and its data.
  • Flows and processors are the compiler pipeline – a sequence of passes that transform that state.
  • The working context is the compiled view you ship to the LLM for this one invocation.

Once you adopt this mental model, context engineering stops being prompt gymnastics and starts looking like systems engineering. You are forced to ask standard systems questions: What is the intermediate representation? Where do we apply compaction? How do we make transformations observable?
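To make the compiler analogy concrete, here is a minimal sketch of context compilation as a pipeline of passes. All names here (SourceState, WorkingContext, compile_context, the individual passes) are hypothetical illustrations of the idea, not ADK’s actual API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SourceState:
    """The 'sources': durable session events, memory hits, artifact refs."""
    events: list[str]
    memory_hits: list[str]
    artifact_refs: list[str]

@dataclass
class WorkingContext:
    """The 'compiled view' shipped to the LLM for one invocation."""
    messages: list[str] = field(default_factory=list)

# A 'pass' is a named, ordered transformation over the working context.
Pass = Callable[[SourceState, WorkingContext], WorkingContext]

def select_recent_events(state: SourceState, ctx: WorkingContext) -> WorkingContext:
    # Compaction pass: keep only the recent slice of the event log.
    ctx.messages.extend(state.events[-20:])
    return ctx

def attach_memory(state: SourceState, ctx: WorkingContext) -> WorkingContext:
    # Enrichment pass: fold in retrieved long-term memory, clearly labeled.
    ctx.messages.extend(f"[memory] {m}" for m in state.memory_hits)
    return ctx

def compile_context(state: SourceState, passes: list[Pass]) -> WorkingContext:
    """Run the ordered pipeline; each pass is observable in isolation."""
    ctx = WorkingContext()
    for p in passes:
        ctx = p(state, ctx)
    return ctx
```

The intermediate representation is explicit (WorkingContext), and each compaction or enrichment step is a named function you can log, reorder, or swap out.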

ADK’s architecture answers these questions via three design principles:

  • Separate storage from presentation: We distinguish between durable state (Sessions) and per‑call views (working context). This allows you to evolve storage schemas and prompt formats independently.
  • Explicit transformations: Context is built through named, ordered processors, not ad‑hoc string concatenation. This makes the “compilation” step observable and testable, as in the sketch after this list.
  • Scope by default: Every model call and sub‑agent sees the minimum context required. Agents must reach for more information explicitly via tools, rather than being flooded by default.
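Because each transformation is a named pass rather than string concatenation, it can be unit‑tested in isolation. A sketch, reusing the hypothetical types from the pipeline example above:

```python
def test_select_recent_events_caps_history():
    state = SourceState(
        events=[f"event {i}" for i in range(100)],
        memory_hits=[],
        artifact_refs=[],
    )
    ctx = select_recent_events(state, WorkingContext())
    assert len(ctx.messages) == 20           # only the recent slice survives
    assert ctx.messages[-1] == "event 99"    # ordering is preserved
```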

ADK’s tiered structure, its relevance mechanisms, and its multi‑agent handoff semantics are essentially an application of this “compiler” thesis and the three principles:

  • Structure – a tiered model that separates how information is stored from what the model sees.
  • Relevance – agentic and human controls that decide what matters now.
  • Multi‑agent context – explicit semantics for handing off the right slice of context between agents.

The next sections walk through each of these pillars in turn.

1. Structure: The tiered model

Most early agent systems implicitly assume a single window of context. ADK goes the other way. It separates storage from presentation and organizes context into distinct layers, each with a specific job (a wiring sketch follows the list):

  • Working context – the immediate prompt for this model call: system instructions, agent identity, selected history, tool outputs, optional memory results, and references to artifacts.
  • Session – the durable log of the interaction: every user message, agent reply, tool call, tool result, control signal, and error, captured as structured Event objects.
  • Memory – long‑lived, searchable knowledge that outlives a single session: user preferences, past conversations.
  • Artifacts – large binary or textual data associated with the session or user (files, logs, images), addressed by name and version rather than pasted into the prompt.
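In the Python ADK, these tiers map to pluggable services wired into the Runner. A minimal sketch using the in‑memory implementations shipped with google-adk (exact parameters may vary across ADK versions):

```python
from google.adk.agents import Agent
from google.adk.artifacts import InMemoryArtifactService
from google.adk.memory import InMemoryMemoryService
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService

# Each tier is a separate, swappable service rather than one prompt buffer.
agent = Agent(
    name="research_assistant",
    model="gemini-2.0-flash",
    instruction="Answer using the provided context only.",
)

runner = Runner(
    agent=agent,
    app_name="context_demo",
    session_service=InMemorySessionService(),    # Session: durable event log
    artifact_service=InMemoryArtifactService(),  # Artifacts: named, versioned blobs
    memory_service=InMemoryMemoryService(),      # Memory: cross-session knowledge
)
```

Swapping the in‑memory services for persistent backends changes storage without touching how the working context is presented to the model.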

1.1 Working context as a recomputed view

For each invocation, ADK rebuilds the Working Context from the underlying state. It starts with instructions and identity, pulls in selected Session events, and optionally attaches memory results. This view is ephemeral (thrown away after the call), configurable (you can change formatting without migrating storage), and model‑agnostic.
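Continuing the earlier hypothetical compiler sketch, the per‑invocation rebuild might look like this: the view is recomputed from durable state on every call and discarded afterward:

```python
def invoke(state: SourceState, llm_call) -> str:
    # Recompute the working context from durable state for this call only.
    ctx = compile_context(state, passes=[select_recent_events, attach_memory])
    reply = llm_call(ctx.messages)  # the ephemeral view is all the model sees
    # ctx is discarded here; the next invocation rebuilds it from state,
    # so prompt formatting can change without migrating stored sessions.
    return reply
```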
