Stop building reactive agents: Why your architecture needs a System 1 and System 2

Published: February 26, 2026, 02:35 PM EST
3 min read
Source: Dev.to

If you’ve built an LLM agent recently, you’ve probably hit the “autonomy wall.”
You give the agent a tool to search the web, a prompt to “be helpful,” and a task. For the first two turns it looks like magic. On turn three it goes down a Wikipedia rabbit hole. By turn ten it’s stuck in an infinite loop trying to fix a syntax error on a file it never downloaded.

Most developers try to fix this by cramming more instructions into the system prompt: “Never repeat the same action twice! Think step‑by‑step!”
But the problem isn’t the prompt—it’s the architecture. You’re forcing a single execution loop to do two completely different jobs: talking/acting (low latency, high bandwidth) and planning (slow, deliberative reasoning).

The Problem with Reactive Agents

Standard agents (e.g., a naive ReAct loop) operate in a flat sequence:

Observe → Think → Act → Observe → Think → Act …

While the agent is “thinking,” it tries to decide both what to say to the user and what its long‑term strategy should be. Because LLMs are autoregressive, the immediate context (the last user utterance or the last API error) overwhelmingly dominates its attention.

If the agent’s only “planner” is the same loop that’s doing the work, two failure modes emerge:

  • Shallow Exploration – The agent never discovers new subgoals because it’s too focused on the immediate task.
  • Runaway Exploration – The agent forgets the original goal entirely and never finishes.
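To make the failure mode concrete, here is a minimal sketch of a flat ReAct-style loop. All names (`call_llm`, `run_tool`, `react_loop`) are hypothetical stand-ins, not a real library: the point is that one loop carries both tactics and strategy in a single context.

```python
# Minimal sketch of a flat ReAct-style loop. `call_llm` and `run_tool`
# are hypothetical stand-ins for a model call and a tool executor.

def call_llm(prompt: str) -> str:
    # Placeholder: a real agent would call a model API here.
    return "FINISH: done" if "step 3" in prompt else "ACT: search"

def run_tool(action: str) -> str:
    return f"result of {action}"

def react_loop(task: str, max_turns: int = 10) -> list[str]:
    history: list[str] = []
    for turn in range(max_turns):
        # One loop does everything: the tactical next step AND the
        # global strategy compete for the same context window.
        prompt = f"Task: {task}\nHistory: {history}\nstep {turn}"
        thought = call_llm(prompt)
        if thought.startswith("FINISH"):
            break
        history.append(run_tool(thought.removeprefix("ACT: ")))
    return history
```

Nothing in this loop ever steps back to ask whether the trajectory as a whole is still on course, which is exactly where the two failure modes above come from.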

Dual‑Process Architecture

Inspired by Daniel Kahneman’s Thinking, Fast and Slow, we can separate the doer from the planner.

Fast (System 1) Loop

  • Reactive and low‑latency.
  • Looks at the immediate context and executes the next tactical step.
  • In an interview scenario, it simply asks the next question, decides whether to probe deeper, or transitions to a new topic.
  • It does not consider the global strategy.

Deliberative (System 2) Loop

  • Runs asynchronously in the background (e.g., every k turns).
  • Examines the entire interaction history, zooms out, and optimizes the overarching trajectory.
  • Operates by simulating rollouts: it generates hypothetical futures, scores them against a utility function (e.g., maximize new information while minimizing token cost), and updates a shared “Agenda” that System 1 reads from.
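The two loops and the shared agenda can be sketched as follows. Everything here is an illustrative assumption (toy utility function, deterministic "rollouts"), not an implementation from the paper: System 1 reads the agenda every turn, while System 2 fires every *k* turns, scores simulated futures, and rewrites it.

```python
# Illustrative dual-process sketch; all names are assumptions, not a real API.

def simulate_rollout(history: list[str], horizon: int) -> list[str]:
    # Stand-in for an LLM generating one hypothetical future trajectory.
    return [f"hypothetical step {len(history) + i}" for i in range(horizon)]

def utility(rollout: list[str]) -> float:
    # Toy objective: maximize new information, penalize token cost.
    info_gain = len(set(rollout))
    token_cost = sum(len(step) for step in rollout) / 100
    return info_gain - token_cost

def system1_step(agenda: str, history: list[str]) -> str:
    # Fast loop: execute the current agenda item; no global reasoning.
    return f"act on '{agenda}'"

def system2_replan(history: list[str], horizon: int = 3, n_rollouts: int = 4) -> str:
    # Slow loop: sample futures, keep the highest-utility one,
    # and distill it into the next agenda item for System 1.
    rollouts = [simulate_rollout(history, horizon) for _ in range(n_rollouts)]
    best = max(rollouts, key=utility)
    return best[0]

def run(task: str, turns: int = 10, k: int = 5) -> list[str]:
    agenda, history = task, []
    for t in range(turns):
        if t > 0 and t % k == 0:      # deliberate only every k turns
            agenda = system2_replan(history)
        history.append(system1_step(agenda, history))
    return history
```

The key design choice is that the planner never blocks the fast loop: it only updates the agenda, so System 1 stays low-latency between replanning steps.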

SparkMe: A Case Study

A recent Stanford paper, SparkMe: Adaptive Semi‑Structured Interviewing for Qualitative Insight Discovery (arXiv:2602.21136), demonstrates this architecture in practice. The authors split their agent into two distinct systems:

  1. Fast reactive loop – asks questions and handles immediate probing.
  2. Deliberative planner – periodically simulates interview trajectories, selects high‑utility paths, and updates the agenda.

Benefits of Decoupling Execution and Planning

| Control Knob | Description |
| --- | --- |
| Planning frequency | Run the planner every *n* steps (e.g., every 5 turns) to save compute compared to forcing a deep chain of thought on every micro‑action. |
| Look‑ahead horizon | Define how many future steps to simulate (e.g., 3 steps ahead). |
| Optimization objective | Explicitly encode a utility function (information gain, token cost, user satisfaction, etc.) rather than relying on vague prompt instructions. |
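One way to make these knobs explicit in code is a small configuration object. This is a hypothetical sketch (the field names and weights are assumptions), showing how the three knobs become typed parameters instead of prompt text:

```python
from dataclasses import dataclass

# Hypothetical config object for the three control knobs above.
@dataclass
class PlannerConfig:
    planning_frequency: int = 5    # run System 2 every n turns
    lookahead_horizon: int = 3     # future steps to simulate per rollout
    # Utility weights: trade off information gain against token cost.
    w_info_gain: float = 1.0
    w_token_cost: float = 0.01

    def utility(self, info_gain: float, token_cost: float) -> float:
        return self.w_info_gain * info_gain - self.w_token_cost * token_cost
```

Tuning the objective now means changing a weight, not rewording a prompt.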

Takeaway

  • Stop trying to build a single “God Prompt” that must act perfectly in the moment while simultaneously playing 4‑D chess.
  • Let fast agents ship actions. Let slow agents simulate the future.
