ReAct Pattern

Published: February 3, 2026 at 09:35 PM EST
7 min read
Source: Dev.to

What is ReAct?

Klover: ReAct stands for Reasoning + Acting — explicitly interleaved.
The idea is simple: instead of just having an LLM answer a question in one shot, you let it think out loud, take actions (like calling tools), observe the results, and then think again. It’s a loop.

Me: So it’s like chain‑of‑thought but with actual tool use mixed in?

Klover: Exactly. The loop looks like this:

  1. Thought – the model reasons about what to do next.
  2. Action – the model calls a tool (search, calculator, API, etc.).
  3. Observation – the system injects the real result from that tool.
  4. Repeat until the model decides it has enough info → Final Answer.

The key word is “interleaved.” It’s not “think everything through then act” — it’s think a little, act a little, observe, adjust.
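
In code, the whole pattern is a short loop. Here is a minimal sketch in Python, illustrative rather than any framework's real API: llm is assumed to be a completion function that honors a stop list, tools maps tool names to plain Python callables, and parse_action is a helper sketched a bit further down.

```python
def react_loop(question: str, llm, tools: dict, max_steps: int = 10) -> str:
    """Minimal ReAct loop: think a little, act a little, observe, adjust."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # Thought (and maybe an Action): generate until the model would start
        # writing "Observation:" -- only the scaffolding may produce those.
        output = llm(transcript, stop=["Observation:"])
        transcript += output
        if "Final Answer:" in output:          # the model decided it's done
            return output.split("Final Answer:", 1)[1].strip()
        tool_name, arg = parse_action(output)  # assumed helper, sketched below
        result = tools[tool_name](arg)         # the system actually runs the tool
        transcript += f"Observation: {result}\n"
    # Safety valve for the infinite-loop failure mode discussed later:
    # after max_steps, force an answer from whatever is in the transcript.
    return llm(transcript + "Final Answer:")
```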

How Tool Calls Are Detected

Me: How does the system know when the model wants to call a tool vs. just talking?

Klover: Stop sequences. The model is trained (or prompted) to output in a specific format, e.g.:

Action: search["weather in Singapore"]

The system watches for that pattern, intercepts it, actually runs the tool, then injects the result back as:

Observation: 32 °C, sunny

The model never actually executes anything. It just writes text that looks like a tool call, and the surrounding scaffolding does the rest.
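
Concretely, the “watching” is usually just pattern matching on raw text. Here is a sketch of the assumed parse_action helper from the loop above, matching the Action format shown here (real frameworks tend to use sturdier structured formats, like JSON tool calls):

```python
import re

# Matches e.g.  Action: search["weather in Singapore"]
ACTION_RE = re.compile(r'Action:\s*(\w+)\[\s*"?(.*?)"?\s*\]')

def parse_action(output: str) -> tuple[str, str]:
    match = ACTION_RE.search(output)
    if match is None:
        raise ValueError("Model emitted neither an Action nor a Final Answer")
    return match.group(1), match.group(2)

tool_name, arg = parse_action('Action: search["weather in Singapore"]')
# -> ('search', 'weather in Singapore')
```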

Me: So the LLM doesn’t know it’s calling real tools? It just thinks it’s generating text?

Klover: Right. From the model’s perspective, it’s just next‑token prediction. The magic is in the scaffolding that intercepts, executes, and injects.

Example Interaction

Klover: Say you ask a ReAct agent:

“Should I go to the beach this Saturday?”

The agent might do:

Thought: I need to check the weather for Saturday.
Action: weather_forecast("Saturday")
Observation: Saturday: 34 °C, sunny, UV index 11, no rain.
Thought: Weather looks great, but UV is very high. Let me check tide times too.
Action: tide_info("Saturday")
Observation: High tide at 2 pm, low tide at 8 am.
Thought: Good beach weather, but go early to avoid peak UV and catch low tide. I have enough info now.
Final Answer: Yes! Saturday looks perfect — sunny, 34 °C. Go in the morning around 8‑9 am when the tide is low and UV isn’t brutal yet.

Multiple steps, each informed by the last. The model decides when it has enough.

Me: Got it. So it’s basically an LLM in a loop.

Klover: That’s the most concise way to put it – an LLM in a loop, with tool access and the ability to decide when to stop. That’s what an agent is at its core.

Me: Wait — are you a ReAct agent right now?

Klover: Pretty much. I reason about what you’re asking, I can call tools (search, read files, run code), I observe the results, and I decide what to say next. You’re watching the pattern in action.

Common Failure Modes

Klover: Two big failure modes.

  1. Infinite loops – The model might keep calling tools without converging on an answer (e.g., repeatedly searching, re‑phrasing, searching again…).
    Fix: Set a max‑iteration limit (the max_steps guard in the loop sketch earlier). After N steps, force a final answer with whatever information is available.

  2. Context overflow – Every Thought, Action, and Observation is appended to the conversation. After 10‑15 steps that adds up to a lot of tokens, eventually overflowing the context window and failing the call outright.
    But the real problem starts before you hit the limit.

Why “before the limit” matters

  • Diluted attention: As the token count grows, the softmax attention spreads thinner across all tokens.
    Analogy: A pie cut into 10 slices vs. 1,000 slices – each slice becomes almost meaningless.

  • “Lost in the middle” effect: Models tend to attend more to the beginning and end of the context, neglecting the middle.
    If key reasoning occurs in step 5 of a 15‑step chain, it may be effectively ignored.

Result: Long ReAct chains suffer a double hit – diluted attention and lost‑in‑the‑middle.

Practical Fix: Context Compression

Klover: The practical fix is to compress context between steps.

  • Keep the last few exchanges verbatim for detail.
  • Summarize everything older.

LangChain’s ConversationSummaryBufferMemory does exactly this: recent messages stay raw, older ones get summarized by an LLM.
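
The idea in plain Python, as a sketch of the pattern rather than LangChain's actual implementation (the llm call and the crude four-characters-per-token estimate are stand-ins):

```python
KEEP_RECENT = 6  # how many recent messages stay verbatim

def compress(messages: list[str], llm, max_tokens: int = 2000) -> list[str]:
    """Summary-buffer memory: recent messages raw, older ones summarized."""
    if sum(len(m) // 4 for m in messages) < max_tokens:  # rough token count
        return messages                                  # still fits, lossless
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    # The lossy step: an LLM distills the older history into one message.
    summary = llm("Summarize this agent history, keeping key facts, tool "
                  "results, and decisions:\n" + "\n".join(old))
    return [f"Summary of earlier steps: {summary}"] + recent
```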

Trade‑off

  • Lossy compression: The summarizer might drop something important.
  • Ten steps later that dropped detail could become critical.
  • No perfect solution – it’s a balance between context freshness and information retention.

Why the Explicit Thought Step Matters

Me: Why does the explicit Thought step matter so much? Couldn’t the model just go straight to Action?

Klover: This is a deeper insight. Chain‑of‑thought (CoT) isn’t just a prompting trick – it’s extra computation.

  • Without CoT: The model gets one forward pass to produce an answer. That’s all the computation it can do, limited by the fixed depth of the transformer network.
  • With CoT: Every token the model generates becomes new context for the next token. Each generated token results from a full forward pass, and that output feeds into the next pass. You’re effectively chaining multiple forward passes together.

Me: But isn’t each forward pass independent? The only difference is it has new context from the previous tokens. It’s not like the model “remembers” — it just sees more text.

Klover: You’re right about the mechanism – each pass is independent, and the only carry‑over is the text itself. But that accumulated context lets the model reason step by step, building on its own prior outputs across many passes rather than in a single shot. That iterative reasoning is what gives ReAct its power.

Me: Exactly – independent passes, and the only connection is through the context. But think about the effect: you’ve turned a fixed‑depth computation (one pass through N transformer layers) into a variable‑depth computation (multiple passes, each informed by the output of the last). The mechanism is context. The effect is more computation. You’re chaining fixed‑depth into variable‑depth.

Klover: Yes — each token is a compressed artifact of a full forward pass. All those layers of attention and feed‑forward processing get distilled into a single token, which then becomes context for the next pass. The output tokens are essentially working memory — scratch space where the model does its intermediate computation. Without CoT, there’s no scratch space. One shot, one pass.
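
Schematically, with model.forward as an assumed stand-in for one fixed-depth pass through all the layers:

```python
def generate(model, prompt_tokens: list[int], n_new: int) -> list[int]:
    """Each output token is one full forward pass, fed back in as context."""
    context = list(prompt_tokens)
    for _ in range(n_new):
        # One fixed-depth pass: N layers of attention + feed-forward,
        # distilled into a single token.
        next_token = model.forward(context)
        # The token re-enters as context, so the next pass builds on it.
        # Total depth scales with tokens generated: fixed-depth chained
        # into variable-depth, with the transcript as scratch space.
        context.append(next_token)
    return context[len(prompt_tokens):]
```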

Session: Feb 4 2026

First exposure — covered full ReAct pattern including failure modes, CoT mechanics, and agent pattern comparisons.

Alternatives to ReAct

Me: are there alternatives to ReAct?

Klover: Two main ones worth comparing.

  1. Pure tool calling — no explicit Thought step. The model just emits tool calls directly.

    • Pros: Faster, fewer tokens.
    • Cons: The model doesn’t reason about what to do before doing it. Works for simple tasks, falls apart for complex multi‑step reasoning.
  2. Plan‑first — the model creates a full plan upfront (step 1, step 2, step 3 …), then executes it.

    • Advantage: It can see dependencies and think ahead (“I need X before I can do Y”).
    • Disadvantage: Plans are brittle. If step 2 returns unexpected data that invalidates step 3, the whole plan collapses.

Me: so ReAct is more adaptive but short‑sighted, and plan‑first thinks ahead but breaks easily?

Klover: That’s a clean summary. ReAct is adaptive but myopic — it only thinks one step ahead. Plan‑first is strategic but brittle.

The practical sweet spot is a hybrid: plan and re‑plan. Make a rough plan, start executing, but be willing to re‑plan when observations don’t match expectations. This is essentially what frameworks like LangGraph enable — you define a graph of steps with conditional edges, so the agent can branch and loop based on what actually happens.
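
In plain Python the hybrid looks roughly like this. It's a sketch, not LangGraph's real API: make_plan, execute, and invalidates are assumed helpers, where an LLM drafts steps, a ReAct-style loop runs each one, and a check decides whether an observation breaks the rest of the plan.

```python
def plan_and_replan(goal: str, llm, tools, max_replans: int = 3) -> str:
    """Rough plan upfront, execute step by step, re-plan when reality diverges."""
    plan = make_plan(llm, goal)                    # strategic: think ahead
    observations = []
    for _ in range(max_replans):
        for step in plan:
            obs = execute(tools, step)             # adaptive: act and observe
            observations.append(obs)
            if invalidates(llm, obs, plan):        # unexpected result?
                break                              # stop running stale steps
        else:
            break                                  # whole plan succeeded
        plan = make_plan(llm, goal, observations)  # re-plan with new facts
    return llm(f"Answer {goal!r} using: {observations}")
```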

Me: so nobody actually uses pure ReAct in production?

Klover: Not in its textbook form, no. Real systems take the core insight — interleave reasoning with acting — and add structure around it: planning, memory management, guardrails, fallbacks. But the ReAct loop is the foundation all of it builds on.
