ReAct Pattern
Source: Dev.to
What is ReAct?
Klover: ReAct stands for Reasoning + Acting — explicitly interleaved.
The idea is simple: instead of just having an LLM answer a question in one shot, you let it think out loud, take actions (like calling tools), observe the results, and then think again. It’s a loop.
Me: So it’s like chain‑of‑thought but with actual tool use mixed in?
Klover: Exactly. The loop looks like this:
- Thought – the model reasons about what to do next.
- Action – the model calls a tool (search, calculator, API, etc.).
- Observation – the system injects the real result from that tool.
- Repeat until the model decides it has enough info → Final Answer.
The key word is “interleaved.” It’s not “think everything through then act” — it’s think a little, act a little, observe, adjust.
How the System Detects Tool Calls
Me: How does the system know when the model wants to call a tool vs. just talking?
Klover: Stop sequences. The model is trained (or prompted) to output in a specific format, e.g.:
Action: search["weather in Singapore"]
The system watches for that pattern, intercepts it, actually runs the tool, then injects the result back as:
Observation: 32 °C, sunny
The model never actually executes anything. It just writes text that looks like a tool call, and the surrounding scaffolding does the rest.
Me: So the LLM doesn’t know it’s calling real tools? It just thinks it’s generating text?
Klover: Right. From the model’s perspective, it’s just next‑token prediction. The magic is in the scaffolding that intercepts, executes, and injects.
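A minimal sketch of that scaffolding in Python. The llm.generate(prompt, stop=[...]) interface and the search tool are stand-ins, not any particular library's API; they just make the intercept-execute-inject cycle concrete:

```python
import re

# Stand-in tool: in a real system this would hit a search API.
def search(query: str) -> str:
    return f"(search results for {query!r})"

TOOLS = {"search": search}

# Matches lines like: Action: search["weather in Singapore"]
ACTION_RE = re.compile(r'Action:\s*(\w+)\[["\']?(.*?)["\']?\]')

def run_tool(chunk: str) -> str:
    match = ACTION_RE.search(chunk)
    if not match:
        return "no valid action found"
    name, arg = match.groups()
    tool = TOOLS.get(name)
    return tool(arg) if tool else f"unknown tool: {name}"

def run_react(llm, question: str) -> str:
    transcript = f"Question: {question}\n"
    while True:
        # "Observation:" is the stop sequence: the model must not invent its
        # own observations, so generation halts the moment it tries to.
        chunk = llm.generate(transcript, stop=["Observation:"])
        transcript += chunk.rstrip() + "\n"
        if "Final Answer:" in chunk:
            return chunk.split("Final Answer:", 1)[1].strip()
        # The model only wrote text that looks like a tool call; the scaffolding
        # intercepts it, actually runs the tool, and injects the real result.
        transcript += f"Observation: {run_tool(chunk)}\n"
```

Note the while True: nothing in this minimal loop guarantees the model ever writes "Final Answer:", which is exactly the first failure mode discussed below.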
Example Interaction
Klover: Say you ask a ReAct agent:
“Should I go to the beach this Saturday?”
The agent might do:
Thought: I need to check the weather for Saturday.
Action: weather_forecast("Saturday")
Observation: Saturday: 34 °C, sunny, UV index 11, no rain.
Thought: Weather looks great, but UV is very high. Let me check tide times too.
Action: tide_info("Saturday")
Observation: High tide at 2 pm, low tide at 8 am.
Thought: Good beach weather, but go early to avoid peak UV and catch low tide. I have enough info now.
Final Answer: Yes! Saturday looks perfect — sunny, 34 °C. Go in the morning around 8‑9 am when the tide is low and UV isn’t brutal yet.
Multiple steps, each informed by the last. The model decides when it has enough.
Me: Got it. So it’s basically an LLM in a loop.
Klover: That’s the most concise way to put it – an LLM in a loop, with tool access and the ability to decide when to stop. That’s what an agent is at its core.
Me: Wait — are you a ReAct agent right now?
Klover: Pretty much. I reason about what you’re asking, I can call tools (search, read files, run code), I observe the results, and I decide what to say next. You’re watching the pattern in action.
Common Failure Modes
Klover: Two big failure modes.
- Infinite loops – The model might keep calling tools without converging on an answer (e.g., repeatedly searching, re‑phrasing, searching again…).
  Fix: Set a max‑iteration limit. After N steps, force a final answer with whatever information is available (see the sketch after this list).
- Context overflow – Every Thought, Action, and Observation is appended to the conversation. After 10‑15 steps that can be a lot of tokens, eventually hitting the context‑window limit and causing the model to break.
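Continuing the sketch from earlier, the max‑iteration fix just replaces while True with a bounded loop and forces an answer once the budget runs out (MAX_STEPS and the forcing prompt are illustrative):

```python
MAX_STEPS = 8  # illustrative budget

def run_react_bounded(llm, question: str) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(MAX_STEPS):
        chunk = llm.generate(transcript, stop=["Observation:"])
        transcript += chunk.rstrip() + "\n"
        if "Final Answer:" in chunk:
            return chunk.split("Final Answer:", 1)[1].strip()
        transcript += f"Observation: {run_tool(chunk)}\n"  # run_tool from the earlier sketch
    # Budget exhausted: no more tools, answer with whatever was gathered so far.
    return llm.generate(transcript + "No more tool calls are allowed.\nFinal Answer:")
```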
But the real problem starts before you hit the limit.
Why “Before the Limit” Matters
- Diluted attention: As the token count grows, the softmax attention spreads thinner across all tokens.
  Analogy: a pie cut into 10 slices vs. 1,000 slices – each slice becomes almost meaningless.
- “Lost in the middle” effect: Models tend to attend more to the beginning and end of the context, neglecting the middle. If key reasoning occurs in step 5 of a 15‑step chain, it may be effectively ignored.
Result: Long ReAct chains suffer a double hit – diluted attention and lost‑in‑the‑middle.
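The pie analogy can be made concrete. If a head's attention scores were roughly uniform, softmax gives each token about 1/n of the attention mass, so a 100x longer context means a 100x thinner slice per token. A toy calculation (uniform scores are an idealization; real attention is far from uniform, but the dilution pressure is real):

```python
import math

def uniform_attention_share(n_tokens: int) -> float:
    # Softmax over n equal scores gives every token exactly 1/n of the mass.
    scores = [1.0] * n_tokens
    total = sum(math.exp(s) for s in scores)
    return math.exp(scores[0]) / total

print(uniform_attention_share(10))    # 0.1    (10 slices of the pie)
print(uniform_attention_share(1000))  # 0.001  (1,000 slices, each nearly meaningless)
```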
Practical Fix: Context Compression
Klover: The practical fix is to compress context between steps.
- Keep the last few exchanges verbatim for detail.
- Summarize everything older.
LangChain’s ConversationSummaryBufferMemory does exactly this: recent messages stay raw, older ones get summarized by an LLM.
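A sketch of that memory in use, based on the classic LangChain memory API (deprecated in recent releases in favor of LangGraph-style persistence, but it shows the idea; the model name and token limit are placeholders):

```python
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI  # any chat model works; this one is a placeholder

llm = ChatOpenAI(model="gpt-4o-mini")  # illustrative model choice

# Roughly the last ~1,000 tokens stay verbatim; older turns are folded into
# a running LLM-written summary instead of being dropped outright.
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=1000)

memory.save_context(
    {"input": "Thought: I need the forecast. Action: weather_forecast('Saturday')"},
    {"output": "Observation: Saturday: 34 °C, sunny, UV index 11."},
)

# What gets fed back to the model: summary of old turns plus raw recent turns.
print(memory.load_memory_variables({})["history"])
```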
Trade‑off
- Lossy compression: The summarizer might drop something important.
- Ten steps later that dropped detail could become critical.
- No perfect solution – it’s a balance between context freshness and information retention.
Why the Explicit Thought Step Matters
Me: Why does the explicit Thought step matter so much? Couldn’t the model just go straight to Action?
Klover: This is a deeper insight. Chain‑of‑thought (CoT) isn’t just a prompting trick – it’s extra computation.
- Without CoT: The model gets one forward pass to produce an answer. That’s all the computation it can do, limited by the fixed depth of the transformer network.
- With CoT: Every token the model generates becomes new context for the next token. Each generated token results from a full forward pass, and that output feeds into the next pass. You’re effectively chaining multiple forward passes together.
Me: But isn’t each forward pass independent? The only difference is it has new context from the previous tokens. It’s not like the model “remembers” — it just sees more text.
Klover: You’re right about the mechanism – each pass is independent, but the accumulated context lets the model refine its reasoning step by step across many passes rather than in a single shot. That iterative refinement is what gives ReAct its power.
Me: So each pass is independent, and the only connection is through the context. But the effect is that you’ve turned a fixed‑depth computation (one pass through N transformer layers) into a variable‑depth computation (multiple passes, each informed by the output of the last). The mechanism is context; the effect is more computation. You’re chaining fixed‑depth into variable‑depth?
Klover: Yes — each token is a compressed artifact of a full forward pass. All those layers of attention and feed‑forward processing get distilled into a single token, which then becomes context for the next pass. The output tokens are essentially working memory — scratch space where the model does its intermediate computation. Without CoT, there’s no scratch space. One shot, one pass.
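One way to see the fixed‑depth vs. variable‑depth point: autoregressive generation is literally a loop of forward passes, and the growing context is the only thing carried between them. A toy sketch (the model and sampler here are trivial stand‑ins, just enough to make the loop run):

```python
END_OF_TEXT = -1

def toy_model(context):
    # Pretend "forward pass": the same fixed computation every call, with no
    # hidden state surviving between calls. It sees only the context it is given.
    return context[-1] + 1 if context[-1] < 5 else END_OF_TEXT

def sample(logits):
    return logits  # greedy stand-in for real sampling

def generate(model, prompt_tokens, max_new_tokens=64):
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(context)        # one fixed-depth pass through the network
        next_token = sample(logits)
        if next_token == END_OF_TEXT:
            break
        # The appended token is the only "memory" between passes; with CoT,
        # these intermediate tokens are the model's scratch space.
        context.append(next_token)
    return context

print(generate(toy_model, [1]))  # [1, 2, 3, 4, 5]: five chained forward passes
```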
Session: Feb 4 2026
First exposure — covered full ReAct pattern including failure modes, CoT mechanics, and agent pattern comparisons.
Alternatives to ReAct
Me: are there alternatives to ReAct?
Klover: Two main ones worth comparing.
- Pure tool calling — no explicit Thought step. The model just emits tool calls directly.
  - Pros: Faster, fewer tokens.
  - Cons: The model doesn’t reason about what to do before doing it. Works for simple tasks, falls apart for complex multi‑step reasoning.
- Plan‑first — the model creates a full plan upfront (step 1, step 2, step 3 …), then executes it.
  - Advantage: It can see dependencies and think ahead (“I need X before I can do Y”).
  - Disadvantage: Plans are brittle. If step 2 returns unexpected data that invalidates step 3, the whole plan collapses.
Me: so ReAct is more adaptive but short‑sighted, and plan‑first thinks ahead but breaks easily?
Klover: That’s a clean summary. ReAct is adaptive but myopic — it only thinks one step ahead. Plan‑first is strategic but brittle.
The practical sweet spot is a hybrid: plan and re‑plan. Make a rough plan, start executing, but be willing to re‑plan when observations don’t match expectations. This is essentially what frameworks like LangGraph enable — you define a graph of steps with conditional edges, so the agent can branch and loop based on what actually happens.
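A framework-free sketch of that plan-and-re-plan loop. The llm helper methods (make_plan, plan_is_invalidated, final_answer) and tools.run are hypothetical names for illustration; LangGraph would express the same control flow as a graph whose execute node has a conditional edge back to the plan node:

```python
def plan_and_replan(llm, tools, goal, max_replans=3):
    plan = llm.make_plan(goal)                    # hypothetical: returns a list of steps
    observations = []
    for _ in range(max_replans + 1):
        for step in plan:
            observations.append((step, tools.run(step)))
            if llm.plan_is_invalidated(plan, observations):  # hypothetical check
                break                             # reality diverged from the plan
        else:
            # Every step ran and nothing was invalidated: wrap up.
            return llm.final_answer(goal, observations)
        # Re-plan from what was actually observed, not the original assumptions.
        plan = llm.make_plan(goal, context=observations)
    return llm.final_answer(goal, observations)   # best effort after too many re-plans
```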
Me: so nobody actually uses pure ReAct in production?
Klover: Not in its textbook form, no. Real systems take the core insight — interleave reasoning with acting — and add structure around it: planning, memory management, guardrails, fallbacks. But the ReAct loop is the foundation all of it builds on.