ReAct Pattern
Source: Dev.to
What is ReAct?
Klover: ReAct stands for Reasoning + Acting — explicitly interleaved.
The idea is simple: instead of just having an LLM answer a question in one shot, you let it think out loud, take actions (like calling tools), observe the results, and then think again. It’s a loop.
Me: So it’s like chain‑of‑thought but with actual tool use mixed in?
Klover: Exactly. The loop looks like this:
- Thought – the model reasons about what to do next.
- Action – the model calls a tool (search, calculator, API, etc.).
- Observation – the system injects the real result from that tool.
- Repeat until the model decides it has enough info → Final Answer.
The key word is “interleaved.” It’s not “think everything through then act” — it’s think a little, act a little, observe, adjust.
Me: How does the system know when the model wants to call a tool vs. just talking?
Klover: Stop sequences. The model is trained (or prompted) to output in a specific format, e.g.:
Action: search["weather in Singapore"]
Generation runs with a stop sequence such as "Observation:", so the model halts right after the action line instead of inventing its own result. The system watches for the Action pattern, intercepts it, actually runs the tool, then injects the result back as:
Observation: 32 °C, sunny
The model never actually executes anything. It just writes text that looks like a tool call, and the surrounding scaffolding does the rest.
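A minimal sketch of the interception step, assuming the `Action: tool["argument"]` text format above and a stop sequence of `"Observation:"`. The `TOOLS` registry, `fake_search`, and the function names are hypothetical; a real API applies the stop sequence server-side, which this sketch simulates by truncating the string.

```python
import re
from typing import Callable, Dict, Optional, Tuple

STOP = "Observation:"  # assumed stop sequence

# Matches a whole line such as: Action: search["weather in Singapore"]
ACTION_RE = re.compile(r'^Action:\s*(\w+)\["(.*)"\]\s*$', re.MULTILINE)


def fake_search(query: str) -> str:
    """Hypothetical stand-in for a real search tool."""
    return "32 °C, sunny" if "Singapore" in query else "no results"


TOOLS: Dict[str, Callable[[str], str]] = {"search": fake_search}


def apply_stop(completion: str, stop: str = STOP) -> str:
    """Simulate the stop sequence: discard everything from `stop` onward."""
    index = completion.find(stop)
    return completion if index == -1 else completion[:index]


def parse_action(text: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, argument) from the last Action line, or None."""
    matches = ACTION_RE.findall(text)
    return matches[-1] if matches else None


def intercept(completion: str) -> str:
    """Run the requested tool and append its real result as an Observation."""
    text = apply_stop(completion)
    action = parse_action(text)
    if action is None:
        return text  # plain text or Final Answer: nothing to execute
    name, argument = action
    tool = TOOLS.get(name)
    result = tool(argument) if tool else f"unknown tool {name!r}"
    return f"{text.rstrip()}\n{STOP} {result}\n"


raw = (
    "Thought: I should look up the weather.\n"
    'Action: search["weather in Singapore"]\n'
    "Observation: 25 °C, rainy"  # invented by the model; the stop sequence drops it
)
print(intercept(raw))
```

The model's invented "25 °C, rainy" never reaches the transcript; the only Observation line is the one the scaffolding injects from the tool.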
Me: So the LLM doesn’t know it’s calling real tools? It just thinks it’s generating text?
Klover: Right. From the model’s perspective, it’s just next‑token prediction. The magic is in the scaffolding that intercepts, executes, and injects.
Example Interaction
Klover: Say you ask a ReAct agent:
“Should I go to the beach this Saturday?”
The agent might do:
Thought: I need to check the weather for Saturday.
Action: weather_forecast("Saturday")
Observation: Saturday: 34 °C, sunny, UV index 11, no rain.
Thought: Weather looks great, but UV is very high. Let me check tide times too.
Action: tide_info("Saturday")
Observation: High tide at 2 pm, low tide at 8 am.
Thought: Good beach weather, but go early to avoid peak UV and catch low tide. I have enough info now.
Final Answer: Yes! Saturday looks perfect: sunny, 34 °C. Go in the morning around 8‑9 am when the tide is low and UV isn’t brutal yet.
Multiple steps, each informed by the last. The model decides when it has enough.
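The loop behind that trace can be sketched in a few lines. This is a minimal illustration, not any framework's implementation: `scripted_model` is a hypothetical stand-in that replays the beach conversation instead of calling an LLM, the tools are stubs returning the observations shown above, and `max_steps` plays the role of the max-iteration limit from the failure modes section.

```python
import re
from typing import Callable, Dict, Iterator, List

# Stub tools returning the observations from the example above.
TOOLS: Dict[str, Callable[[str], str]] = {
    "weather_forecast": lambda day: f"{day}: 34 °C, sunny, UV index 11, no rain.",
    "tide_info": lambda day: "High tide at 2 pm, low tide at 8 am.",
}

# Matches text such as: Action: weather_forecast("Saturday")
ACTION_RE = re.compile(r'Action:\s*(\w+)\("(.*?)"\)')


def scripted_model(replies: List[str]) -> Callable[[str], str]:
    """Hypothetical LLM stand-in: ignores the prompt and replays canned completions."""
    it: Iterator[str] = iter(replies)
    return lambda prompt: next(it)


def react(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    """Thought -> Action -> Observation loop, capped at max_steps model calls."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        completion = llm(transcript)  # the model sees everything so far
        transcript += completion + "\n"
        if "Final Answer:" in completion:
            return completion.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(completion)
        if match:
            name, argument = match.groups()
            tool = TOOLS.get(name)
            # The scaffolding, not the model, runs the tool and injects the result.
            observation = tool(argument) if tool else f"unknown tool {name!r}"
            transcript += f"Observation: {observation}\n"
    # Cap reached. A real system might make one last model call asking for a
    # final answer from what it has; this sketch just reports the stop.
    return "Stopped after max_steps without a final answer."


llm = scripted_model([
    'Thought: I need to check the weather for Saturday.\nAction: weather_forecast("Saturday")',
    'Thought: UV is very high. Let me check tide times too.\nAction: tide_info("Saturday")',
    "Thought: I have enough info now.\nFinal Answer: Yes! Go around 8-9 am, at low tide.",
])
print(react("Should I go to the beach this Saturday?", llm))
```

Swapping `scripted_model` for a real model client is the only change the loop itself would need; the growing `transcript` string is exactly the context that the failure modes below are about.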
Me: Got it. So it’s basically an LLM in a loop.
Klover: That’s the most concise way to put it – an LLM in a loop, with tool access and the ability to decide when to stop. That’s what an agent is at its core.
Me: Wait — are you a ReAct agent right now?
Klover: Pretty much. I reason about what you’re asking, I can call tools (search, read files, run code), I observe the results, and I decide what to say next. You’re watching the pattern in action.
Common Failure Modes
Klover: Two big failure modes.
Infinite loops – The model might keep calling tools without converging on an answer (e.g., repeatedly searching, re‑phrasing, searching again…).
Fix: Set a max‑iteration limit. After N steps, force a final answer with whatever information is available.
Context overflow – Every Thought, Action, and Observation is appended to the conversation. After 10‑15 steps that can be a lot of tokens, eventually exceeding the context window, at which point the request fails or older content has to be truncated.
But the real problem starts before you hit the limit.
Why “before the limit” matters
Diluted attention: Softmax attention weights always sum to 1, so as the token count grows, the weight spreads thinner and each token's share tends to shrink.
Analogy: A pie cut into 10 slices vs. 1 000 slices – each slice becomes almost meaningless.
“Lost in the middle” effect: Models tend to attend more to the beginning and end of the context, neglecting the middle.
If key reasoning occurs in step 5 of a 15‑step chain, it may be effectively ignored.
Result: Long ReAct chains suffer a double hit – diluted attention and lost‑in‑the‑middle.
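A quick numeric illustration of the dilution effect, under a deliberately toy assumption rather than a real model: one relevant token scores 3 higher than every other token, and all other scores are equal. The relevant token's score never changes, yet its softmax weight shrinks as the context grows.

```python
import math


def relevant_weight(n_tokens: int, margin: float = 3.0) -> float:
    """Softmax weight on one token scoring `margin` above n_tokens - 1 equal rivals."""
    scores = [margin] + [0.0] * (n_tokens - 1)
    exps = [math.exp(s) for s in scores]
    return exps[0] / sum(exps)


for n in (10, 100, 1000):
    print(f"{n:>5} tokens -> weight on the relevant token: {relevant_weight(n):.3f}")
```

The weight falls from about 0.69 at 10 tokens to about 0.02 at 1,000. Real attention scores are learned and far from uniform, so this only shows the direction of the effect, and it does not model the lost-in-the-middle position bias at all.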
Practical Fix: Context Compression
Klover: The practical fix is to compress context between steps.
- Keep the last few exchanges verbatim for detail.
- Summarize everything older.
LangChain’s ConversationSummaryBufferMemory does exactly this: recent messages stay raw, older ones get summarized by an LLM.
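A minimal sketch of the "recent verbatim, older summarized" idea, not LangChain's implementation. It assumes a message-count budget (`keep_last`) rather than a token budget, and `naive_summarize` is a hypothetical placeholder for an LLM summarization call that simply keeps the first four words of each message.

```python
from typing import Callable, List


def naive_summarize(messages: List[str]) -> str:
    """Hypothetical placeholder for an LLM summarizer: keeps the first 4 words of each message."""
    return " | ".join(" ".join(m.split()[:4]) for m in messages)


def compress_history(
    messages: List[str],
    keep_last: int = 3,
    summarize: Callable[[List[str]], str] = naive_summarize,
) -> List[str]:
    """Keep the newest `keep_last` messages verbatim; fold everything older into one summary."""
    if len(messages) <= keep_last:
        return list(messages)
    split = len(messages) - keep_last
    older, recent = messages[:split], messages[split:]
    return [f"Summary of earlier steps: {summarize(older)}"] + recent


history = [
    "Thought: I need to check the weather for Saturday.",
    "Observation: Saturday: 34 °C, sunny, UV index 11, no rain.",
    "Thought: Weather looks great, but UV is very high. Let me check tide times too.",
    "Observation: High tide at 2 pm, low tide at 8 am.",
]
for line in compress_history(history, keep_last=2):
    print(line)
```

With `keep_last=2`, the summary line keeps the temperature but drops "UV index 11", the detail that later justified going early. That is the lossy trade-off in miniature.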
Trade‑off
- Lossy compression: The summarizer might drop something important.
- Ten steps later that dropped detail could become critical.
- No perfect solution – it’s a balance between context freshness and information retention.
Why the Explicit Thought Step Matters
Me: Why does the explicit Thought step matter so much? Couldn’t the model just go straight to Action?
Klover: This is a deeper insight. Chain‑of‑thought (CoT) isn’t just a prompting trick – it’s extra computation.
- Without CoT: The model gets one forward pass to produce an answer. That’s all the computation it can do, limited by the fixed depth of the transformer network.
- With CoT: Every token the model generates becomes new context for the next token. Each generated token results from a full forward pass, and that output feeds into the next pass. You’re effectively chaining multiple forward passes together.
Me: But isn’t each forward pass independent? The only difference is it has new context from the previous tokens. It’s not like the model “remembers” — it just sees more text.
Klover: You’re right about the mechanism: each pass is independent, and the only connection is through the context. But think about the effect: you’ve turned a fixed-depth computation (one pass through N transformer layers) into a variable-depth computation (multiple passes, each informed by the output of the last). The mechanism is context. The effect is more computation. You’re chaining fixed-depth into variable-depth, and that step-by-step iteration is what gives ReAct its power.
Each token is a compressed artifact of a full forward pass. All those layers of attention and feed-forward processing get distilled into a single token, which then becomes context for the next pass. The output tokens are essentially working memory: scratch space where the model does its intermediate computation. Without CoT, there’s no scratch space. One shot, one pass.
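A toy analogy for "chaining fixed-depth into variable-depth", with no real model involved. It assumes a hypothetical `forward` step that, like one pass through a fixed stack of layers, can do only a bounded amount of work (here, exactly one addition). Summing a list then only works if each intermediate result is written out as a token and read back as context on the next call.

```python
from typing import List, Optional


def forward(problem: List[int], context: List[int]) -> Optional[int]:
    """One fixed-depth 'pass': read the latest scratch token and do one addition.

    Returns None once every number has been consumed.
    """
    step = len(context)
    if step >= len(problem):
        return None
    partial = context[-1] if context else 0
    return partial + problem[step]


def generate(problem: List[int]) -> List[int]:
    """Autoregressive loop: each output token is appended to the context."""
    context: List[int] = []
    while (token := forward(problem, context)) is not None:
        context.append(token)
    return context


numbers = [3, 4, 5, 6]
print(forward(numbers, []))  # one pass, no scratch space: only the first addition (3)
print(generate(numbers))     # scratch tokens hold partial sums: [3, 7, 12, 18]
```

Each call to `forward` is independent and sees only the context, exactly as in the exchange above; the extra computation comes entirely from looping and feeding outputs back in.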
Session: Feb 4 2026
First exposure — covered full ReAct pattern including failure modes, CoT mechanics, and agent pattern comparisons.
Alternatives to ReAct
Me: Are there alternatives to ReAct?
Klover: Two main ones worth comparing.
Pure tool calling — no explicit Thought step. The model just emits tool calls directly.
- Pros: Faster, fewer tokens.
- Cons: The model doesn’t reason about what to do before doing it. Works for simple tasks, falls apart for complex multi‑step reasoning.
Plan‑first — the model creates a full plan upfront (step 1, step 2, step 3 …), then executes it.
- Advantage: It can see dependencies and think ahead (“I need X before I can do Y”).
- Disadvantage: Plans are brittle. If step 2 returns unexpected data that invalidates step 3, the whole plan collapses.
Me: So ReAct is more adaptive but short‑sighted, and plan‑first thinks ahead but breaks easily?
Klover: That’s a clean summary. ReAct is adaptive but myopic — it only thinks one step ahead. Plan‑first is strategic but brittle.
The practical sweet spot is a hybrid: plan and re‑plan. Make a rough plan, start executing, but be willing to re‑plan when observations don’t match expectations. This is essentially what frameworks like LangGraph enable — you define a graph of steps with conditional edges, so the agent can branch and loop based on what actually happens.
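A minimal sketch of plan-then-re-plan, written as plain Python rather than LangGraph code (in a graph framework the re-plan check would be a conditional edge; here it is an `if`). The planner, the executor, and the step names are hypothetical stubs: each planned step carries an expected observation, and a mismatch throws away the rest of the plan and rebuilds it from what actually happened. Setting `max_replans=0` reduces it to pure plan-first.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (action, observation the planner expects)


def run(
    goal: str,
    plan_fn: Callable[[str, List[str]], List[Step]],
    execute: Callable[[str], str],
    max_replans: int = 2,
) -> List[str]:
    """Execute a plan step by step; re-plan when an observation breaks an expectation."""
    observations: List[str] = []
    plan = plan_fn(goal, observations)
    replans = 0
    while plan:
        action, expected = plan.pop(0)
        observed = execute(action)
        observations.append(f"{action} -> {observed}")
        if observed != expected and replans < max_replans:
            replans += 1
            plan = plan_fn(goal, observations)  # rebuild the rest from what actually happened
    return observations


def toy_planner(goal: str, observations: List[str]) -> List[Step]:
    """Hypothetical planner; a real one would be an LLM call that reads `goal`."""
    if any("rain" in o for o in observations):
        return [("find_indoor_activity", "aquarium open")]
    return [("check_weather", "sunny"), ("check_tides", "low tide at 8 am")]


def toy_executor(action: str) -> str:
    """Hypothetical tool results: the forecast has unexpectedly turned to rain."""
    results = {
        "check_weather": "rain",
        "check_tides": "low tide at 8 am",
        "find_indoor_activity": "aquarium open",
    }
    return results[action]


print(run("beach day", toy_planner, toy_executor))                 # re-plans after the rain
print(run("beach day", toy_planner, toy_executor, max_replans=0))  # plan-first: checks tides anyway
```

The plan-first run keeps executing a plan its first observation already invalidated; the hybrid keeps the upfront plan but lets observations override it, with `max_replans` as a guard against re-planning forever.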
Me: So nobody actually uses pure ReAct in production?
Klover: Not in its textbook form, no. Real systems take the core insight — interleave reasoning with acting — and add structure around it: planning, memory management, guardrails, fallbacks. But the ReAct loop is the foundation all of it builds on.