[Paper] RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents
Source: arXiv - 2602.02486v1
Overview
The paper introduces RE‑TRAC, a new framework for large‑language‑model (LLM) research agents that moves beyond the linear “think‑act‑observe” loop of the popular ReAct paradigm. By compressing each search trajectory into a structured state summary and feeding that back into the next round, RE‑TRAC enables agents to reflect on past attempts, branch into alternative strategies, and keep a global view even when the context grows very long. The authors show that this recursive, cross‑trajectory reasoning yields sizable gains on benchmark web‑search tasks while also cutting down on unnecessary tool calls and token usage.
Key Contributions
- Recursive trajectory compression: After every search run, the agent creates a concise, structured representation of evidence, uncertainties, failures, and next‑step plans.
- Cross‑trajectory conditioning: Subsequent search trajectories are generated conditioned on the compressed state, allowing the agent to build on prior knowledge rather than starting from scratch each time.
- Empirical superiority: RE‑TRAC outperforms the ReAct baseline by 14‑16 percentage points on the BrowseComp benchmark when paired with frontier LLMs (e.g., GPT‑4, Claude‑2).
- Fine‑tuning recipe for smaller models: Introduces a RE‑TRAC‑aware supervised fine‑tuning pipeline that brings mid‑size models (7‑13 B parameters) up to state‑of‑the‑art performance at comparable compute budgets.
- Efficiency gains: Demonstrates a monotonic drop in tool‑call count and token consumption across iterative rounds, indicating more focused exploration.
Methodology
1. Trajectory Generation (Round t):
- The agent follows the standard ReAct loop: reason → act (e.g., browse, query) → observe → update internal state.
- All intermediate actions, observations, and the final answer are recorded as a trajectory.
2. State Compression:
- A dedicated LLM (or a lightweight encoder) processes the raw trajectory and extracts a structured state consisting of:
- Evidence snippets (high‑confidence facts gathered).
- Uncertainties (open questions, contradictory info).
- Failures (dead‑ends, rejected tool calls).
- Plan sketch (next hypotheses or search directions).
- This representation is deliberately compact (≈ 200‑300 tokens) to stay well within context windows.
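As a minimal sketch of what such a structured state could look like (the schema and field names below are hypothetical; the paper's summary only names the four categories), the compressed state can be modeled as a small dataclass with a text renderer:

```python
from dataclasses import dataclass, field

@dataclass
class CompressedState:
    """Structured summary of one search trajectory (hypothetical schema)."""
    evidence: list[str] = field(default_factory=list)       # high-confidence facts gathered
    uncertainties: list[str] = field(default_factory=list)  # open questions, contradictions
    failures: list[str] = field(default_factory=list)       # dead ends, rejected tool calls
    plan: list[str] = field(default_factory=list)           # next hypotheses or directions

    def render(self) -> str:
        """Serialize into the compact text block prepended to the next round's prompt."""
        sections = [
            ("Evidence", self.evidence),
            ("Uncertainties", self.uncertainties),
            ("Failures", self.failures),
            ("Plan", self.plan),
        ]
        lines: list[str] = []
        for title, items in sections:
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

Keeping the rendered text to a few hundred tokens is what lets the state ride along in every subsequent prompt without crowding out the task itself.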
3. Cross‑Trajectory Conditioning (Round t + 1):
- The next trajectory is generated with the compressed state prepended to the prompt, effectively giving the model a “memory” of what has already been tried.
- The agent can now branch (try a different tool or query) or refine (dig deeper on a promising lead) based on the summarized knowledge.
4. Iterative Loop:
- Steps 1‑3 repeat for a fixed number of rounds (or until a stopping criterion, such as a confidence threshold, is met).
- For smaller models, the authors fine‑tune the model on a dataset of (trajectory, compressed‑state, next‑action) triples, teaching it to internalize the compression‑conditioning pattern.
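Putting steps 1‑3 together, the outer loop can be sketched as below. `run_trajectory`, `compress`, and `confident` are hypothetical stand-ins for the ReAct rollout, the compression LLM, and the stopping criterion; none are named in the paper:

```python
def retrac_loop(task, run_trajectory, compress, confident, max_rounds: int = 5):
    """Recursive trajectory compression: roll out, compress, condition, repeat."""
    state_text = None  # no prior state in round 1
    answer = None
    for _round in range(max_rounds):
        # Step 1: ReAct-style rollout, conditioned on the compressed state (if any).
        trajectory, answer = run_trajectory(task, state_text)
        # Stop early once the answer clears the confidence threshold.
        if confident(answer):
            break
        # Step 2: compress the raw trajectory into a compact structured state.
        state_text = compress(trajectory)
        # Step 3 is implicit: state_text conditions the next rollout.
    return answer
```

The fine-tuning recipe for smaller models would then supervise exactly this pattern, training on (trajectory, compressed state, next action) triples so the model internalizes compression and conditioning rather than relying on prompting alone.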
5. Evaluation:
- Primary benchmark: BrowseComp, a web‑search and information‑synthesis task suite.
- Metrics: task success rate, number of tool calls, total token usage, and answer quality (BLEU/ROUGE).
Results & Findings
| Model / Setting | Success ↑ | Tool Calls ↓ | Tokens ↓ |
|---|---|---|---|
| GPT‑4 + ReAct | 62 % | 48 | 1.2 M |
| GPT‑4 + RE‑TRAC | 78 % (+16 pp) | 31 (‑35 %) | 0.9 M (‑25 %) |
| Claude‑2 + ReAct | 58 % | 45 | 1.1 M |
| Claude‑2 + RE‑TRAC | 73 % (+15 pp) | 29 (‑36 %) | 0.85 M (‑23 %) |
| LLaMA‑13B (FT) + ReAct | 44 % | 52 | 1.3 M |
| LLaMA‑13B (FT) + RE‑TRAC‑aware FT | 58 % (+14 pp) | 34 (‑35 %) | 1.0 M (‑23 %) |
- Monotonic improvement: Across rounds, the number of tool calls drops steadily, showing that the agent becomes more decisive after each reflection.
- Quality of answers: Human evaluation reports higher factual correctness and coherence for RE‑TRAC outputs.
- Scalability: The compression step adds negligible overhead (≈ 0.1 s per round) and works equally well with both massive and mid‑size LLMs.
Practical Implications
- More reliable autonomous agents: Developers building agents for web‑scraping, data‑gathering, or automated research can adopt RE‑TRAC to avoid getting stuck in loops or repeating failed queries.
- Cost savings: Fewer tool calls and reduced token consumption translate directly into lower API bills, especially when using pay‑per‑token LLM services.
- Better multi‑step reasoning: Applications that require deep investigation—e.g., legal document analysis, scientific literature review, or troubleshooting complex systems—benefit from the ability to reflect and re‑plan across iterations.
- Fine‑tuning recipe for smaller models: Teams without access to GPT‑4 can still reap most of the gains by applying the RE‑TRAC‑aware supervised fine‑tuning pipeline to their own open‑source models.
- Plug‑and‑play architecture: The compression module can be swapped with any encoder (e.g., a lightweight T5) and the conditioning simply involves concatenating the state to the prompt, making integration straightforward in existing ReAct‑style pipelines.
Limitations & Future Work
- Compression fidelity: The structured state is a lossy summary; critical nuances might be omitted, potentially leading the next round astray.
- Fixed round budget: The current setup uses a predetermined number of iterations; adaptive stopping criteria could make the process more efficient.
- Domain generality: Experiments focus on web‑search tasks; it remains to be seen how RE‑TRAC performs on non‑textual toolchains (e.g., code execution, robotics).
- Scalability of state representation: While 200‑300 tokens work for BrowseComp, more complex domains may require richer representations, challenging the context‑window limits of smaller models.
Future research directions include learning dynamic compression strategies, exploring hierarchical state representations, and extending RE‑TRAC to multimodal agents that can summarize visual or auditory observations alongside text.
Bottom line: RE‑TRAC offers a pragmatic, low‑overhead upgrade to existing LLM‑driven agents, turning linear search into a reflective, globally‑aware process that boosts success rates while cutting costs—a win for both developers and the organizations that rely on autonomous information‑gathering systems.
Authors
- Jialiang Zhu
- Gongrui Zhang
- Xiaolong Ma
- Lin Xu
- Miaosen Zhang
- Ruiqi Yang
- Song Wang
- Kai Qiu
- Zhirong Wu
- Qi Dai
- Ruichun Ma
- Bei Liu
- Yifan Yang
- Chong Luo
- Zhengyuan Yang
- Linjie Li
- Lijuan Wang
- Weizhu Chen
- Xin Geng
- Baining Guo
Paper Information
- arXiv ID: 2602.02486v1
- Categories: cs.CL, cs.AI
- Published: February 2, 2026