[Paper] RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents
Source: arXiv - 2602.02486v1
Overview
The paper introduces RE‑TRAC, a new framework for large‑language‑model (LLM) research agents that moves beyond the linear “think‑act‑observe” loop of the popular ReAct paradigm. By compressing each search trajectory into a structured state summary and feeding that back into the next round, RE‑TRAC enables agents to reflect on past attempts, branch into alternative strategies, and keep a global view even when the context grows very long. The authors show that this recursive, cross‑trajectory reasoning yields sizable gains on benchmark web‑search tasks while also cutting down on unnecessary tool calls and token usage.
Key Contributions
- Recursive trajectory compression: After every search run, the agent creates a concise, structured representation of evidence, uncertainties, failures, and next‑step plans.
- Cross‑trajectory conditioning: Subsequent search trajectories are generated conditioned on the compressed state, allowing the agent to build on prior knowledge rather than starting from scratch each time.
- Empirical superiority: RE‑TRAC outperforms the ReAct baseline by 14‑16 percentage points on the BrowseComp benchmark when paired with frontier LLMs (e.g., GPT‑4, Claude‑2).
- Fine‑tuning recipe for smaller models: Introduces a RE‑TRAC‑aware supervised fine‑tuning pipeline that brings mid‑size models (7‑13 B parameters) up to state‑of‑the‑art performance at comparable compute budgets.
- Efficiency gains: Demonstrates a monotonic drop in tool‑call count and token consumption across iterative rounds, indicating more focused exploration.
Methodology
1. Trajectory Generation (Round t):
- The agent follows the standard ReAct loop: reason → act (e.g., browse, query) → observe → update internal state.
- All intermediate actions, observations, and the final answer are recorded as a trajectory.
2. State Compression:
- A dedicated LLM (or a lightweight encoder) processes the raw trajectory and extracts a structured state consisting of:
- Evidence snippets (high‑confidence facts gathered).
- Uncertainties (open questions, contradictory info).
- Failures (dead‑ends, rejected tool calls).
- Plan sketch (next hypotheses or search directions).
- This representation is deliberately compact (≈ 200‑300 tokens) to stay well within context windows.
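As a minimal sketch of what such a structured state could look like (the schema and field names below are hypothetical; the paper's summary only names the four categories), the compressed state can be modeled as a small dataclass with a text renderer:

```python
from dataclasses import dataclass, field

@dataclass
class CompressedState:
    """Structured summary of one search trajectory (hypothetical schema)."""
    evidence: list[str] = field(default_factory=list)       # high-confidence facts gathered
    uncertainties: list[str] = field(default_factory=list)  # open questions, contradictions
    failures: list[str] = field(default_factory=list)       # dead ends, rejected tool calls
    plan: list[str] = field(default_factory=list)           # next hypotheses or directions

    def render(self) -> str:
        """Serialize into the compact text block prepended to the next round's prompt."""
        sections = [
            ("Evidence", self.evidence),
            ("Uncertainties", self.uncertainties),
            ("Failures", self.failures),
            ("Plan", self.plan),
        ]
        lines: list[str] = []
        for title, items in sections:
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

Keeping the rendered text to a few hundred tokens is what lets the state ride along in every subsequent prompt without crowding out the task itself.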
3. Cross‑Trajectory Conditioning (Round t + 1):
- The next trajectory is generated with the compressed state prepended to the prompt, effectively giving the model a “memory” of what has already been tried.
- The agent can now branch (try a different tool or query) or refine (dig deeper on a promising lead) based on the summarized knowledge.
4. Iterative Loop:
- Steps 1‑3 repeat for a fixed number of rounds (or until a stopping criterion, such as a confidence threshold, is met).
- For smaller models, the authors fine‑tune the model on a dataset of (trajectory, compressed‑state, next‑action) triples, teaching it to internalize the compression‑conditioning pattern.
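Putting steps 1‑3 together, the outer loop can be sketched as below. `run_trajectory`, `compress`, and `confident` are hypothetical stand-ins for the ReAct rollout, the compression LLM, and the stopping criterion; none are named in the paper:

```python
def retrac_loop(task, run_trajectory, compress, confident, max_rounds: int = 5):
    """Recursive trajectory compression: roll out, compress, condition, repeat."""
    state_text = None  # no prior state in round 1
    answer = None
    for _round in range(max_rounds):
        # Step 1: ReAct-style rollout, conditioned on the compressed state (if any).
        trajectory, answer = run_trajectory(task, state_text)
        # Stop early once the answer clears the confidence threshold.
        if confident(answer):
            break
        # Step 2: compress the raw trajectory into a compact structured state.
        state_text = compress(trajectory)
        # Step 3 is implicit: state_text conditions the next rollout.
    return answer
```

The fine-tuning recipe for smaller models would then supervise exactly this pattern, training on (trajectory, compressed state, next action) triples so the model internalizes compression and conditioning rather than relying on prompting alone.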
5. Evaluation:
- Primary benchmark: BrowseComp, a web‑search and information‑synthesis task suite.
- Metrics: task success rate, number of tool calls, total token usage, and answer quality (BLEU/ROUGE).
Results & Findings
| Model / Setting | Success ↑ | Tool Calls ↓ | Tokens ↓ |
|---|---|---|---|
| GPT‑4 + ReAct | 62 % | 48 | 1.2 M |
| GPT‑4 + RE‑TRAC | 78 % (+16 pp) | 31 (‑35 %) | 0.9 M (‑25 %) |
| Claude‑2 + ReAct | 58 % | 45 | 1.1 M |
| Claude‑2 + RE‑TRAC | 73 % (+15 pp) | 29 (‑36 %) | 0.85 M (‑23 %) |
| LLaMA‑13B (FT) + ReAct | 44 % | 52 | 1.3 M |
| LLaMA‑13B (FT) + RE‑TRAC‑aware FT | 58 % (+14 pp) | 34 (‑35 %) | 1.0 M (‑23 %) |
- Monotonic improvement: Across rounds, the number of tool calls drops steadily, showing that the agent becomes more decisive after each reflection.
- Quality of answers: Human evaluation reports higher factual correctness and coherence for RE‑TRAC outputs.
- Scalability: The compression step adds negligible overhead (≈ 0.1 s per round) and works equally well with both massive and mid‑size LLMs.
Practical Implications
- More reliable autonomous agents: Developers building agents for web‑scraping, data‑gathering, or automated research can adopt RE‑TRAC to avoid getting stuck in loops or repeating failed queries.
- Cost savings: Fewer tool calls and reduced token consumption translate directly into lower API bills, especially when using pay‑per‑token LLM services.
- Better multi‑step reasoning: Applications that require deep investigation—e.g., legal document analysis, scientific literature review, or troubleshooting complex systems—benefit from the ability to reflect and re‑plan across iterations.
- Fine‑tuning recipe for smaller models: Teams without access to GPT‑4 can still reap most of the gains by applying the RE‑TRAC‑aware supervised fine‑tuning pipeline to their own open‑source models.
- Plug‑and‑play architecture: The compression module can be swapped with any encoder (e.g., a lightweight T5) and the conditioning simply involves concatenating the state to the prompt, making integration straightforward in existing ReAct‑style pipelines.
Limitations & Future Work
- Compression fidelity: The structured state is a lossy summary; critical nuances might be omitted, potentially leading the next round astray.
- Fixed round budget: The current setup uses a predetermined number of iterations; adaptive stopping criteria could make the process more efficient.
- Domain generality: Experiments focus on web‑search tasks; it remains to be seen how RE‑TRAC performs on non‑textual toolchains (e.g., code execution, robotics).
- Scalability of state representation: While 200‑300 tokens work for BrowseComp, more complex domains may require richer representations, challenging the context‑window limits of smaller models.
Future research directions include learning dynamic compression strategies, exploring hierarchical state representations, and extending RE‑TRAC to multimodal agents that can summarize visual or auditory observations alongside text.
Bottom line: RE‑TRAC offers a pragmatic, low‑overhead upgrade to existing LLM‑driven agents, turning linear search into a reflective, globally‑aware process that boosts success rates while cutting costs—a win for both developers and the organizations that rely on autonomous information‑gathering systems.
Authors
- Jialiang Zhu
- Gongrui Zhang
- Xiaolong Ma
- Lin Xu
- Miaosen Zhang
- Ruiqi Yang
- Song Wang
- Kai Qiu
- Zhirong Wu
- Qi Dai
- Ruichun Ma
- Bei Liu
- Yifan Yang
- Chong Luo
- Zhengyuan Yang
- Linjie Li
- Lijuan Wang
- Weizhu Chen
- Xin Geng
- Baining Guo
Paper Information
- arXiv ID: 2602.02486v1
- Categories: cs.CL, cs.AI
- Published: February 2, 2026