[Paper] LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

Published: May 6, 2026 at 01:54 PM EDT

Source: arXiv - 2605.05191v1

Overview

LongSeeker tackles a core bottleneck for autonomous search agents: as they explore, reason, and invoke tools, their internal “working memory” can balloon, leading to higher inference costs and more hallucinations. The authors propose an elastic context orchestration framework that lets the agent dynamically compress, prune, or expand its memory based on what’s currently relevant, enabling reliable long‑horizon search with far less overhead.

Key Contributions

Context‑ReAct paradigm – a unified loop that couples reasoning, tool use, and context management through five atomic operations: Skip, Compress, Rollback, Snippet, Delete.
Expressive completeness proof for the Compress operator, showing any context transformation can be represented using it.
Efficiency & fidelity guarantees for the specialized operators, reducing token usage and hallucination risk without sacrificing answer quality.
LongSeeker agent – a Qwen3‑30B‑A3B‑based model fine‑tuned on 10 k synthetic long‑horizon search trajectories that implements Context‑ReAct.
Strong empirical gains on four search benchmarks (e.g., 61.5 % vs. 43.2 % on BrowseComp), demonstrating the practical value of adaptive context handling.

Methodology

Problem framing – The authors view a search episode as a sequence of states (observations, tool calls, reasoning steps). Keeping every state verbatim quickly exceeds token limits.
Elastic context operations
- Skip: ignore irrelevant past steps when generating the next action.
- Compress: replace a sub‑trajectory with a concise summary while preserving logical dependencies.
- Rollback: revert to an earlier state to explore an alternative branch.
- Snippet: extract a focused excerpt (e.g., a key piece of evidence) to keep in memory.
- Delete: permanently discard dead‑end branches.
Context‑ReAct loop – At each step the agent decides (via a lightweight policy network) which operation(s) to apply, then proceeds with reasoning or tool invocation using the newly shaped context.
Training data – 10 k synthetic trajectories were generated with a “teacher” planner that demonstrates optimal use of the five operators. LongSeeker is fine‑tuned on this data, learning when and how to reshape its memory.
Evaluation – Benchmarks involve multi‑turn web browsing, fact‑finding, and multilingual search tasks. Metrics focus on task success rate and token consumption.
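
The five operations and the loop described above can be sketched as plain functions over a list of steps. This is a minimal illustration, not the authors' implementation: the `Step` record, the function signatures, the `visible` flag used for Skip, and the `toy_policy` stand-in for the learned policy are all assumptions made for this example.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    kind: str             # e.g. "thought", "tool_call", "observation", "summary"
    text: str
    visible: bool = True  # assumption: Skip hides a step without removing it


def skip(ctx: List[Step], idx: int) -> List[Step]:
    """Skip: leave step idx stored but exclude it from the next prompt."""
    return [replace(s, visible=False) if i == idx else s for i, s in enumerate(ctx)]


def compress(ctx: List[Step], start: int, end: int, summary: str) -> List[Step]:
    """Compress: replace ctx[start:end] with a single summary step."""
    return ctx[:start] + [Step("summary", summary)] + ctx[end:]


def rollback(ctx: List[Step], idx: int) -> List[Step]:
    """Rollback: return to the state right after step idx."""
    return ctx[: idx + 1]


def snippet(ctx: List[Step], idx: int, excerpt: str) -> List[Step]:
    """Snippet: keep only a focused excerpt of step idx."""
    if excerpt not in ctx[idx].text:
        raise ValueError("excerpt must come from the original step")
    return [replace(s, text=excerpt) if i == idx else s for i, s in enumerate(ctx)]


def delete(ctx: List[Step], start: int, end: int) -> List[Step]:
    """Delete: permanently discard ctx[start:end]."""
    return ctx[:start] + ctx[end:]


def render(ctx: List[Step]) -> str:
    """Build the prompt text from the steps that are still visible."""
    return "\n".join(f"[{s.kind}] {s.text}" for s in ctx if s.visible)


def context_react_step(
    ctx: List[Step],
    choose_ops: Callable[[List[Step]], List[Tuple[Callable, tuple]]],
    act: Callable[[str], Step],
) -> List[Step]:
    """One loop iteration: reshape the context, then reason or call a tool on it."""
    for op, args in choose_ops(ctx):  # stand-in for the learned policy
        ctx = op(ctx, *args)
    return ctx + [act(render(ctx))]


ctx = [
    Step("thought", "Find the founding year of the lab."),
    Step("tool_call", "search('lab founding year')"),
    Step("observation", "Long page ... The lab was founded in 1998. ... unrelated footer"),
    Step("thought", "Try the wrong archive."),
    Step("observation", "404 not found"),
]
ctx = snippet(ctx, 2, "The lab was founded in 1998.")
ctx = compress(ctx, 0, 2, "Searched for the lab's founding year.")


def toy_policy(c: List[Step]) -> List[Tuple[Callable, tuple]]:
    """Hypothetical policy: drop the dead-end branch after the first two steps."""
    return [(delete, (2, len(c)))]


ctx = context_react_step(ctx, toy_policy, lambda prompt: Step("thought", "Answer: 1998."))
print(render(ctx))
```

Each operator returns a new list rather than mutating the old one, so an earlier context stays available if a later step needs to roll back to it.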

Results & Findings

Benchmark	LongSeeker	Tongyi DeepResearch	AgentFold
BrowseComp (EN)	61.5 %	43.2 %	36.2 %
BrowseComp‑ZH (CN)	62.5 %	46.7 %	47.3 %
Two additional benchmarks (not listed)	Consistently +15‑20 % over baselines	–	–
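
For the two rows that list numbers, the margins can be worked out directly as differences in percentage points. The short script below only restates the table; the dictionary layout and the `point_gains` helper are illustrative names, not anything from the paper.

```python
# Scores (in %) copied from the two fully listed rows of the table.
scores = {
    "BrowseComp (EN)": {"LongSeeker": 61.5, "Tongyi DeepResearch": 43.2, "AgentFold": 36.2},
    "BrowseComp-ZH (CN)": {"LongSeeker": 62.5, "Tongyi DeepResearch": 46.7, "AgentFold": 47.3},
}


def point_gains(row: dict, system: str = "LongSeeker") -> dict:
    """Absolute gain of `system` over every other entry, in percentage points."""
    return {
        name: round(row[system] - value, 1)
        for name, value in row.items()
        if name != system
    }


gains = {bench: point_gains(row) for bench, row in scores.items()}

for bench, per_baseline in gains.items():
    for baseline, delta in per_baseline.items():
        print(f"{bench}: +{delta:.1f} points over {baseline}")
```

On BrowseComp (EN) this gives +18.3 points over Tongyi DeepResearch and +25.3 over AgentFold; on BrowseComp‑ZH it gives +15.8 and +15.2 points respectively.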

Token savings: On average, LongSeeker reduces context size by ~30 % compared with a naïve “keep‑everything” baseline, directly lowering inference cost.
Hallucination reduction: The selective retention of evidence (via Snippet/Compress) cuts factual errors by roughly 40 % in human‑rated evaluations.
Robustness to branching: The Rollback operator enables the agent to backtrack from dead‑ends without re‑processing the entire history, improving success on tasks that require trial‑and‑error exploration.

Practical Implications

Cost‑effective agents – Developers building LLM‑powered assistants (e.g., research bots, code‑search tools) can adopt Context‑ReAct to stay within token limits, making large‑model deployment cheaper.
Improved reliability – By keeping only the most relevant evidence in memory, agents become less prone to hallucinating outdated or irrelevant facts, a critical requirement for compliance‑heavy domains like finance or healthcare.
Modular integration – The five operators are API‑friendly; existing tool‑calling frameworks (LangChain, LlamaIndex) can wrap them around the LLM call loop, giving developers fine‑grained control over memory without retraining the base model.
Better multi‑turn UX – For chat‑based search assistants, elastic context means the system can remember earlier conversation threads while discarding noise, leading to smoother, more coherent user experiences.
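
One way to picture the modular integration point is a wrapper that reshapes the message history before each model call. The sketch below is framework-agnostic and does not use any LangChain or LlamaIndex API: `with_elastic_context`, `approx_tokens`, `keep_first_and_last`, `echo_llm`, and the `budget` threshold are all hypothetical names and choices made for illustration, and the toy shaping rule stands in for a real operator policy.

```python
from typing import Callable, List

Message = str


def approx_tokens(messages: List[Message]) -> int:
    """Crude whitespace count; a real system would use the model's tokenizer."""
    return sum(len(m.split()) for m in messages)


def with_elastic_context(
    call_llm: Callable[[List[Message]], Message],
    shape_context: Callable[[List[Message]], List[Message]],
    budget: int,
) -> Callable[[List[Message]], Message]:
    """Wrap call_llm so shape_context runs first whenever the history exceeds budget."""

    def wrapped(history: List[Message]) -> Message:
        if approx_tokens(history) > budget:
            history = shape_context(history)  # returns a new list; caller's list is untouched
        return call_llm(history)

    return wrapped


def keep_first_and_last(history: List[Message], keep_last: int = 2) -> List[Message]:
    """Toy shaping rule: keep the task statement and the latest messages,
    and replace everything in between with a one-line placeholder."""
    if len(history) <= keep_last + 1:
        return history
    dropped = len(history) - keep_last - 1
    return [history[0], f"[{dropped} earlier messages compressed]"] + history[-keep_last:]


def echo_llm(history: List[Message]) -> Message:
    """Stand-in for a real model call; it only reports what it was given."""
    return f"model saw {len(history)} messages"


agent_call = with_elastic_context(echo_llm, keep_first_and_last, budget=12)

history = [
    "task: find the paper",
    "step one result",
    "step two result",
    "step three result",
    "step four result",
]
print(agent_call(history))
```

Because the wrapper only touches the history passed into the call, the underlying model function stays unchanged, which matches the idea of adding memory control without retraining the base model.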

Limitations & Future Work

Synthetic training data – The 10 k trajectories are generated by a planner, which may not capture all nuances of real‑world user behavior; performance on truly noisy, human‑generated sessions remains to be validated.
Operator selection overhead – Deciding which operation to apply adds a small inference step; scaling this decision policy to extremely long sessions (>10 k tokens) could become a bottleneck.
Generalization across domains – While benchmarks cover web search and multilingual tasks, it’s unclear how well Context‑ReAct transfers to domains with highly structured data (e.g., codebases, scientific literature) without domain‑specific fine‑tuning.
Future directions suggested by the authors include learning the operator policy end‑to‑end with reinforcement learning, extending the framework to multi‑agent collaboration, and exploring hierarchical compression schemes for even deeper context reduction.

Authors

Yijun Lu
Rui Ye
Yuwen Du
Jiajun Wang
Songhua Liu
Siheng Chen

Paper Information

arXiv ID: 2605.05191v1
Categories: cs.AI
Published: May 6, 2026
