[Paper] LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

Published: May 6, 2026 at 01:54 PM EDT

Source: arXiv - 2605.05191v1

Overview

LongSeeker tackles a core bottleneck for autonomous search agents: as they explore, reason, and invoke tools, their internal “working memory” can balloon, leading to higher inference costs and more hallucinations. The authors propose an elastic context orchestration framework that lets the agent dynamically compress, prune, or expand its memory based on what’s currently relevant, enabling reliable long‑horizon search with far less overhead.

Key Contributions

  • Context‑ReAct paradigm – a unified loop that couples reasoning, tool use, and context management through five atomic operations: Skip, Compress, Rollback, Snippet, Delete.
  • Expressive completeness proof for the Compress operator, showing any context transformation can be represented using it.
  • Efficiency & fidelity guarantees for the specialized operators, reducing token usage and hallucination risk without sacrificing answer quality.
  • LongSeeker agent – a Qwen3‑30B‑A3B‑based model fine‑tuned on 10 k synthetic long‑horizon search trajectories that implements Context‑ReAct.
  • Strong empirical gains on four search benchmarks (e.g., 61.5 % on BrowseComp vs. 43.2 % for Tongyi DeepResearch), demonstrating the practical value of adaptive context handling.

Methodology

  1. Problem framing – The authors view a search episode as a sequence of states (observations, tool calls, reasoning steps). Keeping every state verbatim quickly exceeds token limits.
  2. Elastic context operations
    • Skip: ignore irrelevant past steps when generating the next action.
    • Compress: replace a sub‑trajectory with a concise summary while preserving logical dependencies.
    • Rollback: revert to an earlier state to explore an alternative branch.
    • Snippet: extract a focused excerpt (e.g., a key piece of evidence) to keep in memory.
    • Delete: permanently discard dead‑end branches.
  3. Context‑ReAct loop – At each step the agent decides (via a lightweight policy network) which operation(s) to apply, then proceeds with reasoning or tool invocation using the newly shaped context.
  4. Training data – 10 k synthetic trajectories were generated with a “teacher” planner that demonstrates optimal use of the five operators. LongSeeker is fine‑tuned on this data, learning when and how to reshape its memory.
  5. Evaluation – Benchmarks involve multi‑turn web browsing, fact‑finding, and multilingual search tasks. Metrics focus on task success rate and token consumption.
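To make the five operations concrete, here is a minimal Python sketch that treats the context as a list of steps. The `Step` and `Context` classes, their method signatures, and the sample trajectory are all hypothetical illustrations, not the paper's implementation; in particular, the summary text passed to `compress` would come from the model in practice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One entry in the agent's working context (hypothetical structure)."""
    kind: str             # e.g. "thought", "tool_call", "observation", "summary"
    text: str
    visible: bool = True  # False means the step is left out of the next prompt


@dataclass
class Context:
    steps: List[Step] = field(default_factory=list)

    def skip(self, i: int) -> None:
        # Skip: keep the step stored but ignore it when building the prompt.
        self.steps[i].visible = False

    def compress(self, start: int, end: int, summary: str) -> None:
        # Compress: replace steps[start:end] with a single summary step.
        self.steps[start:end] = [Step("summary", summary)]

    def rollback(self, i: int) -> None:
        # Rollback: return to the state right after step i.
        del self.steps[i + 1:]

    def snippet(self, i: int, excerpt: str) -> None:
        # Snippet: keep only a focused excerpt of step i.
        if excerpt not in self.steps[i].text:
            raise ValueError("excerpt must come from the original step")
        self.steps[i] = Step(self.steps[i].kind, excerpt)

    def delete(self, start: int, end: int) -> None:
        # Delete: permanently discard steps[start:end], e.g. a dead-end branch.
        del self.steps[start:end]

    def render(self) -> str:
        # Build the prompt text from the steps that are still visible.
        return "\n".join(f"[{s.kind}] {s.text}" for s in self.steps if s.visible)


# Made-up trajectory, for illustration only.
ctx = Context([
    Step("thought", "Find the founding year of the lab."),
    Step("tool_call", "search('lab founding year')"),
    Step("observation", "Long page ... The lab was founded in 1998. ... more text"),
    Step("thought", "Try an unrelated forum thread."),
    Step("observation", "Forum thread with no useful information."),
])
ctx.delete(3, 5)                                  # drop the dead-end branch
ctx.snippet(2, "The lab was founded in 1998.")    # keep only the key evidence
ctx.compress(0, 2, "Searched for the lab's founding year.")
print(ctx.render())
```

After these three calls the rendered context holds two short lines (one summary, one piece of evidence) instead of five steps, which is the kind of reshaping the operators are meant to allow.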

Results & Findings

Benchmark            LongSeeker   Tongyi DeepResearch   AgentFold
BrowseComp (EN)      61.5 %       43.2 %                36.2 %
BrowseComp‑ZH (CN)   62.5 %       46.7 %                47.3 %

Two additional benchmarks (not listed): consistently +15‑20 % over baselines.

  • Token savings: On average, LongSeeker reduces context size by ~30 % compared with a naïve “keep‑everything” baseline, directly lowering inference cost.
  • Hallucination reduction: The selective retention of evidence (via Snippet/Compress) cuts factual errors by roughly 40 % in human‑rated evaluations.
  • Robustness to branching: The Rollback operator enables the agent to backtrack from dead‑ends without re‑processing the entire history, improving success on tasks that require trial‑and‑error exploration.
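
The margins on the two listed benchmarks follow directly from the reported scores. The short sketch below recomputes them in percentage points; the `scores` dictionary simply restates the numbers above, and the `margins` helper is a hypothetical name used for illustration.

```python
# Scores as reported in the results above (task success rate, in %).
scores = {
    "BrowseComp (EN)": {
        "LongSeeker": 61.5, "Tongyi DeepResearch": 43.2, "AgentFold": 36.2,
    },
    "BrowseComp-ZH (CN)": {
        "LongSeeker": 62.5, "Tongyi DeepResearch": 46.7, "AgentFold": 47.3,
    },
}


def margins(row, system="LongSeeker"):
    """Absolute lead of `system` over every other entry, in percentage points."""
    return {
        name: round(row[system] - value, 1)
        for name, value in row.items()
        if name != system
    }


for benchmark, row in scores.items():
    print(benchmark, margins(row))
```

On BrowseComp (EN) this gives a lead of 18.3 points over Tongyi DeepResearch and 25.3 over AgentFold; on BrowseComp‑ZH (CN) the leads are 15.8 and 15.2 points.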

Practical Implications

  • Cost‑effective agents – Developers building LLM‑powered assistants (e.g., research bots, code‑search tools) can adopt Context‑ReAct to stay within token limits, making large‑model deployment cheaper.
  • Improved reliability – By keeping only the most relevant evidence in memory, agents become less prone to hallucinating outdated or irrelevant facts, which matters most in compliance‑heavy domains like finance or healthcare.
  • Modular integration – The five operators are API‑friendly; existing tool‑calling frameworks (LangChain, LlamaIndex) can wrap them around the LLM call loop, giving developers fine‑grained control over memory without retraining the base model.
  • Better multi‑turn UX – For chat‑based search assistants, elastic context means the system can remember earlier conversation threads while discarding noise, leading to smoother, more coherent user experiences.
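
As a rough illustration of the modular integration point, the sketch below wraps a context‑management hook around a generic agent loop in plain Python. Everything here is an assumption made for the example: `run_agent`, `propose`, `summarize`, the whitespace token count, and the fixed token budget are hypothetical stand‑ins, and the threshold rule replaces the learned operator choice described in the paper. No real framework API is used.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude whitespace count; a real system would use the model's tokenizer.
    return len(text.split())


def run_agent(
    propose: Callable[[List[str]], str],    # stand-in for the LLM / tool step
    summarize: Callable[[List[str]], str],  # stand-in for a Compress operation
    max_steps: int,
    token_budget: int,
    keep_recent: int = 2,
) -> List[str]:
    context: List[str] = []
    for _ in range(max_steps):
        step = propose(context)
        context.append(step)
        if step.startswith("FINAL"):
            break
        # Context-management hook: compress older steps once over budget.
        total = sum(count_tokens(s) for s in context)
        if total > token_budget and len(context) > keep_recent:
            old, recent = context[:-keep_recent], context[-keep_recent:]
            context = [summarize(old)] + recent
    return context


# Stub "model" and "summarizer" so the loop can run without any LLM.
calls = {"n": 0}


def fake_propose(context: List[str]) -> str:
    calls["n"] += 1
    if calls["n"] == 5:
        return "FINAL answer"
    return f"observation {calls['n']} " + "filler " * 10


def fake_summarize(old: List[str]) -> str:
    return f"SUMMARY of {len(old)} steps"


result = run_agent(fake_propose, fake_summarize, max_steps=10, token_budget=30)
print(result)
```

The point of the design is that the hook sits between steps, so the base model call stays untouched while the surrounding loop decides what the next prompt contains.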

Limitations & Future Work

  • Synthetic training data – The 10 k trajectories are generated by a planner, which may not capture all nuances of real‑world user behavior; performance on truly noisy, human‑generated sessions remains to be validated.
  • Operator selection overhead – Deciding which operation to apply adds a small inference step; scaling this decision policy to extremely long sessions (>10 k tokens) could become a bottleneck.
  • Generalization across domains – While benchmarks cover web search and multilingual tasks, it’s unclear how well Context‑ReAct transfers to domains with highly structured data (e.g., codebases, scientific literature) without domain‑specific fine‑tuning.
  • Future directions suggested by the authors include learning the operator policy end‑to‑end with reinforcement learning, extending the framework to multi‑agent collaboration, and exploring hierarchical compression schemes for even deeper context reduction.

Authors

  • Yijun Lu
  • Rui Ye
  • Yuwen Du
  • Jiajun Wang
  • Songhua Liu
  • Siheng Chen

Paper Information

  • arXiv ID: 2605.05191v1
  • Categories: cs.AI
  • Published: May 6, 2026