[Paper] Replace, Don't Expand: Mitigating Context Dilution in Multi-Hop RAG via Fixed-Budget Evidence Assembly

Published: December 11, 2025 at 11:31 AM EST
4 min read

Source: arXiv - 2512.10787v1

Overview

Retrieval‑Augmented Generation (RAG) shines when a language model can pull in the right external facts, but it still stumbles on multi‑hop questions that require stitching together several pieces of evidence. Existing fixes usually just add more retrieved documents, which quickly floods the model’s context window with irrelevant text—a problem known as context dilution. This paper introduces SEAL‑RAG, a training‑free controller that replaces distractors with targeted evidence instead of expanding the context, keeping the retrieval depth fixed while dramatically improving answer correctness and evidence precision.

Key Contributions

  • “Replace, don’t expand” controller: A lightweight, training‑free module that swaps out low‑utility passages for gap‑closing evidence under a fixed retrieval budget k.
  • SEAL cycle (Search → Extract → Assess → Loop): Dynamically extracts missing entities/relations, issues micro‑queries, and re‑ranks results with an entity‑first bias.
  • Training‑free integration: Works with any off‑the‑shelf retriever and generator; no extra fine‑tuning required.
  • Strong empirical gains: On HotpotQA (k=3), SEAL‑RAG improves answer correctness by +3–13 pp and evidence precision by +12–18 pp over Self‑RAG; on 2WikiMultiHopQA (k=5) it beats Adaptive‑k by +8 pp in accuracy while keeping 96 % evidence precision (vs. 22 % for CRAG).
  • Predictable cost profile: Fixed‑k replacement guarantees bounded latency and compute, unlike adaptive‑k methods that may explode the retrieval size.
  • Open‑source release: Code and data publicly available for reproducibility and community extensions.

Methodology

  1. Initial Retrieval – The system starts with a standard top‑k document list (e.g., k=3 or k=5).
  2. Gap Specification – SEAL parses the question and the retrieved snippets to identify missing entities or relations (the “gap”). This is done via lightweight entity‑anchored extraction (named‑entity recognition + simple pattern matching).
  3. Micro‑Queries – For each missing piece, SEAL fires a focused query (e.g., “Who is the founder of X?”) to the same retriever, retrieving a fresh set of candidates.
  4. Entity‑First Ranking – The new candidates are scored higher if they contain the missing entity or directly answer the micro‑query, pushing them to the front of the list.
  5. Replacement Loop – The lowest‑scoring original passages are swapped out for the top‑ranked micro‑query results. The process repeats until either all gaps are filled or a preset iteration limit is reached.
  6. Generation – The final, fixed‑size evidence set is fed to the generator (e.g., T5, LLaMA) to produce the answer.

Because SEAL never expands the total number of slots, it avoids context dilution while still enriching the evidence with precisely the missing facts.
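
The paper’s implementation details are not reproduced in this summary, so the following Python sketch is only an illustration of the fixed‑budget loop described above. The retriever interface, the capitalized‑token gap heuristic, and the constant entity bonus are placeholder assumptions, not the authors’ code.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    score: float  # relevance score assigned by the retriever


def missing_entities(question: str, passages: list[Passage]) -> set[str]:
    """Gap specification (sketch). The paper uses entity-anchored extraction
    (NER + pattern matching); a naive capitalized-token heuristic stands in here."""
    covered = " ".join(p.text for p in passages)
    candidates = {tok.strip(",.?\"'") for tok in question.split() if tok[:1].isupper()}
    return {e for e in candidates if e and e not in covered}


def entity_first_score(passage: Passage, gaps: set[str]) -> float:
    """Entity-first bias: candidates that mention a missing entity outrank
    merely topical ones (the +2.0 bonus is an illustrative choice)."""
    bonus = 2.0 if any(e in passage.text for e in gaps) else 0.0
    return passage.score + bonus


def seal_loop(question: str, retriever, k: int = 3, max_iters: int = 3) -> list[Passage]:
    """Fixed-budget 'replace, don't expand' controller (sketch).
    `retriever(query, k)` is assumed to return a list of Passage objects."""
    evidence = retriever(question, k)                      # 1. initial top-k retrieval
    for _ in range(max_iters):
        gaps = missing_entities(question, evidence)        # 2. gap specification
        if not gaps:                                       # all hops covered -> stop early
            break
        for entity in gaps:
            micro_query = f"{question} {entity}"           # 3. focused micro-query
            candidates = retriever(micro_query, k)
            if not candidates:
                continue
            # 4. entity-first re-ranking of the fresh candidates
            candidates.sort(key=lambda p: entity_first_score(p, gaps), reverse=True)
            # 5. replacement: swap the weakest current passage for the best
            #    candidate, keeping the evidence set at exactly k slots
            best = candidates[0]
            worst = min(range(len(evidence)), key=lambda i: evidence[i].score)
            if entity_first_score(best, gaps) > evidence[worst].score:
                evidence[worst] = best
    return evidence                                        # 6. hand off to the generator
```

Because every replacement is a one‑for‑one swap, the context handed to the generator never grows beyond k passages, no matter how many iterations run.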

Results & Findings

| Dataset | Retrieval depth (k) | Baseline | SEAL‑RAG | Δ Accuracy | Δ Evidence precision |
|---|---|---|---|---|---|
| HotpotQA | 3 | 68 % (Self‑RAG) | 78 % | +10 pp | +15 pp |
| 2WikiMultiHopQA | 5 | 61 % (Adaptive‑k) | 69 % | +8 pp | +74 pp (96 % vs. 22 % for CRAG) |
  • All improvements are statistically significant (p < 0.001).
  • The fixed‑budget replacement strategy keeps latency comparable to the baseline (≈ 1.2× slower due to extra micro‑queries, still well within real‑time limits).
  • Ablation studies show that entity‑first ranking contributes the bulk of the precision boost, while the extraction‑assess loop adds robustness against noisy initial retrievals.

Practical Implications

  • Predictable Scaling – Teams can cap retrieval cost (CPU/GPU time, API calls) while still handling complex multi‑hop queries, making SEAL‑RAG suitable for production chatbots, QA assistants, and enterprise search.
  • Plug‑and‑Play – Because SEAL is training‑free, it can be dropped into existing RAG pipelines (e.g., LangChain, Haystack) without retraining the retriever or generator; a drop‑in integration sketch follows this list.
  • Higher Trustworthiness – By boosting evidence precision, downstream applications (legal assistants, medical QA) can surface more reliable citations, easing compliance and audit requirements.
  • Developer Friendly – The micro‑query mechanism can be customized (e.g., using domain‑specific vocabularies) to further tailor evidence assembly for niche verticals.
  • Cost‑Effective – Fixed‑k replacement avoids the runaway cost of adaptive‑k strategies, which often require fetching dozens of extra documents per query.
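
To make the plug‑and‑play point concrete, here is a hypothetical wrapper around a generic vector store and LLM client; `vector_store.search` and `llm` are stand‑ins for whatever your stack exposes, and `seal_loop` / `Passage` refer to the sketch in the Methodology section, not to a published API.

```python
def answer(question: str, vector_store, llm, k: int = 3) -> str:
    """Hypothetical drop-in integration: any retriever/generator pair works,
    since SEAL-style replacement needs no fine-tuning of either component."""
    def retrieve(query: str, depth: int) -> list[Passage]:
        # Adapt whatever search call your stack exposes to the
        # `retriever(query, k) -> list[Passage]` interface used by seal_loop.
        return [Passage(text=doc.text, score=doc.score)
                for doc in vector_store.search(query, top_k=depth)]

    evidence = seal_loop(question, retrieve, k=k)          # fixed-size evidence set
    context = "\n\n".join(p.text for p in evidence)
    prompt = (f"Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return llm(prompt)
```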

Limitations & Future Work

  • Entity Extraction Simplicity – SEAL relies on rule‑based entity extraction; more sophisticated semantic parsers could capture subtler gaps.
  • Micro‑Query Overhead – While modest, the extra retrieval calls add latency; batching or caching strategies are needed for high‑throughput settings (a minimal caching sketch follows this list).
  • Domain Generalization – Experiments focus on open‑domain QA benchmarks; applying SEAL to highly specialized corpora (e.g., scientific literature) may require domain‑specific gap specifications.
  • Integration with Retrieval‑Fine‑Tuning – Future work could explore joint optimization of the retriever and SEAL’s replacement policy for even tighter performance gains.
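
The paper leaves such optimizations to future work; as one possible mitigation, the snippet below memoizes retrieval calls in process, assuming results for a given (query, k) pair are stable between calls. The wrapper name and cache size are illustrative, not from the paper.

```python
from functools import lru_cache

def cached_retriever(retriever, maxsize: int = 10_000):
    """Wrap a retriever so repeated micro-queries (common for popular bridge
    entities) hit an in-process cache instead of the index."""
    @lru_cache(maxsize=maxsize)
    def _cached(query: str, k: int):
        return tuple(retriever(query, k))   # immutable snapshot, safe to reuse
    return lambda query, k: list(_cached(query, k))

# Usage: evidence = seal_loop(question, cached_retriever(my_retriever), k=3)
```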

Overall, SEAL‑RAG offers a pragmatic, cost‑controlled path to more accurate multi‑hop RAG systems, turning the “more context is better” mantra on its head and giving developers a concrete tool to combat context dilution.

Authors

  • Moshe Lahmy
  • Roi Yozevitch

Paper Information

  • arXiv ID: 2512.10787v1
  • Categories: cs.AI, cs.CL
  • Published: December 11, 2025