[Paper] AdaGReS: Adaptive Greedy Context Selection via Redundancy-Aware Scoring for Token-Budgeted RAG
Source: arXiv - 2512.25052v1
Overview
Retrieval‑augmented generation (RAG) systems rely on pulling in external text snippets (contexts) to help large language models answer questions or complete tasks. The paper AdaGReS tackles a surprisingly common problem: the top‑k retrieval step often returns many overlapping or duplicate chunks, wasting the model's limited token budget and hurting answer quality. AdaGReS introduces a redundancy‑aware selection algorithm that balances relevance with diversity, automatically calibrates its trade‑off, and comes with provable near‑optimality guarantees.
Key Contributions
- Redundancy‑aware objective: Formulates context selection as a set‑level optimization that rewards relevance to the query while penalizing intra‑set similarity.
- Greedy token‑budgeted selection: Derives marginal gains for each candidate chunk and selects chunks greedily until the token budget is exhausted.
- Adaptive relevance‑redundancy trade‑off: Proposes a closed‑form, instance‑specific calibration of the λ parameter (relevance vs. redundancy) based on statistics of the candidate pool and the token budget, removing the need for manual tuning.
- Theoretical guarantee: Shows the objective is ε‑approximately submodular under realistic embedding‑similarity assumptions, so the greedy algorithm carries a provable (1 − 1/e − ε) approximation bound (stated below).
- Empirical validation: Demonstrates consistent reductions in redundancy and improvements in downstream QA performance on Natural Questions and a high‑redundancy biomedical drug corpus.
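Guarantees of this type, for greedy maximization of an ε‑approximately submodular objective under a budget, take the following shape (the precise constants and conditions are stated in the paper):
\[ F(S_{\text{greedy}}) \;\ge\; \left(1 - \frac{1}{e} - \varepsilon\right) \max_{S:\, \text{tok}(S) \le B} F(S) \]
where tok(S) is the total token length of the selected set, B is the token budget, and ε vanishes as the objective approaches exact submodularity.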
Methodology
- Candidate generation: For a given query, a dense retriever returns a pool of N text chunks (e.g., N = 100), each with an embedding‑similarity score sᵢ indicating its relevance to the query.
- Scoring function:
  \[ F(S) = \sum_{i\in S} s_i \;-\; \lambda \sum_{i,j\in S,\; i<j} \text{sim}(c_i, c_j) \]
- The first term rewards relevance.
- The second term penalizes redundancy using pairwise cosine similarity between chunk embeddings.
- λ controls the trade‑off.
- Adaptive λ: Instead of fixing λ, AdaGReS computes it analytically from the mean and variance of the relevance scores and the token budget B. The calibration scales λ up when the pool is highly redundant or the budget is tight, and down when relevance dominates.
- Greedy selection under budget: Starting from the empty set, the algorithm repeatedly adds the chunk with the largest marginal gain ΔF while keeping the total token count ≤ B. Because the objective is (approximately) submodular, this greedy process is provably near‑optimal.
- Implementation details: The method plugs into any existing retriever‑generator pipeline with only a few extra lines of code (computing pairwise similarities and the adaptive λ); a minimal sketch follows this list. No re‑training of the retriever or generator is required.
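The paper's closed‑form λ is not reproduced in this summary, so the sketch below uses an illustrative variance‑based placeholder (`adaptive_lambda`) purely to show where the calibration plugs in; `adagres_select`, the stopping rule at non‑positive gain, and all argument names are likewise assumptions of this sketch, not the authors' reference implementation.

```python
import numpy as np

def adaptive_lambda(scores: np.ndarray, sims: np.ndarray) -> float:
    """Illustrative placeholder for the paper's closed-form calibration:
    raise the redundancy penalty when the pool is highly similar and the
    relevance scores are flat. The paper's actual formula also factors in
    the token budget B."""
    if len(scores) < 2:
        return 0.0
    mean_sim = sims[np.triu_indices_from(sims, k=1)].mean()
    spread = scores.std() + 1e-8
    return max(0.0, float(mean_sim / spread))  # hypothetical, not the paper's formula

def adagres_select(scores, embeddings, token_lens, budget):
    """Greedily maximize F(S) = sum_{i in S} s_i - lam * sum_{i<j} sim(c_i, c_j)
    subject to the total token length of S staying within `budget`."""
    scores = np.asarray(scores, dtype=float)
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize rows
    sims = emb @ emb.T                                      # pairwise cosine similarity
    lam = adaptive_lambda(scores, sims)

    selected, used = [], 0
    remaining = set(range(len(scores)))
    while remaining:
        best, best_gain = None, float("-inf")
        for i in remaining:
            if used + token_lens[i] > budget:
                continue  # chunk does not fit in the remaining budget
            # Marginal gain of adding chunk i to the current set S:
            #   dF = s_i - lam * sum_{j in S} sim(c_i, c_j)
            gain = scores[i] - lam * sims[i, selected].sum()
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None or best_gain <= 0.0:
            break  # budget exhausted, or no chunk improves F (a choice of this sketch)
        selected.append(best)
        used += token_lens[best]
        remaining.discard(best)
    return selected

if __name__ == "__main__":
    # Toy pool of 5 chunks where chunk 1 nearly duplicates chunk 0:
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(5, 32))
    emb[1] = emb[0] + 0.01 * rng.normal(size=32)
    picks = adagres_select([0.9, 0.88, 0.7, 0.6, 0.5], emb, [100] * 5, budget=300)
    print(picks)  # the penalized near-duplicate is typically skipped
```

This naive loop matches the O(N · k) marginal‑gain cost quoted below; building the similarity matrix is the quadratic step the authors mitigate with approximate clustering.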
Results & Findings
| Dataset | Baseline (top‑k) | AdaGReS selection | Redundancy reduction | Tokens saved | Exact‑match gain |
|---|---|---|---|---|---|
| Natural Questions | 10 chunks (≈ 800 tokens) | 8 chunks (≈ 650 tokens) | 32 % fewer duplicate tokens | 19 % tokens saved | +2.1 % exact match |
| Biomedical drug corpus | 12 chunks (≈ 900 tokens) | 7 chunks (≈ 540 tokens) | 45 % redundancy reduction | 40 % tokens saved | +3.4 % exact match |
- Redundancy control: AdaGReS consistently selects fewer overlapping chunks, freeing up tokens for more diverse information.
- Answer quality: The modest token savings translate into measurable gains in exact‑match and F1 scores for open‑domain QA.
- Robustness: Across varying budgets (from 300 to 1200 tokens) and different retrieval models (dense vs. BM25), AdaGReS maintains its advantage, showing that the adaptive λ truly adapts to pool characteristics.
Practical Implications
- Cost‑effective inference: By squeezing out redundant tokens, developers can stay within tighter token limits (e.g., for API pricing or latency constraints) without sacrificing answer quality.
- Plug‑and‑play improvement: Existing RAG pipelines (LangChain, LlamaIndex, Haystack, etc.) can integrate AdaGReS with minimal code changes: just replace the top‑k slice with the greedy selector (see the sketch after this list).
- Domain‑specific gains: In fields with naturally high overlap (legal documents, biomedical literature, product manuals), the redundancy penalty yields larger token savings and clearer, more concise context for the LLM.
- Better user experience: End‑users see more accurate and less “hallucinated” answers because the model receives higher‑quality, non‑redundant evidence.
- Scalable to large corpora: The algorithm’s complexity is O(N log N) for sorting plus O(N · k) for marginal gain updates, which is tractable even for thousands of candidates when combined with approximate nearest‑neighbor search.
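As a hypothetical integration sketch (the `Chunk` fields and `build_context` helper are illustrative, not any framework's actual API), the change amounts to routing retrieved candidates through the `adagres_select` sketch from the Methodology section instead of slicing the top k:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    score: float       # retriever relevance s_i
    embedding: list    # dense embedding used by sim()
    num_tokens: int

def build_context(candidates: list[Chunk], budget: int) -> str:
    # Drop-in replacement for the usual `candidates[:k]` slice; reuses
    # adagres_select from the Methodology sketch (all names hypothetical).
    idx = adagres_select(
        scores=[c.score for c in candidates],
        embeddings=[c.embedding for c in candidates],
        token_lens=[c.num_tokens for c in candidates],
        budget=budget,
    )
    return "\n\n".join(candidates[i].text for i in idx)
```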
Limitations & Future Work
- Pairwise similarity cost: Computing all pairwise similarities scales quadratically with the candidate pool; the authors mitigate this with approximate clustering, but very large pools may still be expensive.
- Embedding dependence: The redundancy penalty relies on the quality of chunk embeddings; poor embeddings could misjudge similarity and either over‑penalize or under‑penalize redundancy.
- Single‑modal focus: The current formulation assumes textual chunks; extending to multimodal evidence (tables, figures, code snippets) would require new similarity measures.
- User‑controlled trade‑off: While the adaptive λ removes manual tuning, some applications might want explicit control over relevance vs. diversity—future work could expose a higher‑level “budget aggressiveness” knob.
- End‑to‑end training: Integrating the selection objective into a differentiable retriever‑generator loop could further improve performance, an avenue the authors suggest for follow‑up research.
Authors
- Chao Peng
- Bin Wang
- Zhilei Long
- Jinfang Sheng
Paper Information
- arXiv ID: 2512.25052v1
- Categories: cs.CL, cs.AI, cs.IR
- Published: December 31, 2025