[Paper] TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Published: 3 days ago (June 9, 2026 at 01:16 PM EDT)

2 min read

Source: arXiv

Source: arXiv - 2606.11119v1

Overview

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models. However, rollout-intensive policy optimization is often limited by insufficient reward contrast, arising when overly simple or complex prompts generate low-variance feedback and when outcome-only rewards assign the same terminal assessment to every decision in a multi-turn rollout. Past efforts have focused on allocating available rollout resources to promising prompts, yet they only leverage sample informativeness at the prompt level and neglect variation in prefix-level informativeness across turns within the same rollout. This work targets multi-turn agentic RL by modeling each ReAct-style thought-action-observation turn as a semantically distinct node, allowing budget allocation to extend from prompt roots to turn-level prefixes with further continuations, which naturally forms tree-structured rollouts. We introduce Tree Rollout Allocation for Contrastive Exploration (TRACE), a unified rollout allocation framework that enhances reward contrast within a fixed sampling budget. Technically, TRACE allocates rollout budget to both prompt roots and intermediate prefixes that are most likely to yield mixed terminal rewards. A shared generalizable predictor estimates conditional success probability at these anchors from prefix histories to guide this allocation. The resulting adaptive tree structure enriches outcome-only feedback and amplifies the policy-update signal. Empirically, TRACE achieves competitive performance and efficiency gains on typical agentic benchmarks, e.g., improving Qwen3-14B Multi-Hop QA average accuracy by 2.8 points over competitive baselines at equal sampling cost.

Key Contributions

This paper presents research in the following areas:

cs.LG
cs.AI
cs.CL

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

This research contributes to the advancement of cs.LG.

Authors

Heming Zou
Qi Wang
Yun Qu
Yuhang Jiang
Lizhou Cai
Yixiu Mao
Ru Peng
Xin Xu
Weijie Liu
Kai Yang
Saiyong Yang
Xiangyang Ji

Paper Information

arXiv ID: 2606.11119v1
Categories: cs.LG, cs.AI, cs.CL
Published: June 9, 2026
PDF: Download PDF

[Paper] TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Overview

Key Contributions

Methodology

Practical Implications

Authors

Paper Information

Related posts

[Paper] Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

[Paper] EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

[Paper] Operadic consistency: a label-free signal for compositional reasoning failures in LLMs

[Paper] SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation