[Paper] An Agency-Transferring Model-Free Policy Enhancement Technique

Published: June 8, 2026 at 01:59 PM EDT
Source: arXiv - 2606.09825v1

Overview

Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal policy available as a baseline. This paper proposes a method for embedding such a baseline into the RL training process, simultaneously improving training efficiency relative to from-scratch methods and producing a learning policy that outperforms the baseline.

At each step, the method arbitrates between the baseline policy and a trainable learning policy, initially relying strongly on the baseline policy and then progressively transferring agency to the learning policy. By the end of training, the learning policy is a standalone neural network that operates without baseline policy support.

The paper formalizes what it means for the baseline policy to be functional: under this policy, the agent reaches a goal set and remains there with high probability. The proposed arbitration mechanism is designed to exploit this property during training, yielding high goal-reaching rates right from the beginning of training. A theoretical analysis provides a formal interpretation of this behavior under stated assumptions and extends it to the final baseline-free regime, where explicit lower bounds are derived for the goal-reaching probability of the standalone learning policy.

Empirical results on continuous-control benchmarks show that the proposed method achieves returns that match or exceed those of competitive approaches, while maintaining the highest goal-reaching rates throughout training among the compared methods, including in the final stage, where the learning policy operates without any baseline support.
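
One possible way to write down the "functional baseline" property from the overview is sketched here. The symbols $\mathcal{G}$ (goal set), $\pi_0$ (baseline policy), $s_t$ (state at step $t$), and $\eta$ (tolerated failure probability) are notation assumed for this illustration only; the paper's own definition may differ in form and in its assumptions.

```latex
\mathbb{P}_{\pi_0}\!\left[\, \exists\, T \ge 0 \;:\; s_t \in \mathcal{G} \ \text{ for all } t \ge T \,\right] \;\ge\; 1 - \eta,
\qquad \eta \in [0, 1).
```

Read informally: when the agent follows $\pi_0$, there is, with probability at least $1 - \eta$, some time $T$ after which the state never leaves $\mathcal{G}$. A small $\eta$ corresponds to "with high probability" in the overview.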

Key Contributions

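This paper presents research in the following arXiv categories:

  • cs.LG (Machine Learning)
  • cs.AI (Artificial Intelligence)
  • eess.SY (Systems and Control)
  • math.OC (Optimization and Control)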

Methodology

Please refer to the full paper for detailed methodology.
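
As a rough illustration of the arbitration idea described in the Overview, the sketch below picks, at each step, either the baseline action or the learning policy's action, with the chance of deferring to the baseline shrinking as training progresses. Everything here is an assumption made for illustration: the linear decay schedule, the random per-step switch, and the names `baseline_probability`, `arbitrate`, and `p_start` are hypothetical and are not taken from the paper, whose actual arbitration rule may be quite different.

```python
import random
from typing import Callable, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]
Policy = Callable[[State], Action]


def baseline_probability(step: int, total_steps: int, p_start: float = 0.95) -> float:
    """Hypothetical schedule: linearly decay the baseline's share of agency to 0."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so out-of-range steps stay well-defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return p_start * (1.0 - progress)


def arbitrate(
    state: State,
    baseline: Policy,
    learner: Policy,
    p_baseline: float,
    rng: random.Random,
) -> Tuple[Action, str]:
    """Return the chosen action and which policy produced it."""
    # rng.random() lies in [0, 1): p_baseline=1.0 always picks the baseline,
    # p_baseline=0.0 always picks the learner.
    if rng.random() < p_baseline:
        return baseline(state), "baseline"
    return learner(state), "learner"


if __name__ == "__main__":
    # Toy stand-ins for real policies (assumed, for demonstration only).
    baseline_policy: Policy = lambda s: [0.0]
    learning_policy: Policy = lambda s: [1.0]

    rng = random.Random(0)
    total_steps = 1000
    for step in (0, 500, 1000):
        p = baseline_probability(step, total_steps)
        _, source = arbitrate([0.0], baseline_policy, learning_policy, p, rng)
        print(f"step={step} p_baseline={p:.3f} chosen={source}")
```

With this schedule the baseline's probability reaches zero at the final step, so the learning policy acts alone at the end of training, mirroring the baseline-free final regime the overview mentions. A learned or state-dependent arbitration rule could replace the fixed schedule without changing the surrounding loop.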

Practical Implications

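This research contributes to machine learning (cs.LG), specifically to reinforcement learning for control problems in which a functional but suboptimal baseline policy is already available. According to the abstract, the method aims to reduce training cost relative to from-scratch training while keeping goal-reaching rates high throughout training.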

Authors

  • Anton Bolychev
  • Georgiy Malaniya
  • Sinan Ibrahim
  • Pavel Osinenko

Paper Information

  • arXiv ID: 2606.09825v1
  • Categories: cs.LG, cs.AI, eess.SY, math.OC
  • Published: June 8, 2026