[Paper] Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training

Published: June 10, 2026 at 05:42 AM EDT

Source: arXiv - 2606.11867v1

Overview

Mixture-of-Experts (MoE) and reinforcement learning (RL) post-training now dominate large language model (LLM) development, yet expert load imbalance remains a critical challenge. Existing load-balancing systems target pre-training by relying on historical step-level statistics. However, these methods fail under the unique workload dynamics of RL post-training: the step-level load is stable, but the tiny batch sizes processed during micro-steps cause severe, high-frequency load fluctuations. We introduce ForeMoE, a micro-step-level load balancing system for MoE RL post-training. Instead of relying on historical statistics, ForeMoE exploits the multi-stage RL pipeline (rollout, recompute, policy update) by using foreseeable routing information from the rollout stage to proactively guide load balancing in the remaining stages. To support frequent per-micro-step reconfiguration, ForeMoE employs a hierarchical planner that decomposes the NP-hard load balancing problem into tractable sub-components, alongside a transfer engine that leverages complementary hardware paths (CPU-assisted and GPU-direct) for overlapped expert transfer. Evaluations on 64 GPUs demonstrate that ForeMoE achieves up to a 1.45× speedup over state-of-the-art RL post-training systems.
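
To make the micro-step problem concrete, here is a minimal Python sketch of the idea described above. It is an illustration under stated assumptions, not the paper's implementation: all function names (expert_loads, placement_load, greedy_placement, imbalance) are hypothetical, the routing traces are toy data, and the greedy longest-processing-time heuristic is a simple stand-in for ForeMoE's hierarchical planner. The sketch also ignores expert transfer cost, which the real system handles with its transfer engine.

```python
from collections import Counter


def expert_loads(routing, num_experts):
    """Count tokens sent to each expert in one micro-step.

    routing: list of per-token lists of expert ids (the top-k choices),
    assumed to have been recorded during the rollout stage.
    """
    counts = Counter(e for token in routing for e in token)
    return [counts.get(e, 0) for e in range(num_experts)]


def placement_load(placement, loads):
    """Total token load on each GPU for a given expert placement."""
    return [sum(loads[e] for e in experts) for experts in placement]


def greedy_placement(loads, num_gpus):
    """Stand-in planner (NOT the paper's): heaviest expert first, placed on
    the least-loaded GPU that still has a free expert slot."""
    slots = -(-len(loads) // num_gpus)  # ceil(num_experts / num_gpus)
    placement = [[] for _ in range(num_gpus)]
    gpu_load = [0] * num_gpus
    for e in sorted(range(len(loads)), key=lambda i: -loads[i]):
        free = [g for g in range(num_gpus) if len(placement[g]) < slots]
        g = min(free, key=lambda i: gpu_load[i])
        placement[g].append(e)
        gpu_load[g] += loads[e]
    return placement


def imbalance(gpu_load):
    """Max GPU load divided by mean GPU load (1.0 means perfectly balanced)."""
    mean = sum(gpu_load) / len(gpu_load)
    return max(gpu_load) / mean if mean else 1.0


# Toy data: 4 experts, 2 GPUs, top-1 routing, two micro-steps of one step.
NUM_EXPERTS, NUM_GPUS = 4, 2
micro_a = [[0], [0], [0], [1], [1], [2]]
micro_b = [[2], [2], [3], [3], [3], [1]]
static = [[0, 1], [2, 3]]  # fixed placement chosen from step-level statistics

loads_a = expert_loads(micro_a, NUM_EXPERTS)
loads_b = expert_loads(micro_b, NUM_EXPERTS)
step_loads = [a + b for a, b in zip(loads_a, loads_b)]

print("step-level, static:", imbalance(placement_load(static, step_loads)))
for name, loads in (("A", loads_a), ("B", loads_b)):
    replanned = greedy_placement(loads, NUM_GPUS)
    print(f"micro-step {name}, static:   ",
          imbalance(placement_load(static, loads)))
    print(f"micro-step {name}, replanned:",
          imbalance(placement_load(replanned, loads)))
```

In this toy trace the static placement is perfectly balanced when loads are summed over the whole step, yet each micro-step puts five of six tokens on one GPU. Because the routing is already known from rollout, a per-micro-step placement can be computed ahead of the recompute and policy-update stages instead of being inferred from history.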

Authors

Yuming Zhou
Haoyang Li
Sheng Lin
Yanfeng Zhao
Tong Zhao
Xupeng Miao
Jie Jiang
Fangcheng Fu
Bin Cui

Paper Information

arXiv ID: 2606.11867v1
Categories: cs.DC
Published: June 10, 2026
