[Paper] Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training

Published: June 10, 2026 at 05:42 AM EDT
Source: arXiv - 2606.11867v1

Overview

Mixture-of-Experts (MoE) and reinforcement learning (RL) post-training now dominate large language model (LLM) development, yet expert load imbalance remains a critical challenge. Existing load-balancing systems target pre-training by relying on historical step-level statistics. However, these methods fail under the unique workload dynamics of RL post-training: the step-level load is stable, but the tiny batch sizes processed during micro-steps cause severe, high-frequency load fluctuations.

We introduce ForeMoE, a micro-step-level load balancing system for MoE RL post-training. Instead of relying on historical statistics, ForeMoE exploits the multi-stage RL pipeline (rollout, recompute, policy update) by using foreseeable routing information from the rollout stage to proactively guide load balancing in the remaining stages. To support frequent per-micro-step reconfiguration, ForeMoE employs a hierarchical planner that decomposes the NP-hard load balancing problem into tractable sub-components, alongside a transfer engine that leverages complementary hardware paths (CPU-assisted and GPU-direct) for overlapped expert transfer.

Evaluations on 64 GPUs demonstrate that ForeMoE achieves up to a 1.45× speedup over state-of-the-art RL post-training systems.
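To make the micro-step problem concrete, here is a minimal Python sketch. It is not ForeMoE's planner: it swaps in a simple greedy "heaviest expert first" heuristic, and the function names (`expert_loads`, `greedy_placement`, `max_gpu_load`), the toy routing data, and the assumption of an equal number of expert slots per GPU are all hypothetical. It only illustrates how routing recorded during rollout could be turned into per-micro-step expert loads, and why a placement that looks balanced at the step level can be badly imbalanced within a micro-step.

```python
from collections import Counter


def expert_loads(routing, num_experts):
    """Count tokens per expert for one micro-step.

    `routing` is a flat list of expert ids, one entry per routed token slot,
    assumed here to have been recorded during the rollout stage.
    """
    counts = Counter(routing)
    return [counts.get(e, 0) for e in range(num_experts)]


def greedy_placement(loads, num_gpus):
    """Greedy heuristic (an illustrative stand-in, not the paper's planner).

    Experts are visited from heaviest to lightest, and each one goes to the
    least-loaded GPU that still has a free expert slot.
    """
    capacity = -(-len(loads) // num_gpus)  # ceil(num_experts / num_gpus)
    gpu_load = [0] * num_gpus
    gpu_experts = [[] for _ in range(num_gpus)]
    for e in sorted(range(len(loads)), key=lambda e: (-loads[e], e)):
        free = [g for g in range(num_gpus) if len(gpu_experts[g]) < capacity]
        g = min(free, key=lambda g: (gpu_load[g], g))
        gpu_experts[g].append(e)
        gpu_load[g] += loads[e]
    return [sorted(experts) for experts in gpu_experts], gpu_load


def max_gpu_load(loads, placement):
    """Token count on the busiest GPU under a given expert placement."""
    return max(sum(loads[e] for e in experts) for experts in placement)


if __name__ == "__main__":
    num_experts, num_gpus = 4, 2
    static = [[0, 1], [2, 3]]  # fixed placement, balanced at the step level

    # Toy routing for two micro-steps of the same step (hypothetical data).
    micro_steps = [
        [0] * 6 + [1] * 5 + [2] * 1,
        [1] * 1 + [2] * 5 + [3] * 6,
    ]

    step_total = [0] * num_experts
    for i, routing in enumerate(micro_steps):
        loads = expert_loads(routing, num_experts)
        step_total = [a + b for a, b in zip(step_total, loads)]
        placement, _ = greedy_placement(loads, num_gpus)
        print(
            f"micro-step {i}: loads={loads} "
            f"static max={max_gpu_load(loads, static)} "
            f"replanned max={max_gpu_load(loads, placement)}"
        )
    print(f"step-level loads={step_total}")
```

In this toy data every expert receives the same number of tokens over the whole step, so step-level statistics suggest the static placement is already balanced. Within each micro-step, however, one GPU carries almost all of the tokens, and replanning from the known routing evens the load out. A real system would also have to account for the cost of moving expert weights, which the abstract says ForeMoE handles with its transfer engine.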

Key Contributions

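As summarized in the abstract, the paper contributes:

  • ForeMoE, a micro-step-level load balancing system for MoE RL post-training.
  • Use of foreseeable routing information from the rollout stage to guide load balancing in the recompute and policy update stages, instead of relying on historical statistics.
  • A hierarchical planner that decomposes the NP-hard load balancing problem into tractable sub-components.
  • A transfer engine that overlaps expert transfer across complementary hardware paths (CPU-assisted and GPU-direct).
  • An evaluation on 64 GPUs showing up to a 1.45× speedup over state-of-the-art RL post-training systems.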

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

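This research falls under distributed, parallel, and cluster computing (cs.DC). For teams running RL post-training of MoE models, the headline result is a speedup of up to 1.45× over state-of-the-art RL post-training systems in evaluations on 64 GPUs.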

Authors

  • Yuming Zhou
  • Haoyang Li
  • Sheng Lin
  • Yanfeng Zhao
  • Tong Zhao
  • Xupeng Miao
  • Jie Jiang
  • Fangcheng Fu
  • Bin Cui

Paper Information

  • arXiv ID: 2606.11867v1
  • Categories: cs.DC
  • Published: June 10, 2026
  • PDF: Download PDF