[Paper] You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying...