[Paper] Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning
Overview
We propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially lower collateral damage than standard full-parameter updates. In matched SFT/RLVR checkpoints of Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, the SFT-to-RLVR increment differs sharply from the SFT update in token-level delta-log-probability, and full-parameter gradient ascent achieves forgetting only by damaging retain MATH and GSM8K. MAST ranks attention-projection tensors by three criteria (off-principal energy, update magnitude, and forget-gradient coupling magnitude) and then updates only the top-ranked subset. On the primary model, MAST induces statistically significant target forgetting (MATH forget falls from 45/150 to 37/150; McNemar p=0.0078) while preserving GSM8K (+0.8 percentage points) and MATH retain (-0.5 percentage points). The advantage reproduces across seeds, across the NPO and SimNPO objectives, and on Qwen3, where MAST preserves GSM8K while full-parameter unlearning collapses it.
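The reported significance level can be sanity-checked with a short calculation. The sketch below computes a two-sided exact McNemar p-value from the two discordant-pair counts. The counts used here (8 items correct before unlearning and incorrect after, 0 in the opposite direction) are an assumption: the abstract does not report them, but they are one configuration consistent with both the net change from 45/150 to 37/150 and p=0.0078. The function name `mcnemar_exact` is illustrative only.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: items correct before and incorrect after.
    c: items incorrect before and correct after.
    Under the null hypothesis, each discordant pair falls in either
    direction with probability 1/2, so min(b, c) ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Assumed discordant counts (not stated in the abstract): b=8, c=0.
p = mcnemar_exact(8, 0)
print(round(p, 4))  # 0.0078
```

With 8 discordant pairs all in one direction, the p-value is 2 × (1/2)^8 = 0.0078125, which matches the reported figure after rounding.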
Key Contributions
This paper presents research in the following areas:
- cs.LG
- cs.AI
Methodology
Please refer to the full paper for detailed methodology.
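As a rough illustration of the selection step summarized in the Overview, the sketch below combines three per-tensor scores into a single ranking and keeps the top-k tensors. It is not the paper's implementation: the abstract does not say how the three criteria are computed or combined, so the mean-rank aggregation, the function names, the tensor names, and all score values here are assumptions made for illustration.

```python
from typing import Dict, List


def rank_desc(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each tensor name to its rank, where 0 is the largest score."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: rank for rank, name in enumerate(ordered)}


def select_tensors(
    off_principal: Dict[str, float],
    update_norm: Dict[str, float],
    grad_coupling: Dict[str, float],
    k: int,
) -> List[str]:
    """Return the k tensors with the best (lowest) mean rank.

    Mean-rank aggregation is an assumed combination rule, chosen only
    because it needs no score normalization. Ties break by name.
    """
    ranks = [rank_desc(s) for s in (off_principal, update_norm, grad_coupling)]
    names = list(off_principal)
    mean_rank = {n: sum(r[n] for r in ranks) / len(ranks) for n in names}
    return sorted(names, key=lambda n: (mean_rank[n], n))[:k]


# Hypothetical scores for four attention-projection tensors.
off_principal = {"q_proj": 0.9, "k_proj": 0.2, "v_proj": 0.6, "o_proj": 0.4}
update_norm = {"q_proj": 0.8, "k_proj": 0.1, "v_proj": 0.7, "o_proj": 0.3}
grad_coupling = {"q_proj": 0.5, "k_proj": 0.3, "v_proj": 0.9, "o_proj": 0.2}

selected = select_tensors(off_principal, update_norm, grad_coupling, k=2)
print(selected)  # ['q_proj', 'v_proj']
```

In an unlearning run, only the selected tensors would receive gradient updates while all other parameters stay frozen, which is the property the Overview credits for the reduced collateral damage.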
Practical Implications
This research contributes to the advancement of machine learning (cs.LG).
Authors
- Chenyu Zhou
- Qiliang Jiang
- Shuning Wu
- Xu Zhou
Paper Information
- arXiv ID: 2606.19222v1
- Categories: cs.LG, cs.AI
- Published: June 17, 2026