[Paper] Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

Published: June 18, 2026, 12:59 AM GMT+9
2 min read
Original: arXiv

Source: arXiv - 2606.19222v1

Overview

We propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially lower collateral damage than standard full-parameter updates. In matched SFT/RLVR checkpoints on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, the SFT-to-RLVR increment differs sharply from the SFT update in token-level delta-log-probability, and full-parameter gradient ascent forgets only by damaging retain MATH and GSM8K. MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling magnitude, then updates only the top-ranked subset. On the primary model, MAST induces statistically significant target forgetting (MATH forget 45/150 to 37/150; McNemar p=0.0078) while preserving GSM8K (+0.8 pp) and MATH retain (-0.5 pp). The advantage reproduces across seeds, NPO/SimNPO objectives, and Qwen3, where MAST preserves GSM8K while full-parameter unlearning collapses it.
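The reported significance level can be sanity-checked with a short calculation. The overview gives only the forget-set totals (45/150 before, 37/150 after) and p=0.0078. It does not state the discordant pair counts or which variant of McNemar's test was used. The sketch below assumes an exact two-sided McNemar test and a hypothetical split of 8 problems that flipped from correct to incorrect and 0 that flipped the other way. That split matches the net change of 8 and happens to reproduce the reported value.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant pair counts.

    b: items correct before unlearning and incorrect after.
    c: items incorrect before unlearning and correct after.
    Under the null hypothesis, each discordant pair falls on either
    side with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of change
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# ASSUMPTION: hypothetical discordant counts, not taken from the paper.
# They are chosen only to be consistent with 45/150 -> 37/150 (net -8).
b, c = 8, 0
print(f"{mcnemar_exact_p(b, c):.4f}")  # prints 0.0078
```

With other splits that give the same net change, such as 9 and 1, the exact test yields a larger p-value, so the reported figure suggests that few if any forget-set problems moved from incorrect to correct.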

Key Contributions

This paper presents research in the following areas:

  • cs.LG
  • cs.AI

Methodology

Please refer to the full paper for detailed methodology.
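As a rough illustration of the selection step named in the overview (rank attention-projection tensors by three criteria, then update only the top-ranked subset), the following is a minimal sketch. It assumes the three per-tensor scores have already been computed. How the paper defines and combines them is not stated in this summary, so the mean-rank aggregation, the function names `rank_positions` and `select_tensors`, the tensor names, and all score values here are hypothetical.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each tensor name to its rank under one criterion (0 = highest score)."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: i for i, name in enumerate(ordered)}


def select_tensors(criteria: List[Dict[str, float]], k: int) -> List[str]:
    """Return the k tensors with the lowest mean rank across all criteria.

    ASSUMPTION: mean-rank aggregation is an illustrative choice, not
    necessarily the combination rule used by MAST.
    """
    names = list(criteria[0])
    ranks = [rank_positions(c) for c in criteria]
    mean_rank = {n: sum(r[n] for r in ranks) / len(ranks) for n in names}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(names, key=lambda n: (mean_rank[n], n))[:k]


# Hypothetical scores for the four attention projections of one layer.
off_principal = {"l0.q_proj": 0.9, "l0.k_proj": 0.2, "l0.v_proj": 0.6, "l0.o_proj": 0.4}
update_mag = {"l0.q_proj": 0.7, "l0.k_proj": 0.1, "l0.v_proj": 0.8, "l0.o_proj": 0.3}
grad_coupling = {"l0.q_proj": 0.5, "l0.k_proj": 0.3, "l0.v_proj": 0.9, "l0.o_proj": 0.2}

selected = select_tensors([off_principal, update_mag, grad_coupling], k=2)
print(selected)  # prints ['l0.v_proj', 'l0.q_proj']
```

In a training loop, only the parameters whose names appear in `selected` would receive unlearning updates, with all other tensors frozen.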

Practical Implications

This research contributes to the advancement of machine learning (cs.LG).

Authors

  • Chenyu Zhou
  • Qiliang Jiang
  • Shuning Wu
  • Xu Zhou

Paper Information

  • arXiv ID: 2606.19222v1
  • Categories: cs.LG, cs.AI
  • Published: June 17, 2026
  • PDF: Download PDF