[Paper] Quantization-Robust LLM Unlearning via Low-Rank Adaptation

Published: February 13, 2026

Source: arXiv - 2602.13151v1

Overview

The paper tackles a practical snag in deploying large language models (LLMs): after you “unlearn” (i.e., delete) specific knowledge from a fine‑tuned model, aggressive post‑training quantization (PTQ) – often required to run the model on edge devices or to cut inference costs – can wipe out those unlearning updates. The authors show that standard full‑parameter fine‑tuning produces weight changes that are too tiny to survive 4‑bit quantization, and they propose a LoRA‑based (Low‑Rank Adaptation) solution that keeps the unlearning effect intact even after quantization.
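To see why this happens, consider uniform 4‑bit rounding: if an unlearning delta is much smaller than the quantization step, the quantized weight rounds back to its pre‑unlearning value. Below is a minimal NumPy sketch of this effect; the weight value, delta magnitudes, and quantization range are illustrative choices, not numbers from the paper.

```python
import numpy as np

# Simulate one weight before unlearning and after (a) a tiny full-parameter
# update vs. (b) a larger LoRA-style update (all values illustrative).
w_original = 0.5200   # pre-unlearning weight
delta_full = 1e-4     # tiny full-parameter unlearning update
delta_lora = 5e-2     # larger update concentrated in a low-rank adapter

def quantize_4bit(w, w_min=-1.0, w_max=1.0):
    """Uniform 4-bit quantization: 16 levels spread over [w_min, w_max]."""
    levels = 2 ** 4 - 1
    scale = (w_max - w_min) / levels
    q = np.round((w - w_min) / scale)   # nearest quantization level
    return w_min + q * scale            # de-quantized value

for name, delta in [("full-parameter", delta_full), ("LoRA-style", delta_lora)]:
    w_unlearned = w_original + delta
    reverted = np.isclose(quantize_4bit(w_unlearned), quantize_4bit(w_original))
    print(f"{name:15s} update quantizes back to the original value: {reverted}")

# The tiny full-parameter delta rounds back to the original 4-bit level
# (unlearning reversed), while the larger delta lands on a different level.
```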

Key Contributions

  • Identified quantization‑induced forgetting reversal: Demonstrated that 4‑bit PTQ can restore a model’s pre‑unlearning behavior when using conventional full‑parameter unlearning methods.
  • LoRA‑based unlearning pipeline: Introduced a workflow that freezes the base LLM and concentrates all unlearning updates into low‑rank adapter modules, making the changes robust to low‑bit quantization.
  • Empirical gains on Llama‑2‑7B: Achieved up to +7.93 points in 4‑bit utility on the MUSE BOOKS benchmark and +4.76 points on the NEWS benchmark compared to full‑parameter unlearning.
  • Improved privacy leakage metrics: Showed a dramatic reduction in privacy leakage (e.g., GA+KLR on BOOKS moved from –25.68 to –5.86) while preserving strong forgetting (VerMem & KnowMem ≈ 0).
  • Open‑source‑ready recipe: Provided a reproducible pipeline that can be plugged into existing PTQ toolchains (e.g., GPTQ, AWQ) with minimal code changes.

Methodology

  1. Baseline unlearning (full‑parameter fine‑tuning):

    • The entire LLM is fine‑tuned on a “forget” dataset, aiming to reduce the model’s ability to recall that data.
    • After fine‑tuning, the model is quantized to 4‑bit using a standard PTQ algorithm.
  2. LoRA‑based unlearning:

    • Freeze the base model (the 7B Llama‑2 weights stay untouched).
    • Insert low‑rank adapter matrices (typically rank = 4–8) into each transformer layer.
    • Train only the adapters on the forget dataset. Because the adapters are separate, their weight updates are orders of magnitude larger than the tiny changes spread across the whole model.
    • After adapter training, apply 4‑bit PTQ to the combined model (base + adapters). The adapters’ larger magnitude updates survive quantization, preserving the unlearning effect.
  3. Evaluation suite:

    • Utility: Measured on the MUSE BOOKS and NEWS subsets for standard unlearning objectives such as NPO (Negative Preference Optimization) and GA (Gradient Ascent), each combined with a retain‑set regularizer (GDR or KLR).
    • Forgetting: Assessed via VerMem (verbatim memorization) and KnowMem (knowledge memorization) – both should approach zero after successful unlearning.
    • Privacy leakage: Quantified with the PrivLeak metric (closer to 0 = less leakage).

The pipeline is deliberately lightweight: training LoRA adapters typically requires < 1 % of the compute of full‑model fine‑tuning, and the adapters add only a few megabytes to the model size.
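As a rough illustration of the adapter‑based workflow, here is a minimal sketch using the Hugging Face transformers and peft libraries. The gradient‑ascent forget objective, the `forget_dataloader`, and the hyperparameters are illustrative stand‑ins, not the authors' exact recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# 1. Freeze the base weights; all unlearning updates go into low-rank adapters.
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)  # base parameters are frozen automatically

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)

# 2. Train only the adapters on the forget set. Plain gradient ascent
#    (negated LM loss) is used here as a stand-in unlearning objective.
#    `forget_dataloader` is an assumed DataLoader over forget-set text.
for batch in forget_dataloader:
    inputs = tokenizer(batch["text"], return_tensors="pt",
                       padding=True, truncation=True)
    outputs = model(**inputs, labels=inputs["input_ids"])
    loss = -outputs.loss               # ascend the loss on forget examples
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# 3. Merge the adapters into the base weights, then hand the result to a
#    standard 4-bit PTQ toolchain (e.g., GPTQ/AWQ) or 4-bit loading.
merged = model.merge_and_unload()
merged.save_pretrained("llama2-7b-unlearned")
```

Keeping the unlearning signal in the adapters is what makes the subsequent 4‑bit PTQ step safe: the base weights never change, and the adapter‑induced deltas are large relative to the quantization step.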

Results & Findings

| Benchmark | Metric | Full‑param (4‑bit) | LoRA (4‑bit) | Δ |
| --- | --- | --- | --- | --- |
| MUSE BOOKS | NPO+GDR | 50.17 | 58.10 | +7.93 |
| MUSE NEWS | GA+GDR | 40.06 | 44.82 | +4.76 |
| Privacy (GA+KLR, BOOKS) | PrivLeak | –25.68 | –5.86 | +19.82 (much less leakage) |
| Forgetting | VerMem / KnowMem | ≈ 0 (both) | ≈ 0 (both) | n/a |
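To make the forgetting row concrete: a memorization score near zero means the quantized model no longer reproduces forget‑set text. Below is a minimal sketch of one common way a verbatim‑memorization check is implemented (ROUGE‑L overlap between the model's greedy continuation and the true continuation); the model path and `forget_pairs` are placeholders, and this is not the benchmark's exact scoring code.

```python
from rouge_score import rouge_scorer
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llama2-7b-unlearned"  # placeholder path to the unlearned model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def memorization_score(prefix: str, true_continuation: str,
                       max_new_tokens: int = 64) -> float:
    """ROUGE-L between the model's greedy continuation and the original text."""
    inputs = tokenizer(prefix, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    continuation = tokenizer.decode(
        output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return scorer.score(true_continuation, continuation)["rougeL"].fmeasure

# Average over (prefix, continuation) pairs drawn from the forget set;
# values near 0 indicate verbatim content is no longer reproduced.
scores = [memorization_score(p, c) for p, c in forget_pairs]  # placeholder data
print(sum(scores) / len(scores))
```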

Key takeaways

  • Utility improves despite the aggressive 4‑bit quantization, indicating that LoRA adapters retain more of the model’s expressive power after unlearning.
  • Privacy leakage drops dramatically, meaning that an adversary probing the quantized model is far less likely to recover the forgotten data.
  • Training cost is slashed – LoRA adapters converge in a few hundred steps, whereas full‑parameter fine‑tuning can take thousands.

Practical Implications

  • Edge & mobile deployment: Companies that ship LLM‑powered features on devices (e.g., on‑device assistants, code completion tools) can now comply with “right‑to‑be‑forgotten” requests without sacrificing the low‑memory footprint that quantization provides.
  • Regulatory compliance: GDPR‑style data erasure mandates can be met more reliably because the unlearning effect survives the quantization step that is often mandatory for production inference pipelines.
  • Cost‑effective model updates: Instead of re‑training or fine‑tuning the entire model each time a piece of data must be removed, teams can simply update a small set of adapters and re‑quantize, cutting GPU hours and cloud spend.
  • Toolchain integration: The approach plugs into existing PTQ libraries (e.g., bitsandbytes, GPTQ) and LoRA frameworks (peft, loralib), making adoption straightforward for developers already familiar with these ecosystems.
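As an illustration of that integration, once the adapters are merged the model can be re‑quantized with the tooling teams already use. A minimal sketch using transformers' bitsandbytes 4‑bit loading follows; the NF4 settings are common defaults, not the configuration reported in the paper.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization config for the bitsandbytes backend.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the merged (base + adapters) unlearned checkpoint directly in 4-bit.
model = AutoModelForCausalLM.from_pretrained(
    "llama2-7b-unlearned",          # placeholder path from the earlier sketch
    quantization_config=bnb_config,
    device_map="auto",
)
```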

Limitations & Future Work

  • Scope limited to 4‑bit PTQ: The study focuses on 4‑bit quantization; behavior under more extreme quantization (e.g., 2‑bit) or mixed‑precision schemes remains unexplored.
  • Adapter rank selection: While the paper uses a fixed low rank, optimal rank may vary across model sizes and downstream tasks; an automated rank‑search could improve robustness.
  • Generalization to other architectures: Experiments are confined to Llama‑2‑7B; applying the method to encoder‑only models (e.g., BERT) or multimodal LLMs may require additional tweaks.
  • Long‑term forgetting stability: The paper evaluates forgetting shortly after unlearning; future work should assess whether the effect persists after further fine‑tuning or continual learning cycles.

Authors

  • João Vitor Boer Abitante
  • Joana Meneguzzo Pasquali
  • Luan Fonseca Garcia
  • Ewerton de Oliveira
  • Thomas da Silva Paula
  • Rodrigo C. Barros
  • Lucas S. Kupssinskü

Paper Information

  • arXiv ID: 2602.13151v1
  • Categories: cs.LG, cs.CL
  • Published: February 13, 2026