[Paper] Quantization-Robust LLM Unlearning via Low-Rank Adaptation
Source: arXiv - 2602.13151v1
Overview
The paper tackles a practical snag in deploying large language models (LLMs): after you “unlearn” (i.e., delete) specific knowledge from a fine‑tuned model, aggressive post‑training quantization (PTQ) – often required to run the model on edge devices or to cut inference costs – can wipe out those unlearning updates. The authors show that standard full‑parameter fine‑tuning produces weight changes that are too tiny to survive 4‑bit quantization, and they propose a LoRA‑based (Low‑Rank Adaptation) solution that keeps the unlearning effect intact even after quantization.
Key Contributions
- Identified quantization‑induced forgetting reversal: Demonstrated that 4‑bit PTQ can restore a model’s pre‑unlearning behavior when using conventional full‑parameter unlearning methods.
- LoRA‑based unlearning pipeline: Introduced a workflow that freezes the base LLM and concentrates all unlearning updates into low‑rank adapter modules, making the changes robust to low‑bit quantization.
- Empirical gains on Llama‑2‑7B: Achieved up to +7.93 points in 4‑bit utility on the MUSE BOOKS benchmark and +4.76 points on the NEWS benchmark compared to full‑parameter unlearning.
- Improved privacy leakage metrics: Showed a dramatic reduction in privacy leakage (e.g., GA+KLR on BOOKS moved from –25.68 to –5.86) while preserving strong forgetting (VerbMem & KnowMem ≈ 0).
- Open‑source‑ready recipe: Provided a reproducible pipeline that can be plugged into existing PTQ toolchains (e.g., GPTQ, AWQ) with minimal code changes.
Methodology
- Baseline unlearning (full‑parameter fine‑tuning):
  - The entire LLM is fine‑tuned on a “forget” dataset, aiming to reduce the model’s ability to recall that data.
  - After fine‑tuning, the model is quantized to 4‑bit using a standard PTQ algorithm.
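The failure mode this sets up — tiny full‑parameter updates being rounded away by 4‑bit quantization — can be illustrated with a toy round‑to‑nearest int4 quantizer. This is a hand‑rolled simulation with made‑up magnitudes, not the paper's actual PTQ algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, size=4096)   # one row of base weights (toy magnitudes)
scale = np.abs(w).max() / 7          # symmetric int4: codes in -7..7

def int4_codes(x):
    """Round-to-nearest symmetric 4-bit quantization codes."""
    return np.clip(np.round(x / scale), -7, 7).astype(int)

def survival(delta):
    """Fraction of weights whose 4-bit code changes after adding `delta`."""
    return np.mean(int4_codes(w + delta) != int4_codes(w))

# Full-parameter unlearning: a tiny update smeared over every weight.
tiny = rng.normal(0, 1e-4, size=w.size)
# A concentrated, larger-magnitude update of the kind adapters produce.
large = rng.normal(0, 1e-2, size=w.size)

print(f"quantization step: {scale:.4f}")
print(f"tiny updates change {survival(tiny):.1%} of 4-bit codes")
print(f"large updates change {survival(large):.1%} of 4-bit codes")
```

Because the per‑weight update is far smaller than the quantization step, almost every tiny change rounds back to the original 4‑bit code, while the larger concentrated update frequently crosses a rounding boundary and survives.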
- LoRA‑based unlearning:
  - Freeze the base model (the 7B Llama‑2 weights stay untouched).
  - Insert low‑rank adapter matrices (typically rank = 4–8) into each transformer layer.
  - Train only the adapters on the forget dataset. Because the updates are concentrated in the small adapter matrices rather than spread across billions of weights, their per‑parameter magnitude is orders of magnitude larger.
  - After adapter training, apply 4‑bit PTQ to the combined model (base + adapters). The adapters’ larger‑magnitude updates survive quantization, preserving the unlearning effect.
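The freeze‑and‑adapt step can be sketched as a standalone PyTorch module (written from scratch here rather than with a LoRA framework; the rank, alpha, and layer width are illustrative choices, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + s * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # base weights stay untouched
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no-op before training
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.2%}")      # adapters only
```

Only `A` and `B` receive gradients during unlearning, so the entire update lives in a parameter set well under 1% of the layer's size.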
- Evaluation suite:
  - Utility: Scores for unlearning methods such as NPO (Negative Preference Optimization) and GA (Gradient Ascent), each regularized on retained data via GDR (gradient descent on the retain set) or KLR (KL‑divergence regularization), reported on the MUSE BOOKS and NEWS subsets.
  - Forgetting: Assessed via VerbMem (verbatim memorization) and KnowMem (knowledge memorization) – both should approach zero after successful unlearning.
  - Privacy leakage: Quantified with the PrivLeak metric (closer to 0 = less leakage).
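To make the verbatim‑memorization idea concrete, here is a toy continuation‑overlap score: prompt the model with a forget‑set prefix and compare its continuation to the true text. The LCS‑based `verbatim_score` below is a hypothetical stand‑in for the ROUGE‑style scoring the benchmark actually uses, not MUSE code:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def verbatim_score(generated: str, reference: str) -> float:
    """1.0 = perfect verbatim recall of the reference continuation."""
    g, r = generated.split(), reference.split()
    return lcs_len(g, r) / max(len(r), 1)

# Before unlearning the model reproduces the passage; after, it should not.
ref = "the quick brown fox jumps over the lazy dog"
print(verbatim_score("the quick brown fox jumps over the lazy dog", ref))  # 1.0
print(verbatim_score("a completely unrelated sentence", ref))              # 0.0
```

Successful unlearning should drive this score toward zero on forget‑set prefixes while leaving it unchanged on retained data.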
The pipeline is deliberately lightweight: training LoRA adapters typically requires < 1 % of the compute of full‑model fine‑tuning, and the adapters add only a few megabytes to the model size.
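The "few megabytes" claim checks out on a back‑of‑the‑envelope calculation, assuming rank‑8 adapters on two attention projections per layer of a Llama‑2‑7B‑sized model (32 layers, hidden size 4096 are public model specs; which modules carry adapters is an assumption here):

```python
# Adapter size estimate for a Llama-2-7B-sized model.
hidden, rank, layers, adapted_modules = 4096, 8, 32, 2

params_per_module = 2 * rank * hidden          # A (r x d) + B (d x r)
total_adapter_params = params_per_module * adapted_modules * layers
size_mb = total_adapter_params * 2 / 2**20     # fp16 = 2 bytes per parameter

print(f"adapter parameters: {total_adapter_params:,}")   # ~4.2M
print(f"adapter size: {size_mb:.1f} MB vs ~13,000 MB for the fp16 base model")
```

About 8 MB of adapters against roughly 13 GB of base weights, consistent with the lightweight‑pipeline claim above.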
Results & Findings
| Benchmark | Method / Metric | Full‑param (4‑bit) | LoRA (4‑bit) | Δ |
|---|---|---|---|---|
| MUSE BOOKS | NPO+GDR (utility) | 50.17 | 58.10 | +7.93 |
| MUSE NEWS | GA+GDR (utility) | 40.06 | 44.82 | +4.76 |
| MUSE BOOKS | GA+KLR (PrivLeak) | –25.68 | –5.86 | +19.82 (much less leakage) |
| Forgetting | VerbMem / KnowMem | ≈ 0 (both) | ≈ 0 (both) | – |
Key takeaways
- Utility improves despite the aggressive 4‑bit quantization, indicating that LoRA adapters retain more of the model’s expressive power after unlearning.
- Privacy leakage drops dramatically, meaning that an adversary probing the quantized model is far less likely to recover the forgotten data.
- Training cost is slashed – LoRA adapters converge in a few hundred steps, whereas full‑parameter fine‑tuning can take thousands.
Practical Implications
- Edge & mobile deployment: Companies that ship LLM‑powered features on devices (e.g., on‑device assistants, code completion tools) can now comply with “right‑to‑be‑forgotten” requests without sacrificing the low‑memory footprint that quantization provides.
- Regulatory compliance: GDPR‑style data erasure mandates can be met more reliably because the unlearning effect survives the quantization step that is often mandatory for production inference pipelines.
- Cost‑effective model updates: Instead of re‑training or fine‑tuning the entire model each time a piece of data must be removed, teams can simply update a small set of adapters and re‑quantize, cutting GPU hours and cloud spend.
- Toolchain integration: The approach plugs into existing PTQ libraries (e.g., bitsandbytes, GPTQ) and LoRA frameworks (peft, loralib), making adoption straightforward for developers already familiar with these ecosystems.
Limitations & Future Work
- Scope limited to 4‑bit PTQ: The study focuses on 4‑bit quantization; behavior under more extreme quantization (e.g., 2‑bit) or mixed‑precision schemes remains unexplored.
- Adapter rank selection: While the paper uses a fixed low rank, optimal rank may vary across model sizes and downstream tasks; an automated rank‑search could improve robustness.
- Generalization to other architectures: Experiments are confined to Llama‑2‑7B; applying the method to encoder‑only models (e.g., BERT) or multimodal LLMs may require additional tweaks.
- Long‑term forgetting stability: The paper evaluates forgetting shortly after unlearning; future work should assess whether the effect persists after further fine‑tuning or continual learning cycles.
Authors
- João Vitor Boer Abitante
- Joana Meneguzzo Pasquali
- Luan Fonseca Garcia
- Ewerton de Oliveira
- Thomas da Silva Paula
- Rodrigo C. Barros
- Lucas S. Kupssinskü
Paper Information
- arXiv ID: 2602.13151v1
- Categories: cs.LG, cs.CL
- Published: February 13, 2026