[Paper] MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

Published: February 27, 2026 at 03:39 AM EST

Source: arXiv - 2602.23798v1

Overview

The paper introduces MPU (Multiple Perturbed Copies Unlearning), a framework that lets large language models (LLMs) “forget” specific data without exposing either the model’s internal weights or the client’s private forget list. By cleverly perturbing and re‑parameterizing copies of the model on the server side, MPU enables privacy‑preserving unlearning that works with any existing unlearning algorithm.

Key Contributions

  • Dual‑non‑disclosure solution: Guarantees that neither the server’s exact parameters nor the client’s forget set are ever shared.
  • Algorithm‑agnostic design: Works with a wide range of unlearning methods (seven evaluated in the paper).
  • Multiple perturbed copies: Generates several randomized model instances to mask the original weights while still supporting effective local unlearning.
  • Harmonic denoising aggregation: A novel post‑processing step that inverts the perturbations and combines updates to recover performance close to a noise‑free baseline.
  • Empirical validation: Demonstrates <1 % average degradation under 10 % noise and even occasional improvements over the baseline with as little as 1 % noise.

Methodology

  1. Pre‑Process (Server side)

    • The server creates k copies of the target LLM.
    • Each copy is perturbed (random noise added to weights) and re‑parameterized (e.g., applying a random linear transform).
    • The perturbed copies are sent to the client; the original model never leaves the server.
  2. Local Unlearning (Client side)

    • The client runs its chosen unlearning algorithm on each copy, using only its private forget set.
    • Because every copy is noised and re‑parameterized, the client never observes the true underlying parameters.
  3. Post‑Process (Server side)

    • The server receives the updated copies and inverts the re‑parameterization to map them back into the original weight space.
    • A harmonic denoising step aggregates the multiple updates, effectively cancelling out the random noise introduced earlier.

The whole pipeline is “plug‑and‑play”: swap in any unlearning algorithm without changing MPU’s core components.
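The three stages above can be sketched end to end. Everything in the snippet is an illustrative toy rather than the paper's construction: the Gaussian noise model, the random diagonal scaling used as the re‑parameterization, the placeholder client update, and the plain averaging that stands in for harmonic denoising are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)
K = 4                                               # number of perturbed copies
w_server = rng.normal(size=8)                       # original weights (server only)

# 1. Pre-process (server): perturb each copy with Gaussian noise and
#    re-parameterize it with a random invertible diagonal scaling.
noises = [rng.normal(scale=0.05, size=w_server.size) for _ in range(K)]
scales = [rng.uniform(0.5, 2.0, size=w_server.size) for _ in range(K)]
copies = [s * (w_server + n) for s, n in zip(scales, noises)]

# 2. Local unlearning (client): stand-in for any unlearning algorithm;
#    the client only ever sees the perturbed, re-parameterized copies.
def client_unlearn(w):
    return w + 0.01 * np.sign(w)                    # placeholder update

updated = [client_unlearn(c) for c in copies]

# 3. Post-process (server): invert the re-parameterization, then aggregate
#    the copies (a simple mean here, standing in for harmonic denoising).
recovered = [u / s for u, s in zip(updated, scales)]
w_final = np.mean(recovered, axis=0)

print(np.linalg.norm(w_final - w_server))           # distance to the original weights
```

Because the injected noise is independent across copies, it largely cancels in the aggregation step, which is the intuition behind recovering near-baseline performance.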

Results & Findings

  • Performance parity: Across seven unlearning algorithms, MPU’s unlearning quality matches that of a noise‑free baseline in most cases.
  • Robustness to noise: With 10 % injected noise, average performance loss stays under 1 %; with just 1 % noise, some algorithms even outperform the baseline.
  • Scalability: Experiments on models up to the size of GPT‑2‑medium show that the overhead of generating and aggregating multiple copies is modest (≈2–3× training time, still practical for production pipelines).
  • Privacy guarantee: Formal analysis confirms that the server cannot reconstruct the client’s forget set, and the client cannot infer the exact original weights beyond a negligible statistical bound.
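A back‑of‑the‑envelope check of the robustness claim: averaging k independently perturbed copies of a weight vector shrinks the relative noise roughly as 1/√k. This is generic zero‑mean‑noise arithmetic, not the paper's harmonic denoising step; the vector size and noise scale below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=10_000)            # stand-in weight vector
sigma = 0.10                           # 10% noise scale, as in the experiments

def relative_error(k):
    """Average k independently perturbed copies; return relative error vs. w."""
    copies = w + rng.normal(scale=sigma, size=(k, w.size))
    return np.linalg.norm(copies.mean(axis=0) - w) / np.linalg.norm(w)

print(relative_error(1))               # ~0.10
print(relative_error(16))              # ~0.025, i.e. shrinks like 1/sqrt(k)
```

Under this simple model, 16 copies would reduce a 10 % perturbation to roughly a 2.5 % residual, consistent with the reported sub‑1 % utility loss after aggregation.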

Practical Implications

  • Regulatory compliance: Companies can comply with “right‑to‑be‑forgotten” requests for LLM‑powered services without exposing proprietary model weights or user data.
  • Multi‑tenant SaaS: Cloud providers can offer unlearning as a service, letting each tenant run local forget operations on perturbed copies while keeping the core model secret.
  • Secure model updates: MPU’s perturb‑and‑aggregate pattern can be repurposed for secure federated fine‑tuning, where participants need to hide both their data and the base model.
  • Tooling integration: Since MPU is algorithm‑agnostic, existing unlearning libraries (e.g., Forget‑BERT, SISA) can be wrapped with minimal code changes, accelerating adoption.

Limitations & Future Work

  • Computational overhead: Maintaining multiple perturbed copies multiplies memory and compute requirements; optimizing the number of copies vs. privacy/utility trade‑offs remains open.
  • Noise calibration: The current study uses fixed noise levels; adaptive noise schemes could further tighten privacy guarantees while preserving performance.
  • Broader model families: Experiments focus on decoder‑only and encoder‑decoder transformers; extending MPU to retrieval‑augmented or multimodal LLMs warrants investigation.
  • Formal privacy proofs: While empirical attacks were mitigated, a rigorous differential‑privacy analysis of the perturbation‑aggregation pipeline is left for future research.

MPU bridges the gap between privacy law and practical AI deployment, offering a pragmatic path for developers to make LLMs forget responsibly.

Authors

  • Tiantong Wang
  • Xinyu Yan
  • Tiantong Wu
  • Yurong Hao
  • Yong Jiang
  • Fei Huang
  • Wei Yang Bryan Lim

Paper Information

  • arXiv ID: 2602.23798v1
  • Categories: cs.LG, cs.AI, cs.CR, cs.DC
  • Published: February 27, 2026