[Paper] Protecting the Undeleted in Machine Unlearning
Source: arXiv - 2602.16697v1
Overview
Machine unlearning promises to let you delete a user’s data from a trained model as if that data had never been used. Cohen et al. show that the most common “perfect‑retraining” notion of unlearning can unintentionally leak information about the remaining (undeleted) data. Their work uncovers a new class of reconstruction attacks and proposes a safer security definition that still supports practical operations like summations and statistical learning.
Key Contributions
- Reconstruction attack: Demonstrates that an adversary controlling only a handful of data points can recover almost the entire original dataset by repeatedly issuing deletion requests to a model that satisfies the perfect‑retraining guarantee.
- Critical survey of existing definitions: Shows that current formalizations of machine unlearning are either vulnerable to the attack or so restrictive that they cannot implement basic functionalities (e.g., exact sum aggregation).
- New security definition: Introduces an "undeleted‑data protection" notion that explicitly prevents leakage about the data that stays in the system when other points are deleted.
- Positive feasibility results: Proves that under the new definition you can still build useful primitives such as bulletin‑board style data stores, secure summation services, and standard statistical learning pipelines.
Methodology
- Threat model – The attacker can insert a small number of malicious records into the training set and then issue a sequence of deletion requests for arbitrary records.
- Attack construction – By observing the model’s output (or a public summary) after each deletion, the attacker solves a system of equations that gradually reveals the contributions of the untouched records, eventually reconstructing the whole dataset.
- Formal analysis of definitions – The authors formalize several existing unlearning definitions (e.g., “perfect retraining”, differential‑privacy‑based unlearning) and prove that each either allows the attack or disallows essential operations.
- Design of a new definition – They define a security game where the adversary’s view after deletions must be indistinguishable from a view generated on a dataset that never contained the deleted points, while keeping the undeleted points perfectly hidden.
- Constructive proofs – Using standard cryptographic tools (secret sharing, homomorphic encryption) and algorithmic tricks (incremental updates), they build concrete mechanisms that satisfy the new definition for common tasks.
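To make the attack idea concrete, here is a minimal toy sketch (our illustration, not the paper's exact construction). The `SummationService` class is hypothetical: it releases an exact sum and, after every deletion, recomputes that sum from scratch on the remaining records, so it trivially satisfies the perfect‑retraining guarantee. An adversary who only observes the released sums and issues deletion requests can recover every record from the differences between consecutive releases:

```python
# Toy illustration (not the paper's construction): a summation service
# that satisfies "perfect retraining" -- after every deletion it
# recomputes the released statistic from scratch on the remaining data.
class SummationService:
    def __init__(self, records):
        self._records = dict(records)  # record id -> private value

    def release(self):
        # Exact statistic over the currently stored records only.
        return sum(self._records.values())

    def delete(self, record_id):
        del self._records[record_id]


honest = {f"user{i}": v for i, v in enumerate([3, 8, 5, 1])}
service = SummationService(honest)

# The adversary sees only released sums, then issues deletion requests.
recovered = {}
prev = service.release()
for rid in list(honest):
    service.delete(rid)
    cur = service.release()
    recovered[rid] = prev - cur  # difference of two "perfect" releases
    prev = cur

assert recovered == honest  # the entire dataset is reconstructed
```

Each released value is individually harmless (an aggregate), yet the sequence of releases induced by deletions forms a solvable system that pins down every private value, which is exactly the leakage pattern the paper formalizes.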
Results & Findings
- Attack effectiveness: For tasks such as linear regression, logistic regression, and simple counting queries, the reconstruction attack succeeds with high probability after only O(log n) deletions, where n is the dataset size.
- Vulnerability of perfect retraining: Any algorithm that guarantees exactly the same model as "retraining from scratch" after deletions leaks linear constraints on the remaining data; once the attacker accumulates enough of these constraints, it can solve for individual records.
- Impossibility under existing definitions: The paper proves that no mechanism can simultaneously achieve perfect retraining, support exact summation, and protect undeleted data.
- Feasibility of the new definition: The authors present prototype constructions (e.g., a secure bulletin board using additive secret sharing) that achieve the new security guarantee with only modest overhead (constant‑factor increase in computation and storage).
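The summary names additive secret sharing as the primitive behind the secure bulletin board. The sketch below is our own minimal version of that primitive, under the assumption of non‑colluding share holders (function names and the modulus choice are illustrative, not taken from the paper): each value is split into random shares that sum to the value modulo a large prime, so any release reveals only the aggregate, and deleting a user simply discards that user's shares without touching anyone else's data.

```python
import secrets

P = 2**61 - 1  # large prime modulus (illustrative choice)

def share(value, n=3):
    """Split `value` into n additive shares mod P.

    Any n-1 shares are uniformly random; only all n together
    determine the value.
    """
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % P)
    return parts

def reconstruct_total(all_shares):
    """Sum every share from every user; only the aggregate emerges."""
    return sum(s for parts in all_shares for s in parts) % P


values = {"alice": 10, "bob": 25, "carol": 7}
board = {uid: share(v) for uid, v in values.items()}

# The released aggregate matches the true sum...
assert reconstruct_total(board.values()) == sum(values.values()) % P

# ...and deletion removes exactly one user's shares, so subsequent
# releases reflect the new aggregate without exposing the others.
del board["bob"]
assert reconstruct_total(board.values()) == (10 + 7) % P
```

Contrast this with the perfect‑retraining summation service: here the difference between two releases reveals only the deleted user's own contribution, which that user consented to expose by joining, rather than constraints on everyone else's data.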
Practical Implications
- Regulatory compliance: Companies building “right‑to‑be‑forgotten” APIs need to be aware that simply re‑training or applying existing unlearning libraries may expose other users’ data.
- Design of ML‑as‑a‑service platforms: Service providers should adopt unlearning primitives that satisfy the undeleted‑data protection definition, especially when models are exposed via public endpoints.
- Secure data pipelines: The paper’s constructions enable safe aggregation (e.g., telemetry summation) and incremental learning without risking cross‑user leakage, useful for analytics dashboards and federated learning setups.
- Tooling impact: Open‑source unlearning frameworks may need to incorporate cryptographic back‑ends (secret sharing, homomorphic encryption) to meet the stronger guarantee, influencing performance trade‑offs.
Limitations & Future Work
- Scope of attacks: The reconstruction attack is demonstrated for linear‑type models and counting queries; extending it to deep neural networks remains an open question.
- Performance overhead: While the proposed constructions are theoretically efficient, real‑world latency and memory costs for large‑scale models have not been empirically evaluated.
- Broader threat models: The paper assumes the attacker can observe the model after each deletion; future work could explore weaker observation models (e.g., only final model access).
- Integration with differential privacy: Combining the new undeleted‑data protection definition with DP‑based training to obtain both privacy and unlearning guarantees is a promising direction.
Authors
- Aloni Cohen
- Refael Kohen
- Kobbi Nissim
- Uri Stemmer
Paper Information
- arXiv ID: 2602.16697v1
- Categories: cs.LG, cs.DS
- Published: February 18, 2026