[Paper] MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI
Source: arXiv - 2512.09867v1
Overview
The paper MedForget tackles a pressing problem for AI systems that ingest medical data: how to selectively erase the influence of specific patient records while keeping the model useful. By building a hierarchy‑aware multimodal unlearning benchmark, the authors give researchers and engineers a concrete playground for testing “right‑to‑be‑forgotten” compliance in medical AI.
Key Contributions
- Hierarchy‑aware benchmark – Organizes a large hospital dataset as a four‑level tree (Institution → Patient → Study → Section) and defines eight granular “forget” targets spanning the hierarchy.
- Multimodal test set – 3,840 instances that pair radiology images with natural‑language questions and answers, covering generation, classification, and cloze tasks.
- Explicit retain/forget splits – Every benchmark instance is explicitly labeled retain or forget, with rephrased variants to test robustness to paraphrasing.
- Comprehensive evaluation of SOTA unlearning methods – Four recent algorithms are benchmarked across three downstream tasks, revealing systematic trade‑offs.
- Reconstruction attack framework – A novel probing technique that incrementally adds hierarchical context to prompts, exposing residual memorization after unlearning.
- Open‑source, HIPAA‑aligned testbed – All data and code are released under a permissive license, ready for integration into compliance pipelines.
Methodology
- Data modeling – The authors treat the medical corpus as a nested hierarchy. Each node (e.g., a specific patient) can be marked for forgetting while its ancestors or siblings may be retained; a toy sketch of this tree structure follows after this list.
- Task design – Three representative downstream tasks are built on the same multimodal backbone (an illustrative instance schema appears after this list):
  - Generation: produce radiology report snippets given an image and a question.
  - Classification: predict a diagnostic label from image‑question pairs.
  - Cloze: fill in masked tokens in clinical narratives.
- Unlearning procedures – Four state‑of‑the‑art unlearning algorithms (gradient‑based data deletion, influence‑function pruning, knowledge‑distillation‑based forgetting, and parameter replay) are applied to the pretrained multimodal LLM, each with the same retain/forget split; a sketch of the gradient‑based variant follows after this list.
- Evaluation metrics –
  - Forgetting success – measured by the drop in model confidence on forgotten items and by the reconstruction‑attack score (a toy sketch of the confidence drop follows after this list).
  - Utility preservation – standard task performance (BLEU/ROUGE for generation, accuracy/F1 for classification, exact match for cloze).
- Reconstruction attack – Starting from a generic prompt, the attacker appends hierarchical cues (e.g., “Hospital X, patient Y”) step by step and checks whether the model can recover the originally forgotten answer, quantifying how much of the hierarchical pathway remains encoded (a probe sketch follows after this list).
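To make the hierarchy concrete, here is a minimal Python sketch of the four‑level tree and of how a forget mark on one node could propagate to its descendants while siblings stay in the retain split. All class, field, and node names are invented for illustration; this is not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node in the Institution -> Patient -> Study -> Section tree."""
    name: str
    level: str                          # "institution" | "patient" | "study" | "section"
    children: list["Node"] = field(default_factory=list)
    forget: bool = False                # marked as a forget target

def collect_splits(node: Node, inherited: bool = False):
    """Walk the tree: a forget mark propagates to all descendants,
    while untouched ancestors and siblings stay in the retain split."""
    is_forget = inherited or node.forget
    if node.level == "section":         # leaves carry the actual data
        yield node.name, "forget" if is_forget else "retain"
    for child in node.children:
        yield from collect_splits(child, is_forget)

# Forget one patient; the sibling patient stays in the retain split.
root = Node("Hospital_A", "institution", [
    Node("patient_001", "patient", [
        Node("study_ct_01", "study", [Node("ct_01/findings", "section")]),
    ], forget=True),
    Node("patient_002", "patient", [
        Node("study_xr_01", "study", [Node("xr_01/impression", "section")]),
    ]),
])
print(dict(collect_splits(root)))
# {'ct_01/findings': 'forget', 'xr_01/impression': 'retain'}
```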
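One benchmark instance can be pictured as a record coupling an image, a question, an answer, a task type, and an explicit retain/forget label with rephrased variants. The field names and example values below are assumptions for illustration, not the released data format.

```python
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class MedForgetInstance:
    """Hypothetical shape of one benchmark instance; field names are assumed."""
    image_path: str                                  # radiology image
    question: str                                    # natural-language prompt
    answer: str                                      # gold target
    task: Literal["generation", "classification", "cloze"]
    split: Literal["retain", "forget"]               # explicit retain/forget label
    paraphrases: list[str] = field(default_factory=list)  # rephrased variants

example = MedForgetInstance(
    image_path="hospital_a/patient_001/ct_01.png",
    question="What abnormality is visible in the right upper lobe?",
    answer="A small pulmonary nodule.",
    task="generation",
    split="forget",
    paraphrases=["Describe any lesion seen in the right upper lobe."],
)
```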
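Of the four benchmarked methods, the gradient‑based one is the easiest to sketch. The snippet below shows a common formulation of it (ascend the loss on the forget split while descending on the retain split); the paper's exact recipe and hyperparameters may differ.

```python
import torch

def unlearn_step(model, optimizer, forget_batch, retain_batch, alpha=1.0):
    """One gradient-based unlearning step in a common formulation
    (not necessarily the paper's exact recipe): ascend the loss on
    forget data while descending on retain data."""
    optimizer.zero_grad()
    forget_loss = model(**forget_batch).loss   # assumes an HF-style output with .loss
    retain_loss = model(**retain_batch).loss
    # The negative sign turns gradient descent into ascent on the forget split.
    total = -alpha * forget_loss + retain_loss
    total.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```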
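The forgetting‑success metric can be read as a relative confidence drop on forgotten items; the helper below is a toy rendering of that reading, not the paper's exact definition.

```python
def forgetting_success(conf_before: float, conf_after: float) -> float:
    """Relative drop in model confidence on a forgotten item.
    1.0 means confidence went to zero; 0.0 means no forgetting."""
    return max(0.0, (conf_before - conf_after) / conf_before)
```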
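The reconstruction attack itself reduces to a loop that appends one hierarchical cue at a time and checks whether the forgotten answer resurfaces. The `generate` callable below stands in for whatever prompt‑to‑text inference API the audited model exposes; all prompts and cues are hypothetical.

```python
def reconstruction_attack(generate, base_prompt: str, cues: list[str],
                          forgotten_answer: str) -> int:
    """Append hierarchical cues one at a time and report how many cues
    it takes before the forgotten answer resurfaces (-1 if it never does)."""
    prompt = base_prompt
    for i, cue in enumerate(cues, start=1):
        prompt = f"{prompt} {cue}"
        if forgotten_answer.lower() in generate(prompt).lower():
            return i        # fewer cues needed => more residual leakage
    return -1

# Illustrative usage (all names hypothetical):
# cues = ["At Hospital X,", "for patient Y,", "in the chest CT study,"]
# k = reconstruction_attack(model.generate, "Report the key finding.",
#                           cues, "pulmonary nodule")
```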
Results & Findings
| Unlearning method | Forgetting (coarse) | Forgetting (fine) | Utility change (avg.) |
|---|---|---|---|
| Gradient‑based | 92 % ↓ | 68 % ↓ | –3 % |
| Influence‑func | 88 % ↓ | 61 % ↓ | –5 % |
| Distillation | 90 % ↓ | 70 % ↓ | –2 % |
| Parameter‑replay | 85 % ↓ | 55 % ↓ | –4 % |
- Coarse‑level forgetting (e.g., entire institution) is relatively easy: models become highly resistant to the reconstruction attack.
- Fine‑grained forgetting (e.g., a single study section) leaves noticeable leakage; the reconstruction attack can recover the hidden answer after only a few hierarchical cues.
- Across the board, achieving complete forgetting inevitably harms downstream performance, especially for the most granular targets.
- No existing method simultaneously attains >90 % forgetting and <2 % utility loss at the finest hierarchy level, highlighting a gap for future research.
Practical Implications
- Compliance pipelines – MedForget gives engineers a ready‑made “privacy‑audit” suite to verify that a medical AI model can delete patient‑specific data on demand, a prerequisite for HIPAA/GDPR compliance.
- Model lifecycle management – Organizations can schedule periodic unlearning runs (e.g., after a patient withdraws consent) and instantly measure impact on diagnostic accuracy.
- Risk assessment tools – The reconstruction attack can be integrated into CI/CD testing to flag residual memorization before a model ships (see the sketch after this list).
- Design of hierarchical data stores – The benchmark demonstrates that structuring medical archives as explicit hierarchies enables more precise forgetting, encouraging developers to adopt similar schemas in production systems.
- Guidance for API providers – Cloud services hosting multimodal medical models can offer “forget” endpoints that internally invoke the tested unlearning algorithms, providing a compliant service layer for hospitals.
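As a sketch of what such a CI/CD gate might look like, the test below reuses the `reconstruction_attack` probe from the Methodology section; the leakage threshold is a policy choice, not a value from the paper.

```python
# Hypothetical pytest-style gate: fail the build if residual leakage
# exceeds a budget. `audit_cases` is an assumed fixture of probe cases.
def test_no_residual_memorization(generate, audit_cases, max_leak_rate=0.05):
    leaked = sum(
        reconstruction_attack(generate, case.base_prompt,
                              case.cues, case.forgotten_answer) != -1
        for case in audit_cases
    )
    assert leaked / len(audit_cases) <= max_leak_rate
```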
Limitations & Future Work
- Dataset scope – MedForget focuses on radiology images and associated text; other modalities (e.g., pathology slides, genomics) are not covered.
- Scale – Experiments run on a single multimodal LLM (≈1 B parameters). Scaling to larger foundation models may reveal different forgetting dynamics.
- Attack realism – The reconstruction attack assumes an adversary can query the model with hierarchical cues; real‑world attackers may have more limited access.
- Method diversity – Only four unlearning algorithms were evaluated; newer techniques (e.g., continual‑learning‑based forgetting, differentially private training) remain to be benchmarked.
- Utility‑forgetting trade‑off – The paper highlights the tension but does not propose a principled way to balance them; future work could explore adaptive forgetting budgets or hierarchical regularizers that preserve diagnostic knowledge while erasing patient‑level traces.
MedForget bridges a critical gap between cutting‑edge multimodal AI and the legal obligations of the healthcare industry, offering developers a concrete, open‑source framework to build truly “right‑to‑be‑forgotten” medical systems.
Authors
- Fengli Wu
- Vaidehi Patil
- Jaehong Yoon
- Yue Zhang
- Mohit Bansal
Paper Information
- arXiv ID: 2512.09867v1
- Categories: cs.CV, cs.AI, cs.CL
- Published: December 10, 2025