[Paper] Benchmarking Unlearning for Vision Transformers

Published: February 23, 2026 at 01:33 PM EST
5 min read
Source: arXiv - 2602.20114v1

Overview

Machine unlearning (MU) – the ability to erase specific training data from a model without retraining from scratch – is becoming a cornerstone for responsible AI. While most MU research has focused on convolutional neural networks (CNNs), Vision Transformers (ViTs) are now the go‑to architecture for many vision tasks. This paper delivers the first systematic benchmark of MU algorithms on ViT‑style models, comparing how well they forget, how they retain overall performance, and what design choices matter most.

Key Contributions

  • First MU benchmark for Vision Transformers covering two major families (plain ViT and hierarchical Swin‑T) across multiple model sizes.
  • Comprehensive experimental matrix: three datasets (small‑scale, medium‑scale, and complex), three fundamentally different MU algorithms, and both single‑shot and continual unlearning scenarios.
  • Memorization‑aware unlearning: evaluates algorithms that exploit training‑data memorization signals, a technique recently shown to boost unlearning quality.
  • Unified evaluation suite: introduces metrics that jointly capture forget quality (how well the target data is removed) and utility (accuracy on retained and test data).
  • Empirical insights into how ViTs store information compared with CNNs, and how different memorization proxies (e.g., gradient‑based vs. activation‑based) affect unlearning performance.
  • Open‑source benchmark toolkit enabling reproducible, fair comparisons for future MU research on transformer‑based vision models.

Methodology

  1. Model families & capacities – The authors train plain Vision Transformers (e.g., ViT‑B/16) and hierarchical Swin Transformers (e.g., Swin‑T), each at three parameter scales (small, medium, large).
  2. Datasets
    • CIFAR‑10 (tiny, low‑complexity)
    • ImageNet‑subset (mid‑size, moderate diversity)
    • iNaturalist‑2019 (large, fine‑grained classes)
      These choices let the study isolate the effect of data scale and visual complexity on unlearning.
  3. Unlearning algorithms – Three representative approaches:
    • Retraining‑free gradient projection (a fast, linear‑algebraic method)
    • Data‑dependent weight pruning (removes parameters most tied to the forgotten samples)
    • Memorization‑guided fine‑tuning (uses a memorization score to target the most “remembered” neurons).
  4. Protocols
    • Single‑shot: a one‑off request to delete a fixed subset of images.
    • Continual: a stream of deletion requests arriving over time, testing scalability.
  5. Metrics
    • Forget Score (based on loss increase on forgotten data and similarity of model outputs before/after deletion).
    • Retention Accuracy (performance on the remaining training set).
    • Test Accuracy (generalization on a held‑out test set).
      The three are combined into a single “Unlearning Utility Index” for easy ranking.
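
The exact aggregation behind the "Unlearning Utility Index" is not spelled out above, so here is one plausible construction (an assumption, not the paper's formula): a geometric mean over the three metrics, which penalizes a collapse in any one of them.

```python
def unlearning_utility_index(forget_score, retention_acc, test_acc):
    # Geometric mean: a collapse in any single component drags the
    # whole index down, so a model cannot "buy" forgetting with accuracy.
    # All three inputs are assumed normalized to [0, 1].
    return (forget_score * retention_acc * test_acc) ** (1.0 / 3.0)

# Illustrative numbers in the range reported in the Results table
idx = unlearning_utility_index(0.92, 0.95, 0.94)
```

With these inputs the index lands a little below the weakest component, which is the intended behavior for a ranking metric.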

All experiments are run with the same training hyper‑parameters, and the code, data splits, and evaluation scripts are released under an MIT license.
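
The memorization‑guided idea from step 3 can be sketched in a few lines; the gradient‑norm proxy, array shapes, and function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def memorization_scores(grads_forget, grads_retain):
    """Per-neuron memorization proxy: neurons whose gradients are large on
    the forget set but small on the retain set likely encode the forgotten
    samples (a gradient-based proxy; activation-based ones also exist)."""
    return (np.linalg.norm(grads_forget, axis=0)
            - np.linalg.norm(grads_retain, axis=0))

def select_targets(scores, k):
    """Indices of the k most 'memorizing' neurons to fine-tune."""
    return np.argsort(scores)[-k:]

# Toy example: per-sample gradients for 4 samples x 6 neurons.
rng = np.random.default_rng(0)
g_forget = rng.normal(size=(4, 6))
g_forget[:, 2] += 5.0            # neuron 2 strongly tied to forgotten samples
g_retain = rng.normal(size=(4, 6))

targets = select_targets(memorization_scores(g_forget, g_retain), k=1)
# fine-tuning would then update only `targets`, e.g. via a parameter mask
```

The fine‑tuning step itself would restrict weight updates to the selected neurons, leaving the rest of the network (and hence retained accuracy) largely untouched.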

Results & Findings

| Model | Dataset | Best Forget Score ↑ | Retention Acc. Δ | Test Acc. Δ |
| --- | --- | --- | --- | --- |
| ViT‑B/16 (medium) | CIFAR‑10 | 0.92 (Mem‑guided FT) | –1.3 % | –0.5 % |
| Swin‑T (small) | ImageNet‑subset | 0.87 (Gradient proj.) | –0.9 % | –0.3 % |
| ViT‑L/32 (large) | iNaturalist | 0.81 (Pruning) | –2.1 % | –1.0 % |

Key takeaways

  • Memorization‑guided fine‑tuning consistently yields the highest forget quality with only modest drops in retained and test accuracy, confirming the authors’ hypothesis that leveraging memorization signals is beneficial for ViTs.
  • ViTs memorize training data more strongly than CNNs (higher gradient norms for individual samples), which makes them harder to unlearn but also provides richer signals for targeted removal.
  • Single‑shot unlearning is cheap (≈5 % of full retraining time), while continual unlearning scales linearly with the number of deletion requests; the gradient‑projection method shines in the continual setting due to its low per‑request overhead.
  • Model capacity matters: larger ViTs retain higher test accuracy after unlearning but achieve slightly lower forget scores, indicating a trade‑off between capacity and unlearning precision.
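
The linear cost profile of continual unlearning can be illustrated with a toy deletion service; the class name, bookkeeping, and the ≈5 % per‑request cost constant are assumptions for the sketch, not the paper's code.

```python
class UnlearningService:
    """Toy model of the continual protocol: each deletion request triggers
    one lightweight unlearning step whose cost is a fixed fraction of full
    retraining, so cumulative cost grows linearly with the request count."""

    def __init__(self, retained_ids, step_cost=0.05):
        self.retained_ids = set(retained_ids)
        self.deleted_ids = set()
        self.requests = 0
        self.step_cost = step_cost  # per-request cost relative to retraining

    def delete(self, sample_ids):
        ids = set(sample_ids)
        self.retained_ids -= ids
        self.deleted_ids |= ids
        self.requests += 1
        # a real service would apply e.g. one gradient-projection update here
        return self.requests * self.step_cost  # cumulative relative cost

svc = UnlearningService(range(100))
svc.delete([1, 2, 3])
cost = svc.delete([4, 5])   # cumulative cost after two requests
```

This is why a low per‑request method like gradient projection wins in the continual setting: the constant in front of the linear growth stays small.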

Overall, the benchmark establishes a baseline forget quality of ~0.85–0.92 for current MU algorithms on ViTs, a solid starting point for future improvements.

Practical Implications

  • Compliance pipelines – Companies that must honor “right‑to‑be‑forgotten” requests can now plug a memorization‑guided fine‑tuning step into existing ViT‑based image classification services, achieving high deletion fidelity without full retraining.
  • Model‑as‑a‑service (MaaS) – Cloud providers can expose an API that runs the lightweight gradient‑projection algorithm for rapid, on‑demand data removal, especially useful for continual deletion workloads (e.g., user‑generated content platforms).
  • Security & privacy audits – The unified metrics give auditors a concrete way to verify that a vision model truly forgets specific data, a requirement for many regulatory frameworks.
  • Cost savings – By avoiding full retraining, organizations can cut GPU hours by 80–95 % per deletion request, translating into significant operational expense reductions.
  • Design guidance – The finding that ViTs store richer memorization cues suggests that future model architectures could be deliberately engineered to expose such signals, making compliant unlearning a first‑class feature.
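
A compliance pipeline along these lines might wrap any unlearning step with a forget‑quality check before signing off a request. All names below, and the 0.85 threshold (taken from the baseline range reported above), are illustrative assumptions.

```python
def compliant_delete(unlearn_fn, forget_score_fn, forget_ids,
                     min_forget_score=0.85, audit_log=None):
    # Run the unlearning step, then verify forget quality before
    # recording the request as compliant for auditors.
    if audit_log is None:
        audit_log = []
    unlearn_fn(forget_ids)
    score = forget_score_fn(forget_ids)
    audit_log.append({
        "ids": sorted(forget_ids),
        "forget_score": score,
        "compliant": score >= min_forget_score,
    })
    return audit_log[-1]

# Usage with stub callbacks standing in for a real MU algorithm and metric
record = compliant_delete(lambda ids: None, lambda ids: 0.91, {7, 3})
```

The audit record is exactly the kind of artifact the unified metrics make possible: a per‑request, machine‑checkable statement that deletion met a stated quality bar.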

Limitations & Future Work

  • Scope of datasets – While the benchmark spans three datasets, it does not cover video or multimodal vision‑language models, where temporal dynamics may affect memorization.
  • Algorithm diversity – Only three MU strategies were evaluated; newer approaches (e.g., differential‑privacy‑based unlearning or knowledge‑distillation tricks) remain untested on ViTs.
  • Hardware constraints – Experiments were run on a single‑node GPU cluster; scaling to massive ViT‑L/14 models (billions of parameters) may reveal new bottlenecks.
  • Theoretical guarantees – The study is empirical; formal bounds on forgetting for transformer architectures are still an open research question.

Future work could extend the benchmark to video transformers, explore hybrid CNN‑ViT models, and integrate privacy‑preserving training regimes that make unlearning even more efficient.

Authors

  • Kairan Zhao
  • Iurie Luca
  • Peter Triantafillou

Paper Information

  • arXiv ID: 2602.20114v1
  • Categories: cs.CV, cs.AI
  • Published: February 23, 2026