[Paper] Deepfake detectors are DUMB: A benchmark to assess adversarial training robustness under transferability constraints
Source: arXiv - 2601.05986v1
Overview
Deepfake detection models are increasingly deployed in platforms that need to verify the authenticity of video content. This paper shows that many of these detectors, even when hardened with adversarial training, can still be fooled by subtle, transferable perturbations—especially when the attacker’s data or model differs from the defender’s. By extending the DUMB benchmarking framework to deepfake detection, the authors provide a realistic stress‑test that mirrors how adversaries operate in the wild.
Key Contributions
- DUMB‑er Benchmark for Deepfakes – adapts the Dataset‑Sources‑Model‑Balance (DUMB) methodology to evaluate robustness under transferability constraints (i.e., attacker and defender use different data or architectures).
- Comprehensive Empirical Study – tests five state‑of‑the‑art detectors (RECCE, SRM, Xception, UCF, SPSL) against three popular attacks (PGD, FGSM, FPBA) on two widely used datasets (FaceForensics++ and Celeb‑DF‑V2).
- Cross‑Dataset Insight – reveals that adversarial training improves in‑distribution robustness but can hurt performance when the test data comes from a different distribution.
- Case‑Aware Defense Recommendations – proposes that defense strategies must be tuned to the expected mismatch scenario (e.g., same‑source vs. cross‑source attacks).
- Open‑Source Evaluation Suite – releases code and benchmark scripts so the community can reproduce and extend the analysis.
Methodology
Benchmark Construction (DUMB‑er)
- Dataset Sources: Two deepfake corpora (FaceForensics++ and Celeb‑DF‑V2) serve as source and target domains.
- Model Architecture: Five detectors covering handcrafted features (SRM), deep CNNs (Xception), and hybrid approaches (RECCE, UCF, SPSL).
- Balance: Each detector is trained on a balanced mix of real and fake videos, then optionally fine‑tuned with adversarial examples.
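Viewed this way, the DUMB dimensions define a grid of attacker/defender pairings. The short Python sketch below enumerates those pairings for the two corpora and five detectors used here; the structure and field names are illustrative only (the balance dimension is omitted for brevity) and are not taken from the paper's released suite.

```python
# Illustrative enumeration of DUMB-er attacker/defender pairings (not the paper's code).
from itertools import product

DATASETS = ["FaceForensics++", "Celeb-DF-V2"]
MODELS = ["RECCE", "SRM", "Xception", "UCF", "SPSL"]

def enumerate_cases():
    """List every attacker/defender combination of dataset source and model."""
    cases = []
    for atk_data, def_data, atk_model, def_model in product(DATASETS, DATASETS, MODELS, MODELS):
        cases.append({
            "attacker_dataset": atk_data,
            "defender_dataset": def_data,
            "attacker_model": atk_model,
            "defender_model": def_model,
            # Matching dataset and model approximates the white-box baseline;
            # any mismatch is a transferability-constrained scenario.
            "transfer_constrained": atk_data != def_data or atk_model != def_model,
        })
    return cases

print(len(enumerate_cases()))  # 2 datasets x 2 datasets x 5 models x 5 models = 100 pairings
```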
Adversarial Attack Scenarios
- White‑Box: the attacker knows the exact model and training data (baseline).
- Transferability‑Constrained: the attacker trains a surrogate model on a different dataset or with a different architecture, generates perturbations (PGD, FGSM, FPBA) on that surrogate, and applies them to the target detector.
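To make the transferability‑constrained setting concrete, here is a minimal L∞ PGD sketch in PyTorch: perturbations are crafted on an attacker‑side surrogate and then scored against the defender's detector. The `surrogate` and `target` models, the [0, 1] input scaling, and the hyperparameters are assumptions for illustration, not values from the paper.

```python
# Minimal L-infinity PGD transfer sketch (assumes PyTorch, inputs in [0, 1],
# and two hypothetical binary real/fake classifiers: `surrogate` and `target`).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft PGD perturbations on `model`, staying within an eps ball around x."""
    x = x.clone().detach()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()            # ascend the surrogate's loss
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # project back into the eps ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)           # keep valid pixel values
    return x_adv.detach()

# Transfer step (hypothetical tensors): craft on the surrogate, score on the target.
# x_adv = pgd_attack(surrogate, frames, labels)
# transfer_acc = (target(x_adv).argmax(dim=1) == labels).float().mean()
```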
Evaluation Protocol
- In‑Distribution: Test and attack both use the same dataset the detector was trained on.
- Cross‑Dataset: Test set comes from the other dataset, simulating real‑world distribution shift.
- Metrics: detection accuracy, AUC, and robustness drop (difference between clean and adversarial performance).
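A minimal sketch of how these metrics can be computed from per‑sample "fake" scores is shown below; the function name, thresholding, and score convention are assumptions, not part of the paper's released evaluation suite.

```python
# Illustrative metric computation from per-sample "fake" scores (higher = more fake).
import numpy as np
from sklearn.metrics import roc_auc_score

def robustness_report(y_true, clean_scores, adv_scores, threshold=0.5):
    """Accuracy, AUC, and robustness drop on clean vs. adversarial inputs."""
    y_true = np.asarray(y_true)
    clean_acc = np.mean((np.asarray(clean_scores) >= threshold).astype(int) == y_true)
    adv_acc = np.mean((np.asarray(adv_scores) >= threshold).astype(int) == y_true)
    return {
        "clean_accuracy": clean_acc,
        "adversarial_accuracy": adv_acc,
        "clean_auc": roc_auc_score(y_true, clean_scores),
        "adversarial_auc": roc_auc_score(y_true, adv_scores),
        "robustness_drop": clean_acc - adv_acc,  # clean minus adversarial performance
    }
```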
Results & Findings
| Scenario | Clean Accuracy | Adversarial Accuracy (PGD) | Effect of Adversarial Training |
|---|---|---|---|
| In‑distribution (same source) | ~92 % | ~45 % | ↑ to ~78 % (robustness gain) |
| Cross‑dataset (different source) | ~85 % | ~38 % | ↓ to ~70 % (performance loss after adversarial training) |
- Adversarial training helps when the attacker’s surrogate matches the defender’s data distribution (e.g., both use FaceForensics++).
- When data mismatches, some defenses overfit to the adversarial patterns of the source domain, causing a negative transfer that harms detection on the target domain.
- Attack transferability varies: FPBA (feature‑preserving) is the most successful across datasets, while FGSM’s impact drops sharply under cross‑dataset conditions.
- Detector‑specific trends: Handcrafted‑feature models (SRM) are more resilient to transfer attacks than pure CNNs, but they still suffer under aggressive PGD perturbations.
Practical Implications
- Deployments must anticipate distribution shift – platforms that ingest user‑generated videos from diverse sources should not rely on a single adversarial‑training recipe.
- Hybrid defenses are promising – combining handcrafted cues (e.g., SRM) with learned features can mitigate transfer attacks without sacrificing clean‑data performance.
- Continuous fine‑tuning – periodic re‑training on freshly collected, possibly adversarially perturbed data from the target platform helps keep robustness from eroding over time.
- Security‑by‑design – developers should integrate a robustness‑monitoring pipeline that flags sudden drops in detection confidence, which can indicate an adversarial campaign (a minimal sketch follows this list).
- Tooling – the released benchmark can be plugged into CI pipelines to evaluate new detector versions against realistic adversarial scenarios before production rollout.
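As one illustration of the robustness‑monitoring idea (an assumption of this summary, not a component described in the paper), the sketch below tracks mean detection confidence over a sliding window and flags a sudden drop against a historical baseline.

```python
# Hypothetical confidence monitor: flags a sustained drop in mean detection
# confidence, which may indicate an ongoing adversarial campaign.
from collections import deque

class ConfidenceMonitor:
    def __init__(self, window=500, drop_threshold=0.15):
        self.recent = deque(maxlen=window)   # most recent detection confidences
        self.baseline = None                 # long-run mean on presumed-clean traffic
        self.drop_threshold = drop_threshold

    def update(self, confidence: float) -> bool:
        """Record one confidence score; return True if an alert should fire."""
        self.recent.append(confidence)
        if len(self.recent) < self.recent.maxlen:
            return False                     # wait until the window is full
        window_mean = sum(self.recent) / len(self.recent)
        if self.baseline is None:
            self.baseline = window_mean      # first full window sets the baseline
            return False
        return (self.baseline - window_mean) > self.drop_threshold
```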
Limitations & Future Work
- Scope of Datasets – Only two deepfake corpora were examined; emerging datasets with higher visual fidelity may exhibit different transfer dynamics.
- Attack Diversity – The study focuses on gradient‑based attacks; future work should explore generative adversarial attacks that synthesize more naturalistic perturbations.
- Real‑World Constraints – Perturbations are assumed to be imperceptible at the pixel level; in practice, compression, streaming, and device‑specific processing could alter attack efficacy.
- Defense Strategies – The paper evaluates standard adversarial training; exploring certified defenses, ensemble methods, or meta‑learning could yield more universally robust detectors.
Bottom line: adversarial training isn't a silver bullet for deepfake detection. Its benefits hinge on how closely the training and deployment environments align, so practitioners should adopt adaptive, data‑aware defense pipelines.
Authors
- Adrian Serrano
- Erwan Umlil
- Ronan Thomas
Paper Information
- arXiv ID: 2601.05986v1
- Categories: cs.CV, cs.CR
- Published: January 9, 2026