[Paper] Scaling Adversarial Training via Data Selection

Published: December 26, 2025 at 10:50 AM EST
4 min read
Source: arXiv - 2512.22069v1

Overview

Adversarial training with strong attacks such as Projected Gradient Descent (PGD) is the gold standard for building robust deep‑learning models, but the inner‑loop optimization required for every training sample makes it prohibitively expensive at scale. This paper introduces Selective Adversarial Training, a simple yet effective data‑selection strategy that attacks only the most “critical” examples in each mini‑batch, cutting the adversarial computation roughly in half while preserving, and in some cases improving, robustness.

Key Contributions

  • Selective adversarial training framework that generates adversarial perturbations for a subset of samples rather than the whole batch.
  • Two principled selection criteria:
    1. Margin‑based sampling – picks examples that lie close to the model’s decision boundary.
    2. Gradient‑matching sampling – picks examples whose loss gradients align with the dominant direction of the batch’s overall gradient.
  • Mixed‑objective training: adversarially perturbed samples are trained with the usual robust loss, while the remaining “clean” samples use the standard cross‑entropy loss (written out as a formula after this list).
  • Empirical validation on MNIST and CIFAR‑10 showing comparable or superior robustness to full‑batch PGD training with ≈ 50 % less adversarial computation.
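
In symbols, the mixed objective can be read as follows; the notation is mine rather than quoted from the paper, which may weight or normalize the two terms differently. For a mini‑batch B, selected subset S ⊆ B, model f_θ, cross‑entropy loss ℓ, and perturbation budget ε:

```latex
\min_{\theta}\; \frac{1}{|B|}\left[
  \sum_{i \in S} \max_{\|\delta_i\|_\infty \le \epsilon}
    \ell\!\left(f_\theta(x_i + \delta_i),\, y_i\right)
  \;+\; \sum_{i \in B \setminus S}
    \ell\!\left(f_\theta(x_i),\, y_i\right)
\right]
```

The inner maximization is approximated by PGD only for the selected indices i ∈ S, which is where the compute savings come from.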

Methodology

  1. Mini‑batch formation – As usual, data are sampled into mini‑batches for stochastic gradient descent.
  2. Sample selection – Before the inner PGD loop runs, the algorithm scores each example in the batch (both scoring rules are sketched in code after this list):
    • Margin‑based: compute the difference between the top‑two class logits; smaller margins indicate proximity to the decision boundary.
    • Gradient‑matching: compute the gradient of the loss w.r.t. the model parameters for each example; select those whose gradients have the highest cosine similarity with the batch‑average gradient.
  3. Adversarial generation – Run the full PGD attack only on the selected subset (e.g., 50 % of the batch).
  4. Loss composition
    • For perturbed examples: use the robust loss (e.g., cross‑entropy on the adversarially perturbed inputs).
    • For the rest: use the standard clean loss.
  5. Parameter update – Back‑propagate the combined loss and update model weights as usual.
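
A minimal PyTorch‑style sketch of the two scoring rules from step 2; the function names, signatures, and the deliberately naive per‑sample gradient loop are my own illustration, not code from the paper:

```python
import torch
import torch.nn.functional as F

def margin_scores(model, x):
    """Step 2a: difference between the top-two class logits.
    Smaller margins mean the example sits closer to the decision boundary."""
    with torch.no_grad():
        top2 = model(x).topk(2, dim=1).values   # (batch, 2)
        return top2[:, 0] - top2[:, 1]          # select the SMALLEST scores

def gradient_match_scores(model, x, y):
    """Step 2b: cosine similarity between each example's loss gradient
    (w.r.t. the model parameters) and the batch-average gradient."""
    params = [p for p in model.parameters() if p.requires_grad]
    per_sample = []
    for xi, yi in zip(x, y):                    # naive per-sample loop, for clarity
        loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        per_sample.append(torch.cat([g.flatten() for g in grads]))
    G = torch.stack(per_sample)                 # (batch, num_params)
    g_avg = G.mean(dim=0, keepdim=True)         # (1, num_params)
    return F.cosine_similarity(G, g_avg, dim=1) # select the LARGEST scores
```

Given these scores, selecting, say, the 50 % of the batch with the smallest margins (or the largest cosine similarities) yields the indices that are handed to the PGD attack in step 3.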

The key insight is that not every training point contributes equally to shaping the decision boundary; focusing the expensive PGD step on the “hardest” points yields most of the robustness benefit.
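
Putting the five steps together, one training iteration could look roughly like the sketch below; the PGD hyperparameters, the fixed 50 % selection ratio, the margin‑based selection, and the sample‑count weighting of the two loss terms are illustrative assumptions rather than the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD: ascend the loss, project back into the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def selective_training_step(model, optimizer, x, y, ratio=0.5):
    """One step of selective adversarial training (steps 1-5 above).
    Assumes ratio < 1 so the clean split is non-empty."""
    # Step 2: margin-based scoring -- smallest top-2 logit gap = hardest examples.
    with torch.no_grad():
        top2 = model(x).topk(2, dim=1).values
        margin = top2[:, 0] - top2[:, 1]
    k = max(1, int(ratio * x.size(0)))
    sel = margin.topk(k, largest=False).indices
    mask = torch.zeros(x.size(0), dtype=torch.bool, device=x.device)
    mask[sel] = True

    # Step 3: run the full PGD attack only on the selected subset.
    x_adv = pgd_attack(model, x[mask], y[mask])

    # Step 4: robust loss on perturbed samples, clean loss on the rest,
    # averaged over the whole batch.
    loss_robust = F.cross_entropy(model(x_adv), y[mask])
    loss_clean = F.cross_entropy(model(x[~mask]), y[~mask])
    loss = (k * loss_robust + (x.size(0) - k) * loss_clean) / x.size(0)

    # Step 5: parameter update.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```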

Results & Findings

| Dataset | Baseline (Full PGD) | Selective (Margin) | Selective (Grad‑Match) | Compute Reduction |
|---|---|---|---|---|
| MNIST | 96.2 % clean / 84.5 % robust | 95.8 % / 85.1 % | 95.9 % / 84.9 % | ~45 % |
| CIFAR‑10 | 84.3 % clean / 48.7 % robust | 83.9 % / 49.2 % | 84.0 % / 48.9 % | ~50 % |

  • Robust accuracy (accuracy on PGD‑attacked test data) is on par with or slightly better than full adversarial training despite using half the adversarial budget.
  • Training time per epoch drops by roughly 40–50 %, making the approach viable for larger models and datasets.
  • The two selection strategies perform similarly; margin‑based sampling is marginally cheaper because it only needs logits, while gradient‑matching requires per‑sample gradients but can be more expressive for complex data distributions.

Practical Implications

  • Faster robust model pipelines – Teams can now incorporate strong adversarial training into regular training schedules without needing massive GPU clusters.
  • Cost‑effective security – Attacking fewer samples per batch translates directly into lower cloud compute bills, a tangible benefit for SaaS providers and edge‑device manufacturers.
  • Scalable to larger datasets – The selection logic is lightweight and can be parallelized; extending to ImageNet‑scale or language models becomes realistic.
  • Hybrid training regimes – Developers can combine selective adversarial training with other efficiency tricks (e.g., mixed‑precision, curriculum learning) for even greater speed‑ups.

Limitations & Future Work

  • Dataset scope – Experiments are limited to MNIST and CIFAR‑10; performance on high‑resolution vision tasks or NLP benchmarks remains untested.
  • Selection overhead – Gradient‑matching requires per‑sample gradients, which adds a modest overhead; future work could explore cheaper proxies (e.g., using activation statistics).
  • Dynamic budget – The paper uses a fixed selection ratio (≈ 50 %). Adaptive schemes that adjust the ratio based on training progress could yield further gains.
  • Robustness against adaptive attacks – The authors focus on standard PGD evaluation; assessing resistance to stronger or adaptive adversaries would strengthen the security claim.

Bottom line: Selective adversarial training shows that “smart” data selection can dramatically cut the cost of building robust models, opening the door for wider adoption of adversarial defenses in production‑grade machine‑learning systems.

Authors

  • Youran Ye
  • Dejin Wang
  • Ajinkya Bhandare

Paper Information

  • arXiv ID: 2512.22069v1
  • Categories: cs.LG
  • Published: December 26, 2025