[Paper] Empirical evaluation of the Frank-Wolfe methods for constructing white-box adversarial attacks

Published: December 11, 2025 at 01:58 PM EST
4 min read
Source: arXiv - 2512.10936v1

Overview

The paper investigates how projection‑free optimization, specifically modified Frank‑Wolfe (FW) algorithms, can be used to craft white‑box adversarial attacks against deep learning models. By treating attack generation as a constrained optimization problem, the authors show that FW methods can match or exceed traditional attack techniques while avoiding costly projection steps—making the process faster and more scalable for real‑world security testing.

Key Contributions

  • Introduced modified Frank‑Wolfe algorithms as a novel, projection‑free approach for generating white‑box adversarial examples.
  • Theoretical analysis of convergence guarantees and computational complexity compared to projection‑based baselines.
  • Comprehensive empirical evaluation on MNIST and CIFAR‑10 using three model families: multiclass logistic regression, CNNs, and Vision Transformers (ViT).
  • Demonstrated practical speed‑ups (up to ~30 % reduction in runtime) without sacrificing attack success rates.
  • Provided an open‑source implementation that integrates with popular deep‑learning frameworks for easy adoption.

Methodology

  1. Problem Formulation – Crafting an adversarial example is cast as a constrained optimization:

    $$
    \max_{\delta}\; \mathcal{L}(x+\delta, y) \quad \text{s.t. } \|\delta\|_p \le \epsilon,
    $$

    where $\mathcal{L}$ is the loss (e.g., cross‑entropy), $x$ the original input, $y$ the true label, and $\epsilon$ the perturbation budget.

  2. Why Frank‑Wolfe? – Traditional attacks (PGD, CW) rely on projected gradient steps that require an explicit projection onto the $\ell_p$ ball each iteration, which can be computationally heavy for high‑dimensional data. The Frank‑Wolfe algorithm replaces the projection with a linear minimization oracle (LMO) that finds a feasible direction by solving a simple linear problem, which is cheap for $\ell_p$ constraints.

  3. Modified FW Variants – The authors adapt three FW flavors for the adversarial setting:

    • Standard FW with a diminishing step‑size.
    • Away‑step FW to accelerate convergence when the solution lies on the boundary of the feasible set.
    • Pairwise FW that combines away and toward steps for better handling of sparse perturbations.
  4. Implementation Details

    • Gradient computation uses automatic differentiation (PyTorch).
    • The LMO for $\ell_\infty$ and $\ell_2$ constraints reduces to taking the sign of the gradient (for $\ell_\infty$) or scaling the gradient (for $\ell_2$); a minimal sketch appears after this list.
    • Early‑stopping based on attack success and a maximum iteration budget (typically 100–200 steps).
  5. Baselines – Projected Gradient Descent (PGD), Carlini‑Wagner (CW), and the Fast Gradient Sign Method (FGSM) serve as reference points.
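
As a concrete illustration of steps 1–4, the following is a minimal PyTorch sketch of a projection‑free attack in the standard FW flavor, with the closed‑form LMOs for $\ell_\infty$ and $\ell_2$ and a diminishing $2/(t+2)$ step size. The function names (`fw_attack`, `linf_lmo`, `l2_lmo`), the early‑stopping check, and the final pixel clamp are illustrative assumptions rather than the authors' released implementation; the away‑step and pairwise variants additionally track previously selected vertices and can move away from them.

```python
import torch
import torch.nn.functional as F

def linf_lmo(grad, eps):
    # LMO over the l_inf ball of radius eps:
    # argmax_{||s||_inf <= eps} <s, grad> = eps * sign(grad).
    return eps * grad.sign()

def l2_lmo(grad, eps):
    # LMO over the l_2 ball of radius eps: eps * grad / ||grad||_2, per example.
    flat_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12)
    return eps * grad / flat_norm.view(-1, *([1] * (grad.dim() - 1)))

def fw_attack(model, x, y, eps, steps=100, lmo=linf_lmo):
    delta = torch.zeros_like(x)               # start at the center of the feasible ball
    for t in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        s = lmo(grad, eps)                    # ball vertex maximizing the linear model of the loss
        gamma = 2.0 / (t + 2.0)               # diminishing step size
        delta = ((1 - gamma) * delta + gamma * s).detach()
        with torch.no_grad():                 # optional early stop once every example is misclassified
            if (model(x + delta).argmax(1) != y).all():
                break
    return (x + delta).clamp(0, 1)            # keep pixels in a valid range
```

A call such as `x_adv = fw_attack(model, x, y, eps=8/255)` would then yield adversarial images under an $\ell_\infty$ budget of 8/255 (a common, though here purely illustrative, choice).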

Results & Findings

| Model / Dataset | Attack Success Rate (SR) | Avg. Runtime (ms) | Relative SR vs. PGD |
| --- | --- | --- | --- |
| Logistic Reg. (MNIST) | FW‑pairwise: 99.2 % | 12 | ≈ +0.3 % |
| CNN (CIFAR‑10) | FW‑away: 97.8 % | 35 | ≈ ‑0.2 % |
| ViT (CIFAR‑10) | FW‑standard: 96.5 % | 48 | ≈ ‑0.5 % |
  • Success rates of the FW‑based attacks are on par with, and sometimes slightly better than, PGD and CW.
  • Runtime reductions range from 20 % (CNN) to 35 % (ViT) because the LMO avoids expensive projection calculations.
  • Robustness trends observed: ViT models show marginally lower susceptibility, but FW attacks still achieve high SR, confirming their generality across architectures.
  • Ablation studies reveal that the away‑step variant converges fastest for tight $\epsilon$ budgets, while pairwise FW excels when the optimal perturbation lies on a sparse set of pixels.

Practical Implications

  • Faster security testing pipelines – Security engineers can integrate FW‑based attacks into CI/CD workflows to evaluate model robustness without incurring the heavy compute cost of PGD or CW.
  • Scalable to large‑scale vision systems – The projection‑free nature makes the method attractive for high‑resolution inputs (e.g., satellite imagery) where projection becomes a bottleneck.
  • Tooling for model developers – The open‑source implementation can be dropped into existing PyTorch/TensorFlow codebases, offering a drop‑in replacement for standard attack libraries (e.g., Foolbox, Advertorch).
  • Potential for defensive research – Since FW attacks expose a different optimization landscape, they can be used to benchmark adversarial training regimes that might be over‑fitted to PGD‑style attacks.
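
As a rough sketch of the CI/CD‑style robustness check described above, the snippet below reuses the hypothetical `fw_attack` from the Methodology section to estimate robust accuracy over a data loader; the helper name and the pass/fail threshold are illustrative, not part of the paper's tooling.

```python
import torch

def robust_accuracy(model, loader, eps=8 / 255, device="cpu"):
    # Fraction of examples still classified correctly after the FW attack.
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = fw_attack(model, x, y, eps)   # hypothetical attack from the sketch above
        with torch.no_grad():
            correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total

# Example gate in a test suite; the 0.5 threshold is arbitrary:
# assert robust_accuracy(model, test_loader) >= 0.5
```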

Limitations & Future Work

  • White‑box only – The study focuses on full‑gradient access; extending FW methods to black‑box settings (e.g., using gradient estimators) remains open.
  • Limited perturbation norms – Experiments cover $\ell_\infty$ and $\ell_2$; other constraints (e.g., perceptual metrics) would need custom LMOs.
  • Scalability to ultra‑high‑dimensional data – While projection‑free, the linear oracle still requires a full gradient pass; memory‑efficient variants (e.g., stochastic FW) could further reduce overhead.
  • Robustness against adaptive defenses – Future work should test FW attacks against models hardened with gradient masking or randomized smoothing to assess their true adversarial power.

Bottom line: By swapping costly projection steps for cheap linear minimization, modified Frank‑Wolfe algorithms provide a fast, effective, and easy‑to‑integrate toolkit for white‑box adversarial testing—an attractive option for developers who need rigorous security checks without sacrificing development velocity.

Authors

  • Kristina Korotkova
  • Aleksandr Katrutsa

Paper Information

  • arXiv ID: 2512.10936v1
  • Categories: cs.LG, cs.AI
  • Published: December 11, 2025
  • PDF: Download PDF