[Paper] Stylized Synthetic Augmentation further improves Corruption Robustness
Source: arXiv - 2512.15675v1
Overview
The paper introduces Stylized Synthetic Augmentation (SSA) – a data‑augmentation pipeline that applies neural style transfer (NST) to computer‑generated (synthetic) images to make vision models more resilient to everyday image corruptions (noise, blur, weather effects, etc.). By showing that even “low‑quality” stylized synthetic images can boost robustness, the authors set new state‑of‑the‑art corruption‑robust accuracy on the popular CIFAR‑10‑C, CIFAR‑100‑C, and TinyImageNet‑C test suites.
Key Contributions
- Hybrid augmentation pipeline that combines synthetic image generation (e.g., GAN‑based or diffusion‑based) with neural style transfer to produce diverse, stylized training samples.
- Empirical evidence that stylized synthetic images improve corruption robustness despite poorer Fréchet Inception Distance (FID) scores, challenging the conventional wisdom that “high‑fidelity” data is always better.
- Systematic hyper‑parameter study covering style‑transfer strength, synthetic‑to‑real ratios, and interaction with classic rule‑based augmentations (e.g., TrivialAugment).
- State‑of‑the‑art robustness results on three small‑scale benchmarks: 93.54 % on CIFAR‑10‑C, 74.9 % on CIFAR‑100‑C, and 50.86 % on TinyImageNet‑C.
- Open‑source implementation (code and pretrained models) that can be dropped into existing PyTorch training pipelines with minimal changes.
Methodology
- Synthetic Image Generation – The authors use off‑the‑shelf generative models (e.g., StyleGAN2, diffusion models) to create a large pool of class‑conditioned images that do not exist in the original dataset.
- Neural Style Transfer (NST) – Each synthetic image is passed through a fast NST network (e.g., AdaIN or a lightweight transformer) that applies a randomly sampled style from a curated style‑bank (artworks, textures, weather patterns). The style strength is controlled by a scalar hyper‑parameter λ.
- Mixing Strategy – During each training epoch, a minibatch is composed of three parts: (i) real images, (ii) raw synthetic images, and (iii) stylized synthetic images. The ratios are tunable (e.g., 40 % real, 30 % synthetic, 30 % stylized); a minimal sketch of the stylization and mixing steps follows this list.
- Complementary Augmentations – The pipeline can be combined with TrivialAugment (a minimal, automatically tuned set of geometric/color transforms) but not with more aggressive augmenters that already saturate the corruption space.
- Training – Standard cross‑entropy loss on the target classification task; no extra robustness‑specific loss terms are required. The authors train ResNet‑18/34/50 backbones on CIFAR‑10/100 and TinyImageNet variants.
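The per‑image stylization and batch composition described above can be summarized in a few lines of PyTorch. The sketch below is illustrative only: `encoder`, `decoder`, and `style_bank` stand in for a pretrained AdaIN‑style encoder/decoder pair and the curated style images, the scalar `lam` plays the role of the style‑strength hyper‑parameter λ, and the 40/30/30 split mirrors the example ratios above; none of these names are taken from the authors’ released code.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive Instance Normalization: re-normalize content features
    to the per-channel mean/std of the style features."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

def stylize(content, style, encoder, decoder, lam=0.5):
    """Fast AdaIN-style NST; `lam` controls how strongly the style
    statistics replace the content statistics (lam=0 keeps the original
    image, lam=1 applies the full style). `encoder`/`decoder` are assumed
    to be a pretrained feature extractor and matching decoder."""
    c_feat = encoder(content)
    s_feat = encoder(style)
    mixed = lam * adain(c_feat, s_feat) + (1.0 - lam) * c_feat
    return decoder(mixed)

def build_minibatch(real, synthetic, style_bank, encoder, decoder,
                    ratios=(0.4, 0.3, 0.3), lam=0.5):
    """Compose one minibatch from real, raw synthetic, and stylized
    synthetic images. `style_bank` is assumed to be a tensor of
    pre-loaded style images [N, 3, H, W]; `ratios` mirrors the tunable
    40/30/30 split mentioned in the paper."""
    b = real.size(0)
    n_real = int(ratios[0] * b)
    n_syn = int(ratios[1] * b)
    n_sty = b - n_real - n_syn
    raw_syn = synthetic[:n_syn]
    sty_src = synthetic[n_syn:n_syn + n_sty]
    styles = style_bank[torch.randint(len(style_bank), (n_sty,))]
    stylized = stylize(sty_src, styles, encoder, decoder, lam)
    return torch.cat([real[:n_real], raw_syn, stylized], dim=0)
```

Because the augmentation happens entirely at the data level, the resulting minibatch feeds an ordinary cross‑entropy training loop, consistent with the paper's claim that no robustness‑specific loss terms are required.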
Results & Findings
| Dataset | Baseline (no SSA) | +TrivialAugment | +SSA (synthetic + stylized) |
|---|---|---|---|
| CIFAR‑10‑C | 89.1 % | 91.2 % | 93.54 % |
| CIFAR‑100‑C | 66.3 % | 70.1 % | 74.9 % |
| TinyImageNet‑C | 42.0 % | 45.5 % | 50.86 % |
- Stylization matters – Removing the NST step drops robust accuracy by ~2–4 % even when synthetic images are kept.
- FID paradox – Stylized synthetic images have higher FID (i.e., look less realistic) but still improve robustness, suggesting that distributional diversity outweighs visual fidelity for this task.
- Compatibility – SSA works well with lightweight augmenters (TrivialAugment) but interferes with heavy augmenters that already introduce strong color/texture variations (e.g., RandAugment).
- Scalability – Adding more synthetic styles yields diminishing returns after ~10–15 distinct style families, keeping compute overhead modest (≈ 1.2× training time).
Practical Implications
- Robust model deployment – Developers building vision services (e.g., autonomous drones, medical imaging, retail analytics) can integrate SSA to harden models against sensor noise, compression artifacts, and adverse weather without redesigning the architecture.
- Data‑efficiency – Teams with limited real‑world labeled data can generate synthetic samples on‑the‑fly, apply stylization, and achieve robustness comparable to collecting costly corrupted datasets.
- Plug‑and‑play – The open‑source code provides a PyTorch DataLoader wrapper; swapping in SSA requires only a few lines of configuration (synthetic source, style bank, mix ratios). A hypothetical wrapper sketch follows this list.
- Cost‑effective robustness testing – By training with SSA, developers can reduce the need for extensive post‑training corruption benchmarks, accelerating the CI/CD cycle for computer‑vision models.
- Potential for transfer learning – Pre‑training large backbones on stylized synthetic data before fine‑tuning on a target domain may yield downstream robustness gains, a promising avenue for industry‑scale models.
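To illustrate the plug‑and‑play point, the following hypothetical wrapper shows how an SSA‑style mix could be expressed as a standard torch Dataset and consumed by an unchanged DataLoader. The class name SSAWrapper, the stylize_fn argument, and the mix probabilities are assumptions for illustration; the actual API of the authors’ release may differ.

```python
import random
from torch.utils.data import Dataset

class SSAWrapper(Dataset):
    """Hypothetical dataset wrapper sketching an SSA-style mix of real,
    synthetic, and stylized-synthetic samples."""

    def __init__(self, real_ds, synthetic_ds, stylize_fn, mix=(0.4, 0.3, 0.3)):
        self.real_ds = real_ds            # original labeled dataset
        self.synthetic_ds = synthetic_ds  # class-conditioned generated samples
        self.stylize_fn = stylize_fn      # e.g., an AdaIN-based transform with strength lambda
        self.mix = mix                    # (real, synthetic, stylized) probabilities

    def __len__(self):
        return len(self.real_ds)

    def __getitem__(self, idx):
        r = random.random()
        if r < self.mix[0]:
            return self.real_ds[idx]                        # real sample
        img, label = self.synthetic_ds[idx % len(self.synthetic_ds)]
        if r < self.mix[0] + self.mix[1]:
            return img, label                               # raw synthetic sample
        return self.stylize_fn(img), label                  # stylized synthetic sample
```

In this form only the dataset construction changes; the training loop, loss, and backbone stay untouched, which is the “few lines of configuration” the Plug‑and‑play bullet refers to.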
Limitations & Future Work
- Synthetic quality dependence – While the method tolerates low‑FID stylized images, extremely poor generative models (e.g., mode‑collapsed GANs) still hurt performance.
- Small‑scale focus – Experiments are limited to CIFAR‑10/100 and TinyImageNet; scaling to ImageNet‑scale or domain‑specific datasets (e.g., satellite imagery) remains untested.
- Style‑bank curation – The current style set is manually assembled; an automated procedure to discover optimal styles per task could further improve results.
- Compute overhead – NST adds a modest runtime cost; future work could explore style‑aware generative models that embed stylization directly in the synthesis step, eliminating the separate NST pass.
Overall, Stylized Synthetic Augmentation offers a pragmatic, developer‑friendly recipe for building vision models that stay reliable when the world gets messy.
Authors
- Georg Siedel
- Rojan Regmi
- Abhirami Anand
- Weijia Shao
- Silvia Vock
- Andrey Morozov
Paper Information
- arXiv ID: 2512.15675v1
- Categories: cs.CV, cs.LG
- Published: December 17, 2025