[Paper] Stylized Synthetic Augmentation further improves Corruption Robustness
Source: arXiv - 2512.15675v1
Overview
The paper introduces Stylized Synthetic Augmentation (SSA) – a data‑augmentation pipeline that applies neural style transfer (NST) to computer‑generated (synthetic) images to make vision models more resilient to everyday image corruptions (noise, blur, weather effects, etc.). By showing that even “low‑quality” stylized synthetic images can boost robustness, the authors set new state‑of‑the‑art corruption‑robust accuracy on the popular CIFAR‑10‑C, CIFAR‑100‑C, and TinyImageNet‑C test suites.
Key Contributions
- Hybrid augmentation pipeline that combines synthetic image generation (e.g., GAN‑based or diffusion‑based) with neural style transfer to produce diverse, stylized training samples.
- Empirical evidence that stylized synthetic images improve corruption robustness despite poorer Fréchet Inception Distance (FID) scores, challenging the conventional wisdom that “high‑fidelity” data is always better.
- Systematic hyper‑parameter study covering style‑transfer strength, synthetic‑to‑real ratios, and interaction with classic rule‑based augmentations (e.g., TrivialAugment).
- State‑of‑the‑art robustness results on three small‑scale benchmarks: 93.54 % on CIFAR‑10‑C, 74.9 % on CIFAR‑100‑C, and 50.86 % on TinyImageNet‑C.
- Open‑source implementation (code and pretrained models) that can be dropped into existing PyTorch training pipelines with minimal changes.
Methodology
- Synthetic Image Generation – The authors use off‑the‑shelf generative models (e.g., StyleGAN2, diffusion models) to create a large pool of class‑conditioned images that do not exist in the original dataset.
- Neural Style Transfer (NST) – Each synthetic image is passed through a fast NST network (e.g., AdaIN or a lightweight transformer) that applies a randomly sampled style from a curated style‑bank (artworks, textures, weather patterns). The style strength is controlled by a scalar hyper‑parameter λ.
- Mixing Strategy – During each training epoch, a minibatch is composed of three parts: (i) real images, (ii) raw synthetic images, and (iii) stylized synthetic images. The ratios are tunable (e.g., 40 % real, 30 % synthetic, 30 % stylized); a minimal sketch of the stylization and mixing steps follows this list.
- Complementary Augmentations – The pipeline can be combined with TrivialAugment (a minimal, automatically tuned set of geometric/color transforms) but not with more aggressive augmenters that already saturate the corruption space.
- Training – Standard cross‑entropy loss on the target classification task; no extra robustness‑specific loss terms are required. The authors train ResNet‑18/34/50 backbones on CIFAR‑10/100 and TinyImageNet variants.
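The per‑image stylization and batch composition described above can be summarized in a few lines of PyTorch. The sketch below is illustrative only: `encoder`, `decoder`, and `style_bank` stand in for a pretrained AdaIN‑style encoder/decoder pair and the curated style images, the scalar `lam` plays the role of the style‑strength hyper‑parameter λ, and the 40/30/30 split mirrors the example ratios above; none of these names are taken from the authors’ released code.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive Instance Normalization: re-normalize content features
    to the per-channel mean/std of the style features."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

def stylize(content, style, encoder, decoder, lam=0.5):
    """Fast AdaIN-style NST; `lam` controls how strongly the style
    statistics replace the content statistics (lam=0 keeps the original
    image, lam=1 applies the full style). `encoder`/`decoder` are assumed
    to be a pretrained feature extractor and matching decoder."""
    c_feat = encoder(content)
    s_feat = encoder(style)
    mixed = lam * adain(c_feat, s_feat) + (1.0 - lam) * c_feat
    return decoder(mixed)

def build_minibatch(real, synthetic, style_bank, encoder, decoder,
                    ratios=(0.4, 0.3, 0.3), lam=0.5):
    """Compose one minibatch from real, raw synthetic, and stylized
    synthetic images. `style_bank` is assumed to be a tensor of
    pre-loaded style images [N, 3, H, W]; `ratios` mirrors the tunable
    40/30/30 split mentioned in the paper."""
    b = real.size(0)
    n_real = int(ratios[0] * b)
    n_syn = int(ratios[1] * b)
    n_sty = b - n_real - n_syn
    raw_syn = synthetic[:n_syn]
    sty_src = synthetic[n_syn:n_syn + n_sty]
    styles = style_bank[torch.randint(len(style_bank), (n_sty,))]
    stylized = stylize(sty_src, styles, encoder, decoder, lam)
    return torch.cat([real[:n_real], raw_syn, stylized], dim=0)
```

Because the augmentation happens entirely at the data level, the resulting minibatch feeds an ordinary cross‑entropy training loop, consistent with the paper's claim that no robustness‑specific loss terms are required.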
Results & Findings
| Dataset | Baseline (no SSA) | +TrivialAugment | +SSA (synthetic + stylized) |
|---|---|---|---|
| CIFAR‑10‑C | 89.1 % | 91.2 % | 93.54 % |
| CIFAR‑100‑C | 66.3 % | 70.1 % | 74.9 % |
| TinyImageNet‑C | 42.0 % | 45.5 % | 50.86 % |
- Stylization matters – Removing the NST step drops robust accuracy by ~2–4 % even when synthetic images are kept.
- FID paradox – Stylized synthetic images have higher FID (i.e., look less realistic) but still improve robustness, suggesting that distributional diversity outweighs visual fidelity for this task.
- Compatibility – SSA works well with lightweight augmenters (TrivialAugment) but interferes with heavy augmenters that already introduce strong color/texture variations (e.g., RandAugment).
- Scalability – Adding more synthetic styles yields diminishing returns after ~10–15 distinct style families, keeping compute overhead modest (≈ 1.2× training time).
Practical Implications
- Robust model deployment – Developers building vision services (e.g., autonomous drones, medical imaging, retail analytics) can integrate SSA to harden models against sensor noise, compression artifacts, and adverse weather without redesigning the architecture.
- Data‑efficiency – Teams with limited real‑world labeled data can generate synthetic samples on‑the‑fly, apply stylization, and achieve robustness comparable to collecting costly corrupted datasets.
- Plug‑and‑play – The open‑source code provides a PyTorch DataLoader wrapper; swapping in SSA requires only a few lines of configuration (synthetic source, style bank, mix ratios). A hypothetical wrapper sketch follows this list.
- Cost‑effective robustness testing – By training with SSA, developers can reduce the need for extensive post‑training corruption benchmarks, accelerating the CI/CD cycle for computer‑vision models.
- Potential for transfer learning – Pre‑training large backbones on stylized synthetic data before fine‑tuning on a target domain may yield downstream robustness gains, a promising avenue for industry‑scale models.
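To illustrate the plug‑and‑play point, the following hypothetical wrapper shows how an SSA‑style mix could be expressed as a standard torch Dataset and consumed by an unchanged DataLoader. The class name SSAWrapper, the stylize_fn argument, and the mix probabilities are assumptions for illustration; the actual API of the authors’ release may differ.

```python
import random
from torch.utils.data import Dataset

class SSAWrapper(Dataset):
    """Hypothetical dataset wrapper sketching an SSA-style mix of real,
    synthetic, and stylized-synthetic samples."""

    def __init__(self, real_ds, synthetic_ds, stylize_fn, mix=(0.4, 0.3, 0.3)):
        self.real_ds = real_ds            # original labeled dataset
        self.synthetic_ds = synthetic_ds  # class-conditioned generated samples
        self.stylize_fn = stylize_fn      # e.g., an AdaIN-based transform with strength lambda
        self.mix = mix                    # (real, synthetic, stylized) probabilities

    def __len__(self):
        return len(self.real_ds)

    def __getitem__(self, idx):
        r = random.random()
        if r < self.mix[0]:
            return self.real_ds[idx]                        # real sample
        img, label = self.synthetic_ds[idx % len(self.synthetic_ds)]
        if r < self.mix[0] + self.mix[1]:
            return img, label                               # raw synthetic sample
        return self.stylize_fn(img), label                  # stylized synthetic sample
```

In this form only the dataset construction changes; the training loop, loss, and backbone stay untouched, which is the “few lines of configuration” the Plug‑and‑play bullet refers to.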
Limitations & Future Work
- Synthetic quality dependence – While the method tolerates low‑FID stylized images, extremely poor generative models (e.g., mode‑collapsed GANs) still hurt performance.
- Small‑scale focus – Experiments are limited to CIFAR‑10/100 and TinyImageNet; scaling to ImageNet‑scale or domain‑specific datasets (e.g., satellite imagery) remains untested.
- Style‑bank curation – The current style set is manually assembled; an automated procedure to discover optimal styles per task could further improve results.
- Compute overhead – NST adds a modest runtime cost; future work could explore style‑aware generative models that embed stylization directly in the synthesis step, eliminating the separate NST pass.
Overall, Stylized Synthetic Augmentation offers a pragmatic, developer‑friendly recipe for building vision models that stay reliable when the world gets messy.
Authors
- Georg Siedel
- Rojan Regmi
- Abhirami Anand
- Weijia Shao
- Silvia Vock
- Andrey Morozov
Paper Information
- arXiv ID: 2512.15675v1
- Categories: cs.CV, cs.LG
- Published: December 17, 2025