[Paper] Synset Signset Germany: a Synthetic Dataset for German Traffic Sign Recognition
Source: arXiv - 2512.05936v1
Overview
A new synthetic dataset called Synset Signset Germany pushes the boundaries of traffic‑sign recognition research. By marrying GAN‑generated surface wear with a physics‑based rendering engine, the authors deliver over 100,000 realistic German traffic‑sign images—complete with masks, segmentation maps, and rich metadata—ready for training, testing, and robustness analysis.
Key Contributions
- Hybrid synthesis pipeline – combines data‑driven GAN texture generation (for dirt, scratches, fading) with analytically controlled scene lighting and camera effects.
- Large‑scale, fully annotated dataset – 105,500 images covering 211 German sign classes (including the rare 2020 updates), each paired with pixel‑level masks, segmentation maps, and exhaustive environment parameters.
- Explainable‑AI (XAI) enablement – the analytical side lets researchers systematically vary lighting, pose, and weather, making it straightforward to probe model sensitivities.
- Benchmark‑level realism assessment – quantitative comparison against the real‑world GTSRB benchmark and the synthetic CATERED dataset shows competitive or superior realism.
- Open‑source release – the pipeline code, dataset, and metadata are publicly available, encouraging reproducibility and downstream extensions.
Methodology
- Base 3‑D sign models – high‑fidelity CAD representations of every German traffic sign class.
- Analytical scene modulation – a physically based renderer (PBRT‑style) places each sign in a virtual environment where lighting direction, intensity, weather (rain, fog), and camera parameters (exposure, motion blur) are sampled from predefined distributions.
- GAN‑based texture augmentation – a conditional StyleGAN‑2 model, trained on real‑world sign patches, generates realistic wear patterns (dust, rust, graffiti). These textures are projected onto the 3‑D meshes before rendering.
- Metadata capture – every rendered frame logs the exact random seed and all environmental parameters, producing a JSON side‑car for each image.
- Post‑processing – automatic generation of binary masks (sign vs. background) and semantic segmentation maps (sign body, background, occluders).
The pipeline is fully scriptable, enabling developers to spin up custom subsets (e.g., “only night‑time signs”) with a single command.
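As a rough illustration of the seeded, scriptable parameter sampling and JSON side‑car logging described above, here is a minimal Python sketch. The parameter names, value ranges, and schema are invented for illustration and do not reflect the authors' actual API.

```python
import json
import random

# Hypothetical parameter distributions, loosely following the paper's
# description (lighting, weather, camera effects); the real pipeline's
# schema and ranges are not given in this summary.
PARAM_DISTRIBUTIONS = {
    "light_azimuth_deg":   lambda rng: rng.uniform(0, 360),
    "light_elevation_deg": lambda rng: rng.uniform(5, 90),
    "fog_density":         lambda rng: rng.choice([0.0, 0.02, 0.05]),
    "exposure_ev":         lambda rng: rng.uniform(-2.0, 2.0),
    "motion_blur_px":      lambda rng: rng.uniform(0.0, 4.0),
}

def sample_scene(seed: int) -> dict:
    """Sample one environment configuration, reproducibly from a seed."""
    rng = random.Random(seed)
    params = {name: dist(rng) for name, dist in PARAM_DISTRIBUTIONS.items()}
    params["seed"] = seed  # logged so the frame can be re-rendered exactly
    return params

# Each rendered frame would get a JSON side-car like this:
sidecar_json = json.dumps(sample_scene(seed=42), indent=2)
```

Because every frame logs its seed and parameters, the same configuration can be regenerated deterministically, which is what makes targeted subsets ("only night‑time signs") and controlled sweeps possible.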
Results & Findings
| Metric | Synset Signset Germany vs. GTSRB | Synset Signset Germany vs. CATERED |
|---|---|---|
| Classification accuracy (trained on synthetic, tested on real) | 92.3 % (±0.4) | – |
| Domain gap (Fréchet Inception Distance; lower is better) | 12.8 | 18.5 |
| Robustness to illumination shift (Δ accuracy) | –3.1 % | –7.8 % |
| XAI sensitivity analysis – correlation between lighting angle and misclassifications | 0.71 (strong) | 0.48 |
Takeaway: Models pretrained on Synset Signset Germany transfer more cleanly to real‑world GTSRB data than those trained on CATERED, and the dataset’s controllable parameters expose clear failure modes (e.g., extreme back‑lighting).
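For context on the domain‑gap rows: the Fréchet Inception Distance compares Gaussian fits of deep features extracted from two image sets. A minimal NumPy/SciPy sketch of the standard closed form (the Inception feature-extraction step is omitted) looks like this:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID between two Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    In practice mu/sigma are the mean and covariance of Inception-v3
    features over each image set; that extraction step is omitted here.
    """
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):  # numerical noise can yield complex parts
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Lower values mean the synthetic feature distribution sits closer to the real one, which is how the table's 12.8 vs. 18.5 comparison should be read.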
Practical Implications
- Faster model iteration – Developers can train high‑performing sign classifiers without collecting costly field data, especially for newly introduced or rare signs.
- Robustness testing – The parameterized rendering lets QA teams generate targeted edge cases (glare, motion blur, fog) to stress‑test perception stacks in autonomous vehicles.
- Explainable AI pipelines – By systematically sweeping a single parameter while keeping everything else constant, engineers can produce attribution maps that pinpoint why a model flips its prediction.
- Domain‑adaptation research – The rich metadata serves as a natural bridge for unsupervised adaptation techniques (e.g., style transfer, feature alignment).
- Regulatory compliance – Synthetic evidence of performance under defined adverse conditions can support safety cases for automotive certification bodies.
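The single‑parameter sweep described under "Explainable AI pipelines" can be sketched as below. The renderer and classifier are toy stand‑ins (the real pipeline renders actual sign images and uses a trained network); the point is only the pattern of varying one knob while holding everything else, including the seed, fixed.

```python
import numpy as np

# Hypothetical stand-ins for the rendering and classification steps.
def render_sign(light_angle_deg: float, seed: int = 0) -> np.ndarray:
    """Toy 'render': brightness falls off as lighting moves behind the sign."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.4, 0.6, size=(8, 8))
    shading = max(0.0, np.cos(np.radians(light_angle_deg)))
    return base * shading

def classify(image: np.ndarray) -> str:
    """Toy classifier: predicts 'stop' only if the sign is bright enough."""
    return "stop" if image.mean() > 0.2 else "unknown"

def sweep_lighting(angles, seed: int = 0):
    """Vary one parameter (lighting angle) with everything else fixed."""
    return {a: classify(render_sign(a, seed)) for a in angles}

# Predictions stay correct for front-lit angles and flip to failures as
# the light moves behind the sign, mirroring the back-lighting failure
# mode the paper reports.
results = sweep_lighting(angles=range(0, 181, 30))
```

Because only one parameter changes between frames, any prediction flip can be attributed directly to that parameter, which is the basis for the lighting-angle/misclassification correlations reported in the results table.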
Limitations & Future Work
- Physical realism ceiling – While lighting is analytically correct, some high‑frequency surface details (e.g., micro‑scratches) still rely on GAN approximations, which may not capture all wear patterns.
- Scene context – Signs are rendered in isolation or simple backgrounds; integration into full street‑scene simulators (with vehicles, pedestrians) is left for future extensions.
- Geographic scope – The dataset focuses on German signage; expanding to other jurisdictions would require new CAD assets and sign‑specific texture datasets.
- Real‑world validation – The current study evaluates transfer to GTSRB; broader field tests (e.g., on dash‑cam footage from different countries) are needed to confirm generalization.
The authors plan to open‑source the texture‑GAN training pipeline, add weather‑dynamic simulations (e.g., snow accumulation), and collaborate with automotive partners to embed the signs into full‑stack driving simulators.
Authors
- Anne Sielemann
- Lena Loercher
- Max-Lion Schumacher
- Stefan Wolf
- Masoud Roschani
- Jens Ziehn
Paper Information
- arXiv ID: 2512.05936v1
- Categories: cs.CV, cs.RO
- Published: December 5, 2025