[Paper] Robustness of Object Detection of Autonomous Vehicles in Adverse Weather Conditions

Published: February 13, 2026 at 08:02 AM EST
4 min read
Source: arXiv

Overview

The paper introduces a systematic way to gauge how well autonomous‑vehicle (AV) object‑detection models survive harsh weather and lighting. By synthetically “storming” benchmark images with controllable fog, rain, snow, darkness, glare, etc., the authors pinpoint the exact intensity at which a detector first breaks down, offering a clear, quantitative robustness score.

Key Contributions

  • Average First Failure Coefficient (AFFC) metric: A novel, easy‑to‑interpret measure that captures the average intensity level at which a model first fails across a set of images.
  • Synthetic adverse‑condition pipeline: Seven parametric data‑augmentation operators (fog, rain, snow, dark, bright, flaring, shadow) that can be tuned to any severity level.
  • Comprehensive benchmark: Evaluation of four popular detectors (YOLOv5s, YOLOv11s, Faster R-CNN, Detectron2) across all seven conditions.
  • Training‑for‑robustness study: Demonstrates that augmenting the training set with synthetic adverse weather improves robustness, but also reveals diminishing returns and catastrophic forgetting when over‑trained.
  • Open‑source implementation: The authors release code and augmentation recipes, enabling reproducible robustness testing for any detection model.
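The paper's operators are not reproduced in this summary; as a minimal sketch, a fog operator parameterized by a scalar intensity t might be written as a simple alpha blend toward a haze color (the authors' version uses Perlin‑noise fog, so this is an illustrative simplification, not their implementation):

```python
import numpy as np

def fog(image: np.ndarray, t: float, haze: float = 255.0) -> np.ndarray:
    """Blend an image toward a uniform haze color.

    image: HxWxC uint8 array; t in [0, 1], where 0 = no effect
    and 1 = maximum effect, matching the paper's intensity scale.
    (Illustrative only -- the paper uses Perlin-noise fog.)
    """
    t = float(np.clip(t, 0.0, 1.0))
    fogged = (1.0 - t) * image.astype(np.float32) + t * haze
    return fogged.clip(0, 255).astype(np.uint8)
```

The other six operators (rain, snow, dark, bright, flaring, shadow) would follow the same signature: a clean image in, a severity scalar, a corrupted image out.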

Methodology

  1. Baseline dataset: A standard object‑detection benchmark (e.g., COCO‑like images) is used as the clean reference.
  2. Adverse‑condition generators: Each of the seven operators takes a clean image and a scalar intensity t (0 = no effect, 1 = maximum effect) and produces a weather‑altered version. The operators are based on well‑known graphics techniques (e.g., Perlin‑noise fog, motion‑blur rain streaks, illumination scaling).
  3. Progressive probing: For every test image, the intensity t is increased step‑wise until the detector’s output no longer meets a predefined IoU/score threshold. The smallest t that triggers failure is recorded as the first‑failure coefficient for that image.
  4. Aggregating results: The Average First Failure Coefficient (AFFC) is computed by averaging the per‑image failure coefficients across the whole benchmark, yielding a single robustness number per model‑condition pair.
  5. Robustness‑enhanced training: Models are retrained with a mixture of clean and synthetically corrupted images. The same AFFC pipeline is then reapplied to assess gains or losses.
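Steps 3 and 4 above can be sketched in a few lines. The step size, the never‑fails convention, and the `detect_ok` callback are assumptions for illustration, not the authors' exact protocol:

```python
def first_failure(detect_ok, steps: int = 10) -> float:
    """Smallest intensity t (on a grid of `steps` levels) where detection fails.

    detect_ok(t) should corrupt the image at intensity t, run the detector,
    and return True while outputs still meet the IoU/score threshold.
    Returns 1.0 if the detector never fails (a convention assumed here).
    """
    for i in range(1, steps + 1):
        t = i / steps
        if not detect_ok(t):
            return t
    return 1.0

def affc(failure_coefficients) -> float:
    """Average First Failure Coefficient over a benchmark of images."""
    coeffs = list(failure_coefficients)
    return sum(coeffs) / len(coeffs)
```

A higher AFFC means the model tolerates more severe corruption before its first failure, which is why it reads as a robustness score.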

Results & Findings

| Model | Overall AFFC (average over 7 conditions) | Best-case condition | Worst-case condition |
| --- | --- | --- | --- |
| Faster R-CNN | 71.9 % | Fog (≈78 %) | Bright glare (≈65 %) |
| Detectron2 | 68 % | Snow (≈73 %) | Dark (≈60 %) |
| YOLOv5s | 43 % | Rain (≈48 %) | Shadow (≈38 %) |
| YOLOv11s | 42 % | Fog (≈46 %) | Dark (≈35 %) |
  • Faster R-CNN consistently tolerates higher severity before failing, making it the most robust among the tested detectors.
  • YOLO family models degrade earlier, especially under low‑light and high‑contrast lighting (dark, flaring, shadow).
  • Adding synthetic adverse‑weather images to the training set lifts AFFC by ~10–15 % for most models, but beyond a certain augmentation ratio the improvement plateaus and can even reverse (forgetting of clean‑scene performance).
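The training‑mix finding can be sketched with a single augmentation‑ratio knob. The function name and sampling scheme below are illustrative assumptions, not the authors' exact recipe:

```python
import random

def make_training_sample(image, operators, aug_ratio: float = 0.5, rng=random):
    """With probability aug_ratio, corrupt `image` with a random operator.

    operators: callables op(image, t) taking an intensity t in [0, 1].
    The paper reports roughly 10-15 % AFFC gains from such mixing, with a
    plateau (and eventual clean-scene forgetting) as the ratio grows, so
    aug_ratio would need tuning rather than being pushed toward 1.0.
    """
    if rng.random() < aug_ratio:
        op = rng.choice(operators)
        return op(image, rng.random())
    return image
```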

Practical Implications

  • Safety‑by‑design thresholds: AV manufacturers can use AFFC to define operational design domains (ODDs). For example, a vehicle equipped with Faster R-CNN could be certified to operate safely up to fog density ≈ 0.7 (on the authors’ scale).
  • Model selection for edge devices: Developers targeting low‑power hardware may favor Faster R-CNN or Detectron2 when robustness is critical, despite their higher compute cost, whereas YOLO variants remain attractive for speed‑first applications with supplemental sensor fusion (e.g., radar).
  • Data‑augmentation pipelines: The seven operators can be plugged into existing training workflows (PyTorch, TensorFlow) to produce “weather‑hardened” models without collecting costly real‑world rainy/snowy data.
  • Continuous validation: AFFC provides a lightweight regression test that can be run nightly on new model builds, catching robustness regressions early in the CI pipeline.
  • Regulatory reporting: The metric offers a quantifiable, repeatable figure that regulators could require in safety cases, similar to how crash‑test ratings are used for conventional vehicles.
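The continuous‑validation idea can be sketched as a simple nightly check. This is a hypothetical helper; the condition names and the tolerance are illustrative:

```python
def affc_regressions(current: dict, baseline: dict, tol: float = 0.02) -> list:
    """Return conditions whose AFFC dropped by more than `tol` vs. baseline.

    current/baseline map condition name -> AFFC score in [0, 1]; a
    condition missing from the new build counts as a full regression.
    """
    return [cond for cond, base in baseline.items()
            if base - current.get(cond, 0.0) > tol]
```

A CI job would fail the build whenever the returned list is non‑empty, flagging the regressed weather conditions by name.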

Limitations & Future Work

  • Synthetic vs. real weather: While the augmentations approximate physical effects, they may miss subtle sensor‑specific artifacts (e.g., lens flare, water droplets on the camera housing). Real‑world validation remains necessary.
  • Single‑sensor focus: The study evaluates only camera‑based detection; extending the framework to lidar, radar, or multimodal fusion would broaden its applicability.
  • Static intensity scaling: The current pipeline treats intensity as a scalar; future work could model spatiotemporal dynamics (e.g., moving rain streaks) and evaluate video‑based detectors.
  • Forgetting mitigation: The observed degradation when over‑training on adverse conditions suggests a need for smarter curriculum or regularization strategies to preserve clean‑scene performance.

Bottom line: By turning “bad weather” into a controllable test knob, this research gives AV engineers a practical yardstick—AFFC—to compare, tune, and certify object‑detection models for the real world’s messier conditions.

Authors

  • Fox Pettersen
  • Hong Zhu

Paper Information

  • arXiv ID: 2602.12902v1
  • Categories: cs.CV, cs.AI, cs.LG, cs.SE
  • Published: February 13, 2026
  • PDF: Download PDF
