[Paper] Robustness of Object Detection of Autonomous Vehicles in Adverse Weather Conditions

Published: February 13, 2026 at 08:02 AM EST
4 min read
Source: arXiv

Overview

The paper introduces a systematic way to gauge how well autonomous‑vehicle (AV) object‑detection models survive harsh weather and lighting. By synthetically “storming” benchmark images with controllable fog, rain, snow, darkness, glare, etc., the authors pinpoint the exact intensity at which a detector first breaks down, offering a clear, quantitative robustness score.

Key Contributions

  • Average First Failure Coefficient (AFFC) metric: A novel, easy‑to‑interpret measure that captures the average intensity level at which a model first fails across a set of images.
  • Synthetic adverse‑condition pipeline: Seven parametric data‑augmentation operators (fog, rain, snow, dark, bright, flaring, shadow) that can be tuned to any severity level.
  • Comprehensive benchmark: Evaluation of four popular detectors (YOLOv5s, YOLOv11s, Faster R-CNN, Detectron2) across all seven conditions.
  • Training‑for‑robustness study: Demonstrates that augmenting the training set with synthetic adverse weather improves robustness, but also reveals diminishing returns and catastrophic forgetting when over‑trained.
  • Open‑source implementation: The authors release code and augmentation recipes, enabling reproducible robustness testing for any detection model.
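The paper's operators are not reproduced in this summary; as a minimal sketch, a fog operator parameterized by a scalar intensity t might be written as a simple alpha blend toward a haze color (the authors' version uses Perlin‑noise fog, so this is an illustrative simplification, not their implementation):

```python
import numpy as np

def fog(image: np.ndarray, t: float, haze: float = 255.0) -> np.ndarray:
    """Blend an image toward a uniform haze color.

    image: HxWxC uint8 array; t in [0, 1], where 0 = no effect
    and 1 = maximum effect, matching the paper's intensity scale.
    (Illustrative only -- the paper uses Perlin-noise fog.)
    """
    t = float(np.clip(t, 0.0, 1.0))
    fogged = (1.0 - t) * image.astype(np.float32) + t * haze
    return fogged.clip(0, 255).astype(np.uint8)
```

The other six operators (rain, snow, dark, bright, flaring, shadow) would follow the same signature: a clean image in, a severity scalar, a corrupted image out.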

Methodology

  1. Baseline dataset: A standard object‑detection benchmark (e.g., COCO‑like images) is used as the clean reference.
  2. Adverse‑condition generators: Each of the seven operators takes a clean image and a scalar intensity t (0 = no effect, 1 = maximum effect) and produces a weather‑altered version. The operators are based on well‑known graphics techniques (e.g., Perlin‑noise fog, motion‑blur rain streaks, illumination scaling).
  3. Progressive probing: For every test image, the intensity t is increased step‑wise until the detector’s output no longer meets a predefined IoU/score threshold. The smallest t that triggers failure is recorded as the first‑failure coefficient for that image.
  4. Aggregating results: The Average First Failure Coefficient (AFFC) is computed by averaging the per‑image failure coefficients across the whole benchmark, yielding a single robustness number per model‑condition pair.
  5. Robustness‑enhanced training: Models are retrained with a mixture of clean and synthetically corrupted images. The same AFFC pipeline is then reapplied to assess gains or losses.
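Steps 3 and 4 above can be sketched in a few lines. The step size, the never‑fails convention, and the `detect_ok` callback are assumptions for illustration, not the authors' exact protocol:

```python
def first_failure(detect_ok, steps: int = 10) -> float:
    """Smallest intensity t (on a grid of `steps` levels) where detection fails.

    detect_ok(t) should corrupt the image at intensity t, run the detector,
    and return True while outputs still meet the IoU/score threshold.
    Returns 1.0 if the detector never fails (a convention assumed here).
    """
    for i in range(1, steps + 1):
        t = i / steps
        if not detect_ok(t):
            return t
    return 1.0

def affc(failure_coefficients) -> float:
    """Average First Failure Coefficient over a benchmark of images."""
    coeffs = list(failure_coefficients)
    return sum(coeffs) / len(coeffs)
```

A higher AFFC means the model tolerates more severe corruption before its first failure, which is why it reads as a robustness score.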

Results & Findings

| Model | Overall AFFC (average over 7 conditions) | Best-case condition | Worst-case condition |
| --- | --- | --- | --- |
| Faster R-CNN | 71.9 % | Fog (≈78 %) | Bright glare (≈65 %) |
| Detectron2 | 68 % | Snow (≈73 %) | Dark (≈60 %) |
| YOLOv5s | 43 % | Rain (≈48 %) | Shadow (≈38 %) |
| YOLOv11s | 42 % | Fog (≈46 %) | Dark (≈35 %) |
  • Faster R-CNN consistently tolerates higher severity before failing, making it the most robust among the tested detectors.
  • YOLO family models degrade earlier, especially under low‑light and high‑contrast lighting (dark, flaring, shadow).
  • Adding synthetic adverse‑weather images to the training set lifts AFFC by ~10–15 % for most models, but beyond a certain augmentation ratio the improvement plateaus and can even reverse (forgetting of clean‑scene performance).
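The training‑mix finding can be sketched with a single augmentation‑ratio knob. The function name and sampling scheme below are illustrative assumptions, not the authors' exact recipe:

```python
import random

def make_training_sample(image, operators, aug_ratio: float = 0.5, rng=random):
    """With probability aug_ratio, corrupt `image` with a random operator.

    operators: callables op(image, t) taking an intensity t in [0, 1].
    The paper reports roughly 10-15 % AFFC gains from such mixing, with a
    plateau (and eventual clean-scene forgetting) as the ratio grows, so
    aug_ratio would need tuning rather than being pushed toward 1.0.
    """
    if rng.random() < aug_ratio:
        op = rng.choice(operators)
        return op(image, rng.random())
    return image
```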

Practical Implications

  • Safety‑by‑design thresholds: AV manufacturers can use AFFC to define operational design domains (ODDs). For example, a vehicle equipped with Faster R-CNN could be certified to operate safely up to fog density ≈ 0.7 (on the authors’ scale).
  • Model selection for edge devices: Developers targeting low‑power hardware may favor Faster R-CNN or Detectron2 when robustness is critical, despite their higher compute cost, whereas YOLO variants remain attractive for speed‑first applications with supplemental sensor fusion (e.g., radar).
  • Data‑augmentation pipelines: The seven operators can be plugged into existing training workflows (PyTorch, TensorFlow) to produce “weather‑hardened” models without collecting costly real‑world rainy/snowy data.
  • Continuous validation: AFFC provides a lightweight regression test that can be run nightly on new model builds, catching robustness regressions early in the CI pipeline.
  • Regulatory reporting: The metric offers a quantifiable, repeatable figure that regulators could require in safety cases, similar to how crash‑test ratings are used for conventional vehicles.
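The continuous‑validation idea can be sketched as a simple nightly check. This is a hypothetical helper; the condition names and the tolerance are illustrative:

```python
def affc_regressions(current: dict, baseline: dict, tol: float = 0.02) -> list:
    """Return conditions whose AFFC dropped by more than `tol` vs. baseline.

    current/baseline map condition name -> AFFC score in [0, 1]; a
    condition missing from the new build counts as a full regression.
    """
    return [cond for cond, base in baseline.items()
            if base - current.get(cond, 0.0) > tol]
```

A CI job would fail the build whenever the returned list is non‑empty, flagging the regressed weather conditions by name.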

Limitations & Future Work

  • Synthetic vs. real weather: While the augmentations approximate physical effects, they may miss subtle sensor‑specific artifacts (e.g., lens flare, water droplets on the camera housing). Real‑world validation remains necessary.
  • Single‑sensor focus: The study evaluates only camera‑based detection; extending the framework to lidar, radar, or multimodal fusion would broaden its applicability.
  • Static intensity scaling: The current pipeline treats intensity as a scalar; future work could model spatiotemporal dynamics (e.g., moving rain streaks) and evaluate video‑based detectors.
  • Forgetting mitigation: The observed degradation when over‑training on adverse conditions suggests a need for smarter curriculum or regularization strategies to preserve clean‑scene performance.

Bottom line: By turning “bad weather” into a controllable test knob, this research gives AV engineers a practical yardstick—AFFC—to compare, tune, and certify object‑detection models for the real world’s messier conditions.

Authors

  • Fox Pettersen
  • Hong Zhu

Paper Information

  • arXiv ID: 2602.12902v1
  • Categories: cs.CV, cs.AI, cs.LG, cs.SE
  • Published: February 13, 2026
  • PDF: Download PDF
