[Paper] A Systematic Mapping Study on the Debugging of Autonomous Driving Systems
Source: arXiv - 2601.04293v1
Overview
The paper presents the first systematic mapping study that surveys how researchers and engineers debug Autonomous Driving Systems (ADS). While testing has received considerable attention, the authors argue that debugging, i.e., pinpointing and fixing the root cause of a failure, is equally critical for safety‑critical deployments. By cataloguing 84 primary studies, the authors map the current landscape, identify recurring themes, and point out where the research community still has gaps.
Key Contributions
- Comprehensive taxonomy of ADS debugging approaches (e.g., simulation‑based, log‑analysis, formal methods, ML‑driven techniques).
- Bibliometric analysis showing publication trends, venues, and dominant research groups.
- Identification of research gaps, such as lack of standardized debugging pipelines, limited support for real‑time fault localisation, and scarce evaluation on production‑grade platforms.
- Recommendations for a unified terminology and problem definition to help future work speak a common language.
- Road‑map for future research, highlighting promising directions such as hybrid debugging, automated repair, and integrating debugging into continuous integration/continuous deployment (CI/CD) pipelines for ADS.
Methodology
The authors followed a classic systematic mapping protocol:
- Search Strategy – Querying major digital libraries (IEEE Xplore, ACM DL, Scopus, Web of Science) with keywords like “autonomous driving”, “debugging”, “fault localisation”.
- Inclusion/Exclusion Filtering – Keeping only peer‑reviewed works that explicitly address debugging (or fault localisation) of ADS, discarding pure testing or simulation‑only papers.
- Classification Scheme – Each primary study was coded along dimensions such as debugging technique, target component (perception, planning, control), evaluation environment (simulator, real vehicle), and automation level (manual vs. automated).
- Data Extraction & Synthesis – Aggregating the coded data to produce visual maps (heat‑maps, timelines) that reveal where research effort is concentrated.
The process is deliberately transparent, enabling other researchers to replicate or extend the mapping.
Results & Findings
- Dominant Techniques: Simulation‑based replay (42 % of studies) and log‑analysis tools (35 %) are the most common, while formal verification for debugging appears in only 8 % of papers (a minimal replay sketch follows this list).
- Targeted ADS Sub‑systems: Perception modules (camera/LiDAR processing) receive the bulk of debugging attention, while planning and control are under‑explored.
- Evaluation Settings: 71 % of works rely on synthetic simulators (CARLA, LGSVL), with only 19 % testing on real‑world vehicles or test tracks.
- Automation Gap: Only 12 % of approaches provide automated fault localisation; the rest require manual inspection of logs or visualisations.
- Fragmented Terminology: Authors use a wide range of terms—“fault localisation”, “error diagnosis”, “debugging”—often interchangeably, hindering cross‑study comparison.
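To illustrate the dominant technique, the following is a minimal sketch of simulation‑based replay using CARLA's recorder API, assuming a failing scenario was previously captured with `client.start_recorder(...)`; the file name and replay parameters are placeholders, and this is not the workflow of any specific primary study.

```python
# Sketch: re-run a recorded failing CARLA scenario so the failure can be
# observed and stepped through repeatedly. Assumes a CARLA server is running
# locally on port 2000 and "failing_scenario.log" was recorded earlier.
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)

# Inspect what the recording contains (actors, frames, duration).
print(client.show_recorder_file_info("failing_scenario.log", False))

# Replay the whole recording from the start (duration 0 means "until the end"),
# without attaching the camera to any particular actor (follow_id 0).
client.replay_file("failing_scenario.log", 0.0, 0.0, 0)
```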
Overall, the field shows promising early prototypes but lacks a cohesive, industry‑ready debugging ecosystem.
Practical Implications
- Tooling Road‑Map for Developers: The taxonomy can guide teams in selecting or building debugging tools that match their stack (e.g., integrating simulation replay with ROS‑based logging).
- CI/CD Integration: Highlighted gaps suggest opportunities to embed automated fault localisation into continuous testing pipelines, reducing mean‑time‑to‑repair (MTTR) for ADS updates (a minimal CI hook sketch follows this list).
- Safety Certification: Standardised terminology and problem definitions can simplify evidence collection for regulatory bodies (e.g., ISO 26262, UNECE WP.29).
- Prioritising Investment: Companies can focus R&D on under‑served areas—planning‑module debugging, real‑vehicle fault localisation, and hybrid (simulation + real‑world) approaches—to gain a competitive edge.
- Cross‑Domain Learning: Techniques from other safety‑critical domains (e.g., aerospace fault injection) identified in the study could be adapted for ADS, accelerating maturity.
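As a rough illustration of the CI/CD opportunity, the sketch below shows what an automated fault‑localisation step after a failed simulation test could look like; the log format, module names, and scoring heuristic are assumptions made for illustration, not an existing tool or a method from the surveyed studies.

```python
# Sketch: after a failed simulation test in CI, compare per-module error counts
# in the failing run's log against a passing baseline and flag the module with
# the largest regression. The "module;level;message" log format is assumed.
from collections import Counter
from pathlib import Path

MODULES = ("perception", "planning", "control")

def error_counts(log_path: str) -> Counter:
    """Count ERROR entries per ADS module in a semicolon-separated log file."""
    counts: Counter = Counter()
    for line in Path(log_path).read_text().splitlines():
        parts = line.split(";", 2)
        if len(parts) == 3 and parts[1] == "ERROR" and parts[0] in MODULES:
            counts[parts[0]] += 1
    return counts

def localise(baseline_log: str, failing_log: str) -> str:
    """Return the module whose error count grew the most versus the baseline."""
    baseline, failing = error_counts(baseline_log), error_counts(failing_log)
    return max(MODULES, key=lambda m: failing[m] - baseline[m])

if __name__ == "__main__":
    # In a CI job this would point at artefacts of the last green run and the
    # current failing run, then annotate the build with the suspect module.
    print(f"Most suspicious module: {localise('baseline_run.log', 'failing_run.log')}")
```

A real pipeline would replace the count-based heuristic with whatever fault‑localisation technique the team adopts; the point of the sketch is only the shape of the hook: consume the failing run's artefacts, compare against a known‑good baseline, and surface a ranked suspect automatically.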
Limitations & Future Work
- Scope Restriction: The mapping only includes English‑language, peer‑reviewed papers up to 2023, possibly missing recent industry white‑papers or proprietary solutions.
- Depth vs. Breadth: Systematic mapping favours breadth; the study does not deeply evaluate the effectiveness of each debugging technique.
- Rapidly Evolving Landscape: New ML‑driven debugging tools (e.g., neural attention visualisers) are emerging faster than the literature can capture.
The authors call for:
- a shared benchmark suite for debugging ADS,
- open‑source repositories of debugging artefacts, and
- longitudinal studies that measure the impact of debugging interventions on safety metrics in real deployments.
Authors
- Nathan Shaw
- Sanjeetha Pennada
- Robert M. Hierons
- Donghwan Shin
Paper Information
- arXiv ID: 2601.04293v1
- Categories: cs.SE
- Published: January 7, 2026