[Paper] SafePlanner: Testing Safety of the Automated Driving System Plan Model
Source: arXiv - 2601.09171v1
Overview
SafePlanner is a new testing framework that automatically uncovers safety‑critical bugs in the planning component of Automated Driving Systems (ADS). By analyzing the planner’s source code to generate realistic driving scenarios and then fuzz‑testing the planner’s decisions, the authors expose hidden hazards in a production‑grade Level‑4 system (Baidu Apollo) that would be extremely hard to find with conventional simulation or on‑road testing.
Key Contributions
- Structural scenario generation – extracts feasible scene‑transition pairs directly from the planner’s control‑flow graph, guaranteeing that every generated test case respects the planner’s internal logic.
- Guided fuzzing of planner behavior – combines the extracted transitions with diverse non‑player‑character (NPC) vehicle actions and uses coverage‑aware fuzzing to explore the planner’s decision space efficiently (a data‑model sketch follows this list).
- Comprehensive coverage metrics – achieves 83.6 % function and 63.2 % decision coverage on the Plan model, far surpassing baseline random or naïve scenario generators.
- Real‑world bug discovery – identifies 520 hazardous behaviors in Baidu Apollo, distilled into 15 root‑cause categories; patches for four of them eliminate the bugs without side effects.
- Scalable test suite – automatically produces over 20 k test cases, demonstrating that systematic code‑driven scenario synthesis can scale to large, production‑grade planners.
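To make the scenario structure concrete, here is a minimal Python sketch of what a generated test case might look like: one scene transition from the planner’s control flow paired with a list of NPC behaviors. All class and field names here are hypothetical illustrations, not the paper’s actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical data model for one generated test case: a scene transition
# plus the NPC behaviors that accompany it. Names are illustrative only.

@dataclass
class NPCBehavior:
    action: str          # e.g. "cut_in", "hard_brake", "lane_change"
    speed_mps: float     # target speed in meters per second
    trigger_s: float     # simulation time at which the action fires

@dataclass
class TestCase:
    scene_from: str                       # e.g. "LANE_FOLLOW"
    scene_to: str                         # e.g. "LANE_CHANGE"
    npcs: list[NPCBehavior] = field(default_factory=list)

case = TestCase("LANE_FOLLOW", "LANE_CHANGE",
                [NPCBehavior("cut_in", 12.0, 3.5)])
```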
Methodology
- Static structural analysis – The framework parses the planner’s source code to build a scene‑transition graph that captures how the planner moves between high‑level driving contexts (e.g., “follow lane”, “merge”, “stop”).
- Feasible transition extraction – From this graph, SafePlanner extracts concrete scene‑transition pairs that are actually reachable given the planner’s hierarchical control flow (see the sketch directly after this list).
- Scenario composition – Each transition is paired with a set of NPC vehicle behaviors (speed profiles, lane changes, cut‑ins, etc.) to form a full driving scenario.
- Guided fuzzing – A coverage‑guided fuzzer mutates the NPC parameters and timing while monitoring the planner’s internal decision branches. The fuzzer prioritizes inputs that increase function or decision coverage, steering the search toward unexplored planner logic (a fuzz‑loop sketch follows the pipeline note below).
- Hazard detection – After each run, the system checks for safety violations (e.g., collisions, near‑misses, illegal maneuvers) using a lightweight runtime monitor. Detected violations are logged for manual root‑cause analysis.
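To make the transition‑extraction idea concrete, here is a minimal Python sketch, assuming scenes are graph nodes and edges come from the planner’s control flow: a breadth‑first walk from the initial scene yields every reachable transition pair. The scene names and graph below are placeholders, not Apollo’s actual stage machine.

```python
from collections import deque

# Placeholder scene-transition graph; in SafePlanner this structure is
# derived by static analysis of the planner's source code.
SCENE_GRAPH = {
    "LANE_FOLLOW": ["LANE_CHANGE", "STOP_SIGN"],
    "LANE_CHANGE": ["LANE_FOLLOW"],
    "STOP_SIGN":   ["CREEP", "LANE_FOLLOW"],
    "CREEP":       ["LANE_FOLLOW"],
}

def feasible_transitions(graph: dict, start: str) -> set[tuple[str, str]]:
    """Collect every (scene, next_scene) pair reachable from the start scene."""
    seen, pairs, queue = {start}, set(), deque([start])
    while queue:
        scene = queue.popleft()
        for nxt in graph.get(scene, []):
            pairs.add((scene, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return pairs

print(sorted(feasible_transitions(SCENE_GRAPH, "LANE_FOLLOW")))
```

Each extracted pair then seeds one scenario, so no test case asks the planner for a transition its control flow cannot perform.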
The whole pipeline runs in simulation, requiring only the planner’s code and a standard vehicle dynamics simulator.
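Under the same caveat, the fuzzing loop can be sketched as below: mutate NPC parameters, run the scenario, keep mutants that exercise new decision branches, and log any violation the monitor flags. The run_simulation callback, the parameter names, and the mutation scheme are all assumptions for illustration, not the authors’ implementation.

```python
import random

def mutate(npc_params: dict) -> dict:
    """Randomly perturb one NPC parameter (speed, cut-in timing, etc.)."""
    mutated = dict(npc_params)
    key = random.choice(list(mutated))
    mutated[key] *= random.uniform(0.8, 1.2)
    return mutated

def fuzz(transition, seed_params, run_simulation, budget=1000):
    """Coverage-guided loop: keep seeds that hit new branches, log hazards."""
    covered, hazards, corpus = set(), [], [seed_params]
    for _ in range(budget):
        params = mutate(random.choice(corpus))
        # run_simulation is assumed to drive the planner in the dynamics
        # simulator, returning the decision branches hit and any safety
        # violation (collision, near-miss, illegal maneuver) or None.
        branches, violation = run_simulation(transition, params)
        if violation:
            hazards.append((transition, params, violation))
        if not branches <= covered:   # new decision coverage -> keep this seed
            covered |= branches
            corpus.append(params)
    return hazards, covered
```

The corpus only grows when a mutant reaches unexplored planner logic, which is what steers the search toward the coverage levels reported below.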
Results & Findings
| Metric | SafePlanner | Random baseline |
|---|---|---|
| Test cases generated | 20 635 | 8 412 |
| Hazardous behaviors found | 520 | 127 |
| Function coverage | 83.6 % | 61.2 % |
| Decision coverage | 63.2 % | 38.5 % |
| Patches applied (validated) | 4 | – |
- The 520 hazards clustered into 15 distinct root causes, ranging from missing edge‑case checks in lane‑change logic to improper handling of sudden NPC decelerations.
- After fixing four of the most critical bugs, the planner exhibited zero regressions in the entire test suite, confirming that the patches did not introduce new issues.
- SafePlanner’s guided fuzzing required ≈30 % less CPU time than random testing to reach the same coverage levels.
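As a toy illustration of what the two coverage metrics measure, assuming instrumentation that records executed functions and decision (branch) outcomes: function coverage is the fraction of instrumented functions exercised, and decision coverage is the fraction of branch outcomes exercised, where each decision contributes a true and a false outcome. The trace data below are fabricated placeholders sized to reproduce the headline percentages.

```python
def coverage(executed: set, total: set) -> float:
    """Fraction of instrumented items exercised by the test suite."""
    return len(executed & total) / len(total)

# Made-up instrumentation universe: 500 functions, each with one decision
# that has a true and a false outcome (1 000 branch outcomes in total).
all_functions = {f"fn_{i}" for i in range(500)}
all_decisions = {(f"fn_{i}", outcome) for i in range(500) for outcome in "TF"}

# Fabricated traces sized to match the table: 418/500 and 632/1000.
hit_functions = set(list(all_functions)[:418])
hit_decisions = set(list(all_decisions)[:632])

print(f"function coverage: {coverage(hit_functions, all_functions):.1%}")
print(f"decision coverage: {coverage(hit_decisions, all_decisions):.1%}")
```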
Practical Implications
- Accelerated safety validation – Developers can integrate SafePlanner early in the CI pipeline to catch planner bugs before on‑road testing, dramatically reducing costly field trials.
- Regulatory compliance – The systematic coverage metrics provide concrete evidence for safety audits and can help satisfy standards such as ISO 26262 or UNECE R157.
- Rapid regression testing – When a planner is updated (e.g., new perception module or map data), SafePlanner can automatically regenerate relevant scenarios, ensuring that previous safety guarantees remain intact.
- Cross‑platform applicability – Because the approach relies on static code analysis and generic simulation APIs, it can be adapted to other ADS stacks whose planner source is available (e.g., Autoware) with modest effort.
- Developer productivity – By surfacing concrete root‑cause categories, the framework guides engineers to the most impactful code sections, focusing debugging effort where it matters most.
Limitations & Future Work
- Dependence on source code – SafePlanner requires access to the planner’s implementation; black‑box or proprietary planners cannot be analyzed directly.
- Simulation fidelity – While the generated scenarios are structurally valid, they still rely on the underlying vehicle dynamics simulator; mismatches with real‑world physics could hide certain bugs.
- Scalability of root‑cause analysis – Manual clustering of 520 hazards into 15 categories is labor‑intensive; future work could incorporate automated fault‑localization techniques.
- Extension to perception‑planning interaction – The current focus is solely on the planning module; integrating perception uncertainties would provide a more holistic safety assessment.
SafePlanner demonstrates that a code‑centric, coverage‑guided testing strategy can dramatically improve the safety assurance of modern automated driving planners, offering a practical toolset for developers aiming to ship reliable Level‑4+ systems.
Authors
- Dohyun Kim
- Sanggu Han
- Sangmin Woo
- Joonha Jang
- Jaehoon Kim
- Changhun Song
- Yongdae Kim
Paper Information
- arXiv ID: 2601.09171v1
- Categories: cs.SE
- Published: January 14, 2026