[Paper] Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving

Published: January 5, 2026 at 12:20 AM EST
4 min read
Source: arXiv - 2601.01800v1

Overview

This paper tackles a pressing problem for self‑driving cars that learn to act with reinforcement learning (RL): they are surprisingly fragile when tiny, adversarial perturbations appear in sensor data or control signals. The authors propose Criticality‑Aware Robust RL (CARRL), an adversarial‑training framework that explicitly focuses on the sparse moments when a safety‑critical failure (e.g., a collision) could happen, rather than treating every timestep as an equal target for attacks.

Key Contributions

  • General‑sum game formulation – Models the interaction between the attacker (Risk Exposure Adversary, REA) and the driver (Risk‑Targeted Robust Agent, RTRA) as a non‑zero‑sum game, capturing the asymmetry between a sparse attacker and a safety‑focused driver.
  • Risk‑exposure adversary (REA) – Introduces a decoupled optimization that concentrates the attack budget on the few timesteps that matter most for safety, efficiently exposing hidden failure modes.
  • Dual‑replay buffer for the defender – The RTRA learns from both benign experiences and the scarce adversarial ones, using a separate buffer for each to avoid over‑fitting to the limited attack data.
  • Policy‑consistency regularization – Enforces that the policy’s action distribution remains stable under small perturbations, which smooths learning and improves robustness.
  • Empirical gains – Across multiple autonomous‑driving benchmarks, CARRL cuts collision rates by ≥ 22.66 % compared with the strongest existing robust‑RL baselines.

Methodology

  1. Problem Setup – The driving environment is modeled as a Markov decision process (MDP). At each step the REA can add a bounded perturbation to the state (e.g., a sensor reading), subject to a limited “budget” on the total perturbation magnitude over an episode.
  2. General‑sum Game – Unlike classic adversarial RL that treats the attacker and agent as opponents in a zero‑sum game, CARRL lets the REA’s objective be only to provoke safety‑critical failures, while the RTRA simultaneously optimizes for safety and driving efficiency (speed, comfort).
  3. Risk Exposure Adversary (REA)
    • Decoupled optimization: First, a risk detector scans the trajectory to locate high‑criticality moments (e.g., approaching an intersection).
    • Focused perturbation: The REA then allocates its budget to those moments, solving a constrained optimization that maximizes the probability of a collision (a code sketch of this step follows the list).
  4. Risk‑Targeted Robust Agent (RTRA)
    • Dual replay buffers: One buffer stores normal (benign) transitions; the other stores the few REA‑generated adversarial transitions (a buffer sketch also follows the list).
    • Joint training: The agent samples from both buffers at each update, applying the standard RL loss (e.g., a PPO objective) to benign data and a robustness loss to adversarial data.
    • Policy consistency regularizer: A KL‑divergence term penalizes large changes between the policy’s action distribution on clean vs. perturbed states, encouraging smooth behavior.
  5. Training Loop – Episodes are generated with the REA active; after each episode the buffers are updated and the RTRA performs several gradient steps. The REA’s parameters are refreshed periodically to keep the attacks challenging.
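
To make the REA’s focused perturbation (step 3) concrete, here is a minimal PyTorch sketch of one way a criticality‑aware attacker could concentrate a bounded budget on the highest‑risk timesteps. Everything in it is an illustrative assumption rather than the paper’s implementation: the time‑to‑collision proxy, the top‑k selection, and the KL‑based attack objective stand in for the paper’s risk detector and collision‑probability objective.

```python
import torch
import torch.nn.functional as F

def criticality_score(ttc):
    """Hypothetical risk detector: shorter time-to-collision => higher criticality."""
    return 1.0 / (ttc + 1e-6)

def rea_perturb(policy, states, ttc, k=5, eps=0.05, steps=10, lr=0.01):
    """Concentrate the attack budget on the k most critical timesteps, then run a
    small projected-gradient attack (an assumed attack model) on those steps only."""
    scores = criticality_score(ttc)                       # (T,)
    critical_idx = torch.topk(scores, k=min(k, scores.numel())).indices

    mask = torch.zeros(states.shape[0], 1)                # (T, 1), broadcasts over state dims
    mask[critical_idx] = 1.0                              # budget goes to critical steps only

    clean_probs = policy(states).softmax(-1).detach()     # reference action distribution
    delta = torch.zeros_like(states, requires_grad=True)

    for _ in range(steps):
        adv_logp = policy(states + delta * mask).log_softmax(-1)
        # Illustrative objective: maximize divergence from clean behaviour at the
        # critical steps (a stand-in for the paper's collision-probability objective).
        loss = -F.kl_div(adv_logp, clean_probs, reduction="batchmean")
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()               # gradient step on the perturbation
            delta.clamp_(-eps, eps)                       # per-step perturbation bound
        delta.grad.zero_()

    return (states + delta * mask).detach()               # attacked trajectory observations
```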
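
Step 4’s dual replay buffer can likewise be sketched as two independent stores with a mixed sampling rule, so the scarce adversarial transitions are neither drowned out by benign data nor over‑sampled to the point of over‑fitting. The class interface and the fixed adversarial fraction below are assumptions for illustration, not the paper’s exact design.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Separate stores for benign and REA-attacked transitions (illustrative sketch)."""

    def __init__(self, capacity=100_000, adv_fraction=0.25):
        self.benign = deque(maxlen=capacity)
        self.adversarial = deque(maxlen=capacity)
        self.adv_fraction = adv_fraction          # assumed per-batch mixing ratio

    def add(self, transition, attacked: bool):
        (self.adversarial if attacked else self.benign).append(transition)

    def sample(self, batch_size):
        """Draw a fixed share of adversarial transitions, capped by availability."""
        n_adv = min(int(batch_size * self.adv_fraction), len(self.adversarial))
        n_ben = min(batch_size - n_adv, len(self.benign))
        adv = random.sample(list(self.adversarial), n_adv)
        ben = random.sample(list(self.benign), n_ben)
        return ben, adv    # RL loss applies to `ben`, robustness loss to `adv`
```

In use, transitions collected while the REA was active would be added with `attacked=True`, and each RTRA update would call `sample` once so that both losses in step 4 see data at every gradient step.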

Results & Findings

Benchmark             Baseline collision rate (e.g., PPO‑AT)   CARRL collision rate   Relative reduction
Urban‑Intersection    12.4 %                                    9.5 %                  23.4 %
Highway‑Merging        8.1 %                                    6.2 %                  23.5 %
Mixed‑Traffic         15.7 %                                   12.1 %                  22.9 %

  • Sparse attacks are more damaging: Even with a tiny perturbation budget, the REA can provoke collisions that a continuous‑attack baseline misses.
  • Dual‑buffer learning mitigates data scarcity: The RTRA maintains high sample efficiency, achieving comparable or better overall driving performance (speed, lane‑keeping) while being safer.
  • Policy consistency stabilizes training: Ablation removing the KL regularizer leads to oscillating collision rates and slower convergence.

Practical Implications

  • Safer simulation‑to‑real transfer – By explicitly training on the rare “edge‑case” scenarios that cause crashes, developers can reduce the safety gap when moving RL policies from simulators to real vehicles.
  • Budget‑aware adversarial testing – The REA’s budget‑constrained attack mirrors real‑world sensor glitches (e.g., brief occlusions), offering a more realistic stress‑test suite for autonomous‑driving stacks.
  • Plug‑and‑play robustness module – CARRL’s components (risk detector, dual replay buffers, consistency loss) can be integrated into existing RL pipelines (PPO, SAC, etc.) with minimal code changes; a minimal integration sketch follows this list.
  • Regulatory relevance – Demonstrating a quantified reduction in collision probability under adversarial conditions could help satisfy safety standards and provide evidence for certification bodies.
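
To illustrate the plug‑and‑play point above, a policy‑consistency regularizer of the kind the paper describes can be appended to an existing actor loss as a single extra term. The sketch below assumes a discrete‑action PyTorch policy that returns logits; the weight `beta`, the direction of the KL term, and the `robust_policy_loss` wrapper are illustrative choices, not values taken from the paper.

```python
import torch.nn.functional as F

def consistency_regularizer(policy, clean_obs, perturbed_obs):
    """KL between the policy's action distributions on clean vs. perturbed states."""
    clean_probs = F.softmax(policy(clean_obs), dim=-1).detach()
    pert_logp = F.log_softmax(policy(perturbed_obs), dim=-1)
    # kl_div takes log-probs as input and probs as target: KL(clean || perturbed).
    return F.kl_div(pert_logp, clean_probs, reduction="batchmean")

def robust_policy_loss(ppo_loss, policy, clean_obs, perturbed_obs, beta=0.1):
    """Existing actor loss plus the consistency term, with an assumed weight `beta`."""
    return ppo_loss + beta * consistency_regularizer(policy, clean_obs, perturbed_obs)
```

In a real pipeline the perturbed observations would be the REA‑attacked versions of the same minibatch of states drawn from the adversarial buffer.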

Limitations & Future Work

  • Risk detector reliance – The current REA depends on a handcrafted heuristic to locate high‑criticality timesteps; learning this detector end‑to‑end could improve adaptability to novel scenarios.
  • Scalability to high‑dimensional perception – Experiments use relatively low‑dimensional state representations; extending CARRL to raw camera/LiDAR inputs may require more sophisticated attack models.
  • Limited adversarial budget models – Only a simple ℓ₂‑norm budget is explored; future work could examine more realistic constraints such as sensor dropout patterns or communication delays.
  • Multi‑agent traffic – The framework assumes a single autonomous vehicle; incorporating interactions with other learning agents (e.g., platooning) is an open direction.

Bottom line: CARRL shows that focusing adversarial training on the few moments that truly matter for safety can yield a measurable boost in collision avoidance without sacrificing driving performance—an insight that developers building robust autonomous‑driving systems can start applying today.

Authors

  • Qi Wei
  • Junchao Fan
  • Zhao Yang
  • Jianhua Wang
  • Jingkai Mao
  • Xiaolin Chang

Paper Information

  • arXiv ID: 2601.01800v1
  • Categories: cs.LG, cs.AI
  • Published: January 5, 2026