[Paper] IRL-DAL: Safe and Adaptive Trajectory Planning for Autonomous Driving via Energy-Guided Diffusion Models

Published: January 30, 2026 at 01:34 PM EST
4 min read
Source: arXiv


Overview

The paper introduces IRL‑DAL, a new framework that blends inverse reinforcement learning (IRL), diffusion models, and adaptive perception to produce a safe, high‑performance trajectory planner for autonomous vehicles. By starting from an expert finite‑state‑machine (FSM) controller and then fine‑tuning with reinforcement learning, the authors achieve near‑human driving quality while dramatically cutting collision rates.

Key Contributions

  • Hybrid IRL‑RL training pipeline – combines expert imitation, an IRL discriminator, and a diffusion‑based safety supervisor.
  • Conditional diffusion model as a safety supervisor – generates lane‑keeping, obstacle‑avoidance trajectories that respect smoothness constraints.
  • Learnable Adaptive Mask (LAM) – a perception module that dynamically shifts visual attention based on vehicle speed and nearby hazards.
  • Two‑stage curriculum in Webots – first train on simple scenarios, then progressively harder traffic situations, leading to a 96 % success rate.
  • State‑of‑the‑art safety metrics – collisions reduced to 0.05 per 1 k simulation steps, setting a new benchmark for safe autonomous navigation.
  • Open‑source release – code and trained models are publicly available for reproducibility and downstream research.

Methodology

  1. Imitation Pre‑training – The policy network is first trained to mimic an expert FSM controller, providing a stable baseline that already respects traffic rules.
  2. IRL Discriminator Integration – An IRL discriminator evaluates how well the agent’s behavior matches expert intentions, feeding a reward signal that nudges the policy toward expert‑like decisions.
  3. Hybrid Reward RL (PPO) – Proximal Policy Optimization is used with a composite reward:
    • Environmental feedback (e.g., lane deviation, speed limits) supplied by a conditional diffusion model that predicts safe trajectories.
    • IRL reward from the discriminator that captures higher‑level goals such as courteous merging.
  4. Conditional Diffusion Safety Supervisor – Trained on safe trajectory data, the diffusion model generates candidate paths conditioned on the current scene; the planner selects the one that best satisfies safety constraints.
  5. Learnable Adaptive Mask (LAM) – A lightweight attention mask applied to the perception pipeline; its parameters are learned jointly with the policy, allowing the system to focus on relevant visual cues (e.g., a pedestrian crossing when the car is slow).
  6. Curriculum Learning – Training proceeds from simple straight‑road scenarios to complex urban intersections, ensuring the agent gradually acquires robust behaviors.
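The hybrid reward in step 3 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the weights `w_env` and `w_irl`, the specific environmental penalties, and the GAIL-style discriminator reward are all assumptions made for the example.

```python
import numpy as np

def discriminator_reward(logit: float) -> float:
    """GAIL-style IRL reward from a discriminator logit: -log(1 - D(s, a)),
    where D is the estimated probability the transition came from the expert.
    Higher when the agent's behavior looks expert-like."""
    d = 1.0 / (1.0 + np.exp(-logit))        # sigmoid of the logit
    return float(-np.log(1.0 - d + 1e-8))   # epsilon avoids log(0)

def environment_reward(lane_deviation_m: float, speed_over_limit: float) -> float:
    """Toy environmental feedback: penalize lane deviation and speeding.
    In the paper this signal is supplied via the diffusion safety supervisor."""
    return -1.0 * abs(lane_deviation_m) - 0.5 * max(0.0, speed_over_limit)

def composite_reward(lane_deviation_m: float, speed_over_limit: float,
                     disc_logit: float, w_env: float = 1.0,
                     w_irl: float = 0.5) -> float:
    """Weighted sum fed to PPO during fine-tuning (weights are illustrative)."""
    return (w_env * environment_reward(lane_deviation_m, speed_over_limit)
            + w_irl * discriminator_reward(disc_logit))
```

In a PPO loop this scalar would simply replace the raw environment reward at each step, so any standard implementation can consume it unchanged.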

Results & Findings

| Metric | Value |
| --- | --- |
| Success rate (complete episode without infractions) | 96 % |
| Collisions per 1 k simulation steps | 0.05 |
| Lane‑keeping deviation (average) | 0.12 m |
| Smoothness (jerk) | Reduced by 38 % vs. baseline PPO |

The diffusion‑based supervisor proved especially effective at preventing “last‑minute” lane changes, while LAM improved perception accuracy in cluttered scenes by ~12 % compared to a static camera mask. Ablation studies showed that removing either the IRL discriminator or the diffusion supervisor caused success rates to drop below 80 %.

Practical Implications

  • Safer Deployment in Real‑World AVs – The hybrid reward structure can be plugged into existing RL pipelines to inherit expert knowledge while still allowing continuous improvement.
  • Modular Safety Layer – The diffusion supervisor acts as a drop‑in safety filter that can be used with any downstream planner, offering a principled way to enforce lane‑keeping and obstacle avoidance without hand‑crafted rules.
  • Adaptive Perception for Edge Cases – LAM’s speed‑aware attention can be integrated into camera‑based perception stacks, helping AVs allocate compute where it matters most (e.g., focusing on crosswalks when slowing down).
  • Curriculum‑Based Training Framework – The two‑stage curriculum is readily transferable to other simulators (CARLA, LGSVL), accelerating the development of robust policies for diverse traffic environments.
  • Open‑Source Baseline – Researchers and engineers can benchmark new planning algorithms against IRL‑DAL’s publicly released code, fostering faster iteration in the community.

Limitations & Future Work

  • Simulation‑Only Validation – All experiments were performed in the Webots simulator; real‑world transferability remains to be demonstrated.
  • Computational Overhead of Diffusion Sampling – Generating safety trajectories on‑the‑fly adds latency; future work could explore lightweight diffusion approximations or caching strategies.
  • Limited Sensor Modalities – The current setup uses only visual input; extending LAM to fuse LiDAR/radar could improve robustness under adverse weather.
  • Scalability of IRL Discriminator – As traffic scenarios become more complex, the discriminator may need richer state representations to capture nuanced expert intents.

The authors suggest tackling these points by integrating domain‑randomization for sim‑to‑real transfer, optimizing diffusion inference, and expanding the perception stack to multimodal sensors.

Authors

  • Seyed Ahmad Hosseini Miangoleh
  • Amin Jalal Aghdasian
  • Farzaneh Abdollahi

Paper Information

  • arXiv ID: 2601.23266v1
  • Categories: cs.RO, cs.AI
  • Published: January 30, 2026
