[Paper] IRL-DAL: Safe and Adaptive Trajectory Planning for Autonomous Driving via Energy-Guided Diffusion Models

Published: January 30, 2026 at 01:34 PM EST
4 min read
Source: arXiv


Overview

The paper introduces IRL‑DAL, a new framework that blends inverse reinforcement learning (IRL), diffusion models, and adaptive perception to produce a safe, high‑performance trajectory planner for autonomous vehicles. By starting from an expert finite‑state‑machine (FSM) controller and then fine‑tuning with reinforcement learning, the authors achieve near‑human driving quality while dramatically cutting collision rates.

Key Contributions

  • Hybrid IRL‑RL training pipeline – combines expert imitation, an IRL discriminator, and a diffusion‑based safety supervisor.
  • Conditional diffusion model as a safety supervisor – generates lane‑keeping, obstacle‑avoidance trajectories that respect smoothness constraints.
  • Learnable Adaptive Mask (LAM) – a perception module that dynamically shifts visual attention based on vehicle speed and nearby hazards.
  • Two‑stage curriculum in Webots – first train on simple scenarios, then progressively harder traffic situations, leading to a 96 % success rate.
  • State‑of‑the‑art safety metrics – collisions reduced to 0.05 per 1 k simulation steps, setting a new benchmark for safe autonomous navigation.
  • Open‑source release – code and trained models are publicly available for reproducibility and downstream research.

Methodology

  1. Imitation Pre‑training – The policy network is first trained to mimic an expert FSM controller, providing a stable baseline that already respects traffic rules.
  2. IRL Discriminator Integration – An IRL discriminator evaluates how well the agent’s behavior matches expert intentions, feeding a reward signal that nudges the policy toward expert‑like decisions.
  3. Hybrid Reward RL (PPO) – Proximal Policy Optimization is used with a composite reward:
    • Environmental feedback (e.g., lane deviation, speed limits) supplied by a conditional diffusion model that predicts safe trajectories.
    • IRL reward from the discriminator that captures higher‑level goals such as courteous merging.
  4. Conditional Diffusion Safety Supervisor – Trained on safe trajectory data, the diffusion model generates candidate paths conditioned on the current scene; the planner selects the one that best satisfies safety constraints.
  5. Learnable Adaptive Mask (LAM) – A lightweight attention mask applied to the perception pipeline; its parameters are learned jointly with the policy, allowing the system to focus on relevant visual cues (e.g., a pedestrian crossing when the car is slow).
  6. Curriculum Learning – Training proceeds from simple straight‑road scenarios to complex urban intersections, ensuring the agent gradually acquires robust behaviors.
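The hybrid reward in step 3 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the weights `w_env` and `w_irl`, the specific environmental penalties, and the GAIL-style discriminator reward are all assumptions made for the example.

```python
import numpy as np

def discriminator_reward(logit: float) -> float:
    """GAIL-style IRL reward from a discriminator logit: -log(1 - D(s, a)),
    where D is the estimated probability the transition came from the expert.
    Higher when the agent's behavior looks expert-like."""
    d = 1.0 / (1.0 + np.exp(-logit))        # sigmoid of the logit
    return float(-np.log(1.0 - d + 1e-8))   # epsilon avoids log(0)

def environment_reward(lane_deviation_m: float, speed_over_limit: float) -> float:
    """Toy environmental feedback: penalize lane deviation and speeding.
    In the paper this signal is supplied via the diffusion safety supervisor."""
    return -1.0 * abs(lane_deviation_m) - 0.5 * max(0.0, speed_over_limit)

def composite_reward(lane_deviation_m: float, speed_over_limit: float,
                     disc_logit: float, w_env: float = 1.0,
                     w_irl: float = 0.5) -> float:
    """Weighted sum fed to PPO during fine-tuning (weights are illustrative)."""
    return (w_env * environment_reward(lane_deviation_m, speed_over_limit)
            + w_irl * discriminator_reward(disc_logit))
```

In a PPO loop this scalar would simply replace the raw environment reward at each step, so any standard implementation can consume it unchanged.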

Results & Findings

| Metric | Value |
| --- | --- |
| Success rate (complete episode without infractions) | 96 % |
| Collisions per 1 k simulation steps | 0.05 |
| Lane‑keeping deviation (average) | 0.12 m |
| Smoothness (jerk) | Reduced by 38 % vs. baseline PPO |

The diffusion‑based supervisor proved especially effective at preventing “last‑minute” lane changes, while LAM improved perception accuracy in cluttered scenes by ~12 % compared to a static camera mask. Ablation studies showed that removing either the IRL discriminator or the diffusion supervisor caused success rates to drop below 80 %.

Practical Implications

  • Safer Deployment in Real‑World AVs – The hybrid reward structure can be plugged into existing RL pipelines to inherit expert knowledge while still allowing continuous improvement.
  • Modular Safety Layer – The diffusion supervisor acts as a drop‑in safety filter that can be used with any downstream planner, offering a principled way to enforce lane‑keeping and obstacle avoidance without hand‑crafted rules.
  • Adaptive Perception for Edge Cases – LAM’s speed‑aware attention can be integrated into camera‑based perception stacks, helping AVs allocate compute where it matters most (e.g., focusing on crosswalks when slowing down).
  • Curriculum‑Based Training Framework – The two‑stage curriculum is readily transferable to other simulators (CARLA, LGSVL), accelerating the development of robust policies for diverse traffic environments.
  • Open‑Source Baseline – Researchers and engineers can benchmark new planning algorithms against IRL‑DAL’s publicly released code, fostering faster iteration in the community.

Limitations & Future Work

  • Simulation‑Only Validation – All experiments were performed in the Webots simulator; real‑world transferability remains to be demonstrated.
  • Computational Overhead of Diffusion Sampling – Generating safety trajectories on‑the‑fly adds latency; future work could explore lightweight diffusion approximations or caching strategies.
  • Limited Sensor Modalities – The current setup uses only visual input; extending LAM to fuse LiDAR/radar could improve robustness under adverse weather.
  • Scalability of IRL Discriminator – As traffic scenarios become more complex, the discriminator may need richer state representations to capture nuanced expert intents.

The authors suggest tackling these points by integrating domain‑randomization for sim‑to‑real transfer, optimizing diffusion inference, and expanding the perception stack to multimodal sensors.

Authors

  • Seyed Ahmad Hosseini Miangoleh
  • Amin Jalal Aghdasian
  • Farzaneh Abdollahi

Paper Information

  • arXiv ID: 2601.23266v1
  • Categories: cs.RO, cs.AI
  • Published: January 30, 2026
