[Paper] LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving

Published: December 23, 2025 at 01:07 PM EST
4 min read

Source: arXiv - 2512.20563v1

Overview

The paper LEAD investigates why imitation‑learning (IL) agents trained in high‑fidelity simulators still stumble when they have to drive autonomously. The authors pinpoint a fundamental “learner‑expert asymmetry”: the expert driver in the simulator enjoys privileged information (perfect visibility, knowledge of other agents’ intents) that the sensor‑limited student never sees. By narrowing this information gap, they push end‑to‑end driving performance to new heights on the CARLA benchmark and even improve real‑world vision‑based driving tests.

Key Contributions

  • Empirical analysis of learner‑expert asymmetry – quantifies how the expert’s perfect perception and low uncertainty hurt IL when the student only has raw camera/LiDAR data.
  • Practical interventions to reduce the asymmetry, including:
    • Adding realistic occlusion handling for the expert.
    • Providing the student with richer navigational cues (beyond a single target point).
    • Aligning uncertainty modeling between expert and student.
  • TransFuser v6 (TFv6) – a revised end‑to‑end architecture that incorporates the above fixes and achieves state‑of‑the‑art closed‑loop scores on all major CARLA benchmarks (e.g., 95 DS on Bench2Drive, >2× prior scores on Longest6 v2 and Town13).
  • Cross‑domain validation – integrates the same perception supervision into a sim‑to‑real pipeline, yielding consistent gains on NAVSIM and Waymo Vision‑Based End‑to‑End driving challenges.
  • Open‑source release – code, data, and pretrained models are publicly available, encouraging reproducibility and further research.

Methodology

  1. Diagnosing the asymmetry

    • The authors compare the expert’s observation space (full 3‑D map, perfect detection of other agents) with the student’s sensor suite (front camera and LiDAR with a limited field of view).
    • They measure performance drops when the expert’s “privilege” is removed (e.g., artificially occluding the expert’s view).
  2. Bridging the gap

    • Perception alignment: augment the expert’s data with realistic sensor noise and occlusions, making its demonstrations more representative of what the student will see (a minimal sketch follows this list).
    • Intent specification: feed the student a short‑term waypoint trajectory derived from the navigation graph instead of a single target point (see the waypoint‑resampling sketch after this list).
    • Uncertainty modeling: train both expert and student to predict a distribution over future actions, encouraging the student to cope with ambiguous situations.
  3. Model architecture (TFv6)

    • Builds on the TransFuser backbone (multi‑modal transformer that fuses camera, LiDAR, and map inputs).
    • Adds a navigation encoder for the waypoint sequence and a confidence head that outputs action uncertainty.
    • Trains with a combined loss: imitation loss on expert actions + perception loss (segmentation, depth) + uncertainty regularization (see the loss sketch after this list).
  4. Evaluation pipeline

    • Closed‑loop driving tests in CARLA (Bench2Drive, Longest6 v2, Town13).
    • Sim‑to‑real transfer experiments on NAVSIM and Waymo Vision‑Based benchmarks, using the same perception‑supervised weights.
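To make the perception‑alignment step concrete, below is a minimal sketch of how privileged expert observations could be perturbed before collecting demonstrations. The function name, sensor range, and noise parameters are illustrative assumptions, not the paper’s implementation.

```python
# Toy illustration of "de-privileging" the expert: before rolling out the
# expert to collect demonstrations, perturb its privileged observations so
# they look more like what a sensor-limited student would perceive.
# All names and parameters here are illustrative assumptions.
import numpy as np

def deprivilege_observations(agents, ego_xy, max_range=40.0,
                             drop_prob=0.2, pos_noise=0.3, rng=None):
    """agents: list of dicts with ground-truth 'xy' positions (meters)."""
    rng = rng or np.random.default_rng()
    noisy = []
    for agent in agents:
        dist = np.linalg.norm(np.asarray(agent["xy"]) - np.asarray(ego_xy))
        if dist > max_range:          # beyond plausible sensor range
            continue
        if rng.random() < drop_prob:  # crude stand-in for occlusion
            continue
        jittered = dict(agent)
        jittered["xy"] = np.asarray(agent["xy"]) + rng.normal(0, pos_noise, 2)
        noisy.append(jittered)
    return noisy
```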
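The intent‑specification change (a short waypoint horizon instead of a single target point) can likewise be sketched as a small resampling step along the planned route; the helper below and its spacing parameters are assumptions for illustration only.

```python
# Illustrative helper: instead of a single distant goal, give the student the
# next few route waypoints, resampled at a fixed spacing along the route.
import numpy as np

def upcoming_waypoints(route_xy, ego_xy, n_points=8, spacing=2.0):
    """route_xy: (N, 2) dense polyline of the planned route."""
    route_xy = np.asarray(route_xy, dtype=float)
    dists = np.linalg.norm(route_xy - np.asarray(ego_xy), axis=1)
    start = int(np.argmin(dists))              # closest route point to the ego
    ahead = route_xy[start:]
    # Cumulative arc length along the remaining route.
    seg = np.linalg.norm(np.diff(ahead, axis=0), axis=1)
    arclen = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.arange(1, n_points + 1) * spacing
    xs = np.interp(targets, arclen, ahead[:, 0])
    ys = np.interp(targets, arclen, ahead[:, 1])
    return np.stack([xs, ys], axis=1)          # (n_points, 2) short horizon
```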
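Finally, here is a rough sketch of the combined training objective, assuming a PyTorch‑style model with waypoint, segmentation, depth, and log‑variance outputs. The head names, tensor shapes, and loss weights are hypothetical; the paper’s exact formulation may differ.

```python
# Minimal sketch of the combined training objective described above.
# Shapes, weights, and head names are illustrative assumptions, not the
# authors' exact implementation.
import torch
import torch.nn.functional as F

def combined_loss(pred, target, w_perc=1.0, w_unc=0.01):
    """pred/target are dicts produced by a TransFuser-style model.

    pred["wp_mean"], pred["wp_logvar"]: (B, T, 2) future waypoints.
    pred["seg_logits"]: (B, C, H, W); pred["depth"]: (B, 1, H, W).
    """
    # Heteroscedastic Gaussian NLL: imitation loss that also trains an
    # uncertainty estimate per predicted waypoint.
    inv_var = torch.exp(-pred["wp_logvar"])
    nll = 0.5 * (inv_var * (pred["wp_mean"] - target["wp"]) ** 2
                 + pred["wp_logvar"]).mean()

    # Auxiliary perception supervision (semantic segmentation + depth).
    seg = F.cross_entropy(pred["seg_logits"], target["seg"])
    depth = F.l1_loss(pred["depth"], target["depth"])

    # Mild regularization that discourages collapsing to zero variance.
    unc_reg = pred["wp_logvar"].abs().mean()

    return nll + w_perc * (seg + depth) + w_unc * unc_reg
```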

Results & Findings

| Benchmark | Metric (higher is better unless marked ↓) | TFv6 Score | Prior SOTA | Improvement |
| --- | --- | --- | --- | --- |
| Bench2Drive (CARLA) | Driving Score (DS) | 95 | 78 | +22 % |
| Longest6 v2 (CARLA) | Success Rate | 92 % | 44 % | >2× |
| Town13 (CARLA) | Completion % | 88 % | 41 % | >2× |
| NAVSIM (sim‑to‑real) | Route Completion | – | – | +8 % over baseline |
| Waymo Vision‑Based | Collision Rate ↓ | 0.12 % | 0.27 % | – |

  • Removing expert privilege (adding occlusions) drops the expert’s own performance by ~15 %, confirming that the asymmetry is a real bottleneck.
  • The perception‑supervised TFv6 model learns more robust visual features, leading to fewer off‑road events and collisions in both simulation and real‑world datasets.

Practical Implications

  • Better data generation pipelines: When creating synthetic expert demonstrations, deliberately inject realistic sensor noise and occlusions to make the data more “student‑friendly.”
  • Richer navigation inputs: Providing a short waypoint horizon (instead of a single goal) is a low‑cost way to dramatically improve IL stability for autonomous driving stacks.
  • Uncertainty‑aware policies: Training the model to output confidence estimates helps downstream safety modules (e.g., fallback planners) make smarter decisions; a minimal gating sketch follows this list.
  • Sim‑to‑real transfer: The same perception supervision that improves simulation performance also boosts real‑world benchmarks, suggesting a unified training regime for companies building vision‑based driving stacks.
  • Open‑source toolkit: The released LEAD repository can be plugged into existing end‑to‑end pipelines (e.g., CARLA, AirSim) to quickly evaluate the impact of learner‑expert alignment on any new model.
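As a rough illustration of how such confidence estimates could feed a fallback planner, the hypothetical gating rule below defers to a conservative action when the policy’s predicted uncertainty is high; the threshold and interface are assumptions, not part of the paper.

```python
# Hypothetical safety gate: if the policy's own uncertainty exceeds a
# threshold, defer to a conservative fallback (e.g., hand control to a
# rule-based planner). Threshold and interface are illustrative assumptions.
def select_action(policy_out, fallback_action, logvar_threshold=0.5):
    """policy_out: dict with 'action' and per-dimension 'logvar' tensors."""
    mean_logvar = float(policy_out["logvar"].mean())
    if mean_logvar > logvar_threshold:
        return fallback_action   # low confidence: play it safe
    return policy_out["action"]  # confident: follow the learned policy
```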

Limitations & Future Work

  • The study is confined to the CARLA simulator and two real‑world benchmarks; broader validation on diverse sensor suites (radar, event cameras) remains open.
  • The navigation encoder relies on a pre‑computed waypoint graph; dynamic route changes (e.g., traffic‑aware re‑planning) are not yet explored.
  • Uncertainty modeling is limited to a simple Gaussian head; richer distributional predictions (mixture models, Bayesian networks) could further improve safety (a speculative sketch of such a head follows this list).
  • Scaling the approach to full‑scale city‑wide simulations and long‑duration drives will require more efficient data pipelines and possibly curriculum learning strategies.
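A speculative sketch of one such richer head, a small mixture density output over future waypoints, is shown below; this is a possible extension, not something the paper implements.

```python
# Speculative extension beyond a single Gaussian head: a small mixture
# density head over future waypoints. Dimensions and names are assumed.
import torch.nn as nn

class MixtureWaypointHead(nn.Module):
    def __init__(self, feat_dim, n_modes=3, horizon=8):
        super().__init__()
        self.n_modes, self.horizon = n_modes, horizon
        self.logits = nn.Linear(feat_dim, n_modes)                  # mode weights
        self.means = nn.Linear(feat_dim, n_modes * horizon * 2)     # (x, y) per step
        self.logvars = nn.Linear(feat_dim, n_modes * horizon * 2)   # per-dim variance

    def forward(self, feat):
        B = feat.shape[0]
        return {
            "mode_logits": self.logits(feat),                                   # (B, K)
            "mean": self.means(feat).view(B, self.n_modes, self.horizon, 2),
            "logvar": self.logvars(feat).view(B, self.n_modes, self.horizon, 2),
        }
```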

LEAD demonstrates that the “secret sauce” for high‑performing imitation‑learning drivers isn’t just more data—it’s making the expert’s perspective realistic enough for the student to actually learn from it. By aligning perception, intent, and uncertainty, the authors set a new benchmark for end‑to‑end autonomous driving and provide a practical roadmap for developers looking to bridge the simulation‑to‑reality gap.

Authors

  • Long Nguyen
  • Micha Fauth
  • Bernhard Jaeger
  • Daniel Dauner
  • Maximilian Igl
  • Andreas Geiger
  • Kashyap Chitta

Paper Information

  • arXiv ID: 2512.20563v1
  • Categories: cs.CV, cs.AI, cs.LG, cs.RO
  • Published: December 23, 2025