[Paper] Planning-aligned Token Compression for Long-Context Autonomous Driving

Published: June 5, 2026 at 01:16 PM EDT

Source: arXiv - 2606.07464v1

Overview

Monolithic vision-action models represent an emerging paradigm in autonomous driving. However, when this architecture encodes extended temporal context for complex interactions, it produces token sequences that quickly exceed real-time computational budgets. Approaches such as linear transformers and external memory aim to make the context lightweight, but token compression is the most compatible with this architecture because it requires no backbone modifications. Yet existing compression methods adopt rule-based heuristics such as temporal decay that are decoupled from planning, risking the loss of decision-critical information. We propose COMPACT-VA, a planning-aligned working memory framework built on a conditional VQ-VAE that compresses extended context into bounded representations. Compression is conditioned on both the historical trajectory and a learned planning intent: during training, a posterior encoder distills this intent from future trajectories, while a prior encoder learns to predict it from compressed observations. The compressed memory, concatenated with the predicted latent, feeds the policy for end-to-end optimization, so planning retains decision-critical information. We evaluate on high-signal dynamic scenarios where historical context is most critical for behavioral correctness (e.g., stop, yield, or proceed) and design behavioral metrics accordingly. Under comparable token budgets, we achieve a more than 6% improvement in success rate (68.3%), with consistent gains across metrics. Ablations validate the effectiveness of planning-aligned coupling. Closed-loop evaluation confirms that COMPACT-VA maintains general driving performance with a 3.3× speedup and a 2.7× memory reduction over uncompressed processing.
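To make "compressing extended context into bounded representations" concrete, here is a minimal sketch of the vector-quantization step of a conditional VQ-VAE. It is not the authors' implementation. The chunk-averaging encoder, the additive conditioning vector `cond` (standing in for a fused summary of the historical trajectory and planning intent), and all function names are hypothetical. The point it illustrates is that the output always has `num_slots` discrete codes, however many context tokens come in.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_to_slots(tokens: Sequence[Sequence[float]], num_slots: int,
                    cond: Sequence[float]) -> List[Vector]:
    """Hypothetical stand-in for a learned encoder: average contiguous chunks
    of context tokens into num_slots slots, then add a conditioning vector
    (a hypothetical summary of past trajectory and planning intent)."""
    n, dim = len(tokens), len(tokens[0])
    slots = []
    for s in range(num_slots):
        start = s * n // num_slots
        end = max(start + 1, (s + 1) * n // num_slots)  # keep every chunk non-empty
        chunk = tokens[start:end]
        mean = [sum(t[d] for t in chunk) / len(chunk) for d in range(dim)]
        slots.append([m + c for m, c in zip(mean, cond)])
    return slots


def quantize(slots: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> Tuple[List[int], List[Vector]]:
    """VQ step: replace each slot with the index and value of its nearest codeword."""
    indices = [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
               for v in slots]
    return indices, [list(codebook[k]) for k in indices]


def compress_context(tokens, num_slots, cond, codebook):
    """Bounded memory: always num_slots codes, regardless of history length."""
    return quantize(encode_to_slots(tokens, num_slots, cond), codebook)


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
    context_tokens = [[0.0, 0.0]] * 6 + [[-1.0, 1.0]] * 6
    idx, _ = compress_context(context_tokens, num_slots=2, cond=[0.0, 0.0], codebook=codebook)
    print(idx)  # [0, 2]
```

In a real conditional VQ-VAE the encoder, conditioning, and codebook are learned jointly (typically with a straight-through gradient estimator for the non-differentiable nearest-neighbor lookup); the fixed averaging here only shows the shape of the computation.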

Key Contributions

This paper is listed under the following arXiv categories:

cs.RO (Robotics)
cs.AI (Artificial Intelligence)
cs.CV (Computer Vision and Pattern Recognition)

Methodology

Please refer to the full paper for detailed methodology.
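The abstract describes a posterior/prior split for the planning intent: a posterior encoder distills the intent from future trajectories during training, a prior encoder learns to predict it from compressed observations, and the policy receives the compressed memory concatenated with the predicted latent. The sketch below is a hypothetical illustration of that data flow only. It assumes a small discrete intent codebook, summarizes the future by its net displacement, scores intents with a dot product against pooled memory, and trains the prior with cross-entropy toward the posterior's choice. None of these specific choices come from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def nearest(v: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    return min(range(len(codebook)),
               key=lambda k: sum((x - y) ** 2 for x, y in zip(v, codebook[k])))


def posterior_intent(future_traj, intent_codebook) -> int:
    """Training only (future is known). Hypothetical summary: net displacement
    from the first to the last future waypoint, snapped to the nearest intent code."""
    start, end = future_traj[0], future_traj[-1]
    return nearest([e - s for s, e in zip(start, end)], intent_codebook)


def prior_logits(memory, intent_codebook) -> List[float]:
    """Hypothetical prior: score each intent code against the mean memory slot."""
    dim = len(memory[0])
    pooled = [sum(m[d] for m in memory) / len(memory) for d in range(dim)]
    return [dot(pooled, code) for code in intent_codebook]


def intent_loss(logits: Sequence[float], target: int) -> float:
    """Cross-entropy -log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def policy_input(memory, intent_codebook) -> List[Vector]:
    """Inference: no future available, so append the prior's most likely intent."""
    logits = prior_logits(memory, intent_codebook)
    k = max(range(len(logits)), key=lambda i: logits[i])
    return [list(slot) for slot in memory] + [list(intent_codebook[k])]


if __name__ == "__main__":
    intent_codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    future = [[0.0, 0.0], [0.5, 0.0], [2.0, 0.1]]
    memory = [[1.0, 0.0], [0.8, 0.2]]  # e.g. quantized memory slots
    target = posterior_intent(future, intent_codebook)
    loss = intent_loss(prior_logits(memory, intent_codebook), target)
    tokens = policy_input(memory, intent_codebook)
    print(target, len(tokens), tokens[-1])  # 1 3 [1.0, 0.0]
```

The design point this mirrors is that the future trajectory is used only as a training signal; at inference the policy depends solely on what the prior can infer from the compressed past.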

Practical Implications

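Because token compression requires no backbone modifications, the approach targets monolithic vision-action driving models that must reason over long temporal context within real-time compute budgets. In closed-loop evaluation, the authors report that COMPACT-VA maintains general driving performance with a 3.3× speedup and a 2.7× memory reduction over uncompressed processing.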
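A rough way to see why bounded memory matters: without compression, the sequence grows linearly with the number of past frames, and full self-attention scores every (query, key) pair. The toy count below uses hypothetical numbers (64 tokens per frame, 64 memory slots, 4 intent tokens) and assumes the current frame is still encoded in full; it is not the paper's configuration. The paper's measured 3.3× speedup and 2.7× memory reduction are end-to-end figures, which also depend on parts of the model this count ignores.

```python
def uncompressed_len(num_frames: int, tokens_per_frame: int) -> int:
    """Every past frame's tokens stay in the sequence, so length grows linearly."""
    return num_frames * tokens_per_frame


def compressed_len(tokens_per_frame: int, num_slots: int, latent_tokens: int) -> int:
    """Assumed layout: current frame in full + fixed memory slots + intent latent."""
    return tokens_per_frame + num_slots + latent_tokens


def attention_pairs(seq_len: int) -> int:
    """Full self-attention computes one score per (query, key) pair."""
    return seq_len * seq_len


if __name__ == "__main__":
    bounded = compressed_len(tokens_per_frame=64, num_slots=64, latent_tokens=4)
    for frames in (4, 16, 64):
        full = uncompressed_len(frames, 64)
        ratio = attention_pairs(full) / attention_pairs(bounded)
        print(f"{frames:>3} frames: {full:>5} vs {bounded} tokens, ~{ratio:.1f}x fewer attention pairs")
```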

Authors

Zhixuan Liang
Yuxiao Chen
Yurong You
Peter Karkus
Wenhao Ding
Boyi Li
Alexander Popov
Yan Wang
Maximilian Igl
Yiming Li
Danfei Xu
Nikolai Smolyanskiy
Boris Ivanovic
Ping Luo
Marco Pavone

Paper Information

arXiv ID: 2606.07464v1
Categories: cs.RO, cs.AI, cs.CV
Published: June 5, 2026