[Paper] TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Published: June 4, 2026 at 01:59 PM EDT

Source: arXiv - 2606.06491v1

Overview

Robot manipulation alternates between low‑risk transit phases that call for fast execution and high‑risk contact stages that demand slow, precise motion. Yet existing Vision‑Language‑Action models (VLAs) only inherit a single fixed speed from training demonstrations. Prior efforts to accelerate VLAs through model compression, KV‑cache reuse, or reinforcement learning only shift the policy from one fixed speed to another, and leave deceleration almost unexplored. We observe that the magnitude of each predicted action already governs how fast the robot moves, opening a direct route to controllable execution speed. We turn this observation into TempoVLA, a single VLA whose execution speed is controlled by an explicit condition.
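As a rough illustration of that observation, and not a formula taken from the paper: assume each action is a relative end-effector displacement $\Delta p_t$ executed over a fixed control period $\Delta t$ (both symbols are assumptions introduced here). The commanded speed is then proportional to the action magnitude, so scaling the action by a factor $s$ scales the speed by the same factor.

```latex
v_t \approx \frac{\lVert \Delta p_t \rVert}{\Delta t},
\qquad
\Delta p_t \mapsto s\,\Delta p_t
\;\Longrightarrow\;
v_t \mapsto s\,v_t
```

For example, with $\Delta t = 0.1$ s and $\lVert \Delta p_t \rVert = 1$ cm, the speed is about 10 cm/s; $s = 2$ gives 20 cm/s and $s = 0.5$ gives 5 cm/s.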

TempoVLA combines two coupled components:

Variable‑Speed Trajectory Augmentation (VSTA) – a data‑side technique that re‑times demonstrations to any target speed by merging or splitting actions while preserving motion semantics.
Speed‑conditioning mechanism – a model‑side approach that feeds the desired speed to the policy.
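The merge/split idea behind VSTA can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: actions are taken to be per-step translational deltas (so consecutive steps can be summed), only integer speed-ups and their reciprocals are handled, and the function name `retime_delta_actions` is hypothetical. Rotation and gripper dimensions would need separate handling.

```python
import numpy as np


def retime_delta_actions(actions: np.ndarray, speed: float) -> np.ndarray:
    """Re-time per-step delta actions to `speed` times the original pace.

    Assumption: each row is a relative displacement, so summing rows merges
    steps (faster) and dividing a row splits it into sub-steps (slower).
    """
    actions = np.asarray(actions, dtype=float)
    if speed <= 0:
        raise ValueError("speed must be positive")

    if speed >= 1:
        k = int(round(speed))
        if not np.isclose(speed, k):
            raise ValueError("this sketch only supports integer speed-ups")
        # Pad with zero-motion rows so the length is a multiple of k,
        # then sum every group of k consecutive actions into one.
        pad = (-len(actions)) % k
        padded = np.vstack([actions, np.zeros((pad, actions.shape[1]))])
        return padded.reshape(-1, k, actions.shape[1]).sum(axis=1)

    k = int(round(1.0 / speed))
    if not np.isclose(1.0 / speed, k):
        raise ValueError("this sketch only supports 1/k slow-downs")
    # Split every action into k equal sub-steps.
    return np.repeat(actions / k, k, axis=0)


# Toy demonstration: four steps, first along x, then along y (metres).
demo = np.array([[0.01, 0.0, 0.0],
                 [0.01, 0.0, 0.0],
                 [0.0, 0.02, 0.0],
                 [0.0, 0.02, 0.0]])

fast = retime_delta_actions(demo, 2.0)   # half as many, larger steps
slow = retime_delta_actions(demo, 0.5)   # twice as many, smaller steps
```

In this toy case the total displacement is unchanged in both directions; only the number of steps, and therefore the time taken at a fixed control rate, differs.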

Statistics show that VSTA reaches the requested speed with negligible motion error. Experiments in simulation and on real‑world tasks demonstrate that TempoVLA achieves flexible speed control in both directions, while VSTA additionally boosts the default $1\times$ performance via better data utilization. Furthermore, by cooperating with a large multimodal model, TempoVLA realizes dynamic speed control, accelerating through low‑risk phases and decelerating for high‑risk ones.
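The dynamic speed control described above can be pictured as a simple loop: something assesses the risk of the current phase, and the result is turned into the speed condition passed to the policy. The sketch below is purely illustrative. The risk labels, the speed values in `SPEED_BY_RISK`, and the stand-in functions `fake_risk` and `fake_policy` are all assumptions; the paper's actual interface between the multimodal model and the policy is not described in this summary.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical mapping from a risk label to a speed condition.
SPEED_BY_RISK: Dict[str, float] = {"low": 2.0, "high": 0.5}


def choose_speed(risk: str, default: float = 1.0) -> float:
    """Return the speed condition for a risk label, falling back to 1x."""
    return SPEED_BY_RISK.get(risk, default)


def run_episode(
    observations: List[Any],
    assess_risk: Callable[[Any], str],
    policy_step: Callable[[Any, float], Any],
) -> List[Tuple[float, Any]]:
    """Query the risk assessor at each step and condition the policy on the result."""
    trace = []
    for obs in observations:
        speed = choose_speed(assess_risk(obs))
        trace.append((speed, policy_step(obs, speed)))
    return trace


# Stand-ins for the multimodal model and the speed-conditioned policy.
def fake_risk(obs: Dict[str, float]) -> str:
    return "high" if obs["distance_to_object_m"] < 0.05 else "low"


def fake_policy(obs: Dict[str, float], speed: float) -> float:
    return 0.01 * speed  # step size grows with the requested speed


trace = run_episode(
    [{"distance_to_object_m": 0.30}, {"distance_to_object_m": 0.02}],
    fake_risk,
    fake_policy,
)
speeds = [speed for speed, _ in trace]
```

Here the far-from-object step is run at the faster condition and the near-contact step at the slower one, mirroring the transit versus contact distinction in the overview.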

Key Contributions

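The observation that the magnitude of each predicted action already governs how fast the robot moves, which gives a direct route to controllable execution speed.
Variable‑Speed Trajectory Augmentation (VSTA), which re‑times demonstrations to a target speed by merging or splitting actions.
A speed‑conditioning mechanism that lets a single VLA both accelerate and decelerate on request.
Dynamic speed control in cooperation with a large multimodal model.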

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

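A single speed‑conditioned policy could let a robot move quickly through low‑risk transit phases and slow down for high‑risk contact stages, instead of running at the one fixed speed inherited from its training demonstrations.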

Authors

Dong Jing
Jingchen Nie
Tianqi Zhang
Jiaqi Liu
Huaxiu Yao
Zhiwu Lu
Mingyu Ding

Paper Information

arXiv ID: 2606.06491v1
Categories: cs.RO, cs.AI
Published: June 4, 2026
