[Paper] When Video Misreads: Closed-Loop Distillation of Reading Heuristics for Exploratory Manipulation Trace QA

Published: June 7, 2026 at 05:49 AM EDT
Source: arXiv

Overview

Exploratory manipulation often turns an apparently failed attempt into the key evidence for what to do next. For example, a robot pulls a locked cabinet drawer, fails, and only succeeds after opening the lock. The failed pull reveals a latent precondition (the drawer is locked) that determines the minimal-success action chain (the fewest actions that complete the task), here [lock-open, drawer-pull]. Reading this trace correctly is therefore the prerequisite for recovering that chain.

We formalize this setting as Exploratory Manipulation Trace QA (EMT-QA): given synchronized video and proprioception from an exploratory trace, predict the minimal-success action chain under the latent precondition revealed by the probe. However, even state-of-the-art VLMs and embodied multimodal LLMs misread this evidence: they do not reliably recover the chain from raw video, raw proprioception, or their combination.

We introduce Closed-Loop Trace Distillation, a pipeline that uses a per-task coding agent to inspect labeled training traces and distill a one-line natural-language prompt over the trace, which we call the Distilled Reading Heuristic (DRH). At inference, no agent is invoked and no model weights are updated; a frozen VLM receives the raw trace plus the DRH as a prompt entry. Across three simulator and two real-robot tasks, the DRH improves chain accuracy by +0.38 to +0.47 over the best raw-modality baseline. The same DRH also serves as the sole specification for one-shot programmatic classifiers that match the prompted VLM.
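To make the drawer example concrete, the following is a minimal sketch of what a programmatic classifier written from a one-line heuristic might look like. Everything in it is an assumption for illustration: the heuristic text, the field names `pull_force_n` and `drawer_disp_m`, and both thresholds are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeStep:
    """One proprioceptive sample from the exploratory pull (hypothetical fields)."""
    pull_force_n: float    # force applied along the pull direction, in newtons
    drawer_disp_m: float   # drawer displacement from its closed pose, in meters


# Hypothetical one-line heuristic, written here only to illustrate the idea of a DRH.
HYPOTHETICAL_DRH = (
    "If the pull force is high while the drawer barely moves, the drawer is locked."
)


def predict_chain(
    trace: List[ProbeStep],
    force_thresh_n: float = 15.0,   # assumed threshold, not from the paper
    disp_thresh_m: float = 0.01,    # assumed threshold, not from the paper
) -> List[str]:
    """Map an exploratory trace to a minimal-success action chain."""
    if not trace:
        raise ValueError("trace must contain at least one sample")
    peak_force = max(step.pull_force_n for step in trace)
    max_disp = max(abs(step.drawer_disp_m) for step in trace)
    # The failed probe reveals the latent precondition: high effort, no motion.
    locked = peak_force >= force_thresh_n and max_disp <= disp_thresh_m
    return ["lock-open", "drawer-pull"] if locked else ["drawer-pull"]


# Usage: a stalled pull versus a pull that moves the drawer.
stalled = [ProbeStep(5.0, 0.000), ProbeStep(22.0, 0.002)]
moving = [ProbeStep(4.0, 0.050), ProbeStep(6.0, 0.180)]
locked_chain = predict_chain(stalled)
free_chain = predict_chain(moving)
```

The sketch reads only proprioception; a real heuristic could just as well refer to video cues, and the thresholds would have to come from labeled traces.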

Key Contributions

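Based on the abstract, the paper:

  • Formalizes Exploratory Manipulation Trace QA (EMT-QA): predicting the minimal-success action chain from the synchronized video and proprioception of an exploratory trace.
  • Reports that state-of-the-art VLMs and embodied multimodal LLMs do not reliably recover the chain from raw video, raw proprioception, or their combination.
  • Introduces Closed-Loop Trace Distillation, in which a per-task coding agent distills a one-line Distilled Reading Heuristic (DRH) from labeled training traces.
  • Reports chain-accuracy gains of +0.38 to +0.47 over the best raw-modality baseline across three simulator and two real-robot tasks.
  • Uses the same DRH as the sole specification for one-shot programmatic classifiers that match the prompted VLM.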

Methodology

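The abstract describes the pipeline only at a high level: a distillation stage over labeled training traces, followed by prompt-only inference with a frozen VLM. Please refer to the full paper for the detailed methodology.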
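The abstract states that, at inference, a frozen VLM receives the raw trace plus the DRH as a prompt entry. A minimal sketch of how such a text prompt could be assembled is shown below. The function name `build_prompt`, the multiple-choice layout, and the wording of every prompt line are assumptions made for illustration; the paper's actual prompt format is not given in the abstract, and no VLM API is called here.

```python
from typing import List, Optional, Sequence


def build_prompt(
    question: str,
    candidate_chains: Sequence[Sequence[str]],
    drh: Optional[str] = None,
) -> str:
    """Assemble the text part of a query about an exploratory trace.

    The video frames and proprioception would be attached separately;
    this hypothetical helper only builds the accompanying text.
    """
    lines: List[str] = [question, "", "Candidate action chains:"]
    for index, chain in enumerate(candidate_chains):
        lines.append(f"{index}: [{', '.join(chain)}]")
    if drh is not None:
        # The heuristic is one extra prompt entry; model weights stay untouched.
        lines += ["", f"Reading heuristic: {drh}"]
    lines += ["", "Answer with the index of the minimal-success action chain."]
    return "\n".join(lines)


chains = [["drawer-pull"], ["lock-open", "drawer-pull"]]
question = "Which action chain completes the task shown in the trace?"

# Raw-modality baseline: no heuristic in the prompt.
baseline_prompt = build_prompt(question, chains)

# Same query with a hypothetical heuristic added as a prompt entry.
drh_prompt = build_prompt(
    question,
    chains,
    drh="If the pull force is high while the drawer barely moves, the drawer is locked.",
)
```

Keeping the heuristic as an optional argument makes the baseline and the DRH condition differ by a single prompt entry, which mirrors the comparison described in the abstract.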

Practical Implications

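According to the abstract, the DRH is applied purely through prompting: no agent runs at inference and no model weights are updated, so it can be used with a frozen VLM. Because the same DRH also specifies programmatic classifiers that match the prompted VLM, the heuristic may be usable without a VLM in the loop.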

Authors

  • Haizhou Ge
  • Yufei Jia
  • Yue Li
  • Zhixing Chen
  • Lu Shi
  • Lei Han
  • Guyue Zhou
  • Ruqi Huang

Paper Information

  • arXiv ID: 2606.08542v1
  • Categories: cs.RO, cs.AI, cs.CV
  • Published: June 7, 2026