[Paper] Do as I Do: Dexterous Manipulation Data from Everyday Human Videos

Published: June 17, 2026 at 01:57 PM EDT
Source: arXiv

Overview

How can we scalably generate data for robotic manipulation, especially on human-like platforms such as dexterous multi-fingered hands? Learning from human videos has recently emerged as a likely answer to this question. However, difficulties in estimating hand-object interaction and in crossing the human-to-robot embodiment gap have hindered the adoption of abundant monocular RGB-only human videos as the primary source of robot manipulation data. In this work, we present DO AS I DO, an algorithm to reconstruct and retarget monocular RGB human videos to multi-fingered dexterous robotic hands. DO AS I DO reconstructs hand-object interactions from various egocentric and exocentric in-the-wild video sources. The algorithm then retargets these hand-object interaction estimates into a sequence of actions executable in the real world, yielding robot-complete manipulation data from disparate human videos. Overall, DO AS I DO outperforms the previous state of the art in estimating hand-object interactions and extracting dexterous manipulation trajectories from RGB videos, as we show in experiments on datasets with ground truth and on a dataset of video clips collected online. Our experiments enable us to propose an efficacy playbook for practitioners collecting human data for manipulation.

Key Contributions

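The abstract highlights the following contributions:

  • DO AS I DO, an algorithm that reconstructs hand-object interactions from monocular RGB human videos, covering both egocentric and exocentric in-the-wild sources.
  • A retargeting stage that converts these estimates into action sequences executable on real multi-fingered dexterous robot hands.
  • Experiments on datasets with ground truth and on video clips collected online, in which DO AS I DO outperforms the previous state of the art in hand-object interaction estimation and dexterous trajectory extraction.
  • An efficacy playbook for practitioners collecting human video data for manipulation.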

Methodology

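DO AS I DO works in two stages. It first reconstructs hand-object interactions from egocentric or exocentric monocular RGB video, then retargets those estimates to a multi-fingered dexterous robot hand as a sequence of actions executable in the real world. The abstract does not describe either stage in detail; refer to the full paper for the methodology.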
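The abstract does not spell out how the retargeting step works. As a rough illustration only, here is a generic kinematic retargeting sketch, not the DO AS I DO algorithm: a hypothetical two-joint planar robot finger tracks a human fingertip trajectory using damped least-squares inverse kinematics, with a uniform scale factor standing in for the human-to-robot size gap. The link lengths, joint limits, the `scale` parameter, and all function names are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]

# Hypothetical planar robot finger with two revolute joints. Link lengths (m)
# and joint limits (rad) are illustrative assumptions, not from the paper.
LINK_LENGTHS = (0.05, 0.03)
JOINT_LIMITS = ((0.0, math.pi / 2), (0.0, math.pi / 2))


def fingertip(q: Vec2) -> Vec2:
    """Forward kinematics: joint angles -> fingertip position in the finger-base frame."""
    l1, l2 = LINK_LENGTHS
    return (
        l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1]),
        l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1]),
    )


def clamp(q: Vec2) -> Vec2:
    """Project joint angles back into their limits."""
    return tuple(min(max(a, lo), hi) for a, (lo, hi) in zip(q, JOINT_LIMITS))


def retarget_frame(target: Vec2, q_init: Vec2, iters: int = 200,
                   damping: float = 1e-3, max_step: float = 0.2,
                   tol: float = 1e-9) -> Vec2:
    """Damped least-squares IK: find joint angles whose fingertip matches target."""
    l1, l2 = LINK_LENGTHS
    q = clamp(q_init)
    for _ in range(iters):
        x, y = fingertip(q)
        ex, ey = x - target[0], y - target[1]
        if math.hypot(ex, ey) < tol:
            break
        s1, c1 = math.sin(q[0]), math.cos(q[0])
        s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
        # Jacobian d(x, y) / d(q0, q1).
        j00, j01 = -l1 * s1 - l2 * s12, -l2 * s12
        j10, j11 = l1 * c1 + l2 * c12, l2 * c12
        # Solve (J^T J + damping^2 I) dq = -J^T e in closed form for 2x2.
        a = j00 * j00 + j10 * j10 + damping ** 2
        b = j00 * j01 + j10 * j11
        c = j01 * j01 + j11 * j11 + damping ** 2
        g0 = j00 * ex + j10 * ey
        g1 = j01 * ex + j11 * ey
        det = a * c - b * b
        dq0 = -(c * g0 - b * g1) / det
        dq1 = -(a * g1 - b * g0) / det
        # Cap the step so a poor linearisation cannot overshoot wildly.
        step = min(1.0, max_step / max(abs(dq0), abs(dq1), 1e-12))
        q = clamp((q[0] + step * dq0, q[1] + step * dq1))
    return q


def retarget_trajectory(human_tips: Sequence[Vec2], scale: float,
                        q_start: Vec2 = (0.3, 0.3)) -> List[Vec2]:
    """Retarget a sequence of human fingertip positions, warm-starting each frame."""
    q = q_start
    out: List[Vec2] = []
    for hx, hy in human_tips:
        # Uniform scaling is a simple stand-in for the human-to-robot size gap.
        q = retarget_frame((scale * hx, scale * hy), q)
        out.append(q)
    return out
```

Warm-starting each frame from the previous solution is a common way to keep retargeted joint trajectories smooth. A practical retargeter for a multi-fingered hand would need more joints per finger, several fingertips at once, and some account of the manipulated object, none of which this toy model captures.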

Practical Implications

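Monocular RGB human videos are abundant, but errors in hand-object estimation and the human-to-robot embodiment gap have limited their use as robot manipulation data. By turning such videos into action sequences executable on dexterous multi-fingered hands, this work points toward a scalable data source for robot manipulation, and its experiments are distilled into an efficacy playbook for practitioners collecting human data.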

Authors

  • Bhawna Paliwal
  • Haritheja Etukuru
  • William Liang
  • Pieter Abbeel
  • Nur Muhammad Mahi Shafiullah
  • Jitendra Malik

Paper Information

  • arXiv ID: 2606.19333v1
  • Categories: cs.RO, cs.CV
  • Published: June 17, 2026