[Paper] OSMO: Open-Source Tactile Glove for Human-to-Robot Skill Transfer

Published: December 9, 2025 at 01:56 PM EST
4 min read
Source: arXiv - 2512.08920v1

Overview

The paper presents OSMO, an open‑source tactile glove that captures high‑resolution contact data (normal and shear forces) from human demonstrators and feeds it directly into robot learning pipelines. By closing the “visual‑tactile embodiment gap,” OSMO enables robots to acquire contact‑rich manipulation skills from human demonstrations alone (video plus glove tactile data); no robot‑side data collection is required.

Key Contributions

  • Open‑source hardware: Full CAD files, PCB layouts, firmware, and step‑by‑step assembly instructions for a 12‑sensor, three‑axis tactile glove.
  • Unified sensing interface: Identical tactile data streams for both human demonstrators and robot end‑effectors, simplifying domain transfer.
  • Contact‑aware learning pipeline: Demonstrates that policies trained solely on human‑collected tactile trajectories can solve a real‑world wiping task with 72 % success.
  • Benchmark against vision‑only baselines: Shows a clear reduction in contact‑related failure modes when tactile feedback is incorporated.
  • Compatibility with existing hand‑tracking: Designed to work alongside state‑of‑the‑art vision‑based hand pose estimators for “in‑the‑wild” data capture.

Methodology

  1. Glove Design – The fingertips and palm are instrumented with 3‑axis force sensors (12 in total) that stream normal and shear forces at ~200 Hz. The glove is lightweight, wireless, and powered by a small Li‑Po battery.
  2. Data Collection – Human operators wear the glove while performing manipulation demos captured by a standard RGB camera and a hand‑tracking system (e.g., MediaPipe). The tactile stream is synchronized with the video and pose data (see the alignment sketch after this list).
  3. Policy Training – The authors use behavior cloning: a policy network receives the synchronized hand pose plus tactile readings as input and predicts joint commands. No robot‑side interaction data is used during training (a minimal policy sketch follows this list).
  4. Deployment – The same glove (or a robot‑mounted replica) is attached to a 6‑DOF manipulator. During execution, the robot reads its own tactile sensors and feeds them back into the learned policy for closed‑loop control (a control‑loop sketch also follows).
  5. Evaluation – A contact‑intensive wiping task (maintaining steady pressure on a surface while moving laterally) is used to compare tactile‑aware policies against vision‑only baselines.
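
The synchronization step can be sketched as a nearest‑timestamp alignment between the tactile and video streams. The snippet below assumes a ~200 Hz tactile stream, a ~30 Hz camera, and 36 tactile values per sample (12 sensors × 3 axes); these rates and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def align_nearest(video_ts, tactile_ts, tactile_data):
    """Return one tactile sample per video frame (nearest timestamp)."""
    # Index of the first tactile timestamp >= each video timestamp.
    idx = np.searchsorted(tactile_ts, video_ts)
    idx = np.clip(idx, 1, len(tactile_ts) - 1)
    # Step back one sample where the earlier neighbor is closer in time.
    left_closer = (video_ts - tactile_ts[idx - 1]) < (tactile_ts[idx] - video_ts)
    idx = np.where(left_closer, idx - 1, idx)
    return tactile_data[idx]

# Example: 30 Hz video over 10 s, 200 Hz tactile with 36 values per sample.
video_ts = np.arange(0, 10, 1 / 30)
tactile_ts = np.arange(0, 10, 1 / 200)
tactile_data = np.random.randn(len(tactile_ts), 36)
aligned = align_nearest(video_ts, tactile_ts, tactile_data)  # shape (300, 36)
```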
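
For the behavior‑cloning step, a minimal policy sketch in PyTorch is shown below. It assumes 36 tactile values, a 63‑dimensional hand‑pose vector, and 6‑DOF joint commands; the dimensions, layer sizes, and loss are illustrative stand‑ins, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TactilePolicy(nn.Module):
    """MLP that maps synchronized tactile + pose observations to joint commands."""
    def __init__(self, tactile_dim=36, pose_dim=63, action_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(tactile_dim + pose_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, tactile, pose):
        # Concatenate the synchronized tactile and pose observations.
        return self.net(torch.cat([tactile, pose], dim=-1))

# Behavior cloning: regress demonstrated actions from human-collected data only.
policy = TactilePolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(tactile, pose, expert_action):
    optimizer.zero_grad()
    loss = loss_fn(policy(tactile, pose), expert_action)
    loss.backward()
    optimizer.step()
    return loss.item()
```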
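
Deployment then reduces to a closed‑loop read‑infer‑act cycle. In the sketch below, read_tactile, read_robot_state, and send_joint_command are hypothetical robot‑side interfaces, and the 200 Hz loop rate simply mirrors the glove's streaming rate.

```python
import time
import torch

CONTROL_HZ = 200  # mirrors the glove's ~200 Hz tactile streaming rate

def run_policy(policy, read_tactile, read_robot_state, send_joint_command):
    """Closed-loop execution: read sensors, query the policy, send a command."""
    period = 1.0 / CONTROL_HZ
    policy.eval()
    with torch.no_grad():
        while True:
            start = time.monotonic()
            tactile = torch.as_tensor(read_tactile(), dtype=torch.float32)
            state = torch.as_tensor(read_robot_state(), dtype=torch.float32)
            action = policy(tactile.unsqueeze(0), state.unsqueeze(0)).squeeze(0)
            send_joint_command(action.tolist())
            # Sleep off the remainder of the control period to hold the rate.
            time.sleep(max(0.0, period - (time.monotonic() - start)))
```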

Results & Findings

  • Success Rate: The tactile‑aware policy achieved 72 % task success across 50 trials, whereas the best vision‑only baseline peaked at ~45 %.
  • Failure Mode Reduction: Most vision‑only failures were due to loss of contact (slippage) or excessive force; tactile feedback allowed the robot to adjust pressure in real time.
  • Generalization: Policies trained on human demos transferred to the robot without any fine‑tuning, demonstrating that the shared glove interface effectively bridges embodiment differences.
  • Latency: End‑to‑end sensing‑to‑action latency stayed under 30 ms, sufficient for stable closed‑loop force control in the tested scenario.

Practical Implications

  • Rapid Skill Acquisition: Teams can collect large libraries of human demonstrations with the glove and a commodity camera, then train tactile‑aware policies without costly robot‑side data collection.
  • Lower Barrier to Contact‑Rich Tasks: Industries such as assembly, cleaning, or food handling—where force control is critical—can adopt OSMO to prototype robust manipulation pipelines faster.
  • Modular Integration: Because the glove outputs standard ROS messages, it can be dropped into existing perception‑action stacks, augmenting vision‑only datasets with force cues (a subscriber sketch follows this list).
  • Open‑source Ecosystem: The released hardware and firmware invite community extensions (e.g., higher‑density sensor arrays, haptic feedback for tele‑operation) and foster reproducibility.
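
A hedged sketch of consuming the glove's ROS stream is shown below. The topic name /osmo/tactile and the Float32MultiArray layout (12 sensors × 3 axes) are assumptions for illustration; the released firmware and driver define the actual message types and topics.

```python
import rospy
from std_msgs.msg import Float32MultiArray

def on_tactile(msg):
    # Reshape the flat array into per-sensor triplets (normal + two shear axes).
    forces = [msg.data[i:i + 3] for i in range(0, len(msg.data), 3)]
    rospy.loginfo_throttle(1.0, "sensor 0 forces: %s", forces[0])

if __name__ == "__main__":
    rospy.init_node("osmo_tactile_listener")
    # Assumed topic name; check the driver for the real one.
    rospy.Subscriber("/osmo/tactile", Float32MultiArray, on_tactile)
    rospy.spin()
```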

Limitations & Future Work

  • Sensor Coverage: Only the fingertips and palm are instrumented; contact elsewhere on the hand is not captured, which may limit performance on tasks requiring delicate finger‑level force modulation.
  • Calibration Overhead: Accurate force readings demand per‑glove calibration, adding a setup step for large‑scale deployments.
  • Scalability of Demonstrations: While the glove is inexpensive, gathering diverse, high‑quality human demos still requires careful instruction and consistent hand‑tracking quality.
  • Future Directions: The authors plan to (1) integrate additional shear‑sensitive sensors on the back of the hand, (2) explore self‑supervised domain adaptation to reduce calibration effort, and (3) extend the pipeline to multi‑modal learning that fuses tactile, vision, and audio cues for even richer skill transfer.

Authors

  • Jessica Yin
  • Haozhi Qi
  • Youngsun Wi
  • Sayantan Kundu
  • Mike Lambeta
  • William Yang
  • Changhao Wang
  • Tingfan Wu
  • Jitendra Malik
  • Tess Hellebrekers

Paper Information

  • arXiv ID: 2512.08920v1
  • Categories: cs.RO, cs.LG
  • Published: December 9, 2025
