[Paper] Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models

Published: February 9, 2026 at 01:58 PM EST
Source: arXiv

Overview

The paper introduces Contact‑Anchored Policies (CAP), a new way to teach robots to manipulate objects. CAP conditions policies on where the robot makes contact rather than on abstract language commands. The authors decompose the policy into modular "utility models", one per skill or contact primitive. This lets them rapidly prototype and debug in a lightweight simulator (EgoGym) before deploying to real hardware. The result is strong generalization from only 23 hours of demonstrations.

Key Contributions

  • Contact‑based conditioning: Replaces language prompts with explicit 3‑D contact points, giving the robot a concrete physical reference for action planning.
  • Modular utility‑model library: Decomposes a monolithic policy into reusable sub‑models that predict the utility of a contact configuration, enabling easier debugging and transfer.
  • Real‑to‑sim iteration loop: Introduces EgoGym, a lightweight simulation benchmark that mirrors the real‑world setup. It enables rapid identification of failure modes and guides data augmentation.
  • Data efficiency: Demonstrates strong performance on three core manipulation skills using only 23 h of human‑provided demonstrations.
  • Zero‑shot superiority: Outperforms large vision‑language‑action models (VLAs) by ≈56 % (relative) in zero‑shot success on unseen environments and robot embodiments.
  • Open‑source release: All code, simulation assets, hardware designs, and datasets will be publicly available, lowering the barrier for reproducible robot learning research.

Methodology

  1. Contact Representation:

    • Each demonstration is annotated with a set of 3‑D points where the robot’s end‑effector (or other links) touches the environment or object.
    • These points are fed to a neural utility model that predicts the expected “task success” for that contact configuration.
  2. Utility Model Library:

    • Instead of a single end‑to‑end policy, CAP builds a collection of lightweight models (one per skill or contact primitive).
    • At runtime, the system selects and composes the relevant utilities to generate a full action sequence.
  3. EgoGym Simulation Loop:

    • A stripped‑down physics simulator that mirrors the real robot’s kinematics and sensor suite.
    • Researchers run large‑scale sweeps of contact configurations, automatically flagging those that lead to failures (e.g., slippage, unreachable poses).
    • Failure cases are fed back into the data collection pipeline, either by augmenting the simulation or by guiding additional real‑world demos.
  4. Training & Deployment:

    • The utility models are trained with supervised learning on the 23 h of demonstrations, using a simple binary success label.
    • During deployment, a planner samples candidate contacts, queries the utility models, and executes the highest‑scoring plan on the physical robot.
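
Steps 1, 2 and 4 can be illustrated with a deliberately tiny sketch. It makes several assumptions that do not come from the paper:

- a logistic‑regression stand‑in replaces the neural utility model;
- the contacts are featurized by their centroid plus a bias term (hypothetical);
- the names `Demo`, `featurize`, `UtilityModel` and `plan` are illustrative, not taken from the CAP release.

Each skill gets its own model, trained on binary success labels. The planner then scores candidate contact sets with the relevant model and keeps the highest‑scoring one.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point3 = Tuple[float, float, float]  # a 3-D contact point in world coordinates


@dataclass
class Demo:
    contacts: List[Point3]  # annotated contact points for one demonstration
    success: int            # binary success label: 1 = success, 0 = failure


def featurize(contacts: Sequence[Point3]) -> List[float]:
    """Hypothetical featurization: centroid of the contact points plus a bias term."""
    n = len(contacts)
    return [sum(p[i] for p in contacts) / n for i in range(3)] + [1.0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class UtilityModel:
    """Logistic-regression stand-in (an assumption) for a neural utility model."""

    def __init__(self) -> None:
        self.w = [0.0] * 4

    def score(self, contacts: Sequence[Point3]) -> float:
        """Predicted probability of task success for one contact configuration."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, featurize(contacts))))

    def fit(self, demos: List[Demo], lr: float = 0.5, epochs: int = 200) -> None:
        """Supervised training on binary success labels (SGD on the log-loss)."""
        for _ in range(epochs):
            for d in demos:
                x = featurize(d.contacts)
                err = self.score(d.contacts) - d.success  # d(log-loss)/d(logit)
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]


def plan(library: Dict[str, UtilityModel], skills: List[str],
         candidates: List[List[Point3]]) -> List[Tuple[str, List[Point3], float]]:
    """For each skill in sequence, score every candidate contact set with that
    skill's utility model and keep the highest-scoring one."""
    steps = []
    for skill in skills:
        model = library[skill]
        best = max(candidates, key=model.score)
        steps.append((skill, best, model.score(best)))
    return steps


# Usage with synthetic data: in this toy world, success happens only when contact x > 0.
demos = [Demo([(x / 10, 0.0, 0.1)], int(x > 0)) for x in range(-9, 10) if x != 0]
pick = UtilityModel()
pick.fit(demos)
candidates = [[(-0.5, 0.0, 0.1)], [(0.2, 0.0, 0.1)], [(0.9, 0.0, 0.1)]]
print(plan({"pick": pick}, ["pick"], candidates))
```

In a real system, the candidate list would come from a sampler over perceived surfaces, and the utility network would be far richer. The sketch only shows the score‑then‑select control flow.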

Results & Findings

Skill | Zero‑shot success (CAP) | Zero‑shot success (state‑of‑the‑art VLA) | Relative gain
--- | --- | --- | ---
Pick‑and‑Place | 78 % | 50 % | +56 %
Drawer Opening | 71 % | 45 % | +58 %
Tool Use (lever) | 64 % | 41 % | +56 %
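
The Relative gain column is the relative improvement (CAP − VLA) / VLA, not a difference in percentage points. The absolute gaps are 28, 26 and 23 points. The arithmetic can be checked with a few lines of Python; the helper name `relative_gain` is illustrative.

```python
def relative_gain(cap: float, baseline: float) -> float:
    """Relative improvement of CAP over a baseline, in percent."""
    return (cap - baseline) * 100.0 / baseline


# Zero-shot success rates (%) from the table: (CAP, state-of-the-art VLA).
results = {
    "Pick-and-Place": (78, 50),
    "Drawer Opening": (71, 45),
    "Tool Use (lever)": (64, 41),
}
for skill, (cap, vla) in results.items():
    print(f"{skill}: {cap - vla} points absolute, {relative_gain(cap, vla):+.0f} % relative")
```

This reproduces the +56 %, +58 % and +56 % figures in the table.
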
  • Generalization: CAP transferred to new robot arms (different kinematic chains) and novel tabletop layouts without any fine‑tuning.
  • Sample Efficiency: The same performance level would require >200 h of data for language‑conditioned baselines.
  • Simulation‑Real Gap: The EgoGym loop reduced the sim‑to‑real discrepancy to <5 % in success rates, a dramatic improvement over naïve sim‑only training.

Practical Implications

  • Faster Prototyping: Engineers can iterate on manipulation pipelines quickly in EgoGym, cutting down the number of hardware testing cycles.
  • Robust Deployment: By anchoring policies to physical contacts, robots become less prone to misinterpretations of ambiguous language commands, leading to safer operation in unstructured environments (e.g., warehouses, homes).
  • Modular Skill Libraries: Companies can build a catalog of reusable contact utilities (grasp, push, slide) that can be mixed and matched for new tasks, reducing the need for task‑specific retraining.
  • Lower Data Costs: Small‑scale data collection (23 h of demonstrations in this study) is sufficient, making robot learning feasible for startups and research labs without massive data‑labeling budgets.
  • Hardware‑agnostic Solutions: Because contact points are expressed in world coordinates, the same utility models can be deployed on different robot platforms with minimal calibration.
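
The hardware‑agnostic claim rests on contacts living in a shared world frame. Below is a minimal sketch of what the remaining per‑robot calibration might amount to. It assumes each arm only needs its base pose in the world, and it uses a planar, yaw‑only mount for simplicity. The contact target stays fixed while only the rigid transform changes. The functions `world_from_base`, `invert_rigid`, `transform` and `contact_in_base` are hypothetical helpers, not part of the CAP release.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Matrix4 = List[List[float]]  # 4x4 homogeneous transform, row-major


def world_from_base(yaw: float, tx: float, ty: float, tz: float) -> Matrix4:
    """Pose of a robot base in the world: rotation `yaw` about z, then translation.
    A planar (yaw-only) mounting is an illustrative assumption."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def invert_rigid(T: Matrix4) -> Matrix4:
    """Inverse of a rigid transform [R | t] is [R^T | -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t = [T[i][3] for i in range(3)]
    ti = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [ti[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transform(T: Matrix4, p: Point3) -> Point3:
    """Apply a homogeneous transform to a 3-D point."""
    x, y, z = p
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3))


def contact_in_base(contact_world: Point3, T_world_base: Matrix4) -> Point3:
    """Re-express a world-frame contact in one robot's base frame."""
    return transform(invert_rigid(T_world_base), contact_world)


# Usage: the same world-frame contact, seen by two differently mounted arms.
contact = (1.0, 1.0, 0.5)
arm_a = world_from_base(0.0, 0.0, 0.0, 0.0)
arm_b = world_from_base(math.pi / 2, 1.0, 0.0, 0.0)
print(contact_in_base(contact, arm_a))
print(contact_in_base(contact, arm_b))
```

Under this assumption, swapping embodiments means measuring one base pose rather than retraining the utility model. Each arm would still need its own inverse kinematics to reach the re‑expressed contact.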

Limitations & Future Work

  • Contact Annotation Overhead: The current pipeline still requires manual labeling of contact points in demonstrations, which may not scale to highly complex tasks.
  • Limited Skill Set: The study focuses on three fundamental manipulation primitives; extending CAP to high‑dimensional tasks like assembly or deformable‑object handling remains open.
  • Simulation Fidelity: While EgoGym is lightweight, it abstracts away fine‑grained dynamics (e.g., friction variations) that could affect performance on highly tactile tasks.
  • Real‑World Perception: The approach assumes accurate 3‑D perception of contact locations; noisy depth sensors could degrade utility predictions.

Future research directions include automated contact extraction from raw video, expanding the utility library to cover a broader taxonomy of contacts, and integrating tactile feedback to refine contact‑conditioned policies on‑the‑fly.

Authors

  • Zichen Jeff Cui
  • Omar Rayyan
  • Haritheja Etukuru
  • Bowen Tan
  • Zavier Andrianarivo
  • Zicheng Teng
  • Yihang Zhou
  • Krish Mehta
  • Nicholas Wojno
  • Kevin Yuanbo Wu
  • Manan H Anjaria
  • Ziyuan Wu
  • Manrong Mao
  • Guangxun Zhang
  • Binit Shah
  • Yejin Kim
  • Soumith Chintala
  • Lerrel Pinto
  • Nur Muhammad Mahi Shafiullah

Paper Information

  • arXiv ID: 2602.09017v1
  • Categories: cs.RO, cs.LG
  • Published: February 9, 2026