[Paper] Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models
Source: arXiv - 2602.09017v1
Overview
The paper introduces Contact‑Anchored Policies (CAP), a new way to teach robots to manipulate objects by conditioning policies on where the robot makes contact rather than on abstract language commands. By decomposing the policy into modular “utility models,” the authors can rapidly prototype and debug in a lightweight simulator (EgoGym) before deploying to real hardware, achieving strong generalization with only 23 hours of demonstrations.
Key Contributions
- Contact‑based conditioning: Replaces language prompts with explicit 3‑D contact points, giving the robot a concrete physical reference for action planning.
- Modular utility‑model library: Decomposes a monolithic policy into reusable sub‑models that predict the utility of a contact configuration, enabling easier debugging and transfer.
- Real‑to‑sim iteration loop: Introduces EgoGym, a fast‑to‑run simulation benchmark that mirrors the real‑world setup, allowing rapid identification of failure modes and data augmentation.
- Data efficiency: Demonstrates strong performance on three core manipulation skills using only 23 h of human‑provided demonstrations.
- Zero‑shot superiority: Outperforms large vision‑language‑action models (VLAs) by ≈56 % (relative) in zero‑shot tests on unseen environments and robot embodiments.
- Open‑source release: All code, simulation assets, hardware designs, and datasets will be publicly available, lowering the barrier for reproducible robot learning research.
Methodology
- Contact Representation:
  - Each demonstration is annotated with a set of 3‑D points where the robot’s end‑effector (or other links) touches the environment or object.
  - These points are fed to a neural utility model that predicts the expected “task success” for that contact configuration.
- Utility Model Library:
  - Instead of a single end‑to‑end policy, CAP builds a collection of lightweight models (one per skill or contact primitive).
  - At runtime, the system selects and composes the relevant utilities to generate a full action sequence.
- EgoGym Simulation Loop:
  - A stripped‑down physics simulator that mirrors the real robot’s kinematics and sensor suite.
  - Researchers run large‑scale sweeps of contact configurations, automatically flagging those that lead to failures (e.g., slippage, unreachable poses).
  - Failure cases are fed back into the data collection pipeline, either by augmenting the simulation or by guiding additional real‑world demos.
- Training & Deployment:
  - The utility models are trained with supervised learning on the 23 h of demonstrations, using a simple binary success label.
  - During deployment, a planner samples candidate contacts, queries the utility models, and executes the highest‑scoring plan on the physical robot.
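The training and deployment steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: a logistic regression stands in for the neural utility model, the toy success rule (contacts with z > 0.5 succeed) and the uniform candidate sampler are assumptions made only for this example, and every function name is hypothetical.

```python
import math
import random


def features(contact):
    """Feature vector for a 3-D contact point: (x, y, z) plus a bias term."""
    x, y, z = contact
    return (x, y, z, 1.0)


def predict_utility(weights, contact):
    """Predicted probability of task success for one contact point."""
    score = sum(w * f for w, f in zip(weights, features(contact)))
    return 1.0 / (1.0 + math.exp(-score))


def train_utility_model(contacts, labels, lr=0.5, epochs=500):
    """Fit a logistic regression on binary success labels by gradient descent."""
    weights = [0.0] * 4
    n = len(contacts)
    for _ in range(epochs):
        grad = [0.0] * 4
        for contact, label in zip(contacts, labels):
            err = predict_utility(weights, contact) - label
            for i, f in enumerate(features(contact)):
                grad[i] += err * f / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def plan_contact(weights, num_candidates=200, seed=0):
    """Sample candidate contacts and return the one with the highest utility."""
    rng = random.Random(seed)
    candidates = [
        tuple(rng.uniform(0.0, 1.0) for _ in range(3))
        for _ in range(num_candidates)
    ]
    return max(candidates, key=lambda c: predict_utility(weights, c))


# Toy "demonstrations" (assumption): a contact succeeds when z > 0.5.
data_rng = random.Random(42)
demo_contacts = [
    tuple(data_rng.uniform(0.0, 1.0) for _ in range(3)) for _ in range(200)
]
demo_labels = [1.0 if c[2] > 0.5 else 0.0 for c in demo_contacts]

weights = train_utility_model(demo_contacts, demo_labels)
best_contact = plan_contact(weights)
print("best contact:", best_contact)
print("predicted utility:", predict_utility(weights, best_contact))
```

In a real system the sampler would presumably propose contacts on observed object surfaces and the selected contact would be handed to a motion planner; both are outside the scope of this sketch.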
Results & Findings
| Skill | Zero‑shot success (CAP) | Zero‑shot success (state‑of‑the‑art VLA) | Relative gain |
|---|---|---|---|
| Pick‑and‑Place | 78 % | 50 % | +56 % |
| Drawer Opening | 71 % | 45 % | +58 % |
| Tool Use (lever) | 64 % | 41 % | +56 % |
- Generalization: CAP transferred to new robot arms (different kinematic chains) and novel tabletop layouts without any fine‑tuning.
- Sample Efficiency: Language‑conditioned baselines would need >200 h of data to reach the same performance level.
- Simulation‑Real Gap: The EgoGym loop reduced the sim‑to‑real discrepancy to <5 % in success rates, a dramatic improvement over naïve sim‑only training.
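The "Relative gain" column in the table is easy to misread as a difference in percentage points. The short check below, which assumes the column is the ratio of the two success rates minus one (an interpretation inferred from the numbers, not stated in the source), reproduces the listed figures and prints the absolute gaps alongside them. The function name is hypothetical.

```python
def relative_gain(cap_success, baseline_success):
    """Relative improvement of CAP over the baseline, in percent."""
    return (cap_success / baseline_success - 1.0) * 100.0


# Success rates (in %) copied from the results table: (CAP, VLA baseline).
results = {
    "Pick-and-Place": (78, 50),
    "Drawer Opening": (71, 45),
    "Tool Use (lever)": (64, 41),
}

for skill, (cap, vla) in results.items():
    gain = relative_gain(cap, vla)
    print(f"{skill}: +{gain:.0f} % relative, +{cap - vla} points absolute")
```

Under this reading, a +56 % relative gain on Pick‑and‑Place corresponds to an absolute gap of 28 percentage points.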
Practical Implications
- Faster Prototyping: Engineers can iterate on manipulation pipelines in seconds using EgoGym, dramatically cutting down hardware testing cycles.
- Robust Deployment: By anchoring policies to physical contacts, robots become less prone to misinterpretations of ambiguous language commands, leading to safer operation in unstructured environments (e.g., warehouses, homes).
- Modular Skill Libraries: Companies can build a catalog of reusable contact utilities (grasp, push, slide) that can be mixed‑and‑matched for new tasks, reducing the need for task‑specific retraining.
- Lower Data Costs: Small‑scale data collection (a few dozen hours) is sufficient, making robot learning feasible for startups and research labs without massive data‑labeling budgets.
- Hardware‑agnostic Solutions: Because contact points are expressed in world coordinates, the same utility models can be deployed on different robot platforms with minimal calibration.
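To illustrate the hardware‑agnostic point, the sketch below converts one world‑frame contact point into the base frames of two differently placed robots. It assumes that the only per‑robot calibration is the base pose in the world frame, restricted here to a translation plus a rotation about the vertical axis; the function name and the example poses are hypothetical.

```python
import math


def world_to_base(point_world, base_position, base_yaw):
    """Express a world-frame point in a robot's base frame.

    base_position and base_yaw give the pose of the robot base in the
    world frame (rotation about the z axis only, an assumption).
    """
    dx = point_world[0] - base_position[0]
    dy = point_world[1] - base_position[1]
    dz = point_world[2] - base_position[2]
    c, s = math.cos(base_yaw), math.sin(base_yaw)
    # p_base = R_z(yaw)^T @ (p_world - t)
    return (c * dx + s * dy, -s * dx + c * dy, dz)


# One contact target in world coordinates, shared by both robots.
contact_world = (1.0, 0.5, 0.2)

# Two hypothetical robots with different base calibrations.
robot_a = world_to_base(contact_world, base_position=(0.0, 0.0, 0.0), base_yaw=0.0)
robot_b = world_to_base(contact_world, base_position=(1.0, 0.0, 0.0), base_yaw=math.pi / 2)

print("robot A base frame:", robot_a)
print("robot B base frame:", robot_b)
```

The contact target itself never changes; only the per‑robot transform does, which is one plausible reading of "minimal calibration".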
Limitations & Future Work
- Contact Annotation Overhead: The current pipeline still requires manual labeling of contact points in demonstrations, which may not scale to highly complex tasks.
- Limited Skill Set: The study focuses on three fundamental manipulation primitives; extending CAP to high‑dimensional tasks like assembly or deformable‑object handling remains open.
- Simulation Fidelity: While EgoGym is lightweight, it abstracts away fine‑grained dynamics (e.g., friction variations) that could affect performance on highly tactile tasks.
- Real‑World Perception: The approach assumes accurate 3‑D perception of contact locations; noisy depth sensors could degrade utility predictions.
Future research directions include automated contact extraction from raw video, expanding the utility library to cover a broader taxonomy of contacts, and integrating tactile feedback to refine contact‑conditioned policies on‑the‑fly.
Authors
- Zichen Jeff Cui
- Omar Rayyan
- Haritheja Etukuru
- Bowen Tan
- Zavier Andrianarivo
- Zicheng Teng
- Yihang Zhou
- Krish Mehta
- Nicholas Wojno
- Kevin Yuanbo Wu
- Manan H Anjaria
- Ziyuan Wu
- Manrong Mao
- Guangxun Zhang
- Binit Shah
- Yejin Kim
- Soumith Chintala
- Lerrel Pinto
- Nur Muhammad Mahi Shafiullah
Paper Information
- arXiv ID: 2602.09017v1
- Categories: cs.RO, cs.LG
- Published: February 9, 2026