[Paper] CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation

Published: December 16, 2025

Source: arXiv - 2512.14689v1

Overview

The paper introduces CHIP (adaptive Compliance for Humanoid control through hindsight Perturbation), a lightweight “plug‑and‑play” module that lets a humanoid robot dynamically adjust the stiffness of its end‑effector while still faithfully tracking fast, dynamic motions (e.g., backflips, running). By decoupling compliance from the core motion‑tracking policy, CHIP enables a single learned controller to handle a wide variety of forceful manipulation tasks—pushing, wiping, collaborative lifting—without extra data collection or reward engineering.

Key Contributions

  • CHIP module: a generic, runtime‑attachable layer that injects controllable compliance into any existing motion‑tracking controller.
  • Hindsight perturbation: a novel training trick that simulates compliance demands after the fact, letting the policy learn to recover from unexpected forces without explicit augmentation.
  • One‑policy‑fits‑all: Demonstrates that a single, generalist controller equipped with CHIP can execute diverse manipulation scenarios (multi‑robot hand‑over, door opening, box delivery, wiping) that traditionally require task‑specific tuning.
  • Zero‑reward‑tuning: Shows that compliance can be achieved without redesigning reward functions or adding auxiliary loss terms, simplifying the RL pipeline.
  • Real‑time plug‑in: CHIP runs at control frequency (≈1 kHz) and requires only a few additional parameters, making it practical for on‑board deployment.

Methodology

  1. Base Motion‑Tracking Policy – The authors start with a reinforcement‑learning (RL) policy that learns to follow high‑frequency reference trajectories (e.g., a backflip). The policy receives proprioceptive observations and outputs joint torques.
  2. Hindsight Perturbation – During training, after each rollout the algorithm retrospectively injects a virtual external force at the end‑effector (the “hindsight” part). It then asks the policy to re‑track the original trajectory despite this disturbance. This forces the policy to learn how to modulate joint torques to absorb or counteract forces.
  3. Compliance Parameter – At inference time, a scalar compliance gain (c) is supplied to CHIP. The module blends the nominal torque output with a corrective term proportional to the measured end‑effector deviation, effectively softening or stiffening the interaction.
  4. Plug‑and‑Play Integration – CHIP sits between the policy and the robot’s low‑level controller; no changes to the policy architecture or the RL loss are needed.
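The hindsight relabeling in step 2 can be sketched as a small function. This is a minimal illustration assuming a dict-of-arrays rollout layout; the field name `external_force`, the force range, and the constant-force simplification are assumptions for clarity, not the paper's actual interfaces.

```python
import numpy as np

def hindsight_perturb(rollout, rng, max_force=30.0):
    """Attach a retrospectively sampled end-effector force to a rollout.

    The perturbed copy is then re-tracked against the original reference
    trajectory, so the policy learns torque modulation that absorbs the
    disturbance. (Illustrative sketch; interfaces are hypothetical.)
    """
    # Sample a virtual external force "in hindsight" at the end-effector.
    force = rng.uniform(-max_force, max_force, size=3)  # [N], 3-D

    # Copy the rollout so the nominal trajectory is left untouched.
    perturbed = {k: np.copy(v) for k, v in rollout.items()}

    # Hold the sampled force constant over the episode for simplicity.
    perturbed["external_force"] = np.tile(force, (len(rollout["obs"]), 1))
    return perturbed
```

Because the perturbation is injected after the rollout, no disturbance model is needed during data collection, which is what keeps the trick "plug‑and‑play".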

The overall pipeline is illustrated as:

Reference Trajectory → RL Policy → CHIP (compliance gain) → Torque Commands → Robot
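The blending in step 3 can be sketched as follows. The gain convention (c = 0 rigid, c = 1 fully compliant) matches the results section; the stiffness constant `k_stiff` and the Jacobian-transpose mapping are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def chip_torque(tau_nominal, jacobian, ee_deviation, c, k_stiff=200.0):
    """Blend policy torques with a correction proportional to EE deviation.

    tau_nominal  -- torque command from the motion-tracking policy (n,)
    jacobian     -- end-effector Jacobian, shape (3, n)
    ee_deviation -- measured Cartesian end-effector deviation (3,)
    c            -- compliance gain in [0, 1]: 0 = rigid, 1 = fully compliant
    """
    # Map the Cartesian deviation into joint space via J^T, then attenuate
    # the restoring torque as the compliance gain c increases.
    tau_corrective = (1.0 - c) * k_stiff * (jacobian.T @ ee_deviation)
    return tau_nominal + tau_corrective
```

At c = 1 the corrective term vanishes and the end‑effector yields to external forces; at c = 0 the full restoring torque is applied, recovering stiff tracking.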

Results & Findings

| Scenario | Compliance needed | Success rate (w/ CHIP) | Success rate (baseline) |
| --- | --- | --- | --- |
| Multi‑robot hand‑over (co‑lifting) | High (soft) | 92 % | 45 % |
| Door opening (push‑pull) | Medium | 88 % | 33 % |
| Wiping a table (sliding contact) | Low (soft) | 95 % | 51 % |
| Box delivery (carrying) | High (stiff) | 90 % | 87 % |
  • Compliance control: By varying the gain (c) from 0 (rigid) to 1 (fully compliant), the same policy smoothly transitions between stiff pushing and gentle sliding.
  • No extra data: Training time and sample efficiency are comparable to the baseline policy (≈2 M environment steps).
  • Real‑robot validation: Experiments on a 30‑kg humanoid platform show stable execution of a backflip followed by a door‑opening sequence, with end‑effector forces staying within safe limits (< 30 N).

These results confirm that CHIP can endow a high‑performance locomotion controller with the dexterity needed for forceful manipulation.

Practical Implications

  • Unified controller stack: Robotics teams can maintain a single RL policy for locomotion and manipulation, reducing engineering overhead and simplifying version control.
  • Rapid prototyping: Developers can test new manipulation tasks by merely tweaking the compliance gain, avoiding costly re‑training or reward redesign.
  • Safety & human‑robot interaction: Adjustable compliance makes humanoids safer around people (e.g., soft hand‑overs, compliant wiping) without sacrificing agility.
  • Multi‑robot collaboration: CHIP’s ability to soften the end‑effector on demand facilitates cooperative tasks where force sharing is critical (e.g., jointly lifting heavy objects).
  • Edge deployment: The module’s low computational footprint means it can run on embedded CPUs/GPUs typical of mobile robots, enabling on‑board adaptation in the field.

Limitations & Future Work

  • Model‑dependence: CHIP assumes reasonably accurate proprioceptive sensing and a dynamics model that can predict end‑effector forces; noisy sensors could degrade compliance behavior.
  • Single‑dimensional gain: The current implementation uses a scalar compliance parameter; richer, direction‑specific stiffness matrices could improve performance on anisotropic tasks.
  • Transfer to real hardware: While the authors demonstrated on one platform, broader validation across different humanoid morphologies and actuation schemes remains open.
  • Learning from real perturbations: Future work could incorporate real‑world contact events (e.g., unexpected collisions) into the hindsight perturbation loop to further close the sim‑to‑real gap.

Overall, CHIP offers a pragmatic bridge between high‑speed locomotion and delicate manipulation, opening the door for more versatile humanoid robots in everyday environments.

Authors

  • Sirui Chen
  • Zi‑ang Cao
  • Zhengyi Luo
  • Fernando Castañeda
  • Chenran Li
  • Tingwu Wang
  • Ye Yuan
  • Linxi “Jim” Fan
  • C. Karen Liu
  • Yuke Zhu

Paper Information

  • arXiv ID: 2512.14689v1
  • Categories: cs.RO, cs.LG
  • Published: December 16, 2025