[Paper] CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation

Published: December 16, 2025

Source: arXiv - 2512.14689v1

Overview

The paper introduces CHIP (adaptive Compliance for Humanoid control through hindsight Perturbation), a lightweight “plug‑and‑play” module that lets a humanoid robot dynamically adjust the stiffness of its end‑effector while still faithfully tracking fast, dynamic motions (e.g., backflips, running). By decoupling compliance from the core motion‑tracking policy, CHIP enables a single learned controller to handle a wide variety of forceful manipulation tasks—pushing, wiping, collaborative lifting—without extra data collection or reward engineering.

Key Contributions

  • CHIP module: a generic, runtime‑attachable layer that injects controllable compliance into any existing motion‑tracking controller.
  • Hindsight perturbation: a novel training trick that simulates compliance demands after the fact, letting the policy learn to recover from unexpected forces without explicit augmentation.
  • One‑policy‑fits‑all: Demonstrates that a single, generalist controller equipped with CHIP can execute diverse manipulation scenarios (multi‑robot hand‑over, door opening, box delivery, wiping) that traditionally require task‑specific tuning.
  • Zero‑reward‑tuning: Shows that compliance can be achieved without redesigning reward functions or adding auxiliary loss terms, simplifying the RL pipeline.
  • Real‑time plug‑in: CHIP runs at control frequency (≈1 kHz) and requires only a few additional parameters, making it practical for on‑board deployment.

Methodology

  1. Base Motion‑Tracking Policy – The authors start with a reinforcement‑learning (RL) policy that learns to follow high‑frequency reference trajectories (e.g., a backflip). The policy receives proprioceptive observations and outputs joint torques.
  2. Hindsight Perturbation – During training, after each rollout the algorithm retrospectively injects a virtual external force at the end‑effector (the “hindsight” part). It then asks the policy to re‑track the original trajectory despite this disturbance. This forces the policy to learn how to modulate joint torques to absorb or counteract forces.
  3. Compliance Parameter – At inference time, a scalar compliance gain (c) is supplied to CHIP. The module blends the nominal torque output with a corrective term proportional to the measured end‑effector deviation, effectively softening or stiffening the interaction.
  4. Plug‑and‑Play Integration – CHIP sits between the policy and the robot’s low‑level controller; no changes to the policy architecture or the RL loss are needed.
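The hindsight relabeling in step 2 can be sketched as a small function. This is a minimal illustration assuming a dict-of-arrays rollout layout; the field name `external_force`, the force range, and the constant-force simplification are assumptions for clarity, not the paper's actual interfaces.

```python
import numpy as np

def hindsight_perturb(rollout, rng, max_force=30.0):
    """Attach a retrospectively sampled end-effector force to a rollout.

    The perturbed copy is then re-tracked against the original reference
    trajectory, so the policy learns torque modulation that absorbs the
    disturbance. (Illustrative sketch; interfaces are hypothetical.)
    """
    # Sample a virtual external force "in hindsight" at the end-effector.
    force = rng.uniform(-max_force, max_force, size=3)  # [N], 3-D

    # Copy the rollout so the nominal trajectory is left untouched.
    perturbed = {k: np.copy(v) for k, v in rollout.items()}

    # Hold the sampled force constant over the episode for simplicity.
    perturbed["external_force"] = np.tile(force, (len(rollout["obs"]), 1))
    return perturbed
```

Because the perturbation is injected after the rollout, no disturbance model is needed during data collection, which is what keeps the trick "plug‑and‑play".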

The overall pipeline is illustrated as:

Reference Trajectory → RL Policy → CHIP (compliance gain) → Torque Commands → Robot
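The blending in step 3 can be sketched as follows. The gain convention (c = 0 rigid, c = 1 fully compliant) matches the results section; the stiffness constant `k_stiff` and the Jacobian-transpose mapping are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def chip_torque(tau_nominal, jacobian, ee_deviation, c, k_stiff=200.0):
    """Blend policy torques with a correction proportional to EE deviation.

    tau_nominal  -- torque command from the motion-tracking policy (n,)
    jacobian     -- end-effector Jacobian, shape (3, n)
    ee_deviation -- measured Cartesian end-effector deviation (3,)
    c            -- compliance gain in [0, 1]: 0 = rigid, 1 = fully compliant
    """
    # Map the Cartesian deviation into joint space via J^T, then attenuate
    # the restoring torque as the compliance gain c increases.
    tau_corrective = (1.0 - c) * k_stiff * (jacobian.T @ ee_deviation)
    return tau_nominal + tau_corrective
```

At c = 1 the corrective term vanishes and the end‑effector yields to external forces; at c = 0 the full restoring torque is applied, recovering stiff tracking.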

Results & Findings

| Scenario | Compliance needed | Success rate (w/ CHIP) | Success rate (baseline) |
| --- | --- | --- | --- |
| Multi‑robot hand‑over (co‑lifting) | High (soft) | 92 % | 45 % |
| Door opening (push‑pull) | Medium | 88 % | 33 % |
| Wiping a table (sliding contact) | Low (soft) | 95 % | 51 % |
| Box delivery (carrying) | High (stiff) | 90 % | 87 % |
  • Compliance control: By varying the gain (c) from 0 (rigid) to 1 (fully compliant), the same policy smoothly transitions between stiff pushing and gentle sliding.
  • No extra data: Training time and sample efficiency are comparable to the baseline policy (≈2 M environment steps).
  • Real‑robot validation: Experiments on a 30‑kg humanoid platform show stable execution of a backflip followed by a door‑opening sequence, with end‑effector forces staying within safe limits (< 30 N).

These results confirm that CHIP can endow a high‑performance locomotion controller with the dexterity needed for forceful manipulation.

Practical Implications

  • Unified controller stack: Robotics teams can maintain a single RL policy for locomotion and manipulation, reducing engineering overhead and simplifying version control.
  • Rapid prototyping: Developers can test new manipulation tasks by merely tweaking the compliance gain, avoiding costly re‑training or reward redesign.
  • Safety & human‑robot interaction: Adjustable compliance makes humanoids safer around people (e.g., soft hand‑overs, compliant wiping) without sacrificing agility.
  • Multi‑robot collaboration: CHIP’s ability to soften the end‑effector on demand facilitates cooperative tasks where force sharing is critical (e.g., jointly lifting heavy objects).
  • Edge deployment: The module’s low computational footprint means it can run on embedded CPUs/GPUs typical of mobile robots, enabling on‑board adaptation in the field.

Limitations & Future Work

  • Model‑dependence: CHIP assumes reasonably accurate proprioceptive sensing and a dynamics model that can predict end‑effector forces; noisy sensors could degrade compliance behavior.
  • Single‑dimensional gain: The current implementation uses a scalar compliance parameter; richer, direction‑specific stiffness matrices could improve performance on anisotropic tasks.
  • Transfer to real hardware: While the authors demonstrated on one platform, broader validation across different humanoid morphologies and actuation schemes remains open.
  • Learning from real perturbations: Future work could incorporate real‑world contact events (e.g., unexpected collisions) into the hindsight perturbation loop to further close the sim‑to‑real gap.

Overall, CHIP offers a pragmatic bridge between high‑speed locomotion and delicate manipulation, opening the door for more versatile humanoid robots in everyday environments.

Authors

  • Sirui Chen
  • Zi‑ang Cao
  • Zhengyi Luo
  • Fernando Castañeda
  • Chenran Li
  • Tingwu Wang
  • Ye Yuan
  • Linxi “Jim” Fan
  • C. Karen Liu
  • Yuke Zhu

Paper Information

  • arXiv ID: 2512.14689v1
  • Categories: cs.RO, cs.LG
  • Published: December 16, 2025