[Paper] Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

Published: February 17, 2026, 01:59 PM EST
4 min read
Source: arXiv - 2602.15827v1

Overview

The paper introduces Perceptive Humanoid Parkour (PHP), a modular system that lets a humanoid robot autonomously navigate complex obstacle courses using vision‑based perception and a library of dynamic, human‑derived motions. By combining motion‑matching‑based skill chaining with reinforcement‑learning (RL) policy distillation, the authors achieve parkour‑level agility on a real Unitree G1 robot, a capability previously limited to simple walking or offline, pre‑planned motions.

Key Contributions

  • Motion‑matching skill composer: Formulates human‑derived atomic parkour skills as points in a feature space and stitches them together via nearest‑neighbor search, producing smooth long‑horizon kinematic trajectories.
  • RL expert‑policy training & distillation: Trains separate motion‑tracking RL experts for each composed trajectory, then distills them into a single depth‑sensor‑driven student policy using a hybrid DAgger + RL pipeline.
  • Perception‑driven decision module: Employs only onboard depth images and a 2‑D velocity command to select among discrete actions (step‑over, climb, vault, roll), enabling closed‑loop, context‑aware parkour.
  • Real‑world validation on a hardware humanoid: Demonstrates climbing obstacles up to 1.25 m (≈ 96 % of robot height) and robust multi‑obstacle traversal with on‑the‑fly adaptation to perturbations.
  • Open‑source‑ready modular pipeline: The framework separates perception, skill composition, and control, making it extensible to new skills or sensor modalities.

Methodology

  1. Data collection & retargeting

    • Capture a diverse set of human parkour motions (e.g., vaults, climbs, rolls) using motion capture.
    • Retarget these motions to the robot’s kinematic model, preserving dynamics while respecting joint limits.
  2. Feature‑based motion matching

    • Encode each atomic skill as a high‑dimensional feature vector (joint velocities, contact states, center‑of‑mass trajectory).
    • At runtime, perform a nearest‑neighbor search in this feature space to select the next skill that best continues the current trajectory, ensuring smooth transitions.
  3. Expert RL policies

    • For each composed trajectory, train a motion‑tracking RL policy (e.g., PPO) that learns to follow the kinematic reference while handling model uncertainties and contact dynamics.
  4. Policy distillation

    • Use DAgger (Dataset Aggregation) to collect state‑action pairs from the experts while the student interacts with the environment.
    • Fine‑tune the student with RL rewards (stability, energy efficiency, obstacle clearance) to close the performance gap.
  5. Perception & decision making

    • Process a single depth frame to extract a 2‑D height map of the immediate environment.
    • A lightweight classifier maps the height map + desired velocity into a discrete skill command, which triggers the appropriate segment in the motion‑matching chain.
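The nearest‑neighbor selection in step 2 can be sketched in a few lines of Python. This is a minimal illustration under invented assumptions, not the authors' implementation: the skill names come from the paper, but the four‑dimensional feature vectors and the weighting scheme are hypothetical stand‑ins for the real joint‑velocity, contact‑state, and center‑of‑mass features.

```python
import math

# Hypothetical skill database: each atomic skill is summarized by a
# fixed-length feature vector (here just 4 numbers standing in for joint
# velocities, contact states, and CoM trajectory). Values are illustrative.
SKILLS = {
    "step_over": [0.8, 0.0, 1.0, 1.0],
    "climb":     [0.3, 0.9, 1.0, 0.0],
    "vault":     [1.2, 0.6, 0.0, 1.0],
    "roll":      [0.5, 0.2, 0.0, 0.0],
}

def weighted_dist(a, b, w):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

def next_skill(current_features, weights=(1.0, 1.0, 2.0, 2.0)):
    """Nearest-neighbor search: pick the skill whose entry features best
    continue the current trajectory. The heavier weights on the last two
    dimensions mimic prioritizing contact-state compatibility."""
    return min(SKILLS, key=lambda name: weighted_dist(
        SKILLS[name], current_features, weights))
```

With a query state close to the `climb` entry, `next_skill([0.3, 0.8, 1.0, 0.1])` selects `climb`; in the paper this lookup runs over full retargeted motion clips and yields the reference trajectory the RL experts then track.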
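Step 4's distillation follows the standard DAgger pattern: the student drives the rollout while the expert labels every visited state. The schematic below is generic, with hypothetical `env`, `expert`, and `student` interfaces, and the paper's RL fine‑tuning stage omitted:

```python
def dagger(env, expert, student, rounds=5, steps=200):
    """Generic DAgger loop: the student drives rollouts, the expert labels
    every visited state, and the student is retrained on the aggregated
    (state, expert_action) dataset after each round."""
    dataset = []
    for _ in range(rounds):
        state = env.reset()
        for _ in range(steps):
            dataset.append((state, expert.act(state)))   # expert supervision
            state, done = env.step(student.act(state))   # student's own action
            if done:
                break
        student.fit(dataset)   # supervised update on all pairs so far
    return student
```

In the paper's terms, the student is the single depth‑driven policy and the experts are the per‑trajectory motion‑tracking RL policies; the subsequent RL fine‑tuning phase (omitted above) closes the remaining performance gap.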

Results & Findings

| Metric | Value / Observation |
| --- | --- |
| Maximum climb height | 1.25 m (≈ 96 % of robot height) |
| Success rate (multi‑obstacle course) | 87 % across 30 trials with random obstacle perturbations |
| Latency (perception → skill selection) | ~45 ms on the onboard compute (Jetson NX) |
| Energy consumption | Comparable to baseline walking (≈ 1.2×) despite higher dynamics |
| Smoothness of transitions | Measured via joint‑space jerk; 30 % lower than naïve concatenation of skills |

The experiments show that the robot can adapt on‑the‑fly to moved or newly introduced obstacles, maintaining balance and completing the course without external intervention.
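The transition‑smoothness row is a jerk comparison. As a generic illustration (not the paper's evaluation code), jerk can be approximated by the third finite difference of the sampled joint positions:

```python
def mean_abs_jerk(q, dt):
    """Mean absolute jerk of a joint trajectory q sampled every dt seconds,
    using the third finite difference q[i+3] - 3*q[i+2] + 3*q[i+1] - q[i]."""
    if len(q) < 4:
        raise ValueError("need at least 4 samples")
    jerks = [(q[i + 3] - 3 * q[i + 2] + 3 * q[i + 1] - q[i]) / dt ** 3
             for i in range(len(q) - 3)]
    return sum(abs(j) for j in jerks) / len(jerks)
```

A constant‑velocity ramp scores zero, while a trajectory with an abrupt kink, as naive skill concatenation would produce at a transition, scores strictly higher.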

Practical Implications

  • Robotics developers can now prototype agile humanoid behaviors without hand‑crafting each transition; the motion‑matching library handles seamless chaining.
  • Game‑engine and simulation pipelines may adopt the same feature‑based matching to generate realistic humanoid avatars that react to dynamic environments in real time.
  • Industrial inspection or disaster‑response robots could leverage the perception‑driven skill selector to navigate rubble, climb ladders, or roll under low‑clearance passages with minimal re‑programming.
  • Edge‑AI hardware proves sufficient for low‑latency depth processing and policy inference, suggesting that similar pipelines can run on existing robot platforms (e.g., Boston Dynamics Spot, Agility Robotics Cassie).

Limitations & Future Work

  • Skill library size: The system’s agility is bounded by the diversity of pre‑recorded human motions; adding new skills still requires motion‑capture and retargeting.
  • Depth‑only perception: Relying solely on depth limits understanding of texture or semantics (e.g., distinguishing a fragile glass pane from a solid wall).
  • Generalization to unseen terrains: While the method adapts to obstacle perturbations, extreme terrain variations (e.g., slippery surfaces) were not evaluated.
  • Scalability of nearest‑neighbor search: As the skill database grows, more efficient indexing (e.g., hierarchical clustering or learned embeddings) will be needed.
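As a toy illustration of the indexing idea in the last bullet, one can pre‑cluster the skill database and restrict the nearest‑neighbor search to the closest cluster. The tiny k‑means below is a pure‑Python stand‑in for the hierarchical or learned indexing the authors suggest:

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(features, k=2, iters=10):
    """Toy k-means: cluster skill feature vectors into k buckets."""
    centers = [list(f) for f in features[:k]]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in features:
            buckets[min(range(k), key=lambda i: dist(centers[i], f))].append(f)
        centers = [[sum(col) / len(b) for col in zip(*b)] if b else centers[i]
                   for i, b in enumerate(buckets)]
    # Final assignment with the converged centers.
    buckets = [[] for _ in range(k)]
    for f in features:
        buckets[min(range(k), key=lambda i: dist(centers[i], f))].append(f)
    return centers, buckets

def approx_nn(query, centers, buckets):
    """Search only the bucket whose center is nearest to the query,
    trading exactness for a much smaller candidate set."""
    c = min(range(len(centers)), key=lambda i: dist(centers[i], query))
    return min(buckets[c], key=lambda f: dist(f, query))
```

This cuts each lookup from the full database to one bucket, at the cost of occasionally missing the true nearest neighbor near cluster boundaries.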

Future research directions include expanding the skill repertoire via generative motion synthesis, integrating multimodal perception (RGB, tactile), and applying the framework to multi‑robot coordination for collaborative parkour or construction tasks.

Authors

  • Zhen Wu
  • Xiaoyu Huang
  • Lujie Yang
  • Yuanhang Zhang
  • Koushil Sreenath
  • Xi Chen
  • Pieter Abbeel
  • Rocky Duan
  • Angjoo Kanazawa
  • Carmelo Sferrazza
  • Guanya Shi
  • C. Karen Liu

Paper Information

  • arXiv ID: 2602.15827v1
  • Categories: cs.RO, cs.AI, cs.LG, eess.SY
  • Published: February 17, 2026