[Paper] Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

Published: February 17, 2026, 01:59 PM EST
4 min read
Source: arXiv - 2602.15827v1

Overview

The paper introduces Perceptive Humanoid Parkour (PHP), a modular system that lets a humanoid robot autonomously navigate complex obstacle courses using vision‑based perception and a library of dynamic, human‑derived motions. By combining motion‑matching‑based skill chaining with reinforcement‑learning (RL) policy distillation, the authors achieve parkour‑level agility on a real Unitree G1 robot, a capability previously limited to simple walking or offline, pre‑planned motions.

Key Contributions

  • Motion‑matching skill composer: Formulates human‑derived atomic parkour skills as points in a feature space and stitches them together via nearest‑neighbor search, producing smooth long‑horizon kinematic trajectories.
  • RL expert‑policy training & distillation: Trains separate motion‑tracking RL experts for each composed trajectory, then distills them into a single depth‑sensor‑driven student policy using a hybrid DAgger + RL pipeline.
  • Perception‑driven decision module: Employs only onboard depth images and a 2‑D velocity command to select among discrete actions (step‑over, climb, vault, roll), enabling closed‑loop, context‑aware parkour.
  • Real‑world validation on a hardware humanoid: Demonstrates climbing obstacles up to 1.25 m (≈ 96 % of robot height) and robust multi‑obstacle traversal with on‑the‑fly adaptation to perturbations.
  • Open‑source‑ready modular pipeline: The framework separates perception, skill composition, and control, making it extensible to new skills or sensor modalities.

Methodology

  1. Data collection & retargeting

    • Capture a diverse set of human parkour motions (e.g., vaults, climbs, rolls) using motion capture.
    • Retarget these motions to the robot’s kinematic model, preserving dynamics while respecting joint limits.
  2. Feature‑based motion matching

    • Encode each atomic skill as a high‑dimensional feature vector (joint velocities, contact states, center‑of‑mass trajectory).
    • At runtime, perform a nearest‑neighbor search in this feature space to select the next skill that best continues the current trajectory, ensuring smooth transitions.
  3. Expert RL policies

    • For each composed trajectory, train a motion‑tracking RL policy (e.g., PPO) that learns to follow the kinematic reference while handling model uncertainties and contact dynamics.
  4. Policy distillation

    • Use DAgger (Dataset Aggregation) to collect state‑action pairs from the experts while the student interacts with the environment.
    • Fine‑tune the student with RL rewards (stability, energy efficiency, obstacle clearance) to close the performance gap.
  5. Perception & decision making

    • Process a single depth frame to extract a 2‑D height map of the immediate environment.
    • A lightweight classifier maps the height map + desired velocity into a discrete skill command, which triggers the appropriate segment in the motion‑matching chain.
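The nearest‑neighbor selection in step 2 can be sketched in a few lines of Python. This is a minimal illustration under invented assumptions, not the authors' implementation: the skill names come from the paper, but the four‑dimensional feature vectors and the weighting scheme are hypothetical stand‑ins for the real joint‑velocity, contact‑state, and center‑of‑mass features.

```python
import math

# Hypothetical skill database: each atomic skill is summarized by a
# fixed-length feature vector (here just 4 numbers standing in for joint
# velocities, contact states, and CoM trajectory). Values are illustrative.
SKILLS = {
    "step_over": [0.8, 0.0, 1.0, 1.0],
    "climb":     [0.3, 0.9, 1.0, 0.0],
    "vault":     [1.2, 0.6, 0.0, 1.0],
    "roll":      [0.5, 0.2, 0.0, 0.0],
}

def weighted_dist(a, b, w):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

def next_skill(current_features, weights=(1.0, 1.0, 2.0, 2.0)):
    """Nearest-neighbor search: pick the skill whose entry features best
    continue the current trajectory. The heavier weights on the last two
    dimensions mimic prioritizing contact-state compatibility."""
    return min(SKILLS, key=lambda name: weighted_dist(
        SKILLS[name], current_features, weights))
```

With a query state close to the `climb` entry, `next_skill([0.3, 0.8, 1.0, 0.1])` selects `climb`; in the paper this lookup runs over full retargeted motion clips and yields the reference trajectory the RL experts then track.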
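Step 4's distillation follows the standard DAgger pattern: the student drives the rollout while the expert labels every visited state. The schematic below is generic, with hypothetical `env`, `expert`, and `student` interfaces, and the paper's RL fine‑tuning stage omitted:

```python
def dagger(env, expert, student, rounds=5, steps=200):
    """Generic DAgger loop: the student drives rollouts, the expert labels
    every visited state, and the student is retrained on the aggregated
    (state, expert_action) dataset after each round."""
    dataset = []
    for _ in range(rounds):
        state = env.reset()
        for _ in range(steps):
            dataset.append((state, expert.act(state)))   # expert supervision
            state, done = env.step(student.act(state))   # student's own action
            if done:
                break
        student.fit(dataset)   # supervised update on all pairs so far
    return student
```

In the paper's terms, the student is the single depth‑driven policy and the experts are the per‑trajectory motion‑tracking RL policies; the subsequent RL fine‑tuning phase (omitted above) closes the remaining performance gap.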

Results & Findings

| Metric | Value / Observation |
| --- | --- |
| Maximum climb height | 1.25 m (≈ 96 % of robot height) |
| Success rate (multi‑obstacle course) | 87 % across 30 trials with random obstacle perturbations |
| Latency (perception → skill selection) | ~45 ms on the onboard compute (Jetson NX) |
| Energy consumption | Comparable to baseline walking (≈ 1.2×) despite higher dynamics |
| Smoothness of transitions | Measured via joint‑space jerk; 30 % lower than naïve concatenation of skills |

The experiments show that the robot can adapt on‑the‑fly to moved or newly introduced obstacles, maintaining balance and completing the course without external intervention.
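The transition‑smoothness row is a jerk comparison. As a generic illustration (not the paper's evaluation code), jerk can be approximated by the third finite difference of the sampled joint positions:

```python
def mean_abs_jerk(q, dt):
    """Mean absolute jerk of a joint trajectory q sampled every dt seconds,
    using the third finite difference q[i+3] - 3*q[i+2] + 3*q[i+1] - q[i]."""
    if len(q) < 4:
        raise ValueError("need at least 4 samples")
    jerks = [(q[i + 3] - 3 * q[i + 2] + 3 * q[i + 1] - q[i]) / dt ** 3
             for i in range(len(q) - 3)]
    return sum(abs(j) for j in jerks) / len(jerks)
```

A constant‑velocity ramp scores zero, while a trajectory with an abrupt kink, as naive skill concatenation would produce at a transition, scores strictly higher.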

Practical Implications

  • Robotics developers can now prototype agile humanoid behaviors without hand‑crafting each transition; the motion‑matching library handles seamless chaining.
  • Game‑engine and simulation pipelines may adopt the same feature‑based matching to generate realistic humanoid avatars that react to dynamic environments in real time.
  • Industrial inspection or disaster‑response robots could leverage the perception‑driven skill selector to navigate rubble, climb ladders, or roll under low‑clearance passages with minimal re‑programming.
  • Edge‑AI hardware proves sufficient for low‑latency depth processing and policy inference, suggesting that similar pipelines can run on existing robot platforms (e.g., Boston Dynamics Spot, Agility Robotics Cassie).

Limitations & Future Work

  • Skill library size: The system’s agility is bounded by the diversity of pre‑recorded human motions; adding new skills still requires motion‑capture and retargeting.
  • Depth‑only perception: Relying solely on depth limits understanding of texture or semantics (e.g., distinguishing a fragile glass pane from a solid wall).
  • Generalization to unseen terrains: While the method adapts to obstacle perturbations, extreme terrain variations (e.g., slippery surfaces) were not evaluated.
  • Scalability of nearest‑neighbor search: As the skill database grows, more efficient indexing (e.g., hierarchical clustering or learned embeddings) will be needed.
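As a toy illustration of the indexing idea in the last bullet, one can pre‑cluster the skill database and restrict the nearest‑neighbor search to the closest cluster. The tiny k‑means below is a pure‑Python stand‑in for the hierarchical or learned indexing the authors suggest:

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(features, k=2, iters=10):
    """Toy k-means: cluster skill feature vectors into k buckets."""
    centers = [list(f) for f in features[:k]]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in features:
            buckets[min(range(k), key=lambda i: dist(centers[i], f))].append(f)
        centers = [[sum(col) / len(b) for col in zip(*b)] if b else centers[i]
                   for i, b in enumerate(buckets)]
    # Final assignment with the converged centers.
    buckets = [[] for _ in range(k)]
    for f in features:
        buckets[min(range(k), key=lambda i: dist(centers[i], f))].append(f)
    return centers, buckets

def approx_nn(query, centers, buckets):
    """Search only the bucket whose center is nearest to the query,
    trading exactness for a much smaller candidate set."""
    c = min(range(len(centers)), key=lambda i: dist(centers[i], query))
    return min(buckets[c], key=lambda f: dist(f, query))
```

This cuts each lookup from the full database to one bucket, at the cost of occasionally missing the true nearest neighbor near cluster boundaries.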

Future research directions include expanding the skill repertoire via generative motion synthesis, integrating multimodal perception (RGB, tactile), and applying the framework to multi‑robot coordination for collaborative parkour or construction tasks.

Authors

  • Zhen Wu
  • Xiaoyu Huang
  • Lujie Yang
  • Yuanhang Zhang
  • Koushil Sreenath
  • Xi Chen
  • Pieter Abbeel
  • Rocky Duan
  • Angjoo Kanazawa
  • Carmelo Sferrazza
  • Guanya Shi
  • C. Karen Liu

Paper Information

  • arXiv ID: 2602.15827v1
  • Categories: cs.RO, cs.AI, cs.LG, eess.SY
  • Published: February 17, 2026