[Paper] Hiking in the Wild: A Scalable Perceptive Parkour Framework for Humanoids

Published: January 12, 2026 at 11:50 AM EST
3 min read
Source: arXiv

Overview

The paper introduces “Hiking in the Wild,” a new end‑to‑end reinforcement‑learning (RL) framework that lets full‑size humanoid robots trek over rugged, unstructured terrain without relying on external state estimators (e.g., SLAM). By feeding raw depth images and joint‑level proprioceptive data directly into a single‑stage policy network, the system can walk at up to 2.5 m/s while avoiding catastrophic slips on edges—a capability that brings humanoid parkour closer to real‑world deployment.

Key Contributions

  • Scalable perceptive parkour pipeline that maps raw depth + proprioception → joint torques in a single RL pass.
  • Foothold safety mechanism combining Terrain Edge Detection with Foot Volume Points to automatically reject unsafe footholds and prevent slippage.
  • Flat‑Patch Sampling strategy that generates realistic navigation targets during training, eliminating reward‑hacking and improving policy robustness.
  • No external state estimation: the policy operates purely on on‑board sensors, sidestepping drift issues common in LiDAR‑based mapping.
  • Open‑source training and deployment code, enabling reproducible research and rapid transfer to other humanoid platforms.
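The foothold safety idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `edge_mask` stands in for Terrain Edge Detection (a depth-discontinuity filter), and the circular footprint check is a crude stand-in for Foot Volume Points; the function names and the 0.15 m threshold are assumptions.

```python
import numpy as np

def edge_mask(depth, thresh=0.15):
    """Flag pixels whose depth jumps sharply to a neighbor (potential drop-offs).

    `depth` is an HxW array of metric depths; `thresh` (meters, an
    illustrative value) is the discontinuity that counts as a terrain edge.
    """
    gx = np.abs(np.diff(depth, axis=1, prepend=depth[:, :1]))
    gy = np.abs(np.diff(depth, axis=0, prepend=depth[:1, :]))
    return (gx > thresh) | (gy > thresh)

def foothold_is_safe(depth, foot_px, foot_radius_px=5):
    """Reject a candidate foothold if any edge pixel lies inside the foot's
    footprint (a square stand-in for the paper's Foot Volume Points)."""
    mask = edge_mask(depth)
    r, c = foot_px
    r0, r1 = max(r - foot_radius_px, 0), r + foot_radius_px + 1
    c0, c1 = max(c - foot_radius_px, 0), c + foot_radius_px + 1
    return not mask[r0:r1, c0:c1].any()
```

On a flat depth map every foothold passes; a foothold whose footprint overlaps a step edge is rejected, which is the behavior the paper's safety mechanism enforces before torques are executed.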

Methodology

  1. Sensor Input – The robot streams a forward‑facing depth map (≈ 640×480) and a vector of joint positions/velocities. No global pose or map is used.
  2. Neural Policy Architecture – A lightweight CNN extracts terrain geometry from the depth image, while a separate MLP processes proprioception. The two embeddings are concatenated and fed into a final MLP that outputs desired joint torques.
  3. Safety Layer – Before executing the torque command, the Terrain Edge Detector (a fast edge‑filter on the depth map) identifies potential drop‑offs. Foot Volume Points (pre‑computed safe foot‑placement volumes) are cross‑checked; if a proposed foothold lies outside a safe volume, the command is clipped to a fallback stance.
  4. Training Loop – A single‑stage RL algorithm (PPO) runs in a high‑fidelity physics simulator populated with procedurally generated terrains. The Flat‑Patch Sampling module samples target footholds only on locally flat regions, ensuring the reward function reflects feasible navigation rather than exploiting simulation quirks.
  5. Domain Randomization – Visual noise, sensor latency, and actuator dynamics are randomized to bridge the sim‑to‑real gap.
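The two-branch architecture in step 2 can be sketched as follows. This is an untrained, numpy-only toy, not the paper's network: a flat MLP stands in for the depth CNN, the layer sizes are arbitrary, and the 23-joint output dimension is a hypothetical humanoid configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, sizes):
    """Tiny MLP with tanh hidden layers and random weights (illustration only)."""
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = rng.standard_normal((n_in, n_out)) * 0.1
        x = x @ W
        if i < len(sizes) - 2:
            x = np.tanh(x)
    return x

def policy(depth, proprio, n_joints=23):
    """Single-stage perceptive policy sketch:
    depth branch -> embedding, proprioception branch -> embedding,
    concatenate both, final MLP -> desired joint torques."""
    depth_feat = mlp(depth.ravel(), [depth.size, 64, 32])  # stand-in for the CNN
    prop_feat = mlp(proprio, [proprio.size, 64, 32])
    fused = np.concatenate([depth_feat, prop_feat])
    return mlp(fused, [64, 128, n_joints])

# One forward pass: a downsampled depth image plus joint positions/velocities.
torques = policy(rng.standard_normal((48, 64)), rng.standard_normal(2 * 23))
```

The point of the single-stage design is visible here: perception and proprioception meet in one network that emits torques directly, with no intermediate map or pose estimate.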

Results & Findings

| Metric | Simulation | Real‑world (full‑size humanoid) |
| --- | --- | --- |
| Max speed on uneven terrain | 2.8 m/s | 2.5 m/s |
| Success rate on 30 m obstacle course | 96 % | 92 % |
| Average slip incidents per 100 m | 0.3 | 0.4 |
| Policy latency (sensor → torque) | 12 ms | 15 ms |
  • The safety layer reduced edge‑related falls by ≈ 85 % compared to a baseline RL policy without it.
  • Flat‑Patch Sampling eliminated reward‑hacking behaviors (e.g., “jump‑and‑land” tricks that exploit simulator physics) and improved generalization to unseen terrain types.
  • Field tests demonstrated stable locomotion on rocky slopes, loose gravel, and narrow ledges—conditions that typically break mapping‑based controllers.
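The Flat-Patch Sampling idea can be illustrated with a small sketch: targets are drawn only from cells whose local height variance is low, so the reward never pays out for landing on an edge or a spike. The function name, patch size, and 0.02 m threshold are illustrative assumptions, not the paper's values.

```python
import numpy as np

def flat_patch_targets(heightmap, n_targets=8, patch=5, max_std=0.02, rng=None):
    """Sample navigation targets only on locally flat patches of a heightmap.

    A cell qualifies when the height standard deviation inside its
    `patch` x `patch` neighborhood is below `max_std` (meters; an
    illustrative threshold, not the paper's).
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = heightmap.shape
    half = patch // 2
    candidates = []
    for r in range(half, h - half):
        for c in range(half, w - half):
            window = heightmap[r - half:r + half + 1, c - half:c + half + 1]
            if window.std() < max_std:
                candidates.append((r, c))
    idx = rng.choice(len(candidates), size=min(n_targets, len(candidates)),
                     replace=False)
    return [candidates[i] for i in idx]
```

Restricting targets this way removes the incentive for the kind of "jump‑and‑land" exploits mentioned above, since every rewarded destination is physically standable.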

Practical Implications

  • Robotics developers can now prototype rugged humanoid navigation without building a full SLAM stack, cutting development time and hardware cost.
  • The safety mechanisms are modular; they can be dropped into existing RL policies for quadrupeds or wheeled robots that also need edge awareness.
  • Industry use‑cases such as inspection of disaster sites, outdoor delivery, or construction‑site assistance become more feasible because the robot can move quickly (≈ 2.5 m/s) over unpredictable ground.
  • Open‑source code and the single‑stage architecture make it straightforward to port the approach to other platforms (e.g., Boston Dynamics Atlas, NASA Valkyrie) with minimal sensor changes.
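The modularity claim above can be made concrete with a small wrapper sketch: any existing policy gains edge awareness by gating its output through a foothold check, falling back to a conservative command when the check fails. All names here are illustrative; the paper's actual interface may differ.

```python
import numpy as np

def make_safe_policy(policy, foothold_check, fallback):
    """Wrap a torque policy with an edge-awareness gate: if the predicted
    foothold fails the safety check, emit a conservative fallback command
    instead of the policy's action (mirrors the paper's clipped fallback
    stance; interface is a sketch)."""
    def safe_policy(obs, predicted_foothold):
        action = policy(obs)
        if not foothold_check(predicted_foothold):
            return fallback(obs)
        return action
    return safe_policy

# Toy usage: a 3-joint "policy", a fallback stance, and a check that
# rejects footholds with a negative first coordinate.
raw_policy = lambda obs: np.ones(3)
stance = lambda obs: np.zeros(3)
check = lambda foothold: foothold[0] >= 0
safe = make_safe_policy(raw_policy, check, stance)
```

Because the gate only touches the action, the same wrapper applies unchanged to a quadruped or wheeled-robot policy, which is what makes the mechanism portable.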

Limitations & Future Work

  • The current system assumes a forward‑facing depth sensor; side‑view or 360° perception would be needed for truly omnidirectional navigation.
  • While domain randomization mitigates sim‑to‑real gaps, extreme weather conditions (heavy rain, snow) were not evaluated.
  • The policy still relies on dense depth data, which can be bandwidth‑heavy for low‑power robots; future work could explore sparse LiDAR or event‑camera inputs.
  • Extending the framework to multi‑robot coordination or dynamic obstacles (e.g., moving humans) remains an open research direction.

Authors

  • Shaoting Zhu
  • Ziwen Zhuang
  • Mengjie Zhao
  • Kun‑Ying Lee
  • Hang Zhao

Paper Information

  • arXiv ID: 2601.07718v1
  • Categories: cs.RO, cs.AI
  • Published: January 12, 2026