[Paper] Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit
Source: arXiv - 2512.10934v1
Overview
This paper tackles a notoriously hard problem: getting a drone to fly autonomously through narrow, curved tubes it has never seen before. By combining reinforcement learning (RL) with a curriculum‑learning training schedule, the authors teach a UAV to rely only on on‑board LiDAR and a fleeting visual cue of the tube’s centre, outperforming a classic deterministic controller that even knows the exact centreline.
Key Contributions
- Curriculum‑based RL framework for navigating unknown 3‑D tubular conduits without a pre‑built map.
- Partial‑observability handling via a “turn‑negotiation” module that fuses LiDAR symmetry, directional memory, and intermittent visual centre detection.
- Head‑to‑head comparison against a Pure Pursuit baseline that is given privileged geometric information, demonstrating that RL can compensate for missing data.
- High‑fidelity simulation validation showing that policies trained in a simplified environment transfer to realistic physics and sensor noise.
- Broad applicability to industrial inspection, underground pipe networks, and minimally invasive medical robotics.
Methodology
- State Representation – The UAV receives a 1‑D LiDAR depth profile (front‑facing) and a binary flag indicating whether the tube centre is currently visible in the camera.
- Action Space – Continuous pitch and yaw commands that steer the drone forward.
- Curriculum Learning – Training starts in gently curving tubes; curvature is gradually increased, and the visual centre cue is made sparser, forcing the agent to rely more on LiDAR symmetry and memory.
- Policy Optimization – Proximal Policy Optimization (PPO) learns a stochastic policy that maximizes forward progress while penalizing collisions and excessive control effort (a minimal training sketch combining the observation layout and the curriculum follows this list).
- Turn‑Negotiation Mechanism – A lightweight rule‑based overlay that, when the visual centre cue is lost, combines the last known turn direction with a check for symmetric LiDAR returns to decide whether to keep turning left or right; the overlay is tuned jointly with the RL policy during training (a heuristic sketch follows this list).
- Baseline – A Pure Pursuit controller that follows the exact centreline, privileged information available only to the baseline, serves as a deterministic reference (its standard steering law is also sketched below).
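The sketch below shows, under stated assumptions rather than from the authors' code, how these pieces could fit together: a gymnasium-style environment exposing the paper's observation layout (a 1-D LiDAR profile plus a centre-visible flag) and continuous pitch/yaw actions, trained with Stable-Baselines3 PPO under a hand-rolled curriculum that raises curvature and thins out the centre cue between stages. The dynamics, reward weights, and stage values are illustrative placeholders.

```python
# Minimal sketch, not the authors' code: a toy environment with the paper's observation
# layout (1-D LiDAR depths + centre-visible flag), continuous pitch/yaw actions, and a
# hand-rolled curriculum that raises curvature and thins the centre cue between stages.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class ToyTubeEnv(gym.Env):
    def __init__(self, max_curvature=0.2, centre_cue_prob=1.0, n_beams=32):
        super().__init__()
        self.max_curvature = max_curvature      # curriculum knob 1: tube curvature (1/m)
        self.centre_cue_prob = centre_cue_prob  # curriculum knob 2: cue sparsity
        self.n_beams = n_beams
        # Observation: n_beams LiDAR depths followed by one binary "centre visible" flag.
        self.observation_space = spaces.Box(0.0, 10.0, shape=(n_beams + 1,), dtype=np.float32)
        # Action: continuous pitch and yaw commands in [-1, 1].
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def _obs(self):
        lidar = self.np_random.uniform(0.5, 5.0, self.n_beams)     # placeholder depth profile
        visible = float(self.np_random.random() < self.centre_cue_prob)
        return np.concatenate([lidar, [visible]]).astype(np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.offset = np.zeros(2)   # lateral/vertical offset from the tube centreline
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        # Toy kinematics: curvature drifts the drone off-centre, pitch/yaw corrects it.
        drift = self.np_random.uniform(-self.max_curvature, self.max_curvature, 2)
        self.offset += drift - 0.1 * np.asarray(action)
        self.steps += 1
        collided = bool(np.linalg.norm(self.offset) > 1.0)          # hit the tube wall
        reward = 1.0 - 0.1 * float(np.sum(np.square(action))) - (10.0 if collided else 0.0)
        return self._obs(), reward, collided, self.steps >= 200, {}

# Curriculum: gentle bends with a dense centre cue first, then sharper bends and a
# sparser cue. Stage values are illustrative, not taken from the paper.
stages = [(0.2, 1.0), (0.6, 0.5), (1.5, 0.1)]    # (max curvature [1/m], cue probability)
model = None
for curvature, cue_prob in stages:
    env = ToyTubeEnv(max_curvature=curvature, centre_cue_prob=cue_prob)
    model = PPO("MlpPolicy", env, verbose=0) if model is None else model
    model.set_env(env)
    model.learn(total_timesteps=20_000)
```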
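The turn-negotiation overlay can be pictured as a small function like the one below; the symmetry test, threshold, and output convention are assumptions for illustration, not the paper's exact rules.

```python
# Minimal sketch (assumed rules, not the paper's exact logic): when the visual centre
# cue is lost, compare the left and right halves of the LiDAR profile; if the returns
# look symmetric, keep the last known turn direction (directional memory), otherwise
# bias the yaw toward the side with more free space.
import numpy as np

def negotiate_turn(lidar: np.ndarray, centre_visible: bool,
                   last_direction: float, symmetry_tol: float = 0.1) -> float:
    """Return a yaw bias in [-1, 1]; negative = turn left, positive = turn right."""
    if centre_visible:
        return 0.0                        # defer to the learned policy while the centre is seen
    left, right = np.split(lidar, 2)      # assumes an even beam count, ordered left to right
    imbalance = (right.mean() - left.mean()) / (right.mean() + left.mean() + 1e-6)
    if abs(imbalance) < symmetry_tol:     # symmetric returns: fall back on directional memory
        return last_direction
    return float(np.clip(imbalance, -1.0, 1.0))   # otherwise steer toward the wider side
```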
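For reference, the Pure Pursuit baseline follows the standard look-ahead law, commanding curvature kappa = 2·offset / L² toward a point on the privileged centreline; the body-frame convention below is an assumption, not necessarily the paper's implementation.

```python
# Minimal sketch of a standard Pure Pursuit law. The baseline is handed the privileged
# centreline, picks a look-ahead point, and commands curvature kappa = 2 * offset / L**2
# separately in the horizontal (yaw) and vertical (pitch) planes.
import numpy as np

def lookahead_point(position: np.ndarray, centreline: np.ndarray,
                    lookahead: float = 1.0) -> np.ndarray:
    """Pick the first centreline waypoint at least `lookahead` metres from the UAV."""
    dists = np.linalg.norm(centreline - position, axis=1)
    return centreline[int(np.argmax(dists >= lookahead))]

def pure_pursuit_command(target_body: np.ndarray, lookahead: float = 1.0):
    """`target_body` is the look-ahead point expressed in the body frame
    (x forward, y left, z up). Returns (yaw, pitch) curvature commands."""
    yaw_cmd = float(2.0 * target_body[1] / lookahead ** 2)
    pitch_cmd = float(2.0 * target_body[2] / lookahead ** 2)
    return yaw_cmd, pitch_cmd
```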
Results & Findings
- Success Rate: The PPO‑trained UAV completed 92 % of test runs in tubes with curvature up to 1.5 m⁻¹, versus 68 % for the Pure Pursuit baseline despite its perfect centreline knowledge.
- Collision Reduction: Average collisions per episode dropped from 0.45 (baseline) to 0.12 (RL).
- Generalization: Policies trained on synthetic tubes transferred to a photorealistic Unity‑based simulator with realistic aerodynamics, maintaining a >85 % success rate without additional fine‑tuning.
- Ablation Study: Removing the turn‑negotiation module caused a 30 % drop in success rate, confirming its critical role under partial observability.
Practical Implications
- Industrial Inspection: Companies can deploy low‑cost drones inside HVAC ducts, oil‑pipeline networks, or underground utility tunnels without needing detailed CAD models.
- Medical Robotics: The same principles could be adapted for capsule endoscopes navigating the gastrointestinal tract where visual cues are intermittent.
- Rapid Deployment: Since the method learns from simulated data, new conduit geometries can be handled by re‑running the curriculum in a virtual replica, avoiding costly field trials.
- Software Integration: The approach fits into existing ROS‑based UAV stacks; the policy can be exported as a TensorFlow/PyTorch model and run on edge hardware such as an NVIDIA Jetson (a minimal export sketch follows this list).
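As a deployment illustration (not from the paper), a trained PyTorch policy can be exported to ONNX and then served by onnxruntime or TensorRT inside a ROS node on a Jetson; the network size, architecture, and file names below are hypothetical.

```python
# Minimal deployment sketch (network size, architecture, and file names are hypothetical):
# export a trained PyTorch policy to ONNX so it can be served by onnxruntime or TensorRT
# inside a ROS node on a Jetson-class edge device.
import torch
import torch.nn as nn

N_BEAMS = 32                                    # must match the observation used in training

policy = nn.Sequential(                         # stand-in for the trained actor network
    nn.Linear(N_BEAMS + 1, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 2), nn.Tanh(),                # bounded pitch/yaw commands in [-1, 1]
)
# policy.load_state_dict(torch.load("tube_policy.pt"))   # hypothetical trained checkpoint
policy.eval()

dummy_obs = torch.zeros(1, N_BEAMS + 1)         # [LiDAR depths..., centre-visible flag]
torch.onnx.export(policy, dummy_obs, "tube_policy.onnx",
                  input_names=["observation"], output_names=["pitch_yaw"])
```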
Limitations & Future Work
- Sensor Assumptions: The current setup assumes a reliable 1‑D LiDAR and occasional centre detection; performance may degrade with noisier sensors or in highly reflective tubes.
- Scalability to Branching Networks: The work focuses on single, continuous tubes; handling junctions or branching networks remains an open challenge.
- Real‑World Flight Tests: Validation is limited to high‑fidelity simulation; physical flight experiments in actual conduits are needed to confirm robustness to airflow disturbances and hardware latency.
- Curriculum Design Automation: The curvature schedule is hand‑crafted; future research could automate curriculum generation based on difficulty metrics.
Bottom line: By teaching a drone to “feel” its way through darkness with a mix of learning and clever heuristics, this research pushes autonomous navigation into spaces that were previously off‑limits to robots—opening a new frontier for inspection, maintenance, and medical devices.
Authors
- Zamirddine Mari
- Jérôme Pasquet
- Julien Seinturier
Paper Information
- arXiv ID: 2512.10934v1
- Categories: cs.RO, cs.LG
- Published: December 11, 2025