[Paper] Visual Heading Prediction for Autonomous Aerial Vehicles

Published: December 10, 2025 at 01:27 PM EST
4 min read
Source: arXiv - 2512.09898v1

Overview

The paper presents a vision‑only, data‑driven pipeline that lets an autonomous drone (UAV) reliably locate a ground robot (UGV) and compute the heading correction it must apply to align with the robot, without relying on GPS or external motion‑capture systems. By combining a fine‑tuned YOLOv5 detector with a tiny neural network for angle regression, the authors achieve sub‑degree heading accuracy using only a single onboard camera, opening the door to UAV‑UGV teamwork in GPS‑denied or indoor settings.

Key Contributions

  • Real‑time UGV detection with a YOLOv5 model reaching ≈95 % accuracy on a custom dataset of >13 k annotated images.
  • Lightweight heading‑prediction ANN that consumes bounding‑box geometry and outputs a heading angle with MAE = 0.1506° and RMSE = 0.1957°.
  • End‑to‑end, infrastructure‑independent framework that works with a single monocular camera, eliminating the need for GNSS, LiDAR, or external motion‑capture rigs.
  • Comprehensive dataset and training pipeline (VICON‑ground‑truthed images) released publicly, facilitating reproducibility and further research.
  • Live demo showing UAV‑UGV alignment under dynamic conditions, confirming the approach’s feasibility for real‑world missions.

Methodology

  1. Data Collection – A VICON motion‑capture system recorded precise UAV and UGV poses while a downward‑facing RGB camera captured the scene. Over 13 k frames were manually annotated with UGV bounding boxes and the corresponding ground‑truth heading angles.
  2. Object Detection – The authors fine‑tuned YOLOv5 (a popular single‑stage detector) on the annotated set. The model runs at >30 fps on a modest GPU, outputting the UGV’s bounding box (center, width, height).
  3. Feature Extraction – From each bounding box they compute simple geometric cues (relative size, offset from image center) that correlate with the UAV’s relative orientation to the UGV.
  4. Heading Regression – A shallow feed‑forward ANN (2 hidden layers, ~200 parameters) ingests these cues and predicts the required yaw angle for the UAV to face the UGV. The network is trained with mean‑squared‑error loss against the VICON headings (a minimal sketch of such a network follows this list).
  5. Inference Loop – In deployment, the UAV captures a frame, runs YOLOv5, feeds the bounding‑box features to the ANN, and then commands a yaw adjustment to align with the ground robot (see the deployment sketch below).
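
The block below is a minimal sketch of the feature extraction and heading‑regression network from steps 3–4. The exact feature set and layer widths are not given in this summary, so they are assumptions; the hidden width is chosen so the parameter count lands near the reported ~200, and the MSE training objective matches the paper.

```python
# Minimal sketch of the heading-regression ANN (PyTorch). The feature set
# (normalized box offset and size) and the layer widths are assumptions;
# the paper describes a shallow 2-hidden-layer network with roughly 200
# parameters, trained with MSE loss against VICON ground-truth headings.
import torch
import torch.nn as nn

def bbox_features(cx, cy, w, h, img_w, img_h):
    """Geometric cues from a detected bounding box: normalized offset of
    the box centre from the image centre plus normalized box size."""
    dx = (cx - img_w / 2.0) / img_w
    dy = (cy - img_h / 2.0) / img_h
    return torch.tensor([dx, dy, w / img_w, h / img_h], dtype=torch.float32)

class HeadingANN(nn.Module):
    """Shallow MLP mapping bounding-box geometry to a yaw angle in degrees."""
    def __init__(self, in_dim=4, hidden=11):  # hidden=11 -> 199 trainable parameters
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

# Training setup: mean-squared error against the VICON-recorded headings.
model = HeadingANN()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```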
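
And a hedged sketch of the deployment loop in step 5, reusing HeadingANN and bbox_features from the block above. The weight file names, the camera index, and the send_yaw_command() hook into the flight controller are placeholders rather than details from the paper; the YOLOv5 model is loaded through the standard ultralytics torch.hub interface.

```python
# Deployment-loop sketch: detect the UGV, regress the yaw correction,
# command the UAV. File names and the autopilot hook are hypothetical.
import cv2
import torch

detector = torch.hub.load('ultralytics/yolov5', 'custom', path='ugv_yolov5.pt')  # fine-tuned weights (assumed path)
heading_net = HeadingANN()
heading_net.load_state_dict(torch.load('heading_ann.pt'))  # assumed path
heading_net.eval()

cap = cv2.VideoCapture(0)  # downward-facing RGB camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    img_h, img_w = frame.shape[:2]

    # 1. Detect the UGV; keep the highest-confidence box (single-target assumption).
    det = detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    boxes = det.xyxy[0]
    if len(boxes) == 0:
        continue
    x1, y1, x2, y2, conf, _cls = boxes[boxes[:, 4].argmax()].tolist()

    # 2. Convert the box to geometric cues and regress the yaw angle.
    cx, cy, w, h = (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1
    feats = bbox_features(cx, cy, w, h, img_w, img_h)
    with torch.no_grad():
        yaw_deg = heading_net(feats.unsqueeze(0)).item()

    # 3. Command the UAV to turn by the predicted angle (flight-stack specific).
    send_yaw_command(yaw_deg)  # hypothetical hook into the autopilot
```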

Results & Findings

  • Detection: YOLOv5 achieved 95 % precision/recall on a held‑out test split, with an average inference time of 12 ms per frame.
  • Angle Prediction: The ANN’s MAE of 0.1506° and RMSE of 0.1957° indicate that the predicted heading is virtually indistinguishable from the ground truth, even when the UGV appears at various distances and orientations (the metric definitions are illustrated after this list).
  • Robustness: Experiments with moving UGVs and varying lighting showed the system maintaining sub‑degree accuracy, confirming resilience to moderate visual disturbances.
  • Real‑time Performance: The full pipeline (detection + regression) runs at ~25 fps on an NVIDIA Jetson Xavier, satisfying typical UAV control loop requirements.
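
For reference, the two error metrics quoted above follow their standard definitions; the snippet below computes them on a few made-up heading values, not data from the paper.

```python
# Standard MAE / RMSE over heading errors in degrees.
# The arrays below are illustrative values, not results from the paper.
import numpy as np

pred_deg = np.array([12.31, -45.02, 88.17, 3.50])   # predicted headings
true_deg = np.array([12.10, -44.85, 88.40, 3.62])   # VICON ground truth

err = pred_deg - true_deg
mae = np.mean(np.abs(err))           # mean absolute error
rmse = np.sqrt(np.mean(err ** 2))    # root-mean-square error
print(f"MAE = {mae:.4f} deg, RMSE = {rmse:.4f} deg")
```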

Practical Implications

  • GPS‑Denied Operations – Search‑and‑rescue, indoor inspection, or subterranean missions can now rely on pure vision for UAV‑UGV coordination, reducing hardware cost and mission risk.
  • Swarm Scalability – Because the model is lightweight, multiple drones can run the pipeline concurrently on edge devices, enabling larger multi‑robot teams without centralized processing.
  • Plug‑and‑Play Integration – The approach works with any monocular RGB camera and can be dropped into existing UAV flight stacks (e.g., PX4, ROS) with minimal code changes (a hedged integration sketch follows this list).
  • Rapid Prototyping – The released dataset and training scripts let developers fine‑tune the system for different ground‑robot shapes, colors, or camera placements, accelerating custom deployments.
  • Safety & Redundancy – Vision‑only heading estimation provides a fallback when GNSS signals are jammed or spoofed, enhancing overall system robustness.
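
As an illustration of that plug‑and‑play point, here is a minimal ROS 2 sketch that turns a predicted heading error into a yaw‑rate command. The topic name, message type, and proportional gain are assumptions; the paper does not prescribe a particular flight‑stack integration.

```python
# Hedged ROS 2 sketch: publish a proportional yaw-rate command from the
# predicted heading error. Topic, message type, and gain are assumptions.
import math
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist

class HeadingAlignNode(Node):
    def __init__(self):
        super().__init__('heading_align')
        self.pub = self.create_publisher(Twist, 'cmd_vel', 10)

    def publish_yaw(self, yaw_error_deg, gain=0.5):
        """Convert a heading error (degrees) into a yaw-rate command (rad/s)."""
        cmd = Twist()
        cmd.angular.z = gain * math.radians(yaw_error_deg)  # simple P controller
        self.pub.publish(cmd)

def main():
    rclpy.init()
    node = HeadingAlignNode()
    node.publish_yaw(3.5)  # example: UAV is 3.5 degrees off the UGV heading
    rclpy.shutdown()

if __name__ == '__main__':
    main()
```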

Limitations & Future Work

  • Controlled Environment Bias – Training data were captured in a lab with relatively uniform backgrounds; performance in cluttered outdoor scenes remains to be validated.
  • Single‑UGV Focus – The current model assumes one target per frame; handling multiple ground robots or occlusions will require additional detection and data association logic.
  • Depth Ambiguity – Using only a monocular camera limits absolute distance estimation; integrating lightweight depth cues (e.g., stereo or monocular depth networks) could improve long‑range alignment.
  • Dynamic Lighting & Weather – Future work should test robustness under harsh illumination, rain, or dust, possibly by augmenting training data or employing domain‑adaptation techniques.

Overall, the paper delivers a practical, low‑cost solution for UAV‑UGV heading alignment that can be immediately leveraged by developers building autonomous multi‑robot systems for GPS‑challenged environments.

Authors

  • Reza Ahmari
  • Ahmad Mohammadi
  • Vahid Hemmati
  • Mohammed Mynuddin
  • Parham Kebria
  • Mahmoud Nabil Mahmoud
  • Xiaohong Yuan
  • Abdollah Homaifar

Paper Information

  • arXiv ID: 2512.09898v1
  • Categories: cs.RO, cs.AI, cs.CV, cs.MA, eess.SY
  • Published: December 10, 2025