[Paper] Accelerated Rotation-Invariant Convolution for UAV Image Segmentation

Published: December 9, 2025 at 01:30 PM EST
Source: arXiv

Overview

The paper presents a GPU‑optimized rotation‑invariant convolution layer that dramatically speeds up UAV (drone) image segmentation while keeping accuracy on par with the best existing methods. By removing the costly “im2col” data‑lowering step and sharing computation across rotated filter copies, the authors achieve up to 45 % faster training and significant energy savings, making rotation‑aware deep nets practical for real‑time aerial‑vision pipelines.

Key Contributions

  • Novel convolution kernel that natively handles multiple orientations without expanding the filter bank into separate weight matrices.
  • Elimination of the im2col step, cutting memory traffic and redundant matrix‑multiplication work.
  • Generalization to arbitrary (non‑symmetric) rotation angles, enabling fine‑grained orientation handling.
  • GPU‑level implementation that outperforms cuDNN by 20‑55 % in training speed and 15‑45 % in energy consumption across a range of input sizes.
  • Integration with U‑Net, delivering up to 6 % higher segmentation accuracy on UAV datasets compared with a standard, rotation‑agnostic baseline.

Methodology

  1. Rotated Filter Sharing – Instead of storing a separate filter for each orientation, the algorithm stores a single base filter and generates rotated versions on‑the‑fly using a lightweight index‑mapping scheme. Because many pixel accesses are shared across orientations, each input read is reused rather than repeated.
  2. Matrix‑Multiplication‑Free Convolution – Traditional GPU convolutions first reshape the input (im2col) into a large matrix, then call a GEMM routine. The authors bypass this step, directly streaming the input through a custom CUDA kernel that computes the dot‑product for all orientations in one pass.
  3. Arbitrary Angle Support – For angles that do not align with the filter’s symmetry (e.g., 13°, 27°), the kernel interpolates filter weights using a pre‑computed rotation table, preserving the same low‑overhead data flow.
  4. Benchmark Suite – The authors evaluate on synthetic and real UAV datasets, comparing against cuDNN, group‑equivariant CNNs, and other rotation‑invariant baselines.
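Steps 1–2 can be sketched in NumPy. This is a simplified illustrative model, not the paper's CUDA kernel: it restricts orientations to 90° steps so the index table is exact, and it loops over windows explicitly where the real kernel streams them in parallel.

```python
import numpy as np

def rotation_index_table(k, n_rot=4):
    """For each orientation r, table[r][p] is the base-filter index whose
    weight lands at flattened position p after rotating by r * 90 degrees.
    90-degree steps keep the mapping exact (no interpolation needed)."""
    base = np.arange(k * k).reshape(k, k)
    return np.stack([np.rot90(base, r).ravel() for r in range(n_rot)])

def rotinv_conv2d(x, w, n_rot=4):
    """Direct (im2col-free) valid cross-correlation that evaluates all
    rotated copies of a single base filter w in one pass over x, then
    max-pools over orientations for rotation invariance."""
    k = w.shape[0]
    table = rotation_index_table(k, n_rot)
    wf = w.ravel()
    H, W = x.shape
    out = np.zeros((n_rot, H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            patch = x[i:i + k, j:j + k].ravel()  # read the window once...
            for r in range(n_rot):               # ...reuse it for every orientation
                out[r, i, j] = patch @ wf[table[r]]
    return out.max(axis=0)
```

A useful sanity check of the invariance property: rotating a square input by 90° rotates the output map without changing its values, because the max over orientations is unaffected by a cyclic relabeling of the filter set.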
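For step 3, one way to model a pre‑computed rotation table for non‑symmetric angles is as a (k²×k²) matrix of bilinear‑interpolation weights, built once per angle and applied to the flattened base filter. This is a hypothetical construction for illustration, not the paper's exact kernel:

```python
import numpy as np

def rotation_table(k, theta_deg):
    """Build a (k*k, k*k) matrix R such that (R @ w.ravel()).reshape(k, k)
    is the k x k filter w rotated counter-clockwise by theta_deg about its
    centre, via bilinear interpolation. Taps that fall outside the filter
    support are dropped. Illustrative sketch only."""
    c = (k - 1) / 2.0
    t = np.deg2rad(theta_deg)
    R = np.zeros((k * k, k * k))
    for u in range(k):            # target row of the rotated filter
        for v in range(k):        # target column
            # inverse-rotate the target tap back into base-filter coordinates
            sy = c + (u - c) * np.cos(t) + (v - c) * np.sin(t)
            sx = c - (u - c) * np.sin(t) + (v - c) * np.cos(t)
            y0, x0 = int(np.floor(sy)), int(np.floor(sx))
            dy, dx = sy - y0, sx - x0
            # distribute the tap over its four bilinear neighbours
            for yy, xx, wgt in ((y0,     x0,     (1 - dy) * (1 - dx)),
                                (y0,     x0 + 1, (1 - dy) * dx),
                                (y0 + 1, x0,     dy * (1 - dx)),
                                (y0 + 1, x0 + 1, dy * dx)):
                if 0 <= yy < k and 0 <= xx < k:
                    R[u * k + v, yy * k + xx] = wgt
    return R
```

Because the table depends only on the filter size and the angle, it can be computed once (e.g., for 13° and 27°) and reused on every forward pass as a tiny k²×k² product, preserving the low‑overhead data flow; for multiples of 90° it reduces to a pure permutation.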

Results & Findings

| Setting | Speedup vs. cuDNN | Energy Reduction | Segmentation mIoU (U‑Net) |
| --- | --- | --- | --- |
| 8 orientations, 256×256 input | +45 % | 41 % | +4 % over baseline |
| 8 orientations, 1024×1024 input | +32 % | 23 % | +6 % over baseline |
| Arbitrary angles (13°, 27°, …) | +20–55 % | 15–45 % | Comparable to state‑of‑the‑art equivariant nets |

The method delivers consistent speed and power gains across resolutions while maintaining (or slightly improving) segmentation quality.

Practical Implications

  • Real‑time UAV analytics – Faster, lower‑power convolutions enable on‑board processing for tasks like crop monitoring, infrastructure inspection, or search‑and‑rescue, where drones have limited compute and battery.
  • Edge deployment – The reduced memory bandwidth makes the layer attractive for edge GPUs (Jetson, Coral) and even FPGA‑based accelerators.
  • Simplified model design – Developers can add rotation invariance by swapping a standard Conv2D layer with the proposed one, without redesigning the whole architecture or inflating the parameter count.
  • Energy‑aware training – Data‑center training jobs that involve massive aerial‑image datasets can cut electricity costs by up to 45 %, which translates to lower cloud‑compute bills.

Limitations & Future Work

  • The current implementation targets NVIDIA CUDA GPUs; porting to other back‑ends (AMD, Intel, or mobile GPUs) will require additional engineering.
  • While arbitrary angles are supported, the interpolation accuracy may degrade for very fine angular steps, potentially affecting tasks that need sub‑degree precision.
  • The authors focus on U‑Net‑style encoder‑decoder segmentation; extending the approach to detection or instance‑segmentation pipelines remains an open question.
  • Future research could explore joint learning of the rotation set (i.e., learning which orientations matter most) and hardware‑level co‑design to further squeeze memory traffic.

Authors

  • Manduhu Manduhu
  • Alexander Dow
  • Gerard Dooly
  • James Riordan

Paper Information

  • arXiv ID: 2512.08888v1
  • Categories: cs.CV, cs.RO
  • Published: December 9, 2025