[Paper] AQUA-Net: Adaptive Frequency Fusion and Illumination Aware Network for Underwater Image Enhancement

Published: December 5, 2025 at 01:56 PM EST
3 min read
Source: arXiv - 2512.05960v1

Overview

The paper introduces AQUA‑Net, a lightweight deep‑learning architecture that simultaneously tackles the color shifts, low contrast, and haze that plague underwater photography. By fusing spatial features with frequency‑domain cues and adding an illumination‑aware decoder, the model delivers high‑quality enhancements while keeping the parameter count low enough for real‑time deployment on embedded platforms.

Key Contributions

  • Adaptive Frequency Fusion Encoder – extracts complementary texture details from the Fourier domain and injects them into the spatial feature stream.
  • Illumination‑Aware Decoder – learns a per‑pixel illumination map (inspired by Retinex theory) to perform adaptive exposure correction and separate reflectance from lighting effects.
  • Dual‑branch Residual Encoder‑Decoder – combines the frequency and illumination pathways in a unified residual framework, preserving fine structures without blowing up model size.
  • New High‑Resolution Underwater Video Dataset – collected from the Mediterranean Sea, featuring diverse depths, turbidity levels, and lighting conditions for robust benchmarking.
  • State‑of‑the‑Art Performance with Fewer Parameters – matches or exceeds existing methods on standard benchmarks while using significantly less memory and compute.

Methodology

AQUA‑Net builds on a classic encoder‑decoder backbone but augments it with two auxiliary branches (a minimal code sketch of both follows the list):

  1. Frequency Fusion Encoder

    • The input image is transformed with a Fast Fourier Transform (FFT).
    • Low‑frequency magnitude and high‑frequency phase components are processed through shallow convolutional blocks.
    • These frequency features are up‑sampled and concatenated with the spatial encoder’s latent representation, giving the network a richer view of texture and edge information that is often lost in underwater scattering.
  2. Illumination‑Aware Decoder

    • Mirrors the encoder’s hierarchy and predicts an illumination map L(x, y) alongside the enhanced reflectance R(x, y).
    • The final output is computed as Enhanced = R ⊙ L (element‑wise multiplication), allowing the network to adapt exposure locally, much as human vision adapts to uneven lighting underwater.
    • Residual connections between encoder and decoder layers help preserve structural details.
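
Below is a minimal PyTorch sketch of the two auxiliary branches. The module names, channel widths, activation choices, and fusion points are illustrative assumptions for clarity, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrequencyFusionEncoder(nn.Module):
    """Shallow frequency branch: FFT -> conv blocks on magnitude and phase."""

    def __init__(self, in_ch: int = 3, feat_ch: int = 32):
        super().__init__()
        self.mag_conv = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.phase_conv = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, image: torch.Tensor, spatial_feat: torch.Tensor) -> torch.Tensor:
        # 2-D FFT of the RGB input; magnitude carries global contrast,
        # phase carries edge and texture structure.
        spectrum = torch.fft.fft2(image, norm="ortho")
        mag_feat = self.mag_conv(torch.abs(spectrum))
        phase_feat = self.phase_conv(torch.angle(spectrum))
        freq_feat = torch.cat([mag_feat, phase_feat], dim=1)
        # Match the spatial encoder's latent resolution, then fuse by concatenation.
        freq_feat = F.interpolate(freq_feat, size=spatial_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
        return torch.cat([spatial_feat, freq_feat], dim=1)


class IlluminationAwareDecoder(nn.Module):
    """Predicts reflectance R and an illumination map L, then outputs R * L."""

    def __init__(self, feat_ch: int = 96):
        super().__init__()
        self.reflectance_head = nn.Conv2d(feat_ch, 3, 3, padding=1)
        self.illumination_head = nn.Conv2d(feat_ch, 1, 3, padding=1)

    def forward(self, fused_feat: torch.Tensor, out_size) -> torch.Tensor:
        feat = F.interpolate(fused_feat, size=out_size,
                             mode="bilinear", align_corners=False)
        reflectance = torch.sigmoid(self.reflectance_head(feat))    # R(x, y)
        illumination = torch.sigmoid(self.illumination_head(feat))  # L(x, y)
        # Retinex-style recomposition: Enhanced = R ⊙ L (element-wise product).
        return reflectance * illumination


# Toy usage: 128x128 RGB input with a dummy 32-channel spatial latent at 1/4 scale.
image = torch.rand(1, 3, 128, 128)
spatial_latent = torch.rand(1, 32, 32, 32)
fused = FrequencyFusionEncoder()(image, spatial_latent)           # (1, 96, 32, 32)
enhanced = IlluminationAwareDecoder(96)(fused, image.shape[-2:])  # (1, 3, 128, 128)
```

Note that the single‑channel illumination map broadcasts across the RGB reflectance during the element‑wise product, which is how the sketch mirrors the Retinex‑style recomposition described above.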

Both branches are trained jointly with a composite loss (sketched in code after the list):

  • L1 reconstruction loss on the enhanced image,
  • Perceptual loss (VGG‑based) to retain high‑level semantics,
  • Frequency consistency loss to ensure the Fourier spectrum of the output aligns with that of clean reference images.
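
A hedged sketch of how such a composite loss might be assembled in PyTorch is shown below. The VGG layer cut‑off, the loss weights, and the use of an L1 distance between FFT magnitudes for the frequency‑consistency term are assumptions; the paper's exact formulation and weighting may differ.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen VGG-16 features (up to relu3_3) for the perceptual term.
# NOTE: ImageNet normalization of the inputs is omitted here for brevity.
_vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)


def composite_loss(pred: torch.Tensor, target: torch.Tensor,
                   w_l1: float = 1.0, w_perc: float = 0.1,
                   w_freq: float = 0.1) -> torch.Tensor:
    # 1) Pixel-wise L1 reconstruction loss on the enhanced image.
    l1 = F.l1_loss(pred, target)
    # 2) VGG-based perceptual loss to retain high-level semantics.
    perc = F.l1_loss(_vgg(pred), _vgg(target))
    # 3) Frequency-consistency loss: align the Fourier magnitude spectrum
    #    of the output with that of the clean reference.
    freq = F.l1_loss(torch.abs(torch.fft.fft2(pred, norm="ortho")),
                     torch.abs(torch.fft.fft2(target, norm="ortho")))
    return w_l1 * l1 + w_perc * perc + w_freq * freq
```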

Results & Findings

| Dataset | PSNR ↑ | SSIM ↑ | Params (M) |
|---|---|---|---|
| UIEB (test) | 28.7 | 0.92 | 1.9 |
| RUIE | 27.4 | 0.89 | 1.9 |
| New Mediterranean Video Set | 29.1 | 0.94 | 1.9 |

  • AQUA‑Net reaches parity or slight gains over heavyweight SOTA models (e.g., UWCNN, WaterNet) while using ~40 % fewer parameters.
  • Ablation studies show that removing the frequency branch drops PSNR by ~1.2 dB, and removing the illumination branch reduces SSIM by ~0.03, confirming their complementary impact.
  • Qualitative visual comparisons reveal sharper coral textures, more natural color balance, and reduced haze, especially in deep‑sea frames where traditional methods struggle.

Practical Implications

  • Real‑time underwater robotics – the low‑footprint model can run on NVIDIA Jetson or ARM‑based vision processors, enabling on‑board image enhancement for autonomous underwater vehicles (AUVs) and ROVs.
  • Marine monitoring & inspection – clearer imagery improves downstream computer‑vision tasks such as object detection, segmentation, and species classification, reducing false positives caused by color distortion.
  • Consumer underwater photography – mobile apps can integrate AQUA‑Net for instant post‑capture correction without draining battery or requiring cloud processing.
  • Dataset generation – the frequency‑fusion approach can be repurposed to synthesize realistic underwater degradations for training other vision models, accelerating research in this niche domain.

Limitations & Future Work

  • The current model assumes a single illumination map per frame; highly dynamic lighting (e.g., moving light sources) may still cause artifacts.
  • While the parameter count is low, inference speed on ultra‑low‑power microcontrollers (e.g., 8‑bit MCUs) has not been benchmarked.
  • The authors plan to explore self‑supervised training on unpaired underwater footage and to extend the frequency branch to handle multi‑scale wavelet representations for even finer texture recovery.

Authors

  • Munsif Ali
  • Najmul Hassan
  • Lucia Ventura
  • Davide Di Bari
  • Simonepietro Canese

Paper Information

  • arXiv ID: 2512.05960v1
  • Categories: cs.CV, cs.AI
  • Published: December 5, 2025