[Paper] Hero-Mamba: Mamba-based Dual Domain Learning for Underwater Image Enhancement

Published: April 17, 2026 at 01:24 PM EDT
4 min read
Source: arXiv


Overview

Underwater photography is plagued by color casts, low contrast, and blurry details caused by light absorption and scattering. The new Hero‑Mamba model tackles these issues by combining a lightweight Mamba‑based architecture with a dual‑domain learning strategy that processes both the raw RGB image and its frequency‑domain (FFT) representation. The result is a fast, high‑quality enhancer that outperforms current CNN and Transformer‑based methods on standard benchmarks.

Key Contributions

  • Dual‑domain learning: Simultaneously feeds spatial (RGB) and spectral (FFT) data into the network, allowing it to separate color/brightness degradation from texture/noise artifacts.
  • Mamba‑based SS2D blocks: Leverages state‑space sequence modeling (Mamba) to capture global context with linear computational complexity, avoiding the quadratic cost of Vision Transformers while still modeling long‑range dependencies.
  • ColorFusion block with background‑light prior: Introduces a physics‑inspired prior to guide accurate color restoration, improving hue fidelity in challenging underwater scenes.
  • State‑of‑the‑art performance: Achieves PSNR = 25.802 dB and SSIM = 0.913 on the LSUI dataset, surpassing existing CNN and Transformer baselines.
  • Efficient inference: The linear‑complexity backbone enables real‑time processing of high‑resolution underwater footage on commodity GPUs.
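The linear-complexity claim behind the Mamba-style blocks can be illustrated with a toy diagonal state-space scan. This is a sketch only; `ssm_scan` and its parameters are illustrative and do not reproduce the paper's SS2D implementation:

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Toy diagonal state-space scan over a 1-D sequence.

    Recurrence (elementwise per state channel):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)

    A single pass over the sequence gives O(N) cost in sequence
    length N, versus the O(N^2) pairwise cost of self-attention.
    """
    h = np.zeros_like(a, dtype=np.float64)
    ys = []
    for xt in x:
        h = a * h + b * xt          # update hidden state
        ys.append(float(np.dot(c, h)))  # read out
    return np.array(ys)

# An image would be flattened into pixel sequences (e.g., row-wise and
# column-wise scans) and processed the same way, keeping cost linear
# in the number of pixels.
a = np.array([0.9, 0.5])   # per-channel decay
b = np.array([1.0, 1.0])
c = np.array([0.5, 0.5])
y = ssm_scan([1.0, 0.0, 0.0], a, b, c)
```

The decaying hidden state is what lets a single linear pass carry long-range context across the whole sequence.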

Methodology

  1. Input preparation – Each underwater image is transformed into two parallel streams:

    • Spatial stream: The original RGB image.
    • Spectral stream: The magnitude of its 2‑D Fast Fourier Transform (FFT), which emphasizes repetitive patterns and high‑frequency noise.
  2. Feature extraction with SS2D blocks – Both streams pass through a stack of SS2D (Spatial‑Spectral 2‑Dimensional) Mamba blocks. These blocks treat the image as a sequence along both height and width, applying a state‑space model that captures dependencies across the entire frame with O(N) cost (where N is the number of pixels).

  3. Cross‑domain fusion – Features from the two streams are merged via concatenation and a lightweight attention mechanism, allowing the network to learn how color/brightness cues (from RGB) interact with texture/noise cues (from FFT).

  4. ColorFusion module – A dedicated sub‑network receives the fused features together with a background light prior (estimated from the darkest regions of the image). This prior steers the color correction process, ensuring that the restored hues align with the physical lighting conditions underwater.

  5. Reconstruction – The final enhanced RGB image is produced by a series of convolutional layers that map the fused representation back to pixel space.
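Step 1 above (splitting an image into spatial and spectral streams) can be sketched as follows. The log-magnitude scaling and center shift are common conventions assumed here; the paper's exact preprocessing may differ:

```python
import numpy as np

def dual_domain_streams(rgb_uint8):
    """Split an image into a spatial (RGB) and a spectral (FFT) stream."""
    # Spatial stream: the RGB image, scaled to [0, 1].
    spatial = rgb_uint8.astype(np.float32) / 255.0
    # Spectral stream: per-channel 2-D FFT log-magnitude, shifted so
    # low frequencies sit at the center of the map.
    fft = np.fft.fft2(spatial, axes=(0, 1))
    spectral = np.log1p(np.abs(np.fft.fftshift(fft, axes=(0, 1))))
    return spatial, spectral

img = np.random.default_rng(0).integers(0, 256, (64, 64, 3)).astype(np.uint8)
spatial, spectral = dual_domain_streams(img)
```

Both streams keep the image's spatial shape, so they can be fed through parallel encoder branches and later fused channel-wise.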

The entire pipeline is end‑to‑end trainable using a combination of L1 reconstruction loss, perceptual loss (VGG‑based), and a color consistency loss that penalizes deviations from the background light prior.
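The training objective can be sketched as below. The VGG perceptual term is omitted to keep the example self-contained, the dark-pixel prior estimate and the loss weight are illustrative assumptions, and function names are hypothetical:

```python
import numpy as np

def background_light_prior(img, frac=0.01):
    """Crude background-light estimate: mean color of the darkest pixels.

    The paper estimates the prior from the darkest regions of the image;
    the exact procedure is assumed here.
    """
    lum = img.mean(axis=-1).ravel()
    k = max(1, int(frac * lum.size))
    darkest = np.argsort(lum)[:k]            # indices of darkest k pixels
    return img.reshape(-1, 3)[darkest].mean(axis=0)

def total_loss(pred, target, lam_color=0.1):
    """L1 reconstruction + color-consistency term.

    The VGG-based perceptual term is left out of this sketch, and the
    weight lam_color is illustrative, not the paper's value.
    """
    l1 = np.abs(pred - target).mean()
    color = np.abs(background_light_prior(pred)
                   - background_light_prior(target)).mean()
    return l1 + lam_color * color

pred = np.random.default_rng(1).random((32, 32, 3))
loss_same = total_loss(pred, pred)   # identical images -> zero loss
```

The color term ties the restored image's estimated ambient light to the reference's, which is what penalizes physically implausible color shifts.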

Results & Findings

| Dataset | PSNR ↑ (dB) | SSIM ↑ | Runtime (GTX 1080 Ti) |
|---|---|---|---|
| LSUI | 25.802 | 0.913 | 0.032 s / 720p frame |
| UIEB | 27.1 (≈ +0.6 dB over best CNN) | 0.925 (≈ +0.02) | comparable |

  • Visual quality: Hero‑Mamba restores natural blues and greens while preserving fine textures, as shown in side‑by‑side comparisons with recent Transformer‑based enhancers.
  • Generalization: The model trained on LSUI transfers well to UIEB without fine‑tuning, indicating robustness to varying water types and lighting conditions.
  • Efficiency: Thanks to the linear‑complexity SS2D blocks, inference scales gracefully to 4K frames, a regime where many Transformers become prohibitively slow.

Practical Implications

  • Real‑time underwater robotics: Autonomous underwater vehicles (AUVs) and remotely operated vehicles (ROVs) can integrate Hero‑Mamba for on‑board visual navigation, object detection, and mapping without sacrificing frame rate.
  • Marine research & conservation: Scientists can quickly clean large image archives, improving downstream analysis such as coral health assessment or species counting.
  • Consumer applications: Dive‑camera manufacturers and mobile apps can embed the model to deliver instant, high‑quality photos to end‑users, enhancing user experience and reducing post‑processing effort.
  • Cross‑domain potential: The dual‑domain concept (spatial + FFT) and the Mamba‑based backbone can be adapted to other imaging problems where frequency‑domain cues are valuable, e.g., low‑light photography, medical ultrasound denoising, or satellite image restoration.

Limitations & Future Work

  • Prior estimation sensitivity: The background light prior assumes relatively uniform ambient illumination; in highly turbid or multi‑light environments the estimate can be noisy, affecting color fidelity.
  • Training data bias: The model was trained primarily on clear‑water datasets; performance on extreme conditions (e.g., deep‑sea, heavy particulate matter) may degrade.
  • Future directions: The authors plan to explore adaptive prior learning, incorporate depth or polarization cues, and extend the dual‑domain framework to video sequences with temporal consistency constraints.

Hero‑Mamba demonstrates that marrying efficient state‑space models with a clever dual‑domain input can finally give developers a practical, high‑performance tool for underwater image enhancement.

Authors

  • Tejeswar Pokuri
  • Shivarth Rai

Paper Information

  • arXiv ID: 2604.16266v1
  • Categories: cs.CV
  • Published: April 17, 2026
