[Paper] Enhancing Hazy Wildlife Imagery: AnimalHaze3k and IncepDehazeGan
Source: arXiv - 2604.16284v1
Overview
The paper tackles a practical problem that many wildlife‑monitoring projects face: dense atmospheric haze that makes animal photos look murky and hampers computer‑vision pipelines. The authors introduce a new synthetic hazy dataset (AnimalHaze3k) and a GAN‑based de‑hazing model (IncepDehazeGan) that together boost the quality of hazy wildlife images and dramatically improve downstream detection performance.
Key Contributions
- AnimalHaze3k dataset – 3,477 synthetically hazed wildlife images derived from 1,159 clean photos using a physics‑based haze model, filling a gap in publicly available data for this niche.
- IncepDehazeGan architecture – A novel GAN that fuses Inception blocks with residual skip connections, tailored for de‑hazing natural‑scene wildlife imagery.
- State‑of‑the‑art restoration metrics – Achieves SSIM = 0.8914, PSNR = 20.54 dB, and LPIPS = 0.1104, outperforming prior de‑hazing methods by +6.27 % SSIM and +10.2 % PSNR.
- Downstream impact demonstration – When de‑hazed images are fed to YOLOv11, detection mAP more than doubles (↑112 %) and IoU rises by 67 %, demonstrating the practical value of the restoration step.
- Open‑source release – Code, pretrained weights, and the AnimalHaze3k dataset are made publicly available, encouraging reproducibility and community extensions.
Methodology
- Synthetic haze generation – Starting from high‑resolution, haze‑free wildlife photos, the authors apply a physically grounded atmospheric scattering model (Koschmieder’s law) to simulate varying haze densities, illumination, and depth cues. This yields a paired dataset of clean ↔ hazy images.
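The paper's exact generation parameters are not reproduced here, but the scattering model itself is standard. Below is a minimal NumPy sketch of Koschmieder-style haze synthesis; the scattering coefficient `beta`, the atmospheric light value, and the toy depth map are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def synthesize_haze(clean, depth, beta=1.2, atmospheric_light=0.9):
    """Apply Koschmieder's atmospheric scattering model:
        I(x) = J(x) * t(x) + A * (1 - t(x)),  with t(x) = exp(-beta * d(x))
    clean: haze-free image in [0, 1], shape (H, W, 3)
    depth: per-pixel scene depth, shape (H, W)
    beta:  scattering coefficient (higher = denser haze)
    """
    t = np.exp(-beta * depth)[..., np.newaxis]       # transmission map
    hazy = clean * t + atmospheric_light * (1.0 - t)  # scattering + airlight
    return np.clip(hazy, 0.0, 1.0)

# Toy usage: a uniform gray image whose depth increases left to right,
# so haze density grows toward the right edge.
clean = np.full((4, 4, 3), 0.5)
depth = np.tile(np.linspace(0.0, 2.0, 4), (4, 1))
hazy = synthesize_haze(clean, depth)
```

At zero depth the transmission is 1 and the pixel is unchanged; at large depth the pixel converges toward the atmospheric light, which is exactly the washed-out look the dataset simulates.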
- IncepDehazeGan design
- Generator: Stacks of Inception modules capture multi‑scale texture details, while residual skip connections preserve low‑level structure and ease gradient flow.
- Discriminator: A PatchGAN classifier judges realism at the patch level, encouraging the generator to produce locally consistent de‑hazed textures.
- Losses: Combined adversarial loss, L1 reconstruction loss, and perceptual loss (VGG‑based) to balance pixel fidelity and visual quality.
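The generator objective above can be sketched as a weighted sum of the three terms. This is a NumPy illustration, not the authors' implementation: the loss weights are pix2pix-style guesses, and the VGG feature extractor is stubbed as an arbitrary callable since pretrained weights are not bundled here.

```python
import numpy as np

def l1_loss(pred, target):
    """Pixel-wise reconstruction fidelity."""
    return np.mean(np.abs(pred - target))

def adversarial_loss(patch_logits):
    """Non-saturating GAN loss on PatchGAN outputs: the generator
    wants every patch to be classified as real."""
    probs = 1.0 / (1.0 + np.exp(-patch_logits))  # sigmoid
    return -np.mean(np.log(probs + 1e-8))

def perceptual_loss(pred, target, features):
    """L2 distance in a feature space; the paper uses VGG activations,
    represented here by the placeholder callable `features`."""
    return np.mean((features(pred) - features(target)) ** 2)

def generator_loss(pred, target, patch_logits, features,
                   w_adv=1.0, w_l1=100.0, w_perc=10.0):
    # Weights are illustrative assumptions, not the paper's values.
    return (w_adv * adversarial_loss(patch_logits)
            + w_l1 * l1_loss(pred, target)
            + w_perc * perceptual_loss(pred, target, features))
```

When the prediction matches the target and the discriminator is confidently fooled, all three terms approach zero; the L1 weight dominates early training to anchor pixel fidelity while the adversarial term sharpens texture.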
- Training pipeline – The model is trained on AnimalHaze3k with standard data augmentations (random crops, flips) and the Adam optimizer. Validation uses both quantitative metrics (SSIM, PSNR, LPIPS) and a downstream object‑detection benchmark (YOLOv11 on de‑hazed vs. hazy inputs).
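For reference, the two restoration metrics reported most prominently can be computed as follows. This is a simplified NumPy sketch: the PSNR is the standard formula, while the SSIM shown is a single-window global variant (the reference metric averages an 11×11 Gaussian-windowed version over the image, as in `skimage.metrics.structural_similarity`).

```python
import numpy as np

def psnr(clean, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((clean - restored) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=1.0):
    """Simplified single-window SSIM in [-1, 1]; higher is better.
    The standard metric computes this over sliding Gaussian windows."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

A perfect restoration gives infinite PSNR and SSIM of 1; a uniform +0.1 intensity offset on a [0, 1] image yields an MSE of 0.01 and hence exactly 20 dB.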
Results & Findings
| Metric | IncepDehazeGan | Best Prior Method |
|---|---|---|
| SSIM | 0.8914 (+6.27 %) | 0.8392 |
| PSNR (dB) | 20.54 (+10.2 %) | 18.66 |
| LPIPS (lower is better) | 0.1104 | 0.1587 |
- Visual quality: Side‑by‑side comparisons show sharper animal outlines, restored color contrast, and reduced halo artifacts.
- Detection boost: Feeding de‑hazed frames to YOLOv11 lifts mean Average Precision from 0.42 (hazy) to 0.89 (de‑hazed) and IoU from 0.48 to 0.80.
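The headline percentages follow directly from the reported before/after numbers; a quick arithmetic check:

```python
def relative_gain(before, after):
    """Relative improvement as a percentage of the baseline."""
    return 100.0 * (after - before) / before

# YOLOv11 numbers reported in the paper's benchmark.
map_gain = relative_gain(0.42, 0.89)  # mAP roughly doubles (~112 %)
iou_gain = relative_gain(0.48, 0.80)  # IoU gain rounds to ~67 %
```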
- Generalization: The model retains performance on real‑world hazy wildlife footage not seen during training, indicating the synthetic data captures essential haze characteristics.
Practical Implications
- Field deployments: Conservation drones or camera traps operating in foggy or smoky environments can run IncepDehazeGan on‑device (lightweight inference ~30 ms on an NVIDIA Jetson Nano) to clean images before analysis.
- Improved analytics pipelines: Cleaner frames translate to higher detection recall, more reliable animal counts, and better behavior tracking—critical for population monitoring and anti‑poaching alerts.
- Cross‑domain utility: The architecture is agnostic to the subject matter, so developers can adapt it for other outdoor domains (e.g., autonomous driving in haze, aerial surveying).
- Data augmentation: AnimalHaze3k can serve as a benchmark for training robust models that are resilient to atmospheric disturbances, reducing the need for costly field data collection under adverse weather.
Limitations & Future Work
- Synthetic vs. real haze: Although the physics‑based generator mimics many haze properties, subtle real‑world factors (e.g., aerosol composition, dynamic lighting) may still cause a domain gap.
- Resolution ceiling: The current model is trained on 512×512 patches; scaling to ultra‑high‑resolution imagery may require architectural tweaks or memory‑efficient training tricks.
- Temporal consistency: For video streams, frame‑wise de‑hazing can introduce flicker; extending the model with temporal loss or recurrent modules is a promising direction.
- Broader ecological validation: The paper evaluates detection on a single YOLO version; testing across diverse species detectors and multi‑species datasets would solidify the claimed conservation impact.
Overall, the combination of a purpose‑built dataset and a high‑performing de‑hazing GAN offers developers a ready‑to‑use toolset for tackling haze‑related visual challenges in wildlife monitoring and beyond.
Authors
- Shivarth Rai
- Tejeswar Pokuri
Paper Information
- arXiv ID: 2604.16284v1
- Categories: cs.CV
- Published: April 17, 2026