[Paper] RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

Published: January 8, 2026 at 01:59 PM EST
4 min read

Source: arXiv - 2601.05249v1

Overview

Night‑time photography is notoriously difficult for automatic white‑balance (AWB) algorithms because low‑light noise and mixed artificial lighting break the assumptions most color‑constancy methods rely on. The paper RL‑AWB introduces a hybrid framework that first extracts a reliable statistical estimate of the scene illumination and then refines it with a deep reinforcement‑learning (RL) agent that “tunes” the AWB parameters the way a human expert would. The authors also release a new multi‑sensor nighttime dataset, enabling cross‑camera evaluation that has been missing from prior work.

Key Contributions

  • Hybrid statistical + RL pipeline: Combines a night‑specific gray‑pixel detector with a novel illumination estimator, then uses a deep RL policy to adaptively adjust AWB settings per image.
  • First RL‑based color‑constancy model: Treats AWB as a sequential decision problem, allowing the agent to learn a policy that maximizes a perceptual quality reward.
  • Multi‑sensor nighttime dataset: 1,200 raw images captured from four different camera sensors (smartphone, mirrorless, DSLR, and a low‑cost sensor) under diverse night‑time lighting conditions.
  • Cross‑domain generalization: Demonstrates that the learned policy transfers well from low‑light to well‑illuminated scenes without retraining.
  • Open‑source implementation & demo: Code, pretrained models, and an interactive web demo are publicly available.

Methodology

  1. Statistical Pre‑processing

    • Detects salient gray pixels using a night‑adapted histogram analysis that discounts noisy dark regions.
    • Estimates an initial illumination vector by averaging the chromaticities of these gray pixels, providing a solid starting point for the RL agent.
  2. Reinforcement‑Learning Agent

    • State: Concatenation of the raw image’s global statistics (mean, variance per channel), the initial illumination estimate, and a low‑dimensional feature map from a shallow CNN.
    • Action: Small adjustments to the three AWB gain parameters (R, G, B). The agent can take up to 10 steps per image, mimicking iterative manual tweaking.
    • Reward: A perceptual metric that combines a gray‑world loss (how close the corrected image’s gray pixels are to neutral) with a structural similarity (SSIM) term to penalize over‑correction and preserve detail.
    • Training: Uses Proximal Policy Optimization (PPO) on the multi‑sensor dataset, with curriculum learning that starts on well‑lit images and gradually introduces darker, noisier scenes.
  3. Inference

    • The statistical estimator provides the initial guess; the RL policy then runs a few quick adjustment steps (typically < 5 ms on a modern GPU), yielding the final AWB‑corrected image.
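The two‑stage pipeline above can be sketched in a few dozen lines. This is a minimal NumPy re‑creation, not the authors' implementation: the thresholds (`dark_thresh`, `gray_tol`), the convergence cutoff, and the simplified state vector (global statistics plus current gains, where the paper additionally uses shallow‑CNN features) are all illustrative assumptions, and the RL policy is passed in as a stand‑in callable.

```python
import numpy as np

def estimate_illuminant(raw, dark_thresh=0.05, gray_tol=0.08):
    """Statistical front-end (hypothetical re-creation): find near-gray
    pixels while discounting noisy dark regions, then average their
    chromaticities into an initial illumination estimate."""
    pixels = raw.reshape(-1, 3).astype(np.float64)
    # Discard dark pixels whose chromaticity is dominated by sensor noise.
    bright = pixels[pixels.mean(axis=1) > dark_thresh]
    if len(bright) == 0:
        return np.ones(3) / np.sqrt(3.0)  # fall back to a neutral illuminant
    chroma = bright / bright.sum(axis=1, keepdims=True)
    # Keep pixels whose chromaticity is close to neutral (1/3, 1/3, 1/3).
    gray = chroma[np.abs(chroma - 1.0 / 3.0).max(axis=1) < gray_tol]
    if len(gray) == 0:
        gray = chroma  # no confident gray pixels: average everything
    est = gray.mean(axis=0)
    return est / np.linalg.norm(est)

def refine_gains(raw, illuminant, policy, max_steps=10):
    """RL-style refinement loop: a policy nudges the per-channel AWB gains
    for up to `max_steps`, mimicking iterative manual tweaking."""
    # Initial gains normalize each channel by the estimated illuminant.
    gains = illuminant.max() / np.clip(illuminant, 1e-6, None)
    pixels = raw.reshape(-1, 3)
    for _ in range(max_steps):
        # Simplified state: per-channel mean and variance plus current gains.
        state = np.concatenate([pixels.mean(axis=0), pixels.var(axis=0), gains])
        delta = policy(state)            # small multiplicative adjustments
        gains = gains * (1.0 + delta)
        if np.abs(delta).max() < 1e-3:   # tiny adjustment: treat as converged
            break
    return gains
```

With a synthetic scene rendered under a known illuminant, the statistical estimator recovers the illuminant direction, and a zero‑output stand‑in policy leaves the initial gains untouched after one step.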

Results & Findings

Metric (lower is better)    Statistical Baseline    RL‑AWB (Ours)    State‑of‑the‑Art (DeepAWB)
Mean Angular Error (°)      6.8                     4.2              5.5
ΔE00 (CIEDE2000)            9.1                     6.3              7.8
Runtime (ms)                12                      8                15
  • Superior accuracy: RL‑AWB reduces the mean angular error by ~38 % relative to the statistical baseline (6.8° → 4.2°) and by ~24 % relative to the best existing deep‑learning AWB model (DeepAWB, 5.5°), with the largest gains on the darkest images (≤ 0.01 lux).
  • Robust cross‑sensor performance: When trained on three sensors and tested on the fourth, the error increase is < 0.5°, indicating strong generalization.
  • Real‑time feasibility: The RL refinement adds only a few milliseconds, making the approach suitable for mobile or embedded pipelines.

Practical Implications

  • Mobile photography apps: Integrating RL‑AWB can dramatically improve night‑mode auto‑white‑balance without sacrificing speed, delivering more natural colors straight out of the camera.
  • Surveillance & automotive vision: Low‑light cameras often suffer from color casts that hinder downstream tasks (e.g., object detection). A plug‑and‑play RL‑AWB module can clean up the raw feed, improving the reliability of perception stacks.
  • Cross‑device pipelines: Because the model learns a sensor‑agnostic policy, manufacturers can deploy a single pretrained model across product lines, reducing engineering overhead.
  • Content creation tools: Photo‑editing software can offer an “auto‑night‑balance” button that mimics a professional colorist, saving time for creators working with raw night‑time footage.

Limitations & Future Work

  • Dependence on gray‑pixel detection: Extremely monochrome scenes (e.g., a night sky with few neutral surfaces) can still confuse the statistical front‑end, limiting the RL agent’s starting point.
  • Training data diversity: Although the new dataset spans four sensors, it focuses on urban night scenes; performance on exotic lighting (e.g., stage lighting, fireworks) remains untested.
  • Explainability: The RL policy is a black‑box; understanding why a particular gain adjustment was chosen is non‑trivial, which may be a concern for safety‑critical applications.

Future research directions include: augmenting the gray‑pixel detector with learned semantic cues, extending the dataset to cover a broader range of nighttime environments, and exploring model‑compression techniques to run the RL agent on ultra‑low‑power edge devices.

Authors

  • Yuan‑Kang Lee
  • Kuan‑Lin Chen
  • Chia‑Che Chang
  • Yu‑Lun Liu

Paper Information

  • arXiv ID: 2601.05249v1
  • Categories: cs.CV
  • Published: January 8, 2026
  • PDF: Download PDF