[Paper] A Lightweight Real-Time Low-Light Enhancement Network for Embedded Automotive Vision Systems
Source: arXiv - 2512.02965v1
Overview
The paper presents UltraFast‑LieNET, a tiny yet powerful neural network designed to brighten low‑light images in real time on embedded automotive vision hardware. By shrinking the model to just a few hundred parameters while still beating state‑of‑the‑art (SOTA) methods in image quality, the authors address a critical bottleneck for night‑time driver‑assistance and autonomous‑driving systems.
Key Contributions
- Dynamic Shifted Convolution (DSConv): A novel convolutional kernel that learns only 12 parameters and uses spatial “shifts” to capture features efficiently.
- Multi‑scale Shifted Residual Block (MSRB): Stacks DSConv layers with different shift distances, dramatically expanding the receptive field without extra parameters.
- Ultra‑lightweight architecture: The whole network can be configured down to 36 learnable parameters (the smallest practical version) and the best‑performing variant uses only 180 parameters.
- Gradient‑aware multi‑level loss: A loss function that balances pixel‑wise fidelity, edge preservation, and gradient consistency, stabilizing training of such a tiny model.
- Real‑world validation: Extensive experiments on the LOLI‑Street dataset and three public low‑light benchmarks show a PSNR gain of 4.6 dB over the previous best lightweight methods, while running at real‑time frame rates on typical automotive SoCs.
Methodology
- Shifted Convolution Basics – Instead of learning a full 3×3 (or larger) kernel, DSConv learns a small set of scalar weights that are applied after shifting the input feature map by a fixed offset (e.g., left‑1, up‑2). The shift operation is parameter‑free and can be implemented as a simple memory address offset, making it extremely fast on hardware.
- Dynamic Shifts – The shift distances themselves are not fixed; a lightweight gating module predicts the optimal offset for each layer during training, allowing the network to adapt its receptive field to the content of the image.
- MSRB Construction – Several DSConvs with different shift magnitudes are placed in parallel, their outputs summed, and a residual connection is added. This design mimics a multi‑scale filter bank while keeping the parameter count minimal.
- Network Stack – UltraFast‑LieNET stacks a handful of MSRBs (typically 3–5) followed by a final 1×1 convolution that maps the processed features back to the RGB space (see the first code sketch below).
- Training Objective – The loss combines (see the second code sketch below):
- L1 pixel loss for overall brightness accuracy,
- Edge‑aware loss (using Sobel gradients) to keep structural details,
- Multi‑level gradient loss that penalizes discrepancies at several scales, encouraging the tiny model to learn both coarse illumination and fine textures.
All operations are fully convolutional, so the model can process images of any resolution without extra padding or resizing.
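To make the shift‑based design concrete, below is a minimal PyTorch sketch of the ideas above. The module names (DSConv, MSRB, TinyEnhancer), the use of torch.roll for the shift, the per‑channel scalar weights, and the fixed shift offsets are illustrative assumptions, not the authors' reference implementation; in particular, the paper predicts the offsets dynamically with a gating module, whereas they are hard‑coded here.

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Illustrative shifted convolution: shift the feature map by a fixed
    (dy, dx) offset, then scale it with a handful of learnable weights.
    The shift itself is parameter-free (torch.roll is just an index offset)."""
    def __init__(self, channels, shift=(1, 0)):
        super().__init__()
        self.shift = shift
        # One scalar weight per channel keeps the parameter count tiny.
        # (The paper predicts shift offsets dynamically; they are fixed here.)
        self.weight = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        dy, dx = self.shift
        shifted = torch.roll(x, shifts=(dy, dx), dims=(2, 3))
        return shifted * self.weight


class MSRB(nn.Module):
    """Illustrative multi-scale shifted residual block: parallel DSConvs with
    different shift magnitudes, summed, plus a residual connection."""
    def __init__(self, channels, shifts=((1, 0), (0, 1), (2, 0), (0, 2))):
        super().__init__()
        self.branches = nn.ModuleList([DSConv(channels, s) for s in shifts])
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = sum(branch(x) for branch in self.branches)
        return self.act(out) + x


class TinyEnhancer(nn.Module):
    """Hypothetical UltraFast-LieNET-style stack: a few MSRBs followed by a
    1x1 convolution mapping the features back to RGB."""
    def __init__(self, channels=3, num_blocks=3):
        super().__init__()
        self.blocks = nn.Sequential(*[MSRB(channels) for _ in range(num_blocks)])
        self.head = nn.Conv2d(channels, 3, kernel_size=1)

    def forward(self, x):
        return self.head(self.blocks(x))


if __name__ == "__main__":
    model = TinyEnhancer()
    dark = torch.rand(1, 3, 720, 1280)  # any resolution: the model is fully convolutional
    print(model(dark).shape, "params:", sum(p.numel() for p in model.parameters()))
```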
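A second minimal sketch shows a combined training objective in the spirit of the gradient‑aware multi‑level loss: an L1 pixel term, a Sobel‑based edge term, and a gradient term evaluated at several scales. The weighting factors (w_edge, w_grad), the number of levels, and the exact formulation are assumptions for illustration and may differ from the paper's definition.

```python
import torch
import torch.nn.functional as F

def sobel_grad(img):
    """Per-channel Sobel gradients (illustrative edge extractor)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    c = img.shape[1]
    gx = F.conv2d(img, kx.repeat(c, 1, 1, 1), padding=1, groups=c)
    gy = F.conv2d(img, ky.repeat(c, 1, 1, 1), padding=1, groups=c)
    return gx, gy

def enhancement_loss(pred, target, w_edge=0.1, w_grad=0.1, levels=3):
    """Hypothetical gradient-aware multi-level loss: L1 pixel fidelity plus
    Sobel edge consistency plus finite-difference gradients at several scales."""
    loss = F.l1_loss(pred, target)  # pixel-wise brightness accuracy
    gx_p, gy_p = sobel_grad(pred)
    gx_t, gy_t = sobel_grad(target)
    loss = loss + w_edge * (F.l1_loss(gx_p, gx_t) + F.l1_loss(gy_p, gy_t))
    # Multi-level gradient term: compare image gradients at progressively coarser scales.
    p, t = pred, target
    for _ in range(levels):
        dx_p, dx_t = p[..., :, 1:] - p[..., :, :-1], t[..., :, 1:] - t[..., :, :-1]
        dy_p, dy_t = p[..., 1:, :] - p[..., :-1, :], t[..., 1:, :] - t[..., :-1, :]
        loss = loss + w_grad * (F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t))
        p, t = F.avg_pool2d(p, 2), F.avg_pool2d(t, 2)
    return loss
```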
Results & Findings
| Dataset | PSNR (dB) | Params | FPS (typical automotive SoC) |
|---|---|---|---|
| LOLI‑Street (proposed) | 26.51 | 180 | ~120 fps |
| ExDARK | 24.8 | 180 | ~115 fps |
| Dark Zurich | 23.9 | 180 | ~110 fps |
| SID (Sony) | 25.2 | 180 | ~118 fps |
- UltraFast‑LieNET outperforms the previous best lightweight method (e.g., LLNet‑Lite) by 4.6 dB in PSNR while using ≈10× fewer parameters.
- Visual inspection shows restored colors and sharp edges with virtually no halo artifacts, a common problem in aggressive low‑light enhancement.
- Ablation studies confirm that both the multi‑scale shift design and the gradient‑aware loss are essential; removing either drops PSNR by >1 dB.
Practical Implications
- Embedded automotive cameras: The model fits comfortably within the memory budget of typical ADAS‑grade SoCs (e.g., NXP i.MX, Renesas R‑Car) and can run at >30 fps on 720p streams, enabling night‑time lane detection, pedestrian spotting, and traffic‑sign recognition without a dedicated GPU.
- Energy efficiency: Fewer parameters mean lower DRAM bandwidth and reduced power draw—critical for electric‑vehicle platforms where every milliwatt counts.
- Edge‑AI pipelines: UltraFast‑LieNET can be inserted as a pre‑processing block in front of downstream perception models (object detection, semantic segmentation) to boost their accuracy under low illumination; because the enhancement step is so cheap, it adds only a small latency overhead and is often simpler than re‑training the downstream models for dark conditions.
- Rapid prototyping: The codebase (PyTorch + ONNX export) and the tiny model size make it straightforward to convert to TensorRT, TVM, or vendor‑specific inference runtimes, accelerating integration into existing automotive software stacks.
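As a rough illustration of this deployment path, the snippet below exports a stand‑in model to ONNX with dynamic spatial axes, which TensorRT, TVM, or a vendor runtime can then consume. The placeholder model, file name, and export options are assumptions, not the authors' released export script.

```python
import torch
import torch.nn as nn

# Stand-in for the enhancement network; in practice this would be the trained
# UltraFast-LieNET (or the TinyEnhancer sketch above).
model = nn.Conv2d(3, 3, kernel_size=1).eval()
dummy = torch.rand(1, 3, 720, 1280)

torch.onnx.export(
    model, dummy, "low_light_enhancer_sketch.onnx",
    input_names=["low_light"], output_names=["enhanced"],
    dynamic_axes={"low_light": {0: "batch", 2: "height", 3: "width"},
                  "enhanced": {0: "batch", 2: "height", 3: "width"}},
    opset_version=17,
)
# The resulting .onnx file can then be compiled with TensorRT, TVM, or a
# vendor-specific runtime for the target automotive SoC.
```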
Limitations & Future Work
- Extreme darkness: While the network excels on typical night‑time scenes, performance degrades on severely under‑exposed frames (e.g., <0.1 lux) where more sophisticated noise modeling may be required.
- Generalization to non‑automotive domains: The architecture is tuned for road‑scene statistics; applying it to indoor surveillance or medical imaging may need additional training data and possibly a modest increase in parameters.
- Dynamic shift learning overhead: The gating module that predicts shift offsets adds a small runtime cost; future work could explore fixed, hardware‑friendly shift patterns or compile‑time optimization to eliminate this overhead.
- End‑to‑end perception training: Integrating UltraFast‑LieNET directly into a joint training pipeline with downstream detection/segmentation networks could further improve overall system robustness—an avenue the authors plan to investigate.
Authors
- Yuhan Chen
- Yicui Shi
- Guofa Li
- Guangrui Bai
- Jinyuan Shao
- Xiangfei Huang
- Wenbo Chu
- Keqiang Li
Paper Information
- arXiv ID: 2512.02965v1
- Categories: cs.CV
- Published: December 2, 2025