[Paper] Detection Fire in Camera RGB-NIR

Published: December 29, 2025 at 11:48 AM EST

Source: arXiv - 2512.23594v1

Overview

The paper tackles a persistent problem in computer‑vision‑based fire monitoring: reliably spotting flames at night using RGB‑NIR (near‑infrared) cameras. By augmenting scarce NIR data, introducing a two‑stage detection pipeline, and proposing a patch‑based variant of YOLO, the authors push detection accuracy beyond the best published results while reducing false alarms caused by bright artificial lights.

Key Contributions

Expanded NIR dataset – curated and heavily augmented to mitigate the lack of publicly available night‑vision fire imagery.
Two‑stage detection pipeline – combines a fast YOLOv11 front‑end with a lightweight EfficientNetV2‑B0 classifier to filter out false positives from artificial lighting.
Patched‑YOLO – a novel preprocessing scheme that splits high‑resolution RGB frames into overlapping patches, enabling the detector to better capture small or distant flames.
Comprehensive benchmark – re‑evaluates state‑of‑the‑art detectors (YOLOv7, RT‑DETR, YOLOv9) on the new dataset, demonstrating consistent gains in mAP₅₀₋₉₅.

Methodology

Data Collection & Augmentation
- Gathered raw NIR video from night‑vision cameras in controlled fire‑training sites.
- Applied geometric (rotation, scaling), photometric (brightness/contrast jitter), and domain‑specific augmentations (simulated smoke, lens flare) to inflate the training pool.
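
The geometric and photometric steps above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the image is a plain nested list of 8-bit grayscale values, and the function names, jitter ranges, and the 50 % rotation probability are all assumptions chosen for the example. The domain-specific augmentations (simulated smoke, lens flare) are not covered.

```python
import random

def jitter_brightness_contrast(img, brightness, contrast):
    """Scale each pixel around mid-gray (128), shift it, and clip to [0, 255]."""
    return [
        [min(255, max(0, round(contrast * (p - 128) + 128 + brightness))) for p in row]
        for row in img
    ]

def rotate90(img):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img, rng):
    """Apply a random rotation and a random brightness/contrast jitter.

    The ranges below are illustrative assumptions, not values from the paper.
    """
    out = img
    if rng.random() < 0.5:
        out = rotate90(out)
    brightness = rng.uniform(-30, 30)
    contrast = rng.uniform(0.8, 1.2)
    return jitter_brightness_contrast(out, brightness, contrast)

if __name__ == "__main__":
    frame = [[0, 255], [128, 64]]
    print(augment(frame, random.Random(0)))
```

In a detection setting, any geometric transform applied to the image would also have to be applied to the bounding-box labels; that bookkeeping is left out here.
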
Two‑Stage Detection
- Stage 1: YOLOv11 runs on the full‑frame RGB‑NIR composite, quickly proposing bounding boxes.
- Stage 2: Each proposal is cropped and fed to EfficientNetV2‑B0, which classifies it as “fire” or “non‑fire” (e.g., street lamp). This lightweight net runs on the GPU in parallel, keeping latency low.
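
The control flow of the two stages can be sketched as follows. The detector and classifier are passed in as plain callables (`propose` and `classify` are hypothetical stand-ins for YOLOv11 and EfficientNetV2‑B0, not real library calls), and the 0.5 threshold is an assumed value.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
Image = Sequence[Sequence[float]]

def crop(img: Image, box: Box) -> List[List[float]]:
    """Cut the region covered by `box` out of a row-major image."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in img[y1:y2]]

def two_stage_detect(
    img: Image,
    propose: Callable[[Image], List[Tuple[Box, float]]],
    classify: Callable[[List[List[float]]], float],
    fire_threshold: float = 0.5,  # assumed cut-off, not from the paper
) -> List[Tuple[Box, float, float]]:
    """Stage 1 proposes boxes; stage 2 keeps only crops classified as fire."""
    kept = []
    for box, det_score in propose(img):
        fire_prob = classify(crop(img, box))
        if fire_prob >= fire_threshold:
            kept.append((box, det_score, fire_prob))
    return kept
```

Keeping the two stages behind separate callables makes it straightforward to swap either model or to batch the crops before classification.
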
Patched‑YOLO for RGB
- The input image is tiled into overlapping patches (e.g., 640 × 640 with 20 % overlap).
- YOLO processes each patch independently; detections are merged using non‑maximum suppression across patch boundaries.
- This strategy preserves high‑resolution detail without blowing up memory usage.
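
A rough sketch of the tiling and merging steps is shown below. It assumes a greedy IoU-based non-maximum suppression and a hypothetical `detect_patch(x0, y0, tile)` callable that returns patch-local boxes with scores; the paper's exact merging rule and thresholds are not given in this summary, so the 0.5 IoU threshold is an assumption.

```python
def tile_starts(length, tile, overlap):
    """Top-left offsets of overlapping tiles along one axis; the last tile is flush with the edge."""
    if length <= tile:
        return [0]
    stride = max(1, round(tile * (1 - overlap)))
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(dets, iou_thr=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    kept = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept

def patched_detect(width, height, detect_patch, tile=640, overlap=0.2, iou_thr=0.5):
    """Run a per-patch detector, shift boxes to full-frame coordinates, then merge."""
    dets = []
    for y0 in tile_starts(height, tile, overlap):
        for x0 in tile_starts(width, tile, overlap):
            for (x1, y1, x2, y2), score in detect_patch(x0, y0, tile):
                dets.append(((x1 + x0, y1 + y0, x2 + x0, y2 + y0), score))
    return nms(dets, iou_thr)
```

With a 1280-pixel-wide frame, 640-pixel tiles, and 20 % overlap, `tile_starts` yields offsets 0, 512, and 640, so a flame near x = 600 appears in two patches and the duplicate is removed by the final NMS pass.
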

All training used the standard COCO‑style loss functions, with additional weighting to penalize false positives on artificial lights.
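
The summary does not say how the extra weighting is implemented. One plausible form, shown purely as an assumption, is a binary cross-entropy term whose weight is raised for negative samples that depict artificial lights; the function name, the `is_light` flag, and the weight of 2.0 are all hypothetical.

```python
import math

def weighted_bce(prob_fire, is_fire, is_light, light_weight=2.0, eps=1e-7):
    """Binary cross-entropy for one sample, up-weighted for artificial-light negatives.

    `light_weight` is an illustrative value, not a setting from the paper.
    """
    p = min(max(prob_fire, eps), 1 - eps)  # avoid log(0)
    loss = -math.log(p) if is_fire else -math.log(1 - p)
    weight = light_weight if (is_light and not is_fire) else 1.0
    return weight * loss

if __name__ == "__main__":
    # A street lamp scored as 80 % fire costs twice as much as a generic negative.
    print(weighted_bce(0.8, is_fire=False, is_light=True))
    print(weighted_bce(0.8, is_fire=False, is_light=False))
```
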

Results & Findings

Model (input size)	mAP₅₀₋₉₅ (RGB)	mAP₅₀₋₉₅ (NIR)	False‑Positive Rate (lights)
YOLOv7 (640 × 1280)	0.51	0.44	18 %
RT‑DETR (640 × 640)	0.65	0.58	12 %
YOLOv9 (640 × 640)	0.598	0.55	14 %
Two‑stage (YOLOv11 + EffV2‑B0)	0.71	0.68	6 %
Patched‑YOLO (RGB only)	0.73	–	–

The two‑stage pipeline raises mAP₅₀₋₉₅ over the strongest baseline (RT‑DETR) from 0.65 to 0.71 on RGB and from 0.58 to 0.68 on NIR, while halving the false‑positive rate on night‑time artificial lights (12 % to 6 %).
Patched‑YOLO raises detection of small, distant flames by ~8 % mAP compared to vanilla YOLOv11, with only a modest increase in inference time (≈ 12 ms per frame on an RTX 3080).
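
The gains can be recomputed directly from the table. The short script below compares the two‑stage row against the RT‑DETR row; the dictionary keys and the `gains` helper are names made up for this example, while the numbers are copied from the table.

```python
# Values taken from the results table (mAP50-95 and false-positive rate on lights).
rt_detr = {"rgb": 0.65, "nir": 0.58, "fpr": 0.12}
two_stage = {"rgb": 0.71, "nir": 0.68, "fpr": 0.06}

def gains(base, new):
    """Absolute and relative mAP gains per modality, plus the false-positive ratio."""
    out = {}
    for key in ("rgb", "nir"):
        delta = new[key] - base[key]
        out[key] = {
            "absolute": round(delta, 3),
            "relative_pct": round(100 * delta / base[key], 1),
        }
    out["fpr_ratio"] = round(new["fpr"] / base["fpr"], 2)
    return out

if __name__ == "__main__":
    for name, value in gains(rt_detr, two_stage).items():
        print(name, value)
```

This gives +0.06 mAP (about 9.2 % relative) on RGB, +0.10 mAP (about 17.2 % relative) on NIR, and a false-positive ratio of 0.5.
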

Practical Implications

Fire‑monitoring systems can now run on edge devices (e.g., NVIDIA Jetson) with real‑time performance, thanks to the lightweight EfficientNetV2‑B0 classifier.
Fewer false alarms mean fewer unnecessary dispatches for fire brigades, translating to cost savings and higher trust in automated surveillance.
Patch‑based processing can be adopted for any high‑resolution RGB detection task where small objects matter (e.g., wildlife spotting, drone‑based inspection).
The augmented NIR dataset is released under a permissive license, giving developers a ready‑to‑use benchmark for night‑vision AI research.

Limitations & Future Work

The current NIR data still originates from a limited set of controlled fire‑training sites; performance in widely varying outdoor conditions (rain, fog) remains untested.
Patched‑YOLO introduces extra bookkeeping for merging detections, which can become a bottleneck on very low‑power CPUs.
The authors plan to explore transformer‑based backbones for the second stage and to integrate temporal consistency (video‑level smoothing) to further suppress spurious detections.

Authors

Nguyen Truong Khai
Luong Duc Vinh

Paper Information

arXiv ID: 2512.23594v1
Categories: cs.CV
Published: December 29, 2025