[Paper] Is Bigger Always Better? Efficiency Analysis in Resource-Constrained Small Object Detection
Source: arXiv - 2603.02142v1
Overview
The paper Is Bigger Always Better? Efficiency Analysis in Resource‑Constrained Small Object Detection challenges the prevailing “bigger‑is‑better” dogma in computer‑vision model scaling. By rigorously testing three scaling levers—model size, training‑set size, and image resolution—on rooftop photovoltaic (PV) detection in Madagascar, the authors show that tiny, high‑resolution models can outperform their massive counterparts both in raw accuracy and in efficiency (accuracy per megabyte of model).
Key Contributions
- Systematic efficiency framework: Introduces a metric (mAP₅₀ per unit of model size) to compare models on a fair resource‑budget basis.
- Empirical inversion of scaling laws: Demonstrates that the smallest YOLO 11 N model is 24× more efficient than the largest YOLO 11 X while also achieving the highest absolute mAP₅₀ (0.617).
- Resolution as the dominant lever: Shows that increasing input resolution yields up to +120 % efficiency gain, dwarfing the marginal benefits of adding more training data at low resolutions.
- Pareto‑dominance across 44 deployment scenarios: Small, high‑resolution configurations dominate the accuracy‑throughput trade‑off space, eliminating the need for a classic “accuracy vs. speed” compromise.
- Domain‑specific insight for Earth observation (EO): Provides the first large‑scale, data‑scarce analysis of scaling laws for small‑object detection in satellite imagery.
Methodology
- Dataset & Task – The authors curated a rooftop PV detection benchmark from high‑resolution satellite images of Madagascar, a classic “small‑object” problem where each PV panel occupies only a few pixels.
- Scaling Dimensions
- Model size: YOLO 11 variants ranging from the ultra‑light YOLO 11 N (≈2.6 M parameters) to the heavyweight YOLO 11 X (≈57 M parameters).
- Dataset size: Sub‑samples of the training set (10 %, 30 %, 60 %, 100 %).
- Input resolution: Four resolutions (640×640, 960×960, 1280×1280, 1600×1600).
- Training Protocol – All models were trained with identical hyper‑parameters (learning rate schedule, optimizer, augmentation) to isolate the effect of the three scaling knobs.
- Efficiency Metric – For each configuration, the authors compute mAP₅₀ / model‑size (MB), allowing a direct comparison of “accuracy per byte”.
- Pareto Analysis – The 44 deployment configurations (combinations of model variant, input resolution, and training‑set fraction) are plotted in an accuracy‑throughput space; configurations not dominated by any other are identified as Pareto‑optimal.
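The efficiency metric and the Pareto screening above can be sketched in a few lines. The numbers below are illustrative placeholders, not the paper's measurements (except the 0.617 mAP₅₀ reported for YOLO 11 N at 1600×1600); configuration names and model sizes are assumptions for demonstration:

```python
from dataclasses import dataclass

@dataclass
class Config:
    name: str
    map50: float       # detection accuracy (mAP@50)
    size_mb: float     # model size on disk, in MB
    throughput: float  # inference speed, images/second

    @property
    def efficiency(self) -> float:
        # The paper's "accuracy per byte" metric: mAP50 / model size (MB).
        return self.map50 / self.size_mb

def pareto_frontier(configs):
    """Return the configs not dominated in the (mAP50, throughput) plane.

    A config is dominated when another config is at least as good on
    both axes and strictly better on at least one.
    """
    return [
        c for c in configs
        if not any(
            o.map50 >= c.map50 and o.throughput >= c.throughput
            and (o.map50 > c.map50 or o.throughput > c.throughput)
            for o in configs
        )
    ]

# Illustrative configurations (sizes and throughputs are hypothetical).
configs = [
    Config("yolo11n@1600", 0.617, 5.5, 40.0),
    Config("yolo11x@1600", 0.600, 110.0, 6.0),
    Config("yolo11n@640", 0.500, 5.5, 120.0),
]
```

With these placeholder numbers, the large model is dominated by the small high-resolution one (worse on both axes), so only the two small-model configurations remain on the frontier, which is exactly the pattern the paper reports.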
Results & Findings
| Scaling Lever | Impact on mAP₅₀ | Impact on Efficiency (mAP₅₀/MB) |
|---|---|---|
| Model size (YOLO 11 N → YOLO 11 X) | +0.02 mAP₅₀ (tiny gain) | 24× lower (efficiency collapses) |
| Resolution (640 → 1600) | +0.12 mAP₅₀ | +120 % efficiency boost |
| Dataset size (10 % → 100 %) | +0.01–0.03 mAP₅₀ (negligible) | No measurable efficiency change |
- YOLO 11 N at 1600×1600 achieved the best absolute mAP₅₀ (0.617) and the highest efficiency, beating every larger model even when they used the same or higher resolution.
- Adding more labeled images gave diminishing returns, especially when the resolution was low; the model quickly saturated on the information available in each pixel.
- Across all 44 deployment setups, the small, high‑resolution configurations sat on the Pareto frontier, meaning no other configuration could improve accuracy without sacrificing throughput, or vice versa.
Practical Implications
- Model selection for edge/IoT devices – When deploying CV on satellites, drones, or on‑board processors with strict memory limits, developers should prioritize higher input resolution over bigger backbones.
- Cost‑effective data collection – In data‑scarce EO projects, investing heavily in labeling more imagery may not pay off; instead, allocate resources to acquire higher‑resolution sensors or to up‑sample existing data.
- Simplified pipeline – Smaller models reduce inference latency, power consumption, and simplify containerization, enabling real‑time monitoring of rooftop PV installations for grid operators or NGOs.
- Generalizable recipe – The efficiency‑first evaluation can be applied to other small‑object detection domains (e.g., wildlife counting, traffic sign detection) where the object occupies few pixels.
Limitations & Future Work
- Domain specificity – The study focuses on rooftop PV detection in a single geographic region; results may differ for other object classes or terrains.
- Hardware‑agnostic metric – Efficiency is measured per megabyte of model size, not per FLOPs or actual wall‑clock latency on specific hardware; future work could incorporate device‑specific benchmarks.
- Resolution ceiling – Extremely high resolutions may hit memory limits on some edge devices; exploring tiling or multi‑scale inference strategies would be valuable.
- Model families – Only YOLO 11 variants were examined; extending the analysis to transformer‑based detectors or lightweight CNNs (e.g., MobileNet‑V3) could confirm whether the observed inversion holds more broadly.
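The tiling strategy suggested under "Resolution ceiling" can be sketched as a sliding‑window splitter that covers a large scene with overlapping crops, so a border‑straddling PV panel appears whole in at least one tile. The tile and overlap sizes below are illustrative assumptions, not values from the paper:

```python
def tile_image(height, width, tile=1600, overlap=160):
    """Return (y0, x0, y1, x1) windows covering an image.

    Adjacent windows overlap by `overlap` pixels so that small objects
    cut by a tile border are fully contained in a neighboring tile.
    """
    stride = tile - overlap
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    # Ensure the bottom and right edges of the image are covered.
    if ys[-1] + tile < height:
        ys.append(height - tile)
    if xs[-1] + tile < width:
        xs.append(width - tile)
    return [
        (y, x, min(y + tile, height), min(x + tile, width))
        for y in ys for x in xs
    ]
```

Each window can then be run through the detector at full resolution, with the resulting boxes shifted back by the tile offset and de-duplicated in the overlap regions (e.g., via non-maximum suppression).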
Authors
- Kwame Mbobda‑Kuate
- Gabriel Kasmi
Paper Information
- arXiv ID: 2603.02142v1
- Categories: cs.CV, cs.LG
- Published: March 2, 2026