[Paper] Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI
Source: arXiv - 2604.24492v1
Overview
The paper tackles a hidden mismatch in current hardware‑aware Neural Architecture Search (NAS): most pipelines design networks assuming full‑precision (FP32) training, then later quantize them to low‑precision (e.g., FP16) for edge deployment. This “post‑hoc” step can cause a noticeable drop in accuracy, especially on ultra‑constrained devices like the Intel Movidius Myriad X VPU used in space‑borne edge AI. By weaving low‑precision constraints directly into the NAS loop, the authors close the gap between the architecture’s search‑time behavior and its real‑world, on‑device performance.
Key Contributions
- Deployment‑aligned low‑precision NAS: Introduces a simple yet effective modification that forces every candidate architecture to be fine‑tuned and evaluated under FP16 constraints during the search.
- Hardware‑aware evaluation without extra search‑space engineering: Keeps the original evolutionary NAS algorithm untouched; the only change is the precision‑aware training pipeline.
- Real‑world maritime monitoring use‑case: Demonstrates the method on a vessel‑segmentation model destined for the Myriad X VPU, a processor commonly used in satellites and other space‑borne platforms.
- Quantitative gains: Naïve post‑training quantization drops mIoU from 0.85 to 0.78; the deployment‑aligned search delivers 0.826 on‑device, recovering roughly two‑thirds of the loss without increasing parameter count (≈96 k).
- Generalizable recipe: The approach can be applied to any hardware‑aware NAS framework that already supports a target device metric (latency, energy, etc.).
Methodology
- Search Space & Evolutionary Strategy – The authors reuse a standard NAS search space for segmentation (varying encoder depth, kernel sizes, etc.) and an evolutionary algorithm that selects, mutates, and recombines architectures based on a composite fitness score (accuracy + latency on the target VPU).
- Low‑Precision Fine‑Tuning – For each sampled architecture, after a brief warm‑up in FP32, the model is fine‑tuned for a few epochs using FP16 arithmetic (simulated on the host GPU). This forces the network to learn weights that are robust to the reduced mantissa and dynamic range of FP16.
- On‑Device Evaluation Proxy – Deploying every candidate to the VPU would be impractical at search scale, so the authors use a calibrated latency model plus an FP16‑simulated validation pass to estimate on‑device mIoU. The fitness function therefore already reflects low‑precision behavior.
- Selection & Evolution – Architectures that achieve high simulated FP16 accuracy while meeting the latency budget are carried into the next generation, iterating until convergence (a sketch of this loop follows the list).
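To make the loop concrete, below is a minimal Python sketch of the search described above. The paper does not publish code, so the operator vocabulary, per‑op latency table, penalty weight, and population parameters are illustrative assumptions, not the authors' implementation; `eval_fp16_miou` is the FP16 fine‑tuning step sketched further below.

```python
import random

LATENCY_BUDGET_MS = 12.0

# Hypothetical calibrated latency model: per-op costs measured once on
# the target VPU, then summed as a cheap search-time proxy.
LATENCY_LUT_MS = {"conv3x3": 0.9, "conv5x5": 1.6, "skip": 0.1}

def estimate_latency_ms(arch):
    """Estimate on-device latency from the calibrated lookup table."""
    return sum(LATENCY_LUT_MS[op] for op in arch)

def fitness(arch, fp16_miou):
    """Composite score: simulated-FP16 accuracy, with candidates over
    the latency budget heavily penalized."""
    over_budget = max(0.0, estimate_latency_ms(arch) - LATENCY_BUDGET_MS)
    return fp16_miou - 10.0 * over_budget

def mutate(arch):
    """Replace one operator with a random alternative."""
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(list(LATENCY_LUT_MS))
    return child

def evolve(population, eval_fp16_miou, generations=50, n_parents=8):
    """eval_fp16_miou(arch) -> mIoU after brief FP16 fine-tuning."""
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda a: fitness(a, eval_fp16_miou(a)),
                        reverse=True)
        parents = ranked[:n_parents]
        offspring = [mutate(random.choice(parents))
                     for _ in range(len(population) - n_parents)]
        population = parents + offspring
    return max(population, key=lambda a: fitness(a, eval_fp16_miou(a)))
```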
The key insight is that the only extra cost is the FP16 fine‑tuning step, which is negligible compared to the overall NAS budget; a sketch of that step follows.
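Here is what such an FP16 fine‑tune‑and‑validate routine could look like in PyTorch. This is a sketch under assumptions: the paper only states that FP16 is simulated on the host GPU, so the pure‑FP16 casting via `.half()`, the Adam settings, and the `evaluate_miou` helper are illustrative choices (production code would usually add loss scaling):

```python
import torch
import torch.nn.functional as F

def fine_tune_fp16(model, train_loader, val_loader,
                   epochs=3, lr=1e-4, device="cuda"):
    """Briefly fine-tune a (previously FP32 warmed-up) candidate under
    FP16 arithmetic, then return its FP16 validation mIoU."""
    model = model.to(device).half()  # weights + activations in FP16
    # Larger eps keeps Adam's denominator representable in FP16.
    opt = torch.optim.Adam(model.parameters(), lr=lr, eps=1e-4)
    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            logits = model(x.to(device).half())
            loss = F.cross_entropy(logits, y.to(device))
            opt.zero_grad()
            loss.backward()  # loss scaling omitted for brevity
            opt.step()
    return evaluate_miou(model, val_loader, device=device)

@torch.no_grad()
def evaluate_miou(model, loader, num_classes=2, device="cuda"):
    """Mean IoU computed under the same FP16 arithmetic."""
    inter = torch.zeros(num_classes, device=device)
    union = torch.zeros(num_classes, device=device)
    model.eval()
    for x, y in loader:
        pred = model(x.to(device).half()).argmax(dim=1)
        target = y.to(device)
        for c in range(num_classes):
            p, t = pred == c, target == c
            inter[c] += (p & t).sum()
            union[c] += (p | t).sum()
    return (inter / union.clamp(min=1)).mean().item()
```

Wiring the two sketches together, `evolve` would receive something like `lambda arch: fine_tune_fp16(build_model(arch), train_dl, val_dl)`, where `build_model` is a hypothetical constructor that instantiates a candidate architecture.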
Results & Findings
| Metric | Full‑Precision NAS (post‑hoc quant) | Deployment‑aligned Low‑Precision NAS |
|---|---|---|
| Parameters | 95,791 | 95,791 (unchanged) |
| On‑Device Latency (Myriad X) | 12 ms (meets budget) | 12 ms (identical) |
| mIoU (FP32 validation) | 0.85 | 0.85 |
| mIoU (FP16 on‑device) | 0.78 | 0.826 |
- Accuracy Gap Reduction: The low‑precision‑aware search recovers ~66 % of the drop caused by naïve quantization (the arithmetic is spelled out below).
- No Extra Complexity: Parameter count and latency are identical across the two searches, indicating the gain comes purely from better numerical robustness rather than from a larger or slower model.
- Robustness Across Seeds: Repeating the NAS with different random seeds yields consistent gains, so the improvement is systematic rather than an artifact of a lucky run.
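The ~66 % figure follows directly from the three mIoU values in the table; a quick check using only the numbers reported above:

```python
fp32_miou = 0.85   # FP32 validation accuracy
ptq_miou  = 0.78   # after naive post-training quantization
ours_miou = 0.826  # deployment-aligned low-precision NAS, FP16 on-device

gap       = fp32_miou - ptq_miou          # 0.07 mIoU lost to PTQ
recovered = (ours_miou - ptq_miou) / gap  # fraction of that gap closed
print(f"{recovered:.0%} of the quantization drop recovered")  # -> 66%
```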
Practical Implications
- Space‑borne Edge AI: Satellites and high‑altitude platforms often rely on low‑power VPUs; this technique directly translates to more reliable on‑board perception (e.g., maritime traffic monitoring, disaster mapping).
- Edge Device Manufacturers: Chip designers can provide a “low‑precision simulation layer” that NAS tools can hook into, enabling co‑design of models that are born for the hardware rather than retro‑fitted.
- Developer Workflow: Teams can keep using familiar AutoML/NAS tooling (e.g., NNI) and simply swap in an FP16 fine‑tuning hook (see the sketch after this list); there is no need to redesign search spaces or write custom quantization‑aware layers.
- Cost Savings: Skipping the post‑search quantization step, which often demands manual re‑training or accuracy‑loss mitigation, shortens product cycles and lowers the risk of field failures.
- Generalization: While the paper focuses on FP16 and the Myriad X, the same principle applies to INT8, bfloat16, or any custom numeric format supported by the target accelerator.
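To illustrate how small that hook can be, here is one generic way to wrap an existing per‑candidate training callable in PyTorch so it runs under a configurable numeric format. This is not the paper's code; `seg_forward` stands in for whatever forward‑plus‑loss routine a NAS framework already invokes:

```python
import torch
import torch.nn.functional as F

def precision_aware(forward_fn, dtype=torch.float16):
    """Wrap an existing forward-plus-loss callable so it executes under
    the target numeric format; backward stays outside the autocast
    region, per the usual AMP convention."""
    def step(model, batch):
        with torch.autocast(device_type="cuda", dtype=dtype):
            loss = forward_fn(model, batch)
        loss.backward()
        return loss
    return step

# Hypothetical existing per-candidate trainer that a NAS framework calls:
def seg_forward(model, batch):
    x, y = batch
    return F.cross_entropy(model(x), y)

# Retargeting the search is a one-argument change:
fp16_step = precision_aware(seg_forward)                        # FP16
bf16_step = precision_aware(seg_forward, dtype=torch.bfloat16)  # bfloat16
```

Swapping `dtype` is the entire change needed to aim the search at bfloat16 or another format the target accelerator supports.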
Limitations & Future Work
- Precision Scope: The study only explores FP16; ultra‑low‑bit formats (INT8, 4‑bit) may exhibit different training dynamics and could need more sophisticated loss scaling or regularization.
- Search Overhead: Adding FP16 fine‑tuning modestly increases NAS runtime; scaling to larger search spaces (e.g., ImageNet‑scale models) may demand more efficient proxy metrics.
- Hardware Diversity: Results are tied to the Myriad X VPU; cross‑device validation (e.g., Edge TPU, NVIDIA Jetson) would strengthen the claim of universal applicability.
- Theoretical Guarantees: The paper provides empirical evidence but lacks a formal analysis of why certain architectures become more numerically robust under low‑precision training.
Future research could extend the framework to mixed‑precision NAS, incorporate hardware‑specific quantization error models, or combine the approach with differentiable NAS for faster convergence.
Authors
- Parampuneet Kaur Thind
- Vaibhav Katturu
- Giacomo Zema
- Roberto Del Prete
Paper Information
- arXiv ID: 2604.24492v1
- Categories: cs.CV, cs.AI, cs.ET, cs.LG, cs.NE
- Published: April 27, 2026