[Paper] Evolutionary Mapping of Neural Networks to Spatial Accelerators
Source: arXiv - 2602.04717v1
Overview
The paper presents an evolutionary, hardware‑in‑the‑loop framework that automatically maps neural‑network graphs onto spatial accelerators such as Intel Loihi 2. By treating the mapping problem as a black‑box optimization task, the authors eliminate the need for hand‑crafted, hardware‑specific heuristics, delivering up to 35 % lower latency and 40 % better energy efficiency on real neuromorphic chips.
Key Contributions
- First evolutionary mapping framework that directly interacts with neuromorphic hardware during optimization (hardware‑in‑the‑loop).
- Black‑box formulation of the mapping problem, making it agnostic to specific accelerator architectures.
- Demonstrated significant latency reductions (up to 35 %) on sparse multi‑layer perceptron (MLP) workloads compared to vendor heuristics.
- Showed energy‑efficiency gains (up to 40 %) without explicitly optimizing for power.
- Scalable evaluation on multi‑chip Loihi 2 systems, demonstrating that the approach extends beyond a single die.
Methodology
- Problem Framing – The mapping of a neural‑network computation graph onto a 2‑D mesh of compute‑memory cores is expressed as a black‑box function: given a candidate placement, the hardware returns latency, energy, and resource utilization.
- Evolutionary Search – An evolutionary algorithm (EA) iteratively evolves a population of placement candidates. Standard EA operators (selection, crossover, mutation) are adapted to respect hardware constraints (e.g., core capacity, communication bandwidth).
- Hardware‑in‑the‑Loop – Instead of relying on a simulator, each candidate is executed on the actual Loihi 2 chip (or a multi‑chip cluster) to obtain true performance metrics. This eliminates modeling errors and captures subtle hardware effects such as routing contention.
- Fitness Evaluation – The primary objective is total inference latency; secondary objectives (energy, memory usage) are incorporated via a weighted multi‑objective score.
- Termination – The EA stops after a fixed budget of hardware evaluations or when improvements plateau, returning the best‑found mapping.
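The search loop described above can be sketched in a few dozen lines. This is a minimal, runnable illustration, not the authors' implementation: `run_on_hardware` stands in for the real hardware‑in‑the‑loop measurement on Loihi 2, and the mesh size, core capacity, and weights are assumed values chosen only to make the toy example self‑contained.

```python
import random

random.seed(0)

NUM_CORES = 16      # cores in the (assumed) 2-D mesh
NUM_NEURONS = 64    # graph nodes to place
CAPACITY = 8        # max neurons per core (assumed constraint)

def run_on_hardware(placement):
    """Stand-in for the hardware-in-the-loop step: the real framework
    executes the candidate mapping on Loihi 2 and returns measured
    latency and energy; this toy cost merely penalizes uneven load."""
    load = [placement.count(c) for c in range(NUM_CORES)]
    return max(load), sum(l * l for l in load)   # (latency, energy)

def fitness(placement, w_lat=1.0, w_en=0.01):
    # Weighted multi-objective score, latency as the primary objective.
    lat, en = run_on_hardware(placement)
    return w_lat * lat + w_en * en

def mutate(placement, rate=0.05):
    # Point mutation, then a repair pass that enforces core capacity
    # by moving overflow neurons to the least-loaded core.
    child = [random.randrange(NUM_CORES) if random.random() < rate else c
             for c in placement]
    load = [0] * NUM_CORES
    for i, c in enumerate(child):
        if load[c] >= CAPACITY:
            child[i] = min(range(NUM_CORES), key=lambda k: load[k])
        load[child[i]] += 1
    return child

def evolve(pop_size=20, budget=400):
    # Fixed budget of "hardware" evaluations, as in the paper's setup.
    pop = [mutate([0] * NUM_NEURONS, rate=1.0) for _ in range(pop_size)]
    scored = sorted((fitness(p), p) for p in pop)
    evals = pop_size
    while evals < budget:
        parent = random.choice(scored[: pop_size // 2])[1]  # truncation selection
        child = mutate(parent)
        scored.append((fitness(child), child))
        evals += 1
        scored = sorted(scored)[:pop_size]
    return scored[0]

best_score, best_map = evolve()
print(best_score)
```

Swapping `run_on_hardware` for a call into a real device API is the only change needed to make this loop hardware‑in‑the‑loop, which is exactly why the black‑box formulation ports across accelerators.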
Results & Findings
| Benchmark | Baseline (vendor heuristic) | Evolutionary Mapping | Latency Reduction | Energy Improvement |
|---|---|---|---|---|
| Sparse MLP‑A (4 layers) | 12.8 ms | 8.3 ms | 35 % | ~30 % |
| Sparse MLP‑B (6 layers) | 19.5 ms | 13.7 ms | 30 % | ~40 % |
| Multi‑chip scaling (2 × Loihi 2) | 22.1 ms | 15.0 ms | 32 % | ~38 % |
- Latency gains stem from better placement of heavily communicating neurons onto neighboring cores, reducing hop count and contention.
- Energy gains emerge as a side‑effect: fewer inter‑core messages and shorter execution times lower dynamic power.
- The EA converges within a few hundred hardware evaluations, which is practical given the fast inference cycles on Loihi 2.
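The reduction column in the table above follows directly from the baseline and evolved latencies:

```python
# (baseline_ms, evolved_ms) pairs from the results table.
benchmarks = {
    "Sparse MLP-A": (12.8, 8.3),
    "Sparse MLP-B": (19.5, 13.7),
    "Multi-chip (2 x Loihi 2)": (22.1, 15.0),
}
for name, (baseline_ms, evolved_ms) in benchmarks.items():
    reduction = 100 * (baseline_ms - evolved_ms) / baseline_ms
    print(f"{name}: {reduction:.0f}% lower latency")
# -> 35%, 30%, and 32%, matching the table.
```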
Practical Implications
- Developer Productivity – Engineers can feed a high‑level model (e.g., ONNX) into the framework and obtain an optimized hardware mapping without deep knowledge of Loihi’s mesh topology.
- Portability – Because the approach treats the accelerator as a black box, the same pipeline can target future spatial chips (e.g., other neuromorphic or in‑memory compute fabrics) with minimal changes.
- Edge Deployment – Lower latency and energy directly translate to longer battery life and higher throughput for edge AI devices that rely on neuromorphic processors.
- Toolchain Integration – The framework can be wrapped as a plugin for existing ML compilers (TVM, Glow), enabling end‑to‑end automated deployment pipelines.
Limitations & Future Work
- Hardware Evaluation Cost – While feasible for Loihi 2, the need to run each candidate on real silicon can become a bottleneck for larger search spaces or slower devices.
- Scope of Benchmarks – Experiments focus on sparse MLPs; extending to convolutional, recurrent, or transformer models may reveal new challenges.
- Multi‑Objective Optimization – Energy is only indirectly optimized; a dedicated Pareto‑front approach could give developers finer control over latency‑vs‑energy trade‑offs.
- Generalization – The evolutionary operators are tuned for Loihi’s 2‑D mesh; future work should explore operators that adapt automatically to arbitrary interconnect topologies.
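The Pareto‑front idea mentioned above would keep every non‑dominated (latency, energy) candidate instead of collapsing both objectives into one weighted score. A minimal dominance filter, with hypothetical measurement values for illustration, might look like:

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly
    better in at least one (both objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only candidates not dominated by any other candidate."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (latency_ms, energy_mJ) results for candidate mappings.
candidates = [(8.3, 5.0), (9.1, 4.2), (10.0, 4.1), (9.0, 6.0), (12.8, 7.5)]
print(pareto_front(candidates))
# -> [(8.3, 5.0), (9.1, 4.2), (10.0, 4.1)]
```

Returning the whole front, rather than a single weighted optimum, would let developers pick their own latency‑vs‑energy trade‑off after the search finishes.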
Bottom line: By marrying evolutionary search with direct hardware feedback, this work paves the way for hands‑off, high‑performance deployment of neural networks on spatial accelerators—an exciting step toward making neuromorphic hardware a mainstream tool for AI developers.
Authors
- Alessandro Pierro
- Jonathan Timcheck
- Jason Yik
- Marius Lindauer
- Eyke Hüllermeier
- Marcel Wever
Paper Information
- arXiv ID: 2602.04717v1
- Categories: cs.NE
- Published: February 4, 2026