[Paper] Evolutionary Mapping of Neural Networks to Spatial Accelerators

Published: February 4, 2026, 11:28 AM EST
Source: arXiv

Overview

The paper presents an evolutionary, hardware‑in‑the‑loop framework that automatically maps neural‑network graphs onto spatial accelerators such as Intel Loihi 2. By treating the mapping problem as a black‑box optimization task, the authors eliminate the need for hand‑crafted, hardware‑specific heuristics, delivering up to 35 % lower latency and 40 % better energy efficiency on real neuromorphic chips.

Key Contributions

  • First evolutionary mapping framework that directly interacts with neuromorphic hardware during optimization (hardware‑in‑the‑loop).
  • Black‑box formulation of the mapping problem, making it agnostic to specific accelerator architectures.
  • Demonstrated significant latency reductions (up to 35 %) on sparse multi‑layer perceptron (MLP) workloads compared to vendor heuristics.
  • Showed energy‑efficiency gains (up to 40 %) without explicitly optimizing for power.
  • Scalable evaluation on multi‑chip Loihi 2 systems, proving the approach works beyond a single die.

Methodology

  1. Problem Framing – The mapping of a neural‑network computation graph onto a 2‑D mesh of compute‑memory cores is expressed as a black‑box function: given a candidate placement, the hardware returns latency, energy, and resource utilization.
  2. Evolutionary Search – An evolutionary algorithm (EA) iteratively evolves a population of placement candidates. Standard EA operators (selection, crossover, mutation) are adapted to respect hardware constraints (e.g., core capacity, communication bandwidth).
  3. Hardware‑in‑the‑Loop – Instead of relying on a simulator, each candidate is executed on the actual Loihi 2 chip (or a multi‑chip cluster) to obtain true performance metrics. This eliminates modeling errors and captures subtle hardware effects such as routing contention.
  4. Fitness Evaluation – The primary objective is total inference latency; secondary objectives (energy, memory usage) are incorporated via a weighted multi‑objective score.
  5. Termination – The EA stops after a fixed budget of hardware evaluations or when improvements plateau, returning the best‑found mapping.
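The five steps above can be sketched in a few dozen lines of Python. Everything hardware-specific is hidden behind the `evaluate` callable, which here stands in for a real Loihi 2 measurement; the mesh size, traffic matrix, objective weights, and the penalty-based handling of the core-capacity constraint are illustrative assumptions, not details from the paper (which adapts the EA operators themselves to respect constraints).

```python
import random

random.seed(0)

# Toy stand-in for the hardware-in-the-loop measurement: on real silicon,
# evaluate() would run the candidate placement on Loihi 2 and return
# measured metrics. All sizes and weights below are illustrative.
MESH_W, MESH_H = 4, 4
CORES = [(x, y) for x in range(MESH_W) for y in range(MESH_H)]
GROUPS = 12    # neuron groups to place onto the mesh
CAPACITY = 1   # max groups per core (hardware constraint, step 2)
TRAFFIC = [[random.randint(0, 5) for _ in range(GROUPS)] for _ in range(GROUPS)]

def evaluate(placement):
    """Black-box fitness (step 4): weighted latency/energy proxies,
    plus a large penalty for exceeding core capacity."""
    latency = sum(TRAFFIC[i][j] * (abs(placement[i][0] - placement[j][0])
                                   + abs(placement[i][1] - placement[j][1]))
                  for i in range(GROUPS) for j in range(GROUPS))
    energy = 0.1 * latency  # energy tracks message volume in this toy model
    overload = sum(max(0, placement.count(c) - CAPACITY) for c in set(placement))
    return latency + 0.5 * energy + 1000 * overload

def mutate(placement):
    """Move one randomly chosen neuron group to a random core."""
    child = list(placement)
    child[random.randrange(GROUPS)] = random.choice(CORES)
    return child

def crossover(a, b):
    """One-point crossover on the placement vector."""
    cut = random.randrange(1, GROUPS)
    return a[:cut] + b[cut:]

def evolve(pop_size=20, generations=60):
    """Steps 2-5: evolve a population of placements, return the best found."""
    pop = [[random.choice(CORES) for _ in range(GROUPS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate)          # selection: keep the fittest half
        parents = pop[:pop_size // 2]
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=evaluate)

best = evolve()
```

In the real framework the evaluation budget is the scarce resource, which is why the paper's observation that the EA converges within a few hundred hardware runs matters.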

Results & Findings

| Benchmark | Baseline (vendor heuristic) | Evolutionary Mapping | Latency Reduction | Energy Improvement |
|---|---|---|---|---|
| Sparse MLP‑A (4 layers) | 12.8 ms | 8.3 ms | 35 % | ~30 % |
| Sparse MLP‑B (6 layers) | 19.5 ms | 13.7 ms | 30 % | ~40 % |
| Multi‑chip scaling (2 × Loihi 2) | 22.1 ms | 15.0 ms | 32 % | ~38 % |

  • Latency gains stem from better placement of heavily communicating neurons onto neighboring cores, reducing hop count and contention.
  • Energy gains emerge as a side‑effect: fewer inter‑core messages and shorter execution times lower dynamic power.
  • The EA converges within a few hundred hardware evaluations, which is practical given the fast inference cycles on Loihi 2.
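The hop-count argument in the first bullet is easy to make concrete. Assuming deterministic XY routing on the 2-D mesh, so that hop count is the Manhattan distance between cores (the message count below is made up for illustration):

```python
def hops(a, b):
    """Hop count between two mesh cores under XY routing (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

msgs = 1000  # messages exchanged by two heavily communicating neuron groups

far = msgs * hops((0, 0), (3, 3))   # groups placed at opposite mesh corners
near = msgs * hops((1, 1), (1, 2))  # same groups placed on adjacent cores
print(far, near)  # 6000 vs. 1000 core-to-core link traversals
```

Every traversal avoided saves both cycles and dynamic switching energy, which is why the energy gains fall out of latency-driven placement for free.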

Practical Implications

  • Developer Productivity – Engineers can feed a high‑level model (e.g., ONNX) into the framework and obtain an optimized hardware mapping without deep knowledge of Loihi’s mesh topology.
  • Portability – Because the approach treats the accelerator as a black box, the same pipeline can target future spatial chips (e.g., other neuromorphic or in‑memory compute fabrics) with minimal changes.
  • Edge Deployment – Lower latency and energy directly translate to longer battery life and higher throughput for edge AI devices that rely on neuromorphic processors.
  • Toolchain Integration – The framework can be wrapped as a plugin for existing ML compilers (TVM, Glow), enabling end‑to‑end automated deployment pipelines.
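One way to picture the portability claim: the only accelerator-specific piece the pipeline needs is a callable that measures a candidate mapping on the target. The sketch below uses random search in place of the EA and a toy cost function in place of real hardware; all names here (`optimize`, `run_on_hw`) are hypothetical, not the paper's actual API.

```python
import random
from typing import Callable, List, Tuple

Core = Tuple[int, int]
Placement = List[Core]

def optimize(num_groups: int, cores: List[Core],
             run_on_hw: Callable[[Placement], float],
             budget: int = 300) -> Placement:
    """Search for a low-cost placement; random search stands in for the EA.
    Targeting a new spatial accelerator only means supplying a new run_on_hw."""
    best, best_cost = None, float("inf")
    for _ in range(budget):
        candidate = [random.choice(cores) for _ in range(num_groups)]
        cost = run_on_hw(candidate)  # hardware-in-the-loop measurement
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best

# A simulator, a Loihi 2 driver, or another chip plugs in the same way:
toy_hw = lambda p: float(sum(x + y for x, y in p))  # toy latency model
mapping = optimize(4, [(0, 0), (0, 1), (1, 0), (1, 1)], toy_hw)
```

A compiler plugin would sit in front of this interface, lowering the model graph to neuron groups and handing the measurement callable to the search.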

Limitations & Future Work

  • Hardware Evaluation Cost – While feasible for Loihi 2, the need to run each candidate on real silicon can become a bottleneck for larger search spaces or slower devices.
  • Scope of Benchmarks – Experiments focus on sparse MLPs; extending to convolutional, recurrent, or transformer models may reveal new challenges.
  • Multi‑Objective Optimization – Energy is only indirectly optimized; a dedicated Pareto‑front approach could give developers finer control over latency‑vs‑energy trade‑offs.
  • Generalization – The evolutionary operators are tuned for Loihi’s 2‑D mesh; future work should explore operators that automatically adapt to arbitrary interconnect topologies.
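The Pareto-front approach suggested in the third bullet reduces to keeping the non-dominated (latency, energy) candidates rather than collapsing them into one weighted score. A minimal sketch, with made-up measurements:

```python
def pareto_front(points):
    """Keep points not dominated by any other: a point is dominated if some
    other point is no worse in both objectives and better in at least one."""
    return [p for p in points
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)]

# (latency_ms, energy) for four candidate mappings -- illustrative values only
candidates = [(12.8, 30.0), (8.3, 21.0), (9.0, 18.0), (15.0, 40.0)]
front = pareto_front(candidates)
print(front)  # [(8.3, 21.0), (9.0, 18.0)] -- the latency/energy trade-off curve
```

A developer could then pick a point on the front that matches a deployment's latency budget or energy envelope, instead of trusting fixed objective weights.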

Bottom line: By marrying evolutionary search with direct hardware feedback, this work paves the way for hands‑off, high‑performance deployment of neural networks on spatial accelerators—an exciting step toward making neuromorphic hardware a mainstream tool for AI developers.

Authors

  • Alessandro Pierro
  • Jonathan Timcheck
  • Jason Yik
  • Marius Lindauer
  • Eyke Hüllermeier
  • Marcel Wever

Paper Information

  • arXiv ID: 2602.04717v1
  • Categories: cs.NE
  • Published: February 4, 2026