[Paper] Flexi-NeurA: A Configurable Neuromorphic Accelerator with Adaptive Bit-Precision Exploration for Edge SNNs
Source: arXiv - 2602.18140v1
Overview
Flexi‑NeurA is a parameterizable neuromorphic accelerator designed for spiking neural networks (SNNs) that run on edge devices. By exposing a rich set of design‑time knobs—neuron model choice, network topology, and per‑parameter bit‑precision—it lets engineers tailor hardware to the exact accuracy‑vs‑resource trade‑offs required by their applications, while keeping power and latency ultra‑low.
Key Contributions
- Fully configurable accelerator core: Users can select neuron dynamics (e.g., LIF, Izhikevich), network depth/width, and the bit‑width of each critical variable (weights, decay factors, membrane potentials).
- Time‑multiplexed, event‑driven processing: Reduces the number of physical compute units needed, cutting logic and memory footprints without sacrificing throughput.
- Flex‑plorer DSE tool: A heuristic‑guided design‑space exploration engine that automatically picks the cheapest fixed‑point precisions that meet a user‑specified accuracy budget.
- RTL generation pipeline: The chosen configuration is fed directly into a parameterized RTL generator, producing synthesizable Verilog/VHDL that matches the target FPGA/ASIC constraints.
- Comprehensive benchmark suite: Demonstrated on MNIST, SHD (Spiking Heidelberg Digits speech), and DVS (event‑camera gesture) workloads, showing up to 2× energy savings and millisecond‑scale latency compared with prior fixed‑precision neuromorphic cores.
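The per‑parameter bit‑widths in the first bullet can be made concrete with a fixed‑point LIF membrane update. This is a minimal sketch assuming a simple multiplicative‑decay LIF model; the function names, Q‑format, and bit allocations are illustrative and not taken from the Flexi‑NeurA RTL.

```python
# Minimal fixed-point LIF update, showing how separate bit-widths for
# weights, decay factors, and membrane potentials interact.
# Q-format and names are hypothetical, not from the Flexi-NeurA RTL.

def quantize(x: float, bits: int, frac_bits: int) -> int:
    """Round x to a signed fixed-point integer with `bits` total bits."""
    scaled = round(x * (1 << frac_bits))
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, scaled))

def lif_step(v: int, decay: int, syn_in: int,
             frac_bits: int, threshold: int) -> tuple[int, bool]:
    """One event-driven LIF update: leak, integrate, fire-and-reset."""
    v = (v * decay) >> frac_bits   # leak: multiply by decay factor < 1
    v += syn_in                    # integrate the weighted input spike
    if v >= threshold:             # fire when the threshold is crossed
        return 0, True             # reset membrane, emit a spike
    return v, False

# Example: decay 0.875 in 8 bits, membrane in 16 bits, 6 fractional bits.
FRAC = 6
decay = quantize(0.875, 8, FRAC)
v, spiked = lif_step(v=quantize(0.5, 16, FRAC), decay=decay,
                     syn_in=quantize(0.3, 8, FRAC), frac_bits=FRAC,
                     threshold=quantize(0.7, 16, FRAC))
```

Narrowing `frac_bits` or the total widths here is exactly the knob the accelerator exposes per variable.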
Methodology
- Parameterizable Architecture: The core is built from reusable modules (spike routers, membrane update units, synapse banks). Each module’s datapath width is a compile‑time parameter, allowing fine‑grained precision control.
- Event‑Driven Scheduling: Spikes are processed only when they occur; idle cycles are skipped, which dramatically lowers dynamic power.
- Time‑Multiplexing: A small pool of physical processing elements (PEs) is time‑shared across many neurons, trading a modest increase in latency for a large reduction in silicon area.
- Flex‑plorer DSE:
  - Starts with a high‑precision reference model (e.g., 16‑bit fixed point).
  - Uses a heuristic (gradient‑guided search) to lower the bit‑width of each parameter independently, evaluating the impact on classification accuracy via fast software simulation.
  - Stops when the user‑defined accuracy drop threshold is reached, outputting the minimal precision vector and the corresponding resource estimate.
- RTL Generation & Synthesis: The precision vector drives a parameterized RTL template, which is then synthesized for the target FPGA/ASIC.
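The Flex‑plorer loop above can be sketched as a greedy per‑parameter bit‑lowering search. Here `evaluate` stands in for the paper's fast software simulator (replaced by a toy accuracy model), and all names, the 16‑bit starting point aside, are illustrative rather than the authors' implementation.

```python
# Greedy precision-lowering sketch of a Flex-plorer-style DSE loop.
# `evaluate` stands in for the paper's software simulator; the toy
# accuracy model and parameter names are illustrative only.

def explore(params, evaluate, baseline_acc, max_drop):
    """Lower each parameter's bit-width while accuracy stays in budget."""
    bits = {p: 16 for p in params}        # high-precision reference model
    improved = True
    while improved:
        improved = False
        for p in params:                  # try shaving one bit from each
            if bits[p] <= 2:
                continue
            trial = dict(bits)
            trial[p] -= 1
            if baseline_acc - evaluate(trial) <= max_drop:
                bits = trial              # keep the cheaper configuration
                improved = True
    return bits

# Toy accuracy model: each bit removed below 8 costs 0.2 % accuracy.
def toy_eval(cfg):
    return 98.0 - 0.2 * sum(max(0, 8 - b) for b in cfg.values())

result = explore(["weights", "decay", "membrane"], toy_eval,
                 baseline_acc=98.0, max_drop=0.5)
```

With a 0.5 % accuracy budget the toy model settles on an uneven precision vector, mirroring how the real tool assigns different widths to weights, decay, and membrane state.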
Results & Findings
| Benchmark | Bits (weights/decay/membrane) | Accuracy | Latency | Logic Cells | BRAM | Power |
|---|---|---|---|---|---|---|
| MNIST (3‑layer FC, 256‑128‑10) | 8/6/4 | 97.23 % | 1.1 ms | 1,623 | 7 | 111 mW |
| SHD (spiking speech) | 6/5/4 | 92.1 % (±0.4) | 2.3 ms | 2,014 | 9 | 138 mW |
| DVS Gesture | 7/6/5 | 94.8 % | 3.0 ms | 2,487 | 11 | 152 mW |
- Accuracy stays within 0.5 % of the full‑precision baseline despite aggressive bit‑width reduction.
- Energy per inference drops by up to 45 % compared to a fixed 16‑bit neuromorphic core.
- Resource utilization is low enough to fit two cores on a mid‑range Xilinx Artix‑7, leaving headroom for peripheral I/O and on‑chip memory.
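The per‑inference energy implied by the table follows directly from its power and latency columns (energy ≈ average power × inference latency); a quick check:

```python
# Energy per inference derived from the table's power and latency
# columns: energy = average power (W) x inference latency (s).
rows = {
    "MNIST":       (0.111, 1.1e-3),
    "SHD":         (0.138, 2.3e-3),
    "DVS Gesture": (0.152, 3.0e-3),
}
energy_uj = {k: p * t * 1e6 for k, (p, t) in rows.items()}
# MNIST ~122 uJ, SHD ~317 uJ, DVS Gesture ~456 uJ per inference
```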
Practical Implications
- Edge AI devices (wearables, drones, smart cameras) can now host SNN inference engines that meet strict power envelopes while still delivering high classification performance.
- Rapid hardware prototyping: Engineers can iterate over neuron models and precision settings in software, then generate a matching RTL in minutes—greatly shortening the design cycle for custom ASICs or FPGA‑based products.
- Mixed‑precision SNN training pipelines: The DSE output can be fed back into training loops to fine‑tune weights for the exact hardware precision, enabling end‑to‑end co‑design.
- Scalable deployment: Because the core is time‑multiplexed, a single silicon block can be instantiated multiple times on the same chip to handle larger networks without linear area growth.
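The mixed‑precision co‑design loop above can be approximated with a generic fake‑quantization helper that snaps weights onto the DSE‑selected fixed‑point grid during fine‑tuning. This is a standard quantization‑aware‑training idiom, not the authors' pipeline; the 6‑fractional‑bit Q‑format is an assumption, with only the 8‑bit weight width taken from the MNIST row.

```python
# Fake-quantization helper for hardware-aware fine-tuning: the forward
# pass rounds each weight to the DSE-selected fixed-point grid so that
# training sees the precision the generated RTL will actually use.
# Generic sketch, not the authors' pipeline; the Q-format is assumed.

def fake_quantize(w: float, bits: int, frac_bits: int) -> float:
    """Round-trip a weight through `bits`-wide signed fixed point."""
    step = 1.0 / (1 << frac_bits)
    lo = -(1 << (bits - 1)) * step
    hi = ((1 << (bits - 1)) - 1) * step
    return min(hi, max(lo, round(w / step) * step))

# An 8-bit weight (as in the MNIST row) with an assumed 6 fractional
# bits snaps a float weight to the nearest representable value:
w_hw = fake_quantize(0.7183, bits=8, frac_bits=6)
```

Running this on every weight each forward pass (with a straight‑through gradient) lets training compensate for the rounding the hardware will impose.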
Limitations & Future Work
- The current DSE relies on a heuristic search; it may miss globally optimal precision combinations for highly irregular network topologies.
- Support for convolutional SNNs and more exotic neuron dynamics (e.g., adaptive exponential integrate‑and‑fire) is not yet integrated.
- The authors plan to extend Flex‑plorer with machine‑learning‑based predictors to accelerate exploration and to explore ASIC‑level power gating for further energy reductions.
Authors
- Mohammad Farahani
- Mohammad Rasoul Roshanshah
- Saeed Safari
Paper Information
- arXiv ID: 2602.18140v1
- Categories: cs.AR, cs.NE
- Published: February 20, 2026