[Paper] Flexi-NeurA: A Configurable Neuromorphic Accelerator with Adaptive Bit-Precision Exploration for Edge SNNs
Source: arXiv - 2602.18140v1
Overview
Flexi‑NeurA is a parameterizable neuromorphic accelerator designed for spiking neural networks (SNNs) that run on edge devices. By exposing a rich set of design‑time knobs—neuron model choice, network topology, and per‑parameter bit‑precision—it lets engineers tailor hardware to the exact accuracy‑vs‑resource trade‑offs required by their applications, while keeping power and latency ultra‑low.
Key Contributions
- Fully configurable accelerator core: Users can select neuron dynamics (e.g., LIF, Izhikevich), network depth/width, and the bit‑width of each critical variable (weights, decay factors, membrane potentials).
- Time‑multiplexed, event‑driven processing: Reduces the number of physical compute units needed, cutting logic and memory footprints without sacrificing throughput.
- Flex‑plorer DSE tool: A heuristic‑guided design‑space exploration engine that automatically picks the cheapest fixed‑point precisions that meet a user‑specified accuracy budget.
- RTL generation pipeline: The chosen configuration is fed directly into a parameterized RTL generator, producing synthesizable Verilog/VHDL that matches the target FPGA/ASIC constraints.
- Comprehensive benchmark suite: Demonstrated on MNIST, SHD (Spiking Heidelberg Digits speech), and DVS (event‑camera gesture) workloads, showing up to 2× energy savings and millisecond‑scale latency compared with prior fixed‑precision neuromorphic cores.
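The per‑parameter bit‑widths in the first bullet can be made concrete with a fixed‑point LIF membrane update. This is a minimal sketch assuming a simple multiplicative‑decay LIF model; the function names, Q‑format, and bit allocations are illustrative and not taken from the Flexi‑NeurA RTL.

```python
# Minimal fixed-point LIF update, showing how separate bit-widths for
# weights, decay factors, and membrane potentials interact.
# Q-format and names are hypothetical, not from the Flexi-NeurA RTL.

def quantize(x: float, bits: int, frac_bits: int) -> int:
    """Round x to a signed fixed-point integer with `bits` total bits."""
    scaled = round(x * (1 << frac_bits))
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, scaled))

def lif_step(v: int, decay: int, syn_in: int,
             frac_bits: int, threshold: int) -> tuple[int, bool]:
    """One event-driven LIF update: leak, integrate, fire-and-reset."""
    v = (v * decay) >> frac_bits   # leak: multiply by decay factor < 1
    v += syn_in                    # integrate the weighted input spike
    if v >= threshold:             # fire when the threshold is crossed
        return 0, True             # reset membrane, emit a spike
    return v, False

# Example: decay 0.875 in 8 bits, membrane in 16 bits, 6 fractional bits.
FRAC = 6
decay = quantize(0.875, 8, FRAC)
v, spiked = lif_step(v=quantize(0.5, 16, FRAC), decay=decay,
                     syn_in=quantize(0.3, 8, FRAC), frac_bits=FRAC,
                     threshold=quantize(0.7, 16, FRAC))
```

Narrowing `frac_bits` or the total widths here is exactly the knob the accelerator exposes per variable.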
Methodology
- Parameterizable Architecture: The core is built from reusable modules (spike routers, membrane update units, synapse banks). Each module’s datapath width is a compile‑time parameter, allowing fine‑grained precision control.
- Event‑Driven Scheduling: Spikes are processed only when they occur; idle cycles are skipped, which dramatically lowers dynamic power.
- Time‑Multiplexing: A small pool of physical processing elements (PEs) is time‑shared across many neurons, trading a modest increase in latency for a large reduction in silicon area.
- Flex‑plorer DSE:
  - Starts with a high‑precision reference model (e.g., 16‑bit fixed point).
  - Uses a heuristic (gradient‑guided search) to lower the bit‑width of each parameter independently, evaluating the impact on classification accuracy via fast software simulation.
  - Stops when the user‑defined accuracy drop threshold is reached, outputting the minimal precision vector and the corresponding resource estimate.
- RTL Generation & Synthesis: The precision vector drives a parameterized RTL template, which is then synthesized for the target FPGA/ASIC.
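The Flex‑plorer loop above can be sketched as a greedy per‑parameter bit‑lowering search. Here `evaluate` stands in for the paper's fast software simulator (replaced by a toy accuracy model), and all names, the 16‑bit starting point aside, are illustrative rather than the authors' implementation.

```python
# Greedy precision-lowering sketch of a Flex-plorer-style DSE loop.
# `evaluate` stands in for the paper's software simulator; the toy
# accuracy model and parameter names are illustrative only.

def explore(params, evaluate, baseline_acc, max_drop):
    """Lower each parameter's bit-width while accuracy stays in budget."""
    bits = {p: 16 for p in params}        # high-precision reference model
    improved = True
    while improved:
        improved = False
        for p in params:                  # try shaving one bit from each
            if bits[p] <= 2:
                continue
            trial = dict(bits)
            trial[p] -= 1
            if baseline_acc - evaluate(trial) <= max_drop:
                bits = trial              # keep the cheaper configuration
                improved = True
    return bits

# Toy accuracy model: each bit removed below 8 costs 0.2 % accuracy.
def toy_eval(cfg):
    return 98.0 - 0.2 * sum(max(0, 8 - b) for b in cfg.values())

result = explore(["weights", "decay", "membrane"], toy_eval,
                 baseline_acc=98.0, max_drop=0.5)
```

With a 0.5 % accuracy budget the toy model settles on an uneven precision vector, mirroring how the real tool assigns different widths to weights, decay, and membrane state.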
Results & Findings
| Benchmark | Bits (weights/decay/membrane) | Accuracy | Latency | Logic Cells | BRAM | Power |
|---|---|---|---|---|---|---|
| MNIST (3‑layer FC, 256‑128‑10) | 8/6/4 | 97.23 % | 1.1 ms | 1,623 | 7 | 111 mW |
| SHD (spiking speech) | 6/5/4 | 92.1 % (±0.4) | 2.3 ms | 2,014 | 9 | 138 mW |
| DVS Gesture | 7/6/5 | 94.8 % | 3.0 ms | 2,487 | 11 | 152 mW |
- Accuracy stays within 0.5 % of the full‑precision baseline despite aggressive bit‑width reduction.
- Energy per inference drops by up to 45 % compared to a fixed 16‑bit neuromorphic core.
- Resource utilization is low enough to fit two cores on a mid‑range Xilinx Artix‑7, leaving headroom for peripheral I/O and on‑chip memory.
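The per‑inference energy implied by the table follows directly from its power and latency columns (energy ≈ average power × inference latency); a quick check:

```python
# Energy per inference derived from the table's power and latency
# columns: energy = average power (W) x inference latency (s).
rows = {
    "MNIST":       (0.111, 1.1e-3),
    "SHD":         (0.138, 2.3e-3),
    "DVS Gesture": (0.152, 3.0e-3),
}
energy_uj = {k: p * t * 1e6 for k, (p, t) in rows.items()}
# MNIST ~122 uJ, SHD ~317 uJ, DVS Gesture ~456 uJ per inference
```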
Practical Implications
- Edge AI devices (wearables, drones, smart cameras) can now host SNN inference engines that meet strict power envelopes while still delivering high classification performance.
- Rapid hardware prototyping: Engineers can iterate over neuron models and precision settings in software, then generate a matching RTL in minutes—greatly shortening the design cycle for custom ASICs or FPGA‑based products.
- Mixed‑precision SNN training pipelines: The DSE output can be fed back into training loops to fine‑tune weights for the exact hardware precision, enabling end‑to‑end co‑design.
- Scalable deployment: Because the core is time‑multiplexed, a single silicon block can be instantiated multiple times on the same chip to handle larger networks without linear area growth.
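The mixed‑precision co‑design loop above can be approximated with a generic fake‑quantization helper that snaps weights onto the DSE‑selected fixed‑point grid during fine‑tuning. This is a standard quantization‑aware‑training idiom, not the authors' pipeline; the 6‑fractional‑bit Q‑format is an assumption, with only the 8‑bit weight width taken from the MNIST row.

```python
# Fake-quantization helper for hardware-aware fine-tuning: the forward
# pass rounds each weight to the DSE-selected fixed-point grid so that
# training sees the precision the generated RTL will actually use.
# Generic sketch, not the authors' pipeline; the Q-format is assumed.

def fake_quantize(w: float, bits: int, frac_bits: int) -> float:
    """Round-trip a weight through `bits`-wide signed fixed point."""
    step = 1.0 / (1 << frac_bits)
    lo = -(1 << (bits - 1)) * step
    hi = ((1 << (bits - 1)) - 1) * step
    return min(hi, max(lo, round(w / step) * step))

# An 8-bit weight (as in the MNIST row) with an assumed 6 fractional
# bits snaps a float weight to the nearest representable value:
w_hw = fake_quantize(0.7183, bits=8, frac_bits=6)
```

Running this on every weight each forward pass (with a straight‑through gradient) lets training compensate for the rounding the hardware will impose.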
Limitations & Future Work
- The current DSE relies on a heuristic search; it may miss globally optimal precision combinations for highly irregular network topologies.
- Support for convolutional SNNs and more exotic neuron dynamics (e.g., adaptive exponential integrate‑and‑fire) is not yet integrated.
- The authors plan to extend Flex‑plorer with machine‑learning‑based predictors to accelerate exploration and to explore ASIC‑level power gating for further energy reductions.
Authors
- Mohammad Farahani
- Mohammad Rasoul Roshanshah
- Saeed Safari
Paper Information
- arXiv ID: 2602.18140v1
- Categories: cs.AR, cs.NE
- Published: February 20, 2026