How I Built Two Generations of Neuromorphic Processor From Scratch
Source: Dev.to
Overview
Your brain runs on about 20 W. It processes visual scenes, generates speech, and maintains balance — all simultaneously and in real time. The best GPU clusters in the world burn megawatts to approximate what 86 billion neurons do effortlessly.
Neuromorphic processors try to close that gap. Instead of shuttling numbers through ALUs, they mimic biology:
- neurons fire discrete spikes,
- synapses carry weighted connections,
- computation only happens when something actually changes.
Intel’s Loihi chip demonstrated that this can work at scale, but it is proprietary and requires access through Intel’s cloud service.
I built my own – two generations, from scratch, solo, as a university student.
N1 – First‑generation processor
Target: feature‑parity with Intel Loihi 1.
- 128‑core processor
- Each core contains 1 024 CUBA (current‑based) leaky‑integrate‑and‑fire neurons
- 131 072 synapses per core, stored in compressed‑sparse‑row (CSR) format
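As a sketch of why CSR suits a sparse synapse table (the names below are illustrative, not the RTL's): a spike from presynaptic neuron `i` touches only its stored fan-out, never a full dense row.

```python
import numpy as np

# Dense connectivity: rows = presynaptic neurons, cols = postsynaptic targets
dense = np.array([
    [0, 5, 0, 0],
    [0, 0, 0, 7],
    [3, 0, 0, 0],
])

# CSR stores only the nonzeros: row_ptr[i]..row_ptr[i+1] indexes the
# targets/weights belonging to presynaptic neuron i
row_ptr, col_idx, weights = [0], [], []
for row in dense:
    for j, w in enumerate(row):
        if w != 0:
            col_idx.append(j)
            weights.append(int(w))
    row_ptr.append(len(col_idx))

def fanout(pre):
    """Targets and weights reached by a spike from neuron `pre`."""
    s, e = row_ptr[pre], row_ptr[pre + 1]
    return list(zip(col_idx[s:e], weights[s:e]))
```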
Headline features
| Feature | Description |
|---|---|
| Programmable microcode learning engine | 16 registers, 14 op‑codes. Each core runs a small program every timestep (STDP, three‑factor reward learning, homeostatic normalization, or any custom rule) – no RTL changes needed. |
| Dendritic compartment trees | 4 compartments per neuron with configurable join operations (ADD, ABS_MAX, OR, PASS). Dendrites perform local nonlinear processing before signals reach the soma. |
| 8‑bit graded spikes | Neurons carry intensity information, not just fire/no‑fire. This exceeds Loihi 1 (graded spikes only appear in Loihi 2). |
| 24‑bit state precision | One bit more than Loihi 1’s 23‑bit, with RAZ (round‑away‑from‑zero) arithmetic that prevents neurons from getting stuck at non‑resting potentials. |
| Triple RV32IMF RISC‑V cluster | Three embedded processors with IEEE 754 FPU, hardware breakpoints, and a shared mailbox for supervisory control. |
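The kind of rule the learning engine runs can be pictured with pair-based STDP in plain Python. This is the textbook rule in integer arithmetic, not N1's actual microcode ISA; the constants and shift amount are illustrative.

```python
def stdp_update(w, pre_trace, post_trace, pre_spiked, post_spiked,
                a_plus=2, a_minus=3, w_min=0, w_max=255):
    """Pair-based STDP on fixed-point state.

    Potentiate on postsynaptic spikes, scaled by the presynaptic trace
    (pre-before-post); depress on presynaptic spikes, scaled by the
    postsynaptic trace (post-before-pre). Right-shift acts as the
    learning-rate divisor, as an integer datapath would do it.
    """
    if post_spiked:
        w += (a_plus * pre_trace) >> 4   # pre fired recently => strengthen
    if pre_spiked:
        w -= (a_minus * post_trace) >> 4  # post fired recently => weaken
    return max(w_min, min(w_max, w))
```

For example, a post spike arriving while the pre trace is high raises the weight; a pre spike arriving while the post trace is high lowers it.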
Validation
- 25 RTL testbenches covering 98 test scenarios – zero failures.
- SDK at this stage: 168 tests across 14 Python modules.
N2 – Second‑generation processor
Goal: replicate the architectural leap from Loihi 1 → Loihi 2 – making the neuron programmable.
In N1 every neuron runs the same hard-coded CUBA LIF computation, which limits functionality: no bursting, adaptation, oscillation, or graded-error coding without RTL changes.
What changed
- Fixed datapath → fetch‑execute microcode engine.
- Each neuron runs its own program from instruction SRAM.
- Per‑neuron program‑offset register enables different neurons in the same core to execute different programs.
- Register file (R0‑R15) is loaded from neuron‑parameter SRAM each timestep.
- Instruction set includes arithmetic, shifts, min/max, conditional skips, and two spike‑emission modes: HALT (threshold‑based) and EMIT (forced payload).
This shift mirrors graphics: from fixed‑function pixel pipelines to programmable shaders. Once the neuron is programmable, the hardware becomes a platform rather than a fixed implementation.
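A toy fetch-execute interpreter conveys the shape of the change. The opcode names, register layout, and three-instruction program below are invented for illustration and do not match the real N2 ISA.

```python
def run_neuron(program, regs, inp):
    """Execute one timestep of a tiny made-up neuron instruction set."""
    regs = list(regs)
    regs[1] = inp                      # R1 <- synaptic input this timestep
    pc, spike = 0, False
    while pc < len(program):
        op, a, b, dst = program[pc]
        if op == "ADD":
            regs[dst] = regs[a] + regs[b]
        elif op == "SHR":
            regs[dst] = regs[a] >> b
        elif op == "HALT_GE":          # spike and stop if R[a] >= R[b]
            if regs[a] >= regs[b]:
                spike, regs[a] = True, 0
                break
        pc += 1
    return regs, spike

# A LIF-like program: leak the membrane (R0), add input, fire on threshold (R2).
# A different neuron in the same core could point at a different program.
prog = [
    ("SHR", 0, 1, 3),        # R3 = v >> 1  (leak by half)
    ("ADD", 3, 1, 0),        # v  = leaked v + input
    ("HALT_GE", 0, 2, 0),    # spike if v >= threshold
]
regs, spike = run_neuron(prog, regs=[90, 0, 100, 0], inp=60)
```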
Built‑in neuron models (microcode programs)
| Model | Description |
|---|---|
| CUBA LIF | Bit‑identical to N1’s fixed path – reproduces the exact same spike trains. |
| Izhikevich | Two‑variable quadratic model with four presets (regular spiking, intrinsic bursting, chattering, fast spiking). Uses MUL_SHIFT for the v²/2ˢ quadratic term. |
| Adaptive LIF | Adds a slow adaptation current that accumulates on spikes and decays exponentially → spike‑frequency adaptation. |
| Sigma‑Delta | Maintains a running prediction of input; emits the prediction error as a spike payload via EMIT. Achieves temporal sparsity for slowly‑varying signals. |
| Resonate‑and‑Fire | Damped oscillator that fires only when driven at its resonant frequency – no spectral computation needed. |
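For the Izhikevich entry, here is a floating-point Euler sketch of the standard two-variable model with regular-spiking parameters; the hardware would run this in fixed point, computing the quadratic term with MUL_SHIFT.

```python
def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """One Euler step of the Izhikevich model (regular-spiking preset).

    v: membrane potential, u: recovery variable, I: input current.
    The 0.04*v*v term is what a fixed-point datapath computes as a
    multiply followed by a right shift.
    """
    v += dt * (0.04 * v * v + 5 * v + 140 - u + I)
    u += dt * a * (b * v - u)
    if v >= 30.0:                      # spike: reset v, bump recovery u
        return c, u + d, True
    return v, u, False

# Constant drive produces regular spiking
v, u, n_spikes = -65.0, -13.0, 0
for _ in range(200):
    v, u, fired = izhikevich_step(v, u, I=10.0)
    n_spikes += fired
```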
Additional architectural enhancements
- 4 graded‑spike payload formats (0/8/16/24 bit) – up from 8‑bit only in N1.
- Variable‑precision weight packing (1/2/4/8/16 bit) – up to 16× memory compression at 1 bit. Loihi 2 tops out at 8 bits; N2's 9‑16 bit range helps networks that need higher precision.
- 5 spike traces (x1, x2, y1, y2, y3) – up from 2 in N1. Enables triplet STDP (Pfister & Gerstner 2006) and complex eligibility traces.
- Convolutional synapse encoding – stores weight kernels once per group; 2‑3× memory reduction for CNN topologies.
- Per‑synapse‑group plasticity enable – 30‑70 % learning‑phase speed‑up in mixed fixed/plastic networks.
- Persistent reward traces with exponential decay – enables temporal credit assignment for reinforcement learning.
- Homeostatic threshold plasticity – epoch‑based proportional error rule, prevents firing‑rate drift in recurrent networks.
- Full observability – 3 performance counters, 25‑variable state probes per neuron, 64‑deep trace FIFO, and energy metering.
- Hardware‑accurate simulation defaults – 24‑bit fixed‑point arithmetic, strict SRAM pool‑depth limits matching RTL.
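Bit-packing weights below a byte boundary is straightforward to sketch. This illustrative packer shows where the 16× figure at 1 bit comes from (relative to 16-bit storage); N2's actual memory layout may differ.

```python
def pack_weights(weights, bits):
    """Pack small unsigned weights into bytes at `bits` bits each."""
    assert all(0 <= w < (1 << bits) for w in weights)
    buf, acc, nacc = bytearray(), 0, 0
    for w in weights:
        acc |= w << nacc              # append the weight's bits
        nacc += bits
        while nacc >= 8:              # flush full bytes
            buf.append(acc & 0xFF)
            acc >>= 8
            nacc -= 8
    if nacc:                          # flush the partial final byte
        buf.append(acc & 0xFF)
    return bytes(buf)

def unpack_weights(buf, bits, count):
    """Recover `count` weights packed at `bits` bits each."""
    out, acc, nacc, i = [], 0, 0, 0
    for _ in range(count):
        while nacc < bits:            # refill the bit accumulator
            acc |= buf[i] << nacc
            i += 1
            nacc += 8
        out.append(acc & ((1 << bits) - 1))
        acc >>= bits
        nacc -= bits
    return out

# Eight binary weights fit in a single byte: 16x smaller than 16-bit words
ones = [1, 0, 1, 1, 0, 1, 0, 0]
packed = pack_weights(ones, bits=1)
```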
Physical validation (AWS F2 instance, Xilinx VU47P)
| Metric | Value |
|---|---|
| Clock | 62.5 MHz neuromorphic clock / 250 MHz PCIe |
| Cores | 16‑core instance (full 128‑core design validated in simulation) |
| Integration tests | 28/28 passing |
| RTL‑level tests | 9 tests generating 163 000+ spikes with zero mismatches |
| Dual‑clock CDC | Gray‑code async FIFOs |
| Throughput | ~8 690 timesteps/second |
| BRAM utilization | 56 % aggregate for 16 cores (BRAM is the binding constraint) |
| Scalability | Full 128‑core design would need a larger device or multi‑FPGA partitioning. |
End‑to‑end demonstration
Task: Train a recurrent SNN on the Spiking Heidelberg Digits (SHD) dataset (10 420 spoken‑digit recordings encoded as 700‑channel cochlea spike trains).
- Architecture: 700 input → 768 recurrent hidden → 20 output.
- Training: Surrogate gradients (fast sigmoid) with AdamW.
- Quantisation: Weights quantised to 16 bits for hardware deployment.
Result:
- Accuracy before quantisation: 85.9 %
- Accuracy after quantisation: 85.4 % (a 0.5‑point drop)
This surpasses published baselines: Cramer et al. (83.2 %) and Zenke & Vogels (83.4 %).
SDK growth
| Version | Tests | Python modules |
|---|---|---|
| N1 | 168 | 14 |
| N2 | 3,091 (≈18×) | 88 |
TL;DR
- N1 – 128‑core, 1 024 CUBA LIF neurons/core, 8‑bit graded spikes, 24‑bit state, programmable microcode learning engine.
- N2 – adds a per‑neuron microcode engine, five built‑in neuron models, richer spike‑payload/weight formats, more trace memory, and a host of plasticity/observability features.
- Both generations have been thoroughly validated (RTL testbenches, integration tests, hardware runs) and demonstrated on a real SNN task with state‑of‑the‑art accuracy.
N2 statistics (vs N1)
| Category | N1 | N2 |
|---|---|---|
| Test cases | 168 | 3,091 |
| Python modules | 14 | 88 |
| Neuron models | 1 | 5 |
| Synapse formats | 3 | 4 |
| Weight precisions | 1 | 5 |
| Features | — | 155 (152 FULL, 3 HW_ONLY) |
| Lines of Python | ~8 K | ~52 K |
Back‑ends
Three back‑ends (CPU cycle‑accurate, GPU via PyTorch, FPGA) share the same deploy / step / get_result API.
The GPU simulator delivers a 100–1000× speed‑up over the CPU version.
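The shared-interface idea can be sketched as a structural protocol. Only the three method names (deploy, step, get_result) come from the post; the argument shapes and the toy CPU backend below are my own assumptions.

```python
from typing import Protocol

class Backend(Protocol):
    """Structural interface the three back-ends have in common."""
    def deploy(self, network: dict) -> None: ...
    def step(self, n: int = 1) -> None: ...
    def get_result(self) -> dict: ...

class ToyCpuBackend:
    """Stand-in backend that just counts timesteps."""
    def __init__(self):
        self.t, self.network = 0, None
    def deploy(self, network):
        self.network = network
    def step(self, n=1):
        self.t += n
    def get_result(self):
        return {"timesteps": self.t}

def run(backend: Backend, network: dict, timesteps: int) -> dict:
    # The same driver code works unchanged against CPU, GPU, or FPGA
    backend.deploy(network)
    backend.step(timesteps)
    return backend.get_result()

result = run(ToyCpuBackend(), {"populations": []}, timesteps=1000)
```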
No‑install option – Catalyst Cloud
```shell
pip install catalyst-cloud
```

```python
from catalyst_cloud import CatalystClient

client = CatalystClient(api_key="your_key")

network = {
    "populations": [
        {"name": "input", "n": 100, "params": {"threshold": 1000}},
        {"name": "output", "n": 10, "params": {"threshold": 600}}
    ],
    "connections": [
        {"from": "input", "to": "output", "weight": 500, "probability": 0.3}
    ]
}

job = client.submit(network, timesteps=1000)
result = job.wait()
print(result.spike_counts)
```
- Free tier for research – no credit‑card required.
- Cloud API:
- Python cloud client: `pip install catalyst-cloud`
Links & Resources
- GitHub repository:
- Full SDK source (from $25 /mo – full N1 + N2 source):
- Support / donations:
- Contact:
License
Licensed under BSL 1.1 – source‑available, free for research; commercial use requires a paid licence.
Project snapshot
- 238 development phases
- 2 processors
- 3,091 tests
- Built by a single developer at the University of Aberdeen
If you work on SNNs, neuromorphic computing, or alternative computing projects, feel free to get in touch!