[Paper] Three factor delay learning rules for spiking neural networks
Source: arXiv - 2601.00668v1
Overview
The paper introduces a new way to train spiking neural networks (SNNs) by learning both synaptic weights and the timing delays of spikes. Using online three‑factor learning rules, the authors achieve substantial accuracy gains on temporal tasks while dramatically shrinking model size and inference latency—making SNNs far more attractive for low‑power, real‑time neuromorphic hardware.
Key Contributions
- Delay‑augmented LIF neurons – Extends the classic leaky‑integrate‑and‑fire (LIF) model with learnable synaptic and axonal delays for feed‑forward and recurrent architectures.
- Three‑factor online learning rule – Combines a locally computed eligibility trace (via a smooth Gaussian surrogate for the spike derivative) with a top‑down error signal to update both weights and delays in real time.
- Empirical gains – Demonstrates up to 20 % higher accuracy than weight‑only baselines, and up to a further 14 % improvement when weights and delays are learned jointly, at comparable parameter budgets.
- Competitive performance on SHD – Matches offline back‑propagation results on the Spiking Heidelberg Digits (SHD) benchmark while cutting model size by 6.6× and inference latency by 67 % (only a 2.4 % accuracy drop vs. state‑of‑the‑art).
- Hardware‑friendly design – Shows that on‑device, online learning of delays can reduce memory footprints and power consumption, a key requirement for neuromorphic processors.
Methodology
- Neuron model – Starts from the standard LIF neuron and adds two delay parameters:
  - Synaptic delay – Time between presynaptic spike emission and arrival at the postsynaptic membrane.
  - Axonal delay – Extra latency after the membrane crosses threshold, before the spike is emitted.
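The two delay parameters can be sketched in a minimal discrete-time simulation. This is an illustrative toy model, not the paper's exact formulation: class and parameter names (`DelayedLIF`, `syn_delays`, `axon_delay`) are assumptions, delays are held fixed as integers here, and the learnable/continuous treatment from the paper is omitted.

```python
class DelayedLIF:
    """Toy discrete-time LIF neuron with per-synapse synaptic delays and a
    shared axonal delay. Illustrative only; the paper's model may differ."""

    def __init__(self, weights, syn_delays, axon_delay, tau=0.9, threshold=1.0):
        self.w = list(weights)          # synaptic weights, one per input
        self.d_syn = list(syn_delays)   # integer synaptic delays (time steps)
        self.d_axon = axon_delay        # latency before an emitted spike leaves
        self.tau = tau                  # membrane leak factor per step
        self.threshold = threshold
        self.v = 0.0                    # membrane potential
        # ring buffer of delayed input currents, indexed by arrival step
        self.buffer = [0.0] * (max(self.d_syn) + 1)
        self.out_queue = []             # time steps at which spikes leave the axon
        self.t = 0

    def step(self, input_spikes):
        # schedule each presynaptic spike to arrive after its synaptic delay
        for i, s in enumerate(input_spikes):
            if s:
                self.buffer[(self.t + self.d_syn[i]) % len(self.buffer)] += self.w[i]
        # leaky integration of the current arriving at this step
        slot = self.t % len(self.buffer)
        self.v = self.tau * self.v + self.buffer[slot]
        self.buffer[slot] = 0.0
        if self.v >= self.threshold:
            self.v = 0.0                # reset on threshold crossing
            self.out_queue.append(self.t + self.d_axon)
        # the spike is only visible downstream after the axonal delay
        out = self.t in self.out_queue
        if out:
            self.out_queue.remove(self.t)
        self.t += 1
        return out
```

With a synaptic delay of 1 step and an axonal delay of 1 step, an input spike at t = 0 produces an output spike at t = 2: one step for arrival, one step for emission.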
- Eligibility trace – Each synapse maintains an eligibility trace that captures how past spikes influence the current membrane potential. The trace is computed using a Gaussian surrogate gradient that smooths the otherwise non‑differentiable spike function.
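The Gaussian surrogate and trace update can be sketched as follows. This is a generic low-pass-filtered eligibility trace, assuming a hand-tuned surrogate width `sigma` and decay factor; the paper's exact trace dynamics may differ.

```python
import math

def gaussian_surrogate(v, threshold=1.0, sigma=0.3):
    """Smooth stand-in for the derivative of the non-differentiable spike
    function: a Gaussian centered on the firing threshold. The width sigma
    is a hand-tuned hyperparameter (an assumed value here)."""
    z = (v - threshold) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def update_trace(trace, pre_spike, v, decay=0.9):
    """Leaky eligibility trace: decays past contributions and adds the
    presynaptic spike weighted by the surrogate sensitivity of the
    membrane at its current potential v."""
    return decay * trace + pre_spike * gaussian_surrogate(v)
```

The surrogate is largest when the membrane sits near threshold, so spikes arriving at those moments accumulate the most eligibility, which is exactly when a small weight or delay change is most likely to flip the output.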
- Three‑factor update – Parameter updates follow the classic three‑factor rule:
  - Factor 1 – Presynaptic activity (spike).
  - Factor 2 – Eligibility trace (local, time‑dependent sensitivity).
  - Factor 3 – Global error signal (e.g., difference between desired and actual output).
  The product of these three terms yields a weight or delay increment, allowing the network to adapt both synaptic strength and timing on the fly.
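The product of the three factors reduces to a one-line update. The function below is a generic sketch of that product; in the paper the weight and delay updates use their own eligibility terms, which this simplification glosses over.

```python
def three_factor_update(pre_spike, eligibility, error, lr=0.01):
    """Generic three-factor increment:
    Δparam = lr * (presynaptic activity) * (eligibility trace) * (global error).
    Applied here identically to a weight or a delay for illustration."""
    return lr * pre_spike * eligibility * error
```

Note the gating behavior: if the presynaptic neuron was silent (`pre_spike == 0`) or no error arrives (`error == 0`), the parameter is untouched, which is what makes the rule local and online.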
- Training regime – Experiments are run on event‑based datasets (including SHD) using online stochastic gradient descent; no offline back‑propagation through time is required, which keeps memory usage low.
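The memory advantage over back-propagation through time can be made concrete with a minimal online loop: only the current eligibility traces are stored, so memory stays proportional to the parameter count rather than growing with sequence length. This is an assumed skeleton (the `forward` callback, decay constant, and error signal are illustrative), not the paper's training code.

```python
def train_online(forward, params, spikes, targets, lr=0.01, decay=0.9):
    """Online three-factor training sketch over a spike stream.
    forward(x_t, params) -> scalar output at time t (caller-supplied).
    Memory footprint: one trace per parameter, independent of stream length."""
    traces = [0.0] * len(params)
    for x_t, y_t in zip(spikes, targets):
        out = forward(x_t, params)                    # forward pass at time t
        err = y_t - out                               # factor 3: global error
        for i, x_i in enumerate(x_t):
            traces[i] = decay * traces[i] + x_i       # factor 2: eligibility
            params[i] += lr * x_i * traces[i] * err   # three-factor increment
    return params
```

Because updates happen as each event arrives, the same loop can keep running on-device after deployment, which is the continual-learning scenario the paper targets.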
Results & Findings
| Dataset | Baseline (weights‑only) | +Learned Delays | Joint Weights + Delays | Offline BPTT (state‑of‑the‑art) |
|---|---|---|---|---|
| SHD (speech) | 71.2 % | 84.5 % (+13.3 %) | 86.9 % (+15.7 %) | 89.3 % (≈2.4 % higher) |
| Other temporal benchmarks | 58 % → 68 % | 68 % → 78 % | 78 % → 84 % | — |
- Model size: Delay‑augmented networks achieve the same or higher accuracy with ≈15 % of the parameters of comparable BPTT‑trained SNNs.
- Latency: Because delays are learned directly in the forward pass, inference latency is ~67 % lower than that of the offline‑trained counterparts.
- Stability: The three‑factor rule remains stable across both feed‑forward and recurrent topologies, showing that delay learning scales to more complex dynamics.
Practical Implications
- Neuromorphic chips – Reducing memory and compute requirements directly translates to lower silicon area and power draw, enabling edge devices (e.g., wearables, IoT sensors) to run sophisticated temporal pattern recognizers locally.
- On‑device continual learning – Since the learning rule is online, devices can adapt to new sound signatures, sensor drift, or user‑specific patterns without offloading data to the cloud.
- Temporal data processing – Applications such as speech command recognition, event‑camera vision, and bio‑signal classification can benefit from the added temporal precision that learned delays provide.
- Simplified software stacks – The method avoids back‑propagation‑through‑time, meaning existing spiking frameworks (e.g., BindsNET, Norse, SpykeTorch) can implement the rule with minimal changes, accelerating adoption.
Limitations & Future Work
- Surrogate gradient dependence – The Gaussian surrogate is hand‑tuned; its shape may affect convergence speed and final accuracy, suggesting a need for systematic exploration of surrogate families.
- Scalability to large‑scale vision tasks – Experiments focus on temporal/audio benchmarks; extending delay learning to high‑resolution event‑camera datasets remains an open challenge.
- Hardware validation – While the paper reports theoretical latency and size gains, a full silicon implementation (e.g., on Loihi or a custom ASIC) would be required to confirm real‑world energy savings.
- Delay range constraints – Physical hardware imposes limits on how fine‑grained delays can be represented; future work should investigate quantization effects and hardware‑aware delay encoding.
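To make the quantization concern concrete, a hardware back end would snap each learned continuous delay onto a coarse time grid and clamp it to a representable range. The helper below is purely illustrative; `step` and `d_max` are hypothetical hardware constraints, not values from the paper.

```python
def quantize_delay(d, step=0.5, d_max=8.0):
    """Snap a learned continuous delay (in ms, say) to a hardware time grid
    of resolution `step` and clamp it to [0, d_max]. Hypothetical constraint
    values chosen for illustration only."""
    return min(max(round(d / step) * step, 0.0), d_max)
```

The gap between a learned delay and its quantized value (e.g., 3.26 → 3.5) is exactly the kind of rounding error whose accuracy impact the authors flag as future work.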
Bottom line: By teaching spiking networks when to fire, not just how strongly to fire, Vassallo and Taherinejad open a practical path toward compact, low‑latency, and continuously learning neuromorphic systems—an exciting development for developers building the next generation of edge AI.
Authors
- Luke Vassallo
- Nima Taherinejad
Paper Information
- arXiv ID: 2601.00668v1
- Categories: cs.NE, cs.LG
- Published: January 2, 2026