[Paper] Combining Convolution and Delay Learning in Recurrent Spiking Neural Networks
Source: arXiv - 2604.15997v1
Overview
The paper explores how to make recurrent spiking neural networks (SNNs) both smaller and faster without sacrificing classification accuracy, a combination that makes them attractive for edge‑AI devices. By marrying two ideas—learnable axonal delays (the "DelRec" framework) and convolutional recurrent connections—the authors demonstrate a dramatic reduction in memory usage (≈ 99 % fewer recurrent parameters) and a 52× speed‑up on an audio‑classification benchmark.
Key Contributions
- Delay‑aware recurrent learning (DelRec) extension: Introduces a convolutional pattern for recurrent connections while preserving the ability to learn per‑synapse transmission delays during training.
- Parameter‑efficiency breakthrough: Shows that a convolutional recurrent kernel can replace a fully‑connected recurrent matrix, cutting the number of trainable recurrent weights by two orders of magnitude.
- Inference speed‑up: Achieves a 52× reduction in inference latency on a standard CPU/GPU setup, indicating strong potential for low‑power neuromorphic chips.
- Open‑source implementation: Provides a ready‑to‑run codebase (GitHub link) that integrates with popular SNN simulators, facilitating reproducibility and rapid prototyping.
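The parameter‑efficiency claim is easy to verify with back‑of‑the‑envelope arithmetic. The sketch below uses illustrative layer sizes (a 16‑channel 8×8 spiking feature map and a 3×3 kernel) chosen only to match the orders of magnitude reported in the paper; they are assumptions, not the authors' actual configuration.

```python
# Illustrative parameter count: dense all-to-all recurrence vs. a shared
# 3x3 convolutional recurrent kernel. Layer sizes are assumed, not the
# paper's actual configuration.
channels, height, width = 16, 8, 8       # spiking feature map: C x H x W
neurons = channels * height * width      # 1024 neurons when flattened

# Dense recurrence: every neuron feeds back to every other neuron.
dense_params = neurons * neurons         # N^2 weights

# Convolutional recurrence: one k x k kernel shared across all positions.
kernel = 3
conv_params = kernel * kernel * channels * channels  # k^2 * C_in * C_out

reduction = 1 - conv_params / dense_params
print(f"dense:  {dense_params:,} recurrent weights")   # 1,048,576
print(f"conv:   {conv_params:,} recurrent weights")    # 2,304
print(f"saving: {reduction:.1%}")                      # 99.8%
```

Because the kernel size is independent of the spatial resolution, the saving grows quadratically with the feature‑map size while the conv kernel stays constant.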
Methodology
- Base model – DelRec: Starts from a recurrent SNN where each synapse carries a learnable axonal delay (an integer number of simulation timesteps). During back‑propagation‑through‑time (BPTT) the delays are updated together with the weight values.
- Convolutional recurrent layer: Replaces the dense recurrent weight matrix with a small 2‑D convolution kernel (e.g., 3×3). The same kernel slides over the spiking feature map, creating local recurrent feedback rather than global all‑to‑all connections.
- Training pipeline:
- Input spikes are generated from raw audio (e.g., using a cochlear filterbank or Poisson encoding).
- The network is unrolled for a fixed number of timesteps; gradients flow through both weight and delay parameters.
- A surrogate gradient (e.g., fast sigmoid) handles the non‑differentiable spike function.
- Evaluation: The authors compare three configurations on an audio classification dataset: (a) vanilla recurrent SNN, (b) DelRec with dense recurrence, and (c) DelRec with convolutional recurrence (the proposed model). Metrics include classification accuracy, number of recurrent parameters, and wall‑clock inference time.
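The training‑pipeline ingredients above can be sketched in PyTorch. The code below is a minimal illustration, not the authors' released implementation: the class names (`FastSigmoidSpike`, `ConvRecLIF`), the threshold of 1.0, the decay constant, and the hard‑reset rule are all assumptions. It shows a fast‑sigmoid surrogate gradient and a leaky integrate‑and‑fire step whose recurrent feedback is a small shared conv kernel, unrolled over timesteps so BPTT flows through the spikes; delay learning is omitted for brevity.

```python
import torch
import torch.nn as nn

class FastSigmoidSpike(torch.autograd.Function):
    """Heaviside spike forward, fast-sigmoid surrogate gradient backward."""
    @staticmethod
    def forward(ctx, v, slope=10.0):
        ctx.save_for_backward(v)
        ctx.slope = slope
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Derivative of the fast sigmoid: 1 / (1 + slope*|v|)^2
        sg = 1.0 / (1.0 + ctx.slope * v.abs()) ** 2
        return grad_out * sg, None

class ConvRecLIF(nn.Module):
    """LIF layer whose recurrent feedback is a shared 3x3 conv kernel."""
    def __init__(self, channels, decay=0.9):
        super().__init__()
        self.rec = nn.Conv2d(channels, channels, kernel_size=3,
                             padding=1, bias=False)  # local recurrence
        self.decay = decay

    def forward(self, x, v, s):
        # x: feed-forward current, v: membrane potential, s: previous spikes
        v = self.decay * v + x + self.rec(s)       # leaky integration
        spikes = FastSigmoidSpike.apply(v - 1.0)   # threshold at 1.0
        v = v * (1.0 - spikes)                     # hard reset on spike
        return v, spikes

# Unroll a few timesteps and backpropagate through the surrogate gradient.
layer = ConvRecLIF(channels=4)
v = torch.zeros(1, 4, 8, 8)
s = torch.zeros(1, 4, 8, 8)
x = torch.rand(5, 1, 4, 8, 8, requires_grad=True)  # 5 timesteps of input
total = torch.zeros(())
for t in range(5):
    v, s = layer(x[t], v, s)
    total = total + s.sum()
total.backward()                                   # BPTT through spike fn
```

In the full model the per‑synapse delays would additionally shift each recurrent contribution in time before integration, with the delay values updated alongside the conv weights.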
Results & Findings
| Model | Recurrent Params | Accuracy (↑) | Inference Time (× faster) |
|---|---|---|---|
| Dense SNN (no delays) | ~10⁶ | 84.2 % | 1× (baseline) |
| DelRec (dense) | ~10⁶ | 86.5 % | 1.2× |
| DelRec + Conv Rec | ≈ 10⁴ (≈ 99 % reduction) | 86.3 % (≈ same as dense DelRec) | 52× |
- Memory: The convolutional recurrent layer shrinks the parameter count from roughly one million to ten thousand, a saving that directly translates into lower on‑chip SRAM usage.
- Speed: Because the recurrent operation becomes a small convolution, the computational graph is far more cache‑friendly, yielding a > 50× speed‑up on a conventional processor.
- Accuracy: The slight dip (≈ 0.2 percentage points) relative to dense DelRec is negligible, suggesting that local recurrent feedback is sufficient for this task.
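The integer‑step axonal delays described in the methodology can be served at inference time with a simple circular spike buffer. The `DelayLine` class below is a hypothetical sketch of that bookkeeping, not the paper's implementation: each synapse's spike is scheduled into the slot that will rotate to the front after its delay elapses.

```python
from collections import deque

class DelayLine:
    """Route each synapse's spikes through its own integer-step delay."""
    def __init__(self, delays):
        self.delays = delays                  # delay (timesteps) per synapse
        depth = max(delays) + 1
        # Circular buffer: one slot per possible future arrival timestep.
        self.buffer = deque([[0.0] * len(delays) for _ in range(depth)],
                            maxlen=depth)

    def push(self, spikes):
        """Schedule this timestep's spikes; return those arriving now."""
        for i, s in enumerate(spikes):
            if s:
                self.buffer[self.delays[i]][i] += s  # lands delays[i] later
        arriving = self.buffer[0]
        self.buffer.rotate(-1)                       # advance one timestep
        self.buffer[-1] = [0.0] * len(self.delays)   # clear reused slot
        return arriving

# Three synapses with delays of 0, 2, and 3 timesteps.
line = DelayLine(delays=[0, 2, 3])
print(line.push([1, 1, 1]))  # -> [1.0, 0.0, 0.0]  synapse 0 arrives now
print(line.push([0, 0, 0]))  # -> [0.0, 0.0, 0.0]
print(line.push([0, 0, 0]))  # -> [0.0, 1.0, 0.0]  synapse 1, 2 steps late
print(line.push([0, 0, 0]))  # -> [0.0, 0.0, 1.0]  synapse 2, 3 steps late
```

During training the delay values themselves are learned; a buffer like this only shows why the per‑synapse storage cost is bounded by the maximum delay rather than the number of synapses squared.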
Practical Implications
- Edge AI devices: The reduced memory footprint makes it feasible to embed SNNs with learned delays on microcontrollers or ultra‑low‑power neuromorphic chips (e.g., Intel Loihi, BrainChip Akida).
- Real‑time audio processing: 52× faster inference opens the door to on‑device keyword spotting, environmental sound classification, or low‑latency voice assistants without cloud off‑loading.
- Scalable SNN design: Developers can now experiment with deeper recurrent SNNs, stacking multiple convolutional recurrent layers, because each layer adds only a handful of parameters.
- Toolchain integration: Since the authors release PyTorch‑compatible code, existing deep‑learning pipelines can be extended to SNNs with minimal friction, enabling rapid prototyping for robotics, IoT, and wearable applications.
Limitations & Future Work
- Task scope: Experiments are limited to a single audio classification benchmark; broader validation (e.g., vision, reinforcement learning) is needed to confirm generality.
- Hardware evaluation: Speed gains are reported on CPUs/GPUs; actual performance on dedicated neuromorphic hardware may differ due to architectural constraints on delay handling.
- Delay granularity: The study uses integer‑step delays; exploring sub‑timestep or continuous‑time delay representations could further improve temporal modeling.
- Hybrid architectures: Future work could combine convolutional recurrence with attention mechanisms or spike‑based gating to capture longer‑range dependencies without exploding parameter counts.
Bottom line: By injecting convolutional structure into the recurrent pathways of delay‑learning SNNs, the authors deliver a tiny, fast, and still accurate model that aligns perfectly with the constraints of modern edge AI. For developers looking to push spiking networks from research labs into production devices, this work offers a concrete, open‑source blueprint to get there.
Authors
- Lúcio Folly Sanches Zebendo
- Eleonora Cicciarella
- Michele Rossi
Paper Information
- arXiv ID: 2604.15997v1
- Categories: cs.NE
- Published: April 17, 2026