[Paper] MD-SNN: Membrane Potential-aware Distillation on Quantized Spiking Neural Network

Published: December 3, 2025, 11:27 PM EST
4 min read
Source: arXiv - 2512.04443v1

Overview

Spiking Neural Networks (SNNs) promise ultra‑low‑power AI by using binary spikes instead of dense activations, but their training is notoriously expensive because the membrane potential must be tracked over many time steps. The paper MD‑SNN: Membrane Potential‑aware Distillation on Quantized Spiking Neural Network proposes a new way to compress SNNs—combining aggressive quantization with a knowledge‑distillation step that explicitly aligns the membrane potentials of a full‑precision teacher and a quantized student. The result is a quantized SNN that retains almost the same accuracy while slashing energy and silicon area.

Key Contributions

  • Membrane‑aware distillation: First work to use the internal membrane potential as a distillation target in SNNs, closing the accuracy gap caused by quantization.
  • Unified quantization pipeline: Simultaneously quantizes weights, batch‑norm parameters, and the membrane potential, preserving the dynamics of spike generation (a minimal sketch of such a quantizer follows this list).
  • Comprehensive evaluation: Experiments on CIFAR‑10/100, N‑Caltech101, TinyImageNet (both static images and event‑based data) show iso‑accuracy with up to 2.6× lower memory footprint.
  • Hardware‑level validation: Using the SpikeSim accelerator, MD‑SNN achieves 14.85× lower energy‑delay‑area product (EDAP), 2.64× higher TOPS/W, and 6.19× higher TOPS/mm² compared to floating‑point SNNs at the same accuracy.
  • Open‑source reproducibility: The authors release code and model checkpoints, enabling rapid adoption by the community.
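
The paper's exact quantizer is not reproduced in this summary, so the snippet below is only a minimal sketch of what a unified low‑bit quantizer could look like, assuming PyTorch, a simple leaky integrate‑and‑fire (LIF) layer, and a straight‑through estimator. The names (`quantize_uniform`, `QuantLIF`) and the bit‑width, decay, and clipping values are illustrative, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_uniform(x, bits=4, x_max=1.0):
    """Fake-quantize a tensor onto a symmetric uniform grid with a straight-through estimator."""
    levels = 2 ** (bits - 1) - 1                      # e.g. 7 positive levels for 4 bits
    scale = x_max / levels
    x_q = torch.clamp(torch.round(x / scale), -levels, levels) * scale
    return x + (x_q - x).detach()                     # forward: quantized value, backward: identity


class QuantLIF(nn.Module):
    """Fully connected LIF layer whose weights AND membrane potential are quantized (illustrative)."""

    def __init__(self, in_features, out_features, bits=4, tau=0.5, v_th=1.0):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.bits, self.tau, self.v_th = bits, tau, v_th

    def forward(self, x_seq):                         # x_seq: [T, B, in_features]
        v = torch.zeros(x_seq.size(1), self.fc.out_features, device=x_seq.device)
        spikes, potentials = [], []
        w_q = quantize_uniform(self.fc.weight, self.bits)          # quantized weights
        for x_t in x_seq:
            i_t = F.linear(x_t, w_q, self.fc.bias)                 # synaptic current at this step
            v = quantize_uniform(self.tau * v + i_t, self.bits, x_max=2.0)  # quantized membrane state
            s = (v >= self.v_th).float()              # spike; real training would use a surrogate gradient
            potentials.append(v)                      # pre-reset potential, usable as a distillation target
            v = v * (1.0 - s)                         # hard reset of neurons that fired
            spikes.append(s)
        return torch.stack(spikes), torch.stack(potentials)        # each [T, B, out_features]
```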

Methodology

  1. Baseline SNN training – A conventional full‑precision SNN is trained with surrogate‑gradient back‑propagation over multiple time steps.
  2. Quantization – Weights, batch‑norm parameters, and the membrane state are uniformly quantized to low‑bit (e.g., 4‑bit) representations. Naïve quantization would distort the membrane potential, causing spikes to fire at the wrong times.
  3. Membrane‑aware Knowledge Distillation
    • The full‑precision model (teacher) produces two signals for each layer: the spike output and the intermediate membrane potential.
    • The quantized model (student) is trained to mimic both signals simultaneously, using a weighted loss:

$$\mathcal{L} = \alpha \cdot \mathrm{CE}(y_{\text{student}}, y_{\text{gt}}) + \beta \cdot \lVert V_{\text{student}} - V_{\text{teacher}} \rVert_2^2$$

    • By aligning the membrane potentials, the student learns to preserve spike timing despite the reduced numerical precision (a minimal code sketch of this loss follows the list).
  4. Hardware‑aware evaluation – The quantized models are mapped onto SpikeSim, a cycle‑accurate SNN accelerator, to measure real‑world energy, latency, and area.
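
As a concrete reading of the weighted loss above, here is a minimal sketch of how the combined objective could be computed for one batch. It assumes PyTorch and that both networks expose their per‑layer membrane potentials; the function name `md_snn_loss`, the layer‑wise summation, and the default α/β values are assumptions, not the paper's released API.

```python
import torch
import torch.nn.functional as F


def md_snn_loss(student_logits, labels, student_potentials, teacher_potentials,
                alpha=1.0, beta=0.1):
    """Task cross-entropy plus an L2 term aligning student and teacher membrane potentials.

    student_potentials / teacher_potentials: lists of tensors, one per distilled layer,
    each shaped [T, B, ...] (time steps, batch, features). alpha and beta are illustrative.
    """
    ce = F.cross_entropy(student_logits, labels)
    kd = sum(F.mse_loss(s_v, t_v.detach())            # mean squared distance per layer
             for s_v, t_v in zip(student_potentials, teacher_potentials))
    return alpha * ce + beta * kd
```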

Results & Findings

| Dataset | FP‑SNN Acc. | MD‑SNN (4‑bit) Acc. | Δ Accuracy | EDAP Reduction | TOPS/W ↑ | TOPS/mm² ↑ |
|---|---|---|---|---|---|---|
| CIFAR‑10 | 92.3 % | 91.9 % | –0.4 % | 13.2× | 2.5× | 5.8× |
| CIFAR‑100 | 71.8 % | 71.2 % | –0.6 % | 12.8× | 2.4× | 5.5× |
| N‑Caltech101 (event) | 78.5 % | 78.3 % | –0.2 % | 14.85× | 2.64× | 6.19× |
| TinyImageNet | 55.1 % | 54.7 % | –0.4 % | 11.9× | 2.3× | 5.2× |

  • Accuracy loss is <1 % across all benchmarks, confirming that membrane‑aware distillation effectively compensates for quantization noise.
  • Energy‑delay‑area product (EDAP) drops by an order of magnitude, making the quantized SNN viable for edge devices with strict power budgets.
  • The approach works for both static frame‑based and event‑driven data, highlighting its versatility.

Practical Implications

  • Edge AI chips: Developers building neuromorphic processors can now deploy SNNs with 4‑bit weights and activations without sacrificing accuracy, dramatically extending battery life for wearables, drones, and IoT sensors.
  • Event‑camera pipelines: Real‑time vision systems that already use event cameras (e.g., autonomous robots) can replace heavyweight CNNs with MD‑SNNs, gaining lower latency and smaller silicon footprints.
  • Framework integration: Because the distillation loss is just an extra term on top of standard training loops, existing PyTorch or TensorFlow SNN libraries can adopt MD‑SNN with minimal code changes (see the sketch after this list).
  • Model compression pipeline: MD‑SNN can be combined with other techniques—pruning, weight sharing, or spike‑rate regularization—to push compression even further while keeping the training pipeline simple.
  • Rapid prototyping: The released SpikeSim scripts let hardware architects evaluate energy/area trade‑offs early in the design cycle, shortening time‑to‑market for neuromorphic ASICs.
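
To make the "extra term on top of a standard training loop" point concrete, the fragment below sketches one way the distillation term could be bolted onto an existing PyTorch loop. It assumes a frozen full‑precision `teacher`, a quantized `student` that returns logits plus per‑layer membrane potentials, and the `md_snn_loss` helper sketched in the Methodology section; all of these names are hypothetical.

```python
import torch

# Hypothetical wiring: `teacher` is a frozen full-precision SNN, `student` is its
# quantized counterpart, and both return (logits, membrane_potentials) per forward pass.
teacher.eval()
student.train()
for images, labels in train_loader:
    with torch.no_grad():
        _, teacher_pots = teacher(images)             # record the teacher's membrane potentials
    student_logits, student_pots = student(images)
    loss = md_snn_loss(student_logits, labels,        # same weighted objective as sketched above
                       student_pots, teacher_pots, alpha=1.0, beta=0.1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```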

Limitations & Future Work

  • Quantization granularity: The study focuses on uniform low‑bit quantization; non‑uniform or mixed‑precision schemes might yield even better trade‑offs but were not explored.
  • Training overhead: Adding a membrane‑potential distillation term increases training time (≈1.3× slower) because the teacher’s intermediate states must be stored or recomputed.
  • Scalability to larger models: Experiments were limited to ResNet‑like backbones up to ~2 M parameters; applying MD‑SNN to transformer‑style spiking architectures remains an open question.
  • Hardware dependency: Energy gains are measured on SpikeSim; results may vary on other neuromorphic platforms with different spike routing or memory hierarchies.

Future research directions include adaptive bit‑width selection per layer, extending membrane‑aware distillation to multi‑task SNNs, and integrating the technique into end‑to‑end hardware‑software co‑design flows.

Authors

  • Donghyun Lee
  • Abhishek Moitra
  • Youngeun Kim
  • Ruokai Yin
  • Priyadarshini Panda

Paper Information

  • arXiv ID: 2512.04443v1
  • Categories: cs.NE
  • Published: December 4, 2025
  • PDF: Download PDF