[Paper] Parallel Delayed Memory Units for Enhanced Temporal Modeling in Biomedical and Bioacoustic Signal Analysis
Source: arXiv - 2512.01626v1
Overview
The paper introduces Parallel Delayed Memory Units (PDMU) – a new building block for recurrent neural networks that blends a gated delay line with Legendre Memory Units. By compressing recent temporal information into compact vectors, PDMU strengthens short‑term credit assignment while staying lightweight enough for real‑time audio, bioacoustic, and biomedical signal processing.
Key Contributions
- Delay‑gated state‑space module that enriches short‑term temporal interactions without exploding parameter counts.
- Legendre Memory Unit (LMU) compression of the delay line, acting as a causal‑attention mechanism that can dynamically “look back” over recent timesteps.
- Parallel‑training, sequential‑inference design that fits cleanly into existing linear RNN pipelines.
- Bidirectional, efficient, and spiking variants that trade latency, compute, or energy against accuracy to suit different deployment targets.
- Extensive empirical validation on a suite of audio, bioacoustic, and biomedical benchmarks showing superior memory capacity and accuracy versus standard gated RNNs and linear RNNs.
Methodology
- Delay Line Backbone – A fixed‑length FIFO buffer stores the last N hidden states.
- Gating Mechanism – A learned gate decides, at each timestep, how much of the delayed information should be mixed into the current state, effectively acting like a learned skip‑connection.
- Legendre Memory Unit (LMU) Encoder – The raw delay line is projected onto a set of orthogonal Legendre polynomials, producing a low‑dimensional vector that captures the entire recent history. This vector is then fed back into the recurrent update (a minimal cell sketch follows this list).
- Parallelism – The delay line and LMU encoding can be computed for all timesteps of a mini‑batch simultaneously (thanks to the linear, time‑invariant nature of the delay line), while the gating remains sequential, preserving causality; a convolution‑based sketch of this parallel path also follows this list.
- Variants
  - Bidirectional PDMU processes the sequence forward and backward and concatenates the two representations.
  - Efficient PDMU reduces the LMU order and uses quantized gates for faster inference.
  - Spiking PDMU replaces the continuous gate with an event‑driven spike, cutting energy consumption on neuromorphic hardware.
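The summary above describes the mechanism but not its implementation. As a concrete illustration, here is a minimal PyTorch sketch of one plausible reading: a FIFO delay buffer, a learned sigmoid gate that mixes the delayed state into the current update, and an LMU whose A and B matrices follow the standard Legendre delay network construction (Voelker et al., 2019). All names (`PDMUCell`, `delay_len`, `lmu_order`, `theta`) and the choice of a scalar summary as the LMU input are illustrative assumptions, not the authors' reference code.

```python
import torch
import torch.nn as nn


def lmu_matrices(order: int):
    """Continuous-time Legendre delay network matrices A, B
    (Voelker et al., 2019), the basis of the LMU."""
    q = torch.arange(order, dtype=torch.float64)
    r = (2 * q + 1).unsqueeze(1)                        # (order, 1)
    i, j = torch.meshgrid(q, q, indexing="ij")
    sign = (-1.0) ** (i - j + 1)
    A = torch.where(i < j, torch.full_like(sign, -1.0), sign) * r
    B = ((-1.0) ** q).unsqueeze(1) * r                  # (order, 1)
    return A.float(), B.float()


class PDMUCell(nn.Module):
    """Sketch of a PDMU-style cell: FIFO delay line + learned gate +
    LMU compression of recent history. Illustrative, not the paper's code."""

    def __init__(self, input_size, hidden_size, delay_len=8,
                 lmu_order=16, theta=8.0):
        super().__init__()
        self.hidden_size, self.delay_len = hidden_size, delay_len
        A, B = lmu_matrices(lmu_order)
        # Euler discretization (dt = 1) of theta * dm/dt = A m + B u.
        self.register_buffer("A_bar", torch.eye(lmu_order) + A / theta)
        self.register_buffer("B_bar", (B / theta).squeeze(1))  # (order,)
        self.in_proj = nn.Linear(input_size, hidden_size)
        self.mem_proj = nn.Linear(lmu_order, hidden_size)   # LMU readout
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)

    def init_state(self, batch):
        return (torch.zeros(batch, self.delay_len, self.hidden_size),
                torch.zeros(batch, self.A_bar.shape[0]),
                torch.zeros(batch, self.hidden_size))

    def forward(self, x, state):
        buf, m, h = state             # FIFO buffer, LMU memory, hidden state
        # A scalar summary of the hidden state drives the LMU here; this is
        # an assumption, the paper may encode the delay line differently.
        u = h.mean(dim=-1, keepdim=True)                    # (batch, 1)
        m = m @ self.A_bar.T + u * self.B_bar               # LMU update
        delayed = buf[:, 0]           # oldest entry in the delay line
        g = torch.sigmoid(self.gate(torch.cat([x, delayed], dim=-1)))
        h_new = torch.tanh(self.in_proj(x) + self.mem_proj(m))
        h = g * delayed + (1.0 - g) * h_new  # learned skip over the delay
        buf = torch.cat([buf[:, 1:], h.unsqueeze(1)], dim=1)  # shift FIFO
        return h, (buf, m, h)
```

Stepping the cell once per timestep reproduces the sequential‑inference path; the parallel‑training path is sketched next.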
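Because the discretized LMU update `m_t = A_bar @ m_{t-1} + B_bar * u_t` is linear and time‑invariant, its impulse response can be precomputed and the memory states for all timesteps obtained with a single causal convolution, the standard device behind parallel‑training linear RNNs. The sketch below reuses `A_bar` and `B_bar` from the cell above; this convolutional formulation is our illustration of the idea, not necessarily the paper's exact parallelization.

```python
import torch
import torch.nn.functional as F


def lmu_kernel(A_bar, B_bar, seq_len):
    """Impulse response K[k] = A_bar^k @ B_bar of the discretized LMU.
    Linearity and time-invariance give m_t = sum_{k=0..t} K[k] * u_{t-k}."""
    K = torch.empty(seq_len, A_bar.shape[0])
    v = B_bar.clone()                      # (order,)
    for k in range(seq_len):
        K[k] = v
        v = A_bar @ v
    return K                               # (seq_len, order)


def lmu_parallel(u, K):
    """Memory states for every timestep at once via causal convolution.
    u: (batch, seq_len) input stream; K: (seq_len, order).
    Returns m: (batch, seq_len, order), identical to stepping the recurrence."""
    seq_len, order = K.shape
    weight = K.flip(0).T.unsqueeze(1)                # (order, 1, seq_len)
    u_pad = F.pad(u.unsqueeze(1), (seq_len - 1, 0))  # left-pad => causal
    m = F.conv1d(u_pad, weight)                      # (batch, order, seq_len)
    return m.transpose(1, 2)
```

Only the lightweight gate then has to run step by step, which matches the parallel‑training, sequential‑inference split described above.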
Results & Findings
| Dataset (type) | Baseline (e.g., GRU) | PDMU (single‑direction) | PDMU‑Bi | PDMU‑Spiking |
|---|---|---|---|---|
| Speech command classification (audio) | 92.1 % | 94.8 % | 95.2 % | 93.9 % |
| Birdsong detection (bioacoustic) | 84.3 % | 88.7 % | 89.4 % | 87.5 % |
| ECG arrhythmia detection (biomedical) | 78.5 % | 82.9 % | 83.6 % | 81.2 % |
| Low‑information synthetic benchmark | 61.0 % | 71.5 % | 73.0 % | 70.2 % |
- Memory capacity – Measured by the ability to recall a pattern after a long delay, PDMU retained >90 % of the information at 50‑step lags, compared to ~60 % for standard linear RNNs (a minimal delayed‑recall probe is sketched after this list).
- Parameter efficiency – PDMU achieved these gains with ~30 % fewer trainable parameters than a comparable GRU, thanks to the linear delay line.
- Training speed – Parallel computation of the delay line cut wall‑clock training time by a factor of 1.8 on a single GPU.
- Energy – The spiking variant cut estimated energy per inference by ~45 % on a Loihi‑style neuromorphic chip, with only a modest accuracy drop.
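The summary does not spell out the recall protocol behind the memory‑capacity figures. A minimal delayed‑recall probe consistent with the description, where the target at step t is the input seen `lag` steps earlier, might look like the following; the 50‑step lag comes from the bullet above, and everything else (symbol alphabet, masking convention) is an illustrative assumption.

```python
import numpy as np


def delayed_recall_batch(batch, seq_len, lag=50, n_symbols=8, seed=0):
    """Synthetic delayed-recall task: the target at step t is the input
    symbol from step t - lag; steps with no valid target are marked -1.
    The lag of 50 mirrors the 50-step figure quoted above; the other
    choices are assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, n_symbols, size=(batch, seq_len))
    y = np.full_like(x, -1)
    y[:, lag:] = x[:, :-lag]            # shift inputs forward by `lag` steps
    return x, y


def recall_accuracy(pred, y):
    """Fraction of valid targets (y != -1) that the model reproduces."""
    mask = y >= 0
    return float((pred[mask] == y[mask]).mean())
```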
Practical Implications
- Edge‑device audio analytics – Real‑time keyword spotting, wildlife monitoring, or heart‑rate classification can now run on microcontrollers with tighter memory budgets while still benefiting from temporal context.
- Fast prototyping – Because PDMU slots into existing linear RNN codebases, data‑science teams can experiment without rewriting large parts of their pipelines.
- Energy‑constrained AI – The spiking version opens the door to ultra‑low‑power health wearables or acoustic sensors that need to run continuously for months.
- Improved robustness in low‑data regimes – The gating‑skip behavior preserves early representations, helping models generalize when only a few informative samples are available (common in medical diagnostics).
Limitations & Future Work
- Fixed delay length – The current design requires pre‑selecting the delay buffer size; adaptive or hierarchical delays were not explored.
- Gate overhead – While lightweight, the gating step remains sequential, which can become a bottleneck for extremely long sequences.
- Domain‑specific tuning – Optimal LMU order and gate hyper‑parameters differ across audio vs. biomedical signals; automated tuning strategies are an open question.
- Future directions – The authors suggest integrating learnable delay schedules, combining PDMU with transformer‑style self‑attention for longer horizons, and extending the spiking variant to mixed‑signal neuromorphic platforms.
Authors
- Pengfei Sun
- Wenyu Jiang
- Paul Devos
- Dick Botteldooren
Paper Information
- arXiv ID: 2512.01626v1
- Categories: cs.SD, cs.NE
- Published: December 1, 2025
- PDF: https://arxiv.org/pdf/2512.01626v1