[Paper] SCRAPL: Scattering Transform with Random Paths for Machine Learning
Source: arXiv - 2602.11145v1
Overview
The paper introduces SCRAPL (Scattering Transform with Random Paths for Machine Learning), a lightweight stochastic scheme that makes the powerful wavelet scattering transform practical as a differentiable loss in deep learning pipelines. By sampling a subset of scattering “paths” on‑the‑fly, SCRAPL dramatically cuts the computational cost while preserving the perceptual fidelity that scattering provides—opening the door for its use in audio synthesis, speech enhancement, and other signal‑processing‑heavy tasks.
Key Contributions
- Random‑path sampling framework for multivariable scattering transforms, turning an otherwise exhaustive (and costly) operation into a stochastic estimator suitable for SGD.
- Implementation for Joint Time‑Frequency Scattering (JTFS), enabling fine‑grained analysis of spectro‑temporal textures (e.g., drum hits, granular clouds).
- Importance‑sampling based initialization heuristic that adapts the path‑sampling distribution to the perceptual content of the training set, accelerating convergence.
- Demonstration on differentiable DSP (DDSP): unsupervised sound matching for a granular synthesizer and the iconic Roland TR‑808 drum machine.
- Open‑source release of the SCRAPL Python package together with reproducible audio examples.
Methodology
Scattering transforms decompose a signal through cascades of wavelet convolutions, producing a high‑dimensional set of coefficients (the paths). Computing all paths yields a deterministic, highly informative representation but is prohibitive for back‑propagation because each SGD step would need to evaluate thousands of convolutions.
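To see why full evaluation is costly, note that the number of second-order paths grows roughly quadratically with the number of wavelet filters. The sketch below is a hypothetical illustration (the filter counts and the `j2 > j1` admissibility rule are assumptions for the example, not values from the paper):

```python
# Hypothetical illustration of path-pool growth: with 12 first-order and
# 12 second-order wavelet scales, the admissible (j1, j2) pairs alone
# number in the dozens, and each path is a full cascade of convolutions.
J1, J2 = 12, 12  # assumed filter counts at scattering orders 1 and 2
paths = [(j1, j2) for j1 in range(J1) for j2 in range(J2) if j2 > j1]
print(len(paths))  # 66 admissible (j1, j2) pairs under the j2 > j1 rule
```

JTFS adds a frequential wavelet axis on top of this, multiplying the pool further, which is what makes exhaustive per-step evaluation prohibitive.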
SCRAPL tackles this by randomly sampling a small batch of paths per iteration:
- Path pool definition – All possible wavelet‑filter combinations across time and frequency scales are enumerated once.
- Stochastic selection – At each training step, a subset of paths is drawn according to a probability distribution (initially uniform, later refined by importance sampling).
- Partial scattering evaluation – Only the selected paths are computed; weighting each path's contribution by the inverse of its sampling probability keeps the result an unbiased estimator of the full scattering loss.
- Gradient back‑propagation – The estimator’s gradient is used to update the network parameters, just like any other stochastic loss.
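The four steps above can be sketched as a Monte Carlo loss. This is a minimal illustration, not the authors' implementation: `scattering_path` is a hypothetical stand-in for one path's differentiable wavelet cascade, and the inverse-probability weighting is what makes the estimate unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)

def scattering_path(x, path):
    """Hypothetical stand-in for one path's wavelet cascade: a smoothing
    convolution followed by a modulus, just to give each path a distinct
    differentiable output."""
    return np.abs(np.convolve(x, np.ones(path + 1) / (path + 1), mode="same"))

def scrapl_loss(x, y, probs, n_samples=4):
    """Monte Carlo estimate of the full scattering distance between x and y.

    Sampling path indices with probability `probs` and dividing each term
    by n_samples * probs[p] makes the expectation of this estimate equal
    to the sum of per-path losses over the entire path pool."""
    pool = np.arange(len(probs))
    chosen = rng.choice(pool, size=n_samples, p=probs)
    loss = 0.0
    for p in chosen:
        diff = scattering_path(x, p) - scattering_path(y, p)
        loss += np.sum(diff ** 2) / (n_samples * probs[p])
    return loss
```

In an actual pipeline the per-path computation would be a wavelet convolution cascade in an autodiff framework, so the estimator's gradient flows to the synthesizer or network parameters exactly like any other stochastic loss.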
The importance‑sampling heuristic monitors which paths contribute most to the loss on a validation subset and boosts their sampling probability, focusing computation on perceptually salient structures (e.g., transient attacks or resonant harmonics).
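One plausible form of such an update is sketched below. The loss-proportional target, interpolation rate, and probability floor are assumptions for illustration, not the authors' exact rule; the floor keeps every path reachable so the weighted estimator stays unbiased.

```python
import numpy as np

def update_path_probs(probs, per_path_loss, lr=0.5, floor=1e-3):
    """Hypothetical importance-sampling update (not the paper's exact rule):
    shift sampling mass toward paths with larger validation loss, while a
    probability floor keeps every path reachable."""
    target = per_path_loss / per_path_loss.sum()  # loss-proportional target
    new = (1 - lr) * probs + lr * target          # smooth interpolation
    new = np.maximum(new, floor)                  # keep all paths alive
    return new / new.sum()                        # renormalize to a distribution
```

Starting from a uniform distribution, a few such updates concentrate samples on the paths that dominate the loss, e.g. those capturing transient attacks, without ever zeroing out the rest.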
Results & Findings
- Speedup: SCRAPL reduces the per‑iteration cost of JTFS by ≈10‑15× compared with full‑path evaluation, while retaining comparable loss values (≤2 % deviation).
- Training stability: Networks trained with SCRAPL converge in ≈30 % fewer epochs thanks to the importance‑sampling initialization.
- Audio quality: In unsupervised matching of the granular synthesizer and the TR‑808, SCRAPL‑trained models achieve higher perceptual similarity scores (MOS‑style listening tests) than baselines using either a raw‑waveform L2 loss or full‑path scattering (the latter being infeasible to train at scale).
- Generalization: The learned models transfer well to unseen drum patches and granular textures, indicating that the stochastic scattering loss captures robust, content‑agnostic audio characteristics.
Practical Implications
- Differentiable audio plugins: Developers can now embed JTFS‑based perceptual losses directly into VST/AU plugins for real‑time parameter optimization (e.g., automatic EQ, reverberation tuning).
- Efficient DDSP pipelines: SCRAPL makes it viable to train neural synthesizers on large sound libraries without resorting to heavyweight perceptual metrics that would otherwise stall training.
- Audio quality assessment: The random‑path estimator can serve as a fast proxy for perceptual distance in monitoring or adaptive bitrate streaming systems.
- Cross‑modal research: Because scattering is a mathematically grounded, modality‑agnostic transform, SCRAPL could be repurposed for video or multimodal signal processing where similar computational bottlenecks exist.
In short, SCRAPL bridges the gap between theoretical signal‑processing rigor and practical deep‑learning workflows, giving engineers a new tool for building perceptually aware audio models without sacrificing training speed.
Limitations & Future Work
- Estimator variance: Random sampling introduces noise into the loss; while importance sampling mitigates this, very low‑sample regimes can still destabilize training.
- Path‑selection overhead: Maintaining and updating the sampling distribution adds a modest bookkeeping cost, which may become noticeable for extremely large path spaces.
- Domain specificity: The current implementation focuses on JTFS for audio; extending SCRAPL to other scattering variants (e.g., 2‑D image scattering) will require tailored path‑pool definitions.
Future directions suggested by the authors include adaptive variance reduction techniques, integration with reinforcement‑learning style policy gradients for dynamic path selection, and benchmarking SCRAPL on speech enhancement and music transcription tasks to further validate its cross‑domain utility.
Authors
- Christopher Mitcheltree
- Vincent Lostanlen
- Emmanouil Benetos
- Mathieu Lagrange
Paper Information
- arXiv ID: 2602.11145v1
- Categories: cs.SD, cs.LG, eess.AS
- Published: February 11, 2026