[Paper] Reciprocal Latent Fields for Precomputed Sound Propagation
Source: arXiv - 2602.06937v1
Overview
The paper presents Reciprocal Latent Fields (RLF), a new way to store and retrieve pre‑computed acoustic data for virtual environments. By compressing impulse‑response information into a tiny, learnable 3‑D grid, RLF cuts memory usage by orders of magnitude while preserving the realism of wave‑based sound propagation—making high‑fidelity audio feasible for real‑time games, VR, and AR.
Key Contributions
- Reciprocal latent representation: A volumetric grid of trainable embeddings that guarantees source‑receiver reciprocity (the sound heard from A to B equals that from B to A).
- Symmetric decoder architecture: A family of decoder functions that read two latent vectors (source & listener) and output the full set of acoustic parameters needed for rendering.
- Riemannian metric learning: Introduces a geometry‑aware loss that better respects the physical relationships between acoustic parameters, improving fidelity in complex scenes.
- Massive memory reduction: Demonstrates 2–4 orders of magnitude compression compared with naïve storage of per‑pair impulse responses.
- Perceptual validation: A MUSHRA‑style listening test shows that listeners cannot reliably distinguish RLF‑generated audio from ground‑truth wave simulations.
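The reciprocity guarantee can be made concrete with a minimal sketch: if the decoder combines the two latent vectors with an order-invariant operation (here a sum) before any further processing, then decode(a, b) equals decode(b, a) by construction. The toy network below is illustrative only, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: latent dimension 8, three output acoustic parameters
# (e.g., early-reflection level, RT60, high-frequency attenuation).
D, P = 8, 3
W1 = rng.standard_normal((D, 16))
W2 = rng.standard_normal((16, P))

def decode(z_src, z_lst):
    """Symmetric decoder: the latents are summed (order-invariant)
    before the MLP, so swapping source and listener cannot change
    the output -- reciprocity holds by construction."""
    h = np.tanh((z_src + z_lst) @ W1)
    return h @ W2

z_a, z_b = rng.standard_normal(D), rng.standard_normal(D)
assert np.allclose(decode(z_a, z_b), decode(z_b, z_a))  # reciprocity
```

Any commutative combination (sum, elementwise product, symmetrized bilinear form) gives the same guarantee; the choice affects expressiveness, not reciprocity.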
Methodology
- Pre‑computation: For a given scene, the authors run a high‑quality wave‑based simulator to generate impulse responses (IRs) for a dense set of source‑receiver positions.
- Latent field construction: Instead of storing each IR directly, they embed the acoustic information into a 3‑D grid (the latent field). Each grid cell holds a low‑dimensional vector that is learned during training.
- Symmetric decoding: To render sound for a given source‑listener pair, the system samples the latent vectors at the two positions and feeds them into a symmetric decoder (e.g., a bilinear or attention‑based network), which outputs the scalar acoustic parameters: early‑reflection levels, reverberation decay, and frequency‑dependent attenuation. Because the decoder is invariant to swapping its two inputs, reciprocity holds by construction.
- Loss functions:
- Reconstruction loss between the predicted acoustic parameters and the parameters extracted from the ground‑truth IRs.
- Riemannian metric loss that penalizes distortions in the acoustic space, encouraging the latent embeddings to respect the underlying physics.
- Training & inference: The latent field and decoder are jointly optimized using stochastic gradient descent. At runtime, inference reduces to two trilinear look‑ups and a forward pass through a tiny neural network—fast enough for real‑time audio pipelines.
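The runtime path described above — two trilinear look-ups into the latent grid followed by a small forward pass — can be sketched as follows. Grid size, latent dimension, and the final linear map are illustrative placeholders, not values from the paper.

```python
import numpy as np

def trilinear_sample(grid, pos):
    """Trilinearly interpolate a latent grid of shape (X, Y, Z, D)
    at a continuous position `pos` given in grid coordinates."""
    p0 = np.clip(np.floor(pos).astype(int), 0,
                 np.array(grid.shape[:3]) - 2)
    f = pos - p0                      # fractional offset per axis
    out = np.zeros(grid.shape[3])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                out += w * grid[p0[0] + dx, p0[1] + dy, p0[2] + dz]
    return out

rng = np.random.default_rng(0)
grid = rng.standard_normal((8, 8, 8, 16))   # toy 8^3 field, D=16
W = rng.standard_normal((16, 3))            # stand-in for the decoder

def query(src_pos, lst_pos):
    """One runtime query: two look-ups, then a symmetric map
    (a placeholder for the paper's small decoder network)."""
    z = trilinear_sample(grid, src_pos) + trilinear_sample(grid, lst_pos)
    return z @ W                            # -> acoustic parameters

params = query(np.array([1.5, 2.25, 3.0]), np.array([6.0, 0.5, 4.75]))
```

Each query touches at most 16 grid cells and a few small matrix products, which is consistent with the sub-millisecond CPU latency the paper reports.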
Results & Findings
| Metric | Ground‑Truth (raw IR) | RLF (compressed) |
|---|---|---|
| Memory per scene | ~10 GB (full pairwise IRs) | ~10–100 MB |
| Parameter RMSE | — | 0.03 dB (early reflections), 0.07 s (RT60) |
| Subjective MUSHRA score | 92 % | 90 % (statistically indistinguishable) |
| Inference latency (CPU) | N/A (offline) | < 0.5 ms per query |
- Quality: Across a variety of indoor and outdoor environments, RLF reproduces key acoustic cues (directional early reflections, reverberation tail, frequency filtering) with negligible audible artifacts.
- Scalability: Memory savings grow dramatically as the number of sources and listeners increases, making large‑scale virtual cities tractable.
- Robustness: The Riemannian loss consistently outperformed plain L2 loss, especially in highly reverberant or geometrically complex rooms.
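A back-of-the-envelope calculation makes the scaling concrete. All figures below are illustrative assumptions chosen to roughly match the table above, not measurements from the paper: naïve storage grows with the square of the number of sampled positions, while the latent field grows only with scene volume.

```python
# Naive pairwise storage: one IR per (source, receiver) pair.
n_positions = 250                 # assumed sampled positions per scene
ir_samples  = 48_000              # assumed 1 s IR at 48 kHz, float32
bytes_pairwise = n_positions ** 2 * ir_samples * 4

# Latent field: one small vector per grid cell, independent of how
# many source/listener pairs are ever queried at runtime.
grid_cells = 64 ** 3              # assumed 64^3 latent grid
latent_dim = 16                   # assumed floats per cell
bytes_latent = grid_cells * latent_dim * 4

print(f"pairwise IRs: {bytes_pairwise / 1e9:.1f} GB")   # 12.0 GB
print(f"latent field: {bytes_latent / 1e6:.1f} MB")     # 16.8 MB
```

Doubling the sampled positions quadruples the pairwise cost but leaves the latent field unchanged, which is why the savings widen as scenes grow.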
Practical Implications
- Game engines & VR platforms: Developers can now embed physically accurate sound propagation without bloating asset bundles, enabling richer immersion on consoles, mobile, and cloud‑streamed titles.
- Audio middleware: Integration points (e.g., Unity’s AudioSource, FMOD, Wwise) can expose an “RLF mode” in which the engine queries the latent field instead of loading massive IR tables.
- Dynamic scenes: Because the latent field is scene‑specific but source‑agnostic, adding or moving sound sources at runtime incurs only a cheap lookup—ideal for interactive simulations and procedural content.
- Edge & AR devices: The tiny memory footprint and low compute cost make high‑fidelity spatial audio feasible on head‑mounted displays and smartphones, where bandwidth and power are limited.
- Research & tooling: The reciprocal latent representation can be repurposed for other reciprocal physical phenomena (e.g., RF propagation, light transport), opening avenues for cross‑domain acceleration.
Limitations & Future Work
- Static geometry assumption: RLF assumes a fixed environment; dynamic geometry (e.g., moving walls) would require re‑training or an adaptive latent field.
- Training cost: Generating the ground‑truth IR dataset and training the latent field can be expensive (hours on a GPU cluster), though it is a one‑time offline cost per scene.
- Resolution trade‑off: Very fine‑grained acoustic detail (e.g., diffraction around tiny objects) may still be lost unless the latent grid is sufficiently dense, which modestly increases memory.
- Future directions: Extending RLF to handle time‑varying scenes via incremental updates, exploring hierarchical latent fields for multi‑scale detail, and applying the framework to outdoor weather‑dependent acoustics.
Authors
- Hugo Seuté
- Pranai Vasudev
- Etienne Richan
- Louis‑Xavier Buffoni
Paper Information
- arXiv ID: 2602.06937v1
- Categories: cs.SD, cs.LG, eess.AS
- Published: February 6, 2026