[Paper] Reciprocal Latent Fields for Precomputed Sound Propagation
Source: arXiv - 2602.06937v1
Overview
The paper presents Reciprocal Latent Fields (RLF), a new way to store and retrieve pre‑computed acoustic data for virtual environments. By compressing impulse‑response information into a tiny, learnable 3‑D grid, RLF cuts memory usage by orders of magnitude while preserving the realism of wave‑based sound propagation—making high‑fidelity audio feasible for real‑time games, VR, and AR.
Key Contributions
- Reciprocal latent representation: A volumetric grid of trainable embeddings that guarantees source‑receiver reciprocity (the sound heard from A to B equals that from B to A).
- Symmetric decoder architecture: A family of decoder functions that read two latent vectors (source & listener) and output the full set of acoustic parameters needed for rendering.
- Riemannian metric learning: Introduces a geometry‑aware loss that better respects the physical relationships between acoustic parameters, improving fidelity in complex scenes.
- Massive memory reduction: Demonstrates 2–4 orders of magnitude compression compared with naïve storage of per‑pair impulse responses.
- Perceptual validation: A MUSHRA‑style listening test shows that listeners cannot reliably distinguish RLF‑generated audio from ground‑truth wave simulations.
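The reciprocity guarantee can be made concrete with a minimal sketch: if the decoder combines the two latent vectors with an order-invariant operation (here a sum) before any further processing, then decode(a, b) equals decode(b, a) by construction. The toy network below is illustrative only, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: latent dimension 8, three output acoustic parameters
# (e.g., early-reflection level, RT60, high-frequency attenuation).
D, P = 8, 3
W1 = rng.standard_normal((D, 16))
W2 = rng.standard_normal((16, P))

def decode(z_src, z_lst):
    """Symmetric decoder: the latents are summed (order-invariant)
    before the MLP, so swapping source and listener cannot change
    the output -- reciprocity holds by construction."""
    h = np.tanh((z_src + z_lst) @ W1)
    return h @ W2

z_a, z_b = rng.standard_normal(D), rng.standard_normal(D)
assert np.allclose(decode(z_a, z_b), decode(z_b, z_a))  # reciprocity
```

Any commutative combination (sum, elementwise product, symmetrized bilinear form) gives the same guarantee; the choice affects expressiveness, not reciprocity.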
Methodology
- Pre‑computation: For a given scene, the authors run a high‑quality wave‑based simulator to generate impulse responses (IRs) for a dense set of source‑receiver positions.
- Latent field construction: Instead of storing each IR directly, they embed the acoustic information into a 3‑D grid (the latent field). Each grid cell holds a low‑dimensional vector that is learned during training.
- Symmetric decoding: To render sound for a given source‑listener pair, the system samples the latent vectors at the two positions and feeds them into a symmetric decoder (e.g., a bilinear or attention‑based network), which outputs the scalar acoustic parameters: early‑reflection levels, reverberation decay, and frequency‑dependent attenuation. Because the decoder is invariant to swapping its two inputs, reciprocity holds by construction.
- Loss functions:
- Reconstruction loss between the predicted acoustic parameters and the parameters extracted from the ground‑truth IRs.
- Riemannian metric loss that penalizes distortions in the acoustic space, encouraging the latent embeddings to respect the underlying physics.
- Training & inference: The latent field and decoder are jointly optimized using stochastic gradient descent. At runtime, inference reduces to two trilinear look‑ups and a forward pass through a tiny neural network—fast enough for real‑time audio pipelines.
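The runtime path described above — two trilinear look-ups into the latent grid followed by a small forward pass — can be sketched as follows. Grid size, latent dimension, and the final linear map are illustrative placeholders, not values from the paper.

```python
import numpy as np

def trilinear_sample(grid, pos):
    """Trilinearly interpolate a latent grid of shape (X, Y, Z, D)
    at a continuous position `pos` given in grid coordinates."""
    p0 = np.clip(np.floor(pos).astype(int), 0,
                 np.array(grid.shape[:3]) - 2)
    f = pos - p0                      # fractional offset per axis
    out = np.zeros(grid.shape[3])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                out += w * grid[p0[0] + dx, p0[1] + dy, p0[2] + dz]
    return out

rng = np.random.default_rng(0)
grid = rng.standard_normal((8, 8, 8, 16))   # toy 8^3 field, D=16
W = rng.standard_normal((16, 3))            # stand-in for the decoder

def query(src_pos, lst_pos):
    """One runtime query: two look-ups, then a symmetric map
    (a placeholder for the paper's small decoder network)."""
    z = trilinear_sample(grid, src_pos) + trilinear_sample(grid, lst_pos)
    return z @ W                            # -> acoustic parameters

params = query(np.array([1.5, 2.25, 3.0]), np.array([6.0, 0.5, 4.75]))
```

Each query touches at most 16 grid cells and a few small matrix products, which is consistent with the sub-millisecond CPU latency the paper reports.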
Results & Findings
| Metric | Ground‑Truth (raw IR) | RLF (compressed) |
|---|---|---|
| Memory per scene | ~10 GB (full pairwise IRs) | ~10–100 MB |
| Parameter RMSE | — | 0.03 dB (early reflections), 0.07 s (RT60) |
| Subjective MUSHRA score | 92 % | 90 % (statistically indistinguishable) |
| Inference latency (CPU) | N/A (offline) | < 0.5 ms per query |
- Quality: Across a variety of indoor and outdoor environments, RLF reproduces key acoustic cues (directional early reflections, reverberation tail, frequency filtering) with negligible audible artifacts.
- Scalability: Memory savings grow dramatically as the number of sources and listeners increases, making large‑scale virtual cities tractable.
- Robustness: The Riemannian loss consistently outperformed plain L2 loss, especially in highly reverberant or geometrically complex rooms.
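A back-of-the-envelope calculation makes the scaling concrete. All figures below are illustrative assumptions chosen to roughly match the table above, not measurements from the paper: naïve storage grows with the square of the number of sampled positions, while the latent field grows only with scene volume.

```python
# Naive pairwise storage: one IR per (source, receiver) pair.
n_positions = 250                 # assumed sampled positions per scene
ir_samples  = 48_000              # assumed 1 s IR at 48 kHz, float32
bytes_pairwise = n_positions ** 2 * ir_samples * 4

# Latent field: one small vector per grid cell, independent of how
# many source/listener pairs are ever queried at runtime.
grid_cells = 64 ** 3              # assumed 64^3 latent grid
latent_dim = 16                   # assumed floats per cell
bytes_latent = grid_cells * latent_dim * 4

print(f"pairwise IRs: {bytes_pairwise / 1e9:.1f} GB")   # 12.0 GB
print(f"latent field: {bytes_latent / 1e6:.1f} MB")     # 16.8 MB
```

Doubling the sampled positions quadruples the pairwise cost but leaves the latent field unchanged, which is why the savings widen as scenes grow.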
Practical Implications
- Game engines & VR platforms: Developers can now embed physically accurate sound propagation without bloating asset bundles, enabling richer immersion on consoles, mobile, and cloud‑streamed titles.
- Audio middleware: Integration points (e.g., Unity’s AudioSource, FMOD, Wwise) can expose an “RLF mode” in which the engine queries the latent field instead of loading massive IR tables.
- Dynamic scenes: Because the latent field is scene‑specific but source‑agnostic, adding or moving sound sources at runtime incurs only a cheap lookup—ideal for interactive simulations and procedural content.
- Edge & AR devices: The tiny memory footprint and low compute cost make high‑fidelity spatial audio feasible on head‑mounted displays and smartphones, where bandwidth and power are limited.
- Research & tooling: The reciprocal latent representation can be repurposed for other reciprocal physical phenomena (e.g., RF propagation, light transport), opening avenues for cross‑domain acceleration.
Limitations & Future Work
- Static geometry assumption: RLF assumes a fixed environment; dynamic geometry (e.g., moving walls) would require re‑training or an adaptive latent field.
- Training cost: Generating the ground‑truth IR dataset and training the latent field can be expensive (hours on a GPU cluster), though it is a one‑time offline cost per scene.
- Resolution trade‑off: Very fine‑grained acoustic detail (e.g., diffraction around tiny objects) may still be lost unless the latent grid is sufficiently dense, which modestly increases memory.
- Future directions: Extending RLF to handle time‑varying scenes via incremental updates, exploring hierarchical latent fields for multi‑scale detail, and applying the framework to outdoor weather‑dependent acoustics.
Authors
- Hugo Seuté
- Pranai Vasudev
- Etienne Richan
- Louis‑Xavier Buffoni
Paper Information
- arXiv ID: 2602.06937v1
- Categories: cs.SD, cs.LG, eess.AS
- Published: February 6, 2026