[Paper] Reciprocal Latent Fields for Precomputed Sound Propagation

Published: February 6, 2026 at 01:31 PM EST
4 min read
Source: arXiv

Overview

The paper presents Reciprocal Latent Fields (RLF), a new way to store and retrieve pre‑computed acoustic data for virtual environments. By compressing impulse‑response information into a tiny, learnable 3‑D grid, RLF cuts memory usage by orders of magnitude while preserving the realism of wave‑based sound propagation—making high‑fidelity audio feasible for real‑time games, VR, and AR.

Key Contributions

  • Reciprocal latent representation: A volumetric grid of trainable embeddings that guarantees source‑receiver reciprocity (the sound heard from A to B equals that from B to A).
  • Symmetric decoder architecture: A family of decoder functions that read two latent vectors (source & listener) and output the full set of acoustic parameters needed for rendering.
  • Riemannian metric learning: Introduces a geometry‑aware loss that better respects the physical relationships between acoustic parameters, improving fidelity in complex scenes.
  • Massive memory reduction: Demonstrates 2–4 orders of magnitude compression compared with naïve storage of per‑pair impulse responses.
  • Perceptual validation: A MUSHRA‑style listening test shows that listeners cannot reliably distinguish RLF‑generated audio from ground‑truth wave simulations.
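The reciprocity guarantee can be illustrated with a toy symmetric decoder. This is a minimal sketch, not the paper's architecture: the latent dimension, layer sizes, and the sum/product feature combination are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent dimension and number of output acoustic parameters.
LATENT_DIM, N_PARAMS = 8, 3

# Tiny two-layer MLP applied to a symmetric combination of the two latents.
W1 = rng.standard_normal((2 * LATENT_DIM, 16))
W2 = rng.standard_normal((16, N_PARAMS))

def symmetric_decode(z_src, z_rcv):
    """Decode acoustic parameters from source and receiver latents.

    Feeding the element-wise sum and product makes the input (and hence
    the output) invariant to swapping source and receiver, so the decoded
    parameters from A to B equal those from B to A by construction.
    """
    x = np.concatenate([z_src + z_rcv, z_src * z_rcv])
    return np.tanh(x @ W1) @ W2

z_a = rng.standard_normal(LATENT_DIM)
z_b = rng.standard_normal(LATENT_DIM)
assert np.allclose(symmetric_decode(z_a, z_b), symmetric_decode(z_b, z_a))
```

Any decoder built on swap-invariant features inherits reciprocity for free, regardless of how the MLP itself is parameterized.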

Methodology

  1. Pre‑computation: For a given scene, the authors run a high‑quality wave‑based simulator to generate impulse responses (IRs) for a dense set of source‑receiver positions.
  2. Latent field construction: Instead of storing each IR directly, they embed the acoustic information into a 3‑D grid (the latent field). Each grid cell holds a low‑dimensional vector that is learned during training.
  3. Symmetric decoding: When rendering sound for a particular source‑listener pair, the system samples the latent vectors at the two positions, feeds them into a symmetric decoder (e.g., a bilinear or attention‑based network) that outputs the scalar acoustic parameters (early reflections, reverberation decay, frequency‑dependent attenuation). The symmetry ensures reciprocity.
  4. Loss functions:
    • Reconstruction loss on the predicted acoustic parameters vs. the ground‑truth IRs.
    • Riemannian metric loss that penalizes distortions in the acoustic space, encouraging the latent embeddings to respect the underlying physics.
  5. Training & inference: The latent field and decoder are jointly optimized using stochastic gradient descent. At runtime, inference reduces to two trilinear look‑ups and a forward pass through a tiny neural network—fast enough for real‑time audio pipelines.
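The two trilinear look-ups in step 5 are plain interpolation over the latent grid. A minimal sketch follows; the grid resolution and latent dimension are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent field: an N x N x N grid of D-dimensional embeddings
# spanning the unit cube.
N, D = 4, 8
field = rng.standard_normal((N, N, N, D))

def sample_latent(field, pos):
    """Trilinearly interpolate the latent field at a position in [0, 1]^3."""
    n = field.shape[0]
    p = np.clip(np.asarray(pos, dtype=float), 0.0, 1.0) * (n - 1)
    i0 = np.floor(p).astype(int)          # lower corner of the enclosing cell
    i1 = np.minimum(i0 + 1, n - 1)        # upper corner, clamped at the border
    f = p - i0                            # fractional offsets inside the cell
    z = np.zeros(field.shape[-1])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0])
                     * (f[1] if dy else 1 - f[1])
                     * (f[2] if dz else 1 - f[2]))
                idx = (i1[0] if dx else i0[0],
                       i1[1] if dy else i0[1],
                       i1[2] if dz else i0[2])
                z += w * field[idx]
    return z

# At a grid corner the interpolation returns the stored embedding exactly.
assert np.allclose(sample_latent(field, (0, 0, 0)), field[0, 0, 0])
```

At runtime this lookup runs once for the source and once for the listener, and the two resulting vectors feed the symmetric decoder.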

Results & Findings

| Metric | Ground truth (raw IRs) | RLF (compressed) |
| --- | --- | --- |
| Memory per scene | ~10 GB (full pairwise IRs) | ~10–100 MB |
| Parameter RMSE | reference | 0.03 dB (early reflections), 0.07 s (RT60) |
| Subjective MUSHRA score | 92 % | 90 % (statistically indistinguishable) |
| Inference latency (CPU) | N/A (offline) | < 0.5 ms per query |

  • Quality: Across a variety of indoor and outdoor environments, RLF reproduces key acoustic cues (directional early reflections, reverberation tail, frequency filtering) with negligible audible artifacts.
  • Scalability: Memory savings grow dramatically as the number of sources and listeners increases, making large‑scale virtual cities tractable.
  • Robustness: The Riemannian loss consistently outperformed plain L2 loss, especially in highly reverberant or geometrically complex rooms.
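The source of the compression is easy to see with back-of-the-envelope storage arithmetic: pairwise IR storage grows quadratically with probe positions, while the latent grid is shared by all pairs. All numbers below are illustrative assumptions, not figures from the paper.

```python
# Naive pairwise storage: one impulse response per source-receiver pair.
n_positions = 100              # probe positions in the scene (illustrative)
ir_samples = 48_000            # a 1 s impulse response at 48 kHz
bytes_per_sample = 4           # float32
pairwise_bytes = n_positions ** 2 * ir_samples * bytes_per_sample

# Latent-field storage: one small embedding per grid cell, shared by all pairs.
grid = 64                      # a 64^3 latent grid (illustrative)
latent_dim = 16
field_bytes = grid ** 3 * latent_dim * bytes_per_sample

print(f"pairwise: {pairwise_bytes / 1e9:.1f} GB")    # pairwise: 1.9 GB
print(f"latent field: {field_bytes / 1e6:.0f} MB")   # latent field: 17 MB
print(f"ratio: ~{pairwise_bytes / field_bytes:.0f}x")
```

Denser probe sampling widens the gap further, which is where the paper's 2–4 orders of magnitude come from: the pairwise term scales with `n_positions ** 2`, the field term does not.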

Practical Implications

  • Game engines & VR platforms: Developers can now embed physically accurate sound propagation without bloating asset bundles, enabling richer immersion on consoles, mobile, and cloud‑streamed titles.
  • Audio middleware: Integration points (e.g., Unity’s AudioSource, FMOD, Wwise) can expose an “RLF‑mode” where the engine queries the latent field instead of loading massive IR tables.
  • Dynamic scenes: Because the latent field is scene‑specific but source‑agnostic, adding or moving sound sources at runtime incurs only a cheap lookup—ideal for interactive simulations and procedural content.
  • Edge & AR devices: The tiny memory footprint and low compute cost make high‑fidelity spatial audio feasible on head‑mounted displays and smartphones, where bandwidth and power are limited.
  • Research & tooling: The reciprocal latent representation can be repurposed for other reciprocal physical phenomena (e.g., RF propagation, light transport), opening avenues for cross‑domain acceleration.

Limitations & Future Work

  • Static geometry assumption: RLF assumes a fixed environment; dynamic geometry (e.g., moving walls) would require re‑training or an adaptive latent field.
  • Training cost: Generating the ground‑truth IR dataset and training the latent field can be expensive (hours on a GPU cluster), though it is a one‑time offline cost per scene.
  • Resolution trade‑off: Very fine‑grained acoustic detail (e.g., diffraction around tiny objects) may still be lost unless the latent grid is sufficiently dense, which modestly increases memory.
  • Future directions: Extending RLF to handle time‑varying scenes via incremental updates, exploring hierarchical latent fields for multi‑scale detail, and applying the framework to outdoor weather‑dependent acoustics.

Authors

  • Hugo Seuté
  • Pranai Vasudev
  • Etienne Richan
  • Louis‑Xavier Buffoni

Paper Information

  • arXiv ID: 2602.06937v1
  • Categories: cs.SD, cs.LG, eess.AS
  • Published: February 6, 2026