[Paper] Exploring Definitions of Quality and Diversity in Sonic Measurement Spaces
Source: arXiv - 2512.02783v1
Overview
The paper investigates how to let evolutionary algorithms automatically discover a wide variety of high‑quality sounds without relying on hand‑crafted audio descriptors or supervised classifiers. By using unsupervised dimensionality‑reduction (PCA and autoencoders) to build and continuously reshape the “behaviour space” that guides Quality‑Diversity (QD) search, the authors show that a system can explore far richer sonic territories while staying unbiased toward any pre‑selected sound families.
Key Contributions
- Unsupervised behaviour‑space construction: Demonstrates that PCA and deep autoencoders can turn raw audio feature vectors into compact, structured maps suitable for MAP‑Elites without any human‑defined descriptors.
- Dynamic reconfiguration: Introduces a simple schedule that periodically retrains the dimensionality‑reduction model, keeping the behaviour space aligned with the evolving population and preventing premature convergence.
- Empirical comparison: Benchmarks handcrafted, static behaviour spaces against the proposed automatic approaches across two distinct synthesis scenarios, showing a statistically significant boost in diversity.
- Practical recommendation: Finds that linear PCA, despite its simplicity, outperforms the tested autoencoders in this context, offering a low‑cost, high‑impact tool for sound‑design pipelines.
Methodology
- Synthesis environment: A digital sound synthesizer with millions of parameter combinations serves as the search domain.
- Feature extraction: For each generated sound, a high‑dimensional vector of standard audio descriptors (spectral, temporal, etc.) is computed.
- Dimensionality reduction:
  - PCA – computes the top‑k orthogonal axes that capture the most variance.
  - Autoencoder – a shallow neural network learns a non‑linear bottleneck representation.
- Behaviour space creation: The reduced vectors are discretised into a fixed‑size grid (the MAP‑Elites archive). Each cell stores the highest‑quality sound that falls into that region.
- Dynamic update: Every N generations the reduction model is retrained on the current elite set, redefining the grid boundaries and thus “re‑shaping” the exploration landscape.
- Evaluation: Two experimental setups (different synth architectures) are run with three behaviour‑space strategies: handcrafted descriptors, static PCA, and dynamic PCA/autoencoder. Diversity (coverage of the grid) and quality (objective fitness) are logged.
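The pipeline above (descriptor vectors → PCA projection → discretised MAP‑Elites archive) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the descriptor values, fitness scores, and grid size are random stand‑ins.

```python
# Sketch of the behaviour-space pipeline: descriptor vectors ->
# PCA projection -> discretised MAP-Elites archive. Feature values,
# fitness, and grid size are illustrative stand-ins, not the paper's
# actual settings.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for per-sound audio descriptors (e.g. spectral centroid,
# flatness, zero-crossing rate, ...): 500 sounds x 40 features.
features = rng.normal(size=(500, 40))
fitness = rng.random(500)          # stand-in objective quality scores

# Reduce to a 2-D behaviour space spanned by the top two principal axes.
pca = PCA(n_components=2).fit(features)
behaviour = pca.transform(features)

# Discretise each axis into a fixed number of bins (the archive grid).
BINS = 10
lo, hi = behaviour.min(axis=0), behaviour.max(axis=0)
cells = np.clip(((behaviour - lo) / (hi - lo) * BINS).astype(int), 0, BINS - 1)

# MAP-Elites archive: each cell keeps only its highest-fitness sound.
archive = {}
for idx, (cell, fit) in enumerate(zip(map(tuple, cells), fitness)):
    if cell not in archive or fit > archive[cell][1]:
        archive[cell] = (idx, fit)

coverage = len(archive) / BINS**2
print(f"cells filled: {len(archive)}/{BINS**2} (coverage {coverage:.0%})")
```

The "grid coverage" numbers reported in the results table correspond to the final `coverage` ratio: the fraction of cells that ever receive an elite.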
Results & Findings
| Strategy | Grid Coverage (Diversity) | Avg. Quality | Notes |
|---|---|---|---|
| Handcrafted descriptors | ~45 % | High | Limited to designer‑chosen dimensions; many cells never visited. |
| Static PCA (k=10) | ~68 % | Comparable | Linear reduction captures most variance, enabling broader exploration. |
| Dynamic PCA (re‑train every 200 gen) | ~78 % | Slightly higher | Continual reshaping sustains evolutionary pressure, avoids stagnation. |
| Static Autoencoder | ~62 % | Slightly lower | Non‑linear mapping adds complexity but does not beat PCA here. |
| Dynamic Autoencoder | ~70 % | Similar to static PCA | Over‑fitting risk; benefits offset by extra training cost. |
Takeaway: Automatic, unsupervised behaviour spaces dramatically increase the number of distinct sonic niches discovered, and a simple periodic retraining (dynamic PCA) yields the best trade‑off between diversity, quality, and computational overhead.
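The dynamic‑reconfiguration schedule behind that takeaway can be sketched as a loop that refits the PCA model every N generations on the current population. This is a schematic sketch under stated assumptions: the variation step is faked with random perturbations, and `RETRAIN_EVERY`, the population size, and the grid size are illustrative, not the paper's values.

```python
# Minimal sketch of the dynamic-reconfiguration loop: every
# RETRAIN_EVERY generations the PCA model is refit on the current
# elites, reshaping the behaviour space around where the population
# now lies. Population dynamics are faked with random mutations;
# only the reshaping logic is the point.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
BINS, RETRAIN_EVERY = 8, 50

def to_cells(points, bins):
    """Map 2-D behaviour points onto integer grid coordinates."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    scaled = (points - lo) / np.where(hi > lo, hi - lo, 1.0)
    return np.clip((scaled * bins).astype(int), 0, bins - 1)

elites = rng.normal(size=(100, 40))   # stand-in descriptor vectors
pca = PCA(n_components=2).fit(elites)

for gen in range(1, 201):
    # Fake variation step: perturb a random elite's descriptors.
    child = elites[rng.integers(len(elites))] + rng.normal(scale=0.1, size=40)
    elites = np.vstack([elites, child])

    if gen % RETRAIN_EVERY == 0:
        # Periodic retraining redefines the grid axes and boundaries.
        pca = PCA(n_components=2).fit(elites)

cells = to_cells(pca.transform(elites), BINS)
print("distinct cells after reshaping:", len({tuple(c) for c in cells}))
```

In a full system, each retraining would be followed by re‑binning the archive in the new coordinates, so elites compete under the updated cell boundaries.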
Practical Implications
- Plug‑and‑play sound‑design tools: Developers can embed a PCA‑based MAP‑Elites module into DAWs, game audio engines, or procedural music generators without needing domain experts to define feature sets.
- Scalable exploration: Because PCA is computationally cheap, the approach scales to millions of synth configurations, making it viable for cloud‑based sound‑banks or on‑device synthesis on modern GPUs/NPUs.
- Reduced designer bias: Removing handcrafted descriptors lessens the hidden aesthetic biases they encode, allowing AI‑driven composers to surface novel timbres that human designers might otherwise overlook.
- Rapid prototyping: The dynamic reconfiguration loop can be exposed as a UI knob (“exploration refresh”) for artists, giving them control over how aggressively the system seeks new sonic territories.
Limitations & Future Work
- Feature dependence: The method still relies on an initial set of low‑level audio descriptors; if these miss perceptually relevant cues, the reduced space may be sub‑optimal.
- Retraining schedule: The paper uses a fixed interval for model updates; adaptive schedules (e.g., triggered by stagnation metrics) could improve efficiency.
- Autoencoder depth: Only shallow autoencoders were tested; deeper or variational models might capture richer non‑linear relationships but require careful regularisation.
- Real‑time constraints: While PCA is fast, autoencoder retraining can be costly for on‑the‑fly applications; future work could explore incremental learning or lightweight neural architectures.
By automating the definition and evolution of sonic behaviour spaces, this research opens the door to more autonomous, diverse, and unbiased sound‑generation systems—an exciting prospect for developers building the next generation of interactive audio experiences.
Authors
- Björn Þór Jónsson
- Çağrı Erdem
- Stefano Fasciani
- Kyrre Glette
Paper Information
- arXiv ID: 2512.02783v1
- Categories: cs.SD, cs.NE
- Published: December 2, 2025