[Paper] Unsupervised Learning of Density Estimates with Topological Optimization

Published: December 9, 2025 at 01:35 PM EST
4 min read

Source: arXiv - 2512.08895v1

Overview

This paper tackles a surprisingly sticky problem in unsupervised density estimation: picking the right kernel bandwidth. The authors show how to let topological data analysis (TDA) guide that choice automatically, removing the need for costly hand‑tuning or cross‑validation loops. By framing bandwidth selection as a topology‑aware optimization, they achieve more faithful density estimates—especially in higher‑dimensional settings where visual inspection is impossible.

Key Contributions

  • Topology‑driven loss function: Introduces a novel loss that penalizes mismatches between the persistent homology of the estimated density and that of the observed data.
  • Unsupervised bandwidth optimization: Provides an end‑to‑end algorithm that selects the kernel bandwidth without any labeled data or external validation set.
  • Comprehensive benchmarking: Evaluates the method against classic bandwidth selectors (Silverman’s rule, cross‑validation, plug‑in) on synthetic and real‑world datasets ranging from 2‑D to >10‑D.
  • Scalable implementation: Demonstrates that the approach can be integrated with existing KDE libraries and runs efficiently on CPU/GPU thanks to differentiable TDA primitives.
  • Open‑source release: Supplies code and reproducible notebooks, encouraging adoption by the ML community.

Methodology

  1. Kernel Density Estimation (KDE) – The standard KDE formula is used, but the bandwidth \(h\) is treated as a learnable parameter.
  2. Persistent Homology – For a given KDE, the authors compute its sublevel set filtration and extract persistence diagrams that capture connected components, loops, and higher‑dimensional voids.
  3. Topology‑Based Loss – They define a loss
    \[ \mathcal{L}(h) = \sum_{k} w_k \, d_{\text{Bottleneck}}\!\left(D_k^{\text{data}},\, D_k^{\text{KDE}(h)}\right) \]
    where \(D_k\) denotes the persistence diagram in homological dimension \(k\) and \(d_{\text{Bottleneck}}\) is the bottleneck distance between diagrams. The weights \(w_k\) let users prioritize certain topological features.
  4. Gradient‑Based Optimization – Using differentiable approximations of the bottleneck distance (e.g., smooth Wasserstein‑type surrogates), the loss is back‑propagated to update \(h\) (a simplified code sketch follows this list).
  5. Stopping Criteria – Optimization halts when the loss plateaus or when a predefined number of iterations is reached, yielding the “topologically optimal” bandwidth.
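
To make the pipeline concrete, here is a minimal, self-contained sketch (not the authors' code) of steps 1 through 4 in PyTorch, under simplifying assumptions: a 1‑D Gaussian KDE evaluated on a grid, 0‑dimensional superlevel-set persistence computed with an elder-rule union-find sweep, and a simple "total persistence of extra modes" surrogate in place of the bottleneck-distance loss above (the target mode count `n_modes` stands in for the data's reference diagram). As in most differentiable-TDA setups, gradients flow through the birth/death density values while the pairing itself is recomputed each step; all function names below are illustrative.

```python
import math
import torch


def kde_on_grid(x, grid, log_h):
    """Gaussian KDE evaluated on `grid`; the bandwidth h = exp(log_h) stays positive
    and is treated as a learnable parameter (step 1 of the methodology)."""
    h = torch.exp(log_h)
    z = (grid[:, None] - x[None, :]) / h                      # (grid_size, n_samples)
    dens = torch.exp(-0.5 * z ** 2).sum(dim=1)
    return dens / (x.numel() * h * math.sqrt(2 * math.pi))


def superlevel_persistence_0d(f):
    """Index pairs (birth_idx, death_idx) of 0-dimensional superlevel-set persistence
    for a 1-D function sampled on a grid, via an elder-rule union-find sweep."""
    vals = f.detach()
    order = torch.argsort(vals, descending=True).tolist()
    parent, birth, pairs = {}, {}, []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for idx in order:
        parent[idx], birth[idx] = idx, idx
        for nb in (idx - 1, idx + 1):
            if nb not in parent:
                continue
            a, b = find(idx), find(nb)
            if a == b:
                continue
            # Elder rule: the component with the lower peak dies at this level.
            young, old = (a, b) if vals[birth[a]] < vals[birth[b]] else (b, a)
            if birth[young] != idx:                           # skip zero-persistence pairs
                pairs.append((birth[young], idx))
            parent[young] = old
    return pairs                                              # the global max never dies


def topo_loss(density, pairs, n_modes):
    """Surrogate loss: total persistence of every feature beyond the n_modes most
    persistent ones, so spurious wiggles of an under-smoothed KDE are penalized."""
    if not pairs:
        return density.sum() * 0.0
    pers = torch.stack([density[b] - density[d] for b, d in pairs])
    extra = torch.sort(pers, descending=True).values[max(n_modes - 1, 0):]
    return extra.sum()


# Toy bimodal sample; in practice the target mode count would come from the
# persistence diagram of the data itself (the paper's reference diagram).
torch.manual_seed(0)
x = torch.cat([torch.randn(200) - 2.0, torch.randn(200) + 2.0])
grid = torch.linspace(-6.0, 6.0, 512)
log_h = torch.tensor(-2.0, requires_grad=True)                # start under-smoothed
opt = torch.optim.Adam([log_h], lr=0.05)

for step in range(200):
    opt.zero_grad()
    dens = kde_on_grid(x, grid, log_h)
    pairs = superlevel_persistence_0d(dens)
    loss = topo_loss(dens, pairs, n_modes=2)
    loss.backward()
    opt.step()

print(f"topology-selected bandwidth h = {log_h.exp().item():.3f}")
```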

Results & Findings

| Dataset | Dim. | Baseline (Silverman) | CV‑KDE | Topology‑Optimized | Relative Improvement (KL) |
|---|---|---|---|---|---|
| 2‑D Gaussian mixture | 2 | 0.112 | 0.098 | 0.067 | 39% |
| Swiss‑roll (noisy) | 3 | 0.215 | 0.189 | 0.143 | 33% |
| High‑dim. gene expression | 12 | 0.374 | 0.361 | 0.298 | 21% |
| Real‑world sensor network | 8 | 0.241 | 0.229 | 0.182 | 24% |

  • Topological fidelity: Persistence diagrams of the optimized KDE match the ground‑truth diagrams far more closely than baselines, preserving the number of modes and loops.
  • Robustness to dimensionality: The advantage grows with dimension, where traditional bandwidth rules tend to over‑smooth.
  • Computation: On a standard laptop, the full optimization (including TDA) finishes within 30 seconds for ≤10 K samples, comparable to a single cross‑validation run.

Practical Implications

  • Plug‑and‑play KDE: Developers can replace manual bandwidth selection with a single function call that internally runs the topology‑aware optimizer, saving engineering time.
  • Better Bayesian priors: In probabilistic models that rely on KDE for prior or likelihood approximations (e.g., Approximate Bayesian Computation), more accurate densities lead to tighter posterior estimates.
  • Anomaly detection: Preserving topological features means rare but structurally important modes are not washed out, improving detection of outliers in high‑dimensional telemetry or cybersecurity data (a small scoring sketch follows this list).
  • Data‑driven simulation: Stochastic simulators that need smooth yet faithful probability fields (e.g., fluid dynamics, material science) can benefit from automatically tuned KDEs without bespoke tuning per dataset.
  • Integration with ML pipelines: The method works with PyTorch/TensorFlow via autograd‑compatible TDA libraries, enabling end‑to‑end training where density estimation is a differentiable layer (e.g., normalizing flows).
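
As an illustration of the anomaly-detection point above, the sketch below scores new points by their density under a KDE whose bandwidth is assumed to have come from a topology-aware optimizer like the one sketched earlier; here `h_opt` is simply hard-coded, and the names and threshold are illustrative rather than from the paper. Note that SciPy's `gaussian_kde` interprets a scalar `bw_method` as a factor on the sample standard deviation, hence the rescaling.

```python
# Minimal sketch (not from the paper): anomaly scoring with a topology-tuned KDE.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x_train = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

h_opt = 0.35                                    # assumed output of the bandwidth optimizer
kde = gaussian_kde(x_train, bw_method=h_opt / x_train.std(ddof=1))

x_new = np.array([-2.1, 0.0, 7.5])              # candidate points to score
scores = kde(x_new)                             # density under the tuned KDE
threshold = np.quantile(kde(x_train), 0.01)     # flag the lowest-density 1%
flags = scores < threshold
print(list(zip(x_new.tolist(), flags.tolist())))
```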

Limitations & Future Work

  • Scalability to massive data: Persistent homology computation still scales roughly quadratically with sample size; the authors suggest using subsampling or streaming TDA for larger corpora.
  • Choice of topological weights: Selecting \(w_k\) requires domain knowledge; an adaptive scheme could automate this.
  • Extension beyond KDE: The current framework is tied to kernel density estimators; future research could apply topology‑based loss to other density models like Gaussian mixtures or normalizing flows.
  • Theoretical guarantees: While empirical results are strong, formal convergence proofs for the topology‑driven bandwidth estimator remain an open question.

Bottom line: By marrying kernel density estimation with topological data analysis, this work offers a practical, unsupervised route to smarter bandwidth selection—opening the door for more reliable density‑driven components across the ML stack.

Authors

  • Sunia Tanweer
  • Firas A. Khasawneh

Paper Information

  • arXiv ID: 2512.08895v1
  • Categories: cs.LG, stat.ML
  • Published: December 9, 2025