[Paper] Neighborhood Stability as a Measure of Nearest Neighbor Searchability
Source: arXiv - 2602.16673v1
Overview
The paper introduces Neighborhood Stability as a lightweight, data‑only way to predict how well a clustering‑based Approximate Nearest Neighbor Search (ANNS) will work on a given high‑dimensional dataset. By measuring how “stable” the nearest‑neighbor relationships are inside and across clusters, the authors provide two practical metrics that let engineers decide early whether a clustering‑driven ANNS pipeline is likely to be accurate—without having to run expensive search experiments first.
Key Contributions
- Clustering‑Neighborhood Stability Measure (clustering‑NSM) – an internal quality metric that evaluates a given clustering by looking at how often true nearest‑neighbors stay within the same cluster.
- Point‑Neighborhood Stability Measure (point‑NSM) – a dataset‑level “clusterability” score that predicts the clustering‑NSM value before any clustering is performed.
- Theoretical link between point‑NSM and clustering‑NSM, showing that high point‑NSM implies high clustering‑NSM, and thus higher ANNS accuracy.
- Empirical validation on several real‑world high‑dimensional benchmarks (image embeddings, text vectors, recommendation data) confirming that both measures correlate strongly with recall@k of popular clustering‑based ANNS methods (e.g., IVF, ScaNN).
- Distance‑agnostic design – the metrics rely only on nearest‑neighbor relationships (rankings), not on raw Euclidean distances, making them usable with inner‑product, cosine similarity, or custom similarity functions.
Methodology
- Define neighbor sets – For each point x_i in the dataset, compute its top‑k true nearest neighbors under the exact metric.
- Clustering‑NSM
- Given a flat clustering (e.g., k‑means, hierarchical k‑means), count how many of the k true neighbors of each point fall into the same cluster as the point itself.
- Normalize this count across all points to obtain a stability score in [0, 1]; higher values mean the clustering respects true neighbor relationships.
- Point‑NSM
- Without any clustering, examine the mutual nearest‑neighbor graph: two points are linked if each appears in the other’s top‑k list.
- Compute the proportion of points whose mutual‑NN graph forms tightly connected, well‑separated components (i.e., low conductance: many internal links, few links crossing component boundaries). This reflects how naturally the data can be partitioned into stable neighborhoods.
- Predictive pipeline – Use point‑NSM to estimate the expected clustering‑NSM for a chosen clustering algorithm and hyper‑parameters (e.g., number of centroids).
- Evaluation – Run standard clustering‑based ANNS pipelines (IVF‑Flat, IVF‑PQ, ScaNN) on the same datasets, measure recall@k, and correlate it with the two stability scores.
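The pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the brute‑force neighbor search, the function names, and the mutual‑link fraction used as a stand‑in for the component‑based point‑NSM are all assumptions made here for clarity.

```python
import numpy as np

def top_k_neighbors(X, k, metric="euclidean"):
    """Exact top-k neighbor indices for each row of X (self excluded), brute force.

    Only the resulting *rankings* are used downstream, so any similarity
    that induces a ranking (cosine, inner product) can be swapped in.
    """
    if metric == "euclidean":
        scores = -np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    else:  # "inner_product"
        scores = X @ X.T
    np.fill_diagonal(scores, -np.inf)          # never pick yourself
    return np.argsort(-scores, axis=1)[:, :k]  # best neighbor first

def clustering_nsm(neighbors, labels):
    """Mean fraction of each point's true top-k neighbors that share its cluster."""
    return (labels[neighbors] == labels[:, None]).mean()

def point_nsm(neighbors):
    """Fraction of top-k neighbor links that are mutual -- a simplified,
    clustering-free stability proxy in the spirit of point-NSM."""
    n, k = neighbors.shape
    nbr_sets = [set(row.tolist()) for row in neighbors]
    mutual = sum(int(i in nbr_sets[j]) for i in range(n) for j in neighbors[i])
    return mutual / (n * k)
```

On two well‑separated groups of points, both scores approach 1.0; shuffling the cluster labels drives clustering‑NSM down while point‑NSM, which never sees labels, is unchanged.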
Results & Findings
| Dataset (embedding dim) | Point‑NSM | Clustering‑NSM (IVF‑Flat) | Recall@10 (IVF‑Flat) |
|---|---|---|---|
| ImageNet‑ResNet (128) | 0.78 | 0.71 | 0.92 |
| SQuAD‑BERT (768) | 0.62 | 0.55 | 0.81 |
| MovieLens‑ALS (64) | 0.84 | 0.80 | 0.95 |
- Strong linear correlation (Pearson ≈ 0.88) between clustering‑NSM and recall across all tested methods.
- Point‑NSM predicts clustering‑NSM with an R² of 0.81, meaning you can estimate ANNS performance before clustering.
- The metrics remain stable when swapping Euclidean distance for cosine similarity or inner‑product, confirming distance‑agnostic behavior.
- For datasets with low point‑NSM (e.g., highly noisy or uniformly distributed points), clustering‑based ANNS consistently underperforms, suggesting alternative indexing (graph‑based, product quantization) may be preferable.
Practical Implications
- Pre‑deployment sanity check – Before investing compute in building an IVF or ScaNN index, run a cheap nearest‑neighbor graph on a sample and compute point‑NSM. A low score warns you that clustering‑based ANNS will likely yield poor recall.
- Hyper‑parameter tuning shortcut – Point‑NSM can guide the choice of the number of centroids: higher point‑NSM allows fewer centroids while preserving accuracy, saving memory and indexing time.
- Cross‑metric portability – Because the measures only need neighbor rankings, they can be applied to embeddings from any model (vision, language, recommendation) without re‑deriving distance‑specific formulas.
- Hybrid pipelines – For mixed datasets (some sub‑spaces high‑stability, others not), you can partition data based on point‑NSM and apply clustering‑based ANNS only where it’s justified, falling back to graph‑based search elsewhere.
- Monitoring drift – In production, recompute point‑NSM periodically on newly ingested data to detect when the data distribution has shifted enough to degrade the existing ANNS index.
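A pre‑deployment or drift check along these lines could look as follows. This is a sketch, not code from the paper: the sample size, the 0.6 decision threshold, and the mutual‑link fraction used as a point‑NSM proxy are illustrative assumptions.

```python
import numpy as np

def point_nsm_sample(X, k=10, sample=2000, seed=0):
    """Point-NSM-style score on a random sample: the fraction of mutual
    top-k neighbor links (brute-force, simplified proxy)."""
    rng = np.random.default_rng(seed)
    if len(X) > sample:
        X = X[rng.choice(len(X), size=sample, replace=False)]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude self-matches
    nbrs = np.argsort(d, axis=1)[:, :k]
    nbr_sets = [set(row.tolist()) for row in nbrs]
    n = len(X)
    mutual = sum(int(i in nbr_sets[j]) for i in range(n) for j in nbrs[i])
    return mutual / (n * k)

def clustering_ann_advisable(X, threshold=0.6, **kw):
    """Gate an IVF/ScaNN index build on the sampled stability score.
    The default threshold is illustrative, not taken from the paper."""
    return point_nsm_sample(X, **kw) >= threshold
```

The same function, rerun periodically on newly ingested vectors, doubles as the drift monitor: a falling score signals that the existing clustering‑based index may no longer match the data distribution.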
Limitations & Future Work
- Exact neighbor computation is required to obtain the ground‑truth neighbor sets for the metrics, which can be costly on massive datasets. The authors suggest using approximate neighbor graphs as a proxy, though this introduces noise.
- The study focuses on flat (single‑level) clusterings; extending the stability concepts to hierarchical or multi‑level indexes (e.g., HNSW‑IVF hybrids) remains open.
- Point‑NSM assumes a fixed k (typically 10–20); sensitivity to this hyper‑parameter is not fully explored.
- Real‑world systems often combine multiple similarity functions (e.g., hybrid of inner‑product and Euclidean); evaluating stability under mixed metrics is future work.
Authors
- Thomas Vecchiato
- Sebastian Bruch
Paper Information
- arXiv ID: 2602.16673v1
- Categories: cs.LG, cs.IR
- Published: February 18, 2026