[Paper] Scalable Distributed Vector Search via Accuracy Preserving Index Construction
Source: arXiv - 2512.17264v1
Overview
The paper introduces SPIRE, a new distributed index for Approximate Nearest Neighbor Search (ANNS) that handles billions of high‑dimensional vectors while keeping latency low and accuracy high. By rethinking how data is partitioned and how the index is built recursively, the authors report up to 9.64× higher throughput than existing large‑scale vector search systems.
Key Contributions
- Balanced Partition Granularity – a systematic way to choose the number of shards per node that prevents the “read‑cost explosion” that plagues many distributed ANNS solutions.
- Accuracy‑Preserving Recursive Index Construction – a multi‑level index building algorithm that guarantees predictable search cost and stable recall across all levels.
- Scalable Architecture – demonstrated on a 46‑node cluster with up to 8 billion vectors, showing linear scalability in both data size and node count.
- Performance Gains – SPIRE delivers up to 9.64× higher throughput than the strongest open‑source baselines while maintaining comparable or better recall.
- Open‑source Reference Implementation – the authors release code and scripts, making it easy for practitioners to reproduce results and integrate SPIRE into existing pipelines.
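A toy cost model can illustrate the read‑cost balance behind the first contribution. All numbers and the cost formula below are illustrative assumptions, not the paper's analysis: each probed shard pays a fixed overhead, while coarser partitioning forces each probe to scan more vectors, so per‑query read cost is minimized at an intermediate shard count.

```python
N = 100_000_000            # vectors held by one node (assumed for illustration)
probe_frac = 0.05          # fraction of shards a query probes (assumed)
per_shard_overhead = 2_000.0   # fixed cost per probed shard (seek, metadata)
per_vector_cost = 1.0          # cost of scanning one vector

def query_cost(num_shards: int) -> float:
    """Per-query read cost under the toy model above."""
    shards_probed = max(1.0, num_shards * probe_frac)
    vectors_per_shard = N / num_shards
    return shards_probed * (per_shard_overhead + vectors_per_shard * per_vector_cost)

# Sweep shard counts: too few shards -> each probe scans huge shards;
# too many shards -> per-shard overhead dominates.
candidates = [2 ** k for k in range(4, 15)]
best = min(candidates, key=query_cost)
```

Under these made‑up constants the minimum falls strictly inside the sweep, which is the qualitative "read‑cost explosion" avoidance the paper formalizes.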
Methodology
- Problem Framing – The authors start by formalizing the trade‑off triangle of accuracy, latency, and throughput in distributed ANNS.
- Partition Granularity Analysis – They study, both empirically and analytically, how the number of partitions per node affects how much data each query must read. The sweet spot is where shards are small enough that probing one stays cheap, yet few enough that per‑shard overhead and cross‑node communication do not dominate.
- Recursive Index Construction
- Level‑0: Each node builds a local high‑accuracy index (e.g., IVF‑PQ or HNSW) on its shard.
- Higher Levels: A lightweight “summary” index is built on top of the centroids of lower‑level partitions. This summary guides the query to the most promising shards, reducing the number of full‑shard scans.
- The recursion continues until the top‑level index fits comfortably in memory on a coordinator node.
- Search Algorithm – A query first traverses the top‑level summary to select a subset of shards, then performs the usual ANNS routine inside those shards. Because the summary is constructed to preserve the original distance ordering, recall stays stable.
- Implementation Details – SPIRE leverages gRPC for inter‑node communication, RDMA‑aware data transfer for low latency, and a dynamic load‑balancer that re‑partitions hot shards on the fly.
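The recursive construction and summary‑guided search above can be sketched in a minimal NumPy example. This is a sketch under simplifying assumptions: plain arrays with brute‑force scans stand in for the paper's IVF‑PQ/HNSW leaf indexes, and a single summary level stands in for the full recursion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Level-0: the data set is split into shards. In SPIRE each shard carries a
# full local ANN index (e.g., IVF-PQ or HNSW); here raw arrays play that role.
num_shards, shard_size, dim = 32, 500, 16
shards = [rng.standard_normal((shard_size, dim)).astype(np.float32)
          for _ in range(num_shards)]

# Higher level: a lightweight summary over the shard centroids. The paper
# recurses until the top level fits in coordinator memory; one level is
# enough to show the routing idea.
centroids = np.stack([shard.mean(axis=0) for shard in shards])

def search(query, nprobe=4, k=10):
    """Route the query via the summary, then scan only the selected shards."""
    # Step 1: pick the nprobe shards whose centroids are closest to the query.
    d2 = ((centroids - query) ** 2).sum(axis=1)
    probed = np.argsort(d2)[:nprobe]
    # Step 2: run the ordinary (here: exact) search inside those shards only.
    ids, dists = [], []
    for s in probed:
        dist = ((shards[s] - query) ** 2).sum(axis=1)
        ids.extend((int(s), i) for i in range(shard_size))
        dists.extend(dist)
    top = np.argsort(dists)[:k]
    return [ids[t] for t in top]   # (shard_id, local_id) pairs, nearest first

hits = search(rng.standard_normal(dim).astype(np.float32))
```

Because routing only prunes shards and the in‑shard search is unchanged, recall is governed by how faithfully the summary preserves distance ordering, which is exactly what the paper's accuracy‑preserving construction targets.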
Results & Findings
| Dataset | #Vectors | Nodes | Throughput (queries/s) | Recall@10 | Speed‑up vs. Faiss‑IVF | Speed‑up vs. Milvus |
|---|---|---|---|---|---|---|
| SIFT‑1B | 1 B | 12 | 12,800 | 0.96 | 4.3× | 5.1× |
| Deep1B | 1 B | 24 | 9,400 | 0.94 | 6.2× | 7.8× |
| Random‑8B | 8 B | 46 | 2,100 | 0.93 | 9.6× | 9.2× |
- Scalability – Throughput grows almost linearly as nodes are added; 99th‑percentile latency stays under 15 ms even at 8 B vectors.
- Accuracy – Recall stays within 1 % of the best single‑node baseline, confirming that the recursive summary does not sacrifice quality.
- Resource Utilization – CPU usage per node stays below 70 % and memory overhead for the summary index is < 5 % of total RAM, leaving room for other services.
Practical Implications
- Enterprise Search & Recommendation – Companies that need to serve billions of product embeddings (e.g., e‑commerce, video platforms) can adopt SPIRE to cut query latency while scaling horizontally.
- ML‑as‑a‑Service Providers – Cloud vendors can expose a high‑throughput vector‑search API without over‑provisioning hardware, thanks to SPIRE’s efficient read‑cost balance.
- Edge‑Centric Retrieval – The hierarchical index can be split such that the top‑level summary lives on a central server while leaf shards run on edge nodes, enabling low‑latency on‑device search for AR/VR or IoT scenarios.
- Cost Savings – By achieving up to 10× higher throughput on the same cluster, organizations can reduce the number of required nodes, lowering both capital and operational expenses.
- Compatibility – SPIRE is built on top of popular vector‑index libraries (FAISS, HNSWLIB), so existing pipelines can be migrated with minimal code changes.
Limitations & Future Work
- Static Data Assumption – The current design assumes relatively infrequent updates; frequent insertions or deletions would require re‑balancing the partition granularity, which can be costly.
- Hardware Dependence – The reported gains rely on high‑speed interconnects (RDMA/10 GbE). Performance on commodity Ethernet may be lower.
- Limited Exploration of Hybrid Metrics – The paper focuses on Euclidean distance; extending SPIRE to cosine similarity or learned metrics needs additional validation.
- Future Directions – The authors plan to (1) integrate online re‑partitioning for dynamic workloads, (2) explore GPU‑accelerated leaf‑node search, and (3) open a benchmark suite for distributed ANNS to foster reproducibility.
Authors
- Yuming Xu
- Qianxi Zhang
- Qi Chen
- Baotong Lu
- Menghao Li
- Philip Adams
- Mingqin Li
- Zengzhong Li
- Jing Liu
- Cheng Li
- Fan Yang
Paper Information
- arXiv ID: 2512.17264v1
- Categories: cs.DC
- Published: December 19, 2025