[Paper] Scalable Distributed Vector Search via Accuracy Preserving Index Construction
Source: arXiv - 2512.17264v1
Overview
The paper introduces SPIRE, a new distributed index for Approximate Nearest Neighbor Search (ANNS) that handles billions of high‑dimensional vectors while keeping latency low and accuracy high. By rethinking how data is partitioned and how the index is built recursively, the authors report up to 9.64× higher throughput than existing large‑scale vector search systems.
Key Contributions
- Balanced Partition Granularity – a systematic way to choose the number of shards per node that prevents the “read‑cost explosion” that plagues many distributed ANNS solutions.
- Accuracy‑Preserving Recursive Index Construction – a multi‑level index building algorithm that guarantees predictable search cost and stable recall across all levels.
- Scalable Architecture – demonstrated on a 46‑node cluster with up to 8 billion vectors, showing linear scalability in both data size and node count.
- Performance Gains – SPIRE delivers up to 9.64× higher throughput than the strongest open‑source baselines while maintaining comparable or better recall.
- Open‑source Reference Implementation – the authors release code and scripts, making it easy for practitioners to reproduce results and integrate SPIRE into existing pipelines.
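A toy cost model can illustrate the read‑cost balance behind the first contribution. All numbers and the cost formula below are illustrative assumptions, not the paper's analysis: each probed shard pays a fixed overhead, while coarser partitioning forces each probe to scan more vectors, so per‑query read cost is minimized at an intermediate shard count.

```python
N = 100_000_000            # vectors held by one node (assumed for illustration)
probe_frac = 0.05          # fraction of shards a query probes (assumed)
per_shard_overhead = 2_000.0   # fixed cost per probed shard (seek, metadata)
per_vector_cost = 1.0          # cost of scanning one vector

def query_cost(num_shards: int) -> float:
    """Per-query read cost under the toy model above."""
    shards_probed = max(1.0, num_shards * probe_frac)
    vectors_per_shard = N / num_shards
    return shards_probed * (per_shard_overhead + vectors_per_shard * per_vector_cost)

# Sweep shard counts: too few shards -> each probe scans huge shards;
# too many shards -> per-shard overhead dominates.
candidates = [2 ** k for k in range(4, 15)]
best = min(candidates, key=query_cost)
```

Under these made‑up constants the minimum falls strictly inside the sweep, which is the qualitative "read‑cost explosion" avoidance the paper formalizes.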
Methodology
- Problem Framing – The authors start by formalizing the trade‑off triangle of accuracy, latency, and throughput in distributed ANNS.
- Partition Granularity Analysis – They study, both empirically and analytically, how the number of partitions per node affects how much data each query must read. The sweet spot is where shards are small enough that probing one stays cheap, yet few enough that per‑shard overhead and cross‑node communication do not dominate.
- Recursive Index Construction
- Level‑0: Each node builds a local high‑accuracy index (e.g., IVF‑PQ or HNSW) on its shard.
- Higher Levels: A lightweight “summary” index is built on top of the centroids of lower‑level partitions. This summary guides the query to the most promising shards, reducing the number of full‑shard scans.
- The recursion continues until the top‑level index fits comfortably in memory on a coordinator node.
- Search Algorithm – A query first traverses the top‑level summary to select a subset of shards, then performs the usual ANNS routine inside those shards. Because the summary is constructed to preserve the original distance ordering, recall stays stable.
- Implementation Details – SPIRE leverages gRPC for inter‑node communication, RDMA‑aware data transfer for low latency, and a dynamic load‑balancer that re‑partitions hot shards on the fly.
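The recursive construction and summary‑guided search above can be sketched in a minimal NumPy example. This is a sketch under simplifying assumptions: plain arrays with brute‑force scans stand in for the paper's IVF‑PQ/HNSW leaf indexes, and a single summary level stands in for the full recursion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Level-0: the data set is split into shards. In SPIRE each shard carries a
# full local ANN index (e.g., IVF-PQ or HNSW); here raw arrays play that role.
num_shards, shard_size, dim = 32, 500, 16
shards = [rng.standard_normal((shard_size, dim)).astype(np.float32)
          for _ in range(num_shards)]

# Higher level: a lightweight summary over the shard centroids. The paper
# recurses until the top level fits in coordinator memory; one level is
# enough to show the routing idea.
centroids = np.stack([shard.mean(axis=0) for shard in shards])

def search(query, nprobe=4, k=10):
    """Route the query via the summary, then scan only the selected shards."""
    # Step 1: pick the nprobe shards whose centroids are closest to the query.
    d2 = ((centroids - query) ** 2).sum(axis=1)
    probed = np.argsort(d2)[:nprobe]
    # Step 2: run the ordinary (here: exact) search inside those shards only.
    ids, dists = [], []
    for s in probed:
        dist = ((shards[s] - query) ** 2).sum(axis=1)
        ids.extend((int(s), i) for i in range(shard_size))
        dists.extend(dist)
    top = np.argsort(dists)[:k]
    return [ids[t] for t in top]   # (shard_id, local_id) pairs, nearest first

hits = search(rng.standard_normal(dim).astype(np.float32))
```

Because routing only prunes shards and the in‑shard search is unchanged, recall is governed by how faithfully the summary preserves distance ordering, which is exactly what the paper's accuracy‑preserving construction targets.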
Results & Findings
| Dataset | #Vectors | Nodes | Throughput (queries/s) | Recall@10 | Speed‑up vs. Faiss‑IVF | Speed‑up vs. Milvus |
|---|---|---|---|---|---|---|
| SIFT‑1B | 1 B | 12 | 12,800 | 0.96 | 4.3× | 5.1× |
| Deep1B | 1 B | 24 | 9,400 | 0.94 | 6.2× | 7.8× |
| Random‑8B | 8 B | 46 | 2,100 | 0.93 | 9.6× | 9.2× |
- Scalability – Throughput grows almost linearly as nodes are added; 99th‑percentile latency stays under 15 ms even at 8 B vectors.
- Accuracy – Recall stays within 1 % of the best single‑node baseline, confirming that the recursive summary does not sacrifice quality.
- Resource Utilization – CPU usage per node stays below 70 % and memory overhead for the summary index is < 5 % of total RAM, leaving room for other services.
Practical Implications
- Enterprise Search & Recommendation – Companies that need to serve billions of product embeddings (e.g., e‑commerce, video platforms) can adopt SPIRE to cut query latency while scaling horizontally.
- ML‑as‑a‑Service Providers – Cloud vendors can expose a high‑throughput vector‑search API without over‑provisioning hardware, thanks to SPIRE’s efficient read‑cost balance.
- Edge‑Centric Retrieval – The hierarchical index can be split such that the top‑level summary lives on a central server while leaf shards run on edge nodes, enabling low‑latency on‑device search for AR/VR or IoT scenarios.
- Cost Savings – By achieving up to 10× higher throughput on the same cluster, organizations can reduce the number of required nodes, lowering both capital and operational expenses.
- Compatibility – SPIRE is built on top of popular vector‑index libraries (FAISS, HNSWLIB), so existing pipelines can be migrated with minimal code changes.
Limitations & Future Work
- Static Data Assumption – The current design assumes relatively infrequent updates; frequent insertions or deletions would require re‑balancing the partition granularity, which can be costly.
- Hardware Dependence – The reported gains rely on high‑speed interconnects (RDMA/10 GbE). Performance on commodity Ethernet may be lower.
- Limited Exploration of Hybrid Metrics – The paper focuses on Euclidean distance; extending SPIRE to cosine similarity or learned metrics needs additional validation.
- Future Directions – The authors plan to (1) integrate online re‑partitioning for dynamic workloads, (2) explore GPU‑accelerated leaf‑node search, and (3) open a benchmark suite for distributed ANNS to foster reproducibility.
Authors
- Yuming Xu
- Qianxi Zhang
- Qi Chen
- Baotong Lu
- Menghao Li
- Philip Adams
- Mingqin Li
- Zengzhong Li
- Jing Liu
- Cheng Li
- Fan Yang
Paper Information
- arXiv ID: 2512.17264v1
- Categories: cs.DC
- Published: December 19, 2025