[Paper] Local Rendezvous Hashing: Bounded Loads and Minimal Churn via Cache-Local Candidates

Published: December 29, 2025 at 07:52 AM EST
4 min read

Source: arXiv - 2512.23434v1

Overview

Consistent hashing is the go‑to technique for spreading data across a cluster of machines, but classic ring‑based designs either suffer from uneven load (high “peak‑to‑average” ratios) or require a massive number of virtual nodes that blow up memory traffic. The new Local Rendezvous Hashing (LRH) algorithm keeps the familiar ring layout and limits the candidate set for each key to a small, cache‑friendly window of neighboring nodes, delivering far better load balance while slashing lookup latency.

Key Contributions

  • Hybrid design that combines the deterministic ring of classic consistent hashing with the load‑balancing power of Highest Random Weight (HRW) selection.
  • Cache‑local candidate window: only the C nearest distinct physical nodes are considered, eliminating scattered memory accesses.
  • O(log |R| + C) lookup cost: a single binary search to locate the ring position plus a constant‑time enumeration of the C candidates using pre‑computed offsets.
  • Zero excess churn on node failures: when a node goes down, only keys whose original winner was that node are remapped, thanks to a fixed‑candidate filtering step.
  • Empirical gains: on a 5,000‑node cluster with 256 virtual nodes per machine (≈1.28 M ring entries), LRH cuts the max/avg load ratio from 1.2785 to 1.0947 and runs ~6.8× faster than an 8‑probe multi‑probe consistent hash while achieving comparable balance.

Methodology

  1. Ring construction – Each physical node contributes V virtual points (as in traditional consistent hashing). The full ring size is |R| = N × V.
  2. Key placement – For a given key, LRH performs a binary search on the sorted ring to find the first virtual point ≥ hash(key).
  3. Candidate window – Starting from that point, LRH walks forward, skipping duplicate physical nodes, until it has collected C distinct candidates. The “next‑distinct” offsets are pre‑computed, so the walk is just a few array index jumps that stay inside the CPU cache.
  4. HRW selection – Each candidate receives a weight = hash(key ∥ candidate_id). The candidate with the highest weight wins (optionally multiplied by a static node weight for heterogeneous clusters).
  5. Failure handling – If a node fails, LRH simply re‑evaluates the same C candidates; only keys whose winner was the failed node are reassigned, guaranteeing no extra churn beyond what is strictly necessary.

The whole process is deterministic, requires no extra probing loops, and fits neatly into modern CPU cache lines.
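
To make the lookup path concrete, here is a minimal Python sketch of steps 1–5. Everything in it is illustrative: the class and method names are invented, blake2b stands in for whichever hash function the paper actually uses, and the per‑slot candidate table is just one plausible way to realize the pre‑computed "next‑distinct" offsets (a production version would store compact offsets rather than full candidate lists).

```python
import bisect
import hashlib
from typing import List

def h64(data: bytes) -> int:
    # 64-bit stand-in hash; the paper does not pin down a specific function here.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

class LRHRing:
    def __init__(self, nodes: List[str], v: int, c: int = 8):
        assert len(set(nodes)) >= c, "need at least C distinct physical nodes"
        self.c = c
        self.down: set = set()
        # Step 1: ring construction -- each node contributes v virtual points.
        points = sorted((h64(f"{n}#{i}".encode()), n) for n in nodes for i in range(v))
        self.ring = [p for p, _ in points]  # sorted virtual-point hashes, |R| = N * v
        owners = [n for _, n in points]
        # Step 3 (precomputation): for every ring slot, store the first c distinct
        # physical owners clockwise. This trades O(|R| * c) memory for a lookup
        # that is pure array reads -- one way to realize "next-distinct" offsets.
        m = len(self.ring)
        self.candidates: List[List[str]] = []
        for i in range(m):
            window: List[str] = []
            j = i
            while len(window) < c:
                owner = owners[j % m]
                if owner not in window:
                    window.append(owner)
                j += 1
            self.candidates.append(window)

    def lookup(self, key: str) -> str:
        # Step 2: a single binary search for the first virtual point >= hash(key).
        i = bisect.bisect_left(self.ring, h64(key.encode())) % len(self.ring)
        # Step 4: HRW selection over the fixed, precomputed candidate window.
        live = [n for n in self.candidates[i] if n not in self.down]
        return max(live, key=lambda n: h64(f"{key}|{n}".encode()))

    def fail(self, node: str) -> None:
        # Step 5: the candidate window never changes; a failed node is merely
        # filtered out, so only keys it was winning get remapped.
        self.down.add(node)

# Tiny demo: only the failed node's keys move; everyone else's keys stay put.
ring = LRHRing([f"node{i}" for i in range(50)], v=16)
before = {k: ring.lookup(k) for k in (f"key{j}" for j in range(1000))}
victim = before["key0"]
ring.fail(victim)
moved = [k for k in before if ring.lookup(k) != before[k]]
assert all(before[k] == victim for k in moved)  # zero excess churn
```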

Results & Findings

| Metric | Multi‑probe (8 probes) | LRH (C = 8) |
| --- | --- | --- |
| Throughput | 8.80 M keys/s | 60.05 M keys/s |
| Max/avg load ratio | 1.0697 | 1.0947 (vs. 1.2785 for a plain ring) |
| Lookup cost | O(8 · log \|R\|) | O(log \|R\| + C) |
| Churn on failure | Non‑zero excess churn | Zero excess churn |

A micro‑benchmark showed that the dominant cost in multi‑probe hashing is the repeated binary searches and the resulting cache misses, not the arithmetic to generate probes. LRH’s single search plus a tight, cache‑resident candidate walk eliminates that bottleneck.
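
That claim is easy to sanity‑check with a toy harness. The sketch below is my own, not the paper's benchmark: it compares k independent binary searches per key against one search followed by a short sequential walk. Python's constant factors blur the cache effects a C implementation would show, but the structural difference in search count is the point.

```python
import bisect
import random
import time

random.seed(1)
R = sorted(random.getrandbits(64) for _ in range(1_280_000))  # ~1.28 M ring entries
keys = [random.getrandbits(64) for _ in range(100_000)]
MASK = 2**64 - 1

def multi_probe_cost(k: int) -> float:
    # k probes per key => k scattered binary searches (the cache-miss-heavy part).
    t0 = time.perf_counter()
    for key in keys:
        for seed in range(k):
            bisect.bisect_left(R, hash((key, seed)) & MASK)
    return time.perf_counter() - t0

def lrh_style_cost(c: int) -> float:
    # One binary search, then c adjacent reads that stay cache-resident.
    t0 = time.perf_counter()
    m = len(R)
    for key in keys:
        i = bisect.bisect_left(R, key)
        for j in range(c):
            _ = R[(i + j) % m]
    return time.perf_counter() - t0

print(f"8-probe multi-probe: {multi_probe_cost(8):.2f} s")
print(f"LRH-style (C = 8):   {lrh_style_cost(8):.2f} s")
```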

Practical Implications

  • High‑performance key‑value stores (e.g., distributed caches, NoSQL databases) can adopt LRH to achieve near‑optimal load balance without inflating the number of virtual nodes, thus saving memory and reducing GC pressure.
  • Edge and IoT deployments where node resources are limited benefit from the deterministic, low‑overhead lookup—especially when the cluster topology changes frequently.
  • Service mesh routing and sharding layers can use LRH to keep request distribution even while guaranteeing that a node failure only impacts the keys that truly belong to it, simplifying state‑migration logic.
  • Cost‑aware scaling: because LRH works with weighted nodes, operators can assign higher weights to more powerful machines, letting the algorithm automatically steer more traffic their way without custom partitioning code (one common weighting scheme is sketched after this list).
  • Cache‑friendly design aligns well with modern CPUs and can be implemented in languages that expose low‑level memory control (C/C++, Rust) or even in high‑level runtimes (Java, Go) using off‑heap structures.
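
The weighted variant is only mentioned in passing, so the sketch below shows one standard way to fold static weights into the HRW step: the logarithmic scoring trick from weighted rendezvous hashing. This is an assumption about how weighting could be done, not necessarily the paper's formula.

```python
import hashlib
import math
from typing import Dict

def u01(key: str, node: str) -> float:
    # Map hash(key || node) to a uniform float strictly inside (0, 1).
    h = int.from_bytes(hashlib.blake2b(f"{key}|{node}".encode(), digest_size=8).digest(), "big")
    return (h + 0.5) / 2.0**64

def weighted_hrw(key: str, weights: Dict[str, float]) -> str:
    # Score = -w / ln(u): a node with twice the weight wins ~twice as many keys.
    return max(weights, key=lambda n: -weights[n] / math.log(u01(key, n)))

# A 2x-weighted node should receive roughly two thirds of the keys.
tally = {"big-box": 0, "small-box": 0}
for j in range(10_000):
    tally[weighted_hrw(f"key{j}", {"big-box": 2.0, "small-box": 1.0})] += 1
print(tally)  # expect roughly {'big-box': ~6667, 'small-box': ~3333}
```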

Limitations & Future Work

  • Fixed window size C: While a small C (e.g., 8) works well in the evaluated setup, the optimal C may vary with cluster size, workload skew, or hardware cache characteristics; adaptive tuning is left for future research.
  • Assumes static ring topology: The analysis focuses on “fixed‑topology liveness changes.” Dynamic addition/removal of nodes beyond simple failures may require re‑balancing the pre‑computed offsets.
  • Weighted HRW not deeply explored: The paper mentions optional node weights but does not evaluate heterogeneous clusters extensively.
  • Implementation complexity: Pre‑computing distinct offsets and maintaining them across re‑hashes adds engineering overhead compared to naïve ring hashing.

Future work could explore adaptive C selection, integration with auto‑scaling controllers, and broader benchmarks across heterogeneous cloud environments.

Authors

  • Yongjie Guan

Paper Information

  • arXiv ID: 2512.23434v1
  • Categories: cs.DC, cs.NI, cs.PF
  • Published: December 29, 2025
  • PDF: https://arxiv.org/pdf/2512.23434v1