[Paper] LeaseGuard: Raft Leases Done Right
Source: arXiv - 2512.15659v1
Overview
The paper presents LeaseGuard, a new lease‑based protocol that lets Raft leaders serve strongly consistent reads without the costly quorum round‑trip that most Raft deployments currently require. By exploiting properties unique to Raft elections, LeaseGuard serves reads locally, with no network round‑trip, while keeping the system safe during leader changes, a long‑standing pain point for distributed databases.
Key Contributions
- A rigorously specified lease algorithm built on Raft’s election guarantees, formalized in TLA+.
- Two availability‑boosting optimizations:
  - Rapid restoration of write throughput after a leader failover.
  - Near‑instant read availability on a newly elected leader.
- Practical implementation in the LogCabin reference Raft codebase, demonstrating real‑world feasibility.
- Comprehensive evaluation (Python simulation + C++ prototype) showing:
  - Consistent reads drop from one network round‑trip to zero.
  - Write throughput climbs from ~1k to ~10k ops/s.
  - 99% of reads succeed immediately after a leader change.
Methodology
- Problem framing – The authors dissect why existing Raft‑based systems either pay a per‑read quorum cost or use loosely defined leader leases that hurt availability.
- LeaseGuard design – They derive a lease invariant directly from Raft’s election safety property: a leader can safely claim a lease only if it knows that no other node can become leader before the lease expires. This eliminates the need for extra “lease‑grant” messages (see the sketch after this list).
- Optimizations –
  - Write‑throughput boost: after a failover, the new leader preemptively claims a lease using its term number, allowing pending writes to flow without waiting for the old leader’s lease to expire.
  - Read‑availability boost: the new leader immediately serves reads for the majority of keys, deferring only reads of keys that might fall in the “lease‑gap” window.
- Formal verification – The entire protocol is encoded in TLA+ and model‑checked for safety (no stale reads) and liveness (reads eventually succeed).
- Empirical evaluation –
  - A Python event‑driven simulator explores a wide range of failure patterns and network latencies.
  - A production‑grade implementation replaces LogCabin’s default quorum‑read path with LeaseGuard, measuring latency, throughput, and read availability during leader churn.
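To make the lease invariant and the term‑based lease claim concrete, here is a minimal Python sketch of how a leader might track a heartbeat‑derived lease. It illustrates the idea described above and is not the authors’ code: the names (`LeaseState`, `ELECTION_TIMEOUT_MIN`, `MAX_DRIFT`) and the votes‑as‑lease shortcut in `on_election_win` are assumptions.

```python
import time

# Assumed parameters, not taken from the paper: followers wait at least
# ELECTION_TIMEOUT_MIN before starting an election, and relative clock
# rates differ by at most MAX_DRIFT.
ELECTION_TIMEOUT_MIN = 0.150   # seconds
MAX_DRIFT = 0.01               # 1% clock-rate skew bound

class LeaseState:
    """Tracks the window in which a leader may safely serve local reads."""

    def __init__(self):
        self.term = 0
        self.lease_expiry = 0.0   # deadline on the local monotonic clock

    def on_quorum_ack(self, term, heartbeat_sent_at):
        # A quorum acked a heartbeat sent at `heartbeat_sent_at` (a
        # time.monotonic() timestamp). Each acking follower reset its
        # election timer after that instant, so no rival can become
        # leader before heartbeat_sent_at + ELECTION_TIMEOUT_MIN.
        # Discount the window to tolerate clock-rate skew.
        if term != self.term:
            return
        expiry = heartbeat_sent_at + ELECTION_TIMEOUT_MIN * (1 - MAX_DRIFT)
        self.lease_expiry = max(self.lease_expiry, expiry)

    def on_election_win(self, new_term, votes_requested_at):
        # Hypothetical reading of the term-based optimization: the quorum
        # of votes that elected this leader doubles as a lease grant, so
        # the new leader need not wait out the old leader's lease.
        self.term = new_term
        self.on_quorum_ack(new_term, votes_requested_at)

    def can_serve_local_read(self):
        # The lease invariant: serve a read locally only while we know
        # no other node can have become leader.
        return time.monotonic() < self.lease_expiry
```

In this reading, reads never leave the leader while the lease holds, and a failed check simply falls back to the usual quorum‑read path.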
Results & Findings
| Metric | Traditional Raft (quorum reads) | LeaseGuard |
|---|---|---|
| Read latency | 1 network RTT (≈1–10 ms) | 0 RTT (local read) |
| Write throughput | ~1k ops/s (limited by read‑write contention) | ~10k ops/s (≈10× boost) |
| Read success after failover | ~0% until the old lease expires (seconds, under classic lease schemes) | ~99% immediately |
| Safety | Proven by Raft’s original proof | Model‑checked in TLA+ (no stale reads) |
The data shows that LeaseGuard eliminates the read‑side bottleneck without compromising Raft’s strong consistency guarantees. Even under rapid leader failures, the system continues to serve reads almost immediately, a dramatic improvement over the “read‑pause” period of classic lease schemes.
Practical Implications
- Lower latency for read‑heavy workloads – Services like configuration stores, feature‑flag systems, or metadata layers can now serve reads locally on the leader, shaving off network latency entirely.
- Higher overall throughput – By decoupling reads from the quorum path, write pipelines stay saturated, which is especially valuable for micro‑service back‑ends that experience bursty write spikes.
- Simpler deployment – LeaseGuard’s specification is concrete and formally verified, reducing the risk of subtle bugs that plague ad‑hoc lease implementations. Teams can adopt it in existing Raft‑based stacks (e.g., etcd, Consul, LogCabin), though the integration effort depends on the codebase (see Limitations & Future Work).
- Improved availability during failover – Cloud‑native operators often worry about “read‑downtime” when a leader crashes; LeaseGuard keeps the service responsive, easing SLA compliance.
- Foundation for hybrid consistency models – Because reads are now cheap, developers can more easily build read‑optimistic caches or combine strong reads with eventually consistent replicas without a separate read‑path shim.
Limitations & Future Work
- Assumes well‑behaved clocks – LeaseGuard’s safety hinges on monotonic clocks with bounded drift; environments with highly variable clock rates may need additional synchronization (see the sketch after this list).
- Focused on single‑leader Raft – The protocol has not been evaluated in multi‑leader or sharded Raft deployments, which could expose new edge cases.
- Simulation‑heavy validation – While the LogCabin prototype shows promising numbers, larger‑scale production experiments (e.g., in geo‑distributed clusters) are needed to confirm scalability.
- Potential integration overhead – Existing Raft libraries may require non‑trivial refactoring to expose the term‑based lease hooks used by LeaseGuard.
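To see why the drift bound matters, consider the arithmetic behind a safe lease window. This is an illustrative calculation under assumed parameters, not a formula from the paper:

```python
def safe_lease_window(election_timeout_min: float, max_drift: float) -> float:
    """Largest local-clock window in which no rival can be elected.

    Illustrative only: if a remote election timer may run up to
    max_drift faster than the leader's clock, the leader must shrink
    its own window by the same factor. Without any drift bound,
    no lease duration is ever safe.
    """
    if max_drift >= 1.0:
        return 0.0
    return election_timeout_min * (1 - max_drift)

# A 150 ms election timeout with a 1% drift bound leaves a 148.5 ms
# safe window; as the drift bound grows, the window shrinks toward zero.
print(safe_lease_window(0.150, 0.01))   # 0.1485
```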
Future research directions include extending LeaseGuard to work with Raft variants that support joint consensus, exploring adaptive lease durations based on observed network latency, and integrating the protocol into widely‑used open‑source Raft implementations (etcd, Consul) for broader community validation.
Authors
- A. Jesse Jiryu Davis
- Murat Demirbas
- Lingzhi Deng
Paper Information
- arXiv ID: 2512.15659v1
- Categories: cs.DC
- Published: December 17, 2025