[Paper] Lotus: Optimizing Disaggregated Transactions with Disaggregated Locks

Published: December 17, 2025 at 10:49 PM EST
4 min read

Source: arXiv - 2512.16136v1

Overview

The paper introduces Lotus, a new distributed transaction system designed for disaggregated memory (DM) architectures. By moving lock management from memory‑node NICs to compute nodes, Lotus eliminates a major network bottleneck and delivers up to 2.1× higher throughput for OLTP workloads.

Key Contributions

  • Lock disaggregation: Locks are stored and processed on compute nodes (CNs) instead of memory nodes (MNs), freeing the MN RDMA NICs from heavy atomic‑operation traffic.
  • Application‑aware lock sharding: Lotus partitions locks based on workload locality, achieving balanced load across CNs while preserving cache friendliness.
  • Lock‑first transaction protocol: Transactions acquire all required locks before any data access, enabling early conflict detection and proactive aborts.
  • Lock‑rebuild‑free recovery: Treats locks as transient; after a CN crash, the system recovers without reconstructing lock state, keeping recovery lightweight.
  • Performance gains: Empirical evaluation shows up to 2.1× higher throughput and ≈50 % lower latency versus the best existing DM transaction systems.

Methodology

  1. System model: The authors assume a typical DM deployment where multiple CNs communicate with a pool of MNs via RDMA. Traditional designs place lock metadata on MNs, causing the NIC to handle a flood of one‑sided atomic ops (e.g., compare‑and‑swap).
  2. Lock disaggregation design: Lotus relocates each lock to the CN that initiates the transaction. A lightweight lock table is kept in the CN’s local memory, and lock ownership is advertised to MNs via a small “lock‑owner” directory that can be cached.
  3. Sharding algorithm: Locks are grouped by the primary data items they protect. Using workload statistics (e.g., hot keys), Lotus assigns groups to CNs so that most lock requests stay local, while a simple hash‑based fallback ensures even distribution when hotspots shift (see the sharding sketch after this list).
  4. Transaction flow (sketched in code after this list):
    • Lock‑first phase: The CN sends a batch of lock‑acquire messages to the relevant CNs (including itself).
    • Validation: If any lock fails, the transaction aborts immediately—no data reads are performed.
    • Execution phase: Once all locks are held, the CN performs RDMA reads/writes on the data residing in MNs.
    • Commit/Release: Locks are released atomically after the write‑back, using a non‑blocking RDMA write.
  5. Recovery: Upon CN failure, the system treats all locks held by that CN as expired. Since locks are not persisted, other CNs can simply retry the aborted transactions without a costly reconstruction step.
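
The paper describes the sharding policy only at this level of detail; the following Python sketch is an illustrative rendering of the idea, not the authors' code. The names (assign_lock_groups, hot_key_stats) and the locality heuristic are assumptions made for the example.

```python
from collections import defaultdict

def assign_lock_groups(lock_groups, hot_key_stats, compute_nodes):
    """Illustrative lock sharding: keep hot lock groups on the CN that
    requests them most (locality), hash the rest for an even spread.

    lock_groups   -- iterable of group ids (e.g., one group per primary data item)
    hot_key_stats -- dict: group_id -> {cn_id: request_count} from workload sampling
    compute_nodes -- list of CN ids
    """
    assignment = {}
    load = defaultdict(int)                 # rough per-CN load, to watch balance

    for group in lock_groups:
        stats = hot_key_stats.get(group)
        if stats:                           # hot group: follow workload locality
            cn = max(stats, key=stats.get)
        else:                               # cold or shifting group: hash fallback
            cn = compute_nodes[hash(group) % len(compute_nodes)]
        assignment[group] = cn
        load[cn] += sum(stats.values()) if stats else 1

    return assignment, dict(load)
```

Re-running such an assignment as the hot-key statistics change is what keeps most lock requests CN‑local, while the hash fallback prevents any single CN from becoming a new hotspot.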
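
To make the transaction flow and recovery steps concrete, here is a minimal single-process Python sketch of the lock-first protocol. Every name in it (LocalLockTable, run_transaction, directory.owner_of, the rdma and txn objects) is an assumption made for illustration; in the real system lock requests are batched messages between CNs and data access uses one-sided RDMA verbs, both reduced here to plain method calls.

```python
class LocalLockTable:
    """Per-CN lock table kept in the CN's local memory; no locks live on MNs."""

    def __init__(self):
        self._owners = {}                              # lock_id -> owning txn id

    def try_acquire(self, lock_id, txn_id):
        if self._owners.get(lock_id) not in (None, txn_id):
            return False                               # conflict detected early
        self._owners[lock_id] = txn_id
        return True

    def release_all(self, txn_id):
        for lock_id in [l for l, t in self._owners.items() if t == txn_id]:
            del self._owners[lock_id]

    def expire_transactions(self, crashed_txn_ids):
        """Lock-rebuild-free recovery: locks held by a crashed CN's transactions
        are simply treated as expired; nothing is reconstructed."""
        for lock_id in [l for l, t in self._owners.items() if t in crashed_txn_ids]:
            del self._owners[lock_id]


def run_transaction(txn, directory, rdma):
    """Lock-first flow: acquire every lock before touching data on the MNs."""
    # Lock-first phase: each request goes to the CN that shards that lock
    # (a batch of messages in the real system; direct calls here).
    for lock_id in txn.lock_set:
        owner_cn = directory.owner_of(lock_id)          # cached lock-owner directory
        if not owner_cn.lock_table.try_acquire(lock_id, txn.id):
            # Validation: any failed lock aborts the txn before any data read.
            for cn in directory.compute_nodes:
                cn.lock_table.release_all(txn.id)
            return "ABORT"

    # Execution phase: reads/writes against data that lives on the MNs.
    values = {addr: rdma.read(addr) for addr in txn.read_set}
    for addr, new_value in txn.compute_writes(values).items():
        rdma.write(addr, new_value)

    # Commit/release: after write-back, drop the locks (held on CNs only).
    for cn in directory.compute_nodes:
        cn.lock_table.release_all(txn.id)
    return "COMMIT"
```

Because aborts happen before the execution phase, a conflicting transaction never issues RDMA reads that are later thrown away, which is the traffic saving the evaluation attributes to the lock-first design.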

Results & Findings

All improvements are reported for Lotus versus the baseline, a state‑of‑the‑art DM transaction system:

  • Throughput: up to 2.1× higher
  • Average latency: 49.4 % lower (nearly half)
  • NIC atomic‑op load: ≈70 % lower on the MN RNICs
  • Scalability: near‑linear throughput growth as CNs are added, while the baseline plateaus due to NIC saturation
  • Recovery time: ≈30 % lower than lock‑rebuild approaches

The experiments span YCSB‑type workloads and a TPC‑C‑like OLTP benchmark, demonstrating that the lock‑first protocol dramatically reduces wasted network traffic caused by aborts after data reads.

Practical Implications

  • For cloud providers: Deploying Lotus on disaggregated‑memory clusters (e.g., NVIDIA DGX‑SuperPOD, Azure’s memory‑pool services) can improve resource utilization without extra hardware investment.
  • For database engineers: Existing RDMA‑based transaction engines can adopt the lock‑first protocol and lock sharding logic to gain immediate performance wins, especially for workloads with high contention.
  • For developers of micro‑services: When services share a common DM store, moving lock state to the service host (CN) reduces cross‑node latency, making fine‑grained transactional semantics feasible at scale.
  • For system architects: The lock‑rebuild‑free recovery model simplifies failure handling, lowering the operational complexity of large‑scale DM deployments.

Limitations & Future Work

  • Workload dependence: Lotus relies on locality patterns (e.g., hot keys staying on a few CNs). Highly random access patterns could degrade sharding balance and re‑introduce network hot spots.
  • Memory overhead on CNs: Storing lock tables locally consumes additional memory on compute nodes, which may be constrained in some environments.
  • Fault‑tolerance scope: The current design handles CN crashes gracefully but assumes MNs remain reliable; extending the model to tolerate MN failures is left for future research.
  • Broader protocol integration: The authors plan to explore how Lotus interacts with multi‑version concurrency control (MVCC) and hybrid transaction models to further boost performance under mixed read‑write workloads.

Authors

  • Zhisheng Hu
  • Pengfei Zuo
  • Junliang Hu
  • Yizou Chen
  • Yingjia Wang
  • Ming-Chang Yang

Paper Information

  • arXiv ID: 2512.16136v1
  • Categories: cs.DC
  • Published: December 18, 2025
