[Paper] Lotus: Optimizing Disaggregated Transactions with Disaggregated Locks
Source: arXiv - 2512.16136v1
Overview
The paper introduces Lotus, a new distributed transaction system designed for disaggregated memory (DM) architectures. By moving lock management from memory‑node NICs to compute nodes, Lotus eliminates a major network bottleneck and delivers up to 2.1× higher throughput for OLTP workloads.
Key Contributions
- Lock disaggregation: Locks are stored and processed on compute nodes (CNs) instead of memory nodes (MNs), freeing the MN RDMA NICs from heavy atomic‑operation traffic.
- Application‑aware lock sharding: Lotus partitions locks based on workload locality, achieving balanced load across CNs while preserving cache friendliness.
- Lock‑first transaction protocol: Transactions acquire all required locks before any data access, enabling early conflict detection and proactive aborts.
- Lock‑rebuild‑free recovery: Treats locks as transient; after a CN crash, the system recovers without reconstructing lock state, keeping recovery lightweight.
- Performance gains: Empirical evaluation shows up to 2.1× higher throughput and ≈50 % lower latency versus the best existing DM transaction systems.
Methodology
- System model: The authors assume a typical DM deployment where multiple CNs communicate with a pool of MNs via RDMA. Traditional designs place lock metadata on MNs, so each MN's RNIC must process a flood of one‑sided atomic operations (e.g., compare‑and‑swap); the first sketch after this list illustrates that pattern.
- Lock disaggregation design: Lotus relocates each lock from the MNs to a compute node. A lightweight lock table is kept in that CN's local memory, and lock ownership is advertised to MNs via a small "lock‑owner" directory that can be cached (see the lock‑table sketch after this list).
- Sharding algorithm: Locks are grouped by the primary data items they protect. Using workload statistics (e.g., hot keys), Lotus assigns groups to CNs so that most lock requests stay local, while a simple hash‑based fallback keeps the distribution even when hotspots shift (see the sharding sketch below).
- Transaction flow (sketched end‑to‑end after this list):
- Lock‑first phase: The coordinating CN sends batched lock‑acquire messages to the CNs that own the required locks (possibly including itself).
- Validation: If any acquisition fails, the transaction aborts immediately; no data reads have been issued yet.
- Execution phase: Once all locks are held, the CN performs RDMA reads/writes on the data residing in MNs.
- Commit/Release: Locks are released atomically after the write‑back, using non‑blocking RDMA writes.
- Recovery: Upon a CN failure, the system treats all locks held by that CN as expired. Since locks are not persisted, other CNs simply retry the aborted transactions without a costly lock‑reconstruction step; the lease check in the lock‑table sketch below shows one way to realize this.
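To make the baseline bottleneck concrete, here is a minimal sketch of the conventional DM locking pattern the system model describes. The `rdma` handle and its `compare_and_swap` call are hypothetical stand‑ins for a one‑sided RDMA atomics API, not an interface from the paper.

```python
# Hypothetical one-sided RDMA API; addresses and constants are illustrative.
UNLOCKED, LOCKED = 0, 1

def baseline_acquire(rdma, mn_conn, lock_addr: int) -> bool:
    # Every acquire is a remote atomic executed by the MN's RNIC. Under
    # contention, many CNs spin CAS requests at the same NIC, which must
    # serialize them in hardware -- the bottleneck Lotus removes by moving
    # lock state to the CNs.
    old = rdma.compare_and_swap(mn_conn, addr=lock_addr,
                                expected=UNLOCKED, desired=LOCKED)
    return old == UNLOCKED
```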
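Next, a minimal sketch of a CN‑local lock table, assuming exclusive locks keyed by data‑item ID with lease‑based expiry; the names (`LockTable`, `try_acquire`) and the lease details are illustrative assumptions, not the paper's API.

```python
import threading
import time

class LockEntry:
    __slots__ = ("owner_txn", "lease_expiry")
    def __init__(self, owner_txn: int, lease_s: float):
        self.owner_txn = owner_txn
        self.lease_expiry = time.monotonic() + lease_s

class LockTable:
    """Per-CN table of exclusive locks for the data items this CN manages."""
    def __init__(self):
        self._entries: dict[int, LockEntry] = {}
        self._mu = threading.Lock()  # stand-in for a finer-grained scheme

    def try_acquire(self, item_id: int, txn_id: int,
                    lease_s: float = 0.05) -> bool:
        with self._mu:
            entry = self._entries.get(item_id)
            # An expired lease (e.g., its owner CN crashed) counts as free;
            # this is what makes recovery lock-rebuild-free.
            if entry is None or time.monotonic() >= entry.lease_expiry:
                self._entries[item_id] = LockEntry(txn_id, lease_s)
                return True
            return entry.owner_txn == txn_id  # re-entrant for the same txn

    def release(self, item_id: int, txn_id: int) -> None:
        with self._mu:
            entry = self._entries.get(item_id)
            if entry is not None and entry.owner_txn == txn_id:
                del self._entries[item_id]
```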
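The sharding rule can be sketched as a two‑level lookup: a locality table for hot lock groups, and a hash fallback for everything else. The statistics inputs (`group_of`, `hot_group_owner`) are assumed shapes based on the paper's description, not its data structures.

```python
import hashlib

NUM_CNS = 8  # illustrative cluster size

def shard_for(item_id: int,
              group_of: dict[int, int],
              hot_group_owner: dict[int, int]) -> int:
    """Return the CN that manages the lock for item_id."""
    group = group_of.get(item_id)
    if group is not None and group in hot_group_owner:
        # Locality-aware placement: pin the group's locks to the CN that
        # issues most requests for it, so acquires stay node-local.
        return hot_group_owner[group]
    # Hash fallback spreads cold or unclassified items evenly across CNs,
    # keeping lock load balanced when hotspots shift.
    digest = hashlib.blake2b(item_id.to_bytes(8, "little"), digest_size=8)
    return int.from_bytes(digest.digest(), "little") % NUM_CNS
```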
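Putting the pieces together, here is an end‑to‑end sketch of the lock‑first flow. The transport objects (`net`, `data`) and their methods are hypothetical, and acquiring locks in a fixed item order is one simple deadlock‑avoidance choice for this sketch, not necessarily the paper's mechanism.

```python
def run_transaction(txn_id: int, read_set: set[int], write_set: set[int],
                    net, data, group_of, hot_group_owner) -> bool:
    items = sorted(read_set | write_set)  # fixed order avoids deadlock here

    # Phase 1 -- lock-first: contact the owning CNs (possibly this one)
    # before any data is touched on the MNs.
    acquired = []
    for item in items:
        owner = shard_for(item, group_of, hot_group_owner)
        if not net.send_lock_request(owner, item, txn_id):
            # Early abort: no RDMA data reads were issued, so a conflict
            # costs only small lock messages, not wasted data transfers.
            for held in acquired:
                net.send_unlock(shard_for(held, group_of, hot_group_owner),
                                held, txn_id)
            return False
        acquired.append(item)

    # Phase 2 -- execution: all locks held, so RDMA reads/writes on the
    # MN-resident data cannot race with conflicting transactions.
    values = {item: data.rdma_read(item) for item in read_set}
    for item in write_set:
        data.rdma_write(item, apply_txn_logic(item, values))  # hypothetical

    # Phase 3 -- commit/release: after write-back, drop every lock.
    for item in items:
        net.send_unlock(shard_for(item, group_of, hot_group_owner),
                        item, txn_id)
    return True
```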
Results & Findings
| Metric | Lotus vs. state‑of‑the‑art DM baseline |
|---|---|
| Throughput | Up to 2.1× higher |
| Average latency | 49.4 % lower (nearly half) |
| MN RNIC atomic‑op load | ≈ 70 % lower |
| Scalability | Near‑linear throughput growth as CNs are added, while the baseline plateaus due to NIC saturation |
| Recovery time | ≈ 30 % lower than lock‑rebuild approaches |
The experiments span YCSB‑style workloads and a TPC‑C‑like OLTP benchmark, showing that the lock‑first protocol sharply reduces the network traffic otherwise wasted on transactions that abort after their data reads.
Practical Implications
- For cloud providers: Deploying Lotus on disaggregated‑memory clusters (e.g., RDMA‑based memory‑pooling services) can improve resource utilization without extra hardware investment.
- For database engineers: Existing RDMA‑based transaction engines can adopt the lock‑first protocol and lock sharding logic to gain immediate performance wins, especially for workloads with high contention.
- For developers of micro‑services: When services share a common DM store, moving lock state to the service host (CN) reduces cross‑node latency, making fine‑grained transactional semantics feasible at scale.
- For system architects: The lock‑rebuild‑free recovery model simplifies failure handling, lowering the operational complexity of large‑scale DM deployments.
Limitations & Future Work
- Workload dependence: Lotus relies on locality patterns (e.g., hot keys staying on a few CNs). Highly random access patterns could degrade sharding balance and re‑introduce network hot spots.
- Memory overhead on CNs: Storing lock tables locally consumes additional memory on compute nodes, which may be constrained in some environments.
- Fault‑tolerance scope: The current design handles CN crashes gracefully but assumes MNs remain reliable; extending the model to tolerate MN failures is left for future research.
- Broader protocol integration: The authors plan to explore how Lotus interacts with multi‑version concurrency control (MVCC) and hybrid transaction models to further boost performance under mixed read‑write workloads.
Authors
- Zhisheng Hu
- Pengfei Zuo
- Junliang Hu
- Yizou Chen
- Yingjia Wang
- Ming-Chang Yang
Paper Information
- arXiv ID: 2512.16136v1
- Categories: cs.DC
- Published: December 18, 2025