[Paper] SwitchDelta: Asynchronous Metadata Updating for Distributed Storage with In-Network Data Visibility
Source: arXiv - 2511.19978v1
Overview
The paper introduces SwitchDelta, a technique that pushes metadata updates into programmable network switches, allowing newly written data to become visible before the traditional metadata write completes. By decoupling metadata updates from the critical write path, SwitchDelta speeds up ordered writes while still guaranteeing strong consistency, an attractive proposition for anyone building high‑performance distributed storage services.
Key Contributions
- In‑network metadata buffering: Uses P4‑programmable switches to temporarily store metadata updates, making newly written data visible to clients without waiting for the metadata node.
- Best‑effort data‑plane design: Introduces lightweight mechanisms (e.g., compact encoding, selective eviction) that respect the limited memory and processing budget of switches.
- Metadata update protocol: A new protocol that reconciles the switch‑cached metadata with the persistent metadata store, ensuring eventual consistency and crash safety.
- Broad evaluation: Demonstrates the approach on three representative in‑memory storage systems (log‑structured KV store, distributed file system, secondary index) and shows up to 52 % latency reduction and 127 % throughput boost on write‑heavy workloads.
Methodology
- System Model – The authors assume a classic two‑tier architecture: data nodes store the actual payload, while a separate metadata service tracks object locations, versions, and visibility flags.
- Switch‑side Buffer – When a client issues a write, the data node stores the payload first. The accompanying metadata update (e.g., “object X is now at version V”) is encapsulated in a small packet and forwarded to a programmable switch, which stores it in a tiny hash‑based cache (the write/read/commit flow is modeled in the first sketch after this list).
- In‑network Visibility – While the metadata is still in the switch, any read request that traverses the same switch can be answered directly from the cached entry, effectively “seeing” the new data instantly.
- Commit & Reconciliation – The metadata node later receives the same update (via a reliable control channel). It writes the entry to durable storage and sends an acknowledgment. The switch then either discards the cached copy or marks it as committed. If the switch crashes or the entry expires, the system falls back to the traditional path, preserving strong consistency.
- Resource Management – Because switches have only a few megabytes of SRAM, the design employs the following (the encoding is illustrated in the second sketch after this list):
  - Compact encoding (bit‑fields for version, object ID, etc.)
  - Eviction policies that prioritize recent writes and drop stale entries
  - Fallback handling for cache misses, which reverts to the traditional metadata path so correctness is preserved without falling below baseline performance
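The write, read, and commit paths above can be summarized with a small host-side model. The sketch below is only an illustration of the protocol as described, not the paper's P4 implementation: the names (`SwitchMetadataCache`, `MetaUpdate`, `buffer_update`, `on_commit_ack`) and the eviction policy are assumptions, and the switch's hash-based SRAM cache is modeled as a bounded Python dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetaUpdate:
    """One buffered metadata update: 'object obj_id is now at version version on data_node'."""
    obj_id: int
    version: int
    data_node: str
    committed: bool = False  # set once the metadata service has persisted the update


class SwitchMetadataCache:
    """Host-side model of the switch's small hash-based metadata buffer."""

    def __init__(self, capacity: int = 4096):
        self.capacity = capacity
        self.entries: dict[int, MetaUpdate] = {}

    def buffer_update(self, update: MetaUpdate) -> bool:
        """Write path: buffer the update so the new data becomes visible immediately.
        Returns False when the cache cannot take the entry; the writer then uses the
        traditional ordered path (write metadata first, then expose the data)."""
        if update.obj_id not in self.entries:
            if len(self.entries) >= self.capacity:
                self._evict_one()
            if len(self.entries) >= self.capacity:
                return False  # still full of uncommitted entries: fall back
        current = self.entries.get(update.obj_id)
        if current is None or update.version > current.version:
            self.entries[update.obj_id] = update
        return True

    def lookup(self, obj_id: int) -> Optional[MetaUpdate]:
        """Read path: a read traversing the switch is answered from the cached entry.
        A miss means the client consults the metadata service as usual."""
        return self.entries.get(obj_id)

    def on_commit_ack(self, obj_id: int, version: int) -> None:
        """Reconciliation: the metadata service has durably written the update and
        acknowledged it; the cached copy can now be marked committed (or discarded)."""
        entry = self.entries.get(obj_id)
        if entry is not None and entry.version <= version:
            entry.committed = True

    def _evict_one(self) -> None:
        """Eviction: reclaim a committed (now redundant) entry first; uncommitted entries
        are kept so that visibility is never lost before the metadata write completes."""
        victim = next((k for k, e in self.entries.items() if e.committed), None)
        if victim is not None:
            del self.entries[victim]
```

In the real system this state lives in the switch data plane and is driven by match-action logic; the model only mirrors the protocol-level behavior: buffered updates make data visible to reads immediately, commit acknowledgments from the metadata service let entries be reclaimed, and a full cache or a miss simply sends the request down the traditional path.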
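The compact encoding mentioned above can be pictured as a fixed-width bit-field layout. The field widths below (32-bit object ID, 24-bit version, 8-bit node ID) are assumptions chosen only to show the idea; the paper's actual layout is not reproduced here.

```python
import struct

# Assumed field widths for illustration only (not the paper's exact layout).
OBJ_ID_BITS, VERSION_BITS, NODE_BITS = 32, 24, 8  # 64 bits total


def encode_update(obj_id: int, version: int, node_id: int) -> bytes:
    """Pack a metadata update into one 64-bit word so it fits a fixed-size
    packet field and a single switch register entry."""
    assert obj_id < (1 << OBJ_ID_BITS)
    assert version < (1 << VERSION_BITS)
    assert node_id < (1 << NODE_BITS)
    word = (obj_id << (VERSION_BITS + NODE_BITS)) | (version << NODE_BITS) | node_id
    return struct.pack("!Q", word)  # network byte order


def decode_update(payload: bytes) -> tuple[int, int, int]:
    """Inverse of encode_update: recover (obj_id, version, node_id)."""
    (word,) = struct.unpack("!Q", payload)
    node_id = word & ((1 << NODE_BITS) - 1)
    version = (word >> NODE_BITS) & ((1 << VERSION_BITS) - 1)
    obj_id = word >> (VERSION_BITS + NODE_BITS)
    return obj_id, version, node_id
```

On a real P4 target this layout would live in header and register definitions rather than in Python, but the bit-packing arithmetic is the same.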
Results & Findings
| Workload | Latency Reduction | Throughput Gain |
|---|---|---|
| Write‑heavy KV store | ≈ 52 % (99th percentile) | ≈ 127 % |
| Distributed file system (small files) | 38 % | 94 % |
| Secondary index (range scans) | 31 % | 68 % |
Key observations
- Workload sensitivity: Benefits grow with the proportion of writes; read‑only workloads see negligible impact, as expected.
- Switch load: Even with modest SRAM (≈ 2 MiB), the switch can buffer thousands of metadata updates without saturating its pipeline.
- Failure resilience: In simulated switch failures, the system gracefully reverts to the classic ordered‑write path with no loss of consistency.
Practical Implications
- Faster write‑heavy services: Cloud databases, log‑structured caches, and object stores can cut a substantial share of per‑write latency, directly translating to lower tail latency for user‑facing APIs.
- Cost‑effective scaling: Instead of provisioning more powerful metadata servers, operators can invest in inexpensive programmable switches (e.g., Tofino) to achieve similar performance gains.
- Simplified client logic: Clients continue to use the standard read/write APIs; the visibility boost is transparent, requiring only a small library to encode metadata packets (see the client‑side sketch after this list).
- Potential for hybrid cloud: Edge or ISP switches could host the metadata buffer, bringing write visibility closer to the client and reducing cross‑region round‑trips.
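As a rough picture of what “transparent to clients” means in the item above, the sketch below keeps a conventional put/get interface and adds only a thin shim that builds the metadata packet. Every name here (`StorageClient`, `data_conn`, `switch_conn`, `meta_conn`, `send_async`) is hypothetical and not taken from the paper; it reuses `encode_update` from the earlier encoding sketch.

```python
class StorageClient:
    """Hypothetical client shim: the put/get surface is unchanged; only the
    metadata-packet construction (encode_update from the earlier sketch) is new."""

    def __init__(self, data_conn, switch_conn, meta_conn):
        self.data_conn = data_conn      # connection to the data node
        self.switch_conn = switch_conn  # path that traverses the programmable switch
        self.meta_conn = meta_conn      # reliable channel to the metadata service

    def put(self, obj_id: int, version: int, node_id: int, payload: bytes) -> None:
        self.data_conn.write(obj_id, payload)                           # 1. store the payload
        self.switch_conn.send(encode_update(obj_id, version, node_id))  # 2. data becomes visible
        self.meta_conn.send_async(obj_id, version, node_id)             # 3. durable metadata, off the critical path

    def get(self, obj_id: int) -> bytes:
        # Answered from the switch's cached entry when present; otherwise the
        # request falls back to the metadata service and the ordered path.
        return self.switch_conn.read(obj_id)
```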
Limitations & Future Work
- Switch resource constraints: The approach relies on a small amount of SRAM; extremely high write rates could cause cache thrashing, limiting scalability.
- Protocol complexity: Adding a control channel between metadata nodes and switches introduces extra engineering effort and debugging surface.
- Security & isolation: Exposing metadata to the data plane raises questions about access control and multi‑tenant isolation, which the paper only touches on.
- Future directions: The authors suggest exploring adaptive cache sizing, integrating with P4Runtime APIs for dynamic reconfiguration, and extending the model to persistent (SSD‑based) storage, where write latency is higher.
Authors
- Junru Li
- Qing Wang
- Zhe Yang
- Shuo Liu
- Jiwu Shu
- Youyou Lu
Paper Information
- arXiv ID: 2511.19978v1
- Categories: cs.DC, cs.DB
- Published: November 25, 2025
- PDF: https://arxiv.org/pdf/2511.19978v1