[Paper] Demystifying NVSHMEM: A System-Level Analysis on Symmetric Memory and Device-Initiated Operations in GPU Communication
Source: arXiv - 2606.05951v1
Overview
NVSHMEM is NVIDIA’s OpenSHMEM-based PGAS communication library for GPU clusters, enabling GPU-initiated, one-sided communication through symmetric memory. Despite its growing adoption, a system-level understanding of its design and behavior remains scattered across documentation, source code, and application experience. This paper presents a concise study of NVSHMEM’s programming model, implementation, and performance characteristics, focusing on symmetric memory, one-sided operations, and device-side collectives. We also examine DeepEP as a case study of NVSHMEM in performance-critical sparse deep learning workloads. Our analysis shows that NVSHMEM pioneered a device-side symmetric-memory programming model that enables fine-grained GPU-driven communication and is important for approaching the hardware performance limit. Overall, this work defines NVSHMEM’s role as a systems building block, highlights its design tradeoffs, and identifies opportunities for improving GPU communication runtimes.
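To make the terms "symmetric memory" and "one-sided operation" concrete, here is a toy, single-process Python model. It is not the NVSHMEM API and does not come from the paper. The class `SymmetricHeap` and the function `ring_shift` are hypothetical names invented for illustration. The model only loosely mirrors the roles of real calls such as `nvshmem_malloc` (collective allocation) and `nvshmem_int_p` (a put issued by the initiator alone).

```python
class SymmetricHeap:
    """Toy model of a symmetric heap (illustrative assumption, not NVSHMEM).

    Every processing element (PE) owns a heap of the same size, and
    allocation is collective, so a buffer sits at the same offset on all PEs.
    """

    def __init__(self, n_pes, heap_size):
        self.n_pes = n_pes
        self.heaps = [[0] * heap_size for _ in range(n_pes)]
        self.next_free = 0

    def malloc(self, count):
        # Collective allocation: one offset that is valid on every PE.
        offset = self.next_free
        if offset + count > len(self.heaps[0]):
            raise MemoryError("symmetric heap exhausted")
        self.next_free += count
        return offset

    def put(self, offset, values, target_pe):
        # One-sided write: the initiator names (offset, target PE);
        # the target takes no matching action such as posting a receive.
        self.heaps[target_pe][offset:offset + len(values)] = values

    def get(self, offset, count, source_pe):
        # One-sided read from another PE's copy of the buffer.
        return self.heaps[source_pe][offset:offset + count]


def ring_shift(n_pes):
    """Each PE writes its own rank into the next PE's buffer (ring order)."""
    heap = SymmetricHeap(n_pes, heap_size=8)
    dest = heap.malloc(1)
    for my_pe in range(n_pes):  # each iteration stands in for one PE
        peer = (my_pe + 1) % n_pes
        heap.put(dest, [my_pe], peer)
    return [heap.get(dest, 1, pe)[0] for pe in range(n_pes)]


if __name__ == "__main__":
    print(ring_shift(4))  # [3, 0, 1, 2]
```

In this sketch the same `dest` offset addresses a different physical buffer on each PE, which is the property that lets a sender target remote memory without exchanging addresses first. On real hardware the writes would be issued from inside GPU kernels and would need explicit synchronization before the results are read.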
Key Contributions
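Based on the abstract, the paper contributes the following:
- A concise study of NVSHMEM's programming model, implementation, and performance characteristics, focusing on symmetric memory, one-sided operations, and device-side collectives.
- A case study of DeepEP as an example of NVSHMEM in performance-critical sparse deep learning workloads.
- An account of NVSHMEM's role as a systems building block, its design tradeoffs, and opportunities for improving GPU communication runtimes.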
Methodology
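The abstract describes a system-level analysis of NVSHMEM, paired with DeepEP as an application case study. Please refer to the full paper for the detailed methodology.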
Practical Implications
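According to the abstract, NVSHMEM's device-side symmetric-memory model enables fine-grained GPU-driven communication and is important for approaching the hardware performance limit. This is most relevant to developers of GPU communication runtimes and of communication-heavy deep learning systems.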
Authors
- Yijun Ma
- Siyuan Shen
- Tiancheng Chen
- Akhil Langer
- Jiri Kraus
- Benjamin Glick
- Craig Belusar
- Jeff Hammond
- Torsten Hoefler
Paper Information
- arXiv ID: 2606.05951v1
- Categories: cs.DC (Distributed, Parallel, and Cluster Computing)
- Published: June 4, 2026
- PDF: Download PDF