[Paper] Demystifying NVSHMEM: A System-Level Analysis on Symmetric Memory and Device-Initiated Operations in GPU Communication

Published: June 4, 2026 at 05:50 AM EDT

Source: arXiv - 2606.05951v1

Overview

NVSHMEM is NVIDIA’s OpenSHMEM-based PGAS communication library for GPU clusters, enabling GPU-initiated, one-sided communication through symmetric memory. Despite its growing adoption, a system-level understanding of its design and behavior remains scattered across documentation, source code, and application experience. This paper presents a concise study of NVSHMEM’s programming model, implementation, and performance characteristics, focusing on symmetric memory, one-sided operations, and device-side collectives. We also examine DeepEP as a case study of NVSHMEM in performance-critical sparse deep learning workloads. Our analysis shows that NVSHMEM pioneered a device-side symmetric-memory programming model that enables fine-grained GPU-driven communication and is important for approaching the hardware performance limit. Overall, this work defines NVSHMEM’s role as a systems building block, highlights its design tradeoffs, and identifies opportunities for improving GPU communication runtimes.
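
To make the terms "symmetric memory" and "device-initiated, one-sided communication" concrete, here is a minimal sketch modeled on the ring-shift pattern commonly used to introduce NVSHMEM. It is not taken from the paper. The kernel name `ring_shift` is hypothetical, and the sketch assumes one GPU per processing element (PE) on each node and a launcher that matches the NVSHMEM bootstrap in use. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Hypothetical kernel name. Each PE writes its own rank into the
// symmetric buffer of the next PE in a ring. The put is issued from
// device code, and the target PE takes no matching action.
__global__ void ring_shift(int *dest) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;

    // One-sided scalar put: store `mype` at `dest` on PE `peer`.
    nvshmem_int_p(dest, mype, peer);
}

int main() {
    nvshmem_init();

    // Assumption: one GPU per PE, so the node-local rank selects the device.
    int mype_node = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);
    cudaSetDevice(mype_node);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Symmetric allocation: a collective call, so every PE obtains a
    // buffer that remote PEs can address through the same pointer value.
    int *dest = static_cast<int *>(nvshmem_malloc(sizeof(int)));

    ring_shift<<<1, 1, 0, stream>>>(dest);

    // Stream-ordered barrier: completes the puts and synchronizes all PEs
    // before the result is copied back to the host.
    nvshmemx_barrier_all_on_stream(stream);

    int msg = -1;
    cudaMemcpyAsync(&msg, dest, sizeof(int), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    printf("PE %d received %d\n", nvshmem_my_pe(), msg);

    nvshmem_free(dest);
    cudaStreamDestroy(stream);
    nvshmem_finalize();
    return 0;
}
```

Building this requires relocatable device code (`nvcc -rdc=true`) and linking against the NVSHMEM library. The point of the sketch is that the communication call sits inside the kernel: the GPU thread itself chooses the peer and issues the write, which is the fine-grained GPU-driven communication the overview describes.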

Key Contributions

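As summarized in the abstract, the paper:

  • Studies NVSHMEM’s programming model, implementation, and performance characteristics, focusing on symmetric memory, one-sided operations, and device-side collectives.
  • Examines DeepEP as a case study of NVSHMEM in performance-critical sparse deep learning workloads.
  • Defines NVSHMEM’s role as a systems building block, highlights its design tradeoffs, and identifies opportunities for improving GPU communication runtimes.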

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

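According to the abstract, device-side symmetric-memory communication is important for approaching the hardware performance limit, and the paper identifies opportunities for improving GPU communication runtimes.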

Authors

  • Yijun Ma
  • Siyuan Shen
  • Tiancheng Chen
  • Akhil Langer
  • Jiri Kraus
  • Benjamin Glick
  • Craig Belusar
  • Jeff Hammond
  • Torsten Hoefler

Paper Information

  • arXiv ID: 2606.05951v1
  • Categories: cs.DC
  • Published: June 4, 2026