[Paper] PCCL: Process Group-Aware Scalable and Generic Collective Algorithm Synthesizer

Published: June 5, 2026 at 04:08 AM EDT

Source: arXiv - 2606.07019v1

Overview

Distributed machine learning has become increasingly important as generative models have grown to massive scale. Both model parameters and data are distributed across many compute devices, which requires frequent collective communication to synchronize activations and parameter updates, and this communication has become a major bottleneck. Although the performance of a collective algorithm depends on the physical network topology, the baseline collective algorithms in collective communication libraries are largely topology-agnostic. Collective algorithm synthesizers address this inefficiency by automatically generating topology-aware collective algorithms. However, prior work has largely overlooked the fact that collective communication typically occurs among only a subset of devices, known as a process group. In addition, most existing synthesizers support only a limited range of target collective patterns.

We propose PCCL, a scalable and generic framework for synthesizing topology-aware collective algorithms. PCCL is process group-aware: it generates near-optimal collective algorithms even when only a subset of devices participates in a collective operation. PCCL also synthesizes arbitrary collective patterns; for example, it synthesizes a 512-NPU All-to-All algorithm in 11.68 minutes.
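To make the process-group point concrete, here is a small sketch of why a topology-agnostic collective can be costly when only a subset of devices participates. It assumes a hypothetical 4x4 2D mesh of 16 NPUs with row-major rank numbering and XY routing, a four-member process group at the mesh corners, and a simple ring among the group members; link contention and bandwidth are ignored, so hop count is only a rough proxy for cost. This is not PCCL's algorithm. It merely compares a ring in rank order (topology-agnostic) with the best ring order found by brute force, and every name in it is hypothetical.

```python
import itertools

# Illustration only, not PCCL's algorithm. Assumed (hypothetical) setup:
# 16 NPUs in a 4x4 2D mesh, row-major rank numbering, XY routing,
# link contention and bandwidth ignored.
MESH_DIM = 4


def coords(rank: int) -> tuple[int, int]:
    """Map a rank to its (row, col) position in the row-major mesh."""
    return divmod(rank, MESH_DIM)


def hops(a: int, b: int) -> int:
    """Hops between two NPUs under XY routing (Manhattan distance)."""
    (r1, c1), (r2, c2) = coords(a), coords(b)
    return abs(r1 - r2) + abs(c1 - c2)


def ring_cost(order: list[int]) -> tuple[int, int]:
    """(total hops, worst single step) for a ring that visits `order` and wraps around."""
    steps = [hops(order[i], order[(i + 1) % len(order)]) for i in range(len(order))]
    return sum(steps), max(steps)


def best_ring(group: list[int]) -> list[int]:
    """Brute-force the ring order with the fewest total hops (ties: smaller worst step).

    Fixes the first member to skip rotations; cost is (n-1)!, so tiny groups only.
    """
    first, rest = group[0], group[1:]
    candidates = ([first, *perm] for perm in itertools.permutations(rest))
    return min(candidates, key=ring_cost)


if __name__ == "__main__":
    group = [0, 3, 12, 15]  # hypothetical process group at the four mesh corners
    print("rank-order ring:", group, ring_cost(group))
    best = best_ring(group)
    print("best ring:      ", best, ring_cost(best))
```

For the group at ranks 0, 3, 12, and 15, the rank-order ring needs 18 hops in total with a worst step of 6 hops, because consecutive ranks 3 and 12 sit at opposite corners. The best order, [0, 3, 15, 12], needs 12 hops with no step longer than 3. Brute force grows factorially with group size, so it is useful only for illustration; the abstract's 512-NPU result comes from PCCL's own synthesis method, which this sketch does not reproduce.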

Key Contributions

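Based on the abstract, the paper makes the following contributions:

  • PCCL, a framework for synthesizing topology-aware collective algorithms for distributed machine learning.
  • Process group awareness: PCCL generates near-optimal collective algorithms even when only a subset of devices (a process group) participates in a collective operation, a case that prior synthesizers have largely overlooked.
  • Generality: PCCL synthesizes arbitrary collective patterns, whereas most existing synthesizers support a limited range of target patterns.
  • Scalability: PCCL synthesizes a 512-NPU All-to-All algorithm in 11.68 minutes.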

Methodology

Please refer to the full paper for detailed methodology.
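As background for the claim that PCCL synthesizes arbitrary collective patterns, the sketch below shows one way a synthesizer can encode a collective: a precondition (which chunks each NPU holds before the collective) and a postcondition (which chunks each NPU must hold afterward). This encoding and all function names are assumptions for illustration, not PCCL's representation or API. The sketch only specifies what must be delivered; it does not synthesize an algorithm.

```python
# Illustration only: one possible encoding of collectives as pre/postconditions.
# Names and representation are hypothetical, not PCCL's API.

Condition = dict[int, frozenset]  # NPU id -> chunks it holds (pre) or must hold (post)


def all_gather(group: list[int]) -> tuple[Condition, Condition]:
    """Each member starts with its own chunk and must end with every member's chunk."""
    every_chunk = frozenset(group)  # chunk id == id of the NPU that owns it
    pre = {npu: frozenset({npu}) for npu in group}
    post = {npu: every_chunk for npu in group}
    return pre, post


def all_to_all(group: list[int]) -> tuple[Condition, Condition]:
    """Chunk (src, dst) starts on src and must end on dst."""
    pre = {src: frozenset((src, dst) for dst in group) for src in group}
    post = {dst: frozenset((src, dst) for src in group) for dst in group}
    return pre, post


def pending_deliveries(pre: Condition, post: Condition) -> int:
    """Count (chunk, NPU) pairs the postcondition requires but the precondition lacks."""
    return sum(len(post[npu] - pre.get(npu, frozenset())) for npu in post)


def distinct_chunks(post: Condition) -> int:
    """Number of distinct chunks named anywhere in the postcondition."""
    return len(frozenset().union(*post.values()))


if __name__ == "__main__":
    # All-Gather within a hypothetical 4-member process group out of 8 NPUs.
    pre, post = all_gather([0, 2, 4, 6])
    print("All-Gather, group of 4:", pending_deliveries(pre, post), "deliveries,",
          distinct_chunks(post), "distinct chunks")
    # All-to-All across 512 NPUs, the scale quoted in the abstract.
    pre, post = all_to_all(list(range(512)))
    print("All-to-All, 512 NPUs:", pending_deliveries(pre, post), "deliveries,",
          distinct_chunks(post), "distinct chunks")
```

Both collectives require N(N-1) deliveries (12 for the 4-member All-Gather, 261,632 for the 512-NPU All-to-All). However, an All-Gather has only N distinct chunks, and each can be replicated along the way. In contrast, every All-to-All chunk has a single destination, giving 512 × 512 = 262,144 distinct chunks, 512 of which already sit at their destination. In this encoding only process-group members appear in the conditions; whether NPUs outside the group may relay traffic is a synthesizer design choice that this summary does not specify.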

Practical Implications

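Collective communication is a major bottleneck in distributed machine learning, and the baseline algorithms shipped in collective communication libraries are largely topology-agnostic. Collectives typically run within process groups rather than across all devices. A synthesizer that produces near-optimal, topology-aware algorithms for such subsets, for arbitrary collective patterns and at the scale of hundreds of NPUs, therefore targets the common case in practice. The work falls under cs.DC (Distributed, Parallel, and Cluster Computing).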

Authors

  • William Won
  • Kartik Lakhotia
  • Madhu Kumar
  • Sudarshan Srinivasan
  • Tushar Krishna

Paper Information

  • arXiv ID: 2606.07019v1
  • Categories: cs.DC
  • Published: June 5, 2026