[Paper] Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization

Published: June 5, 2026 at 01:51 PM EDT

Source: arXiv - 2606.07496v1

Overview

Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, where agents communicate only with their neighbors and no central coordinator is required. For strongly convex problems, communication efficiency is mainly determined by the condition number $\kappa = L/\mu$ and the network spectral gap $1-\beta$. Although deterministic decentralized methods can simultaneously achieve the accelerated $\sqrt{\kappa}$ and $1/\sqrt{1-\beta}$ dependences, no existing stochastic method attains both improvements at once.

In this paper, we propose Multi-Gossip Accelerated DSGD (MG-ADSGD), a decentralized stochastic algorithm that combines Nesterov-type primal-dual extrapolation with multi-round fast gossip averaging. The key idea is to couple the gossip depth with the mini-batch size so that additional communication rounds simultaneously improve consensus accuracy and reduce gradient variance. We show that MG-ADSGD achieves the communication complexity

$$
\widetilde{\mathcal{O}}\!\left( \frac{\sigma^2}{\mu n \varepsilon}\log\frac{1}{\varepsilon} + \sqrt{\frac{\kappa}{1-\beta}}\,\log\frac{1}{\varepsilon} \right),
$$

where $\varepsilon$ denotes the target accuracy, $n$ is the number of nodes, and $\sigma^2$ is the gradient variance. To the best of our knowledge, this bound yields the best currently available communication complexity for decentralized stochastic strongly convex optimization, up to logarithmic factors that are independent of $\varepsilon$.
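The abstract does not spell out the gossip routine, so the following is only a sketch of what "multi-round fast gossip averaging" commonly refers to: K rounds of momentum-accelerated gossip (the recursion often called FastMix), compared with K rounds of plain gossip $X \leftarrow WX$. The ring topology, the 1/3 mixing weights, and the helper names `ring_mixing_matrix`, `second_eigenvalue_modulus`, `plain_gossip`, `fast_gossip`, and `consensus_error` are illustrative assumptions, not the authors' code, and the coupling of gossip depth to mini-batch size is not modeled. Here $\beta$ is taken to be the second-largest eigenvalue modulus of a symmetric, doubly stochastic mixing matrix $W$.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 nodes.

    Each node keeps weight 1/3 and gives weight 1/3 to each of its two
    neighbors (an illustrative choice, not taken from the paper).
    """
    if n < 3:
        raise ValueError("a ring needs at least 3 nodes")
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W


def second_eigenvalue_modulus(W: np.ndarray) -> float:
    """beta: second-largest |eigenvalue| of a symmetric mixing matrix W."""
    moduli = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return float(moduli[1])


def plain_gossip(X: np.ndarray, W: np.ndarray, K: int) -> np.ndarray:
    """K rounds of standard gossip X <- W X (row i holds node i's vector)."""
    for _ in range(K):
        X = W @ X
    return X


def fast_gossip(X: np.ndarray, W: np.ndarray, K: int) -> np.ndarray:
    """K rounds of momentum-accelerated gossip (FastMix-style recursion).

    X_{k+1} = (1 + eta) W X_k - eta X_{k-1}, started from X_{-1} = X_0 = X,
    with eta = (1 - sqrt(1 - beta^2)) / (1 + sqrt(1 - beta^2)).
    """
    beta = second_eigenvalue_modulus(W)
    root = np.sqrt(1.0 - beta**2)
    eta = (1.0 - root) / (1.0 + root)
    X_prev, X_cur = X.copy(), X.copy()
    for _ in range(K):
        X_next = (1.0 + eta) * (W @ X_cur) - eta * X_prev
        X_prev, X_cur = X_cur, X_next
    return X_cur


def consensus_error(X: np.ndarray) -> float:
    """Frobenius distance of the node vectors from their network average."""
    return float(np.linalg.norm(X - X.mean(axis=0, keepdims=True)))


if __name__ == "__main__":
    n, K = 20, 20
    W = ring_mixing_matrix(n)
    X0 = np.arange(n, dtype=float).reshape(-1, 1)  # node i starts at value i
    print("beta           =", second_eigenvalue_modulus(W))
    print("initial error  =", consensus_error(X0))
    print("plain gossip   =", consensus_error(plain_gossip(X0, W, K)))
    print("fast gossip    =", consensus_error(fast_gossip(X0, W, K)))
```

Because $W$ is symmetric and doubly stochastic, both routines preserve the network average. The momentum weight makes the disagreement contract at rate about $\sqrt{\eta} \approx 1 - \sqrt{2(1-\beta)}$ per round instead of $\beta$, so the number of rounds needed for a fixed consensus accuracy scales like $1/\sqrt{1-\beta}$ rather than $1/(1-\beta)$, which is the kind of network dependence that appears in the bound above.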

Key Contributions

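The paper's main contributions, as stated in the abstract, are:

MG-ADSGD (Multi-Gossip Accelerated DSGD), a decentralized stochastic algorithm that combines Nesterov-type primal-dual extrapolation with multi-round fast gossip averaging.
A coupling of gossip depth with mini-batch size, so that additional communication rounds simultaneously improve consensus accuracy and reduce gradient variance.
A communication complexity bound that attains the accelerated $\sqrt{\kappa}$ and $1/\sqrt{1-\beta}$ dependences at the same time, which the authors describe as the best currently available for decentralized stochastic strongly convex optimization, up to logarithmic factors independent of $\varepsilon$.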

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

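The result matters most when communication is the bottleneck: for ill-conditioned problems (large $\kappa$) or sparsely connected networks (small spectral gap $1-\beta$), the network-dependent term of the bound grows only like $\sqrt{\kappa/(1-\beta)}$, while the variance term $\sigma^2/(\mu n \varepsilon)$ decreases as more nodes $n$ participate.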

Authors

Ming Sun
Kun Yuan

Paper Information

arXiv ID: 2606.07496v1
Categories: cs.LG, math.OC
Published: June 5, 2026
