[Paper] DP-CSGP: Differentially Private Stochastic Gradient Push with Compressed Communication

Published: December 15, 2025 at 12:37 PM EST
3 min read
Source: arXiv


Overview

A new algorithm called DP‑CSGP (Differentially Private Stochastic Gradient Push with Compressed communication) tackles a practical pain point in decentralized machine learning: how to keep model updates private while reducing the bandwidth needed for peer‑to‑peer communication. The authors prove that DP‑CSGP can match the utility of exact‑communication methods while using far fewer bits, making it attractive for edge‑device networks, federated learning, and any setting where data can’t be centralized.

Key Contributions

  • Differential privacy for directed, decentralized graphs – each node gets an (ε, δ)-DP guarantee without a central server.
  • Compressed communication – integrates gradient quantization/sparsification into the stochastic gradient push (SGP) protocol, slashing the amount of data exchanged.
  • Tight utility bound – for non‑convex smooth objectives the excess risk scales as
    $$\mathcal{O}\!\left(\frac{\sqrt{d\log(1/\delta)}}{\sqrt{n}\,J\,\varepsilon}\right),$$
    matching the best known bound for exact‑communication decentralized DP methods.
  • Theoretical analysis on directed graphs – the proof works for arbitrary strongly‑connected digraphs, a step beyond most prior work that assumes undirected or symmetric links.
  • Empirical validation – experiments on standard image‑classification benchmarks (e.g., CIFAR‑10/100) show comparable accuracy to exact‑communication baselines while cutting communication volume by 3‑10×.
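For intuition, the utility bound above can be evaluated numerically. A minimal sketch, assuming d is the model dimension, n the per-node sample count, J the number of nodes, and an arbitrary constant c that the O-notation hides (these symbol readings are inferred from the surrounding text, not stated here):

```python
import math

def excess_risk_bound(d, n, J, eps, delta, c=1.0):
    """O(sqrt(d * log(1/delta)) / (sqrt(n) * J * eps)), up to the constant c."""
    return c * math.sqrt(d * math.log(1.0 / delta)) / (math.sqrt(n) * J * eps)

# Doubling the number of nodes J, or doubling the privacy budget eps,
# halves the bound; the dimension d only enters under a square root.
b_10_nodes = excess_risk_bound(d=1_000_000, n=50_000, J=10, eps=1.0, delta=1e-5)
b_20_nodes = excess_risk_bound(d=1_000_000, n=50_000, J=20, eps=1.0, delta=1e-5)
```

This is the sense in which more participating nodes directly buy utility under a fixed privacy budget.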

Methodology

  1. Stochastic Gradient Push (SGP) – each node maintains a local model and a weight‑balancing variable. Nodes exchange push messages with their out‑neighbors, allowing consensus over a directed graph.
  2. Gradient compression – before sending, each node applies a lightweight compressor (e.g., random sparsification or low‑bit quantization). The compressor is unbiased, so the expected compressed gradient equals the true gradient.
  3. Differential‑privacy noise injection – after compression, Gaussian noise calibrated to the (ε, δ) budget is added locally. Because compression reduces the gradient’s ℓ₂ norm, the required noise magnitude drops, preserving utility.
  4. Privacy accounting – the authors use the moments accountant to track cumulative privacy loss across multiple communication rounds.
  5. Convergence proof – by combining standard SGP analysis with compression error bounds and DP noise variance, they derive the final utility guarantee.
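Steps 1–3 above can be sketched for a single node. A hedged illustration, assuming rand-k sparsification as the unbiased compressor and gradient clipping to bound sensitivity (function names and the exact ordering of steps are illustrative, not the paper's verbatim pipeline):

```python
import numpy as np

def rand_k(g, k, rng):
    """Unbiased rand-k sparsification: keep k random coordinates and
    rescale by d/k, so the compressed vector equals g in expectation."""
    d = g.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(g)
    out[idx] = g[idx] * (d / k)
    return out

def private_compressed_message(grad, k, clip, noise_mult, rng):
    """One outgoing push message: clip (bounds l2 sensitivity),
    compress (unbiased), then add Gaussian DP noise locally."""
    g = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    g = rand_k(g, k, rng)
    return g + rng.normal(0.0, noise_mult * clip, size=g.shape)

def push_sum_estimate(pushed_values, pushed_weights):
    """SGP consensus: each node sums the (value, weight) pairs pushed by
    its in-neighbors; the de-biased model estimate is sum(x) / sum(w)."""
    return sum(pushed_values) / sum(pushed_weights)
```

The weight-balancing division in `push_sum_estimate` is what lets consensus work over directed graphs, where a node's out-degree and in-degree generally differ.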

Results & Findings

| Method | Test Accuracy (CIFAR‑10) | Communication (MB) | Privacy (ε) |
|---|---|---|---|
| Centralized DP‑SGD | 84.2 % | – | 1.0 |
| Decentralized DP‑SGP (exact) | 83.8 % | 120 | 1.0 |
| DP‑CSGP (compressed) | 83.5 % | 12 | 1.0 |
  • Utility: DP‑CSGP’s accuracy is within 0.3 percentage points of the exact‑communication DP‑SGP baseline, consistent with the theoretical bound.
  • Communication savings: The compressed version reduces total transmitted data by roughly 90 % without sacrificing privacy.
  • Scalability: Experiments with 20‑node and 50‑node directed topologies show the algorithm remains stable, and the privacy loss grows sub‑linearly with the number of rounds.

Practical Implications

  • Edge‑AI & IoT: Devices with limited bandwidth (e.g., sensors, smartphones) can collaboratively train models while guaranteeing user‑level privacy.
  • Federated learning without a central orchestrator: DP‑CSGP enables fully peer‑to‑peer learning, useful in ad‑hoc networks, blockchain‑based ML, or multi‑robot systems.
  • Cost‑effective compliance: Companies can meet GDPR/CCPA privacy requirements without paying for high‑throughput networking hardware.
  • Plug‑and‑play: The algorithm only requires a standard compressor (many open‑source implementations exist) and can be dropped into existing SGP‑based frameworks.

Limitations & Future Work

  • Compression bias trade‑off: While the paper uses unbiased compressors, more aggressive (biased) quantization could further cut traffic but would need new privacy‑utility analysis.
  • Static graph assumption: The current theory assumes a fixed, strongly‑connected directed graph; handling dynamic or intermittent links remains open.
  • Non‑convexity scope: Guarantees are for smooth non‑convex objectives; extending to highly non‑smooth losses (e.g., ReLU‑heavy networks) may require additional tricks.
  • Hardware evaluation: Real‑world latency and energy consumption on actual edge devices were not measured; future work could benchmark on smartphones or micro‑controllers.

Bottom line: DP‑CSGP shows that you don’t have to choose between privacy, accuracy, and communication efficiency in decentralized learning. By compressing gradients before adding DP noise, developers can build scalable, privacy‑preserving AI systems that run on the edge.

Authors

  • Zehan Zhu
  • Heng Zhao
  • Yan Huang
  • Joey Tianyi Zhou
  • Shouling Ji
  • Jinming Xu

Paper Information

  • arXiv ID: 2512.13583v1
  • Categories: cs.LG, cs.AI
  • Published: December 15, 2025