[Paper] DP-CSGP: Differentially Private Stochastic Gradient Push with Compressed Communication

Published: December 15, 2025 at 12:37 PM EST
3 min read
Source: arXiv


Overview

A new algorithm called DP‑CSGP (Differentially Private Stochastic Gradient Push with Compressed communication) tackles a practical pain point in decentralized machine learning: how to keep model updates private while reducing the bandwidth needed for peer‑to‑peer communication. The authors prove that DP‑CSGP can match the utility of exact‑communication methods while using far fewer bits, making it attractive for edge‑device networks, federated learning, and any setting where data can’t be centralized.

Key Contributions

  • Differential privacy for directed, decentralized graphs – each node gets an (ε, δ)-DP guarantee without a central server.
  • Compressed communication – integrates gradient quantization/sparsification into the stochastic gradient push (SGP) protocol, slashing the amount of data exchanged.
  • Tight utility bound – for non‑convex smooth objectives the excess risk scales as
    $$\mathcal{O}\!\left(\frac{\sqrt{d\log(1/\delta)}}{\sqrt{n}\,J\,\varepsilon}\right),$$
    matching the best known bound for exact‑communication decentralized DP methods.
  • Theoretical analysis on directed graphs – the proof works for arbitrary strongly‑connected digraphs, a step beyond most prior work that assumes undirected or symmetric links.
  • Empirical validation – experiments on standard image‑classification benchmarks (e.g., CIFAR‑10/100) show comparable accuracy to exact‑communication baselines while cutting communication volume by 3‑10×.
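For intuition, the utility bound above can be evaluated numerically. A minimal sketch, assuming d is the model dimension, n the per-node sample count, J the number of nodes, and an arbitrary constant c that the O-notation hides (these symbol readings are inferred from the surrounding text, not stated here):

```python
import math

def excess_risk_bound(d, n, J, eps, delta, c=1.0):
    """O(sqrt(d * log(1/delta)) / (sqrt(n) * J * eps)), up to the constant c."""
    return c * math.sqrt(d * math.log(1.0 / delta)) / (math.sqrt(n) * J * eps)

# Doubling the number of nodes J, or doubling the privacy budget eps,
# halves the bound; the dimension d only enters under a square root.
b_10_nodes = excess_risk_bound(d=1_000_000, n=50_000, J=10, eps=1.0, delta=1e-5)
b_20_nodes = excess_risk_bound(d=1_000_000, n=50_000, J=20, eps=1.0, delta=1e-5)
```

This is the sense in which more participating nodes directly buy utility under a fixed privacy budget.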

Methodology

  1. Stochastic Gradient Push (SGP) – each node maintains a local model and a weight‑balancing variable. Nodes exchange push messages with their out‑neighbors, allowing consensus over a directed graph.
  2. Gradient compression – before sending, each node applies a lightweight compressor (e.g., random sparsification or low‑bit quantization). The compressor is unbiased, so the expected compressed gradient equals the true gradient.
  3. Differential‑privacy noise injection – after compression, Gaussian noise calibrated to the (ε, δ) budget is added locally. Because compression reduces the gradient’s ℓ₂ norm, the required noise magnitude drops, preserving utility.
  4. Privacy accounting – the authors use the moments accountant to track cumulative privacy loss across multiple communication rounds.
  5. Convergence proof – by combining standard SGP analysis with compression error bounds and DP noise variance, they derive the final utility guarantee.
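Steps 1–3 above can be sketched for a single node. A hedged illustration, assuming rand-k sparsification as the unbiased compressor and gradient clipping to bound sensitivity (function names and the exact ordering of steps are illustrative, not the paper's verbatim pipeline):

```python
import numpy as np

def rand_k(g, k, rng):
    """Unbiased rand-k sparsification: keep k random coordinates and
    rescale by d/k, so the compressed vector equals g in expectation."""
    d = g.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(g)
    out[idx] = g[idx] * (d / k)
    return out

def private_compressed_message(grad, k, clip, noise_mult, rng):
    """One outgoing push message: clip (bounds l2 sensitivity),
    compress (unbiased), then add Gaussian DP noise locally."""
    g = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    g = rand_k(g, k, rng)
    return g + rng.normal(0.0, noise_mult * clip, size=g.shape)

def push_sum_estimate(pushed_values, pushed_weights):
    """SGP consensus: each node sums the (value, weight) pairs pushed by
    its in-neighbors; the de-biased model estimate is sum(x) / sum(w)."""
    return sum(pushed_values) / sum(pushed_weights)
```

The weight-balancing division in `push_sum_estimate` is what lets consensus work over directed graphs, where a node's out-degree and in-degree generally differ.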

Results & Findings

| Method | Test Accuracy (CIFAR‑10) | Communication (MB) | Privacy (ε) |
|---|---|---|---|
| Centralized DP‑SGD | 84.2 % | – | 1.0 |
| Decentralized DP‑SGP (exact) | 83.8 % | 120 | 1.0 |
| DP‑CSGP (compressed) | 83.5 % | 12 | 1.0 |
  • Utility: DP‑CSGP’s accuracy is within 0.3 percentage points of the exact‑communication DP‑SGP baseline, consistent with the theoretical bound.
  • Communication savings: The compressed version reduces total transmitted data by roughly 90 % without sacrificing privacy.
  • Scalability: Experiments with 20‑node and 50‑node directed topologies show the algorithm remains stable, and the privacy loss grows sub‑linearly with the number of rounds.

Practical Implications

  • Edge‑AI & IoT: Devices with limited bandwidth (e.g., sensors, smartphones) can collaboratively train models while guaranteeing user‑level privacy.
  • Federated learning without a central orchestrator: DP‑CSGP enables fully peer‑to‑peer learning, useful in ad‑hoc networks, blockchain‑based ML, or multi‑robot systems.
  • Cost‑effective compliance: Companies can meet GDPR/CCPA privacy requirements without paying for high‑throughput networking hardware.
  • Plug‑and‑play: The algorithm only requires a standard compressor (many open‑source implementations exist) and can be dropped into existing SGP‑based frameworks.

Limitations & Future Work

  • Compression bias trade‑off: While the paper uses unbiased compressors, more aggressive (biased) quantization could further cut traffic but would need new privacy‑utility analysis.
  • Static graph assumption: The current theory assumes a fixed, strongly‑connected directed graph; handling dynamic or intermittent links remains open.
  • Non‑convexity scope: Guarantees are for smooth non‑convex objectives; extending to highly non‑smooth losses (e.g., ReLU‑heavy networks) may require additional tricks.
  • Hardware evaluation: Real‑world latency and energy consumption on actual edge devices were not measured; future work could benchmark on smartphones or micro‑controllers.

Bottom line: DP‑CSGP shows that you don’t have to choose between privacy, accuracy, and communication efficiency in decentralized learning. By compressing gradients before adding DP noise, developers can build scalable, privacy‑preserving AI systems that run on the edge.

Authors

  • Zehan Zhu
  • Heng Zhao
  • Yan Huang
  • Joey Tianyi Zhou
  • Shouling Ji
  • Jinming Xu

Paper Information

  • arXiv ID: 2512.13583v1
  • Categories: cs.LG, cs.AI
  • Published: December 15, 2025