[Paper] Delta Sum Learning: an approach for fast and global convergence in Gossip Learning
Source: arXiv - 2512.01549v1
Overview
The paper introduces Delta Sum Learning, a novel aggregation technique for Gossip‑based federated learning that dramatically improves global model convergence while keeping the communication overhead low. By coupling this method with a declarative, Kubernetes‑style orchestration layer, the authors demonstrate how edge devices can collaboratively train models at scale without a central server.
Key Contributions
- Delta Sum aggregation: a lightweight, delta‑based summation rule that replaces the traditional averaging step in Gossip Learning.
- Decentralized orchestration framework: built on the Open Application Model (OAM), enabling dynamic node discovery and intent‑driven deployment of learning workloads via standard manifests.
- Empirical evaluation: shows comparable performance to existing methods on small (10‑node) topologies and a 58 % reduction in global accuracy loss when scaling to 50 nodes.
- Scalability analysis: demonstrates a logarithmic degradation of accuracy with increasing network size, versus the linear drop observed with classic gossip averaging.
Methodology
Delta Sum Learning
- Each node maintains a local model and a delta vector that captures the difference between its current model and the last received update.
- When two peers exchange information, they sum their deltas instead of averaging full model parameters.
- The summed delta is applied locally, and the original delta is reset, ensuring that only new information propagates through the network (see the sketch after this list).
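A minimal NumPy sketch of this exchange rule, under one plausible reading of the description above; the `Node` class, its method names, and the accumulate-then-reset bookkeeping are illustrative assumptions, not the authors' reference implementation:

```python
import numpy as np

class Node:
    """Toy gossip node: a flat parameter vector plus a delta buffer.
    Names and structure are illustrative, not the paper's API."""

    def __init__(self, dim: int, rng: np.random.Generator):
        self.model = rng.normal(size=dim)   # local model parameters
        self.delta = np.zeros(dim)          # changes since the last exchange

    def local_step(self, grad: np.ndarray, lr: float = 0.1) -> None:
        # Ordinary local update, also accumulated into the delta so that
        # peers later receive only the *new* information.
        update = -lr * grad
        self.model += update
        self.delta += update

def delta_sum_exchange(a: Node, b: Node) -> None:
    # Each peer sends its delta; both models end up containing the *sum*
    # of the two deltas (each already held its own), and the buffers are
    # reset so only fresh updates propagate onward.
    da, db = a.delta.copy(), b.delta.copy()
    a.model += db
    b.model += da
    a.delta[:] = 0.0
    b.delta[:] = 0.0
```

Under this reading, one exchange costs a single delta vector per direction, which is where the bandwidth savings reported below come from.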
Decentralized Orchestration (OAM‑based)
- Learning tasks are described in OAM manifests (similar to Kubernetes YAML); a hypothetical manifest is sketched after this list.
- A lightweight discovery protocol lets nodes join or leave the gossip overlay automatically.
- The orchestrator translates intents (e.g., “train a CNN on edge cameras”) into concrete deployments of the Delta Sum learner on each participating device.
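As a concrete illustration of the intent-driven deployment step, the sketch below builds a hypothetical OAM-style Application manifest as a Python dict mirroring the YAML structure. The `apiVersion` and `kind` values follow the public OAM spec; every other value (component name, type, image, environment) is an invented placeholder, since the paper does not reproduce its manifests.

```python
import json

# Hypothetical OAM Application manifest for a Delta Sum learner,
# mirrored as a Python dict. All concrete values are placeholders.
manifest = {
    "apiVersion": "core.oam.dev/v1beta1",   # public OAM spec version
    "kind": "Application",
    "metadata": {"name": "delta-sum-training"},
    "spec": {
        "components": [{
            "name": "delta-sum-learner",
            "type": "webservice",            # assumed component type
            "properties": {
                "image": "registry.example.org/delta-sum-learner:latest",
                "env": [{"name": "GOSSIP_FANOUT", "value": "2"}],
            },
        }],
    },
}

# A real orchestrator would consume the YAML form of this document.
print(json.dumps(manifest, indent=2))
```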
Experimental Setup
- Simulated gossip networks of 10, 30, and 50 nodes using standard image classification benchmarks (e.g., CIFAR‑10); a toy stand-in for this protocol is sketched after this list.
- Baselines: classic gossip averaging and Federated Averaging (FedAvg).
- Metrics: convergence speed (epochs to reach a target loss), final global accuracy, and communication volume.
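A toy stand-in for this protocol, replacing CIFAR-10 and CNNs with a synthetic quadratic task so it runs in milliseconds; the node count, learning rate, and round budget below are arbitrary choices, not the paper's settings:

```python
import numpy as np

# Toy stand-in for the experimental protocol: N gossiping nodes
# optimizing a synthetic quadratic task instead of CIFAR-10/CNNs.
rng = np.random.default_rng(0)
N, DIM, ROUNDS, LR = 10, 5, 300, 0.05
target = rng.normal(size=DIM)          # shared optimum all nodes seek
models = rng.normal(size=(N, DIM))     # one model vector per node
deltas = np.zeros((N, DIM))            # per-node "new information" buffers

for _ in range(ROUNDS):
    # Local SGD step on 0.5 * ||m - target||^2, accumulated into deltas.
    updates = -LR * (models - target)
    models += updates
    deltas += updates
    # One random pair gossips: apply the peer's delta, then reset both
    # buffers so only fresh updates propagate (the Delta Sum rule).
    i, j = rng.choice(N, size=2, replace=False)
    models[i] += deltas[j]
    models[j] += deltas[i]
    deltas[i] = deltas[j] = 0.0

# Convergence metric from the setup: mean distance to the optimum.
print("mean distance:", float(np.mean(np.linalg.norm(models - target, axis=1))))
```

Swapping the two `models[...] += deltas[...]` lines for a plain average of the pair's parameters gives the gossip-averaging baseline, so the same harness can probe both methods.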
Results & Findings
| Topology | Gossip Averaging Accuracy Drop | Delta Sum Accuracy Drop | Relative Improvement |
|---|---|---|---|
| 10 nodes | 2.1 % | 2.0 % | ≈ 5 % (negligible) |
| 30 nodes | 7.8 % | 4.5 % | 42 % reduction |
| 50 nodes | 12.4 % | 5.2 % | 58 % reduction |
- Convergence speed: Delta Sum reaches the same loss threshold ~1.3× faster on 50‑node graphs.
- Communication overhead: Because only deltas are exchanged, bandwidth usage drops by ~15 % compared with full‑model averaging.
- Scalability trend: Accuracy loss grows logarithmically with node count for Delta Sum, while the classic approach shows a near‑linear degradation, confirming the method’s robustness under limited connectivity.
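A rough back-of-the-envelope check (not a fit from the paper) shows the table above is consistent with these trends; the 10-node points are noisier:

```latex
% Baseline: accuracy drop grows near-linearly with node count n
\frac{7.8}{30} \approx \frac{12.4}{50} \approx 0.25 \text{ points per node}

% Delta Sum: accuracy drop grows roughly with log n
\frac{4.5}{\log_{10} 30} \approx \frac{5.2}{\log_{10} 50} \approx 3.0 \text{ points per decade}
```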
Practical Implications
- Edge AI deployments: Developers can embed learning workloads directly into IoT fleets (e.g., smart cameras, wearables) without provisioning a central parameter server.
- Kubernetes‑style roll‑outs: Using OAM manifests means existing CI/CD pipelines can provision, update, or roll back learning jobs across heterogeneous devices just like any microservice.
- Reduced bandwidth costs: Delta‑only exchanges are ideal for networks with constrained uplink/downlink (cellular, LPWAN), extending battery life and lowering data‑plan expenses.
- Fault tolerance: Since the aggregation is fully peer‑to‑peer, node churn (devices joining/leaving) does not break training, making the approach suitable for highly dynamic edge environments.
Limitations & Future Work
- Model size sensitivity: The study focused on moderate‑sized CNNs; very large transformer‑style models may still incur significant delta payloads.
- Security considerations: While gossip removes a central server, the paper does not address Byzantine or malicious peers; integrating robust aggregation (e.g., Krum) with Delta Sum is an open question.
- Real‑world deployment: Experiments were conducted in simulated networks; future work includes field trials on heterogeneous hardware (ARM, GPUs) and heterogeneous network conditions (5G, Wi‑Fi, BLE).
Delta Sum Learning bridges the gap between the theoretical appeal of fully decentralized federated learning and the practical needs of developers building scalable, edge‑centric AI services.
Authors
- Tom Goethals
- Merlijn Sebrechts
- Stijn De Schrijver
- Filip De Turck
- Bruno Volckaert
Paper Information
- arXiv ID: 2512.01549v1
- Categories: cs.DC, cs.AI
- Published: December 1, 2025