[Paper] Dynamic Topology Optimization for Non-IID Data in Decentralized Learning
Source: arXiv - 2602.03383v1
Overview
Decentralized learning (DL) lets a fleet of devices train a shared model without a central server, which is great for privacy‑preserving AI and edge scalability. The new Morph algorithm tackles two long‑standing pain points in DL:
- The data on each node is often non‑IID (e.g., different users generate different image styles).
- Most DL systems use a static communication graph, which can stall learning under heterogeneous data.
Morph dynamically rewires the peer‑to‑peer topology based on how “different” neighboring models are, yielding faster convergence and higher accuracy on standard benchmarks.
Key Contributions
- Adaptive topology optimization: Nodes continuously select peers that exhibit the greatest model dissimilarity, keeping the in‑degree constant while reshaping the overall graph.
- Gossip‑based peer discovery: No central coordinator or global view is required; nodes discover candidates through lightweight gossip messages.
- Diversity‑driven neighbor selection: By preferring diverse models, Morph mitigates the negative impact of non‑IID data distributions.
- Empirical validation: Experiments on CIFAR‑10 (image classification) and FEMNIST (handwritten characters) with up to 100 nodes show that Morph comes within 0.5 percentage points of a fully‑connected baseline's accuracy.
- Efficiency gains: Morph reaches target accuracy with fewer communication rounds and exhibits lower inter‑node variance, indicating more stable training.
Methodology
1. Initial Setup – Each node starts with a local copy of the model and a fixed number of inbound connections (in‑degree).
2. Model Exchange – Periodically, nodes gossip their current model parameters to a random subset of peers.
3. Dissimilarity Measurement – After receiving neighbor models, a node computes a simple distance metric (e.g., the Euclidean norm of parameter differences) to quantify how “different” each neighbor’s model is from its own.
4. Peer Selection – The node retains connections to the k most dissimilar peers (where k equals the fixed in‑degree) and drops the rest. New peers are discovered via the gossip messages, so the graph evolves without any central authority.
5. Local Update – Using the received models, each node performs a standard decentralized optimization step (e.g., weighted averaging followed by a gradient descent step on its local data).
6. Iterate – Steps 2–5 repeat until convergence.
The core idea is that exchanging information with diverse models injects fresh gradients that counteract the bias introduced by non‑IID local datasets, while the gossip mechanism keeps the overhead low and the system robust to node churn.
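The per-round logic described above can be sketched as a few NumPy functions. This is an illustrative reconstruction, not the authors' implementation: the uniform mixing weights, the learning rate, and the `local_grad` callback are our assumptions, and the dissimilarity metric is the simple Euclidean distance the summary mentions.

```python
import numpy as np

def dissimilarity(own: np.ndarray, other: np.ndarray) -> float:
    """Euclidean norm of the parameter difference (the simple metric above)."""
    return float(np.linalg.norm(own - other))

def select_peers(own_params, candidate_params, k):
    """Keep the k candidate ids whose models differ most from our own."""
    scores = {pid: dissimilarity(own_params, p)
              for pid, p in candidate_params.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def morph_round(own_params, candidate_params, k, local_grad, lr=0.1):
    """One Morph iteration: pick diverse peers, mix models, take a local step."""
    peers = select_peers(own_params, candidate_params, k)
    # Uniform mixing over self plus selected peers (one possible weighting).
    stacked = np.stack([own_params] + [candidate_params[p] for p in peers])
    mixed = stacked.mean(axis=0)
    # Local gradient step on the node's own (non-IID) data.
    return mixed - lr * local_grad(mixed), peers
```

In a real deployment `candidate_params` would be populated from gossip messages each round, so the retained peer set, and hence the topology, changes as models drift apart on non‑IID data.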
Results & Findings
| Dataset | Nodes | Baseline (Epidemic) | Morph | Fully‑Connected Upper Bound |
|---|---|---|---|---|
| CIFAR‑10 | 100 | 71.3 % | 79.9 % (↑ 1.12×) | 80.4 % |
| FEMNIST | 100 | 84.1 % | 90.9 % (↑ 1.08×) | 91.5 % |
| CIFAR‑10 | 50 | 70.5 % | 79.5 % (gap ≤ 0.5 pp) | 80.0 % |
- Higher final accuracy: Morph consistently outperforms static and epidemic topologies, approaching the performance of an ideal fully‑connected network.
- Faster convergence: Morph reaches a given accuracy target in ~30 % fewer communication rounds.
- Stability: The variance of model parameters across nodes drops by ~40 % compared to baselines, indicating more uniform learning.
- Scalability: Results hold for both 50‑node and 100‑node deployments, suggesting the approach scales without needing more bandwidth per node.
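The stability finding suggests a diagnostic worth tracking in any decentralized run: the variance of model parameters across nodes. A minimal sketch follows; averaging the per-coordinate variances is our choice of summary statistic, not a metric the paper defines.

```python
import numpy as np

def inter_node_variance(params_by_node: np.ndarray) -> float:
    """Mean per-coordinate variance of model parameters across nodes.
    `params_by_node` has shape (num_nodes, num_params); lower values
    indicate that the nodes' models agree more closely."""
    return float(params_by_node.var(axis=0).mean())
```

Logging this value per round makes the reported ~40 % variance reduction the kind of effect one could check directly in a reproduction.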
Practical Implications
- Edge AI deployments – Devices like smartphones, IoT sensors, or autonomous drones can train shared models more reliably even when each device sees a different slice of the data (e.g., user‑specific images, locale‑specific sensor readings).
- Reduced bandwidth costs – Because Morph keeps the in‑degree fixed and only exchanges models with a handful of selected peers, total traffic is comparable to static topologies but yields better accuracy, translating to lower data‑plan expenses.
- Robustness to churn – The gossip‑based discovery works even when nodes join or leave, making Morph suitable for real‑world federated scenarios where connectivity is intermittent.
- Plug‑and‑play integration – Morph can be layered on top of existing decentralized optimization libraries (e.g., PyTorch Distributed, TensorFlow Federated) with minimal code changes—just replace the static neighbor list with the dynamic selection routine.
- Privacy‑preserving analytics – Since no central aggregator is needed, organizations can comply with data‑locality regulations while still benefiting from collaborative model improvements.
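The "plug‑and‑play" point can be made concrete: in a typical decentralized training loop, only the line that picks neighbors changes. All names and shapes below are our illustrative assumptions, not an API from Morph, PyTorch Distributed, or TensorFlow Federated.

```python
import numpy as np

def train_round(own, neighbor_models, lr, grad_fn):
    """Generic decentralized step: average with neighbors, then local SGD."""
    mixed = np.mean([own] + list(neighbor_models), axis=0)
    return mixed - lr * grad_fn(mixed)

def morph_select(own, gossiped_models, k):
    """Morph-style swap-in: each round, keep the k gossiped models that
    differ most from our own. A static topology would instead read a
    fixed neighbor list at this point."""
    return sorted(gossiped_models, key=lambda m: -np.linalg.norm(own - m))[:k]
```

With a static graph one would feed `train_round` the models of a fixed neighbor list; with Morph, `morph_select(own, gossiped, k)` supplies a freshly ranked set each round, which is the only structural change to the loop.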
Limitations & Future Work
- Metric simplicity – The current dissimilarity measure is a raw parameter distance; more sophisticated metrics (e.g., Fisher information, task‑specific loss) could further improve peer selection.
- Assumption of reliable gossip – In highly lossy networks, gossip messages may be delayed or dropped, potentially slowing topology adaptation.
- Fixed in‑degree – While keeping the in‑degree constant simplifies analysis, allowing adaptive bandwidth allocation could yield additional gains.
- Security considerations – The algorithm does not address malicious peers that could deliberately send misleading models; integrating Byzantine‑resilient defenses is an open direction.
- Broader workloads – Experiments focus on image classification; evaluating Morph on NLP, reinforcement learning, or multimodal tasks would strengthen its generality.
Morph demonstrates that a smart communication graph—one that evolves to maximize model diversity—can dramatically close the performance gap that has long plagued decentralized learning on heterogeneous data. For developers building edge‑centric AI pipelines, it offers a practical recipe to get more bang for the same network budget.
Authors
- Bart Cox
- Antreas Ioannou
- Jérémie Decouchant
Paper Information
- arXiv ID: 2602.03383v1
- Categories: cs.LG, cs.DC
- Published: February 3, 2026