[Paper] Evidential Trust-Aware Model Personalization in Decentralized Federated Learning for Wearable IoT
Source: arXiv - 2512.19131v1
Overview
This paper tackles a core problem in decentralized federated learning (DFL) for wearable IoT devices: how to let each device train a model that works well for its own data while still benefiting from collaboration with other devices that have compatible data distributions. The authors introduce Murmura, a trust‑aware personalization framework that uses evidential deep learning to measure epistemic uncertainty and decide which peers to trust during model aggregation.
Key Contributions
- Evidential Trust Metric: Introduces a Dirichlet‑based evidential uncertainty measure that directly quantifies peer compatibility, turning epistemic uncertainty into a trust score (the underlying formulation is sketched after this list).
- Selective Aggregation Protocol: Proposes a trust‑aware aggregation rule that only incorporates updates from peers whose models exhibit low epistemic uncertainty on a node’s local validation set.
- Adaptive Thresholding: Implements dynamic thresholds for trust scores, allowing the system to automatically adjust to varying degrees of data heterogeneity.
- Comprehensive Evaluation: Demonstrates on three real‑world wearable datasets (UCI HAR, PAMAP2, PPG‑DaLiA) that Murmura reduces the performance gap between IID and non‑IID settings from 19.3 % to 0.9 %, speeds up convergence by 7.4×, and remains stable across hyper‑parameter variations.
- Open‑Source Prototype: Provides a reference implementation that can be plugged into existing DFL pipelines (e.g., PySyft, Flower) for rapid experimentation.
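To make the trust metric concrete, the block below gives the standard Dirichlet evidential formulation from the evidential deep learning literature; Murmura builds its trust score on this kind of quantity, though the paper's exact definition may differ in detail.

```latex
% Standard Dirichlet evidential setup (common formulation; assumed here, the
% paper's exact definition may differ). A K-class head outputs evidence e_k >= 0.
\alpha_k = e_k + 1, \qquad
S = \sum_{k=1}^{K} \alpha_k, \qquad
\hat{p}_k = \frac{\alpha_k}{S}, \qquad
u = \frac{K}{S}
% u in (0, 1] is the epistemic ("vacuity") uncertainty: a peer whose model keeps
% u low on a node's local validation data is treated as compatible and trusted.
```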
Methodology
- Evidential Deep Models: Each device trains a neural network that outputs Dirichlet parameters instead of plain class probabilities. From these parameters, the model can separate aleatoric uncertainty (inherent data noise) from epistemic uncertainty (the model’s own lack of knowledge).
- Cross‑Evaluation: After a local training round, a device evaluates the peer models it has received on a small validation set it holds out locally. The mean epistemic uncertainty of each peer’s predictions on that set becomes the peer’s compatibility score (a minimal sketch of this step follows the list).
- Trust‑Aware Aggregation (see the aggregation sketch at the end of this section):
  - Compatibility scores are normalized and compared against an adaptive threshold.
  - Only models whose scores fall below the threshold contribute to the weighted average of model parameters.
  - Each contribution is weighted in proportion to the inverse of its uncertainty, giving more influence to the peers the node “trusts” most.
- Personalization Loop: The aggregated model is fine‑tuned locally, preserving the device‑specific nuances while still benefiting from compatible peers. This loop repeats each communication round.
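A minimal sketch of the evidential head and the cross‑evaluation step is shown below, assuming a softplus evidence head and the vacuity uncertainty u = K/S from the standard evidential setup. The names `EvidentialHead` and `compatibility_score` are illustrative and not taken from the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    """Classifier head that outputs Dirichlet parameters instead of softmax probabilities."""
    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        evidence = F.softplus(self.fc(features))  # non-negative evidence e_k
        return evidence + 1.0                     # Dirichlet parameters alpha_k

@torch.no_grad()
def compatibility_score(peer_model: nn.Module, val_loader) -> float:
    """Mean epistemic (vacuity) uncertainty u = K / sum(alpha_k) of a peer's model
    on the local validation set; lower values indicate a more compatible peer."""
    peer_model.eval()
    per_batch = []
    for x, _ in val_loader:               # labels are not needed for vacuity
        alpha = peer_model(x)             # shape (batch, K)
        num_classes = alpha.shape[-1]
        u = num_classes / alpha.sum(dim=-1)
        per_batch.append(u.mean().item())
    return sum(per_batch) / len(per_batch)
```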
The whole pipeline runs fully decentralized: no central server is needed to compute trust scores or orchestrate aggregation.
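The aggregation and personalization steps can then be sketched as follows. The mean‑of‑scores threshold and the fixed weight given to the node’s own model are illustrative assumptions; the paper’s actual adaptive‑threshold rule and weighting constants may differ.

```python
import copy
import torch.nn as nn

def trust_aware_aggregate(local_model: nn.Module, peer_models, peer_scores,
                          eps: float = 1e-8) -> nn.Module:
    """Keep peers whose uncertainty score falls below an adaptive threshold and
    average parameters with weights proportional to 1 / uncertainty."""
    # Adaptive threshold over this round's scores (illustrative choice: the mean).
    threshold = sum(peer_scores) / len(peer_scores)

    # Inverse-uncertainty weights for trusted peers only.
    trusted = [(m, 1.0 / (s + eps))
               for m, s in zip(peer_models, peer_scores) if s <= threshold]
    trusted.append((local_model, 1.0))    # the node keeps its own model (assumed weight)

    total = sum(w for _, w in trusted)
    states = [(m.state_dict(), w) for m, w in trusted]

    new_state = copy.deepcopy(local_model.state_dict())
    for key, value in new_state.items():
        if value.is_floating_point():     # average weights/biases, skip integer buffers
            new_state[key] = sum(w * sd[key] for sd, w in states) / total

    aggregated = copy.deepcopy(local_model)
    aggregated.load_state_dict(new_state)
    return aggregated   # subsequently fine-tuned on local data (personalization loop)
```

Each communication round, a node would compute `compatibility_score` for every received peer model, call `trust_aware_aggregate`, and fine‑tune the result locally before the next round.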
Results & Findings
| Dataset | IID → Non‑IID Gap (Baseline) | Gap with Murmura | Convergence Speedup |
|---|---|---|---|
| UCI HAR | 19.3 % | 0.9 % | 7.4× |
| PAMAP2 | 17.8 % | 1.2 % | 6.9× |
| PPG‑DaLiA | 20.1 % | 1.0 % | 7.2× |
- Accuracy Stability: Murmura’s accuracy stays within ±0.3 % across a wide range of learning rates, batch sizes, and numbers of communication rounds.
- Robustness to Malicious Peers: In simulated attacks where a subset of devices injects random labels, accuracy drops by less than 0.5 %, because the attackers’ high epistemic uncertainty flags them as untrustworthy.
- Resource Overhead: Computing Dirichlet parameters adds ~5 % extra FLOPs and a modest memory increase (≈2 MB for a typical CNN), which is acceptable for modern wearables.
Practical Implications
- Edge‑First AI: Developers building health‑monitoring or activity‑recognition apps can now deploy truly personalized models that still learn from the crowd without a cloud server, preserving privacy and reducing latency.
- Plug‑and‑Play Trust Layer: Murmura’s trust‑aware aggregation can be dropped into existing federated learning libraries, giving developers a ready‑made mechanism to handle non‑IID data—a common pain point in real deployments.
- Fault Tolerance: Because the system automatically discards incompatible or compromised peers, network partitions or device failures have minimal impact on overall model quality.
- Regulatory Compliance: Decentralized training with built‑in trust metrics aligns well with data‑locality regulations (e.g., GDPR, HIPAA) since raw data never leaves the device and only trustworthy model updates are shared.
Limitations & Future Work
- Scalability of Validation Sets: Each device must keep a local validation set; for ultra‑low‑memory wearables this could be a bottleneck.
- Assumption of Honest Uncertainty Reporting: The framework trusts the reported Dirichlet parameters; a sophisticated adversary could manipulate uncertainty estimates.
- Limited to Classification Tasks: The current Dirichlet‑based evidential approach is tailored to categorical outputs; extending to regression or multi‑task scenarios remains open.
- Future Directions: The authors suggest exploring lightweight uncertainty estimators (e.g., Monte‑Carlo dropout) for tighter resource budgets, integrating cryptographic verification of trust scores, and testing Murmura in large‑scale heterogeneous IoT deployments (smart homes, industrial sensor networks).
Authors
- Murtaza Rangwala
- Richard O. Sinnott
- Rajkumar Buyya
Paper Information
- arXiv ID: 2512.19131v1
- Categories: cs.DC, cs.LG
- Published: December 22, 2025