[Paper] PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology

Published: March 4, 2026 at 12:44 PM EST
4 min read
Source: arXiv - 2603.04323v1

Overview

Federated learning (FL) promises collaborative model training without moving raw data, but sharing raw gradients can leak sensitive information, and training struggles when client data are highly heterogeneous. The paper “PTOPOFL: Privacy‑Preserving Personalised Federated Learning via Persistent Homology” proposes a novel twist: replace gradient exchanges with compact topological summaries derived from persistent homology (PH). By communicating only a 48‑dimensional PH feature vector per client, the framework dramatically reduces reconstruction risk while still delivering personalized, high‑performing models.

Key Contributions

  • Topological Communication Layer – Introduces a gradient‑free protocol where each client sends a low‑dimensional persistent homology descriptor instead of raw model updates.
  • Privacy Guarantee via Information Contraction – Proves that PH descriptors leak strictly less mutual information per sample than gradients for strongly convex losses, making inversion ill‑posed.
  • Topology‑Guided Personalised Aggregation – Clusters clients using Wasserstein distances between PH diagrams, applies intra‑cluster weighted model averaging, and blends clusters with a global consensus.
  • Theoretical Convergence – Shows linear convergence of the Wasserstein‑weighted aggregation with an error floor provably lower than standard FedAvg.
  • Empirical Validation – Demonstrates state‑of‑the‑art AUC scores (0.841 on a multi‑hospital health dataset, 0.910 on a pathological benchmark) and a 4.5× reduction in reconstruction risk versus gradient‑based FL baselines.

Methodology

  1. Local Topological Extraction – Each client trains a local model on its private data and computes a persistence diagram from the model’s weight space (or activations). The diagram captures the “shape” of the learned representation (e.g., connected components, loops).
  2. Feature Vector Encoding – The diagram is transformed into a fixed‑size 48‑dimensional vector using standard PH vectorization techniques (e.g., persistence landscapes or silhouettes). This vector is the only information sent to the server.
  3. Similarity & Clustering – The server measures pairwise Wasserstein distances between client vectors, grouping clients with similar topological signatures into clusters.
  4. Topology‑Weighted Aggregation – Within each cluster, model updates are aggregated using weights derived from the PH similarity (more similar clients influence each other more).
  5. Global Consensus Blending – Cluster‑level models are merged into a global model, which is then broadcast back. Clients can keep the global model as a base and fine‑tune locally, achieving personalization.
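The client‑side half of the pipeline (steps 1–2) can be sketched with plain NumPy. The paper's exact descriptor is not reproduced here: as an illustrative stand‑in, the code below computes 0‑dimensional persistence (connected‑component death times) of a Vietoris–Rips filtration via Kruskal's minimum‑spanning‑tree construction, then bins the death times into a normalized 48‑bin histogram. The function names and the histogram encoding are assumptions; the authors may use landscapes or silhouettes instead.

```python
import numpy as np

def persistence_deaths_0d(points: np.ndarray) -> np.ndarray:
    """0-dim persistence death times of a Vietoris-Rips filtration.

    For H0, the finite death times equal the edge weights of the
    Euclidean minimum spanning tree (Kruskal over all pairs).
    """
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = sorted((d[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                # merging two components kills one class
            parent[ri] = rj
            deaths.append(w)
    return np.asarray(deaths)       # n - 1 finite death times

def vectorize_diagram(deaths: np.ndarray, dim: int = 48) -> np.ndarray:
    """Fixed-size descriptor: a normalized histogram of death times."""
    hist, _ = np.histogram(deaths, bins=dim, range=(0.0, deaths.max()))
    return hist / hist.sum()

# Each client would run this on a point cloud it holds locally,
# e.g. a sample of model weights or activations.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(64, 8))    # stand-in for a local weight sample
vec = vectorize_diagram(persistence_deaths_0d(cloud))
print(vec.shape)                    # (48,)
```

In practice a PH library (e.g. GUDHI or Ripser) would replace the hand‑rolled MST step, but the shape of the output, one fixed‑size vector per client, is the same.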

The whole pipeline avoids transmitting raw gradients, thus sidestepping the main attack surface exploited by data‑reconstruction attacks.
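Steps 3–4 on the server side can be sketched similarly, assuming each client's descriptor is a histogram on a shared support. The sketch below uses SciPy's one‑dimensional `wasserstein_distance` to compare descriptors and converts distances into aggregation weights with a softmax over negative distances; the softmax choice and the function names are illustrative assumptions, not the paper's exact clustering‑and‑blending rule.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def topology_weights(descriptors, ref):
    """Aggregation weights for client `ref`: closer topology, larger weight.

    Distances are 1-D Wasserstein between descriptor histograms; the
    softmax over negative distances is an illustrative weighting choice.
    """
    support = np.arange(len(descriptors[0]))
    dists = np.array([
        wasserstein_distance(support, support, descriptors[ref], d)
        for d in descriptors
    ])
    w = np.exp(-dists)
    return w / w.sum()

def aggregate(models, weights):
    """Similarity-weighted averaging of flattened parameter vectors."""
    return np.tensordot(weights, np.stack(models), axes=1)

# Toy example: clients 0 and 1 have similar topological signatures,
# client 2 does not, so client 0's aggregate leans toward client 1.
rng = np.random.default_rng(1)
desc = [np.array([0.5, 0.5, 0.0]),
        np.array([0.4, 0.6, 0.0]),
        np.array([0.0, 0.0, 1.0])]
models = [rng.normal(size=4) for _ in desc]
w = topology_weights(desc, ref=0)
personalised = aggregate(models, w)
print(w)    # weight on client 1 exceeds weight on client 2
```

The paper additionally clusters clients before weighting and blends cluster models with a global consensus; the sketch collapses those stages into a single similarity‑weighted average to show the core idea.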

Results & Findings

| Dataset | Baseline (FedAvg) AUC | PTOPOFL AUC | Reconstruction Risk (relative) |
| --- | --- | --- | --- |
| 8‑hospital health (2 adversarial) | 0.782 | 0.841 | 0.22× (4.5× reduction) |
| Pathological benchmark (10 clients) | 0.862 | 0.910 | 0.22× |

  • Performance: PTOPOFL consistently outperforms FedAvg, FedProx, SCAFFOLD, and pFedMe, especially under severe non‑IID conditions.
  • Privacy: The mutual information analysis and empirical attacks confirm that PH descriptors are far harder to invert, cutting reconstruction success by more than 75 %.
  • Convergence: The Wasserstein‑weighted scheme reaches the target loss in fewer communication rounds, confirming the theoretical linear convergence claim.

Practical Implications

  • Secure Cross‑Organization Collaboration: Healthcare consortia, fintech networks, or any multi‑party setting can now share model insights without exposing raw gradients, dramatically lowering regulatory risk.
  • Personalisation at Scale: By clustering clients based on the intrinsic geometry of their learned models, PTOPOFL naturally yields personalized models that respect data heterogeneity—a boon for edge devices with wildly different usage patterns.
  • Bandwidth Efficiency: A 48‑dimensional float vector (192 bytes in single precision) replaces potentially megabytes of gradient data, reducing network load and enabling FL over constrained IoT links.
  • Plug‑and‑Play Integration: The authors provide an open‑source PyTorch‑compatible library; existing FL pipelines (e.g., Flower, TensorFlow Federated) can adopt the topological communication layer with minimal code changes.
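The arithmetic behind the bandwidth claim is easy to verify: 48 single‑precision floats serialize to exactly 192 bytes per round per client.

```python
import numpy as np

descriptor = np.zeros(48, dtype=np.float32)  # one round's upload per client
payload = descriptor.tobytes()
print(len(payload))                          # 192 bytes (48 floats x 4 bytes)
```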

Limitations & Future Work

  • Computational Overhead on Clients: Computing persistent homology, while tractable for moderate model sizes, adds extra CPU/GPU load that may be prohibitive on very low‑power devices.
  • Fixed Descriptor Size: The 48‑dimensional vector is a design choice; scaling to larger models or richer topological features may require adaptive dimensionality.
  • Assumption of Strong Convexity: The privacy proof hinges on strongly convex loss functions; extending guarantees to highly non‑convex deep nets remains an open question.
  • Broader Attack Models: The paper focuses on reconstruction attacks; future work could explore resistance to membership inference, model inversion, or poisoning attacks within the PH framework.

Overall, PTOPOFL opens a promising avenue where geometry—rather than raw gradients—drives secure, personalized federated learning, offering a practical toolkit for developers aiming to build privacy‑first collaborative AI systems.

Authors

  • Kelly L Vomo-Donfack
  • Adryel Hoszu
  • Grégory Ginot
  • Ian Morilla

Paper Information

  • arXiv ID: 2603.04323v1
  • Categories: cs.LG, cs.CR, cs.DC, math.AT, stat.ML
  • Published: March 4, 2026