[Paper] Training Together, Diagnosing Better: Federated Learning for Collagen VI-Related Dystrophies

Published: December 18, 2025 at 01:44 PM EST
Source: arXiv - 2512.16876v1

Overview

A new study demonstrates how Federated Learning (FL) can boost the accuracy of machine‑learning diagnostics for the ultra‑rare collagen VI‑related dystrophies (COL6‑RD). By training a shared model on microscopy images that stay on‑site at two international research groups, the authors achieve a diagnostic F1‑score of 0.82, compared with 0.57 to 0.75 for models trained at a single site.

Key Contributions

  • First FL deployment for COL6‑RD: Connects two geographically separated biobanks through the Sherpa.ai FL platform while keeping patient images private.
  • Multi‑class pathology classifier: Automatically distinguishes the three dominant COL6‑RD mechanisms (exon skipping, glycine substitution, pseudo‑exon insertion) from immunofluorescence images of patient fibroblasts.
  • Performance lift: Global FL model outperforms single‑site models (0.57‑0.75 F1) and narrows the gap between research labs with heterogeneous data.
  • Open‑source pipeline: Provides reproducible code for data preprocessing, model architecture, and FL orchestration, facilitating adoption by other rare‑disease consortia.
  • Clinical relevance roadmap: Shows how the model can aid variant‑of‑uncertain‑significance (VUS) interpretation and prioritize sequencing strategies.

Methodology

  1. Data sources – Two partner institutions contributed collagen VI immunofluorescence microscopy slides from patient‑derived fibroblast cultures. Each site retained its raw images behind its firewall.
  2. Pre‑processing – Images were normalized, resized to a common resolution, and augmented (rotations, flips) to mitigate batch effects.
  3. Model architecture – A lightweight convolutional neural network (CNN) with three convolutional blocks followed by a fully‑connected classifier was chosen to run efficiently on modest hospital GPUs.
  4. Federated training loop
    • The central Sherpa.ai server distributes the current model weights to each site.
    • Each site performs a few local SGD epochs on its private data, computes weight updates, and sends only those updates, encrypted, back to the server.
    • The server aggregates updates using FedAvg (weighted by local sample size) to produce a new global model.
    • The cycle repeats for 50 communication rounds.
  5. Evaluation – After training, the global model is evaluated on a held‑out test set from both sites, and per‑site baselines are trained for comparison.
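
The aggregation step in the federated training loop can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline and not the Sherpa.ai API: the functions `fedavg` and `local_update` are hypothetical names, a small linear least-squares model stands in for the CNN, the two "sites" hold synthetic data, and encryption of the updates is not modeled. Only the weighting by local sample size and the 50 communication rounds are taken from the description above.

```python
import numpy as np

def fedavg(site_weights, site_sizes):
    """FedAvg aggregation: average each parameter array across sites,
    weighting every site by its number of local samples."""
    total = float(sum(site_sizes))
    n_params = len(site_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(site_weights, site_sizes))
        for i in range(n_params)
    ]

def local_update(weights, X, y, lr=0.1, epochs=2):
    """Hypothetical stand-in for local training: a few full-batch
    gradient steps on a linear least-squares model (not the paper's CNN)."""
    w, b = (p.copy() for p in weights)
    for _ in range(epochs):
        err = X @ w + b - y
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return [w, b]

# Synthetic, noiseless data for two sites of different size (assumption).
rng = np.random.default_rng(0)
true_w, true_b = np.array([1.0, -2.0]), 0.5
sites = []
for n in (40, 10):
    X = rng.normal(size=(n, 2))
    sites.append((X, X @ true_w + true_b))

# Server loop: broadcast weights, collect local results, aggregate.
global_weights = [np.zeros(2), np.zeros(1)]
for _ in range(50):  # 50 communication rounds, as in the description
    local_results = [local_update(global_weights, X, y) for X, y in sites]
    global_weights = fedavg(local_results, [len(y) for _, y in sites])

print("global weights after training:", global_weights)
```

Because the larger site contributes 40 of the 50 samples, its parameters carry a weight of 0.8 in every aggregation. That is the practical meaning of "weighted by local sample size".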

Results & Findings

Metric      Global FL Model   Single‑Site Models (Site A / Site B)
F1‑score    0.82              0.75 / 0.57
Precision   0.84              0.78 / 0.60
Recall      0.80              0.73 / 0.55
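
The reported F1 values can be sanity-checked against the precision and recall figures. The sketch below assumes each F1 is the harmonic mean of the precision and recall listed beside it. This summary does not say how the multi-class scores were averaged, and with macro averaging the two quantities need not match exactly, so read it as a consistency check rather than a reproduction of the paper's evaluation. The helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) as listed in the results table.
reported = {
    "Global FL": (0.84, 0.80, 0.82),
    "Site A": (0.78, 0.73, 0.75),
    "Site B": (0.60, 0.55, 0.57),
}

for name, (p, r, f1) in reported.items():
    computed = f1_from(p, r)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {f1:.2f}")
```

Under that assumption, all three computed values round to the reported F1 scores.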

Confusion: Errors are most common between exon‑skipping and pseudo‑exon insertion, reflecting subtle visual similarities.
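
To show how such a pattern is read off a three-class confusion matrix, here is a small sketch. The counts are made up for illustration and are not taken from the paper; only the three class names come from the study. The helper `per_class_metrics` is a hypothetical name.

```python
import numpy as np

classes = ["exon skipping", "glycine substitution", "pseudo-exon insertion"]

# Made-up counts for illustration only (NOT the paper's results).
# Rows are true classes, columns are predicted classes.
cm = np.array([
    [16, 1, 3],
    [1, 18, 1],
    [4, 1, 15],
])

def per_class_metrics(cm):
    """Per-class precision, recall and F1 from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)  # column sums = predicted counts
    recall = tp / cm.sum(axis=1)     # row sums = true counts
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision, recall, f1 = per_class_metrics(cm)
macro_f1 = f1.mean()

# Find the most frequent confusion (largest off-diagonal cell).
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)

for name, p, r, f in zip(classes, precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"macro F1 = {macro_f1:.2f}")
print(f"most common error: true '{classes[i]}' predicted as '{classes[j]}'")
```

In this toy matrix the largest off-diagonal cells sit between exon skipping and pseudo-exon insertion, which is the kind of pattern the finding above describes.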

Interpretation: The federated approach not only raises overall accuracy but also yields more even performance across the classes, suggesting that the model learns more robust, disease‑specific visual cues rather than over‑fitting to site‑specific staining patterns.

Practical Implications

  • Accelerated diagnosis: Clinicians can upload a single fibroblast image to a secure portal and receive a rapid, AI‑assisted pathogenic‑mechanism prediction, shortening the time to targeted genetic testing.
  • Privacy‑preserving collaboration: Hospitals can join a diagnostic network without exposing raw patient images, complying with GDPR, HIPAA, and other regulations.
  • Scalable rare‑disease consortia: The same FL framework can be extended to other ultra‑rare conditions where data are scattered across specialized centers.
  • Decision support for genomics: By flagging the likely molecular mechanism, the model can guide which exons to prioritize in sequencing panels, reducing cost and turnaround time.
  • Tooling for VUS interpretation: When a novel variant is found, the image‑based prediction offers orthogonal evidence that can tip the balance toward pathogenic or benign classification.

Limitations & Future Work

  • Dataset size & diversity: Only two sites participated; adding more institutions (especially from different continents) could further improve generalization.
  • Model complexity: The current CNN is deliberately simple; exploring transformer‑based vision models may capture subtler patterns.
  • Explainability: While saliency maps were generated, a systematic study of which image features drive each class decision is still needed for clinical trust.
  • Regulatory pathway: Translating the prototype into a certified medical device will require extensive validation on prospective patient cohorts.
  • Extension to multimodal data: Future work could fuse microscopy with genomic, transcriptomic, or clinical metadata within the FL framework for even richer diagnostic insight.

Authors

  • Astrid Brull
  • Sara Aguti
  • Véronique Bolduc
  • Ying Hu
  • Daniel M. Jimenez-Gutierrez
  • Enrique Zuazua
  • Joaquin Del‑Rio
  • Oleksii Sliusarenko
  • Haiyan Zhou
  • Francesco Muntoni
  • Carsten G. Bönnemann
  • Xabi Uribe‑Etxebarria

Paper Information

  • arXiv ID: 2512.16876v1
  • Categories: cs.LG, cs.AI, cs.CV, cs.DC
  • Published: December 18, 2025