[Paper] A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine
Source: arXiv - 2601.22124v1
Overview
A new study proposes Fed‑MedLoRA, a federated‑learning framework that lets multiple hospitals fine‑tune massive language models for medical tasks without sharing raw patient data or the full model weights. By sending only tiny low‑rank adapters, the approach slashes communication costs and tackles the notorious data‑heterogeneity problem that plagues traditional federated learning in healthcare.
Key Contributions
- Parameter‑efficient federated learning: Introduces Fed‑MedLoRA, which transmits only LoRA (Low‑Rank Adaptation) adapters instead of the entire multi‑billion‑parameter LLM.
- Heterogeneity‑aware aggregation: Extends the base method to Fed‑MedLoRA+, adding an adaptive, data‑aware weighting scheme that improves convergence when sites have vastly different patient populations and documentation styles.
- Real‑world medical IE benchmark: Applies the framework to clinical information extraction (IE) across five diverse patient cohorts, comparing against strong baselines (BERT, LLaMA‑3, DeepSeek‑R1, GPT‑4o).
- Comprehensive evaluation: Tests in‑domain performance, external validation on unseen institutions, and a low‑resource “new‑site” adaptation scenario using real notes from Yale New Haven Health.
- Open‑source implementation: Provides code and adapter checkpoints to accelerate reproducibility and downstream adoption.
Methodology
- Base Model Selection – Starts from a pre‑trained LLM (e.g., LLaMA‑3) that already exhibits strong medical reasoning.
- LoRA Adapter Insertion – Inserts low‑rank trainable matrices into each transformer layer; the original weights stay frozen. This shrinks what must be trained and transmitted from billions of parameters to adapters totaling only a few megabytes per site.
- Federated Training Loop
- Each participating hospital downloads the current global adapter set.
- Local data (clinical notes) are used to fine‑tune only the adapters for a few epochs.
- Only the updated adapter deltas are uploaded back to the central server.
- Adaptive Aggregation (Fed‑MedLoRA+) – The server computes site‑specific weights based on validation loss, data size, and a measure of distribution shift, then aggregates adapters accordingly.
- Evaluation Pipeline – After each round, the global adapter is evaluated on a held‑out IE test set (entity and relation extraction) for each cohort, enabling early stopping and performance tracking.
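The adapter-insertion step above can be sketched as a linear layer with a frozen weight matrix plus a trainable low‑rank delta. This is a minimal NumPy illustration of the general LoRA idea, not the paper's implementation; the class name, rank, and `alpha` scaling are illustrative assumptions.

```python
import numpy as np

class LoRALinear:
    """Frozen dense layer W plus a trainable low-rank delta B @ A (rank r)."""

    def __init__(self, w_frozen: np.ndarray, rank: int, alpha: float = 16.0):
        d_out, d_in = w_frozen.shape
        self.w = w_frozen                             # frozen pre-trained weights
        self.a = np.random.randn(rank, d_in) * 0.01   # trainable, small random init
        self.b = np.zeros((d_out, rank))              # trainable, zero init: delta starts at 0
        self.scale = alpha / rank

    def forward(self, x: np.ndarray) -> np.ndarray:
        # y = x W^T + scale * x (B A)^T ; in training, only A and B receive gradients
        return x @ self.w.T + self.scale * (x @ self.a.T @ self.b.T)

    def adapter_params(self) -> int:
        # Only these parameters are uploaded to the server each round
        return self.a.size + self.b.size
```

Because `B` is zero-initialized, the layer reproduces the frozen model exactly before any local fine‑tuning, and only the small `A`/`B` factors ever leave the hospital.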
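The adaptive aggregation step can likewise be sketched server-side. The paper's exact Fed‑MedLoRA+ weighting is not reproduced here; this sketch assumes a simple proxy in which each site's weight is proportional to its data size divided by its validation loss, normalized to a convex combination.

```python
import numpy as np

def aggregate_adapters(adapters, n_samples, val_losses, eps=1e-8):
    """Weighted average of per-site LoRA adapters (Fed-MedLoRA+-style sketch).

    adapters   : list of dicts {"A": ndarray, "B": ndarray}, one per site
    n_samples  : local training-set sizes (larger -> more weight)
    val_losses : local validation losses (lower -> more weight)
    """
    raw = np.array(n_samples, dtype=float) / (np.array(val_losses, dtype=float) + eps)
    weights = raw / raw.sum()                 # normalize so weights sum to 1
    aggregated = {
        key: sum(w * site[key] for w, site in zip(weights, adapters))
        for key in adapters[0]
    }
    return aggregated, weights
```

With equal data sizes and equal losses this reduces to plain FedAvg over the adapter matrices; skewed losses or cohort sizes tilt the global adapter toward the better-fitting or larger sites.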
Results & Findings
| Setting | Model | F1 (Entity) | F1 (Relation) | Communication (GB) |
|---|---|---|---|---|
| In‑domain (5 sites) | Fed‑MedLoRA | 84.2 | 78.5 | 0.12 |
| Heterogeneous (5 sites) | Fed‑MedLoRA+ | 86.1 | 80.3 | 0.13 |
| In‑domain baseline | BERT‑based IE | 71.4 | 64.0 | 0.45 |
| Centralized | LLaMA‑3 | 83.5 | 77.9 | 2.3 |
| Zero‑shot | GPT‑4o | 78.0 | 71.2 | n/a |
- Communication savings: Transmitting only the adapters cut per‑round bandwidth by more than 95% compared with sending the full LLM weights.
- Heterogeneity handling: Fed‑MedLoRA+ consistently outperformed the vanilla version on cohorts with divergent note styles (e.g., pediatric vs. oncology).
- Low‑resource adaptation: When a brand‑new site with only 200 notes joined, the federated adapters boosted its IE F1 from 62% (local fine‑tuning alone) to 78% after just two communication rounds.
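The bandwidth saving follows directly from the parameter counts. This back-of-the-envelope sketch uses illustrative dimensions (an 8B‑parameter model in fp16, rank‑8 adapters on four projection matrices per layer), which are assumptions rather than the paper's reported configuration:

```python
def lora_payload_bytes(n_layers: int, d_model: int, rank: int,
                       n_target_mats: int = 4, bytes_per_param: int = 2) -> int:
    """Bytes per round when only LoRA factors A (r x d) and B (d x r) are sent."""
    per_matrix = 2 * rank * d_model           # A and B together
    return n_layers * n_target_mats * per_matrix * bytes_per_param

full_model = 8_000_000_000 * 2                # fp16 full-model upload, ~16 GB
lora = lora_payload_bytes(n_layers=32, d_model=4096, rank=8)   # ~17 MB
savings = 1 - lora / full_model               # well above 0.95
```

Even with generous adapter settings, the per-round payload stays in the tens of megabytes, which is consistent with the >95% reduction the paper reports.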
Practical Implications
- Scalable multi‑institution collaborations – Hospitals can jointly improve a shared medical LLM without exposing PHI or needing petabyte‑scale network links.
- Rapid deployment in new clinics – A small batch of local notes is enough to plug in the global adapter, dramatically shortening time‑to‑value for AI‑assisted chart review or coding assistance.
- Cost‑effective model updates – Because only adapters are exchanged, existing on‑premise LLM deployments (e.g., via NVIDIA DGX or cloud‑based inference APIs) can stay static while still benefiting from the latest federated knowledge.
- Regulatory friendliness – The approach aligns with data‑locality requirements (e.g., HIPAA, GDPR) since raw text never leaves the institution.
Limitations & Future Work
- Adapter expressiveness – While LoRA adapters are lightweight, they may not capture all nuances needed for highly specialized tasks (e.g., rare disease phenotyping).
- Security of updates – The paper acknowledges potential model‑inversion attacks on uploaded adapters; future work should explore differential privacy or secure aggregation.
- Broader task coverage – Experiments focus on information extraction; extending to generative clinical tasks (summarization, decision support) remains an open question.
- Scalability to dozens of sites – The current study involves five institutions; testing the framework at national or international scales will be needed to validate robustness under extreme heterogeneity.
Authors
- Anran Li
- Yuanyuan Chen
- Wenjun Long
- Yu Yin
- Yan Hu
- Hyunjae Kim
- Weipeng Zhou
- Yujia Zhou
- Hongyi Peng
- Yang Ren
- Xuguang Ai
- Zhenyue Qin
- Ming Hu
- Xiaoxiao Li
- Han Yu
- Yih‑Chung Tham
- Lucila Ohno‑Machado
- Hua Xu
- Qingyu Chen
Paper Information
- arXiv ID: 2601.22124v1
- Categories: cs.CL, cs.DC
- Published: January 29, 2026
- PDF: Download PDF