[Paper] zkFL-Health: Blockchain-Enabled Zero-Knowledge Federated Learning for Medical AI Privacy
Source: arXiv - 2512.21048v1
Overview
The paper zkFL-Health proposes a new way to train medical AI models across hospitals without ever exposing raw patient data or trusting a single central server. By marrying federated learning (FL) with zero‑knowledge proofs (ZKPs) and Trusted Execution Environments (TEEs), the authors create a blockchain‑backed pipeline that guarantees both data privacy and verifiable correctness of the aggregated model updates.
Key Contributions
- Zero‑knowledge‑verified aggregation: Introduces a succinct ZKP (built on Halo2/Nova) that proves the global model was computed exactly from the committed client updates, without leaking any gradient information.
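- TEE‑protected aggregator: Runs the aggregation logic inside a hardware‑based Trusted Execution Environment, so participants do not have to trust whoever operates the aggregation server. This addresses the “single point of failure” created by a trusted central server in traditional FL.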
- On‑chain audit trail: Stores cryptographic commitments and verification receipts on a public blockchain, providing immutable evidence for regulators and auditors.
- Healthcare‑specific threat model: Formalizes privacy and integrity risks unique to medical data sharing (e.g., membership inference, gradient inversion, malicious aggregator).
- Performance evaluation framework: Outlines metrics for accuracy, privacy leakage, latency, and operational cost, paving the way for real‑world benchmarking.
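The gradient‑inversion risk named in the threat model is easy to see for a single fully connected layer. The sketch below uses synthetic NumPy data, not anything from the paper. It shows that, for y = Wx + b trained on one sample, row i of the weight gradient equals grad_b[i] times x, so anyone who sees the raw update can recover the private input exactly. The function names are hypothetical. With larger batches the attack yields mixtures of inputs rather than exact copies, but the example shows why the protocol never exposes raw updates.

```python
import numpy as np

def linear_layer_grads(W, b, x, target):
    """Gradients of L = 0.5 * ||W x + b - target||^2 for a single sample x."""
    residual = W @ x + b - target          # dL/dy
    grad_W = np.outer(residual, x)         # dL/dW = (dL/dy) x^T
    grad_b = residual                      # dL/db = dL/dy
    return grad_W, grad_b

def invert_linear_gradient(grad_W, grad_b):
    """Recover x from one sample's gradients: row i of grad_W equals grad_b[i] * x."""
    i = int(np.argmax(np.abs(grad_b)))     # use the row with the largest |grad_b[i]|
    return grad_W[i] / grad_b[i]

rng = np.random.default_rng(0)
x_private = rng.normal(size=4)             # synthetic stand-in for one patient's features
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
target = rng.normal(size=3)

grad_W, grad_b = linear_layer_grads(W, b, x_private, target)
x_recovered = invert_linear_gradient(grad_W, grad_b)
print(np.allclose(x_recovered, x_private))  # prints True
```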
Methodology
- Local Training & Commitment: Each participating hospital trains the model on its own patient records and generates a cryptographic commitment (e.g., a hash) of its model update.
- Secure Aggregation in a TEE: The aggregator runs inside a Trusted Execution Environment (Intel SGX, AMD SEV, etc.). It fetches the committed updates, performs the standard FL aggregation (e.g., weighted averaging), and never exposes raw updates to the host OS.
- Zero‑Knowledge Proof Generation: While still inside the TEE, the system constructs a succinct ZKP that attests:
  - The exact set of committed updates was used.
  - The aggregation rule was applied correctly.
  - No additional data was injected or omitted.
- On‑Chain Verification: Verifier nodes (for example, other hospitals or independent auditors) download the proof, run a fast verification algorithm, and record the result on a blockchain (Ethereum, Polygon, etc.). The blockchain entry includes the global model hash and the proof receipt, creating an immutable log.
- Model Distribution: Once verified, the new global model is broadcast back to all participants for the next training round.
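To make the statement behind the proof concrete, here is a plaintext sketch of the commit, aggregate, and check steps. It is explicitly not a zero‑knowledge proof. The checker below sees every client update in the clear, which is exactly what the paper's Halo2/Nova proof is meant to avoid. It only spells out the three conditions the proof attests to. Salted SHA‑256 commitments and sample‑count weighting (FedAvg style) are assumptions: the source says only "e.g., a hash" and "e.g., weighted averaging". All names are hypothetical.

```python
import hashlib
import struct
from typing import Dict, List, Tuple

Update = List[float]

def commit(update: Update, salt: bytes) -> str:
    """Salted SHA-256 commitment to a model update (packed as little-endian float64)."""
    return hashlib.sha256(salt + struct.pack(f"<{len(update)}d", *update)).hexdigest()

def weighted_average(updates: List[Update], weights: List[int]) -> Update:
    """FedAvg-style aggregation: sum_k (n_k / n) * update_k."""
    total = sum(weights)
    return [sum(w * u[j] for u, w in zip(updates, weights)) / total
            for j in range(len(updates[0]))]

def check_round(commitments: Dict[str, str],
                openings: Dict[str, Tuple[Update, bytes]],
                weights: Dict[str, int],
                claimed_global: Update,
                tol: float = 1e-9) -> bool:
    """Plaintext check of the statement the ZKP attests; NOT zero-knowledge,
    because it needs every client's update in the clear."""
    # 1. No client omitted and none injected: openings cover exactly the commitments.
    if set(openings) != set(commitments):
        return False
    # 2. Each opened update matches what the client committed to.
    for cid, (update, salt) in openings.items():
        if commit(update, salt) != commitments[cid]:
            return False
    # 3. The claimed global model equals the aggregation rule applied to those updates.
    ids = sorted(commitments)
    expected = weighted_average([openings[c][0] for c in ids], [weights[c] for c in ids])
    return len(expected) == len(claimed_global) and all(
        abs(e - g) <= tol for e, g in zip(expected, claimed_global))

# Two hypothetical hospitals: (update, salt, number of local samples).
clients = {"hospital_a": ([0.2, -0.1], b"salt-a", 100),
           "hospital_b": ([0.4, 0.3], b"salt-b", 300)}
commitments = {cid: commit(u, s) for cid, (u, s, _) in clients.items()}
openings = {cid: (u, s) for cid, (u, s, _) in clients.items()}
weights = {cid: n for cid, (_, _, n) in clients.items()}

ids = sorted(clients)
global_model = weighted_average([openings[c][0] for c in ids], [weights[c] for c in ids])
print(global_model)
print(check_round(commitments, openings, weights, global_model))   # True
print(check_round(commitments, openings, weights, [0.0, 0.0]))     # False
```

In the real protocol, the verifier would receive only the commitments, the global model hash, and a succinct proof. The check above corresponds to the circuit the prover runs inside the TEE.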
The whole flow is orchestrated with standard FL communication patterns (gRPC/WebSockets) and leverages existing ZKP libraries, so developers can plug it into existing pipelines with modest changes.
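The immutable log written during on‑chain verification can be approximated, for illustration only, by a hash‑linked append‑only list. Each entry commits to its predecessor, so editing any past round breaks every later link. This models only the hash chaining. A real blockchain adds replication and consensus, which stop a single party from simply recomputing the whole chain. Smart contracts are not modeled either. The field names (round, model_hash, proof_receipt, verified, prev_hash) are hypothetical. The source states only that an entry holds the global model hash and the proof receipt.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS = "0" * 64

def _entry_hash(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

@dataclass
class AuditLog:
    """Append-only, hash-linked log: each entry commits to its predecessor."""
    entries: List[dict] = field(default_factory=list)

    def append(self, round_no: int, model_hash: str, proof_receipt: str, verified: bool) -> dict:
        prev = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        body = {"round": round_no, "model_hash": model_hash,
                "proof_receipt": proof_receipt, "verified": verified, "prev_hash": prev}
        entry = {**body, "entry_hash": _entry_hash(body)}
        self.entries.append(entry)
        return entry

    def is_intact(self) -> bool:
        """Recompute every hash and link; any edit to a past entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev or _entry_hash(body) != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True

log = AuditLog()
for r in (1, 2, 3):
    model_hash = hashlib.sha256(f"global-model-round-{r}".encode()).hexdigest()
    log.append(r, model_hash, proof_receipt=f"receipt-{r}", verified=True)
print(log.is_intact())                 # True
log.entries[0]["verified"] = False     # attempt to rewrite history
print(log.is_intact())                 # False
```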
Results & Findings
While the paper primarily outlines the architecture and a planned evaluation, the authors anticipate the following outcomes based on preliminary simulations:
| Metric | Expected Outcome |
|---|---|
| Model Accuracy | Comparable to vanilla FL (≤ 1 % drop) because aggregation is mathematically identical. |
| Privacy Leakage | Near‑zero gradient leakage; the proof is zero‑knowledge, so it reveals nothing about individual client updates beyond the fact that the aggregate was computed correctly. |
| Proof Generation Time | Sub‑second to a few seconds per round on modern CPUs with hardware acceleration. |
| Verification Cost | Micro‑cost on-chain (≈ $0.001 per proof on Ethereum L2) and sub‑millisecond verification time. |
| End‑to‑End Latency | Slight increase (≈ 5‑10 % overhead) due to proof generation, deemed acceptable for medical training cycles (hours‑days). |
These anticipated results suggest that the added cryptographic guarantees come at a modest performance cost, well within the tolerances of most clinical AI development timelines.
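A back‑of‑the‑envelope calculation helps put the table's anticipated figures in perspective. The overhead range and per‑proof cost below come from the table. The round length, number of rounds, and one proof per round are hypothetical assumptions chosen only for illustration.

```python
# Figures from the table (anticipated, not measured):
OVERHEAD_RANGE = (0.05, 0.10)   # 5-10 % end-to-end latency overhead
COST_PER_PROOF_USD = 0.001      # ~ $0.001 per proof on an Ethereum L2

# Hypothetical campaign parameters (assumptions for illustration only):
ROUND_HOURS = 4.0               # wall-clock length of one training round
ROUNDS = 50                     # number of federated rounds
PROOFS_PER_ROUND = 1            # one aggregation proof per round

def extra_minutes_per_round(round_hours: float, overhead: float) -> float:
    """Extra wall-clock minutes a single round spends on proof generation."""
    return round_hours * overhead * 60

low, high = (extra_minutes_per_round(ROUND_HOURS, o) for o in OVERHEAD_RANGE)
campaign_extra_hours = tuple(ROUNDS * ROUND_HOURS * o for o in OVERHEAD_RANGE)
onchain_cost = ROUNDS * PROOFS_PER_ROUND * COST_PER_PROOF_USD

print(f"extra time per round: {low:.0f}-{high:.0f} min")
print(f"extra time per campaign: {campaign_extra_hours[0]:.0f}-{campaign_extra_hours[1]:.0f} h")
print(f"on-chain verification cost per campaign: ${onchain_cost:.2f}")
```

Under these assumptions, each 4‑hour round gains 12 to 24 minutes. The full campaign gains 10 to 20 hours and costs about $0.05 in on‑chain verification. Latency, not fees, therefore dominates the overhead.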
Practical Implications
- Regulatory Compliance: Immutable on‑chain proofs can help meet audit requirements under HIPAA, GDPR, and emerging AI‑specific regulations, reducing legal friction for multi‑institution collaborations.
- Trustless Partnerships: Hospitals can join a consortium without needing a mutually trusted aggregator; the TEE + ZKP combo enforces honesty automatically.
- Developer Tooling: The protocol can be wrapped as a library (e.g., zkfl-health-sdk) that abstracts away ZKP and blockchain interactions, letting ML engineers focus on model design.
- Cost‑Effective Auditing: Verifiers are lightweight nodes; the blockchain storage cost is minimal compared to traditional secure logging solutions.
- Extensibility: The same pattern can be applied to other privacy‑sensitive domains (finance, genomics) where federated learning is attractive but trust is a barrier.
Limitations & Future Work
- TEE Availability & Attestation: Not all data centers host SGX/SEV hardware, and remote attestation adds operational complexity.
- Scalability of Proofs: While Halo2/Nova are efficient, proof generation still grows with the number of participants; future work will explore batch aggregation and recursive proofs.
- Network Overhead: Storing commitments and proofs on-chain can increase bandwidth, especially for large‑scale consortia; layer‑2 scaling solutions are under investigation.
- Real‑World Deployment: The paper’s evaluation is currently simulated; a pilot across actual hospitals will be needed to validate latency, fault tolerance, and integration with existing EMR systems.
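One common way to limit how much per‑participant data lands on‑chain is to batch each round's commitments into a single Merkle root. This is a swapped‑in technique, not one the paper describes. The chain then stores one 32‑byte root per round regardless of consortium size. Each hospital can still prove, off‑chain, that its commitment was included. This addresses on‑chain storage only, not the proof‑generation scaling the authors plan to tackle with batch aggregation and recursive proofs. The helper names and the odd‑level padding rule below are assumptions.

```python
import hashlib
from typing import List, Tuple

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _leaf_level(leaves: List[bytes]) -> List[bytes]:
    # Prefix 0x00 for leaves and 0x01 for inner nodes (domain separation).
    return [_h(b"\x00" + leaf) for leaf in leaves]

def _next_level(level: List[bytes]) -> List[bytes]:
    if len(level) % 2:
        level = level + [level[-1]]        # simple padding rule: duplicate the last node
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves: List[bytes]) -> bytes:
    if not leaves:
        raise ValueError("need at least one leaf")
    level = _leaf_level(leaves)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling sits on the right."""
    level = _leaf_level(leaves)
    proof = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling > index))
        level = _next_level(level)
        index //= 2
    return proof

def verify_inclusion(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    node = _h(b"\x00" + leaf)
    for sibling, sibling_on_right in proof:
        node = _h(b"\x01" + node + sibling) if sibling_on_right else _h(b"\x01" + sibling + node)
    return node == root

# Five hypothetical per-hospital commitments for one round.
commitments = [f"commitment-hospital-{i}".encode() for i in range(5)]
root = merkle_root(commitments)            # the only value posted on-chain
print(verify_inclusion(commitments[3], merkle_proof(commitments, 3), root))  # True
```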
Overall, zkFL-Health charts a promising path toward privacy‑preserving, auditable federated learning for medical AI—bridging the gap between cutting‑edge research and deployable, regulator‑friendly solutions.
Authors
- Savvy Sharma
- George Petrovic
- Sarthak Kaushik
Paper Information
- arXiv ID: 2512.21048v1
- Categories: cs.CR, cs.DC, cs.LG
- Published: December 24, 2025