[Paper] DP-FEDSOFIM: Differentially Private Federated Stochastic Optimization using Regularized Fisher Information Matrix

Published: January 14, 2026
4 min read
Source: arXiv - 2601.09166v1

Overview

The paper introduces DP‑FedSOFIM, a new federated learning (FL) framework that combines differential privacy (DP) with a lightweight second‑order optimizer. By using the Fisher Information Matrix (FIM) as a natural‑gradient preconditioner on the server side, the method speeds up convergence while keeping the per‑client memory and compute costs linear in the model size—making DP‑FL viable for modern, high‑dimensional neural nets.

Key Contributions

  • Server‑side second‑order preconditioning: Leverages the FIM as a natural‑gradient matrix without requiring each client to store or invert a full \(d \times d\) covariance matrix.
  • Linear‑time and linear‑space client footprint: Uses the Sherman‑Morrison formula to update the inverse FIM efficiently, yielding \(O(d)\) memory and computation per client per round.
  • Rigorous privacy guarantee: Shows that the server‑side preconditioning is a post‑processing step, preserving the original \((\varepsilon,\delta)\)-DP budget of the client‑side noise injection.
  • Empirical superiority: Demonstrates on CIFAR‑10 that DP‑FedSOFIM consistently outperforms first‑order DP‑FL baselines (e.g., DP‑FedAvg, DP‑FedProx) across a range of tight privacy budgets.
  • Generalizable framework: The approach can be plugged into any existing DP‑FL pipeline that already aggregates noisy gradients, requiring only a modest change on the server.

Methodology

  1. Standard DP‑FL pipeline: Each client computes a local gradient on its private data, clips it to a fixed norm, adds Gaussian noise calibrated to the desired \((\varepsilon,\delta)\) guarantee, and sends the noisy gradient to the server.
  2. Server‑side Fisher Information Matrix:
    • The server maintains an estimate of the global Fisher Information Matrix \(F\), which captures curvature information of the loss landscape.
    • Instead of storing the full matrix, the server keeps its inverse \(F^{-1}\) and updates it incrementally via the Sherman‑Morrison rank‑one formula (for \(F_{t+1} = F_t + u u^\top\)):
      \[ F^{-1}_{t+1} = F^{-1}_t - \frac{F^{-1}_t\, u\, u^\top F^{-1}_t}{1 + u^\top F^{-1}_t u} \]
      where \(u\) is the aggregated (noisy) gradient vector. The rank‑one update sidesteps explicit \(O(d^3)\) inversion, and since only the server touches \(F^{-1}\), each client's footprint stays \(O(d)\).
  3. Natural‑gradient step: The server preconditions the aggregated gradient with \(F^{-1}\) before applying the model update:
    \[ w_{t+1} = w_t - \eta\, F^{-1}_t\, \tilde{g}_t \]
    where \(\tilde{g}_t\) is the noisy, clipped gradient sum and \(\eta\) is a scalar learning rate.
  4. Privacy preservation: Since the server only receives differentially private gradients and then performs deterministic post‑processing (matrix updates and multiplication), the overall privacy guarantee remains unchanged by the post‑processing theorem.
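The four steps above can be sketched end‑to‑end in a few lines of NumPy. This is a minimal illustration under stated assumptions (the aggregated gradient serves as the rank‑one direction \(u\), and \(F^{-1}\) is kept symmetric), not the authors' implementation; all function names are ours.

```python
import numpy as np

def client_update(grad, clip_norm, sigma, rng):
    """Client side (step 1): clip the local gradient to `clip_norm`
    and add Gaussian noise scaled to the clipping bound."""
    norm = max(np.linalg.norm(grad), 1e-12)  # guard against zero gradients
    clipped = grad * min(1.0, clip_norm / norm)
    return clipped + rng.normal(0.0, sigma * clip_norm, size=grad.shape)

def sherman_morrison_update(F_inv, u):
    """Server side (step 2): rank-one update giving the inverse of
    F_{t+1} = F_t + u u^T, assuming F_inv is symmetric."""
    Fu = F_inv @ u
    return F_inv - np.outer(Fu, Fu) / (1.0 + u @ Fu)

def server_step(w, F_inv, noisy_grads, eta):
    """Server side (steps 3-4): aggregate the already-private gradients,
    refresh F^{-1} (pure post-processing, so no extra privacy cost),
    then take a natural-gradient step."""
    g = np.mean(noisy_grads, axis=0)
    F_inv = sherman_morrison_update(F_inv, g)
    return w - eta * (F_inv @ g), F_inv
```

One round then reads: each client calls `client_update` on its local gradient, and the server collects the outputs and calls `server_step`. Note that the \(O(d^2)\) cost of storing `F_inv` lives entirely on the server; clients only ever handle \(O(d)\) vectors.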

Results & Findings

| Privacy budget \(\varepsilon\) | DP‑FedAvg (test acc.) | DP‑FedProx (test acc.) | DP‑FedSOFIM (test acc.) |
|---|---|---|---|
| 0.5 | 58.2 % | 60.1 % | 66.4 % |
| 1.0 | 68.7 % | 70.3 % | 75.9 % |
| 2.0 | 77.5 % | 78.9 % | 82.1 % |
  • Faster convergence: DP‑FedSOFIM reaches 70 % accuracy in roughly half the communication rounds required by DP‑FedAvg at \(\varepsilon = 1\).
  • Stability under tight budgets: The natural‑gradient preconditioner mitigates the variance introduced by DP noise, leading to smoother loss curves.
  • Scalability: Experiments with ResNet‑18 (≈ 11 M parameters) confirm that client‑side memory stays below 50 MB, well within the limits of typical edge devices.

Practical Implications

  • Edge‑device training: Mobile or IoT devices can now participate in DP‑FL for larger models without hitting memory or compute bottlenecks, opening doors for privacy‑preserving personalization (e.g., on‑device language models).
  • Reduced communication cost: Faster convergence translates to fewer rounds of gradient exchange, cutting bandwidth usage—a critical factor for federated setups with intermittent connectivity.
  • Easier integration: Since the only change is on the server side, existing DP‑FL deployments can adopt DP‑FedSOFIM by swapping the aggregation step for the natural‑gradient update, preserving the same client‑side code and privacy accounting.
  • Regulatory compliance: Organizations that must meet strict privacy budgets (e.g., GDPR, HIPAA) can achieve higher model utility without relaxing \(\varepsilon\), making DP‑FL a more attractive option for sensitive domains like healthcare or finance.
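To make the budget discussion concrete, the noise scale implied by a given \((\varepsilon,\delta)\) can be computed with the classical Gaussian‑mechanism calibration. This is a generic sketch valid for \(\varepsilon \le 1\); the paper may use a tighter accountant, so treat the constant as illustrative, not its exact recipe.

```python
import math

def gaussian_sigma(epsilon: float, delta: float, clip_norm: float) -> float:
    """Classical Gaussian-mechanism calibration: adding N(0, sigma^2) noise
    to a query with L2 sensitivity `clip_norm` satisfies (epsilon, delta)-DP
    when sigma >= sqrt(2 ln(1.25/delta)) * clip_norm / epsilon, for epsilon <= 1."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * clip_norm / epsilon
```

Because \(\sigma\) scales as \(1/\varepsilon\), halving the budget doubles the noise, which is why the \(\varepsilon = 0.5\) row of the results table is the hardest regime and where the preconditioner's variance reduction pays off most.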

Limitations & Future Work

  • FIM approximation quality: The method relies on a running estimate of the Fisher matrix; if the data distribution drifts significantly across rounds, the preconditioner may become stale.
  • Evaluation scope: Experiments are limited to image classification (CIFAR‑10) and a single model architecture. Broader benchmarks (NLP, recommendation systems) are needed to confirm generality.
  • Server load: While client overhead is linear, the server must maintain and invert the global FIM, which can become a bottleneck for extremely large models (e.g., > 100 M parameters). Future work could explore low‑rank or block‑diagonal approximations to keep server computation scalable.
  • Privacy‑utility trade‑off analysis: A deeper theoretical study of how the curvature information interacts with DP noise could guide adaptive clipping or noise‑scaling strategies.
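One way to realize the low‑rank or block‑diagonal direction suggested above is a purely diagonal FIM estimate. The sketch below is our own illustration, not the paper's method: the server keeps only \(\mathrm{diag}(F)\) as an exponential moving average of squared aggregated gradients, dropping server memory from \(O(d^2)\) to \(O(d)\) at the cost of ignoring cross‑parameter curvature.

```python
import numpy as np

def diagonal_fim_step(w, f_diag, g, eta, eps=1e-8, decay=0.99):
    """Hypothetical diagonal-FIM variant: `f_diag` is an EMA estimate of
    diag(F) built from squared aggregated gradients; preconditioning is
    then an elementwise division instead of a matrix-vector product."""
    f_diag = decay * f_diag + (1.0 - decay) * g * g
    return w - eta * g / (f_diag + eps), f_diag
```

This keeps the post‑processing privacy argument intact (the server still only touches already‑noised gradients) while remaining feasible for models well past 100 M parameters.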

DP‑FedSOFIM demonstrates that second‑order information can be harnessed in a privacy‑preserving federated setting without sacrificing the lightweight nature required for real‑world deployments. As federated learning moves from research labs to production, such techniques will be key to delivering high‑quality, private AI services at scale.

Authors

  • Sidhant R. Nair
  • Tanmay Sen
  • Mrinmay Sen

Paper Information

  • arXiv ID: 2601.09166v1
  • Categories: cs.LG, cs.CR, cs.DC
  • Published: January 14, 2026