[Paper] Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning

Published: December 23, 2025 at 08:46 AM EST
4 min read
Source: arXiv - 2512.20363v1

Overview

The paper introduces Clust‑PSI‑PFL, a new personalized federated learning (PFL) framework that tackles the notorious non‑IID data problem by clustering clients according to a Population Stability Index (PSI)‑based similarity measure. By grouping together devices whose data distributions are statistically alike, the method achieves higher global model accuracy and markedly better fairness across heterogeneous clients.

Key Contributions

  • Weighted PSI metric (WPSI⁽ᴸ⁾): A novel, lightweight statistic that quantifies distribution drift between a client’s local label distribution and the global population, outperforming classic divergences (Hellinger, JS, Earth Mover’s).
  • PSI‑driven clustering pipeline: Uses K‑means++ on WPSI‑derived feature vectors, with the optimal number of clusters automatically selected via silhouette analysis.
  • Personalized FL architecture: Each cluster trains its own local model while still contributing to a shared global model, blending personalization with collaboration.
  • Comprehensive empirical evaluation: Experiments on six diverse datasets (tabular, image, text) under two non‑IID generation schemes (Dirichlet α and similarity‑based S) and varying client counts; a sketch of the Dirichlet scheme follows this list.
  • Performance gains: Up to 18 % higher global accuracy and a 37 % improvement in client fairness compared to leading baselines (FedAvg, FedProx, PerFedAvg, etc.).
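
The Dirichlet α scheme referenced above is a standard label‑skew generator rather than something specific to this paper; a minimal sketch, assuming numpy (the function name and defaults are illustrative):

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew.

    Smaller alpha -> more concentrated (more non-IID) per-client label mixes.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # One proportion of this class's samples per client.
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices

# Example: 10 clients with strong skew on synthetic 10-class labels.
labels = np.random.default_rng(1).integers(0, 10, size=5000)
parts = dirichlet_partition(labels, n_clients=10, alpha=0.1)
```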

Methodology

  1. Data‑distribution profiling – For every client i, compute the label histogram pᵢ and compare it to the global label histogram pᴳ using the weighted PSI formula (steps 1–3 are sketched in code right after this list):

    $$\mathrm{WPSI}^{(L)}_i = \sum_{c=1}^{C} w_c \,\Bigl|\log \frac{p_i(c)}{p_G(c)}\Bigr|$$

    where the w_c are class‑specific importance weights (e.g., inverse class frequency).

  2. Feature construction – Stack the per‑class PSI values into a vector ψᵢ ∈ ℝᶜ, capturing fine‑grained distributional differences.

  3. Clustering – Apply K‑means++ on the set {ψᵢ} to obtain K clusters. The silhouette score is computed for K = 2 … K_max; the K with the highest average silhouette is selected automatically, keeping the overhead modest.

  4. Training loop

    • Global round: All clients perform a standard FedAvg step on the global model.
    • Cluster‑local round: Within each cluster, clients further fine‑tune the global model on their local data, producing a cluster‑specific model.
    • Personalization: Each client finally adopts the cluster model (or a weighted blend of the global and cluster models) for inference; this blend is sketched after the complexity note below.
  5. Evaluation metrics – Global test accuracy, per‑client accuracy distribution (fairness), and communication cost.
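
A minimal sketch of steps 1–3, assuming numpy and scikit‑learn; the helper names are hypothetical (not the paper's code), and the smoothing constant eps is an added assumption that keeps the log‑ratios finite when a class is absent on a client:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def label_histogram(labels, n_classes, eps=1e-6):
    """Step 1: smoothed label histogram (eps avoids log(0) for absent classes)."""
    counts = np.bincount(labels, minlength=n_classes).astype(float) + eps
    return counts / counts.sum()

def wpsi_vector(p_local, p_global, weights):
    """Step 2: per-class weighted-PSI terms, i.e. the vector psi_i in R^C."""
    return weights * np.abs(np.log(p_local / p_global))

def cluster_clients(psi_vectors, k_max=8, seed=0):
    """Step 3: K-means++ over PSI vectors; K picked by the best silhouette."""
    X = np.stack(psi_vectors)
    best = (2, -1.0, None)  # (K, silhouette, assignment)
    for k in range(2, min(k_max, len(X) - 1) + 1):
        km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed)
        assignment = km.fit_predict(X)
        score = silhouette_score(X, assignment)
        if score > best[1]:
            best = (k, score, assignment)
    return best[0], best[2]
```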

The pipeline is deliberately simple: PSI computation is O(C) per client, clustering is O(N·C·K) with N clients, and the training steps reuse existing FL infrastructure.
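
Step 4's orchestration then reduces to size‑weighted parameter averaging plus a client‑side blend. A toy sketch with flat numpy vectors standing in for model parameters; the blend coefficient lam and all names are illustrative assumptions, not the paper's API:

```python
import numpy as np

def fedavg(updates, sizes):
    """Size-weighted average of client parameter vectors (one FedAvg step)."""
    w = np.asarray(sizes, dtype=float)
    return np.average(np.stack(updates), axis=0, weights=w / w.sum())

def blend(global_params, cluster_params, lam=0.5):
    """Client-side personalization: convex blend of global and cluster models."""
    return lam * cluster_params + (1.0 - lam) * global_params

# One hypothetical round: 4 clients in 2 clusters, 16-parameter "models".
rng = np.random.default_rng(0)
client_params = [rng.normal(size=16) for _ in range(4)]  # stand-ins for local updates
sizes = [120, 80, 200, 50]
clusters = {0: [0, 1], 1: [2, 3]}

global_model = fedavg(client_params, sizes)  # global round
cluster_models = {k: fedavg([client_params[i] for i in idx],
                            [sizes[i] for i in idx])
                  for k, idx in clusters.items()}  # cluster-local round
personal = {i: blend(global_model, cluster_models[k])  # per-client blend
            for k, idx in clusters.items() for i in idx}
```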

Results & Findings

| Dataset / Modality | Non‑IID Setting | Baseline (FedAvg) | Clust‑PSI‑PFL | Accuracy Δ | Fairness Δ |
|---|---|---|---|---|---|
| Adult (tabular) | Dirichlet α = 0.1 | 71.2 % | 84.5 % | +13.3 % | +31 % |
| CIFAR‑10 (image) | Similarity S = 0.3 | 62.8 % | 78.1 % | +15.3 % | +38 % |
| AG News (text) | Dirichlet α = 0.05 | 68.4 % | 82.9 % | +14.5 % | +35 % |

  • Cluster count: Across all experiments the silhouette‑based selector chose K between 2 and 4, confirming that a few homogeneous groups suffice.
  • Communication overhead: Adding a clustering step increased total transmitted bytes by < 2 % because the same model updates are reused; the only extra cost is the one‑time exchange of PSI vectors (tiny, O(C) per client).
  • Robustness: When label skew became extreme (α ≤ 0.01), Clust‑PSI‑PFL maintained > 80 % accuracy while FedAvg dropped below 60 %.

Overall, the weighted PSI proved more sensitive to subtle distribution shifts than Hellinger or Jensen‑Shannon distances, leading to more meaningful clusters.

Practical Implications

  • Edge‑AI deployments – Mobile or IoT fleets often exhibit strong label skew (e.g., language models on devices with region‑specific vocabularies). Clust‑PSI‑PFL can automatically group devices with similar usage patterns, delivering a model that works well for each subgroup without manual labeling.
  • Reduced fairness complaints – By improving the worst‑case client performance, service providers can avoid “cold‑start” or “tail‑client” issues that otherwise require costly per‑device fine‑tuning.
  • Lightweight integration – The PSI computation and clustering can be added as a pre‑processing step to existing FL pipelines (FedAvg, FedProx, etc.) with negligible code changes and no extra privacy risk (PSI is derived from label counts only).
  • Regulatory compliance – Since raw data never leaves the device and only aggregated label histograms are shared, the approach aligns with GDPR‑style data minimization requirements.
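
To make the integration point concrete, here is a hypothetical pre‑processing hook that reuses the helpers sketched under Methodology; note that only O(C) aggregated label counts ever leave a device:

```python
import numpy as np
# Assumes label_histogram, wpsi_vector, cluster_clients from the earlier sketch.

n_classes = 10
client_labels = [np.random.default_rng(s).integers(0, n_classes, 500)
                 for s in range(8)]  # stand-ins for on-device label sets

local_hists = [label_histogram(y, n_classes) for y in client_labels]  # on-device
global_hist = np.mean(local_hists, axis=0)   # server-side (unweighted for brevity)
weights = 1.0 / (global_hist * n_classes)    # inverse-frequency class weights
psi_vecs = [wpsi_vector(p, global_hist, weights) for p in local_hists]

k, assignment = cluster_clients(psi_vecs)    # feeds the existing FL orchestrator
```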

Limitations & Future Work

  • Label‑only focus: PSI captures only label distribution drift; feature‑space heterogeneity (e.g., covariate shift) is not directly addressed.
  • Static clustering: The current method determines clusters once per training run. Dynamic re‑clustering as client populations evolve could further boost performance.
  • Scalability to millions of clients: While PSI vectors are tiny, the K‑means++ step may become a bottleneck; hierarchical or streaming clustering alternatives are worth exploring.
  • Extension to heterogeneous model architectures: The paper assumes a common model across all clusters; future work could investigate per‑cluster architecture search.

Clust‑PSI‑PFL demonstrates that a simple statistical fingerprint—Population Stability Index—can be the key to unlocking robust, fair, and efficient personalized federated learning in real‑world, non‑IID environments.

Authors

  • Daniel M. Jimenez-Gutierrez
  • Mehrdad Hassanzadeh
  • Aris Anagnostopoulos
  • Ioannis Chatzigiannakis
  • Andrea Vitaletti

Paper Information

  • arXiv ID: 2512.20363v1
  • Categories: cs.LG, cs.AI, cs.DC, stat.AP, stat.ML
  • Published: December 23, 2025