[Paper] IntraShuffler: A Privacy Preserving Framework for Heterogeneous DP Federated Learning

Published: June 1, 2026 at 01:54 PM EDT

Source: arXiv - 2606.02563v1

Overview

The paper introduces IntraShuffler, a middleware that protects client privacy in heterogeneous‑differential‑privacy federated learning (HDP‑FL), where clients train under different privacy budgets ε. By shuffling model updates within groups of clients whose budgets are compatible, it counters a new “privacy inference attack” that can otherwise expose client‑level data characteristics, even when the server already knows each client’s privacy budget.

Key Contributions

Privacy inference attack model: Demonstrates how an honest‑but‑curious server can combine gradient denoising with surrogate modeling to recover client data distributions and link updates across training rounds.
IntraShuffler framework: Proposes a privacy‑aware shuffling mechanism that (1) buckets clients with compatible ε‑budgets and (2) performs parameter‑level shuffling inside each bucket, preserving ε‑aware aggregation while breaking gradient linkability.
Empirical validation: Shows >60 % reduction in gradient recoverability and a drop in surrogate inference accuracy from 0.78 → 0.33 across four benchmark datasets, with negligible loss in model utility for common FL aggregation rules (FedAvg, FedProx, etc.).
Compatibility with HDP‑FL: Unlike classic shuffle‑model defenses, IntraShuffler works with per‑client ε‑weighting, making it practical for real‑world federated deployments where privacy budgets differ.

Methodology

Threat model: The server is honest‑but‑curious: it follows the protocol, knows each client’s declared εᵢ, and observes the noisy updates it receives as well as the aggregated model after each round. It may apply gradient denoising to reduce DP noise and train a surrogate model that maps noisy updates back to data distribution attributes.
Attack evaluation:
- Surrogate inference accuracy measures how well the server predicts a client’s data distribution (e.g., class proportions).
- Linkage success quantifies the probability of correctly matching updates from the same client across rounds.
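The linkage metric can be made concrete with a small sketch. The summary only defines linkage success as the probability of correctly matching updates from the same client across rounds; the matching rule used here (cosine nearest neighbor between consecutive rounds) and the function name `linkage_success` are assumptions for illustration, not the authors' attack.

```python
import numpy as np

def linkage_success(prev_updates, next_updates):
    """Fraction of clients whose next-round update is matched to their own
    previous-round update. Row i of both arrays belongs to client i.
    Assumption: the attacker matches by highest cosine similarity."""
    a = prev_updates / np.linalg.norm(prev_updates, axis=1, keepdims=True)
    b = next_updates / np.linalg.norm(next_updates, axis=1, keepdims=True)
    # sim[i, j] = cosine similarity between client i's next-round update
    # and client j's previous-round update.
    sim = b @ a.T
    guesses = sim.argmax(axis=1)
    return float(np.mean(guesses == np.arange(len(guesses))))
```

A value near 1.0 means updates keep a persistent per-client signature across rounds; a defense aims to push this toward the random-guess level of 1/n for n clients.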
IntraShuffler design:
- Bucket formation: Clients are grouped so that the variance of εᵢ within a bucket stays below a configurable threshold, ensuring the server can still apply ε‑aware weighting.
- Parameter‑level shuffling: Within each bucket, the server randomly permutes the order of model parameters (or sub‑vectors) before aggregation, breaking the persistent structure that the attack relies on.
- Aggregation: After shuffling, the server performs the usual ε‑aware weighted average, then un‑shuffles the global model for distribution back to clients.
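The three design steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the greedy variance-threshold bucketing, the ε‑proportional aggregation weights, and the use of one shared random permutation per bucket per round are all assumptions, and the names `form_buckets` and `shuffled_aggregate` are hypothetical.

```python
import numpy as np

def form_buckets(eps, max_var):
    """Greedy bucketing (assumed strategy): visit clients in order of
    increasing epsilon and start a new bucket whenever adding the next
    client would push the bucket's epsilon variance above max_var."""
    buckets, current = [], []
    for i in np.argsort(eps):
        candidate = current + [int(i)]
        if current and np.var(eps[candidate]) > max_var:
            buckets.append(current)
            current = [int(i)]
        else:
            current = candidate
    if current:
        buckets.append(current)
    return buckets

def shuffled_aggregate(updates, eps, buckets, rng):
    """Epsilon-weighted average computed over per-bucket shuffled updates.
    updates has shape (n_clients, dim); weights are proportional to
    epsilon (an assumption about the epsilon-aware rule)."""
    dim = updates.shape[1]
    total = np.zeros(dim)
    for members in buckets:
        perm = rng.permutation(dim)           # one permutation per bucket
        shuffled = updates[members][:, perm]  # permute parameter order
        partial = eps[members] @ shuffled     # weighted sum, shuffled order
        unshuffled = np.empty(dim)
        unshuffled[perm] = partial            # invert the permutation
        total += unshuffled
    return total / eps.sum()

# Small demo with made-up budgets and random updates.
rng = np.random.default_rng(0)
eps = np.array([0.5, 0.6, 2.0, 2.1, 8.0])
updates = rng.normal(size=(5, 4))
buckets = form_buckets(eps, max_var=0.05)
global_update = shuffled_aggregate(updates, eps, buckets, rng)
```

Because every update in a bucket is permuted the same way and the permutation is inverted after summation, `global_update` equals the plain ε‑weighted average of the original updates. This sketch only demonstrates that aggregation is unchanged; it makes no claim about the privacy properties evaluated in the paper.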
Experimental setup: Four heterogeneous datasets (e.g., FEMNIST, Shakespeare, CIFAR‑10 with non‑IID splits) were used. The authors compared three baselines (no defense, classic shuffle‑model, and DP‑only) against IntraShuffler under multiple aggregation rules.

Results & Findings

Metric	No Defense	Classic Shuffle	IntraShuffler
Surrogate inference accuracy	0.78	0.55	0.33
Linkage success (↓)	0.71	0.42	0.18
Test accuracy (model utility)	84.2 %	83.7 %	84.0 %
Gradient recoverability reduction	–	38 %	>60 %

Privacy gain: IntraShuffler lowers surrogate inference accuracy from 0.78 (no defense) to 0.33, a relative drop of about 58 %, whereas the classic shuffle model only reaches 0.55. Linkage success falls from 0.71 to 0.18.
Utility preservation: Test accuracy stays within 0.2 percentage points of the unprotected baseline (84.0 % vs. 84.2 %), so the shuffling does not materially degrade learning.
Scalability: Overhead is modest. Bucket formation adds <5 ms per round in a 100‑client simulation, and parameter shuffling incurs negligible compute cost.
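The relative changes implied by the results table can be checked with a few lines of arithmetic. The numbers below are copied from the table; the helper `relative_drop` is a hypothetical name used only for this check.

```python
# Values copied from the results table (no defense, classic shuffle, IntraShuffler).
results = {
    "surrogate_inference_accuracy": (0.78, 0.55, 0.33),
    "linkage_success": (0.71, 0.42, 0.18),
}

def relative_drop(before, after):
    """Relative reduction of a metric, as a fraction of its starting value."""
    return (before - after) / before

for metric, (none, classic, intra) in results.items():
    print(
        metric,
        "vs. no defense:", round(relative_drop(none, intra), 2),
        "vs. classic shuffle:", round(relative_drop(classic, intra), 2),
    )
```

Relative to no defense, surrogate inference accuracy drops by about 58 % and linkage success by about 75 %. Relative to the classic shuffle model, the drops are about 40 % and 57 % respectively.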

Practical Implications

Enterprise FL platforms: Companies that must honor different privacy contracts per client (e.g., hospitals with varying HIPAA constraints) can adopt IntraShuffler to keep per‑client ε‑weighting while mitigating inference leakage.
Edge‑AI deployments: Mobile or IoT federations often run on limited hardware; IntraShuffler’s lightweight shuffling fits within existing communication pipelines without extra bandwidth.
Regulatory compliance: By reducing the risk of linking updates to specific data sources, the framework helps satisfy stricter privacy regulations (GDPR Art. 25 “data protection by design”).
Open‑source integration: The middleware can be plugged into popular FL frameworks (TensorFlow Federated, PySyft) as a pre‑aggregation hook, making adoption straightforward for developers.

Limitations & Future Work

Bucket granularity trade‑off: Tight ε‑compatibility thresholds improve privacy but may force many small buckets, reducing the shuffling’s effectiveness. Adaptive bucket sizing strategies are needed.
Assumed honest‑but‑curious server: The attack model does not consider a malicious server that can tamper with the shuffling process; integrity verification mechanisms would be a logical extension.
Evaluation scope: Experiments focus on image/text classification tasks; extending to larger‑scale models (e.g., Transformers) and other modalities (speech, graph data) remains open.
Theoretical guarantees: While empirical results are strong, formal privacy bounds that combine HDP and intra‑bucket shuffling are not yet derived. Future work could aim to prove a unified privacy accounting framework.

Authors

Farhin Farhad Riya
Olivera Kotevska
Jinyuan Stella Sun

Paper Information

arXiv ID: 2606.02563v1
Categories: cs.LG, cs.CR, cs.DC
Published: June 1, 2026
