[Paper] Asynchronous Secure Federated Learning with Byzantine aggregators

Published: January 8, 2026 at 08:27 AM EST
5 min read
Source: arXiv - 2601.04930v1

Overview

Federated learning (FL) lets many devices collaboratively train a model without sharing raw data, but it still faces two tough problems: asynchrony (clients update at different speeds) and malicious aggregators that could tamper with the model or try to infer private information.
The paper “Asynchronous Secure Federated Learning with Byzantine aggregators” proposes a protocol that preserves standard FL performance while guaranteeing privacy through secure aggregation and differential privacy, even when a subset of the aggregation servers behaves arbitrarily.

Key Contributions

  • Byzantine‑tolerant aggregator replication – Instead of assuming a single honest server, the design replicates the aggregator and tolerates a configurable fraction of corrupted replicas.
  • Asynchronous secure aggregation protocol – Clients mask their updates and add calibrated Gaussian noise; the replicated servers jointly unmask the aggregate without needing any consensus round, sidestepping the classic impossibility of agreement in fully asynchronous settings.
  • Uniform participation & balanced privacy budget – An “inclusion mechanism” forces slower clients to be selected as often as fast ones, preventing privacy erosion for high‑frequency contributors and avoiding bias in the trained model.
  • Performance parity with the state of the art – Empirical evaluation shows comparable convergence speed, model accuracy, and communication overhead to existing synchronous, honest‑majority FL systems.

Methodology

  1. System Model

    • Clients: Heterogeneous devices that compute local model updates at their own pace.
    • Aggregators: A set of k replicated servers; up to f may be Byzantine (malicious or crash‑faulty).
    • Network: Fully asynchronous – messages can be delayed arbitrarily, and there is no global clock.
  2. Secure Masking

    • Each client generates a random mask vector and adds it to its local model update.
    • The mask is split into k secret‑shares (using Shamir’s secret sharing) and sent to the k aggregators.
  3. Differential‑Privacy Noise

    • Clients also add independent Gaussian noise calibrated to a target (ε, δ) privacy budget.
  4. Aggregation & Unmasking

    • Every aggregator locally sums the masked updates it receives.
    • Because Shamir secret sharing is linear, each aggregator’s local sum of shares is itself a share of the sum of all clients’ masks. Any honest party can therefore combine enough of these partial results to reconstruct the total mask and cancel it from the aggregate, without any coordination round (see the sketch after this list).
    • The final unmasked, noisy aggregate is then broadcast back to the clients.
  5. Inclusion Scheduler

    • A lightweight probabilistic scheduler tracks how often each client has contributed.
    • When a client’s participation count falls below a threshold, the server increases its selection probability, so that over a sliding window every client contributes roughly the same number of updates (see the scheduler sketch after this list).
  6. Security & Liveness Guarantees

    • Privacy: Secure masking + DP noise ensures that even a coalition of up to f corrupted aggregators learns nothing beyond the noisy aggregate.
    • Byzantine Resilience: As long as fewer than a third of the aggregators are faulty (the exact bound depends on the secret‑sharing parameters), the unmasking step succeeds.
    • Liveness: No consensus is required, so the protocol makes progress regardless of message delays or server crashes.
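
To make steps 2–4 concrete, below is a minimal, self-contained Python sketch of the flow: each client adds Gaussian noise and a random field mask to its fixed-point-encoded update and Shamir-shares the mask across the k aggregators; every aggregator keeps only local sums of masked updates and mask shares; any t+1 honest partial results reconstruct the total mask, which cancels out of the aggregate. The field size, fixed-point scale, thresholds, and the Gaussian calibration σ = sensitivity · √(2 ln(1.25/δ)) / ε are illustrative assumptions, not the paper’s exact parameters.

```python
# Sketch of client-side masking and server-side unmasking (illustrative only).
import math
import random

import numpy as np

P = 2**61 - 1          # prime field for masks and shares (illustrative choice)
SCALE = 2**16          # fixed-point scale for real-valued model updates
K, T = 7, 2            # k aggregator replicas; up to T of them may be Byzantine


def encode(v):
    """Fixed-point encode a real vector into the prime field."""
    return [int(round(float(x) * SCALE)) % P for x in v]


def decode(w):
    """Inverse of encode; field elements above P//2 represent negatives."""
    return [(x - P if x > P // 2 else x) / SCALE for x in w]


def shamir_share(secret, k=K, t=T):
    """Split one field element into k Shamir shares with threshold t."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, k + 1)]


def lagrange_at_zero(points):
    """Reconstruct a shared value from t+1 points via Lagrange interpolation."""
    secret = 0
    for xj, yj in points:
        num, den = 1, 1
        for xm, _ in points:
            if xm != xj:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret


def client_message(update, eps=1.0, delta=1e-5, sensitivity=1.0):
    """Mask one local update, add Gaussian DP noise, Shamir-share the mask."""
    # One standard Gaussian-mechanism calibration (assumed, not from the paper):
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps
    noisy = np.asarray(update) + np.random.normal(0.0, sigma, len(update))
    mask = [random.randrange(P) for _ in noisy]
    masked = [(u + m) % P for u, m in zip(encode(noisy), mask)]
    # shares_by_server[j] = this client's mask shares destined for aggregator j
    shares_by_server = list(zip(*[shamir_share(m) for m in mask]))
    return masked, shares_by_server


# Demo: three clients contribute; aggregators keep only local, uncoordinated sums.
dim = 4
updates = [np.random.randn(dim) * 0.1 for _ in range(3)]
masked_sum = [0] * dim                       # running sum of masked updates
share_sums = [[0] * dim for _ in range(K)]   # aggregator j's local share sums

for u in updates:
    masked, shares_by_server = client_message(u)
    masked_sum = [(a + b) % P for a, b in zip(masked_sum, masked)]
    for j in range(K):
        # Shamir sharing is linear, so summing shares yields a share of the
        # summed masks -- no coordination round between aggregators is needed.
        share_sums[j] = [(s + y) % P
                         for s, (_, y) in zip(share_sums[j], shares_by_server[j])]

# Any T+1 honest aggregators' partial results reconstruct the total mask,
# which then cancels out of the masked sum.
honest = [0, 2, 5]
total_mask = [lagrange_at_zero([(j + 1, share_sums[j][d]) for j in honest])
              for d in range(dim)]
aggregate = decode([(s - m) % P for s, m in zip(masked_sum, total_mask)])
# The gap between the two lines below is the injected DP noise, not a masking error.
print("noisy aggregate:", np.round(aggregate, 3))
print("true sum:       ", np.round(sum(updates), 3))
```

The inclusion scheduler of step 5 can likewise be approximated by a weighted sampler that boosts clients whose participation count lags behind; the weighting rule below is a plausible reading of the mechanism, not the paper’s exact algorithm.

```python
import random
from collections import defaultdict


class InclusionScheduler:
    """Toy sketch of the inclusion mechanism (assumed weighting rule)."""

    def __init__(self, client_ids, boost=4.0):
        self.counts = defaultdict(int, {c: 0 for c in client_ids})
        self.boost = boost  # how strongly lagging clients are favored

    def select(self, ready_clients, m):
        """Pick m distinct clients among those currently ready to contribute."""
        target = max(self.counts.values())   # most-included client so far
        pool = list(ready_clients)
        chosen = []
        for _ in range(min(m, len(pool))):
            # clients that have contributed less get proportionally more weight
            weights = [1.0 + self.boost * (target - self.counts[c]) for c in pool]
            pick = random.choices(pool, weights=weights, k=1)[0]
            pool.remove(pick)
            chosen.append(pick)
            self.counts[pick] += 1
        return chosen


sched = InclusionScheduler([f"client-{i}" for i in range(10)])
print(sched.select(["client-0", "client-3", "client-7"], m=2))
```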

Results & Findings

| Metric | Baseline (Sync, Honest‑Majority) | Proposed Async‑Byzantine Scheme |
| --- | --- | --- |
| Test accuracy (CIFAR‑10) | 84.2 % | 83.9 % |
| Convergence epochs | 120 | 118 |
| Communication per round (KB) | 1.2 | 1.3 |
| Privacy budget consumed per client (ε) | 1.0 | 1.0 (balanced) |
| Tolerated faulty aggregators | 0 | Up to 30 % of k |

  • Accuracy & convergence remain within 0.5 % of the synchronous, honest‑majority baseline, confirming that the extra masking and noise do not degrade learning quality.
  • Throughput improves in realistic heterogeneous environments because fast clients no longer have to wait for a global synchronization barrier.
  • Privacy balance: The inclusion scheduler equalizes the number of updates per client, so the effective ε per client stays uniform across the training run.
  • Robustness: Experiments where 2 out of 7 aggregators behaved maliciously (e.g., dropping updates, injecting biased values) showed no noticeable impact on the final model, validating the Byzantine tolerance.
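
A back-of-the-envelope argument for the privacy-balance point above, using the standard advanced-composition bound as an assumption (the paper may use a tighter accountant): a client that contributes T rounds, each (ε, δ)-differentially private, accumulates a total budget of roughly

```latex
% Advanced composition over T per-round (\varepsilon,\delta)-DP contributions
% (standard bound, assumed here for intuition only):
\varepsilon_{\text{total}}(T) \approx \varepsilon \sqrt{2T \ln(1/\delta')} + T\,\varepsilon\,(e^{\varepsilon}-1)
```

Since the accumulated budget grows with T, a client selected far more often than its peers would see its effective ε drift upward; by keeping T roughly uniform across clients, the inclusion scheduler keeps every client’s budget at about the same level.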

Practical Implications

  • Edge‑AI platforms (smartphones, IoT gateways) can now run FL without a trusted central server, reducing the risk of a single point of failure or data leakage.
  • Regulated industries (healthcare, finance) can meet stricter privacy mandates because the protocol provides formal differential‑privacy guarantees even when part of the infrastructure is compromised.
  • Developer ergonomics: The solution works with existing federated‑averaging codebases; the only added steps are mask generation and secret‑share distribution, both of which can be encapsulated in a lightweight library (see the integration sketch below).
  • Scalable deployments: Since no consensus round is needed, the system stays responsive under high network latency or partial outages—ideal for geographically distributed data centers or mobile edge clouds.
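
As a rough illustration of the integration point mentioned under developer ergonomics, the sketch below wraps a stand-in FedAvg local-training step so that DP noise, masking, and share distribution happen just before the update leaves the device. Every name here (local_train, make_mask_and_shares, the aggregators list) is a hypothetical placeholder, and the additive mask split is a simplification of the Shamir-based scheme shown in the Methodology sketch.

```python
# Hypothetical integration sketch; not an API shipped with the paper.
import numpy as np


def local_train(model, data):
    """Stand-in for an existing FedAvg local-training step."""
    return np.zeros_like(model) + 0.01 * np.random.randn(*model.shape)


def make_mask_and_shares(update, k):
    """Stand-in for the masking library: additive k-out-of-k split, for illustration."""
    mask = np.random.randn(*update.shape)
    shares = np.random.randn(k - 1, *update.shape)
    shares = np.vstack([shares, (mask - shares.sum(axis=0))[None]])
    return update + mask, list(shares)


def secure_client_round(model, data, aggregators, dp_sigma=1.0):
    update = local_train(model, data)                 # unchanged FedAvg step
    noisy = update + np.random.normal(0.0, dp_sigma, update.shape)
    masked, shares = make_mask_and_shares(noisy, k=len(aggregators))
    for agg, share in zip(aggregators, shares):
        agg["shares"].append(share)                   # one mask share per replica
        agg["masked"].append(masked)                  # masked update to every replica
    # nothing unmasked ever leaves the device


aggregators = [{"shares": [], "masked": []} for _ in range(7)]
secure_client_round(np.zeros(8), data=None, aggregators=aggregators)
print(len(aggregators[0]["shares"]), "share vector(s) held by replica 0")
```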

Limitations & Future Work

  • Assumed bound on corrupted aggregators – The security proof requires that fewer than a certain fraction (e.g., 1/3) of the aggregator replicas are Byzantine; exceeding this bound could break privacy.
  • Secret‑sharing overhead – Splitting masks into k shares adds modest computational cost on the client side, which may be noticeable on ultra‑low‑power devices.
  • Static privacy budget – The current implementation uses a fixed ε for the entire training run; adaptive budgeting could improve utility for long‑running tasks.
  • Evaluation scope – Experiments focus on image classification benchmarks; applying the protocol to NLP or reinforcement‑learning workloads remains an open question.

Future directions include:

  1. Dynamic replica management (adding/removing aggregators on‑the‑fly).
  2. Hybrid cryptographic primitives (e.g., homomorphic encryption) to further reduce the trust assumptions.
  3. Adaptive inclusion scheduling that reacts to real‑time client availability and network conditions.

Authors

  • Antonella Del Pozzo
  • Achille Desreumaux
  • Mathieu Gestin
  • Alexandre Rapetti
  • Sara Tucci-Piergiovanni

Paper Information

  • arXiv ID: 2601.04930v1
  • Categories: cs.DC
  • Published: January 8, 2026