[Paper] CLAD: A Clustered Label-Agnostic Federated Learning Framework for Joint Anomaly Detection and Attack Classification

Published: May 7, 2026 at 01:01 PM EDT

Source: arXiv - 2605.06571v1

Overview

The paper introduces CLAD, a federated‑learning framework that simultaneously detects anomalies and classifies attacks in heterogeneous IoT/IIoT environments. By marrying clustered federated learning with a novel dual‑mode micro‑architecture (DM²A), the authors show how to turn the massive amount of unlabeled edge data into a security advantage while keeping communication and privacy costs low.

Key Contributions

Clustered Federated Learning (CFL) for IoT heterogeneity – dynamically groups devices with similar traffic patterns, preventing a single global model from being pulled in opposite directions.
Dual‑Mode Micro‑Architecture (DM²A) – a shared encoder plus two heads (unsupervised anomaly detector + supervised attack classifier) that can be trained on both labeled and unlabeled clients in the same round.
Label‑agnostic training pipeline – the system extracts useful signals from 80 % of clients that have no ground‑truth labels, dramatically reducing the need for costly manual annotation.
Communication‑efficient design – model updates are compressed and only exchanged within clusters, cutting total bandwidth usage by ~50 % compared with vanilla federated IDS baselines.
Extensive empirical validation – experiments on realistic IoT traffic datasets demonstrate a 30 % relative boost in detection F1‑score over the strongest prior methods under severe label scarcity.
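
The dual-mode idea above can be illustrated with a minimal sketch of how a client might pick its training loss. This is not the authors' implementation: the mean-squared-error reconstruction loss, the `cls_weight` parameter, and all function names are assumptions made for illustration.

```python
import math

def reconstruction_loss(x, x_hat):
    """Mean squared error between an input vector and its reconstruction (assumed loss)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def client_loss(x, x_hat, logits, label=None, cls_weight=1.0):
    """Anomaly loss is always applied; the classification loss only when a label exists.

    cls_weight is a hypothetical balancing factor, not a value from the paper.
    """
    loss = reconstruction_loss(x, x_hat)
    if label is not None:
        loss += cls_weight * cross_entropy(logits, label)
    return loss

# Toy values standing in for encoder/head outputs on one flow record.
x = [0.2, 0.4, 0.6]
x_hat = [0.1, 0.4, 0.8]
logits = [2.0, 0.5, -1.0]

unlabeled = client_loss(x, x_hat, logits)          # anomaly head only
labeled = client_loss(x, x_hat, logits, label=0)   # both heads
```

An unlabeled client still produces a training signal for the shared encoder through the reconstruction term, which is what lets it take part in the same round as labeled clients.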

Methodology

Client Clustering – before each federated round, the server runs a lightweight similarity check on recent traffic feature statistics (e.g., flow size, protocol distribution). Devices whose statistics are close are placed in the same cluster. Each cluster maintains its own model copy, so updates from dissimilar devices do not interfere.
DM²A Architecture
- Shared Encoder: a few convolutional / transformer blocks that learn a compact representation of raw packet/flow data.
- Anomaly‑Detection Head: trained with an unsupervised loss (e.g., auto‑encoder reconstruction or a contrastive objective) on all clients, regardless of label availability.
- Attack‑Classification Head: a standard cross‑entropy classifier that only receives gradients from the subset of clients that have labeled attack samples.
Joint Training Loop
- Every round, each client computes gradients for the shared encoder and for the heads it can train: both heads on labeled clients, only the anomaly head on unlabeled ones.
- The server aggregates updates per cluster using FedAvg, then broadcasts the refreshed model back to the cluster members.
Label‑Agnostic Aggregation – because the anomaly head is always present, even completely unlabeled devices contribute useful gradient information, keeping the encoder well‑regularized and preventing drift.
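
The server-side steps above can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: the greedy distance-threshold clustering, the `threshold` value, the flat-list parameter layout, and the choice to average the classifier head only over labeled clients are all assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two traffic-statistics vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_clients(stats, threshold):
    """Greedy clustering (assumed): join the first cluster whose centroid is
    within `threshold`, otherwise start a new cluster."""
    clusters = []  # each entry: {"members": [...], "centroid": [...]}
    for client_id, vec in stats.items():
        for c in clusters:
            if distance(vec, c["centroid"]) <= threshold:
                c["members"].append(client_id)
                n = len(c["members"])
                # Running mean keeps the centroid up to date.
                c["centroid"] = [(m * (n - 1) + v) / n
                                 for m, v in zip(c["centroid"], vec)]
                break
        else:
            clusters.append({"members": [client_id], "centroid": list(vec)})
    return [c["members"] for c in clusters]

def weighted_average(vectors, weights):
    """FedAvg-style average of parameter vectors, weighted by sample count."""
    total = sum(weights)
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(len(vectors[0]))]

def aggregate_cluster(updates, previous_classifier):
    """Average encoder and anomaly head over all clients in the cluster; average
    the classifier head only over clients that trained it (labeled ones)."""
    weights = [u["n_samples"] for u in updates]
    model = {
        "encoder": weighted_average([u["encoder"] for u in updates], weights),
        "anomaly_head": weighted_average([u["anomaly_head"] for u in updates], weights),
    }
    labeled = [u for u in updates if u["classifier_head"] is not None]
    if labeled:
        model["classifier_head"] = weighted_average(
            [u["classifier_head"] for u in labeled],
            [u["n_samples"] for u in labeled])
    else:
        # No labeled client this round: keep the previous classifier weights.
        model["classifier_head"] = list(previous_classifier)
    return model

# Hypothetical per-client statistics, e.g. share of MQTT vs. Modbus traffic.
stats = {"mqtt_a": [0.9, 0.1], "mqtt_b": [0.85, 0.15], "modbus_a": [0.1, 0.9]}
clusters = cluster_clients(stats, threshold=0.3)

updates = [
    {"encoder": [1.0, 2.0], "anomaly_head": [0.5],
     "classifier_head": [0.2, 0.4], "n_samples": 100},   # labeled client
    {"encoder": [3.0, 4.0], "anomaly_head": [1.5],
     "classifier_head": None, "n_samples": 300},         # unlabeled client
]
model = aggregate_cluster(updates, previous_classifier=[0.0, 0.0])
```

In this toy run the two MQTT-heavy clients land in one cluster and the Modbus-heavy client in another, and the unlabeled client shapes the encoder and anomaly head without touching the classifier head.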

Results & Findings

Scenario	% Labeled Clients	Detection F1 (baseline)	Detection F1 (CLAD, relative gain)	Communication Cost (relative to baseline)
Balanced traffic, 5 clusters	20 %	0.71	0.92 (+30 %)	0.5×
Highly skewed traffic, 8 clusters	10 %	0.64	0.84 (+31 %)	0.48×
Real‑world IoT testbed, 80 % unlabeled	20 %	0.68	0.88 (+30 %)	0.52×
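
The relative gains in the table can be recomputed from the two F1 columns as (CLAD - baseline) / baseline. A short check, using only the values shown above (the function and variable names are illustrative):

```python
# (scenario, baseline F1, CLAD F1) copied from the table above.
rows = [
    ("Balanced traffic, 5 clusters", 0.71, 0.92),
    ("Highly skewed traffic, 8 clusters", 0.64, 0.84),
    ("Real-world IoT testbed", 0.68, 0.88),
]

def relative_gain(baseline, improved):
    """Relative improvement over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0

gains = {name: relative_gain(base, clad) for name, base, clad in rows}

for name, gain in gains.items():
    print(f"{name}: {gain:.1f} % relative gain")
```

With the rounded F1 values shown, the gains come out at about 30 %, 31 %, and 29 %, so the percentages in the table should be read as approximate.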

Robustness to heterogeneity: clustering kept updates from interfering with one another when devices used very different protocols (e.g., MQTT vs. Modbus).
Label scarcity tolerance: performance degradation was modest even when only 10 % of clients supplied labels.
Bandwidth savings: because only cluster‑specific model deltas are exchanged, total uplink traffic is roughly half of a monolithic FL approach.

Practical Implications

Deployable IDS for edge fleets – operators can roll out CLAD across thousands of sensors, routers, and PLCs without needing to label every device’s traffic.
Reduced annotation overhead – security teams can focus labeling effort on a small, representative subset of devices, letting the unsupervised head learn the rest.
Scalable privacy‑preserving security – federated updates keep raw packets on‑device, satisfying GDPR‑type constraints while still benefiting from collective intelligence.
Cost‑effective network monitoring – halved communication translates to lower data‑plan expenses for remote or satellite‑linked IoT nodes.
Adaptable to new protocols – the clustering step automatically creates new model groups when a novel device type appears, avoiding a costly retraining of a monolithic model.

Limitations & Future Work

Cluster formation overhead – the similarity computation assumes periodic access to lightweight traffic statistics; in ultra‑low‑power devices this may still be non‑trivial.
Static clustering granularity – the current approach fixes the number of clusters per round; dynamic merging/splitting could further improve adaptability.
Evaluation on synthetic datasets – while the authors used a realistic testbed, broader validation on public IoT IDS benchmarks (e.g., TON_IoT, Edge‑IIoTset) would strengthen generality claims.
Potential adversarial poisoning – malicious clients could manipulate cluster assignments or gradient contributions; future work could integrate robust aggregation or Byzantine‑resilient clustering.

Overall, CLAD offers a compelling blueprint for building privacy‑preserving, label‑agnostic intrusion detection systems that can keep pace with the exploding diversity of IoT deployments.

Authors

Iason Ofeidis
Nikos Papadis
Randeep Bhatia
Leandros Tassiulas
TV Lakshman

Paper Information

arXiv ID: 2605.06571v1
Categories: cs.LG, cs.CR, cs.DC, cs.NI
Published: May 7, 2026
