[Paper] CLAD: A Clustered Label-Agnostic Federated Learning Framework for Joint Anomaly Detection and Attack Classification
Source: arXiv - 2605.06571v1
Overview
The paper introduces CLAD, a federated‑learning framework that simultaneously detects anomalies and classifies attacks in heterogeneous IoT/IIoT environments. By marrying clustered federated learning with a novel dual‑mode micro‑architecture (DM²A), the authors show how to turn the massive amount of unlabeled edge data into a security advantage while keeping communication and privacy costs low.
Key Contributions
- Clustered Federated Learning (CFL) for IoT heterogeneity – dynamically groups devices with similar traffic patterns, preventing a single global model from being pulled in opposite directions.
- Dual‑Mode Micro‑Architecture (DM²A) – a shared encoder plus two heads (unsupervised anomaly detector + supervised attack classifier) that can be trained on both labeled and unlabeled clients in the same round.
- Label‑agnostic training pipeline – the system extracts useful training signal from clients that have no ground‑truth labels (80–90 % of clients in the reported scenarios), dramatically reducing the need for costly manual annotation.
- Communication‑efficient design – model updates are compressed and only exchanged within clusters, cutting total bandwidth usage by ~50 % compared with vanilla federated IDS baselines.
- Extensive empirical validation – experiments on realistic IoT traffic datasets demonstrate a 30 % relative boost in detection F1‑score over the strongest prior methods under severe label scarcity.
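The summary states that model updates are compressed but does not name the scheme. As a purely illustrative assumption, the sketch below uses top‑k sparsification, a common choice in federated learning: a client transmits only the indices and values of its k largest-magnitude parameter changes, and the server treats every dropped entry as zero. The names `top_k_sparsify` and `densify` are hypothetical, and this is not claimed to be CLAD's actual compression method.

```python
def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of a flat update vector.

    Returns (indices, values), so a client sends 2*k numbers instead of
    len(update). Illustrative only: the compression scheme CLAD uses is
    not specified in this summary.
    """
    if k <= 0:
        return [], []
    # Rank coordinates by absolute value, largest first.
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    kept = sorted(order[:k])  # restore positional order of the kept entries
    return kept, [update[i] for i in kept]


def densify(indices, values, length):
    """Server side: rebuild a dense vector, treating dropped entries as zero."""
    dense = [0.0] * length
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


update = [0.02, -0.9, 0.1, 0.0, 0.75, -0.05]
idx, vals = top_k_sparsify(update, k=2)
print(idx, vals)                        # [1, 4] [-0.9, 0.75]
print(densify(idx, vals, len(update)))  # [0.0, -0.9, 0.0, 0.0, 0.75, 0.0]
```

With k much smaller than the update length, each client's payload shrinks from n numbers to 2k, at the cost of discarding small parameter changes in that round.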
Methodology
- Client Clustering – before each federated round, the server runs a lightweight similarity check on recent traffic feature statistics (e.g., flow size, protocol distribution). Devices whose statistics are close are placed in the same cluster. Each cluster maintains its own model copy, so updates from dissimilar devices do not interfere.
- DM²A Architecture
  - Shared Encoder: a few convolutional / transformer blocks that learn a compact representation of raw packet/flow data.
  - Anomaly‑Detection Head: trained on all clients, regardless of label availability, with an unsupervised objective (e.g., an auto‑encoder reconstruction loss or a contrastive loss).
  - Attack‑Classification Head: a standard cross‑entropy classifier that only receives gradients from the subset of clients that have labeled attack samples.
- Joint Training Loop
  - Every round, each client computes gradients for the shared encoder and the applicable head(s): both heads on labeled clients, only the anomaly head on unlabeled ones.
  - The server aggregates updates per cluster using FedAvg, then broadcasts the refreshed model back to the cluster members.
- Label‑Agnostic Aggregation – because the anomaly head is always present, even completely unlabeled devices contribute useful gradient information, keeping the encoder well‑regularized and preventing drift.
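The summary does not spell out the similarity measure or the aggregation code, so the following is only a minimal sketch under stated assumptions: each client's traffic statistics are a plain numeric vector compared by Euclidean distance with a greedy threshold rule, and each model part is a flat Python list. The function names (`cluster_clients`, `weighted_average`, `aggregate_cluster`), the `threshold` value and the client ids are hypothetical. The point it illustrates is how per-cluster FedAvg lets unlabeled clients update the encoder and anomaly head while only labeled clients update the classifier head.

```python
import math


def cluster_clients(stats, threshold):
    """Greedy leader clustering of clients by traffic-statistics vectors.

    `stats` maps client id -> feature vector (e.g. normalized flow size,
    protocol shares). A client joins the first cluster whose leader vector
    lies within `threshold` (Euclidean distance); otherwise it opens a new
    cluster. Hypothetical stand-in for CLAD's similarity check.
    """
    leaders, clusters = [], []
    for cid, vec in stats.items():
        for j, leader in enumerate(leaders):
            if math.dist(vec, leader) <= threshold:
                clusters[j].append(cid)
                break
        else:  # no leader close enough: start a new cluster
            leaders.append(vec)
            clusters.append([cid])
    return clusters


def weighted_average(vectors, weights):
    """FedAvg-style average of flat parameter vectors, weighted by sample count."""
    total = sum(weights)
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(len(vectors[0]))]


def aggregate_cluster(updates):
    """Label-aware FedAvg inside one cluster.

    Every update carries 'n', 'encoder' and 'anomaly_head'; only labeled
    clients also carry 'classifier_head'. The encoder and anomaly head are
    averaged over all members, the classifier head over labeled members
    only. If no member is labeled, the key is omitted and the caller keeps
    the previous classifier head.
    """
    weights = [u["n"] for u in updates]
    new = {part: weighted_average([u[part] for u in updates], weights)
           for part in ("encoder", "anomaly_head")}
    labeled = [u for u in updates if "classifier_head" in u]
    if labeled:
        new["classifier_head"] = weighted_average(
            [u["classifier_head"] for u in labeled], [u["n"] for u in labeled])
    return new


stats = {"cam1": [1.0, 0.2], "cam2": [1.1, 0.25], "plc1": [5.0, 0.9]}
print(cluster_clients(stats, threshold=0.5))  # [['cam1', 'cam2'], ['plc1']]

updates = [
    {"n": 100, "encoder": [1.0, 2.0], "anomaly_head": [0.5], "classifier_head": [1.0]},  # labeled
    {"n": 300, "encoder": [3.0, 2.0], "anomaly_head": [1.5]},  # unlabeled
]
print(aggregate_cluster(updates))
# {'encoder': [2.5, 2.0], 'anomaly_head': [1.25], 'classifier_head': [1.0]}
```

The greedy leader rule is order-dependent and does not fix the number of clusters in advance; k-means or hierarchical clustering over the same statistics would be equally reasonable choices for this sketch.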
Results & Findings
| Scenario | % Labeled Clients | Detection F1 (baseline) | Detection F1 (CLAD, relative gain) | Communication Cost (relative to baseline) |
|---|---|---|---|---|
| Balanced traffic, 5 clusters | 20 % | 0.71 | 0.92 (+30 %) | 0.5× |
| Highly skewed traffic, 8 clusters | 10 % | 0.64 | 0.84 (+31 %) | 0.48× |
| Real‑world IoT testbed, 80 % unlabeled | 20 % | 0.68 | 0.88 (+30 %) | 0.52× |
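The relative figures in the table follow directly from the two F1 columns and the communication column. The short sketch below recomputes them; the `rows` data are copied from the table, while the helper name `relative_gain` is illustrative.

```python
# (scenario, baseline F1, CLAD F1, relative communication cost), copied from the table
rows = [
    ("Balanced traffic, 5 clusters", 0.71, 0.92, 0.5),
    ("Highly skewed traffic, 8 clusters", 0.64, 0.84, 0.48),
    ("Real-world IoT testbed", 0.68, 0.88, 0.52),
]


def relative_gain(baseline, new):
    """Relative improvement in percent: (new - baseline) / baseline * 100."""
    return (new - baseline) / baseline * 100.0


for name, base, clad, comm in rows:
    saved = (1.0 - comm) * 100.0  # bandwidth saved versus the baseline, in percent
    print(f"{name}: F1 gain {relative_gain(base, clad):.1f} %, bandwidth saved {saved:.0f} %")
```

Because the F1 scores are rounded to two decimals, the recomputed gains (about 29.6 %, 31.2 % and 29.4 %) land within a point of the rounded percentages quoted in the table.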
- Robustness to heterogeneity: clustering prevented catastrophic forgetting when devices exhibited wildly different protocols (e.g., MQTT vs. Modbus).
- Label scarcity tolerance: performance degradation was modest even when only 10 % of clients supplied labels.
- Bandwidth savings: because only cluster‑specific model deltas are exchanged, total uplink traffic is roughly half that of a monolithic FL approach.
Practical Implications
- Deployable IDS for edge fleets – operators can roll out CLAD across thousands of sensors, routers, and PLCs without needing to label every device’s traffic.
- Reduced annotation overhead – security teams can focus labeling effort on a small, representative subset of devices, letting the unsupervised head learn the rest.
- Scalable privacy‑preserving security – federated updates keep raw packets on‑device, satisfying GDPR‑type constraints while still benefiting from collective intelligence.
- Cost‑effective network monitoring – halved communication translates to lower data‑plan expenses for remote or satellite‑linked IoT nodes.
- Adaptable to new protocols – the clustering step automatically creates new model groups when a novel device type appears, avoiding costly retraining of a single monolithic model.
Limitations & Future Work
- Cluster formation overhead – the similarity computation assumes periodic access to lightweight traffic statistics; in ultra‑low‑power devices this may still be non‑trivial.
- Static clustering granularity – the current approach fixes the number of clusters per round; dynamic merging/splitting could further improve adaptability.
- Limited benchmark coverage – the experiments rely partly on synthetic traffic scenarios alongside the authors' realistic testbed; broader validation on public IoT IDS benchmarks (e.g., TON_IoT, Edge‑IIoTset) would strengthen generality claims.
- Potential adversarial poisoning – malicious clients could manipulate cluster assignments or gradient contributions; future work could integrate robust aggregation or Byzantine‑resilient clustering.
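Coordinate-wise median is one standard robust aggregation rule of the kind mentioned in the last limitation; it is not part of CLAD as summarized here, and the function names and update values below are hypothetical. The toy example shows how a single poisoned update drags the plain mean far from the honest clients while the per-coordinate median stays close to them.

```python
import statistics


def coordinate_mean(updates):
    """Plain (unweighted) average of client update vectors, coordinate by coordinate."""
    return [sum(coords) / len(coords) for coords in zip(*updates)]


def coordinate_median(updates):
    """Coordinate-wise median: a simple Byzantine-robust alternative to averaging."""
    return [statistics.median(coords) for coords in zip(*updates)]


honest = [[0.10, -0.20], [0.12, -0.18], [0.09, -0.22]]
poisoned = honest + [[50.0, 40.0]]  # one malicious client submits a huge update

print(coordinate_mean(poisoned))    # pulled far toward the attacker
print(coordinate_median(poisoned))  # stays near the honest updates
```

The median tolerates a minority of arbitrary updates per coordinate, but on its own it does not address manipulation of cluster assignments, which the limitation also raises.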
Overall, CLAD offers a compelling blueprint for building privacy‑preserving, label‑agnostic intrusion detection systems that can keep pace with the exploding diversity of IoT deployments.
Authors
- Iason Ofeidis
- Nikos Papadis
- Randeep Bhatia
- Leandros Tassiulas
- TV Lakshman
Paper Information
- arXiv ID: 2605.06571v1
- Categories: cs.LG, cs.CR, cs.DC, cs.NI
- Published: May 7, 2026