[Paper] Unknown Attack Detection in IoT Networks using Large Language Models: A Robust, Data-efficient Approach
Source: arXiv - 2602.12183v1
Overview
A new paper introduces SiamXBERT, a meta‑learning framework that leverages large language models (LLMs) to spot previously unseen (zero‑day) attacks in IoT networks. By combining flow‑level statistics with raw packet data, the approach works even when traffic is encrypted and when only a handful of labeled examples are available—two pain points for today’s intrusion‑detection systems.
Key Contributions
- Dual‑modality representation: Merges flow‑level features (e.g., packet counts, durations) with packet‑level byte sequences, preserving rich behavioral cues without needing payload decryption.
- Siamese meta‑learning with BERT: Uses a transformer‑based language model (BERT) as the backbone of a Siamese network, enabling rapid adaptation to new attack families from just a few labeled samples.
- Data‑efficient learning: Demonstrates strong detection performance with dramatically fewer training instances compared with conventional deep‑learning IDSs.
- Robust cross‑dataset generalization: Validated on multiple IoT intrusion datasets, showing consistent gains in unknown‑attack F1‑score (up to 78.8 % improvement).
- Open‑source‑ready pipeline: Provides a reproducible training/evaluation workflow that can be plugged into existing security operation centers (SOCs).
Methodology
1. Feature Extraction
- Flow‑level: Standard NetFlow/IPFIX metrics (bytes, packets, duration, inter‑arrival times).
- Packet‑level: Raw byte sequences of the first N packets in a flow, tokenized and fed to a BERT‑style transformer.
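The packet-level input described above can be sketched as follows. The paper's exact tokenizer, vocabulary, and sequence length are not specified here, so the special-token IDs and sizes below are illustrative assumptions that mirror common BERT conventions:

```python
# Minimal sketch of packet-level byte tokenization (illustrative; the
# special-token IDs, N, and max_len are assumptions, not the paper's values).

CLS, SEP, PAD = 256, 257, 258  # hypothetical special tokens; bytes occupy 0-255

def tokenize_flow(packets, n_packets=4, max_len=64):
    """Turn the first n_packets raw byte strings of a flow into one
    fixed-length token sequence: [CLS] pkt1 [SEP] pkt2 [SEP] ... [PAD]."""
    tokens = [CLS]
    for pkt in packets[:n_packets]:
        tokens.extend(pkt)          # each raw byte (0-255) is its own token
        tokens.append(SEP)
    tokens = tokens[:max_len]       # truncate long flows
    tokens += [PAD] * (max_len - len(tokens))  # pad short flows
    return tokens

# Example: two tiny "packets" treated as opaque bytes (no payload decryption).
seq = tokenize_flow([b"\x16\x03\x01", b"\x17\x03\x03"])
```

Because each byte maps directly to a token, the same pipeline applies to plaintext and TLS-encrypted traffic alike.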
2. Siamese Architecture
- Two identical BERT encoders process a query flow and a support flow (the few labeled examples of a new attack).
- The encoders output embeddings that are compared with a distance metric (e.g., cosine similarity).
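The comparison step can be sketched in a few lines. The toy encoder below stands in for the shared BERT backbone, which is the key point of a Siamese design: both inputs pass through the same weights, so only the distance computation differs per pair:

```python
# Sketch of the Siamese comparison (toy encoder; a real system would use
# the shared BERT backbone described in the paper).
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def siamese_score(encode, query_flow, support_flow):
    """Run both flows through the SAME encoder (shared weights) and
    compare the resulting embeddings."""
    return cosine_similarity(encode(query_flow), encode(support_flow))

# Stand-in encoder for illustration only.
toy_encode = lambda flow: [float(b) for b in flow]

score = siamese_score(toy_encode, b"\x01\x02", b"\x01\x02")  # identical flows -> 1.0
```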
3. Meta‑Learning (Few‑Shot Adaptation)
- During training, the model sees many “episodes,” each mimicking the few‑shot scenario: a small support set of a particular attack class and a query set.
- The loss encourages the model to pull together embeddings of the same class and push apart different classes, teaching it to generalize from minimal data.
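The episodic setup and pull/push objective can be sketched as below. The episode sizes and the specific margin-based contrastive loss are assumptions; the paper may use a different metric-learning loss, but the pull-together/push-apart behavior is the same:

```python
# Sketch of episodic few-shot training (episode sizes and the margin-based
# contrastive loss are illustrative assumptions).
import random

def sample_episode(data_by_class, n_support=5, n_query=5):
    """Build one episode: pick an attack class and split its examples
    into a small support set and a disjoint query set."""
    cls = random.choice(list(data_by_class))
    examples = random.sample(data_by_class[cls], n_support + n_query)
    return cls, examples[:n_support], examples[n_support:]

def contrastive_loss(dist, same_class, margin=1.0):
    """Pull same-class pairs together (penalize any distance) and push
    different-class pairs at least `margin` apart."""
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2
```

Repeated over many episodes, this objective teaches the encoder to separate classes from only a handful of labeled examples.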
4. Inference
- For an incoming flow, SiamXBERT computes its embedding and measures similarity against the support set of known attacks.
- If similarity falls below a learned threshold, the flow is flagged as unknown (potential zero‑day).
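The inference-time decision rule described above can be sketched as a nearest-support lookup with a rejection threshold. The threshold value and embedding dimensions here are hypothetical:

```python
# Sketch of threshold-based unknown-attack flagging (threshold value and
# embeddings are hypothetical).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify_flow(embedding, support_sets, threshold=0.8):
    """Compare a flow's embedding against every support-set embedding of
    each known attack; if even the best match falls below the threshold,
    flag the flow as unknown (potential zero-day)."""
    best_label, best_sim = "unknown", -1.0
    for label, protos in support_sets.items():
        for proto in protos:
            sim = cosine(embedding, proto)
            if sim > best_sim:
                best_label, best_sim = label, sim
    return (best_label if best_sim >= threshold else "unknown"), best_sim

support = {"mirai-like": [[1.0, 0.0]], "scan-like": [[0.0, 1.0]]}
label, sim = classify_flow([0.9, 0.1], support)  # close match to mirai-like
```

A flow equidistant from all support sets (e.g., `[0.7, 0.7]` here) falls below the threshold and is flagged as unknown.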
The full pipeline runs on standard GPU hardware and integrates with existing IDS deployments that already collect flow statistics.
Results & Findings
| Setting | Baseline F1 (e.g., CNN, LSTM) | SiamXBERT F1 | Δ F1 (unknown attacks) |
|---|---|---|---|
| Within‑dataset (same IoT testbed) | 0.62 | 0.89 | +43 % |
| Cross‑dataset (train on one IoT dataset, test on another) | 0.48 | 0.86 | +78.8 % |
| Training data size (10 % of full set) | 0.55 | 0.84 | +53 % |
- Data efficiency: With only 5–10 labeled samples per new attack, SiamXBERT reaches >80 % of the performance obtained with the full training set.
- Encrypted traffic compatibility: Since the packet‑level input is treated as a byte‑string, the model does not rely on payload semantics, making it viable for TLS‑encrypted IoT streams.
- Fast adaptation: New attack signatures can be incorporated with under a minute of fine‑tuning, making the approach suitable for real‑time SOC workflows.
Practical Implications
- Plug‑and‑play IDS upgrades: Security teams can augment existing flow‑based IDSs with SiamXBERT without overhauling packet capture infrastructure.
- Zero‑day readiness: The few‑shot capability means that as soon as a security analyst tags a handful of suspicious flows, the system can start flagging similar unknown traffic across the network.
- Cost reduction: Less reliance on massive labeled datasets lowers the barrier for small‑to‑mid‑size enterprises to deploy advanced ML‑based detection.
- Edge deployment: The dual‑modality design can be trimmed to run on edge gateways (e.g., Raspberry Pi with a modest GPU), enabling on‑device detection before traffic reaches the cloud.
- Compliance & privacy: Because the method works on encrypted payloads, it aligns with privacy regulations that restrict deep packet inspection.
Limitations & Future Work
- Model size: BERT‑based encoders are still relatively heavyweight for ultra‑low‑power IoT nodes; pruning or distillation will be needed for truly constrained devices.
- Support‑set management: Maintaining up‑to‑date few‑shot examples for a large and evolving attack taxonomy could become operationally complex.
- Adversarial robustness: The authors note that targeted adversarial perturbations on packet byte sequences could degrade similarity scores; hardening against such attacks is an open research direction.
- Broader protocol coverage: Experiments focused on common IoT protocols (MQTT, CoAP); extending to industrial control system traffic (Modbus, OPC‑UA) is left for future validation.
Overall, SiamXBERT showcases how transformer‑powered meta‑learning can bring data‑efficient, zero‑day detection to the fast‑moving world of IoT security—an exciting step toward more resilient, AI‑augmented network defenses.
Authors
- Shan Ali
- Feifei Niu
- Paria Shirani
- Lionel C. Briand
Paper Information
- arXiv ID: 2602.12183v1
- Categories: cs.CR, cs.SE
- Published: February 12, 2026