[Paper] Unknown Attack Detection in IoT Networks using Large Language Models: A Robust, Data-efficient Approach
Source: arXiv - 2602.12183v1
Overview
A new paper introduces SiamXBERT, a meta‑learning framework that leverages large language models (LLMs) to spot previously unseen (zero‑day) attacks in IoT networks. By combining flow‑level statistics with raw packet data, the approach works even when traffic is encrypted and when only a handful of labeled examples are available—two pain points for today’s intrusion‑detection systems.
Key Contributions
- Dual‑modality representation: Merges flow‑level features (e.g., packet counts, durations) with packet‑level byte sequences, preserving rich behavioral cues without needing payload decryption.
- Siamese meta‑learning with BERT: Uses a transformer‑based language model (BERT) as the backbone of a Siamese network, enabling rapid adaptation to new attack families from just a few labeled samples.
- Data‑efficient learning: Demonstrates strong detection performance with dramatically fewer training instances compared with conventional deep‑learning IDSs.
- Robust cross‑dataset generalization: Validated on multiple IoT intrusion datasets, showing consistent gains in unknown‑attack F1‑score (up to 78.8 % improvement).
- Open‑source‑ready pipeline: Provides a reproducible training/evaluation workflow that can be plugged into existing security operation centers (SOCs).
Methodology
1. Feature Extraction
- Flow‑level: Standard NetFlow/IPFIX metrics (bytes, packets, duration, inter‑arrival times).
- Packet‑level: Raw byte sequences of the first N packets in a flow, tokenized and fed to a BERT‑style transformer.
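The packet-level input described above can be sketched as follows. The paper's exact tokenizer, vocabulary, and sequence length are not specified here, so the special-token IDs and sizes below are illustrative assumptions that mirror common BERT conventions:

```python
# Minimal sketch of packet-level byte tokenization (illustrative; the
# special-token IDs, N, and max_len are assumptions, not the paper's values).

CLS, SEP, PAD = 256, 257, 258  # hypothetical special tokens; bytes occupy 0-255

def tokenize_flow(packets, n_packets=4, max_len=64):
    """Turn the first n_packets raw byte strings of a flow into one
    fixed-length token sequence: [CLS] pkt1 [SEP] pkt2 [SEP] ... [PAD]."""
    tokens = [CLS]
    for pkt in packets[:n_packets]:
        tokens.extend(pkt)          # each raw byte (0-255) is its own token
        tokens.append(SEP)
    tokens = tokens[:max_len]       # truncate long flows
    tokens += [PAD] * (max_len - len(tokens))  # pad short flows
    return tokens

# Example: two tiny "packets" treated as opaque bytes (no payload decryption).
seq = tokenize_flow([b"\x16\x03\x01", b"\x17\x03\x03"])
```

Because each byte maps directly to a token, the same pipeline applies to plaintext and TLS-encrypted traffic alike.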
2. Siamese Architecture
- Two identical BERT encoders process a query flow and a support flow (the few labeled examples of a new attack).
- The encoders output embeddings that are compared with a distance metric (e.g., cosine similarity).
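The comparison step can be sketched in a few lines. The toy encoder below stands in for the shared BERT backbone, which is the key point of a Siamese design: both inputs pass through the same weights, so only the distance computation differs per pair:

```python
# Sketch of the Siamese comparison (toy encoder; a real system would use
# the shared BERT backbone described in the paper).
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def siamese_score(encode, query_flow, support_flow):
    """Run both flows through the SAME encoder (shared weights) and
    compare the resulting embeddings."""
    return cosine_similarity(encode(query_flow), encode(support_flow))

# Stand-in encoder for illustration only.
toy_encode = lambda flow: [float(b) for b in flow]

score = siamese_score(toy_encode, b"\x01\x02", b"\x01\x02")  # identical flows -> 1.0
```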
3. Meta‑Learning (Few‑Shot Adaptation)
- During training, the model sees many “episodes,” each mimicking the few‑shot scenario: a small support set of a particular attack class and a query set.
- The loss encourages the model to pull together embeddings of the same class and push apart different classes, teaching it to generalize from minimal data.
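The episodic setup and pull/push objective can be sketched as below. The episode sizes and the specific margin-based contrastive loss are assumptions; the paper may use a different metric-learning loss, but the pull-together/push-apart behavior is the same:

```python
# Sketch of episodic few-shot training (episode sizes and the margin-based
# contrastive loss are illustrative assumptions).
import random

def sample_episode(data_by_class, n_support=5, n_query=5):
    """Build one episode: pick an attack class and split its examples
    into a small support set and a disjoint query set."""
    cls = random.choice(list(data_by_class))
    examples = random.sample(data_by_class[cls], n_support + n_query)
    return cls, examples[:n_support], examples[n_support:]

def contrastive_loss(dist, same_class, margin=1.0):
    """Pull same-class pairs together (penalize any distance) and push
    different-class pairs at least `margin` apart."""
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2
```

Repeated over many episodes, this objective teaches the encoder to separate classes from only a handful of labeled examples.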
4. Inference
- For an incoming flow, SiamXBERT computes its embedding and measures similarity against the support set of known attacks.
- If similarity falls below a learned threshold, the flow is flagged as unknown (potential zero‑day).
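The inference-time decision rule described above can be sketched as a nearest-support lookup with a rejection threshold. The threshold value and embedding dimensions here are hypothetical:

```python
# Sketch of threshold-based unknown-attack flagging (threshold value and
# embeddings are hypothetical).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify_flow(embedding, support_sets, threshold=0.8):
    """Compare a flow's embedding against every support-set embedding of
    each known attack; if even the best match falls below the threshold,
    flag the flow as unknown (potential zero-day)."""
    best_label, best_sim = "unknown", -1.0
    for label, protos in support_sets.items():
        for proto in protos:
            sim = cosine(embedding, proto)
            if sim > best_sim:
                best_label, best_sim = label, sim
    return (best_label if best_sim >= threshold else "unknown"), best_sim

support = {"mirai-like": [[1.0, 0.0]], "scan-like": [[0.0, 1.0]]}
label, sim = classify_flow([0.9, 0.1], support)  # close match to mirai-like
```

A flow equidistant from all support sets (e.g., `[0.7, 0.7]` here) falls below the threshold and is flagged as unknown.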
The full pipeline runs on standard GPU hardware and integrates with existing IDS deployments that already collect flow statistics.
Results & Findings
| Setting | Baseline F1 (e.g., CNN, LSTM) | SiamXBERT F1 | Δ F1 (unknown attacks) |
|---|---|---|---|
| Within‑dataset (same IoT testbed) | 0.62 | 0.89 | +43 % |
| Cross‑dataset (train on one IoT dataset, test on another) | 0.48 | 0.86 | +78.8 % |
| Training data size (10 % of full set) | 0.55 | 0.84 | +53 % |
- Data efficiency: With only 5–10 labeled samples per new attack, SiamXBERT reaches >80 % of the performance obtained with the full training set.
- Encrypted traffic compatibility: Since the packet‑level input is treated as a byte‑string, the model does not rely on payload semantics, making it viable for TLS‑encrypted IoT streams.
- Fast adaptation: New attack signatures can be incorporated with under a minute of fine‑tuning, making the approach suitable for real‑time SOC workflows.
Practical Implications
- Plug‑and‑play IDS upgrades: Security teams can augment existing flow‑based IDSs with SiamXBERT without overhauling packet capture infrastructure.
- Zero‑day readiness: The few‑shot capability means that as soon as a security analyst tags a handful of suspicious flows, the system can start flagging similar unknown traffic across the network.
- Cost reduction: Less reliance on massive labeled datasets lowers the barrier for small‑to‑mid‑size enterprises to deploy advanced ML‑based detection.
- Edge deployment: The dual‑modality design can be trimmed to run on edge gateways (e.g., Raspberry Pi with a modest GPU), enabling on‑device detection before traffic reaches the cloud.
- Compliance & privacy: Because the method works on encrypted payloads, it aligns with privacy regulations that restrict deep packet inspection.
Limitations & Future Work
- Model size: BERT‑based encoders are still relatively heavyweight for ultra‑low‑power IoT nodes; pruning or distillation will be needed for truly constrained devices.
- Support‑set management: Maintaining up‑to‑date few‑shot examples for a large and evolving attack taxonomy could become operationally complex.
- Adversarial robustness: The authors note that targeted adversarial perturbations on packet byte sequences could degrade similarity scores; hardening against such attacks is an open research direction.
- Broader protocol coverage: Experiments focused on common IoT protocols (MQTT, CoAP); extending to industrial control system traffic (Modbus, OPC‑UA) is left for future validation.
Overall, SiamXBERT showcases how transformer‑powered meta‑learning can bring data‑efficient, zero‑day detection to the fast‑moving world of IoT security—an exciting step toward more resilient, AI‑augmented network defenses.
Authors
- Shan Ali
- Feifei Niu
- Paria Shirani
- Lionel C. Briand
Paper Information
- arXiv ID: 2602.12183v1
- Categories: cs.CR, cs.SE
- Published: February 12, 2026