[Paper] Unsupervised Anomaly Detection in NSL-KDD Using $β$-VAE: A Latent Space and Reconstruction Error Approach
Source: arXiv - 2602.19785v1
Overview
The paper investigates how β‑Variational Autoencoders (β‑VAEs) can be used for unsupervised intrusion detection on the classic NSL‑KDD network‑traffic benchmark. By probing both the latent‑space geometry and the reconstruction error, the authors show that a well‑tuned β‑VAE can flag anomalous traffic without any labeled attack data—an attractive prospect for modern OT‑IT environments where new threats emerge constantly.
Key Contributions
- Dual‑metric anomaly detection: Introduces two complementary unsupervised scoring schemes—(1) distance‑based scores in the learned latent space, and (2) traditional reconstruction‑error scores.
- β‑VAE adaptation for network data: Demonstrates that adjusting the β hyper‑parameter (controlling the trade‑off between reconstruction fidelity and latent disentanglement) markedly improves the separability of normal vs. malicious flows.
- Empirical comparison on NSL‑KDD: Provides a thorough experimental evaluation that quantifies the trade‑offs between the two metrics (precision, recall, ROC‑AUC).
- Insightful analysis of latent representations: Visualizes how normal traffic clusters tightly while attacks scatter, supporting the hypothesis that latent distance is a strong anomaly indicator.
- Open‑source implementation: Releases code and trained models, enabling reproducibility and rapid prototyping for security teams.
Methodology
- Data preprocessing – The NSL‑KDD dataset is first one‑hot encoded for categorical fields and normalized. No attack labels are used during training; only the “normal” subset informs the model.
- β‑VAE architecture – A symmetric encoder/decoder network (fully‑connected layers) maps a 122‑dimensional input to a low‑dimensional latent vector (typically 2‑10 dimensions). The loss combines:
- Reconstruction term (binary cross‑entropy) that forces the decoder to rebuild the original connection‑record features.
- KL‑divergence term multiplied by β, encouraging a smoother, more disentangled latent space.
- Scoring mechanisms
- Latent‑space distance: For each test sample, compute its latent embedding and measure Euclidean distance to the nearest training embedding (or to the centroid of normal embeddings). Larger distances imply anomalies.
- Reconstruction error: Compute the per‑sample reconstruction loss; high error suggests the model could not represent the input, flagging it as anomalous.
- Threshold selection – In an unsupervised setting, thresholds are set using a validation split of normal data (e.g., 95th percentile) to control false‑positive rates.
- Evaluation – Although training is unsupervised, the authors later map the scores to the known attack labels in NSL‑KDD to compute standard metrics (AUC, F1).
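The preprocessing step above can be sketched as follows. The category list and numeric values are illustrative toys, not NSL‑KDD records; the function names are mine, not the authors':

```python
import numpy as np

def one_hot(column, categories):
    """One-hot encode a categorical column against a fixed category list."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(column), len(categories)))
    for row, value in enumerate(column):
        out[row, index[value]] = 1.0
    return out

def min_max_scale(x, eps=1e-8):
    """Scale numeric columns to [0, 1]; eps guards against constant columns."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo + eps)

# Toy example: one categorical field (e.g., protocol type) and two numeric ones.
protocols = ["tcp", "udp", "icmp"]                     # hypothetical category list
cat = one_hot(["tcp", "udp", "tcp"], protocols)
num = min_max_scale(np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]))
features = np.hstack([cat, num])                       # shape (3, 5)
```

Applying such an encoding to all 41 NSL‑KDD fields yields the 122‑dimensional input mentioned above.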
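A minimal sketch of the β‑VAE objective described above, assuming a binary cross‑entropy reconstruction term and a diagonal‑Gaussian posterior (the function name and array layout are assumptions, not the paper's code):

```python
import numpy as np

def bvae_loss(x, x_hat, mu, log_var, beta=4.0, eps=1e-7):
    """beta-VAE objective: BCE reconstruction + beta * KL(q(z|x) || N(0, I)).

    x, x_hat : (batch, features) inputs and reconstructions in [0, 1]
    mu, log_var : (batch, latent_dim) parameters of the Gaussian posterior
    """
    x_hat = np.clip(x_hat, eps, 1 - eps)  # avoid log(0)
    bce = -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat), axis=1)
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)
    return np.mean(bce + beta * kl)
```

With β = 1 this reduces to the standard VAE loss; raising β penalizes posteriors that stray from the N(0, I) prior, which is what encourages the smoother latent space.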
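The two scoring mechanisms might be implemented roughly as below, using the centroid variant of the latent‑space distance; names and shapes are my assumptions:

```python
import numpy as np

def latent_distance_scores(z_test, z_train_normal):
    """Anomaly score = Euclidean distance to the centroid of normal embeddings."""
    centroid = z_train_normal.mean(axis=0)
    return np.linalg.norm(z_test - centroid, axis=1)

def reconstruction_error_scores(x, x_hat, eps=1e-7):
    """Per-sample binary cross-entropy between input and its reconstruction."""
    x_hat = np.clip(x_hat, eps, 1 - eps)
    return -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat), axis=1)
```

The nearest‑neighbor variant would replace the centroid with a minimum over all training embeddings, at correspondingly higher query cost.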
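Percentile thresholding and the label‑based evaluation could look like this sketch; the ROC‑AUC uses the rank (Mann‑Whitney U) formulation and, for brevity, ignores tied scores:

```python
import numpy as np

def percentile_threshold(validation_scores, pct=95.0):
    """Alert threshold = pct-th percentile of scores on held-out normal data."""
    return np.percentile(validation_scores, pct)

def roc_auc(scores, labels):
    """ROC-AUC via ranks: P(score of a random positive > random negative)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

By construction, roughly 5 % of benign validation traffic exceeds a 95th‑percentile threshold, which is how the false‑positive rate is controlled without labels.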
Results & Findings
| Metric | Latent‑space distance | Reconstruction error |
|---|---|---|
| ROC‑AUC | 0.93 | 0.86 |
| F1 (optimal threshold) | 0.78 | 0.71 |
| False‑positive rate @ 95 % recall | 12 % | 18 % |
- Latent‑space distance consistently outperforms reconstruction error on this dataset, especially for low‑frequency attack types that produce embeddings far from the normal cluster.
- β tuning matters: β ≈ 4 yields the best trade‑off; lower β values over‑fit (the model behaves like a plain autoencoder and separates poorly), while very high β values push the posterior toward the prior and degrade reconstruction quality.
- Visualization (t‑SNE of latent vectors) shows a tight “normal” cloud with attacks forming distinct outliers, confirming the intuition behind the distance‑based score.
Practical Implications
- Plug‑and‑play anomaly detector: Security engineers can deploy a β‑VAE trained only on benign traffic logs; the model will automatically flag novel malicious patterns without needing costly signature updates.
- Lightweight inference: Once trained, the encoder alone suffices to compute a latent embedding and distance score, making real‑time detection feasible on edge devices or within network appliances.
- Explainability boost: Distance scores can be visualized (e.g., heatmaps of latent clusters), helping SOC analysts prioritize alerts that are truly “far” from normal behavior.
- Hybrid systems: The approach can be combined with supervised classifiers—use the latent embeddings as features for downstream supervised models, improving detection of known attacks while retaining unsupervised coverage for zero‑day threats.
- Domain transfer: Because β‑VAEs learn a generic representation of traffic patterns, the same model can be fine‑tuned on other datasets (e.g., CIC‑IDS2017) with minimal labeled data.
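The hybrid idea can be sketched as a simple score blend; the helper name, the exponential calibration of distances into (0, 1), and the weighting scheme are my assumptions, not the paper's:

```python
import numpy as np

def hybrid_alert_score(supervised_prob, latent_distance, dist_scale=1.0, alpha=0.5):
    """Blend a supervised attack probability with an unsupervised latent-distance
    score. alpha weights the supervised evidence; dist_scale is a hypothetical
    calibration constant mapping distances into (0, 1)."""
    unsupervised = 1.0 - np.exp(-latent_distance / dist_scale)
    return alpha * supervised_prob + (1.0 - alpha) * unsupervised
```

A known attack scores high through the supervised term, while a zero‑day far from the normal cluster still scores high through the distance term even when the classifier is silent.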
Limitations & Future Work
- Dataset bias: NSL‑KDD is an older benchmark; real‑world traffic exhibits higher dimensionality, encrypted payloads, and concept drift, which may affect model robustness.
- Threshold sensitivity: Selecting a static threshold can be brittle in production; adaptive or percentile‑based thresholds need further study.
- Scalability of distance computation: Nearest‑neighbor searches in latent space become costly with massive training sets; approximate methods (e.g., FAISS) or learned density estimators could alleviate this.
- Explainability depth: While distance gives a coarse anomaly score, pinpointing which features triggered the anomaly remains an open challenge.
- Future directions: The authors suggest exploring β‑VAE‑based contrastive learning, integrating temporal dynamics (e.g., recurrent VAEs), and testing on live network streams to assess drift handling.
Authors
- Dylan Baptiste
- Ramla Saddem
- Alexandre Philippot
- François Foyer
Paper Information
- arXiv ID: 2602.19785v1
- Categories: cs.LG, cs.NE, stat.ML
- Published: February 23, 2026