[Paper] Ranking-Enhanced Anomaly Detection Using Active Learning-Assisted Attention Adversarial Dual AutoEncoders
Source: arXiv - 2511.20480v1
Overview
This paper tackles one of the toughest problems in cybersecurity: spotting Advanced Persistent Threats (APTs) that hide in massive streams of system‑level logs. Because labeled attack data are extremely scarce, the authors combine unsupervised auto‑encoders with an active‑learning loop that asks a human (or oracle) to label only the most ambiguous samples. The result is a “ranking‑enhanced” detector that quickly learns to flag the rare APT events while keeping labeling effort to a minimum.
Key Contributions
- Dual AutoEncoder architecture with attention and adversarial training that learns richer representations of provenance traces.
- Active‑learning‑assisted ranking: the model scores unlabeled samples by uncertainty, queries the oracle for the top‑N most ambiguous ones, and re‑trains iteratively.
- Comprehensive evaluation on DARPA Transparent Computing provenance datasets covering Android, Linux, BSD, and Windows, where APT‑like attacks make up only 0.004 % of the data.
- Empirical evidence of superior detection rates compared with state‑of‑the‑art unsupervised and semi‑supervised anomaly detectors.
- A practical workflow that can be plugged into existing security operation centers (SOCs) to reduce manual labeling overhead.
Methodology
- Data Representation – Raw system calls and file‑access events are transformed into provenance graphs (nodes = processes/files, edges = interactions). These graphs are flattened into sequences and fed to the auto‑encoders (a minimal construction sketch appears after this list).
- Dual AutoEncoder – Two parallel auto‑encoders (one for reconstruction, one for adversarial generation) share an attention module that highlights the most informative parts of the input sequence. The reconstruction error serves as an initial anomaly score (see the PyTorch sketch after this list).
- Active Learning Loop (a query‑selection sketch follows this list):
  - Uncertainty Ranking: for each unlabeled trace, the model computes a confidence margin (the gap between the top two predicted class probabilities) and a reconstruction‑error rank.
  - Query Selection: the top‑N most uncertain traces are sent to a human analyst (the “oracle”) for labeling.
  - Model Update: labeled samples are added to the training set; the dual auto‑encoders are fine‑tuned, and the attention weights are re‑calibrated.
  - The cycle repeats until a stopping criterion is met (e.g., the labeling budget is exhausted or performance plateaus).
- Evaluation Metrics – Precision, recall, F1‑score, and the area under the precision–recall curve (AUPR) are reported, focusing on the minority APT class.
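To make the data‑representation step concrete, the following is a minimal sketch of building a provenance graph from raw events and flattening it into a token sequence. The event schema, field names, and the timestamp‑ordered flattening are illustrative assumptions; the paper does not prescribe this exact encoding.

```python
# Minimal sketch (assumed event schema): build a provenance graph from raw
# events, then flatten it into a timestamp-ordered token sequence for the
# auto-encoders. Nodes are processes/files; edges are interactions.
import networkx as nx

events = [  # (timestamp, source, operation, target) -- hypothetical fields
    (1, "bash", "exec", "curl"),
    (2, "curl", "write", "/tmp/payload"),
    (3, "bash", "exec", "/tmp/payload"),
]

g = nx.MultiDiGraph()
for ts, src, op, dst in events:
    g.add_edge(src, dst, op=op, ts=ts)

# One common flattening choice: emit edges in timestamp order as tokens.
sequence = [f"{u}:{d['op']}:{v}"
            for u, v, d in sorted(g.edges(data=True), key=lambda e: e[2]["ts"])]
print(sequence)
# ['bash:exec:curl', 'curl:write:/tmp/payload', 'bash:exec:/tmp/payload']
```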
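The dual‑auto‑encoder bullet above is high level; here is one plausible reading as a minimal PyTorch sketch: two auto‑encoder branches sharing a single attention module, with mean squared reconstruction error as the per‑trace anomaly score. The layer sizes, the single attention head, and the omission of the adversarial discriminator are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): two auto-encoder branches share an
# attention module over flattened provenance sequences; reconstruction error
# of the primary branch is the initial anomaly score.
import torch
import torch.nn as nn

class SharedAttention(nn.Module):
    """Single-head self-attention shared by both branches (assumption)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, x):                    # x: (batch, seq_len, dim)
        out, weights = self.attn(x, x, x)    # weights support heat-map views
        return out, weights

class DualAutoEncoder(nn.Module):
    def __init__(self, dim=64, hidden=32):
        super().__init__()
        self.attention = SharedAttention(dim)
        self.enc_r = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())  # reconstruction branch
        self.dec_r = nn.Linear(hidden, dim)
        self.enc_g = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())  # adversarial branch
        self.dec_g = nn.Linear(hidden, dim)  # its discriminator is omitted here

    def forward(self, x):
        h, attn_weights = self.attention(x)
        return self.dec_r(self.enc_r(h)), self.dec_g(self.enc_g(h)), attn_weights

def anomaly_score(model, x):
    """Mean squared reconstruction error per trace (the initial score)."""
    with torch.no_grad():
        recon, _, _ = model(x)
    return ((x - recon) ** 2).mean(dim=(1, 2))  # one score per sequence

# Usage: score a batch of 8 flattened traces (length 20, 64-dim embeddings).
scores = anomaly_score(DualAutoEncoder(), torch.randn(8, 20, 64))
```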
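The ranking‑and‑query cycle itself is model‑agnostic. Below is a hedged NumPy sketch of one iteration; combining the two uncertainty signals by rank averaging and cutting off at the top‑N are simplifying assumptions, not the paper's exact rule.

```python
# Minimal sketch (assumed combination rule): rank unlabeled traces by
# averaging two ranks -- large reconstruction error and small confidence
# margin -- then query the oracle for the top-N most ambiguous traces.
import numpy as np

def uncertainty_ranking(recon_errors, class_probs):
    """recon_errors: (n,) scores; class_probs: (n, k) predicted probabilities."""
    top2 = np.sort(class_probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]                   # small margin = ambiguous
    err_rank = np.argsort(np.argsort(-recon_errors))   # rank 0 = largest error
    margin_rank = np.argsort(np.argsort(margin))       # rank 0 = smallest margin
    return (err_rank + margin_rank) / 2.0              # lower = more uncertain

def select_queries(recon_errors, class_probs, n_queries=10):
    """Indices of the top-N most uncertain traces to send to the oracle."""
    return np.argsort(uncertainty_ranking(recon_errors, class_probs))[:n_queries]

# One iteration of the loop on synthetic scores:
rng = np.random.default_rng(0)
errors = rng.random(1000)                     # reconstruction errors
probs = rng.dirichlet(np.ones(2), size=1000)  # stand-in class probabilities
to_label = select_queries(errors, probs, n_queries=10)
# labels = oracle(to_label)   # a human analyst labels these traces, then the
# dual auto-encoders are fine-tuned on the enlarged labeled set and the cycle
# repeats until the labeling budget runs out or performance plateaus.
```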
Results & Findings
| Dataset (OS) | Baseline (plain AE) | Proposed Dual AE + AL | Relative Gain |
|---|---|---|---|
| Android | Recall 0.31, AUPR 0.12 | Recall 0.58, AUPR 0.27 | +87 % recall |
| Linux | Recall 0.28, AUPR 0.10 | Recall 0.55, AUPR 0.24 | +96 % recall |
| BSD | Recall 0.33, AUPR 0.13 | Recall 0.60, AUPR 0.29 | +82 % recall |
| Windows | Recall 0.30, AUPR 0.11 | Recall 0.57, AUPR 0.26 | +90 % recall |
- Active learning reduces labeling cost: only ~1 % of the total traces needed to be manually labeled to achieve >50 % recall.
- Attention improves interpretability: heat‑maps over the provenance graph highlight the exact system calls that contributed most to the anomaly score, aiding analyst triage.
- Robustness across OSes: The same hyper‑parameters worked for all four operating systems, demonstrating the method’s generality.
Practical Implications
- SOC Integration – The framework can sit on top of existing log‑ingestion pipelines (e.g., Elastic Stack, Splunk) and continuously propose “high‑uncertainty” alerts for analyst review, sharply cutting the time analysts spend triaging false positives.
- Label‑Efficient Threat Hunting – Teams can bootstrap an APT detection model with just a handful of verified incidents, then let the active‑learning loop expand coverage automatically.
- Cross‑Platform Security – Because the model operates on provenance graphs rather than OS‑specific signatures, it can be deployed in heterogeneous environments (cloud VMs, containers, mobile devices) without retraining from scratch.
- Explainable AI for Audits – The attention heat‑maps provide a visual audit trail that helps satisfy compliance requirements (e.g., GDPR, NIST guidelines) when justifying why a particular activity was flagged.
Limitations & Future Work
- Oracle Dependency – The approach assumes a reliable human analyst to provide correct labels; noisy or delayed feedback could degrade performance.
- Scalability of Graph Construction – Building provenance graphs for high‑throughput environments may become a bottleneck; the authors suggest incremental graph updates as a next step.
- Adversarial Robustness – While an adversarial auto‑encoder is used for representation learning, the paper does not evaluate resistance against deliberately crafted evasion attacks.
- Future Directions – Extending the method to streaming data (online learning), incorporating threat‑intel feeds for richer context, and exploring self‑supervised pre‑training on massive unlabeled logs.
Authors
- Sidahmed Benabderrahmane
- James Cheney
- Talal Rahwan
Paper Information
- arXiv ID: 2511.20480v1
- Categories: cs.LG, cs.AI, cs.CR, cs.NE
- Published: November 25, 2025