[Paper] From One Attack Domain to Another: Contrastive Transfer Learning with Siamese Networks for APT Detection

Published: November 25, 2025 at 12:07 PM EST
4 min read
Source: arXiv (2511.20500v1)

Overview

The paper introduces a novel hybrid framework that blends transfer learning, contrastive learning, and Siamese neural networks to detect Advanced Persistent Threats (APTs) across different attack domains. By tackling the notorious class‑imbalance and feature‑drift problems that plague traditional detectors, the authors demonstrate a more robust, explainable, and portable solution for real‑world cyber‑defense.

Key Contributions

  • Cross‑domain APT detection: A transfer‑learning pipeline that retains detection performance when moving from a known (source) attack environment to an unseen (target) one.
  • Contrastive Siamese encoder: Uses a Siamese architecture with a contrastive loss to align source and target feature spaces, improving anomaly separability.
  • Attention‑based autoencoder for knowledge transfer: Learns compact, domain‑agnostic representations that preserve the most salient behaviors.
  • Explainable feature selection with SHAP: Applies SHapley Additive exPlanations to prune high‑dimensional telemetry to a stable, informative subset, reducing computational overhead.
  • Extensive empirical validation: Experiments on DARPA Transparent Computing (TC) datasets plus synthetic attack scenarios show consistent gains over classical ML and deep baselines.
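To make the SHAP‑based pruning concrete, here is a minimal sketch of the idea: rank features by their mean absolute SHAP value across samples and keep only the top fraction. The input format, feature names, and the keep fraction are illustrative assumptions, not the paper's actual pipeline or thresholds.

```python
# Hedged sketch: prune features by mean absolute SHAP value.
# The telemetry feature names and the keep_fraction are made up for
# illustration; the paper's exact selection procedure is not shown here.

def prune_by_shap(shap_values, feature_names, keep_fraction=0.35):
    """Keep the most influential features by mean |SHAP| across samples.

    shap_values: list of per-sample lists, shape (n_samples, n_features).
    Returns the retained feature names, highest importance first.
    """
    n_features = len(feature_names)
    importance = [
        sum(abs(row[j]) for row in shap_values) / len(shap_values)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: -importance[j])
    n_keep = max(1, int(keep_fraction * n_features))
    return [feature_names[j] for j in ranked[:n_keep]]

# Toy telemetry: 3 samples x 4 features.
shap_vals = [
    [0.9, 0.01, 0.4, 0.02],
    [0.8, 0.02, 0.5, 0.01],
    [0.7, 0.03, 0.6, 0.02],
]
names = ["syscall_rate", "port_entropy", "proc_depth", "pkt_size"]
kept = prune_by_shap(shap_vals, names, keep_fraction=0.5)
```

Dropping the low‑importance dimensions before training is what enables the leaner downstream model the paper reports.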

Methodology

  1. Data preprocessing & SHAP‑driven feature pruning – Raw system‑call and network telemetry are first fed through a SHAP analysis. Features that consistently contribute to detection across multiple runs are kept, while noisy or redundant dimensions are dropped.
  2. Attention‑based autoencoder – The reduced feature set is encoded into a latent vector using an autoencoder equipped with attention heads. The attention mechanism highlights temporal or contextual patterns that are most relevant for APT behavior.
  3. Siamese contrastive learning – Two identical encoders (the “Siamese twins”) process source‑domain and target‑domain samples in parallel. A contrastive loss pushes representations of the same class (benign vs. malicious) together while pulling different‑class pairs apart, effectively aligning the two domains in a shared embedding space.
  4. Anomaly scoring – In the learned embedding, a simple distance‑based score or a lightweight classifier (e.g., a one‑class SVM) flags outliers as potential APTs. Because the embedding is domain‑agnostic, the same scoring model can be reused across environments.
  5. Explainability layer – SHAP values are recomputed on the final detector to provide per‑instance explanations, helping analysts understand why a particular trace was flagged.
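Steps 3 and 4 can be sketched in a few lines. The toy linear encoder below stands in for the paper's attention‑based autoencoder, and the margin value is an arbitrary illustration; only the shape of the computation (shared weights, margin‑based contrastive loss, distance‑to‑centroid scoring) reflects the described method.

```python
import math

def encode(x, weights):
    """Toy shared encoder: one linear projection (both 'twins' use it)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def contrastive_loss(za, zb, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs
    at least `margin` apart in the embedding space."""
    d = euclidean(za, zb)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

def anomaly_score(z, benign_centroid):
    """Step 4: distance to the benign centroid in the shared embedding."""
    return euclidean(z, benign_centroid)

# Toy usage: a benign source-domain sample, a benign target-domain
# sample, and a malicious sample (feature values are invented).
W = [[0.5, 0.1], [0.2, 0.7]]           # shared (Siamese) encoder weights
z_src = encode([1.0, 0.0], W)          # benign, source domain
z_tgt = encode([0.9, 0.1], W)          # benign, target domain
z_mal = encode([0.0, 1.0], W)          # malicious

loss_pos = contrastive_loss(z_src, z_tgt, same_class=True)   # small: aligned
loss_neg = contrastive_loss(z_src, z_mal, same_class=False)  # penalized: inside margin
```

Because both domains pass through the same weights, minimizing this loss is what aligns source and target feature spaces into one embedding, so the same `anomaly_score` threshold can be reused after transfer.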

Results & Findings

  • Detection boost: Across multiple source‑to‑target transfers, the proposed method improves F1‑score by 7–12 % over the best deep baseline and up to 20 % over traditional random‑forest detectors.
  • Reduced dimensionality: SHAP‑based pruning cuts the feature space by ~65 % without sacrificing accuracy, leading to ~30 % faster inference.
  • Robustness to synthetic attacks: When evaluated on artificially generated APT scenarios (e.g., novel command‑and‑control patterns), the contrastive Siamese encoder maintains high separability, indicating resistance to feature drift.
  • Explainability: Analysts reported clearer insight into detection triggers thanks to SHAP visualizations, which highlighted a handful of system calls and network ports consistently associated with malicious activity.

Practical Implications

  • Plug‑and‑play detection modules: Security teams can train the model on historic logs from one environment (e.g., a corporate network) and redeploy it to a new environment (e.g., a cloud tenant) with minimal retraining.
  • Lower operational cost: By shrinking the feature set and using a lightweight downstream classifier, the solution fits comfortably on edge devices or within SIEM pipelines that demand sub‑second latency.
  • Audit‑ready alerts: Integrated SHAP explanations satisfy compliance and forensics requirements, giving SOC analysts actionable context rather than opaque scores.
  • Scalable to heterogeneous data: The attention‑autoencoder can ingest diverse telemetry (process, file, network), making the approach suitable for modern zero‑trust architectures that aggregate multi‑vector logs.

Limitations & Future Work

  • Dependence on quality of SHAP features: If the initial SHAP analysis discards subtle but critical indicators, detection may degrade; the authors suggest adaptive feature selection as a remedy.
  • Synthetic vs. real‑world novelty: While synthetic attacks were used to stress‑test the model, the paper acknowledges that truly novel APT tactics may still cause representation drift.
  • Computational overhead of training: The contrastive Siamese training phase is more resource‑intensive than a single‑stream model, which could be a barrier for organizations without GPU clusters.
  • Future directions: Extending the framework to continual learning (online updates), exploring other explainability techniques (e.g., LIME), and evaluating on additional public APT datasets (e.g., MITRE ATT&CK emulations).

Authors

  • Sidahmed Benabderrahmane
  • Talal Rahwan

Paper Information

  • arXiv ID: 2511.20500v1
  • Categories: cs.LG, cs.AI, cs.CR, cs.NE
  • Published: November 25, 2025