[Paper] Identifying Adversary Tactics and Techniques in Malware Binaries with an LLM Agent

Published: February 5, 2026 at 09:42 PM EST
Source: arXiv - 2602.06325v1

Overview

The paper introduces TTPDetect, presented as the first large‑language‑model (LLM)‑driven agent that automatically identifies tactics, techniques, and procedures (TTPs) in stripped malware binaries. By combining dense code retrieval with on‑the‑fly reasoning, the system bridges the gap between raw binary analysis and actionable threat intelligence, reaching 93.25 % precision on function‑level detection and 87.37 % precision on real‑world samples.

Key Contributions

  • LLM‑based malware TTP agent – First end‑to‑end system that uses an LLM as an “analysis assistant” to map decompiled functions to ATT&CK‑style TTPs.
  • Hybrid retrieval pipeline – Combines traditional dense vector retrieval with LLM‑guided neural retrieval to efficiently locate promising entry‑point functions in massive, symbol‑less binaries.
  • Context Explorer – A function‑level agent that incrementally pulls in surrounding code (call‑graph, data‑flow, control‑flow) only when needed, keeping the LLM prompt size manageable.
  • TTP‑Specific Reasoning Guideline – A set of inference‑time prompts that steer the LLM toward ATT&CK‑aligned decision logic, reducing hallucinations.
  • New labeled dataset – Over 30 k decompiled functions from diverse malware families (Windows, Linux, Android) annotated with ATT&CK TTPs, released for reproducibility.
  • Strong empirical results – 93 %+ precision/recall on function‑level TTP detection and 87 % precision on full‑sample evaluation, outperforming prior static‑analysis baselines by up to 19 %.
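The Context Explorer idea listed above (pull in surrounding code only when the model asks for it) can be illustrated with a minimal sketch. This is not the authors' implementation: the call graph, the function bodies, the `needs_more` callback standing in for the LLM's request, and the character budget are all hypothetical.

```python
from collections import deque

# Hypothetical call graph: function name -> callees (illustrative names only).
CALL_GRAPH = {
    "sub_401000": ["sub_401200", "sub_401300"],
    "sub_401200": ["sub_401400"],
    "sub_401300": [],
    "sub_401400": [],
}

# Hypothetical decompiled bodies (placeholders, not real malware code).
BODIES = {
    "sub_401000": "int sub_401000() { sub_401200(); sub_401300(); }",
    "sub_401200": "int sub_401200() { return sub_401400(); }",
    "sub_401300": "void sub_401300() { /* ... */ }",
    "sub_401400": "int sub_401400() { /* ... */ }",
}


def expand_context(entry, needs_more, budget_chars):
    """Breadth-first expansion from `entry` over callees.

    Stops when `needs_more(context)` returns False (the stand-in for the
    LLM no longer requesting context) or when adding the next function
    would exceed the character budget.
    """
    context = [entry]
    used = len(BODIES[entry])
    queue = deque(CALL_GRAPH.get(entry, []))
    seen = {entry}
    while queue and needs_more(context):
        fn = queue.popleft()
        if fn in seen:
            continue
        seen.add(fn)
        cost = len(BODIES[fn])
        if used + cost > budget_chars:
            break  # stop before the prompt would exceed the budget
        context.append(fn)
        used += cost
        queue.extend(CALL_GRAPH.get(fn, []))
    return context


def wants_three(context):
    """Assumed stand-in for the LLM: keep asking until 3 functions are loaded."""
    return len(context) < 3


selected = expand_context("sub_401000", wants_three, budget_chars=500)
print(selected)
```

Expanding lazily keeps the prompt proportional to what the reasoning step actually needs, rather than to the size of the whole call graph.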

Methodology

  1. Pre‑processing & Decompilation – Raw binaries are stripped of symbols, then decompiled (e.g., using Ghidra/IDA) into a function‑level intermediate representation.
  2. Dense Retrieval – Each function is embedded with a code‑specific encoder (e.g., CodeBERT). A nearest‑neighbor search quickly narrows the candidate set to the top‑k functions that look “malicious.”
  3. Neural Retrieval with LLM – The LLM receives the query (“Find functions that implement credential dumping”) and the top‑k candidates, re‑ranking them based on its internal code understanding.
  4. Context Explorer Agent – For a selected candidate, the agent lazily expands the context: it pulls the caller/callee functions, relevant data structures, and control‑flow snippets only when the LLM asks for more information. This keeps prompts short while still providing the full reasoning picture.
  5. TTP‑Specific Reasoning Guideline – A prompt template encodes ATT&CK definitions, typical code patterns, and decision thresholds. The LLM follows this guideline to output a TTP label (or “none”).
  6. Iterative Refinement – The system repeats steps 3‑5 for each high‑scoring function, aggregating TTPs at the binary level.
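The retrieval and aggregation steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the three‑dimensional vectors are toy stand‑ins for encoder embeddings, the per‑function labels are hard‑coded in place of the LLM's output, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, function_vecs, k):
    """Return the names of the k functions most similar to the query."""
    ranked = sorted(
        function_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def aggregate(labels):
    """Collapse per-function labels into a binary-level TTP list."""
    return sorted({ttp for ttp in labels.values() if ttp != "none"})


# Toy embeddings (assumption: a real system would use a code encoder).
FUNCTION_VECS = {
    "sub_401000": [0.9, 0.1, 0.0],
    "sub_401200": [0.1, 0.9, 0.0],
    "sub_401300": [0.8, 0.2, 0.1],
    "sub_401400": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # toy embedding of the TTP query

candidates = top_k(query, FUNCTION_VECS, k=2)

# Hard-coded stand-in for the LLM's per-function verdicts (step 5).
labels = {"sub_401000": "T1003", "sub_401300": "none"}
binary_ttps = aggregate({name: labels[name] for name in candidates})

print(candidates)
print(binary_ttps)
```

The dense step is cheap and narrows the search; the more expensive LLM reasoning then runs only on the shortlisted candidates.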

Results & Findings

| Metric | Function‑level (test set) | Full‑sample (real malware) |
|---|---|---|
| Precision | 93.25 % | 87.37 % |
| Recall | 93.81 % | - |
| F1 | 93.53 % | - |
| Gain over static‑analysis baseline | +10.38 % precision, +18.78 % recall | - |
| Recovery of expert‑written TTPs | - | 85.7 % |
| New TTPs discovered per sample | - | 10.5 (average) |
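
As a quick consistency check, the reported F1 is the harmonic mean of the reported precision and recall. The helper name `f1_score` below is just an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(93.25, 93.81)
print(round(f1, 2))  # 93.53
```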

Takeaway: TTPDetect recovers 85.7 % of the techniques documented by expert analysts and surfaces, on average, 10.5 additional TTPs per sample that the expert reports do not mention, which points to its usefulness for threat‑intel enrichment.

Practical Implications

  • Automated Threat Intel Generation – Security teams can feed newly captured binaries into TTPDetect and receive ATT&CK‑aligned TTP reports, which could substantially shorten manual reverse‑engineering.
  • Prioritization of Incident Response – By surfacing high‑impact techniques (e.g., credential dumping, lateral movement), analysts can triage alerts more effectively.
  • Integration with SIEM/EDR – The function‑level TTP tags can be exported as structured STIX objects and shared over TAXII, feeding downstream detection rules and behavioral analytics.
  • Malware Family Attribution – Consistent TTP fingerprints across samples help cluster unknown binaries into existing campaigns, aiding attribution and proactive defense.
  • Open‑source Research Catalyst – The released dataset and retrieval pipeline provide a baseline for future work on LLM‑driven binary analysis, encouraging community extensions (e.g., multi‑modal models that ingest raw bytes).
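
The SIEM/EDR integration point above could look roughly like the sketch below, which wraps a detected technique in a STIX 2.1 `attack-pattern` object inside a bundle using only the standard library. The helper names `to_attack_pattern` and `to_bundle` are hypothetical, and nothing here reflects an export format defined by the paper.

```python
import json
import uuid
from datetime import datetime, timezone


def to_attack_pattern(technique_id, name):
    """Build a minimal STIX 2.1 attack-pattern dict for an ATT&CK technique."""
    # STIX timestamps are UTC with a trailing "Z"; trim microseconds to millis.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "attack-pattern",
        "spec_version": "2.1",
        "id": f"attack-pattern--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "external_references": [
            {"source_name": "mitre-attack", "external_id": technique_id}
        ],
    }


def to_bundle(objects):
    """Wrap STIX objects in a bundle for transport."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


bundle = to_bundle([to_attack_pattern("T1003", "OS Credential Dumping")])
print(json.dumps(bundle, indent=2))
```

In practice one would likely reference the identifiers from MITRE's published ATT&CK STIX data rather than minting new UUIDs; the fresh identifiers here only keep the sketch self‑contained.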

Limitations & Future Work

  • Dependence on Decompilation Quality – Stripped binaries with heavy obfuscation may yield inaccurate function boundaries, limiting the agent’s recall.
  • Prompt Length Constraints – Although the Context Explorer mitigates this, extremely large call graphs can still exceed model context windows.
  • LLM Hallucination Risk – Despite the reasoning guideline, occasional false TTP assignments occur, especially for novel or hybrid techniques not seen during training.
  • Platform Coverage – The current evaluation focuses on Windows, Linux, and Android; extending to IoT firmware or macOS binaries remains open.
  • Dynamic Behavior Fusion – Future versions could combine static LLM reasoning with dynamic execution traces (e.g., sandbox logs) for richer TTP inference.

Bottom line: TTPDetect showcases how LLM agents, when paired with smart retrieval and domain‑specific prompting, can transform raw, symbol‑less malware binaries into actionable threat intelligence, an advancement that promises to accelerate defensive workflows across the security industry.

Authors

  • Zhou Xuan
  • Xiangzhe Xu
  • Mingwei Zheng
  • Louis Zheng-Hua Tan
  • Jinyao Guo
  • Tiantai Zhang
  • Le Yu
  • Chengpeng Wang
  • Xiangyu Zhang

Paper Information

  • arXiv ID: 2602.06325v1
  • Categories: cs.CR, cs.SE
  • Published: February 6, 2026