[Paper] StriderSPD: Structure-Guided Joint Representation Learning for Binary Security Patch Detection
Source: arXiv - 2601.05772v1
Overview
The paper introduces StriderSPD, a framework that detects security patches directly from binary executables, without requiring access to source code. By combining structural graph information with a large language model (LLM), the authors achieve more reliable binary‑level vulnerability‑fix identification, a capability that is especially valuable for closed‑source software.
Key Contributions
- Structure‑guided joint representation: Integrates a graph‑based branch (capturing control‑flow and data‑flow structures) with an LLM that processes assembly/pseudo‑code tokens.
- Adapter design for token‑level alignment: Custom adapters map graph embeddings onto the LLM’s token space, enabling seamless fusion of structural and textual cues.
- Two‑stage training strategy: Addresses the imbalance between the massive LLM and the lightweight graph branch, ensuring both components learn effectively.
- Realistic, disjoint benchmark: Constructs a new binary security‑patch detection dataset that is project‑ and domain‑disjoint from existing corpora, providing a more faithful evaluation of closed‑source scenarios.
- Empirical superiority: Demonstrates that StriderSPD outperforms prior binary SPD methods across precision, recall, and F1‑score on the new benchmark.
Methodology
Input preprocessing
- Binary files are disassembled into assembly code and lifted into pseudo‑code (a higher‑level, source‑like representation).
- From these, a program graph is built, encoding control‑flow (CFG) and data‑flow relationships between instructions.
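The preprocessing step above can be illustrated with a minimal sketch. The toy instruction tuples and the fall‑through/branch/def‑use heuristics below are illustrative assumptions, not the paper's actual disassembly pipeline:

```python
# Minimal sketch of program-graph construction from disassembled
# instructions. Instruction format and edge heuristics are illustrative
# assumptions, not StriderSPD's exact pipeline.

def build_program_graph(instructions):
    """instructions: list of (addr, mnemonic, dst_reg, src_regs, branch_target)."""
    nodes = [addr for addr, *_ in instructions]
    cfg_edges, dfg_edges = [], []
    last_def = {}  # register -> address of its most recent definition
    for i, (addr, mnem, dst, srcs, target) in enumerate(instructions):
        # Control flow: fall-through edge to the next instruction.
        if i + 1 < len(instructions) and mnem != "jmp":
            cfg_edges.append((addr, instructions[i + 1][0]))
        # Control flow: explicit branch edge.
        if target is not None:
            cfg_edges.append((addr, target))
        # Data flow: def-use edge from each source register's last definition.
        for reg in srcs:
            if reg in last_def:
                dfg_edges.append((last_def[reg], addr))
        if dst is not None:
            last_def[dst] = addr
    return nodes, cfg_edges, dfg_edges

# Toy basic block: mov eax, 1; add eax, ebx; je 0x10
insns = [
    (0x0, "mov", "eax", [], None),
    (0x2, "add", "eax", ["eax", "ebx"], None),
    (0x4, "je",  None,  ["eax"], 0x10),
]
nodes, cfg, dfg = build_program_graph(insns)
print(cfg)  # fall-through + branch edges
print(dfg)  # def-use edges
```

In a real pipeline, a disassembler (e.g., via a tool like IDA or Ghidra) would supply the instruction stream and branch targets.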
Dual‑branch architecture
- LLM branch: A pretrained large language model (e.g., CodeBERT or similar) consumes the tokenized assembly/pseudo‑code, learning textual semantics.
- Graph branch: A lightweight Graph Neural Network (GNN) processes the program graph, extracting structural embeddings for each instruction node.
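A toy version of the two branches can be sketched as follows; both functions are stand‑ins for the real pretrained LLM and trained GNN, with made‑up dimensions:

```python
import numpy as np

# Illustrative stand-ins for the two branches; the real system uses a
# pretrained code LLM and a trained GNN, not these toy computations.

rng = np.random.default_rng(0)

def llm_branch(token_ids, embed_table):
    """Toy 'LLM' branch: embedding lookup as a stand-in for a
    pretrained transformer over assembly/pseudo-code tokens."""
    return embed_table[token_ids]               # (n_tokens, d_text)

def gnn_branch(node_feats, edges, hops=2):
    """Toy GNN branch: mean-aggregation message passing over the
    program graph, standing in for a trained GNN."""
    h = node_feats.copy()
    for _ in range(hops):
        new_h = h.copy()
        for node in range(len(h)):
            neigh = [s for s, t in edges if t == node]
            if neigh:
                new_h[node] = 0.5 * h[node] + 0.5 * h[neigh].mean(axis=0)
        h = new_h
    return h                                    # (n_nodes, d_graph)

embed_table = rng.normal(size=(100, 16))        # vocab 100, d_text = 16
text_emb = llm_branch(np.array([3, 17, 42]), embed_table)
node_feats = rng.normal(size=(3, 8))            # 3 instruction nodes, d_graph = 8
graph_emb = gnn_branch(node_feats, edges=[(0, 1), (1, 2)])
print(text_emb.shape, graph_emb.shape)
```

Note the dimension mismatch between the two outputs (16 vs. 8); bridging it is exactly the job of the adapters described next.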
Adapter layers
- Small trainable adapters sit between the GNN and the LLM, projecting graph embeddings onto the LLM’s token dimension.
- This alignment lets the model attend to both syntax (tokens) and semantics (graph context) simultaneously.
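The adapter idea reduces to a learned projection into the LLM's token dimension; the single linear layer below is an illustrative assumption (the paper's adapters may be deeper):

```python
import numpy as np

# Sketch of the adapter: project graph embeddings into the LLM's token
# dimension so they can be consumed alongside textual tokens.

rng = np.random.default_rng(0)
d_graph, d_text = 8, 16                        # illustrative dimensions

W = rng.normal(size=(d_graph, d_text)) * 0.1   # trainable adapter weights
b = np.zeros(d_text)

def adapter(graph_emb):
    """Map (n_nodes, d_graph) structural embeddings to (n_nodes, d_text)."""
    return graph_emb @ W + b

graph_emb = rng.normal(size=(3, d_graph))      # 3 instruction nodes
text_emb = rng.normal(size=(5, d_text))        # 5 code tokens
# Fuse: prepend the projected graph "tokens" to the textual token sequence,
# so the LLM can attend over both in one sequence.
fused = np.concatenate([adapter(graph_emb), text_emb], axis=0)
print(fused.shape)
```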
Two‑stage training
- Stage 1: Freeze the LLM, train the graph branch + adapters to learn a good structural mapping.
- Stage 2: Unfreeze the LLM and fine‑tune the whole system jointly, using a balanced loss that prevents the massive LLM from dominating optimization.
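The two‑stage schedule amounts to toggling which components receive gradient updates. The flag‑based sketch below is an assumption about mechanism (analogous to freezing parameters in a deep‑learning framework), not the paper's training code:

```python
# Sketch of the two-stage schedule via trainable flags; optimizer and
# loss-balancing details are illustrative assumptions.

params = {
    "llm":     {"trainable": False},  # Stage 1: LLM frozen
    "gnn":     {"trainable": True},
    "adapter": {"trainable": True},
}

def sgd_step(params, grads, lr=1e-3):
    """Apply updates only to components marked trainable; return their names."""
    updated = []
    for name, p in params.items():
        if p["trainable"] and name in grads:
            updated.append(name)   # placeholder for p -= lr * grads[name]
        # frozen components are skipped entirely
    return updated

# Stage 1: only the graph branch and adapters learn a structural mapping.
stage1 = sgd_step(params, grads={"llm": ..., "gnn": ..., "adapter": ...})

# Stage 2: unfreeze the LLM and fine-tune the whole system jointly.
params["llm"]["trainable"] = True
stage2 = sgd_step(params, grads={"llm": ..., "gnn": ..., "adapter": ...})
print(stage1, stage2)
```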
Detection head
- A classification layer predicts whether a given binary fragment corresponds to a security patch (binary label).
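A plausible form of this head, assuming mean pooling over the fused sequence followed by a logistic layer (the pooling choice and dimensions are assumptions):

```python
import numpy as np

# Sketch of the detection head: mean-pool the fused joint representation
# and apply a logistic classification layer. Untrained, illustrative weights.

rng = np.random.default_rng(0)
d_model = 16

w = rng.normal(size=d_model) * 0.1             # classifier weights
b = 0.0                                        # classifier bias

def predict_patch(fused_seq, threshold=0.5):
    """fused_seq: (seq_len, d_model) joint representation of a binary fragment.
    Returns (probability, is_security_patch)."""
    pooled = fused_seq.mean(axis=0)
    prob = 1.0 / (1.0 + np.exp(-(pooled @ w + b)))  # sigmoid
    return float(prob), bool(prob >= threshold)

prob, label = predict_patch(rng.normal(size=(8, d_model)))
print(round(prob, 3), label)
```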
Results & Findings
| Metric | StriderSPD | Best Prior Binary SPD | Improvement |
|---|---|---|---|
| Precision | 0.87 | 0.78 | +0.09 |
| Recall | 0.84 | 0.71 | +0.13 |
| F1‑Score | 0.85 | 0.74 | +0.11 |
- Ablation studies show that removing the graph branch drops F1 by ~7 points, confirming the value of structural cues.
- The two‑stage training yields a ~4% F1 gain over naïve end‑to‑end training, highlighting the importance of handling parameter imbalance.
- On the disjoint benchmark, StriderSPD maintains high performance, indicating strong generalization across unseen projects and domains.
Practical Implications
- Closed‑source security monitoring: Security teams can automatically flag patched binaries in the wild, even when vendors do not publish changelogs.
- Supply‑chain risk assessment: Integrate StriderSPD into CI/CD pipelines to verify that third‑party binaries have received the latest security fixes.
- Vulnerability management tools: Enhance existing scanners (e.g., CVE trackers) with binary‑level patch detection, reducing reliance on source‑code repositories.
- Incident response: Quickly determine whether a compromised binary has been updated, informing remediation timelines.
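As a sketch of the CI/CD use case above, a pipeline could gate third‑party artifacts behind a binary‑level detector. The `detector` callable interface and the stub below are entirely hypothetical; StriderSPD is a research system, and no such API is described in the paper:

```python
# Hypothetical CI gate: how a binary-level patch detector could gate
# third-party artifacts. `detector` is any callable returning
# (is_patched, confidence) -- this interface is an assumption.

def gate_artifacts(binaries, detector, min_confidence=0.8):
    """Return the binaries that should block the build: those confidently
    detected as lacking the relevant security patch."""
    failures = []
    for path in binaries:
        is_patched, conf = detector(path)
        if not is_patched and conf >= min_confidence:
            failures.append(path)
    return failures

# Stub detector for illustration only: flags anything named '*-old.bin'.
stub = lambda path: (not path.endswith("-old.bin"), 0.9)
failed = gate_artifacts(["libfoo.bin", "libbar-old.bin"], stub)
print(failed)  # artifacts that should block the release
```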
Limitations & Future Work
- Dependency on high‑quality disassembly: Obfuscated or heavily optimized binaries may produce noisy assembly/pseudo‑code, degrading detection accuracy.
- Scalability of graph construction: Large binaries can generate massive graphs; future work could explore hierarchical or sampling‑based GNNs.
- Model size: The LLM component remains heavyweight, which may limit deployment on edge devices; distillation or lightweight alternatives are worth investigating.
- Broader patch semantics: Current work focuses on binary patches that fix vulnerabilities; extending to feature‑level or performance patches is an open direction.
Authors
- Qingyuan Li
- Chenchen Yu
- Chuanyi Li
- Xin-Cheng Wen
- Cheryl Lee
- Cuiyun Gao
- Bin Luo
Paper Information
- arXiv ID: 2601.05772v1
- Categories: cs.SE, cs.CR
- Published: January 9, 2026