[Paper] VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection

Published: February 11, 2026

Source: arXiv - 2602.10787v1

Overview

The paper introduces VulReaD, a novel framework that combines large language models (LLMs) with a security‑focused knowledge graph to move software vulnerability detection (SVD) beyond simple “vulnerable / not vulnerable” decisions. By grounding the model’s reasoning in the Common Weakness Enumeration (CWE) taxonomy, VulReaD delivers explanations that are both human‑readable and semantically aligned with industry‑standard vulnerability categories.

Key Contributions

  • Knowledge‑graph‑guided reasoning: Integrates a security knowledge graph as a semantic backbone, ensuring that model outputs map cleanly onto CWE categories.
  • Contrastive reasoning supervision: Uses a powerful “teacher” LLM to automatically generate CWE‑consistent reasoning examples, eliminating the need for costly manual annotation.
  • Odds Ratio Preference Optimization (ORPO): A fine‑tuning objective that rewards explanations matching the CWE taxonomy while penalizing unsupported or contradictory statements.
  • Significant performance gains: Achieves 8–10 % higher binary F1 and up to 30 % improvement in macro‑F1 for multi‑class CWE classification over state‑of‑the‑art baselines.
  • Improved interpretability: Provides natural‑language rationales that are directly traceable to known vulnerability patterns, aiding security analysts and developers.
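The knowledge-graph-guided reasoning above hinges on representing CWE entries as nodes connected by hierarchical and semantic edges. A minimal, illustrative sketch of such a structure (the class name, API, and chosen CWE links are assumptions for exposition, not the paper's actual graph):

```python
from collections import defaultdict

class SecurityKG:
    """Toy security knowledge graph: CWE IDs as nodes, typed edges."""

    def __init__(self):
        self.labels = {}                # node -> human-readable name
        self.edges = defaultdict(list)  # node -> [(relation, node)]

    def add_node(self, cwe_id, label):
        self.labels[cwe_id] = label

    def add_edge(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def neighbors(self, cwe_id):
        return self.edges.get(cwe_id, [])

kg = SecurityKG()
kg.add_node("CWE-79", "Cross-Site Scripting")
kg.add_node("CWE-89", "SQL Injection")
kg.add_node("CWE-74", "Injection")
# Hierarchical edges: both weaknesses are children of the Injection class.
kg.add_edge("CWE-79", "child_of", "CWE-74")
kg.add_edge("CWE-89", "child_of", "CWE-74")

print(kg.neighbors("CWE-79"))  # [('child_of', 'CWE-74')]
```

A graph like this lets a generated rationale be validated mechanically: any CWE mentioned in an explanation can be checked for existence and for consistency with the predicted label's neighborhood.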

Methodology

  1. Security Knowledge Graph Construction – The authors curate a graph where nodes represent CWE IDs, related code patterns, and security concepts, while edges capture hierarchical and semantic relationships (e.g., “CWE‑79 → Cross‑Site Scripting”).
  2. Teacher‑Student LLM Pipeline
    • A large, pre‑trained LLM (the “teacher”) is prompted with source‑code snippets and asked to produce a CWE label and a concise reasoning chain that references graph concepts.
    • These teacher‑generated pairs serve as contrastive supervision: each snippet has a correct CWE‑aligned explanation and a set of deliberately mismatched (negative) explanations.
  3. Student Model Fine‑tuning with ORPO – The smaller “student” model is trained to maximize the odds ratio between correct and incorrect explanations, effectively learning to prefer taxonomy‑consistent reasoning while suppressing spurious claims.
  4. Inference – At test time, the student model predicts both a CWE label and a natural‑language justification, which can be cross‑checked against the knowledge graph for validation.
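The ORPO objective in step 3 can be sketched numerically. Assuming sequence-level average log-probabilities for a chosen (CWE-consistent) explanation and a rejected (mismatched) one, a toy scalar version of the loss looks like this; the weighting term `lam` and the exact parameterization are illustrative assumptions, and the paper's actual training setup may differ:

```python
import math

def log_odds(avg_logp):
    # odds(y) = p / (1 - p), from the average token log-probability
    p = math.exp(avg_logp)
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(logp_chosen, logp_rejected, lam=0.1):
    # SFT term: negative log-likelihood of the CWE-consistent explanation
    nll = -logp_chosen
    # Odds-ratio term: reward preferring the chosen over the rejected one
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    sigmoid = 1.0 / (1.0 + math.exp(-ratio))
    return nll + lam * (-math.log(sigmoid))

# A CWE-aligned rationale (higher log-prob) vs. a mismatched one:
print(orpo_loss(logp_chosen=-0.5, logp_rejected=-2.0))
```

Because the penalty shrinks as the gap between chosen and rejected log-odds widens, the student is pushed toward taxonomy-consistent explanations without a separate reward model.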

Results & Findings

| Dataset | Binary F1 ↑ | Macro‑F1 (CWE) ↑ | Micro‑F1 (CWE) ↑ |
| --- | --- | --- | --- |
| Dataset A | +9 % vs. best DL baseline | +28 % vs. prior LLM | +17 % |
| Dataset B | +8 % | +30 % | +18 % |
| Dataset C | +10 % | +32 % | +19 % |
  • LLM superiority: Even without the KG, the teacher LLM outperformed traditional deep‑learning detectors on binary vulnerability detection.
  • KG impact: Incorporating the knowledge graph closed the gap between raw LLM predictions and CWE‑aligned reasoning, boosting multi‑class metrics dramatically.
  • Interpretability boost: Human evaluators rated the generated explanations as “highly consistent” with CWE definitions in 86 % of cases, compared to 42 % for baseline LLM outputs.

Practical Implications

  • Developer tooling: IDE plugins could surface CWE‑specific warnings together with concise, graph‑backed explanations, helping developers fix issues before code review.
  • Automated code review pipelines: CI/CD systems can ingest VulReaD’s predictions to enforce policy checks (e.g., “no CWE‑89 SQL injection”) with audit‑ready rationales.
  • Security training: The natural‑language reasoning aligns with educational material, making it a useful teaching aid for junior engineers learning secure coding practices.
  • Incident response: When a vulnerability is flagged in production, the accompanying CWE‑aligned rationale can accelerate root‑cause analysis and patch prioritization.
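The CI/CD policy-check idea above can be sketched as a simple gate over VulReaD-style predictions. The prediction schema, field names, and policy table here are hypothetical assumptions for illustration, not an interface defined by the paper:

```python
# CWE IDs an organization's policy forbids, with audit-ready reasons.
BLOCKED_CWES = {"CWE-89": "SQL injection is not permitted"}

def policy_check(predictions):
    """Each prediction: {'file': str, 'cwe': str, 'rationale': str}."""
    violations = []
    for p in predictions:
        if p["cwe"] in BLOCKED_CWES:
            violations.append(
                f"{p['file']}: {p['cwe']} ({BLOCKED_CWES[p['cwe']]})\n"
                f"  rationale: {p['rationale']}"
            )
    return violations

preds = [
    {"file": "db.py", "cwe": "CWE-89",
     "rationale": "User input concatenated into SQL query string."},
    {"file": "auth.py", "cwe": "CWE-79",
     "rationale": "Unescaped user input rendered in HTML response."},
]
for v in policy_check(preds):
    print(v)  # only the CWE-89 finding violates this policy
```

The natural-language rationale attached to each finding is what makes the gate audit-ready: a reviewer can see why the flagged code was classified under the blocked CWE.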

Limitations & Future Work

  • Knowledge graph completeness: The KG currently covers a subset of CWE entries; rare or emerging weaknesses may be missed, limiting coverage.
  • Scalability of teacher generation: While automated, generating high‑quality contrastive explanations for massive codebases can be compute‑intensive.
  • Domain transfer: The study focuses on C/C++ and Java code; extending to scripting languages (e.g., JavaScript, Python) may require additional graph enrichment.
  • Future directions: The authors suggest enriching the KG with dynamic threat‑intel feeds, exploring few‑shot adaptation for new CWE categories, and integrating static analysis signals to further tighten reasoning accuracy.

Authors

  • Samal Mukhtar
  • Yinghua Yao
  • Zhu Sun
  • Mustafa Mustafa
  • Yew Soon Ong
  • Youcheng Sun

Paper Information

  • arXiv ID: 2602.10787v1
  • Categories: cs.SE, cs.AI, cs.CR, cs.IR
  • Published: February 11, 2026