[Paper] VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection

Published: February 11, 2026

Source: arXiv - 2602.10787v1

Overview

The paper introduces VulReaD, a novel framework that combines large language models (LLMs) with a security‑focused knowledge graph to move software vulnerability detection (SVD) beyond simple “vulnerable / not vulnerable” decisions. By grounding the model’s reasoning in the Common Weakness Enumeration (CWE) taxonomy, VulReaD delivers explanations that are both human‑readable and semantically aligned with industry‑standard vulnerability categories.

Key Contributions

  • Knowledge‑graph‑guided reasoning: Integrates a security knowledge graph as a semantic backbone, ensuring that model outputs map cleanly onto CWE categories.
  • Contrastive reasoning supervision: Uses a powerful “teacher” LLM to automatically generate CWE‑consistent reasoning examples, eliminating the need for costly manual annotation.
  • Odds Ratio Preference Optimization (ORPO): A fine‑tuning objective that rewards explanations matching the CWE taxonomy while penalizing unsupported or contradictory statements.
  • Significant performance gains: Achieves 8–10 % higher binary F1 and up to 30 % improvement in macro‑F1 for multi‑class CWE classification over state‑of‑the‑art baselines.
  • Improved interpretability: Provides natural‑language rationales that are directly traceable to known vulnerability patterns, aiding security analysts and developers.
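The knowledge-graph-guided reasoning above hinges on representing CWE entries as nodes connected by hierarchical and semantic edges. A minimal, illustrative sketch of such a structure (the class name, API, and chosen CWE links are assumptions for exposition, not the paper's actual graph):

```python
from collections import defaultdict

class SecurityKG:
    """Toy security knowledge graph: CWE IDs as nodes, typed edges."""

    def __init__(self):
        self.labels = {}                # node -> human-readable name
        self.edges = defaultdict(list)  # node -> [(relation, node)]

    def add_node(self, cwe_id, label):
        self.labels[cwe_id] = label

    def add_edge(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def neighbors(self, cwe_id):
        return self.edges.get(cwe_id, [])

kg = SecurityKG()
kg.add_node("CWE-79", "Cross-Site Scripting")
kg.add_node("CWE-89", "SQL Injection")
kg.add_node("CWE-74", "Injection")
# Hierarchical edges: both weaknesses are children of the Injection class.
kg.add_edge("CWE-79", "child_of", "CWE-74")
kg.add_edge("CWE-89", "child_of", "CWE-74")

print(kg.neighbors("CWE-79"))  # [('child_of', 'CWE-74')]
```

A graph like this lets a generated rationale be validated mechanically: any CWE mentioned in an explanation can be checked for existence and for consistency with the predicted label's neighborhood.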

Methodology

  1. Security Knowledge Graph Construction – The authors curate a graph where nodes represent CWE IDs, related code patterns, and security concepts, while edges capture hierarchical and semantic relationships (e.g., “CWE‑79 → Cross‑Site Scripting”).
  2. Teacher‑Student LLM Pipeline
    • A large, pre‑trained LLM (the “teacher”) is prompted with source‑code snippets and asked to produce a CWE label and a concise reasoning chain that references graph concepts.
    • These teacher‑generated pairs serve as contrastive supervision: each snippet has a correct CWE‑aligned explanation and a set of deliberately mismatched (negative) explanations.
  3. Student Model Fine‑tuning with ORPO – The smaller “student” model is trained to maximize the odds ratio between correct and incorrect explanations, effectively learning to prefer taxonomy‑consistent reasoning while suppressing spurious claims.
  4. Inference – At test time, the student model predicts both a CWE label and a natural‑language justification, which can be cross‑checked against the knowledge graph for validation.
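The ORPO objective in step 3 can be sketched numerically. Assuming sequence-level average log-probabilities for a chosen (CWE-consistent) explanation and a rejected (mismatched) one, a toy scalar version of the loss looks like this; the weighting term `lam` and the exact parameterization are illustrative assumptions, and the paper's actual training setup may differ:

```python
import math

def log_odds(avg_logp):
    # odds(y) = p / (1 - p), from the average token log-probability
    p = math.exp(avg_logp)
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(logp_chosen, logp_rejected, lam=0.1):
    # SFT term: negative log-likelihood of the CWE-consistent explanation
    nll = -logp_chosen
    # Odds-ratio term: reward preferring the chosen over the rejected one
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    sigmoid = 1.0 / (1.0 + math.exp(-ratio))
    return nll + lam * (-math.log(sigmoid))

# A CWE-aligned rationale (higher log-prob) vs. a mismatched one:
print(orpo_loss(logp_chosen=-0.5, logp_rejected=-2.0))
```

Because the penalty shrinks as the gap between chosen and rejected log-odds widens, the student is pushed toward taxonomy-consistent explanations without a separate reward model.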

Results & Findings

| Dataset | Binary F1 ↑ | Macro‑F1 (CWE) ↑ | Micro‑F1 (CWE) ↑ |
| --- | --- | --- | --- |
| Dataset A | +9 % vs. best DL baseline | +28 % vs. prior LLM | +17 % |
| Dataset B | +8 % | +30 % | +18 % |
| Dataset C | +10 % | +32 % | +19 % |
  • LLM superiority: Even without the KG, the teacher LLM outperformed traditional deep‑learning detectors on binary vulnerability detection.
  • KG impact: Incorporating the knowledge graph closed the gap between raw LLM predictions and CWE‑aligned reasoning, boosting multi‑class metrics dramatically.
  • Interpretability boost: Human evaluators rated the generated explanations as “highly consistent” with CWE definitions in 86 % of cases, compared to 42 % for baseline LLM outputs.

Practical Implications

  • Developer tooling: IDE plugins could surface CWE‑specific warnings together with concise, graph‑backed explanations, helping developers fix issues before code review.
  • Automated code review pipelines: CI/CD systems can ingest VulReaD’s predictions to enforce policy checks (e.g., “no CWE‑89 SQL injection”) with audit‑ready rationales.
  • Security training: The natural‑language reasoning aligns with educational material, making it a useful teaching aid for junior engineers learning secure coding practices.
  • Incident response: When a vulnerability is flagged in production, the accompanying CWE‑aligned rationale can accelerate root‑cause analysis and patch prioritization.
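The CI/CD policy-check idea above can be sketched as a simple gate over VulReaD-style predictions. The prediction schema, field names, and policy table here are hypothetical assumptions for illustration, not an interface defined by the paper:

```python
# CWE IDs an organization's policy forbids, with audit-ready reasons.
BLOCKED_CWES = {"CWE-89": "SQL injection is not permitted"}

def policy_check(predictions):
    """Each prediction: {'file': str, 'cwe': str, 'rationale': str}."""
    violations = []
    for p in predictions:
        if p["cwe"] in BLOCKED_CWES:
            violations.append(
                f"{p['file']}: {p['cwe']} ({BLOCKED_CWES[p['cwe']]})\n"
                f"  rationale: {p['rationale']}"
            )
    return violations

preds = [
    {"file": "db.py", "cwe": "CWE-89",
     "rationale": "User input concatenated into SQL query string."},
    {"file": "auth.py", "cwe": "CWE-79",
     "rationale": "Unescaped user input rendered in HTML response."},
]
for v in policy_check(preds):
    print(v)  # only the CWE-89 finding violates this policy
```

The natural-language rationale attached to each finding is what makes the gate audit-ready: a reviewer can see why the flagged code was classified under the blocked CWE.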

Limitations & Future Work

  • Knowledge graph completeness: The KG currently covers a subset of CWE entries; rare or emerging weaknesses may be missed, limiting coverage.
  • Scalability of teacher generation: While automated, generating high‑quality contrastive explanations for massive codebases can be compute‑intensive.
  • Domain transfer: The study focuses on C/C++ and Java code; extending to scripting languages (e.g., JavaScript, Python) may require additional graph enrichment.
  • Future directions: The authors suggest enriching the KG with dynamic threat‑intel feeds, exploring few‑shot adaptation for new CWE categories, and integrating static analysis signals to further tighten reasoning accuracy.

Authors

  • Samal Mukhtar
  • Yinghua Yao
  • Zhu Sun
  • Mustafa Mustafa
  • Yew Soon Ong
  • Youcheng Sun

Paper Information

  • arXiv ID: 2602.10787v1
  • Categories: cs.SE, cs.AI, cs.CR, cs.IR
  • Published: February 11, 2026