[Paper] Multi-Agent Taint Specification Extraction for Vulnerability Detection

Published: January 15, 2026 at 04:31 PM EST
4 min read
Source: arXiv - 2601.10865v1

Overview

The paper introduces SemTaint, a hybrid system that blends large‑language‑model (LLM) semantic reasoning with classic static analysis to automatically generate taint‑flow specifications for JavaScript packages. By doing so, it overcomes two long‑standing hurdles in JavaScript SAST—dynamic language features and the massive, ever‑changing npm ecosystem—enabling more accurate vulnerability detection at scale.

Key Contributions

  • Multi‑agent architecture that coordinates a traditional static analyzer with an LLM to resolve ambiguous call edges and infer taint sources/sinks.
  • Automated extraction of per‑package taint specifications (sources, sinks, call edges, and library flow summaries) without manual rule writing.
  • Integration with CodeQL, demonstrating that the generated specs boost detection of real‑world bugs (106 of 162 previously missed vulnerabilities).
  • Discovery of four novel security flaws in popular npm libraries, proving the practical security value of the approach.
  • Empirical evaluation on a large corpus of npm packages, showing that LLM‑augmented analysis can scale to the size and dynamism of the JavaScript ecosystem.
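To make the second contribution concrete, a per-package taint specification might look something like the sketch below. This is a hypothetical illustration only: the package name, field names, and shape are invented here, and the paper's actual specification format may differ.

```javascript
// Hypothetical taint spec for an imaginary npm package "tpl-render",
// targeting CWE-79 (XSS). All identifiers are illustrative.
const taintSpec = {
  package: "tpl-render",
  cwe: "CWE-79",
  sources: [
    // Entry points where attacker-controlled data can enter.
    { function: "parseQuery", returns: "tainted" },
  ],
  sinks: [
    // Dangerous operations that must not receive tainted data
    // (taintedArg is the index of the sensitive parameter).
    { function: "renderHtml", taintedArg: 0 },
  ],
  // Call edges the LLM resolved that static analysis left ambiguous.
  resolvedEdges: [
    { from: "render", to: "renderHtml", via: "dynamic property access" },
  ],
  // Flow summaries: how taint propagates through library functions.
  flowSummaries: [
    { function: "escapeOrPass", from: "arg0", to: "return" },
  ],
};

module.exports = { taintSpec };
```

A downstream SAST engine such as CodeQL would consume entries like these instead of relying on hand-written models for each library.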

Methodology

  1. Static Call‑Graph Construction – A conventional JavaScript static analyzer builds a conservative call graph for a target package and its dependencies.
  2. LLM‑Driven Edge Resolution – For call sites that remain unresolved due to dynamic features (e.g., eval, dynamic property access, higher‑order functions), a prompt‑engineered LLM (e.g., GPT‑4) is queried to predict the most likely target function(s) based on code context and documentation.
  3. Source/Sink Classification – The LLM is also asked to label functions/objects as sources (where attacker‑controlled data can enter) or sinks (where dangerous actions occur) for a given CWE (e.g., XSS, SQLi).
  4. Specification Synthesis – The resolved edges and labeled sources/sinks are compiled into a taint specification that describes how data may flow through the library.
  5. SAST Execution – The specification is fed into CodeQL, which then performs a full taint‑analysis pass across the codebase, flagging potential vulnerabilities.
  6. Feedback Loop – Detected false positives/negatives are used to refine prompts and improve LLM accuracy in subsequent runs.
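The kind of call site that step 2 targets can be illustrated with a minimal, self-contained example (not taken from the paper; the stand-in DOM node and handler names are invented):

```javascript
// The callee is selected via a dynamic property access, so a conservative
// static call graph leaves the call site unresolved.
const container = { innerHTML: "", textContent: "" }; // stand-in DOM node

const handlers = {
  html: (s) => { container.innerHTML = s; },   // sink: unescaped HTML (XSS risk)
  text: (s) => { container.textContent = s; }, // safe: text-only assignment
};

function render(mode, userInput) {
  // handlers[mode] is resolved only at runtime; a purely static analyzer
  // cannot tell which handler runs. The LLM agent, reading the surrounding
  // context, can predict both candidate callees and label the `html` path
  // as a taint sink for CWE-79.
  handlers[mode](userInput);
}

render("text", "hello");
```

The resolved edges (render → html, render → text) and the sink label on the `html` handler would then be emitted into the package's taint specification for CodeQL to consume in step 5.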

Results & Findings

  • Detection boost: SemTaint enabled CodeQL to uncover 106 of the 162 known vulnerabilities that CodeQL alone missed, recovering roughly 65 % of the misses.
  • New bugs: The system identified four previously unknown security issues in widely used npm packages, all of which were responsibly disclosed and patched.
  • Coverage: Across a benchmark of 500 npm packages, the LLM resolved ≈ 78 % of previously ambiguous call edges, dramatically reducing the “unknown” portion of the call graph.
  • Performance: Adding the LLM step increased analysis time by an average factor of 1.8, still within acceptable limits for CI/CD pipelines when run on modest cloud instances.

Practical Implications

  • Developer tooling: SemTaint can be packaged as a plug‑in for existing SAST platforms (e.g., CodeQL, SonarQube), giving teams immediate, higher‑quality taint specs without hand‑crafting rules.
  • CI/CD integration: The modest runtime overhead makes it feasible to run on every pull request for high‑risk JavaScript projects, catching bugs earlier.
  • Supply‑chain security: By automatically generating specs for third‑party dependencies, organizations can audit the entire npm dependency tree rather than trusting vendor‑provided docs.
  • Rapid adaptation: As new libraries appear, the LLM can infer specs on‑the‑fly, keeping security coverage up‑to‑date without a dedicated rule‑authoring effort.
  • Cross‑language potential: The same multi‑agent pattern could be applied to other dynamic ecosystems (Python, Ruby), extending its impact beyond JavaScript.

Limitations & Future Work

  • LLM reliability: The approach depends on the LLM’s correctness; occasional mis‑classifications can introduce false positives or miss true taint flows.
  • Prompt engineering overhead: Crafting effective prompts for diverse codebases remains a manual step that could benefit from automation.
  • Scalability to massive monorepos: While feasible for typical npm packages, extremely large codebases may incur higher latency due to the number of LLM queries.
  • Security of the LLM itself: Using a cloud‑hosted model raises concerns about data leakage; future work could explore on‑premise or distilled models.
  • Broader evaluation: Extending experiments to other SAST tools, additional CWE categories, and non‑JavaScript languages would validate generality.

SemTaint demonstrates that marrying symbolic static analysis with the semantic intuition of modern LLMs isn’t just a research curiosity—it’s a concrete step toward more robust, automated security for today’s fast‑moving JavaScript ecosystem.

Authors

  • Jonah Ghebremichael
  • Saastha Vasan
  • Saad Ullah
  • Greg Tystahl
  • David Adei
  • Christopher Kruegel
  • Giovanni Vigna
  • William Enck
  • Alexandros Kapravelos

Paper Information

  • arXiv ID: 2601.10865v1
  • Categories: cs.CR, cs.SE
  • Published: January 15, 2026