[Paper] Multi-Agent Taint Specification Extraction for Vulnerability Detection

Published: January 15, 2026 at 04:31 PM EST
4 min read
Source: arXiv - 2601.10865v1

Overview

The paper introduces SemTaint, a hybrid system that blends large‑language‑model (LLM) semantic reasoning with classic static analysis to automatically generate taint‑flow specifications for JavaScript packages. By doing so, it overcomes two long‑standing hurdles in JavaScript SAST—dynamic language features and the massive, ever‑changing npm ecosystem—enabling more accurate vulnerability detection at scale.

Key Contributions

  • Multi‑agent architecture that coordinates a traditional static analyzer with an LLM to resolve ambiguous call edges and infer taint sources/sinks.
  • Automated extraction of per‑package taint specifications (sources, sinks, call edges, and library flow summaries) without manual rule writing.
  • Integration with CodeQL, demonstrating that the generated specs boost detection of real‑world bugs (106 of 162 previously missed vulnerabilities).
  • Discovery of four novel security flaws in popular npm libraries, proving the practical security value of the approach.
  • Empirical evaluation on a large corpus of npm packages, showing that LLM‑augmented analysis can scale to the size and dynamism of the JavaScript ecosystem.
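To make the second contribution concrete, a per-package taint specification might look something like the sketch below. This is a hypothetical illustration only: the package name, field names, and shape are invented here, and the paper's actual specification format may differ.

```javascript
// Hypothetical taint spec for an imaginary npm package "tpl-render",
// targeting CWE-79 (XSS). All identifiers are illustrative.
const taintSpec = {
  package: "tpl-render",
  cwe: "CWE-79",
  sources: [
    // Entry points where attacker-controlled data can enter.
    { function: "parseQuery", returns: "tainted" },
  ],
  sinks: [
    // Dangerous operations that must not receive tainted data
    // (taintedArg is the index of the sensitive parameter).
    { function: "renderHtml", taintedArg: 0 },
  ],
  // Call edges the LLM resolved that static analysis left ambiguous.
  resolvedEdges: [
    { from: "render", to: "renderHtml", via: "dynamic property access" },
  ],
  // Flow summaries: how taint propagates through library functions.
  flowSummaries: [
    { function: "escapeOrPass", from: "arg0", to: "return" },
  ],
};

module.exports = { taintSpec };
```

A downstream SAST engine such as CodeQL would consume entries like these instead of relying on hand-written models for each library.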

Methodology

  1. Static Call‑Graph Construction – A conventional JavaScript static analyzer builds a conservative call graph for a target package and its dependencies.
  2. LLM‑Driven Edge Resolution – For call sites that remain unresolved due to dynamic features (e.g., eval, dynamic property access, higher‑order functions), a prompt‑engineered LLM (e.g., GPT‑4) is queried to predict the most likely target function(s) based on code context and documentation.
  3. Source/Sink Classification – The LLM is also asked to label functions/objects as sources (where attacker‑controlled data can enter) or sinks (where dangerous actions occur) for a given CWE (e.g., XSS, SQLi).
  4. Specification Synthesis – The resolved edges and labeled sources/sinks are compiled into a taint specification that describes how data may flow through the library.
  5. SAST Execution – The specification is fed into CodeQL, which then performs a full taint‑analysis pass across the codebase, flagging potential vulnerabilities.
  6. Feedback Loop – Detected false positives/negatives are used to refine prompts and improve LLM accuracy in subsequent runs.
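The kind of call site that step 2 targets can be illustrated with a minimal, self-contained example (not taken from the paper; the stand-in DOM node and handler names are invented):

```javascript
// The callee is selected via a dynamic property access, so a conservative
// static call graph leaves the call site unresolved.
const container = { innerHTML: "", textContent: "" }; // stand-in DOM node

const handlers = {
  html: (s) => { container.innerHTML = s; },   // sink: unescaped HTML (XSS risk)
  text: (s) => { container.textContent = s; }, // safe: text-only assignment
};

function render(mode, userInput) {
  // handlers[mode] is resolved only at runtime; a purely static analyzer
  // cannot tell which handler runs. The LLM agent, reading the surrounding
  // context, can predict both candidate callees and label the `html` path
  // as a taint sink for CWE-79.
  handlers[mode](userInput);
}

render("text", "hello");
```

The resolved edges (render → html, render → text) and the sink label on the `html` handler would then be emitted into the package's taint specification for CodeQL to consume in step 5.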

Results & Findings

  • Detection boost: SemTaint enabled CodeQL to uncover 106 of the 162 known vulnerabilities that CodeQL alone missed, recovering roughly 65 % of the misses.
  • New bugs: The system identified four previously unknown security issues in widely used npm packages, all of which were responsibly disclosed and patched.
  • Coverage: Across a benchmark of 500 npm packages, the LLM resolved ≈ 78 % of previously ambiguous call edges, dramatically reducing the “unknown” portion of the call graph.
  • Performance: Adding the LLM step increased analysis time by an average factor of 1.8, still within acceptable limits for CI/CD pipelines when run on modest cloud instances.

Practical Implications

  • Developer tooling: SemTaint can be packaged as a plug‑in for existing SAST platforms (e.g., CodeQL, SonarQube), giving teams immediate, higher‑quality taint specs without hand‑crafting rules.
  • CI/CD integration: The modest runtime overhead makes it feasible to run on every pull request for high‑risk JavaScript projects, catching bugs earlier.
  • Supply‑chain security: By automatically generating specs for third‑party dependencies, organizations can audit the entire npm dependency tree rather than trusting vendor‑provided docs.
  • Rapid adaptation: As new libraries appear, the LLM can infer specs on‑the‑fly, keeping security coverage up‑to‑date without a dedicated rule‑authoring effort.
  • Cross‑language potential: The same multi‑agent pattern could be applied to other dynamic ecosystems (Python, Ruby), extending its impact beyond JavaScript.

Limitations & Future Work

  • LLM reliability: The approach depends on the LLM’s correctness; occasional mis‑classifications can introduce false positives or miss true taint flows.
  • Prompt engineering overhead: Crafting effective prompts for diverse codebases remains a manual step that could benefit from automation.
  • Scalability to massive monorepos: While feasible for typical npm packages, extremely large codebases may incur higher latency due to the number of LLM queries.
  • Security of the LLM itself: Using a cloud‑hosted model raises concerns about data leakage; future work could explore on‑premise or distilled models.
  • Broader evaluation: Extending experiments to other SAST tools, additional CWE categories, and non‑JavaScript languages would validate generality.

SemTaint demonstrates that marrying symbolic static analysis with the semantic intuition of modern LLMs isn’t just a research curiosity—it’s a concrete step toward more robust, automated security for today’s fast‑moving JavaScript ecosystem.

Authors

  • Jonah Ghebremichael
  • Saastha Vasan
  • Saad Ullah
  • Greg Tystahl
  • David Adei
  • Christopher Kruegel
  • Giovanni Vigna
  • William Enck
  • Alexandros Kapravelos

Paper Information

  • arXiv ID: 2601.10865v1
  • Categories: cs.CR, cs.SE
  • Published: January 15, 2026