[Paper] A Practical Solution to Systematically Monitor Inconsistencies in SBOM-based Vulnerability Scanners

Published: December 19, 2025 at 10:42 AM EST
4 min read

Source: arXiv - 2512.17710v1

Overview

The paper tackles a growing pain point in software supply‑chain security: SBOM‑based vulnerability scanners (SVS) often give inconsistent or silently incorrect results, leading to missed vulnerabilities. The authors introduce SVS‑TEST, a systematic method and open‑source tool that evaluates how well these scanners handle real‑world SBOMs, exposing reliability gaps that could otherwise give developers a false sense of safety.

Key Contributions

  • SVS‑TEST framework – a reproducible methodology and tooling suite for benchmarking the capability, maturity, and failure modes of SVS tools.
  • Comprehensive case study – evaluation of seven popular SVS tools using 16 carefully crafted SBOMs with known ground‑truth vulnerability data.
  • Empirical findings – detailed quantification of false negatives, silent failures, and error‑handling differences across tools.
  • Public artifacts – all SBOMs, ground‑truth data, and the SVS‑TEST code are released openly, enabling the community to replicate and extend the study.
  • Actionable guidance – concrete recommendations for both SVS‑tool developers and organizations that rely on these scanners.

Methodology

  1. SBOM Corpus Construction – The authors generated 16 SBOMs that vary in size, format (CycloneDX, SPDX), and complexity (e.g., nested dependencies, optional fields). Each SBOM is paired with a ground‑truth list of known vulnerable components.
  2. Tool Selection – Seven widely‑used SVS tools (both open‑source and commercial) were chosen to represent the current market.
  3. SVS‑TEST Execution – For each tool, the framework runs the scanner against every SBOM, captures the output, and automatically compares it to the ground‑truth list. It also records any error messages, exit codes, and execution logs.
  4. Metrics & Scoring – The authors define three primary metrics (a toy computation of the first two is sketched right after this list):
    • Recall – the fraction of ground‑truth vulnerabilities the tool actually reports.
    • Silent‑Failure Rate – how often the tool returns an empty result for a valid SBOM without signalling any error.
    • Error‑Handling Robustness – whether failures are accompanied by meaningful diagnostics.
  5. Analysis – Results are aggregated, visualized, and statistically examined to highlight systematic differences among tools.
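
To make the scoring concrete, here is a toy sketch of how recall and silent failures could be computed per scanner/SBOM pair. The function and variable names are illustrative, not taken from the SVS‑TEST codebase:

```python
# Illustrative scoring sketch (hypothetical names; not the paper's code).
# `reported` is the set of vulnerability IDs a scanner emitted for one SBOM;
# `ground_truth` is the known-vulnerable set paired with that SBOM.

def recall(reported: set[str], ground_truth: set[str]) -> float:
    """Fraction of ground-truth vulnerabilities the scanner found."""
    if not ground_truth:
        return 1.0
    return len(reported & ground_truth) / len(ground_truth)

def is_silent_failure(reported: set[str], ground_truth: set[str],
                      exit_code: int) -> bool:
    """Empty result on a valid SBOM that should have findings,
    with no error signalled via the exit code."""
    return not reported and bool(ground_truth) and exit_code == 0
```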

The entire pipeline is scripted in Python and containerized, making it easy for anyone to plug in additional scanners or SBOMs.
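
As a rough illustration of what plugging in a scanner involves, the sketch below runs a containerized scanner against a single SBOM and captures the signals the study analyzes (findings, exit code, stderr). The Docker image name and CLI flags are placeholders; every real scanner's interface differs:

```python
# Minimal harness sketch. The image name, mount path, and CLI flags are
# placeholders, not any particular scanner's actual interface.
import json
import subprocess

def run_scanner(image: str, sbom_path: str) -> dict:
    proc = subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{sbom_path}:/sbom.json:ro",
         image, "/sbom.json", "--output", "json"],
        capture_output=True, text=True, timeout=300,
    )
    findings = []
    try:
        findings = json.loads(proc.stdout).get("matches", [])  # result key varies by tool
    except (json.JSONDecodeError, AttributeError):
        pass  # malformed or empty output is itself a signal worth recording
    return {"exit_code": proc.returncode,
            "stderr": proc.stderr,
            "findings": findings}
```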

Results & Findings

Metric                      Best Performing Tool    Worst Performing Tool
Recall (average)            92%                     68%
Silent‑Failure Rate         0%                      38%
Meaningful Error Messages   95% of runs             42% of runs
  • False negatives are common: Even the best‑performing tool missed roughly 8% of known vulnerable components.
  • Silent failures are alarming: Three tools returned an empty vulnerability list for up to 38% of valid SBOMs, offering no warning that the scan had failed.
  • Format sensitivity: Some scanners handled only CycloneDX correctly, while others broke on SPDX or on SBOMs that included optional metadata (a minimal format probe is sketched after this list).
  • Error handling varies widely: Tools that emitted clear diagnostics allowed users to remediate input issues; those that failed silently left developers unaware of problems.
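
On the format‑sensitivity point, the two major JSON serializations are easy to tell apart, which makes such probes cheap to automate. A minimal check, covering only the JSON variants, might look like:

```python
# Minimal SBOM-format probe for JSON serializations. CycloneDX JSON carries
# a top-level "bomFormat" field; SPDX JSON carries "spdxVersion". XML and
# tag-value variants would need separate handling.
import json

def detect_sbom_format(path: str) -> str:
    try:
        with open(path) as f:
            doc = json.load(f)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return "unknown (not JSON)"
    if not isinstance(doc, dict):
        return "unknown"
    if doc.get("bomFormat") == "CycloneDX":
        return f"CycloneDX {doc.get('specVersion', '?')}"
    if "spdxVersion" in doc:
        return doc["spdxVersion"]  # e.g. "SPDX-2.3"
    return "unknown"
```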

Overall, the study shows that relying on a single SVS tool without independent verification can leave software supply chains exposed.

Practical Implications

  • For DevOps teams: Integrate SVS‑TEST into CI pipelines to continuously validate the scanners you depend on and to detect silent failures before they propagate into production releases (a sketch of such a gate follows this list).
  • For security tool vendors: The benchmark provides a concrete checklist (format support, graceful error handling, comprehensive vulnerability databases) that can be used to improve product maturity.
  • For auditors & compliance officers: The metrics give a quantifiable way to assess whether an organization’s SBOM scanning process meets regulatory expectations (e.g., Executive Order 14028).
  • For open‑source maintainers: The publicly released SBOM corpus can serve as a regression suite when adding new language ecosystems or dependency‑graph features.
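
To make the CI suggestion above concrete, a gate step could consume benchmark results and break the build on weak recall or any silent failure. The results schema used here is assumed for illustration; it is not SVS‑TEST's actual output format:

```python
# Hypothetical CI gate over benchmark results. The JSON schema (scanner,
# sbom, recall, silent_failure per run) is assumed, not SVS-TEST's real one.
import json
import sys

MIN_RECALL = 0.90  # illustrative threshold

def main(results_path: str) -> int:
    with open(results_path) as f:
        results = json.load(f)  # one entry per (scanner, SBOM) pair
    failures = []
    for run in results:
        rec = run.get("recall", 0.0)
        if run.get("silent_failure"):
            failures.append(f"{run['scanner']} was silent on {run['sbom']}")
        elif rec < MIN_RECALL:
            failures.append(f"{run['scanner']} recall {rec:.0%} on {run['sbom']}")
    for msg in failures:
        print("FAIL:", msg, file=sys.stderr)
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```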

In short, SVS‑TEST turns “black‑box” scanning into a testable, observable component of the software supply‑chain security stack.

Limitations & Future Work

  • Scope of tools: Only seven scanners were evaluated; the landscape is rapidly evolving, and newer tools may exhibit different behaviors.
  • SBOM diversity: While the 16 SBOMs cover common edge cases, they do not exhaust all possible real‑world complexities (e.g., multi‑language monorepos, custom metadata).
  • Vulnerability database freshness: The ground‑truth data reflects a snapshot in time; tools that update their CVE feeds more frequently could appear less accurate in the study.
  • Future directions: The authors suggest expanding the SBOM corpus, automating the inclusion of live vulnerability feeds, and extending SVS‑TEST to evaluate performance (runtime, resource usage) alongside correctness.

By acknowledging these constraints, the community can build on this foundation to create an even more robust ecosystem for SBOM‑based security testing.

Authors

  • Martin Rosso
  • Muhammad Asad Jahangir Jaffar
  • Alessandro Brighente
  • Mauro Conti

Paper Information

  • arXiv ID: 2512.17710v1
  • Categories: cs.SE, cs.CR
  • Published: December 19, 2025
  • PDF: https://arxiv.org/pdf/2512.17710v1