[Paper] A Comprehensive Study on the Impact of Vulnerable Dependencies on Open-Source Software

Published: December 3, 2025 at 10:20 AM EST
Source: arXiv - 2512.03868v1

Overview

The paper presents the largest‑to‑date empirical analysis of vulnerable open‑source dependencies. It covers more than 1,000 GitHub projects and roughly 50,000 releases spanning ten years. Using their own Software Composition Analysis (SCA) tool, VODA, the authors measure three things: how often critical vulnerabilities reach released versions, how long those vulnerabilities persist, and which project‑level factors influence these risks. The results are directly relevant to anyone who ships code that depends on third‑party libraries.

Key Contributions

  • Broad, multi‑language dataset – 1,000+ real‑world projects covering Java, Python, Rust, Go, Ruby, PHP, and JavaScript, far larger and more diverse than the datasets used in prior studies.
  • Automated dependency tracing – VODA extracts full dependency trees (including transitive links) and maps each version to known CVEs.
  • Empirical metrics on vulnerability lifecycle – average persistence of a critical vulnerability exceeds 12 months, with many remaining unpatched for years.
  • Correlation analysis – links vulnerability prevalence to project characteristics such as team size, contributor turnover, release cadence, and dependency depth.
  • Open dataset & tooling – the authors release the curated dependency‑vulnerability data, enabling reproducibility and further research.
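
Below is a minimal Python sketch of the dependency tracing idea: walking full dependency trees, including transitive links, and mapping each version to known vulnerabilities. It is not VODA's implementation. It rests on several assumptions:

- The dependency graph has already been resolved into a dictionary that maps each `name@version` node to its dependencies.
- Advisories are given as `[introduced, fixed)` ranges over plain numeric versions.
- The function names, package names, and the `ADV-0001` identifier are all hypothetical.

```python
from collections import deque

# Hypothetical resolved dependency graph: "name@version" -> list of "name@version" deps.
Graph = dict[str, list[str]]


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version such as '2.14.1'.
    Assumption: no pre-release or build suffixes."""
    return tuple(int(part) for part in version.split("."))


def resolve_depths(graph: Graph, root: str) -> dict[str, int]:
    """Breadth-first search from the project root.
    Depth 1 = direct dependency, depth > 1 = transitive; BFS records the shortest depth."""
    depths: dict[str, int] = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                depths[dep] = depth + 1
                queue.append((dep, depth + 1))
    return depths


def find_vulnerable(depths: dict[str, int],
                    advisories: dict[str, tuple[str, str, str]]) -> list[tuple[str, str, int, str]]:
    """Flag nodes whose version lies in an advisory's [introduced, fixed) range."""
    hits = []
    for node, depth in sorted(depths.items()):
        name, version = node.rsplit("@", 1)  # rsplit keeps scoped names like "@scope/pkg"
        v = parse_version(version)
        for adv_id, (pkg, introduced, fixed) in advisories.items():
            if pkg == name and parse_version(introduced) <= v < parse_version(fixed):
                kind = "direct" if depth == 1 else "transitive"
                hits.append((node, adv_id, depth, kind))
    return hits


# Toy example (all names are made up).
graph: Graph = {
    "my-app@1.0.0": ["web-lib@2.1.0", "json-lib@1.4.0"],
    "web-lib@2.1.0": ["log-lib@2.14.1"],
    "json-lib@1.4.0": [],
    "log-lib@2.14.1": [],
}
advisories = {"ADV-0001": ("log-lib", "2.0.0", "2.15.0")}

depths = resolve_depths(graph, "my-app@1.0.0")
for hit in find_vulnerable(depths, advisories):
    print(hit)
```

Breadth-first search records each package at its shortest distance from the root. A library reachable both directly and transitively is therefore counted as direct. Real SCA tools must also resolve version constraints and handle ecosystem-specific version schemes, which this sketch ignores.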

Methodology

  1. Project selection – The authors sampled more than 1,000 popular GitHub repositories (stars ≥ 500) across seven programming ecosystems.
  2. Version mining – For each repository they retrieved every tagged release from 2013‑2023 (≈ 50,000 releases).
  3. Dependency extraction – VODA parses language‑specific manifest files (e.g., pom.xml, requirements.txt, Cargo.toml) to build a complete dependency graph, distinguishing direct from transitive dependencies.
  4. Vulnerability mapping – Each library version is cross‑referenced with public vulnerability databases (e.g., NVD and the GitHub Advisory Database) to flag known security issues (CVEs) and their severity (CVSS score).
  5. Metric computation – The team calculates persistence time (from first appearance of a vulnerable version to its removal/fix), depth statistics, and aggregates project‑level attributes (contributors, commits, release frequency).
  6. Statistical analysis – Correlation and regression techniques assess how project metrics affect vulnerability exposure.
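
The persistence metric from step 5 can be sketched as follows. The sketch assumes each release has already been scanned into the set of advisory IDs that affect it; the function name and input shape are hypothetical. It measures the window from the first release that ships a vulnerability to the first later release that no longer ships it. Vulnerabilities still present at the end of the observation period are treated as unresolved (right-censored) rather than fixed.

```python
from datetime import date

# Hypothetical input: (release date, set of advisory IDs affecting that release),
# e.g. produced by running a dependency scan on every tagged release.
Release = tuple[date, set[str]]


def persistence_days(releases: list[Release],
                     observed_until: date) -> dict[str, tuple[int, bool]]:
    """Map each advisory to (days, fixed).

    days = first release that ships the vulnerability -> first later release
           that no longer ships it. If it is never removed, the window ends at
           `observed_until` and fixed is False (right-censored).
    Only the first removal is counted; re-introductions are ignored.
    """
    first_seen: dict[str, date] = {}
    result: dict[str, tuple[int, bool]] = {}
    for rel_date, present in sorted(releases, key=lambda r: r[0]):
        for adv in present:
            first_seen.setdefault(adv, rel_date)
        # Anything seen earlier but absent from this release is considered fixed here.
        for adv, start in first_seen.items():
            if adv not in result and adv not in present:
                result[adv] = ((rel_date - start).days, True)
    # Still-open vulnerabilities are measured up to the end of observation.
    for adv, start in first_seen.items():
        if adv not in result:
            result[adv] = ((observed_until - start).days, False)
    return result


# Toy release history (made-up data).
history: list[Release] = [
    (date(2020, 1, 1), {"ADV-A"}),
    (date(2020, 6, 1), {"ADV-A", "ADV-B"}),
    (date(2021, 3, 1), {"ADV-B"}),
]
windows = persistence_days(history, observed_until=date(2021, 12, 31))
for adv, (days, fixed) in sorted(windows.items()):
    print(adv, days, "fixed" if fixed else "still present")
```

Keeping the `fixed` flag separate matters because a median computed only over fixed windows would understate how long vulnerabilities actually persist. The summary does not say how the paper treats censored or re-introduced cases.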

Results & Findings

  • Transitive dependencies dominate – More than 70 % of vulnerable libraries are not declared directly but are pulled in from deeper layers of the dependency tree. What it means: even if you audit the libraries you import, you can still inherit vulnerabilities from their own dependencies.
  • Critical CVEs linger – The median time to remediate a critical vulnerability is 13 months, and some persist for more than 3 years. What it means: high‑severity issues are fixed far more slowly than the industry would like, exposing users to long‑term risk.
  • Depth matters – Projects with deeper dependency trees (more than 3 levels) contain 1.8× more vulnerable components. What it means: keeping the dependency graph shallow reduces the attack surface.
  • Team size and activity help – Larger, more active teams (≥ 5 core contributors, releasing more often than monthly) remediate vulnerabilities about 30 % faster. What it means: organizational health translates into better supply‑chain security.
  • Language differences – The Java and JavaScript ecosystems show the highest proportion of vulnerable transitive dependencies, while Rust and Go have fewer but longer‑lasting critical CVEs. What it means: security practices need to be tuned per ecosystem.
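
Findings such as "depth matters" and "team size and activity help" come from the correlation step of the methodology. The summary does not say which coefficient the authors used. As an assumption, Spearman's rank correlation is one common choice for skewed, count-like project metrics, and a dependency-free version is sketched below. The per-project numbers in the example are made up and are not the paper's data.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-project values, NOT data from the paper.
max_depth = [2, 3, 3, 4, 5, 6, 7]
vulnerable_components = [1, 0, 2, 3, 3, 6, 8]
print(round(spearman(max_depth, vulnerable_components), 3))
```

On Python 3.12 and later, `statistics.correlation(x, y, method="ranked")` computes the same coefficient. A positive rho shows only that the two metrics rise together. Separating depth from confounders such as project size would need the regression analysis the methodology mentions.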

Practical Implications

  • Integrate SCA early and continuously – Running tools like VODA (or commercial equivalents) in CI/CD can surface transitive risks before they ship.
  • Automate dependency updates – Use bots (e.g., Dependabot, Renovate) to open pull requests that upgrade vulnerable libraries. The finding that more active teams remediate about 30 % faster suggests that merging such updates promptly shortens exposure time.
  • Limit dependency depth – Adopt “dependency hygiene” policies: prefer libraries with minimal transitive requirements, and prune unused packages regularly.
  • Prioritize critical CVEs – Because critical vulnerabilities tend to linger for more than a year, treat any finding with a CVSS score ≥ 9.0 as a release blocker, even if the vulnerable version sits deep in the dependency graph.
  • Team and process investment – Encourage regular security reviews, allocate dedicated “dependency owners,” and maintain a release cadence that allows timely patches.
  • Cross‑language awareness – When mixing languages (e.g., a Node.js front‑end with a Python back‑end), apply ecosystem‑specific SCA rules; a one‑size‑fits‑all policy can miss high‑risk patterns.
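
The "treat CVSS ≥ 9.0 as a release blocker" policy can be expressed as a small CI gate. This sketch assumes an SCA scanner (VODA or any other tool) has already produced findings, each with a package, version, advisory ID, CVSS base score, and dependency depth. The `Finding` type, its field names, and the scan data are hypothetical. The key design choice is that depth is reported but never used to downgrade a critical finding.

```python
from dataclasses import dataclass

CRITICAL_THRESHOLD = 9.0  # CVSS v3.x/v4.0 "Critical" severity band is 9.0-10.0


@dataclass(frozen=True)
class Finding:
    package: str
    version: str
    advisory_id: str
    cvss: float  # CVSS base score, 0.0-10.0
    depth: int   # 1 = direct dependency, > 1 = transitive


def release_blockers(findings: list[Finding],
                     threshold: float = CRITICAL_THRESHOLD) -> list[Finding]:
    """Critical findings block the release no matter how deep they sit in the graph."""
    return [f for f in findings if f.cvss >= threshold]


def gate(findings: list[Finding]) -> int:
    """Print each blocker and return a process exit code (0 = pass, 1 = block)."""
    blockers = release_blockers(findings)
    for f in sorted(blockers, key=lambda item: item.cvss, reverse=True):
        where = "direct" if f.depth == 1 else f"transitive, depth {f.depth}"
        print(f"BLOCKER {f.advisory_id}: {f.package}@{f.version} (CVSS {f.cvss}, {where})")
    return 1 if blockers else 0


# Hypothetical scanner output for one build.
scan = [
    Finding("log-lib", "2.14.1", "ADV-0001", 10.0, depth=3),
    Finding("yaml-lib", "5.3.0", "ADV-0002", 7.5, depth=1),
]
exit_code = gate(scan)
print("exit code:", exit_code)  # in CI, pass this to sys.exit() to fail the job
```

Returning an exit code instead of exiting inside `gate` keeps the check easy to unit-test. The CI wrapper decides how to fail the pipeline.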

Limitations & Future Work

  • CVE coverage bias – The analysis relies on publicly disclosed CVEs; undisclosed or zero‑day bugs are not captured, potentially under‑estimating risk.
  • Selection bias – Projects were chosen based on popularity (stars), which may not reflect the security posture of smaller or enterprise‑internal codebases.
  • Static snapshot – The study looks at released versions; it does not consider in‑flight development branches where vulnerable dependencies might already be present.
  • Future directions – The authors plan to extend VODA to monitor real‑time dependency changes, incorporate proprietary vulnerability feeds, and evaluate the impact of automated remediation tools on reducing persistence times.

Authors

  • Shree Hari Bittugondanahalli Indra Kumar
  • Lilia Rodrigues Sampaio
  • André Martin
  • Andrey Brito
  • Christof Fetzer

Paper Information

  • arXiv ID: 2512.03868v1
  • Categories: cs.SE, cs.CR
  • Published: December 3, 2025