[Paper] Uncovering Hidden Inclusions of Vulnerable Dependencies in Real-World Java Projects

Published: January 30, 2026 at 09:30 AM EST

Source: arXiv - 2601.23020v1

Overview

The paper introduces Unshade, a hybrid dependency‑scanning technique for Java projects that uncovers “hidden” vulnerable libraries—those that have been repackaged, renamed, or otherwise altered so that traditional metadata‑based scanners miss them. By marrying fast SBOM (software‑bill‑of‑materials) analysis with a lightweight bytecode fingerprinting step, the authors reveal a massive, previously invisible attack surface in popular open‑source Java projects.

Key Contributions

  • Hybrid scanning pipeline that augments a project’s SBOM with bytecode‑level fingerprints, enabling detection of modified dependencies without sacrificing speed.
  • Bytecode‑based fingerprinting algorithm that reliably identifies a library even after it has been repackaged, shaded, or otherwise transformed.
  • Large‑scale empirical study of 1,808 top‑rated Maven projects on GitHub, showing that ~50 % contain at least one hidden vulnerable dependency.
  • Quantitative impact: 7,712 distinct CVEs were found in hidden dependencies, and affected projects averaged more than 8 hidden vulnerable libraries each. None of these are caught by conventional metadata scanners.
  • Open‑source prototype (Unshade) released, demonstrating practical scalability to real‑world codebases.

Methodology

  1. SBOM Generation – The tool first builds a conventional SBOM for the target Maven project using existing metadata (groupId, artifactId, version).
  2. Bytecode Fingerprinting – Every JAR on the classpath is inspected; a deterministic hash is derived from the bytecode of public classes (method signatures, constant pool entries, etc.). This fingerprint is resilient to renaming, shading, or repackaging.
  3. Augmentation – The original SBOM is enriched with any additional libraries whose fingerprints match known OSS components but are not listed in the Maven metadata.
  4. Vulnerability Lookup – The augmented SBOM is checked against standard vulnerability databases (e.g., NVD, OSS Index) using the usual metadata‑based lookup. Because the hidden libraries are now listed in the SBOM, their CVEs surface.
  5. Scalability Validation – The entire pipeline runs in a few seconds per project, making it feasible for CI/CD integration.
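The fingerprinting and augmentation steps above can be illustrated with a small sketch. It is not Unshade's algorithm: the toy JAR model (a dict of class names to method descriptors), the choice to ignore package prefixes entirely, the 0.8 overlap threshold, and every coordinate and class name below are assumptions made for illustration only.

```python
import hashlib
import re

# Toy model (an assumption): a JAR is a dict mapping an internal class name
# to the JVM-style descriptors of its public methods.
Jar = dict[str, list[str]]

# Matches object type references such as Lorg/example/json/Node;
_TYPE_REF = re.compile(r"L(?:[\w$]+/)*([\w$]+);")


def class_fingerprint(class_name: str, methods: list[str]) -> str:
    """Hash a class while ignoring package prefixes, so relocation does not change it."""
    simple_name = class_name.rsplit("/", 1)[-1]
    # Reduce every type reference to its simple name, e.g. Lorg/example/Node; -> LNode;
    normalized = sorted(_TYPE_REF.sub(r"L\1;", m) for m in methods)
    payload = simple_name + "\n" + "\n".join(normalized)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_index(components: dict[str, Jar]) -> dict[str, set[str]]:
    """Map each known component coordinate to the fingerprints of its classes."""
    return {
        coordinate: {class_fingerprint(n, m) for n, m in jar.items()}
        for coordinate, jar in components.items()
    }


def find_hidden(declared: set[str], jar: Jar, index: dict[str, set[str]],
                threshold: float = 0.8) -> set[str]:
    """Return undeclared components whose class fingerprints mostly appear in the JAR."""
    present = {class_fingerprint(n, m) for n, m in jar.items()}
    hidden = set()
    for coordinate, fingerprints in index.items():
        if coordinate in declared or not fingerprints:
            continue
        overlap = len(fingerprints & present) / len(fingerprints)
        if overlap >= threshold:
            hidden.add(coordinate)
    return hidden


# Hypothetical upstream library and an application JAR that embeds a relocated copy.
UPSTREAM = {
    "org.example:json-lib:1.0": {
        "org/example/json/Parser": ["parse(Ljava/lang/String;)Lorg/example/json/Node;"],
        "org/example/json/Node": ["text()Ljava/lang/String;"],
    }
}
APP_JAR = {
    "com/acme/App": ["main([Ljava/lang/String;)V"],
    "com/acme/shaded/json/Parser": ["parse(Ljava/lang/String;)Lcom/acme/shaded/json/Node;"],
    "com/acme/shaded/json/Node": ["text()Ljava/lang/String;"],
}

declared = {"com.acme:app:1.0"}
hidden = find_hidden(declared, APP_JAR, build_index(UPSTREAM))
augmented_sbom = declared | hidden  # the enriched component list passed to the CVE lookup
```

Dropping package names makes the fingerprint survive relocation but also makes it coarser, so a real tool would need richer bytecode features than this sketch uses.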

Results & Findings

  • Hidden vulnerable dependencies were present in 49.9 % of the studied projects.
  • Affected projects contained an average of 8.3 hidden vulnerable libraries each (range 1–42).
  • 7,712 unique CVEs were discovered exclusively in hidden dependencies; the same projects showed only ~2,300 CVEs when scanned with metadata‑only tools.
  • The fingerprinting step added < 2 % overhead compared to a pure metadata scan, confirming the approach’s practicality for continuous integration pipelines.

Practical Implications

  • CI/CD Integration – Teams can plug Unshade into their build pipelines to catch hidden vulnerabilities before release, complementing existing SCA tools.
  • Risk Management – Security dashboards that rely solely on declared dependencies dramatically under‑report risk; Unshade provides a more realistic exposure metric.
  • Supply‑Chain Hardening – Organizations adopting “shaded” or repackaged libraries (common in microservice frameworks) can now verify that the underlying components are still safe.
  • Policy Enforcement – Enterprises can define policies that block builds containing any hidden vulnerable dependency, reducing the chance of accidental exposure.
  • Developer Awareness – By surfacing the exact hidden libraries and associated CVEs, developers gain actionable insight (e.g., replace a shaded version with an up‑to‑date upstream artifact).
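As a rough illustration of the policy-enforcement idea above, the sketch below turns an augmented component list into a build verdict. The `Component` record, the `evaluate_policy` function, and the report contents are hypothetical; they do not reflect Unshade's actual output format, and the CVE identifier is a placeholder rather than a real advisory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    coordinate: str                  # groupId:artifactId:version
    declared: bool                   # True if listed in the project's Maven metadata
    cve_ids: tuple[str, ...] = ()    # placeholder identifiers, not real CVEs


def evaluate_policy(components: list[Component]) -> tuple[int, list[str]]:
    """Return (exit_code, messages); exit code 1 means the build should be blocked."""
    messages = [
        f"hidden vulnerable dependency {c.coordinate}: {', '.join(c.cve_ids)}"
        for c in components
        if not c.declared and c.cve_ids
    ]
    return (1 if messages else 0), messages


# Hypothetical scan report: one declared component, one hidden vulnerable one.
report = [
    Component("com.acme:app:1.0", declared=True),
    Component("org.example:json-lib:1.0", declared=False, cve_ids=("CVE-0000-0001",)),
]
exit_code, messages = evaluate_policy(report)
```

A CI job could print `messages` and exit with `exit_code`, so that a non-zero status fails the pipeline step.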

Limitations & Future Work

  • Language Scope – The current implementation targets Java bytecode; extending the fingerprinting technique to other JVM languages (Kotlin, Scala) or ecosystems (JavaScript, Python) remains open.
  • Fingerprint Collisions – While rare, the hash could theoretically collide for unrelated libraries; the authors suggest adding more bytecode features to tighten uniqueness.
  • Vulnerability Database Dependence – Accuracy hinges on the completeness of the external CVE feeds; missing entries will still be invisible.
  • Dynamic Loading – Dependencies loaded at runtime via reflection or custom class loaders may evade static bytecode analysis; future work could incorporate runtime instrumentation.
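The fingerprint-collision point can be made concrete with a toy example. The two feature sets below are made up, and the `fingerprint` helper is an assumption for illustration; the sketch only shows why hashing more bytecode features (here, method descriptors in addition to method names) can separate classes that a coarser fingerprint would confuse.

```python
import hashlib


def fingerprint(features: list[str]) -> str:
    """Hash a feature list in a canonical (sorted) order."""
    canonical = "\n".join(sorted(features))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two unrelated, made-up classes that happen to share method names.
class_a = [("get", "(I)Ljava/lang/String;"), ("size", "()I")]
class_b = [("get", "(Ljava/lang/Object;)Ljava/lang/Object;"), ("size", "()I")]

# Coarse fingerprint: method names only.
names_only_a = fingerprint([name for name, _ in class_a])
names_only_b = fingerprint([name for name, _ in class_b])

# Finer fingerprint: method names plus descriptors.
with_descriptors_a = fingerprint([name + desc for name, desc in class_a])
with_descriptors_b = fingerprint([name + desc for name, desc in class_b])

collides_coarse = names_only_a == names_only_b            # the classes are indistinguishable
collides_fine = with_descriptors_a == with_descriptors_b  # the classes are told apart
```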

Unshade demonstrates that a modest, bytecode‑level tweak to existing SBOM workflows can dramatically improve security visibility for Java projects—a compelling reminder that “what you don’t see” can be the most dangerous part of your software supply chain.

Authors

  • Stefan Schott
  • Serena Elisa Ponta
  • Wolfram Fischer
  • Jonas Klauke
  • Eric Bodden

Paper Information

  • arXiv ID: 2601.23020v1
  • Categories: cs.SE, cs.CR
  • Published: January 30, 2026