[Paper] A Study of Library Usage in Agent-Authored Pull Requests

Published: December 12, 2025 at 09:21 AM EST
3 min read
Source: arXiv - 2512.11589v1

Overview

Lukas Twist’s study dives into how AI‑driven coding agents handle library imports when they automatically generate pull requests (PRs). By analyzing 26,760 agent‑authored PRs from the AIDev dataset, the work uncovers surprising patterns that matter for anyone building or consuming AI‑assisted development tools.

Key Contributions

  • Empirical measurement of library import frequency in agent‑generated PRs (≈ 30 % of PRs).
  • Quantification of new dependency introductions (only 1.3 % of PRs) and the version‑pinning behavior of agents (75 % of added deps specify a version).
  • Comparison with raw LLM outputs, showing agents are far more disciplined about versioning than “bare” language‑model suggestions.
  • Catalog of library diversity, revealing that agents draw from a much broader set of external packages than previously reported for non‑agentic LLM code generation.

Methodology

  1. Dataset – The author leveraged the publicly available AIDev corpus, which contains PRs automatically created by a variety of coding agents (e.g., GitHub Copilot, ChatGPT‑based bots).
  2. PR filtering – Only PRs where the author field matched a known agent identifier were kept, resulting in 26,760 PRs across multiple languages and ecosystems (primarily JavaScript/Node, Python, and Java).
  3. Static analysis – For each PR, the changed files were parsed to detect:
    • import/require statements (library usage).
    • Additions to dependency manifests (package.json, requirements.txt, pom.xml, etc.).
  4. Version extraction – When a new dependency was added, the manifest entry was examined to see if an explicit version constraint was present.
  5. Baseline comparison – A parallel set of human‑written PRs and raw LLM‑generated snippets (without an agent wrapper) was analyzed to contextualize the findings.

The pipeline is fully reproducible and uses off‑the‑shelf parsers (AST‑based for JavaScript/Python, XML parsers for Maven) to keep the analysis approachable for developers.
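
The paper is summarized here only at this level of detail, so the following is a minimal sketch, in Python, of how AST‑based import detection and a version‑pinning check could work. The function names, the JavaScript regex fallback, and the "unpinned" heuristic are assumptions for illustration, not the author's actual tooling.

```python
import ast
import json
import re

def python_imports(source: str) -> set:
    """Collect top-level package names imported by a Python file."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 filters out relative imports like `from . import x`
            found.add(node.module.split(".")[0])
    return found

# Crude JavaScript fallback: match require('pkg') and `import ... from 'pkg'`,
# skipping relative paths. Scoped packages ('@scope/pkg') keep only the scope.
JS_IMPORT_RE = re.compile(r"""(?:require\(\s*|from\s+)['"]([^'"./][^'"]*)['"]""")

def js_imports(source: str) -> set:
    return {m.group(1).split("/")[0] for m in JS_IMPORT_RE.finditer(source)}

def unpinned_npm_deps(manifest_text: str) -> list:
    """List package.json dependencies with no explicit version constraint.

    Assumption: '*', 'latest', and an empty spec count as unpinned; anything
    naming a version (even a range such as ^1.2.0) counts as pinned. The
    paper's exact "explicit version" criterion may differ.
    """
    data = json.loads(manifest_text)
    deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
    return [name for name, spec in deps.items() if spec in ("", "*", "latest")]
```

For example, python_imports("import numpy.linalg as la") returns {"numpy"}; applying detectors like these across a PR's changed files and manifests yields per‑PR import and pinning statistics of the kind reported below.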

Results & Findings

| Metric | Agent‑authored PRs | Raw LLM snippets (baseline) |
| --- | --- | --- |
| PRs that import at least one library | 29.5 % | 22 % |
| PRs that add a new dependency | 1.3 % | 0.4 % |
| New dependencies with an explicit version | 75 % | 12 % |
| Number of distinct libraries referenced | ≈ 1,200 | ≈ 350 |

  • Library imports are common but conservative – Agents tend to reuse already‑declared dependencies rather than pulling in fresh packages.
  • Versioning discipline – When agents do add a new library, they almost always pin a version, reducing the risk of downstream breakage.
  • Diverse ecosystem reach – The long tail of libraries (many used only once) suggests agents are not stuck on a narrow “favorite” set, unlike earlier studies of plain LLM code generation.

Practical Implications

  • Tool builders can trust that agent‑mediated PRs are less likely to introduce “dependency hell” compared with raw LLM suggestions, but they should still enforce review gates for any new package addition.
  • CI/CD pipelines may benefit from lightweight checks that flag un‑versioned dependency additions, a scenario that is now relatively rare but still possible (see the sketch after this list).
  • Package maintainers can anticipate that AI agents will gradually surface a broader range of libraries, potentially increasing traffic to niche packages.
  • Developer onboarding – Teams adopting AI coding assistants can focus their policy discussions on when to allow agents to add new deps rather than fearing a flood of uncontrolled imports.
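
As a concrete illustration of the CI check suggested above, here is a minimal sketch that scans a unified diff for dependency additions lacking a version. The script name, file patterns, and "unpinned" heuristics are assumptions; a production check would parse manifests properly and cover other ecosystems (pom.xml, lockfiles).

```python
import re
import sys

# Added manifest lines treated as "unpinned": a requirements.txt entry with no
# version comparator, or a package.json entry whose spec is empty, "*", or
# "latest". These patterns are illustrative assumptions, not the paper's tool.
REQUIREMENTS_UNPINNED = re.compile(r"^\+([A-Za-z0-9_.\-]+)\s*$")
PACKAGE_JSON_UNPINNED = re.compile(r'^\+\s*"([^"]+)"\s*:\s*"(?:\*|latest)?"')

def flag_unpinned(diff_text: str) -> list:
    """Scan a unified diff for dependency additions without a version."""
    flagged, current_file = [], ""
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            current_file = line[4:].strip()
        elif current_file.endswith("requirements.txt"):
            m = REQUIREMENTS_UNPINNED.match(line)
            if m:
                flagged.append(f"{current_file}: {m.group(1)} (no version)")
        elif current_file.endswith("package.json"):
            m = PACKAGE_JSON_UNPINNED.match(line)
            if m:
                flagged.append(f"{current_file}: {m.group(1)} (unpinned spec)")
    return flagged

if __name__ == "__main__":
    problems = flag_unpinned(sys.stdin.read())
    for p in problems:
        print(p)
    sys.exit(1 if problems else 0)  # fail the CI job when anything is flagged
```

Invoked as, say, `git diff origin/main | python check_pins.py` (a hypothetical script name), it exits non‑zero whenever an un‑versioned addition appears, a case the paper's numbers suggest should fire only rarely for agent‑authored PRs.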

Limitations & Future Work

  • Language scope – The analysis concentrates on the three most popular ecosystems; behavior could differ for Rust, Go, or .NET.
  • Agent heterogeneity – The dataset aggregates many agents with varying internal prompts and post‑processing; disentangling individual agent strategies was outside the paper’s scope.
  • Temporal dynamics – The study captures a snapshot; as agents evolve, their library‑selection heuristics may shift, calling for longitudinal monitoring.

Future research could explore how agents decide which version to pin (latest stable vs. exact) and whether they respect project‑specific dependency policies (e.g., internal mirrors, security scanners).

Authors

  • Lukas Twist

Paper Information

  • arXiv ID: 2512.11589v1
  • Categories: cs.SE
  • Published: December 12, 2025