[Paper] Novice Developers Produce Larger Review Overhead for Project Maintainers while Vibe Coding

Published: February 27, 2026
Source: arXiv - 2602.23905v1
Overview

The paper investigates whether novice developers using AI‑assisted coding tools (referred to as “vibe coders”) can replace seasoned contributors in open‑source projects. By analysing nearly 23,000 pull requests (PRs) from about 1,700 vibe coders on GitHub, the authors show that less‑experienced coders tend to generate larger code changes but also create significantly more review work for maintainers, leading to lower acceptance rates and longer PR lifetimes.

Key Contributions

  • Empirical comparison of low‑experience vs. high‑experience vibe coders across 22,953 PRs.
  • Quantitative evidence that novice vibe coders submit PRs with 2.15× more commits and 1.47× more files changed than their experienced peers.
  • Demonstration that novice PRs attract 4.52× more review comments, have a 31 % lower acceptance rate, and stay open 5.16× longer before resolution.
  • Insight that the productivity boost from AI coding agents comes at the cost of higher review overhead for project maintainers.
  • Practical recommendations for project managers on balancing AI‑assisted contributions with reviewer capacity and targeted training.

Methodology

  1. Data collection – The authors leveraged the AIDev dataset, extracting all PRs that involved AI‑generated code (“vibe coding”) from 1,719 unique developers across multiple GitHub repositories.
  2. Experience classification – Developers were split into two groups:
    • Exp_Low: developers with fewer prior contributions (low experience).
    • Exp_High: developers with a richer contribution history (high experience).
  3. Metric extraction – For each PR, they measured: number of commits, files changed, lines added/deleted, number of review comments, time to close, and acceptance status (merged vs. rejected).
  4. Statistical analysis – Non‑parametric tests (Mann‑Whitney U) and effect‑size calculations were used to compare the two groups, controlling for repository size and language where possible.
  5. Validation – A subset of PRs was manually inspected to confirm that the AI‑generated code was indeed present and that the experience labels were accurate.
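The group comparison in step 4 can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the Mann‑Whitney U statistic is computed directly from pairwise comparisons, Cliff's delta serves as the effect size, and the per‑PR review‑comment counts below are made‑up placeholders rather than data from the AIDev dataset.

```python
# Minimal sketch of a Mann-Whitney U comparison with Cliff's delta as the
# effect size, as described in step 4. All metric values are illustrative
# placeholders, not data from the paper.
def mann_whitney_u(a, b):
    """U statistic for group a: count of (x > y) pairs plus half the ties."""
    greater = sum(1 for x in a for y in b if x > y)
    ties = sum(1 for x in a for y in b if x == y)
    return greater + 0.5 * ties

def cliffs_delta(a, b):
    """Effect size in [-1, 1]: P(x > y) - P(x < y) over all (x, y) pairs."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))

# Hypothetical per-PR review-comment counts for the two experience groups.
exp_low = [12, 9, 15, 7, 20, 4]
exp_high = [2, 4, 1, 3, 5, 2]

u = mann_whitney_u(exp_low, exp_high)    # 34.5
delta = cliffs_delta(exp_low, exp_high)  # ~0.92, a large effect
print(f"U = {u}, Cliff's delta = {delta:.2f}")
```

A delta near 1 would indicate that almost every low‑experience PR attracts more review comments than almost every high‑experience PR; in practice one would also compute a p‑value (e.g., via `scipy.stats.mannwhitneyu`) as the authors did.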

Results & Findings

  • Larger code footprints: Exp_Low PRs contain on average 2.15× more commits and modify 1.47× more files than Exp_High PRs.
  • Heavier review load: Reviewers left 4.52× more comments on novice PRs, indicating more questions, suggestions, or rework needed.
  • Lower success rate: Only about 69 % of low‑experience PRs were merged, a 31 % lower acceptance rate than for high‑experience PRs.
  • Longer turnaround: Novice PRs remained open 5.16× longer before being closed or merged, stretching the feedback loop.
  • Interpretation: Novice vibe coders tend to rely on the AI to generate large code chunks quickly, but they lack the domain knowledge and testing rigor to ensure quality, shifting the verification burden onto human reviewers.

Practical Implications

  • Reviewer capacity planning – Teams adopting AI‑assisted coding should anticipate a surge in review comments when onboarding junior developers. Allocating additional reviewer bandwidth or automating parts of the review (e.g., static analysis) can mitigate bottlenecks.
  • Training & mentorship – Pairing novice vibe coders with mentors who can guide prompt engineering, code validation, and testing practices reduces the downstream review effort.
  • Selective AI usage – Encourage experienced developers to use AI for specific, well‑bounded tasks (e.g., boilerplate generation) while keeping them responsible for architectural decisions and critical sections.
  • Adaptive PR policies – Implement tiered review workflows: fast‑track PRs from experienced coders, while routing novice PRs through a more thorough checklist (linting, unit tests, CI checks) before human review.
  • Risk management – For safety‑critical or high‑stakes projects, relying solely on low‑experience AI‑generated contributions may be unsafe; a hybrid model that blends AI assistance with expert oversight is advisable.
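The tiered workflow suggested above could look roughly like the following sketch. The experience threshold, field names, and routing labels are all hypothetical choices for illustration; a real implementation would hook into a CI system rather than a standalone function.

```python
# Illustrative sketch of a tiered PR review workflow: experienced authors are
# fast-tracked, while novice PRs must pass automated gates before a human
# reviewer is assigned. Thresholds and fields are hypothetical, not from the
# paper.
from dataclasses import dataclass

@dataclass
class PullRequest:
    author_merged_prs: int  # prior-contribution count, the study's experience proxy
    ci_passed: bool
    lint_clean: bool

def review_track(pr: PullRequest, experience_threshold: int = 20) -> str:
    """Return the review lane a PR should be routed to."""
    if pr.author_merged_prs >= experience_threshold:
        return "fast-track"
    if pr.ci_passed and pr.lint_clean:
        return "standard-review"
    return "blocked: fix CI/lint before human review"

print(review_track(PullRequest(author_merged_prs=50, ci_passed=True, lint_clean=True)))
# fast-track
print(review_track(PullRequest(author_merged_prs=3, ci_passed=False, lint_clean=True)))
# blocked: fix CI/lint before human review
```

Routing novice PRs through automated gates first directly targets the paper's finding that such PRs otherwise shift verification work onto human reviewers.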

Limitations & Future Work

  • Experience proxy – The study uses contribution count as a proxy for experience, which may not capture qualitative aspects like domain expertise or code quality history.
  • Dataset scope – All PRs come from the AIDev dataset; results might differ in private repositories or in languages not well‑represented on GitHub.
  • AI tool variance – The analysis does not differentiate between specific AI coding agents (e.g., GitHub Copilot vs. custom LLMs), which could have varying impact on code quality.
  • Future directions – The authors suggest extending the study to examine the effect of targeted prompt‑engineering training for novices, evaluating automated review tools that can pre‑filter AI‑generated code, and exploring longitudinal outcomes (e.g., whether novices improve over time with AI assistance).

Authors

  • Syed Ammar Asdaque
  • Imran Haider
  • Muhammad Umar Malik
  • Maryam Abdul Ghafoor
  • Abdul Ali Bangash

Paper Information

  • arXiv ID: 2602.23905v1
  • Categories: cs.SE
  • Published: February 27, 2026