[Paper] Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles

Published: May 8, 2026 at 01:06 PM EDT
Source: arXiv - 2605.08017v1

Overview

This paper investigates how AI coding agents take part in the full lifecycle of a pull request (PR). By examining 29,585 PRs across five popular tools (OpenAI, Claude, GitHub Copilot, Cursor, and Devin), the authors map who starts the work and who authorizes the merge, revealing a spectrum from “assistant” (human‑driven) to “collaborator” (agent‑driven) behaviours.

Key Contributions

  • Initiator × Approver taxonomy that defines six interaction scenarios for PRs (e.g., agent‑initiated + human‑approved, human‑initiated + agent‑approved).
  • Empirical analysis of 29,585 PR lifecycles, showing how each tool distributes initiative and oversight.
  • State‑machine models for each tool that illustrate the typical sequence of actions (branch creation, commits, reviews, merge).
  • Open replication package (data, scripts, and taxonomy) to enable further research on automation and governance in software development.
  • Insight that merge governance stays human‑centric even when agents dominate the operational work.

Methodology

  1. Data collection – The authors harvested PR metadata (events, timestamps, actors) from public repositories that used the five AI tools.
  2. Role inference – For each PR, they identified the initiator (who opened the branch/PR) and the approver (who performed the final merge), then mapped that pair to a scenario in the taxonomy.
  3. Lifecycle reconstruction – By ordering events, they built a per‑tool state machine that captures typical PR flows (e.g., “agent opens → human reviews → human merges”).
  4. Statistical analysis – Frequencies of each interaction scenario were computed per tool, and cross‑tool comparisons highlighted the collaborator‑assistant spectrum.
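
The role-inference and frequency steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' replication code: the `PullRequest` fields, the "agent"/"human" labels, and the "unmerged" state are assumptions, and this summary does not enumerate the paper's six scenarios.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PullRequest:
    # Hypothetical schema; the paper's actual fields may differ.
    tool: str                 # AI tool associated with the PR
    opened_by: str            # "agent" or "human"
    merged_by: Optional[str]  # "agent", "human", or None if never merged


def scenario(pr: PullRequest) -> str:
    """Label a PR by its (initiator, approver) pair."""
    if pr.merged_by is None:
        return f"{pr.opened_by}-initiated + unmerged"
    return f"{pr.opened_by}-initiated + {pr.merged_by}-approved"


def scenario_shares(prs: list[PullRequest]) -> dict[str, dict[str, float]]:
    """Per-tool fraction of PRs that fall into each scenario."""
    counts: dict[str, Counter] = {}
    for pr in prs:
        counts.setdefault(pr.tool, Counter())[scenario(pr)] += 1
    return {
        tool: {name: n / sum(c.values()) for name, n in c.items()}
        for tool, c in counts.items()
    }


# Toy data for illustration only; not taken from the paper's dataset.
sample = [
    PullRequest("tool_a", "agent", "human"),
    PullRequest("tool_a", "agent", "human"),
    PullRequest("tool_a", "human", "human"),
    PullRequest("tool_b", "human", "human"),
]
shares = scenario_shares(sample)
```

Comparing the resulting per-tool shares is what would place a tool toward the collaborator or assistant end of the spectrum.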

The approach is deliberately tool‑agnostic: any system that logs PR events can be slotted into the same taxonomy.
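
One way to picture the lifecycle reconstruction step is as transition counting over time-ordered events. The sketch below assumes a hypothetical `(timestamp, actor, action)` event tuple and invented action names. It shows the general technique of deriving a state machine from event logs, not the authors' actual models.

```python
from collections import Counter, defaultdict

# Hypothetical event schema: (timestamp, actor, action)
Event = tuple[int, str, str]


def transition_counts(lifecycles: list[list[Event]]) -> dict[str, Counter]:
    """Count state-to-state transitions across time-ordered PR lifecycles."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for events in lifecycles:
        ordered = sorted(events, key=lambda e: e[0])  # order by timestamp
        states = ["START"]
        states += [f"{actor}:{action}" for _, actor, action in ordered]
        states.append("END")
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return dict(counts)


def most_likely_path(counts: dict[str, Counter], max_steps: int = 20) -> list[str]:
    """Follow the most frequent outgoing transition from START until END."""
    path = ["START"]
    while path[-1] != "END" and len(path) <= max_steps:
        outgoing = counts.get(path[-1])
        if not outgoing:
            break
        path.append(outgoing.most_common(1)[0][0])
    return path


# Toy lifecycles for illustration only.
lifecycles = [
    [(3, "human", "merge"), (1, "agent", "open"), (2, "human", "review")],
    [(1, "agent", "open"), (2, "human", "review"), (3, "human", "merge")],
    [(1, "agent", "open"), (2, "human", "merge")],
]
counts = transition_counts(lifecycles)
typical = most_likely_path(counts)
```

On this toy input the typical path is agent opens, human reviews, human merges. Because the function only needs ordered events with an actor and an action, any tool that logs PR events could be fed through it.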

Results & Findings

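Tool    | % PRs agent‑initiated | % PRs human‑approved | Typical flow
--------|-----------------------|----------------------|------------------------------------------------------
Cursor  | ≥ 96 %                | ≈ 99 %               | Agent creates branch & PR, human reviews, human merges
Devin   | ≥ 96 %                | ≈ 99 %               | Same pattern as Cursor
Copilot | ≥ 96 %                | ≈ 99 %               | Same pattern
OpenAI  | ~ 30 %                | ≈ 98 %               | Human drives PR, AI offers suggestions
Claude  | ~ 25 %                | ≈ 98 %               | Human‑led, AI assists in code edits
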
  • Collaborator tools (Cursor, Devin, Copilot) push operational initiative to the AI: they open branches, push commits, and keep the PR alive with minimal human prompting.
  • Assistant tools (OpenAI, Claude) stay in a supportive role: humans open PRs and decide when to merge; AI only supplies code snippets or refactorings.
  • Merge authority is overwhelmingly human across all tools; only a tiny fraction of PRs show an “agent‑approved” merge, and those cases lack clear decision‑maker logs.
  • The taxonomy uncovered six distinct interaction patterns, but > 95 % of observed PRs fell into two: agent‑initiated + human‑approved (collaborator) and human‑initiated + human‑approved (assistant).

Practical Implications

  • Tool selection: Teams that want AI to take the lead on routine bug‑fixes or scaffolding can adopt collaborator‑style agents (Copilot, Cursor). Those that need tight human control over what gets merged should prefer assistant‑style tools (OpenAI, Claude).
  • Workflow design: Knowing that merges stay human‑centric, organizations can design review gates (e.g., mandatory code‑owner approvals) without fearing AI‑driven “silent merges.”
  • Observability & audit: The paper highlights a blind spot: when an AI agent executes a merge, logs capture the executor but not the decision behind it. Companies should augment CI/CD pipelines with explicit decision‑recording (e.g., signed merge requests).
  • Compliance & security: For regulated environments, tools at the collaborator end of the spectrum may raise concerns about “unauthenticated” code changes. The findings suggest that adding a final human approval step mitigates most of that risk.
  • Product roadmaps: Vendors can use the state‑machine models to identify missing hand‑off points (e.g., adding an “AI‑suggested merge” checkpoint) that could improve transparency and user trust.
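
The decision-recording idea from the observability point could look something like the sketch below. All field names and the `needs_review` rule are hypothetical; the paper describes the gap between executor and decision authority, not a schema for closing it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MergeRecord:
    # Hypothetical audit fields, chosen for illustration.
    pr_id: str
    executor: str                      # account that performed the merge
    executor_kind: str                 # "agent" or "human"
    decision_authority: Optional[str]  # who decided to merge, if recorded


def needs_review(record: MergeRecord) -> bool:
    """Flag agent-executed merges that have no recorded decision authority."""
    return record.executor_kind == "agent" and record.decision_authority is None


# Toy records for illustration only.
records = [
    MergeRecord("pr-1", "alice", "human", "alice"),
    MergeRecord("pr-2", "example-bot", "agent", "bob"),
    MergeRecord("pr-3", "example-bot", "agent", None),
]
flagged = [r.pr_id for r in records if needs_review(r)]
```

Keeping the executor and the decision authority as separate fields is the point of the design: an agent can carry out the merge while the record still names who authorized it.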

Limitations & Future Work

  • Scope of tools – Only five AI agents were examined; newer or niche tools may exhibit different patterns.
  • Dataset bias – The PRs come from public repositories that already adopt these agents, possibly over‑representing enthusiastic early adopters.
  • Decision‑maker visibility – The study could not reliably attribute merge decisions to AI when the executor was an agent, leaving a gap in governance analysis.
  • Future directions suggested by the authors include: expanding the taxonomy to CI/CD pipelines, studying the impact of AI‑driven merges on code quality and defect rates, and building richer audit logs that capture both the executor and the decision authority.

Authors

  • Young Jo Chung
  • Safwat Hassan

Paper Information

  • arXiv ID: 2605.08017v1
  • Categories: cs.SE
  • Published: May 8, 2026