[Paper] Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles

Published: May 8, 2026 at 01:06 PM EDT

Source: arXiv - 2605.08017v1

Overview

This paper investigates how AI coding agents take part in the full lifecycle of a pull request (PR). By examining nearly 30,000 PRs across five popular tools (OpenAI, Claude, GitHub Copilot, Cursor, and Devin), the authors map who starts the work and who authorizes the merge, revealing a spectrum from “assistant” (human‑driven) to “collaborator” (agent‑driven) behaviours.

Key Contributions

Initiator × Approver taxonomy that defines six interaction scenarios for PRs (e.g., agent‑initiated + human‑approved, human‑initiated + agent‑approved).
Empirical analysis of 29,585 PR lifecycles, showing how each tool distributes initiative and oversight.
State‑machine models for each tool that illustrate the typical sequence of actions (branch creation, commits, reviews, merge).
Open replication package (data, scripts, and taxonomy) to enable further research on automation and governance in software development.
Insight that merge governance stays human‑centric even when agents dominate the operational work.
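The Initiator × Approver idea can be illustrated with a minimal sketch. This summary names only some of the six scenarios, so the code below covers just the human/agent pairing it mentions; the `Actor` enum and the `scenario` function are hypothetical names for illustration, not the authors' replication code.

```python
from enum import Enum


class Actor(Enum):
    """Kind of actor behind a PR event (assumed two-way split)."""
    HUMAN = "human"
    AGENT = "agent"


def scenario(initiator: Actor, approver: Actor) -> str:
    """Label a PR by who opened it and who performed the final merge."""
    return f"{initiator.value}-initiated + {approver.value}-approved"


# Enumerate the pairings that the human/agent split alone can produce.
for initiator in Actor:
    for approver in Actor:
        print(scenario(initiator, approver))
```

How the paper arrives at six scenarios rather than these four pairings is not stated here, so any further categories are left out rather than guessed.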

Methodology

Data collection – The authors harvested PR metadata (events, timestamps, actors) from public repositories that used the five AI tools.
Role inference – For each PR, they identified the initiator (who opened the branch/PR) and the approver (who performed the final merge), then mapped that pair to the taxonomy.
Lifecycle reconstruction – By ordering events, they built a per‑tool state machine that captures typical PR flows (e.g., “agent opens → human reviews → human merges”).
Statistical analysis – Frequencies of each interaction scenario were computed per tool, and cross‑tool comparisons highlighted the collaborator‑assistant spectrum.

The approach is deliberately tool‑agnostic: any system that logs PR events can be slotted into the same taxonomy.
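The lifecycle reconstruction step described above can be sketched roughly as follows: sort a PR's events by timestamp, turn them into a sequence of states, and count transitions between consecutive states. The event tuples, state names, and function names are assumptions made for this example; the paper's actual event schema is not given in this summary.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event records for one PR: (ISO timestamp, actor kind, action).
events = [
    ("2026-01-02T10:05:00", "agent", "commit"),
    ("2026-01-02T10:00:00", "agent", "open_pr"),
    ("2026-01-02T11:00:00", "human", "review"),
    ("2026-01-02T11:30:00", "human", "merge"),
]


def reconstruct(pr_events):
    """Order one PR's events by time and return its state sequence."""
    ordered = sorted(pr_events, key=lambda e: datetime.fromisoformat(e[0]))
    return [f"{actor}:{action}" for _, actor, action in ordered]


def transition_counts(lifecycles):
    """Count state-to-state transitions across many PR lifecycles."""
    counts = defaultdict(Counter)
    for states in lifecycles:
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return counts


def transition_probs(counts):
    """Normalise transition counts into per-state probabilities."""
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


lifecycle = reconstruct(events)
print(" -> ".join(lifecycle))
print(transition_probs(transition_counts([lifecycle])))
```

Running the same functions over all PRs of one tool would give one transition table per tool, which is one simple way to read "a per-tool state machine".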

Results & Findings

Tool	% PRs agent‑initiated	% PRs human‑approved	Typical flow
Cursor	≥ 96 %	≈ 99 %	Agent creates branch & PR, human reviews, human merges
Devin	≥ 96 %	≈ 99 %	Same pattern as Cursor
Copilot	≥ 96 %	≈ 99 %	Same pattern
OpenAI	~ 30 %	≈ 98 %	Human drives PR, AI offers suggestions
Claude	~ 25 %	≈ 98 %	Human‑led, AI assists in code edits

Collaborator tools (Cursor, Devin, Copilot) push operational initiative to the AI: they open branches, push commits, and keep the PR alive with minimal human prompting.
Assistant tools (OpenAI, Claude) stay in a supportive role: humans open PRs and decide when to merge; AI only supplies code snippets or refactorings.
Merge authority is overwhelmingly human across all tools; only a tiny fraction of PRs show an “agent‑approved” merge, and those cases lack clear decision‑maker logs.
The taxonomy uncovered six distinct interaction patterns, but > 95 % of observed PRs fell into two: agent‑initiated + human‑approved (collaborator) and human‑initiated + human‑approved (assistant).
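The two percentage columns in the table can be computed from per-PR labels along the following lines. The records here are toy data with placeholder tool names, not the paper's dataset, and `column_shares` is a hypothetical helper.

```python
from collections import defaultdict

# Toy records, NOT the paper's data: (tool, initiator kind, approver kind).
prs = [
    ("tool_a", "agent", "human"),
    ("tool_a", "agent", "human"),
    ("tool_a", "human", "human"),
    ("tool_b", "human", "human"),
    ("tool_b", "agent", "agent"),
]


def column_shares(records):
    """Return, per tool, the share of agent-initiated and human-approved PRs."""
    totals = defaultdict(lambda: {"n": 0, "agent_initiated": 0, "human_approved": 0})
    for tool, initiator, approver in records:
        row = totals[tool]
        row["n"] += 1
        row["agent_initiated"] += initiator == "agent"  # bool counts as 0 or 1
        row["human_approved"] += approver == "human"
    return {
        tool: {
            "pct_agent_initiated": 100 * row["agent_initiated"] / row["n"],
            "pct_human_approved": 100 * row["human_approved"] / row["n"],
        }
        for tool, row in totals.items()
    }


for tool, row in column_shares(prs).items():
    print(tool, row)
```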

Practical Implications

Tool selection: Teams that want AI to take the lead on routine bug‑fixes or scaffolding can adopt collaborator‑style agents (Copilot, Cursor). Those that need tight human control over what gets merged should prefer assistant‑style tools (OpenAI, Claude).
Workflow design: Knowing that merges stay human‑centric, organizations can design review gates (e.g., mandatory code‑owner approvals) without fearing AI‑driven “silent merges.”
Observability & audit: The paper highlights a blind spot: when an agent executes a merge, logs capture the executor but not who or what made the decision. Companies should augment CI/CD pipelines with explicit decision‑recording (e.g., signed merge requests).
Compliance & security: For regulated environments, the collaborator spectrum may raise concerns about “unauthenticated” code changes. The findings suggest that adding a final human approval step mitigates most risk.
Product roadmaps: Vendors can use the state‑machine models to identify missing hand‑off points (e.g., adding an “AI‑suggested merge” checkpoint) that could improve transparency and user trust.
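One way to picture the explicit decision-recording suggested under "Observability & audit" is an audit entry that keeps the executor and the decision authority as separate fields. This is a sketch under assumptions: `MergeAuditRecord`, `record_merge`, and all field names are invented for illustration, and the sketch does not implement signing.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MergeAuditRecord:
    """Hypothetical audit entry separating who merged from who decided."""
    pr_number: int
    executor: str            # account that performed the merge (may be an agent)
    decision_authority: str  # person or policy that authorized the merge
    rationale: str
    recorded_at: str         # UTC timestamp in ISO 8601 form


def record_merge(pr_number, executor, decision_authority, rationale):
    """Build an audit entry, refusing merges with no named decision authority."""
    if not decision_authority:
        raise ValueError("a merge must name its decision authority")
    return MergeAuditRecord(
        pr_number=pr_number,
        executor=executor,
        decision_authority=decision_authority,
        rationale=rationale,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


# Example values are placeholders, not taken from any real repository.
entry = record_merge(101, "agent-bot", "alice", "approved after code-owner review")
print(json.dumps(asdict(entry), indent=2))
```

Rejecting an empty `decision_authority` is a design choice in this sketch: it forces agent-executed merges to point at a human or a named policy before they are logged.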

Limitations & Future Work

Scope of tools – Only five AI agents were examined; newer or niche tools may exhibit different patterns.
Dataset bias – The PRs come from public repositories that already adopt these agents, possibly over‑representing enthusiastic early adopters.
Decision‑maker visibility – The study could not reliably attribute merge decisions to AI when the executor was an agent, leaving a gap in governance analysis.
Future directions suggested by the authors include: expanding the taxonomy to CI/CD pipelines, studying the impact of AI‑driven merges on code quality and defect rates, and building richer audit logs that capture both the executor and the decision authority.

Authors

Young Jo Chung
Safwat Hassan

Paper Information

arXiv ID: 2605.08017v1
Categories: cs.SE
Published: May 8, 2026

