[Paper] On the Adoption of AI Coding Agents in Open-source Android and iOS Development
Source: arXiv - 2602.12144v1
Overview
The paper presents the first large‑scale empirical look at how AI‑powered coding assistants (e.g., GitHub Copilot, Code Llama, Claude) are being used in real‑world open‑source Android and iOS projects. By mining 2,901 AI‑authored pull requests (PRs) from 193 repositories, the authors reveal platform‑specific adoption patterns, acceptance rates, and the kinds of tasks where AI contributions succeed—or stumble.
Key Contributions
- Dataset creation – Curated the AIDev dataset, a verified collection of AI‑generated PRs for Android (1,721 PRs) and iOS (1,180 PRs) open‑source apps.
- Cross‑platform comparison – Showed that Android projects receive about 1.5× as many AI PRs as iOS (1,721 vs. 1,180) and enjoy a higher acceptance rate (71 % vs. 63 %).
- Agent‑level analysis – Showed significant variance among different coding agents on Android, highlighting that not all assistants perform equally.
- Task‑category breakdown – Identified that routine tasks (feature additions, bug fixes, UI tweaks) are most likely to be merged, while structural changes (refactors, build‑system edits) face lower acceptance and longer review cycles.
- Temporal evolution – Tracked PR resolution times from 2023 to 2025, finding that Android resolution times improved through mid‑2025 before regressing slightly.
- Baseline for future research – Provides the first quantitative benchmarks for evaluating AI‑generated contributions in mobile OSS, paving the way for platform‑aware agent design.
Methodology
- Data collection – Queried GitHub’s REST API for PRs that explicitly credit an AI tool in the description or commit metadata.
- Verification – Applied a two‑step manual vetting process to ensure the PRs were truly AI‑authored (e.g., checking for generated code snippets, tool‑specific tags).
- Categorization – Mapped each PR to a task category (feature, bug‑fix, UI, refactor, build, docs, etc.) using a combination of keyword heuristics and manual labeling.
- Statistical analysis – Compared acceptance rates, time‑to‑merge, and reviewer comments across platforms, agents, and categories using chi‑square tests and survival analysis for resolution time trends.
- Temporal slicing – Split the data into quarterly windows to observe how AI contribution dynamics evolve over time.
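The collection and verification steps above can be sketched as a simple attribution filter. The marker strings and the `body`/`commit_messages` field names below are hypothetical illustrations; the paper's actual vetting was a two‑step manual process, not a regex.

```python
import re

# Hypothetical markers an AI agent might leave behind; the study's
# real vetting combined metadata checks with manual inspection.
AI_MARKERS = [
    r"github copilot",
    r"code llama",
    r"claude",
    r"co-authored-by:.*\[bot\]",
]
MARKER_RE = re.compile("|".join(AI_MARKERS), re.IGNORECASE)

def looks_ai_authored(pr: dict) -> bool:
    """Return True if the PR description or any commit message credits an AI tool."""
    texts = [pr.get("body", "")] + pr.get("commit_messages", [])
    return any(MARKER_RE.search(t) for t in texts)

prs = [
    {"body": "Generated with GitHub Copilot", "commit_messages": []},
    {"body": "Manual fix for crash on rotation", "commit_messages": []},
]
flagged = [pr for pr in prs if looks_ai_authored(pr)]
```

A filter like this only produces candidates; the study's acceptance numbers rest on the subsequent manual verification pass.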
The pipeline stays lightweight enough for practitioners to replicate while still supporting rigorous, reproducible conclusions.
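The keyword‑heuristic half of the categorization step might look like the sketch below. The keyword lists are illustrative guesses, not the paper's actual rules, and the authors paired heuristics like these with manual labeling.

```python
import re

# Illustrative keyword heuristics; the study's real mapping also
# involved manual labeling of each PR.
CATEGORY_KEYWORDS = {
    "bug-fix": ["fix", "crash", "bug", "npe"],
    "feature": ["add", "implement", "support"],
    "ui": ["ui", "layout", "theme", "dark mode"],
    "refactor": ["refactor", "cleanup", "rename"],
    "build": ["gradle", "ci", "dependency", "build"],
    "docs": ["readme", "docs", "comment"],
}

def categorize(title: str) -> str:
    """Map a PR title to the first matching task category, else 'other'."""
    lowered = title.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        # Word-boundary matching avoids substring traps
        # (e.g. "ui" inside "build").
        if any(re.search(rf"\b{re.escape(kw)}\b", lowered) for kw in keywords):
            return category
    return "other"
```

Dictionary order doubles as match priority here, so bug‑fix keywords win over feature keywords when a title mentions both.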
Results & Findings
| Dimension | Android | iOS |
|---|---|---|
| AI PR volume | 1,721 (≈ 60 % of total) | 1,180 (≈ 40 %) |
| Acceptance rate | 71 % merged | 63 % merged |
| Top‑performing agents | Agent A (78 % merge), Agent B (73 %) | Agent C (68 % merge) – less variance |
| Highest‑acceptance task categories | Feature, Bug‑Fix, UI (≈ 75‑80 % merge) | Same trend, slightly lower (≈ 70‑75 % merge) |
| Lowest‑acceptance task categories | Refactor, Build (≈ 55‑60 % merge) | Refactor, Build (≈ 50‑55 % merge) |
| Resolution time trend | Median time dropped from 5 days (2023 Q1) to 2 days (mid‑2025) then rose to 3 days (late‑2025) | Steady around 4‑5 days, minor fluctuations |
What it means:
- Developers on Android are more willing to accept AI‑generated changes, possibly due to a larger ecosystem of tooling and community norms.
- Routine, well‑scoped changes are where AI agents shine; deeper architectural edits still need human oversight.
- The “sweet spot” for AI contribution speed peaked in mid‑2025, suggesting that recent model improvements translated into faster review cycles—until a saturation or quality dip set in.
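The platform gap in acceptance can be sanity‑checked with a 2×2 chi‑square test. The merged counts below are reconstructed by rounding the reported percentages against the PR totals, so they approximate, rather than reproduce, the paper's raw data.

```python
# Reconstructed 2x2 contingency table (rounded from reported rates):
#             merged   not merged
# Android      1222        499      (~71% of 1,721)
# iOS           743        437      (~63% of 1,180)
observed = [[1222, 499], [743, 437]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square statistic: sum of (O - E)^2 / E over all four cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# With 1 degree of freedom, the 5% critical value is about 3.84.
significant = chi2 > 3.84
```

On these reconstructed counts the statistic lands around 20.7, comfortably past the 3.84 threshold, consistent with the paper treating the platform difference as significant.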
Practical Implications
- Tool selection: Teams can prioritize agents that have demonstrated higher acceptance on Android (e.g., Agent A) when targeting that platform, while being more cautious on iOS.
- Workflow design: Encourage developers to use AI for incremental features, UI tweaks, and bug fixes, but route refactors and build‑system changes through a stricter review gate or a human‑first approach.
- CI/CD integration: Since AI PRs resolve faster on Android, CI pipelines can be tuned to auto‑merge low‑risk AI contributions after a brief automated verification step, accelerating release cycles.
- Community guidelines: Open‑source maintainers might adopt policies that require explicit AI attribution and a short human sanity‑check checklist, improving reviewer trust and acceptance rates.
- Product road‑mapping: Companies building AI coding assistants can use these baselines to benchmark their models, focusing on improving structural change suggestions to close the acceptance gap.
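The workflow and CI/CD suggestions above amount to a routing policy over task categories. A minimal sketch, with illustrative category names and thresholds that are policy choices rather than values from the paper:

```python
from dataclasses import dataclass

# Categories the study found merge most reliably vs. least reliably;
# the routing decisions below are illustrative, not from the paper.
LOW_RISK_CATEGORIES = {"feature", "bug-fix", "ui"}
STRICT_REVIEW_CATEGORIES = {"refactor", "build"}

@dataclass
class AIPullRequest:
    platform: str        # "android" or "ios"
    category: str        # task-category label
    checks_passed: bool  # automated verification (tests, lint) succeeded

def review_route(pr: AIPullRequest) -> str:
    """Decide how an AI-authored PR enters the review pipeline."""
    if not pr.checks_passed:
        return "reject"
    if pr.category in STRICT_REVIEW_CATEGORIES:
        return "human-first review"
    if pr.category in LOW_RISK_CATEGORIES and pr.platform == "android":
        return "auto-merge candidate"
    return "standard review"
```

Restricting the auto‑merge lane to Android mirrors the study's finding that Android AI PRs both merge more often and resolve faster; an iOS team could widen the lane once its own acceptance data supports it.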
Limitations & Future Work
- Dataset bias: The study only covers public GitHub repositories that voluntarily disclose AI usage, potentially missing private or undisclosed AI contributions.
- Agent granularity: Some PRs list multiple agents or generic “AI assistant,” making it hard to attribute performance to a single model.
- Temporal horizon: The analysis stops at late‑2025; rapid model releases after that point could shift trends dramatically.
- Human factors: The paper does not deeply explore reviewer expertise or project maturity, which could mediate acceptance decisions.
Future research could expand to other mobile ecosystems (e.g., Flutter, React Native), incorporate sentiment analysis of reviewer comments, and experiment with hybrid human‑AI review pipelines to quantify productivity gains.
Authors
- Muhammad Ahmad Khan
- Hasnain Ali
- Muneeb Rana
- Muhammad Saqib Ilyas
- Abdul Ali Bangash
Paper Information
- arXiv ID: 2602.12144v1
- Categories: cs.SE, cs.AI
- Published: February 12, 2026