[Paper] Configuring Agentic AI Coding Tools: An Exploratory Study
Source: arXiv - 2602.14690v1
Overview
The paper Configuring Agentic AI Coding Tools: An Exploratory Study examines how developers actually set up the newest generation of “agentic” AI assistants—tools that can run autonomously, fetch data, and even invoke sub‑agents to complete coding tasks. By mining thousands of open‑source repositories, the authors map out the real‑world configuration practices that make these agents work, shedding light on emerging standards and the gaps that still need to be filled.
Key Contributions
- Taxonomy of configuration mechanisms – identifies eight distinct ways developers can steer agentic coding tools (e.g., Context Files, Skills, Subagents).
- Large‑scale empirical snapshot – analyzes 2,926 GitHub repos that use Claude Code, GitHub Copilot, Cursor, Gemini, or Codex, quantifying adoption rates for each mechanism.
- Discovery of an emerging interoperable standard – the `AGENTS.md` file surfaces as a de facto cross-tool format for declaring context.
- Insight into "configuration cultures" – shows how different tool ecosystems favor different mechanisms (Claude Code users employ the widest variety).
- Baseline for future longitudinal and experimental work – provides the first systematic measurement of configuration practices, against which later studies can test how configuration choices affect agent performance.
Methodology
- Tool selection – The study focuses on the five most popular agentic coding assistants that expose repository‑level configuration (Claude Code, GitHub Copilot, Cursor, Gemini, Codex).
- Data collection – Using the GitHub REST API, the authors harvested all public repositories that contain any of the known configuration artifacts (e.g., `*.json` and `*.md` files named according to each tool's spec). This yielded 2,926 distinct projects.
- Classification – Each repository was manually labeled for the presence of the eight configuration mechanisms, with special attention to the three cross-tool mechanisms (Context Files, Skills, Subagents).
- Quantitative analysis – Frequency counts, co‑occurrence matrices, and per‑tool breakdowns were generated to spot trends.
- Qualitative inspection – A sample of “Skills” and “Subagents” files was examined to understand whether they contain static prompts or executable workflows.
The approach balances breadth (thousands of repos) with depth (manual inspection of a representative subset), making the findings reliable for both researchers and practitioners.
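The classification step can be sketched as a simple filename-matching pass over each repository's file tree. A minimal illustration follows; the filename patterns are assumptions based on each tool's commonly documented conventions, not a reproduction of the authors' actual labeling rules.

```python
# Illustrative sketch: map files found in a repository to configuration
# mechanisms. The patterns below are assumptions based on well-known
# tool conventions, not the paper's exact labeling scheme.
ARTIFACT_PATTERNS = {
    "AGENTS.md": ("cross-tool", "Context File"),
    "CLAUDE.md": ("Claude Code", "Context File"),
    "GEMINI.md": ("Gemini", "Context File"),
    ".cursorrules": ("Cursor", "Context File"),
    ".github/copilot-instructions.md": ("GitHub Copilot", "Context File"),
    ".claude/skills/": ("Claude Code", "Skill"),      # directory prefix
    ".claude/agents/": ("Claude Code", "Subagent"),   # directory prefix
}

def classify_repo(paths):
    """Return the set of (tool, mechanism) pairs present in a repo,
    given the list of file paths it contains."""
    found = set()
    for path in paths:
        for pattern, label in ARTIFACT_PATTERNS.items():
            if pattern.endswith("/"):
                # Directory-based mechanisms: match any file under it.
                if path.startswith(pattern):
                    found.add(label)
            elif path == pattern or path.endswith("/" + pattern):
                found.add(label)
    return found

repo = ["README.md", "AGENTS.md", ".claude/skills/test-runner/SKILL.md"]
print(sorted(classify_repo(repo)))
```

A real pipeline would need to handle per-tool spec details (nested context files, glob-scoped rules), but the core idea is this kind of pattern lookup followed by manual review of ambiguous cases.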
Results & Findings
| Finding | What the data show |
|---|---|
| Context Files dominate | Over 70 % of all repositories include at least one Context File; in many cases it is the only configuration artifact. |
| `AGENTS.md` as a lingua franca | This Markdown-based file appears in 42% of the sampled repos and is accepted by all five tools, hinting at an emerging standard. |
| Shallow adoption of advanced mechanisms | Only ~15 % of repos define a Skill, and ~8 % define a Subagent. When present, they usually contain a single static instruction rather than a multi‑step workflow. |
| Tool‑specific cultures | Claude Code users employ the full spectrum of mechanisms (average 3.2 per repo), while Copilot and Gemini users stick mostly to Context Files. |
| Artifact sparsity | The majority of repos (≈60 %) define just one configuration file; multi‑artifact setups are rare. |
These patterns suggest that developers are still in the early adoption phase: they rely heavily on simple context provisioning and have yet to explore the richer, programmable capabilities that agentic tools promise.
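The adoption rates and co-occurrence counts behind a table like the one above can be computed with a few lines of code. The sketch below uses a tiny made-up sample of per-repo mechanism labels, not the paper's data:

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-repo mechanism labels (illustrative, not the
# paper's dataset): each set lists the mechanisms one repo defines.
repos = [
    {"Context File"},
    {"Context File", "Skill"},
    {"Context File", "Skill", "Subagent"},
    {"Context File"},
]

# Adoption rate: fraction of repos that define each mechanism.
n = len(repos)
adoption = {m: sum(m in r for r in repos) / n
            for m in ("Context File", "Skill", "Subagent")}

# Co-occurrence: how often each pair of mechanisms appears together.
cooccur = Counter()
for r in repos:
    for pair in combinations(sorted(r), 2):
        cooccur[pair] += 1

print(adoption)
print(cooccur.most_common())
```

In this toy sample, Context Files appear in every repo while Subagents appear in one, mirroring the shape (though not the numbers) of the reported findings.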
Practical Implications
- Standardize on `AGENTS.md` – Teams looking to future-proof their codebases can adopt this Markdown format now; it works across the major agents and reduces tool lock-in.
- Start simple, iterate – Since most projects succeed with a single Context File, developers can get immediate value by curating relevant files, dependencies, and environment hints without writing complex Skill scripts.
- Invest in reusable Skills – The low adoption of executable Skills points to an opportunity: libraries of ready‑to‑run workflows (e.g., “run tests → refactor → commit”) could dramatically boost productivity once they become more mature.
- Tool selection matters – If a team wants to experiment with sophisticated orchestration (multiple Subagents, dynamic pipelines), Claude Code currently offers the richest ecosystem. Conversely, for lightweight assistance, Copilot or Gemini may be sufficient.
- Monitoring agent performance – The study provides a baseline; developers can now track how adding a new configuration artifact (e.g., a Skill) changes metrics like code suggestion relevance, build success rate, or developer cycle time.
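As a concrete illustration of the first recommendation, a minimal `AGENTS.md` might look like the following. The format is freeform Markdown that agents read as instructions; the sections and commands below describe a hypothetical project and are purely illustrative, not a prescribed schema.

```markdown
# AGENTS.md

## Project overview
A TypeScript web service; source lives in `src/`, tests in `tests/`.

## Setup
- Install dependencies with `npm install`.

## Conventions
- Run `npm test` before committing; all tests must pass.
- Follow the existing ESLint configuration; do not reformat unrelated files.
```

Because the file is plain Markdown at the repository root, any of the five tools studied can consume it without tool-specific tooling.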
Limitations & Future Work
- Snapshot in time – The analysis captures a static view of repositories; configuration practices may evolve rapidly as tools release new features.
- Public‑repo bias – Private or enterprise codebases, which might use more advanced configurations, are not represented.
- Performance correlation missing – The study does not directly measure how different configurations affect the quality or speed of the agents’ output; future experiments should link configuration choices to concrete productivity metrics.
- Tool coverage – Only five agents were examined; emerging platforms (e.g., Anthropic’s Claude 3, Meta’s Llama‑Code) could introduce new mechanisms.
By addressing these gaps, subsequent research can turn the current descriptive baseline into actionable guidelines for building high‑performing, agent‑driven development pipelines.
Authors
- Matthias Galster
- Seyedmoein Mohsenimofidi
- Jai Lal Lulla
- Muhammad Auwal Abubakar
- Christoph Treude
- Sebastian Baltes
Paper Information
- arXiv ID: 2602.14690v1
- Categories: cs.SE
- Published: February 16, 2026