New Research Reassesses the Value of AGENTS.md Files for AI Coding

Published: March 8, 2026 at 03:52 AM EDT
5 min read

Source: Hacker News

Overview

Despite widespread industry recommendations, a new ETH Zurich paper concludes that AGENTS.md files may often hinder AI coding agents. The researchers recommend:

  • Omitting LLM‑generated context files entirely.
  • Limiting human‑written instructions to non‑inferable details (e.g., highly specific tooling or custom build commands).
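To make the second recommendation concrete, here is a hypothetical AGENTS.md restricted to non-inferable details. The project commands and library names below are invented for illustration; they are not from the paper.

```markdown
# AGENTS.md

## Build & test (custom tooling the agent cannot infer)
- Build with `make generate && make build`, not plain `go build`;
  `make generate` produces required codegen output first.
- Run `make test-unit` for fast feedback; `make test` also runs slow
  integration tests that need a local database.

## Constraints not visible in the code
- Do not upgrade `libfoo` past 2.x; 3.x breaks our vendored patches.
```

Note what is absent: no architectural overview and no repository-structure walkthrough, which the study found did not help agents locate relevant files.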

Authors & Motivation

The team – Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, and Martin Vechev – justified the research by noting:

  • ~60 000 open‑source repositories currently contain context files such as AGENTS.md.
  • Many agent frameworks include built‑in commands to auto‑generate these files.
  • No rigorous empirical investigation had examined whether these files actually improve an AI agent’s ability to resolve real‑world coding tasks.

One author also contributed to the Humanity's Last Exam benchmark.

Dataset: AGENTbench

The researchers built AGENTbench, a novel dataset of 138 real‑world Python tasks sourced from niche repositories. This deliberately avoids the bias of popular benchmarks like SWE‑bench, which AI models may have partially memorized.

Tested Agents

| Agent | Model |
| --- | --- |
| Claude | 3.5 Sonnet |
| Codex | GPT‑5.2 |
| Codex | GPT‑5.1 mini |
| QwenCode | — |

Experimental Scenarios

  1. No context file
  2. LLM‑generated context file
  3. Human‑written context file

All chosen niche repositories originally featured human‑written context files; the first two scenarios were created by removing or replacing those files.

Evaluation Metrics

  • Task success rate (determined by repository unit tests)
  • Number of agent steps
  • Overall inference cost
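The three metrics above amount to a per-scenario aggregation over individual agent runs. A minimal sketch of that aggregation follows; the record fields and scenario names are illustrative assumptions, not the authors' actual harness.

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    """One agent run on one task (hypothetical schema)."""
    task_id: str
    scenario: str    # "none", "llm_generated", or "human_written"
    passed: bool     # did the repository's unit tests pass?
    steps: int       # number of agent steps taken
    cost_usd: float  # inference cost of the run

def summarize(runs):
    """Compute success rate, mean steps, and mean cost per scenario."""
    by_scenario = {}
    for r in runs:
        by_scenario.setdefault(r.scenario, []).append(r)
    summary = {}
    for scenario, rs in by_scenario.items():
        n = len(rs)
        summary[scenario] = {
            "success_rate": sum(r.passed for r in rs) / n,
            "mean_steps": sum(r.steps for r in rs) / n,
            "mean_cost_usd": sum(r.cost_usd for r in rs) / n,
        }
    return summary

# Toy data, not the paper's results.
runs = [
    RunResult("t1", "none", True, 20, 0.10),
    RunResult("t2", "none", False, 25, 0.12),
    RunResult("t1", "human_written", True, 24, 0.12),
    RunResult("t2", "human_written", True, 30, 0.15),
]
print(summarize(runs))
```

Comparing scenarios then reduces to differencing these per-scenario summaries against the no-context baseline, which is how the deltas in the findings table are expressed.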

Key Findings

| Context Type | Success‑Rate Δ vs. No‑Context | Steps & Inference‑Cost ↑ |
| --- | --- | --- |
| LLM‑generated | −3 % (degradation) | +20 % |
| Human‑written | +4 % (marginal gain) | +19 % |
  • Including architectural overviews or repository‑structure explanations in AGENTS.md did not reduce the time the model spent locating relevant files.
  • Agents generally followed the instructions in the AGENTS.md file, leading to more tests, file reads, grep searches, and code‑quality checks—behaviors often unnecessary for the specific task.
  • The extra context forced the reasoning models to “think harder” without yielding better final patches.

Authors’ Conclusions

“We find that all context files consistently increase the number of steps required to complete tasks. LLM‑generated context files have a marginal negative effect on task success rates, while developer‑written ones provide a marginal performance gain.

Our trace analyses show that instructions in context files are generally followed and lead to more testing and broader exploration; however, they do not function as effective repository overviews. Overall, our results suggest that context files have only a marginal effect on agent behavior and are likely only desirable when manually written. This highlights a concrete gap between current agent‑developer recommendations and observed outcomes, and motivates future work on principled ways to automatically generate concise, task‑relevant guidance for coding agents.”

Community Reactions

Developer 1 – Pro‑AGENTS.md

“I read the study. I think it does the opposite of what the authors suggest—it’s actually vouching for good AGENTS.md files.”

“The biggest use case for AGENTS.md files is domain knowledge that the model is not aware of and cannot instantly infer from the project. That is gained slowly over time from seeing the agents struggle due to this deficiency. Exactly the kind of thing very common in closed‑source, yet incredibly rare in public GitHub projects that have an AGENTS.md file—the huge majority of which are recent small vibe‑coded projects centered around LLMs. If 4 % gains are seen on the latter kind of project, which will have a very mixed quality of AGENTS files in the first place, then for bigger projects with high‑quality .md’s they’re invaluable when working with agents.”

Hacker News comments

Developer 2 – AGENTS.md as a Human Tool

“I’ve maintained a CLAUDE.md file for about 3 months now across two projects and the improvement is noticeable but not for the reasons you’d expect. The actual token‑level context it provides matters less than the fact that writing it forces you to articulate things about your codebase that were previously just in your head.”

Reddit comment

Takeaway

  • LLM‑generated AGENTS.md files tend to hurt performance and raise costs.
  • Human‑written files can give a modest success‑rate boost but also increase steps and inference cost.
  • The primary value of AGENTS.md may lie in the act of documentation for developers rather than as a direct aid to AI agents.

Future work should explore principled, concise, task‑relevant guidance that truly benefits coding agents without incurring unnecessary overhead.

As one commenter put it, the real value lies in capturing non‑inferable knowledge such as “we use this weird pattern for X because of a legacy constraint in Y.” Once that’s written down, the agent picks it up, but so does every new human on the team.

Developers can [review the paper online](https://arxiv.org/abs/2602.11988).  
The use of context files, such as `AGENTS.md`, `CLAUDE.md`, or `.cursorrules`, [grew in importance in the second half of 2025](https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation), coinciding with a larger push by AI coding agent providers.

About the Author

Bruno Couriol
