AGENTS.md vs. Skills: How We Refactored OpenClaw to Fix AI Hallucinations

Published: February 2, 2026 at 10:15 AM EST
4 min read
Source: Dev.to

I bet everyone has had this experience.

You ask your AI to use the new Gemini 3.0 Pro model, and it argues with you: “That model is invalid, I will use 1.5 Pro instead.”
Or you are working on a Next.js project, and the AI keeps debating you, insisting on using old getStaticProps syntax when you are clearly using the App Router.

It is exhausting. You enforce rules, you add docs, you install MCP servers, you build custom “Skills”… and it still hallucinates. You feel like you are just piling rule after rule on top of a broken foundation.

I was stuck in this loop for weeks. I built complex “Research Skills” designed to force the AI to be smart, but they just turned into black boxes. I pushed a button, the AI disappeared into a script, and it came back with wrong answers (like telling me a £716 visa cost £70k).

Then, last week, I saw an article that solved everything.
Vercel’s AI team published research that completely flipped my perspective. They found that simply dividing your project knowledge into Indices (in a markdown file) vs. Skills (executable code) changed the game.

I immediately tried it on my OpenClaw agent. I deleted my complex “Black Box” skills and replaced them with a simple AGENTS.md index.

The result? It worked perfectly. The hallucinations stopped. The “syntax debates” ended. Here is why—and how you can do it too.

The Vercel Wake‑Up Call

Vercel’s AI SDK team tested this exact problem on coding agents. They compared two methods for teaching an AI about Next.js 16:

| Method | Description |
| --- | --- |
| Skills (Tools) | Giving the AI a tool to “look up documentation.” |
| Context (AGENTS.md) | Putting the documentation index in a markdown file in the root directory. |

Results

  • Skills: 53% pass rate (the AI often forgot to use the tool or used it incorrectly).
  • Context (AGENTS.md): 100% pass rate.

Why? Because Skills require a decision – the AI has to stop and think, “Should I check the docs?” Often it gets lazy and guesses. Context is passive – the instructions are just there. The AI doesn’t have to choose to be smart; it has no choice but to see the map.

Refactoring OpenClaw: The “Hands vs. Brains” Split

We took this data and immediately refactored our entire agent stack. We realized we were making a fundamental architecture mistake: we were building Skills for things that should have been Context.

The Old Way (Black Box)

  • Task: “Research this.”
  • Mechanism: Call Tool: Research_Skill().
  • Reality: The AI offloads thinking to a hidden script. It stops being an intelligence and becomes a button‑pusher.

The New Way (The Hybrid Stack)

We split our architecture into two distinct layers: Hands and Brains.

1. Brains (AGENTS.md + docs/)

This layer holds knowledge, rules, and logic.

We deleted the Research.ts skill entirely. In its place we added a simple markdown file: docs/research.md.

```markdown
# Research Protocol
1. **Source of Truth:** Always check official docs (.gov, .org) first.
2. **Citation:** You must link every claim.
3. **Limit:** Max 5 searches per topic.
```

In AGENTS.md (the file the AI always sees) we added a single line:

For research tasks, READ docs/research.md first.

2. Hands (skills/)

This layer is for execution only – actions the AI cannot perform with its brain alone.

We kept skills for things the AI physically cannot do:

  • git – running terminal commands
  • whatsapp – sending API requests
  • remindctl – talking to macOS
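As a sketch, a “Hands” skill can be nothing more than a typed wrapper around a shell command, with zero decision logic inside. This TypeScript example is illustrative only; the `Skill` interface and `gitSkill` object are hypothetical names, not OpenClaw’s actual API:

```typescript
// Hypothetical "Hands" skill: a thin wrapper that only executes.
// All reasoning stays in the model; the skill just runs a command.
import { execFileSync } from "node:child_process";

interface Skill {
  name: string;
  description: string;
  run(args: string[]): string;
}

const gitSkill: Skill = {
  name: "git",
  description: "Run a git subcommand (execution only; no decision logic).",
  run(args: string[]): string {
    // Return stdout verbatim so the model sees exactly what the tool saw.
    return execFileSync("git", args, { encoding: "utf8" });
  },
};
```

The point of the design: if you can delete the skill’s body and replace it with a one-line shell call, it belongs in `skills/`; if it contains rules or judgment, it belongs in markdown.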

The Result: Transparency

Now, when I ask: “Research the cost of a UK Global Talent Visa.”

  1. The AI reads AGENTS.md and sees the rule: “Read docs/research.md.”
  2. It reads the protocol: “Check official sources.”

I see it work: it generates the search query site:gov.uk global talent visa fee and returns:

“The application fee is £716. Note: Some consultants charge £70k, but that is a service fee, not the visa cost.”

It worked not because I wrote better code, but because I stopped trying to code the thinking process.
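The “passive context” mechanism is simple to sketch: instead of exposing a lookup tool, the harness prepends AGENTS.md to every prompt, so the model never has to decide whether to consult it. The `buildPrompt` helper below is a hypothetical illustration, not OpenClaw code:

```typescript
// Sketch of the "Brains" layer: AGENTS.md is injected into every prompt.
// Context is passive -- it is simply always in the window.
import { readFileSync, existsSync } from "node:fs";
import { join } from "node:path";

function buildPrompt(userTask: string, root = "."): string {
  const parts: string[] = [];
  const agentsPath = join(root, "AGENTS.md");
  if (existsSync(agentsPath)) {
    parts.push(readFileSync(agentsPath, "utf8"));
  }
  parts.push(`Task: ${userTask}`);
  return parts.join("\n\n");
}
```

Because the file rides along on every turn, the instruction “READ docs/research.md first” reaches the model before it can start guessing.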

The Guide: When to Use What?

| Requirement | Use This | Why? |
| --- | --- | --- |
| “I need you to know X” | AGENTS.md | Knowledge should be passive. Don’t make the AI “search” for your coding style. |
| “I need you to follow process Y” | docs/Y.md | Rules belong in markdown. They are easier to edit and easier for the AI to read. |
| “I need you to touch Z” | Skill | If it needs an API key or a CLI command, wrap it in a tool. |

Start Small: The “Agile” Agent

Don’t over‑engineer. Start with a single AGENTS.md file in your root:

  • Add your project structure.
  • Add your preferred tech stack.
  • Add a link to your docs.
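As a starting point, a minimal AGENTS.md covering those three bullets might look like this (the project details below are placeholders, not a prescribed format):

```markdown
# AGENTS.md

## Project structure
- src/: application code
- skills/: executable tools ("Hands")
- docs/: protocols and rules ("Brains")

## Tech stack
- Next.js (App Router; do not use getStaticProps)
- TypeScript

## Protocols
For research tasks, READ docs/research.md first.
```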

Watch your agent’s IQ double overnight. The best tool you can give your AI isn’t a Python script; it’s a good README.
