You're probably using Agent Skills wrong
Source: Dev.to
Background
The Claude Code ecosystem is rapidly evolving, and its naming conventions can be confusing. Among the many components, Agent Skills are often the most misused. A recent paper that surfaced on Hacker News highlights this issue:
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Agent Skills are structured packages of procedural knowledge that augment LLM agents at inference time. Despite rapid adoption, there is no standard way to measure whether they actually help. The benchmark evaluates 86 tasks across 11 domains with curated and self‑generated Skills. Curated Skills raise the average pass rate by 16.2 pp, but effects vary widely (e.g., +4.5 pp for Software Engineering, +51.9 pp for Healthcare). Self‑generated Skills provide no benefit on average.
— Xiangyi Li et al., arXiv [link]
The Hacker News headline (“Study: Self‑generated Agent Skills are useless”) is a bit editorialized, but the core finding resonates: many practitioners ask agents to write a skill before solving a task, which often amounts to a re‑implementation of “thinking blocks” with poorer results.
The Core Misstep: Self‑Generated Skills
The benchmark defines Self‑Generated Skills as:
“No Skills provided, but the agent is prompted to generate relevant procedural knowledge before solving the task. This isolates the impact of LLMs’ latent domain knowledge.”
In practice, this means taking a problem the model struggles with, asking it to write a skill for that problem, and then letting it attempt the solution. This approach:
- Reinvents the “thinking block” pattern but adds unnecessary overhead.
- Frequently yields negative deltas—the skill actually harms performance.
- Mirrors the classic mistake of asking a model to answer a question verbatim and then presenting that answer as original work.
To create a genuinely useful skill, the agent must first recognize a gap in its own knowledge or capabilities. Only then can it produce a skill that adds value beyond its latent knowledge.
What Exactly Is a Skill?
At its core, a Skill is a markdown file with optional metadata that tells agents and tools when to invoke it. Skills are typically organized in their own folder, allowing them to bundle auxiliary scripts, reference documents, or other assets.
```
.claude/skills/
└── monitor-gitlab-ci/
    ├── SKILL.md          # Main skill description
    ├── monitor_ci.sh     # Helper script
    └── references/
        ├── api_commands.md
        ├── log_analysis.md
        └── troubleshooting.md
```
In the example above, the skill enables older Claude versions to monitor a GitLab CI pipeline. The folder contains:
- SKILL.md – Human‑readable instructions and metadata.
- monitor_ci.sh – A concrete command‑line tool that the agent can invoke.
- references/ – Supplemental documentation for edge cases.
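To make the structure concrete, here is a minimal sketch of what the SKILL.md for this example might contain. The `name` and `description` frontmatter fields are the standard discovery metadata; the body text and referenced files are illustrative, not the actual skill:

```markdown
---
name: monitor-gitlab-ci
description: Use when asked to watch or debug a GitLab CI pipeline. Polls pipeline status, fetches failed-job logs, and summarizes errors.
---

# Monitor GitLab CI

1. Run `monitor_ci.sh <pipeline-id>` to poll the pipeline until it finishes.
2. On failure, consult `references/log_analysis.md` to classify the error.
3. For raw API calls, see `references/api_commands.md`.
```

The `description` field is what lets an agent decide *when* to invoke the skill; a skill without it is just a loose script.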
Proper Usage Patterns
1. Identify Real Gaps
Before asking an agent to generate a skill, confirm it cannot solve the task with its base knowledge. Typical signs include:
- Repeated failures or "hallucinations."
- Tasks that require domain-specific commands or APIs the model has never seen.
2. Capture the Gap as a Skill
When the agent finally overcomes the obstacle (often after human intervention), ask it:
“What piece of knowledge or procedure was missing that prevented you from succeeding earlier?”
Document that insight as a Skill, including any scripts or reference files needed for future runs.
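A minimal sketch of what "document that insight" can look like in practice, assuming a hypothetical gap around flaky CI migrations (the skill name, paths, and description are all invented for illustration):

```shell
# Hypothetical example: persist a freshly identified knowledge gap as a skill.
# The skill name and content below are illustrative, not a real recipe.
SKILL_DIR=".claude/skills/retry-flaky-migrations"
mkdir -p "$SKILL_DIR"

cat > "$SKILL_DIR/SKILL.md" <<'EOF'
---
name: retry-flaky-migrations
description: Use when CI database migrations fail intermittently with lock timeouts. Retry once with backoff before reporting failure.
---

# Retry flaky migrations

1. Re-run the failed migration job once before investigating.
2. If it fails twice, check for long-running transactions holding locks.
3. Only then escalate to a human.
EOF
```

Once committed, the skill survives the current conversation and is discoverable in future sessions.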
3. Keep Skills Focused
Empirical results show that small, focused skills (2–3 modules) outperform large, monolithic documentation bundles. Aim for:
- One clear purpose per skill.
- Minimal, well‑named auxiliary files.
4. Store Skills Persistently
Because agents are stateless—each conversation starts fresh—persisting skills in a repository (e.g., the .claude/skills/ directory) ensures they are available across sessions.
5. Reuse Across Projects
When a skill proves useful in one project, consider abstracting it for broader applicability. This reduces duplication and speeds up onboarding for new agents.
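One way to do this abstraction step, sketched under the assumption that reusable skills live in a dedicated shared directory (or git repo) — the paths and the stand-in skill file are purely illustrative:

```shell
# Hypothetical sketch: promote a project-local skill into a shared skills
# directory that several repositories can reference. All paths are assumptions.
SKILL="monitor-gitlab-ci"
SRC=".claude/skills/$SKILL"
SHARED="$HOME/shared-skills"   # e.g. a dedicated git repo of reusable skills

mkdir -p "$SRC" "$SHARED"      # ensure both locations exist for this demo
touch "$SRC/SKILL.md"          # stand-in for the real skill file
cp -r "$SRC" "$SHARED/"        # copy it over; a symlink also works locally
```

Keeping the shared copy under version control lets new projects pull in proven skills instead of regenerating them from scratch.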
Common Pitfalls to Avoid
| Pitfall | Why It Fails |
|---|---|
| Prompting the agent to write a skill before it knows the problem | The skill becomes a generic “thinking block” that adds no new information. |
| Using overly broad skills | Dilutes the benefit; agents may ignore or misuse the skill. |
| Treating skills as one‑off scripts | Without metadata, agents cannot discover when to apply them. |
| Relying solely on self‑generated skills | Benchmarks show no average improvement; curated or human‑validated skills are far more effective. |
Takeaway
- Curated, focused Skills can dramatically improve agent performance (up to +51.9 pp in some domains).
- Self‑generated Skills that are created on the fly without a clear knowledge gap typically do not help and can even hurt.
- The key to effective skill creation is recognizing a genuine deficiency, documenting it concisely, and persisting it for future reuse.
Happy hacking!