Building Agent Skills from Scratch
Agent Skills – How They Work & How to Integrate Them
View the complete implementation on GitHub
What Are Agent Skills?
Agent skills solve a simple problem: system prompts become bloated when you try to make an agent good at everything.
Instead of stuffing everything into one massive prompt:
You're an expert at code review, git, file organization, API testing...
[2000 lines of instructions]
you define discrete, reusable skills:
You have access to these skills:
- code-review: Reviews code for bugs and security
- git-helper: Git workflows and troubleshooting
- file-organizer: Organizes files intelligently
- api-tester: Tests REST APIs
Load them when needed.
Each skill is a self‑contained markdown file that the agent can load on demand.
The Core Idea
A skill is a markdown file stored in a directory. It consists of:
- YAML front‑matter – contains `name`, `description`, and optional metadata. (See this guide on front‑matter if you're new to it.)
- Markdown body – detailed instructions that act as a temporary system prompt.
When the agent needs expertise, it loads the relevant skill:
User: "Review this code for SQL injection"
↓
Agent: "I need the code‑review skill"
↓
System: [Loads SKILL.md with security guidelines]
↓
Agent: [Follows those guidelines]
Skills are therefore structured, modular prompts that are discoverable and loaded only when required.
How It Actually Works
1️⃣ Discovery
Scan a directory for SKILL.md files and parse only their front‑matter.
This gives you a list of available skills without loading the full content, keeping memory usage low.
skills/
├── code-review/
│ └── SKILL.md # name: code-review, description: …
├── git-helper/
│ └── SKILL.md
Implementation details can be found in the repository’s discovery module.
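As a rough illustration (not the repo's exact code), discovery can be as small as a glob plus a front‑matter parse. This sketch assumes PyYAML is installed; `SkillMeta` and `discover_skills` are illustrative names:

```python
from dataclasses import dataclass
from pathlib import Path

import yaml  # PyYAML, assumed to be installed


@dataclass
class SkillMeta:
    name: str
    description: str
    path: Path  # the full SKILL.md stays on disk until activation


def discover_skills(root: str = "skills") -> list[SkillMeta]:
    """Scan for SKILL.md files and parse only their YAML front-matter."""
    skills = []
    for skill_file in sorted(Path(root).glob("*/SKILL.md")):
        text = skill_file.read_text(encoding="utf-8")
        if not text.startswith("---"):
            continue  # no front-matter: skip the file
        # the front-matter sits between the first two '---' delimiters
        _, front_matter, _body = text.split("---", 2)
        meta = yaml.safe_load(front_matter) or {}
        skills.append(SkillMeta(
            name=meta.get("name", skill_file.parent.name),
            description=meta.get("description", ""),
            path=skill_file,
        ))
    return skills
```

Only the metadata leaves disk here; the markdown body is read later, at activation time.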
2️⃣ Tool Registration
Convert each skill into an OpenAI function tool. The LLM sees these as callable functions:
{
"name": "activate_skill_code_review",
"description": "Reviews code for bugs, security, best practices"
},
{
"name": "activate_skill_git_helper",
"description": "Git workflows and troubleshooting"
}
The description is crucial—it guides the LLM in deciding which skill to activate. Be as specific and clear as possible.
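For illustration, here is one way to turn that metadata into the Chat Completions tool format. The `SkillMeta` type and `discover_skills()` come from the discovery sketch above; the repository's actual code may differ:

```python
def to_tool(skill: SkillMeta) -> dict:
    """Expose one skill as an OpenAI function tool (metadata only)."""
    return {
        "type": "function",
        "function": {
            # e.g. "code-review" -> "activate_skill_code_review"
            "name": f"activate_skill_{skill.name.replace('-', '_')}",
            "description": skill.description,
            # activating a skill takes no arguments
            "parameters": {"type": "object", "properties": {}},
        },
    }


tools = [to_tool(skill) for skill in discover_skills()]
```

Since activation takes no arguments, `parameters` is an empty object; the description carries all the signal the model uses to pick a skill.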
3️⃣ Activation
When the LLM calls a skill function:
- Load the full `SKILL.md` content from disk.
- Return it to the model as a tool result (i.e., a temporary system prompt).
- Let the LLM continue, now guided by the skill’s instructions.
This lazy‑loading approach means that if you have 20 skills but only use 2, only those 2 are ever read into memory.
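A rough activation handler, again built on the helpers sketched above rather than the repository's exact implementation, could look like this. It also covers the missing‑skill case discussed under error handling later:

```python
def activate_skill(tool_name: str, skills: list[SkillMeta]) -> str:
    """Load the full SKILL.md body for the requested skill, lazily."""
    requested = tool_name.removeprefix("activate_skill_").replace("_", "-")
    for skill in skills:
        if skill.name == requested:
            # read from disk only now, when the LLM actually asks for it
            return skill.path.read_text(encoding="utf-8")
    available = ", ".join(skill.name for skill in skills)
    return f"Skill '{requested}' not found. Available: {available}"
```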
4️⃣ Execution
The LLM reads the skill instructions and follows them as if they were part of the system prompt for that specific turn. After the task is complete, the skill’s instructions naturally fade from context unless you explicitly retain them for multi‑turn interactions.
What a Skill Looks Like
---
name: code-review
description: Reviews code for bugs, security, and best practices
version: 1.0.0
---
# Code Review Skill
You are an expert code reviewer.
* Identify potential bugs and security vulnerabilities.
* Suggest improvements for readability and performance.
* Provide concrete code snippets for fixes.
The body continues with the skill's review checklist and expected response format:

## Checklist

### Security
- SQL injection in queries
- XSS in user inputs
- Authentication bypasses

### Quality
- Readability
- Maintainability
- DRY violations

### Performance
- N+1 queries
- Memory leaks
- Inefficient algorithms

## Response Format

- Summary: Brief assessment
- Critical Issues: Security problems (if any)
- Improvements: Suggestions for better code
- Positives: What works well

Each skill follows this same structure: front‑matter + detailed markdown body.

Quick Recap

| Step | What Happens |
|---|---|
| Discovery | Scan directory, read only front‑matter. |
| Tool Registration | Register each skill as an OpenAI function. |
| Activation | LLM calls a function → load full markdown → send as tool result. |
| Execution | LLM follows the skill's instructions for the current task. |

With this pattern you keep your system prompts lean, make expertise modular, and let the agent dynamically pull in the exact knowledge it needs.
Why This Pattern Works
1. Context Efficiency
Instead of loading 10 KB of instructions upfront, you load 100 bytes of metadata. Full instructions only come in when needed. This matters when you’re paying per token.
2. Modularity
Each skill is independent. Add a new one by dropping in a SKILL.md file—no code changes needed. Want to remove a skill? Delete the directory.
3. Clarity
When debugging, you can see exactly which skill was activated and what instructions it provided. This makes troubleshooting much easier than a monolithic prompt.
4. Reusability
Share skills across projects. Someone else’s api-tester skill works in your agent with zero modification. Skills become a shared library of expertise.
Key Design Decisions
Lazy Loading
Don’t load all skills into memory at startup—this defeats the purpose because you’re back to loading everything upfront.
Do load on demand. Parse front‑matter during discovery, but keep the full content on disk until the LLM actually requests it.
Function Naming
Prefix skill functions clearly, e.g., activate_skill_code_review. This makes it obvious in logs what’s happening. When you see activate_skill_* in your logs, you know a skill was activated.
Conversation Flow
The exact sequence matters. Here’s what happens:
1. User sends a message.
2. LLM responds with `tool_calls` (requesting a skill).
3. Critical: add an assistant message with the `tool_calls` to the conversation.
4. Add a tool message with the skill content.
5. LLM continues with the skill instructions.
6. Final response.
If you skip step 3, OpenAI will reject your request. The tool_calls must be properly formatted with a type field and a nested function object. This is a common gotcha. (See OpenAI’s tools documentation for details.)
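As a sketch of steps 3 and 4 with the OpenAI Python SDK, where `response` is the assistant message returned by the API and `skill_markdown` is the loaded SKILL.md content (both placeholder names):

```python
tool_call = response.tool_calls[0]

# Step 3: append the assistant turn that contains the tool_calls
messages.append({
    "role": "assistant",
    "content": response.content,  # may be None
    "tool_calls": [{
        "id": tool_call.id,
        "type": "function",  # the required 'type' field
        "function": {
            "name": tool_call.function.name,
            "arguments": tool_call.function.arguments,
        },
    }],
})

# Step 4: append the tool result carrying the skill instructions
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": skill_markdown,
})
```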
Looping for Multiple Tool Calls
Skills can chain. A skill might activate code execution, which might need another skill. Your agent should loop until there are no more tool calls:
while True:
    response = llm.chat(messages=messages, tools=tools)  # always pass tools
    if not response.get("tool_calls"):
        break  # no more tool calls: this is the final answer
    handle_tool_calls(response)  # activate the skill, append results to messages
Always pass tools in every call, even after skill activation. Otherwise, skills can’t use other tools like code execution. (See full implementation for the complete loop logic.)
Practical Considerations
Skill Scope
One skill = one domain. Keep them focused.
Good examples: code-review, git-helper, api-tester
Bad example: developer-tools (too broad)
Skill Structure
Use clear sections with examples:
- What the skill does
- How to approach tasks
- Expected output format
- Examples of good results
A wall of text doesn’t work. Structure helps the LLM follow instructions.
Error Handling
What if a skill doesn’t exist? Return a helpful error, e.g.:
"Skill 'xyz' not found. Available: code-review, git-helper"
Common Mistakes & Troubleshooting
Loading Everything Upfront
Problem: Some implementations load all skills at startup, wasting memory and context tokens.
Fix: Load only metadata during discovery. Activate skills on demand.
Vague Skill Descriptions
The LLM uses skill descriptions to decide which to activate. Be specific.
- ❌ “Helps with code”
- ✅ “Reviews Python/JavaScript code for security vulnerabilities, PEP 8 compliance, and performance issues”
Include what the skill does, the task types it handles, and key capabilities.
Wrong Tool Calls Format
Error: Missing required parameter: messages[1].tool_calls[0].type
Cause: OpenAI requires a specific nested structure. tool_calls must have a type field and nest the function details under a function key.
Fix: Use the correct format with type: "function" and a nested function object. See the OpenAI tools documentation.
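Roughly, the difference looks like this (illustrative dicts, not copied from the repo):

```python
# Rejected by the API: no 'type' field and no nested 'function' object
bad = {
    "role": "assistant",
    "tool_calls": [{"id": "call_1", "name": "activate_skill_code_review"}],
}

# Accepted: 'type' is "function" and the details sit under a 'function' key
good = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "activate_skill_code_review", "arguments": "{}"},
    }],
}
```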
Forgetting to Include Tools After Skill Activation
Problem: After activating a skill, the LLM can’t use other tools like code execution.
Fix: Always pass tools in every LLM call. Don’t remove tools after skill activation because skills might need them.
No Structure in Skills
A wall of text doesn’t work. Use clear headings, bullet points, code examples, and expected output formats. The LLM follows structured instructions much better than prose.
When Skills Make Sense
Good fit:
- Multi‑domain agents that handle code, git, and DevOps
- Agents with specialized workflows
- Teams sharing common patterns
- Situations where you hit context limits
Not needed:
- Single‑purpose agents
- Agents with small, focused prompts
- Prototypes and experiments
Don’t over‑engineer. If your system prompt is small and manageable, you probably don’t need skills.
The Standard
AgentSkills.io defines the open format:
- `SKILL.md` naming convention
- YAML front‑matter schema
- Directory structure
- Best practices
Following the standard means your skills work with other implementations. Skills become portable across projects and teams.
Building Your First Skill
1. Create the directory: `mkdir -p skills/my-first-skill`
2. Create `SKILL.md` with YAML front‑matter and markdown instructions.
3. Integrate `SkillsManager` into your agent – see the GitHub repo for the full code, or the sketch below.
4. Test it by asking your agent to use the skill and verifying that it activates.
That's it. No code changes are needed to add new skills; just drop in a `SKILL.md` file.
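For step 3, a hypothetical minimal wiring, reusing the discovery, tool-registration, and activation sketches from earlier rather than the repo's actual `SkillsManager` API, could look like:

```python
from openai import OpenAI  # official openai package, assumed installed

client = OpenAI()
skills = discover_skills("skills")
tools = [to_tool(skill) for skill in skills]
messages = [{"role": "user", "content": "Use my-first-skill on this input..."}]

while True:
    reply = client.chat.completions.create(
        model="gpt-4o",  # any tool-capable model works here
        messages=messages,
        tools=tools,  # keep passing tools on every call
    )
    msg = reply.choices[0].message
    if not msg.tool_calls:
        break  # no skill requested: msg.content is the final answer
    messages.append(msg)  # the assistant turn with tool_calls (the "critical" step)
    for call in msg.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": activate_skill(call.function.name, skills),
        })

print(msg.content)
```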
Bottom Line
Agent skills are structured prompts with a loading mechanism. The pattern works because:
- It keeps context lean by only loading what you need.
- It makes agents modular since skills are independent.
- It enables skill reuse, so you can share skills across projects.
- It simplifies debugging with clear activation logs.
You can build a working implementation in an afternoon. The core SkillsManager is about 130 lines of Python. (View the implementation)
- Start with one skill.
- See if it helps.
- Expand from there.
The complete working implementation is available on GitHub. Use it as a reference or starting point for your own agent.
Resources
- AgentSkills.io – Official specification
- Claude Skills – Anthropic’s skill examples
- Open Agent Skills – Community skill library
- Working Implementation – Complete code from this tutorial
- WTF is Frontmatter? – Understanding front‑matter/metadata in markdown files
- Anatomy of a Prompt – Guide to crafting effective AI prompts with structured approaches
- OpenAI Function Calling – Official OpenAI tools documentation