Building Agent Skills from Scratch

Published: December 26, 2025 at 03:20 AM EST
7 min read
Source: Dev.to

Agent Skills – How They Work & How to Integrate Them

View the complete implementation on GitHub

What Are Agent Skills?

Agent skills solve a simple problem: system prompts become bloated when you try to make an agent good at everything.

Instead of stuffing everything into one massive prompt:

You're an expert at code review, git, file organization, API testing...
[2000 lines of instructions]

you define discrete, reusable skills:

You have access to these skills:
- code-review: Reviews code for bugs and security
- git-helper: Git workflows and troubleshooting
- file-organizer: Organizes files intelligently
- api-tester: Tests REST APIs

Load them when needed.

Each skill is a self‑contained markdown file that the agent can load on demand.

The Core Idea

A skill is a markdown file stored in a directory. It consists of:

  1. YAML front‑matter – contains name, description, and optional metadata.
    (See this guide on front‑matter if you’re new to it.)
  2. Markdown body – detailed instructions that act as a temporary system prompt.

When the agent needs expertise, it loads the relevant skill:

User: "Review this code for SQL injection"

Agent: "I need the code‑review skill"

System: [Loads SKILL.md with security guidelines]

Agent: [Follows those guidelines]

Skills are therefore structured, modular prompts that are discoverable and loaded only when required.

How It Actually Works

1️⃣ Discovery

Scan a directory for SKILL.md files and parse only their front‑matter.
This gives you a list of available skills without loading the full content, keeping memory usage low.

skills/
├── code-review/
│   └── SKILL.md   # name: code-review, description: …
├── git-helper/
│   └── SKILL.md

Implementation details can be found in the repository’s discovery module.
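
As a rough sketch, discovery can be as simple as globbing for SKILL.md files and parsing only the YAML block at the top. The function and field names below are illustrative rather than the repository's actual API, and PyYAML is assumed for parsing:

import yaml                     # PyYAML, assumed here for front-matter parsing
from pathlib import Path

def discover_skills(skills_dir: str = "skills") -> dict:
    """Return {skill_name: metadata} without loading full skill bodies."""
    skills = {}
    for skill_file in Path(skills_dir).glob("*/SKILL.md"):
        text = skill_file.read_text(encoding="utf-8")
        # Front-matter sits between the first two '---' markers
        _, front_matter, _body = text.split("---", 2)
        meta = yaml.safe_load(front_matter)
        meta["path"] = skill_file       # remember where the full content lives
        skills[meta["name"]] = meta
    return skills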

2️⃣ Tool Registration

Convert each skill into an OpenAI function tool. The LLM sees these as callable functions:

{
  "name": "activate_skill_code_review",
  "description": "Reviews code for bugs, security, best practices"
},
{
  "name": "activate_skill_git_helper",
  "description": "Git workflows and troubleshooting"
}

The description is crucial—it guides the LLM in deciding which skill to activate. Be as specific and clear as possible.
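
Here is a minimal sketch of that conversion, assuming the metadata dictionary produced by the discovery step above. The OpenAI Chat Completions tools format requires a type field and a nested function object:

def skills_to_tools(skills: dict) -> list:
    """Build one OpenAI function tool per discovered skill."""
    tools = []
    for name, meta in skills.items():
        tools.append({
            "type": "function",
            "function": {
                # e.g. "code-review" becomes "activate_skill_code_review"
                "name": f"activate_skill_{name.replace('-', '_')}",
                "description": meta["description"],
                "parameters": {"type": "object", "properties": {}},
            },
        })
    return tools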

3️⃣ Activation

When the LLM calls a skill function:

  1. Load the full SKILL.md content from disk.
  2. Return it to the model as a tool result (i.e., a temporary system prompt).
  3. Let the LLM continue, now guided by the skill’s instructions.

This lazy‑loading approach means that if you have 20 skills but only use 2, only those 2 are ever read into memory.
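
Sketched in code, activation is just a file read plus a tool message appended to the conversation. The helper and the path field are illustrative, carried over from the discovery sketch:

def activate_skill(skill_meta: dict, tool_call_id: str, messages: list) -> None:
    """Load the full SKILL.md from disk and return it as a tool result."""
    content = skill_meta["path"].read_text(encoding="utf-8")   # lazy load, only now
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call_id,   # ties the result to the model's tool call
        "content": content,             # acts as a temporary system prompt
    })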

4️⃣ Execution

The LLM reads the skill instructions and follows them as if they were part of the system prompt for that specific turn. After the task is complete, the skill’s instructions naturally fade from context unless you explicitly retain them for multi‑turn interactions.

What a Skill Looks Like

---
name: code-review
description: Reviews code for bugs, security, and best practices
version: 1.0.0
---

# Code Review Skill

You are an expert code reviewer.

* Identify potential bugs and security vulnerabilities.
* Suggest improvements for readability and performance.
* Provide concrete code snippets for fixes.

Checklist

Security

  • SQL injection in queries
  • XSS in user inputs
  • Authentication bypasses

Quality

  • Readability
  • Maintainability
  • DRY violations

Performance

  • N+1 queries
  • Memory leaks
  • Inefficient algorithms

Response Format

Summary: Brief assessment
Critical Issues: Security problems (if any)
Improvements: Suggestions for better code
Positives: What works well

Each skill follows this same structure: front‑matter + detailed markdown body.

Quick Recap

| Step | What Happens |
| --- | --- |
| Discovery | Scan directory, read only front‑matter. |
| Tool Registration | Register each skill as an OpenAI function. |
| Activation | LLM calls a function → load full markdown → send as tool result. |
| Execution | LLM follows the skill’s instructions for the current task. |

With this pattern you keep your system prompts lean, make expertise modular, and let the agent dynamically pull in the exact knowledge it needs. Happy coding!

Why This Pattern Works

1. Context Efficiency

Instead of loading 10 KB of instructions upfront, you load 100 bytes of metadata. Full instructions only come in when needed. This matters when you’re paying per token.

2. Modularity

Each skill is independent. Add a new one by dropping in a SKILL.md file—no code changes needed. Want to remove a skill? Delete the directory.

3. Clarity

When debugging, you can see exactly which skill was activated and what instructions it provided. This makes troubleshooting much easier than a monolithic prompt.

4. Reusability

Share skills across projects. Someone else’s api-tester skill works in your agent with zero modification. Skills become a shared library of expertise.

Key Design Decisions

Lazy Loading

Don’t load all skills into memory at startup—this defeats the purpose because you’re back to loading everything upfront.
Do load on demand. Parse front‑matter during discovery, but keep the full content on disk until the LLM actually requests it.

Function Naming

Prefix skill functions clearly, e.g., activate_skill_code_review. This makes it obvious in logs what’s happening. When you see activate_skill_* in your logs, you know a skill was activated.

Conversation Flow

The exact sequence matters. Here’s what happens:

  1. User sends a message.
  2. LLM responds with tool_calls (requesting a skill).
    Critical: Add an assistant message with tool_calls to the conversation.
  3. Add a tool message with the skill content.
  4. LLM continues with skill instructions.
  5. Final response.

If you skip step 3, OpenAI will reject your request. The tool_calls must be properly formatted with a type field and a nested function object. This is a common gotcha. (See OpenAI’s tools documentation for details.)
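
Concretely, after a skill activation the message list might look like this (a sketch in the OpenAI Chat Completions format; the ID and arguments are made up for illustration):

messages = [
    {"role": "user", "content": "Review this code for SQL injection"},
    # Step 2: the model's tool call, echoed back as an assistant message
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_123",               # made-up ID for illustration
            "type": "function",             # required by the API
            "function": {
                "name": "activate_skill_code_review",
                "arguments": "{}",
            },
        }],
    },
    # Step 3: the tool result carrying the full SKILL.md content
    {
        "role": "tool",
        "tool_call_id": "call_123",
        "content": "# Code Review Skill\n...",
    },
]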

Looping for Multiple Tool Calls

Skills can chain. A skill might activate code execution, which might need another skill. Your agent should loop until there are no more tool calls:

while True:
    # Pass the tools list on every call, even after a skill has been activated
    response = llm.chat(messages=messages, tools=tools)
    if not response.get("tool_calls"):
        break                       # no more tool calls, the final answer is ready
    handle_tool_calls(response)     # activate skills / run tools, append results to messages

Always pass tools in every call, even after skill activation. Otherwise, skills can’t use other tools like code execution. (See full implementation for the complete loop logic.)

Practical Considerations

Skill Scope

One skill = one domain. Keep them focused.

Good examples: code-review, git-helper, api-tester
Bad example: developer-tools (too broad)

Skill Structure

Use clear sections with examples:

  • What the skill does
  • How to approach tasks
  • Expected output format
  • Examples of good results

A wall of text doesn’t work. Structure helps the LLM follow instructions.

Error Handling

What if a skill doesn’t exist? Return a helpful error, e.g.:

"Skill 'xyz' not found. Available: code-review, git-helper"

Common Mistakes & Troubleshooting

Loading Everything Upfront

Problem: Some implementations load all skills at startup, wasting memory and context tokens.

Fix: Load only metadata during discovery. Activate skills on demand.

Vague Skill Descriptions

The LLM uses skill descriptions to decide which to activate. Be specific.

  • ❌ “Helps with code”
  • ✅ “Reviews Python/JavaScript code for security vulnerabilities, PEP 8 compliance, and performance issues”

Include what the skill does, the task types it handles, and key capabilities.

Wrong Tool Calls Format

Error: Missing required parameter: messages[1].tool_calls[0].type

Cause: OpenAI requires a specific nested structure. tool_calls must have a type field and nest the function details under a function key.

Fix: Use the correct format with type: "function" and a nested function object. See the OpenAI tools documentation.

Forgetting to Include Tools After Skill Activation

Problem: After activating a skill, the LLM can’t use other tools like code execution.

Fix: Always pass tools in every LLM call. Don’t remove tools after skill activation because skills might need them.

No Structure in Skills

A wall of text doesn’t work. Use clear headings, bullet points, code examples, and expected output formats. The LLM follows structured instructions much better than prose.

When Skills Make Sense

Good fit:

  • Multi‑domain agents that handle code, git, and DevOps
  • Agents with specialized workflows
  • Teams sharing common patterns
  • Situations where you hit context limits

Not needed:

  • Single‑purpose agents
  • Agents with small, focused prompts
  • Prototypes and experiments

Don’t over‑engineer. If your system prompt is small and manageable, you probably don’t need skills.

The Standard

AgentSkills.io defines the open format:

  • SKILL.md naming convention
  • YAML front‑matter schema
  • Directory structure
  • Best practices

Following the standard means your skills work with other implementations. Skills become portable across projects and teams.

Building Your First Skill

  1. Create the directory
    mkdir -p skills/my-first-skill
  2. Create SKILL.md with YAML front‑matter and markdown instructions.
  3. Integrate SkillsManager into your agent – see the GitHub repo for full code.
  4. Test it by asking your agent to use the skill and verifying that it activates.

That’s it. No code changes are needed to add new skills—just drop in a SKILL.md file.

Bottom Line

Agent skills are structured prompts with a loading mechanism. The pattern works because:

  • It keeps context lean by only loading what you need.
  • It makes agents modular since skills are independent.
  • It enables skill reuse, so you can share skills across projects.
  • It simplifies debugging with clear activation logs.

You can build a working implementation in an afternoon. The core SkillsManager is about 130 lines of Python. (View the implementation)

  1. Start with one skill.
  2. See if it helps.
  3. Expand from there.

The complete working implementation is available on GitHub. Use it as a reference or starting point for your own agent.
