Refactoring Agent Skills: From Context Explosion to a Fast, Reliable Workflow

Published: February 15, 2026 at 05:13 PM EST

8 min read

Source: Dev.to

1️⃣ The Root Cause: Treating Skills Like Docs

The first trap is incredibly human:

“If I include everything, the model will always have what it needs.”

So you create one Skill per tool, and each Skill becomes a documentation dump:

setup steps
API references
exhaustive examples
“don’t do X” lists
every edge case since 2017

Then a task like “deploy a serverless function with a small UI” pulls in:

your Cloudflare Skill
your Docker Skill
your UI‑styling Skill
your web‑framework Skill …

…and the model starts its job already half‑drowned.

Claude Code’s own docs warn that Skills share the context window with the conversation, the request, and other Skills, so uncontrolled loading is a direct performance tax. You feel it as slowness, drift, and “why is it ignoring the obvious part?”

Bottom line: your problem isn’t “lack of info.” It’s “too much irrelevant info.”

2️⃣ The Fix: Progressive Disclosure (Three Layers)

Claude Code docs explicitly recommend progressive disclosure: keep essential info in SKILL.md, and store the heavy stuff in separate files that get loaded only when the task requires them.

This maps cleanly to a three‑layer system:

Layer 1 – Metadata (always loaded)

A short YAML front‑matter with name, description, and a routing signal. Think of it like a book cover and blurb—you’re not teaching, you’re helping the model decide whether to open the book.
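To make Layer 1 concrete, here is a minimal sketch of pulling just the metadata out of a SKILL.md. It assumes flat `key: value` front-matter between two `---` lines; `read_frontmatter` is a hypothetical helper for illustration, not something Claude Code exposes.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` pairs from a leading `---` block.

    Deliberately naive: handles flat front-matter only. Nested YAML
    would need a real YAML library.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the block
            return meta
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as having no front-matter


# Illustrative input, modelled on the ui-styling example in this article.
example = """---
name: ui-styling
description: Apply consistent UI styling across the app. Use when building or refactoring UI.
---

# UI Styling Skill
"""

meta = read_frontmatter(example)
print(meta["name"], "->", len(meta["description"]), "characters of description")
```

The point of the exercise: everything outside those few lines is something the model should only pay for after it decides the Skill is relevant.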

Layer 2 – Entry point: `SKILL.md` (loaded on activation)

Your navigation map:

what the Skill is for
when to use it
high‑level steps to follow
which reference files to open next

Not a tutorial, not a wiki.

Layer 3 – References & scripts (loaded only when needed)

Small, focused files:

one topic per file
~200–300 lines per file is a good target
scripts do deterministic work so the model doesn’t burn tokens “describing” actions

Example folder layout

.claude/skills/devops/
├── SKILL.md
├── references/
│   ├── serverless-cloudflare.md
│   ├── containers-docker.md
│   └── ci-cd-basics.md
└── scripts/
    ├── validate_env.py
    └── deploy_helper.sh
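The ~200–300 line target for reference files is easy to check mechanically. A small sketch, assuming the folder layout above (a `references/` directory of `.md` files inside each Skill); the function name and the 300-line default are illustrative choices, not an official limit.

```python
from pathlib import Path

MAX_REFERENCE_LINES = 300  # upper end of the ~200–300 line target (assumption)


def oversized_references(
    skill_root: Path, limit: int = MAX_REFERENCE_LINES
) -> list[tuple[str, int]]:
    """Return (relative path, line count) for reference files over the limit."""
    results: list[tuple[str, int]] = []
    refs_dir = skill_root / "references"
    if not refs_dir.is_dir():
        return results
    for path in sorted(refs_dir.glob("*.md")):
        with path.open("r", encoding="utf-8", errors="ignore") as f:
            count = sum(1 for _ in f)
        if count > limit:
            results.append((path.relative_to(skill_root).as_posix(), count))
    return results
```

Calling `oversized_references(Path(".claude/skills/devops"))` would list any reference file that has outgrown its single topic and is a candidate for splitting.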

3️⃣ The “200‑Line Rule”: Brutal, Slightly Arbitrary, Weirdly Effective

In the community refactor story, the author landed on a hard constraint:

Keep SKILL.md under ~200 lines.

If you can’t, you’re putting too much in the entry point.

Claude’s own best‑practice docs recommend keeping the body under a few hundred lines (and splitting content as you approach that limit). “200 lines” is a sharper knife: it forces you to write a table of contents, not a textbook.

Why it works

The model can scan the entry quickly.
It can decide which reference file to load next.
The total “initial load” stays small enough that the conversation still has room to breathe.

Quick test you can steal

Start a fresh session (cold start).
Trigger your Skill.
If the first activation loads more than ~500 lines of content, your design is likely leaking scope.
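If you want a number before running the cold-start test by hand, a static upper bound is a reasonable proxy. The sketch below assumes the worst case: SKILL.md plus every `references/*.md` file it names gets loaded. That is an estimate, not a measurement of what Claude Code actually loads, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches relative paths such as references/tailwind-utilities.md
REFERENCE_PATTERN = re.compile(r"references/[\w./-]+\.md")


def count_lines(path: Path) -> int:
    """Count lines in a text file, ignoring decode errors."""
    with path.open("r", encoding="utf-8", errors="ignore") as f:
        return sum(1 for _ in f)


def worst_case_first_load(skill_root: Path) -> int:
    """SKILL.md lines plus every reference file it names that exists on disk."""
    skill_md = skill_root / "SKILL.md"
    text = skill_md.read_text(encoding="utf-8", errors="ignore")
    total = count_lines(skill_md)
    for rel in sorted(set(REFERENCE_PATTERN.findall(text))):
        ref = skill_root / rel
        if ref.is_file():  # skip paths that are mentioned but missing
            total += count_lines(ref)
    return total
```

If this worst case already sits under ~500 lines, the real first activation almost certainly does too. If it is far above, the entry point is probably pointing at too many files for one workflow.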

4️⃣ The Real Mental Shift: From Tool‑Centric to Workflow‑Centric

This is the part most people miss.

Tool‑centric Skills (problematic)

cloudflare-skill
tailwind-skill
postgres-skill
kubernetes-skill

They’re encyclopedias. They don’t compose well.

Workflow‑centric Skills (recommended)

devops          (deploy + environments + CI/CD)
ui-styling      (design rules + component patterns)
web-frameworks  (routing + project structure + SSR pitfalls)
databases       (schema design + migrations + query patterns)

They map to what you actually do during development.

A workflow Skill answers:

“When I’m in this stage of work, what does the agent need to know to act correctly?”

—not—

“What is everything this tool can do?”

That reframing prevents context blow‑ups almost by itself.
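One way to plan the move is to write the regrouping down as data before touching any files. The mapping below is a hypothetical assignment of the tool-centric Skills listed above to the workflow Skills; your own grouping may differ.

```python
from collections import defaultdict

# Hypothetical mapping from tool-centric Skills to the workflow Skill
# that would absorb them (an assumption for illustration).
TOOL_TO_WORKFLOW = {
    "cloudflare-skill": "devops",
    "kubernetes-skill": "devops",
    "tailwind-skill": "ui-styling",
    "postgres-skill": "databases",
}


def plan_merge(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Group tool Skills under the workflow Skill that would absorb them."""
    plan: dict[str, list[str]] = defaultdict(list)
    for tool, workflow in sorted(mapping.items()):
        plan[workflow].append(tool)
    return dict(plan)


for workflow, tools in plan_merge(TOOL_TO_WORKFLOW).items():
    print(f"{workflow}: absorbs {', '.join(tools)}")
```

Each absorbed tool Skill then typically shrinks into a single reference file under the workflow Skill, in the spirit of `references/serverless-cloudflare.md` in the folder layout earlier.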

5️⃣ A Minimal, Production‑Grade `SKILL.md` (Example)

Below is a deliberately small entry point you can copy and customise. Notice what’s missing: long examples, full docs, and “everything you might ever need.”

---
name: ui-styling
description: Apply consistent UI styling across the app (Tailwind + component conventions). Use when building or refactoring UI.
---

# UI Styling Skill

## When to use
- Starting a new UI component.
- Refactoring existing components for consistency.
- Updating the design system.

## High‑level workflow
1. **Identify** the component or page that needs styling.
2. **Run** `scripts/apply_tailwind.sh` to scaffold Tailwind classes.
3. **Reference** `references/tailwind-utilities.md` for utility‑class guidance.
4. **Validate** with `scripts/check_style.py` to ensure no lint errors.

## Reference files (load on demand)
- `references/tailwind-utilities.md` – list of approved utilities and patterns.
- `references/component-conventions.md` – naming, folder structure, and composition rules.

## Scripts (load on demand)
- `scripts/apply_tailwind.sh` – injects Tailwind classes into a file.
- `scripts/check_style.py` – runs style linting and reports violations.

## Quick tip
If a component already follows the design system, skip step 2 and go straight to validation.

TL;DR Checklist

Metadata only in the YAML front‑matter.
Keep SKILL.md ≤ 200 lines.
Store heavy content in references/ and scripts/.
Design Skills around workflows, not individual tools.
Test with a cold start; aim for ≤ 500 lines on first load.

Apply this playbook, and you’ll watch your context window breathe again. 🚀

When to use

You are building UI components or pages.
You need consistent spacing, typography, and responsive behavior.
You need to align with existing design conventions.

Workflow

Identify the UI surface (page/component) and its constraints (responsive, dark mode, accessibility).
Apply styling rules from the references—pick only what you need.
Validate the output against the checklist.

References (load only if needed)

references/design-tokens.md — Spacing, font scale, colour usage
references/tailwind-patterns.md — Layouts, common utility combos
references/accessibility-checklist.md — Keyboard, focus, contrast

Output contract

Use UK English in UI strings.
Prefer reusable components over copy‑paste blocks.
Keep className readable (extract when it gets messy).

That’s it.

The Skill’s job is to route the agent to the right file at the right moment — not to become an on‑page encyclopedia.
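Routing only works if the routes exist. A minimal sketch of a link check, assuming SKILL.md writes its targets as backticked relative paths under `references/` or `scripts/` (as in the first example above); `missing_targets` is a hypothetical helper.

```python
import re
from pathlib import Path

# Backticked relative paths under references/ or scripts/
PATH_PATTERN = re.compile(r"`((?:references|scripts)/[^`\s]+)`")


def missing_targets(skill_root: Path) -> list[str]:
    """List paths that SKILL.md mentions but that do not exist in the Skill folder."""
    text = (skill_root / "SKILL.md").read_text(encoding="utf-8", errors="ignore")
    mentioned = sorted(set(PATH_PATTERN.findall(text)))
    return [rel for rel in mentioned if not (skill_root / rel).is_file()]
```

An empty list means every file the entry point sends the agent to is actually there. Anything else is a dead end the model will hit mid-task.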

6️⃣ Measuring Improvements (Without Lying to Yourself)

If you want repeatable results, track metrics that actually matter:

Initial lines loaded on activation.
Time to activation (roughly: how “snappy” it feels).
Relevance ratio (how much of the loaded content is used).
Context overflow frequency (how often long tasks run out of context window and fail).
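Of these, the relevance ratio is the one that benefits most from a precise definition. A rough sketch, assuming you can estimate by hand (or from a transcript) how many of the loaded lines the task actually drew on; the numbers below are made up for illustration.

```python
def relevance_ratio(lines_loaded: int, lines_used: int) -> float:
    """Share of loaded lines that the task actually drew on (0.0 to 1.0)."""
    if lines_loaded <= 0:
        return 0.0
    if not 0 <= lines_used <= lines_loaded:
        raise ValueError("lines_used must be between 0 and lines_loaded")
    return lines_used / lines_loaded


# Hypothetical session: 480 lines loaded, of which 120 were relevant.
print(f"{relevance_ratio(480, 120):.0%}")  # prints "25%"
```

How you decide a line was "used" is a judgment call. What matters is applying the same rule before and after the refactor so the trend is comparable.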

You don’t need a full observability stack; a simple repository‑audit script is enough.

Tiny Python audit: count lines per Skill

from pathlib import Path

# Directory that holds one sub-folder per Skill.
skills_dir = Path(".claude/skills")

def count_lines(p: Path) -> int:
    """Return the number of lines in a file, ignoring decode errors."""
    # Use a context manager so the file handle is closed promptly.
    with p.open("r", encoding="utf-8", errors="ignore") as f:
        return sum(1 for _ in f)

for skill in sorted(skills_dir.iterdir()):
    skill_md = skill / "SKILL.md"
    if skill_md.exists():
        lines = count_lines(skill_md)
        # Matches the checklist: exactly 200 lines is fine, anything above is flagged.
        status = "OK" if lines <= 200 else "TOO LONG"
        print(f"{skill.name}: {lines} lines – {status}")

Run this weekly and you’ll catch “documentation creep” before it becomes a crisis.

7️⃣ Common Failure Modes (And How to Avoid Them)

Failure mode: Claude writes “a doc” instead of “a Skill”

LLMs love expanding markdown into tutorials.

Fix:

Explicitly tell the model: this is not documentation.
Remove “beginner” filler.
Keep examples short; push detail into reference files.

Failure mode: Entry point bloats because the Skill scope is too wide

Fix:

Split the Skill by workflow stage.
Or move decision trees into reference files.

Failure mode: Too many references, still hard to navigate

Fix:

Add a short “map” section in SKILL.md.
Keep reference files single‑topic and named by intent, not by tool.

8️⃣ A Copyable Refactor Checklist

Audit – list Skills + line counts; flag any SKILL.md > 200 lines.
Group by workflow – merge tool‑specific Skills into capability Skills.
Create references – move detailed info out of SKILL.md.
Enforce entry constraints – keep SKILL.md lean and navigational.
Cold‑start test – ensure the first activation stays under your chosen budget.
Keep scripts deterministic – offload “do the thing” to code where possible.
Re‑check monthly – Skills drift over time; treat them like code.

Final take: Context engineering is “right info, right time”

The big lesson isn’t “200 lines” or “three layers”.
It’s this:

Context is a budget.

The best Skill design spends it like an engineer, not like a librarian.
Don’t load everything. Load what matters — when it matters — and keep the rest one file away.
