The Three-Level Skill, and Why Yours Is Probably One Level Too Many

Skills load in three levels — frontmatter, body, linked references. Most authors compress all three into one. The result is a 12,000-word SKILL.md the agent reads top to bottom whether it needs to or not, paying for the whole thing in context every time.

Initial Editor · 2026-05-12 · 4 min read · 797 words


A skill that bundles everything into one file isn't a skill — it's a 12,000-word system prompt with a YAML hat. It loads in full whenever it matches. It pushes other skills out of context. It makes every interaction slower. And it costs token budget on parts the agent never needed to read.

The fix is layering, but most authors skip it. Here's the discipline.

The three levels

| Level | What loads | When | Token cost |
| --- | --- | --- | --- |
| Frontmatter | YAML metadata (name, description, license) | Always (sits in the system prompt) | Always paid |
| `SKILL.md` body | Instructions, examples, the "how to use it" | When the description matches the prompt | Paid on every match |
| `references/` files | Detail docs the body links to | Only when the body tells the agent to read them | Paid only on use |

The win is the third level. Anything you can move from level two to level three is context the agent doesn't pay for on sessions where it doesn't need that information.
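The loading rules in the table can be written down as a small cost model. This is a minimal sketch, assuming word counts stand in for tokens; `SkillSizes`, `session_cost`, and every number below are hypothetical and only illustrate the rules.

```python
from dataclasses import dataclass, field


@dataclass
class SkillSizes:
    """Word counts per level (hypothetical numbers, words as a token proxy)."""
    frontmatter: int
    body: int
    references: dict = field(default_factory=dict)  # filename -> words


def session_cost(skill: SkillSizes, matched: bool, refs_read: tuple = ()) -> int:
    """Words one session pays for this skill under the three-level rules."""
    cost = skill.frontmatter      # level 1: always in the system prompt
    if matched:
        cost += skill.body        # level 2: loaded when the description matches
        # level 3: only the references the body sent the agent to
        cost += sum(skill.references[name] for name in refs_read)
    return cost


skill = SkillSizes(
    frontmatter=40,
    body=1200,
    references={"error-catalog.md": 3000, "api-patterns.md": 2500},
)

no_match = session_cost(skill, matched=False)
plain_match = session_cost(skill, matched=True)
with_errors = session_cost(skill, matched=True, refs_read=("error-catalog.md",))
```

With these made-up sizes, a session that never hits an error pays for the frontmatter and body only; the 3,000-word error catalog costs nothing until the body points the agent at it.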

What belongs at each level

Frontmatter is for matching. Name, description, license, allowed tools. Nothing else. If you find yourself writing prose in the frontmatter beyond the description, you've put it at the wrong level.
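One way to enforce "nothing else" is a small lint over the frontmatter keys. The sketch below is not a YAML parser: it only reads top-level `key:` lines between the `---` delimiters. The key spellings in `ALLOWED_KEYS` are an assumption based on the fields named above, and the example skill text is invented.

```python
# Assumed key spellings for the four fields named above.
ALLOWED_KEYS = {"name", "description", "license", "allowed-tools"}


def frontmatter_keys(skill_md: str) -> list:
    """Return top-level keys found between the opening and closing '---'."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter block at the top of the file
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        is_top_level = bool(line) and not line[0].isspace()
        if is_top_level and not line.startswith("#") and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def unexpected_keys(skill_md: str) -> list:
    """Keys that probably belong in the body or in references/ instead."""
    return [k for k in frontmatter_keys(skill_md) if k not in ALLOWED_KEYS]


EXAMPLE = """---
name: report-generator
description: Generate weekly reports from the metrics API.
license: MIT
notes: Long background prose that belongs in the body.
---
Workflow steps go here.
"""

flagged = unexpected_keys(EXAMPLE)
```

Here `flagged` contains only `notes`, the one field doing a job that belongs at level two.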

SKILL.md body is for the core workflow. Step-by-step instructions for the most common path. Error handling for the most common failures. Two or three short examples. Cross-references to deeper material in references/.

A good SKILL.md reads like a runbook: "do this, then this, then this, and here's what 'this' looks like." It does not read like a manual.

references/ is for everything that's true but rarely needed. The exhaustive API reference. The full error catalog. The decision tree for edge cases. The agent navigates there when the body tells it to, and only then.

The 5,000-word ceiling

The honest test for SKILL.md: is it under 5,000 words? Past that, the agent loses track of which section applies and which doesn't. Past 10,000, every match pays a context cost that crowds out everything else loaded in the session.
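The ceiling is easy to check mechanically. A minimal sketch, assuming whitespace-separated words are a good enough measure and that the frontmatter is delimited by `---` lines; the function names and verdict strings are hypothetical, while the 5,000 and 10,000 thresholds come from the paragraph above.

```python
def body_word_count(skill_md: str) -> int:
    """Count whitespace-separated words after the frontmatter block, if any."""
    lines = skill_md.splitlines()
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                lines = lines[i + 1:]  # keep only what follows the frontmatter
                break
    return len(" ".join(lines).split())


def size_verdict(words: int, ceiling: int = 5000, crowding: int = 10000) -> str:
    """Map a body word count onto the thresholds described above."""
    if words > crowding:
        return "crowding out the rest of the session"
    if words >= ceiling:
        return "over the ceiling"
    return "ok"


sample = "---\nname: demo\n---\nStep one. Step two. Step three.\n"
sample_words = body_word_count(sample)
sample_verdict = size_verdict(sample_words)
```

Running this over a real `SKILL.md` (read with `open(path, encoding="utf-8").read()`) gives a number to argue with before deciding what to demote.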

What gets demoted when the body is too long:

- Multi-page tables of error codes → references/errors.md. The body says "if the call fails, check references/errors.md for the code mapping."
- Full API request/response examples → references/api-patterns.md. The body has one canonical example; the rest live in the reference.
- The decision tree for "which tool to call when" → if it has more than three branches, move it to references/decision-tree.md and link from the body.
- Templates → assets/. A 600-line report template isn't instructions; it's an output. Don't paste it into the body.

A before/after that makes the discipline concrete

A bloated skill on disk:

report-generator/
└── SKILL.md  (8,400 words: workflow + every template + every error case)

The disciplined version:

report-generator/
├── SKILL.md  (1,200 words: the workflow, two examples, links out)
├── references/
│   ├── error-catalog.md   (loaded only when an error fires)
│   ├── api-patterns.md    (loaded only when the workflow hits the API step)
│   └── output-rules.md    (loaded only for the final formatting pass)
└── assets/
    ├── report-template.md
    └── summary-template.md

Both versions trigger on the same prompts. On a match, the second loads 1,200 words instead of 8,400, about 85% less, and each reference loads only when the workflow actually reaches it.
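The roughly 85% figure follows directly from the two word counts in the trees. A quick check, plus one hypothetical case: the 1,500-word reference size is invented to show how the saving shrinks on a session that does pull a reference in.

```python
def saving(before_words: int, after_words: int) -> float:
    """Fraction of content no longer loaded, relative to the bloated version."""
    return 1 - after_words / before_words


# Word counts from the before/after trees above.
baseline = saving(8400, 1200)

# Hypothetical: the same session also reads one 1,500-word reference file.
with_one_reference = saving(8400, 1200 + 1500)

baseline_pct = round(baseline * 100, 1)
with_reference_pct = round(with_one_reference * 100, 1)
```

Even on the session that needs a reference, the layered skill still loads well under what the flat one loads on every match.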

Reference linking the agent will actually follow

Linking to a reference is not the same as making the agent read it. The body has to give the agent a reason to navigate, in a sentence the agent can match against.

Before writing queries, consult `references/api-patterns.md` for:
- Rate limiting guidance
- Pagination patterns
- Error codes and handling

Three bullets. Each one is a phrase the agent will see in the prompt context and recognize as a reason to fetch the file. Without this kind of pointer, the reference sits unread.
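A reference the body never names cannot be fetched, so it is worth checking for orphans. The sketch below only tests whether each reference path appears in the body text; it cannot judge whether the surrounding sentence gives the agent a good reason to follow it. The function names are hypothetical, and the directory layout is assumed to match the trees shown earlier.

```python
from pathlib import Path


def unlinked_references(body: str, reference_paths: list) -> list:
    """Reference paths that the body text never mentions."""
    return [path for path in reference_paths if path not in body]


def check_skill_dir(skill_dir: str) -> list:
    """List references/*.md files under skill_dir that SKILL.md does not name."""
    root = Path(skill_dir)
    body = (root / "SKILL.md").read_text(encoding="utf-8")
    refs = sorted(
        p.relative_to(root).as_posix() for p in (root / "references").glob("*.md")
    )
    return unlinked_references(body, refs)


body_text = "Before writing queries, consult `references/api-patterns.md` for:"
orphans = unlinked_references(
    body_text,
    ["references/api-patterns.md", "references/error-catalog.md"],
)
```

In this example `orphans` holds `references/error-catalog.md`: the file exists, but nothing in the body would ever send the agent to it.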

When to skip progressive disclosure

Two cases where flattening everything into SKILL.md is the right call:

- Total skill content fits in 1,500 words. Splitting into references at that size adds ceremony without saving tokens. Keep it flat.
- Every invocation needs every section. Rare, but it happens, usually for skills that gate on a strict workflow where every step references the next. Splitting just adds navigation overhead.

The opposite mistake — splitting too aggressively, creating fifteen reference files for a skill used twice a week — is also real. Each reference is a navigation hop. If the agent has to chain four file reads before it can act, you've replaced "big file" with "fragmented file."
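The two exceptions and the over-splitting warning can be folded into one rule of thumb. This is a sketch of the heuristic, not a measured policy: the function name, its parameters, and the returned strings are hypothetical, and the 1,500-word and four-read thresholds are taken from the text above.

```python
def layout_advice(
    total_words: int,
    every_section_always_needed: bool,
    longest_read_chain: int = 1,
) -> str:
    """Suggest a layout from total size, usage pattern, and navigation depth.

    longest_read_chain is the number of file reads the agent must chain
    before it can act (a hypothetical measure you would estimate by hand).
    """
    if total_words <= 1500 or every_section_always_needed:
        return "flat"  # splitting would add ceremony, not savings
    if longest_read_chain >= 4:
        return "layered, but shorten the reference chain"
    return "layered"


small_skill = layout_advice(1200, every_section_always_needed=False)
big_skill = layout_advice(8400, every_section_always_needed=False)
fragmented = layout_advice(8400, False, longest_read_chain=4)
```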

If your SKILL.md is over 5,000 words, you've written documentation, not a skill. The agent pays for documentation on every match. The references folder exists so it doesn't have to.
