A skill that bundles everything into one file isn't a skill — it's a 12,000-word system prompt with a YAML hat. It loads in full whenever it matches. It pushes other skills out of context. It makes every interaction slower. And it costs token budget on parts the agent never needed to read.
The fix is layering, but most authors skip it. Here's the discipline.
The three levels
| Level | What loads | When | Token cost |
|---|---|---|---|
| Frontmatter | YAML metadata (name, description, license) | Always — sits in the system prompt | Always paid |
| `SKILL.md` body | Instructions, examples, the "how to use it" | When the description matches the prompt | Paid on every match |
| `references/` files | Detail docs the body links to | Only when the body tells the agent to read them | Paid only on use |
The win is the third level. Anything you can move from level two to level three is context the agent doesn't pay for on sessions where it doesn't need that information.
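As a rough model of the three levels, the sketch below counts the words a single session loads. The `Skill` class, the `words_loaded` function, the file names, and every word count are hypothetical illustrations, not part of any real skill loader.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical word counts for one skill, split by level."""
    frontmatter_words: int
    body_words: int
    reference_words: dict[str, int] = field(default_factory=dict)


def words_loaded(skill: Skill, matched: bool,
                 references_read: tuple[str, ...] = ()) -> int:
    """Return the words one session loads under the three-level model."""
    total = skill.frontmatter_words               # level 1: always in the system prompt
    if matched:
        total += skill.body_words                 # level 2: paid on every match
        for name in references_read:
            total += skill.reference_words[name]  # level 3: paid only on use
    return total


# Made-up sizes, chosen only to show the shape of the cost.
skill = Skill(
    frontmatter_words=40,
    body_words=1_200,
    reference_words={"errors.md": 3_000, "api-patterns.md": 2_500},
)

print(words_loaded(skill, matched=False))                               # 40
print(words_loaded(skill, matched=True))                                # 1240
print(words_loaded(skill, matched=True, references_read=("errors.md",)))  # 4240
```

In this model, every word moved from `body_words` into `reference_words` drops out of the second line's total and shows up only on sessions that actually read that file.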
What belongs at each level
Frontmatter is for matching. Name, description, license, allowed tools. Nothing else. If you find yourself writing prose in the frontmatter, you've put it at the wrong level.
SKILL.md body is for the core workflow. Step-by-step instructions for the most common path. Error handling for the most common failures. Two or three short examples. Cross-references to deeper material in references/.
A good SKILL.md reads like a runbook: "do this, then this, then this, and here's what 'this' looks like." It does not read like a manual.
references/ is for everything that's true but rarely needed. The exhaustive API reference. The full error catalog. The decision tree for edge cases. The agent navigates there when the body tells it to, and only then.
The 5,000-word ceiling
The honest test for SKILL.md: is it under 5,000 words? Past that, the agent loses track of which section applies and which doesn't. Past 10,000, every match pays a context cost that crowds out everything else loaded in the session.
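The ceiling is easy to check mechanically. Here is a minimal sketch, assuming the frontmatter is delimited by `---` lines at the top of the file and that a whitespace-separated word count is a good enough proxy. The function names and status strings are made up for illustration.

```python
import re


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a SKILL.md string into (frontmatter, body).

    Assumes the frontmatter sits between '---' lines at the top of the file.
    """
    match = re.match(r"\A---\n(.*?)\n---\n?(.*)\Z", text, flags=re.DOTALL)
    if match is None:
        return "", text  # no frontmatter found: treat everything as body
    return match.group(1), match.group(2)


def body_status(text: str, ceiling: int = 5_000, hard_limit: int = 10_000) -> str:
    """Classify a SKILL.md body by whitespace-separated word count."""
    _, body = split_frontmatter(text)
    words = len(body.split())
    if words > hard_limit:
        return "crowding out other context"
    if words > ceiling:
        return "over the ceiling"
    return "ok"


sample = "---\nname: demo\ndescription: A demo skill.\n---\n" + "word " * 6_000
print(body_status(sample))  # over the ceiling
```

To run it against a real skill, pass in the file contents, for example `body_status(Path("SKILL.md").read_text(encoding="utf-8"))` with `Path` imported from `pathlib`.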
What gets demoted when the body is too long:
- Multi-page tables of error codes → `references/errors.md`. The body says "if the call fails, check `references/errors.md` for the code mapping."
- Full API request/response examples → `references/api-patterns.md`. The body has one canonical example; the rest live in the reference.
- The decision tree for "which tool to call when" → if it has more than three branches, move it to `references/decision-tree.md` and link from the body.
- Templates → `assets/`. A 600-line report template isn't instructions; it's an output. Don't paste it into the body.
A before/after that makes the discipline concrete
A bloated skill on disk:
report-generator/
└── SKILL.md (8,400 words: workflow + every template + every error case)
The disciplined version:
report-generator/
├── SKILL.md (1,200 words: the workflow, two examples, links out)
├── references/
│   ├── error-catalog.md (loaded only when an error fires)
│   ├── api-patterns.md (loaded only when the workflow hits the API step)
│   └── output-rules.md (loaded only for the final formatting pass)
└── assets/
    ├── report-template.md
    └── summary-template.md
Both versions trigger on the same prompts. On a session that needs no reference file, the second loads about 85% less content (1,200 words instead of 8,400). The references load only when they're actually needed.
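The savings figure follows from the two body sizes in the listings, assuming a session that matches the skill and reads no reference file. A quick check, where the 2,000-word reference size is a made-up number used only to show how the saving shrinks once a reference loads:

```python
def reduction(before_words: int, after_words: int) -> float:
    """Fraction of content no longer loaded, from 0.0 to 1.0."""
    return (before_words - after_words) / before_words


# Body only: 8,400 words before, 1,200 after.
print(f"{reduction(8_400, 1_200):.1%}")          # 85.7%

# Same session, but it also reads one hypothetical 2,000-word reference.
print(f"{reduction(8_400, 1_200 + 2_000):.1%}")  # 61.9%
```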
Reference linking the agent will actually follow
Linking to a reference is not the same as making the agent read it. The body has to give the agent a reason to navigate, in a sentence the agent can match against.
Before writing queries, consult `references/api-patterns.md` for:
- Rate limiting guidance
- Pagination patterns
- Error codes and handling
Three bullets. Each one is a phrase the agent will see in the prompt context and recognize as a reason to fetch the file. Without this kind of pointer, the reference sits unread.
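One cheap guard against unread references is a lint that flags files the body never names. The sketch below assumes the `SKILL.md` plus `references/` layout described in this post. `unreferenced_files` is a hypothetical helper, and it checks only that a path is mentioned, not that the surrounding sentence gives the agent a reason to open it.

```python
from pathlib import Path


def unreferenced_files(skill_dir: Path) -> list[str]:
    """List files under references/ that SKILL.md never names by path."""
    body = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    references = skill_dir / "references"
    if not references.is_dir():
        return []  # a flat skill has nothing to check
    missing = []
    for path in sorted(references.iterdir()):
        relative = f"references/{path.name}"
        if path.is_file() and relative not in body:
            missing.append(relative)
    return missing
```

Calling `unreferenced_files(Path("report-generator"))` returns an empty list when every reference file has at least one pointer in the body. Anything it does return is a file the agent is unlikely to ever fetch.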
When to skip progressive disclosure
Two cases where flattening everything into SKILL.md is the right call:
- Total skill content fits in 1,500 words. Splitting into references at that size adds ceremony without saving tokens. Keep it flat.
- Every invocation needs every section. Rare, but it happens — usually for skills that gate on a strict workflow where every step references the next. Splitting just adds navigation overhead.
The opposite mistake — splitting too aggressively, creating fifteen reference files for a skill used twice a week — is also real. Each reference is a navigation hop. If the agent has to chain four file reads before it can act, you've replaced "big file" with "fragmented file."
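The trade-offs in this section can be condensed into a small rule of thumb. This is one reading of the text, not a fixed rule: the 1,500-word threshold and the four-read chain come from the paragraphs above, while the function name, parameters, and return strings are invented for illustration.

```python
def layout_advice(total_words: int, every_section_always_needed: bool,
                  longest_read_chain: int) -> str:
    """Suggest a layout for a skill.

    longest_read_chain is the number of file reads the agent must chain
    before it can act (0 for a flat skill).
    """
    if total_words <= 1_500 or every_section_always_needed:
        return "keep flat"             # splitting adds ceremony, saves nothing
    if longest_read_chain >= 4:
        return "merge references"      # a fragmented file, not a layered one
    return "split into references"


print(layout_advice(1_200, False, 0))  # keep flat
print(layout_advice(8_400, False, 1))  # split into references
print(layout_advice(8_400, False, 4))  # merge references
```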
If your `SKILL.md` is over 5,000 words, you've written documentation, not a skill. The agent pays for documentation on every match. The references folder exists so it doesn't have to.