The Three-Level Skill, and Why Yours Is Probably One Level Too Many

Skills load in three levels — frontmatter, body, linked references. Most authors compress all three into one. The result is a 12,000-word SKILL.md the agent reads top to bottom whether it needs to or not, paying for the whole thing in context every time.

Initial Editor·2026-05-12·4min read·797 words

A skill that bundles everything into one file isn't a skill. It's a 12,000-word system prompt with a YAML hat. It loads in full whenever it matches. It pushes other skills out of context. It makes every interaction slower. And it spends token budget on parts the agent never needed to read.

The fix is layering, but most authors skip it. Here's the discipline.

The three levels

| Level | What loads | When | Token cost |
| --- | --- | --- | --- |
| Frontmatter | YAML metadata (name, description, license) | Always; sits in the system prompt | Always paid |
| SKILL.md body | Instructions, examples, the "how to use it" | When the description matches the prompt | Paid on every match |
| references/ files | Detail docs the body links to | Only when the body tells the agent to read them | Paid only on use |

The win is the third level. Anything you can move from level two to level three is context the agent doesn't pay for on sessions where it doesn't need that information.
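
To put a rough number on that win, here is a minimal sketch of the cost model the table implies. The `Reference` class, the `use_rate` field, and every figure in the example are hypothetical, chosen only to illustrate the shape of the trade; real usage rates have to be measured.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    words: int       # size of the reference file
    use_rate: float  # assumed fraction of matched sessions that read it


def expected_words_per_match(body_words: int, references: list[Reference]) -> float:
    """Average words loaded each time the skill matches.

    The body is paid on every match; each reference is paid only in the
    fraction of sessions where the body sends the agent to it. Frontmatter
    is left out because it is paid whether or not the skill matches.
    """
    return body_words + sum(r.words * r.use_rate for r in references)


# Hypothetical numbers: the same 6,000 words, flat versus layered.
flat = expected_words_per_match(6000, [])
layered = expected_words_per_match(
    1500,
    [Reference(words=3000, use_rate=0.1), Reference(words=1500, use_rate=0.2)],
)
print(flat, layered)
```

The lower the usage rate of a section, the stronger the case for demoting it. A section read on nearly every match saves almost nothing by moving.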

What belongs at each level

Frontmatter is for matching. Name, description, license, allowed tools. Nothing else. If you find yourself writing prose in the frontmatter, you've put it at the wrong level.
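
A quick way to catch prose creeping into the frontmatter is a small lint pass. The sketch below is an illustration under stated assumptions: the key spellings in `ALLOWED_KEYS` (including `allowed-tools`) and the 60-word description limit are guesses for the example, and the parser handles only flat `key: value` lines rather than full YAML.

```python
ALLOWED_KEYS = {"name", "description", "license", "allowed-tools"}  # assumed spellings
MAX_DESCRIPTION_WORDS = 60  # hypothetical threshold, not taken from any spec


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md string into (frontmatter fields, body).

    Handles only flat `key: value` lines between two `---` markers;
    a real project would use a proper YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing marker; give up if the frontmatter never closes.
    end = next((i for i, l in enumerate(lines[1:], 1) if l.strip() == "---"), None)
    if end is None:
        return {}, text
    fields = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields, "\n".join(lines[end + 1:])


def frontmatter_problems(text: str) -> list[str]:
    """List signs that content sits at the wrong level."""
    fields, _ = split_frontmatter(text)
    problems = [f"unexpected key: {k}" for k in fields if k not in ALLOWED_KEYS]
    if len(fields.get("description", "").split()) > MAX_DESCRIPTION_WORDS:
        problems.append("description reads like prose; move detail to the body")
    return problems
```

An empty list from `frontmatter_problems` means the frontmatter is doing its one job, matching, and nothing else.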

SKILL.md body is for the core workflow. Step-by-step instructions for the most common path. Error handling for the most common failures. Two or three short examples. Cross-references to deeper material in references/.

A good SKILL.md reads like a runbook: "do this, then this, then this, and here's what 'this' looks like." It does not read like a manual.

references/ is for everything that's true but rarely needed. The exhaustive API reference. The full error catalog. The decision tree for edge cases. The agent navigates there when the body tells it to, and only then.

The 5,000-word ceiling

The honest test for SKILL.md: is it under 5,000 words? Past that, the agent loses track of which section applies and which doesn't. Past 10,000, every match pays a context cost that crowds out everything else loaded in the session.
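
The test is easy to automate. A minimal sketch, assuming whitespace-separated word counting is close enough; the function name and the verdict labels are made up for this example, while the two thresholds are the ones stated above.

```python
CEILING_WORDS = 5_000    # past this, the body is too long
CROWDING_WORDS = 10_000  # past this, every match crowds out other context


def body_verdict(body: str) -> str:
    """Classify a SKILL.md body by whitespace-separated word count."""
    words = len(body.split())
    if words > CROWDING_WORDS:
        return "crowding context"
    if words > CEILING_WORDS:
        return "over ceiling"
    return "ok"
```

Run it on the body only, after stripping the frontmatter, so the count reflects what loads on a match.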

What gets demoted when the body is too long:

  • Multi-page tables of error codes → references/errors.md. The body says "if the call fails, check references/errors.md for the code mapping."
  • Full API request/response examples → references/api-patterns.md. The body has one canonical example; the rest live in the reference.
  • The decision tree for "which tool to call when" → if it has more than three branches, move it to references/decision-tree.md and link from the body.
  • Templates → assets/. A 600-line report template isn't instructions; it's an output. Don't paste it into the body.

A before/after that makes the discipline concrete

A bloated skill on disk:

report-generator/
└── SKILL.md  (8,400 words: workflow + every template + every error case)

The disciplined version:

report-generator/
├── SKILL.md  (1,200 words: the workflow, two examples, links out)
├── references/
│   ├── error-catalog.md   (loaded only when an error fires)
│   ├── api-patterns.md    (loaded only when the workflow hits the API step)
│   └── output-rules.md    (loaded only for the final formatting pass)
└── assets/
    ├── report-template.md
    └── summary-template.md

Both versions trigger on the same prompts. On a match, the second loads about 85% less content up front (1,200 words instead of 8,400). The references load only when they're actually needed.
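
To see where your own skill stands, measure the layers on disk. This is a sketch that assumes the layout shown above (a `SKILL.md` next to an optional `references/` folder of `.md` files); `layer_report` is a hypothetical helper name, and the count includes the frontmatter since it reads the whole file.

```python
from pathlib import Path


def word_count(path: Path) -> int:
    """Whitespace-separated word count of one file."""
    return len(path.read_text(encoding="utf-8").split())


def layer_report(skill_dir: Path) -> dict[str, int]:
    """Words paid on every match (body) versus words paid only on use (references)."""
    refs_dir = skill_dir / "references"
    refs = 0
    if refs_dir.is_dir():
        refs = sum(word_count(p) for p in sorted(refs_dir.glob("*.md")))
    return {"body": word_count(skill_dir / "SKILL.md"), "references": refs}
```

Call it as `layer_report(Path("report-generator"))`. A skill with a large `body` and zero `references` is the bloated shape; the disciplined shape puts most of the words in the second number.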

Reference linking the agent will actually follow

Linking to a reference is not the same as making the agent read it. The body has to give the agent a reason to navigate, in a sentence the agent can match against.

Before writing queries, consult `references/api-patterns.md` for:
- Rate limiting guidance
- Pagination patterns
- Error codes and handling

Three bullets. Each one is a phrase the agent will see in the prompt context and recognize as a reason to fetch the file. Without this kind of pointer, the reference sits unread.
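
One cheap check is whether every reference file is pointed to from the body at all. The sketch below only finds paths of the form `references/<name>.md` in the body text and compares them with a set of files you pass in; it cannot judge whether the surrounding sentence gives the agent a good reason to navigate. The function name and result keys are hypothetical.

```python
import re

# Matches paths like references/api-patterns.md inside running text.
REF_PATTERN = re.compile(r"references/[\w.-]+\.md")


def audit_links(body: str, files_on_disk: set[str]) -> dict[str, set[str]]:
    """Compare reference paths mentioned in the body with the files that exist."""
    mentioned = set(REF_PATTERN.findall(body))
    return {
        "unread": files_on_disk - mentioned,  # on disk, never pointed to
        "broken": mentioned - files_on_disk,  # pointed to, not on disk
    }
```

Anything in `unread` is a file the agent has no reason to open. Anything in `broken` is a pointer that sends it nowhere.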

When to skip progressive disclosure

Two cases where flattening everything into SKILL.md is the right call:

  • Total skill content fits in 1,500 words. Splitting into references at that size adds ceremony without saving tokens. Keep it flat.
  • Every invocation needs every section. Rare, but it happens — usually for skills that gate on a strict workflow where every step references the next. Splitting just adds navigation overhead.

The opposite mistake — splitting too aggressively, creating fifteen reference files for a skill used twice a week — is also real. Each reference is a navigation hop. If the agent has to chain four file reads before it can act, you've replaced "big file" with "fragmented file."
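
Both failure modes can be checked before you restructure anything. A minimal sketch: `should_stay_flat` encodes the two flat cases above, and `longest_read_chain` counts file reads along the deepest link path. The link map is something you would build yourself, and both function names are hypothetical.

```python
FLAT_WORD_LIMIT = 1_500  # below this, splitting adds ceremony without saving tokens


def should_stay_flat(total_words: int, every_section_always_needed: bool) -> bool:
    """True when keeping everything in SKILL.md is the right call."""
    return total_words <= FLAT_WORD_LIMIT or every_section_always_needed


def longest_read_chain(links: dict[str, list[str]], start: str = "SKILL.md") -> int:
    """Number of file reads on the longest link path beginning at `start`.

    `links` maps each file to the files it points to. Files already on the
    current path are skipped, so a link cycle cannot recurse forever.
    """
    def depth(node: str, seen: frozenset[str]) -> int:
        nexts = [n for n in links.get(node, []) if n not in seen]
        return 1 + max((depth(n, seen | {n}) for n in nexts), default=0)

    return depth(start, frozenset({start}))
```

A result of 2 means the body links straight to leaf references. A result of 4 or more is the fragmented shape: the agent chains reads before it can act.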

If your SKILL.md is over 5,000 words, you've written documentation, not a skill. The agent pays for documentation on every match. The references folder exists so it doesn't have to.
