The Five Layers of an Agent Setup — and When Each One Is Premature

Every agent setup looks the same on day one — a single CLAUDE.md and a handful of half-remembered prompts. Six months later, some have grown into clean, layered systems that disappear into the work. Most have grown into a yard sale.

The difference isn't how much got added. It's whether each thing was earned by a symptom the team actually had.

Five layers are worth knowing about, and they sit in a strict order. Each earns something specific. Each has a failure mode if you reach for it too early. Here's the ladder, with the conditions under which each layer is still premature.

Layer	What it earns you	Add it when	Skip it when
Memory	Always-loaded constitution	The agent re-asks the same architecture questions every session	A 40-line file already covers the rules
Skills	On-demand, modular context	You paste the same 200-line preamble into five prompts	The repeated prompts are repeated by one person on one project
Hooks	Deterministic guardrails	An incident already happened — or is one careless command away	"I'll just be careful" still holds
Subagents	Forked context windows	Main context hits 70% before the work is done	The whole task fits comfortably in one window
Plugins	Distribution to a team	Three teammates rebuilt the same skill from scratch	You're the only user, and you'll edit it again next week

1. Memory is the only layer everyone needs

A CLAUDE.md at the repo root is non-negotiable. It's the only layer that's always loaded, always active. Treat it as the constitution — architecture rules, naming conventions, test expectations, the repo's mental model in one place.

Two locations matter. ~/.claude/CLAUDE.md holds cross-project conventions: commit message rules, your preferred test runner, "never use --no-verify," "don't add Co-Authored-By lines." .claude/CLAUDE.md per repo holds repo-specific facts: which directory holds the migrations, which framework version is actually installed, what the deploy target is.

When to keep it small. If your CLAUDE.md is over 200 lines, you've turned it into documentation. It's not documentation — it's the prompt prefix on every interaction. Every line is paid for in tokens forever. Move the long-form context into a skill that loads on demand.

When not to add anything past this layer. If your repo is small enough that a 50-line CLAUDE.md captures the rules and the agent stays on track, stop. The next four layers exist to fix problems you don't have yet.

2. Skills are for prompts you've typed twice

The moment you catch yourself pasting the same 150-line preamble into a second prompt — "remember, our error handling pattern is X, here's the helper, here's the schema…" — that's the signal. Extract it into a skill.

A skill is a SKILL.md plus whatever scripts, templates, or reference docs go with it. It loads on demand based on description match, not on every turn. That's the unlock: you can have fifty skills installed and pay zero baseline tokens for the forty-eight you don't invoke this session.

Two patterns earn their keep:

Workflow skills — "draft-a-PR", "review-this-migration", "create-a-post." They encode a sequence of steps with a checklist and gate each step on user input.
Knowledge-pack skills — "the auth module", "the billing pipeline." They bundle reference material about one part of the codebase and only load when the conversation drifts that way.

When not to add a skill. If it's only ever invoked by you, only on one project, and the "skill" is two paragraphs of context — you've written a markdown note with extra ceremony. A regular note in the repo will outlive the skill format.

The other failure: a skill that always fires. If your skill's description is broad enough that it matches every prompt, it's a second CLAUDE.md wearing a hat. Tighten the description until it only catches the cases it was built for.

3. Hooks are for the second time something burned

Hooks are deterministic. They fire on events — PreToolUse, PostToolUse, Stop, SessionStart, SubagentStop — match against a pattern, and run a shell command. No model in the loop. No interpretation. They behave like git hooks for your agent.

The honest use cases are narrow:

PreToolUse matching rm -rf on shared paths → block. The hook saves you exactly once. That once is the entire ROI.
PostToolUse on file writes → run prettier or eslint --fix. Closes the style loop without burning prompt tokens on "remember to format."
Stop → desktop notification or Slack ping when a long-running task finishes. Cheap to wire up, recovers minutes of attention per day.

When not to add a hook. If you're trying to enforce a judgment — "only commit if the change is small enough," "block the edit if it touches sensitive logic" — you've picked the wrong layer. Hooks check matchers, not nuance. Put judgment in a skill or a subagent and let the model reason.

The other failure mode: hooks that fire on every tool call and add hundreds of milliseconds. If your hook isn't fast enough to be invisible, it's slow enough to be resented. A 200ms hook on PostToolUse is a tax on every edit.

4. Subagents are for when context becomes the bottleneck

A subagent is a fork — its own context window, its own model choice, its own tools, its own permissions. The parent delegates a task, gets back a result, and never sees the intermediate work.

The honest reason to reach for one is context pressure. You're doing a real piece of work in the main thread — designing a migration, drafting a release — and you don't want a 40,000-token exploration of "which files reference the old API" filling the same window. Delegate the search. Get back a summary. Keep moving.

The other honest reason is permission scoping. A test-runner subagent that can execute the test suite but can't edit code is safer than the same capabilities granted to the main agent. A code-reviewer subagent with read-only access can audit the diff without the temptation to "just fix" things on the way.

When not to add a subagent. If the task fits in the parent's context comfortably, a subagent just adds latency. Each delegation pays a round-trip cost in tokens (the prompt) and wall clock (a fresh model call). Subagents earn their keep on long-running searches, repetitive sub-tasks, and risky permissions — not on one-shot questions.

A subagent that returns 8,000 tokens to your parent context has cost you context, not saved it. Tune the prompt so it returns a summary the parent can act on, not a transcript of how it got there.

And subagents can't spawn subagents. Plan the tree before you fork.

5. Plugins are the last layer, and most setups never get there

A plugin bundles skills, agents, hooks, and slash commands into a distributable unit. Think npm packages for agent capabilities — versioned, installable, shareable.

The case for one is real and narrow: you've built a setup that works for you, three teammates have asked how to get the same thing, and copy-pasting the files has started to drift across machines. Package it. Publish it — private registry, GitHub repo, internal marketplace, whichever fits.

When not to build a plugin. If you are the only user. If your skills change weekly. If the team you'd distribute to is two people who sit next to you and could just clone the dotfiles. Plugins introduce a versioning surface — once someone installs your-plugin@1.2.0, breaking changes have costs. Most agent setups die long before they earn that overhead.

The honest test: would you publish this to npm if the format were JavaScript? If the answer is "no, it's too specific to my workflow," it's also too specific to be a plugin.

Two side rails worth knowing

MCP servers and agent teams sit beside this ladder rather than on it.

MCP servers are how agents reach external systems — GitHub, databases, APIs, custom integrations. They aren't a "layer" in the same sense; they're the connectors between the layers and the outside world. Add one when an integration you keep approximating in prompts could be a tool call instead. Skip it when the integration is one gh command you can just shell out to.

Agent teams — multiple agents coordinating with shared permissions and message passing — are the same story as subagents, one level up. They earn their keep when work decomposes cleanly into parallel streams (an "explorer" mapping the repo while a "writer" drafts against the result). They don't earn their keep when the "team" is one agent with extra ceremony and a Slack channel.

The order matters more than the inventory

Most setups skip straight from layer 1 to layer 5 — a CLAUDE.md and a marketplace install of someone else's plugin. The skipped layers come back as friction:

No skills → the same context gets crammed into every prompt, and the prompt prefix grows until CLAUDE.md is unreadable.
No hooks → the same mistake (the bad rm, the missing format, the silent on-finish) happens four times before someone wires the guard.
No subagents → the main context hits its ceiling at 60% of the work done, and the rest of the session is a fight against compaction.

Build the ladder one rung at a time, in order, against a symptom you actually have. The ceiling on a good agent setup isn't how many layers you've added. It's how many of them you can ignore on any given day because they're working quietly in the background.

If most of your setup feels invisible most of the time, you've done it right. If you're managing tools instead of using them, you've added a layer that hadn't earned its rung yet.

The Five Layers of an Agent Setup — and When Each One Is Premature

1. Memory is the only layer everyone needs

2. Skills are for prompts you've typed twice

3. Hooks are for the second time something burned

4. Subagents are for when context becomes the bottleneck

5. Plugins are the last layer, and most setups never get there

Two side rails worth knowing

The order matters more than the inventory

// more in tech

The Smallest Agent That Works, Part 3: The Three Agents With State

The Smallest Agent That Works, Part 2: The Three Reach-Out Agents

The Smallest Agent That Works, Part 1: The Three Cheap Agents

What MLX Got to Throw Away (That PyTorch Can't)

The Unified-Memory Bet: Why On-Device Inference Stopped Being a Toy

Every Useful Skill Is One of Five Shapes

The Five Layers of an Agent Setup — and When Each One Is Premature

1. Memory is the only layer everyone needs

2. Skills are for prompts you've typed twice

3. Hooks are for the second time something burned

4. Subagents are for when context becomes the bottleneck

5. Plugins are the last layer, and most setups never get there

Two side rails worth knowing

The order matters more than the inventory

// more in tech

The Smallest Agent That Works, Part 3: The Three Agents With State

The Smallest Agent That Works, Part 2: The Three Reach-Out Agents

The Smallest Agent That Works, Part 1: The Three Cheap Agents

What MLX Got to Throw Away (That PyTorch Can't)

The Unified-Memory Bet: Why On-Device Inference Stopped Being a Toy

Every Useful Skill Is One of Five Shapes

New posts, every week.Delivered Sunday mornings.

New posts, every week.
Delivered Sunday mornings.