
The Five Layers of an Agent Setup — and When Each One Is Premature

Memory, skills, hooks, subagents, plugins. Five layers in a strict order, each earning a specific payoff, each a mistake if you reach for it before a symptom asks for it. The honest part of the ladder is the rungs you don't climb yet.

Initial Editor · 2026-05-11 · 8 min read · 1,652 words

Every agent setup looks the same on day one — a single CLAUDE.md and a handful of half-remembered prompts. Six months later, some have grown into clean, layered systems that disappear into the work. Most have grown into a yard sale.

The difference isn't how much got added. It's whether each thing was earned by a symptom the team actually had.

Five layers are worth knowing about, and they sit in a strict order. Each earns something specific. Each has a failure mode if you reach for it too early. Here's the ladder, with the conditions under which each layer is still premature.

| Layer | What it earns you | Add it when | Skip it when |
|---|---|---|---|
| Memory | Always-loaded constitution | The agent re-asks the same architecture questions every session | A 40-line file already covers the rules |
| Skills | On-demand, modular context | You paste the same 200-line preamble into five prompts | The repeated prompts are repeated by one person on one project |
| Hooks | Deterministic guardrails | An incident already happened, or is one careless command away | "I'll just be careful" still holds |
| Subagents | Forked context windows | Main context hits 70% before the work is done | The whole task fits comfortably in one window |
| Plugins | Distribution to a team | Three teammates rebuilt the same skill from scratch | You're the only user, and you'll edit it again next week |
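
The ladder can be read as a small decision function: a layer is earned only when its "add it when" symptom is present. The sketch below is one way to write that down, not a prescribed tool. The field names are illustrative assumptions, and memory is always included because it is the one layer treated here as non-negotiable.

```python
from dataclasses import dataclass

# Layer order as listed in the ladder.
LADDER = ["memory", "skills", "hooks", "subagents", "plugins"]


@dataclass
class Symptoms:
    # Field names are hypothetical; each mirrors one "add it when" condition.
    pastes_same_preamble: bool = False          # same long preamble pasted into several prompts
    incident_happened_or_close: bool = False    # an incident happened, or nearly did
    context_fills_before_done: bool = False     # main context fills before the work is done
    teammates_rebuilt_same_skill: bool = False  # teammates rebuilt the same skill from scratch


def earned_layers(s: Symptoms) -> list[str]:
    """Return the layers whose symptom is present, in ladder order."""
    earned = {
        "memory": True,  # always present: the one layer every setup needs
        "skills": s.pastes_same_preamble,
        "hooks": s.incident_happened_or_close,
        "subagents": s.context_fills_before_done,
        "plugins": s.teammates_rebuilt_same_skill,
    }
    return [layer for layer in LADDER if earned[layer]]
```

With no symptoms set, `earned_layers(Symptoms())` returns only `["memory"]`, which is the point: everything else waits for a reason.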

1. Memory is the only layer everyone needs

A CLAUDE.md at the repo root is non-negotiable. It's the only layer that's always loaded, always active. Treat it as the constitution — architecture rules, naming conventions, test expectations, the repo's mental model in one place.

Two locations matter. ~/.claude/CLAUDE.md holds cross-project conventions: commit message rules, your preferred test runner, "never use --no-verify," "don't add Co-Authored-By lines." The per-repo file, CLAUDE.md at the repo root or .claude/CLAUDE.md, holds repo-specific facts: which directory holds the migrations, which framework version is actually installed, what the deploy target is.

When to keep it small. If your CLAUDE.md is over 200 lines, you've turned it into documentation. It's not documentation — it's the prompt prefix on every interaction. Every line is paid for in tokens forever. Move the long-form context into a skill that loads on demand.
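
A rough way to keep an eye on that cost is to measure the file. The sketch below counts lines against the 200-line threshold and estimates tokens with a four-characters-per-token heuristic. That ratio is an assumption, not a property of any particular tokenizer, and the function names are illustrative.

```python
from pathlib import Path

MAX_LINES = 200       # threshold past which the file has become documentation
CHARS_PER_TOKEN = 4   # rough assumption; real tokenizers vary


def memory_budget(text: str) -> dict:
    """Count lines and estimate the tokens this text adds to every interaction."""
    line_count = len(text.splitlines())
    return {
        "lines": line_count,
        "approx_tokens": len(text) // CHARS_PER_TOKEN,
        "over_budget": line_count > MAX_LINES,
    }


def check_file(path: str = "CLAUDE.md") -> dict:
    """Run the same check against a file on disk."""
    return memory_budget(Path(path).read_text(encoding="utf-8"))
```

Running `check_file()` in a repo root gives a quick signal for when long-form context should move out of memory and into a skill.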

When not to add anything past this layer. If your repo is small enough that a 50-line CLAUDE.md captures the rules and the agent stays on track, stop. The next four layers exist to fix problems you don't have yet.

2. Skills are for prompts you've typed twice

The moment you catch yourself pasting the same 150-line preamble into a second prompt — "remember, our error handling pattern is X, here's the helper, here's the schema…" — that's the signal. Extract it into a skill.

A skill is a SKILL.md plus whatever scripts, templates, or reference docs go with it. It loads on demand based on description match, not on every turn. That's the unlock: you can have fifty skills installed and pay only for a short name and description each until one is invoked. The bodies of the forty-eight you don't use this session cost nothing.
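
To make the shape concrete, here is a minimal sketch that scaffolds a skill directory. It assumes a SKILL.md that opens with `name` and `description` frontmatter followed by the instructions. The helper name `scaffold_skill` and the example values are hypothetical, and the description is written unquoted, so it should avoid characters that are special in YAML.

```python
from pathlib import Path


def scaffold_skill(root: Path, name: str, description: str, body: str) -> Path:
    """Create <root>/<name>/SKILL.md with name/description frontmatter and a body."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    content = (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"  # keep this narrow; it decides when the skill loads
        "---\n\n"
        f"{body.rstrip()}\n"
    )
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(content, encoding="utf-8")
    return skill_file
```

A call such as `scaffold_skill(Path(".claude/skills"), "review-this-migration", "Use when reviewing a database migration file.", "1. Read the migration.")` would produce one skill folder. The `.claude/skills` location is an assumption about where project skills live; check your agent's documentation.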

Two patterns earn their keep:

  • Workflow skills — "draft-a-PR", "review-this-migration", "create-a-post." They encode a sequence of steps with a checklist and gate each step on user input.
  • Knowledge-pack skills — "the auth module", "the billing pipeline." They bundle reference material about one part of the codebase and only load when the conversation drifts that way.

When not to add a skill. If it's only ever invoked by you, only on one project, and the "skill" is two paragraphs of context — you've written a markdown note with extra ceremony. A regular note in the repo will outlive the skill format.

The other failure: a skill that always fires. If your skill's description is broad enough that it matches every prompt, it's a second CLAUDE.md wearing a hat. Tighten the description until it only catches the cases it was built for.

3. Hooks are for the second time something burned

Hooks are deterministic. They fire on events — PreToolUse, PostToolUse, Stop, SessionStart, SubagentStop — match against a pattern, and run a shell command. No model in the loop. No interpretation. They behave like git hooks for your agent.

The honest use cases are narrow:

  • PreToolUse matching rm -rf on shared paths → block. The hook saves you exactly once. That once is the entire ROI.
  • PostToolUse on file writes → run prettier or eslint --fix. Closes the style loop without burning prompt tokens on "remember to format."
  • Stop → desktop notification or Slack ping when a long-running task finishes. Cheap to wire up, recovers minutes of attention per day.
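
The first case above could be sketched as a small script. It assumes the hook receives a JSON payload on stdin with `tool_name` and `tool_input.command` fields, and that exit code 2 tells the agent to block the call and surfaces stderr as the reason. Verify both against your agent's hook documentation. `PROTECTED` is a hypothetical list of shared paths, and the check is deliberately narrow: it only catches commands that start with `rm` and name an absolute protected path.

```python
import json
import shlex
import sys

PROTECTED = ("/shared", "/srv/data")  # hypothetical shared paths; adjust locally


def is_dangerous_rm(command: str, protected=PROTECTED) -> bool:
    """True for `rm` with recursive and force flags aimed at a protected path."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: not something this narrow check handles
    if not tokens or tokens[0] != "rm":
        return False
    short_flags = "".join(
        t[1:] for t in tokens[1:] if t.startswith("-") and not t.startswith("--")
    )
    recursive = "r" in short_flags or "R" in short_flags or "--recursive" in tokens
    force = "f" in short_flags or "--force" in tokens
    targets = [t for t in tokens[1:] if not t.startswith("-")]
    hits = any(t == p or t.startswith(p + "/") for t in targets for p in protected)
    return recursive and force and hits


def decide(payload: dict) -> int:
    """Return the exit code: 0 allows the call, 2 asks for it to be blocked."""
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    if is_dangerous_rm(command):
        print(f"Blocked recursive delete on a protected path: {command}", file=sys.stderr)
        return 2
    return 0


def main() -> int:
    """Entry point for hook use: read the payload from stdin and decide."""
    return decide(json.load(sys.stdin))

# When saved as a hook script, end the file with: sys.exit(main())
```

There is no model in this path, only string checks, which is what makes the guard deterministic. It is also why it cannot judge intent: `rm -rf ./tmp` passes and `rm -rf /shared/builds` does not, regardless of context.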

When not to add a hook. If you're trying to enforce a judgment — "only commit if the change is small enough," "block the edit if it touches sensitive logic" — you've picked the wrong layer. Hooks check matchers, not nuance. Put judgment in a skill or a subagent and let the model reason.

The other failure mode: hooks that fire on every tool call and add hundreds of milliseconds. If your hook isn't fast enough to be invisible, it's slow enough to be resented. A 200ms hook on PostToolUse is a tax on every edit.
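
One way to find out whether a hook is invisible is to time it before wiring it in. The sketch below runs a hook command several times with a sample payload on stdin and reports the median wall-clock time. The function names and the idea of a fixed millisecond budget are assumptions for illustration, and the measurement includes process startup, which is usually the dominant cost for small scripts.

```python
import subprocess
import time


def time_hook(cmd: list[str], payload: str = "{}", runs: int = 5) -> float:
    """Return the median wall-clock time in milliseconds for running cmd with payload on stdin."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, input=payload, text=True, capture_output=True, check=False)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return samples[len(samples) // 2]


def within_budget(median_ms: float, budget_ms: float) -> bool:
    """True when the measured median is under the budget you chose."""
    return median_ms < budget_ms
```

For example, `time_hook(["python3", "guard.py"], payload='{"tool_name": "Bash"}')` would time a hypothetical `guard.py`. Pick the budget yourself; the only fixed point here is that 200ms per edit is already too slow.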

4. Subagents are for when context becomes the bottleneck

A subagent is a fork — its own context window, its own model choice, its own tools, its own permissions. The parent delegates a task, gets back a result, and never sees the intermediate work.

The honest reason to reach for one is context pressure. You're doing a real piece of work in the main thread — designing a migration, drafting a release — and you don't want a 40,000-token exploration of "which files reference the old API" filling the same window. Delegate the search. Get back a summary. Keep moving.

The other honest reason is permission scoping. A test-runner subagent that can execute the test suite but can't edit code is safer than the same capabilities granted to the main agent. A code-reviewer subagent with read-only access can audit the diff without the temptation to "just fix" things on the way.

When not to add a subagent. If the task fits in the parent's context comfortably, a subagent just adds latency. Each delegation pays a round-trip cost in tokens (the prompt) and wall clock (a fresh model call). Subagents earn their keep on long-running searches, repetitive sub-tasks, and risky permissions — not on one-shot questions.

A subagent that hands 8,000 tokens back to the parent has spent much of the context it was meant to save, and on a small task it can cost more than doing the work inline. Tune the prompt so it returns a summary the parent can act on, not a transcript of how it got there.
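
The trade-off is plain arithmetic on the parent's window. The sketch below compares what the work would have cost inline with what delegation puts into the parent: the brief it writes plus whatever comes back. The 40,000 and 8,000 figures come from the examples above. The brief size, the 500-token summary, and the 5,000-token inline task are hypothetical numbers chosen for illustration.

```python
def net_context_saved(inline_tokens: int, returned_tokens: int, brief_tokens: int = 0) -> int:
    """Tokens kept out of the parent's window by delegating.

    Positive means delegation saved parent context; negative means it added to it.
    """
    return inline_tokens - (returned_tokens + brief_tokens)


# A 40,000-token exploration that comes back as a tight 500-token summary.
tight = net_context_saved(inline_tokens=40_000, returned_tokens=500, brief_tokens=300)

# The same exploration returning an 8,000-token transcript instead.
verbose = net_context_saved(inline_tokens=40_000, returned_tokens=8_000, brief_tokens=300)

# A small lookup that would have taken 5,000 tokens inline but returns 8,000.
backfired = net_context_saved(inline_tokens=5_000, returned_tokens=8_000, brief_tokens=300)
```

Under these assumed numbers the tight summary saves 39,200 tokens, the verbose one 31,700, and the small lookup loses 3,300. The size of the return is the variable you control.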

And subagents can't spawn subagents. Plan the tree before you fork.

5. Plugins are the last layer, and most setups never get there

A plugin bundles skills, agents, hooks, and slash commands into a distributable unit. Think npm packages for agent capabilities — versioned, installable, shareable.

The case for one is real and narrow: you've built a setup that works for you, three teammates have asked how to get the same thing, and copy-pasting the files has started to drift across machines. Package it. Publish it — private registry, GitHub repo, internal marketplace, whichever fits.

When not to build a plugin. If you are the only user. If your skills change weekly. If the team you'd distribute to is two people who sit next to you and could just clone the dotfiles. Plugins introduce a versioning surface — once someone installs your-plugin@1.2.0, breaking changes have costs. Most agent setups die long before they earn that overhead.

The honest test: would you publish this to npm if the format were JavaScript? If the answer is "no, it's too specific to my workflow," it's also too specific to be a plugin.

Two side rails worth knowing

MCP servers and agent teams sit beside this ladder rather than on it.

MCP servers are how agents reach external systems — GitHub, databases, APIs, custom integrations. They aren't a "layer" in the same sense; they're the connectors between the layers and the outside world. Add one when an integration you keep approximating in prompts could be a tool call instead. Skip it when the integration is one gh command you can just shell out to.

Agent teams — multiple agents coordinating with shared permissions and message passing — are the same story as subagents, one level up. They earn their keep when work decomposes cleanly into parallel streams (an "explorer" mapping the repo while a "writer" drafts against the result). They don't earn their keep when the "team" is one agent with extra ceremony and a Slack channel.

The order matters more than the inventory

Most setups skip straight from layer 1 to layer 5 — a CLAUDE.md and a marketplace install of someone else's plugin. The skipped layers come back as friction:

  • No skills → the same context gets pasted into every prompt, or crammed into CLAUDE.md until the file is unreadable.
  • No hooks → the same mistake (the bad rm, the missing format, the task that finished without anyone noticing) happens four times before someone wires the guard.
  • No subagents → the main context hits its ceiling with only 60% of the work done, and the rest of the session is a fight against compaction.

Build the ladder one rung at a time, in order, against a symptom you actually have. The ceiling on a good agent setup isn't how many layers you've added. It's how many of them you can ignore on any given day because they're working quietly in the background.

If most of your setup feels invisible most of the time, you've done it right. If you're managing tools instead of using them, you've added a layer that hadn't earned its rung yet.
