Most "AI coding tools" ship two files: a prompt and a button. Claude Code ships seven architectural layers, and you feel the difference the first time you try to do anything longer than a single turn.
The LinkedIn-ready way to describe it is "a small operating system for autonomous engineering." The more useful way: each layer exists because the simpler version kept hitting a specific class of failure. Strip any one of them and you're back to a toy.
Here are the seven, and what each one actually buys you when you're eight hours into a refactor.
The layers at a glance
| Layer | What it does | What breaks without it |
|---|---|---|
| Input | Permission gate, session manager | You babysit every bash call, or let it run wild |
| Master loop | Perception → Action → Observation | Agent can't course-correct mid-task |
| Multi-agent | Subagent spawners, worktree isolators | Helper output pollutes your main context until it overflows |
| Knowledge | Skills, memory, compression | You pay for context you're not using this turn |
| Execution | Tool dispatch, prompt cache, streaming | Long sessions cost 3–5× what they should |
| Observability | Hooks, event bus, background executor | "Do X on every edit" is something you remember, not something the tool enforces |
| Integration | MCP runtime | Your GitHub integration is locked to one vendor |
1. The permission gate is why you can walk away
The Input Layer sits in front of every tool call with a YAML-driven Deny / Allow / Approve gate. That sounds like compliance theater until you try to run an agent unattended and realize the alternative is either approving every git status by hand or setting --dangerously-skip-permissions and hoping.
The actual payoff: you write ~/.claude/settings.json with patterns like "Bash(git diff:*)": "allow" and "Bash(rm:*)": "deny". Routine commands stop prompting. Destructive ones still do. The Session Manager persists conversation state across restarts, so --continue actually continues instead of silently rebooting.
Where this breaks down: allowlists are only as strong as your patterns. A rule like "Bash(git:*)": "allow" waves through git push --force too. Write the patterns narrow or accept the prompts.
2. One loop beats a pipeline
Most agent frameworks hard-code the flow — reader → planner → coder → tester, in that order. Claude Code has one loop: Perception → Action → Observation, repeated until the task is done. Every tool call, every file read, every edit goes through the same loop.
Why it matters: the agent can read a file, hit an unexpected error, decide to read a second file, and try a different approach — without an orchestration framework blocking the course-correction. Watch a long session and you'll see the loop re-plan three or four times on a single task. A fixed DAG cannot do that without a state machine the framework author never wrote.
Honest caveat: because the loop is open-ended, costs scale with turns, not tokens. A chatty agent on a hard problem is expensive in a way a pipelined agent isn't.
3. Subagents exist so your main context doesn't rot
The Multi-Agent Layer is not really about "a team of agents working together." It's about context discipline. Every file a helper reads, every grep output it processes, would otherwise land in your main context window. Two hours in, you're out of room for anything that matters.
Claude Code's subagents (the Task tool, subagent_type: Explore being the canonical one) return a summary to the parent, not their transcript. The parent gets "found three files matching pattern X" — not the 40 KB of grep output that conclusion was built from. Worktree Isolators apply the same idea to filesystem state: each parallel task gets its own git branch, so two agents editing the same file don't merge-fight.
When not to delegate: if you actually need the raw output (the exact log line, a full diff, two implementations side by side), a summary-only return loses the signal. Do that work in the main turn.
4. The Knowledge Layer is the quiet ceiling-raiser
Task Graph, Skill Registry, Memory Store, Context Compressor. Four components, one job: keep the working set small.
The most visible is the Skill Registry. Tools that ship skills as "always-loaded system prompts" bloat every request, whether the skill is relevant or not. Claude Code loads skills on demand — 80 installed, pay the tokens for the two that fired this turn. Memory persists across sessions so "we use pnpm here" survives /clear. The Context Compressor keeps long sessions under the window without silently losing the parts you still need.
Honest caveat: skill matching is not magic. A skill with a vague description field won't fire when you need it. The registry is only as good as the metadata each skill carries, which means your skill library needs maintenance, not just accumulation.
5. The prompt cache is the unglamorous money-saver
Execution Layer: Typed Tool Dispatch, Prompt Cache, Streaming Runtime. The one that actually changes your bill is the cache.
Anthropic's prompt cache has a 5-minute TTL and reads cached tokens at roughly 10% of the full input rate. Claude Code structures its context so the unchanging parts — system prompt, loaded files, earlier messages — sit at the front of the prefix and hit the cache every turn. A naive client re-bills the full conversation every turn; Claude Code doesn't. Long coding sessions end up 3–5× cheaper than the same session built without cache discipline.
Where this breaks down: a 20-minute thinking pause loses the warm cache. The next turn pays full freight before the cache refills. If you step away for lunch, expect one expensive turn when you come back.
6. Hooks make "I wish it did X" enforceable
The Observability Layer gives you an Event Bus with PreToolUse, PostToolUse, UserPromptSubmit, Stop, and a handful of other lifecycle events. Each one can run a shell command that the harness executes — not the agent.
That last detail is the whole point. "Run prettier after every edit" or "block commits to main" stops being something you have to remember and remind the agent about. It becomes a PostToolUse entry in settings.json that the harness enforces regardless of what the agent decides. The Background Executor keeps long-running processes alive across turns, so your dev server, test watcher, and build don't die between questions.
Honest caveat: hooks run with the agent's shell permissions. A hook that calls curl | sh on untrusted input is a foot-gun wearing a costume. Keep them simple, keep them local.
7. MCP is why your tools outlive Claude Code
The Integration Layer is MCP (Model Context Protocol). Write a filesystem server, a GitHub server, a Slack server once, and it runs in Claude Code, Cursor, Zed, and anything else that speaks the protocol. Your tooling is portable. The platform is not the platform.
Compare that to closed ecosystems where your GitHub integration is someone else's product, shipped on their roadmap, killed on their schedule. MCP flips the default: your tools plug in, not the other way around. It's also why new servers appear for things Anthropic has never heard of — Postgres, Linear, Figma, your company's internal API — without anyone asking permission.
Honest caveat: MCP is still young. Server quality varies, auth stories are inconsistent, and "works great in one client" doesn't always mean "works great in all of them." Pin versions, don't trust demos.
What this means when you pick tools
Every layer above exists because the simpler version of Claude Code kept hitting a specific class of failure. Permissions prevented destructive bash calls. The master loop prevented fixed-DAG paralysis. Subagents prevented context rot. The cache prevented runaway bills. Hooks prevented "we agreed the agent would do X" turning into tribal knowledge. MCP prevented the tooling moat from reforming somewhere else.
You can tell the layers are load-bearing because tools that launched without them are adding them back. Single-turn IDE assistants grew background agents. Thin agent frameworks grew hooks, permissions, and cache-aware context builders — usually late, usually bolted on, usually never quite native.
Most agent tools are a demo on Monday and tech debt by Friday. The difference isn't the model. It's whether the seven layers around the model are load-bearing or optional.
Ship with the one that built them in.