Claude Code Is a Distributed System. Stop Treating It Like a Prompt.

Seven architectural layers sit between you and the model in Claude Code, and each one is load-bearing. Here's what every layer actually buys you when you're eight hours into a refactor — and what specifically breaks when a tool pretends it doesn't need one.

Initial Editor · 2026-04-24 · 7 min read · 1,399 words

Most "AI coding tools" ship two files: a prompt and a button. Claude Code ships seven architectural layers, and you feel the difference the first time you try to do anything longer than a single turn.

The LinkedIn-ready way to describe it is "a small operating system for autonomous engineering." The more useful way: each layer exists because the simpler version kept hitting a specific class of failure. Strip any one of them and you're back to a toy.

Here are the seven, and what each one actually buys you when you're eight hours into a refactor.

The layers at a glance

| Layer | What it does | What breaks without it |
| --- | --- | --- |
| Input | Permission gate, session manager | You babysit every bash call, or let it run wild |
| Master loop | Perception → Action → Observation | Agent can't course-correct mid-task |
| Multi-agent | Subagent spawners, worktree isolators | Helper output pollutes your main context until it overflows |
| Knowledge | Skills, memory, compression | You pay for context you're not using this turn |
| Execution | Tool dispatch, prompt cache, streaming | Long sessions cost 3–5× what they should |
| Observability | Hooks, event bus, background executor | "Do X on every edit" is something you remember, not something the tool enforces |
| Integration | MCP runtime | Your GitHub integration is locked to one vendor |

1. The permission gate is why you can walk away

The Input Layer sits in front of every tool call with a rule-driven Deny / Allow / Ask gate, configured in JSON. That sounds like compliance theater until you try to run an agent unattended and realize the alternative is either approving every git status by hand or setting --dangerously-skip-permissions and hoping.

The actual payoff: you write ~/.claude/settings.json with a permissions block that lists rules like "Bash(git diff:*)" under "allow" and "Bash(rm:*)" under "deny". Routine commands stop prompting. Denied ones are blocked outright, and anything you haven't listed still asks. The Session Manager persists conversation state across restarts, so --continue actually continues instead of silently rebooting.

Where this breaks down: allowlists are only as strong as your patterns. An allow rule like "Bash(git:*)" waves through git push --force too. Write the patterns narrow or accept the prompts.
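
To see why pattern width matters, here is a toy model of prefix-rule matching. It is a sketch, not Claude Code's actual matcher: the rule_matches and decide helpers are hypothetical, and the precedence (deny beats allow, everything else prompts) is an assumption made for illustration.

```python
from typing import Literal

Decision = Literal["allow", "deny", "ask"]


def rule_matches(rule: str, command: str) -> bool:
    """Toy matcher for rules shaped like 'Bash(git diff:*)' or 'Bash(git status)'."""
    if not (rule.startswith("Bash(") and rule.endswith(")")):
        return False
    pattern = rule[len("Bash("):-1]
    if pattern.endswith(":*"):
        prefix = pattern[:-2]
        # Prefix rule: the bare command, or the command followed by arguments.
        return command == prefix or command.startswith(prefix + " ")
    return command == pattern


def decide(command: str, allow: list[str], deny: list[str]) -> Decision:
    """Assumed precedence: deny beats allow; anything unmatched prompts."""
    if any(rule_matches(r, command) for r in deny):
        return "deny"
    if any(rule_matches(r, command) for r in allow):
        return "allow"
    return "ask"


narrow = ["Bash(git diff:*)", "Bash(git status)"]
broad = ["Bash(git:*)"]
deny = ["Bash(rm:*)"]

print(decide("git diff --stat", narrow, deny))   # allow
print(decide("git push --force", narrow, deny))  # ask
print(decide("git push --force", broad, deny))   # allow
print(decide("rm -rf build", broad, deny))       # deny
```

The third line is the failure mode: the broad rule never looks past the first word, so the force-push rides through on the same rule that was written for git status.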

2. One loop beats a pipeline

Most agent frameworks hard-code the flow — reader → planner → coder → tester, in that order. Claude Code has one loop: Perception → Action → Observation, repeated until the task is done. Every tool call, every file read, every edit goes through the same loop.

Why it matters: the agent can read a file, hit an unexpected error, decide to read a second file, and try a different approach — without an orchestration framework blocking the course-correction. Watch a long session and you'll see the loop re-plan three or four times on a single task. A fixed DAG cannot do that without a state machine the framework author never wrote.

Honest caveat: because the loop is open-ended, cost scales with the number of turns, not just the size of the task. Every extra turn re-sends the accumulated context, so a chatty agent on a hard problem is expensive in a way a pipelined agent isn't.
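
The shape of that loop fits in a few lines. This is a minimal sketch with hypothetical stand-ins: toy_policy plays the model, toy_tool plays the tool layer, and FAKE_FILES is invented data. None of it is Claude Code's real implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    action: str
    observation: str


def run_loop(
    decide: Callable[[list[Step]], str],
    act: Callable[[str], str],
    max_turns: int = 10,
) -> list[Step]:
    """Repeat decide -> act -> observe until the policy says 'done'."""
    history: list[Step] = []
    for _ in range(max_turns):
        action = decide(history)                   # perception: review what has been seen
        if action == "done":
            break
        observation = act(action)                  # action: run one tool call
        history.append(Step(action, observation))  # observation feeds the next turn
    return history


# Hypothetical stand-ins for the model and the tools.
FAKE_FILES = {
    "config.py": "error: settings moved to settings.py",
    "settings.py": "DEBUG = False",
}


def toy_policy(history: list[Step]) -> str:
    if not history:
        return "read config.py"
    if "error" in history[-1].observation:
        return "read settings.py"  # course-correct based on what came back
    return "done"


def toy_tool(action: str) -> str:
    return FAKE_FILES[action.removeprefix("read ")]


steps = run_loop(toy_policy, toy_tool)
print([s.action for s in steps])  # ['read config.py', 'read settings.py']
```

The second read was never planned up front. It exists only because the first observation came back with an error, which is the course-correction a fixed pipeline has no slot for. The max_turns cap is the cheap guard against the cost problem described above.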

3. Subagents exist so your main context doesn't rot

The Multi-Agent Layer is not really about "a team of agents working together." It's about context discipline. Every file a helper reads, every grep output it processes, would otherwise land in your main context window. Two hours in, you're out of room for anything that matters.

Claude Code's subagents (the Task tool, with subagent_type: Explore as the canonical example) return a summary to the parent, not their transcript. The parent gets "found three files matching pattern X", not the 40 KB of grep output that conclusion was built from. Worktree Isolators apply the same idea to filesystem state: each parallel task gets its own git worktree, a separate checkout on its own branch, so two agents editing the same file don't trample each other's working copy.

When not to delegate: if you actually need the raw output (the exact log line, a full diff, two implementations side by side), a summary-only return loses the signal. Do that work in the main turn.
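
The summary-only contract is easy to model. In this sketch the explore_subagent function and the repo contents are hypothetical; the point is only that the helper's raw reads stay in its own scratch space while the parent's context grows by one line.

```python
def explore_subagent(pattern: str, files: dict[str, str]) -> str:
    """Search in a private scratch context and return only a short summary."""
    scratch: list[str] = []  # the helper's own transcript, never returned
    hits: list[str] = []
    for name, text in files.items():
        scratch.append(text)  # raw content stays inside the helper
        if pattern in text:
            hits.append(name)
    return f"found {len(hits)} files matching {pattern!r}: {', '.join(sorted(hits))}"


# Hypothetical repository contents, padded to mimic large files.
repo = {
    "auth.py": "def login(): ...\n" + "# filler\n" * 2000,
    "db.py": "def connect(): ...\n" + "# filler\n" * 2000,
    "session.py": "def login_session(): ...\n" + "# filler\n" * 2000,
}

parent_context: list[str] = ["user: where is login handled?"]
parent_context.append(explore_subagent("login", repo))

raw_size = sum(len(text) for text in repo.values())
parent_size = sum(len(message) for message in parent_context)

print(parent_context[-1])  # found 2 files matching 'login': auth.py, session.py
print(parent_size < raw_size / 100)  # True
```

The same sketch shows the cost of delegating: the exact matching lines are gone by the time the parent sees the result, which is why raw-output work belongs in the main turn.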

4. The Knowledge Layer is the quiet ceiling-raiser

Task Graph, Skill Registry, Memory Store, Context Compressor. Four components, one job: keep the working set small.

The most visible is the Skill Registry. Tools that ship skills as "always-loaded system prompts" bloat every request, whether the skill is relevant or not. Claude Code loads skills on demand — 80 installed, pay the tokens for the two that fired this turn. Memory persists across sessions so "we use pnpm here" survives /clear. The Context Compressor keeps long sessions under the window without silently losing the parts you still need.

Honest caveat: skill matching is not magic. A skill with a vague description field won't fire when you need it. The registry is only as good as the metadata each skill carries, which means your skill library needs maintenance, not just accumulation.
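
A toy registry makes both the saving and the caveat concrete. Everything here is hypothetical: the Skill fields, the skill names, and especially the keyword-overlap matcher, which stands in for whatever matching the real registry does.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short metadata, always visible
    body: str         # full instructions, loaded only when the skill fires


def select_skills(request: str, registry: list[Skill]) -> list[Skill]:
    """Toy matcher: a skill fires when a description word appears in the request."""
    words = set(request.lower().split())
    return [s for s in registry if words & set(s.description.lower().split())]


registry = [
    Skill("pdf-report", "generate pdf report", "x" * 4000),
    Skill("db-migrate", "write database migration", "x" * 4000),
    Skill("helper", "misc stuff", "x" * 4000),  # vague metadata
]

fired = select_skills("add a database migration for the users table", registry)
always_loaded = sum(len(s.body) for s in registry)
on_demand = sum(len(s.body) for s in fired)

print([s.name for s in fired])   # ['db-migrate']
print(always_loaded, on_demand)  # 12000 4000
```

The helper skill never fires for any realistic request, because nothing in "misc stuff" describes when to use it. That is the maintenance problem in miniature.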

5. The prompt cache is the unglamorous money-saver

Execution Layer: Typed Tool Dispatch, Prompt Cache, Streaming Runtime. The one that actually changes your bill is the cache.

Anthropic's prompt cache has a default 5-minute TTL, refreshed each time the cached prefix is read, and bills cached tokens at roughly 10% of the full input rate. Claude Code structures its context so the unchanging parts (system prompt, loaded files, earlier messages) sit at the front of the prefix and hit the cache every turn. A naive client re-bills the full conversation every turn; Claude Code doesn't. Long coding sessions end up 3–5× cheaper than the same session built without cache discipline.

Where this breaks down: any pause longer than the TTL, such as a 20-minute think, loses the warm cache. The next turn pays full freight before the cache refills. If you step away for lunch, expect one expensive turn when you come back.
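
Some back-of-envelope arithmetic shows where the saving comes from. This sketch counts input tokens only, in units of the full input rate. The session shape (20,000-token starting context, 2,000 new tokens per turn, 50 turns) is invented, and the two multipliers are assumptions: 10% for cache reads as stated above, and a 1.25× premium for cache writes.

```python
CACHE_READ = 0.10   # assumed: cached tokens billed at ~10% of the input rate
CACHE_WRITE = 1.25  # assumed: premium for writing new tokens into the cache


def naive_cost(base: int, delta: int, turns: int) -> float:
    """Every turn re-bills the whole context at the full input rate."""
    return float(sum(base + t * delta for t in range(turns)))


def cached_cost(base: int, delta: int, turns: int) -> float:
    """Each turn reads the existing prefix from cache and writes only the new tokens."""
    total = base * CACHE_WRITE           # first turn populates the cache
    for t in range(1, turns):
        prefix = base + (t - 1) * delta  # already cached by earlier turns
        total += prefix * CACHE_READ + delta * CACHE_WRITE
    return total


naive = naive_cost(20_000, 2_000, 50)
cached = cached_cost(20_000, 2_000, 50)
print(f"naive={naive:,.0f} cached={cached:,.0f} ratio={naive / cached:.1f}x")
# naive=3,450,000 cached=480,700 ratio=7.2x
```

The input-side ratio is a best case. It assumes the cache never expires mid-session and ignores output tokens, which are never cached, so the saving on a whole bill will come out lower.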

6. Hooks make "I wish it did X" enforceable

The Observability Layer gives you an Event Bus with PreToolUse, PostToolUse, UserPromptSubmit, Stop, and a handful of other lifecycle events. Each one can run a shell command that the harness executes — not the agent.

That last detail is the whole point. "Run prettier after every edit" or "block commits to main" stops being something you have to remember and remind the agent about. It becomes a PostToolUse entry (or, for the blocking case, a PreToolUse entry) in settings.json that the harness enforces regardless of what the agent decides. The Background Executor keeps long-running processes alive across turns, so your dev server, test watcher, and build don't die between questions.

Honest caveat: hooks run as ordinary shell commands with your own user permissions, outside the permission gate. A hook that calls curl | sh on untrusted input is a foot-gun wearing a costume. Keep them simple, keep them local.
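
Here is roughly what the script behind a blocking PreToolUse hook could look like. Treat the details as assumptions to check against the current hooks documentation: that the harness passes the tool call as JSON on stdin with tool_name and tool_input fields, and that exit code 2 blocks the call. The force-push policy and both function names are hypothetical.

```python
import json
import sys
from typing import Optional


def check_bash_event(event: dict) -> Optional[str]:
    """Return a reason to block, or None to let the tool call proceed."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    tokens = command.split()
    if tokens[:2] == ["git", "push"] and ("--force" in tokens or "-f" in tokens):
        return "force-push is blocked by policy"
    return None


def main(stdin_text: str) -> int:
    """Return 0 to allow the call; 2 is assumed to block it."""
    reason = check_bash_event(json.loads(stdin_text))
    if reason is None:
        return 0
    print(reason, file=sys.stderr)  # the explanation goes to stderr
    return 2


# A real hook script would end with: sys.exit(main(sys.stdin.read()))
sample = json.dumps(
    {"tool_name": "Bash", "tool_input": {"command": "git push --force origin main"}}
)
print(main(sample))  # 2
```

Keeping the decision in a pure function means the policy can be unit-tested without a live session, which matters for something that runs on every tool call.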

7. MCP is why your tools outlive Claude Code

The Integration Layer is MCP (Model Context Protocol). Write a filesystem server, a GitHub server, a Slack server once, and it runs in Claude Code, Cursor, Zed, and anything else that speaks the protocol. Your tooling is portable. The client stops being the platform.

Compare that to closed ecosystems where your GitHub integration is someone else's product, shipped on their roadmap, killed on their schedule. MCP flips the default: your tools plug in, not the other way around. It's also why new servers appear for things Anthropic has never heard of — Postgres, Linear, Figma, your company's internal API — without anyone asking permission.

Honest caveat: MCP is still young. Server quality varies, auth stories are inconsistent, and "works great in one client" doesn't always mean "works great in all of them." Pin versions, don't trust demos.
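
Portability falls out of the wire format: MCP messages are JSON-RPC 2.0, so any client that can build the same request can drive the same server. A minimal sketch of a tool invocation follows. The tool_call_request helper, the create_issue tool, and its arguments are hypothetical; a real server advertises its own tools in response to a tools/list request.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC requests carry a unique id


def tool_call_request(name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Hypothetical tool name and arguments, for illustration only.
msg = tool_call_request("create_issue", {"repo": "acme/api", "title": "Flaky test"})
print(msg)
```

Nothing in that message names the client. The differences between clients live in transport, auth, and how much of the protocol each one implements, which is where the caveat above bites.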

What this means when you pick tools

Every layer above exists because the simpler version of Claude Code kept hitting a specific class of failure. Permissions prevented destructive bash calls. The master loop prevented fixed-DAG paralysis. Subagents prevented context rot. On-demand skills and compression prevented paying for context you weren't using. The cache prevented runaway bills. Hooks prevented "we agreed the agent would do X" turning into tribal knowledge. MCP prevented the tooling moat from reforming somewhere else.

You can tell the layers are load-bearing because tools that launched without them are adding them back. Single-turn IDE assistants grew background agents. Thin agent frameworks grew hooks, permissions, and cache-aware context builders — usually late, usually bolted on, usually never quite native.

Most agent tools are a demo on Monday and tech debt by Friday. The difference isn't the model. It's whether the seven layers around the model are load-bearing or optional.

Ship with the one that built them in.
