Tech

Claude Code's Forked Subagents Aren't an Upgrade. They're the Opposite Tool.

Anthropic shipped /fork in Claude Code v2.1.117 — a subagent that inherits the entire parent conversation instead of a compressed handoff prompt. The obvious read is "subagents but better." It's wrong. Forks and regular subagents sit at opposite ends of a single axis: context inheritance. The blank slate was never a bug. Here's the decision rule.

Initial Editor·2026-04-23·6min read·1,126 words·28 views

Anthropic shipped forked subagents in Claude Code v2.1.117 last Tuesday, then fixed the obvious scale problem two days later (v2.1.118: /fork writes a pointer to the parent transcript instead of duplicating the whole conversation per fork). A forked subagent inherits the full parent context and shares its prompt cache; a regular subagent gets a 2K-token summary and a fresh slate. These are not two settings on the same dial. They're opposite primitives, and treating /fork as a strict upgrade will quietly make your results worse in ways that are hard to debug.

The one axis: context inheritance

Every delegation primitive in Claude Code sits on a single line.

On one end: /fork inherits the entire parent conversation. Every file read, every tool result, every opinion you and the agent converged on — all of it comes along. Same prompt cache, so the marginal token cost at launch is near zero until the fork diverges from the parent.

On the other end: a regular subagent gets a prompt. A fresh one. Your 180K-token working session becomes whatever 2K-token summary Claude Code chooses to hand over. Everything else is gone.

The instinct is to treat "more context = better." It isn't better. It's different. The blank slate is a feature, and about a third of the time it's the feature you actually want.

Where forks win: nuance preservation

Imagine an hour spent negotiating a design with Claude Code: which fonts, which density, which interaction patterns, which competitor you're explicitly not copying. The conversation has 40K tokens of accumulated taste. Now you want three parallel variations.

A regular subagent gets a handoff prompt. That prompt compresses 40K tokens of nuance into a system message that names the colors and says "make it feel premium." The subagent generates something generic because the brief it received is generic. The nuance died in the compression.

/fork hands the full conversation to each variation. The variation sees you rejecting three earlier attempts at the same thing and understands why. The taste survives the handoff.

The same logic applies anywhere the accumulated context is the asset: research that needs to know what you've already ruled out, Mermaid diagrams that need to match the architecture you just argued about, MCP calls that depend on the state of the thing you've been editing.

Where forks lose: the blank slate was the point

You wrote a module. You want a review. If you fork, the reviewer inherits the conversation where you and Claude Code jointly decided on the current design. The reviewer has seen every tradeoff argument, every rejected alternative, every place you said "good enough." It will not tell you the module is wrong. It will tell you the module is consistent with what you said you wanted — which is a different claim and a much weaker one.

A fresh subagent reads the code cold. It doesn't know which shortcuts were deliberate. It doesn't know which lines you already defended. It brings the outside perspective that a review is supposed to bring.

Same structural reason for:

  • Adversarial testing. Asking "what's wrong with this argument" to a fork that helped build the argument is theater.
  • Red-team / devil's advocate work. A fork is biased by accumulated agreement.
  • Sanity-check reads. "Does this file do what the name says?" — a fork has context that shortcuts the question.
  • Parallel codebase search. For fast grep-style work, conversation context is noise, not signal.

In each case, the fork's "inherited nuance" is actually inherited bias, and the blank slate is doing structural work a fork can't do.

The decision table

Task Pick Why
Three design variations with shared taste /fork The taste is the asset; compression kills it.
Code review of what you just built Subagent Fresh eyes. Fork inherits your justifications.
Research that depends on what you've ruled out /fork Ruled-out context is hard to compress into a prompt.
Parallel codebase search Subagent Conversation context is noise for grep-style work.
Quick side question, single answer /btw Single-turn, no tools needed; lighter than /fork.
Multi-step tangent that may be thrown away /fork Multi-step, then /rewind if unhelpful.
"Challenge my premise" Subagent Forks share your premise.
Mermaid diagram of the current architecture /fork Needs all the decisions in scope.
Log verification after a long session /fork Needs to know which logs matter.
Security review Subagent The whole point is adversarial distance.

The trick: run both and read the delta

The most interesting use of /fork isn't fork-by-default. It's forking and non-forking the same question in parallel and reading the disagreement.

A fork and a subagent both answer "is this approach correct." They agree → you have real signal. They disagree → the disagreement tells you exactly where the in-context bias sits. The fork said yes because it was in the room when you decided; the subagent said no because nothing about the code on its own justifies the decision. That gap is usually where the bug lives.

This only works because the two primitives genuinely differ. Treat them as the same tool and you lose the cross-check.

When not to fork

  • You're reviewing code the main session wrote. Use a subagent. Fork will rubber-stamp.
  • You need a fresh perspective on a stuck problem. The whole point is that the stuck perspective doesn't help. Subagent.
  • The conversation is 5K tokens and the task is mechanical. Forking has no nuance to preserve; you're paying overhead for nothing. Inline or subagent.
  • You're on an external build without CLAUDE_CODE_FORK_SUBAGENT=1. Check your env before blaming the tool.
  • Transcript is genuinely huge and old. Even with v2.1.118's pointer-based storage, fork startup scales with context size. A fresh subagent launches faster.

The honest caveat

Forks are genuinely new. /btw, /recap, and the recent memory-consolidation work all run on forks under the hood, and those features are better than their pre-fork predecessors for the same reasons forks beat subagents in the design-variation case. The feature is real and load-bearing.

But the naming — "forked subagent" — suggests the same tool with a better argument list. It isn't. It's a second primitive at the opposite end of the context-inheritance axis, and the delegation decision you're making every time you hit Tab isn't "fork or not" — it's "do I want this agent to carry my assumptions, or leave them at the door?"

The blank slate was never a bug. It was the reason you delegated in the first place.

// more in tech

see all →
Tech· 2026-05-27· 5min

The Smallest Agent That Works, Part 2: The Three Reach-Out Agents

When the cheap tiers run out, the agent has to reach beyond the model itself — into knowledge it doesn't have, tools it can't natively use, or its own previous answer. RAG, tool use, and self-critique: three patterns, three failure modes worth pricing in. Part 2 of three.

#llm#rag#agent-architecture#ai-engineering
Tech· 2026-05-26· 5min

The Smallest Agent That Works, Part 1: The Three Cheap Agents

Most agent stacks are built one tier too capable for the job. Three of the cheapest architectures — a fixed pipeline, an LLM with rule constraints, and a reasoning loop — solve more problems than the architecture diagrams admit. Part 1 of three.

#llm#agent-architecture#ai-engineering#ai-agents
Tech· 2026-05-15· 5min

What MLX Got to Throw Away (That PyTorch Can't)

Every mature framework is a museum of decisions you can't take back. MLX is interesting mostly because it started after the decisions that matter for Apple Silicon were already mistakes — and the things it threw away are the things that were quietly costing the rest of us the most.

#ai-engineering#apple-silicon#mlx#ml-frameworks
Tech· 2026-05-15· 5min

The Unified-Memory Bet: Why On-Device Inference Stopped Being a Toy

For two years the industry's default answer to every inference question has been "bigger cluster." A different hardware topology is quietly making that the wrong default for a non-trivial slice of workloads — and the framework layer that earns it is the buzzword most decks haven't caught up with yet.

#hardware#ai-infrastructure#inference#edge-ai
Tech· 2026-05-14· 5min

Every Useful Skill Is One of Five Shapes

Skills aren't a freeform format. The useful ones fit one of five shapes — sequential workflow, multi-MCP coordination, iterative refinement, context-aware selection, domain-specific intelligence. Picking the right shape is most of the design work. Picking the wrong one is most of the bugs.

#claude-code#workflow#agents#skills
Tech· 2026-05-13· 5min

MCP Gives You the Kitchen. Skills Are the Recipe.

Most teams ship one of these and call the job done. MCP gives the agent tools. Skills tell it which to use, in what order, with which fallbacks. Without skills, your MCP integration ends with users asking 'okay, what now?'

#claude-code#mcp#agents#skills