START HEREThe 30-second version, then the part the video skipped
If you watched the short: yes — Claude Code subagents can now spawn their own subagents, five layers deep, and that lets you fan a job out and collapse it back into one clean answer. That's the why. This guide is the how. Below is the stuff that didn't fit in 60 seconds: the exact version it shipped in, what the depth-5 cap actually counts, the one-line Codex config that controls it, five orchestration patterns with the literal prompts I use, and the failure modes that can multiply your token bill. Skim the table of patterns, steal the prompts, read the gotchas before you run anything overnight.
One caveat up front: this shipped fast. The depth-5 cap was announced by Boris Cherny (Claude Code lead at Anthropic) on June 9 2026 alongside the v2.1.172 build, and as of that rollout the canonical Claude Code sub-agents doc had not yet been updated to reflect it — it still described subagents as unable to spawn other subagents. So treat the mechanics below as accurate to the announcement and your own testing, not yet to the official reference. Verify against /version and the live docs before you build on it.
- Shipped in Claude Code v2.1.172, June 9 2026 — no flag, on by default (per Boris Cherny's announcement + Digg coverage)
- Codex gates the same behavior behind one config key:
agents.max_depth - Nesting is a context tool first, a parallelism tool second — and it is NOT free
UNDER THE HOODThe mechanics: what depth-5 actually means
The changelog for v2.1.172 is a single line: "Sub-agents can now spawn their own sub-agents (up to 5 levels deep)." No settings page, no migration note. Here's what that line hides. Every spawned subagent — at any depth — starts with its own clean context window and its own system prompt. A layer-4 agent gets the exact same fresh-start treatment as a layer-1 agent; depth doesn't degrade the window. When work finishes, a child returns only final text or a validated object to its parent — never its raw tool calls or intermediate reasoning. That's the whole trick: noise stays buried at the bottom, signal bubbles up. The cap is enforced server-side and is not user-configurable in Claude Code — you can't raise it to 6, and the docs don't even pin down whether '5' counts the root or only the levels beneath it (treat it as 'about five' and design well inside it). Every agent also carries an agent_id and a parent_agent_id, so if you have OpenTelemetry tracing on, a nested run renders as a literal tree of spans — you can see exactly where each token landed.
| Property | How it behaves | Why it matters to you |
|---|---|---|
| Context per layer | Each subagent gets a fresh window + own system prompt, at every depth | A deep agent reasons just as cleanly as a shallow one — depth isn't a tax on quality |
| What collapses up | Only final text / validated object returns to the parent | Your main session stays lean — it never sees the raw logs the depth-3 agent waded through |
| Depth cap | Hard-capped at 5, server-side, not configurable in Claude Code | You can't tune it — you design the tree to fit, or push the knob in Codex instead |
| Lineage | agent_id + parent_agent_id on every agent | With OTel on, you get an audit tree of where tokens and time actually went |
TWO TOOLS, TWO MODELSClaude Code vs Codex: where the knob lives
Both tools have nesting; they expose it completely differently. Claude Code ships it on, capped, invisible. Codex makes you opt in per project with a TOML key — which is actually nicer when you want a bounded tree, because you set the exact ceiling. Here's the side-by-side, with the config you'd actually paste.
- Codex — open
.codex/config.tomland add an[agents]block - Set
max_depth = 2(children + grandchildren) — start small, 2–3 covers most real trees - Optionally raise
max_threadsif you want wider parallel fan-out (default is 6) - Per-worker batch jobs time out at
job_max_runtime_seconds(default 1800s = 30 min) — bump it for long verifications
| Claude Code | Codex | |
|---|---|---|
| Enable nesting | Update to v2.1.172+; on by default | Set agents.max_depth above 1 in config |
| Depth control | Hard-capped at 5, not configurable | max_depth (default 1 = children only, no grandchildren) |
| Concurrency control | Not user-exposed | max_threads (default 6 concurrent agent threads) |
| Define a custom agent | .claude/agents/*.md (per-project) or user-level | TOML in .codex/agents/ (project) or ~/.codex/agents/ |
| Batch fan-out | Prompt-driven | spawn_agents_on_csv — one worker per CSV row |
max_depth = 1 allows one level of children and blocks their grandchildren. To get a three-level tree (root → child → grandchild) you need max_depth = 2. The defaults above (max_depth 1, max_threads 6, job_max_runtime_seconds 1800) are from the current Codex config reference — Codex config changes often, so confirm them with your installed build before you paste.THE MENTAL MODELThe shape every pattern uses: fan out, then collapse
Picture a funnel on its side. Your main session asks one big question, fans it into parallel subagents (one per slice of work), and each slice — only if it hits something noisy — fans out again into its own helpers. Work explodes outward, runs in parallel, then collapses inward: deep results summarize up to their parent, parents up to you. You spend almost none of your main context and still get an answer that was investigated in depth instead of skimmed. The five patterns below are all this one shape, pointed at different jobs. The table is your index — pick by the job you actually have.
| Pattern | Reach for it when… | Depth that earns its keep |
|---|---|---|
| 1. Research / claim-verification | You have many sources and need claims actually checked, not vibe-checked | 2 (source → per-claim verifier) |
| 2. Debugging cause-tree | An incident has several plausible causes and you suspect a hidden root | 3 (incident → cause → log cross-check) |
| 3. Blast-radius (2nd/3rd-order effects) | A change is irreversible and consequences span many files/contracts | 3 (item → blast radius → dependency lookup) |
| 4. Skill head-to-head | You want an honest 'run it for real' verdict between two skills | 2 (each skill → its own worker agents) |
| 5. Skill optimization loop | A skill you run daily is good-but-not-reliable and you'll let it iterate | 2 (runner → grader → rewrite, repeat) |
PATTERN 1 · DEPTH 2Pattern 1 — Research & claim-verification fan-out
You have a folder of articles, or ten sources, and you want the big claims actually checked. Hand one source to each layer-1 subagent; its job is to extract the claims worth checking. For each claim that matters, it spawns a layer-2 subagent to do the noisy verification — search your knowledge base, hit the web, cross-reference. All sources run in parallel; all the messy searching is buried at layer 2. What comes back is a short table: claim → verdict → evidence. When/why: this is the highest-leverage pattern for research, due diligence, or fact-checking at volume, where a flat agent would skim. Don't reach past depth 2 here — claim verification rarely needs a grandchild's grandchild.
- Prompt: 'Load each article, one per subagent. Extract the biggest checkable claims from each.'
- Then: 'For each important claim, spawn a separate subagent to verify it against my notes and the web.'
- Worked example: 12 sources → 12 layer-1 readers → ~40 layer-2 verifiers running in parallel; you read one 40-row table, not 40 search dumps
- Tip: ask for the evidence URL in each row so you can spot-check a verdict in seconds
PATTERN 2 · DEPTH 3Pattern 2 — The debugging cause-tree
The most technical payoff, and the one that catches bugs a flat agent misses. A layer-1 subagent loads a fresh incident and lists the plausible causes. It fans those out: one layer-2 subagent per suspected cause, each searching the codebase for evidence. When one smells something deeper, it spawns a layer-3 subagent to cross-check the server logs or live system. Why it beats one big window: shallow debugging finds the first thing that looks broken and stops. The cause-tree keeps a separate clean window for the hunch that there's a deeper root under the obvious symptom — so it surfaces the real bug, not the first one. The tree collapses to a single root cause and you fix it in a main session that's barely been touched. This is exactly the '10 bug reports, each its own agent, each splitting again' move from the video — here are the literal layers.
- Layer 1: 'Load this incident. List every plausible cause, one line each.'
- Layer 2: 'Spawn one subagent per cause. Each hunts the codebase for evidence for or against it — in parallel.'
- Layer 3: 'Any cause that looks like a symptom of something deeper spawns a check against the server logs / live system.'
- Result: the tree collapses to a single root cause + the evidence chain, with your main context still lean
PATTERN 3 · DEPTH 3Pattern 3 — Second & third-order effects (blast radius)
Big decisions fail on the consequences nobody traced. You're working through ten contracts, or ten files, toward one goal. Give each its own layer-1 subagent to map the direct implications. Then go deeper: a layer-2 subagent measures the blast radius — if we change THIS, what else has to change? Any noisy digging it needs (old email threads, related clauses, dependent code) it pushes to a layer-3 subagent so layer 2 stays clean. You're now reasoning about second- and third-order effects in parallel, across every item, without holding it all in one head. When/why: run it before any irreversible change — a migration, a contract amendment, a schema change. It's a cheap way to see around corners that a single pass ships right past.
- Layer 1: one subagent per item (contract / file / decision) → direct implications of the goal
- Layer 2: 'Measure the blast radius — list everything else that must change if we do this.'
- Layer 3: delegate the noisy history/dependency lookups so layer 2 stays high-signal
- Worked example: a DB column rename → 9 file-owner agents → each spawns a 'who reads this column?' agent → you get one impact list before you touch a line
PATTERNS 4 & 5 · DEPTH 2Pattern 4 — Skills head-to-head, and Pattern 5 — optimize them overnight
Two moves that only work now that agents nest. Pattern 4 — head-to-head: load Skill A into one layer-1 subagent and Skill B into another, in parallel. Each spawns its own helpers to actually do the job — say two competing code-review skills, each running its own review subagents — then you ask the main session which output is better and why. You get an honest, run-it-for-real verdict instead of reading two READMEs. Pattern 5 — optimization loop: have a subagent run a skill, spawn helpers to grade the output against a rubric, rewrite the skill, and run it again — version after version, overnight if you let it — until it's reliably better. When/why: Pattern 4 when you're choosing between tools; Pattern 5 for a skill you run every single day, where a 10% reliability gain compounds. Pattern 5 is also the one most likely to run up a bill unattended — cap the iterations.
- Compare (P4): 'Run Skill A and Skill B in separate subagents on the same task, then tell me which output is better and why.'
- Each skill spawns its own worker subagents to do the real work — a fair fight, not a spec sheet
- Optimize (P5): 'Run the skill → grade the output in a subagent against this rubric → rewrite the skill → repeat, max N rounds.'
- Always set a hard round/iteration cap on P5 — an open-ended overnight loop is exactly how token bills surprise you
WHERE IT BITESThe failure modes (read this before you run anything deep)
Nesting is a power tool, and the costs are real and non-linear. The headline number to plan around: a community write-up on claudefa.st estimates subagent-heavy workflows run roughly 7x the tokens of a single-threaded session, and that cost multiplies geometrically with each layer you add. That figure is a community estimate, not an Anthropic-published benchmark — your real multiplier depends on how wide and deep your tree runs — but the direction is the point: budget several times the cost of a flat session, and watch it climb with depth. The same write-up notes early reaction split sharply over exactly this: enthusiasm for the context control, pushback on the token bill. The nastier, quieter failure is prompt imprecision compounding: a vague instruction at depth 1 becomes four vague subagents at depth 2 and sixteen at depth 3, each burning tokens on the wrong thing. Depth amplifies whatever you gave it — clarity or sloppiness. Treat the rules below as non-negotiable for anything that runs unattended.
- Several times the token cost vs single-threaded (a claudefa.st community analysis puts it near 7x), multiplying with depth — budget for it, don't be surprised by it
- Vagueness compounds: an unclear depth-1 prompt becomes 4 unclear ones at depth-2, 16 at depth-3. Tighten the prompt before you go deep
- Exponential fan-out: wide × deep is the runaway case. Keep trees either wide-and-shallow or narrow-and-deep, rarely both
- No auto-stop on bad work: the model won't reliably halt a doomed branch. Cap iterations on any loop (Pattern 5 especially)
- Default to flat: most edits and quick tasks should never nest. Reach for depth only when the job is genuinely big AND noisy
agent_id / parent_agent_id lineage renders the whole run as a span tree, so when a job costs more than expected you can see which branch burned it — instead of guessing.Get the next drop
New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.
By submitting you agree to our Privacy Policy & Terms. Unsubscribe anytime.
Frequently asked questions
How deep can subagents actually nest, and can I change it?
agents.max_depth yourself (default 1), so you control the exact ceiling per project.What does Codex's agents.max_depth = 1 actually allow?
max_depth = 1 lets the root spawn direct children but blocks those children from spawning grandchildren. For a three-level tree (root → child → grandchild) you need max_depth = 2. It lives under the [agents] block in your .codex/config.toml, alongside max_threads (default 6 concurrent threads).How much more do nested subagents cost?
Will the model nest on its own, or do I have to ask?
What's the most common way nesting goes wrong?
How do I see where the tokens went in a nested run?
agent_id and a parent_agent_id, so a nested run renders as a literal tree of spans — you can audit exactly which branch and which depth burned the budget, instead of guessing from a single aggregate number.