Kno2gether · kno2gether.com
Field Guide

Nested Subagents: Claude Code's Depth-5 Upgrade (and 5 Patterns That Actually Use It)

Your AI agent now spawns its own agents — up to five layers deep. Here's the mechanics nobody put on screen: how the depth cap really works, the five orchestration patterns with exact prompts, the token math, and the failure modes that eat your budget.

01

START HERE
The 30-second version, then the part the video skipped

If you watched the short: yes, Claude Code subagents can now spawn their own subagents, five layers deep, and that lets you fan a job out and collapse it back into one clean answer. That's the why. This guide is the how. Below is the stuff that didn't fit in 60 seconds: the exact version it shipped in, what the depth-5 cap actually counts, the one-line Codex config that controls it, five orchestration patterns with the literal prompts I use, and the failure modes that can multiply your token bill. Skim the table of patterns, steal the prompts, read the gotchas before you run anything overnight.

One caveat up front: this shipped fast. The depth-5 cap was announced by Boris Cherny (Claude Code lead at Anthropic) on June 9, 2026 alongside the v2.1.172 build, and as of that rollout the canonical Claude Code sub-agents doc had not yet been updated to reflect it; it still described subagents as unable to spawn other subagents. So treat the mechanics below as accurate to the announcement and your own testing, not yet to the official reference. Verify against /version and the live docs before you build on it.

  • Shipped in Claude Code v2.1.172, June 9 2026 — no flag, on by default (per Boris Cherny's announcement + Digg coverage)
  • Codex gates the same behavior behind one config key: agents.max_depth
  • Nesting is a context tool first, a parallelism tool second — and it is NOT free
02

UNDER THE HOOD
The mechanics: what depth-5 actually means

The changelog for v2.1.172 is a single line: "Sub-agents can now spawn their own sub-agents (up to 5 levels deep)." No settings page, no migration note. Here's what that line hides. Every spawned subagent — at any depth — starts with its own clean context window and its own system prompt. A layer-4 agent gets the exact same fresh-start treatment as a layer-1 agent; depth doesn't degrade the window. When work finishes, a child returns only final text or a validated object to its parent — never its raw tool calls or intermediate reasoning. That's the whole trick: noise stays buried at the bottom, signal bubbles up. The cap is enforced server-side and is not user-configurable in Claude Code — you can't raise it to 6, and the docs don't even pin down whether '5' counts the root or only the levels beneath it (treat it as 'about five' and design well inside it). Every agent also carries an agent_id and a parent_agent_id, so if you have OpenTelemetry tracing on, a nested run renders as a literal tree of spans — you can see exactly where each token landed.

| Property | How it behaves | Why it matters to you |
| --- | --- | --- |
| Context per layer | Each subagent gets a fresh window + own system prompt, at every depth | A deep agent reasons just as cleanly as a shallow one; depth isn't a tax on quality |
| What collapses up | Only final text / validated object returns to the parent | Your main session stays lean; it never sees the raw logs the depth-3 agent waded through |
| Depth cap | Hard-capped at 5, server-side, not configurable in Claude Code | You can't tune it; you design the tree to fit, or push the knob in Codex instead |
| Lineage | agent_id + parent_agent_id on every agent | With OTel on, you get an audit tree of where tokens and time actually went |

The model does NOT reliably nest on its own yet. If you want depth, you usually have to ask for it by name in the prompt ('verify each claim in a separate subagent') or bake the delegation into a skill. Assume flat-by-default unless you say otherwise.
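
To make the lineage idea concrete, here is a minimal sketch of rebuilding the agent tree from span-like records. Only the agent_id and parent_agent_id field names come from the text above; the record shape, the sample IDs, and the helper names are hypothetical, and nothing here calls a real OpenTelemetry API.

```python
from collections import defaultdict

# Hypothetical span-like records. Only the two field names
# (agent_id, parent_agent_id) come from the text; the IDs are made up.
RECORDS = [
    {"agent_id": "root", "parent_agent_id": None},
    {"agent_id": "a1", "parent_agent_id": "root"},
    {"agent_id": "a2", "parent_agent_id": "root"},
    {"agent_id": "a1-1", "parent_agent_id": "a1"},
    {"agent_id": "a1-1-1", "parent_agent_id": "a1-1"},
]


def build_children(records):
    """Map each parent id to the ids of its direct children."""
    children = defaultdict(list)
    for rec in records:
        children[rec["parent_agent_id"]].append(rec["agent_id"])
    return children


def render_tree(records):
    """Return indented lines, one per agent, in depth-first order."""
    children = build_children(records)
    lines = []

    def walk(agent_id, depth):
        lines.append("  " * depth + agent_id)
        for child in children.get(agent_id, []):
            walk(child, depth + 1)

    for root in children.get(None, []):
        walk(root, 0)
    return lines


def levels_below_root(records):
    """Deepest level reached, counting the root itself as 0."""
    parent = {r["agent_id"]: r["parent_agent_id"] for r in records}

    def depth(agent_id):
        d = 0
        while parent[agent_id] is not None:
            agent_id = parent[agent_id]
            d += 1
        return d

    return max(depth(a) for a in parent)


print("\n".join(render_tree(RECORDS)))
print("levels below root:", levels_below_root(RECORDS))
```

Counting the root as 0 is a choice made for this sketch only. As noted above, the announcement does not pin down whether the cap of 5 includes the root, so check the count against your own traces.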
03

TWO TOOLS, TWO MODELS
Claude Code vs Codex: where the knob lives

Both tools have nesting; they expose it completely differently. Claude Code ships it on, capped, invisible. Codex makes you opt in per project with a TOML key — which is actually nicer when you want a bounded tree, because you set the exact ceiling. Here's the side-by-side, with the config you'd actually paste.

  1. Codex — open .codex/config.toml and add an [agents] block
  2. Set max_depth = 2 (children + grandchildren) — start small, 2–3 covers most real trees
  3. Optionally raise max_threads if you want wider parallel fan-out (default is 6)
  4. Per-worker batch jobs time out at job_max_runtime_seconds (default 1800s = 30 min) — bump it for long verifications

| | Claude Code | Codex |
| --- | --- | --- |
| Enable nesting | Update to v2.1.172+; on by default | Set agents.max_depth above 1 in config |
| Depth control | Hard-capped at 5, not configurable | max_depth (default 1 = children only, no grandchildren) |
| Concurrency control | Not user-exposed | max_threads (default 6 concurrent agent threads) |
| Define a custom agent | .claude/agents/*.md (per-project) or user-level | TOML in .codex/agents/ (project) or ~/.codex/agents/ |
| Batch fan-out | Prompt-driven | spawn_agents_on_csv (one worker per CSV row) |

Codex depth counts from 0: the root session is depth 0, so max_depth = 1 allows one level of children and blocks their grandchildren. To get a three-level tree (root → child → grandchild) you need max_depth = 2. The defaults above (max_depth 1, max_threads 6, job_max_runtime_seconds 1800) are from the current Codex config reference — Codex config changes often, so confirm them with your installed build before you paste.
04

THE MENTAL MODEL
The shape every pattern uses: fan out, then collapse

Picture a funnel on its side. Your main session asks one big question, fans it into parallel subagents (one per slice of work), and each slice — only if it hits something noisy — fans out again into its own helpers. Work explodes outward, runs in parallel, then collapses inward: deep results summarize up to their parent, parents up to you. You spend almost none of your main context and still get an answer that was investigated in depth instead of skimmed. The five patterns below are all this one shape, pointed at different jobs. The table is your index — pick by the job you actually have.

| Pattern | Reach for it when… | Depth that earns its keep |
| --- | --- | --- |
| 1. Research / claim-verification | You have many sources and need claims actually checked, not vibe-checked | 2 (source → per-claim verifier) |
| 2. Debugging cause-tree | An incident has several plausible causes and you suspect a hidden root | 3 (incident → cause → log cross-check) |
| 3. Blast-radius (2nd/3rd-order effects) | A change is irreversible and consequences span many files/contracts | 3 (item → blast radius → dependency lookup) |
| 4. Skill head-to-head | You want an honest 'run it for real' verdict between two skills | 2 (each skill → its own worker agents) |
| 5. Skill optimization loop | A skill you run daily is good-but-not-reliable and you'll let it iterate | 2 (runner → grader → rewrite, repeat) |
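
The fan-out-then-collapse shape can be sketched in a few lines of asyncio. This is a toy model under loud assumptions: run_agent is a stand-in stub (a real subagent would call a model), the WORK tree is made up, and the "noisy notes" only exist to show that raw detail never leaves the agent that produced it.

```python
import asyncio

# Hypothetical work tree: each node is one subagent's job.
WORK = {
    "name": "incident",
    "subtasks": [
        {"name": "cause-A", "subtasks": [{"name": "logs-A", "subtasks": []}]},
        {"name": "cause-B", "subtasks": []},
    ],
}


async def run_agent(node):
    """Run one stand-in subagent and return only its summary string."""
    # Fan out: every subtask runs concurrently in its own coroutine.
    child_summaries = []
    if node["subtasks"]:
        child_summaries = await asyncio.gather(
            *(run_agent(child) for child in node["subtasks"])
        )
    # Scratch work stays local to this call, like a subagent's private context.
    raw_notes = [f"noisy detail {i} for {node['name']}" for i in range(100)]
    # Collapse: the parent only ever sees this one short string.
    summary = f"{node['name']}({len(raw_notes)} notes read)"
    if child_summaries:
        summary += " <- [" + ", ".join(child_summaries) + "]"
    return summary


result = asyncio.run(run_agent(WORK))
print(result)
```

The point of the design is in the return value: each level hands up one short string, so the top-level caller holds a summary of three agents' work without ever holding their 300 notes.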
05

PATTERN 1 · DEPTH 2
Pattern 1: Research & claim-verification fan-out

You have a folder of articles, or ten sources, and you want the big claims actually checked. Hand one source to each layer-1 subagent; its job is to extract the claims worth checking. For each claim that matters, it spawns a layer-2 subagent to do the noisy verification — search your knowledge base, hit the web, cross-reference. All sources run in parallel; all the messy searching is buried at layer 2. What comes back is a short table: claim → verdict → evidence. When/why: this is the highest-leverage pattern for research, due diligence, or fact-checking at volume, where a flat agent would skim. Don't reach past depth 2 here — claim verification rarely needs a grandchild's grandchild.

  1. Prompt: 'Load each article, one per subagent. Extract the biggest checkable claims from each.'
  2. Then: 'For each important claim, spawn a separate subagent to verify it against my notes and the web.'
  3. Worked example: 12 sources → 12 layer-1 readers → ~40 layer-2 verifiers running in parallel; you read one 40-row table, not 40 search dumps
  4. Tip: ask for the evidence URL in each row so you can spot-check a verdict in seconds
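
If you want the layer-2 verifiers to return something you can scan, it helps to fix the row shape up front. Below is a minimal sketch of that claim, verdict, evidence table. The Verdict fields, the sample rows, and the example.com URLs are all hypothetical placeholders, not output from a real run.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    source: str
    claim: str
    verdict: str       # e.g. "supported", "refuted", "unclear"
    evidence_url: str  # lets you spot-check a row quickly


def render_table(rows):
    """Format verifier results as a plain-text claim/verdict/evidence table."""
    header = ("Source", "Claim", "Verdict", "Evidence")
    body = [(r.source, r.claim, r.verdict, r.evidence_url) for r in rows]
    all_rows = [header] + body
    # Pad each column to its widest cell so the rows line up.
    widths = [max(len(row[i]) for row in all_rows) for i in range(4)]
    lines = []
    for row in all_rows:
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append(" | ".join(cells).rstrip())
    return "\n".join(lines)


# Hypothetical results, standing in for what layer-2 verifiers hand back.
rows = [
    Verdict("article-01", "Feature X ships on by default", "supported", "https://example.com/a"),
    Verdict("article-02", "Latency drops by half", "unclear", "https://example.com/b"),
]
print(render_table(rows))
```

Naming the fields in the prompt ("return source, claim, verdict, evidence URL") is what makes the collapse step cheap: every verifier answers in the same four columns.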
06

PATTERN 2 · DEPTH 3
Pattern 2: The debugging cause-tree

The most technical payoff, and the one that catches bugs a flat agent misses. A layer-1 subagent loads a fresh incident and lists the plausible causes. It fans those out: one layer-2 subagent per suspected cause, each searching the codebase for evidence. When one smells something deeper, it spawns a layer-3 subagent to cross-check the server logs or live system. Why it beats one big window: shallow debugging finds the first thing that looks broken and stops. The cause-tree keeps a separate clean window for the hunch that there's a deeper root under the obvious symptom — so it surfaces the real bug, not the first one. The tree collapses to a single root cause and you fix it in a main session that's barely been touched. This is exactly the '10 bug reports, each its own agent, each splitting again' move from the video — here are the literal layers.

  1. Layer 1: 'Load this incident. List every plausible cause, one line each.'
  2. Layer 2: 'Spawn one subagent per cause. Each hunts the codebase for evidence for or against it — in parallel.'
  3. Layer 3: 'Any cause that looks like a symptom of something deeper spawns a check against the server logs / live system.'
  4. Result: the tree collapses to a single root cause + the evidence chain, with your main context still lean
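
The collapse step in the layers above amounts to walking a tree of suspected causes and keeping only the confirmed chain. Here is a rough sketch of that data structure. The Cause class, its fields, and the sample incident are hypothetical; in a real run each node would be filled in by a subagent rather than typed by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    description: str
    evidence: str = ""
    confirmed: bool = False
    deeper: list = field(default_factory=list)  # causes found underneath this one


def root_cause_chain(cause):
    """Follow confirmed causes downward and return the chain to the deepest one.

    Returns an empty list if this cause was not confirmed.
    """
    if not cause.confirmed:
        return []
    for child in cause.deeper:
        chain = root_cause_chain(child)
        if chain:
            return [cause] + chain
    return [cause]


# Hypothetical incident: the obvious symptom hides a deeper root.
incident = [
    Cause("retry storm in API client", "retry loop in client.py", True, deeper=[
        Cause("connection pool exhausted", "pool errors in server logs", True),
    ]),
    Cause("bad deploy", "no diff in release", False),
]

chains = [root_cause_chain(c) for c in incident]
best = max(chains, key=len)  # the longest confirmed chain reaches the deepest root
for step in best:
    print(f"{step.description}: {step.evidence}")
```

Picking the longest confirmed chain is a simplification for the sketch. With several confirmed branches you would want the model, or a human, to weigh the evidence rather than just count depth.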
07

PATTERN 3 · DEPTH 3
Pattern 3: Second & third-order effects (blast radius)

Big decisions fail on the consequences nobody traced. You're working through ten contracts, or ten files, toward one goal. Give each its own layer-1 subagent to map the direct implications. Then go deeper: a layer-2 subagent measures the blast radius — if we change THIS, what else has to change? Any noisy digging it needs (old email threads, related clauses, dependent code) it pushes to a layer-3 subagent so layer 2 stays clean. You're now reasoning about second- and third-order effects in parallel, across every item, without holding it all in one head. When/why: run it before any irreversible change — a migration, a contract amendment, a schema change. It's a cheap way to see around corners that a single pass ships right past.

  1. Layer 1: one subagent per item (contract / file / decision) → direct implications of the goal
  2. Layer 2: 'Measure the blast radius — list everything else that must change if we do this.'
  3. Layer 3: delegate the noisy history/dependency lookups so layer 2 stays high-signal
  4. Worked example: a DB column rename → 9 file-owner agents → each spawns a 'who reads this column?' agent → you get one impact list before you touch a line
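
Underneath, a blast-radius pass is a breadth-first walk over "who depends on this?" edges, with the walk depth matching the order of the effect. The sketch below assumes you already have such a map. The DEPENDENTS data, the file names, and the blast_radius helper are all made up for illustration; in the pattern above, subagents would discover those edges for you.

```python
from collections import deque

# Hypothetical "who depends on what" map: each key is read by the listed items.
DEPENDENTS = {
    "users.email_addr": ["models/user.py", "reports/export.sql"],
    "models/user.py": ["api/signup.py", "jobs/digest.py"],
    "api/signup.py": ["tests/test_signup.py"],
}


def blast_radius(start, dependents, max_order=3):
    """Breadth-first walk returning {item: order}; order 1 is a direct effect."""
    order = {}
    queue = deque([(start, 0)])
    while queue:
        item, dist = queue.popleft()
        if dist == max_order:
            continue  # do not look past the requested order
        for dep in dependents.get(item, []):
            if dep not in order and dep != start:
                order[dep] = dist + 1
                queue.append((dep, dist + 1))
    return order


impact = blast_radius("users.email_addr", DEPENDENTS)
for item, n in sorted(impact.items(), key=lambda kv: (kv[1], kv[0])):
    print(f"order {n}: {item}")
```

The max_order argument plays the same role as tree depth in the pattern: it stops the walk at third-order effects instead of chasing every transitive dependency.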
08

PATTERNS 4 & 5 · DEPTH 2
Pattern 4: Skills head-to-head, and Pattern 5: optimize them overnight

Two moves that only work now that agents nest. Pattern 4 — head-to-head: load Skill A into one layer-1 subagent and Skill B into another, in parallel. Each spawns its own helpers to actually do the job — say two competing code-review skills, each running its own review subagents — then you ask the main session which output is better and why. You get an honest, run-it-for-real verdict instead of reading two READMEs. Pattern 5 — optimization loop: have a subagent run a skill, spawn helpers to grade the output against a rubric, rewrite the skill, and run it again — version after version, overnight if you let it — until it's reliably better. When/why: Pattern 4 when you're choosing between tools; Pattern 5 for a skill you run every single day, where a 10% reliability gain compounds. Pattern 5 is also the one most likely to run up a bill unattended — cap the iterations.

  1. Compare (P4): 'Run Skill A and Skill B in separate subagents on the same task, then tell me which output is better and why.'
  2. Each skill spawns its own worker subagents to do the real work — a fair fight, not a spec sheet
  3. Optimize (P5): 'Run the skill → grade the output in a subagent against this rubric → rewrite the skill → repeat, max N rounds.'
  4. Always set a hard round/iteration cap on P5 — an open-ended overnight loop is exactly how token bills surprise you
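
The "max N rounds" part of Pattern 5 is the piece worth seeing in code. Here is a minimal sketch of a run, grade, rewrite loop with a hard cap. The optimize_skill function and its run, grade, and rewrite callables are hypothetical stand-ins; the toy versions below just manipulate strings so the loop is runnable, where real ones would delegate to subagents.

```python
def optimize_skill(skill, run, grade, rewrite, max_rounds=5, target=0.9):
    """Run -> grade -> rewrite until the score reaches target or rounds run out.

    Returns (best_skill, best_score, rounds_used).
    """
    best_skill, best_score = skill, float("-inf")
    rounds_used = 0
    for _ in range(max_rounds):  # hard cap: the loop can never run unbounded
        rounds_used += 1
        output = run(skill)
        score = grade(output)
        if score > best_score:
            best_skill, best_score = skill, score
        if score >= target:
            break
        skill = rewrite(skill, output, score)
    return best_skill, best_score, rounds_used


# Toy stand-ins so the loop is runnable; real versions would call subagents.
def run(skill):
    return skill


def grade(output):
    return min(1.0, len(output) / 20)


def rewrite(skill, output, score):
    return skill + " +detail"


best, score, rounds = optimize_skill("review code", run, grade, rewrite, max_rounds=3)
print(best, score, rounds)
```

Two details matter more than the rest: the for loop over range(max_rounds) makes an endless run impossible, and the function returns the best version seen rather than the last one, so a bad final rewrite cannot overwrite a good earlier round.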
09

WHERE IT BITES
The failure modes (read this before you run anything deep)

Nesting is a power tool, and the costs are real and non-linear. The headline number to plan around: a community write-up on claudefa.st estimates subagent-heavy workflows run roughly 7x the tokens of a single-threaded session, and that cost multiplies geometrically with each layer you add. That figure is a community estimate, not an Anthropic-published benchmark — your real multiplier depends on how wide and deep your tree runs — but the direction is the point: budget several times the cost of a flat session, and watch it climb with depth. The same write-up notes early reaction split sharply over exactly this: enthusiasm for the context control, pushback on the token bill. The nastier, quieter failure is prompt imprecision compounding: a vague instruction at depth 1 becomes four vague subagents at depth 2 and sixteen at depth 3, each burning tokens on the wrong thing. Depth amplifies whatever you gave it — clarity or sloppiness. Treat the rules below as non-negotiable for anything that runs unattended.

  • Several times the token cost vs single-threaded (a claudefa.st community analysis puts it near 7x), multiplying with depth — budget for it, don't be surprised by it
  • Vagueness compounds: if each agent spawns four children, an unclear depth-1 prompt becomes 4 unclear ones at depth 2 and 16 at depth 3. Tighten the prompt before you go deep
  • Exponential fan-out: wide × deep is the runaway case. Keep trees either wide-and-shallow or narrow-and-deep, rarely both
  • No auto-stop on bad work: the model won't reliably halt a doomed branch. Cap iterations on any loop (Pattern 5 especially)
  • Default to flat: most edits and quick tasks should never nest. Reach for depth only when the job is genuinely big AND noisy
Turn on OpenTelemetry tracing if you nest regularly. The agent_id / parent_agent_id lineage renders the whole run as a span tree, so when a job costs more than expected you can see which branch burned it — instead of guessing.
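
The 4-then-16 arithmetic generalizes, and it is worth running the numbers for your own tree before you launch it. This sketch only counts agents; it deliberately does not estimate tokens, since the per-agent cost depends on your workload. The function names are hypothetical, and it assumes every agent spawns the same number of children, which real trees rarely do.

```python
def agents_per_depth(fan_out, depth):
    """Agents at each depth 1..depth, with one agent at depth 1."""
    return [fan_out ** (level - 1) for level in range(1, depth + 1)]


def total_agents(fan_out, depth):
    """All agents across depths 1..depth."""
    return sum(agents_per_depth(fan_out, depth))


# Compare tree shapes: (label, children per agent, depth).
shapes = [
    ("the 4 -> 16 example", 4, 3),
    ("wide and shallow", 10, 2),
    ("narrow and deep", 2, 5),
    ("wide and deep", 10, 3),
]
for label, fan_out, depth in shapes:
    per_depth = agents_per_depth(fan_out, depth)
    print(f"{label}: {per_depth} total={total_agents(fan_out, depth)}")
```

Under these assumptions the wide-and-shallow and narrow-and-deep trees stay in the tens of agents, while the wide-and-deep one passes a hundred, which is the runaway case the list above warns about.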


Frequently asked questions

How deep can subagents actually nest, and can I change it?
In Claude Code the cap is 5 levels, enforced server-side, and you cannot raise it — there's no flag for it in v2.1.172. This came from Boris Cherny's June 9 2026 announcement; at that rollout the canonical sub-agents doc hadn't been updated to reflect nesting yet, so confirm against the live docs and your own build. The cap also doesn't specify whether '5' counts the root session or only the levels beneath it, so design comfortably inside it. In Codex it's the opposite: you set agents.max_depth yourself (default 1), so you control the exact ceiling per project.
What does Codex's agents.max_depth = 1 actually allow?
Codex counts depth from 0 — the root session is depth 0. So max_depth = 1 lets the root spawn direct children but blocks those children from spawning grandchildren. For a three-level tree (root → child → grandchild) you need max_depth = 2. It lives under the [agents] block in your .codex/config.toml, alongside max_threads (default 6 concurrent threads).
How much more do nested subagents cost?
A lot — budget several times a single-threaded session. A community analysis on claudefa.st pegs subagent-heavy workflows at roughly 7x the tokens of a flat run, and the cost multiplies geometrically with each layer, so a depth-3 tree is dramatically more expensive than a depth-1 one. Treat that 7x as a community estimate rather than a published benchmark; your real number depends on how wide and deep your tree runs. Either way the trade-off holds: you're buying cleaner context and parallel investigation with tokens, so reserve it for jobs where the depth genuinely pays off and cap any iteration loops.
Will the model nest on its own, or do I have to ask?
You mostly have to ask. As of the v2.1.172 rollout the model does not reliably delegate downward on its own — you get the most by instructing it explicitly ('verify each claim in a separate subagent', 'spawn one agent per cause') or by baking the delegation into a custom skill or agent definition. Assume flat-by-default unless your prompt says otherwise.
What's the most common way nesting goes wrong?
Vague prompts compounding with depth. A fuzzy instruction at level 1 becomes four fuzzy subagents at level 2 and sixteen at level 3, each spending tokens on the wrong thing. Depth amplifies whatever you hand it. Tighten the top-level prompt before you go deep, keep trees from being both wide and deep, and always cap iteration loops so a doomed branch can't run all night.
How do I see where the tokens went in a nested run?
Turn on OpenTelemetry tracing. Every agent carries an agent_id and a parent_agent_id, so a nested run renders as a literal tree of spans — you can audit exactly which branch and which depth burned the budget, instead of guessing from a single aggregate number.
Why not just use one big context window instead?
Because context windows degrade as they fill — an agent reasoning over thousands of lines of logs and file dumps makes worse decisions on the edits that matter. Each subagent pushes that noise into its own fresh window and returns only the summary, so the main conversation stays short and high-signal. Nesting extends that benefit to every layer, not just the first.
When should I NOT nest?
Most of the time. Everyday edits, quick lookups, and small tasks should stay flat — the latency and ~7x token cost aren't worth it. Reach for nesting only when the work is both big and noisy: research at volume, real multi-cause debugging, or high-stakes irreversible decisions where missing a second-order effect is expensive.
Sources
  • Claude Code Nested Subagents: 5 Levels Deep, claudefa.st (community analysis; source of the ~7x token estimate)
  • Boris Cherny releases nested subagent support (depth 5), Digg
  • Create custom subagents, Claude Code Docs
  • Subagents, Codex / OpenAI Developers (agents.max_depth, spawn_agents_on_csv)
  • Configuration Reference, Codex (agents.max_depth, max_threads, job_max_runtime_seconds)
  • Anthropic Just Dropped the Biggest Subagent Upgrade Yet, Ray Amjad

Orchestrating agents is a skill. Businesses pay for the result.

Look at what these patterns really teach: how to make a fleet of AI agents investigate, verify, and report — reliably, in parallel. That orchestration is exactly what local businesses want and will never build themselves. That's the opening Knotie gives you: spin up AI voice and chat agents for clients under your own brand and domain (Retell, LiveKit, VAPI, Ultravox, n8n), with a customer portal, credit billing, and your margin built in. You get good at directing agents here. Knotie lets you sell that to the people who can't.
