Ground truth
First, fix one thing the hype gets wrong
Ultracode is not a new model and not a secret API setting. Anthropic's own effort docs say it plainly: ultracode appears in Claude Code's effort menu, but it is not an additional API effort level. The real API ladder has only five rungs: low, medium, high (the default), xhigh, max. What /effort ultracode actually does is pair the xhigh rung with standing permission for Claude Code to launch multi-agent workflows on its own. So when you toggle it, you are not buying a smarter brain. You are handing Claude a budget and the keys to spin up a swarm. That distinction is the whole game, because the swarm is where the money goes.
Run /version to confirm your Claude Code version is current. On Pro you also need to turn on the Dynamic workflows row in /config first. ultracode only appears in /effort on models that support the xhigh rung; the effort-levels doc lists which models qualify.
Why it can beat a single pass
The actual difference: who holds the plan
The video said the agents argue. True, but the deeper reason orchestration wins is structural, and it is the one thing the short can't show you: where the plan and the half-finished work live. A normal chat, subagents, even agent teams all keep intermediate results in Claude's context window — so a long job competes with itself for attention and drifts. A workflow moves the loop, the branching, and every intermediate result into a JavaScript script the runtime executes outside the conversation. Claude's context ends up holding only the final answer. That is why a 500-file pass doesn't forget what it found in file 30 by the time it reaches file 480.
| | Subagents | Skills | Agent teams | Workflows (ultracode) |
|---|---|---|---|---|
| Who decides what runs next | Claude, turn by turn | Claude, per the prompt | Lead agent, turn by turn | The script |
| Where mid-run results live | Context window | Context window | Shared task list | Script variables (off-context) |
| What's repeatable | The worker def | The instructions | The team def | The orchestration itself |
| Scale | A few per turn | Same as subagents | A handful of peers | Dozens to hundreds per run |
| If you interrupt | Restarts the turn | Restarts the turn | Teammates keep running | Resumable in the same session |
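
The "script variables (off-context)" idea can be sketched with a toy example. This is only an illustration in Python, not the real runtime: the article says actual workflows are JavaScript scripts, and `fake_agent`, `Finding`, and `run_workflow` are hypothetical names invented here. The point it shows is that every intermediate finding stays in an ordinary variable, and only a one-line summary is handed back.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    path: str
    issue: str


def fake_agent(path: str) -> list[Finding]:
    """Hypothetical stand-in for one worker agent: flags 'legacy' files."""
    if "legacy" in path:
        return [Finding(path, "uses deprecated helper")]
    return []


def run_workflow(paths: list[str]) -> str:
    # Intermediate results live in plain script variables,
    # not in the conversation that launched the run.
    findings: list[Finding] = []
    for path in paths:
        findings.extend(fake_agent(path))

    # Only this short summary goes back to the conversation.
    flagged = sorted({f.path for f in findings})
    return f"{len(paths)} files scanned, {len(flagged)} flagged: {', '.join(flagged)}"


paths = [f"src/mod_{i}.py" for i in range(498)] + ["src/legacy_a.py", "src/legacy_b.py"]
summary = run_workflow(paths)
print(summary)
```

However many files the loop visits, the caller only ever sees `summary`, which is the property the table attributes to workflows.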
When 16 agents earn their keep
The decision rule: orchestrate or run one pass
Spawning a swarm costs meaningfully more tokens than the same task in a chat, so the question is never "is ultracode better?" but "does this task reward fanning out?" Here is the rule I use. A task is worth a workflow only if it clears both gates below. If it clears only one (or neither), /effort high gives you the same answer for a fraction of the bill.
- Gate 1: Breadth or stakes. Is the work spread across many files/sources (a codebase-wide sweep, a big migration, multi-source research), OR is being wrong genuinely expensive (a security audit, a plan you'll commit weeks to)? If neither, stop and use /effort high.
- Gate 2: Does it reward cross-checking? Would independent angles or adversarial review actually change the answer? Research that needs sources weighed against each other: yes. Re-running a deterministic refactor: no, one pass is fine.
- Both gates clear -> run a workflow. Good fits: codebase-wide bug/auth sweeps, a 500-file migration, multi-source research where claims must survive cross-checking, drafting a hard plan from several independent angles.
- Only one (or zero) clears -> /effort high. Single-file edits, quick lookups, and strict serial A->B->C work where each step just needs the last one's output get nothing from the orchestration tax.
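
The two-gate rule is small enough to write down as a function. This is a minimal sketch of the rule as stated here, not anything Claude Code runs; `choose_mode` and its parameter names are hypothetical, and you still supply the yes/no judgments yourself.

```python
def choose_mode(wide: bool, high_stakes: bool, rewards_cross_checking: bool) -> str:
    """Return 'workflow' only when both gates clear, else '/effort high'."""
    gate_1 = wide or high_stakes        # Gate 1: breadth OR stakes
    gate_2 = rewards_cross_checking     # Gate 2: cross-checking changes the answer
    return "workflow" if gate_1 and gate_2 else "/effort high"


# Codebase-wide auth sweep: wide, expensive to get wrong, benefits from review.
auth_sweep = choose_mode(wide=True, high_stakes=True, rewards_cross_checking=True)

# Deterministic refactor across many files: wide, but one pass is fine.
refactor = choose_mode(wide=True, high_stakes=False, rewards_cross_checking=False)

# Single-file edit: clears neither gate.
single_edit = choose_mode(wide=False, high_stakes=False, rewards_cross_checking=False)

print(auth_sweep, refactor, single_edit)
```

Writing it this way makes the asymmetry visible: Gate 1 is an OR, but the two gates together are an AND, so breadth alone never justifies the swarm.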
Steer the bill before you hit go
The 3 settings that cap the spend: named, with where each lives
The token bill is the only real downside, and it's entirely yours to prevent. These are the three controls that actually move the number, not vibes. Set them before the run, not after the invoice.
- 1) Model: set in /model, checked before the run. Every agent in a workflow uses your session's model unless the script routes a stage elsewhere, so a swarm running on Opus is the priciest possible shape. Run /model first; if you normally code on a smaller model, stay on it, or tell Claude in your prompt to use a smaller model for the stages that don't need the strongest one. This is the single biggest lever, because model choice multiplies across every one of the dozens-to-hundreds of agents.
- 2) Scope: controlled in your prompt and watched in /workflows. Run on a slice first: one directory instead of the whole repo, one narrow question instead of a broad one. The /workflows progress view shows each phase's agent count, token total, and elapsed time live; press p to pause/resume or x to stop the whole run, and you keep every completed agent's result. Stop the moment tokens outpace value.
- 3) The hard caps and the approval gate: built into the runtime, surfaced at launch. The runtime bounds a runaway script at 16 concurrent agents (fewer on low-core machines) and 1,000 agents total per run; that's the ceiling, not a target. Before any run, the approval card lists the planned phases and a token-usage caution; choose View raw script (or Ctrl+G to open it in your editor) to see the plan before you spend a token. In Default/accept-edits mode you get this prompt every run unless you opted into "don't ask again" for that workflow.
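
To see why the model is the biggest lever, here is a back-of-the-envelope sketch. The two caps (16 concurrent, 1,000 total) come from the text above; everything else is an assumption for illustration: the per-agent token count, the two prices, the single blended price per million tokens, and the idea that agents run in full "waves". `plan_run` is a hypothetical helper, not a Claude Code API.

```python
import math

MAX_CONCURRENT = 16   # runtime cap on concurrent agents (per the article)
MAX_TOTAL = 1000      # runtime cap on agents per run (per the article)


def plan_run(requested_agents: int, tokens_per_agent: int, price_per_mtok: float):
    """Rough shape of a run: (agents, waves, total tokens, cost)."""
    agents = min(requested_agents, MAX_TOTAL)   # the total cap bounds a runaway script
    waves = math.ceil(agents / MAX_CONCURRENT)  # simplification: full batches of 16
    tokens = agents * tokens_per_agent
    cost = tokens / 1_000_000 * price_per_mtok  # blended price, placeholder only
    return agents, waves, tokens, cost


# Placeholder numbers, NOT real pricing: same 200-agent run on two model tiers.
small = plan_run(200, tokens_per_agent=50_000, price_per_mtok=3.0)
large = plan_run(200, tokens_per_agent=50_000, price_per_mtok=15.0)
capped = plan_run(5000, tokens_per_agent=50_000, price_per_mtok=3.0)

print(small)
print(large)
print(capped)
```

With these made-up inputs the agent count, waves, and tokens are identical across the two tiers; only the price multiplier differs, and it applies to every agent in the run. Swap in your own observed numbers from /workflows to make the estimate meaningful.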
"disableWorkflows": true in ~/.claude/settings.json, or CLAUDE_CODE_DISABLE_WORKFLOWS=1, or just toggle off Ultracode keyword trigger in /config to stop the keyword firing by accident.What the trade-off looks like in practiceWorked example: cost vs quality on the same task
Take one real task — audit every endpoint under src/routes/ for missing auth checks — and run it three ways. The numbers below are about shape and direction, not a promise of exact counts (your repo size, model, and plan move them). The point is to show where each option pays off and where it just burns tokens.
| Approach | How it runs | Relative cost | Best when |
|---|---|---|---|
| /effort high, single pass | One serial pass reads files in sequence; results stay in context | Lowest | Small route folder; you mostly trust one careful read |
| xhigh (no workflow) | Deeper per-step reasoning, still one agent; expect meaningfully higher tokens than high | Medium | Tricky logic in a few files where the reasoning is the hard part |
| ultracode workflow | Fans out across routes, agents cross-check findings, votes, reports survivors | Highest | Wide route surface where a missed auth check is expensive to ship |
Start coding and agentic work at xhigh, keep high as the floor for most intelligence-sensitive work, and reserve max for genuinely frontier problems: on most tasks max adds real cost for small gains and can even overthink structured output. When you do run xhigh/ultracode, give the run a generous max_tokens ceiling so the swarm has room to think and act. There's no official number here, so treat any specific figure you see floating around as community practice, not Anthropic guidance, and tune it to your own runs.
Try orchestration without flipping the session
The lowest-risk way to feel it: /deep-research
If you want the ultracode experience without setting /effort ultracode on the whole session, run the one workflow Anthropic ships in the box. /deep-research <question> fans web searches across several angles, fetches and cross-checks the sources it finds, votes on each claim, and hands back a cited report with the claims that didn't survive cross-checking already filtered out. It's the cleanest demo of the adversarial-verification pattern, scoped to one question.
- Run it: /deep-research What changed in the Node.js permission model between v20 and v22?
- Approve the plan when Claude Code asks (it shows the phases first).
- Watch with /workflows -> arrow to the run -> Enter. You'll see agent count, tokens, and elapsed time per phase.
- Read the cited report when it lands.
Requires the WebSearch tool to be available.
To keep a run you liked, select it in /workflows and press s to save its script as a /command, either to your project's .claude/workflows/ (shared with the repo) or ~/.claude/workflows/ (just you). It then runs as /<name> and can take input via an args global.
Frequently asked questions
Is ultracode a smarter model than /effort high?
No. It is the xhigh reasoning level plus standing permission for Claude Code to launch multi-agent workflows on its own. The model is the same; what changes is that Claude can now spin up a swarm without asking each time.
What's the difference between typing ultracode in a prompt and setting /effort ultracode?
Typing ultracode in one prompt runs just that single task as a workflow and leaves your effort level untouched. /effort ultracode makes Claude decide, for every substantive task in the rest of the session, whether to orchestrate. It lasts the session and resets when you start a new one; drop back with /effort high for routine work. (On some builds the literal trigger keyword has been workflow rather than ultracode; plain natural-language requests like "run a workflow" work either way.)
How many agents can actually run, and can a script run away?
A run is capped at 16 concurrent agents (fewer on low-core machines) and 1,000 agents total, so the runtime itself bounds a runaway script.
Which single setting cuts cost the most?
/model. Every agent uses the session's model unless the script routes a stage elsewhere, so the model choice multiplies across the whole swarm. Running a big workflow on a smaller model, or telling Claude to use a smaller model for the stages that don't need the strongest one, moves the bill more than anything else you can do.
Can I stop a workflow partway and not lose the work?
Yes. In /workflows, p pauses/resumes and x stops a run. When you resume, agents that already finished return their cached results and only the rest run live. The catch: if you quit Claude Code entirely while a workflow is running, the next session starts it fresh.
Do I have to approve what it's about to do?
In Default/accept-edits mode, yes: the approval card appears before every run, and you can choose View raw script (or Ctrl+G) to read the plan first. In Auto mode you're prompted on first launch only; in bypass mode, claude -p, or the Agent SDK there's no prompt and the run starts immediately. Note that the subagents always run in acceptEdits and inherit your tool allowlist regardless of session mode, so add the shell/web commands they need beforehand to avoid mid-run pauses.
When is xhigh the right call instead of full ultracode?
xhigh gives you the deep per-step thinking (Anthropic recommends it as the starting point for coding/agentic work) without spawning a swarm. Reach for an actual workflow only when the task is wide or needs cross-checking. And reserve max for genuinely frontier problems; on most work it just adds cost.
How do I reuse a workflow I liked?
In /workflows, select the run and press s. Save it to .claude/workflows/ (shared via the repo) or ~/.claude/workflows/ (just you). It becomes /<name> in future sessions and can accept input through an args global, so you can pass a question or a list of paths at call time instead of editing the script.