Why this matters
The shift: from supervising tasks to directing an agent that checks itself
An engineer on Anthropic's Claude Code team described the change Claude Fable 5 made to his day like this: he used to check whether Claude did the work right, which meant breaking the task into small chunks, double-checking the output, and catching where it stopped early. Now he increasingly checks whether Claude did the right work. His job, in his words, became more about direction and setup than supervision. That shift maps to two shipped features you can verify and run today: a /goal command that makes the agent loop until a condition you defined is met, and a dynamic workflow that makes it verify its own output and report what differed from your spec. Both are documented (links at the bottom). The story is just the on-ramp; the prompt shape below is the thing you can test yourself in five minutes.
Part one
Anatomy of a /goal: a crisp definition-of-done
/goal is a Claude Code command (v2.1.139+). You give it a completion condition and Claude keeps working across turns toward it, so you don't re-prompt each step. After every turn, a separate small fast model (Haiku by default) reads the conversation and decides yes or no: is the condition met? A 'no' sends Claude back to work with the evaluator's reason as guidance for the next turn. A 'yes' clears the goal. Setting a goal starts a turn immediately, because the condition itself is the directive. This is the 'loops until done' behaviour from the video: the agent doesn't clock off at 5pm; it keeps going until your definition-of-done holds.
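The loop described above can be modelled in a few lines. This is a conceptual sketch in Python, not Claude Code's implementation: `run_goal`, `work_turn`, and `evaluate` are hypothetical names, the toy agent and evaluator stand in for Claude and the small evaluator model, and `max_turns` is a simplification of the turn bound you would normally write into the condition itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class GoalResult:
    met: bool
    turns: int
    last_reason: str


def run_goal(
    condition: str,
    work_turn: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str, List[str]], Tuple[bool, str]],
    max_turns: int = 20,
) -> GoalResult:
    """Do one turn of work, then ask the evaluator whether the condition holds."""
    transcript: List[str] = []
    guidance: Optional[str] = None  # the evaluator's last "no" reason, if any
    reason = ""
    for turn in range(1, max_turns + 1):
        transcript.append(work_turn(condition, guidance))
        # The evaluator sees only the transcript; it never runs anything itself.
        met, reason = evaluate(condition, transcript)
        if met:
            return GoalResult(True, turn, reason)
        guidance = reason  # a "no" becomes guidance for the next turn
    return GoalResult(False, max_turns, reason)


def make_toy_agent(failing: int) -> Callable[[str, Optional[str]], str]:
    """Hypothetical stand-in for the agent: fixes one failing test per turn."""
    state = {"failing": failing}

    def work_turn(condition: str, guidance: Optional[str]) -> str:
        state["failing"] = max(0, state["failing"] - 1)
        return f"npm test: {state['failing']} failing"

    return work_turn


def toy_evaluator(condition: str, transcript: List[str]) -> Tuple[bool, str]:
    """Hypothetical stand-in for the evaluator: reads the latest output only."""
    last = transcript[-1]
    if last.endswith(" 0 failing"):
        return True, "test output shows 0 failing"
    return False, f"latest output still reports failures: {last}"


result = run_goal("npm test exits 0", make_toy_agent(3), toy_evaluator)
print(result)
```

The point of the model is the division of labour: the worker produces evidence in the transcript, and the evaluator judges only that evidence.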
- Set it: '/goal all tests in test/auth pass and the lint step is clean'. The condition is the prompt; no separate message is needed.
- Make the condition provable from Claude's own output. The evaluator doesn't run commands or read files itself; it only judges what Claude has surfaced in the transcript. 'npm test exits 0' works because Claude runs it and the result lands in the conversation.
- Give it a measurable end state (a test result, a build exit code, a file count, an empty queue), a stated check (how to prove it), and constraints that must not change on the way (e.g. 'no other test file is modified').
- Bound the run: add a clause like 'or stop after 20 turns' so a goal that can't converge doesn't burn tokens forever. The condition can be up to 4,000 characters.
- Check or stop: '/goal' (no argument) shows the condition, turns evaluated, token spend, and the evaluator's latest reason; '/goal clear' ends it early.
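One way to keep those ingredients together is to assemble the condition from its parts before pasting it after /goal. The helper below is a hypothetical sketch: `build_goal_condition` and its clause wording are assumptions made for illustration, and only the 4,000-character limit and the 'or stop after N turns' phrasing come from the text above.

```python
from typing import Optional, Sequence

MAX_CONDITION_CHARS = 4000  # limit stated in the guide


def build_goal_condition(
    end_state: str,
    check: str,
    constraints: Sequence[str] = (),
    max_turns: Optional[int] = None,
) -> str:
    """Join end state, proof, constraints, and turn bound into one condition."""
    clauses = [f"{end_state.strip()} (check: {check.strip()})"]
    clauses.extend(c.strip() for c in constraints)
    condition = ", and ".join(clauses)
    if max_turns is not None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        condition += f", or stop after {max_turns} turns"
    if len(condition) > MAX_CONDITION_CHARS:
        raise ValueError(
            f"condition is {len(condition)} chars; limit is {MAX_CONDITION_CHARS}"
        )
    return condition


condition = build_goal_condition(
    end_state="all tests in test/auth pass and the lint step is clean",
    check="npm test exits 0",
    constraints=["no other test file is modified"],
    max_turns=20,
)
print("/goal " + condition)
```

Writing the check as its own argument makes it harder to forget the part the evaluator actually needs: something Claude will run and surface in the transcript.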
Part two
Anatomy of a workflow: verify each part, diff against spec
A dynamic workflow is Claude writing its own throwaway orchestration script for the task in front of it — instead of running one linear conversation, it splits the work into independent units and spawns a separate subagent for each, all running in parallel. Concretely: ask it to verify a migration across 40 call sites and it can fan that into batches checked simultaneously, then have a second set of subagents try to refute each result — re-run the test, re-read the spec, look for the case the first pass missed. They iterate until the answers converge, and that convergence happens before anything reaches you. That refutation loop is your free quality-assurance layer: the /goal makes the run keep going until done; the workflow makes it grade its own output in parallel and surface only the disagreements. You invoke one in plain language ('use a workflow to…'), or enable the ultracode effort setting and let Claude decide when the task is big enough to warrant one.
- Verify each part, not the whole. Ask the workflow to verify each part of the plan separately — a per-component check catches the one module that drifted, where a single 'does it work?' pass hides it.
- Demand a diff against spec. The high-leverage instruction is 'report what was implemented and if anything differed.' That report is where you spend your attention now — not re-reading every line, just the delta between what you asked for and what shipped.
- Let refutation do the QA. Subagents checking each other's findings is the mechanism. You don't have to design it — you invoke it by asking for a workflow and stating what 'correct' means.
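The fan-out, per-part check, and second-opinion pass can be pictured with ordinary thread-pool code. This is an illustrative sketch under stated assumptions, not what Claude's generated orchestration script looks like: `verify_each_part`, `Finding`, the toy `SPEC` and `IMPLEMENTED` tables, and both checker functions are hypothetical, and the sketch runs a single re-check pass instead of iterating until the answers converge.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Finding:
    part: str
    first_pass_ok: bool  # verdict from the first checker
    holds_up: bool       # verdict from the independent re-check


def verify_each_part(
    parts: Sequence[str],
    verify: Callable[[str], bool],
    recheck: Callable[[str], bool],
    max_workers: int = 4,
) -> List[Finding]:
    """Check every part in parallel, then re-check each result independently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        first = list(pool.map(verify, parts))
        second = list(pool.map(recheck, parts))
    return [Finding(p, a, b) for p, a, b in zip(parts, first, second)]


def deviations(findings: Sequence[Finding]) -> List[str]:
    """Surface only the parts where a check failed or the two passes disagree."""
    return [f.part for f in findings if not (f.first_pass_ok and f.holds_up)]


# Toy data (hypothetical): the spec wants every module on API "v2".
SPEC = {"auth": "v2", "billing": "v2", "search": "v2"}
IMPLEMENTED = {"auth": "v2", "billing": "v1", "search": "v2"}


def shallow_check(part: str) -> bool:
    # First pass: only confirms the module was touched at all.
    return part in IMPLEMENTED


def strict_recheck(part: str) -> bool:
    # Second pass: re-reads the spec and compares the actual value.
    return IMPLEMENTED.get(part) == SPEC[part]


findings = verify_each_part(list(SPEC), shallow_check, strict_recheck)
print(deviations(findings))
```

A whole-project check that only asked "were all modules touched?" would pass here; the per-part re-check is what isolates the one module that drifted.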
The pattern
The copy-paste template: spec → goal → workflow → report
This is the engineer's actual prompt shape, generalised. Start from a spec you've already written (a design doc, a ticket, a short paragraph of acceptance criteria). Then chain the two parts: the goal sets the definition-of-done and makes it loop; the workflow forces per-part verification and a diff report. Swap the bracketed parts for your task.
- Write the spec first. Even three bullet points. The diff report is only as useful as the spec it's diffing against — vague spec, useless report.
- Set the goal: '/goal implement [SPEC or @spec.md] fully: every acceptance criterion holds and the test suite exits 0, without modifying [files/areas that must stay untouched], or stop after [N] turns'
- Then, in the run, ask for the workflow: 'Use a workflow to verify each part of the plan against the spec, and prepare a report of what was implemented and whether anything differed from the spec. Flag every deviation with the file and the reason.'
- Read the report, not the code. Your job is now the deltas: what differed, and whether each deviation is acceptable. Approve, or feed a correction back as a new turn — the goal is still active, so it keeps working.
- For unattended runs, pair it with auto mode (approves tool calls within a turn) so each goal turn runs without per-tool prompts; /goal also works non-interactively, e.g. claude -p "/goal …" runs the whole loop in one invocation.
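If you reuse the template often, filling the brackets programmatically avoids typos. The sketch below is a hypothetical helper, not part of Claude Code: `fill_goal` and `headless_command` are made-up names, the example values are placeholders, and the script only prints the `claude -p` command line described above; it does not execute it.

```python
import shlex
from typing import List

GOAL_TEMPLATE = (
    "/goal implement {spec} fully: every acceptance criterion holds and the "
    "test suite exits 0, without modifying {untouched}, "
    "or stop after {max_turns} turns"
)

# Sent as a follow-up message in an interactive run.
WORKFLOW_REQUEST = (
    "Use a workflow to verify each part of the plan against the spec, and "
    "prepare a report of what was implemented and whether anything differed "
    "from the spec. Flag every deviation with the file and the reason."
)


def fill_goal(spec: str, untouched: str, max_turns: int) -> str:
    """Substitute the bracketed parts of the template."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    return GOAL_TEMPLATE.format(spec=spec, untouched=untouched, max_turns=max_turns)


def headless_command(goal: str) -> List[str]:
    """Argument list for the non-interactive form: claude -p "<goal>"."""
    return ["claude", "-p", goal]


# Placeholder values; swap in your own spec path and protected area.
goal = fill_goal("@spec.md", "the test/ directory", 20)
print(shlex.join(headless_command(goal)))  # shell-quoted, ready to paste
print(WORKFLOW_REQUEST)
```

Building the command as a list and quoting it with `shlex.join` keeps the goal text intact as a single argument, whatever punctuation the spec path or constraints contain.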
Where to point it
Three jobs that fit this shape
The pattern earns its keep on long, verifiable work — where the value is in the agent not stopping early and proving it hit the bar. Three that map cleanly:
- The overnight refactor. Goal: 'migrate every call site off the old API until the project compiles and all tests pass, touching no test files.' Workflow: verify each migrated module compiles and its tests pass, report any call site it couldn't migrate and why. You read the exceptions list in the morning, not the diff.
- The spec build. Goal: 'implement @design-doc.md until all acceptance criteria hold.' Workflow: verify each criterion independently, report which are met and where the implementation diverged from the doc. The diff is your acceptance review.
- Batch QA. Goal: 'work through the labelled issue backlog until the queue is empty.' Workflow: for each item, verify the fix against the issue's stated repro and report any it closed without a verifiable check. You audit the unverifiable ones.
Don't get burned
Failure modes: where this quietly breaks
Five ways this quietly breaks — all traceable to a weak spec or an unprovable condition. Catch these before you set the goal, not three hours into a run.
- Vague done-criteria. 'Make the code better' has no measurable end state, so the evaluator can never confidently say yes — the loop runs until your turn-cap, burning tokens (and at $10/$50 per million on Fable 5, that adds up). Always give a test result, exit code, count, or empty-queue condition.
- No verifiable success check. The /goal evaluator only judges what's in the transcript — it doesn't run commands or open files. If your condition is 'the feature works' but Claude never runs the feature, the evidence isn't there to evaluate. State how Claude should prove it ('npm test exits 0', 'git status is clean').
- Verifying the whole instead of each part. A single end-of-run 'does it work?' check hides the one component that drifted. Ask the workflow to verify each part and diff each against spec — the deviation you care about is usually local.
- A spec the report can't diff against. If the spec lives only in your head, 'what differed from the spec' has nothing to compare to. Write it down, even briefly, and reference it (@spec.md) so the diff is real.
- No turn or time bound. A goal with no 'or stop after N turns' clause and a condition that can't converge will keep looping. Bound every long run.
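Some of these failure modes can be caught mechanically before a run starts. The pre-flight check below is a rough heuristic sketch, not a feature of Claude Code: `lint_goal` and the `EVIDENCE_HINTS` keyword list are assumptions chosen for illustration, and a real check would need hints tuned to your own project's tooling.

```python
import re
from typing import List

MAX_CONDITION_CHARS = 4000  # limit stated in the guide

# Matches the bounding clause recommended above, e.g. "stop after 20 turns".
TURN_BOUND = re.compile(r"\bstop after \d+ turns?\b", re.IGNORECASE)

# Hypothetical hints that the condition names something Claude can prove.
EVIDENCE_HINTS = ("exits 0", "pass", "is clean", "is empty", "compiles", "count")


def lint_goal(condition: str) -> List[str]:
    """Return warnings for a goal condition; an empty list means none found."""
    warnings: List[str] = []
    lowered = condition.lower()
    if not TURN_BOUND.search(condition):
        warnings.append("no turn bound: add 'or stop after N turns'")
    if not any(hint in lowered for hint in EVIDENCE_HINTS):
        warnings.append("no measurable check: name a test result, exit code, or count")
    if len(condition) > MAX_CONDITION_CHARS:
        warnings.append(f"too long: {len(condition)} chars, limit {MAX_CONDITION_CHARS}")
    return warnings


for candidate in (
    "make the code better",
    "npm test exits 0, or stop after 20 turns",
):
    print(candidate, "->", lint_goal(candidate))
```

A keyword check cannot tell whether a condition is truly provable, so treat an empty result as "nothing obviously missing", not as approval.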
Frequently asked questions
What's the exact difference between /goal and a workflow?
/goal (Claude Code v2.1.139+) sets a completion condition and keeps Claude working across turns until a separate fast model (Haiku by default) confirms the condition holds. It's the 'loop until done' part. A dynamic workflow is Claude writing an orchestration script that fans work across parallel subagents and checks the results before they reach you. It's the 'verify and report' part. The two-part prompt sets a goal (definition-of-done) and asks for a workflow (per-part verification + diff against spec).
Are /goal and workflows specific to Fable 5, or do they work on other models?
/goal is a command and dynamic workflows are an orchestration capability, both available in recent Claude Code. They shine on Fable 5 because it's built for long-horizon agentic runs (minutes to hours, self-verifying), which is exactly what a looping goal plus a verifying workflow exercises. You can use the pattern with other capable models; the payoff scales with how long and autonomously the model can run.
How do I write a /goal condition the evaluator can actually judge?
Is Claude Fable 5 free, and for how long?
Can I run this unattended, like the overnight job in the video?
Pair /goal with auto mode so tool calls are approved within each turn; then every goal turn runs without per-tool prompts. /goal also works in non-interactive mode: claude -p "/goal CHANGELOG.md has an entry for every PR merged this week" runs the loop to completion in a single invocation (Ctrl+C to stop early). A goal still active when a session ends is restored on --resume/--continue, though the turn count and token baseline reset. That's the 'leave it running on the Mac Mini overnight' workflow, made reproducible.