Kno2gether · kno2gether.com
Prompt Pattern Guide

The Two-Part Prompt: Pair a /goal With a Workflow So Claude Verifies Its Own Work

An Anthropic engineer changed one line in how he prompts and his job flipped from checking whether Claude did the work right to checking whether it did the right work. The exact two-part prompt shape: a definition-of-done that loops, plus a workflow that diffs the output against the spec.

01 · Why this matters
The shift: from supervising tasks to directing an agent that checks itself

An engineer on Anthropic's Claude Code team described the change Claude Fable 5 made to his day like this: he used to check whether Claude did the work right, which meant breaking the task into small chunks, double-checking the output, and catching where it stopped early. Now he increasingly checks whether Claude did the right work. His job, in his words, became more about direction and setup than supervision. That shift maps onto two real, shipped features you can verify and run today: a /goal command that makes the agent loop until a condition you defined is met, and a dynamic workflow that makes it verify its own output and report what differed from your spec. Both are documented (links at the bottom). The story is just the on-ramp; the prompt shape below is the thing you can test yourself in five minutes.

Fable 5 (claude-fable-5) is Anthropic's most capable widely released model, built for long-horizon agentic work: runs that take minutes to hours without you babysitting. At launch it was free on paid Claude plans (Pro, Max, Team, seat-based Enterprise) for an introductory window (June 9–22, 2026); check claude.com/pricing for current access. Note: during that window Fable 5 counted roughly double the usage of Opus 4.8 against your plan limit, so a long overnight /goal run burned your quota about 2× faster than the same work on Opus 4.8. After the window it requires usage credits billed at API rates (at launch, $10 per million input tokens and $50 per million output tokens, double Opus 4.8). Either way, the goal+workflow discipline below isn't just about quality; it's also about not paying, in quota or credits, for a long run that drifted off-spec.
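For a sense of scale, here is a small Python cost calculator. It uses the at-launch API rates quoted above for Fable 5 and assumes Opus 4.8 cost exactly half of them (our reading of "double Opus 4.8", not a published price list). The token counts in the example are hypothetical, not measured from a real run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


FABLE_5 = Rates(10.0, 50.0)  # at-launch rates quoted in this guide
OPUS_4_8 = Rates(5.0, 25.0)  # assumption: half of Fable 5's launch rates


def run_cost_usd(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Dollar cost of a run billed at per-million-token API rates."""
    return (input_tokens / 1_000_000) * rates.input_per_million + (
        output_tokens / 1_000_000
    ) * rates.output_per_million


if __name__ == "__main__":
    # Hypothetical overnight run: 2M input tokens, 0.5M output tokens.
    fable = run_cost_usd(2_000_000, 500_000, FABLE_5)
    opus = run_cost_usd(2_000_000, 500_000, OPUS_4_8)
    print(f"Fable 5: ${fable:.2f}  Opus 4.8: ${opus:.2f}")
```

Under those assumptions the same hypothetical run costs $45.00 on Fable 5 versus $22.50 on Opus 4.8, which is why hours of off-spec drift is worth guarding against.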

02 · Part one
Anatomy of a /goal: a crisp definition-of-done

/goal is a Claude Code command (v2.1.139+). You give it a completion condition and Claude keeps working across turns toward it, so you don't re-prompt each step. After every turn, a separate small fast model (Haiku by default) reads the conversation and decides yes or no: is the condition met? A 'no' sends Claude back to work with the evaluator's reason as guidance for the next turn. A 'yes' clears the goal. Setting the goal starts a turn immediately, because the condition itself is the directive. This is the 'loops until done' behaviour from the video: the agent doesn't clock off at 5pm whether or not the job is finished; it keeps going until your definition-of-done holds.

  1. Set it: /goal all tests in test/auth pass and the lint step is clean — the condition is the prompt. No separate message needed.
  2. Make the condition provable from Claude's own output. The evaluator doesn't run commands or read files itself — it only judges what Claude has surfaced in the transcript. 'npm test exits 0' works because Claude runs it and the result lands in the conversation.
  3. Give it a measurable end state (a test result, a build exit code, a file count, an empty queue), a stated check (how to prove it), and constraints that must not change on the way (e.g. 'no other test file is modified').
  4. Bound the run: add a clause like 'or stop after 20 turns' so a goal that can't converge doesn't burn tokens forever. The condition can be up to 4,000 characters.
  5. Check or stop: /goal (no argument) shows the condition, turns evaluated, token spend, and the evaluator's latest reason; /goal clear ends it early.
Why a separate evaluator matters: left to its own judgement, an agent decides it's done when it thinks so. /goal hands the done/not-done call to a fresh model looking only at the evidence in the transcript — so 'done' is decided by something other than the thing that did the work. That's the structural reason it stops cutting corners.
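The control flow described in this section can be modelled in a few lines of Python. This is a toy simulation, not Claude Code's implementation: `worker_turn` stands in for a Claude turn and `evaluator` for the small fast model, and both are hypothetical callables you supply. What it does show is the shape of the loop: the evaluator sees only the transcript, a 'no' feeds its reason into the next turn, a 'yes' clears the goal, and a turn cap stops a goal that can't converge.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    met: bool    # did the evidence in the transcript satisfy the condition?
    reason: str  # why not (fed back as guidance), or why yes


@dataclass
class GoalResult:
    met: bool
    turns: int
    transcript: list[str] = field(default_factory=list)
    last_reason: str = ""


def run_goal(
    condition: str,
    worker_turn: Callable[[str, str], str],          # hypothetical: (condition, guidance) -> turn output
    evaluator: Callable[[str, list[str]], Verdict],  # hypothetical: judges the transcript only
    max_turns: int = 20,                             # the 'or stop after N turns' bound
) -> GoalResult:
    transcript: list[str] = []
    guidance = condition  # setting a goal starts a turn: the condition is the directive
    reason = ""
    for turn in range(1, max_turns + 1):
        transcript.append(worker_turn(condition, guidance))
        # The evaluator never runs commands; it only reads what the worker surfaced.
        verdict = evaluator(condition, transcript)
        reason = verdict.reason
        if verdict.met:
            return GoalResult(True, turn, transcript, reason)  # 'yes' clears the goal
        guidance = verdict.reason  # 'no' sends the worker back with the reason
    return GoalResult(False, max_turns, transcript, reason)  # bound reached, not converged


if __name__ == "__main__":
    remaining = [3]  # hypothetical: three failing tests, one fixed per turn

    def fake_worker(condition: str, guidance: str) -> str:
        remaining[0] = max(0, remaining[0] - 1)
        exit_code = 0 if remaining[0] == 0 else 1
        return f"ran npm test: {remaining[0]} failing, exit code {exit_code}"

    def fake_evaluator(condition: str, transcript: list[str]) -> Verdict:
        if transcript and transcript[-1].endswith("exit code 0"):
            return Verdict(True, "npm test exits 0 in the latest turn")
        return Verdict(False, "tests still failing; keep fixing")

    result = run_goal("npm test exits 0", fake_worker, fake_evaluator)
    print(result.met, result.turns)  # True 3
```

Keeping `evaluator` separate from `worker_turn` is the point of the design: the function that decides "done" never shares state with the function that did the work, only the transcript.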

03 · Part two
Anatomy of a workflow: verify each part, diff against spec

A dynamic workflow is Claude writing its own throwaway orchestration script for the task in front of it. Instead of running one linear conversation, it splits the work into independent units and spawns a separate subagent for each, all running in parallel. Concretely: ask it to verify a migration across 40 call sites and it can fan that into batches checked simultaneously, then have a second set of subagents try to refute each result by re-running the test, re-reading the spec, and looking for the case the first pass missed. They iterate until the answers converge, and that convergence happens before anything reaches you.

That refutation loop is your built-in quality-assurance layer: /goal keeps the run going until it's done; the workflow makes it grade its own output in parallel and surface only the disagreements. You invoke a workflow in plain language ('use a workflow to…'), or enable the ultracode effort setting and let Claude decide when the task is big enough to warrant one.

  • Verify each part, not the whole. Ask the workflow to verify each part of the plan separately — a per-component check catches the one module that drifted, where a single 'does it work?' pass hides it.
  • Demand a diff against spec. The high-leverage instruction is 'report what was implemented and if anything differed.' That report is where you spend your attention now — not re-reading every line, just the delta between what you asked for and what shipped.
  • Let refutation do the QA. Subagents checking each other's findings is the mechanism. You don't have to design it — you invoke it by asking for a workflow and stating what 'correct' means.
Independent verifiers beat self-critique. Anthropic's own guidance for long Fable 5 runs is to establish a checking method and run it on a cadence, and it notes that fresh-context verifier subagents tend to outperform a model grading its own work. The workflow gives you that without extra setup; the diff-against-spec instruction points it at the thing you actually care about.
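Here is a toy model of the fan-out and refutation loop, using Python's `concurrent.futures.ThreadPoolExecutor` to stand in for parallel subagents. It illustrates the idea under stated assumptions and is not how Claude Code orchestrates workflows: `verify` and `refute` are hypothetical checkers you'd supply. Each unit is verified in parallel, a second pass tries to overturn each verdict, and units whose verdicts keep flipping are surfaced as disagreements instead of being silently resolved.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_workflow(
    units: list[str],
    verify: Callable[[str], bool],        # hypothetical first-pass checker, one per unit
    refute: Callable[[str, bool], bool],  # hypothetical second checker: returns its own verdict
    max_rounds: int = 3,
) -> tuple[dict[str, bool], list[str]]:
    """Return (verdicts, units whose verdicts never converged)."""
    with ThreadPoolExecutor() as pool:
        # Fan out: every unit is verified independently and in parallel.
        verdicts = dict(zip(units, pool.map(verify, units)))
        disputed = list(units)
        for _ in range(max_rounds):
            # A second set of checkers tries to overturn each open verdict.
            challenges = dict(zip(disputed, pool.map(lambda u: refute(u, verdicts[u]), disputed)))
            disputed = [u for u in disputed if challenges[u] != verdicts[u]]
            if not disputed:
                break  # every verdict survived refutation: converged
            for u in disputed:
                verdicts[u] = challenges[u]  # adopt the challenge and test it next round
    return verdicts, disputed


if __name__ == "__main__":
    # Hypothetical ground truth: one migrated module is actually broken.
    truth = {"auth.py": True, "billing.py": False, "search.py": True}
    verdicts, disputed = run_workflow(
        list(truth),
        verify=lambda u: True,             # an over-optimistic first pass
        refute=lambda u, prior: truth[u],  # the re-check finds the real answer
    )
    print(verdicts)  # {'auth.py': True, 'billing.py': False, 'search.py': True}
    print(disputed)  # []
```

The `disputed` list is the analogue of "surface only the disagreements": a checker pair that never agrees ends up there after `max_rounds`, rather than being reported as a pass.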

04 · The pattern
The copy-paste template: spec → goal → workflow → report

This is the engineer's actual prompt shape, generalised. Start from a spec you've already written (a design doc, a ticket, a short paragraph of acceptance criteria). Then chain the two parts: the goal sets the definition-of-done and makes it loop; the workflow forces per-part verification and a diff report. Swap the bracketed parts for your task.

  1. Write the spec first. Even three bullet points. The diff report is only as useful as the spec it's diffing against — vague spec, useless report.
  2. Set the goal: /goal implement [SPEC or @spec.md] fully — every acceptance criterion holds and the test suite exits 0, without modifying [files/areas that must stay untouched], or stop after [N] turns
  3. Then, in the run, ask for the workflow: 'Use a workflow to verify each part of the plan against the spec, and prepare a report of what was implemented and whether anything differed from the spec — flag every deviation with the file and the reason.'
  4. Read the report, not the code. Your job is now the deltas: what differed, and whether each deviation is acceptable. Approve, or feed a correction back as a new turn; if the goal is still active, Claude keeps working toward it, and if the evaluator has already cleared it, set it again with the correction folded in.
  5. For unattended runs, pair it with auto mode (approves tool calls within a turn) so each goal turn runs without per-tool prompts; /goal also works non-interactively, e.g. claude -p "/goal …" runs the whole loop in one invocation.
The plain-English version the engineer used: 'Set a goal to implement the spec fully. Then use a workflow to verify each part of the plan and prepare a report on what was implemented and if anything differed.' Two sentences. That's the whole pattern.
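If you reuse the template often, a small helper can assemble it. The sketch below is hypothetical (the function names are ours, not part of Claude Code): it fills in the bracketed parts of step 2, appends the turn bound, enforces the 4,000-character condition limit, and builds the argument list for a non-interactive `claude -p` run without executing it.

```python
MAX_CONDITION_CHARS = 4_000  # /goal condition limit stated in this guide

WORKFLOW_REQUEST = (
    "Use a workflow to verify each part of the plan against the spec, and prepare a "
    "report of what was implemented and whether anything differed from the spec; "
    "flag every deviation with the file and the reason."
)


def build_goal(spec_ref: str, protected: list[str], max_turns: int) -> str:
    """Fill in the step-2 template: definition-of-done, constraints, turn bound."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    parts = [f"implement {spec_ref} fully: every acceptance criterion holds and the test suite exits 0"]
    if protected:
        parts.append(f"without modifying {' and '.join(protected)}")
    parts.append(f"or stop after {max_turns} turns")
    condition = ", ".join(parts)
    if len(condition) > MAX_CONDITION_CHARS:
        raise ValueError(f"condition is {len(condition)} characters; the limit is {MAX_CONDITION_CHARS}")
    return f"/goal {condition}"


def headless_argv(goal_command: str) -> list[str]:
    """Arguments for a non-interactive run (claude -p "/goal ..."); built, not executed."""
    return ["claude", "-p", goal_command]


if __name__ == "__main__":
    goal = build_goal("@spec.md", ["test/", "migrations/"], max_turns=20)
    print(goal)
    print(WORKFLOW_REQUEST)  # in an interactive session, send this once the run is underway
    print(headless_argv(goal))
```

To launch the headless run you could pass the list to `subprocess.run(...)`, assuming the `claude` CLI is installed and on your PATH.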

05 · Where to point it
Three jobs that fit this shape

The pattern earns its keep on long, verifiable work — where the value is in the agent not stopping early and proving it hit the bar. Three that map cleanly:

  1. The overnight refactor. Goal: 'migrate every call site off the old API until the project compiles and all tests pass, touching no test files.' Workflow: verify each migrated module compiles and its tests pass, report any call site it couldn't migrate and why. You read the exceptions list in the morning, not the diff.
  2. The spec build. Goal: 'implement @design-doc.md until all acceptance criteria hold.' Workflow: verify each criterion independently, report which are met and where the implementation diverged from the doc. The diff is your acceptance review.
  3. Batch QA. Goal: 'work through the labelled issue backlog until the queue is empty.' Workflow: for each item, verify the fix against the issue's stated repro and report any it closed without a verifiable check. You audit the unverifiable ones.
Common thread: a measurable end state the agent can prove, and a per-item verification you'd otherwise do by hand. If you can't state what 'done' looks like as something Claude's own output can demonstrate, this pattern isn't the right tool yet — tighten the spec first.
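The "read the exceptions, not the diff" step in all three jobs amounts to filtering per-item verification results. A minimal sketch, assuming each verified item comes back as a record with whether it met the spec, whether a check actually backed that verdict, the file involved, and a reason (all hypothetical field names): the report keeps only the items that need your attention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    item: str       # an acceptance criterion, call site, or backlog issue
    met: bool       # did the implementation match the spec for this item?
    verified: bool  # was the verdict backed by evidence (test run, repro) in the transcript?
    file: str = ""
    reason: str = ""


def exceptions_report(findings: list[Finding]) -> list[str]:
    """Keep only what needs a human: deviations from spec and unverifiable closes."""
    lines = []
    for f in findings:
        if not f.met:
            lines.append(f"[differed] {f.item} ({f.file}): {f.reason}")
        elif not f.verified:
            lines.append(f"[unverified] {f.item} ({f.file}): no verifiable check")
    return lines


if __name__ == "__main__":
    findings = [  # hypothetical results from a workflow's per-item verification
        Finding("login returns 401 on a bad password", True, True, "auth/login.py"),
        Finding("rate limit is 5 requests/min", False, True, "auth/limits.py", "implemented as 10 requests/min"),
        Finding("session expires after 30 min", True, False, "auth/session.py"),
    ]
    for line in exceptions_report(findings):
        print(line)
```

Items that were met and verified never reach the report, which is what lets you audit a long run in minutes.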

06 · Don't get burned
Failure modes: where this quietly breaks

Five ways the pattern quietly breaks, each traceable to a weak spec, an unprovable condition, or a missing bound. Catch these before you set the goal, not three hours into a run.

  • Vague done-criteria. 'Make the code better' has no measurable end state, so the evaluator can never confidently say yes, and the loop runs until your turn cap, burning tokens (at $10/$50 per million input/output tokens on Fable 5, that adds up). Always give a test result, exit code, count, or empty-queue condition.
  • No verifiable success check. The /goal evaluator only judges what's in the transcript — it doesn't run commands or open files. If your condition is 'the feature works' but Claude never runs the feature, the evidence isn't there to evaluate. State how Claude should prove it ('npm test exits 0', 'git status is clean').
  • Verifying the whole instead of each part. A single end-of-run 'does it work?' check hides the one component that drifted. Ask the workflow to verify each part and diff each against spec — the deviation you care about is usually local.
  • A spec the report can't diff against. If the spec lives only in your head, 'what differed from the spec' has nothing to compare to. Write it down — even briefly — and reference it (@spec.md) so the diff is real.
  • No turn or time bound. A goal with no 'or stop after N turns' clause and a condition that can't converge will keep looping. Bound every long run.
This pattern moves your effort upstream — more on writing a crisp, provable spec, less on babysitting and re-reading output. That's the trade the engineer described: direction and setup over supervision.
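Some of these failure modes can be caught mechanically before you set the goal. The pre-flight check below is a hypothetical heuristic, not a Claude Code feature: the keyword list and the turn-bound pattern are our own assumptions and will miss phrasings they don't anticipate, but they flag the obvious cases of no measurable end state, no turn bound, and a condition over the 4,000-character limit.

```python
import re

MAX_CONDITION_CHARS = 4_000
# Assumed markers of a provable end state; extend for your own stack.
EVIDENCE_MARKERS = ("exits 0", "exit code", "pass", "clean", "empty", "compiles", "count")
TURN_BOUND = re.compile(r"\bstop after \d+ turns?\b", re.IGNORECASE)


def preflight(condition: str) -> list[str]:
    """Return warnings for a /goal condition; an empty list means no obvious problems."""
    warnings = []
    text = condition.lower()
    if not any(marker in text for marker in EVIDENCE_MARKERS):
        warnings.append("no measurable end state (test result, exit code, count, empty queue)")
    if not TURN_BOUND.search(condition):
        warnings.append("no 'or stop after N turns' bound")
    if len(condition) > MAX_CONDITION_CHARS:
        warnings.append(f"condition exceeds {MAX_CONDITION_CHARS} characters")
    return warnings


if __name__ == "__main__":
    for cond in (
        "make the code better",
        "all tests in test/auth pass and the lint step is clean, or stop after 20 turns",
    ):
        print(repr(cond), preflight(cond) or "ok")
```

A keyword match can't tell whether Claude will actually surface that evidence in the transcript, so treat an empty result as "no obvious gaps", not as proof the condition is judgeable.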

Frequently asked questions

What's the exact difference between /goal and a workflow?
They do different jobs and pair together. /goal (Claude Code v2.1.139+) sets a completion condition and keeps Claude working across turns until a separate fast model (Haiku by default) confirms the condition holds — it's the 'loop until done' part. A dynamic workflow is Claude writing an orchestration script that fans work across parallel subagents and checks the results before they reach you — it's the 'verify and report' part. The two-part prompt sets a goal (definition-of-done) and asks for a workflow (per-part verification + diff against spec).
Are /goal and workflows specific to Fable 5, or do they work on other models?
They're Claude Code features, not model features — /goal is a command and dynamic workflows are an orchestration capability, both available in recent Claude Code. They shine on Fable 5 because it's built for long-horizon agentic runs (minutes to hours, self-verifying), which is exactly what a looping goal plus a verifying workflow exercises. You can use the pattern with other capable models; the payoff scales with how long and autonomously the model can run.
How do I write a /goal condition the evaluator can actually judge?
Make it provable from Claude's own output. The evaluator reads the conversation transcript — it does not run commands or open files itself. So 'all tests in test/auth pass and the lint step is clean' works because Claude runs the tests and lint, and the results land in the transcript. Give it three things: a measurable end state (test result, exit code, file count, empty queue), a stated check (how Claude proves it), and constraints that must not change. Add 'or stop after N turns' to bound the run. Conditions can be up to 4,000 characters.
Is Claude Fable 5 free, and for how long?
Time-sensitive — verify before relying on it: check claude.com/pricing for current plan access. At launch, Fable 5 was free on paid Claude plans (Pro, Max, Team, and seat-based Enterprise) for an introductory window of June 9–22, 2026 — with one catch worth knowing: during that free window, Fable 5 counted roughly double the usage of Opus 4.8 against your plan limit, so a long overnight /goal run burned your monthly quota about 2× faster than the same work on Opus 4.8. 'Free' didn't mean unmetered. After the window it requires usage credits, billed at API rates (at launch, $10 per million input tokens and $50 per million output — double Opus 4.8). Either way, the goal+workflow discipline matters: a long run that drifts off-spec is expensive in quota or credits, so a provable definition-of-done and a verifying workflow pay for themselves.
Can I run this unattended, like the overnight job in the video?
Yes. Pair /goal with auto mode so tool calls are approved within each turn — then every goal turn runs without per-tool prompts. /goal also works in non-interactive mode: claude -p "/goal CHANGELOG.md has an entry for every PR merged this week" runs the loop to completion in a single invocation (Ctrl+C to stop early). A goal still active when a session ends is restored on --resume/--continue, though the turn count and token baseline reset. That's the 'leave it running on the Mac Mini overnight' workflow, made reproducible.
Sources
  • Keep Claude working toward a goal · Claude Code Docs (/goal command, evaluator, conditions)
  • Introducing dynamic workflows in Claude Code · Anthropic (parallel subagents, verify-before-it-reaches-you, converge loop)
  • A harness for every task: dynamic workflows in Claude Code · Anthropic
  • Long-running Claude for scientific computing · Anthropic (long-horizon agentic runs, self-verification)
  • Claude Code · Anthropic (product overview)
  • Anthropic released Claude Fable 5, its most powerful model · TechCrunch (Fable 5 launch, free window)
  • Anthropic releases Claude Fable 5, Mythos 5 · SD Times
  • Source video: Claude Fable 5 walkthrough (goal + workflows demo) · YouTube, credited to Samin Yasar

The agents you deploy for clients deserve the same definition-of-done

This pattern keeps YOUR builds on-spec. The same idea — give the agent a verifiable goal, make it check itself, report the deltas — is exactly how you'd want a voice or chat agent you sell to a client to behave: complete the task, prove it hit the bar, and surface what it couldn't. Knotie lets you build and resell those agents under your own brand and domain, multi-provider, with credit billing and your margin built in — so 'self-verifying agent' becomes a product you ship, not a prompt you babysit.
