Karpathy's Method: How to Build 10x Faster with Claude (Spec → Verifier → Environment)

Q: What's the single highest-leverage move in the whole method?

Defining 'correct' before the build and attaching it to something that can fail — a test, a diff against a sample row, a real authz call. Asking Claude 'did you verify it?' afterward is theatre; it'll say yes. Verification only works when the criteria exist up front and the check can come back red.

The 30-second recap, then the part the video skipped

If you saw the short: Karpathy's framing is that the difference between old software and AI is verifiability — "traditional computers automate what you can specify in code; LLMs automate what you can verify." The car-wash question (drive or walk 50m to a car wash? — every model says walk) is just a clean way to show models breaking where the context lives only in your head. Three layers fix that: write a spec, add a verifier, set up the environment. That's the idea. The problem is that "write a detailed spec" and "verify the output" are the kind of advice that sounds obvious and changes nothing. So this guide does one thing the 60 seconds couldn't: it runs one real task through all three layers and shows you the actual artifacts — the spec text, the verifier prompts, the files you commit. Steal them and adapt.

Attribution, so you can trust the rest: context engineering, the verifiability thesis, and "you can outsource your thinking but not your understanding" are Karpathy's own. The Spec → Verifier → Environment packaging is a teaching scaffold built on those ideas — a way to apply them, not a quote from him.

WORKED EXAMPLEThe task we'll run through all three layers

Pick something small enough to finish in a sitting but real enough to have edge cases. We'll use this one for the whole guide: > "Add an endpoint to our API that exports a customer's invoices as a CSV." That sentence is a vague ask. Handed to Claude as-is, you'll get a CSV export — probably the wrong columns, no auth check, and a memory blow-up the first time someone with 40,000 invoices hits it. The three layers are how you close that gap before a line of code is written. Watch the same task get sharper at each step.

Vague ask: "export invoices as CSV" — Claude fills the gaps with guesses
After Layer 1 (Spec): exact columns, auth rule, date format, the 40k-row case named out loud
After Layer 2 (Verifier): a definition of "correct" Claude can check itself against
After Layer 3 (Environment): the rules that made it behave without you re-typing them

LAYER 1 · SPECLayer 1 — The Spec: turn the vague ask into a contract

A spec is just your understanding written down in a form Claude can build from. Karpathy is lukewarm on plain plan mode — useful, but he'd rather work with the agent to design a genuinely detailed spec. Don't write it solo and don't accept Claude's first draft. Make it interview you, then read what it produces like an engineer, not a customer. Keep specs small and compartmentalized (this one endpoint, not "the whole billing module").

Open plan/spec mode and hand it the one-liner plus this: "Don't build yet. Interview me. Ask the 5–8 questions whose answers you'd otherwise have to guess — auth, data shape, edge cases, format, scale — one at a time."
Answer honestly, including the things you'd normally leave implicit ("only the account owner can export", "some customers have 40k+ invoices").
Then force the decision check — paste: "Restate the spec as a numbered contract. For each key decision, show me the option you chose and the one you rejected, so I can veto."
Read every line. If a line is vague to you, it's vaguer to the model — rewrite it before you continue.

The output you're aiming for is a short contract, e.g.: (1) Route GET /customers/:id/invoices.csv. (2) Auth: caller must be the account owner or admin; otherwise 403. (3) Columns, in order: invoice_id, issued_date (ISO-8601), amount_cents, currency, status. (4) Stream the response — do not build the whole file in memory; assume up to 50k rows. (5) Empty result returns a header-only CSV, not a 404. Five lines. That's the whole point — five specified lines beat five paragraphs of prose.

LAYER 2 · VERIFIERLayer 2 — The Verifier: give 'correct' a definition the model can check

This is the layer to actually get good at. The most painful part of working with AI is checking its output, so most people eyeball it and hope. Karpathy's point is that verification is the real lever — and the move is to make the model help verify itself against something concrete, not against your vibe. Three moves do most of the work.

Set the bar up front, before building: "List the evaluation criteria you'll judge this endpoint against — correctness, security, performance, edge cases — then build to them." This kills vague asks like "make it good."
Feed it a real reference so 'correct' has a shape: paste one row of real (or realistic) expected CSV output and the exact header line. Now "matches the format" is checkable, not aspirational.
Wire it to reality where you can: "Write a test that calls the endpoint as a non-owner and asserts 403, and one that exports 50,000 generated invoices and asserts memory stays flat (streamed, not buffered). Run them." A passing test beats "looks right."
Bundle it into the build prompt: end with "Before you call this done, run the verification step: check the output against the criteria above and the sample row, and report each check as pass/fail with the evidence."

An AI output being checked against a glowing checklist and gauge — the verification layer

The trap: asking Claude "did you verify it?" after the fact. It will cheerfully say yes. Verification only bites when the criteria exist before the build and the check is a thing that can fail (a test, a diff against a sample, a real call) — not a self-report.

LAYER 3 · ENVIRONMENTLayer 3 — The Environment: make the first two automatic

Layers 1 and 2 work once. The environment is what stops you re-typing them every session. Two pieces do almost all of it: a CLAUDE.md that states the rules in plain language, and — for the lines that must never be crossed — a hook that enforces them at the tool level. A prompt rule is a polite request the model can rationalize past. A hook is a wall it physically cannot walk through. The table sorts every action into three buckets; put the buckets in CLAUDE.md, and back the NEVER row with a hook.

Write the three buckets above into CLAUDE.md at the repo root. It loads every session with no script to run — the right home for conventions that don't change.
For the NEVER row, add a PreToolUse hook in .claude/settings.json with matcher: "Edit|Write" pointing at a script. CLAUDE.md asks; the hook blocks.
In the script, read tool_input.file_path from the JSON on stdin (jq -r '.tool_input.file_path'), and if it matches .env, /infra/ or /migrations/*, print a reason to stderr and exit 2 — exit 2 is a blocking error and the stderr text goes back to Claude as the reason.
Prefer exit 2 for the simple case; if you want richer control, exit 0 with a JSON permissionDecision: "deny" on stdout instead.
Audit it now and then — paste your CLAUDE.md back and ask: "Which of these rules are vague, unenforceable, or never actually checked? Rewrite those."

Bucket	For our CSV task	How you enforce it
ALWAYS (autopilot)	Before a multi-step change, write the spec + eval criteria and add a failable test	A line in `CLAUDE.md` — loads every session
ASK FIRST (pause)	Schema/migration changes, new deps, anything touching auth or billing	A line in `CLAUDE.md` — Claude checks with you
NEVER (hard line)	Edit `.env`, files under `/infra`, or existing DB migrations	A `PreToolUse` hook — Claude physically can't

Minimal hook, in plain terms: settings has hooks.PreToolUse[0].matcher = "Edit|Write" and hooks[0].command pointing at protect.sh. The script does f=$(jq -r '.tool_input.file_path' < /dev/stdin), then a case "$f" on the protected globs that echoes a message to >&2 and runs exit 2. That's the whole wall — a few lines, and Claude can no longer touch those paths no matter what the prompt says.

THE DEEPER SKILLContext engineering: what actually goes in the window

Karpathy's term for the real job is context engineering, which he defines as "the delicate art and science of filling the context window with just the right information for the next step." The three layers above are how you do it on purpose. His warning is the part people miss: "too little or of the wrong form and the LLM doesn't have the right context; too much or too irrelevant, and cost goes up and performance comes down." More context is not better context. For the CSV task, the table below is the difference between a window full of noise and a window aimed at the next step.

Goes in the window	For our CSV task	Why it earns its place
The spec/contract	The 5-line numbered contract from Layer 1	The model's actual target — without it, it guesses
A concrete reference	One sample CSV row + the exact header line	Turns 'correct format' into something checkable
The real schema/types	The `Invoice` model and the auth helper signature	Stops it inventing column names and authz it can't call
Evaluation criteria	Correctness / security / perf / edge-case list	Defines 'done' before the build, not after
Leave OUT	The whole repo, unrelated modules, old chat	Irrelevant tokens raise cost and lower performance

Practical version: before a hard step, ask "What's the minimum context you need to do this next step well — which files, which examples?" Then give exactly that. You're curating the window, not stuffing it.

CHEAT SHEETRun sheet: the whole loop on one page

Once it clicks, the method is a short loop you can run on any task. The labels are a scaffold; the discipline underneath — understand, specify, define correct, enforce — is the bit that compounds.

Spec — "Interview me, then restate as a numbered contract; show the option you rejected per decision."
Verify — "List evaluation criteria first; build to them; add a test that can fail; report each check pass/fail with evidence."
Environment — put rules in CLAUDE.md; enforce the hard lines with a PreToolUse hook, not a prompt.
Curate context — give the minimum right files/examples for the next step; leave the rest out.
The skill that lasts — "You can outsource your thinking, but you can't outsource your understanding." Stay the person who understands the problem well enough to steer.

Get the next drop

New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.

By submitting you agree to our Privacy Policy & Terms. Unsubscribe anytime.

Frequently asked questions

What does Karpathy's method actually change about how I prompt Claude?

It stops you prompting and starts you specifying. Instead of one paragraph of intent, you (1) make Claude interview you and restate a numbered spec/contract, (2) set evaluation criteria and a failable test before the build so 'correct' is defined, and (3) move the rules into a CLAUDE.md + a hook so you're not re-typing them. The labels are a teaching scaffold; the underlying ideas — context engineering and verification — are Karpathy's.

How do I write a good spec without over-engineering it?

Aim for a short numbered contract, not a document. For the CSV example that's ~5 lines: the route, the auth rule, the exact columns and date format, the streaming/scale requirement, and the empty-result behavior. The test: every line is a decision someone could get wrong. If a line is vague to you, rewrite it before you build — it's vaguer to the model.

What's the single highest-leverage move in the whole method?

Defining 'correct' before the build and attaching it to something that can fail — a test, a diff against a sample row, a real authz call. Asking Claude 'did you verify it?' afterward is theatre; it'll say yes. Verification only works when the criteria exist up front and the check can come back red.

Why use a hook instead of just writing the rule in CLAUDE.md?

CLAUDE.md is a request the model can rationalize past mid-task; a PreToolUse hook is enforcement it can't. The hook matches Edit/Write, reads tool_input.file_path from stdin, and exits with code 2 (stderr becomes Claude's error) or returns a permissionDecision: deny on exit 0. Use CLAUDE.md for conventions, the hook for genuine never-cross lines like .env or migrations.

What's the difference between context engineering and prompt engineering?

Karpathy's framing: prompt engineering is the wording of a single task; context engineering is 'the delicate art and science of filling the context window with just the right information for the next step.' The catch is that more isn't better — too much or irrelevant context raises cost and lowers performance. So you curate the minimum right files and examples for the next step, not the whole repo.

Is this only for coding tasks?

No. The CSV endpoint is just a concrete carrier. A spec (numbered contract), evaluation criteria, and the always/ask-first/never rules are plain language and work for a report, an analysis, or a content brief. Only the hook is code-specific — that's the optional bit for hard file-level guardrails.

Does any of this depend on Claude specifically?

The spec and verifier moves are model-agnostic — they're prompting and process. CLAUDE.md and PreToolUse hooks are Claude Code features, so the environment layer as written is Claude-specific; other agents have their own equivalents (rules files, tool permissions).

Sources · Stop Prompting Claude. Use Karpathy's Method Instead — Austin Marchese · Andrej Karpathy on 'context engineering' over 'prompt engineering' (X) · Context Engineering — "filling the context window with the right information" (GitHub, quoting Karpathy) · Claude Code — Hooks reference (PreToolUse, matcher, exit-code 2 vs permissionDecision deny) · Claude Code — CLAUDE.md & settings (Anthropic docs)

Karpathy's Method: How to Build 10x Faster with Claude (Spec → Verifier → Environment)

The 30-second recap, then the part the video skipped

WORKED EXAMPLEThe task we'll run through all three layers

LAYER 1 · SPECLayer 1 — The Spec: turn the vague ask into a contract

LAYER 2 · VERIFIERLayer 2 — The Verifier: give 'correct' a definition the model can check

LAYER 3 · ENVIRONMENTLayer 3 — The Environment: make the first two automatic

THE DEEPER SKILLContext engineering: what actually goes in the window

CHEAT SHEETRun sheet: the whole loop on one page

Get the next drop

Frequently asked questions

Verified agents are worth money. Most businesses can't build them.

Keep the Spec → Verifier → Environment build sheet — and get the next playbook first

Karpathy's Method: How to Build 10x Faster with Claude (Spec → Verifier → Environment)

The 30-second recap, then the part the video skipped

WORKED EXAMPLEThe task we'll run through all three layers

LAYER 1 · SPECLayer 1 — The Spec: turn the vague ask into a contract

LAYER 2 · VERIFIERLayer 2 — The Verifier: give 'correct' a definition the model can check

LAYER 3 · ENVIRONMENTLayer 3 — The Environment: make the first two automatic

THE DEEPER SKILLContext engineering: what actually goes in the window

CHEAT SHEETRun sheet: the whole loop on one page

Get the next drop

Frequently asked questions

More free guides

Verified agents are worth money. Most businesses can't build them.

Keep the Spec → Verifier → Environment build sheet — and get the next playbook first