Verified against Anthropic's docsThe fast facts (so the table makes sense)
Fable 5 (claude-fable-5, the public release of the Mythos line) is Anthropic's most capable widely-released model. Confirm these before you wire anything up — the rest of this guide builds on them.
- Model id —
claude-fable-5— there is alsoclaude-mythos-5, the same model without the safety classifiers, but it's invite-only (Project Glasswing). For everyone else it'sclaude-fable-5. - Context / output — 1M tokens in, 128K out. Stream anything over ~16K output or the SDK times out the request.
- Price — $10 / 1M input · $50 / 1M output — 2× Opus 4.8's $5/$25, 3.3× Sonnet's $3/$15, 10× Haiku's $1/$5.
- Quiet cost multiplier — Fable 5 uses the Opus-4.7 tokenizer: the same text is ~30% more tokens than on older models. So your effective bill is closer to ~2.6× Opus on identical inputs, not 2×. Re-measure with
count_tokensbefore you trust an old budget.
Route by task, not by reflexWhen is Fable 5 actually worth $50/1M? A real decision table
The honest answer is rarely as your default. Match the task to the cheapest model that clears the quality bar — Fable 5 earns its premium on a narrow set of jobs. Use this as your routing rule:
| Task | Reach for Fable 5? | Cheaper model that usually wins | Why |
|---|---|---|---|
| Deep, multi-source research brief | Yes | — | Long-horizon synthesis across a full 1M window is where it pulls ahead. |
| Greenfield build from a clear, detailed spec | Yes | — | Testers report one-shot implementations of systems that took days to iterate. |
| Large refactor / repo-wide change | Yes | — | Holds long context and self-verifies across many files. |
| Hard judgment call (contracts, tradeoffs, design review) | Yes | — | Pay for being right when the cost of wrong is high. |
| Agentic coding loop (interactive) | Maybe | Opus 4.8 at xhigh | Opus 4.8 is near-frontier at half the price; benchmark both on YOUR repo first. |
| Everyday chat / drafting / Q&A | No | Sonnet 4.6 | Sonnet is the speed/intelligence sweet spot at $3/$15. |
| Classification, extraction, routing, tagging | No | Haiku 4.5 | $1/$5 and fast; Fable 5 is pure waste here. |
| High-volume / latency-sensitive jobs | No | Haiku 4.5 / Sonnet 4.6 | Minutes-long turns and 10× price make Fable 5 a non-starter at scale. |
| Offensive-security or bio/lab work | No — it'll refuse | Opus 4.8 | Fable 5's classifiers decline these (see the refusal section). |
The model-price ladder (the numbers behind the routing)
Same base_url, same SDK — you switch models by changing one string. Here's what each rung costs and what it's for.
| Model | $/1M in · out | Context · max out | Best for |
|---|---|---|---|
Fable 5 (claude-fable-5) | $10 · $50 | 1M · 128K | Hardest research, big greenfield builds, judgment |
Opus 4.8 (claude-opus-4-8) | $5 · $25 | 1M · 128K | Strong default for agentic + coding |
Sonnet 4.6 (claude-sonnet-4-6) | $3 · $15 | 1M · 64K | Best speed/intelligence balance |
Haiku 4.5 (claude-haiku-4-5) | $1 · $5 | 200K · 64K | Fast, cheap, simple, high-volume |
Copy these, in orderSwitch from Opus without a 400: the exact settings
Fable 5 shares Opus 4.8's request shape but rejects a few parameters Opus tolerated. Each one below is a hard 400 if you leave it in. Steps 1–4 are the ones that bite during migration.
- Set the model to
claude-fable-5. - Delete
temperature,top_p, andtop_k. All three are removed — any of them returns a 400. Steer with the prompt instead. - Remove
thinking: {type: "enabled", budget_tokens: N}. Budgets are gone; sending one is a 400. Thinking is always on (adaptive) — just omit thethinkingfield, or setthinking: {type: "adaptive"}. - Do NOT send
thinking: {type: "disabled"}. This one is Fable-specific: it's accepted on Opus 4.8/4.7 but a 400 on Fable 5. There is no "thinking off" — control depth with effort instead. - Set depth via
output_config: {effort: ...}—low·medium·high(default) ·xhigh·max. Start athigh; only goxhighfor the most capability-sensitive work,maxfor genuinely frontier problems (it can overthink). - Drop any last-assistant-turn prefill — also a 400. Use
output_config.format(structured output) or a system-prompt instruction to shape output. - Stream for outputs over ~16K tokens so you don't hit an HTTP timeout, and give
max_tokensreal headroom (it caps thinking + text combined).
temperature, top_p, top_k, budget_tokens, thinking:{type:"disabled"}, or a trailing assistant message, fix it — each is its own 400.The one most people missThe error that isn't a 400 — and breaks naive code anyway
Unlike Opus, Fable 5 runs safety classifiers that can decline a request — and a decline is not an error. It comes back as a successful HTTP 200 with stop_reason: "refusal" and an empty (or partial) content array. Any code that reads response.content[0].text without checking stop_reason first will crash on a refusal.
- What triggers it — Offensive-security/exploit work (
cyber), bio/lab methods (bio), help building competing models (frontier_llm), or asking it to dump its own reasoning as text (reasoning_extraction). Benign security and life-sciences work can trip these too. - How to detect it — Branch on
stop_reason == "refusal"— NOT oncontent.stop_details.categorynames the policy, but it can benull, so don't key your logic off it. - Billing quirk — A refusal before any output costs nothing and doesn't count against rate limits. A mid-stream refusal bills the input + already-streamed output — discard the partial.
- Monitoring trap — Refusals are 200s, so dashboards built on error/5xx rates never see them. Emit your own metric per refusal.
stop_reason == "refusal" and re-send the same request to a cheaper model like claude-opus-4-8. It's a fallback path you wire up yourself, not something that happens by default.Two more gotchas that look like bugs
Both of these produce confusing failures that have nothing to do with your prompt.
- Every request 400s out of nowhere — Fable 5 requires 30-day data retention and is not available under zero-data-retention (ZDR). If your org is on ZDR — or any retention below 30 days — every Fable 5 call returns
400 invalid_request_error, even a perfectly valid one. Check the org's retention setting before you debug the payload. - Your cost / token math is suddenly off — The ~30% tokenizer inflation means token counts, context budgets, and
max_tokensvalues measured on Opus/Sonnet/Haiku don't carry over. A prompt that fit comfortably before can blow your budget. Re-runcount_tokenspassingmodel: "claude-fable-5"— the response reports counts under both the new and old tokenizers so you can see the delta. - Migrated prompts feel worse, not better — Over-prescriptive scaffolding written for older models can degrade Fable 5. It follows instructions tightly and plans well on its own — strip the step-by-step hand-holding and re-test; a short "act when you have enough info; don't refactor beyond the task" beats a long checklist.
Put it to work without lighting money on fire
Concrete patterns that play to Fable 5's strengths (long-horizon, autonomous, self-verifying) while keeping the bill sane.
- Research analyst — Load 10+ sources into the 1M window at
higheffort; ask for a cited, decision-ready brief. This is the textbook Fable 5 job. - Architecture partner — Give the WHOLE spec up front in one well-specified turn, then let it plan and build. It rewards a clear goal more than mid-task nudging.
- Long autonomous run — Plan for minutes-long (sometimes hours-long) turns: stream, show progress, and check in asynchronously instead of blocking. Add "audit each progress claim against a tool result before reporting it" to kill fabricated status updates.
- Parallel sub-agents — It dispatches sub-agents reliably — delegate independent subtasks and keep working rather than spawning-and-blocking.
- Route, don't default — Send the hard ~10% here; everything else to Opus 4.8 / Sonnet / Haiku. One endpoint, one-line model swap (see below).
If you don't want to run the plumbing yourselfReach every model — including Fable 5 — under one billed endpoint
The whole point of the model-picker table is that the model is a config value you swap per task. To make that real you need one place to call all of them — and if you're serving customers, a way to meter and bill what they use. Knotie's AI Gateway is exactly that: an OpenAI-compatible endpoint. Point the standard OpenAI SDK at https://api.knotie.ai, change the model name, and you reach budget, mid, and premium tiers (Claude — including Fable 5 — plus GPT and Gemini families) without a new provider integration per model.
- One-line model switch — Swapping
claude-fable-5forclaude-opus-4-8(or a GPT/Gemini model) is the model name +base_url— not a new SDK, not new auth. - Per-customer metered keys — Mint virtual keys, restrict each key to specific models from a tiered list, set a profit markup, and bill usage on credits — under your own brand and domain.
- Guardrails before a demo — Restrict a key to mid-tier only, add domain whitelisting, and watch spend per key — so a client demo can't quietly run the $50/1M model on everything.
Get the next drop
New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.
By submitting you agree to our Privacy Policy & Terms. Unsubscribe anytime.
Frequently asked questions
Is Fable 5 just a renamed Opus? Why the new name?
claude-mythos-5 is the same model minus the safety classifiers). It sits above the Opus tier, not beside it.What's the single most common 400 when moving from Opus to Fable 5?
temperature, top_p, and top_k were fine on older Claude models and are all hard 400s on Fable 5. Delete them first. The Fable-only surprise is that thinking: {type: "disabled"} — which Opus 4.8 accepts — is also a 400.My request returns 200 with empty content and no error. What happened?
stop_reason: "refusal", classifiers declined the request. It's a successful response, not an exception. Check stop_reason before reading content, and retry on a cheaper model (e.g. Opus 4.8) if you want an answer.Every Fable 5 call 400s but my payload looks valid. Why?
How do I control how hard it thinks now that budget_tokens is gone?
output_config.effort: low / medium / high (default) / xhigh / max. Start at high. Effort affects all token spend — text, tool calls, and thinking — so lower effort also means fewer tool calls, not just shorter reasoning.Will my Opus token budgets still be right on Fable 5?
max_tokens values are wrong — and your effective price is closer to ~2.6× Opus, not 2×. Re-measure with count_tokens using model: "claude-fable-5" (it returns both tokenizers' counts).