Agent Tokenomics: How to Price an AI Agent for a Client (Without Guessing or Losing Money)

Why most people price AI agents wrong

Most builders quote a client a flat monthly number pulled out of thin air, then quietly hope the token bill stays under it. That works until the agent gets used. The fix is boring and it works: figure out what ONE run of the agent actually costs in tokens, multiply by how often it runs, add your margin, and quote from that. This guide shows the exact arithmetic — with real 2026 prices — and the one trick (routing tasks to the right-sized model) that cuts the bill the most. Cloudflare runs this at scale internally: across its first 30 days, its AI code-review system did 131,246 review runs over 48,095 merge requests, and the median run cost $0.98 (average $1.19). That's the whole game in one number — a known, predictable cost per task.

A flat quote with no cost model is a bet, not a price.
The unit that matters is cost PER TASK (per call, per review, per conversation), not per month.
Once you know cost-per-task, the monthly number falls out of usage — and so does your margin.

The only formula you need

Token billing is simple once you see it. Every model has two prices: one for the tokens you send IN (the prompt, the context, the data), one for the tokens it sends OUT (the answer). You pay per million tokens. So the cost of one agent task is:

Cost per task = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price)
Input tokens = your system prompt + the user's request + any context/data you stuff in (this is usually the BIG number).
Output tokens = what the model writes back (usually much smaller, but priced ~5× higher per token).
Rough rule: 1 token ≈ 4 characters ≈ 0.75 of a word. A page of text is ~500–700 tokens.

Output is priced about 5× input on every current Claude model, so a chatty agent that writes long answers costs more than the input size alone suggests. Watch the output.

Real 2026 prices (verified)

Here's what three Claude tiers actually cost as of June 2026, straight from Anthropic's pricing page. Cheap, mid, and frontier. The gap between them is the entire reason routing works — the cheapest model is 5× cheaper than the frontier one for the same tokens.

Model	Input ($/1M tokens)	Output ($/1M tokens)	Use it for
Claude Haiku 4.5	$1	$5	Easy, high-volume calls — classify, extract, route, short replies
Claude Sonnet 4.6	$3	$15	The everyday workhorse — most real tasks
Claude Opus 4.8	$5	$25	The hard 20% — tricky reasoning, the calls you can't get wrong

Prices change. Always re-check the vendor's own pricing page before you quote a client — these are Anthropic's published rates on 2026-06-10. There's also a 90%-off cache-read price for repeated context, which we'll use below.

Worked example: pricing one agent task

Say your agent reviews a chunk of work for a client — a support ticket, a contract clause, a code diff, whatever. A realistic task sends ~30,000 input tokens (instructions + the thing to review + some context) and writes ~3,000 output tokens back. Same task, three models:

Model	Input cost	Output cost	Cost per task
Haiku 4.5	30k × $1/1M = $0.030	3k × $5/1M = $0.015	$0.045
Sonnet 4.6	30k × $3/1M = $0.090	3k × $15/1M = $0.045	$0.135
Opus 4.8	30k × $5/1M = $0.150	3k × $25/1M = $0.075	$0.225

These are illustrative token volumes (30k in / 3k out) priced at Anthropic's real published rates. Your token counts will differ — measure your own agent on 10 real runs and average them. But the SHAPE holds: the frontier model costs ~5× the cheap one for the identical task.

The move that cuts the bill: model-routing

Here's the lever. Most tasks are easy. A few are hard. If you send everything to the frontier model 'to be safe', you pay frontier prices for work a cheap model would nail. Instead, route: a cheap model (or a small classifier) decides if a task is easy or hard, then easy tasks go to Haiku and only the hard ones go to Opus. Cloudflare does exactly this — it reserves its top-tier models (Claude Opus 4.7 / GPT-5.4) for the coordinator that orchestrates the review, and runs the bulk 'sub-reviewer' work on standard-tier models (Claude Sonnet 4.6 / GPT-5.3 Codex), with a lightweight model for text-heavy odds and ends. Same pattern, any scale. Watch what it does to 1,000 tasks a month:

Strategy	Math	Monthly cost
Everything on Opus 4.8	1,000 × $0.225	$225
Routed: 80% Haiku + 20% Opus	(800 × $0.045) + (200 × $0.225)	$81
You just saved	$225 − $81	$144 (64% less)

The routing decision itself costs almost nothing — a one-line Haiku classification is a fraction of a cent. Two more free wins: cache the parts of your prompt that don't change (cached input reads are 90% off), and don't let the agent ramble (output is the expensive side).

The hidden costs that blow up your estimate

Your clean 30k-in / 3k-out number is the floor, not the bill. In production three things quietly add tokens, and if you priced off the floor you eat them. Budget for all three before you quote.

Retries. When a call fails, times out, or returns malformed JSON, you re-send the whole prompt — and you pay for the failed attempt AND the retry. An agent with a 2× average re-run rate doubles its input cost. Cloudflare's own system averages 2.7 re-reviews per merge request, which is exactly why their per-review average ($1.19) sits above the median ($0.98).
Context bloat. Multi-turn agents resend the growing conversation on every step. Turn 1 sends 5k tokens; turn 10 might resend 50k because the whole history rides along each time. The input side compounds — measure a full conversation, not one turn.
Tool calls. Every tool you give the model adds tokens before the task even starts. On Claude, the tool-use system prompt alone is ~290–500 input tokens depending on model, plus the JSON schema of each tool, plus every tool_result you feed back in. A 6-tool agent that calls 3 tools per task can add several thousand input tokens you never see in the 'happy path' math.

Caching kills most of this. The static parts — system prompt, tool definitions, long instructions — are identical on every call, so cache them: cache reads are billed at 0.1× the input rate (a 90% discount) on every current Claude model. The 5-minute cache write costs 1.25× input once, then every read is 90% off, so it pays for itself after a single reuse.

Turn cost-per-task into a client price

Now you have a real cost floor, build the quote on top of it instead of guessing. Four steps:

1. Measure: run your agent on ~10 real tasks, average the input + output tokens, get your true cost-per-task (use the formula above).
2. Estimate volume: ask the client how many runs/month they expect, then assume MORE — usage always grows once it works. Price for the higher number.
3. Add a safety multiple: multiply your raw token cost by 3–5× to cover retries, longer-than-expected tasks, support, and your margin. Token cost should be a small slice of what you charge, not the whole price.
4. Quote a plan with a ceiling: e.g. 'up to 1,000 runs/month included, then $X per 100 after'. Now overage is the client's choice, not your loss.

Bill the client on a clean unit they understand — a credit, a minute, a 'run' — not raw tokens. They should never see the word 'token'. You absorb the routing complexity; they see one predictable number.

A tier-based pricing table you can copy

If you sell more than one plan, stop quoting per task and sell tiers. Each tier caps which models the agent is allowed to use, which sets a hard ceiling on your cost-per-task — so you can publish a price with a known margin. This is the same 30k-in / 3k-out task from above, priced per tier at a 4× markup. Adjust the volume and markup to your own numbers.

Plan	Models allowed	Your cost / task	Client price / task (4×)	Margin
Starter	Haiku 4.5 only	$0.045	$0.18	$0.135
Pro	Haiku + Sonnet (routed 70/30)	$0.072	$0.29	$0.218
Premium	Haiku + Sonnet + Opus (routed 60/25/15)	$0.101	$0.40	$0.299

Pro math: (0.7 × $0.045) + (0.3 × $0.135) = $0.072. Premium math: (0.6 × $0.045) + (0.25 × $0.135) + (0.15 × $0.225) = $0.101. The premium tier costs you ~2.2× more per task than Starter, so it should command a premium price — and a Starter client who never touches Opus can't blow your margin, because the tier physically blocks the expensive model.

Metering and billing per client — without building it yourself

The pricing model above has one moving part that's a pain to build: actually metering each client's usage against the model they're allowed to use, applying your markup, and drawing it down from a credit balance — per client, under your own brand. If you'd rather not write that billing layer, a gateway can do it. Knotie's AI Gateway is OpenAI-compatible: you keep the standard OpenAI SDK and point base_url at https://api.knotie.ai. You mint a virtual key per client and restrict that key to specific models — exactly the per-tier model ceiling from the table above (budget / mid / premium families: Claude, GPT, Gemini). Each key is pre-funded from Knotie Credits (100 credits = $1; $10 / 1,000 credits reserved per key, overflow drawing your main balance), and per key you set a profit multiplier and a price per credit, so the markup math you just did is enforced automatically and billed to the client. Clients can self-serve their own keys from your white-label portal, and you can whitelist approved domains per key.

This handles the billing mechanics — metering, markup, credit draw-down per key. It does not pick models for you or fail over automatically: the routing logic (easy → cheap, hard → frontier) is still something you implement in your own agent. The gateway meters and bills whatever model your code calls.

The 6-point tokenomics checklist

Run this before you send any AI-agent quote:

Do I know my agent's average input AND output tokens from REAL runs (not a guess)?
Have I priced one task on the cheap, mid, AND frontier model so I know the spread?
Am I routing — easy tasks to a cheap model, only the hard ones to the frontier?
Am I caching the static part of my prompt (90% off on repeat reads)?
Did I check the vendor's live pricing page TODAY, not from memory?
Is my client price a clean unit (credit/minute/run) with a ceiling, at a 3–5× multiple over raw cost?

Get the next drop

New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.

By submitting you agree to our Privacy Policy & Terms. Unsubscribe anytime.

Frequently asked questions

How do I estimate the token cost of an AI agent before I build it?

Take your best guess at input tokens (system prompt + the data/context you'll feed it + the user's request) and output tokens (the answer length), then apply the formula: <code>(input ÷ 1,000,000 × input price) + (output ÷ 1,000,000 × output price)</code>. A page of text is roughly 500–700 tokens. Once a prototype exists, replace the guess by measuring 10 real runs and averaging — that's your true cost-per-task.

What is model-routing and why does it cut cost so much?

Routing means sending each task to the smallest model that can do it well: easy/high-volume calls go to a cheap model (e.g. Claude Haiku 4.5 at $1/$5 per million tokens), and only the hard calls go to the frontier model (e.g. Opus 4.8 at $5/$25). Because the cheap model is ~5× cheaper, and most tasks are easy, the blended cost drops sharply. In the guide's example, routing 80% of tasks to Haiku cut a $225/month bill to $81 — 64% less.

Is the $0.98 per-review Cloudflare number real?

Yes — it's from Cloudflare's own engineering blog ("Orchestrating AI code review at scale"). Across the first 30 days (March 10 – April 9, 2026), their internal AI code-review system ran 131,246 reviews over 48,095 merge requests, with a median cost of $0.98 per review and an average of $1.19. They route work across model tiers — top-tier models coordinate, standard-tier models do the bulk reviewing. It's a clean real-world anchor for 'known cost per task'.

Should I charge clients per token?

No. Clients don't think in tokens and a per-token bill feels unpredictable and scary. Charge on a clean unit they understand — a credit, a per-minute rate, or a 'run' — with a monthly ceiling and a clear overage price. You handle the token math and routing behind the scenes; they get one predictable number. Set your unit price at roughly 3–5× your raw token cost to cover retries, support, and margin.

Why is output so much more expensive than input?

On every current Claude model, output tokens are priced about 5× the input rate (Haiku $1 in / $5 out, Opus $5 in / $25 out). Generating text is more compute-intensive than reading it. The practical takeaway: a verbose agent that writes long answers can cost more than its large input suggests, so cap output length and ask for concise responses where you can.

Do tools and retries really change the cost that much?

They can, and they're the most common reason a real bill beats the estimate. Every tool you attach adds tokens before the task starts — on Claude, the tool-use system prompt is ~290–500 input tokens depending on model, plus each tool's JSON schema, plus every <code>tool_result</code> you feed back. Retries are worse: a failed or malformed call means you pay for the attempt AND the re-send. Measure your agent on real runs (which include the retries and tool round-trips), not on a clean single pass, and cache the static parts so repeated context is billed at 0.1× input.

How should I price multiple tiers without losing money on the top one?

Cap which models each tier can use. If your Starter plan only allows a cheap model (e.g. Haiku 4.5), a Starter client physically can't trigger frontier-model costs, so your margin on that tier is fixed. Reserve Sonnet/Opus for higher tiers and price those up to match. The pricing table above shows the pattern: Starter at $0.045/task, Premium (routed across all three) at ~$0.101/task, each marked up ~4×. Enforce the model ceiling at the key/gateway level, not on trust.

Sources · Orchestrating AI Code Review at scale — Cloudflare Blog · Pricing — Anthropic / Claude API Docs (model rates, cache read 0.1×, prompt-caching multipliers) · Tool use pricing — Anthropic / Claude API Docs (tool-use system prompt 290–500 tokens by model)

Agent Tokenomics: How to Price an AI Agent for a Client (Without Guessing or Losing Money)

Why most people price AI agents wrong

The only formula you need

Real 2026 prices (verified)

Worked example: pricing one agent task

The move that cuts the bill: model-routing

The hidden costs that blow up your estimate

Turn cost-per-task into a client price

A tier-based pricing table you can copy

Metering and billing per client — without building it yourself

The 6-point tokenomics checklist

Get the next drop

Frequently asked questions

Resell the agent under your own brand, with the billing built in

Keep the agent-pricing cheat-sheet — and get the next playbook free

Agent Tokenomics: How to Price an AI Agent for a Client (Without Guessing or Losing Money)

Why most people price AI agents wrong

The only formula you need

Real 2026 prices (verified)

Worked example: pricing one agent task

The move that cuts the bill: model-routing

The hidden costs that blow up your estimate

Turn cost-per-task into a client price

A tier-based pricing table you can copy

Metering and billing per client — without building it yourself

The 6-point tokenomics checklist

Get the next drop

Frequently asked questions

More free guides

Resell the agent under your own brand, with the billing built in

Keep the agent-pricing cheat-sheet — and get the next playbook free