Why you're hereThe 30-second recap (then we get to work)
Two of the three big labs restricted open/flat access in one quarter. Anthropic blocked flat-rate Claude plans from running third-party agents (April 4, pushing that usage to pay-as-you-go; some bills spiked, before a capped credit softened it). Google is retiring the open-source Gemini CLI on June 18, 2026 and replacing it with the closed-source, Go-based Antigravity CLI (agy) — free, Pro and Ultra users lose access; enterprise licences and paid API keys keep working. That's the why, and it's covered in the sources. The rest of this page is the how: the exact moves to make 'the model' a config value so the next pricing change is an annoyance, not an outage.
gemini directly, that command stops working June 18. Jump to the Gemini-CLI section below — you have a hard deadline.The foundationMove 1 — get off flat-rate onto your own API keys
Flat-rate subscriptions are priced for interactive human use. The moment an agent runs on one, you're one policy change away from a broken pipeline or a 50× bill. For anything automated, run on metered API keys you own. Create them once, store them as environment variables, and never paste a key into code.
- Anthropic — Console → API Keys: console.anthropic.com/settings/keys. Export as
ANTHROPIC_API_KEY. - OpenAI — Platform → API keys: platform.openai.com/api-keys. Export as
OPENAI_API_KEY. - Google — AI Studio → Get API key: aistudio.google.com/apikey. This is the paid Gemini API path that survives June 18. Export as
GOOGLE_API_KEY(some tools useGEMINI_API_KEY). - OpenRouter — one key, ~hundreds of models behind an OpenAI-compatible endpoint: openrouter.ai/settings/keys. Set a per-key credit limit; copy it immediately (shown once). Export as
OPENROUTER_API_KEY. - Put them in a
.env(and your CI secret store), never in a committed file:ANTHROPIC_API_KEY=sk-ant-...,OPENAI_API_KEY=sk-...,OPENROUTER_API_KEY=sk-or-....
https://openrouter.ai/api/v1) is OpenAI-compatible, so most harnesses point at it with no code change. The trade-off is a small routing margin and one more company in your trust chain.The decouplerMove 2 — put a provider-agnostic harness in front
A harness is the agent loop (read files, plan, edit, run). Keep the harness; make the model behind it a config value. Here are three real ones with the actual config — not 'use OpenCode', the lines you paste.
- OpenCode — supports 75+ providers via models.dev. Set the default model in
opencode.jsonwith"model": "providerID/modelID"(e.g."anthropic/claude-sonnet-4-5"). To switch, change that one string. To add any OpenAI-compatible endpoint, add aproviderblock keyed by an ID, with"npm": "@ai-sdk/openai-compatible"and"options": { "baseURL": "..." }. - Aider — pick the model at launch:
aider --model anthropic/claude-sonnet-4-5or via OpenRouteraider --model openrouter/anthropic/claude-sonnet-4-5. Persist it in.aider.conf.ymlwithmodel:,weak-model:(cheap model for commits/titles) andeditor-model:. Swapping = editing one line. - Claude Code — point it at your key with
ANTHROPIC_API_KEY, or at a gateway withANTHROPIC_BASE_URL. Important: Claude Code speaks the Anthropic Messages API, not the OpenAI format — whatever sits behindANTHROPIC_BASE_URLmust accept the/v1/messagesshape (Anthropic, an Anthropic-compatible proxy, or a small translator). It is not a drop-in for an OpenAI-only endpoint.
| Harness | Providers | Local-model support | How you switch model | Best for |
|---|---|---|---|---|
| OpenCode | 75+ via models.dev + any OpenAI-compatible custom provider | Yes (Ollama via openai-compatible baseURL) | Edit "model": "providerID/modelID" in opencode.json (or /models) | Most flexibility; the cleanest multi-provider config |
| Aider | Anthropic, OpenAI, Google, OpenRouter, OpenAI-compatible, Ollama | Yes (ollama/<model> or openai-compat base) | --model ... flag or model: in .aider.conf.yml | Fast git-native pair-programming in the terminal |
| Claude Code | Anthropic + any Anthropic-Messages-compatible gateway | Only via a translator that speaks Messages API | ANTHROPIC_BASE_URL + key/token env vars | Staying on Claude while keeping the endpoint swappable |
claude-sonnet-4-5 and gemini-2.5-pro here are examples — model strings change over time, so substitute your provider's current model name (Anthropic's models page, Google AI Studio, or models.dev for OpenCode). Pick ONE harness as your default and learn its config file cold. The point isn't which harness — it's that the model is a string you control, not a vendor binary you can't.The safety netMove 3 — add a local fallback (exact commands)
Not every call needs a frontier model. A local open-weight coder handles the high-volume, low-stakes work (boilerplate, renames, commit messages, first-draft edits) for free, and it keeps running if an API key gets throttled or a billing page moves. Ollama exposes an OpenAI-compatible endpoint, so any harness above can point at it.
- Install Ollama, then pull a coder model:
ollama pull qwen2.5-coder(the 7B is ~4.2 GB; useqwen2.5-coder:14bor:32bif you have 32 GB+ RAM and a GPU). - Start the server:
ollama serve(keep it running in its own tab). The OpenAI-compatible API is athttp://localhost:11434/v1. - Sanity-check it:
curl http://localhost:11434/v1/modelsshould list your pulled models. - Point Aider at it:
aider --model ollama/qwen2.5-coder(or setOPENAI_API_BASE=http://localhost:11434/v1and use--model openai/qwen2.5-coder). - Point OpenCode at it: add a provider block with
"npm": "@ai-sdk/openai-compatible"and"options": { "baseURL": "http://localhost:11434/v1" }, then set"model": "ollama/qwen2.5-coder".
The clean versionMove 4 — make 'the model' one config value via a gateway
Once you have keys, a harness and a fallback, the last step is to stop hard-wiring providers at all. Put one OpenAI-compatible gateway in front of every model. With the standard OpenAI SDK, a swap becomes a base_url + model-name change — not a new provider integration scattered across your code.
- Self-host it: run a router (e.g. LiteLLM Proxy) that exposes one
/v1/chat/completionsendpoint and maps friendly model names to upstream providers. Your apps only ever seebase_url=http://your-gateway/v1and a model string. - Or use OpenRouter as the gateway (
base_url=https://openrouter.ai/api/v1) if you just want reach without running infra. - Either way, 'switch the model' is now: change the model name (and base_url if you move gateways). Your application code doesn't change.
base_url to https://api.knotie.ai, then switch the model by name — but it can also pick an "auto" model and automatically fail over to another model when one fails or dies, with no code change on your side. You keep calling the same endpoint and key; a model dying becomes something handled for you, not an outage you wake up to. On top of that, each virtual key can be restricted to specific models, with metered credit billing and a profit markup, so you can bill model usage per customer under your own brand. For a solo stack a self-hosted router does the job, but it won't fail over on its own — that's the piece you'd otherwise have to build and babysit yourself.Hard deadline: June 18If you were using gemini — your three real options
The video promised the Gemini-CLI migration. Here it is, concretely. Anything that calls the open-source gemini binary stops serving free/Pro/Ultra users on June 18, 2026. Pick one:
- Stay in Google's world (enterprise): move to the closed-source Antigravity CLI (
agy). Org access via a Gemini Code Assist Standard/Enterprise licence is unchanged. Google states the replacement supports its agent features — check the official Antigravity docs for parity details before you assume a one-to-one mapping. This is the path if you're committed to Gemini and have an enterprise licence. - Keep Gemini, drop the CLI: create a paid Gemini API key in Google AI Studio (aistudio.google.com/apikey) and call it from a provider-agnostic harness (Aider
--model gemini/gemini-2.5-pro, or OpenCode with the Google provider). Same model, no dependence on a binary Google can retire. - Swap the model entirely: if
geminiwas just 'the AI in my script', replace it with any harness above on your own keys. Now the provider is a config value and this whole episode can't repeat on you.
Beyond the videoSwap-safety: the gotchas nobody warns you about
A swap is rarely 'change one string and ship'. Models are not interchangeable parts; they differ in ways that silently break behaviour. Check these before you cut over.
- Context window differs. A prompt that fit one model can overflow another. If your harness packs large repo context, a smaller window truncates silently — answers get worse with no error.
- Tool / function-calling formats differ. OpenAI-style
tools/tool_calls, Anthropictool_useblocks, and Gemini function-calling are not identical. An OpenAI-compatible gateway normalises the wire format, but quirks (parallel calls, strict JSON schema, forced tool use) still vary by upstream model. - System-prompt sensitivity. The same system prompt can produce different obedience and formatting across models. A prompt tuned for one model often needs a tweak for the next.
- Cost deltas are large. Frontier vs mid vs local can differ by 10×+ per token. After a swap, watch token spend for a day before trusting it in automation.
- A/B test the swap, don't trust the vibe. Run the same eval set or real task through both models, diff the outputs, and compare cost and latency. Promote the swap only when the new model holds up on YOUR tasks — not on a benchmark.
The deliverableYour 5-point lock-in audit
Run this now. Each 'yes' is a single point of failure to fix this week.
- Does any CI job, cron, or product call
gemini(or one specific CLI binary) directly? → wrap it so the backend is swappable. - Are you running automated agents on a FLAT-RATE subscription instead of API keys? → move to metered keys you own.
- If your main provider doubled prices tomorrow, what breaks — and how fast could you actually switch?
- Do you have a local/open-weight fallback for non-critical, high-volume work?
- Is 'the model' a config value (string + base_url) in your stack, or is it hard-wired into the code?
Do this weekPre-June-18 checklist
A tight, do-it-now list before the Gemini CLI cutoff.
- Grep your repos and crontab for
gemini(andgemini-cli). List every hit. - Create the API keys you'll need (at minimum OpenRouter, plus the lab keys you actually use).
- Stand up one harness (OpenCode or Aider) on those keys and reproduce your most important
geminitask through it. - Pull one local model (
ollama pull qwen2.5-coder) and wire it as the cheap fallback. - Pick your gateway approach (self-hosted router, OpenRouter, or a managed gateway) and set
base_urlonce so future swaps are one line. - Re-run your CI on the new path before the 18th. Delete the
geminidependency once green.
Get the next drop
New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.
By submitting you agree to our Privacy Policy & Terms. Unsubscribe anytime.
Frequently asked questions
What exactly happens to Gemini CLI on June 18, 2026?
agy), announced at Google I/O on May 19, 2026 — about a 30-day migration window.I just want the model to be swappable. Which harness should I pick?
"model": "providerID/modelID" in opencode.json and switch by editing that string (75+ providers via models.dev). If you live in the terminal and want git-native pair-programming, use Aider and set the model with --model or in .aider.conf.yml. If you're committed to Claude, use Claude Code and point ANTHROPIC_BASE_URL at a gateway. Pick one and learn its config file.Can I point Claude Code at an OpenAI-compatible gateway like OpenRouter directly?
ANTHROPIC_BASE_URL only changes the hostname — Claude Code still sends the Anthropic Messages API (/v1/messages) shape. Whatever sits behind it must accept that shape (Anthropic, an Anthropic-compatible proxy, or a translator). For OpenAI-compatible endpoints, use OpenCode or Aider instead, which speak the OpenAI format natively.What's the fastest way to add a free local fallback?
ollama pull qwen2.5-coder then ollama serve. It exposes an OpenAI-compatible API at http://localhost:11434/v1 (verify with curl http://localhost:11434/v1/models). Point Aider at it with aider --model ollama/qwen2.5-coder, or add it to OpenCode as an @ai-sdk/openai-compatible provider with that baseURL.How do I make 'switch the model' a one-line change?
base_url + a model name. Self-host a router (e.g. LiteLLM Proxy), use OpenRouter (https://openrouter.ai/api/v1), or a managed gateway. After that, swapping a model is a config edit, not a code change scattered across your integration.What if a model gets deprecated or goes down mid-run — do I still have to swap by hand?
What breaks when I swap one model for another?
tool_calls vs Anthropic tool_use vs Gemini function-calling differ in the details), system-prompt sensitivity (the same prompt obeys differently), and cost (frontier vs local can differ 10×+). A/B test the swap on your own task set before trusting it in automation.