The Provider-Agnostic AI Stack: A Real Migration Guide (Before Gemini CLI Dies June 18)

Why you're here: the 30-second recap (then we get to work)

Two of the three big labs restricted open or flat-rate access in a single quarter. Anthropic blocked flat-rate Claude plans from running third-party agents on April 4, pushing that usage to pay-as-you-go; some bills spiked before a capped credit softened the blow. Google is retiring the open-source Gemini CLI on June 18, 2026 and replacing it with the closed-source, Go-based Antigravity CLI (agy): free, Pro and Ultra users lose access, while enterprise licences and paid API keys keep working. That's the why, and the sources cover it. The rest of this page is the how: the exact moves that turn 'the model' into a config value, so the next pricing change is an annoyance, not an outage.

If you have a CI job, cron, or product that shells out to gemini directly on a free, Pro or Ultra account, that command stops working on June 18. Jump to the Gemini CLI section below: you have a hard deadline.

Move 1 (the foundation): get off flat-rate onto your own API keys

Flat-rate subscriptions are priced for interactive human use. The moment an agent runs on one, you're one policy change away from a broken pipeline or a 50× bill. For anything automated, run on metered API keys you own. Create them once, store them as environment variables, and never paste a key into code.

Anthropic — Console → API Keys: console.anthropic.com/settings/keys. Export as ANTHROPIC_API_KEY.
OpenAI — Platform → API keys: platform.openai.com/api-keys. Export as OPENAI_API_KEY.
Google — AI Studio → Get API key: aistudio.google.com/apikey. This is the paid Gemini API path that survives June 18. Export as GOOGLE_API_KEY (some tools use GEMINI_API_KEY).
OpenRouter — one key, hundreds of models behind an OpenAI-compatible endpoint: openrouter.ai/settings/keys. Set a per-key credit limit, and copy the key immediately because it is shown only once. Export as OPENROUTER_API_KEY.
Put them in a .env (and your CI secret store), never in a committed file: ANTHROPIC_API_KEY=sk-ant-..., OPENAI_API_KEY=sk-..., OPENROUTER_API_KEY=sk-or-....
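A minimal Python sketch of the 'keys live in the environment' rule, assuming your shell or CI has already loaded the .env values into the process environment. The job asks for exactly the keys it needs and fails fast with a readable error instead of falling back to a hard-coded string. The helper names `require_keys` and `masked` are hypothetical, not part of any SDK.

```python
import os
from typing import Dict, List, Mapping, Optional


def require_keys(names: List[str], environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return {name: value} for every required key, or raise listing the missing ones.

    Values come from the environment (populated from .env or the CI secret store);
    nothing is hard-coded in source. Empty values count as missing.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in names}


def masked(value: str, visible: int = 6) -> str:
    """Safe-to-log preview of a secret: only the first few characters."""
    return value[:visible] + "..." if len(value) > visible else "***"
```

Call `require_keys(["OPENROUTER_API_KEY"])` at startup and log only `masked(...)` values, so a leaked CI log never contains a full key.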

OpenRouter is the fast hedge: one key reaches Claude, GPT, Gemini, Llama and more, and its endpoint (https://openrouter.ai/api/v1) is OpenAI-compatible, so most harnesses point at it with no code change. The trade-off is a small routing margin and one more company in your trust chain.

Move 2 (the decoupler): put a provider-agnostic harness in front

A harness is the agent loop (read files, plan, edit, run). Keep the harness; make the model behind it a config value. Here are three real ones with the actual config — not 'use OpenCode', the lines you paste.

OpenCode — supports 75+ providers via models.dev. Set the default model in opencode.json with "model": "providerID/modelID" (e.g. "anthropic/claude-sonnet-4-5"). To switch, change that one string. To add any OpenAI-compatible endpoint, add a provider block keyed by an ID, with "npm": "@ai-sdk/openai-compatible", "options": { "baseURL": "..." } and a "models" map listing the model IDs you want; you then select them as "yourProviderID/modelID".
Aider — pick the model at launch: aider --model anthropic/claude-sonnet-4-5, or via OpenRouter, aider --model openrouter/anthropic/claude-sonnet-4-5. Persist it in .aider.conf.yml with model:, weak-model: (a cheap model for commit messages and chat-history summaries) and editor-model: (the model that applies edits in architect mode). Swapping means editing one line.
Claude Code — point it at your key with ANTHROPIC_API_KEY, or at a gateway with ANTHROPIC_BASE_URL. Important: Claude Code speaks the Anthropic Messages API, not the OpenAI format — whatever sits behind ANTHROPIC_BASE_URL must accept the /v1/messages shape (Anthropic, an Anthropic-compatible proxy, or a small translator). It is not a drop-in for an OpenAI-only endpoint.
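The OpenCode bullet above, sketched as a small Python generator for opencode.json. It assumes the field layout described in OpenCode's config and providers docs (a top-level "model", plus a "provider" map whose entries carry "npm", "options.baseURL" and "models"); the `$schema` URL, display names and helper names are illustrative, so check the current docs before relying on the exact schema.

```python
import json
from pathlib import Path
from typing import Iterable, Optional


def opencode_config(default_model: str,
                    provider_id: Optional[str] = None,
                    base_url: Optional[str] = None,
                    model_ids: Iterable[str] = ()) -> dict:
    """Build an opencode.json dict.

    default_model is "providerID/modelID": the one string you edit to switch.
    provider_id/base_url/model_ids optionally add a custom OpenAI-compatible provider.
    """
    config = {
        "$schema": "https://opencode.ai/config.json",  # editor hint; assumed schema URL
        "model": default_model,
    }
    if provider_id and base_url:
        config["provider"] = {
            provider_id: {
                "npm": "@ai-sdk/openai-compatible",
                "name": provider_id,
                "options": {"baseURL": base_url},
                # Models exposed by this provider, selected as "provider_id/model_id".
                "models": {model_id: {"name": model_id} for model_id in model_ids},
            }
        }
    return config


def write_config(config: dict, path: str = "opencode.json") -> None:
    """Write the config as pretty-printed JSON."""
    Path(path).write_text(json.dumps(config, indent=2) + "\n")
```

`write_config(opencode_config("anthropic/claude-sonnet-4-5"))` gives you a hosted default; passing `"ollama", "http://localhost:11434/v1", ["qwen2.5-coder"]` as extra arguments adds a local OpenAI-compatible provider that you select as `ollama/qwen2.5-coder`. Either way, switching later is an edit to the single `model` string.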

Harness	Providers	Local-model support	How you switch model	Best for
OpenCode	75+ via models.dev + any OpenAI-compatible custom provider	Yes (Ollama via openai-compatible baseURL)	Edit `"model": "providerID/modelID"` in opencode.json (or `/models`)	Most flexibility; the cleanest multi-provider config
Aider	Anthropic, OpenAI, Google, OpenRouter, OpenAI-compatible, Ollama	Yes (`ollama/<model>` or openai-compat base)	`--model ...` flag or `model:` in .aider.conf.yml	Fast git-native pair-programming in the terminal
Claude Code	Anthropic + any Anthropic-Messages-compatible gateway	Only via a translator that speaks Messages API	`ANTHROPIC_BASE_URL` + key/token env vars	Staying on Claude while keeping the endpoint swappable

Model IDs like claude-sonnet-4-5 and gemini-2.5-pro here are examples — model strings change over time, so substitute your provider's current model name (Anthropic's models page, Google AI Studio, or models.dev for OpenCode). Pick ONE harness as your default and learn its config file cold. The point isn't which harness — it's that the model is a string you control, not a vendor binary you can't.

Move 3 (the safety net): add a local fallback, with exact commands

Not every call needs a frontier model. A local open-weight coder handles the high-volume, low-stakes work (boilerplate, renames, commit messages, first-draft edits) for free, and it keeps running if an API key gets throttled or a billing page moves. Ollama exposes an OpenAI-compatible endpoint, so any harness above can point at it.

Install Ollama, then pull a coder model: ollama pull qwen2.5-coder (the 7B is ~4.2 GB; use qwen2.5-coder:14b or :32b if you have 32 GB+ RAM and a GPU).
Start the server: ollama serve (keep it running in its own tab). The OpenAI-compatible API is at http://localhost:11434/v1.
Sanity-check it: curl http://localhost:11434/v1/models should list your pulled models.
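Point Aider at it: aider --model ollama/qwen2.5-coder (or set OPENAI_API_BASE=http://localhost:11434/v1 plus a placeholder OPENAI_API_KEY, which Ollama ignores but the client expects, and use --model openai/qwen2.5-coder).
Point OpenCode at it: add a provider block keyed "ollama" with "npm": "@ai-sdk/openai-compatible", "options": { "baseURL": "http://localhost:11434/v1" } and a "models" entry for "qwen2.5-coder", then set "model": "ollama/qwen2.5-coder" (the prefix is the provider key you chose).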
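For scripts that should check the local fallback before using it, here is a Python equivalent of the curl sanity check. It assumes Ollama's OpenAI-compatible /v1/models response has the usual OpenAI list shape ({"data": [{"id": ...}]}). Pulled models usually appear with a tag such as qwen2.5-coder:latest, which is why `has_model` accepts an untagged name. The function names are hypothetical.

```python
import json
import urllib.request
from typing import List


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [item["id"] for item in payload.get("data", [])]


def local_model_ids(base_url: str = "http://localhost:11434/v1", timeout: float = 2.0) -> List[str]:
    """Same check as `curl http://localhost:11434/v1/models`, returning the model IDs."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


def has_model(ids: List[str], name: str) -> bool:
    """True if `name` was pulled; an untagged name matches any tag (e.g. :latest)."""
    return any(
        model_id == name or (":" not in name and model_id.startswith(name + ":"))
        for model_id in ids
    )
```

If `local_model_ids()` raises a connection error, ollama serve isn't running; if `has_model(ids, "qwen2.5-coder")` is False, the pull didn't happen.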

Local models trail frontier models on hard reasoning and long-context tasks. Use them as the cheap lane, not the only lane — route the hard calls to a frontier API and keep local for volume.

Move 4 (the clean version): make 'the model' one config value via a gateway

Once you have keys, a harness and a fallback, the last step is to stop hard-wiring providers at all. Put one OpenAI-compatible gateway in front of every model. With the standard OpenAI SDK, a swap becomes a base_url + model-name change — not a new provider integration scattered across your code.

Self-host it: run a router (e.g. LiteLLM Proxy) that exposes one /v1/chat/completions endpoint and maps friendly model names to upstream providers. Your apps only ever see base_url=http://your-gateway/v1 and a model string.
Or use OpenRouter as the gateway (base_url=https://openrouter.ai/api/v1) if you just want reach without running infra.
Either way, 'switch the model' is now: change the model name (and base_url if you move gateways). Your application code doesn't change.
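A minimal sketch of the 'base_url + model name' pattern, written with Python's standard library instead of the OpenAI SDK so it has no dependencies. The environment variable names LLM_BASE_URL, LLM_MODEL and LLM_API_KEY are assumptions for illustration. The request is a standard OpenAI-style POST to {base_url}/chat/completions, the shape that OpenRouter, a LiteLLM Proxy and Ollama's /v1 endpoint accept.

```python
import json
import os
import urllib.request
from typing import Dict, List, Optional


def chat_request(messages: List[Dict[str, str]],
                 base_url: Optional[str] = None,
                 model: Optional[str] = None,
                 api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to an OpenAI-compatible /chat/completions endpoint.

    Moving gateways or models means changing LLM_BASE_URL / LLM_MODEL, not this code.
    """
    base_url = (base_url or os.environ.get("LLM_BASE_URL", "https://openrouter.ai/api/v1")).rstrip("/")
    model = model or os.environ["LLM_MODEL"]
    api_key = api_key or os.environ["LLM_API_KEY"]
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def chat(messages: List[Dict[str, str]], **kwargs) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(chat_request(messages, **kwargs), timeout=60) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With the OpenAI Python SDK the same idea is `OpenAI(base_url=..., api_key=...)` followed by `client.chat.completions.create(model=..., messages=...)`; in both cases, moving gateways means changing environment variables, not the calling code.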

Here's the gotcha with the DIY version: unless you configure fallbacks, swapping the model is still a manual move. When a model gets deprecated, rate-limited, or goes down mid-run, your pipeline stalls until a human edits the config. Routers such as LiteLLM can fail over to a fallback list, but you write and maintain that list yourself. The managed version closes that gap. Knotie's AI Gateway is OpenAI-compatible: keep the OpenAI SDK, swap base_url to https://api.knotie.ai, then switch the model by name. It can also pick an "auto" model and automatically fail over to another model when one fails or dies, with no code change on your side. You keep calling the same endpoint and key, so a model dying becomes something handled for you, not an outage you wake up to. On top of that, each virtual key can be restricted to specific models, with metered credit billing and a profit markup, so you can bill model usage per customer under your own brand. For a solo stack a self-hosted router does the job, but failover is the piece you'd otherwise have to build, configure and babysit yourself.
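If you self-host and want that failover piece without a managed gateway, its core is small: try an ordered list of model names and move on when a call raises. A minimal Python sketch, assuming you already have some `call(model, messages)` function that talks to your gateway and raises on errors. The names `with_failover` and `AllModelsFailed` are hypothetical, and a production version would also separate retryable errors (rate limits, timeouts) from permanent ones (a malformed request).

```python
import time
from typing import Callable, List, Sequence, Tuple


class AllModelsFailed(RuntimeError):
    """Raised when every model in the list failed."""


def with_failover(call: Callable[[str, list], str],
                  models: Sequence[str],
                  messages: list,
                  retries_per_model: int = 1,
                  backoff_s: float = 0.0) -> Tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first success."""
    errors: List[str] = []
    for model in models:
        for attempt in range(retries_per_model):
            try:
                return model, call(model, messages)
            except Exception as exc:  # deprecated model, rate limit, outage, timeout...
                errors.append(f"{model} (attempt {attempt + 1}): {exc}")
                if backoff_s:
                    time.sleep(backoff_s * (attempt + 1))  # simple linear backoff
    raise AllModelsFailed("; ".join(errors))
```

Logging which model actually answered (the first element of the returned tuple) matters: a silent failover to a weaker model is still a behaviour change.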

Hard deadline, June 18: if you were using `gemini`, your three real options

The video promised the Gemini CLI migration; here it is, concretely. The open-source gemini binary stops serving free, Pro and Ultra users on June 18, 2026, so anything that calls it from those accounts breaks. Pick one:

Stay in Google's world (enterprise): move to the closed-source Antigravity CLI (agy). Org access via a Gemini Code Assist Standard/Enterprise licence is unchanged. Google states the replacement supports its agent features — check the official Antigravity docs for parity details before you assume a one-to-one mapping. This is the path if you're committed to Gemini and have an enterprise licence.
Keep Gemini, drop the CLI: create a paid Gemini API key in Google AI Studio (aistudio.google.com/apikey) and call it from a provider-agnostic harness (Aider --model gemini/gemini-2.5-pro, or OpenCode with the Google provider). Same model, no dependence on a binary Google can retire.
Swap the model entirely: if gemini was just 'the AI in my script', replace it with any harness above on your own keys. Now the provider is a config value and this whole episode can't repeat on you.
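One way to make the third option stick is to stop calling any AI CLI by name in CI. A minimal Python sketch, assuming a hypothetical AI_CLI environment variable that holds a command template with a {prompt} placeholder: scripts call the wrapper, and swapping gemini for Aider (or anything else) becomes a config change. The default template uses Aider's --message flag, which sends one message and exits; check your chosen harness's docs for its non-interactive flags.

```python
import os
import shlex
import subprocess
from typing import List, Optional

DEFAULT_TEMPLATE = "aider --message {prompt}"  # placeholder default; override with AI_CLI


def build_command(prompt: str, template: Optional[str] = None) -> List[str]:
    """Turn a template like 'aider --message {prompt}' into an argv list."""
    template = template or os.environ.get("AI_CLI", DEFAULT_TEMPLATE)
    argv = shlex.split(template)
    if "{prompt}" not in argv:
        return argv + [prompt]  # no placeholder: pass the prompt as the last argument
    # Substitute after splitting so spaces or quotes in the prompt stay one argument.
    return [prompt if part == "{prompt}" else part for part in argv]


def run_ai(prompt: str, template: Optional[str] = None, timeout: int = 600) -> str:
    """Run the configured CLI non-interactively; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_command(prompt, template),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout
```

Because the prompt is substituted after `shlex.split`, it is passed as a single argument and never interpreted by a shell.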

Do this BEFORE June 18, not on the 18th. CI breakages discovered at the deadline are the expensive kind.

Beyond the video: swap-safety gotchas nobody warns you about

A swap is rarely 'change one string and ship'. Models are not interchangeable parts; they differ in ways that silently break behaviour. Check these before you cut over.

Context window differs. A prompt that fit one model can overflow another. If your harness packs large repo context, a smaller window truncates silently — answers get worse with no error.
Tool / function-calling formats differ. OpenAI-style tools/tool_calls, Anthropic tool_use blocks, and Gemini function-calling are not identical. An OpenAI-compatible gateway normalises the wire format, but quirks (parallel calls, strict JSON schema, forced tool use) still vary by upstream model.
System-prompt sensitivity. The same system prompt can produce different obedience and formatting across models. A prompt tuned for one model often needs a tweak for the next.
Cost deltas are large. Frontier vs mid vs local can differ by 10×+ per token. After a swap, watch token spend for a day before trusting it in automation.
A/B test the swap, don't trust the vibe. Run the same eval set or real task through both models, diff the outputs, and compare cost and latency. Promote the swap only when the new model holds up on YOUR tasks — not on a benchmark.
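Before cutting over, a rough pre-flight check for the context-window and cost gotchas. This Python sketch assumes the common rule of thumb of about 4 characters per token (use the provider's tokenizer or token-count endpoint for real decisions); `ModelSpec` and the helper names are hypothetical, and any window sizes or prices you plug in should come from your provider's current pages.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    name: str
    context_tokens: int       # advertised context window
    usd_per_m_input: float    # price per 1M input tokens
    usd_per_m_output: float   # price per 1M output tokens


def rough_tokens(text: str) -> int:
    """Very rough estimate at ~4 characters per token."""
    return max(1, len(text) // 4)


def fits(spec: ModelSpec, prompt: str, max_output_tokens: int) -> bool:
    """Prompt plus reserved output must fit the window, or expect silent truncation."""
    return rough_tokens(prompt) + max_output_tokens <= spec.context_tokens


def cost_usd(spec: ModelSpec, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the spec's per-million-token prices."""
    return (input_tokens * spec.usd_per_m_input
            + output_tokens * spec.usd_per_m_output) / 1_000_000
```

For example, with a made-up frontier spec of 200,000 tokens at $3/$15 per million and a made-up local spec of 32,000 tokens at $0, a 400,000-character prompt (about 100,000 tokens) plus 8,000 reserved output tokens fits the first but not the second, and 1M input plus 100k output tokens costs $4.50 on the first.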

Cheap A/B harness: keep a folder of 10–20 representative tasks (a bug fix, a refactor, a doc-gen, a tricky tool call). Run the old and new model over all of them, eyeball the diffs. Twenty minutes saves a bad cutover.
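The cheap A/B harness, sketched in Python under a few assumptions: each task is a .txt file holding one prompt, and `old`/`new` are any functions that send a prompt to a model and return text (for example, thin wrappers around your gateway). It records per-model latency and a unified diff per task; judging which output is better is still your call.

```python
import difflib
import time
from pathlib import Path
from typing import Callable, Dict, List


def load_tasks(folder: str) -> Dict[str, str]:
    """One task per .txt file: file stem -> prompt, in sorted order."""
    return {p.stem: p.read_text() for p in sorted(Path(folder).glob("*.txt"))}


def ab_compare(tasks: Dict[str, str],
               old: Callable[[str], str],
               new: Callable[[str], str]) -> List[dict]:
    """Run every task through both models; record latency and a unified diff."""
    report = []
    for name, prompt in tasks.items():
        t0 = time.perf_counter()
        a = old(prompt)
        t1 = time.perf_counter()
        b = new(prompt)
        t2 = time.perf_counter()
        diff = "\n".join(difflib.unified_diff(
            a.splitlines(), b.splitlines(), "old", "new", lineterm=""))
        report.append({
            "task": name,
            "same": a == b,
            "old_s": round(t1 - t0, 3),
            "new_s": round(t2 - t1, 3),
            "diff": diff,
        })
    return report
```

Skim the rows where `same` is False first, then compare `old_s` and `new_s` alongside the cost check before promoting the swap.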

The deliverable: your 5-point lock-in audit

Run this now. Each 'yes' is a single point of failure to fix this week.

Does any CI job, cron, or product call gemini (or one specific CLI binary) directly? → wrap it so the backend is swappable.
Are you running automated agents on a FLAT-RATE subscription instead of API keys? → move to metered keys you own.
If your main provider doubled prices tomorrow, what breaks — and how fast could you actually switch?
Do you have a local/open-weight fallback for non-critical, high-volume work?
Is 'the model' a config value (string + base_url) in your stack, or is it hard-wired into the code?

Do this week: the pre-June-18 checklist

A tight, do-it-now list before the Gemini CLI cutoff.

Grep your repos and crontab for gemini (and gemini-cli). List every hit.
Create the API keys you'll need (at minimum OpenRouter, plus the lab keys you actually use).
Stand up one harness (OpenCode or Aider) on those keys and reproduce your most important gemini task through it.
Pull one local model (ollama pull qwen2.5-coder) and wire it as the cheap fallback.
Pick your gateway approach (self-hosted router, OpenRouter, or a managed gateway) and set base_url once so future swaps are one line.
Re-run your CI on the new path before the 18th. Delete the gemini dependency once green.
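For the first checklist item, a minimal Python scanner that lists every line mentioning gemini or gemini-cli under a repo root, skipping .git, virtualenvs and node_modules. It is a sketch: the regex is deliberately broad (it also catches model strings like gemini-2.5-pro), which is what you want for an inventory. For the crontab, `crontab -l | grep -n gemini` does the same job.

```python
import re
from pathlib import Path
from typing import List, Tuple

PATTERN = re.compile(r"\bgemini(?:-cli)?\b")
SKIP_DIRS = {".git", "node_modules", ".venv", "venv", "__pycache__"}


def find_gemini_calls(root: str) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for every line mentioning gemini / gemini-cli."""
    root_path = Path(root)
    hits = []
    for path in sorted(root_path.rglob("*")):
        rel_parts = path.relative_to(root_path).parts
        if not path.is_file() or SKIP_DIRS.intersection(rel_parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Printing `find_gemini_calls(".")` from each repo root gives you the hit list for the checklist; it is case-sensitive, so variables like GEMINI_API_KEY are not flagged.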

Frequently asked questions

What exactly happens to Gemini CLI on June 18, 2026?

Per Google's developer blog, the open-source Gemini CLI and Gemini Code Assist IDE extensions stop serving requests for free, Pro, and Ultra users on June 18, 2026. Organizations with Gemini Code Assist Standard/Enterprise licences, or paid Gemini API keys, keep access. The replacement is the closed-source, Go-based Antigravity CLI (binary agy), announced at Google I/O on May 19, 2026 — about a 30-day migration window.

I just want the model to be swappable. Which harness should I pick?

If you want the cleanest multi-provider config, use OpenCode — set "model": "providerID/modelID" in opencode.json and switch by editing that string (75+ providers via models.dev). If you live in the terminal and want git-native pair-programming, use Aider and set the model with --model or in .aider.conf.yml. If you're committed to Claude, use Claude Code and point ANTHROPIC_BASE_URL at a gateway. Pick one and learn its config file.

Can I point Claude Code at an OpenAI-compatible gateway like OpenRouter directly?

Not directly. ANTHROPIC_BASE_URL only changes the hostname — Claude Code still sends the Anthropic Messages API (/v1/messages) shape. Whatever sits behind it must accept that shape (Anthropic, an Anthropic-compatible proxy, or a translator). For OpenAI-compatible endpoints, use OpenCode or Aider instead, which speak the OpenAI format natively.

What's the fastest way to add a free local fallback?

Install Ollama, run ollama pull qwen2.5-coder then ollama serve. It exposes an OpenAI-compatible API at http://localhost:11434/v1 (verify with curl http://localhost:11434/v1/models). Point Aider at it with aider --model ollama/qwen2.5-coder, or add it to OpenCode as an @ai-sdk/openai-compatible provider with that baseURL.

How do I make 'switch the model' a one-line change?

Put one OpenAI-compatible gateway in front of every provider and have your apps only ever set base_url + a model name. Self-host a router (e.g. LiteLLM Proxy), use OpenRouter (https://openrouter.ai/api/v1), or a managed gateway. After that, swapping a model is a config edit, not a code change scattered across your integration.

What if a model gets deprecated or goes down mid-run — do I still have to swap by hand?

With a DIY router, mostly yes: the model name is a config value, but unless you configure and maintain fallbacks (LiteLLM, for example, supports a fallback list), changing it is a manual edit, so a model that dies mid-pipeline stalls you until someone updates it. A managed gateway can remove that manual step. Knotie's AI Gateway, for example, can pick an 'auto' model and automatically fail over to another model when one fails or dies, with no code change on your side; you keep calling the same endpoint and key. That turns 'a model died' from an outage into something handled for you. If you self-host, plan to build and maintain that failover logic yourself.

What breaks when I swap one model for another?

Four things, usually: context window (a prompt that fit may now overflow and truncate silently), tool/function-calling format (OpenAI tool_calls vs Anthropic tool_use vs Gemini function-calling differ in the details), system-prompt sensitivity (the same prompt obeys differently), and cost (frontier vs local can differ 10×+). A/B test the swap on your own task set before trusting it in automation.

Does this lock-in risk apply to voice/chat AI agents I deploy for clients?

Yes, the same dynamic. If your voice or chat agents are wired to one provider, one pricing change can break client deployments. The hedge is the same: a multi-provider design where the model is a swappable config value, with metering so you can bill usage per client. That's the architecture a platform like Knotie is built around.

Sources

Google Developers Blog: Transitioning Gemini CLI to Antigravity CLI
The Register: Bye-bye Gemini CLI
TechCrunch: Anthropic says Claude Code subscribers will need to pay extra for OpenClaw
VentureBeat: Anthropic reinstates third-party agent usage, with a catch
OpenCode: Config (opencode.json: model, provider)
OpenCode: Providers (custom OpenAI-compatible, baseURL)
OpenCode: Models (75+ providers via models.dev)
Aider: OpenRouter (--model openrouter/...)
Aider: YAML config file (.aider.conf.yml: model, weak-model, editor-model)
Aider: OpenAI-compatible APIs (OPENAI_API_BASE, --model openai/...)
Ollama: qwen2.5-coder model library
Ollama: OpenAI-compatible API (localhost:11434/v1)
Claude Code: settings & environment variables (ANTHROPIC_API_KEY, ANTHROPIC_BASE_URL)
Anthropic: Models overview (current model IDs)
Google Antigravity: official docs
OpenRouter: Quickstart (OpenAI-compatible base_url https://openrouter.ai/api/v1)
OpenRouter: Create API key
Knotie: AI Gateway (OpenAI-compatible, base_url api.knotie.ai, per-key model restriction, metered billing)
