Know what Gemini is actually costing you
Google bills Gemini by model and token across Vertex AI and the Gemini API. Here is how the billing works — and how to keep a running total, an exact-cost option, and a forecast for Gemini in the same view as every other provider you run.
Google Cloud's billing console is genuinely capable: it breaks your Gemini spend down by service and SKU, groups it by project, exports every line to BigQuery, and even forecasts your end-of-month cost. The catch is that all of it stops at the edge of Google Cloud. It can't tell you what Gemini costs next to the Anthropic, OpenAI, or hosting bills you run alongside it — and it only surfaces any of this once you go into the console and build the report. For a solo developer or a small team, the numbers are accurate but siloed in Google's own tooling: one provider's slice of a bill you still have to assemble by hand.
This guide covers how Google bills for Gemini, where your usage data actually lives and the two ways CostCompass can price it, why a running total is harder to keep than it looks, and how to forecast next month before the invoice arrives.

How does Gemini API billing work?
Gemini spend has two sides, and a complete picture needs both. The first is metered usage — pay-as-you-go API calls, priced by the token and varying by model. The second is a flat-rate subscription — a Google AI plan, or a Gemini-in-Workspace add-on for a team. Most developers building on the API live mostly in the first, but the two are billed in completely different places, which is part of why a single running total is harder to keep than it looks.
Start with metered usage. Google charges for the API by the token, and the rate depends entirely on which model handled the request. Every call meters the input tokens you send — your prompt, system instructions, and any context — and the output tokens the model generates back. Output is the more expensive side, several times the input rate, so a long completion costs more than the prompt size alone suggests.
Two more token categories sit alongside those, and tracking them separately matters because each is priced differently. Cached input — a large, stable prefix Google can reuse across calls — bills at a fraction of the normal input rate. And on the reasoning models, a block of hidden thinking tokens is generated before the answer and billed at the output rate, even though you never see it.
| Token category | How it’s priced |
|---|---|
| Input | Base rate for the prompt, system instructions, and context |
| Output | Highest rate — several times the input rate |
| Cached input | A fraction of the input rate when a later call reuses the prefix |
| Thinking | Hidden reasoning tokens, billed at the output rate |
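
To see how the four categories combine into one charge, here is a minimal sketch that prices a single call. The rates are placeholders chosen for easy arithmetic, not Google's prices, and the `Rates` and `call_cost` names are hypothetical. It also assumes the input count excludes cached tokens and the output count excludes thinking tokens, which you should check against how your usage source reports them.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Rates:
    """USD per one million tokens. Placeholder values, not Google's prices."""
    input: float
    output: float
    cached_input: float


def call_cost(rates, input_tokens, output_tokens, cached_tokens=0, thinking_tokens=0):
    """Price one call; thinking tokens are charged at the output rate."""
    return (
        input_tokens * rates.input
        + cached_tokens * rates.cached_input
        + (output_tokens + thinking_tokens) * rates.output
    ) / PER_MILLION


example_rates = Rates(input=1.00, output=4.00, cached_input=0.25)
visible_only = call_cost(example_rates, input_tokens=10_000, output_tokens=1_000)
with_thinking = call_cost(example_rates, 10_000, 1_000, thinking_tokens=4_000)
```

With these placeholder rates, 4,000 hidden thinking tokens more than double the cost of the same call, from $0.014 to $0.03, even though the visible prompt and answer are unchanged.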
The lever that moves your bill the most is model choice: per-token prices across the lineup span more than an order of magnitude. Gemini 3.1 Pro is the flagship, aimed at the hardest reasoning, agentic, and coding work; Gemini 3.5 Flash is the everyday workhorse, landing near Pro-level quality at Flash pricing; and Gemini 3.1 Flash-Lite is the cheapest tier for high-volume, simpler calls. Matching the model to the job is the single biggest cut available.
| Model | Tier | Best suited for |
|---|---|---|
| Gemini 3.1 Pro | Most capable | Hard reasoning, agentic, and coding work |
| Gemini 3.5 Flash | Mid-range | Everyday coding, drafting, and high-volume agents |
| Gemini 3.1 Flash-Lite | Cheapest | High-volume, latency-sensitive, simpler tasks |
There’s also a wrinkle in where Gemini is billed. The same models are sold through two front doors: Vertex AI, Google Cloud’s enterprise surface, and the Gemini API from AI Studio, billed under the name Generative Language API. Which one you use depends on how you set up, and a project can use both. That is the first reason Gemini spend resists a quick glance: the same model’s usage can land on two different meters. (Exact per-model rates live on Google’s Vertex AI pricing page and the Gemini API pricing page, and they change over time, so treat any number you’ve memorized as provisional.)
It’s also worth knowing that Gemini bills as part of your wider Google Cloud account. On Google’s own invoice, its tokens sit on the same bill as everything else you run there — Compute Engine CPU hours, Cloud Storage, networking egress, and the rest — which is part of why picking your Gemini spend out of the total takes some digging. This guide stays on the Gemini slice: CostCompass reads only the Vertex AI and Gemini API line items, so the figure you see is your Gemini spend on its own, not your whole Google Cloud bill.
That leaves the part that isn’t metered at all. A Google AI subscription or a Gemini Workspace add-on is a fixed monthly charge that lives entirely outside the API billing system. No usage endpoint returns it, so it has to be accounted for by hand rather than pulled automatically.
Why is Gemini spend hard to track?
The billing is mechanical; staying ahead of it across a real project is not. Four things get in the way:
- It’s split across two surfaces. The same models bill as two separate services — Vertex AI and the Gemini API (the Generative Language API) — and the Gemini API keeps its prepay balance and history in the AI Studio billing tab, apart from the Cloud console. Seeing your true Gemini total means stitching those views together.
- Projects multiply. As soon as you have a staging project, a production project, and a personal experiment, Gemini spend is scattered across separate Google Cloud projects — each with its own slice.
- Thinking tokens are invisible. A reasoning model can burn far more output than its prompts imply, and you don’t see the thinking tokens that drove the charge — only the total, after the fact.
- Gemini is rarely your only provider. If you also call Anthropic for Claude, build on OpenAI, run inference on RunPod, or serve a site through Vercel and Cloudflare, the Gemini figure is one line in a bill you have to assemble by hand from a half-dozen dashboards.
What’s the difference between estimated and exact Gemini cost?
Most providers only let you estimate API cost from token counts. Gemini is the exception — with one piece of setup, CostCompass can show Google’s own billed number — so it’s worth understanding both.
The moment you connect, CostCompass reads your Gemini token usage from Google Cloud Monitoring and computes the cost itself, multiplying those token counts by per-model rates from a stored pricing table and saving the rate it used alongside each day. That’s an estimate — accurate, reproducible, and available immediately with nothing to configure. A past month’s cost stays put even after Google changes its prices, because the rate that applied then was saved with it. In the product, that figure is marked Estimated.
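The idea of saving the rate alongside each day can be sketched as follows. This illustrates the general technique rather than CostCompass's actual code: the pricing table, the model name, and the record fields are all hypothetical, and only output tokens are priced to keep the example short.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyEstimate:
    day: date
    model: str
    output_tokens: int
    rate_per_million: float  # the rate in force when this day was priced
    cost_usd: float


def price_day(day, model, output_tokens, pricing_table):
    """Price one day's usage and keep the rate that was applied."""
    rate = pricing_table[model]
    cost = output_tokens * rate / 1_000_000
    return DailyEstimate(day, model, output_tokens, rate, cost)


# Hypothetical pricing table: USD per million output tokens.
pricing = {"example-model": 4.00}
march_1 = price_day(date(2025, 3, 1), "example-model", 2_000_000, pricing)

# A later price change affects only the days priced after it.
pricing["example-model"] = 2.00
march_2 = price_day(date(2025, 3, 2), "example-model", 2_000_000, pricing)
```

Because each record carries its own rate, the stored figure for the first day stays at its original value after the table changes, and you can always see which rate produced it.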

Enable Google’s BigQuery billing export once, a one-time setup in the Cloud console, and CostCompass switches to reading Google’s actual billed cost: the authoritative number from your invoice, discounts and all. The spend figure upgrades from estimated to exact, and the Estimated tag drops away. CostCompass points you to the setup with a short walkthrough when you’re ready; until then the estimate keeps you covered.

The trade-off is simple: the estimate is instant and reflects usage the moment Cloud Monitoring reports it, while the exact number needs that one setup and lands about a day behind. Google’s billing export is usually ready within 24 hours and keeps settling after that, since late-reported usage and credits can still adjust it until the month closes. So the exact figure is the authoritative total, just not the most up-to-the-minute one. The estimate stays your live read on today’s spend; the exact number is what Google will actually bill.
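If you keep both figures yourself, one way to combine them is to prefer the billed number for any day that has one and fall back to the estimate for the days the export has not reached yet. The sketch below assumes a per-day blend, which is an illustration and not a description of how CostCompass merges the two; the function name and the sample figures are hypothetical.

```python
from datetime import date


def best_available(estimated, exact):
    """Return {day: (cost_usd, label)}, preferring the billed figure per day."""
    merged = {}
    for day in sorted(set(estimated) | set(exact)):
        if day in exact:
            merged[day] = (exact[day], "exact")
        else:
            merged[day] = (estimated[day], "estimated")
    return merged


# Hypothetical daily costs in USD; the export lags the estimate by about a day.
estimated = {date(2025, 3, 1): 8.10, date(2025, 3, 2): 7.40, date(2025, 3, 3): 9.00}
exact = {date(2025, 3, 1): 7.95, date(2025, 3, 2): 7.40}

merged = best_available(estimated, exact)
month_to_date = sum(cost for cost, _label in merged.values())
```

Keeping the label next to each day makes it clear which part of a month-to-date total is settled and which part may still move.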
How can you reduce your Gemini API bill?
Tracking is half the job; acting on what you see is the other half. A few levers move a Gemini bill the most:
- Right-size the model. Send high-volume, simpler calls to Flash-Lite or Flash and reserve 3.1 Pro for work that genuinely needs it. With more than an order of magnitude between the cheapest and priciest model per token, one routing change can cut a workload’s cost sharply.
- Lean on cached input. Keep large, stable prefixes — system prompts, reference documents — consistent so the cached-input rate applies instead of full input on every call.
- Dial thinking down. Reasoning tokens bill at the output rate, so a lower thinking budget on tasks that don’t need deep reasoning trims the invisible output charge.
- Batch the non-urgent work. Move evals, backfills, and offline jobs off the interactive path so a spike there doesn’t blur your live spend.
- Watch the trend, not just the total. A per-day, per-model view turns a creeping spike into something you catch in days rather than discover on the invoice.
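
Watching the trend can be as simple as comparing each day against the average of the days before it. Here is a minimal sketch under assumed settings: the seven-day window, the 1.5x threshold, and the `flag_spikes` name are illustrative choices, not values taken from CostCompass or Google.

```python
from statistics import mean


def flag_spikes(daily_costs, window=7, threshold=1.5):
    """Return indexes of days costing more than threshold x the prior-window mean."""
    flagged = []
    for i in range(window, len(daily_costs)):
        baseline = mean(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > threshold * baseline:
            flagged.append(i)
    return flagged


# Hypothetical daily spend for one model, in USD.
costs = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.1, 3.5, 2.0]
spikes = flag_spikes(costs)
```

Running this per model rather than on the grand total helps, because a jump on one model can be hidden when the others are flat.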
How do you forecast next month’s Gemini bill?
Forecasting doesn’t take anything exotic. The dependable method is a run rate: take your spend over the last several days, turn it into a daily average, and project it across a full month.

CostCompass uses exactly this method. It takes your trailing seven-day burn rate, multiplies it by the number of days in next month, and adds next month’s fixed subscriptions. The result is a single forward number: at this rate, here’s what next month will cost. Seven days is short enough to react to a recent change — a new feature shipping, a model swap — but long enough to smooth the gap between a quiet weekend and a busy Tuesday.
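The arithmetic described above fits in a few lines. This is a sketch of the run-rate method as the text states it, with hypothetical names and sample figures; it is not CostCompass's code.

```python
import calendar
from datetime import date


def forecast_next_month(last_seven_days, today, fixed_subscriptions=0.0):
    """Project next month's cost from a trailing seven-day daily average."""
    if len(last_seven_days) != 7:
        raise ValueError("expected exactly seven daily totals")
    daily_rate = sum(last_seven_days) / 7
    # Roll over to January of the following year when today is in December.
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    days_next_month = calendar.monthrange(year, month)[1]
    return daily_rate * days_next_month + fixed_subscriptions


trailing = [3.0, 5.0, 4.0, 4.0, 6.0, 2.0, 4.0]  # hypothetical daily spend in USD
forecast = forecast_next_month(trailing, date(2025, 3, 20), fixed_subscriptions=20.0)
```

Here the seven days average $4.00, April has 30 days, and the $20 subscription is added on top, giving a forecast of $140.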
How do you track Gemini API costs automatically?
CostCompass connects to Google Cloud and reads your Gemini usage directly — grouped by model, one day at a time, across both Vertex AI and the Gemini API. It separates the token categories Google reports — input, output, cached input, and the hidden thinking tokens — so the per-model picture reflects how you’re actually using the API.
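A per-day, per-model rollup across both surfaces might look like the sketch below. The record layout, model names, and costs are hypothetical; the two surface names are the ones used earlier in this guide.

```python
from collections import defaultdict
from datetime import date


def roll_up(records):
    """Sum cost per (day, model), merging both billing surfaces."""
    totals = defaultdict(float)
    for day, _surface, model, cost_usd in records:
        totals[(day, model)] += cost_usd
    return dict(totals)


# Hypothetical usage rows: (day, surface, model, cost in USD).
records = [
    (date(2025, 3, 1), "Vertex AI", "model-a", 1.25),
    (date(2025, 3, 1), "Generative Language API", "model-a", 0.75),
    (date(2025, 3, 1), "Vertex AI", "model-b", 0.50),
]
by_day_and_model = roll_up(records)
```

The surface is deliberately dropped from the grouping key, so the same model's usage on two meters lands as one figure.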

Two things make it practical for a single developer. First, you connect with Google’s own sign-in, and what comes back is encrypted in your browser before anything is stored. You approve CostCompass on Google’s consent screen, and the access it hands back is locked in your browser with your vault password. We do keep a copy — but only the locked version, which we have no way to open, because your vault password stays in your browser. To read your usage, your browser unlocks that access and passes it through a relay that uses it for that one request and is built not to log or keep it. So the only Google data we hold at rest is a locked blob we can’t open — no usable key to your account sits in our database or our logs. One sign-in covers every Google Cloud project you choose to enable.

Second, Gemini doesn’t sit alone. The same dashboard rolls it up alongside your other connected AI and compute providers — Claude and OpenAI included — into one month-to-date total, one forecast, and one breakdown.
For the part no API exposes — a Google AI or Gemini Workspace subscription — you enter the monthly cost once. CostCompass prorates it across the month, so both your month-to-date total and the forecast reflect the subscription alongside the metered usage rather than missing it entirely.
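As a rough illustration of prorating, the sketch below spreads a flat monthly charge evenly across the calendar days and counts today as elapsed. Both choices are assumptions made for the example, and the function name is hypothetical; the text does not specify how CostCompass divides the month.

```python
import calendar
from datetime import date


def prorated_to_date(monthly_cost, today):
    """Share of a flat monthly charge attributed to the days elapsed so far."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return monthly_cost * today.day / days_in_month


# Hypothetical $30 subscription, ten days into a 30-day month.
subscription_so_far = prorated_to_date(30.0, date(2025, 4, 10))
```

Ten days into April, a third of the $30 charge counts toward the month-to-date total, and the full amount is reached on the last day of the month.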
Getting started takes three steps:
- Click Connect Google Cloud and authorize CostCompass in Google’s consent screen, then pick which projects to track.
- Confirm the month-to-date figure matches what you expect — it arrives as an accurate estimate straight away.
- When you want the exact billed number, enable Google’s BigQuery billing export once and the figure upgrades from estimated to exact. From then on your Gemini usage refreshes on its own — the running total and the forecast stay current without your touching them.
Frequently asked questions
- Does CostCompass store my Google Cloud credentials?
- Not in any form we can read. You connect through Google's own sign-in and approve CostCompass on Google's consent screen — there's no API key to paste. The access Google grants is locked in your browser with your vault password before anything is stored. We do keep a copy, but only the locked version we have no way to open, because your vault password stays in your browser. To read your usage, your browser unlocks that access and passes it through a relay that uses it for that one request and is built not to log or keep it. So what we hold at rest is a locked blob we can't open — no usable Google credential ends up in our database or logs.
- What's the difference between estimated and exact cost — and how do I get exact?
- On connect, CostCompass reads your Gemini token usage from Google Cloud Monitoring and computes the cost itself from published per-model rates, saving the rate it used alongside each day. That's an accurate estimate, and it's available immediately with nothing to set up. To get exact, enable Google's BigQuery billing export, a one-time setup in the Google Cloud console. CostCompass then switches to reading Google's own billed cost, the authoritative number from your invoice, and the spend figure upgrades from estimated to exact. One caveat: Google's billed data lands about a day behind and keeps settling until the month closes, so the estimate stays your most current read while the exact number is the authoritative total once Google catches up.
- Does it cover both Vertex AI and the Gemini API?
- Yes. Gemini bills through two front doors — Vertex AI (the enterprise surface) and the Gemini API (the AI Studio surface, billed as the Generative Language API) — and CostCompass reads both, so your total reflects all of your Gemini usage rather than whichever console you happened to open.
- Does CostCompass break Gemini spend down by model?
- Yes. It reads usage one day at a time, grouped by model, and separates the token categories Google reports — input, output, cached input, and the hidden thinking tokens reasoning models generate — so the per-model picture reflects how you actually use the API. New models appear on their own as soon as Google reports usage for them.
- Can it track Gemini across more than one Google Cloud project?
- Yes. One Google sign-in covers every project you choose to enable, and CostCompass rolls them into a single Gemini total — so spend spread across a staging project, a production project, and a personal experiment lands as one number instead of three consoles.
- How does the next-month forecast work?
- CostCompass takes your trailing seven-day burn rate, multiplies it by the number of days in next month, and adds next month's fixed subscriptions. That projects what next month will cost if your current pace holds — across every provider you've connected, not just Gemini, so the forecast covers your whole AI bill in one number.
- Why use CostCompass instead of Google's Cloud Billing console?
- Google's console is good at Google — it breaks spend down by service and SKU and even forecasts your end-of-month cost, but only for Google Cloud, and only once you go in and build the report. CostCompass rolls Gemini up with every other provider you run — Anthropic, OpenAI, your hosting bills — into one month-to-date total, one forecast, and one breakdown, refreshed on a schedule and waiting the moment you open it. You get Gemini's exact billed cost without living in BigQuery, and your whole AI stack as a single number instead of a console per vendor.
About the author
Joubert Berger builds CostCompass, a spend-intelligence dashboard that pulls usage from AI and compute providers into one month-to-date total, a forecast, and a per-provider breakdown. This guide reflects how CostCompass reads each provider's own usage API — see the security model for how your keys are handled.
Stop guessing at the Gemini bill
Connect Google Cloud once and watch your Gemini spend — month-to-date, forecast, and per-model breakdown — keep itself current.