
Know what OpenAI is actually costing you

OpenAI bills the API by model and by token, across a GPT-5 lineup that runs from cheap nano models to expensive reasoning runs. Here is how the billing works — and how to keep a running total and a forecast in front of you instead of waiting for the invoice.

By Joubert Berger Published June 3, 2026

OpenAI's invoice gives you one number: what the API cost last month. It doesn't tell you whether a single reasoning-heavy feature is quietly responsible for half of it, whether you're on track to spend more this month than last, or how it stacks up against everything else you run. For a solo developer or a small team building on the API, that one figure lands after the month is over — too late to change anything about it.

This guide covers how OpenAI bills for the API, where your usage data actually lives and what it takes to read it, why a running total is harder to keep than it looks, and how to forecast next month before the invoice arrives.

An antique almanac engraving: a graduated row of balance-weights ascending from tiny to large, the largest cut away to reveal a copper inner core, with a single key threaded through their ring-tops.
One lineup from nano to reasoning — and the largest cost is hidden inside.

How does OpenAI bill for usage?

OpenAI spend has two sides, and a complete picture needs both. The first is metered API usage: pay-as-you-go calls, priced by the token and varying by model. The second is a flat-rate subscription: ChatGPT Plus and Pro for individuals, Business and Enterprise seats for organizations. Developers building on the platform spend mostly on the first, but the two are billed in completely different places, which is part of why a single running total is hard to keep.

Start with metered usage. OpenAI charges for the API by the token, and the rate depends entirely on which model handled the request. Every call has two metered parts: the input tokens you send — your prompt, system instructions, tool definitions, and any conversation history — and the output tokens the model generates back. Output is the more expensive side, usually several times the input rate, so a chatty model or a long completion costs more than the prompt size alone suggests.
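The two-part meter described above can be sketched in a few lines. This is a minimal illustration, not OpenAI's billing code: the `Rate` class, the `call_cost` function, and the rate values are hypothetical placeholders, and rates are assumed to be quoted in USD per million tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """USD per one million tokens. The values used below are placeholders."""
    input_per_mtok: float
    output_per_mtok: float


def call_cost(input_tokens: int, output_tokens: int, rate: Rate) -> float:
    """Cost of one call: each side of the meter is priced at its own rate."""
    total = input_tokens * rate.input_per_mtok + output_tokens * rate.output_per_mtok
    return total / 1_000_000


# Placeholder rate, not a real price: output is 8x the input rate here.
example_rate = Rate(input_per_mtok=1.0, output_per_mtok=8.0)

# A 2,000-token prompt with a 500-token completion.
cost = call_cost(2_000, 500, example_rate)
input_part = call_cost(2_000, 0, example_rate)
output_part = call_cost(0, 500, example_rate)
print(f"total ${cost:.4f}, input ${input_part:.4f}, output ${output_part:.4f}")
```

With these placeholder rates the 500 output tokens cost twice as much as the 2,000 input tokens, which is the effect the paragraph describes.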

The lever that moves your bill the most is model choice, and the modern lineup spreads across more than price — it spans different units of billing. The flagship GPT-5 family handles text and reasoning, billed per token across a range that itself covers more than two orders of magnitude: a gpt-5.4-nano call costs a tiny fraction of a gpt-5.5-pro one. Around it sit models for other modalities, each metered its own way.

| What you’re generating | Example models | How OpenAI bills it |
| --- | --- | --- |
| Text & reasoning | gpt-5.5, gpt-5.4, gpt-5.4-mini/nano | Per input / output token |
| Realtime & voice | gpt-realtime-2 | Per token (audio far above text) or per minute |
| Images | gpt-image-2, gpt-image-1-mini | Per generated image, plus prompt tokens |
| Video | sora-2, sora-2-pro | Per second of generated video |

The older o-series reasoning models (o3, o4-mini) still surface on some bills but are winding down as the GPT-5 family takes over their role. That mix of units is the first reason an OpenAI bill resists a quick glance: a thousand tokens, one generated image, and a second of video are three different meters that all land on the same invoice.

Reasoning is the wrinkle that catches people out. Ask a GPT-5 model to think hard — a high reasoning-effort setting, or a genuinely difficult problem — and before it answers it generates a block of hidden reasoning tokens, its internal chain of thought, which OpenAI bills at the output rate even though you never see them. A prompt that looks small can produce a large, invisible output charge, which is why reasoning spend tends to surprise teams the first time they lean on it. (Exact per-model rates live on OpenAI’s API pricing page and change often — treat any number you’ve memorized as provisional.)
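To see how a small visible answer can carry a large output charge, here is a rough sketch. The function names and the rate are hypothetical placeholders; the only rule taken from the text above is that hidden reasoning tokens bill at the output rate.

```python
def output_charge(visible_tokens: int, reasoning_tokens: int,
                  output_rate_per_mtok: float) -> float:
    """Hidden reasoning tokens are billed at the same rate as visible output."""
    billed_tokens = visible_tokens + reasoning_tokens
    return billed_tokens * output_rate_per_mtok / 1_000_000


def hidden_share(visible_tokens: int, reasoning_tokens: int) -> float:
    """Fraction of billed output tokens that the caller never sees."""
    total = visible_tokens + reasoning_tokens
    return reasoning_tokens / total if total else 0.0


OUTPUT_RATE = 10.0  # placeholder USD per million output tokens, not a real price

# A 300-token answer preceded by 2,700 tokens of hidden reasoning.
charge = output_charge(300, 2_700, OUTPUT_RATE)
visible_only = output_charge(300, 0, OUTPUT_RATE)
share = hidden_share(300, 2_700)
print(f"billed ${charge:.4f} vs ${visible_only:.4f} for the visible answer alone")
print(f"{share:.0%} of billed output tokens were hidden")
```

In this made-up case the bill is ten times what the visible answer alone would suggest.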

A couple of mechanisms can pull your actual bill below a naive tokens-times-rate estimate. OpenAI automatically caches large repeated prompt prefixes and charges a reduced rate for that cached input when a later call reuses it; separately, the Batch API runs non-urgent work asynchronously at half price. Both are worth using — and both are reasons a figure computed from token counts at standard rates is an upper bound, not the exact invoice.
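A small sketch of why the naive figure is an upper bound. All rates are placeholders, the function name is hypothetical, and the sketch assumes the batch discount applies to the whole call. Whether cached-input pricing and the batch discount stack is not stated here, so the example applies them separately.

```python
BATCH_MULTIPLIER = 0.5  # the Batch API runs at half price, per the text above


def estimate_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int,
                  *, input_rate: float, cached_input_rate: float,
                  output_rate: float, batch: bool = False) -> float:
    """Estimate one call's cost in USD; rates are USD per million tokens."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")
    fresh_input = input_tokens - cached_input_tokens
    total = (fresh_input * input_rate
             + cached_input_tokens * cached_input_rate
             + output_tokens * output_rate)
    cost = total / 1_000_000
    return cost * BATCH_MULTIPLIER if batch else cost


# Placeholder rates, not real prices.
rates = dict(input_rate=2.0, cached_input_rate=0.5, output_rate=8.0)

naive = estimate_cost(10_000, 0, 1_000, **rates)             # standard rates only
with_cache = estimate_cost(10_000, 8_000, 1_000, **rates)    # 8,000 tokens hit the cache
with_batch = estimate_cost(10_000, 0, 1_000, batch=True, **rates)
print(naive, with_cache, with_batch)
```

Both discounted figures come in under the naive one, so a total computed at standard rates errs on the high side.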

Then there’s the part that isn’t metered at all. A ChatGPT Plus, Pro, Business, or Enterprise plan is a fixed monthly charge that lives entirely outside the API billing system. No usage endpoint returns it, so it has to be accounted for by hand rather than pulled automatically.

Where does OpenAI’s usage data actually live?

Your usage comes from OpenAI’s organization usage API — and reaching it is less straightforward than reading the bill, because it takes a particular kind of key and returns data at an org-wide scope. Two requirements are worth knowing before you start.

First, the key. The usage endpoint only accepts an Admin key (prefixed sk-admin-) or a service-account key (sk-svcacct-) that holds the Usage read permission — both created under your organization’s settings. The ordinary project keys you use to make model calls (sk-proj-…) are refused outright. It’s a different class of credential from the one your application ships with.

Second, the scope. That admin key reads usage for the whole organization at once, across every project and every project key underneath it. That’s the opposite of the per-project view in the dashboard, and it’s genuinely useful: where the console makes you click through each project and add the pieces up yourself, one admin key already returns the sum. The trade-off is that the data is organization-wide — it isn’t split out per project key — so the question the API answers cleanly is “what did the org spend, by model, by day,” not “which key spent it.”
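For orientation, a hedged sketch of what reading that endpoint could look like. The URL path, the query parameter names, and the response shape shown in `sample_page` are assumptions based on a reading of OpenAI's API reference and should be verified there; the key is a placeholder, and the helper names are hypothetical. The request is only built here, never sent.

```python
from collections import defaultdict
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint path; confirm against the current API reference.
USAGE_URL = "https://api.openai.com/v1/organization/usage/completions"


def build_usage_request(admin_key: str, start_time: int, end_time: int) -> Request:
    """Build (but do not send) a GET for daily usage grouped by model."""
    query = urlencode({
        "start_time": start_time,   # Unix seconds; parameter names are assumed
        "end_time": end_time,
        "bucket_width": "1d",
        "group_by": "model",
    })
    return Request(f"{USAGE_URL}?{query}",
                   headers={"Authorization": f"Bearer {admin_key}"})


def tokens_by_day_and_model(page: dict) -> dict:
    """Sum input and output tokens per (bucket start, model) pair."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})
    for bucket in page.get("data", []):
        for result in bucket.get("results", []):
            key = (bucket["start_time"], result.get("model") or "unknown")
            totals[key]["input_tokens"] += result.get("input_tokens", 0)
            totals[key]["output_tokens"] += result.get("output_tokens", 0)
    return dict(totals)


# Hand-written payload in the assumed response shape: token counts, no dollars.
sample_page = {
    "data": [
        {"start_time": 1780444800, "results": [
            {"model": "gpt-5.5", "input_tokens": 120_000, "output_tokens": 30_000},
            {"model": "gpt-5.4-mini", "input_tokens": 900_000, "output_tokens": 150_000},
        ]},
    ],
}

request = build_usage_request("sk-admin-EXAMPLE", 1780444800, 1780531200)
totals = tokens_by_day_and_model(sample_page)
```

Note that nothing in the sample carries a project key: the result answers "by model, by day" for the organization, which matches the scope described above.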

A horizontal bar chart of month-to-date OpenAI spend broken down by model — GPT-5.5, GPT-5.4, GPT-5.4 mini, and o4-mini.
Month-to-date OpenAI spend grouped by model, the way the organization usage API reports it.

Why is OpenAI spend hard to keep ahead of?

The billing is mechanical; staying ahead of it across a real project is not. Four things get in the way:

  • Reasoning tokens are invisible. A high-reasoning feature can burn far more output than its prompts imply, and you don’t see the thinking tokens that drove the charge — only the total, after the fact.
  • The lineup sprawls. With nano, mini, flagship, and reasoning-heavy models all in play — often within one codebase — the blended rate depends on a mix that shifts as you tune which model handles which job. A change that routes more traffic to a pricier model moves the bill without any change in request volume.
  • Usage is spiky. A batch job, a traffic surge, or one runaway loop retrying a large prompt can move a day’s spend several-fold. By the time it shows up on the invoice, it’s already spent.
  • OpenAI is rarely your only provider. If you also call Anthropic for Claude, run inference on RunPod, or serve a site through Vercel and Cloudflare, the OpenAI figure is one line in a bill you have to assemble by hand from a half-dozen dashboards.
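The spikiness in particular is easy to check for once daily figures exist. Below is a minimal sketch that flags any day whose spend exceeds a multiple of the trailing average; the function name, the seven-day window, and the 2x threshold are illustrative assumptions, and the spend figures are made up.

```python
from statistics import mean


def flag_spikes(daily_spend: list[float], window: int = 7,
                factor: float = 2.0) -> list[int]:
    """Return indexes of days whose spend exceeds factor x the trailing average."""
    spikes = []
    for i in range(window, len(daily_spend)):
        baseline = mean(daily_spend[i - window:i])
        if baseline > 0 and daily_spend[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Seven quiet days, then a runaway retry loop on the ninth day (index 8).
spend = [10.0] * 7 + [11.0, 35.0, 12.0]
spike_days = flag_spikes(spend)
print(spike_days)
```

A check like this only helps if it runs on daily data during the month, which is the point of keeping a running total.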

How can you reduce your OpenAI API bill?

Tracking is half the job; acting on what you see is the other half. A handful of levers move an OpenAI bill the most:

  • Right-size the model. Route simple, high-volume calls to a mini or nano model and reserve the flagship and reasoning models for work that needs them. Because the per-token spread across the GPT-5 family runs more than two orders of magnitude, one routing change can cut a workload’s cost sharply.
  • Dial reasoning effort down. Reasoning tokens bill at the output rate, so a lower reasoning-effort setting on tasks that don’t need deep thinking trims the invisible output charge.
  • Lean on cached input. Keep large, stable prefixes — system prompts, reference documents — consistent so OpenAI’s automatic caching charges the reduced rate instead of full input on every call.
  • Batch the non-urgent work. Move backfills, evals, and offline jobs to the Batch API for half-price throughput.
  • Watch the trend, not just the total. A per-day, per-model view turns a creeping spike into something you catch in days rather than discover on the invoice.

As an example of how quickly this compounds: a feature that quietly defaults to a reasoning model on every request can spend more on hidden thinking tokens than on the answer itself — moving the easy cases to a mini model and reserving reasoning for the hard ones is often the single biggest cut available.
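A back-of-the-envelope version of that routing change, with hypothetical per-request costs chosen only to illustrate the spread; real figures depend on the models, prompts, and reasoning effort involved.

```python
def workload_cost(hard_requests: int, easy_requests: int, *,
                  large_cost_per_request: float, small_cost_per_request: float,
                  route_easy_to_small: bool) -> float:
    """Monthly cost when easy requests go to either the large or the small model."""
    easy_rate = small_cost_per_request if route_easy_to_small else large_cost_per_request
    return hard_requests * large_cost_per_request + easy_requests * easy_rate


# Placeholder per-request costs in USD, not real prices.
costs = dict(large_cost_per_request=0.02, small_cost_per_request=0.0005)

# 100,000 requests a month, of which 10,000 actually need the reasoning model.
before = workload_cost(10_000, 90_000, route_easy_to_small=False, **costs)
after = workload_cost(10_000, 90_000, route_easy_to_small=True, **costs)
print(f"before ${before:,.2f}, after ${after:,.2f}")
```

Request volume is identical in both cases; only the mix changes, and with these placeholder numbers the bill drops by roughly 88 percent.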

How do you forecast next month’s OpenAI bill?

Forecasting doesn’t take anything exotic. The dependable method is a run rate: take your spend over the last several days, turn it into a daily average, and project it across a full month.

A 30-day line chart of daily OpenAI spend with a sharp mid-month spike that settles to a higher baseline.
A 30-day trend makes a spike — say, a new reasoning-model feature — visible days before the invoice would.

CostCompass uses exactly this method. It takes your average daily spend over the trailing seven days (the burn rate), multiplies it by the number of days in next month, and adds next month’s fixed subscriptions. The result is a single forward number: at this rate, here’s what next month will cost. Seven days is short enough to react to a recent change, such as a new feature shipping or a model swap, but long enough to smooth the gap between a quiet weekend and a busy Tuesday.
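The run-rate arithmetic fits in one function. This is a sketch of the method as described, not CostCompass's actual code; the function name and the spend figures are made up.

```python
import calendar
from datetime import date


def forecast_next_month(last_seven_days: list[float], today: date,
                        fixed_subscriptions: float = 0.0) -> float:
    """Trailing daily average x days in next month, plus fixed subscriptions."""
    daily_average = sum(last_seven_days) / len(last_seven_days)
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    days_next_month = calendar.monthrange(year, month)[1]
    return daily_average * days_next_month + fixed_subscriptions


# Seven days averaging $5/day, forecast made in June for July (31 days),
# plus a hypothetical $20 subscription.
forecast = forecast_next_month([4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0],
                               today=date(2026, 6, 3),
                               fixed_subscriptions=20.0)
print(forecast)  # 175.0
```

Because the window is only seven days, a model swap that doubles daily spend shows up in the forecast within a week rather than on the next invoice.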

How CostCompass tracks your OpenAI API costs automatically

CostCompass connects to OpenAI’s organization usage API and reads your usage directly, grouped by model, one day at a time, recording input and output tokens for each. The catch is that this report is in tokens, not dollars: the endpoint returns counts, never a billed amount. So CostCompass computes the cost itself, multiplying those token counts by per-model rates from a stored pricing table and saving the rate it used alongside each day’s usage. That keeps every number reproducible: a past month’s cost stays put even after OpenAI changes its prices, because the rate that applied then was saved with it.
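One way to get that reproducibility is to copy the rate into each day's record at write time. The sketch below illustrates the idea only; the record layout, the names, and the rates are hypothetical and are not CostCompass's schema or real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyUsageRecord:
    day: str
    model: str
    input_tokens: int
    output_tokens: int
    input_rate_per_mtok: float    # snapshot of the rate in force when recorded
    output_rate_per_mtok: float

    @property
    def cost(self) -> float:
        """Cost computed only from values stored on the record itself."""
        total = (self.input_tokens * self.input_rate_per_mtok
                 + self.output_tokens * self.output_rate_per_mtok)
        return total / 1_000_000


def record_day(day: str, model: str, input_tokens: int, output_tokens: int,
               pricing_table: dict[str, tuple[float, float]]) -> DailyUsageRecord:
    """Price a day's token counts and save the rates alongside them."""
    input_rate, output_rate = pricing_table[model]
    return DailyUsageRecord(day, model, input_tokens, output_tokens,
                            input_rate, output_rate)


# Placeholder (input, output) rates in USD per million tokens.
pricing = {"gpt-5.4-mini": (0.5, 4.0)}
june_record = record_day("2026-06-03", "gpt-5.4-mini", 1_000_000, 200_000, pricing)

# A later price change updates the table but leaves the saved record alone.
pricing["gpt-5.4-mini"] = (1.0, 8.0)
july_record = record_day("2026-07-03", "gpt-5.4-mini", 1_000_000, 200_000, pricing)
print(june_record.cost, july_record.cost)
```

The June figure stays fixed after the table changes, while the same token counts recorded in July are priced at the new rates.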

The CostCompass dashboard showing month-to-date spend with a forecast and burn rate.
Month-to-date spend across every connected provider, with a forecast and burn rate, the moment you open the dashboard.

All of that, reading your org-wide usage and pricing it day by day, runs on the one credential you connected: the Admin or service-account key. Two things about how CostCompass works make it practical for a single developer.

First, your key is encrypted in your browser before it’s ever stored. The key you connect is sealed with your vault password and saved only as ciphertext that CostCompass can’t decrypt, so our App Server only ever holds that ciphertext, and your vault password stays in your browser. When we fetch OpenAI usage data, the key is decrypted in your browser and forwarded to OpenAI through a broker that holds it only for the moment of the call and is built not to log it, so the plaintext stays out of our database and our logs.

Second, OpenAI doesn’t sit alone. The same dashboard rolls it up alongside your other connected AI and compute providers — into one month-to-date total, one forecast, and one breakdown.

Your ChatGPT subscription, whether a Plus, Pro, Business, or Enterprise plan, is the one piece no usage API reports. You enter its monthly cost once, and CostCompass prorates it across the month, so both your month-to-date total and the forecast include the subscription alongside the metered API usage.
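Proration can be as simple as a linear share by day of month. That linear rule is an assumption made for illustration; the exact formula CostCompass uses is not spelled out here, and the function name and the $20 figure are placeholders.

```python
import calendar
from datetime import date


def prorated_to_date(monthly_cost: float, today: date) -> float:
    """Share of a flat monthly charge attributed to the month so far (linear by day)."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return monthly_cost * today.day / days_in_month


# A hypothetical $20/month plan, halfway through a 30-day month.
subscription_mtd = prorated_to_date(20.0, date(2026, 6, 15))
print(subscription_mtd)  # 10.0
```

Adding this figure to the metered month-to-date total keeps the subscription from arriving as a lump on a single day.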

Getting started takes three steps:

  1. In OpenAI, create an Admin key (or a service-account key with the Usage read permission).
  2. Paste it into CostCompass — it’s encrypted in your browser before it’s stored, so our server only ever holds ciphertext.
  3. Confirm the month-to-date figure matches what you expect. From then on your metered API usage refreshes on its own — the running total and the forecast stay current without your touching them.

Frequently asked questions

What kind of OpenAI key does CostCompass need?
An Admin key (sk-admin-…) or a service-account key (sk-svcacct-…) with the Usage read permission. Those are the only key types OpenAI's organization usage API accepts — ordinary project keys (sk-proj-…) are refused with a 401. You create one under your organization's settings, and because it reads usage org-wide, a single key covers every project you run.
Does CostCompass store my OpenAI key?
Not in any form we can read. The API key is encrypted in your browser with your vault password, and CostCompass stores only the resulting ciphertext — an opaque blob it has no way to decrypt. Your vault password stays in your browser, and our App Server only ever sees that ciphertext, not a plaintext key. When we fetch OpenAI usage data, the key is decrypted in your browser and forwarded to OpenAI through a broker that holds it for the moment of the call and is built not to log it, so the plaintext stays out of our database and logs.
Does CostCompass break OpenAI spend down by model?
Yes. It reads OpenAI's organization usage report one day at a time, grouped by model, and records input and output tokens for each. New models — a new GPT-5 point release, say — appear on their own as soon as OpenAI reports usage for them.
How does it handle reasoning tokens?
When a GPT-5 (or older o-series) model thinks before answering, OpenAI bills the hidden "thinking" tokens at the output rate, and they arrive inside the output-token count the usage API reports. CostCompass tracks that output figure as-is, so the reasoning cost is already captured — it's just folded into output rather than shown as a separate line.
Does it track image, video, and realtime usage (gpt-image, Sora, gpt-realtime)?
CostCompass reads OpenAI's token-metered chat-completions usage, so it covers the text and reasoning models — the GPT-5 family — that make up most API bills. Image generation (billed per image), Sora video (per second), and realtime voice (a separate audio meter) run on their own usage reports that aren't part of that token feed, so those are the OpenAI surfaces it doesn't pull automatically yet.
Can it track my ChatGPT subscription, not just API usage?
Yes, but you add it manually. The usage API reports metered API tokens only — a ChatGPT Plus, Pro, Business, or Enterprise subscription isn't exposed by any billing endpoint. You enter the monthly cost once and CostCompass prorates it into your month-to-date total and forecast.
Why might the CostCompass figure differ slightly from OpenAI's dashboard?
CostCompass computes cost from published per-model rates at the moment each day is recorded, so the number is reproducible and doesn't move when OpenAI later changes prices. Discounts that don't show up in the token counts — the Batch API's 50% rate, automatic cached-input pricing, or promotional credits — can make your actual invoice a little lower than a straight tokens-times-rate total.
Why use CostCompass instead of OpenAI's usage dashboard?
OpenAI's dashboard shows what you've already spent on OpenAI, after the fact, with no projection of where your spend is heading. CostCompass turns that into a live, forward-looking picture — a running month-to-date total, a forecast, and a per-model breakdown — for OpenAI and the rest of your connected providers, in one view. You see a spike forming days before the invoice instead of after.

About the author

Joubert Berger builds CostCompass, a spend-intelligence dashboard that pulls usage from AI and compute providers into one month-to-date total, a forecast, and a per-provider breakdown. This guide reflects how CostCompass reads each provider's own usage API — see the security model for how your keys are handled.

Stop guessing at the OpenAI bill

Connect OpenAI once and watch your metered OpenAI spend — month-to-date, forecast, and per-model breakdown — keep itself current.