Claude cost tracking

Know what Claude is actually costing you

Anthropic bills Claude by model and by token, and the invoice arrives a month too late to act on. Here is how the billing works — and how to keep a running total and a forecast in front of you in real time.

By Joubert Berger. Published June 2, 2026.

The invoice from Anthropic tells you one number: what you spent on Claude last month. It does not tell you whether that number is about to double, which models are driving it, or how it stacks up against everything else you run. For a solo developer or a small team shipping on top of Claude, that single figure arrives too late to act on — the month is already over.

This guide covers how Anthropic bills for Claude, why a running total is harder to keep than it looks, and how to forecast where your spend is heading before the invoice lands.

[Figure: almanac-style engraving of a token stream splitting into four unequally filled pans beside an unopened, wax-sealed letter. One spend, split across four token meters, totalled only by an invoice still sealed.]

How does Claude API billing work?

Claude costs usually fall into two buckets, and a complete picture needs both. The first is metered usage — pay-as-you-go API calls, priced by the token and varying by model. The second is a flat-rate subscription — Claude Pro and Max for the chat app, Team and Enterprise seats for organizations. Most developers shipping on the API live mostly in the first, but the two are reported in completely different places, which is part of why a single running total is harder to keep than it looks.

Start with metered usage. At its simplest, a request meters two things: the input tokens you send — your prompt, system instructions, and any conversation history — and the output tokens Claude generates in reply. (Prompt caching splits input further, as the next section covers.) Output is the more expensive of the two, typically several times the input rate, so a verbose assistant costs more than its prompt size alone would suggest.

Rates vary widely by tier. Anthropic’s lineup runs from Haiku, the fast and inexpensive tier, through Sonnet in the middle, to Opus, aimed at the hardest work and typically the priciest tier — the spread between the cheapest and priciest model is more than an order of magnitude per token. Choosing the right model for each job is the single biggest lever on your bill.

Tier | Relative cost | Best suited for
Haiku | Cheapest | High-volume, latency-sensitive, simpler tasks
Sonnet | Mid-range | Everyday coding, drafting, and agent work
Opus | Most expensive (>10× Haiku per token) | The hardest reasoning where quality justifies the cost

Prompt caching adds a wrinkle worth understanding. When you reuse a large, stable prefix — a long system prompt, a reference document, a set of few-shot examples — Anthropic can cache it. Writing to the cache is priced above ordinary input — how far above depends on the cache duration you choose — while reading from it on later calls costs a fraction of the normal input rate. Used well, caching cuts the cost of repeated context dramatically, which is why it pays to track cache-read and cache-write tokens separately rather than lumping all input together. (Exact per-model rates live on Anthropic’s API pricing page and change over time — treat any number you commit to memory as provisional.)

That leaves four distinct token categories on a single request, each priced differently — which is why tracking them separately matters:

Token category | How it’s priced
Input | Base rate for the prompt, system instructions, and history you send
Output | Highest rate, typically several times the input rate
Cache write | Above the input rate (how far above depends on the cache duration), charged whenever a prefix is written to the cache
Cache read | A fraction of the input rate on later calls that reuse the cached prefix
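To make the four-way split concrete, here is a minimal sketch of how one request's cost could be computed when each category is priced on its own. The `Rates` and `Usage` classes, their field names, and every rate shown are illustrative assumptions, not Anthropic's actual prices or API fields; substitute the current numbers from the pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens. Placeholder values, NOT Anthropic's prices."""
    input: float
    output: float
    cache_write: float
    cache_read: float


@dataclass(frozen=True)
class Usage:
    """Token counts for one request, split into the four billed categories."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0


def request_cost(usage: Usage, rates: Rates) -> float:
    """Return the USD cost of one request, pricing each category separately."""
    total_per_million = (
        usage.input_tokens * rates.input
        + usage.output_tokens * rates.output
        + usage.cache_write_tokens * rates.cache_write
        + usage.cache_read_tokens * rates.cache_read
    )
    return total_per_million / 1_000_000


# Hypothetical rates chosen only to keep the arithmetic easy to follow.
EXAMPLE_RATES = Rates(input=1.00, output=5.00, cache_write=1.25, cache_read=0.10)

# A small fresh prompt, a short reply, and a large prefix served from cache.
usage = Usage(input_tokens=2_000, output_tokens=500, cache_read_tokens=50_000)
cost = request_cost(usage, EXAMPLE_RATES)
print(f"${cost:.4f}")
```

Under these placeholder rates, the 50,000 cached tokens contribute about as much as the 500 output tokens and the 2,000 fresh input tokens combined, which is the kind of detail a single lumped "input" figure would hide.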

The subscription side is simpler, but easier to lose track of. A Claude Pro or Max plan, or a row of Team seats, is a fixed monthly charge that doesn’t move with usage — and, crucially, it lives entirely outside the usage API. There is no billing endpoint that returns it, so it has to be accounted for by hand rather than pulled automatically.

Why is Claude spend hard to track?

Tokens are simple. Keeping a running total across a real project is not. Four things get in the way:

  • Keys multiply. As soon as you have a staging environment, a production app, and a personal experiment, you have three API keys — often spread across separate Anthropic workspaces — each with its own slice of spend. The number you care about is the sum, and nobody hands you the sum.
  • Usage is spiky. A batch job, a traffic surge, or a single runaway loop that retries a 200,000-token prompt can move a day’s spend by 10x. By the time it surfaces on the invoice, the money is already spent.
  • There’s no forecast. Anthropic shows you what you have used, not what you are on track to use. The question that actually keeps you up — “at this rate, what will next month cost?” — has no button.
  • Claude is rarely your only provider. If you also call OpenAI, run inference on RunPod, or serve a site through Vercel and Cloudflare, the Claude figure is one line in a bill you have to assemble by hand from a half-dozen dashboards.
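The first two problems, scattered keys and spiky usage, can be sketched in a few lines. The example below assumes you already have a per-day spend series for each key; the key names, the figures, and the "three times the median" threshold are all hypothetical choices for illustration, not a recommended alerting rule.

```python
from statistics import median

# Hypothetical daily spend in USD for one week, one series per API key.
# All series are assumed to cover the same days in the same order.
daily_spend_by_key = {
    "staging": [1.0, 1.2, 0.9, 1.1, 1.0, 1.0, 1.1],
    "production": [8.0, 7.5, 8.2, 7.9, 8.1, 8.0, 61.0],
    "experiments": [0.5, 0.0, 0.3, 0.0, 0.4, 0.2, 0.1],
}


def combined_daily(spend_by_key):
    """Add the per-key series together, day by day, into one total series."""
    return [sum(day) for day in zip(*spend_by_key.values())]


def spike_days(daily, factor=3.0):
    """Return indices of days whose spend exceeds `factor` times the median."""
    baseline = median(daily)
    return [i for i, amount in enumerate(daily) if amount > factor * baseline]


combined = combined_daily(daily_spend_by_key)
print([round(amount, 2) for amount in combined])
print(spike_days(combined))
```

The median is used as the baseline because a single runaway day barely moves it, whereas a mean would be dragged upward by the very spike you are trying to detect.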

How can you reduce your Claude API bill?

Tracking is half the job; acting on what you see is the other half. A few levers move a Claude bill the most:

  • Right-size the model. Send high-volume, simpler calls to Haiku and reserve Opus for the work that genuinely needs it. With more than an order of magnitude between the cheapest and priciest tier per token, matching model to task is the single biggest cut available.
  • Lean hard on prompt caching. A large, stable prefix — a long system prompt, a reference document — read from cache costs a fraction of full input. Keeping that prefix byte-stable so it stays cached, rather than re-sending it uncached on every call, is often the biggest saving on Claude specifically.
  • Trim output. Output bills several times the input rate, so asking for concise answers, capping reply length with the max_tokens parameter, and avoiding needless re-generation all pull the expensive side down.
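  • Batch the non-urgent work. Move evals, backfills, and offline jobs off the interactive path so a spike there doesn’t blur your live spend. Anthropic’s Message Batches API also bills asynchronous work at a discounted rate, so jobs that can wait cost less as well.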
  • Watch the trend, not just the total. A per-day, per-model view turns a creeping spike into something you catch in days rather than discover on the invoice.

As an example of how quickly this compounds: a large system prompt re-sent uncached on every request can cost several times what the same prefix costs as a cache read — turning that one prefix into a cache hit is often a bigger saving than switching models.
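A rough worked version of that comparison follows. The input rate, the cache-write and cache-read multipliers, and the assumption that the cache entry never expires between calls are all stated assumptions for the sake of the arithmetic; real multipliers depend on the model and cache duration you choose.

```python
def prefix_cost(prefix_tokens, requests, input_rate, cached,
                write_multiplier=1.25, read_multiplier=0.10):
    """USD spent on one shared prefix across `requests` calls.

    input_rate is USD per million input tokens. The multipliers are
    illustrative assumptions; check the pricing page for real values.
    """
    if requests <= 0:
        return 0.0
    per_call = prefix_tokens * input_rate / 1_000_000
    if not cached:
        # Every call pays the full input rate for the whole prefix.
        return per_call * requests
    # One cache write on the first call, then a cache read on each later call.
    # Assumes the cache entry stays warm for the whole run.
    return per_call * write_multiplier + per_call * read_multiplier * (requests - 1)


# A 20,000-token system prompt sent 1,000 times at a placeholder $3 per million.
uncached = prefix_cost(20_000, 1_000, input_rate=3.00, cached=False)
cached = prefix_cost(20_000, 1_000, input_rate=3.00, cached=True)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}")
```

Under these assumptions the uncached run costs close to ten times the cached one. In practice the gap narrows whenever the cache expires and has to be rewritten, which is why keeping the prefix stable and the traffic steady matters.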

Spreadsheet or dashboard: which should you use?

The usual first answer is a spreadsheet. You log into the Anthropic Console, read the usage page, copy the figure into a sheet, and repeat for every other provider. It works, briefly. Then you miss a few days, the numbers go stale, and the spreadsheet quietly stops reflecting reality — right when a spike happens.

The deeper problem is that a spreadsheet is a snapshot, not a system. It doesn’t refresh itself, it doesn’t break spend down by model without manual work, and it can’t warn you that today’s burn rate puts you over budget. Anything you have to remember to update is something you will eventually forget to update.

A dashboard turns that manual snapshot into a scheduled system. It reads your usage on a schedule, keeps the running total current, and does the arithmetic — month-to-date, by model, by provider — for you. The point isn’t prettier charts; it’s that the number is correct when you glance at it, without your having done anything.
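The arithmetic itself is small. A minimal sketch of a month-to-date rollup, assuming usage has already been converted into daily cost records, might look like this; the record layout, provider labels, and model labels are hypothetical and not CostCompass's internal format.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily cost records; field names and model labels are illustrative.
records = [
    {"day": date(2026, 5, 31), "provider": "anthropic", "model": "opus", "usd": 9.00},
    {"day": date(2026, 6, 1), "provider": "anthropic", "model": "haiku", "usd": 1.50},
    {"day": date(2026, 6, 1), "provider": "anthropic", "model": "opus", "usd": 6.00},
    {"day": date(2026, 6, 2), "provider": "anthropic", "model": "haiku", "usd": 2.00},
    {"day": date(2026, 6, 2), "provider": "openai", "model": "other-model", "usd": 3.25},
]


def month_to_date(records, today):
    """Sum spend from the first of today's month through today."""
    month_start = today.replace(day=1)
    by_model = defaultdict(float)
    by_provider = defaultdict(float)
    total = 0.0
    for record in records:
        # Skip anything from earlier months or dated after today.
        if month_start <= record["day"] <= today:
            total += record["usd"]
            by_model[record["model"]] += record["usd"]
            by_provider[record["provider"]] += record["usd"]
    return {
        "total": total,
        "by_model": dict(by_model),
        "by_provider": dict(by_provider),
    }


summary = month_to_date(records, today=date(2026, 6, 2))
print(summary)
```

The value of a dashboard is less in this function than in running it on a schedule against fresh data, so the result is current without anyone remembering to update it.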

How do you forecast next month’s Claude bill?

Forecasting next month’s spend doesn’t require anything exotic. The reliable approach is a run rate: take your spend over the last several days, turn it into a daily average, and project it across a full month.

[Figure: 30-day line chart of daily Claude spend with a recent upward slope. A 30-day spend trend makes a developing spike visible days before the invoice would.]

CostCompass uses exactly this method. It takes your trailing seven-day burn rate, multiplies it by the number of days in next month, and adds next month’s fixed subscriptions. The result is a single forward number: at this rate, here is what next month will cost. A seven-day window is short enough to react to a recent change in usage but long enough to smooth the difference between a quiet Sunday and a busy Tuesday.
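The run-rate method described above can be sketched as follows. This is an illustration of the stated formula (trailing seven-day average, times the days in next month, plus fixed subscriptions), not CostCompass's actual code; the function names and sample figures are assumptions.

```python
import calendar
from datetime import date


def days_in_next_month(today):
    """Number of days in the calendar month after today's month."""
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    return calendar.monthrange(year, month)[1]


def forecast_next_month(last_seven_days, today, subscriptions=0.0):
    """Project next month's spend from the trailing seven-day burn rate.

    last_seven_days: metered spend in USD for each of the last seven days.
    subscriptions:   fixed monthly charges in USD expected next month.
    """
    if len(last_seven_days) != 7:
        raise ValueError("expected exactly seven daily figures")
    daily_burn = sum(last_seven_days) / 7
    return daily_burn * days_in_next_month(today) + subscriptions


# Hypothetical week of spend, forecast on June 2, 2026, with a $20 subscription.
week = [10.0, 12.0, 11.0, 9.0, 14.0, 13.0, 15.0]
forecast = forecast_next_month(week, today=date(2026, 6, 2), subscriptions=20.0)
print(f"${forecast:.2f}")
```

Here the week averages $12 a day, July has 31 days, and the subscription adds $20, so the projection is $392. Because the window is only seven days, one unusual day shifts the forecast by roughly one-seventh of its excess, multiplied across the whole month.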

How CostCompass tracks your Claude API costs automatically

CostCompass connects to Anthropic’s Admin API and reads your Claude usage directly — broken down by model, one day at a time. It separates the token categories Anthropic reports — ordinary input, cache writes, cache reads, and output — so the per-model picture reflects how you’re actually using the API, caching and all.

[Figure: the CostCompass dashboard. Month-to-date spend across every connected provider, with a forecast and burn rate, the moment you open the dashboard.]

Two things make it practical for a single developer. First, your key is encrypted in your browser before it’s ever stored. The Anthropic admin key you connect is sealed with your vault password and saved only as ciphertext CostCompass can’t decrypt — so our App Server only ever holds that ciphertext, and your vault password stays in your browser. When we fetch Anthropic usage data, the key is decrypted in your browser and forwarded straight to Anthropic through a broker that holds it for the moment of the call and is built not to log it, so the plaintext stays out of our database and our logs.

Second, Claude doesn’t sit alone. The same dashboard rolls Anthropic up alongside your other connected AI and compute providers — OpenAI included — into one month-to-date total, one forecast, and one breakdown.

[Figure: horizontal bar chart of month-to-date spend by provider, Anthropic at the top. Where the month's spend actually went, provider by provider.]

For the part no API exposes (a Claude Pro, Max, Team, or Enterprise subscription), you enter the monthly cost once. CostCompass records it as a manual entry and prorates it across the month, so both your month-to-date total and the forecast reflect the subscription alongside the metered usage rather than missing it entirely.
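The article does not spell out how the proration is calculated. One plausible reading, shown here purely as an assumption, is a linear split by day: each day of the month carries an equal share of the fixed charge.

```python
import calendar
from datetime import date


def prorated_subscription(monthly_cost, today):
    """Share of a fixed monthly charge accrued through `today`.

    Assumes simple linear proration by calendar day, which is an
    illustrative choice rather than CostCompass's documented method.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return monthly_cost * today.day / days_in_month


# Hypothetical figures: a $30 monthly plan, viewed on June 10, 2026.
metered_to_date = 42.50
prorated = prorated_subscription(30.0, today=date(2026, 6, 10))
month_to_date_total = metered_to_date + prorated
print(f"${month_to_date_total:.2f}")
```

With June's 30 days, ten days of a $30 plan come to $10, so the month-to-date total becomes $52.50 instead of showing the metered $42.50 alone.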

Getting started takes three steps:

  1. In the Anthropic Console, create an Admin API key; the usage report comes from the Admin API.
  2. Paste it into CostCompass — it’s encrypted in your browser before it’s stored, so our server only ever holds ciphertext.
  3. Confirm the month-to-date figure matches what you expect. From then on your metered Claude usage refreshes on its own — the running total and the forecast stay current without your touching them.

Frequently asked questions

Does CostCompass store my Anthropic API key?
Not in any form we can read. The API key is encrypted in your browser with your vault password, and CostCompass stores only the resulting ciphertext — an opaque blob it has no way to decrypt. Your vault password stays in your browser, and our App Server only ever sees that ciphertext, not a plaintext key. When we fetch Anthropic usage data, the key is decrypted in your browser and forwarded to Anthropic through a broker that holds it for the moment of the call and is built not to log it, so the plaintext stays out of our database and logs.
Does CostCompass break Claude spend down by model?
Yes. It reads your usage from Anthropic's Admin API per model, one day at a time, and separates the token categories Anthropic reports — ordinary input, cache writes, cache reads, and output — so the per-model picture reflects how you actually use the API.
How does the next-month forecast work?
CostCompass takes your trailing seven-day burn rate, multiplies it by the number of days in next month, and adds next month's fixed subscriptions. That projects what next month will cost if your current pace holds — the forward-looking number a provider dashboard never gives you.
Does it include my Claude subscription, not just API usage?
Yes, but you add it manually. Anthropic's usage API reports metered token usage only — subscription charges like Claude Pro, Max, Team, or Enterprise aren't exposed by any billing endpoint. You enter the monthly cost once and CostCompass prorates it into your month-to-date total and forecast.
Which Claude models does it track?
Every model Anthropic returns in the usage report — across the Haiku, Sonnet, and Opus tiers — grouped by model. New models appear automatically as Anthropic reports usage for them.
Can it track Claude alongside my other providers?
Yes. CostCompass rolls Anthropic up with your other connected AI and compute providers into one month-to-date total, one forecast, and one breakdown — so Claude is one line in a number you no longer have to assemble by hand.
Is CostCompass worth it if Claude is the only API I use?
Even with one provider, Anthropic's Console shows what you've already spent, not where your spend is heading — there's no running forecast and no read on whether today's burn rate is trending over budget. CostCompass gives you both — a live month-to-date total and a projection of what next month will cost at your current pace, refreshed on a schedule so the number is right whenever you glance at it. And the day you add a second provider — an image API, a hosting bill, an inference box — it's already one combined total instead of another dashboard to reconcile.

About the author

Joubert Berger builds CostCompass, a spend-intelligence dashboard that pulls usage from AI and compute providers into one month-to-date total, a forecast, and a per-provider breakdown. This guide reflects how CostCompass reads each provider's own usage API — see the security model for how your keys are handled.

Stop guessing at the Claude bill

Connect Anthropic once and watch your metered Claude spend — month-to-date, forecast, and per-model breakdown — keep itself current.