The three ways AI cost tools get your numbers
Most tools that track AI spend collect it one of three ways — by routing your traffic, by living in your code, or by reading the bill each provider already keeps. The method you pick decides what the tool can ever show you. Here's how the named tools sort out, and where CostCompass fits.
Search for a tool to track your AI spend and you'll find dozens, all promising the same dashboard. What separates them is where the numbers come from. One tool reroutes your traffic through itself. Another asks you to wrap your code in its library. A third never touches your application and just reads the bill each provider already keeps. That choice, made before you see a single screen, decides what the tool can ever show you.
Which one fits depends on what you're trying to find out. Debugging the cost of a single request is a different job from knowing what AI is costing you this month across everything you run, and the three methods aren't equally good at both. It's worth seeing them side by side before you compare dashboards. Here they are, with the named tools that use each.

How do AI cost tools collect your spend?
Most cost tools do the same three things: read each provider’s usage, price it into money, and add it up. What differs is where the tool stands to take the first reading, and there are three places to stand.
It can stand in the flow, as a proxy your requests pass through. It can stand inside your code, as a library that reports each call. Or it can stand outside everything, reading the records each provider already keeps. Sorting tools by that one decision is the most useful way to compare them. Everything else follows from it: what spend the tool can see, whether it estimates or reads the real figure, and what it costs you to run. A few tools support more than one method; what matters is the one they lead with.
| Method | Setup | In your request path? | What it sees — and misses | The number is |
|---|---|---|---|---|
| Proxy / gateway | Easy — point your base URL at it | Yes | Only traffic you route through it; direct calls and non-API spend are missed | Estimated from tokens (or its own charge) |
| SDK / library instrumentation | Most work — wrap each call, then maintain it | In your code | Only the calls you instrument; un-wrapped paths and non-model spend are missed | Estimated from observed tokens |
| Read the bill (at the source) | Easy — paste a key, no code (cloud exports take more) | No | Every provider with a readable billing API; dashboard-only providers can’t be pulled | The provider’s own recorded usage, priced |
Each method section below names three tools that use it and says what the method is good for. A tool is placed by the primary method its own documentation describes, checked in June 2026. Products move, so confirm against each tool’s own docs before you decide.
The tools, by method:
- Proxy or gateway: Helicone, Cloudflare AI Gateway, OpenRouter
- SDK instrumentation: Langfuse, Maxim, Orbit
- Read the bill at the source: Vantage, StackSpend, CostGoat — the method CostCompass uses
Method 1: Route your traffic through a proxy or gateway
A gateway sits between your application and the model provider. You point your calls at the gateway’s endpoint instead of the provider’s, often a one-line change, and every request flows through it on the way out. Because it sees the traffic, it can count tokens, log the request, cache responses, enforce a spend limit, and route between models. Cost reporting comes out as a by-product of all that.
Helicone is the developer-facing example: change your base URL and your requests proxy through it, and it logs cost, latency, and tokens per call (it also offers an off-path async logger if you’d rather not sit in the request path). Cloudflare AI Gateway works the same way — a one-line endpoint change in front of your providers — and is candid that its cost figure is an estimate computed from observed tokens, pointing you to the provider’s own dashboard for the authoritative number. OpenRouter is a gateway of a different kind: you buy model access through it, so its figure is the actual charge rather than a reconstruction — but only for the traffic you route and pay through OpenRouter.
The shared trait, and the catch, is that a gateway only sees what flows through it. Reroute your traffic and you get rich per-request detail; send anything straight to a provider and it’s invisible. And because the whole frame is API calls, a rented GPU box or a hosting bill falls outside what a gateway tracks, since neither is a routed request. In the usual proxy setup the gateway also adds a hop to your latency budget (an off-path logging mode, where one is offered, avoids that). For per-request control and debugging that’s a fair trade. If all you want is a monthly total across everything, it’s a lot of plumbing pointed at the wrong question.
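To make the "estimated from tokens" idea concrete, here is a minimal Python sketch of how a gateway-style tool might reconstruct cost from the traffic it observes. The model names, the price table, and the `RoutedRequest` shape are all hypothetical, made up for illustration; they are not any real gateway's API or any provider's real rates.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens (illustrative only).
PRICE_PER_MTOK = {
    "model-a": {"input": 3.00, "output": 15.00},
    "model-b": {"input": 0.50, "output": 1.50},
}

@dataclass
class RoutedRequest:
    """One request the gateway saw pass through it (hypothetical shape)."""
    model: str
    input_tokens: int
    output_tokens: int

def estimate_cost(req: RoutedRequest) -> float:
    """Estimate one request's cost from observed tokens and the price table."""
    price = PRICE_PER_MTOK[req.model]
    return (req.input_tokens * price["input"]
            + req.output_tokens * price["output"]) / 1_000_000

def gateway_total(routed: list[RoutedRequest]) -> float:
    """Sum estimates for routed traffic only; direct calls never appear here."""
    return sum(estimate_cost(r) for r in routed)

routed = [
    RoutedRequest("model-a", input_tokens=1_000_000, output_tokens=200_000),
    RoutedRequest("model-b", input_tokens=2_000_000, output_tokens=1_000_000),
]
print(f"Estimated spend seen by the gateway: ${gateway_total(routed):.2f}")
```

The total is only as complete as the `routed` list. A call sent straight to a provider, or a GPU rental, never becomes a `RoutedRequest`, so it never reaches the sum.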
Method 2: Instrument your code with an SDK
The second method moves the meter inside your application. You install a library, wrap your provider client with it, and from then on every call you’ve instrumented reports its tokens and cost to the tool. Nothing reroutes — your requests still go straight to the provider — but there’s now code from the tool in your codebase.
Langfuse is the widely-used open-source option here, built on OpenTelemetry: you trace your calls and it records cost alongside the tokens, taking the figure from the model’s response when it’s there and otherwise computing it from a built-in price table. Maxim instruments agents and evaluation runs, capturing token cost per trace as part of a broader quality-and-observability picture. Orbit takes the narrowest angle of the three: you wrap your model client and tag each call with a feature and a customer, so cost lands as per-feature, per-customer unit economics. That makes it a margin tool more than a billing one. It never sees your provider keys, and it never reads your bill either; the cost is reconstructed from the calls it observes.
What you buy with an SDK is detail a bill can’t give you: cost traced down to the individual call, tagged by feature, prompt, and customer. The price is code to install and maintain — the wrappers have to keep working as your code and the providers’ SDKs change — and you inherit the same blind spot as a gateway. The tool only knows about the calls you instrumented, so anything you forgot to wrap, and any spend that isn’t a model call, just isn’t there. For someone optimizing the unit economics of an AI product, that detail is the point. If you just want this month’s number, the instrumentation is overhead you don’t need.
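As a rough illustration of the wrap-and-tag pattern, the Python sketch below uses a decorator to record each instrumented call's cost under a feature and a customer. Everything in it is hypothetical: the `track_cost` decorator, the flat price, and the fake `summarize` call stand in for a real SDK and a real provider client, and none of it is the API of Langfuse, Maxim, or Orbit.

```python
from collections import defaultdict
from functools import wraps

# Hypothetical flat price in USD per 1,000 tokens (illustrative only).
PRICE_PER_1K_TOKENS = 0.002

# Running cost keyed by (feature, customer).
cost_by_tag: dict[tuple[str, str], float] = defaultdict(float)

def track_cost(feature: str):
    """Wrap a model call so its token usage is recorded under a feature tag."""
    def decorator(call):
        @wraps(call)
        def wrapper(customer: str, prompt: str):
            response = call(customer, prompt)
            tokens = response["usage"]["total_tokens"]
            cost_by_tag[(feature, customer)] += tokens / 1000 * PRICE_PER_1K_TOKENS
            return response
        return wrapper
    return decorator

@track_cost(feature="summarize")
def summarize(customer: str, prompt: str) -> dict:
    # Stand-in for a real provider call; returns a fake usage record.
    return {"text": "(stub)", "usage": {"total_tokens": len(prompt.split()) * 100}}

def unwrapped_call(customer: str, prompt: str) -> dict:
    # Not decorated, so its spend never reaches cost_by_tag.
    return {"text": "(stub)", "usage": {"total_tokens": 5000}}

summarize("acme", "summarize this quarterly report")
summarize("acme", "and this one")
unwrapped_call("acme", "invisible to the tracker")
print(dict(cost_by_tag))
```

The undecorated function is the blind spot in miniature: it spends tokens like any other call, but nothing about it is ever recorded.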
Method 3: Read the bill at the source
The third method never touches your application. Instead of metering traffic, it reads the usage or billing records each provider already keeps — through a read-only key, a billing export, or a cost API — and prices that into money. Nothing reroutes, and nothing is installed in your code or your latency budget. Because it starts from the provider’s own records rather than from traffic a library happened to see, it can cover anything with a readable usage API on one panel: model tokens, a GPU box, object storage.
Vantage is the broad example, spanning cloud providers and, increasingly, model APIs. For the AI providers it connects to a read-only admin key and pulls usage through their cost API, never writing anything back. It scales from a startup to an enterprise, which means both more breadth and more setup. StackSpend is the closest in spirit to a solo-dev tool: you paste read-only keys for each provider and it reads billing across both cloud and AI services, self-serve, no traffic and no code. CostGoat reaches the same records from a different place — a privacy-first desktop app that keeps your keys on your own machine and fetches each provider’s usage locally, covering AI APIs, cloud, and subscriptions for an indie developer who’d rather nothing left the laptop.
Others read the bill too, with a different center of gravity: CloudZero pulls cloud billing exports for enterprise cost allocation and unit economics; Holori normalizes multi-cloud spend, with its model-provider connectors still expanding as of mid-2026; Binadox reads provider APIs for cloud and LLM cost (its proxy and agent are for discovering shadow SaaS, not for metering your API spend — a useful thing to keep straight).
The catch with this method is real but narrow. It reports at the provider’s own resolution rather than per individual request, so it won’t help you debug a single expensive call, and it’s only as current as the last time you pulled it. Setup is usually a key-paste with no code, though pulling a cloud platform’s cost export, like an AWS Cost and Usage Report or a Google Cloud billing export to BigQuery, is more involved than that. A month-to-date figure is a running estimate from published rates rather than a settled invoice, and discounts that don’t appear in raw usage can make the final bill a little lower. What you give up in per-call granularity, you get back as the whole bill across everything you run, with nothing wired into your application.
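The read-and-price step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any tool's actual implementation: the provider names, meters, rates, and usage figures below are all invented, and a real tool would fetch the records from each provider's usage or billing API rather than hard-code them.

```python
from collections import defaultdict

# Hypothetical published rates in USD per unit of each meter (illustrative only).
RATES = {
    ("model-api", "million_tokens"): 3.00,
    ("gpu-host", "gpu_hours"): 1.20,
    ("storage", "gb_months"): 0.02,
}

# Usage records as a provider's usage API might report them (made-up figures).
usage_records = [
    {"provider": "model-api", "meter": "million_tokens", "quantity": 12.5},
    {"provider": "gpu-host", "meter": "gpu_hours", "quantity": 40},
    {"provider": "storage", "meter": "gb_months", "quantity": 250},
]

def price_usage(records: list[dict]) -> dict[str, float]:
    """Price each record at its published rate and total by provider."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        rate = RATES[(rec["provider"], rec["meter"])]
        totals[rec["provider"]] += rec["quantity"] * rate
    return dict(totals)

by_provider = price_usage(usage_records)
month_to_date = sum(by_provider.values())
print(by_provider)
print(f"Month-to-date estimate: ${month_to_date:.2f}")
```

Because the input is the provider's recorded usage rather than observed traffic, tokens, GPU-hours, and storage all reduce to the same unit. The result is still an estimate at published rates, so a negotiated discount would not show up in it.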
Which AI cost tracking method should you use?
It depends on the question you’re asking, and the three methods answer different ones.
If you need to know what a single request cost — to debug a runaway prompt, attribute spend to one customer, cache repeated calls, or enforce a hard limit inline — a proxy or an SDK is the right tool, and the code or the request-path hop is worth it. That per-call resolution is exactly what reading the bill can’t give you. A finance team allocating a large cloud estate across products wants a cloud cost platform built for showback, and its enterprise setup pays for itself at that scale.
But if you’re a solo developer or a small team, and the question is what is AI costing me this month, across every provider I run, and where is it heading, the answer is to read the bill. You don’t need a request-level trace to answer that, and the plumbing the other two methods ask for is overhead pointed away from the question. The whole monthly number, across model APIs and compute alike, with nothing installed: that’s the at-source method’s home ground.
Where CostCompass fits
CostCompass is a read-the-bill tool built for that last case: a solo developer who wants the whole AI bill in one place. It uses the same method as Vantage, StackSpend, and CostGoat. The mechanism is shared across all four; who it’s built for is what sets it apart.

You connect each provider once. After that, a click on Refresh reads that provider’s latest usage and prices it into money, so a dozen different meters (tokens, GPU-hours, characters, requests) land in one comparable figure, with a forecast that runs your trailing seven-day burn rate out across next month. The data updates when you click Refresh, not on a background timer: CostCompass stays quiet until you look, then brings every provider’s latest usage in at once. It doesn’t poll around the clock, and it doesn’t send alerts. A developing climb shows up the next time you refresh, while there’s still month left to act on it.
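One plausible reading of that forecast, sketched in Python: average the last seven days of spend into a daily burn rate, then multiply by the number of days in the coming month. This is an assumption about the arithmetic for illustration only, not CostCompass's actual formula, and the `forecast_next_month` function and the daily figures are hypothetical.

```python
import calendar
from datetime import date

def forecast_next_month(daily_spend: list[float], today: date) -> float:
    """Project next month's spend from the trailing seven-day average burn rate."""
    if not daily_spend:
        raise ValueError("need at least one day of spend to forecast")
    last_seven = daily_spend[-7:]
    burn_rate = sum(last_seven) / len(last_seven)  # USD per day
    # Roll over to January of the following year when today is in December.
    year = today.year + (1 if today.month == 12 else 0)
    month = today.month % 12 + 1
    days_next_month = calendar.monthrange(year, month)[1]
    return burn_rate * days_next_month

# Made-up daily spend in USD, oldest first; only the last seven days are used.
daily = [2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0]
print(f"Forecast for next month: ${forecast_next_month(daily, date(2026, 6, 15)):.2f}")
```

Using only the trailing week makes the forecast react quickly to a recent climb, at the cost of overreacting to one unusually busy week.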

Two things shape it for a single developer rather than a finance org. First, it spans model APIs and compute together across the providers you connect: Claude and OpenAI sit on the same panel as the GPU box and the hosting bill, the compute spend most model-only tools leave out. Second, your keys are encrypted in your browser before they’re ever stored. Each provider key is sealed with your vault password and saved only as ciphertext, so the App Server never holds anything it can decrypt. When usage is fetched, the key is decrypted in your browser and forwarded to the provider through a broker built not to log it. StackSpend reads the same kind of records server-side, and CostGoat keeps keys on a desktop machine. CostCompass uses a browser vault instead: you get the convenience of a hosted dashboard without the server ever holding a usable key.
It’s one of several tools that read the bill, and the claim here is a narrow one: for a solo developer who wants the month’s whole AI number on demand, with nothing rerouted and nothing in their code, it’s built for exactly that. For the wider set of approaches, including provider dashboards and a plain spreadsheet, see how to track AI costs across providers, and for what a good dashboard should show, what an AI API cost dashboard is.
Frequently asked questions
- What are the three ways AI cost tracking tools collect billing data?
- A tool gets your numbers in one of three ways. A proxy or gateway sits in your request path and meters spend as your calls pass through it. An SDK or library lives in your code and reports usage from inside your application. A billing-API tool reads each provider's own usage or billing records after the fact, outside your application entirely. These aren't ranked tiers; each one buys a different thing. But the method you choose fixes what the tool can ever see and what it costs you to run.
- Which method tracks AI cost without changing my code or my request path?
- Reading the bill at the source. A proxy means rerouting your calls through it. An SDK means adding its package to your application. Both put something between your code and the provider. The billing-API approach reads the records each provider already keeps, so nothing sits in your code or your latency budget. It's the only one of the three that tracks spend with your application running exactly as it did before.
- Why do proxies and SDKs only show cost for traffic that runs through them?
- Because that's all they can see. A gateway only meters the requests you route through it, and an SDK only reports the calls you wrapped. Anything you send straight to a provider is invisible to them, and so is any spend that isn't an API call at all — a rented GPU box, a hosting bill, a flat subscription. A billing-API tool starts from the provider's own records instead, so it can cover spend that never flowed through a single library.
- Do these tools show what I was actually charged, or an estimate?
- It depends on the method. Most proxies and SDKs reconstruct cost by counting tokens and multiplying by a price table, which is an estimate that can drift from the real bill. A reseller gateway you buy credits through is the exception — its figure is the actual charge, though only for traffic you bought through it. A billing-API tool reads the provider's own recorded usage and prices it at published rates, so the number is a reproducible estimate computed from the provider's records rather than from whatever traffic a library happened to observe.
- Can a billing-API tool cover cloud and compute costs, not just model tokens?
- Yes, for any provider that exposes a usage or billing API. That's the practical edge of reading the bill over instrumenting calls — a GPU box, object storage, a hosting bill, and a model API all keep usage records, so they can sit on one panel together. "In one view" means the providers you connect, not every provider that exists. Where a provider only shows billing in its own dashboard with no API to read, no tool of any method can pull it automatically.
- How current is an at-source tool's number, and does it warn me of a spike?
- It's as current as the last time you pulled it, at the provider's own reporting resolution. CostCompass updates when you click Refresh: it reads the latest usage from every connected provider on demand. It doesn't poll them in the background, and it doesn't send alerts, so a developing climb shows up the next time you look, with the month still open to act on it. Keep in mind that a month-to-date total is a running figure, not a settled invoice. Discounts that don't appear in raw usage can make the final bill a little lower.
- Why use CostCompass instead of a proxy, an SDK, or a cloud cost tool?
- Each of those is built for a different question. A proxy or SDK answers "what did this request cost," and earns its place in your code or request path when you're debugging or routing individual calls. A cloud cost platform answers "where is our infrastructure spend going" for a finance team. CostCompass answers "what is AI costing me this month across everything I run" for a solo developer. It reads each provider's own usage on demand and prices it into one live month-to-date total, with a forecast and a per-provider, per-model breakdown across model APIs and compute alike — nothing rerouted, nothing in your code, and your keys encrypted in your browser before they're ever stored.
About the author
Joubert Berger builds CostCompass, a spend-intelligence dashboard that pulls usage from AI and compute providers into one month-to-date total, a forecast, and a per-provider breakdown. This guide reflects how CostCompass reads each provider's own usage API — see the security model for how your keys are handled.
Read the bill across every provider you run
Connect each provider once and pull a single month-to-date total and forecast on demand — no traffic rerouted, no code changed, nothing in your request path.