If you are trying to track LLM API spend, the hard part is usually not collecting a bill total. The hard part is understanding which provider, model, feature, team, or customer caused that total to move. This guide is for developers, product managers, and engineering leaders who need a usable operating model, not just another pricing page.
The short version: track spend daily, normalize it across providers, and make sure every request can be tied back to a business dimension you care about. If you cannot explain yesterday's spend by provider, model, feature, and customer, you are not really tracking LLM API spend yet. If you also need a unified view after rollout, see AI cost monitoring and cloud + AI cost monitoring.
Quick answer: how do you track LLM API spend properly?
A good LLM API spend tracking setup does five things:
- Pulls usage or cost data from the provider source of truth every day.
- Normalizes all providers into the same dimensions: provider, model, project, feature, environment, team, and customer.
- Separates volume changes from price changes, so you can tell whether more requests or a more expensive model caused the increase.
- Alerts on daily pace and anomalies instead of waiting for month-end invoices.
- Keeps cloud-routed model usage visible too, because Bedrock, Vertex, and Azure OpenAI often disappear into broader cloud billing unless you tag them carefully.
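The normalization step above is the one teams most often skip. A minimal sketch of a shared schema, in Python, might look like this; the field names and the input row keys are illustrative assumptions about your own export format, not any provider's API:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical normalized record; field names are illustrative, not a standard.
@dataclass(frozen=True)
class SpendRecord:
    day: date
    provider: str      # "openai", "anthropic", "bedrock", ...
    model: str
    project: str
    feature: str
    environment: str   # "prod", "staging", ...
    team: str
    customer: str
    input_tokens: int
    output_tokens: int
    requests: int
    cost_usd: float

def normalize_openai_row(row: dict) -> SpendRecord:
    """Map one provider-specific usage row into the shared schema.
    The input keys here are assumptions about your export format."""
    return SpendRecord(
        day=date.fromisoformat(row["date"]),
        provider="openai",
        model=row["model"],
        project=row.get("project", "unknown"),
        feature=row.get("feature", "unknown"),
        environment=row.get("env", "prod"),
        team=row.get("team", "unknown"),
        customer=row.get("customer", "unknown"),
        input_tokens=row["input_tokens"],
        output_tokens=row["output_tokens"],
        requests=row["requests"],
        cost_usd=row["cost_usd"],
    )
```

One `normalize_*_row` function per provider keeps the messy mapping logic at the edge, so every dashboard and alert downstream reads the same record shape.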
That means OpenAI and Anthropic are only half the picture. Many teams also need to track AWS Bedrock, Vertex AI, or Azure OpenAI inside the same reporting layer. If you are specifically building an ownership and dashboard layer around that data, see AI cost observability.
Why LLM API spend is harder to track than it looks
LLM API spend becomes messy fast because providers bill differently and products use models in different ways.
- OpenAI and Anthropic give you direct API usage and cost data, but only if you structure projects, API keys, and filters clearly.
- Bedrock, Vertex, and Azure OpenAI often land inside cloud billing, where model spend is mixed with storage, networking, and other infrastructure.
- One feature might call multiple models in one user flow: embeddings, reranking, chat generation, and fallback routing.
- Product teams talk about features and customers, while finance sees invoices and engineering sees requests.
That is why a raw provider dashboard is not enough. It tells you what the provider billed. It usually does not tell you what changed in your product.
What data should you capture on every LLM request?
At minimum, each request needs enough metadata to answer one question: what was this for?
If you want finance and product teams to use the same numbers, do not stop at provider and model. Add ownership dimensions early. How to attribute AI costs by feature, team, and customer goes deeper on that part.
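As a sketch of what "attach ownership dimensions early" means in practice, here is a minimal structured-logging helper. The field names, and the idea of passing a `sink` callable, are assumptions for illustration; in a real system this would feed your log pipeline or warehouse:

```python
import json
import time
import uuid

def log_llm_request(provider, model, feature, customer, environment,
                    input_tokens, output_tokens, latency_ms, sink):
    """Emit one structured log line per LLM call so every request can
    answer 'what was this for?'. Field names are illustrative."""
    entry = {
        "request_id": str(uuid.uuid4()),
        "ts": time.time(),
        "provider": provider,
        "model": model,
        "feature": feature,        # stable feature key, not a free-form string
        "customer": customer,      # tenant or customer identifier
        "environment": environment,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
    }
    sink(json.dumps(entry))
    return entry
```

The important property is that `feature` and `customer` are stable keys, not free-form strings, so they can later be joined against provider-side daily cost.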
How do you track direct-provider LLM spend from OpenAI and Anthropic?
For direct APIs, the cleanest pattern is to use the provider's own usage and cost endpoints as your source of truth, then enrich that data with your internal metadata.
OpenAI
As of March 2026, OpenAI provides both organization usage and cost reporting endpoints, plus dashboard exports for CSV workflows. That makes OpenAI one of the easier providers to track programmatically, provided you use project structure and API keys consistently.
Practical recommendation:
- assign API keys or project boundaries to major product surfaces,
- capture internal feature and customer metadata in your app logs,
- and reconcile provider-side daily cost with your internal request-level view.
If you use Batch or cached prompts, watch those separately. They change unit economics materially even when request counts stay flat. See AI API pricing in 2026 if you need the pricing-side context.
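To see why cached prompts and Batch change unit economics even at flat request counts, it helps to model the cost of one request explicitly. The prices and discount below are placeholders, not real rates; always pull current numbers from the provider's pricing page:

```python
# Hypothetical per-million-token prices; real prices come from the
# provider's pricing page and change over time.
PRICES = {
    "input": 2.50,
    "cached_input": 1.25,   # cached prompt tokens are often billed at a discount
    "output": 10.00,
}
BATCH_DISCOUNT = 0.5        # batch APIs commonly bill below the synchronous price

def request_cost_usd(input_tokens, cached_tokens, output_tokens, batch=False):
    """Cost of one request, separating cached prompt tokens so a shifting
    cache-hit rate shows up in unit economics, not just in totals."""
    fresh = max(input_tokens - cached_tokens, 0)
    cost = (fresh * PRICES["input"]
            + cached_tokens * PRICES["cached_input"]
            + output_tokens * PRICES["output"]) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost
```

Tracking `cached_tokens` as its own column is the point: a cache-hit-rate regression can double effective input cost while request volume looks unchanged.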
Anthropic
Anthropic's Usage and Cost Admin API is strong for organizations that need workspace, model, and service-tier views. The catch is that you need an Admin API key, and that is an organizational workflow decision, not just an engineering one.
Practical recommendation:
- use the Admin API for finance-grade reporting,
- keep model-level trend views separate from feature attribution,
- and watch long-context usage explicitly because token volume can jump without a visible traffic increase.
For both providers, the key mistake is treating raw provider data as your final dashboard. Provider data should be the billing source of truth, but your internal metadata should explain why the bill moved.
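A simple daily reconciliation check makes that split concrete: the provider total is the truth, and the internal request-level sum should explain it within a tolerance. This is a sketch with an assumed record shape and an arbitrary default tolerance:

```python
def reconcile_daily(provider_cost, internal_records, tolerance_pct=2.0):
    """Compare the provider-reported daily total (billing source of truth)
    with the sum of internally attributed request costs. A gap beyond the
    tolerance usually means missing metadata, not a provider billing error."""
    internal_total = sum(r["cost_usd"] for r in internal_records)
    if provider_cost == 0:
        gap_pct = 0.0 if internal_total == 0 else 100.0
    else:
        gap_pct = abs(provider_cost - internal_total) / provider_cost * 100
    return {
        "provider_cost": provider_cost,
        "internal_total": round(internal_total, 4),
        "gap_pct": round(gap_pct, 2),
        "reconciled": gap_pct <= tolerance_pct,
    }
```

Run this per provider per day; a growing gap is an early warning that some code path is making LLM calls without attaching metadata.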
How do you track Bedrock, Vertex AI, and Azure OpenAI spend?
Cloud-routed LLM spend is where many teams lose visibility.
AWS Bedrock
Bedrock pricing and billing live inside AWS. If you only look at total AWS spend, Bedrock often gets buried inside a broader cloud bill. The practical fix is to use application inference profiles, cost allocation tags, and a reporting layer that can isolate generative AI usage from the rest of AWS.
If you run multiple teams, products, or tenants on Bedrock, this is worth doing early. Otherwise, you end up arguing over one undifferentiated AWS line item.
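As an illustration of isolating Bedrock from the broader bill, here is the shape of an AWS Cost Explorer `GetCostAndUsage` request that filters by service and groups by a cost allocation tag. Actually sending it requires boto3's `ce` client and Cost Explorer permissions, and the tag key here is an assumption about your tagging scheme; verify the exact service name your account reports:

```python
def bedrock_cost_query(start, end, tag_key="team"):
    """Parameter dict for Cost Explorer's GetCostAndUsage, isolating
    Bedrock spend by service and splitting it by a cost allocation tag.
    The tag key and service name are assumptions to verify in your account."""
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": {"Dimensions": {"Key": "SERVICE",
                                  "Values": ["Amazon Bedrock"]}},
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }
```

The grouping by tag is what turns "one AWS line item" into per-team or per-tenant spend, which is why tagging discipline has to come first.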
Google Vertex AI
Vertex AI usage usually becomes operationally useful only after you enable the Cloud Billing export to BigQuery. Without that export you can inspect the console, but building recurring reporting and cross-team views is much harder. That is the same pattern described in GCP billing export pitfalls that break cost visibility, just applied to model workloads.
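Once the export exists, a daily Vertex view is a query against the standard billing export schema (columns like `service.description`, `usage_start_time`, `cost`). A sketch, with the table name as a placeholder and the service-description filter as an assumption to check against your own export:

```python
def vertex_daily_cost_sql(table):
    """SQL for the Cloud Billing BigQuery export, summing Vertex AI cost
    per day and SKU. The table name is a placeholder; confirm the exact
    service description string your export uses for Vertex AI."""
    return f"""
    SELECT DATE(usage_start_time) AS day,
           sku.description AS sku,
           SUM(cost) AS cost_usd
    FROM `{table}`
    WHERE service.description = 'Vertex AI'
    GROUP BY day, sku
    ORDER BY day, cost_usd DESC
    """
```

Breaking the result out by SKU matters because one "Vertex AI" total hides the split between, say, embedding calls and large-model generation.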
Azure OpenAI
Azure OpenAI tracking usually depends on Azure cost management views plus disciplined subscription, resource group, and tagging structure. If different environments or teams share the same Azure footprint without good labeling, spend reviews get fuzzy quickly.
This is the big cross-provider rule: direct APIs expose model spend directly, but cloud-routed LLMs often need billing exports, tags, or cost allocation structure before the numbers become actionable.
What should an LLM spend dashboard show every day?
A useful daily view is not just "total spend so far this month." It should answer the questions operators actually ask when something moves.
If your dashboard cannot explain a change within a few clicks, it is a reporting artifact, not an operating tool.
What alerts actually help before the invoice arrives?
The first alert should usually not be "monthly budget exceeded." That arrives too late.
Better default alerts:
- daily spend above expected range,
- sudden increase in one provider or one model,
- feature-level cost spike,
- prompt size jump,
- retry or failure surge,
- or forecasted month-end overspend.
This is the operational sequence that works best:
- Alert on daily anomaly.
- Check provider, model, and feature deltas.
- Confirm whether the change is still active.
- Fix the route, prompt, retry loop, or usage cap.
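The "alert on daily anomaly" step can start very simple. A z-score over a trailing window is enough to catch the obvious cases; the threshold and minimum history here are starting points to tune, not recommendations:

```python
from statistics import mean, stdev

def daily_spend_alert(history, today, z_threshold=3.0, min_days=7):
    """Flag today's spend if it sits far outside the recent daily range.
    `history` is a list of prior daily totals. Returns True/False, or
    None when there is not yet enough history to judge."""
    if len(history) < min_days:
        return None  # not enough history to call anything an anomaly
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # perfectly flat history: any change is notable
    return abs(today - mu) / sigma > z_threshold
```

This deliberately ignores weekday/weekend seasonality; if your traffic has a strong weekly shape, compare against the same weekday's history instead of a flat trailing window.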
If you need concrete thresholds, use How to set AI and cloud alert thresholds. If an alert already fired, use How to investigate an AI spend spike.
When is a spreadsheet enough, and when do you need a monitoring layer?
For very small teams, a spreadsheet can work for a while, but only under narrow conditions.
The rule of thumb is simple: if your spend review depends on exporting CSVs from more than one place, you are already past spreadsheet scale.
A practical implementation pattern that works
If you need one concrete recommendation, use this:
- Start with provider daily cost as the billing source of truth.
- Add request-level metadata for feature, environment, team, and customer.
- Normalize all providers into one schema.
- Track input tokens, output tokens, and request counts separately.
- Review daily anomalies and weekly trends.
- Forecast month-end spend from current pace.
- Investigate every spike by provider, model, and feature before trying broad cost optimization.
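The forecasting step in the pattern above does not need to be sophisticated to be useful. A naive run-rate extrapolation catches most overspend early:

```python
def forecast_month_end(spend_to_date, day_of_month, days_in_month):
    """Naive run-rate forecast: current daily pace extrapolated to month
    end. Good enough for a first overspend alert; replace with something
    seasonality-aware if weekday and weekend usage differ a lot."""
    if day_of_month <= 0:
        raise ValueError("day_of_month must be >= 1")
    daily_pace = spend_to_date / day_of_month
    return round(daily_pace * days_in_month, 2)
```

Compare the forecast against budget daily; the gap between "on pace" and "projected to breach" is what turns month-end surprises into mid-month decisions.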
This is also why the "gateway vs direct API" decision matters. A gateway can improve consistency if it becomes the place where metadata, routing, and observability are enforced. If that decision is still open, read Direct provider API vs AI gateway: which should you use?
FAQ
What is the difference between LLM usage tracking and LLM spend tracking?
Usage tracking tells you requests, tokens, and model activity. Spend tracking tells you what those requests cost. You need both. Usage explains behavior. Spend explains impact.
Can I track LLM API spend by feature?
Yes, but only if your app attaches a stable feature key to each request or to the API key, project, or routing context that generated it. Otherwise you can see provider totals, but not product-level drivers.
How do I track LLM spend when it runs through AWS Bedrock or Vertex AI?
Treat cloud billing as part of the source of truth. For Bedrock, use inference profiles and cost allocation tags where possible. For Vertex AI, enable Cloud Billing export to BigQuery. The key is to separate model costs from the rest of cloud spend.
Should I trust provider dashboards or my own internal telemetry?
Use provider data as the billing source of truth and internal telemetry as the explanation layer. If they disagree, reconcile to provider cost first, then debug your metadata pipeline.
How often should I refresh LLM spend data?
Daily is the minimum useful cadence for most production teams. High-volume or high-risk systems may justify more frequent refreshes, but daily is the level where you can still catch problems before month-end.
What is the first alert I should set?
Set an alert for abnormal daily spend or abnormal provider/model movement, not just a month-end budget breach. Daily pace alerts give you time to respond.
Is one provider dashboard enough if I mostly use OpenAI?
Only if OpenAI is truly the only source of model spend and you do not need feature, customer, or forecast views. As soon as Anthropic, Cursor, Bedrock, Vertex, or Azure OpenAI enter the picture, you need a cross-provider view.
Practical takeaway
Tracking LLM API spend well means building one shared view across providers, models, features, and owners. The billing source of truth should come from the providers. The explanation layer should come from your own product metadata. Once you combine those, alerts and forecasts become much more useful than end-of-month invoice review.
If that is the system you need, StackSpend gives you one place to track AI cost monitoring, cloud + AI cost monitoring, and related provider setup paths like OpenAI, Anthropic, AWS, GCP, and Azure.
References
- How to Attribute AI Costs by Feature, Team, and Customer
- How to Set AI and Cloud Alert Thresholds
- How to Investigate an AI Spend Spike
- Direct Provider API vs AI Gateway: Which Should You Use?
- AI API Pricing in 2026
- OpenAI Pricing
- OpenAI Usage and Costs API Reference
- OpenAI Usage Dashboard Export Help
- Anthropic Usage and Cost API
- Anthropic Get Cost Report API
- Amazon Bedrock Pricing
- Create an Application Inference Profile in Amazon Bedrock
- Amazon Bedrock Cost Allocation Tags Announcement
- Vertex AI Generative AI Pricing
- Export Cloud Billing Data to BigQuery
- Azure OpenAI Pricing
- Manage Costs for Azure OpenAI