LLM costs are usage-based, which means they can go wrong fast and often quietly. A prompt bug, a retry loop, or a successful product launch can push spend up in hours rather than weeks.
This guide is for teams choosing tools to manage that spend. The practical takeaway is simple: most teams need one tool for visibility, one tool for control if they route across models, and optionally one tool for observability if they need to understand which features or prompts are expensive.
Quick answer: which tools do most teams actually need?
If you want the shortest path:
- Start with provider dashboards so you have a direct source of truth.
- Add a unified cost view when you use more than one provider or want Slack/email alerts.
- Add a gateway/router only when you need fallbacks, budget routing, or model switching.
- Add observability when you need to know which prompt, feature, or customer is driving cost.
That stack is enough for most teams until LLM spend becomes a meaningful line item.
Which category solves which problem?
The important distinction: these tools do different jobs. A gateway is not a finance view. An observability tool is not an invoice view. A provider dashboard is not a unified monitoring layer.
1. Provider Dashboards (OpenAI, Anthropic, etc.)
What it is: Each provider exposes its own usage or billing view. OpenAI shows usage by model and project. Anthropic shows usage and spend. Cursor, Hugging Face, and others have similar consoles.
Strengths: Free, direct, and usually the first place to verify that usage is real.
Weaknesses: One dashboard per provider. No total picture if you use more than one vendor.
When to use it: Always. Even if you later add a unified platform, keep provider dashboards as the underlying reference.
2. LiteLLM
What it is: An open-source gateway that sits between your app and multiple model providers. It supports routing, fallbacks, budget controls, and unified API patterns.
Strengths: Flexible, self-hosted, good if you want to own routing logic and avoid depending on one broker.
Weaknesses: You operate it. Reliability, alerts, and reporting are your problem unless you add other tools.
When to use it: When you already know you need routing logic, provider fallback, or budget-aware traffic control.
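The routing logic a gateway like LiteLLM encapsulates can be sketched in plain Python. This is a conceptual sketch, not the LiteLLM API: `route_with_fallback`, the `providers` list, and `fake_call` are all hypothetical stand-ins for what the real gateway configures for you.

```python
# Minimal sketch of budget-aware fallback routing, the kind of logic a
# gateway encapsulates. All names here are illustrative, not LiteLLM's API.

def route_with_fallback(prompt, providers, call, budget_remaining):
    """Try providers in order; skip any whose per-call cost exceeds budget."""
    errors = []
    for name, cost_per_call in providers:
        if cost_per_call > budget_remaining:
            errors.append((name, "over budget"))
            continue
        try:
            return name, call(name, prompt)
        except Exception as exc:  # provider outage, rate limit, etc.
            errors.append((name, str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

# Example: the primary provider is down, traffic falls back to the second.
def fake_call(name, prompt):
    if name == "primary":
        raise ConnectionError("primary unavailable")
    return f"{name} answered: {prompt}"

used, answer = route_with_fallback(
    "Summarize this doc",
    providers=[("primary", 0.002), ("fallback", 0.001)],
    call=fake_call,
    budget_remaining=0.01,
)
print(used)  # fallback
```

The point of the sketch: once this logic lives in one place between your app and the providers, fallback and budget rules apply to all traffic without touching application code.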
3. OpenRouter
What it is: A managed unified API that gives access to many models behind one endpoint.
Strengths: Very easy to experiment across providers and move traffic between models without managing many keys.
Weaknesses: You are adding an intermediary. That is a good trade when speed matters, but a poor one for teams that want direct provider relationships and no broker layer.
When to use it: When speed of experimentation matters more than infrastructure control.
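OpenRouter exposes an OpenAI-compatible chat endpoint, so moving traffic between models is mostly a matter of changing the model string. A sketch of the request shape follows; nothing is sent here, the API key is a placeholder, and the model names are examples of OpenRouter's `provider/model` naming rather than a recommendation.

```python
import json

# Sketch of an OpenRouter-style chat request. Because the endpoint is
# OpenAI-compatible, switching models is a one-string change.
BASE_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model, prompt, api_key="sk-or-..."):  # placeholder key
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,  # e.g. "openai/gpt-4o-mini", "anthropic/claude-3.5-sonnet"
        "messages": [{"role": "user", "content": prompt}],
    }
    return BASE_URL, headers, json.dumps(body)

url, headers, body = build_request("openai/gpt-4o-mini", "Hello")
print(json.loads(body)["model"])  # openai/gpt-4o-mini
```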
4. Langfuse
What it is: LLM observability for traces, prompts, evaluations, and cost attribution.
Strengths: Helps answer the question "which feature, prompt, or user flow is expensive?" That is usually the missing link after a team has basic billing visibility.
Weaknesses: It does not replace provider billing or unified invoice monitoring.
When to use it: When product and engineering need feature-level insight rather than just provider totals.
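The question observability answers — which feature is expensive — reduces to grouping per-request cost records by a feature tag. The sketch below uses hand-written records; a tool like Langfuse derives equivalent data from traces automatically, so treat the record shape as hypothetical.

```python
from collections import defaultdict

# Sketch of feature-level cost attribution. The trace records are
# illustrative; an observability tool collects these from real traffic.
traces = [
    {"feature": "search",    "model": "gpt-4o",      "cost_usd": 0.004},
    {"feature": "summarize", "model": "gpt-4o",      "cost_usd": 0.012},
    {"feature": "search",    "model": "gpt-4o-mini", "cost_usd": 0.001},
    {"feature": "summarize", "model": "gpt-4o",      "cost_usd": 0.010},
]

cost_by_feature = defaultdict(float)
for t in traces:
    cost_by_feature[t["feature"]] += t["cost_usd"]

ranked = sorted(cost_by_feature.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])  # summarize
```

Provider dashboards cannot produce this view because they never see your feature tags; that is why attribution is a separate layer.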
5. Portkey
What it is: A managed AI gateway with routing, guardrails, caching, and budget-aware controls.
Strengths: Useful if you want a gateway but do not want to self-host it.
Weaknesses: Another platform in the request path. Good trade-off for some teams, unnecessary for others.
When to use it: When you want routing and guardrails with less operational overhead than self-hosting.
6. Cloudflare AI Gateway
What it is: An edge gateway focused on analytics, controls, caching, and traffic handling.
Strengths: Particularly useful when repeated prompts can be cached or when Cloudflare is already part of your stack.
Weaknesses: Better at request-path control than finance-style cost monitoring.
When to use it: When network placement, caching, and edge controls matter as much as provider flexibility.
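The cost win from edge caching comes from serving identical prompts without a second provider call. A minimal sketch of that mechanism, with a hypothetical `fake_model` standing in for the provider:

```python
import hashlib

# Sketch of gateway-style response caching: identical prompts are served
# from cache instead of triggering another paid provider call.
cache = {}
calls = {"count": 0}

def cached_completion(prompt, call_model):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in cache:
        calls["count"] += 1  # only cache misses cost money
        cache[key] = call_model(prompt)
    return cache[key]

def fake_model(prompt):
    return f"answer to: {prompt}"

cached_completion("What is our refund policy?", fake_model)
cached_completion("What is our refund policy?", fake_model)  # cache hit
print(calls["count"])  # 1 paid call for 2 requests
```

Real gateways add TTLs and cache-key normalization on top of this, but the billing effect is the same: repeated prompts stop costing money.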
7. Vercel AI SDK + Vercel AI Gateway
What it is: A developer-friendly interface for building AI features with optional gateway functionality on Vercel.
Strengths: Good developer experience, low friction for teams already building on Vercel.
Weaknesses: Best fit for Vercel-centric teams, less useful as a standalone cost-management choice.
When to use it: When Vercel is already your application platform and you want the simplest integrated path.
8. StackSpend
What it is: A unified monitoring layer for OpenAI, Anthropic, Cursor, Hugging Face, Grok, and cloud providers including AWS, GCP, and Azure.
Strengths: One place for total cloud and AI spend, daily Slack or email alerts, anomaly detection, and forecasting.
Weaknesses: It is a monitoring layer, not a request router. It helps you see and react; it does not control traffic.
When to use it: When you need daily visibility across providers without stitching together multiple dashboards or building custom reporting.
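The anomaly check a monitoring layer runs daily is conceptually simple: compare today's spend against a recent baseline. The threshold rule and spend figures below are illustrative, not how any particular product computes its alerts.

```python
import statistics

# Sketch of a daily spend-anomaly check: flag a day far above the
# recent baseline. The 3-sigma threshold is an illustrative choice.
def is_spend_anomaly(history_usd, today_usd, sigma=3.0):
    mean = statistics.mean(history_usd)
    stdev = statistics.pstdev(history_usd)
    # Floor the band at 1% of mean so flat histories still have a band.
    return today_usd > mean + sigma * max(stdev, 0.01 * mean)

history = [102.0, 98.0, 105.0, 99.0, 101.0, 103.0, 97.0]  # last 7 days
print(is_spend_anomaly(history, 104.0))  # False — normal variation
print(is_spend_anomaly(history, 260.0))  # True — worth an alert
```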
9. AWS Cost Explorer / GCP Billing / Azure Cost Management
What it is: Native cloud billing and analytics tools.
Strengths: Essential when your model usage sits inside Bedrock, Vertex AI, or Azure OpenAI, because those costs often appear as part of your wider cloud bill.
Weaknesses: They stop at the cloud boundary. They do not unify direct API spend from OpenAI, Anthropic, Cursor, or other providers.
When to use it: When cloud-hosted AI is a material part of your architecture.
10. Spreadsheets + Manual Export
What it is: CSV exports and manual tracking in Sheets or Excel.
Strengths: Cheap, flexible, and enough for very early-stage teams with one or two providers.
Weaknesses: Becomes fragile fast. Manual exports break exactly when the team gets busy or spend starts changing quickly.
When to use it: Only as a temporary phase, not a long-term answer.
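For context on why this phase breaks down, here is roughly what the manual workflow amounts to: merging per-provider CSV exports into one daily total. The column names and figures are illustrative; real export formats differ per provider, which is exactly where the fragility comes from.

```python
import csv
import io

# Sketch of the manual spreadsheet workflow: merge usage CSVs exported
# from two providers into one daily total. Formats are illustrative.
openai_csv = "date,cost_usd\n2024-06-01,12.40\n2024-06-02,15.10\n"
anthropic_csv = "date,cost_usd\n2024-06-01,8.00\n2024-06-02,9.25\n"

totals = {}
for raw in (openai_csv, anthropic_csv):
    for row in csv.DictReader(io.StringIO(raw)):
        totals[row["date"]] = totals.get(row["date"], 0.0) + float(row["cost_usd"])

print(round(totals["2024-06-01"], 2))  # 20.4
```

Every new provider adds another export format to reconcile by hand, which is why this stops scaling once spend starts moving.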
What setup should most teams choose?
Here is a concrete recommendation:
- 1 provider, low volume: provider dashboard is enough for now.
- 2+ providers or weekly surprises: add unified monitoring.
- Need fallbacks or model routing: add a gateway.
- Need feature-level attribution: add observability.
- Using Bedrock, Vertex AI, or Azure OpenAI: keep cloud billing in the stack too.
For many teams, the practical stack is:
- Provider dashboard
- Unified monitoring layer
- Optional gateway
- Optional observability
That order matters. Start with visibility before you add more moving parts.
What should a startup do first?
If you're a small team, do this in order:
- Turn on provider-level usage views.
- Identify your top two cost drivers by provider and model.
- Add daily alerting once LLM spend is big enough to matter.
- Add a router only after you have a clear routing need.
The common mistake is starting with a complex gateway before you have a reliable view of spend.
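Step 2 of the checklist above — identify your top two cost drivers by provider and model — is a simple ranking once you have usage rows in one place. The rows below are illustrative:

```python
from collections import Counter

# Sketch of finding the top two cost drivers by (provider, model).
# The spend figures are illustrative.
usage = [
    ("openai",    "gpt-4o",            310.0),
    ("openai",    "gpt-4o-mini",        45.0),
    ("anthropic", "claude-3-5-sonnet", 220.0),
    ("anthropic", "claude-3-haiku",     18.0),
]

spend = Counter()
for provider, model, cost_usd in usage:
    spend[(provider, model)] += cost_usd

top_two = spend.most_common(2)
print(top_two)
# [(('openai', 'gpt-4o'), 310.0), (('anthropic', 'claude-3-5-sonnet'), 220.0)]
```

Two drivers usually dominate; knowing which ones tells you where alerting, caching, or model switching will actually pay off.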
FAQ
Do I need a gateway for cost management?
No. A gateway helps with traffic control. For spend visibility, alerts, and reporting, you often want a monitoring layer first.
What is the cheapest way to track LLM costs?
Provider dashboards are the cheapest because they are already included. The trade-off is fragmentation once you use multiple providers.
Can I use StackSpend with a gateway?
Yes. That is a common setup. The gateway controls requests; StackSpend gives you a read-only billing and monitoring layer across providers.
How do I choose between LiteLLM and OpenRouter?
Choose LiteLLM if you want control and are comfortable operating infrastructure. Choose OpenRouter if you want speed and convenience more than operational ownership.
When do I need Langfuse or another observability tool?
When the question changes from "which provider is expensive?" to "which feature or prompt is expensive?"
What if I use Bedrock, Vertex AI, or Azure OpenAI?
Use the cloud billing tools as part of your stack, because those costs often show up inside wider cloud spend rather than in a separate vendor dashboard.
Can spreadsheets be enough?
Yes, for a short time. If you have one provider and low spend, they are fine. Once you add providers or need alerts, they stop being reliable.