Token-based billing is what makes LLM spend hard to predict. The same feature can cost 3× more this week than last with no traffic change — because prompts got longer, output got more verbose, or a retry loop doubled the calls.
Tracking OpenAI token cost is about connecting four numbers to dollars.
The four numbers that matter
- Input tokens — driven by prompt length and retrieved context. The fastest-growing and most-ignored cost lever.
- Output tokens — driven by response length. Often priced higher than input, so verbose responses cost more than they look.
- Requests — the multiplier. Retries, streaming reconnects, background agents, and batch jobs quietly raise request count.
- Model — the per-token rate. The same token volume on gpt-4o vs gpt-4o-mini is a large cost difference.
Total cost is roughly: requests × (input tokens per request × input rate + output tokens per request × output rate), with both rates set by the model. A change in any one of those moves the bill.
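To make the formula concrete, here is a minimal Python sketch. The per-token rates are made-up placeholders, not real OpenAI prices, and the names `ModelRate` and `estimate_cost` are hypothetical helpers introduced only for illustration. It shows how a longer prompt, a retry loop, or a model switch each changes the total while traffic stays the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Dollar price per token. Values used below are placeholders, not real prices."""
    input_per_token: float
    output_per_token: float


def estimate_cost(requests: int, avg_input_tokens: float,
                  avg_output_tokens: float, rate: ModelRate) -> float:
    """requests x (input tokens x input rate + output tokens x output rate)."""
    per_request = (avg_input_tokens * rate.input_per_token
                   + avg_output_tokens * rate.output_per_token)
    return requests * per_request


# Hypothetical rates, written as dollars per 1M tokens for readability.
LARGE_MODEL = ModelRate(2.00 / 1_000_000, 8.00 / 1_000_000)
SMALL_MODEL = ModelRate(0.20 / 1_000_000, 0.80 / 1_000_000)

# Same feature, same user traffic (10,000 calls), four scenarios.
baseline = estimate_cost(10_000, 1_000, 200, LARGE_MODEL)
longer_prompt = estimate_cost(10_000, 3_000, 200, LARGE_MODEL)   # more retrieved context
with_retries = estimate_cost(20_000, 1_000, 200, LARGE_MODEL)    # a retry loop doubles calls
smaller_model = estimate_cost(10_000, 1_000, 200, SMALL_MODEL)   # cheaper per-token rate

for name, cost in [("baseline", baseline), ("longer prompt", longer_prompt),
                   ("with retries", with_retries), ("smaller model", smaller_model)]:
    print(f"{name:>14}: ${cost:,.2f}")
```

Swap in the current published rates for the models you actually use before drawing conclusions from the numbers.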
What to watch
Track tokens per request against a baseline. A jump there is almost always a prompt or context change, not a traffic change — and it is the single most useful early-warning signal for OpenAI spend. Then break spend down by model and by project so you can attribute the change.
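One way to sketch that check in Python is shown below. It assumes you already collect daily token and request totals plus per-call cost records from your own usage logs. The 1.25 threshold, the record fields (`model`, `project`, `cost`), and the function names are illustrative assumptions, not part of any particular tool.

```python
from collections import defaultdict
from statistics import mean


def tokens_per_request(total_tokens: int, requests: int) -> float:
    """Average tokens per request for one day; 0.0 if there was no traffic."""
    return total_tokens / requests if requests else 0.0


def ratio_jumped(history: list[float], today: float, threshold: float = 1.25) -> bool:
    """True when today's tokens/request is at least `threshold` times the baseline mean."""
    if not history:
        return False
    baseline = mean(history)
    return baseline > 0 and today / baseline >= threshold


def spend_by(records: list[dict], key: str) -> dict[str, float]:
    """Sum the 'cost' field of each record, grouped by `key` (e.g. 'model' or 'project')."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost"]
    return dict(totals)


# Baseline: three ordinary days of (total tokens, requests).
history = [tokens_per_request(t, r) for t, r in
           [(1_200_000, 1_000), (1_180_000, 990), (1_230_000, 1_010)]]

# Flat traffic, but prompts got longer: the ratio jumps.
print(ratio_jumped(history, tokens_per_request(1_900_000, 1_000)))

# Traffic doubled, tokens per request unchanged: the ratio does not jump.
print(ratio_jumped(history, tokens_per_request(2_400_000, 2_000)))

# Hypothetical cost records, used to attribute spend by model and by project.
records = [
    {"model": "gpt-4o", "project": "search", "cost": 10.0},
    {"model": "gpt-4o-mini", "project": "search", "cost": 2.5},
    {"model": "gpt-4o", "project": "support", "cost": 5.0},
]
print(spend_by(records, "model"))
print(spend_by(records, "project"))
```

A simple mean is used as the baseline here for brevity. A rolling median or a same-weekday comparison may be more robust if your traffic is spiky.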
From tracking to alerting
StackSpend's OpenAI cost monitoring ties token volume to cost by model and project, and anomaly detection fires when the token/request ratio or model mix shifts — the day it happens, not at invoice time. For SaaS teams, the same data powers cost per LLM request and AI COGS.
Already over budget? See "Why is my OpenAI bill so high?"