GPT-4o made high-quality output cheap enough to use everywhere. That is exactly why it can quietly become the largest line on your OpenAI bill: lower unit cost invites higher volume, and volume is where spend actually lives.
Tracking GPT-4o cost well means watching three things, not one.
1. Cost by model, not just total spend
A single OpenAI total tells you nothing about why it moved. The useful view is spend broken down by model — GPT-4o, GPT-4o-mini, GPT-4.1, o-series — so you can see when a workload silently shifts from a cheaper model to a more expensive one, or when a fallback becomes the default.
The most common GPT-4o cost surprise is a routing change: a feature that was supposed to use gpt-4o-mini starts hitting gpt-4o, and the per-request cost jumps roughly 10–20× without anything in code review flagging it.
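To make the size of a routing change concrete, here is a minimal sketch that prices the same request on both models and totals spend per model. The prices in PRICES_PER_MILLION are illustrative assumptions, and the record fields (model, input_tokens, output_tokens) are hypothetical names for whatever your usage logs contain.

```python
from collections import defaultdict

# Illustrative USD prices per 1M tokens. These are assumptions for the
# example; check the current OpenAI price list before relying on them.
PRICES_PER_MILLION = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request under the assumed prices."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def spend_by_model(usage_records):
    """Sum cost per model from records shaped like
    {"model": ..., "input_tokens": ..., "output_tokens": ...} (hypothetical fields)."""
    totals = defaultdict(float)
    for record in usage_records:
        totals[record["model"]] += request_cost(
            record["model"], record["input_tokens"], record["output_tokens"]
        )
    return dict(totals)


# The same request shape routed to two different models.
mini = request_cost("gpt-4o-mini", input_tokens=2000, output_tokens=500)
full = request_cost("gpt-4o", input_tokens=2000, output_tokens=500)
print(f"gpt-4o-mini: ${mini:.4f}  gpt-4o: ${full:.4f}  ratio: {full / mini:.1f}x")
```

Under these assumed prices the ratio comes out near 16.7×, inside the 10–20× range above. Running spend_by_model over each day's records gives the per-model breakdown that makes a silent shift visible.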
2. Tokens per request
GPT-4o pricing is per token, so the lever that moves cost fastest is average tokens per request, which is driven by prompt length, retrieved context, and output verbosity. A prompt change that adds a few hundred tokens of context to a high-traffic endpoint can raise the bill more than a model switch would.
Track input tokens and output tokens per request separately, and compare each against your baseline. A jump in tokens-per-request is usually a prompt or retrieval change, not a traffic change.
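The baseline comparison can be sketched in a few lines. This is one possible shape, assuming each logged request carries hypothetical input_tokens and output_tokens fields; the sample numbers are made up for illustration.

```python
from statistics import mean


def tokens_per_request(requests):
    """Average input, output, and total tokens per request."""
    avg_input = mean(r["input_tokens"] for r in requests)
    avg_output = mean(r["output_tokens"] for r in requests)
    return {
        "avg_input": avg_input,
        "avg_output": avg_output,
        "avg_total": avg_input + avg_output,
    }


def change_vs_baseline(current, baseline):
    """Ratio of each current average to its baseline (1.0 means unchanged)."""
    return {key: current[key] / baseline[key] for key in current}


# Made-up sample data: a retrieval change adds context to every prompt.
baseline_requests = [
    {"input_tokens": 1000, "output_tokens": 200},
    {"input_tokens": 1200, "output_tokens": 300},
]
current_requests = [
    {"input_tokens": 1500, "output_tokens": 250},
    {"input_tokens": 1800, "output_tokens": 250},
]

change = change_vs_baseline(
    tokens_per_request(current_requests),
    tokens_per_request(baseline_requests),
)
print(change)
```

In this sample the input ratio rises to 1.5 while the output ratio stays at 1.0, which points at prompt or retrieved context rather than output verbosity.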
3. Attribution to project and feature
Total GPT-4o spend is only actionable if you can tie it to the project, feature, or customer driving it. That is what turns "the bill went up" into "the new summarization feature is responsible."
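Attribution depends on tagging each request at the point where it is made. The sketch below assumes every usage record already carries hypothetical project and feature labels plus a computed cost_usd; how those tags get attached is up to your own logging.

```python
from collections import defaultdict


def spend_by_dimension(records, dimension):
    """Total cost per value of `dimension`, largest first.

    Records missing the tag are grouped under "untagged" so that
    unattributed spend stays visible instead of disappearing.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record.get(dimension, "untagged")] += record["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up records for illustration.
records = [
    {"project": "support-bot", "feature": "summarization", "cost_usd": 4.0},
    {"project": "support-bot", "feature": "classification", "cost_usd": 0.5},
    {"project": "search", "feature": "summarization", "cost_usd": 2.5},
    {"project": "search", "cost_usd": 1.0},  # no feature tag
]

for feature, cost in spend_by_dimension(records, "feature"):
    print(f"{feature}: ${cost:.2f}")
```

Grouping the same records by "project" or by a customer tag answers the other two questions without changing the function.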
Make it a daily signal
Native usage dashboards update slowly and show one number. StackSpend's OpenAI cost monitoring breaks spend down by model and project, ties usage to cost, and fires an anomaly alert the day GPT-4o spend or tokens-per-request shifts — so a model-routing change is a same-day notification, not an end-of-month invoice surprise.
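For a sense of what a daily check involves, here is a deliberately simple sketch that compares today's value with the average of the preceding days. It is an illustration only, not a description of how StackSpend detects anomalies; the function name and the 30% threshold are assumptions.

```python
from statistics import mean


def is_anomalous(history, today, threshold=0.3):
    """Flag `today` if it deviates from the trailing average by more than
    `threshold` (0.3 means 30%). `history` holds prior daily values, such as
    daily GPT-4o spend or average tokens per request."""
    if not history:
        return False  # nothing to compare against yet
    baseline = mean(history)
    if baseline == 0:
        return today > 0
    return abs(today - baseline) / baseline > threshold


daily_spend = [100.0, 110.0, 90.0, 100.0]  # made-up trailing days, USD
print(is_anomalous(daily_spend, today=160.0))
print(is_anomalous(daily_spend, today=105.0))
```

A fixed percentage threshold ignores weekly seasonality and growth trends, so treat it as a starting point rather than a finished detector.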
If your bill has already jumped, start with "Why is my OpenAI bill so high?"