Use this guide when you need a forecast you can defend in front of your team, your board, or yourself.
The fast answer: forecast AI costs with a base case, a growth case, and a stress case. Use cost per request, requests per user, and expected usage growth as the inputs, then compare your daily pace against that forecast as the month progresses.
What you will get in 10 minutes
- A simple AI cost forecasting model
- A way to handle uncertainty without pretending precision
- A daily review loop that tells you when the forecast is breaking
Why AI forecasts break
AI spend is different from traditional infrastructure forecasting because usage can change faster than the underlying stack.
Forecasts usually break when:
- prompt length changes
- response length changes
- a feature launch increases request volume
- a cheaper or more expensive model becomes the default
- background workflows grow silently
That means the model needs both a usage assumption and a cost-per-request assumption.
Start with the three inputs that matter most
At minimum, use:
- cost per request
- requests per user or workflow
- expected active users or job volume
Simple formula:
Monthly inference cost =
cost per request x requests per day x 30
If you have user-level data:
Monthly inference cost =
cost per request x requests per user x active users
If your product has several AI workflows, create one line per workflow instead of one blended average.
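The formulas above, with one line per workflow, can be sketched in a few lines of Python. The workflow names and numbers here are illustrative placeholders, not real product data:

```python
# Monthly inference cost: cost per request x requests per user per day
# x active users x days in month.
def monthly_inference_cost(cost_per_request, requests_per_user_per_day,
                           active_users, days=30):
    return cost_per_request * requests_per_user_per_day * active_users * days

# One line per workflow instead of one blended average (values assumed).
workflows = {
    # name: (cost per request, requests per user per day, active users)
    "chat_assistant": (0.005, 6, 5000),
    "background_summarization": (0.002, 2, 5000),
}

for name, (cpr, rpd, users) in workflows.items():
    print(f"{name}: ${monthly_inference_cost(cpr, rpd, users):,.2f}")
```

Keeping each workflow as its own line item makes it obvious which assumption to revisit when the forecast drifts.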
Build the base case first
Your base case should reflect what happens if the current product behavior continues without a major surprise.
Example:
| Metric | Value |
| --- | --- |
| Active users | 5,000 |
| Requests per user per day | 6 |
| Cost per request | $0.005 |
Base-case monthly inference forecast:
5,000 x 6 x 30 x $0.005 = $4,500
Then add the infrastructure around it:
- compute for serving and workers
- storage for logs and retrieval
- networking for transfer and API traffic
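The base case from the table, plus the surrounding infrastructure lines, can be tallied directly. The infrastructure dollar figures below are placeholders for illustration, not recommendations:

```python
# Base-case inference: users x requests/user/day x days x cost/request.
inference = 5_000 * 6 * 30 * 0.005  # = $4,500

# Infrastructure around it (assumed monthly figures).
infrastructure = {
    "compute": 1_200.0,    # serving and workers
    "storage": 300.0,      # logs and retrieval
    "networking": 150.0,   # transfer and API traffic
}

total = inference + sum(infrastructure.values())
print(f"inference: ${inference:,.2f}")
print(f"total:     ${total:,.2f}")
```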
Add a growth case
The growth case answers: what if usage expands exactly the way the team hopes it will?
Examples:
- more active users
- higher frequency of requests
- more workflows using the same model stack
Keep this simple. If you expect 25 percent more requests next month, model that directly. Do not hide the assumption.
Example:
Base requests per month: 900,000
Growth assumption: +25%
Growth requests per month: 1,125,000
At $0.005 per request = $5,625
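The growth arithmetic above is a one-liner, which is exactly the point: the assumption stays visible instead of hidden in a spreadsheet cell.

```python
# Growth case: apply the stated growth assumption to base volume.
base_requests = 900_000
growth = 0.25                  # +25% assumed growth
cost_per_request = 0.005

growth_requests = base_requests * (1 + growth)
growth_cost = growth_requests * cost_per_request
print(f"{growth_requests:,.0f} requests -> ${growth_cost:,.2f}")
```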
Add a stress case
This is the most useful part of the forecast.
The stress case tells you what happens if two bad things happen at once, for example:
- traffic grows faster than planned
- cost per request rises because the team changed model or prompt behavior
Example stress inputs:
- requests up 40 percent
- cost per request up from $0.005 to $0.0065
That is still not complicated, but it gives leadership something much more useful than false certainty.
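Using the base volume of 900,000 requests from the earlier example (an assumption carried over, since the stress inputs above do not restate it), the stress case works out like this:

```python
# Stress case: volume and unit cost both move against you at once.
base_requests = 900_000                  # from the base-case example
stress_requests = base_requests * 1.40   # +40% traffic
stress_cost_per_request = 0.0065         # up from $0.005

stress_cost = stress_requests * stress_cost_per_request
print(f"stress-case month: ${stress_cost:,.2f}")  # prints $8,190.00
```

That is roughly 82 percent above the $4,500 base case, which is the kind of gap leadership should see before the month starts, not after.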
Keep track of cost per feature, not just cost per org
If all AI usage is blended together, your forecast is hard to improve.
Break out major workflows such as:
- chat assistant
- background summarization
- retrieval and embedding jobs
- support automation
- coding or agent workflows
This gives you:
- cost per feature
- cost per user
- better prioritization if the forecast drifts
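A per-feature breakdown can be as simple as a dictionary normalized by active users. The feature split below is hypothetical, chosen only to show the shape of the output:

```python
# Blended spend split by workflow, then normalized to cost per user.
active_users = 5_000
monthly_cost_by_feature = {     # assumed monthly dollars per workflow
    "chat_assistant": 4_500.0,
    "background_summarization": 900.0,
    "retrieval_and_embeddings": 600.0,
}

for feature, cost in monthly_cost_by_feature.items():
    print(f"{feature}: ${cost:,.0f} total, ${cost / active_users:.3f}/user")
```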
Compare daily pace against forecast
Forecasting is not a once-a-month exercise.
The real value comes from checking:
- month-to-date spend
- month-end forecast
- variance vs budget
If you are halfway through the month and already above the base-case pace, the team should know immediately.
This is where daily tracking matters. It gives you time to correct:
- request volume
- model routing
- prompt behavior
- batch schedules
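The daily pace check itself is simple: project month-end spend from the run rate so far and compare it with the budget. The spend and budget figures below are assumed for illustration:

```python
# Project month-end spend from month-to-date pace (numbers assumed).
mtd_spend = 2_700.0     # spend through day 15
day_of_month = 15
days_in_month = 30
budget = 4_500.0        # base-case forecast

projected_month_end = mtd_spend / day_of_month * days_in_month
variance = projected_month_end - budget
print(f"projected month-end: ${projected_month_end:,.2f}")
print(f"variance vs budget: {variance:+,.2f}")  # +900.00 means over pace
```

In this sketch the team is pacing toward $5,400 against a $4,500 budget at mid-month, which is exactly the signal that should trigger a look at volume, routing, prompts, or batch schedules.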
A practical review model
Use this simple structure:
| Scenario | What it assumes | What you do with it |
| --- | --- | --- |
| Base | Current usage continues | Run the normal plan |
| Growth | Product usage grows as expected | Check whether budget still holds |
| Stress | Usage and cost per request both increase | Prepare corrective actions |
You do not need a complex forecasting platform to start. You need a model that the team will actually review.
Forecasting is better with cross-provider analysis
If you use more than one AI or cloud provider, forecasting from a single vendor dashboard is incomplete.
An AI workflow can move spend across:
- OpenAI or Anthropic inference
- AWS or GCP compute
- vector database storage
- networking or egress
That is why cross-provider analysis matters. It helps you understand whether the forecast problem is really inference, or whether adjacent infrastructure is growing faster.
How StackSpend helps
StackSpend makes this workflow easier by providing:
- cross-provider daily spend visibility
- category-based analysis across inference, compute, storage, and networking
- daily forecast vs budget tracking
- faster review of cost changes by service and provider
That turns forecasting into an operating loop, not just a spreadsheet exercise.
Final take
A useful AI cost forecast does not try to predict everything. It gives the team:
- a base case
- a growth case
- a stress case
- a daily pace check
That is enough to spot problems early and make better decisions while usage is still changing.