March 11, 2026
By Andrew Day

How to Forecast AI Costs in Production

Forecast AI costs in production using cost per request, growth assumptions, stress cases, and daily variance tracking instead of guesswork.


Use this when you need a forecast you can defend in front of your team, your board, or yourself.

The fast answer: forecast AI costs with a base case, a growth case, and a stress case. Use cost per request, requests per user, and expected usage growth as the inputs, then compare your daily pace against that forecast as the month progresses.

What you will get in 10 minutes

  • A simple AI cost forecasting model
  • A way to handle uncertainty without pretending precision
  • A daily review loop that tells you when the forecast is breaking

Why AI forecasts break

Forecasting AI spend is different from forecasting traditional infrastructure because usage can change faster than the underlying stack.

Forecasts usually break when:

  • prompt length changes
  • response length changes
  • a feature launch increases request volume
  • a cheaper or more expensive model becomes the default
  • background workflows grow silently

That means the forecast needs both a usage assumption and a cost-per-request assumption.
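To see why prompt and response length move the cost-per-request assumption, here is a minimal sketch. It assumes your provider bills separately per input and output token, which is common but is not stated in this guide. The prices and token counts are hypothetical placeholders, not real rates.

```python
# Hypothetical per-token prices; substitute your provider's actual rates.
INPUT_PRICE_PER_1K = 0.0005   # USD per 1,000 input tokens (placeholder)
OUTPUT_PRICE_PER_1K = 0.0015  # USD per 1,000 output tokens (placeholder)


def cost_per_request(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = (input_tokens / 1000) * INPUT_PRICE_PER_1K
    output_cost = (output_tokens / 1000) * OUTPUT_PRICE_PER_1K
    return input_cost + output_cost


# Same response length, but the prompt grew from 2,000 to 3,000 tokens.
before = cost_per_request(input_tokens=2000, output_tokens=500)
after = cost_per_request(input_tokens=3000, output_tokens=500)

print(f"before: ${before:.5f}")
print(f"after:  ${after:.5f}")
print(f"change: {after / before - 1:+.0%}")
```

A prompt change like this raises cost per request even when request volume is flat, which is why the two assumptions are tracked separately.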

Start with the three inputs that matter most

At minimum, use:

  • cost per request
  • requests per user or workflow
  • expected active users or job volume

Simple formula:

Monthly inference cost =
cost per request x requests per day x 30

If you have user-level data:

Monthly inference cost =
cost per request x requests per user per day x active users x 30

If your product has several AI workflows, create one line per workflow instead of one blended average.
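The formulas above, with one line per workflow, can be sketched as follows. The `Workflow` class, the workflow names, and the volumes are illustrative assumptions. The 30-day month matches the simplification used in the formula.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # same simplification as the formula above


@dataclass
class Workflow:
    name: str
    cost_per_request: float  # USD
    requests_per_day: float

    def monthly_cost(self) -> float:
        """Monthly inference cost = cost per request x requests per day x 30."""
        return self.cost_per_request * self.requests_per_day * DAYS_PER_MONTH


# Placeholder workflows; replace with your own measured values.
workflows = [
    Workflow("chat assistant", cost_per_request=0.005, requests_per_day=30_000),
    Workflow("background summarization", cost_per_request=0.002, requests_per_day=10_000),
]

for workflow in workflows:
    print(f"{workflow.name}: ${workflow.monthly_cost():,.2f}")

total = sum(workflow.monthly_cost() for workflow in workflows)
print(f"total: ${total:,.2f}")
```

Keeping each workflow as its own line means a change in one workflow's cost per request does not get averaged away by the others.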

Build the base case first

Your base case should reflect what happens if the current product behavior continues without a major surprise.

Example:

| Metric | Value |
| --- | --- |
| Active users | 5,000 |
| Requests per user per day | 6 |
| Cost per request | $0.005 |

Base-case monthly inference forecast:

5,000 x 6 x 30 x $0.005 = $4,500

Then add the infrastructure around it:

  • compute for serving and workers
  • storage for logs and retrieval
  • networking for transfer and API traffic

Add a growth case

The growth case answers: what if usage expands exactly the way the team hopes it will?

Examples:

  • more active users
  • higher frequency of requests
  • more workflows using the same model stack

Keep this simple. If you expect 25 percent more requests next month, model that directly. Do not hide the assumption.

Example:

Base requests per month: 900,000
Growth assumption: +25%
Growth requests per month: 1,125,000
At $0.005 per request = $5,625
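One way to keep the growth assumption visible is to make it a named argument rather than a number buried in a cell. This is a small sketch using the figures from the example. The function name `growth_case` is an illustrative choice.

```python
def growth_case(base_requests: float, growth_rate: float,
                cost_per_request: float) -> tuple[float, float]:
    """Return (requests, cost) after applying an explicit growth assumption."""
    requests = base_requests * (1 + growth_rate)
    return requests, requests * cost_per_request


# Base case: active users x requests per user per day x days in month.
base_requests = 5_000 * 6 * 30

requests, cost = growth_case(base_requests, growth_rate=0.25, cost_per_request=0.005)
print(f"{requests:,.0f} requests -> ${cost:,.2f}")
```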

Add a stress case

This is the most useful part of the forecast.

The stress case tells you what happens if two bad things happen at once, for example:

  • traffic grows faster than planned
  • cost per request rises because the team changed model or prompt behavior

Example stress inputs:

  • requests up 40 percent
  • cost per request up from $0.005 to $0.0065

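Stress-case monthly inference forecast:

900,000 x 1.40 x $0.0065 = $8,190

That is 82 percent above the $4,500 base case. The model is still not complicated, but it gives leadership something much more useful than false certainty.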
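The two stress inputs multiply rather than add, which is easy to underestimate. The sketch below splits the stress case into a volume multiplier and a price multiplier, using the base-case figures from earlier in this guide. The function and key names are illustrative.

```python
BASE_REQUESTS = 900_000        # from the base case: 5,000 x 6 x 30
BASE_COST_PER_REQUEST = 0.005  # USD


def stress_case(request_growth: float, stressed_cost_per_request: float) -> dict:
    """Split a stress scenario into its volume and price components."""
    volume_multiplier = 1 + request_growth
    price_multiplier = stressed_cost_per_request / BASE_COST_PER_REQUEST
    return {
        "volume_multiplier": volume_multiplier,
        "price_multiplier": price_multiplier,
        "combined_multiplier": volume_multiplier * price_multiplier,
        "base_cost": BASE_REQUESTS * BASE_COST_PER_REQUEST,
        "stress_cost": BASE_REQUESTS * volume_multiplier * stressed_cost_per_request,
    }


result = stress_case(request_growth=0.40, stressed_cost_per_request=0.0065)
for key, value in result.items():
    print(f"{key}: {value:,.2f}")
```

A 40 percent volume increase combined with a 30 percent price increase gives a combined multiplier of 1.4 x 1.3 = 1.82, not 1.7.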

Keep track of cost per feature, not just cost per org

If all AI usage is blended into one number, you cannot tell which workflow moved the forecast, so the forecast is hard to improve.

Break out major workflows such as:

  • chat assistant
  • background summarization
  • retrieval and embedding jobs
  • support automation
  • coding or agent workflows

This gives you:

  • cost per feature
  • cost per user
  • better prioritization if the forecast drifts
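The breakdown above only works if each request is tagged with a feature and a user when it is logged. Here is a minimal sketch of the aggregation, assuming a request log shaped as `(feature, user_id, cost)` tuples. The log format and every value in it are hypothetical.

```python
from collections import defaultdict

# Hypothetical request log: (feature, user_id, cost in USD).
requests_log = [
    ("chat assistant", "u1", 0.005),
    ("chat assistant", "u2", 0.005),
    ("background summarization", "u1", 0.002),
    ("support automation", "u3", 0.010),
]

cost_by_feature = defaultdict(float)
cost_by_user = defaultdict(float)

for feature, user_id, cost in requests_log:
    cost_by_feature[feature] += cost
    cost_by_user[user_id] += cost

# Print the most expensive features first.
for feature, cost in sorted(cost_by_feature.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature}: ${cost:.3f}")
```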

Compare daily pace against forecast

Forecasting is not a once-a-month exercise.

The real value comes from checking:

  • month-to-date spend
  • month-end forecast
  • variance vs budget

If you are halfway through the month and already above the base-case pace, the team should know immediately.

This is where daily tracking matters. It gives you time to correct:

  • request volume
  • model routing
  • prompt behavior
  • batch schedules
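The daily pace check can be sketched as a straight-line projection of month-to-date spend. The linear run rate is a simplifying assumption: it ignores weekday patterns and scheduled batch jobs. The spend and budget figures are placeholders.

```python
def pace_check(month_to_date_spend: float, day_of_month: int,
               days_in_month: int, budget: float) -> dict:
    """Project month-end spend from the average daily run rate so far."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be between 1 and days_in_month")
    daily_run_rate = month_to_date_spend / day_of_month
    projected_month_end = daily_run_rate * days_in_month
    variance = projected_month_end - budget
    return {
        "daily_run_rate": daily_run_rate,
        "projected_month_end": projected_month_end,
        "variance": variance,
        "variance_pct": variance / budget,
    }


# Halfway through a 30-day month, measured against the $4,500 base case.
status = pace_check(month_to_date_spend=2_700, day_of_month=15,
                    days_in_month=30, budget=4_500)

print(f"projected month-end: ${status['projected_month_end']:,.2f}")
print(f"variance vs budget:  ${status['variance']:,.2f} ({status['variance_pct']:+.0%})")
```

Running this every day, rather than at month end, is what leaves time to adjust routing, prompts, or batch schedules.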

A practical review model

Use this simple structure:

| Scenario | What it assumes | What you do with it |
| --- | --- | --- |
| Base | Current usage continues | Run the normal plan |
| Growth | Product usage grows as expected | Check whether budget still holds |
| Stress | Usage and cost per request both increase | Prepare corrective actions |

You do not need a complex forecasting platform to start. You need a model that the team will actually review.

Forecasting is better with cross-provider analysis

If you use more than one AI or cloud provider, forecasting from a single vendor dashboard is incomplete.

An AI workflow can move spend across:

  • OpenAI or Anthropic inference
  • AWS or GCP compute
  • vector database storage
  • networking or egress

That is why cross-provider analysis matters. It helps you understand whether the forecast problem is really inference, or whether adjacent infrastructure is growing faster.
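As a rough illustration of that check, the sketch below rolls spend from several providers into categories and compares month-over-month growth. The line items and amounts are invented placeholders. In practice they would come from each provider's billing export.

```python
from collections import defaultdict

# Hypothetical line items: (provider, category, last month USD, this month USD).
line_items = [
    ("OpenAI", "inference", 4_000, 4_400),
    ("Anthropic", "inference", 500, 550),
    ("AWS", "compute", 1_200, 1_500),
    ("vector database", "storage", 300, 450),
    ("AWS", "networking", 100, 110),
]

# Sum previous and current spend per category across all providers.
totals = defaultdict(lambda: [0.0, 0.0])
for _provider, category, previous, current in line_items:
    totals[category][0] += previous
    totals[category][1] += current

growth_by_category = {
    category: current / previous - 1
    for category, (previous, current) in totals.items()
}

fastest = max(growth_by_category, key=growth_by_category.get)
for category, growth in sorted(growth_by_category.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {growth:+.0%}")
print(f"fastest-growing category: {fastest}")
```

In this made-up data, inference is the largest line but storage is growing fastest, which a single vendor dashboard would not show.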

How StackSpend helps

StackSpend makes this workflow easier by providing:

  • cross-provider daily spend visibility
  • category-based analysis across inference, compute, storage, and networking
  • daily forecast vs budget tracking
  • faster review of cost changes by service and provider

That turns forecasting into an operating loop, not just a spreadsheet exercise.

Final take

A useful AI cost forecast does not try to predict everything. It gives the team:

  • a base case
  • a growth case
  • a stress case
  • a daily pace check

That is enough to spot problems early and make better decisions while usage is still changing.
