Cloud & AI Cost Glossary

Plain-English definitions of the terms behind cloud and AI cost management — from AI COGS and unit economics to anomaly detection, commitment discounts, and pace to forecast.

FinOps

FinOps (a portmanteau of Finance and DevOps) is the practice of giving engineering, finance, and product teams shared, real-time accountability for variable cloud and AI spend. Instead of treating the bill as a finance problem discovered weeks later, FinOps pushes cost decisions to the people who incur the costs, at the moment they incur them, using shared data, budgets, and alerts.

AI Spend Intelligence

AI Spend Intelligence is the category of cost tooling that goes beyond dashboards: it normalises spend across every cloud and AI provider, then applies analysis and automation — anomaly detection, forecasting, attribution, and a conversational cost analyst — so teams get answers and actions, not just charts. Where a monitoring tool shows you the numbers, an AI Spend Intelligence platform does the analysis, explains the variance with cited figures, and surfaces it where the team already works. It is the layer StackSpend occupies for engineering-led teams that have growing cloud and AI spend but no dedicated FinOps function.

AI cost analyst

An AI cost analyst is a conversational interface to cloud and AI spend that answers plain-English questions — "what drove our bill up this week?", "are we on track against budget?", "draft a board summary" — with cited figures, and can take confirmed actions like acknowledging an anomaly or creating a budget. It collapses the dashboard archaeology, SQL, and spreadsheet pivots of cost analysis into a single question, which is why it functions as the FinOps analyst a smaller team cannot yet justify hiring. StackSpend ships one as its in-app Cost Intelligence Agent.

Conversational FinOps

Conversational FinOps is the practice of managing cloud and AI cost through natural-language questions rather than dashboards, SQL, and spreadsheets. Instead of drilling through provider portals to answer "what drove our bill up this week?" or "are we on track against budget?", a team asks a conversational cost analyst and gets a cited answer in seconds — and can act on it (acknowledge an anomaly, create a budget) in the same place. It lowers the cost of investigation enough that engineering-led teams without a dedicated FinOps function can still run a real cost practice. StackSpend delivers it through its in-app Cost Intelligence Agent.

AI COGS

AI COGS (Cost of Goods Sold) is the inference cost baked into a software product: the OpenAI, Anthropic, Bedrock, or other model spend consumed by each user interaction or feature. Tracking AI COGS lets a team calculate gross margin per product line and see how model usage affects unit economics, rather than burying inference cost in a single undifferentiated API bill.

Cost anomaly

A cost anomaly is a sudden, statistically significant deviation in spend from a service or provider’s historical baseline — for example, OpenAI spend doubling overnight because of a prompt bug or a runaway agent loop. Anomalies are the early-warning signal cost monitoring exists to catch, because they usually surface days before the invoice does.

Anomaly detection

Anomaly detection is the automated process of learning each service’s normal spend pattern and flagging deviations that exceed it. Effective detection accounts for weekly seasonality and growth trends so it alerts on genuine spikes — not on a predictable Monday-morning increase — and delivers the alert (Slack, email, webhook) before the cost compounds.
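
As a rough illustration of seasonality-aware detection, the sketch below compares today's spend only against earlier totals for the same weekday, so a routine Monday increase is measured against other Mondays. The function name, the z-score rule, the threshold of 3.0, and the sample figures are all assumptions made for illustration; a production detector would also model growth trends, which this sketch does not.

```python
from statistics import mean, stdev

def is_anomalous(same_weekday_history, today_spend, z_threshold=3.0):
    """Flag today's spend if it sits far above the same-weekday baseline."""
    baseline = mean(same_weekday_history)
    spread = stdev(same_weekday_history)
    if spread == 0:
        # No historical variation: any increase over the baseline is flagged.
        return today_spend > baseline
    z_score = (today_spend - baseline) / spread
    return z_score > z_threshold

# Hypothetical past Monday totals in USD.
mondays = [410.0, 395.0, 420.0, 405.0]

normal_monday = is_anomalous(mondays, 415.0)  # within the usual Monday range
spike_monday = is_anomalous(mondays, 820.0)   # roughly double the baseline
```

Only upward deviations are flagged here, since the glossary's concern is unexpected cost rather than unexpected savings.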

Related: Cost anomaly, Budget guardrail

Burn rate

Burn rate is how fast a team is spending over a given period, usually expressed per day or per month. For cloud and AI costs, daily burn rate is the most actionable view because it makes a mid-month spike visible immediately, instead of being averaged away in a monthly total.

Related: Runway, Pace to forecast

Runway

Runway is the amount of time a company can keep operating before it runs out of money, calculated as available cash divided by burn rate. Because cloud and AI spend is one of the largest variable costs for many software companies, an unnoticed spend spike directly shortens runway — which is why daily cost visibility is a runway-protection tool, not just a reporting one.
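
The calculation can be sketched in a few lines. The cash and burn figures below are made up, and the function name is hypothetical; the point is only to show how a higher monthly burn shortens runway.

```python
def runway_months(cash_on_hand, monthly_burn):
    """Runway in months: available cash divided by monthly burn rate."""
    if monthly_burn <= 0:
        raise ValueError("monthly_burn must be positive")
    return cash_on_hand / monthly_burn

# Hypothetical company with 1.2M in cash.
baseline = runway_months(1_200_000, 100_000)
# Same cash, but an unnoticed spend spike adds 20,000 per month to burn.
after_spike = runway_months(1_200_000, 120_000)
months_lost = baseline - after_spike
```

In this made-up case, a 20% rise in burn removes two months of runway.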

Related: Burn rate, Pace to forecast

Unit economics

Unit economics describes the direct revenue and costs tied to a single unit of a business — one customer, one API request, or one feature. For AI products, unit economics depend heavily on inference cost: if the model spend per active user grows faster than the revenue per user, the product becomes less profitable as it scales, even while top-line revenue rises.

Gross margin

Gross margin is revenue minus cost of goods sold (COGS), divided by revenue. For AI-powered software, inference cost (AI COGS) is an increasingly large component of COGS, so attributing model spend to the features and customers that drive it is what makes a true gross-margin number possible rather than a guess.
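
A small worked example of the formula, using invented figures, shows how much the number moves when inference cost is left out of COGS. The variable names are illustrative only.

```python
def gross_margin(revenue, cogs):
    """Gross margin as a fraction: (revenue - COGS) / revenue."""
    return (revenue - cogs) / revenue

# Hypothetical monthly figures in USD.
revenue = 100_000
hosting_and_support_cogs = 15_000
ai_cogs = 25_000  # inference spend attributed to the product

margin_with_ai = gross_margin(revenue, hosting_and_support_cogs + ai_cogs)
margin_ignoring_ai = gross_margin(revenue, hosting_and_support_cogs)
```

With these numbers the margin is 60% once AI COGS is counted, against an apparent 85% when it is ignored.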

Related: AI COGS, Unit economics

Cost per token

Cost per token is the unit price of large-language-model usage, billed separately for input (prompt) and output (completion) tokens. Because output tokens usually cost several times more than input tokens, and because prompt size compounds across retries and long contexts, cost per token is the lever that most directly determines AI COGS.

Cost per request

Cost per request is the average cost of serving a single API call or user action, including model tokens, retries, tool calls, and any downstream infrastructure. It is the most useful denominator for AI unit economics because it maps cleanly onto product behaviour: a feature that triggers five model calls per click costs roughly five times as much per use as one that triggers a single call of similar size.
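
One way to compute this is to sum the cost of every model call behind a user action. The sketch below assumes hypothetical per-million-token prices and a made-up `ModelCall` record; neither comes from a real provider or SDK.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; substitute your model's real rates.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00

@dataclass
class ModelCall:
    input_tokens: int
    output_tokens: int

def call_cost(call: ModelCall) -> float:
    """Cost of one model call at the per-direction prices above."""
    return (call.input_tokens * INPUT_PRICE_PER_M
            + call.output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

def request_cost(calls: list[ModelCall], infra_cost: float = 0.0) -> float:
    """Sum every model call behind one user action, retries included."""
    return sum(call_cost(c) for c in calls) + infra_cost

one_call = [ModelCall(2_000, 500)]
five_calls = one_call * 5  # a feature that fans out into five similar calls

single = request_cost(one_call)
fanned_out = request_cost(five_calls)
```

Retries and tool-triggered calls simply appear as extra entries in the list, which is why they raise cost per request without changing cost per token.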

Related: Cost per token, Unit economics

AI agent cost

AI agent cost (or agentic cost) is the spend generated by autonomous, multi-step AI systems — agents that plan, call tools, retry, and loop until a task is done. Because a single user action can fan out into dozens or hundreds of model and tool calls, agent cost is far less predictable than a one-prompt-one-response workload: a runaway loop can multiply token volume 10x overnight before anyone notices. Controlling it means tracking cost per task and request volume by workflow, not just total tokens, so a spike in agent activity is caught the day it happens rather than on the invoice.
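
Tracking cost per task by workflow could look like the sketch below. It assumes each model or tool call is logged as a `(workflow, task_id, cost)` tuple; that log shape, the workflow names, and the costs are hypothetical.

```python
from collections import defaultdict

def cost_per_task(call_log):
    """Average cost per completed task, grouped by workflow."""
    totals = defaultdict(float)
    tasks = defaultdict(set)
    for workflow, task_id, cost in call_log:
        totals[workflow] += cost
        tasks[workflow].add(task_id)
    return {w: totals[w] / len(tasks[w]) for w in totals}

# Hypothetical call log: one row per model or tool call, cost in USD.
call_log = [
    ("support-triage", "t1", 0.02),
    ("support-triage", "t1", 0.03),
    ("support-triage", "t2", 0.05),
    ("research-agent", "t3", 0.40),
    ("research-agent", "t3", 0.60),
    ("research-agent", "t3", 1.00),
]

per_task = cost_per_task(call_log)
```

Here both workflows logged three calls, yet the per-task cost differs by a factor of forty, which a total-token view would not show.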

Egress cost

Egress cost is the fee a cloud provider charges to move data out of its network or across regions. It is a frequent source of surprise bills because it scales with traffic rather than with stored data, and it often hides inside an aggregate networking line item until something — a new integration, a misrouted backup — makes it spike.

Related: Cost anomaly

Idle resource cost

Idle resource cost is money spent on provisioned-but-unused capacity: oversized instances, forgotten dev environments, unattached storage volumes, or always-on resources that only need to run during business hours. Because idle resources accumulate silently and never trigger an error, they are typically found by cost review rather than by monitoring.

Related: Commitment discount

Commitment discount

A commitment discount is a reduced rate a cloud provider offers in exchange for a usage or spend commitment, typically for a one-year or three-year term. Examples include AWS Savings Plans and Reserved Instances, and committed use discounts on GCP. The discount only pays off if committed capacity stays well utilised, so commitment decisions depend on accurate forecasts of baseline demand.
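
The utilisation point at which a commitment starts to pay off can be estimated with simple arithmetic. The sketch assumes a simplified model: the committed amount is paid in full at the discounted rate whether or not it is used, usage never exceeds the commitment, and the alternative is paying the on-demand rate for actual usage only. The 30% discount is a made-up figure.

```python
def break_even_utilisation(discount):
    """Fraction of the commitment that must be used for it to match on-demand cost."""
    return 1.0 - discount

def commitment_saves_money(committed_units, used_units, discount):
    """True if paying for the full commitment is cheaper than on-demand for actual use."""
    return used_units / committed_units > break_even_utilisation(discount)

# Hypothetical 30% discount on a commitment of 100 units.
threshold = break_even_utilisation(0.30)
well_utilised = commitment_saves_money(100, 80, 0.30)   # 80% used
under_utilised = commitment_saves_money(100, 60, 0.30)  # 60% used
```

Under these assumptions a 30% discount needs more than 70% utilisation to save money, which is why the forecast of baseline demand matters more than the headline discount.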

Showback and chargeback

Showback and chargeback are two models for attributing shared cloud and AI cost to the teams, products, or customers that generate it. Showback reports each team’s cost for visibility and accountability; chargeback goes further and actually bills it to that team’s budget. Both depend on consistent cost allocation, usually through tagging.

Cost allocation tagging

Cost allocation tagging is the practice of labelling cloud resources with metadata — team, product, environment, customer — so that spend can be grouped and attributed instead of viewed only by service. Tag coverage is the foundation of showback, chargeback, and per-feature margin: untagged spend is unattributable spend.
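
A minimal sketch of tag-based attribution, assuming billing line items are available as dictionaries with a `cost` and a `tags` mapping. That record shape and the sample data are hypothetical, not any provider's export format.

```python
from collections import defaultdict

def spend_by_tag(line_items, tag_key):
    """Group spend by the value of one tag; items missing the tag go to 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item["tags"].get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)

# Hypothetical billing line items in USD.
line_items = [
    {"service": "compute", "cost": 600.0, "tags": {"team": "search"}},
    {"service": "storage", "cost": 150.0, "tags": {"team": "platform"}},
    {"service": "compute", "cost": 250.0, "tags": {}},
]

by_team = spend_by_tag(line_items, "team")
tag_coverage = 1 - by_team.get("untagged", 0.0) / sum(by_team.values())
```

The same grouping feeds showback directly; the `untagged` bucket is the share of spend that cannot yet be attributed to anyone.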

Pace to forecast

Pace to forecast compares spend so far in a period against the projected end-of-period total, answering "are we on track against budget?" while there is still time to act. Unlike a month-end variance report, a pace-to-forecast signal is forward-looking: a red pace on day 10 is an invitation to intervene, not a post-mortem.
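
The simplest version of this signal is a linear run-rate projection, sketched below. It assumes spend accrues evenly through the month, which real workloads rarely do; the function names, labels, and figures are illustrative, and an actual forecast would usually account for seasonality and growth.

```python
def projected_month_end(spend_to_date, day_of_month, days_in_month):
    """Naive projection: extend the average daily spend so far to the full month."""
    return spend_to_date / day_of_month * days_in_month

def pace_status(spend_to_date, day_of_month, days_in_month, budget):
    """Label the month as over pace when the projection exceeds the budget."""
    projection = projected_month_end(spend_to_date, day_of_month, days_in_month)
    return "over pace" if projection > budget else "on pace"

# Hypothetical month: 4,000 USD spent by day 10 of 30, against a 10,000 USD budget.
projection = projected_month_end(4_000, 10, 30)
status = pace_status(4_000, 10, 30, 10_000)
```

Spending 40% of the budget a third of the way through the month looks unremarkable as a total, but the projection makes the overrun visible on day 10.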

Related: Burn rate, Budget guardrail

Budget guardrail

A budget guardrail is a defined spend threshold that triggers a notification — or an automated action — when usage approaches or crosses it. Guardrails turn a budget from a number reviewed monthly into a live control: the team hears about a breach in Slack on the day it happens, not in next month’s invoice.
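
A threshold check of this kind can be sketched as follows. The 50%, 80%, and 100% thresholds are assumed defaults chosen for illustration, and delivering the notification (Slack, email, webhook) is deliberately left out.

```python
def guardrail_alerts(spend_to_date, budget, thresholds=(0.5, 0.8, 1.0)):
    """Return every threshold (as a fraction of budget) that spend has reached."""
    used = spend_to_date / budget
    return [t for t in thresholds if used >= t]

# Hypothetical budget of 10,000 USD.
approaching = guardrail_alerts(8_500, 10_000)   # 85% of budget used
breached = guardrail_alerts(10_500, 10_000)     # budget exceeded
```

A real guardrail would also remember which thresholds have already fired, so that each one notifies the team once rather than on every check.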

Input vs output tokens

Large language models bill usage in two directions: input tokens (the prompt, context, and any retrieved documents you send) and output tokens (the completion the model generates). Output tokens typically cost three to five times as much as input tokens, so two teams on the same model can have very different effective rates depending on their mix: a summarisation or long-context RAG workload is input-heavy and output-light, while a long-form generation workload is output-heavy. Tracking the split, not just a single token total, is what makes per-model cost comparable and optimisable.
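
To see how much the mix matters, the sketch below prices two workloads with the same total token count but opposite input/output splits. The prices are hypothetical (output set at five times input) and are not any provider's published rates.

```python
# Hypothetical prices in USD per million tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00

def token_cost(input_tokens, output_tokens):
    """Total cost with input and output tokens priced separately."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Both workloads use one million tokens in total.
input_heavy = token_cost(900_000, 100_000)
output_heavy = token_cost(100_000, 900_000)
```

At these assumed prices the output-heavy workload costs more than three times as much as the input-heavy one for the same token total.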

Prompt caching (cache read and write tokens)

Prompt caching lets a model reuse a previously processed prompt prefix instead of reprocessing it on every call. Providers bill it as two distinct token types: a cache write (creating the cached prefix, often at a small premium over standard input) and a cache read (reusing it, typically at a large discount — often around a tenth of the input price). For workloads with a large, stable system prompt or shared context, cache reads can dominate token volume while contributing little cost, so counting them as ordinary input badly overstates spend.
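
The size of that overstatement can be shown with a small calculation. The prices below are assumptions for illustration: cache writes at a 25% premium over input and cache reads at a tenth of the input price. Actual multipliers vary by provider and model.

```python
# Hypothetical prices in USD per million tokens.
PRICES_PER_M = {"input": 3.00, "cache_write": 3.75, "cache_read": 0.30}

def input_side_cost(usage):
    """Price each input-side token type at its own rate."""
    return sum(usage[kind] * PRICES_PER_M[kind] for kind in usage) / 1_000_000

# Hypothetical workload with a large, stable system prompt.
usage = {"input": 1_000_000, "cache_write": 200_000, "cache_read": 8_000_000}

actual = input_side_cost(usage)
# Naive view: treat every token, cached or not, as ordinary input.
naive = sum(usage.values()) * PRICES_PER_M["input"] / 1_000_000
```

In this example cache reads make up most of the token volume, and pricing them as ordinary input would report more than four times the actual spend.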

Blended token rate

A blended token rate is the single effective price per token you actually pay once input, output, cache-read, and cache-write tokens are combined at your real usage mix. Headline per-token prices mislead because they quote one direction in isolation; your blended rate reflects how much of your traffic is cheap cached input versus expensive fresh output. It is the only fair basis for comparing two models or forecasting a switch, because it prices the model against your workload rather than a vendor’s example.
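
Computing a blended rate is a weighted average: total cost divided by total tokens. The sketch below assumes hypothetical per-million prices and an invented usage mix; the function name is illustrative.

```python
def blended_rate_per_m(usage, prices_per_m):
    """Effective USD per million tokens across all token types at the given mix."""
    weighted_cost = sum(usage[kind] * prices_per_m[kind] for kind in usage)
    total_tokens = sum(usage.values())
    return weighted_cost / total_tokens

# Hypothetical prices in USD per million tokens.
prices = {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75}

# Hypothetical monthly usage mix, in tokens.
usage = {
    "input": 1_000_000,
    "output": 500_000,
    "cache_read": 8_000_000,
    "cache_write": 500_000,
}

blended = blended_rate_per_m(usage, prices)
```

With this mix the blended rate is well below even the headline input price, because cheap cache reads dominate the volume. A different workload on the same model would produce a different figure.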

Cost per prompt

Cost per prompt is the total cost of a single model interaction: input tokens plus output tokens plus any cache reads and writes for that call, priced at the model’s per-direction rates. It is more actionable than cost per token because a real request bundles a large context with a small completion (or vice versa), and it is the unit that maps cleanly onto cost per request, cost per feature, and ultimately AI COGS.

API-equivalent usage value

API-equivalent usage value is what a given volume of LLM usage would cost if priced at the provider’s public API rates. It exists to make usage from flat-rate subscription tools — Claude Code, Cursor, and similar — comparable with pay-as-you-go API spend in a single number, so a team can see the true scale of its AI consumption regardless of how each tool happens to bill. It is explicitly not an invoice amount: a $20/month subscription can carry hundreds of dollars of API-equivalent value, which is why StackSpend always shows billed cost separately from usage value.
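
The calculation itself is straightforward: price the observed token volume at public API rates and keep the result separate from what was actually billed. The rates, token volumes, and function name below are assumptions made for illustration.

```python
def api_equivalent_value(input_tokens, output_tokens,
                         input_price_per_m, output_price_per_m):
    """What this usage would cost at pay-as-you-go API rates (USD)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical month of usage on a flat-rate coding assistant.
billed_cost = 20.00  # the subscription invoice
usage_value = api_equivalent_value(
    input_tokens=40_000_000,
    output_tokens=4_000_000,
    input_price_per_m=3.00,    # assumed public API rate
    output_price_per_m=15.00,  # assumed public API rate
)
```

The two numbers answer different questions: `billed_cost` is what leaves the bank account, while `usage_value` indicates how much consumption the subscription is covering.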

Token-based vs request-based billing

AI tools bill in one of two shapes: token-based (you pay per input/output token, as with most model APIs) or request-based (you pay per request, seat, or flat subscription, as with some coding assistants and included-usage tiers). The distinction matters for cost control because token-based usage exposes a full input/output/cache breakdown you can attribute and optimise, whereas request-based usage often reports only a call count — so per-token analytics and model-swap recommendations are only possible where the provider surfaces token data.
