Embeddings are individually cheap, which is exactly why their cost sneaks up on teams. The unit price invites patterns — reprocess everything, embed on every event, re-index on every change — that turn a tiny per-call cost into a large bill.
How embedding cost spikes happen
- Full-corpus backfills. Re-embedding an entire knowledge base after a model change or schema tweak runs millions of tokens in one job. Do it a few times a month and it dominates the embeddings line.
- Per-event embedding. Embedding on every document edit, message, or webhook, instead of batching or debouncing, makes call volume grow in lockstep with usage.
- No caching. Without a content hash or cache check, unchanged content gets re-embedded on every run.
- Oversized chunks or overlap. Aggressive chunk overlap multiplies token volume per document.
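To put a number on the overlap point, here is a rough sketch of the arithmetic, assuming fixed-size chunks measured in tokens and a constant overlap between neighbouring chunks. The function name and the sample sizes are illustrative, not taken from any particular chunking library.

```python
import math


def embedded_tokens(doc_tokens: int, chunk_size: int, overlap: int) -> int:
    """Total tokens sent for embedding when a document is split into
    fixed-size chunks that each share `overlap` tokens with the previous one.

    Hypothetical helper for estimation only; real splitters may break on
    sentence or paragraph boundaries and give slightly different totals.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if doc_tokens <= chunk_size:
        return doc_tokens  # a single chunk, nothing is repeated

    stride = chunk_size - overlap  # new tokens covered by each extra chunk
    n_chunks = math.ceil((doc_tokens - overlap) / stride)
    # Every chunk after the first re-sends `overlap` tokens already embedded.
    return doc_tokens + (n_chunks - 1) * overlap


if __name__ == "__main__":
    for overlap in (0, 50, 250):
        total = embedded_tokens(10_000, 500, overlap)
        print(f"overlap={overlap:>3}: {total} tokens ({total / 10_000:.2f}x)")
```

Under these assumptions a 10,000-token document in 500-token chunks costs 10,000 tokens with no overlap, 11,100 with a 50-token overlap, and 19,500 with a 250-token overlap. For long documents the multiplier approaches chunk_size / (chunk_size - overlap).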
The spike is rarely the chat model — it's an embeddings or indexing job that ran more often than anyone tracked.
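One way to avoid paying twice for unchanged content is a content-hash check in front of the embedding call. The sketch below is a minimal illustration, not a production design: `embed_batch` is a hypothetical callable standing in for whatever embeddings client you use, and the cache is a plain dict where a real pipeline would more likely use a database or key-value store.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def content_key(text: str, model: str) -> str:
    """Hash the model name together with the text, so a model change
    invalidates old entries while unchanged content stays cached."""
    h = hashlib.sha256()
    h.update(model.encode("utf-8"))
    h.update(b"\x00")  # separator between model name and text
    h.update(text.encode("utf-8"))
    return h.hexdigest()


def embed_with_cache(
    texts: Sequence[str],
    model: str,
    embed_batch: Callable[[List[str], str], List[Vector]],  # hypothetical client wrapper
    cache: Dict[str, Vector],
) -> List[Vector]:
    """Return one vector per input text, calling `embed_batch` only for
    content that is not already in `cache`."""
    keys = [content_key(t, model) for t in texts]

    # Collect cache misses once each, preserving first-seen order.
    missing: Dict[str, str] = {}
    for key, text in zip(keys, texts):
        if key not in cache and key not in missing:
            missing[key] = text

    if missing:
        # One batched call for all new content instead of one call per item.
        vectors = embed_batch(list(missing.values()), model)
        for key, vector in zip(missing, vectors):
            cache[key] = vector

    return [cache[key] for key in keys]
```

With a check like this, a re-run over an unchanged knowledge base sends nothing to the embeddings endpoint, and a per-event pipeline only pays when the content actually changes.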
How to track it
Separate embeddings spend from chat/completion spend, then compare against your baseline. A backfill shows up as a sharp, short-lived spike; a per-event pipeline shows up as a rising slope. Both are easy to miss in a single monthly total.
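As a rough illustration of the baseline comparison, the sketch below takes a list of daily embeddings costs and looks for both shapes: a one-day spike against a trailing median, and a week-over-week rise. The window, the 3x threshold, and the function names are assumptions chosen for the example, not recommended settings, and the input is whatever daily figures you already export.

```python
from statistics import mean, median
from typing import List, Sequence


def find_spikes(daily_cost: Sequence[float], window: int = 7, factor: float = 3.0) -> List[int]:
    """Return indexes of days whose cost exceeds `factor` times the median
    of the preceding `window` days (the backfill pattern)."""
    spikes = []
    for i in range(window, len(daily_cost)):
        baseline = median(daily_cost[i - window:i])
        if baseline > 0 and daily_cost[i] > factor * baseline:
            spikes.append(i)
    return spikes


def week_over_week_growth(daily_cost: Sequence[float], window: int = 7) -> float:
    """Relative change between the mean of the last `window` days and the
    `window` days before that (the rising-slope pattern)."""
    if len(daily_cost) < 2 * window:
        raise ValueError("need at least two full windows of data")
    previous = mean(daily_cost[-2 * window:-window])
    recent = mean(daily_cost[-window:])
    if previous == 0:
        return float("inf") if recent > 0 else 0.0
    return (recent - previous) / previous


if __name__ == "__main__":
    # Made-up numbers: a flat week, one backfill day, then flat again.
    backfill = [1.0] * 7 + [9.0] + [1.0] * 6
    print(find_spikes(backfill))            # indexes of spike days
    # Made-up numbers: spend doubles from one week to the next.
    creeping = [1.0] * 7 + [2.0] * 7
    print(week_over_week_growth(creeping))  # relative growth
```

The median is used for the spike baseline so that one earlier backfill day does not inflate the threshold for the days that follow it.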
StackSpend's OpenAI cost monitoring breaks spend down by endpoint and model so embeddings cost is visible on its own, and anomaly detection flags an embeddings backfill the day it runs — not when the invoice arrives.
If your embeddings bill has already jumped, start with "Why is my OpenAI bill so high?"