Use this when a cost anomaly just fired and you need to know which deployed change most likely caused it, fast, before the spend compounds.
The fast answer: Take the anomaly's service, environment, and start time. Find every successful deployment to that service and environment in the hours before the spike. Resolve each deployment to the pull requests it shipped, then look for code changes that plausibly move cost — prompt edits, token limits, retry logic, caching, cron schedules, model swaps. The PR that landed just before the spike and touches a cost-relevant path is your prime suspect.
When a cost anomaly fires — say "guide-generation OpenAI spend +42%" — the first instinct is always the same: what did we ship? The problem is that the answer lives in GitHub while the anomaly lives in your cost tool, so most teams burn 30 minutes doing manual cross-tool detective work under pressure. This guide gives you a repeatable method to close that loop.
If you want the upstream half of this workflow first, start with AI cost anomaly detection. This post picks up the moment after an anomaly is detected.
Quick answer: how do you trace a cost spike to a pull request?
You correlate three facts the anomaly already gives you against your deployment history:
- What changed — the anomalous service or provider (e.g. guide-generation, OpenAI).
- Where — the environment (usually production).
- When — the anomaly start time, not the detection time.
Then you work backwards: successful deployments → deployed commit SHAs → the PRs included since the previous deploy → the changed files in those PRs. The candidate that best matches on timing, service, and cost-relevant code is the most likely cause.
The key discipline is to think in terms of candidates and evidence, not a single "culprit". You are ranking likelihood, not proving guilt.
Step 1: Anchor on the anomaly's start time, not its alert time
Cost data lags. A spike that started at 14:00 may not alert until the next morning's batch, because provider billing data arrives in arrears. If you search for deploys around the alert time, you will look in the wrong window entirely.
Always pull the anomaly start time — the first interval where spend deviated from baseline — and search backwards from there. A 6-hour lookback before the start is a sensible default for most services; widen it if deploys are infrequent.
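As a minimal sketch of that anchoring rule, the snippet below builds the search window from the anomaly start and shows how far off a window built from the alert time would be. The function name `lookback_window` and the timestamps are illustrative assumptions, not part of any particular cost tool.

```python
from datetime import datetime, timedelta, timezone


def lookback_window(anchor, hours=6.0):
    """Return (window_start, window_end) for the deploy search.

    The window ends at the anchor, which should be the anomaly start,
    not the time the alert fired.
    """
    if anchor.tzinfo is None:
        raise ValueError("anchor must be timezone-aware")
    return anchor - timedelta(hours=hours), anchor


# Illustrative timestamps: spend deviated at 14:00 UTC, the alert fired next morning.
anomaly_start = datetime(2024, 5, 1, 14, 0, tzinfo=timezone.utc)
alert_time = datetime(2024, 5, 2, 9, 0, tzinfo=timezone.utc)

window_start, window_end = lookback_window(anomaly_start)
print(window_start.isoformat(), "->", window_end.isoformat())

# A window anchored on the alert time starts after the correct window has ended,
# so it would miss the deploy that preceded the spike.
wrong_start, _ = lookback_window(alert_time)
print("alert-anchored window misses the real one:", wrong_start > window_end)
```

Requiring timezone-aware datetimes is a deliberate choice here: deploy logs and billing exports often use different zones, and a silent offset error can shift the whole window.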
Step 2: List successful deployments for the same service and environment
Filter your deployment history to:
- the same service that's anomalous (not your whole fleet),
- the same environment (production spikes are caused by production deploys),
- successful deployments only (a rolled-back deploy didn't cause sustained spend),
- inside the lookback window ending at the anomaly start.
If nothing deployed in that window, that's a strong signal the cause is not a code change. Jump to "What if no deployment matches?" below.
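The four filters above can be sketched as a single function. The `Deployment` record, its field names, and the status value `"success"` are assumptions for illustration; adapt them to whatever your deployment history actually stores.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Deployment:
    # Hypothetical record shape; real deployment logs will differ.
    service: str
    environment: str
    sha: str
    status: str  # assumed values such as "success", "failed", "rolled_back"
    deployed_at: datetime


def candidate_deployments(deployments, service, environment,
                          anomaly_start, lookback_hours=6):
    """Return successful deploys to the same service and environment
    inside the lookback window, most recent first."""
    window_start = anomaly_start - timedelta(hours=lookback_hours)
    matches = [
        d for d in deployments
        if d.service == service
        and d.environment == environment
        and d.status == "success"
        and window_start <= d.deployed_at <= anomaly_start
    ]
    return sorted(matches, key=lambda d: d.deployed_at, reverse=True)
```

An empty result is itself a finding: it is the cue to stop looking at source control and check traffic, billing, and scheduled jobs instead.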
Step 3: Resolve deployments to the pull requests they shipped
A deployment record usually identifies a single commit SHA, which tells you little on its own. To make it human-readable, resolve the range of commits between the previous successful deployment and this one, then map those commits to their pull requests. That gives you the actual set of merged PRs that went live in this release, which is the unit engineers actually think in.
For each PR, you want:
- title, number, author, and labels,
- merged time and deployed time,
- the list of changed files.
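One way to do this resolution against GitHub is sketched below, using three REST endpoints: compare two commits, list the pull requests associated with a commit, and list a pull request's files. The `get_json` parameter is a hypothetical caller-supplied function that performs an authenticated GET and returns parsed JSON; authentication, pagination, and error handling are deliberately left out, so treat this as an outline rather than a complete client.

```python
from typing import Any, Callable

API = "https://api.github.com"


def prs_in_release(repo: str, previous_sha: str, deployed_sha: str,
                   get_json: Callable[[str], Any]) -> list:
    """Collect the merged PRs shipped between two deployed SHAs.

    repo is "owner/name". get_json is an assumed helper: URL in, parsed JSON out.
    """
    compare = get_json(f"{API}/repos/{repo}/compare/{previous_sha}...{deployed_sha}")
    prs = {}
    for commit in compare["commits"]:
        associated = get_json(f"{API}/repos/{repo}/commits/{commit['sha']}/pulls")
        for pr in associated:
            # Skip unmerged PRs and PRs already seen via another commit.
            if pr.get("merged_at") is None or pr["number"] in prs:
                continue
            files = get_json(f"{API}/repos/{repo}/pulls/{pr['number']}/files")
            prs[pr["number"]] = {
                "number": pr["number"],
                "title": pr["title"],
                "author": pr["user"]["login"],
                "labels": [label["name"] for label in pr["labels"]],
                "merged_at": pr["merged_at"],
                "files": [f["filename"] for f in files],
            }
    return list(prs.values())
```

Deployed time is not in the GitHub response; it has to come from your own deployment record for `deployed_sha`. Large releases and large PRs will need pagination on the compare and files calls.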
Step 4: Score candidates on timing, service, and cost-relevant code
Not every PR in a release is a suspect. Rank them. These are the signals that matter, drawn from how cost-causal changes actually behave:
| Signal type | Raises confidence | Lowers confidence |
|---|---|---|
| Timing | Deploy landed shortly before the anomaly start | Deploy landed after the spike had already begun |
| Service match | Deployed service is the anomalous service | Change is in an unrelated service |
| Code relevance | Changed files map to the cost-driving code path | Changed files are docs or tests only |
| Change type | Prompt, model, token limit, retry, cache, queue, batch, or cron change | Cosmetic or unrelated refactor |
| Metric shape | The metric that moved matches the change (e.g. output tokens/request up after a prompt edit) | Cost rose proportionally to traffic, not per-request cost |
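The table can be turned into a rough ranking function. The sketch below is one possible weighting, not a calibrated model: the weights, the `Candidate` fields, and the PR numbers in the usage example are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Illustrative weights only; tune them against your own incident history.
WEIGHTS = {
    "timing": 0.3,
    "service": 0.2,
    "code_path": 0.2,
    "change_type": 0.2,
    "metric_shape": 0.1,
}


@dataclass(frozen=True)
class Candidate:
    pr_number: int
    minutes_before_start: float  # negative: deployed after the spike began
    service_match: bool
    touches_cost_path: bool
    cost_relevant_change: bool
    metric_shape_matches: bool


def score(c, lookback_minutes=360.0):
    """Return a 0..1 likelihood score; higher means a stronger candidate."""
    if c.minutes_before_start < 0:
        timing = 0.0  # landed after the spike had already begun
    else:
        timing = max(0.0, 1.0 - c.minutes_before_start / lookback_minutes)
    return (
        WEIGHTS["timing"] * timing
        + WEIGHTS["service"] * c.service_match
        + WEIGHTS["code_path"] * c.touches_cost_path
        + WEIGHTS["change_type"] * c.cost_relevant_change
        + WEIGHTS["metric_shape"] * c.metric_shape_matches
    )


def rank(candidates):
    """Sort candidates from most to least likely."""
    return sorted(candidates, key=score, reverse=True)


# Hypothetical PRs: a prompt edit shipped 20 minutes before the spike,
# and a docs-only change shipped in the same release.
prompt_edit = Candidate(101, 20, True, True, True, True)
docs_only = Candidate(102, 20, True, False, False, False)
for c in rank([docs_only, prompt_edit]):
    print(c.pr_number, round(score(c), 2))
```

The output is a ranking with evidence attached, which fits the candidates-and-evidence framing: a high score says "look here first", not "this PR is guilty".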
The cost-relevant change types to grep for
When you open the changed files, you're hunting for a short list of patterns that reliably move spend:
- Prompt changes — longer system prompts, more few-shot examples, more context stuffed in.
- Model swaps — a route moved from a cheaper model to a premium one, or a fallback flipped.
- Token limit changes — a raised max_tokens that lets responses run longer.
- Retry and timeout logic — a new retry loop that multiplies calls on failure.
- Caching changes — a cache that got disabled, shortened, or keyed incorrectly.
- Batching and concurrency — work that moved from batched to per-item calls.
- Cron and schedule changes — a job that now runs hourly instead of daily.
A single PR touching any of these, deployed just before the spike, in the anomalous service, is usually your strongest candidate.
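A quick way to triage many changed files is to scan the added and removed lines of each patch for these patterns. The regexes and category names below are assumed heuristics for illustration; they will produce false positives (any line mentioning "model" matches) and should only be used to decide which diffs to read first.

```python
import re

# Heuristic patterns, one per change type from the list above (illustrative).
COST_PATTERNS = {
    "prompt": re.compile(r"prompt|few[_-]?shot", re.IGNORECASE),
    "model": re.compile(r"model", re.IGNORECASE),
    "token_limit": re.compile(r"max_tokens|max_output_tokens", re.IGNORECASE),
    "retry": re.compile(r"retr(y|ies)|backoff|timeout", re.IGNORECASE),
    "cache": re.compile(r"cache|ttl", re.IGNORECASE),
    "batching": re.compile(r"batch|concurrency", re.IGNORECASE),
    "schedule": re.compile(r"cron|schedule", re.IGNORECASE),
}


def cost_relevant_categories(patch):
    """Return the set of categories matched by changed lines in a unified diff."""
    found = set()
    for line in patch.splitlines():
        # Only inspect added/removed lines; skip the ---/+++ file headers.
        if not line.startswith(("+", "-")) or line.startswith(("+++", "---")):
            continue
        for name, pattern in COST_PATTERNS.items():
            if pattern.search(line):
                found.add(name)
    return found


# Hypothetical patch for illustration.
example_patch = """\
--- a/app/llm.py
+++ b/app/llm.py
@@ -10,3 +10,4 @@
-    max_tokens=512,
+    max_tokens=2048,
     temperature=0.2,
+    retries=5,
"""
print(sorted(cost_relevant_categories(example_patch)))
```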
Step 5: Confirm with the metric shape
Causation language is easy to get wrong, so confirm before you accuse. Match the shape of the cost increase to the change:
- Spend up because per-request cost rose (more tokens per call) → points to a prompt, model, or token-limit change.
- Spend up because request volume rose → points to a retry loop, a cron change, or simply more real traffic.
- Spend up proportionally to customer traffic → may not be a regression at all; it could be expected growth.
If traffic explains the whole increase, you likely have a launch, not a bug. See how to investigate an AI spend spike for the full traffic-vs-regression decision tree.
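The distinction between the first two shapes can be made concrete by treating spend as requests multiplied by cost per request and splitting the change into a volume part and a per-request part. The sketch below is one simple decomposition under that assumption; the function names and the numbers are illustrative, and it cannot tell a retry loop from genuine traffic, since both show up as volume.

```python
def decompose_spend_change(requests_before, cost_per_request_before,
                           requests_after, cost_per_request_after):
    """Split a spend change into a volume effect and a per-request (rate) effect.

    volume_effect: extra requests priced at the old per-request cost.
    rate_effect:   the per-request cost change applied to the new volume.
    The two parts sum exactly to (spend after) minus (spend before).
    """
    volume_effect = (requests_after - requests_before) * cost_per_request_before
    rate_effect = requests_after * (cost_per_request_after - cost_per_request_before)
    return {
        "total": volume_effect + rate_effect,
        "volume_effect": volume_effect,
        "rate_effect": rate_effect,
    }


def dominant_driver(parts):
    """Name the larger contributor to an increase."""
    if parts["total"] <= 0:
        return "no increase"
    if parts["rate_effect"] >= parts["volume_effect"]:
        return "per-request cost"
    return "request volume"


# Illustrative numbers: same traffic, but each call got more expensive,
# which is roughly a +42% spend change driven entirely by per-request cost.
parts = decompose_spend_change(1000, 0.02, 1000, 0.0284)
print(dominant_driver(parts), parts)
```

If the driver is per-request cost, look at prompt, model, and token-limit changes first. If it is request volume, compare against customer traffic before suspecting retries or schedules.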
What if no deployment matches?
Sometimes the spike is real but no deploy lines up. That's useful information, not a dead end. The usual non-code causes:
- Traffic growth — more users, more usage, proportional cost.
- Provider billing backfill — the provider posted delayed usage, so the "spike" is a timing artifact.
- A scheduled job — a batch or cron run that isn't tied to a recent deploy.
- Missing metadata — the cost can't be attributed because tagging or service mapping is incomplete.
When you reach this branch, document "no strong code-change signal found" and move to the traffic and billing checks. Quietly closing the loop here is just as valuable as finding a PR.
Make this repeatable instead of heroic
Doing this by hand works once. Doing it every time, under pressure, at 9am after the overnight batch, is where teams lose hours. The durable fix is to capture the correlation as part of your operating rhythm:
- Ingest deployment metadata (service, environment, SHA, time) so the lookback is automatic.
- Maintain code mappings from repositories and paths to services, so changed files resolve to the right cost center.
- Attach the ranked candidates and evidence directly to the anomaly, so the next engineer doesn't redo the work.
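The code-mapping idea in particular is easy to prototype. The sketch below resolves changed file paths to services with a longest-prefix match; the repository name, path prefixes, and service names are hypothetical placeholders, and a real mapping would live in configuration rather than in code.

```python
# Hypothetical mappings: (repository, path prefix, service or None to ignore).
CODE_MAPPINGS = [
    ("acme/backend", "services/guide_generation/", "guide-generation"),
    ("acme/backend", "services/", "shared-backend"),
    ("acme/backend", "docs/", None),
]


def service_for_path(repo, path, mappings=CODE_MAPPINGS):
    """Return the service owning a path, using the longest matching prefix."""
    best = None
    for mapped_repo, prefix, service in mappings:
        if mapped_repo == repo and path.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, service)
    return best[1] if best else None


def services_touched(repo, files, mappings=CODE_MAPPINGS):
    """Return the set of services a PR's changed files resolve to."""
    resolved = (service_for_path(repo, f, mappings) for f in files)
    return {service for service in resolved if service is not None}


changed = [
    "services/guide_generation/prompts/system.txt",
    "docs/runbook.md",
]
print(services_touched("acme/backend", changed))
```

Longest-prefix matching lets a broad fallback mapping coexist with specific ones, so a new directory still resolves to something until it gets its own entry.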
StackSpend's source-control correlation does this by connecting your anomalies to recent deployments and surfacing ranked "related changes" with the evidence behind them — read-only, GitHub-first. The point isn't the tool; it's that the what-deployed question should already be answered by the time a human opens the anomaly. For the broader operating loop, see cost incident response: from anomaly to root cause to resolved issue.
Practical takeaway
Finding the PR behind a cost spike is a correlation problem, not a guessing game. Anchor on the anomaly's start time, list successful deploys to the same service and environment, resolve them to PRs, and rank candidates by timing, service match, and cost-relevant code. Confirm with the metric shape before you call anything the cause — and when no deploy matches, say so and check traffic and billing instead.
If you want the surrounding workflow, pair this with AI cost anomaly detection and cloud + AI cost monitoring.
FAQ
Should I search around the anomaly alert time or the start time?
The start time. Cost data arrives in arrears, so the alert can fire hours or a day after the spend actually began. Searching around the alert time looks in the wrong window.
How far back should I look for the causing deployment?
A 6-hour lookback before the anomaly start is a reasonable default. Widen it for services that deploy infrequently, and narrow it for high-frequency deploys to reduce noise.
What if several PRs deployed at once?
Rank them. The PR that touches the anomalous service and a cost-relevant code path (prompt, model, token limit, retry, cache, cron) is the strongest candidate. PRs that only change docs or tests can usually be ruled out.
How do I confirm a PR actually caused the spike?
Match the metric shape to the change. If per-request cost rose after a prompt or token-limit change, that's strong evidence. If only request volume rose, look at retries, cron schedules, or genuine traffic growth instead.
What if no deployment lines up with the spike?
Treat that as a signal that the cause is probably not code. Check traffic growth, provider billing backfill, scheduled jobs, and missing cost metadata before concluding.