Use this when the model needs to answer a narrow decision question: yes or no, picking one allowed route, or selecting from a short menu of actions.
The short answer: LLM-based decisions work best when the answer space is tightly bounded and the model is not the final authority. Define the options, route uncertain cases to review, and keep deterministic rules outside the model.
What you will get in 9 minutes
- A practical rule for when binary or constrained-choice outputs are appropriate
- The difference between decisioning, routing, and scoring
- A threshold worksheet for review vs automation
- The common cases where code or traditional ML is a better fit
Use this when
- You need the model to pick one of a few allowed actions
- A workflow needs triage, routing, or approval support
- You are using long prompts for a task that is really a bounded decision
- You want a cleaner path from model output to product behavior
The 60-second answer
| Decision type | Good fit for an LLM? |
| --- | --- |
| Yes-no with ambiguous input | Sometimes |
| Pick one of 3 to 10 allowed routes | Often |
| Exact threshold or policy check | Usually not; use code |
| Repeated classification with lots of labeled data | Often a classic ML candidate |
If the decision is fully deterministic, make it deterministic. If the input is messy and interpretive but the output space is small, an LLM can help.
What this pattern is really for
Most “binary decision” workflows are one of these:
- pass or fail
- approve or escalate
- one of a few support queues
- one of a few content or workflow templates
That means the prompt should not ask the model to be creative. It should ask the model to choose from a closed set and explain why.
A good output contract
Use a schema like:
```json
{
  "choice": "approve | reject | escalate",
  "confidenceBand": "high | medium | low",
  "reason": "string",
  "needsReview": true
}
```
This gives you:
- one constrained output
- one signal for routing
- one explanation for review
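Enforcing that contract in code keeps a malformed response from ever reaching product behavior. A minimal sketch (Python; the `parse_decision` helper and its error behavior are illustrative, not a library API):

```python
import json

ALLOWED_CHOICES = {"approve", "reject", "escalate"}
ALLOWED_BANDS = {"high", "medium", "low"}

def parse_decision(raw: str) -> dict:
    """Parse and validate a model response against the output contract.

    Raises ValueError on any deviation, so the caller can fall back
    to review instead of acting on a malformed decision.
    """
    data = json.loads(raw)
    if data.get("choice") not in ALLOWED_CHOICES:
        raise ValueError(f"choice outside allowed set: {data.get('choice')!r}")
    if data.get("confidenceBand") not in ALLOWED_BANDS:
        raise ValueError("confidenceBand outside allowed set")
    if not isinstance(data.get("reason"), str) or not data["reason"]:
        raise ValueError("reason must be a non-empty string")
    if not isinstance(data.get("needsReview"), bool):
        raise ValueError("needsReview must be a boolean")
    return data
```

The point of raising rather than repairing: a constrained choice that fails validation is itself a routing signal, and the safest route for it is review.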
When this pattern works well
Good examples:
- route a support ticket to billing, bug, or feature request
- choose whether a transcript should go to manual QA
- select one approved follow-up template
- decide whether an extracted record is ready for the next step
Bad examples:
- legal or financial approvals with hard thresholds that code can check
- a “yes or no” decision where the real problem is missing data
- high-volume, well-labeled prediction tasks that classic ML can handle cheaply
Confidence is a routing signal, not truth
Do not treat model confidence as a substitute for evaluation.
Use it to answer:
- should this be automated?
- should this go to review?
- should this fall back to another path?
Then calibrate that rule against real examples.
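One way to turn those three questions into a single routing rule (a sketch; the band-to-path mapping here is an assumption you would calibrate against real examples, not a recommended default):

```python
def route(confidence_band: str, needs_review: bool) -> str:
    """Map a model's confidence band to a product path.

    Confidence is used only as a routing signal: anything the model
    flags, or anything low-confidence, goes to a human; medium
    confidence falls back to another path rather than automating.
    """
    if needs_review or confidence_band == "low":
        return "review"
    if confidence_band == "medium":
        return "fallback"  # e.g. a cheaper deterministic check
    return "automate"      # high confidence only
```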
Where classic ML or code wins
Prefer deterministic code when:
- business rules are explicit
- thresholds are known
- the answer must be auditable and exact
Prefer traditional ML when:
- the label set is stable
- you have enough training data
- latency and unit cost matter at scale
LLMs are strongest here when the input is messy, the label space is small, and the decision boundary is still too semantic or language-heavy for rules alone.
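For contrast, this is what a hard policy check looks like when it stays in code: deterministic, auditable, and exact. The refund-limit rule below is an invented placeholder, not a real policy:

```python
REFUND_LIMIT = 100.00  # assumed policy threshold, not from the source

def refund_allowed(amount: float, account_in_good_standing: bool) -> bool:
    """A fully deterministic policy check: no model needed."""
    return account_in_good_standing and 0 < amount <= REFUND_LIMIT
```

If you can write the rule this plainly, the LLM has no job here.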
Threshold worksheet
For one workflow, write down:
- What are the only allowed outputs?
- Which outputs can be automated?
- Which outputs must be reviewed?
- What false-positive cost is unacceptable?
- What false-negative cost is acceptable?
If you cannot answer those questions, the workflow is not ready for automated decisioning.
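Worksheet answers are most useful when they live outside the model as explicit configuration. A sketch for one hypothetical ticket-routing workflow (all values are placeholders):

```python
# Assumed policy for a hypothetical support-ticket workflow.
WORKFLOW_POLICY = {
    "allowed_outputs": ["billing", "bug", "feature_request"],
    "automatable": ["billing"],                    # safe without a human
    "always_review": ["bug", "feature_request"],   # must be reviewed
    "max_false_positive_rate": 0.02,               # unacceptable above this
    "min_recall": 0.90,                            # floor implied by FN cost
}

def disposition(output: str) -> str:
    """Look up whether a model output automates, reviews, or is rejected."""
    if output not in WORKFLOW_POLICY["allowed_outputs"]:
        return "reject"  # outside the closed set
    return "automate" if output in WORKFLOW_POLICY["automatable"] else "review"
```

Keeping the thresholds in config means the review/automation boundary can be tightened after rollout without touching the prompt.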
Common failure modes
- treating “confidence” as if it were calibrated probability
- using free-form output for a bounded choice
- automating high-risk decisions with no review path
- using an LLM where a deterministic rule already exists
How StackSpend helps
Bounded-decision workflows are easier to measure as discrete product features. That makes it easier to compare automation rate, review rate, and model tier cost after rollout.