Guides
March 12, 2026
By Andrew Day

Binary decisions and constrained choice with LLMs

Use LLMs for yes-no and limited-choice decisions only when the decision space, fallback path, and review thresholds are explicit.


Use this when the model needs to answer a narrow decision question: answering yes or no, picking one allowed route, or selecting from a short menu of actions.

The short answer: LLM-based decisions work best when the answer space is tightly bounded and the model is not the final authority. Define the options, route uncertain cases to review, and keep deterministic rules outside the model.

What you will get in 9 minutes

  • A practical rule for when binary or constrained-choice outputs are appropriate
  • The difference between decisioning, routing, and scoring
  • A threshold worksheet for review vs automation
  • The common cases where code or traditional ML is a better fit

Use this when

  • You need the model to pick one of a few allowed actions
  • A workflow needs triage, routing, or approval support
  • You are using long prompts for a task that is really a bounded decision
  • You want a cleaner path from model output to product behavior

The 60-second answer

| Decision type | Good fit for an LLM? |
| --- | --- |
| Yes-no with ambiguous input | Sometimes |
| Pick one of 3 to 10 allowed routes | Often |
| Exact threshold or policy check | Usually no, use code |
| Repeated classification with lots of labeled data | Often a classic ML candidate |

If the decision is fully deterministic, make it deterministic. If the input is messy and interpretive but the output space is small, an LLM can help.

What this pattern is really for

Most “binary decision” workflows are one of these:

  • pass or fail
  • approve or escalate
  • one of a few support queues
  • one of a few content or workflow templates

That means the prompt should not ask the model to be creative. It should ask the model to choose from a closed set and explain why.
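A minimal sketch of that kind of prompt in Python. The wording, option names, and JSON keys are illustrative assumptions, not a recommended template:

```python
# Build a closed-set decision prompt. The choices and phrasing here
# are placeholders; adapt them to your workflow's allowed outputs.
ALLOWED_CHOICES = ["approve", "reject", "escalate"]

def build_decision_prompt(ticket_text: str) -> str:
    """Ask for exactly one allowed choice plus a short reason."""
    options = ", ".join(ALLOWED_CHOICES)
    return (
        f"You are a triage assistant. Choose exactly one of: {options}.\n"
        "Respond as JSON with keys 'choice' and 'reason'.\n"
        "Do not invent new choices or add extra fields.\n\n"
        f"Input:\n{ticket_text}"
    )

prompt = build_decision_prompt("Customer was double-billed in March.")
```

The point is that the closed set lives in your code, not in the model's imagination: the prompt enumerates it, and a validator downstream can enforce it.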

A good output contract

Use a schema like:

{
  "choice": "approve | reject | escalate",
  "confidenceBand": "high | medium | low",
  "reason": "string",
  "needsReview": true
}

This gives you:

  • one constrained output
  • one signal for routing
  • one explanation for review
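A hedged sketch of enforcing that contract in Python, assuming the option names above. Anything malformed falls back to escalation with review, so a bad model response can never silently automate:

```python
import json

ALLOWED_CHOICES = {"approve", "reject", "escalate"}
ALLOWED_BANDS = {"high", "medium", "low"}

def parse_decision(raw: str) -> dict:
    """Parse and validate model output against the contract.

    Unparseable or out-of-contract output falls back to a safe
    default: escalate, low confidence, needs review.
    """
    fallback = {
        "choice": "escalate",
        "confidenceBand": "low",
        "reason": "unparseable or out-of-contract model output",
        "needsReview": True,
    }
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if data.get("choice") not in ALLOWED_CHOICES:
        return fallback
    if data.get("confidenceBand") not in ALLOWED_BANDS:
        return fallback
    data["needsReview"] = bool(data.get("needsReview", True))
    data.setdefault("reason", "")
    return data
```

Note the default direction of failure: when in doubt, the record goes to review, never to automation.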

When this pattern works well

Good examples:

  • route a support ticket to billing, bug, or feature request
  • choose whether a transcript should go to manual QA
  • select one approved follow-up template
  • decide whether an extracted record is ready for the next step

Bad examples:

  • legal or financial approvals with hard thresholds that code can check
  • a “yes or no” decision where the real problem is missing data
  • high-volume, well-labeled prediction tasks that classic ML can handle cheaply

Confidence is a routing signal, not truth

Do not treat model confidence as a substitute for evaluation.

Use it to answer:

  • should this be automated?
  • should this go to review?
  • should this fall back to another path?

Then calibrate that rule against real examples.
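The routing rule itself can be a few lines of deterministic code. This sketch assumes the contract above; the band-to-path mapping is a placeholder to calibrate against real examples, not a recommendation:

```python
def route(decision: dict) -> str:
    """Map a validated decision to a workflow path.

    Confidence bands gate automation; they are routing signals,
    not calibrated probabilities. Tune this mapping on real data.
    """
    if decision["needsReview"] or decision["choice"] == "escalate":
        return "manual_review"
    if decision["confidenceBand"] == "high":
        return "automate"
    if decision["confidenceBand"] == "medium":
        return "manual_review"
    return "fallback_path"
```

Keeping this mapping outside the model means you can tighten or loosen automation without touching a prompt.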

Where classic ML or code wins

Prefer deterministic code when:

  • business rules are explicit
  • thresholds are known
  • the answer must be auditable and exact

Prefer traditional ML when:

  • the label set is stable
  • you have enough training data
  • latency and unit cost matter at scale

LLMs are strongest here when the input is messy, the label space is small, and the decision boundary is still too semantic or language-heavy for rules alone.

Threshold worksheet

For one workflow, write down:

  1. What are the only allowed outputs?
  2. Which outputs can be automated?
  3. Which outputs must be reviewed?
  4. What false-positive cost is unacceptable?
  5. What false-negative cost is acceptable?

If you cannot answer those questions, the workflow is not ready for automated decisioning.
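One way to force those answers into the open is to write them down as configuration. A sketch, with illustrative placeholder values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecisionPolicy:
    """Worksheet answers captured as explicit, reviewable config.

    All values below are illustrative placeholders, not recommended
    thresholds.
    """
    allowed_outputs: tuple = ("approve", "reject", "escalate")
    automatable: frozenset = frozenset({"approve"})
    must_review: frozenset = frozenset({"reject", "escalate"})
    max_false_positive_rate: float = 0.01  # unacceptable above this
    max_false_negative_rate: float = 0.05  # acceptable up to this

    def is_complete(self) -> bool:
        """Every allowed output must be either automatable or reviewed."""
        covered = self.automatable | self.must_review
        return bool(self.allowed_outputs) and covered == set(self.allowed_outputs)
```

If you cannot fill in a `DecisionPolicy` like this for your workflow, that is the worksheet telling you the workflow is not ready.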

Common failure modes

  • treating “confidence” as if it were calibrated probability
  • using free-form output for a bounded choice
  • automating high-risk decisions with no review path
  • using an LLM where a deterministic rule already exists

How StackSpend helps

Bounded-decision workflows are easier to measure as discrete product features. That makes it easier to compare automation rate, review rate, and model tier cost after rollout.

What to do next

Continue in Academy

Build production LLM applications

Choose the right LLM pattern for structured data, retrieval, agents, chat, multimodal workflows, and ML-adjacent systems.

