Use this when retrieval is inconsistent because the user query is vague, compound, or mismatched to how your corpus is organized.
The short answer: not every bad retrieval result is an embedding problem. Often the real issue is that the query needs rewriting, decomposition, or a different route before it hits the index.
What you will get in 9 minutes
- A simple framework for rewrite vs decompose vs route
- Examples of when each step helps
- A worksheet for designing a better pre-retrieval layer
- Metrics that isolate pre-retrieval quality from final answer quality
Use this when
- Users ask multi-part or conversational questions
- The same corpus contains very different document types
- Retrieval sometimes works and sometimes fails for queries with the same intent
- The best answer often needs more than one retrieval path
The 60-second answer
| Problem | Best first move |
| --- | --- |
| Query is vague or slang-heavy | rewrite |
| Query contains multiple asks | decompose |
| Query belongs to one of several corpora or tools | route |
Do not jump straight to bigger prompts if the retrieval input is the real bottleneck.
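The table above can be sketched as a tiny triage function. The signal checks below are illustrative heuristics only, standing in for whatever classifier or LLM step you actually use:

```python
# Toy triage sketch: map rough query signals to a first move.
# The keyword checks are hypothetical placeholders, not production rules.

def triage(query: str) -> str:
    """Return 'decompose', 'route', or 'rewrite' as the first move to try."""
    q = query.lower()
    # Multiple asks: compound connectives suggest decomposition.
    if " and " in q and q.count("?") + q.count(",") >= 1:
        return "decompose"
    # Mentions of a specific corpus or tool suggest routing.
    if any(hint in q for hint in ("ticket", "dashboard", "sql")):
        return "route"
    # Default: try a rewrite for vague or slang-heavy queries.
    return "rewrite"

print(triage("What changed in pricing, and what does it mean for enterprise customers?"))
```

The point is the ordering, not the rules: check for compound asks and routing signals before falling back to a rewrite.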
Pattern 1: Query rewriting
Rewriting helps when the user query is:
- informal
- incomplete
- phrased differently from the indexed material
- missing stable domain language
Good rewrite goals:
- preserve intent
- improve retrievability
Bad rewrite goals:
- guess the answer early
- over-specify details the user did not ask for
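A minimal sketch of the intent-preserving rewrite. The glossary below is a hypothetical example; in practice the mapping comes from your index vocabulary or an LLM rewrite step:

```python
# Rewrite sketch: normalize informal phrasing toward the corpus's stable
# domain terms without changing what the user asked for.
# GLOSSARY is a hypothetical stand-in for a learned or curated mapping.

GLOSSARY = {
    "pricing stuff": "pricing plan changes",
    "the enterprise folks": "enterprise customers",
}

def rewrite(query: str) -> str:
    out = query.lower().strip().rstrip("?")
    for informal, canonical in GLOSSARY.items():
        out = out.replace(informal, canonical)
    return out

print(rewrite("What happened with the pricing stuff?"))
```

Note what the rewrite does not do: it swaps in indexed terminology but adds no details the user never asked for.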
Pattern 2: Query decomposition
Decompose when one question contains several retrieval intents.
Examples:
- “What changed in pricing and what does it mean for enterprise customers?”
- “Which models are cheaper, and which still pass our coding evals?”
In those cases a single retrieval call often mixes unrelated evidence. Splitting the question lets the system retrieve and answer in smaller, cleaner parts.
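The splitting step can be sketched as below. Real systems usually use an LLM to find the retrieval intents; the connective-based split here only illustrates the shape of the output:

```python
# Toy decomposition sketch: split one compound question into sub-queries
# that can be retrieved and answered independently.
import re

def decompose(query: str) -> list[str]:
    # Split on ", and" / "and" connectives (illustrative heuristic only).
    parts = re.split(r",?\s+and\s+", query.strip().rstrip("?"))
    return [p.strip() + "?" for p in parts if p.strip()]

for sub in decompose("What changed in pricing, and what does it mean for enterprise customers?"):
    print(sub)
```

Each sub-query then gets its own retrieval call, so pricing evidence and enterprise-impact evidence never get mixed in one result set.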
Pattern 3: Retrieval routing
Route when different sources require different retrieval methods.
Examples:
- product docs vs support tickets
- policy docs vs metrics dashboards
- SQL-backed systems vs narrative corpora
Routing can decide:
- which corpus to search
- whether to use lexical, dense, or hybrid retrieval
- whether the query should go to a tool instead of a retriever
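A routing decision can return all three choices at once. The corpus names, keywords, and method labels below are hypothetical; substitute your own sources and retrievers:

```python
# Rule-based routing sketch: pick a corpus and a retrieval method per query.
# All names and rules are illustrative placeholders.

def route(query: str) -> dict:
    q = query.lower()
    if "error" in q or "ticket" in q:
        return {"corpus": "support_tickets", "method": "lexical"}
    if "revenue" in q or "metric" in q:
        # Metrics questions go to a SQL tool instead of a retriever.
        return {"corpus": "metrics", "method": "tool"}
    return {"corpus": "product_docs", "method": "hybrid"}

print(route("open tickets about login errors"))
```

Even a simple rule table like this beats sending every query through one dense index when the corpora differ this much.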
What a good pre-retrieval layer does
Before retrieval, decide:
- does the query need normalization?
- does it need splitting?
- which source or method should receive it?
That layer often improves quality more cheaply than larger prompts or stronger generation models.
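Put together, the layer is a short pipeline: normalize, optionally split, then route. In this self-contained sketch every rule is a toy placeholder standing in for an LLM or learned component:

```python
# End-to-end pre-retrieval sketch: normalize -> split -> route.
# Corpus names and split/route rules are hypothetical placeholders.

def pre_retrieve(query: str) -> list[dict]:
    # 1. Normalize: trivial whitespace cleanup stands in for a real rewrite.
    q = " ".join(query.strip().split())
    # 2. Split: compound questions become independent sub-queries.
    parts = [p.strip() for p in q.split(" and ") if p.strip()] or [q]
    # 3. Route: pick a corpus per sub-query (placeholder rule).
    return [
        {"query": p, "corpus": "support" if "ticket" in p.lower() else "docs"}
        for p in parts
    ]

for step in pre_retrieve("summarize pricing changes  and list open tickets"):
    print(step)
```

Logging each emitted `{"query": ..., "corpus": ...}` record is also what makes the layer debuggable later.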
How to evaluate this layer
Measure:
- rewritten-query retrieval lift
- decomposed-query recall lift
- routing accuracy
- answer correctness after routing
If you only measure final answer quality, you cannot tell whether the improvement came from better routing or better generation.
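Two of these metrics are easy to compute directly from logs. The input shapes below are assumptions: `(relevant_found, total_relevant)` pairs per query for recall, and `(predicted_route, gold_route)` pairs for routing accuracy:

```python
# Evaluation sketch: isolate pre-retrieval quality from generation quality.
# Log formats here are hypothetical; adapt them to your own telemetry.

def recall(pairs: list[tuple[int, int]]) -> float:
    """Micro-averaged recall over (relevant_found, total_relevant) pairs."""
    found = sum(f for f, _ in pairs)
    total = sum(t for _, t in pairs)
    return found / total if total else 0.0

def retrieval_lift(baseline: list[tuple[int, int]],
                   rewritten: list[tuple[int, int]]) -> float:
    """Recall delta from using rewritten queries instead of raw ones."""
    return recall(rewritten) - recall(baseline)

def routing_accuracy(routes: list[tuple[str, str]]) -> float:
    """Fraction of (predicted_route, gold_route) pairs that match."""
    return sum(p == g for p, g in routes) / len(routes) if routes else 0.0

print(retrieval_lift([(2, 5), (1, 5)], [(4, 5), (3, 5)]))
print(routing_accuracy([("docs", "docs"), ("tickets", "docs")]))
```

Tracking these separately from answer correctness is what lets you attribute a quality gain to the pre-retrieval layer rather than the generator.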
Pre-retrieval worksheet
For one workflow, define:
- Common query shapes
- Which shapes need rewriting
- Which shapes should be decomposed
- Which sources or tools each shape should route to
- Which metric proves the layer helped
Common failure modes
- rewriting every query when only a few need it
- decomposing questions that should stay whole
- using one retrieval route for every corpus
- not logging the rewritten or routed query for debugging
How StackSpend helps
Pre-retrieval layers change workflow cost by altering how many searches, tool calls, and generation steps occur per request. Tracking spend by workflow helps show whether smarter routing reduced wasted retrieval and token volume.