Retrieval Strategy Design: Vector, Keyword, and Hybrid Search
Source: Dev.to
Focus: engineering trade‑offs, system architecture, and practical defaults
Audience: Backend engineers familiar with embeddings who want to build reliable, controllable search systems.
1. Where Retrieval Strategy Fits in the System
A typical modern retrieval pipeline looks like this:
```mermaid
flowchart TD
    A[User Query] --> B[Query Rewrite / Intent Analysis]
    B --> C["Multi‑Channel Retrieval<br/>(Vector / Keyword / Metadata)"]
    C --> D[Hybrid Merge]
    D --> E[Top‑K Limiting]
    E --> F[Score Threshold Filtering]
    F --> G["(Optional) Reranking"]
    G --> H[LLM Generation]
```
Concepts such as vector search, hybrid search, Top‑K, and threshold filtering are not isolated features; they work together inside the recall and filtering stages of this pipeline.
2. Vector Search: The Semantic Recall Layer
2.1 What Vector Search Solves
Vector search addresses semantic mismatch:
- The user and the document use different words.
- The meaning is similar, but lexical overlap is low.
Example
Query: How to reduce dopamine addiction
Document: Attention control and dopamine regulation
Keyword search fails here, but embeddings succeed.
2.2 Core Parameters Engineers Must Understand
Similarity Metric
| Metric | Typical Use |
|---|---|
| Cosine Similarity | Industry default (most embedding models are trained assuming cosine) |
| Dot Product | Equivalent to cosine on L2‑normalized vectors; skips normalization at query time |
| L2 Distance | Useful for certain metric‑learning models |
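The relationship between cosine and dot product can be verified in a few lines of NumPy (a minimal sketch; the vectors are made up):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: dot product of the unit-normalized vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction, different magnitude

# Cosine ignores magnitude, so parallel vectors score 1.0.
assert abs(cosine(a, b) - 1.0) < 1e-9

# After L2 normalization, dot product and cosine coincide,
# which is why many stores expose "dot product" over normalized embeddings.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
assert abs(float(np.dot(a_n, b_n)) - cosine(a, b)) < 1e-9
```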
Index Type (Performance‑Critical)
| Index Type | Use Case |
|---|---|
| Flat | Small datasets, maximum accuracy |
| HNSW | General‑purpose, production default |
| IVF | Very large‑scale datasets |
For most knowledge‑base and RAG systems, HNSW offers the best trade‑off between speed and recall.
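For intuition, this is what a Flat (exact) index does under the hood, as a pure‑NumPy sketch with synthetic data (a real deployment would use an HNSW library rather than brute force):

```python
import numpy as np

def flat_topk(query: np.ndarray, corpus: np.ndarray, k: int):
    """Exact (Flat) nearest-neighbor search by cosine similarity.

    O(N * d) per query: fine for small corpora. HNSW/IVF trade a
    little recall for sub-linear query time at scale.
    """
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                     # cosine on normalized vectors
    idx = np.argsort(-scores)[:k]      # highest similarity first
    return [(int(i), float(scores[i])) for i in idx]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))
query = corpus[42] + 0.01 * rng.normal(size=64)  # near-duplicate of doc 42
hits = flat_topk(query, corpus, k=5)
assert hits[0][0] == 42  # exact search must surface the near-duplicate
```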
2.3 Fundamental Weakness of Vector Search
- Strong at recall – it reliably surfaces related content.
- Weak at precision – it may return items that are semantically nearby but actually irrelevant.
Therefore vector search must be combined with:
- Top‑K limits
- Score thresholds
- (Optional) Reranking
3. Keyword Search (BM25): The Precision Layer
Keyword search is not obsolete; its role is deterministic precision. It excels at:
- Code and stack traces
- API names
- Error messages
- Proper nouns, numbers, and IDs
In many technical queries, keyword search outperforms embeddings.
A key benefit is controllability: deterministic matching reduces hallucinations.
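To make BM25 concrete, here is a minimal, untuned sketch of the scoring formula over toy documents (production systems should use a mature inverted‑index engine instead):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Minimal BM25 sketch. docs is a list of token lists.

    Returns one score per document; k1/b are the usual defaults.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "typeerror cannot read property of undefined".split(),
    "handling asynchronous errors in javascript".split(),
]
scores = bm25_scores("typeerror undefined".split(), docs)
assert scores[0] > scores[1]  # exact error tokens win on BM25
```

Note how the query's rare, exact tokens dominate the score: this deterministic behavior is exactly what makes keyword search reliable for error messages and identifiers.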
4. Hybrid Search: The Industry Standard
Hybrid search combines the strengths of both approaches:
- Vector search → semantic recall
- Keyword search → lexical precision
4.1 Parallel Hybrid (Most Common)
```
Vector Search Top‑K = 20
Keyword Search Top‑K = 20
          ↓
    Merge Results
          ↓
        Rerank
```
Advantages
- Simple to implement
- Stable behavior
- Widely used in production
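One common way to merge the two ranked lists is Reciprocal Rank Fusion (RRF), which works on ranks rather than raw scores, so the two channels never need a shared scale. A sketch with made‑up document IDs:

```python
def rrf_merge(vector_hits, keyword_hits, k=60, top_n=20):
    """Reciprocal Rank Fusion: each doc scores 1/(k + rank) per list.

    k=60 is the value from the original RRF paper; treat it as a default.
    """
    fused = {}
    for hits in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(hits, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)[:top_n]

vector_hits = ["d3", "d1", "d7"]   # ranked by cosine similarity
keyword_hits = ["d1", "d9", "d3"]  # ranked by BM25
merged = rrf_merge(vector_hits, keyword_hits)
assert merged[0] == "d1"  # appears high in both lists, so fused to the top
```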
4.2 Score‑Fusion Hybrid
A weighted scoring approach:
Final Score = α × Vector Score + β × BM25 Score
Suitable for search‑engine‑like systems that require a strong global ranking.
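A sketch of this fusion, assuming min‑max normalization to put both channels on a [0, 1] scale first (the α/β values are illustrative, not recommendations):

```python
def fuse_scores(vec_scores, bm25_scores, alpha=0.7, beta=0.3):
    """Weighted score fusion: Final = alpha * vector + beta * BM25.

    BM25 scores are unbounded, so normalize both channels before
    mixing; otherwise one channel silently dominates.
    """
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]
    v, b = minmax(vec_scores), minmax(bm25_scores)
    return [alpha * vi + beta * bi for vi, bi in zip(v, b)]

final = fuse_scores([0.91, 0.80, 0.75], [2.1, 7.4, 0.0])
assert final.index(max(final)) == 0  # strong semantic match still wins at alpha=0.7
```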
5. Top‑K: A Recall Boundary, Not a Quality Guarantee
Misconception: “Higher Top‑K means better results.”
Reality: Top‑K defines the maximum recall scope.
- Large Top‑K → more noise, higher token usage, higher latency.
- Small Top‑K → risk of missing the relevant chunk entirely.
Practical Defaults
| Scenario | Recommended Top‑K |
|---|---|
| FAQ | 3–5 |
| Technical Docs | 5–10 |
| Code Search | 10–20 |
Typical RAG defaults
- Vector Top‑K: 8–10
- Keyword Top‑K: 8–10
6. Score Threshold Filtering: The Missing Safeguard
Top‑K always returns results—even when nothing is relevant. Threshold filtering solves this:
Only keep results where score > threshold
Failure example without thresholds
Query: Apple phone
Result: Apple fruit
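A minimal sketch of the safeguard (file names and scores are invented):

```python
def filter_by_threshold(hits, threshold=0.75):
    """Drop results whose similarity falls below the cutoff.

    Returns a possibly-empty list: an empty answer is better than
    feeding the LLM an irrelevant chunk. Tune the threshold per
    embedding model; 0.75 is only an illustrative starting point.
    """
    return [(doc, score) for doc, score in hits if score > threshold]

hits = [("iphone_specs.md", 0.88), ("apple_fruit_nutrition.md", 0.41)]
kept = filter_by_threshold(hits)
assert kept == [("iphone_specs.md", 0.88)]  # the fruit document is dropped
```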
Threshold Guidelines (Cosine Similarity)
| Similarity | Interpretation |
|---|---|
| > 0.85 | Strongly relevant |
| 0.75–0.85 | Acceptable |
| < 0.75 | Likely irrelevant; filter out |
7. A Recommended Default Pipeline
Putting the stages together, mirroring the flowchart in Section 1:
1. Vector search (Top‑K ≈ 10)
2. Keyword search (Top‑K ≈ 10)
3. Merge the two result lists
4. Apply Top‑K limiting to the merged list
5. Filter by score threshold
6. Rerank to Top 5
7. Send Top 3 to LLM
This pipeline balances recall, precision, cost, and stability.
8. What Engineers Should Actually Focus On
8.1 Recall vs. Precision Trade‑off
Vector Search → Recall
Keyword Search → Precision
Reranker → Final Quality
Understanding this triangle matters more than tuning any single parameter.
8.2 Chunk Design Matters More Than Algorithms
Poor chunking breaks all retrieval strategies:
- Chunks too long → embedding dilution
- Chunks too short → context fragmentation
Good retrieval starts with good chunk boundaries.
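A minimal fixed‑window chunker illustrates the overlap idea (the word counts are illustrative; real systems should prefer semantic boundaries such as headings and paragraphs over raw word counts):

```python
def chunk_text(words, size=200, overlap=40):
    """Fixed-size sliding-window chunking over a token list.

    Overlap keeps sentences near chunk boundaries from losing their
    surrounding context in both neighboring chunks.
    """
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

words = [f"w{i}" for i in range(500)]
chunks = chunk_text(words)
assert len(chunks[0]) == 200
assert chunks[1][0] == "w160"  # second chunk starts `overlap` words back
```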
8.3 Top‑K Is Not the Final Output Size
Typical production flow:
Retrieve 20 → Filter to 12 → Rerank to 5 → LLM consumes 3
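That funnel can be sketched as one function; the `reranker` hook is a hypothetical stand‑in for a cross‑encoder, and the numbers follow the flow above:

```python
def retrieval_pipeline(hits, threshold=0.6, rerank_k=5, llm_k=3, reranker=None):
    """Staged funnel: retrieve -> filter -> rerank -> consume.

    hits: merged hybrid list of (doc, score) pairs, e.g. ~20 entries.
    reranker: optional callable that reorders (doc, score) pairs.
    """
    filtered = [(d, s) for d, s in hits if s > threshold]   # threshold filter
    ranked = sorted(filtered, key=lambda x: x[1], reverse=True)
    candidates = ranked if reranker is None else reranker(ranked)
    return [d for d, _ in candidates[:rerank_k][:llm_k]]    # what the LLM sees

hits = [(f"doc{i}", 1.0 - i * 0.05) for i in range(20)]     # 20 retrieved
context = retrieval_pipeline(hits)
assert context == ["doc0", "doc1", "doc2"]  # only 3 reach the LLM
```

Keeping each stage a separate, inspectable step is the point: you can log how many candidates survive the threshold and catch recall problems before they surface as bad generations.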
Conclusion
Modern retrieval systems are not built on vector search alone.
Hybrid retrieval + threshold filtering + reranking is the real foundation for reliable, controllable AI‑augmented generation.
If you design retrieval with a system mindset instead of a single‑algorithm mindset, quality improves dramatically.