Retrieval Strategy Design: Vector, Keyword, and Hybrid Search
Source: Dev.to
Focus: engineering trade‑offs, system architecture, and practical defaults
Audience: Backend engineers familiar with embeddings who want to build reliable, controllable search systems.
1. Where Retrieval Strategy Fits in the System
A typical modern retrieval pipeline looks like this:
```mermaid
flowchart TD
    A[User Query] --> B[Query Rewrite / Intent Analysis]
    B --> C["Multi‑Channel Retrieval<br/>(Vector / Keyword / Metadata)"]
    C --> D[Hybrid Merge]
    D --> E[Top‑K Limiting]
    E --> F[Score Threshold Filtering]
    F --> G["(Optional) Reranking"]
    G --> H[LLM Generation]
```
Concepts such as vector search, hybrid search, Top‑K, and threshold filtering are not isolated features; they work together inside the recall and filtering stages of this pipeline.
2. Vector Search: The Semantic Recall Layer
2.1 What Vector Search Solves
Vector search addresses semantic mismatch:
- The user and the document use different words.
- The meaning is similar, but lexical overlap is low.
Example
Query: How to reduce dopamine addiction
Document: Attention control and dopamine regulation
Keyword search fails here, but embeddings succeed.
2.2 Core Parameters Engineers Must Understand
Similarity Metric
| Metric | Typical Use |
|---|---|
| Cosine Similarity | Industry default (most embedding models are trained assuming cosine) |
| Dot Product | Equivalent to cosine on L2‑normalized vectors; skips normalization at query time |
| L2 Distance | Useful for certain metric‑learning models |
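The relationship between cosine and dot product can be verified in a few lines of NumPy (a minimal sketch; the vectors are made up):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: dot product of the unit-normalized vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction, different magnitude

# Cosine ignores magnitude, so parallel vectors score 1.0.
assert abs(cosine(a, b) - 1.0) < 1e-9

# After L2 normalization, dot product and cosine coincide,
# which is why many stores expose "dot product" over normalized embeddings.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
assert abs(float(np.dot(a_n, b_n)) - cosine(a, b)) < 1e-9
```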
Index Type (Performance‑Critical)
| Index Type | Use Case |
|---|---|
| Flat | Small datasets, maximum accuracy |
| HNSW | General‑purpose, production default |
| IVF | Very large‑scale datasets |
For most knowledge‑base and RAG systems, HNSW offers the best trade‑off between speed and recall.
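For intuition, this is what a Flat (exact) index does under the hood, as a pure‑NumPy sketch with synthetic data (a real deployment would use an HNSW library rather than brute force):

```python
import numpy as np

def flat_topk(query: np.ndarray, corpus: np.ndarray, k: int):
    """Exact (Flat) nearest-neighbor search by cosine similarity.

    O(N * d) per query: fine for small corpora. HNSW/IVF trade a
    little recall for sub-linear query time at scale.
    """
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                     # cosine on normalized vectors
    idx = np.argsort(-scores)[:k]      # highest similarity first
    return [(int(i), float(scores[i])) for i in idx]

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))
query = corpus[42] + 0.01 * rng.normal(size=64)  # near-duplicate of doc 42
hits = flat_topk(query, corpus, k=5)
assert hits[0][0] == 42  # exact search must surface the near-duplicate
```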
2.3 Fundamental Weakness of Vector Search
- Strong at recall – it reliably surfaces related content.
- Weak at precision – it may return items that are semantically nearby but actually irrelevant.
Therefore vector search must be combined with:
- Top‑K limits
- Score thresholds
- (Optional) Reranking
3. Keyword Search (BM25): The Precision Layer
Keyword search is not obsolete; its role is deterministic precision. It excels at:
- Code and stack traces
- API names
- Error messages
- Proper nouns, numbers, and IDs
In many technical queries, keyword search outperforms embeddings.
A key benefit is controllability: deterministic matching reduces hallucinations.
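To make BM25 concrete, here is a minimal, untuned sketch of the scoring formula over toy documents (production systems should use a mature inverted‑index engine instead):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Minimal BM25 sketch. docs is a list of token lists.

    Returns one score per document; k1/b are the usual defaults.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "typeerror cannot read property of undefined".split(),
    "handling asynchronous errors in javascript".split(),
]
scores = bm25_scores("typeerror undefined".split(), docs)
assert scores[0] > scores[1]  # exact error tokens win on BM25
```

Note how the query's rare, exact tokens dominate the score: this deterministic behavior is exactly what makes keyword search reliable for error messages and identifiers.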
4. Hybrid Search: The Industry Standard
Hybrid search combines the strengths of both approaches:
- Vector search → semantic recall
- Keyword search → lexical precision
4.1 Parallel Hybrid (Most Common)
```
Vector Search Top‑K = 20
Keyword Search Top‑K = 20
          ↓
    Merge Results
          ↓
        Rerank
```
Advantages
- Simple to implement
- Stable behavior
- Widely used in production
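One common way to merge the two ranked lists is Reciprocal Rank Fusion (RRF), which works on ranks rather than raw scores, so the two channels never need a shared scale. A sketch with made‑up document IDs:

```python
def rrf_merge(vector_hits, keyword_hits, k=60, top_n=20):
    """Reciprocal Rank Fusion: each doc scores 1/(k + rank) per list.

    k=60 is the value from the original RRF paper; treat it as a default.
    """
    fused = {}
    for hits in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(hits, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)[:top_n]

vector_hits = ["d3", "d1", "d7"]   # ranked by cosine similarity
keyword_hits = ["d1", "d9", "d3"]  # ranked by BM25
merged = rrf_merge(vector_hits, keyword_hits)
assert merged[0] == "d1"  # appears high in both lists, so fused to the top
```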
4.2 Score‑Fusion Hybrid
A weighted scoring approach:
Final Score = α × Vector Score + β × BM25 Score
Suitable for search‑engine‑like systems that require a strong global ranking.
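A sketch of this fusion, assuming min‑max normalization to put both channels on a [0, 1] scale first (the α/β values are illustrative, not recommendations):

```python
def fuse_scores(vec_scores, bm25_scores, alpha=0.7, beta=0.3):
    """Weighted score fusion: Final = alpha * vector + beta * BM25.

    BM25 scores are unbounded, so normalize both channels before
    mixing; otherwise one channel silently dominates.
    """
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]
    v, b = minmax(vec_scores), minmax(bm25_scores)
    return [alpha * vi + beta * bi for vi, bi in zip(v, b)]

final = fuse_scores([0.91, 0.80, 0.75], [2.1, 7.4, 0.0])
assert final.index(max(final)) == 0  # strong semantic match still wins at alpha=0.7
```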
5. Top‑K: A Recall Boundary, Not a Quality Guarantee
Misconception: “Higher Top‑K means better results.”
Reality: Top‑K defines the maximum recall scope.
- Large Top‑K → more noise, higher token usage, higher latency.
- Small Top‑K → risk of missing the relevant chunk entirely.
Practical Defaults
| Scenario | Recommended Top‑K |
|---|---|
| FAQ | 3–5 |
| Technical Docs | 5–10 |
| Code Search | 10–20 |
Typical RAG defaults
- Vector Top‑K: 8–10
- Keyword Top‑K: 8–10
6. Score Threshold Filtering: The Missing Safeguard
Top‑K always returns results—even when nothing is relevant. Threshold filtering solves this:
Only keep results where score > threshold
Failure example without thresholds
Query: Apple phone
Result: Apple fruit
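A minimal sketch of the safeguard (file names and scores are invented):

```python
def filter_by_threshold(hits, threshold=0.75):
    """Drop results whose similarity falls below the cutoff.

    Returns a possibly-empty list: an empty answer is better than
    feeding the LLM an irrelevant chunk. Tune the threshold per
    embedding model; 0.75 is only an illustrative starting point.
    """
    return [(doc, score) for doc, score in hits if score > threshold]

hits = [("iphone_specs.md", 0.88), ("apple_fruit_nutrition.md", 0.41)]
kept = filter_by_threshold(hits)
assert kept == [("iphone_specs.md", 0.88)]  # the fruit document is dropped
```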
Threshold Guidelines (Cosine Similarity)
| Similarity | Interpretation |
|---|---|
| > 0.85 | Strongly relevant |
| 0.75–0.85 | Acceptable |
| < 0.75 | Likely irrelevant; filter out |
7. A Recommended Default Pipeline
Putting the stages together, mirroring the flowchart in Section 1:
1. Vector search (Top‑K ≈ 10)
2. Keyword search (Top‑K ≈ 10)
3. Merge the two result lists
4. Apply Top‑K limiting to the merged list
5. Filter by score threshold
6. Rerank to Top 5
7. Send Top 3 to LLM
This pipeline balances recall, precision, cost, and stability.
8. What Engineers Should Actually Focus On
8.1 Recall vs. Precision Trade‑off
Vector Search → Recall
Keyword Search → Precision
Reranker → Final Quality
Understanding this triangle matters more than tuning any single parameter.
8.2 Chunk Design Matters More Than Algorithms
Poor chunking breaks all retrieval strategies:
- Chunks too long → embedding dilution
- Chunks too short → context fragmentation
Good retrieval starts with good chunk boundaries.
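A minimal fixed‑window chunker illustrates the overlap idea (the word counts are illustrative; real systems should prefer semantic boundaries such as headings and paragraphs over raw word counts):

```python
def chunk_text(words, size=200, overlap=40):
    """Fixed-size sliding-window chunking over a token list.

    Overlap keeps sentences near chunk boundaries from losing their
    surrounding context in both neighboring chunks.
    """
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

words = [f"w{i}" for i in range(500)]
chunks = chunk_text(words)
assert len(chunks[0]) == 200
assert chunks[1][0] == "w160"  # second chunk starts `overlap` words back
```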
8.3 Top‑K Is Not the Final Output Size
Typical production flow:
Retrieve 20 → Filter to 12 → Rerank to 5 → LLM consumes 3
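That funnel can be sketched as one function; the `reranker` hook is a hypothetical stand‑in for a cross‑encoder, and the numbers follow the flow above:

```python
def retrieval_pipeline(hits, threshold=0.6, rerank_k=5, llm_k=3, reranker=None):
    """Staged funnel: retrieve -> filter -> rerank -> consume.

    hits: merged hybrid list of (doc, score) pairs, e.g. ~20 entries.
    reranker: optional callable that reorders (doc, score) pairs.
    """
    filtered = [(d, s) for d, s in hits if s > threshold]   # threshold filter
    ranked = sorted(filtered, key=lambda x: x[1], reverse=True)
    candidates = ranked if reranker is None else reranker(ranked)
    return [d for d, _ in candidates[:rerank_k][:llm_k]]    # what the LLM sees

hits = [(f"doc{i}", 1.0 - i * 0.05) for i in range(20)]     # 20 retrieved
context = retrieval_pipeline(hits)
assert context == ["doc0", "doc1", "doc2"]  # only 3 reach the LLM
```

Keeping each stage a separate, inspectable step is the point: you can log how many candidates survive the threshold and catch recall problems before they surface as bad generations.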
Conclusion
Modern retrieval systems are not built on vector search alone.
Hybrid retrieval + threshold filtering + reranking is the real foundation for reliable, controllable AI‑augmented generation.
If you design retrieval with a system mindset instead of a single‑algorithm mindset, quality improves dramatically.