Retrieval Strategy Design: Vector, Keyword, and Hybrid Search

Published: February 28, 2026, 01:08 AM EST
4 min read
Source: Dev.to

Focus: engineering trade‑offs, system architecture, and practical defaults

Audience: Backend engineers familiar with embeddings who want to build reliable, controllable search systems.

1. Where Retrieval Strategy Fits in the System

A typical modern retrieval pipeline looks like this:

flowchart TD
    A[User Query] --> B[Query Rewrite / Intent Analysis]
    B --> C["Multi‑Channel Retrieval (Vector / Keyword / Metadata)"]
    C --> D[Hybrid Merge]
    D --> E[Top‑K Limiting]
    E --> F[Score Threshold Filtering]
    F --> G["(Optional) Reranking"]
    G --> H[LLM Generation]

Concepts such as vector search, hybrid search, Top‑K, and threshold filtering are not isolated features; they work together inside the recall and filtering stages of this pipeline.

2. Vector Search: The Semantic Recall Layer

2.1 What Vector Search Solves

Vector search addresses semantic mismatch:

  • The user and the document use different words.
  • The meaning is similar, but lexical overlap is low.

Example

Query:    How to reduce dopamine addiction
Document: Attention control and dopamine regulation

Keyword search fails here, but embeddings succeed.

2.2 Core Parameters Engineers Must Understand

Similarity Metric

| Metric | Typical Use |
| --- | --- |
| Cosine Similarity | Industry default (most embedding models are trained assuming cosine) |
| Dot Product | Often used when vectors are L2‑normalized |
| L2 Distance | Useful for certain metric‑learning models |
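The relationship between these metrics can be made concrete with a small pure‑Python sketch (illustrative only; production systems use a vector store or NumPy):

```python
from math import sqrt

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def l2_normalize(v):
    # Scale a vector to unit length.
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Parallel vectors have cosine similarity 1.0 regardless of magnitude.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))

# On L2-normalized vectors, the dot product equals cosine similarity,
# which is why dot product is a common choice over normalized embeddings.
a, b = l2_normalize([1.0, 2.0, 3.0]), l2_normalize([3.0, 0.0, 4.0])
print(sum(x * y for x, y in zip(a, b)))
```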

Index Type (Performance‑Critical)

| Index Type | Use Case |
| --- | --- |
| Flat | Small datasets, maximum accuracy |
| HNSW | General‑purpose, production default |
| IVF | Very large‑scale datasets |

For most knowledge‑base and RAG systems, HNSW offers the best trade‑off between speed and recall.

2.3 Strengths and Weaknesses

Vector search is:

  • Strong at recall – it retrieves semantically related content.
  • Weak at precision – it may return irrelevant but semantically nearby items.

Therefore vector search must be combined with:

  • Top‑K limits
  • Score thresholds
  • (Optional) Reranking

3. Keyword Search (BM25): The Precision Layer

Keyword search is not obsolete; its role is deterministic precision. It excels at:

  • Code and stack traces
  • API names
  • Error messages
  • Proper nouns, numbers, and IDs

In many technical queries, keyword search outperforms embeddings.
A key benefit is controllability: deterministic matching reduces hallucinations.
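BM25's behavior is easy to see in a minimal sketch. This is an illustrative implementation, not a tuned one; real systems use an engine such as Elasticsearch or a library like `rank_bm25`:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    # Minimal BM25: score one tokenized document against a query.
    # corpus: list of tokenized documents, used for IDF and average length.
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        df = sum(1 for d in corpus if term in d)
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # +1 keeps IDF positive
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [
    ["connection", "timeout", "error", "in", "http", "client"],
    ["dopamine", "regulation", "and", "attention", "control"],
]
# Exact-term queries (error messages, identifiers) match deterministically:
print(bm25_score(["timeout", "error"], corpus[0], corpus))  # positive score
print(bm25_score(["timeout", "error"], corpus[1], corpus))  # 0.0: no term overlap
```

Note the flip side of this determinism: a document with zero lexical overlap scores exactly zero, which is precisely the gap vector search fills.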

4. Hybrid Search: The Industry Standard

Hybrid search combines the strengths of both approaches:

  • Vector search → semantic recall
  • Keyword search → lexical precision

4.1 Parallel Hybrid (Most Common)

Vector Search Top‑K = 20
Keyword Search Top‑K = 20

Merge Results

Rerank

Advantages

  • Simple to implement
  • Stable behavior
  • Widely used in production
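A parallel hybrid merge can be sketched as follows. The channel functions and `[(doc_id, score)]` return shape are assumptions for illustration, and the `max` merge presumes both channels already emit scores normalized to [0, 1] (raw BM25 and cosine scores are not directly comparable; see the score fusion in 4.2):

```python
def parallel_hybrid(query, vector_search, keyword_search, k=20):
    # Run both channels with the same Top-K, then merge by document id,
    # keeping each document's best score across channels.
    merged = {}
    for doc_id, score in vector_search(query, k) + keyword_search(query, k):
        merged[doc_id] = max(merged.get(doc_id, 0.0), score)
    # Deduplicated union; a reranker typically orders the final list.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# Stub channels standing in for a real vector store and BM25 index.
vec = lambda q, k: [("doc1", 0.91), ("doc2", 0.84)]
kw = lambda q, k: [("doc2", 0.95), ("doc3", 0.70)]
print(parallel_hybrid("example query", vec, kw))
# → [('doc2', 0.95), ('doc1', 0.91), ('doc3', 0.7)]
```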

4.2 Score‑Fusion Hybrid

A weighted scoring approach:

Final Score = α × Vector Score + β × BM25 Score

Suitable for search‑engine‑like systems that require a strong global ranking.
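Because cosine similarities and BM25 scores live on different scales, score fusion needs a normalization step before the weighted sum. A common sketch (min‑max normalization is one of several options; the α = 0.7 / β = 0.3 split is an illustrative default, not a recommendation):

```python
def minmax(scores):
    # Normalize a score list to [0, 1] so channels become comparable.
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def fuse(vec_hits, bm25_hits, alpha=0.7, beta=0.3):
    # Final Score = alpha * Vector Score + beta * BM25 Score,
    # computed over the union of both lists (missing channel counts as 0).
    vec = dict(zip((d for d, _ in vec_hits), minmax([s for _, s in vec_hits])))
    bm = dict(zip((d for d, _ in bm25_hits), minmax([s for _, s in bm25_hits])))
    fused = {d: alpha * vec.get(d, 0.0) + beta * bm.get(d, 0.0)
             for d in set(vec) | set(bm)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

print(fuse([("a", 0.9), ("b", 0.5)], [("b", 12.0), ("c", 3.0)]))
```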

5. Top‑K: A Recall Boundary, Not a Quality Guarantee

Misconception: “Higher Top‑K means better results.”

Reality: Top‑K defines the maximum recall scope.

  • Large Top‑K → more noise, higher token usage, higher latency.

Practical Defaults

| Scenario | Recommended Top‑K |
| --- | --- |
| FAQ | 3–5 |
| Technical Docs | 5–10 |
| Code Search | 10–20 |

Typical RAG defaults

  • Vector Top‑K: 8–10
  • Keyword Top‑K: 8–10

6. Score Threshold Filtering: The Missing Safeguard

Top‑K always returns results—even when nothing is relevant. Threshold filtering solves this:

Only keep results where score > threshold

Failure example without thresholds

Query:  Apple phone
Result: Apple fruit
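The fix is a one-line gate. The threshold value below is an illustrative default; the right cutoff depends on the embedding model:

```python
def filter_by_threshold(hits, threshold=0.75):
    # Keep only results above the similarity threshold. Returning an
    # empty list is the correct answer when nothing relevant exists.
    return [(doc, score) for doc, score in hits if score > threshold]

hits = [("Apple iPhone battery guide", 0.88), ("Apple fruit nutrition facts", 0.62)]
print(filter_by_threshold(hits))  # the fruit document is dropped
```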

Threshold Guidelines (Cosine Similarity)

| Similarity | Interpretation |
| --- | --- |
| > 0.85 | Strongly relevant |
| 0.75–0.85 | Acceptable |
| < 0.75 | Likely irrelevant – discard |

7. A Practical Default Pipeline

1. Rewrite the query
2. Vector Search, Top‑K = 20
3. Keyword Search, Top‑K = 20
4. Merge the two result lists
5. Apply score threshold filtering
6. Rerank Top 5
7. Send Top 3 to LLM

This pipeline balances recall, precision, cost, and stability.

8. What Engineers Should Actually Focus On

8.1 Recall vs. Precision Trade‑off

Vector Search → Recall
Keyword Search → Precision
Reranker      → Final Quality

Understanding this triangle matters more than tuning any single parameter.

8.2 Chunk Design Matters More Than Algorithms

Poor chunking breaks all retrieval strategies:

  • Chunks too long → embedding dilution
  • Chunks too short → context fragmentation

Good retrieval starts with good chunk boundaries.
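A minimal chunker shows the trade‑off the two bullets describe. This is a naive fixed-window sketch with made-up defaults; real pipelines usually split on semantic boundaries (headings, paragraphs) first and fall back to fixed windows:

```python
def chunk_text(text, max_chars=800, overlap=100):
    # Fixed-window chunking: max_chars bounds embedding dilution,
    # overlap limits context fragmentation at chunk boundaries.
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # carry shared context into the next chunk
    return chunks
```

Raising `max_chars` trades fragmentation for dilution and vice versa, which is why chunk tuning dominates algorithm tuning in practice.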

8.3 Top‑K Is Not the Final Output Size

Typical production flow:

Retrieve 20 → Filter to 12 → Rerank to 5 → LLM consumes 3
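That funnel can be sketched end to end. The `search` and `rerank` callables are placeholders for a real retriever and reranker, and the numbers mirror the flow above rather than universal defaults:

```python
def retrieval_pipeline(query, search, rerank, threshold=0.75):
    # Retrieve wide, then narrow at each stage.
    candidates = search(query, k=20)                         # recall boundary
    kept = [(d, s) for d, s in candidates if s > threshold]  # precision gate
    top5 = rerank(query, kept)[:5]                           # quality ordering
    return top5[:3]                                          # LLM context budget

# Stubs standing in for a real retriever and reranker.
search = lambda q, k: [(f"doc{i}", 1.0 - i * 0.05) for i in range(k)]
rerank = lambda q, hits: sorted(hits, key=lambda kv: kv[1], reverse=True)
print(retrieval_pipeline("example", search, rerank))  # exactly 3 results
```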

Conclusion

Modern retrieval systems are not built on vector search alone.

Hybrid retrieval + threshold filtering + reranking is the real foundation for reliable, controllable AI‑augmented generation.

If you design retrieval with a system mindset instead of a single‑algorithm mindset, quality improves dramatically.
