What Is Hybrid Search in RAG?

Published: April 30, 2026 at 09:47 AM EDT

Source: Dev.to

The Need for Hybrid Search

Suppose we have documents that list Python error codes with their definitions and use cases, and a user asks: “What is ERR_404_AUTH?”

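Classic RAG: Embeds the query and retrieves whatever authentication- and error-related context sits nearest in the vector DB (document embeddings), which may or may not include the exact code.
Lexical search: Matches the literal query terms ["What", "is", "ERR_404_AUTH"], so it finds the exact code but has no notion of meaning.
Hybrid search: Matches the keyword "ERR_404_AUTH" and also retrieves semantically similar documents through similarity search, then combines both result sets.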
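To make the lexical half concrete, here is a minimal sketch in plain Python. The three documents, the `tokenize` helper, and the tiny stopword set are all invented for illustration; a real pipeline would use a proper retriever rather than this loop.

```python
import re

# Hypothetical documents, invented purely for illustration.
docs = [
    "ERR_404_AUTH: raised when the auth token refers to a user that no longer exists.",
    "ERR_500_DB: raised when the database connection pool is exhausted.",
    "Authentication errors usually mean the credentials are missing or expired.",
]

def tokenize(text):
    # Lowercase and keep word-like tokens, so "ERR_404_AUTH?" matches "ERR_404_AUTH:".
    return re.findall(r"[a-z0-9_]+", text.lower())

def keyword_hits(query, documents, stopwords=frozenset({"what", "is"})):
    # Keep only the informative query terms, then return documents sharing any of them.
    terms = set(tokenize(query)) - stopwords
    return [d for d in documents if terms & set(tokenize(d))]

hits = keyword_hits("What is ERR_404_AUTH?", docs)
print(hits)
```

Only the first document shares the exact token with the query. The third document is the kind of result a similarity search would add, which is why hybrid search runs both.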

Using BM25

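Think of BM25 as a refinement of TF‑IDF for keyword‑based search: it adds term‑frequency saturation and document‑length normalization.
LangChain provides a built‑in BM25Retriever (backed by the rank_bm25 package), making the implementation straightforward.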

# pip install rank_bm25 langchain-community
from langchain_community.retrievers import BM25Retriever
from langchain_core.documents import Document

# Chunks from your text splitter
chunks = [
    Document(page_content="The AX-705 engine uses a 4-stroke cycle."),
    Document(page_content="Maintenance for AX-705 requires synthetic oil."),
    Document(page_content="Four-stroke engines are common in modern cars.")
]

# Build the BM25 index (tokenizes each chunk and precomputes term statistics)
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 2  # Retrieve top 2

Creating the Hybrid “Ensemble”

To combine exact keyword matching with semantic meaning, merge a vector retriever with the BM25 retriever.

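# Note: depending on your LangChain version, this import may instead live
# under `langchain_classic.retrievers`.
from langchain.retrievers import EnsembleRetriever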

# Assume `chroma_retriever` is already created from your vector store
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, chroma_retriever],
    weights=[0.3, 0.7]  # 30% weight on the keyword ranking, 70% on the semantic ranking
)

The 4 Steps of BM25 (Under the Hood)

When you call hybrid_retriever.invoke("AX-705 engine"), the BM25 component follows these steps:

Tokenization – The query “AX-705 engine” is split on whitespace into ["AX-705", "engine"]. BM25Retriever's default preprocessing does not lowercase or strip punctuation; pass a custom preprocess_func if you need that.
Lookup – The retriever finds the document chunks that contain these exact tokens (conceptually, an inverted‑index lookup).
Scoring (f(q, d)) – A BM25 score is computed for each match, considering:
- Rareness: Terms that appear in few chunks, such as a product code like “AX-705” in a large corpus, receive higher weight than common terms like “engine”.
- Saturation: Repeating a term many times does not linearly increase the score, which blunts keyword stuffing.
- Length Penalty: Term counts are normalized by chunk length, so a short, focused chunk ranks higher than a very long one containing the same terms.
Ranking – Returns the top k chunks sorted by BM25 score.
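The four steps above can be sketched in plain Python. This is a simplified illustration, not the code that rank_bm25 or LangChain actually runs: the `bm25_scores` function, the parameter values `k1=1.5` and `b=0.75`, and the non‑negative IDF variant are assumptions chosen for readability.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 variant."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # Lookup: only exact token matches contribute
            # Rareness: terms found in fewer documents get a larger IDF
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturation (k1) and length penalty (b) both live in the denominator
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

corpus = [
    "The AX-705 engine uses a 4-stroke cycle.",
    "Maintenance for AX-705 requires synthetic oil.",
    "Four-stroke engines are common in modern cars.",
]
tokenized = [doc.split() for doc in corpus]  # Tokenization: whitespace split
scores = bm25_scores("AX-705 engine".split(), tokenized)

# Ranking: document indices sorted by descending score
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

The first chunk wins because it contains both query tokens, and the third scores zero because "engines" is a different token from "engine" under whitespace splitting. In this three‑chunk corpus "engine" is actually the rarer token, so the usual intuition about rare product codes only shows up on larger collections.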

Next Steps: Reciprocal Rank Fusion (RRF)

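When the EnsembleRetriever obtains separate ranked lists from BM25 and the vector retriever, it must merge them. Because BM25 scores and vector similarity scores are on different scales, it uses a weighted form of RRF instead of comparing raw scores.

Logic: RRF looks at the rank (position) of a document in each list rather than the raw scores. Each list contributes roughly weight / (c + rank), where c is a smoothing constant (commonly 60).
Intuition: A document ranked #1 by BM25 but #50 by the vector search can still land near the top of the combined list, because its strong keyword match contributes a large reciprocal‑rank term.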
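A minimal sketch of weighted RRF follows, assuming the common constant c = 60 and the 0.3 / 0.7 weights used earlier. The function `weighted_rrf` and the document IDs are hypothetical; this illustrates the idea rather than reproducing LangChain's implementation.

```python
def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document IDs using weighted reciprocal ranks."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the position matters; raw retriever scores are never used
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked results from each retriever, best first
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]

fused = weighted_rrf([bm25_ranking, vector_ranking], weights=[0.3, 0.7])
print(fused)
```

Documents that appear in both lists (doc_c and doc_a) rise to the top, while doc_b, which only the lower‑weighted BM25 list returned, ends up last.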

These techniques together enable effective hybrid search in Retrieval‑Augmented Generation (RAG) pipelines.
