What is Hybrid search in RAGs?
Source: Dev.to
Why Hybrid Search Is Needed
Suppose we have documents that list Python error codes along with their definitions and use cases, and a user asks: “What is ERR_404_AUTH?”
- Classic RAG: Retrieves all authentication and error‑related context it can find from a vector DB (document embeddings).
- Lexical search: Searches for the terms ["What", "is", "ERR_404_AUTH"].
- Hybrid search: Searches for the keyword "ERR_404_AUTH" and retrieves semantically similar documents using similarity search.
Using BM25
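Think of BM25 as an extension of TF‑IDF for keyword‑based search: it keeps the idea of weighting rare terms more heavily, and adds term‑frequency saturation and document‑length normalization.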
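For reference, the commonly cited Okapi BM25 scoring function is shown below. This is the textbook form, not necessarily the exact variant a given library implements (IDF definitions and default parameters differ between implementations). Here f(t, d) is the count of term t in chunk d, |d| is the chunk length in tokens, avgdl is the average chunk length, and k_1 and b are tuning parameters.

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{IDF}(t) \cdot
  \frac{f(t, d)\,(k_1 + 1)}
       {f(t, d) + k_1 \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)}
```

The k_1 term controls how quickly repeated occurrences stop adding to the score, and b controls how strongly long chunks are penalized.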
LangChain provides a built‑in BM25Retriever, making the implementation straightforward.
# pip install rank_bm25 langchain-community
from langchain_community.retrievers import BM25Retriever
from langchain_core.documents import Document
# Chunks from your text splitter
chunks = [
    Document(page_content="The AX-705 engine uses a 4-stroke cycle."),
    Document(page_content="Maintenance for AX-705 requires synthetic oil."),
    Document(page_content="Four-stroke engines are common in modern cars."),
]
# Build the BM25 index (the inverted index)
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 2  # Retrieve the top 2 chunks
Creating the Hybrid “Ensemble”
To combine exact keyword matching with semantic meaning, merge a vector retriever with the BM25 retriever.
from langchain.retrievers import EnsembleRetriever
# Assume `chroma_retriever` is already created from your vector store
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, chroma_retriever],
    weights=[0.3, 0.7],  # 30% importance to keywords, 70% to meaning
)
The 4 Steps of BM25 (Under the Hood)
When you call hybrid_retriever.invoke("AX-705 engine"), the BM25 component follows these steps:
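- Tokenization – The query “AX-705 engine” is split on whitespace into ["AX-705", "engine"]. LangChain's default preprocessing does not lowercase or strip punctuation, so matching is case‑sensitive unless you pass a custom preprocess_func.
- Lookup – The retriever checks its inverted index (a dictionary) for document chunks containing these exact strings.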
- Scoring (f(q, d)) – A BM25 score is computed for each match, considering:
- Rareness: Rare terms like “AX-705” receive higher weight than common terms like “engine”.
- Saturation: Repeating a term many times does not linearly increase the score, preventing keyword stuffing.
- Length Penalty: Shorter, focused chunks rank higher than very long ones that contain the same terms.
- Ranking – Returns a list of chunks sorted by the BM25 score.
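The four steps can be sketched in plain Python. This is a minimal illustration, not the code that BM25Retriever or rank_bm25 actually runs: the `tokenize` helper (lowercasing and stripping trailing punctuation), the non-negative IDF variant, and the defaults `k1=1.5` and `b=0.75` are all assumptions chosen for readability.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace split, lowercased, edge punctuation stripped.
    return [t.strip(".,?!").lower() for t in text.split()]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document (illustrative sketch)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many chunks does each term appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # Lookup: only exact token matches contribute
            # Rareness: terms found in fewer chunks get a larger IDF.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturation (k1) and length penalty (b) live in the denominator.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "The AX-705 engine uses a 4-stroke cycle.",
    "Maintenance for AX-705 requires synthetic oil.",
    "Four-stroke engines are common in modern cars.",
]
scores = bm25_scores("AX-705 engine", docs)
# Ranking: chunk indices sorted by descending score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

With these three chunks, the first chunk matches both query terms and ranks first, the second matches only "AX-705", and the third scores zero because "engines" is not an exact match for "engine". Rareness is measured relative to the indexed corpus, so IDF values from a three-chunk toy set are not representative of a real collection.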
Next Steps: Reciprocal Rank Fusion (RRF)
When the EnsembleRetriever obtains separate BM25 and vector lists, it must combine them. Because their scores are on different scales, RRF is used.
- Logic: RRF looks at the rank (position) of a document in each list rather than the raw scores.
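- Intuition: A document ranked #1 by BM25 but #50 by the vector search still earns a sizable contribution from its top BM25 rank, so a perfect keyword match is not buried by a weak semantic match. How much that contribution counts depends on the retriever weights.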
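A rank-based fusion of this kind can be sketched in a few lines. The sketch below assumes a weighted form of RRF in which each list contributes `weight / (rank + c)` per document, with `c = 60` as the smoothing constant commonly used for RRF. The function name `weighted_rrf` and the document IDs are hypothetical, and this is not presented as EnsembleRetriever's exact implementation.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of doc IDs using weighted reciprocal rank."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        # Ranks start at 1; only the position matters, not the raw score.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from the two retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_b", "doc_a"]

fused = weighted_rrf([bm25_ranking, vector_ranking], weights=[0.3, 0.7])
```

Here the two retrievers disagree completely, and the 0.7 weight on the vector list tips the fused order toward its ranking: doc_c first, then doc_b, then doc_a. Because only positions are used, the BM25 and similarity scores never need to be normalized against each other.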
These techniques together enable effective hybrid search in Retrieval‑Augmented Generation (RAG) pipelines.