Vectors vs. Keywords: Why 'Close Enough' is Dangerous in MedTech RAG
Source: Dev.to
The Problem with Pure Vector Search in MedTech RAG
RAG (Retrieval‑Augmented Generation) has become the go‑to approach for many applications: documents are embedded, the embeddings are stored in a vector database (e.g., Pinecone, Weaviate, pgvector), and an LLM produces answers from the passages retrieved for each query.
In a Clinical Decision Support prototype, a critical issue emerged: pure vector search is a precision trap. In medicine, returning the wrong document can be dangerous.
Example: Hyperglycemia vs. Hypoglycemia
A clinician asks:
“Protocol for acute hyperglycemia management”
If the system relies solely on semantic (cosine‑similarity) search, it may retrieve a document about hypoglycemia (low blood sugar) because the two terms are close in the embedding space—they appear in similar contexts (blood sugar, insulin, emergencies). The LLM might then summarize the wrong content, leading to an incorrect recommendation (e.g., giving glucose instead of insulin).
Why Vectors Struggle with Certain Linguistic Phenomena
- Antonyms (high/low, hot/cold)
- Proper nouns (drug names such as Celexa vs. Celebrex)
- Negations (“Patient does not have fever”)
These cases require exact lexical matching, which dense vectors alone cannot guarantee.
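One way to picture exact lexical matching is a guard that rejects any retrieved passage missing a critical term from the query. The sketch below is illustrative only: `CRITICAL_TERMS`, `tokenize`, and `passes_lexical_guard` are hypothetical names, the documents are toy strings, and a real system would draw its term list from a curated clinical vocabulary rather than a hard-coded set.

```python
import re

# Hypothetical, hand-picked terms that must never be confused with each other.
CRITICAL_TERMS = {"hyperglycemia", "hypoglycemia", "celexa", "celebrex"}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def passes_lexical_guard(query, document):
    """Return True if every critical term in the query appears verbatim in the document."""
    required = tokenize(query) & CRITICAL_TERMS
    return required <= tokenize(document)


query = "Protocol for acute hyperglycemia management"
docs = [
    "Hyperglycemia management protocol for acute care.",  # toy text
    "Hypoglycemia management protocol for acute care.",   # toy text
]

# Keep only the passages that contain the query's critical terms exactly.
kept = [d for d in docs if passes_lexical_guard(query, d)]
print(kept)
```

A guard like this covers the antonym and drug-name cases; negation is a different problem, because "fever" matches lexically whether or not the patient has one.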
A Hybrid Search Architecture
The remedy is not to discard vectors but to combine them with traditional keyword search (BM25). This hybrid approach leverages the strengths of both:
| Retrieval Type | Strength |
|---|---|
| Dense (vectors) | Captures intent and contextual similarity (high recall) |
| Sparse (keywords/BM25) | Ensures exact term matches (high precision) |
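To see why the sparse side is strict about terms, here is a minimal sketch of Okapi BM25 scoring in plain Python. The function names, the tokenizer, the toy documents, and the `k1` and `b` defaults are assumptions for illustration; in practice you would rely on your search engine's own BM25 implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term the document lacks contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus: the two documents differ only in the critical term.
docs = [
    "hyperglycemia management protocol",
    "hypoglycemia management protocol",
]
scores = bm25_scores("hyperglycemia", docs)
print(scores)
```

The hypoglycemia document scores exactly zero for this query because it shares no token with it, however close the two words sit in an embedding space.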
Simple Hybrid Search Pseudocode (Python)
```python
def hybrid_search(query, alpha=0.5):
    """
    Perform a hybrid search that blends keyword and vector results.

    Parameters
    ----------
    query : str
        The user's search query.
    alpha : float, optional
        Weight between keyword (0) and vector (1) search. Default is 0.5.

    Returns
    -------
    list
        Top combined results.
    """
    # NOTE: vector_db, keyword_db, generate_embedding, and rrf_merge are
    # placeholders for your own clients and helpers, not a real library API.

    # 1. Vector (semantic) search
    vector_results = vector_db.search(
        query_vector=generate_embedding(query),
        limit=50,
    )

    # 2. Keyword (BM25) search
    keyword_results = keyword_db.search(
        query_text=query,
        limit=50,
    )

    # 3. Merge using Reciprocal Rank Fusion (RRF), weighted by alpha
    combined_results = rrf_merge(vector_results, keyword_results, alpha)

    # Return the top-10 fused results
    return combined_results[:10]
```
Many modern vector databases (Weaviate and Pinecone, for example) provide built‑in hybrid search, so you often don’t need to implement the fusion logic from scratch.
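If you do write the merge step yourself, a weighted form of Reciprocal Rank Fusion is short. The sketch below is one possible `rrf_merge`, not a library function: it assumes each result list is a ranked list of document IDs (best first), uses the commonly cited smoothing constant `k=60`, and treats `alpha` as the vector-side weight. The document IDs are made up.

```python
def rrf_merge(vector_results, keyword_results, alpha=0.5, k=60):
    """Fuse two ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    alpha weights the vector list and (1 - alpha) weights the keyword list.
    Each list contributes weight / (k + rank) for every document it contains.
    """
    scores = {}
    for weight, results in ((alpha, vector_results), (1 - alpha, keyword_results)):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical IDs: the vector search ranks the wrong protocol first,
# while the keyword search never returns it at all.
vector_hits = ["hypo_protocol", "hyper_protocol", "insulin_dosing"]
keyword_hits = ["hyper_protocol", "insulin_dosing"]

fused = rrf_merge(vector_hits, keyword_hits)
print(fused)
```

Documents that both retrievers agree on rise to the top, and a document found by only one of them sinks. Setting `alpha=1.0` reduces the result to the plain vector ranking.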
Adding a Cross‑Encoder Reranker
Even after hybrid retrieval, some irrelevant documents may remain. A cross‑encoder reranker (e.g., bge‑reranker or Cohere’s rerank endpoint) can be applied to the top‑N results to score “How relevant is Document A to Query B?”. This step is computationally expensive, so it is performed only on the small subset returned by the hybrid search.
The reranker acts as a final sanity check:
- If the query is about hyperglycemia and the document discusses hypoglycemia, the reranker should assign a low score, so the document falls below the relevance cutoff and is dropped.
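The rerank step can be sketched as a small function that takes any pairwise scorer. Everything here is an assumption for illustration: `rerank`, `toy_score_fn`, the `top_n` and `min_score` defaults, and the sample documents are hypothetical, and the toy scorer is a word-overlap stand-in for a real cross-encoder.

```python
def rerank(query, documents, score_fn, top_n=3, min_score=0.5):
    """Score (query, document) pairs, drop low scorers, and return the best top_n.

    score_fn takes a list of (query, document) pairs and returns one
    relevance score per pair, in the same order.
    """
    pairs = [(query, doc) for doc in documents]
    scores = score_fn(pairs)
    ranked = sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
    return [(doc, score) for doc, score in ranked if score >= min_score][:top_n]


def toy_score_fn(pairs):
    """Stand-in scorer: 1.0 if the document contains every query word, else 0.0."""
    scores = []
    for query, doc in pairs:
        query_words = set(query.lower().split())
        doc_words = set(doc.lower().split())
        scores.append(1.0 if query_words <= doc_words else 0.0)
    return scores


candidates = [
    "hypoglycemia management protocol",
    "hyperglycemia management protocol",
]
results = rerank("hyperglycemia protocol", candidates, toy_score_fn)
print(results)
```

With the sentence-transformers library, the `predict` method of a `CrossEncoder` accepts a list of (query, document) pairs and returns one score per pair, so it could be passed as `score_fn`. The right `min_score` depends on the model's score range and needs tuning against labeled queries.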
Practical Guidance for Medical RAG Systems
- Use vectors for recall – retrieve a broad set of potentially relevant documents.
- Use keywords for precision – filter out false positives such as antonyms or mismatched drug names.
- Apply a reranker – perform a final relevance assessment on the narrowed set.
By respecting the limitations of embeddings and layering additional retrieval signals, you can build safer, more reliable Clinical Decision Support tools.
Happy coding, and may your cosine similarities stay accurate!