Vectors vs. Keywords: Why 'Close Enough' is Dangerous in MedTech RAG

Published: (December 31, 2025 at 11:00 AM EST)
3 min read
Source: Dev.to

The Problem with Pure Vector Search in MedTech RAG

RAG (Retrieval‑Augmented Generation) has become the go‑to approach for many applications: documents are embedded, the embeddings are stored in a vector database (e.g., Pinecone, Weaviate, pgvector), the chunks nearest to the query are retrieved, and an LLM generates an answer from them.

In a Clinical Decision Support prototype, a critical issue emerged: pure vector search is a precision trap. In medicine, returning the wrong document can be dangerous.

Example: Hyperglycemia vs. Hypoglycemia

A clinician asks:

“Protocol for acute hyperglycemia management”

If the system relies solely on semantic (cosine‑similarity) search, it may retrieve a document about hypoglycemia (low blood sugar) instead of hyperglycemia (high blood sugar). The two terms sit close together in the embedding space because they appear in similar contexts (blood sugar, insulin, emergencies). The LLM might then summarize the wrong content and produce an incorrect recommendation (e.g., giving glucose instead of insulin).
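The effect can be illustrated with a toy example. The sketch below is not a real embedding model: it builds bag-of-context-words vectors from three invented context strings (all hypothetical) and compares them with cosine similarity. Because the two opposite conditions share almost all of their context words, their vectors end up far closer to each other than to an unrelated term.

```python
import math
from collections import Counter

# Invented context words for each term; a stand-in for what an
# embedding model would learn from real text (assumption, toy data).
CONTEXTS = {
    "hyperglycemia": "blood sugar insulin emergency glucose high",
    "hypoglycemia": "blood sugar insulin emergency glucose low",
    "fracture": "bone cast x-ray orthopedic pain",
}

def context_vector(text: str) -> Counter:
    """Count each context word; the Counter acts as a sparse vector."""
    return Counter(text.split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vectors = {term: context_vector(ctx) for term, ctx in CONTEXTS.items()}
sim_opposites = cosine(vectors["hyperglycemia"], vectors["hypoglycemia"])
sim_unrelated = cosine(vectors["hyperglycemia"], vectors["fracture"])

print(f"hyperglycemia vs hypoglycemia: {sim_opposites:.2f}")  # 0.83
print(f"hyperglycemia vs fracture:     {sim_unrelated:.2f}")  # 0.00
```

Real embedding models are far richer than word counts, but the same mechanism (shared context pulls vectors together) is what makes clinically opposite terms look similar.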

Why Vectors Struggle with Certain Linguistic Phenomena

  • Antonyms (high/low, hot/cold)
  • Proper nouns (drug names such as Celexa vs. Celebrex)
  • Negations (“Patient does not have fever”)

These cases require exact lexical matching, which dense vectors alone cannot guarantee.
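As a minimal sketch of what exact lexical matching buys you, the check below tokenizes a document without stemming or fuzzy matching and tests whether every critical term appears verbatim. The helper names (`tokens`, `mentions_all`) and the idea of passing a list of critical terms are assumptions for illustration, not part of any library. Note that this covers look-alike terms only; negation would need additional handling.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word tokens; no stemming, so near-spellings stay distinct."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def mentions_all(document: str, critical_terms: list[str]) -> bool:
    """True only if every critical term appears as an exact token."""
    doc_tokens = tokens(document)
    return all(term.lower() in doc_tokens for term in critical_terms)

print(mentions_all("Protocol for acute hypoglycemia management", ["hyperglycemia"]))   # False
print(mentions_all("Protocol for acute hyperglycemia management", ["hyperglycemia"]))  # True
print(mentions_all("Celebrex dosing notes", ["Celexa"]))                               # False
```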

A Hybrid Search Architecture

The remedy is not to discard vectors but to combine them with traditional keyword search (BM25). This hybrid approach leverages the strengths of both:

Retrieval Type            Strength
Dense (vectors)           Captures intent and contextual similarity (high recall)
Sparse (keywords/BM25)    Ensures exact term matches (high precision)
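To make the sparse row concrete, here is a small self-contained BM25 scorer. It is a sketch under stated assumptions: a naive regex tokenizer, the common defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant `ln(1 + (N - n + 0.5) / (n + 0.5))`. A production system would normally rely on the search engine's own BM25 rather than code like this.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, documents: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document, in input order."""
    doc_tokens = [tokenize(doc) for doc in documents]
    n_docs = len(doc_tokens)
    avg_len = sum(len(t) for t in doc_tokens) / n_docs if n_docs else 0.0

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for toks in doc_tokens:
        doc_freq.update(set(toks))

    query_terms = set(tokenize(query))
    scores = []
    for toks in doc_tokens:
        term_freq = Counter(toks)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue  # only exact token matches contribute
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "Protocol for acute hypoglycemia management",
    "Protocol for acute hyperglycemia management",
    "Fracture care guidelines",
]
scores = bm25_scores("acute hyperglycemia", docs)
best = docs[scores.index(max(scores))]
print(best)  # Protocol for acute hyperglycemia management
```

The hypoglycemia document still earns a small score for the shared word "acute", but only the document containing the exact token "hyperglycemia" gets credit for it.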

Simple Hybrid Search Pseudocode (Python)

def hybrid_search(query, alpha=0.5):
    """
    Perform a hybrid search that blends keyword and vector results.

    Parameters
    ----------
    query : str
        The user’s search query.
    alpha : float, optional
        Weight between keyword (0) and vector (1) search. Default is 0.5.

    Returns
    -------
    list
        Top combined results.
    """
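    # NOTE: vector_db, keyword_db, generate_embedding, and rrf_merge are
    # placeholders for your own clients and helpers; they are not defined here.

    # 1. Vector (semantic) search: cast a wide net of 50 candidates
    vector_results = vector_db.search(
        query_vector=generate_embedding(query),
        limit=50
    )

    # 2. Keyword (BM25) search over the same corpus
    keyword_results = keyword_db.search(
        query_text=query,
        limit=50
    )

    # 3. Merge using Reciprocal Rank Fusion (RRF); alpha weights the two
    #    ranked lists (standard RRF weights them equally)
    combined_results = rrf_merge(vector_results, keyword_results, alpha)

    # Return the top 10 fused results
    return combined_results[:10]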
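The `rrf_merge` helper called above is not defined in the pseudocode. One possible sketch follows, assuming each result list is an ordered list of hashable document IDs (best first) and that `alpha` scales the vector list's contribution while `1 - alpha` scales the keyword list's. The constant `k=60` is the value commonly used with RRF; both the weighting scheme and the signature are assumptions, not the author's implementation.

```python
from collections import defaultdict

def rrf_merge(vector_results, keyword_results, alpha=0.5, k=60):
    """Weighted Reciprocal Rank Fusion over two ranked lists of doc IDs.

    Each document receives weight / (k + rank) from every list it
    appears in, where rank starts at 1. Returns IDs sorted best first.
    """
    scores = defaultdict(float)
    for rank, doc_id in enumerate(vector_results, start=1):
        scores[doc_id] += alpha / (k + rank)
    for rank, doc_id in enumerate(keyword_results, start=1):
        scores[doc_id] += (1 - alpha) / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Vector search ranks the wrong document first; keyword search never returns it.
vector_hits = ["doc_hypo", "doc_hyper", "doc_other"]
keyword_hits = ["doc_hyper", "doc_other"]

fused = rrf_merge(vector_hits, keyword_hits, alpha=0.5)
print(fused)  # ['doc_hyper', 'doc_other', 'doc_hypo']
```

Because RRF works on ranks rather than raw scores, it avoids having to normalize cosine similarities against BM25 scores, which live on different scales.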

Most modern vector databases already provide built‑in hybrid capabilities, so you often don’t need to implement the logic from scratch.

Adding a Cross‑Encoder Reranker

Even after hybrid retrieval, some irrelevant documents may remain. A cross‑encoder reranker (e.g., bge‑reranker or Cohere’s rerank endpoint) can be applied to the top‑N results to score “How relevant is Document A to Query B?”. This step is computationally expensive, so it is performed only on the small subset returned by the hybrid search.

The reranker acts as a final sanity check:

  • If the query is about hyperglycemia and the document discusses hypoglycemia, the reranker assigns a low score, effectively filtering it out.
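The reranking step can be sketched as a small function that scores (query, document) pairs and keeps the best ones. The scorer is passed in as a callable so the example runs without downloading a model; `rerank`, `score_pairs`, `min_score`, and the word-overlap stub are all hypothetical names for illustration. With the sentence-transformers package installed, `CrossEncoder("BAAI/bge-reranker-base").predict` accepts a list of such pairs and could be supplied as `score_pairs` (an assumption about your setup, not something the stub reproduces).

```python
from typing import Callable, Optional, Sequence

PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]

def rerank(query: str, documents: Sequence[str], score_pairs: PairScorer,
           top_n: int = 5, min_score: Optional[float] = None) -> list[tuple[str, float]]:
    """Score every (query, document) pair and return the best top_n.

    Documents scoring below min_score (if given) are dropped entirely.
    """
    pairs = [(query, doc) for doc in documents]
    scores = score_pairs(pairs) if pairs else []
    ranked = sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
    if min_score is not None:
        ranked = [(doc, score) for doc, score in ranked if score >= min_score]
    return [(doc, float(score)) for doc, score in ranked[:top_n]]

def overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Stub scorer: fraction of query words found in the document.
    A real cross-encoder would replace this."""
    results = []
    for query, doc in pairs:
        query_words = set(query.lower().split())
        doc_words = set(doc.lower().split())
        results.append(len(query_words & doc_words) / len(query_words))
    return results

candidates = [
    "acute hypoglycemia protocol",
    "acute hyperglycemia protocol",
    "fracture care",
]
kept = rerank("acute hyperglycemia protocol", candidates, overlap_scorer,
              top_n=2, min_score=0.9)
print(kept)  # [('acute hyperglycemia protocol', 1.0)]
```

Applying a score threshold in addition to `top_n` means a weak candidate is dropped rather than passed to the LLM just because a slot was free.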

Practical Guidance for Medical RAG Systems

  1. Use vectors for recall – retrieve a broad set of potentially relevant documents.
  2. Use keywords for precision – filter out false positives such as antonyms or mismatched drug names.
  3. Apply a reranker – perform a final relevance assessment on the narrowed set.
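The three steps above can be chained as a single function. The sketch below only shows the ordering of the stages: every stage is an injected callable, and the stand-ins used in the demo (a fixed candidate list, a filter driven by a hypothetical `CRITICAL_TERMS` vocabulary, and a word-overlap ranking) are placeholders for a real vector search, keyword filter, and cross-encoder.

```python
def retrieve_for_query(query, recall_stage, precision_filter, rerank_stage, final_k=3):
    """Run recall, then precision filtering, then reranking."""
    candidates = recall_stage(query)                                     # 1. broad recall
    kept = [doc for doc in candidates if precision_filter(query, doc)]   # 2. precision
    return rerank_stage(query, kept)[:final_k]                           # 3. final ordering

# --- Stand-in stages (assumptions for the demo only) ---
CRITICAL_TERMS = {"hyperglycemia", "hypoglycemia"}  # hypothetical curated vocabulary

def words(text):
    return set(text.lower().split())

def recall_stage(query):
    # Pretend vector search: returns semantically "close" documents.
    return [
        "acute hypoglycemia management",
        "acute hyperglycemia management",
        "chronic hyperglycemia follow-up",
    ]

def precision_filter(query, doc):
    # Every critical term in the query must appear verbatim in the document.
    required = words(query) & CRITICAL_TERMS
    return required <= words(doc)

def rerank_stage(query, docs):
    # Pretend reranker: order by number of shared words.
    return sorted(docs, key=lambda doc: len(words(query) & words(doc)), reverse=True)

results = retrieve_for_query(
    "protocol for acute hyperglycemia management",
    recall_stage, precision_filter, rerank_stage,
)
print(results)
# ['acute hyperglycemia management', 'chronic hyperglycemia follow-up']
```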

By respecting the limitations of embeddings and layering additional retrieval signals, you can build safer, more reliable Clinical Decision Support tools.


Happy coding, and may your cosine similarities stay accurate!
