I Built Vector-Only Search First. Here's Why I Had to Rewrite It.

Published: February 20, 2026 at 01:32 AM EST
4 min read
Source: Dev.to

I spent three weeks building a pure vector search for an e‑commerce product catalog, embedding everything with intfloat/multilingual-e5-large, storing the vectors in Qdrant, and running a few test queries.

  • “Gift for someone who likes cooking” → kitchen knives, spice sets – ✅
  • “Nike Air Max 90 black” → Adidas running shoes – ❌
  • “XJ‑4520” (a real SKU) → random kitchen appliance – ❌

The engine understood intent but failed on the simplest exact‑match lookups.

Why Vector‑Only Search Fails

Descriptive queries work

Embeddings map text into a high‑dimensional space where semantically similar meanings cluster. A query like “gift for someone who likes cooking” lands near knives, cookbooks, and spice sets even though the word gift never appears in the product titles.

SKUs and model numbers

A SKU such as XJ‑4520 is just a meaningless string to an embedding model. It gets projected somewhere in vector space, and the nearest neighbors are whatever other meaningless strings happen to be nearby. In practice, SKU lookups almost never return the correct product.

Brand + attribute combos

“Nike Air Max 90 black size 42” should return a single product. Vector search returned Nike items, but also Adidas and Puma shoes, because they are all semantically “athletic shoes.” The exact match often fell to page 2.

Numeric filters

Queries like “under $50” or “500 ml bottle” are not understood as numeric constraints. The model knows that 500 ml is related to bottle and liquid, but it won’t filter by the actual number.
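One workaround is to pull numeric constraints out of the query before embedding it, and apply them as hard filters instead. A minimal sketch, assuming a generic `{"price": {"lte": ...}}` filter shape (the actual filter syntax depends on your search backend, e.g. Qdrant's payload filters) and covering only the "under $X" pattern:

```python
import re

def extract_price_filter(query: str) -> tuple[str, dict]:
    """Pull 'under $50'-style constraints out of the query so they can be
    applied as a hard filter instead of being fed to the embedding model.

    The regex covers one illustrative pattern, not every phrasing.
    """
    match = re.search(r"under \$?(\d+(?:\.\d+)?)", query, re.IGNORECASE)
    if match:
        price = float(match.group(1))
        cleaned = query.replace(match.group(0), "").strip()
        return cleaned, {"price": {"lte": price}}
    return query, {}
```

The cleaned query goes to both engines; the filter is applied at retrieval time, so "under $50" actually excludes a $60 item instead of merely pushing it down the ranking.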

Short, specific queries

A single token such as “Bosch” yields random power tools with vector search, whereas a BM25 index would return all Bosch products ranked by relevance.

The Fix: Hybrid Search

Parallel execution

Run a BM25 index and a vector index against the same catalog, then merge the results. BM25 excels at exact matches (SKUs, brand names, specific attributes) while vectors handle descriptive, intent‑based, and cross‑language queries.
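Since the two retrievals are independent, they can run concurrently. A minimal sketch with `asyncio`, assuming `bm25_search` and `vector_search` are async callables (their implementations depend on your backend) that each return a `{product_id: score}` dict:

```python
import asyncio

async def hybrid_search(query: str, bm25_search, vector_search, top_k: int = 20):
    """Run both engines concurrently and return their raw result sets.

    `bm25_search` and `vector_search` are assumed async callables returning
    {product_id: score} dicts; normalization and merging happen afterwards.
    """
    bm25_results, vector_results = await asyncio.gather(
        bm25_search(query, top_k),
        vector_search(query, top_k),
    )
    return bm25_results, vector_results
```

Total latency is then the slower of the two searches rather than their sum.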

Score normalization

BM25 scores range roughly 0–25+, while vector similarities are bounded between 0 and 1. Normalizing both to a common 0‑1 range is essential before combining them.

def normalize_scores(results: dict[str, float]) -> dict[str, float]:
    """Scale scores to the 0‑1 interval."""
    if not results:
        return {}
    min_score = min(results.values())
    max_score = max(results.values())
    if max_score == min_score:
        return {k: 1.0 for k in results}
    return {
        k: (v - min_score) / (max_score - min_score)
        for k, v in results.items()
    }

Configurable weighting

After normalization, blend the two score streams with store‑specific weights. A parts supplier that relies heavily on part numbers will give BM25 a larger weight, whereas a fashion retailer describing desired styles will favor the vector side.
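The blend itself can be a weighted sum over the union of both result sets. A sketch, assuming both inputs have already been normalized to 0–1 (e.g. with `normalize_scores` above):

```python
def blend_scores(
    bm25_scores: dict[str, float],
    vector_scores: dict[str, float],
    bm25_weight: float = 0.5,
) -> dict[str, float]:
    """Combine two already-normalized (0-1) score dicts with a tunable weight.

    Products found by only one engine contribute 0.0 from the other side,
    which naturally demotes them relative to items both engines agree on.
    """
    vector_weight = 1.0 - bm25_weight
    all_ids = bm25_scores.keys() | vector_scores.keys()
    return {
        pid: bm25_weight * bm25_scores.get(pid, 0.0)
        + vector_weight * vector_scores.get(pid, 0.0)
        for pid in all_ids
    }
```

A parts supplier might set `bm25_weight=0.7`; a fashion retailer might drop it to `0.3`.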

Cross‑encoder reranking

Apply a cross‑encoder (e.g., cross-encoder/ms-marco-MiniLM-L-6-v2) to the top‑k merged candidates. The model directly compares each candidate with the query, re‑ordering results to fix cases where the naïve merge mis‑ranks items.
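The reranking step reduces to: score every (query, candidate) pair, then sort. A sketch with the scorer injected as a parameter, so the same function works with `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` from the sentence-transformers library or any drop-in replacement:

```python
def rerank(query: str, candidates: list[str], score_fn, top_k: int = 10) -> list[str]:
    """Re-order merged candidates by a cross-encoder relevance score.

    `score_fn` takes a list of (query, text) pairs and returns one score per
    pair -- e.g. the `predict` method of a sentence-transformers CrossEncoder.
    """
    pairs = [(query, text) for text in candidates]
    scores = score_fn(pairs)
    # Highest-scoring candidates first.
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

Because the cross-encoder reads the query and document together, it is far more accurate than either retrieval score, but also far slower, which is why it only sees the top-k merged candidates.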

Practical Considerations

  • SKU‑like queries (alphanumeric, no spaces) → skip vector search entirely.
  • Long descriptive sentences → increase vector weight.
  • Mixed queries (e.g., “red Nike something for running”) → balance both engines.
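The routing rules above can be sketched as a crude pre-search classifier. The token-count thresholds and the SKU heuristic (one alphanumeric token containing a digit) are guesses to tune against your own query logs, not rules from the original pipeline:

```python
import re

def classify_query(query: str) -> str:
    """Route a query to a search strategy using simple surface heuristics.

    Thresholds are illustrative; tune them against real query logs.
    """
    tokens = query.split()
    # One alphanumeric token containing a digit: almost certainly a SKU.
    if (
        len(tokens) == 1
        and re.fullmatch(r"[A-Za-z0-9-]+", query)
        and any(c.isdigit() for c in query)
    ):
        return "bm25_only"
    # Other single tokens (brand names like "Bosch"): lean on BM25.
    if len(tokens) == 1:
        return "bm25_heavy"
    # Long descriptive sentences: lean on vectors.
    if len(tokens) >= 6:
        return "vector_heavy"
    return "balanced"
```

The returned label can then select the `bm25_weight` passed to the blending step.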

Single‑word queries

Vector search struggles with single tokens; lean on BM25 or on fallback heuristics instead.

Handling mixed result sets

If BM25 returns 50 hits and vectors only 3, the merge can become BM25‑biased. Normalization plus weight tuning mitigates this.

Typos and misspellings

Both BM25 and vector models degrade with heavy misspellings. Consider adding a spelling‑correction layer or fuzzy matching.
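For a quick correction layer, the standard library's `difflib` can snap a misspelled token to the closest known catalog term. This is a stand-in sketch; production systems more often use SymSpell or the search engine's own fuzzy matching:

```python
import difflib

def correct_term(term: str, vocabulary: list[str]) -> str:
    """Snap a possibly misspelled token to the closest known catalog term.

    `vocabulary` is assumed to hold lowercase brand/category terms; the 0.8
    cutoff keeps loosely related words from being rewritten.
    """
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else term
```

Running each query token through this before BM25 rescues queries like "addidas sneakers" without touching correctly spelled input.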

Personalization

The described pipeline treats each query independently. Adding user‑history signals would require a separate ranking layer.

Caching embeddings

Embedding computation is expensive. Cache vector representations of product texts, but implement robust invalidation when catalog data changes.
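Keying the cache on a hash of the product text gives invalidation for free: edit the description and the old entry simply stops being hit. A minimal in-memory sketch, where `embed` is any callable mapping text to a vector and the dict stands in for a real store such as Redis:

```python
import hashlib

class EmbeddingCache:
    """Embedding cache keyed by a content hash of the product text.

    Changing the text changes the key, so stale vectors are never served;
    a real deployment would also evict unreferenced entries.
    """

    def __init__(self, embed):
        self._embed = embed
        self._cache: dict[str, list[float]] = {}

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._embed(text)
        return self._cache[key]
```

The trade-off versus keying by product ID is extra hashing per lookup in exchange for never serving a vector computed from outdated text.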

Tech Stack

  • Vector store & BM25: Qdrant (built‑in BM25 support)
  • Embeddings: intfloat/multilingual-e5-large (1024‑dim, 100+ languages)
  • Reranker: cross-encoder/ms-marco-MiniLM-L-6-v2
  • Language: Python (async search execution)
  • Orchestration: LangGraph (integrated into a larger chat‑assistant workflow)

Conclusion

Don’t start with a vector‑only solution. While embeddings excel at understanding intent and handling multilingual, descriptive queries, they fall short on exact‑match lookups, numeric constraints, and short tokens. A hybrid approach—running BM25 and vector search in parallel, normalizing scores, applying configurable weights, and optionally reranking with a cross‑encoder—delivers a robust, production‑ready search experience.

If you’ve faced similar challenges in e‑commerce search, I’d love to hear how you solved them.
