I Built Vector-Only Search First. Here's Why I Had to Rewrite It.
Source: Dev.to
I spent three weeks building a pure vector search for an e‑commerce product catalog, embedding everything with intfloat/multilingual-e5-large, storing the vectors in Qdrant, and running a few test queries.
- “Gift for someone who likes cooking” → kitchen knives, spice sets – ✅
- “Nike Air Max 90 black” → Adidas running shoes – ❌
- “XJ‑4520” (a real SKU) → random kitchen appliance – ❌
The engine understood intent but failed on the simplest exact‑match lookups.
Why Vector‑Only Search Fails
Descriptive queries work
Embeddings map text into a high‑dimensional space where texts with similar meanings cluster together. A query like “gift for someone who likes cooking” lands near knives, cookbooks, and spice sets even though the word gift never appears in the product titles.
SKUs and model numbers
A SKU such as XJ‑4520 is just a meaningless string to an embedding model. It gets projected somewhere in vector space, and the nearest neighbors are whatever other meaningless strings happen to be nearby. In practice, SKU lookups almost never return the correct product.
Brand + attribute combos
“Nike Air Max 90 black size 42” should return a single product. Vector search returned Nike items, but also Adidas and Puma shoes, because they are all semantically “athletic shoes.” The exact match often fell to page 2.
Numeric filters
Queries like “under $50” or “500 ml bottle” are not understood as numeric constraints. The model knows that 500 ml is related to bottle and liquid, but it won’t filter by the actual number.
Short, specific queries
A single token such as “Bosch” yields random power tools with vector search, whereas a BM25 index would return all Bosch products ranked by relevance.
Combining BM25 and Vector Search
Parallel execution
Run a BM25 index and a vector index against the same catalog, then merge the results. BM25 excels at exact matches (SKUs, brand names, specific attributes) while vectors handle descriptive, intent‑based, and cross‑language queries.
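The fan‑out can be sketched with asyncio; `bm25_search` and `vector_search` here are hypothetical stand‑ins for the real backend calls (Qdrant in my case), returning fabricated scores just to show the shape of the data:

```python
import asyncio

# Hypothetical stand-ins for the real backend calls.
async def bm25_search(query: str) -> dict[str, float]:
    return {"XJ-4520": 18.2, "nike-am90-black": 4.1}        # raw BM25 scores

async def vector_search(query: str) -> dict[str, float]:
    return {"nike-am90-black": 0.87, "adidas-runner": 0.74}  # cosine similarities

async def hybrid_search(query: str) -> dict[str, dict[str, float]]:
    # Fire both engines concurrently against the same catalog.
    bm25, vec = await asyncio.gather(bm25_search(query), vector_search(query))
    return {"bm25": bm25, "vector": vec}
```

Both result sets come back keyed by document ID, which is what the normalization and merge steps below operate on.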
Score normalization
BM25 scores range roughly 0–25+, while vector similarities are bounded between 0 and 1. Normalizing both to a common 0‑1 range is essential before combining them.
```python
def normalize_scores(results: dict[str, float]) -> dict[str, float]:
    """Scale scores to the 0-1 interval (min-max normalization)."""
    if not results:
        return {}
    min_score = min(results.values())
    max_score = max(results.values())
    if max_score == min_score:
        return {k: 1.0 for k in results}
    return {
        k: (v - min_score) / (max_score - min_score)
        for k, v in results.items()
    }
```
Configurable weighting
After normalization, blend the two score streams with store‑specific weights. A parts supplier that relies heavily on part numbers will give BM25 a larger weight, whereas a fashion retailer describing desired styles will favor the vector side.
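As a sketch (assuming both inputs have already been min‑max normalized to 0‑1 and are keyed by document ID), the blend is just a weighted sum over the union of IDs:

```python
def blend(bm25: dict[str, float], vec: dict[str, float],
          bm25_weight: float = 0.6) -> dict[str, float]:
    """Weighted sum of two normalized score dicts, keyed by document ID.

    The default weight is illustrative; tune it per store.
    """
    vec_weight = 1.0 - bm25_weight
    return {
        doc_id: bm25_weight * bm25.get(doc_id, 0.0)
                + vec_weight * vec.get(doc_id, 0.0)
        for doc_id in bm25.keys() | vec.keys()
    }
```

Documents missing from one engine simply contribute 0 from that side, so a hit confirmed by both engines naturally outranks a hit from only one.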
Cross‑encoder reranking
Apply a cross‑encoder (e.g., cross-encoder/ms-marco-MiniLM-L-6-v2) to the top‑k merged candidates. The model directly compares each candidate with the query, re‑ordering results to fix cases where the naïve merge mis‑ranks items.
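A sketch of the rerank step with the scorer injected, so any pairwise model can plug in. With sentence‑transformers, `score_fn` would be `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict`; the dummy scorer below is only for illustration:

```python
def rerank(query: str, candidates: list[str], score_fn) -> list[str]:
    """Re-order merged candidates by pairwise (query, candidate) relevance.

    score_fn takes a list of (query, candidate) pairs and returns
    one relevance score per pair.
    """
    pairs = [(query, c) for c in candidates]
    scores = score_fn(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [c for c, _ in ranked]
```

Because the cross‑encoder reads query and candidate together, it only needs to run on the top‑k merged results, keeping latency manageable.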
Practical Considerations
Query analysis before search
- SKU‑like queries (alphanumeric, no spaces) → skip vector search entirely.
- Long descriptive sentences → increase vector weight.
- Mixed queries (e.g., “red Nike something for running”) → balance both engines.
Single‑word queries
Vector search struggles with single tokens; rely more on BM25 or fallback heuristics.
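These routing heuristics can be sketched as a small classifier; the thresholds and class names here are illustrative, not from the original system:

```python
import re

def classify_query(query: str) -> str:
    """Route a query: 'sku' skips vector search, 'keyword' leans on BM25,
    'descriptive' boosts the vector weight, 'mixed' balances both."""
    q = query.strip()
    words = q.split()
    # Single alphanumeric token containing a digit: almost certainly a SKU.
    if len(words) == 1 and re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9_-]*", q) \
            and any(ch.isdigit() for ch in q):
        return "sku"
    if len(words) == 1:
        return "keyword"        # e.g. a bare brand name like "Bosch"
    if len(words) >= 5:
        return "descriptive"    # long sentences favor the vector side
    return "mixed"
```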
Handling mixed result sets
If BM25 returns 50 hits and vectors only 3, the merge can become BM25‑biased. Normalization plus weight tuning mitigates this.
Typos and misspellings
Both BM25 and vector models degrade with heavy misspellings. Consider adding a spelling‑correction layer or fuzzy matching.
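A lightweight fuzzy‑matching layer can be built on the standard library alone. This is a sketch; a real system would draw the vocabulary from the indexed catalog rather than a toy list:

```python
import difflib

def correct_term(term: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Snap a possibly misspelled token to the closest known catalog term,
    or return it unchanged if nothing is close enough."""
    matches = difflib.get_close_matches(term.lower(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else term
```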
Personalization
The described pipeline treats each query independently. Adding user‑history signals would require a separate ranking layer.
Caching embeddings
Embedding computation is expensive. Cache vector representations of product texts, but implement robust invalidation when catalog data changes.
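A minimal in‑memory sketch of such a cache: the content‑hash key means any edit to a product's text naturally produces a cache miss, which covers the invalidation case for free. A production version would persist the store and evict entries for deleted products:

```python
import hashlib
from typing import Callable

class EmbeddingCache:
    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self._embed = embed_fn                 # e.g. a model-encode wrapper
        self._store: dict[str, list[float]] = {}

    def get(self, text: str) -> list[float]:
        # Content-addressed key: editing the text invalidates the entry.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._embed(text)
        return self._store[key]
```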
Tech Stack
- Vector store & BM25: Qdrant (built‑in BM25 support)
- Embeddings: intfloat/multilingual-e5-large (1024‑dim, 100+ languages)
- Reranker: cross-encoder/ms-marco-MiniLM-L-6-v2
- Language: Python (async search execution)
- Orchestration: LangGraph (integrated into a larger chat‑assistant workflow)
Conclusion
Don’t start with a vector‑only solution. While embeddings excel at understanding intent and handling multilingual, descriptive queries, they fall short on exact‑match lookups, numeric constraints, and short tokens. A hybrid approach—running BM25 and vector search in parallel, normalizing scores, applying configurable weights, and optionally reranking with a cross‑encoder—delivers a robust, production‑ready search experience.
If you’ve faced similar challenges in e‑commerce search, I’d love to hear how you solved them.