Why I Added an LLM Parser on Top of Vector Search (And What It Changed)
Source: Dev.to
I Thought Vector Search Was Enough
I built Queryra – an AI‑search plugin for WooCommerce and Shopify.
It replaced keyword matching with semantic embeddings, so customers could type things like “something warm for winter” and instantly find sweaters, fleece jackets, blankets, etc. Zero‑result queries became rare. It worked… until someone searched:
“wireless headphones under $80, not Beats”
The vector search returned wireless headphones, but many were $200 and several were Beats. The price cap and brand exclusion were completely invisible to the embedding model.
That’s when I realized: vector search is only layer 1. I was missing layer 2.
The Problem With Pure Vector Search
Embeddings excel at one thing: encoding semantic similarity.
- “Sneakers” lands close to “trainers” and “running shoes”.
- “Gift for dad” finds garden tools, BBQ sets, watches – even without those words in the query.
But a query like “laptop under $1000 for video editing, not Chromebook” contains two fundamentally different types of information:
| Type | Description |
|---|---|
| Semantic intent | What the customer wants (a powerful laptop for video work) |
| Structural constraints | How to filter results (price cap, category exclusion) |
Embeddings handle semantic intent well, but they have no mechanism for structural constraints. You can’t encode “under $1000” as a direction in vector space, and “not Chromebook” isn’t a semantic concept – it’s an instruction to the search system. Every vector‑only implementation has this blind spot, and it gets worse as queries become more specific.
Who suffers most? The highest‑intent buyers – the ones ready to purchase right now.
The Solution: LLM Parser as Layer Two
I added a query parser that runs before the vector search. Its job is to decompose the query into structured components.
Example
```json
{
  "semantic_query": "organic shampoo",
  "price_max": 25,
  "attribute_exclude": ["sulfates"],
  "sort_by": "rating"
}
```
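However the JSON is produced, it pays to normalize it before it touches the database, since LLM output occasionally drifts from the schema. A minimal sketch using the field names from the example above — the defaults, type coercion, and fallback behavior here are my own choices, not Queryra's actual code:

```python
import json

# Fields the downstream layers understand; anything else is dropped.
KNOWN_FIELDS = {"semantic_query", "price_max", "attribute_exclude", "sort_by"}

def normalize_parse(raw: str, original_query: str) -> dict:
    """Coerce raw LLM output into a safe, predictable structure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed JSON from the model: fall back to plain vector search
        return {"semantic_query": original_query}

    parsed = {k: v for k, v in data.items() if k in KNOWN_FIELDS}
    # Always keep a semantic query so the vector layer has something to embed
    parsed.setdefault("semantic_query", original_query)
    if "price_max" in parsed:
        try:
            parsed["price_max"] = float(parsed["price_max"])
        except (TypeError, ValueError):
            del parsed["price_max"]  # unusable price constraint: drop it
    if not isinstance(parsed.get("attribute_exclude", []), list):
        parsed["attribute_exclude"] = [parsed["attribute_exclude"]]
    return parsed
```

The fallback matters: a parser failure should degrade to ordinary vector search, never to an error page.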
Each component is then sent to the appropriate subsystem:
| Component | Destination | Purpose |
|---|---|---|
| `semantic_query` | Vector search | Finds semantically relevant products |
| `price_max` | Database filter | Hard cut at $25 |
| `attribute_exclude` | Post‑filter | Removes sulfate‑containing products |
| `sort_by` | Result reranking | Surfaces highest‑rated first |
The vector layer discovers what the customer means; the parser layer applies what they said.
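The dispatch step itself is mostly plumbing. Here is a toy sketch of the two layers working together — the catalog rows and the `vector_search` stub are invented for illustration, and in production the candidates come from the actual embedding index:

```python
# Toy catalog; in production these rows come from WooCommerce/Shopify
PRODUCTS = [
    {"name": "Herbal Shampoo", "price": 18, "attributes": ["organic"], "rating": 4.8},
    {"name": "Salon Shampoo", "price": 32, "attributes": ["sulfates"], "rating": 4.9},
    {"name": "Budget Shampoo", "price": 9, "attributes": ["sulfates"], "rating": 3.1},
    {"name": "Pure Shampoo", "price": 22, "attributes": ["organic"], "rating": 4.2},
]

def vector_search(semantic_query: str) -> list[dict]:
    """Stand-in for the embedding layer: returns semantically relevant candidates."""
    return [p for p in PRODUCTS if "shampoo" in p["name"].lower()]

def run_search(parsed: dict) -> list[dict]:
    # Layer 1: semantic retrieval
    results = vector_search(parsed["semantic_query"])
    # Layer 2: structural constraints the embedding model can't see
    if "price_max" in parsed:
        results = [p for p in results if p["price"] <= parsed["price_max"]]
    for attr in parsed.get("attribute_exclude", []):
        results = [p for p in results if attr not in p["attributes"]]
    if parsed.get("sort_by") == "rating":
        results = sorted(results, key=lambda p: p["rating"], reverse=True)
    return results
```

With the example payload above, `run_search` keeps only the organic, sulfate-free products at or under $25, ordered by rating — exactly the constraints the embedding layer alone would have ignored.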
The Bypass Problem (Latency)
The parser adds ~700–800 ms latency. For a simple query like “blue t‑shirt”, that overhead is unnecessary because embeddings alone handle it fine.
Routing Shortcut
```python
import re

def should_parse(query: str) -> bool:
    # Price signals
    if re.search(r'under \$|below \$|\$\d+|budget|cheap|premium', query, re.I):
        return True
    # Exclusion signals
    if re.search(r'\bnot\b|\bwithout\b|\bno\b|\bexclude\b', query, re.I):
        return True
    # Sorting signals
    if re.search(r'best rated|top rated|newest|cheapest|most popular', query, re.I):
        return True
    # Brand signals (capitalized words that aren't at sentence start)
    if re.search(r'(?<!^)(?<!\. )[A-Z][a-z]+(?:\s[A-Z][a-z]+)*', query):
        return True
    return False  # Simple query — go straight to vector search
```
Simple queries skip the parser entirely. Complex queries get full intent extraction. The routing is invisible to the user – they just receive better results.
What Changed
| Query | Before (vector only) | After (vector + parser) |
|---|---|---|
| “headphones under $80” | All headphones | Headphones ≤ $80 only |
| “not from BrandX” | Includes BrandX | BrandX excluded |
| “best rated coffee maker” | Random order | Sorted by rating |
| “organic, no sulfates” | All organic shampoos | Sulfate‑free filtered |
Simple semantic queries like “blue t‑shirt” behave identically in both columns; every row in the table shows a gap that only the parser fills.
One Unexpected Benefit: Typos + Constraints
I expected the parser to help with structured queries, but it also solved a secondary problem: typos combined with constraints.
- Vector search handles typos well on its own – “moisturiser” finds “moisturizer”.
- “moisturiser under $20 without pareban” (misspelled parabens) broke the exclusion because the embedding similarity dropped on the misspelled term.
The LLM parser handles both in one pass: it corrects the typo, extracts the price constraint, and identifies the exclusion. This combined robustness was a pleasant surprise.
The Trade‑off
The parser incurs a cost because it makes an LLM API call on complex queries. I use gpt‑4.1‑nano (identical quality to gpt‑4o‑mini for this use case, ~33 % cheaper). With the bypass logic, only a fraction of queries hit the parser, but the cost still scales with traffic.
- Self‑hosted option: replace the API call with a local model (e.g., Ollama + Mistral 7B works reasonably well for intent extraction).
- SaaS product: factor the LLM usage into pricing.
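For the self‑hosted route, the request against Ollama's `/api/generate` endpoint looks roughly like this — the endpoint, `format: "json"` option, and `response` field are from Ollama's documented API, while the prompt wording and output schema are my own:

```python
import json
import urllib.request

def build_parse_request(query: str) -> urllib.request.Request:
    """Build an intent-extraction request for a local Ollama server (default port 11434)."""
    prompt = (
        "Extract search intent from this e-commerce query as JSON with keys "
        "semantic_query, price_max, attribute_exclude, sort_by. "
        f"Query: {query}"
    )
    payload = {
        "model": "mistral",   # any local model that follows instructions
        "prompt": prompt,
        "format": "json",     # ask Ollama to constrain output to valid JSON
        "stream": False,
    }
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With a local server running:
# resp = urllib.request.urlopen(build_parse_request("shampoo under $25"))
# parsed = json.loads(json.loads(resp.read())["response"])
```

Latency on a 7B model is comparable to the API call, but the per-query cost drops to zero.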
Where This Goes Next
The parser currently extracts:
- Price ranges
- Brand references
- Attribute filters & exclusions
- Sorting preferences
- Basic negations
Next on the list: multi‑intent queries.
Example: “Something for the office and something for the gym” – two separate semantic searches whose results are merged. Vector search alone can’t split the intent; the parser can.
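As a placeholder until the LLM handles the splitting, even a crude conjunction split shows the shape of the solution — the heuristic below is deliberately naive (it breaks on queries like “black and white shirt”), which is exactly why the real version belongs in the parser:

```python
def split_intents(query: str) -> list[str]:
    """Naive multi-intent split on ' and ' - a stand-in for LLM-based splitting."""
    parts = [p.strip() for p in query.lower().split(" and ")]
    return [p for p in parts if p]

def multi_search(query: str, search_fn) -> list[dict]:
    """Run one vector search per intent and merge, de-duplicating by product name."""
    merged, seen = [], set()
    for intent in split_intents(query):
        for product in search_fn(intent):
            if product["name"] not in seen:
                seen.add(product["name"])
                merged.append(product)
    return merged
```

The interesting design question is the merge: interleaving results per intent keeps both halves of the query visible on the first page, rather than letting one intent dominate.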
If you’re building e‑commerce search and hitting the same wall – vector results that ignore everything after the first two meaningful words – this two‑layer approach is worth the added complexity.
I wrote a longer, non‑technical version for store owners here: Why Vector Search Alone Isn’t Enough for Ecommerce Stores
Happy to answer questions!
Queryra is AI search for WooCommerce and Shopify. queryra.com