Why I Added an LLM Parser on Top of Vector Search (And What It Changed)

Published: March 9, 2026 at 07:27 AM EDT
5 min read
Source: Dev.to

I Thought Vector Search Was Enough

I built Queryra – an AI‑search plugin for WooCommerce and Shopify.
It replaced keyword matching with semantic embeddings, so customers could type things like “something warm for winter” and instantly find sweaters, fleece jackets, blankets, etc. Zero‑result queries became rare. It worked… until someone searched:

“wireless headphones under $80, not Beats”

The vector search returned wireless headphones, but many were $200 and several were Beats. The price cap and brand exclusion were completely invisible to the embedding model.

That’s when I realized: vector search is only layer 1. I was missing layer 2.

Embeddings excel at one thing: encoding semantic similarity.

  • “Sneakers” lands close to “trainers” and “running shoes”.
  • “Gift for dad” finds garden tools, BBQ sets, watches – even without those words in the query.

But a query like “laptop under $1000 for video editing, not Chromebook” contains two fundamentally different types of information:

Type                    Description
Semantic intent         What the customer wants (a powerful laptop for video work)
Structural constraints  How to filter results (price cap, category exclusion)

Embeddings handle semantic intent well, but they have no mechanism for structural constraints. You can’t encode “under $1000” as a direction in vector space, and “not Chromebook” isn’t a semantic concept – it’s an instruction to the search system. Every vector‑only implementation has this blind spot, and it gets worse as queries become more specific.

Who suffers most? The highest‑intent buyers – the ones ready to purchase right now.

The Solution: LLM Parser as Layer Two

I added a query parser that runs before the vector search. Its job is to decompose the query into structured components.

Example

{
  "semantic_query": "organic shampoo",
  "price_max": 25,
  "attribute_exclude": ["sulfates"],
  "sort_by": "rating"
}
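As a minimal sketch of what that parsing step can look like: a single LLM call returns JSON, which is validated and merged with safe defaults. The `llm_call` callable, the prompt wording, and the fallback behavior here are illustrative assumptions, not Queryra's actual implementation — any LLM client wrapper that takes a prompt and returns text would slot in.

```python
import json

# Illustrative prompt; the real system prompt would be more detailed.
PARSER_PROMPT = (
    "Extract search intent from the query as JSON with keys: "
    "semantic_query, price_max, attribute_exclude, sort_by. "
    "Use null for keys that do not apply.\n\nQuery: {query}"
)

def parse_query(query: str, llm_call) -> dict:
    """Decompose a search query into structured components.

    `llm_call` is any callable that takes a prompt string and returns
    the model's text response (e.g. a thin wrapper around your LLM SDK).
    """
    raw = llm_call(PARSER_PROMPT.format(query=query))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # If the model returns malformed JSON, degrade gracefully
        # to a plain vector-only search instead of failing the request.
        parsed = {}
    # Guarantee every key exists so downstream code can rely on the shape.
    defaults = {"semantic_query": query, "price_max": None,
                "attribute_exclude": [], "sort_by": None}
    return {**defaults, **{k: v for k, v in parsed.items() if v is not None}}
```

The fallback matters in production: a bad LLM response should never be worse than not having the parser at all.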

Each component is then sent to the appropriate subsystem:

Component          Destination        Purpose
semantic_query     Vector search      Finds semantically relevant products
price_max          Database filter    Hard cut at $25
attribute_exclude  Post-filter        Removes sulfate-containing products
sort_by            Result reranking   Surfaces highest-rated first

The vector layer discovers what the customer means; the parser layer applies what they said.
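The constraint-application side can be sketched in a few lines. In production the price cut would be a database filter rather than in-memory, but the logic is the same; the product fields (`price`, `attributes`, `rating`) are assumptions for the sketch.

```python
def apply_constraints(candidates: list[dict], parsed: dict) -> list[dict]:
    """Apply the parser's structural constraints to vector-search candidates.

    `candidates` are product dicts already ranked by semantic similarity;
    `parsed` is the structured output of the query parser.
    """
    results = candidates
    # Hard price cut (a database-side filter in production).
    if parsed.get("price_max") is not None:
        results = [p for p in results if p["price"] <= parsed["price_max"]]
    # Post-filter: drop products carrying any excluded attribute.
    excluded = set(parsed.get("attribute_exclude") or [])
    if excluded:
        results = [p for p in results
                   if excluded.isdisjoint(p.get("attributes", []))]
    # Rerank only when a sort preference was extracted.
    if parsed.get("sort_by") == "rating":
        results = sorted(results, key=lambda p: p.get("rating", 0), reverse=True)
    return results
```

Note the ordering: filters run before the rerank, so sorting only ever touches products that actually satisfy the constraints.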

The Bypass Problem (Latency)

The parser adds ~700–800 ms latency. For a simple query like “blue t‑shirt”, that overhead is unnecessary because embeddings alone handle it fine.

Routing Shortcut

import re

def should_parse(query: str) -> bool:
    # Price signals
    if re.search(r'under \$|below \$|\$\d+|budget|cheap|premium', query, re.I):
        return True
    # Exclusion signals
    if re.search(r'\bnot\b|\bwithout\b|\bno\b|\bexclude\b', query, re.I):
        return True
    # Sorting signals
    if re.search(r'best rated|top rated|newest|cheapest|most popular', query, re.I):
        return True
    # Brand signals (capitalized words that aren't at sentence start)
    if re.search(r'(?<!^)(?<!\. )[A-Z][a-z]+(?:\s[A-Z][a-z]+)*', query):
        return True

    return False   # Simple query — go straight to vector search

Simple queries skip the parser entirely. Complex queries get full intent extraction. The routing is invisible to the user – they just receive better results.

What Changed

Query                        Before (vector only)    After (vector + parser)
“headphones under $80”       All headphones          Headphones ≤ $80 only
“not from BrandX”            Includes BrandX         BrandX excluded
“best rated coffee maker”    Random order            Sorted by rating
“organic, no sulfates”       All organic shampoos    Sulfate-free filtered

For simple semantic queries like “blue t‑shirt”, the before and after columns would be identical – the parser changes nothing. Every row above shows the gap it fills once structural constraints appear.

One Unexpected Benefit: Typos + Constraints

I expected the parser to help with structured queries, but it also solved a secondary problem: typos combined with constraints.

  • Vector search handles typos well on its own – “moisturiser” finds “moisturizer”.
  • “moisturiser under $20 without pareban” (misspelled parabens) broke the exclusion because the embedding similarity dropped on the misspelled term.

The LLM parser handles both in one pass: it corrects the typo, extracts the price constraint, and identifies the exclusion. This combined robustness was a pleasant surprise.

The Trade‑off

The parser incurs a cost because it makes an LLM API call on complex queries. I use gpt‑4.1‑nano (identical quality to gpt‑4o‑mini for this use case, ~33 % cheaper). With the bypass logic, only a fraction of queries hit the parser, but the cost still scales with traffic.

  • Self‑hosted option: replace the API call with a local model (e.g., Ollama + Mistral 7B works reasonably well for intent extraction).
  • SaaS product: factor the LLM usage into pricing.

Where This Goes Next

The parser currently extracts:

  • Price ranges
  • Brand references
  • Attribute filters & exclusions
  • Sorting preferences
  • Basic negations

Next on the list: multi‑intent queries.
Example: “Something for the office and something for the gym” – two separate semantic searches whose results are merged. Vector search alone can’t split the intent; the parser can.
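A hedged sketch of how that split-and-merge could work, assuming the parser has already produced the sub-queries: run one vector search per intent and interleave the results round-robin, de-duplicating by product id so neither intent dominates the page. `search_fn` is a stand-in for any ranked vector search.

```python
from itertools import zip_longest

def multi_intent_search(sub_queries: list[str], search_fn, k: int = 5) -> list[dict]:
    """Run one vector search per sub-intent and interleave the results.

    `search_fn` maps a query string to a ranked list of product dicts
    (assumed to carry an "id" field for de-duplication).
    """
    ranked = [search_fn(q)[:k] for q in sub_queries]
    merged, seen = [], set()
    # Round-robin: take the best remaining item from each intent in turn,
    # so rank 1 of every intent appears before rank 2 of any intent.
    for tier in zip_longest(*ranked):
        for product in tier:
            if product is not None and product["id"] not in seen:
                seen.add(product["id"])
                merged.append(product)
    return merged
```

Round-robin merging is one of several reasonable policies here; score-based interleaving would also work if the searches return comparable similarity scores.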

If you’re building e‑commerce search and hitting the same wall – vector results that ignore everything after the first two meaningful words – this two‑layer approach is worth the added complexity.

I wrote a longer, non‑technical version for store owners here: Why Vector Search Alone Isn’t Enough for Ecommerce Stores

Happy to answer questions!

Queryra is AI search for WooCommerce and Shopify. queryra.com
