Why I Added an LLM Parser on Top of Vector Search (And What It Changed)
Source: Dev.to
I Thought Vector Search Was Enough
I built Queryra – an AI‑search plugin for WooCommerce and Shopify.
It replaced keyword matching with semantic embeddings, so customers could type things like “something warm for winter” and instantly find sweaters, fleece jackets, blankets, etc. Zero‑result queries became rare. It worked… until someone searched:
“wireless headphones under $80, not Beats”
The vector search returned wireless headphones, but many were $200 and several were Beats. The price cap and brand exclusion were completely invisible to the embedding model.
That’s when I realized: vector search is only layer 1. I was missing layer 2.
The Problem With Pure Vector Search
Embeddings excel at one thing: encoding semantic similarity.
- “Sneakers” lands close to “trainers” and “running shoes”.
- “Gift for dad” finds garden tools, BBQ sets, watches – even without those words in the query.
But a query like “laptop under $1000 for video editing, not Chromebook” contains two fundamentally different types of information:
| Type | Description |
|---|---|
| Semantic intent | What the customer wants (a powerful laptop for video work) |
| Structural constraints | How to filter results (price cap, category exclusion) |
Embeddings handle semantic intent well, but they have no mechanism for structural constraints. You can’t encode “under $1000” as a direction in vector space, and “not Chromebook” isn’t a semantic concept – it’s an instruction to the search system. Every vector‑only implementation has this blind spot, and it gets worse as queries become more specific.
Who suffers most? The highest‑intent buyers – the ones ready to purchase right now.
The Solution: LLM Parser as Layer Two
I added a query parser that runs before the vector search. Its job is to decompose the query into structured components.
Example
```json
{
  "semantic_query": "organic shampoo",
  "price_max": 25,
  "attribute_exclude": ["sulfates"],
  "sort_by": "rating"
}
```
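However the JSON is produced, it pays to normalize it before it touches the database, since LLM output occasionally drifts from the schema. A minimal sketch using the field names from the example above — the defaults, type coercion, and fallback behavior here are my own choices, not Queryra's actual code:

```python
import json

# Fields the downstream layers understand; anything else is dropped.
KNOWN_FIELDS = {"semantic_query", "price_max", "attribute_exclude", "sort_by"}

def normalize_parse(raw: str, original_query: str) -> dict:
    """Coerce raw LLM output into a safe, predictable structure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed JSON from the model: fall back to plain vector search
        return {"semantic_query": original_query}

    parsed = {k: v for k, v in data.items() if k in KNOWN_FIELDS}
    # Always keep a semantic query so the vector layer has something to embed
    parsed.setdefault("semantic_query", original_query)
    if "price_max" in parsed:
        try:
            parsed["price_max"] = float(parsed["price_max"])
        except (TypeError, ValueError):
            del parsed["price_max"]  # unusable price constraint: drop it
    if not isinstance(parsed.get("attribute_exclude", []), list):
        parsed["attribute_exclude"] = [parsed["attribute_exclude"]]
    return parsed
```

The fallback matters: a parser failure should degrade to ordinary vector search, never to an error page.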
Each component is then sent to the appropriate subsystem:
| Component | Destination | Purpose |
|---|---|---|
| `semantic_query` | Vector search | Finds semantically relevant products |
| `price_max` | Database filter | Hard cut at $25 |
| `attribute_exclude` | Post‑filter | Removes sulfate‑containing products |
| `sort_by` | Result reranking | Surfaces highest‑rated first |
The vector layer discovers what the customer means; the parser layer applies what they said.
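The dispatch step itself is mostly plumbing. Here is a toy sketch of the two layers working together — the catalog rows and the `vector_search` stub are invented for illustration, and in production the candidates come from the actual embedding index:

```python
# Toy catalog; in production these rows come from WooCommerce/Shopify
PRODUCTS = [
    {"name": "Herbal Shampoo", "price": 18, "attributes": ["organic"], "rating": 4.8},
    {"name": "Salon Shampoo", "price": 32, "attributes": ["sulfates"], "rating": 4.9},
    {"name": "Budget Shampoo", "price": 9, "attributes": ["sulfates"], "rating": 3.1},
    {"name": "Pure Shampoo", "price": 22, "attributes": ["organic"], "rating": 4.2},
]

def vector_search(semantic_query: str) -> list[dict]:
    """Stand-in for the embedding layer: returns semantically relevant candidates."""
    return [p for p in PRODUCTS if "shampoo" in p["name"].lower()]

def run_search(parsed: dict) -> list[dict]:
    # Layer 1: semantic retrieval
    results = vector_search(parsed["semantic_query"])
    # Layer 2: structural constraints the embedding model can't see
    if "price_max" in parsed:
        results = [p for p in results if p["price"] <= parsed["price_max"]]
    for attr in parsed.get("attribute_exclude", []):
        results = [p for p in results if attr not in p["attributes"]]
    if parsed.get("sort_by") == "rating":
        results = sorted(results, key=lambda p: p["rating"], reverse=True)
    return results
```

With the example payload above, `run_search` keeps only the organic, sulfate-free products at or under $25, ordered by rating — exactly the constraints the embedding layer alone would have ignored.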
The Bypass Problem (Latency)
The parser adds ~700–800 ms latency. For a simple query like “blue t‑shirt”, that overhead is unnecessary because embeddings alone handle it fine.
Routing Shortcut
```python
import re

def should_parse(query: str) -> bool:
    # Price signals
    if re.search(r'under \$|below \$|\$\d+|budget|cheap|premium', query, re.I):
        return True
    # Exclusion signals
    if re.search(r'\bnot\b|\bwithout\b|\bno\b|\bexclude\b', query, re.I):
        return True
    # Sorting signals
    if re.search(r'best rated|top rated|newest|cheapest|most popular', query, re.I):
        return True
    # Brand signals (capitalized words that aren't at sentence start)
    if re.search(r'(?<!^)(?<!\. )[A-Z][a-z]+(?:\s[A-Z][a-z]+)*', query):
        return True
    return False  # Simple query — go straight to vector search
```
Simple queries skip the parser entirely. Complex queries get full intent extraction. The routing is invisible to the user – they just receive better results.
What Changed
| Query | Before (vector only) | After (vector + parser) |
|---|---|---|
| “headphones under $80” | All headphones | Headphones ≤ $80 only |
| “not from BrandX” | Includes BrandX | BrandX excluded |
| “best rated coffee maker” | Random order | Sorted by rating |
| “organic, no sulfates” | All organic shampoos | Sulfate‑free filtered |
Simple semantic queries like “blue t‑shirt” behave identically in both columns; every row in the table shows a gap that only the parser fills.
One Unexpected Benefit: Typos + Constraints
I expected the parser to help with structured queries, but it also solved a secondary problem: typos combined with constraints.
- Vector search handles typos well on its own – “moisturiser” finds “moisturizer”.
- “moisturiser under $20 without pareban” (misspelled parabens) broke the exclusion because the embedding similarity dropped on the misspelled term.
The LLM parser handles both in one pass: it corrects the typo, extracts the price constraint, and identifies the exclusion. This combined robustness was a pleasant surprise.
The Trade‑off
The parser incurs a cost because it makes an LLM API call on complex queries. I use gpt‑4.1‑nano (identical quality to gpt‑4o‑mini for this use case, ~33 % cheaper). With the bypass logic, only a fraction of queries hit the parser, but the cost still scales with traffic.
- Self‑hosted option: replace the API call with a local model (e.g., Ollama + Mistral 7B works reasonably well for intent extraction).
- SaaS product: factor the LLM usage into pricing.
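For the self‑hosted route, the request against Ollama's `/api/generate` endpoint looks roughly like this — the endpoint, `format: "json"` option, and `response` field are from Ollama's documented API, while the prompt wording and output schema are my own:

```python
import json
import urllib.request

def build_parse_request(query: str) -> urllib.request.Request:
    """Build an intent-extraction request for a local Ollama server (default port 11434)."""
    prompt = (
        "Extract search intent from this e-commerce query as JSON with keys "
        "semantic_query, price_max, attribute_exclude, sort_by. "
        f"Query: {query}"
    )
    payload = {
        "model": "mistral",   # any local model that follows instructions
        "prompt": prompt,
        "format": "json",     # ask Ollama to constrain output to valid JSON
        "stream": False,
    }
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With a local server running:
# resp = urllib.request.urlopen(build_parse_request("shampoo under $25"))
# parsed = json.loads(json.loads(resp.read())["response"])
```

Latency on a 7B model is comparable to the API call, but the per-query cost drops to zero.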
Where This Goes Next
The parser currently extracts:
- Price ranges
- Brand references
- Attribute filters & exclusions
- Sorting preferences
- Basic negations
Next on the list: multi‑intent queries.
Example: “Something for the office and something for the gym” – two separate semantic searches whose results are merged. Vector search alone can’t split the intent; the parser can.
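As a placeholder until the LLM handles the splitting, even a crude conjunction split shows the shape of the solution — the heuristic below is deliberately naive (it breaks on queries like “black and white shirt”), which is exactly why the real version belongs in the parser:

```python
def split_intents(query: str) -> list[str]:
    """Naive multi-intent split on ' and ' - a stand-in for LLM-based splitting."""
    parts = [p.strip() for p in query.lower().split(" and ")]
    return [p for p in parts if p]

def multi_search(query: str, search_fn) -> list[dict]:
    """Run one vector search per intent and merge, de-duplicating by product name."""
    merged, seen = [], set()
    for intent in split_intents(query):
        for product in search_fn(intent):
            if product["name"] not in seen:
                seen.add(product["name"])
                merged.append(product)
    return merged
```

The interesting design question is the merge: interleaving results per intent keeps both halves of the query visible on the first page, rather than letting one intent dominate.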
If you’re building e‑commerce search and hitting the same wall – vector results that ignore everything after the first two meaningful words – this two‑layer approach is worth the added complexity.
I wrote a longer, non‑technical version for store owners here: Why Vector Search Alone Isn’t Enough for Ecommerce Stores
Happy to answer questions!
Queryra is AI search for WooCommerce and Shopify. queryra.com