Why Regex Fails at Google Taxonomy: Building a 98% Accurate RAG Agent
Source: Dev.to
The Problem
In Google Merchant Center, categorization is everything. If you misclassify a product, your ads stop running. Most feed tools assign categories with keyword matching (regex rules).
Rule: If title contains “Dog” → Category: Animals > Pets > Dogs
Input: “Hot Dog Costume”
Result: Animals > Pets > Dogs ❌ (Wrong!)
This is why 15‑20 % of products in large catalogs often sit in “Disapproved” purgatory.
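To make the failure concrete, here is a minimal sketch of a keyword rule like the one above. The `rules` array and the `categorizeByKeyword` function are hypothetical names used for illustration; they are not taken from any real feed tool.

```javascript
// Hypothetical keyword rule, mirroring the "Dog" example above.
const rules = [
  { pattern: /dog/i, category: "Animals > Pets > Dogs" },
];

// Returns the category of the first rule whose pattern matches the title,
// or null when no rule matches.
function categorizeByKeyword(title) {
  const rule = rules.find((r) => r.pattern.test(title));
  return rule ? rule.category : null;
}

// The rule fires on the word "Dog", so a costume lands in the pet category.
console.log(categorizeByKeyword("Hot Dog Costume"));
```

Tightening the pattern to `/\bdog\b/i` would not help here, because “Dog” is a whole word in “Hot Dog Costume”. The rule has no notion of what the product is.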
Solution: Vector‑Based Categorization
I built CatMap AI to solve this using vectors, not keywords. Instead of rules, we convert the entire Google Product Taxonomy (5,500+ nodes) into a vector index using OpenAI’s text-embedding-3-small.
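When a product comes in (e.g., “Pallash Casual Women’s Kurti”), we don’t look for the word “Kurti”. We embed the product text and retrieve the taxonomy nodes whose vectors sit closest to it in vector space, so the match is driven by meaning rather than by shared keywords.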
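A rough sketch of that nearest-neighbor lookup, assuming cosine similarity as the distance measure (the post does not say which metric CatMap AI uses). The three-dimensional vectors are hand-written toy values, not real `text-embedding-3-small` output, and `taxonomyIndex` and `nearestNode` are hypothetical names. The category paths are copied from the examples in this post.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings" for illustration only. Real embeddings would come from
// an embedding model and have far more dimensions.
const taxonomyIndex = [
  { path: "Animals > Pets > Dogs", vector: [0.9, 0.1, 0.0] },
  { path: "Apparel > Clothing > Shirts & Tops", vector: [0.1, 0.9, 0.2] },
];

// Returns the taxonomy node closest to the query vector, with its score.
function nearestNode(queryVector, index) {
  let best = null;
  for (const node of index) {
    const score = cosineSimilarity(queryVector, node.vector);
    if (best === null || score > best.score) {
      best = { path: node.path, score };
    }
  }
  return best;
}

const productVector = [0.2, 0.8, 0.3]; // pretend embedding of a kurti listing
console.log(nearestNode(productVector, taxonomyIndex).path);
```

A linear scan like this is fine for a few thousand taxonomy nodes. A dedicated vector index becomes worthwhile mainly when the collection is much larger.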
Handling Cultural Terms
Standard vector search can fail on culturally specific terms.
Input: “Kurti”
Vector Match: Generic Clothing (Confidence: Low)
Agentic Loop
- Attempt 1: Standard search → Result: Uncategorized.
- Trigger: Agent detects failure.
- Action: Agent calls an LLM (gpt-5-nano) to “expand” the query.
  Prompt: “What is a Kurti? Give me synonyms.”
  Response: “Tunic, Blouse, Shirt”.
- Attempt 2: Vector search with “Tunic Blouse Shirt”.
- Result: Apparel > Clothing > Shirts & Tops ✅
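The post does not say how the agent detects a failed attempt. One plausible approach is a confidence cutoff on the best match, sketched below with stubs in place of the vector store and the LLM. `MIN_CONFIDENCE`, `toResult`, `categorizeWithRetry`, the `"Categorized"` status, and both stubs are assumptions made for illustration.

```javascript
// Hypothetical cutoff; the real detection rule is not described in the post.
const MIN_CONFIDENCE = 0.5;

// Turns the best vector match into a result, or "Uncategorized" when the
// match is missing or its score falls below the cutoff.
function toResult(bestMatch, minConfidence = MIN_CONFIDENCE) {
  if (!bestMatch || bestMatch.score < minConfidence) {
    return { status: "Uncategorized", category: null };
  }
  return { status: "Categorized", category: bestMatch.path };
}

// One search attempt plus at most one retry with an expanded query.
// `search` and `expand` are injected so the sketch runs without any API.
async function categorizeWithRetry(query, search, expand) {
  const first = toResult(await search(query));
  if (first.status !== "Uncategorized") return first;
  const synonyms = await expand(query);
  return toResult(await search(synonyms.join(" ")));
}

// Stubs standing in for the vector store and the LLM call.
const fakeSearch = async (q) =>
  q.includes("Tunic")
    ? { path: "Apparel > Clothing > Shirts & Tops", score: 0.8 }
    : { path: "Generic Clothing", score: 0.2 };
const fakeExpand = async () => ["Tunic", "Blouse", "Shirt"];

categorizeWithRetry("Kurti", fakeSearch, fakeExpand)
  .then((result) => console.log(result.category));
```

Capping the loop at a single retry keeps latency and LLM cost bounded. A product that still scores below the cutoff stays “Uncategorized” rather than being forced into a weak match.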
Results
- Coverage: 100 % (up from 85 %).
- Accuracy: 98.3 %.
- Latency: ~200 ms per row.
Simplified Logic
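// Simplified categorization flow: retry once with an expanded query
async function retryIfUncategorized(product, result) {
  if (result.status === "Uncategorized") {
    const synonyms = await expandQuery(product.name); // AI call that returns synonyms
    const newContext = await VectorStore.search(synonyms); // second vector search
    return categorizeWithContext(product, newContext);
  }
  return result; // first attempt succeeded, so keep it
}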
Try It Out
I’m opening a Free Beta for developers. Link to CatMap AI
Follow for more engineering deep dives into AI agents.