Why Regex Fails at Google Taxonomy: Building a 98% Accurate RAG Agent

Published: December 15, 2025 at 01:42 AM EST

Source: Dev.to

The Problem

In Google Merchant Center, categorization is everything. If you misclassify a product, it can be disapproved and your ads stop running. Most feed tools categorize by keyword matching (regex rules).

Rule: If title contains “Dog” → Category: Animals > Pets > Dogs
Input: “Hot Dog Costume”
Result: Animals > Pets > Dogs ❌ (Wrong!)
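
To make the failure concrete, here is a minimal sketch of such a keyword rule. The `RULES` table and the `categorizeByKeyword` function are hypothetical names chosen for illustration, not code from any particular feed tool.

```javascript
// Hypothetical keyword rules of the kind a feed tool might use.
const RULES = [
  { pattern: /\bdog\b/i, category: "Animals > Pets > Dogs" },
];

// Return the category of the first rule whose pattern matches the title.
function categorizeByKeyword(title) {
  const rule = RULES.find((r) => r.pattern.test(title));
  return rule ? rule.category : "Uncategorized";
}

console.log(categorizeByKeyword("Hot Dog Costume")); // Animals > Pets > Dogs
console.log(categorizeByKeyword("Dog Leash"));       // Animals > Pets > Dogs
```

Both titles contain the word "dog", so the rule cannot tell a costume from a pet product. Tightening the pattern only moves the problem to the next ambiguous title.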

This is why 15-20% of products in large catalogs often sit in “Disapproved” purgatory.

Solution: Vector‑Based Categorization

I built CatMap AI to solve this using vectors, not keywords. Instead of writing rules, we embed the entire Google Product Taxonomy (5,500+ nodes) with OpenAI’s text-embedding-3-small and store the result as a vector index.

When a product comes in (e.g., “Pallash Casual Women’s Kurti”), we don’t look for the word “Kurti”. We embed the whole title and retrieve the taxonomy nodes whose vectors lie closest to it, so the match is on meaning rather than on any single word.
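
The lookup itself can be sketched as a nearest-neighbor search by cosine similarity. This is an illustration under stated assumptions, not CatMap AI's implementation: the three-dimensional vectors are made up to stand in for real text-embedding-3-small embeddings, and `TAXONOMY_INDEX`, `cosineSimilarity`, and `nearestCategory` are hypothetical names.

```javascript
// Toy 3-dimensional vectors standing in for real embeddings (assumption:
// a real index would hold one embedding vector per taxonomy node).
const TAXONOMY_INDEX = [
  { category: "Apparel > Clothing > Shirts & Tops", vector: [0.9, 0.1, 0.0] },
  { category: "Animals > Pets > Dogs", vector: [0.0, 0.2, 0.9] },
];

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Linear scan for the node whose vector is most similar to the query vector.
function nearestCategory(queryVector, index = TAXONOMY_INDEX) {
  let best = { category: null, score: -Infinity };
  for (const node of index) {
    const score = cosineSimilarity(queryVector, node.vector);
    if (score > best.score) best = { category: node.category, score };
  }
  return best;
}

// A made-up query vector that sits close to the "Shirts & Tops" node.
const match = nearestCategory([0.8, 0.2, 0.1]);
console.log(match.category, match.score.toFixed(2));
```

A linear scan is enough for a few thousand nodes. A larger index would typically use an approximate nearest-neighbor structure instead.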

Handling Cultural Terms

Standard vector search can fail on culturally specific terms.

Input: “Kurti”
Vector Match: Generic Clothing (Confidence: Low)

Agentic Loop

Attempt 1: Standard vector search → Result: Uncategorized (the low-confidence match is rejected).
Trigger: The agent detects the failure.
Action: The agent calls an LLM (gpt-5-nano) to “expand” the query.
Prompt: “What is a Kurti? Give me synonyms.”
Response: “Tunic, Blouse, Shirt”.
Attempt 2: Vector search with “Tunic Blouse Shirt”.
Result: Apparel > Clothing > Shirts & Tops ✅
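
The loop above can be exercised end to end with stand-ins for the two external services. Everything in this sketch is an assumption made for illustration: `vectorSearch` scores by word overlap instead of embeddings, `expandQuery` returns a canned answer instead of calling gpt-5-nano, and the `MIN_CONFIDENCE` cutoff is an arbitrary value.

```javascript
// Assumed cutoff below which a match counts as a failure (hypothetical value).
const MIN_CONFIDENCE = 0.3;

// Stub vector search: scores a query by word overlap with each category label.
// A real system would compare embeddings instead.
async function vectorSearch(query) {
  const categories = ["Apparel > Clothing > Shirts & Tops", "Animals > Pets > Dogs"];
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  let best = { category: "Uncategorized", confidence: 0 };
  for (const category of categories) {
    const label = category.toLowerCase();
    const hits = words.filter((w) => label.includes(w)).length;
    const confidence = words.length ? hits / words.length : 0;
    if (confidence > best.confidence) best = { category, confidence };
  }
  return best;
}

// Stub query expansion: a canned table standing in for the LLM call.
async function expandQuery(term) {
  const canned = { kurti: "Tunic Blouse Shirt" };
  return canned[term.toLowerCase()] ?? term;
}

// Attempt 1 with the raw term; on a low-confidence result, expand and retry once.
async function categorizeWithRetry(term) {
  const first = await vectorSearch(term);
  if (first.confidence >= MIN_CONFIDENCE) return { ...first, attempts: 1 };

  const expanded = await expandQuery(term);
  const second = await vectorSearch(expanded);
  if (second.confidence >= MIN_CONFIDENCE) return { ...second, attempts: 2 };
  return { category: "Uncategorized", confidence: second.confidence, attempts: 2 };
}

categorizeWithRetry("Kurti").then((r) => console.log(r.category, r.attempts));
```

Capping the loop at one retry keeps the cost bounded: a row triggers at most one extra LLM call and one extra search.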

Results

Coverage: 100% (up from 85%).
Accuracy: 98.3%.
Latency: ~200 ms per row.
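
The post does not define these metrics, so the sketch below shows one plausible reading and should be taken as an assumption: coverage as the share of rows that receive any category, and accuracy as the share of categorized rows that match a hand-labeled expected category. The `scoreRun` helper and the sample rows are made up for illustration.

```javascript
// Compute coverage and accuracy for a batch of { predicted, expected } rows.
function scoreRun(rows) {
  const categorized = rows.filter((r) => r.predicted !== "Uncategorized");
  const correct = categorized.filter((r) => r.predicted === r.expected);
  return {
    coverage: rows.length ? categorized.length / rows.length : 0,
    accuracy: categorized.length ? correct.length / categorized.length : 0,
  };
}

// Made-up sample rows, not real evaluation data.
const sample = [
  { predicted: "Apparel > Clothing > Shirts & Tops", expected: "Apparel > Clothing > Shirts & Tops" },
  { predicted: "Animals > Pets > Dogs", expected: "Animals > Pets > Dogs" },
  { predicted: "Animals > Pets > Dogs", expected: "Apparel > Clothing > Shirts & Tops" },
  { predicted: "Uncategorized", expected: "Apparel > Clothing > Shirts & Tops" },
];

console.log(scoreRun(sample));
```

Reporting the two numbers separately matters: forcing every row into some category raises coverage but can lower accuracy.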

Simplified Logic

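// Simplified categorization flow.
// expandQuery, VectorStore and categorizeWithContext are defined elsewhere.
async function categorize(product) {
    const context = await VectorStore.search(product.name);
    const result = await categorizeWithContext(product, context);
    if (result.status !== "Uncategorized") return result;

    // Fallback: ask the LLM for synonyms, then search again with them
    const synonyms = await expandQuery(product.name); // AI call
    const newContext = await VectorStore.search(synonyms);
    return categorizeWithContext(product, newContext);
}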

Try It Out

I’m opening a Free Beta for developers. Link to CatMap AI

Follow for more engineering deep dives into AI agents.
