Why Regex Fails at Google Taxonomy: Building a 98% Accurate RAG Agent

Published: (December 15, 2025 at 01:42 AM EST)
2 min read
Source: Dev.to

The Problem

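In Google Merchant Center, categorization is everything. If you misclassify a product, your ads stop running. Most feed tools categorize products with keyword matching (regex), which reacts to the words in a title and ignores what the product actually is.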

Rule: If title contains “Dog” → Category: Animals > Pets > Dogs
Input: “Hot Dog Costume”
Result: Animals > Pets > Dogs ❌ (Wrong!)
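
To make the failure concrete, here is a minimal sketch of a keyword rule in JavaScript. The rules array and the categorizeByKeyword function are hypothetical, written only for illustration, and are not taken from any particular feed tool.

```javascript
// Hypothetical keyword rules of the kind described above.
const rules = [
  { pattern: /dog/i, category: "Animals > Pets > Dogs" },
];

// Return the category of the first rule whose pattern matches the title.
function categorizeByKeyword(title) {
  const hit = rules.find((rule) => rule.pattern.test(title));
  return hit ? hit.category : "Uncategorized";
}

console.log(categorizeByKeyword("Dog Leash 6ft"));   // Animals > Pets > Dogs
console.log(categorizeByKeyword("Hot Dog Costume")); // Animals > Pets > Dogs (wrong)
```

Tightening the pattern (word boundaries, exclusion lists) only moves the problem: the rule still sees characters, not the product.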

This is why 15‑20 % of products in large catalogs often sit in “Disapproved” purgatory.

Solution: Vector‑Based Categorization

I built CatMap AI to solve this using vectors, not keywords. Instead of rules, we convert the entire Google Product Taxonomy (5,500+ nodes) into a vector index using OpenAI’s text-embedding-3-small.

When a product comes in (e.g., “Pallash Casual Women’s Kurti”), we don’t look for the word “Kurti”. We embed the whole title and retrieve the taxonomy nodes whose vectors lie closest to it, so the match is made on meaning and not on any single word.
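
A rough sketch of that lookup, assuming a small in-memory index and cosine similarity as the distance measure. The names buildIndex, search, and embed are placeholders of mine, not CatMap AI's actual code. In production, embed would wrap a call to an embeddings model such as text-embedding-3-small; here it is injected so the search logic can run without a network call.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Build an in-memory index: one { path, vector } entry per taxonomy node.
// `embed` is an injected async function (text -> number[]), an assumption
// of this sketch.
async function buildIndex(taxonomyPaths, embed) {
  return Promise.all(
    taxonomyPaths.map(async (path) => ({ path, vector: await embed(path) }))
  );
}

// Embed the product title and return the closest taxonomy node.
async function search(index, title, embed) {
  const queryVector = await embed(title);
  let best = { path: null, score: -Infinity };
  for (const node of index) {
    const score = cosineSimilarity(queryVector, node.vector);
    if (score > best.score) best = { path: node.path, score };
  }
  return best;
}
```

A linear scan like this is fine for a few thousand taxonomy nodes; a dedicated vector store becomes worthwhile mainly when the index is much larger or shared across services.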

Handling Cultural Terms

Standard vector search can fail on culturally specific terms.

Input: “Kurti”
Vector Match: Generic Clothing (Confidence: Low)

Agentic Loop

  1. Attempt 1: Standard search → low-confidence match, so the result is Uncategorized.

  2. Trigger: Agent detects failure.

  3. Action: Agent calls an LLM (gpt-5-nano) to “expand” the query.

    Prompt: “What is a Kurti? Give me synonyms.”

    Response: “Tunic, Blouse, Shirt”.

  4. Attempt 2: Vector search with “Tunic Blouse Shirt”.

  5. Result: Apparel > Clothing > Shirts & Tops
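
The five steps above can be sketched as a single retry function. This is an illustration under assumptions: the MIN_CONFIDENCE cut-off, the { path, score } result shape, and the injected search and expandQuery helpers are all hypothetical, not the production implementation.

```javascript
// Assumed cut-off below which a match is treated as "Uncategorized".
const MIN_CONFIDENCE = 0.5;

// `search(text)` resolves to { path, score }; `expandQuery(text)` resolves
// to an array of synonyms. Both are injected, hypothetical helpers.
async function categorizeWithRetry(title, { search, expandQuery }) {
  // Attempt 1: standard vector search on the raw title.
  const first = await search(title);
  if (first.score >= MIN_CONFIDENCE) {
    return { category: first.path, attempts: 1 };
  }

  // Trigger and action: ask the LLM for synonyms, then search again.
  const synonyms = await expandQuery(title);
  const second = await search(synonyms.join(" "));
  if (second.score >= MIN_CONFIDENCE) {
    return { category: second.path, attempts: 2 };
  }

  // Still no confident match: report it rather than guess.
  return { category: "Uncategorized", attempts: 2 };
}
```

Capping the loop at one expansion keeps the number of LLM calls per row bounded, which matters when a feed has many thousands of rows.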

Results

  • Coverage: 100 % (up from 85 %).
  • Accuracy: 98.3 %.
  • Latency: ~200 ms per row.
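
For readers who want to run a similar evaluation, here is one way such numbers could be computed. The definitions are my assumption, not necessarily the ones behind the figures above: coverage as the share of rows that received any category, and accuracy as the share of categorized rows that match a hand-verified label. The scoreRun function and the row shape are hypothetical.

```javascript
// rows: [{ predicted: string, expected: string }], where `expected` is a
// hand-verified label and "Uncategorized" marks rows with no prediction.
function scoreRun(rows) {
  const categorized = rows.filter((r) => r.predicted !== "Uncategorized");
  const correct = categorized.filter((r) => r.predicted === r.expected);
  return {
    // Share of all rows that received a category.
    coverage: categorized.length / rows.length,
    // Share of categorized rows that match the reference label.
    accuracy: categorized.length ? correct.length / categorized.length : 0,
  };
}
```

Reporting the two numbers separately matters: a system can reach full coverage by guessing, so accuracy has to be measured against labels that were checked by hand.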

Simplified Logic

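// Simplified categorization flow. expandQuery, VectorStore, and
// categorizeWithContext are the app's own helpers (not shown here).
async function recategorize(product, result) {
    // Only fall back when the first vector search found no usable match
    if (result.status !== "Uncategorized") return result;
    const synonyms = await expandQuery(product.name); // AI call
    const newContext = await VectorStore.search(synonyms);
    return categorizeWithContext(product, newContext);
}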

Try It Out

I’m opening a Free Beta for developers. Link to CatMap AI

Follow for more engineering deep dives into AI agents.
