From Catalog Chaos to Real-Time Recommendations: Building a Product Graph with LLMs and Neo4j

Published: December 11, 2025 at 10:52 PM EST
4 min read
Source: Dev.to

Most product recommendation systems I’ve seen are basically fancy keyword matchers. They work okay when you have millions of clicks to analyze, but they completely fall apart when:

  • You launch a new product with zero interaction data 📉
  • Your catalog is a mess of inconsistent tags and descriptions 🤦
  • You want to explain WHY you’re recommending something (not just show a black‑box score)

I just built a real‑time recommendation engine that actually understands products using LLMs and graph databases. The core logic is only ~100 lines of Python.

The Secret Sauce: Product Taxonomy + Knowledge Graphs

Instead of relying on user behavior alone, we teach an LLM to understand:

  • What a product actually is (fine‑grained taxonomy like “gel pen” not “office supplies”)
  • What people buy together (complementary products like “gel pen” → “notebook”, “pen holder”)

All of this is stored in a Neo4j graph database where relationships become first‑class citizens. You can now query things like “show me all products that share a complementary taxonomy with this gel pen.”
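
Concretely, that query is a short Cypher traversal. Here's a minimal sketch, assuming the labels and relationship types described later in this post (Product, Taxonomy, PRODUCT_TAXONOMY, PRODUCT_COMPLEMENTARY_TAXONOMY); the product title and the value property on taxonomy nodes are placeholders:

// Products whose primary taxonomy matches one of this gel pen's complementary taxonomies.
MATCH (p:Product {title: "Retractable Gel Pen"})-[:PRODUCT_COMPLEMENTARY_TAXONOMY]->(t:Taxonomy)
MATCH (rec:Product)-[:PRODUCT_TAXONOMY]->(t)
WHERE rec <> p
RETURN rec.title AS recommendation, t.value AS because
LIMIT 10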

Real‑World Example: The Gel Pen Problem

When someone browses a gel pen, a traditional recommender might show:

  • Other gel pens (same category)
  • Popular items (based on sales)
  • Random “customers also bought” (if you have enough data)

With our approach, the LLM analyzes the product description and extracts:

  • Primary taxonomy: gel pen, writing instrument
  • Complementary taxonomy: notebook, pencil case, desk organizer

The graph now knows these relationships, so viewing the gel pen can surface notebooks, planners, and organizers—with explainable connections.

The Architecture (Simplified)

Product JSONs → CocoIndex Pipeline → LLM Extraction → Neo4j Graph

1. Ingest Products as a Stream

We watch a folder of product JSON files with auto‑refresh:

data_scope["products"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(
        path="products",
        included_patterns=["*.json"]
    ),
    refresh_interval=datetime.timedelta(seconds=5)
)

Every time a product file changes, it triggers a pipeline update—no manual rebuilds.
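
The product JSON format itself isn't shown in the post, but based on the fields read in the next step (source, title, price), a file in products/ might look roughly like this; the description field is an assumption:

{
  "source": "https://example.com/products/gel-pen",
  "title": "Retractable Gel Pen, 0.5mm, Black (12-Pack)",
  "price": "$8.99",
  "description": "Smooth-writing gel ink with a quick-dry formula."
}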

2. Clean and Normalize Data

We map raw JSON into a clean structure:

@cocoindex.op.function(behavior_version=2)
def extract_product_info(product: cocoindex.typing.Json, filename: str) -> ProductInfo:
    return ProductInfo(
        # Use the filename (minus the .json extension) as a stable product ID.
        id=filename.removesuffix(".json"),
        url=product["source"],
        title=product["title"],
        # Prices arrive as strings like "$1,234.56"; strip the symbol and commas.
        price=float(product["price"].lstrip("$").replace(",", "")),
        # Render a markdown "product sheet" (Template is assumed to be jinja2.Template).
        detail=Template(PRODUCT_TEMPLATE).render(**product),
    )

The detail field becomes a markdown “product sheet” that we feed to the LLM.
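
PRODUCT_TEMPLATE isn't shown here; assuming Template is jinja2.Template, a minimal product-sheet template could look like this (field names beyond title and price are placeholders):

PRODUCT_TEMPLATE = """
# {{ title }}

**Price:** {{ price }}

{{ description }}
"""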

3. Let the LLM Do the Heavy Lifting

We define the taxonomy contract as dataclasses:

@dataclasses.dataclass
class ProductTaxonomy:
    """
    A concise noun or short phrase based on core functionality.
    Use lowercase, avoid brands/styles.
    Be specific: "pen" not "office supplies".
    """
    name: str

@dataclasses.dataclass
class ProductTaxonomyInfo:
    taxonomies: list[ProductTaxonomy]
    complementary_taxonomies: list[ProductTaxonomy]

Then we call the LLM:

taxonomy = data["detail"].transform(
    cocoindex.functions.ExtractByLlm(
        llm_spec=cocoindex.LlmSpec(
            api_type=cocoindex.LlmApiType.OPENAI,
            model="gpt-4.1"
        ),
        output_type=ProductTaxonomyInfo
    )
)

The LLM reads the markdown description and returns structured JSON matching our schema—no parsing nightmares.
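
For the gel pen example above, the returned object might look like this (illustrative values only):

ProductTaxonomyInfo(
    taxonomies=[
        ProductTaxonomy(name="gel pen"),
        ProductTaxonomy(name="writing instrument"),
    ],
    complementary_taxonomies=[
        ProductTaxonomy(name="notebook"),
        ProductTaxonomy(name="pencil case"),
        ProductTaxonomy(name="desk organizer"),
    ],
)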

4. Build the Knowledge Graph in Neo4j

We export three things:

  • Product nodes: id, title, price, url
  • Taxonomy nodes: unique labels like “gel pen”, “notebook”
  • Relationships: PRODUCT_TAXONOMY and PRODUCT_COMPLEMENTARY_TAXONOMY

For example, here's the Product node export:

product_node.export(
    "product_node",
    cocoindex.storages.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.storages.Nodes(label="Product")
    ),
    primary_key_fields=["id"],
)

Neo4j automatically deduplicates nodes by primary key. If five products all mention “notebook” as a complementary taxonomy, they all link to the same Taxonomy node.
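
You can sanity-check that deduplication directly in Neo4j. A quick sketch (again assuming a value property on taxonomy nodes):

// How many products point at each taxonomy node, and via which relationship type?
MATCH (t:Taxonomy)<-[r]-(p:Product)
RETURN t.value AS taxonomy, type(r) AS relationship, count(p) AS products
ORDER BY products DESC
LIMIT 10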

Running It Live

After setting up Postgres (for CocoIndex’s incremental processing) and Neo4j, run:

pip install -e .
cocoindex update --setup main

You’ll see output such as:

documents: 9 added, 0 removed, 0 updated

Then open Neo4j Browser at http://localhost:7474 and execute:

MATCH p=()-->() RETURN p

Boom—your entire product graph visualized.

Why This Actually Works

  • LLMs are excellent at text understanding – offload messy natural‑language interpretation to a model you control with schema and docstrings.
  • Graphs are made for relationships – you get explainable connections and can run graph algorithms (PageRank, community detection, shortest path, etc.); see the PageRank sketch after this list.
  • Incremental updates are free – CocoIndex handles all the plumbing; add a product file, get an updated graph.
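
As an example of that second point, here's a rough PageRank sketch using the Neo4j Graph Data Science plugin (not part of this project's setup; the projected graph name and returned properties are illustrative):

// Project the product graph into GDS memory, then rank nodes by PageRank.
CALL gds.graph.project(
  'product_graph',
  ['Product', 'Taxonomy'],
  ['PRODUCT_TAXONOMY', 'PRODUCT_COMPLEMENTARY_TAXONOMY']
);

CALL gds.pageRank.stream('product_graph')
YIELD nodeId, score
RETURN coalesce(gds.util.asNode(nodeId).title, gds.util.asNode(nodeId).value) AS item, score
ORDER BY score DESC
LIMIT 10;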

What You Can Build Next

  • Add brand, material, or use‑case taxonomies as separate node types.
  • Plug in clickstream data to weight edges or create FREQUENTLY_BOUGHT_WITH relationships.
  • Swap OpenAI for Ollama (on‑prem LLMs) when you need full control; see the sketch after this list.
  • Layer on graph algorithms to find product clusters or detect trending categories.
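
For the Ollama swap, the change is confined to the LlmSpec. A sketch, assuming CocoIndex exposes an OLLAMA API type (the model name is a placeholder):

taxonomy = data["detail"].transform(
    cocoindex.functions.ExtractByLlm(
        llm_spec=cocoindex.LlmSpec(
            api_type=cocoindex.LlmApiType.OLLAMA,  # assumption: Ollama support via LlmApiType.OLLAMA
            model="llama3.2",                      # placeholder local model name
        ),
        output_type=ProductTaxonomyInfo,
    )
)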

Try It Yourself

Full working code is open‑source:

👉 CocoIndex Product Recommendation Example

The repository includes:

  • Complete flow definition
  • LLM extraction ops
  • Neo4j mappings
  • Sample product JSONs

If you’re experimenting with LLM‑native data pipelines or graph‑based recommendations, I’d love to hear what you’re building. Drop a comment or tag me!

P.S. If you found this useful, give the CocoIndex repo a star ⭐.

P.P.S. You can also explore the pipeline visually with CocoInsight (free beta) — it’s like DevTools for your data pipeline, with zero data retention.
