From Catalog Chaos to Real-Time Recommendations: Building a Product Graph with LLMs and Neo4j
Most product recommendation systems I’ve seen are basically fancy keyword matchers. They work okay when you have millions of clicks to analyze, but they completely fall apart when:
- You launch a new product with zero interaction data 📉
- Your catalog is a mess of inconsistent tags and descriptions 🤦
- You want to explain WHY you’re recommending something (not just show a black‑box score)
I just built a real‑time recommendation engine that actually understands products using LLMs and graph databases. The core logic is only ~100 lines of Python.
The Secret Sauce: Product Taxonomy + Knowledge Graphs
Instead of relying on user behavior alone, we teach an LLM to understand:
- What a product actually is (fine‑grained taxonomy like “gel pen” not “office supplies”)
- What people buy together (complementary products like “gel pen” → “notebook”, “pen holder”)
All of this is stored in a Neo4j graph database where relationships become first‑class citizens. You can now query things like “show me all products that share a complementary taxonomy with this gel pen.”
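Here's what that kind of query can look like in practice, as a minimal sketch using the official neo4j Python driver. The Product label and relationship types match what the pipeline exports later in this post; the Taxonomy label, its value property, the connection details, and the sample product id are assumptions you'd adjust to your own setup.

```python
# Sketch: find products whose own taxonomy matches a complementary taxonomy
# of the product being viewed. Labels/relationship types follow this post;
# the "value" property and the product id are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

COMPLEMENTARY_QUERY = """
MATCH (p:Product {id: $product_id})-[:PRODUCT_COMPLEMENTARY_TAXONOMY]->(t:Taxonomy)
      <-[:PRODUCT_TAXONOMY]-(rec:Product)
WHERE rec.id <> $product_id
RETURN rec.title AS recommendation, collect(t.value) AS shared_taxonomies
ORDER BY size(shared_taxonomies) DESC
"""

with driver.session() as session:
    # "gel-pen-001" is a made-up id; use one from your own catalog
    for record in session.run(COMPLEMENTARY_QUERY, product_id="gel-pen-001"):
        print(record["recommendation"], record["shared_taxonomies"])
```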
Real‑World Example: The Gel Pen Problem
When someone browses a gel pen, a traditional recommender might show:
- Other gel pens (same category)
- Popular items (based on sales)
- Random “customers also bought” (if you have enough data)
With our approach, the LLM analyzes the product description and extracts:
- Primary taxonomy: gel pen, writing instrument
- Complementary taxonomy: notebook, pencil case, desk organizer
The graph now knows these relationships, so viewing the gel pen can surface notebooks, pencil cases, and desk organizers—with explainable connections.
The Architecture (Simplified)
Product JSONs → CocoIndex Pipeline → LLM Extraction → Neo4j Graph
1. Ingest Products as a Stream
We watch a folder of product JSON files with auto‑refresh:
```python
data_scope["products"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(
        path="products",
        included_patterns=["*.json"]
    ),
    refresh_interval=datetime.timedelta(seconds=5)
)
```
Every time a product file changes, it triggers a pipeline update—no manual rebuilds.
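To see that in action, you can drop a new product file into the watched folder. Here's a hypothetical sample whose field names mirror what the extraction function in the next step reads; the values themselves are made up.

```python
# Hypothetical sample: write a product JSON into the watched "products/" folder.
# Field names (source, title, price) mirror what extract_product_info reads below.
import json
from pathlib import Path

sample_product = {
    "source": "https://example.com/products/gel-pen",
    "title": "0.5mm Black Gel Pen (12-pack)",
    "price": "$7.99",
    "description": "Smooth-writing gel ink pen with a comfort grip.",
}

Path("products").mkdir(exist_ok=True)
Path("products/gel-pen-001.json").write_text(json.dumps(sample_product, indent=2))
# Within ~5 seconds the refresh_interval above picks it up and the pipeline re-runs.
```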
2. Clean and Normalize Data
We map raw JSON into a clean structure:
```python
@cocoindex.op.function(behavior_version=2)
def extract_product_info(product: cocoindex.typing.Json, filename: str) -> ProductInfo:
    return ProductInfo(
        id=f"{filename.removesuffix('.json')}",
        url=product["source"],
        title=product["title"],
        price=float(product["price"].lstrip("$").replace(",", "")),
        detail=Template(PRODUCT_TEMPLATE).render(**product),
    )
```
The detail field becomes a markdown “product sheet” that we feed to the LLM.
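The post doesn't show PRODUCT_TEMPLATE itself, but since it's rendered with Jinja2's Template, a minimal hypothetical version might look like this; the real one lives in the example repo.

```python
# A minimal, hypothetical PRODUCT_TEMPLATE -- it renders the raw JSON fields
# into a small markdown "product sheet" that gets handed to the LLM.
from jinja2 import Template

PRODUCT_TEMPLATE = """\
# {{ title }}

- Price: {{ price }}
- Source: {{ source }}

{{ description }}
"""

product = {
    "title": "0.5mm Black Gel Pen (12-pack)",
    "price": "$7.99",
    "source": "https://example.com/products/gel-pen",
    "description": "Smooth-writing gel ink pen with a comfort grip.",
}
print(Template(PRODUCT_TEMPLATE).render(**product))
```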
3. Let the LLM Do the Heavy Lifting
We define the taxonomy contract as dataclasses:
```python
@dataclasses.dataclass
class ProductTaxonomy:
    """
    A concise noun or short phrase based on core functionality.
    Use lowercase, avoid brands/styles.
    Be specific: "pen" not "office supplies".
    """
    name: str

@dataclasses.dataclass
class ProductTaxonomyInfo:
    taxonomies: list[ProductTaxonomy]
    complementary_taxonomies: list[ProductTaxonomy]
```
Then we call the LLM:
```python
taxonomy = data["detail"].transform(
    cocoindex.functions.ExtractByLlm(
        llm_spec=cocoindex.LlmSpec(
            api_type=cocoindex.LlmApiType.OPENAI,
            model="gpt-4.1"
        ),
        output_type=ProductTaxonomyInfo
    )
)
```
The LLM reads the markdown description and returns structured JSON matching our schema—no parsing nightmares.
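For the gel pen example above, the extracted result is roughly equivalent to constructing the dataclasses by hand. These are illustrative values, not actual model output.

```python
# Illustrative only: roughly what the LLM extraction yields for the gel pen,
# expressed with the dataclasses defined above.
gel_pen_taxonomy = ProductTaxonomyInfo(
    taxonomies=[
        ProductTaxonomy(name="gel pen"),
        ProductTaxonomy(name="writing instrument"),
    ],
    complementary_taxonomies=[
        ProductTaxonomy(name="notebook"),
        ProductTaxonomy(name="pencil case"),
        ProductTaxonomy(name="desk organizer"),
    ],
)
```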
4. Build the Knowledge Graph in Neo4j
We export three things:
- Product nodes: id, title, price, url
- Taxonomy nodes: unique labels like "gel pen", "notebook"
- Relationships: PRODUCT_TAXONOMY and PRODUCT_COMPLEMENTARY_TAXONOMY
```python
product_node.export(
    "product_node",
    cocoindex.storages.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.storages.Nodes(label="Product")
    ),
    primary_key_fields=["id"],
)
```
Neo4j automatically deduplicates nodes by primary key. If five products all mention “notebook” as a complementary taxonomy, they all link to the same Taxonomy node.
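You can sanity-check that deduplication once the graph is built. The query below assumes a Taxonomy label with a value property, so adjust it to your actual mapping; it should report a single notebook node with several incoming product links.

```python
# Sanity-check sketch: count how many products point at the single "notebook"
# taxonomy node. The Taxonomy label and "value" property are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

DEDUP_CHECK = """
MATCH (t:Taxonomy {value: "notebook"})<--(p:Product)
RETURN count(DISTINCT t) AS taxonomy_nodes, count(DISTINCT p) AS linked_products
"""

with driver.session() as session:
    print(session.run(DEDUP_CHECK).single().data())
```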
Running It Live
After setting up Postgres (for CocoIndex’s incremental processing) and Neo4j, run:
```bash
pip install -e .
cocoindex update --setup main
```
You’ll see output such as:
```
documents: 9 added, 0 removed, 0 updated
```
Then open Neo4j Browser at http://localhost:7474 and execute:
```cypher
MATCH p=()-->() RETURN p
```
Boom—your entire product graph visualized.
Why This Actually Works
- LLMs are excellent at text understanding – offload messy natural‑language interpretation to a model you control with schema and docstrings.
- Graphs are made for relationships – you get explainable connections and can run graph algorithms (PageRank, community detection, shortest path, etc.; see the PageRank sketch after this list).
- Incremental updates are free – CocoIndex handles all the plumbing; add a product file, get an updated graph.
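As a taste of the graph-algorithm angle mentioned above, here's a sketch of running PageRank over the product graph. It assumes the Neo4j Graph Data Science plugin is installed on the server and reuses the labels, relationship types, and the assumed value property from earlier; none of this is part of the example repo.

```python
# Sketch: rank product/taxonomy nodes with PageRank via the Neo4j GDS plugin.
# Requires the Graph Data Science library on the Neo4j server.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Project the graph once, then stream PageRank scores over it.
    session.run("""
        CALL gds.graph.project(
          'product_graph',
          ['Product', 'Taxonomy'],
          ['PRODUCT_TAXONOMY', 'PRODUCT_COMPLEMENTARY_TAXONOMY']
        )
    """)
    result = session.run("""
        CALL gds.pageRank.stream('product_graph')
        YIELD nodeId, score
        RETURN coalesce(gds.util.asNode(nodeId).title,
                        gds.util.asNode(nodeId).value) AS name, score
        ORDER BY score DESC LIMIT 10
    """)
    for record in result:
        print(record["name"], round(record["score"], 3))
```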
What You Can Build Next
- Add brand, material, or use‑case taxonomies as separate node types.
- Plug in clickstream data to weight edges or create FREQUENTLY_BOUGHT_WITH relationships (see the sketch after this list).
- Swap OpenAI for Ollama (on-prem LLMs) when you need full control.
- Layer on graph algorithms to find product clusters or detect trending categories.
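As a starting point for the clickstream idea, weighted FREQUENTLY_BOUGHT_WITH edges can be merged in with a few lines of Cypher. The co-purchase pairs and counts below are made up; in practice they'd come from your order or clickstream data.

```python
# Sketch: merge weighted FREQUENTLY_BOUGHT_WITH edges between products.
# The co-purchase pairs/counts are made up for illustration.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

co_purchases = [
    {"a": "gel-pen-001", "b": "notebook-003", "count": 42},
    {"a": "gel-pen-001", "b": "pencil-case-002", "count": 17},
]

MERGE_EDGES = """
UNWIND $pairs AS pair
MATCH (a:Product {id: pair.a}), (b:Product {id: pair.b})
MERGE (a)-[r:FREQUENTLY_BOUGHT_WITH]->(b)
SET r.weight = pair.count
"""

with driver.session() as session:
    session.run(MERGE_EDGES, pairs=co_purchases)
```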
Try It Yourself
Full working code is open‑source:
👉 CocoIndex Product Recommendation Example
The repository includes:
- Complete flow definition
- LLM extraction ops
- Neo4j mappings
- Sample product JSONs
If you’re experimenting with LLM‑native data pipelines or graph‑based recommendations, I’d love to hear what you’re building. Drop a comment or tag me!
P.S. If you found this useful, give the CocoIndex repo a star ⭐.
P.P.S. You can also explore the pipeline visually with CocoInsight (free beta) — it’s like DevTools for your data pipeline, with zero data retention.