How I Built a Self-Updating Neo4j Knowledge Graph from Meeting Notes (That Saves 99% on LLM Costs)
Source: Dev.to
The Problem: Your Meeting Notes Are Wasted
Organizations in the US alone hold an estimated 62-80 million meetings every day. Those meetings generate decisions, action items, and task assignments—but most of that intelligence dies in Google Docs.
Want to know “Who was in all the budget meetings?” or “What tasks did Alex get assigned this month?” Good luck searching through thousands of Markdown files.
The real killer? Meeting notes are living documents. People fix names, reassign tasks, update decisions. Without incremental processing, you’re stuck choosing between:
- 💸 Massive LLM bills from reprocessing everything
- 📉 A stale, outdated knowledge graph
I solved this by building a self‑updating Neo4j knowledge graph that only processes changed documents—cutting LLM costs by 99%.
What We’re Building
A pipeline that turns messy meeting notes into a queryable graph database:
Google Drive → Detect Changes → Split Meetings → LLM Extract → Neo4j
Result: Three node types (Meeting, Person, Task) and three relationships (ATTENDED, DECIDED, ASSIGNED_TO) that let you query:
- “Which meetings did Sarah attend?”
- “Where was this task decided?”
- “Who owns all Q4 tasks?”
The Secret Sauce: Incremental Processing
1. Only Process What Changed
The Google Drive source tracks last-modified timestamps. When you have 100,000 meeting notes and only 1% change daily, you process 1,000 files—not 100,000.
@cocoindex.flow_def(name="MeetingNotesGraph")
def meeting_notes_graph_flow(
    flow_builder: cocoindex.FlowBuilder,
    data_scope: cocoindex.DataScope,
) -> None:
    credential_path = os.environ["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
    root_folder_ids = os.environ["GOOGLE_DRIVE_ROOT_FOLDER_IDS"].split(",")

    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.GoogleDrive(
            service_account_credential_path=credential_path,
            root_folder_ids=root_folder_ids,
            recent_changes_poll_interval=datetime.timedelta(seconds=10),
        ),
        refresh_interval=datetime.timedelta(minutes=1),
    )
Impact: a 99% reduction in LLM API costs at a typical 1% daily churn.
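Outside of CocoIndex, the core idea is plain timestamp bookkeeping: remember each file's last-modified time and only hand the files whose timestamp advanced to the expensive stages. A minimal sketch (a hypothetical helper, not the library's actual mechanism):

```python
def changed_files(current: dict[str, float],
                  seen: dict[str, float]) -> list[str]:
    """Return paths whose last-modified time advanced since the
    previous run, then record the new timestamps for next time."""
    changed = [path for path, mtime in sorted(current.items())
               if mtime > seen.get(path, 0.0)]
    seen.update(current)
    return changed
```

On the first run everything is "changed"; afterwards, only files whose mtime moved forward get reprocessed.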
2. Smart Document Splitting
Meeting files often contain multiple sessions. Split them intelligently while keeping the header (e.g., ## Meeting Title) with each section to preserve context for the LLM.
with data_scope["documents"].row() as document:
    document["meetings"] = document["content"].transform(
        cocoindex.functions.SplitBySeparators(
            separators_regex=[r"\n\n##?\ "],
            keep_separator="RIGHT",
        )
    )
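In plain Python, the "keep the header with the right side" behavior corresponds to splitting on a lookahead, so the heading marker is not consumed by the split (an illustrative stand-in for `SplitBySeparators`, not its implementation):

```python
import re

def split_meetings(text: str) -> list[str]:
    """Split on blank-line boundaries followed by a '#' or '##' heading,
    keeping each heading attached to the section that follows it."""
    return [part for part in re.split(r"\n\n(?=##? )", text) if part.strip()]
```

Because the heading survives inside each chunk, the LLM later sees "## Retro" alongside that meeting's notes rather than an anonymous block of text.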
3. Structured LLM Extraction
Define a concrete schema instead of asking the model for “some JSON.”
@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]
Extract with caching; identical inputs reuse cached outputs, eliminating redundant LLM calls.
with document["meetings"].row() as meeting:
    parsed = meeting["parsed"] = meeting["text"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI,
                model="gpt-4",
            ),
            output_type=Meeting,
        )
    )
Building the Graph
Collect Nodes and Relationships
meeting_nodes = data_scope.add_collector()
attended_rels = data_scope.add_collector()
decided_tasks_rels = data_scope.add_collector()
assigned_rels = data_scope.add_collector()

meeting_key = {"note_file": document["filename"], "time": parsed["time"]}
meeting_nodes.collect(**meeting_key, note=parsed["note"])

attended_rels.collect(
    id=cocoindex.GeneratedField.UUID,
    **meeting_key,
    person=parsed["organizer"]["name"],
    is_organizer=True,
)

with parsed["participants"].row() as participant:
    attended_rels.collect(
        id=cocoindex.GeneratedField.UUID,
        **meeting_key,
        person=participant["name"],
    )
Export to Neo4j with Upsert Logic
meeting_nodes.export(
    "meeting_nodes",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Nodes(label="Meeting"),
    ),
    primary_key_fields=["note_file", "time"],
)
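"Upsert" here means each Meeting node is identified by the composite key (`note_file`, `time`): exporting the same key again overwrites the existing node instead of creating a duplicate. A dictionary analogy of that semantics (a sketch, not CocoIndex or Neo4j internals):

```python
# Nodes keyed by the composite primary key (note_file, time).
meetings: dict[tuple[str, str], dict] = {}

def upsert_meeting(note_file: str, time: str, note: str) -> None:
    """Insert the Meeting node, or overwrite it if the key already exists."""
    meetings[(note_file, time)] = {"note": note}
```

This is why edited meeting notes update the graph in place rather than piling up stale copies.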
Declare Person and Task nodes:
flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Person",
        primary_key_fields=["name"],
    )
)

flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Task",
        primary_key_fields=["description"],
    )
)
Export relationships:
attended_rels.export(
    "attended_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ATTENDED",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="person", target="name"
                    ),
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
Running the Pipeline
Setup
export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2
pip install cocoindex
Build the Graph
cocoindex update main
Query in Neo4j Browser (http://localhost:7474)
// Who attended which meetings?
MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p, m
// Tasks decided in meetings
MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m, t
// Task assignments by person
MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p, t
Why This Matters
1. Cost Savings at Scale
- Traditional approach: Reprocess 100,000 docs → 100,000 LLM calls
- Incremental approach: Process 1,000 changed docs → 1,000 LLM calls
Result: 99% cost reduction.
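The arithmetic behind the headline number, assuming 100,000 documents and 1% daily churn:

```python
total_docs = 100_000
daily_churn = 0.01

full_reprocess_calls = total_docs                  # one LLM call per document
incremental_calls = int(total_docs * daily_churn)  # only the changed documents
savings = 1 - incremental_calls / full_reprocess_calls
```

At higher churn the savings shrink proportionally; the 99% figure holds only while most documents stay untouched between runs.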
2. Real‑Time Updates
Switch to live mode and the graph updates automatically when meeting notes change:
refresh_interval=datetime.timedelta(minutes=1)
3. Data Lineage
CocoIndex tracks every transformation, allowing you to trace any Neo4j node back through LLM extraction to the source document.
Beyond Meeting Notes
This pattern works for any text‑heavy domain where documents evolve over time, delivering cost‑effective, up‑to‑date knowledge graphs.