I Turned My Meeting Notes Into a Self-Updating Neo4j Knowledge Graph

Published: December 10, 2025 at 08:11 PM EST

Source: Dev.to

Introduction

Every team generates meeting notes, but answering questions such as “Who was in all the budget meetings?” or “What tasks did Alex get assigned this month?” is difficult when the notes are just static text.
This article demonstrates a CocoIndex workflow that turns Markdown meeting notes stored in Google Drive into a live Neo4j knowledge graph that updates automatically whenever the notes change.

Why a Self‑Updating Graph?

Large organizations can have tens of thousands to millions of meeting notes spread across folders and tools.
Notes evolve constantly—people change, tasks move, decisions are revised.
Traditional approaches treat notes as static, so keeping a graph current means either an expensive full recomputation or accepting an outdated graph.

CocoIndex’s incremental processing detects which documents changed, runs LLM extraction only on the changed meetings, and upserts the results into Neo4j, keeping compute and LLM costs low even at scale.

Graph Model

Node label	Key fields
Meeting	`note_file`, `time`
Person	`name`
Task	`description`

Relationship type	Direction
ATTENDED	Person → Meeting
DECIDED	Meeting → Task
ASSIGNED_TO	Person → Task

This model supports queries like:

“Which meetings did Dana attend?”
“Where was this task decided?”
“Who currently owns all tasks created in Q4?”
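
To make the traversal behind the last question concrete, here is a minimal in-memory sketch of the model in plain Python. The people, tasks, and dates are invented for illustration, the tuples only stand in for Neo4j nodes and relationships, and "created in Q4" is approximated by the time of the meeting that decided the task.

```python
import datetime

# Hypothetical sample data: plain tuples stand in for Neo4j nodes and edges.
OCT_MEETING = ("budget.md", datetime.date(2025, 10, 6))
MAR_MEETING = ("budget.md", datetime.date(2025, 3, 3))

ATTENDED = {("Dana", OCT_MEETING), ("Alex", OCT_MEETING), ("Dana", MAR_MEETING)}
DECIDED = {(OCT_MEETING, "Draft Q1 budget"), (MAR_MEETING, "Review vendor contracts")}
ASSIGNED_TO = {("Alex", "Draft Q1 budget"), ("Dana", "Review vendor contracts")}


def meetings_attended_by(person):
    """Follow ATTENDED edges out of one Person node."""
    return sorted(meeting for who, meeting in ATTENDED if who == person)


def owners_of_tasks_decided_between(start, end):
    """Meeting (filtered by time) -> DECIDED -> Task <- ASSIGNED_TO <- Person."""
    tasks = {task for (_, day), task in DECIDED if start <= day <= end}
    return sorted({who for who, task in ASSIGNED_TO if task in tasks})


q4_owners = owners_of_tasks_decided_between(
    datetime.date(2025, 10, 1), datetime.date(2025, 12, 31)
)
print(meetings_attended_by("Dana"))
print(q4_owners)
```

Because `time` lives on the Meeting node, date filters apply to meetings first and reach tasks and people through the two relationship types.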

Pipeline Overview

Google Drive (change tracking) – Detects new or modified files.
Identify changed documents – Uses service‑account credentials and last‑modified timestamps.
Split each file into individual meetings – Based on Markdown headings.
LLM extraction (only for changed meetings) – Returns structured data matching predefined dataclasses.
Collect nodes and relationships – In‑memory tables (collectors).
Export to Neo4j with upsert semantics – Prevents duplicate nodes/edges.
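
The "identify changed documents" step can be pictured as a comparison of two file listings by last-modified timestamp. The sketch below is only an illustration of that idea in plain Python; the `changed_files` helper and the sample listings are hypothetical and are not how CocoIndex or the Drive API implement change tracking.

```python
import datetime


def changed_files(previous, current):
    """Compare two {file_id: last_modified} listings.

    Returns (new_or_modified, deleted) as sorted lists of file IDs.
    """
    new_or_modified = sorted(
        file_id for file_id, modified in current.items()
        if previous.get(file_id) != modified
    )
    deleted = sorted(file_id for file_id in previous if file_id not in current)
    return new_or_modified, deleted


# Hypothetical listings from two consecutive scans of the same folder.
t0 = datetime.datetime(2025, 12, 1, 9, 0)
t1 = datetime.datetime(2025, 12, 2, 9, 0)
previous_scan = {"budget.md": t0, "planning.md": t0, "retro.md": t0}
current_scan = {"budget.md": t1, "planning.md": t0, "standup.md": t1}

to_process, to_remove = changed_files(previous_scan, current_scan)
print(to_process, to_remove)
```

Only the files in `to_process` would need to go through splitting and extraction again; `planning.md` is untouched and is skipped.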

Prerequisites

Requirement	Details
Neo4j	Local instance (UI at `http://localhost:7474`), user `neo4j`, password `cocoindex`.
OpenAI API	Set `OPENAI_API_KEY` in the environment.
Google Cloud service account	Must have read access to the meeting‑note folders in Drive.

export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2

The GOOGLE_DRIVE_ROOT_FOLDER_IDS variable can contain a comma‑separated list if notes are stored in multiple folders.
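
The flow in the next section splits this variable with a plain `split(",")`, so a space after a comma or a trailing comma would end up inside a folder ID. If that is a concern, a slightly more forgiving parser could look like the sketch below; `parse_folder_ids` is a hypothetical helper, not part of CocoIndex.

```python
def parse_folder_ids(raw):
    """Split a comma-separated list, trimming whitespace and dropping empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


strict = "folderId1, folderId2,".split(",")
lenient = parse_folder_ids("folderId1, folderId2,")
print(strict)
print(lenient)
```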

Defining the Flow

import datetime
import os
from dataclasses import dataclass

import cocoindex

@cocoindex.flow_def(name="MeetingNotesGraph")
def meeting_notes_graph_flow(flow_builder: cocoindex.FlowBuilder,
                             data_scope: cocoindex.DataScope) -> None:
    credential_path = os.environ["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
    root_folder_ids = os.environ["GOOGLE_DRIVE_ROOT_FOLDER_IDS"].split(",")

    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.GoogleDrive(
            service_account_credential_path=credential_path,
            root_folder_ids=root_folder_ids,
            recent_changes_poll_interval=datetime.timedelta(seconds=10),
        ),
        refresh_interval=datetime.timedelta(minutes=1),
    )

recent_changes_poll_interval controls how often Drive is polled for recently modified files; refresh_interval sets how often the source as a whole is refreshed.

Splitting Files into Meetings

with data_scope["documents"].row() as document:
    document["meetings"] = document["content"].transform(
        cocoindex.functions.SplitBySeparators(
            separators_regex=[r"\n\n##?\ "],
            keep_separator="RIGHT",
        )
    )

With keep_separator="RIGHT", the matched heading stays attached to the chunk on its right, so each meeting keeps its own title, date, and other cues useful for LLM extraction.
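
The effect of this split can be approximated with the standard library. The sketch below uses a regex lookahead so the heading is not consumed by the split; it mimics the behavior described here under that assumption and is not CocoIndex's implementation. The sample note is invented.

```python
import re


def split_meetings(markdown):
    """Split before each blank-line-separated '# ' or '## ' heading, keeping the heading."""
    # The lookahead matches the position just before the separator, so the
    # separator text stays at the start of the right-hand chunk.
    chunks = re.split(r"(?=\n\n##? )", markdown)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


sample_note = (
    "# Notes\n\n"
    "## 2025-12-01 Budget\n- Dana\n\n"
    "## 2025-12-08 Planning\n- Alex"
)
meetings = split_meetings(sample_note)
for chunk in meetings:
    print(repr(chunk))
```

Note that a heading on the very first line of a file is not preceded by a blank line, so it does not act as a separator.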

Data Schema

@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]

These dataclasses define the output schema given to the LLM, so its response already conforms to the expected structure.
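
To show what "conforms to the expected structure" means in practice, the sketch below builds a Meeting from a JSON-like dictionary. The dictionary shape and the `meeting_from_dict` helper are assumptions made for illustration; CocoIndex handles this conversion internally.

```python
import datetime
from dataclasses import dataclass


@dataclass
class Person:
    name: str


@dataclass
class Task:
    description: str
    assigned_to: list[Person]


@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]


def meeting_from_dict(raw):
    """Convert a nested dict (hypothetical LLM output) into typed dataclasses."""
    return Meeting(
        time=datetime.date.fromisoformat(raw["time"]),
        note=raw["note"],
        organizer=Person(**raw["organizer"]),
        participants=[Person(**p) for p in raw["participants"]],
        tasks=[
            Task(
                description=t["description"],
                assigned_to=[Person(**p) for p in t["assigned_to"]],
            )
            for t in raw["tasks"]
        ],
    )


# Invented example of a structured extraction result.
raw_meeting = {
    "time": "2025-12-01",
    "note": "Reviewed the Q1 budget draft.",
    "organizer": {"name": "Dana"},
    "participants": [{"name": "Alex"}],
    "tasks": [{"description": "Draft Q1 budget", "assigned_to": [{"name": "Alex"}]}],
}
meeting = meeting_from_dict(raw_meeting)
print(meeting.organizer.name, meeting.tasks[0].assigned_to[0].name)
```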

Extraction per Meeting

with document["meetings"].row() as meeting:
    parsed = meeting["parsed"] = meeting["text"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI,
                model="gpt-4o",
            ),
            output_type=Meeting,
        )
    )

CocoIndex caches the extraction result; the LLM is only invoked when the input text, model, or schema changes.
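
One way to picture this caching is a lookup keyed on a fingerprint of everything that can affect the result. The sketch below is a simplified stand-in under that assumption, not CocoIndex's actual cache; `fake_extract` replaces the real LLM call so the example runs offline.

```python
import hashlib
import json


def extraction_cache_key(text, model, schema):
    """Fingerprint the inputs that determine the extraction result."""
    payload = json.dumps(
        {"text": text, "model": model, "schema": schema}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedExtractor:
    """Calls extract_fn only when the text, model, or schema has changed."""

    def __init__(self, extract_fn, model, schema):
        self.extract_fn = extract_fn
        self.model = model
        self.schema = schema
        self.calls = 0  # counts real (uncached) invocations
        self._cache = {}

    def __call__(self, text):
        key = extraction_cache_key(text, self.model, self.schema)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self.extract_fn(text)
        return self._cache[key]


def fake_extract(text):
    # Stand-in for the LLM: returns the first line of the meeting text.
    return {"first_line": text.splitlines()[0]}


extractor = CachedExtractor(fake_extract, model="gpt-4o", schema={"Meeting": ["time", "note"]})
extractor("## 2025-12-01 Budget\n- Dana")
extractor("## 2025-12-01 Budget\n- Dana")        # unchanged text: served from cache
extractor("## 2025-12-01 Budget\n- Dana, Alex")  # edited text: extracted again
print(extractor.calls)
```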

Collectors (In‑Memory Tables)

meeting_nodes = data_scope.add_collector()
attended_rels = data_scope.add_collector()
decided_tasks_rels = data_scope.add_collector()
assigned_rels = data_scope.add_collector()

Populating Collectors

meeting_key = {"note_file": document["filename"], "time": parsed["time"]}

meeting_nodes.collect(**meeting_key, note=parsed["note"])
attended_rels.collect(
    id=cocoindex.GeneratedField.UUID,
    **meeting_key,
    person=parsed["organizer"]["name"],
    is_organizer=True,
)

# Similar loops (omitted for brevity) add:
# - ATTENDED edges for each participant
# - DECIDED edges from meeting to each task
# - ASSIGNED_TO edges from each person to their tasks
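
To show the shape of the rows the omitted loops would produce, here is a plain-Python sketch that flattens one parsed meeting into four row lists. It uses ordinary lists and dicts instead of CocoIndex collectors, leaves out the generated `id` field, and the field names on the DECIDED and ASSIGNED_TO rows, along with `is_organizer=False` for participants, are assumptions rather than the article's actual code.

```python
def collect_rows(note_file, parsed):
    """Flatten one parsed meeting (a nested dict) into node and relationship rows."""
    meeting_key = {"note_file": note_file, "time": parsed["time"]}
    rows = {
        "meeting_nodes": [{**meeting_key, "note": parsed["note"]}],
        "attended_rels": [
            {**meeting_key, "person": parsed["organizer"]["name"], "is_organizer": True}
        ],
        "decided_tasks_rels": [],
        "assigned_rels": [],
    }
    # ATTENDED edge for each participant.
    for participant in parsed["participants"]:
        rows["attended_rels"].append(
            {**meeting_key, "person": participant["name"], "is_organizer": False}
        )
    for task in parsed["tasks"]:
        # DECIDED edge from the meeting to the task.
        rows["decided_tasks_rels"].append(
            {**meeting_key, "description": task["description"]}
        )
        # ASSIGNED_TO edge from each assignee to the task.
        for assignee in task["assigned_to"]:
            rows["assigned_rels"].append(
                {"person": assignee["name"], "description": task["description"]}
            )
    return rows


# Invented parsed meeting for illustration.
parsed_meeting = {
    "time": "2025-12-01",
    "note": "Reviewed the Q1 budget draft.",
    "organizer": {"name": "Dana"},
    "participants": [{"name": "Alex"}, {"name": "Sam"}],
    "tasks": [
        {"description": "Draft Q1 budget", "assigned_to": [{"name": "Alex"}]},
        {"description": "Book venue", "assigned_to": [{"name": "Sam"}, {"name": "Dana"}]},
    ],
}
rows = collect_rows("budget.md", parsed_meeting)
print({name: len(items) for name, items in rows.items()})
```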

Exporting to Neo4j

Nodes

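# conn_spec is the connection spec for the Neo4j instance listed under
# Prerequisites; its definition is not shown in this article.
meeting_nodes.export(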
    "meeting_nodes",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Nodes(label="Meeting"),
    ),
    primary_key_fields=["note_file", "time"],
)

flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Person",
        primary_key_fields=["name"],
    )
)

flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Task",
        primary_key_fields=["description"],
    )
)

Relationships

attended_rels.export(
    "attended_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ATTENDED",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[cocoindex.targets.TargetFieldMapping(
                    source="person", target="name"
                )],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)

The exports for DECIDED (Meeting → Task) and ASSIGNED_TO (Person → Task) follow the same pattern. Each relationship row is keyed on its id field, so re‑runs update existing relationships instead of duplicating them.
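
The reason a keyed export avoids duplicates can be shown with a small sketch. It assumes the generated ID is derived deterministically from the other fields of the row, which is why the same input yields the same key on every run; `stable_id` and `upsert` are hypothetical helpers, and the dict stands in for Neo4j.

```python
import uuid


def stable_id(**fields):
    """Derive a deterministic UUID from a row's field names and values."""
    canonical = "|".join(f"{name}={fields[name]}" for name in sorted(fields))
    return str(uuid.uuid5(uuid.NAMESPACE_OID, canonical))


def upsert(store, row, primary_key_fields):
    """Insert the row, or overwrite the existing row with the same primary key."""
    key = tuple(row[name] for name in primary_key_fields)
    store[key] = row


def export_attended(store, rows):
    for row in rows:
        upsert(store, {"id": stable_id(**row), **row}, ["id"])


# Invented relationship rows for illustration.
attended_rows = [
    {"note_file": "budget.md", "time": "2025-10-06", "person": "Dana"},
    {"note_file": "budget.md", "time": "2025-10-06", "person": "Alex"},
]
neo4j_stand_in = {}
export_attended(neo4j_stand_in, attended_rows)
export_attended(neo4j_stand_in, attended_rows)  # second run changes nothing
print(len(neo4j_stand_in))
```

With random IDs the second run would have doubled the number of stored relationships; deterministic keys are what make the export idempotent in this sketch.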

Running the Workflow

pip install -e .
cocoindex update main

After the update finishes, open the Neo4j Browser at http://localhost:7474 and try the following queries:

MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p, m;

MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m, t;

MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p, t;

CocoIndex only mutates nodes and relationships that actually changed, so the graph remains in sync with the source notes without unnecessary rewrites.