I Turned My Meeting Notes Into a Self-Updating Neo4j Knowledge Graph
Source: Dev.to
Introduction
Every team generates meeting notes, but answering questions such as “Who was in all the budget meetings?” or “What tasks did Alex get assigned this month?” is difficult when the notes are just static text.
This article demonstrates a CocoIndex workflow that turns Markdown meeting notes stored in Google Drive into a live Neo4j knowledge graph that updates automatically whenever the notes change.
Why a Self‑Updating Graph?
- Large organizations can have tens of thousands to millions of meeting notes spread across folders and tools.
- Notes evolve constantly—people change, tasks move, decisions are revised.
- Traditional keyword search and batch-built graphs treat notes as static, so staying current means either expensive full recomputation or an outdated graph.
CocoIndex’s incremental processing detects only changed documents, runs LLM extraction on those sections, and upserts the results into Neo4j, keeping compute and LLM costs low even at scale.
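The change-detection idea can be sketched in plain Python. This assumes each run keeps a snapshot mapping file IDs to last-modified timestamps. The function name diff_snapshots and the snapshot format are illustrative only; they are not CocoIndex's internal bookkeeping.

```python
from datetime import datetime


def diff_snapshots(
    previous: dict[str, datetime], current: dict[str, datetime]
) -> dict[str, list[str]]:
    """Compare two {file_id: last_modified} snapshots (hypothetical format)."""
    added = sorted(f for f in current if f not in previous)
    # A file needs re-extraction only if its timestamp moved since the last run.
    modified = sorted(
        f for f in current if f in previous and current[f] != previous[f]
    )
    # Deleted files would have their nodes and edges removed from the graph.
    deleted = sorted(f for f in previous if f not in current)
    return {"added": added, "modified": modified, "deleted": deleted}


before = {"a.md": datetime(2024, 10, 1), "b.md": datetime(2024, 10, 1), "c.md": datetime(2024, 10, 1)}
after = {"a.md": datetime(2024, 10, 1), "b.md": datetime(2024, 10, 2), "d.md": datetime(2024, 10, 2)}
print(diff_snapshots(before, after))
# {'added': ['d.md'], 'modified': ['b.md'], 'deleted': ['c.md']}
```

Only b.md and d.md would go to the LLM in this example. a.md is skipped entirely, which is where the savings at scale come from.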
Graph Model
| Node label | Key fields |
|---|---|
| Meeting | note_file, time |
| Person | name |
| Task | description |

| Relationship type | Direction |
|---|---|
| ATTENDED | Person → Meeting |
| DECIDED | Meeting → Task |
| ASSIGNED_TO | Person → Task |
This model supports queries like:
- “Which meetings did Dana attend?”
- “Where was this task decided?”
- “Who currently owns all tasks created in Q4?”
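To make the model concrete, here is a minimal in-memory sketch. It stores the three relationship types as sets of (source, target) pairs and answers the example questions. Meetings are keyed by (note_file, time), matching the Meeting key fields above. ToyGraph and the sample data are hypothetical; in the real pipeline, Neo4j answers these questions with Cypher.

```python
from dataclasses import dataclass, field
from datetime import date

MeetingKey = tuple[str, date]  # (note_file, time), the Meeting key fields


@dataclass
class ToyGraph:
    attended: set[tuple[str, MeetingKey]] = field(default_factory=set)  # Person -> Meeting
    decided: set[tuple[MeetingKey, str]] = field(default_factory=set)   # Meeting -> Task
    assigned_to: set[tuple[str, str]] = field(default_factory=set)      # Person -> Task

    def meetings_attended_by(self, person: str) -> list[MeetingKey]:
        return sorted(m for p, m in self.attended if p == person)

    def meetings_that_decided(self, task: str) -> list[MeetingKey]:
        return sorted(m for m, t in self.decided if t == task)

    def owners_of_tasks_decided_between(self, start: date, end: date) -> dict[str, list[str]]:
        # "Created in Q4" is approximated by the date of the meeting that decided the task.
        tasks = sorted({t for m, t in self.decided if start <= m[1] <= end})
        return {t: sorted(p for p, t2 in self.assigned_to if t2 == t) for t in tasks}


g = ToyGraph()
budget_oct = ("budget.md", date(2024, 10, 3))
budget_nov = ("budget.md", date(2024, 11, 7))
roadmap_sep = ("roadmap.md", date(2024, 9, 20))
g.attended |= {("Dana", budget_oct), ("Alex", budget_oct), ("Dana", budget_nov), ("Alex", roadmap_sep)}
g.decided |= {(budget_oct, "Cut cloud spend"), (roadmap_sep, "Ship v2")}
g.assigned_to |= {("Alex", "Cut cloud spend"), ("Dana", "Ship v2")}

print(g.meetings_attended_by("Dana"))
print(g.owners_of_tasks_decided_between(date(2024, 10, 1), date(2024, 12, 31)))
```

Dana's query returns both budget meetings. The Q4 query returns {'Cut cloud spend': ['Alex']}, because "Ship v2" was decided in September.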
Pipeline Overview
- Google Drive (change tracking) – Detects new or modified files.
- Identify changed documents – Uses service‑account credentials and last‑modified timestamps.
- Split each file into individual meetings – Based on Markdown headings.
- LLM extraction (only for changed meetings) – Returns structured data matching predefined dataclasses.
- Collect nodes and relationships – In‑memory tables (collectors).
- Export to Neo4j with upsert semantics – Prevents duplicate nodes/edges.
Prerequisites
| Requirement | Details |
|---|---|
| Neo4j | Local instance (UI at http://localhost:7474), user neo4j, password cocoindex. |
| OpenAI API | Set OPENAI_API_KEY in the environment. |
| Google Cloud service account | Must have read access to the meeting‑note folders in Drive. |
export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2
The GOOGLE_DRIVE_ROOT_FOLDER_IDS variable can contain a comma‑separated list if notes are stored in multiple folders.
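A stray space after a comma (folderId1, folderId2) would otherwise end up inside a folder ID, so it can help to parse these variables defensively. This is a minimal sketch. load_drive_config is a hypothetical helper, not part of CocoIndex; the flow itself reads the variables directly.

```python
import os
from collections.abc import Mapping


def load_drive_config(env: Mapping[str, str] = os.environ) -> tuple[str, list[str]]:
    """Return (credential_path, folder_ids), failing early with a readable message."""
    try:
        credential_path = env["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
        raw_ids = env["GOOGLE_DRIVE_ROOT_FOLDER_IDS"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None
    # Trim whitespace around each ID and ignore empty entries such as a trailing comma.
    folder_ids = [fid.strip() for fid in raw_ids.split(",") if fid.strip()]
    if not folder_ids:
        raise RuntimeError("GOOGLE_DRIVE_ROOT_FOLDER_IDS lists no folder IDs")
    return credential_path, folder_ids


print(load_drive_config({
    "GOOGLE_SERVICE_ACCOUNT_CREDENTIAL": "/absolute/path/to/service_account.json",
    "GOOGLE_DRIVE_ROOT_FOLDER_IDS": "folderId1, folderId2,",
}))
```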
Defining the Flow
import datetime
import os
from dataclasses import dataclass

import cocoindex


@cocoindex.flow_def(name="MeetingNotesGraph")
def meeting_notes_graph_flow(flow_builder: cocoindex.FlowBuilder,
                             data_scope: cocoindex.DataScope) -> None:
    credential_path = os.environ["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
    root_folder_ids = os.environ["GOOGLE_DRIVE_ROOT_FOLDER_IDS"].split(",")

    # Source: every file under the given Drive folders, with change polling enabled.
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.GoogleDrive(
            service_account_credential_path=credential_path,
            root_folder_ids=root_folder_ids,
            recent_changes_poll_interval=datetime.timedelta(seconds=10),
        ),
        refresh_interval=datetime.timedelta(minutes=1),
    )
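recent_changes_poll_interval controls how often Drive is polled for recently modified files. This check only fetches metadata for those files, so it is cheap. refresh_interval controls how often the entire source is rescanned, which is a heavier catch-all and can run less frequently.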
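The two settings trade freshness against cost. The toy timeline below shows the resulting mix over the first two minutes with the values above. It assumes that polls and full rescans simply fire on fixed multiples of their intervals, and CocoIndex's real scheduler may differ. The schedule function is hypothetical.

```python
from datetime import timedelta


def schedule(window: timedelta, poll: timedelta, refresh: timedelta) -> list[tuple[int, str]]:
    """List (second, action) pairs; in this toy model a rescan replaces a poll due at the same second."""
    window_s, poll_s, refresh_s = (int(d.total_seconds()) for d in (window, poll, refresh))
    events = []
    for t in range(1, window_s + 1):
        if t % refresh_s == 0:
            events.append((t, "full rescan"))
        elif t % poll_s == 0:
            events.append((t, "poll recent changes"))
    return events


events = schedule(timedelta(minutes=2), timedelta(seconds=10), timedelta(minutes=1))
print(sum(a == "poll recent changes" for _, a in events), "polls,",
      sum(a == "full rescan" for _, a in events), "rescans")
# 10 polls, 2 rescans
```

In this model, an edit is noticed within about ten seconds by a cheap poll, while the expensive rescan runs only twice per two minutes.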
Splitting Files into Meetings
with data_scope["documents"].row() as document:
    # Split each file at "# " / "## " headings that follow a blank line.
    document["meetings"] = document["content"].transform(
        cocoindex.functions.SplitBySeparators(
            separators_regex=[r"\n\n##?\ "],
            keep_separator="RIGHT",
        )
    )
keep_separator="RIGHT" attaches each matched heading to the chunk that follows it, so every meeting chunk keeps its title, date, and other cues useful for LLM extraction.
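To see what the separator regex does, here is a plain-Python approximation using re. split_keep_right is a hypothetical helper. The whitespace trimming is an assumption made for readability, not a description of SplitBySeparators internals.

```python
import re

SEPARATOR = r"\n\n##?\ "  # same pattern as in the flow: blank line, then "# " or "## "


def split_keep_right(text: str, pattern: str = SEPARATOR) -> list[str]:
    """Cut text at each separator match, keeping the separator at the start of the next chunk."""
    cuts = [m.start() for m in re.finditer(pattern, text)]
    bounds = [0, *cuts, len(text)]
    chunks = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [c for c in chunks if c]  # drop empty pieces


notes = "# Team notes\n\n## 2024-10-03 Budget\nAlex, Dana\n\n## 2024-10-10 Roadmap\nDana"
for chunk in split_keep_right(notes):
    print(repr(chunk))
```

Each meeting chunk starts with its own "## date title" line. In this sketch the top-level file title ends up as a small chunk of its own.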
Data Schema
@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]
These dataclasses are supplied to the LLM so the output already conforms to the expected structure.
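For intuition about what a schema-conforming answer looks like, the sketch below decodes a hypothetical JSON response into the same dataclasses. meeting_from_json and the sample payload are illustrative. ExtractByLlm handles this step itself, so this code would not appear in the flow.

```python
import datetime
import json
from dataclasses import dataclass


@dataclass
class Person:
    name: str


@dataclass
class Task:
    description: str
    assigned_to: list[Person]


@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]


def meeting_from_json(payload: str) -> Meeting:
    """Build nested dataclasses from a JSON object shaped like Meeting (hypothetical helper)."""
    d = json.loads(payload)
    return Meeting(
        time=datetime.date.fromisoformat(d["time"]),  # dates travel as ISO strings in JSON
        note=d["note"],
        organizer=Person(**d["organizer"]),
        participants=[Person(**p) for p in d["participants"]],
        tasks=[
            Task(t["description"], [Person(**p) for p in t["assigned_to"]])
            for t in d["tasks"]
        ],
    )


reply = """{"time": "2024-10-03", "note": "Q4 budget review",
  "organizer": {"name": "Dana"},
  "participants": [{"name": "Alex"}, {"name": "Dana"}],
  "tasks": [{"description": "Cut cloud spend", "assigned_to": [{"name": "Alex"}]}]}"""
print(meeting_from_json(reply).tasks[0])
```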
Extraction per Meeting
with document["meetings"].row() as meeting:
    # One structured Meeting per chunk, shaped by the dataclasses above.
    parsed = meeting["parsed"] = meeting["text"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI,
                model="gpt-4o",
            ),
            output_type=Meeting,
        )
    )
CocoIndex caches the extraction result; the LLM is only invoked when the input text, model, or schema changes.
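The caching behavior can be pictured as memoization keyed on everything that influences the answer. The minimal sketch below uses a hash of (text, model, schema) as the key. ExtractionCache and fake_llm are hypothetical, and CocoIndex's own cache key and storage are not reproduced here.

```python
import hashlib
import json
from typing import Any, Callable


class ExtractionCache:
    """Memoize an expensive extraction call on (text, model, schema)."""

    def __init__(self, extract: Callable[[str, str, Any], Any]):
        self._extract = extract
        self._store: dict[str, Any] = {}
        self.llm_calls = 0  # how many times the underlying function actually ran

    @staticmethod
    def _key(text: str, model: str, schema: Any) -> str:
        # Any change to the text, the model name, or the schema yields a new key.
        payload = json.dumps([text, model, schema], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, text: str, model: str, schema: Any) -> Any:
        key = self._key(text, model, schema)
        if key not in self._store:
            self.llm_calls += 1
            self._store[key] = self._extract(text, model, schema)
        return self._store[key]


def fake_llm(text: str, model: str, schema: Any) -> dict:
    return {"model": model, "chars": len(text)}  # stand-in for a real API call


cache = ExtractionCache(fake_llm)
schema = {"Meeting": ["time", "note", "organizer", "participants", "tasks"]}
cache.get("## 2024-10-03 Budget", "gpt-4o", schema)
cache.get("## 2024-10-03 Budget", "gpt-4o", schema)           # cache hit, no call
cache.get("## 2024-10-03 Budget (edited)", "gpt-4o", schema)  # text changed, new call
print(cache.llm_calls)  # 2
```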
Collectors (In‑Memory Tables)
meeting_nodes = data_scope.add_collector()
attended_rels = data_scope.add_collector()
decided_tasks_rels = data_scope.add_collector()
assigned_rels = data_scope.add_collector()
Populating Collectors
meeting_key = {"note_file": document["filename"], "time": parsed["time"]}
meeting_nodes.collect(**meeting_key, note=parsed["note"])
attended_rels.collect(
id=cocoindex.GeneratedField.UUID,
**meeting_key,
person=parsed["organizer"]["name"],
is_organizer=True,
)
# Similar loops (omitted for brevity) add:
# - ATTENDED edges for each participant
# - DECIDED edges from meeting to each task
# - ASSIGNED_TO edges from each person to their tasks
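The omitted loops mirror the nesting of the Meeting schema. That means one ATTENDED row per participant, one DECIDED row per task, and one ASSIGNED_TO row per (task, assignee) pair. The plain-Python sketch below uses lists of dicts in place of collectors to show that fan-out. collect_rows is hypothetical, and the description, task, and person field names in the task rows are assumptions, not the article's code.

```python
def collect_rows(note_file: str, meeting: dict) -> dict[str, list[dict]]:
    """Fan one parsed meeting out into rows for the four collectors (plain lists here)."""
    key = {"note_file": note_file, "time": meeting["time"]}
    rows: dict[str, list[dict]] = {
        "meeting_nodes": [{**key, "note": meeting["note"]}],
        "attended_rels": [{**key, "person": meeting["organizer"]["name"], "is_organizer": True}],
        "decided_tasks_rels": [],
        "assigned_rels": [],
    }
    for participant in meeting["participants"]:
        rows["attended_rels"].append({**key, "person": participant["name"]})
    for task in meeting["tasks"]:
        rows["decided_tasks_rels"].append({**key, "description": task["description"]})
        for person in task["assigned_to"]:
            rows["assigned_rels"].append(
                {**key, "task": task["description"], "person": person["name"]}
            )
    return rows


parsed = {
    "time": "2024-10-03",
    "note": "Q4 budget review",
    "organizer": {"name": "Dana"},
    "participants": [{"name": "Alex"}, {"name": "Sam"}],
    "tasks": [
        {"description": "Cut cloud spend", "assigned_to": [{"name": "Alex"}, {"name": "Sam"}]},
        {"description": "Draft Q1 plan", "assigned_to": [{"name": "Dana"}]},
    ],
}
print({name: len(r) for name, r in collect_rows("budget.md", parsed).items()})
# {'meeting_nodes': 1, 'attended_rels': 3, 'decided_tasks_rels': 2, 'assigned_rels': 3}
```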
Exporting to Neo4j
Nodes
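# Connection to the local Neo4j instance from the Prerequisites table (default Bolt port).
conn_spec = cocoindex.add_auth_entry(
    "Neo4jConnection",
    cocoindex.targets.Neo4jConnection(
        uri="bolt://localhost:7687",
        user="neo4j",
        password="cocoindex",
    ),
)

meeting_nodes.export(
    "meeting_nodes",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Nodes(label="Meeting"),
    ),
    primary_key_fields=["note_file", "time"],
)

# Person and Task nodes only appear as relationship endpoints,
# so they are declared with their primary keys instead of exported.
flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Person",
        primary_key_fields=["name"],
    )
)
flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Task",
        primary_key_fields=["description"],
    )
)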
Relationships
attended_rels.export(
    "attended_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ATTENDED",
            # Source node: Person, matched on name (collected as "person").
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[cocoindex.targets.TargetFieldMapping(
                    source="person", target="name"
                )],
            ),
            # Target node: Meeting, matched on its composite key.
            target=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
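Exports for DECIDED (Meeting → Task) and ASSIGNED_TO (Person → Task) use similar definitions. CocoIndex keeps each GeneratedField.UUID stable as long as the row's other collected values are unchanged. As a result, re-runs update existing relationships instead of creating duplicates.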
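A dictionary keyed by ID shows why stable IDs matter for upserts. The sketch derives each relationship ID deterministically with uuid.uuid5 from the row's other fields. That derivation is a stand-in chosen for illustration, not how CocoIndex generates its UUIDs. stable_id, upsert, and the namespace string are hypothetical.

```python
import uuid

# Hypothetical namespace for this example; any fixed UUID works.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/meeting-notes-graph")


def stable_id(row: dict) -> str:
    """Same field values give the same ID, so a re-run targets the existing edge."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return str(uuid.uuid5(NAMESPACE, canonical))


def upsert(table: dict[str, dict], rows: list[dict]) -> None:
    for row in rows:
        rid = stable_id(row)
        table[rid] = {"id": rid, **row}  # insert or overwrite, never append


edges: dict[str, dict] = {}
run_rows = [{"person": "Dana", "note_file": "budget.md", "time": "2024-10-03"}]
upsert(edges, run_rows)
upsert(edges, run_rows)  # second run over an unchanged note
print(len(edges))  # 1
```

With random IDs (uuid.uuid4), the second call would have added a duplicate edge instead.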
Running the Workflow
pip install -e .
cocoindex update main
After the update finishes, open the Neo4j Browser at http://localhost:7474 and try the following queries:
MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p, m;
MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m, t;
MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p, t;
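The example questions from the Graph Model section can also be asked from Python. The sketch below uses the official neo4j driver (pip install neo4j, 5.x). It assumes Bolt on the default port 7687, the credentials from the Prerequisites table, and that Meeting.time is stored as a Neo4j DATE. The QUERIES dict and the run helper are hypothetical.

```python
# Parameterized Cypher for the questions listed under Graph Model.
QUERIES = {
    "meetings_attended": (
        "MATCH (:Person {name: $name})-[:ATTENDED]->(m:Meeting) "
        "RETURN m.note_file AS note_file, m.time AS meeting_time ORDER BY meeting_time"
    ),
    "where_decided": (
        "MATCH (m:Meeting)-[:DECIDED]->(:Task {description: $description}) "
        "RETURN m.note_file AS note_file, m.time AS meeting_time"
    ),
    "owners_between": (
        "MATCH (m:Meeting)-[:DECIDED]->(t:Task)<-[:ASSIGNED_TO]-(p:Person) "
        "WHERE m.time >= $start AND m.time < $end "
        "RETURN t.description AS task, collect(DISTINCT p.name) AS owners"
    ),
}


def run(query_name: str, uri: str = "bolt://localhost:7687",
        auth: tuple[str, str] = ("neo4j", "cocoindex"), **params) -> list[dict]:
    from neo4j import GraphDatabase  # imported lazily so this module loads without the driver

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            # Materialize the records before the session closes.
            return [record.data() for record in session.run(QUERIES[query_name], params)]
```

For example, run("meetings_attended", name="Dana") lists Dana's meetings. run("owners_between", start=date(2024, 10, 1), end=date(2025, 1, 1)), with date imported from datetime, approximates "tasks created in Q4" by the date of the meeting that decided them.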
CocoIndex only mutates nodes and relationships that actually changed, so the graph remains in sync with the source notes without unnecessary rewrites.