Why we built provenance into a notes app
Source: Dev.to
Introduction
Notes written in the field are great for recording results and reflections, but keeping the flow that led to those results as structured data is surprisingly hard. The procedure that lived only in someone’s head, the implicit assumptions that didn’t make it onto the page, and the judgment calls that got summarized away in meeting decks—when you read the notes back years later, those rarely survive together.
I felt this myself. More than ten years after stepping away from active experimental work, I tried to recall the flow of one of those experiments. What I had left were fragmented notes and a few meeting decks. The results were there, but the flow that led to them had thinned out over time.
The Need for Structured Flow
A common piece of advice is to write the procedure as a flow chart in your notebook. Even so, field notes tend to be results-centric, and keeping the flow recorded as structure on top of that takes more effort than you would expect. To reproduce work later, it helps to have a setup that records the flow as structured data alongside the results.
Japanese cooking has a saying about sa‑shi‑su‑se‑so—adding sugar, salt, vinegar, soy sauce, and miso in that order changes the taste. Experiments are the same. What you add, when, and how long you heat it for—those choices shape the outcome. The flow is the result.
So I wanted a way to keep that causal data, “what came from what, through what flow,” as a structured record. This has become especially interesting now that AI is starting to handle experimental data.
Provenance Data Model (PROV‑DM)
I came across PROV‑DM (Provenance Data Model), a W3C standard for describing what was made, from what, and how. It defines three primitives—Entity, Activity, Agent—and the relations between them.
Academic data systems use it, but personal notes apps generally do not. Yet the daily output of a researcher already fits this shape: “I heat‑treated Sample A and got Sample B” is literally “Entity B was generated by Activity (heat treatment) from Entity A.” With this, I had a way to keep the experimental flow—the part that used to live only in my head—as structured data.
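To make the heat-treatment sentence concrete, here is a minimal sketch in Python. It is not Graphium's implementation: the `ProvGraph` class and its methods are hypothetical names, and the second step ("polishing", "Sample C") is invented for illustration. Only the relation names `used`, `wasGeneratedBy`, and `wasDerivedFrom` come from PROV-DM.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Tiny in-memory store for PROV-style statements (illustrative only)."""
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    relations: list = field(default_factory=list)  # (relation, subject, object)

    def record_step(self, activity, inputs, outputs):
        """Record one step: the activity used the inputs and generated the outputs."""
        self.activities.add(activity)
        for src in inputs:
            self.entities.add(src)
            self.relations.append(("used", activity, src))
        for out in outputs:
            self.entities.add(out)
            self.relations.append(("wasGeneratedBy", out, activity))
            for src in inputs:
                self.relations.append(("wasDerivedFrom", out, src))

    def sources_of(self, entity):
        """Return every entity the given entity was transitively derived from."""
        found = set()
        stack = [entity]
        while stack:
            current = stack.pop()
            for rel, subj, obj in self.relations:
                if rel == "wasDerivedFrom" and subj == current and obj not in found:
                    found.add(obj)
                    stack.append(obj)
        return found


graph = ProvGraph()
# "I heat-treated Sample A and got Sample B"
graph.record_step("heat treatment", inputs=["Sample A"], outputs=["Sample B"])
# Hypothetical follow-up step, added only to show a two-step trace
graph.record_step("polishing", inputs=["Sample B"], outputs=["Sample C"])

print(sorted(graph.sources_of("Sample C")))  # ['Sample A', 'Sample B']
```

Once the steps are stored as relations rather than prose, "where did Sample C come from?" becomes a graph traversal instead of a memory exercise.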
Looking deeper, the same model also fits document edit history, not only content provenance. In fact, that may be closer to what PROV‑DM was originally designed for.
Graphium: Two Layers of Provenance
In Graphium, provenance is tracked in two layers:
Layer 1 – Content Provenance
The experimental workflow (Sample A → Sample B, etc.). This keeps the procedure inside a note as causal structure, allowing a researcher—or an AI reading the note later—to trace the flow at a higher resolution.
Layer 2 – Document Edit Provenance
Who edited what, and when. The editor (human or AI) maps to prov:Agent, edit operations to prov:Activity, and document revisions to prov:Entity. Recording on the Agent whether the editor was an AI gives a clear distinction when re-reading or sharing work.
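The mapping above can be sketched as a chain of revisions, each pointing at the edit that produced it and the revision it replaced. This is an assumption-laden illustration, not Graphium's data model: the class names, the `is_ai` flag, the agent names, and the timestamp are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Agent:  # plays the role of prov:Agent
    name: str
    is_ai: bool


@dataclass(frozen=True)
class Edit:  # plays the role of prov:Activity
    agent: Agent
    summary: str
    at: datetime


@dataclass(frozen=True)
class Revision:  # plays the role of prov:Entity
    number: int
    text: str
    generated_by: Edit
    revision_of: Optional["Revision"]


def apply_edit(prev, agent, new_text, summary, at):
    """Create the next revision and link it to the edit and the previous revision."""
    number = 1 if prev is None else prev.number + 1
    return Revision(number, new_text, Edit(agent, summary, at), prev)


def ai_edited_revisions(latest):
    """Walk the revision chain backwards and collect revisions made by an AI agent."""
    numbers = []
    rev = latest
    while rev is not None:
        if rev.generated_by.agent.is_ai:
            numbers.append(rev.number)
        rev = rev.revision_of
    return sorted(numbers)


# Hypothetical agents and timestamp, for illustration only
author = Agent("author", is_ai=False)
assistant = Agent("assistant", is_ai=True)
when = datetime(2024, 1, 1, tzinfo=timezone.utc)

r1 = apply_edit(None, author, "Heated Sample A.", "first draft", when)
r2 = apply_edit(r1, assistant, "Heated Sample A for one hour.", "expanded step", when)
r3 = apply_edit(r2, author, "Heat-treated Sample A for one hour.", "wording", when)

print(ai_edited_revisions(r3))  # [2]
```

Keeping the AI flag on the agent rather than on the revision means one edit record answers both "who" and "AI or not".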
Both layers grow in value as AI becomes part of the picture. Structured procedures become material that can be analyzed or reused, and edit provenance makes clear which parts of a document were AI-generated.
Design Approach
Asking users to author a graph directly is a non‑starter. Graphium maps PROV‑DM onto the grammar of the document itself:
- Headings become Activities.
- Short inline highlights within a heading's scope turn the named term (e.g., “NaCl”, “80 °C”, “clear solution”) into an Entity.
The writing experience stays “type a heading, write a paragraph, occasionally highlight a word.” The provenance graph is a computed view—never something you edit by hand.
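As a rough sketch of what "computed view" could mean, the snippet below derives Activities and Entities from plain text. The markup is an assumption: the post does not say how Graphium stores notes, so Markdown-style `#` headings and `==term==` highlights stand in here, and `extract_activities` is a hypothetical helper.

```python
import re

HEADING = re.compile(r"^#+\s+(.*)$")   # assumed heading syntax
HIGHLIGHT = re.compile(r"==(.+?)==")   # assumed highlight syntax


def extract_activities(note):
    """Turn each heading into an Activity and each highlight in its scope into an Entity."""
    activities = []
    current = None
    for line in note.splitlines():
        match = HEADING.match(line)
        if match:
            # Strip highlight markers from the heading text itself
            title = HIGHLIGHT.sub(r"\1", match.group(1)).strip()
            current = {"activity": title, "entities": []}
            activities.append(current)
        if current is not None:
            # Highlights before the first heading have no scope and are ignored
            current["entities"].extend(HIGHLIGHT.findall(line))
    return activities


note = """\
# Dissolve
Add ==NaCl== to water and stir.

# Heat
Warm to ==80 °C== until it becomes a ==clear solution==.
"""

result = extract_activities(note)
for item in result:
    print(item["activity"], item["entities"])
```

The writer only ever touches the text; the structure is recomputed from it each time, so there is no separate graph to keep in sync.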
Not every link should be causal. Statements like “this paper was interesting” or “this concept resembles that one” are non-directional. Forcing causality on them is unnatural, so Graphium distinguishes two kinds of link:
- @mentions default to knowledge links (no direction, cycles allowed).
- Relations between inline highlights inside heading scopes are provenance links (directed, acyclic).
Thus the same act of writing yields two different graphs underneath.
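The difference between the two graphs can be sketched as two stores with different rules. How Graphium actually enforces acyclicity is not described in this post; rejecting an edge that would close a cycle is one assumed strategy, and `NoteGraphs` with its methods is a hypothetical name.

```python
from collections import defaultdict


class NoteGraphs:
    """Two link stores: undirected knowledge links and directed, acyclic provenance links."""

    def __init__(self):
        self.knowledge = set()               # unordered pairs, cycles are fine
        self.provenance = defaultdict(set)   # source -> set of targets

    def add_knowledge_link(self, a, b):
        # frozenset drops direction: (a, b) and (b, a) are the same link
        self.knowledge.add(frozenset((a, b)))

    def _reachable(self, start, goal):
        """Depth-first search along provenance edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.provenance.get(node, ()))
        return False

    def add_provenance_link(self, source, target):
        # A new edge source -> target closes a cycle if source is already reachable from target
        if source == target or self._reachable(target, source):
            raise ValueError(f"{source} -> {target} would create a cycle")
        self.provenance[source].add(target)


graphs = NoteGraphs()
graphs.add_knowledge_link("paper X", "concept Y")
graphs.add_knowledge_link("concept Y", "paper X")   # same undirected link
graphs.add_provenance_link("Sample A", "Sample B")

try:
    graphs.add_provenance_link("Sample B", "Sample A")
    cycle_rejected = False
except ValueError:
    cycle_rejected = True

print(len(graphs.knowledge), cycle_rejected)  # 1 True
```

A sample cannot be its own ancestor, so the provenance store refuses the reverse edge, while the knowledge store accepts links in any shape.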
Broader Applicability
The “what came from what” question applies far beyond lab work: recipes, software change histories, medical records—the shape is the same. That’s why the header image of this post is a bread‑making note rather than a chemistry experiment.
Source Code
The implementation is open‑source: