From Statistical Evidence to Executable Data Graphs
Source: Dev.to
Problem Statement
Most enterprises don’t lack data—they lack verified structure.
We’ve all seen relationship diagrams in slide decks: they look clean and make sense, but they are descriptive, not executable.
In practice, data relationships drift:
- Foreign keys are incomplete
- Naming conventions change
- Cross‑system links go undocumented
The real question becomes: How do you move from “assumed relationships” to verified, machine‑readable structure?
Our Approach
At Arisyn, we start from the data itself instead of relying on metadata. We analyze value behavior using statistical metrics:
- null_row_num – understand field completeness
- distinct_num – evaluate domain uniqueness
- co_occure and inclusion_ratio – detect structural inclusion
If 90% or more of the distinct values in one column also appear in another column, we treat that as a structural inclusion signal rather than a coincidence.
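To make these signals concrete, here is a minimal sketch of how they could be computed for in-memory columns. It is not Arisyn's implementation. The helper names `column_stats` and `pair_stats` are hypothetical, `None` stands in for SQL NULL, and the reading of `co_occure` as "number of distinct values present in both columns" is an assumption. Note that the inclusion ratio is directional.

```python
from typing import Any, Optional, Sequence


def column_stats(values: Sequence[Optional[Any]]) -> dict:
    """Hypothetical per-column signals; None stands in for SQL NULL."""
    non_null = [v for v in values if v is not None]
    return {
        "null_row_num": len(values) - len(non_null),  # completeness
        "distinct_num": len(set(non_null)),           # domain uniqueness
    }


def pair_stats(source: Sequence[Optional[Any]], target: Sequence[Optional[Any]]) -> dict:
    """Hypothetical pair signals: how much of source's domain appears in target."""
    src = {v for v in source if v is not None}
    tgt = {v for v in target if v is not None}
    # Assumed meaning of co_occure: distinct values present in both columns.
    co_occure = len(src & tgt)
    return {
        "co_occure": co_occure,
        "inclusion_ratio": co_occure / len(src) if src else 0.0,
    }


# Hypothetical sample data: 19 valid customer ids, one stray value, one NULL.
orders_customer_id = list(range(1, 20)) + [999, None]
customers_id = list(range(1, 51))  # 50 customers

print(column_stats(orders_customer_id))            # {'null_row_num': 1, 'distinct_num': 20}
print(pair_stats(orders_customer_id, customers_id))  # {'co_occure': 19, 'inclusion_ratio': 0.95}
print(pair_stats(customers_id, orders_customer_id))  # {'co_occure': 19, 'inclusion_ratio': 0.38}
```

Under the 90% rule, orders.customer_id → customers.id (0.95) is an inclusion signal, while the reverse direction (0.38) is not; a single stray value does not break the signal, which is why a ratio is used instead of strict containment.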
Methodology
- Compute statistical signals for each column pair.
- Validate relationships based on thresholds (e.g., inclusion ratio ≥ 0.9).
- Generate a structured JSON graph describing the validated edges.
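The three steps can be sketched end to end on toy data. This is an illustrative sketch, not Arisyn's pipeline: `build_graph` and the sample tables are hypothetical, and the extra rule that only key-like (unique) columns may be edge targets is an assumption added here to suppress obvious false positives; the article itself only names the inclusion-ratio threshold.

```python
import itertools
import json

INCLUSION_THRESHOLD = 0.9  # the example threshold from the methodology


def distinct_non_null(values):
    return {v for v in values if v is not None}


def inclusion_ratio(source, target):
    """Share of the source column's distinct non-null values found in the target column."""
    src = distinct_non_null(source)
    return len(src & distinct_non_null(target)) / len(src) if src else 0.0


def is_unique(values):
    """True if the non-null values contain no duplicates (a key-like column)."""
    non_null = [v for v in values if v is not None]
    return bool(non_null) and len(set(non_null)) == len(non_null)


def build_graph(tables, threshold=INCLUSION_THRESHOLD):
    """Score every ordered column pair, keep those passing the thresholds, emit edges."""
    columns = [(t, c) for t, cols in tables.items() for c in cols]
    edges = []
    for (st, sc), (tt, tc) in itertools.permutations(columns, 2):
        target = tables[tt][tc]
        # Assumption (not from the article): only key-like columns can be edge targets.
        if not is_unique(target):
            continue
        if inclusion_ratio(tables[st][sc], target) >= threshold:
            edges.append({"source_table": st, "source_column": sc,
                          "target_table": tt, "target_column": tc})
    return edges


# Hypothetical toy data; None would represent NULL.
tables = {
    "customers": {"id": [1, 2, 3]},
    "orders": {"id": [10, 11, 12, 13], "customer_id": [1, 2, 2, 3]},
    "order_items": {"id": [100, 101, 102, 103, 104], "order_id": [10, 10, 11, 12, 13]},
}
print(json.dumps(build_graph(tables), indent=2))
```

On this data the sketch emits the same two edges as the example graph. Scoring every ordered pair is quadratic in the number of columns, so a real system would likely prefilter candidates (for example by data type) before computing ratios.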
Example JSON Graph
[
{
"source_table": "orders",
"source_column": "customer_id",
"target_table": "customers",
"target_column": "id"
},
{
"source_table": "order_items",
"source_column": "order_id",
"target_table": "orders",
"target_column": "id"
}
]
Each edge in the JSON is statistically validated.
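Because relationships drift, as the problem statement notes, an edge that passed validation once can be rechecked against newer data. The sketch below is hypothetical (`revalidate` and the data snapshot are invented for illustration): it loads the example graph and reports every edge whose inclusion ratio has dropped below the 0.9 threshold.

```python
import json

GRAPH_JSON = """
[
  {"source_table": "orders", "source_column": "customer_id",
   "target_table": "customers", "target_column": "id"},
  {"source_table": "order_items", "source_column": "order_id",
   "target_table": "orders", "target_column": "id"}
]
"""

THRESHOLD = 0.9


def inclusion_ratio(source, target):
    src = {v for v in source if v is not None}
    tgt = {v for v in target if v is not None}
    return len(src & tgt) / len(src) if src else 0.0


def revalidate(edges, tables, threshold=THRESHOLD):
    """Return (edge, ratio) for every edge whose inclusion ratio fell below threshold."""
    failing = []
    for e in edges:
        source = tables[e["source_table"]][e["source_column"]]
        target = tables[e["target_table"]][e["target_column"]]
        ratio = inclusion_ratio(source, target)
        if ratio < threshold:
            failing.append((e, ratio))
    return failing


edges = json.loads(GRAPH_JSON)
# Hypothetical snapshot: two orders now reference customers that do not exist.
tables = {
    "customers": {"id": [1, 2, 3]},
    "orders": {"id": [10, 11, 12, 13], "customer_id": [1, 2, 4, 5]},
    "order_items": {"order_id": [10, 10, 11, 12, 13]},
}
for edge, ratio in revalidate(edges, tables):
    print(f'{edge["source_table"]}.{edge["source_column"]} -> '
          f'{edge["target_table"]}.{edge["target_column"]}: {ratio:.2f}')
```

Here only orders.customer_id -> customers.id is flagged (ratio 0.50); the order_items edge still holds.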
Benefits
- Executable graphs can generate JOIN paths automatically.
- Diagrams become explanations; the underlying graph enforces relationships.
- Once relationships are machine‑readable, AI systems no longer have to guess; they operate within verified constraints.
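To illustrate how an executable graph can produce JOIN paths automatically, here is a minimal sketch using breadth-first search over the example graph. The function names are hypothetical, edges are treated as joinable in both directions, and the result is the path with the fewest joins; a production generator would also need to handle join types, aliases, and ties between equally short paths.

```python
import json
from collections import deque

# The example graph from this article, inlined so the sketch is self-contained.
EDGES = json.loads("""
[
  {"source_table": "orders", "source_column": "customer_id",
   "target_table": "customers", "target_column": "id"},
  {"source_table": "order_items", "source_column": "order_id",
   "target_table": "orders", "target_column": "id"}
]
""")


def build_adjacency(edges):
    """Map each table to (neighbor_table, join_condition) pairs; joins work both ways."""
    adj = {}
    for e in edges:
        cond = (f'{e["source_table"]}.{e["source_column"]} = '
                f'{e["target_table"]}.{e["target_column"]}')
        adj.setdefault(e["source_table"], []).append((e["target_table"], cond))
        adj.setdefault(e["target_table"], []).append((e["source_table"], cond))
    return adj


def join_path(edges, start, goal):
    """BFS for the fewest-hop JOIN chain from start to goal; None if unreachable."""
    adj = build_adjacency(edges)
    prev = {start: None}  # table -> (parent_table, condition used to reach it)
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == goal:
            break
        for neighbor, cond in adj.get(table, []):
            if neighbor not in prev:
                prev[neighbor] = (table, cond)
                queue.append(neighbor)
    if goal not in prev:
        return None
    # Walk back from goal to start, then emit JOIN clauses in forward order.
    steps = []
    node = goal
    while prev[node] is not None:
        parent, cond = prev[node]
        steps.append(f"JOIN {node} ON {cond}")
        node = parent
    return "\n".join([f"FROM {start}"] + steps[::-1])


print(join_path(EDGES, "order_items", "customers"))
# FROM order_items
# JOIN orders ON order_items.order_id = orders.id
# JOIN customers ON orders.customer_id = customers.id
```

The JOIN conditions come straight from validated edges, so the generated SQL never relies on a guessed column match.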
Conclusion
Shifting from assumed, descriptive diagrams to statistically validated, executable data graphs transforms how enterprises manage and trust their data relationships.