[Paper] Clinical Data Goes MEDS? Let's OWL make sense of it

Published: January 7, 2026 at 01:25 PM EST

Source: arXiv - 2601.04164v1

Overview

The paper introduces MEDS‑OWL, an OWL ontology that expresses the Medical Event Data Standard (MEDS) in Semantic Web terms. Converting MEDS‑formatted clinical event data into RDF graphs yields FAIR‑compliant, provenance‑rich datasets that can be queried and linked with other biomedical resources, which in turn supports reproducible, graph‑based machine‑learning pipelines in healthcare.

Key Contributions

  • MEDS‑OWL ontology: a lightweight, formally defined OWL model (13 classes, 10 object properties, 20 data properties, 24 axioms) that captures the event‑centric concepts of MEDS.
  • meds2rdf Python library: an open‑source converter that ingests MEDS JSON/CSV files and emits validated RDF graphs conforming to MEDS‑OWL.
  • SHACL validation suite: a set of Shapes Constraint Language (SHACL) rules that automatically check the structural integrity of the generated graphs.
  • Proof‑of‑concept on a synthetic aneurysm care pathway dataset, demonstrating end‑to‑end transformation, validation, and the ability to link clinical events to external ontologies.
  • FAIR alignment: the combined stack (ontology + converter + SHACL) satisfies key FAIR principles (Findable, Accessible, Interoperable, Reusable) for event‑based health data.

Methodology

  1. Modeling MEDS in OWL – The authors distilled the MEDS specification into a concise ontology, reusing existing biomedical vocabularies and standards (e.g., SNOMED CT, FHIR) where possible and defining new classes for “Event”, “Patient”, “Encounter”, etc.
  2. Implementation of meds2rdf – A Python package parses MEDS records, maps each field to the corresponding OWL class/property, and builds an RDF graph using the rdflib library.
  3. Validation with SHACL – After conversion, the graph is run through a SHACL engine that checks cardinalities, datatype constraints, and required relationships, ensuring the output is semantically sound.
  4. Demonstration – A synthetic dataset describing the timeline of ruptured intracranial aneurysm treatment (diagnosis, imaging, surgery, follow‑up) was transformed, validated, and inspected with SPARQL queries to illustrate typical analytics use‑cases.
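
The conversion and validation steps above can be sketched in a few lines. This is not the meds2rdf API: it is a standard‑library‑only illustration that writes N‑Triples text by hand, whereas the library described in the paper builds its graphs with rdflib and validates them with a SHACL engine. The namespace `http://example.org/meds#`, the property names (`hasSubject`, `hasCode`, `hasTime`), the record field names, and every helper function are assumptions made for this example, and the cardinality table only imitates the kind of constraint a SHACL shape would express.

```python
from datetime import datetime

# Hypothetical namespace and term names; the real MEDS-OWL IRIs may differ.
NS = "http://example.org/meds#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Assumed (min, max) cardinalities per event, in the spirit of a SHACL shape.
CARDINALITY = {
    f"<{NS}hasSubject>": (1, 1),
    f"<{NS}hasCode>": (1, 1),
    f"<{NS}hasTime>": (0, 1),
}


def literal(value):
    """Render a value as a quoted N-Triples literal, escaping \\ and "."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def event_to_triples(index, record):
    """Map one event record (a dict) to (subject, predicate, object) triples."""
    event = f"<{NS}event/{index}>"
    triples = [
        (event, RDF_TYPE, f"<{NS}Event>"),
        (event, f"<{NS}hasSubject>", f"<{NS}subject/{record['subject_id']}>"),
    ]
    if record.get("code") is not None:
        triples.append((event, f"<{NS}hasCode>", literal(record["code"])))
    if record.get("time") is not None:
        stamp = literal(record["time"].isoformat())
        triples.append((event, f"<{NS}hasTime>", f"{stamp}^^<{XSD}dateTime>"))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def check_events(triples):
    """Return (event, property, count) for every cardinality violation."""
    events = [s for s, p, o in triples if p == RDF_TYPE and o == f"<{NS}Event>"]
    violations = []
    for event in events:
        for prop, (low, high) in CARDINALITY.items():
            count = sum(1 for s, p, _ in triples if s == event and p == prop)
            if not low <= count <= high:
                violations.append((event, prop, count))
    return violations


records = [
    {"subject_id": 1, "time": datetime(2025, 1, 3, 9, 0), "code": "DIAGNOSIS"},
    {"subject_id": 1, "time": datetime(2025, 1, 5, 14, 0), "code": None},  # missing code
]
triples = [t for i, r in enumerate(records) for t in event_to_triples(i, r)]
violations = check_events(triples)
print(to_ntriples(triples))
print(violations)
```

The second record deliberately lacks a code, so the check reports one violation for it while the first record passes. A real SHACL shape would state the same constraint declaratively instead of in a Python loop.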

Results & Findings

  • The generated RDF graph faithfully represented all MEDS events and passed 100% of the SHACL constraints.
  • Querying the graph revealed complex temporal patterns (e.g., median time from diagnosis to surgery) that are cumbersome to extract from flat MEDS files.
  • The ontology’s modest size kept conversion overhead low: converting a 10 k‑record MEDS file took ≈2 seconds on a standard laptop.
  • Linking to external ontologies (e.g., mapping procedure codes to SNOMED CT) was achieved with single‑line SPARQL joins, showcasing the interoperability gains.
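
To make the temporal‑pattern finding concrete, here is a small sketch of how a "median time from diagnosis to surgery" question decomposes into triple‑pattern matches and a join on the subject. It is plain Python over a toy in‑memory list of triples, not the SPARQL used in the paper. The identifiers, predicate names, codes, and dates are all invented for illustration.

```python
from datetime import datetime
from statistics import median

# Toy graph as (subject, predicate, object) tuples; every name and value is made up.
graph = [
    ("e1", "hasSubject", "p1"), ("e1", "hasCode", "DIAGNOSIS"), ("e1", "hasTime", datetime(2025, 1, 1)),
    ("e2", "hasSubject", "p1"), ("e2", "hasCode", "SURGERY"), ("e2", "hasTime", datetime(2025, 1, 3)),
    ("e3", "hasSubject", "p2"), ("e3", "hasCode", "DIAGNOSIS"), ("e3", "hasTime", datetime(2025, 1, 5)),
    ("e4", "hasSubject", "p2"), ("e4", "hasCode", "SURGERY"), ("e4", "hasTime", datetime(2025, 1, 6)),
    ("e5", "hasSubject", "p3"), ("e5", "hasCode", "DIAGNOSIS"), ("e5", "hasTime", datetime(2025, 1, 10)),
    ("e6", "hasSubject", "p3"), ("e6", "hasCode", "SURGERY"), ("e6", "hasTime", datetime(2025, 1, 15)),
]


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def first_time_by_subject(graph, code):
    """Earliest timestamp per subject among events carrying the given code."""
    times = {}
    for event, _, _ in match(graph, p="hasCode", o=code):
        subject = match(graph, s=event, p="hasSubject")[0][2]
        time = match(graph, s=event, p="hasTime")[0][2]
        if subject not in times or time < times[subject]:
            times[subject] = time
    return times


diagnosed = first_time_by_subject(graph, "DIAGNOSIS")
operated = first_time_by_subject(graph, "SURGERY")

# Join on subject: keep only subjects that have both a diagnosis and a surgery.
delays = [(operated[s] - diagnosed[s]).days for s in diagnosed if s in operated]
median_delay_days = median(delays)
print(median_delay_days)
```

In a triplestore the same join would be a single graph pattern with two event variables sharing one subject variable, followed by an aggregate over the time difference.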

Practical Implications

  • Data pipelines: Developers can plug meds2rdf into existing ETL workflows to automatically produce RDF datasets ready for triplestores (Blazegraph, GraphDB, Virtuoso) or, through an RDF import plugin such as neosemantics, for property‑graph databases like Neo4j.
  • Reproducible ML: Event‑centric RDF graphs enable feature engineering via graph embeddings (e.g., node2vec, GraphSAGE) while preserving provenance metadata, improving model transparency.
  • Cross‑institution collaboration: Because the output conforms to FAIR and Semantic Web standards, hospitals can share de‑identified event data without losing semantic richness, facilitating multi‑center studies.
  • Regulatory reporting: The SHACL validation layer provides an auditable checkpoint that can be integrated into compliance pipelines for clinical data submissions.
  • Rapid prototyping: With a small ontology and a ready‑made converter, data scientists can experiment with knowledge‑graph analytics (e.g., causal path discovery) without building a custom schema from scratch.

Limitations & Future Work

  • Synthetic evaluation: The proof‑of‑concept uses a simulated dataset; real‑world clinical data may expose edge cases (missing timestamps, heterogeneous coding systems) not covered by the current SHACL rules.
  • Ontology scope: MEDS‑OWL focuses on core event concepts; richer clinical domains (genomics, imaging metadata) will require extensions or integration with larger ontologies.
  • Performance at scale: While conversion is fast for modest sizes, the authors note the need for benchmarking on millions of events and exploring streaming or parallel conversion strategies.
  • Tooling ecosystem: Future releases aim to provide tighter integration with popular FHIR servers, automated ontology versioning, and a GUI for SHACL rule authoring.
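
The streaming strategy mentioned under performance at scale could look roughly like the sketch below: records are consumed lazily and the output is written in fixed‑size batches, so memory use stays bounded regardless of input size. This is one possible design, not something the paper or meds2rdf is known to implement. The namespace, property names, record fields, and function names are hypothetical.

```python
import io
from itertools import islice

NS = "http://example.org/meds#"  # hypothetical namespace


def record_lines(records):
    """Lazily yield N-Triples lines, two per record, without building a graph."""
    for index, record in enumerate(records):
        event = f"<{NS}event/{index}>"
        yield f"{event} <{NS}hasSubject> <{NS}subject/{record['subject_id']}> ."
        yield f'{event} <{NS}hasCode> "{record["code"]}" .'


def write_in_batches(lines, sink, batch_size=1000):
    """Write lines to a file-like sink in fixed-size batches; return the batch count."""
    iterator = iter(lines)
    batches = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return batches
        sink.write("\n".join(batch) + "\n")
        batches += 1


def synthetic_records(n):
    """Generate n placeholder records on demand (stand-in for a real reader)."""
    for i in range(n):
        yield {"subject_id": i % 100, "code": "EVENT"}


sink = io.StringIO()  # a real pipeline would open an output file instead
batch_count = write_in_batches(record_lines(synthetic_records(2500)), sink, batch_size=1000)
line_count = sink.getvalue().count("\n")
print(batch_count, line_count)
```

Because N‑Triples is line‑oriented, batches written this way can also be produced by independent workers and concatenated, which is one reason the format suits parallel conversion.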

Bottom line: MEDS‑OWL and the accompanying meds2rdf library give developers a pragmatic bridge between standardized clinical event data and the Semantic Web, paving the way for more interoperable, reproducible, and graph‑driven health‑AI solutions.

Authors

  • Alberto Marfoglia
  • Jong Ho Jhee
  • Adrien Coulet

Paper Information

  • arXiv ID: 2601.04164v1
  • Categories: cs.LG, cs.AI
  • Published: January 7, 2026