The Impossible Normalization: Why Your Database Hates Biology đ§Ź
Source: Dev.to
User Objects
We need to talk about User Objects.
If youâve been in web dev for more than five minutes, youâve probably built an eâcommerce backend or a todo app. The data modeling usually goes something like this:
- A
Userexists. - A
Productexists (it has a SKU, a price, and a description). - The user buys the product.
Itâs clean and deterministic. If I buy a Red TâShirt (SizeâŻL), that object doesnât change. It sits in a warehouse, gets shipped, and arrives as a Red TâShirt.
Now, imagine that Red TâShirt could spontaneously change its size depending on the weather, or describe itself as âsort of reddishâpink but hurts when you touch it.â
Welcome to HealthTech.
The âSimpleâ Symptom Trap
Letâs say youâre building an app for patients to track their symptoms. The junior dev in us immediately thinks: âEasy. SQL table.â
CREATE TABLE symptoms (
id SERIAL PRIMARY KEY,
user_id INT,
name VARCHAR(255), -- "Headache"
severity INT, -- 1 to 10
created_at TIMESTAMP
);
Ship it, right?
Wrong.
Two days later, a user writes: âMy headache is a 7/10, but itâs specifically behind my left eye and it pulses when I stand up.â
Your name column just broke. You need location (left eye), quality (pulsating), and aggravating factors (standing up).
So you think, âOkay, Iâll switch to a NoSQL document store. Iâll just dump a JSON blob!â
{
"symptom": "Headache",
"meta": {
"location": "Left retro-orbital",
"quality": "Pulsating",
"trigger": "Orthostatic"
}
}
Great. Now try to query that across 10,000 users to find everyone with âhead issues.â You canât, because UserâŻA wrote âHeadache,â UserâŻB wrote âMigraine,â and UserâŻC wrote âMy head hurts.â
Enter the Ontologies: SNOMEDâCT and LOINC
To solve this, the industry invented standard terminologies. The big boss here is SNOMEDâCT, which assigns a unique code to basically every medical concept in existence. A headache isnât âHeadacheâ; itâs SCTID:âŻ25064002.
This sounds like a developerâs dream (normalization!), but it turns into an implementation nightmare. Biology is a graph, not a list. A specific type of headache is a child of âPain,â which is a child of âClinical Finding.â
To store this properly, we usually look toward FHIR (Fast Healthcare Interoperability Resources). Hereâs a simplified JSON for an observation of body temperature:
{
"resourceType": "Observation",
"status": "final",
"category": [
{
"coding": [
{
"system": "http://terminology.hl7.org/CodeSystem/observation-category",
"code": "vital-signs",
"display": "Vital Signs"
}
]
}
],
"code": {
"coding": [
{
"system": "http://loinc.org",
"code": "8310-5",
"display": "Body temperature"
}
]
},
"valueQuantity": {
"value": 39.1,
"unit": "degrees C",
"system": "http://unitsofmeasure.org",
"code": "Cel"
}
// ... timestamps, performers, device used ...
}
We went from temperature: 39.1 to a 30âline JSON object because in biology, context is everything. A temperature of 39.1âŻÂ°C taken orally is different from one taken axillary (armpit). If you donât store the metadata, the data is medically dangerous.
The Map Is Not The Territory
The biggest philosophical hurdle we face as engineers is that human health is continuous, but databases are discrete.
When a user selects âModerate Painâ from a dropdown, we collapse a complex, fluid biological sensation into a distinct enum, losing resolution.
I wrote more about this loss of data resolution on my personal blogâif youâre into deeper architectural theory, check out my other articles.
The challenge is creating a UI that feels human (e.g., âWhere does it hurt?â) while mapping it to rigid, complex ontologies like FHIR and SNOMED without the user ever needing to know those standards.
So, Whatâs the Solution?
There isnât a silver bullet, but the following patterns tend to work in the wild:
- Store the Raw Intent: Keep what the user actually typed or clicked. Never discard the source of truth.
- Map Later: Use an ingestion pipeline to translate âMy head hurtsâ to
SCTID:âŻ25064002. Donât force the user to speak SNOMED. - Hybrid Schemas: Use relational columns for highâlevel data (patient ID, date) and
JSONBcolumns for the clinical payload (the FHIR resource).
Itâs messy, frustrating, and definitely harder than selling Tâshirts. But at least itâs never boring.
Anyone else suffering through HL7/FHIR integrations right now? Letâs commiserate in the comments. đ