Unstructured Text is the Final Boss: Parsing Doctor's Notes with LLMs đŸ„

Published: December 25, 2025 at 12:34 AM EST
4 min read
Source: Dev.to

Hey devs! 👋

Let’s be honest. We all live in a bubble where we think data looks like this:

{
  "patient_id": 1024,
  "symptoms": ["headache", "nausea"],
  "severity": "moderate",
  "is_critical": false
}

It’s beautiful. It’s parsable. It’s type‑safe. 😍

But if you’ve ever worked in HealthTech (or scraped any legacy enterprise system), you know the reality is usually a terrifying block of free text written by a tired human at 3 AM.

I’ve been deep in the trenches lately trying to standardize clinical notes, and trust me: this work makes parsing HTML with regex look like a vacation.

The Reality Check: “Pt c/o
”

Doctors don’t write JSON. They write in a secret code of abbreviations, typos, and shorthand.

The “data” actually looks like this:

“Pt 45yo m, c/o SOB x 2d. Denies CP. Hx of HTN, on lisinopril. Exam: wheezing b/l. Plan: nebs + steroids.”

  • If you run a standard keyword search for “High Blood Pressure,” you might miss this record entirely because the doctor wrote “HTN” (Hypertension).
  • If you search for “Pain,” you might get a false positive because the note says “Denies CP” (Chest Pain).
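
Both failure modes are easy to reproduce with a plain substring search (a quick illustrative sketch; the note text is the one above):

# Naive keyword search over the raw note -- illustrative sketch
note = "Pt 45yo m, c/o SOB x 2d. Denies CP. Hx of HTN, on lisinopril."

# False negative: the expanded term never appears, only the abbreviation "HTN"
print("high blood pressure" in note.lower())  # False

# False positive: "CP" matches, but the surrounding context is "Denies CP"
print("cp" in note.lower())  # True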

Traditional NLP struggles here because context is everything. “SOB” means “Shortness of Breath” in a hospital, but something very different in a Reddit comment section. 😂

The Hallucination Trap đŸ‘»

The modern solution is often phrased as: “Just throw it into ChatGPT/LLM, right?”

Well
 yes and no.

If you ask a generic LLM to “Summarize this patient’s status,” it can do a great job—until it doesn’t. The biggest risk in medical AI is hallucination.

Example: A model read a note mentioning a “family history of diabetes” and emitted structured JSON claiming the patient currently has diabetes.

Big yikes. In healthcare, that kind of error is unacceptable.

The Fix: The RAG + Fine‑Tuning Sandwich đŸ„Ș

To make the data queryable (e.g., “Show me all patients with respiratory issues”) without the AI lying, we need a strict pipeline.

1. Fine‑Tuning (Teaching the Language)

Out‑of‑the‑box models like gpt-3.5-turbo often lack the nuance of niche specialties. Fine‑tuning a smaller model (e.g., Llama 3 or Mistral) on medical texts teaches it that bid means “twice a day” (bis in die), not an auction offer.
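
What does that training data look like? Roughly, pairs that map shorthand to plain language. Here’s a minimal sketch of one record in the chat-style JSONL format most fine‑tuning toolkits accept (the pair and the filename are my own inventions, not from a real dataset):

import json

# One training pair: clinical shorthand in, expanded plain language out
pair = {
    "messages": [
        {"role": "user", "content": "Expand the clinical shorthand: 'Pt c/o SOB, meds bid.'"},
        {"role": "assistant", "content": "Patient complains of shortness of breath; medications twice daily (bis in die)."},
    ]
}

# Fine-tuning datasets are usually JSONL: one record like this per line
with open("med_abbrev.jsonl", "a") as f:
    f.write(json.dumps(pair) + "\n")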

2. Structured Extraction (The Translator)

Instead of asking the LLM to “chat,” we force it to extract data into a predefined schema using tools like Pydantic or Instructor.

import instructor
from pydantic import BaseModel, Field
from openai import OpenAI

# Define the structure we WANT (The Dream)
class ClinicalNote(BaseModel):
    patient_age: int
    symptoms: list[str] = Field(description="List of physical complaints")
    medications: list[str]
    diagnosis_confirmed: bool = Field(description="Is the diagnosis final or just suspected?")

client = instructor.from_openai(OpenAI())  # older Instructor versions use instructor.patch()

text_blob = "Pt 45yo m, c/o SOB x 2d. Denies CP. Hx of HTN, on lisinopril."

# response_model tells Instructor to validate the reply against our Pydantic schema
resp = client.chat.completions.create(
    model="gpt-4",
    response_model=ClinicalNote,
    messages=[
        {"role": "system", "content": "You are a medical scribe. Extract data accurately."},
        {"role": "user", "content": text_blob},
    ],
)

print(resp.model_dump_json(indent=2))

Output

{
  "patient_age": 45,
  "symptoms": ["Shortness of Breath"],
  "medications": ["lisinopril"],
  "diagnosis_confirmed": false
}

Now we have SQL‑queryable data! 🚀
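
To make that claim concrete, here’s a minimal sketch that drops the validated record into SQLite (the table name and column encoding are my own choices; resp is the ClinicalNote from the snippet above):

import json
import sqlite3

conn = sqlite3.connect("notes.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS clinical_notes (
        patient_age INTEGER,
        symptoms TEXT,             -- JSON-encoded list
        medications TEXT,          -- JSON-encoded list
        diagnosis_confirmed INTEGER
    )
""")

conn.execute(
    "INSERT INTO clinical_notes VALUES (?, ?, ?, ?)",
    (resp.patient_age, json.dumps(resp.symptoms),
     json.dumps(resp.medications), int(resp.diagnosis_confirmed)),
)
conn.commit()

# "Show me all patients with respiratory issues" (crude LIKE match for the demo)
rows = conn.execute(
    "SELECT patient_age, symptoms FROM clinical_notes "
    "WHERE symptoms LIKE '%Shortness of Breath%'"
).fetchall()
print(rows)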

3. RAG for Verification (The Guardrail)

Even with extraction, we still need a way to verify the result. We embed the original notes into a vector database (e.g., Pinecone or Weaviate). When a user asks, “Does this patient have heart issues?”, the system:

  1. Retrieves the specific chunk mentioning “Denies CP” and “Hx of HTN”.
  2. Feeds only that chunk to the LLM.
  3. Cites the source.

If the AI can’t find a relevant chunk, it is programmed to say “I don’t know” rather than guessing.
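
For flavor, here’s a toy version of that guardrail. It swaps Pinecone/Weaviate for an in‑memory list plus cosine similarity, and the 0.3 threshold is an arbitrary assumption, not a recommendation:

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    res = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(res.data[0].embedding)

# Chunks of the original note stand in for the vector database
chunks = [
    "Pt 45yo m, c/o SOB x 2d.",
    "Denies CP. Hx of HTN, on lisinopril.",
    "Plan: nebs + steroids.",
]
vectors = [embed(c) for c in chunks]

def answer(question: str, threshold: float = 0.3) -> str:
    q = embed(question)
    sims = [float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v)) for v in vectors]
    best = max(range(len(chunks)), key=sims.__getitem__)
    if sims[best] < threshold:
        return "I don't know"  # refuse rather than guess
    # Feed ONLY the retrieved chunk to the LLM, and make it cite the source
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer only from the provided excerpt. Quote it as your citation."},
            {"role": "user", "content": f'Excerpt: "{chunks[best]}"\n\nQuestion: {question}'},
        ],
    )
    return completion.choices[0].message.content

print(answer("Does this patient have heart issues?"))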

Conclusion

Standardizing free‑text clinical notes is painful, but it’s the only way to unlock the value in medical records. We must move away from “magic black‑box” AI toward structured AI pipelines—validating inputs, enforcing JSON schemas, and grounding everything in retrieved context.

It’s messy work, but someone’s gotta do it! đŸ’»âœš

Want to go deeper?
Check out my personal blog for the deep dives: wellally.tech/blog
