Medicine Encyclopedia 2.0: Stop Guessing and Start Scanning with Multimodal RAG
Source: Dev.to
Architecture Overview
The logic flow:
- User uploads a photo of the medication label.
- PaddleOCR extracts the text.
- Entity extraction separates drug names from dosage information.
- RxNav API checks drug‑drug interactions.
- ChromaDB retrieves local guidelines / personal health context.
- An LLM reasoning engine synthesises a human‑readable safety report.
- RAGas evaluates the answer for faithfulness and relevance.
```mermaid
graph TD
    A[User Uploads Photo] --> B[PaddleOCR: Text Extraction]
    B --> C{Entity Extraction}
    C -->|Drug Names| D[RxNav API: Interaction Check]
    C -->|Dosage Info| E[ChromaDB: Manuals/Guidelines]
    D --> F[LLM Reasoning Engine]
    E --> F
    F --> G[Final Response: Safety Advice]
    G --> H[Evaluation: RAGas]
```
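Sketched as plain Python, the flow above is just a chain of function calls. The stubs below return hard-coded values so the driver runs end to end; Steps 1–5 replace them with real implementations. The helper name `map_to_rxcuis` is ours, not part of any library:

```python
from typing import List

# Stand-in stubs with hard-coded return values; each is replaced by the
# real implementation from the corresponding step of this tutorial.
def get_drug_names(img_path: str) -> str:
    return "Advil Ibuprofen 200 mg"                    # Step 1: PaddleOCR

def map_to_rxcuis(text: str) -> List[str]:
    return ["5640"]                                    # Step 2: RxNav approximateTerm

def check_interactions(rxcuis: List[str]) -> List[str]:
    return []                                          # Step 2: RxNav interaction list

def get_local_context(query: str) -> List[str]:
    return ["Patient A: history of stomach ulcers."]   # Step 3: ChromaDB

def generate_safety_report(text: str, inter: List[str], ctx: List[str]) -> str:
    return "WARNING: NSAID with ulcer history."        # Step 4: LLM

def run_pipeline(img_path: str) -> str:
    """Chain the stages exactly as in the diagram above."""
    text = get_drug_names(img_path)
    rxcuis = map_to_rxcuis(text)
    interactions = check_interactions(rxcuis)
    context = get_local_context(text)
    return generate_safety_report(text, interactions, context)

print(run_pipeline("advil_box.jpg"))
```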
Prerequisites
| Component | Why we need it |
|---|---|
| PaddleOCR | Ultra‑fast, accurate OCR that handles tilted text and diverse fonts on medicine packaging. |
| ChromaDB | Lightweight vector store for local drug manuals, hospital guidelines, or personal health records. |
| RxNav API | Gold‑standard source for drug interaction data (National Library of Medicine). |
| RAGas | Toolkit to evaluate whether the RAG pipeline is hallucinating. |
Make sure you have Python 3.9+ and the following packages installed (PaddleOCR also needs the `paddlepaddle` runtime, which is not pulled in automatically):

```shell
pip install paddlepaddle paddleocr chromadb ragas datasets requests
```
Step 1 – Extracting Ingredients with PaddleOCR
```python
from paddleocr import PaddleOCR

# Initialise the OCR engine (angle classification enabled for rotated text)
ocr = PaddleOCR(use_angle_cls=True, lang='en')

def get_drug_names(img_path: str) -> str:
    """
    Perform OCR on the image and return a single string with all detected text.
    """
    result = ocr.ocr(img_path, cls=True)
    # Flatten the nested list and keep only the recognized text fragments
    raw_text = [line[1][0] for page in result for line in page]
    print(f"Detected Text: {raw_text}")
    return " ".join(raw_text)

# Example usage
# extracted_text = get_drug_names("advil_box.jpg")
```
Output example:

```
Detected Text: ['Advil', 'Ibuprofen', '200', 'mg', 'Take', '1', 'tablet', 'every', '4-6', 'hours']
```
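The raw token list still mixes drug names with dosage text (the "Entity Extraction" box in the diagram). A production system would use a medical NER model here; purely as a sketch, a crude regex heuristic that separates the two might look like:

```python
import re
from typing import List, Tuple

# Tokens that look numeric, like units, or like dosing instructions count as "dosage".
DOSAGE_PATTERN = re.compile(
    r"^\d+([.-]\d+)?$|^(mg|ml|mcg|g|tablet|tablets|hours|every|take)$",
    re.IGNORECASE,
)

def split_entities(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Crude heuristic: dosage-looking tokens go one way, everything else is a name candidate."""
    names, dosage = [], []
    for tok in tokens:
        (dosage if DOSAGE_PATTERN.match(tok) else names).append(tok)
    return names, dosage

tokens = ['Advil', 'Ibuprofen', '200', 'mg', 'Take', '1', 'tablet', 'every', '4-6', 'hours']
names, dosage = split_entities(tokens)
print(names)   # candidate drug names to send to RxNav
print(dosage)  # dosage/instruction text for the guideline lookup
```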
Step 2 – Querying the RxNav API for Interactions
First, map the extracted drug names to RxNorm Concept Unique Identifiers (RXCUIs) (you can use the RxNav “approximateTerm” endpoint for this). Then request interaction data:
```python
import requests
from typing import List

def check_interactions(rxcuis: List[str]) -> List[str]:
    """
    Query RxNav for interactions among the supplied RxCUIs.
    Returns a list of interaction descriptions.
    """
    ids = "+".join(rxcuis)
    url = f"https://rxnav.nlm.nih.gov/REST/interaction/list.json?rxcuis={ids}"
    response = requests.get(url, timeout=10).json()
    interactions = []
    if "fullInteractionTypeGroup" in response:
        for group in response["fullInteractionTypeGroup"]:
            for item in group["fullInteractionType"]:
                # Each interactionPair may contain multiple descriptions; we take the first.
                interactions.append(item["interactionPair"][0]["description"])
    return interactions

# Example:
# rxcuis = ["5640"]  # RxCUI for Ibuprofen
# interactions = check_interactions(rxcuis)
```
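The name→RxCUI mapping mentioned above can use RxNav's `approximateTerm` endpoint. The sketch below assumes the response shape RxNav documents (an `approximateGroup` holding a ranked `candidate` list) and keeps the parsing separate from the network call so it can be tested on a canned payload:

```python
import requests
from typing import Optional

def parse_rxcui(payload: dict) -> Optional[str]:
    """Pull the top-ranked RxCUI out of an approximateTerm response."""
    candidates = payload.get("approximateGroup", {}).get("candidate", [])
    return candidates[0]["rxcui"] if candidates else None

def name_to_rxcui(drug_name: str) -> Optional[str]:
    """Map a free-text drug name (e.g. from OCR) to its RxCUI via RxNav."""
    url = "https://rxnav.nlm.nih.gov/REST/approximateTerm.json"
    resp = requests.get(url, params={"term": drug_name, "maxEntries": 1}, timeout=10)
    return parse_rxcui(resp.json())

# Parsing works on a canned response without touching the network:
sample = {"approximateGroup": {"candidate": [{"rxcui": "5640", "score": "100"}]}}
print(parse_rxcui(sample))  # prints 5640
```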
Step 3 – Augmenting with Local Context (ChromaDB)
The public API may miss institution‑specific guidelines or personal health history. Store such nuances in a vector store and retrieve the most relevant snippet at query time.
```python
import chromadb
from typing import List

# Initialise Chroma client (in-memory for the demo)
client = chromadb.Client()
collection = client.create_collection(name="medical_guidelines")

# Add a few example documents
collection.add(
    documents=[
        "Patient A has a history of stomach ulcers. Avoid NSAIDs like Ibuprofen.",
        "Guideline: Do not combine antihistamines with MAO inhibitors."
    ],
    metadatas=[
        {"source": "electronic_health_record"},
        {"source": "hospital_policy"}
    ],
    ids=["rec1", "rec2"]
)

def get_local_context(query: str, n_results: int = 1) -> List[str]:
    """
    Retrieve the most relevant local documents for the given query.
    """
    results = collection.query(query_texts=[query], n_results=n_results)
    return results['documents'][0]  # List of document strings for the first query

# Example:
# context = get_local_context("Ibuprofen ulcer")
```
The “Official” Way to Build AI Agents
While this tutorial is a solid starting point for a Learning‑in‑Public project, production‑grade AI healthcare tools demand:
- Rigorous prompt engineering and chain‑of‑thought reasoning.
- Strict data‑privacy safeguards (HIPAA, GDPR).
- Monitoring, logging, and model‑version control.
For deeper dives into Agentic RAG, Production‑ready Multimodal Pipelines, and compliance best practices, see the articles on the WellAlly Tech Blog.
Step 4 – Generating a Safety Report with an LLM
Combine OCR output, interaction data, and local context into a concise, user‑friendly message.
```python
from typing import List

def generate_safety_report(ocr_text: str,
                           interactions: List[str],
                           context: List[str]) -> str:
    """
    Build a prompt for the LLM and return the generated safety report.
    """
    prompt = f"""
    User scanned a medicine label: "{ocr_text}"
    Known clinical interactions: {interactions}
    Personal health context: {context}

    Provide a short, plain-language report that tells the user whether the medication
    is safe to take, or if a warning is needed. Use the format:
    "SAFE: ..." or "WARNING: ..."
    """
    # Replace the following line with your LLM call (e.g., OpenAI, Anthropic, etc.)
    # response = llm.complete(prompt)
    # For demo purposes we return a hard-coded warning:
    return ("WARNING: You are taking Advil (Ibuprofen) while having a history of "
            "stomach ulcers. Consult a doctor before use.")

# Example:
# report = generate_safety_report(extracted_text, interactions, context)
# print(report)
```
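The hard-coded return above is where a real model call goes. One way to wire it, assuming the OpenAI Python SDK (any chat-completion-style client works the same way, and the model name is purely illustrative):

```python
from typing import List

def build_prompt(ocr_text: str, interactions: List[str], context: List[str]) -> str:
    """Assemble the same prompt as generate_safety_report, as a testable helper."""
    return (
        f'User scanned a medicine label: "{ocr_text}"\n'
        f"Known clinical interactions: {interactions}\n"
        f"Personal health context: {context}\n\n"
        'Reply with "SAFE: ..." or "WARNING: ..." in plain language.'
    )

def generate_safety_report_llm(ocr_text: str,
                               interactions: List[str],
                               context: List[str]) -> str:
    prompt = build_prompt(ocr_text, interactions, context)
    # Assumes the OpenAI SDK; swap in any provider's chat API here.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Keeping `build_prompt` separate means the prompt template can be unit-tested without an API key.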
Step 5 – Evaluating with RAGas
RAGas helps you measure faithfulness (does the answer stay true to the source?) and answer relevance (does it address the user’s question?).
```python
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy
from datasets import Dataset

# Generate the report from the earlier pipeline outputs
generated_report = generate_safety_report(extracted_text, interactions, context)

# Build a tiny evaluation dataset
data_samples = {
    "question": ["Can I take Advil with my current meds?"],
    "answer": [generated_report],
    "contexts": [[f"{interactions} {context}"]],
    "ground_truth": ["WARNING: Ibuprofen conflicts with ulcer history. Consult a physician."]
}
eval_dataset = Dataset.from_dict(data_samples)

# Run the RAGas evaluation (you may need to configure the LLM and embedding models)
metrics = evaluate(eval_dataset, metrics=[faithfulness, answer_relevancy])
print(metrics)
```
The resulting scores tell you whether your pipeline is hallucinating or staying grounded in the retrieved evidence.
🎉 You’ve built a multimodal RAG system that:
- Reads medication labels from a photo.
- Extracts drug names via OCR.
- Looks up interactions through RxNav.
- Enriches the answer with local, patient‑specific context stored in ChromaDB.
- Generates a concise safety report with an LLM.
- Validates the output using RAGas.
Feel free to extend the system with:
- Batch processing for multiple pills at once.
- Voice‑assistant integration (e.g., Alexa, Google Assistant).
- Secure storage of personal health records (encrypted, on‑device).
Happy hacking, and stay safe! 🚀
Conclusion: The Future of Health‑Tech
By combining PaddleOCR for vision, RxNav for authoritative interaction data, and ChromaDB for personalized context, we've built a tool that can catch dangerous medication mistakes before they happen. Multimodal RAG is moving fast, and this is just the tip of the iceberg!
What’s next?
- Try adding a pill identification feature using a CNN.
- Integrate voice‑to‑text so users can ask questions hands‑free.
If you enjoyed this build, drop a comment below or 🦄 heart this post! And don’t forget to visit WellAlly Tech for more high‑level AI tutorials.
Happy coding!