Medicine Encyclopedia 2.0: Stop Guessing and Start Scanning with Multimodal RAG
Source: Dev.to
Architecture Overview
The logic flow:
- User uploads a photo of the medication label.
- PaddleOCR extracts the text.
- Entity extraction separates drug names from dosage information.
- RxNav API checks drug‑drug interactions.
- ChromaDB retrieves local guidelines / personal health context.
- An LLM reasoning engine synthesises a human‑readable safety report.
- RAGas evaluates the answer for faithfulness and relevance.
```mermaid
graph TD
    A[User Uploads Photo] --> B[PaddleOCR: Text Extraction]
    B --> C{Entity Extraction}
    C -->|Drug Names| D[RxNav API: Interaction Check]
    C -->|Dosage Info| E[ChromaDB: Manuals/Guidelines]
    D --> F[LLM Reasoning Engine]
    E --> F
    F --> G[Final Response: Safety Advice]
    G --> H[Evaluation: RAGas]
```
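Sketched as plain Python, the flow above is just a chain of function calls. The stubs below return hard-coded values so the driver runs end to end; Steps 1–5 replace them with real implementations. The helper name `map_to_rxcuis` is ours, not part of any library:

```python
from typing import List

# Stand-in stubs with hard-coded return values; each is replaced by the
# real implementation from the corresponding step of this tutorial.
def get_drug_names(img_path: str) -> str:
    return "Advil Ibuprofen 200 mg"                    # Step 1: PaddleOCR

def map_to_rxcuis(text: str) -> List[str]:
    return ["5640"]                                    # Step 2: RxNav approximateTerm

def check_interactions(rxcuis: List[str]) -> List[str]:
    return []                                          # Step 2: RxNav interaction list

def get_local_context(query: str) -> List[str]:
    return ["Patient A: history of stomach ulcers."]   # Step 3: ChromaDB

def generate_safety_report(text: str, inter: List[str], ctx: List[str]) -> str:
    return "WARNING: NSAID with ulcer history."        # Step 4: LLM

def run_pipeline(img_path: str) -> str:
    """Chain the stages exactly as in the diagram above."""
    text = get_drug_names(img_path)
    rxcuis = map_to_rxcuis(text)
    interactions = check_interactions(rxcuis)
    context = get_local_context(text)
    return generate_safety_report(text, interactions, context)

print(run_pipeline("advil_box.jpg"))
```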
Prerequisites
| Component | Why we need it |
|---|---|
| PaddleOCR | Ultra‑fast, accurate OCR that handles tilted text and diverse fonts on medicine packaging. |
| ChromaDB | Lightweight vector store for local drug manuals, hospital guidelines, or personal health records. |
| RxNav API | Gold‑standard source for drug interaction data (National Library of Medicine). |
| RAGas | Toolkit to evaluate whether the RAG pipeline is hallucinating. |
Make sure you have Python 3.9+ and the following packages installed (PaddleOCR also needs the `paddlepaddle` runtime, which is not pulled in automatically):

```shell
pip install paddlepaddle paddleocr chromadb ragas datasets requests
```
Step 1 – Extracting Ingredients with PaddleOCR
```python
from paddleocr import PaddleOCR

# Initialise the OCR engine (angle classification enabled for rotated text)
ocr = PaddleOCR(use_angle_cls=True, lang='en')

def get_drug_names(img_path: str) -> str:
    """
    Perform OCR on the image and return a single string with all detected text.
    """
    result = ocr.ocr(img_path, cls=True)
    # Flatten the nested list and keep only the recognized text fragments
    raw_text = [line[1][0] for page in result for line in page]
    print(f"Detected Text: {raw_text}")
    return " ".join(raw_text)

# Example usage
# extracted_text = get_drug_names("advil_box.jpg")
```
Output example:

```
Detected Text: ['Advil', 'Ibuprofen', '200', 'mg', 'Take', '1', 'tablet', 'every', '4-6', 'hours']
```
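The raw token list still mixes drug names with dosage text (the "Entity Extraction" box in the diagram). A production system would use a medical NER model here; purely as a sketch, a crude regex heuristic that separates the two might look like:

```python
import re
from typing import List, Tuple

# Tokens that look numeric, like units, or like dosing instructions count as "dosage".
DOSAGE_PATTERN = re.compile(
    r"^\d+([.-]\d+)?$|^(mg|ml|mcg|g|tablet|tablets|hours|every|take)$",
    re.IGNORECASE,
)

def split_entities(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Crude heuristic: dosage-looking tokens go one way, everything else is a name candidate."""
    names, dosage = [], []
    for tok in tokens:
        (dosage if DOSAGE_PATTERN.match(tok) else names).append(tok)
    return names, dosage

tokens = ['Advil', 'Ibuprofen', '200', 'mg', 'Take', '1', 'tablet', 'every', '4-6', 'hours']
names, dosage = split_entities(tokens)
print(names)   # candidate drug names to send to RxNav
print(dosage)  # dosage/instruction text for the guideline lookup
```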
Step 2 – Querying the RxNav API for Interactions
First, map the extracted drug names to RxNorm Concept Unique Identifiers (RXCUIs) (you can use the RxNav “approximateTerm” endpoint for this). Then request interaction data:
```python
import requests
from typing import List

def check_interactions(rxcuis: List[str]) -> List[str]:
    """
    Query RxNav for interactions among the supplied RxCUIs.
    Returns a list of interaction descriptions.
    """
    ids = "+".join(rxcuis)
    url = f"https://rxnav.nlm.nih.gov/REST/interaction/list.json?rxcuis={ids}"
    response = requests.get(url, timeout=10).json()
    interactions = []
    if "fullInteractionTypeGroup" in response:
        for group in response["fullInteractionTypeGroup"]:
            for item in group["fullInteractionType"]:
                # Each interactionPair may contain multiple descriptions; we take the first.
                interactions.append(item["interactionPair"][0]["description"])
    return interactions

# Example:
# rxcuis = ["5640"]  # RxCUI for Ibuprofen
# interactions = check_interactions(rxcuis)
```
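The name→RxCUI mapping mentioned above can use RxNav's `approximateTerm` endpoint. The sketch below assumes the response shape RxNav documents (an `approximateGroup` holding a ranked `candidate` list) and keeps the parsing separate from the network call so it can be tested on a canned payload:

```python
import requests
from typing import Optional

def parse_rxcui(payload: dict) -> Optional[str]:
    """Pull the top-ranked RxCUI out of an approximateTerm response."""
    candidates = payload.get("approximateGroup", {}).get("candidate", [])
    return candidates[0]["rxcui"] if candidates else None

def name_to_rxcui(drug_name: str) -> Optional[str]:
    """Map a free-text drug name (e.g. from OCR) to its RxCUI via RxNav."""
    url = "https://rxnav.nlm.nih.gov/REST/approximateTerm.json"
    resp = requests.get(url, params={"term": drug_name, "maxEntries": 1}, timeout=10)
    return parse_rxcui(resp.json())

# Parsing works on a canned response without touching the network:
sample = {"approximateGroup": {"candidate": [{"rxcui": "5640", "score": "100"}]}}
print(parse_rxcui(sample))  # prints 5640
```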
Step 3 – Augmenting with Local Context (ChromaDB)
The public API may miss institution‑specific guidelines or personal health history. Store such nuances in a vector store and retrieve the most relevant snippet at query time.
```python
import chromadb
from typing import List

# Initialise Chroma client (in-memory for the demo)
client = chromadb.Client()
collection = client.create_collection(name="medical_guidelines")

# Add a few example documents
collection.add(
    documents=[
        "Patient A has a history of stomach ulcers. Avoid NSAIDs like Ibuprofen.",
        "Guideline: Do not combine antihistamines with MAO inhibitors."
    ],
    metadatas=[
        {"source": "electronic_health_record"},
        {"source": "hospital_policy"}
    ],
    ids=["rec1", "rec2"]
)

def get_local_context(query: str, n_results: int = 1) -> List[str]:
    """
    Retrieve the most relevant local documents for the given query.
    """
    results = collection.query(query_texts=[query], n_results=n_results)
    return results['documents'][0]  # List of document strings for the first query

# Example:
# context = get_local_context("Ibuprofen ulcer")
```
The “Official” Way to Build AI Agents
While this tutorial is a solid starting point for a Learning‑in‑Public project, production‑grade AI healthcare tools demand:
- Rigorous prompt engineering and chain‑of‑thought reasoning.
- Strict data‑privacy safeguards (HIPAA, GDPR).
- Monitoring, logging, and model‑version control.
For deeper dives into Agentic RAG, Production‑ready Multimodal Pipelines, and compliance best practices, see the articles on the WellAlly Tech Blog.
Step 4 – Generating a Safety Report with an LLM
Combine OCR output, interaction data, and local context into a concise, user‑friendly message.
```python
from typing import List

def generate_safety_report(ocr_text: str,
                           interactions: List[str],
                           context: List[str]) -> str:
    """
    Build a prompt for the LLM and return the generated safety report.
    """
    prompt = f"""
    User scanned a medicine label: "{ocr_text}"
    Known clinical interactions: {interactions}
    Personal health context: {context}

    Provide a short, plain-language report that tells the user whether the medication
    is safe to take, or if a warning is needed. Use the format:
    "SAFE: ..." or "WARNING: ..."
    """
    # Replace the following line with your LLM call (e.g., OpenAI, Anthropic, etc.)
    # response = llm.complete(prompt)
    # For demo purposes we return a hard-coded warning:
    return ("WARNING: You are taking Advil (Ibuprofen) while having a history of "
            "stomach ulcers. Consult a doctor before use.")

# Example:
# report = generate_safety_report(extracted_text, interactions, context)
# print(report)
```
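The hard-coded return above is where a real model call goes. One way to wire it, assuming the OpenAI Python SDK (any chat-completion-style client works the same way, and the model name is purely illustrative):

```python
from typing import List

def build_prompt(ocr_text: str, interactions: List[str], context: List[str]) -> str:
    """Assemble the same prompt as generate_safety_report, as a testable helper."""
    return (
        f'User scanned a medicine label: "{ocr_text}"\n'
        f"Known clinical interactions: {interactions}\n"
        f"Personal health context: {context}\n\n"
        'Reply with "SAFE: ..." or "WARNING: ..." in plain language.'
    )

def generate_safety_report_llm(ocr_text: str,
                               interactions: List[str],
                               context: List[str]) -> str:
    prompt = build_prompt(ocr_text, interactions, context)
    # Assumes the OpenAI SDK; swap in any provider's chat API here.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Keeping `build_prompt` separate means the prompt template can be unit-tested without an API key.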
Step 5 – Evaluating with RAGas
RAGas helps you measure faithfulness (does the answer stay true to the source?) and answer relevance (does it address the user’s question?).
```python
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy
from datasets import Dataset

# Generate the report from the earlier pipeline outputs
generated_report = generate_safety_report(extracted_text, interactions, context)

# Build a tiny evaluation dataset
data_samples = {
    "question": ["Can I take Advil with my current meds?"],
    "answer": [generated_report],
    "contexts": [[f"{interactions} {context}"]],
    "ground_truth": ["WARNING: Ibuprofen conflicts with ulcer history. Consult a physician."]
}
eval_dataset = Dataset.from_dict(data_samples)

# Run the RAGas evaluation (you may need to configure the LLM and embedding models)
metrics = evaluate(eval_dataset, metrics=[faithfulness, answer_relevancy])
print(metrics)
```
The resulting scores tell you whether your pipeline is hallucinating or staying grounded in the retrieved evidence.
🎉 You’ve built a multimodal RAG system that:
- Reads medication labels from a photo.
- Extracts drug names via OCR.
- Looks up interactions through RxNav.
- Enriches the answer with local, patient‑specific context stored in ChromaDB.
- Generates a concise safety report with an LLM.
- Validates the output using RAGas.
Feel free to extend the system with:
- Batch processing for multiple pills at once.
- Voice‑assistant integration (e.g., Alexa, Google Assistant).
- Secure storage of personal health records (encrypted, on‑device).
Happy hacking, and stay safe! 🚀
Conclusion: The Future of Health‑Tech
By combining PaddleOCR for vision, RxNav for authoritative interaction data, and ChromaDB for personalized context, we've built a tool that can catch dangerous medication mistakes before they happen. Multimodal RAG is moving fast, and this is just the tip of the iceberg!
What’s next?
- Try adding a pill identification feature using a CNN.
- Integrate voice‑to‑text so users can ask questions hands‑free.
If you enjoyed this build, drop a comment below or 🦄 heart this post! And don’t forget to visit WellAlly Tech for more high‑level AI tutorials.
Happy coding!