Medical Encyclopedia 2.0: Stop Guessing, Start Scanning with Multimodal RAG

Published: 2 months ago (February 12, 2026, 09:15 GMT+8)

7 min read

Original article: Dev.to


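Architecture Overview

Logical flow:

The user uploads a photo of a medicine label.
PaddleOCR extracts the text.
Entity extraction separates drug names from dosage information.
The RxNav API checks for drug-drug interactions.
ChromaDB retrieves local guidelines and personal health context.
The LLM reasoning engine synthesizes a readable safety report.
RAGas evaluates the faithfulness and relevance of the answer.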

graph TD
    A[User Uploads Photo] --> B[PaddleOCR: Text Extraction]
    B --> C{Entity Extraction}
    C -->|Drug Names| D[RxNav API: Interaction Check]
    C -->|Dosage Info| E[ChromaDB: Manuals/Guidelines]
    D --> F[LLM Reasoning Engine]
    E --> F
    F --> G[Final Response: Safety Advice]
    G --> H[Evaluation: RAGas]
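To make the data flow in the diagram concrete, here is a minimal orchestration sketch. It assumes each stage is supplied as a plain Python callable; the function name `run_pipeline` and its parameter names are hypothetical and not part of the original article. It only shows the order in which results are passed along.

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(
    image_path: str,
    ocr: Callable[[str], str],
    extract_entities: Callable[[str], Tuple[List[str], List[str]]],
    check_interactions: Callable[[List[str]], List[str]],
    retrieve_context: Callable[[str], List[str]],
    generate_report: Callable[[str, List[str], List[str]], str],
) -> Dict[str, object]:
    """Run the stages of the diagram in order and return every intermediate result."""
    text = ocr(image_path)                              # A -> B: photo to raw text
    drug_names, dosage_info = extract_entities(text)    # B -> C: split names and dosages
    interactions = check_interactions(drug_names)       # C -> D: names go to the interaction check
    # C -> E: this sketch queries the local store with names and dosages together.
    context = retrieve_context(" ".join(drug_names + dosage_info))
    report = generate_report(text, interactions, context)  # D + E -> F -> G
    return {
        "text": text,
        "drug_names": drug_names,
        "dosage_info": dosage_info,
        "interactions": interactions,
        "context": context,
        "report": report,
    }
```

Passing the stages in as arguments keeps each one swappable, so the OCR engine or the vector store can be replaced with a stub while testing the rest.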

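Prerequisites

Component	Why you need it
PaddleOCR	Fast, accurate OCR that handles skewed text and mixed fonts on medicine packaging.
ChromaDB	Lightweight vector store for local drug manuals, hospital guidelines, or personal health records.
RxNav API	Gold-standard source of drug interaction data (U.S. National Library of Medicine).
RAGas	Toolkit for evaluating whether a RAG pipeline hallucinates.

Make sure you are on Python 3.9+ and have the following packages installed: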

pip install paddleocr chromadb ragas datasets requests

Step 1 – Extract Ingredients with PaddleOCR

from paddleocr import PaddleOCR

# Initialise the OCR engine (angle classification enabled for rotated text)
ocr = PaddleOCR(use_angle_cls=True, lang='en')

def get_drug_names(img_path: str) -> str:
    """
    Perform OCR on the image and return a single string with all detected text.
    """
    result = ocr.ocr(img_path, cls=True)
    # Flatten the nested list and keep only the recognized text fragments.
    # A page with no detected text can come back as None, so skip those pages.
    raw_text = [line[1][0] for page in result if page for line in page]
    print(f"Detected Text: {raw_text}")
    return " ".join(raw_text)

# Example usage
# extracted_text = get_drug_names("advil_box.jpg")

Example output:

Detected Text: ['Advil', 'Ibuprofen', '200', 'mg', 'Take', '1', 'tablet', 'every', '4-6', 'hours']
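The architecture lists an entity-extraction stage that separates drug names from dosage information, but the article shows no code for it. The sketch below is one simple way to do that split on the OCR tokens. It is a rule-based heuristic: the function name `split_entities`, the unit list, and the instruction-word list are all assumptions made for illustration, and a real system would more likely use a medical NER model or a drug-name lookup.

```python
import re
from typing import Dict, List

# Hypothetical vocabularies, chosen only for this illustration.
UNITS = {"mg", "mcg", "g", "ml", "iu"}
INSTRUCTION_WORDS = {
    "take", "tablet", "tablets", "capsule", "capsules",
    "every", "hour", "hours", "daily", "with", "food",
}


def split_entities(tokens: List[str]) -> Dict[str, List[str]]:
    """Split OCR tokens into candidate drug names and dosage strings."""
    names: List[str] = []
    dosages: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        # Case 1: a number followed by a separate unit token, e.g. "200", "mg".
        if re.fullmatch(r"\d+(\.\d+)?", tok) and nxt in UNITS:
            dosages.append(f"{tok} {tokens[i + 1]}")
            i += 2
            continue
        # Case 2: number and unit fused into one token, e.g. "200mg".
        fused = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]+)", tok)
        if fused and fused.group(2).lower() in UNITS:
            dosages.append(f"{fused.group(1)} {fused.group(2)}")
        # Case 3: a capitalized word that is not a known instruction word.
        elif tok.isalpha() and tok[0].isupper() and tok.lower() not in INSTRUCTION_WORDS:
            names.append(tok)
        i += 1
    return {"drug_names": names, "dosages": dosages}


tokens = ["Advil", "Ibuprofen", "200", "mg", "Take", "1", "tablet", "every", "4-6", "hours"]
print(split_entities(tokens))
# {'drug_names': ['Advil', 'Ibuprofen'], 'dosages': ['200 mg']}
```

The capitalization rule is fragile on all-caps packaging, which is one reason to treat this as a placeholder for a proper extraction step.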

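Step 2 – Query the RxNav API for Interactions

First, map the extracted drug names to RxNorm concept unique identifiers (RXCUIs); RxNav's "approximateTerm" endpoint can do this. Then request the interaction data for those RXCUIs.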
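The article does not include code for this step, so the following is a sketch under stated assumptions rather than the author's implementation. It uses only the standard library (the article installs `requests`, which would work equally well). The helper names are hypothetical. The `approximateTerm.json` URL and its `term` / `maxEntries` parameters follow RxNav's public REST conventions; the nested field names in `parse_interactions` are assumed from the shape RxNav interaction responses have used, and the interaction request itself is left out because the article does not show it. Confirm both the endpoint and its availability against the current RxNav documentation.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

RXNAV_BASE = "https://rxnav.nlm.nih.gov/REST"


def approximate_term_url(name: str, max_entries: int = 1) -> str:
    """Build the approximateTerm lookup URL for a drug name."""
    query = urllib.parse.urlencode({"term": name, "maxEntries": max_entries})
    return f"{RXNAV_BASE}/approximateTerm.json?{query}"


def parse_rxcui(payload: dict) -> Optional[str]:
    """Return the first candidate's RXCUI, or None if there is no match."""
    candidates = payload.get("approximateGroup", {}).get("candidate") or []
    return candidates[0].get("rxcui") if candidates else None


def parse_interactions(payload: dict) -> List[str]:
    """Collect interaction descriptions from an interaction response (assumed shape)."""
    descriptions: List[str] = []
    for group in payload.get("fullInteractionTypeGroup", []):
        for interaction_type in group.get("fullInteractionType", []):
            for pair in interaction_type.get("interactionPair", []):
                if "description" in pair:
                    descriptions.append(pair["description"])
    return descriptions


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def get_rxcui(name: str) -> Optional[str]:
    """Look up the best-matching RXCUI for a drug name (performs a network call)."""
    return parse_rxcui(fetch_json(approximate_term_url(name)))
```

Keeping URL building and response parsing separate from the network call makes the parsing logic testable with saved or hand-written payloads.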

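Step 3 – Enrich with Local Context (ChromaDB)

Public APIs can miss institution-specific guidelines or a patient's personal health history. Store these nuances in a vector store and retrieve the most relevant snippets at query time.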

import chromadb
from typing import List

# Initialise Chroma client (in‑memory for the demo)
client = chromadb.Client()
collection = client.create_collection(name="medical_guidelines")

# Add a few example documents
collection.add(
    documents=[
        "Patient A has a history of stomach ulcers. Avoid NSAIDs like Ibuprofen.",
        "Guideline: Do not combine antihistamines with MAO inhibitors."
    ],
    metadatas=[
        {"source": "electronic_health_record"},
        {"source": "hospital_policy"}
    ],
    ids=["rec1", "rec2"]
)

def get_local_context(query: str, n_results: int = 1) -> List[str]:
    """
    Retrieve the most relevant local documents for the given query.
    """
    results = collection.query(query_texts=[query], n_results=n_results)
    return results['documents'][0]  # Returns a list of strings

# Example:
# context = get_local_context("Ibuprofen ulcer")

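The "Official" Way to Build AI Agents

This tutorial is a solid starting point for a Learning‑in‑Public project, but a production-grade AI healthcare tool also needs:

Rigorous prompt engineering and chain-of-thought reasoning.
Strict data privacy protection (HIPAA, GDPR).
Monitoring, logging, and model versioning.

For a deeper look at Agentic RAG, production-ready multimodal pipelines, and compliance best practices, see the related articles on the WellAlly Tech Blog.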

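Step 4 – Generate the Safety Report with a Large Language Model

Combine the OCR output, the interaction data, and the local context into a concise, user-friendly message.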

def generate_safety_report(ocr_text: str,
                          interactions: List[str],
                          context: List[str]) -> str:
    """
    Build a prompt for the LLM and return the generated safety report.
    """
    prompt = f"""
    User scanned a medicine label: "{ocr_text}"
    Known clinical interactions: {interactions}
    Personal health context: {context}

    Provide a short, plain‑language report that tells the user whether the medication is safe
    to take, or if a warning is needed. Use the format:
    "SAFE: ..." or "WARNING: ..."
    """
    # Replace the following line with your LLM call (e.g., OpenAI, Anthropic, etc.)
    # response = llm.complete(prompt)
    # For demo purposes we return a hard‑coded warning:
    return "WARNING: You are taking Advil (Ibuprofen) while having a history of stomach ulcers. Consult a doctor before use."

# Example:
# report = generate_safety_report(extracted_text, interactions, context)
# print(report)
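The prompt asks the model to answer in a "SAFE: ..." or "WARNING: ..." format, but a model will not always comply. A small parser, sketched below, can check the format before the report is shown to the user. The function name `parse_report` and the "UNKNOWN" fallback are assumptions added for illustration.

```python
from typing import Tuple


def parse_report(text: str) -> Tuple[str, str]:
    """Split a report into (status, message); status is SAFE, WARNING, or UNKNOWN."""
    cleaned = text.strip()
    for status in ("SAFE", "WARNING"):
        prefix = f"{status}:"
        # Compare case-insensitively so "Warning: ..." is still recognized.
        if cleaned.upper().startswith(prefix):
            return status, cleaned[len(prefix):].strip()
    # Anything that does not follow the requested format is flagged, not trusted.
    return "UNKNOWN", cleaned


status, message = parse_report("WARNING: Consult a doctor before use.")
print(status)   # WARNING
print(message)  # Consult a doctor before use.
```

Treating an unrecognized format as "UNKNOWN" rather than "SAFE" is the conservative choice for a medication tool.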

Step 5 – Evaluate with RAGas

RAGas helps you measure faithfulness (does the answer stay true to its sources?) and answer relevance (does the answer address the user's question?).

from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy
from datasets import Dataset

# Assume `generated_report` is the string returned by `generate_safety_report`
generated_report = generate_safety_report(extracted_text, interactions, context)

# Build a tiny evaluation dataset
data_samples = {
    "question": ["Can I take Advil with my current meds?"],
    "answer": [generated_report],
    "contexts": [[f"{interactions} {context}"]],
    "ground_truth": ["WARNING: Ibuprofen conflicts with ulcer history. Consult a physician."]
}

eval_dataset = Dataset.from_dict(data_samples)

# Run RAGas evaluation (you may need to configure the LLM and embedding models).
# `evaluate` expects metric objects rather than strings; these names follow ragas 0.1.x.
metrics = evaluate(eval_dataset, metrics=[faithfulness, answer_relevancy])
print(metrics)

The resulting scores tell you whether the pipeline is hallucinating or still grounded in the retrieved evidence.
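One way to act on the scores is to compare them against minimum values and hold back any report that falls short. The sketch below assumes the scores have already been copied into a plain dict; the function name `low_scoring_metrics`, the key names, and the threshold values are hypothetical and should be replaced with whatever your evaluation run reports and whatever cut-offs you validate yourself.

```python
from typing import Dict, List


def low_scoring_metrics(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the names of metrics that are missing or below their threshold."""
    return sorted(
        name
        for name, minimum in thresholds.items()
        # A metric that was not reported counts as a failure (score 0.0).
        if scores.get(name, 0.0) < minimum
    )


# Illustrative numbers only; they are not results from the article.
thresholds = {"faithfulness": 0.8, "answer_relevancy": 0.7}
scores = {"faithfulness": 0.65, "answer_relevancy": 0.9}
print(low_scoring_metrics(scores, thresholds))  # ['faithfulness']
```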

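🎉 You have built a multimodal RAG system that can:

Read a medicine label from a photo.
Extract drug names via OCR.
Look up interactions with RxNav.
Enrich the answer with local, patient-specific context stored in ChromaDB.
Generate a concise safety report with a large language model.
Validate the output with RAGas.

You can extend the system further, for example:

Batch processing: handle several pills at once.
Voice assistant integration (e.g., Alexa, Google Assistant).
Secure storage for personal health records (encrypted, on-device).

Have fun, and stay safe! 🚀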


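Conclusion: The Future of Health Tech

By combining PaddleOCR for vision, RxNav for medical ground truth, and ChromaDB for personalized context, we have built a powerful tool with real life-saving potential. Multimodal RAG is evolving fast, and this is only the tip of the iceberg!

What's Next?

Try adding pill identification with a CNN.
Integrate speech-to-text so users can ask questions hands-free.

If you enjoyed this project, leave a comment below or 🦄 like this article! And don't forget to visit WellAlly Tech for more advanced AI tutorials.

Happy coding!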
