Pill-ID: Building a Visual RAG System for Medication Safety with CLIP and Milvus

Published: April 7, 2026, 09:30 AM GMT+9

5 min read

Original: Dev.to

Introduction

Medication errors are a silent crisis, but with Visual Retrieval-Augmented Generation (Visual RAG) and multimodal AI we can build systems that “see” and verify medication in real time. In this tutorial we’ll build Pill-ID, a cross-check system that uses computer vision and a vector database to identify pills from a photo and verify them against an electronic prescription. We’ll leverage:

CLIP for multimodal embeddings
Milvus for high‑speed similarity search
FastAPI to expose the verification API

Unlike traditional RAG, which focuses on text, Visual RAG lets us query a database using image features—searching for a vector that represents the shape, color, and texture of a specific pill.
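
To make the idea concrete, here is a minimal sketch of what a vector lookup does, using tiny made-up 3-dimensional vectors in place of real 512-dimensional CLIP embeddings. The registry contents and the `nearest_pill` helper are hypothetical; a vector database such as Milvus performs the same kind of nearest-neighbor search, only at scale and with an index.

```python
import math

# Hypothetical toy registry: pill name -> made-up 3-D "embedding".
# Real CLIP ViT-B/32 embeddings have 512 dimensions.
TOY_REGISTRY = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.1, 0.8, 0.2],
    "metformin": [0.0, 0.2, 0.9],
}

def nearest_pill(query, registry):
    """Return (name, distance) of the registry vector closest to query (L2)."""
    best_name = min(registry, key=lambda name: math.dist(query, registry[name]))
    return best_name, math.dist(query, registry[best_name])

# A query vector that sits close to the "aspirin" entry.
name, distance = nearest_pill([0.85, 0.15, 0.05], TOY_REGISTRY)
print(name, round(distance, 3))
```

The returned distance matters as much as the name: a small distance suggests a close visual match, while a large one means the "nearest" pill may still be a poor match.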

graph TD
    A[User Takes Photo] --> B[OpenCV: Image Preprocessing]
    B --> C[CLIP: Generate Image Embedding]
    C --> D[Milvus: Vector Similarity Search]
    D --> E[Retrieve Metadata: Pill Name, Dosage]
    E --> F[FastAPI: Cross‑check with Prescription]
    F --> G{Match Found?}
    G -- Yes --> H[🚀 Safety Verified]
    G -- No --> I[⚠️ Warning: Dosage Mismatch]

Tech Stack

Python 3.9+
OpenCV – image manipulation
CLIP (OpenAI) – the bridge between text and images
Milvus – high-performance vector database
FastAPI – high-speed API framework

Setting Up Milvus (The Visual Fingerprint)

Milvus stores the 512-dimensional vectors generated by CLIP.

# pymilvus setup
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection

# Connect to Milvus
connections.connect("default", host="localhost", port="19530")

# Define schema: primary key, image vector, and metadata
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="pill_vector", dtype=DataType.FLOAT_VECTOR, dim=512),
    FieldSchema(name="pill_name", dtype=DataType.VARCHAR, max_length=200),
    FieldSchema(name="dosage_mg", dtype=DataType.INT64)
]

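schema = CollectionSchema(fields, "Pill identification collection")
pill_collection = Collection("pill_registry", schema)

# Milvus needs an index on the vector field, and the collection must be loaded
# into memory, before search() will work. IVF_FLAT with nlist=128 is an assumed
# starting point; it matches the L2 metric and nprobe setting used at query time.
pill_collection.create_index(
    field_name="pill_vector",
    index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}},
)
pill_collection.load()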
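
Once the collection exists, reference pills have to be inserted before anything can be searched. The sketch below assumes each reference pill is a dict with `vector`, `name`, and `dosage_mg` keys; that record layout and both helper names are hypothetical. Because `pk` is declared with `auto_id=True`, it is left out and the remaining columns are passed in schema order.

```python
def build_insert_columns(records):
    """Turn row-style records into column-style data in schema order.

    Column order follows the schema minus the auto-generated `pk`:
    pill_vector, pill_name, dosage_mg.
    """
    vectors, names, dosages = [], [], []
    for record in records:
        if len(record["vector"]) != 512:
            raise ValueError(f"{record['name']}: expected a 512-dimensional vector")
        vectors.append(list(record["vector"]))
        names.append(record["name"])
        dosages.append(int(record["dosage_mg"]))
    return [vectors, names, dosages]

def insert_pills(collection, records):
    """Insert records into a pymilvus Collection and flush them to storage."""
    result = collection.insert(build_insert_columns(records))
    collection.flush()
    return result
```

Checking the vector length before inserting catches a mismatched embedding model early, instead of surfacing as a schema error from the database.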

Generating Image Embeddings with CLIP

We use the sentence‑transformers wrapper for CLIP (ViT‑B‑32).

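import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

# Load the CLIP model
model = SentenceTransformer('clip-ViT-B-32')

def get_image_embedding(image):
    """Encode an image (file path or OpenCV BGR array) into a 512-dimensional vector."""
    if isinstance(image, np.ndarray):
        # OpenCV arrays are BGR; reverse the channel order to get RGB for PIL
        img = Image.fromarray(np.ascontiguousarray(image[:, :, ::-1]))
    else:
        img = Image.open(image).convert("RGB")
    embedding = model.encode(img)
    return embedding.tolist()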

Preprocessing Pill Images (with OpenCV)

Raw photos often contain background noise. This function crops the image to the pill's bounding box, cutting away most of the background.

import cv2
import numpy as np

def preprocess_pill_image(image_bytes):
    """Return a cropped image focusing on the pill."""
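    nparr = np.frombuffer(image_bytes, np.uint8)
    img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    if img is None:
        # imdecode returns None when the bytes are not a decodable image
        raise ValueError("Could not decode image bytes")

    # Convert to grayscale and apply an inverse threshold
    # (assumes the pill is darker than a light background)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)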

    # Find the largest contour (assumed to be the pill)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        c = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(c)
        cropped_pill = img[y:y+h, x:x+w]
        return cropped_pill
    return img
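
The threshold-and-crop idea can be checked without a camera. The following is a NumPy-only sketch, not the OpenCV function above: it skips contour detection and simply crops to the bounding box of all pixels at or below the threshold, on a synthetic grayscale image. The helper name `crop_dark_region` and the synthetic image are illustrative assumptions.

```python
import numpy as np

def crop_dark_region(gray, threshold=200):
    """Crop a grayscale image to the bounding box of pixels at or below threshold.

    Simplification: every such pixel counts as foreground, whereas the
    OpenCV version keeps only the largest contour.
    """
    mask = gray <= threshold
    if not mask.any():
        return gray  # nothing darker than the background; return unchanged
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return gray[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Synthetic 100x100 white image with a dark 20x30 "pill" in it.
canvas = np.full((100, 100), 255, dtype=np.uint8)
canvas[40:60, 30:60] = 50
cropped = crop_dark_region(canvas)
print(cropped.shape)
```

Like the OpenCV version, this relies on a pill that is darker than its background; a white pill on a white table would need a different segmentation strategy.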

FastAPI Endpoint for Verification

The endpoint receives a prescription name and a photo, then performs the cross‑check.

from fastapi import FastAPI, UploadFile, File

app = FastAPI()

@app.post("/verify-medication/")
async def verify_medication(prescribed_name: str, file: UploadFile = File(...)):
    # 1. Preprocess and embed the image
    image_bytes = await file.read()
    processed_img = preprocess_pill_image(image_bytes)
    vector = get_image_embedding(processed_img)

    # 2. Search Milvus for the closest match
    search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
    results = pill_collection.search(
        data=[vector],
        anns_field="pill_vector",
        param=search_params,
        limit=1,
        output_fields=["pill_name"]
    )

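    # Guard against an empty result set (e.g. an empty registry)
    hits = results[0]
    if len(hits) == 0:
        return {"status": "WARNING", "message": "No matching pill found in the registry."}

    detected_name = hits[0].entity.get("pill_name")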

    # 3. Cross‑check logic
    if detected_name.lower() == prescribed_name.lower():
        return {"status": "MATCH", "message": f"Verified: {detected_name} identified."}
    else:
        return {
            "status": "WARNING",
            "message": f"Possible Mismatch! Found {detected_name} but prescription says {prescribed_name}."
        }
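
The endpoint above compares names only, although the schema also stores `dosage_mg`. As a sketch of how the cross-check could cover dosage too, the hypothetical helper below assumes the prescription supplies a dosage in milligrams and that `dosage_mg` was also requested via `output_fields`; neither is part of the original endpoint.

```python
def cross_check(detected_name, detected_dosage_mg, prescribed_name, prescribed_dosage_mg):
    """Compare a detected pill against a prescription by name, then by dosage."""
    if detected_name.strip().lower() != prescribed_name.strip().lower():
        return {
            "status": "WARNING",
            "message": f"Possible mismatch: found {detected_name}, "
                       f"prescription says {prescribed_name}.",
        }
    if detected_dosage_mg != prescribed_dosage_mg:
        return {
            "status": "WARNING",
            "message": f"Dosage mismatch: found {detected_dosage_mg} mg, "
                       f"prescription says {prescribed_dosage_mg} mg.",
        }
    return {"status": "MATCH", "message": f"Verified: {detected_name} {detected_dosage_mg} mg."}
```

Keeping the comparison in a plain function, separate from the FastAPI handler, makes it easy to unit test without a running Milvus instance.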

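Production Considerations

The proof of concept is straightforward, but a production-grade medical solution also requires:

Rigorous validation and confidence scoring
A robust data pipeline for continuously ingesting new pill images
Compliance with health-data regulations (e.g., HIPAA)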
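
Confidence scoring can start as something very simple: refuse to report a match when the nearest vector is too far away. The sketch below assumes the L2 distance of the top hit is available (pymilvus exposes it as `distance` on each hit) and uses a placeholder cutoff; `MAX_L2_DISTANCE` is a hypothetical value that would have to be calibrated on labeled pill images.

```python
# Hypothetical cutoff; calibrate on labeled data before relying on it.
MAX_L2_DISTANCE = 0.5

def score_match(detected_name, distance, prescribed_name, max_distance=MAX_L2_DISTANCE):
    """Classify a top-1 search hit as MATCH, MISMATCH, or UNCERTAIN."""
    if distance > max_distance:
        # The nearest registry entry is still far away: do not trust its label.
        return "UNCERTAIN"
    if detected_name.lower() == prescribed_name.lower():
        return "MATCH"
    return "MISMATCH"
```

A third "UNCERTAIN" outcome lets the system ask for a better photo or a human check instead of forcing a yes/no answer from a weak match.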

For deeper architectural patterns and details on cloud-native scaling of multimodal models, see the technical deep dives on the WellAlly Tech Blog.

What's Next?

Add support for blister-pack detection
Integrate with FHIR (Fast Healthcare Interoperability Resources) APIs to use real prescription data
Deploy the model with ONNX for faster edge inference
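
As a rough illustration of the FHIR item, the sketch below pulls a medication name out of a MedicationRequest resource that has already been parsed into a Python dict. It assumes an R4-style resource that carries the drug in `medicationCodeableConcept`; the helper name is hypothetical, and resources that use a `medicationReference` instead are not resolved here.

```python
def prescribed_name_from_fhir(medication_request):
    """Extract a medication name from an R4-style MedicationRequest dict.

    Returns None when the resource uses a medicationReference or carries no name.
    """
    concept = medication_request.get("medicationCodeableConcept") or {}
    if concept.get("text"):
        return concept["text"]
    # Fall back to the first coding that has a human-readable display string.
    for coding in concept.get("coding", []):
        if coding.get("display"):
            return coding["display"]
    return None

# Made-up example resource, trimmed to the fields this sketch reads.
example = {
    "resourceType": "MedicationRequest",
    "medicationCodeableConcept": {"text": "Aspirin"},
}
print(prescribed_name_from_fhir(example))
```

The extracted string could then be passed as `prescribed_name` to the verification endpoint, though real prescription text usually needs normalization before it can be compared with registry names.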

Have questions about vector search or the CLIP implementation? Leave a comment below!
