Beyond Simple Photos: Building a Pixel-Perfect Calorie Estimator with SAM and GPT-4o

Published: 3 months ago (January 28, 2026, 9:45 AM GMT+9)

13 min read

Original: Dev.to

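AI image models have improved markedly in recent years, so they can now go beyond recognizing what is in a photo and help solve real-world problems. This post walks step by step through combining the Segment Anything Model (SAM) with GPT-4o to estimate the calories in a meal from a single photo.

Key goals
1️⃣ Accurately separate the food regions from the image.
2️⃣ Estimate the food type and amount from the segmented regions.
3️⃣ Map the estimated amounts to a calorie database to compute the final calorie count.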

1. Project structure

calorie-estimator/
├─ data/
│   └─ calorie_db.csv          # calories per 100 g for each food
├─ notebooks/
│   └─ demo.ipynb              # end-to-end pipeline demo
├─ src/
│   ├─ sam_wrapper.py          # wrapper around SAM calls
│   ├─ gpt4o_prompt.py         # GPT-4o prompt template
│   └─ estimator.py            # final calorie calculation logic
└─ requirements.txt

2. Segmenting food regions with SAM

SAM is a zero-shot image segmentation model that can extract objects as masks without task-specific labeling. sam_wrapper.py wraps Meta's segment-anything library so it is simple to call.

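# src/sam_wrapper.py
import numpy as np
import torch
from segment_anything import SamPredictor, sam_model_registry

class SAMWrapper:
    def __init__(self, model_type="vit_h", checkpoint_path="sam_vit_h_4b8939.pth"):
        # Run on the GPU when one is available; the ViT-H checkpoint is slow on CPU.
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = sam_model_registry[model_type](checkpoint=checkpoint_path)
        self.model.to(device=device)
        self.predictor = SamPredictor(self.model)

    def segment(self, image, point_coords, point_labels):
        """
        image: numpy.ndarray (H, W, 3), RGB, uint8
        point_coords: [[x, y], ...]  # coordinates the user clicked
        point_labels: [1, 0, ...]    # 1 = foreground, 0 = background
        """
        self.predictor.set_image(image)
        # SamPredictor.predict expects NumPy arrays, not Python lists.
        masks, _, _ = self.predictor.predict(
            point_coords=np.asarray(point_coords, dtype=np.float32),
            point_labels=np.asarray(point_labels, dtype=np.int32),
            multimask_output=False,
        )
        return masks[0]   # (H, W) boolean mask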

Tip: Put the coordinates of clicks on the food in point_coords and set every entry of point_labels to 1 (foreground). To mark background explicitly, add points labeled 0.
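
The tip above can be sketched as a small helper that turns foreground and background clicks into the two NumPy arrays passed as point_coords and point_labels. The name build_point_prompts is hypothetical and not part of segment-anything; it only prepares the inputs.

```python
import numpy as np

def build_point_prompts(foreground, background=()):
    """Combine click coordinates into SAM-style point prompts.

    foreground / background: sequences of (x, y) pixel coordinates.
    Returns (point_coords, point_labels) with shapes (N, 2) and (N,).
    """
    foreground, background = list(foreground), list(background)
    points = foreground + background
    if not points:
        raise ValueError("at least one click is required")
    coords = np.asarray(points, dtype=np.float32).reshape(-1, 2)
    # 1 marks a foreground click, 0 marks a background click.
    labels = np.array([1] * len(foreground) + [0] * len(background), dtype=np.int32)
    return coords, labels

# Two clicks on the food and one on the table.
coords, labels = build_point_prompts([(250, 300), (270, 310)], [(20, 20)])
```

Keeping the labels in the same order as the coordinates is the only invariant that matters here: each row of coords is paired with the label at the same index.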

3. Asking GPT-4o "What is this food?"

Using the mask from SAM, crop the region and send the cropped image to GPT-4o together with a prompt. The prompt template is defined in gpt4o_prompt.py.

# src/gpt4o_prompt.py
def build_prompt(cropped_image_path):
    return f"""You are a nutrition expert. Identify the food item(s) in the attached image.
Provide the name of each food and an estimated weight in grams. 
If you are unsure, give a best‑guess range.

Respond in JSON format:
{{
  "items": [
    {{
      "name": "<food name>",
      "weight_g": <estimated weight>
    }},
    ...
  ]
}}"""

Key points

Requesting a JSON response makes parsing easy.
Asking for an estimated weight makes the calorie calculation possible.
Allowing a "best-guess range" lets GPT-4o express uncertainty.
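
Because the prompt allows a range, weight_g may come back as a number or as something like "180-220". A minimal sketch of a tolerant parser follows. The helper names parse_weight and parse_items are hypothetical, and taking the midpoint of a range is an assumption rather than something the prompt specifies.

```python
import json
import re

def parse_weight(value):
    """Return one weight in grams from a number, a [low, high] pair, or a string range."""
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, (list, tuple)) and len(value) == 2:
        return (float(value[0]) + float(value[1])) / 2
    # Pull the first one or two numbers out of strings such as "180-220" or "150 g".
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", str(value))][:2]
    if not numbers:
        raise ValueError(f"cannot parse weight: {value!r}")
    return sum(numbers) / len(numbers)

def parse_items(raw):
    """Extract the JSON object from a model reply and normalise each item."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start:end + 1])
    return [
        {"name": item["name"].strip().lower(), "weight_g": parse_weight(item["weight_g"])}
        for item in data["items"]
    ]

# A reply wrapped in a Markdown code fence, with one range and one plain number.
reply = '```json\n{"items": [{"name": "Rice", "weight_g": "180-220"}, {"name": "apple", "weight_g": 150}]}\n```'
items = parse_items(reply)
```

Slicing from the first "{" to the last "}" is a cheap way to tolerate a reply wrapped in a code fence or surrounded by a sentence of explanation.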

4. Mapping to the calorie database

calorie_db.csv holds average calories per 100 g, extracted from USDA FoodData Central.

name,calories_per_100g
apple,52
banana,89
chicken breast,165
rice,130
...

estimator.py reads the JSON returned by GPT-4o and computes (weight_g / 100) × calories_per_100g for each food.

# src/estimator.py
import pandas as pd
import json

class CalorieEstimator:
    def __init__(self, db_path="data/calorie_db.csv"):
        self.db = pd.read_csv(db_path).set_index("name")

    def estimate(self, gpt_response_json):
        data = json.loads(gpt_response_json)
        total_cal = 0
        details = []

        for item in data["items"]:
            name = item["name"].lower()
            weight = item["weight_g"]
            if name in self.db.index:
                cal_per_100g = self.db.loc[name, "calories_per_100g"]
                cal = (weight / 100) * cal_per_100g
                total_cal += cal
                details.append({"name": name, "weight_g": weight, "calories": cal})
            else:
                details.append({"name": name, "weight_g": weight, "calories": None})

        return {"total_calories": total_cal, "details": details}
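
The lookup above is an exact match on the lowercased name, so a reply such as "steamed rice" would not find the "rice" row and would be reported with calories of None. One possible fallback, sketched here under the assumption that whole-word containment is good enough, picks the longest database name that appears inside the model's name. The helper match_food_name is hypothetical and not part of estimator.py.

```python
def match_food_name(name, known_names):
    """Return the longest known name that appears as whole words in `name`, or None."""
    words = name.lower().split()
    best = None
    for candidate in known_names:
        cand_words = candidate.lower().split()
        n = len(cand_words)
        if n == 0:
            continue
        # Slide a window of n words over the name and compare word by word.
        found = any(words[i:i + n] == cand_words for i in range(len(words) - n + 1))
        if found and (best is None or len(candidate) > len(best)):
            best = candidate
    return best

known = ["apple", "banana", "chicken breast", "rice"]
matched = match_food_name("steamed rice", known)

# Worked example of the formula: 200 g of rice at 130 kcal per 100 g.
calories = 200 / 100 * 130
```

Matching on whole words avoids false hits such as "pineapple" matching "apple", at the cost of missing plurals and synonyms.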

5. Running the full pipeline (notebook example)

demo.ipynb calls the modules above in order to demonstrate the "one photo → calorie estimate" flow.

# notebooks/demo.ipynb (core code)
import base64

import cv2
import numpy as np
from openai import OpenAI

from src.sam_wrapper import SAMWrapper
from src.gpt4o_prompt import build_prompt
from src.estimator import CalorieEstimator

# 1️⃣ Load the image (OpenCV reads BGR; SAM expects RGB)
img = cv2.imread("sample_meal.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# 2️⃣ Generate a mask with SAM (e.g., one click near the center)
sam = SAMWrapper()
mask = sam.segment(img_rgb, point_coords=np.array([[250, 300]]), point_labels=np.array([1]))

# 3️⃣ Crop to the mask's bounding box
x, y, w, h = cv2.boundingRect(mask.astype('uint8'))
cropped = img_rgb[y:y+h, x:x+w]
cv2.imwrite("cropped_food.png", cv2.cvtColor(cropped, cv2.COLOR_RGB2BGR))

# 4️⃣ Send the prompt and image to GPT-4o
# The API cannot read file:// URLs, so the crop is sent as a base64 data URL.
with open("cropped_food.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = build_prompt("cropped_food.png")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": [{"type": "text", "text": prompt},
                                            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}}]}],
    response_format={"type": "json_object"},
    temperature=0.0,
)

gpt_json = response.choices[0].message.content
print("GPT-4o response:", gpt_json)

# 5️⃣ Compute calories
estimator = CalorieEstimator()
result = estimator.estimate(gpt_json)
print(f"Total estimated calories: {result['total_calories']:.1f} kcal")
print("Details:", result["details"])

Example result

GPT-4o response: {
  "items": [
    {"name": "chicken breast", "weight_g": 150},
    {"name": "rice", "weight_g": 200}
  ]
}
Total estimated calories: 507.5 kcal
Details: [
  {"name": "chicken breast", "weight_g": 150, "calories": 247.5},
  {"name": "rice", "weight_g": 200, "calories": 260.0}
]

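6. Limitations and future improvements

Area	Current limitation	Improvement idea
Food recognition	GPT-4o is sensitive to image quality and can confuse composite dishes (e.g., pasta salad)	Introduce a fine-tuned multimodal model (e.g., CLIP + labeled data)
Weight estimation accuracy	Eyeballed "best-guess" approach → error of ±30%	Use image metrics (pixel density) plus the real size of tableware (cup, fork)
Calorie database	Limited items (≈500)	Integrate full public datasets (FAO, USDA) and let users add custom foods
Real-time usability	Currently notebook-based → hard to deploy	Turn it into a web service with FastAPI + Streamlit and integrate with mobile apps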
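
The weight-estimation idea in the table (pixel metrics plus a tableware item of known size) can be sketched as a pixel-to-centimeter conversion. This is an illustration under strong assumptions: the photo is taken roughly top-down, the reference object lies in the same plane as the food, and its pixel width has already been measured. The function mask_area_cm2 is hypothetical, and turning an area into grams would still need thickness and density estimates that the article does not provide.

```python
import numpy as np

def mask_area_cm2(mask, ref_width_px, ref_width_cm):
    """Convert a mask's pixel count to cm^2 using a reference object of known width."""
    if ref_width_px <= 0 or ref_width_cm <= 0:
        raise ValueError("reference measurements must be positive")
    px_per_cm = ref_width_px / ref_width_cm
    # Area scales with the square of the linear pixel density.
    return float(np.count_nonzero(mask)) / px_per_cm ** 2

# A 50 x 50 pixel food mask and a reference object that is 20 cm wide and spans 100 pixels.
mask = np.zeros((100, 100), dtype=bool)
mask[10:60, 20:70] = True
area = mask_area_cm2(mask, ref_width_px=100, ref_width_cm=20.0)
```

Here the density is 5 pixels per centimeter, so the 2,500 masked pixels correspond to 100 cm².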

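7. Wrapping up

Combining SAM and GPT-4o makes it easy to build a powerful "image → semantic understanding → quantitative estimate" pipeline. This example estimated calories from food photos, but the same structure can be applied to other domains such as medical image analysis, inventory management, and design feedback.

Key lessons
Combining zero-shot models appropriately lets you build practical AI services without labeling costs.
Prompt design and structured (JSON) responses are the key to connecting LLM output smoothly to an automated pipeline.

As newer multimodal models are released, this modular approach is likely to become a standard way to build AI solutions. Try it yourself with your own photos! 🚀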

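Introduction

We've all been there: staring at a delicious plate of pasta, trying to manually log every gram into a fitness app. It's tedious, prone to "optimistic" human error, and, frankly, it takes the joy out of eating. What if we could turn pixels directly into nutritional data?

In this tutorial, we'll build a Multimodal Dietary Analysis Engine by combining Meta's Segment Anything Model (SAM) with the reasoning capabilities of GPT-4o. The system isolates food items, estimates volume using reference-based scaling, and outputs a detailed nutritional breakdown.

Architecture overview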

graph TD
    A[User Uploads Image] --> B[OpenCV Preprocessing]
    B --> C[SAM: Segment Anything Model]
    C --> D{Mask Generation}
    D -->|Isolate Food| E[GPT-4o Multimodal Analysis]
    D -->|Reference Object| E
    E --> F[Nutritional Estimation Engine]
    F --> G[FastAPI Response: Calories, Macros, Confidence Score]

Required Stack

PyTorch – to run the SAM weights.
Segment Anything (SAM) – Meta's pretrained vision model.
GPT-4o API – the multimodal "brain".
FastAPI – to expose a production-ready microservice.
OpenCV – for image manipulation.

Segmenting food with SAM

import torch
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np

# Load the SAM model
sam_checkpoint = "sam_vit_h_4b8939.pth"
model_type = "vit_h"
device = "cuda" if torch.cuda.is_available() else "cpu"

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
predictor = SamPredictor(sam)

def get_food_segment(image_path):
    image = cv2.imread(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    # Simple prompt: center of the image
    input_point = np.array([[image.shape[1] // 2, image.shape[0] // 2]])
    input_label = np.array([1])

    masks, scores, logits = predictor.predict(
        point_coords=input_point,
        point_labels=input_label,
        multimask_output=True,
    )
    # With multimask_output=True, SAM returns three candidates; pick the highest-scoring one.
    return masks[np.argmax(scores)]

Nutritional analysis with GPT-4o

import base64
from openai import OpenAI

client = OpenAI()

def analyze_nutrition(image_path, mask_data):
    # NOTE: mask_data is not used yet, so the full, unhighlighted image is sent.
    # Encode image as base64
    with open(image_path, "rb") as f:
        base64_image = base64.b64encode(f.read()).decode('utf-8')

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
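            {
                "role": "system",
                # JSON mode requires the word "JSON" to appear somewhere in the messages.
                "content": "You are a professional nutritionist. Analyze the food in the segmented area. Use surrounding objects (forks, plates) to estimate volume. Respond with a JSON object."
            },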
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Estimate the calories and macronutrients for the food highlighted in this image."},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
                ]
            }
        ],
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content
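
The prompt above refers to "the food highlighted in this image", but analyze_nutrition accepts mask_data without using it. One way to make the highlight real, sketched here as an assumption rather than the author's method, is to darken everything outside the SAM mask before encoding the image. The helper highlight_mask is hypothetical.

```python
import numpy as np

def highlight_mask(image, mask, dim=0.3):
    """Return a copy of `image` with pixels outside `mask` darkened.

    image: uint8 array of shape (H, W, 3); mask: boolean array of shape (H, W).
    dim: brightness factor applied to the background (0 = black, 1 = unchanged).
    """
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != image.shape[:2]:
        raise ValueError("mask and image sizes differ")
    out = image.astype(np.float32)
    out[~mask] *= dim  # scale only the background pixels
    return out.clip(0, 255).astype(np.uint8)

# Tiny example: a uniform 4 x 4 image with a 2 x 2 masked region in the middle.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
highlighted = highlight_mask(image, mask, dim=0.5)
```

The result can then be converted to BGR, encoded with cv2.imencode(".jpg", ...), and base64-encoded in place of the raw file bytes. Dimming rather than blacking out the background keeps the forks and plates visible, which the system prompt relies on for scale.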

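FastAPI endpoint

import json
import os
import shutil
import tempfile

from fastapi import FastAPI, UploadFile, File

app = FastAPI()

# A plain def endpoint runs in FastAPI's thread pool, so the blocking
# SAM and GPT-4o calls do not stall the event loop.
@app.post("/analyze-meal")
def analyze_meal(file: UploadFile = File(...)):
    # 1. Save the upload to a temp file; never build a path from the client's filename
    suffix = os.path.splitext(file.filename or "")[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as buffer:
        shutil.copyfileobj(file.file, buffer)
        temp_path = buffer.name

    try:
        # 2. Run SAM segmentation
        mask = get_food_segment(temp_path)

        # 3. Call GPT-4o for nutritional analysis
        nutrition_data = analyze_nutrition(temp_path, mask)
    finally:
        os.remove(temp_path)  # clean up even if a step fails

    # GPT-4o returns a JSON string; parse it so the API returns an object
    return {"status": "success", "data": json.loads(nutrition_data)}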

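Production considerations

The code works for a hobby project, but a production-grade health app also needs:

Robust error handling (e.g., low-light images, overlapping foods).
Pydantic models for request/response validation.
A real-time feedback loop for user corrections.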
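
As an illustration of the Pydantic point in the list above, a reply could be validated before it reaches the client. The schema below is an assumption: it borrows the items, name, and weight_g fields from the prompt template earlier in this post, and the names FoodItem, MealAnalysis, and validate_reply are hypothetical.

```python
import json
from typing import List

from pydantic import BaseModel, Field, ValidationError

class FoodItem(BaseModel):
    name: str = Field(min_length=1)
    weight_g: float = Field(gt=0)  # reject zero or negative weights

class MealAnalysis(BaseModel):
    items: List[FoodItem]

def validate_reply(raw_json):
    """Return a MealAnalysis, or None if the reply does not match the schema."""
    try:
        return MealAnalysis(**json.loads(raw_json))
    except (ValidationError, json.JSONDecodeError, TypeError):
        return None

good = validate_reply('{"items": [{"name": "rice", "weight_g": 200}]}')
bad = validate_reply('{"items": [{"name": "rice", "weight_g": -5}]}')
```

Returning None keeps the sketch short; a real endpoint would more likely translate the failure into an HTTP error or ask the model again.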

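For deeper architectural patterns and visibility into AI for health tech, see the WellAlly Tech Blog (a great resource for production-ready AI health solutions).

Next steps

Add a reference object detection step (e.g., YOLOv8) to improve scaling accuracy.
Implement a feedback loop that lets users confirm or adjust the estimated portion sizes.

What are you building with multimodal AI? Share your project or ask questions in the comments!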