More Than Just a Photo: Building a Pixel-Level Calorie Estimator with SAM and GPT-4o

Published: January 28, 2026, 08:45 GMT+8

5 min read

Original: Dev.to

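Introduction

We've all been there: staring at a delicious plate of pasta, trying to manually log every gram into a fitness app. It's tedious, prone to "optimistic" human error, and, frankly, it ruins the joy of the meal. What if we could turn those pixels directly into nutritional data?

In this tutorial, we build a multimodal dietary analysis engine by combining Meta's Segment Anything Model (SAM) with the reasoning capabilities of GPT-4o. The system isolates food items, estimates volume using reference-based scaling, and outputs a detailed nutritional breakdown.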
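To make the idea of reference-based scaling concrete, here is a minimal sketch. It assumes an object of known real-world size (a fork, say) lies in the same plane as the food and that the photo is taken roughly from above. The function names `pixels_per_cm` and `food_area_cm2` are hypothetical helpers for illustration, not part of SAM or the GPT-4o API.

```python
def pixels_per_cm(reference_width_px: float, reference_width_cm: float) -> float:
    """Image scale derived from a reference object of known physical width."""
    if reference_width_px <= 0 or reference_width_cm <= 0:
        raise ValueError("reference measurements must be positive")
    return reference_width_px / reference_width_cm


def food_area_cm2(food_area_px: int, scale_px_per_cm: float) -> float:
    """Convert a mask's pixel count to square centimetres.

    Area scales with the square of the linear scale, hence the ** 2.
    """
    if scale_px_per_cm <= 0:
        raise ValueError("scale must be positive")
    return food_area_px / (scale_px_per_cm ** 2)


# Example: a 19 cm fork spans 380 px, and the food mask covers 40,000 px.
scale = pixels_per_cm(380, 19)        # 20 px per cm
area = food_area_cm2(40_000, scale)   # 100 cm^2
```

This yields only a 2D area. Turning it into a volume still needs a height or depth assumption, which in this tutorial is left to GPT-4o's judgment.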

Architecture Overview

graph TD
    A[User Uploads Image] --> B[OpenCV Preprocessing]
    B --> C[SAM: Segment Anything Model]
    C --> D{Mask Generation}
    D -->|Isolate Food| E[GPT-4o Multimodal Analysis]
    D -->|Reference Object| E
    E --> F[Nutritional Estimation Engine]
    F --> G[FastAPI Response: Calories, Macros, Confidence Score]

Required Stack

PyTorch – for running the SAM weights.
Segment Anything (SAM) – Meta's pretrained vision model.
GPT-4o API – the multimodal "brain".
FastAPI – for serving a production-ready microservice.
OpenCV – for image processing.

Food Segmentation with SAM

import torch
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np

# Load the SAM model
sam_checkpoint = "sam_vit_h_4b8939.pth"
model_type = "vit_h"
device = "cuda" if torch.cuda.is_available() else "cpu"

sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)
predictor = SamPredictor(sam)

def get_food_segment(image_path):
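    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on unreadable files
        raise ValueError(f"Could not read image: {image_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)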
    predictor.set_image(image)

    # Simple prompt: center of the image
    input_point = np.array([[image.shape[1] // 2, image.shape[0] // 2]])
    input_label = np.array([1])

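    # multimask_output=True returns several candidate masks; they are not sorted by score
    masks, scores, logits = predictor.predict(
        point_coords=input_point,
        point_labels=input_label,
        multimask_output=True,
    )
    return masks[np.argmax(scores)]  # Highest-scoring mask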
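As a rough follow-on to the segmentation step, the boolean mask can be reduced to a few numbers that later stages might use, such as the pixel area and bounding box of the food. This is a sketch under the assumption that the mask is a 2D boolean array of shape (height, width). `mask_stats` is a hypothetical helper, not part of the `segment_anything` package.

```python
import numpy as np


def mask_stats(mask):
    """Return pixel area, image fraction, and bounding box of a boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())
    if area == 0:
        # Nothing was segmented; callers should treat this as a failed segmentation
        return {"area_px": 0, "fraction": 0.0, "bbox": None}
    rows = np.flatnonzero(mask.any(axis=1))  # row indices containing mask pixels
    cols = np.flatnonzero(mask.any(axis=0))  # column indices containing mask pixels
    # Bounding box as (x_min, y_min, x_max, y_max), inclusive pixel coordinates
    bbox = (int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1]))
    return {"area_px": area, "fraction": area / mask.size, "bbox": bbox}
```

An empty mask, or one that covers nearly the whole frame, is a cheap signal that the center-point prompt missed the food.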

Nutritional Analysis with GPT-4o

import base64
from openai import OpenAI

client = OpenAI()

def analyze_nutrition(image_path, mask_data):
    # Note: mask_data is accepted but not used below; the full, unmasked image is sent
    # Encode image as base64
    with open(image_path, "rb") as f:
        base64_image = base64.b64encode(f.read()).decode('utf-8')

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
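                "role": "system",
                # JSON mode (response_format below) requires the word "JSON" in the messages
                "content": "You are a professional nutritionist. Analyze the food in the segmented area. Use surrounding objects (forks, plates) to estimate volume. Respond with a JSON object."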
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Estimate the calories and macronutrients for the food highlighted in this image."},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}
                ]
            }
        ],
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content
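The prompts refer to a "segmented" or "highlighted" region, yet the request carries the raw image. One possible way to make the SAM mask visible to GPT-4o is to dim everything outside it before encoding. Below is a minimal sketch, assuming the image is an HxWx3 `uint8` array and the mask is an HxW boolean array. `highlight_mask` and its `dim` parameter are hypothetical, introduced here for illustration.

```python
import numpy as np


def highlight_mask(image, mask, dim=0.35):
    """Dim every pixel outside `mask` so the segmented food stands out."""
    image = np.asarray(image, dtype=np.uint8)
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != image.shape[:2]:
        raise ValueError("mask must match the image height and width")
    out = image.astype(np.float32)
    out[~mask] *= dim  # darken the background, leave the food untouched
    return out.astype(np.uint8)
```

The returned array could then be converted back to BGR, JPEG-encoded with `cv2.imencode`, and base64-encoded in place of the raw file read inside `analyze_nutrition`. Dimming rather than blacking out the background keeps forks and plates visible, which the system prompt relies on for scale.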

FastAPI Endpoint

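import json
import os
import shutil
import tempfile

from fastapi import FastAPI, UploadFile, File

app = FastAPI()

# Plain `def` so FastAPI runs the blocking SAM and GPT-4o calls in its threadpool
@app.post("/analyze-meal")
def analyze_meal(file: UploadFile = File(...)):
    # 1. Save the upload to a uniquely named temp file (never trust the client filename)
    suffix = os.path.splitext(file.filename or "")[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as buffer:
        shutil.copyfileobj(file.file, buffer)
        temp_path = buffer.name

    try:
        # 2. Run SAM segmentation
        mask = get_food_segment(temp_path)

        # 3. Call GPT-4o for nutritional analysis
        nutrition_data = analyze_nutrition(temp_path, mask)
    finally:
        # Remove the temp file whether or not the analysis succeeded
        os.remove(temp_path)

    # analyze_nutrition returns a JSON string; parse it so the response is not double-encoded
    return {"status": "success", "data": json.loads(nutrition_data)}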

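Production Considerations

While this code works for a hobby project, a production-grade health app needs:

Robust error handling (e.g., low-light images, overlapping food).
Pydantic models for request/response validation.
Real-time feedback loops so users can correct the estimates.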
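As an illustration of the validation point, the sketch below checks GPT-4o's JSON reply against a Pydantic model before it reaches the client. The field names (`calories_kcal`, `protein_g`, and so on) are assumptions chosen to match the "Calories, Macros, Confidence Score" response in the architecture diagram. The actual keys depend on how the prompt is worded.

```python
import json

from pydantic import BaseModel, Field, ValidationError


class NutritionEstimate(BaseModel):
    """Hypothetical response schema; adjust the fields to match the prompt."""
    calories_kcal: float = Field(ge=0)
    protein_g: float = Field(ge=0)
    carbs_g: float = Field(ge=0)
    fat_g: float = Field(ge=0)
    confidence: float = Field(ge=0, le=1)


def parse_estimate(raw_json: str) -> NutritionEstimate:
    """Parse and validate the model's reply, raising ValueError on bad output."""
    try:
        return NutritionEstimate(**json.loads(raw_json))
    except (json.JSONDecodeError, ValidationError, TypeError) as exc:
        # TypeError covers replies that are valid JSON but not an object
        raise ValueError(f"Model returned an unusable estimate: {exc}") from exc
```

Rejecting negative calories or a missing field at this boundary is cheaper than letting a malformed estimate reach a user's food log.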

For deeper architectural patterns and AI observability in health tech, see the WellAlly Tech Blog, a great resource on production-ready AI health solutions.

Next Steps

Add a Reference Object Detection step (e.g., YOLOv8) to improve scaling accuracy.
Implement a feedback loop that lets users confirm or adjust the estimated portion size.
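The feedback-loop idea can be sketched as a simple rescaling step. It assumes nutrient values scale linearly with portion weight and that the estimate includes a portion size in grams. `apply_portion_correction` and the dictionary keys are hypothetical.

```python
def apply_portion_correction(estimate: dict, estimated_grams: float,
                             confirmed_grams: float) -> dict:
    """Rescale nutrient values after the user corrects the portion size."""
    if estimated_grams <= 0:
        raise ValueError("estimated_grams must be positive")
    if confirmed_grams < 0:
        raise ValueError("confirmed_grams cannot be negative")
    factor = confirmed_grams / estimated_grams
    # Linear scaling: half the portion means half of every nutrient
    return {name: round(value * factor, 1) for name, value in estimate.items()}


# Example: the model guessed 200 g, the user says it was closer to 150 g.
corrected = apply_portion_correction(
    {"calories_kcal": 400.0, "protein_g": 20.0}, 200, 150
)
```

Logging pairs of estimated and confirmed portions would also give a dataset for measuring how far off the automated estimates tend to be.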

What are you building with multimodal AI? Share your project or ask questions in the comments!
