How I Built ReadMenuAI: Solving the "Poetic" Chinese Menu Problem with GenAI
Source: Dev.to
Hi everyone, I’m DoubleZ! 👋
Have you ever walked into a Chinese restaurant, faced a menu full of beautiful but confusing characters, and had no idea what to order? Or worse, ordered a dish only to find it looks nothing like what you imagined?
This is a common pain point for many expats and travelers. Traditional translation apps often fail because Chinese menus are a uniquely difficult data source: artistic fonts, handwritten text, inconsistent layouts, and “poetic” dish names that shouldn’t be translated literally. For example, 夫妻肺片 literally reads “Husband and Wife Lung Slices”, but it is a cold dish of sliced beef and offal in chili oil.
Simple translation isn’t enough; users need visual context. That’s why I built ReadMenuAI.
What is ReadMenuAI?
ReadMenuAI is an AI-powered tool that helps users “see” and understand Chinese menus. You upload a photo, and the AI turns it into a digital, interactive menu.
- ✅ OCR & Extraction: Detects dish names and their prices.
- 🌍 Contextual Translation: Translates names while explaining ingredients and allergens.
- 🖼️ AI Image Generation (The “Wow” Factor): Generates high-quality, representative photos for dishes that don’t have pictures.
- 💰 Travel Utilities: Real-time currency conversion and audio pronunciation for easy ordering.
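The currency conversion feature comes down to simple arithmetic once a rate is known. Here is a minimal sketch, assuming menu prices are in CNY and that rates arrive as "target currency per 1 CNY". The function names and the rate table are hypothetical and not taken from ReadMenuAI's code; a real app would fetch live rates from an exchange-rate service.

```typescript
// Exchange rates expressed as "units of target currency per 1 CNY".
// Any numbers passed in are placeholders supplied by the caller.
type RateTable = Record<string, number>;

// Convert a CNY menu price into the target currency.
function convertFromCny(amountCny: number, target: string, rates: RateTable): number {
  const rate = rates[target];
  if (rate === undefined || !(rate > 0)) {
    throw new Error(`No usable exchange rate for ${target}`);
  }
  // Round to two decimals here; Intl.NumberFormat below applies the
  // currency's own display digits when formatting.
  return Math.round(amountCny * rate * 100) / 100;
}

// Format an amount for display using the standard Intl API.
function formatPrice(amount: number, currency: string, locale = "en-US"): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}
```

Passing the rate table in as an argument keeps the conversion pure, so the rates can be cached and refreshed separately from the UI.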
The Technical Deep Dive: The AI Stack
Advanced OCR with Qwen3-Vision
I used the latest Qwen3 (Tongyi Qianwen, 通义千问) multimodal models. Unlike standard OCR, these models excel at:
- Recognizing handwritten or highly stylized calligraphy.
- Maintaining the spatial relationship between a dish name and its price across messy layouts.
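One way to keep each dish name tied to its price is to ask the vision model for structured JSON and validate the reply before using it. The sketch below assumes a hypothetical response shape (an array of objects with `name` and `price` fields). It is not ReadMenuAI's actual schema or any Qwen API, only an illustration of the validation step.

```typescript
// Hypothetical shape for one extracted menu row (an assumption, not from the source).
interface MenuItem {
  name: string;          // dish name as printed, e.g. "夫妻肺片"
  price: number | null;  // numeric price, or null when none could be read
}

// Accept numbers or strings such as "¥38" or "38元"; return null otherwise.
function toPrice(value: unknown): number | null {
  if (typeof value === "number") {
    return Number.isFinite(value) && value >= 0 ? value : null;
  }
  if (typeof value !== "string") return null;
  const match = value.match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Parse the model's raw text reply into validated MenuItem records.
// Malformed rows are dropped rather than guessed at.
function parseMenuItems(raw: string): MenuItem[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return []; // the reply was not valid JSON
  }
  if (!Array.isArray(data)) return [];

  const items: MenuItem[] = [];
  for (const row of data) {
    if (typeof row !== "object" || row === null) continue;
    const { name, price } = row as { name?: unknown; price?: unknown };
    if (typeof name !== "string" || name.trim() === "") continue;
    items.push({ name: name.trim(), price: toPrice(price) });
  }
  return items;
}
```

Keeping `price` nullable matters for menus that list "market price" (时价) instead of a number: the dish still appears, just without a converted price.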
Multimodal Parsing & Semantic Translation
The extracted text is then passed to a large language model (LLM), which goes beyond literal translation and identifies:
- The “Real” Meaning: Explaining that “Ants Climbing a Tree” (蚂蚁上树) is actually glass noodles with minced pork.
- Dietary Specs: Automatically tagging dishes as vegetarian, spicy, or containing common allergens.
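Once dishes carry dietary tags, the UI can filter them against a diner's preferences. The following is a rough sketch of that idea. The `DishInfo` and `DietPreference` shapes are assumptions made for illustration, and the sample entries are hand-written examples rather than real model output.

```typescript
// Hypothetical enrichment record produced by the LLM step (field names are assumptions).
type DietTag = "vegetarian" | "spicy";

interface DishInfo {
  original: string;     // name as printed on the menu
  translation: string;  // what the dish actually is
  tags: DietTag[];
  allergens: string[];  // allergen names, e.g. "peanut"
}

interface DietPreference {
  vegetarianOnly?: boolean;
  avoidSpicy?: boolean;
  avoidAllergens?: string[];
}

// Return only the dishes compatible with the diner's preferences.
function filterDishes(dishes: DishInfo[], pref: DietPreference): DishInfo[] {
  const avoid = new Set((pref.avoidAllergens ?? []).map((a) => a.toLowerCase()));
  return dishes.filter((dish) => {
    if (pref.vegetarianOnly && !dish.tags.includes("vegetarian")) return false;
    if (pref.avoidSpicy && dish.tags.includes("spicy")) return false;
    return !dish.allergens.some((a) => avoid.has(a.toLowerCase()));
  });
}

// Illustrative sample data only.
const sampleDishes: DishInfo[] = [
  { original: "蚂蚁上树", translation: "Glass noodles with minced pork", tags: ["spicy"], allergens: ["soy"] },
  { original: "地三鲜", translation: "Stir-fried potato, eggplant, and green pepper", tags: ["vegetarian"], allergens: ["soy"] },
  { original: "宫保鸡丁", translation: "Kung Pao chicken with peanuts", tags: ["spicy"], allergens: ["peanut", "soy"] },
];

// Example: a diner who avoids peanuts still sees the first two dishes.
const safeForPeanutAllergy = filterDishes(sampleDishes, { avoidAllergens: ["Peanut"] });
```

Because tags generated by a model can be wrong, a production UI would probably present allergen labels as guidance rather than a guarantee.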
Visual Context via Image Generation
For menus without photos, I integrated Tongyi Wanxiang 2 (Text-to-Image). This builds immediate trust: once a user sees a generated image of the dish, the “ordering anxiety” fades.
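A practical detail with text-to-image is what goes into the prompt. Feeding the poetic name alone risks a literal picture, so a plausible approach is to build the prompt from the plain-language description produced in the translation step. The sketch below assumes that approach. The prompt wording, the `DishForImage` shape, and the cache-key helper are all hypothetical and not the author's implementation.

```typescript
// Inputs the prompt is built from (hypothetical shape, not the author's schema).
interface DishForImage {
  original: string;     // e.g. "蚂蚁上树"
  description: string;  // plain-language explanation from the translation step
}

// Build a text-to-image prompt from the description rather than the poetic
// name, so the model is steered toward noodles instead of literal ants.
function buildImagePrompt(dish: DishForImage): string {
  const description = dish.description
    .trim()
    .replace(/\s+/g, " ")       // collapse runs of whitespace
    .replace(/[.。]+$/, "");    // drop trailing full stops
  return [
    `Realistic food photograph of the Chinese dish ${dish.original}:`,
    `${description}.`,
    "Served on a plain plate, restaurant table, natural light, no text or watermark.",
  ].join(" ");
}

// Stable key so an identical dish can reuse a previously generated image
// (for example from a Redis cache) instead of triggering a new generation.
function imageCacheKey(dish: DishForImage): string {
  return `dish-image:${dish.original.trim().replace(/\s+/g, "")}`;
}
```

Keying the cache on the normalized dish name is only one option; common dishes appear on many menus, so reuse of this kind could cut generation cost noticeably.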
The Full-Stack Architecture
As a solo developer, I needed a stack that was fast to deploy but robust enough to handle heavy AI workloads.
- Frontend: Next.js 14 + Tailwind CSS + Shadcn UI
- Database & Auth: Supabase
- Caching: Upstash (Redis)
- Storage: Cloudflare R2
- Background Jobs: Trigger.dev (crucial for handling long-running AI image generation)
- Deployment: Cloudflare Workers
Engineering Challenge: Handling Long-Running Tasks
Image generation and deep parsing can take 10–20 seconds, which is too long for a typical serverless function timeout.
Instead of setting up a separate heavy backend, I used Trigger.dev, which allowed me to:
- Offload the image generation to a background queue.
- Maintain a clean Next.js project structure.
- Provide real-time progress updates to the user via webhooks.
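The pattern behind these points can be sketched without any specific service: the request handler enqueues a job and returns an id at once, a worker runs the slow call later, and each state change is pushed to a listener. The class below is an in-memory illustration of that pattern only. It does not use or imitate the Trigger.dev API, and `generate` and `notify` are hypothetical injected functions.

```typescript
type JobStatus = "queued" | "running" | "done" | "failed";

interface JobState {
  id: string;
  status: JobStatus;
  progress: number;  // 0 to 100
  imageUrl?: string;
  error?: string;
}

type Generate = (prompt: string) => Promise<string>;
type Listener = (state: JobState) => void;

// Minimal in-memory stand-in for a background job service (illustration only).
class ImageJobQueue {
  private jobs = new Map<string, JobState>();
  private pending: { id: string; prompt: string }[] = [];
  private nextId = 1;
  private generate: Generate;
  private notify: Listener;

  constructor(generate: Generate, notify: Listener = () => {}) {
    this.generate = generate;  // slow image-generation call, injected
    this.notify = notify;      // stands in for a webhook or push channel
  }

  // Called from the request handler: returns at once with a job id.
  enqueue(prompt: string): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { id, status: "queued", progress: 0 });
    this.pending.push({ id, prompt });
    return id;
  }

  // Lets a client poll for the current state.
  getState(id: string): JobState | undefined {
    return this.jobs.get(id);
  }

  // Worker step: process one queued job and publish each state change.
  async processNext(): Promise<void> {
    const job = this.pending.shift();
    if (!job) return;
    this.update(job.id, { status: "running", progress: 10 });
    try {
      const imageUrl = await this.generate(job.prompt);
      this.update(job.id, { status: "done", progress: 100, imageUrl });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      this.update(job.id, { status: "failed", error });
    }
  }

  private update(id: string, patch: Partial<JobState>): void {
    const current = this.jobs.get(id);
    if (!current) return;
    const next = { ...current, ...patch };
    this.jobs.set(id, next);
    this.notify(next);
  }
}
```

In a real deployment the queue and job state would live in a managed service rather than process memory, since serverless instances do not share memory or survive between requests.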
From Idea to Launch
ReadMenuAI is my second major AI project. It implements the full lifecycle of a SaaS product: authentication, credit-based billing (Stripe), and internationalization (supporting 12 languages).
Beyond the tech, this project is about cultural connection. By removing the language barrier in restaurants, we’re making local culture more accessible and “delicious” for everyone.
Try it out here: readmenuai.com