How I Built ReadMenuAI: Solving the “Poetic” Chinese Menu Problem with GenAI
Source: Dev.to
Hi everyone, I’m DoubleZ! 👋
Have you ever walked into a Chinese restaurant, faced a menu full of beautiful but confusing characters, and had no idea what to order? Or worse, ordered a dish only to find it looks nothing like what you imagined?
This is a common pain point for many expats and travelers. Traditional translation apps often fail because Chinese menus are a uniquely difficult data source: artistic fonts, handwritten text, inconsistent layouts, and “poetic” dish names that shouldn’t be translated literally (e.g., “Husband and Wife Lung Slices” — 夫妻肺片).
Simple translation isn’t enough; users need visual context. That’s why I built ReadMenuAI.
What is ReadMenuAI?
ReadMenuAI is an AI‑powered tool that helps users “see” and understand Chinese menus. You simply upload a photo, and the AI transforms it into a digital, interactive experience.
- ✅ OCR & Extraction: Detects dish names and prices accurately.
- 🌍 Contextual Translation: Translates names while explaining ingredients and allergens.
- 🖼️ AI Image Generation (The “Wow” Factor): Generates high‑quality, representative photos for dishes that don’t have pictures.
- 💰 Travel Utilities: Real‑time currency conversion and audio pronunciation for easy ordering.
The Technical Deep Dive: The AI Stack
Advanced OCR with Qwen3‑Vision
I used the latest Qwen3 (Tongyi Qianwen) multimodal models. Unlike standard OCR, these models excel at:
- Recognizing handwritten or highly stylized calligraphy.
- Maintaining the spatial relationship between a dish name and its price across messy layouts.
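To keep dish names paired with their prices, the model can be prompted to return structured JSON instead of free text. Here is a minimal sketch of how the response might be validated on the server; the `MenuItem` shape and `parseMenuItems` helper are illustrative assumptions, not ReadMenuAI’s actual code.

```typescript
// Hypothetical shape for one extracted menu line (not ReadMenuAI's real schema).
interface MenuItem {
  nameZh: string;   // dish name as printed on the menu
  priceCny: number; // detected price in yuan
}

// Parse the JSON the vision model was prompted to return, dropping
// malformed entries rather than failing the whole menu.
function parseMenuItems(raw: string): MenuItem[] {
  const data = JSON.parse(raw);
  if (!Array.isArray(data)) return [];
  return data.filter(
    (d): d is MenuItem =>
      typeof d?.nameZh === "string" && typeof d?.priceCny === "number"
  );
}

// Example model output for a two-item photo where one line was garbled:
const sample = '[{"nameZh":"夫妻肺片","priceCny":38},{"nameZh":"??"}]';
const items = parseMenuItems(sample); // only the well-formed entry survives
```

Tolerating partially bad output matters here: one smudged line on a handwritten menu shouldn’t discard the other twenty dishes.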
Multimodal Parsing & Semantic Translation
The extracted text is then fed into a large language model (LLM) to go beyond literal translation. It identifies:
- The “Real” Meaning: Explaining that “Ants Climbing a Tree” is actually glass noodles with minced pork.
- Dietary Specs: Automatically tagging dishes as vegetarian, spicy, or containing common allergens.
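LLMs occasionally invent labels or vary casing, so tags like these are safest when validated against a fixed vocabulary rather than trusted verbatim. A minimal sketch, assuming an illustrative tag taxonomy (ReadMenuAI’s real one may differ):

```typescript
// Illustrative tag vocabulary; the actual taxonomy is an assumption here.
const DIETARY_TAGS = ["vegetarian", "spicy", "contains-peanut",
                      "contains-shellfish", "contains-gluten"] as const;
type DietaryTag = (typeof DIETARY_TAGS)[number];

// Normalize and whitelist tags coming back from the LLM, deduplicating
// and silently dropping anything outside the known vocabulary.
function normalizeTags(raw: string[]): DietaryTag[] {
  const allowed = new Set<string>(DIETARY_TAGS);
  const seen = new Set<DietaryTag>();
  for (const t of raw) {
    const key = t.trim().toLowerCase();
    if (allowed.has(key)) seen.add(key as DietaryTag);
  }
  return [...seen];
}

// e.g. for 蚂蚁上树 (glass noodles with minced pork):
const tags = normalizeTags(["Spicy", "spicy", "umami"]); // → ["spicy"]
```

Keeping the vocabulary closed also makes the UI predictable: every tag the user sees maps to a known icon and translation key.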
Visual Context via Image Generation
For menus without photos, I integrated Tongyi Wanxiang 2 (Text‑to‑Image). This builds immediate trust—when a user sees a generated image of the dish, the “ordering anxiety” disappears.
The Full‑Stack Architecture
As a solo developer, I needed a stack that was fast to deploy but robust enough to handle heavy AI workloads.
- Frontend: Next.js 14 + Tailwind CSS + Shadcn UI
- Database & Auth: Supabase
- Caching: Upstash (Redis)
- Storage: Cloudflare R2
- Background Jobs: Trigger.dev (crucial for handling long‑running AI image generation)
- Deployment: Cloudflare Workers
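Because generated images are the most expensive artifact in this stack, identical dish names should hit the Redis cache instead of re-triggering generation. The key scheme below (a prefix plus a hash of the normalized dish name) is an assumption for illustration, not ReadMenuAI’s actual cache layout.

```typescript
import { createHash } from "node:crypto";

// Derive a stable cache key for a dish's generated image. Trimming and
// Unicode normalization (NFKC) mean the same dish name maps to the same
// key regardless of which menu photo it came from.
function dishImageCacheKey(nameZh: string): string {
  const normalized = nameZh.trim().normalize("NFKC");
  const digest = createHash("sha256").update(normalized).digest("hex");
  return `menu:img:${digest.slice(0, 16)}`;
}
```

The hashed key also sidesteps Redis key-encoding concerns with raw CJK characters while staying short and collision-resistant enough for this use case.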
Engineering Challenge: Handling Long‑Running Tasks
Image generation and deep parsing can take 10–20 seconds—too long for a standard serverless function timeout.
Instead of setting up a separate heavy backend, I used Trigger.dev, which allowed me to:
- Offload the image generation to a background queue.
- Maintain a clean Next.js project structure.
- Provide real‑time progress updates to the user via webhooks.
From Idea to Launch
ReadMenuAI is my second major AI project. It implements the full lifecycle of a SaaS: Authentication, credit‑based billing (Stripe), and internationalization (supporting 12 languages).
Beyond the tech, this project is about cultural connection. By removing the language barrier in restaurants, we’re making local culture more accessible and “delicious” for everyone.
Try it out here: readmenuai.com