How I Built ReadMenuAI: Solving the 'Poetic' Chinese Menu Problem with GenAI
Source: Dev.to
Hi everyone, I'm DoubleZ!
Have you ever walked into a Chinese restaurant, faced a menu full of beautiful but confusing characters, and had no idea what to order? Or worse, ordered a dish only to find it looks nothing like what you imagined?
This is a common pain point for many expats and travelers. Traditional translation apps often fail because Chinese menus are a uniquely difficult data source: artistic fonts, handwritten text, inconsistent layouts, and "poetic" dish names that shouldn't be translated literally (e.g., "Husband and Wife Lung Slices" for 夫妻肺片).
Simple translation isn't enough; users need visual context. That's why I built ReadMenuAI.
What is ReadMenuAI?
ReadMenuAI is an AI-powered tool that helps users "see" and understand Chinese menus. You simply upload a photo, and the AI transforms it into a digital, interactive experience.
- OCR & Extraction: Detects dish names and prices accurately.
- Contextual Translation: Translates names while explaining ingredients and allergens.
- AI Image Generation (The "Wow" Factor): Generates high-quality, representative photos for dishes that don't have pictures.
- Travel Utilities: Real-time currency conversion and audio pronunciation for easy ordering.
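To make the features above concrete, here is a minimal sketch of how a parsed menu item might be modeled, plus the currency-conversion utility. The field names and the `convertPrice` helper are illustrative assumptions, not ReadMenuAI's actual API.

```typescript
// Hypothetical shape of one parsed menu item (field names are illustrative,
// not ReadMenuAI's real schema).
interface MenuItem {
  originalName: string;   // dish name as printed on the menu
  translatedName: string; // contextual English translation
  description: string;    // what the dish actually is
  priceCNY: number;
  tags: string[];         // e.g. "spicy", "vegetarian", "contains-peanuts"
  imageUrl?: string;      // filled in later by the image-generation job
}

// Travel utility sketch: convert a CNY price with a caller-supplied rate,
// rounded to two decimal places.
function convertPrice(priceCNY: number, rate: number): number {
  return Math.round(priceCNY * rate * 100) / 100;
}

const dish: MenuItem = {
  originalName: "夫妻肺片",
  translatedName: "Sliced Beef and Offal in Chili Sauce",
  description: "Cold Sichuan appetizer of thin-sliced beef and tripe",
  priceCNY: 38,
  tags: ["spicy", "contains-beef"],
};

console.log(convertPrice(dish.priceCNY, 0.14)); // 5.32
```

Keeping `imageUrl` optional lets the UI render translations immediately while image generation finishes in the background.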
The Technical Deep Dive: The AI Stack
Advanced OCR with Qwen3-Vision
I used the latest Qwen3 (Tongyi Qianwen) multimodal models. Unlike standard OCR, these models excel at:
- Recognizing handwritten or highly stylized calligraphy.
- Maintaining the spatial relationship between a dish name and its price across messy layouts.
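The spatial-pairing idea can be sketched in code: given OCR bounding boxes for names and prices, pair each name with the price box closest to it vertically. The box format here is a hypothetical simplification, not Qwen's actual output schema.

```typescript
// Illustrative post-processing for messy layouts: pair name boxes with
// price boxes by vertical proximity. Box shape is an assumption.
interface OcrBox {
  text: string;
  y: number; // vertical center of the bounding box, in pixels
}

function pairNamesWithPrices(
  names: OcrBox[],
  prices: OcrBox[]
): Array<[string, string]> {
  return names.map((name) => {
    // Pick the price whose vertical center is closest to this name's.
    let best = prices[0];
    for (const p of prices) {
      if (Math.abs(p.y - name.y) < Math.abs(best.y - name.y)) best = p;
    }
    return [name.text, best.text];
  });
}

const names = [
  { text: "宫保鸡丁", y: 120 },
  { text: "蚂蚁上树", y: 180 },
];
const prices = [
  { text: "¥32", y: 122 },
  { text: "¥28", y: 179 },
];
console.log(pairNamesWithPrices(names, prices));
// → [["宫保鸡丁", "¥32"], ["蚂蚁上树", "¥28"]]
```

In practice a multimodal model can often do this pairing itself; a heuristic like this is useful as a sanity check or fallback.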
Multimodal Parsing & Semantic Translation
The extracted text is fed into an LLM (Large Language Model) to go beyond literal translation. It identifies:
- The "Real" Meaning: Explaining that "Ants Climbing a Tree" is actually glass noodles with minced pork.
- Dietary Specs: Automatically tagging dishes as vegetarian, spicy, or containing common allergens.
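One way to get both the real meaning and the dietary tags in a single call is to ask the LLM for structured JSON. The prompt below is a sketch of that approach; the exact wording and tag vocabulary are my assumptions, not ReadMenuAI's actual prompt.

```typescript
// Hypothetical prompt builder for contextual (non-literal) translation.
function buildTranslationPrompt(dishName: string): string {
  return [
    `Translate the Chinese dish name "${dishName}" for a restaurant menu.`,
    "Do not translate literally: explain what the dish actually is,",
    "list its main ingredients, and tag it with any that apply:",
    "vegetarian, spicy, contains-peanuts, contains-shellfish, contains-gluten.",
    "Respond as JSON: { translation, description, tags }.",
  ].join("\n");
}

const prompt = buildTranslationPrompt("蚂蚁上树");
console.log(prompt);
```

Requesting JSON output means the dietary tags can be parsed and rendered as badges in the UI instead of being buried in free text.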
Visual Context via Image Generation
For menus without photos, I integrated Tongyi Wanxiang 2 (Text-to-Image). This builds immediate trust: when a user sees a generated image of the dish, the "ordering anxiety" disappears.
The FullāStack Architecture
As a solo developer, I needed a stack that was fast to deploy but robust enough to handle heavy AI workloads.
- Frontend: Next.js 14 + Tailwind CSS + Shadcn UI
- Database & Auth: Supabase
- Caching: Upstash (Redis)
- Storage: Cloudflare R2
- Background Jobs: Trigger.dev (crucial for handling long-running AI image generation)
- Deployment: Cloudflare Workers
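Because generated images are expensive, caching them by dish name means a popular dish is only ever generated once. The sketch below shows the idea with an in-memory `Map` standing in for Redis/Upstash; the class and method names are illustrative.

```typescript
// Sketch of image-result caching; a Map stands in for Redis/Upstash here.
class ImageCache {
  private cache = new Map<string, string>();
  private generations = 0;

  getOrGenerate(dishName: string, generate: (name: string) => string): string {
    const hit = this.cache.get(dishName);
    if (hit !== undefined) return hit; // cache hit: skip the model call
    const url = generate(dishName);    // expensive text-to-image call
    this.generations++;
    this.cache.set(dishName, url);
    return url;
  }

  get generationCount(): number {
    return this.generations;
  }
}

const cache = new ImageCache();
// Stand-in for the real text-to-image call; the URL is hypothetical.
const gen = (name: string) => `https://cdn.example.com/${encodeURIComponent(name)}.png`;
cache.getOrGenerate("宫保鸡丁", gen);
cache.getOrGenerate("宫保鸡丁", gen); // second request hits the cache
```

With a shared cache like Upstash, this also deduplicates across users: every diner photographing the same menu benefits from one generation.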
Engineering Challenge: Handling LongāRunning Tasks
Image generation and deep parsing can take 10-20 seconds, too long for a standard serverless function timeout.
Instead of setting up a separate heavy backend, I used Trigger.dev, which allowed me to:
- Offload the image generation to a background queue.
- Maintain a clean Next.js project structure.
- Provide real-time progress updates to the user via webhooks.
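The pattern underneath this (which Trigger.dev provides as a managed service) is enqueue-and-poll: the API route enqueues a job and returns an id immediately, a background worker does the slow work, and the client polls (or receives a webhook) for the result. A minimal sketch, with illustrative names and an in-memory store:

```typescript
// Minimal enqueue-and-poll sketch; in production the queue and worker are
// managed by Trigger.dev, and state lives outside the request handler.
type JobStatus = "queued" | "running" | "done";

interface Job {
  id: string;
  status: JobStatus;
  result?: string; // e.g. the URL of the generated dish image
}

class JobQueue {
  private jobs = new Map<string, Job>();
  private nextId = 1;

  // Called from the API route: returns well before the work finishes.
  enqueue(): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { id, status: "queued" });
    return id;
  }

  // Called by the background worker when the slow task completes.
  complete(id: string, result: string): void {
    const job = this.jobs.get(id);
    if (job) {
      job.status = "done";
      job.result = result;
    }
  }

  // Called by the client while waiting.
  poll(id: string): Job | undefined {
    return this.jobs.get(id);
  }
}

const queue = new JobQueue();
const id = queue.enqueue(); // API route returns this id immediately
// ...10-20 seconds later, the worker finishes (URL is hypothetical):
queue.complete(id, "https://cdn.example.com/dish.png");
```

The serverless function never blocks on the 10-20 second generation, so it stays comfortably within its timeout.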
From Idea to Launch
ReadMenuAI is my second major AI project. It implements the full lifecycle of a SaaS: authentication, credit-based billing (Stripe), and internationalization (supporting 12 languages).
Beyond the tech, this project is about cultural connection. By removing the language barrier in restaurants, we're making local culture more accessible and "delicious" for everyone.
Try it out here: readmenuai.com