Stop Copy-Pasting from Images: Build a Universal Screen Translator with Python
Source: Dev.to
Lingo‑Live started with a frustration many of us have felt: text in a YouTube video (or any other on‑screen content that is rendered as pixels) can't be selected or copied.
Most of us end up either typing everything by hand or pulling out our phones and using Google Lens—clunky, focus‑breaking, and far from ideal.
Lingo‑Live is a sleek desktop app that instantly translates anything you see on your screen. It feels like a superpower.
- Invisible – runs quietly in the background
- Instant – hit a hotkey, select an area, get a translation
- Modern – glassy UI, dark mode, blur effects
Press Ctrl + Alt + T, drag over any part of your screen, and the translated text appears on top of whatever you’re doing.
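As a rough sketch of how a global hotkey like this could be wired up, the snippet below uses the pynput library. This is an assumption: the post does not say which hotkey library Lingo‑Live uses, and the helper names `to_pynput_hotkey` and `listen_for_hotkey` are hypothetical.

```python
def to_pynput_hotkey(combo: str) -> str:
    """Convert a combo like 'Ctrl + Alt + T' into pynput's '<ctrl>+<alt>+t' form."""
    parts = [part.strip().lower() for part in combo.split("+") if part.strip()]
    if not parts:
        raise ValueError(f"empty hotkey: {combo!r}")
    # pynput writes single characters bare and named keys in angle brackets.
    return "+".join(p if len(p) == 1 else f"<{p}>" for p in parts)


def listen_for_hotkey(combo: str, on_activate) -> None:
    """Block the calling thread, running on_activate each time combo is pressed."""
    # Assumption: pynput is installed; it is imported lazily so the
    # converter above stays usable without it.
    from pynput import keyboard

    with keyboard.GlobalHotKeys({to_pynput_hotkey(combo): on_activate}) as hotkeys:
        hotkeys.join()
```

Calling `listen_for_hotkey("Ctrl + Alt + T", start_selection)` would then invoke a (hypothetical) `start_selection` callback on every press. Keeping the conversion in its own function means the same human-readable string can be stored in a settings file.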
The “Glass” Overlay
The trickiest part was creating a window that stays on top without being annoying. I used CustomTkinter to build a frameless, translucent overlay that feels light and modern.
- Always on top so translations stay visible
- Semi‑transparent so you can still see context underneath
- Frameless – no title bar; custom drag‑and‑drop instead
The result feels less like an app and more like a layer on your desktop.
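The three properties above map onto a handful of window options. Here is a minimal sketch, assuming CustomTkinter is installed; the sizes, the alpha value, and the function names are illustrative and not taken from the Lingo‑Live source. Transparency support also varies by platform and window manager.

```python
def overlay_geometry(width: int, height: int, x: int, y: int) -> str:
    """Build the 'WIDTHxHEIGHT+X+Y' geometry string that Tk windows accept."""
    return f"{width}x{height}+{x}+{y}"


def show_overlay(text: str, alpha: float = 0.85) -> None:
    """Open a frameless, translucent, always-on-top window that shows text."""
    import customtkinter as ctk  # third-party; imported lazily

    ctk.set_appearance_mode("dark")
    root = ctk.CTk()
    root.overrideredirect(True)        # frameless: no title bar or borders
    root.attributes("-topmost", True)  # always on top of other windows
    root.attributes("-alpha", alpha)   # semi-transparent
    root.geometry(overlay_geometry(420, 160, 200, 200))

    label = ctk.CTkLabel(root, text=text, wraplength=380)
    label.pack(expand=True, fill="both", padx=20, pady=20)

    grab = {"dx": 0, "dy": 0}  # pointer offset from the window's top-left corner

    def remember_grab_point(event):
        grab["dx"] = event.x_root - root.winfo_x()
        grab["dy"] = event.y_root - root.winfo_y()

    def follow_pointer(event):
        # Without a title bar, dragging has to be implemented by hand.
        root.geometry(f"+{event.x_root - grab['dx']}+{event.y_root - grab['dy']}")

    label.bind("<Button-1>", remember_grab_point)
    label.bind("<B1-Motion>", follow_pointer)
    label.bind("<Button-3>", lambda _event: root.destroy())  # right-click closes
    root.mainloop()
```

Calling `show_overlay("Hello")` opens the window. The blur effect mentioned earlier is not covered here, since it depends on platform-specific APIs.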
The Eyes (OCR)
When you trigger the hotkey, Lingo‑Live doesn’t try to “read the screen.” Instead, it:
- Lets you select a region
- Takes a screenshot of that area
- Sends it to Tesseract OCR to extract text from the pixels
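# OCR step: capture the selected region and extract its text
from PIL import ImageGrab

# (x1, y1) and (x2, y2) are the top-left and bottom-right corners of the selection
screenshot = ImageGrab.grab(bbox=(x1, y1, x2, y2))
# ocr_engine is the app's own helper that passes the image to Tesseract
text = ocr_engine.extract_text(screenshot)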
That’s where the magic starts—turning images into actual text.
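For readers who want to try the capture-and-OCR step on its own, here is a self-contained sketch. It assumes the pytesseract wrapper and a local Tesseract install. The post does not show what sits behind its OCR helper, so `normalize_bbox` and `ocr_region` are hypothetical names.

```python
def normalize_bbox(start, end):
    """Return (left, top, right, bottom) for a drag between two screen points.

    Users can drag in any direction, so the corners are sorted first.
    """
    (x1, y1), (x2, y2) = start, end
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def ocr_region(start, end, lang: str = "eng") -> str:
    """Screenshot the dragged region and return the text Tesseract finds in it."""
    # Assumptions: Pillow and pytesseract are installed, and the Tesseract
    # binary (plus the language data for `lang`) is available on this machine.
    from PIL import ImageGrab
    import pytesseract

    bbox = normalize_bbox(start, end)
    left, top, right, bottom = bbox
    if right - left < 1 or bottom - top < 1:
        raise ValueError("selection is empty")
    screenshot = ImageGrab.grab(bbox=bbox)
    return pytesseract.image_to_string(screenshot, lang=lang).strip()
```

For Japanese text, `ocr_region((100, 100), (500, 300), lang="jpn")` would be the call, provided the `jpn` language data is installed.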
The Brain (Translation)
Once OCR gives us something like こんにちは, we need a translation that actually makes sense. This is where Lingo.dev comes in. Instead of raw dictionary swaps, it handles context properly, which makes a huge difference—especially for UI text, error messages, and game dialogue. The result feels natural, not robotic.
The Voice (Text‑to‑Speech)
Sometimes you don’t want to read; you just want to hear it. I added Edge TTS, which uses the same high‑quality voices found in Microsoft Edge. Lingo‑Live can now read translations out loud—great for pronunciation or staying hands‑free.
“Fish are vertebrate animals that live in water…”
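A read-aloud step along these lines could be sketched with the edge-tts package. The voice table, the fallback voice, and the function names below are assumptions for illustration; the post does not say which voices Lingo‑Live uses.

```python
# Assumed mapping from language code to an Edge neural voice.
VOICES = {
    "en": "en-US-AriaNeural",
    "ja": "ja-JP-NanamiNeural",
    "de": "de-DE-KatjaNeural",
}
DEFAULT_VOICE = "en-US-AriaNeural"


def pick_voice(language_code: str) -> str:
    """Choose a voice for codes like 'ja' or 'ja-JP', falling back to English."""
    return VOICES.get(language_code.lower().split("-")[0], DEFAULT_VOICE)


async def speak_to_file(text: str, language_code: str,
                        out_path: str = "translation.mp3") -> str:
    """Synthesize text to an MP3 file and return the file path."""
    import edge_tts  # third-party; needs network access to the Edge TTS service

    communicate = edge_tts.Communicate(text, pick_voice(language_code))
    await communicate.save(out_path)
    return out_path
```

Since `speak_to_file` is a coroutine, it would be run with something like `asyncio.run(speak_to_file("Hello", "en"))`. Playing the resulting file is left to whatever audio library the app already uses.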
Leveling Up: AI Summarization
Full translations are great, but sometimes you just want the gist. A Summarize button powered by Google Gemini does the trick:
- The translated text is sent to Gemini
- Gemini returns a clean, one‑sentence summary
You get the point instantly—perfect for skimming foreign articles, long error messages, or RPG dialogue dumps.
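The two steps above might look roughly like this with Google's `google-genai` Python SDK. The prompt wording, the model name, and the `GEMINI_API_KEY` environment variable are assumptions, not details from the Lingo‑Live source.

```python
import os


def build_summary_prompt(text: str) -> str:
    """Wrap the translated text in an instruction asking for one sentence."""
    return (
        "Summarize the following text in exactly one sentence. "
        "Reply with the summary only.\n\n" + text.strip()
    )


def summarize(text: str, model: str = "gemini-2.0-flash") -> str:
    """Send the translated text to Gemini and return its one-sentence summary."""
    # Assumptions: the google-genai package is installed and an API key
    # is stored in the GEMINI_API_KEY environment variable.
    from google import genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        model=model,
        contents=build_summary_prompt(text),
    )
    return (response.text or "").strip()
```

Keeping the prompt in its own function makes it easy to tweak the instruction (for example, to ask for the summary in a specific language) without touching the API call.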
Make It Yours: Settings That Actually Matter
Lingo‑Live includes a full settings system backed by JSON, allowing you to tailor the experience:
- Change the hotkey (e.g., Alt + Z)
- Switch themes (dark mode is the default)
- Pick different fonts (e.g., Roboto → Segoe UI)
{
  "hotkey": "Ctrl+Alt+T",
  "theme": "dark",
  "font": "Segoe UI"
}
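A settings file like this is usually loaded on top of built-in defaults, so a missing or partial file never breaks the app. Here is a minimal sketch using only the standard library; the function names and the fallback behavior are assumptions, with the defaults taken from the sample above.

```python
import json
from pathlib import Path

# Defaults mirror the sample settings shown above.
DEFAULT_SETTINGS = {"hotkey": "Ctrl+Alt+T", "theme": "dark", "font": "Segoe UI"}


def load_settings(path) -> dict:
    """Return the defaults overlaid with any known keys found in the JSON file."""
    settings = dict(DEFAULT_SETTINGS)
    try:
        stored = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return settings  # missing or corrupt file: fall back to the defaults
    if isinstance(stored, dict):
        # Ignore unknown keys so a stale file cannot inject unexpected options.
        settings.update({k: v for k, v in stored.items() if k in DEFAULT_SETTINGS})
    return settings


def save_settings(path, settings: dict) -> None:
    """Write the settings back to disk as indented JSON."""
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

With this approach, a file containing only `{"hotkey": "Alt+Z"}` still yields a complete settings dictionary with the dark theme and default font.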
Getting Started
The repository is open‑source. Clone it, explore the code, and stop copy‑pasting from pixels.
git clone https://github.com/Samar-365/lingo_live.git
Acknowledgements
Special thanks to @sumitsaurabh927 and @maxprilutskiy for their continuous guidance throughout the hackathon and for providing this great opportunity.
Happy coding!