I built a local screen reader that reads your screen aloud — no cloud, no API keys
Source: Dev.to
What it does
- 🖱️ Draw a rectangle on any part of your screen
- 📸 Snapshot that region every N seconds
- 🔍 Pixel‑diff check – skips frames where nothing changed
- 🧠 LightOnOCR‑2‑1B reads the text (runs on AMD GPU via ROCm)
- 🗣️ Kokoro‑82M speaks it through your speakers (runs on CPU)
🖥️ screen → 🔍 diff → 🧠 OCR → ✨ clean text → 🗣️ TTS → 🔊 speaker
Auto page‑turn (the killer feature)
Draw a second rectangle over any button on screen. After TTS finishes speaking and the screen stays idle, sttts automatically clicks it. I use this with Kindle for PC – it reads the entire book hands‑free, turning pages automatically.
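To make the click step concrete, here is a minimal sketch of how a saved rectangle could be turned into a mouse click. It is not sttts's actual code. It assumes the button region is stored as an "X Y W H" string (the shape `slop -f "%x %y %w %h"` prints) and that the click goes through `xdotool`; the function names are hypothetical.

```python
import subprocess


def parse_geometry(text: str) -> tuple[int, int, int, int]:
    """Parse an 'X Y W H' string (assumed storage format) into integers."""
    x, y, w, h = (int(part) for part in text.split())
    return x, y, w, h


def centre(x: int, y: int, w: int, h: int) -> tuple[int, int]:
    """Return the centre point of the rectangle."""
    return x + w // 2, y + h // 2


def click_command(cx: int, cy: int) -> list[str]:
    """Build an xdotool call that moves the pointer, then left-clicks."""
    return ["xdotool", "mousemove", str(cx), str(cy), "click", "1"]


def click_button(geometry: str, dry_run: bool = False) -> list[str]:
    """Click the centre of the region; with dry_run=True only build the command."""
    cmd = click_command(*centre(*parse_geometry(geometry)))
    if not dry_run:
        subprocess.run(cmd, check=True)  # needs xdotool and an X11 session
    return cmd
```

Clicking the centre of the rectangle rather than a corner leaves some slack if the selection was drawn a little loosely around the button.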
# Draw OCR region, then draw the next‑page button
uv run python capture.py --next-btn -i 2
Models used
- OCR: LightOnOCR‑2‑1B – fast, accurate, runs on AMD GPU via ROCm
- TTS: Kokoro‑82M – high quality, ~100 ms latency on CPU
Both models download automatically from HuggingFace on first run. No API keys, no subscriptions.
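Between the two models sits the "clean text" stage from the pipeline. What sttts does there is not shown in this post, so the following is only a sketch of the kind of cleanup that helps TTS. It assumes the OCR output may contain Markdown-style markers and hard line breaks; `clean_for_speech` is a hypothetical name.

```python
import re


def clean_for_speech(text: str) -> str:
    """Flatten OCR output into one line of plain text for a TTS engine."""
    # Re-join words split by a hyphen at a line break ("auto-\nmatic").
    # This also merges genuine compounds that happen to break there.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop heading markers and list bullets at the start of a line.
    text = re.sub(r"^[ \t]*(#{1,6}|[-*+])[ \t]+", "", text, flags=re.MULTILINE)
    # Drop emphasis and inline-code markers so they are not read aloud.
    text = re.sub(r"[*`]+", "", text)
    # Collapse newlines and repeated spaces into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

For example, a heading line followed by a bold word and a bullet comes out as one plain sentence stream, which avoids the TTS voice spelling out symbols or pausing at every line wrap.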
Smart idle detection
Pixel‑level diff comparison means OCR and TTS only fire when something actually changed. Reading a static page? Silent. New content loaded? Speaks immediately.
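The diff check can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the project's implementation: frames are taken to be height × width × channels `uint8` arrays, and the threshold is a percentage of pixels, matching the `>1%` wording of the command in this section. The names `changed_percent` and `should_run_ocr` are made up for the example.

```python
import numpy as np


def changed_percent(prev: np.ndarray, curr: np.ndarray, tolerance: int = 0) -> float:
    """Percentage of pixels that differ by more than `tolerance` in any channel."""
    if prev.shape != curr.shape:
        return 100.0  # region was resized: treat everything as changed
    # Widen to int16 first so uint8 subtraction cannot wrap around.
    delta = np.abs(prev.astype(np.int16) - curr.astype(np.int16))
    changed = (delta > tolerance).any(axis=-1)  # one boolean per pixel
    return 100.0 * float(changed.mean())


def should_run_ocr(prev, curr, threshold_percent: float = 1.0) -> bool:
    """Run OCR on the first frame, or when enough pixels have changed."""
    return prev is None or changed_percent(prev, curr) > threshold_percent
```

A small `tolerance` above zero can help ignore compression noise or a blinking cursor, at the cost of missing very faint changes.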
# Only trigger OCR when >1% of pixels changed
uv run python capture.py --diff-threshold 1.0
Quick start
# Install system dependencies
sudo apt-get install -y slop xdotool libportaudio2 libsndfile1
# Install uv (a fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone the repository and run
git clone https://github.com/paradisecy/sttts
cd sttts
uv sync
uv run python capture.py
Use cases
- 📖 Hands‑free ebook reading (Kindle, ePub readers, PDFs)
- 📊 Spoken updates for financial dashboards
- ♿ Accessibility tool for apps lacking native screen‑reader support
- 💻 Listen to terminal output or logs while working
- 🌐 Hear any webpage without a browser extension
Tech stack
- Python 3.13
- PyTorch 2.8 + ROCm 6.3 (AMD GPU)
Key libraries
- mss – fast screen capture
- transformers – OCR model handling
- kokoro – TTS model
- sounddevice – audio playback
- slop + xdotool – region selection and mouse clicks
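How these pieces might fit together is easiest to see as a loop. The sketch below is an assumption about the overall shape, not the code in `capture.py`. Each stage is passed in as a plain callable, so the real capture, OCR, TTS, and click functions stay out of the picture. `run_loop` and all of its parameters are hypothetical, and the 2-second default assumes that `-i 2` sets the snapshot interval.

```python
import time
from typing import Callable, Optional


def run_loop(
    capture: Callable[[], object],
    changed: Callable[[object, object], bool],
    ocr: Callable[[object], str],
    speak: Callable[[str], None],
    click_next: Optional[Callable[[], None]] = None,
    interval: float = 2.0,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Capture, diff, OCR, speak; click 'next' once the screen goes idle."""
    previous = None
    spoke_since_click = False
    i = 0
    while max_iterations is None or i < max_iterations:
        frame = capture()
        if previous is None or changed(previous, frame):
            text = ocr(frame)
            if text.strip():
                speak(text)  # assumed to block until playback finishes
                spoke_since_click = True
        elif spoke_since_click and click_next is not None:
            # Nothing changed since the last spoken frame: turn the page once.
            click_next()
            spoke_since_click = False
        previous = frame
        sleep(interval)
        i += 1
```

The `spoke_since_click` flag keeps an idle screen from triggering repeated clicks: one page turn per spoken page, then the loop waits for new content.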
⭐ GitHub: