I built a local screen reader that reads your screen aloud — no cloud, no API keys

Published: April 11, 2026 at 08:30 AM EDT

Source: Dev.to

What it does

🖱️ Draw a rectangle on any part of your screen
📸 Snapshot that region every N seconds
🔍 Pixel‑diff check – skips frames where nothing changed
🧠 LightOnOCR‑2‑1B reads the text (runs on an AMD GPU via ROCm)
🗣️ Kokoro‑82M speaks it through your speakers (runs on CPU)

🖥️ screen → 🔍 diff → 🧠 OCR → ✨ clean text → 🗣️ TTS → 🔊 speaker
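
To make the flow above concrete, here is a minimal sketch of one pass through the pipeline. It is not the actual sttts code: `process_frame` and every callable it takes are hypothetical names, injected so the control flow can be read (and tested) without a GPU, a display, or a speaker.

```python
from typing import Callable, Optional


def process_frame(
    frame: bytes,
    previous: Optional[bytes],
    has_changed: Callable[[bytes, bytes], bool],
    ocr: Callable[[bytes], str],
    clean: Callable[[str], str],
    speak: Callable[[str], None],
) -> bool:
    """Run one pass of screen -> diff -> OCR -> clean -> TTS.

    Returns True if something was spoken. All stage functions are
    placeholders (assumptions), not the real sttts internals.
    """
    # Skip the expensive OCR and TTS stages when the region looks unchanged.
    if previous is not None and not has_changed(previous, frame):
        return False
    text = clean(ocr(frame))
    # Nothing readable in the region: stay silent.
    if not text:
        return False
    speak(text)
    return True


if __name__ == "__main__":
    spoken = []
    process_frame(b"new", b"old", lambda a, b: a != b,
                  lambda f: "  Chapter 1  ", str.strip, spoken.append)
    print(spoken)
```

Passing the stages in as functions keeps the diff check first in line, so an unchanged frame never reaches the OCR model.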

Auto page‑turn (the killer feature)

Draw a second rectangle over any button on screen. Once TTS has finished speaking and the screen has stayed idle, sttts clicks that button automatically. I use this with Kindle for PC: it reads an entire book hands‑free, turning the pages for me.

# Draw OCR region, then draw the next‑page button
uv run python capture.py --next-btn -i 2
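
For readers curious how a rectangle becomes a click, here is a rough sketch built on the two tools the project lists, slop and xdotool. It assumes the region is read with slop's `-f "%x %y %w %h"` format string and that the click goes to the rectangle's centre; the function names and the `min_idle` value are hypothetical and not taken from the sttts source.

```python
import subprocess
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


def parse_geometry(text: str) -> Rect:
    """Parse the 'x y w h' line printed by: slop -f '%x %y %w %h'."""
    x, y, w, h = (int(part) for part in text.split())
    return x, y, w, h


def click_command(rect: Rect) -> List[str]:
    """Build an xdotool command that left-clicks the centre of rect."""
    x, y, w, h = rect
    cx, cy = x + w // 2, y + h // 2
    return ["xdotool", "mousemove", str(cx), str(cy), "click", "1"]


def ready_to_turn(tts_finished: bool, idle_seconds: float,
                  min_idle: float = 2.0) -> bool:
    """Only turn the page once speech is done and the screen has been idle.

    min_idle is an illustrative default, not a documented sttts setting.
    """
    return tts_finished and idle_seconds >= min_idle


def select_region() -> Rect:
    """Let the user draw a rectangle with slop (needs an X11 session)."""
    result = subprocess.run(["slop", "-f", "%x %y %w %h"],
                            check=True, capture_output=True, text=True)
    return parse_geometry(result.stdout)


def click(rect: Rect) -> None:
    """Perform the click (needs xdotool and an X11 session)."""
    subprocess.run(click_command(rect), check=True)
```

Keeping the geometry parsing and command building separate from the `subprocess` calls means the logic can be checked without a display attached.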

Models used

OCR: LightOnOCR‑2‑1B – fast, accurate, runs on an AMD GPU via ROCm
TTS: Kokoro‑82M – high quality, ~100 ms latency on CPU

Both models download automatically from HuggingFace on first run. No API keys, no subscriptions.

Smart idle detection

Pixel‑level diff comparison means OCR and TTS only fire when something has actually changed. Reading a static page? Silent. New content loaded? Speaks immediately.

# Only trigger OCR when >1% of pixels changed
uv run python capture.py --diff-threshold 1.0
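
As an illustration of what a percentage threshold like this could mean, the sketch below compares two raw frames pixel by pixel. It assumes 4‑byte BGRA pixels (the raw layout mss screenshots expose) and uses plain Python for clarity; the function names are hypothetical, and a real implementation would more likely use NumPy for speed.

```python
def changed_percent(prev: bytes, curr: bytes, bytes_per_pixel: int = 4) -> float:
    """Return the percentage (0-100) of pixels that differ between two frames."""
    # A different buffer size means the region changed shape: treat as fully changed.
    if len(prev) != len(curr):
        return 100.0
    total = len(curr) // bytes_per_pixel
    if total == 0:
        return 0.0
    changed = sum(
        1
        for i in range(0, total * bytes_per_pixel, bytes_per_pixel)
        if prev[i:i + bytes_per_pixel] != curr[i:i + bytes_per_pixel]
    )
    return 100.0 * changed / total


def should_run_ocr(prev: bytes, curr: bytes, threshold: float = 1.0) -> bool:
    """Trigger OCR only when strictly more than `threshold` percent of pixels changed."""
    return changed_percent(prev, curr) > threshold
```

With a threshold of 1.0, a blinking cursor or a clock tick in the corner of the region would usually stay below the limit, while a page turn would exceed it.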

Quick start

# Install system dependencies
sudo apt-get install -y slop xdotool libportaudio2 libsndfile1

# Install uv (a fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository and run
git clone https://github.com/paradisecy/sttts
cd sttts
uv sync
uv run python capture.py
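
If the last command fails right away, a missing system tool is a likely cause. The snippet below is an optional preflight check, not part of sttts: `missing_tools` is a hypothetical helper that uses the standard library's `shutil.which` to confirm that the two command-line tools installed above are on the PATH.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Command-line tools from the apt-get line above (the audio libraries are not executables).
REQUIRED_TOOLS = ("slop", "xdotool")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```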

Use cases

📖 Hands‑free ebook reading (Kindle, ePub readers, PDFs)
📊 Spoken updates for financial dashboards
♿ Accessibility tool for apps lacking native screen‑reader support
💻 Listen to terminal output or logs while working
🌐 Hear any webpage without a browser extension

Tech stack

Python 3.13
PyTorch 2.8 + ROCm 6.3 (AMD GPU)

Key libraries

mss – fast screen capture
transformers – OCR model handling
kokoro – TTS model
sounddevice – audio playback
slop + xdotool – region selection and mouse clicks

⭐ GitHub: https://github.com/paradisecy/sttts
