VoxTube – Convert YouTube videos to audio with local TTS
Source: Dev.to
Problem
I kept queuing YouTube tutorials and talks but never watching them. Video demands attention in a way that audio doesn’t.
Solution
VoxTube extracts transcripts from YouTube videos and converts them to audio using high‑quality TTS, so I can “watch” YouTube during my commute, while cooking, and during workouts.
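As a rough illustration of the transcript step, the sketch below assumes the transcript arrives as a list of timed caption segments. It joins them into plain text and groups sentences into chunks small enough to hand to a TTS engine. The `Segment` shape, the function names, and the 400-character default are assumptions made for illustration, not details taken from the VoxTube source.

```typescript
// Hypothetical shape of one timed caption segment; real transcript sources differ.
interface Segment {
  text: string;
  start: number; // seconds from the start of the video
}

/** Join caption segments into plain prose suitable for a TTS engine. */
function transcriptToText(segments: Segment[]): string {
  return segments
    .map((s) => s.text)
    .join(" ")
    .replace(/\[[^\]]*\]/g, " ") // drop bracketed cues such as [Music]
    .replace(/\s+/g, " ") // collapse newlines and repeated spaces
    .trim();
}

/**
 * Group sentences into chunks of roughly maxChars.
 * A single sentence longer than maxChars is kept whole rather than cut mid-word.
 */
function chunkText(text: string, maxChars = 400): string[] {
  // Each match is one sentence plus its trailing punctuation and whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Splitting on sentence boundaries keeps each TTS request short while avoiding audible breaks in the middle of a sentence.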
Technical details
- Built with Bun + Hono (~300 lines)
- Uses Kokoro TTS (runs locally via Docker)
- Caches generated audio
- No cloud dependencies
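One way the caching could work, sketched under assumptions: derive a cache key from the video ID and voice, and reuse the stored result when the same video is requested again. The names `extractVideoId` and `createAudioCache` are hypothetical, the `synthesize` callback stands in for the real transcript-plus-TTS pipeline, and an in-memory `Map` stands in for whatever storage VoxTube actually uses.

```typescript
// Matches youtu.be/<id>, youtube.com/watch?v=<id>, /embed/<id> and /shorts/<id>.
// The leading (?:^|[/.]) stops look-alike hosts such as "notyoutube.com" from matching.
const YOUTUBE_ID =
  /(?:^|[/.])(?:youtu\.be\/|youtube\.com\/(?:watch\?(?:.*&)?v=|embed\/|shorts\/))([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/;

/** Pull the 11-character video ID out of common YouTube URL shapes, or return null. */
function extractVideoId(input: string): string | null {
  const match = YOUTUBE_ID.exec(input);
  return match?.[1] ?? null;
}

// Hypothetical callback: fetch the transcript and run TTS, returning audio bytes.
type Synthesize = (videoId: string, voice: string) => Promise<Uint8Array>;

/** Wrap a synthesize function so each (video, voice) pair is generated only once. */
function createAudioCache(synthesize: Synthesize) {
  // Assumption: an in-memory map; a disk-backed cache would follow the same pattern.
  const entries = new Map<string, Promise<Uint8Array>>();

  return function getAudio(videoId: string, voice: string): Promise<Uint8Array> {
    const key = `${videoId}.${voice}`;
    let entry = entries.get(key);
    if (entry === undefined) {
      entry = synthesize(videoId, voice).catch((err) => {
        entries.delete(key); // do not cache failures, so a retry can succeed
        throw err;
      });
      entries.set(key, entry);
    }
    return entry;
  };
}
```

Storing the in-flight promise rather than the finished bytes means two simultaneous requests for the same video trigger only one synthesis run.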
What I learned
- Bun’s file APIs are really nice for streaming audio.
- Modern TTS (Kokoro) sounds surprisingly natural.
- Most YouTube videos have transcripts available.
Stats
- 2 weeks to MVP
- ~300 lines of code
- $0 monthly costs (runs locally)
GitHub: