I Built a Voice-to-Code VS Code Extension That Runs Entirely On-Device
Source: Dev.to
Every AI coding assistant requires typing. GitHub Copilot, Continue, Kiro — they all expect you to type your prompts. But what if you could just talk? That’s why I built VoxPilot.
Developers often spend time typing prompts like “refactor this function to use async/await with proper error handling and add unit tests.” That’s about 15 seconds of typing for something that could be said in 3 seconds. For those with RSI or carpal tunnel, typing isn’t just slow—it’s painful.
VoxPilot is a VS Code extension that captures your voice, transcribes it locally using Moonshine ASR, and sends the resulting text to your coding assistant. The key word is locally: your audio never leaves your machine. There are no API keys, no cloud calls, and no telemetry. The ASR model is only 27 MB and runs via ONNX Runtime.
How VoxPilot Works
Audio Capture
Native CLI tools capture raw PCM audio at 16 kHz:
- Linux: arecord
- macOS: sox
- Windows: ffmpeg
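The capture step boils down to spawning the right tool with flags for 16 kHz, 16-bit, mono raw PCM on stdout. Here's a hedged sketch of how that selection might look; the flags below are real for each tool, but VoxPilot's actual invocation may differ:

```typescript
// Sketch (not VoxPilot's exact code): pick the platform's CLI capture tool and
// build arguments for 16 kHz, 16-bit little-endian, mono PCM streamed to stdout.
export function buildCaptureCommand(platform: string): [string, string[]] {
  switch (platform) {
    case 'linux':  // arecord: raw signed 16-bit LE, 16 kHz, mono
      return ['arecord', ['-f', 'S16_LE', '-r', '16000', '-c', '1', '-t', 'raw']];
    case 'darwin': // sox: read the default input device, emit raw PCM to stdout
      return ['sox', ['-d', '-t', 'raw', '-b', '16', '-e', 'signed', '-r', '16000', '-c', '1', '-']];
    default:       // windows: ffmpeg with a DirectShow input, raw PCM to stdout
      return ['ffmpeg', ['-f', 'dshow', '-i', 'audio=default', '-f', 's16le', '-ar', '16000', '-ac', '1', '-']];
  }
}

// In the extension, something like:
//   const [cmd, args] = buildCaptureCommand(process.platform);
//   const proc = spawn(cmd, args);
//   proc.stdout.on('data', chunk => /* feed the VAD */ {});
```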
Voice Activity Detection
An energy‑based VAD detects when you start and stop speaking, so you don’t need to press a button—just talk.
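An energy-based VAD is simple enough to sketch: compute RMS energy per frame and flip a speaking/silent state only after several consecutive frames cross the threshold, so pops and brief pauses don't toggle recording. This is my reconstruction of the idea, not VoxPilot's exact code, and the threshold and hangover values are illustrative:

```typescript
// Minimal energy-based VAD sketch over 20 ms frames of 16 kHz 16-bit PCM.
const FRAME_SAMPLES = 320; // 20 ms at 16 kHz

export function rmsEnergy(frame: Int16Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    const s = frame[i] / 32768; // normalize int16 to [-1, 1)
    sum += s * s;
  }
  return Math.sqrt(sum / frame.length);
}

// Hysteresis: require `hangover` consecutive contrary frames before flipping
// state, so a single loud click or short pause doesn't start/stop capture.
export class EnergyVad {
  private active = false;
  private run = 0;
  constructor(private threshold = 0.02, private hangover = 5) {}

  // Returns true while speech is considered active.
  push(frame: Int16Array): boolean {
    const loud = rmsEnergy(frame) > this.threshold;
    this.run = loud === this.active ? 0 : this.run + 1;
    if (this.run >= this.hangover) {
      this.active = loud;
      this.run = 0;
    }
    return this.active;
  }
}
```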
Transcription
Moonshine’s encoder‑decoder architecture processes the audio through ONNX Runtime:
- Tiny model (27 MB): fast for short commands.
- Base model (65 MB): better for longer dictation.
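Before inference, the captured 16-bit PCM has to become the float tensor the ONNX model consumes. A sketch of that conversion, with the inference call shown only in outline (the tensor name and shape are my assumptions, not confirmed against VoxPilot or Moonshine's published graphs):

```typescript
// Preprocessing sketch: normalize raw int16 PCM into float32 samples in
// [-1, 1), the usual input range for speech models operating on raw audio.
export function pcmToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768; // scale int16 range down to [-1, 1)
  }
  return out;
}

// With onnxruntime-node the inference step would look roughly like this
// (input name 'input' and shape [1, n] are assumptions):
//   const session = await ort.InferenceSession.create('moonshine-tiny.onnx');
//   const audio = new ort.Tensor('float32', pcmToFloat32(buf), [1, buf.length]);
//   const result = await session.run({ input: audio });
```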
Delivery
The transcript is sent to VS Code’s Chat API, targeting whatever participant you’ve configured (Copilot, Continue, etc.).
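The delivery step can be sketched as building a chat invocation and handing it to VS Code. The `workbench.action.chat.open` command accepts a query string in recent VS Code versions, and prefixing with `@participant` routes it; treat the exact contract, and the setting name below, as assumptions:

```typescript
// Hypothetical sketch of the delivery step: format the transcript for the
// configured chat participant and return the command to execute.
export function buildChatInvocation(transcript: string, participant?: string) {
  const query = participant ? `@${participant} ${transcript}` : transcript;
  return { command: 'workbench.action.chat.open', args: { query } };
}

// In the extension host ('voxpilot.participant' is a hypothetical setting name):
//   const inv = buildChatInvocation(text, config.get('voxpilot.participant'));
//   await vscode.commands.executeCommand(inv.command, inv.args);
```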
Microphone → PCM Audio → Voice Activity Detection → Moonshine ASR → Text → VS Code Chat
Privacy
Voice data is sensitive, so VoxPilot processes everything in‑memory and never writes audio to disk or sends it over the network. This privacy‑first approach was non‑negotiable.
Links
- Open VSX:
- GitHub:
MIT licensed. PRs welcome. ⭐️ Star the repo if it’s useful.