I Built a Voice-to-Code VS Code Extension That Runs Entirely On-Device

Published: February 28, 2026 at 04:50 PM EST
Source: Dev.to

AI coding assistants such as GitHub Copilot, Continue, and Kiro all expect you to type your prompts. But what if you could just talk? That’s why I built VoxPilot.

Developers spend a lot of time typing prompts like “refactor this function to use async/await with proper error handling and add unit tests.” That’s about 15 seconds of typing for something that could be said in 3 seconds. For those with RSI or carpal tunnel syndrome, typing isn’t just slow, it’s painful.

VoxPilot is a VS Code extension that captures your voice, transcribes it locally using Moonshine ASR, and sends the resulting text to your coding assistant. The key word is locally: your audio never leaves your machine. There are no API keys, no cloud calls, and no telemetry. The smallest ASR model is only 27 MB, and inference runs via ONNX Runtime.

How VoxPilot Works

Audio Capture

Native CLI tools capture raw PCM audio at 16 kHz:

  • Linux: arecord
  • macOS: sox
  • Windows: ffmpeg
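
As a rough illustration of the capture step, the sketch below maps each platform to a recorder command that writes headerless 16-bit mono PCM at 16 kHz to stdout. The helper name `captureCommand`, the exact flags, and the Windows device name are assumptions for illustration; the post does not say which arguments VoxPilot actually passes.

```typescript
// Hypothetical helper, not VoxPilot's actual code: picks a recorder command
// that streams raw 16-bit little-endian mono PCM at 16 kHz to stdout.
type Platform = "linux" | "darwin" | "win32";

interface CaptureCommand {
  cmd: string;
  args: string[];
}

const SAMPLE_RATE = 16000;

function captureCommand(platform: Platform, windowsDevice: string = "Microphone"): CaptureCommand {
  const rate = String(SAMPLE_RATE);
  switch (platform) {
    case "linux":
      // arecord: quiet, signed 16-bit LE, mono, raw output; no filename means stdout
      return { cmd: "arecord", args: ["-q", "-f", "S16_LE", "-r", rate, "-c", "1", "-t", "raw"] };
    case "darwin":
      // sox: -d reads the default input device; the trailing "-" writes to stdout
      return {
        cmd: "sox",
        args: ["-q", "-d", "-t", "raw", "-r", rate, "-e", "signed-integer", "-b", "16", "-c", "1", "-"],
      };
    case "win32":
      // ffmpeg: DirectShow input; the device name is machine-specific (assumed here)
      return {
        cmd: "ffmpeg",
        args: ["-loglevel", "quiet", "-f", "dshow", "-i", `audio=${windowsDevice}`,
               "-ar", rate, "-ac", "1", "-f", "s16le", "-"],
      };
  }
}
```

In an extension, the result could be handed to Node’s `child_process.spawn(cmd, args)` and the PCM bytes read from the child’s stdout stream.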

Voice Activity Detection

An energy‑based VAD detects when you start and stop speaking, so you don’t need to press a button. You just talk.
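
To make the idea concrete, here is a minimal sketch of an energy-based VAD: it computes the RMS energy of each audio frame, reports the start of speech when a frame crosses a threshold, and reports the end after a run of quiet frames. The class name `EnergyVad`, the threshold of 0.02, and the 15-frame hangover are illustrative assumptions, not values taken from VoxPilot.

```typescript
// Root-mean-square energy of one frame of samples in [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

interface VadOptions {
  threshold: number;      // RMS level at or above which a frame counts as speech (assumed)
  hangoverFrames: number; // consecutive quiet frames required to end an utterance (assumed)
}

type VadEvent = "speech_start" | "speech_end" | null;

// Hypothetical sketch of an energy-based VAD state machine.
class EnergyVad {
  private opts: VadOptions;
  private speaking = false;
  private quietFrames = 0;

  constructor(opts: VadOptions = { threshold: 0.02, hangoverFrames: 15 }) {
    this.opts = opts;
  }

  // Feed one frame; returns an event only when the speaking state changes.
  process(frame: Float32Array): VadEvent {
    const loud = rms(frame) >= this.opts.threshold;
    if (!this.speaking) {
      if (!loud) return null;
      this.speaking = true;
      this.quietFrames = 0;
      return "speech_start";
    }
    if (loud) {
      this.quietFrames = 0; // speech continues, reset the silence counter
      return null;
    }
    this.quietFrames++;
    if (this.quietFrames < this.opts.hangoverFrames) return null;
    this.speaking = false;
    this.quietFrames = 0;
    return "speech_end";
  }
}
```

The hangover counter keeps short pauses between words from cutting an utterance in two. A fixed threshold is the simplest choice; a noisy room would likely need one that adapts to the background level.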

Transcription

Moonshine’s encoder‑decoder model transcribes the audio through ONNX Runtime. Two model sizes are available:

  • Tiny model (27 MB): fast for short commands.
  • Base model (65 MB): better for longer dictation.
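
Before the captured bytes can go to a model, they usually need converting from 16-bit integers to floating-point samples. The sketch below shows that preprocessing step, assuming the model consumes a float waveform in the range [-1, 1); the post does not describe VoxPilot’s preprocessing, and the function names are hypothetical.

```typescript
// Convert raw little-endian 16-bit PCM bytes to Float32 samples in [-1, 1).
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const sampleCount = Math.floor(bytes.byteLength / 2); // ignore a trailing odd byte
  const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return out;
}

// Length of a clip in seconds at the given sample rate.
function durationSeconds(samples: Float32Array, sampleRate: number = 16000): number {
  return samples.length / sampleRate;
}
```

With onnxruntime-node, the result could then be wrapped as `new ort.Tensor("float32", samples, [1, samples.length])` and passed to a session created with `ort.InferenceSession.create(modelPath)`. The input names and shapes the Moonshine export expects are not given in the post, so they are left out here.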

Delivery

The transcript is sent to VS Code’s Chat API, targeting whatever participant you’ve configured (Copilot, Continue, etc.).
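
One plausible way to do this hand-off is to open the chat view with the transcript as the query through the built-in `workbench.action.chat.open` command. The sketch below assumes that route; the post does not say which API VoxPilot calls, and the helper names and the `workspace` participant in the usage note are illustrative. The command executor is passed in as a parameter so the logic can run without the `vscode` module.

```typescript
// Shape compatible with vscode.commands.executeCommand; injected so this
// sketch can run outside the extension host.
type ExecuteCommand = (command: string, ...args: unknown[]) => PromiseLike<unknown>;

// Prefix the transcript with "@participant" when one is configured.
function buildChatQuery(transcript: string, participant?: string): string {
  const text = transcript.trim();
  return participant ? `@${participant} ${text}` : text;
}

// Hypothetical delivery step: opens the chat view with the transcript as the query.
// Empty or whitespace-only transcripts are dropped and return undefined.
function deliverTranscript(
  exec: ExecuteCommand,
  transcript: string,
  participant?: string
): PromiseLike<unknown> | undefined {
  if (transcript.trim().length === 0) return undefined;
  const query = buildChatQuery(transcript, participant);
  return exec("workbench.action.chat.open", { query });
}
```

Inside an extension, the call might look like `deliverTranscript(vscode.commands.executeCommand, text, "workspace")`.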

Microphone → PCM Audio → Voice Activity Detection → Moonshine ASR → Text → VS Code Chat

Privacy

Voice data is sensitive, so VoxPilot processes everything in‑memory and never writes audio to disk or sends it over the network. This privacy‑first approach was non‑negotiable.

  • Open VSX:
  • GitHub:

MIT licensed. PRs welcome. ⭐️ Star the repo if it’s useful.
