I Built a Voice-to-Code VS Code Extension That Runs Entirely On-Device
Source: Dev.to
Every AI coding assistant requires typing. GitHub Copilot, Continue, Kiro — they all expect you to type your prompts. But what if you could just talk? That’s why I built VoxPilot.
Developers often spend time typing prompts like “refactor this function to use async/await with proper error handling and add unit tests.” That’s about 15 seconds of typing for something that could be said in 3 seconds. For those with RSI or carpal tunnel, typing isn’t just slow—it’s painful.
VoxPilot is a VS Code extension that captures your voice, transcribes it locally using Moonshine ASR, and sends the resulting text to your coding assistant. The key word is locally: your audio never leaves your machine. There are no API keys, no cloud calls, and no telemetry. The ASR model is only 27 MB and runs via ONNX Runtime.
How VoxPilot Works
Audio Capture
Native CLI tools capture raw PCM audio at 16 kHz:
- Linux: arecord
- macOS: sox
- Windows: ffmpeg
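The capture step boils down to spawning the right tool with flags for 16 kHz, 16-bit, mono raw PCM on stdout. Here's a hedged sketch of how that selection might look; the flags below are real for each tool, but VoxPilot's actual invocation may differ:

```typescript
// Sketch (not VoxPilot's exact code): pick the platform's CLI capture tool and
// build arguments for 16 kHz, 16-bit little-endian, mono PCM streamed to stdout.
export function buildCaptureCommand(platform: string): [string, string[]] {
  switch (platform) {
    case 'linux':  // arecord: raw signed 16-bit LE, 16 kHz, mono
      return ['arecord', ['-f', 'S16_LE', '-r', '16000', '-c', '1', '-t', 'raw']];
    case 'darwin': // sox: read the default input device, emit raw PCM to stdout
      return ['sox', ['-d', '-t', 'raw', '-b', '16', '-e', 'signed', '-r', '16000', '-c', '1', '-']];
    default:       // windows: ffmpeg with a DirectShow input, raw PCM to stdout
      return ['ffmpeg', ['-f', 'dshow', '-i', 'audio=default', '-f', 's16le', '-ar', '16000', '-ac', '1', '-']];
  }
}

// In the extension, something like:
//   const [cmd, args] = buildCaptureCommand(process.platform);
//   const proc = spawn(cmd, args);
//   proc.stdout.on('data', chunk => /* feed the VAD */ {});
```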
Voice Activity Detection
An energy‑based VAD detects when you start and stop speaking, so you don’t need to press a button—just talk.
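An energy-based VAD is simple enough to sketch: compute RMS energy per frame and flip a speaking/silent state only after several consecutive frames cross the threshold, so pops and brief pauses don't toggle recording. This is my reconstruction of the idea, not VoxPilot's exact code, and the threshold and hangover values are illustrative:

```typescript
// Minimal energy-based VAD sketch over 20 ms frames of 16 kHz 16-bit PCM.
const FRAME_SAMPLES = 320; // 20 ms at 16 kHz

export function rmsEnergy(frame: Int16Array): number {
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    const s = frame[i] / 32768; // normalize int16 to [-1, 1)
    sum += s * s;
  }
  return Math.sqrt(sum / frame.length);
}

// Hysteresis: require `hangover` consecutive contrary frames before flipping
// state, so a single loud click or short pause doesn't start/stop capture.
export class EnergyVad {
  private active = false;
  private run = 0;
  constructor(private threshold = 0.02, private hangover = 5) {}

  // Returns true while speech is considered active.
  push(frame: Int16Array): boolean {
    const loud = rmsEnergy(frame) > this.threshold;
    this.run = loud === this.active ? 0 : this.run + 1;
    if (this.run >= this.hangover) {
      this.active = loud;
      this.run = 0;
    }
    return this.active;
  }
}
```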
Transcription
Moonshine’s encoder‑decoder architecture processes the audio through ONNX Runtime:
- Tiny model (27 MB): fast for short commands.
- Base model (65 MB): better for longer dictation.
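Before inference, the captured 16-bit PCM has to become the float tensor the ONNX model consumes. A sketch of that conversion, with the inference call shown only in outline (the tensor name and shape are my assumptions, not confirmed against VoxPilot or Moonshine's published graphs):

```typescript
// Preprocessing sketch: normalize raw int16 PCM into float32 samples in
// [-1, 1), the usual input range for speech models operating on raw audio.
export function pcmToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768; // scale int16 range down to [-1, 1)
  }
  return out;
}

// With onnxruntime-node the inference step would look roughly like this
// (input name 'input' and shape [1, n] are assumptions):
//   const session = await ort.InferenceSession.create('moonshine-tiny.onnx');
//   const audio = new ort.Tensor('float32', pcmToFloat32(buf), [1, buf.length]);
//   const result = await session.run({ input: audio });
```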
Delivery
The transcript is sent to VS Code’s Chat API, targeting whatever participant you’ve configured (Copilot, Continue, etc.).
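The delivery step can be sketched as building a chat invocation and handing it to VS Code. The `workbench.action.chat.open` command accepts a query string in recent VS Code versions, and prefixing with `@participant` routes it; treat the exact contract, and the setting name below, as assumptions:

```typescript
// Hypothetical sketch of the delivery step: format the transcript for the
// configured chat participant and return the command to execute.
export function buildChatInvocation(transcript: string, participant?: string) {
  const query = participant ? `@${participant} ${transcript}` : transcript;
  return { command: 'workbench.action.chat.open', args: { query } };
}

// In the extension host ('voxpilot.participant' is a hypothetical setting name):
//   const inv = buildChatInvocation(text, config.get('voxpilot.participant'));
//   await vscode.commands.executeCommand(inv.command, inv.args);
```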
Microphone → PCM Audio → Voice Activity Detection → Moonshine ASR → Text → VS Code Chat
Privacy
Voice data is sensitive, so VoxPilot processes everything in‑memory and never writes audio to disk or sends it over the network. This privacy‑first approach was non‑negotiable.
Links
- Open VSX:
- GitHub:
MIT licensed. PRs welcome. ⭐️ Star the repo if it’s useful.