How an LLM Can Fix Your Posture
Source: Dev.to
The problem
I’m a system engineer running a home server with dozens of services, AI agents, and dashboards. I spend 5–7 hours a day at my workstation after my full‑time job. Most of that time goes to typing: commands, prompts, messages, notes.
My hands get tired. My back hurts from hunching over the keyboard. And the worst part: typing is the bottleneck between thinking and doing.
I wanted to give instructions the way I’d talk to a colleague—by speaking.
How it actually works
The solution turned out to be embarrassingly simple:
- Android app sends recognized text over Wi‑Fi to my workstation.
- Workstation service receives the text and types it at the current cursor position.
That’s it. No cloud, no server‑side processing, no Whisper.
Key insight: Android’s built‑in speech recognition is better than anything I tried.
I experimented with Whisper (multiple model sizes), Faster Whisper, Vosk, and several other libraries. They all had problems:
- Whisper‑small was too slow on CPU (3–4 s per utterance).
- Whisper‑medium ate 4 GB of RAM and was still slower than real‑time.
- Faster Whisper improved speed but accuracy with mixed Russian/English was poor.
- Vosk worked offline but the models were huge and recognition quality was inconsistent.
Android’s native speech‑to‑text just works. It’s fast, accurate, runs on the phone’s hardware, and handles language switching naturally. Google has spent billions optimizing on‑device recognition; I can’t compete with that on a single server.
The workflow
My phone sits on the desk next to me. When I want to “type” something:
- Open the app (or it’s already open).
- Speak naturally; text appears in real‑time on the phone screen.
- The text is transmitted over Wi‑Fi to my workstation.
- It is inserted wherever my cursor is: terminal, browser, IDE, chat.
- I hit Enter (on the phone or keyboard).
Language switching: Android auto‑detects language from phonemes. I use three languages daily—English, Russian, Ukrainian—and it switches between them naturally.
What changed
My productivity increased dramatically. Tasks built around writing—prompts, commit messages, documentation—took about a third of the time they used to. The bottleneck shifted from typing to thinking, which is where it should be.
The physical change was even bigger. I have a motorized standing desk. Before voice input I rarely used the standing position, because typing while standing is uncomfortable: wrists at a weird angle, keyboard too low or too high. Now I work standing half the day, just talking.
The irony: as a system engineer, I fixed my posture not with ergonomics advice but by building a voice tool.
Technical details
Android app: Kotlin, uses Android’s SpeechRecognizer API. Connects to the workstation via WebSocket over the local network. Sends recognized text as plain‑string messages. The app stays in the foreground with a persistent notification so Android doesn’t kill the WebSocket connection.
Workstation service: Lightweight Python process (~80 lines). Receives WebSocket messages and uses xdotool (Linux) to type the text at the current cursor position. Simulates keyboard input at the OS level, so it works with any application.
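The author's ~80-line service isn't published, so here is a minimal sketch of the receiving side. Two assumptions for brevity: it uses a plain TCP line protocol instead of WebSocket (the idea is identical), and the port number 8765 is invented. The xdotool invocation is real and is the core of the trick:

```python
import asyncio
import subprocess

def xdotool_cmd(text: str) -> list[str]:
    # "--" stops option parsing, so dictated text starting with "-" is safe;
    # --clearmodifiers avoids stuck Ctrl/Shift states while typing;
    # --delay 0 types as fast as the X server accepts events.
    return ["xdotool", "type", "--clearmodifiers", "--delay", "0", "--", text]

def type_text(text: str) -> None:
    # Simulates keystrokes at the OS level (X11 only), so it works
    # in any focused application: terminal, browser, IDE, chat.
    subprocess.run(xdotool_cmd(text), check=True)

async def handle(reader: asyncio.StreamReader,
                 writer: asyncio.StreamWriter) -> None:
    # One line of recognized speech per message.
    async for raw in reader:
        text = raw.decode("utf-8").rstrip("\n")
        if text:
            type_text(text)
    writer.close()

async def main() -> None:
    server = await asyncio.start_server(handle, "0.0.0.0", 8765)
    async with server:
        await server.serve_forever()

# Run with: asyncio.run(main())
```

Keeping the text-to-keystrokes step in a pure helper like `xdotool_cmd` makes it easy to swap in `wtype` or `ydotool` on Wayland, where `xdotool` does not work.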
Network: Pure local Wi‑Fi. Phone and workstation on the same network. Latency under 50 ms. No internet required. Total round‑trip from speech end to text appearing on screen is about 200 ms.
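The ~200 ms figure above is end to end, including recognition; the network hop itself is a small fraction of it. A quick way to sanity-check the transport contribution is a loopback echo like the sketch below. Note this measures localhost, not Wi‑Fi, so real numbers over the air will be higher:

```python
import asyncio
import time

async def echo(reader: asyncio.StreamReader,
               writer: asyncio.StreamWriter) -> None:
    # Echo one line back to the sender, then hang up.
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()

async def measure_rtt() -> float:
    # Start a throwaway echo server on a random free port.
    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    t0 = time.monotonic()
    writer.write(b"ping\n")
    await writer.drain()
    await reader.readline()          # wait for the echo
    rtt = time.monotonic() - t0

    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return rtt

rtt = asyncio.run(measure_rtt())
print(f"loopback round-trip: {rtt * 1000:.2f} ms")
```

Running the same client against the workstation from the phone's network position gives the real Wi‑Fi number.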
What I use it for daily
- Talking to Claude – about 60 % of all voice input (dictating prompts, describing bugs, giving instructions).
- Writing notes and worklogs – I used to skip them because they felt tedious; now I just say what I did.
- Git commit messages – commits are longer and more descriptive since I stopped typing them.
- Slack and Telegram messages – faster than thumb‑typing on a phone.
- Documentation – like this article.
What doesn’t work great
- Code – I don’t dictate code (variable names, brackets, indentation); voice is terrible for this. But I haven’t written code manually in three months either: I dictate the intent, and Claude writes the code.
- Noisy environments – Works great in my home office but drops accuracy significantly with background noise.
- Technical terms – When I say “xdotool” or “kubectl”, Android has no idea what I mean. I keep a dictionary of corrections for frequently misheard terms, but commands like these I still just type.
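A correction table like the one mentioned can be a few lines of Python. The misrecognitions below are hypothetical examples, not the author's actual dictionary:

```python
import re

# Hypothetical speech-to-text misrecognitions -> intended technical terms.
CORRECTIONS = {
    "x do tool": "xdotool",
    "cube control": "kubectl",
    "cube cuddle": "kubectl",
}

def apply_corrections(text: str, table: dict[str, str] = CORRECTIONS) -> str:
    # Case-insensitive literal replacement of each known misrecognition.
    for wrong, right in table.items():
        text = re.sub(re.escape(wrong), right, text, flags=re.IGNORECASE)
    return text
```

Applied on the workstation just before the text is typed, this keeps the phone app dumb and the fix list in one editable place.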
Why local‑only matters
No API keys or prompts leave my network. No subscription. No account dependency. The entire system lives on my server—I own the data, the latency, the uptime.
Was it worth building?
It took a weekend to build the first working version. Three months later, I use it every single day.
Total cost: one weekend of coding, zero ongoing costs. The phone I already had. The Wi‑Fi network I already had. Android’s speech recognition is free.
Sometimes the most impactful tool isn’t the most complex one. It’s the one that removes friction from what you already do hundreds of times a day.
I type less. I think more. I stand up.
Originally published on klymentiev.com