I Made a Voice Agent That Plans 5Ks - Like a Runner
Source: Dev.to
![BigTree cover image](https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc2r3bexhi5rw5r28njc.png)
If you’ve ever asked for directions in the Caribbean, you already know the genesis of the name of my app, **BigTree**. I’m 100% sure it’s not just us, but directions here are “turn by the next big frangipani tree, and come straight down the road.” I’ve yet to meet anyone who says, “turn at Latitude 51.5151° N, Longitude 0.2185° W,” lol.
It’s:
> “Go down the road, turn left at the big tree, pass the painted stone, and it’s right there.”
That’s how we think. That’s how we communicate. That’s how we move.
---
### The Problem
As a runner **and** a developer building **[GoodFinish](https://goodfinish.com/)** – a race‑management platform built specifically for small, grassroots Race Directors (50–200 runners, not corporate mega‑events) – I kept running into the same friction point for race organizers:
> **Mapping the route was the most painful part of the process.**
> Most tools force you into slow, tedious point‑and‑click plotting.
> That’s not how we describe routes, and it’s definitely not how small‑town RDs think.
---
## What I Built with Google Gemini
I built **BigTree** — a conversational, voice‑activated route‑design add‑on for GoodFinish. It also works as a standalone app. Instead of clicking 200 points on a map, you just talk with it. Because it uses **Gemini 3.1 Pro** and the **Google Maps API** on the backend, it *should* be pretty accurate. It isn’t perfect yet, but give it a try and see the distances it gives you.
**Try it out for yourself:**
[BigTree Routes](https://ai.studio/apps/80064e56-1326-47ce-949b-3051f2640937)
You can say:
> “Start at the local park in **MY CITY**. Map a 5 km heading north toward the beach and loop back.”
BigTree listens, responds, suggests improvements, draws the polyline live on the map, and instantly generates a downloadable, industry‑standard **GPX** file.
*No GIS headaches. No technical friction. Just describe it the way you would describe it to a friend.*
---
## What Role Did Gemini Play?
Gemini isn’t a feature in BigTree – it’s the brain. Here’s how the architecture breaks down:
### 1️⃣ Real‑Time Voice Interface
**Gemini 2.5 Native Audio (Live API)** powers the live conversation.
- Listens to route descriptions in real‑time.
- Talks back (Zephyr voice) to confirm distances.
- Suggests route alternatives.
- Warns about disconnected roads.
- Allows interruptions mid‑conversation.
Latency is low enough that it feels natural – not “AI‑ish,” just fluid.
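For reference, a session setup along these lines is what I’d sketch for that voice loop. The model id, config field names, and system prompt below are my assumptions against the `@google/genai` SDK, not BigTree’s actual code – check the current docs before copying:

```typescript
// Sketch of a Live API session config for a voice route-planner.
// ASSUMPTIONS: model id and config field names follow the @google/genai
// SDK's live.connect() shape; verify against the current docs.
const liveConfig = {
  model: "gemini-2.5-flash-native-audio-preview", // assumed native-audio Live model id
  config: {
    responseModalities: ["AUDIO"],
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: "Zephyr" } }, // the voice mentioned above
    },
    systemInstruction:
      "You are a route-planning assistant. Confirm distances, suggest " +
      "alternatives, and warn about disconnected roads.",
  },
};

// Connecting would then look roughly like:
//   const ai = new GoogleGenAI({ apiKey });
//   const session = await ai.live.connect({ ...liveConfig, callbacks });
```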
### 2️⃣ Spatial Reasoning Engine
**Gemini 3.1 Pro** handles the heavy geospatial thinking.
Using function calling, the Live API passes a structured intent to Gemini 3.1 Pro, which translates natural language (including landmark‑based Caribbean‑style directions) into:
- Exact latitude/longitude coordinate arrays
- Smooth polylines
- Raw GPX XML
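To make the GPX piece concrete, here’s a minimal sketch of what “coordinate array → GPX XML” can look like. This is my illustration of the output format, not BigTree’s actual generator:

```typescript
type Point = { lat: number; lon: number };

// Minimal GPX 1.1 track writer: turns a coordinate array (the kind of
// output described above) into a downloadable track file.
function toGpx(name: string, points: Point[]): string {
  const trkpts = points
    .map(p => `      <trkpt lat="${p.lat.toFixed(6)}" lon="${p.lon.toFixed(6)}"></trkpt>`)
    .join("\n");
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<gpx version="1.1" creator="BigTree" xmlns="http://www.topografix.com/GPX/1/1">`,
    `  <trk>`,
    `    <name>${name}</name>`,
    `    <trkseg>`,
    trkpts,
    `    </trkseg>`,
    `  </trk>`,
    `</gpx>`,
  ].join("\n");
}
```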
Essentially, I turned an LLM into a geospatial engine. That part was fascinating.
### 3️⃣ Real‑World Context & Search
**Gemini 2.5 Flash + Maps Grounding**
- Finds real‑world landmarks.
- Handles requests like: “Route us past a good coffee shop at mile 2.”
**Gemini 3 Flash Preview + Search Grounding**
- Pulls real‑time data such as weather conditions for race day and live environmental context.
---
## What I Learned
### Technical Lessons
#### Streaming Audio with WebSockets
Integrating the Live API forced me deep into:
- Web Audio API
- PCM‑16 audio streaming
- Script‑processor nodes
- Raw audio‑chunk transmission over WebSockets
Real‑time voice is not trivial, but once it works it’s game‑changing.
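The PCM‑16 step is the one that bit me most, so here is the core conversion as a sketch: the Web Audio API hands you Float32 samples in [-1, 1], and the Live API expects 16‑bit integers.

```typescript
// Convert Float32 samples (e.g. from an AudioBuffer's getChannelData(0))
// into 16-bit PCM for streaming over the WebSocket.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp to the valid range
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;      // scale negatives/positives to int16
  }
  return out;
}
```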
#### Spatial Prompt Engineering
Getting an LLM to output:
- Strict JSON
- Clean coordinate arrays
- Valid GPX XML
- Smooth, realistic route curves
requires extremely disciplined prompting. You can’t “kind of” structure it – it must be deterministic enough for production.
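One approach that helps beyond prompt discipline is asking the API for structured output directly. The top-level field names below follow the Gemini API’s JSON-mode config (`responseMimeType` / `responseSchema`); the schema itself is a hypothetical shape for a route response, not BigTree’s real contract:

```typescript
// Request config sketch: force JSON output with a declared schema instead
// of relying on the prompt alone. Schema shape is illustrative.
const routeRequestConfig = {
  responseMimeType: "application/json",
  responseSchema: {
    type: "object",
    properties: {
      name: { type: "string" },
      distanceKm: { type: "number" },
      coordinates: {
        type: "array",
        items: {
          type: "object",
          properties: { lat: { type: "number" }, lon: { type: "number" } },
          required: ["lat", "lon"],
        },
      },
    },
    required: ["name", "coordinates"],
  },
};
```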
### Unexpected Insight
Voice might be the ultimate UI for mapping. I didn’t realize how much friction traditional map tools create until I removed the mouse. When you can just speak and watch the route draw itself, it feels like magic.
More importantly, it **lowers the barrier for small, community Race Directors** who just want to host a great 5 km – not learn GIS software. That matters to me because GoodFinish was never about enterprise race timing; it’s about empowering the grassroots.
---
## What Worked Well
- **Live API latency & voice quality** – surprisingly natural.
- **Function‑calling reliability** – context flowed cleanly from voice session to backend route generation.
- The model clearly understood the difference between:
- “Describe a route”
- “Ask a general question”
That separation was impressive.
## Where I Hit Friction
- **Audio Buffer Management** – Capturing mic input → converting to exact PCM format → decoding returned audio streams was not plug‑and‑play. An out‑of‑the‑box abstraction or SDK utility for browser audio contexts would be incredible.
- **Strict JSON Output** – Occasionally, large GPX responses came back wrapped in markdown code fences, e.g.:

  ```json
  {
    ...
  }
  ```

  This broke `JSON.parse()` instantly. I had to implement backend sanitization to guarantee pipeline stability.
- **LLM Hallucinations** – The biggest one, of course, is that this is still an LLM 😆. You can get route hallucinations between different points on the map. Production AI requires guardrails.
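The fence problem above is fixable with a few lines of backend sanitization. Here’s a sketch of the kind of stripper I mean (not BigTree’s exact code):

```typescript
// Strip markdown code fences the model sometimes wraps around large
// JSON/GPX payloads, so JSON.parse() receives clean input.
function stripCodeFences(raw: string): string {
  const trimmed = raw.trim();
  const match = trimmed.match(/^```[a-zA-Z]*\s*\n?([\s\S]*?)\n?```$/);
  return match ? match[1].trim() : trimmed;
}
```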
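And for the hallucination point, one cheap guardrail is a geometric sanity check: reject routes whose consecutive points jump an implausible distance. The 500 m threshold here is arbitrary, purely for illustration:

```typescript
// Haversine great-circle distance between two lat/lon points, in meters.
function haversineMeters(a: { lat: number; lon: number }, b: { lat: number; lon: number }): number {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Guardrail: flag a model-generated route if any consecutive pair of
// points is further apart than maxGapM (a common hallucination pattern).
function looksPlausible(points: { lat: number; lon: number }[], maxGapM = 500): boolean {
  for (let i = 1; i < points.length; i++) {
    if (haversineMeters(points[i - 1], points[i]) > maxGapM) return false;
  }
  return true;
}
```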
## Why This One Matters to Me
I build for:
- Small operators
- Grassroots events
- People who don’t have tech teams
- Communities where directions are still “turn by the big tree”
BigTree lets those communities create race routes as naturally as they give directions to a friend. It’s a small step toward making technology feel local again.
### Local Run Club
My local run club would benefit from this, and I think they’d love it.
BigTree feels like one of those moments where AI stops being hype and becomes utility.
And we’re just getting started.
Tags: #gemini #google #webdev #running #buildinpublic