# How I Built a Real-Time Hand Tracking Platform That Runs Entirely in the Browser
Source: Dev.to
## The Problem with Traditional Hand Tracking
Building real-time hand tracking the traditional way means:
- Installing Python, OpenCV, and a dozen other dependencies
- Asking users to trust a native‑app download
- Wrestling with webcam permissions across different OSes
- Spinning up servers just to process hand movements
- Watching latency kill the “real‑time” experience
## A Browser-Only Solution
What if hand tracking just worked in a browser tab?
```javascript
// Detect pinch gesture
const distance = Math.sqrt(
  Math.pow(thumbTip.x - indexTip.x, 2) +
  Math.pow(thumbTip.y - indexTip.y, 2)
);
const angle = Math.min(distance * 500, 90);
```
That’s real code mapping a pinch gesture to control animations. MediaPipe handles detection, Three.js renders the visuals.
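The snippet assumes `thumbTip` and `indexTip` are already in scope from a detection callback. A self-contained version (the helper name `pinchAngle` is my own, not from the article) might look like:

```javascript
// Map the thumb–index distance to an animation angle in [0, 90].
// MediaPipe landmarks are normalized to [0, 1], so raw distances are small;
// the 500× scale factor is taken from the article's snippet.
function pinchAngle(thumbTip, indexTip) {
  const distance = Math.hypot(
    thumbTip.x - indexTip.x,
    thumbTip.y - indexTip.y
  );
  return Math.min(distance * 500, 90);
}

// A fully closed pinch maps to 0°; a wide spread saturates at 90°.
console.log(pinchAngle({ x: 0.5, y: 0.5 }, { x: 0.5, y: 0.5 })); // 0
console.log(pinchAngle({ x: 0.1, y: 0.1 }, { x: 0.9, y: 0.9 })); // 90
```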
- Browser loads MediaPipe – No install, just a CDN import
- Webcam feeds landmarks – 21 hand points tracked in real‑time
- JavaScript maps gestures – Translate finger positions into actions
- Three.js renders output – Smooth 3D visualization at 60 fps
The entire pipeline runs client‑side. Your webcam feed never leaves your device.
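Raw landmarks jitter slightly from frame to frame, which shows up as shaky 3D visuals. One common fix (an illustrative technique, not necessarily what the article's pipeline uses) is exponential smoothing before handing positions to the renderer:

```javascript
// Exponentially smooth noisy landmark positions between frames.
// alpha near 1 tracks the hand quickly; alpha near 0 smooths harder.
// (Illustrative helper — not part of the article's actual pipeline.)
function makeSmoother(alpha = 0.5) {
  let prev = null;
  return function smooth(point) {
    if (prev === null) {
      prev = { ...point }; // first frame passes through unchanged
    } else {
      prev.x = alpha * point.x + (1 - alpha) * prev.x;
      prev.y = alpha * point.y + (1 - alpha) * prev.y;
    }
    return { ...prev };
  };
}

const smooth = makeSmoother(0.5);
smooth({ x: 0, y: 0 });              // first frame: {x: 0, y: 0}
console.log(smooth({ x: 1, y: 1 })); // moves halfway: {x: 0.5, y: 0.5}
```

At 60 fps this adds under a frame of perceived lag with alpha around 0.5, which is usually a good trade against visible jitter.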
## Browser SDK (zero install)
- Visit the Playground → Select an effect → Grant camera → Done
## Cloud API
```bash
curl -X POST https://visionpipe3d.quochuy.dev/api/detect \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "image=@hand.jpg"
```
New accounts get 100 free API credits—no credit card required.
## Browser SDK vs. Cloud API
| Feature | Browser SDK | Cloud API |
|---|---|---|
| Hand Landmark Detection | ✅ | ✅ |
| Real‑time Tracking | ✅ | ❌ (batch only) |
| Offline Support | ✅ (after load) | ❌ |
| Processing Location | Client‑side | Server‑side |
| Latency | ~16 ms | Network dependent |
| Privacy | Data stays local | Processed & discarded |
## Pricing
| Plan | Credits | Cost | Per Credit |
|---|---|---|---|
| Free | 100 | $0 | – |
| Starter | 1,000 | $10 | $0.010 |
| Growth | 5,000 | $40 | $0.008 |
| Scale | 20,000 | $150 | $0.0075 |
- 1 credit = 1 API call
- Credits never expire.
- The Playground is always free.
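As a quick sanity check on the table, the per-credit column is just plan price divided by credits (a throwaway calculation, not part of the product):

```javascript
// Verify the per-credit cost for each paid plan (USD / credits).
const plans = [
  { name: "Starter", credits: 1000, cost: 10 },
  { name: "Growth", credits: 5000, cost: 40 },
  { name: "Scale", credits: 20000, cost: 150 },
];

for (const plan of plans) {
  console.log(plan.name, (plan.cost / plan.credits).toFixed(4));
}
// Starter 0.0100, Growth 0.0080, Scale 0.0075
```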
## Getting Started
### Browser SDK
- Open the Playground URL.
- Grant camera access.
- Choose an effect and start detecting.
Zero friction—no Python, no installs, no “works on my machine”.
### Cloud API
```bash
# Get your API key from the console
curl -X POST https://visionpipe3d.quochuy.dev/api/detect \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "image=@sample.jpg"
```
The 100 free credits are enough to build a proof‑of‑concept.
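The same call works from JavaScript, assuming the endpoint accepts multipart form data exactly as the curl example does (the `buildDetectRequest` helper is my own sketch; I haven't checked the response shape against the docs):

```javascript
// Build the fetch options for the detect endpoint, mirroring the curl example.
// (A sketch with minimal error handling — assumptions noted in the lead-in.)
function buildDetectRequest(apiKey, imageBlob, filename = "hand.jpg") {
  const form = new FormData();
  form.append("image", imageBlob, filename);
  return {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  };
}

// Usage (Node 18+ or any modern browser):
// const res = await fetch(
//   "https://visionpipe3d.quochuy.dev/api/detect",
//   buildDetectRequest(process.env.API_KEY, imageBlob)
// );
// const result = await res.json();
```

Note that `fetch` sets the multipart `Content-Type` boundary automatically when the body is a `FormData`, so you should not set that header yourself.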
## Resources
- GitHub:
- Website:
- Documentation:
## Call to Action
What gesture‑based interfaces have you built or wanted to build? I’m curious about the use cases people are exploring beyond the typical “air guitar” demos.