From Keypoints to Measurements: Why Landmarks Alone Are Useless

Published: (June 10, 2026 at 05:49 PM EDT)
3 min read
Source: Dev.to

Source: Dev.to

Every hand-tracking demo shows you 21 dots. The interesting part is what nobody shows: turning dots into numbers someone can act on. Run any modern hand-tracking model and you get 21 beautifully stable landmarks per hand at 30 FPS. Impressive — and by itself, worthless. No client has ever paid for dots. They pay for measurements: is this clearance compliant, is this part aligned, did this patient’s range of motion improve. I learned this on utility infrastructure work, where the deliverable was never “we detected the wire” — it was the attachment height of that wire, and whether it violates clearance rules. Keypoints were step one of three. My portfolio’s keypoint demo derives three measurements per hand, every frame: const wrist = lm[0]; const palm = distance(wrist, lm[9]); // scale reference

const pinch = distance(lm[4], lm[8]) / palm; // thumb tip ↔ index tip

The crucial line is the scale reference. Pixel distances are meaningless — they change as you move toward the camera. Dividing by palm length (wrist to middle knuckle) gives a relative measurement that’s stable under distance, and multiplying by the average adult palm length (~8.5 cm) converts it into an approximate real-world gap — the demo shows ”≈ 3.2 cm” floating on the pinch line. In infrastructure work the same role is played by a known object dimension — a standard crossarm, a pole class height. Every measurement-from-pixels system needs its ruler.

Finger counting is a geometric test (is each fingertip farther from the wrist than its middle joint?), and “hand openness” averages fingertip extension — three lines of geometry each, but they convert a model output into a readout a human understands instantly. The landmarks come from MediaPipe’s pretrained pipeline (palm detector → landmark regressor → gesture classifier, float16, WASM + GPU delegate) — Google’s models, credited on the page. The engineering I own is the integration (lazy loading, render loop, throttled UI) and the measurement layer on top. Knowing when a pretrained model suffices and when you need to fine-tune your own — as I did for utility keypoints, where off-the-shelf models had never seen a crossarm — is most of the senior judgment in applied CV. Whatever your domain: landmark → scale reference → relative measurement → threshold → decision. That last hop, from number to decision, is where the business value lives. Models are increasingly commodities; measurement systems are not. Try it (pinch slowly and watch the number): rs-03.github.io/portfolio-website/demos Source: github.com/rs-03/portfolio-website

0 views
Back to Blog

Related posts

Read more »