[Paper] Monocular Markerless Motion Capture Enables Quantitative Assessment of Upper Extremity Reachable Workspace

Published: February 13, 2026 at 01:36 PM EST
4 min read
Source: arXiv – 2602.13176v1

Overview

A new study shows that a single, inexpensive webcam combined with AI‑driven markerless motion capture can accurately measure the Upper Extremity Reachable Workspace (UERW) – a standard clinical test of arm mobility. By validating this monocular setup against a gold‑standard marker‑based system, the authors demonstrate a low‑cost, easy‑to‑deploy alternative for clinicians and developers alike.

Key Contributions

  • First validation of a monocular (single‑camera) markerless motion‑capture pipeline for the UERW task.
  • Demonstrated high agreement (mean bias ≈ 0.6 % of reachable workspace) when the camera is placed directly in front of the participant.
  • Quantified the impact of camera angle, showing that an offset view underestimates workspace by ~5.7 %.
  • Provided an open‑source‑friendly workflow that can be reproduced with off‑the‑shelf FLIR (or similar) cameras and a pre‑trained AI pose estimator.
  • Highlighted a clinical workflow that integrates a VR headset for target presentation while capturing video from a single viewpoint.

Methodology

  1. Participants & Task – Nine healthy adults performed the standardized UERW assessment: reaching toward virtual targets arranged on a sphere centered on the torso, displayed through a VR headset.
  2. Data Capture
    • Reference: A full marker‑based motion‑capture rig (optical markers + multiple cameras) recorded 3‑D joint trajectories.
    • Test: Eight FLIR infrared cameras recorded the same session; the authors later selected two views for analysis: a frontal view (camera directly facing the participant) and an offset view (camera angled to the side).
  3. Monocular MMC Pipeline
    • Video from a single camera was fed into a state‑of‑the‑art AI pose estimator (e.g., OpenPose/MediaPipe) to extract 2‑D keypoints.
    • A calibrated perspective transformation lifted the 2‑D keypoints into a 3‑D workspace using the known camera intrinsics and the participant’s torso as a reference plane.
    • Reach percentages per octant of the virtual sphere were computed and compared to the marker‑based ground truth.
  4. Evaluation – Bias and standard deviation of the percentage of workspace reached were calculated for each camera configuration.
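The lifting step in the pipeline above can be sketched as a ray–plane intersection: each 2‑D keypoint defines a ray through the camera center, and the 3‑D point is taken where that ray meets the reference (torso) plane. The function below is a minimal illustration of that geometry, not the authors' implementation; the camera intrinsics and plane parameters are toy values.

```python
import numpy as np

def lift_keypoint(uv, K, plane_point, plane_normal):
    """Back-project a 2-D keypoint (pixel coords) to the 3-D point where
    its camera ray intersects a reference plane (e.g., the torso plane).

    uv           : (u, v) pixel coordinates of the keypoint
    K            : 3x3 camera intrinsics matrix
    plane_point  : a 3-D point on the plane, in the camera frame
    plane_normal : unit normal of the plane, in the camera frame
    """
    # Ray direction through the pixel (camera center is the origin)
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    # Solve for t such that (t * ray - plane_point) . normal == 0
    t = np.dot(plane_point, plane_normal) / np.dot(ray, plane_normal)
    return t * ray  # 3-D point in the camera frame

# Toy example: 1000 px focal length, principal point (640, 360),
# torso plane 2 m in front of the camera, facing it.
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
p3d = lift_keypoint((640.0, 360.0), K,
                    plane_point=np.array([0.0, 0.0, 2.0]),
                    plane_normal=np.array([0.0, 0.0, 1.0]))
print(np.round(p3d, 3))  # the principal ray hits the plane at (0, 0, 2)
```

Note that this simple scheme only pins down points on (or near) the reference plane; the depth ambiguity for strongly out‑of‑plane motion discussed under Limitations follows directly from it.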

Results & Findings

| Camera configuration | Mean bias (% of workspace) | Std. dev. |
| --- | --- | --- |
| Frontal (direct) | +0.61 % | ±0.12 % |
| Offset (angled) | −5.66 % | ±0.45 % |
  • The frontal view matched the gold standard almost perfectly, with errors well below 1 % of the total reachable volume.
  • The offset view systematically underestimated workspace, especially for targets behind the participant, confirming that camera placement is critical.
  • Qualitative inspection showed smooth, anatomically plausible joint trajectories from the AI estimator, even without markers.
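The evaluation metric itself is straightforward: compare the monocular per‑octant reach percentages against the marker‑based ones and summarize the differences by their mean (bias) and standard deviation. The snippet below sketches that comparison with made‑up per‑octant numbers (the `mocap` and `mono` arrays are illustrative, not the paper's data):

```python
import numpy as np

# Hypothetical percentage of targets reached in each of the 8 octants
# of the virtual sphere, for the marker-based reference and the
# monocular estimate (toy values, not the study's measurements).
mocap = np.array([95, 90, 88, 92, 60, 55, 50, 58], dtype=float)  # ground truth
mono  = np.array([95, 91, 88, 93, 61, 56, 50, 59], dtype=float)  # monocular

# Bias = mean per-octant difference; spread = sample standard deviation
diff = mono - mocap
print(f"mean bias = {diff.mean():+.2f} %, std = {diff.std(ddof=1):.2f} %")
```

A positive bias means the monocular pipeline slightly overestimates the reachable workspace; the paper's frontal‑view result (+0.61 %) and offset‑view result (−5.66 %) are exactly this statistic computed over the real data.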

Practical Implications

  • Clinics & Tele‑rehab – A single webcam (or even a smartphone) can replace costly multi‑camera rigs, enabling routine quantitative arm‑mobility assessments in outpatient settings or remote home‑based therapy.
  • Software Integration – Developers can embed the AI pose‑estimation pipeline into existing health‑tech platforms (e.g., EMR‑linked mobile apps) to automatically generate UERW scores after a short video capture.
  • VR‑augmented therapy – The study already uses a VR headset for target presentation; coupling this with a monocular camera creates a fully immersive, data‑rich rehab loop with minimal hardware.
  • Research & Data Collection – Large‑scale studies of upper‑limb function (e.g., post‑stroke, neuromuscular disease) become feasible without the logistical overhead of marker placement.
  • Cost Savings – Eliminating markers, multiple cameras, and specialized labs can cut setup costs by >90 %, making quantitative motion analysis accessible to community clinics and startups.

Limitations & Future Work

  • Sample Size & Population – Only nine unimpaired adults were tested; validation on patients with motor impairments (stroke, ALS, etc.) is still needed.
  • Depth Ambiguity – Monocular inference relies on a calibrated torso plane; extreme out‑of‑plane motions could degrade accuracy.
  • Camera Calibration – The pipeline assumes precise intrinsics; automated self‑calibration methods would improve robustness in real‑world deployments.
  • Extended Workspace – Future work should explore rear‑hemisphere coverage (e.g., using multiple frontal cameras or a rotating camera) to capture the full 3‑D workspace.

By addressing these points, the community can move toward truly ubiquitous, AI‑powered motion capture for clinical and consumer applications.

Authors

  • Seth Donahue
  • J. D. Peiffer
  • R. Tyler Richardson
  • Yishan Zhong
  • Shaun Q. Y. Tan
  • Benoit Marteau
  • Stephanie R. Russo
  • May D. Wang
  • R. James Cotton
  • Ross Chafetz

Paper Information

  • arXiv ID: 2602.13176v1
  • Categories: cs.CV
  • Published: February 13, 2026