[Paper] Monocular Markerless Motion Capture Enables Quantitative Assessment of Upper Extremity Reachable Workspace
Source: arXiv - 2602.13176v1
Overview
A new study shows that a single, inexpensive webcam combined with AI‑driven markerless motion capture can accurately measure the Upper Extremity Reachable Workspace (UERW) – a standard clinical test of arm mobility. By validating this monocular setup against a gold‑standard marker‑based system, the authors demonstrate a low‑cost, easy‑to‑deploy alternative for clinicians and developers alike.
Key Contributions
- First validation of a monocular (single‑camera) markerless motion‑capture pipeline for the UERW task.
- Demonstrated high agreement (mean bias ≈ 0.6 % of reachable workspace) when the camera is placed directly in front of the participant.
- Quantified the impact of camera angle, showing that an offset view underestimates workspace by ~5.7 %.
- Provided an open‑source‑friendly workflow that can be reproduced with off‑the‑shelf FLIR (or similar) cameras and a pre‑trained AI pose estimator.
- Highlighted a clinical workflow that integrates a VR headset for target presentation while capturing video from a single viewpoint.
Methodology
- Participants & Task – Nine healthy adults performed the standardized UERW assessment: reaching toward virtual targets arranged on a sphere centered on the torso, displayed through a VR headset.
- Data Capture –
  - Reference: A full marker‑based motion‑capture rig (optical markers plus multiple cameras) recorded 3‑D joint trajectories.
  - Test: Eight FLIR infrared cameras recorded the same session; two views were later selected for analysis: a frontal view (camera directly facing the participant) and an offset view (camera angled to the side).
- Monocular MMC Pipeline –
  - Video from a single camera was fed into a state‑of‑the‑art AI pose estimator (e.g., OpenPose or MediaPipe) to extract 2‑D keypoints.
  - A calibrated perspective transformation lifted the 2‑D keypoints into 3‑D using the known camera intrinsics and the participant's torso as a reference plane.
  - Reach percentages per octant of the virtual sphere were computed and compared against the marker‑based ground truth.
- Evaluation – Bias and standard deviation of the percentage of workspace reached were calculated for each camera configuration.
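The core geometric step of the pipeline — lifting a 2‑D keypoint into 3‑D by intersecting its camera ray with a known reference plane — can be sketched as follows. This is a minimal illustration of the general technique, not the paper's exact calibration; the intrinsics, plane parameters, and the `backproject_to_plane`/`octant` helpers are hypothetical:

```python
import numpy as np

def backproject_to_plane(uv, K, plane_n, plane_d):
    """Intersect the camera ray through pixel `uv` with the plane
    n . x = d (camera coordinates); returns the 3-D intersection point."""
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])  # ray direction
    t = plane_d / (plane_n @ ray)                           # ray parameter at the plane
    return t * ray

def octant(point, center):
    """Index 0-7 of the sphere octant, from the signs of the point's
    offset relative to the torso center."""
    s = point - center
    return int(s[0] > 0) + 2 * int(s[1] > 0) + 4 * int(s[2] > 0)

# Example: pinhole camera with focal length 800 px, principal point (640, 360)
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])

# Frontal torso plane 2 m in front of the camera: n = (0, 0, 1), n . x = 2
p = backproject_to_plane((640.0, 360.0), K, np.array([0.0, 0.0, 1.0]), 2.0)
print(p)  # the principal-point ray meets the plane at (0, 0, 2)
```

Per‑octant reach percentages then follow by classifying each lifted hand position against the target sphere's octants and counting the fraction of targets reached.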
Results & Findings
| Camera configuration | Mean bias (% of workspace) | Std. dev. |
|---|---|---|
| Frontal (direct) | +0.61 % | ±0.12 % |
| Offset (angled) | −5.66 % | ±0.45 % |
- The frontal view matched the gold‑standard almost perfectly, with errors well below 1 % of the total reachable volume.
- The offset view systematically underestimated workspace, especially for targets behind the participant, confirming that camera placement is critical.
- Qualitative inspection showed smooth, anatomically plausible joint trajectories from the AI estimator, even without markers.
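The bias and standard deviation reported above are simple paired statistics over participants. A minimal sketch of the computation, using made‑up illustrative numbers (not the study's per‑participant data):

```python
import numpy as np

# Hypothetical paired reach percentages for five participants:
# monocular estimate vs. marker-based reference (% of workspace).
mono   = np.array([71.2, 68.5, 74.0, 69.9, 72.3])
marker = np.array([70.6, 67.8, 73.5, 69.2, 71.8])

diff = mono - marker          # per-participant error (% of workspace)
bias = diff.mean()            # mean bias, as in the table above
sd   = diff.std(ddof=1)       # sample standard deviation of the error
print(f"bias = {bias:+.2f} %, sd = {sd:.2f} %")  # bias = +0.60 %, sd = 0.10 %
```

A positive bias means the monocular pipeline slightly overestimates the reached workspace relative to the marker‑based reference; a negative bias (as with the offset camera) means underestimation.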
Practical Implications
- Clinics & Tele‑rehab – A single webcam (or even a smartphone) can replace costly multi‑camera rigs, enabling routine quantitative arm‑mobility assessments in outpatient settings or remote home‑based therapy.
- Software Integration – Developers can embed the AI pose‑estimation pipeline into existing health‑tech platforms (e.g., EMR‑linked mobile apps) to automatically generate UERW scores after a short video capture.
- VR‑augmented therapy – The study already uses a VR headset for target presentation; coupling this with a monocular camera creates a fully immersive, data‑rich rehab loop with minimal hardware.
- Research & Data Collection – Large‑scale studies of upper‑limb function (e.g., post‑stroke, neuromuscular disease) become feasible without the logistical overhead of marker placement.
- Cost Savings – Eliminating markers, multiple cameras, and specialized labs can cut setup costs by >90 %, making quantitative motion analysis accessible to community clinics and startups.
Limitations & Future Work
- Sample Size & Population – Only nine unimpaired adults were tested; validation on patients with motor impairments (stroke, ALS, etc.) is still needed.
- Depth Ambiguity – Monocular inference relies on a calibrated torso plane; extreme out‑of‑plane motions could degrade accuracy.
- Camera Calibration – The pipeline assumes precise intrinsics; automated self‑calibration methods would improve robustness in real‑world deployments.
- Extended Workspace – Future work should explore rear‑hemisphere coverage (e.g., using multiple frontal cameras or a rotating camera) to capture the full 3‑D workspace.
By addressing these points, the community can move toward truly ubiquitous, AI‑powered motion capture for clinical and consumer applications.
Authors
- Seth Donahue
- J. D. Peiffer
- R. Tyler Richardson
- Yishan Zhong
- Shaun Q. Y. Tan
- Benoit Marteau
- Stephanie R. Russo
- May D. Wang
- R. James Cotton
- Ross Chafetz
Paper Information
- arXiv ID: 2602.13176v1
- Categories: cs.CV
- Published: February 13, 2026