[Paper] Monocular Markerless Motion Capture Enables Quantitative Assessment of Upper Extremity Reachable Workspace
Source: arXiv - 2602.13176v1
Overview
A new study shows that a single, inexpensive webcam combined with AI‑driven markerless motion capture can accurately measure the Upper Extremity Reachable Workspace (UERW) – a standard clinical test of arm mobility. By validating this monocular setup against a gold‑standard marker‑based system, the authors demonstrate a low‑cost, easy‑to‑deploy alternative for clinicians and developers alike.
Key Contributions
- First validation of a monocular (single‑camera) markerless motion‑capture pipeline for the UERW task.
- Demonstrated high agreement (mean bias ≈ 0.6 % of reachable workspace) when the camera is placed directly in front of the participant.
- Quantified the impact of camera angle, showing that an offset view underestimates workspace by ~5.7 %.
- Provided an open‑source‑friendly workflow that can be reproduced with off‑the‑shelf FLIR (or similar) cameras and a pre‑trained AI pose estimator.
- Highlighted a clinical workflow that integrates a VR headset for target presentation while capturing video from a single viewpoint.
Methodology
- Participants & Task – Nine healthy adults performed the standardized UERW assessment: reaching toward virtual targets arranged on a sphere centered on the torso, displayed through a VR headset.
- Data Capture –
  - Reference: A full marker‑based motion‑capture rig (optical markers plus multiple cameras) recorded 3‑D joint trajectories.
  - Test: Eight FLIR infrared cameras recorded the same session; two views were later selected for analysis: a frontal view (camera directly facing the participant) and an offset view (camera angled to the side).
- Monocular MMC Pipeline –
  - Video from a single camera was fed into a state‑of‑the‑art AI pose estimator (e.g., OpenPose or MediaPipe) to extract 2‑D keypoints.
  - A calibrated perspective transformation lifted the 2‑D keypoints into 3‑D using the known camera intrinsics and the participant's torso as a reference plane.
  - Reach percentages per octant of the virtual sphere were computed and compared against the marker‑based ground truth.
- Evaluation – Bias and standard deviation of the percentage of workspace reached were calculated for each camera configuration.
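The core geometric step of the pipeline — lifting a 2‑D keypoint into 3‑D by intersecting its camera ray with a known reference plane — can be sketched as follows. This is a minimal illustration of the general technique, not the paper's exact calibration; the intrinsics, plane parameters, and the `backproject_to_plane`/`octant` helpers are hypothetical:

```python
import numpy as np

def backproject_to_plane(uv, K, plane_n, plane_d):
    """Intersect the camera ray through pixel `uv` with the plane
    n . x = d (camera coordinates); returns the 3-D intersection point."""
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])  # ray direction
    t = plane_d / (plane_n @ ray)                           # ray parameter at the plane
    return t * ray

def octant(point, center):
    """Index 0-7 of the sphere octant, from the signs of the point's
    offset relative to the torso center."""
    s = point - center
    return int(s[0] > 0) + 2 * int(s[1] > 0) + 4 * int(s[2] > 0)

# Example: pinhole camera with focal length 800 px, principal point (640, 360)
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])

# Frontal torso plane 2 m in front of the camera: n = (0, 0, 1), n . x = 2
p = backproject_to_plane((640.0, 360.0), K, np.array([0.0, 0.0, 1.0]), 2.0)
print(p)  # the principal-point ray meets the plane at (0, 0, 2)
```

Per‑octant reach percentages then follow by classifying each lifted hand position against the target sphere's octants and counting the fraction of targets reached.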
Results & Findings
| Camera configuration | Mean bias (% of workspace) | Std. dev. |
|---|---|---|
| Frontal (direct) | +0.61 % | ±0.12 % |
| Offset (angled) | −5.66 % | ±0.45 % |
- The frontal view matched the gold‑standard almost perfectly, with errors well below 1 % of the total reachable volume.
- The offset view systematically underestimated workspace, especially for targets behind the participant, confirming that camera placement is critical.
- Qualitative inspection showed smooth, anatomically plausible joint trajectories from the AI estimator, even without markers.
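The bias and standard deviation reported above are simple paired statistics over participants. A minimal sketch of the computation, using made‑up illustrative numbers (not the study's per‑participant data):

```python
import numpy as np

# Hypothetical paired reach percentages for five participants:
# monocular estimate vs. marker-based reference (% of workspace).
mono   = np.array([71.2, 68.5, 74.0, 69.9, 72.3])
marker = np.array([70.6, 67.8, 73.5, 69.2, 71.8])

diff = mono - marker          # per-participant error (% of workspace)
bias = diff.mean()            # mean bias, as in the table above
sd   = diff.std(ddof=1)       # sample standard deviation of the error
print(f"bias = {bias:+.2f} %, sd = {sd:.2f} %")  # bias = +0.60 %, sd = 0.10 %
```

A positive bias means the monocular pipeline slightly overestimates the reached workspace relative to the marker‑based reference; a negative bias (as with the offset camera) means underestimation.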
Practical Implications
- Clinics & Tele‑rehab – A single webcam (or even a smartphone) can replace costly multi‑camera rigs, enabling routine quantitative arm‑mobility assessments in outpatient settings or remote home‑based therapy.
- Software Integration – Developers can embed the AI pose‑estimation pipeline into existing health‑tech platforms (e.g., EMR‑linked mobile apps) to automatically generate UERW scores after a short video capture.
- VR‑augmented therapy – The study already uses a VR headset for target presentation; coupling this with a monocular camera creates a fully immersive, data‑rich rehab loop with minimal hardware.
- Research & Data Collection – Large‑scale studies of upper‑limb function (e.g., post‑stroke, neuromuscular disease) become feasible without the logistical overhead of marker placement.
- Cost Savings – Eliminating markers, multiple cameras, and specialized labs can cut setup costs by >90 %, making quantitative motion analysis accessible to community clinics and startups.
Limitations & Future Work
- Sample Size & Population – Only nine unimpaired adults were tested; validation on patients with motor impairments (stroke, ALS, etc.) is still needed.
- Depth Ambiguity – Monocular inference relies on a calibrated torso plane; extreme out‑of‑plane motions could degrade accuracy.
- Camera Calibration – The pipeline assumes precise intrinsics; automated self‑calibration methods would improve robustness in real‑world deployments.
- Extended Workspace – Future work should explore rear‑hemisphere coverage (e.g., using multiple frontal cameras or a rotating camera) to capture the full 3‑D workspace.
By addressing these points, the community can move toward truly ubiquitous, AI‑powered motion capture for clinical and consumer applications.
Authors
- Seth Donahue
- J. D. Peiffer
- R. Tyler Richardson
- Yishan Zhong
- Shaun Q. Y. Tan
- Benoit Marteau
- Stephanie R. Russo
- May D. Wang
- R. James Cotton
- Ross Chafetz
Paper Information
- arXiv ID: 2602.13176v1
- Categories: cs.CV
- Published: February 13, 2026