[Paper] Real-Time Multimodal Data Collection Using Smartwatches and Its Visualization in Education

Published: December 2, 2025 at 06:12 AM EST
4 min read
Source: arXiv - 2512.02651v1

Overview

The paper introduces Watch‑DMLT, a real‑time data‑capture app for Fitbit Sense 2 smartwatches, and ViSeDOPS, a web‑based dashboard that visualizes the synchronized multimodal streams collected during classroom activities. By combining physiological, motion, gaze, video, and annotation data from up to 16 devices simultaneously, the authors demonstrate a scalable pipeline for Multimodal Learning Analytics (MLA) that can be deployed in everyday educational settings.

Key Contributions

  • Watch‑DMLT: Open‑source Android app that streams heart‑rate, accelerometer, gyroscope, and optional gaze data from multiple smartwatches to a central server with sub‑second latency.
  • ViSeDOPS: Interactive visualization suite (timeline, heat‑maps, video overlay) that aligns multimodal streams with teacher‑provided annotations, enabling rapid exploratory analysis.
  • End‑to‑end deployment: Field trial with 65 university students (up to 16 concurrent watches) during oral presentations, proving the system’s robustness in a real classroom.
  • Data schema & synchronization protocol: A lightweight JSON‑based format and NTP‑based clock alignment that keep all streams temporally coherent without heavy infrastructure (a hypothetical packet shape is sketched after this list).
  • Open resources: Code, documentation, and a sample dataset released under an MIT‑compatible license to encourage replication and extension.
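
The schema itself is not reproduced in this summary, but the description above (JSON packets, per-device streams, NTP-aligned timestamps) suggests a shape roughly like the sketch below. The field names, units, and optionality are illustrative assumptions, not the authors' published format.

```typescript
// Hypothetical Watch-DMLT packet shape; field names and units are assumptions,
// not the schema released with the paper.
interface SensorSample {
  t: number;                        // NTP-aligned Unix timestamp in milliseconds
  hr?: number;                      // heart rate (bpm) from the PPG sensor, if present
  acc?: [number, number, number];   // 3-axis accelerometer reading
  gyro?: [number, number, number];  // 3-axis gyroscope reading
  gaze?: [number, number];          // gaze coordinates, if an external eye-tracker is paired
}

interface WatchPacket {
  deviceId: string;        // anonymized watch identifier
  seq: number;             // sequence number, useful for measuring packet loss
  samples: SensorSample[]; // readings batched over a ~200 ms window
}
```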

Methodology

  1. Hardware & Sensors – Participants wore Fitbit Sense 2 devices that expose heart‑rate (PPG), 3‑axis accelerometer and gyroscope readings, and, when paired with an external eye‑tracker, gaze coordinates.
  2. Software Stack
    • Watch‑DMLT runs on the watch, batches sensor readings into 200 ms packets, and pushes them over Bluetooth to a companion Android phone.
    • The phone forwards packets via WebSocket to a cloud‑hosted Node.js server that timestamps each packet using NTP‑synchronized clocks (a minimal relay sketch follows this list).
  3. Annotation Layer – Instructors use a simple web form to tag events (e.g., “question asked”, “feedback given”). These timestamps are merged with the sensor streams on the server (see the annotation‑merge sketch after this list).
  4. Visualization – ViSeDOPS consumes the unified JSON log, rendering synchronized timelines, per‑student heat‑maps of motion intensity (a possible aggregation is sketched after this list), and a split‑screen view that overlays video of the presentation with live physiological traces.
  5. Evaluation – The system’s latency, packet loss, and battery impact were measured across multiple sessions; qualitative feedback was gathered from students and the instructor about usability and insightfulness.
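
To make step 2 of the methodology concrete, the sketch below shows one way a companion device could buffer incoming readings and flush them as ~200 ms packets over a WebSocket, using the packet shape sketched earlier. It is written as Node.js/TypeScript with the `ws` package purely for brevity; the real companion app is an Android application, and the endpoint URL, device ID, and buffering policy here are assumptions.

```typescript
// Minimal relay sketch (an assumption, not the authors' code): batch sensor
// samples into ~200 ms packets and forward them over a WebSocket.
// SensorSample and WatchPacket refer to the hypothetical types sketched above.
import WebSocket from "ws";

const socket = new WebSocket("wss://example-server.invalid/ingest"); // placeholder endpoint

let buffer: SensorSample[] = [];
let seq = 0;

// Called for every reading received from the watch over Bluetooth.
function onSample(sample: SensorSample): void {
  buffer.push(sample);
}

// Flush the buffer as one packet every 200 ms.
setInterval(() => {
  if (buffer.length === 0 || socket.readyState !== WebSocket.OPEN) return;
  const packet: WatchPacket = { deviceId: "watch-01", seq: seq++, samples: buffer };
  socket.send(JSON.stringify(packet));
  buffer = [];
}, 200);
```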
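
Step 3's merge of instructor annotations with the sensor streams is easy to picture once both sides carry NTP-aligned timestamps: each annotation can be attached to the packet whose time window contains it. The shapes and the 200 ms window below are hypothetical.

```typescript
// Hypothetical server-side merge of instructor annotations into the sensor timeline.
interface Annotation {
  t: number;     // when the instructor tagged the event (NTP-aligned ms)
  label: string; // e.g. "question asked", "feedback given"
}

interface TimelineEntry {
  t: number;
  deviceId: string;
  samples: unknown[];    // raw sensor samples in this packet
  annotations: string[]; // labels whose timestamps fall inside this packet's window
}

function mergeAnnotations(
  packets: { t: number; deviceId: string; samples: unknown[] }[],
  annotations: Annotation[],
  windowMs = 200 // each packet covers ~200 ms
): TimelineEntry[] {
  return packets.map((p) => ({
    ...p,
    annotations: annotations
      .filter((a) => a.t >= p.t && a.t < p.t + windowMs)
      .map((a) => a.label),
  }));
}
```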
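
For the per-student motion heat-maps mentioned in step 4, one plausible statistic is the mean accelerometer magnitude per fixed time bin. The paper does not specify the exact aggregation ViSeDOPS uses, so the bin size and statistic below are assumptions.

```typescript
// Sketch of a possible heat-map statistic: mean accelerometer magnitude per time bin.
function motionIntensity(
  samples: { t: number; acc: [number, number, number] }[],
  binMs = 5000 // 5-second bins, an arbitrary choice for illustration
): Map<number, number> {
  const sums = new Map<number, { total: number; count: number }>();
  for (const s of samples) {
    const bin = Math.floor(s.t / binMs) * binMs;
    const mag = Math.hypot(...s.acc); // magnitude of the 3-axis reading
    const entry = sums.get(bin) ?? { total: 0, count: 0 };
    entry.total += mag;
    entry.count += 1;
    sums.set(bin, entry);
  }
  const intensity = new Map<number, number>();
  for (const [bin, { total, count }] of sums) intensity.set(bin, total / count);
  return intensity;
}
```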

Results & Findings

  • Low latency & high fidelity – Average end‑to‑end delay was ≈ 350 ms, with < 2 % packet loss even when 16 watches streamed concurrently.
  • Battery endurance – A single charge (nominally 48 hours) sustained continuous data capture for a full 2‑hour class, confirming feasibility for typical lecture lengths.
  • Insight generation – Visual analyses revealed clear physiological signatures (e.g., heart‑rate spikes, increased motion) aligned with moments of audience questioning and presenter stress.
  • Scalability – The server handled up to 200 Hz aggregate sampling rates without degradation, suggesting the pipeline can support larger cohorts or richer sensor suites.
  • User acceptance – 87 % of students reported the smartwatch was “non‑intrusive,” and the instructor highlighted the dashboard’s ability to pinpoint engagement bottlenecks in real time.

Practical Implications

  • Real‑time feedback loops – Educators can monitor class‑wide affective states live, enabling adaptive interventions (e.g., pausing for clarification when stress spikes).
  • Automated analytics pipelines – Developers can plug the open‑source Watch‑DMLT into existing Learning Management Systems (LMS) to enrich analytics dashboards with physiological context.
  • Beyond education – The same architecture applies to corporate training, remote workshops, or any scenario where multimodal human‑computer interaction needs to be quantified at scale.
  • Rapid prototyping – Because the system relies on commodity smartwatches and standard web technologies, teams can prototype new MLA studies without custom hardware or costly data‑center setups.
  • Privacy‑by‑design – Data is anonymized on‑device before transmission (a hashing sketch follows this list), and the open schema makes it straightforward to integrate consent management tools.
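
The paper states that data is anonymized on-device before transmission but the mechanism is not detailed in this summary; one common approach, shown purely as an assumption-laden sketch, is to replace raw participant identifiers with salted hashes before any packet leaves the device.

```typescript
// Hypothetical anonymization step: salted SHA-256 of the participant ID.
// The specific scheme is an assumption, not the method described in the paper.
import { createHash } from "crypto";

function anonymizeId(participantId: string, studySalt: string): string {
  return createHash("sha256")
    .update(studySalt + participantId)
    .digest("hex")
    .slice(0, 16); // shortened pseudonym used as the packet's deviceId
}

// Example: "student-042" never leaves the device; only the pseudonym does.
const pseudonym = anonymizeId("student-042", "per-study-random-salt");
```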

Limitations & Future Work

  • Sensor diversity – The current implementation is tied to Fitbit Sense 2; extending to other wearables (Apple Watch, Garmin) will require additional SDK wrappers.
  • Gaze capture – Accurate eye‑tracking was only demonstrated with an external device; integrating on‑watch cameras or low‑cost eye‑trackers remains an open challenge.
  • Long‑term studies – The paper reports a single‑semester deployment; longitudinal validation (multiple courses, varied demographics) is needed to assess generalizability.
  • Automated inference – Future work could embed machine‑learning models on the server to automatically detect engagement patterns, reducing reliance on manual annotation.
  • Edge processing – Offloading some preprocessing to the smartwatch (e.g., feature extraction) could further cut bandwidth and improve privacy.

Authors

  • Alvaro Becerra
  • Pablo Villegas
  • Ruth Cobos

Paper Information

  • arXiv ID: 2512.02651v1
  • Categories: cs.HC, cs.CV, cs.SE
  • Published: December 2, 2025