[Paper] A robust generalizable device-agnostic deep learning model for sleep-wake determination from triaxial wrist accelerometry

Published: December 1, 2025 at 01:43 PM EST
Source: arXiv - 2512.01986v1

Overview

A new deep‑learning model infers whether a person is asleep or awake from triaxial wrist‑accelerometer data, regardless of the brand of device used. It was evaluated against polysomnography (PSG, the clinical gold standard) in 453 adults, including people with sleep apnea and restless legs syndrome across a wide age range. The authors report closer agreement with PSG than prior actigraphy methods achieve, especially in detecting wake periods.

Key Contributions

  • Device‑agnostic architecture – performs comparably across three commercially available triaxial wrist accelerometers (ΔF1 < 0.02).
  • Three‑class classification (wake, sleep, sleep‑with‑arousals) that is later collapsed into a binary sleep‑wake decision, improving wake detection.
  • Targeted training on low‑efficiency sleepers to boost sensitivity to brief awakenings.
  • Robust performance across sleep disorders (obstructive sleep apnea, periodic limb movements) and ages 18–85.
  • Open‑source‑ready pipeline (30‑second epoch feature extraction, lightweight CNN/LSTM hybrid) that can be embedded in mobile or wearable SDKs.

Methodology

  1. Data collection – Simultaneous wrist accelerometry and full‑night polysomnography were recorded from 453 participants at a tertiary sleep clinic. Three different devices (Device A, B, C) captured triaxial acceleration at 30 Hz.
  2. Pre‑processing – Raw signals were segmented into 30‑second epochs (the same resolution used by PSG). Standard statistical and frequency‑domain features (e.g., mean, variance, spectral power) were computed for each axis.
  3. Model design – A hybrid deep network combines:
    • A small 1‑D convolutional stack to capture short‑term motion patterns.
    • A bidirectional LSTM layer to model temporal dependencies across epochs.
    • A final soft‑max head that outputs three classes: Wake, Sleep, Sleep‑with‑Arousal.
  4. Training strategy – To address the chronic under‑detection of wake, the authors oversampled subjects with low sleep efficiency (< 80 %) or high arousal index (> 15 h⁻¹) from one device, then validated on the remaining recordings from all three devices.
  5. Post‑processing – A simple decision tree collapses the three‑class output into a binary sleep‑wake label, applying a rule that “Sleep‑with‑Arousal” counts as sleep for total sleep time calculations but can be flagged for downstream sleep‑quality metrics.
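The pre‑processing and post‑processing steps above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names, the integer class codes, and the `(x, y, z)` tuple input format are assumptions made for the example. Only per‑axis mean and variance are computed, and the spectral features are left out.

```python
from statistics import fmean, pvariance

FS_HZ = 30                            # sampling rate reported in the study
EPOCH_S = 30                          # epoch length reported in the study
SAMPLES_PER_EPOCH = FS_HZ * EPOCH_S   # 900 samples per axis per epoch

# Hypothetical integer codes for the three model outputs.
WAKE, SLEEP, SLEEP_WITH_AROUSAL = 0, 1, 2


def segment_epochs(samples):
    """Split a list of (x, y, z) samples into complete 30-second epochs.

    A trailing partial epoch is dropped (an assumption of this sketch).
    """
    n_epochs = len(samples) // SAMPLES_PER_EPOCH
    return [
        samples[i * SAMPLES_PER_EPOCH:(i + 1) * SAMPLES_PER_EPOCH]
        for i in range(n_epochs)
    ]


def epoch_features(epoch):
    """Return per-axis mean and population variance for one epoch."""
    features = {}
    for name, axis in zip("xyz", zip(*epoch)):
        features[f"mean_{name}"] = fmean(axis)
        features[f"var_{name}"] = pvariance(axis)
    return features


def collapse_to_binary(label):
    """Map a three-class label to (is_sleep, arousal_flag).

    Sleep-with-Arousal counts as sleep, but the flag is kept so that
    downstream sleep-quality metrics can still use it.
    """
    return (label != WAKE, label == SLEEP_WITH_AROUSAL)
```

Returning the arousal flag alongside the binary label keeps the fragmentation information available after the collapse, which is the behaviour step 5 describes.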

Results & Findings

Metric | Value
F1‑Score (binary) | 0.86
Sensitivity (detecting sleep) | 0.87
Specificity (detecting wake) | 0.78
Correlation with PSG total sleep time | R = 0.69
Correlation with PSG sleep efficiency | R = 0.63
  • Performance remained stable across the three accelerometer models (ΔF1 < 0.02).
  • No significant drop in accuracy for participants with moderate‑to‑severe obstructive sleep apnea (AHI > 15) or periodic limb movements.
  • The model correctly identified brief awakenings that traditional actigraphy algorithms typically miss, reducing the “sleep‑over‑estimation” bias.
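For readers who want to reproduce this kind of epoch‑level comparison on their own data, the metrics in the table can be computed as sketched below. The sketch assumes sleep is the positive class (consistent with the table's "sensitivity = detecting sleep"), that both classes occur in the PSG labels, and that the recording spans the whole time in bed. The function names are illustrative, and the summary does not state which class the paper's F1 score treats as positive.

```python
EPOCH_MIN = 0.5  # one 30-second epoch, expressed in minutes


def agreement_metrics(psg, pred):
    """Epoch-by-epoch agreement, with True = sleep and False = wake.

    Assumes both classes appear in `psg`, so no denominator is zero.
    """
    pairs = list(zip(psg, pred))
    tp = sum(1 for t, p in pairs if t and p)          # sleep scored as sleep
    fn = sum(1 for t, p in pairs if t and not p)      # sleep scored as wake
    tn = sum(1 for t, p in pairs if not t and not p)  # wake scored as wake
    fp = sum(1 for t, p in pairs if not t and p)      # wake scored as sleep
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def total_sleep_time_min(labels):
    """Total sleep time in minutes: number of sleep epochs times 0.5."""
    return sum(labels) * EPOCH_MIN


def sleep_efficiency(labels):
    """Fraction of recorded epochs scored as sleep."""
    return sum(labels) / len(labels)


# Toy example with six epochs (not data from the paper).
psg = [True, True, True, False, False, True]
pred = [True, True, False, False, True, True]
metrics = agreement_metrics(psg, pred)
```

Low specificity is what produces the sleep over‑estimation bias mentioned above: every wake epoch scored as sleep adds half a minute to the estimated total sleep time.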

Practical Implications

  • Consumer wearables: Manufacturers can integrate the model into firmware or companion apps to deliver clinical‑grade sleep metrics without needing additional sensors.
  • Remote health monitoring: Tele‑sleep clinics can rely on inexpensive wrist devices for longitudinal sleep tracking, freeing up PSG slots for complex cases.
  • Research & pharma trials: Large‑scale sleep‑outcome studies can use the model to standardize sleep‑wake labeling across heterogeneous device fleets, cutting data‑cleaning overhead.
  • Personalized feedback: Because the algorithm flags “sleep with arousals,” developers can build UI elements that surface micro‑wake events, helping users understand sleep fragmentation.
  • Edge deployment: The network’s modest parameter count (< 200 k) enables on‑device inference on low‑power microcontrollers, preserving battery life and user privacy (no cloud upload needed).
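To give a feel for how a CNN/LSTM hybrid can stay under a 200 k parameter budget, the arithmetic below counts the weights of a small illustrative network. Every layer size here is hypothetical and chosen for the example, since the summary does not give the paper's actual configuration. The LSTM formula uses one bias vector per gate; some frameworks store two, which adds 4 × hidden parameters per direction.

```python
def conv1d_params(kernel, c_in, c_out):
    """Weights plus one bias per output channel."""
    return (kernel * c_in + 1) * c_out


def lstm_params(input_dim, hidden, bidirectional=False):
    """Four gates, each with input weights, recurrent weights and a bias."""
    one_direction = 4 * (hidden * (hidden + input_dim) + hidden)
    return one_direction * (2 if bidirectional else 1)


def dense_params(n_in, n_out):
    """Fully connected layer with bias."""
    return (n_in + 1) * n_out


# Hypothetical layer sizes, NOT taken from the paper.
layers = {
    "conv1": conv1d_params(kernel=5, c_in=3, c_out=16),
    "conv2": conv1d_params(kernel=5, c_in=16, c_out=32),
    "bilstm": lstm_params(input_dim=32, hidden=64, bidirectional=True),
    "softmax_head": dense_params(n_in=2 * 64, n_out=3),
}

total = sum(layers.values())
float32_bytes = total * 4  # storage if every weight is a 32-bit float

print(total, float32_bytes)  # prints: 52899 211596
```

With these assumed sizes the bidirectional LSTM accounts for most of the parameters, so its hidden width is the main lever for fitting a memory budget on a microcontroller.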

Limitations & Future Work

  • Population bias – All participants were recruited from a clinical sleep lab; performance in healthy, community‑dwelling cohorts remains to be validated.
  • Single‑night recordings – Night‑to‑night variability was not explored; future work should assess longitudinal stability.
  • Device sampling rates – The study used 30 Hz accelerometers; ultra‑low‑power devices sampling at ≤ 5 Hz may need model re‑training or quantization.
  • Explainability – While the model outperforms rule‑based actigraphy, interpreting which motion patterns drive wake detection is still an open research question.

Bottom line: This device‑agnostic deep‑learning approach narrows the gap between consumer‑grade actigraphy and clinical polysomnography, opening the door for more accurate, scalable sleep monitoring in everyday tech products.

Authors

  • Nasim Montazeri
  • Stone Yang
  • Dominik Luszczynski
  • John Zhang
  • Dharmendra Gurve
  • Andrew Centen
  • Maged Goubran
  • Andrew Lim

Paper Information

  • arXiv ID: 2512.01986v1
  • Categories: q-bio.QM, cs.LG
  • Published: December 1, 2025