[Paper] A robust generalizable device-agnostic deep learning model for sleep-wake determination from triaxial wrist accelerometry
Source: arXiv - 2512.01986v1
Overview
A new deep‑learning model can reliably infer whether a person is asleep or awake from raw wrist‑accelerometer data, regardless of the brand of device used. Tested on 453 adults, including people with sleep apnea and restless‑leg syndrome and spanning a wide age range, the algorithm agrees with polysomnography (PSG, the clinical gold standard) far more closely than prior actigraphy methods, especially at detecting wake periods.
Key Contributions
- Device‑agnostic architecture – performs equally well with three commercially available triaxial wrist accelerometers.
- Three‑class classification (wake, sleep, sleep‑with‑arousals) that is later collapsed into a binary sleep‑wake decision, improving wake detection.
- Targeted training on low‑efficiency sleepers to boost sensitivity to brief awakenings.
- Robust performance across sleep disorders (obstructive sleep apnea, periodic limb movements) and ages 18–85.
- Open‑source‑ready pipeline (30‑second epoch feature extraction, lightweight CNN/LSTM hybrid) that can be embedded in mobile or wearable SDKs.
Methodology
- Data collection – Simultaneous wrist accelerometry and full‑night polysomnography were recorded from 453 participants at a tertiary sleep clinic. Three different devices (Device A, B, C) captured triaxial acceleration at 30 Hz.
- Pre‑processing – Raw signals were segmented into 30‑second epochs, matching the standard PSG scoring resolution. Standard statistical and frequency‑domain features (e.g., mean, variance, spectral power) were computed for each axis.
- Model design – A hybrid deep network combines:
  - A small 1‑D convolutional stack to capture short‑term motion patterns.
  - A bidirectional LSTM layer to model temporal dependencies across epochs.
  - A final softmax head that outputs three classes: Wake, Sleep, Sleep‑with‑Arousal.
- Training strategy – To address the chronic under‑detection of wake, the authors trained on recordings from one device, oversampling subjects with low sleep efficiency (< 80 %) or a high arousal index (> 15 arousals per hour), and then validated on the remaining recordings from all three devices.
- Post‑processing – A simple decision tree collapses the three‑class output into a binary sleep‑wake label, applying a rule that “Sleep‑with‑Arousal” counts as sleep for total sleep time calculations but can be flagged for downstream sleep‑quality metrics.
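The pre‑processing step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, a recording stored as an `(n_samples, 3)` array at 30 Hz, and a hypothetical feature set (per‑axis mean, variance, and mean periodogram power in a 0.5–3 Hz band). The function names and band edges are illustrative choices; the summary lists only mean, variance, and spectral power as examples. At 30 Hz, each 30‑second epoch holds 900 samples per axis.

```python
import numpy as np

FS_HZ = 30                           # sampling rate reported in the summary
EPOCH_S = 30                         # PSG scoring epoch length in seconds
SAMPLES_PER_EPOCH = FS_HZ * EPOCH_S  # 900 samples per axis per epoch


def segment_epochs(signal: np.ndarray) -> np.ndarray:
    """Split an (n_samples, 3) recording into (n_epochs, 900, 3).

    Trailing samples that do not fill a whole epoch are dropped.
    """
    n_epochs = signal.shape[0] // SAMPLES_PER_EPOCH
    trimmed = signal[: n_epochs * SAMPLES_PER_EPOCH]
    return trimmed.reshape(n_epochs, SAMPLES_PER_EPOCH, signal.shape[1])


def band_power(epochs: np.ndarray, lo_hz: float, hi_hz: float) -> np.ndarray:
    """Mean periodogram power in [lo_hz, hi_hz) per epoch and axis (hypothetical band)."""
    x = epochs - epochs.mean(axis=1, keepdims=True)      # remove gravity / DC offset
    spectrum = np.abs(np.fft.rfft(x, axis=1)) ** 2 / x.shape[1]
    freqs = np.fft.rfftfreq(x.shape[1], d=1.0 / FS_HZ)
    mask = (freqs >= lo_hz) & (freqs < hi_hz)
    return spectrum[:, mask, :].mean(axis=1)


def epoch_features(signal: np.ndarray) -> np.ndarray:
    """Return (n_epochs, 9): per-axis mean, variance and 0.5-3 Hz band power."""
    epochs = segment_epochs(signal)
    mean = epochs.mean(axis=1)
    var = epochs.var(axis=1)
    power = band_power(epochs, 0.5, 3.0)
    return np.concatenate([mean, var, power], axis=1)


rng = np.random.default_rng(0)
night = rng.normal(size=(8 * 3600 * FS_HZ, 3))   # synthetic 8-hour recording
print(epoch_features(night).shape)                # (960, 9)
```

A real pipeline would also calibrate units (g) and handle non‑wear periods before this step.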
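The model design can be illustrated with a forward pass written in plain NumPy, so that it does not depend on a particular deep‑learning framework. All layer sizes (`N_CONV`, `KERNEL`, `HIDDEN`), the 9‑feature input, and the randomly initialized, untrained weights are hypothetical; the summary does not give the actual configuration, and a real model would be trained in a framework such as PyTorch or TensorFlow. The sketch only shows how the pieces connect: a 1‑D convolution over consecutive epochs, an LSTM run forward and backward across the night, and a per‑epoch softmax over the three classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 9 features/epoch in, 16 conv filters, kernel 5, 32 LSTM units per direction.
N_FEAT, N_CONV, KERNEL, HIDDEN = 9, 16, 5, 32
CLASSES = ("Wake", "Sleep", "Sleep-with-Arousal")


def init(*shape):
    """Small random weights; an untrained stand-in for learned parameters."""
    return rng.normal(scale=0.1, size=shape)


W_conv, b_conv = init(KERNEL, N_FEAT, N_CONV), np.zeros(N_CONV)
LSTM = {d: (init(N_CONV, 4 * HIDDEN), init(HIDDEN, 4 * HIDDEN), np.zeros(4 * HIDDEN))
        for d in ("fwd", "bwd")}
W_out, b_out = init(2 * HIDDEN, len(CLASSES)), np.zeros(len(CLASSES))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_relu(x):
    """'Same'-padded 1-D convolution across the epoch axis, then ReLU. x: (T, N_FEAT)."""
    pad = KERNEL // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([np.einsum("kf,kfc->c", xp[t:t + KERNEL], W_conv) for t in range(len(x))])
    return np.maximum(out + b_conv, 0.0)


def lstm_pass(x, W_x, W_h, b):
    """One LSTM direction over x: (T, N_CONV) -> hidden states (T, HIDDEN)."""
    h, c, outs = np.zeros(HIDDEN), np.zeros(HIDDEN), []
    for x_t in x:
        i, f, g, o = np.split(x_t @ W_x + h @ W_h + b, 4)   # input, forget, cell, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outs.append(h)
    return np.array(outs)


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)


def forward(features):
    """Per-epoch features (T, N_FEAT) -> class probabilities (T, 3)."""
    z = conv1d_relu(features)
    h_fwd = lstm_pass(z, *LSTM["fwd"])
    h_bwd = lstm_pass(z[::-1], *LSTM["bwd"])[::-1]   # run backward, then re-align in time
    return softmax(np.concatenate([h_fwd, h_bwd], axis=1) @ W_out + b_out)


probs = forward(rng.normal(size=(960, N_FEAT)))      # one 8-hour night of 30-s epochs
labels = [CLASSES[k] for k in probs.argmax(axis=1)]
print(probs.shape)  # (960, 3)
```

The bidirectional pass is what lets each epoch's prediction use motion both before and after it, which matters for distinguishing a brief awakening from a posture shift.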
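The training strategy amounts to enriching the training set with hard cases. A minimal sketch, assuming subject‑level metadata and simple repetition as the oversampling method: the `Subject` fields, the `train_fraction` split, and the repetition `factor` are hypothetical, and only the thresholds (sleep efficiency < 80 %, arousal index > 15 per hour) come from the summary.

```python
import random
from dataclasses import dataclass


@dataclass
class Subject:
    subject_id: str
    device: str               # e.g. "A", "B", "C"
    sleep_efficiency: float   # PSG total sleep time / time in bed, 0-1
    arousal_index: float      # PSG arousals per hour of sleep


def is_hard_case(s: Subject, se_cut: float = 0.80, ai_cut: float = 15.0) -> bool:
    """Low-efficiency or highly fragmented sleepers, using the thresholds in the summary."""
    return s.sleep_efficiency < se_cut or s.arousal_index > ai_cut


def build_training_set(subjects, train_device, train_fraction=0.5, factor=3, seed=0):
    """Draw training subjects from one device and repeat hard cases `factor` times.

    Every subject not drawn for training (from any device) is returned for validation.
    """
    rng = random.Random(seed)
    pool = [s for s in subjects if s.device == train_device]
    rng.shuffle(pool)
    train = pool[: int(len(pool) * train_fraction)]
    train_ids = {s.subject_id for s in train}
    validation = [s for s in subjects if s.subject_id not in train_ids]
    oversampled = [s for s in train for _ in range(factor if is_hard_case(s) else 1)]
    rng.shuffle(oversampled)
    return oversampled, validation
```

Splitting by subject rather than by epoch keeps all of one person's nights on the same side of the split, which avoids leaking individual movement habits into validation.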
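The post‑processing rule can be written as a plain mapping. This sketch treats the decision tree as the single rule stated above (Sleep‑with‑Arousal counts as sleep but is flagged); the label strings, the `summarize` helper, and the assumption that the full recording equals time in bed are illustrative, not taken from the paper. Sleep efficiency here is total sleep time divided by time in bed.

```python
WAKE, SLEEP, SLEEP_AROUSAL = "Wake", "Sleep", "Sleep-with-Arousal"
EPOCH_MIN = 0.5  # each epoch is 30 seconds


def collapse(labels):
    """Three-class epoch labels -> (binary sleep/wake labels, per-epoch arousal flags)."""
    binary = ["sleep" if lab in (SLEEP, SLEEP_AROUSAL) else "wake" for lab in labels]
    arousal_flags = [lab == SLEEP_AROUSAL for lab in labels]
    return binary, arousal_flags


def summarize(labels):
    """Total sleep time, sleep efficiency and arousal-epoch count for one night.

    Assumes the whole recording is time in bed (a simplification).
    """
    binary, flags = collapse(labels)
    tst_min = binary.count("sleep") * EPOCH_MIN
    time_in_bed_min = len(labels) * EPOCH_MIN
    return {
        "total_sleep_time_min": tst_min,
        "sleep_efficiency": tst_min / time_in_bed_min if labels else 0.0,
        "arousal_epochs": sum(flags),
    }


night = [WAKE, SLEEP, SLEEP, SLEEP_AROUSAL]
print(summarize(night))
# {'total_sleep_time_min': 1.5, 'sleep_efficiency': 0.75, 'arousal_epochs': 1}
```

Keeping the arousal flags alongside the binary labels is what allows fragmentation metrics to be reported without changing total sleep time.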
Results & Findings
| Metric | Value |
|---|---|
| F1‑Score (binary) | 0.86 |
| Sensitivity (detecting sleep) | 0.87 |
| Specificity (detecting wake) | 0.78 |
| Correlation with PSG total sleep time | R = 0.69 |
| Correlation with PSG sleep efficiency | R = 0.63 |
- Performance remained stable across the three accelerometer models (ΔF1 < 0.02).
- No significant drop in accuracy for participants with moderate‑to‑severe obstructive sleep apnea (AHI > 15) or periodic limb movements.
- The model correctly identified brief awakenings that traditional actigraphy algorithms typically miss, reducing actigraphy's tendency to overestimate sleep.
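The table's epoch‑level metrics can be computed from aligned PSG and model labels. This sketch assumes binary labels per 30‑second epoch and treats sleep as the positive class, which matches the table's "sensitivity (detecting sleep)" and "specificity (detecting wake)" but is an assumption about how the paper computed F1; `epoch_metrics` is a hypothetical helper.

```python
def epoch_metrics(psg, pred):
    """Epoch-by-epoch agreement with PSG, using sleep as the positive class (assumption)."""
    if len(psg) != len(pred):
        raise ValueError("label sequences must be aligned epoch-for-epoch")
    pairs = list(zip(psg, pred))
    tp = sum(p == "sleep" and q == "sleep" for p, q in pairs)
    tn = sum(p == "wake" and q == "wake" for p, q in pairs)
    fp = sum(p == "wake" and q == "sleep" for p, q in pairs)   # wake scored as sleep
    fn = sum(p == "sleep" and q == "wake" for p, q in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if tp else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


psg = ["sleep"] * 8 + ["wake"] * 2
pred = ["sleep"] * 7 + ["wake"] + ["sleep", "wake"]
print(epoch_metrics(psg, pred))
# {'sensitivity': 0.875, 'specificity': 0.5, 'f1': 0.875}
```

Under this convention, the reported specificity of 0.78 means about 78 % of PSG‑scored wake epochs were labeled wake by the model; wake epochs scored as sleep (`fp`) are exactly the errors that inflate total sleep time.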
Practical Implications
- Consumer wearables: Manufacturers can integrate the model into firmware or companion apps to deliver clinical‑grade sleep metrics without needing additional sensors.
- Remote health monitoring: Tele‑sleep clinics can rely on inexpensive wrist devices for longitudinal sleep tracking, freeing up PSG slots for complex cases.
- Research & pharma trials: Large‑scale sleep‑outcome studies can use the model to standardize sleep‑wake labeling across heterogeneous device fleets, cutting data‑cleaning overhead.
- Personalized feedback: Because the algorithm flags “sleep with arousals,” developers can build UI elements that surface micro‑wake events, helping users understand sleep fragmentation.
- Edge deployment: The network’s modest parameter count (< 200k) enables on‑device inference on low‑power microcontrollers, preserving battery life and user privacy (no cloud upload needed).
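The < 200k parameter budget can be sanity‑checked with standard layer‑size formulas. The configuration below (12 input features, two 32‑filter convolutions with kernel 5, a 64‑unit bidirectional LSTM, three outputs) is hypothetical, chosen only to show how such a network can stay well under the budget. The LSTM count assumes a single bias vector per gate block, as in Keras; PyTorch's `nn.LSTM` keeps two bias vectors and so adds `4 * hidden` more parameters per direction.

```python
def conv1d_params(in_ch, out_ch, kernel):
    return in_ch * out_ch * kernel + out_ch          # weights + one bias per filter


def lstm_params(in_dim, hidden):
    # 4 gates, each with input weights, recurrent weights and one bias vector
    return 4 * (hidden * (in_dim + hidden) + hidden)


def dense_params(in_dim, out_dim):
    return in_dim * out_dim + out_dim


def model_params(n_feat=12, conv=(32, 32), kernel=5, hidden=64, n_classes=3):
    """Parameter count for a hypothetical Conv1D stack + BiLSTM + softmax head."""
    total, ch = 0, n_feat
    for out_ch in conv:
        total += conv1d_params(ch, out_ch, kernel)
        ch = out_ch
    total += 2 * lstm_params(ch, hidden)             # bidirectional = two LSTMs
    total += dense_params(2 * hidden, n_classes)
    return total


n = model_params()
print(n, "params")                                          # 57155 params
print(f"float32: {n * 4 / 1024:.0f} KiB, int8: {n / 1024:.0f} KiB")
# float32: 223 KiB, int8: 56 KiB
```

Only weights are counted; activation buffers and recurrent state add runtime memory on top of this.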
Limitations & Future Work
- Population bias – All participants were recruited from a clinical sleep lab; performance in healthy, community‑dwelling cohorts remains to be validated.
- Single‑night recordings – Night‑to‑night variability was not explored; future work should assess longitudinal stability.
- Device sampling rates – The study used 30 Hz accelerometers; ultra‑low‑power devices that sample at ≤ 5 Hz may require re‑training, and possibly quantization to fit their limited compute budget.
- Explainability – While the model outperforms rule‑based actigraphy, interpreting which motion patterns drive wake detection is still an open research question.
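The sampling‑rate limitation could be probed before any re‑training by simulating a low‑rate device from existing 30 Hz recordings. A minimal sketch, assuming non‑overlapping block averaging (a crude boxcar anti‑aliasing filter) is acceptable for a first look; `decimate_by_mean` is a hypothetical helper, and a proper low‑pass filter such as `scipy.signal.decimate` would suppress aliasing better.

```python
def decimate_by_mean(samples, factor):
    """Average non-overlapping blocks of `factor` samples; a trailing partial block is dropped.

    factor=6 turns a 30 Hz stream into a 5 Hz one. Apply separately to each axis.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n_blocks = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor for i in range(n_blocks)]


x_30hz = [float(i) for i in range(12)]   # two 6-sample blocks
print(decimate_by_mean(x_30hz, 6))        # [2.5, 8.5]
```

Running the existing model on such downsampled data (after adjusting the epoch length in samples) would indicate how much accuracy is lost before committing to re‑training.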
Bottom line: This device‑agnostic deep‑learning approach narrows the gap between consumer‑grade actigraphy and clinical polysomnography, opening the door for more accurate, scalable sleep monitoring in everyday tech products.
Authors
- Nasim Montazeri
- Stone Yang
- Dominik Luszczynski
- John Zhang
- Dharmendra Gurve
- Andrew Centen
- Maged Goubran
- Andrew Lim
Paper Information
- arXiv ID: 2512.01986v1
- Categories: q-bio.QM, cs.LG
- Published: December 1, 2025