[Paper] MMA: A Momentum Mamba Architecture for Human Activity Recognition with Inertial Sensors
Source: arXiv - 2511.21550v1
Overview
The paper presents Momentum Mamba (MMA), a new neural architecture that builds on the recent Mamba state‑space model (SSM) to tackle human activity recognition (HAR) from inertial sensor streams. By injecting a momentum term—essentially a second‑order dynamic—MMA stabilizes information flow over long sequences, delivering higher accuracy and faster convergence while keeping computational costs modest.
Key Contributions
- Momentum‑augmented SSM: Introduces a second‑order “momentum” component to the classic first‑order Mamba, improving long‑range memory and gradient stability.
- Complex Momentum Mamba: Extends the idea to the complex domain, enabling frequency‑selective scaling of memory for richer temporal representations.
- Comprehensive HAR evaluation: Benchmarks MMA on several public inertial‑sensor datasets (e.g., UCI HAR, PAMAP2, HHAR) and shows consistent gains over vanilla Mamba, CNN/RNN baselines, and Transformers.
- Efficiency‑focused design: Achieves the accuracy boost with only a modest increase in FLOPs and training time, preserving the linear‑time complexity of SSMs.
- Robustness analysis: Demonstrates improved resilience to sensor noise and domain shifts, a common pain point in real‑world wearable deployments.
Methodology
- Base Model – Mamba SSM:
  - Mamba treats a sequence as the output of a linear state‑space system with a diagonal transition matrix A, input and output projections B and C, and a skip (feedthrough) term D.
  - This yields O(N) time complexity (N = sequence length) and captures long‑range dependencies without the quadratic cost of self‑attention. (A minimal scan sketch appears after this list.)
- Adding Momentum:
  - The authors augment the state update equation with a velocity term, turning the first‑order recurrence h_t = A·h_{t−1} + … into a second‑order one (see the momentum sketch after this list):
    v_t = μ·v_{t−1} + (1 − μ)·(A·h_{t−1} + …)
    h_t = h_{t−1} + v_t
  - Here, μ is a learnable momentum coefficient (0 ≤ μ < 1). This mirrors physical momentum, smoothing rapid changes and preserving information over many steps.
- Complex Momentum Variant:
  - By allowing μ and the transition parameters to be complex numbers, the model can selectively amplify or dampen specific frequency bands, akin to a learnable filter bank (illustrated in the complex‑momentum sketch below).
- Training Pipeline:
  - Raw tri‑axial accelerometer and gyroscope streams are segmented into fixed‑length windows (e.g., 2 s at 50 Hz).
  - Standard data augmentations (jitter, scaling, rotation) are applied; see the preprocessing sketch after this list.
  - The model is trained with cross‑entropy loss, the Adam optimizer, and a cosine‑annealing learning‑rate schedule.
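To make the methodology concrete, the sketches below restate each step in NumPy. First, the first‑order diagonal scan underlying the base model: a minimal sketch only, with hypothetical names (`ssm_scan`, `A_diag`) and shapes of our choosing; the actual Mamba block additionally makes B, C, and the discretization step input‑dependent.

```python
# Minimal sketch of a first-order diagonal SSM scan (illustration only; the
# real Mamba block also makes B, C, and the step size input-dependent).
import numpy as np

def ssm_scan(x, A_diag, B, C, D):
    """h_t = A*h_{t-1} + B@x_t;  y_t = C@h_t + D@x_t, scanned over t."""
    h = np.zeros(A_diag.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = A_diag * h + B @ x[t]      # O(1) state update per step -> O(N) total
        ys.append(C @ h + D @ x[t])    # readout plus skip term
    return np.stack(ys)                # (N, d_out)
```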
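Next, the momentum update, with the elided input term written as B·x_t for concreteness. In the paper μ is learnable; it is fixed here for illustration.

```python
# Sketch of the second-order momentum recurrence described above; mu is a
# learnable coefficient in the paper but fixed here for illustration.
import numpy as np

def momentum_ssm_scan(x, A_diag, B, mu=0.9):
    """v_t = mu*v_{t-1} + (1-mu)*(A*h_{t-1} + B@x_t);  h_t = h_{t-1} + v_t."""
    h = np.zeros(A_diag.shape[0])
    v = np.zeros_like(h)
    hs = []
    for t in range(x.shape[0]):
        drive = A_diag * h + B @ x[t]    # the usual first-order update
        v = mu * v + (1.0 - mu) * drive  # velocity smooths rapid state changes
        h = h + v                        # heavy-ball-style second-order step
        hs.append(h.copy())
    return np.stack(hs)                  # (N, d_state) hidden trajectory
```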
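Third, why a complex μ acts as a frequency‑selective filter: writing μ = r·e^{iθ}, the velocity recurrence behaves like a one‑pole resonator that decays at rate r while oscillating at angular frequency θ, so its impulse‑response spectrum peaks near θ. The scalar setup and the specific r, θ values here are our assumptions, not the paper's parameters.

```python
# Illustration of frequency selectivity under a complex momentum coefficient
# mu = r*exp(i*theta): the recurrence acts as a resonant one-pole filter.
import numpy as np

r, theta = 0.98, 0.3                # hypothetical magnitude and phase
mu = r * np.exp(1j * theta)

v, trace = 0.0 + 0.0j, []
for u in [1.0] + [0.0] * 199:       # unit impulse input, 200 steps
    v = mu * v + (1.0 - mu) * u     # complex momentum recurrence
    trace.append(v)

spectrum = np.abs(np.fft.fft(np.asarray(trace)))
# Peak FFT bin ~ theta/(2*pi)*200 ~ 9-10: the channel "remembers" that band.
print(np.argmax(spectrum[:100]))
```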
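Last, a preprocessing sketch matching the stated windowing (2 s at 50 Hz = 100 samples) with jitter and scaling augmentation. The helper names and augmentation magnitudes are assumptions, and rotation is omitted for brevity.

```python
# Hedged sketch of the preprocessing step: fixed 2 s / 50 Hz windows with
# jitter and scaling augmentation (rotation omitted; all names hypothetical).
import numpy as np

def make_windows(stream, win=100, hop=50):
    """Segment a (T, 6) accel+gyro stream into overlapping windows."""
    return np.stack([stream[i:i + win]
                     for i in range(0, len(stream) - win + 1, hop)])

def augment(windows, rng):
    """Additive jitter plus a random per-window, per-channel scale."""
    jitter = rng.normal(0.0, 0.01, size=windows.shape)
    scale = rng.uniform(0.9, 1.1,
                        size=(windows.shape[0], 1, windows.shape[2]))
    return (windows + jitter) * scale

rng = np.random.default_rng(0)
stream = rng.standard_normal((5000, 6))          # stand-in for a real recording
print(augment(make_windows(stream), rng).shape)  # (99, 100, 6)
```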
Results & Findings
| Dataset | Baseline (Transformer) | Vanilla Mamba | MMA (Momentum) | MMA‑Complex |
|---|---|---|---|---|
| UCI HAR | 94.2 % | 94.7 % | 95.6 % | 95.4 % |
| PAMAP2 | 92.1 % | 92.8 % | 94.0 % | 93.8 % |
| HHAR | 88.5 % | 89.1 % | 90.3 % | 90.1 % |
- Accuracy: MMA consistently outperforms both Transformers and vanilla Mamba by 0.8–1.5 % absolute.
- Convergence: Reaches peak validation accuracy ~30 % faster (fewer epochs) thanks to smoother gradients from momentum.
- Robustness: Under synthetic additive Gaussian noise at 10 dB SNR, MMA's accuracy drop is ~0.4 %, versus ~1.2 % for Transformers (noise‑injection sketch below).
- Efficiency: Training FLOPs increase by ~12 % relative to vanilla Mamba, while inference latency remains linear and well under 5 ms on a mid‑range mobile CPU.
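As a reference point for the robustness numbers, below is the standard way to inject white Gaussian noise at a target SNR; we assume the paper's probe follows this construction, and the function name is hypothetical.

```python
# Assumed robustness probe: additive Gaussian noise scaled to a target SNR.
import numpy as np

def add_noise_at_snr(x, snr_db=10.0, rng=None):
    """Return x plus white Gaussian noise at the requested SNR (in dB)."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
```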
Practical Implications
- Edge‑friendly HAR: The linear‑time, low‑memory footprint makes MMA a strong candidate for on‑device activity classification in wearables, smartphones, and IoT gateways.
- Faster Model Iteration: Faster convergence reduces cloud‑training costs and shortens the time‑to‑market for new activity‑based features.
- Noise‑Resilient Deployments: Improved robustness means fewer false detections in real‑world scenarios where sensor placement and signal quality vary.
- Transferable Architecture: Because momentum‑augmented SSMs are generic sequence models, developers can reuse MMA for other time‑series tasks—e.g., predictive maintenance, speech keyword spotting, or financial tick‑data analysis—without redesigning the core network.
- Simplified Pipeline: MMA eliminates the need for heavy attention‑based layers or deep RNN stacks, streamlining model‑serving stacks and reducing dependency on specialized hardware accelerators.
Limitations & Future Work
- Second‑Order Overhead: While modest, the added velocity state doubles the hidden‑state size, which may be noticeable on ultra‑low‑power microcontrollers.
- Complex Momentum Stability: Training with complex‑valued parameters requires careful initialization; the authors note occasional divergence on very long sequences (>10 s).
- Domain Generalization: Experiments focus on benchmark datasets; real‑world cross‑subject or cross‑device generalization still needs thorough validation.
- Future Directions: The authors suggest exploring adaptive momentum schedules, hybridizing MMA with lightweight attention for multimodal inputs, and extending the framework to unsupervised pre‑training for sensor data.
Authors
- Thai‑Khanh Nguyen
- Uyen Vo
- Tan M. Nguyen
- Thieu N. Vo
- Trung‑Hieu Le
- Cuong Pham
Paper Information
- arXiv ID: 2511.21550v1
- Categories: cs.HC, cs.LG
- Published: November 26, 2025
- PDF: https://arxiv.org/pdf/2511.21550v1