[Paper] MMA: A Momentum Mamba Architecture for Human Activity Recognition with Inertial Sensors

Published: November 26, 2025 at 11:21 AM EST
3 min read
Source: arXiv - 2511.21550v1

Overview

The paper presents Momentum Mamba (MMA), a new neural architecture that builds on the recent Mamba state‑space model (SSM) to tackle human activity recognition (HAR) from inertial sensor streams. By injecting a momentum term—essentially a second‑order dynamic—MMA stabilizes information flow over long sequences, delivering higher accuracy and faster convergence while keeping computational costs modest.

Key Contributions

  • Momentum‑augmented SSM: Introduces a second‑order “momentum” component to the classic first‑order Mamba, improving long‑range memory and gradient stability.
  • Complex Momentum Mamba: Extends the idea to the complex domain, enabling frequency‑selective scaling of memory for richer temporal representations.
  • Comprehensive HAR evaluation: Benchmarks MMA on several public inertial‑sensor datasets (e.g., UCI HAR, PAMAP2, HHAR) and shows consistent gains over vanilla Mamba, CNN/RNN baselines, and Transformers.
  • Efficiency‑focused design: Achieves the accuracy boost with only a modest increase in FLOPs and training time, preserving the linear‑time complexity of SSMs.
  • Robustness analysis: Demonstrates improved resilience to sensor noise and domain shifts, a common pain point in real‑world wearable deployments.

Methodology

  1. Base Model – Mamba SSM:

    • Mamba treats a sequence as the output of a linear state‑space system with a diagonal transition matrix A, learned input/output projections, and a direct feed‑through (skip) term D.
    • This yields O(N) time‑complexity (N = sequence length) and captures long‑range dependencies without the quadratic cost of self‑attention.
  2. Adding Momentum:

    • The authors augment the state update equation with a velocity term, turning the first‑order recurrence h_t = A·h_{t‑1} + … into a second‑order one:
      v_t = μ·v_{t‑1} + (1‑μ)·(A·h_{t‑1} + …)
      h_t = h_{t‑1} + v_t
    • Here, μ is a learnable momentum coefficient (0 ≤ μ < 1). This mirrors physical momentum, smoothing rapid changes and preserving information over many steps (a minimal numeric sketch of this recurrence follows the list).
  3. Complex Momentum Variant:

    • By allowing μ and the transition parameters to be complex numbers, the model can selectively amplify or dampen specific frequency bands, akin to a learnable filter bank (a complex‑valued sketch follows the list).
  4. Training Pipeline:

    • Raw tri‑axial accelerometer and gyroscope streams are segmented into fixed‑length windows (e.g., 2 s at 50 Hz); a preprocessing sketch follows the list.
    • Standard data augmentations (jitter, scaling, rotation) are applied.
    • The model is trained with cross‑entropy loss, Adam optimizer, and a cosine‑annealing learning‑rate schedule.
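
A minimal numeric sketch of the momentum recurrence from step 2, in plain NumPy: the elided "…" term is assumed here to be a standard SSM input projection B·x_t, and the shapes, parameter values, and per‑step Python loop are illustrative only (the paper's actual discretization and hardware‑efficient scan are not reproduced).

```python
# Sketch of the momentum-augmented SSM recurrence (illustrative, not the paper's code).
import numpy as np

def momentum_ssm_scan(x, A_diag, B, mu):
    """Second-order (momentum) SSM recurrence over one input window.

    x      : (T, d_in)        input window, e.g. accelerometer + gyroscope channels
    A_diag : (d_state,)       diagonal of the state-transition matrix A
    B      : (d_state, d_in)  input projection (assumed form of the elided "..." term)
    mu     : momentum coefficient, 0 <= mu < 1
    """
    T = x.shape[0]
    d_state = A_diag.shape[0]
    h = np.zeros(d_state)                      # hidden state h_t
    v = np.zeros(d_state)                      # velocity (momentum) state v_t
    states = np.empty((T, d_state))
    for t in range(T):
        drive = A_diag * h + B @ x[t]          # first-order Mamba-style update term
        v = mu * v + (1.0 - mu) * drive        # v_t = mu*v_{t-1} + (1-mu)*(A*h_{t-1} + ...)
        h = h + v                              # h_t = h_{t-1} + v_t
        states[t] = h
    return states

# Toy usage: one 2 s window at 50 Hz (100 samples) with 6 IMU channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 6))
A_diag = -0.1 * np.ones(16)                    # small decaying modes keep the scan stable
B = 0.1 * rng.standard_normal((16, 6))
states = momentum_ssm_scan(x, A_diag, B, mu=0.9)
print(states.shape)                            # (100, 16)
```

Setting μ = 0 collapses the velocity step to v_t = A·h_{t‑1} + B·x_t, so the update reduces to a residual first‑order recurrence h_t = (I + A)·h_{t‑1} + B·x_t, which is one way to see how μ controls how much past information is carried forward.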
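
The complex‑momentum variant from step 3 can be illustrated with the same kind of scan, letting μ and the diagonal of A be complex: the phase of μ sets an oscillation frequency for each state dimension and its magnitude sets the decay, which is what makes the memory frequency‑selective. The magnitude‑times‑phase parameterization and the real‑part readout below are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of a complex-valued momentum scan (illustrative parameterization).
import numpy as np

def complex_momentum_scan(x, A_diag, B, mu):
    """Momentum recurrence with complex state; reads out the real part of h_t."""
    T = x.shape[0]
    d_state = A_diag.shape[0]
    h = np.zeros(d_state, dtype=np.complex128)
    v = np.zeros(d_state, dtype=np.complex128)
    out = np.empty((T, d_state))
    for t in range(T):
        drive = A_diag * h + B @ x[t]          # same first-order drive as before
        v = mu * v + (1.0 - mu) * drive        # complex mu: frequency-selective memory
        h = h + v
        out[t] = h.real                        # real-part readout (an assumption)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 6))              # one 2 s window of 6 IMU channels
theta = np.linspace(0.1, np.pi / 2, 16)        # per-mode oscillation frequencies
mu = 0.9 * np.exp(1j * theta)                  # |mu| < 1 keeps the scan stable
A_diag = np.full(16, -0.1 + 0.0j)
B = 0.1 * rng.standard_normal((16, 6)).astype(np.complex128)
out = complex_momentum_scan(x, A_diag, B, mu)
print(out.shape)                               # (100, 16)
```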
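
The preprocessing in step 4 can be sketched as follows: segmenting a raw 6‑channel IMU stream into 2 s windows at 50 Hz and applying jitter and scaling augmentation. The 50 % window overlap and noise scales are illustrative choices, and rotation augmentation is omitted for brevity.

```python
# Sketch of fixed-length windowing plus jitter/scaling augmentation (illustrative settings).
import numpy as np

def make_windows(stream, win_len=100, stride=50):
    """Segment a (T, channels) stream into overlapping (win_len, channels) windows.

    win_len=100 corresponds to the 2 s windows at 50 Hz described above;
    the 50 % overlap (stride=50) is an illustrative choice.
    """
    starts = range(0, stream.shape[0] - win_len + 1, stride)
    return np.stack([stream[s:s + win_len] for s in starts])

def augment(window, rng, jitter_std=0.01, scale_range=0.1):
    """Jitter (additive Gaussian noise) and per-channel scaling; rotation is omitted."""
    jitter = rng.normal(0.0, jitter_std, size=window.shape)
    scale = rng.uniform(1.0 - scale_range, 1.0 + scale_range, size=(1, window.shape[1]))
    return (window + jitter) * scale

rng = np.random.default_rng(0)
stream = rng.standard_normal((1000, 6))        # 20 s of 6-channel IMU data at 50 Hz
windows = make_windows(stream)
augmented = np.stack([augment(w, rng) for w in windows])
print(windows.shape, augmented.shape)          # (19, 100, 6) (19, 100, 6)
```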

Results & Findings

| Dataset | Baseline (Transformer) | Vanilla Mamba | MMA (Momentum) | MMA‑Complex |
| --- | --- | --- | --- | --- |
| UCI HAR | 94.2 % | 94.7 % | 95.6 % | 95.4 % |
| PAMAP2 | 92.1 % | 92.8 % | 94.0 % | 93.8 % |
| HHAR | 88.5 % | 89.1 % | 90.3 % | 90.1 % |

  • Accuracy: MMA consistently outperforms both Transformers and vanilla Mamba by 0.8–1.5 % absolute.
  • Convergence: Reaches peak validation accuracy roughly 30 % faster (i.e., in fewer epochs), thanks to the smoother gradients introduced by the momentum term.
  • Robustness: Under synthetic Gaussian noise at 10 dB SNR, MMA's accuracy drops by only ~0.4 %, versus ~1.2 % for Transformers (a noise‑injection sketch follows these bullets).
  • Efficiency: Training FLOPs increase by ~12 % relative to vanilla Mamba, while inference latency remains linear and well under 5 ms on a mid‑range mobile CPU.
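
The robustness comparison above injects Gaussian noise at a 10 dB signal‑to‑noise ratio. The sketch below shows one common way to do this, scaling the noise power relative to the measured signal power; the paper's exact noise protocol may differ.

```python
# Sketch of injecting Gaussian noise at a target SNR (dB) into a sensor window.
import numpy as np

def add_gaussian_noise_snr(signal, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power = 10**(snr_db / 10)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(0)
window = rng.standard_normal((100, 6))         # one 2 s IMU window at 50 Hz
noisy = add_gaussian_noise_snr(window, snr_db=10.0, rng=rng)
# Verify the realized SNR is close to the 10 dB target.
print(round(10 * np.log10(np.mean(window ** 2) / np.mean((noisy - window) ** 2)), 1))
```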

Practical Implications

  • Edge‑friendly HAR: The linear‑time, low‑memory footprint makes MMA a strong candidate for on‑device activity classification in wearables, smartphones, and IoT gateways.
  • Faster Model Iteration: Faster convergence reduces cloud‑training costs and shortens the time‑to‑market for new activity‑based features.
  • Noise‑Resilient Deployments: Improved robustness means fewer false detections in real‑world scenarios where sensor placement and signal quality vary.
  • Transferable Architecture: Because momentum‑augmented SSMs are generic sequence models, developers can reuse MMA for other time‑series tasks—e.g., predictive maintenance, speech keyword spotting, or financial tick‑data analysis—without redesigning the core network.
  • Simplified Pipeline: MMA eliminates the need for heavy attention‑based layers or deep RNN stacks, streamlining model‑serving stacks and reducing dependency on specialized hardware accelerators.

Limitations & Future Work

  • Second‑Order Overhead: While modest, the added velocity state doubles the hidden‑state size, which may be noticeable on ultra‑low‑power microcontrollers.
  • Complex Momentum Stability: Training with complex‑valued parameters requires careful initialization; the authors note occasional divergence on very long sequences (>10 s).
  • Domain Generalization: Experiments focus on benchmark datasets; real‑world cross‑subject or cross‑device generalization still needs thorough validation.
  • Future Directions: The authors suggest exploring adaptive momentum schedules, hybridizing MMA with lightweight attention for multimodal inputs, and extending the framework to unsupervised pre‑training for sensor data.

Authors

  • Thai‑Khanh Nguyen
  • Uyen Vo
  • Tan M. Nguyen
  • Thieu N. Vo
  • Trung‑Hieu Le
  • Cuong Pham

Paper Information

  • arXiv ID: 2511.21550v1
  • Categories: cs.HC, cs.LG
  • Published: November 26, 2025