[Paper] StretchTime: Adaptive Time Series Forecasting via Symplectic Attention

Published: February 9, 2026 at 01:29 PM EST
Source: arXiv (2602.08983v1)

Overview

The paper introduces StretchTime, a new transformer‑based architecture that can “stretch” or “compress” its sense of time when forecasting time‑series data. By replacing the standard rotary positional encoding with a learnable Symplectic Positional Embedding (SyPE), the model adapts to non‑uniform, warped temporal patterns that appear in finance, IoT sensor streams, health monitoring, and many other real‑world domains.

Key Contributions

  • Formal analysis of positional encoding limits – proves that the popular rotary position embedding (RoPE) cannot represent non‑affine (non‑linear) time warping.
  • Symplectic Positional Embeddings (SyPE) – a novel, Hamiltonian‑inspired encoding that generalizes RoPE from the rotation group SO(2) to the symplectic group Sp(2, ℝ).
  • Adaptive warp module – learns input‑dependent dilation/contraction factors, letting attention heads dynamically rescale temporal coordinates.
  • StretchTime architecture – integrates SyPE into a multivariate forecasting transformer, achieving state‑of‑the‑art results on several benchmark datasets.
  • Robustness to non‑stationary dynamics – demonstrates consistent performance gains on datasets with shifting periodicities and irregular sampling rates.
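The move from SO(2) to Sp(2, ℝ) can be checked numerically. A 2×2 matrix M is symplectic when MᵀJM = J, where J is the standard skew-symmetric form; rotations satisfy this, but so do area-preserving stretches that no rotation can express. A minimal sketch (pure Python, not the paper's code) verifying both:

```python
import math

J = [[0.0, 1.0], [-1.0, 0.0]]  # standard symplectic form on R^2

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

def is_symplectic(m, tol=1e-12):
    # M is symplectic iff M^T J M = J
    r = matmul(matmul(transpose(m), J), m)
    return all(abs(r[i][j] - J[i][j]) < tol
               for i in range(2) for j in range(2))

theta = 0.7
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]   # RoPE-style, in SO(2)

s = 1.3
stretch = [[s, 0.0], [0.0, 1.0 / s]]               # area-preserving stretch

print(is_symplectic(rotation))  # True
print(is_symplectic(stretch))   # True: symplectic, but not a rotation
```

In two dimensions Sp(2, ℝ) coincides with the determinant-one matrices, so the stretch above is a valid positional transform that RoPE's rotations cannot represent.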

Methodology

  1. Problem framing – Traditional transformers treat time as a uniformly spaced index. The authors show that many real series exhibit time‑warped dynamics where the effective “speed of time” changes locally.
  2. Symplectic embedding design
    • Starts from RoPE’s rotation matrix (R(\theta) \in \mathrm{SO}(2)).
    • Extends to a symplectic matrix (S(\phi) \in \mathrm{Sp}(2,\mathbb{R})) that can represent both rotations and area‑preserving shears, giving extra degrees of freedom to model stretching/compressing.
    • The warp factor (\phi_t) is produced by a lightweight neural module that conditions on the raw input at time (t) (e.g., recent values, trend indicators).
  3. Integration with attention – Each token’s positional vector is multiplied by its learned symplectic matrix before entering the scaled‑dot‑product attention. This makes the similarity scores sensitive to the locally warped timeline.
  4. End‑to‑end training – The warp module, SyPE parameters, and the rest of the transformer are optimized jointly with the usual forecasting loss (e.g., MSE or MAE). No extra supervision about the warping function is required.
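The per-pair transform in steps 2–3 can be sketched as follows. This is an illustrative parameterisation, not the paper's exact one: it composes an area-preserving stretch exp(±φ) with the usual RoPE rotation on one (even, odd) feature pair, with a single frequency for brevity. Setting φ = 0 recovers plain RoPE:

```python
import math

def sype_transform(pair, pos, phi, base=10000.0):
    """Apply a symplectic positional transform to one (even, odd) feature pair.

    phi is the input-dependent warp factor (learned end to end in the paper;
    supplied directly here). With phi = 0 this reduces to a RoPE rotation.
    """
    x, y = pair
    theta = pos / base  # one illustrative frequency; RoPE uses a spectrum

    # Area-preserving stretch: determinant exp(phi) * exp(-phi) = 1,
    # so the composite matrix stays in Sp(2, R).
    x, y = math.exp(phi) * x, math.exp(-phi) * y

    # Standard rotary step.
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# A positive phi amplifies one coordinate before rotating, which shifts
# attention similarity scores as if local time ran faster.
print(sype_transform((1.0, 0.0), pos=2.0, phi=0.0))
print(sype_transform((1.0, 0.0), pos=2.0, phi=0.5))
```

Because the stretch and rotation both have determinant one, their product does too, which is what keeps the encoding inside the symplectic group rather than an arbitrary linear map.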

Results & Findings

| Dataset (type) | Baseline (RoPE) | StretchTime (SyPE) | Relative Δ |
|---|---|---|---|
| Electricity (hourly) | 0.112 MAE | 0.094 MAE | −16% |
| Traffic (15‑min) | 0.087 MAE | 0.074 MAE | −15% |
| Exchange‑rate (daily) | 0.021 RMSE | 0.018 RMSE | −14% |
| Synthetic time‑warped series | 0.145 MAE | 0.103 MAE | −29% |
  • Consistent gains across multivariate, univariate, and synthetic benchmarks, especially where the underlying frequency changes over time.
  • Ablation studies confirm that the adaptive warp module contributes the bulk of the improvement; removing it reverts performance close to the RoPE baseline.
  • Robustness tests (e.g., random missing values, irregular sampling) show that StretchTime degrades gracefully, whereas standard transformers suffer larger accuracy drops.

Practical Implications

  • Financial modeling – Traders can feed irregular tick data into StretchTime and obtain forecasts that automatically adjust to market regime shifts (e.g., sudden volatility spikes).
  • IoT & edge analytics – Sensor streams often have variable reporting intervals; StretchTime can handle the irregular cadence without costly resampling.
  • Healthcare monitoring – Physiological signals (heart rate, hormone levels) exhibit circadian drifts; the model can learn patient‑specific rhythm changes in real time.
  • Software integration – SyPE is a drop‑in replacement for RoPE in existing transformer libraries (PyTorch, TensorFlow). The extra parameters are lightweight (< 2 % of total model size), making it feasible for production and even on‑device inference.
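A rough sketch of how the warp factors in such a pipeline might be produced per token. Everything here is hypothetical (the window size, the bounded-tanh parameterisation, the single weight); the paper's module is a learned neural network trained jointly with the forecasting loss, but the shape of the computation is the same: condition on recent values, emit one bounded φ per time step:

```python
import math

def warp_factor(recent_values, weight=0.1):
    """Hypothetical lightweight warp module: map a window of recent inputs
    to a bounded stretch factor phi in (-1, 1). The paper learns this
    mapping end to end; here a fixed tanh of the window mean stands in."""
    mean = sum(recent_values) / len(recent_values)
    return math.tanh(weight * mean)

def warp_factors(series, window=3):
    """One phi per time step, conditioned on a trailing window of values,
    so irregular or regime-shifting streams get step-wise warp factors
    without any resampling."""
    phis = []
    for t in range(len(series)):
        ctx = series[max(0, t - window + 1): t + 1]
        phis.append(warp_factor(ctx))
    return phis

# A volatility spike in the tail pushes the warp factor up for those steps.
print(warp_factors([0.1, 0.2, 0.1, 5.0, 6.0]))
```

Each φ would then feed the symplectic positional transform for its token before attention, leaving the rest of the transformer untouched.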

Limitations & Future Work

  • Computational overhead – The adaptive warp module adds a small per‑token cost; on very long sequences (≥ 10 k steps) latency may become noticeable.
  • Interpretability – While the warp factor is learned, the paper does not provide a systematic way to extract human‑readable warping curves from the model.
  • Scope of benchmarks – Experiments focus on standard academic datasets; broader industry‑scale evaluations (e.g., high‑frequency trading, large‑scale smart‑city sensor networks) are left for future studies.
  • Extension to other modalities – The authors suggest exploring SyPE for video frame rate adaptation or irregular text streams, which remains an open research direction.

StretchTime shows that a modest, physics‑inspired tweak to positional encoding can unlock a new level of adaptability for time‑series transformers, making them far more useful for the messy, non‑uniform data that developers encounter every day.

Authors

  • Yubin Kim
  • Viresh Pati
  • Jevon Twitty
  • Vinh Pham
  • Shihao Yang
  • Jiecheng Lu

Paper Information

  • arXiv ID: 2602.08983v1
  • Categories: cs.LG, cs.AI
  • Published: February 9, 2026
