[Paper] StretchTime: Adaptive Time Series Forecasting via Symplectic Attention

Published: February 9, 2026 at 01:29 PM EST
Source: arXiv (2602.08983v1)

Overview

The paper introduces StretchTime, a new transformer‑based architecture that can “stretch” or “compress” its sense of time when forecasting time‑series data. By replacing the standard rotary positional encoding with a learnable Symplectic Positional Embedding (SyPE), the model adapts to non‑uniform, warped temporal patterns that appear in finance, IoT sensor streams, health monitoring, and many other real‑world domains.

Key Contributions

  • Formal analysis of positional encoding limits – proves that the popular rotary position embedding (RoPE) cannot represent non‑affine (non‑linear) time warping.
  • Symplectic Positional Embeddings (SyPE) – a novel, Hamiltonian‑inspired encoding that generalizes RoPE from the rotation group SO(2) to the symplectic group Sp(2, ℝ).
  • Adaptive warp module – learns input‑dependent dilation/contraction factors, letting attention heads dynamically rescale temporal coordinates.
  • StretchTime architecture – integrates SyPE into a multivariate forecasting transformer, achieving state‑of‑the‑art results on several benchmark datasets.
  • Robustness to non‑stationary dynamics – demonstrates consistent performance gains on datasets with shifting periodicities and irregular sampling rates.
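The move from SO(2) to Sp(2, ℝ) can be checked numerically. A 2×2 matrix M is symplectic when MᵀJM = J, where J is the standard skew-symmetric form; rotations satisfy this, but so do area-preserving stretches that no rotation can express. A minimal sketch (pure Python, not the paper's code) verifying both:

```python
import math

J = [[0.0, 1.0], [-1.0, 0.0]]  # standard symplectic form on R^2

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

def is_symplectic(m, tol=1e-12):
    # M is symplectic iff M^T J M = J
    r = matmul(matmul(transpose(m), J), m)
    return all(abs(r[i][j] - J[i][j]) < tol
               for i in range(2) for j in range(2))

theta = 0.7
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]   # RoPE-style, in SO(2)

s = 1.3
stretch = [[s, 0.0], [0.0, 1.0 / s]]               # area-preserving stretch

print(is_symplectic(rotation))  # True
print(is_symplectic(stretch))   # True: symplectic, but not a rotation
```

In two dimensions Sp(2, ℝ) coincides with the determinant-one matrices, so the stretch above is a valid positional transform that RoPE's rotations cannot represent.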

Methodology

  1. Problem framing – Traditional transformers treat time as a uniformly spaced index. The authors show that many real series exhibit time‑warped dynamics where the effective “speed of time” changes locally.
  2. Symplectic embedding design
    • Starts from RoPE’s rotation matrix (R(\theta) \in \mathrm{SO}(2)).
    • Extends to a symplectic matrix (S(\phi) \in \mathrm{Sp}(2,\mathbb{R})) that can represent both rotations and area‑preserving shears, giving extra degrees of freedom to model stretching/compressing.
    • The warp factor (\phi_t) is produced by a lightweight neural module that conditions on the raw input at time (t) (e.g., recent values, trend indicators).
  3. Integration with attention – Each token’s positional vector is multiplied by its learned symplectic matrix before entering the scaled‑dot‑product attention. This makes the similarity scores sensitive to the locally warped timeline.
  4. End‑to‑end training – The warp module, SyPE parameters, and the rest of the transformer are optimized jointly with the usual forecasting loss (e.g., MSE or MAE). No extra supervision about the warping function is required.
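The per-pair transform in steps 2–3 can be sketched as follows. This is an illustrative parameterisation, not the paper's exact one: it composes an area-preserving stretch exp(±φ) with the usual RoPE rotation on one (even, odd) feature pair, with a single frequency for brevity. Setting φ = 0 recovers plain RoPE:

```python
import math

def sype_transform(pair, pos, phi, base=10000.0):
    """Apply a symplectic positional transform to one (even, odd) feature pair.

    phi is the input-dependent warp factor (learned end to end in the paper;
    supplied directly here). With phi = 0 this reduces to a RoPE rotation.
    """
    x, y = pair
    theta = pos / base  # one illustrative frequency; RoPE uses a spectrum

    # Area-preserving stretch: determinant exp(phi) * exp(-phi) = 1,
    # so the composite matrix stays in Sp(2, R).
    x, y = math.exp(phi) * x, math.exp(-phi) * y

    # Standard rotary step.
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# A positive phi amplifies one coordinate before rotating, which shifts
# attention similarity scores as if local time ran faster.
print(sype_transform((1.0, 0.0), pos=2.0, phi=0.0))
print(sype_transform((1.0, 0.0), pos=2.0, phi=0.5))
```

Because the stretch and rotation both have determinant one, their product does too, which is what keeps the encoding inside the symplectic group rather than an arbitrary linear map.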

Results & Findings

| Dataset (type) | Baseline (RoPE) | StretchTime (SyPE) | Relative Δ |
|---|---|---|---|
| Electricity (hourly) | 0.112 MAE | 0.094 MAE | −16% |
| Traffic (15‑min) | 0.087 MAE | 0.074 MAE | −15% |
| Exchange‑rate (daily) | 0.021 RMSE | 0.018 RMSE | −14% |
| Synthetic time‑warped series | 0.145 MAE | 0.103 MAE | −29% |
  • Consistent gains across multivariate, univariate, and synthetic benchmarks, especially where the underlying frequency changes over time.
  • Ablation studies confirm that the adaptive warp module contributes the bulk of the improvement; removing it reverts performance close to the RoPE baseline.
  • Robustness tests (e.g., random missing values, irregular sampling) show that StretchTime degrades gracefully, whereas standard transformers suffer larger accuracy drops.

Practical Implications

  • Financial modeling – Traders can feed irregular tick data into StretchTime and obtain forecasts that automatically adjust to market regime shifts (e.g., sudden volatility spikes).
  • IoT & edge analytics – Sensor streams often have variable reporting intervals; StretchTime can handle the irregular cadence without costly resampling.
  • Healthcare monitoring – Physiological signals (heart rate, hormone levels) exhibit circadian drifts; the model can learn patient‑specific rhythm changes in real time.
  • Software integration – SyPE is a drop‑in replacement for RoPE in existing transformer libraries (PyTorch, TensorFlow). The extra parameters are lightweight (< 2 % of total model size), making it feasible for production and even on‑device inference.
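A rough sketch of how the warp factors in such a pipeline might be produced per token. Everything here is hypothetical (the window size, the bounded-tanh parameterisation, the single weight); the paper's module is a learned neural network trained jointly with the forecasting loss, but the shape of the computation is the same: condition on recent values, emit one bounded φ per time step:

```python
import math

def warp_factor(recent_values, weight=0.1):
    """Hypothetical lightweight warp module: map a window of recent inputs
    to a bounded stretch factor phi in (-1, 1). The paper learns this
    mapping end to end; here a fixed tanh of the window mean stands in."""
    mean = sum(recent_values) / len(recent_values)
    return math.tanh(weight * mean)

def warp_factors(series, window=3):
    """One phi per time step, conditioned on a trailing window of values,
    so irregular or regime-shifting streams get step-wise warp factors
    without any resampling."""
    phis = []
    for t in range(len(series)):
        ctx = series[max(0, t - window + 1): t + 1]
        phis.append(warp_factor(ctx))
    return phis

# A volatility spike in the tail pushes the warp factor up for those steps.
print(warp_factors([0.1, 0.2, 0.1, 5.0, 6.0]))
```

Each φ would then feed the symplectic positional transform for its token before attention, leaving the rest of the transformer untouched.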

Limitations & Future Work

  • Computational overhead – The adaptive warp module adds a small per‑token cost; on very long sequences (≥ 10 k steps) latency may become noticeable.
  • Interpretability – While the warp factor is learned, the paper does not provide a systematic way to extract human‑readable warping curves from the model.
  • Scope of benchmarks – Experiments focus on standard academic datasets; broader industry‑scale evaluations (e.g., high‑frequency trading, large‑scale smart‑city sensor networks) are left for future studies.
  • Extension to other modalities – The authors suggest exploring SyPE for video frame rate adaptation or irregular text streams, which remains an open research direction.

StretchTime shows that a modest, physics‑inspired tweak to positional encoding can unlock a new level of adaptability for time‑series transformers, making them far more useful for the messy, non‑uniform data that developers encounter every day.

Authors

  • Yubin Kim
  • Viresh Pati
  • Jevon Twitty
  • Vinh Pham
  • Shihao Yang
  • Jiecheng Lu

Paper Information

  • arXiv ID: 2602.08983v1
  • Categories: cs.LG, cs.AI
  • Published: February 9, 2026
