[Paper] Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data

Published: February 12, 2026
5 min read
Source: arXiv


Overview

The paper introduces Flow‑Guided Neural Operator (FGNO), a self‑supervised learning framework that treats the amount of corruption applied to a time‑series as a learnable “flow” rather than a fixed masking ratio. By blending operator learning with flow‑matching techniques, FGNO can extract multi‑scale representations from a single model and achieve state‑of‑the‑art results on several biomedical time‑series benchmarks.

Key Contributions

  • Dynamic corruption as a learning signal – replaces static masking ratios with a continuous flow that gradually adds noise, giving the model a richer supervisory signal.
  • Operator‑based architecture – leverages neural operators to learn mappings in functional space, enabling the model to handle varying temporal resolutions via the Short‑Time Fourier Transform (STFT).
  • Hierarchical feature extraction – taps into multiple network layers and multiple flow times, producing representations that span low‑level patterns to high‑level global context.
  • Clean‑input inference – although training uses noisy inputs, representations are extracted from pristine data, eliminating stochastic inference noise and boosting downstream accuracy.
  • Strong empirical gains – up to 35 % AUROC improvement on neural signal decoding, 16 % RMSE reduction on skin‑temperature prediction, and >20 % boost in accuracy/Macro‑F1 on sleep‑stage classification under low‑data regimes.
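The first contribution, corruption as a continuous flow, can be sketched with a simple linear blend between the clean signal and Gaussian noise. This is a minimal illustration of the idea, assuming a linear schedule common in flow matching; the paper's exact schedule may differ, and `flow_corrupt` is a hypothetical helper name.

```python
import numpy as np

def flow_corrupt(x, t, rng):
    """Blend Gaussian noise into a clean signal x.

    t = 0 returns the clean signal, t = 1 pure noise; intermediate
    values give a continuum of corruption levels, replacing a single
    fixed masking ratio. (Linear schedule assumed for illustration.)
    """
    noise = rng.standard_normal(x.shape)
    return (1.0 - t) * x + t * noise

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 256))  # toy clean signal
# The model sees many corruption levels of the same sample, not one mask.
views = {t: flow_corrupt(x, t, rng) for t in (0.1, 0.5, 0.9)}
```

Because the corruption level is a continuous scalar, the supervisory signal densely covers the whole noisy-to-clean trajectory rather than a single masked view.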

Methodology

  1. Pre‑processing with STFT – each raw time‑series is transformed into a time‑frequency map, normalizing different sampling rates and resolutions into a common functional representation.
  2. Flow‑guided corruption – a flow parameter t ∈ [0, 1] controls how much Gaussian‑type noise is blended into the input. Instead of a single mask, the model sees a continuum of corrupted versions, learning to predict the clean signal as a function of t.
  3. Neural Operator core – the backbone is a neural operator (e.g., Fourier Neural Operator) that learns a mapping from the corrupted functional input to the clean output across the entire flow. Because operators act on functions rather than fixed‑size vectors, the same network can process sequences of varying length and sampling frequency.
  4. Multi‑level representation read‑out – during training, hidden states from several layers and several flow times are stored. At inference, a downstream task can pick the most appropriate combination (e.g., early layers for short‑term patterns, deeper layers for long‑term trends).
  5. Self‑supervised objective – a simple reconstruction loss (e.g., MSE between predicted and clean STFT) is applied across all flow times, encouraging the model to learn a smooth trajectory from noisy to clean representations.
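Steps 1, 2, and 5 can be tied together in a short sketch: an STFT-style pre-processing step, corruption at several flow times, and a reconstruction loss averaged over those times. This is an illustrative toy, not the paper's implementation: `simple_stft` is a bare-bones NumPy spectrogram, and the identity `model` merely stands in for the neural-operator backbone (e.g., an FNO).

```python
import numpy as np

def simple_stft(x, win=64, hop=32):
    """Step 1 (toy version): raw series -> magnitude time-frequency map."""
    frames = [x[i:i + win] for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def reconstruction_loss(model, clean_tf, ts, rng):
    """Steps 2 & 5: corrupt the clean STFT at several flow times t and
    average the MSE between the model's prediction and the clean map.
    `model(corrupted, t)` stands in for the neural-operator backbone."""
    losses = []
    for t in ts:
        noise = rng.standard_normal(clean_tf.shape)
        corrupted = (1.0 - t) * clean_tf + t * noise
        pred = model(corrupted, t)
        losses.append(np.mean((pred - clean_tf) ** 2))
    return float(np.mean(losses))

# Toy check with an identity "operator": the loss is zero only at t = 0,
# where no noise is blended in.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 512))
tf_map = simple_stft(x)
identity = lambda z, t: z
loss = reconstruction_loss(identity, tf_map, ts=[0.25, 0.5, 0.75], rng=rng)
```

In the actual framework the operator conditions on t and learns the full denoising trajectory; hidden states captured at different layers and flow times then serve as the multi-level representations described in step 4.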

Results & Findings

| Dataset (Domain) | Metric | Baseline | FGNO | Relative Gain |
|---|---|---|---|---|
| BrainTreeBank (neural signals) | AUROC | 0.71 | 0.96 | +35 % |
| DREAMT (skin temperature) | RMSE | 0.84 °C | 0.71 °C | −16 % |
| SleepEDF (sleep staging) | Accuracy / Macro‑F1 | 0.68 / 0.62 | 0.84 / 0.78 | >+20 % |
  • Gains are especially pronounced when only a small fraction of labeled data is available (e.g., 5 % of the full training set).
  • Ablation studies show that (i) using a static mask degrades performance by ~10 %, (ii) discarding the flow dimension reduces AUROC by ~8 %, and (iii) extracting representations from noisy inputs at test time hurts accuracy by ~4 %.
  • The operator‑based design proves robust to irregular sampling and missing values, common in biomedical recordings.

Practical Implications

  • Plug‑and‑play pre‑training – Developers can pre‑train FGNO on any unlabeled sensor stream (IoT, wearables, industrial logs) and fine‑tune a lightweight head for classification, regression, or anomaly detection.
  • Reduced labeling cost – Because FGNO thrives under data scarcity, teams can achieve high performance with far fewer annotated samples, accelerating product cycles for health‑tech and predictive‑maintenance solutions.
  • Unified model for heterogeneous time‑scales – The STFT + operator pipeline means the same model can ingest high‑frequency ECG, medium‑frequency temperature, or low‑frequency environmental data without redesign.
  • Deterministic inference – Clean‑input representation extraction eliminates randomness, simplifying deployment in latency‑critical or safety‑critical systems (e.g., bedside monitoring).
  • Potential for edge deployment – The underlying neural operator can be distilled or quantized, making FGNO a candidate for on‑device inference on low‑power microcontrollers.
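The plug-and-play pattern above amounts to freezing the pre-trained encoder and fitting a lightweight head on its representations. A minimal sketch of such a head, assuming the features have already been extracted by a frozen encoder (the closed-form ridge regression and the names `fit_linear_head` / `predict` are illustrative, not from the paper):

```python
import numpy as np

def fit_linear_head(features, targets, ridge=1e-3):
    """Fit a lightweight regression head on frozen pre-trained features
    via closed-form ridge regression (X^T X + ridge*I)^-1 X^T y."""
    X = np.hstack([features, np.ones((len(features), 1))])  # add bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)

def predict(w, features):
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ w

# Toy demo: random vectors stand in for FGNO representations.
rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 16))
y = feats @ rng.standard_normal(16) + 0.3  # synthetic linear target
w = fit_linear_head(feats, y)
```

For classification one would swap the ridge head for logistic regression or a small MLP; the point is that only the head is trained, so labeled-data needs stay small.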

Limitations & Future Work

  • Computational overhead – Training requires generating multiple corrupted versions per sample and performing STFTs, which can be memory‑intensive for very long sequences.
  • Domain specificity of the flow schedule – The current Gaussian‑based flow may not be optimal for highly non‑Gaussian noise patterns (e.g., bursty network traffic).
  • Limited evaluation outside biomedicine – While results are impressive on physiological data, broader benchmarks (finance, speech, IoT) are needed to confirm generality.
  • Future directions suggested by the authors include:
    1. Learning the flow dynamics themselves (instead of fixing a Gaussian schedule).
    2. Integrating attention‑style token mixing to better capture long‑range dependencies.
    3. Exploring multi‑modal extensions where the operator jointly processes synchronized sensor streams.

Authors

  • Duy Nguyen
  • Jiachen Yao
  • Jiayun Wang
  • Julius Berner
  • Animashree Anandkumar

Paper Information

  • arXiv ID: 2602.12267v1
  • Categories: cs.LG
  • Published: February 12, 2026