[Paper] Evo-TFS: Evolutionary Time-Frequency Domain-Based Synthetic Minority Oversampling Approach to Imbalanced Time Series Classification

Published: January 3, 2026 at 05:38 AM EST

Source: arXiv - 2601.01150v1

Overview

Time‑series classification (TSC) underpins applications from predictive maintenance to health monitoring, but most state‑of‑the‑art models assume that the training data are balanced across classes. In real‑world deployments the minority class is often the one that matters most (e.g., a fault event), yet it is under‑represented, so deep‑learning classifiers tend to miss it. The Evo‑TFS paper introduces an oversampling technique that synthesises realistic minority‑class series by evolving them in both the time and frequency domains, which the authors report improves classification performance on imbalanced TSC problems.

Key Contributions

  • Evolutionary Oversampling Framework – Leverages strongly‑typed Genetic Programming (GP) to generate synthetic time‑series samples that respect both temporal and spectral properties.
  • Dual‑Domain Fitness Function – Combines time‑domain similarity (e.g., shape, amplitude) with frequency‑domain metrics (e.g., power spectral density) to guide the GP evolution toward high‑quality, diverse series.
  • Domain‑Agnostic Design – Works with any downstream classifier (CNN, LSTM, shapelet‑based, or frequency‑domain models) without requiring model‑specific tweaks.
  • Comprehensive Empirical Evaluation – Benchmarks Evo‑TFS against classic oversamplers (SMOTE, ADASYN) and recent time‑series‑specific methods on multiple imbalanced datasets, showing statistically significant gains.
  • Open‑Source Implementation – The authors release the GP‑based oversampler as a Python package, facilitating easy integration into existing ML pipelines.
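
To make the "strongly‑typed" idea concrete, here is a minimal sketch of a typed primitive set, assuming only two value types (a time‑domain series and a frequency‑domain spectrum). The primitive names (`shift`, `mix_time`, `lowpass`, and so on) and the tree encoding are hypothetical illustrations, not the operators or API of the authors' package. The point is only that a random program is grown so that every node receives arguments of the type it expects.

```python
import random
import numpy as np

TIME, FREQ = "time", "freq"  # the two value types in this toy grammar

# name -> (function, argument types, return type); all names are hypothetical
PRIMITIVES = {
    "to_freq": (np.fft.rfft, (TIME,), FREQ),
    "to_time": (np.fft.irfft, (FREQ,), TIME),  # assumes even-length series
    "shift": (lambda x: np.roll(x, 1), (TIME,), TIME),
    "mix_time": (lambda a, b: 0.5 * (a + b), (TIME, TIME), TIME),
    "mix_freq": (lambda a, b: 0.5 * (a + b), (FREQ, FREQ), FREQ),
    "lowpass": (
        lambda s: np.where(np.arange(len(s)) < len(s) // 2, s, 0),
        (FREQ,),
        FREQ,
    ),
}


def random_tree(ret_type, depth, n_seeds, rng):
    """Grow a random expression whose output type is ret_type."""
    if depth == 0 or rng.random() < 0.3:
        # Terminal: a real minority series, viewed in the requested domain.
        return ("seed", ret_type, rng.randrange(n_seeds))
    # Only primitives that return the requested type are eligible.
    candidates = [n for n, (_, _, out) in PRIMITIVES.items() if out == ret_type]
    name = rng.choice(candidates)
    arg_types = PRIMITIVES[name][1]
    return (name, [random_tree(t, depth - 1, n_seeds, rng) for t in arg_types])


def evaluate(tree, seeds):
    """Run a typed expression tree on the seed (real minority) series."""
    if tree[0] == "seed":
        _, typ, idx = tree
        return seeds[idx] if typ == TIME else np.fft.rfft(seeds[idx])
    name, children = tree
    func = PRIMITIVES[name][0]
    return func(*(evaluate(child, seeds) for child in children))


rng = random.Random(0)
base = np.sin(np.linspace(0, 4 * np.pi, 64))
seeds = np.stack([base, 0.8 * base, 1.2 * base])  # three toy minority series
tree = random_tree(TIME, depth=3, n_seeds=len(seeds), rng=rng)
synthetic = evaluate(tree, seeds)
print(tree)
print(synthetic.shape)
```

Because the root is requested with type `TIME`, every generated program returns a real‑valued series of the original length, however the intermediate nodes move between domains.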

Methodology

  1. Data Representation – Each original minority‑class series is transformed into two parallel representations: the raw time‑domain signal and its Fourier‑based frequency spectrum.
  2. Strongly‑Typed Genetic Programming – A population of candidate programs (individuals) is initialized. Each program defines a recipe for constructing a new series by combining primitive operations (e.g., scaling, shifting, windowing) that are type‑checked to ensure valid time‑domain or frequency‑domain manipulations.
  3. Fitness Evaluation – For every candidate series, two scores are computed:
    • Time‑Domain Score: measures shape similarity to real minority samples using Dynamic Time Warping (DTW) and statistical moments.
    • Frequency‑Domain Score: assesses spectral similarity via cosine similarity of power spectra and preservation of dominant frequency components.
    The overall fitness is a weighted sum of these scores, encouraging candidates that look realistic in both domains.
  4. Evolutionary Operators – Standard GP operators (crossover, mutation) are applied, respecting type constraints, to evolve the population over several generations. The best individuals are selected as synthetic minority samples.
  5. Integration with Classifiers – The generated series are appended to the training set, and any off‑the‑shelf TSC model can be trained as usual.
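
The fitness evaluation in step 3 can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the paper's exact formulation: the time‑domain score uses only DTW (the statistical moments are omitted), the frequency‑domain score uses only cosine similarity of power spectra (the dominant‑frequency check is omitted), and the function names and the single weight `w_time` are hypothetical.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping with absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def power_spectrum(x):
    """Squared magnitude of the real FFT."""
    return np.abs(np.fft.rfft(x)) ** 2


def time_score(candidate, minority):
    """Maps mean DTW distance to (0, 1]; higher means closer in shape."""
    mean_dtw = np.mean([dtw_distance(candidate, s) for s in minority])
    return 1.0 / (1.0 + mean_dtw)


def freq_score(candidate, minority):
    """Mean cosine similarity between power spectra, in [0, 1]."""
    p = power_spectrum(candidate)
    sims = []
    for s in minority:
        q = power_spectrum(s)
        sims.append(p @ q / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12))
    return float(np.mean(sims))


def fitness(candidate, minority, w_time=0.5):
    """Weighted sum of the two scores; w_time is an assumed hyper-parameter."""
    return float(
        w_time * time_score(candidate, minority)
        + (1.0 - w_time) * freq_score(candidate, minority)
    )


rng = np.random.default_rng(0)
t = np.linspace(0, 1, 64, endpoint=False)
minority = [np.sin(2 * np.pi * 5 * t) + 0.1 * rng.normal(size=t.size) for _ in range(4)]
realistic = np.sin(2 * np.pi * 5 * t + 0.3)  # same frequency, shifted phase
noise = rng.normal(size=t.size)              # no shared structure
print(fitness(realistic, minority), fitness(noise, minority))
```

In this toy setup the phase‑shifted sinusoid scores higher than white noise in both domains, which is the behaviour a dual‑domain fitness is meant to reward.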

Results & Findings

  • Classification Boost: Across 12 publicly available imbalanced TSC benchmarks, Evo‑TFS raised the macro‑F1 score by an average of 7.4% over the next‑best oversampler.
  • Model‑Agnostic Gains: Both deep models (CNN, LSTM) and classic shapelet‑based classifiers saw improvements, confirming that the synthetic data are useful regardless of the downstream architecture.
  • Diversity Preservation: Spectral analysis showed that Evo‑TFS generated a broader range of frequency patterns than SMOTE‑based methods, reducing overfitting to a narrow set of synthetic examples.
  • Statistical Significance: Paired Wilcoxon signed‑rank tests (p < 0.01) indicate that the observed performance gains are unlikely to be due to chance.
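
Since the headline result is reported in macro‑F1, it may help to see why that metric suits imbalanced problems better than accuracy. The sketch below implements the standard macro‑F1 definition (unweighted mean of per‑class F1) in NumPy; the labels are made‑up toy data, not results from the paper.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over the classes in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


# Toy data: 95 majority samples (label 0) and 5 minority samples (label 1).
y_true = np.array([0] * 95 + [1] * 5)

# A classifier that ignores the minority class entirely.
always_majority = np.zeros(100, dtype=int)

# A classifier that finds 3 of the 5 minority samples at the cost of 2 false alarms.
with_detections = np.zeros(100, dtype=int)
with_detections[[95, 96, 97]] = 1  # true positives
with_detections[[0, 1]] = 1        # false alarms

for name, pred in [("always majority", always_majority), ("with detections", with_detections)]:
    acc = float(np.mean(pred == y_true))
    print(f"{name}: accuracy={acc:.3f}, macro-F1={macro_f1(y_true, pred):.3f}")
```

The two classifiers differ by only one point of accuracy, yet macro‑F1 separates them sharply because the ignored minority class contributes a per‑class F1 of zero.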

Practical Implications

  • Fault Detection & Predictive Maintenance – Engineers can now train more reliable anomaly detectors even when failure events are scarce, reducing downtime and maintenance costs.
  • Healthcare Time‑Series (ECG, Wearables) – Better minority‑class synthesis helps identify rare pathological patterns without needing massive labeled datasets.
  • Financial Time‑Series Anomaly Spotting – Traders can improve detection of rare market manipulations or flash crashes, enhancing risk‑management systems.
  • Plug‑and‑Play Integration – Since Evo‑TFS outputs standard NumPy arrays, developers can drop it into existing pipelines (scikit‑learn, PyTorch, TensorFlow) with a single function call.
  • Reduced Data Collection Burden – Organizations may be able to reach strong model performance without the expensive effort of gathering more minority samples, shortening time‑to‑market for AI‑enabled products.
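
As a rough picture of the plug‑and‑play step, the sketch below appends synthetic minority series to a training set stored as NumPy arrays. The helper name `augment_with_synthetic` is hypothetical, and `X_synth` is a stand‑in for whatever an oversampler returns; this summary does not specify the package's actual function names, so none are assumed here.

```python
import numpy as np


def augment_with_synthetic(X_train, y_train, X_synth, minority_label, seed=0):
    """Append synthetic minority series to the training split and shuffle."""
    X_aug = np.concatenate([X_train, X_synth], axis=0)
    y_synth = np.full(len(X_synth), minority_label, dtype=y_train.dtype)
    y_aug = np.concatenate([y_train, y_synth])
    perm = np.random.default_rng(seed).permutation(len(X_aug))
    return X_aug[perm], y_aug[perm]


rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 64))     # 100 series of length 64
y_train = np.array([0] * 95 + [1] * 5)   # 5 minority samples
X_synth = rng.normal(size=(90, 64))      # placeholder for oversampler output

X_aug, y_aug = augment_with_synthetic(X_train, y_train, X_synth, minority_label=1)
print(X_aug.shape, np.bincount(y_aug))
```

Oversampling is normally applied to the training split only, so that validation and test scores still reflect the real class distribution.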

Limitations & Future Work

  • Computational Overhead – The GP evolution step is more expensive than simple interpolation‑based oversamplers; scaling to very large datasets may require parallelisation or surrogate fitness approximations.
  • Parameter Sensitivity – The balance between time‑ and frequency‑domain fitness weights can affect results; automated hyper‑parameter tuning is an open challenge.
  • Domain Specificity – While the method is generic, certain domains (e.g., irregularly sampled sensor streams) may need custom primitives or preprocessing steps.
  • Future Directions – The authors plan to explore hybrid evolutionary‑GAN approaches for faster sample generation, incorporate multivariate time‑series extensions, and evaluate on streaming/online learning scenarios.
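
For a sense of the cost gap behind the first limitation, here is what an interpolation‑based oversampler does, sketched as the standard SMOTE idea applied to raw series values. This is a baseline for comparison, not part of Evo‑TFS, and the function name `smote_like` is hypothetical. Each synthetic sample needs one neighbour lookup and one linear interpolation, with no population or generations of fitness evaluations.

```python
import numpy as np


def smote_like(minority, n_new, k=3, seed=0):
    """SMOTE-style interpolation between a sample and one of its k neighbours.

    Requires k < len(minority).
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    # Pairwise Euclidean distances between minority series.
    diffs = minority[:, None, :] - minority[None, :, :]
    dist = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synth = np.empty((n_new, minority.shape[1]))
    for row in range(n_new):
        i = rng.integers(len(minority))
        j = neighbours[i, rng.integers(k)]
        lam = rng.random()  # position along the segment between the two series
        synth[row] = minority[i] + lam * (minority[j] - minority[i])
    return synth


t = np.linspace(0, 1, 64, endpoint=False)
minority = np.stack([np.sin(2 * np.pi * 5 * t + phase) for phase in (0.0, 0.2, 0.4, 0.6, 0.8)])
synthetic = smote_like(minority, n_new=10)
print(synthetic.shape)
```

Because every output is a point‑wise blend of two existing series, such methods are cheap but can only produce samples on line segments between real ones, which is consistent with the narrower diversity reported for SMOTE‑based methods above.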

Authors

  • Wenbin Pei
  • Ruohao Dai
  • Bing Xue
  • Mengjie Zhang
  • Qiang Zhang
  • Yiu-Ming Cheung

Paper Information

  • arXiv ID: 2601.01150v1
  • Categories: cs.LG, cs.NE
  • Published: January 3, 2026