[Paper] ShapeCond: Fast Shapelet-Guided Dataset Condensation for Time Series Classification
Source: arXiv - 2602.09008v1
Overview
Time‑series datasets are exploding in size—from high‑frequency financial ticks to minute‑by‑minute climate sensors—making storage, transmission, and model training increasingly costly. ShapeCond tackles this head‑on by learning a tiny synthetic training set that still captures the essential “shape” patterns (shapelets) needed for accurate classification. The result is a condensation method that is both much faster than prior approaches and more accurate on downstream tasks.
Key Contributions
- Shapelet‑guided condensation: Introduces a novel optimization that explicitly preserves discriminative local motifs (shapelets) during synthetic data generation.
- Length‑independent synthesis cost: The computational burden does not grow with sequence length, yielding up to 29× speed‑ups over the previous state‑of‑the‑art (CondTSC).
- Scalable to very long series: Demonstrated to run up to 10,000× faster than naïve shapelet‑based methods on a 3,000‑timestep Sleep dataset.
- State‑of‑the‑art accuracy: Consistently outperforms existing time‑series condensation techniques across a broad benchmark suite.
- Open‑source implementation: Fully reproducible code released on GitHub, encouraging adoption and further research.
Methodology
Shapelet Extraction:
- The pipeline first mines a compact set of highly discriminative shapelets from the original training series using a fast, greedy search.
- These shapelets act as “anchors” that capture the most informative local patterns for each class; a minimal mining sketch follows.
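As a concrete illustration of this step, here is a minimal Python sketch of greedy shapelet mining. It assumes univariate series in a NumPy array X of shape (n_series, T) with binary integer labels y; the random candidate sampling and the mean‑distance‑gap score are stand‑ins for the paper's actual search strategy and quality criterion.

```python
import numpy as np

def min_dist(shapelet: np.ndarray, series: np.ndarray) -> float:
    """Smallest Euclidean distance between the shapelet and any
    window of the series (the standard shapelet 'activation')."""
    l = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, l)
    return float(np.min(np.linalg.norm(windows - shapelet, axis=1)))

def mine_shapelets(X, y, length=20, n_shapelets=10, n_candidates=200, seed=0):
    """Greedily keep the candidate subsequences whose distance profiles
    best separate the two classes (illustrative heuristic)."""
    rng = np.random.default_rng(seed)
    # Sample random candidate subsequences instead of enumerating all of them.
    cands = [X[i, s:s + length]
             for i, s in zip(rng.integers(len(X), size=n_candidates),
                             rng.integers(X.shape[1] - length + 1,
                                          size=n_candidates))]
    scored = []
    for c in cands:
        d = np.array([min_dist(c, x) for x in X])
        # Two-class heuristic: gap between per-class mean distances.
        scored.append((abs(d[y == 0].mean() - d[y == 1].mean()), c))
    scored.sort(key=lambda t: -t[0])            # best separators first
    return [c for _, c in scored[:n_shapelets]]
```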
Guided Synthetic Generation:
- Instead of directly optimizing synthetic series to match the full dataset (as in image‑centric methods), ShapeCond optimizes a small synthetic set to reproduce the responses of the extracted shapelets.
- The loss function combines a standard classification loss (e.g., cross‑entropy on a proxy model) with a shapelet similarity term that forces the synthetic series to trigger the same shapelet activations as the originals; a sketch of this combined objective follows.
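Below is a minimal PyTorch sketch of the combined objective. The function names, the use of a min sliding‑window distance as the “activation,” and the per‑class target activations target_act (precomputed on the real data) are illustrative assumptions rather than the paper's exact definitions; model is any proxy classifier mapping a (batch, T) tensor of series to class logits.

```python
import torch
import torch.nn.functional as F

def shapelet_activations(x: torch.Tensor, shapelets) -> torch.Tensor:
    """Min sliding-window distance of each shapelet to each series in x.
    x: (batch, T); returns a (batch, n_shapelets) activation matrix."""
    acts = []
    for s in shapelets:                       # each s is a 1-D tensor
        windows = x.unfold(1, len(s), 1)      # (batch, n_windows, len(s))
        dists = torch.linalg.vector_norm(windows - s, dim=-1)
        acts.append(dists.min(dim=1).values)  # best match per series
    return torch.stack(acts, dim=1)

def condensation_loss(model, x_syn, y_syn, shapelets, target_act, lam=1.0):
    """Cross-entropy on a proxy model plus a shapelet-alignment term."""
    ce = F.cross_entropy(model(x_syn), y_syn)
    act = shapelet_activations(x_syn, shapelets)
    align = F.mse_loss(act, target_act[y_syn])  # match class-wise targets
    return ce + lam * align
```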
Length‑Independent Optimization:
- Because the shapelet term depends only on the positions and values of a handful of short subsequences, the gradient computation scales with the number of shapelets, not the full series length (made concrete after this list).
- This design enables the condensation process to stay fast even when dealing with thousands of timesteps.
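To make the scaling concrete (an illustrative reading of the claim above, not a formula from the paper): with K shapelets of length ℓ, the shapelet term touches only the K·ℓ values inside the matched windows of each synthetic series, so its gradient cost is roughly O(K·ℓ) per series, independent of the series length T. Objectives that match full‑length statistics or training trajectories pay at least O(T) per series instead.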
Iterative Refinement:
- The synthetic set is updated via stochastic gradient descent, alternating between improving classification performance and tightening shapelet alignment.
- Early stopping is guided by a held‑out validation subset to avoid over‑fitting the tiny synthetic set; a minimal loop sketch follows.
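The sketch below shows one way this refinement loop could look, reusing condensation_loss from the previous snippet. The alternating loss weighting, optimizer settings, and the hypothetical train_and_score helper (which trains a throwaway classifier on the current synthetic set and returns its validation accuracy) are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def condense(model, x_syn, y_syn, shapelets, target_act, train_and_score,
             steps=2000, lr=1e-2, eval_every=50, patience=5):
    x_syn = x_syn.clone().requires_grad_(True)  # the synthetic set is learnable
    opt = torch.optim.Adam([x_syn], lr=lr)
    best_acc, best_x, bad_evals = 0.0, x_syn.detach().clone(), 0
    for step in range(steps):
        opt.zero_grad()
        # Alternate emphasis between classification and shapelet alignment.
        lam = 1.0 if step % 2 == 0 else 0.1
        condensation_loss(model, x_syn, y_syn, shapelets,
                          target_act, lam=lam).backward()
        opt.step()
        if (step + 1) % eval_every == 0:
            # Early stopping: periodically score the synthetic set on held-out
            # data via the hypothetical train_and_score helper.
            acc = train_and_score(x_syn.detach(), y_syn)
            if acc > best_acc:
                best_acc, best_x, bad_evals = acc, x_syn.detach().clone(), 0
            else:
                bad_evals += 1
                if bad_evals >= patience:
                    break
    return best_x
```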
Results & Findings
| Dataset (length, timesteps) | CondTSC Acc. | ShapeCond Acc. | Synthesis speed‑up |
|---|---|---|---|
| ECG200 (96) | 78.3 % | 84.1 % | 12× |
| Sleep (3,000) | 71.5 % | 78.9 % | 10,000× |
| UCR‑HAR (128) | 88.2 % | 90.7 % | 29× |
- Accuracy gains of 3–7 percentage points over the best prior condenser, especially pronounced on long‑sequence datasets.
- Synthesis time drops from hours (CondTSC) to minutes or seconds, making condensation feasible as a pre‑processing step in real pipelines.
- Ablation studies confirm that the shapelet‑guided term is the primary driver of both speed and performance improvements.
Practical Implications
- Faster model prototyping: Developers can now generate a tiny, high‑fidelity training set in minutes, enabling rapid iteration on model architectures or hyper‑parameters without loading the full dataset.
- Edge and IoT deployment: Condensed datasets can be shipped to constrained devices (e.g., wearables, embedded sensors) where bandwidth and storage are limited, yet the model still learns the critical patterns.
- Data‑privacy & compliance: Synthetic data that preserves only discriminative motifs reduces the risk of exposing raw user‑level time‑series, easing GDPR‑type concerns.
- Cost‑effective cloud training: Training on a 0.5 %‑sized synthetic set can cut GPU hours dramatically, translating to lower cloud bills for large‑scale time‑series services.
- Cross‑domain adaptability: Because shapelets are domain‑agnostic (they capture local shape, not absolute values), ShapeCond can be applied to finance, health monitoring, industrial IoT, and more with minimal tuning.
Limitations & Future Work
- Shapelet discovery overhead: While the condensation step is fast, the initial shapelet mining still incurs a cost proportional to the original dataset size; scaling this step to millions of series remains an open challenge.
- Class imbalance sensitivity: The current formulation assumes roughly balanced classes; heavily skewed datasets may need additional weighting or sampling strategies.
- Extension to multivariate series: The paper focuses on univariate time series; adapting the shapelet‑guided loss to handle multiple synchronized channels is a natural next step.
- Integration with downstream pipelines: Future work could explore joint training where the condenser and the final classifier are co‑optimized, potentially squeezing even more performance out of the synthetic set.
ShapeCond demonstrates that respecting the unique temporal structure of time‑series—specifically, the power of shapelets—can unlock both speed and accuracy gains in dataset condensation. For developers wrestling with ever‑growing sensor streams, this approach offers a practical path to leaner, faster, and more privacy‑friendly machine‑learning pipelines.
Authors
- Sijia Peng
- Yun Xiong
- Xi Chen
- Yi Xie
- Guanzhi Li
- Yanwei Yu
- Yangyong Zhu
- Zhiqiang Shen
Paper Information
- arXiv ID: 2602.09008v1
- Categories: cs.LG
- Published: February 9, 2026