[Paper] Generating Financial Time Series by Matching Random Convolutional Features
Source: arXiv - 2606.05138v1
Overview
The paper introduces SOCK (SOft Competing Kernels), a fully differentiable random‑convolutional feature extractor that can be used to train generative models for financial time series. By matching these random features between real and synthetic data, the authors achieve more realistic synthetic price paths—especially when only a handful of historical trajectories are available.
Key Contributions
- Differentiable Random Convolutional Features: Proposes SOCK, the first random‑convolutional map that is end‑to‑end differentiable, enabling gradient‑based training of generators.
- Improved Generator Training: Shows that matching SOCK features yields generators that consistently beat state‑of‑the‑art baselines based on path signatures and diffusion models on small‑sample financial datasets.
- Broad Empirical Validation: Demonstrates SOCK’s versatility on two‑sample hypothesis testing and time‑series classification, where it matches or exceeds existing unsupervised feature maps (e.g., ROCKET, Hydra).
- Practical Toolkit: Provides an open‑source implementation that integrates easily with popular deep‑learning frameworks (PyTorch, TensorFlow).
Methodology
- Random Convolutional Kernels: SOCK draws a large set of 1‑D convolutional kernels from a simple distribution (e.g., Gaussian). Each kernel is applied to the input series, followed by a non‑linear pooling (e.g., max, mean).
- Soft Competition Layer: To make the whole pipeline differentiable, the authors replace the hard arg‑max selection used by competing‑kernel transforms in the ROCKET family (e.g., Hydra) with a softmax‑weighted combination of kernel responses. This “soft competition” preserves the expressive power of random convolutions while allowing gradients to flow back to the generator.
- Feature Matching Objective: A generator \(G\) receives a noise vector and outputs a synthetic series. The loss is the squared distance between the average SOCK feature vectors of real data \(\{x_i\}\) and generated data \(\{G(z_j)\}\):
\[ \mathcal{L}_{\text{SOCK}} = \Big\| \frac{1}{N}\sum_i \phi_{\text{SOCK}}(x_i) - \frac{1}{M}\sum_j \phi_{\text{SOCK}}(G(z_j)) \Big\|_2^2 \]
where \(\phi_{\text{SOCK}}\) denotes the differentiable random‑conv feature map.
- Training Loop: The generator is updated with standard stochastic gradient descent (or Adam) using \(\mathcal{L}_{\text{SOCK}}\). No discriminator is required, sidestepping over‑fitting issues common in GAN‑style adversarial training with tiny datasets.
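The steps above can be sketched in a few lines of NumPy. This is only an illustration of the idea as summarized here, not the authors' implementation: the function names, the `temperature` parameter, the kernel count and width, the mean pooling over time, and the toy data are all assumptions. It computes the forward pass only; actual generator training would need an autodiff framework.

```python
import numpy as np

def soft_competition_features(x, kernels, temperature=1.0):
    """Hypothetical SOCK-like feature map (an assumption, not the paper's code).

    x:       array of shape (n_series, length)
    kernels: array of shape (n_kernels, width), drawn once and kept fixed
    returns: array of shape (n_series, n_kernels)
    """
    width = kernels.shape[1]
    # All sliding windows of each series: (n_series, n_positions, width)
    windows = np.lib.stride_tricks.sliding_window_view(x, width, axis=1)
    # Response of every kernel at every position: (n_series, n_positions, n_kernels)
    responses = windows @ kernels.T
    # Soft competition: softmax across kernels instead of a hard arg-max
    scaled = responses / temperature
    scaled -= scaled.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scaled)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Pool over time: the average "win share" of each kernel
    return weights.mean(axis=1)

def feature_matching_loss(real, fake, kernels, temperature=1.0):
    """Squared distance between the mean feature vectors of two sample sets."""
    mu_real = soft_competition_features(real, kernels, temperature).mean(axis=0)
    mu_fake = soft_competition_features(fake, kernels, temperature).mean(axis=0)
    return float(np.sum((mu_real - mu_fake) ** 2))

rng = np.random.default_rng(0)
kernels = rng.standard_normal((64, 9))                  # Gaussian random kernels
real = rng.standard_normal((30, 250)).cumsum(axis=1)    # toy random walks, not market data
fake = 3.0 * rng.standard_normal((30, 250))             # toy "generated" series

feats = soft_competition_features(real, kernels)
loss_same = feature_matching_loss(real, real, kernels)  # identical sets give zero loss
loss_diff = feature_matching_loss(real, fake, kernels)
```

As the temperature shrinks, the softmax weights approach a one-hot arg-max, so in this sketch the temperature controls how "hard" the competition between kernels is.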
Results & Findings
| Dataset (samples) | Signature baseline (KS‑stat) | Diffusion baseline (KS‑stat) | SOCK‑trained generator (KS‑stat) |
|---|---|---|---|
| S&P 500 daily (30) | 0.71 | 0.68 | 0.84 |
| FX EUR/USD (50) | 0.66 | 0.62 | 0.80 |
| Crypto BTC (20) | 0.59 | 0.55 | 0.77 |
- Higher statistical similarity: SOCK‑trained generators score higher on the Kolmogorov–Smirnov (KS) based metric reported in the table and achieve lower Wasserstein distances, indicating synthetic series whose distributions are closer to those of the real data.
- Robustness to sample size: Performance gains are most pronounced when the training set contains fewer than 100 trajectories—a regime typical for proprietary financial data.
- Classification & Two‑sample tests: When SOCK features are used as embeddings for downstream tasks, they reach 92 % accuracy on the UCR “ElectricDevices” benchmark and outperform ROCKET on a two‑sample test with a 5 % significance level.
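For readers unfamiliar with the metric, the two-sample KS statistic is the largest gap between two empirical CDFs. The sketch below is a plain NumPy illustration with a hypothetical helper name. It is not the paper's evaluation code, and this summary does not state how the scores in the table are derived from it. Note that the raw statistic is a distance: 0 means the empirical distributions coincide and 1 means they are fully separated.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(t) - ECDF_b(t)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The maximum gap is always attained at one of the observed points
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

identical = ks_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # 0.0
disjoint = ks_statistic([0.0, 1.0], [2.0, 3.0])             # 1.0
```

In practice this would typically be applied to marginal quantities such as the log-returns of real versus generated paths.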
Practical Implications
- Synthetic Data for Stress‑Testing: Banks and fintechs can generate realistic price paths for Monte‑Carlo risk simulations without needing massive historical archives.
- Data Augmentation for ML Pipelines: Developers building predictive models (e.g., volatility forecasting, algorithmic trading) can augment scarce training data with high‑fidelity synthetic series, improving model generalization.
- Privacy‑Preserving Sharing: Financial institutions can share SOCK‑generated datasets with partners or regulators while mitigating disclosure risk, since the generator does not memorize exact historical trajectories.
- Plug‑and‑Play Integration: Because SOCK is just a set of random convolutions followed by a softmax pooling, it can be dropped into existing PyTorch/TensorFlow training loops with a single line of code—no custom CUDA kernels required.
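To make the integration point concrete, here is a minimal PyTorch sketch of a discriminator-free training loop built around a feature-matching loss. The released toolkit's API is not shown in this summary, so everything here is hypothetical: `sock_like_features`, the tiny MLP generator, the kernel shapes, the learning rate, and the toy data are all assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def sock_like_features(x, kernels, temperature=1.0):
    """Hypothetical stand-in for a SOCK feature map (not the authors' API).

    x: (batch, length); kernels: (n_kernels, width), fixed random weights.
    """
    # conv1d expects (batch, channels, length) and (out_channels, in_channels, width)
    responses = F.conv1d(x.unsqueeze(1), kernels.unsqueeze(1))
    # Soft competition across kernels at each time position
    weights = torch.softmax(responses / temperature, dim=1)
    return weights.mean(dim=2)  # (batch, n_kernels)

kernels = torch.randn(32, 9)                      # random kernels, never trained
real = 0.1 * torch.randn(30, 64).cumsum(dim=1)    # toy random walks, not market data

# Hypothetical generator: noise vector of size 16 -> series of length 64
generator = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.Tanh(), torch.nn.Linear(64, 64)
)
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-2)

with torch.no_grad():
    target = sock_like_features(real, kernels).mean(dim=0)  # mean real features

losses = []
for step in range(50):
    z = torch.randn(30, 16)
    fake = generator(z)
    # Feature-matching loss: squared distance between mean feature vectors
    loss = (sock_like_features(fake, kernels).mean(dim=0) - target).pow(2).sum()
    optimizer.zero_grad()
    loss.backward()   # gradients reach the generator through the softmax
    optimizer.step()
    losses.append(loss.item())
```

Only the generator's parameters are optimized; the kernels stay fixed, which is what keeps the feature map "random" rather than learned in this sketch.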
Limitations & Future Work
- Randomness Dependency: While SOCK is differentiable, its performance still hinges on the number and distribution of random kernels; selecting these hyper‑parameters may require modest tuning.
- Scope to Other Domains: The study focuses on short‑term financial series; extending SOCK to longer‑horizon macro‑economic time series or high‑frequency tick data remains an open question.
- Theoretical Guarantees: The paper provides empirical evidence of expressiveness but lacks a formal analysis of why soft competition preserves the discriminative power of hard max‑pooling.
- Future Directions: The authors suggest exploring learned (instead of purely random) kernel initializations, combining SOCK with adversarial discriminators for hybrid training, and applying the method to multi‑asset joint generation.
Authors
- Konrad J. Mueller
- Nikita Zozoulenko
- Ben Wood
- Thomas Cass
- Lukas Gonon
Paper Information
- arXiv ID: 2606.05138v1
- Categories: cs.LG, q-fin.ST
- Published: June 3, 2026