[Paper] TabICLv2: A better, faster, scalable, and open tabular foundation model
Source: arXiv - 2602.11139v1
Overview
TabICLv2 is the latest “foundation model” for tabular data, pushing the limits of what large‑scale, pre‑trained models can do on spreadsheets, CSVs, and relational tables. By combining a richer synthetic data generator, smarter architecture tweaks, and a new optimizer, the authors show that a single model can beat heavily‑tuned ensembles on both regression and classification tasks—while staying fast enough to run on a single GPU with < 50 GB memory.
Key Contributions
- Diverse synthetic pre‑training engine – automatically creates millions of varied tabular datasets (different column types, missingness patterns, feature interactions) to expose the model to a broad “world” of tables.
- Scalable softmax‑in‑attention – a novel attention formulation that keeps the computational cost low for long feature sequences, enabling the model to handle millions of rows without exploding memory.
- Muon optimizer – replaces the standard AdamW during pre‑training, delivering faster convergence and better generalisation on downstream tabular tasks.
- State‑of‑the‑art performance – on the TabArena and TALENT benchmarks, TabICLv2 outperforms RealTabPFN‑2.5 even though the latter uses hyper‑parameter tuning, ensembling, and fine‑tuning on real data.
- Open‑source release – inference code and pretrained weights are publicly available, with the synthetic data engine and training scripts promised soon.
Methodology
1. Synthetic Data Generation
- The authors built a pipeline that samples random schemas (numeric, categorical, datetime, text), injects realistic noise (missing values, outliers), and creates target variables using a mix of linear, tree‑based, and neural functions.
- This yields a high‑diversity pre‑training corpus that mimics the heterogeneity seen in real‑world tables, reducing the need for massive labeled datasets.
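The generation recipe described above can be sketched as a toy function (a minimal illustration only; the function name, parameters, and distributions here are invented for the example and are not the authors' actual engine):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_table(n_rows=256, n_num=4, n_cat=2, missing_rate=0.1):
    """Sample one toy synthetic dataset: numeric + categorical features,
    a nonlinear target, then injected outliers and missing values."""
    X_num = rng.normal(size=(n_rows, n_num))
    X_cat = rng.integers(0, 5, size=(n_rows, n_cat)).astype(float)

    # Target: random linear mix passed through a nonlinearity, plus noise
    # (stand-in for the paper's mix of linear/tree/neural target functions)
    w = rng.normal(size=n_num + n_cat)
    z = np.concatenate([X_num, X_cat], axis=1) @ w
    y = np.tanh(z) + 0.1 * rng.normal(size=n_rows)

    # Inject outliers into a few numeric cells, then random missingness
    out_idx = rng.integers(0, n_rows, size=max(1, n_rows // 50))
    X_num[out_idx, 0] *= 10.0
    mask = rng.random(X_num.shape) < missing_rate
    X_num[mask] = np.nan

    return np.concatenate([X_num, X_cat], axis=1), y
```

A pre-training loop would draw thousands of such tables, varying the schema and target function family each time.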
2. Model Architecture
- TabICLv2 is a transformer‑style encoder that treats each column as a token and each row as a “sequence”.
- The scalable softmax‑in‑attention computes attention over rows in a chunked fashion, avoiding the quadratic blow‑up of classic self‑attention while preserving the ability to capture long‑range dependencies across rows.
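The memory-saving idea can be illustrated with a streaming ("online") softmax that processes keys and values one chunk at a time, so the full n-by-n score matrix is never materialized. This is a generic memory-efficient attention sketch in the spirit of the paper's mechanism, not its exact formulation:

```python
import numpy as np

def chunked_attention(Q, K, V, chunk=128):
    """Softmax attention computed one key/value chunk at a time,
    keeping a running max and normalizer per query row."""
    n, d = Q.shape
    out = np.zeros((n, V.shape[1]))
    running_max = np.full(n, -np.inf)
    running_sum = np.zeros(n)
    for start in range(0, K.shape[0], chunk):
        Kc, Vc = K[start:start + chunk], V[start:start + chunk]
        scores = Q @ Kc.T / np.sqrt(d)                     # (n, chunk)
        new_max = np.maximum(running_max, scores.max(axis=1))
        correction = np.exp(running_max - new_max)          # rescale old state
        p = np.exp(scores - new_max[:, None])
        out = out * correction[:, None] + p @ Vc
        running_sum = running_sum * correction + p.sum(axis=1)
        running_max = new_max
    return out / running_sum[:, None]
```

Peak memory grows with the chunk size rather than with the number of rows, which is what makes million-row tables feasible on one GPU.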
3. Training Protocol
- Pre‑training runs for a modest number of steps (relative to earlier TabPFN models) using the Muon optimizer, which orthogonalizes the momentum‑based updates of weight matrices (via Newton–Schulz iterations) rather than relying on AdamW's per‑parameter adaptive learning rates.
- No task‑specific fine‑tuning is performed; the model is evaluated directly via in‑context learning: a few example rows + a query row are fed to the model, which predicts the target.
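The in-context protocol (labeled rows condition a single forward pass; no weight updates) can be mimicked with a deliberately simple similarity-weighted stand-in. The `icl_predict` function below is invented for illustration and only mirrors the interface, not the model's actual transformer predictor:

```python
import numpy as np

def icl_predict(context_X, context_y, query_X, temperature=1.0):
    """Stand-in for in-context prediction: each query row is predicted
    directly from the labeled context rows in one pass, with no training.
    A real TabICL-style transformer replaces this weighted average."""
    # Squared distances between every query row and every context row
    d2 = ((query_X[:, None, :] - context_X[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / temperature)
    w /= w.sum(axis=1, keepdims=True)
    return w @ context_y
```

The key property shared with the real model: swapping in a different context (different labeled rows) changes the predictions instantly, with no retraining step.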
4. Evaluation
- Benchmarks: TabArena (a collection of 100+ public tabular datasets) and TALENT (large‑scale, million‑row tables).
- Metrics: standard regression (RMSE, R²) and classification (accuracy, F1) scores, plus inference latency and GPU memory footprint.
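For reference, the two regression metrics can be computed directly from their standard definitions (this is generic code, not taken from the paper's evaluation harness):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```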
Results & Findings
| Benchmark / resource | Metric | TabICLv2 | RealTabPFN‑2.5 (tuned) |
|---|---|---|---|
| TabArena (avg.) | Accuracy / R² (higher is better) | +3.2 % over baseline | – |
| TALENT (million‑row) | Inference time per 10 k rows (lower is better) | 0.42 s | 1.18 s |
| GPU memory | Peak usage (lower is better) | ≈ 45 GB | ≈ 70 GB |
- No hyper‑parameter tuning: TabICLv2’s out‑of‑the‑box performance beats the tuned RealTabPFN‑2.5, demonstrating the strength of the synthetic pre‑training diversity.
- Scalability: The new attention mechanism lets the model ingest tables with > 1 M rows on a single GPU, a regime where previous tabular foundation models either crashed or required multi‑GPU setups.
- Ablation studies confirm that each pillar (synthetic engine, attention tweak, Muon optimizer) contributes a measurable boost (≈ 1–2 % each) to the final score.
Practical Implications
- Rapid prototyping: Data scientists can drop TabICLv2 into a notebook, feed a handful of labeled rows, and obtain high‑quality predictions without spending time on feature engineering or model selection.
- Single‑GPU deployment: Because inference fits within 50 GB of GPU memory and completes with sub‑second latency, the model can be served from SaaS platforms, internal ML APIs, or (for smaller tables) high‑end workstation GPUs.
- Cost‑effective scaling: Companies dealing with massive logs, IoT telemetry, or click‑stream data can now apply a single, pre‑trained model instead of training separate gradient‑boosted trees for each dataset.
- Open‑source ecosystem: With the code and weights released, the community can extend the synthetic generator to domain‑specific schemas (e.g., finance, healthcare) and fine‑tune TabICLv2 for niche regulatory constraints.
Limitations & Future Work
- Synthetic‑real gap: Although the synthetic engine is diverse, certain domain‑specific quirks (e.g., time‑series autocorrelation, hierarchical categorical encodings) may still be under‑represented, potentially limiting performance on highly specialized tables.
- Interpretability: Like most transformer‑based models, TabICLv2 offers limited insight into feature importance compared with classic tree models; integrating post‑hoc explainability tools will be essential for regulated industries.
- Training compute: While inference is cheap, the pre‑training phase still requires several GPU‑days; future work could explore further optimizer or curriculum‑learning tricks to reduce this cost.
- Extension to multimodal tables: The current design assumes homogeneous column types; extending the architecture to handle embedded images, free‑form text, or graph‑structured columns is an open research direction.
Bottom line: TabICLv2 demonstrates that a well‑designed synthetic pre‑training pipeline, paired with clever architectural tweaks, can deliver a “plug‑and‑play” tabular model that rivals heavily‑engineered baselines—opening the door for faster, more scalable data science workflows across the industry.
Authors
- Jingang Qu
- David Holzmüller
- Gaël Varoquaux
- Marine Le Morvan
Paper Information
- arXiv ID: 2602.11139v1
- Categories: cs.LG
- Published: February 11, 2026