[Paper] Can TabPFN Compete with GNNs for Node Classification via Graph Tabularization?

Published: December 9, 2025 at 11:51 AM EST
4 min read
Source: arXiv - 2512.08798v1

Overview

The paper asks a simple yet provocative question: can a powerful tabular-learning foundation model replace graph-specific neural networks for node classification? By converting graph structure into rich tabular features and feeding them to TabPFN, a pretrained transformer that excels on tabular data, the authors show performance on par with, and sometimes better than, state-of-the-art Graph Neural Networks (GNNs), especially on graphs where the usual homophily assumption breaks down.

Key Contributions

  • TabPFN‑GN pipeline: A systematic way to “tabularize” a graph by concatenating node attributes, structural descriptors, positional encodings, and optionally smoothed neighborhood aggregates.
  • Zero‑shot node classification: Leverages the pretrained TabPFN model directly, requiring no graph‑specific fine‑tuning or large language‑model back‑ends.
  • Extensive benchmarking: Experiments on 12 widely used node‑classification datasets (both homophilous and heterophilous) demonstrate competitive or superior accuracy compared to leading GNN architectures.
  • Empirical insight: Shows that well‑engineered tabular features can capture enough graph information to close the gap between tabular and graph domains, challenging the belief that dedicated GNNs are always necessary.
  • Open‑source reproducibility: The authors release code and feature‑engineering scripts, enabling practitioners to try the approach on their own graphs.

Methodology

  1. Feature Extraction

    • Node attributes: Original feature vectors (if any).
    • Structural properties: Degree, clustering coefficient, PageRank, eigenvector centrality, etc.
    • Positional encodings: Laplacian eigenvectors or random‑walk based embeddings that give each node a coordinate in a low‑dimensional space.
    • Neighborhood smoothing (optional): Apply a few rounds of graph diffusion (e.g., personalized PageRank or simple averaging) to blend neighbor information into the node’s feature vector.
  2. Tabularization

    • Concatenate all of the above descriptors into a single flat vector per node, yielding a classic tabular dataset: each row is a node, the columns are engineered features, and the target is the node label (a minimal sketch follows this list).
  3. Model Inference

    • Feed the resulting table into TabPFN, a transformer‑based model pretrained on millions of synthetic tabular tasks.
    • TabPFN predicts class probabilities in a zero‑shot fashion—no additional gradient updates are performed.
  4. Evaluation

    • Compare accuracy (and sometimes F1) against GNN baselines (GCN, GAT, GraphSAGE, H2GCN, etc.) under identical train/validation/test splits.
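
To make steps 1–2 concrete, here is a minimal sketch of the tabularization using networkx, numpy, and scipy. This is not the authors' released code: the function name `tabularize`, the eight Laplacian-eigenvector positional encodings, and the two mean-aggregation smoothing rounds are illustrative defaults, not the paper's exact configuration.

```python
# Illustrative graph-tabularization sketch (not the paper's released code).
import networkx as nx
import numpy as np
from scipy.sparse.linalg import eigsh

def tabularize(G, X_attr=None, k_pe=8, smooth_rounds=2):
    """Build one flat feature row per node: attributes + structural
    descriptors + positional encodings (+ optional smoothed attributes)."""
    nodes = list(G.nodes())
    n = len(nodes)

    # Structural descriptors: degree, clustering, PageRank, eigenvector centrality.
    deg = dict(G.degree())
    clust = nx.clustering(G)
    pr = nx.pagerank(G)
    eig = nx.eigenvector_centrality_numpy(G)
    structural = np.array([[deg[v], clust[v], pr[v], eig[v]] for v in nodes])

    # Positional encodings: low-frequency eigenvectors of the normalized Laplacian.
    L = nx.normalized_laplacian_matrix(G, nodelist=nodes).astype(float)
    _, pe = eigsh(L, k=min(k_pe, n - 2), which="SM")

    blocks = [structural, pe]
    if X_attr is not None:
        blocks.insert(0, X_attr)
        # Optional neighborhood smoothing: a few rounds of mean aggregation.
        A = nx.to_scipy_sparse_array(G, nodelist=nodes, dtype=float)
        d_inv = 1.0 / np.maximum(A.sum(axis=1), 1.0)
        H = X_attr
        for _ in range(smooth_rounds):
            H = d_inv[:, None] * (A @ H)
        blocks.append(H)

    return np.concatenate(blocks, axis=1)  # rows = nodes, columns = features
```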

The pipeline is deliberately lightweight: once the features are computed (a one‑time O(|E|) operation), inference is just a forward pass through TabPFN, which runs on a single GPU or even CPU for modest graph sizes.
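
Concretely, that forward pass can be sketched with the open-source tabpfn package's scikit-learn-style interface, reusing the `tabularize` sketch above; the Karate Club graph, random attributes, and 20-node labeled split below are toy stand-ins, not the paper's setup.

```python
# Toy end-to-end run with the open-source tabpfn package (illustrative data).
import networkx as nx
import numpy as np
from tabpfn import TabPFNClassifier

G = nx.karate_club_graph()                        # small stand-in graph
rng = np.random.default_rng(0)
X_attr = rng.normal(size=(G.number_of_nodes(), 16))
y = np.array([int(G.nodes[v]["club"] != "Mr. Hi") for v in G])

X = tabularize(G, X_attr)                         # from the sketch above

idx = rng.permutation(len(y))
train, test = idx[:20], idx[20:]                  # labeled context vs. queries

clf = TabPFNClassifier(device="cpu")              # CPU suffices for modest graphs
clf.fit(X[train], y[train])                       # stores in-context examples only
pred = clf.predict(X[test])
print("test accuracy:", (pred == y[test]).mean())
```

Note that `fit` here only stores the labeled rows as TabPFN's in-context examples; no gradient updates ever touch the graph data.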

Results & Findings

| Dataset type | Homophily | TabPFN‑GN accuracy | Best GNN accuracy |
|---|---|---|---|
| Cora, Citeseer, Pubmed | High | ≈ same (±0.5 %) | Slightly higher (≈ 0.3 %) |
| Squirrel, Chameleon | Low | +3–5 % over GNNs | Lower |
| Actor, Cornell, Texas, Wisconsin | Mixed | Competitive (within 1 %) | Comparable |
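
Here, "Homophily" refers to how often connected nodes share a label; a common edge-homophily score (the paper may use a different variant) is the fraction of same-label edges:

$$ h(G) = \frac{\bigl|\{(u,v) \in E : y_u = y_v\}\bigr|}{|E|} $$

Citation networks such as Cora sit near the high end of this scale, while Squirrel and Chameleon sit near the low end.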
  • Homophilous graphs: TabPFN‑GN matches GNNs, confirming that the engineered features preserve the signal that GNNs normally exploit.
  • Heterophilous graphs: TabPFN‑GN consistently outperforms GNNs, likely because the handcrafted structural descriptors capture cross‑class connections that message‑passing GNNs tend to smooth away.
  • Training cost: No back‑propagation on the graph data; the only compute is the one‑time feature extraction and a forward pass through TabPFN (≈ seconds for graphs with ≤ 10k nodes).

Practical Implications

  • Rapid prototyping: Data scientists can spin up a node‑classification model without writing any GNN code or tuning graph‑specific hyperparameters.
  • Resource‑constrained environments: Since TabPFN‑GN avoids expensive GPU training cycles, it’s attractive for edge devices or organizations lacking large compute budgets.
  • Heterophily handling: Many real‑world networks (e.g., fraud detection, recommendation systems) exhibit low homophily; TabPFN‑GN offers a ready‑made alternative that sidesteps the need for specialized heterophilous GNN designs.
  • Integration with existing pipelines: The tabular output can be fed into any downstream system that already consumes CSV/Parquet data, with no need to embed a graph engine (see the snippet after this list).
  • Foundation‑model synergy: Demonstrates that a pretrained tabular foundation model can serve as a “universal learner” across modalities when the right feature engineering is applied, opening doors to similar cross‑modal tricks (e.g., turning text graphs into tabular data).
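
As a small illustration of the integration point above, a hypothetical snippet (the column names and file path are invented) that persists the feature table with pandas; writing Parquet requires an engine such as pyarrow or fastparquet:

```python
# Persist the node-feature table for any CSV/Parquet consumer (illustrative).
import numpy as np
import pandas as pd

X = np.random.rand(100, 32)                  # stand-in for tabularized features
y = np.random.randint(0, 3, size=100)        # stand-in node labels

df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(X.shape[1])])
df["label"] = y
df.to_parquet("node_features.parquet", index=False)  # needs pyarrow/fastparquet
```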

Limitations & Future Work

  • Scalability: Feature extraction still requires O(|E|) operations and memory proportional to the number of nodes; extremely large graphs (millions of nodes) may need sampling or distributed processing.
  • Feature engineering dependency: The approach’s success hinges on the quality of handcrafted descriptors; automated feature learning (e.g., via graph‑aware autoencoders) could reduce manual effort.
  • Static graphs only: The current pipeline assumes a fixed graph; extending to dynamic or streaming graphs would require incremental feature updates.
  • Benchmark breadth: While 12 datasets are solid, more diverse domains (e.g., knowledge graphs, protein interaction networks) would further validate generality.
  • Model interpretability: TabPFN’s predictions are less transparent than classic GNN message‑passing; future work could explore attribution methods tailored to the tabularized graph features.

Overall, the study provides a compelling proof‑of‑concept that “graph tabularization + a strong tabular foundation model” can be a practical, low‑maintenance alternative to bespoke GNN training—especially when dealing with heterophilous networks or limited compute resources.

Authors

  • Jeongwhan Choi
  • Woosung Kang
  • Minseo Kim
  • Jongwoo Kim
  • Noseong Park

Paper Information

  • arXiv ID: 2512.08798v1
  • Categories: cs.LG, cs.AI
  • Published: December 9, 2025