[Paper] Can TabPFN Compete with GNNs for Node Classification via Graph Tabularization?
Source: arXiv - 2512.08798v1
Overview
The paper asks a simple yet provocative question: Can a powerful tabular‑learning foundation model replace graph‑specific neural networks for node classification? By converting graph structures into rich tabular features and feeding them to TabPFN, a pretrained transformer that excels on tabular data, the authors show that you can achieve performance on par with (or even better than) state‑of‑the‑art Graph Neural Networks (GNNs) – especially on graphs where the usual homophily assumption breaks down.
Key Contributions
- TabPFN‑GN pipeline: A systematic way to “tabularize” a graph by concatenating node attributes, structural descriptors, positional encodings, and optionally smoothed neighborhood aggregates.
- Zero‑shot node classification: Leverages the pretrained TabPFN model directly, requiring no graph‑specific fine‑tuning or large language‑model back‑ends.
- Extensive benchmarking: Experiments on 12 widely used node‑classification datasets (both homophilous and heterophilous) demonstrate competitive or superior accuracy compared to leading GNN architectures.
- Empirical insight: Shows that well‑engineered tabular features can capture enough graph information to close the gap between tabular and graph domains, challenging the belief that dedicated GNNs are always necessary.
- Open‑source reproducibility: The authors release code and feature‑engineering scripts, enabling practitioners to try the approach on their own graphs.
Methodology
1. Feature Extraction
- Node attributes: Original feature vectors (if any).
- Structural properties: Degree, clustering coefficient, PageRank, eigenvector centrality, etc.
- Positional encodings: Laplacian eigenvectors or random‑walk based embeddings that give each node a coordinate in a low‑dimensional space.
- Neighborhood smoothing (optional): Apply a few rounds of graph diffusion (e.g., personalized PageRank or simple averaging) to blend neighbor information into the node’s feature vector.
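These descriptors are cheap to compute. A minimal sketch in plain NumPy on a toy 5-node graph (the adjacency matrix, the choice of k = 2 eigenvectors, and all variable names are illustrative, not taken from the paper's code):

```python
import numpy as np

# Toy undirected graph: 5 nodes, symmetric adjacency matrix, no self-loops.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
n = A.shape[0]

# Structural descriptors.
deg = A.sum(axis=1)                          # node degree
tri = np.diag(A @ A @ A) / 2                 # triangles through each node
pairs = np.maximum(deg * (deg - 1) / 2, 1)   # possible neighbor pairs
clustering = tri / pairs                     # local clustering coefficient

# PageRank via power iteration (damping factor 0.85).
P = A / np.maximum(deg, 1)[:, None]          # row-stochastic transition matrix
pr = np.full(n, 1.0 / n)
for _ in range(100):
    pr = 0.15 / n + 0.85 * P.T @ pr

# Positional encodings: eigenvectors of the normalized Laplacian,
# skipping the trivial all-constant eigenvector.
d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1))
L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
eigvals, eigvecs = np.linalg.eigh(L)         # ascending eigenvalues
pos_enc = eigvecs[:, 1:3]                    # k = 2 coordinates per node

# Optional neighborhood smoothing: one round of mean aggregation
# over a stand-in node-attribute matrix X.
X = np.random.default_rng(0).normal(size=(n, 4))
X_smooth = P @ X                             # each row = mean of its neighbors
```

All of these operations touch each edge a constant number of times (the dense matrices here are only for readability; sparse equivalents keep the cost near O(|E|)).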
2. Tabularization
- Concatenate all the above descriptors into a single flat vector per node, yielding a classic tabular dataset: each row is a node, each column an engineered feature, and the target is the node label.
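The concatenation step is a one-liner; a sketch with hypothetical placeholder arrays standing in for the descriptor groups listed above:

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
node_attrs = rng.normal(size=(n, 4))   # original node features
structural = rng.normal(size=(n, 3))   # degree, clustering, PageRank, ...
pos_enc = rng.normal(size=(n, 2))      # Laplacian-eigenvector coordinates
smoothed = rng.normal(size=(n, 4))     # optional diffused neighbor aggregates

# One flat row per node: a classic tabular dataset.
X_table = np.hstack([node_attrs, structural, pos_enc, smoothed])
y = np.array([0, 0, 1, 1, 1])          # node labels (targets)
```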
3. Model Inference
- Feed the resulting table into TabPFN, a transformer‑based model pretrained on millions of synthetic tabular tasks.
- TabPFN predicts class probabilities in a zero‑shot fashion—no additional gradient updates are performed.
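The inference step reduces to a single fit/predict call. The sketch below uses `TabPFNClassifier` from the open-source `tabpfn` package (for TabPFN, `fit` merely stores the labeled rows as in-context examples; no gradient updates occur), with a scikit-learn stand-in so the example also runs where `tabpfn` is not installed. The feature table is synthetic, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X_table = rng.normal(size=(60, 13))        # tabularized nodes (rows = nodes)
y = (X_table[:, 0] > 0).astype(int)        # synthetic labels for illustration
train, test = np.arange(40), np.arange(40, 60)

try:
    # Real TabPFN: a transformer pretrained on synthetic tabular tasks.
    from tabpfn import TabPFNClassifier
    clf = TabPFNClassifier()
except ImportError:
    # Stand-in so the sketch runs without the tabpfn package installed.
    from sklearn.linear_model import LogisticRegression
    clf = LogisticRegression(max_iter=1000)

clf.fit(X_table[train], y[train])          # in-context for TabPFN: no weight updates
proba = clf.predict_proba(X_table[test])   # per-class probabilities
pred = proba.argmax(axis=1)
```

Because no optimizer state or checkpoints are involved, the labeled nodes act purely as context, which is what makes the "zero-shot" framing possible.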
4. Evaluation
- Compare accuracy (and sometimes F1) against GNN baselines (GCN, GAT, GraphSAGE, H2GCN, etc.) under identical train/validation/test splits.
The pipeline is deliberately lightweight: once the features are computed (a one‑time O(|E|) operation), inference is just a forward pass through TabPFN, which runs on a single GPU or even CPU for modest graph sizes.
Results & Findings
| Datasets | Homophily | TabPFN‑GN vs. best GNN |
|---|---|---|
| Cora, Citeseer, Pubmed | High | On par (±0.5 %; GNNs ≈ 0.3 % higher) |
| Squirrel, Chameleon | Low | +3–5 % over GNNs |
| Actor, Cornell, Texas, Wisconsin | Mixed | Competitive (within 1 %) |
- Homophilous graphs: TabPFN‑GN matches GNNs, confirming that the engineered features preserve the signal that GNNs normally exploit.
- Heterophilous graphs: TabPFN‑GN consistently outperforms GNNs, likely because the handcrafted structural descriptors capture cross‑class connections that message‑passing GNNs tend to smooth away.
- Training cost: No back‑propagation on the graph data; the only compute is the one‑time feature extraction and a forward pass through TabPFN (≈ seconds for graphs with ≤ 10k nodes).
Practical Implications
- Rapid prototyping: Data scientists can spin up a node‑classification model without writing any GNN code or tuning graph‑specific hyperparameters.
- Resource‑constrained environments: Since TabPFN‑GN avoids expensive GPU training cycles, it’s attractive for edge devices or organizations lacking large compute budgets.
- Heterophily handling: Many real‑world networks (e.g., fraud detection, recommendation systems) exhibit low homophily; TabPFN‑GN offers a ready‑made alternative that sidesteps the need for specialized heterophilous GNN designs.
- Integration with existing pipelines: The tabular output can be fed into any downstream system that already consumes CSV/Parquet data—no need to embed a graph engine.
- Foundation‑model synergy: Demonstrates that a pretrained tabular foundation model can serve as a “universal learner” across modalities when the right feature engineering is applied, opening doors to similar cross‑modal tricks (e.g., turning text graphs into tabular data).
Limitations & Future Work
- Scalability: Feature extraction still requires O(|E|) operations and memory proportional to the number of nodes; extremely large graphs (millions of nodes) may need sampling or distributed processing.
- Feature engineering dependency: The approach’s success hinges on the quality of handcrafted descriptors; automated feature learning (e.g., via graph‑aware autoencoders) could reduce manual effort.
- Static graphs only: The current pipeline assumes a fixed graph; extending to dynamic or streaming graphs would require incremental feature updates.
- Benchmark breadth: While 12 datasets are solid, more diverse domains (e.g., knowledge graphs, protein interaction networks) would further validate generality.
- Model interpretability: TabPFN’s predictions are less transparent than classic GNN message‑passing; future work could explore attribution methods tailored to the tabularized graph features.
Overall, the study provides a compelling proof‑of‑concept that “graph tabularization + a strong tabular foundation model” can be a practical, low‑maintenance alternative to bespoke GNN training—especially when dealing with heterophilous networks or limited compute resources.
Authors
- Jeongwhan Choi
- Woosung Kang
- Minseo Kim
- Jongwoo Kim
- Noseong Park
Paper Information
- arXiv ID: 2512.08798v1
- Categories: cs.LG, cs.AI
- Published: December 9, 2025