[Paper] Can TabPFN Compete with GNNs for Node Classification via Graph Tabularization?

Published: December 9, 2025 at 11:51 AM EST
4 min read
Source: arXiv - 2512.08798v1

Overview

The paper asks a simple yet provocative question: can a powerful tabular-learning foundation model replace graph-specific neural networks for node classification? By converting graph structure into rich tabular features and feeding them to TabPFN, a pretrained transformer that excels on tabular data, the authors show performance on par with, and sometimes better than, state-of-the-art Graph Neural Networks (GNNs), especially on graphs where the usual homophily assumption breaks down.

Key Contributions

  • TabPFN‑GN pipeline: A systematic way to “tabularize” a graph by concatenating node attributes, structural descriptors, positional encodings, and optionally smoothed neighborhood aggregates.
  • Zero‑shot node classification: Leverages the pretrained TabPFN model directly, requiring no graph‑specific fine‑tuning or large language‑model back‑ends.
  • Extensive benchmarking: Experiments on 12 widely used node‑classification datasets (both homophilous and heterophilous) demonstrate competitive or superior accuracy compared to leading GNN architectures.
  • Empirical insight: Shows that well‑engineered tabular features can capture enough graph information to close the gap between tabular and graph domains, challenging the belief that dedicated GNNs are always necessary.
  • Open‑source reproducibility: The authors release code and feature‑engineering scripts, enabling practitioners to try the approach on their own graphs.

Methodology

  1. Feature Extraction

    • Node attributes: Original feature vectors (if any).
    • Structural properties: Degree, clustering coefficient, PageRank, eigenvector centrality, etc.
    • Positional encodings: Laplacian eigenvectors or random‑walk based embeddings that give each node a coordinate in a low‑dimensional space.
    • Neighborhood smoothing (optional): Apply a few rounds of graph diffusion (e.g., personalized PageRank or simple averaging) to blend neighbor information into the node’s feature vector.
  2. Tabularization

    • Concatenate all of the above descriptors into a single flat vector per node, yielding a classic tabular dataset: each row is a node, the columns are engineered features, and the target is the node label (a minimal sketch follows this list).
  3. Model Inference

    • Feed the resulting table into TabPFN, a transformer‑based model pretrained on millions of synthetic tabular tasks.
    • TabPFN predicts class probabilities in a zero‑shot fashion—no additional gradient updates are performed.
  4. Evaluation

    • Compare accuracy (and sometimes F1) against GNN baselines (GCN, GAT, GraphSAGE, H2GCN, etc.) under identical train/validation/test splits.
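
To make steps 1–2 concrete, here is a minimal sketch of the tabularization using networkx, numpy, and scipy. This is not the authors' released code: the function name `tabularize`, the eight Laplacian-eigenvector positional encodings, and the two mean-aggregation smoothing rounds are illustrative defaults, not the paper's exact configuration.

```python
# Illustrative graph-tabularization sketch (not the paper's released code).
import networkx as nx
import numpy as np
from scipy.sparse.linalg import eigsh

def tabularize(G, X_attr=None, k_pe=8, smooth_rounds=2):
    """Build one flat feature row per node: attributes + structural
    descriptors + positional encodings (+ optional smoothed attributes)."""
    nodes = list(G.nodes())
    n = len(nodes)

    # Structural descriptors: degree, clustering, PageRank, eigenvector centrality.
    deg = dict(G.degree())
    clust = nx.clustering(G)
    pr = nx.pagerank(G)
    eig = nx.eigenvector_centrality_numpy(G)
    structural = np.array([[deg[v], clust[v], pr[v], eig[v]] for v in nodes])

    # Positional encodings: low-frequency eigenvectors of the normalized Laplacian.
    L = nx.normalized_laplacian_matrix(G, nodelist=nodes).astype(float)
    _, pe = eigsh(L, k=min(k_pe, n - 2), which="SM")

    blocks = [structural, pe]
    if X_attr is not None:
        blocks.insert(0, X_attr)
        # Optional neighborhood smoothing: a few rounds of mean aggregation.
        A = nx.to_scipy_sparse_array(G, nodelist=nodes, dtype=float)
        d_inv = 1.0 / np.maximum(A.sum(axis=1), 1.0)
        H = X_attr
        for _ in range(smooth_rounds):
            H = d_inv[:, None] * (A @ H)
        blocks.append(H)

    return np.concatenate(blocks, axis=1)  # rows = nodes, columns = features
```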

The pipeline is deliberately lightweight: once the features are computed (a one‑time O(|E|) operation), inference is just a forward pass through TabPFN, which runs on a single GPU or even CPU for modest graph sizes.
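
Concretely, that forward pass can be sketched with the open-source tabpfn package's scikit-learn-style interface, reusing the `tabularize` sketch above; the Karate Club graph, random attributes, and 20-node labeled split below are toy stand-ins, not the paper's setup.

```python
# Toy end-to-end run with the open-source tabpfn package (illustrative data).
import networkx as nx
import numpy as np
from tabpfn import TabPFNClassifier

G = nx.karate_club_graph()                        # small stand-in graph
rng = np.random.default_rng(0)
X_attr = rng.normal(size=(G.number_of_nodes(), 16))
y = np.array([int(G.nodes[v]["club"] != "Mr. Hi") for v in G])

X = tabularize(G, X_attr)                         # from the sketch above

idx = rng.permutation(len(y))
train, test = idx[:20], idx[20:]                  # labeled context vs. queries

clf = TabPFNClassifier(device="cpu")              # CPU suffices for modest graphs
clf.fit(X[train], y[train])                       # stores in-context examples only
pred = clf.predict(X[test])
print("test accuracy:", (pred == y[test]).mean())
```

Note that `fit` here only stores the labeled rows as TabPFN's in-context examples; no gradient updates ever touch the graph data.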

Results & Findings

| Dataset type | Homophily | TabPFN‑GN accuracy | Best GNN accuracy |
|---|---|---|---|
| Cora, Citeseer, Pubmed | High | ≈ same (±0.5 %) | Slightly higher (≈ 0.3 %) |
| Squirrel, Chameleon | Low | +3–5 % over GNNs | Lower |
| Actor, Cornell, Texas, Wisconsin | Mixed | Competitive (within 1 %) | Comparable |
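
Here, "Homophily" refers to how often connected nodes share a label; a common edge-homophily score (the paper may use a different variant) is the fraction of same-label edges:

$$ h(G) = \frac{\bigl|\{(u,v) \in E : y_u = y_v\}\bigr|}{|E|} $$

Citation networks such as Cora sit near the high end of this scale, while Squirrel and Chameleon sit near the low end.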
  • Homophilous graphs: TabPFN‑GN matches GNNs, confirming that the engineered features preserve the signal that GNNs normally exploit.
  • Heterophilous graphs: TabPFN‑GN consistently outperforms GNNs, likely because the handcrafted structural descriptors capture cross‑class connections that message‑passing GNNs tend to smooth away.
  • Training cost: No back‑propagation on the graph data; the only compute is the one‑time feature extraction and a forward pass through TabPFN (≈ seconds for graphs with ≤ 10k nodes).

Practical Implications

  • Rapid prototyping: Data scientists can spin up a node‑classification model without writing any GNN code or tuning graph‑specific hyperparameters.
  • Resource‑constrained environments: Since TabPFN‑GN avoids expensive GPU training cycles, it’s attractive for edge devices or organizations lacking large compute budgets.
  • Heterophily handling: Many real‑world networks (e.g., fraud detection, recommendation systems) exhibit low homophily; TabPFN‑GN offers a ready‑made alternative that sidesteps the need for specialized heterophilous GNN designs.
  • Integration with existing pipelines: The tabular output can be fed into any downstream system that already consumes CSV/Parquet data, with no need to embed a graph engine (see the snippet after this list).
  • Foundation‑model synergy: Demonstrates that a pretrained tabular foundation model can serve as a “universal learner” across modalities when the right feature engineering is applied, opening doors to similar cross‑modal tricks (e.g., turning text graphs into tabular data).
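
As a small illustration of the integration point above, a hypothetical snippet (the column names and file path are invented) that persists the feature table with pandas; writing Parquet requires an engine such as pyarrow or fastparquet:

```python
# Persist the node-feature table for any CSV/Parquet consumer (illustrative).
import numpy as np
import pandas as pd

X = np.random.rand(100, 32)                  # stand-in for tabularized features
y = np.random.randint(0, 3, size=100)        # stand-in node labels

df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(X.shape[1])])
df["label"] = y
df.to_parquet("node_features.parquet", index=False)  # needs pyarrow/fastparquet
```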

Limitations & Future Work

  • Scalability: Feature extraction still requires O(|E|) operations and memory proportional to the number of nodes; extremely large graphs (millions of nodes) may need sampling or distributed processing.
  • Feature engineering dependency: The approach’s success hinges on the quality of handcrafted descriptors; automated feature learning (e.g., via graph‑aware autoencoders) could reduce manual effort.
  • Static graphs only: The current pipeline assumes a fixed graph; extending to dynamic or streaming graphs would require incremental feature updates.
  • Benchmark breadth: While 12 datasets are solid, more diverse domains (e.g., knowledge graphs, protein interaction networks) would further validate generality.
  • Model interpretability: TabPFN’s predictions are less transparent than classic GNN message‑passing; future work could explore attribution methods tailored to the tabularized graph features.

Overall, the study provides a compelling proof‑of‑concept that “graph tabularization + a strong tabular foundation model” can be a practical, low‑maintenance alternative to bespoke GNN training—especially when dealing with heterophilous networks or limited compute resources.

Authors

  • Jeongwhan Choi
  • Woosung Kang
  • Minseo Kim
  • Jongwoo Kim
  • Noseong Park

Paper Information

  • arXiv ID: 2512.08798v1
  • Categories: cs.LG, cs.AI
  • Published: December 9, 2025