[Paper] Which Algorithms Can Graph Neural Networks Learn?

Published: February 13, 2026 at 12:09 PM EST
5 min read

Source: arXiv - 2602.13106v1

Overview

A new theoretical study investigates exactly what kinds of classic graph algorithms can be learned by message‑passing graph neural networks (MPNNs). By linking formal learning guarantees with algorithmic tasks such as shortest‑path, minimum‑spanning‑tree, and knapsack, the authors clarify when an MPNN trained on tiny graphs will reliably generalise to much larger instances—an essential step toward embedding algorithmic reasoning into real‑world AI pipelines.

Key Contributions

  • General learning framework: Provides sufficient conditions for an MPNN to learn a target algorithm from a finite training set of small graphs and to approximate it on arbitrarily large inputs.
  • Broad algorithm coverage: Shows the framework applies to single‑source shortest paths (SSSP), minimum spanning trees (MST), and a wide class of dynamic‑programming problems (e.g., 0‑1 knapsack).
  • Impossibility results: Proves that standard MPNNs cannot learn several natural algorithmic tasks, pinpointing the expressive gaps that cause failure.
  • Enhanced architectures: Introduces modest extensions to the MPNN message‑passing scheme that overcome the identified impossibility barriers.
  • Refined analysis for Bellman‑Ford: Derives a dramatically smaller training‑set size requirement for learning the Bellman‑Ford SSSP algorithm and incorporates a differentiable regularisation loss, extending prior work.
  • Empirical validation: Experiments on synthetic and real‑world graph datasets confirm the theoretical predictions, demonstrating successful generalisation and the limits of standard MPNNs.

Methodology

  1. Problem formalisation

    • Treat an algorithm as a function mapping a graph (and possibly auxiliary data) to an output (e.g., distance labels).
    • Define a training distribution consisting of small graphs drawn from a bounded family.
  2. Learning model

    • Use an MPNN with a fixed number of message‑passing rounds.
    • The network is trained with a standard supervised loss (e.g., mean‑squared error) on the algorithm’s output for each training graph.
  3. Theoretical analysis

    • Leverage uniform convergence and sample‑complexity tools to bound the number of training examples needed for the MPNN to approximate the target algorithm on any larger graph.
    • Identify algorithmic properties (e.g., locality of information propagation, monotonicity) that guarantee the required sample complexity is polynomial.
    • Prove negative results by constructing tasks where the required information cannot be captured by the limited receptive field of standard MPNNs.
  4. Architectural extensions

    • Add global read‑out vectors, higher‑order messages, or learnable attention to increase expressive power while keeping the model permutation‑equivariant.
  5. Experiments

    • Train baseline MPNNs and the proposed extensions on small synthetic graphs (≤ 20 nodes).
    • Test on much larger graphs (up to several thousand nodes) for SSSP, MST, and knapsack, measuring accuracy and runtime overhead.
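The training setup in steps 1–2 can be sketched concretely. The snippet below (plain Python; all helper names are ours, not the paper's) generates small random weighted graphs from a bounded family, labels every node with its exact SSSP distance via Bellman‑Ford, and evaluates a mean‑squared error between a candidate model's per‑node outputs and those algorithmic labels:

```python
import random

def bellman_ford(n, edges, src=0, inf=1e9):
    """Exact SSSP distances, used as supervision targets."""
    dist = [inf] * n
    dist[src] = 0.0
    for _ in range(n - 1):                 # n-1 relaxation rounds
        for u, v, w in edges:
            if dist[u] + w < dist[v]:      # relax in both directions
                dist[v] = dist[u] + w      # (undirected graph)
            if dist[v] + w < dist[u]:
                dist[u] = dist[v] + w
    return dist

def random_graph(n, extra_edges=5, seed=0):
    """A connected small graph from a bounded family (here n <= 20)."""
    rng = random.Random(seed)
    # spanning-tree backbone guarantees connectivity
    edges = [(i, rng.randrange(i), rng.uniform(1, 10)) for i in range(1, n)]
    for _ in range(extra_edges):
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.append((u, v, rng.uniform(1, 10)))
    return edges

def mse(pred, target):
    """Standard supervised loss against the algorithm's output."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Training set: tiny graphs paired with exact algorithmic labels.
train_set = []
for s in range(10):
    n = 8
    edges = random_graph(n, seed=s)
    train_set.append((n, edges, bellman_ford(n, edges)))
```

An MPNN with a fixed number of message‑passing rounds would then be fitted to these `(graph, labels)` pairs; the paper's question is when that fit transfers to much larger graphs.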

Results & Findings

| Algorithm | Standard MPNN (baseline) | Extended MPNN | Training‑set size needed* |
|---|---|---|---|
| Bellman‑Ford (SSSP) | Fails to generalise beyond ~30 nodes | Learns perfectly up to 5 k nodes | O(log |
| Kruskal‑style MST | Systematic errors on cycles > 15 nodes | Correct MST edges on graphs of 2 k nodes | Polynomial in max degree |
| 0‑1 Knapsack (DP) | Cannot represent optimal selection | Achieves > 95 % optimality on large instances | Only O(1)‑size training set, due to DP locality |

*Theoretical bounds derived in the paper; empirical curves match the predictions.

Key takeaways

  • Locality matters – algorithms whose decision for a node depends only on information that can travel a bounded number of hops (e.g., Bellman‑Ford with a fixed number of iterations) are learnable with modest data.
  • Global dependencies break standard MPNNs – tasks needing unbounded aggregation (e.g., exact MST via global sorting) are provably out of reach without architectural tweaks.
  • Small training sets suffice – under the identified conditions, a handful of tiny graphs can teach an MPNN to handle graphs orders of magnitude larger.
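The locality point can be made concrete: one Bellman‑Ford relaxation round is exactly a min‑aggregation message‑passing step, so an MPNN whose rounds implement this update rule transfers from tiny training graphs to arbitrarily large ones. A minimal sketch (plain Python; the "learned" update is hard‑coded to the target rule purely for illustration):

```python
def to_adj(n, edges):
    """Undirected adjacency lists from (u, v, weight) triples."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj

def mpnn_sssp(n, adj, src=0, rounds=None, inf=1e9):
    """Min-aggregation message passing; `rounds` plays the role of
    the MPNN's fixed number of layers."""
    h = [inf] * n                  # node states = tentative distances
    h[src] = 0.0
    for _ in range(rounds if rounds is not None else n - 1):
        # each node receives (neighbour state + edge weight) messages
        # and updates its state with a min aggregation
        new_h = list(h)
        for v in range(n):
            for u, w in adj[v]:
                new_h[v] = min(new_h[v], h[u] + w)
        h = new_h
    return h
```

After d rounds the states are exact for every node within d hops of the source, and with enough rounds the same update is exact on graphs of any size — this hop‑bounded information flow is the mechanism behind the size generalisation described above.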

Practical Implications

  • Algorithmic modules inside larger AI systems – Developers can now embed a learned SSSP or knapsack solver as a differentiable component of end‑to‑end pipelines (e.g., routing in logistics, resource allocation in cloud orchestration).
  • Speed‑memory trade‑offs – A learned Bellman‑Ford variant runs on GPU with the same latency as a standard GNN forward pass, offering a potential speed boost over classic CPU‑based implementations for massive batched graphs.
  • Robustness to graph size – Because the theory guarantees generalisation, engineers can train on cheap synthetic data and deploy on production graphs that are far larger, reducing data‑collection costs.
  • Guidance for model design – The impossibility results steer practitioners away from plain MPNNs for tasks that need global ordering, prompting the use of the proposed extensions (global tokens, attention) instead.
  • Differentiable regularisation – The refined Bellman‑Ford analysis includes a smooth regularisation term, making it easier to integrate the learned solver into gradient‑based optimisation loops (e.g., joint learning of routing and demand forecasting).
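For tasks with global dependencies, the fix suggested by the impossibility results is to augment message passing with a global component. A hedged sketch of one such layer with a global read‑out vector (the names and the specific mean read‑out and mixing weights are ours, not the paper's exact construction):

```python
def readout_layer(h, adj):
    """One message-passing round augmented with a global read-out.
    Each node's new state combines (i) a local min-aggregation over
    its neighbourhood and (ii) a graph-wide summary, so information
    no longer has to travel hop by hop -- breaking the bounded
    receptive field of plain MPNNs."""
    g = sum(h) / len(h)                         # global read-out (mean)
    new_h = []
    for v, nbrs in enumerate(adj):
        local = min([h[u] + w for u, w in nbrs] + [h[v]])
        new_h.append(0.5 * local + 0.5 * g)     # toy fixed combination
    return new_h
```

Because the read‑out is a permutation‑invariant aggregate over all nodes, the layer stays permutation‑equivariant while gaining access to global context — the property that plain neighbourhood aggregation lacks for tasks like exact MST.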

Limitations & Future Work

  • Assumptions on input distribution – Guarantees rely on training graphs being drawn from a bounded family (e.g., bounded degree, limited edge weight range). Real‑world networks that violate these assumptions may need larger training sets.
  • Scalability of extensions – Adding global tokens or higher‑order messages increases memory consumption; efficient implementations for very large graphs remain an open engineering challenge.
  • Beyond deterministic algorithms – The current framework focuses on exact, deterministic procedures. Extending the theory to stochastic or approximation algorithms (e.g., randomized MST) is a promising direction.
  • Learning with limited supervision – The study assumes full algorithmic labels for every node/edge. Investigating weakly‑supervised or self‑supervised variants could broaden applicability.
  • Empirical breadth – While synthetic experiments are thorough, testing on diverse real‑world domains (road networks, communication graphs, biological interaction networks) would further validate practical impact.

Authors

  • Solveig Wittig
  • Antonis Vasileiou
  • Robert R. Nerem
  • Timo Stoll
  • Floris Geerts
  • Yusu Wang
  • Christopher Morris

Paper Information

  • arXiv ID: 2602.13106v1
  • Categories: cs.LG, cs.AI, cs.DS, cs.NE
  • Published: February 13, 2026