[Paper] Benchmarking Graph Neural Networks in Solving Hard Constraint Satisfaction Problems

Published: February 20, 2026 at 01:41 PM EST
5 min read
Source: arXiv


Overview

The paper Benchmarking Graph Neural Networks in Solving Hard Constraint Satisfaction Problems examines whether modern Graph Neural Networks (GNNs) truly outperform classic optimization heuristics on some of the toughest combinatorial problems. By introducing a rigorously designed set of benchmark instances drawn from statistical‑physics models, the authors provide the community with a reproducible yardstick for future claims of “neural superiority.”

Key Contributions

  • Hard, statistically‑grounded CSP benchmarks – a publicly released suite (RandCSPBench) of random constraint‑satisfaction problems that are provably difficult for both exact solvers and heuristics.
  • Comprehensive performance study – side‑by‑side evaluation of several state‑of‑the‑art GNN architectures against leading classical algorithms (e.g., Survey Propagation, WalkSAT, Simulated Annealing).
  • Fair experimental protocol – identical training/validation splits, hyper‑parameter budgets, and runtime caps to eliminate hidden advantages.
  • Insightful analysis of failure modes – identification of why GNNs struggle on the hardest instances (e.g., limited expressive power, over‑fitting to easy patterns).
  • Open‑source tooling – the benchmark generator, data loaders, and evaluation scripts are all released under an MIT license, ready for integration into CI pipelines or research notebooks.

Methodology

  1. Benchmark Generation – The authors use random ensembles from statistical physics (e.g., random k‑SAT, graph coloring on Erdős‑Rényi graphs) tuned to sit right at the phase transition where problem hardness spikes.
  2. Instance Sampling – For each problem type, thousands of instances are generated with varying sizes (up to several thousand variables) and stored in a standard JSON format.
  3. Algorithm Portfolio
    • Classical heuristics: Survey Propagation, WalkSAT, Parallel Tempering, and a tuned Simulated Annealing variant.
    • Neural solvers: Three GNN models (Message‑Passing Neural Network, Graph Convolutional Network, and a recent Graph Attention Network) trained end‑to‑end to predict variable assignments.
  4. Training Regime – Neural models receive only the training split of instances; hyper‑parameters (learning rate, depth, message‑passing steps) are selected via grid search on a held‑out validation set.
  5. Evaluation – All methods are run under the same wall‑clock budget per instance (e.g., 30 seconds) and measured on two metrics: solution quality (fraction of satisfied constraints) and success rate (percentage of instances solved to optimality).

The whole pipeline is scripted in Python, leveraging PyTorch Geometric for the GNNs and existing open‑source CSP solvers for the baselines.
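As an illustration of the generation step, a random 3‑SAT instance at the satisfiability phase transition (clause‑to‑variable ratio α ≈ 4.267) can be produced in a few lines. The summary does not show RandCSPBench's actual JSON schema, so the layout below is an assumption; literals use the DIMACS convention (+v for variable v, −v for its negation):

```python
import json
import random

def random_3sat(n_vars: int, alpha: float = 4.267, seed: int = 0) -> dict:
    """Generate a random 3-SAT instance near the phase transition.

    alpha is the clause-to-variable ratio; ~4.267 is the known
    satisfiability threshold for random 3-SAT. Each clause is a list
    of 3 signed ints over distinct variables (DIMACS-style literals).
    """
    rng = random.Random(seed)
    n_clauses = round(alpha * n_vars)
    clauses = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), 3)  # 3 distinct vars
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    # Hypothetical JSON layout -- RandCSPBench's real schema may differ.
    return {"n_vars": n_vars, "clauses": clauses}

instance = random_3sat(200)
print(json.dumps(instance)[:60], "...")
```

Tuning α to the threshold is what makes these ensembles hard: below it almost all instances are trivially satisfiable, above it almost all are unsatisfiable, and the hardest instances concentrate at the transition.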

Results & Findings

| Solver | Avg. Satisfaction (%) | Optimal‑solution Rate (%) | Typical Runtime |
|---|---|---|---|
| Survey Propagation | 98.7 | 94.2 | < 1 s |
| WalkSAT | 96.3 | 88.1 | 2–5 s |
| Simulated Annealing | 95.9 | 85.4 | 10 s |
| GNN‑MPNN | 84.5 | 31.2 | 0.5 s (inference) |
| GNN‑GCN | 82.1 | 27.8 | 0.4 s |
| GNN‑GAT | 80.9 | 24.5 | 0.6 s |

Key take‑aways

  • Classical heuristics still dominate on the hardest random CSPs, often solving >90 % of instances within seconds.
  • GNN‑based solvers are fast at inference time but achieve significantly lower success rates, especially as instance size grows beyond a few hundred variables.
  • The performance gap widens near the phase‑transition density, confirming that the benchmark truly stresses algorithmic limits.
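The paper's first evaluation metric, solution quality as the fraction of satisfied constraints, is simple to compute from a candidate assignment; a minimal sketch, assuming the signed‑integer literal encoding used above:

```python
def satisfied_fraction(clauses, assignment):
    """Fraction of clauses satisfied by a truth assignment.

    clauses: list of lists of signed ints (DIMACS-style literals);
    assignment: dict mapping variable index -> bool.
    A clause is satisfied if at least one literal evaluates to True.
    """
    sat = sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
    return sat / len(clauses)

# Tiny example: (x1 or x2) and (not x1 or x3)
clauses = [[1, 2], [-1, 3]]
print(satisfied_fraction(clauses, {1: True, 2: False, 3: False}))  # 0.5
print(satisfied_fraction(clauses, {1: True, 2: False, 3: True}))   # 1.0
```

The second metric, success rate, is then just the share of instances for which this fraction reaches 1.0 within the wall‑clock budget.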

Practical Implications

  • For developers building AI‑augmented solvers: The study suggests that GNNs are currently better suited as pre‑processors (e.g., to generate good initializations) rather than stand‑alone solvers for hard CSPs.
  • Integration into existing pipelines: The open‑source benchmark can be plugged into CI tests to automatically verify that any new neural optimizer actually improves over baseline heuristics on a representative hard workload.
  • Hardware considerations: Since GNN inference is lightweight, hybrid approaches (GNN‑guided search + classical local moves) could exploit GPUs for rapid candidate generation while relying on CPUs for deep refinement.
  • Product roadmap: Companies working on scheduling, verification, or combinatorial design can use RandCSPBench to stress‑test their proprietary heuristics before release, ensuring robustness against worst‑case inputs.
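As a concrete illustration of the hybrid pattern suggested above, a WalkSAT‑style local search can be seeded with a model's predicted assignment instead of a random one. This is a sketch, not the paper's implementation; the `init` argument stands in for a hypothetical GNN's per‑variable predictions:

```python
import random

def walksat(clauses, n_vars, init=None, p=0.5, max_flips=10_000, seed=0):
    """WalkSAT local search over a clause list of signed-int literals.

    init: optional dict {var: bool} -- e.g. a GNN's predicted assignment
    used as the starting point (hypothetical interface). Returns a
    satisfying assignment, or None if max_flips is exhausted.
    """
    rng = random.Random(seed)
    assign = dict(init) if init else {
        v: rng.random() < 0.5 for v in range(1, n_vars + 1)
    }

    def unsat(a):
        return [c for c in clauses
                if not any(a[abs(l)] == (l > 0) for l in c)]

    for _ in range(max_flips):
        broken = unsat(assign)
        if not broken:
            return assign
        clause = rng.choice(broken)
        if rng.random() < p:              # noise move: random literal
            var = abs(rng.choice(clause))
        else:                             # greedy move: flip the variable
            def breaks(v):                # that leaves the fewest clauses unsat
                assign[v] = not assign[v]
                n = len(unsat(assign))
                assign[v] = not assign[v]
                return n
            var = min((abs(l) for l in clause), key=breaks)
        assign[var] = not assign[var]
    return None

# Toy formula: (x1 or x2) and (not x1 or x2) -- any solution has x2 = True
solution = walksat([[1, 2], [-1, 2]], n_vars=2, init={1: False, 2: False})
print(solution)
```

A better initialization shrinks the number of flips local search needs, which is exactly where a fast GPU inference pass could pay for itself on large instance batches.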

Limitations & Future Work

  • Benchmark scope – The current suite focuses on random CSPs; real‑world instances often exhibit structure (e.g., community graphs) that may be easier for GNNs.
  • Model capacity – Only relatively shallow GNNs were explored; deeper or more expressive architectures (e.g., higher‑order message passing) might close the gap.
  • Training data – Models were trained exclusively on instances from the same distribution; transfer learning across problem families remains an open question.
  • Scalability – Experiments capped at ~10 k variables; scaling to industrial‑size problems (10⁶+ variables) will require distributed training and inference pipelines.

The authors encourage the community to extend the benchmark, experiment with hybrid algorithms, and report results using the provided evaluation scripts to foster transparent, reproducible progress in neural combinatorial optimization.

Authors

  • Geri Skenderi
  • Lorenzo Buffoni
  • Francesco D’Amico
  • David Machado
  • Raffaele Marino
  • Matteo Negri
  • Federico Ricci-Tersenghi
  • Carlo Lucibello
  • Maria Chiara Angelini

Paper Information

  • arXiv ID: 2602.18419v1
  • Categories: cond-mat.dis-nn, cs.LG
  • Published: February 20, 2026