[Paper] Benchmarking Graph Neural Networks in Solving Hard Constraint Satisfaction Problems

Published: February 20, 2026 at 01:41 PM EST
5 min read
Source: arXiv


Overview

The paper Benchmarking Graph Neural Networks in Solving Hard Constraint Satisfaction Problems examines whether modern Graph Neural Networks (GNNs) truly outperform classic optimization heuristics on some of the toughest combinatorial problems. By introducing a rigorously designed set of benchmark instances drawn from statistical‑physics models, the authors provide the community with a reproducible yardstick for future claims of “neural superiority.”

Key Contributions

  • Hard, statistically‑grounded CSP benchmarks – a publicly released suite (RandCSPBench) of random constraint‑satisfaction problems that are provably difficult for both exact solvers and heuristics.
  • Comprehensive performance study – side‑by‑side evaluation of several state‑of‑the‑art GNN architectures against leading classical algorithms (e.g., Survey Propagation, WalkSAT, Simulated Annealing).
  • Fair experimental protocol – identical training/validation splits, hyper‑parameter budgets, and runtime caps to eliminate hidden advantages.
  • Insightful analysis of failure modes – identification of why GNNs struggle on the hardest instances (e.g., limited expressive power, over‑fitting to easy patterns).
  • Open‑source tooling – the benchmark generator, data loaders, and evaluation scripts are all released under an MIT license, ready for integration into CI pipelines or research notebooks.

Methodology

  1. Benchmark Generation – The authors use random ensembles from statistical physics (e.g., random k‑SAT, graph coloring on Erdős‑Rényi graphs) tuned to sit right at the phase transition where problem hardness spikes.
  2. Instance Sampling – For each problem type, thousands of instances are generated with varying sizes (up to several thousand variables) and stored in a standard JSON format.
  3. Algorithm Portfolio
    • Classical heuristics: Survey Propagation, WalkSAT, Parallel Tempering, and a tuned Simulated Annealing variant.
    • Neural solvers: Three GNN models (Message‑Passing Neural Network, Graph Convolutional Network, and a recent Graph Attention Network) trained end‑to‑end to predict variable assignments.
  4. Training Regime – Neural models receive only the training split of instances; hyper‑parameters (learning rate, depth, message‑passing steps) are selected via grid search on a held‑out validation set.
  5. Evaluation – All methods are run under the same wall‑clock budget per instance (e.g., 30 seconds) and measured on two metrics: solution quality (fraction of satisfied constraints) and success rate (percentage of instances solved to optimality).

The whole pipeline is scripted in Python, leveraging PyTorch Geometric for the GNNs and existing open‑source CSP solvers for the baselines.
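As an illustration of the generation step, a random 3‑SAT instance at the satisfiability phase transition (clause‑to‑variable ratio α ≈ 4.267) can be produced in a few lines. The summary does not show RandCSPBench's actual JSON schema, so the layout below is an assumption; literals use the DIMACS convention (+v for variable v, −v for its negation):

```python
import json
import random

def random_3sat(n_vars: int, alpha: float = 4.267, seed: int = 0) -> dict:
    """Generate a random 3-SAT instance near the phase transition.

    alpha is the clause-to-variable ratio; ~4.267 is the known
    satisfiability threshold for random 3-SAT. Each clause is a list
    of 3 signed ints over distinct variables (DIMACS-style literals).
    """
    rng = random.Random(seed)
    n_clauses = round(alpha * n_vars)
    clauses = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), 3)  # 3 distinct vars
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    # Hypothetical JSON layout -- RandCSPBench's real schema may differ.
    return {"n_vars": n_vars, "clauses": clauses}

instance = random_3sat(200)
print(json.dumps(instance)[:60], "...")
```

Tuning α to the threshold is what makes these ensembles hard: below it almost all instances are trivially satisfiable, above it almost all are unsatisfiable, and the hardest instances concentrate at the transition.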

Results & Findings

| Solver | Avg. Satisfaction (%) | Optimal‑solution Rate (%) | Typical Runtime |
|---|---|---|---|
| Survey Propagation | 98.7 | 94.2 | < 1 s |
| WalkSAT | 96.3 | 88.1 | 2–5 s |
| Simulated Annealing | 95.9 | 85.4 | 10 s |
| GNN‑MPNN | 84.5 | 31.2 | 0.5 s (inference) |
| GNN‑GCN | 82.1 | 27.8 | 0.4 s |
| GNN‑GAT | 80.9 | 24.5 | 0.6 s |

Key take‑aways

  • Classical heuristics still dominate on the hardest random CSPs, often solving >90 % of instances within seconds.
  • GNN‑based solvers are fast at inference time but achieve significantly lower success rates, especially as instance size grows beyond a few hundred variables.
  • The performance gap widens near the phase‑transition density, confirming that the benchmark truly stresses algorithmic limits.
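The paper's first evaluation metric, solution quality as the fraction of satisfied constraints, is simple to compute from a candidate assignment; a minimal sketch, assuming the signed‑integer literal encoding used above:

```python
def satisfied_fraction(clauses, assignment):
    """Fraction of clauses satisfied by a truth assignment.

    clauses: list of lists of signed ints (DIMACS-style literals);
    assignment: dict mapping variable index -> bool.
    A clause is satisfied if at least one literal evaluates to True.
    """
    sat = sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
    return sat / len(clauses)

# Tiny example: (x1 or x2) and (not x1 or x3)
clauses = [[1, 2], [-1, 3]]
print(satisfied_fraction(clauses, {1: True, 2: False, 3: False}))  # 0.5
print(satisfied_fraction(clauses, {1: True, 2: False, 3: True}))   # 1.0
```

The second metric, success rate, is then just the share of instances for which this fraction reaches 1.0 within the wall‑clock budget.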

Practical Implications

  • For developers building AI‑augmented solvers: The study suggests that GNNs are currently better suited as pre‑processors (e.g., to generate good initializations) rather than stand‑alone solvers for hard CSPs.
  • Integration into existing pipelines: The open‑source benchmark can be plugged into CI tests to automatically verify that any new neural optimizer actually improves over baseline heuristics on a representative hard workload.
  • Hardware considerations: Since GNN inference is lightweight, hybrid approaches (GNN‑guided search + classical local moves) could exploit GPUs for rapid candidate generation while relying on CPUs for deep refinement.
  • Product roadmap: Companies working on scheduling, verification, or combinatorial design can use RandCSPBench to stress‑test their proprietary heuristics before release, ensuring robustness against worst‑case inputs.
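As a concrete illustration of the hybrid pattern suggested above, a WalkSAT‑style local search can be seeded with a model's predicted assignment instead of a random one. This is a sketch, not the paper's implementation; the `init` argument stands in for a hypothetical GNN's per‑variable predictions:

```python
import random

def walksat(clauses, n_vars, init=None, p=0.5, max_flips=10_000, seed=0):
    """WalkSAT local search over a clause list of signed-int literals.

    init: optional dict {var: bool} -- e.g. a GNN's predicted assignment
    used as the starting point (hypothetical interface). Returns a
    satisfying assignment, or None if max_flips is exhausted.
    """
    rng = random.Random(seed)
    assign = dict(init) if init else {
        v: rng.random() < 0.5 for v in range(1, n_vars + 1)
    }

    def unsat(a):
        return [c for c in clauses
                if not any(a[abs(l)] == (l > 0) for l in c)]

    for _ in range(max_flips):
        broken = unsat(assign)
        if not broken:
            return assign
        clause = rng.choice(broken)
        if rng.random() < p:              # noise move: random literal
            var = abs(rng.choice(clause))
        else:                             # greedy move: flip the variable
            def breaks(v):                # that leaves the fewest clauses unsat
                assign[v] = not assign[v]
                n = len(unsat(assign))
                assign[v] = not assign[v]
                return n
            var = min((abs(l) for l in clause), key=breaks)
        assign[var] = not assign[var]
    return None

# Toy formula: (x1 or x2) and (not x1 or x2) -- any solution has x2 = True
solution = walksat([[1, 2], [-1, 2]], n_vars=2, init={1: False, 2: False})
print(solution)
```

A better initialization shrinks the number of flips local search needs, which is exactly where a fast GPU inference pass could pay for itself on large instance batches.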

Limitations & Future Work

  • Benchmark scope – The current suite focuses on random CSPs; real‑world instances often exhibit structure (e.g., community graphs) that may be easier for GNNs.
  • Model capacity – Only relatively shallow GNNs were explored; deeper or more expressive architectures (e.g., higher‑order message passing) might close the gap.
  • Training data – Models were trained exclusively on instances from the same distribution; transfer learning across problem families remains an open question.
  • Scalability – Experiments capped at ~10 k variables; scaling to industrial‑size problems (10⁶+ variables) will require distributed training and inference pipelines.

The authors encourage the community to extend the benchmark, experiment with hybrid algorithms, and report results using the provided evaluation scripts to foster transparent, reproducible progress in neural combinatorial optimization.

Authors

  • Geri Skenderi
  • Lorenzo Buffoni
  • Francesco D’Amico
  • David Machado
  • Raffaele Marino
  • Matteo Negri
  • Federico Ricci-Tersenghi
  • Carlo Lucibello
  • Maria Chiara Angelini

Paper Information

  • arXiv ID: 2602.18419v1
  • Categories: cond-mat.dis-nn, cs.LG
  • Published: February 20, 2026