[Paper] Survey on Neural Routing Solvers
Source: arXiv - 2602.21761v1
Overview
Neural Routing Solvers (NRSs) are a new breed of deep‑learning models that aim to solve vehicle routing problems (VRPs) by learning the “rules of thumb” that human‑crafted heuristics use. This survey paper shines a light on the heuristic nature of NRSs, organizes the rapidly growing literature into a clear taxonomy, and proposes a more realistic way to benchmark these models for real‑world generalization.
Key Contributions
- Heuristic‑centric perspective: Re‑frames NRS research as the evolution of classic routing heuristics (e.g., savings, insertion, local search) that are now learned rather than manually coded.
- Hierarchical taxonomy: Introduces a three‑level classification (problem‑scope → heuristic principle → architectural family) that makes it easy to locate any NRS in the literature.
- Generalization‑focused evaluation pipeline: Designs a benchmark that tests models on out‑of‑distribution (OOD) instances, varying size, geography, and demand patterns—addressing the over‑fitting problem of existing pipelines.
- Comprehensive empirical comparison: Runs a head‑to‑head study of representative NRSs under both the traditional and the new pipelines, revealing hidden performance gaps and robustness issues.
- Open‑source toolbox: Releases code for the taxonomy, dataset generators, and evaluation scripts, enabling reproducible research and rapid prototyping.
Methodology
- Literature mapping: The authors collected 70+ NRS papers (spanning 2018‑2024) and annotated each with (a) the VRP variant it targets, (b) the heuristic principle it emulates (construction, improvement, or hybrid), and (c) the neural architecture (graph neural network, transformer, reinforcement‑learning agent, etc.).
- Taxonomy construction: Using the annotations, they built a tree‑like hierarchy:
- Level 1 – Problem scope: CVRP, VRPTW, PDPTW, etc.
- Level 2 – Heuristic principle: Construction (e.g., learned savings), Improvement (learned local search moves), Hybrid (learned meta‑heuristics).
- Level 3 – Architecture family: GNN‑based encoders, attention‑based decoders, RL policy networks, diffusion models, etc.
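The three-level hierarchy lends itself to a simple record type. The sketch below is illustrative only: the `NRSEntry` class, the citation keys, and the two catalog entries are hypothetical placeholders, not the paper's actual annotation schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NRSEntry:
    paper: str                # citation key (hypothetical)
    problem_scope: str        # Level 1: "CVRP", "VRPTW", "PDPTW", ...
    heuristic_principle: str  # Level 2: "construction", "improvement", "hybrid"
    architecture: str         # Level 3: "GNN", "attention", "RL-policy", "diffusion"

# Two illustrative entries, not the survey's real catalog
catalog = [
    NRSEntry("attention-model", "CVRP", "construction", "attention"),
    NRSEntry("nlns", "CVRP", "improvement", "GNN"),
]

# Locating all improvement-style CVRP solvers is a simple filter:
hits = [e.paper for e in catalog
        if e.problem_scope == "CVRP" and e.heuristic_principle == "improvement"]
```

A flat record with one field per taxonomy level makes "locate any NRS in the literature" a one-line query rather than a tree traversal.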
- Evaluation pipelines:
- Conventional pipeline: Train on a fixed set of instances (often from a single distribution) and test on similarly sized, same‑distribution instances.
- Generalization pipeline (proposed): Create multiple test suites that differ in size (small → large), spatial distribution (clustered vs. uniform), and demand stochasticity. Models are trained once and evaluated across all suites, mimicking real‑world deployment where problem characteristics shift.
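The proposed pipeline's three axes (size, spatial distribution, demand stochasticity) can be sketched as an instance generator whose parameters span the test suites. This is a minimal illustration of the idea, not the paper's released generator; the function name, cluster parameters, and demand model are assumptions.

```python
import random

def make_instance(n, spatial="uniform", demand_cv=0.3, seed=0):
    """Hypothetical CVRP instance sampler covering the three OOD axes.

    n          -- number of customers (size axis)
    spatial    -- "uniform" or "clustered" (geography axis)
    demand_cv  -- coefficient of variation of demands (stochasticity axis)
    """
    rng = random.Random(seed)
    if spatial == "uniform":
        coords = [(rng.random(), rng.random()) for _ in range(n)]
    else:
        # Sample customers around a handful of cluster centres
        centres = [(rng.random(), rng.random()) for _ in range(max(2, n // 25))]
        coords = []
        for _ in range(n):
            cx, cy = rng.choice(centres)
            coords.append((cx + rng.gauss(0, 0.05), cy + rng.gauss(0, 0.05)))
    mean_demand = 10.0
    demands = [max(1.0, rng.gauss(mean_demand, demand_cv * mean_demand))
               for _ in range(n)]
    return {"coords": coords, "demands": demands}

# A cross-product of the axes yields one instance per OOD suite:
suites = {(n, sp): make_instance(n, sp)
          for n in (50, 200, 1000) for sp in ("uniform", "clustered")}
```

A model trained once (say, on 50-customer uniform instances) is then evaluated on every suite in the grid, so degradation along each axis is visible separately.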
- Benchmarking: Selected 10 representative NRSs (e.g., Attention Model, POMO, Neural Large‑Neighbourhood Search, Graph‑Based RL) and ran them on standard benchmark datasets (Augerat for CVRP, Solomon for VRPTW) plus the OOD suites. Metrics include solution quality (percentage gap to the optimal or best‑known solution), inference speed, and robustness (variance across OOD sets).
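The two quantitative metrics are straightforward to state in code. The helper names below are illustrative, but the formulas match the standard definitions: the gap is the percentage excess cost over the best‑known solution, and robustness is summarized by the mean and variance of per‑suite gaps.

```python
import statistics

def optimality_gap(cost, best_known):
    """Percentage gap to the best-known solution (lower is better)."""
    return 100.0 * (cost - best_known) / best_known

def robustness(gaps_per_suite):
    """Mean gap and its (population) variance across OOD test suites."""
    return statistics.mean(gaps_per_suite), statistics.pvariance(gaps_per_suite)

# e.g. a model whose tours cost 103, 107, 112 on three suites,
# each with a best-known cost of 100:
gaps = [optimality_gap(c, 100.0) for c in (103.0, 107.0, 112.0)]
mean_gap, var_gap = robustness(gaps)  # gaps of 3%, 7%, 12%
```

Reporting the variance alongside the mean is what separates the two pipelines: a model can post a low average gap yet a high variance, signalling fragility on distribution shift.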
Results & Findings
| Pipeline | Best‑performing NRS (avg. gap) | Notable observations |
|---|---|---|
| Conventional | POMO – 1.8 % gap | Works well when train‑test distributions match. |
| Generalization | Neural Large‑Neighbourhood Search (NLNS) – 3.4 % gap | Maintains relatively low degradation on larger, clustered instances. |
| Conventional (speed) | Attention Model – 0.5 ms per instance | Very fast but quality drops sharply on OOD data. |
| Generalization (robustness) | Hybrid GNN‑RL – 4.1 % gap, low variance | Shows consistent performance across diverse test suites. |
Key takeaways
- Many NRSs that look impressive under the conventional pipeline suffer 2‑5× larger optimality gaps when faced with OOD instances.
- Architectures that incorporate local search or neighborhood exploration (e.g., NLNS) generalize better than pure end‑to‑end sequence models.
- Inference speed remains a strong advantage of NRSs, but the trade‑off with robustness must be considered for production use.
Practical Implications
- For logistics software vendors: The survey suggests that plugging a vanilla attention‑based NRS into an existing routing engine may yield quick wins on static, well‑characterized routes, but a more robust hybrid (construction + learned improvement) is needed for dynamic fleets with varying order patterns.
- For developers building custom routing solutions: The taxonomy helps you pick a starting point—e.g., if you already have a heuristic insertion routine, you can replace its decision rule with a GNN‑based policy rather than rebuilding from scratch.
- Edge deployment: Because most NRSs run inference in milliseconds on a GPU/TPU, they are suitable for real‑time dispatching, but you should validate on OOD data that mirrors your city’s geography and demand spikes.
- Tooling & reproducibility: The released toolbox lets you generate OOD benchmark suites with a single command, making it easier to integrate NRS evaluation into CI pipelines.
Limitations & Future Work
- Dataset bias: The surveyed papers largely focus on CVRP and VRPTW; other complex variants (e.g., stochastic demand, multi‑modal fleets) remain under‑explored.
- Scalability ceiling: While inference is fast, training still requires massive synthetic data and GPU hours, limiting accessibility for small companies.
- Explainability: Learned heuristics are often opaque; the survey calls for methods to extract human‑readable rules from NRSs, facilitating trust and regulatory compliance.
- Future directions:
- Unified benchmark suites covering a broader set of VRP flavors.
- Meta‑learning approaches that adapt a single NRS to new distributions with few‑shot fine‑tuning.
- Tighter integration of NRSs with classic OR solvers (e.g., using NRSs to generate warm‑starts for mixed‑integer programming).
Authors
- Yunpeng Ba
- Xi Lin
- Changliang Zhou
- Ruihao Zheng
- Zhenkun Wang
- Xinyan Liang
- Zhichao Lu
- Jianyong Sun
- Yuhua Qian
- Qingfu Zhang
Paper Information
- arXiv ID: 2602.21761v1
- Categories: math.OC, cs.AI, cs.LG, cs.NE
- Published: February 25, 2026