[Paper] Heuristic Search as Language-Guided Program Optimization

Published: February 17, 2026 at 04:45 PM EST
5 min read
Source: arXiv - 2602.16038v1

Overview

Recent advances in large language models (LLMs) have turned them into powerful assistants for designing heuristics that solve hard combinatorial‑optimization problems. The paper Heuristic Search as Language‑Guided Program Optimization proposes a clean, modular framework that turns the heuristic‑design loop into a series of well‑defined steps—evaluation, analytical feedback, and program refinement—making it easier to improve each part without rewriting the whole system. The authors demonstrate that this “forward‑backward‑update” pipeline consistently beats existing LLM‑driven baselines on four real‑world optimization tasks.

Key Contributions

  • A unified three‑stage framework (forward pass, backward pass, update step) that cleanly separates evaluation, feedback generation, and code refinement for LLM‑guided heuristic discovery.
  • Formal connection showing that many prior Automated Heuristic Design (AHD) methods are special cases of the proposed pipeline, explaining why they often hit performance ceilings.
  • Empirical validation on four diverse combinatorial‑optimization domains (e.g., vehicle routing, scheduling, knapsack variants), achieving up to 0.17 QYI improvement on unseen test instances.
  • Modular upgradeability: by swapping in better components (e.g., a more accurate evaluator or a richer feedback language model), the same pipeline can be incrementally improved without redesigning the whole system.
  • Open‑source implementation (released with the paper) that integrates with popular LLM APIs, making it ready for immediate experimentation.
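The modular-upgradeability claim can be pictured as three swappable component interfaces. The sketch below is our own illustration, not code from the paper's released implementation; the names `Evaluator`, `Critic`, and `Refiner` are assumptions chosen to mirror the three stages.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class EvalResult:
    """Metrics recorded by the forward pass."""
    objective: float   # e.g., tour cost or makespan
    feasible: bool
    runtime_s: float

class Evaluator(Protocol):
    """Forward pass: run the heuristic program on benchmark instances."""
    def run(self, program: str) -> EvalResult: ...

class Critic(Protocol):
    """Backward pass: turn evaluation results into structured feedback."""
    def feedback(self, program: str, result: EvalResult) -> str: ...

class Refiner(Protocol):
    """Update step: rewrite the program to incorporate the feedback."""
    def rewrite(self, program: str, feedback: str) -> str: ...
```

Because each stage is typed only by its interface, a stronger evaluator (say, a SAT-solver-backed one) or a more domain-aware critic can be dropped in without touching the other two components.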

Methodology

  1. Forward Pass – Evaluation

    • The current heuristic program (generated by an LLM) is executed on a set of benchmark instances.
    • Performance metrics (objective value, runtime, feasibility) are recorded.
  2. Backward Pass – Analytical Feedback

    • A second LLM (or a smaller “critic” model) ingests the evaluation results and the program source code.
    • It produces structured feedback: suggestions for algorithmic components, parameter ranges, or code‑level bugs. The feedback is expressed in a language that the optimizer can parse (e.g., “replace greedy insertion with a 2‑opt local search”).
  3. Update Step – Program Refinement

    • The original LLM receives the feedback as a prompt and rewrites the heuristic program, aiming to incorporate the suggested improvements while preserving correctness.
    • Optionally, a small “repair” model checks syntax/semantic errors before the next forward pass.
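The "language that the optimizer can parse" in step 2 could, for instance, be a small JSON schema. The encoding below is our assumption for illustration; the paper does not prescribe this concrete format.

```python
import json
from dataclasses import dataclass

@dataclass
class Suggestion:
    """One parsed item of critic feedback (hypothetical schema)."""
    target: str    # code region or algorithmic component to change
    action: str    # e.g., "replace greedy insertion with a 2-opt local search"
    severity: str  # "bug" or "improvement"

def parse_feedback(raw: str) -> list[Suggestion]:
    """Parse a JSON array emitted by the critic into typed suggestions."""
    return [Suggestion(**item) for item in json.loads(raw)]
```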

The loop repeats until a stopping criterion is met (e.g., no further QYI gain on a validation set). Because each stage is a separate module, developers can plug in a stronger evaluator (e.g., a SAT solver) or a more domain‑aware critic without touching the other parts.
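The full forward-backward-update loop can be sketched in a few lines. This is a minimal illustration under our own assumptions: the component objects follow the duck-typed interfaces implied by the three stages, and the plateau-based stopping rule (`tol`) stands in for whatever validation criterion the authors use.

```python
def optimize(program, evaluator, critic, refiner, validate,
             max_iters: int = 10, tol: float = 1e-3):
    """Iterate forward/backward/update until validation score plateaus."""
    best_score = validate(program)
    for _ in range(max_iters):
        result = evaluator.run(program)           # forward pass: evaluate
        fb = critic.feedback(program, result)     # backward pass: feedback
        candidate = refiner.rewrite(program, fb)  # update step: refine code
        score = validate(candidate)
        if score <= best_score + tol:             # no further validation gain
            break
        program, best_score = candidate, score
    return program, best_score
```

Swapping in a different `evaluator` or `critic` changes the behavior of one stage without altering the loop, which is exactly the modularity the authors emphasize.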

Results & Findings

| Domain | Baseline (prior AHD) | Proposed Framework | Δ QYI (Improvement) |
|---|---|---|---|
| Vehicle Routing (VRP) | 0.71 | 0.84 | +0.13 |
| Job Shop Scheduling | 0.62 | 0.78 | +0.16 |
| Knapsack Variants | 0.55 | 0.72 | +0.17 |
| Network Design | 0.68 | 0.80 | +0.12 |
  • Consistent gains across all four domains, even on test instances the system never saw during training.
  • Ablation studies show that removing the backward pass (i.e., no analytical feedback) drops performance to near‑baseline levels, confirming the critical role of structured feedback.
  • When the authors re‑implemented three popular AHD pipelines as restricted instances of their framework, modest upgrades to the feedback module lifted those pipelines by 10–20% in QYI.

Practical Implications

  • Rapid prototyping of custom heuristics: Developers can feed a problem description to the framework and obtain a working heuristic in minutes, then iteratively improve it without deep algorithmic expertise.
  • Plug‑and‑play optimization pipelines: Companies can replace a legacy hand‑tuned heuristic with an LLM‑generated one, then gradually upgrade the evaluator (e.g., adding GPU‑accelerated simulators) to squeeze out extra performance.
  • Cross‑domain transfer: Because the framework abstracts the design loop, a heuristic discovered for one logistics problem can be adapted to another by simply changing the evaluation dataset—no need to retrain a monolithic model.
  • Reduced reliance on domain experts: The backward‑pass critic can be trained on publicly available algorithmic textbooks or code repositories, lowering the barrier for teams lacking in‑house combinatorial‑optimization specialists.
  • Integration with CI/CD: The modular nature makes it straightforward to embed the loop into continuous‑integration pipelines, automatically re‑optimizing heuristics whenever new data or constraints appear.

Limitations & Future Work

  • Scalability of the evaluator: For extremely large instances (e.g., millions of variables), the forward pass can become a bottleneck; the authors suggest hybrid evaluation (sampling + surrogate models) as a next step.
  • Feedback quality depends on LLM size: Smaller or open‑source models sometimes generate vague suggestions, limiting improvement speed.
  • Limited theoretical guarantees: The framework is empirical; formal convergence or optimality bounds remain open questions.
  • Future directions include: (1) incorporating reinforcement‑learning‑style reward shaping into the backward pass, (2) extending the pipeline to multi‑objective optimization, and (3) exploring automated curriculum learning to gradually increase problem difficulty during training.

Authors

  • Mingxin Yu
  • Ruixiao Yang
  • Chuchu Fan

Paper Information

  • arXiv ID: 2602.16038v1
  • Categories: cs.NE, cs.LG
  • Published: February 17, 2026
  • PDF: Download PDF
