[Paper] Heuristic Search as Language-Guided Program Optimization
Source: arXiv - 2602.16038v1
Overview
Recent advances in large language models (LLMs) have turned them into powerful assistants for designing heuristics that solve hard combinatorial‑optimization problems.
The paper Heuristic Search as Language‑Guided Program Optimization proposes a clean, modular framework that turns the heuristic‑design loop into a series of well‑defined steps:
- Evaluation – run the current heuristic on the target problem.
- Analytical Feedback – extract performance metrics and failure modes.
- Program Refinement – use the LLM to generate improved code based on the feedback.
This “forward‑backward‑update” pipeline makes it easy to improve each part without rewriting the whole system. The authors demonstrate that it consistently outperforms existing LLM‑driven baselines on four real‑world optimization tasks.
Key Contributions
- A unified three‑stage framework (forward pass, backward pass, update step) that cleanly separates evaluation, feedback generation, and code refinement for LLM‑guided heuristic discovery.
- Formal connection showing that many prior Automated Heuristic Design (AHD) methods are special cases of the proposed pipeline, explaining why they often hit performance ceilings.
- Empirical validation on four diverse combinatorial‑optimization domains (e.g., vehicle routing, scheduling, knapsack variants), achieving up to 0.17 QYI improvement on unseen test instances.
- Modular upgradeability: by swapping in better components (e.g., a more accurate evaluator or a richer feedback language model), the same pipeline can be incrementally improved without redesigning the whole system.
- Open‑source implementation (released with the paper) that integrates with popular LLM APIs, making it ready for immediate experimentation.
Methodology
1. Forward Pass – Evaluation
- Execute the current heuristic program (generated by an LLM) on a set of benchmark instances.
- Record performance metrics such as objective value, runtime, and feasibility.
2. Backward Pass – Analytical Feedback
- Feed the evaluation results and the program source code to a second LLM (or a smaller “critic” model).
- The model returns structured feedback that can be parsed by the optimizer, e.g.:
- Suggestions for algorithmic components (e.g., “replace greedy insertion with a 2‑opt local search”)
- Parameter ranges to explore
- Code‑level bugs to fix
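For the optimizer to act on this feedback it helps to constrain the critic to a machine-readable format. One possible shape is a small JSON schema; the field names below are illustrative assumptions, not the paper's actual format:

```python
import json

# Hypothetical critic reply in a fixed JSON schema (field names assumed).
critic_output = """
{
  "suggestions": ["replace greedy insertion with a 2-opt local search"],
  "parameter_ranges": {"tabu_tenure": [5, 20]},
  "bugs": ["capacity check skipped when the route is empty"]
}
"""

def parse_feedback(raw: str) -> dict:
    """Validate the critic's reply and fill in missing fields so the
    optimizer can rely on a stable structure."""
    fb = json.loads(raw)
    fb.setdefault("suggestions", [])
    fb.setdefault("parameter_ranges", {})
    fb.setdefault("bugs", [])
    return fb

fb = parse_feedback(critic_output)
print(fb["suggestions"][0])
```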
3. Update Step – Program Refinement
- Provide the original LLM with the feedback as a prompt.
- The LLM rewrites the heuristic program, incorporating the suggested improvements while preserving correctness.
- (Optional) Run a lightweight “repair” model to catch syntax or semantic errors before the next forward pass.
The loop repeats until a stopping criterion is met (e.g., no further QYI gain on a validation set). Because each stage is a separate module, developers can swap in a stronger evaluator (e.g., a SAT solver) or a more domain‑aware critic without modifying the other components.
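The modularity described above can be sketched as a loop over three pluggable callables. The stage functions here are simple stubs standing in for the LLM calls (programs are strings of the form `k=<n>` and the score peaks at `k=4`); this is a minimal sketch of the control flow, not the authors' implementation:

```python
from typing import Callable

def optimize(program: str,
             evaluate: Callable[[str], float],       # forward pass -> score
             critique: Callable[[str, float], str],  # backward pass -> feedback
             refine: Callable[[str, str], str],      # update step -> new program
             max_iters: int = 10, tol: float = 1e-6) -> str:
    """Generic forward-backward-update loop; each stage is swappable."""
    best, best_score = program, evaluate(program)
    for _ in range(max_iters):
        feedback = critique(best, best_score)
        candidate = refine(best, feedback)
        score = evaluate(candidate)
        if score <= best_score + tol:  # stopping criterion: no measurable gain
            break
        best, best_score = candidate, score
    return best

# Stub stages standing in for the LLM calls.
evaluate = lambda p: -(int(p.split("=")[1]) - 4) ** 2
critique = lambda p, s: "increase k" if int(p.split("=")[1]) < 4 else "keep k"
refine   = lambda p, fb: f"k={int(p.split('=')[1]) + (1 if 'increase' in fb else 0)}"

print(optimize("k=1", evaluate, critique, refine))  # k=4
```

Swapping in a stronger evaluator or critic means passing a different callable; the loop itself never changes, which is the upgradeability argument the paper makes.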
Results & Findings
| Domain | Baseline (prior AHD) | Proposed Framework | Δ QYI (Improvement) |
|---|---|---|---|
| Vehicle Routing (VRP) | 0.71 | 0.84 | +0.13 |
| Job Shop Scheduling | 0.62 | 0.78 | +0.16 |
| Knapsack Variants | 0.55 | 0.72 | +0.17 |
| Network Design | 0.68 | 0.80 | +0.12 |
- Consistent gains across all four domains, even on test instances the system never saw during training.
- Ablation studies show that removing the backward pass (i.e., no analytical feedback) drops performance to near‑baseline levels, confirming the critical role of structured feedback.
- When the authors re‑implemented three popular AHD pipelines as restricted instances of their framework, modest upgrades to the feedback module lifted those pipelines by 10–20 % in QYI.
Practical Implications
Rapid prototyping of custom heuristics
Developers can feed a problem description to the framework and obtain a working heuristic in minutes, then iteratively improve it without deep algorithmic expertise.
Plug‑and‑play optimization pipelines
Companies can replace a legacy hand‑tuned heuristic with an LLM‑generated one, then gradually upgrade the evaluator (e.g., adding GPU‑accelerated simulators) to squeeze out extra performance.
Cross‑domain transfer
Because the framework abstracts the design loop, a heuristic discovered for one logistics problem can be adapted to another by simply changing the evaluation dataset, with no need to retrain a monolithic model.
Reduced reliance on domain experts
The backward‑pass critic can be trained on publicly available algorithmic textbooks or code repositories, lowering the barrier for teams lacking in‑house combinatorial‑optimization specialists.
Integration with CI/CD
The modular nature makes it straightforward to embed the loop into continuous‑integration pipelines, automatically re‑optimizing heuristics whenever new data or constraints appear.
Limitations & Future Work
Scalability of the evaluator
- For extremely large instances (e.g., millions of variables), the forward pass can become a bottleneck.
- The authors suggest hybrid evaluation (sampling + surrogate models) as a next step.
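One reading of the sampling half of that suggestion is to score each candidate on a random subset of instances rather than the full benchmark. The sketch below is an assumption about what such a hybrid might look like, not the authors' proposal:

```python
import random

def sampled_evaluate(heuristic, instances, objective, k=3, seed=0):
    """Estimate heuristic quality from a random subset of instances,
    trading estimate accuracy for a cheaper forward pass."""
    rng = random.Random(seed)
    sample = rng.sample(instances, min(k, len(instances)))
    scores = [objective(heuristic(inst)) for inst in sample]
    return sum(scores) / len(scores)

# Toy usage: instances are ints, the "heuristic" doubles them, and the
# objective is the raw solution value.
instances = list(range(10))
full = sampled_evaluate(lambda x: x * 2, instances, lambda s: s,
                        k=len(instances))
print(full)  # with k == len(instances) the estimate is exact: 9.0
```

A surrogate model would replace `objective` with a learned predictor, reserving exact evaluation for the most promising candidates.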
Feedback quality depends on LLM size
- Smaller or open‑source models sometimes generate vague suggestions, limiting improvement speed.
Limited theoretical guarantees
- The framework is empirical; formal convergence or optimality bounds remain open questions.
Future Directions
- Incorporate reinforcement‑learning‑style reward shaping into the backward pass.
- Extend the pipeline to multi‑objective optimization.
- Explore automated curriculum learning to gradually increase problem difficulty during training.
Authors
- Mingxin Yu
- Ruixiao Yang
- Chuchu Fan
Paper Information
| Field | Details |
|---|---|
| arXiv ID | 2602.16038v1 |
| Categories | cs.NE, cs.LG |
| Published | February 17, 2026 |