[Paper] AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization
Source: arXiv - 2602.20133v1
Overview
AdaEvolve rethinks how large language models (LLMs) are used to automatically improve code and algorithms. Instead of treating the LLM as a fixed “mutation” tool that runs on a static schedule, the authors cast the whole evolutionary loop as a hierarchical, adaptive optimizer that monitors its own progress and reallocates compute on the fly. The result is a system that consistently finds better solutions faster across a wide range of open‑ended optimization tasks.
Key Contributions
- Adaptive three‑level control loop – introduces Local, Global, and Meta‑Guidance layers that jointly decide how much exploration to perform, which population to fund, and when to invent new mutation tactics.
- Accumulated improvement signal – a lightweight metric that aggregates recent fitness gains and drives all three adaptation layers, enabling the system to detect stagnation early.
- Bandit‑based global budgeting – treats each candidate population as an arm in a multi‑armed bandit, dynamically shifting the overall compute budget toward the most promising search spaces.
- Meta‑LLM guidance – when progress stalls, a separate LLM is prompted with the history of generated solutions and their improvement scores to synthesize fresh mutation prompts, effectively “learning how to mutate”.
- Extensive empirical validation – evaluated on 185 open‑ended problems spanning combinatorial puzzles, systems‑level configuration, and algorithm design, showing consistent gains over strong open‑source baselines.
Methodology
- Problem framing – Each optimization task is expressed as a fitness function that can be evaluated on candidate programs or configurations. An LLM acts as a semantic mutation operator: given a candidate, it produces a syntactically valid variation.
- Local Adaptation – Within a single population, AdaEvolve monitors the accumulated improvement signal (a moving sum of recent fitness changes). If the signal is high, the system ramps up mutation intensity (e.g., more aggressive prompts, higher temperature). If the signal drops, it throttles back to avoid wasted exploration.
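The accumulated improvement signal and intensity throttle can be sketched as follows. This is an illustrative reading of the mechanism, not the paper's implementation: the window size, threshold, and temperature values are assumptions chosen for clarity.

```python
from collections import deque

class ImprovementSignal:
    """Moving sum of recent fitness gains over a sliding window."""
    def __init__(self, window=10):
        self.deltas = deque(maxlen=window)

    def update(self, prev_fitness, new_fitness):
        # Only positive changes count as improvement; regressions add zero.
        self.deltas.append(max(0.0, new_fitness - prev_fitness))

    def value(self):
        return sum(self.deltas)

def mutation_temperature(signal, lo=0.2, hi=1.0, threshold=0.05):
    """Ramp mutation intensity up while the signal is strong, throttle when it stalls."""
    return hi if signal.value() > threshold else lo
```

In this sketch, a run of flat fitness values drains the window to zero, and the temperature drops back to the conservative setting until gains reappear.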
- Global Adaptation – Multiple populations (different initial seeds, problem encodings, or mutation styles) run in parallel. A contextual bandit algorithm assigns each population a share of the total compute budget based on its recent improvement signal, continuously re‑balancing resources toward the most productive groups.
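A minimal sketch of bandit‑style budget allocation under stated assumptions: each population's score is its recent improvement signal plus a UCB‑style exploration bonus, and scores are normalized into per‑population evaluation budgets. The paper uses a contextual bandit; the simpler UCB form below is a stand‑in for illustration.

```python
import math

def allocate_budget(signals, total_budget, counts, explore=1.0):
    """Split an evaluation budget across populations.

    signals: recent improvement signal per population
    counts:  evaluations already spent per population
    """
    t = sum(counts) + 1
    # UCB-style score: exploit recent gains, but keep exploring
    # populations that have received little compute so far.
    scores = [s + explore * math.sqrt(math.log(t) / (c + 1))
              for s, c in zip(signals, counts)]
    total = sum(scores)
    return [round(total_budget * sc / total) for sc in scores]
```

The exploration bonus keeps a currently stagnant population from being starved entirely, which matters when early fitness gains are a noisy predictor of long‑run potential.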
- Meta‑Guidance – When a population’s improvement signal stays low for a predefined horizon, a meta‑LLM is invoked. It receives a compact summary of the population’s history (best solutions, failed attempts, improvement trends) and is asked to generate new mutation prompts or transformation strategies. These fresh tactics are then injected back into the local loop.
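The stagnation trigger could look roughly like this. The horizon, floor, and summary format are illustrative assumptions, and `meta_llm` is a hypothetical stand‑in for whatever prompt‑completion API is in use:

```python
def maybe_invoke_meta(history, signal_values, horizon=5, floor=0.01,
                      meta_llm=None):
    """If improvement stays below `floor` for `horizon` steps, ask a
    meta-LLM (stubbed here) to synthesize fresh mutation prompts."""
    stalled = (len(signal_values) >= horizon
               and all(v < floor for v in signal_values[-horizon:]))
    if not stalled:
        return None
    # Compact summary of the population's trajectory for the meta-LLM.
    summary = {
        "best": max(history, key=lambda h: h["fitness"]),
        "recent_trend": signal_values[-horizon:],
    }
    # `meta_llm` is any callable mapping a summary to new mutation tactics.
    if meta_llm is not None:
        return meta_llm(summary)
    return ["placeholder tactic: try a structurally different encoding"]
```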
- Zeroth‑order optimization – The whole pipeline requires only black‑box fitness evaluations; no gradients or internal model access are needed, making it compatible with any LLM or proprietary code‑generation service.
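Because only black‑box fitness evaluations are required, the core loop reduces to a sketch like the one below. Here `mutate` stands in for the LLM mutation operator, and the greedy acceptance rule is an illustrative simplification of the population machinery described above:

```python
import random

def evolve(seed, fitness, mutate, budget=100):
    """Zeroth-order loop: only black-box fitness calls, no gradients.

    `mutate` is any callable producing a variation of a candidate,
    e.g. an LLM prompted with the current best solution.
    """
    best, best_fit = seed, fitness(seed)
    for _ in range(budget):
        cand = mutate(best)
        f = fitness(cand)
        if f > best_fit:  # greedy hill climb on the black-box score
            best, best_fit = cand, f
    return best, best_fit
```

As a toy usage, maximizing `-abs(x - 7)` with a random ±1 mutation converges to `x = 7` within a modest budget, without ever inspecting the objective's internals.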
Results & Findings
| Benchmark Category | Baseline (static schedule) | AdaEvolve | Relative Improvement |
|---|---|---|---|
| Combinatorial (e.g., SAT, TSP) | 78 % optimality after 10 k evals | 85 % | +9 % |
| Systems Optimization (e.g., DB config) | 1.42× speedup | 1.68× | +18 % |
| Algorithm Design (e.g., sorting variants) | 0.62 best‑known score | 0.71 | +14 % |
| End‑to‑end runtime (all 185 tasks) | 12 h total compute | 9 h | –25 % wall‑clock |
- Faster convergence: On average, AdaEvolve reaches a given quality threshold 30 % sooner than the static‑schedule baselines.
- Better final solutions: The top 10 % of runs produce solutions that are 5–12 % higher in fitness than the best static runs.
- Robustness to problem heterogeneity: The adaptive budget allocation prevents any single hard problem from monopolizing resources, yielding more balanced performance across the diverse suite.
Practical Implications
- Developer tooling – Integrated into IDE assistants, AdaEvolve can automatically refactor or optimize code snippets while staying within a developer‑defined compute budget, delivering higher‑quality suggestions without long waits.
- Auto‑tuning of cloud services – Operators can plug AdaEvolve into configuration pipelines (e.g., Spark, Kubernetes) to continuously evolve resource allocations or query plans, reacting to workload shifts in near‑real time.
- Algorithm prototyping – Researchers can use the meta‑guidance layer to explore novel algorithmic ideas; the system will suggest fresh mutation patterns once conventional tweaks stop improving performance.
- Cost‑effective LLM usage – By allocating compute only where the improvement signal is strong, organizations can reduce API spend on LLM‑driven optimization by up to a quarter, a tangible saving for large‑scale deployments.
Limitations & Future Work
- Reliance on a good fitness oracle – The framework assumes fast, reliable evaluation of candidate solutions; noisy or extremely expensive oracles can degrade the improvement signal and misguide adaptation.
- Meta‑LLM prompt engineering – While the meta‑LLM can generate new tactics, its effectiveness varies with the underlying LLM’s capabilities; the authors note occasional “prompt drift” where generated mutations become too generic.
- Scalability of bandit management – Managing thousands of parallel populations may introduce overhead; future work could explore hierarchical bandits or clustering to keep the global scheduler lightweight.
- Generalization to non‑code domains – The paper focuses on program‑level optimization; extending AdaEvolve to other generative domains (e.g., UI design, data pipeline construction) is an open research direction.
AdaEvolve demonstrates that making LLM‑driven evolutionary search adaptive, rather than static, yields measurable gains in both speed and solution quality. For developers and engineers looking to harness LLMs for automated optimization, the three‑layer control loop offers a practical blueprint for building smarter, more resource‑aware systems.
Authors
- Mert Cemri
- Shubham Agrawal
- Akshat Gupta
- Shu Liu
- Audrey Cheng
- Qiuyang Mang
- Ashwin Naren
- Lutfi Eren Erdogan
- Koushik Sen
- Matei Zaharia
- Alex Dimakis
- Ion Stoica
Paper Information
- arXiv ID: 2602.20133v1
- Categories: cs.NE, cs.AI, cs.CL
- Published: February 23, 2026