[Paper] AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization
Source: arXiv - 2602.20133v1
Overview
AdaEvolve rethinks how large language models (LLMs) are used to automatically improve code and algorithms. Instead of treating the LLM as a fixed “mutation” tool that runs on a static schedule, the authors cast the whole evolutionary loop as a hierarchical, adaptive optimizer that monitors its own progress and reallocates compute on the fly. The result is a system that consistently finds better solutions faster across a wide range of open‑ended optimization tasks.
Key Contributions
- Adaptive three‑level control loop – introduces Local, Global, and Meta‑Guidance layers that jointly decide how much exploration to perform, which population to fund, and when to invent new mutation tactics.
- Accumulated improvement signal – a lightweight metric that aggregates recent fitness gains and drives all three adaptation layers, enabling the system to detect stagnation early.
- Bandit‑based global budgeting – treats each candidate population as an arm in a multi‑armed bandit, dynamically shifting the overall compute budget toward the most promising search spaces.
- Meta‑LLM guidance – when progress stalls, a separate LLM is prompted with the history of generated solutions and their improvement scores to synthesize fresh mutation prompts, effectively “learning how to mutate”.
- Extensive empirical validation – evaluated on 185 open‑ended problems spanning combinatorial puzzles, systems‑level configuration, and algorithm design, showing consistent gains over strong open‑source baselines.
Methodology
- Problem framing – Each optimization task is expressed as a fitness function that can be evaluated on candidate programs or configurations. An LLM acts as a semantic mutation operator: given a candidate, it produces a syntactically valid variation.
- Local Adaptation – Within a single population, AdaEvolve monitors the accumulated improvement signal (a moving sum of recent fitness changes). If the signal is high, the system ramps up mutation intensity (e.g., more aggressive prompts, higher temperature). If the signal drops, it throttles back to avoid wasted exploration.
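The accumulated improvement signal and intensity throttle can be sketched as follows. This is an illustrative reading of the mechanism, not the paper's implementation: the window size, threshold, and temperature values are assumptions chosen for clarity.

```python
from collections import deque

class ImprovementSignal:
    """Moving sum of recent fitness gains over a sliding window."""
    def __init__(self, window=10):
        self.deltas = deque(maxlen=window)

    def update(self, prev_fitness, new_fitness):
        # Only positive changes count as improvement; regressions add zero.
        self.deltas.append(max(0.0, new_fitness - prev_fitness))

    def value(self):
        return sum(self.deltas)

def mutation_temperature(signal, lo=0.2, hi=1.0, threshold=0.05):
    """Ramp mutation intensity up while the signal is strong, throttle when it stalls."""
    return hi if signal.value() > threshold else lo
```

In this sketch, a run of flat fitness values drains the window to zero, and the temperature drops back to the conservative setting until gains reappear.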
- Global Adaptation – Multiple populations (different initial seeds, problem encodings, or mutation styles) run in parallel. A contextual bandit algorithm assigns each population a share of the total compute budget based on its recent improvement signal, continuously re‑balancing resources toward the most productive groups.
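A minimal sketch of bandit‑style budget allocation under stated assumptions: each population's score is its recent improvement signal plus a UCB‑style exploration bonus, and scores are normalized into per‑population evaluation budgets. The paper uses a contextual bandit; the simpler UCB form below is a stand‑in for illustration.

```python
import math

def allocate_budget(signals, total_budget, counts, explore=1.0):
    """Split an evaluation budget across populations.

    signals: recent improvement signal per population
    counts:  evaluations already spent per population
    """
    t = sum(counts) + 1
    # UCB-style score: exploit recent gains, but keep exploring
    # populations that have received little compute so far.
    scores = [s + explore * math.sqrt(math.log(t) / (c + 1))
              for s, c in zip(signals, counts)]
    total = sum(scores)
    return [round(total_budget * sc / total) for sc in scores]
```

The exploration bonus keeps a currently stagnant population from being starved entirely, which matters when early fitness gains are a noisy predictor of long‑run potential.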
- Meta‑Guidance – When a population’s improvement signal stays low for a predefined horizon, a meta‑LLM is invoked. It receives a compact summary of the population’s history (best solutions, failed attempts, improvement trends) and is asked to generate new mutation prompts or transformation strategies. These fresh tactics are then injected back into the local loop.
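The stagnation trigger could look roughly like this. The horizon, floor, and summary format are illustrative assumptions, and `meta_llm` is a hypothetical stand‑in for whatever prompt‑completion API is in use:

```python
def maybe_invoke_meta(history, signal_values, horizon=5, floor=0.01,
                      meta_llm=None):
    """If improvement stays below `floor` for `horizon` steps, ask a
    meta-LLM (stubbed here) to synthesize fresh mutation prompts."""
    stalled = (len(signal_values) >= horizon
               and all(v < floor for v in signal_values[-horizon:]))
    if not stalled:
        return None
    # Compact summary of the population's trajectory for the meta-LLM.
    summary = {
        "best": max(history, key=lambda h: h["fitness"]),
        "recent_trend": signal_values[-horizon:],
    }
    # `meta_llm` is any callable mapping a summary to new mutation tactics.
    if meta_llm is not None:
        return meta_llm(summary)
    return ["placeholder tactic: try a structurally different encoding"]
```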
- Zeroth‑order optimization – The whole pipeline requires only black‑box fitness evaluations; no gradients or internal model access are needed, making it compatible with any LLM or proprietary code‑generation service.
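Because only black‑box fitness evaluations are required, the core loop reduces to a sketch like the one below. Here `mutate` stands in for the LLM mutation operator, and the greedy acceptance rule is an illustrative simplification of the population machinery described above:

```python
import random

def evolve(seed, fitness, mutate, budget=100):
    """Zeroth-order loop: only black-box fitness calls, no gradients.

    `mutate` is any callable producing a variation of a candidate,
    e.g. an LLM prompted with the current best solution.
    """
    best, best_fit = seed, fitness(seed)
    for _ in range(budget):
        cand = mutate(best)
        f = fitness(cand)
        if f > best_fit:  # greedy hill climb on the black-box score
            best, best_fit = cand, f
    return best, best_fit
```

As a toy usage, maximizing `-abs(x - 7)` with a random ±1 mutation converges to `x = 7` within a modest budget, without ever inspecting the objective's internals.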
Results & Findings
| Benchmark Category | Baseline (static schedule) | AdaEvolve | Relative Improvement |
|---|---|---|---|
| Combinatorial (e.g., SAT, TSP) | 78 % optimality after 10 k evals | 85 % | +9 % |
| Systems Optimization (e.g., DB config) | 1.42× speedup | 1.68× | +18 % |
| Algorithm Design (e.g., sorting variants) | 0.62 best‑known score | 0.71 | +14 % |
| End‑to‑end runtime (all 185 tasks) | 12 h total compute | 9 h | –25 % wall‑clock |
- Faster convergence: On average, AdaEvolve reaches a given quality threshold 30 % sooner than the static‑schedule baselines.
- Better final solutions: The top 10 % of runs produce solutions that are 5–12 % higher in fitness than the best static runs.
- Robustness to problem heterogeneity: The adaptive budget allocation prevents any single hard problem from monopolizing resources, yielding more balanced performance across the diverse suite.
Practical Implications
- Developer tooling – Integrated into IDE assistants, AdaEvolve can automatically refactor or optimize code snippets while staying within a developer‑defined compute budget, delivering higher‑quality suggestions without long waits.
- Auto‑tuning of cloud services – Operators can plug AdaEvolve into configuration pipelines (e.g., Spark, Kubernetes) to continuously evolve resource allocations or query plans, reacting to workload shifts in near‑real time.
- Algorithm prototyping – Researchers can use the meta‑guidance layer to explore novel algorithmic ideas; the system will suggest fresh mutation patterns once conventional tweaks stop improving performance.
- Cost‑effective LLM usage – By allocating compute only where the improvement signal is strong, organizations can reduce API spend on LLM‑driven optimization by up to a quarter, a tangible saving for large‑scale deployments.
Limitations & Future Work
- Reliance on a good fitness oracle – The framework assumes fast, reliable evaluation of candidate solutions; noisy or extremely expensive oracles can degrade the improvement signal and misguide adaptation.
- Meta‑LLM prompt engineering – While the meta‑LLM can generate new tactics, its effectiveness varies with the underlying LLM’s capabilities; the authors note occasional “prompt drift” where generated mutations become too generic.
- Scalability of bandit management – Managing thousands of parallel populations may introduce overhead; future work could explore hierarchical bandits or clustering to keep the global scheduler lightweight.
- Generalization to non‑code domains – The paper focuses on program‑level optimization; extending AdaEvolve to other generative domains (e.g., UI design, data pipeline construction) is an open research direction.
AdaEvolve demonstrates that making LLM‑driven evolutionary search adaptive, rather than static, yields measurable gains in both speed and solution quality. For developers and engineers looking to harness LLMs for automated optimization, the three‑layer control loop offers a practical blueprint for building smarter, more resource‑aware systems.
Authors
- Mert Cemri
- Shubham Agrawal
- Akshat Gupta
- Shu Liu
- Audrey Cheng
- Qiuyang Mang
- Ashwin Naren
- Lutfi Eren Erdogan
- Koushik Sen
- Matei Zaharia
- Alex Dimakis
- Ion Stoica
Paper Information
- arXiv ID: 2602.20133v1
- Categories: cs.NE, cs.AI, cs.CL
- Published: February 23, 2026