[Paper] LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI
Source: arXiv - 2601.21511v1
Overview
The paper introduces LLaMEA‑SAGE, an extension of the LLaMEA framework that uses structural feedback from generated code to steer large‑language‑model (LLM)‑based automated algorithm design (AAD). By extracting graph‑theoretic and complexity features from the abstract syntax trees (ASTs) of candidate algorithms, the system builds a surrogate model whose insights guide how the LLM mutates code, substantially accelerating the search for high‑performing optimizers.
Key Contributions
- Feature‑driven guidance: Derives explainable, graph‑based features from ASTs and learns a surrogate model that predicts algorithm performance.
- Natural‑language mutation instructions: Translates the most influential features into human‑readable prompts that direct the LLM’s next code generation step without hard‑coding constraints.
- Integration with LLaMEA: Embeds the SAGE feedback loop into the existing evolutionary AAD pipeline, preserving its expressive power while adding a structured bias.
- Empirical validation: Shows faster convergence on small benchmark suites and superior final performance on the large‑scale MA‑BBOB competition suite compared to vanilla LLaMEA and other state‑of‑the‑art AAD methods.
- Explainable AI (XAI) pipeline: Uses SHAP/feature importance analyses to surface which code structures most impact optimizer quality, offering developers insight into “good” algorithm design patterns.
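The AST‑based feature idea behind these contributions can be illustrated in a few lines of Python. This is a minimal sketch using only the standard `ast` module; the helper name `ast_features` and the particular descriptors shown are simplified assumptions, not the paper's actual (richer, graph‑theoretic) feature set.

```python
import ast

def ast_features(source: str) -> dict:
    """Compute simple structural descriptors from Python source.

    Illustrative subset only: depth, node count, loop/branch/call counts.
    The paper's feature set also includes graph-theoretic and
    complexity metrics not reproduced here.
    """
    tree = ast.parse(source)

    def depth(node: ast.AST) -> int:
        # Longest root-to-leaf path in the AST.
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(c) for c in children), default=0)

    nodes = list(ast.walk(tree))
    return {
        "ast_depth": depth(tree),
        "num_nodes": len(nodes),
        "num_loops": sum(isinstance(n, (ast.For, ast.While)) for n in nodes),
        "num_branches": sum(isinstance(n, ast.If) for n in nodes),
        "num_calls": sum(isinstance(n, ast.Call) for n in nodes),
    }

snippet = (
    "def f(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        if x > 0:\n"
    "            total += x\n"
    "    return total\n"
)
print(ast_features(snippet))
```

Feature vectors like this one are what the surrogate model consumes in place of the raw code.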
Methodology
1. Initial population generation: LLaMEA prompts a large language model (e.g., GPT‑4) with a high‑level description of the target optimizer, receiving Python (or other language) code snippets.
2. AST extraction: Each generated snippet is parsed into an abstract syntax tree. From the AST, a set of structural descriptors is computed—e.g., depth, branching factor, loop nesting, use of specific library calls, and graph‑theoretic metrics like cyclomatic complexity.
3. Surrogate modeling: An inexpensive regression model (e.g., Gradient Boosted Trees) is trained on an archive of evaluated algorithms, mapping the extracted features to observed performance on a validation set of benchmark problems.
4. Explainable AI analysis: Feature importance (via SHAP values or permutation importance) identifies which structural aspects most correlate with high performance.
5. Natural‑language feedback generation: The system converts the top‑k influential features into concise mutation instructions (e.g., “increase the depth of the search tree” or “replace the current selection operator with a tournament of size 3”).
6. Guided mutation: These instructions are fed back into the LLM as part of the next prompt, nudging the model to generate code that respects the suggested structural changes while still allowing creative variations.
7. Evolution loop: Steps 2–6 repeat, continuously refining the population until a stopping criterion (budget or convergence) is met.
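The surrogate‑modeling, importance‑analysis, and feedback‑generation steps above can be sketched end to end. To stay dependency‑free, this sketch swaps the paper's gradient‑boosted trees and SHAP for a 1‑nearest‑neighbour surrogate and a hand‑rolled permutation importance; the archive values and instruction templates are invented for illustration.

```python
import random

# Toy archive: structural features of evaluated candidates and their
# observed benchmark scores (all values invented for illustration).
ARCHIVE = [
    ({"ast_depth": 4, "num_loops": 1, "num_calls": 3}, 0.52),
    ({"ast_depth": 6, "num_loops": 2, "num_calls": 5}, 0.70),
    ({"ast_depth": 7, "num_loops": 2, "num_calls": 4}, 0.74),
    ({"ast_depth": 5, "num_loops": 1, "num_calls": 6}, 0.60),
    ({"ast_depth": 8, "num_loops": 3, "num_calls": 5}, 0.81),
    ({"ast_depth": 3, "num_loops": 1, "num_calls": 2}, 0.45),
]
FEATURES = ["ast_depth", "num_loops", "num_calls"]

def predict(feats, archive):
    """1-NN surrogate: return the score of the closest archived candidate."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in FEATURES)
    return min(archive, key=lambda item: dist(feats, item[0]))[1]

def loo_error(archive):
    """Leave-one-out mean absolute error of the surrogate on the archive."""
    errs = []
    for i, (feats, y) in enumerate(archive):
        rest = archive[:i] + archive[i + 1:]
        errs.append(abs(predict(feats, rest) - y))
    return sum(errs) / len(errs)

def permutation_importance(archive, seed=0):
    """Error increase when one feature column is shuffled (higher = more important)."""
    rng = random.Random(seed)
    base = loo_error(archive)
    scores = {}
    for f in FEATURES:
        col = [feats[f] for feats, _ in archive]
        rng.shuffle(col)
        shuffled = [({**feats, f: v}, y) for (feats, y), v in zip(archive, col)]
        scores[f] = loo_error(shuffled) - base
    return scores

# Translate the most influential feature into a mutation instruction
# (templates are invented; the paper generates such hints automatically).
TEMPLATES = {
    "ast_depth": "increase the structural depth of the algorithm",
    "num_loops": "add or restructure loops (e.g., a nested local search)",
    "num_calls": "make more use of helper or library calls",
}
imp = permutation_importance(ARCHIVE)
top = max(imp, key=imp.get)
print(f"Suggested mutation: {TEMPLATES[top]}")
```

The printed instruction would be spliced into the next LLM prompt, closing the guided‑mutation loop; in the actual system the surrogate is richer and the instructions are derived per candidate rather than from fixed templates.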
Results & Findings
| Experiment | Baseline (vanilla LLaMEA) | LLaMEA‑SAGE | Speed‑up / Performance Gain |
|---|---|---|---|
| Small synthetic benchmark (5 functions) | 0.78 ± 0.04 (best‑found fitness) | 0.81 ± 0.03 | ~30 % fewer generations to reach same fitness |
| MA‑BBOB suite (55 multimodal functions) | 0.62 ± 0.07 (average rank) | 0.71 ± 0.05 | Statistically significant (p < 0.01) improvement; top‑5 rank among all AAD competitors |
| Runtime overhead (feature extraction + surrogate) | – | + 5 % wall‑clock time per generation | Overhead negligible compared to LLM inference cost |
Key takeaways
- Faster convergence: By biasing the search toward structurally promising code, LLaMEA‑SAGE reaches comparable or better fitness levels with fewer LLM calls.
- Higher final quality: On the large‑scale MA‑BBOB benchmark, the guided approach consistently outperforms the unguided version and other recent AAD systems.
- Explainability: The XAI analysis surfaces concrete coding patterns (e.g., deeper recursion, specific mutation operators) that correlate with success, offering actionable insights for human designers.
Practical Implications
- Accelerated AAD pipelines: Teams can integrate SAGE into existing LLM‑based optimizer generators to cut down on costly API calls and reduce cloud‑compute bills.
- Human‑in‑the‑loop co‑design: The natural‑language feedback can be displayed directly to developers, who can accept, tweak, or reject suggestions, turning the system into an intelligent coding assistant for meta‑heuristic design.
- Portability across languages: Because the feature extraction works on ASTs, the approach can be applied to any language supported by modern parsers (Python, C++, Java), enabling cross‑language optimizer synthesis.
- Domain‑specific extensions: By swapping the benchmark suite used to train the surrogate, organizations can tailor the guidance to their own problem domains (e.g., scheduling, hyper‑parameter tuning, reinforcement learning).
- Better interpretability of AI‑generated code: The XAI layer demystifies why a particular generated optimizer works, helping with compliance, debugging, and maintenance—critical concerns for production systems.
Limitations & Future Work
- Surrogate fidelity: The regression model is only as good as the evaluated archive; sparse or noisy performance data can mislead the guidance.
- Scalability of feature set: While AST features are lightweight, adding more sophisticated static analysis (e.g., data‑flow or symbolic execution) could increase overhead.
- LLM prompt sensitivity: The quality of the mutation instructions depends on the LLM’s ability to follow nuanced natural‑language cues; different model versions may behave inconsistently.
- Generalization to non‑optimizers: The current study focuses on evolutionary optimizers; extending SAGE to other algorithm families (e.g., graph algorithms, neural architecture search) remains an open question.
- Future directions: The authors suggest (1) incorporating dynamic runtime profiling features, (2) exploring multi‑objective surrogate models (e.g., balancing solution quality and runtime), and (3) testing the framework with open‑source LLMs to reduce dependency on proprietary APIs.
Authors
- Niki van Stein
- Anna V. Kononova
- Lars Kotthoff
- Thomas Bäck
Paper Information
- arXiv ID: 2601.21511v1
- Categories: cs.AI, cs.NE, cs.SE
- Published: January 29, 2026