[Paper] EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery
Source: arXiv - 2512.13857v1
Overview
EvoLattice is a framework that lets large language models (LLMs) evolve whole populations of programs, or even multi‑agent behaviors, inside a single directed acyclic graph (DAG). Because each node stores multiple persistent alternatives, every valid path through the graph is a distinct, runnable candidate; the search space expands combinatorially while the underlying structure stays compact and reusable.
Key Contributions
- Graph‑based population encoding – Represents an entire candidate pool in one DAG, avoiding the “single‑candidate overwrite” limitation of prior LLM‑guided evolution.
- Multi‑alternative nodes – Each node holds several interchangeable code fragments (or prompt pieces), yielding a combinatorial number of candidates without duplicating shared code (see the sketch after this list).
- Alternative‑level evaluation – Scores are collected for every alternative across all paths it participates in, yielding fine‑grained statistics about local design choices and their global impact.
- Deterministic self‑repair – A built‑in mechanism guarantees acyclicity and dependency consistency, automatically fixing structural errors introduced by the LLM.
- Implicit quality‑diversity dynamics – The multi‑alternative representation naturally drives both performance improvement and behavioral diversity, without needing an external archive.
- Unified program & agent evolution – The same graph can host source‑code snippets or prompt fragments, making the approach applicable to program synthesis, optimizer meta‑learning, and multi‑agent system design.
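The multi‑alternative encoding is easiest to see in code. The sketch below is illustrative rather than the paper's actual API: the `Node` class, the field names, and the toy fragments are my assumptions; only the idea that nodes hold interchangeable alternatives and every path‑plus‑choice combination is a runnable candidate comes from the paper.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List

@dataclass
class Node:
    """One DAG node holding several interchangeable code fragments."""
    name: str
    alternatives: List[str]                             # candidate fragments
    children: List[str] = field(default_factory=list)   # names of child nodes

def paths(nodes: Dict[str, Node], name: str) -> List[List[str]]:
    """All root-to-leaf paths through the DAG, as lists of node names."""
    node = nodes[name]
    if not node.children:
        return [[name]]
    return [[name] + rest
            for child in node.children
            for rest in paths(nodes, child)]

def candidates(nodes: Dict[str, Node], root: str) -> List[str]:
    """One runnable program per (path, alternative-choice) combination."""
    out = []
    for path in paths(nodes, root):
        alts = [nodes[n].alternatives for n in path]
        out.extend("\n".join(choice) for choice in product(*alts))
    return out

# A 3-node chain with 2, 2, and 1 alternatives -> 2 * 2 * 1 = 4 candidates.
graph = {
    "init": Node("init", ["x = 0", "x = 1"], children=["loop"]),
    "loop": Node("loop", ["for i in range(3):\n    x += i",
                          "x += sum(range(3))"], children=["ret"]),
    "ret":  Node("ret",  ["print(x)"]),
}
for program in candidates(graph, "init"):
    print(program, "\n---")
```

Shared fragments (here the `print(x)` leaf) exist once in the graph but appear in all four candidates, which is exactly the no‑duplication property the contribution list describes.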
Methodology
1. Graph Construction – Start with a root node representing an empty program. The LLM proposes new alternatives (e.g., a function definition, a loop, a prompt fragment) that are added as child nodes.
2. Path Extraction – Any acyclic path from the root to a leaf spells out a complete program/agent. Because each node may hold several alternatives, the number of possible candidates grows combinatorially (a chain of ten nodes with three alternatives each already encodes 3^10 ≈ 59,000 distinct candidates).
3. Evaluation Loop (see the sketch after this list) –
   - Execute each candidate path (or a sampled subset for scalability).
   - Record performance metrics (e.g., test‑case pass rate, reward in a simulated environment).
   - Propagate the scores back to every alternative that appeared in the evaluated paths, aggregating statistics such as mean, variance, and contribution to success.
4. LLM‑Guided Mutation & Recombination – The aggregated statistics become a dense feedback signal. The LLM is prompted to:
   - Mutate low‑scoring alternatives (replace or tweak code).
   - Recombine high‑scoring alternatives from different branches to create new paths.
5. Pruning & Self‑Repair – Alternatives that consistently underperform are pruned. The self‑repair routine checks the DAG for cycles or broken dependencies and automatically restructures it, ensuring every path remains executable.
6. Iterative Evolution – Steps 2–5 repeat for a fixed number of generations or until a performance threshold is met.
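A minimal sketch of the evaluation loop (step 3) and pruning (step 5). It assumes each candidate carries the IDs of the alternatives on its path; the `fitness` callable, the aggregation choices, and the pruning threshold are illustrative stand‑ins, not details taken from the paper.

```python
import statistics
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

def evaluate(cands: Iterable[Tuple[str, List[str]]],
             fitness: Callable[[str], float]) -> Dict[str, dict]:
    """Score candidates, then propagate each path score back to every
    alternative that appeared on that path (alternative-level evaluation)."""
    per_alt: Dict[str, List[float]] = defaultdict(list)
    for program, alt_ids in cands:
        score = fitness(program)          # e.g. test-case pass rate or reward
        for alt in alt_ids:
            per_alt[alt].append(score)
    return {alt: {"mean": statistics.mean(s),
                  "var": statistics.pvariance(s),
                  "n": len(s)}
            for alt, s in per_alt.items()}

def prune(stats: Dict[str, dict],
          floor: float = 0.2, min_trials: int = 3) -> Set[str]:
    """Alternatives that consistently underperform become pruning targets."""
    return {alt for alt, st in stats.items()
            if st["n"] >= min_trials and st["mean"] < floor}

# Toy usage: two candidates sharing the alternative "loop:1".
def fitness(program: str) -> float:
    return 1.0 if "sum" in program else 0.3   # placeholder metric

stats = evaluate([("x = 0\nx += sum(range(3))", ["init:0", "loop:1"]),
                  ("x = 1\nx += sum(range(3))", ["init:1", "loop:1"])],
                 fitness)
print(stats["loop:1"])   # {'mean': 1.0, 'var': 0.0, 'n': 2}
```

These per‑alternative statistics are the dense feedback signal handed to the LLM in step 4: a low mean with low variance marks a fragment to mutate, while a high mean across many paths marks one to recombine.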
Results & Findings
| Benchmark | Baseline (single‑candidate LLM) | EvoLattice | Observations |
|---|---|---|---|
| Program synthesis (synthetic tasks) | 62 % success after 30 generations | 78 % success | Faster convergence, fewer catastrophic regressions |
| Optimizer meta‑learning | 0.71 average reward | 0.84 average reward | More stable improvement curve, less variance |
| Multi‑agent prompt composition | 48 % task completion | 66 % task completion | Emergent diversity of strategies without explicit archive |
- Stability: EvoLattice’s self‑repair prevented crashes that plagued overwrite‑based methods, resulting in smoother learning curves.
- Expressivity: The combinatorial path space allowed the discovery of solutions that combined previously unrelated code fragments, something single‑candidate approaches could never explore.
- Implicit QD behavior: Diversity metrics (e.g., number of distinct functional behaviors) rose naturally as alternatives diversified, mirroring explicit quality‑diversity algorithms (one possible proxy metric is sketched below).
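The summary does not spell out which diversity metric was used. One common proxy, assumed here rather than taken from the paper, is to fingerprint each candidate by its observable outputs on a fixed set of probe inputs and count distinct fingerprints:

```python
import contextlib
import io
from typing import Iterable, Optional, Tuple

def behavior_signature(program: str, probes: Iterable[int]) -> Optional[Tuple]:
    """Fingerprint a candidate by what it prints for each probe value bound
    to `x0`. Purely illustrative; a real system would sandbox execution."""
    outputs = []
    for x0 in probes:
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(program, {"x0": x0})
        except Exception:
            return None                 # crashing candidates have no behavior
        outputs.append(buf.getvalue())
    return tuple(outputs)

def distinct_behaviors(programs: Iterable[str], probes=range(3)) -> int:
    sigs = {behavior_signature(p, probes) for p in programs}
    sigs.discard(None)
    return len(sigs)

print(distinct_behaviors(["print(x0 + 1)", "print(x0 * 2)", "print(x0+1)"]))  # 2
```

Syntactically different fragments that compute the same function collapse to one behavior, so the count tracks functional rather than textual diversity.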
Practical Implications
- Scalable code generation pipelines: Teams can integrate EvoLattice into CI/CD to continuously evolve utility scripts, configuration generators, or domain‑specific languages while preserving useful building blocks.
- Robust agent design: In reinforcement‑learning or chatbot contexts, developers can evolve prompt libraries or sub‑policies that automatically recombine, yielding more adaptable agents.
- Reduced LLM waste: Because alternatives are persisted and re‑used, each LLM call contributes a fragment that can appear in many candidate paths, lowering API costs compared to regenerating complete programs every iteration.
- Debug‑friendly evolution: The deterministic self‑repair gives developers confidence that generated candidates are at least structurally consistent and executable, simplifying downstream testing and deployment (see the sketch after this list).
- Plug‑and‑play with existing tools: EvoLattice’s DAG can be exported to common graph formats (e.g., GraphML, DOT) and visualized, making it compatible with version‑control diff tools and code‑review workflows.
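The summary describes self‑repair as deterministic but not how it works. The sketch below, reusing the `Node` structure from the earlier example, shows one plausible realization under my own assumptions: drop edges to missing nodes, then walk the graph in sorted order and discard any back‑edge that would close a cycle, so the same broken graph is always repaired the same way.

```python
from typing import Dict

def self_repair(nodes: Dict[str, "Node"]) -> None:
    """Restore dependency consistency and acyclicity in place (illustrative)."""
    # 1. Dependency consistency: remove edges pointing at missing nodes.
    for node in nodes.values():
        node.children = [c for c in node.children if c in nodes]

    # 2. Acyclicity: depth-first walk; an edge into a node still on the
    #    current stack would close a cycle, so it is deterministically dropped.
    on_stack, finished = set(), set()

    def visit(name: str) -> None:
        on_stack.add(name)
        kept = []
        for child in sorted(nodes[name].children):   # sorted => reproducible
            if child in on_stack:
                continue                             # back-edge: drop it
            if child not in finished:
                visit(child)
            kept.append(child)
        nodes[name].children = kept
        on_stack.discard(name)
        finished.add(name)

    for name in sorted(nodes):
        if name not in finished:
            visit(name)
```

Any repair policy would do as long as it is deterministic; reproducibility is what lets the evolution loop trust that every extracted path stays executable.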
Limitations & Future Work
- Scalability of exhaustive evaluation: While sampling mitigates the combinatorial blow‑up, very large graphs still require careful budget allocation; smarter path‑selection heuristics are needed.
- LLM dependence: The quality of mutations hinges on the underlying model; weaker LLMs may produce many low‑utility alternatives, increasing pruning overhead.
- Domain specificity: The current implementation focuses on imperative code and prompt fragments; extending to functional languages, hardware description languages, or graphics shaders may need custom node semantics.
- User‑guided constraints: Future work could expose APIs for developers to inject hard constraints (e.g., security policies) directly into the graph, guiding the evolution toward compliant solutions.
EvoLattice opens a promising avenue for turning LLMs into true evolutionary engineers—preserving what works, exploring what could work, and doing it all within a single, self‑healing graph structure. As the community builds richer evaluation metrics and tighter integration with development pipelines, we can expect LLM‑guided program discovery to become a practical tool in the everyday developer’s toolbox.
Authors
- Kamer Ali Yuksel
Paper Information
- arXiv ID: 2512.13857v1
- Categories: cs.AI, cs.CL, cs.LG, cs.MA, cs.NE
- Published: December 15, 2025
- PDF: https://arxiv.org/pdf/2512.13857v1