[Paper] EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery

Published: December 15, 2025 at 02:43 PM EST
4 min read
Source: arXiv - 2512.13857v1

Overview

EvoLattice is a new framework that lets large language models (LLMs) evolve whole populations of programs—or even multi‑agent behaviors—inside a single directed‑acyclic graph (DAG). By storing multiple persistent alternatives at each node, every valid path through the graph becomes a distinct, runnable candidate, dramatically expanding the search space while keeping the underlying structure compact and reusable.

Key Contributions

  • Graph‑based population encoding – Represents an entire candidate pool in one DAG, avoiding the “single‑candidate overwrite” limitation of prior LLM‑guided evolution.
  • Multi‑alternative nodes – Each node holds several interchangeable code fragments (or prompt pieces), enabling a combinatorial explosion of candidates without duplicating shared code (a minimal data-structure sketch follows this list).
  • Alternative‑level evaluation – Scores are collected for every alternative across all paths it participates in, yielding fine‑grained statistics about local design choices and their global impact.
  • Deterministic self‑repair – A built‑in mechanism guarantees acyclicity and dependency consistency, automatically fixing structural errors introduced by the LLM.
  • Implicit quality‑diversity dynamics – The multi‑alternative representation naturally drives both performance improvement and behavioral diversity, without needing an external archive.
  • Unified program & agent evolution – The same graph can host source‑code snippets or prompt fragments, making the approach applicable to program synthesis, optimizer meta‑learning, and multi‑agent system design.
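
The paper itself does not ship reference code, but the core representation is easy to picture. Below is a minimal Python sketch, with hypothetical names (`Alternative`, `Node`), of a node that stores several interchangeable fragments plus the per-path scores used later for alternative-level evaluation:

```python
from dataclasses import dataclass, field

@dataclass
class Alternative:
    """One interchangeable fragment stored at a node (hypothetical schema)."""
    fragment: str                                        # code snippet or prompt piece
    scores: list[float] = field(default_factory=list)    # results from every path it joined

    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

@dataclass
class Node:
    """A DAG node; edges to children define the lattice structure."""
    name: str
    alternatives: list[Alternative] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)
```

With this layout a candidate is a root-to-leaf path together with one alternative chosen at each visited node, which is why the candidate count grows multiplicatively while shared fragments are stored only once.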

Methodology

  1. Graph Construction – Start with a root node representing an empty program. The LLM proposes new alternatives (e.g., a function definition, a loop, a prompt fragment) that are added as child nodes.
  2. Path Extraction – Any acyclic path from the root to a leaf spells out a complete program/agent. Because each node may have several alternatives, the number of possible paths grows combinatorially.
  3. Evaluation Loop
    • Execute each candidate path (or a sampled subset for scalability).
    • Record performance metrics (e.g., test‑case pass rate, reward in a simulated environment).
    • Propagate the scores back to every alternative that appeared in the evaluated paths, aggregating statistics such as mean, variance, and contribution to success (see the enumeration-and-aggregation sketch after this list).
  4. LLM‑Guided Mutation & Recombination – The aggregated statistics become a dense feedback signal. The LLM is prompted to:
    • Mutate low‑scoring alternatives (replace or tweak code).
    • Recombine high‑scoring alternatives from different branches to create new paths.
  5. Pruning & Self‑Repair – Alternatives that consistently underperform are pruned. The self‑repair routine checks the DAG for cycles or broken dependencies and automatically restructures it, ensuring every path remains executable (a cycle-check sketch also follows this list).
  6. Iterative Evolution – Steps 2‑5 repeat for a fixed number of generations or until a performance threshold is met.
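
To make steps 2-3 concrete, here is a hedged Python sketch (building on the hypothetical `Node`/`Alternative` classes above) of exhaustive path enumeration and alternative-level score propagation. A real system would sample paths rather than enumerate them, and `score_fn` stands in for whatever benchmark harness is used:

```python
import itertools
import statistics

def enumerate_candidates(node, prefix=()):
    """Yield each candidate as a tuple of (node, alternative) picks along
    one root-to-leaf path. Exhaustive; real runs would sample instead."""
    for alt in node.alternatives:
        path = prefix + ((node, alt),)
        if not node.children:
            yield path
        else:
            for child in node.children:
                yield from enumerate_candidates(child, path)

def evaluate_and_propagate(root, score_fn, budget=None):
    """Score candidates, then push each score back onto every alternative
    that appeared on the evaluated path (alternative-level evaluation)."""
    candidates = enumerate_candidates(root)
    if budget is not None:
        candidates = itertools.islice(candidates, budget)  # crude sampling cap
    for path in candidates:
        program = "\n".join(alt.fragment for _, alt in path)
        score = score_fn(program)            # e.g., test-case pass rate
        for _, alt in path:
            alt.scores.append(score)

def alternative_stats(alternatives):
    """Aggregate (mean, variance) per alternative for LLM feedback."""
    return {
        alt.fragment: (statistics.mean(alt.scores),
                       statistics.pvariance(alt.scores))
        for alt in alternatives if alt.scores
    }
```

These per-alternative statistics are exactly the dense feedback signal that step 4 feeds back into the LLM prompt.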
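
The self-repair mechanism in step 5 is described only at a high level, so the following sketch shows just the deterministic detection half: a Kahn-style topological sort that flags whether an LLM edit has introduced a cycle. How EvoLattice then restructures the graph is not specified in this summary; dropping the offending back-edges would be one simple fallback:

```python
def has_cycle(nodes):
    """Kahn's algorithm: if a topological order cannot cover every node,
    the remaining nodes sit on a cycle and the graph needs repair."""
    indegree = {id(n): 0 for n in nodes}
    for n in nodes:
        for child in n.children:
            indegree[id(child)] += 1
    frontier = [n for n in nodes if indegree[id(n)] == 0]
    ordered = 0
    while frontier:
        n = frontier.pop()
        ordered += 1
        for child in n.children:
            indegree[id(child)] -= 1
            if indegree[id(child)] == 0:
                frontier.append(child)
    return ordered != len(nodes)   # True -> acyclicity violated
```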

Results & Findings

| Benchmark | Baseline (single‑candidate LLM) | EvoLattice | Observations |
| --- | --- | --- | --- |
| Program synthesis (synthetic tasks) | 62 % success after 30 generations | 78 % success | Faster convergence, fewer catastrophic regressions |
| Optimizer meta‑learning | 0.71 average reward | 0.84 average reward | More stable improvement curve, less variance |
| Multi‑agent prompt composition | 48 % task completion | 66 % task completion | Emergent diversity of strategies without explicit archive |
  • Stability: EvoLattice’s self‑repair prevented crashes that plagued overwrite‑based methods, resulting in smoother learning curves.
  • Expressivity: The combinatorial path space allowed the discovery of solutions that combined previously unrelated code fragments, something single‑candidate approaches could never explore.
  • Implicit QD behavior: Diversity metrics (e.g., number of distinct functional behaviors) rose naturally as alternatives diversified, mirroring explicit quality‑diversity algorithms.

Practical Implications

  • Scalable code generation pipelines: Teams can integrate EvoLattice into CI/CD to continuously evolve utility scripts, configuration generators, or domain‑specific languages while preserving useful building blocks.
  • Robust agent design: In reinforcement‑learning or chatbot contexts, developers can evolve prompt libraries or sub‑policies that automatically recombine, yielding more adaptable agents.
  • Reduced LLM waste: Because alternatives are persisted and re‑used, the same LLM calls generate multiple candidate solutions, lowering API costs compared to generating fresh programs each iteration.
  • Debug‑friendly evolution: The deterministic self‑repair gives developers confidence that generated code will at least be syntactically valid, simplifying downstream testing and deployment.
  • Plug‑and‑play with existing tools: EvoLattice’s DAG can be exported to common graph formats (e.g., GraphML, DOT) and visualized, making it compatible with version‑control diff tools and code‑review workflows.
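
As a rough illustration of that last point (again reusing the hypothetical `Node` class, since the paper's export API is not documented here), a DOT serialization is only a few lines:

```python
def to_dot(nodes):
    """Emit a Graphviz DOT view of the lattice; each node label reports
    how many alternatives it currently stores."""
    lines = ["digraph EvoLattice {"]
    for n in nodes:
        lines.append(f'  "{n.name}" [label="{n.name}\\n{len(n.alternatives)} alts"];')
        for child in n.children:
            lines.append(f'  "{n.name}" -> "{child.name}";')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text diffs cleanly under version control, which is what makes the lattice reviewable alongside ordinary code.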

Limitations & Future Work

  • Scalability of exhaustive evaluation: While sampling mitigates the combinatorial blow‑up, very large graphs still require careful budget allocation; smarter path‑selection heuristics are needed.
  • LLM dependence: The quality of mutations hinges on the underlying model; weaker LLMs may produce many low‑utility alternatives, increasing pruning overhead.
  • Domain specificity: The current implementation focuses on imperative code and prompt fragments; extending to functional languages, hardware description languages, or graphics shaders may need custom node semantics.
  • User‑guided constraints: Future work could expose APIs for developers to inject hard constraints (e.g., security policies) directly into the graph, guiding the evolution toward compliant solutions.

EvoLattice opens a promising avenue for turning LLMs into true evolutionary engineers—preserving what works, exploring what could work, and doing it all within a single, self‑healing graph structure. As the community builds richer evaluation metrics and tighter integration with development pipelines, we can expect LLM‑guided program discovery to become a practical tool in the everyday developer’s toolbox.

Authors

  • Kamer Ali Yuksel

Paper Information

  • arXiv ID: 2512.13857v1
  • Categories: cs.AI, cs.CL, cs.LG, cs.MA, cs.NE
  • Published: December 15, 2025