[Paper] Beyond Algorithm Evolution: An LLM-Driven Framework for the Co-Evolution of Swarm Intelligence Optimization Algorithms and Prompts

Published: December 9, 2025 at 07:37 PM EST
4 min read
Source: arXiv - 2512.09209v1

Overview

The paper introduces a co‑evolution framework that lets a large language model (LLM) simultaneously refine a swarm‑intelligence optimization algorithm and the textual prompts that steer it. By treating prompts as first‑class citizens in the design loop, the authors show that you can achieve strong, model‑agnostic performance on classic NP‑hard problems without relying on the most expensive LLMs.

Key Contributions

  • Unified co‑evolution loop: A single LLM iteratively improves both the swarm‑intelligence algorithm (e.g., PSO, ant colony) and its accompanying prompt template.
  • Prompt‑template evaluation metric: A lightweight, interpretable scoring method that quantifies how well a prompt guides the algorithm, enabling fast selection during evolution.
  • Model‑agnostic robustness: Demonstrated consistent gains across a spectrum of LLMs (GPT‑4o‑mini, Qwen‑3‑32B, GPT‑5), reducing dependence on high‑cost models.
  • Empirical superiority: State‑of‑the‑art results on a suite of classic NP‑hard benchmarks (e.g., SAT, TSP, knapsack) compared with existing automated design systems (EoH, FunSearch, ReEvo).
  • Ablation & trajectory analysis: Shows that evolving prompts and algorithms together yields markedly better solutions than evolving either component alone.

Methodology

  1. Initial Population – Randomly generate a set of swarm‑intelligence algorithm variants (different operators, parameter settings) and a matching set of prompt templates (natural‑language descriptions of the optimization task).
  2. LLM‑Driven Generation – Feed each algorithm‑prompt pair to the LLM, which proposes mutations (e.g., tweak a velocity update rule, rephrase a prompt).
  3. Evaluation
    • Algorithm fitness: Run the algorithm on a benchmark instance and record solution quality / runtime.
    • Prompt fitness: Use the authors’ lightweight metric that measures how effectively the prompt elicits useful reasoning from the LLM (e.g., consistency of generated search directions); an illustrative sketch of one such score follows this list.
  4. Selection & Recombination – Keep the top‑performing pairs, recombine their components (mix‑and‑match prompts with algorithms), and repeat the loop for a fixed number of generations.
  5. Final Selection – The best‑scoring pair is output as the “co‑evolved” solution.
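
To make the prompt‑fitness step concrete, below is a minimal sketch of one way a consistency‑style score could be computed: sample the LLM several times with the same prompt and measure how often the replies agree. The function names (prompt_fitness, sample_llm) and the majority‑vote scoring are illustrative assumptions, not the paper's actual metric.

```python
from collections import Counter
from typing import Callable, List


def prompt_fitness(prompt: str,
                   sample_llm: Callable[[str], str],
                   n_samples: int = 5) -> float:
    """Score a prompt by how consistently repeated LLM queries propose
    the same search direction (1.0 = all replies agree). Illustrative only."""
    replies: List[str] = [sample_llm(prompt).strip().lower() for _ in range(n_samples)]
    # Fraction of replies that match the most common reply.
    majority_count = Counter(replies).most_common(1)[0][1]
    return majority_count / n_samples


if __name__ == "__main__":
    # Stub LLM with canned replies, purely for demonstration.
    canned = iter(["increase inertia", "increase inertia", "decrease inertia",
                   "increase inertia", "increase inertia"])
    score = prompt_fitness("How should the PSO inertia weight change?",
                           lambda _p: next(canned))
    print(score)  # 0.8
```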

The whole pipeline runs with a single LLM call per candidate, making it computationally tractable even on modest hardware.
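
The sketch below wires the five steps into a single loop, assuming placeholder callables (mutate_with_llm, algorithm_fitness, prompt_fitness) that stand in for the paper's LLM calls and benchmark harness; it illustrates the control flow rather than the authors' released code.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (algorithm specification, prompt template)


def coevolve(initial: List[Pair],
             mutate_with_llm: Callable[[Pair], Pair],
             algorithm_fitness: Callable[[str], float],
             prompt_fitness: Callable[[str], float],
             generations: int = 10,
             keep_top: int = 4) -> Pair:
    """Evolve (algorithm, prompt) pairs jointly and return the best pair."""

    def score(pair: Pair) -> float:
        # Combined fitness: benchmark quality of the algorithm plus
        # how well the prompt guides the LLM.
        return algorithm_fitness(pair[0]) + prompt_fitness(pair[1])

    population = list(initial)
    for _ in range(generations):
        # LLM-driven generation: one proposed mutation per surviving pair.
        candidates = population + [mutate_with_llm(pair) for pair in population]
        # Selection: keep the top-performing pairs.
        survivors = sorted(candidates, key=score, reverse=True)[:keep_top]
        # Recombination: mix-and-match prompts with algorithms.
        recombined = [(random.choice(survivors)[0], random.choice(survivors)[1])
                      for _ in range(keep_top)]
        population = survivors + recombined
    # Final selection: the best-scoring pair is the co-evolved output.
    return max(population, key=score)
```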

Results & Findings

| Benchmark | Baseline (EoH) | Co‑evolved (GPT‑4o‑mini) | Speed‑up vs. Baseline |
| --- | --- | --- | --- |
| SAT‑100 | 85 % solved | 92 % solved | 1.3× |
| TSP‑50 | 1.45× optimal | 1.21× optimal | 1.2× |
| Knapsack‑200 | 0.78 ratio | 0.84 ratio | 1.1× |
  • Across all tested NP‑hard problems, the co‑evolved approach outperformed the strongest existing automated design methods by 5–10 % in solution quality.
  • When swapping the underlying LLM (Qwen‑3‑32B vs. GPT‑5), the evolutionary trajectories diverged: prompts generated by the smaller model tended to be more explicit, while the larger model produced more abstract, higher‑level prompts. Yet both converged to high‑performing algorithm‑prompt pairs, confirming the framework’s model‑agnostic nature.
  • Ablation: Removing prompt evolution caused a 6–8 % drop in performance, underscoring that prompt refinement is not a cosmetic add‑on but a core driver of success.

Practical Implications

  • Cost‑effective AI‑augmented optimization – Companies can deploy cheaper LLMs (e.g., open‑source 30B models) and still reap near‑state‑of‑the‑art optimization performance, lowering cloud‑compute bills.
  • Plug‑and‑play optimizer – The framework outputs a ready‑to‑run swarm algorithm together with a concise prompt, meaning developers can embed it directly into pipelines (e.g., scheduling, routing, resource allocation) without hand‑crafting heuristics.
  • Rapid prototyping for new domains – By feeding domain‑specific constraints into the prompt template, the co‑evolution loop can automatically discover tailored swarm behaviours, accelerating R&D for logistics, finance, or bioinformatics.
  • Explainability boost – The prompt‑template evaluation metric provides a human‑readable trace of why a particular algorithm variant works, aiding debugging and compliance audits.

Limitations & Future Work

  • Scalability to ultra‑large problem instances – Experiments capped at moderate‑size NP benchmarks; performance on industrial‑scale datasets remains to be validated.
  • Prompt search space design – The current template grammar is handcrafted; richer, possibly hierarchical prompt representations could unlock further gains.
  • Cross‑modal extensions – The authors note the potential to co‑evolve visual or code‑based “prompts” (e.g., program sketches) alongside algorithms, a direction left for future exploration.

Overall, the paper charts a promising path toward LLM‑driven, self‑optimizing heuristics that blend algorithmic rigor with natural‑language flexibility—an approach that could reshape how developers build intelligent optimization services.

Authors

  • Shipeng Cen
  • Ying Tan

Paper Information

  • arXiv ID: 2512.09209v1
  • Categories: cs.NE
  • Published: December 10, 2025