[Paper] Digital Red Queen: Adversarial Program Evolution in Core War with LLMs
Source: arXiv - 2601.03335v1
Overview
The paper introduces Digital Red Queen (DRQ), a lightweight self‑play framework that lets a large language model (LLM) continuously evolve assembly‑like programs—called warriors—to out‑compete every previously generated opponent in the classic Core War sandbox. By turning the optimization problem into an open‑ended “Red Queen” arms race, the authors show that LLM‑generated code can become increasingly general and converge toward robust strategies, offering a new lens on adversarial AI and potential lessons for security‑focused applications.
Key Contributions
- Red‑Queen self‑play loop: A simple algorithm where each new LLM‑generated warrior must defeat all earlier warriors, enforcing continual adaptation.
- LLM‑driven program synthesis: Uses a state‑of‑the‑art language model to write low‑level Core War assembly code from high‑level prompts.
- Empirical evidence of convergence: Across many generations, warriors become more general (perform better against unseen human‑crafted opponents) and less behaviorally diverse, mirroring convergent evolution.
- Core War as a testbed: Demonstrates that the Turing‑complete Core War VM is a tractable, controllable sandbox for studying adversarial co‑evolution and for benchmarking LLM‑based evolution methods.
- Broader vision: Shows how minimal self‑play setups could be transplanted to real‑world adversarial domains such as cybersecurity red‑team/blue‑team exercises or drug‑resistance modeling.
Methodology
- Environment: Core War—a virtual machine where two programs (warriors) battle for control of shared memory. Warriors are written in Redcode, an assembly‑like language; the environment is deterministic and fully observable (a heavily simplified toy model is sketched after this list).
- Initial population: A set of baseline warriors (including human‑written ones) seeds the competition.
- Self‑play loop (DRQ), sketched in code after this list:
  - At round t, the LLM receives a prompt describing the goal: “Write a Core War warrior that defeats every warrior generated in rounds 0 … t‑1.”
  - The model generates candidate code, which is compiled and tested against the full archive of previous warriors.
  - The first candidate that wins all matches becomes the new champion and is added to the archive.
- Evaluation: After many rounds, the authors test the evolved warriors against a held‑out suite of human‑crafted opponents and measure behavioral diversity using execution trace clustering.
- Analysis: Track win‑rates, generality (performance on unseen opponents), and diversity trends over independent runs.
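To make the environment concrete, here is a deliberately tiny, hypothetical toy model of a Core‑War‑style core in Python. It covers only three simplified instructions (MOV, JMP, DAT) with PC‑relative addressing; the `Insn`, `step`, and `CORE_SIZE` names are made up for this sketch, and the semantics are an illustrative assumption rather than the Redcode/MARS rules the paper actually uses.

```python
# Hypothetical toy model of a Core-War-style core (NOT real Redcode/MARS
# semantics): only MOV, JMP, and DAT, all with PC-relative direct addressing.
import copy
from dataclasses import dataclass

@dataclass
class Insn:
    op: str      # "MOV", "JMP", or "DAT" in this toy subset
    a: int = 0   # A-field, read as a PC-relative address
    b: int = 0   # B-field, read as a PC-relative address

CORE_SIZE = 8000  # the core is a circular shared memory

def step(core, pc):
    """Execute one instruction; return the next pc, or None if the process dies."""
    insn = core[pc % CORE_SIZE]
    src = (pc + insn.a) % CORE_SIZE
    dst = (pc + insn.b) % CORE_SIZE
    if insn.op == "DAT":              # executing data kills the process
        return None
    if insn.op == "MOV":              # copy the whole instruction at src into dst
        core[dst] = copy.copy(core[src])
    elif insn.op == "JMP":            # transfer control to src
        return src
    return (pc + 1) % CORE_SIZE

# The classic "Imp" warrior is the single instruction MOV 0, 1: it copies
# itself one cell ahead and then executes the copy, crawling through the core
# and overwriting whatever it touches.
core = [Insn("DAT") for _ in range(CORE_SIZE)]
core[0] = Insn("MOV", 0, 1)
pc = 0
for _ in range(5):
    pc = step(core, pc)               # pc advances 0 -> 1 -> 2 -> ...
```

A real match alternates execution between the two warriors' processes until one side has no live processes left; that scheduling is omitted here.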
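The self‑play loop itself is only a few lines. Below is a minimal sketch under stated assumptions: `generate_warrior` stands in for an LLM call and `run_match` for a Core War simulator wrapper, both hypothetical placeholders, and the prompt wording and retry budget are illustrative rather than the paper's exact settings.

```python
# Minimal sketch of the DRQ loop described above. generate_warrior and
# run_match are hypothetical placeholders; the prompt text and the retry
# budget are illustrative assumptions, not the paper's exact implementation.

def generate_warrior(prompt: str) -> str:
    """Placeholder for an LLM call that returns Redcode source text."""
    raise NotImplementedError("wire up an LLM client here")

def run_match(candidate: str, opponent: str) -> bool:
    """Placeholder: assemble both warriors, run them in a Core War simulator,
    and return True if `candidate` wins the match."""
    raise NotImplementedError("wire up a Core War simulator here")

def drq(initial_warriors, rounds, max_attempts=50):
    """Red Queen loop: each new champion must beat every archived warrior."""
    archive = list(initial_warriors)           # seed with baseline warriors
    for t in range(rounds):
        prompt = (
            "Write a Core War warrior (Redcode) that defeats every warrior "
            "generated in earlier rounds:\n\n" + "\n---\n".join(archive)
        )
        for _ in range(max_attempts):          # sample until a candidate sweeps the archive
            candidate = generate_warrior(prompt)
            if all(run_match(candidate, opponent) for opponent in archive):
                archive.append(candidate)      # new champion joins the archive
                break
    return archive
```

The essential property is that the archive only ever grows, so each new champion faces a strictly harder requirement than the last.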
Results & Findings
- Increasing generality: After ~200 generations, the DRQ warriors achieve higher win‑rates against a diverse set of human‑written warriors than warriors from any earlier generation in the run.
- Convergence of behavior: Independent DRQ runs produce warriors with remarkably similar execution patterns, indicating a strong attractor strategy in the fitness landscape (one way to quantify such behavioral similarity is sketched after this list).
- Efficiency: The entire evolution process runs on commodity hardware (single GPU) and completes within a few hours, showing that sophisticated adversarial dynamics need not require massive compute.
- Comparison to static optimization: A baseline where the LLM is asked to optimize against a fixed opponent plateaus quickly, whereas the Red‑Queen loop continues to push performance upward.
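One simple way to put a number on that convergence is to compare execution traces directly. The sketch below assumes each trace is recorded as the sequence of opcodes a warrior executed and uses opcode n‑gram profiles with cosine distance; the paper clusters execution traces, but these particular features and this distance are assumptions, not its stated method.

```python
# Rough sketch of one way to quantify behavioral similarity from execution
# traces (opcode n-gram profiles + cosine distance). These features and this
# distance are assumptions; the paper's own trace-clustering setup may differ.
from collections import Counter
from itertools import combinations
from math import sqrt

def ngram_profile(trace, n=3):
    """Count opcode n-grams in an execution trace (a list of opcode strings)."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

def cosine_distance(p, q):
    """1 - cosine similarity between two n-gram count vectors."""
    dot = sum(p[k] * q[k] for k in set(p) | set(q))
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return 1.0 - (dot / norm if norm else 0.0)

def mean_pairwise_distance(traces):
    """Lower values mean more behaviorally similar (converged) warriors."""
    profiles = [ngram_profile(t) for t in traces]
    pairs = list(combinations(range(len(profiles)), 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)
```

A falling mean pairwise distance across independent runs would indicate the kind of behavioral convergence reported above.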
Practical Implications
- Adversarial code generation for security testing: DRQ‑style self‑play could automate the creation of novel exploits or defensive payloads that continuously adapt to each other, providing richer red‑team/blue‑team training scenarios.
- Robust AI agents: The convergence toward general strategies suggests a pathway for training LLM‑based agents that remain effective even as opponents evolve, useful in competitive gaming, automated negotiation, or autonomous defense systems.
- Benchmark for LLM program synthesis: Core War offers a low‑overhead, reproducible benchmark for measuring how well LLMs can generate correct, performant low‑level code under adversarial pressure.
- Rapid prototyping of co‑evolutionary algorithms: The minimal DRQ loop can be transplanted to other sandboxed domains (e.g., network packet filters, smart contract fuzzing) to explore arms‑race dynamics without building large simulation infrastructures.
Limitations & Future Work
- Domain specificity: Core War, while expressive, is a toy environment; results may not directly transfer to high‑stakes real‑world systems without additional constraints.
- LLM dependence: The quality of evolved warriors hinges on the underlying model’s code‑generation capabilities; smaller or less‑trained models may stall early.
- Diversity loss: Convergent behavior, while indicating a strong strategy, also reduces exploration of alternative tactics that could be valuable in heterogeneous threat landscapes.
- Future directions: Extending DRQ to multi‑objective settings (e.g., stealth + speed), integrating reinforcement‑learning critics for finer‑grained feedback, and applying the framework to realistic cybersecurity sandboxes or drug‑resistance simulations.
Authors
- Akarsh Kumar
- Ryan Bahlous-Boldi
- Prafull Sharma
- Phillip Isola
- Sebastian Risi
- Yujin Tang
- David Ha
Paper Information
- arXiv ID: 2601.03335v1
- Categories: cs.AI, cs.NE
- Published: January 6, 2026