[Paper] Vulcan: Instance-Optimal Systems Heuristics Through LLM-Driven Search

Published: December 31, 2025 at 01:58 PM EST
4 min read
Source: arXiv - 2512.25065v1

Overview

The paper introduces Vulcan, a framework that leverages code‑generating large language models (LLMs) to automatically synthesize instance‑optimal system heuristics—tiny, workload‑specific policies for tasks like cache eviction or memory tiering. By turning the heuristic‑design process into a searchable program‑generation problem, the authors show that LLM‑crafted policies can beat the best hand‑tuned algorithms on real hardware.

Key Contributions

  • LLM‑driven policy synthesis: Demonstrates that even modest LLMs can generate correct, executable system code when guided by a well‑defined, task‑agnostic interface.
  • Separation of policy and mechanism: Introduces a lightweight API that lets the LLM focus on what to decide (policy) while the underlying system handles how to enforce it (mechanism).
  • Evolutionary search over generated code: Uses a fast, population‑based search to explore the space of LLM‑produced heuristics and converge on the best performer for a given workload/hardware pair.
  • Empirical breakthroughs: Synthesized cache‑eviction policies outperform the state‑of‑the‑art LRU‑based and learning‑based algorithms by up to 69 %, and memory‑tiering policies improve performance by 7.9 %.
  • Generalizable workflow: Shows the same pipeline works for multiple resource‑management problems without hand‑crafting problem‑specific prompts or model fine‑tuning.

Methodology

  1. Define a minimal interface – The system exposes a small set of functions (e.g., on_access(key), evict_candidate()) and a scalar objective (e.g., hit‑rate, latency). This keeps the LLM’s code generation task simple and verifiable.
  2. Prompt the LLM – A concise description of the interface plus the optimization goal is fed to a code‑generating LLM (e.g., Codex or a 7B‑parameter model). The model returns a candidate policy implementation in a target language (C/C++/Rust).
  3. Compile & sandbox – The generated code is compiled and run inside a safe sandbox that simulates the target system (cache simulator, memory‑tiering emulator).
  4. Evolutionary search – An evolutionary algorithm treats each compiled policy as an individual. It mutates the prompt (e.g., tweaking wording, adding constraints) and recombines successful snippets, iteratively improving performance.
  5. Select the best instance‑optimal policy – After a fixed budget of generations, the highest‑scoring policy is deployed as the final heuristic for the specific workload/hardware configuration.
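The policy/mechanism split in step 1 can be sketched in a few lines. This is a minimal Python illustration, not the paper's implementation (which targets C/C++/Rust): `on_access` and `evict_candidate` are the example interface functions named above, while the class layout and the LRU candidate are assumptions made here for concreteness.

```python
class EvictionPolicy:
    """Policy side: decides *what* to evict; the cache mechanism
    (simulator or kernel module) decides *how* and calls into it."""

    def on_access(self, key):
        """Invoked by the mechanism on every access to `key`."""
        raise NotImplementedError

    def evict_candidate(self, resident_keys):
        """Return the key the mechanism should evict next."""
        raise NotImplementedError


class LRUPolicy(EvictionPolicy):
    """One candidate an LLM might emit: least-recently-used."""

    def __init__(self):
        self.clock = 0
        self.last_used = {}

    def on_access(self, key):
        self.clock += 1
        self.last_used[key] = self.clock

    def evict_candidate(self, resident_keys):
        # Evict the resident key with the oldest access timestamp;
        # never-accessed keys default to 0 and are evicted first.
        return min(resident_keys, key=lambda k: self.last_used.get(k, 0))
```

Because every candidate must only fill in these two methods against a scalar objective, the mechanism stays fixed and each generated policy is trivially swappable.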

The key insight is that by constraining the problem space through the interface, even a relatively small LLM can reliably produce syntactically correct, runnable code, allowing the search to focus on performance rather than debugging.
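Steps 3 through 5 can be sketched as a toy evaluate-and-evolve loop. This is a simplified stand-in for the paper's pipeline: the real system compiles LLM-emitted C/C++/Rust inside a sandbox and mutates prompts via the LLM itself, which the `mutate` callback here merely placeholds. `simulate_hit_rate`, `SEED`, and the `Policy` class name are hypothetical.

```python
import random

def simulate_hit_rate(policy_source, trace, cache_size=3):
    """Instantiate a candidate policy (a Python source string here,
    standing in for compiled LLM output) and replay an access trace."""
    namespace = {}
    exec(policy_source, namespace)  # the real system sandboxes this step
    policy = namespace["Policy"]()
    cache, hits = set(), 0
    for key in trace:
        if key in cache:
            hits += 1
        elif len(cache) >= cache_size:
            cache.discard(policy.evict_candidate(cache))
        if key not in cache:
            cache.add(key)
        policy.on_access(key)
    return hits / len(trace)

def evolutionary_search(seed_sources, mutate, trace, generations=200, pop=8):
    """Population-based search: score candidates, keep the top half,
    refill the population by mutating survivors."""
    population = list(seed_sources)
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda s: simulate_hit_rate(s, trace),
                        reverse=True)
        survivors = scored[: max(1, pop // 2)]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop - len(survivors))]
    return max(population, key=lambda s: simulate_hit_rate(s, trace))

# A seed candidate, as the LLM might emit it (LRU again, as source text):
SEED = '''
class Policy:
    def __init__(self):
        self.clock = 0
        self.last = {}
    def on_access(self, key):
        self.clock += 1
        self.last[key] = self.clock
    def evict_candidate(self, keys):
        return min(keys, key=lambda k: self.last.get(k, 0))
'''
```

The fitness function (hit-rate) is exactly the scalar objective exposed by the interface, so the loop never needs to understand the policy's internals, only its score on the target trace.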

Results & Findings

| Task | Baseline (human‑designed) | Vulcan‑synthesized | Improvement |
| --- | --- | --- | --- |
| Cache eviction (hit‑rate) | LRU, HyperCache, TinyLFU | Custom LLM policy | +69 % hit‑rate |
| Memory tiering (throughput) | Tiered‑LRU, RL‑based tiering | LLM‑generated tiering rule | +7.9 % throughput |
  • Robustness: The generated policies remained stable across variations in workload mix (e.g., Zipfian vs. uniform accesses).
  • Speed of synthesis: Using a 7B‑parameter model, a full search (≈ 200 generations) completed within a few hours on a single GPU, far cheaper than a full manual redesign cycle.
  • Portability: The same interface allowed swapping the backend (e.g., from a cache simulator to a real kernel module) with minimal changes.

Practical Implications

  • Rapid heuristic prototyping – System engineers can now ask an LLM to “write a cache‑eviction policy that maximizes hit‑rate for workload X” and let Vulcan iterate automatically, cutting weeks of trial‑and‑error.
  • Tailored performance for edge devices – Small IoT or edge servers with unique memory hierarchies can receive custom‑fit policies without a dedicated research team.
  • Continuous adaptation – When workloads drift (e.g., after a software update), the same pipeline can re‑run overnight to produce a new optimal heuristic, enabling self‑optimizing systems.
  • Lower barrier to entry – Developers without deep OS theory can still obtain high‑quality policies, democratizing performance engineering.
  • Potential integration points – Kernel subsystems (e.g., page replacement), storage engines, CDN caches, and cloud orchestration layers could expose the Vulcan interface as a plug‑in, letting operators auto‑tune policies per tenant.

Limitations & Future Work

  • Search budget vs. optimality – The evolutionary search is not guaranteed to find the global optimum; results depend on the number of generations and prompt diversity.
  • Model reliability – While the constrained interface reduces syntax errors, occasional semantic bugs still require runtime validation, adding overhead.
  • Hardware specificity – Policies are instance‑optimal; a policy tuned for one CPU/cache configuration may degrade on another, necessitating re‑synthesis for each target.
  • Scalability to complex policies – Tasks that need multi‑dimensional state (e.g., multi‑queue schedulers) may exceed the expressive power of small LLMs, prompting the need for larger models or hierarchical interfaces.
  • Future directions – The authors suggest exploring (a) closed‑loop online synthesis where the system continuously refines policies in production, (b) richer type‑safe interfaces to further reduce bugs, and (c) integration with reinforcement‑learning signals to guide the search more efficiently.

Authors

  • Rohit Dwivedula
  • Divyanshu Saxena
  • Sujay Yadalam
  • Daehyeok Kim
  • Aditya Akella

Paper Information

  • arXiv ID: 2512.25065v1
  • Categories: cs.OS, cs.AI, cs.DC
  • Published: December 31, 2025