[Paper] Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours

Published: (May 6, 2026 at 01:40 PM EDT)
Source: arXiv - 2605.05170v1

Overview

The Verkor team’s new paper presents Design Conductor 2.0, an upgraded multi‑agent system that autonomously designs a complete LLM inference accelerator, from concept to FPGA‑ready RTL, in 80 hours. It builds on the team’s earlier “Conductor” prototype (which assembled a simple RISC‑V CPU in 12 h) and uses the latest frontier LLMs (April 2026 releases) to tackle tasks that are 80× larger and far more complex. The result is VerTQ, an accelerator that hard‑wires support for the TurboQuant quantization scheme.

Key Contributions

  • Scalable autonomous hardware design pipeline: Extends the original Conductor framework to handle multi‑stage, high‑complexity projects without human intervention.
  • TurboQuant‑specific accelerator (VerTQ): 240‑cycle pipeline, 5,129 FP16/FP32 compute units, and 8 attention pipes; it runs at 125 MHz on FPGA and occupies 5.7 mm² in TSMC 16FF.
  • Token‑efficiency analysis: Detailed accounting of LLM token usage across design phases, demonstrating a 30 % reduction compared to the prior system.
  • Robust multi‑agent orchestration: Introduces a hierarchy of specialist agents (specification, micro‑architecture, RTL generation, verification, and place‑and‑route) that communicate via a shared knowledge graph.
  • Open‑source harness and benchmark suite: The authors release the Conductor 2.0 harness, a set of four benchmark designs (including VerTQ), and scripts for reproducing the results.

Methodology

  1. Task Decomposition – The top‑level “project manager” LLM parses the high‑level goal (e.g., “build a TurboQuant inference accelerator”) and splits it into sub‑tasks: architecture definition, datapath design, verification plan, and physical implementation.
  2. Specialist Agents – Each sub‑task is handed to a dedicated LLM agent fine‑tuned on relevant corpora (computer‑architecture textbooks, RTL coding standards, FPGA placement heuristics).
  3. Iterative Prompt‑Feedback Loop – Agents generate artefacts (e.g., Verilog modules, testbenches) and immediately run them through automated tools (Yosys, Verilator, Vivado). Errors are fed back as new prompts, enabling self‑correcting cycles.
  4. Knowledge Graph Sync – All design decisions, constraints, and performance metrics are stored in a central graph, ensuring consistency across agents and preventing contradictory changes.
  5. Final Synthesis – The RTL is fed to a commercial FPGA toolchain, producing a place‑and‑route layout that meets timing (125 MHz) and area (5.7 mm²) targets.
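Step 4 can be pictured with a small sketch. This is only an illustration of the idea of a shared decision store that rejects contradictory changes, not the authors' implementation: the `DesignGraph` class, its method names, and the agent names are hypothetical, and the paper's actual graph schema is not given in this summary.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when an agent contradicts a decision owned by another agent."""


@dataclass
class DesignGraph:
    # Maps a decision key (e.g. "clock_mhz") to (value, owning agent).
    decisions: dict = field(default_factory=dict)
    # Directed edges: key -> set of keys that depend on it.
    dependents: dict = field(default_factory=dict)

    def record(self, agent, key, value, depends_on=()):
        """Store a decision; reject a different value from a different agent."""
        if key in self.decisions:
            old_value, owner = self.decisions[key]
            if old_value != value and owner != agent:
                raise ConflictError(
                    f"{agent} set {key}={value!r}, but {owner} already set {old_value!r}"
                )
        self.decisions[key] = (value, agent)
        for parent in depends_on:
            self.dependents.setdefault(parent, set()).add(key)

    def affected_by(self, key):
        """Return every decision that transitively depends on `key`."""
        seen, stack = set(), [key]
        while stack:
            for child in self.dependents.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


# Example values taken from the summary; the agent names are made up.
graph = DesignGraph()
graph.record("microarch", "clock_mhz", 125)
graph.record("microarch", "pipeline_depth", 240, depends_on=["clock_mhz"])
graph.record("rtl", "attention_pipes", 8, depends_on=["pipeline_depth"])
```

With this structure, a place‑and‑route agent that tried to record a different `clock_mhz` would get a `ConflictError`, and `graph.affected_by("clock_mhz")` lists the downstream decisions that would need revisiting if the clock target changed.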

The whole flow is orchestrated by a lightweight Python controller that monitors token consumption, triggers tool runs, and logs progress.
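The feedback loop from step 3 and the controller's token accounting can be sketched together. This is a minimal sketch under stated assumptions, not the released harness: `run_task`, `ToolResult`, and `BudgetExceeded` are hypothetical names, and the agent and tool are replaced by stubs so the example runs without an LLM or any EDA software.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("controller")


@dataclass
class ToolResult:
    ok: bool
    errors: str = ""


class BudgetExceeded(RuntimeError):
    """Raised when a task spends more tokens than it was allotted."""


def run_task(name, generate, check, token_budget, max_iters=5):
    """Generate an artefact, check it with a tool, and retry on errors.

    generate(prompt) -> (artefact, tokens_used)   # hypothetical LLM call
    check(artefact)  -> ToolResult                # hypothetical tool wrapper
    Returns (artefact, tokens_spent, iterations).
    """
    prompt, spent = f"Task: {name}", 0
    for iteration in range(1, max_iters + 1):
        artefact, used = generate(prompt)
        spent += used
        if spent > token_budget:
            raise BudgetExceeded(f"{name}: {spent} tokens > budget {token_budget}")
        result = check(artefact)
        log.info("%s iter=%d ok=%s tokens=%d", name, iteration, result.ok, spent)
        if result.ok:
            return artefact, spent, iteration
        # Feed the tool's error text back as the next prompt.
        prompt = (
            f"Task: {name}\nPrevious attempt:\n{artefact}\n"
            f"Tool errors:\n{result.errors}"
        )
    raise RuntimeError(f"{name}: no passing artefact after {max_iters} iterations")


# Stubs standing in for an LLM agent and a lint/simulation tool.
def fake_generate(prompt):
    fixed = "Tool errors:" in prompt
    return ("module m; endmodule" if fixed else "module m;"), 100


def fake_check(rtl):
    ok = "endmodule" in rtl
    return ToolResult(ok=ok, errors="" if ok else "syntax error: missing endmodule")


artefact, spent, iters = run_task("adder", fake_generate, fake_check, token_budget=1_000)
```

In a real flow, `check` would wrap an invocation of a tool such as Verilator or Yosys (for instance through `subprocess.run`) and return its diagnostics as `errors`. Keeping the budget check inside the loop means a task that cannot converge stops early instead of consuming tokens indefinitely.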

Results & Findings

| Metric | VerTQ (Conductor 2.0) | Prior Conductor (CPU) |
|---|---|---|
| Design time | 80 h (fully autonomous) | 12 h (CPU) |
| Compute density | 5,129 FP16/32 units (≈ 0.45 GOPS/mm²) | N/A |
| Pipeline depth | 240 cycles (fixed‑function TurboQuant) | 5‑stage generic CPU |
| FPGA frequency | 125 MHz | 200 MHz (CPU) |
| Silicon area (TSMC 16FF) | 5.7 mm² | 1.2 mm² (CPU) |
| Token usage | ≈ 1.2 B tokens (30 % less than baseline) | 2.0 B tokens |
| Verification coverage | 100 % functional simulation + 95 % post‑synthesis lint | 85 % functional simulation |
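A few derived quantities help put the VerTQ column in perspective. The short calculation below uses only the figures reported above. It assumes, as a simplification not stated in the summary, that one pipeline stage advances per clock cycle and that the whole die area is divided evenly across compute units, so the per‑unit area is an upper bound.

```python
clock_hz = 125e6        # FPGA frequency from the table
pipeline_depth = 240    # cycles
area_mm2 = 5.7          # silicon area from the table
compute_units = 5129    # FP16/32 units

# Clock period in nanoseconds.
cycle_ns = 1e9 / clock_hz

# Time for one item to traverse the full pipeline, in microseconds
# (assumes one stage per cycle).
fill_latency_us = pipeline_depth * cycle_ns / 1e3

# Upper bound on area per compute unit, in square micrometres
# (1 mm² = 1e6 µm²; ignores memories, control, and routing).
area_per_unit_um2 = area_mm2 * 1e6 / compute_units

print(f"cycle time: {cycle_ns:.1f} ns")
print(f"pipeline traversal: {fill_latency_us:.2f} µs")
print(f"area per unit (upper bound): {area_per_unit_um2:.0f} µm²")
```

Under these assumptions the clock period is 8 ns, a full pass through the 240‑cycle pipeline takes about 1.92 µs, and each compute unit accounts for at most roughly 1,111 µm² of the 5.7 mm² total.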

Key takeaways:

  • The system can handle orders‑of‑magnitude larger designs while keeping token usage in check.
  • VerTQ meets the performance envelope required for modern LLM inference (high FP16/32 throughput) and fits comfortably within a modest FPGA footprint.
  • The hierarchical agent model dramatically reduces the need for manual debugging; most RTL errors were resolved automatically within three feedback iterations.

Practical Implications

  • Accelerated hardware prototyping – Companies can spin up custom inference accelerators in days rather than months, dramatically shortening time‑to‑market for niche AI workloads (e.g., edge LLMs, domain‑specific quantization).
  • Cost‑effective FPGA deployment – With a 5.7 mm² footprint, VerTQ can be instantiated on mid‑range FPGAs, enabling rapid, low‑volume production without expensive ASIC masks.
  • Democratizing ASIC design – By exposing the Conductor 2.0 harness as open source, smaller startups can generate RTL for specialized AI blocks without hiring a full hardware team.
  • Standards‑body impact – The TurboQuant‑hardwired pipeline could inspire future RISC‑V extensions or vendor IP blocks that natively support aggressive quantization schemes.
  • Toolchain integration – The token‑efficiency metrics and knowledge‑graph approach provide a blueprint for integrating LLM agents into existing EDA flows, potentially augmenting traditional synthesis and verification tools.

Limitations & Future Work

  • Scalability ceiling – While 80 h suffices for a single accelerator, scaling to multi‑chip SoC designs (e.g., full AI processors) still exceeds current token budgets and tool runtimes.
  • Verification depth – The current pipeline stops at post‑synthesis lint and functional simulation; formal verification and corner‑case timing analysis remain manual.
  • Model dependency – Results hinge on the April 2026 frontier LLMs; regression to older models leads to a steep drop in design quality and token efficiency.
  • Hardware diversity – The study focuses on a TSMC 16FF FPGA target; porting to ASIC or other process nodes will require additional agent training data.

Future research directions include integrating formal property checking agents, expanding the knowledge graph to capture power‑budget constraints, and exploring multi‑agent collaboration across heterogeneous design teams (hardware, firmware, and software).

Authors

  • The Verkor Team
  • Ravi Krishna
  • Suresh Krishna
  • David Chin

Paper Information

  • arXiv ID: 2605.05170v1
  • Categories: cs.AR, cs.AI
  • Published: May 6, 2026
  • PDF: Download PDF