[Paper] AutoFSM: A Multi-agent Framework for FSM Code Generation with IR and SystemC-Based Testing

Published: December 12, 2025 at 04:15 AM EST

Source: arXiv - 2512.11398v1

Overview

The paper presents AutoFSM, a multi‑agent framework that couples large language models (LLMs) with a purpose‑built intermediate representation (IR) and SystemC‑based testing to generate reliable Verilog code for finite‑state‑machine (FSM) control logic. By structuring the generation pipeline and automating verification, the authors report higher simulation pass rates and fewer syntax errors than the MAGE baseline, an advance that could make AI‑assisted hardware design more practical for everyday engineers.

Key Contributions

  • Structured IR for FSMs – A clear, hierarchical intermediate representation that isolates syntactic details from high‑level state‑machine semantics, dramatically lowering syntax‑error rates.
  • Multi‑agent orchestration – Separate agents handle IR creation, Verilog translation, and testbench generation, enabling parallel development and easier debugging.
  • SystemC‑driven automatic testbench – Presented by the authors as the first integration of SystemC modeling with auto‑generated testbenches, providing fast, high‑coverage functional validation of the generated RTL.
  • SKT‑FSM benchmark – A new, publicly released suite of 67 hierarchical FSMs spanning three complexity tiers, filling a gap in hardware‑generation evaluation datasets.
  • Empirical gains – When paired with the same base LLM, AutoFSM improves pass rates by up to 11.94 % and cuts syntax errors by up to 17.62 % compared with the open‑source MAGE framework.
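
To make the idea of a structured IR concrete, here is a minimal Python sketch. The paper's actual schema is not reproduced in this summary, so every name below (`FsmIR`, `Transition`, `guard`, `validate`, and the individual fields) is an assumption made for illustration, not the authors' format. The point is only that a structured description can be checked for well‑formedness before any Verilog is written.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    src: str    # source state name
    dst: str    # destination state name
    guard: str  # boolean condition over inputs, kept as opaque text here


@dataclass
class FsmIR:
    """Hypothetical FSM description; field names are illustrative only."""
    name: str
    inputs: list[str]
    outputs: list[str]
    states: list[str]
    reset_state: str
    transitions: list[Transition] = field(default_factory=list)


def validate(ir: FsmIR) -> list[str]:
    """Return a list of well-formedness problems (empty list means none found)."""
    problems = []
    known = set(ir.states)
    if len(known) != len(ir.states):
        problems.append("duplicate state names")
    if ir.reset_state not in known:
        problems.append(f"reset state {ir.reset_state!r} is not declared")
    for t in ir.transitions:
        for endpoint in (t.src, t.dst):
            if endpoint not in known:
                problems.append(f"transition references unknown state {endpoint!r}")
    # Shallow check: every non-reset state should have at least one incoming edge.
    targets = {t.dst for t in ir.transitions}
    for s in ir.states:
        if s != ir.reset_state and s not in targets:
            problems.append(f"state {s!r} has no incoming transition")
    return problems


good = FsmIR(
    name="handshake",
    inputs=["req"],
    outputs=["ack"],
    states=["IDLE", "BUSY"],
    reset_state="IDLE",
    transitions=[
        Transition("IDLE", "BUSY", "req"),
        Transition("BUSY", "IDLE", "!req"),
    ],
)
bad = FsmIR(
    name="broken",
    inputs=["req"],
    outputs=["ack"],
    states=["IDLE", "BUSY"],
    reset_state="START",
    transitions=[Transition("IDLE", "DONE", "req")],
)

print(validate(good))
for problem in validate(bad):
    print(problem)
```

Here `validate(good)` returns an empty list, while `validate(bad)` reports the undeclared reset state, the unknown target state, and the two states with no incoming transition. A real schema would likely also cover hierarchy and output behavior, which this sketch omits.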

Methodology

  1. Prompt‑to‑IR Agent – The LLM receives a natural‑language description of the desired FSM and outputs the structured IR (states, transitions, inputs/outputs, hierarchy). The IR is expressed in a JSON‑like schema that enforces well‑formedness.
  2. IR‑to‑Verilog Agent – A second LLM (or a rule‑based transformer) consumes the IR and emits Verilog RTL. Because the IR already guarantees syntactic consistency, the Verilog generator focuses on idiomatic coding patterns rather than error‑prone syntax.
  3. SystemC Modeling Agent – Parallel to Verilog generation, an agent builds a SystemC behavioral model from the same IR. This model serves as an oracle for functional correctness.
  4. Automatic Testbench Synthesis – Using the SystemC model, a testbench generator creates stimulus vectors, monitors outputs, and reports mismatches back to the LLM loop for iterative refinement.
  5. Evaluation on SKT‑FSM – The authors run the full pipeline on each benchmark case, measuring syntax error frequency, simulation pass rate, and overall generation time.
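
Step 2 mentions that the IR‑to‑Verilog stage may be a rule‑based transformer. The sketch below shows what such a transformer could look like in Python for a simple Moore machine. The dictionary layout (including the `moore_outputs` key) and the `emit_verilog` function are hypothetical, and the emitted code follows a conventional two‑block style (state register plus combinational next‑state logic) rather than anything confirmed by the paper.

```python
def emit_verilog(ir: dict) -> str:
    """Render a hypothetical IR dict as a two-block Verilog Moore FSM."""
    states = ir["states"]
    # Number of bits needed to encode the state index (at least 1).
    width = max(1, (len(states) - 1).bit_length())

    ports = ["input wire clk", "input wire rst"]
    ports += [f"input wire {p}" for p in ir["inputs"]]
    ports += [f"output reg {p}" for p in ir["outputs"]]

    lines = [f"module {ir['name']} ("]
    lines.append(",\n".join(f"    {p}" for p in ports))
    lines.append(");")
    for i, s in enumerate(states):
        lines.append(f"    localparam {s} = {width}'d{i};")
    lines.append(f"    reg [{width - 1}:0] state, next_state;")

    # Sequential block: state register with synchronous reset.
    lines += [
        "    always @(posedge clk) begin",
        f"        if (rst) state <= {ir['reset_state']};",
        "        else state <= next_state;",
        "    end",
    ]

    # Combinational block: defaults first, then per-state outputs and transitions.
    lines += ["    always @(*) begin", "        next_state = state;"]
    lines += [f"        {o} = 1'b0;" for o in ir["outputs"]]
    lines.append("        case (state)")
    for s in states:
        lines.append(f"            {s}: begin")
        for o in ir["moore_outputs"].get(s, []):
            lines.append(f"                {o} = 1'b1;")
        first = True
        for t in ir["transitions"]:
            if t["src"] == s:
                keyword = "if" if first else "else if"
                lines.append(
                    f"                {keyword} ({t['guard']}) next_state = {t['dst']};"
                )
                first = False
        lines.append("            end")
    lines += [
        f"            default: next_state = {ir['reset_state']};",
        "        endcase",
        "    end",
        "endmodule",
    ]
    return "\n".join(lines)


handshake = {
    "name": "handshake",
    "inputs": ["req"],
    "outputs": ["ack"],
    "states": ["IDLE", "BUSY"],
    "reset_state": "IDLE",
    "moore_outputs": {"BUSY": ["ack"]},
    "transitions": [
        {"src": "IDLE", "dst": "BUSY", "guard": "req"},
        {"src": "BUSY", "dst": "IDLE", "guard": "!req"},
    ],
}
verilog = emit_verilog(handshake)
print(verilog)
```

Because the template fixes the module skeleton, the only free‑form text that reaches the output is state names and guard expressions, which is one plausible reason a structured IR would reduce syntax errors. Guards are passed through unchecked here; a fuller tool would parse them.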

Results & Findings

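Metric                       | AutoFSM (LLM + IR) | MAGE (baseline) | Improvement
-----------------------------|--------------------|-----------------|------------------------------------------------
Syntax error rate            | 5.3 %              | 22.9 %          | -17.6 percentage points
Pass rate (simulation)       | 78.1 %             | 66.2 %          | +11.9 percentage points
Avg. generation time per FSM | 12 s               | 15 s            | 3 s faster, attributed to early error filtering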
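
The Improvement column reports absolute differences, which read differently from relative changes. The short Python sketch below recomputes both from the figures in the table; the `compare` helper is introduced here only for illustration.

```python
def compare(baseline: float, new: float) -> tuple[float, float]:
    """Return (absolute change, relative change in percent of the baseline)."""
    absolute = new - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


syntax = compare(22.9, 5.3)      # syntax error rate, MAGE -> AutoFSM
pass_rate = compare(66.2, 78.1)  # simulation pass rate, MAGE -> AutoFSM
gen_time = compare(15.0, 12.0)   # average seconds per FSM, MAGE -> AutoFSM

print(syntax)     # (-17.6, -76.9)
print(pass_rate)  # (11.9, 18.0)
print(gen_time)   # (-3.0, -20.0)
```

In relative terms, the syntax error rate falls by about 77 % of its baseline value, the pass rate rises by about 18 %, and the average generation time drops by 20 %.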

Key takeaways

  • The IR acts as a strong guardrail, cutting the syntax error rate from 22.9 % to 5.3 %, a reduction of more than fourfold.
  • SystemC‑backed testbenches catch functional bugs early, raising the overall pass rate.
  • The multi‑agent design scales well across FSMs of varying depth, showing consistent gains even on the most complex benchmark items.
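
The role of a reference model as an oracle can be illustrated with a small differential test. In the sketch below, plain Python functions stand in for both the SystemC model and the generated RTL, which is a simplification: the paper's flow uses SystemC and Verilog simulation, and none of these function names come from it. The "DUT" carries a deliberately injected bug so that the comparison has something to find.

```python
import random


def reference_step(state: str, req: int) -> tuple[str, int]:
    """Golden model of a two-state handshake: returns (next_state, ack)."""
    if state == "IDLE":
        return ("BUSY", 0) if req else ("IDLE", 0)
    return ("BUSY", 1) if req else ("IDLE", 1)


def buggy_step(state: str, req: int) -> tuple[str, int]:
    """Stand-in for generated RTL with an injected bug: BUSY never returns to IDLE."""
    if state == "IDLE":
        return ("BUSY", 0) if req else ("IDLE", 0)
    return ("BUSY", 1)


def run_differential(dut_step, cycles: int = 200, seed: int = 0) -> list[dict]:
    """Drive both models with identical random stimulus and collect output mismatches."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    ref_state = dut_state = "IDLE"
    mismatches = []
    for cycle in range(cycles):
        req = rng.randint(0, 1)
        ref_next, ref_ack = reference_step(ref_state, req)
        dut_next, dut_ack = dut_step(dut_state, req)
        if ref_ack != dut_ack:
            mismatches.append(
                {"cycle": cycle, "req": req, "expected": ref_ack, "got": dut_ack}
            )
        ref_state, dut_state = ref_next, dut_next
    return mismatches


clean_report = run_differential(reference_step)
bug_report = run_differential(buggy_step)
print(len(clean_report), "mismatches against itself")
print(len(bug_report), "mismatches for the buggy model")
if bug_report:
    print("first mismatch:", bug_report[0])
```

In a pipeline like the one described, a mismatch record of this kind (cycle, stimulus, expected and observed output) is the sort of feedback that could be returned to the generation loop for refinement.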

Practical Implications

  • Faster Prototyping – Hardware teams can describe control logic in plain English (or a lightweight DSL) and obtain synthesizable Verilog in seconds, cutting down initial RTL drafting time.
  • Reduced Debug Overhead – Early syntax validation and automated functional testing mean fewer manual simulation cycles, freeing engineers to focus on architecture rather than typo‑hunting.
  • Toolchain Integration – Because the IR is language‑agnostic, it can be hooked into existing CI/CD pipelines, enabling continuous generation and regression testing of FSM blocks.
  • Educational Use – Students and junior designers can experiment with FSM design without mastering Verilog syntax, using the framework as a learning scaffold.
  • Open‑source Benchmark – SKT‑FSM provides a ready‑made test suite for anyone building new LLM‑based hardware generators, fostering community‑driven improvements.

Limitations & Future Work

  • LLM Dependency – The quality of the generated IR and Verilog still hinges on the underlying LLM; smaller or less capable models may not achieve the same error reductions.
  • Scope Limited to FSMs – AutoFSM targets finite‑state‑machine control logic; extending the approach to datapath components (e.g., arithmetic units) remains an open challenge.
  • Benchmark Diversity – While SKT‑FSM covers hierarchical FSMs, real‑world designs often involve mixed‑level timing constraints and vendor‑specific primitives that are not yet represented.
  • Future Directions – The authors plan to (1) incorporate reinforcement learning from testbench feedback to close the generation loop, (2) broaden the IR to capture timing and power annotations, and (3) evaluate the framework on commercial RTL libraries and larger system‑level designs.

Authors

  • Qiuming Luo
  • Yanming Lei
  • Kunzhong Wu
  • Yixuan Cao
  • Chengjian Liu

Paper Information

  • arXiv ID: 2512.11398v1
  • Categories: cs.SE, cs.MA
  • Published: December 12, 2025