[Paper] AutoFSM: A Multi-agent Framework for FSM Code Generation with IR and SystemC-Based Testing

Published: December 12, 2025 at 04:15 AM EST

Source: arXiv - 2512.11398v1

Overview

The paper presents AutoFSM, a multi‑agent framework that couples large language models (LLMs) with a purpose‑built intermediate representation (IR) and SystemC‑based testing to generate reliable Verilog code for finite‑state‑machine (FSM) control logic. By structuring the generation pipeline and automating verification, the authors report higher simulation pass rates and fewer syntax errors than the open‑source MAGE baseline, an advance that could make AI‑assisted hardware design more practical for everyday engineers.

Key Contributions

Structured IR for FSMs – A hierarchical intermediate representation that separates high‑level state‑machine semantics from Verilog syntactic details, which lowers the syntax‑error rate of the generated code.
Multi‑agent orchestration – Separate agents handle IR creation, Verilog translation, and testbench generation, enabling parallel development and easier debugging.
SystemC‑driven automatic testbench – Described by the authors as the first integration of SystemC modeling with auto‑generated testbenches, providing fast, high‑coverage functional validation of the generated RTL.
SKT‑FSM benchmark – A new, publicly released suite of 67 hierarchical FSMs spanning three complexity tiers, filling a gap in hardware‑generation evaluation datasets.
Empirical gains – When paired with the same base LLM, AutoFSM improves the pass rate by up to 11.94 percentage points and lowers the syntax error rate by up to 17.62 percentage points compared with the open‑source MAGE framework.
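
To make the idea of a structured IR more concrete, the following is a minimal Python sketch of an FSM description with a well‑formedness check. The field names (`states`, `initial`, `transitions`, `children`) and the `validate_ir` helper are illustrative assumptions, not the schema defined in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    src: str    # source state name
    dst: str    # destination state name
    guard: str  # name of the input that triggers the transition


@dataclass
class FsmIR:
    name: str
    inputs: list[str]
    outputs: list[str]
    states: list[str]
    initial: str
    transitions: list[Transition] = field(default_factory=list)
    # Hypothetical hierarchy field: maps a state name to a nested sub-FSM.
    children: dict[str, "FsmIR"] = field(default_factory=dict)


def validate_ir(ir: FsmIR) -> list[str]:
    """Return a list of well-formedness problems (empty means the IR is OK)."""
    problems = []
    if len(set(ir.states)) != len(ir.states):
        problems.append(f"{ir.name}: duplicate state names")
    if ir.initial not in ir.states:
        problems.append(f"{ir.name}: initial state '{ir.initial}' is not declared")
    for t in ir.transitions:
        for end in (t.src, t.dst):
            if end not in ir.states:
                problems.append(f"{ir.name}: transition uses undeclared state '{end}'")
        if t.guard not in ir.inputs:
            problems.append(f"{ir.name}: guard '{t.guard}' is not a declared input")
    for parent, child in ir.children.items():
        if parent not in ir.states:
            problems.append(f"{ir.name}: sub-FSM attached to undeclared state '{parent}'")
        problems.extend(validate_ir(child))  # recurse into the hierarchy
    return problems


good = FsmIR(
    name="door",
    inputs=["open_req", "close_req"],
    outputs=["is_open"],
    states=["CLOSED", "OPEN"],
    initial="CLOSED",
    transitions=[
        Transition("CLOSED", "OPEN", "open_req"),
        Transition("OPEN", "CLOSED", "close_req"),
    ],
)
bad = FsmIR(
    name="broken",
    inputs=["go"],
    outputs=[],
    states=["IDLE"],
    initial="IDLE",
    transitions=[Transition("IDLE", "RUN", "go")],  # "RUN" is never declared
)

print(validate_ir(good))  # no problems reported
print(validate_ir(bad))   # reports the undeclared state
```

Checking references like these before any Verilog is written is one plausible way an IR can reject malformed descriptions early; the paper's actual checks may differ.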

Methodology

Prompt‑to‑IR Agent – The LLM receives a natural‑language description of the desired FSM and outputs the structured IR (states, transitions, inputs/outputs, hierarchy). The IR is expressed in a JSON‑like schema that enforces well‑formedness.
IR‑to‑Verilog Agent – A second LLM (or a rule‑based transformer) consumes the IR and emits Verilog RTL. Because the IR already guarantees syntactic consistency, the Verilog generator focuses on idiomatic coding patterns rather than error‑prone syntax.
SystemC Modeling Agent – In parallel with Verilog generation, an agent builds a SystemC behavioral model from the same IR. This model serves as the reference (oracle) for functional correctness.
Automatic Testbench Synthesis – Using the SystemC model, a testbench generator creates stimulus vectors, monitors outputs, and reports mismatches back to the LLM loop for iterative refinement.
Evaluation on SKT‑FSM – The authors run the full pipeline on each benchmark case, measuring syntax error frequency, simulation pass rate, and overall generation time.
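
The IR‑to‑Verilog step mentions a rule‑based transformer as an alternative to a second LLM. A minimal sketch of such a translator is shown below, assuming a flat, dict‑based IR for a Moore machine. The IR keys (`moore_outputs`, `transitions`, and so on) and the `emit_verilog` function are hypothetical and chosen only for illustration; hierarchy is not handled.

```python
def emit_verilog(ir: dict) -> str:
    """Emit a two-process Verilog FSM from a flat, dict-based IR (illustrative)."""
    states = ir["states"]
    width = max(1, (len(states) - 1).bit_length())  # bits needed to encode states
    ports = ["input wire clk", "input wire rst"]
    ports += [f"input wire {name}" for name in ir["inputs"]]
    ports += [f"output reg {name}" for name in ir["outputs"]]

    lines = [f"module {ir['name']} ("]
    lines.append(",\n".join(f"    {p}" for p in ports))
    lines.append(");")
    for code, state in enumerate(states):
        lines.append(f"    localparam {state} = {width}'d{code};")
    lines.append(f"    reg [{width - 1}:0] state, next_state;")
    lines.append("")
    # Sequential process: state register with synchronous reset.
    lines.append("    always @(posedge clk) begin")
    lines.append(f"        if (rst) state <= {ir['initial']};")
    lines.append("        else state <= next_state;")
    lines.append("    end")
    lines.append("")
    # Combinational process: next-state and Moore output logic.
    lines.append("    always @(*) begin")
    lines.append("        next_state = state;  // default: hold the current state")
    for out in ir["outputs"]:
        lines.append(f"        {out} = 1'b0;")
    lines.append("        case (state)")
    for state in states:
        lines.append(f"            {state}: begin")
        for out in ir["moore_outputs"].get(state, []):
            lines.append(f"                {out} = 1'b1;")
        branches = [t for t in ir["transitions"] if t["src"] == state]
        for i, t in enumerate(branches):
            kw = "if" if i == 0 else "else if"
            lines.append(f"                {kw} ({t['guard']}) next_state = {t['dst']};")
        lines.append("            end")
    lines.append(f"            default: next_state = {ir['initial']};")
    lines.append("        endcase")
    lines.append("    end")
    lines.append("endmodule")
    return "\n".join(lines)


door = {
    "name": "door",
    "inputs": ["open_req", "close_req"],
    "outputs": ["is_open"],
    "states": ["CLOSED", "OPEN"],
    "initial": "CLOSED",
    "moore_outputs": {"OPEN": ["is_open"]},
    "transitions": [
        {"src": "CLOSED", "dst": "OPEN", "guard": "open_req"},
        {"src": "OPEN", "dst": "CLOSED", "guard": "close_req"},
    ],
}

verilog = emit_verilog(door)
print(verilog)
```

Because every identifier in the output comes from the IR and every construct comes from a fixed template, this kind of translator cannot produce unbalanced `begin`/`end` pairs or misspelled keywords, which is the intuition behind using the IR as a guardrail.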

Results & Findings

Metric	AutoFSM (LLM + IR)	MAGE (baseline)	Change
Syntax error rate	5.3 %	22.9 %	-17.6 percentage points
Pass rate (simulation)	78.1 %	66.2 %	+11.9 percentage points
Avg. generation time per FSM	12 s	15 s	-3 s (attributed to early error filtering)
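
The last column of the table reports absolute differences, which are easy to confuse with relative changes. The short sketch below recomputes both from the table values; the `compare` helper is introduced here purely for illustration.

```python
def compare(baseline: float, new: float) -> tuple[float, float]:
    """Return (absolute change, relative change in percent of the baseline)."""
    absolute = new - baseline
    relative = absolute / baseline * 100
    return round(absolute, 1), round(relative, 1)


# Values taken from the table above (MAGE baseline first, AutoFSM second).
syntax_abs, syntax_rel = compare(22.9, 5.3)   # syntax error rate, in %
pass_abs, pass_rel = compare(66.2, 78.1)      # simulation pass rate, in %
time_abs, time_rel = compare(15.0, 12.0)      # generation time, in seconds
fold = round(22.9 / 5.3, 2)                   # how many times fewer syntax errors

print(syntax_abs, syntax_rel)  # -17.6 -76.9
print(pass_abs, pass_rel)      # 11.9 18.0
print(time_abs, time_rel)      # -3.0 -20.0
print(fold)                    # 4.32
```

Read this way, the 17.6‑point drop in syntax error rate corresponds to roughly a 77 % relative reduction, and the 11.9‑point pass‑rate gain to roughly an 18 % relative improvement over the baseline.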

Key takeaways

The IR acts as a strong guardrail, cutting the syntax error rate from 22.9 % to 5.3 %, a reduction of more than fourfold.
SystemC‑backed testbenches catch functional bugs early, raising the overall pass rate.
The multi‑agent design scales well across FSMs of varying depth, showing consistent gains even on the most complex benchmark items.
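
The second takeaway, catching functional bugs by comparing the generated design against a reference model, can be illustrated with a small differential test. This is a pure‑Python stand‑in under stated assumptions: the transition tables play the roles of the SystemC model and the simulated RTL, and the names `REFERENCE`, `BUGGY_DUT`, `run`, and `first_mismatch` are hypothetical.

```python
import random

# Transition tables: (state, input bit) -> next state. Both are hypothetical.
REFERENCE = {
    ("IDLE", 0): "IDLE", ("IDLE", 1): "RUN",
    ("RUN", 0): "IDLE", ("RUN", 1): "RUN",
}
# A deliberately buggy "generated design": RUN never returns to IDLE.
BUGGY_DUT = {**REFERENCE, ("RUN", 0): "RUN"}


def run(table, stimulus, initial="IDLE"):
    """Simulate the FSM and return the state reached after each input."""
    state, trace = initial, []
    for bit in stimulus:
        state = table[(state, bit)]
        trace.append(state)
    return trace


def first_mismatch(reference, dut, stimulus):
    """Return (cycle, expected, actual) for the first divergence, or None."""
    expected, actual = run(reference, stimulus), run(dut, stimulus)
    for cycle, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return cycle, e, a
    return None


rng = random.Random(0)  # fixed seed so the stimulus is reproducible
stimulus = [rng.randint(0, 1) for _ in range(32)]

print(first_mismatch(REFERENCE, REFERENCE, stimulus))      # None: models agree
print(first_mismatch(REFERENCE, BUGGY_DUT, [1, 1, 0, 1]))  # (2, 'IDLE', 'RUN')
```

A mismatch report of this shape (cycle, expected value, actual value) is the kind of feedback that could be handed back to the generating agent for another refinement round.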

Practical Implications

Faster Prototyping – Hardware teams can describe control logic in plain English (or a lightweight DSL) and obtain synthesizable Verilog in seconds, cutting down initial RTL drafting time.
Reduced Debug Overhead – Early syntax validation and automated functional testing mean fewer manual simulation cycles, freeing engineers to focus on architecture rather than typo‑hunting.
Toolchain Integration – Because the IR is language‑agnostic, it can be hooked into existing CI/CD pipelines, enabling continuous generation and regression testing of FSM blocks.
Educational Use – Students and junior designers can experiment with FSM design without mastering Verilog syntax, using the framework as a learning scaffold.
Open‑source Benchmark – SKT‑FSM provides a ready‑made test suite for anyone building new LLM‑based hardware generators, fostering community‑driven improvements.

Limitations & Future Work

LLM Dependency – The quality of the generated IR and Verilog still hinges on the underlying LLM; smaller or less‑trained models may not achieve the same error reductions.
Scope limited to FSMs – AutoFSM targets finite‑state‑machine control logic; extending the approach to datapath components (e.g., arithmetic units) remains an open challenge.
Benchmark Diversity – While SKT‑FSM covers hierarchical FSMs, real‑world designs often involve mixed‑level timing constraints and vendor‑specific primitives that are not yet represented.
Future Directions – The authors plan to (1) incorporate reinforcement learning from testbench feedback to close the generation loop, (2) broaden the IR to capture timing and power annotations, and (3) evaluate the framework on commercial RTL libraries and larger system‑level designs.

Authors

Qiuming Luo
Yanming Lei
Kunzhong Wu
Yixuan Cao
Chengjian Liu

Paper Information

arXiv ID: 2512.11398v1
Categories: cs.SE, cs.MA
Published: December 12, 2025
