[Paper] LLM-based Behaviour Driven Development for Hardware Design
Source: arXiv - 2512.17814v1
Overview
Design verification for chips and complex systems is a massive bottleneck—writing precise test scenarios from high‑level specs can take weeks. This paper explores a novel twist on Behavior‑Driven Development (BDD) for hardware: using Large Language Models (LLMs) to automatically turn textual requirements into executable verification scenarios. By bridging the gap between natural‑language specs and formal testbenches, the authors aim to make hardware verification faster, less error‑prone, and more accessible to engineers who aren’t verification specialists.
Key Contributions
- LLM‑driven scenario generation: A pipeline that feeds hardware specifications to an LLM (e.g., GPT‑4) and receives BDD‑style “Given‑When‑Then” scenarios ready for testbench integration (see the sketch after this list).
- Domain‑specific prompting & fine‑tuning: Tailored prompts and a lightweight fine‑tuning step that teach the model the syntax of hardware description languages (HDL) and verification frameworks (UVM, SystemVerilog).
- Prototype toolchain: An end‑to‑end prototype that links the LLM output to existing simulation environments, automatically converting scenarios into SystemVerilog assertions and test vectors.
- Empirical evaluation: Case studies on three open‑source hardware blocks (a FIFO, an ALU, and a simple RISC‑V core) showing up to 45 % reduction in manual scenario authoring time and a 10–20 % increase in functional coverage.
- Human‑in‑the‑loop workflow: A lightweight UI that lets verification engineers review, edit, and approve generated scenarios, keeping the process safe for safety‑critical designs.
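To make the generation step concrete, here is a minimal sketch of how a single requirement sentence could be turned into a Gherkin scenario. The prompt wording, the FIFO requirement, and the generate_scenario helper are illustrative assumptions rather than the authors' exact artifacts, and the OpenAI Python client stands in for whichever LLM backend the pipeline actually targets (the paper names GPT‑4 only as an example).

```python
# Minimal sketch of the scenario-generation step: requirement text in,
# Given/When/Then scenario out. Prompt wording and helper names are
# illustrative assumptions, not the paper's exact pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_TEMPLATE = """You are a hardware verification engineer.
Rewrite the following requirement as a BDD scenario in Gherkin
(Given/When/Then), using signal names that can later be mapped to
SystemVerilog assertions.

Requirement: {requirement}
"""

def generate_scenario(requirement: str, model: str = "gpt-4") -> str:
    """Ask the LLM for a Given-When-Then scenario for one requirement."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(requirement=requirement)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_scenario("The FIFO must never overflow."))
```

In the paper's workflow, the returned scenario would then pass through the human‑in‑the‑loop review step before being translated into testbench code.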
Methodology
- Spec Collection – Gathered natural‑language requirement documents (e.g., “the FIFO must never overflow”) from existing hardware projects.
- Prompt Engineering – Crafted prompts that ask the LLM to output BDD scenarios in Gherkin‑style syntax, explicitly requesting SystemVerilog‑compatible assertions.
- Fine‑tuning – Used a small dataset of 200 hand‑written hardware BDD examples to fine‑tune the base LLM, improving its understanding of HDL terminology.
- Scenario Translation – Parsed the generated “Given‑When‑Then” steps and automatically mapped them to SystemVerilog constructs (e.g., assert property, covergroup); see the sketch after this list.
- Integration & Simulation – Injected the translated testbench snippets into a UVM environment and ran them on a standard simulator (VCS/ModelSim).
- Metrics Collection – Measured authoring effort (person‑hours), functional coverage (via coverage reports), and bug detection rate compared to a baseline manual BDD workflow.
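The Scenario Translation step is the most mechanical part of the flow, so a small sketch helps illustrate it. The step‑to‑signal mapping table, the signal names, and the emitted property are assumptions made up for this example; the actual prototype presumably supports many more step patterns and also emits covergroups and test vectors.

```python
# Minimal sketch of scenario translation: parse a Given/When/Then scenario
# and emit a SystemVerilog concurrent assertion. All names are illustrative.
import re

EXAMPLE_SCENARIO = """\
Scenario: FIFO never overflows
  Given the FIFO is full
  When a write is requested
  Then the write must be stalled
"""

# Hypothetical step-to-expression table: each Gherkin step pattern maps to a
# SystemVerilog boolean expression over DUT signals.
STEP_MAP = {
    r"the FIFO is full":          "fifo_full",
    r"a write is requested":      "wr_en",
    r"the write must be stalled": "!wr_ack",
}

def step_to_expr(step_text: str) -> str:
    """Look up the SystemVerilog expression for one Gherkin step."""
    for pattern, expr in STEP_MAP.items():
        if re.fullmatch(pattern, step_text, flags=re.IGNORECASE):
            return expr
    raise ValueError(f"no mapping for step: {step_text!r}")

def scenario_to_sva(scenario: str, clk: str = "clk") -> str:
    """Turn a Given/When/Then scenario into an assert property block."""
    steps = {"given": [], "when": [], "then": []}
    name = "generated_check"
    for line in scenario.splitlines():
        line = line.strip()
        if line.lower().startswith("scenario:"):
            name = re.sub(r"\W+", "_", line.split(":", 1)[1].strip()).lower()
        for kw in steps:
            if line.lower().startswith(kw + " "):
                steps[kw].append(step_to_expr(line[len(kw):].strip()))
    antecedent = " && ".join(steps["given"] + steps["when"])
    consequent = " && ".join(steps["then"])
    return (f"property p_{name};\n"
            f"  @(posedge {clk}) ({antecedent}) |-> ({consequent});\n"
            f"endproperty\n"
            f"assert_{name}: assert property (p_{name});")

if __name__ == "__main__":
    print(scenario_to_sva(EXAMPLE_SCENARIO))
```

Run on the example scenario, this prints a property asserting that a write to a full FIFO is stalled, which is the kind of snippet the Integration & Simulation step then drops into the UVM environment.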
Results & Findings
| Metric | Manual BDD | LLM‑augmented BDD |
|---|---|---|
| Avg. time to create a scenario (hrs) | 0.8 | 0.44 |
| Functional coverage increase | – | +12 % (FIFO), +18 % (ALU), +10 % (RISC‑V) |
| Bugs discovered (new) | 3 | 5 |
| False positives (invalid scenarios) | 0 | 2 % of generated scenarios (fixed in review) |
The study shows that the LLM can reliably produce syntactically correct verification code, but a short human review step is still required to catch occasional hallucinations or ambiguous wording. Overall, the workflow cuts down on repetitive writing and helps less‑experienced engineers contribute to verification.
Practical Implications
- Speed up verification cycles – Teams can generate a first draft of test scenarios in minutes rather than hours, accelerating the “design‑verify‑iterate” loop.
- Lower the entry barrier – Junior hardware engineers or software‑focused developers can participate in verification without deep UVM expertise, fostering cross‑disciplinary collaboration.
- Better documentation traceability – Because scenarios are derived directly from natural‑language specs, the link between requirement, test, and coverage becomes explicit, aiding compliance audits (e.g., ISO 26262).
- Plug‑and‑play with existing EDA flows – The prototype outputs standard SystemVerilog/UVM code, meaning it can be dropped into any existing simulation or formal verification pipeline without major tool changes.
- Potential for AI‑assisted regression management – The same LLM pipeline could be extended to auto‑update scenarios when specs evolve, reducing regression‑test maintenance overhead.
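The regression‑management idea above is only a suggestion in the paper, but a minimal sketch shows one way it could work, assuming requirements are kept as plain text and each generated scenario is cached against a hash of the requirement it came from; the cache file name and helper functions are hypothetical.

```python
# Sketch of spec-change detection for scenario regeneration. Requirements
# whose text changed (or are new) are flagged for regeneration and review;
# unchanged requirements keep their cached, already-reviewed scenarios.
import hashlib
import json
from pathlib import Path

CACHE_FILE = Path("scenario_cache.json")  # hypothetical cache location

def req_hash(requirement: str) -> str:
    return hashlib.sha256(requirement.strip().encode("utf-8")).hexdigest()

def stale_requirements(requirements: list[str]) -> list[str]:
    """Return requirements with no up-to-date cached scenario."""
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    return [r for r in requirements if req_hash(r) not in cache]

def update_cache(requirement: str, scenario: str) -> None:
    """Record a reviewed scenario against the hash of its source requirement."""
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    cache[req_hash(requirement)] = scenario
    CACHE_FILE.write_text(json.dumps(cache, indent=2))

if __name__ == "__main__":
    reqs = ["The FIFO must never overflow.",
            "Reads from an empty FIFO must raise an underflow flag."]
    for r in stale_requirements(reqs):
        print(f"regenerate and review scenario for: {r}")
```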
Limitations & Future Work
- Hallucination risk – The LLM occasionally invents signals or constraints not present in the spec; a robust verification workflow must retain a human review checkpoint.
- Domain coverage – The fine‑tuning dataset is small and focused on a few classic blocks; scaling to large SoCs with proprietary IP may require more extensive domain data.
- Performance on formal methods – The current work targets simulation‑based verification; integrating with formal property generation remains an open challenge.
- Toolchain integration – While the prototype works with open‑source simulators, tighter integration with commercial EDA suites (Synopsys, Cadence) is needed for industry adoption.
- Future directions include (1) building a continuous‑learning loop where verified scenarios feed back into the LLM, (2) expanding to multi‑language specs (e.g., UML, SysML), and (3) exploring zero‑shot prompting to eliminate the fine‑tuning step altogether.
Authors
- Rolf Drechsler
- Qian Liu
Paper Information
- arXiv ID: 2512.17814v1
- Categories: cs.SE, cs.AI, cs.AR
- Published: December 19, 2025