[Paper] LLM-based Behaviour Driven Development for Hardware Design
Source: arXiv - 2512.17814v1
Overview
Design verification for chips and complex systems is a massive bottleneck—writing precise test scenarios from high‑level specs can take weeks. This paper explores a novel twist on Behavior‑Driven Development (BDD) for hardware: using Large Language Models (LLMs) to automatically turn textual requirements into executable verification scenarios. By bridging the gap between natural‑language specs and formal testbenches, the authors aim to make hardware verification faster, less error‑prone, and more accessible to engineers who aren’t verification specialists.
Key Contributions
- LLM‑driven scenario generation: A pipeline that feeds hardware specifications to an LLM (e.g., GPT‑4) and receives BDD‑style “Given‑When‑Then” scenarios ready for testbench integration (see the sketch after this list).
- Domain‑specific prompting & fine‑tuning: Tailored prompts and a lightweight fine‑tuning step that teach the model the syntax of hardware description languages (HDL) and verification frameworks (UVM, SystemVerilog).
- Prototype toolchain: An end‑to‑end prototype that links the LLM output to existing simulation environments, automatically converting scenarios into SystemVerilog assertions and test vectors.
- Empirical evaluation: Case studies on three open‑source hardware blocks (a FIFO, an ALU, and a simple RISC‑V core) showing up to 45 % reduction in manual scenario authoring time and a 10–20 % increase in functional coverage.
- Human‑in‑the‑loop workflow: A lightweight UI that lets verification engineers review, edit, and approve generated scenarios, keeping the process safe for safety‑critical designs.
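To make the generation step concrete, here is a minimal sketch of how a single requirement sentence could be turned into a Gherkin scenario. The prompt wording, the FIFO requirement, and the generate_scenario helper are illustrative assumptions rather than the authors' exact artifacts, and the OpenAI Python client stands in for whichever LLM backend the pipeline actually targets (the paper names GPT‑4 only as an example).

```python
# Minimal sketch of the scenario-generation step: requirement text in,
# Given/When/Then scenario out. Prompt wording and helper names are
# illustrative assumptions, not the paper's exact pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_TEMPLATE = """You are a hardware verification engineer.
Rewrite the following requirement as a BDD scenario in Gherkin
(Given/When/Then), using signal names that can later be mapped to
SystemVerilog assertions.

Requirement: {requirement}
"""

def generate_scenario(requirement: str, model: str = "gpt-4") -> str:
    """Ask the LLM for a Given-When-Then scenario for one requirement."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(requirement=requirement)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(generate_scenario("The FIFO must never overflow."))
```

In the paper's workflow, the returned scenario would then pass through the human‑in‑the‑loop review step before being translated into testbench code.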
Methodology
- Spec Collection – Gathered natural‑language requirement documents (e.g., “the FIFO must never overflow”) from existing hardware projects.
- Prompt Engineering – Crafted prompts that ask the LLM to output BDD scenarios in Gherkin‑style syntax, explicitly requesting SystemVerilog‑compatible assertions.
- Fine‑tuning – Used a small dataset of 200 hand‑written hardware BDD examples to fine‑tune the base LLM, improving its understanding of HDL terminology.
- Scenario Translation – Parsed the generated “Given‑When‑Then” steps and automatically mapped them to SystemVerilog constructs (e.g., assert property, covergroup); see the sketch after this list.
- Integration & Simulation – Injected the translated testbench snippets into a UVM environment and ran them on a standard simulator (VCS/ModelSim).
- Metrics Collection – Measured authoring effort (person‑hours), functional coverage (via coverage reports), and bug detection rate compared to a baseline manual BDD workflow.
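The Scenario Translation step is the most mechanical part of the flow, so a small sketch helps illustrate it. The step‑to‑signal mapping table, the signal names, and the emitted property are assumptions made up for this example; the actual prototype presumably supports many more step patterns and also emits covergroups and test vectors.

```python
# Minimal sketch of scenario translation: parse a Given/When/Then scenario
# and emit a SystemVerilog concurrent assertion. All names are illustrative.
import re

EXAMPLE_SCENARIO = """\
Scenario: FIFO never overflows
  Given the FIFO is full
  When a write is requested
  Then the write must be stalled
"""

# Hypothetical step-to-expression table: each Gherkin step pattern maps to a
# SystemVerilog boolean expression over DUT signals.
STEP_MAP = {
    r"the FIFO is full":          "fifo_full",
    r"a write is requested":      "wr_en",
    r"the write must be stalled": "!wr_ack",
}

def step_to_expr(step_text: str) -> str:
    """Look up the SystemVerilog expression for one Gherkin step."""
    for pattern, expr in STEP_MAP.items():
        if re.fullmatch(pattern, step_text, flags=re.IGNORECASE):
            return expr
    raise ValueError(f"no mapping for step: {step_text!r}")

def scenario_to_sva(scenario: str, clk: str = "clk") -> str:
    """Turn a Given/When/Then scenario into an assert property block."""
    steps = {"given": [], "when": [], "then": []}
    name = "generated_check"
    for line in scenario.splitlines():
        line = line.strip()
        if line.lower().startswith("scenario:"):
            name = re.sub(r"\W+", "_", line.split(":", 1)[1].strip()).lower()
        for kw in steps:
            if line.lower().startswith(kw + " "):
                steps[kw].append(step_to_expr(line[len(kw):].strip()))
    antecedent = " && ".join(steps["given"] + steps["when"])
    consequent = " && ".join(steps["then"])
    return (f"property p_{name};\n"
            f"  @(posedge {clk}) ({antecedent}) |-> ({consequent});\n"
            f"endproperty\n"
            f"assert_{name}: assert property (p_{name});")

if __name__ == "__main__":
    print(scenario_to_sva(EXAMPLE_SCENARIO))
```

Run on the example scenario, this prints a property asserting that a write to a full FIFO is stalled, which is the kind of snippet the Integration & Simulation step then drops into the UVM environment.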
Results & Findings
| Metric | Manual BDD | LLM‑augmented BDD |
|---|---|---|
| Avg. time to create a scenario (hrs) | 0.8 | 0.44 |
| Functional coverage increase | – | +12 % (FIFO), +18 % (ALU), +10 % (RISC‑V) |
| Bugs discovered (new) | 3 | 5 |
| False positives (invalid scenarios) | 0 | 2 % of generated scenarios (fixed in review) |
The study shows that the LLM can reliably produce syntactically correct verification code, but a short human review step is still required to catch occasional hallucinations or ambiguous wording. Overall, the workflow cuts down on repetitive writing and helps less‑experienced engineers contribute to verification.
Practical Implications
- Speed up verification cycles – Teams can generate a first draft of test scenarios in minutes rather than hours, accelerating the “design‑verify‑iterate” loop.
- Lower the entry barrier – Junior hardware engineers or software‑focused developers can participate in verification without deep UVM expertise, fostering cross‑disciplinary collaboration.
- Better documentation traceability – Because scenarios are derived directly from natural‑language specs, the link between requirement, test, and coverage becomes explicit, aiding compliance audits (e.g., ISO 26262).
- Plug‑and‑play with existing EDA flows – The prototype outputs standard SystemVerilog/UVM code, meaning it can be dropped into any existing simulation or formal verification pipeline without major tool changes.
- Potential for AI‑assisted regression management – The same LLM pipeline could be extended to auto‑update scenarios when specs evolve, reducing regression‑test maintenance overhead.
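The regression‑management idea above is only a suggestion in the paper, but a minimal sketch shows one way it could work, assuming requirements are kept as plain text and each generated scenario is cached against a hash of the requirement it came from; the cache file name and helper functions are hypothetical.

```python
# Sketch of spec-change detection for scenario regeneration. Requirements
# whose text changed (or are new) are flagged for regeneration and review;
# unchanged requirements keep their cached, already-reviewed scenarios.
import hashlib
import json
from pathlib import Path

CACHE_FILE = Path("scenario_cache.json")  # hypothetical cache location

def req_hash(requirement: str) -> str:
    return hashlib.sha256(requirement.strip().encode("utf-8")).hexdigest()

def stale_requirements(requirements: list[str]) -> list[str]:
    """Return requirements with no up-to-date cached scenario."""
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    return [r for r in requirements if req_hash(r) not in cache]

def update_cache(requirement: str, scenario: str) -> None:
    """Record a reviewed scenario against the hash of its source requirement."""
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    cache[req_hash(requirement)] = scenario
    CACHE_FILE.write_text(json.dumps(cache, indent=2))

if __name__ == "__main__":
    reqs = ["The FIFO must never overflow.",
            "Reads from an empty FIFO must raise an underflow flag."]
    for r in stale_requirements(reqs):
        print(f"regenerate and review scenario for: {r}")
```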
Limitations & Future Work
- Hallucination risk – The LLM occasionally invents signals or constraints not present in the spec; a robust verification workflow must retain a human review checkpoint.
- Domain coverage – The fine‑tuning dataset is small and focused on a few classic blocks; scaling to large SoCs with proprietary IP may require more extensive domain data.
- Performance on formal methods – The current work targets simulation‑based verification; integrating with formal property generation remains an open challenge.
- Toolchain integration – While the prototype works with open‑source simulators, tighter integration with commercial EDA suites (Synopsys, Cadence) is needed for industry adoption.
- Future directions include (1) building a continuous‑learning loop where verified scenarios feed back into the LLM, (2) expanding to multi‑language specs (e.g., UML, SysML), and (3) exploring zero‑shot prompting to eliminate the fine‑tuning step altogether.
Authors
- Rolf Drechsler
- Qian Liu
Paper Information
- arXiv ID: 2512.17814v1
- Categories: cs.SE, cs.AI, cs.AR
- Published: December 19, 2025