

# LLM-Aided Testbench Generation and Bug Detection for Finite-State Machines

Jitendra Bhandari\*, Johann Knechtel†, Ramesh Narayanaswamy‡, Siddharth Garg\*, and Ramesh Karri\*

\*New York University, New York, USA †New York University Abu Dhabi, UAE ‡Synopsys, Sunnyvale, CA, USA

**Abstract**—This work investigates the potential of tailoring Large Language Models (LLMs), specifically GPT3.5 and GPT4, for the domain of chip testing. A key aspect of chip design is functional testing, which relies on testbenches to evaluate the functionality and coverage of Register-Transfer Level (RTL) designs. We aim to enhance testbench generation by incorporating feedback from commercial-grade Electronic Design Automation (EDA) tools into LLMs. Through iterative feedback from these tools, we refine the testbenches to achieve improved test coverage. Our case studies present promising results, demonstrating that this approach can effectively enhance test coverage. By integrating EDA tool feedback, the generated testbenches become more accurate in identifying potential issues in the RTL design. Furthermore, we extended our study to use this enhanced test coverage framework for detecting bugs in the RTL implementations.

**Index Terms**—LLM, EDA, Testing, Coverage Analysis, FSM

## I. INTRODUCTION

Testbench generation is vital in the integrated circuit (IC) design cycle to ensure that chips are functional, meet design specifications, and are of high quality. Testbenches are designed to activate and monitor a wide range of design behaviors, ensuring that every part of a circuit is exercised during testing, leaving no function unchecked. This comprehensive testing approach not only ensures functional correctness but also provides valuable insights that can lead to design optimizations. Functional bugs tend to occur in a chip's control path, which is often implemented via Finite-State Machines (FSMs). These FSMs orchestrate the sequence of operations within the chip, making their correct function critical. Thus, achieving full FSM coverage is often a requirement for compliance with industry standards, particularly in sectors like aerospace and medical devices etc., where high reliability and safety are paramount [1].

Large Language Models (LLMs) [2] have revolutionized the way developers approach coding and testing, including the realm of hardware description languages (HDL). Recent work has demonstrated the utility of LLMs for various hardware-related tasks. For example, LLMs have been used for Verilog code generation [3]–[6], where they help in writing and optimizing HDL code. They have also been employed in generating assertions [7], [8], which are critical for verifying the correctness of HDL designs. Additionally, LLMs have shown promise in scripting for electronic design automation (EDA) tools [9], [10], streamlining the design and verification process. These methods either directly leverage foundation models like GPT, or fine-tune specialized code generation models on hardware datasets. Results from these studies suggest that LLMs can effectively understand Verilog code with its syntax, structure,



Fig. 1: Contribution 1: LLM-aided Testbench Generation.



Fig. 2: Contribution 2: LLM-aided Bug Detection.

and functional requirements. This understanding enables LLMs to assist in various stages of the design and verification process, improving efficiency and accuracy.

**Scope of This Work:** First, we explore LLMs for automated testbench generation of FSMs. Through our case studies, we find that by using feedback from commercial EDA simulation tools, LLMs can improve FSM test coverage. Second, we analyze the bug detection capability by providing output traces from the tools and utilizing prompting techniques.

## II. BACKGROUND AND RELATED WORK

Functional testing of RTL modules is crucial to ensure correct functionality. State-of-the-art verification techniques include both functional and formal verification, along with coverage analysis. These techniques rely on the careful formulation of assertions, often using System-Verilog, and the generation of comprehensive test patterns. Designers and test engineers find it especially challenging to create a testbench and generate effective test patterns. This difficulty increases when the design is still in progress and involves multiple team members. Testbench compilation requires a thorough understanding of the RTL code. Designers must meticulously analyze the code to generate test patterns that achieve complete coverage. Effective coverage analysis helps identify untested parts of the design, guiding further test development.

LLMs and their increasing prominence, as highlighted by the developments of models like GPT-4 [2], have sparked broad

interest in various fields. Specifically, chip design has seen a surge in innovative approaches leveraging LLMs, such as [11]. Considerable progress has been made in improving the quality of Verilog code generation [3]–[6]. These studies introduce methodologies to refine the process of generating Verilog code, demonstrating the potential of LLMs to streamline and improve hardware design workflows. Furthermore, [12]–[14] showcases the effectiveness of prompting strategies in chip design. They harness the power of LLMs to conceptualize and detail the intricate aspects of hardware architectures, paving the way for efficient chip development. The work in [10] extends LLM applications into assistant chatbots, script generation, and bug analysis. Similarly, [9] explores the use of LLMs for task planning and execution within the EDA flow, highlighting the potential for automating and optimizing complex workflows. Finally, the generation of assertions for verifying the correctness of IC designs, has been enhanced through LLMs [7], [8].

### III. METHODOLOGY

We study the capabilities of LLMs for both *coverage-guided testbench generation* and *automated bug detection*.

#### A. LLM (and coverage)-guided Testbench Generation

Fig. 1 illustrates an LLM-guided methodology for automatic testbench generation for FSMs. Any LLM can be employed and without loss of generality, we use GPT3.5 and GPT4. The inputs are: (1) an English-language specification<sup>1</sup>, and (2) a Verilog RTL description of design-under-test (DUT). The question we seek to address here is: *Can we automatically generate a high-quality testbench for such DUT Verilog code?* We first need a metric to measure the quality of a testbench. A common metric, especially for FSMs, is *transition coverage*, i.e., there should be at least one set of test patterns that covers all state transitions in the FSM. Several simulation tools provide *coverage reports*, including state and transition coverage. Here, we investigate if we can use such coverage reports as feedback to iteratively improve coverage of an LLM-generated testbench.

Our method for automatic testbench generation involves several steps to ensure comprehensive testing of FSMs. First, we prompt an LLM to generate an initial testbench using the Verilog code of the DUT as input, as shown in Fig. 3(a). This Verilog code represents the FSM that needs to be tested. The LLM processes this input and generates a testbench that is intended to verify the function of the DUT. Next, the generated testbench undergoes a compilation check using commercial EDA tools. In this step, the testbench is checked for syntax and logic errors. If any errors are detected during compilation, these errors are appended to the prompt and fed back to the LLM. The LLM uses this feedback to refine and correct its output iteratively until an error-free testbench is produced.

Next, the FSM is simulated in coverage analysis mode. The simulation yields a report that quantifies how thoroughly the testbench exercises the FSM. The coverage report identifies

<sup>1</sup>specification could be phrased as “Write a Verilog module that detects a 1011 pattern and outputs a 1 anytime the last four inputs match with this pattern, and 0 otherwise.”

You are an expert in design verification for Verilog code. Given a Verilog RTL module, you will write a testbench to simulate it and try to cover all the possible state transitions. Please follow the below instructions while providing any response:

- 1) The testbench should start with `module tb();`
- 2) You will add `$fsdbDumpfile`, `$fsdbDumpvars` commands in the testbench at the start of the first `initial` block.
- 3) Please use `apply_input()` format to apply input sequences.
- 4) You should consider whether it requires an active or high reset from the RTL code provided.
- 5) At the end of test patterns add `$finish`.

(a) System Prompt for GPT.

The above testbench provided doesn't cover all the transitions. This is the list of transitions that were expected but didn't happen:  
“*Transition from A to B*”

Please consider the RTL Verilog code provided while providing the testbench and combine the test cases from the above response. You may have to reset a few times to cover certain transitions.

(b) Prompt 1, to guide on some transitions not covered yet.

We ran the simulation tool with the testbench, this is the value for the state register variable across the clock cycle, the sequence is provided serially starting from 0 till the simulation finishes. This also shows the transition at each clock cycle.

“*S0 S1 S2 S3 S1 S5.....*”

Please use the design specification provided and find out if there is any mismatch between them. We are looking to see if any transitions are inconsistent with the design spec.

(c) Prompt 2, to verify the test outputs with the design specification.

Fig. 3: Exemplary prompts used with GPT.

any FSM transitions that the testbench has missed, highlighting areas where the testbench needs improvement. Uncovered transitions in the coverage report are used as prompts for the LLM, as shown in Fig. 3(b). These prompts guide the LLM to generate new test cases to cover missing transitions. By doing so, we ensure that the testbench comprehensively tests all functional parts of the FSM. Generating new test cases and performing coverage analysis is repeated, as shown in Fig. 1. The cycle continues until full transition coverage is achieved or may stop once a user-defined threshold is met. Fig. 5 shows a testbench that yields 100% coverage.

#### B. LLM-guided Automated Bug Detection

RTL may contain bugs, either intentionally inserted or accidental, that need to be identified and fixed during testing. Given a testbench, we simulate the FSM and record the results of each test case. These simulation results and the original design specification in plain English are provided to the LLM. The LLM is tasked with identifying failing test cases by comparing the simulation outcomes with the expected behavior described in the specification in Fig. 2. This capability is powerful because the LLM can operate on less formal, natural-language descriptions. Creating a formal specification is complex, and time-consuming. Fig. 3(c) provides prompts.

We use testbenches generated in part 1 as the starting point for bug detection. Our study shows that, as the FSM becomes complex the number of input patterns to detect the bugs increases notably. This leads to a large I/O trace obtained after the simulation, as shown in Fig. 2. GPT cannot comprehend this, thus missing potential bugs. This naive approach thus

does not scale. To address these challenges, we employ two improvements to prompting: (1) *Dividing I/O patterns into smaller sets to track states*, and (2) *Handling multi-bit inputs and outputs individually to manage sub-circuits (output cones)*, as shown in Fig. 4. An analysis is in Section IV(C).

This is the input-output pair for the first 10 clock cycles. Please use the design specification provided and find out if there is any mismatch between them. We are looking to see if any transitions are inconsistent with the design spec. This will be followed up with the next 10 clock cycles and so on. After every 10th clock cycle please remember the current state which is very important when we provide new 10 input-output pairs.

(a) Prompt 3, sub-divide all the input/output pairs.

To simplify the mismatch detection, consider the provided input-output pair for each clock cycle. Start the detection process by focusing on one bit of output and check for correct values as the input patterns are applied, followed by checking on other bits of output as well.

(b) Prompt 4, handle multi-bit input and output scenarios separately.

Fig. 4: Additional Prompt for GPT during detection step.

#### IV. EXPERIMENTS

##### A. Setup

Our study uses GPT(3.5/4), *Synopsys VCS U-2023.03-1* and *Synopsys Verdi U-2023.03-1*. The latter tools are for compiling and simulating Verilog code, debugging, and using its coverage reports as feedback. An example VCS report template is shown in Fig. 6. We automated the framework in Fig. 1 and Fig. 2 in Python, including testbench generation, code compilation, coverage analysis, and comparison of test cases with natural-language specifications. For our case studies, we obtained a large number of representative FSMs from HDLBits [15] and GitHub, with varying complexity in terms of number of states and transitions. 100 FSMs are used in our study.<sup>2</sup>

##### B. Results: LLM-guided Testbench Generation

To understand the limitations of using LLMs, if any, we have first studied the obtained testbenches in detail, providing the following important insights. First, for certain paths and transitions selected for extended coverage, the model may not be able to figure out the next correct input pattern to move to the next state. For example, to cover all transitions in Fig. 7(a) (state transition diagram for FSM16), i.e., A→A, A→B, B→C, B→D, C→D, C→E, E→D, E→E, D→F, F→C, F→D, and D→A, we will require an input pattern that begins with coverage of state A, like 1000010010101101.... This requirement was not automatically understood/identified by GPTs, so the model could get stuck at some state and repeat generating the same test patterns. Second, due to the context limits of LLM for that matter, the model cannot cover all transitions in one iteration. Thus, we prompt it to ‘reset’ after certain transitions such that it can discover all input patterns up to some state transitions and traverse the rest, as shown in Fig. 3.

```

1 module tb_fsm();
2   reg clk;
3   reg rst;
4   reg inp;
5   wire out1;
6   wire out2;
7
8   fsm uut(.clk(clk), .rst(rst), .inp(inp), .out1(out1), .
9           .out2(out2));
// Clock generation
10  always begin
11    #5 clk = ~clk;
12  end
13
14 // Test procedure
15 initial begin
16   // Initialize signals
17   clk = 0;
18   rst = 1;
19   inp = 0;
20   #10 rst = 0;
21
22   // Set up FSDB file for waveform viewing
23   $fsdbDumpfile("test.fsdb");
24   $fsdbDumpvars;
25
26   // Test sequence to cover all transitions
27   // S0 -> S1 -> S2 -> S3 -> S4 -> S5 -> S6 -> S7 with inp
28   =1
29   apply_input_sequence(8'b11111111);
// Rest of the test patterns
// ....
$stop;
end
// Task to apply input and display results
task apply_input(input reg in);
  inp = in;
  #10; // Wait for one clock cycle
endtask
// Task to apply a sequence of inputs
task apply_input_sequence(input reg [7:0] seq);
  integer i;
  for (i = 7; i >= 0; i = i - 1) begin
    apply_input(seq[i]);
  end
endtask
endmodule

```

Fig. 5: An LLM-generated testbench exercising the design under test, in this instance an FSM.

| FSM Coverage for Module : fsm    |           |                    |         |
|----------------------------------|-----------|--------------------|---------|
| Summary for FSM :: current_state |           |                    |         |
|                                  | Total     | Covered            | Percent |
| States                           | 4         | 4                  | 100.00  |
| Transitions                      | 8         | 6                  | 75.00   |
| States                           | Line No.  | Covered            |         |
| A                                | 17        | Covered            |         |
| B                                | 29        | Covered            |         |
| C                                | 38        | Covered            |         |
| D                                | 35        | Covered            |         |
| Transitions                      | Line No.  | Covered            |         |
| A→A                              | 17        | Covered            |         |
| A→B                              | 29        | Covered            |         |
| <b>B→A</b>                       | <b>17</b> | <b>Not Covered</b> |         |
| B→C                              | 38        | Covered            |         |
| B→D                              | 35        | Covered            |         |
| <b>C→A</b>                       | <b>17</b> | <b>Not Covered</b> |         |
| C→D                              | 44        | Covered            |         |
| D→A                              | 17        | Covered            |         |

Fig. 6: FSM coverage report template.

<sup>2</sup>We show 50/100 cases, capturing the complexity of the dataset. Full analysis of the datasets, including the RTL source code and the testbenches are available at [16]. The remaining 50 cases are in the appendix, Table II.

TABLE I: Results on FSMs of different complexity. All FSMs have natural-language prompts describing the function. "Iters" are the number of iterations required to achieve % Cov (Coverage). "State Regs", "I/O pairs" and "Fuzzing" denote # input patterns required to detect a mismatch from the specification by the three methods, respectively. FSMs are arranged from 1–100. 50 FSMs are reported here, rest are reported in Appendix A. (✓) denotes successful bug detection and (✗) denotes failed attempt.

| Level  | FSMs | LLM (and coverage)-guided Testbench Generation |     |        |        |       |                  |                |       | LLM-guided Automated Bug Detection |            |           |         |                  |                |         |            | LLM-guided Automated Bug Detection |         |            |                  |                |   |            |           |         |      |
|--------|------|------------------------------------------------|-----|--------|--------|-------|------------------|----------------|-------|------------------------------------|------------|-----------|---------|------------------|----------------|---------|------------|------------------------------------|---------|------------|------------------|----------------|---|------------|-----------|---------|------|
|        |      | Characteristics                                |     |        | GPT3.5 | GPT4  | GPT3.5+This Work | GPT4+This Work |       | GPT3.5                             |            |           | GPT4    | GPT3.5+This Work | GPT4+This Work |         | GPT3.5     |                                    |         | GPT4       | GPT3.5+This Work | GPT4+This Work |   | GPT3.5     |           |         | GPT4 |
|        |      | i/o                                            | o/p | states | % Cov  | % Cov | Iters            | % Cov          | Iters | % Cov                              | State Regs | I/O pairs | Fuzzing | State Regs       | I/O pairs      | Fuzzing | State Regs | I/O pairs                          | Fuzzing | State Regs | I/O pairs        | Fuzzing        |   | State Regs | I/O pairs | Fuzzing |      |
| Easy   | 1    | 2                                              | 1   | 2      | 50     | 50    | 2                | 100            | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 2    | 6                                              | 3   | 2      | 75     | 50    | 2                | 100            | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 9    | 1                                              | 1   | 3      | 33     | 75    | 3                | 100            | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 10   | 1                                              | 1   | 3      | 50     | 50    | 3                | 100            | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 15   | 3                                              | 3   | 4      | 75     | 50    | 3                | 100            | 3     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 16   | 2                                              | 2   | 4      | 25     | 50    | 3                | 100            | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 17   | 2                                              | 1   | 4      | 25     | 50    | 2                | 90             | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 18   | 2                                              | 8   | 4      | 25     | 50    | 4                | 100            | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 19   | 5                                              | 2   | 4      | 25     | 50    | 5                | 90             | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 20   | 3                                              | 3   | 4      | 50     | 50    | 3                | 100            | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 25   | 1                                              | 1   | 5      | 25     | 20    | 5                | 90             | 3     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 26   | 2                                              | 8   | 5      | 33     | 50    | 6                | 100            | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 27   | 1                                              | 2   | 5      | 25     | 35    | 5                | 100            | 3     | 90                                 | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 28   | 3                                              | 3   | 5      | 50     | 50    | 4                | 95             | 3     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 29   | 2                                              | 6   | 5      | 50     | 50    | 4                | 90             | 3     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 30   | 1                                              | 1   | 6      | 25     | 25    | 5                | 90             | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 31   | 6                                              | 2   | 6      | 25     | 20    | 6                | 100            | 2     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 32   | 7                                              | 3   | 6      | 20     | 25    | 4                | 100            | 4     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 33   | 4                                              | 4   | 6      | 43     | 37    | 6                | 90             | 3     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 34   | 1                                              | 1   | 6      | 23     | 32    | 6                | 95             | 3     | 95                                 | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 35   | 2                                              | 4   | 6      | 40     | 35    | 5                | 100            | 3     | 90                                 | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 36   | 1                                              | 2   | 6      | 21     | 40    | 5                | 90             | 3     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 42   | 5                                              | 5   | 7      | 30     | 19    | 7                | 90             | 5     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 43   | 4                                              | 4   | 7      | 20     | 50    | 6                | 92             | 3     | 100                                | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
| Medium | 46   | 3                                              | 7   | 8      | 50     | 33    | 8                | 96             | 6     | 95                                 | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 47   | 3                                              | 7   | 8      | 24     | 7     | 90               | 5              | 90    | ✓                                  | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 48   | 1                                              | 2   | 8      | 20     | 13    | 7                | 91             | 7     | 95                                 | ✓          | ✓         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 52   | 2                                              | 2   | 9      | 25     | 15    | 8                | 90             | 5     | 95                                 | ✓          | ✗         | ✓       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 53   | 2                                              | 4   | 9      | 10     | 12    | 9                | 92             | 8     | 100                                | ✓          | ✗         | ✗       | ✓                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 54   | 1                                              | 1   | 9      | 19     | 33    | 7                | 95             | 6     | 92                                 | ✓          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 59   | 2                                              | 2   | 10     | 8      | 10    | 9                | 95             | 5     | 90                                 | ✓          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 60   | 1                                              | 3   | 10     | 11     | 15    | 8                | 100            | 8     | 100                                | ✓          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 61   | 3                                              | 3   | 10     | 25     | 20    | 8                | 90             | 8     | 90                                 | ✓          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 62   | 2                                              | 1   | 10     | 15     | 12    | 8                | 100            | 8     | 100                                | ✗          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 66   | 6                                              | 5   | 11     | 13     | 10    | 10               | 90             | 9     | 90                                 | ✗          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 67   | 1                                              | 1   | 11     | 7      | 11    | 9                | 92             | 9     | 95                                 | ✗          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 71   | 12                                             | 3   | 12     | 22     | 13    | 12               | 90             | 8     | 90                                 | ✗          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 72   | 2                                              | 1   | 12     | 13     | 12    | 9                | 94             | 9     | 95                                 | ✗          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 75   | 1                                              | 11  | 13     | 18     | 10    | 13               | 92             | 9     | 90                                 | ✗          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 78   | 4                                              | 16  | 14     | 6      | 7     | 14               | 90             | 10    | 100                                | ✗          | ✗         | ✗       | ✗                | ✓              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
| Hard   | 82   | 1                                              | 6   | 16     | 9      | 12    | 15               | 95             | 11    | 95                                 | ✗          | ✗         | ✗       | ✗                | ✗              | ✓       | ✓          | ✓                                  | ✓       | ✓          | ✓                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 85   | 1                                              | 1   | 19     | 7      | 5     | 18               | 91             | 13    | 93                                 | ✗          | ✗         | ✗       | ✗                | ✗              | ✓       | ✓          | ✗                                  | ✗       | ✗          | ✗                | ✓              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 87   | 2                                              | 2   | 20     | 5      | 6     | 20               | 90             | 13    | 90                                 | ✗          | ✗         | ✗       | ✗                | ✗              | ✗       | ✗          | ✗                                  | ✗       | ✗          | ✗                | ✗              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 90   | 1                                              | 1   | 22     | 7      | 7     | 21               | 90             | 14    | 94                                 | ✗          | ✗         | ✗       | ✗                | ✗              | ✗       | ✗          | ✗                                  | ✗       | ✗          | ✗                | ✗              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 92   | 4                                              | 4   | 23     | 8      | 11    | 22               | 91             | 12    | 93                                 | ✗          | ✗         | ✗       | ✗                | ✗              | ✗       | ✗          | ✗                                  | ✗       | ✗          | ✗                | ✗              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 93   | 1                                              | 2   | 24     | 4      | 10    | 23               | 94             | 13    | 91                                 | ✗          | ✗         | ✗       | ✗                | ✗              | ✗       | ✗          | ✗                                  | ✗       | ✗          | ✗                | ✗              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 95   | 2                                              | 1   | 25     | 8      | 12    | 22               | 90             | 11    | 92                                 | ✗          | ✗         | ✗       | ✗                | ✗              | ✗       | ✗          | ✗                                  | ✗       | ✗          | ✗                | ✗              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 96   | 1                                              | 1   | 25     | 5      | 15    | 23               | 93             | 12    | 100                                | ✗          | ✗         | ✗       | ✗                | ✗              | ✗       | ✗          | ✗                                  | ✗       | ✗          | ✗                | ✗              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 99   | 2                                              | 2   | 27     | 12     | 9     | 24               | 95             | 13    | 92                                 | ✗          | ✗         | ✗       | ✗                | ✗              | ✗       | ✗          | ✗                                  | ✗       | ✗          | ✗                | ✗              | ✓ | ✓          | ✓         | ✓       | ✓    |
|        | 100  | 6                                              | 16  | 28     | 6      | 5     | 22               | 95             | 14    | 95                                 | ✗          | ✗         | ✗       | ✗                | ✗              | ✗       | ✗          | ✗                                  | ✗       | ✗          | ✗                | ✗              | ✓ | ✓          | ✓         | ✓       | ✓    |

Table I provides representative results for our method in Section III. As indicated, further results are made available at [16]. Our dataset has {Easy, Medium, Difficult} FSMs, based on their complexity in terms of the number of states and the number of transitions (which is exponentially related to the number of states). The dataset showcases the capabilities of LLMs and their limitations (e.g., due to a lack of domain knowledge). First, we characterized the FSMs based on the number of

inputs (i/p), outputs (o/p), and states. Iterations relate to our method explained in Section III; each iteration involves running VCS and collecting key information, as highlighted in Fig. 6, which includes the percentage of coverage and 'Not Covered' transitions with the current test cases. This information is used to provide more context for the next prompt. Second, we report coverage with no feedback, where we prompted {GPT3.5, GPT4} to generate a testbench with full coverage.



Fig. 7: State transition diagram for FSM16.

Correspondingly,  $\{GPT3.5 + This\ Work, GPT4 + This\ Work\}$  represents coverage using feedback with our method. Varying the number of iterations is required for testbench generation across FSMs. This is expected as the iterative process depends on the number of states, transitions, and the type of connections. Due to repetitive responses after some iterations and certain transitions being very difficult for the LLMs to understand, our aim was to reach as close to 100% coverage as possible. As shown in Table I, with our feedback mechanism we were able to achieve that with both GPT3.5 and GPT4. Still, the number of iterations required to achieve that is less for GPT4 versus GPT3.5, which is expected considering the capabilities of both models. For example, for FSM16, without feedback, the coverage is 25% for both GPT3.5 and GPT4, whereas for our method it is 100% in both cases.

### C. Results: LLM-guided Automatic Bug Detection

So far, we did *not* provide natural-language design specifications for any of the FSMs. Next we investigate automatic comparison of the natural-language specification against the generated test cases and simulation outputs. Such capabilities are essential for bug detection. Consider the ‘buggy’ and incorrect transition in FSM16 in Fig. 7(b) that occurred due to human designer error.

As shown in Fig. 2, we provide the tool with the test patterns and the VCS simulation outputs for each clock cycle. We prompt GPT(3.5/4) to correspond/match these test results with the design specifications and to find mismatches, if any. The mismatches are highlighted as potential bugs in the FSM RTL. We compare three scenarios in Table I: (1) *State Regs (Registers)*: We provide all input patterns and state register values for each clock cycle; (2) *I/O pairs*: We provide only i?O pairs per clock cycle; (3) *Fuzzing*: We provide random input patterns. For the Scenarios (1, 2), we applied our framework to generate test patterns. Scenario (3) is a baseline non-AI approach akin to hardware fuzzing.

**Initial Findings:** LLMs can help raise warnings for mismatches in an advanced manner beyond traditional coverage.<sup>3</sup> However, when using large testbenches, it can become difficult for the model to follow up for longer sequences of input

<sup>3</sup>100% transition coverage does not guarantee bug detection. Consider ‘buggy’ transition in FSM16 in Fig. 7(b). A test case reaches the incorrect transition but goes no further and fails to find the bug. FSM output diverges only a few cycles later.

patterns. Looking at each case in more detail for  $\{GPT3.5, GPT4\}$  in the Table I, we find that in Scenario (1), providing the state registers value helps the LLM to correlate with the design specification. That is, the model only needs to check for the next state based on the input patterns, and any transitions that are not mentioned/covered would raise a warning of a potential bug. For this scenario, thus, we can detect incorrect transitions for a few of the *Easy* and *Medium* level FSMs with a reasonable number of input patterns, but not for the *Hard* FSMs. This is because of the inability of these models to comprehend a large input context, with GPT4 performing better compared to GPT3.5. Scenario (2) is challenging. We had to add some additional input patterns, thus leading to a bigger output trace to be fed to these models. We observe a similar trend to Scenario (1) across GPT3.5 and GPT4. Scenario (3), i.e., random test pattern generation, performs similarly to the other 2 scenarios, because of the mentioned limitation of these models.

**Analysis for Bug Detection:** We explored other approaches to improve the result, by adjusting the prompt as shown in Fig. 4. In one technique, instead of feeding all input-output patterns, we divide them into smaller sets. This is easier to handle but still allows the LLM to track the state reached at the end of each set. The next iteration of our method proceeds from there. Another technique applies to multi-bit inputs and outputs. Instead of considering all output bits at once, we tackle them individually. That is, we provided all input patterns for an output bit and repeated this for all outputs. The rationale is to focus LLMs on manageable sub-circuits (output cones) rather than “bombarding” them with all information. We provide the natural-language description with each query to LLM, such that the model does not lose the overall context. In Table I, the columns corresponding to  $\{GPT3.5 + This\ Work, GPT4 + This\ Work\}$  reports our final results with all these prompting techniques. With GPT3.5 and our method, we were able to detect the bug in all of *Easy* and *Medium* FSMs across all the 3 scenarios and a few *Hard* FSMs for Scenarios (1) and (3). GPT4 combined with our method can detect the bug in all the 3 level of FSMs and for all scenarios. This shows the effectiveness of our method to improve the performance of LLMs on this task. Fig. 8 shows the scaling of input patterns required to detect bugs for the *fuzzing* approach for all methods. Clearly, our method combined with GPT3.5 and GPT4 can detect bugs with fewer patterns and with a higher success rate.

### D. Detailed Case Studies using GPT4

1) *FSM22*: This FSM has one input, multiple outputs, and 6 states. For Scenarios (1) and (2), with less number of patterns, it was easy for the model to analyze and flag the mismatch between the generated output and the design specification. For Scenario (3), we employ new prompts similar to those shown in Fig. 4. The mismatch was detected with 64 patterns.

2) *FSM34*: has 2 inputs, 1 output, and 10 states. For Scenarios (1,2), we detected incorrect transitions by accessing state register and I/O values, respectively, for each clock cycle. For Scenario (3) using random patterns, GPT4 could not comprehend the input sequences. We modified the prompt similar to Fig. 4, i.e., we iteratively consider subsets of I/O



Fig. 8: # of input patterns for bug detection for 100 FSMs using *fuzzing* for all methods from Table I. Results are arranged as regions of FSM complexity. For missing points, respective methods could not detect the bug.

pairs. We begin with 100 random patterns and run the simulator to collect related I/O pairs, which are then fed to GPT4 in small chunks. GPT4 then detected a mismatch with  $\sim 89$  patterns.

3) *FSM42*: This is a 1-bit input, 1-bit output, and 19 state FSM. For Scenarios (1) and (2), 73 and 85 patterns, respectively were needed to detect the mismatch. For Scenario (3), feeding all I/O pairs to GPT4 did not catch the bug. Thus, we used the prompts from Fig. 4 and increased the random patterns to 200. A mismatch was detected with  $\sim 107$  patterns.

4) *FSM50*: is complex, with multi-bit outputs and 28 states. Our approach of feeding all the input-output pairs failed for all the cases. Instead, we collected 200 patterns for each case, where GPT4 still failed to comprehend the whole FSM. We employed both prompt variations in Fig. 4. For Scenario (1), with  $\sim 158$  test patterns and full access to state register values. For Scenarios (2,3), for this revised approach, GPT-4 detected the mismatch with  $\sim 193$  and  $\sim 181$  test patterns, respectively.

## V. CONCLUSIONS: KEY CHALLENGES AND INSIGHTS

Designs with a large number of states are challenging for LLMs to analyze all possible transitions. Eventually, such cases lead to a repetition of responses. Second, fine-tuning an LLM for this work's scope is challenging as testbenches depend on RTL codes. Testbenches should be accompanied by the corresponding RTL, and such combined datasets are not available. From the prior point, two further challenges

follow. Even if we obtain such datasets, labeling testbenches as ‘good’ versus ‘bad’ in terms of coverage is not easy, at least not without thoroughly running the EDA tools on all designs in the first place. Interdependency between regular RTL and testbenches may be difficult to comprehend for the LLMs, given the syntactic differences in these two types of HDL files.

To overcome the scalability challenge for the large numbers of states and transitions, we prompt LLMs with phrases like “reset after X number of transitions are covered”. The model can then start with a new context to generate the remaining test patterns without getting stuck at some state. Furthermore, without access to a golden testbench for few-shot learning, the feedback from the EDA tools is important as such feedback in the prompt better guide the model. That is, after achieving some coverage, prompts are modified with phrases like “Keep the previous test cases but now focus more on the transitions that are not covered yet”. Once we provide the underlying RTL, the LLM can follow the context. The interdependency between testbenches and RTL is better understood throughout the iterative process and evidenced through our case study on bug detection. Prompts play a crucial role in the output quality.

## APPENDIX

### A. Result II

Table II reports our results for the rest of the FSMs other than the one we have mentioned in the Table I.

## REFERENCES

- [1] J. Takahashi *et al.*, “Testing for high assurance system by fsm,” *IEICE TRANSACTIONS on Information and Systems*, vol. 86, no. 10, pp. 2114–2120, 2003.
- [2] OpenAI, “GPT-4,” Mar. 2023. [Online]. Available: <https://openai.com/research/gpt-4>
- [3] M. Liu *et al.*, “Verilogeval: Evaluating large language models for verilog code generation,” in *2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD)*. IEEE, 2023, pp. 1–8.
- [4] S. Thakur *et al.*, “Autochip: Automating hdl generation using llm feedback,” *arXiv preprint arXiv:2311.04887*, 2023.
- [5] Y. Lu *et al.*, “Rtlbm: An open-source benchmark for design rtl generation with large language model,” in *2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)*. IEEE, 2024, pp. 722–727.
- [6] S. Thakur *et al.*, “Verigen: A large language model for verilog code generation,” *ACM Transactions on Design Automation of Electronic Systems*, 2023.
- [7] R. Kande *et al.*, “Llm-assisted generation of hardware assertions,” *arXiv preprint arXiv:2306.14027*, 2023.
- [8] W. Fang *et al.*, “Assertllm: Generating and evaluating hardware verification assertions from design specifications via multi-llms,” *arXiv preprint arXiv:2402.00386*, 2024.
- [9] H. Wu *et al.*, “Chateda: A large language model powered autonomous agent for eda,” *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 2024.
- [10] M. Liu *et al.*, “Chipnemo: Domain-adapted llms for chip design,” *arXiv preprint arXiv:2311.00176*, 2023.
- [11] R. Zhong *et al.*, “Llm4eda: Emerging progress in large language models for electronic design automation,” *arXiv preprint arXiv:2401.12224*, 2023.
- [12] J. Blocklove *et al.*, “Chip-chat: Challenges and opportunities in conversational hardware design,” in *2023 ACM/IEEE 5th Workshop on Machine Learning for CAD (MLCAD)*. IEEE, 2023, pp. 1–6.
- [13] Y. Fu *et al.*, “Gpt4aigchip: Towards next-generation ai accelerator design automation via large language models,” in *2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD)*. IEEE, 2023, pp. 1–9.
- [14] K. Chang *et al.*, “Chipgpt: How far are we from natural language hardware design,” *arXiv preprint arXiv:2305.14019*, 2023.
- [15] “Problem sets - HDLBITS,” 2023. [Online]. Available: [https://hdlbits.01xz.net/wiki/Problem\\_sets](https://hdlbits.01xz.net/wiki/Problem_sets)
- [16] J. Bhandari. (2024) LLM-Aided-Testbench-Generation-for-FSM. [Online]. Available: <https://github.com/jitendra-bhandari/LLM-Aided-Testbench-Generation-for-FSM>

TABLE II: Remainder of the Result. See also Table I caption.

| Level  | FSMs | LLM-guided Testbench Generation |     |        |        |       |                  | LLM-guided Automated Bug Detection |        |       |            |           |                  |                |           |         |
|--------|------|---------------------------------|-----|--------|--------|-------|------------------|------------------------------------|--------|-------|------------|-----------|------------------|----------------|-----------|---------|
|        |      | Characteristics                 |     |        | GPT3.5 | GPT4  | GPT3.5+This Work | GPT4+This Work                     | GPT3.5 |       |            | GPT4      | GPT3.5+This Work | GPT4+This Work |           |         |
|        |      | i/o                             | o/p | states | % Cov  | % Cov | Iters            | % Cov                              | Iters  | % Cov | State Regs | I/O pairs | Fuzzing          | State Regs     | I/O pairs | Fuzzing |
| Easy   | 3    | 1                               | 2   | 2      | 50     | 50    | 2                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 4    | 2                               | 2   | 2      | 75     | 50    | 3                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 5    | 2                               | 2   | 2      | 50     | 75    | 3                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 6    | 2                               | 1   | 2      | 25     | 50    | 2                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 7    | 2                               | 2   | 2      | 50     | 75    | 2                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 8    | 2                               | 3   | 2      | 50     | 50    | 3                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 11   | 1                               | 1   | 3      | 42     | 50    | 3                | 100                                | 3      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 12   | 1                               | 1   | 3      | 50     | 50    | 2                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 13   | 3                               | 3   | 3      | 24     | 50    | 4                | 95                                 | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 14   | 1                               | 1   | 3      | 33     | 40    | 3                | 100                                | 3      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 21   | 1                               | 2   | 4      | 20     | 45    | 4                | 94                                 | 3      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 22   | 2                               | 2   | 4      | 40     | 50    | 5                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 23   | 2                               | 2   | 4      | 25     | 25    | 5                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 24   | 1                               | 1   | 4      | 32     | 40    | 4                | 97                                 | 3      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 37   | 1                               | 1   | 6      | 25     | 50    | 5                | 91                                 | 3      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 38   | 6                               | 2   | 6      | 50     | 50    | 4                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 39   | 3                               | 3   | 6      | 50     | 45    | 6                | 100                                | 2      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 40   | 1                               | 4   | 6      | 25     | 25    | 5                | 93                                 | 3      | 90    | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 41   | 1                               | 1   | 6      | 20     | 20    | 5                | 100                                | 4      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 44   | 2                               | 1   | 7      | 25     | 37    | 6                | 90                                 | 3      | 95    | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 45   | 1                               | 4   | 7      | 50     | 50    | 7                | 100                                | 4      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
| Medium | 49   | 1                               | 2   | 8      | 25     | 30    | 6                | 90                                 | 5      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 50   | 2                               | 2   | 8      | 20     | 35    | 6                | 95                                 | 4      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 51   | 1                               | 4   | 8      | 15     | 40    | 7                | 91                                 | 4      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 55   | 2                               | 1   | 9      | 25     | 38    | 7                | 93                                 | 5      | 100   | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 56   | 3                               | 3   | 9      | 18     | 25    | 6                | 90                                 | 4      | 95    | ✓          | ✓         | ✓                | ✓              | ✓         | ✓       |
|        | 57   | 1                               | 4   | 9      | 33     | 25    | 7                | 90                                 | 6      | 91    | ✓          | ✗         | ✓                | ✓              | ✓         | ✓       |
|        | 58   | 1                               | 1   | 9      | 12     | 18    | 5                | 94                                 | 5      | 95    | ✓          | ✗         | ✓                | ✓              | ✓         | ✓       |
|        | 63   | 5                               | 5   | 10     | 15     | 10    | 9                | 96                                 | 7      | 100   | ✓          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 64   | 1                               | 1   | 10     | 20     | 45    | 8                | 90                                 | 5      | 100   | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 65   | 1                               | 2   | 10     | 18     | 25    | 8                | 100                                | 5      | 93    | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 68   | 2                               | 1   | 11     | 10     | 15    | 9                | 100                                | 6      | 96    | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 69   | 3                               | 1   | 11     | 20     | 20    | 9                | 92                                 | 7      | 90    | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 70   | 1                               | 1   | 11     | 25     | 20    | 8                | 98                                 | 6      | 100   | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 73   | 2                               | 2   | 12     | 35     | 41    | 11               | 90                                 | 8      | 90    | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 74   | 3                               | 1   | 12     | 17     | 15    | 10               | 95                                 | 7      | 95    | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 76   | 4                               | 3   | 13     | 9      | 18    | 11               | 90                                 | 9      | 90    | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 77   | 3                               | 3   | 13     | 11     | 10    | 10               | 75                                 | 8      | 90    | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 79   | 1                               | 1   | 14     | 25     | 22    | 11               | 91                                 | 9      | 92    | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 80   | 1                               | 2   | 14     | 32     | 10    | 12               | 90                                 | 9      | 100   | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
| Hard   | 81   | 1                               | 1   | 15     | 13     | 16    | 14               | 100                                | 10     | 95    | ✗          | ✗         | ✗                | ✓              | ✓         | ✓       |
|        | 83   | 2                               | 2   | 17     | 11     | 17    | 16               | 90                                 | 12     | 96    | ✗          | ✗         | ✗                | ✓              | ✗         | ✓       |
|        | 84   | 1                               | 1   | 18     | 6      | 8     | 19               | 95                                 | 13     | 97    | ✗          | ✗         | ✗                | ✓              | ✗         | ✓       |
|        | 86   | 1                               | 2   | 19     | 8      | 5     | 20               | 92                                 | 13     | 91    | ✗          | ✗         | ✗                | ✗              | ✗         | ✓       |
|        | 88   | 1                               | 1   | 20     | 9      | 6     | 21               | 91                                 | 14     | 92    | ✗          | ✗         | ✗                | ✗              | ✗         | ✓       |
|        | 89   | 1                               | 1   | 21     | 5      | 9     | 21               | 93                                 | 12     | 95    | ✗          | ✗         | ✗                | ✗              | ✗         | ✓       |
|        | 91   | 3                               | 3   | 22     | 10     | 11    | 23               | 90                                 | 12     | 90    | ✗          | ✗         | ✗                | ✗              | ✗         | ✓       |
|        | 94   | 1                               | 1   | 24     | 4      | 13    | 24               | 96                                 | 13     | 100   | ✗          | ✗         | ✗                | ✗              | ✗         | ✓       |
|        | 97   | 1                               | 1   | 26     | 8      | 6     | 23               | 90                                 | 14     | 90    | ✗          | ✗         | ✗                | ✗              | ✗         | ✓       |
|        | 98   | 1                               | 1   | 26     | 7      | 3     | 22               | 90                                 | 14     | 95    | ✗          | ✗         | ✗                | ✗              | ✗         | ✓       |