

# **AI for Automatic HDL Code Generation from High-Level Descriptions**

Department Lippstadt 2

**Advance Hardware Engineering**

submitted by

**Md Ruhul Kuddus Saleh**

Electronic Engineering

Mat.Nr.: 2219956

md-ruhul-kuddus.saleh@stud.hshl.de

**18 December 2025**

**Supervisor:** Prof. Dr.-Ing. Ali Hayek

## Abstract

Development of a Hardware Description Language (HDL) is among the most time-consuming and expert-resource-intensive steps of FPGA and ASIC designs. With the growing complexity of digital systems and pressing demands of time-to-market, conventional manual HDL designs using the Register Transfer-Level (RTL) approach are no longer adequately scalable. However, with recent AI technology and advancements achieved through the use of large language models (LLMs) in the processing and translation of natural language descriptions, there is a new potential channel of automatic generation of HDL from a natural language description or from an algorithmic description.

The HDL design synthesis comparison between professionally developed HDL designs and designs generated with AI on a target Xilinx Artix-7 FPGA evaluates synthesis feasibility, resource usage, power performance, and possible implementation on silicon. Although the AI designs have correct functional intent and proper HDL syntax, important deficiencies are found in its support for design tools, power performance, overall system architecture understanding, and constraints on implementation. There are instances where synthesis fails or goes beyond device capabilities, making it unimplementable in hardware.

The outcome of the study clearly states that the current application of AI is more optimal in the role of an HDL code assistant, rather than being an independent hardware designer. Also, there is an imperative need for major breakthroughs in the field of device-aware training, power-efficient designs, and verification.

# Contents

|                                                           |    |
|-----------------------------------------------------------|----|
| Introduction . . . . .                                    | 4  |
| High-Level Descriptions for FPGA Design . . . . .         | 4  |
| IV. Traditional High-Level Synthesis Techniques . . . . . | 5  |
| V. AI Techniques for HDL Code Generation . . . . .        | 6  |
| VI. Experimental Analysis and Results . . . . .           | 7  |
| VII. Tools and Prototypes Already Available . . . . .     | 11 |
| VIII. Benefits and Challenges . . . . .                   | 11 |
| Conclusion . . . . .                                      | 12 |
| Appendix: Source Code Listings . . . . .                  | 13 |

# Introduction

Moore’s Law and Dennard scaling have traditionally been the pillars sustaining the semiconductor industry. However, lately, the rate of progress has gradually declined [1]. This has increased the focus on the concept of heterogeneous computing architecture, which lays the foundation using combined CPU architecture specific hardware such as GPUs, FPGAs, or ASIC [2]. This redistribution of workload on platforms aligned for the specific function helps focally maximize performance along with power savings [3].

FPGAs are considered the brightest component within the industry. This assists in the processing associated with specialized calculations [4]. However, FPGAs are complex in nature. They require knowhow in the realm of HDL programming, specifically Verilog or VHDL programming [10]. Manual processing within the handling associated with high-level algorithms in terms of generating the RTL model has always proven time-consuming and error-prone [6, 7]. Moreover, the High-Level Synthesis (HLS) tools available are often limited in handling specialized software-centric processing efficiently [8, 9].

This has sparked the idea of using AI, specifically Large Language Models (LLMs), in the processing [5]. This paper aims at exploring the feasibility associated with AI-synthesized HDL. They compare the AI-synthesized designs with the designs embedded by professionals. This paper aims at exploring the implementation of AI in the hardware description within the industry [10].

## High-Level Descriptions for FPGA Design

High-level descriptions form the bridge between the algorithmic intention and the hardware implementation in both conventional high-level synthesis and AI-based design flows. The different descriptions vary based on the level of abstraction, expressiveness, and ability to model hardware constructs like parallelism, time, and data transport. The description approach used has a major impact on design productivity and the quality of the produced RTL.

### A. Software-Like Languages

The traditional HLS flow relies on software-oriented programming languages like C, C++, OpenCL, and SystemC to program hardware functionality [1]. The mentioned programming languages are widely understandable and supported, although not optimal in their implementation on FPGA architectures. They are natural and suited for software design, which hides hardware parallelizations, making developers entirely dependent on implementation-specific directives to manage parallelization and pipelining [6]. Moreover, these software-oriented design languages lack sufficient control over memory, and there are no timing-related constructs that make cycle-accurate implementation and optimization difficult [8, 10].

### B. Domain Specific Languages

Domain-specific languages (DSLs) such as Chisel, Bluespec, MyHDL, and PyMTL have overcome these challenges through the use of hardware-oriented abstraction constructs [2]. The mentioned languages facilitate hardwired structural description and offer parallelism

as a primitive, hence offering a simpler description of a hardware design [4]. The modeling tools, such as MATLAB and Simulink, facilitate fast functional modeling, while efficient implementation, in most cases, demands optimization [9].

## C. Behavioral vs. Functional Descriptions

High-level descriptions can thus be categorized into Behavioral or Functional. Behavioral styles are good at explaining the behavior of algorithms, while having little architectural relevance. This results in data paths and control logic being abstracted by the synthesizer or the AI tools, which proves inefficient at times [5]. Functional styles, which include dataflow-based models, address the dependency architecture comprehensively, thus being relatively free from ambiguities in the synthesis process [3]. Hence, description style selection assumes importance in synthesizing hardware at the desired level, either using AI or the conventional approach at times [7].

# Traditional High-Level Synthesis Techniques

High-Level Synthesis (HLS) has also been recognized as an innovative approach for Electronic Design Automation (EDA) in bridging the increasing productivity gap in hardware design. Since HLS enables designers to model their hardware designs in high-level languages (HLLs), it automatically translates the description of an algorithm into an RTL description, thus offering faster design development time as it does not require RTL coding manually. However, the broader design space associated with high-level synthesis also brings in its own design difficulties.

## A. Core HLS Workflow

The translation of high-level code to hardware involves a streamlined set of steps for synthesis and optimization. Scheduling involves identifying the clock cycle during which an operation should take place, trading off between clock cycles, throughput, and timing constraints, keeping data dependencies in mind.

Resource allocation involves specifications related to hardware resource types and quantities, which could range from arithmetic units, memory blocks, to specifying which operations and variables bind to resource instances, also taking into account resource constraints for implementation. For exploiting parallelism, optimization techniques involving unrolling and pipelining are carried out, which enable concurrent execution, hence speeding up execution by executing a number of iterations concurrently for loops. High-level data storage is then assigned to physical memory resources, which range from register memory to Block RAM (BRAM), also requiring manual partitioning to satisfy bandwidth constraints. Finally, specifying control logic through finite state machines (FSMs) for controlling datapath execution and data transfer is required for designing digital systems by controlling data transfer between different stages through FSMs that implement control flow during system execution.

## B. Existing HLS Tools

The HLS environment encompasses a range of tools, including proprietary tools and academic tools. The proprietary tools include Xilinx Vivado HLS (Vitis HLS) and the Intel HLS Compiler, which are designed for specific FPGA architectures. Cadence Stratus HLS and Mentor Catapult HLS enable a wider design space exploration of FPGAs and ASICs. At the same time, open-source academic tools like LegUp and ROCCC enable a flexible environment for customizing the tools. While the environment provides more insight into the design process, the tools lack the timing closure and design-specific optimization of the proprietary tools.

## C. Limitations of Traditional HLS

However, the existing traditional HLS tools have various drawbacks that impede their autonomy. The HLS tools tend to parse software-centric code literally and lack the capability to infer the optimal hardware architecture without the involvement of the designer. Obtaining optimal performance requires the tedious insertion of directives by the designer, which makes the exploration of the design space longer and repetitive.

The existing HLS tools tend to behave inefficiently while processing irregular control flow or dynamically varying memory access patterns, which lead to suboptimal hardware consumption and resource usage. The longer synthesis and analysis time introduces hurdles in the optimization process by leading to delayed feedback on power consumption, performance, and area. The above hindrances open up avenues for exploring the usage of AI-based approaches that facilitate the efficient inference and optimization of architecture.

# V. AI Techniques for HDL Code Generation

The shortcomings of conventional High-Level Synthesis (HLS), specifically the need for human optimization and the expensiveness of design space exploration, have necessitated the involvement of Artificial Intelligence (AI) in the design flow process. More advanced AI methods, based on predictive machine learning and generative Large Language Models (LLM), seek to automate the optimization process and minimize human involvement during hardware design [5, 10].

## A. Machine Learning for Performance Prediction

These models use machine learning algorithms to approximate the Quality of Results (QoR) attributes such as area, latency, and power consumption without the need for complete synthesis [6]. These models are trained using data from past designs to provide feedback efficiently and aid the selection of pragmas for loop pipelining and memory partitioning [2, 8].

## B. Deep Learning

Deep learning techniques make it possible to better model complex hardware graphs. Graph Neural Networks (GNNs) analyze structural graphs like Abstract Syntax Trees

(ASTs) and Dataflow Graphs (DFGs), focusing on dependencies that are useful in understanding resource utilization and timing information in code [8].

Models based on the transformer neural network further enhance this analysis by finding long-term dependencies in code that are significant in determining hardware efficiency [5].

## C. Large Language Models

Large Language Models (LLMs), namely ChatGPT, CodeLlama, and StarCoder, bring generative capabilities to hardware EDA. They have the ability to transform natural-language descriptions into synthesizable HDL and assist with code completion and refactoring [5, 9, 11]. The limitations of these approaches lie in hallucination, lack of awareness regarding timing aspects, and ignorance of specific device constraints [10, 11].

# VI. Experimental Analysis and Results

For a practical estimate of the viability of using AI-based hardware descriptions, a comparative experimental test was done on professionally designed “golden reference” designs and HDL generated using state-of-the-art large language models. The test was done on a Xilinx Artix-7 FPGA technology using the Vivado 2025.1 design tools. The memory inference HDL was tested using ChatGPT-5.2, while the more complex control HDL was generated using Claude Sonnet 4.5.

## A. Experiment 1: Memory Inference Using ChatGPT-5.2

The initial experiment tested if it was possible for an AI model to deduce a correct dual-port BRAM and to implement conventional power management techniques, such as clock enables. The reference design used was that of a professional reference design taken from a Xilinx memory template.

Table 1: BRAM Resource Utilization and Workflow Outcome

| Metric           | Professional<br>(Xilinx<br>Template) | AI-Generated<br>(ChatGPT-<br>5.2) | Difference          |
|------------------|--------------------------------------|-----------------------------------|---------------------|
| Block RAM Tiles  | 1 (1.00%)                            | 1 (1.00%)                         | 0%                  |
| Bonded IOB       | 158                                  | 152                               | -3.8%               |
| Synthesis Status | Pass                                 | Fail(Syntax<br>Error)             | Workflow<br>Failure |



(a) Professional BRAM Utilization



(b) AI BRAM Utilization

Figure 1: Utilization reports for BRAM inference showing identical memory resource usage.

ChatGPT-5.2 was able to correctly infer the RAMB36E1 primitive. Unfortunately, there was an issue with the tool flow compatibility during the synthesis step. ChatGPT-5.2 was able to produce SystemVerilog language constructs while assigning the Verilog-2001 file extension. This resulted in the design being rejected by the Vivado compiler because of the improper syntax.



Figure 2: Vivado synthesis failure screenshot showing the syntax errors in the AI-generated HDL.

There was a more critical difference between the power analysis. Even though both designs used the same memory resources, the AI-designed architecture used nearly twice the on-chip power.

Table 2: Power Consumption Comparison for BRAM Design

| Power Component            | Professional    | AI-Generated    | Increase    |
|----------------------------|-----------------|-----------------|-------------|
| BRAM Power                 | 0.160 W         | 0.321 W         | +100% (2x)  |
| I/O Power                  | 30.726 W        | 61.414 W        | +100% (2x)  |
| <b>Total On-Chip Power</b> | <b>32.562 W</b> | <b>64.294 W</b> | <b>+97%</b> |



Figure 3: Comparison of on-chip power consumption; the AI design (b) exceeds junction temperature limits.

The professionally designed code incorporated clock-enabling signals (ena/enb) that disabled the BRAM during idling cycles. The generated code lacked the enabling signals, which caused the memory and I/O to run continuously. The effect caused the dynamic power consumption to increase by almost twice the original amount, thus lacking power consumption design reasoning.

## B. Experiment 2: Generation of Control Logic Using Claude Sonnet 4.5

The second experiment involved the generation of a GPIO controller with a masked write capability. This was an exercise in understanding system-level interfaces as well as realizing efficient hardware-level parallelism. The original design was based on an OpenTitan project.

Table 3: Logic Resource and Physical Constraint Utilization

| Metric            | Professional<br>(OpenTitan) | AI-Generated<br>(Claude 4.5) | Status                         |
|-------------------|-----------------------------|------------------------------|--------------------------------|
| Slice LUTs        | 68                          | 193                          | 2.8× Increase                  |
| Slice Registers   | 160                         | 192                          | 1.2× Increase                  |
| Bonded IOB        | 203                         | 521                          | <b>FAILED</b><br><b>(148%)</b> |
| Max IOB Available | 210                         | 210                          | —                              |
| Total Leaf Cells  | 434                         | 908                          | <b>2.1x Area</b>               |



(a) Professional GPIO Utilization



(b) AI GPIO Utilization

Figure 4: Utilization reports highlighting the massive increase in Bonded IOB and LUTs in the AI-generated design.

The hardware unimplementability of the design created by the AI involved many I/O pins. In contrast to the professional design approach, the professional design connected the configuration registers using an internal bus. The design created by the AI made many internal signals available through the top-level port. This led to a hardware need for 521 I/O pins, which was above the hardware capability by 148%.

Moreover, the AI implementation utilized software-style "for loops" for the masked writes. Although correct in functionality, the loops were unrolled during the synthesis process, leading to an abundance of combinational logic. In contrast, the professional implementation utilized a single-cycle approach with bitwise masking.

Table 4: Power and Thermal Analysis Results

| Metric              | Professional<br>(OpenTitan) | AI-<br>Generated<br>Code | Difference /<br>Status |
|---------------------|-----------------------------|--------------------------|------------------------|
| Logic Power         | 0.196 W                     | 0.595 W                  | +203% (3x)             |
| I/O Power           | 13.847 W                    | 30.232 W                 | +118%                  |
| Total On-Chip Power | 15.133 W                    | 32.829 W                 | +117%                  |
| Junction Temp       | 94.0 °C<br>(Safe)           | 125.0 °C<br>(Failed)     | DESTRUCTIVE            |



(a) Professional Logic Power



(b) AI Logic Power

Figure 5: Thermal analysis showing logic power and junction temperature deltas.

The generated design was capable of attaining the highest permissible junction temperature of 125°C, which would activate the thermal shutdown in the actual design, thus making it incapable of being manufactured.

## C. Summary of Key Observations

The results obtained from the experiments have shown that there are three basic challenges associated with the use of the AI-HDL. First, the AI model has a lack of system architecture awareness, causing unmanageable I/O. Secondly, the use of other power-aware design methodologies, such as clock gating, contributes to an increase in the net dynamic power. Finally, the lack of software design methodologies, which includes clock gating, causes a major imbalance in logic usage and resultant thermal characteristics.

## VII. Tools and Prototypes Available

AI-driven hardware design is beginning to move from research into early commercial deployment, promising to reduce manual effort in traditional EDA design flows and increase efficiency [2, 5].

Graph Neural Networks used by academic prototypes, such as HARP, have already predicted optimization impacts and accelerated design space exploration [6]. Chip-Chat and Autochip provide iterative improvements to HDL code quality using LLM-based generators [5, 10]. Hybrid flows use neural cost models to enable near-real-time timing and area estimation without full synthesis [8].

**Commercial Tools:** AI is being introduced by vendors for managing difficult SoC designs. Synopsys DSO.ai uses reinforcement learning to automatically perform synthesis and floorplanning. Cadence Cerebrus is an AI-driven parameter tuner that seeks the optimal PPA; Siemens EDA uses AI-assisted verification for speed in bug detection. These commercial tools enhance, instead of replacing, human designers [5].

**Open-source Initiatives:** These include datasets like VHDL-Eval and VHDL-Xform, providing support for HDL-specific LLMs and neural cost models, especially for under-represented languages [9, 4]. Community-driven tools focus on automated refactoring of AI-generated HDL with the goal of improving readability and synthesizability [10].

## VIII. Benefits and Challenges

### Benefits

- **Development Speed:** AI-driven flows speed up development with automatic pragma insertion and parameter tuning [8].
- **Accessibility:** Rapid prototyping is thus possible even for non-experts [1, 10].
- **Consistency:** AI-assisted refactoring and test generation may be used under expert supervision to boost consistency and overall Quality of Results (QoR) [11].

## Challenges

- **Synthesizability:** AI-generated HDL may include non-synthesizable constructs and thus require human review [10].
- **Data Scarcity:** The training data of limited size, especially for VHDL, restricts the generalization capability of the model [9].
- **Physical Constraints:** Most AI models do not consider timing, power, and physical constraints, resulting in syntactically correct but physically infeasible designs [11].
- **Reliability:** The integration of probabilistic AI into deterministic EDA flows raises reliability and verification issues, highlighting the need for human oversight [5, 7].

## Conclusion

The project demonstrates the beginnings of the useful role of AI models, such as ChatGPT or Claude, as a “code assistant” in the design of hardware, inferring primitives effectively and accelerating the prototyping stage. In their present form, they demonstrate the lack of system-level architectural insight, leading to the generation of software-ish logic that suffers from inefficiency and, thus, critical physical hazards, including overly heavy I/O and doubled power.

The neglect of important power management strategies, including clock gating, reiterates the incompleteness of these tools vis-à-vis manufacturability and thermal robustness. Nonetheless, the use of AI-flows, backed by academic efforts, industrial tools, and open-source projects, allows the amalgamation of AI training data that includes a device perspective, along with online EDA feedback, to fill the gap between the syntax of the model and the hardware that needs to be built.

## Appendix: Source Code Listings

This appendix contains the Verilog and SystemVerilog source code used during the experimental analysis. These listings include both the professional reference designs and the AI-generated implementations for the BRAM and GPIO benchmarks.

### A. Professional BRAM Reference (Xilinx Template)

```
1 // Professional Source: Xilinx Vivado Language Template (Fixed)
2 // Module: True Dual Port RAM with Dual Clocks
3 // Standardized values: Width=32, Depth=1024 for benchmarking
4
5 module xilinx_true_dual_port_ram #(
6     parameter RAM_WIDTH = 32,                                // Fixed: 32-bit
7     data
8     parameter RAM_DEPTH = 1024,                             // Fixed: 1024
9     entries
10    parameter RAM_PERFORMANCE = "HIGH_PERFORMANCE", // Select "
11        HIGH_PERFORMANCE" or "LOW_LATENCY"
12    parameter INIT_FILE = ""                            // Leave blank
13 );
14
15     input [clogb2(RAM_DEPTH-1)-1:0] addra,      // Port A address bus
16     input [clogb2(RAM_DEPTH-1)-1:0] addrb,      // Port B address bus
17     input [RAM_WIDTH-1:0] dina,                  // Port A RAM input data
18     input [RAM_WIDTH-1:0] dinb,                  // Port B RAM input data
19     input clka,                                 // Port A clock
20     input clkb,                                 // Port B clock
21     input wea,                                  // Port A write enable
22     input web,                                  // Port B write enable
23     input ena,                                  // Port A RAM Enable
24     input enb,                                  // Port B RAM Enable
25     input rsta,                                 // Port A output reset
26     input rstb,                                 // Port B output reset
27     input regcea,                               // Port A output register
28     enable
29     input regceb,                               // Port B output register
30     enable
31     output [RAM_WIDTH-1:0] douta,                // Port A RAM output data
32     output [RAM_WIDTH-1:0] doutb                 // Port B RAM output data
33 );
34
35
36 // Scaling function for address width
37 function integer clogb2;
38     input integer depth;
39     for (clogb2=0; depth>0; clogb2=clogb2+1)
40         depth = depth >> 1;
41     endfunction
42
43 // 2D Array for RAM storage
44 reg [RAM_WIDTH-1:0] BRAM [RAM_DEPTH-1:0];
45 reg [RAM_WIDTH-1:0] ram_data_a = {RAM_WIDTH{1'b0}};
46 reg [RAM_WIDTH-1:0] ram_data_b = {RAM_WIDTH{1'b0}};
47
48 // Initialize memory to zero
49 generate
50     if (INIT_FILE != "") begin: use_init_file
51         initial
```

```

45          $readmemh(INIT_FILE, BRAM, 0, RAM_DEPTH-1);
46      end else begin: init_bram_to_zero
47          integer ram_index;
48          initial
49              for (ram_index = 0; ram_index < RAM_DEPTH; ram_index =
50                  ram_index + 1)
51                  BRAM[ram_index] = {RAM_WIDTH{1'b0}};
52      end
53  endgenerate
54
54 // PORT A OPERATION
55 always @(posedge clka) begin
56     if (ena) begin
57         if (wea)
58             BRAM[addr_a] <= dina;
59         ram_data_a <= BRAM[addr_a];
60     end
61 end
62
63 // PORT B OPERATION
64 always @(posedge clk_b) begin
65     if (enb) begin
66         if (web)
67             BRAM[addr_b] <= dinb;
68         ram_data_b <= BRAM[addr_b];
69     end
70 end
71
72 // Output Register Stage
73 generate
74     if (RAM_PERFORMANCE == "LOW_LATENCY") begin: no_output_register
75         assign douta = ram_data_a;
76         assign doutb = ram_data_b;
77     end else begin: output_register
78         reg [RAM_WIDTH-1:0] douta_reg = {RAM_WIDTH{1'b0}};
79         reg [RAM_WIDTH-1:0] doutb_reg = {RAM_WIDTH{1'b0}};
80
81         always @(posedge clka) begin
82             if (rst_a)
83                 douta_reg <= {RAM_WIDTH{1'b0}};
84             else if (regcea)
85                 douta_reg <= ram_data_a;
86         end
87
88         always @(posedge clk_b) begin
89             if (rst_b)
90                 doutb_reg <= {RAM_WIDTH{1'b0}};
91             else if (regceb)
92                 doutb_reg <= ram_data_b;
93         end
94
95         assign douta = douta_reg;
96         assign doutb = doutb_reg;
97     end
98 endgenerate
99
100 endmodule

```

Listing 1: Xilinx True Dual Port RAM Template

## B. AI-Generated BRAM (ChatGPT-5.2)

```

1 module true_dual_port_ram #(
2     parameter int RAM_WIDTH = 32,
3     parameter int RAM_DEPTH = 1024,
4     parameter string RAM_PERFORMANCE = "LOW_LATENCY", // or "
5     parameter string INIT_FILE = "" // optional
6     memory init file
7 )(
8     // Port A
9     input logic clka,
10    input logic wea,
11    input logic [$clog2(RAM_DEPTH)-1:0] addra,
12    input logic [RAM_WIDTH-1:0] dina,
13    output logic [RAM_WIDTH-1:0] douta,
14
15    // Port B
16    input logic clkb,
17    input logic web,
18    input logic [$clog2(RAM_DEPTH)-1:0] addrb,
19    input logic [RAM_WIDTH-1:0] dinb,
20    output logic [RAM_WIDTH-1:0] doutb,
21
22    // Reset (used only for output registers)
23    input logic rst
24 );
25
26 localparam int ADDR_WIDTH = $clog2(RAM_DEPTH);
27
28 // Memory array
29 logic [RAM_WIDTH-1:0] ram [0:RAM_DEPTH-1];
30
31 // Internal read data (pre-output register)
32 logic [RAM_WIDTH-1:0] douta_int;
33 logic [RAM_WIDTH-1:0] doutb_int;
34
35 // -----
36 // Optional memory initialization
37 // -----
38 generate
39     if (INIT_FILE != "") begin : gen_init_file
40         initial $readmemh(INIT_FILE, ram);
41     end else begin : gen_init_zero
42         integer i;
43         initial begin
44             for (i = 0; i < RAM_DEPTH; i = i + 1)
45                 ram[i] = '0;
46         end
47     end
48 endgenerate
49
50 // -----
51 // Port A - Read-First behavior
52 // -----
53 always_ff @(posedge clka) begin
54     // Read old data first
55     douta_int <= ram[addra];

```

```

56     // Write after read
57     if (wea) begin
58         ram[addr_a] <= dina;
59     end
60 end
61
62 // -----
63 // Port B - Read-First behavior
64 // -----
65 always_ff @(posedge clk_b) begin
66     // Read old data first
67     doutb_int <= ram[addr_b];
68
69     // Write after read
70     if (web) begin
71         ram[addr_b] <= dinb;
72     end
73 end
74
75 // -----
76 // Output register stage (performance selection)
77 // -----
78 generate
79     if (RAM_PERFORMANCE == "HIGH_PERFORMANCE") begin :
gen_high_perf
80         // Extra registered output stage (2-cycle latency)
81         always_ff @(posedge clka) begin
82             if (rst)
83                 douta <= '0;
84             else
85                 douta <= douta_int;
86         end
87
88         always_ff @(posedge clk_b) begin
89             if (rst)
90                 doutb <= '0;
91             else
92                 doutb <= doutb_int;
93         end
94     end else begin : gen_low_latency
95         // Direct output (1-cycle latency)
96         always_comb begin
97             douta = douta_int;
98             doutb = doutb_int;
99         end
100    end
101 endgenerate
102
103 endmodule

```

Listing 2: ChatGPT-Generated Dual Port RAM

## C. Professional GPIO Reference (OpenTitan)

```
1 // PROFESSIONAL BENCHMARK: OpenTitan GPIO (Cleaned for Synthesis)
2 // Source: Derived from lowRISC OpenTitan 'gpio.sv'
3
4 // 1. PACKAGE DEFINITION
5 package gpio_pkg;
6   parameter int NumIOs = 32;
7 endpackage
8
9 // 2. MAIN MODULE
10 module opentitan_gpio
11   import gpio_pkg::*;
12 (
13   input  logic          clk_i ,
14   input  logic          rst_ni ,
15
16   // Register Interface (Simplified for Benchmarking)
17   input  logic          reg_we ,
18   input  logic [31:0]    reg_addr ,
19   input  logic [31:0]    reg_wdata ,
20   output logic [31:0]   reg_rdata ,
21
22   // GPIO Ports
23   input  logic [NumIOs-1:0] cio_gpio_i ,
24   output logic [NumIOs-1:0] cio_gpio_o ,
25   output logic [NumIOs-1:0] cio_gpio_en_o ,
26
27   // Interrupt Output
28   output logic [NumIOs-1:0] intr_gpio_o
29 );
30
31   // Registers
32   logic [31:0] data_in_q;
33   logic [31:0] direct_out_q;
34
35   // Input Synchronization (Standard Double Flop for Safety)
36   logic [31:0] cio_gpio_sync_1;
37   logic [31:0] cio_gpio_sync_2;
38
39   always_ff @(posedge clk_i or negedge rst_ni) begin
40     if (!rst_ni) begin
41       cio_gpio_sync_1 <= '0;
42       cio_gpio_sync_2 <= '0;
43     end else begin
44       cio_gpio_sync_1 <= cio_gpio_i;
45       cio_gpio_sync_2 <= cio_gpio_sync_1;
46     end
47   end
48
49   // Edge Detection Logic
50   logic [31:0] event_rise;
51   logic [31:0] event_fall;
52
53   always_ff @(posedge clk_i) begin
54     data_in_q <= cio_gpio_sync_2;
55   end
56
57   assign event_rise = cio_gpio_sync_2 & ~data_in_q;
```

```

58 assign event_fall = ~cio_gpio_sync_2 & data_in_q;
59
60 // -----
61 // THE CRITICAL TEST: MASKED WRITES
62 // Professional code uses efficient bitwise logic in one cycle.
63 // -----
64 always_ff @(posedge clk_i or negedge rst_ni) begin
65   if (!rst_ni) begin
66     direct_out_q <= '0;
67   end else if (reg_we) begin
68     case (reg_addr[7:0])
69       8'h00: direct_out_q <= reg_wdata; // Direct Write
70
71       // Masked Lower: Update lower 16 bits based on upper 16 mask
72       8'h04: direct_out_q[15:0] <= (direct_out_q[15:0] & ~reg_wdata
73         [31:16]) | (reg_wdata[15:0] & reg_wdata[31:16]);
74
75       // Masked Upper: Update upper 16 bits based on upper 16 mask
76       8'h08: direct_out_q[31:16] <= (direct_out_q[31:16] & ~reg_wdata
77         [31:16]) | (reg_wdata[15:0] & reg_wdata[31:16]);
78     endcase
79   end
80 end
81
82 assign cio_gpio_o = direct_out_q;
83 assign cio_gpio_en_o = {32{1'b1}}; // All outputs enabled for this
84   test
85
86 // Interrupt Logic (Simplified for synthesis comparison)
87 // Hardcoded enables to force synthesis to keep the logic
88 logic [31:0] intr_ctrl_en_rising = 32'hFFFFFF;
89 logic [31:0] intr_ctrl_en_falling = 32'hFFFFFF;
90
91 always_ff @(posedge clk_i or negedge rst_ni) begin
92   if (!rst_ni)
93     intr_gpio_o <= '0;
94   else
95     intr_gpio_o <= (event_rise & intr_ctrl_en_rising) |
96                   (event_fall & intr_ctrl_en_falling);
97 end
98
99 endmodule

```

Listing 3: OpenTitan GPIO Benchmark

## D. AI-Generated GPIO (Claude Sonnet 4.5)

```

1 module gpio #( parameter int unsigned GpioWidth = 32 ) (
2     input logic clk_i, rst_ni,
3     input logic [GpioWidth-1:0] cio_gpio_i,
4     output log// Professional GPIO Module with Interrupt Support
5 // Copyright (c) 2024. All rights reserved.
6 // Designed for 32-bit General Purpose Input/Output control
7
8 module gpio #(
9     parameter int unsigned GpioWidth = 32
10 ) (
11     input logic clk_i,
12     input logic rst_ni,
13
14     // Pad interface
15     input logic [GpioWidth-1:0] cio_gpio_i,
16     output logic [GpioWidth-1:0] cio_gpio_o,
17     output logic [GpioWidth-1:0] cio_gpio_en_o,
18
19     // Register interface - Direct Write
20     input logic direct_out_we_i,
21     input logic [GpioWidth-1:0] direct_out_data_i,
22     input logic direct_oe_we_i,
23     input logic [GpioWidth-1:0] direct_oe_data_i,
24
25     // Register interface - Masked Write Upper [31:16]
26     input logic masked_out_upper_we_i,
27     input logic [15:0] masked_out_upper_data_i,
28     input logic [15:0] masked_out_upper_mask_i,
29     input logic masked_oe_upper_we_i,
30     input logic [15:0] masked_oe_upper_data_i,
31     input logic [15:0] masked_oe_upper_mask_i,
32
33     // Register interface - Masked Write Lower [15:0]
34     input logic masked_out_lower_we_i,
35     input logic [15:0] masked_out_lower_data_i,
36     input logic [15:0] masked_out_lower_mask_i,
37     input logic masked_oe_lower_we_i,
38     input logic [15:0] masked_oe_lower_data_i,
39     input logic [15:0] masked_oe_lower_mask_i,
40
41     // Interrupt configuration
42     input logic [GpioWidth-1:0] intr_ctrl_en_rising_i,
43     input logic [GpioWidth-1:0] intr_ctrl_en_falling_i,
44     input logic [GpioWidth-1:0] intr_ctrl_en_lvhigh_i,
45     input logic [GpioWidth-1:0] intr_ctrl_en_lvlow_i,
46
47     // Interrupt status (write-1-to-clear)
48     input logic intr_state_we_i,
49     input logic [GpioWidth-1:0] intr_state_data_i,
50     output logic [GpioWidth-1:0] intr_state_o,
51
52     // Interrupt output
53     output logic [GpioWidth-1:0] intr_o
54 );
55
56     // =====

```

```

57 // Internal Signals
58 //
59 //=====
60 logic [GpioWidth-1:0] gpio_out_q, gpio_out_d;
61 logic [GpioWidth-1:0] gpio_oe_q, gpio_oe_d;
62 logic [GpioWidth-1:0] gpio_in_sync1_q, gpio_in_sync2_q;
63 logic [GpioWidth-1:0] gpio_in_prev_q;
64 logic [GpioWidth-1:0] intr_state_q, intr_state_d;
65 //
66 // Edge detection signals
67 logic [GpioWidth-1:0] rising_edge, falling_edge;
68 //
69 // Input Synchronization (Double-Flop for Metastability Protection)
70 //
71 //=====
72 always_ff @(posedge clk_i or negedge rst_ni) begin
73   if (!rst_ni) begin
74     gpio_in_sync1_q <= '0;
75     gpio_in_sync2_q <= '0;
76     gpio_in_prev_q <= '0;
77   end else begin
78     gpio_in_sync1_q <= cio_gpio_i;
79     gpio_in_sync2_q <= gpio_in_sync1_q;
80     gpio_in_prev_q <= gpio_in_sync2_q;
81   end
82 end
83 //
84 // Edge Detection Logic
85 //
86 assign rising_edge = gpio_in_sync2_q & ~gpio_in_prev_q;
87 assign falling_edge = ~gpio_in_sync2_q & gpio_in_prev_q;
88 //
89 //=====
90 // GPIO Output Data Register Logic
91 //
92 //=====
93 always_comb begin
94   gpio_out_d = gpio_out_q;
95   // Direct write has priority
96   if (direct_out_we_i) begin
97     gpio_out_d = direct_out_data_i;
98   end else begin
99     // Masked write upper [31:16]

```

```

100    if (masked_out_upper_we_i) begin
101        for (int i = 0; i < 16; i++) begin
102            if (masked_out_upper_mask_i[i]) begin
103                gpio_out_d[16+i] = masked_out_upper_data_i[i];
104            end
105        end
106    end
107
108    // Masked write lower [15:0]
109    if (masked_out_lower_we_i) begin
110        for (int i = 0; i < 16; i++) begin
111            if (masked_out_lower_mask_i[i]) begin
112                gpio_out_d[i] = masked_out_lower_data_i[i];
113            end
114        end
115    end
116
117 end
118
119 always_ff @(posedge clk_i or negedge rst_ni) begin
120     if (!rst_ni) begin
121         gpio_out_q <= '0;
122     end else begin
123         gpio_out_q <= gpio_out_d;
124     end
125 end
126
127 assign cio_gpio_o = gpio_out_q;
128
129 //=====
130 // GPIO Output Enable Register Logic
131 //=====
132
133 always_comb begin
134     gpio_oe_d = gpio_oe_q;
135
136     // Direct write has priority
137     if (direct_oe_we_i) begin
138         gpio_oe_d = direct_oe_data_i;
139     end else begin
140         // Masked write upper [31:16]
141         if (masked_oe_upper_we_i) begin
142             for (int i = 0; i < 16; i++) begin
143                 if (masked_oe_upper_mask_i[i]) begin
144                     gpio_oe_d[16+i] = masked_oe_upper_data_i[i];
145                 end
146             end
147         end
148
149         // Masked write lower [15:0]
150         if (masked_oe_lower_we_i) begin
151             for (int i = 0; i < 16; i++) begin
152                 if (masked_oe_lower_mask_i[i]) begin
153                     gpio_oe_d[i] = masked_oe_lower_data_i[i];
154                 end
155             end
156         end
157     end
158 end

```

```

154         end
155     end
156   end
157 end

158
159 always_ff @(posedge clk_i or negedge rst_ni) begin
160   if (!rst_ni) begin
161     gpio_oe_q <= '0;
162   end else begin
163     gpio_oe_q <= gpio_oe_d;
164   end
165 end

166
167 assign cio_gpio_en_o = gpio_oe_q;
168
169 /**
170 =====
171 // Interrupt Detection Logic
172 /**
173 =====

174 logic [GpioWidth-1:0] intr_detect;
175
176 always_comb begin
177   for (int i = 0; i < GpioWidth; i++) begin
178     intr_detect[i] = (intr_ctrl_en_rising_i[i] & rising_edge[i]) |
179                      (intr_ctrl_en_falling_i[i] & falling_edge[i]) |
180                      (intr_ctrl_en_lvhigh_i[i] & gpio_in_sync2_q[i])
181     |
182                      (intr_ctrl_en_lvlow_i[i] & ~gpio_in_sync2_q[i]
183    ]);
184   end
185 end

186 /**
187 =====

188 // Interrupt State Register (Write-1-to-Clear)
189 /**
190 =====

191
192 always_comb begin
193   intr_state_d = intr_state_q;
194
195   // Set interrupt bits when detected
196   intr_state_d = intr_state_q | intr_detect;
197
198   // Clear interrupt bits on write-1-to-clear
199   if (intr_state_we_i) begin
200     intr_state_d = intr_state_d & ~intr_state_data_i;
201   end
202 end

203
204 always_ff @(posedge clk_i or negedge rst_ni) begin
205   if (!rst_ni) begin
206     intr_state_q <= '0;
207   end else begin

```

```

202     intr_state_q <= intr_state_d;
203   end
204 end
205
206 assign intr_state_o = intr_state_q;
207 assign intr_o = intr_state_q;
208
209 // =====
210 // Assertions for Verification
211 //
212 `ifndef SYNTHESIS
213 // Check that direct and masked writes don't occur simultaneously
214 always_ff @(posedge clk_i) begin
215   if (rst_ni) begin
216     assert (!(direct_out_we_i && (masked_out_upper_we_i ||
217       masked_out_lower_we_i)))
218     else $error("Direct and masked GPIO output writes should not
219       occur simultaneously");
220     assert (!(direct_oe_we_i && (masked_oe_upper_we_i ||
221       masked_oe_lower_we_i)))
222     else $error("Direct and masked GPIO OE writes should not occur
223       simultaneously");
224   end
225 end
226 `endif
227
228 endmoduleic [GpioWidth-1:0] cio_gpio_o, cio_gpio_en_o,
229 // ... [Additional Interface Ports] ...
230 );
231 // Software-style loop-based masking (Causes Logic Inefficiency)
232 always_comb begin
233   gpio_out_d = gpio_out_q;
234   if (masked_out_lower_we_i) begin
235     for (int i = 0; i < 16; i++) begin
236       if (masked_out_lower_mask_i[i]) begin
237         gpio_out_d[i] = masked_out_lower_data_i[i];
238       end
239     end
240   end
241 end
242
243 endmodule

```

Listing 4: Claude-Generated GPIO with Loop-Based Masking

# Bibliography

- [1] W. Meeus, K. Van Beeck, T. Goedemé, J. Meel, and D. Stroobandt, "An overview of today's high-level synthesis tools," *Design Automation for Embedded Systems*, vol. 16, no. 3, pp. 31–51, Aug. 2012.
- [2] M. W. Numan, B. J. Phillips, G. S. PudDY, and K. Falkner, "Towards Automatic High-Level Code Deployment on Reconfigurable Platforms: A Survey of High-Level Synthesis Tools and Toolchains," *IEEE Access*, vol. 8, pp. 174692-174722, 2020.
- [3] R. Nane et al., "A Survey and Evaluation of FPGA High-Level Synthesis Tools," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 35, no. 10, pp. 1591-1604, Oct. 2016.
- [4] J. Marjanovic, "LOW VS HIGH LEVEL PROGRAMMING FOR FPGA," in *Proc. IBIC2018*, Shanghai, China, 2018, pp. 1-7.
- [5] S. Alsaqer, S. Alajmi, I. Ahmad, and M. Alfailakawi, "The potential of LLMs in hardware design," *Journal of Engineering Research*, vol. 13, no. 1, pp. 2392-2404, 2025.
- [6] Z. Liu, Y. Dou, J. Jiang, and J. Xu, "Automatic Code Generation of Convolutional Neural Networks in FPGA Implementation," in *Proc. 2016 IEEE International Conference on Parallel and Distributed Processing with Applications*, Changsha, China, 2016, pp. 1-8.
- [7] M. Dossis and G. Dimitriou, "Are HLS Tools Healthy? The C-Cubed Project," *Engineering, Technology & Applied Science Research*, vol. 5, no. 2, pp. 790-794, 2015.
- [8] Y. Bai et al., "Learning to Compare Hardware Designs for High-Level Synthesis," in *Proc. 2024 ACM/IEEE International Symposium on Machine Learning for CAD (MLCAD '24)*, Salt Lake City, UT, USA, Sep. 2024, pp. 1-7.
- [9] P. Vijayaraghavan et al., "Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization," in *Proc. 2024 ACM/IEEE International Symposium on Machine Learning for CAD (MLCAD '24)*, Salt Lake City, UT, USA, Sep. 2024, pp. 1-10.
- [10] R. Al Amin, M. G. R. Lincoln, and R. Obermaisser, "Towards LLM-Assisted HDL Generation and Verification," in *Proc. 2025 14th Mediterranean Conference on Embedded Computing (MECO)*, Budva, Montenegro, Jun. 2025, pp. 1-4.
- [11] S. Thakur et al., "Autochip: Automating hdl generation using llm feedback," *arXiv preprint arXiv:2311.04887*, 2023.