



**EEDG/CE 6370**  
**Design and Analysis of Reconfigurable Systems**  
**Homework 6**  
**High-Level Synthesis**

## 1. Laboratory Objectives

- Learn how to use a commercial HLS tool.
- Perform the 3 main HLS steps using the commercial HLS tool:
  - Resource allocation
  - Resource Scheduling
  - Binding
- Verify that the design works as expected.
- Learn how to use the automatic testbench generator.
- Create input stimuli files.
- Simulate using the cycle-accurate model generator.
- Perform RTL simulation

## 2. Summary

- In this lab you will learn to use a commercial HLS tool synthesizing a design from C into RTL and verifying it.

## 3. Tool Requirement

- NEC CyberWorkbench

Connect to your UTD NoMachine account and edit your `~/.bashrc` file in your home directory adding the following:



```
# NEC CWB
export CYBER_PATH=/proj/cad/cwb-6.1
export CYBER_ADMIN_PATH=${CYBER_PATH}/linux_x86_64/admin
export CYBER_SYSTEMC_HOME=${CYBER_PATH}/osci
export CYBER_LIB=${CYBER_PATH}/lib
export LD_LIBRARY_PATH=${CYBER_PATH}/lib:${LD_LIBRARY_PATH}
export CYLMD_LICENSE_FILE=27010@legolas1.utdallas.edu
export PATH=$PATH:${CYBER_PATH}/bin
export PATH=$PATH:${CYBER_ADMIN_PATH}/bin
# Siemens Modelsim (RTL simulator)
source /proj/cad/startup/profile.mentor_2022
```

Test that it works by entering the following command at the terminal window:

```
% cwb &
```

This should launch CyberWorkBench (CWB) OR type:

```
%which cwb
```

**Note:** We do only have 8 CWB GUI licenses. When you start CWB the GUI is launched and a license is being used. Once you are done, please close the GUI and make sure that the GUI process is not running in the background. If it is → kill it as follows:

```
% ps -aux | grep <username>
% kill 9 <PID GUI>
```

If no GUI license is available you can execute the synthesis process by calling the individual tools from them command prompt as shown in the appendix.

## 4. Average of 8 Project

### Creating a new empty project

1. Open CWB (either clicking on the CWB icon on the desktop or on linux %cwb &) - /proj/cad/cwb-6.1/bin/cwb
2. Create a new project and workspace called ave8 (File→New)
3. Set the project parameters as shown in the figure:



The main items that need to be specified are:

- Clock period ( $2000 \times 1/100\text{ns} = 20\text{ns} \Leftrightarrow 50\text{ MHz}$ )
- Target device Cyclone V 5CSEMA5F31C6 device
- Logic Synthesis tool → Quartus II
- Select the Cyclone V libraries which contain the area and delay information of the basic operations
  - BLIB (Basic Library): contains the delay and area information of basic logic gates (AND, OR,etc..) and muxes of different sizes (cycloneV.BLIB)
  - FLIB (Functional Unit Library): Contains the area and delay information of Functional units of different sizes (e.g. 2-4-6-8-12 bit adder) (cycloneV.FLIB)

4. Click on Apply to generate the empty project

#### Editing or Inserting a ANSI-C description for synthesis

1. Right click on source → Import → ave8.c



2. Parse source code to check if there are any syntax errors. The console window will indicate if any errors have occurred and where. A 'light blue' folder with an .IFF file will be generated if the parsing was successful



## Setting up Synthesis mode and other constraints

1. Right click on ‘light-blue’ folder → Synthesis options



2. Set the scheduling mode to ‘Automatic scheduling’. This implies that CWB will automatically time the C description at the scheduling phase. Manual scheduling implies that the user will manually time the description using the ‘\$’ sign as a clock boundary.
3. Other synthesis options that control the synthesis process can also be set here as well as the target clock frequency.
4. Click on ‘Apply’ and OK to accept the changes.

## Create the FU constraint files (Resource Allocation)

1. Right-click on ‘light-blue’ folder → Generate Constraint → Generate FLIB/FCNT to open a dialog window that allows the generation of the resource constraint file



2. Increase the ‘Large Unit’ field to 10 as shown in the dialog window.



CWB considers FUs of different bws as Large ( $\leq 8$  bits), middle (8bits $\leq 4$  bits) and small ( $> 4$  bits). By default it limits the number of FUs allocated as indicated in the dialog window to 5:10:20

3. Click ‘Generate’. Two new files will be generated:

**ave8-auto.FLIB:** FU Library file containing FUs area and delay of FUs not contained in the initial FLIB/BLIB files (empty in most cases like in this)

**ave8-auto.FCNT:** Constraint file limiting the number of FUs tha the synthesizer can instantiate in paralle.

Review the contents of the files by ‘right-click → open by internal editor



The screenshot shows the Quartus Project Manager interface. On the left, the project tree for 'ave8\_lab' contains a folder 'ave8' which is expanded to show 'src', 'include', 'testbench', 'Other files', and a 'Synthesis' folder containing 'RTL.aed'. The 'ave8-auto.FCNT' file is selected in the internal editor window on the right, showing its Verilog-like content:

```
1  #@VERSION(2.00)
2  #@CNT{ave8}
3  #@KIND(BASIC_OPERATOR)
4  #@CLK 2000
5  #@UNIT 1/100ns
6  @FCNT{
7      NAME    add8u
8      LIMIT   4
9      #      COMMENT
10 }
11 @FCNT{
12     NAME    add12u
13     LIMIT   3
14     #      COMMENT
15 }
16 #@HASH(b6836427a6977ba58df8e5dd1f14d7)
17 #@END(ave8)
18
19
20
21
22
23
24
25
26
27
28
```

## High-Level Synthesis

The conversion of the C program into RTL (Verilog or VHDL) can now take place:

1. Select Verilog at the toolbar



2. Right-click on 'light-blue' folder → Synthesize/Generate RTL (or toolbar)



A report file (Quality of Results-QoR) will be generated and appears automatically. It contains all the synthesis information: FPGA resources used, critical path, design latency, etc...

**NOTE:** The information reported is based on the FLIB and BLIB file provided. It is therefore important to provide the correct library files or to re-generate these if a different device is targeted.  
Also, the reported data should be confirmed performing a logic synthesis and a simulation.

The screenshot displays the Cyber RTL Synthesis Report for the module 'ave8'. The left pane shows the project structure with files like 'ave8.sv', 'ave8.vhd', and 'ave8.v'. The right pane contains several tabs: 'Summary' (Basic Library Name: DvBSTDLIB (BLB Version 2.00), Clock period: 20ns), 'Latency' (Primary Port: Memory, Functional Unit: Register, Multiplexer, Decoder, Timing, WIRE), 'Registers' (Registers: 117, Block memory bits: 58, DSPs: 0), 'Controller' (STATES / FSM: 1, STATES / FSM DECODER DELAY: 0.59ns, WIRE: 193, PORT: 435, TOTAL IN: 18, OUT: 10, IN / OUT: 8), and 'Important Information' (False path: Unchecked, Multi-cycle path: Unchecked, False loop/Combinational loop: 0, Latch [bit]: 0). Below these are sections for 'Synthesis Analysis' (CSV, LOG, QOR, HTML, UML, XML, YML, TDF, TDF File, State Trans., Cycle level, Structure), 'Timing' (Summary), and 'Latency' (Loop: total, Type: S, Latency: Reset = L1, Latency index: 2, State No: 0, 1, Folding loop: -, Folding stages: -, Total folding states: -, Hazard: -, User operator: -, Sub loops: 1, Line: ..;/ave8.c:31).

Other report files and information files are also created (.err, .SUMM, .tips). The .err files contains any possible synthesis error, warnings and tips to improve the synthesis results.

### 3. Open ave8.err file and review the warning, information and tips messages

| Quality of Result(2013/11/11 15:35:08) |            | ave8-auto.FCNT                                                                                                                                                                                                                                                                               | bdltran Error Viewer |
|----------------------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
| No                                     | Error Type | Detail                                                                                                                                                                                                                                                                                       |                      |
| 005                                    | W_BT4486   | The bitwidth of reference variable <code>[i]</code> automatically reduced from 32 bit to 3 bit.<br>[Action] Confirm the bitwidth etc. for assignment point variable.<br>[Source Lines]<br>40(..;/ave8.c); <code>fifo[i] = fifo[i- 1];</code><br>51(..;/ave8.c); <code>sum += fifo[i];</code> |                      |
| 008                                    | W_BT4476   | The bitwidth of constant 1 is reduced from 32 bit to 3 bit.<br>[Action] Check the assignment destination bitwidth etc.<br>[Source Lines]<br>40(..;/ave8.c); <code>fifo[i] = fifo[i- 1];</code>                                                                                               |                      |
| 009                                    | W_BT4486   | The bitwidth of reference variable <code>(out0_v)</code> automatically reduced from 32 bit to 8 bit.<br>[Action] Confirm the bitwidth etc. for assignment point variable.<br>[Source Lines]<br>58(..;/ave8.c); <code>out0 = out0_v;</code>                                                   |                      |
| 015                                    | W_BT8766   | Since the assignment to the following variable( <code>fifo_a07</code> ) is non-referencing so delete it.<br>[Action] Rewrite as description not having non-referencing assignment.<br>[Source Lines]<br>23(..;/ave8.c); <code>var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0};</code>            |                      |

## Review of Synthesis Result

The synthesis result can be reviewed in different ways.

1. Schematic View: Right-click ‘light-blue’ folder → Show Datapath



This opens the CWB’s datapath viewer. Clicking on any part of the schematic will also highlight the source code in CWB that corresponds to it (cross referencing)

2. Signal Table: Right-click ‘light-blue’ folder → Show Signal Table

| Signal Name    | C/bdl                                           | 00  | 01 |
|----------------|-------------------------------------------------|-----|----|
| Port           |                                                 | R   | W  |
| in0            | in ter[0:8] in0;                                | 1   |    |
| out0           | out ter[0:8] out0;                              | 2   |    |
| Register       |                                                 | RW  |    |
| RG_fifo        | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; 4  |     |    |
| RG_fifo_1      | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; 5  |     |    |
| RG_fifo_2      | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; 6  |     |    |
| RG_fifo_3      | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; 7  |     |    |
| RG_fifo_4      | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; 8  |     |    |
| RG_fifo_5      | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; 9  |     |    |
| RG_fifo_6      | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; 10 |     |    |
| Memory         |                                                 | USE |    |
| Function Unit  |                                                 |     |    |
| add12u_11@1    |                                                 | 17  |    |
| add12u_11_10@1 |                                                 | 18  |    |
| add12u_11_10@2 |                                                 | 19  |    |
| add8u@1        |                                                 | 13  |    |
| add8u@2        |                                                 | 14  |    |
| add8u@3        |                                                 | 15  |    |
| add8u@4        |                                                 | 16  |    |

The Signal table shows the timing diagram of the synthesis. When inputs and outputs are being read or written and when registers accessed. State 00 = reset state and state 01 = stating doing the computation.

## Scheduling Result

You can review the scheduling result, by enabling a synthesis option to show the scheduled of the synthesized circuit. In Information → select Display Data Flow Graph



Re-synthesize the circuit. The CDFG will be displayed. The synthesis will continue after the window is closed.



## Logic Synthesis

It is now possible to call Intel/Altera's or AMD/Xilinx's logic synthesizers from within CWB. For this, first we need to create a synthesis script for the target FPGA vendor. Then the logic synthesizer can be executed by either opening its IDE (GUI) or from the command line interface (CUI).

**NOTE:** Make sure that CWB has the path to Quartus setup in CWB's GUI toolbar: Tools→Options→Tool Path

1. Create Logic synthesis script: Right click on light blue process folder → Logic synthesis options/Generate script → Execute

This creates



This will create a .tcl script file for Quartus. The newly generated red folder includes all the information regarding the logic synthesis



**Note:** You might need to set up the path for the Altera tools in Tools → Options → Tool Path

2. Logic Synthesis: Right clicking on the red folder → Logic synthesis (CUI) will launch Quartus in command mode. Quartus will start and the Quartus outputs will be displayed on CWB's console window



Once Quartus finishes, the logic synthesis results can be observed at the report folder.

**Note:** If Quartus is not installed on the Linux machine that you are using, then you will have to copy the RTL code generated by CWB and manually generate a Quartus project on your local machine.

To see if Quartus is installed type on the terminal window:

```
%which quartus
```

## Design Verification: Software Simulation

There are different levels of simulations that can be used to verify the design:

- Pure SW simulation: normal SW simulation
- Behavioral simulation: untimed simulation (like SW), but considering the HW data types (ter, var, reg)
- Cycle-accurate simulation : timed simulation of the scheduled synthesis results
- RTL simulation

1. The first step is to do a normal SW simulation from within CWB. Change to the ‘Project’ tab → Create Simulation Scenario.



2. Select “Behavioral” and C/C++ and call the scenario name ‘ave8\_sw’
3. Select the top process (check the check box)



4. Right-click on ‘ave8\_sw’ folder → options → Build tab -> Add C to the preprocessor definitions



##### 5. Add the indata.txt input stimuli file to the simulation environment



##### 6. Build the binary (.exe file) and simulate it

The outdata.txt file with the SW simulation values will be generated. This will be our ‘golden’ output against which the simulation results throughout the different synthesis stages will be compared against.

**Note:** Based on the operating system used by the Linux machine that you use, the GUI might give you an error like:

‘awk: error while loading shared libraries: libgmp.so.3: cannot open shared object file: No such file or directory’

In this case you can compile the code following these steps at the terminal window:

- 1.) Go to the folder where you created the project
- 2.) Inside you will find a 'simulation\_work' folder
- 3.) Go there and into the actual simulation that you want to execute. SW, cycle-accurate, etc...
- 4.) Compile the code from there by executing the Makefile generated by the GUI:

```
% make -f Makefile.GNU
```

This generates the .exe file

- 5.) Run the compiled program

```
%./name.exe
```

## RTL simulation

The last verification step is to simulate if the RTL generated also matches the 'golden' output.

1. Create a new simulation scenario and select RTL. Name the scenario ave8\_rtl



2. Copy the indata.txt and outdata.txt file into the new simulation folder and rename as in0.clv and out0.clv. CWB needs an input and output file with the same name as the ports as input stimuli.
3. Generate the testbench : Select RTL simulator (Modelsim) and input stimuli every cycle, Input data from file and radix as decimal. Click OK and run the simulation



**Note:** Make sure you have added Modelsim to your path by adding this to your `~/.bashrc` file or calling it from the terminal:

```
source /proj/cad/startup/profile.mentor_2022
```

A .vcd file which can be opened by a waveform viewer (e.g. GTKwave) is generated. Double click on it to open it.



The simulation output should match the result of the SW simulation.

## Design Space Exploration

C-Based design allows the generation of different architectures with different area vs. performance constraints without having to modify the actual C code. This is mainly done by modifying the synthesis constraints. E.g. FCNT constraint file or synthesis options.

1. Open the synthesis window to observe the synthesis result of the current design in the design space exploration window. Right-click on ‘light blue’ folder → Show synthesis results → Synthesis window



The Y-axis represents the area of the design, while the y-axis can be modified to represent different things. Modify to ‘Latency index’

2. Open the ave8-auto.FCNT resource constraint file and reduce the number of FUs to 1



3. Apply the changes

4. Set manual FCNT generation so that CWB does not generate a new FCNT file when re-synthesizing



- Re-synthesize the design. A new design with different Area and latency characteristics is generated and plotted in the synthesis window



Opening the schematic viewer confirms that only 2 FUs have been instantiated. The synthesized results maximizes resource sharing.



Opening the signal table shows that the latency has now increased from 1 cycle to 5

| Signal Table  |                                              | Port | Reg | Mem | Fu  | Others |
|---------------|----------------------------------------------|------|-----|-----|-----|--------|
| Signal Name   | C/bdl                                        | 00   | 01  | 02  | 03  | 04     |
| Port          |                                              |      |     | R   |     | W      |
| in0           | in ter[0:8] in0;                             |      |     |     |     |        |
| out0          | out ter[0:8] out0;                           |      |     |     |     |        |
| Register      |                                              | RW   | RW  | RW  | RW  | RW     |
| RG_fifo       | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; | RW   | RW  | RW  | RW  | RW     |
| RG_fifo_1     | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; | RW   | RW  | RW  | RW  | RW     |
| RG_fifo_2     | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; | RW   | RW  | RW  | RW  | RW     |
| RG_fifo_3     | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; | RW   | RW  | RW  | RW  | RW     |
| RG_fifo_4     | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; | R    | R   | RW  | RW  | RW     |
| RG_fifo_5     | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; | R    | R   | RW  | RW  | RW     |
| RG_fifo_6     | var[0:8] fifo[8] = {0, 0, 0, 0, 0, 0, 0, 0}; |      | RW  | RW  | RW  | RW     |
| Memory        |                                              | USE  | USE | USE | USE |        |
| Function Unit |                                              | USE  | USE | USE |     |        |
| add12u_11@1   |                                              | USE  | USE | USE |     |        |
| add8u@1       |                                              |      |     |     |     |        |

Re-verify the new design. No new simulation scenarios are needed. CWB will automatically re-generate the RTL, testbenches and cycle-accurate simulation models.

## Transactional Level Simulation

The input and output files have the extension .clv because each data in them represent an input/output for that specific port each cycle. In this new design an input should only be read every 4 clock cycles. Two options are available:

- Modify the in0.clv file inserting 0 0 0 every data
- Using transactional level simulation. In this case the synthesizer will generate a valid signal for each port and set the signal when a new input is needed. The automatically generated testbench will monitor the values of these valid signals and only apply a new test vector when the valid signal for that particular port is set.

1. Copy the .clv files and rename them as .tlv (e.g in0.tlv and out0.tlv)
2. Open the synthesis options dialog window → detail settings → Circuit specifications1 → set ‘Valid signal for each port is generated as an output port.



3. Re-synthesize the design and re-run an RTL simulation.

## Appendix I – Calling CWB from command line

1.) Copy the .c file and the .FLIB and .BLIB file to the same directory. You should basically have:

```
ave8.C  
cycloneV-7.FLIB  
cyclone-7.BLIB
```

2.) Parse the behavioral description. If no errors are found a ave8.IFF is generated

```
% cpars ave8.c
```

3.) Synthesize the parsed file. If there are no issues, then an ave8\_E.IFF file will be generated and all of the report files:

```
%bdltran -c2000 -s ave8.IFF -lfl cycloneV-7.FLIB -lb cycloneV-7.BLIB
```

- -c2000 is the clock period in 1/100ns units, hence,  $2000=20\text{ns} = 50\text{Mhz}$
- -s means automatic scheduling mode. HLS takes care of everything

If you want to synthesize the design using a manually edited Functional Unit Constraint file (FCNT), then add these options:

```
%bdltran -c2000 -s ave8.IFF -lfl cycloneV-7.FLIB -lb cycloneV-7.BLIB -Zresource_fcnt=USE -lfc ave8-auto.FCNT
```

4.) Generate Verilog or VHDL:

```
%veriloggen ave8_E.IFF  
%vhdlgen ave8_E.IFF
```

The output should be either ave8\_E.v or ave8\_E.vhd which is the final HLS output.

[END]