

# Vitis Design Flow Lab

## Introduction

This lab provides a basic introduction to high-level synthesis using the Vitis flow. You will use Vitis to create a project. You will simulate, synthesize, and implement the provided design.

## Objectives

After completing this lab, you will be able to:

- Create a new project using Vitis
- Simulate a C design by using a self-checking test bench
- Synthesize the design
- Perform design analysis using the Analysis Perspective view
- Perform co-simulation on a generated RTL design by using a provided C test bench
- Implement a design

## Steps

### Create a New Project

#### Create a new project in Vitis HLS targeting the PYNQ-Z2 board

1. Launch Vitis and select **Create Component...** > **Create Empty HLS Component**.



*Getting Started view of Vitis*

2. Click the *Browse...* button of the Location field and browse to **{labs}\lab1** on a Windows machine or **{labs}/lab1** on a Linux machine, creating sub-folders as necessary, and then click **OK**.

Note: From this point onward, paths are given in their Linux form.

3. For Project Name, type **matrixmul**.



*New Vitis HLS Project wizard*

4. Click **Next**.
5. In the *Configuration File* window, select the *Empty File* option and click **Next**.
6. In the *Source Files* window, type **matrixmul** as the *Top Function* name (the provided source file contains the function, called matrixmul, to be synthesized).
7. Click the *Add Files...* button in the DESIGN FILES row, select the **matrixmul.cpp** file from the **{labs}/lab1** folder, and then click **Open**.
8. Add the test bench next: click the *Add Files...* button in the TEST BENCH FILES row, select the **matrixmul\_test.cpp** file from the **/home/xup/hls/labs/lab1** folder, and click **Open**.
9. Select **matrixmul\_test.cpp** in the files list, click the *Edit CFLAG...* button, type **-DHW\_COSIM** (do not forget the leading "-"), and click **OK**. (This defines a compiler flag that will be used later.)
10. Click **Next**.



*Source Files setting*

11. In the **Hardware** page, select the **Part** field, enter **xc7z020clg400-1** in the **Search** field, and click **Next**.

Select the device that you are actually using. For example, the ZedBoard used in this class has an xc7z020clg484-1 part.



*Using Search to select the chip*

12. In the *Settings* page, enter **10ns** as the clock period and click **Next**.
13. Click **Finish**. You will see the created project in the *VITIS COMPONENTS* view. Expand various sub-folders to see the entries under each sub-folder.

*Explorer Window*

14. Double-click on the **matrixmul.cpp** under the source folder to open its content in the information pane.

```
29 #include "matrixmul.h"
30
31 void matrixmul(
32     mat_a_t a[MAT_A_ROWS][MAT_A_COLS],
33     mat_b_t b[MAT_B_ROWS][MAT_B_COLS],
34     result_t res[MAT_A_ROWS][MAT_B_COLS])
35 {
36     // Iterate over the rows of the A matrix
37     Row: for(int i = 0; i < MAT_A_ROWS; i++) {
38         // Iterate over the columns of the B matrix
39         Col: for(int j = 0; j < MAT_B_COLS; j++) {
40             // Do the inner product of a row of A and col of B
41             res[i][j] = 0;
42             Product: for(int k = 0; k < MAT_B_ROWS; k++) {
43                 res[i][j] += a[i][k] * b[k][j];
44             }
45         }
46     }
47 }
```

*The design under consideration*

The design is a matrix multiplication implementation consisting of three nested loops. The Product loop is the innermost loop; it performs the actual multiply-and-accumulate of the matrix elements. The Col loop is the middle loop, which feeds the next column of B, together with the current row of A, to the Product loop. Finally, Row is the outermost loop. The statement `res[i][j] = 0` (line 41) resets the result each time a new row and column combination is started.
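To experiment with the algorithm outside the tool, the loop nest can be sketched as a standalone C++ function. This is only an illustration: the 3x3 dimensions and the 8-bit and 16-bit element types are assumptions (the real definitions are in `matrixmul.h`, which is not reproduced here), and the name `matrixmul_sketch` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed sizes and types. The real definitions live in matrixmul.h,
// which is not reproduced in this lab text.
const int MAT_A_ROWS = 3;
const int MAT_A_COLS = 3;
const int MAT_B_ROWS = 3;
const int MAT_B_COLS = 3;
typedef std::int8_t  mat_a_t;   // assumed 8-bit input element
typedef std::int8_t  mat_b_t;   // assumed 8-bit input element
typedef std::int16_t result_t;  // assumed 16-bit result element

// Same loop structure as the listing: Row (outer), Col (middle), Product (inner).
void matrixmul_sketch(mat_a_t a[MAT_A_ROWS][MAT_A_COLS],
                      mat_b_t b[MAT_B_ROWS][MAT_B_COLS],
                      result_t res[MAT_A_ROWS][MAT_B_COLS]) {
    for (int i = 0; i < MAT_A_ROWS; i++) {          // Row
        for (int j = 0; j < MAT_B_COLS; j++) {      // Col
            int acc = 0;                            // plays the role of res[i][j] = 0
            for (int k = 0; k < MAT_B_ROWS; k++) {  // Product
                acc += a[i][k] * b[k][j];
            }
            res[i][j] = static_cast<result_t>(acc);
        }
    }
}
```

For instance, multiplying {{11,12,13},{14,15,16},{17,18,19}} by {{21,22,23},{24,25,26},{27,28,29}} gives a first row of {870, 906, 942}. Whether the provided test bench uses exactly these inputs is an assumption.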

## Run C Simulation

1. Select **FLOW > C Simulation > Run**.

A dialog may ask whether to enable Code Analyzer; if so, click **Yes, enable Code Analyzer**.

The files will be compiled and you will see the output in the Console window.



The **OUTPUT** tab displays log entries similar to the following:

```
INFO: [HLS 211-200] Linking executable
INFO: [HLS 211-200] Computing HLS IR Information
INFO: [HLS 211-200] Determining source code dependencies
{
{870,906,942}
{1086,1131,1176}
{1302,1356,1410}
}
Test passed.
INFO: [SIM 211-200] Analyzing trace data
INFO: [SIM 211-208] Running Code Analyzer
INFO: [SIM 211-210] Code Analyzer finished
INFO: [vitis-run 60-791] Total elapsed time: 0h 0m 18s
C-simulation finished successfully
```

*Program output*

2. Double-click on **matrixmul\_test.cpp** under the **testbench** folder in the Explorer to see its content.

You should see two input matrices initialized with some values, followed by the code that executes the algorithm. If **HW\_COSIM** is defined (as was done during the project setup), the **matrixmul** function is called, its output is compared with the result computed in software, and **Test passed** is printed if the results match. If **HW\_COSIM** is not defined, the test bench simply outputs the software-computed result and does not call the **matrixmul** function.
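The self-checking part of such a test bench can be sketched as follows. This is a minimal illustration, not the provided `matrixmul_test.cpp`: the helper names `count_mismatches` and `report_result`, the 3x3 size, and the failure message text are assumptions.

```cpp
#include <cassert>
#include <iostream>

const int N = 3;  // assumed 3x3 result matrices, as in the simulation log

// Hypothetical helper: counts the elements where the function under test
// (hw) disagrees with the software reference (sw).
int count_mismatches(const short hw[N][N], const short sw[N][N]) {
    int err_cnt = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            if (hw[i][j] != sw[i][j]) {
                err_cnt++;
            }
        }
    }
    return err_cnt;
}

// Hypothetical helper: prints the verdict and returns the value that
// main() would hand back to the tool (0 means the test passed).
int report_result(int err_cnt) {
    if (err_cnt == 0) {
        std::cout << "Test passed." << std::endl;
    } else {
        std::cout << "Test failed: " << err_cnt << " mismatches." << std::endl;
    }
    return err_cnt;
}
```

The test bench in this lab ends with `return err_cnt;`, so a nonzero mismatch count becomes a nonzero exit code, which is how a self-checking test bench signals failure to the simulation flow.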

## Run Debugger

### Run the application in debugger mode and understand the behavior of the program.

1. Click **Debug**, located below **Run**.

The application is compiled with the **-g** option to include debugging information, the compiled application is invoked, and the Debug perspective opens automatically.

2. The *Debug* perspective shows **matrixmul\_test.cpp** in the source view, the **argc** and **argv** variables under **Variables > Local**, and, in the *Debug* view, the created thread with the program suspended at the entry point of main().



*A Debug perspective*

3. Scroll down in the source view and click to the left of line 67, where the test bench is about to output "{" to the console. This sets a breakpoint at line 67.

The breakpoint is marked with a red circle.

```
66 // Print result matrix
67 • cout << "{" << endl;
68 //cout << setw(6);
69 for (int i = 0; i < MAT_A_ROWS; i++) {
70     cout << "{";
71     for (int j = 0; j < MAT_B_COLS; j++) {
72 #ifdef HW_COSIM
```

4. Similarly, set a breakpoint at line 63, where the matrixmul() function is called.

5. Click the **Step Over** (F10) button several times to follow the execution, and observe how the variable values and the computed software result are updated.

```
44     result_t hw_result[3][3], sw_result[3][3];
45     int err_cnt = 0;
46
47     // Generate the expected result
48     for(int i = 0; i < MAT_A_ROWS; i++) {
49         for(int j = 0; j < MAT_B_COLS; j++) {
50             // Iterate over the columns of the B matrix
51             // Initialize the result[i][j] = 0;
52             sw_result[i][j] = 0;
53
54             // Do the inner product of a row of A and col of B
55             for(int k = 0; k < MAT_B_ROWS; k++) {
56                 sw_result[i][j] += in_mat_a[i][k] * in_mat_b[k][j];
57             }
58         }
59     }
60
61     #ifdef HW_COSIM
62     // Run the AutoESL matrix multiply block
63     matrixmul(in_mat_a, in_mat_b, hw_result);
64     #endif
65
66     // Print result matrix
67     cout << "{" << endl;
68     //cout << setw(6);
69     for (int i = 0; i < MAT_A_ROWS; i++) {
70         cout << "{";
71         for (int j = 0; j < MAT_B_COLS; j++) {
72             #ifdef HW_COSIM
73                 cout << hw_result[i][j] << " ";
74             #else
75                 cout << sw_result[i][j] << " ";
76             #endif
77         }
78         cout << endl;
79     }
```

*Debugger's intermediate output view*

6. Now click the **Restart** button (or press *Ctrl+Shift+F5*) and then click **Continue** (or press F5) to complete the software computation and stop at line 63.
7. Observe the computed software result in the variables view.



*Software computed result*

8. Click the **Step Into** (F11) button to step into the **matrixmul** module, the one that we will synthesize, and observe that execution is paused on line 37 of the module.
9. Use **Step Over** (F10) several times and observe the computed results. Once satisfied, use **Restart** and **Continue** to return to line 63.
10. Set a breakpoint on line 96 (return err\_cnt;) and click the **Continue** button. Execution continues until the breakpoint is encountered. The console window shows the same results as in the C simulation output seen earlier.
11. Press the **Continue** button to finish the debugging session.

## Synthesize the Design

**Switch to Synthesis view and synthesize the design with the defaults. View the synthesis results and answer the question listed in the detailed section of this step.**

1. Switch to the *Synthesis* view by clicking the **Vitis Components** button.



*The button*

2. Select **Flow > C SYNTHESIS > Run** to start the synthesis process.
3. When the synthesis process is completed, select **C SYNTHESIS > REPORTS > Synthesis** to open the synthesis page.

The screenshot shows the 'Summary Synthesis Report - matrixmul' page. At the top, it displays general information: Version 2024.1 (Build 5069499 on May 21 2024), Product family zynq, and Target device xc7z020-clg484-1. Below this, the 'Estimated Quality of Results' section shows a timing estimate table:

| TARGET   | ESTIMATED | UNCERTAINTY |
|----------|-----------|-------------|
| 10.00 ns | 6.638 ns  | 2.70 ns     |

The 'Performance & Resource Estimates' section includes a table with the columns MODULES & LOOPS, ISSUE TYPE, VIOLATION TYPE, LATENCY(CYCLES), LATENCY(NS), ITERATION LATENCY, INTERVAL, TRIP COUNT, PIPELINED, BRAM, DSP, FF, LUT, and URAM.

*Report view after synthesis is completed*

4. If you expand **Output > syn** in Explorer, several generated files including report files will become accessible.



*Explorer view after the synthesis process*

Note that when the **syn** folder under the *Output* folder is expanded in the *Explorer* view, it shows the *report*, *verilog*, and *vhdl* sub-folders, which contain the report files and the generated source (VHDL, Verilog, header, and cpp) files. Double-clicking any of these entries opens the corresponding file in the information pane.

Also note that if the target design has hierarchical functions, reports corresponding to lower-level functions are also created.

5. The *Synthesis Report* shows the performance and resource estimates as well as the estimated latency of the design.
6. Using the scroll bar on the right, scroll down through the report and answer the following question.

**Question 1** Fill in the following values from the report:

Estimated clock period:

Worst case latency:

Number of DSP48E used:

Number of FFs used:

Number of LUTs used:

7. The report also shows the top-level interface signals generated by the tools. These are listed further down on the Synthesis page.

## Summary Synthesis Report - matrixmul

### HW Interfaces

#### AP\_MEMORY

| PORT         | DIRECTION | BITWIDTH |
|--------------|-----------|----------|
| a_address0   | out       | 4        |
| a_address1   | out       | 4        |
| a_q0         | in        | 8        |
| a_q1         | in        | 8        |
| b_address0   | out       | 4        |
| b_address1   | out       | 4        |
| b_q0         | in        | 8        |
| b_q1         | in        | 8        |
| res_address0 | out       | 4        |
| res_d0       | out       | 16       |

#### TOP LEVEL CONTROL

| INTERFACE | TYPE       | PORTS                             |
|-----------|------------|-----------------------------------|
| ap_clk    | clock      | ap_clk                            |
| ap_rst    | reset      | ap_rst                            |
| ap_ctrl   | ap_ctrl_hs | ap_done ap_idle ap_ready ap_start |

### SW I/O Information

#### Top Function Arguments

| ARGUMENT | DIRECTION | DATATYPE |
|----------|-----------|----------|
| a        | in        | char*    |
| b        | in        | char*    |
| res      | out       | short*   |

*Generated interface signals*
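The address widths in the AP\_MEMORY table can be related to the array sizes. As a rough sketch, assuming each array holds 3x3 = 9 elements (consistent with the 3x3 results in the simulation log) and is stored in row-major order as in C, a 4-bit address is the smallest width that can index every element. The helper names `address_bits` and `flat_index` are hypothetical and are not part of the lab sources.

```cpp
#include <cassert>

// Smallest number of address bits that can index `depth` memory words.
int address_bits(int depth) {
    int bits = 0;
    while ((1 << bits) < depth) {
        bits++;
    }
    return bits;
}

// Row-major position of element [row][col] in an array with `cols` columns.
int flat_index(int row, int col, int cols) {
    return row * cols + col;
}
```

Under these assumptions, `address_bits(9)` is 4, which matches the 4-bit `a_address0`, `b_address0`, and `res_address0` ports, and the last element `[2][2]` sits at `flat_index(2, 2, 3)`, that is, address 8.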

You can see that the **ap\_clk** and **ap\_rst** signals and the **ap\_start**, **ap\_done**, **ap\_idle**, and **ap\_ready** control signals are automatically added to the design by default. The control signals are used for handshaking: they indicate when the design is ready to take the next computation command (ap\_ready), when the next computation is started (ap\_start), when the computation is completed (ap\_done), and whether the design is idle (ap\_idle). Other signals are generated based on the input and output signals in the design and their default or specified interfaces.
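The handshake can be illustrated with a small cycle-level model. This is a simplified sketch, not the RTL that the tool generates: it assumes a non-pipelined block with a fixed latency, and the struct name `ApCtrlHsModel` and its `tick` method are hypothetical.

```cpp
#include <cassert>

// Hypothetical, simplified model of a block with ap_ctrl_hs control signals.
struct ApCtrlHsModel {
    int  latency;    // cycles per transaction (assumed fixed and >= 1)
    int  remaining;  // cycles left in the current transaction
    bool busy;       // internal state: a transaction is in flight
    bool ap_idle;
    bool ap_ready;
    bool ap_done;

    explicit ApCtrlHsModel(int latency_cycles)
        : latency(latency_cycles), remaining(0), busy(false),
          ap_idle(true), ap_ready(false), ap_done(false) {}

    // Advance the model by one clock cycle with the given ap_start input.
    void tick(bool ap_start) {
        ap_done = false;   // ap_done and ap_ready are one-cycle pulses
        ap_ready = false;
        if (!busy && ap_start) {
            busy = true;   // a new computation starts
            remaining = latency;
        }
        ap_idle = !busy;
        if (busy && --remaining == 0) {
            ap_done = true;   // computation completed
            ap_ready = true;  // block can accept the next ap_start
            busy = false;
        }
    }
};
```

With `ApCtrlHsModel m(3)`, calling `m.tick(true)` once and then `m.tick(false)` repeatedly drops `ap_idle` during the first three cycles, pulses `ap_done` and `ap_ready` on the third, and raises `ap_idle` again on the fourth.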

## Analyze using Analysis Perspective

### Switch to the Analysis Perspective and understand the design behavior.

1. Select **C SYNTHESIS > REPORTS > Schedule Viewer** to open the analysis viewer.

The Analysis perspective consists of 4 panes as shown below. Note that the module and loops hierarchies are displayed unexpanded by default. The **Module Hierarchy** pane shows both the performance and area information for the entire design and can be used to navigate through the hierarchy. The **Performance Profile** pane is visible and shows the performance details for this level of hierarchy. The information in these two panes is similar to the information reviewed earlier in the synthesis report. The **Schedule Viewer** is also shown in the right-hand side pane. This view shows how the operations in this particular block are scheduled into clock cycles.



*Analysis perspective*

2. Click the **>** next to loop *Row\_Col* to expand it.



*Performance matrix showing top-level Row operation*

From this we can see that there is an add operation performed. This addition is likely the counter to count the loop iterations, and we can confirm this.

3. Select the block for the **adder** (**add\_in75\_3(+)**), right-click and select **Goto Source**. The source code pane will be opened, highlighting line 37 where the loop index is being tested and incremented.



*Cross probing into the source file*

4. Click on the **Schedule Viewer** tool bar button to switch back to the *Synthesis* view.
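The loop name *Row\_Col* in the Analysis view suggests that the tool merged the nested Row and Col loops into a single loop. The following minimal C++ sketch shows what such a flattened loop is equivalent to, assuming 3x3 matrices with 8-bit inputs and a 16-bit result. The function name `matrixmul_flat` and the constants are hypothetical, and the generated hardware may be organized differently.

```cpp
#include <cassert>

const int ROWS  = 3;  // assumed matrix dimensions
const int COLS  = 3;
const int INNER = 3;

void matrixmul_flat(const signed char a[ROWS][INNER],
                    const signed char b[INNER][COLS],
                    short res[ROWS][COLS]) {
    // A single loop of ROWS * COLS iterations replaces the Row and Col loops.
    for (int rc = 0; rc < ROWS * COLS; rc++) {
        const int i = rc / COLS;  // row index recovered from the flat counter
        const int j = rc % COLS;  // column index recovered from the flat counter
        int acc = 0;
        for (int k = 0; k < INNER; k++) {  // Product loop
            acc += a[i][k] * b[k][j];
        }
        res[i][j] = static_cast<short>(acc);
    }
}
```

The flat counter `rc` is incremented and compared once per iteration, which is the kind of loop-counter addition that the adder in the schedule is likely to implement.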

## Run C/RTL Co-simulation

**Run the C/RTL Co-simulation with the default settings of VHDL. Verify that the simulation passes.**

1. Select **Flow > C/RTL COSIMULATION > Run**; the co-simulation starts automatically. Wait for it to complete. The C/RTL co-simulation generates and compiles several files and then simulates the design. It goes through three stages.

First, the C test bench is executed to generate input stimuli for the RTL design.

Second, an RTL test bench with the newly generated input stimuli is created and the RTL simulation is performed.

Finally, the output from the RTL simulation is re-applied to the C test bench to check the results. In the console window you can see the progress and also a message that the test passed.

This eliminates the need to write a separate test bench for the synthesized design.



```
$finish called at time : 425 ns : File "E:/robot/project/xilinx_fpga_class/hls/lab1/sources/matrixmul/matrixmul/hls/
## quit
INFO: [Common 17-206] Exiting xsim at Tue Feb 11 16:13:16 2025...
{
{870,906,942}
{1086,1131,1176}
{1302,1356,1410}
}
Test passed.
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
INFO: [COSIM 212-211] II is measurable only when transaction number is greater than 1 in RTL simulation. Otherwise,
INFO: [HLS 200-112] Total CPU user time: 2 seconds. Total CPU system time: 2 seconds. Total elapsed time: 39.559 sec
INFO: [vitis-run 60-791] Total elapsed time: 0h 0m 45s
Co-simulation finished successfully
```

*Console view showing simulation progress*

2. Once the simulation verification is completed, the simulation report tab opens and shows the results.

Click **C/RTL COSIMULATION > REPORTS > Cosimulation** to see the report. The report indicates whether the simulation passed or failed. In addition, it shows the measured latency and interval.

Since only VHDL was selected, the report shows the latencies and the initiation interval for the VHDL simulation. The interval indicates how many clock cycles must elapse before the next input can be provided.



| MODULES & LOOPS | Avg II | Max II | Min II | Avg Latency | Max Latency | Min Latency | Total Execut |
|-----------------|--------|--------|--------|-------------|-------------|-------------|--------------|
| matrixmul (1)   |        |        |        | 22          | 22          | 22          | 22           |
| Row_Col         |        |        |        | 23          | 23          | 23          | 23           |

*Co-simulation results*
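The cycle counts in the report can be converted to time once a clock period is fixed. The sketch below assumes the 10 ns clock period entered during project setup; the helper names `cycles_to_ns` and `transactions_per_second` are hypothetical.

```cpp
#include <cassert>

// Latency in nanoseconds for a given cycle count and clock period.
double cycles_to_ns(int cycles, double clock_period_ns) {
    return cycles * clock_period_ns;
}

// Transactions per second if a new input is accepted every
// `interval_cycles` clock cycles (the initiation interval).
double transactions_per_second(int interval_cycles, double clock_period_ns) {
    return 1.0e9 / (interval_cycles * clock_period_ns);
}
```

With a 10 ns clock, the 22-cycle latency reported for matrixmul corresponds to `cycles_to_ns(22, 10.0)`, that is, 220 ns. The interval columns are empty in this report because only one transaction was simulated, so `transactions_per_second` needs an interval taken from a run with more than one transaction.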

## Analyze the dumped traces.

1. You will see that the Wave Viewer cannot be used yet. Click the icon next to it, search for **wave** in **hls\_config.cfg**, and enable **cosim.wave\_debug**. Then change **cosim.trace\_level** from *none* to *all*.



*Change the setting*

2. Next, click **C/RTL COSIMULATION > Run** again; Vivado will open.

3. In Vivado, click **Run for 1μs**.



*Full waveform showing iteration worth simulation*

4. View various parts of the simulation and try to understand how the design works.

5. When done, close Vivado, and click **OK** if prompted.

## Export RTL and Implement

**In Vitis HLS, export the design, selecting VHDL as a language, and run the implementation by selecting Evaluate option.**

1. In Vitis HLS, select **Flow > IMPLEMENTATION > Run** and wait for the implementation to finish. An Export RTL dialog box will open.

```
#==== Final timing ====
CP required: 10.000
CP achieved post-synthesis: 3.884
CP achieved post-implementation: 3.884
Timing met
TIMESTAMP: HLS-REPORT: implementation end: 2025-02-11 16:39:11 +0800
INFO: HLS-REPORT: impl run complete: worst setup slack (WNS)=6.461103, worst hold slack (WHS)=0.100312, total pulse width slack(TPWS)=0.000000, number of
# hls_vivado_reports_finalize $report_options
TIMESTAMP: HLS-REPORT: all reports complete: 2025-02-11 16:39:11 +0800
INFO: [Common 17-286] Exiting Vivado at Tue Feb 11 16:39:11 2025...
INFO: [HLS 200-802] Generated output file matrixmul/matrixmul.zip
INFO: [vitis-run 68-791] Total elapsed time: 0h 5m 7s
Implementation finished successfully
```

*Run Implementation Done*

2. Click **Flow > IMPLEMENTATION > REPORTS > RTL Synthesis** to see the report.



*RTL Synthesis report*

3. The **xilinx\_com\_hls\_matrixmul\_1\_0.zip** file is located under **impl > ip**; it can be added to the Vivado IP catalog.



*The ip folder content*

4. Close Vitis by selecting **File > Close Window**.

## Conclusion

In this lab, you completed the major steps of the high-level synthesis design flow using Vitis. You created a project, added source files, simulated the design, synthesized it, and implemented it. You also learned how to use the Analysis capability to understand the scheduling and binding.

## Answers

**Answers for question 1:**

Estimated clock period: **6.816 ns**

Worst case latency: **24 clock cycles**

Number of DSP48E used: **2**

Number of FFs used: **66**

Number of LUTs used: **365**

Copyright© 2024, Advanced Micro Devices, Inc.