

# C-based Design: High-Level Synthesis with the Vivado HLS Tool Lab Workbook

dsp-hls-2016.1-wkb-lab-rev1



# C-based Design: High-Level Synthesis with the Vivado HLS Tool

## Lab Workbook 2016.1



© Copyright 2016 Xilinx, Inc. Xilinx, the Xilinx logo, Artix, ISE, Kintex, Spartan, UltraScale, Virtex, Vivado, Zynq, and other designated brands included herein are trademarks of Xilinx in the United States and other countries. All other trademarks are the property of their respective owners.

### DISCLAIMER

The information disclosed to you hereunder (the "Materials") is provided solely for the selection and use of Xilinx products. To the maximum extent permitted by applicable law: (1) Materials are made available "AS IS" and with all faults, Xilinx hereby DISCLAIMS ALL WARRANTIES AND CONDITIONS, EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, OR FITNESS FOR ANY PARTICULAR PURPOSE; and (2) Xilinx shall not be liable (whether in contract or tort, including negligence, or under any other theory of liability) for any loss or damage of any kind or nature related to, arising under, or in connection with, the Materials (including your use of the Materials), including for any direct, indirect, special, incidental, or consequential loss or damage (including loss of data, profits, goodwill, or any type of loss or damage suffered as a result of any action brought by a third party) even if such damage or loss was reasonably foreseeable or Xilinx had been advised of the possibility of the same. Xilinx assumes no obligation to correct any errors contained in the Materials or to notify you of updates to the Materials or to product specifications. You may not reproduce, modify, distribute, or publicly display the Materials without prior written consent. Certain products are subject to the terms and conditions of the Limited Warranties which can be viewed at <http://www.xilinx.com/warranty.htm>; IP cores may be subject to warranty and support terms contained in a license issued to you by Xilinx. Xilinx products are not designed or intended to be fail-safe or for use in any application requiring fail-safe performance; you assume sole risk and liability for use of Xilinx products in Critical Applications: <http://www.xilinx.com/warranty.htm#critapps>. All other trademarks are the property of their respective owners.



# Lab FAQ

- Where can I get the files for the labs?
  - [www.xilinx.com/training/downloads.htm](http://www.xilinx.com/training/downloads.htm)
  - These are original files and do not contain any work that you may have performed.
  - Labs were developed using version 2016.1 of the tools. Later versions may work and will likely require you to update various pieces of IP. See the "Updating IP" topic in the *Lab Reference Guide* for instructions on how to update IP (Vivado Design Suite Operations > Vivado IP Integrator Operations > Updating IP).
- What if I cannot answer a question in the lab?
  - Do your best! The questions are meant to stimulate thought, not to test your knowledge. After you have pondered a question you can find the answer at the end of each lab.
- Where can I get more detailed information on a topic?
  - The *Lab Reference Guide* is a collection of "how to" topics for commonly performed tasks categorized by tool (Vivado Design Suite, Vivado Analyzer, SDK, etc.) and subdivided into major areas within the tool.
  - The *Lab Reference Guide* is available from the lab files download as well as from [www.xilinx.com/training/downloads.htm](http://www.xilinx.com/training/downloads.htm).
- How do the instructions work?
  - The instructions are provided in layers.
    - Steps – these are the major/broadest aspects of solving the problem that the lab poses (X).
    - Instructions – these represent the significant instructions towards solving the issue outlined by the step (X-X).
    - Tasks – these are the finest granularity items that (when combined with the other tasks) solve the instruction they are subordinate to. Every task is a "clickable" event (X-X-X).



- What do KCU105, KC705, and KC7xx mean in the lab instructions and lab files?
  - There are two boards, the Kintex® UltraScale™ FPGA KCU105 evaluation board and the Kintex-7 FPGA KC705 evaluation board—each representing a different type of architecture.
    - The Kintex UltraScale FPGA KCU105 evaluation board represents the UltraScale architecture with the specific part number of **xc7ku040-ffva1156-2-e**.
    - The Kintex-7 FPGA KC705 evaluation board represents the 7 series architecture with the specific part number of **xc7k325tffg900-2**.
  - In addition, there is also a *fictitious* board KC7xx representing the 7 series architecture with a part number of **xc7k70tfg676-2**.
- The lab files are provided for both the UltraScale and 7 series architectures.
- The lab instructions have been written such that both UltraScale and 7 series architecture users can work through the lab.
- Follow the lab instructions based on the board that you choose to work on. Since it can be quite verbose to repeat board terminology throughout a lab, the following is used:
  - "KCU105" represents the Kintex UltraScale FPGA KCU105 evaluation board with the xc7ku040-ffva1156-2-e part.
  - "KC705" represents the Kintex-7 FPGA KC705 evaluation board with the xc7k325tffg900-2 part.
  - "KC7xx" represents the fictitious KC7xx board with the xc7k70tfg676-2 part.

## Table of Contents

---

|                                                                  |            |
|------------------------------------------------------------------|------------|
| <b>Lab 1: Introduction to the Vivado HLS Tool Flow .....</b>     | <b>3</b>   |
| <b>Lab 2: Introduction to the Vivado HLS Tool CLI Flow .....</b> | <b>27</b>  |
| <b>Lab 3: Interface Synthesis .....</b>                          | <b>39</b>  |
| <b>Lab 4: Improving Performance.....</b>                         | <b>51</b>  |
| <b>Lab 5: Implementing Arrays as RTL Interfaces .....</b>        | <b>75</b>  |
| <b>Lab 6: Improving Area and Resource Utilization .....</b>      | <b>99</b>  |
| <b>Lab 7: HLx Flow – System Integration .....</b>                | <b>141</b> |



# Lab 1: Introduction to the Vivado HLS Tool Flow

2016.1

## Abstract

This lab introduces how to perform basic actions using the Vivado® High-Level Synthesis (HLS) design flow.

This lab should take approximately 45 minutes.

## Objectives

After completing this lab, you will be able to:

- Create a new project in the Vivado HLS tool GUI
- Simulate a C design by using a self-checking testbench
- Synthesize the design
- Perform design analysis using the Analysis Perspective view
- Perform co-simulation on a generated RTL design by using a provided C testbench
- Implement the design

## Introduction

This lab provides an introduction to the major features of the Vivado® High-Level Synthesis (HLS) tool GUI flow. You will use the Vivado HLS tool in GUI mode to create a project. You will also simulate, synthesize, and implement the design provided.

In this lab, you will be using a C design to implement a discrete cosine transformation (DCT). The function implements a 2D DCT algorithm by first processing each row of the input array via a 1D DCT, then processing the columns of the resulting array through the same 1D DCT. It calls the *read\_data*, *dct\_2d*, and *write\_data* functions.

The *read\_data* function is defined at line 54 and consists of two loops: *RD\_Loop\_Row* and *RD\_Loop\_Col*. The *write\_data* function is defined at line 66 and consists of two loops to perform writing the result. The *dct\_2d* function, defined at line 23, calls the *dct\_1d* function and performs transpose.

Finally, the *dct\_1d* function, defined at line 4, uses *dct\_coeff\_table* and performs the required function by implementing a basic iterative form of the 1D Type-II DCT algorithm.

The following figure shows the function hierarchy on the left-hand side, the loops in the order they are executed, and the flow of data on the right-hand side.



Figure 1-1: Design Hierarchy and Dataflow

## General Flow



## Creating a Vivado HLS Tool Project

## Step 1

In this step, you will launch the Vivado HLS tool GUI and create a new project for the provided C-based discrete cosine transformation (DCT) design.

There are a number of ways to launch the Vivado HLS tool. The two most popular mechanisms are shown here.

### 1-1. Launch the Vivado HLS tool.

- 1-1-1. Select **Start > All Programs > Xilinx Design Tools > Vivado 2016.1 > Vivado HLS > Vivado HLS 2016.1.**



**Figure 1-2: Launching the Vivado HLS Tool**

-- OR --

Double-click the **Vivado HLS** shortcut icon (  ) on the desktop.

The Vivado HLS tool opens to the Welcome window. From the Welcome window you can create a new project, open examples, and access documentation and examples.



**Figure 1-3: Vivado HLS Welcome Screen**

Here you will learn to create a new Vivado HLS project from scratch.

### 1-2. Create a Vivado HLS project named **dct\_prj**.

- 1-2-1. From the Welcome Page, click **Create New Project**.



Figure 1-4: Creating a New Vivado HLS Project

### 1-3. The Project Configuration dialog box asks for a project name and location.

- 1-3-1. Enter **dct\_prj** in the Project name field (1).
- 1-3-2. Enter **C:\training\hls\labs\hls\_tool\_flow\zc702\dct** in the Location field (2).



Figure 1-5: Configuring a New HLS Project

- 1-3-3. Click **Next** (3).

**1-4. The Add/Remove Files dialog box opens. Here you will be invited to add existing files or create new sources.**

**1-4-1. Click Add Files.**

The Open File dialog box opens.

**Note:** If do not have existing files at this moment and you want to create new ones, click **New File**.

**1-4-2. Browse to C:\training\hls\labs\hls\_tool\_flow\zc702\dct.**

**1-4-3. Select **dct.c**.**

The Vivado HLS tool automatically adds the working directory (project directory) and any directory that contains C files added to the project to the search path. Hence, header files that reside in these directories are automatically included in the project (no need to explicitly specify them). You must specify the path to all other header files (if any) by clicking the Edit CFLAGS button.

**1-4-4. Click **Open** to add these files.**

Note that you can add compiler directives specific to each entry at this point.

**1-4-5. Click **Browse** next to the Top Function field.**

The Select Top function dialog box opens, which lists all the functions available from the specified source files.

**1-4-6. Select **dct (dct.c)** from the list.**

**Note:** You can also manually enter the name of the top function in the Top Function field.



**Figure 1-6: Adding Files to a New Vivado HLS Project**

**1-4-7. Click **Next**.**

**1-5. Add any existing testbench files.**

If you have (or want) any testbench files they can be entered here.  
Sometimes the testbench is built into the synthesizable file.

**1-5-1.** Click **Add Files**.

**1-5-2.** Navigate to *C:\training\hls\labs\hls\_tool\_flow\zc702\dct*.

**1-5-3.** Select **dct\_test.c, in.dat and out.golden.dat**.

**1-5-4.** Click **Open** to add these files.

**1-5-5.** Click **Next**.

**1-6. Finally, it is time to specify some of the physical parameters of the design.**

**1-6-1.** By default, **solution1** is populated in the Solution Name field.

No changes are required.

**1-6-2.** Set the clock period to **10**.

You can leave the Uncertainty field blank.

**1-6-3.** Click the **Browse** button to select a part or board.



Figure 1-7: Locating the Board Browse Button

- 1-6-4.** Click **Boards** as shown below.  
**1-6-5.** Enter **zynq** in the Search field.



**Figure 1-8: Filtering to Quickly Locate Target Platforms**

- 1-6-6.** Select **Zynq ZC702 Evaluation Board** from the search list.  
**1-6-7.** Click **OK** to select the board.  
**1-6-8.** Click **Finish**.

You will see the created project in the Explorer tab.



**Figure 1-9: Vivado HLS with Newly Created Project**

The Vivado HLS tool GUI consists of various panes to proceed with the development work. You will see the created project in the Explorer view. Expand various sub-folders to see the entries under each sub-folder. The Source folder consists of source files associated with the project and the Test Bench folder consists of testbench files associated with the project.

## Running C Simulation

## Step 2

After creating the project, the next step is to validate the design before proceeding with synthesizing the design. In this step, you will validate the design by using the provided self-checking C testbench.

### 2-1. Become familiar with the provided source and testbench files.

**2-1-1.** In the Project Explorer pane, expand the **Source** folder.

**2-1-2.** Double-click **dct.c** to open the file.

This will open the source file in the Information pane.

**2-1-3.** Become familiar with the code and data structures.

**2-1-4.** In the Project Explorer pane, expand the **Includes > C:\training\hls\labs\hls\_tool\_flow\zc702\dct** folder.

**2-1-5.** Double-click **dct.h** to open the header file.

**2-1-6.** Review the contents of the header file.

**2-1-7.** In the Project Explorer pane, expand the **Test Bench** folder.

**2-1-8.** Double-click **dct\_test.c** to open it in the Information pane.

This testbench is a self-checking testbench; i.e., the computed output is compared against a reference golden output and returns either pass or fail.

### 2-2. Simulate the Vivado HLS tool design.

**2-2-1.** Select **Project > Run C Simulation** or click the **Run C Simulation** icon (▶).



Figure 1-10: Launching the C Simulation

The Run C Simulation dialog box opens.



**Figure 1-11: C Simulation Dialog Box**

Each of the options controls how simulation is run:

- Launch Debugger: After compilation the debug perspective automatically opens for you to step through the code.
- Build Only: Compile the object/netlist files but do not execute.
- Clean Build: Remove previously compiled files before compiling.
- Optimizing Compile: Use the gcc/g++ -O option (no debug info and this is mutually exclusive with debug options; this may run faster, but the difference is not substantial).

**2-2-2. Select Default Options (i.e. select nothing).**

**2-2-3. Click OK.**

The simulation log will be displayed in the editor pane.

## 2-3. View the simulation report.

**The information generated by the Vivado HLS tool can be found in two places, both described here.**

**The first is the Console window, which reports not only the output produced by the code being simulated, but all of the simulation engine messages as well. The simulation log provides only a few simulation engine messages and the simulated code output.**

- 2-3-1. Select the **Console** tab in the lower portion of the tool's GUI.

You may need to scroll to view all the output produced by the simulation.



```

Console X Errors Warnings
Vivado HLS Console
Compiling(apcc) ../../../../../../dct.c in debug mode
INFO: [HLS 200-10] Running 'c:/Xilinx/Vivado_HLS/2016.2/bin/unwrapped/win64.o/apcc.exe'
INFO: [HLS 200-10] For user 'jayesh' on host 'whd(jayesh)' (Windows NT_amd64 version 6.1) on Thu Jun 16 1:
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/hls_tool_flow/zc702/dct/dct_prj/solution1/csim/build'
INFO: [APCC 202-3] Tmp directory is apcc_db
INFO: [APCC 202-1] APCC is done.
Generating csim.exe
*** *** *** ***
Results are good
*** *** *** ***
INFO: [SIM 211-1] CSim done with 0 errors.
Finished C simulation.

```

**Figure 1-12: Example Output After a Simulation**

The other location, described below, provides only a few simulation engine messages and the simulated code output. Typically this is opened after the simulation completes; however, if you need to access it after closing the log pane, here's how to access the simulation report.

- 2-3-2. Expand **dct\_prj > solution1 > csim > report** in the Explorer pane.

- 2-3-3. Double-click the log file name to open it in the editor pane.



**Figure 1-13: Locating the Simulation Log File**

You should see the *Results are good* message in the simulation log file and in the console area. If you do not see this message, ask for help from your instructor.

## Synthesizing the Design

## Step 3

In this step, you will synthesize the design by using Vivado HLS tool defaults and analyze how many resources are utilized to implement the C design.

### 3-1. Synthesize the design.

- 3-1-1. Select **Solution > Run C Synthesis > Active Solution** or click the **Run Synthesis** icon in the menu bar.



Figure 1-14: Launching Synthesis

This option synthesizes the currently selected solution.

All solutions (or selected solutions) can be synthesized by using the drop-down menu next to the synthesis icon. You can synthesize all solutions or synthesize selected solutions in addition to the default.



Figure 1-15: Options for What to Synthesize

When the synthesis completes, the Synthesis report will be displayed in the Information pane.



**Figure 1-16: Synthesis Report**

The Synthesis report shows the performance and area estimates as well as estimated latency in the design.

- 3-1-2.** In the Outline pane, click **Performance Estimates** and **Utilization Estimates** to answer the following question.

### Question 1

Write down the following details from the Synthesis report:

- Estimated clock frequency:
- Worst case latency:
- Number of DSP48E used:
- Number of FFs used:
- Number of LUTs used:

- 3-1-3.** In the Outline pane, select **Interface > Summary**.

The report also shows the top-level interface signals generated by the tools.

| Interface         |     |      |            |               |              |
|-------------------|-----|------|------------|---------------|--------------|
| Summary           |     |      |            |               |              |
| RTL Ports         | Dir | Bits | Protocol   | Source Object | C Type       |
| ap_clk            | in  | 1    | ap_ctrl_hs | dct           | return value |
| ap_rst            | in  | 1    | ap_ctrl_hs | dct           | return value |
| ap_start          | in  | 1    | ap_ctrl_hs | dct           | return value |
| ap_done           | out | 1    | ap_ctrl_hs | dct           | return value |
| ap_idle           | out | 1    | ap_ctrl_hs | dct           | return value |
| ap_ready          | out | 1    | ap_ctrl_hs | dct           | return value |
| input_r_address0  | out | 6    | ap_memory  | input_r       | array        |
| input_r_ce0       | out | 1    | ap_memory  | input_r       | array        |
| input_r_q0        | in  | 16   | ap_memory  | input_r       | array        |
| output_r_address0 | out | 6    | ap_memory  | output_r      | array        |
| output_r_ce0      | out | 1    | ap_memory  | output_r      | array        |
| output_r_we0      | out | 1    | ap_memory  | output_r      | array        |
| output_r_d0       | out | 16   | ap_memory  | output_r      | array        |

**Figure 1-17: Generated Interface Signals**

You can see that *ap\_clk* and *ap\_rst* are automatically added. *ap\_start*, *ap\_done*, and *ap\_idle* are top-level signals used as handshaking signals to indicate when the design is able to accept the next computation command (*ap\_idle*), when the next computation is started (*ap\_start*), and when the computation is completed (*ap\_done*). Other signals are generated based on the design itself.

### 3-1-4. Select the **Console** tab.

The Synthesis log is available in the Vivado HLS Console.

Note that when the *Solution1 > Syn* folder is expanded in the Explorer view, it will show the *report*, *systemc*, *verilog*, and *vhdl* sub-folders under which the report files and generated source files (VHDL, Verilog, header, and cpp) are available. Double-clicking any of these entries will open the corresponding file in the Information pane.

Also note that the target design has hierarchical functions, and reports corresponding to lower-level functions are also created (in this example *dct\_dct\_1d2\_csynth.rpt* and *dct\_dct2d\_csynth.rpt* in addition to **dct\_csynth.rpt**). By default, the report for the top-level function is displayed in the Information pane once synthesis is completed.

## Analyzing the Design Using the Analysis Perspective

**Step 4**

The Analysis perspective is used after synthesis completes for analyzing the design in detail. This perspective provides considerably more details than the synthesis report.

### 4-1. Switch to the Analysis perspective and understand the design behavior.

#### 4-1-1. Select **Solution > Open Analysis Perspective** or click the **Analysis** perspective icon

( ) to open the analysis viewer.

The Analysis perspective consists of five panes as shown below. Note that the module and loops hierarchies are displayed unexpanded by default.

The Module Hierarchy pane provides an overview of the entire RTL design:

- This view can navigate throughout the design hierarchy.
- The Module Hierarchy pane shows the resources and latency contribution for each block in the RTL hierarchy.

The Module Hierarchy pane shows both the performance and area information for the entire design and can be used to navigate through the hierarchy. The Performance Profile pane is visible and shows the performance details for this level of hierarchy. The information in these two panes is similar to the information reviewed earlier in the synthesis report.

The Performance view [Performance(solution1)] is also shown in the right-hand side pane. This view shows how the operations in this particular block are scheduled into clock cycles.

- The left-hand column lists the resources.
- The top row lists the control states (C0 to C5) in the design. Control states are the internal states used by the Vivado HLS tool to schedule operations into clock cycles. There is a close correlation between the control states in the RTL FSM, but there is no one-to-one mapping.



Figure 1-18: Analysis Perspective Panes

## 4-2. Analyze the performance of the dct module.

- 4-2-1. Click the loop **RD\_Loop\_Row** and click the sub-loop **RD\_Loop\_Col**.
- 4-2-2. Similarly, click **WR\_Loop\_Row** and click the sub-loop **WR\_Loop\_Col**.



Figure 1-19: Performance of the dct Loop Operation

The dct module has three main resources:

- o A loop called **RD\_Loop\_Row**.
- o A sub-block called **dct\_2d**.
- o A loop called **WR\_Loop\_Row**.

The information presented in the Schedule view is explained below by reviewing the first set of resources to be execute: the **RD\_Loop\_Row** loop:

- o The design starts in the C0 state.
- o It then starts to execute the logic in the loop **RD\_Loop\_Row**.
  - **Note:** In the first state of the loop, the exit condition is checked and there is an add operation.
- o The loop executes over three states: C1, C2 and C3.

**4-2-3.** Review the Performance Profile pane information.



**Figure 1-20: Performance Profile Pane**

The Performance Profile pane shows this loop has a tripcount of eight. It therefore iterates around these three states eight times.

The Performance Profile pane shows that the loop **RD\_Loop\_Row** takes 144 clock cycles to execute:

- o One cycle at the start of the loop *RD\_Loop\_Row*.
- o It takes **16** clock cycles to execute all operations of the loop **RD\_Loop\_Col**.
- o Plus a clock cycle to return to the start of the loop **RD\_Loop\_Row** for a total of **18** cycles per loop iteration.
- o Eight iterations of 18 cycles is why it takes 144 clock cycles to complete.

Within the loop **RD\_Loop\_Col** you can see there are some adders, a two-cycle read operation and a write operation.

**4-2-4.** Select the purple color block for the tmp\_14(+) operation (line no. 11) as shown below.

**4-2-5.** Right-click and select **Goto Source**.

Alternatively, you can double-click the purple to open the source code.

**4-2-6.** Select **read\_data** at line no. 62 and click **OK**.



**Figure 1-21: C Source Code Correlation**

You can see that the write operation is implementing the writing of data into the `buf` array from the `input` array variable.

### 4-3. Analyze the resource usage for the design.

#### 4-3-1. Select the **Resource Profile** pane and the **Resource** tab as shown below.



**Figure 1-22: Analysis Perspective – Resource Usage**

The Resource Profile pane shows the resources used at this level of hierarchy. In this design, you can see that the most of the resources are due to the instances—blocks that are instantiated inside this block.

By expanding the **Expressions** category, you can see that most of the resources at this level of hierarchy are used to implement adders.

The Resource pane shows the control state of the operations used. In this design, all the adder operations are associated with a different adder resource. There is no sharing of the adders. More than one add operation on each horizontal line indicates that the same resource is used multiple times in different states or clock cycles.

The adders are used in the same cycles that are memory accessed and are dedicated to each memory. Cross correlation with the C code can be used to confirm.

#### 4-3-2. Click the **Synthesis** perspective icon ( ) to return to the synthesis view.

## Performing C/RTL Co-simulation

**Step 5**

Now that you have performed high-level synthesis on the C design, you will perform RTL co-simulation on the generated RTL using the C testbench.

Run C/RTL co-simulation, selecting Verilog and skipping VHDL. Verify that the simulation passes.

### 5-1. Cosimulate the Vivado HLS tool design.

- 5-1-1. Select **Solution > Run C/RTL Cosimulation** or click the **Run C/RTL Cosimulation** icon (checkbox).



Figure 1-23: Launching from the Menu

The Run C/RTL Co-simulation dialog box opens.



Figure 1-24: Co-simulation Dialog Box

Each of the options controls how C/RTL Cosimulation is run:

- RTL Selection: Select the RTL that is simulated (Verilog/VHDL).
- Setup Only: This creates all the files (wrappers, adapters and scripts) required to run the simulation but does not execute the simulator.
- Dump Trace: During RTL verification, the trace files can be saved and viewed using an appropriate viewer. By selecting this option, the trace file will be saved to the `<solution>/sim/<RTL>` folder.
- Optimizing Compile: This ensures a high level of optimization is used to compile the C testbench. Using this option increases the compile time but the simulation executes faster.

**5-1-2. Select **Default Options** (i.e. select nothing).**

**5-1-3. Click **OK**.**

The simulation log will be displayed in the editor pane.

**5-2. View the Cosimulation report.**

**The information generated by the Vivado HLS tool can be found in two places, both described here.**

**The first is the Console window, which reports not only the output produced by the code being simulated, but all of the simulation engine messages as well. The simulation log provides only a few simulation engine messages and the simulated code output.**

**5-2-1. Select the **Console** tab in the lower portion of the tool's GUI.**

You may need to scroll to view all the output produced by the cosimulation.



```

Vivado HLS Console
// Intra-Transaction Progress: Measured Latency / Latency Estimation * 100%
// RTL Simulation : 0 / 601 [0.00%] @ "125000"
// RTL Simulation : 601 / 601 [100.00%] @ "498965000"
$finish called at time : 499005 ns : File "C:/training/
## quit
INFO: [Common 17-206] Exiting xsim at ...
INFO: [COSIM 212-316] Starting C post checking ...
Comparing against output data
*****
PASS: The output matches the golden output!
*****
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
Finished C/RTL cosimulation.
|
```

**Figure 1-25: Example Output After a C/RTL Co-simulation**

The other location, described below, provides only a few simulation engine messages and the simulated code output. Typically this is opened after the simulation completes; however, if you need to access it after closing the log pane, here's how to access the simulation report.

**5-2-2.** Expand **dct\_prj > solution1 > sim > report** in the Explorer pane.

**5-2-3.** Double-click the log file name to open it in the editor pane.



Figure 1-26: Locating the Co-Simulation Log File

The Cosimulation Report in HTML format will be displayed in the main viewing area.



Figure 1-27: Cosimulation Report – HTML

**5-2-4.** You can quickly verify the cosimulation status here.

This process will take a few minutes to complete. After C/RTL cosimulation has been completed, the Cosimulation report will be accessible in the Information pane, including the latency information.

Also, in the Console tab, notice the *Results are Good* message that is displayed.

## Exporting the RTL as an IP Core

## Step 6

In this step you will export the RTL as an IP core to be used with the top-level design.

### 6-1. Export the RTL, selecting Verilog as the language.

- 6-1-1. Select **Solution > Export RTL**.

Alternatively, you can click the  toolbar button.

- 6-1-2. Ensure that **IP Catalog** is selected from the Format Selection drop-down list.



Figure 1-28: Export RTL Dialog Box

- 6-1-3. Click **Configuration** next to the Format Selection drop-down list.

Notice that you can provide information about the IP, such as the vendor, library, version, and description in the IP Identification dialog box.



Figure 1-29: IP Identification Dialog Box

- 6-1-4.** Click **Cancel** in the IP Identification dialog box.
- 6-1-5.** Make certain that the **Evaluate** option is selected and that **Verilog** is selected as the RTL to be evaluated.

This will perform Vivado RTL synthesis and implementation on the generated IP. Implementation is run to evaluate and provide confidence that the RTL will meet its estimated timing and area goals and that these results are not included as part of the exported package.

- 6-1-6.** Click **OK** in the Export RTL dialog box.

You can observe the progress in the Console tab. When the run is complete, the Implementation Report is displayed in the Information pane.



**Figure 1-30: Implementation Results in the Vivado HLS Tool**

Once implementation completes, the Implementation report will open in the Information pane. The final timing of the implemented design has been achieved.

The exported IP is available in the `<solution_directory>\impl\ip` folder.

- 6-1-7.** Select **File > Exit** to close the Vivado HLS tool.

## Summary

In this lab, you learned how to create a new Vivado HLS tool project in the GUI and execute major steps (simulate, synthesize, co-simulate and export) in the HLS Vivado Design Suite flow.

In the following labs, you will examine some of the software reports, determine how the design was implemented, and determine whether or not design goals for area and performance were met.

## Answers

1. Write down the following details from the Synthesis report:

Estimated clock frequency: 6.38  
Worst case latency: 3959  
Number of DSP48E used: 1  
Number of FFs used: 278  
Number of LUTs used: 353

# Lab 2: Introduction to the Vivado HLS Tool CLI Flow

2016.1

## Abstract

This lab introduces how to perform basic actions in the Vivado® HLS command prompt.

This lab should take approximately 30 minutes.

## Objectives

After completing this lab, you will be able to:

- Create a new project in the Vivado HLS tool CLI
- Simulate a C design by using a self-checking testbench
- Synthesize the design
- Simulate an RTL design by using a C testbench
- Implement the design

## Introduction

This lab provides an introduction to the major features of the Vivado® High-Level Synthesis (HLS) tool Command Line Interface (CLI) flow. You will use the Vivado HLS tool in CLI mode to create a project. You will also simulate, synthesize, and implement the design provided.

In this lab, you will be using a C design to implement a DCT. The function implements a 2D DCT algorithm by first processing each row of the input array via a 1D DCT, then processing the columns of the resulting array through the same 1D DCT. It calls the `read_data`, `dct_2d`, and `write_data` functions.

The `read_data` function is defined at line 54 and consists of two loops: `RD_Loop_Row` and `RD_Loop_Col`. The `write_data` function is defined at line 66 and consists of two loops to perform writing the result. The `dct_2d` function, defined at line 23, calls the `dct_1d` function and performs transpose.

Finally, the `dct_1d` function, defined at line 4, uses the `dct_coeff_table` and performs the required function by implementing a basic iterative form of the 1D Type-II DCT algorithm.

The following figure shows the function hierarchy on the left-hand side, the loops in the order they are executed, and the flow of data on the right-hand side.



**Figure 2-1: Design Hierarchy and Dataflow**

## General Flow



## Creating a Vivado HLS Tool Project

## Step 1

In this step, you will create a new Vivado HLS tool project, add source files, and provide solution settings for the default solution using the Vivado HLS tool command prompt. Later, you will open the created project in the Vivado HLS tool GUI to have a quick review of the settings were applied.

- 1-1. Launch the Vivado HLS tool command prompt and change the directory to the *hls\_cli\_flow* working directory.**
- 1-1-1.** Select **Start > All Programs > Xilinx Design Tools > Vivado 2016.1 > Vivado HLS > Vivado HLS 2016.1 Command Prompt** to launch the Vivado HLS tool command prompt.



The screenshot shows a Windows Command Prompt window titled "C:\ Vivado HLS 2016.1 Command Prompt". The window displays the Vivado HLS Command Prompt interface, which includes a list of available commands: vivado\_hls, apcc, gcc, g++, make. It also shows the Microsoft Windows [Version 6.1.7601] copyright information and the current directory C:\Xilinx\Vivado\_HLS\2016.1>.

Figure 2-2: Vivado HLS Command Prompt

- 1-1-2.** Enter the following command to change the directory to the current working directory:

```
cd C:\training\hls\labs\hls_cli_flow\dct\kintex7
```



The screenshot shows the same Command Prompt window after the "cd" command was executed. The current directory is now C:\training\hls\labs\hls\_cli\_flow\dct\kintex7>, indicating the change in working directory.

Figure 2-3: Changing the Directory

**1-2. Write the commands to create a new project, associate source files, and configure solution settings.**

- 1-2-1.** Using Windows Explorer, browse to the `C:\training\hls\labs\hls_cli_flow\dct\kintex7` directory and open **dct.tcl** with your preferred text editor.

- 1-2-2.** Insert the following command at line no. 2 of *dct.tcl* to create a new project named *dct\_prj*:

```
open_project -reset dct_prj
```

**Question 1**

What does the `-reset` switch do in the previous command?

---

---

---

- 1-2-3.** Enter the following command at line no. 4 to specify *dct* as a top-level function to be synthesized in the design:

```
set_top dct
```

- 1-2-4.** Enter the following command to add *dct.c* as a design source file to the project:

```
add_files dct.c
```

**Question 2**

How does the above command differ from adding testbench files to the project?

---

---

---

- 1-2-5.** Enter the following commands to associate *dct\_test.c*, *in.dat*, and *out.golden.dat* as testbench source files to the project:

```
add_files -tb dct_test.c  
add_files -tb in.dat  
add_files -tb out.golden.dat
```

- 1-2-6.** Enter the following command at line no.12 to create the solution to the project named *solution1*:

```
open_solution solution1 -reset
```

- 1-2-7.** Insert the command at line no.14 to associate the xc7k70tfg484-2 device to the solution:

```
set_part {xc7k70tfg484-2}
```

- 1-2-8.** Insert the command at line no.16 to associate the clock with a 10ns period to solution1:

```
create_clock -period 10ns
```

After writing the above commands, the *dct.tcl* file should look like the figure below.



```
1 # Insert the command to create new project
2 open_project -reset dct_prj
3 # Insert the command to specify the top-level function
4 set_top dct
5 # Insert the command to add design file named dct.c
6 add_files dct.c
7 # Insert the command to add testbench files named dct_test.c, in.dat and out.golden.dat
8 add_files -tb dct_test.c
9 add_files -tb in.dat
10 add_files -tb out.golden.dat
11 # Insert the command to create the solution named solution1
12 open_solution solution1 -reset
13 # Insert the command to associate xc7k70tfg484-2 device to the solution1
14 set_part {xc7k70tfg484-2}
15 # Insert the command to associate clock with 10ns period to the solution1
16 create_clock -period 10ns
17 # Insert the command to run C simulation
18
19 # Insert the command to Synthesize the design
20
21 # Insert the command to perform C/RTL Cosimulation
22
23 # Insert the command to Export RTL as IP
```

**Figure 2-4: dct.tcl with Project Information**

- 1-2-9.** Save the **dct.tcl** file.

- 1-2-10.** Exit the text editor to close the Tcl file.

**1-3. Run the *dct.tcl* file to create the project and open the created project in the Vivado HLS tool GUI.**

- 1-3-1.** In the Vivado HLS tool command prompt, enter the following command to run the *dct.tcl* file:

```
vivado_hls -f dct.tcl
```

```
Vivado HLS 201 Command Prompt - vivado_hls -f dct.tcl
=====
== Vivado HLS Command Prompt
== Available commands:
== vivado_hls,apcc,gcc,g++,make
=====
Microsoft Windows [Version 6.1.7601]
Copyright <C> 2009 Microsoft Corporation. All rights reserved.

C:\Xilinx\Vivado_HLS\201>cd C:\training\hls\labs\hls_cli_flow\dct\kintex7
C:\training\hls\labs\hls_cli_flow\dct\kintex7>vivado_hls -f dct.tcl
=====
Vivado(TM) HLS - High-Level Synthesis from C, C++ and SystemC
Version 2016.1
Build on
Copyright <C> 1986-2016 Xilinx, Inc. All Rights Reserved.
=====
INFO: [HLS 200-10] Running 'C:/Xilinx/Vivado_HLS/201/bin/unwrapped/win64.o/vivado_hls.exe'
INFO: [HLS 200-10] For user '...' on host '...' (Windows NT_amd64 v
ersion 6.1) on
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/hls_cli_flow/dct/kintex7'
INFO: [HLS 200-10] Creating and opening project 'C:/training/hls/labs/hls_cli_f
low/dct/kintex7/dct_prj'.
INFO: [HLS 200-10] Adding design file 'dct.c' to the project
INFO: [HLS 200-10] Adding test bench file 'dct_test.c' to the project
INFO: [HLS 200-10] Adding test bench file 'in.dat' to the project
INFO: [HLS 200-10] Adding test bench file 'out.golden.dat' to the project
INFO: [HLS 200-10] Creating and opening solution 'C:/training/hls/labs/hls_cli_f
low/dct/kintex7/dct_prj/solution1'.
INFO: [HLS 200-10] Cleaning up the solution database.
INFO: [HLS 200-10] Setting target device to 'xc7k70tfbg484-2'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 10ns.
vivado_hls>
```

Figure 2-5: Creating the Project

- 1-3-2.** Enter the following command to launch the GUI with the recently created project:

```
vivado_hls -p dct_prj
```

```
Vivado HLS 201 Command Prompt - vivado_hls -f dct.tcl
vivado_hls> vivado_hls -p dct_prj
INFO: [HLS 200-10] Running 'C:/Xilinx/VIVADO/201/bin/unwrapped/win64.o/viva
do_hls.exe'
INFO: [HLS 200-10] For user '...' on host '...' (Windows NT_amd64 v
ersion 6.1) on
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/hls_cli_flow/dct/kintex7'
INFO: [HLS 200-10] Bringing up Vivado HLS GUI ...
```

Figure 2-6: Launching the GUI from the CLI

You should see the created project in the Explorer view.



Figure 2-7: Vivado HLS GUI with Recently Created Project

- 1-3-3.** In the Vivado HLS tool GUI, select **Project > Project Settings**.

The Project Settings dialog box opens.

- 1-3-4.** Select **Simulation** and **Synthesis** to ensure that the design and testbench sources were added to the project as entered in the Tcl file.

- 1-3-5.** Close the Project Settings dialog box once the source files are verified.

- 1-3-6.** In the Vivado HLS tool GUI, select **Solution > Solution Settings**.

The Solution Settings dialog box for the active solution, i.e. *Solution1*, the only solution that exists in this current project, opens.

- 1-3-7.** Select **Synthesis** to ensure that the clock frequency and devices settings were the ones described in the Tcl file.

- 1-3-8.** Close the Solution Settings dialog box once the clock frequency and device settings are verified.

- 1-3-9.** In the Vivado HLS tool GUI, select **File > Exit** to exit the GUI.

## Running C Simulation and Synthesizing the Design

## Step 2

In this step, you will run C simulation and synthesize the design from the Vivado HLS tool command prompt and then open the Vivado HLS tool GUI to review the project status and Synthesis report.

### 2-1. Simulate the design.

- 2-1-1.** In the Vivado HLS tool command prompt, enter the following command to run C simulation:

```
csim_design
```

This command compiles and runs pre-synthesis C simulation of the design using the provided C testbench.

```
vivado_hls> csim_design
  Compiling(apcc) ../../../../dct_test.c in debug mode
INFO: [HLS 200-10] Running 'c:/Xilinx/Vivado_HLS/201/bin/unwrapped/win64.o/apcc.exe'
INFO: [HLS 200-10] For user '...' on host '...' (Windows NT_amd64 version 6.1) on
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/hls_cli_flow/dct/kintex7/dct_prj/solution1/csim/build'
INFO: [APCC 202-31] Tmp directory is C:/tmp/apcc_db_.../97721738812564
INFO: [APCC 202-11] APCC is done.
  Compiling(apcc) ../../../../dct.c in debug mode
INFO: [HLS 200-10] Running 'c:/Xilinx/Vivado_HLS/201/bin/unwrapped/win64.o/apcc.exe'
INFO: [HLS 200-10] For user '...' on host '...' (Windows NT_amd64 version 6.1)
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/hls_cli_flow/dct/kintex7/dct_prj/solution1/csim/build'
INFO: [APCC 202-31] Tmp directory is C:/tmp/apcc_db_.../147921742536347
INFO: [APCC 202-11] APCC is done.
  Generating csim.exe
*** *** ***
Results are good
*** *** ***
INFO: [ISIM 211-1] CSim done with 0 errors.
vivado_hls> _
```

Figure 2-8: Running C Simulation from the Command Prompt

### 2-2. Synthesize the design.

- 2-2-1.** In the Vivado HLS tool command prompt, enter following command to synthesize the design:

```
csynth_design
```

This will elaborate and perform high-level synthesis on the source files added to the project. This also analyzes the sources files, validates the directives if they are present, and performs some initial code transformations.

```

INFO: [HLS 200-10] --
INFO: [HLS 200-10] -- Generating RTL for module 'dct_dct_2d'
INFO: [HLS 200-10] --
INFO: [RTGEN 206-100] Finished creating RTL model for 'dct_dct_2d'.
INFO: [HLS 200-111] Elapsed time: 0.181 seconds; current memory usage: 95.1 MB.
INFO: [HLS 200-10] --
INFO: [HLS 200-10] -- Generating RTL for module 'dct'
INFO: [HLS 200-10] --
INFO: [RTGEN 206-500] Setting interface mode on port 'dct/input_r' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on port 'dct/output_r' to 'ap_memory'.
INFO: [RTGEN 206-500] Setting interface mode on function 'dct' to 'ap_ctrl_hs'.
INFO: [RTGEN 206-100] Finished creating RTL model for 'dct'.
INFO: [HLS 200-111] Elapsed time: 0.241 seconds; current memory usage: 95.8 MB.
INFO: [RIMG 210-279] Implementing memory 'dct_dct_1d2_dct_coeff_table_rom' using distributed ROMs.
INFO: [RIMG 210-278] Implementing memory 'dct_dct_2d_row_outbuf_ram' using block RAMs.
INFO: [HLS 200-10] Finished generating all RTL models.
INFO: [SYSC 207-301] Generating SystemC RTL for dct.
INFO: [UHDL 208-304] Generating VHDL RTL for dct.
INFO: [VLOG 209-307] Generating Verilog RTL for dct.
vivado_hls>

```

Figure 2-9: Performing Synthesis from the Command Prompt

### 2-3. Verify the Synthesis report in the Vivado HLS tool GUI.

**2-3-1.** Enter the following command to launch the GUI:

```
vivado_hls -p dct_prj
```

**2-3-2.** If the Synthesis report does not open automatically in the Vivado HLS tool GUI, select **Solution > Open Report > Synthesis**.

**2-3-3.** Analyze the report file.

#### Question 3

Write down the following details from the Synthesis report:

- Estimated clock frequency:
- Worst case latency:
- Number of DSP48E used:
- Number of FFs used:
- Number of LUTs used:

**2-3-4.** In the Outline pane, select **Interface > Summary**.

The report also shows the top-level interface signals generated by the tools.

**2-3-5.** In the Vivado HLS tool GUI, select **File > Exit** to exit the GUI.

## Co-simulating and Exporting the RTL

## Step 3

In this step, you will perform C/RTL co-simulation on the generated RTL files by using the C testbench from the Vivado HLS tool command prompt. Also, you will generate the IP from the Vivado HLS tool command prompt.

### 3-1. Perform C/RTL co-simulation.

- 3-1-1.** In the Vivado HLS tool command prompt, enter the following command to execute post-synthesis co-simulation:

```
cosim_design
```

```
C:\ Vivado HLS 201  Command Prompt - vivado_hls -f dct.tcl
***** xsim v201  <64-bit>
**** SW Build  on
**** IP Build  on
** Copyright  Xilinx, Inc. All Rights Reserved.

source xsim.dir/dct/xsim_script.tcl
# xsim {dct} -autoloadwcfg -tclbatch {dct.tcl}
Vivado Simulator 201
Time resolution is 1 ps
source dct.tcl
## run all
///
// Inter-Transaction Progress: Completed Transaction / Total Transaction
// Intra-Transaction Progress: Measured Latency / Latency Estimation * 100%
///

// RTL Simulation : 0 / 1 [0.00%] @ "125000"
// RTL Simulation : 1 / 1 [100.00%] @ "29495000"
///

$finish called at time : 29535 ns : File "C:/training/hls/labs/hls_cli_flow/dct/kintex7/dct_prj/solution1/sim/verilog/dct.autotb.v" Line 264
## quit
INFO: [Common 17-206] Exiting xsim at ...
INFO: [COSIM 212-316] Starting C post checking ...
*** *** ***
Results are good
*** *** ***
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
vivado_hls> _
```

Figure 2-10: C/RTL Co-simulation

Notice that the message **\*\*\* C/RTL co-simulation finished: PASS \*\*\*** is displayed.

### 3-2. Export the design to IP.

- 3-2-1.** Enter the following command to perform an RTL implementation for Verilog HDL:

```
export_design -evaluate verilog -format ipxact
```

```

C:\ Vivado HLS 201  Command Prompt - vivado_hls -f dct.tcl
report_utilization: Time (s): cpu = 00:00:00 ; elapsed = 00:00:00.058 . Memory (MB): peak = 1012.711 ; gain = 0.000

Implementation tool: Xilinx Vivado v.201
Project:          dct_prj
Solution:         solution1
Device target:   xc7k70tfbg484-2
Report date:

===== Resource usage ===
SLICE:
LUT:
FF:
DSP:
BRAM:
SRL:
===== Final timing ===
CP required:
CP achieved:
Timing met
INFO: [Common 17-206] Exiting Vivado at ...
vivado_hls> _

```

**Figure 2-11: Exporting Synthesized RTL as an IP**

**Note:** The `-evaluate` option will perform Vivado RTL synthesis and implementation on the generated IP. Implementation is run to evaluate and provide confidence that the RTL will meet its estimated timing and area goals and that these results are not included as part of the exported package.

### 3-3. Verify the results in the Vivado HLS tool GUI.

- 3-3-1.** Enter the following command to launch the GUI:

```
vivado_hls -p dct_prj
```

- 3-3-2.** In the Vivado HLS tool GUI, select **Solution > Open Report > Co-simulation** to open the RTL Simulation report.  
**3-3-3.** Select **Solution > Open Report > Export RTL** to open the Export RTL report.  
**3-3-4.** Analyze the reports and close the Vivado HLS tool GUI.  
**3-3-5.** Enter the following command to close the Vivado HLS tool command prompt:

```
exit
```

## Summary

In this lab, you learned how to use the Vivado HLS tool command prompt to:

- Create the Vivado HLS tool project
- Create a solution with the desired settings
- Execute major actions in the Vivado HLS (simulate, synthesize, cosimulate and export the RTL of the design) design flow

## Answers

1. What does the –reset switch do in the previous command?

This option resets the project by removing any project data that already exists. Later if you wish to run the project again, this option helps reset the project data and creates a fresh one.

2. How does the above command differ from adding testbench files to the project?

To add the testbench source, the "-tb" switch should be added to the `add_files` command.

3. Write down the following details from the Synthesis report:

Estimated clock frequency: 7.48

Worst case latency: 2935

Number of DSP48E used: 1

Number of FFs used: 246

Number of LUTs used: 353

# Lab 3: Interface Synthesis

2016.1

## Abstract

This lab demonstrates how to implement I/O ports and protocols using high-level synthesis.

This lab should take approximately 45 minutes.

## Objectives

After completing this lab, you will be able to:

- Identify the default I/O protocol for ports
- Select the appropriate I/O protocol
- Add directives to select the interface type

## Introduction

This lab illustrates how RTL ports are added to a C design. Interface synthesis assigns a protocol to each of these ports. This synchronizes data transfer with the synthesized logic.

|                                       | Argument Type  | Scalar |        | Array |     |   | Pointer or Reference |     |   | HLS:: Stream |
|---------------------------------------|----------------|--------|--------|-------|-----|---|----------------------|-----|---|--------------|
|                                       |                | Input  | Return | I     | I/O | O | I                    | I/O | O |              |
| Block-Level Protocol                  | Interface Mode |        |        |       |     |   |                      |     |   |              |
|                                       | ap_ctrl_none   |        |        |       |     |   |                      |     |   |              |
|                                       | ap_ctrl_hs     |        | D      |       |     |   |                      |     |   |              |
|                                       | ap_ctrl_chain  |        |        |       |     |   |                      |     |   |              |
| AXI Interface Protocol                | axis           |        |        |       |     |   |                      |     |   |              |
|                                       | s_axilite      |        |        |       |     |   |                      |     |   |              |
|                                       | m_axi          |        |        |       |     |   |                      |     |   |              |
| No I/O Protocol                       | ap_none        | D      |        |       |     |   | D                    |     |   |              |
|                                       | ap_stable      |        |        |       |     |   |                      |     |   |              |
| Wire Handshake Protocol               | ap_ack         |        |        |       |     |   |                      |     |   |              |
|                                       | ap_vld         |        |        |       |     |   |                      | D   |   |              |
|                                       | ap_ovid        |        |        |       |     |   |                      | D   |   |              |
|                                       | ap_hs          |        |        |       |     |   |                      |     |   |              |
| Memory Interface Protocol: RAM : FIFO | ap_memory      |        |        | D     | D   | D |                      |     |   |              |
|                                       | bram           |        |        |       |     |   |                      |     |   |              |
|                                       | ap_fifo        |        |        |       |     |   |                      |     |   | D            |
| Bus Protocol                          | ap_bus         |        |        |       |     |   |                      |     |   |              |



Figure 3-1: I/O Protocol Types

## Block-Level Interface Protocol

By default, a block-level interface protocol is added to the design. The Vivado® HLS tool uses the interface types *ap\_ctrl\_none*, *ap\_ctrl\_hs*, and *ap\_ctrl\_chain* to specify whether the RTL is implemented with block-level handshake signals. Block-level handshake signals specify the following:

- When the design can start to perform the operation
- When the operation ends
- When the design is idle and ready for new inputs

These signals control the block independently of any port-level I/O protocols. These ports control when the block can start processing data (*ap\_start*), indicate when it is ready to accept new inputs (*ap\_ready*), and indicate if the design is idle (*ap\_idle*) or has completed operation (*ap\_done*).

You can specify these block-level I/O protocols on the function or the function return. If the C code does not return a value, you can still specify the block-level I/O protocol on the function return. The Vivado HLS tool creates an output port *ap\_return* for the return value.

## Port-Level Interface Protocol

The final group of signals are the data ports. The I/O protocol that is created depends on the type of C argument and on the default. A complete list of all possible I/O protocols is shown in the table.

After the block-level protocol has been used to start the operation of the block, the port-level I/O protocols are used to sequence data into and out of the block.

## General Flow



## Creating a Vivado HLS Tool Project

## Step 1

In this step, you will create a new Vivado® HLS tool project, add source files, and provide solution settings for the default solution using the Vivado HLS tool command prompt.

### 1-1. Launch the Vivado HLS tool command prompt and change the directory to the *interface\_synthesis* working directory.

- 1-1-1. Select **Start > All Programs > Xilinx Design Tools > Vivado 2016.1 > Vivado HLS > Vivado HLS 2016.1 Command Prompt** to launch the Vivado HLS tool command prompt.



Figure 3-2: Vivado HLS Command Prompt

- 1-1-2. Enter the following command to change the directory to the current working directory:

```
cd C:\training\hls\labs\interface_synthesis
```

### 1-2. Review and run the *run\_interface\_syn\_hls.tcl* file to create the project and open the created project in the Vivado HLS tool GUI.

- 1-2-1. Using any of the text editors available under the C:\training\hls\labs\interface\_synthesis directory, open the *run\_interface\_syn\_hls.tcl* file.

#### 1-2-2. Review the following sections:

- o Project name (*adders\_io\_prj*)
- o Source files (*adders\_io.c*) and testbench (*adders\_io\_test.c*) added to the project
- o Creating a solution (*solution1*)
- o Selecting the Xilinx device (*xc7k160tfg484-1*) and clock period (2)
- o Running the C simulation

- 1-2-3. Close the *interface\_syn\_hls.tcl* file.

- 1-2-4.** In the Vivado HLS tool command prompt, enter the following command to run the `run_interface_syn_hls.tcl` file:

```
vivado_hls -f run_interface_syn_hls.tcl
```

```

C:\ Vivado HLS 20... Command Prompt
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/interface_synthesis'
INFO: [HLS 200-10] Creating and opening project 'C:/training/hls/labs/interface_
synthesis/adders_io_prj'.
INFO: [HLS 200-10] Adding design file 'adders_io.c' to the project
INFO: [HLS 200-10] Adding test bench file 'adders_io_test.c' to the project
INFO: [HLS 200-10] Creating and opening solution 'C:/training/hls/labs/interface
_synthesis/adders_io_prj/solution1'.
INFO: [HLS 200-10] Cleaning up the solution database.
INFO: [HLS 200-10] Setting target device to 'xc7k160tbg484-1'
INFO: [SYN 201-201] Setting up clock 'default' with a period of 2ns.
  Compiling(apcc) ../../../../adders_io_test.c in debug mode
INFO: [HLS 200-10] Running 'c:/Xilinx/Vivado_HLS/20.../bin/unwrapped/win64.o/ap
cc.exe'
INFO: [HLS 200-10] For user '...' on host '...' <Windows NT_amd64 v
ersion 6.1> on
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/interface_synthesis/adders
_io_prj/solution1/csim/build'
INFO: [APCC 202-3] Tmp directory is C:/tmp/apcc_db_.../139203448706017
INFO: [APCC 202-1] APCC is done.
  Compiling(apcc) ../../../../adders_io.c in debug mode
INFO: [HLS 200-10] Running 'c:/Xilinx/Vivado_HLS/20.../bin/unwrapped/win64.o/ap
cc.exe'
INFO: [HLS 200-10] For user '...' on host '...' <Windows NT_amd64 v
ersion 6.1> on
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/interface_synthesis/adders
_io_prj/solution1/csim/build'
INFO: [APCC 202-3] Tmp directory is C:/tmp/apcc_db_.../173763451355849
INFO: [APCC 202-1] APCC is done.
  Generating csim.exe
  10+20+30=60
  20+30+40=90
  30+40+50=120
  40+50+60=150
  50+60+70=180
-----Pass!-----
INFO: [SIM 211-1] CSim done with 0 errors.
C:\training\hls\labs\interface_synthesis>

```

Figure 3-3: Creating the Project and Running the C Simulation

**Note:** Observe that the C simulation passed.

- 1-2-5.** Enter the following command to open the Vivado HLS tool project:

```
vivado_hls -p adders_io_prj
```

The Vivado HLS tool GUI opens.

- 1-2-6.** Expand **adders\_io\_prj > Source** in the Explorer pane.

- 1-2-7.** Double-click the **adders\_io.c** file to open it.



**Figure 3-4: Opening the Source Code**

- 1-2-8.** Review the source file.

The source code for this lab is relatively simple to help you focus on the interface behavior and not the core logic.

The code does not have a function return, but instead passes the output of the function through the pointer argument `*in_out1`. This also provides you the opportunity to explore the interface options for bidirectional (input and output) ports.

## Verifying the Default I/O Protocols

## Step 2

In this step, you will verify the default I/O protocols generated by the tool.

The types of I/O protocol that you can add to C function arguments by interface synthesis depends on the argument type.

The pointer argument in this example is both an input and output to the function. In the RTL design, this argument is implemented as separate input and output ports.

The possible options for each function argument are described below.

| Function Argument         | I/O Protocol Options                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <i>in1</i> and <i>in2</i> | <p>These are pass-by-value arguments that can be implemented with the following I/O protocols:</p> <ul style="list-style-type: none"> <li>• <i>ap_none</i>: No I/O protocol. This is the default for inputs.</li> <li>• <i>ap_stable</i>: No I/O protocol.</li> <li>• <i>ap_ack</i>: Implemented with an associated output acknowledge port.</li> <li>• <i>ap_vld</i>: Implemented with an associated input valid port.</li> <li>• <i>ap_hs</i>: Implemented with both input valid and output acknowledge ports.</li> </ul>                                                                                                                                                                                                                                                                                                                                                  |
| <i>in_out1</i>            | <p>This is a pass-by-reference output that can be implemented with the following I/O protocols:</p> <ul style="list-style-type: none"> <li>• <i>ap_none</i>: No I/O protocol. This is the default for inputs.</li> <li>• <i>ap_stable</i>: No I/O protocol.</li> <li>• <i>ap_ack</i>: Implemented with an associated input acknowledge port.</li> <li>• <i>ap_vld</i>: Implemented with an associated output valid port. This is the default for outputs.</li> <li>• <i>ap_ovld</i>: Implemented with an associated output valid port (no valid port for the input part of any inout ports).</li> <li>• <i>ap_hs</i>: Implemented with both input valid port and output acknowledge ports</li> <li>• <i>ap_fifo</i>: A FIFO interface with associated output write and input FIFO full ports.</li> <li>• <i>ap_bus</i>: A Vivado HLS tool bus interface protocol.</li> </ul> |

## 2-1. Synthesize the design.

- 2-1-1. Select **Solution > Run C Synthesis > Active Solution** or click the **Run Synthesis** icon in the menu bar.



Figure 3-5: Launching Synthesis

This option synthesizes the currently selected solution.

All solutions (or selected solutions) can be synthesized by using the drop-down menu next to the synthesis icon. You can synthesize all solutions or synthesize selected solutions in addition to the default.



Figure 3-6: Options for What to Synthesize

## 2-2. View the Synthesis report.

You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.

- 2-2-1.** In the Outline pane, select **Interface** to review the Interface Summary.



**Figure 3-7: Interface Summary Report**

The Interface Summary shows how the arguments in the C source are by default synthesized into RTL RAM ports:

- The design has a *clock*, *reset*, and the default block-level I/O protocol *ap\_ctrl\_hs*.
- Input data (*in1* and *in2*) with the port-level I/O protocol *ap\_none*
- Input and output ports (*in\_out1\_i* and *in\_out1\_o*) with the port-Level I/O protocol *ap\_ovld*
  - *ap\_ovld* is a subset of the *ap\_hs* interface type. The *ap\_ovld* port-level I/O protocol provides the following signals:
    - Data port
    - Valid signal to indicate when data is read
    - For input arguments and the input half of inout arguments, the design defaults to type *ap\_none*.
    - For output arguments and the output half of inout arguments, the design implements type *ap\_vld*.



Figure 3-8: Interface Summary Report - Block-Level and Port-Level I/O Protocols

## Specifying the Port-Level I/O Protocols

**Step 3**

### 3-1. Specify the I/O protocols for the ports *in1*, *in2*, and *in\_out1*.

- 3-1-1. Ensure that the **adders\_io.c** file is opened.
- 3-1-2. In the Directive tab, right-click **in1** and select **Insert Directive**.
- 3-1-3. Select **INTERFACE** from the Directive drop-down list.
- 3-1-4. Select **ap\_vld** from the mode (optional) drop-down list.
- 3-1-5. Click **OK**.

Figure 3-9: Selecting the Interface Type as **ap\_vld** for the *in1* Port

- 3-1-6.** Similarly, select **in2** and select **INTERFACE** from the Directive list and **ap\_ack** from the mode (optional) drop-down list.
- 3-1-7.** Similarly, select **in\_out1** and select **INTERFACE** from the Directive list and **ap\_hs** from the mode (optional) drop-down list.

The final view of the INTERFACE directives of the ports in the Directive tab should look similar to the figure below.



Figure 3-10: Final View of the Directive Tab

- 3-1-8.** In the Explorer pane, expand the **adders\_io\_prj > solution1 > Constraints** folder.  
**3-1-9.** Double-click the **directives.tcl** file to open it.

You should see that the directives added through the GUI are now in the *directives.tcl* file.



Figure 3-11: Opening the *directives.tcl* File

### 3-2. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

### 3-3. View the Synthesis report.

You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.

- 3-3-1.** In the Outline pane, select **Interface** to review the Interface Summary:
- The design has a clock (*ap\_clk*) and reset (*ap\_rst*).
  - The default block-level I/O protocol signals (*ap\_start*, *ap\_done*, *ap\_idle* and *ap\_ready*) are present.
  - The port *in1* is implemented with a data port and an associated input valid signal.
  - The data on port *in1* is only read when the port *in1\_ap\_vld* is active High.
  - The port *in2* is implemented with a data port and an associated output acknowledge signal.
  - The port *in2\_ap\_ack* will be active High when the data port *in2* is read.
  - *inout\_i* identifies the input part of the argument *inout1*. This has the associated input valid port *inout1\_i\_ap\_vld* and the output acknowledge port *inout1\_i\_ap\_ack*.
  - The output part of the argument *inout1* is identified as *inout\_o*. This has the associated output valid port *inout1\_o\_ap\_vld* and the input acknowledge port *inout1\_o\_ap\_ack*.



Figure 3-12: Interface Summary Report After Applying INTERFACE Directives

- 3-3-2.** Select **File > Exit** to close the Vivado HLS tool.

- 3-3-3.** Close the Vivado HLS Command Prompt.

## Summary

In this lab, you learned how to apply directives to implement different types of port-Level I/O protocols. You also reviewed the Interface Summary to see the types of port-level I/O protocols that were implemented.

## Answers

Since there were no questions in this lab, this section is intentionally left blank.

# Lab 4: Improving Performance

2016.1

## Abstract

This lab describes how to improve design performance.

This lab should take approximately 45 minutes.

## Objectives

After completing this lab, you will be able to:

- Identify the effect of applying the INLINE directive
- Improve performance by using the PIPELINE directive
- Distinguish between the DATAFLOW directive and the Configuration Command feature

## Introduction

This lab introduces various techniques and directives that can be used in the Vivado® HLS tool to improve design performance. The design under consideration in the lab accepts an image in a custom RGB format, converts it to the YUV color space, applies a filter to the YUV image, and converts it back to RGB.

## General Flow



## Creating a Vivado HLS Tool Project

## Step 1

In this step, you will create a new Vivado HLS tool project, add source files, and provide solution settings for the default solution using the Vivado HLS tool command prompt.

**1-1. Launch the Vivado HLS tool command prompt and change the directory to the *improve\_performance* working directory.**

- 1-1-1.** Select **Start > All Programs > Xilinx Design Tools > Vivado 2016.1 > Vivado HLS > Vivado HLS 2016.1 Command Prompt** to launch the Vivado HLS tool command prompt.
- 1-1-2.** Enter the following command to change the directory to the current working directory:  
`cd C:\training\hls\labs\improve_performance`

**1-2. Review and run the *run\_yuv\_filter.tcl* file to create the project and open the created project in the Vivado HLS tool GUI.**

- 1-2-1.** Using any of the text editors available under the `C:\training\hls\labs\improve_performance` directory, open the ***run\_yuv\_filter.tcl*** file.
- 1-2-2.** Review the following sections:
  - Project name (*yuv\_filter\_prj*)
  - Source files (*yuv\_filterc*) and testbench (*yuv\_filter\_test.c*, *image\_aux.c*, and *output.golden.dat*) added to the project
  - Creating a solution (*solution1*)
  - Selecting the Xilinx device (*xc7z020clg484-1*) and clock period (*10*)
  - Running the C simulation
  - Synthesizing the design
- 1-2-3.** Close the ***run\_yuv\_filter.tcl*** file.
- 1-2-4.** In the Vivado HLS tool command prompt, enter the following command to run the ***run\_yuv\_filter.tcl*** file:  
`vivado_hls -f run_yuv_filter.tcl`

```

ca Vivado HLS 2018.1 Command Prompt

malloc' with type 'void *unsigned long long'
image_t *yuv = <image_t *>malloc<sizeof(image_t)>;
../../../../yuv_filter.c:14:30: note: please include the header <stdlib.h> or explicitly provide a declaration for 'malloc'
1 warning generated.
../../../../yuv_filter.c:14:30: warning: implicitly declaring library function 'malloc' with type 'void *unsigned long long'
image_t *yuv = <image_t *>malloc<sizeof(image_t)>;
../../../../yuv_filter.c:14:30: note: please include the header <stdlib.h> or explicitly provide a declaration for 'malloc'
1 warning generated.
INFO: [APCC 202-31] Tmp directory is C:/tmp/apcc_db_15912633542336
INFO: [APCC 202-11] APCC is done.

Generating csim.exe
Test passed!
INFO: [CSIM 211-11] CSim done with 0 errors.
INFO: [HLS 200-10] Analyzing design file 'yuv_filter.c' ...
INFO: [HLS 200-10] Validating synthesis directives ...
INFO: [HLS 200-10] Starting code transformations ...
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [XFORM 203-602] Inlining function 'yuv_scale' into 'yuv_filter' <yuv_filter.c:24> automatically.
INFO: [XFORM 203-401] Performing if-conversion on hyperblock from <yuv_filter.c:92:33> to <yuv_filter.c:92:27> in function 'yuv2rgb'... converting 7 basic blocks.
INFO: [XFORM 203-111] Balancing expressions in function 'rgb2yuv' <yuv_filter.c:30>..11 expression(s) balanced.
INFO: [HLS 200-111] Elapsed time: 15.804 seconds; current memory usage: 92.2 MB.

INFO: [HLS 200-10] Synthesizing 'yuv_filter' ...
INFO: [HLS 200-10]
INFO: [HLS 200-10] --
INFO: [HLS 200-10] -- Scheduling module 'yuv_filter_rgb2yuv'.
INFO: [HLS 200-10] --

```

Figure 4-1: Viewing the C Simulation Results

**Note:** Observe that the C simulation passed and that the design starts synthesizing.

- 1-2-5.** Enter the following command to open the Vivado HLS tool project:

```
vivado_hls -p yuv_filter_prj
```

The Vivado HLS tool GUI opens.

## Analyzing the Project and Results

## Step 2

- 2-1. Open the source file and review the three functions. Look at the results and observe that the latencies are undefined (represented by "?").**

- 2-1-1.** Expand **yuv\_filter\_prj > Source** in the Project Explorer pane.  
**2-1-2.** Double-click the **yuv\_filter.c** file to open it and review the design.

The design has three functions: `rgb2yuv`, `yuv_scale`, and `yuv2rgb`.

Each of these filter functions iterates over the entire source image (which has maximum dimensions specified in `image_aux.h`), requiring a single-source pixel to produce a pixel in the resulting image. The `image_aux.h` file can be accessed by expanding the **yuv\_filter\_prj > Includes > C:/training/hls/labs/improve\_performance** folder in the Project Explorer pane.

The scale function simply applies individual scale factors, supplied as top-level arguments to the YUV components.

Notice that most of the variables are of user-defined (typedef) and aggregate (e.g. structure, array) types.

Also notice that the original source used `malloc()` to dynamically allocate storage for the internal image buffers. While appropriate for such large data structures in software, `malloc()` is not synthesizable and is not supported by the Vivado HLS tool.

A viable workaround is conditionally compiled into the code, leveraging the `_SYNTHESIS_` macro. The Vivado HLS tool automatically defines the `_SYNTHESIS_` macro when reading any code. This ensure that the original `malloc()` code is used outside of synthesis but the Vivado HLS tool will use the workaround when synthesizing.

- 2-1-3. Expand the **yuv\_filter\_prj > solution1 > syn > report** folder in the Project Explorer pane.

- 2-1-4. Double-click the **yuv\_filter\_csynth.rpt** file to view the Synthesis report.



Figure 4-2: Viewing the Synthesis Report – Latency Computation

Each of the loops in this design has variable bounds—the width and height are defined by members of input type `image_t`. When variables bounds are present on loops, the total latency of the loops cannot be determined. This impacts the ability to perform analysis using reports. Hence, "?" is reported for various latencies.

## Applying the TRIPCOUNT Directive

## Step 3

To assist in providing loop-latency estimates, the Vivado HLS tool provides a TRIPCOUNT directive that allows limits on the variable bounds to be specified by the user. In this design, such directives have been embedded in the source code in the form of #pragma statements.

In this step, you will open the source file and uncomment the pragma lines, resynthesize, and observe the resources used as well as the estimated latencies.

### 3-1. Uncomment the pragma lines to add the TRIPCOUNT directive.

- 3-1-1. Expand **yuv\_filter\_prj > Source** in the Project Explorer pane.
- 3-1-2. Double-click the **yuv\_filter.c** file to open it.
- 3-1-3. Uncomment the #pragma lines (50, 53, 90, 93, 130, 133) to define the loop bounds.
- 3-1-4. Press <**Ctrl + S**> to save the file.

### 3-2. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

### 3-3. View the Synthesis report.

- 3-3-1. View the Synthesis report in the Information window.



Figure 4-3: Latency Computation After Applying the TRIPCOUNT Pragma

## Question 1

Looking at the report, fill in the table below.

|                        |  |
|------------------------|--|
| Estimated clock period |  |
| Worst case latency     |  |
| Number of DSP48E used  |  |
| Number of BRAMs used   |  |
| Number of FFs used     |  |
| Number of LUTs used    |  |

### Performance and Utilization Estimates After the TRIPCOUNT Directive

- 3-3-2.** In the Console tab, observe that the `yuv_scale` function is automatically inlined into the `yuv_filter` function.

```
Vivado HLS Console
Console Errors Warnings
Vivado HLS Console
INFO: [HLS 200-10] Adding test bench file 'test_data' to the project
INFO: [HLS 200-10] Adding test bench file 'yuv_filter_test.c' to the project
INFO: [HLS 200-10] Adding test bench file 'image_aux.c' to the project
INFO: [HLS 200-10] Opening solution 'C:/training/hls/labs/improve_performance/yuv_filter_prj/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 10ns.
INFO: [HLS 200-10] Setting target device to 'xc7z020clg484-1'
INFO: [HLS 200-10] Analyzing design file 'yuv_filter.c' ...
INFO: [HLS 200-10] Validating synthesis directives ...
INFO: [HLS 200-10] Starting code transformations ...
INFO: [HLS 200-10] Checking synthesizability
INFO: [XFORM 203-602] Inlining function 'yuv_scale' into 'yuv_filter' (yuv_filter.c:24) automatically.
INFO: [XFORM 203-401] Performing ir-conversion on hyperblock from (yuv_filter.c:92:33) to (yuv_filter.c:92:27) in
INFO: [XFORM 203-11] Balancing expressions in function 'rgb2yuv' (yuv_filter.c:30)...11 expression(s) balanced.
INFO: [HLS 200-111] Elapsed time: 5.942 seconds; current memory usage: 92.2 MB.
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'yuv_filter' ...
INFO: [HLS 200-10] -
```

Figure 4-4: Observing the Vivado HLS Tool Automatically Inlining the Function `yuv_scale`

Observe that there are three synthesis reports under the `syn > report` folder in the Explorer view:

- o `yuv_filter (yuv_filter_csynth.rpt)`
- o `rgb2yuv (yuv_filter_rgb2yuv_csynth.rpt)`
- o `yuv2rgb (yuv_filter_yuv2rgb_csynth.rpt)`

There is no entry for `yuv_scale` since this function was inlined into the `yuv_filter` function.



**Figure 4-5: Viewing the Synthesis Report (After Applying the TRIPCOUNT Pragma)**

### 3-3-3. Expand the **Summary** of the Loop latency.

| Loop Name          | min    | max      | Iteration Latency | achieved | target | Trip Count | Pipelined |
|--------------------|--------|----------|-------------------|----------|--------|------------|-----------|
| - YUV_SCALE_LOOP_X | 280400 | 17207040 | 1402 ~ 8962       | -        | -      | 200 ~ 1920 | no        |
| + YUV_SCALE_LOOP_Y | 1400   | 8960     | 7                 | -        | -      | 200 ~ 1280 | no        |

**Figure 4-6: Viewing the Details of the Loop Latency**

Note the latency and trip count numbers for the `yuv_scale` function.

Also note that the YUV\_SCALE\_LOOP\_Y loop latency is 7X the specified TRIPCOUNT, implying that seven cycles are used for each of the iterations of the loop.

You can verify this by opening an analysis perspective view, expanding the YUV\_SCALE\_LOOP\_X entry, and then expanding the YUV\_SCALE\_Y entry.

### 3-4. Switch to the Analysis perspective and understand the design behavior.

- 3-4-1.** Select **Solution > Open Analysis Perspective** or click the **Analysis** perspective icon

( ) to open the analysis viewer.

- 3-4-2.** Expand **YUV\_SCALE\_LOOP\_X** and **YUV\_SCALE\_LOOP\_Y**.



Figure 4-7: Design Analysis View of the YUV\_SCALE\_LOOP\_Y Loop

- 3-4-3.** Double-click **yuv\_filter\_rgb2yuv\_csynth.rpt** under the **yuv\_filter\_prj > solution1 > syn > report** folder.
- 3-4-4.** In the Synthesis report, under the Utilization Estimates section, note the resources used for the **rgb2yuv** function.

## Question 2

Looking at the report, fill in the table below.

|                        |  |
|------------------------|--|
| Estimated clock period |  |
| Worst case latency     |  |
| Number of DSP48E used  |  |
| Number of BRAMs used   |  |
| Number of FFs used     |  |
| Number of LUTs used    |  |

### Performance and Utilization Estimates for the *rgb2yuv* Function

- 3-4-5. Similarly, open **yuv\_filter\_yuv2rgb\_csynth.rpt** under the `yuv_filter_prj > solution1 > syn > report` folder.
- 3-4-6. In the Synthesis report, under the Utilization Estimates section, note the resources used for the `yuv2rgb` function.

## Question 3

Looking at the report, fill in the table below

|                        |  |
|------------------------|--|
| Estimated clock period |  |
| Worst case latency     |  |
| Number of DSP48E used  |  |
| Number of BRAMs used   |  |
| Number of FFs used     |  |
| Number of LUTs used    |  |

### Performance and Utilization Estimates for the *yuv2rgb* Function

For the `rgb2yuv` function, the worst-case latency is reported as 19664641 clock cycles.

The reported latency can be estimated as follows:

- o  $\text{RGB2YUV\_LOOP\_Y}$  total loop latency =  $8 \times 1280 = 10240$  cycles
- o One entry and one exit clock for loop  $\text{RGB2YUV\_LOOP\_Y} = 10242$  cycles

- o RGB2YUV\_LOOP\_X loop body latency = 10242 cycles
- o RGB2YUV\_LOOP\_X total loop latency =  $10242 \times 1920 = 19664640$  cycles

## Turning Off INLINE and Applying the PIPELINE Directive

## Step 4

In this step, you will create a new solution by copying the previous solution settings. You will then configure the new solution by turning off the automatic INLINE directive and applying PIPELINE directive.

### 4-1. Create a new solution by copying the previous solution (solution1) settings.

4-1-1. Select **Project > New Solution** or click the **New Solution** icon (  ) from the toolbar.

4-1-2. Leave the options at their default settings.



Figure 4-8: Creating a New Solution (solution2)

4-1-3. Click **Finish**.

### 4-2. Turn off the INLINE directive to the function *yuv\_scale*.

4-2-1. Select **Project > Close Inactive Solution Tabs** to close all inactive solution windows.

4-2-2. Ensure that the **yuv\_filter.c** file is open and active in the Information pane.

4-2-3. Select **yuv\_filter** in the Directive tab.

**4-2-4.** In the Directive tab, right-click the **yuv\_filter** function and select **Insert Directive**.

**4-2-5.** Select **INLINE** from the Directive drop-down list.

**4-2-6.** Select the **off (optional)** option.



Figure 4-9: Turning Off the INLINE Function

**4-2-7.** Click **OK**.

When an object (function or loop) is pipelined, all the loops below it, down through the hierarchy, will be automatically unrolled.

In order for a loop to be unrolled, it must have fixed bounds: All the loops in this design have variable bounds, defined by an input argument variable to the top-level function.

Note that the **TRIPCOUNT** directive on the loops only influences reporting, it does not set bounds for synthesis.

Neither the top-level function nor any of the sub-functions are pipelined in this example.

The **PIPELINE** directive must be applied to the inner-most loop in each function. The innermost loops have no variable-bounded loops inside of them which are required to be unrolled and the outer loop will simply keep the inner loop fed with data.

#### 4-3. Apply the PIPELINE directive to the YUV\_SCALE\_LOOP\_Y loop.

**4-3-1.** Expand **yuv\_scale** in the Directive tab.

**4-3-2.** In the Directive tab, right-click the **YUV\_SCALE\_LOOP\_Y** function and select **Insert Directive**.

- 4-3-3.** Select **PIPELINE** from the Directive drop-down list.
- 4-3-4.** Leave the **II (optional)** field blank as the Vivado HLS tool will try for an  $II=1$ , one new input every clock cycle.
- 4-3-5.** Click **OK**.



Figure 4-10: Applying the PIPELINE Directive for the YUV\_SCALE\_LOOP\_Y Loop

- 4-3-6.** Similarly, apply the PIPELINE directive to the **YUV2RGB\_LOOP\_Y** and **RGB2YUV\_LOOP\_Y** loops.

The final view of PIPELINE directives in the Directive tab should look similar to the figure below.



Figure 4-11: Final View of PIPELINE Directives

#### 4-4. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

#### 4-5. View the Synthesis report.

**You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.**

- 4-5-1. Select **Project > Compare Reports** or click the **Compare Reports** icon () to compare the results of the two solutions (*solution1* and *solution2*).
- 4-5-2. Select **solution1** and **solution2** from the Available solutions section.
- 4-5-3. Click **Add**.
- 4-5-4. Click **OK**.



Figure 4-12: Selecting **solution1** and **solution2**

You should see the compared results of *solution1* and *solution2* as shown in the figure below.

| All Compared Solutions                                          |           |           |           |
|-----------------------------------------------------------------|-----------|-----------|-----------|
| <a href="#">solution2: xc7z020clg484-1</a>                      |           |           |           |
| <a href="#">solution1: xc7z020clg484-1</a>                      |           |           |           |
| Performance Estimates                                           |           |           |           |
| <input type="checkbox"/> <a href="#">Timing (ns)</a>            |           |           |           |
| Clock                                                           |           | solution2 | solution1 |
| ap_clk                                                          | Target    | 10.00     | 10.00     |
|                                                                 | Estimated | 11.19     | 11.19     |
| <input type="checkbox"/> <a href="#">Latency (clock cycles)</a> |           |           |           |
|                                                                 |           | solution2 | solution1 |
| Latency                                                         | min       | 120028    | 921205    |
|                                                                 | max       | 7372828   | 56536325  |
| Interval                                                        | min       | 120029    | 921206    |
|                                                                 | max       | 7372829   | 56536326  |

Figure 4-13: Performance Comparison After Pipelining

Observe that the latency reduced from 56536325 to 7372828 clock cycles.

In *solution1*, the total loop latency of the innermost loop was **loop\_body\_latency × loop iteration count**, whereas in *solution2* the new total loop latency of the innermost loop was **loop\_body\_latency + loop iteration count**.

- 4-5-5.** Scroll down in the comparison report to view the resource utilization.

Observe that the flip-flops, LUTs, and DSP48E utilization increased, whereas block RAM remained the same.

| Utilization Estimates |           |           |
|-----------------------|-----------|-----------|
|                       | solution2 | solution1 |
| BRAM_18K              | 12288     | 12288     |
| DSP48E                | 15        | 12        |
| FF                    | 911       | 716       |
| LUT                   | 1145      | 760       |

Figure 4-14: Comparison of Resource Estimates

## Applying the DATAFLOW Directive and Configuration Command

### Step 5

In this step, you will create a new solution by copying the previous solution settings. You will then configure the new solution by applying the DATAFLOW directive.

#### 5-1. Create a new solution by copying the previous solution (solution2) settings.

- 5-1-1. Select **Project > New Solution** or click **New Solution** icon () from the toolbar.
- 5-1-2. Leave the options at their default settings.



Figure 4-15: Creating a New Solution (solution3)

- 5-1-3. Click **Finish**.

#### 5-2. Apply the DATAFLOW directive to the *YUV\_filter* function.

- 5-2-1. Select **Project > Close Inactive Solution Tabs** to close all inactive solution windows.
- 5-2-2. Ensure that the **yuv\_filter.c** file is open and active in the Information pane.
- 5-2-3. Select **yuv\_filter** in the Directive tab.
- 5-2-4. In the Directive tab, right-click the **yuv\_filter** function and select **Insert Directive**.
- 5-2-5. Select **DATAFLOW** from the Directive drop-down list.
- 5-2-6. Click **OK**.

### 5-3. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

### 5-4. View the Synthesis report.

**You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.**

- 5-4-1.** Observe that additional information, "Type dataflow", in the Performance Estimates section is now indicated.

| Performance Estimates  |         |           |             |          |
|------------------------|---------|-----------|-------------|----------|
| Timing (ns)            |         |           |             |          |
| Summary                |         |           |             |          |
| Clock                  | Target  | Estimated | Uncertainty |          |
| ap_clk                 | 10.00   | 11.19     | 1.25        |          |
| Latency (clock cycles) |         |           |             |          |
| Summary                |         |           |             |          |
| Latency                |         | Interval  |             |          |
| min                    | max     | min       | max         | Type     |
| 120027                 | 7372827 | 40010     | 2457610     | dataflow |
| Detail                 |         |           |             |          |

**Figure 4-16: Performance Estimate After Applying the DATAFLOW Directive**

The Dataflow pipeline throughput indicates the number of clock cycles between each set of input reads. If this throughput value is less than the design latency, it indicates that the design can start processing new inputs before the current input data is output.

While the overall latencies have not changed significantly, the dataflow throughput is showing that the design can achieve close to the theoretical limit ( $1920 \times 1280 = 2457600$ ) of processing one pixel every clock cycle.

- 5-4-2.** Scroll down in the report to view the resource utilization.

Observe that the number of block RAM required has been doubled. This is due to the default dataflow **ping-pong** buffering.

| Utilization Estimates |          |        |        |       |
|-----------------------|----------|--------|--------|-------|
| Summary               |          |        |        |       |
| Name                  | BRAM_18K | DSP48E | FF     | LUT   |
| DSP                   | -        | -      | -      | -     |
| Expression            | -        | -      | 0      | 2     |
| FIFO                  | 0        | -      | 35     | 172   |
| Instance              | -        | 15     | 837    | 1103  |
| Memory                | 24576    | -      | 192    | 0     |
| Multiplexer           | -        | -      | -      | 20    |
| Register              | -        | -      | 10     | -     |
| Total                 | 24576    | 15     | 1074   | 1297  |
| Available             | 280      | 220    | 106400 | 53200 |
| Utilization (%)       | 8777     | 6      | 1      | 2     |

Figure 4-17: Resource Estimate with DATAFLOW Directive

When DATAFLOW optimization is performed, memory buffers are automatically inserted between the functions to ensure that the next function can begin operation before the previous function has finished. The default memory buffers are ping-pong buffers sized to fully accommodate the largest producer or consumer array.

The Vivado HLS tool allows the memory buffers to be the default ping-pong buffers or FIFOs. Since this design has data accesses that are fully sequential, FIFOs can be used. Another advantage to using FIFOs is that the size of the FIFOs can be directly controlled (not possible in ping-pong buffers where random accesses are allowed).

The memory buffers type can be selected by using the Vivado HLS tool Configuration command.

**5-5. Apply the DATAFLOW configuration command and select the FIFO for the memory buffers.**

- 5-5-1. Select **Solution > Solution Settings** or click the **Solution Settings** icon (  ) to access the configuration command settings.
- 5-5-2. In the Configuration Settings dialog box, select General and click **Add**.
- 5-5-3. Select **config\_dataflow** from the Command drop-down list.
- 5-5-4. Select the **fifo** from the default\_channel drop-down list.
- 5-5-5. Enter **2** in the fifo\_depth field and click **OK**.
- 5-5-6. Click **OK**.



Figure 4-18: Selecting the Dataflow Configuration Command and FIFO as the Buffer

**5-6. Synthesize the Vivado HLS design.**

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

### 5-7. View the Synthesis report.

You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.

Note that the performance parameter has not changed; however, resource estimates show that the design is not using any block RAM, and other resource usage (flip-flops, LUTs) has also reduced.

| Utilization Estimates |          |        |        |       |
|-----------------------|----------|--------|--------|-------|
| Summary               |          |        |        |       |
| Name                  | BRAM_18K | DSP48E | FF     | LUT   |
| DSP                   | -        | -      | -      | -     |
| Expression            | -        | -      | -      | -     |
| FIFO                  | 0        | -      | 65     | 292   |
| Instance              | -        | 15     | 692    | 844   |
| Memory                | -        | -      | -      | -     |
| Multiplexer           | -        | -      | -      | -     |
| Register              | -        | -      | 2      | -     |
| Total                 | 0        | 15     | 759    | 1136  |
| Available             | 280      | 220    | 106400 | 53200 |
| Utilization (%)       | 0        | 6      | ~0     | 2     |

Figure 4-19: Resource Estimation After Dataflow Configuration Command

### 5-8. Compare the performance and resource estimates of all the solutions.

- 5-8-1. Select **Project > Compare Reports** or click the **Compare Reports** icon () to compare the results of all three solutions.
- 5-8-2. Select **solution1**, **solution2**, and **solution3** from the Available solutions section.
- 5-8-3. Click **Add**.

**5-8-4.** Click **OK**.

| Performance Estimates         |           |           |           |           |
|-------------------------------|-----------|-----------|-----------|-----------|
| <b>Timing (ns)</b>            |           |           |           |           |
| Clock                         |           | solution3 | solution1 | solution2 |
| ap_clk                        | Target    | 10.00     | 10.00     | 10.00     |
|                               | Estimated | 11.36     | 11.19     | 11.19     |
| <b>Latency (clock cycles)</b> |           |           |           |           |
|                               |           | solution3 | solution1 | solution2 |
| Latency                       | min       | 40015     | 921205    | 120028    |
|                               | max       | 2457615   | 56536325  | 7372828   |
| Interval                      | min       | 40008     | 921206    | 120029    |
|                               | max       | 2457608   | 56536326  | 7372829   |
| Utilization Estimates         |           |           |           |           |
|                               |           | solution3 | solution1 | solution2 |
| BRAM_18K                      | 0         | 12288     | 12288     |           |
| DSP48E                        | 15        | 12        | 15        |           |
| FF                            | 759       | 716       | 911       |           |
| LUT                           | 1136      | 760       | 1145      |           |

**Figure 4-20:** Comparison of Performance and Resource Estimates for All Solutions**Exporting and Implementing the Design****Step 6**

In this step, you will export the RTL as an IP core to be used with the top-level design.

**6-1. Export the RTL, selecting Verilog as the language.****6-1-1.** Select **Solution > Export RTL**.

Alternatively, you can click the  toolbar button.

- 6-1-2.** Ensure that **IP Catalog** is selected from the Format Selection drop-down list.



**Figure 4-21:** Export RTL Dialog Box

- 6-1-3.** Make certain that the **Evaluate** option is selected and that **Verilog** is selected as the RTL to be evaluated.

This will perform Vivado RTL synthesis and implementation on the generated IP. Implementation is run to evaluate and provide confidence that the RTL will meet its estimated timing and area goals and that these results are not included as part of the exported package.

- 6-1-4.** Click **OK** in the Export RTL dialog box.

You can observe the progress in the Console tab. When the run is complete, the Implementation report is displayed in the Information pane.



**Figure 4-22:** Export Report

- 6-1-5.** Select **File > Exit** to close the Vivado HLS tool.

## Summary

In this lab, you learned that even though a design could not be pipelined at the top level, a strategy of pipelining the individual loops and then using dataflow optimization to make the functions operate in parallel was able to achieve the same high throughput, processing one pixel per clock.

When the DATAFLOW directive is applied, the default memory buffers (of the ping-pong type) are automatically inserted between the functions. Using the fact that the design used only sequential (streaming) data accesses allowed the costly memory buffers associated with dataflow optimization to be replaced with simple two element FIFOs via the Dataflow command configuration.

## Answers

1. Looking at the report, fill in the table below.

|                        |          |
|------------------------|----------|
| Estimated clock period | 11.19    |
| Worst case latency     | 56536325 |
| Number of DSP48E used  | 12       |
| Number of BRAMs used   | 12288    |
| Number of FFs used     | 716      |
| Number of LUTs used    | 760      |

### Performance and Utilization Estimates After the TRIPCOUNT Directive

2. Looking at the report, fill in the table below.

|                        |          |
|------------------------|----------|
| Estimated clock period | 8.34     |
| Worst case latency     | 19664641 |
| Number of DSP48E used  | 5        |
| Number of FFs used     | 198      |
| Number of LUTs used    | 253      |

### Performance and Utilization Estimates for the *rgb2yuv* Function

3. Looking at the report, fill in the table below.

|                        |          |
|------------------------|----------|
| Estimated clock period | 11.19    |
| Worst case latency     | 19664641 |
| Number of DSP48E used  | 4        |
| Number of FFs used     | 203      |
| Number of LUTs used    | 238      |

### Performance and Utilization Estimates for the *yuv2rgb* Function



# Lab 5: Implementing Arrays as RTL Interfaces

2016.1

## Abstract

This lab demonstrates how to implement arrays as RTL interfaces.

This lab should take approximately 45 minutes.

## Objectives

After completing this lab, you will be able to:

- Identify the default I/O protocol for ports
- Select the dual-port RAM and FIFO interfaces for input and output arrays
- Partition an array interface into an arbitrary number of ports
- Partition an array interface into individual ports

## Introduction

This lab demonstrates how array arguments on the top-level function interface can be implemented as a number of different types of RTL ports.

|                                       | Argument Type | Scalar         |       | Array  |   |     | Pointer or Reference |   |     | HLS:: Stream |
|---------------------------------------|---------------|----------------|-------|--------|---|-----|----------------------|---|-----|--------------|
|                                       |               | Interface Mode | Input | Return | I | I/O | O                    | I | I/O | O            |
| Block-Level Protocol                  | ap_ctrl_none  |                |       |        |   |     |                      |   |     |              |
| AXI Interface Protocol                | ap_ctrl_hs    |                | D     |        |   |     |                      |   |     |              |
| No I/O Protocol                       | ap_ctrl_chain |                |       |        |   |     |                      |   |     |              |
| Wire Handshake Protocol               | axis          |                |       |        |   |     |                      |   |     |              |
| Memory Interface Protocol: RAM : FIFO | s_axilite     |                |       |        |   |     |                      |   |     |              |
| Bus Protocol                          | m_axi         |                |       |        |   |     |                      |   |     |              |
|                                       | ap_none       | D              |       |        |   |     | D                    |   |     |              |
|                                       | ap_stable     |                |       |        |   |     |                      |   |     |              |
|                                       | ap_ack        |                |       |        |   |     |                      |   |     |              |
|                                       | ap_vld        |                |       |        |   |     |                      | D |     |              |
|                                       | ap_ovld       |                |       |        |   |     | D                    |   |     |              |
|                                       | ap_hs         |                |       |        |   |     |                      |   |     |              |
|                                       | ap_memory     |                |       | D      | D | D   |                      |   |     |              |
|                                       | bram          |                |       |        |   |     |                      |   |     |              |
|                                       | ap_fifo       |                |       |        |   |     |                      |   | D   |              |
|                                       | ap_bus        |                |       |        |   |     |                      |   |     |              |



Figure 5-1: I/O Protocol Types

Array arguments are implemented by default as an *ap\_memory* interface. This is a standard block RAM interface with data, address, chip-enable, and write-enable ports. An *ap\_memory* interface can be implemented as a single-port of dual-port interface.

If the Vivado® HLS tool can determine that using a dual-port interface will reduce the initial interval, it will automatically implement a dual-port interface. The RESOURCE directive is used to specify the memory resource and if this directive is specified on the array with a single-port block RAM, a single-port interface will be implemented.

Conversely, if a dual-port interface is specified using the RESOURCE directive and the Vivado HLS tool determines that this interface provides no benefit, it will automatically implement a single-port interface. The *bram* interface mode is functionally identical to the *ap\_memory* interface.

The only difference is how the ports are implemented when the design is used in the Vivado IP integrator:

- An *ap\_memory* interface is displayed as multiple and separate ports.
- A *bram* interface is displayed as a single grouped port that can be connected to a Xilinx block RAM using a single point-to-point connection.

If the array is accessed in a sequential manner an *ap\_fifo* interface can be used. As with the *ap\_hs* interface, the Vivado HLS tool will halt if it determines that the data access is not sequential, report a warning if it cannot determine if the access is sequential, or issue no message if it determines the access is sequential. The *ap\_fifo* interface can only be used for reading or writing, not both.

## General Flow



## Creating a Vivado HLS Tool Project

## Step 1

In this step, you will create a new Vivado® HLS tool project, add source files, and provide solution settings for the default solution using the Vivado HLS tool command prompt.

### 1-1. Launch the Vivado HLS tool command prompt and change the directory to the *array\_interfaces* working directory.

- 1-1-1. Select **Start > All Programs > Xilinx Design Tools > Vivado 2016.1 > Vivado HLS > Vivado HLS 2016.1 Command Prompt** to launch the Vivado HLS tool command prompt.



Figure 5-2: Vivado HLS Command Prompt

- 1-1-2. Enter the following command to change the directory to the current working directory:

```
cd C:\training\hls\labs\array_interfaces
```

### 1-2. Review and run the *run\_hls.tcl* file to create the project and open the created project in the Vivado HLS tool GUI.

- 1-2-1. Using any of the text editors available under the C:\training\hls\labs\array\_interfaces directory, open the *run\_hls.tcl* file.

#### 1-2-2. Review the following sections:

- o Project name (*array\_io\_prj*)
- o Source files (*array\_io.c*) and testbench (*array\_io\_test.c* and *result.golden.dat*) added to the project
- o Creating a solution (*solution1*)
- o Selecting the Xilinx device (*xc7k160tfg484-1*) and clock period (4)
- o Running the C simulation

- 1-2-3. Close the *run\_hls.tcl* file.

- 1-2-4.** In the Vivado HLS tool command prompt, enter the following command to run the `run_hls.tcl` file:

```
vivado_hls -f run_hls.tcl
```

```
C:\Xilinx\Vivado_HLS\2016.1>cd C:\training\hls\labs\array_interfaces
C:\training\hls\labs\array_interfaces>vivado_hls -f run_hls.tcl
=====
Vivado(TM) HLS - High-Level Synthesis from C, C++ and SystemC
Version 2016.1
Build on Mon Jun 13 13:44:11 +0530 2016
Copyright (C) Xilinx, Inc. All Rights Reserved.
=====
INFO: [HLS 200-10] Running 'C:/Xilinx/Vivado_HLS/2016.1/bin/unwrapped/win64.o/vivado_hls.exe'
INFO: [HLS 200-10] For user '...' on host '...' (Windows NT_amd64 version 6.1) on Mon Jun 13 13:44:11 +0530 2016
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/array_interfaces'
INFO: [HLS 200-10] Creating and opening project 'C:/training/hls/labs/array_interfaces/array_io_prj'.
INFO: [HLS 200-10] Adding design file 'array_io.c' to the project
INFO: [HLS 200-10] Adding test bench file 'array_io_test.c' to the project
INFO: [HLS 200-10] Adding test bench file 'result_golden.dat' to the project
INFO: [HLS 200-10] Creating and opening solution 'C:/training/hls/labs/array_interfaces/array_io_prj/solution1'.
INFO: [HLS 200-10] Cleaning up the solution database.
INFO: [HLS 200-10] Setting target device to 'xc7k160tfg484-1'
INFO: [SYN 201-201] Setting up clock 'default' with a period of 4ns.
Compiling(apcc) ../../../../array_io_test.c in debug mode
INFO: [HLS 200-10] Running 'c:/Xilinx/Vivado_HLS/2016.1/bin/unwrapped/win64.o/apcc.exe'
INFO: [HLS 200-10] For user '...' on host '...' (Windows NT_amd64 version 6.1) on
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/array_interfaces/array_io_prj/solution1/csim/build'
INFO: [APCC 202-3] Tmp directory is C:/tmp/apcc_db_.../111922036443895
INFO: [APCC 202-1] APCC is done.
Compiling(apcc) ../../../../array_io.c in debug mode
INFO: [HLS 200-10] Running 'c:/Xilinx/Vivado_HLS/2016.1/bin/unwrapped/win64.o/apcc.exe'
INFO: [HLS 200-10] For user '...' on host '...' (Windows NT_amd64 version 6.1) on
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/array_interfaces/array_io_prj/solution1/csim/build'
INFO: [APCC 202-3] Tmp directory is C:/tmp/apcc_db_.../84202039631898
INFO: [APCC 202-1] APCC is done.
Generating csim.exe
Test passed!
INFO: [ISIM 211-1] CSim done with 0 errors.
C:\training\hls\labs\array_interfaces>
```

**Figure 5-3: Creating the Project and Running the C Simulation**

- 1-2-5.** Enter the following command to open the Vivado HLS tool project:

```
vivado_hls -p array_io_prj
```

The Vivado HLS tool GUI opens.

- 1-2-6.** Expand **array\_io\_prj > Source** in the Project Explorer pane.

**1-2-7.** Double-click the **array\_io.c** file to open it.



```
#include "array_io.h"

// The data comes in organized in a single array.
// - The first sample for the first channel (CHAN)
// - Then the first sample for the 2nd channel etc.
// The channels are accumulated independently
// E.g. For 8 channels:
// Array Order : 0 1 2 3 4 5 6 7 8 9 10 etc
// Sample Order: A0 B0 C0 D0 E0 F0 G0 H0 A1 B1 C2
// Output Order: A0 B0 C0 D0 E0 F0 G0 H0 A0+A1 B0+B1 C0+C2

void array_io (dout_t d_o[N], din_t d_i[N]) {
    int i, rem;

    // Store accumulated data
    static dacc_t acc[CHANNELS];
    dacc_t temp;

    // Accumulate each channel
    For_Loop: for (i=0;i<N;i++) {
        rem=i%CHANNELS;
        temp = acc[rem] + d_i[i];
        acc[rem] = temp;
        d_o[i] = acc[rem];
    }
}
```

**Figure 5-4:** Opening the Source File

**1-2-8.** Review the source file.

This design has an input array and an output array. The data in the input array is ordered as channels and on how the channels are accumulated.

You can review the test bench and the input and output data in the `result.golden.dat` file.

## Synthesizing the Design

## Step 2

In this step, you will synthesize the design by using Vivado HLS tool defaults and review how the array ports are synthesized to RAM ports.

### 2-1. Synthesize the design.

- 2-1-1. Select **Solution > Run C Synthesis > Active Solution** or click the **Run Synthesis** icon in the menu bar.



Figure 5-5: Launching Synthesis

This option synthesizes the currently selected solution.

All solutions (or selected solutions) can be synthesized by using the drop-down menu next to the synthesis icon. You can synthesize all solutions or synthesize selected solutions in addition to the default.



Figure 5-6: Options for What to Synthesize

When the synthesis completes, the Synthesis report will be displayed in the Information pane.



**Figure 5-7: Synthesis Report**

The Synthesis report shows the performance, area estimates, and interface summary in the design.

- 2-1-2.** In the Outline pane, select **Interface** to review the Interface Summary (shown in the above figure).

| Interface    |     |      |            |               |              |
|--------------|-----|------|------------|---------------|--------------|
| Summary      |     |      |            |               |              |
| RTL Ports    | Dir | Bits | Protocol   | Source Object | C Type       |
| ap_clk       | in  | 1    | ap_ctrl_hs | array_io      | return value |
| ap_rst       | in  | 1    | ap_ctrl_hs | array_io      | return value |
| ap_start     | in  | 1    | ap_ctrl_hs | array_io      | return value |
| ap_done      | out | 1    | ap_ctrl_hs | array_io      | return value |
| ap_idle      | out | 1    | ap_ctrl_hs | array_io      | return value |
| ap_ready     | out | 1    | ap_ctrl_hs | array_io      | return value |
| d_o_address0 | out | 5    | ap_memory  | d_o           | array        |
| d_o_ce0      | out | 1    | ap_memory  | d_o           | array        |
| d_o_we0      | out | 1    | ap_memory  | d_o           | array        |
| d_o_d0       | out | 16   | ap_memory  | d_o           | array        |
| d_i_address0 | out | 5    | ap_memory  | d_i           | array        |
| d_i_ce0      | out | 1    | ap_memory  | d_i           | array        |
| d_i_q0       | in  | 16   | ap_memory  | d_i           | array        |

**Figure 5-8: Interface Summary Report**

The Interface Summary shows how array arguments in the C source are by default synthesized into RTL RAM ports:

- o The design has a *clock*, *reset*, and the default block-level I/O protocol *ap\_ctrl\_hs*.
- o The *d\_o* argument has been synthesized to a RAM port (I/O protocol *ap\_memory*).
- o A data port (*d\_o\_d0*).
- o An address port (*d\_o\_address0*).
- o Control ports of a chip-enable (*d\_o\_ce0*) and a write-enable port (*d\_o\_we0*).
- o The *d\_i* argument has been synthesized to a similar RAM interface, but has an input port (*d\_i\_q0*) and no write-enable port because this interface only reads data.

The figure shows a C code snippet on the left and its corresponding Interface Summary table on the right.

**C Code Snippet:**

```

void array_io (dout_t d_o[N], din_t d_i[N]) {
    int i, rem;

    // Store accumulated data
    static dacc_t acc[CHANNELS];
    dacc_t temp;

    // Accumulate each channel
    For_Loop: for (i=0;i<N;i++) {
        rem=i%CHANNELS;
        temp = acc[rem] + d_i[i];
        acc[rem] = temp;
        d_o[i] = acc[rem]; Port-Level I/O Protocol
    }                                Memory Interface Protocol - RAM
}

```

**Interface Summary Table:**

| Interface    |     |      |            |               |              |
|--------------|-----|------|------------|---------------|--------------|
| Summary      |     |      |            |               |              |
| RTL Ports    | Dir | Bits | Protocol   | Source Object | C Type       |
| ap_clk       | in  | 1    | ap_ctrl_hs | array_io      | return value |
| ap_rst       | in  | 1    | ap_ctrl_hs | array_io      | return value |
| ap_start     | in  | 1    | ap_ctrl_hs | array_io      | return value |
| ap_done      | out | 1    | ap_ctrl_hs | array_io      | return value |
| ap_idle      | out | 1    | ap_ctrl_hs | array_io      | return value |
| ap_ready     | out | 1    | ap_ctrl_hs | array_io      | return value |
| d_o_address0 | out | 5    | ap_memory  | d_o           | array        |
| d_o_ce0      | out | 1    | ap_memory  | d_o           | array        |
| d_o_we0      | out | 1    | ap_memory  | d_o           | array        |
| d_o_d0       | out | 16   | ap_memory  | d_o           | array        |
| d_i_address0 | out | 5    | ap_memory  | d_i           | array        |
| d_i_ce0      | out | 1    | ap_memory  | d_i           | array        |
| d_i_q0       | in  | 16   | ap_memory  | d_i           | array        |

Figure 5-9: Memory Interface Protocol – RAM

## Question 1

What is the width of the data ports and address ports?

---



---



---

## Using Dual-Port RAM and FIFO Interfaces

**Step 3**

High-level synthesis allows you to specify a RAM interface as a single port or dual port. If you do not make such a selection, the Vivado HLS tool automatically analyzes the design and selects the number of ports to maximize the data rate.

When loops are *rolled*, synthesis creates the logic for one iteration of the loop, and the RTL design executes this logic for each iteration of the loop in sequence. Using optimization directives, you can *unroll* loops, which allows all iterations to occur in parallel.

By default (as shown in the previous step), the tool created a single-port RAM interface because the *for* loop in the source code is by default left **rolled**.

Each iteration of the loop is executed in turn:

- Read the input port ( $d_i$ ).
- Read the accumulated result ( $acc[rem]$ ) from the internal RAM.
- Sum the accumulated and new data ( $acc[rem] + d_i[i]$ ) and write into the internal RAM ( $acc[rem]$ ).
- Write the result to the output port ( $do[i]$ ).
- Repeat for the next iteration of the loop.

This ensures that only a single input read and output write is ever required. Even if multiple input and outputs are made available, the internal logic cannot take advantage of any additional ports.

**Note:** If you specify a dual-port RAM and the Vivado HLS tool can determine only a single port is required, it uses a single port and overrides the dual-port specification.

In this design, if you want to implement an array argument using multiple RTL ports, the first thing you must do is **unroll** the *for* loop and allow all internal operations to happen in parallel. Otherwise, there is no benefit in multiple ports: the rolled *for* loop ensures only one data sample can be read (or written) at a time.

### 3-1. Create a new solution and unroll the UNROLL.

3-1-1. Select **Project > New Solution** or click **New Solution** icon (  ) from the toolbar.

3-1-2. Leave the options at their default settings.



Figure 5-10: Creating a New Solution (solution2)

3-1-3. Click **Finish**.

### 3-2. Add the directive UNROLL to unroll the **for** loop.

3-2-1. Ensure that the **array\_io.c** file is opened and active in the Information pane.

3-2-2. In the Directive tab, right-click **For\_Loop** and select **Insert Directive**.



Figure 5-11: Selecting the Directive for For\_Loop

- 3-2-3.** Select **UNROLL** from the Directive drop-down list.



Figure 5-12: Selecting the Directive UNROLL

- 3-2-4.** Click **OK**.

### 3-3. Specify a dual-port RAM for input reads.

**The Resource directive indicates the type of RAM connected to an interface.**

- 3-3-1.** Ensure that the **array\_io.c** file is opened.  
**3-3-2.** In the Directive tab, right-click **d\_i** and select **Insert Directive**.



Figure 5-13: Selecting the Directive for the Input Array **d\_i**

- 3-3-3.** Select **RESOURCE** from the Directive drop-down list.
- 3-3-4.** Click the **core (optional)** box.
- 3-3-5.** Select **RAM\_2P\_BRAM [storage]** from the Vivado HLS core selection dialog box and click **OK**.



**Figure 5-14:** Selecting the Dual-Port Block RAM as the Resource for the Input Array `d_i`

Note that the Resource directive allows you to choose various types of RAM.

- 3-3-6.** Click **OK** in the Vivado HLS Directive Editor dialog box.

### 3-4. Implement the output port using a FIFO interface.

- 3-4-1.** In the Directive tab, right-click `d_o` and select **Insert Directive**.
- 3-4-2.** Select **INTERFACE** from the Directive drop-down list.
- 3-4-3.** Select **ap\_fifo** from the Mode drop-down list.

**3-4-4. Click OK.**

The Directive tab should now look similar to the figure below.



```

array_io
d_o
% HLS INTERFACE ap_fifo port=d_o
d_i
% HLS RESOURCE variable=d_i core=RAM_2P_BRAM
acc
For_Loop
% HLS UNROLL

```

**Figure 5-15: Directive Tab with Directives for UNROLL, Dual-Port BRAM, and FIFO Interface**

**3-5. Synthesize the design.**

- 3-5-1.** Select **Solution > Run C Synthesis > Active Solution** or click the **Run Synthesis** icon in the menu bar.



**Figure 5-16: Launching Synthesis**

This option synthesizes the currently selected solution.

All solutions (or selected solutions) can be synthesized by using the drop-down menu next to the synthesis icon. You can synthesize all solutions or synthesize selected solutions in addition to the default.



**Figure 5-17: Options for What to Synthesize**

You can view the synthesis information in the Console tab.

```

Console X Errors Warnings
Vivado HLS Console
INFO: [HLS 200-10] Analyzing design file 'array_io.c' ...
INFO: [HLS 200-10] Validating synthesis directives ...
INFO: [HLS 200-10] Starting code transformations ...
INFO: [HLS 200-10] Checking synthesizability
INFO: [XFORM 203-501] Unrolling loop 'For_Loop' (array_io.c:65) in function 'array_io' completely.
INFO: [XFORM 203-102] Partitioning array acc automatically.
INFO: [HLS 200-111] Elapsed time: 5.655 seconds; current memory usage: 72.3 MB.
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'array_io' ...
INFO: [HLS 200-10] -----
INFO: [HLS 200-10] -- Scheduling module 'array_io'
INFO: [HLS 200-10] --
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111] Elapsed time: 1.051 seconds; current memory usage: 74.2 MB.
INFO: [HLS 200-10] -----
INFO: [HLS 200-10] -- Exploring micro-architecture for module 'array_io'
INFO: [HLS 200-10] --
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111] Elapsed time: 0.085 seconds; current memory usage: 74.5 MB.
INFO: [HLS 200-10] -----
INFO: [HLS 200-10] -- Generating RTL for module 'array_io'
INFO: [HLS 200-10] --
INFO: [RTGEN 206-501] Setting interface mode on port 'array_io/d_o' to 'ap_fifo'.
INFO: [RTGEN 206-501] Setting interface mode on port 'array_io/d_i' to 'ap_memory'.
INFO: [RTGEN 206-501] Setting interface mode on function 'array_io' to 'ap_ctrl_hs'.

```

**Figure 5-18: Synthesis Information in the Console Tab**

When synthesis completes, the Synthesis report will be displayed in the Information pane.

- 3-5-2.** In the Outline pane, select **Interface** to review the Interface Summary.

| RTL Ports    | Dir | Bits | Protocol   | Source Object | C Type       |
|--------------|-----|------|------------|---------------|--------------|
| ap_clk       | in  | 1    | ap_ctrl_hs | array_io      | return value |
| ap_RST       | in  | 1    | ap_ctrl_hs | array_io      | return value |
| ap_start     | in  | 1    | ap_ctrl_hs | array_io      | return value |
| ap_done      | out | 1    | ap_ctrl_hs | array_io      | return value |
| ap_idle      | out | 1    | ap_ctrl_hs | array_io      | return value |
| ap_ready     | out | 1    | ap_ctrl_hs | array_io      | return value |
| d_o_din      | out | 16   | ap_fifo    | d_o           | pointer      |
| d_o_full_n   | in  | 1    | ap_fifo    | d_o           | pointer      |
| d_o_write    | out | 1    | ap_fifo    | d_o           | pointer      |
| d_i_address0 | out | 5    | ap_memory  | d_i           | array        |
| d_i_ce0      | out | 1    | ap_memory  | d_i           | array        |
| d_i_q0       | in  | 16   | ap_memory  | d_i           | array        |
| d_i_address1 | out | 5    | ap_memory  | d_i           | array        |
| d_i_ce1      | out | 1    | ap_memory  | d_i           | array        |
| d_i_q1       | in  | 16   | ap_memory  | d_i           | array        |

**Figure 5-19: Selecting the Interface Summary (Solution2)**

The Interface Summary shows how the directives are synthesized into the RTL dual-port block RAM ports and FIFO interface:

- o The design has the standard *clock*, *reset*, and block-level I/O ports (*ap\_ctrl\_hs*).
- o The array argument *d\_o* has been implemented as a FIFO interface with a 16-bit data port (*d\_o\_din*) and associated output write (*d\_o\_write*) and input FIFO full (*d\_o\_full\_n*) ports.
- o The argument *d\_i* has been implemented as a dual-port RAM interface.



**Figure 5-20: Memory Interface Protocol – Dual Port Block RAM and FIFO**

By using a dual-port RAM interface, this design can accept input data at twice the rate of the previous design. Because the *for* loop was unrolled, the logic in the loop is able to consume data at this rate.

Unrolling the loops allows more reads to be performed (but creates N copies of the logic). However, with the use of a single-port FIFO interface on the output, the output data rate is the same as before.

## Partitioning RAM and FIFO Array Interfaces

**Step 4**

In this step, you will learn how to partition an array interface into any arbitrary number of ports.

### 4-1. Create a new solution by copying the previous solution (*solution2*) settings.

- 4-1-1. Select **Project > New Solution** or click the **New Solution** icon (  ) from the toolbar.
- 4-1-2. Leave the options at their default settings (with *solution2* selected).
- 4-1-3. Click **Finish**.

### 4-2. Apply the **ARRAY\_PARTITION** directive to the **d\_o** array.

- 4-2-1. Select **Project > Close Inactive Solution Tabs** to close all inactive solution windows.
- 4-2-2. Ensure that the **array\_io.c** file is open and active in the Information pane.
- 4-2-3. Expand the **array\_io** function in the Directive tab.
- 4-2-4. In the Directive tab, right-click the **d\_o** array of the *array\_io* function and select **Insert Directive**.
- 4-2-5. Select **ARRAY\_PARTITION** from the Directive drop-down list.
- 4-2-6. Select the *type (optional)* as **block**.
- 4-2-7. Enter **4** in the *factor (optional)* field.
- 4-2-8. Click **OK**.



Figure 5-21: Applying the **ARRAY\_PARTITION** Directive for the **d\_o** Array

#### 4-3. Apply the ARRAY\_PARTITION directive to the d\_i array.

- 4-3-1. In the Directive tab, right-click the **d\_i** array of the *array\_io* function and select **Insert Directive**.
- 4-3-2. Select **ARRAY\_PARTITION** from the Directive drop-down list.
- 4-3-3. Select the *type (optional)* as **block**.
- 4-3-4. Enter **2** in the *factor (optional)* field.
- 4-3-5. Click **OK**.

The final view of the ARRAY\_PARTITION directives in the Directive tab should look similar to the figure below.



Figure 5-22: Final View of ARRAY\_PARTITION Directives

#### 4-4. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

#### 4-5. View the Synthesis report.

**You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.**

- 4-5-1. In the Outline pane, select **Interface** to review the Interface Summary.

The design has the standard clock, reset, and block-level I/O ports.

The array argument **d\_o** has been implemented as four separate FIFO interfaces.

The argument **d\_i** has been implemented as two separate RAM interfaces, each of which uses a dual-port interface. (If you see four separate RAM interfaces, confirm that the partition factor for **d\_i** is two and not four).

If the input port **d\_i** was partitioned into four, only a single-port RAM interface would be required for each port. Because the output port can only output four values at once, there would be no benefit in reading eight inputs at once.



Figure 5-23: Interface Report for Partitioned Interfaces

## Fully Partitioning the Array Interfaces

## Step 5

In this step, you will partition an array interface into individual ports.

### 5-1. Create a new solution by copying the previous solution (*solution3*) settings.

5-1-1. Select **Project > New Solution** or click the **New Solution** icon (  ) from the toolbar.

5-1-2. Leave the options at their default settings (with *solution3* selected).

5-1-3. Click **Finish**.

### 5-2. Modify the **ARRAY\_PARTITION** directive of the *d\_o* array to partition fully.

5-2-1. Select **Project > Close Inactive Solution Tabs** to close all inactive solution windows.

5-2-2. Ensure that the **array\_io.c** file is open and active in the Information pane.

5-2-3. Expand the **array\_io** function in the Directive tab.

5-2-4. In the Directive tab, right-click the **ARRAY\_PARTITION** of the *d\_o* array of the *array\_io* function and select **Modify Directive**.

5-2-5. Select **ARRAY\_PARTITION** from the Directive drop-down list.

5-2-6. Delete **4** from the *factor (optional)* field.

5-2-7. Select the *type (optional)* as **complete**.

5-2-8. Click **OK**.



Figure 5-24: Applying Full Partition to the *d\_o* Array

5-2-9. Similarly, repeat the previous tasks to completely partition the *d\_i* array.

### 5-3. Remove the RESOURCE directive of the d\_i array.

- 5-3-1. Right-click the RESOURCE directive of *d\_i* and select **Remove Directive**.



Figure 5-25: Removing the RESOURCE Directive

The final view of the fully partitioned array directives in the Directive tab should look similar to the figure below.



Figure 5-26: Final View of Fully Partitioned Directives

### 5-4. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

## 5-5. View the Synthesis report.

You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.

- 5-5-1. In the Outline pane, select **Interface** to review the Interface Summary.

The design has the standard clock, reset, and block-level I/O ports.

The array argument **d\_o** has been implemented as 32 separate FIFO interfaces.

The argument **d\_i** has been implemented as 32 separate scalar ports. Because the default interface for input scalars is not in the I/O protocol, they have the I/O protocol **ap\_none**.

Although you have focused exclusively on the I/O interfaces, at this point it is worth examining the differences in performance across all four solutions.

## 5-6. Compare all four solutions.

- 5-6-1. Select **Project > Compare Reports** or click the **Compare Reports** icon (  ) to compare the results of all the solutions (*solution1*, *solution2*, *solution3*, and *solution4*).

- 5-6-2. Select **solution1**, **solution2**, **solution3**, and **solution4** from the Available solutions section.

- 5-6-3. Click **Add**.

- 5-6-4. Click **OK**.

You should see the compared results of *solution1*, *solution2*, *solution3*, and *solution4* as shown in the figure below.

| Performance Estimates                           |           |           |           |           |           |
|-------------------------------------------------|-----------|-----------|-----------|-----------|-----------|
| <input type="checkbox"/> Timing (ns)            |           |           |           |           |           |
| Clock                                           |           | solution4 | solution1 | solution2 | solution3 |
| ap_clk                                          | Target    | 4.00      | 4.00      | 4.00      | 4.00      |
|                                                 | Estimated | 3.40      | 2.39      | 2.70      | 3.40      |
| <input type="checkbox"/> Latency (clock cycles) |           |           |           |           |           |
|                                                 |           | solution4 | solution1 | solution2 | solution3 |
| Latency                                         | min       | 2         | 129       | 33        | 11        |
|                                                 | max       | 2         | 129       | 33        | 11        |
| Interval                                        | min       | 3         | 130       | 34        | 12        |
|                                                 | max       | 3         | 130       | 34        | 12        |

Figure 5-27: Comparison of Performance Estimates of All Solutions

Observe that *solution4*, using a unique port for each array element, is much faster than the previous solutions. The internal logic can access the data as soon as it is required. (There is no performance bottleneck due to port accesses.)

**5-6-5.** Scroll down in the comparison report to view the resource utilization.

Observe that the solutions with more I/O ports (*solution 2*, *solution3*, and *solution4*), allow more parallel processing, but also use more considerably more resources.

|          | solution4 | solution1 | solution2 | solution3 |
|----------|-----------|-----------|-----------|-----------|
| BRAM_18K | 0         | 0         | 0         | 0         |
| DSP48E   | 0         | 0         | 0         | 0         |
| FF       | 1155      | 186       | 1266      | 1228      |
| LUT      | 1058      | 55        | 1322      | 1195      |

**Figure 5-28: Comparison of Utilization Estimates of All Solutions**

## Summary

In this lab, you have learned how to implement a memory interface protocol such as *ap\_memory* and *ap\_fifo*. You also used directives to implement an input array as dual-port block RAM. In addition, you used directives to fully partition the array and analyzed the synthesized report.

## Answers

1. What is the width of the data ports and address ports?

The data port is the width of the data values in the C source (16-bit integers in this case) and the width of the address port has been automatically sized to match the number of addresses that must be accessed (5-bit for 32 addresses).



# Lab 6: Improving Area and Resource Utilization

2016.1

## Abstract

This lab introduces various techniques and directives that can be used in the Vivado® HLS tool to improve design performance as well as area and resource utilization.

This lab should take approximately 60 minutes.

## Objectives

After completing this lab, you will be able to:

- Describe the effect of the PIPELINE directive on resource utilization and performance
- Identify the effect of memory partition techniques on resource utilization and performance
- Describe the impact of various directives on resource utilization and performance

## Introduction

This lab introduces various techniques and directives which can be used in Vivado HLS to improve design performance as well as area and resource utilization. The design under consideration performs discrete cosine transformation (DCT) on an 8x8 block of data.

In this lab, you will be using a C design to implement a DCT. The function implements a 2D DCT algorithm by first processing each row of the input array via a 1D DCT, then processing the columns of the resulting array through the same 1D DCT. It calls the read\_data, dct\_2d, and write\_data functions.

The read\_data function is defined at line 54 and consists of two loops: RD\_Loop\_Row and RD\_Loop\_Col. The write\_data function is defined at line 66 and consists of two loops to perform writing the result. The dct\_2d function, defined at line 23, calls the dct\_1d function and performs transpose.

Finally, the dct\_1d function, defined at line 4, uses the dct\_coeff\_table and performs the required function by implementing a basic iterative form of the 1D Type-II DCT algorithm.

The following figure shows the function hierarchy on the left-hand side, the loops in the order they are executed, and the flow of data on the right-hand side.



**Figure 6-1: Design Hierarchy and Dataflow**

## General Flow



## Creating a Vivado HLS Tool Project

## Step 1

There are a number of ways to launch the Vivado HLS tool. The two most popular mechanisms are shown here.

### 1-1. Launch the Vivado HLS tool.

- 1-1-1. Select **Start > All Programs > Xilinx Design Tools > Vivado 2016.1 > Vivado HLS > Vivado HLS 2016.1.**



**Figure 6-2: Launching the Vivado HLS Tool**

-- OR --

Double-click the **Vivado HLS** shortcut icon (  ) on the desktop.

The Vivado HLS tool opens to the Welcome window. From the Welcome window you can create a new project, open examples, and access documentation and examples.



**Figure 6-3: Vivado HLS Welcome Screen**

Here you will learn to create a new Vivado HLS project from scratch.

### 1-2. Create a Vivado HLS project named **dct\_prj**.

- 1-2-1. From the Welcome Page, click **Create New Project**.



Figure 6-4: Creating a New Vivado HLS Project

### 1-3. The Project Configuration dialog box asks for a project name and location.

- 1-3-1. Enter **dct\_prj** in the Project name field (1).
- 1-3-2. Enter **C:\training\hls\labs\improve\_area** in the Location field (2).



Figure 6-5: Configuring a New HLS Project

- 1-3-3. Click **Next** (3).

**1-4. The Add/Remove Files dialog box opens. Here you will be invited to add existing files or create new sources.**

**1-4-1. Click Add Files.**

The Open File dialog box opens.

**Note:** If do not have existing files at this moment and you want to create new ones, click **New File**.

**1-4-2. Browse to C:\training\hls\labs\improve\_area.**

**1-4-3. Select **dct.c**.**

The Vivado HLS tool automatically adds the working directory (project directory) and any directory that contains C files added to the project to the search path. Hence, header files that reside in these directories are automatically included in the project (no need to explicitly specify them). You must specify the path to all other header files (if any) by clicking the Edit CFLAGS button.

**1-4-4. Click **Open** to add these files.**

Note that you can add compiler directives specific to each entry at this point.

**1-4-5. Click **Browse** next to the Top Function field.**

The Select Top function dialog box opens, which lists all the functions available from the specified source files.

**1-4-6. Select **dct (dct.c)** from the list.**

**Note:** You can also manually enter the name of the top function in the Top Function field.



**Figure 6-6: Adding Files to a New Vivado HLS Project**

**1-4-7. Click **Next**.**

**1-5. Add any existing testbench files.**

If you have (or want) any testbench files they can be entered here.  
Sometimes the testbench is built into the synthesizable file.

**1-5-1.** Click **Add Files**.

**1-5-2.** Navigate to *C:\training\hls\labs\improve\_area*.

**1-5-3.** Select **dct\_test.c, in.dat and out.golden.dat**.

**1-5-4.** Click **Open** to add these files.

**1-5-5.** Click **Next**.

**1-6. Finally, it is time to specify some of the physical parameters of the design.**

**1-6-1.** By default, **Solution1** is populated in the Solution Name field.

No changes are required.

**1-6-2.** Set the clock period to **10**.

You can leave the Uncertainty field blank.

**1-6-3.** Click the **Browse** button to select a part or board.



Figure 6-7: Locating the Board Browse Button

**1-6-4.** Click **Boards** as shown below.

**1-6-5.** Enter **zynq** in the Search field.



Figure 6-8: Filtering to Quickly Locate Target Platforms

**1-6-6.** Select **Zynq ZC702 Evaluation Board** from the search list.

**1-6-7.** Click **OK** to select the board.

**1-6-8.** Click **Finish**.

You will see the created project in the Explorer tab.

## Synthesizing the Design

## Step 2

In this step, you will synthesize the design by using Vivado HLS tool defaults and analyze how many resources are utilized to implement the C design.

### 2-1. Synthesize the design.

**2-1-1.** Select **Solution > Run C Synthesis > Active Solution** or click the **Run Synthesis** icon in the menu bar.



Figure 6-9: Launching Synthesis

This option synthesizes the currently selected solution.

All solutions (or selected solutions) can be synthesized by using the drop-down menu next to the synthesis icon. You can synthesize all solutions or synthesize selected solutions in addition to the default.



Figure 6-10: Options for What to Synthesize

## 2-2. View the Synthesis report.

**You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.**

### 2-2-1. View the synthesis results in the Console tab.

Note that the Synthesis Report section in the Explorer view only shows the *dct\_dtc\_1d2\_csynth.rpt*, *dct\_dtc\_2d\_csynth.rpt*, and *dct.rpt* entries. The *read\_data* and *write\_data* functions reports are not listed. This is because these two functions are inlined.

### 2-2-2. Verify this by scrolling up in the Console tab.



Figure 6-11: Inlining of *read\_data* and *write\_data* Functions

**2-2-3.** Review the performance and resource estimates as well as the estimated latency.



Figure 6-12: Synthesis Report

### Question 1

Looking at the report, fill in the table below.

|                        |  |
|------------------------|--|
| Estimated clock period |  |
| Worst case latency     |  |
| Number of DSP48E used  |  |
| Number of BRAMs used   |  |
| Number of FFs used     |  |
| Number of LUTs used    |  |

### Performance and Utilization Estimates

## 2-2-4. View the top-level signals generated by the tools.



**Figure 6-13: Generated Interface Signals**

You can see that *ap\_clk*, *ap\_rst* are automatically added. The *ap\_start*, *ap\_done*, *ap\_idle*, and *ap\_ready* signals are top-level signals used as handshaking signals to indicate when the design is able to accept the next computation command (*ap\_idle*), when the next computation is started (*ap\_start*), and when the computation is completed (*ap\_done*).

The top-level function has input and output arrays, hence an *ap\_memory* interface is generated for each of them.

## 2-2-5. Open the **dct\_dct\_1d2\_csynth.rpt** and **dct\_dct\_2d\_csynth.rpt** synthesis reports under syn > report in the Explorer view.



**Figure 6-14: Synthesis Reports of dct1d2 and dct2d**

The report for *dct\_2d* clearly indicates that most of the design cycles (3668) are spent doing the row and column DCTs. Also the *dct\_1d* report indicates that the latency is 209 clock cycles  $((24+2)*8+1)$ .

## Performing C/RTL Co-simulation

**Step 3**

Now that you have performed high-level synthesis on the C design, you will perform RTL co-simulation on the generated RTL using the C testbench.

Run C/RTL co-simulation, selecting Verilog and skipping VHDL. Verify that the simulation passes.

### 3-1. Cosimulate the Vivado HLS tool design.

- 3-1-1. Select **Solution > Run C/RTL Cosimulation** or click the **Run C/RTL Cosimulation** icon (checkbox).



Figure 6-15: Launching from the Menu

The Run C/RTL Co-simulation dialog box opens.



Figure 6-16: Co-simulation Dialog Box

Each of the options controls how C/RTL Cosimulation is run:

- RTL Selection: Select the RTL that is simulated (Verilog/VHDL).
- Setup Only: This creates all the files (wrappers, adapters and scripts) required to run the simulation but does not execute the simulator.
- Dump Trace: During RTL verification, the trace files can be saved and viewed using an appropriate viewer. By selecting this option, the trace file will be saved to the `<solution>/sim/<RTL>` folder.
- Optimizing Compile: This ensures a high level of optimization is used to compile the C testbench. Using this option increases the compile time but the simulation executes faster.

**3-1-2. Select **Default Options** (i.e. select nothing).**

**3-1-3. Click **OK**.**

The simulation log will be displayed in the editor pane.

**3-2. View the Cosimulation report.**

**The information generated by the Vivado HLS tool can be found in two places, both described here.**

**The first is the Console window, which reports not only the output produced by the code being simulated, but all of the simulation engine messages as well. The simulation log provides only a few simulation engine messages and the simulated code output.**

**3-2-1. Select the **Console** tab in the lower portion of the tool's GUI.**

You may need to scroll to view all the output produced by the cosimulation.



```

Vivado HLS Console
// Inter-Transaction Progress: completed Transaction / Total Transaction
// Intra-Transaction Progress: Measured Latency / Latency Estimation * 100%
// RTL Simulation : 0 / 601 [0.00%] @ "125000"
// RTL Simulation : 601 / 601 [100.00%] @ "498965000"
$finish called at time : 499005 ns : File "C:/training/
## quit
INFO: [Common 17-206] Exiting xsim at ...
INFO: [COSIM 212-316] Starting C post checking ...
Comparing against output data
*****
PASS: The output matches the golden output!
*****
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
Finished C/RTL cosimulation.
|
```

**Figure 6-17: Example Output After a C/RTL Co-simulation**

The other location, described below, provides only a few simulation engine messages and the simulated code output. Typically this is opened after the simulation completes; however, if you need to access it after closing the log pane, here's how to access the simulation report.

3-2-2. Expand **dct\_prj > solution1 > sim > report** in the Explorer pane.

3-2-3. Double-click the log file name to open it in the editor pane.



Figure 6-18: Locating the Co-Simulation Log File

The Cosimulation Report in HTML format will be displayed in the main viewing area.

The screenshot shows the main LabVIEW workspace with the following content:

**Cosimulation Report for 'dct'**

| RTL     | Status | Latency |        |        | Interval |        |     |
|---------|--------|---------|--------|--------|----------|--------|-----|
|         |        | min     | avg    | max    | min      | avg    | max |
| VHDL    | NA     | NA      | NA     | NA     | NA       | NA     |     |
| Verilog | Pass   | ██████  | ██████ | ██████ | ██████   | ██████ |     |

Export the report(.html) using the [Export Wizard](#)

Figure 6-19: Cosimulation Report – HTML

You can quickly verify the cosimulation status here.

## Applying the PIPELINE Directive

## Step 4

In this step, you will create a new solution by copying the previous solution settings.

You will then apply the PIPELINE directive to the following:

- *DCT\_Inner\_Loop*
- *Xpose\_Row\_Inner\_Loop*
- *Xpose\_Col\_Inner\_Loop*
- *RD\_Loop\_Col*
- *WR\_Loop\_Col*

### 4-1. Create a new solution by copying the previous solution (*solution1*) settings.

- 4-1-1. Select **Project > New Solution** or click the **New Solution** icon (  ) from the toolbar.
- 4-1-2. Leave the options at their default settings.



Figure 6-20: Creating a New Solution (solution2)

- 4-1-3. Click **Finish**.

## 4-2. Apply the PIPELINE directive to the *DCT\_Inner\_Loop* loop.

- 4-2-1. Select **Project > Close Inactive Solution Tabs** to close all inactive solution windows.
- 4-2-2. Ensure that the **dct.c** file is open and active in the Information pane.
- 4-2-3. Expand **dct\_1d** in the Directive tab.
- 4-2-4. In the Directive tab, right-click the **DCT\_Inner\_Loop** loop and select **Insert Directive**.
- 4-2-5. Select **PIPELINE** from the Directive drop-down list.
- 4-2-6. Leave the **II (optional)** field blank as the Vivado HLS tool will try for an II=1, one new input every clock cycle.
- 4-2-7. Click **OK**.



Figure 6-21: Selecting the Directive PIPELINE for DCT\_Inner\_Loop

- 4-2-8. Similarly, apply the PIPELINE directive to the **Xpose\_Row\_Inner\_Loop** and **Xpose\_Col\_Inner\_Loop** loops of the `dct_2d` function, **RD\_Loop\_Col** of the `read_data` function, and **WR\_Loop\_Col** of the `write_data` function.

The final view of the PIPELINE directives in the Directive tab should look similar to the figure below.



Figure 6-22: Final View of PIPELINE Directives

#### 4-3. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

#### 4-4. View the Synthesis report.

You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.

- 4-4-1. Select **Project > Compare Reports** or click the **Compare Reports** icon () to compare the results of the two solutions (*solution1* and *solution2*).
- 4-4-2. Select **solution1** and **solution2** from the Available solutions section.
- 4-4-3. Click **Add**.
- 4-4-4. Click **OK**.

You should see the compared results of *solution1* and *solution2* as shown in the figure below.

| Performance Estimates  |           |           |           |
|------------------------|-----------|-----------|-----------|
| Timing (ns)            |           |           |           |
| Clock                  |           | solution2 | solution1 |
| ap_clk                 | Target    | 10.00     | 10.00     |
|                        | Estimated | 7.68      | 6.38      |
| Latency (clock cycles) |           |           |           |
|                        |           | solution2 | solution1 |
| Latency                | min       | 1850      | 3959      |
|                        | max       | 1850      | 3959      |
| Interval               | min       | 1851      | 3960      |
|                        | max       | 1851      | 3960      |

Figure 6-23: Performance Comparison After Pipelining

Observe that the latency reduced from 3959 to 1850 clock cycles.

- 4-4-5. Scroll down in the comparison report to view the resource utilization.

Observe that the LUT utilization increased and flip-flop decreased, whereas block RAM and DSP48E remained the same.

| Utilization Estimates |           |           |
|-----------------------|-----------|-----------|
|                       | solution2 | solution1 |
| BRAM_18K              | 5         | 5         |
| DSP48E                | 1         | 1         |
| FF                    | 255       | 278       |
| LUT                   | 457       | 353       |

Figure 6-24: Comparison of Resource Estimates After Pipelining

**4-5. Switch to the Analysis perspective and determine where most of the clock cycles are spent, i.e. where the large latencies are.**

- 4-5-1.** Select **Solution > Open Analysis Perspective** or click the **Analysis** perspective icon

( ) to open the analysis viewer.

- 4-5-2.** In the Module Hierarchy tab, select **dct**.

- 4-5-3.** View the entries under the Performance Profile tab.



The screenshot shows the Vivado HLS Analysis viewer. The top part displays the **Module Hierarchy** table:

|              | BRAM | DSP | FF  | LUT | Latency | Interval | Pipeline type |
|--------------|------|-----|-----|-----|---------|----------|---------------|
| ↳ dct        | 5    | 1   | 255 | 457 | 1850    | 1851     | none          |
| ↳ dct_dct_2d | 3    | 1   | 194 | 319 | 1717    | 1717     | none          |

The bottom part shows the **Performance Profile** table:

|                           | Pipelined | Latency | Initiation Interval | Iteration Latency | Trip count |
|---------------------------|-----------|---------|---------------------|-------------------|------------|
| ↳ dct                     | -         | 1850    | 1851                | -                 | -          |
| ↳ RD_Loop_Row_RD_Loop_Col | yes       | 64      | 1                   | 2                 | 64         |
| ↳ WR_Loop_Row_WR_Loop_Col | yes       | 64      | 1                   | 2                 | 64         |

Figure 6-25: Performance Profile at the **dct** Function Level

Observe the **RD\_Loop\_Row\_RD\_Loop\_Col** and **WR\_Loop\_Row\_WR\_Loop\_Col** entries.

These are two nested loops that have been flattened and given new names formed by appending the inner loop name to the out loop name. You can also verify this via the messages in the Console view.



```
Vivado HLS Console
INFO: [XFORM 203-602] Inlining function 'write_data' into 'dct' (dct.c:90) automatically.
INFO: [XFORM 203-602] Inlining function 'read_data' into 'dct' (dct.c:85) automatically.
INFO: [XFORM 203-602] Inlining function 'write_data' into 'dct' (dct.c:90) automatically
INFO: [XFORM 203-541] Flattening a loop nest 'Xpose_Row_Outer_Loop' (dct.c:38:1) in function 'dct_2d'.
INFO: [XFORM 203-541] Flattening a loop nest 'Xpose Col Outer Loop' (dct.c:49:1) in function 'dct_2d'.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'DCT_Outer_Loop' (dct.c:13:67) in function 'dct_1d' :
WARNING: [XFORM 203-542] the outer loop is not a perfect loop because there is nontrivial logic in the loop latch.
INFO: [XFORM 203-541] Flattening a loop nest 'RD_Loop_Row' (dct.c:59:67) in function 'dct'.
INFO: [XFORM 203-541] Flattening a loop nest 'WR_Loop_Row' (dct.c:71:67) in function 'dct'.
INFO: [HLS 200-111] Elapsed time: 8.52 seconds; current memory usage: 91.6 MB.
INFO: [HLS 200-10] Starting hardware synthesis ...
INFO: [HLS 200-10] Synthesizing 'dct' ...
INFO: [HLS 200-10] ...
```

Figure 6-26: Viewing the Console Tab Indicating Loops Flattening

- 4-5-4.** In the Module Hierarchy tab, expand **dct > dct\_dct\_2d > dct\_dct\_1d2**.

Notice that the most of the latency occurs in the *dct\_2d* function.



Figure 6-27: *dct\_1d* Function Performance Profile

- 4-5-5. In the Performance Profile tab, select **DCT\_Inner\_Loop** (1).
- 4-5-6. Right-click the **node\_60 (write)** block in the C3 state in the Performance view (2).
- 4-5-7. Select **Goto Source** (3).



Figure 6-28: Understanding What is Preventing DCT\_Outer\_Loop Flattening

Notice that line no. 19 is highlighted, which is preventing the flattening of **DCT\_Outer\_Loop**.

**4-5-8.** Switch to the **Synthesis** perspective.

Moving the PIPELINE directive from the inner loop to the outer loop of *dct\_1d* will lead to more parallelism of the multiply and add operations. That is, eight (8) multiply-and-add operations are done concurrently, thus minimizing the number of cycles required to compute each value in the output array.

**4-6. Create a new solution by copying the previous solution (*solution2*) settings.**

- 4-6-1.** Select **Project > New Solution** or click the **New Solution** icon (  ) from the toolbar.
- 4-6-2.** Leave the options at their default settings (with *solution2* selected).
- 4-6-3.** Click **Finish**.

**4-7. Apply the PIPELINE directive to the *DCT\_Outer\_Loop* loop of *dct\_1d* and remove the PIPELINE directive of the *DCT\_Inner\_Loop* loop of the *dct\_1d* function.**

- 4-7-1.** Select **Project > Close Inactive Solution Tabs** to close all inactive solution windows.
- 4-7-2.** Ensure that the **dct.c** file is open and active in the Information pane.
- 4-7-3.** Expand **dct\_1d** in the Directive tab.
- 4-7-4.** In the Directive tab, select the **PIPELINE** directive of the *DCT\_Inner\_Loop* loop of the *dct\_1d* function.
- 4-7-5.** Right-click and select **Remove Directive**.



Figure 6-29: Removing the PIPELINE Directive of DCT\_Inner\_Loop

- 4-7-6. In the Directive tab, right-click the **DCT\_Outer\_Loop** loop and select **Insert Directive**.
- 4-7-7. Select **PIPELINE** from the Directive drop-down list.
- 4-7-8. Click **OK**.



Figure 6-30: PIPELINE Directive for the DCT\_Outer\_Loop of the dct\_1d Function

By pipelining an outer loop, all inner loops will be unrolled automatically (if legal), so there is no need to explicitly apply an UNROLL directive to **DCT\_Inner\_Loop**. Simply move the pipeline to the outer loop: the nested loop will still be pipelined but the operations in the inner-loop body will operate concurrently.

#### 4-8. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

#### 4-9. View the Synthesis report.

You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.

- 4-9-1. Select **Project > Compare Reports** or click the **Compare Reports** icon ( ) to compare the results of the two solutions.
- 4-9-2. Select **solution2** and **solution3** from the Available solutions section.
- 4-9-3. Click **Add**.
- 4-9-4. Click **OK**.

You should see the compared results of *solution2* and *solution3* as shown in the figure below.

| Performance Estimates  |           |           |           |
|------------------------|-----------|-----------|-----------|
| Timing (ns)            |           |           |           |
| Clock                  |           | solution3 | solution2 |
| ap_clk                 | Target    | 10.00     | 10.00     |
|                        | Estimated | 9.40      | 7.68      |
| Latency (clock cycles) |           |           |           |
|                        |           | solution3 | solution2 |
| Latency                | min       | 874       | 1850      |
|                        | max       | 874       | 1850      |
| Interval               | min       | 875       | 1851      |
|                        | max       | 875       | 1851      |

Figure 6-31: Performance Comparison After Pipelining DCT\_Outer\_Loop of *dct\_1d*

Observe that the latency reduced from 1850 to 874 clock cycles.

- 4-9-5.** Scroll down in the comparison report to view the resource utilization.

Observe that the utilization of all resources (except block RAM) increased.

Since *DCT\_Inner\_Loop* was unrolled, the parallel computation requires eight DSP48E resources.

| Utilization Estimates |           |           |
|-----------------------|-----------|-----------|
|                       | solution3 | solution2 |
| BRAM_18K              | 5         | 5         |
| DSP48E                | 8         | 1         |
| FF                    | 677       | 255       |
| LUT                   | 518       | 457       |

Figure 6-32: Comparison of Resource Estimates After Pipelining DCT\_Outer\_Loop of *dct\_1d*

- 4-9-6.** Double-click **dct\_dct\_1d2\_csynth.rpt** under **dct\_prj > solution3 > syn > report** in the Explorer pane.

The screenshot shows the Xilinx Synthesis Report interface. The top section, "Performance Estimates", contains tables for Timing (ns) and Latency (clock cycles). The "Timing (ns)" table shows clock information: Clock ap\_clk, Target 10.00, Estimated 9.40, Uncertainty 1.25. The "Latency (clock cycles)" table shows latency intervals: min 36, max 36, Type none. The bottom section, "Utilization Estimates", contains a table for resource usage across various components like DSP, BRAM, and LUT.

| Name            | BRAM_18K | DSP48E   | FF         | LUT        |
|-----------------|----------|----------|------------|------------|
| DSP             | -        | 8        | -          | -          |
| Expression      | -        | -        | 0          | 128        |
| FIFO            | -        | -        | -          | -          |
| Instance        | -        | -        | -          | -          |
| Memory          | 0        | -        | 119        | 16         |
| Multiplexer     | -        | -        | -          | 21         |
| Register        | -        | -        | 420        | -          |
| <b>Total</b>    | <b>0</b> | <b>8</b> | <b>539</b> | <b>165</b> |
| Available       | 280      | 220      | 106400     | 53200      |
| Utilization (%) | 0        | 3        | ~0         | ~0         |

**Figure 6-33: Synthesis Report of 1d2 Function - Increased Resource Utilization**

Observe that the pipeline **Initiation Interval (II)** is four cycles, not one as might be hoped and there are now eight block RAMs being used for the coefficient table.

Looking closely at the synthesis log, notice that the coefficient table was automatically partitioned, resulting in eight separate ROMs. This helped reduce the latency by keeping the unrolled computation loop fed. However the input arrays to the *dct\_1d* function were not automatically partitioned.

The reason that the II is four rather than the eight one might expect is because the Vivado HLS tool automatically uses dual-port RAMs when beneficial to scheduling operations.



```
Vivado HLS Console
INFO: [HLS 200-10] Validating synthesis directives ...
INFO: [HLS 200-10] Starting code transformations ...
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [XFORM 203-602] Inlining function 'read_data' into 'dct' (dct.c:85) automatically.
INFO: [XFORM 203-602] Inlining function 'write_data' into 'dct' (dct.c:90) automatically.
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'DCT_Outer_Loop' (dct.c:13) in function 'dct_1d' for pipelining.
INFO: [XFORM 203-501] Unrolling loop 'DCT_Inner_Loop' (dct.c:15) in function 'dct_1d' completely.
INFO: [XFORM 203-102] Partitioning array 'dct_coeff_table' in dimension 2 automatically.
INFO: [XFORM 203-602] Inlining function 'read_data' into 'dct' (dct.c:85) automatically.
INFO: [XFORM 203-602] Inlining function 'write_data' into 'dct' (dct.c:90) automatically.
INFO: [XFORM 203-11] Balancing expressions in function 'dct_1d' (dct.c:4)...8 expression(s) balanced.
```

Figure 6-34: Automatic Partitioning of `dct_coeff_table`



```
Vivado HLS Console
INFO: [HLS 200-10] -----
INFO: [HLS 200-10] -- Scheduling module 'dct_dct_1d2'
INFO: [HLS 200-10] -----
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining loop 'DCT_Outer_Loop'.
WARNING: [SCHED 204-69] Unable to schedule 'load' operation ('src.load[5]', dct.c:17) on array 'src' due to limited memory ports.
INFO: [SCHED 204-61] Pipelining result: Target II: 1, Final II: 4, Depth: 7.
WARNING: [SCHED 204-21] Estimated clock period (9.4ns) exceeds the target (target clock period: 10ns, clock uncertainty: 1.25ns, ...
WARNING: [SCHED 204-21] The critical path consists of the following:
'mul' operation ('tmp_10[7]', dct.c:17) (3.36 ns)
'add' operation ('tmp[7]', dct.c:19) (3.02 ns)
'a' operation ('tmp[7]', dct.c:19) (3.02 ns)
```

Figure 6-35: Initiation Interval of 4

#### 4-10. Switch to the Analysis perspective and look at the `dct_1d` Performance view.

- 4-10-1. Select **Solution > Open Analysis Perspective** or click the **Analysis** perspective icon



( ) to open the analysis viewer.

- 4-10-2. In the Module Hierarchy tab, expand `dct` > `dct_dct_2d` > `dct_dct_1d2`.

- 4-10-3. Select `dct_dct_1d2`.

Observe that the **DCT\_Outer\_Loop** is pipelined.



Figure 6-36: DCT\_Outer\_Loop Pipelined

Also observe that **DCT\_Outer\_Loop** spans over eight states in the Performance View.



Figure 6-37: DCT\_Outer\_Loop in Performance View

#### 4-10-4. Select the **Resource** tab and expand **Memory Ports**.



Figure 6-38: Resource Tab – Memory Ports View

Observe that the memory accesses on the block RAM src are being used to the maximum in every clock cycle.

A block RAM at most can be dual port and both ports are being used. This is a good indication that the design may be bandwidth limited by the memory resources.

## Improving the Memory Bandwidth

## Step 5

In this step, you will apply the ARRAY\_PARTITION directive to *buf\_2d\_in* of *dct* (since the bottleneck was on the *src* port of the *dct\_1d* function, which was passed via *in\_block* of the *dct\_2d* function, which in turn was passed via *buf\_2d\_in* of the *dct* function) and *col\_inbuf* of *dct\_2d*.

### 5-1. Create a new solution by copying the previous solution (*solution3*) settings.

- 5-1-1. Select **Project > New Solution** or click the **New Solution** icon (  ) from the toolbar.
- 5-1-2. Leave the options at their default settings (with *solution3* selected).
- 5-1-3. Click **Finish**.

### 5-2. Apply the ARRAY\_PARTITION directive to *buf\_2d\_in* of the *dct* function and *col\_inbuf* of the *dct\_2d* function.

- 5-2-1. Select **Project > Close Inactive Solution Tabs** to close all inactive solution windows.
- 5-2-2. Ensure that the **dct.c** file is open and active in the Information pane.
- 5-2-3. Expand **dct** in the Directive tab.
- 5-2-4. In the Directive tab, right-click the **buf\_2d\_in** array of the *dct* function and select **Insert Directive**.

The *buf\_2d\_in* array is selected since the bottleneck was on the *src* port of the *dct\_1d* function, which was passed via *in\_block* of the *dct\_2d* function, which in turn was passed via *buf\_2d\_in* of the *dct* function).

- 5-2-5. Select **ARRAY\_PARTITION** from the Directive drop-down list.
- 5-2-6. Select the *type (optional)* as **complete**.
- 5-2-7. Enter **2** in the *dimension (optional)* field.

**5-2-8.** Click **OK**.

Figure 6-39: Applying the **ARRAY\_PARTITION** Directive to the Memory Buffer

**5-2-9.** Similarly, apply the **ARRAY\_PARTITION** directive with *dimension* of **2** and *type* as **complete** to the `col_inbuf` array of the `dct2d` function.

The final view of the ARRAY\_PARTITION directives in the Directive tab should look similar to the figure below.



Figure 6-40: Final View of ARRAY\_PARTITION Directives

### 5-3. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

### 5-4. View the Synthesis report.

You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.

- 5-4-1. Select **Project > Compare Reports** or click the **Compare Reports** icon ( ) to compare the results of the two solutions (*solution3* and *solution4*).
- 5-4-2. Select **solution3** and **solution4** from the Available solutions section.
- 5-4-3. Click **Add**.
- 5-4-4. Click **OK**.

You should see the compared results of *solution3* and *solution4* as shown in the figure below.

| Performance Estimates  |           |           |           |
|------------------------|-----------|-----------|-----------|
| Timing (ns)            |           |           |           |
| Clock                  |           | solution4 | solution3 |
| ap_clk                 | Target    | 10.00     | 10.00     |
|                        | Estimated | 10.79     | 9.40      |
| Latency (clock cycles) |           |           |           |
|                        |           | solution4 | solution3 |
| Latency                | min       | 508       | 874       |
|                        | max       | 508       | 874       |
| Interval               | min       | 509       | 875       |
|                        | max       | 509       | 875       |

Figure 6-41: Performance Comparison After Array Partitioning

Observe that the latency reduced from 874 to 508 clock cycles.

- 5-4-5.** Scroll down in the comparison report to view the resource utilization.

Observe that flip-flop usage has increased by almost double.

| Utilization Estimates |           |           |
|-----------------------|-----------|-----------|
|                       | solution4 | solution3 |
| BRAM_18K              | 3         | 5         |
| DSP48E                | 8         | 8         |
| FF                    | 1243      | 677       |
| LUT                   | 625       | 518       |

Figure 6-42: Comparison of Resource Estimates After Array Partitioning

- 5-4-6.** Double-click **dct\_csynth.rpt** under **dct\_prj > solution4 > syn > report** in the Explorer pane.

- 5-4-7.** Expand the **Loop** entry in the *dct* report and observe that the **Pipeline II** is now 1.

## 5-5. Switch to the Analysis perspective and look at the *dct* resources profile view.

- 5-5-1.** Select **Solution > Open Analysis Perspective** or click the **Analysis** perspective icon



- 5-5-2.** In the Module Hierarchy tab, select **dct**.

- 5-5-3.** Select the **Resource Profile** tab.

**5-5-4.** Expand the **Memories** and **Expressions** entries.



**Figure 6-43: Resource Profile After Partitioning Buffers**

Observe that the most of the resources are consumed by instances. The *buf\_2d\_in* array is partitioned into multiple memories and most of the operations are done in addition and comparison.

**5-5-5.** Switch to Synthesis perspective.

## Applying the DATAFLOW Directive

## Step 6

In this step, you will apply the DATAFLOW directive to improve the throughput.

### 6-1. Create a new solution by copying the previous solution (*solution4*) settings.

- 6-1-1. Select **Project > New Solution** or click the **New Solution** icon (  ) from the toolbar.
- 6-1-2. Leave the options at their default settings (with *solution4* selected).
- 6-1-3. Click **Finish**.

### 6-2. Apply the DATAFLOW directive to the *dct* function.

- 6-2-1. Select **Project > Close Inactive Solution Tabs** to close all inactive solution windows.
- 6-2-2. Ensure that the **dct.c** file is open and active in the Information pane.
- 6-2-3. In the Directive tab, right-click the **dct** function and select **Insert Directive**.
- 6-2-4. Select **DATAFLOW** from the Directive drop-down list to improve the throughput.
- 6-2-5. Click **OK**.

### 6-3. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

### 6-4. View the Synthesis report.

**You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.**

- 6-4-1. Observe that the **dataflow** type pipeline throughput is listed in the Performance Estimates.

| Performance Estimates  |          |           |             |          |
|------------------------|----------|-----------|-------------|----------|
| Timing (ns)            |          |           |             |          |
| Summary                |          |           |             |          |
| Clock                  | Target   | Estimated | Uncertainty |          |
| ap_clk                 | 10.00    | 10.79     | 1.25        |          |
| Latency (clock cycles) |          |           |             |          |
| Summary                |          |           |             |          |
| Latency                | Interval |           |             |          |
| min                    | max      | min       | max         | Type     |
| 507                    | 507      | 374       | 374         | dataflow |

Figure 6-44: Performance Estimates After Applying the DATAFLOW Directive

The Dataflow pipeline throughput indicates the number of clock cycles between each set of inputs reads (interval parameter). If this value is less than the design latency, it indicates that the design can start processing new inputs before the currents input data are output.

Note that the dataflow is only supported for the functions and loops at the top level, not those that are down through the design hierarchy. Only loops and functions exposed at the top level of the design will get benefit from dataflow optimization.

- 6-4-2.** Scroll down in the comparison report to view the resource utilization.

Observe that the number of BRAM\_18K resources required at the top level has increased from three to four.

| Utilization Estimates |          |        |        |       |
|-----------------------|----------|--------|--------|-------|
| Summary               |          |        |        |       |
| Name                  | BRAM_18K | DSP48E | FF     | LUT   |
| DSP                   | -        | -      | -      | -     |
| Expression            | -        | -      | 0      | 1     |
| FIFO                  | -        | -      | -      | -     |
| Instance              | 2        | 8      | 985    | 564   |
| Memory                | 2        | -      | 512    | 32    |
| Multiplexer           | -        | -      | -      | 16    |
| Register              | -        | -      | 8      | -     |
| Total                 | 4        | 8      | 1505   | 613   |
| Available             | 280      | 220    | 106400 | 53200 |
| Utilization (%)       | 1        | 3      | 1      | 1     |

**Figure 6-45: Resource Utilization After DATAFLOW Directive**

- 6-4-3.** In the Console view, notice that **dct\_coeff\_table** is automatically partitioned in dimension 2.

The *buf\_2d\_in* and *col\_inbuf* arrays are partitioned as you had applied the directive in the previous run. The dataflow is applied at the top level, which created channels between the top-level functions *read\_data*, *dct\_2d*, and *write\_data*.

```

INFO: [HLS 200-10] Checking synthesizability ...
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'DCT_Outer_Loop' (dct.c:13) in function 'dct_1d' for pipelining.
INFO: [XFORM 203-501] Unrolling loop 'DCT_Inner_Loop' (dct.c:15) in function 'dct_1d' completely.
INFO: [XFORM 203-102] Partitioning array 'dct_coeff_table' in dimension 2 automatically.
INFO: [XFORM 203-101] Partitioning array 'buf_2d_in' (dct.c:81) in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'col_inbuf' (dct.c:27) in dimension 2 completely.
WARNING: [XFORM 203-710] All the elements of global array 'buf_2d_in[0]' (dct.c:81) should be updated in process function 'read_data'.
WARNING: [XFORM 203-713] All the elements of global array 'buf_2d_in[7]' (dct.c:81) should be updated in process function 'read_data'.
WARNING: [XFORM 203-713] All the elements of global array 'buf_2d_in[0]' (dct.c:81) should be updated in process function 'read_data'.
WARNING: [XFORM 203-713] All the elements of global array 'buf_2d_in[1]' (dct.c:81) should be updated in process function 'read_data'.
WARNING: [XFORM 203-713] All the elements of global array 'buf_2d_in[5]' (dct.c:81) should be updated in process function 'read_data'.
WARNING: [XFORM 203-713] All the elements of global array 'buf_2d_in[2]' (dct.c:81) should be updated in process function 'read_data'.
WARNING: [XFORM 203-713] All the elements of global array 'buf_2d_in[3]' (dct.c:81) should be updated in process function 'read_data'.
WARNING: [XFORM 203-713] All the elements of global array 'buf_2d_in[4]' (dct.c:81) should be updated in process function 'read_data'.
INFO: [XFORM 203-712] Applying dataflow to function 'dct' (dct.c:78), detected/extracted 3 process function(s):
    'read_data'
    'dct_2d'
    'write_data'.
INFO: [XFORM 203-11] Balancing expressions in function 'dct_1d' (dct.c:4)...8 expression(s) balanced.
INFO: [XFORM 203-541] Flattening a loop nest 'WR_Loop_Row' (dct.c:71:67) in function 'write_data'.
INFO: [XFORM 203-541] Flattening a loop nest 'WR_Loop_Row' (dct.c:71:67) in function 'write_data'.

```

Figure 6-46: Synthesis Process After the DATAFLOW Directive

## 6-5. Switch to the Analysis perspective and look at the *dct* performance profile view.

- 6-5-1.** Select **Solution > Open Analysis Perspective** or click the **Analysis** perspective icon



- 6-5-2.** In the Module Hierarchy tab, select **dct\_2d**.

- 6-5-3.** Select the **Performance Profile** tab.



Figure 6-47: Performance Analysis After the DATAFLOW Directive

One of the limitations of the dataflow optimization is that it only works on top-level loops and functions.

One way to have the blocks in *dct\_2d* operate in parallel would be to pipeline the entire function. This however would unroll all the loops and can sometimes lead to a large area increase.

An alternative is to raise these loops up to the top level of hierarchy, where dataflow optimization can be applied, by removing the *dct\_2d* hierarchy, i.e. inline the *dct\_2d* function.

- 6-5-4.** Switch to the Synthesize perspective.

## Applying the INLINE Directive

## Step 7

---

### 7-1. Create a new solution by copying the previous solution (*solution5*) settings.

- 7-1-1. Select **Project > New Solution** or click the **New Solution** icon (  ) from the toolbar.
- 7-1-2. Leave the options at their default settings (with *solution5* selected).
- 7-1-3. Click **Finish**.

### 7-2. Apply the INLINE directive to the *dct\_2d* function.

- 7-2-1. Select **Project > Close Inactive Solution Tabs** to close all inactive solution windows.
- 7-2-2. Ensure that the **dct.c** file is open and active in the Information pane.
- 7-2-3. In the Directive tab, right-click the **dct\_2d** function and select **Insert Directive**.
- 7-2-4. Select **INLINE** from the Directive drop-down list.

The INLINE directive causes the function to which it is applied to be inlined—its hierarchy is dissolved.

- 7-2-5. Click **OK**.

### 7-3. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

### 7-4. View the Synthesis report.

**You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.**

- 7-4-1. Observe that the latency reduced from 507 to 479 clock cycles and the DATAFLOW pipeline throughput drastically reduced from 374 to 106 clock cycles in the Performance Estimates.
- 7-4-2. Examine the synthesis log to see what transformations were applied automatically.

The *dct\_1d* function calls are now automatically inlined into the loops from which they are called, which allows the loop nesting to be flattened automatically.

Note also that the DSP48E usage has doubled (from 8 to 16). This is because previously a single instance of *dct\_1d* was used to do both row and column processing. Now that the row and column loops are executing concurrently, this can no longer be the case and two copies of *dct\_1d* are required. The Vivado HLS tool will seek to minimize the number of clocks, even if it means increasing the area.

Block RAM usage has increased once again (from 4 to 6), due to ping-pong buffering between more dataflow processes.

```
Vivado HLS Console
WARNING: [XFORM 203-713] All the elements of global array 'col_inbuf[6]' (dct.c:27) should be updated in process function 'Loop_Xpose_Col_Outer_Loop_proc'.
INFO: [XFORM 203-712] Applying dataflow to function 'dct' (dct.c:78), detected/extracted 6 process function(s):
  'read_data'
  'Loop_Row_DCT_Loop_proc'
  'Loop_Xpose_Row_Outer_Loop_proc'
  'Loop_Col_DCT_Loop_proc'
  'Loop_Xpose_Col_Outer_Loop_proc'
  'write_data'.
INFO: [XFORM 203-602] Inlining function 'dct_1d' into 'Loop_Row_DCT_Loop_proc' (dct.c:33->dct.c:87) automatically.
INFO: [XFORM 203-602] Inlining function 'dct_1d' into 'Loop_Col_DCT_Loop_proc' (dct.c:44->dct.c:87) automatically.
INFO: [XFORM 203-11] Balancing expressions in function 'Loop_Row_DCT_Loop_proc' (dct.c:13:61)...8 expression(s) balanced.
INFO: [XFORM 203-11] Balancing expressions in function 'Loop_Col_DCT_Loop_proc' (dct.c:13:61)...8 expression(s) balanced.
INFO: [XFORM 203-541] Flattening a loop nest 'WR_Loop_Row' (dct.c:71:67) in function 'write_data'.
INFO: [XFORM 203-541] Flattening a loop nest 'RD_Loop_Row' (dct.c:59:67) in function 'read_data'.
INFO: [XFORM 203-541] Flattening a loop nest 'Xpose_Row_Outer_Loop' (dct.c:38:1) in function 'Loop_Xpose_Row_Outer_Loop_proc'.
INFO: [XFORM 203-541] Flattening a loop nest 'Xpose_Col_Outer_Loop' (dct.c:49:1) in function 'Loop_Xpose_Col_Outer_Loop_proc'.
```

**Figure 6-48: Synthesis Process in the Console Tab After INLINE Directive Applied to the *dct\_2* Function**

## 7-5. Switch to the Analysis perspective and review the module hierarchy.

- 7-5-1. Select **Solution > Open Analysis Perspective** or click the **Analysis** perspective icon



- 7-5-2. In the Module Hierarchy tab, select **dct**.

|  |                                    | BRAM | DSP | FF   | LUT | Latency | Interval | Pipeline type |
|--|------------------------------------|------|-----|------|-----|---------|----------|---------------|
|  | dct                                | 6    | 16  | 2360 | 566 | 479     | 106      | dataflow      |
|  | dct_read_data                      | 0    | 0   | 29   | 55  | 66      | 66       | none          |
|  | dct_Loop_Row_DCT_Loop_proc         | 0    | 8   | 600  | 114 | 105     | 105      | none          |
|  | dct_Loop_Xpose_Row_Outer_Loop_proc | 0    | 0   | 29   | 57  | 66      | 66       | none          |
|  | dct_Loop_Col_DCT_Loop_proc         | 0    | 8   | 600  | 114 | 105     | 105      | none          |
|  | dct_Loop_Xpose_Col_Outer_Loop_proc | 0    | 0   | 30   | 65  | 66      | 66       | none          |
|  | dct_write_data                     | 0    | 0   | 32   | 63  | 66      | 66       | none          |

**Figure 6-49: Performance Analysis After the INLINE Directive**

Now that the *dct\_2d* function is inlined, observe that the *dct\_2d* entry is replaced with the following:

- *dct\_Loop\_Row\_DCT\_Loop\_proc*
- *dct\_Loop\_Xpose\_Row\_Outer\_Loop\_proc*
- *dct\_Loop\_Col\_DCT\_Loop\_proc*
- *dct\_Loop\_Xpose\_Col\_Outer\_Loop\_proc*

Also observe that all the functions are operating in parallel, yielding the top-level function interval (throughput) of 106 clock cycles.

**7-5-3.** Switch to the Synthesize perspective.

## Applying the RESHAPE Directive

## Step 8

The ARRAY\_RESHAPE directive combines ARRAY\_PARTITIONING with the vertical mode of ARRAY\_MAP and is used to reduce the number of block RAM while still allowing the beneficial attributes of partitioning: parallel access to the data.

The ARRAY\_RESHAPE directive allows more data to be accessed in a single clock cycle. In cases where more data can be accessed in a single clock cycle, the Vivado HLS tool may automatically unroll any loops consuming this data, if doing so will improve the throughput. The loop can be fully or partially unrolled to create enough hardware to consume the additional data in a single clock cycle.

### 8-1. Create a new solution by copying the previous solution (*solution6*) settings.

- 8-1-1.** Select **Project > New Solution** or click the **New Solution** icon (  ) from the toolbar.
- 8-1-2.** Leave the options at their default settings (with *solution6* selected).
- 8-1-3.** Click **Finish**.

### 8-2. Apply the ARRAY\_SHAPE directive to the *buf\_2d\_in* array of the *dct* function, the *col\_inbuf* array of the *dct\_2d* function, and the *dct\_coeff\_table* array of the *dct\_1d* function.

- 8-2-1.** Select **Project > Close Inactive Solution Tabs** to close all inactive solution windows.
- 8-2-2.** Ensure that the **dct.c** file is open and active in the Information pane.
- 8-2-3.** In the Directive tab, right-click the **ARRAY\_PARTITION** directive of the *buf\_2d\_in* array of the *dct* function and select **Modify Directive**.
- 8-2-4.** Select **ARRAY\_SHAPE** from the Directive drop-down list.
- 8-2-5.** Select the *type (optional)* as **complete**.
- 8-2-6.** Enter **2** in the *dimension (optional)* field.

**8-2-7.** Click **OK**.

Figure 6-50: Applying the **ARRAY\_reshape** Directive

- 8-2-8.** Similarly, change the **ARRAY\_PARTITION** directive applied to the `col_inbuf` array of the `dct_2d` function in the Directive pane to **ARRAY\_reshape** with a dimension of **2** and type as **complete**.
- 8-2-9.** Assign the **ARRAY\_reshape** directive with dimension of **2** and type as **complete** to the `dct_2coeff_table` array.

The final view of the ARRAY\_reshape directives in the Directive tab should look similar to the figure below.



**Figure 6-51: Final View of ARRAY\_reshape Directives**

### 8-3. Synthesize the Vivado HLS design.

If you do not recall how to perform this task, refer to the "Synthesizing the Vivado HLS Design" section under Vivado HLS Operations in the *Lab Reference Guide*.

### 8-4. View the Synthesis report.

**You should see that the Synthesis report in the Information window opens automatically when synthesis is completed.**

- 8-4-1.** Observe that the latency increased from 479 to 607 clock cycles and the DATAFLOW pipeline throughput increased from 106 to 131 clock cycles in the Performance Estimates.

The block RAM resource utilization increased from 6 to 22.

- 8-4-2.** Examine the synthesis log to see what transformations were applied automatically.

Reviewing the synthesis log will provide some clues. There are warnings in the scheduling phase for *read\_data* stating that  $\text{II}=1$  could not be achieved. In fact, *read\_data* complains about conflicting read and write operations.

The problem here is due to the fact that an update to a single element in a reshaped array requires that the entire word be read, the single element updated, and the entire

word written back. An array that has been reshaped requires a read-modify-write cycle (the Vivado HLS tool does not implement byte-masking on writes).

This operation negatively impacts the maximum write bandwidth for such an array.

Thus it can be seen the directives have to be applied carefully.

- 8-4-3.** Select **File > Exit** to close the Vivado HLS tool.

## Summary

In this lab, you learned various techniques to improve performance and balance resource utilization.

The PIPELINE directive, when applied to an outer loop, will automatically cause the inner loop to unroll. When a loop is unrolled, resource utilization increases as operations are done concurrently. Partitioning memory may improve performance but will increase block RAM utilization.

When the DATAFLOW directive is applied, the default memory buffers (of ping-pong type) are automatically inserted between the top-level functions and loops. When the INLINE directive is applied to a function, the lower level hierarchy is automatically dissolved.

The RESHAPE directive will allow multiple accesses to block RAM. However, care should be taken if a single element requires modification as it will result in a read-modify-write operation for the entire word.

The Analysis perspective and console logs can provide insight on what is going on.

## Answers

1. Looking at the report, fill in the table below.

|                        |      |
|------------------------|------|
| Estimated clock period | 6.38 |
| Worst case latency     | 3959 |
| Number of DSP48E used  | 1    |
| Number of BRAMs used   | 5    |
| Number of FFs used     | 278  |
| Number of LUTs used    | 353  |

### Performance and Utilization Estimates



# Lab 7: HLx Flow – System Integration

2016.1

## Abstract

This lab introduces the HLx design flow for generating IP from the Vivado® HLS tool and using the generated IP to create a subsystem with the MicroBlaze™ processor using the Vivado IP integrator.

This lab should take approximately 75 minutes.

## Objectives

After completing this lab, you will be able to:

- Describe the HLx design flow
- Identify the steps and directives involved in creating an IP within the Vivado HLS tool
- Create a system with the MicroBlaze processor and IP (fir filter) created with the Vivado HLS tool using the IP integrator flow
- Use software and run it on the created subsystem

## Introduction

In this lab, you will use a C design for an **11 tap FIR (Finite Impulse Response) filter**. This lab requires you to develop an IP of the designed filter using the Vivado HLS tool. Using the IP integrator flow, you will create a system with the MicroBlaze processor and the IP created in the HLS tool. In the Software Development Kit (SDK), you will use the test application provided to write into and read from the IP interface registers.

## General Flow



## Creating a Vivado HLS Tool Project

**Step 1**

In this step, you will create a new Vivado HLS project to implement a FIR filter.

There are a number of ways to launch the Vivado HLS tool. The two most popular mechanisms are shown here.

### 1-1. Launch the Vivado HLS tool.

- 1-1-1. Select **Start > All Programs > Xilinx Design Tools > Vivado 2016.1 > Vivado HLS > Vivado HLS 2016.1**.



Figure 7-1: Launching the Vivado HLS Tool

-- OR --

Double-click the **Vivado HLS** shortcut icon (  ) on the desktop.

The Vivado HLS tool opens to the Welcome window. From the Welcome window you can create a new project, open examples, and access documentation and examples.



Figure 7-2: Vivado HLS Welcome Screen

Here you will learn to create a new Vivado HLS project from scratch.

### 1-2. Create a Vivado HLS project named **fir\_prj**.

- 1-2-1. From the Welcome Page, click **Create New Project**.



Figure 7-3: Creating a New Vivado HLS Project

### 1-3. The Project Configuration dialog box asks for a project name and location.

- 1-3-1. Enter **fir\_prj** in the Project name field (1).
- 1-3-2. Enter **C:\training\hls\labs\hlx\_system\_integration\KC705\fir** in the Location field (2).



Figure 7-4: Configuring a New HLS Project

- 1-3-3. Click **Next** (3).

**1-4. The Add/Remove Files dialog box opens. Here you will be invited to add existing files or create new sources.**

**1-4-1. Click Add Files.**

The Open File dialog box opens.

**Note:** If do not have existing files at this moment and you want to create new ones, click **New File**.

**1-4-2. Browse to C:\training\hls\labs\hlx\_system\_integration\KC705\fir.**

**1-4-3. Select fir.c.**

The Vivado HLS tool automatically adds the working directory (project directory) and any directory that contains C files added to the project to the search path. Hence, header files that reside in these directories are automatically included in the project (no need to explicitly specify them). You must specify the path to all other header files (if any) by clicking the Edit CFLAGS button.

**1-4-4. Click Open to add these files.**

Note that you can add compiler directives specific to each entry at this point.

**1-4-5. Click Browse next to the Top Function field.**

The Select Top function dialog box opens, which lists all the functions available from the specified source files.

**1-4-6. Select fir (fir.c) from the list.**

**Note:** You can also manually enter the name of the top function in the Top Function field.



**Figure 7-5: Adding Files to a New Vivado HLS Project**

**1-4-7. Click Next.**

**1-5. Add any existing testbench files.**

If you have (or want) any testbench files they can be entered here.  
Sometimes the testbench is built into the synthesizable file.

- 1-5-1.** Click **Add Files**.

- 1-5-2.** Navigate to *C:\training\hls\labs\hlx\_system\_integration\KC705\fir*.

- 1-5-3.** Select **fir\_test.c, out.gold.dat, and out.gold.8.dat**.

- 1-5-4.** Click **Open** to add these files.

- 1-5-5.** Click **Next**.

**1-6. Finally, it is time to specify some of the physical parameters of the design.**

- 1-6-1.** By default, **solution1** is populated in the Solution Name field.

No changes are required.

- 1-6-2.** Set the clock period to **10**.

You can leave the Uncertainty field blank.

- 1-6-3.** Click the **Browse** button to select a part or board.



Figure 7-6: Locating the Board Browse Button

- 1-6-4.** Click **Boards** as shown below.  
**1-6-5.** Enter **kintex** in the Search field.



**Figure 7-7: Filtering to Quickly Locate Target Platforms**

- 1-6-6.** Select **Kintex-7 KC705 Evaluation Platform** from the search list.  
**1-6-7.** Click **OK** to select the board.  
**1-6-8.** Click **Finish**.  
**1-6-9.** You will see the created project in the Explorer tab.

## 1-7. Explore the design.

- 1-7-1.** Expand the various sub-folders to see the entries under each sub-folder.



**Figure 7-8: Newly Created Vivado HLS Tool Project**

**1-7-2.** In the Explorer tab, expand the **Source** folder.

**1-7-3.** Double-click **fir.c**.

This will open the *fir.c* file in the Information pane.



```
#include "fir.h"

void fir (
    data_t *y,
    data_t x
) {
    const coef_t c[N] = {0,-10,-9,23,56,63,56,23,-9,-10,0};
    static data_t shift_reg[N];
    acc_t acc;
    int i;

    acc=0;
    Shift_Accum_Loop: for (i=N-1;i>=0;i--) {
        if (i==0) {
            acc+=x*c[0];
            shift_reg[0]=x;
        } else {
            shift_reg[i]=shift_reg[i-1];
            acc+=shift_reg[i]*c[i];
        }
    }
    *y=acc;
}
```

**Figure 7-9: Design Under Consideration**

The FIR filter expects *x* as a sample input and a pointer to the computed sample out. Both of these are defined as having a data type, *data\_t*. The coefficients are loaded in array *c* of type *coef\_t* through initialization. The sequential algorithm is applied and the accumulated value is computed out.

**1-7-4.** In the Explorer tab, expand the **Includes > C:/training/hls/labs/hlx\_system\_integration/KC705/fir** folder.

**1-7-5.** Double-click **fir.h** to open it.

This header file has the constant and type declarations associated with the *fir.c* design.

**1-7-6.** Similarly, open the **fir\_test.c** file located in the *fir.prj > Test Bench* folder to explore its contents.

## Running C Simulation

## Step 2

In this step, you will validate the FIR filter design with the provided C testbench.

### 2-1. Simulate the Vivado HLS tool design.

- 2-1-1. Select **Project > Run C Simulation** or click the **Run C Simulation** icon (  ).



Figure 7-10: Launching the C Simulation

The Run C Simulation dialog box opens.



Figure 7-11: C Simulation Dialog Box

Each of the options controls how simulation is run:

- Launch Debugger: After compilation the debug perspective automatically opens for you to step through the code.
- Build Only: Compile the object/netlist files but do not execute.
- Clean Build: Remove previously compiled files before compiling.

- Optimizing Compile: Use the gcc/g++ -O option (no debug info and this is mutually exclusive with debug options; this may run faster, but the difference is not substantial).

**2-1-2.** Select **the default options (i.e. select nothing)**.

**2-1-3.** Click **OK**.

The simulation log will be displayed in the editor pane.

## **2-2. View the simulation report.**

**The information generated by the Vivado HLS tool can be found in two places, both described here.**

**The first is the Console window, which reports not only the output produced by the code being simulated, but all of the simulation engine messages as well. The simulation log provides only a few simulation engine messages and the simulated code output.**

**2-2-1.** Select the **Console** tab in the lower portion of the tool's GUI.

You may need to scroll to view all the output produced by the simulation.



The screenshot shows the Vivado HLS Console window. The title bar says "Console" and has tabs for "Errors" and "Warnings". The main area displays the following text:

```
Compiling(apcc) ../../../../dct.c in debug mode
INFO: [HLS 200-10] Running 'c:/Xilinx/Vivado_HLS/2016.2/bin/unwrapped/win64.o/apcc.exe'
INFO: [HLS 200-10] For user 'jayeshk' on host 'Windows NT_amd64 version 6.1' on Thu Jun 16 1:
INFO: [HLS 200-10] In directory 'C:/training/hls/labs/hls_tool_flow/zc702/dct/dct_prj/solution1/csims/build'
INFO: [APCC 202-3] Tmp directory is apcc_db
INFO: [APCC 202-1] APCC is done.
Generating csim.exe
*** *** ***
Results are good
*** *** ***
INFO: [SIM 211-1] CSim done with 0 errors.
Finished C simulation.
```

**Figure 7-12: Example Output After a Simulation**

The other location, described below, provides only a few simulation engine messages and the simulated code output. Typically this is opened after the simulation completes; however, if you need to access it after closing the log pane, here's how to access the simulation report.

**2-2-2.** Expand **fir\_prj > solution1 > csim > report** in the Explorer pane.

- 2-2-3.** Double-click the log file name to open it in the editor pane.



Figure 7-13: Locating the Simulation Log File

If you do not see the above message, ask for help from your instructor.

This testbench is a self-checking testbench; that is, the computed output is compared against a reference golden output and returns whether the test passed or failed.

## Setting Up the AXI Lite Adapters and Synthesizing the Design Step 3

In this step, you will apply the Resource directive to the FIR interface signals to generate an AXI Lite interface for the filter. You will also perform synthesis on the design.

### 3-1. Add the Resource directive to create the AXILiteS adapters.

- 3-1-1.** Select the **fir.c** tab in the Editor window (1).
- 3-1-2.** In the Auxiliary pane, select the **Directive** tab (2).



Figure 7-14: Selecting the Directive Tab

- 3-1-3.** In the Directive tab, right-click the variable **x** and select **Insert Directive**.



Figure 7-15: Inserting a Directive for the Variable x

The Vivado HLS Directive Editor dialog box opens.

- 3-1-4.** Select **INTERFACE** from the Directive drop-down list (1).
- 3-1-5.** Select **s\_axilite** from the mode (optional) drop-down list (2).
- 3-1-6.** Enter **fir\_io** in the bundle (optional) field to associate the input and output into one s\_axilite interface called *fir\_io* (3).



Figure 7-16: Selecting the s\_axilite Interface Directive

- 3-1-7.** Click **OK** (4).

- 3-1-8. Repeat the previous steps to apply the INTERFACE directive to the **y** output.



Figure 7-17: Inserting the Directive for the Variable y

When the design is synthesized, this will generate the ap\_start signal.

- 3-1-9. Repeat the previous steps to apply the INTERFACE directive to the top-level module **fir** in the Directive tab.



Figure 7-18: Inserting the Directive for the Top Module FIR

This will create address maps for the x, y, ap\_start, ap\_valid, ap\_done, and ap\_idle signals, which can be accessed via software.

Alternatively, the ap\_start, ap\_valid, ap\_done, and ap\_idle signals can be generated as separate ports on the core if you do not apply this directive. These ports will then have to be connected in a processing system using the GPIO IP core.

### 3-2. Synthesize the design.

- 3-2-1. Select **Solution > Run C Synthesis > Active Solution** or click the **Run Synthesis** icon in the menu bar.



Figure 7-19: Launching Synthesis

This option synthesizes the currently selected solution.

All solutions (or selected solutions) can be synthesized by using the drop-down menu next to the synthesis icon. You can synthesize all solutions or synthesize selected solutions in addition to the default.



Figure 7-20: Options for What to Synthesize

When synthesis completes, the Synthesis report will be displayed in the Information pane.

- 3-2-2. Review the Synthesis report.  
3-2-3. Go to the Interface summary and view the I/O ports generated by the tool.

## Performing Co-simulation and Exporting the Design as an IP Step 4

Now that you have completed high-level synthesis on the C design, you will perform RTL co-simulation on the generated RTL using the C testbench and then export the design as an IP.

### 4-1. Cosimulate the Vivado HLS tool design.

- 4-1-1. Select **Solution > Run C/RTL Cosimulation** or click the **Run C/RTL Cosimulation** icon (checkbox).



Figure 7-21: Launching from the Menu

The Run C/RTL Co-simulation dialog box opens.



Figure 7-22: Co-simulation Dialog Box

Each of the options controls how C/RTL Cosimulation is run:

- RTL Selection: Select the RTL that is simulated (Verilog/VHDL).
- Setup Only: This creates all the files (wrappers, adapters and scripts) required to run the simulation but does not execute the simulator.
- Dump Trace: During RTL verification, the trace files can be saved and viewed using an appropriate viewer. By selecting this option, the trace file will be saved to the `<solution>/sim/<RTL>` folder.
- Optimizing Compile: This ensures a high level of optimization is used to compile the C testbench. Using this option increases the compile time but the simulation executes faster.

**4-1-2.** Select **the default options (i.e. select nothing)**.

**4-1-3.** Click **OK**.

The simulation log will be displayed in the editor pane.

**4-2. View the Cosimulation report.**

**The information generated by the Vivado HLS tool can be found in two places, both described here.**

**The first is the Console window, which reports not only the output produced by the code being simulated, but all of the simulation engine messages as well. The simulation log provides only a few simulation engine messages and the simulated code output.**

**4-2-1.** Select the **Console** tab in the lower portion of the tool's GUI.

You may need to scroll to view all the output produced by the cosimulation.



```

Vivado HLS Console
// Inter-Transaction Progress: completed Transaction / Total Transaction
// Intra-Transaction Progress: Measured Latency / Latency Estimation * 100%
// RTL Simulation : 0 / 601 [0.00%] @ "125000"
// RTL Simulation : 601 / 601 [100.00%] @ "498965000"
$finish called at time : 499005 ns : File "C:/training/
## quit
INFO: [Common 17-206] Exiting xsim at ...
INFO: [COSIM 212-316] Starting C post checking ...
Comparing against output data
*****
PASS: The output matches the golden output!
*****
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
Finished C/RTL cosimulation.
|
```

**Figure 7-23: Example Output After a C/RTL Co-simulation**

The other location, described below, provides only a few simulation engine messages and the simulated code output. Typically this is opened after the simulation completes; however, if you need to access it after closing the log pane, here's how to access the simulation report.

4-2-2. Expand **fir\_prj > solution1 > sim > report** in the Explorer pane.

4-2-3. Double-click the log file name to open it in the editor pane.



Figure 7-24: Locating the Co-Simulation Log File

The Cosimulation Report in HTML format will be displayed in the main viewing area.

The screenshot shows the main LabVIEW workspace with the following content:

**Cosimulation Report for 'dct'**

| RTL     | Status      | Latency    |            |            | Interval   |            |     |
|---------|-------------|------------|------------|------------|------------|------------|-----|
|         |             | min        | avg        | max        | min        | avg        | max |
| VHDL    | NA          | NA         | NA         | NA         | NA         | NA         |     |
| Verilog | <b>Pass</b> | [redacted] | [redacted] | [redacted] | [redacted] | [redacted] |     |

Export the report(.html) using the [Export Wizard](#)

Figure 7-25: Cosimulation Report – HTML

You can quickly verify the cosimulation status here.

You will now export the RTL as an IP core to be used with the top-level design.

#### 4-3. Export the RTL, selecting Verilog as the language.

- 4-3-1. Select **Solution > Export RTL**.

Alternatively, you can click the  toolbar button.

- 4-3-2. Ensure that **IP Catalog** is selected from the Format Selection drop-down list.



Figure 7-26: Export RTL Dialog Box

- 4-3-3. Make certain that the **Evaluate** option is selected and that **Verilog** is selected as the RTL to be evaluated.

This will perform Vivado RTL synthesis and implementation on the generated IP. Implementation is run to evaluate and provide confidence that the RTL will meet its estimated timing and area goals and that these results are not included as part of the exported package.

- 4-3-4. Click **OK** in the Export RTL dialog box.

You can observe the progress in the Console tab. When the run is complete, the Implementation Report is displayed in the Information pane. The final timing of the implemented design has been achieved.

The exported IP is available in the `<solution_directory>\impl\ip` folder.

- 4-3-5.** When implementation completes, expand the **impl\ip** folder under **solution1** in the Explorer tab view and observe the various generated folders and files (IP, hardware files, software drivers, and IP documentation).



Figure 7-27: Exploring the Exported IP under the **impl** Directory

- 4-3-6.** Select **File > Exit** to close the Vivado HLS tool.

## Creating a MicroBlaze Processor System Using the Vivado IPI and Exporting Hardware to SDK

### Step 5

In this step you will create a MicroBlaze processor system using the Vivado IP integrator (IPI), add the IP generated in the Vivado HLS tool, generate the bitstream for the system, and export the design to the SDK tool.

There are a number of ways to launch the Vivado Design Suite. The two most popular mechanisms are shown here.

#### 5-1. Launch the Vivado Design Suite.

**This can be done in two standard ways, use your preferred method.**

- 5-1-1.** Select **Start > All Programs > Xilinx Design Tools > Vivado 2016.1 > Vivado 2016.1.**



**Figure 7-28: Launching the Vivado Design Suite from the Start Menu**

-- OR --

Double-click the **Vivado Design Suite** shortcut icon (

The Vivado Design Suite opens to the Welcome window. From the Welcome window you can create a new project, open an existing project, or enter Tcl commands directly into the Vivado Design Suite as well as access documentation and examples.



**Figure 7-29: Vivado Design Suite Welcome Screen**

This launches the New Project wizard.

"Create New Project" is the starting point for all designs. Projects contains sources, settings, graphics, IP, and other elements that are used to build a final bitstream and analyze a design. The Create New Project Wizard in the Vivado Design Suite allows you to specify HDL and other project resource files that will be included in the project.

## 5-2. Create a new blank Vivado Design Suite project.

- 5-2-1. Click **Create New Project** to begin the process (1).



**Figure 7-30: Creating a New Vivado Design Suite Project**

This will launch the New Project Wizard.

- 5-2-2. Click **Next** to exit the introductory dialog box and begin entering in project-specific information (2).

**5-3. You will now encounter a series of dialog boxes asking you to enter different pieces of information describing the project.**

- 5-3-1.** Enter **system** in the Project name field.  
**5-3-2.** Enter **C:\training\HLx\_Flow\labs\hlx\_system\_integration\KC705\fir** in the Project location field.

Alternatively, you can use the browse feature to navigate to where you want the project to reside.

- 5-3-3.** Deselect the **Create Project Subdirectory** option as leaving this checked will create an unnecessary level of hierarchy for this lab.



**Figure 7-31: Entering the Project Name and Location**

- 5-3-4.** Click **Next** to advance to the next dialog box.

Here you will choose between an RTL project or a post-synthesis project. Simply put, an RTL project enables you to add or create new HDL files and synthesize them, whereas the post-synthesis project requires pre-synthesized files. When an empty design is created, an RTL project is used.

- 5-3-5.** Select **RTL Project** (1).

- 5-3-6.** Select **Do not specify sources at this time** (2), which creates a blank project.

While existing sources could be entered at this time, you will enter them later so that you can move through this portion of the project creation process more quickly.



**Figure 7-32: Selecting Project Type**

- 5-3-7.** Click **Next** to advance to the target device/platform selection (3).
- 5-4.** **Select the target part by first filtering by board and then by family. If you are not using a supported board, you will need to filter by part.**
- 5-4-1.** Select **Boards** from the Select area to filter by board rather than by the specific part (1).
- 5-4-2.** Select **All** from the Vendor drop-down list in the Filter area (2).
- This limits the number of boards seen to those manufactured by the specified vendor.
- 5-4-3.** Select **Kintex-7 KC705 Evaluation Platform** from the board list.
- Alternatively you can select the board directly from the list at any time while in this dialog box.



**Figure 7-33: Selecting the Board for the Project**

- 5-4-4.** Click **Next** to advance to the summary (3).
- A summary of your project is displayed. If you want to change any of the information that you entered, you can do so now by clicking **Back** until you reach the correct dialog

box and making the correction, or you can create the project now and edit the project properties, add or remove files, etc. later.

- 5-4-5.** Click **Finish** to accept these settings and build the project.

Your project is constructed and leaves you in the operational portion of the Vivado Design Suite GUI.

## **5-5. Add the IP generated by the Vivado HLS to tool to the IP catalog.**

- 5-5-1.** Select **Tools > Project Settings**.
- 5-5-2.** Click **IP** in the Project Settings dialog box.
- 5-5-3.** Select the **Repository Manager** tab in the dialog box.
- 5-5-4.** Click **+** to add a repository.
- 5-5-5.** Select the **C:\training\HLx\_Flow\labs\hlx\_system\_integration\KC705\fir\fir\_prj\solution1\impl\ip** directory.
- 5-5-6.** Click **OK** in the Add Repository dialog box.



**Figure 7-34: Adding Vivado HLS Tool IP to the IP Repository**

- 5-5-7.** Click **OK** to add the Fir IP generated in the Vivado HLS tool to the IP catalog.

- 5-5-8. Select **IP Catalog** from the Flow Navigator under *Project Manager*.



Figure 7-35: IP Catalog with Vivado HLS Tool IP

Note that the Vivado HLS tool IP is added to the IP catalog.

- 5-5-9. Click **X** in the IP Catalog tab to close the IP catalog.

The Vivado IP integrator is a graphical tool that assists you in "stitching" together various pieces of IP. This tool can be used for both embedded and non-embedded designs.

### 5-6. Create an IP integrator block diagram.

- 5-6-1. If necessary, expand the **IP Integrator** field under the Flow Navigator (1).
- 5-6-2. Click **Create Block Design** to start creating a new IP subsystem (2).



Figure 7-36: Launching the IP Integrator

- 5-6-3. When the Create Block Design dialog box opens, name the design **microblaze\_design** (1).



Figure 7-37: Creating an IP Integrator Block Design

- 5-6-4. Click **OK** to open a new, blank IP integrator canvas (2).

The IP integrator workspace opens with a note in the canvas area inviting you to begin adding IP.



Figure 7-38: Initial View of the IP Integrator Tool Showing an Informational Message

### 5-7. Add a MicroBlaze processor to the IP integrator canvas.

- 5-7-1. Open the IP catalog in one of two ways:
- o Click the **Add IP** icon on the left border of the workspace.



Figure 7-39: Opening the IP Catalog from the Add IP Icon

-- OR --

- Right-click any background space in the workspace and select **Add IP**.



**Figure 7-40: Opening the IP Catalog from Right-Clicking the Workspace**

**Important Note:** Using Windows > IP Catalog will add the new IP to the top level of hierarchy of the design—it will NOT add the IP to the diagram! You can, however, float the Windows > IP Catalog and drag-and-drop from the catalog onto the canvas of the block design

Once the IP catalog opens, you can search for the processor block.



**Figure 7-41: Two Mechanisms to Search for IP**

- 5-7-2.** Enter part of the processor name into the Search field to narrow the search parameters (1).

For example, **zynq** for the Zynq All Programmable SoC or **microblaze** for the MicroBlaze processor. Note that the Zynq AP SoC PS IP will only appear when Zynq-7000 AP SoCs or boards with these parts have been selected for the design.



Figure 7-42: Narrowing Search Parameters (Zynq AP SoC)



Figure 7-43: Narrowing Search Parameters (MicroBlaze Processor)

- 5-7-3. Double-click the **MicroBlaze** processor name entry to add its IP block to the design (2).



Figure 7-44: Zynq Processing System IP



Figure 7-45: MicroBlaze Processor IP

- 5-8. Configure the MicroBlaze processor to include the following options:

- Local Memory: 16KB
- Local Memory ECC: None
- Cache Configuration: 16KB
- Debug Module: Debug Only
- Peripheral AXI port: Enabled
- Interrupt controller: Enabled
- Clock Connection: New Clocking Wizard (100 MHz)

- 5-8-1. Click **Run Block Automation**.

**5-8-2.** Configure the options according to the figure below.



**Figure 7-46: MicroBlaze Processor Options**

**5-8-3.** Click **OK**.

**5-8-4.** Click the **Regenerate Layout** icon (⟳) in the vertical toolbar on the left side of the canvas.

You should see the following block diagram.



**Figure 7-47: MicroBlaze Processor Configuration**

## 5-9. Make the connections to the external clock and reset.

### 5-9-1. Click Run Connection Automation.

5-9-2. Select the interfaces to connect to the external clock and reset as shown in the figure below.



Figure 7-48: Connections to the External Clock and Reset

5-9-3. Click OK.

## 5-10. Add and connect the axi\_uartlite to the design.

5-10-1. Right-click in the Diagram tab window and select **Add IP**.

5-10-2. In the Search field, enter **uart**.

5-10-3. Double-click **AXI Uartlite**.

The AXI Uartlite will appear in the Diagram window.

5-10-4. Click **Run Connection Automation**.

5-10-5. Select the interfaces to connect as shown in the figure below.



Figure 7-49: Connections to the Slave Interface and UART Ports

5-10-6. Click OK.

### 5-11. Configure the AXI Uartlite.

5-11-1. Double-click the **AXI Uartlite** block.

The Re-customize IP window opens.

5-11-2. In the IP Configuration tab, set the baud rate as **115200**.

5-11-3. Click **OK**.

### 5-12. Add the Fir IP (generated from the Vivado HLS tool) to the design.

5-12-1. Right-click the block diagram and select **Add IP**.

5-12-2. In the Search field, enter **Fir**.



Figure 7-50: Searching for the Fir IP in the IP Catalog

5-12-3. Double-click **Fir** to add the Fir IP to the block diagram.

### 5-13. Run Connection Automation.

5-13-1. Click **Run Connection Automation** from the Information bar.



Figure 7-51: Running Connection Automation

**5-13-2.** Select the slave interface to connect as shown below.



Figure 7-52: Connections to the FIR Slave Interface

**5-13-3.** Click **OK** in the Run Connection Automation dialog box.

This will automatically connects the clock, reset and slave interface.

#### 5-14. Connect the FIR interrupt to the interrupt controller to complete the design.

**5-14-1.** Connect the **interrupt** port of the **fir\_0** instance to the **In0** port of the **Concat** instance.

**5-14-2.** Double-click the **Concat** instance.

**5-14-3.** In the Re-customize IP window, change the Number of Ports to **1**.

#### 5-15. Regenerate the layout.

**5-15-1.** Click the **Regenerate Layout** icon ( ) in the vertical toolbar on the left side of the canvas.

The block diagram should resemble the following figure.



Figure 7-53: Completed Design

#### 5-16. Verify that addresses are assigned to the peripherals.

**5-16-1.** Select the **Address Editor** tab.

**5-16-2.** From the vertical toolbar, click **Expand All** (-expand icon).

The addresses are automatically assigned to the peripherals when connection automation was run.



The screenshot shows the Address Editor window with the following data:

| Cell                                         | Slave Interface | Base Name | Offset Address | Range | High Address |
|----------------------------------------------|-----------------|-----------|----------------|-------|--------------|
| microblaze_0                                 |                 |           |                |       |              |
| Data (32 address bits : 4G)                  |                 |           |                |       |              |
| microblaze_0_local_memory/dlmb_bram_if_cntlr | SLMB            | Mem       | 0x0000_0000    | 16K   | 0x0000_3FFF  |
| microblaze_0_axi_intc                        | s_axi           | Reg       | 0x4120_0000    | 64K   | 0x4120_FFFF  |
| axi_uartlite_0                               | S_AXI           | Reg       | 0x4060_0000    | 64K   | 0x4060_FFFF  |
| fir_0                                        | s_axi_fir_io    | Reg       | 0x44A0_0000    | 64K   | 0x44A0_FFFF  |
| Instruction (32 address bits : 4G)           |                 |           |                |       |              |
| microblaze_0_local_memory/ilmb_bram_if_cntlr | SLMB            | Mem       | 0x0000_0000    | 16K   | 0x0000_3FFF  |

**Figure 7-54: Performance estimate after applying DATAFLOW directive**

**5-16-3.** Click the **Validate Design** (checkmark icon) vertical toolbar button in the Diagram tab.

You should see a validation successful dialog box. If you see any errors in the design, ask for help from your instructor.

**5-16-4.** Click **OK** in the Validate Design dialog box.

**5-16-5.** Select **File > Save Block Design** to save the design.

## 5-17. Generate the bitstream for the design.

**5-17-1.** In the Sources window, right-click **microblaze\_system (microblaze\_system.bd)** and select **Create HDL Wrapper**.

**5-17-2.** Accept the default settings and click **OK** in the Create HDL Wrapper dialog box.

**5-17-3.** In the Flow Navigator, click **Generate Bitstream** under Program and Debug.

**5-17-4.** Click **Yes** in the No Implementation Results Available dialog box if it appears.

Generating the bitstream will take approximately five minutes.

**5-17-5.** Once the bitstream generation completed dialog box appears, select **Open Implemented Design** and click **OK**.

**5-17-6.** Select **File > Export > Export Hardware**.

**5-17-7.** In the Export Hardware for SDK dialog box, select the **Include bitstream** option.



Figure 7-55: Exporting Hardware to SDK

**5-17-8.** Click **OK**.

This exports the hardware to the path that is local to the project.

**5-17-9.** Select **File > Launch SDK**.



Figure 7-56: Launch SDK Dialog Box

**5-17-10.** Accept the default settings and click **OK**.

This will launch the SDK tool.

## Creating an Application in SDK

## Step 6

In this step you will create an application in the SDK environment to test the FIR functionality.

### 6-1. Add the Fir IP drivers to the SDK repository.

6-1-1. Select **Xilinx Tools > Repositories**.

6-1-2. Click **New** in the Local Repositories section.

6-1-3. Browse to the *C:\training\hls\labs\hlx\_system\_integration\KC705\fir\fir\_prj\solution1\impl\ip* directory.



Figure 7-57: Adding HLS IP Drivers to the SDK Repository

6-1-4. Click **OK**.

## 6-2. Create an application in SDK.

- 6-2-1. Select **File > New > Application Project**.
- 6-2-2. Enter **fir\_test\_application** in the Project name field.
- 6-2-3. Use the default name for the Board Support Package name, **fir\_test\_application\_bsp**.
- 6-2-4. Click **Next**.
- 6-2-5. Select **Empty Application** from the Available Templates section.
- 6-2-6. Click **Finish**.

Wait until the building workspace process completes.

## 6-3. Import the provided test application source into the SDK application project.

- 6-3-1. In the Project Explorer tab, expand **fir\_test\_application** (1).
- 6-3-2. Right-click the **src** folder and select **Import** (2).



Figure 7-58: Importing a Resource File

**6-3-3.** Expand **General** and select **File System**.**Figure 7-59: Selecting File System****6-3-4.** Click **Next**.**6-3-5.** In the From directory field, click **Browse**.**6-3-6.** Browse to the *C:\training\hls\support* directory.**6-3-7.** Click **OK**.**6-3-8.** Select the **fir\_test\_application.c** file from the support folder.**Figure 7-60: Selecting the Application Source File(s)****6-3-9.** Click **Finish**.

After importing *fir\_test\_application.c*, SDK will automatically build the project and generate the corresponding ELF (Executable and Load Format) file.

**6-3-10.** In the Project Explorer tab, expand **fir\_test\_application > src**.

**6-3-11.** Double-click **fir\_test\_application.c** to review the provided test program.

The test application will apply an impulse sequence, read the transfer function of the filter from the output, and compare against the reference and display whether the test has passed or failed.

## Verifying the Design in Hardware

## Step 7

You will now configure the FPGA, download the application, and test the program for proper operation.

Set up and connect the hardware evaluation board, or verify that this has properly been done before turning on the power.

### 7-1. Connect the board to your machine as shown below.

- 7-1-1.** Connect a USB cable from a USB port on your computer to the USB UART connector on the evaluation board.
- 7-1-2.** Similarly, connect the USB cable to the Digilent USB JTAG interface.
- 7-1-3.** Ensure that the power cord is plugged in and turn on the evaluation board.
- 7-1-4.** Make sure that the board settings are proper.
- 7-1-5.** Ensure that all DIP switches (SW11) are in the off position.



**Figure 7-61: KC705 Evaluation Board**

You may be prompted to install drivers when the board is first connected. Do not allow the driver installation to search the Web, but allow it to search for the drivers on your computer.

## 7-2. Determine which COM to use to access the USB serial port on the KC705.

- 7-2-1.** Make sure that the development board is powered on and the serial UART device USB cable is in place.

This ensures that the USB-to-serial bridge will be enumerated by the PC host.

- 7-2-2.** Select **Start > Devices and Printers**.



Figure 7-62: Opening the Device and Printers Panel

- 7-2-3.** Locate **Silicon Labs CP210x USB to UART Bridge (COM#)**.



Figure 7-63: Locating the Silicon Labs Com Port Driver

The # indicates the port number for this serial connection.

- 7-2-4.** Close the **Devices and Printers** window by clicking the red 'X' in the upper right of the window.

**7-3. Set up the COM port in the SDK terminal settings.**

- 7-3-1.** In the SDK Terminal tab, click the **Connect to Serial Port** icon to configure the serial port settings as shown in the following figure.



Figure 7-64: Connecting to the Serial Port

- 7-3-2.** In the **Connect to serial port** window, configure the settings as shown in the following figure.



Figure 7-65: Serial Port Settings

- 7-3-3.** Click **OK**.

## 7-4. Program the hardware and run the software application.

### 7-4-1. Select Xilinx Tools > Program FPGA.



Figure 7-66: Programming the Device

### 7-4-2. Click **Program**.

### 7-4-3. In the Project Explorer tab, right-click **fir\_test\_application** and select **Run as > Run Configurations**.

### 7-4-4. Double-click **Xilinx C/C++ application (System Debugger)**.

### 7-4-5. Ensure that **fir\_test\_application Debug** is selected as shown below (1).



Figure 7-67: Verifying the Run Configuration

### 7-4-6. Click **Run** to run the SDK application on the hardware (2).

- 7-4-7.** Select the **SDK Terminal** tab to view the result.



**Figure 7-68:** Viewing in Terminal after Downloading the Program

- 7-4-8.** Switch off the board.

- 7-4-9.** Close the SDK and Vivado tools.

## Summary

In this lab, you developed an IP from the C design of a FIR filter using the Vivado HLS tool. Using the IP integrator, you created a system with the MicroBlaze processor and the IP. You then used a test application to write into and read from the IP interface registers.

## Answers

Since there were no questions in this lab, this section is intentionally left blank.