



## Introduction to High-Level Synthesis with Vivado HLS

Vivado HLS 2013.3 Version

This material exempt per Department of Commerce license exception TSU

© Copyright 2013 Xilinx

### Objectives

► After completing this module, you will be able to:

- Describe the high level synthesis flow
- Understand the control and datapath extraction
- Describe scheduling and binding phases of the HLS flow
- List the priorities of directives set by Vivado HLS
- List comprehensive language support in Vivado HLS
- Identify steps involved in validation and verification flows

## Outline

- **Introduction to High-Level Synthesis**
- **High-Level Synthesis with Vivado HLS**
- **Language Support**
- **Validation Flow**
- **Summary**

## High-Level Synthesis: HLS

### ➤ High-Level Synthesis

- Creates an RTL implementation from C level source code
- Extracts control and dataflow from the source code
- Implements the design based on defaults and user applied directives

### ➤ Many implementation are possible from the same source description

- Smaller designs, faster designs, optimal designs
- Enables design exploration



## Design Exploration with Directives



## Introduction to High-Level Synthesis

### ➤ How is hardware extracted from C code?

- Control and datapath can be extracted from C code at the top level
- The same principles used in the example can be applied to sub-functions
  - At some point in the top-level control flow, control is passed to a sub-function
  - Sub-function may be implemented to execute concurrently with the top-level and or other sub-functions

### ➤ How is this control and dataflow turned into a hardware design?

- Vivado HLS maps this to hardware through scheduling and binding processes

### ➤ How is my design created?

- How functions, loops, arrays and IO ports are mapped?

## HLS: Control Extraction

### Code

```
void fir(
    data_t*y,
    coef_t*c[4],
    data_tx
) {
    static data_t shift_reg[4];
    acc_t acc;
    int i;

    acc=0;
    loop: for (i=3;i>=0;i--) {
        if (i==0) {
            acc+=x*c[0];
            shift_reg[0]=x;
        } else {
            shift_reg[i]=shift_reg[i-1];
            acc+=shift_reg[i]*c[i];
        }
    }
    *y=acc;
}
```

From any C code example ..

Function Start

For-Loop Start

For-Loop End

Function End

### Control Behavior

Finite State Machine (FSM)  
states



The loops in the C code correlated to states  
of behavior

This behavior is extracted into a hardware  
state machine

Intro to HLS 11- 7

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## HLS: Control & Datapath Extraction

### Code

```
void fir(
    data_t*y,
    coef_t*c[4],
    data_tx
) {
    static data_t shift_reg[4];
    acc_t acc;
    int i;

    acc=0;
    loop: for (i=3;i>=0;i--) {
        if (i==0) {
            acc+=x*c[0];
            shift_reg[0]=x;
        } else {
            shift_reg[i]=shift_reg[i-1];
            acc+=shift_reg[i]*c[i];
        }
    }
    *y=acc;
}
```

From any C code example ..

### Operations



Operations are  
extracted...

### Control Behavior



The control is  
known

### Control & Datapath Behavior



A unified control dataflow behavior is  
created.

Intro to HLS 11- 8

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

# High-Level Synthesis: Scheduling & Binding

## ► Scheduling & Binding

- Scheduling and Binding are at the heart of HLS

## ► Scheduling determines in which clock cycle an operation will occur

- Takes into account the control, dataflow and user directives
- The allocation of resources can be constrained

## ► Binding determines which library cell is used for each operation

- Takes into account component delays, user directives



Intro to HLS 11- 9

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

# Scheduling

## ► The operations in the control flow graph are mapped into clock cycles

```
void foo ()  
...  
t1 = a * b;  
t2 = c + t1;  
t3 = d * t2;  
out = t3 - e;  
}
```



Schedule 1



## ► The technology and user constraints impact the schedule

- A faster technology (or slower clock) may allow more operations to occur in the same clock cycle

Schedule 2



## ► The code also impacts the schedule

- Code implications and data dependencies must be obeyed

Intro to HLS 11- 10

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## Binding

### ► Binding is where operations are mapped to cores from the hardware library

- Operators map to cores

### ► Binding Decision: to share

- Given this schedule:



- Binding must use 2 multipliers, since both are in the same cycle
- It can decide to use an adder and subtractor or share one addsub

### ► Binding Decision: or not to share

- Given this schedule:



- Binding may decide to share the multipliers (each is used in a different cycle)
- Or it may decide the cost of sharing (muxing) would impact timing and it may decide *not to share* them
- It may make this same decision in the first example above too

## Outline

### ► Introduction to High-Level Synthesis

### ► High-Level Synthesis with Vivado HLS

### ► Language Support

### ► Validation Flow

### ► Summary

# Understanding Vivado HLS Synthesis

## ► HLS

- Vivado HLS determines in which cycle operations should occur (scheduling)
- Determines which hardware units to use for each operation (binding)
- It performs HLS by :
  - Obeying built-in defaults
  - Obeying user directives & constraints to override defaults
  - Calculating delays and area using the specified technology/device

## ► Understand the priority of directives

1. Meet Performance (clock & throughput)
  - Vivado HLS will allow a local clock path to fail if this is required to meet throughput
  - Often possible the timing can be met after logic synthesis
2. Then minimize latency
3. Then minimize area

# The Key Attributes of C code

```
void fir (
    data_t *y,
    coef_t c[4],
    data_t x
) {
    static data_t shift_reg[4];
    acc_t acc;
    int i;

    acc=0;
    loop: for (i=3;i>=0;i--) {
        if (i==0) {
            acc+=x*c[0];
            shift_reg[0]=x;
        } else {
            shift_reg[i]=shift_reg[i-1];
            acc+=shift_reg[i] * c[i];
        }
    }
    *y=acc;
}
```

**Functions:** All code is made up of functions which represent the design hierarchy: the same in hardware

**Top Level IO:** The arguments of the top-level function determine the hardware RTL interface ports

**Types:** All variables are of a defined type. The type can influence the area and performance

**Loops:** Functions typically contain loops. How these are handled can have a major impact on area and performance

**Arrays:** Arrays are used often in C code. They can influence the device IO and become performance bottlenecks

**Operators:** Operators in the C code may require sharing to control area or specific hardware implementations to meet performance

Let's examine the default synthesis behavior of these ...

## Functions & RTL Hierarchy

### ► Each function is translated into an RTL block

- Verilog module, VHDL entity



- By default, each function is implemented using a common instance
- Functions may be inlined to dissolve their hierarchy
  - Small functions may be automatically inlined

## Types = Operator Bit-sizes

### Code

```

void fir(
    data_t*y,
    coef_t*c[4],
    data_tx
) {
    static data_t shift_reg[4];
    acc_t acc;
    int i;

    acc=0;
    loop: for (i=3;i>=0;i--) {
        if (i==0) {
            acc+=x*c[0];
            shift_reg[0]=x;
        } else {
            shift_reg[i]=shift_reg[i-1];
            acc+=shift_reg[i]*c[i];
        }
    }
    *y=acc;
}
  
```

From any C code example ...

### Operations



Operations are extracted...

### Types

#### Standard C types

|                    |                 |                |
|--------------------|-----------------|----------------|
| long long (64-bit) | short (16-bit)  | unsigned types |
| int (32-bit)       | char (8-bit)    |                |
| float (32-bit)     | double (64-bit) |                |

#### Arbitrary Precision types

|              |                                            |
|--------------|--------------------------------------------|
| C:           | ap(u)int types (1-1024)                    |
| C++:         | ap_(u)int types (1-1024)<br>ap_fixed types |
| C++/SystemC: | sc_(u)int types (1-1024)<br>sc_fixed types |

Can be used to define any variable to be a specific bit-width (e.g. 17-bit, 47-bit etc).

The C types define the size of the hardware used:  
handled automatically

## Loops

### ► By default, loops are rolled

- Each C loop iteration → Implemented in the same state
- Each C loop iteration → Implemented with same resources

```
void foo_top (...) {  
    ...  
    Add: for (i=3;i>=0;i--) {  
        b = a[i] + b;  
    }  
    ...  
}
```

Loops require labels if they are to be referenced by Tcl directives  
(GUI will auto-add labels)

Synthesis



- Loops can be unrolled if their indices are statically determinable at elaboration time
  - Not when the number of iterations is variable
- Unrolled loops result in more elements to schedule but greater operator mobility
  - Let's look at an example ....

## Data Dependencies: Good

```
void fir (  
...  
acc=0;  
loop: for (i=3;i>=0;i--) {  
    if (i==0) {  
        acc+=x*c[0];  
        shift_reg[0]=x;  
    } else {  
        shift_reg[i]=shift_reg[i-1];  
        acc+=shift_reg[i]*c[i];  
    }  
}  
y=acc;
```

Default Schedule



### ► Example of good mobility

- The read on data port X can occur anywhere from the start to iteration 4
  - The only constraint on RDx is that it occur before the final multiplication
- Vivado HLS has a lot of freedom with this operation
  - It waits until the read is required, saving a register
  - There are no advantages to reading any earlier (unless you want it registered)
  - Input reads can be optionally registered
- The final multiplication is very constrained...

## Data Dependencies: Bad

```
void fir (           Default Schedule
...
acc=0;
loop: for (i=3;i>=0;i--) {
    if (i==0) {
        acc+=x*c[0];
        shift_reg[0]=x;
    } else {
        shift_reg[i]=shift_reg[i-1];
        acc+=shift_reg[i]*c[i];
    }
}
*y=acc;
}
```



## ► Example of bad mobility

- The final multiplication must occur before the read and final addition
    - It could occur in the same cycle if timing allows
  - Loops are rolled by default
    - Each iteration cannot start till the previous iteration completes
    - The final multiplication (in iteration 4) must wait for earlier iterations to complete
  - The structure of the code is forcing a particular schedule
    - There is little mobility for most operations
  - Optimizations allow loops to be unrolled giving greater freedom

Intro to HLS 11- 19

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## Schedule after Loop Optimization

### ➤ With the loop unrolled (completely)

- The dependency on loop iterations is gone
  - Operations can now occur in parallel
    - If data dependencies allow
    - If operator timing allows
  - Design finished faster but uses more operators
    - 2 multipliers & 2 Adders



```

void fir (
...
acc=0;
loop: for (i=3;i>=0;i--) {
  if (i==0) {
    acc+=X*c[0];
    shift_reg[0]=x;
  } else {
    shift_reg[i]=shift_reg[i];
    acc+=shift_reg[i]*c[i];
  }
}
*y=acc;

```

## Schedule Summary

- All the logic associated with the loop counters and index checking are now gone
  - Two multiplications can occur at the same time
    - All 4 could, but it's limited by the number of input reads (2) on coefficient port C
  - Why 2 reads on port C?
    - The default behavior for arrays now limits the schedule ...

Intro to HIS 11-20

© Copyright 2013 Xilipx

XILINX > ALL PROGRAMMABLE

## Arrays in HLS

### ► An array in C code is implemented by a memory in the RTL

- By default, arrays are implemented as RAMs, optionally a FIFO

```
void foo_top(int x, ...)  
{  
    int A[N];  
    L1: for (i = 0; i < N; i++)  
        A[i+x] = A[i] + i;  
}
```



Synthesis



### ► The array can be targeted to any memory resource in the library

- The ports (Address, CE active high, etc.) and sequential operation (clocks from address to data out) are defined by the library model
- All RAMs are listed in the Vivado HLS Library Guide

### ► Arrays can be merged with other arrays and reconfigured

- To implement them in the same memory or one of different widths & sizes

### ► Arrays can be partitioned into individual elements

- Implemented as smaller RAMs or registers

## Top-Level IO Ports

### ► Top-level function arguments

- All top-level function arguments have a default hardware port type

### ► When the array is an argument of the top-level function

- The array/RAM is “off-chip”
- The type of memory resource determines the top-level IO ports
- Arrays on the interface can be mapped & partitioned
  - E.g. partitioned into separate ports for each element in the array

```
void foo_top(int A[3*N], int x)  
{  
    L1: for (i = 0; i < N; i++)  
        A[i+x] = A[i] + i;  
}
```

Synthesis

Number of ports defined by the  
RAM resource



### ► Default RAM resource

- Dual port RAM if performance can be improved otherwise Single Port RAM

## Schedule after an Array Optimization

### With the existing code & defaults

- Port C is a dual port RAM
- Allows 2 reads per clock cycles
  - IO behavior impacts performance

**Note:** It could have performed 2 reads in the original rolled design but there was no advantage since the rolled loop forced a single read per cycle



```
loop: for (i=3;i>=0;i--) {  
    if (i==0) {  
        acc+=x*c[0];  
        shift_reg[0]=x;  
    } else {  
        shift_reg[i]=shift_reg[i-1];  
        acc+=shift_reg[i]*c[i];  
    }  
}  
y=acc;
```



### With the C port partitioned into (4) separate ports

- All reads and mults can occur in one cycle
- If the timing allows
  - The additions can also occur in the same cycle
  - The write can be performed in the same cycles
  - Optionally the port reads and writes could be registered

## Operators

### Operator sizes are defined by the type

- The variable type defines the size of the operator

### Vivado HLS will try to minimize the number of operators

- By default Vivado HLS will seek to minimize area after constraints are satisfied

### User can set specific limits & targets for the resources used

- Allocation can be controlled
  - An upper limit can be set on the number of operators or cores allocated for the design: This can be used to force sharing
    - e.g. limit the number of multipliers to 1 will force Vivado HLS to share



Use 1 mult, but take 4 cycle even if it could be done in 1 cycle using 4 mults

- Resources can be specified
  - The cores used to implement each operator can be specified
  - e.g. Implement each multiplier using a 2 stage pipelined core (hardware)



Same 4 mult operations could be done with 2 pipelined mults (with allocation limiting the mults to 2)

## Outline

- **Introduction to High-Level Synthesis**
- **High-Level Synthesis with Vivado HLS**
- **Language Support**
- **Validation Flow**
- **Summary**

## Comprehensive C Support

- **A Complete C Validation & Verification Environment**
  - Vivado HLS supports complete bit-accurate validation of the C model
  - Vivado HLS provides a productive C-RTL co-simulation verification solution
- **Vivado HLS supports C, C++ and SystemC**
  - Functions can be written in any version of C
  - Wide support for coding constructs in all three variants of C
- **Modeling with bit-accuracy**
  - Supports arbitrary precision types for all input languages
  - Allowing the exact bit-widths to be modeled and synthesized
- **Floating point support**
  - Support for the use of float and double in the code
- **Support for OpenCV functions**
  - Enable migration of OpenCV designs into Xilinx FPGA
  - Libraries target real-time full HD video processing

## C, C++ and SystemC Support

### ➤ The vast majority of C, C++ and SystemC is supported

- Provided it is statically defined at **compile time**
- If it's not defined until **run time**, it won't be synthesizable

### ➤ Any of the three variants of C can be used

- If C is used, Vivado HLS expects the file extensions to be .c
- For C++ and SystemC it expects file extensions .cpp

## Outline

- Introduction to High-Level Synthesis
- High-Level Synthesis with Vivado HLS
- Language Support
- Validation Flow
- Summary

## C Validation and RTL Verification

### ► There are two steps to verifying the design

- Pre-synthesis: C **Validation**
  - Validate the algorithm is correct
- Post-synthesis: RTL **Verification**
  - Verify the RTL is correct

### ► C validation

- A **HUGE** reason users want to use HLS
  - Fast, free verification
- Validate the algorithm is correct before synthesis
  - Follow the test bench tips given over

### ► RTL Verification

- Vivado HLS can co-simulate the RTL with the original test bench



## C Function Test Bench

### ► The test bench is the level above the function

- The main() function is above the function to be synthesized

### ► Good Practices

- The test bench should compare the results with golden data
  - Automatically confirms any changes to the C are validated and verifies the RTL is correct
- The test bench should return a 0 if the self-checking is correct
  - Anything but a 0 (zero) will cause RTL verification to issue a FAIL message
  - Function main() should expect an integer return (non-void)

```
int main () {
    int ret=0;
    ...
    ret = system("diff --brief -w output.dat output.golden.dat");
    if (ret != 0) {
        printf("Test failed !!!\n");
        ret=1;
    } else {
        printf("Test passed !\n");
    }
    ...
    return ret;
}
```

## Determine or Create the top-level function

- Determine the top-level function for synthesis
- If there are Multiple functions, they must be merged
  - There can only be 1 top-level function for synthesis

Given a case where functions func\_A and func\_B are to be implemented in FPGA

```
main.c
int main () {
    ...
    func_A(a,b,'i1);
    func_B(c,'i1,'i2);
    func_C('i2,ret)
    ...
    return ret;
}
```

Recommendation is to separate test bench and design files

Re-partition the design to create a new single top-level function inside main()

```
main.c
#include func_AB.h
int main (a,b,c,d) {
    ...
    // func_A(a,b,i1);
    // func_B(c,i1,i2);
    func_AB (a,b,c, 'i1, 'i2);
    func_C('i2,ret)
    ...
    return ret;
}
```

```
func_AB.c
#include func_AB.h
func_AB(a,b,c,'i1, 'i2) {
    ...
    func_A(a,b,'i1);
    func_B(c,'i1,'i2);
    ...
}
```

## Outline

- Introduction to High-Level Synthesis
- High-Level Synthesis with Vivado HLS
- Language Support
- Validation Flow
- Summary

## Summary

### ➤ In HLS

- C becomes RTL
- Operations in the code map to hardware resources
- Understand how constructs such as functions, loops and arrays are synthesized

### ➤ HLS design involves

- Synthesize the initial design
- Analyze to see what limits the performance
  - User directives to change the default behaviors
  - Remove bottlenecks
- Analyze to see what limits the area
  - The types used define the size of operators
  - This can have an impact on what operations can fit in a clock cycle

### ➤ Use directives to shape the initial design to meet performance

- Increase parallelism to improve performance
- Refine bit sizes and sharing to reduce area



## Using Vivado HLS

Vivado HLS 2013.3 Version

This material exempt per Department of Commerce license exception TSU

© Copyright 2013 Xilinx

## Objectives

### ► After completing this module, you will be able to:

- List various OS under which Vivado HLS is supported
- Describe how projects are created and maintained in Vivado HLS
- State various steps involved in using Vivado HLS project creation wizard
- Distinguish between the role of top-level module in testbench and design to be synthesized
- List various verifications which can be done in Vivado HLS
- List Vivado HLS project directory structure

## Outline

- **Invoking Vivado HLS**
- **Project Creation using Vivado HLS**
- **Synthesis to IPXACT Flow**
- **Design Analysis**
- **Other Ways to use Vivado HLS**
- **Summary**

## Invoke Vivado HLS from Windows Menu



## Vivado HLS GUI



XILINX ➤ ALL PROGRAMMABLE.

## Outline

- **Invoking Vivado HLS**
- **Project Creation using Vivado HLS**
- **Synthesis to IPXACT Flow**
- **Design Analysis**
- **Other Ways to use Vivado HLS**
- **Summary**

## Vivado HLS Projects and Solutions

### ► Vivado HLS is project based

- A project specifies the source code which will be synthesized
- Each project is based on one set of source code
- Each project has a user specified name

### ► A project can contain multiple solutions

- Solutions are different implementations of the same code
- Auto-named solution1, solution2, etc.
- Supports user specified names
- Solutions can have different clock frequencies, target technologies, synthesis directives

### ► Projects and solutions are stored in a hierarchical directory structure

- Top-level is the project directory
- The disk directory structure is identical to the structure shown in the GUI project explorer (except for source code location)



## Vivado HLS Step 1: Create or Open a project

### ► Start a new project

- The GUI will start the project wizard to guide you through all the steps



### ► Open an existing project

- All results, reports and directives are automatically saved/remembered
- Use "Recent Project" menu for quick access

## Project Wizard

► The Project Wizard guides users through the steps of opening a new project



## Define Project & Directory

### ► Define the project name

- Note, here the project is given the extension .prj
- A useful way of seeing it's a project (and not just another directory) when browsing

### ► Browse to the location of the project

- In this example, project directory "dct.prj" will be created inside directory "lab1"



## Add Design Source Files

### ➤ Add Design Source Files

- This allows Vivado HLS to determine the top-level design for synthesis, from the test bench & associated files
- Not required for SystemC designs

### ➤ Add Files...

- Select the source code file(s)
- The CTRL and SHIFT keys can be used to add multiple files
- No need to include headers (.h) if they reside in the same directory

### ➤ Select File and Edit CFLAGS...

- If required, specify C compile arguments using the “Edit CFLAGS...”
  - Define macros:-DVERSION1
  - Location of any (header) files not in the same directory as the source: -I./include



Using Vivado HLS 12 - 11

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## Specify Test Bench Files

### ➤ Use “Add Files” to include the test bench

- Vivado HLS will re-use these to verify the RTL using co-simulation

### ➤ And all files referenced by the test bench

- The RTL simulation will be executed in a different directory (Ensures the original results are not over-written)
- Vivado HLS needs to also copy any files accessed by the test bench
- Input data and output results (\*.dat) are shown in this example

### ➤ Add Folders

- If the test bench uses relative paths like “sub\_directory/my\_file.dat” you can add “sub\_directory” as a folder/directory

### ➤ Use “Edit CFLAGS...”

- To add any C compile flags required for compilation



Using Vivado HLS 12 - 12

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## Test benches I

### ► The test bench should be in a separate file

### ► Or excluded from synthesis

- The Macro `__SYNTHESIS__` can be used to isolate code which will not be synthesized
  - This macro is defined when Vivado HLS parses any code (`-D__SYNTHESIS__`)

```
// test.c
#include <stdio.h>
void test (int d[10]) {
    int acc = 0;
    int i;
    for (i=0;i<10;i++) {
        acc += d[i];
        d[i] = acc;
    }
}
#ifndef __SYNTHESIS__
int main () {
    int d[10];
    for (i=0;i<10;i++) {
        d[i] = i;
    }
    test(d);
    for (i=0;i<10;i++) {
        printf("%d %d\n", i, d[i]);
    }
    return 0;
}
#endif
```

Design to be synthesized

Test Bench  
Nothing in this ifndef will be read by Vivado HLS (will be read by gcc)

Using Vivado HLS 12 - 13

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## Test benches II

### ► Ideal test bench

- Should be self checking
  - RTL verification will re-use the C test bench
- If the test bench is self-checking
  - Allows RTL Verification to be run without a requirement to check the results again
- RTL verification “passes” if the test bench return value is 0 (zero)
  - Actively return a 0 if the simulation passes

```
int main () {
    // Compare results
    int ret = system("diff -brief -w test_data/output.dat test_data/output.golden.dat");
    if (ret != 0) {
        printf("Test failed !!!\n", ret); return 1;
    } else {
        printf("Test passed !\n", ret); return 0;
    }
}
```

The `-w` option ensures the “newline” does not cause a difference between Windows and Linux files

- Non-synthesizable constructs may be added to a synthesize function if `__SYNTHESIS__` is used

```
#ifndef __SYNTHESIS__
image_t *yuv = ((image_t *)malloc(sizeof(image_t)));
#else // Workaround malloc() calls w/o changing rest of code
    image_t _yuv;
#endif
```

Using Vivado HLS 12 - 14

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## Solution Configuration

### ► Provide a solution name

- Default is solution1, then solution2 etc.

### ► Specify the clock

- The clock uncertainty is subtracted from the clock to provide an “effective clock period”
- Vivado HLS uses the “effective clock period” for Synthesis
- Provides users defined margin for downstream RTL synthesis, P&R

### ► Select the part

- Select a device family after applying filters such as family, package and speed grade (see next slide)



Using Vivado HLS 12 - 15

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## Selecting Part and Implementation Engine

### ► Select the target part either through Parts or Boards specify

### ► Select RTL Tools

- Auto
  - Will select Vivado for 7 Series and Zynq devices
  - Will select ISE for Virtex-6 and earlier families
- Vivado
- ISE
  - ISE Design Suite must be installed and must be included in the PATH variable



| Device Selection Dialog                                                                                                                                                                                                                                                                                                                               |         |                       |                  |        |        |         |       |                 |      |        |    |                 |      |        |    |                 |      |        |    |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|-----------------------|------------------|--------|--------|---------|-------|-----------------|------|--------|----|-----------------|------|--------|----|-----------------|------|--------|----|
| RTL Tool                                                                                                                                                                                                                                                                                                                                              | Specify | Filter                |                  |        |        |         |       |                 |      |        |    |                 |      |        |    |                 |      |        |    |
| Auto                                                                                                                                                                                                                                                                                                                                                  | Parts   | Product Category: Any | Package: clg484  |        |        |         |       |                 |      |        |    |                 |      |        |    |                 |      |        |    |
|                                                                                                                                                                                                                                                                                                                                                       | Boards  | Family: zynq          | Speed Grade: Any |        |        |         |       |                 |      |        |    |                 |      |        |    |                 |      |        |    |
| Sub-Family: zynq                                                                                                                                                                                                                                                                                                                                      |         |                       |                  |        |        |         |       |                 |      |        |    |                 |      |        |    |                 |      |        |    |
| Search: ▾                                                                                                                                                                                                                                                                                                                                             |         |                       |                  |        |        |         |       |                 |      |        |    |                 |      |        |    |                 |      |        |    |
| <table border="1"><thead><tr><th>Device</th><th>Family</th><th>Package</th><th>Speed</th></tr></thead><tbody><tr><td>xc7z020clg484-1</td><td>zynq</td><td>clg484</td><td>-1</td></tr><tr><td>xc7z020clg484-2</td><td>zynq</td><td>clg484</td><td>-2</td></tr><tr><td>xc7z020clg484-3</td><td>zynq</td><td>clg484</td><td>-3</td></tr></tbody></table> |         |                       |                  | Device | Family | Package | Speed | xc7z020clg484-1 | zynq | clg484 | -1 | xc7z020clg484-2 | zynq | clg484 | -2 | xc7z020clg484-3 | zynq | clg484 | -3 |
| Device                                                                                                                                                                                                                                                                                                                                                | Family  | Package               | Speed            |        |        |         |       |                 |      |        |    |                 |      |        |    |                 |      |        |    |
| xc7z020clg484-1                                                                                                                                                                                                                                                                                                                                       | zynq    | clg484                | -1               |        |        |         |       |                 |      |        |    |                 |      |        |    |                 |      |        |    |
| xc7z020clg484-2                                                                                                                                                                                                                                                                                                                                       | zynq    | clg484                | -2               |        |        |         |       |                 |      |        |    |                 |      |        |    |                 |      |        |    |
| xc7z020clg484-3                                                                                                                                                                                                                                                                                                                                       | zynq    | clg484                | -3               |        |        |         |       |                 |      |        |    |                 |      |        |    |                 |      |        |    |

| Device Selection Dialog                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                 |              |              |                  |       |      |        |         |       |                                |                 |      |        |    |                                |                |      |        |    |                                                |                 |      |        |    |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|--------------|--------------|------------------|-------|------|--------|---------|-------|--------------------------------|-----------------|------|--------|----|--------------------------------|----------------|------|--------|----|------------------------------------------------|-----------------|------|--------|----|
| RTL Tool                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | Specify         | Filter       |              |                  |       |      |        |         |       |                                |                 |      |        |    |                                |                |      |        |    |                                                |                 |      |        |    |
| Auto                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | Parts           | Family: zynq | Package: Any | Speed Grade: Any |       |      |        |         |       |                                |                 |      |        |    |                                |                |      |        |    |                                                |                 |      |        |    |
| Boards                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                 |              |              |                  |       |      |        |         |       |                                |                 |      |        |    |                                |                |      |        |    |                                                |                 |      |        |    |
| Search: ▾                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |                 |              |              |                  |       |      |        |         |       |                                |                 |      |        |    |                                |                |      |        |    |                                                |                 |      |        |    |
| <table border="1"><thead><tr><th>Board</th><th>Part</th><th>Family</th><th>Package</th><th>Speed</th></tr></thead><tbody><tr><td>Zynq ZC702 Evaluation Platform</td><td>xc7z020clg484-1</td><td>zynq</td><td>clg484</td><td>-1</td></tr><tr><td>Zynq ZC706 Evaluation Platform</td><td>xc7z045fg900-1</td><td>zynq</td><td>ffg900</td><td>-1</td></tr><tr><td>ZedBoard Zynq Evaluation and Development Board</td><td>xc7z020clg484-1</td><td>zynq</td><td>clg484</td><td>-1</td></tr></tbody></table> |                 |              |              |                  | Board | Part | Family | Package | Speed | Zynq ZC702 Evaluation Platform | xc7z020clg484-1 | zynq | clg484 | -1 | Zynq ZC706 Evaluation Platform | xc7z045fg900-1 | zynq | ffg900 | -1 | ZedBoard Zynq Evaluation and Development Board | xc7z020clg484-1 | zynq | clg484 | -1 |
| Board                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | Part            | Family       | Package      | Speed            |       |      |        |         |       |                                |                 |      |        |    |                                |                |      |        |    |                                                |                 |      |        |    |
| Zynq ZC702 Evaluation Platform                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | xc7z020clg484-1 | zynq         | clg484       | -1               |       |      |        |         |       |                                |                 |      |        |    |                                |                |      |        |    |                                                |                 |      |        |    |
| Zynq ZC706 Evaluation Platform                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | xc7z045fg900-1  | zynq         | ffg900       | -1               |       |      |        |         |       |                                |                 |      |        |    |                                |                |      |        |    |                                                |                 |      |        |    |
| ZedBoard Zynq Evaluation and Development Board                                                                                                                                                                                                                                                                                                                                                                                                                                                        | xc7z020clg484-1 | zynq         | clg484       | -1               |       |      |        |         |       |                                |                 |      |        |    |                                |                |      |        |    |                                                |                 |      |        |    |

Using Vivado HLS 12 - 16

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## Clock Specification

### ► Clock frequency must be specified

- Only 1 clock can be specified for C/C++ functions
- SystemC can define multiple clocks

### ► Clock uncertainty can be specified

- Subtracted from the clock period to give an effective clock period
- The effective clock period is used for synthesis
  - Should not be used as a design parameter
  - Do not vary for different results: this is your safety margin
- A user controllable margin to account for downstream RTL synthesis and P&R



Using Vivado HLS 12 - 17

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## A Vivado HLS Project



Using Vivado HLS 12 - 18

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## Vivado HLS GUI Toolbar

### ► The primary commands have toolbar buttons

- Easy access for standard tasks
- Button highlights when the option is available
  - E.g. cannot perform C/RTL simulation before synthesis



## Files: Views, Edits & Information



## Outline

- Invoking Vivado HLS
- Project Creation using Vivado HLS
- Synthesis to IPXACT Flow
- Design Analysis
- Other Ways to use Vivado HLS
- Summary

## Synthesis

- Run C Synthesis

- Console

- Will show run time information
  - Examine for failed constraints

- A “syn” directory is created

- Verilog, VHDL & SystemC RTL
  - Synthesis reports for all non-inlined functions

- Report opens automatically

- When synthesis completes

- Report is outlined in the Auxiliary pane



## Vivado HLS : RTL Verification



RTL output in Verilog, VHDL and SystemC

Automatic re-use of the C-level test bench

RTL verification can be executed from within Vivado HLS

Support for Xilinx simulators (XSim and ISim) and 3<sup>rd</sup> party HDL simulators in automated flow

## RTL Verification: Under-the-Hood

### ➤ RTL Co-Simulation

- Vivado HLS provides RTL verification
- Creates the wrappers and adapters to re-use the C test bench



- Prior to synthesis
  - Test bench
  - Top-level C function

### • After synthesis

- Test bench
- SystemC wrapper created by Vivado HLS
- SystemC adapters created by Vivado HLS
- RTL output from Vivado HLS
  - SystemC, Verilog or VHDL

There is no HDL test bench created

## RTL Verification Support

### ➤ Vivado HLS RTL Output

- Vivado HLS outputs RTL in SystemC, Verilog and VHDL
  - The SystemC output is at the RT Level
  - The input is not transformed to SystemC at the ESL

### ➤ RTL Verification with SystemC

- The SystemC RTL output can be used to verify the design without the need for a HDL simulator and license

### ➤ HDL Simulation Support

- Vivado HLS supports HDL simulators on both Windows & Linux
- The 3<sup>rd</sup> party simulator executable must be in OS search path

| Simulator                | Linux     | Windows       |
|--------------------------|-----------|---------------|
| XSim (Vivado Simulator)  | Supported | Supported     |
| ISim (ISE Simulator)     | Supported | Supported     |
| Mentor Graphics ModelSim | Supported | Supported     |
| Synopsys VCS             | Supported | Not Available |
| NCSim                    | Supported | Not Available |
| Riviera                  | Supported | Supported     |

## C/RTL Co-simulation

### ➤ Start Simulation

- Opens the dialog box

### ➤ Select the RTL

- SystemC does not require a 3<sup>rd</sup> party license
- Verilog and VHDL require the appropriate simulator
  - Select the desired simulator
- Run any or all

### ➤ Options

- Can output trace file (VCD format)
- Optimize the C compilation & specify test bench linker flags
- The “setup only” option will not execute the simulation

### ➤ OK will run the simulator

- Output files will be created in a “sim” directory



## Simulation Results

➤ Simulation output is shown in the console

➤ Expect the same test bench response

- If the C test bench plots, it will work with the RTL design (but slower)

➤ Sim Directory

- Will contain a sub-directory for each RTL which is verified

➤ Report

- A report is created and opened automatically



## Vivado HLS : RTL Export



RTL output in Verilog, VHDL and SystemC

Scripts created for RTL synthesis tools

RTL Export to IP-XACT, SysGen, and Pcore formats

IP-XACT and SysGen => Vivado HLS for 7 Series  
and Zynq families  
PCore => Only Vivado HLS Standalone for all  
families

## RTL Export Support

### ➤ RTL Export

- Can be exported to one of the three types
  - IP-XACT formatted IP for use with Vivado System Edition (SE)
    - 7 Series and Zynq families only
  - A System Generator IP block
    - 7 Series and Zynq families only
  - Pcore formated IP block for use with EDK
    - 7 Series, Zynq, Spartan-3, Spartan-6, Virtex-4/5/6 families

### ➤ Generation in both Verilog and VHDL for non-bus or non-interface based designs

### ➤ Logic synthesis will automatically be performed

- HLS license will use Vivado RTL Synthesis

## RTL Export: Synthesis

### ➤ RTL Synthesis can be performed to evaluate the RTL

- IP-XACT and System Generator formats: Vivado synthesis performed
- Pcore format: ISE synthesis is performed



### ➤ RTL synthesis results are not included with the IP package

- Evaluate step is provided to give confidence
  - Timing will be as estimate (or better)
  - Area will be as estimated (or better)
- Final RTL IP is synthesized with the rest of the RTL design
  - RTL Synthesis results from the Vivado HLS evaluation are not used

## RTL Export: IP Repositories

► IP can be imported into other Xilinx tools



## RTL Export for Implementation

► Click on Export RTL

– Export RTL Dialog opens

► Select the desired output format

IP Catalog  
System Generator for DSP  
System Generator for DSP (ISE)  
Pcore for EDK  
Synthesized Checkpoint (.dcp)

► Optionally, configure the output

► Select the desired language

► Optionally, click on Evaluate button for invoking implementation tools from within Vivado HLS

► Click OK to start the implementation



## RTL Export (Evaluate Option) Results

### ► Impl directory created

- Will contain a sub-directory for each RTL which is synthesized

### ► Report

- A report is created and opened automatically

```
Vivado HLS Console
Phase 9 Post Router Timing | Checksum: 17cfbf6fb

Time (s): cpu = 00:01:00 ; elapsed = 00:00:28 . Memory (MB): peak = 941.410 ; gain = 93.391
INFO: [Route 35-16] Router Completed Successfully
Ending Route Task | Checksum: 17cfbf6fb

Time (s): cpu = 00:00:00 ; elapsed = 00:00:28 . Memory (MB): peak = 941.410 ; gain = 93.391

Routing Is Done.

Vivado HLS Console
LUT: 279
FF: 134
DSP: 1
BRAM: 5
SRL: 0
#== Final timing ==
CP required: 10.000
CP achieved: 6.192
Timing met
INFO: [Common 17-206] Exiting Vivado at Mon Oct 14 15:49:19 2013...
@I [LIC-101] Checked in feature [VIVADO_HLS]
```

Using Vivado HLS 12 - 33

© Copyright 2013 Xilinx



XILINX ➤ ALL PROGRAMMABLE.

## RTL Export Results (Evaluate Option Unchecked)

### ► Impl directory created

- Will contain a sub-directory for both VHDL and Verilog along with the ip directory

### ► No report will be created

### ► Observe the console

- No packing, routing phases



```
Vivado HLS Console
Starting export RTL ...
C:/Xilinx/Vivado_HLS/2013.3/bin/vivado_hls.bat C:/xup/hls/labs/lab1/dct.prj/solution1/export
@I [LIC-101] Checked out feature [VIVADO_HLS]
@I [HLS-10] Running 'C:/Xilinx/Vivado_HLS/2013.3/bin/unwrapped/win64.0/vivado_hls.exe' for user 'parimalp' on host 'xupparimalp30' (Windows NT amd64 version 6.1) on directory 'C:/xup/hls/labs/lab1'
@I [HLS-10] Opening project 'C:/xup/hls/labs/lab1/dct.prj'.
@I [HLS-10] Opening solution 'C:/xup/hls/labs/lab1/dct.prj/solution1'.
@I [SYN-201] Setting up clock 'default' with a period of 10ns.
@I [HLS-10] Setting target device to 'xc7z020clg484-1'.
@I [IMPL-8] Exporting RTL as an IP in IP-XACT.

***** Vivado v2013.3 (64-bit)
*** SW Build 328145 on Sun Oct 13 18:10:54 MDT 2013
*** IP Build 191624 on Sun Oct 13 14:03:12 MDT 2013
** Copyright 1986-1999, 2001-2013 Xilinx, Inc. All Rights Reserved.

INFO: [Common 17-78] Attempting to get a license: Implementation
INFO: [Common 17-81] Feature available: Implementation
INFO: [Device 21-36] Loading parts and site information from c:/Xilinx/Vivado/2013.3/data/Parsing RTL primitives file [c:/Xilinx/Vivado/2013.3/data/parts/xilinx/rtl/prims/rtl_prims Finished parsing RTL primitives file [c:/Xilinx/Vivado/2013.3/data/parts/xilinx/rtl/prims/source run_ipack.tcl -notrace
INFO: [Common 17-206] Exiting Vivado at Mon Oct 14 16:15:15 2013...
@I [LIC-101] Checked in feature [VIVADO_HLS]
```

Using Vivado HLS 12 - 34

© Copyright 2013 Xilinx

XILINX ➤ ALL PROGRAMMABLE.

## Outline

- Invoking Vivado HLS
- Project Creation using Vivado HLS
- Synthesis to IPXACT Flow
- Design Analysis
- Other Ways to use Vivado HLS
- Summary

## Analysis Perspective

### ➤ Perspective for design analysis

- Allows interactive analysis



## Performance Analysis

The screenshot shows the Vivado HLS Performance Analysis interface. On the left, a hierarchical navigation tree displays operations like `i_21_read(wire_r)`, `i_2_read(wire_re)`, and nested loops `[+]DCT_Outer_Loop` and `[+]DCT_Inner_Loop`. A red arrow points from the tree to a yellow box labeled "Loop Hierarchy". Below the tree are two tabs: "Operations, loops and functions" and "Scheduled States". To the right is a code editor window titled "C Source" showing the `dct_1d` implementation. A red arrow points from the "Loop Hierarchy" box to the code editor, with a callout bubble stating: "Select operations and right-click to cross reference with the C source and HDL". The bottom status bar indicates "Using Vivado HLS 12 - 37" and "© Copyright 2013 Xilinx".

## Resources Analysis

The screenshot shows the Vivado HLS Resource Sharing analysis interface. On the left, a table lists resources across control states `c0` to `c8`. A red box highlights the "Performance: Resource Sharing" row. A yellow box labeled "Resource View" appears over the table, stating: "Scheduled operations associated with resource: anything on the same row shares the same resource". To the right is a "Resource Profile" window showing resource usage details. A red box highlights the "Resource Profile" tab. The bottom status bar indicates "Using Vivado HLS 12 - 38" and "© Copyright 2013 Xilinx".

## Outline

- Invoking Vivado HLS
- Project Creation using Vivado HLS
- Synthesis to IPXACT Flow
- Design Analysis
- Other Ways to use Vivado HLS
- Summary

## Command Line Interface: Batch Mode

- Vivado HLS can also be run in batch mode
  - Opening the Command Line Interface (CLI) will give a shell



- Supports the commands required to run Vivado HLS & pre-synthesis verification (gcc, g++, apcc, make)

## Using Vivado HLS CLI

### ➤ Invoke Vivado HLS in interactive mode

- Type Tcl commands one at a time

```
> vivado_hls -i
```

### ➤ Execute Vivado HLS using a Tcl batch file

- Allows multiple runs to be scripted and automated

```
> vivado_hls -f run_aesl.tcl
```

### ➤ Open an existing project in the GUI

- For analysis, further work or to modify it

```
> vivado_hls -p my.prj
```

### ➤ Use the shell to launch Vivado HLS GUI

```
> vivado_hls
```

## Using Tcl Commands

### ➤ When the project is created

- All Tcl command to run the project are created in script.tcl
  - User specified directives are placed in directives.tcl
- Use this as a template from creating Tcl scripts
  - Uncomment the commands before running the Tcl script



```
1 #####  
2 ## This file is generated automatically by Vivado HLS.  
3 ## Please DO NOT edit it.  
4 ## Copyright (C) 2013 Xilinx Inc. All rights reserved.  
5 #####  
6 open_project dct.prj  
7 set_top dct  
8 add_files ../../ise_labs/labs/lab3/dct.c  
9 add_files -tb ../../ise_labs/labs/lab3/dct_test.c  
10 add_files -tb ../../ise_labs/labs/lab3/in.dat  
11 add_files -tb ../../ise_labs/labs/lab3/out.golden.dat  
12 open_solution "solution1"  
13 set_part {xc7z020clg484-1}  
14 create_clock -period 10 -name default  
15 source "./dct.prj/solution1/directives.tcl"  
16 csim_design  
17 csynth_design  
18 cosim_design -trace_level none  
19 export_design -format ip_catalog -description "An IP generated by Vivado HLS" -vendor "  
20
```

## Help

### ► Help is always available

- The Help Menu
- Opens User Guide, Reference Guide and Man Pages



### ► In interactive mode

- The help command lists the man page for all commands

Vivado\_HLS> help add\_files

SYNOPSIS  
add\_files [OPTIONS] <src\_files>  
Etc...

A screenshot of a terminal window showing the command 'Vivado\_HLS> help add\_files'. Below it, the command's synopsis is displayed: 'SYNOPSIS add\_files [OPTIONS] <src\_files> Etc...'. To the right of the terminal, a yellow box contains the text 'Auto-Complete all commands using the tab key'.

Using Vivado HLS 12 - 43

© Copyright 2013 Xilinx

XILINX ► ALL PROGRAMMABLE.

## Outline

- Invoking Vivado HLS
- Project Creation using Vivado HLS
- Synthesis to IPXACT Flow
- Design Analysis
- Other Ways to use Vivado HLS
- Summary

Using Vivado HLS 12 - 44

© Copyright 2013 Xilinx

XILINX ► ALL PROGRAMMABLE.

## Summary

- Vivado HLS can be run under Windows XP, Windows 7, Red Hat Linux, and SUSE OS
- Vivado HLS can be invoked through GUI and command line in Windows OS, and command line in Linux
- Vivado HLS project creation wizard involves
  - Defining project name and location
  - Adding design files
  - Specifying testbench files
  - Selecting clock and technology
- The top-level module in testbench is main() whereas top-level module in the design is the function to be synthesized

## Summary

- Vivado HLS project directory consists of
  - \*.prj project file
  - Multiple solutions directories
  - Each solution directory may contain
    - impl, synth, and sim directories
    - The impl directory consists of pcores, verilog, and vhdl folders
    - The synth directory consists of reports, systemC, vhdl, and verilog folders
    - The sim directory consists of testbench and simulation files



## Lab1 Intro Vivado HLS Design Flow

Vivado HLS 2013.3 Version  
ZedBoard

This material exempt per Department of Commerce license exception TSU

© Copyright 2013 Xilinx

### Objectives

► After completing this lab, you will be able to:

- Create a project in Vivado HLS
- Run C-simulation
- Use debugger
- Synthesize and implement the design using the default options
- Use design analysis perspective to see what is going on under the hood
- Understand and analyze the generated output

## The Design

► This lab uses a simple matrix multiplication example to walk you through the Vivado HLS project creation and analysis steps. The design consists of three nested loops. The Product loop is the inner most loop performing the actual elements product. The Col loop is the outer-loop which feeds next column element data with the passed row element data to the Product loop. Finally, Row is the outer-most loop. The `res[i][j]=0` (line 79) resets the result every time a new row element is passed and new column element is used

```
67 #include "matrixmult.h"
68
69 void matrixmult(
70     mat_a_t a[MAT_A_ROWS][MAT_A_COLS],
71     mat_b_t b[MAT_B_ROWS][MAT_B_COLS],
72     result_t res[MAT_A_ROWS][MAT_B_COLS])
73 {
74     // Iterate over the rows of the A matrix
75     Row: for(int i = 0; i < MAT_A_ROWS; i++) {
76         // Iterate over the columns of the B matrix
77         Col: for(int j = 0; j < MAT_B_COLS; j++) {
78             // Do the inner product of a row of A and col of B
79             res[i][j] = 0;
80             Product: for(int k = 0; k < MAT_B_ROWS; k++) {
81                 res[i][j] += a[i][k] * b[k][j];
82             }
83         }
84     }
85 }
```

Lab1 Intro 12a- 3

© Copyright 2013 Xilinx

XILINX ► ALL PROGRAMMABLE.

## Procedure

- Create a project after starting Vivado HLS in GUI mode
- Run C simulation
  - to understand the design behavior
- Run the debugger
  - to see how the top-level module works
- Synthesize the design
- Analyze the generated output using the Analysis perspective
- Run C/RTL cosimulation
  - to perform RTL simulation
- View simulation results in Vivado
  - to understand the IO protocol
- Export RTL in the Evaluate mode and run the implementation

Lab1 Intro 12a- 4

© Copyright 2013 Xilinx

XILINX ► ALL PROGRAMMABLE.

## Summary

► In this lab, you completed the major steps of the high-level synthesis design flow using Vivado HLS. You created a project, added source files, synthesized the design, simulated the design, and implemented the design. You also learned that how to use the Analysis perspective to understand the scheduling