

## Project: Discrete Fourier Transform (DFT)

### 1) Introduction

The goal of this project is to design architectures that implement the Discrete Fourier Transform (DFT). DFT is a common operation in signal processing which generates a discrete frequency domain representation of the discrete input signal. The input signal is a vector of samples and the matrix is a set of basis functions corresponding to discrete cosine and sine waveforms of different frequencies. The multiplication of the input signal with these basis functions describes how well the input signal correlates with those waveforms, which is the value of the Fourier series at that frequency.

We start with the direct implementation of the DFT which is a form of a matrix-vector multiplication. Future projects describe how to refactor the code to perform a fast Fourier transform.

### 2) Materials

You can download the project files here:

### 2) Materials

You can download the project files here:

- [DFT Project Repo](#)

The repository is organized into five folders, dft\_8\_precomputed, dft\_32\_precomputed, dft\_256\_precomputed, dft\_1024\_precomputed, and Demo. Each of the first four folders has its own testbench, out.gold.dat file, and coefficients.h file.

Each dft\_xx\_precomputed folder contains following files:

- dft.cpp - the baseline implementation for the dft function.
- dft.h - header file
- dft\_test.cpp - test bench
- coefficientsX.h - a file containing the values of corresponding to one sine/cosine period sampled based upon the DFT points. For example, an 8 point DFT has 8 samples across both the sine and cosine function evenly spaced across one period. This is equivalent to dividing one rotation in the complex plane equally by the number of points in the DFT.
- out.gold.dat - "Golden" output. The testbench (dft\_test.cpp) generates a sample input and calls the function dft in dft.cpp with that sample input. This output of the function is compared to the expected output. This will indicate PASS or FAIL. If it fails, then the code in the folder Demo: dft.bit | dft.tcl | project3\_host.ipynb dft is incorrect. There are four different versions of depending on the DFT size and way in which the DFT coefficients were generated.
- script.tcl and directives.tcl file to create the project

Demo folder contains one file:

- DFT.ipynb - notebook for demo

### 3) Project Goal

You should modify the code to create a number of different architectures that perform tradeoffs between performance and resource utilization. For dft\_256\_precomputed and dft\_1024\_precomputed designs, you need to use precomputed values from coefficients256.h and coefficients1024.h

For 256-point and 1024-point DFTs, you will create a report describing how you generated these different architectures (code restructuring, pragmas utilized, etc.). For each architecture you should provide its results including the resource utilization (BRAMs, DSP48, LUT, FF), and performance in terms of throughput (number of DFT operations/second), latency, clock cycles, clock frequency (which is fixed to 10 ns). You can do most of the design space exploration on the 256 point DFT. You should pick your “best” 256 architecture and synthesize that as a 1024 DFT.

The 8 and 32 point folders are provided for your convenience. If you would like, you can do some of your initial design space optimization on these smaller architectures. But it is not necessary to use these at all.

The key in this project is to understand the tradeoffs between loop optimizations (unrolling and pipelining) and data partitioning. Therefore you should focus on these optimizations.

```
wes_2025_r_lizarraga@waiter:~$ cd project3_DFT/  
wes_2025_r_lizarraga@waiter:~/project3_DFT/dft_8_precomputed$ vitis_hls -f script.tcl
```

## 5) Questions

- **Question 1:** What changes would this code require if you were to use a custom CORDIC similar to what you designed for Project: CORDIC? Compared to a baseline code with HLS math functions for cos() and sin(), would changing the accuracy of your CORDIC core make the DFT hardware resource usage change? How would it affect the performance? Note that you do not need to implement the CORDIC in your code, we are just asking you to discuss potential tradeoffs that would be possible if you used a CORDIC that you designed instead of the one from Xilinx.

For this project to use CORDIC, I would have to include the CORDIC implementation in this project, for this implementation I would have to replace Sin() & Cos() functions with corresponding CORDIC ROTATIONS.

One of the main changes would be performance, resources usage and the accuracy, CORDIC method is very effective in performance since only uses SHIFT and ADDITIONS (no multiplications, less DSP usage).

Relative to the Accuracy, adding enough ROTATION data in the table to comply the required accuracy FPGA based system has limited number of DSPs that are used for multiplications, however CORDIC method would perform better with less resources by using the pre-computed scaling factors and gain

- **Question 2:** Rewrite the code to eliminate these math function calls (i.e. `cos()` and `sin()`) by utilizing a table lookup. How does this change the throughput and resource utilization? What happens to the table lookup when you change the size of your DFT?

## See dft\_256\_precomputer\_optimized1

```
// Calculate the jth frequency sample sequentially
for (j = 0; j < SIZE; j += 1) {
    // Utilize HLS tool to calculate sine and cosine values
    //~c = cos(j * w);
    //~s = -sin(j * w);
    // Lets use the const tables in header file:
    // cos_coefficients_table & sin_coefficients_table
    // Instead of calculation on the fly
    int index = (i * j) % SIZE;
    c = cos_coefficients_table[index];
    s = sin_coefficients_table[index];
```

```
1[INFO: [SIM 2] ***** CSIM start *****
2INFO: [SIM 4] CSIM will launch GCC as the compiler.
3make: 'csm.exe' is up to date.
4-----
5      RMSE(R)          RMSE(I)
6 0.000322206469718 0.000299013743643
7-----
8*****
9PASS: The output matches the golden output!
0*****
1INFO: [SIM 1] CSim done with 0 errors.
2INFO: [SIM 3] ***** CSIM finish *****
```

|                           | <b>Target</b> | <b>Estimated</b> | <b>Uncertainty</b> |
|---------------------------|---------------|------------------|--------------------|
| Baseline                  | 10.00 ns      | 7.958 ns         | 2.70 ns            |
| 1(Q2)<br>Lookup<br>tables | <b>Target</b> | <b>Estimated</b> | <b>Uncertainty</b> |
|                           | 10.00 ns      | 7.256 ns         | 2.70 ns            |

| Optimization                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                 |            |                |                 |             |                   |             |                   |           |            |           |       |      |      |     |      |       |  |  |  |       |        |         |  |        |   |    |    |    |       |      |   |                                                |              |  |  |       |        |         |  |        |   |    |   |    |      |      |   |                                |  |  |  |   |     |         |  |     |   |    |   |   |     |    |   |
|------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|------------|----------------|-----------------|-------------|-------------------|-------------|-------------------|-----------|------------|-----------|-------|------|------|-----|------|-------|--|--|--|-------|--------|---------|--|--------|---|----|----|----|-------|------|---|------------------------------------------------|--------------|--|--|-------|--------|---------|--|--------|---|----|---|----|------|------|---|--------------------------------|--|--|--|---|-----|---------|--|-----|---|----|---|---|-----|----|---|
| Baseline                                       | <table border="1"> <thead> <tr> <th>Modules &amp; Loops</th><th>Issue Type</th><th>Violation Type</th><th>Distance</th><th>Slack</th><th>Latency(cycles)</th><th>Latency(ns)</th><th>Iteration Latency</th><th>Interval</th><th>Trip Count</th><th>Pipelined</th><th>BRAM</th><th>DSP</th><th>FF</th><th>LUT</th><th>URAM</th></tr> </thead> <tbody> <tr> <td>▼ dft</td><td></td><td></td><td></td><td>-0.66</td><td>524626</td><td>5.246E6</td><td></td><td>524627</td><td>-</td><td>no</td><td>12</td><td>68</td><td>10153</td><td>9615</td><td>0</td></tr> <tr> <td>► dft_Pipeline_VITIS_LOOP_18_1_VITIS_LOOP_26_2</td><td>II Violation</td><td></td><td></td><td>-0.66</td><td>524362</td><td>5.244E6</td><td></td><td>524362</td><td>-</td><td>no</td><td>8</td><td>68</td><td>9864</td><td>9347</td><td>0</td></tr> <tr> <td>► dft_Pipeline_VITIS_LOOP_39_3</td><td></td><td></td><td></td><td>-</td><td>261</td><td>2.610E3</td><td></td><td>261</td><td>-</td><td>no</td><td>0</td><td>0</td><td>283</td><td>95</td><td>0</td></tr> </tbody> </table> | Modules & Loops | Issue Type | Violation Type | Distance        | Slack       | Latency(cycles)   | Latency(ns) | Iteration Latency | Interval  | Trip Count | Pipelined | BRAM  | DSP  | FF   | LUT | URAM | ▼ dft |  |  |  | -0.66 | 524626 | 5.246E6 |  | 524627 | - | no | 12 | 68 | 10153 | 9615 | 0 | ► dft_Pipeline_VITIS_LOOP_18_1_VITIS_LOOP_26_2 | II Violation |  |  | -0.66 | 524362 | 5.244E6 |  | 524362 | - | no | 8 | 68 | 9864 | 9347 | 0 | ► dft_Pipeline_VITIS_LOOP_39_3 |  |  |  | - | 261 | 2.610E3 |  | 261 | - | no | 0 | 0 | 283 | 95 | 0 |
| Modules & Loops                                | Issue Type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | Violation Type  | Distance   | Slack          | Latency(cycles) | Latency(ns) | Iteration Latency | Interval    | Trip Count        | Pipelined | BRAM       | DSP       | FF    | LUT  | URAM |     |      |       |  |  |  |       |        |         |  |        |   |    |    |    |       |      |   |                                                |              |  |  |       |        |         |  |        |   |    |   |    |      |      |   |                                |  |  |  |   |     |         |  |     |   |    |   |   |     |    |   |
| ▼ dft                                          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                 |            | -0.66          | 524626          | 5.246E6     |                   | 524627      | -                 | no        | 12         | 68        | 10153 | 9615 | 0    |     |      |       |  |  |  |       |        |         |  |        |   |    |    |    |       |      |   |                                                |              |  |  |       |        |         |  |        |   |    |   |    |      |      |   |                                |  |  |  |   |     |         |  |     |   |    |   |   |     |    |   |
| ► dft_Pipeline_VITIS_LOOP_18_1_VITIS_LOOP_26_2 | II Violation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                 |            | -0.66          | 524362          | 5.244E6     |                   | 524362      | -                 | no        | 8          | 68        | 9864  | 9347 | 0    |     |      |       |  |  |  |       |        |         |  |        |   |    |    |    |       |      |   |                                                |              |  |  |       |        |         |  |        |   |    |   |    |      |      |   |                                |  |  |  |   |     |         |  |     |   |    |   |   |     |    |   |
| ► dft_Pipeline_VITIS_LOOP_39_3                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                 |            | -              | 261             | 2.610E3     |                   | 261         | -                 | no        | 0          | 0         | 283   | 95   | 0    |     |      |       |  |  |  |       |        |         |  |        |   |    |    |    |       |      |   |                                                |              |  |  |       |        |         |  |        |   |    |   |    |      |      |   |                                |  |  |  |   |     |         |  |     |   |    |   |   |     |    |   |

| 1(Q2)<br>Lookup<br>tables | Modules & Loops                                | Issue Type   | Violation Type | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF   | LUT  | URAM |
|---------------------------|------------------------------------------------|--------------|----------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|------|------|------|
|                           | ▼ dft                                          |              |                | -        |       | 393493          | 3.935E6     | -                 | 393494   | -          | no        | 4    | 5   | 1229 | 1544 |      |
|                           | ► dft_Pipeline_VITIS_LOOP_15_1_VITIS_LOOP_23_2 | II Violation |                | -        |       | 393232          | 3.932E6     | -                 | 393232   | -          | no        | 2    | 5   | 1202 | 1308 |      |
|                           | ► dft_Pipeline_VITIS_LOOP_39_3                 |              |                | -        |       | 258             | 2.580E3     | -                 | 258      | -          | no        | 0    | 0   | 21   | 63   |      |

$$\text{Throughput}_{\text{Per\_Input}} = \frac{1}{(\text{Clock cycles})(\text{Interval})}$$

$$\text{Throughput}_{\text{Per\_DFTsize}} = \frac{(\text{DFTsize})}{(\text{Clock cycles}-\text{ns})(\text{Interval})}$$

|                            | Baseline                                                                  | Lookup Tables (Q1)                                        |
|----------------------------|---------------------------------------------------------------------------|-----------------------------------------------------------|
| Clock Cycle Estimated (ns) | 7.958                                                                     | 7.256                                                     |
| Interval                   | 524627                                                                    | 393494                                                    |
| Throughput (KDFTPS)        | $\frac{(256)}{(1/(7.958 \times 10^{-9}))(524627)} = 61.31 \text{ KDFTPS}$ | $\frac{(256)}{(1/7.256 \times 10^{-9})(393494)} = 89.661$ |
| BRAM                       | 12                                                                        | 4                                                         |
| DSP                        | 68                                                                        | 5                                                         |
| FF                         | 10153                                                                     | 1229                                                      |
| LUT                        | 9615                                                                      | 1544                                                      |

- Throughput improved from 61.31 KDFTPS (kilo DFTs Per Second) to 89.661 KDFTPS

- Since we removed cos() & sin() functions, DSP usage was reduced from 68 to only 5

Throughput improved (increased), since Lookup table logic is simpler and not requiring multipliers, the area and resources used is reduced significantly

- **Question 3:** Modify the DFT function interface so that the input and outputs are stored in separate arrays. Modify the testbench to accommodate this change to DFT interface. How does this affect the optimizations that you can perform? How does it change the performance? And how does the resource usage change? You should use this modified interface for the remaining questions.

## See [dft\\_256\\_precomputer\\_optimized2](#)

dft.cpp:

```
void dft (DTYPE real_sample[SIZE], DTYP imag_sample[SIZE], DTYP real_out[SIZE], DTYP imag_out[SIZE]) {
```

dft.h:

```
void dft(DTYP real_sample[SIZE], DTYP imag_sample[SIZE], DTYP real_out[SIZE], DTYP imag_out[SIZE]);
```

dft\_test.cpp:

```
dft(In_R, In_I,Out_R,Out_I);
```

```
1 INFO: [SIM 2] **** CSIM start ****
2 INFO: [SIM 4] CSIM will launch GCC as the compiler.
3   Compiling ../../dft_test.cpp in debug mode
4   Compiling ../../dft.cpp in debug mode
5   Generating csim.exe
6 -----
7   RMSE(R)          RMSE(I)
8 0.000082730904978 0.000134893096401
9 -----
10 ****
11 PASS: The output matches the golden output!
12 ****
13 INFO: [SIM 1] CSim done with 0 errors.
14 INFO: [SIM 3] **** CSIM finish ****
```

| Optimization                      |                                                                                                                                                                                            |             |           |             |          |          |         |
|-----------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|-----------|-------------|----------|----------|---------|
| 1(Q2)<br>Lookup<br>tables         | <table border="1"> <thead> <tr> <th>Target</th><th>Estimated</th><th>Uncertainty</th></tr> </thead> <tbody> <tr> <td>10.00 ns</td><td>7.256 ns</td><td>2.70 ns</td></tr> </tbody> </table> | Target      | Estimated | Uncertainty | 10.00 ns | 7.256 ns | 2.70 ns |
| Target                            | Estimated                                                                                                                                                                                  | Uncertainty |           |             |          |          |         |
| 10.00 ns                          | 7.256 ns                                                                                                                                                                                   | 2.70 ns     |           |             |          |          |         |
| 2 (Q3)<br>I/O arrays<br>separated | <table border="1"> <thead> <tr> <th>Target</th><th>Estimated</th><th>Uncertainty</th></tr> </thead> <tbody> <tr> <td>10.00 ns</td><td>7.256 ns</td><td>2.70 ns</td></tr> </tbody> </table> | Target      | Estimated | Uncertainty | 10.00 ns | 7.256 ns | 2.70 ns |
| Target                            | Estimated                                                                                                                                                                                  | Uncertainty |           |             |          |          |         |
| 10.00 ns                          | 7.256 ns                                                                                                                                                                                   | 2.70 ns     |           |             |          |          |         |

| Optimization                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                     |            |                |                 |             |                   |             |                   |           |            |           |      |      |      |     |      |       |              |  |  |   |        |         |   |        |   |    |   |   |      |      |  |                                                |              |                     |  |   |        |         |    |        |       |     |   |   |      |      |  |                                |  |  |  |   |     |         |   |     |   |    |   |   |    |    |  |
|------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|------------|----------------|-----------------|-------------|-------------------|-------------|-------------------|-----------|------------|-----------|------|------|------|-----|------|-------|--------------|--|--|---|--------|---------|---|--------|---|----|---|---|------|------|--|------------------------------------------------|--------------|---------------------|--|---|--------|---------|----|--------|-------|-----|---|---|------|------|--|--------------------------------|--|--|--|---|-----|---------|---|-----|---|----|---|---|----|----|--|
| 1(Q2)<br>Lookup tables                         | <table border="1"> <thead> <tr> <th>Modules &amp; Loops</th><th>Issue Type</th><th>Violation Type</th><th>Distance</th><th>Slack</th><th>Latency(cycles)</th><th>Latency(ns)</th><th>Iteration Latency</th><th>Interval</th><th>Trip Count</th><th>Pipelined</th><th>BRAM</th><th>DSP</th><th>FF</th><th>LUT</th><th>URAM</th></tr> </thead> <tbody> <tr> <td>▼ dft</td><td></td><td></td><td></td><td>-</td><td>393493</td><td>3.935E6</td><td>-</td><td>393494</td><td>-</td><td>no</td><td>4</td><td>5</td><td>1229</td><td>1544</td><td></td></tr> <tr> <td>► dft_Pipeline_VITIS_LOOP_15_1_VITIS_LOOP_23_2</td><td>II Violation</td><td></td><td></td><td>-</td><td>393232</td><td>3.932E6</td><td>-</td><td>393232</td><td>-</td><td>no</td><td>2</td><td>5</td><td>1202</td><td>1308</td><td></td></tr> <tr> <td>► dft_Pipeline_VITIS_LOOP_39_3</td><td></td><td></td><td></td><td>-</td><td>258</td><td>2.580E3</td><td>-</td><td>258</td><td>-</td><td>no</td><td>0</td><td>0</td><td>21</td><td>63</td><td></td></tr> </tbody> </table> | Modules & Loops     | Issue Type | Violation Type | Distance        | Slack       | Latency(cycles)   | Latency(ns) | Iteration Latency | Interval  | Trip Count | Pipelined | BRAM | DSP  | FF   | LUT | URAM | ▼ dft |              |  |  | - | 393493 | 3.935E6 | - | 393494 | - | no | 4 | 5 | 1229 | 1544 |  | ► dft_Pipeline_VITIS_LOOP_15_1_VITIS_LOOP_23_2 | II Violation |                     |  | - | 393232 | 3.932E6 | -  | 393232 | -     | no  | 2 | 5 | 1202 | 1308 |  | ► dft_Pipeline_VITIS_LOOP_39_3 |  |  |  | - | 258 | 2.580E3 | - | 258 | - | no | 0 | 0 | 21 | 63 |  |
| Modules & Loops                                | Issue Type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Violation Type      | Distance   | Slack          | Latency(cycles) | Latency(ns) | Iteration Latency | Interval    | Trip Count        | Pipelined | BRAM       | DSP       | FF   | LUT  | URAM |     |      |       |              |  |  |   |        |         |   |        |   |    |   |   |      |      |  |                                                |              |                     |  |   |        |         |    |        |       |     |   |   |      |      |  |                                |  |  |  |   |     |         |   |     |   |    |   |   |    |    |  |
| ▼ dft                                          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                     |            | -              | 393493          | 3.935E6     | -                 | 393494      | -                 | no        | 4          | 5         | 1229 | 1544 |      |     |      |       |              |  |  |   |        |         |   |        |   |    |   |   |      |      |  |                                                |              |                     |  |   |        |         |    |        |       |     |   |   |      |      |  |                                |  |  |  |   |     |         |   |     |   |    |   |   |    |    |  |
| ► dft_Pipeline_VITIS_LOOP_15_1_VITIS_LOOP_23_2 | II Violation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                     |            | -              | 393232          | 3.932E6     | -                 | 393232      | -                 | no        | 2          | 5         | 1202 | 1308 |      |     |      |       |              |  |  |   |        |         |   |        |   |    |   |   |      |      |  |                                                |              |                     |  |   |        |         |    |        |       |     |   |   |      |      |  |                                |  |  |  |   |     |         |   |     |   |    |   |   |    |    |  |
| ► dft_Pipeline_VITIS_LOOP_39_3                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                     |            | -              | 258             | 2.580E3     | -                 | 258         | -                 | no        | 0          | 0         | 21   | 63   |      |     |      |       |              |  |  |   |        |         |   |        |   |    |   |   |      |      |  |                                                |              |                     |  |   |        |         |    |        |       |     |   |   |      |      |  |                                |  |  |  |   |     |         |   |     |   |    |   |   |    |    |  |
| 2 (Q3)<br>I/O arrays<br>separated              | <table border="1"> <thead> <tr> <th>Modules &amp; Loops</th><th>Issue Type</th><th>Violation Type</th><th>Distance</th><th>Slack</th><th>Latency(cycles)</th><th>Latency(ns)</th><th>Iteration Latency</th><th>Interval</th><th>Trip Count</th><th>Pipelined</th><th>BRAM</th><th>DSP</th><th>FF</th><th>LUT</th><th>URAM</th></tr> </thead> <tbody> <tr> <td>▼ dft</td><td>II Violation</td><td></td><td></td><td>-</td><td>393232</td><td>3.932E6</td><td>-</td><td>393233</td><td>-</td><td>no</td><td>2</td><td>5</td><td>1202</td><td>1341</td><td></td></tr> <tr> <td>► VITIS_LOOP_9_1_VITIS_LOOP_13_2</td><td>II Violation</td><td>Memory Dependency 1</td><td></td><td>-</td><td>393230</td><td>3.932E6</td><td>21</td><td>6</td><td>65536</td><td>yes</td><td>-</td><td>-</td><td>-</td><td>-</td><td></td></tr> </tbody> </table>                                                                                                                                                                                                      | Modules & Loops     | Issue Type | Violation Type | Distance        | Slack       | Latency(cycles)   | Latency(ns) | Iteration Latency | Interval  | Trip Count | Pipelined | BRAM | DSP  | FF   | LUT | URAM | ▼ dft | II Violation |  |  | - | 393232 | 3.932E6 | - | 393233 | - | no | 2 | 5 | 1202 | 1341 |  | ► VITIS_LOOP_9_1_VITIS_LOOP_13_2               | II Violation | Memory Dependency 1 |  | - | 393230 | 3.932E6 | 21 | 6      | 65536 | yes | - | - | -    | -    |  |                                |  |  |  |   |     |         |   |     |   |    |   |   |    |    |  |
| Modules & Loops                                | Issue Type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Violation Type      | Distance   | Slack          | Latency(cycles) | Latency(ns) | Iteration Latency | Interval    | Trip Count        | Pipelined | BRAM       | DSP       | FF   | LUT  | URAM |     |      |       |              |  |  |   |        |         |   |        |   |    |   |   |      |      |  |                                                |              |                     |  |   |        |         |    |        |       |     |   |   |      |      |  |                                |  |  |  |   |     |         |   |     |   |    |   |   |    |    |  |
| ▼ dft                                          | II Violation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                     |            | -              | 393232          | 3.932E6     | -                 | 393233      | -                 | no        | 2          | 5         | 1202 | 1341 |      |     |      |       |              |  |  |   |        |         |   |        |   |    |   |   |      |      |  |                                                |              |                     |  |   |        |         |    |        |       |     |   |   |      |      |  |                                |  |  |  |   |     |         |   |     |   |    |   |   |    |    |  |
| ► VITIS_LOOP_9_1_VITIS_LOOP_13_2               | II Violation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Memory Dependency 1 |            | -              | 393230          | 3.932E6     | 21                | 6           | 65536             | yes       | -          | -         | -    | -    |      |     |      |       |              |  |  |   |        |         |   |        |   |    |   |   |      |      |  |                                                |              |                     |  |   |        |         |    |        |       |     |   |   |      |      |  |                                |  |  |  |   |     |         |   |     |   |    |   |   |    |    |  |

|                               | Optimization1(Q2)<br>Lookup tables                          | Optimization2(Q3)<br>I/O arrays separated                   |
|-------------------------------|-------------------------------------------------------------|-------------------------------------------------------------|
| Clock Cycle<br>Estimated (ns) | 7.256                                                       | 7.256                                                       |
| Interval                      | 393494                                                      | 393233                                                      |
| Throughput<br>(KDFTPS)        | $\frac{(256)}{(1/(7.256 \times 10^{-9}))(393494)} = 89.661$ | $\frac{(256)}{(1/(7.256 \times 10^{-9}))(393233)} = 89.720$ |
| BRAM                          | 4                                                           | 2                                                           |
| DSP                           | 5                                                           | 5                                                           |
| FF                            | 1229                                                        | 1202                                                        |
| LUT                           | 1544                                                        | 1341                                                        |

The separation of the input and output arrays didn't improve performance significantly (see throughput), but the separation of the input and output arrays, eliminate the intermediate temporal arrays and allow us to perform parallel processing, so instead of sequential read & write, now it can be in parallel.

- Question 4: Loop Optimizations:** Examine the effects of loop unrolling and array partitioning on the performance and resource utilization. What is the relationship between array partitioning and loop unrolling? Does it help to perform one without the other? Plot the performance in terms of number of DFT operations per second (throughput) versus the unroll and array partitioning factor. Plot the same trend for resources (showing LUTs, FFs, DSP blocks, BRAMs). What is the general trend in both cases? Which design would you select? Why?

The relationship between array partitioning and loop unrolling is that loop unrolling allows parallelism on memory access, and this has a significant effect on throughput.

If the configuration doesn't provide enough memory access concurrency with the right loop partitioning, then the loop unrolling is serialized (sequentially) and cannot achieve its full potential.,

To do Loop unrolling without array partitioning causes period increases when increase loop unrolling factor without array partitioning.

HLS unroll factor=8 **without** Partition arrays:

| Target             | Estimated    | Uncertainty    |          |       |                 |             |                   |          |            |           |      |     |      |      |      |
|--------------------|--------------|----------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|------|------|------|
| 10.00 ns           | 8.567 ns     | 2.70 ns        |          |       |                 |             |                   |          |            |           |      |     |      |      |      |
| Modules & Loops    | Issue Type   | Violation Type | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF   | LUT  | URAM |
| ▼  dft             |              |                | -1.27    |       | 332801          | 3.328E6     | -                 | 332802   | -          | no        | 2    | 7   | 2617 | 2272 | 0    |
| ►  VITIS_LOOP_20_1 | II Violation |                | -        |       | 332800          | 3.328E6     | 1300              | -        | 256        | no        | -    | -   | -    | -    | -    |

HLS unroll factor=2 with Partition arrays:

| Target             | Estimated    | Uncertainty    |          |       |                 |             |                   |          |            |           |      |     |      |      |      |
|--------------------|--------------|----------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|------|------|------|
| 10.00 ns           | 8.495 ns     | 2.70 ns        |          |       |                 |             |                   |          |            |           |      |     |      |      |      |
| Modules & Loops    | Issue Type   | Violation Type | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF   | LUT  | URAM |
| ▼  dft             |              |                | -1.19    |       | 333058          | 3.331E6     | -                 | 333059   | -          | no        | 4    | 7   | 1579 | 1631 | 0    |
| ►  VITIS_LOOP_19_1 | II Violation |                | -        |       | 333057          | 3.331E6     | 1301              | -        | 256        | no        | -    | -   | -    | -    | -    |

HLS unroll factor=4 with Partition arrays:

| Target   | Estimated | Uncertainty |
|----------|-----------|-------------|
| 10.00 ns | 8.352 ns  | 2.70 ns     |

| Modules & Loops    | Issue Type | Violation Type | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF | LUT  | URAM |   |
|--------------------|------------|----------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|----|------|------|---|
| ▼  dft             |            |                |          | -1.05 | 332801          | 3.328E6     |                   | -        | 332802     | -         | no   | 8   | 7  | 2122 | 2034 | 0 |
| ►  VITIS_LOOP_19_1 |            | II Violation   |          | -     | 332800          | 3.328E6     | 1300              | -        | 256        | no        | -    | -   | -  | -    | -    |   |

HLS unroll factor=6 with Partition arrays:

| Target   | Estimated | Uncertainty |
|----------|-----------|-------------|
| 10.00 ns | 9.162 ns  | 2.70 ns     |

| Modules & Loops    | Issue Type | Violation Type | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF | LUT  | URAM |   |
|--------------------|------------|----------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|----|------|------|---|
| ▼  dft             |            |                |          | -1.86 | 335105          | 3.351E6     |                   | -        | 335106     | -         | no   | 12  | 7  | 5753 | 3950 | 0 |
| ►  VITIS_LOOP_19_1 |            | II Violation   |          | -     | 335104          | 3.351E6     | 1309              | -        | 256        | no        | -    | -   | -  | -    | -    |   |

HLS unroll factor=8 with Partition arrays:

| Target   | Estimated | Uncertainty |
|----------|-----------|-------------|
| 10.00 ns | 8.567 ns  | 2.70 ns     |

| Modules & Loops    | Issue Type | Violation Type | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF | LUT  | URAM |  |
|--------------------|------------|----------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|----|------|------|--|
| ▼  dft             |            |                |          | -1.27 | 332801          | 3.328E6     |                   | -        | 332802     | -         | no   | 16  | 7  | 3122 | 3103 |  |
| ►  VITIS_LOOP_19_1 |            | II Violation   |          | -     | 332800          | 3.328E6     | 1300              | -        | 256        | no        | -    | -   | -  | -    | -    |  |

| DFT size | Optimization Configuration                   | Unroll factor | Clock Cycle Estimated (ns) | Clock Cycle Estimated (ns) | Interval | Throughput (DFT PS) | Throughput (K DFT PS) | BRAM | DSP | FF   | LUT  |
|----------|----------------------------------------------|---------------|----------------------------|----------------------------|----------|---------------------|-----------------------|------|-----|------|------|
| 256      | HLS unroll factor=8 WITHOUT Partition arrays | 8             | 8.567                      | 8.567E-09                  | 332802   | 89789.44163         | 89.79                 | 2    | 7   | 2617 | 2272 |
| 256      | HLS unroll factor=2 with Partition arrays    | 2             | 8.495                      | 8.495E-09                  | 333059   | 90480.58677         | 90.48                 | 4    | 7   | 1579 | 1631 |
| 256      | HLS unroll factor=4 with Partition arrays    | 4             | 8.352                      | 8.352E-09                  | 332802   | 92100.83171         | 92.10                 | 8    | 7   | 2122 | 2034 |
| 256      | HLS unroll factor=6 with Partition arrays    | 6             | 9.152                      | 9.152E-09                  | 335106   | 83472.17887         | 83.47                 | 12   | 7   | 5753 | 3950 |
| 256      | HLS unroll factor=8 with Partition arrays    | 8             | 8.567                      | 8.567E-09                  | 332802   | 89789.44163         | 89.79                 | 16   | 7   | 3122 | 3103 |

r





### Code changes:

**#pragma HLS array\_partition**

Partitions an array into smaller arrays or individual elements and provides the following:

- Results in RTL with multiple small memories or multiple registers instead of one large memory.
- Effectively increases the amount of read and write ports for the storage.
- Potentially improves the throughput of the design.
- Requires more memory instances or registers.

`#pragma HLS array_partition variable=real_sample_in cyclic factor=8`

### Syntax

Place the pragma in the C source within the boundaries of the function where the array variable is defined.

```
#pragma HLS array_partition variable=<name> \
type=<type>  factor=<int>  dim=<int> off=true
```

## cyclic

Cyclic partitioning creates smaller arrays by interleaving elements from the original array. The array is partitioned cyclically by putting one element into each new array before coming back to the first array to repeat the cycle until the array is fully partitioned.

For example, if `factor=3` is used:

- Element 0 is assigned to the first new array
- Element 1 is assigned to the second new array.
- Element 2 is assigned to the third new array.
- Element 3 is assigned to the first new array again.

## factor=<int>

Specifies the number of smaller arrays that are to be created.

---

**!! Important:** For complete type partitioning, the factor is not specified. For block and cyclic partitioning, the `factor=` is required.

```
//Partition arrays into smaller arrays to individual elements to provide:  
// multiple registers instead of one large memory to Potentially improves the throughput of the design  
#pragma HLS array_partition variable=real_sample cyclic factor=8  
#pragma HLS array_partition variable=imag_sample cyclic factor=8  
#pragma HLS array_partition variable=real_out cyclic factor=8  
#pragma HLS array_partition variable=imag_out cyclic factor=8  
#pragma HLS array_partition variable=cos_coefficients_table cyclic factor=8  
#pragma HLS array_partition variable=sin_coefficients_table cyclic factor=8
```

```

for (j = 0; j < SIZE; j += 1) {
    #pragma HLS unroll factor=8
    // Utilize HLS tool to calculate sine and cosine values
    int index = (i*j)%SIZE;
    cos = cos_coefficients_table[index];
    sin = sin_coefficients_table[index];

    // Multiply the current phasor with the appropriate input sample and keep
    // running sum
    temp_real += (real_sample[j] * cos - imag_sample[j] * sin);
    temp_imag += (real_sample[j] * sin + imag_sample[j] * cos);
}

```

```

INFO: [SIM 2] **** CSIM start ****
INFO: [SIM 4] CSIM will launch GCC as the compiler.
Compiling ../../dft_test.cpp in debug mode
Compiling ../../dft.cpp in debug mode
Generating csim.exe
-----
RMSE(R)          RMSE(I)
0.000082730904978 0.000134893096401
-----
*****
PASS: The output matches the golden output!
*****
INFO: [SIM 1] CSim done with 0 errors.
INFO: [SIM 3] **** CSIM finish ****

```

| Optimization                                     |                                                                                                                                                                                            |             |           |             |          |          |         |
|--------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|-----------|-------------|----------|----------|---------|
| Baseline                                         | <table border="1"> <thead> <tr> <th>Target</th><th>Estimated</th><th>Uncertainty</th></tr> </thead> <tbody> <tr> <td>10.00 ns</td><td>7.958 ns</td><td>2.70 ns</td></tr> </tbody> </table> | Target      | Estimated | Uncertainty | 10.00 ns | 7.958 ns | 2.70 ns |
| Target                                           | Estimated                                                                                                                                                                                  | Uncertainty |           |             |          |          |         |
| 10.00 ns                                         | 7.958 ns                                                                                                                                                                                   | 2.70 ns     |           |             |          |          |         |
| 1(Q2)<br>Lookup tables                           | <table border="1"> <thead> <tr> <th>Target</th><th>Estimated</th><th>Uncertainty</th></tr> </thead> <tbody> <tr> <td>10.00 ns</td><td>7.256 ns</td><td>2.70 ns</td></tr> </tbody> </table> | Target      | Estimated | Uncertainty | 10.00 ns | 7.256 ns | 2.70 ns |
| Target                                           | Estimated                                                                                                                                                                                  | Uncertainty |           |             |          |          |         |
| 10.00 ns                                         | 7.256 ns                                                                                                                                                                                   | 2.70 ns     |           |             |          |          |         |
| 2 (Q3)<br>I/O arrays separated                   | <table border="1"> <thead> <tr> <th>Target</th><th>Estimated</th><th>Uncertainty</th></tr> </thead> <tbody> <tr> <td>10.00 ns</td><td>7.256 ns</td><td>2.70 ns</td></tr> </tbody> </table> | Target      | Estimated | Uncertainty | 10.00 ns | 7.256 ns | 2.70 ns |
| Target                                           | Estimated                                                                                                                                                                                  | Uncertainty |           |             |          |          |         |
| 10.00 ns                                         | 7.256 ns                                                                                                                                                                                   | 2.70 ns     |           |             |          |          |         |
| 3 (Q4)<br>Loop unrolling<br>&<br>Array partition | <table border="1"> <thead> <tr> <th>Target</th><th>Estimated</th><th>Uncertainty</th></tr> </thead> <tbody> <tr> <td>10.00 ns</td><td>8.567 ns</td><td>2.70 ns</td></tr> </tbody> </table> | Target      | Estimated | Uncertainty | 10.00 ns | 8.567 ns | 2.70 ns |
| Target                                           | Estimated                                                                                                                                                                                  | Uncertainty |           |             |          |          |         |
| 10.00 ns                                         | 8.567 ns                                                                                                                                                                                   | 2.70 ns     |           |             |          |          |         |

| Optimization                                           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                     |                |                 |             |                   |             |                   |           |            |           |       |      |      |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
|--------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|----------------|-----------------|-------------|-------------------|-------------|-------------------|-----------|------------|-----------|-------|------|------|-----|------|-------|--------------|---|-------|--------|---------|---|--------|---|----|----|----|-------|------|---|------------------------------------------------|--------------|---------------------|-------|--------|---------|------|--------|-------|-----|---|----|------|------|---|--------------------------------|--|---|---|-----|---------|---|-----|---|----|---|---|-----|----|---|
| Baseline                                               | <b>Modules &amp; Loops</b> <table> <thead> <tr> <th>Issue Type</th><th>Violation Type</th><th>Distance</th><th>Slack</th><th>Latency(cycles)</th><th>Latency(ns)</th><th>Iteration Latency</th><th>Interval</th><th>Trip Count</th><th>Pipelined</th><th>BRAM</th><th>DSP</th><th>FF</th><th>LUT</th><th>URAM</th></tr> </thead> <tbody> <tr> <td>▼ dft</td><td></td><td>-</td><td>-0.66</td><td>524626</td><td>5.246E6</td><td>-</td><td>524627</td><td>-</td><td>no</td><td>12</td><td>68</td><td>10153</td><td>9615</td><td>0</td></tr> <tr> <td>► dft_Pipeline_VITIS_LOOP_18_1_VITIS_LOOP_26_2</td><td>II Violation</td><td>-</td><td>-0.66</td><td>524362</td><td>5.244E6</td><td>-</td><td>524362</td><td>-</td><td>no</td><td>8</td><td>68</td><td>9864</td><td>9347</td><td>0</td></tr> <tr> <td>► dft_Pipeline_VITIS_LOOP_39_3</td><td></td><td>-</td><td>-</td><td>261</td><td>2.610E3</td><td>-</td><td>261</td><td>-</td><td>no</td><td>0</td><td>0</td><td>283</td><td>95</td><td>0</td></tr> </tbody> </table> | Issue Type          | Violation Type | Distance        | Slack       | Latency(cycles)   | Latency(ns) | Iteration Latency | Interval  | Trip Count | Pipelined | BRAM  | DSP  | FF   | LUT | URAM | ▼ dft |              | - | -0.66 | 524626 | 5.246E6 | - | 524627 | - | no | 12 | 68 | 10153 | 9615 | 0 | ► dft_Pipeline_VITIS_LOOP_18_1_VITIS_LOOP_26_2 | II Violation | -                   | -0.66 | 524362 | 5.244E6 | -    | 524362 | -     | no  | 8 | 68 | 9864 | 9347 | 0 | ► dft_Pipeline_VITIS_LOOP_39_3 |  | - | - | 261 | 2.610E3 | - | 261 | - | no | 0 | 0 | 283 | 95 | 0 |
| Issue Type                                             | Violation Type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Distance            | Slack          | Latency(cycles) | Latency(ns) | Iteration Latency | Interval    | Trip Count        | Pipelined | BRAM       | DSP       | FF    | LUT  | URAM |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| ▼ dft                                                  |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | -                   | -0.66          | 524626          | 5.246E6     | -                 | 524627      | -                 | no        | 12         | 68        | 10153 | 9615 | 0    |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| ► dft_Pipeline_VITIS_LOOP_18_1_VITIS_LOOP_26_2         | II Violation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | -                   | -0.66          | 524362          | 5.244E6     | -                 | 524362      | -                 | no        | 8          | 68        | 9864  | 9347 | 0    |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| ► dft_Pipeline_VITIS_LOOP_39_3                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | -                   | -              | 261             | 2.610E3     | -                 | 261         | -                 | no        | 0          | 0         | 283   | 95   | 0    |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| 1(Q2)<br>Lookup<br>tables                              | <b>Modules &amp; Loops</b> <table> <thead> <tr> <th>Issue Type</th><th>Violation Type</th><th>Distance</th><th>Slack</th><th>Latency(cycles)</th><th>Latency(ns)</th><th>Iteration Latency</th><th>Interval</th><th>Trip Count</th><th>Pipelined</th><th>BRAM</th><th>DSP</th><th>FF</th><th>LUT</th><th>URAM</th></tr> </thead> <tbody> <tr> <td>▼ dft</td><td></td><td>-</td><td>-</td><td>393493</td><td>3.935E6</td><td>-</td><td>393494</td><td>-</td><td>no</td><td>4</td><td>5</td><td>1229</td><td>1544</td><td>0</td></tr> <tr> <td>► dft_Pipeline_VITIS_LOOP_15_1_VITIS_LOOP_23_2</td><td>II Violation</td><td>-</td><td>-</td><td>393232</td><td>3.932E6</td><td>-</td><td>393232</td><td>-</td><td>no</td><td>2</td><td>5</td><td>1202</td><td>1308</td><td>0</td></tr> <tr> <td>► dft_Pipeline_VITIS_LOOP_39_3</td><td></td><td>-</td><td>-</td><td>258</td><td>2.580E3</td><td>-</td><td>258</td><td>-</td><td>no</td><td>0</td><td>0</td><td>21</td><td>63</td><td>0</td></tr> </tbody> </table>              | Issue Type          | Violation Type | Distance        | Slack       | Latency(cycles)   | Latency(ns) | Iteration Latency | Interval  | Trip Count | Pipelined | BRAM  | DSP  | FF   | LUT | URAM | ▼ dft |              | - | -     | 393493 | 3.935E6 | - | 393494 | - | no | 4  | 5  | 1229  | 1544 | 0 | ► dft_Pipeline_VITIS_LOOP_15_1_VITIS_LOOP_23_2 | II Violation | -                   | -     | 393232 | 3.932E6 | -    | 393232 | -     | no  | 2 | 5  | 1202 | 1308 | 0 | ► dft_Pipeline_VITIS_LOOP_39_3 |  | - | - | 258 | 2.580E3 | - | 258 | - | no | 0 | 0 | 21  | 63 | 0 |
| Issue Type                                             | Violation Type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Distance            | Slack          | Latency(cycles) | Latency(ns) | Iteration Latency | Interval    | Trip Count        | Pipelined | BRAM       | DSP       | FF    | LUT  | URAM |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| ▼ dft                                                  |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | -                   | -              | 393493          | 3.935E6     | -                 | 393494      | -                 | no        | 4          | 5         | 1229  | 1544 | 0    |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| ► dft_Pipeline_VITIS_LOOP_15_1_VITIS_LOOP_23_2         | II Violation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | -                   | -              | 393232          | 3.932E6     | -                 | 393232      | -                 | no        | 2          | 5         | 1202  | 1308 | 0    |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| ► dft_Pipeline_VITIS_LOOP_39_3                         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | -                   | -              | 258             | 2.580E3     | -                 | 258         | -                 | no        | 0          | 0         | 21    | 63   | 0    |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| 2 (Q3)<br>I/O arrays<br>separated                      | <b>Modules &amp; Loops</b> <table> <thead> <tr> <th>Issue Type</th><th>Violation Type</th><th>Distance</th><th>Slack</th><th>Latency(cycles)</th><th>Latency(ns)</th><th>Iteration Latency</th><th>Interval</th><th>Trip Count</th><th>Pipelined</th><th>BRAM</th><th>DSP</th><th>FF</th><th>LUT</th><th>URAM</th></tr> </thead> <tbody> <tr> <td>▼ dft</td><td>II Violation</td><td>-</td><td>-</td><td>393232</td><td>3.932E6</td><td>-</td><td>393233</td><td>-</td><td>no</td><td>2</td><td>5</td><td>1202</td><td>1341</td><td>0</td></tr> <tr> <td>► VITIS_LOOP_9_1_VITIS_LOOP_13_2</td><td>II Violation</td><td>Memory Dependency 1</td><td>-</td><td>393230</td><td>3.932E6</td><td>21</td><td>6</td><td>65536</td><td>yes</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr> </tbody> </table>                                                                                                                                                                                                             | Issue Type          | Violation Type | Distance        | Slack       | Latency(cycles)   | Latency(ns) | Iteration Latency | Interval  | Trip Count | Pipelined | BRAM  | DSP  | FF   | LUT | URAM | ▼ dft | II Violation | - | -     | 393232 | 3.932E6 | - | 393233 | - | no | 2  | 5  | 1202  | 1341 | 0 | ► VITIS_LOOP_9_1_VITIS_LOOP_13_2               | II Violation | Memory Dependency 1 | -     | 393230 | 3.932E6 | 21   | 6      | 65536 | yes | - | -  | -    | -    | - |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| Issue Type                                             | Violation Type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Distance            | Slack          | Latency(cycles) | Latency(ns) | Iteration Latency | Interval    | Trip Count        | Pipelined | BRAM       | DSP       | FF    | LUT  | URAM |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| ▼ dft                                                  | II Violation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | -                   | -              | 393232          | 3.932E6     | -                 | 393233      | -                 | no        | 2          | 5         | 1202  | 1341 | 0    |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| ► VITIS_LOOP_9_1_VITIS_LOOP_13_2                       | II Violation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | Memory Dependency 1 | -              | 393230          | 3.932E6     | 21                | 6           | 65536             | yes       | -          | -         | -     | -    | -    |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| 3 (Q4)<br>Loop<br>unrolling<br>&<br>Array<br>partition | <b>Modules &amp; Loops</b> <table> <thead> <tr> <th>Issue Type</th><th>Violation Type</th><th>Distance</th><th>Slack</th><th>Latency(cycles)</th><th>Latency(ns)</th><th>Iteration Latency</th><th>Interval</th><th>Trip Count</th><th>Pipelined</th><th>BRAM</th><th>DSP</th><th>FF</th><th>LUT</th><th>URAM</th></tr> </thead> <tbody> <tr> <td>▼ dft</td><td></td><td>-</td><td>-1.27</td><td>332801</td><td>3.328E6</td><td>-</td><td>332802</td><td>-</td><td>no</td><td>16</td><td>7</td><td>3122</td><td>3103</td><td>0</td></tr> <tr> <td>► VITIS_LOOP_19_1</td><td>II Violation</td><td>-</td><td>-</td><td>332800</td><td>3.328E6</td><td>1300</td><td>-</td><td>256</td><td>no</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr> </tbody> </table>                                                                                                                                                                                                                                                      | Issue Type          | Violation Type | Distance        | Slack       | Latency(cycles)   | Latency(ns) | Iteration Latency | Interval  | Trip Count | Pipelined | BRAM  | DSP  | FF   | LUT | URAM | ▼ dft |              | - | -1.27 | 332801 | 3.328E6 | - | 332802 | - | no | 16 | 7  | 3122  | 3103 | 0 | ► VITIS_LOOP_19_1                              | II Violation | -                   | -     | 332800 | 3.328E6 | 1300 | -      | 256   | no  | - | -  | -    | -    | - |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| Issue Type                                             | Violation Type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | Distance            | Slack          | Latency(cycles) | Latency(ns) | Iteration Latency | Interval    | Trip Count        | Pipelined | BRAM       | DSP       | FF    | LUT  | URAM |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| ▼ dft                                                  |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | -                   | -1.27          | 332801          | 3.328E6     | -                 | 332802      | -                 | no        | 16         | 7         | 3122  | 3103 | 0    |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |
| ► VITIS_LOOP_19_1                                      | II Violation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | -                   | -              | 332800          | 3.328E6     | 1300              | -           | 256               | no        | -          | -         | -     | -    | -    |     |      |       |              |   |       |        |         |   |        |   |    |    |    |       |      |   |                                                |              |                     |       |        |         |      |        |       |     |   |    |      |      |   |                                |  |   |   |     |         |   |     |   |    |   |   |     |    |   |

- **Question 5: Dataflow:** Apply dataflow pragma to your design to improve throughput. You may need to change your code and make submodules so that it aligns with the task-level or function-level modularity that dataflow can exploit; Xilinx provides [some examples of dataflow code](#). The [Vitis HLS User Guide](#) and [this summary](#) provide more information. How much improvement does dataflow provide? How does dataflow affect resource usage? What about BRAM usage specifically? Did you modify the code to make it more amenable to dataflow? If so, how? Please describe your architecture(s) with figures on your report.

## Dataflow implementation



```

#include<math.h>
#include "dft.h"
#include"coefficients256.h"

void Loop1(int i, DTYPEn cos[SIZE], DTYPEn sin[SIZE]) {
    int j;
    for (j = 0; j < SIZE; j++) {
#pragma HLS unroll factor=8
        //
        int index = (i * j) % SIZE;
        cos[j] = cos_coefficients_table[index];
        sin[j] = sin_coefficients_table[index];
    }
}

void Loop2(DTYPEn cos[SIZE], DTYPEn sin[SIZE], DTYPEn real_in[SIZE], DTYPEn imag_in[SIZE], DTYPEn temp_real_out[SIZE], DTYPEn temp_imag_out[SIZE]) {
    int j;
    for (j = 0; j < SIZE; j++) {
#pragma HLS unroll factor=8
        DTYPEn c_temp = cos[j];
        DTYPEn s_temp = sin[j];
        DTYPEn temp_real_in = real_in[j];
        DTYPEn temp_imag_in = imag_in[j];
        temp_real_out[j] = (temp_real_in * c_temp - temp_imag_in * s_temp);
        temp_imag_out[j] = (temp_real_in * s_temp + temp_imag_in * c_temp);
    }
}

void Loop3(int i, DTYPEn temp_real_out[SIZE], DTYPEn temp_imag_out[SIZE], DTYPEn sum_real[SIZE], DTYPEn sum_imag[SIZE]) {
    int j;
    for (j = 0; j < SIZE; j++) {
#pragma HLS unroll factor=8
        if (j==0) {
            sum_real[i]=temp_real_out[j];
            sum_imag[i]=temp_imag_out[j];
        } else {
            sum_real[i]+=temp_real_out[j];
            sum_imag[i]+=temp_imag_out[j];
        }
    }
}

void LastPart(int i, DTYPEn sum_real[SIZE], DTYPEn sum_imag[SIZE], DTYPEn real_out[SIZE], DTYPEn imag_out[SIZE]) {
    real_out[i] = sum_real[i];
    imag_out[i] = sum_imag[i];
}

void dft (DTYPEn real_in[SIZE], DTYPEn imag_in[SIZE], DTYPEn real_out[SIZE], DTYPEn imag_out[SIZE]) {
    //Partition arrays into smaller arrays to individual elements to provide:
    // multiple registers instead of one large memory to Potentially improves
    // the throughput of the design
    #pragma HLS array_partition variable=real_in cyclic factor=8
    #pragma HLS array_partition variable=imag_in cyclic factor=8
    #pragma HLS array_partition variable=real_out cyclic factor=8
    #pragma HLS array_partition variable=imag_out cyclic factor=8
    #pragma HLS array_partition variable=cos_coefficients_table cyclic factor=8
    #pragma HLS array_partition variable=sin_coefficients_table cyclic factor=8
    // Compute each frequency domain sample
    for (int i = 0; i < SIZE; i++) {
        #pragma HLS dataflow
        DTYPEn cos[SIZE], sin[SIZE];
        DTYPEn temp_real_out[SIZE], temp_imag_out[SIZE];
        DTYPEn sum_real[SIZE], sum_imag[SIZE];
        Loop1(i, cos, sin);                                // Lookup tables
        Loop2(cos, sin, real_in, imag_in, temp_real_out, temp_imag_out); // Multiply the current phasor with the appropriate input sample
        Loop3(i, temp_real_out, temp_imag_out, sum_real, sum_imag); // Summation of products
        LastPart(i,sum_real,sum_imag, real_out, imag_out); // Return values
    }
}

```

Dataflow, HLS unroll factor=8 with Partition arrays

| Target            | Estimated             | Uncertainty    |          |       |                 |             |                   |          |            |           |      |     |      |      |      |
|-------------------|-----------------------|----------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|------|------|------|
| 10.00 ns          | 11.058 ns             | 2.70 ns        |          |       |                 |             |                   |          |            |           |      |     |      |      |      |
| Modules & Loops   | Issue Type            | Violation Type | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF   | LUT  | URAM |
| ▼ dft             | ⚠ Timing Violation    |                | -3.76    |       | 338710          | 3.745E6     | -                 | 338711   | -          | no        | 34   | 34  | 7077 | 8629 | 0    |
| ► VITIS_LOOP_56_1 | ⚠ II&Timing Violation |                | -        |       | 338709          | 3.745E6     | 338709            | -        | 256        | dataflow  | -    | -   | -    | -    | -    |

Dataflow, HLS unroll factor=4 with Partition arrays

| Target            | Estimated          | Uncertainty    |          |       |                 |             |                   |          |            |           |      |     |      |      |      |
|-------------------|--------------------|----------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|------|------|------|
| 10.00 ns          | 7.256 ns           | 2.70 ns        |          |       |                 |             |                   |          |            |           |      |     |      |      |      |
| Modules & Loops   | Issue Type         | Violation Type | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF   | LUT  | URAM |
| ▼ dft             | ⚠ Timing Violation |                | -        |       | 346901          | 3.469E6     | -                 | 346902   | -          | no        | 26   | 34  | 5009 | 6871 | 0    |
| ► VITIS_LOOP_59_1 | ⚠ II Violation     |                | -        |       | 346900          | 3.469E6     | 346900            | -        | 256        | dataflow  | -    | -   | -    | -    | -    |

|                               | Optimization1(Q4)<br>Loop unrolling & Array partition       | Optimization2(Q5)<br>Dataflow                               |
|-------------------------------|-------------------------------------------------------------|-------------------------------------------------------------|
| Clock Cycle<br>Estimated (ns) | 11.058                                                      | 7.256                                                       |
| Interval                      | 338711                                                      | 346902                                                      |
| Throughput<br>(KDFTPS)        | $\frac{(256)}{(1/(11.058 \times 10^{-9}))(338711)} = 68.35$ | $\frac{(256)}{(1/(7.256 \times 10^{-9}))(346902)} = 101.70$ |
| BRAM                          | 34                                                          | 26                                                          |
| DSP                           | 34                                                          | 34                                                          |
| FF                            | 7077                                                        | 5009                                                        |
| LUT                           | 8629                                                        | 6871                                                        |

Significant improvement on Throughput from 68.35 KDFTPS to 101.70 KDFTPS .  
BRAM, FF & LUT resources were reduced

- **Question 6: Best architecture:** Briefly describe your “best” architecture. In what way is it the best? What optimizations did you use to obtain this result? What are the tradeoffs that you considered in order to obtain this architecture?

For my best optimization will be the one design with Dataflow implemented, lookup tables, I/O separated, array unrolling and array partition.

All these based on the design that provides the best Throughput.

"ap\_axis" refers to a data type used to represent an AXI4-Stream interface, allowing you to define streaming data paths within your C/C++ code that will be automatically translated into optimized hardware logic on an FPGA, essentially enabling efficient data transfer between different IP blocks or system components with a streaming protocol

Key points about ap\_axis:

Data structure:

It's a specialized data type within the "hls" namespace, typically defined as ap\_axis<DATA\_WIDTH, K, D, B> where:

DATA\_WIDTH: The width of the data payload in bits.

K: Number of user-defined "keep" bits (indicates which data bytes are valid).

D: Number of user-defined "last" bits (signals the end of a data burst).

B: Number of user-defined "user" bits (for custom signaling)

| Target   | Estimated | Uncertainty |
|----------|-----------|-------------|
| 10.00 ns | 11.058 ns | 2.70 ns     |

| Modules & Loops   | Issue Type            | Violation Type | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF   | LUT  |
|-------------------|-----------------------|----------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|------|------|
| ▼ dft             | ⚠ Timing Violation    |                |          | -3.76 | 338710          | 3.745E6     | -                 | 338711   | -          | no        | 34   | 34  | 7077 | 8629 |
| ► VITIS_LOOP_59_1 | ⚠ II&Timing Violation |                | -        | -     | 338709          | 3.745E6     | 338709            | -        | 256        | dataflow  | -    | -   | -    | -    |

- **Question 7: Streaming Interface Synthesis:** Modify your design to allow for streaming inputs and outputs using `hls::stream`. You must write your own testbench to account for the function interface change from `DTYPE` to `hls::stream`. NOTE: your design must pass Co-Simulation (not just C-Simulation). You can learn about `hls::stream` from the [HLS Stream Library](#). An example of code with both `hls::stream` and dataflow is available (along with its testbench) [here](#), and another [example showing hls::stream between functions](#). Describe the major changes that you made to your code to implement the streaming interface. What benefits does the streaming interface provide? What are the drawbacks?

```
dft.cpp  coefficients1024.h  dft_csim.log  Synthesis Summary(solution1)
1 INFO: [SIM 2] **** CSIM start ****
2 INFO: [SIM 4] CSIM will launch GCC as the compiler.
3 make: 'csim.exe' is up to date.
4 -----
5      RMSE(R)          RMSE(I)
6 0.001853297697380 0.001314980210736
7 -----
8 ****
9 PASS: The output matches the golden output!
10 ****
11 INFO [HLS SIM]: The maximum depth reached by any hls::stream() instance in the design is 1024
12 INFO: [SIM 1] CSim done with 0 errors.
13 INFO: [SIM 3] **** CSIM finish ****
14
```

| Target   | Estimated | Uncertainty |
|----------|-----------|-------------|
| 10.00 ns | 11.408 ns | 2.70 ns     |

| Modules & Loops | Issue Type           | Violation Type      | Distance | Slack | Latency(cycles) | Latency(ns) | Iteration Latency | Interval | Trip Count | Pipelined | BRAM | DSP | FF   | LUT  | URAM |
|-----------------|----------------------|---------------------|----------|-------|-----------------|-------------|-------------------|----------|------------|-----------|------|-----|------|------|------|
| ▼  dft          |                      |                     |          | -4.11 | 5386268         | 6.145E7     | -                 | 5385242  | -          | dataflow  | 48   | 34  | 7596 | 9289 | 0    |
| ►               | VITIS_LOOP_61_1_proc |                     | -        | -     | 1026            | 1.026E4     | -                 | 1026     | -          | no        | 0    | 0   | 13   | 73   | 0    |
| ►               | VITIS_LOOP_71_2_proc | II&Timing Violation | -        | -4.11 | 5385241         | 6.144E7     | -                 | 5385241  | -          | no        | 40   | 34  | 7545 | 9144 | 0    |

### **Co-Simulation:**

dft.cpp coefficients1024.h dft\_csim.log Synthesis Summary(solution1) Co-simulation Report(solution1) X

## Cosimulation Report for 'dft'

### General Information

Date: Sat Nov 16 02:28:47 AM PST 2024  
Version: 2022.2 (Build 3670227 on Oct 13 2022)  
Project: hls  
Status: Pass

Solution: solution1 (Vivado IP Flow Target)  
Product family: zynq  
Target device: xc7z020-clg400-1

### Cosim Options

Tool: Vivado XSIM RTL: Verilog

### Performance Estimates

Modules & Loops Avg II Max II Min II Avg Latency Max Latency Min Latency

| Modules & Loops        | Avg II | Max II | Min II | Avg Latency | Max Latency | Min Latency |
|------------------------|--------|--------|--------|-------------|-------------|-------------|
| ▼ dft                  |        |        |        | 5384212     | 5384212     | 5384212     |
| ▶ VITIS_LOOP_61_1_proc |        |        |        | 1024        | 1024        | 1024        |
| ▶ VITIS_LOOP_71_2_proc |        |        |        | 5383187     | 5383187     | 5383187     |

Flow Navigator x

- ▼ C SIMULATION
  - ▶ Run C Simulation
  - ▶ Reports & Viewers
- ▼ C SYNTHESIS
  - ▶ Run C Synthesis
  - ▼ Reports & Viewers
    - Report
    - Function Call Graph
    - Schedule Viewer
    - Dataflow Viewer
- ▼ C/RTL COSIMULATION
  - ▶ Run Cosimulation
  - ▼ Reports & Viewers
    - Report
    - Function Call Graph
    - Timeline Trace
    - Wave Viewer
- ▼ IMPLEMENTATION
  - ▶ Export RTL
  - ▶ Run Implementation

---

- ✓ Network 0
- ✓ /processing\_system7\_0
- ✓ /processing\_system7\_0/Data (32 address bits : 0x40000000 [ 1G ])
- ✓ /dft\_0/s\_axi\_control

| s_axi_control | Reg | 0x4000_0000 | 64K | 0x4000_FFFF |
|---------------|-----|-------------|-----|-------------|
|---------------|-----|-------------|-----|-------------|

## Displaying Error and Output

In [9]:

```
1 plt.figure(figsize=(10, 5))
2 plt.subplot(1,2,1)
3 plt.bar(ind,real_error)
4 plt.title("Real Part Error")
5 plt.xlabel("Index")
6 plt.ylabel("Error")
7 #plt.xticks(ind)
8 plt.tight_layout()
9
10 plt.subplot(1,2,2)
11 plt.bar(ind,imag_error)
12 plt.title("Imaginary Part Error")
13 plt.xlabel("Index")
14 plt.ylabel("Error")
15 #plt.xticks(ind)
16 plt.tight_layout()
```



In [10]:

```
1 freq=np.fft.fftfreq(1024)
```