

## 1. Introduction

The design is an FIR filter with a buffer that stores input values if they arrive before they can be processed. In this lab, the professor and TAs provided two different ways to connect the module to the CPU (processing system): an AXI master interface (MAXI) and an AXI-Stream interface (Stream). The following sections compare the two versions in terms of resource utilization, performance, and other factors.

## 2. Screenshots

### (1-1) Utilization of MAXI



Utilization:

| Resource | Utilization | Available | Utilization (%) |
|----------|-------------|-----------|-----------------|
| LUT      | 5086        | 117120    | 4.34            |
| LUTRAM   | 667         | 57600     | 1.16            |
| FF       | 6501        | 234240    | 2.78            |
| BRAM     | 2           | 144       | 1.39            |
| DSP      | 33          | 1248      | 2.64            |
| BUFG     | 2           | 352       | 0.57            |

Power:

| Metric                             | Value            |
|------------------------------------|------------------|
| Total On-Chip Power                | 2.769 W          |
| Junction Temperature               | 31.4 °C          |
| Thermal Margin                     | 53.6 °C (22.8 W) |
| Effective θJA                      | 2.3 °C/W         |
| Power supplied to off-chip devices | 0 W              |
| Confidence level                   | Medium           |

### (1-2) Utilization of Stream



DRC violations: 61 warnings.

Timing (setup):

| Metric                      | Value    |
|-----------------------------|----------|
| Worst Negative Slack (WNS)  | 3.404 ns |
| Total Negative Slack (TNS)  | 0 ns     |
| Number of Failing Endpoints | 0        |
| Total Number of Endpoints   | 35505    |

Utilization:

| Resource | Utilization | Available | Utilization (%) |
|----------|-------------|-----------|-----------------|
| LUT      | 6703        | 117120    | 5.72            |
| LUTRAM   | 974         | 57600     | 1.69            |
| FF       | 9532        | 234240    | 4.07            |
| BRAM     | 2           | 144       | 1.39            |
| DSP      | 33          | 1248      | 2.64            |
| BUFG     | 1           | 352       | 0.28            |

### (2-1) Hardware interface of MAXI

### (2-2) Hardware interface of Stream

### (3-1) Co-simulation transcript/waveform & performance of MAXI

```

## run all
/////////////////////////////////////////////////////////////////
// Inter-Transaction Progress: Completed Transaction / Total Transaction
// Intra-Transaction Progress: Measured Latency / Latency Estimation * 100%
//
// RTL Simulation : "Inter-Transaction Progress" ["Intra-Transaction Progress"] @ "Simulation Time"
/////////////////////////////////////////////////////////////////
// RTL Simulation : 0 / 1 [n/a] @ "125000"
// RTL Simulation : 1 / 1 [n/a] @ "19675000"
/////////////////////////////////////////////////////////////////
$finish called at time : 19735 ns : File "/home/chenchingwen/Course2023/SoC/lab2/course-lab_2/hls_ip/solution1/sim/verilog/fir_n11_maxi.autotb.v" Line 438
## quit
INFO: [Common 17-206] Exiting xsim at Fri Sep 29 19:43:00 2023...
INFO: [COSIM 212-316] Starting C post checking ...
>> Start test!
>> Comparing against output data...
>> Test passed!
-----
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
INFO: [COSIM 212-211] II is measurable only when transaction number is greater than 1 in RTL simulation. Otherwise, they will be marked as all NA. If user wants,
INFO: [HLS 200-111] Finished Command cosim design CPU user time: 13.33 seconds. CPU system time: 1.87 seconds. Elapsed time: 14.01 seconds; current allocated mem
INFO: [HLS 200-112] Total CPU user time: 14.77 seconds. Total CPU system time: 2.39 seconds. Total elapsed time: 26.12 seconds; peak allocated memory: 772.504 MB
Finished C/RTL cosimulation.

```



### (3-2) Co-simulation transcript/waveform & performance of Stream


Performance estimates from the co-simulation report (latency in clock cycles):

| Modules & Loops                   | Avg II | Max II | Min II | Avg Latency | Max Latency | Min Latency |
|-----------------------------------|--------|--------|--------|-------------|-------------|-------------|
| fir_n11_strm                      |        |        |        | 6603        | 6603        | 6603        |
| ↳ fir_n11_strm_Pipeline_XFER_LOOP |        |        |        | 6600        | 6600        | 6600        |


Vitis HLS console output:

```
## add wave /apatab_fir_n11_strm_top/pstrmInput_TVALID -into $tb_return_group -color #ffff00 -radix hex
## add wave /apatab_fir_n11_strm_top/pstrmInput_TDATA -into $tb_return_group -radix hex
## save_wave_config fir_n11_strm.wcfg
## run all
///////////////////////////////
// Inter-Transaction Progress: Completed Transaction / Total Transaction
// Intra-Transaction Progress: Measured Latency / Latency Estimation * 100%
//
// RTL Simulation : "Inter-Transaction Progress" ["Intra-Transaction Progress"] @ "Simulation Time"
/////////////////////////////
// RTL Simulation : 0 / 1 [n/a] @ "125000"
// RTL Simulation : 1 / 1 [n/a] @ "67855000"
/////////////////////////////
$finish called at time : 67915 ns : File "/home/chenchingwen/Course2023/SoC/lab2/course-lab_2/hls_ip_stream/solution1/sim/verilog/fir_n11_
## quit
INFO: [Common 17-206] Exiting xsim at Fri Sep 29 20:26:59 2023...
INFO: [COSIM 212-316] Starting C post checking ...
>> Start test!
>> Comparing against output data...
>> Test passed!
-----
INFO [HLS SIM]: The maximum depth reached by any hls::stream() instance in the design is 600
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
INFO: [COSIM 212-211] II is measurable only when transaction number is greater than 1 in RTL simulation. Otherwise, they will be marked as
INFO: [HLS 200-111] Finished Command cosim_design CPU user time: 16.18 seconds. CPU system time: 2.72 seconds. Elapsed time: 17.8 seconds;
INFO: [HLS 200-112] Total CPU user time: 17.64 seconds. Total CPU system time: 3.05 seconds. Total elapsed time: 29.76 seconds; peak alloc
Finished C/RTL cosimulation.
```





### (4-1) Jupyter notebook result for MAXI



### (4-2) Jupyter notebook result for Stream



## 5. Observations

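- (1) Utilization: the Stream version uses more LUTs (6703 vs. 5086), LUTRAM (974 vs. 667), and FFs (9532 vs. 6501) than the MAXI version. BRAM (2) and DSP (33) usage is the same, and the Stream version uses one fewer BUFG (1 vs. 2).
- (2) Execution time: the Stream version takes about 0.00075 s (0.75 ms), while the MAXI version takes about 0.00058 s (0.58 ms).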

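Based on these observations, the MAXI version is the better choice for this design with respect to both resource utilization and execution time. Its lower resource usage likely also means lower power consumption, although only the MAXI power report is shown above.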