

# Electronic System-Level Design and Verification

## Homework 3

**HW/SW Co-Design of RGB2YUV and YUV2RGB**

Student IDs: M133040052, M133040066, B103090046

Names: 陳昱介, 紀汎甫, 宋易柔

Date: 113/5/13

## (I) Version 1: C

### 1. HLS Synthesis Results

#### Utilization Estimates

| Name            | BRAM_18K | DSP48E | FF    | LUT   |
|-----------------|----------|--------|-------|-------|
| DSP             | -        | -      | -     | -     |
| Expression      | -        | -      | 0     | 2328  |
| FIFO            | -        | -      | -     | -     |
| Instance        | 0        | 25     | 2167  | 4019  |
| Memory          | -        | -      | -     | -     |
| Multiplexer     | -        | -      | -     | 368   |
| Register        | -        | -      | 1314  | -     |
| Total           | 0        | 25     | 3481  | 6715  |
| Available       | 120      | 80     | 35200 | 17600 |
| Utilization (%) | 0        | 31     | 9     | 38    |

### 2. s\_axilite (hw.h)

```
// xrgb2yuv_axi_hw.h (excerpt)
//          bit 7 - auto_restart (Read/Write)
//          others - reserved
// 0x04 : Global Interrupt Enable Register
//          bit 0 - Global Interrupt Enable (Read/Write)
//          others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
//          bit 0 - Channel 0 (ap_done)
//          bit 1 - Channel 1 (ap_ready)
//          others - reserved
// 0x0c : IP Interrupt Status Register (Read/TOW)
//          bit 0 - Channel 0 (ap_done)
//          bit 1 - Channel 1 (ap_ready)
//          others - reserved
// 0x10 : Data signal of r
//          bit 31~0 - r[31:0] (Read/Write)
// 0x14 : reserved
// 0x18 : Data signal of g
//          bit 31~0 - g[31:0] (Read/Write)
// 0x1c : reserved
// 0x20 : Data signal of b
//          bit 31~0 - b[31:0] (Read/Write)
// 0x24 : reserved
// 0x28 : Data signal of y
//          bit 31~0 - y[31:0] (Read)
// 0x2c : Control signal of y
//          bit 0 - y_ap_vld (Read/COR)
//          others - reserved
// 0x30 : Data signal of u
//          bit 31~0 - u[31:0] (Read)
// 0x34 : Control signal of u
//          bit 0 - u_ap_vld (Read/COR)
//          others - reserved
// 0x38 : Data signal of v
//          bit 31~0 - v[31:0] (Read)
// 0x3c : Control signal of v
//          bit 0 - v_ap_vld (Read/COR)
//          others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
```
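The register map above is what the software side of the co-design has to follow. As a rough sketch only (not the code actually run on the board), the Python below shows how one conversion could be driven through these offsets. The data offsets are copied from the header comments. The control register at `0x00` with `ap_start` in bit 0 and `ap_done` in bit 1 is an assumption, since those lines are cut off in the excerpt. `rgb2yuv_once` and `FakeRegs` are hypothetical names, and `FakeRegs` only returns canned values so the flow can be dry-run without hardware.

```python
# Byte offsets copied from the xrgb2yuv_axi_hw.h comments above.
ADDR_R, ADDR_G, ADDR_B = 0x10, 0x18, 0x20
ADDR_Y, ADDR_U, ADDR_V = 0x28, 0x30, 0x38

# ASSUMPTION: control register at 0x00, bit 0 = ap_start, bit 1 = ap_done.
# These lines are not visible in the excerpt.
ADDR_CTRL = 0x00
AP_START, AP_DONE = 0x1, 0x2


def rgb2yuv_once(regs, r, g, b, max_polls=1000):
    """Run one conversion through any object exposing read()/write()."""
    regs.write(ADDR_R, r)
    regs.write(ADDR_G, g)
    regs.write(ADDR_B, b)
    regs.write(ADDR_CTRL, AP_START)
    # Poll the control register until the core reports completion.
    for _ in range(max_polls):
        if regs.read(ADDR_CTRL) & AP_DONE:
            break
    else:
        raise TimeoutError("ap_done was never set")
    return regs.read(ADDR_Y), regs.read(ADDR_U), regs.read(ADDR_V)


class FakeRegs:
    """Hypothetical in-memory stand-in for the IP, for dry runs only."""

    def __init__(self, yuv):
        self.mem = {}
        self.yuv = yuv

    def write(self, offset, value):
        self.mem[offset] = value
        if offset == ADDR_CTRL and value & AP_START:
            # Pretend the core finishes immediately with canned outputs.
            self.mem[ADDR_CTRL] = AP_DONE
            self.mem[ADDR_Y], self.mem[ADDR_U], self.mem[ADDR_V] = self.yuv

    def read(self, offset):
        return self.mem.get(offset, 0)
```

On the board, `regs` would be replaced by a memory-mapped I/O object with the same `read(offset)` / `write(offset, value)` shape, created at the base address Vivado assigned to the IP.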

### 3. Vivado Block Design



### 4. PYNQ-Z2 Execution Results

C ip spent 11.020362 sec



## (II) Version 2: SystemC

### 1. HLS Synthesis Results

#### Performance Estimates

**Timing (ns)**

| Clock  | Target | Estimated | Uncertainty |
|--------|--------|-----------|-------------|
| ap_clk | 10.00  | 8.593     | 1.25        |

**Latency (clock cycles)**

| Latency (min) | Latency (max) | Interval (min) | Interval (max) | Type |
|---------------|---------------|----------------|----------------|------|
| 0             | 10            | 0              | 10             | none |

#### Utilization Estimates

| Name            | BRAM_18K | DSP48E | FF    | LUT   |
|-----------------|----------|--------|-------|-------|
| DSP             | -        | -      | -     | -     |
| Expression      | -        | -      | -     | -     |
| FIFO            | -        | -      | -     | -     |
| Instance        | -        | 1      | 166   | 855   |
| Memory          | -        | -      | -     | -     |
| Multiplexer     | -        | -      | -     | -     |
| Register        | -        | -      | 27    | -     |
| Total           | 0        | 1      | 193   | 855   |
| Available       | 120      | 80     | 35200 | 17600 |
| Utilization (%) | 0        | 1      | ~0    | 4     |
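The "Utilization (%)" rows of both HLS reports can be reproduced from the "Total" and "Available" rows. The sketch below is a sanity check rather than part of the design flow. It assumes the report rounds percentages down to an integer, which is consistent with the values shown (for example 3481 / 35200 is about 9.9 % but is reported as 9). The dictionary and function names are illustrative.

```python
# "Available" row shared by both HLS reports.
AVAILABLE = {"DSP48E": 80, "FF": 35200, "LUT": 17600}

# "Total" rows from the Version 1 (C) and Version 2 (SystemC) reports.
HLS_TOTALS = {
    "Version 1 (C)": {"DSP48E": 25, "FF": 3481, "LUT": 6715},
    "Version 2 (SystemC)": {"DSP48E": 1, "FF": 193, "LUT": 855},
}


def utilization_percent(used, available):
    """Integer utilization in percent, rounded down (assumed report behavior)."""
    return 100 * used // available


UTILIZATION = {
    version: {res: utilization_percent(n, AVAILABLE[res]) for res, n in totals.items()}
    for version, totals in HLS_TOTALS.items()
}
```

Under that assumption the C version comes out at 31 / 9 / 38 percent (DSP48E / FF / LUT) and the SystemC version at 1 / 0 / 4 percent, matching the two tables.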

### 2. Vivado Block Design



### 3. PYNQ-Z2 Execution Results

SystemC ip spent 13.281702 sec



## (III) Version 3: Verilog

### 1. slv\_reg Assignment

```
// Inputs written by software
assign start   = slv_reg0;
assign importR = slv_reg1;
assign importG = slv_reg2;
assign importB = slv_reg3;

// Read-back multiplexer
3'h0 : reg_data_out <= slv_reg0;
3'h1 : reg_data_out <= slv_reg1;
3'h2 : reg_data_out <= slv_reg2;
3'h3 : reg_data_out <= slv_reg3;
3'h4 : reg_data_out <= outportY; // slv_reg4
3'h5 : reg_data_out <= outportU; // slv_reg5
3'h6 : reg_data_out <= outportV; // slv_reg6
3'h7 : reg_data_out <= done;     // slv_reg7
```
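For the software side, the slv\_reg indices above have to be turned into byte offsets from the IP base address. The following sketch assumes the usual 32-bit AXI4-Lite register width, so that slv\_reg N sits at byte offset 4 × N. That width is an assumption, as it is not shown in the excerpt. The names `SLV_REGS` and `byte_offsets` are illustrative.

```python
# Register names in slv_reg order, taken from the Verilog excerpt above.
SLV_REGS = ["start", "importR", "importG", "importB",
            "outportY", "outportU", "outportV", "done"]

# ASSUMPTION: 32-bit registers, so slv_reg N sits at byte offset 4 * N.
BYTES_PER_REG = 4


def byte_offsets(names=SLV_REGS, width=BYTES_PER_REG):
    """Map each register name to its byte offset from the IP base address."""
    return {name: index * width for index, name in enumerate(names)}


OFFSETS = byte_offsets()
WRITTEN_BY_SW = set(SLV_REGS[:4])   # slv_reg0..3: start and the R, G, B inputs
DRIVEN_BY_HW = set(SLV_REGS[4:])    # read mux 3'h4..3'h7: Y, U, V and done
```

With this layout the software would write `start`, `importR`, `importG`, and `importB` at offsets 0x00 to 0x0C, then read `outportY`, `outportU`, `outportV`, and `done` at 0x10 to 0x1C.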

### 2. Vivado Block Design



### 3. PYNQ-Z2 Execution Results

Verilog ip spent 9.343563 sec



## (IV) Results Analysis

| RGB2YUV             | DSP | LUT  | FF   | Clock Freq. | Execution Time |
|---------------------|-----|------|------|-------------|----------------|
| Version 1 (C)       | 25  | 2581 | 3025 | 105 MHz     | 11.020362 sec  |
| Version 2 (SystemC) | 4   | 1140 | 1432 | 115 MHz     | 13.281702 sec  |
| Version 3 (Verilog) | 0   | 657  | 726  | 108 MHz     | 9.343563 sec   |
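The "Clock Freq." column is consistent with estimating the maximum frequency from the post-implementation worst negative slack (WNS) reported for each version in the Vivado summaries. The sketch below assumes a 10 ns clock constraint (the HLS target is 10.00 ns; the Vivado constraint itself is not shown in the report) and uses the relation f_max = 1 / (T − WNS). Truncating to whole MHz is also an assumption, chosen because it reproduces the table.

```python
def fmax_mhz(wns_ns, period_ns=10.0):
    """Estimate the maximum clock frequency (MHz) from the setup slack.

    ASSUMPTION: the design is constrained to a 10 ns clock period.
    """
    return 1000.0 / (period_ns - wns_ns)


# Post-implementation WNS values (ns) from the Vivado timing summaries.
WNS_NS = {
    "Version 1 (C)": 0.547,
    "Version 2 (SystemC)": 1.331,
    "Version 3 (Verilog)": 0.766,
}

# Truncate to whole MHz, as the table appears to do.
FMAX = {name: int(fmax_mhz(wns)) for name, wns in WNS_NS.items()}
```

This yields 105, 115, and 108 MHz for Versions 1, 2, and 3. All three designs meet the 10 ns constraint with positive slack, so these figures are headroom estimates, not the frequency the board actually ran at.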

### Version 1: C

**DRC Violations**

Summary: 38 warnings, 30 advisories

**Timing**

Worst Negative Slack (WNS): 0.547 ns  
Total Negative Slack (TNS): 0 ns  
Number of Failing Endpoints: 0  
Total Number of Endpoints: 6359

**Utilization**

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 2581        | 53200     | 4.85          |
| LUTRAM   | 76          | 17400     | 0.44          |
| FF       | 3025        | 106400    | 2.84          |
| DSP      | 25          | 220       | 11.36         |
| BUFG     | 1           | 32        | 3.13          |

**Power**

Total On-Chip Power: 1.424 W  
Junction Temperature: 41.4 °C  
Thermal Margin: 43.6 °C (3.6 W)  
Effective θJA: 11.5 °C/W  
Power supplied to off-chip devices: 0 W  
Confidence level: Medium

**Design Runs**

| Name             | Constraints | Status                    | WNS   | TNS    | WHS   | THS    | TPWS  | Total Power | Failed Routes | LUT  | FF   | BRAMs | URAM | DSP | Start            | Elapsed  |
|------------------|-------------|---------------------------|-------|--------|-------|--------|-------|-------------|---------------|------|------|-------|------|-----|------------------|----------|
| synth_1 (active) | constrs_1   | synth_design Complete!    |       |        |       |        |       |             |               |      |      |       |      |     | 5/16/25, 2:13 PM | 00:00:32 |
| impl_1           | constrs_1   | write_bitstream Complete! | 0.547 | 0.0... | 0.012 | 0.0... | 0.000 | 1.424       | 0             | 2581 | 3025 | 0.00  | 0    |     |                  |          |

### Version 2: SystemC

**DRC Violations**

Summary: 12 warnings

**Timing**

Worst Negative Slack (WNS): 1.331 ns  
Total Negative Slack (TNS): 0 ns  
Number of Failing Endpoints: 0  
Total Number of Endpoints: 3084

**Utilization**

| Resource | Utilization (%) |
|----------|-----------------|
| LUT      | 2%              |
| LUTRAM   | 1%              |
| FF       | 1%              |
| DSP      | 2%              |
| BUFG     | 3%              |

**Power**

Total On-Chip Power: 1.404 W  
Junction Temperature: 41.2 °C  
Thermal Margin: 43.8 °C (3.7 W)  
Effective θJA: 11.5 °C/W  
Power supplied to off-chip devices: 0 W  
Confidence level: Medium


**Utilization** (Post-Implementation)

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 1140        | 53200     | 2.14          |
| LUTRAM   | 62          | 17400     | 0.36          |
| FF       | 1432        | 106400    | 1.35          |
| DSP      | 4           | 220       | 1.82          |
| BUFG     | 1           | 32        | 3.13          |


**Design Runs**

| Name             | Constraints | Status                    | WNS   | TNS   | WHS   | THS   | TPWS  | Total Power | Failed Routes | LUT  | FF   | BRAMs | URAM | DSP | Start            | Elapsed  |
|------------------|-------------|---------------------------|-------|-------|-------|-------|-------|-------------|---------------|------|------|-------|------|-----|------------------|----------|
| synth_1 (active) | constrs_1   | synth_design Complete!    |       |       |       |       |       |             |               | 0    | 0    | 0.00  | 0    | 0   | 5/16/25, 9:28 PM | 00:00:33 |
| impl_1           | constrs_1   | write_bitstream Complete! | 1.331 | 0.000 | 0.022 | 0.000 | 0.000 | 1.404       | 0             | 1140 | 1432 | 0.00  | 0    | 4   | 5/16/25, 9:29 PM | 00:02:06 |

### Version 3: Verilog

**DRC Violations**

No DRC violations were found.

**Timing**

Worst Negative Slack (WNS): 0.766 ns  
Total Negative Slack (TNS): 0 ns  
Number of Failing Endpoints: 0  
Total Number of Endpoints: 1801

**Utilization** (Post-Implementation)

| Resource | Utilization (%) |
|----------|-----------------|
| LUT      | 1%              |
| LUTRAM   | 1%              |
| FF       | 1%              |
| BUFG     | 3%              |

**Power**

Total On-Chip Power: 1.4 W  
Junction Temperature: 41.2 °C  
Thermal Margin: 43.8 °C (3.7 W)  
Effective θJA: 11.5 °C/W  
Power supplied to off-chip devices: 0 W  
Confidence level: Medium



**Design Runs**

| Name             | Constraints | Status                    | WNS   | TNS    | WHS   | THS    | TPWS  | Total Power | Failed Routes | LUT | FF  | BRAMs | URAM | DSP | Start            | Elapsed  |
|------------------|-------------|---------------------------|-------|--------|-------|--------|-------|-------------|---------------|-----|-----|-------|------|-----|------------------|----------|
| synth_1 (active) | constrs_1   | synth_design Complete!    |       |        |       |        |       |             |               | 0   | 0   | 0.00  | 0    | 0   | 5/16/25, 3:29 PM | 00:00:34 |
| impl_1           | constrs_1   | write_bitstream Complete! | 0.766 | 0.0... | 0.044 | 0.0... | 0.000 | 1.400       | 0             | 657 | 726 | 0.00  | 0    | 0   | 5/16/25, 3:29 PM | 00:02:04 |

## (V) Reflections

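In this assignment we used HLS to turn C and SystemC into Verilog code, which taught me how a software description can be converted into a hardware description language. For a very complex hardware architecture in the future, such as one related to neural networks, writing it in this software style first and converting it directly to hardware may be a faster way to complete the circuit. We then used Vivado to add and package the IP, draw the block diagram, and finally generate the bitstream. When building the block diagram for the SystemC version, however, it turned out that the AXI GPIO had to be added manually, and that connecting it to RGBYUV requires clicking the + on the AXI GPIO block. It took me quite a while to find this, but in the end I miraculously got it to work.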

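I also found that C, SystemC, and Verilog each synthesize to different results: Verilog uses the fewest DSPs, LUTs, and FFs, followed by SystemC, and C uses the most. Anyone who wants to use HLS may therefore also need to take into account that the synthesized hardware occupies a larger area.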
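To put numbers on this comparison, the short sketch below expresses each version's implemented resource usage relative to the hand-written Verilog version, using the DSP, LUT, and FF counts from the results table in (IV). It is only an illustration of the arithmetic, and the names are made up for this example. The DSP ratio is left undefined because the Verilog version uses no DSP blocks.

```python
# Implemented resource counts from the results table in (IV).
IMPLEMENTED = {
    "C": {"DSP": 25, "LUT": 2581, "FF": 3025},
    "SystemC": {"DSP": 4, "LUT": 1140, "FF": 1432},
    "Verilog": {"DSP": 0, "LUT": 657, "FF": 726},
}


def ratio_to_baseline(version, resource, baseline="Verilog"):
    """Resource usage of `version` as a multiple of the baseline version."""
    base = IMPLEMENTED[baseline][resource]
    if base == 0:
        return None  # ratio is undefined when the baseline uses none
    return round(IMPLEMENTED[version][resource] / base, 2)


LUT_RATIOS = {v: ratio_to_baseline(v, "LUT") for v in IMPLEMENTED}
FF_RATIOS = {v: ratio_to_baseline(v, "FF") for v in IMPLEMENTED}
```

By this measure the C version uses roughly 3.9 times the LUTs and 4.2 times the FFs of the Verilog version, while the SystemC version uses roughly 1.7 times the LUTs and 2.0 times the FFs.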

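We also ran into some problems when implementing on PYNQ: root privileges must be enabled before opening Jupyter, otherwise the code cannot be compiled; after the board is powered on, first check whether the blue and green LEDs blink, otherwise there is a problem with the SD card; and before connecting to Jupyter, the PC's network IP address and subnet mask must be changed. Still, this was my first time doing hardware/software co-design. It felt novel, and I learned how the actual workflow proceeds.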