

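Academic Year 114  
National Sun Yat-sen University  
Hardware Description Language

Homework 2 pipelined add\_sub\_multiplier

Assignment

Instructor: 蕭勝夫

Student ID / Class / Name: M143010068 / Master's Program, Department of Electrical Engineering / 王嘉良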

## ● Multiply\_add: comparison and findings

### ■ power\_report screenshots

#### power report for multiply\_add

```
i - Including register clock pin internal power

Cell Internal Power = 78.2215 uW (58%)
Net Switching Power = 55.7171 uW (42%)
Total Dynamic Power = 133.9386 uW (100%)
Cell Leakage Power = 101.6137 nW

Information: report_power power group summary does not include estimated clock tree power. (PWR-789)

Power Group      Internal Power      Switching Power      Leakage Power      Total Power ( % ) Attrs
io_pad           0.0000             0.0000             0.0000             0.0000 ( 0.00% )
memory          0.0000             0.0000             0.0000             0.0000 ( 0.00% )
black_box        0.0000             0.0000             0.0000             0.0000 ( 0.00% )
clock_network   0.0000             0.0000             0.0000             0.0000 ( 0.00% ) i
register         0.0000             0.0000             0.0000             0.0000 ( 0.00% )
sequential        0.0000             0.0000             0.0000             0.0000 ( 0.00% )
combinational    7.8222e-02       5.5717e-02       101.6118            0.1340 ( 100.00% )

Total           7.8222e-02 mW     5.5717e-02 mW     101.6118 nW     0.1340 mW
```

#### power report for multiply\_add\_common\_factor

```
Attributes
-----
i - Including register clock pin internal power

Cell Internal Power = 56.3956 uW (58%)
Net Switching Power = 41.6600 uW (42%)
Total Dynamic Power = 98.0556 uW (100%)
Cell Leakage Power = 56.0383 nW

Information: report_power power group summary does not include estimated clock tree power. (PWR-789)

Power Group      Internal Power      Switching Power      Leakage Power      Total Power ( % ) Attrs
io_pad           0.0000             0.0000             0.0000             0.0000 ( 0.00% )
memory          0.0000             0.0000             0.0000             0.0000 ( 0.00% )
black_box        0.0000             0.0000             0.0000             0.0000 ( 0.00% )
clock_network   0.0000             0.0000             0.0000             0.0000 ( 0.00% ) i
register         0.0000             0.0000             0.0000             0.0000 ( 0.00% )
sequential        0.0000             0.0000             0.0000             0.0000 ( 0.00% )
combinational    5.6396e-02       4.1660e-02       56.0364            9.8112e-02 ( 100.00% )

Total           5.6396e-02 mW     4.1660e-02 mW     56.0364 nW     9.8112e-02 mW
```

### ■ area\_report screenshots

#### area report for multiply\_add

|                                |                                         |
|--------------------------------|-----------------------------------------|
| Number of ports:               | 152                                     |
| Number of nets:                | 551                                     |
| Number of cells:               | 293                                     |
| Number of combinational cells: | 290                                     |
| Number of sequential cells:    | 0                                       |
| Number of macros/black boxes:  | 0                                       |
| Number of buf/inv:             | 32                                      |
| Number of references:          | 4                                       |
| Combinational area:            | 161.688963                              |
| Buf/Inv area:                  | 4.976640                                |
| Noncombinational area:         | 0.000000                                |
| Macro/Black Box area:          | 0.000000                                |
| Net Interconnect area:         | undefined (Wire load has zero net area) |
| Total cell area:               | 161.688963                              |

#### area report for multiply\_add\_common\_factor

|                                |                                         |
|--------------------------------|-----------------------------------------|
| Number of ports:               | 98                                      |
| Number of nets:                | 313                                     |
| Number of cells:               | 165                                     |
| Number of combinational cells: | 163                                     |
| Number of sequential cells:    | 0                                       |
| Number of macros/black boxes:  | 0                                       |
| Number of buf/inv:             | 17                                      |
| Number of references:          | 3                                       |
| Combinational area:            | 89.942402                               |
| Buf/Inv area:                  | 2.643840                                |
| Noncombinational area:         | 0.000000                                |
| Macro/Black Box area:          | 0.000000                                |
| Net Interconnect area:         | undefined (Wire load has zero net area) |
| Total cell area:               | 89.942402                               |

### ■ timing\_report screenshots

#### timing report for multiply\_add

| Des/Clust/Port                                  | Wire Load Model | Library                        |
|-------------------------------------------------|-----------------|--------------------------------|
| multiply_add                                    | ZeroWireload    | N16ADFP_StdCellss0p72vm40c_ccs |
| Point                                           | Incr            | Path                           |
| input external delay                            | 0.000000        | 0.000000 r                     |
| a[0] (in)                                       | 0.000000        | 0.000000 r                     |
| mult_4/a[0] (multiply_add_DW_mult_uns_2)        | 0.000000        | 0.000000 r                     |
| mult_4/U148/ZN (CKND1BWP16P90LVT)               | 0.013959        | 0.013959 f                     |
| mult_4/U119/ZN (NR2D1BWP16P90LVT)               | 0.016147        | 0.030106 r                     |
| mult_4/U57/CO (HA1D2BWP16P90LVT)                | 0.021239        | 0.051345 r                     |
| mult_4/U55/S (FA1D1BWP16P90LVT)                 | 0.035855        | 0.087200 f                     |
| mult_4/U13/CO (FA1D1BWP16P90LVT)                | 0.027252        | 0.114451 f                     |
| mult_4/U12/CO (FA1D1BWP16P90LVT)                | 0.027623        | 0.142074 f                     |
| mult_4/U11/CO (FA1D1BWP16P90LVT)                | 0.027625        | 0.169699 f                     |
| mult_4/U10/CO (FA1D1BWP16P90LVT)                | 0.027625        | 0.197324 f                     |
| mult_4/U9/CO (FA1D1BWP16P90LVT)                 | 0.027625        | 0.224949 f                     |
| mult_4/U8/CO (FA1D1BWP16P90LVT)                 | 0.027625        | 0.252574 f                     |
| mult_4/U7/CO (FA1D1BWP16P90LVT)                 | 0.027625        | 0.280198 f                     |
| mult_4/U6/CO (FA1D1BWP16P90LVT)                 | 0.027625        | 0.307823 f                     |
| mult_4/U5/CO (FA1D1BWP16P90LVT)                 | 0.027625        | 0.335448 f                     |
| mult_4/U4/CO (FA1D1BWP16P90LVT)                 | 0.027625        | 0.363073 f                     |
| mult_4/U3/CO (FA1D1BWP16P90LVT)                 | 0.027625        | 0.390698 f                     |
| mult_4/U2/S (FA1D1BWP16P90LVT)                  | 0.042923        | 0.433621 r                     |
| mult_4/product[14] (multiply_add_DW_mult_uns_2) | 0.000000        | 0.433621 r                     |
| add_8/A[14] (multiply_add_DW01_add_0)           | 0.000000        | 0.433621 r                     |
| add_8/U1_14/CO (FA1D1BWP16P90LVT)               | 0.020448        | 0.454069 r                     |
| add_8/U1_15/Z (XOR3D2BWP16P90LVT)               | 0.024210        | 0.478279 f                     |
| add_8/SUM[15] (multiply_add_DW01_add_0)         | 0.000000        | 0.478279 f                     |
| d[15] (out)                                     | 0.000000        | 0.478279 f                     |
| data arrival time                               |                 | 0.478279                       |

#### timing report for multiply\_add\_common\_factor

| Des/Clust/Port                                                                            | Wire Load Model           | Library                                     |
|-------------------------------------------------------------------------------------------|---------------------------|---------------------------------------------|
| <code>multiply_add_common_factor</code>                                                   | <code>ZeroWireload</code> | <code>N16ADFP_StdCellss0p72vm40c_ccs</code> |
| Point                                                                                     | Incr                      | Path                                        |
| <code>input external delay</code>                                                         | 0.000000                  | 0.000000 f                                  |
| <code>a[0] (in)</code>                                                                    | 0.000000                  | 0.000000 f                                  |
| <code>add_5/A[0]</code> ( <code>multiply_add_common_factor_DW01_add_0</code> )            | 0.000000                  | 0.000000 f                                  |
| <code>add_5/U1/Z</code> ( <code>AN2D1BWP16P90LVT</code> )                                 | 0.010872                  | 0.010872 f                                  |
| <code>add_5/U1_1/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                              | 0.025990                  | 0.036862 f                                  |
| <code>add_5/U1_2/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                              | 0.027606                  | 0.064468 f                                  |
| <code>add_5/U1_3/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                              | 0.027625                  | 0.092092 f                                  |
| <code>add_5/U1_4/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                              | 0.027625                  | 0.119717 f                                  |
| <code>add_5/U1_5/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                              | 0.027625                  | 0.147342 f                                  |
| <code>add_5/U1_6/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                              | 0.027625                  | 0.174967 f                                  |
| <code>add_5/U1_7/S</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.038623                  | 0.213590 r                                  |
| <code>add_5/SUM[7]</code> ( <code>multiply_add_common_factor_DW01_add_0</code> )          | 0.000000                  | 0.213590 r                                  |
| <code>mult_6/a[7]</code> ( <code>multiply_add_common_factor_DW_mult_uns_1</code> )        | 0.000000                  | 0.213590 r                                  |
| <code>mult_6/U162/ZN</code> ( <code>CKND1BWP16P90LVT</code> )                             | 0.017536                  | 0.231126 f                                  |
| <code>mult_6/U129/ZN</code> ( <code>NR2D1BWP16P90LVT</code> )                             | 0.017100                  | 0.248226 r                                  |
| <code>mult_6/U48/S</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.040168                  | 0.288394 f                                  |
| <code>mult_6/U45/S</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.042152                  | 0.330545 r                                  |
| <code>mult_6/U44/S</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.038510                  | 0.369056 f                                  |
| <code>mult_6/U10/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                              | 0.028640                  | 0.397695 f                                  |
| <code>mult_6/U9/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.027625                  | 0.425320 f                                  |
| <code>mult_6/U8/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.027625                  | 0.452945 f                                  |
| <code>mult_6/U7/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.027625                  | 0.480570 f                                  |
| <code>mult_6/U6/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.027625                  | 0.508195 f                                  |
| <code>mult_6/U5/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.027625                  | 0.535820 f                                  |
| <code>mult_6/U4/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.027625                  | 0.563445 f                                  |
| <code>mult_6/U3/CO</code> ( <code>FA1D1BWP16P90LVT</code> )                               | 0.023398                  | 0.586843 f                                  |
| <code>mult_6/U1/Z</code> ( <code>XOR2D1BWP16P90LVT</code> )                               | 0.019416                  | 0.606258 r                                  |
| <code>mult_6/product[15]</code> ( <code>multiply_add_common_factor_DW_mult_uns_1</code> ) | 0.000000                  | 0.606258 r                                  |
| <code>d[15] (out)</code>                                                                  | 0.000000                  | 0.606258 r                                  |
| <code>data arrival time</code>                                                            |                           | 0.606258                                    |

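### ■ Findings from power\_report

The dynamic power of `multiply_add` and `multiply_add_common_factor` is 133.9386 uW and 98.0556 uW respectively, so the circuit with the common factor extracted consumes about 27% less. I attribute this to the first design needing two multipliers, while the factored design needs only one. From a power standpoint, a simple algebraic restructuring (extracting the common factor) reduces the computation from "two multiplications + one addition" to "one addition + one multiplication", which yields a significant power reduction in hardware.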
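The algebraic restructuring can be sanity-checked with a small software model. The sketch below is only an illustration under stated assumptions, not the synthesized RTL: it assumes 8-bit unsigned operands named `a`, `b`, `c`, a 16-bit result, and that the sum `a + b` keeps its full width before the multiplication. The function names simply mirror the two module names.

```python
import random

MASK16 = 0xFFFF  # assumption: the result d is 16 bits wide


def multiply_add(a: int, b: int, c: int) -> int:
    """Distributed form: two multiplications followed by one addition."""
    return (a * c + b * c) & MASK16


def multiply_add_common_factor(a: int, b: int, c: int) -> int:
    """Factored form: one addition followed by one multiplication."""
    return ((a + b) * c) & MASK16


random.seed(0)  # fixed seed so the run is repeatable
# assumption: each operand is an 8-bit unsigned value (0..255)
samples = [tuple(random.randrange(256) for _ in range(3)) for _ in range(10_000)]
mismatches = sum(
    multiply_add(a, b, c) != multiply_add_common_factor(a, b, c)
    for a, b, c in samples
)
print("mismatches:", mismatches)
```

Under these assumptions the two forms agree for every sample. If the actual RTL truncated the sum to 8 bits before multiplying, they would differ whenever `a + b` exceeds 255, so the operand widths in the real design are worth checking.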

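### ■ Findings from area\_report

The total cell area of `multiply_add_common_factor` is only 89.942402, whereas `multiply_add` reaches 161.688963, so extracting the common factor saves about 44% of the cell area. This experiment shows that multipliers account for a large share of the area. In future designs the cell area can be kept small by using fewer multipliers or by the kind of algebraic restructuring described above.

In addition, `multiply_add` has 293 cells and 551 nets, while `multiply_add_common_factor` has 165 cells and 313 nets. For the same function, the distributed-multiplication design uses roughly 1.8 times as many cells and nets, which is where the extra area goes. The area report therefore also favors extracting the common factor.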
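The relative savings can be recomputed directly from the report figures. The following is a minimal sketch: the numbers are copied from the reports above, while the helper name `saving` and the dictionary layout are illustrative choices.

```python
def saving(before: float, after: float) -> float:
    """Relative reduction, in percent, when going from `before` to `after`."""
    return (before - after) / before * 100.0


# (multiply_add, multiply_add_common_factor), copied from the reports
metrics = {
    "dynamic power (uW)": (133.9386, 98.0556),
    "total cell area": (161.688963, 89.942402),
    "cells": (293, 165),
    "nets": (551, 313),
}

savings = {name: saving(before, after) for name, (before, after) in metrics.items()}
for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% lower with the common factor extracted")
```

Area, cell count, and net count all shrink by a similar fraction, while dynamic power shrinks by a smaller one.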

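### ■ Findings from timing\_report

The data arrival time of multiply\_add is 0.478279 ns, while that of multiply\_add\_common\_factor is 0.606258 ns, about 27% longer. My interpretation is that although the distributed design uses nearly twice the hardware, its two multiplications, the most time-consuming operations, run in parallel. The reports show its critical path passing through one multiplier (`mult_4`) and only the top two bit positions of the adder (`add_8`), whereas in the factored design the carry chain of the adder (`add_5`) and the multiplier (`mult_6`) lie in series on the critical path. So if speed is the only goal and power and area are not a concern, the distributed multiplication is the better choice between these two designs.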
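To make the timing argument concrete, the critical path of each design can be split at the block boundaries listed in the timing reports. This is a rough sketch: the times are copied from the Path column, the unit is assumed to be ns, and the data layout and names (`PATHS`, `block_delays`) are hypothetical.

```python
# Path times at block boundaries, copied from the two timing reports
# (assumed unit: ns). Each entry is (block, time at entry, time at exit).
PATHS = {
    "multiply_add": [
        ("mult_4", 0.000000, 0.433621),
        ("add_8", 0.433621, 0.478279),
    ],
    "multiply_add_common_factor": [
        ("add_5", 0.000000, 0.213590),
        ("mult_6", 0.213590, 0.606258),
    ],
}


def block_delays(path):
    """Delay contributed by each block on the critical path."""
    return {block: round(end - start, 6) for block, start, end in path}


breakdown = {design: block_delays(path) for design, path in PATHS.items()}
arrival = {design: path[-1][2] for design, path in PATHS.items()}
slowdown = (arrival["multiply_add_common_factor"] / arrival["multiply_add"] - 1) * 100

for design, delays in breakdown.items():
    print(design, delays, "arrival:", arrival[design])
print(f"common-factor version is {slowdown:.1f}% slower")
```

In the distributed design the adder adds very little on top of the multiplier, while in the factored design the adder delay is paid in full before the multiplier starts.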

## ● add\_sub: comparison and findings

### ■ power\_report screenshots

#### power\_report for add\_sub\_1

```
Design      Wire Load Model      Library
add_sub_1      ZeroWireload      N16ADFP_StdCellss0p72vm40c_ccs

Global Operating Voltage = 0.72
Power-specific unit information :
  Voltage Units = 1V
  Capacitance Units = 1.000000pf
  Time Units = 1ns
  Dynamic Power Units = 1mW    (derived from V,C,T units)
  Leakage Power Units = 1nW

Attributes
i - Including register clock pin internal power

Cell Internal Power = 8.5821 uW (71%)
Net Switching Power = 3.4411 uW (29%)
Total Dynamic Power = 12.0231 uW (100%)
Cell Leakage Power = 833.3741 pW
Information: report_power power group summary does not include estimated clock tree power. (PWR-789)

          Internal       Switching       Leakage       Total
Power Group    Power        Power        Power        Power (%) Attrs
-----+-----+-----+-----+-----+-----+
io_pad        0.0000      0.0000      0.0000      0.0000 ( 0.00%)
memory        0.0000      0.0000      0.0000      0.0000 ( 0.00%)
black_box     0.0000      0.0000      0.0000      0.0000 ( 0.00%)
clock_network 0.0000      0.0000      0.0000      0.0000 ( 0.00%)
register      0.0000      0.0000      0.0000      0.0000 ( 0.00%)
sequential    0.0000      0.0000      0.0000      0.0000 ( 0.00%)
combinational 8.5821e-03  3.4411e-03  0.8334      1.2024e-02 ( 100.00%)
-----+-----+-----+-----+-----+
Total         8.5821e-03 mW   3.4411e-03 mW   0.8334 nW   1.2024e-02 mW
1
```

#### power\_report for add\_sub\_2

```
Design      Wire Load Model      Library
add_sub_2      ZeroWireload      N16ADFP_StdCellss0p72vm40c_ccs

Global Operating Voltage = 0.72
Power-specific unit information :
  Voltage Units = 1V
  Capacitance Units = 1.000000pf
  Time Units = 1ns
  Dynamic Power Units = 1mW    (derived from V,C,T units)
  Leakage Power Units = 1nW

Attributes
i - Including register clock pin internal power

Cell Internal Power = 6.3748 uW (73%)
Net Switching Power = 2.3995 uW (27%)
Total Dynamic Power = 8.7743 uW (100%)
Cell Leakage Power = 5.0758 nW
Information: report_power power group summary does not include estimated clock tree power. (PWR-789)

          Internal       Switching       Leakage       Total
Power Group    Power        Power        Power        Power (%) Attrs
-----+-----+-----+-----+-----+
io_pad        0.0000      0.0000      0.0000      0.0000 ( 0.00%)
memory        0.0000      0.0000      0.0000      0.0000 ( 0.00%)
black_box     0.0000      0.0000      0.0000      0.0000 ( 0.00%)
clock_network 0.0000      0.0000      0.0000      0.0000 ( 0.00%)
register      0.0000      0.0000      0.0000      0.0000 ( 0.00%)
sequential    0.0000      0.0000      0.0000      0.0000 ( 0.00%)
combinational 6.3748e-03  2.3995e-03  5.0739      8.7793e-03 ( 100.00%)
-----+-----+-----+-----+
Total         6.3748e-03 mW   2.3995e-03 mW   5.0739 nW   8.7793e-03 mW
1
```

#### power\_report for add\_sub\_3

```

Design      Wire Load Model      Library
add_sub_3      ZeroWireload      N16ADFP_StdCellss0p72vm40c_ccs

Global Operating Voltage = 0.72
Power-specific unit information :
  Voltage Units = 1V
  Capacitance Units = 1.000000pf
  Time Units = 1ns
  Dynamic Power Units = 1mW      (derived from V,C,T units)
  Leakage Power Units = 1nW

Attributes
-----
i - Including register clock pin internal power

  Cell Internal Power = 6.3748 uW (73%)
  Net Switching Power = 2.3995 uW (27%)
  Total Dynamic Power = 8.7743 uW (100%)
  Cell Leakage Power = 5.0758 nW

Information: report_power power group summary does not include estimated clock tree power. (PWR-789)

          Internal      Switching      Leakage      Total
Power Group    Power        Power        Power      Power ( % ) Attrs
-----
io_pad        0.0000      0.0000      0.0000      0.0000 ( 0.00% )
memory        0.0000      0.0000      0.0000      0.0000 ( 0.00% )
black_box     0.0000      0.0000      0.0000      0.0000 ( 0.00% )
clock_network 0.0000      0.0000      0.0000      0.0000 ( 0.00% )
register      0.0000      0.0000      0.0000      0.0000 ( 0.00% )
sequential     0.0000      0.0000      0.0000      0.0000 ( 0.00% )
combinational 6.3748e-03  2.3995e-03  5.0739      8.7793e-03 ( 100.00% )

Total          6.3748e-03 mW   2.3995e-03 mW   5.0739 nW   8.7793e-03 mW
1

```

### ■ Area\_report screenshots

#### area\_report for add\_sub\_1

```

Number of ports:                      33
Number of nets:                       73
Number of cells:                      56
Number of combinational cells:        56
Number of sequential cells:           0
Number of macros/black boxes:         0
Number of buf/inv:                   7
Number of references:                 18

Combinational area:                  21.669120
Buf/Inv area:                       1.088640
Noncombinational area:               0.000000
Macro/Black Box area:                0.000000
Net Interconnect area:              undefined (Wire load has zero net area)

Total cell area:                     21.669120
Total area:                          undefined

Hierarchical area distribution
-----

          Global cell area          Local cell area
Hierarchical cell      Absolute Total      Percent Total      Combi- national      Noncombi- national      Black- boxes      Design
-----
add_sub_1             21.6691      100.0
Total                  21.6691      0.0000      0.0000      0.0000      0.0000      add_sub_1

```

#### area\_report for add\_sub\_2

```
Number of ports: 60
Number of nets: 70
Number of cells: 20
Number of combinational cells: 19
Number of sequential cells: 0
Number of macros/black boxes: 0
Number of buf/inv: 1
Number of references: 3

Combinational area: 13.893120
Buf/Inv area: 0.155520
Noncombinational area: 0.000000
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)

Total cell area: 13.893120
Total area: undefined

Hierarchical area distribution
-----
          Global cell area          Local cell area
Hierarchical cell      Absolute    Percent      Combi-   Noncombi- Black-
                           Total      Total      national  national  boxes   Design
-----
add_sub_2            13.8931    100.0     0.4147    0.0000  0.0000 add_sub_2
r368                13.4784    97.0     13.4784   0.0000  0.0000 add_sub_2_DW01_addsub_0
-----
Total               13.8931    0.0000  0.0000  0.0000


```

#### area\_report for add\_sub\_3

```
Number of ports: 60
Number of nets: 70
Number of cells: 20
Number of combinational cells: 19
Number of sequential cells: 0
Number of macros/black boxes: 0
Number of buf/inv: 1
Number of references: 3

Combinational area: 13.893120
Buf/Inv area: 0.155520
Noncombinational area: 0.000000
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)

Total cell area: 13.893120
Total area: undefined

Hierarchical area distribution
-----
          Global cell area          Local cell area
Hierarchical cell      Absolute    Percent      Combi-   Noncombi- Black-
                           Total      Total      national  national  boxes   Design
-----
add_sub_3            13.8931    100.0     0.4147    0.0000  0.0000 add_sub_3
r368                13.4784    97.0     13.4784   0.0000  0.0000 add_sub_3_DW01_addsub_0
-----
Total               13.8931    0.0000  0.0000  0.0000


```

### ■ timing\_report screenshots

#### timing\_report for add\_sub\_1

| Des/Clust/Port             | Wire Load Model | Library                        |
|----------------------------|-----------------|--------------------------------|
| add_sub_1                  | ZeroWireload    | N16ADFP_StdCellss0p72vm40c_ccs |
| Point                      | Incr            | Path                           |
| input external delay       | 0.000000        | 0.000000 f                     |
| s (in)                     | 0.000000        | 0.000000 f                     |
| U63/ZN (CKND1BWP16P90)     | 0.014545        | 0.014545 r                     |
| U57/Z (AN2D1BWP16P90)      | 0.023013        | 0.037558 r                     |
| U56/Z (XOR2D1BWP16P90)     | 0.033288        | 0.070846 f                     |
| U55/ZN (MAOI222D1BWP16P90) | 0.019168        | 0.090015 r                     |
| U54/ZN (CKND1BWP16P90)     | 0.014119        | 0.104133 f                     |
| U51/ZN (OAI21D1BWP16P90)   | 0.014998        | 0.119131 r                     |
| U50/ZN (IOA21D1BWP16P90)   | 0.022337        | 0.141468 f                     |
| U47/Z (OR2D1BWP16P90)      | 0.024948        | 0.166417 f                     |
| U46/ZN (AOI22D1BWP16P90)   | 0.026460        | 0.192877 r                     |
| U43/Z (AN2D1BWP16P90)      | 0.026272        | 0.219149 r                     |
| U41/ZN (OAI22D1BWP16P90)   | 0.024996        | 0.244145 f                     |
| U38/Z (OR2D1BWP16P90)      | 0.026473        | 0.270618 f                     |
| U37/ZN (AOI22D1BWP16P90)   | 0.019741        | 0.290359 r                     |
| U36/ZN (CKND1BWP16P90)     | 0.021998        | 0.312357 f                     |
| U35/Z (OA21D1BWP16P90)     | 0.022696        | 0.335053 f                     |
| U34/ZN (AOI21D1BWP16P90)   | 0.015159        | 0.350212 r                     |
| U32/ZN (MAOI222D1BWP16P90) | 0.023631        | 0.373843 f                     |
| U30/Z (XOR2D1BWP16P90)     | 0.025153        | 0.398995 r                     |
| d[15] (out)                | 0.000000        | 0.398995 r                     |
| data arrival time          |                 | 0.398995                       |

#### timing\_report for add\_sub\_2

| Des/Clust/Port                         | Wire Load Model | Library                        |
|----------------------------------------|-----------------|--------------------------------|
| add_sub_2                              | ZeroWireload    | N16ADFP_StdCellss0p72vm40c_ccs |
| Point                                  | Incr            | Path                           |
| input external delay                   | 0.000000        | 0.000000 f                     |
| s (in)                                 | 0.000000        | 0.000000 f                     |
| U5/ZN (CKND1BWP16P90)                  | 0.023824        | 0.023824 r                     |
| r368/ADD_SUB (add_sub_2_DW01_addsub_0) | 0.000000        | 0.023824 r                     |
| r368/U8/Z (XOR2D1BWP16P90)             | 0.048543        | 0.072367 f                     |
| r368/U1_0/CO (FA1D1BWP16P90LVT)        | 0.030022        | 0.102388 f                     |
| r368/U1_1/CO (FA1D1BWP16P90LVT)        | 0.027724        | 0.130113 f                     |
| r368/U1_2/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.157749 f                     |
| r368/U1_3/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.185385 f                     |
| r368/U1_4/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.213021 f                     |
| r368/U1_5/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.240657 f                     |
| r368/U1_6/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.268293 f                     |
| r368/U1_7/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.295929 f                     |
| r368/U1_8/S (FA1D1BWP16P90LVT)         | 0.036437        | 0.332366 r                     |
| r368/SUM[8] (add_sub_2_DW01_addsub_0)  | 0.000000        | 0.332366 r                     |
| d[15] (out)                            | 0.000000        | 0.332366 r                     |
| data arrival time                      |                 | 0.332366                       |

#### timing\_report for add\_sub\_3

| Des/Clust/Port                         | Wire Load Model | Library                        |
|----------------------------------------|-----------------|--------------------------------|
| add_sub_3                              | ZeroWireload    | N16ADFP_StdCellss0p72vm40c_ccs |
| Point                                  | Incr            | Path                           |
| input external delay                   | 0.000000        | 0.000000 f                     |
| s (in)                                 | 0.000000        | 0.000000 f                     |
| U5/ZN (CKND1BWP16P90)                  | 0.023824        | 0.023824 r                     |
| r368/ADD_SUB (add_sub_3_DW01_addsub_0) | 0.000000        | 0.023824 r                     |
| r368/U8/Z (XOR2D1BWP16P90)             | 0.048543        | 0.072367 f                     |
| r368/U1_0/CO (FA1D1BWP16P90LVT)        | 0.030022        | 0.102388 f                     |
| r368/U1_1/CO (FA1D1BWP16P90LVT)        | 0.027724        | 0.130113 f                     |
| r368/U1_2/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.157749 f                     |
| r368/U1_3/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.185385 f                     |
| r368/U1_4/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.213021 f                     |
| r368/U1_5/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.240657 f                     |
| r368/U1_6/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.268293 f                     |
| r368/U1_7/CO (FA1D1BWP16P90LVT)        | 0.027636        | 0.295929 f                     |
| r368/U1_8/S (FA1D1BWP16P90LVT)         | 0.036437        | 0.332366 r                     |
| r368/SUM[8] (add_sub_3_DW01_addsub_0)  | 0.000000        | 0.332366 r                     |
| d[15] (out)                            | 0.000000        | 0.332366 r                     |
| data arrival time                      |                 | 0.332366                       |

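### ■ Findings from power\_report

In the power reports, the total dynamic power of add\_sub\_1 (12.0231 uW) is about 37% higher than that of add\_sub\_2 and add\_sub\_3, which are exactly identical at 8.7743 uW. I believe that for these two coding styles Design Compiler recognizes the hardware I intended (one adder, one subtractor, one multiplexer) and maps both to the same structure; the area reports show the same `DW01_addsub_0` component in both, which is why their figures match.

As for leakage power, add\_sub\_1 is about six times lower than the other two, at 833.3741 pW versus 5.0758 nW. Area does not explain this, because add\_sub\_1 has the largest total cell area of the three. A more plausible cause, judging from the timing reports, is the cell mix: the add\_sub\_1 critical path uses only standard-Vt cells (`BWP16P90`), whereas add\_sub\_2 and add\_sub\_3 are built around LVT full adders (`FA1D1BWP16P90LVT`), and low-threshold cells leak more.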
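The two power ratios can be checked with a few lines of arithmetic. This is a minimal sketch using the figures copied from the three power reports; the variable names are illustrative, and leakage is converted to nW so that both leakage figures share one unit.

```python
# Figures copied from the power reports.
dynamic_uw = {"add_sub_1": 12.0231, "add_sub_2": 8.7743, "add_sub_3": 8.7743}
# 833.3741 pW = 0.8333741 nW, so all leakage values are in nW here.
leakage_nw = {"add_sub_1": 0.8333741, "add_sub_2": 5.0758, "add_sub_3": 5.0758}

# How much more dynamic power add_sub_1 draws than add_sub_2, in percent.
extra_dynamic = (dynamic_uw["add_sub_1"] / dynamic_uw["add_sub_2"] - 1) * 100

# How many times larger the leakage of add_sub_2 is than that of add_sub_1.
leakage_ratio = leakage_nw["add_sub_2"] / leakage_nw["add_sub_1"]

# Leakage as a share of dynamic power for add_sub_2 (1 uW = 1000 nW).
leakage_share = leakage_nw["add_sub_2"] / (dynamic_uw["add_sub_2"] * 1000) * 100

print(f"add_sub_1 dynamic power: +{extra_dynamic:.1f}%")
print(f"add_sub_2 leakage / add_sub_1 leakage: {leakage_ratio:.2f}x")
print(f"add_sub_2 leakage as share of dynamic power: {leakage_share:.3f}%")
```

Leakage is a very small fraction of the dynamic power in these reports, so the total power ranking of the three designs follows the dynamic power.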

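### ■ Findings from Area\_report

add\_sub\_2 and add\_sub\_3 produce identical figures, while the total cell area of add\_sub\_1 is about 56% larger than that of the other two modules (21.67 versus 13.89). In other words, add\_sub\_2 (`?:` syntax) and add\_sub\_3 (`always` with `if/else`) are identical in both total area and total power.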

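### ■ Findings from Timing\_report

add\_sub\_2 and add\_sub\_3 have the same data arrival time, 0.332366 ns, while add\_sub\_1 takes 0.398995 ns, about 20% longer. add\_sub\_1 is therefore the weakest of the three designs: it has the higher delay, the larger area, and the higher dynamic power.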
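The arrival time in a timing report is the running sum of the Incr column, which makes it easy to see where the delay comes from. The sketch below re-adds the increments of the add\_sub\_2 critical path, copied from its timing report; the list name and the selection of `/CO` pins as the "carry chain" are illustrative choices, and the unit is assumed to be ns.

```python
# Incr column of the add_sub_2 critical path, copied from its timing report
# (assumed unit: ns).
ADD_SUB_2_INCR = [
    ("U5/ZN", 0.023824),
    ("r368/U8/Z", 0.048543),
    ("r368/U1_0/CO", 0.030022),
    ("r368/U1_1/CO", 0.027724),
    ("r368/U1_2/CO", 0.027636),
    ("r368/U1_3/CO", 0.027636),
    ("r368/U1_4/CO", 0.027636),
    ("r368/U1_5/CO", 0.027636),
    ("r368/U1_6/CO", 0.027636),
    ("r368/U1_7/CO", 0.027636),
    ("r368/U1_8/S", 0.036437),
]

# Arrival time is the sum of all increments along the path.
arrival_2 = round(sum(incr for _, incr in ADD_SUB_2_INCR), 6)
# Delay spent rippling through the full-adder carry outputs.
carry_chain = round(sum(incr for pin, incr in ADD_SUB_2_INCR if pin.endswith("/CO")), 6)

arrival_1 = 0.398995  # data arrival time of add_sub_1, from its report
extra_delay = (arrival_1 / arrival_2 - 1) * 100

print("add_sub_2 arrival:", arrival_2)
print(f"carry chain share: {carry_chain / arrival_2 * 100:.0f}%")
print(f"add_sub_1 is {extra_delay:.1f}% slower")
```

Most of the add\_sub\_2 delay sits in the carry outputs of the full adders, which is consistent with a ripple-style adder/subtractor.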

## ● Non\_pipe: waveforms and findings

### Pre-sim



### Gate-level-sim



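### Observations and findings

In my waveforms, the first figure shows the case $c = 0$ and the second the case $c \neq 0$. In both, **expected\_d** and **d\_nonpipe** carry the same values, which means the circuit computes correctly. The reports follow; findings across the different circuits are discussed at the end.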

## Timing report

```
*****
Report : timing
  -path full
  -delay max
  -max_paths 1
Design : hw2_nonpipe
Version: T-2022.03
Date : Fri Oct 24 19:16:58 2025
*****
Operating Conditions: ss0p72vm40c Library: N16ADFP_StdCellss0p72vm40c_ccs
Wire Load Model Mode: top
Startpoint: s (input port)
Endpoint: d[15] (output port)
Path Group: (none)
Path Type: max
Des/Clust/Port      Wire Load Model      Library
hw2_nonpipe        ZeroWireload        N16ADFP_StdCellss0p72vm40c_ccs
Point                Incr      Path
-----  

input external delay          0.000000  0.000000 f
s (in)                      0.000000  0.000000 f
U5/ZN (CKND1BWP16P90)        0.025997  0.025997 r
r373/ADD_SUB (hw2_nonpipe_DW01_addsub_0) 0.000000  0.025997 r
r373/U9/Z (XOR2D1BWP16P90)   0.049652  0.075649 f
r373/U1_0/CO (FA1D1BWP16P90LVT) 0.030029  0.105678 f
r373/U1_1/CO (FA1D1BWP16P90LVT) 0.027740  0.133417 f
r373/U1_2/CO (FA1D1BWP16P90LVT) 0.027636  0.161053 f
r373/U1_3/CO (FA1D1BWP16P90LVT) 0.027636  0.188689 f
r373/U1_4/CO (FA1D1BWP16P90LVT) 0.027636  0.216326 f
r373/U1_5/CO (FA1D1BWP16P90LVT) 0.027636  0.243962 f
r373/U1_6/CO (FA1D1BWP16P90LVT) 0.027636  0.271598 f
r373/U1_7/S (FA1D1BWP16P90LVT) 0.038634  0.310232 r
r373/SUM[7] (hw2_nonpipe_DW01_addsub_0) 0.000000  0.310232 r
mult_6/a[7] (hw2_nonpipe_DW_mult_uns_1) 0.000000  0.310232 r
mult_6/U163/ZN (CKND1BWP16P90LVT) 0.017572  0.327804 f
mult_6/U129/ZN (NR2D1BWP16P90LVT) 0.017103  0.344907 r
mult_6/U48/S (FA1D1BWP16P90LVT) 0.040168  0.385074 f
mult_6/U45/S (FA1D1BWP16P90LVT) 0.042152  0.427226 r
mult_6/U44/S (FA1D1BWP16P90LVT) 0.038510  0.465736 f
mult_6/U10/CO (FA1D1BWP16P90LVT) 0.028640  0.494376 f
mult_6/U9/CO (FA1D1BWP16P90LVT) 0.027625  0.522001 f
mult_6/U8/CO (FA1D1BWP16P90LVT) 0.027625  0.549626 f
mult_6/U7/CO (FA1D1BWP16P90LVT) 0.027625  0.577251 f
mult_6/U6/CO (FA1D1BWP16P90LVT) 0.027625  0.604876 f
mult_6/U5/CO (FA1D1BWP16P90LVT) 0.027625  0.632501 f
mult_6/U4/CO (FA1D1BWP16P90LVT) 0.027625  0.660126 f
mult_6/U3/CO (FA1D1BWP16P90LVT) 0.023398  0.683523 f
mult_6/U1/Z (XOR2D1BWP16P90LVT) 0.019416  0.702939 r
mult_6/product[15] (hw2_nonpipe_DW_mult_uns_1) 0.000000  0.702939 r
d[15] (out)                  0.000000  0.702939 r
data arrival time            0.000000  0.702939  

-----  

(Path is unconstrained)
```

## Area report

```
****
****
Report : area
Design : hw2_nonpipe
Version: T-2022.03
Date : Fri Oct 24 19:16:58 2025
****

Library(s) Used:
     N16ADFP_StdCellss0p72vm40c_ccs (File: /cad/CBDK/ADFP/Executable_Package/Collaterals/IP/stdcell/N16ADFP_St

Number of ports:           100
Number of nets:            325
Number of cells:           174
Number of combinational cells: 172
Number of sequential cells: 0
Number of macros/black boxes: 0
Number of buf/inv:          18
Number of references:       4

Combinational area:        95.022722
Buf/Inv area:              2.799360
Noncombinational area:     0.000000
Macro/Black Box area:      0.000000
Net Interconnect area:    undefined (Wire load has zero net area)

Total cell area:           95.022722
Total area:                 undefined

Hierarchical area distribution
-----

                           Global cell area     Local cell area
                           -----                -----
Hierarchical cell          Absolute  Percent  Combi-    Noncombi-  Black-
                           Total     Total    national  national   boxes    Design

hw2_nonpipe                95.0227   100.0    0.4147    0.0000     0.0000   hw2_nonpipe
mult_6                     81.6480   85.9     81.6480   0.0000     0.0000   hw2_nonpipe_DW_mult_uns_1
r373                       12.9600   13.6     12.9600   0.0000     0.0000   hw2_nonpipe_DW01_addsub_0

Total                      95.0227  0.0000   0.0000
```

## Power report

```
****
Report : power
  -analysis_effort low
Design : hw2_nonpipe
Version: T-2022.03
Date  : Fri Oct 24 19:17:13 2025
****

Library(s) Used:

    N16ADFP_StdCellss0p72vm40c_ccs (File: /cad/CBDK/ADFP/Executable_Package/Collaterals/IP/stdcell/N1

Operating Conditions: ss0p72vm40c  Library: N16ADFP_StdCellss0p72vm40c_ccs
Wire Load Model Mode: top

Design           Wire Load Model      Library
-----
hw2_nonpipe      ZeroWireload         N16ADFP_StdCellss0p72vm40c_ccs

Global Operating Voltage = 0.72
Power-specific unit information :
  Voltage Units = 1V
  Capacitance Units = 1.000000pf
  Time Units = 1ns
  Dynamic Power Units = 1mW    (derived from V,C,T units)
  Leakage Power Units = 1nW

Attributes
-----
i - Including register clock pin internal power

Cell Internal Power = 61.0430 uW (57%)
Net Switching Power = 45.6337 uW (43%)
Total Dynamic Power = 106.6767 uW (100%)
Cell Leakage Power = 56.4581 nW

Information: report_power power group summary does not include estimated clock tree power. (PWR-789)

Power Group      Internal Power    Switching Power   Leakage Power   Total Power ( % )       Attrs
-----
io_pad           0.0000            0.0000            0.0000          0.0000 ( 0.00% )
memory           0.0000            0.0000            0.0000          0.0000 ( 0.00% )
black_box        0.0000            0.0000            0.0000          0.0000 ( 0.00% )
clock_network    0.0000            0.0000            0.0000          0.0000 ( 0.00% )
register         0.0000            0.0000            0.0000          0.0000 ( 0.00% )
sequential       0.0000            0.0000            0.0000          0.0000 ( 0.00% )
combinational    6.1043e-02        4.5634e-02        56.4562         0.1067 ( 100.00% )

Total            6.1043e-02 mW     4.5634e-02 mW     56.4562 nW      0.1067 mW

1
```

## ● Pipelined: waveforms and findings

### Pre-sim



### Gate-level-sim



## Observations and findings

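In this experiment, I found that the biggest difference between the pipelined and non-pipelined circuits lies in the delay relationship between the computation stages. The non-pipelined circuit completes the whole computation between input and output within a single clock cycle, so in simulation the correct output can be observed right after a set of inputs is applied. In the pipelined architecture the data passes through register stages (`u_stage1` and `u_stage2` in this design), each stage performs only part of the computation, and the complete result appears a few clock cycles after the inputs are applied.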
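The latency difference can be illustrated with a small behavioral model. This is only a sketch, not the submitted RTL: the class name `TwoStagePipe` is hypothetical, the stage split (add/sub in stage 1, multiply in stage 2) is assumed from the register names `sum_pipe_out_reg`, `c_pipe_out_reg`, and `result_reg` in the reports, and the polarity `s=1` add / `s=0` subtract is inferred from the logged test vectors.

```python
class TwoStagePipe:
    """Behavioral sketch of a two-stage add/sub -> multiply pipeline."""

    def __init__(self):
        self.sum_reg = 0     # stage-1 register: 9-bit a +/- b (assumed width)
        self.c_reg = 0       # stage-1 register: c delayed by one cycle
        self.result_reg = 0  # stage-2 register: 16-bit product

    def tick(self, a, b, c, s):
        """Advance one rising clock edge and return the registered output."""
        # Stage 2 multiplies the values captured on the previous edge.
        next_result = (self.sum_reg * self.c_reg) & 0xFFFF
        # Stage 1 captures the new inputs (assumed: s=1 adds, s=0 subtracts).
        next_sum = ((a + b) if s else (a - b)) & 0x1FF
        self.sum_reg, self.c_reg, self.result_reg = next_sum, c, next_result
        return self.result_reg


pipe = TwoStagePipe()
stimuli = [(0x89, 0x6F, 0xAD, 1), (0x68, 0x25, 0xAB, 1), (0, 0, 0, 0), (0, 0, 0, 0)]
outputs = [pipe.tick(*vec) for vec in stimuli]
print([f"{d:04x}" for d in outputs])  # ['0000', 'a798', '5e2f', '0000']
```

The result for the first input vector appears only after the second clock edge, which is the behavior a pipelined testbench has to account for when comparing expected and actual values.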

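I also observed that with unsigned arithmetic in Verilog, a subtraction whose true result is negative does not produce a negative number. Instead the result wraps around modulo $2^n$ for an $n$-bit result (modulo 256 for 8 bits; the 9-bit `SUM[8:0]` in this design wraps modulo 512). If the testbench does not model this the same way, the expected values in simulation will not match the actual outputs. Comparing the two circuits, the pipelined one is somewhat more challenging than the non-pipelined one, with more details to watch and consider. The report screenshots follow.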
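A testbench-style reference model makes the wrap-around explicit. The sketch below is an assumption-laden illustration rather than the actual testbench: the function name `expected_d` is hypothetical, the 9-bit intermediate width is taken from the `SUM[8]` pin in the timing reports, and the vectors are copied from the pre-sim log in this report.

```python
def expected_d(a: int, b: int, c: int, s: int) -> int:
    """Expected 16-bit output for 8-bit unsigned inputs a, b, c.

    Assumed behavior: s=1 adds, s=0 subtracts, the intermediate result is
    kept to 9 bits (wraps modulo 512), and the product is kept to 16 bits.
    """
    sum9 = ((a + b) if s else (a - b)) & 0x1FF  # Python's & wraps negatives
    return (sum9 * c) & 0xFFFF


# (a, b, c, s, expected D) taken from the pre-sim log
vectors = [
    (0xC3, 0xEB, 0xF2, 0, 0xBE30),
    (0x89, 0x6F, 0xAD, 1, 0xA798),
    (0x04, 0x57, 0x32, 0, 0x53CA),
    (0x79, 0xE4, 0xAA, 0, 0x0CF2),
]
for a, b, c, s, want in vectors:
    got = expected_d(a, b, c, s)
    assert got == want, f"a={a:02x} b={b:02x} c={c:02x} s={s}: {got:04x} != {want:04x}"
print("all logged vectors match")
```

For example, `a=04, b=57, s=0` gives 4 - 87 = -83, which wraps to 429 (0x1AD); multiplying by 0x32 yields 0x53CA, the value in the log. An 8-bit wrap would give 0x21CA instead.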

## Timing report

```

Design : hw2_pipe
Version: T-2022.03
Date   : Thu Oct 30 15:15:24 2025
****

Operating Conditions: ss0p72vm40c   Library: N16ADFP_StdCellss0p72vm40c_ccs
Wire Load Model Mode: top

  Startpoint: u_stage1/c_pipe_out_reg_1_
              (rising edge-triggered flip-flop clocked by clk)
  Endpoint: u_stage2/result_reg_15_
            (rising edge-triggered flip-flop clocked by clk)
  Path Group: clk
  Path Type: max

  Des/Clust/Port     Wire Load Model       Library
  -----
  hw2_pipe           ZeroWireload          N16ADFP_StdCellss0p72vm40c_ccs

  Point                                                     Incr      Path
  -----
  clock clk (rise edge)                                     0.000000  0.000000
  clock network delay (ideal)                               0.000000  0.000000
  u_stage1/c_pipe_out_reg_1_/CP (DFCNQD1BWP16P90LVT)        0.000000  0.000000 r
  u_stage1/c_pipe_out_reg_1_/Q (DFCNQD1BWP16P90LVT)         0.032969  0.032969 r
  u_stage1/c_pipe_out[1] (hw2_pipe_stage1)                  0.000000  0.032969 r
  u_stage2/c[1] (hw2_pipe_stage2)                           0.000000  0.032969 r
  u_stage2/mult_11/b[1] (hw2_pipe_stage2_DW_mult_uns_0)     0.000000  0.032969 r
  u_stage2/mult_11/U161/ZN (CKND1BWP16P90LVT)               0.017223  0.050192 f
  u_stage2/mult_11/U185/ZN (NR2D1BWP16P90LVT)               0.016380  0.066572 r
  u_stage2/mult_11/U196/CO (HA1D1BWP16P90LVT)               0.019320  0.085893 r
  u_stage2/mult_11/U62/S (FA1D1BWP16P90LVT)                 0.036663  0.122556 f
  u_stage2/mult_11/U193/CO (FA1D1BWP16P90LVT)               0.027262  0.149818 f
  u_stage2/mult_11/U192/CO (FA1D1BWP16P90LVT)               0.027623  0.177441 f
  u_stage2/mult_11/U162/CO (FA1D1BWP16P90LVT)               0.027625  0.205066 f
  u_stage2/mult_11/U171/CO (FA1D1BWP16P90LVT)               0.027625  0.232691 f
  u_stage2/mult_11/U168/CO (FA1D1BWP16P90LVT)               0.028302  0.260993 f
  u_stage2/mult_11/U159/CO (FA1D2BWP16P90LVT)               0.023546  0.284539 f
  u_stage2/mult_11/U8/CO (FA1D1BWP16P90LVT)                 0.026345  0.310884 f
  u_stage2/mult_11/U7/CO (FA1D1BWP16P90LVT)                 0.027618  0.338502 f
  u_stage2/mult_11/U6/CO (FA1D1BWP16P90LVT)                 0.027625  0.366126 f
  u_stage2/mult_11/U157/CO (FA1D1BWP16P90LVT)               0.027625  0.393751 f
  u_stage2/mult_11/U163/CO (FA1D1BWP16P90LVT)               0.027625  0.421376 f
  u_stage2/mult_11/U191/CO (FA1D1BWP16P90LVT)               0.023179  0.444556 f
  u_stage2/mult_11/U201/Z (XOR3D1BWP16P90LVT)               0.040737  0.485292 r
  u_stage2/mult_11/product[15] (hw2_pipe_stage2_DW_mult_uns_0) 0.000000  0.485292 r
  u_stage2/result_reg_15_/D (DFCNQD2BWP16P90LVT)            0.000000  0.485292 r
  data arrival time                                                   0.485292

  clock clk (rise edge)                                     0.502081  0.502081
  clock network delay (ideal)                               0.000000  0.502081
  u_stage2/result_reg_15_/CP (DFCNQD2BWP16P90LVT)           0.000000  0.502081 r
  library setup time                                       -0.014647  0.487434
  data required time                                                  0.487434
  -----
  data required time                                                  0.487434
  data arrival time                                                  -0.485292
  -----
  slack (MET)                                                         0.002141

```

## Area report

```

****
Report : area
Design : hw2_pipe
Version: T-2022.03
Date   : Thu Oct 30 15:15:24 2025
****

Library(s) Used:
    N16ADFP_StdCellss0p72vm40c_ccs (File: /cad/CBDK/ADFP/Executable_Package/Collaterals/IP/stdcell/N16ADFP_StdCell/

Number of ports:                181
Number of nets:                 456
Number of cells:                228
Number of combinational cells:  191
Number of sequential cells:     33
Number of macros/black boxes:   0
Number of buf/inv:              28
Number of references:           3

Combinational area:        98.858882
Buf/Inv area:              4.872960
Noncombinational area:     37.635841
Macro/Black Box area:      0.000000
Net Interconnect area:     undefined (Wire load has zero net area)

Total cell area:           136.494723
Total area:                undefined

Hierarchical area distribution
------------------------------

                    Global cell area     Local cell area
                    -------------------  ------------------------------
Hierarchical cell   Absolute   Percent   Combi-     Noncombi-  Black-
                    Total      Total     national   national   boxes    Design
------------------  ---------  -------   ---------  ---------  -------  -----------------------------
hw2_pipe             136.4947    100.0      1.6589     0.0000   0.0000  hw2_pipe
u_stage1              32.9184     24.1      0.5702    19.3882   0.0000  hw2_pipe_stage1
u_stage1/r368         12.9600      9.5     12.9600     0.0000   0.0000  hw2_pipe_stage1_DW01_addsub_0
u_stage2             101.9174     74.7      0.1555    18.2477   0.0000  hw2_pipe_stage2
u_stage2/mult_11      83.5142     61.2     83.5142     0.0000   0.0000  hw2_pipe_stage2_DW_mult_uns_0
------------------  ---------  -------   ---------  ---------  -------  -----------------------------
Total                                       98.8589    37.6358   0.0000

```

## Power report

```
*****
Report : power
  -analysis_effort low
Design : hw2_pipe
Version: T-2022.03
Date  : Thu Oct 30 15:15:38 2025
*****


Library(s) Used:
  N16ADFP_StdCellss0p72vm40c_ccs (File: /cad/CBDK/ADFP/Executable_Package/Collaterals/IP/stdcell/N16ADFP_StdCellss0p72vm40c_ccs)

Operating Conditions: ss0p72vm40c Library: N16ADFP_StdCellss0p72vm40c_ccs
Wire Load Model Mode: top

Design      Wire Load Model      Library
hw2_pipe     ZeroWireload      N16ADFP_StdCellss0p72vm40c_ccs

Global Operating Voltage = 0.72
Power-specific unit information :
  Voltage Units = 1V
  Capacitance Units = 1.000000pf
  Time Units = 1ns
  Dynamic Power Units = 1mW   (derived from V,C,T units)
  Leakage Power Units = 1nW

Attributes
-----
i - Including register clock pin internal power

Cell Internal Power = 243.6456 uW (83%)
Net Switching Power = 50.3021 uW (17%)
Total Dynamic Power = 293.9477 uW (100%)
Cell Leakage Power = 71.8967 nW



Power Group      Internal Power    Switching Power   Leakage Power   Total Power ( % )       Attrs
-----
io_pad           0.0000            0.0000            0.0000          0.0000 ( 0.00% )
memory           0.0000            0.0000            0.0000          0.0000 ( 0.00% )
black_box        0.0000            0.0000            0.0000          0.0000 ( 0.00% )
clock_network    0.1620            0.0000            0.0000          0.0000 ( 0.00% )        i
register         2.2950e-02        1.1243e-03        23.2573         0.1861 ( 63.30% )
sequential       0.0000            0.0000            0.0000          0.0000 ( 0.00% )
combinational    5.8671e-02        4.9178e-02        48.6375         0.1079 ( 36.70% )

Total            0.2436 mW         5.0302e-02 mW     71.8948 nW      0.2940 mW

1


```

## ● Clock gating: waveforms and findings

### pre-sim





### Gate-level-sim



## Observations and findings

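When designing the clock gating, I set the enable signal to $|c$ (the reduction OR of $c$). The waveform shows that when $|c = 1$, that is, when $c \neq 0$, the clock passes through and the value is sent on to the second stage. When $c = 0$, the value is not passed to the next stage, which saves power. This can be seen and checked in the reports that follow.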
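The enable condition can be sketched behaviorally as follows. This is a minimal illustration under stated assumptions: `gate_enable` and `GatedRegister` are hypothetical names, `c` is taken to be 8 bits wide, and the model does not say which registers the actual design gates or how the gating cell is built.

```python
def gate_enable(c: int) -> int:
    """Reduction OR of an 8-bit value, like Verilog's |c (assumed width)."""
    return int((c & 0xFF) != 0)


class GatedRegister:
    """Register that only sees a clock edge while the enable is high."""

    def __init__(self):
        self.q = 0
        self.clocked_edges = 0  # edges that actually reached the register

    def rising_edge(self, d: int, enable: int) -> int:
        if enable:
            self.q = d
            self.clocked_edges += 1
        return self.q  # value is held when the clock is gated off


reg = GatedRegister()
c_values = [0x03, 0x00, 0x00, 0x05]
held = [reg.rising_edge(c, gate_enable(c)) for c in c_values]
print(held, reg.clocked_edges)  # [3, 3, 3, 5] 2

# Fraction of all 8-bit values of c for which the clock is NOT gated.
enabled_fraction = sum(gate_enable(c) for c in range(256)) / 256
print(enabled_fraction)
```

Only one of the 256 possible values of an 8-bit `c` disables the clock, so how many clock edges are actually suppressed depends strongly on how often the stimulus drives `c = 0`.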

## Timing report

```

*****
Report : timing
  -path full
  -delay max
  -max_paths 1
Design : hw2_clockgating
Version: T-2022.03
Date  : Fri Oct 24 21:58:19 2025
*****

Operating Conditions: ss0p72vm40c Library: N16ADFP_StdCellss0p72vm40c_ccs
Wire Load Model Mode: top

  Startpoint: u_stage1/c_pipe_out_reg_4_
              (rising edge-triggered flip-flop clocked by clk)
  Endpoint: u_stage2/result_reg_15_
            (rising edge-triggered flip-flop clocked by clk)
  Path Group: clk
  Path Type: max

  Des/Clust/Port       Wire Load Model      Library
  -----
  hw2_clockgating      ZeroWireload         N16ADFP_StdCellss0p72vm40c_ccs

  Point                                                     Incr      Path
  -----
  clock clk (rise edge)                                     0.000000  0.000000
  clock network delay (ideal)                               0.000000  0.000000
  u_stage1/c_pipe_out_reg_4_/CP (DFCNQD2BWP16P90LVT)        0.000000  0.000000 r
  u_stage1/c_pipe_out_reg_4_/Q (DFCNQD2BWP16P90LVT)         0.035995  0.035995 r
  u_stage1/c_pipe_out[4] (hw2_clockgating_stage1)           0.000000  0.035995 r
  u_stage2/c[4] (hw2_clockgating_stage2)                    0.000000  0.035995 r
  u_stage2/mult_11/b[4] (hw2_clockgating_stage2_DW_mult_uns_1)
                                                            0.000000  0.035995 r
  u_stage2/mult_11/U313/ZN (CKND1BWP16P90LVT)               0.017470  0.053465 f
  u_stage2/mult_11/U221/ZN (NR2D1BWP16P90LVT)               0.016621  0.070086 r
  u_stage2/mult_11/U107/S (HA1D2BWP16P90LVT)                0.025576  0.095662 f
  u_stage2/mult_11/U105/S (FA1D1BWP16P90LVT)                0.039975  0.135637 r
  u_stage2/mult_11/U104/S (FA1D1BWP16P90LVT)                0.036698  0.172335 f
  u_stage2/mult_11/U211/Z (OR2D1BWP16P90LVT)                0.015246  0.187581 f
  u_stage2/mult_11/U209/ZN (IOA21D1BWP16P90LVT)             0.015210  0.202791 f
  u_stage2/mult_11/U206/ZN (AOI21D1BWP16P90LVT)             0.013888  0.216679 r
  u_stage2/mult_11/U19/ZN (OAI21D1BWP16P90LVT)              0.012535  0.229214 f
  u_stage2/mult_11/U208/ZN (IOA21D1BWP16P90LVT)             0.017662  0.246875 f
  u_stage2/mult_11/U8/CO (FA1D1BWP16P90LVT)                 0.027383  0.274258 f
  u_stage2/mult_11/U7/CO (FA1D1BWP16P90LVT)                 0.027622  0.301880 f
  u_stage2/mult_11/U6/CO (FA1D1BWP16P90LVT)                 0.027625  0.329505 f
  u_stage2/mult_11/U5/CO (FA1D1BWP16P90LVT)                 0.027625  0.357130 f
  u_stage2/mult_11/U4/CO (FA1D1BWP16P90LVT)                 0.027625  0.384754 f
  u_stage2/mult_11/U3/CO (FA1D1BWP16P90LVT)                 0.026767  0.411521 f
  u_stage2/mult_11/U222/Z (XOR2D1BWP16P90LVT)               0.017583  0.429104 r
  u_stage2/mult_11/product[15] (hw2_clockgating_stage2_DW_mult_uns_1) 0.000000  0.429104 r
  u_stage2/result_reg_15_/D (DFCNQD2BWP16P90LVT)            0.000000  0.441278 r
  data arrival time                                         0.000000  0.441278

  clock clk (rise edge)                                     0.485214  0.485214
  clock network delay (ideal)                               0.000000  0.485214
  u_stage2/result_reg_15_/CP (DFCNQD2BWP16P90LVT)           0.000000  0.485214 r
  library setup time                                       -0.014301  0.470913
  data required time                                                  0.470913
  -----
  data required time                                                  0.470913
  data arrival time                                                  -0.441278
  -----
  slack (MET)                                                         0.029636
```

## Area report

```
*****
Report : area
Design : hw2_clockgating
Version: T-2022.03
Date  : Fri Oct 24 21:58:19 2025
*****

Library(s) Used:
  N16ADFP_StdCellss0p72vm40c_ccs (File: /cad/CBDK/ADFP/Executable_Package/Collaterals/IP/stdcell/N16ADFP_StdCell/CCS/N

Number of ports: 181
Number of nets: 526
Number of cells: 304
Number of combinational cells: 267
Number of sequential cells: 33
Number of macros/black boxes: 0
Number of buf/inv: 44
Number of references: 2

Combinational area: 114.566402
Buf/Inv area: 7.153920
Noncombinational area: 37.635841
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)

Total cell area: 152.202244
Total area: undefined

Hierarchical area distribution
------------------------------

                    Global cell area     Local cell area
                    -------------------  ------------------------------
Hierarchical cell   Absolute   Percent   Combi-     Noncombi-  Black-
                    Total      Total     national   national   boxes    Design
------------------  ---------  -------   ---------  ---------  -------  ------------------------------------
hw2_clockgating      152.2022    100.0      0.0000     0.0000   0.0000  hw2_clockgating
u_stage1              43.6493     28.7     11.3011    19.3882   0.0000  hw2_clockgating_stage1
u_stage1/r368         12.9600      8.5     12.9600     0.0000   0.0000  hw2_clockgating_stage1_DW01_addsub_0
u_stage2             108.5530     71.3      5.1322    18.2477   0.0000  hw2_clockgating_stage2
u_stage2/mult_11      85.1731     56.0     85.1731     0.0000   0.0000  hw2_clockgating_stage2_DW_mult_uns_1
------------------  ---------  -------   ---------  ---------  -------  ------------------------------------
Total                                      114.5664    37.6358   0.0000
```

## Power report

```
*****
Report : power
  -analysis_effort low
Design : hw2_clockgating
Version: T-2022.03
Date  : Fri Oct 24 21:58:31 2025
*****


Library(s) Used:
  N16ADFP_StdCellss0p72vm40c_ccs (File: /cad/CBDK/ADFP/Executable_Package/Collaterals/IP/stdcell/N16

Operating Conditions: ss0p72vm40c Library: N16ADFP_StdCellss0p72vm40c_ccs
Wire Load Model Mode: top

Design      Wire Load Model      Library
hw2_clockgating      ZeroWireload      N16ADFP_StdCellss0p72vm40c_ccs

Global Operating Voltage = 0.72
Power-specific unit information :
  Voltage Units = 1V
  Capacitance Units = 1.000000pf
  Time Units = 1ns
  Dynamic Power Units = 1mW   (derived from V,C,T units)
  Leakage Power Units = 1nW

Attributes
-----
i - Including register clock pin internal power

  Cell Internal Power = 263.1686 uW (81%)
  Net Switching Power = 61.6695 uW (19%)
  -----
  Total Dynamic Power = 324.8381 uW (100%)
  Cell Leakage Power = 94.2606 nW

  Power Group      Internal Power      Switching Power      Leakage Power      Total Power ( % ) Attrs
  -----
  io_pad          0.0000            0.0000            0.0000            0.0000 ( 0.00% )
  memory          0.0000            0.0000            0.0000            0.0000 ( 0.00% )
  black_box        0.0000            0.0000            0.0000            0.0000 ( 0.00% )
  clock_network    0.1677            0.0000            0.0000            0.0000 ( 0.00% )
  register         2.3721e-02       2.6063e-03       23.4159           0.1941 ( 59.73% ) i
  sequential        0.0000            0.0000            0.0000            0.0000 ( 0.00% )
  combinational    7.1706e-02       5.9063e-02       70.8428           0.1308 ( 40.27% )

  Total           0.2632 mW       6.1669e-02 mW       94.2587 nW       0.3249 mW
```

## Timing measured inside each of the two pipeline stages

Stage 1 (adder/subtractor) path. Design: hw2_pipe, wire load model: ZeroWireload, library: N16ADFP_StdCellss0p72vm40c_ccs.

| Point                                                 | Incr    | Path     |
|-------------------------------------------------------|---------|----------|
| clock clk (rise edge)                                 | 0.0000  | 0.0000   |
| clock network delay (ideal)                           | 0.0000  | 0.0000   |
| input external delay                                  | 0.0000  | 0.0000 f |
| s (in)                                                | 0.0000  | 0.0000 f |
| u_stage1/s (hw2_pipe_stage1)                          | 0.0000  | 0.0000 f |
| u_stage1/U4/ZN (CKND1BWP16P90LVT)                     | 0.0188  | 0.0188 r |
| u_stage1/r368/ADD_SUB (hw2_pipe_stage1_DW01_addsub_0) | 0.0000  | 0.0188 r |
| u_stage1/r368/U9/Z (XOR2D1BWP16P90)                   | 0.0461  | 0.0650 f |
| u_stage1/r368/U1_0/CO (FA1D1BWP16P90LVT)              | 0.0300  | 0.0950 f |
| u_stage1/r368/U1_1/CO (FA1D1BWP16P90LVT)              | 0.0277  | 0.1226 f |
| u_stage1/r368/U1_2/CO (FA1D1BWP16P90LVT)              | 0.0276  | 0.1503 f |
| u_stage1/r368/U1_3/CO (FA1D1BWP16P90LVT)              | 0.0276  | 0.1779 f |
| u_stage1/r368/U1_4/CO (FA1D1BWP16P90LVT)              | 0.0276  | 0.2056 f |
| u_stage1/r368/U1_5/CO (FA1D1BWP16P90LVT)              | 0.0276  | 0.2332 f |
| u_stage1/r368/U1_6/CO (FA1D1BWP16P90LVT)              | 0.0276  | 0.2608 f |
| u_stage1/r368/U1_7/CO (FA1D1BWP16P90LVT)              | 0.0234  | 0.2842 f |
| u_stage1/r368/U17/Z (XOR2D1BWP16P90LVT)               | 0.0226  | 0.3068 r |
| u_stage1/r368/SUM[8] (hw2_pipe_stage1_DW01_addsub_0)  | 0.0000  | 0.3068 r |
| u_stage1/sum_pipe_out_reg_8_/D (DFCNQD2BWP16P90LVT)   | 0.0000  | 0.3068 r |
| data arrival time                                     |         | 0.3068   |
| clock clk (rise edge)                                 | 0.5021  | 0.5021   |
| clock network delay (ideal)                           | 0.0000  | 0.5021   |
| u_stage1/sum_pipe_out_reg_8_/CP (DFCNQD2BWP16P90LVT)  | 0.0000  | 0.5021 r |
| library setup time                                    | -0.0145 | 0.4876   |
| data required time                                    |         | 0.4876   |
| data required time                                    |         | 0.4876   |
| data arrival time                                     |         | -0.3068  |
| slack (MET)                                           |         | 0.1808   |

Stage 2 (multiplier) path. Design: hw2_pipe, wire load model: ZeroWireload, library: N16ADFP_StdCellss0p72vm40c_ccs.

| Point                                                        | Incr    | Path     |
|--------------------------------------------------------------|---------|----------|
| clock clk (rise edge)                                        | 0.0000  | 0.0000   |
| clock network delay (ideal)                                  | 0.0000  | 0.0000   |
| u_stage1/c_pipe_out_reg_1_/CP (DFCNQD1BWP16P90LVT)           | 0.0000  | 0.0000 r |
| u_stage1/c_pipe_out_reg_1_/Q (DFCNQD1BWP16P90LVT)            | 0.0330  | 0.0330 r |
| u_stage1/c_pipe_out[1] (hw2_pipe_stage1)                     | 0.0000  | 0.0330 r |
| u_stage2/c[1] (hw2_pipe_stage2)                              | 0.0000  | 0.0330 r |
| u_stage2/mult_11/b[1] (hw2_pipe_stage2_DW_mult_uns_0)        | 0.0000  | 0.0330 r |
| u_stage2/mult_11/U161/ZN (CKND1BWP16P90LVT)                  | 0.0172  | 0.0502 f |
| u_stage2/mult_11/U185/ZN (NR2D1BWP16P90LVT)                  | 0.0164  | 0.0666 r |
| u_stage2/mult_11/U190/CO (HA1D1BWP16P90LVT)                  | 0.0193  | 0.0859 r |
| u_stage2/mult_11/U62/S (FA1D1BWP16P90LVT)                    | 0.0367  | 0.1226 f |
| u_stage2/mult_11/U193/CO (FA1D1BWP16P90LVT)                  | 0.0273  | 0.1498 f |
| u_stage2/mult_11/U192/CO (FA1D1BWP16P90LVT)                  | 0.0276  | 0.1774 f |
| u_stage2/mult_11/U162/CO (FA1D1BWP16P90LVT)                  | 0.0276  | 0.2051 f |
| u_stage2/mult_11/U171/CO (FA1D1BWP16P90LVT)                  | 0.0276  | 0.2327 f |
| u_stage2/mult_11/U160/CO (FA1D1BWP16P90LVT)                  | 0.0283  | 0.2610 f |
| u_stage2/mult_11/U159/CO (FA1D2BWP16P90LVT)                  | 0.0235  | 0.2845 f |
| u_stage2/mult_11/U8/CO (FA1D1BWP16P90LVT)                    | 0.0263  | 0.3109 f |
| u_stage2/mult_11/U7/CO (FA1D1BWP16P90LVT)                    | 0.0276  | 0.3385 f |
| u_stage2/mult_11/U6/CO (FA1D1BWP16P90LVT)                    | 0.0276  | 0.3661 f |
| u_stage2/mult_11/U157/CO (FA1D1BWP16P90LVT)                  | 0.0276  | 0.3938 f |
| u_stage2/mult_11/U163/CO (FA1D1BWP16P90LVT)                  | 0.0276  | 0.4214 f |
| u_stage2/mult_11/U191/CO (FA1D1BWP16P90LVT)                  | 0.0232  | 0.4446 f |
| u_stage2/mult_11/U201/Z (XOR3D1BWP16P90LVT)                  | 0.0407  | 0.4853 r |
| u_stage2/mult_11/product[15] (hw2_pipe_stage2_DW_mult_uns_0) | 0.0000  | 0.4853 r |
| u_stage2/result_reg_15_/D (DFCNQD2BWP16P90LVT)               | 0.0000  | 0.4853 r |
| data arrival time                                            |         | 0.4853   |
| clock clk (rise edge)                                        | 0.5021  | 0.5021   |
| clock network delay (ideal)                                  | 0.0000  | 0.5021   |
| u_stage2/result_reg_15_/CP (DFCNQD2BWP16P90LVT)              | 0.0000  | 0.5021 r |
| library setup time                                           | -0.0146 | 0.4874   |
| data required time                                           |         | 0.4874   |
| data required time                                           |         | 0.4874   |
| data arrival time                                            |         | -0.4853  |
| slack (MET)                                                  |         | 0.0021   |
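The slack figures in the two stage reports follow from the usual setup check: required time is the clock period minus the library setup time, and slack is required time minus arrival time. A small sketch that recomputes them from the reported numbers (the function name `setup_slack` is hypothetical; all values are in ns and copied from the reports above):

```python
def setup_slack(clock_period: float, setup_time: float, arrival: float) -> float:
    """Setup slack for an ideal clock: (period - setup) - data arrival."""
    required = clock_period - setup_time
    return required - arrival


# Stage 1: adder/subtractor path ending at sum_pipe_out_reg_8_
stage1 = setup_slack(0.5021, 0.0145, 0.3068)
# Stage 2: multiplier path ending at result_reg_15_ (6-digit values)
stage2 = setup_slack(0.502081, 0.014647, 0.485292)

print(f"stage 1 slack: {stage1:.4f} ns")
print(f"stage 2 slack: {stage2:.4f} ns")
```

Stage 2 has far less slack than stage 1, so the multiplier stage is the one that limits the clock period in this two-stage split.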





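## Findings from the reports of the three circuit designs

### 1. Area analysis

- **Non-pipelined:**  $95.02 \mu\text{m}^2$  (smallest area)
- **Pipelined:**  $136.49 \mu\text{m}^2$  (medium area)
- **Clock-gated:**  $152.20 \mu\text{m}^2$  (largest area)

Finding: Compared with the non-pipelined design, pipelining increases the circuit area significantly. The non-pipelined area report shows Number of sequential cells: 0, whereas the pipelined version has 33 sequential cells, and its noncombinational area (the flip-flop area) is 37.63, which is most of the 41.47 increase in total cell area. This shows that the added flip-flops are the main cause of the area increase. The clock-gated design adds clock-gating logic on top of the pipelined design, so its area is the largest.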

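### 2. Timing (performance) analysis

- **Non-pipelined:** 0.7029 ns (slowest)
- **Pipelined:** 0.4853 ns (faster)
- **Clock-gated:** 0.4413 ns (fastest)

Finding: This is where the advantage of pipelining shows. It is much faster than the non-pipelined design because the computation is split into stages. In the pipelined design the critical path length is shortened to 0.4853 ns, which means the pipelined circuit can run at a higher clock frequency.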
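To put numbers on the comparison, the ratio of critical path delays and the clock frequency implied by each constrained clock period can be computed directly. This is a rough sketch using values copied from the timing reports in this document; the function names are hypothetical, and the frequencies are simply the reciprocals of the clock periods used in synthesis, not measured maximum frequencies.

```python
# Critical path delays and clock periods in ns, from the timing reports.
critical_path_ns = {"non-pipelined": 0.7029, "pipelined": 0.4853, "clock-gated": 0.4413}
clock_period_ns = {"pipelined": 0.502081, "clock-gated": 0.485214}


def speedup(baseline_ns: float, improved_ns: float) -> float:
    """How many times shorter the improved critical path is."""
    return baseline_ns / improved_ns


def frequency_ghz(period_ns: float) -> float:
    """Clock frequency in GHz for a period given in ns."""
    return 1.0 / period_ns


for name in ("pipelined", "clock-gated"):
    ratio = speedup(critical_path_ns["non-pipelined"], critical_path_ns[name])
    freq = frequency_ghz(clock_period_ns[name])
    print(f"{name}: critical path {ratio:.2f}x shorter, clock {freq:.2f} GHz")
```

The per-cycle delay shrinks by less than a factor of two because the multiplier stage is much longer than the adder stage, so the two stages are not balanced.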

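### 3. Power analysis (DC reports)

- **Dynamic Power:**
  - Non-pipelined: 106.67 uW (lowest)
  - Pipelined: 293.94 uW
  - Clock-gated: 324.83 uW (highest)
- **Leakage Power:**
  - Non-pipelined: 56.45 nW (lowest)
  - Pipelined: 71.89 nW
  - Clock-gated: 94.26 nW (highest)

### Findings:

- **Dynamic power**: After pipelining is added, dynamic power rises markedly (from 106.67 uW to 293.94 uW). I suspect the main cause is that the pipeline registers generate substantial switching and internal power in every clock cycle; the register group accounts for more than 60% of the total power in the DC report.
- **An unexpected finding**: I expected clock gating to save power, but in the DC report the dynamic power is even higher (324.83 uW) than for the pipelined design. Possible reasons:
  1. The added clock-gating logic itself consumes some power.
  2. The logic that drives the enable signal also consumes power when it switches, offsetting the expected savings.
- **Leakage power**: I believe this is related to the total circuit area: the larger the area, the higher the leakage power. The non-pipelined design has the smallest area (95.02) and the lowest leakage power (56.45 nW), while the clock-gated design has the largest area (152.20) and the highest leakage power (94.26 nW).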
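The relative changes behind these findings can be checked with a few lines of arithmetic. The sketch below uses the rounded DC values quoted in this section; the dictionary layout and helper names are hypothetical.

```python
# Rounded DC figures quoted above: area in um^2, dynamic in uW, leakage in nW.
designs = {
    "non-pipelined": {"area": 95.02, "dynamic_uw": 106.67, "leakage_nw": 56.45},
    "pipelined": {"area": 136.49, "dynamic_uw": 293.94, "leakage_nw": 71.89},
    "clock-gated": {"area": 152.20, "dynamic_uw": 324.83, "leakage_nw": 94.26},
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def leakage_per_area(design: dict) -> float:
    """Leakage power per unit cell area, in nW per um^2."""
    return design["leakage_nw"] / design["area"]


base, pipe, gated = (designs[k] for k in ("non-pipelined", "pipelined", "clock-gated"))
print(f"dynamic, pipelined vs non-pipelined: {percent_change(base['dynamic_uw'], pipe['dynamic_uw']):+.1f}%")
print(f"dynamic, clock-gated vs pipelined:   {percent_change(pipe['dynamic_uw'], gated['dynamic_uw']):+.1f}%")
for name, d in designs.items():
    print(f"{name}: {leakage_per_area(d):.3f} nW per um^2")
```

The leakage per unit area stays within a fairly narrow band across the three designs, which is consistent with the observation that leakage roughly tracks total cell area.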

### Data table

| Design                        | Area CL. ( $\mu\text{m}^2$ ) | Area SL. ( $\mu\text{m}^2$ ) | Area Total ( $\mu\text{m}^2$ ) | Delay (ns) | Latency (ns) | Dynamic power (uW) | Leakage power (nW) | Total power (uW) |
|-------------------------------|------------|-----------|------------|----------|----------|----------|---------|----------|
| Non-pipelined<br>(DC)         | 95.022722  | 0         | 95.022722  | 0.702939 | 0.702939 | 106.6767 | 56.4581 | 106.733  |
| Non-pipelined<br>(Prime Time) |            |           |            | 0.702939 | 0.702939 | 8.53     | 53.6    | 8.5836   |
| Pipelined<br>(DC)             | 98.858882  | 37.635841 | 136.494723 | 0.4853   | 0.7921   | 293.9477 | 71.8967 | 294.0195 |
| Pipelined<br>(Prime Time)     |            |           |            | 0.4853   | 0.7921   | 28.0     | 75.8    | 28.0758  |
| Clock-gated<br>(DC)           | 114.566402 | 37.635841 | 152.202244 | 0.4413   | 0.7639   | 324.8381 | 94.2606 | 324.9323 |
| Clock-gated<br>(Prime Time)   |            |           |            | 0.4413   | 0.7639   | 22.7     | 97.8    | 22.797   |
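The total power column mixes units: dynamic power is in uW and leakage power in nW, so the leakage has to be divided by 1000 before the two are added. A short sketch of that conversion, using a hypothetical helper `total_power_uw` and rows copied from the table:

```python
def total_power_uw(dynamic_uw: float, leakage_nw: float) -> float:
    """Total power in uW from dynamic power (uW) and leakage power (nW)."""
    return dynamic_uw + leakage_nw / 1000.0


# (dynamic uW, leakage nW) pairs from the data table
rows = {
    "Non-pipelined (DC)": (106.6767, 56.4581),
    "Non-pipelined (Prime Time)": (8.53, 53.6),
    "Pipelined (DC)": (293.9477, 71.8967),
    "Pipelined (Prime Time)": (28.0, 75.8),
}
for name, (dyn, leak) in rows.items():
    print(f"{name}: {total_power_uw(dyn, leak):.4f} uW")
```

Leakage contributes well under 1% of the total in every row, so the totals are dominated by dynamic power.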

## PrimeTime power results

### non-pipelined

```

****
Report : Averaged Power
  -hierarchy
Design : hw2_nonpipe
Version: V-2023.12
Date   : Wed Oct 29 17:57:21 2025
****

                                     Int      Switch   Leak     Total
Hierarchy                            Power    Power    Power    Power    %
-----
hw2_nonpipe                          4.40e-06 4.13e-06 5.36e-08 8.58e-06 100.0
  r373 (hw2_nonpipe_DW01_addsub_0)   1.25e-06 5.90e-07 5.00e-09 1.84e-06 21.5
  mult_6 (hw2_nonpipe_DW_mult_uns_1) 3.15e-06 3.42e-06 4.86e-08 6.62e-06 77.1
1

```

### Pipelined

```

*****
Report : Averaged Power
-hierarchy
Design : hw2_pipe
Version: V-2023.12
Date   : Wed Oct 29 17:25:32 2025
*****


                           Int      Switch   Leak     Total
Hierarchy                  Power    Power    Power    Power    %
-----
hw2_pipe          1.99e-05 8.11e-06 7.58e-08 2.81e-05 100.0
  u_stage1 (hw2_pipe_stage1) 7.19e-06 1.37e-06 1.96e-08 8.57e-06 30.5
    r368 (hw2_pipe_stage1_DW01_addsub_0)
                           2.03e-06 1.08e-06 5.00e-09 3.12e-06 11.1
  u_stage2 (hw2_pipe_stage2) 1.26e-05 6.69e-06 5.62e-08 1.93e-05 68.8
    mult_11 (hw2_pipe_stage2_DW_mult_uns_0)
                           8.18e-06 6.69e-06 4.37e-08 1.49e-05 53.1

```

### clock-gating

```

*****
Report : Averaged Power
  -hierarchy
Design : hw2_clockgating
Version: V-2023.12
Date   : Wed Oct 29 17:06:32 2025
*****

                                     Int      Switch   Leak     Total
Hierarchy                            Power    Power    Power    Power    %
-----
hw2_clockgating                      1.67e-05 6.02e-06 9.78e-08 2.28e-05 100.0
  u_stage1 (hw2_clockgating_stage1)  7.72e-06 2.00e-06 2.84e-08 9.74e-06 42.7
    r368 (hw2_clockgating_stage1_DW01_addsub_0)
                                     2.06e-06 1.11e-06 5.00e-09 3.18e-06 13.9
  u_stage2 (hw2_clockgating_stage2)  9.00e-06 4.02e-06 6.93e-08 1.31e-05 57.3
    mult_11 (hw2_clockgating_stage2_DW_mult_uns_1)
                                     4.26e-06 3.77e-06 5.17e-08 8.08e-06 35.4
1

```

## Automated verification of the computed results

### Nonpipelined-pre-sim

```

RTL Verification: Test 185 | Inputs: a=c3, b=eb, c=f2, s=0 | Expected D: be30 | Actual D: be30
RTL Verification: Test 186 | Inputs: a=89, b=6f, c=ad, s=1 | Expected D: a798 | Actual D: a798
RTL Verification: Test 187 | Inputs: a=68, b=25, c=ab, s=1 | Expected D: 5e2f | Actual D: 5e2f
RTL Verification: Test 188 | Inputs: a=1b, b=80, c=82, s=1 | Expected D: 4eb6 | Actual D: 4eb6
RTL Verification: Test 189 | Inputs: a=04, b=57, c=32, s=0 | Expected D: 53ca | Actual D: 53ca
RTL Verification: Test 190 | Inputs: a=79, b=e4, c=aa, s=0 | Expected D: 0cf2 | Actual D: 0cf2
RTL Verification: Test 191 | Inputs: a=c0, b=2e, c=71, s=0 | Expected D: 4072 | Actual D: 4072
RTL Verification: Test 192 | Inputs: a=fb, b=bc, c=17, s=0 | Expected D: 05a9 | Actual D: 05a9
RTL Verification: Test 193 | Inputs: a=c1, b=eb, c=89, s=1 | Expected D: e50c | Actual D: e50c
RTL Verification: Test 194 | Inputs: a=8f, b=bf, c=d3, s=0 | Expected D: 7e70 | Actual D: 7e70
RTL Verification: Test 195 | Inputs: a=d6, b=82, c=71, s=0 | Expected D: 2514 | Actual D: 2514
RTL Verification: Test 196 | Inputs: a=d7, b=0e, c=cd, s=1 | Expected D: b761 | Actual D: b761
RTL Verification: Test 197 | Inputs: a=05, b=1c, c=ec, s=0 | Expected D: c2cc | Actual D: c2cc
RTL Verification: Test 198 | Inputs: a=2a, b=65, c=03, s=1 | Expected D: 01ad | Actual D: 01ad
RTL Verification: Test 199 | Inputs: a=fe, b=fd, c=ee, s=0 | Expected D: 00e8 | Actual D: 00e8
RTL Verification: Test 200 | Inputs: a=fe, b=fd, c=e8, s=0 | Expected D: 00e8 | Actual D: 00e8
RTL Verification: Test 200 | Inputs: a=fe, b=fd, c=e8, s=0 | Expected D: 00e8 | Actual D: 00e8

=====
SUCCESS! All 200 tests passed.
=====
$finish called from file "/MasterClass/M143010068_HDL/HW2/non_pipelined/pre_sim/testbench.v", line 96.
$finish at simulation time 2040000
          VCS Simulation Report
Time: 2040000 ps

```
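The Expected D values printed in this log are consistent with the following arithmetic, which is inferred from the logged numbers rather than taken from the testbench source: `s=1` adds and `s=0` subtracts, the add/sub result wraps to 9 bits, and the product with `c` is truncated to 16 bits. A minimal Python sketch that recomputes a few logged lines under those assumptions (the name `golden_d` and the bit widths are hypothetical):

```python
def golden_d(a: int, b: int, c: int, s: int) -> int:
    """Reference value of D for 8-bit inputs a, b, c and select bit s.

    Assumed from the logs: s=1 -> a+b, s=0 -> a-b, result kept in 9 bits,
    then multiplied by c and truncated to 16 bits.
    """
    add_sub_res = (a + b if s else a - b) & 0x1FF  # 9-bit wraparound
    return (add_sub_res * c) & 0xFFFF              # 16-bit output


# (a, b, c, s, expected_d) tuples copied from the non-pipelined pre-sim log.
LOGGED = [
    (0xC3, 0xEB, 0xF2, 0, 0xBE30),  # Test 185
    (0x89, 0x6F, 0xAD, 1, 0xA798),  # Test 186
    (0x05, 0x1C, 0xEC, 0, 0xC2CC),  # Test 197
    (0x2A, 0x65, 0x03, 1, 0x01AD),  # Test 198
]

for a, b, c, s, expected in LOGGED:
    assert golden_d(a, b, c, s) == expected
```

The same function can be applied to any other line of the logs to spot transcription errors in the printed inputs.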

### Nonpipelined-post-sim

```

Post-sim Verification: Test 284 | Inputs: a=c9, b=c9, c=da, s=0 | Expected D: 0000 | Actual D: 0000
Post-sim Verification: Test 285 | Inputs: a=ad, b=df, c=70, s=1 | Expected D: ad40 | Actual D: ad40
Post-sim Verification: Test 286 | Inputs: a=83, b=51, c=80, s=0 | Expected D: 1900 | Actual D: 1900
Post-sim Verification: Test 287 | Inputs: a=e9, b=00, c=df, s=0 | Expected D: d3ad | Actual D: d3ad
Post-sim Verification: Test 288 | Inputs: a=c9, b=84, c=e9, s=0 | Expected D: 3ecd | Actual D: 3ecd
Post-sim Verification: Test 289 | Inputs: a=10, b=de, c=09, s=0 | Expected D: 0ac2 | Actual D: 0ac2
Post-sim Verification: Test 290 | Inputs: a=d7, b=61, c=91, s=1 | Expected D: b0b8 | Actual D: b0b8
Post-sim Verification: Test 291 | Inputs: a=d3, b=9a, c=4e, s=1 | Expected D: 6f36 | Actual D: 6f36
Post-sim Verification: Test 292 | Inputs: a=b7, b=c6, c=9c, s=1 | Expected D: e82c | Actual D: e82c
Post-sim Verification: Test 293 | Inputs: a=2f, b=44, c=62, s=1 | Expected D: 2c06 | Actual D: 2c06
Post-sim Verification: Test 294 | Inputs: a=e9, b=2e, c=c6, s=1 | Expected D: d7ca | Actual D: d7ca
Post-sim Verification: Test 295 | Inputs: a=aa, b=a0, c=f0, s=1 | Expected D: 3560 | Actual D: 3560
Post-sim Verification: Test 296 | Inputs: a=e8, b=cd, c=bb, s=1 | Expected D: 3f37 | Actual D: 3f37
Post-sim Verification: Test 297 | Inputs: a=df, b=7d, c=9f, s=0 | Expected D: 3cde | Actual D: 3cde
Post-sim Verification: Test 298 | Inputs: a=97, b=06, c=c4, s=1 | Expected D: 7834 | Actual D: 7834
Post-sim Verification: Test 299 | Inputs: a=d7, b=5d, c=73, s=1 | Expected D: 8a5c | Actual D: 8a5c
Post-sim Verification: Test 300 | Inputs: a=7f, b=4e, c=e0, s=1 | Expected D: b360 | Actual D: b360

=====
SUCCESS! All 301 tests passed.
=====
$finish called from file "/MasterClass/M143010068_HDL/HW2/non_pipelined/post_sim/testbench.v", line 109.
$finish at simulation time 3130000
          VCS Simulation Report
Time: 3130000 ps

```

### Pipelined-pre-sim

```
RTL Verification: Test 190 | Inputs: a=79, b=e4, c=aa, s=0 | Expected D: 53ca | Actual D: 53ca
RTL Verification: Test 191 | Inputs: a=c0, b=2e, c=71, s=0 | Expected D: 0cf2 | Actual D: 0cf2
RTL Verification: Test 192 | Inputs: a=fb, b=bc, c=17, s=0 | Expected D: 4072 | Actual D: 4072
RTL Verification: Test 193 | Inputs: a=c1, b=eb, c=89, s=1 | Expected D: 05a9 | Actual D: 05a9
RTL Verification: Test 194 | Inputs: a=8f, b=bf, c=d3, s=0 | Expected D: e50c | Actual D: e50c
RTL Verification: Test 195 | Inputs: a=d6, b=82, c=71, s=0 | Expected D: 7e70 | Actual D: 7e70
RTL Verification: Test 196 | Inputs: a=d7, b=0e, c=cd, s=1 | Expected D: 2514 | Actual D: 2514
RTL Verification: Test 197 | Inputs: a=05, b=1c, c=ec, s=0 | Expected D: b761 | Actual D: b761
RTL Verification: Test 198 | Inputs: a=2a, b=65, c=03, s=1 | Expected D: c2cc | Actual D: c2cc
RTL Verification: Test 199 | Inputs: a=fe, b=fd, c=e8, s=0 | Expected D: 01ad | Actual D: 01ad
RTL Verification: Test 200 | Inputs: a=fe, b=fd, c=e8, s=0 | Expected D: 00e8 | Actual D: 00e8
RTL Verification: Test 200 | Inputs: a=fe, b=fd, c=e8, s=0 | Expected D: 00e8 | Actual D: 00e8

=====
SUCCESS! All 200 tests passed.

=====
$finish called from file "/MasterClass/M143010068_HDL/HW2/pipelined/pre_sim/testbench.v", line 105.
$finish at simulation time 2041000
          V C S  S i m u l a t i o n   R e p o r t
Time: 2041000 ps
CPU Time: 0.610 seconds;      Data structure size: 0.0Mb
Sat Nov 1 18:57:48 2025
CPU time: .428 seconds to compile + .458 seconds to elab + .430 seconds to link + .664 seconds in simulation
```

### Pipelined-post-sim

```
RTL Verification: Test 290 | Inputs: a=d7, b=61, c=91, s=1 | Expected D: 0ac2 | Actual D: 0ac2
RTL Verification: Test 291 | Inputs: a=d3, b=9a, c=4e, s=1 | Expected D: b0b8 | Actual D: b0b8
RTL Verification: Test 292 | Inputs: a=b7, b=c6, c=9c, s=1 | Expected D: 6f36 | Actual D: 6f36
RTL Verification: Test 293 | Inputs: a=2f, b=44, c=62, s=1 | Expected D: e82c | Actual D: e82c
RTL Verification: Test 294 | Inputs: a=e9, b=2e, c=c6, s=1 | Expected D: 2c06 | Actual D: 2c06
RTL Verification: Test 295 | Inputs: a=aa, b=a0, c=f0, s=1 | Expected D: d7ca | Actual D: d7ca
RTL Verification: Test 296 | Inputs: a=e8, b=cd, c=bb, s=1 | Expected D: 3560 | Actual D: 3560
RTL Verification: Test 297 | Inputs: a=df, b=7d, c=9f, s=0 | Expected D: 3f37 | Actual D: 3f37
RTL Verification: Test 298 | Inputs: a=97, b=06, c=c4, s=1 | Expected D: 3cde | Actual D: 3cde
RTL Verification: Test 299 | Inputs: a=d7, b=5d, c=73, s=1 | Expected D: 7834 | Actual D: 7834
RTL Verification: Test 300 | Inputs: a=d7, b=5d, c=73, s=1 | Expected D: 8a5c | Actual D: 8a5c
RTL Verification: Test 300 | Inputs: a=d7, b=5d, c=73, s=1 | Expected D: 8a5c | Actual D: 8a5c

=====
SUCCESS! All 300 tests passed.

=====
$finish called from file "/MasterClass/M143010068_HDL/HW2/pipelined/post_sim/testbench.v", line 116.
$finish at simulation time 3041000
          V C S  S i m u l a t i o n   R e p o r t
Time: 3041000 ps
CPU Time: 0.750 seconds;      Data structure size: 0.3Mb
Sat Nov 1 18:58:24 2025
CPU time: 6.454 seconds to compile + .467 seconds to elab + .987 seconds to link + .813 seconds in simulation
```

### clock\_gating-pre-sim

```
RTL Verification: Test 291 | Inputs: a=9a, b=2f, c=b7, s=0 | Expected D: 5aaa | Actual D: 5aaa
RTL Verification: Test 292 | Inputs: a=c6, b=5d, c=2f, s=0 | Expected D: 4c7d | Actual D: 4c7d
RTL Verification: Test 293 | Inputs: a=44, b=9b, c=e9, s=0 | Expected D: 1347 | Actual D: 1347
RTL Verification: Test 294 | Inputs: a=2e, b=8d, c=aa, s=0 | Expected D: 82d1 | Actual D: 82d1
RTL Verification: Test 295 | Inputs: a=a0, b=37, c=e8, s=0 | Expected D: 14ea | Actual D: 14ea
RTL Verification: Test 296 | Inputs: a=cd, b=d1, c=df, s=1 | Expected D: 5f28 | Actual D: 5f28
RTL Verification: Test 297 | Inputs: a=7d, b=58, c=97, s=1 | Expected D: 68a2 | Actual D: 68a2
RTL Verification: Test 298 | Inputs: a=06, b=29, c=d7, s=0 | Expected D: 7da3 | Actual D: 7da3
RTL Verification: Test 299 | Inputs: a=5d, b=4d, c=7f, s=1 | Expected D: 909b | Actual D: 909b
RTL Verification: Test 300 | Inputs: a=4e, b=b9, c=39, s=0 | Expected D: 5456 | Actual D: 5456

=====
SUCCESS! All 301 tests passed.

=====
$finish called from file "/MasterClass/M143010068_HDL/HW2/clock_gating/pre_sim/testbench.v", line 108.
$finish at simulation time 3055000
          V C S  S i m u l a t i o n   R e p o r t
Time: 3055000 ps
CPU Time: 0.590 seconds;      Data structure size: 0.0Mb
Sat Nov 1 18:58:52 2025
CPU time: .419 seconds to compile + .513 seconds to elab + .438 seconds to link + .641 seconds in simulation
```

### clock\_gating-post-sim

```
RTL Verification: Test 294 | Inputs: a=e9, b=2e, c=c6, s=1 | Expected D: 2c06 | Actual D: 2c06
RTL Verification: Test 295 | Inputs: a=aa, b=a0, c=f0, s=1 | Expected D: d7ca | Actual D: d7ca
RTL Verification: Test 296 | Inputs: a=e8, b=cd, c=bb, s=1 | Expected D: 3560 | Actual D: 3560
RTL Verification: Test 297 | Inputs: a=df, b=7d, c=9f, s=0 | Expected D: 3f37 | Actual D: 3f37
RTL Verification: Test 298 | Inputs: a=97, b=06, c=c4, s=1 | Expected D: 3cde | Actual D: 3cde
RTL Verification: Test 299 | Inputs: a=d7, b=5d, c=73, s=1 | Expected D: 7834 | Actual D: 7834
RTL Verification: Test 300 | Inputs: a=d7, b=5d, c=73, s=1 | Expected D: 7834 | Actual D: 7834
RTL Verification: Test 300 | Inputs: a=d7, b=5d, c=73, s=1 | Expected D: 7834 | Actual D: 7834

=====
SUCCESS! All 300 tests passed.

=====
$finish called from file "/MasterClass/M143010068_HDL/HW2/clock_gating/post_sim/testbench.v", line 122.
$finish at simulation time 3045000
          V C S  S i m u l a t i o n   R e p o r t
Time: 3045000 ps
CPU Time: 0.790 seconds;      Data structure size: 0.3Mb
Sat Nov 1 18:59:18 2025
CPU time: 6.028 seconds to compile + .475 seconds to elab + 1.105 seconds to link + .849 seconds in simulation
```

# Vivado

## Behavioral and Post-Implementation Waveforms with Explanation

### Behavioral waveform



### Post-implementation waveform





## Observations and Findings

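In the Vivado waveforms, add\_sub\_res holds the sum or difference of a and b. Its value is updated on the rising edge of clk, and it is then multiplied by the pipelined copy of c to produce d, which is compared against expected\_d. Every comparison shown here is correct, and the behavioral and post-implementation waveforms show the same values, which demonstrates that the circuit behaves consistently before and after implementation.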
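The behaviour described above can be sketched as a small cycle-level model. This is only an illustration under stated assumptions, not the submitted RTL: the class name `PipelinedAddSubMul`, the 9-bit add/sub width, the 16-bit truncation, and the single cycle of latency are all inferred from the simulation logs, where Expected D lags the printed inputs by one test.

```python
from typing import Iterable, List, Optional, Tuple


class PipelinedAddSubMul:
    """Hypothetical model: stage 1 registers (a +/- b) and c, then d = reg * c."""

    def __init__(self) -> None:
        self.add_sub_res: Optional[int] = None  # stage-1 register
        self.c_pipe: Optional[int] = None       # c delayed by one cycle

    def d(self) -> Optional[int]:
        """Output computed from the registered values (None before first edge)."""
        if self.add_sub_res is None or self.c_pipe is None:
            return None
        return (self.add_sub_res * self.c_pipe) & 0xFFFF

    def clock(self, a: int, b: int, c: int, s: int) -> None:
        """Rising clock edge: capture the add/sub result and c."""
        self.add_sub_res = (a + b if s else a - b) & 0x1FF
        self.c_pipe = c


def run(stimuli: Iterable[Tuple[int, int, int, int]]) -> List[Optional[int]]:
    """Return d as seen while each stimulus is applied, plus one drain cycle."""
    dut = PipelinedAddSubMul()
    observed: List[Optional[int]] = []
    for a, b, c, s in stimuli:
        observed.append(dut.d())  # d still reflects the previous inputs
        dut.clock(a, b, c, s)
    observed.append(dut.d())      # extra cycle to flush the last result
    return observed


# Inputs of Tests 197 and 198 from the pipelined pre-sim log.
outputs = run([(0x05, 0x1C, 0xEC, 0), (0x2A, 0x65, 0x03, 1)])
assert outputs == [None, 0xC2CC, 0x01AD]
```

With the inputs of Tests 197 and 198, the model yields `c2cc` while the Test 198 inputs are applied and `01ad` one cycle later, which is the same one-test lag visible in the pipelined logs.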

## Project Summary Overview Screenshot, Including Utilization and Timing

```

RTL Verification: Test 283 | Inputs(a=233, b= 46, c=198, s=1) -> Output d=55242, Expected=55242, Status: PASS
RTL Verification: Test 284 | Inputs(a=170, b=160, c=240, s=1) -> Output d=13664, Expected=13664, Status: PASS
RTL Verification: Test 285 | Inputs(a=232, b=205, c=187, s=1) -> Output d=16183, Expected=16183, Status: PASS
RTL Verification: Test 286 | Inputs(a=223, b=125, c=159, s=0) -> Output d=15582, Expected=15582, Status: PASS
RTL Verification: Test 287 | Inputs(a=151, b= 6, c=196, s=1) -> Output d=30772, Expected=30772, Status: PASS
RTL Verification: Test 288 | Inputs(a=215, b= 93, c=115, s=1) -> Output d=35420, Expected=35420, Status: PASS
RTL Verification: Test 289 | Inputs(a=127, b= 78, c=224, s=1) -> Output d=45920, Expected=45920, Status: PASS
RTL Verification: Test 290 | Inputs(a= 57, b=129, c=143, s=1) -> Output d=26598, Expected=26598, Status: PASS
RTL Verification: Test 291 | Inputs(a=254, b= 93, c= 51, s=1) -> Output d=17697, Expected=17697, Status: PASS
RTL Verification: Test 292 | Inputs(a=105, b= 86, c= 92, s=0) -> Output d=1748, Expected=1748, Status: PASS
RTL Verification: Test 293 | Inputs(a= 73, b=211, c=216, s=0) -> Output d=15248, Expected=15248, Status: PASS
RTL Verification: Test 294 | Inputs(a=120, b= 57, c= 9, s=1) -> Output d= 1593, Expected= 1593, Status: PASS
RTL Verification: Test 295 | Inputs(a= 86, b= 44, c= 26, s=1) -> Output d= 3380, Expected= 3380, Status: PASS
RTL Verification: Test 296 | Inputs(a= 0, b=172, c=108, s=0) -> Output d=36720, Expected=36720, Status: PASS
RTL Verification: Test 297 | Inputs(a=160, b= 29, c=218, s=1) -> Output d=41202, Expected=41202, Status: PASS
RTL Verification: Test 298 | Inputs(a=216, b=192, c=209, s=0) -> Output d= 5016, Expected= 5016, Status: PASS
RTL Verification: Test 299 | Inputs(a= 81, b= 81, c=179, s=1) -> Output d=28998, Expected=28998, Status: PASS
RTL Verification: Test 300 | Inputs(a=126, b=128, c= 72, s=0) -> Output d=36720, Expected=36720, Status: PASS

=====
SIMULATION PASSED! All 300 test cases are correct.

=====
$finish called at time : 3037 ns : File "C:/Users/airlab/Documents/HDL/HW_2/VivodoUsed/pipelined/testbench.v" Line 99
INFO: [USF-XSim-96] XSim completed. Design snapshot 'pre_syn_testbench_behav' loaded.
INFO: [USF-XSim-97] XSim simulation ran for -all

```

**Project Summary**

Board overview: ZedBoard Zynq Evaluation and Development Kit

| Synthesis        |                                  | Implementation              |                                       |
|------------------|----------------------------------|-----------------------------|---------------------------------------|
| Status:          | ✓ Complete                       | Status:                     | ✓ Complete                            |
| Messages:        | 1 critical warning<br>2 warnings | Messages:                   | 1 critical warning                    |
| Part:            | xc7z020clg484-1                  | Part:                       | xc7z020clg484-1                       |
| Strategy:        | Vivado Synthesis Defaults        | Strategy:                   | Vivado Implementation Defaults        |
| Report Strategy: | Vivado Synthesis Default Reports | Report Strategy:            | Vivado Implementation Default Reports |
|                  |                                  | Incremental implementation: | None                                  |

| DRC Violations |                                  | Timing                       |          |
|----------------|----------------------------------|------------------------------|----------|
| Summary:       | 2 critical warnings<br>1 warning | Worst Negative Slack (WNS):  | 3.136 ns |
|                |                                  | Total Negative Slack (TNS):  | 0 ns     |
|                |                                  | Number of Failing Endpoints: | 0        |
|                |                                  | Total Number of Endpoints:   | 16       |

| Utilization |     | Power                               |                 |
|-------------|-----|-------------------------------------|-----------------|
| LUT:        | 1%  | Total On-Chip Power:                | 0.121 W         |
| FF:         | 1%  | Junction Temperature:               | 26.4 °C         |
| IO:         | 22% | Thermal Margin:                     | 58.6 °C (4.9 W) |
| BUFG:       | 3%  | Effective θJA:                      | 11.5 °C/W       |
|             |     | Power supplied to off-chip devices: | 0 W             |
|             |     | Confidence level:                   | Low             |
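The reported junction temperature and thermal margin can be reproduced from the total on-chip power and the effective junction-to-ambient thermal resistance of 11.5 °C/W. The sketch below assumes an ambient temperature of 25 °C and a maximum junction temperature of 85 °C; neither value appears in the screenshot, and the function names are hypothetical.

```python
def junction_temperature(ambient_c: float, theta_ja_c_per_w: float,
                         power_w: float) -> float:
    """Tj = Ta + theta_JA * P (steady-state thermal model)."""
    return ambient_c + theta_ja_c_per_w * power_w


def thermal_margin(max_junction_c: float, junction_c: float) -> float:
    """Headroom between the assumed junction limit and the estimated Tj."""
    return max_junction_c - junction_c


AMBIENT_C = 25.0       # assumption: not shown in the report
MAX_JUNCTION_C = 85.0  # assumption: not shown in the report

tj = junction_temperature(AMBIENT_C, 11.5, 0.121)
margin = thermal_margin(MAX_JUNCTION_C, tj)
print(f"Tj = {tj:.1f} C, margin = {margin:.1f} C")  # Tj = 26.4 C, margin = 58.6 C
```

Under these assumptions the computed values agree with the 26.4 °C junction temperature and 58.6 °C thermal margin in the summary.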