

# Team1\_Final\_project

## (1)AES-128

Block diagram:



Code:

```

addroundkey(state_grid,0,sub,ekey);
#pragma hls_pipeline_init_interval 1 //Catapult Constraint
#pragma hls_unroll yes //catapult C Constraint
loop_main : for(iteration = 1; iteration < nr; iteration++) {
    subbytes(sub,shift);
    shift_row_enc(shift,mix);
    mixcolumn(mix,round);
    addroundkey(round,iteration,sub,ekey);

}
subbytes(sub,shift);
shift_row_enc(shift,round);
addroundkey(round,nr,result,ekey);

```

## (2)Catapult hls

這次 catapult 我們有生出兩種架構，分別是想辦法讓他面積盡可能小和有最低的 latency 的架構，我們會去比較此兩種架構在 catapult 下的 area 和 timing，再與我們原先用 verilog 寫好的 aes 在 90nm 之下的 area 和 timing 去做比較。

下圖為我們驗證功能正確性的圖，上圖是我們 catapult 裡面模擬跑出來的結果，可以對上下圖產生出來的 aes output，表示我們的 catapult 產生的 aes 功能是正確的。

```
const char* state_str = "ThisIsPlaintextA";
const char* key_str = "ThisIsKeyyyyyyyyy";
```

```
# Message
# make: Warning: File `scverify/orig_cxx_osci/scverify_top' has modification
# =====
# Simulating design
# cd ../../..; ./Catapult_2/aes_enc.v4/scverify/orig_cxx_osci/scverify_top
#   92 73 38 4a 46 ea 35 36 7b 64 91 82 92 2c d3 bc
# finish
```

| Text to Encrypt  | Enter Secret Key  | AES Encrypted Output             |
|------------------|-------------------|----------------------------------|
| ThisIsPlaintextA | ThisIsKeyyyyyyyyy | 9273384A46EA35367B649182922CD3BC |

## Aes\_enc.v6(latency 最低)

catapult hls 45nm



紅圈由左到右分別是 latency、throughput、slack 和 area，分別是 18、17、0 和 54377，此 architecture 為我們想盡可能讓其 latency 低，最後確實也得到最低的 latency 18T。

90nm

|                        |              |
|------------------------|--------------|
| Combinational area:    | 34758.460162 |
| Buf/Inv area:          | 3499.523939  |
| Noncombinational area: | 7522.502594  |
| Macro/Black Box area:  | 0.000000     |
| Net Interconnect area: | 0.000000     |
| Total cell area:       | 42280.962755 |
| Total area:            | 42280.962755 |

  

|                                                                             |             |
|-----------------------------------------------------------------------------|-------------|
| data arrival time                                                           | 9.47        |
| clock clk (rise edge)                                                       | 10.00       |
| clock network delay (ideal)                                                 | 0.00        |
| clock uncertainty                                                           | -0.50       |
| aes_enc_run_inst/aes_enc_keyexpansion_for_1_37_aes_enc_keyexpansion_for_1_x | 0.00 9.50 r |
| library setup time                                                          | -0.03       |
| data required time                                                          | 9.47        |
| -----                                                                       |             |
| data required time                                                          | 9.47        |
| data arrival time                                                           | -9.47       |
| -----                                                                       |             |
| slack (MET)                                                                 | 0.00        |

以上為我們在 90nm 的製程檔下進行合成的結果，其面積約為 42280um<sup>2</sup>，slack 為 0.00。

## Aes\_enc.v8(area 小)

catapult hls 45nm



此 architecture 為我們想盡可能讓其面積小，最後得出的面積約為 29917.95um<sup>2</sup>，較上面的面積少了約 15000um<sup>2</sup>，但其 latency 為 583T 大相當多。

90nm

|                        |              |
|------------------------|--------------|
| Combinational area:    | 12403.237970 |
| Buf/Inv area:          | 1840.255165  |
| Noncombinational area: | 11758.446326 |
| Macro/Black Box area:  | 0.000000     |
| Net Interconnect area: | 0.000000     |
| Total cell area:       | 24161.684296 |
| Total area:            | 24161.684296 |

|                                                                                |        |
|--------------------------------------------------------------------------------|--------|
| data arrival time                                                              | 6.26   |
| clock clk (rise edge)                                                          | 10.00  |
| clock network delay (ideal)                                                    | 0.00   |
| clock uncertainty                                                              | -0.50  |
| aes_enc_run_inst/aes_enc_keyexpansion_temp_0_lpi_1_reg[6]/CK (DFFQ_X0P5M_A9TR) | 9.50 r |
| library setup time                                                             | 0.00   |
| data required time                                                             | -0.04  |
| -----                                                                          | -----  |
| data required time                                                             | 9.46   |
| data arrival time                                                              | -6.26  |
| -----                                                                          | -----  |
| slack (MET)                                                                    | 3.20   |

而在我們 90nm 的製程檔去合成的情況下，其面積為 24161um<sup>2</sup>，slack 為 3.20，因為我們原先 verilog 的目的為盡可能縮減 area，所以我們後面會將此 aes\_enc 去和我們的 aes 做比較。

### (3) Compare with Our's verilog aes

|                                                    |               |
|----------------------------------------------------|---------------|
| sc9_cln40g_base_rvt_tt_typical_max_0p90v_25c()     |               |
| /SynopsysDC/db/sc9_base_rvt/sc9_cln40g_base_rvt_tb |               |
| Number of ports:                                   | 1344          |
| Number of nets:                                    | 5352          |
| Number of cells:                                   | 4256          |
| Number of combinational cells:                     | 3516          |
| Number of sequential cells:                        | 673           |
| Number of macros/black boxes:                      | 0             |
| Number of buf/inv:                                 | 622           |
| Number of references:                              | 61            |
| Combinational area:                                | 12403.237970  |
| Buf/Inv area:                                      | 1840.255165   |
| Noncombinational area:                             | 11758.446326  |
| Macro/Black Box area:                              | 0.000000      |
| Net Interconnect area:                             | 0.000000      |
| Total cell area:                                   | 6264.215955   |
| Total area:                                        | 6264.215995   |
| Combinational area:                                | 124161.684296 |
| Total cell area:                                   | 24161.684296  |
| Total area:                                        | 24161.684296  |

左圖為我們原先 verilog 在 90nm 的製程下所合成出來的結果，其 area 為  $6264\text{um}^2$ ，表現優於使用 catapult hls 所生出來的 rtl 所合成出來的結果  $24161\text{um}^2$ 。

| Des/Clust/Port              | Wire Load Model | Library             | Path | data arrival time                                                              |       |        |
|-----------------------------|-----------------|---------------------|------|--------------------------------------------------------------------------------|-------|--------|
| aes                         | Small           | sc9_cln40g_base_rvt |      |                                                                                | 6.26  |        |
| Point                       |                 |                     | Incr | Path                                                                           |       |        |
| clock clk (rise edge)       | 0.00            | 0.00                |      | clock clk (rise edge)                                                          | 10.00 | 10.00  |
| clock network delay (ideal) | 0.00            | 0.00                |      | clock network delay (ideal)                                                    | 0.00  | 10.00  |
| input external delay        | 4.00            | 4.00                | F    | clock uncertainty                                                              | -0.50 | 9.50   |
| dout_rdy (in)               | 0.00            | 4.00                | F    |                                                                                |       |        |
| U1330/Y (AND2_X1M_A9TR)     | 0.05            | 4.05                | F    | aes_enc_run_inst/aes_enc_keyexpansion_temp_0_lpi_1_reg[6]/CK (DFFQ_X0P5M_A9TR) | 0.00  | 9.50 r |
| U1510/Y (INV_X3M_A9TR)      | 0.21            | 4.26                | r    |                                                                                |       |        |
| U1649/Y (NOR2_X3B_A9TR)     | 0.29            | 4.56                | F    |                                                                                |       |        |
| dout[39] (out)              | 0.00            | 4.56                | F    | library setup time                                                             | -0.04 | 9.46   |
| data arrival time           |                 |                     |      | data required time                                                             |       | 9.46   |
| clock clk (rise edge)       | 10.00           | 10.00               |      |                                                                                |       |        |
| clock network delay (ideal) | 0.00            | 10.00               |      |                                                                                |       |        |
| clock uncertainty           | -0.50           | 9.50                |      |                                                                                |       |        |
| output external delay       | -4.00           | 5.50                |      | data required time                                                             | 9.46  |        |
| data required time          |                 | 5.50                |      | data arrival time                                                              |       | -6.26  |
| data required time          |                 | 5.50                |      |                                                                                |       |        |
| data arrival time           |                 | -4.56               |      |                                                                                |       |        |
| slack (MET)                 |                 | 0.94                |      | slack (MET)                                                                    |       | 3.20   |

Slack 的部分，左圖為我們 verilog 的 slack 約為 0.94，對比使用 catapult hls 所生出來的 rtl 所合成出來的結果 3.20。

而我們原先 verilog 的 Latency 表現約 400T，其表現也優於 catapult hls 所生出來的 latency 表現 583T，因此我們推測 catapult 或許對於時間最佳化的表現較好，所以我們後續進行 integrate 的 rtl 為 latency 表現較好的 aes\_enc.v6。

## (4)Integrate into FSIC

此次將 final project integrate into FSIC 的流程我們基本上照著 lab2-2 的實作過程，首先第一張圖是我們所使用的 filelist，concat\_rtl\_less\_lat.v 是我們這次所使用的有最小 latency 的 rtl，其餘檔案基本上和 lab2-2 的一樣。

```
1 .../fpga.v
2 .../rtl/axi_ctrl_logic.sv
3 .../rtl/axil_axis.sv
4 .../rtl/axilite_master.sv
5 .../rtl/axilite_slave.sv
6 .../rtl/axis_master.sv
7 .../rtl/axis_slave.sv
8 // .../rtl/axis_switch.v
9 .../rtl/sw_caravel.v
10 .../rtl/config_ctrl.v
11 .../rtl/fsic_cikrst.v
12 .../rtl/fsic_clock.v
13 .../rtl/fsic_coreclk_phase_cnt.v
14 .../rtl/fsic_io_serdes_rx.v
15 .../rtl/fsic.v
16 .../rtl/io_serdes.v
17 .../rtl/logic_anlz.dummy_io.v
18 .../rtl/logic_anlz.dummy_io.vd
19 // .../rtl/mprij_io.dummy_io.v
20 // .../rtl/mprij_io.dummy_io.vd
21 .../rtl/mprij_io.v
22 .../rtl/user_subsys.all.v
23 .../rtl/user_prj0.v
24 .../rtl/user_prj1.v
25 .../rtl/user_prj2.v
26 .../rtl/user_prj3.v
27 .../rtl/sram.v
28 .../rtl(concat_rtl_less_lat.v)
```

我們這次的將我們的 aes 放在 user\_prj0，透過給 protocol 來和 fsic 溝通，以下是操作原理與 port information。

Port Information

| Port Name         | Port Address | Width (bit) | Port Functionality             |
|-------------------|--------------|-------------|--------------------------------|
| reg_rst           | 0x00         | 1           | reset IP when reg_rst = 1      |
| key0              | 0x10         | 32          | 32-bit key for encryption      |
| key1              | 0x14         | 32          | 32-bit key for encryption      |
| key2              | 0x18         | 32          | 32-bit key for encryption      |
| key3              | 0x1C         | 32          | 32-bit key for encryption      |
| dlen              | 0x08         | 32          | length of data to encrypt      |
| state (ss_tdata)  | Dynamic      | 128         | state input through ss_tdata   |
| cipher (sm_tdata) | Dynamic      | 128         | cipher output through sm_tdata |

1. 系統初始化
2. 當有 awvalid/wvalid 時，根據 awaddr 寫入不同的 register (key)
3. 當有 arvalid 時表示有 read request，當 arready 時，即根據 araddr 進行讀取不同的 register
4. 當 ss\_tready & ss\_tvalid → 接收 state
5. 將接收的 state 和 key 進行 aes
6. 當 sm\_tready & sm\_tvalid → 輸出 cipher

最後跑出來的結果如下兩圖，我們總共測兩種 pattern，兩者都能夠通過，final result 最後也通過。

```

15925=> test002 [PASS] soc_to_fpga_axis_expect_count = 16, soc_to_fpga_axis_captured_count = 16
15925=> test002 [PASS] idx3= 0, soc_to_fpga_axis_expect_value[ 0] = 000049bdb9b6, soc_to_fpga_axis_captured[ 0] = 000049bdb9b6
15925=> test002 [PASS] idx3= 1, soc_to_fpga_axis_expect_value[ 1] = 000081d9576c, soc_to_fpga_axis_captured[ 1] = 000081d9576c
15925=> test002 [PASS] idx3= 2, soc_to_fpga_axis_expect_value[ 2] = 0000b131a0b5, soc_to_fpga_axis_captured[ 2] = 0000b131a0b5
15925=> test002 [PASS] idx3= 3, soc_to_fpga_axis_expect_value[ 3] = 00003ba9432f, soc_to_fpga_axis_captured[ 3] = 00003ba9432f
15925=> test002 [PASS] idx3= 4, soc_to_fpga_axis_expect_value[ 4] = 000049bdb9b6, soc_to_fpga_axis_captured[ 4] = 000049bdb9b6
15925=> test002 [PASS] idx3= 5, soc_to_fpga_axis_expect_value[ 5] = 000081d9576c, soc_to_fpga_axis_captured[ 5] = 000081d9576c
15925=> test002 [PASS] idx3= 6, soc_to_fpga_axis_expect_value[ 6] = 0000b131a0b5, soc_to_fpga_axis_captured[ 6] = 0000b131a0b5
15925=> test002 [PASS] idx3= 7, soc_to_fpga_axis_expect_value[ 7] = 00003ba9432f, soc_to_fpga_axis_captured[ 7] = 00003ba9432f
15925=> test002 [PASS] idx3= 8, soc_to_fpga_axis_expect_value[ 8] = 000049bdb9b6, soc_to_fpga_axis_captured[ 8] = 000049bdb9b6
15925=> test002 [PASS] idx3= 9, soc_to_fpga_axis_expect_value[ 9] = 000081d9576c, soc_to_fpga_axis_captured[ 9] = 000081d9576c
15925=> test002 [PASS] idx3= 10, soc_to_fpga_axis_expect_value[10] = 0000b131a0b5, soc_to_fpga_axis_captured[10] = 0000b131a0b5
15925=> test002 [PASS] idx3= 11, soc_to_fpga_axis_expect_value[11] = 00003ba9432f, soc_to_fpga_axis_captured[11] = 00003ba9432f
15925=> test002 [PASS] idx3= 12, soc_to_fpga_axis_expect_value[12] = 000049bdb9b6, soc_to_fpga_axis_captured[12] = 000049bdb9b6
15925=> test002 [PASS] idx3= 13, soc_to_fpga_axis_expect_value[13] = 000081d9576c, soc_to_fpga_axis_captured[13] = 000081d9576c
15925=> test002 [PASS] idx3= 14, soc_to_fpga_axis_expect_value[14] = 0000b131a0b5, soc_to_fpga_axis_captured[14] = 0000b131a0b5
15925=> test002 [PASS] idx3= 15, soc_to_fpga_axis_expect_value[15] = 00003ba9432f, soc_to_fpga_axis_captured[15] = 00013ba9432f

25305=> test002 [PASS] soc_to_fpga_axis_expect_count = 16, soc_to_fpga_axis_captured_count = 16
25305=> test002 [PASS] idx3= 0, soc_to_fpga_axis_expect_value[ 0] = 0000f874918b, soc_to_fpga_axis_captured[ 0] = 0000f874918b
25305=> test002 [PASS] idx3= 1, soc_to_fpga_axis_expect_value[ 1] = 0000e84a3016, soc_to_fpga_axis_captured[ 1] = 0000e84a3016
25305=> test002 [PASS] idx3= 2, soc_to_fpga_axis_expect_value[ 2] = 00002f60218, soc_to_fpga_axis_captured[ 2] = 00002f60218
25305=> test002 [PASS] idx3= 3, soc_to_fpga_axis_expect_value[ 3] = 00003975cf8b, soc_to_fpga_axis_captured[ 3] = 00003975cf8b
25305=> test002 [PASS] idx3= 4, soc_to_fpga_axis_expect_value[ 4] = 000010f40295, soc_to_fpga_axis_captured[ 4] = 000010f40295
25305=> test002 [PASS] idx3= 5, soc_to_fpga_axis_expect_value[ 5] = 000039c11671, soc_to_fpga_axis_captured[ 5] = 000039c11671
25305=> test002 [PASS] idx3= 6, soc_to_fpga_axis_expect_value[ 6] = 00008330d4eb, soc_to_fpga_axis_captured[ 6] = 00008330d4eb
25305=> test002 [PASS] idx3= 7, soc_to_fpga_axis_expect_value[ 7] = 0000a4bc2787, soc_to_fpga_axis_captured[ 7] = 0000a4bc2787
25305=> test002 [PASS] idx3= 8, soc_to_fpga_axis_expect_value[ 8] = 00005d12641a, soc_to_fpga_axis_captured[ 8] = 00005d12641a
25305=> test002 [PASS] idx3= 9, soc_to_fpga_axis_expect_value[ 9] = 00005d12641a, soc_to_fpga_axis_captured[ 9] = 00005d12641a
25305=> test002 [PASS] idx3= 10, soc_to_fpga_axis_expect_value[10] = 00007e22d51c, soc_to_fpga_axis_captured[10] = 00007e22d51c
25305=> test002 [PASS] idx3= 11, soc_to_fpga_axis_expect_value[11] = 000086664ac7, soc_to_fpga_axis_captured[11] = 000086664ac7
25305=> test002 [PASS] idx3= 12, soc_to_fpga_axis_expect_value[12] = 00003e87a927, soc_to_fpga_axis_captured[12] = 00003e87a927
25305=> test002 [PASS] idx3= 13, soc_to_fpga_axis_expect_value[13] = 000088c09ac9, soc_to_fpga_axis_captured[13] = 000088c09ac9
25305=> test002 [PASS] idx3= 14, soc_to_fpga_axis_expect_value[14] = 00008543d9cf, soc_to_fpga_axis_captured[14] = 00008543d9cf
25305=> test002 [PASS] idx3= 15, soc_to_fpga_axis_expect_value[15] = 0001e4008bf7, soc_to_fpga_axis_captured[15] = 0001e4008bf7

=====
=====

25905=> Final result [PASS], check_cnt = 0059, error_cnt = 0000
=====

$finish called at time : 25905 ns : File "/home/ubuntu/soc2_fstl/fsic/testbench/tb_fstc.v" Line 440
#D quit
INFO: [Common 17-206] Exiting xsim at Sun Jun 16 22:23:16 2024...
ubuntu@ubuntu2004:/soc2_fstl/%$c/testbench/tc5

```

波型圖如下，測一次會給 16 筆 data，每 4 筆要傳輸時會拉 tvalid，當最後 4 筆傳輸出去後會拉 tlast，而我們做完 16 筆大約花  $25205 - 22390 = 2815$ ns。



## (5)Synopsys flow

### 1. Lab Synthesis:

**Area: 37975um<sup>2</sup>**

```
9 Library(s) Used:  
10      saed14rvt_tt0p8v25c (File: /home/course/ee5252/lab_snps_flow/SAED14_EDK_LAB/SAED14_EDK_LAB/lib/stdcell_rvt/db_nldm/saed14rvt_tt0p8v25c.db)  
11  
12 Number of ports:          4357  
13 Number of nets:           40972  
14 Number of cells:          36701  
15 Number of combinational cells: 34655  
16 Number of sequential cells: 1943  
17 Number of macros/black boxes: 0  
18 Number of buf/inv:          5615  
19 Number of references:       1  
20  
21 Combinational area:      10770.285600  
22 Buf/Inv area:             1199.998809  
23 Noncombinational area:    1666.731556  
24 Macro/Black Box area:     0.000000  
25 Net Interconnect area:   25538.404033  
26  
27 Total cell area:         12437.017156  
28 Total area:               37975.421189  
29  
30 Hierarchical area distribution  
31  
32
```

**Timing: clock clk 8**

|                    |       |      |
|--------------------|-------|------|
| library setup time | 0.00  | 7.70 |
| data required time | 7.70  | 7.70 |
| -----              |       |      |
| data required time | 7.70  |      |
| data arrival time  | -3.88 |      |
| -----              |       |      |
| slack (MET)        | 3.82  |      |

**Slack : 3.82**

**Power:**

| Hierarchy                      | Switch Power | Int Power | Leak Power | Total Power | %     |
|--------------------------------|--------------|-----------|------------|-------------|-------|
| aes_enc                        | 294.702      | 482.837   | 5.40e+06   | 782.935     | 100.0 |
| aes_enc_run_inst (aes_enc_run) | 294.702      | 482.836   | 5.40e+06   | 782.934     | 100.0 |

## 2. Lab\_1\_planning:

### Step 1 data\_setup:

```
##### Import_Design
read_verilog "$Core_compile"
Loading verilog file '/home/m112/m112061601/HLS/Final_syn/input/aes_enc.v'
Information: Reading Verilog into new design 'aes_enc' in library 'aes_enc'. (VR-012)
Number of modules read: 104
Top level ports: 395
Total ports in all modules: 4357
Total nets in all modules: 41026
Total instances in all modules: 36701
Elapsed = 00:00:00.30, CPU = 00:00:00.29
1
current_design ${DESIGN_NAME}
aes_enc:aes_enc.design
```

### Layout:



## Step 2 floorplan:

### Source used:

```
phase4. Total Wire Length = 391652.07
phase4. Layer M1 wire length = 6236.86
phase4. Layer M2 wire length = 119249.86
phase4. Layer M3 wire length = 141502.48
phase4. Layer M4 wire length = 66439.81
phase4. Layer M5 wire length = 49039.23
phase4. Layer M6 wire length = 6137.72
phase4. Layer M7 wire length = 3046.12
phase4. Layer M8 wire length = 0.00
phase4. Layer M9 wire length = 0.00
phase4. Layer MRDL wire length = 0.00
phase4. Total Number of Contacts = 271513
phase4. Via VIA12SQ_C count = 102788
phase4. Via VIA23SQ_C count = 137313
phase4. Via VIA34SQ_C count = 20342
phase4. Via VIA45SQ count = 10297
phase4. Via VIA56SQ count = 561
phase4. Via VIA67SQ_C count = 212
phase4. Via VIA78SQ_C count = 0
phase4. Via VIA89_C count = 0
phase4. Via VIA9RDL count = 0
phase4. completed.
```

No violations:

```
980 Wire length report (all)
981 =====
982 wire length in design temp_data_setup: 350874.150 microns.
983 wire length in design temp_data_setup (see through blk pins): 350874.150 microns.
984
985 Physical hierarchy violations report
986 =====
987 Violations in design temp_data_setup:
988 0 cells have placement violation.
989
990 Voltage area violations report
991 =====
992 Voltage area placement violations in design temp_data_setup:
993 0 cells placed outside the voltage area which they belong to.
994
995 Information: Default error view temp_data_setup_dpplace.err is created in GUI error browser. (DPP-054)
```

Layout:



刻度為 269.806um \* 269 um

```
Congestion utilization per direction:
Average vertical track utilization = 9.05 %
Peak vertical track utilization = 83.33 %
Average horizontal track utilization = 8.08 %
Peak horizontal track utilization = 62.07 %
```

### Step 3 powerplan:

```
257 check_pg_connectivity -nets "VDD VSS"
258 Loading cell instances...
259 Number of Standard Cells: 36598
260 Number of Macro Cells: 0
261 Number of IO Pad Cells: 0
262 Number of Blocks: 0
263 Loading P/G wires and vias...
264 Number of VDD Wires: 336
265 Number of VDD Vias: 4096
266 Number of VDD Terminals: 249
267 Number of VSS Wires: 338
268 Number of VSS Vias: 4225
269 Number of VSS Terminals: 253
270 *****Verify net VDD connectivity*****
271 Number of floating wires: 208
272 Number of floating vias: 0
273 Number of floating std cells: 36598
274 Number of floating hard macros: 0
275 Number of floating I/O pads: 0
276 Number of floating terminals: 1
277 Number of floating hierarchical blocks: 0
278 ****Verify net VSS connectivity*****
279 ****
280 Number of floating wires: 208
281 Number of floating vias: 0
282 Number of floating std cells: 36598
283 Number of floating hard macros: 0
284 Number of floating I/O pads: 0
285 Number of floating terminals: 1
286 Number of floating hierarchical blocks: 0
287 ****
288 Overall runtime: 0 seconds.
```

### Layout:



### 3. Lab2\_pnr:

Step 4 place:

```
3564 -----
3565 WNS of each timing group:           s1      s2
3566 -----
3567 clk                         0.1842  0.0019
3568 -----
3569 Setup WNS:                   0.1842  0.0019  0.0019
3570 Setup TNS:                 0.0000  0.0000  0.0000
3571 Number of setup violations:    0       0       0
3572 Hold WNS:                  0.0259  0.0261  0.0259
3573 Hold TNS:                 0.0000  0.0000  0.0000
3574 Number of hold violations:   0       0       0
3575 Number of max trans violations: 0       0       0
3576 Number of max cap violations:  0       0       0
3577 Number of min pulse width violations: 0       0       0
3578 -----
3579 Area:                      12207.469
3580 Cell count:                35042
3581 Buf/inv cell count:        4059
3582 Std cell utilization:     0.1966
3583 CPU(s):                   456
3584 Mem(Mb):                  914
3585 Host name:                ws24
3586 -----
3587 Histogram:          s1      s2
3588 -----
```

Layout:



## Step 5 Clock Tree Synthesis:

| WNS of each timing group:             | s1        | s2     |
|---------------------------------------|-----------|--------|
| clk                                   | 1.0740    | 0.4295 |
| Setup WNS:                            | 1.0740    | 0.4295 |
| Setup TNS:                            | 0.0000    | 0.0000 |
| Number of setup violations:           | 0         | 0      |
| Hold WNS:                             | 0.0214    | 0.0226 |
| Hold TNS:                             | 0.0000    | 0.0000 |
| Number of hold violations:            | 0         | 0      |
| Number of max trans violations:       | 0         | 0      |
| Number of max cap violations:         | 3         | 20     |
| Number of min pulse width violations: | 0         | 0      |
| Area:                                 | 12437.017 |        |
| Cell count:                           | 35962     |        |
| Buf/inv cell count:                   | 3794      |        |
| Std cell utilization:                 | 0.2011    |        |
| CPU(s):                               | 942       |        |
| Mem(Mb):                              | 1685      |        |
| Host name:                            | ws24      |        |
| Histogram:                            | s1        | s2     |

Layout:



## Step 6 Route:

| WNS of each timing group:             | s1        | s2     |
|---------------------------------------|-----------|--------|
| clk                                   | 1.8783    | 1.3453 |
| Setup WNS:                            | 1.8783    | 1.3453 |
| Setup TNS:                            | 0.0000    | 0.0000 |
| Number of setup violations:           | 0         | 0      |
| Hold WNS:                             | 0.0264    | 0.0267 |
| Hold TNS:                             | 0.0000    | 0.0000 |
| Number of hold violations:            | 0         | 0      |
| Number of max trans violations:       | 0         | 0      |
| Number of max cap violations:         | 0         | 3      |
| Number of min pulse width violations: | 0         | 0      |
| Area:                                 | 12416.016 |        |
| Cell count:                           | 35951     |        |
| Buf/inv cell count:                   | 3783      |        |
| Std cell utilization:                 | 0.2008    |        |
| CPU(s):                               | 1690      |        |
| Mem(Mb):                              | 1075      |        |
| Host name:                            | ws24      |        |

## Layout:



## 4. Lab4\_finishing:

### Step 7 finishing:

```
VERSION : 1
DIVIDERCHAR : 1
BUSBITCHARS : 1
DESIGN : 1
UNITS : 1
PROPERTYDEFINITIONS : 1
DIEAREA : 1
ROW : 415
TRACKS : 20
VIAS : 38
NONDEFAULTRULES : 1
COMPONENTS : 200863
PINS : 395
PINPROPERTIES : 395
SPECIALNETS : 4237
NETS : 37230
1
```

Layout:



## 5. Lab\_formal:

```
9***** Verification Results *****
0Verification SUCCEEDED
1-----
2 Reference design: r:/WORK/aes_enc
3 Implementation design: i:/WORK/aes_enc
4 2074 Passing compare points
5-----
6 Matched Compare Points      BBPin     Loop    BBNet     Cut     Port     DFF     LAT     TOTAL
7
8 Passing (equivalent)      0         0        0        0       131      1943      0     2074
9 Failing (not equivalent)  0         0        0        0       0         0        0     0
0*****
1 1
2 quit
3
4 Maximum memory usage for this session: 731 MB
5 CPU usage for this session: 23.2 seconds ( 0.01 hours )
6 Current time: Sun Jun 16 15:46:26 2024
7 Elapsed time: 128 seconds ( 0.04 hours )
8
9 Thank you for using Formality (R)!
```

**Verification SUCCEEDED**

## 6. lab\_StarRC:

|                      |                                    |         |            |
|----------------------|------------------------------------|---------|------------|
| Setup                | Elp=00:00:35 Cpu=00:00:01 Usr=1.6  | Sys=0.0 | Mem=454.5  |
| Layers_part1         | Elp=00:00:01 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=454.6  |
| Layers_part2         | Elp=00:00:00 Cpu=00:00:00 Usr=0.1  | Sys=0.0 | Mem=456.3  |
| HN_partHN            | Elp=00:00:00 Cpu=00:00:00 Usr=0.4  | Sys=0.0 | Mem=469.1  |
| HN_partSplitTopDef1  | Elp=00:00:01 Cpu=00:00:00 Usr=0.2  | Sys=0.0 | Mem=399.9  |
| HN_partSplitTopDef2  | Elp=00:00:01 Cpu=00:00:00 Usr=0.2  | Sys=0.0 | Mem=399.8  |
| HN_partSplitTopDef3  | Elp=00:00:00 Cpu=00:00:00 Usr=0.3  | Sys=0.0 | Mem=399.8  |
| Cells_partStdCells1  | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=463.5  |
| Cells_partTopCell1   | Elp=00:00:01 Cpu=00:00:00 Usr=0.5  | Sys=0.0 | Mem=471.3  |
| Cells_partTopCell2   | Elp=00:00:01 Cpu=00:00:00 Usr=0.4  | Sys=0.0 | Mem=467.2  |
| Cells_partTopCell3   | Elp=00:00:00 Cpu=00:00:00 Usr=0.3  | Sys=0.0 | Mem=467.2  |
| MergeCells           | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=456.6  |
| Translate_part1      | Elp=00:00:01 Cpu=00:00:00 Usr=0.8  | Sys=0.0 | Mem=456.6  |
| Translate_part2      | Elp=00:00:01 Cpu=00:00:00 Usr=0.5  | Sys=0.0 | Mem=456.6  |
| XinCombine           | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=394.9  |
| PartitionPP          | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=456.6  |
| MergeIndexPins       | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=456.6  |
| MergeIndex           | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=456.6  |
| MergeGpdInstPins     | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=456.6  |
| GPD_XtractSetup      | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=457.9  |
| NetlistSetup         | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=397.8  |
| NetlistAssemblySetup | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=394.9  |
| CleanDirectory       | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=394.9  |
| GPD_NameMap          | Elp=00:00:01 Cpu=00:00:00 Usr=0.2  | Sys=0.0 | Mem=397.9  |
| StitchPreProcess1    | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=0.0    |
| xTract_part1         | Elp=00:00:41 Cpu=00:00:38 Usr=35.9 | Sys=2.2 | Mem=1389.0 |
| xTract_part2         | Elp=00:00:00 Cpu=00:00:00 Usr=0.4  | Sys=0.1 | Mem=470.0  |
| xTract_part3         | Elp=00:00:01 Cpu=00:00:00 Usr=0.5  | Sys=0.1 | Mem=470.0  |
| xTract_part4         | Elp=00:00:00 Cpu=00:00:00 Usr=0.5  | Sys=0.1 | Mem=470.0  |
| StitchPreProcess2    | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=271.6  |
| Stitch_part1         | Elp=00:00:00 Cpu=00:00:00 Usr=0.2  | Sys=0.0 | Mem=278.0  |
| Stitch_part2         | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=278.0  |
| Stitch_part3         | Elp=00:00:01 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=278.0  |
| xTractPP             | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=278.4  |
| ReportViolations     | Elp=00:00:00 Cpu=00:00:00 Usr=0.1  | Sys=0.0 | Mem=399.6  |
| ReportOpens          | Elp=00:00:01 Cpu=00:00:00 Usr=0.1  | Sys=0.0 | Mem=394.9  |
| Netlist_assembly1    | Elp=00:00:00 Cpu=00:00:00 Usr=0.2  | Sys=0.0 | Mem=275.3  |
| Netlist_assembly2    | Elp=00:00:00 Cpu=00:00:00 Usr=0.1  | Sys=0.0 | Mem=274.6  |
| Netlist_assembly3    | Elp=00:00:01 Cpu=00:00:00 Usr=0.1  | Sys=0.0 | Mem=274.9  |
| Netlist_assembly4    | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=273.8  |
| GPD_PostProcess      | Elp=00:00:00 Cpu=00:00:00 Usr=0.0  | Sys=0.0 | Mem=395.3  |
| GPD_Converter1       | Elp=00:00:01 Cpu=00:00:00 Usr=0.7  | Sys=0.0 | Mem=281.8  |
| GPD_Converter2       | Elp=00:00:01 Cpu=00:00:00 Usr=0.2  | Sys=0.0 | Mem=274.9  |
| GPD_Converter3       | Elp=00:00:00 Cpu=00:00:00 Usr=0.4  | Sys=0.0 | Mem=277.7  |
| GPD_Converter4       | Elp=00:00:01 Cpu=00:00:00 Usr=0.3  | Sys=0.0 | Mem=277.2  |
| GPD_Converter5       | Elp=00:00:00 Cpu=00:00:00 Usr=0.2  | Sys=0.0 | Mem=273.1  |
| GPD_Converter6       | Elp=00:00:00 Cpu=00:00:00 Usr=0.1  | Sys=0.0 | Mem=272.5  |
| GPD_Converter7       | Elp=00:00:00 Cpu=00:00:00 Usr=0.1  | Sys=0.0 | Mem=272.7  |

No violations:

```

GPD Converter Part 4: Mon Jun 17 00:08:23 2024
  Warnings: 0  Errors: 0
  GPD_Converter4 Elp=00:00:01 Cpu=00:00:00 Usr=0.3      Sys=0.0      Mem=277.2
Done

GPD Converter Part 5: Mon Jun 17 00:08:27 2024
  Warnings: 0  Errors: 0
  GPD_Converter5 Elp=00:00:00 Cpu=00:00:00 Usr=0.2      Sys=0.0      Mem=273.1
Done

GPD Converter Part 6: Mon Jun 17 00:08:30 2024
  Warnings: 0  Errors: 0
  GPD_Converter6 Elp=00:00:00 Cpu=00:00:00 Usr=0.1      Sys=0.0      Mem=272.5
Done

GPD Converter Part 7: Mon Jun 17 00:08:32 2024
  Warnings: 0  Errors: 0
  GPD_Converter7 Elp=00:00:00 Cpu=00:00:00 Usr=0.1      Sys=0.0      Mem=272.7
Done

GPD Converter Part 8: Mon Jun 17 00:08:35 2024
  Warnings: 0  Errors: 0
  GPD_Converter8 Elp=00:00:00 Cpu=00:00:00 Usr=0.1      Sys=0.0      Mem=272.2
Done

GPD Converter Merge Corner 1: Mon Jun 17 00:08:37 2024
  Warnings: 0  Errors: 0
  GPD_Converter_merge_c1 Elp=00:00:01 Cpu=00:00:00 Usr=0.0      Sys=0.0      Mem=271.6
Done

GPD Converter Merge Corner 2: Mon Jun 17 00:08:41 2024
  Warnings: 0  Errors: 0
  GPD_Converter_merge_c2 Elp=00:00:00 Cpu=00:00:00 Usr=0.0      Sys=0.0      Mem=271.6
Done

```

## (6)FPGA simulation

```
0Z0001258=> AXI4LITE_WRITE_BURST 800000020, value: 0002, resp: 00
xecuting Axi4 End Of Simulation checks
arning: AXI4_ERRS_RLAST_ALL_DONE_EOS: All outstanding read bursts must have com
leted a the end of the simulation.
ime: 62009858 ns Iteration: 0 Process: /fsic_tb/DUT/design_1_i/axi_vip_0/inst
IF/PC/Always3109_1895 Scope: fsic_tb.DUT.design_1_i.axi_vip_0.inst.IF.PC.arm_a
ba4_pc_msg_err File: /tools/Xilinx/Vivado/2022.1/data/xilinx_vip/hdl/axi_vip_a
i4pc.sv Line: 725
xecuting Axi4 End Of Simulation checks
xecuting Axi4 End Of Simulation checks
xecuting Axi4 End Of Simulation checks
finish called at time : 62009858 ns : File "/home/ubuntu/final_fpga/vivado/fsic
tb.sv" Line 182
un: Time (s): cpu = 00:12:34 ; elapsed = 00:17:37 . Memory (MB): peak = 3006.18
; gain = 0.000 ; free physical = 126 ; free virtual = 3768
exit
NFO: xsimkernel Simulation Memory Usage: 286492 KB (Peak: 335636 KB), Simulatio
CPU Usage: 1045690 ms
NFO: [Common 17-206] Exiting Vivado at Tue Jun 18 03:59:02 2024...
=====
vivado complete
=====
ubuntu@ubuntu2004:~/final_fpga/vivado$
```



## (7)FPGA validation

目前 validation 的部分尚未完成，預計在 6/23  
前於報告補上。