

# Final project - Recurrent Unit Circuit Design

---

b05902086 周逸

---

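## 1. Design Trade-offs

- At first I spent some time deciding which weights to keep stored on the circuit. The final score is the AT value (area $\times$ cycle count $\times$ cycle time). Storing all weight data on the circuit would make the total cycle count very low, but the factor saved in cycles would reappear in the area, and the resulting circuit would be too large to bring the cycle time down. I therefore chose to re-read all data from memory every time.
- I originally expected that keeping $B_{ih}$ and $B_{hh}$ on the circuit would trade a small amount of area for some savings in cycle count. After implementing it, however, storing $B_{ih}$ and $B_{hh}$ cost roughly 50% more area while reducing the cycle count by only about $\frac{1}{64}$, so it was not worth it.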
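The second trade-off can be checked with a quick calculation. The sketch below is illustrative only: it plugs in the rough figures quoted above (about 50% more area, about 1/64 fewer cycles), assumes the cycle time stays unchanged, and uses a hypothetical helper named `relative_at`.

```python
def relative_at(area_factor: float, cycle_factor: float,
                cycle_time_factor: float = 1.0) -> float:
    """Return the new AT value relative to the baseline (1.0 means unchanged)."""
    return area_factor * cycle_factor * cycle_time_factor


# Rough figures from the report: ~50% more area, ~1/64 fewer cycles.
# Assumption: the extra registers do not change the cycle time.
ratio = relative_at(area_factor=1.5, cycle_factor=1 - 1 / 64)
print(f"AT relative to baseline: {ratio:.4f}")  # above 1.0 means a worse AT
```

Under these assumptions the area penalty outweighs the cycle saving, which matches the decision to re-read the biases from memory.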

## 2. Stage

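- ~~After thinking it through carefully at the start, I found that the states can basically be split according to the six different memory locations being read, with the design looping repeatedly through five of those states. (Later, for the convenience of pipelining and numbering, this was changed to eight states.)~~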

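## 3. Multiplier

- Before writing the code I expected the multiplication to become the critical path, and running Design Compiler confirmed it. I therefore used the Booth algorithm to turn the multiplication into a sum of several numbers, and pipelined the summation so that the cycle time can be pushed down.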
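As an illustration of how Booth recoding turns one multiplication into a short list of addends, here is a minimal behavioral sketch in Python. It assumes radix-4 (modified) Booth recoding and a 16-bit signed multiplier; the report does not state the radix or the operand width, and the function names are hypothetical. It models the arithmetic only, not the pipelined adder structure.

```python
def booth_radix4_digits(multiplier: int, width: int) -> list[int]:
    """Recode a signed `width`-bit multiplier into radix-4 Booth digits in -2..2."""
    if width <= 0 or width % 2:
        raise ValueError("width must be a positive even number")
    if not -(1 << (width - 1)) <= multiplier < (1 << (width - 1)):
        raise ValueError("multiplier does not fit in the given width")
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_partial_products(multiplicand: int, multiplier: int, width: int) -> list[int]:
    """One addend per Booth digit; in hardware each is a shifted 0, +/-x or +/-2x."""
    digits = booth_radix4_digits(multiplier, width)
    return [d * multiplicand * 4**i for i, d in enumerate(digits)]


def booth_multiply(multiplicand: int, multiplier: int, width: int = 16) -> int:
    """Sum the partial products; this sum is the part a design could pipeline."""
    return sum(booth_partial_products(multiplicand, multiplier, width))


print(booth_radix4_digits(7, 4))
print(booth_multiply(-123, 45))
```

With radix-4 recoding a 16-bit multiplier yields 8 addends instead of 16, so the adder tree that follows is shallower and easier to split across pipeline stages.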

## 4. Pipeline

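- Break each large step into several small steps where possible, then spread the small steps as evenly as possible across the states so that every state takes roughly the same time. This gives the pipelining effect and lowers the cycle time.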
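The effect of balancing can be sketched with a toy model. The step delays below are hypothetical numbers chosen for illustration, not measurements from this design, and the model ignores register overhead such as setup time and clock-to-Q delay.

```python
def cycle_time(stages: list[list[float]]) -> float:
    """The clock period must cover the slowest stage: the largest per-stage delay sum."""
    return max(sum(stage) for stage in stages)


# Hypothetical combinational step delays in ns, grouped into three states.
unbalanced = [[1.5, 1.0, 0.5], [0.5], [0.5]]
balanced = [[1.5], [1.0, 0.5], [0.5, 0.5]]

print(cycle_time(unbalanced))  # limited by the overloaded first state
print(cycle_time(balanced))    # same total work, shorter critical stage
```

Both groupings perform the same total work in the same number of states; only the distribution changes, and with it the minimum clock period.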

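## 5. Read/Write Optimization

- ~~Avoid reading a register in the same cycle in which it is written wherever possible, which makes this kind of logic easier for the compiler to optimize. (Later I simply replaced every `=` with `<=`.)~~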

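## 6. Multiplication and Addition Unit

- Because multiplication and addition still tend to become the critical path, I pulled them out into a separate block of their own, so that no stage checks or similar decisions sit in front of the multiply and add computations.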

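## 7. Transistor-level Synthesis

- During transistor-level synthesis, Innovus crashes outright at the NanoRoute stage if the WNS is not large enough. The workaround I found is to run ECO Design beforehand and, under Mode, raise the Setup Slack value in Thresholds, so that the WNS becomes a comfortably positive value instead of something near 0.000. After that, NanoRoute runs without crashing.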

## 8. Cycle time

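- This Verilog design can be synthesized at gate level with a 2 ns cycle time (timing is reported as MET), but simulation then fails. The failures appear to be mainly hold time violations. Since I could not find a way to fix them, I ended up raising the cycle time to 3.6 ns, at which the simulation passes.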

## 9. Synthesis Parameters

- Gate-Level
  - Cycle time: 3.6 ns
- Transistor-Level
  - Cycle time: 3.6 ns

## 10. Results

- Gate-level results
  - Can you pass gate-level simulation?
    - yes
  - Cycle time that can pass your gate-level simulation:
    - 3.6 ns
  - Total simulation time:
    - 4685447.578 ns
  - Total cell area:
    - $140038.896131 \mu m^2$
  - Cell area  $\times$  Simulation time:
    - $656144906702.7875 \mu m^2 \cdot ns$
- Transistor-level results
  - Can you pass transistor-level simulation?
    - yes
  - Cycle time that can pass your transistor-level simulation:
    - 3.6 ns
  - Total simulation time:
    - 4685447.569 ns
  - Total cell area:
    - $147578.746 \mu m^2$
  - Cell area  $\times$  Simulation time:
    - $691472476681.7686 \mu m^2 \cdot ns$
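The two area-time products above can be reproduced from the reported cell areas and simulation times. A minimal sketch, with a hypothetical helper name `area_time_product`:

```python
import math


def area_time_product(cell_area_um2: float, sim_time_ns: float) -> float:
    """Cell area (um^2) multiplied by total simulation time (ns)."""
    return cell_area_um2 * sim_time_ns


# Values copied from the gate-level and transistor-level results above.
gate = area_time_product(140038.896131, 4685447.578)
transistor = area_time_product(147578.746, 4685447.569)

print(f"gate-level:       {gate:.4f} um^2*ns")
print(f"transistor-level: {transistor:.4f} um^2*ns")
print(f"transistor/gate:  {transistor / gate:.4f}")
```

The simulation times are nearly identical at both levels, so the difference between the two products comes almost entirely from the larger core area after place and route.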

## 11. Screenshots

- RTL Pass

```
b5902086@cad30:~/project/RTL>
-----
reset==0
busy==1
^CSimulation interrupted at 2616355800 PS + 1
ncsim>
ncsim: *W,CMUSEX: Control-D in interactive input - one more to exit.
ncsim> exit
`[[A
(7.23 s) Sat Jun 20 06:17:48
(234)#+-/project/RTL7
(CAD)b5902086@cad30:[0]$ ncverilog testfixture.v RNN.v
ncverilog: 15.20-s039: (c) Copyright 1995-2017 Cadence Design Systems, Inc.
Loading snapshot worklib:testfixture:v ..... Done
ncsim> source /usr/cad/cadence/INCISIV/cur/tools/inca/files/ncsimrc
ncsim> run
ncsim: *W,DVEXACC: some objects excluded from $dumpvars due to access restrictions, use +access+r on command line for access to all objects.
  File: ./testfixture.v, line = 74, pos = 9
  Scope: testfixture
  Time: 0 FS + 0
-----
START!!! Simulation Start .....
-----
reset==0
busy==1
-----
----- S U M M A R Y -----
Congratulations! All data have been generated successfully! The result is PASS!!
-----
Simulation complete via $finish(1) at time 4685446800 PS + 0
./testfixture.v:171      #(`CYCLE/2); $finish;
ncsim> exit
(8.89 s) Sat Jun 20 06:17:58
(235)#+-/project/RTL7
(CAD)b5902086@cad30:[0]$
```

- Gate-level Area Report

```
b5902086@cad30:~/project/RTL7
*****
Report : area
Design : RNN
Version: N-2017.09-SP2
Date : Sat Jun 20 05:00:08 2020
*****
Library(s) Used:
    slow (File: /home/raid7_2/course/cvsd/CBDK_IC_Contest/CIC/SynopsysDC/db/slow.db)

Number of ports:          98
Number of nets:           8968
Number of cells:          8888
Number of combinational cells: 5788
Number of sequential cells: 3108
Number of macros/black boxes: 0
Number of buf/inv:         849
Number of references:     117

Combinational area:      48559.218547
Buf/Inv area:            4992.053365
Noncombinational area:   91479.677584
Macro/Black Box area:    0.000000
Net Interconnect area:  1097139.088348

Total cell area:         140038.896131
Total area:              1237177.984479

Hierarchical area distribution
-----

```

| Hierarchical cell | Global cell area: absolute total | Global cell area: percent total | Local cell area: combinational | Local cell area: noncombinational | Local cell area: black boxes | Design |
|-------------------|----------------------------------|---------------------------------|--------------------------------|-----------------------------------|------------------------------|--------|
| RNN               | 140038.8961                      | 100.0                           | 48559.2185                     | 91479.6776                        | 0.0000                       | RNN    |
| Total             |                                  |                                 | 48559.2185                     | 91479.6776                        | 0.0000                       |        |

```
1

(10 ms) Sat Jun 20 06:23:57
(237)#/~/project/RTL7
(CAD)b5902086@cad30:[0]$ cat syn/RNN_syn.area.rpt
```

- Gate-level Timing Report

```
b5902086@cad30:~/project/RTL7
    (rising edge-triggered flip-flop clocked by clk)
Endpoint: h_new_reg[31]
    (rising edge-triggered flip-flop clocked by clk)
Path Group: clk
Path Type: max

Des/Clust/Port      Wire Load Model      Library
-----             tsmc13_wl10      slow
Point                Incr      Path
-----               -----
clock clk (rise edge)      0.00      0.00
clock network delay (ideal) 0.50      0.50
adder_40_reg[25]/CK (DFFQX2) 0.00 # 0.50 r
adder_40_reg[25]/Q (DFFQX2) 0.31      0.81 f
U2486/Y (NOR2X1)           0.57      1.38 r
U4487/Y (NOR2X2)           0.32      1.70 f
U4490/Y (NAND2X2)          0.28      1.98 r
U4491/Y (CLKINVX1)         0.33      2.30 f
U2945/Y (NOR2X1)           0.35      2.65 r
U2399/Y (OA12X1)           0.25      2.90 f
U2858/Y (AO12X1)           0.28      3.18 r
U3874/Y (XOR2X1)           0.26      3.44 f
U6417/Y (NOR2BX1)          0.26      3.70 f
h_new_reg[31]/D (DFFQX2)   0.00      3.70 f
data arrival time          3.70
-----               -----
slack (MET)                  0.00
-----               -----
```

```
1

(11 ms) Sat Jun 20 06:24:16
(238)#/~/project/RTL7
(CAD)b5902086@cad30:[0]$ cat syn/RNN_syn.timing.rpt
```

- Gate-level Pass

```
b5902086@cad30:~/project/RTL7
reset==0
busy==1
^CSimulation interrupted at 33857496 PS + 1
ncsim> ?[A^C^C
ncsim> ^C^C
ncsim>
ncsim: *W,CMUSEX: Control-D in interactive input - one more to exit.
ncsim> exit

(6.49 s) Sat Jun 20 06:02:42
(229)#/./project/RTL7
(CAD)b5902086@cad30:~]$ ncverilog testfixture.v RNN_syn.v -v syn/tsmc13_neg.v +define+SDF
ncverilog: 15.20-s039: (c) Copyright 1995-2017 Cadence Design Systems, Inc.
Loading snapshot worklib:testfixture:v ..... Done
ncsim> source /usr/cad/cadence/INCISIV/cur/tools/inca/files/ncsimrc
ncsim> run
ncsim: *W,DVEXACC: some objects excluded from $dumpvars due to access restrictions, use +access+r on command line for access to all objects.
File: ./testfixture.v, line = 74, pos = 9
Scope: testfixture
Time: 0 FS + 0

-----
START!!! Simulation Start ......

-----
reset==0
busy==1

-----
----- S U M M A R Y -----
Congratulations! All data have been generated successfully! The result is PASS!!

-----
Simulation complete via $finish(1) at time 4685447578 PS + 0
./testfixture.v:171      #(`CYCLE/2); $finish;
ncsim> exit

(5 min 54 s) Sat Jun 20 06:08:37
(230)#/./project/RTL7
(CAD)b5902086@cad30:~]$ ncverilog testfixture.v RNN_syn.v -v syn/tsmc13_neg.v +define+SDF
```

- Transistor-level Floorplan



```

b5902086@cad30:~/project/RTL7/layout
# Design Mode: 90nm
# Analysis Mode: MMMC Non-OCV
# Parasitics Mode: No SPEF/RCDB
# Signoff Settings: SI OFF
#####
##### calculate delays in BcWC mode...
Start delay calculation (fullDC) (32 T). (MEM=2629.99)
AAE DB initialization (MEM=2649.74 CPU=0:00:00.1 REAL=0:00:00.0)
AAE_INFO: Cdb files are:
    ./library/celtic/slow.cdb
    ./library/celtic/fast.cdb

Total number of fetched objects 10135
AAE_INFO: Total number of nets for which stage creation was skipped for all views 0
End delay calculation. (MEM=4654.04 CPU=0:00:02.4 REAL=0:00:00.0)
End delay calculation (fullDC). (MEM=4498.67 CPU=0:00:04.1 REAL=0:00:02.0)
*** Done Building Timing Graph (cpu=0:00:05.4 real=0:00:03.0 totSessioncpu=0:01:08 mem=2923.0M)

timeDesign Summary
-----
Setup views included:
av_func_mode_max av_scan_mode_max

+-----+-----+-----+
| Setup mode | all | reg2reg | default |
+-----+-----+-----+
| WNS (ns): | 0.984 | 1.158 | 0.984 |
| TNS (ns): | 0.000 | 0.000 | 0.000 |
| Violating Paths: | 0 | 0 | 0 |
| All Paths: | 4317 | 4221 | 128 |
+-----+-----+-----+

Density: 94.891%
-----
Set Using Default Delay Limit as 1000.
Resetting back High Fanout Nets as non-ideal
Set Default Net Delay as 1000 ps.
Set Default Net Load as 0.5 pF.
Reported timing to dir timingReports
Total CPU time: 7.49 sec
Total Real time: 3.0 sec
Total Memory Usage: 2645.566406 Mbytes
innovus 1> 

```

- Transistor-level Full placement



```
b5902086@cad30:~/project/RTL/layout -
mis timing accuracy. To resolve this, check parasitics for completeness, re-extraction may be required.
**WARN: (IMPESI-3014): The RC network is incomplete for net mdata_w[17]. As a result, a lumped model will be used during delay calculation which may compromise timing accuracy. To resolve this, check parasitics for completeness, re-extraction may be required.
**WARN: (IMPESI-3014): The RC network is incomplete for net mdata_w[9]. As a result, a lumped model will be used during delay calculation which may compromise timing accuracy. To resolve this, check parasitics for completeness, re-extraction may be required.
Total number of fetched objects 10172
AAE_INFO: Total number of nets for which stage creation was skipped for all views 0
End delay calculation. (MEM=5581.98 CPU=0:00:02.3 REAL=0:00:00.0)
End delay calculation (fullDC). (MEM=5581.98 CPU=0:00:02.5 REAL=0:00:00.0)
*** Done Building Timing Graph (cpu=0:00:03.6 real=0:00:01.0 totSessionCpu=0:14:28 mem=3776.6M)

timeDesign Summary
-----
Setup views included:
av_func_mode_max

+-----+-----+-----+-----+
| Setup mode | all | reg2reg | default |
+-----+-----+-----+-----+
| WNS (ns): | 0.267 | 0.267 | 0.701 |
| TNS (ns): | 0.000 | 0.000 | 0.000 |
| Violating Paths: | 0 | 0 | 0 |
| All Paths: | 4317 | 4221 | 128 |
+-----+-----+-----+-----+

+-----+-----+-----+-----+
| DRVs | Real | | Total |
| | Nr nets(terms) | Worst Vio | Nr nets(terms) |
+-----+-----+-----+-----+
| max_cap | 0 (0) | 0.000 | 0 (0) |
| max_tran | 0 (0) | 0.000 | 0 (0) |
| max_fanout | 0 (0) | 0 | 0 (0) |
| max_length | 0 (0) | 0 | 0 (0) |
+-----+-----+-----+-----+

Density: 95.619%
Routing Overflow: 0.00% H and 0.00% V
-----
Reported timing to dir timingReports
Total CPU time: 7.36 sec
Total Real time: 2.0 sec
Total Memory Usage: 3785.566406 Mbytes
innovus 1>
```



- Transistor-level Power Ring



- Transistor-level Power Stripe



- Transistor-level CTS



- Transistor-level Special Route



- Transistor-level Nano Route



```

[1 b5902086@cad30:~/project/RTL7/layout]
End delay calculation. (MEM=5648.56 CPU=0:00:06.1 REAL=0:00:00.0)
End delay calculation (fullDC) (MEM=5648.56 CPU=0:00:06.8 REAL=0:00:01.0)
Loading CTE timing window with TwFlowType 0...(CPU = 0:00:00.0, REAL = 0:00:00.0, MEM = 3807.9M)
Add other clocks and setupCteToAAEClockMapping during iter 1
Loading CTE timing window is completed (CPU = 0:00:00.2, REAL = 0:00:01.0, MEM = 3807.9M)
Starting SI iteration 2
Calculate late delays in OCV mode...
Calculate early delays in OCV mode...
Calculate late delays in OCV mode...
Calculate early delays in OCV mode...
Start delay calculation (fullDC) (32 T). (MEM=3816)
Glitch Analysis: View av_func_mode_min -- Total Number of Nets Skipped = 0.
Glitch Analysis: View av_func_mode_min -- Total Number of Nets Analyzed = 98.
Glitch Analysis: View av_scan_mode_min -- Total Number of Nets Skipped = 0.
Glitch Analysis: View av_scan_mode_min -- Total Number of Nets Analyzed = 98.
Total number of fetched objects 10172
AAE_INFO: Total number of nets for which stage creation was skipped for all views 0
AAE_INFO-618: Total number of nets in the design is 9057, 0.0 percent of the nets selected for SI analysis
End delay calculation. (MEM=5609.34 CPU=0:00:00.3 REAL=0:00:00.0)
End delay calculation (fullDC). (MEM=5609.34 CPU=0:00:00.4 REAL=0:00:00.0)
*** Done Building Timing Graph (cpu=0:00:10.0 real=0:00:02.0 totSessionCpu=0:19:51 mem=5609.3M)

-----
timeDesign Summary
-----

Hold views included:
av_func_mode_min av_scan_mode_min

+-----+-----+-----+-----+
| Hold mode | all | reg2reg | default |
+-----+-----+-----+-----+
| WNS (ns): | 0.038 | 0.038 | 0.765 |
| TNS (ns): | 0.000 | 0.000 | 0.000 |
| Violating Paths: | 0 | 0 | 0 |
| All Paths: | 4317 | 4221 | 128 |
+-----+-----+-----+-----+
Density: 95.619%
Reported timing to dir timingReports
Total CPU time: 14.36 sec
Total Real time: 6.0 sec
Total Memory Usage: 3573.363281 Mbytes
Reset AAE Options
innovus 7> *** Starting Verify Geometry (MEM: 3573.4) ***
**WARN: (IMPVFG-257): verifyGeometry command is replaced by verify_drc command. It still works in this release but will be removed in future release. Please update your script to use the new command.
VERIFY GEOMETRY ..... Starting Verification
VERIFY GEOMETRY ..... Initializing
VERIFY GEOMETRY ..... Deleting Existing Violations
VERIFY GEOMETRY ..... Creating Sub-Areas
..... bin size: 8320
**WARN: (IMPVFG-198): Area to be verified is small to see any runtime gain from multi-cpus. Use setMultiCpuUsage command to adjust the number of CPUs.
VERIFY GEOMETRY ..... SubArea : 1 of 1
VERIFY GEOMETRY ..... Cells : 0 Viols.
VERIFY GEOMETRY ..... SameNet : 0 Viols.
VERIFY GEOMETRY ..... Wiring : 0 Viols.
VERIFY GEOMETRY ..... Antenna : 0 Viols.
VERIFY GEOMETRY ..... Sub-Area : 1 complete 0 Viols. 0 Wrngs.
VG: elapsed time: 8.00
Begin Summary ...
Cells : 0
SameNet : 0
Wiring : 0
Antenna : 0
Short : 0
Overlap : 0
End Summary

Verification Complete : 0 Viols. 0 Wrngs.

*****End: VERIFY GEOMETRY*****
*** verify geometry (CPU: 0:00:07.7 MEM: 110.9M)
innovus 7> 

```

- Transistor-level Summary

```

[1 b5902086@cad30:~/project/RTL7]
BUFX12 1.023840
BUFX12 1.023840
BUFX8 0.682560
BUFX8 0.682560
BUFX6 0.511920
BUFX6 0.511920
BUFX4 0.341280
BUFX4 0.341280
BUFX3 0.259680
BUFX3 0.259680
BUFX2 0.170640
BUFX2 0.170640 1054
# Cells in lib with max_fanout: 0
SDC max_cap: N/A
SDC max_tran: N/A
SDC max_fanout: N/A
Default Ext. Scale Factor: 1.000
Detail Ext. Scale Factor: 1.000

=====
Floorplan/Placement Information
=====
Total area of Standard cells: 141113.349 um^2
Total area of Standard cells(Subtracting Physical Cells): 141113.349 um^2
Total area of Macros: 0.000 um^2
Total area of Blockages: 0.000 um^2
Total area of Pad cells: 0.000 um^2
Total area of Core: 147578.746 um^2
Total area of Chip: 215630.152 um^2
Effective Utilization: 9.5619e-01
Number of Cell Rows: 104
% Pure Gate Density #1 (Subtracting BLOCKAGES): 95.619%
% Pure Gate Density #2 (Subtracting BLOCKAGES and Physical Cells): 95.619%
% Pure Gate Density #3 (Subtracting MACROS): 95.619%
% Pure Gate Density #4 (Subtracting MACROS and Physical Cells): 95.619%
% Pure Gate Density #5 (Subtracting MACROS and BLOCKAGES): 95.619%
% Pure Gate Density #6 ((Unplaced Standard Inst + Unplaced Block Blob Inst + Fixed Clock Inst Area) / (Free Site Area + Fixed Clock Inst Area)) for insts are placed: 95.619%
% Core Density (Counting Std Cells and MACROs): 95.619%
% Core Density #2(Subtracting Physical Cells): 95.619%
% Chip Density (Counting Std Cells and MACROs and IOs): 65.4442%
% Chip Density #2(Subtracting Physical Cells): 65.4442%
# Macros within 5 sites of IO pad: No
Macro halo defined?: No

"layout/summaryReport.rpt" 5393L, 228589C
5346,1 99% 

```

- Transistor-level Pass

```
b5902086@cad30:~/project/RTL7

-----
reset==0
busy==1
^CSimulation interrupted at 28089673 PS + 1
ncsim>
ncsim: *W,CMUSEX: Control-D in interactive input - one more to exit.
ncsim> exit

(5.91 s) Sat Jun 20 06:09:46
(232)#/~/project/RTL7
(CAD)b5902086@cad30:[0]$ ncverilog testfixture.v layout/RNN_APP.v -v syn/tsmc13_neg.v +define+SDF +ncmaxdelays
ncverilog: 15.20-s039: (c) Copyright 1995-2017 Cadence Design Systems, Inc.
Loading snapshot worklib:testfixture:v ..... Done
ncsim> source /usr/cad/cadence/INCISIV/cur/tools/inca/files/ncsimrc
ncsim> run
ncsim: *W,DVEXACC: some objects excluded from $dumpvars due to access restrictions, use +access+r on command line for access to all objects.
      File: ./testfixture.v, line = 74, pos = 9
      Scope: testfixture
      Time: 0 FS + 0

-----
START!!! Simulation Start ......

-----
reset==0
busy==1

-----
----- S U M M A R Y -----
Congratulations! All data have been generated successfully! The result is PASS!!

-----
Simulation complete via $finish(1) at time 4685447569 PS + 0
./testfixture.v:171      #(`CYCLE/2); $finish;
ncsim> exit

(7 min 19 s) Sat Jun 20 06:17:06
(233)#/~/project/RTL7
(CAD)b5902086@cad30:[0]$ ncverilog testfixture.v layout/RNN_APP.v -v syn/tsmc13_neg.v +define+SDF +ncmaxdelays,
```