

# Final project - Recurrent Unit Circuit Design

---

b05902086 周逸

---

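## 1. Design Trade-offs

- I first spent some time deciding which weights to keep on the circuit. The final score is the AT value (area $\times$ cycle count $\times$ cycle time). Storing all the weights on-chip would cut the total cycle count sharply, but the factor saved in cycles would reappear in the area, and such a large circuit would make it hard to reduce the cycle time. In the end I chose to re-read all data from memory every time.
- I originally expected that keeping $B_{ih}$ and $B_{hh}$ on the circuit would trade a small amount of area for some savings in cycle count. After implementing it, however, storing $B_{ih}$ and $B_{hh}$ increased the area by roughly half while the cycle count dropped by only about $\frac{1}{64}$, so it was not worth it.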
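As a rough check on the second trade-off, the sketch below compares the AT product of the two options using only the ratios quoted above. The helper name `at_product` and the normalized baseline of 1 are illustrative assumptions, and the cycle time is assumed to stay the same in both cases.

```python
from fractions import Fraction

def at_product(area, cycles, cycle_time):
    """AT metric: area x cycle count x cycle time."""
    return area * cycles * cycle_time

# Normalized baseline (assumption): re-read everything from memory.
baseline = at_product(Fraction(1), Fraction(1), Fraction(1))

# Biases kept on-chip: about 1.5x area, about 1/64 fewer cycles,
# cycle time assumed unchanged.
with_bias = at_product(Fraction(3, 2), Fraction(63, 64), Fraction(1))

ratio = with_bias / baseline
print(float(ratio))  # 1.4765625
```

Under these assumptions the AT value grows by almost half, which is consistent with the decision to drop the on-chip bias storage.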

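## 2. Stage

- On closer thought, the states can simply be split according to which of the six memory regions is being read. The design then loops repeatedly through five of those states.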
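The state sequence described above can be modeled in a few lines. This is only a behavioral sketch: the state names are hypothetical (the report does not name them), and treating the first state as the one that is visited only once is an assumption.

```python
from itertools import islice

# Hypothetical state names, one per memory region being read.
STATES = ["READ_0", "READ_1", "READ_2", "READ_3", "READ_4", "READ_5"]
LOOP_START = 1  # assumption: READ_0 runs once, READ_1..READ_5 repeat

def next_state(state):
    """Advance to the next read state, wrapping back into the five-state loop."""
    idx = STATES.index(state) + 1
    return STATES[idx] if idx < len(STATES) else STATES[LOOP_START]

def trace(start="READ_0"):
    """Yield the endless sequence of visited states."""
    state = start
    while True:
        yield state
        state = next_state(state)

sequence = list(islice(trace(), 12))
print(sequence)
```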

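## 3. Multiplier

- Before writing any code I expected the multiplication to become the critical path, and running Design Compiler confirmed it. I therefore split each large multiplication into several smaller ones and moved the summation into the next cycle, which lowers the cycle time.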
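The idea of splitting a wide multiplication can be illustrated arithmetically. The sketch below is a software model, not the actual RTL: the operand width, the number of chunks, and the unsigned treatment of `b` are all assumptions made for the example.

```python
def split_multiply(a, b, width=16, parts=4):
    """Model a two-cycle multiply: narrow partial products, then a shifted sum.

    Assumes b is unsigned, fits in `width` bits, and width is divisible by parts.
    """
    chunk = width // parts
    mask = (1 << chunk) - 1
    # Cycle 1: each small multiplier handles one chunk of b; results are registered.
    partials = [a * ((b >> (i * chunk)) & mask) for i in range(parts)]
    # Cycle 2: shift each registered partial product into place and add them up.
    return sum(p << (i * chunk) for i, p in enumerate(partials))

print(split_multiply(1234, 5678) == 1234 * 5678)  # True
```

Each partial product only needs a `width`-by-`chunk` multiplier, so the longest combinational path in the first cycle is shorter than that of a single full-width multiplier.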

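## 4. Activation Function

- Once the multiplier delay was reduced, the critical path became the segment from computing the activation function to driving the result onto mdata\_w. This segment has only half a cycle to complete: other computations only need to finish before the next posedge, but outgoing data must be ready before the negedge. I therefore split the activation function and the output into two stages. This adds about 1% to the total cycle count but lets the cycle time go lower, so it is a worthwhile optimization.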
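Whether the extra stage pays off can be checked against the AT metric. The sketch below computes the break-even point for the roughly 1% cycle increase mentioned above; the 5% cycle-time reduction is a hypothetical example value, and the area is assumed unchanged.

```python
def at_change(cycle_ratio, cycle_time_ratio, area_ratio=1.0):
    """Return new AT divided by old AT for the given relative changes."""
    return area_ratio * cycle_ratio * cycle_time_ratio

# About 1% more cycles from the extra stage (figure from the text above).
cycle_ratio = 1.01

# Break-even: the cycle time must shrink below 1/1.01 of its old value.
break_even = 1 / cycle_ratio
print(round(break_even, 4))  # 0.9901

# Hypothetical example: a 5% shorter cycle time, area assumed unchanged.
print(round(at_change(cycle_ratio, 0.95), 4))  # 0.9595
```

Any cycle-time reduction larger than about 1% therefore improves the AT value, as long as the added pipeline register does not noticeably increase the area.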

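## 5. Transistor-level Synthesis

- During transistor-level synthesis, Innovus crashes outright at the NanoRoute stage if the WNS is not large enough. The workaround I found is to run ECO Design first, then open Mode and raise the Setup Slack value under Thresholds. This makes the WNS comfortably positive instead of hovering around 0.000, after which NanoRoute runs without crashing.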

## 6. Synthesized Cycle Time

- Gate-Level: 4.2 ns
- Transistor-Level: 4.2 ns

## 7. Results

- Gate-level results
  - Can you pass gate-level simulation?
    - yes
  - Cycle time that can pass your gate-level simulation:
    - 4.2 ns
  - Total simulation time:
    - 5358831.281 ns
  - Total cell area:
    - 218813.531461 $\mu\text{m}^2$
  - Cell area $\times$ Simulation time:
    - 1172584797099.2847 $\mu\text{m}^2 \cdot \text{ns}$

- Transistor-level results
  - Can you pass transistor-level simulation?
    - yes
  - Cycle time that can pass your transistor-level simulation:
    - 4.2 ns
  - Total simulation time:
    - 5358831.358 ns
  - Total cell area:
    - 243247.604 $\mu\text{m}^2$
  - Cell area $\times$ Simulation time:
    - 1303522888073.5662 $\mu\text{m}^2 \cdot \text{ns}$
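The reported products can be reproduced from the area and simulation-time figures. The sketch below recomputes them with exact decimal arithmetic and also derives two quantities that are not stated in the report: an approximate cycle count (simulation time divided by the 4.2 ns cycle time, so it includes any testbench overhead) and the area increase from gate level to layout.

```python
from decimal import Decimal

CYCLE_TIME_NS = Decimal("4.2")

# Figures copied from the results listed above.
results = {
    "gate": {"area_um2": Decimal("218813.531461"), "sim_ns": Decimal("5358831.281")},
    "transistor": {"area_um2": Decimal("243247.604"), "sim_ns": Decimal("5358831.358")},
}

for level, r in results.items():
    r["at"] = r["area_um2"] * r["sim_ns"]       # um^2 * ns
    r["cycles"] = r["sim_ns"] / CYCLE_TIME_NS   # approximate, includes overhead
    print(level, f"AT={float(r['at']):.4f}", f"cycles~{float(r['cycles']):.0f}")

area_overhead = results["transistor"]["area_um2"] / results["gate"]["area_um2"] - 1
print(f"layout area overhead: {float(area_overhead):.1%}")
```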

## 8. Screenshots

- RTL Pass

```
b5902086@cad29:~/project/RTL
Building instance specific data structures.
Loading native compiled code: ..... Done
Design hierarchy summary:
  Instances Unique
  Modules: 2 2
  Registers: 60 60
  Scalar wires: 6 -
  Vectored wires: 5 -
  Always blocks: 4 4
  Initial blocks: 11 11
  Cont. assignments: 0 6
  Pseudo assignments: 4 4
  Simulation timescale: 10ps
Writing initial simulation snapshot: worklib.testfixture:v
Loading snapshot worklib.testfixture:v ..... Done
ncsim source /usr/cad/cadence/INCISIV/cur/tools/inca/files/ncsimrc
ncsim> run
ncsim: *W,DVEXAC: some objects excluded from $dumpvars due to access restrictions, use +access+r on command line for access to all objects.
  File: ./testfixture.v, line = 74, pos = 9
  Scope: testfixture
  Time: 0 FS + 0
-----
----- START!!! Simulation Start .....
-----
----- reset==0
----- busy==1
-----
----- S U M M A R Y -----
Congratulations! All data have been generated successfully! The result is PASS!
-----
Simulation complete via $finish(1) at time 5358826200 PS + 0
./testfixture.v:171      #(`CYCLE/2); $finish;
ncsim> exit
(6.76 s) Thu Jun 11 04:47:37
(43)#-/project/RTL
(CAD)b5902086@cad29:[0]$ ncverilog testfixture.v RNN.v
```

- Gate-level Area Report

```
b5902086@cad29:~/project/RTL
*****
Report : area
Design : RNN
Version: N-2017.09-SP2
Date  : Wed Jun 10 20:15:46 2020
*****
Library(s) Used:
  slow (File: /home/raid7_2/course/cvsd/CBDK_IC_Contest/CIC/SynopsysDC/db/slow.db)

Number of ports:          98
Number of nets:           17463
Number of cells:          17167
Number of combinational cells: 14510
Number of sequential cells: 2657
Number of macros/black boxes: 0
Number of buf/inv:         4995
Number of references:     176
Combinational area:       148233.940862
Buf/Inv area:             26095.827474
Noncombinational area:    70579.590599
Macro/Black Box area:     0.000000
Net Interconnect area:   1717015.436829
Total cell area:          218813.531461
Total area:                1935828.968290

Hierarchical area distribution
  Global cell area          Local cell area
  Absolute      Percent   Total   Combi- Noncombi- Black-
  Hierarchical cell          Total      Total   national  national  boxes  Design
  -----
  RNN                   218813.5315  100.0  148233.9409  70579.5906  0.0000  RNN
  -----
  Total                  148233.9409  100.0  70579.5906  0.0000
  1
(21 ms) Thu Jun 11 15:40:40
(43)#-/project/RTL
(CAD)b5902086@cad29:[0]$ cat syn/RNN_syn.area.rpt -
```

- Gate-level Timing Report

```
b5902086@cad29:~/project/RTL7
RNN tsmc13_wl10 slow
Point      Incr      Path
-----
clock clk (fall edge)    2.10      2.10
clock network delay (ideal) 0.50      2.60
input external delay       0.20      2.80 f
mdata_r[1] (in)          0.09      2.89 f
U9395/Y (XNOR2X4)        0.18      3.08 r
U9667/Y (NAND2X8)        0.15      3.22 r
U9464/Y (NAND2X8)        0.09      3.31 r
U9461/Y (NAND2X8)        0.07      3.38 f
U9458/Y (NAND2X8)        0.09      3.47 r
U9714/Y (NAND2X8)        0.08      3.55 f
U9697/Y (NAND2X8)        0.09      3.64 r
U9569/Y (NAND2X8)        0.12      3.77 f
U9583/Y (NAND2X8)        0.13      3.89 r
U9573/Y (INVX12)          0.18      3.99 f
U9572/Y (INVX20)          0.07      4.07 r
U9574/Y (NAND2X8)        0.05      4.12 f
U5447/Y (INVX6)           0.08      4.20 r
U9884/Y (NAND2X6)         0.06      4.26 f
U9581/Y (NAND2X6)         0.08      4.40 f
U8324/Y (NAND2X4)         0.07      4.47 r
h_new_reg[40]/D (DFFHQX8) 0.00      4.47 r
data arrival time          0.00      4.47
-----  

clock clk (rise edge)     4.20      4.20
clock network delay (ideal) 0.50      4.70
clock uncertainty         -0.10      4.60
h_new_reg[40]/CK (DFFHQX8) 0.00      4.60 r
library setup time         -0.13      4.47
data required time          0.00      4.47
-----  

data required time                    4.47
data arrival time                    -4.47
-----  

slack (MET)                           0.00
1
(245 ms) Thu Jun 11 15:41:22
(4)#/~/project/RTL7
(CAD)b5902086@cad29:[0]$ cat syn/RNN_syn.timing.rpt
```
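The slack in this report can be rechecked by hand. The sketch below redoes the arithmetic with the values printed in the report; the variable names are illustrative. It also shows how little time is left for logic on this path, because the data is launched relative to the falling clock edge and captured at the next rising edge.

```python
from decimal import Decimal as D

# Values copied from the timing report above (ns).
period = D("4.20")
launch_edge = D("2.10")   # falling clock edge, half of the 4.20 ns period
clock_delay = D("0.50")   # ideal clock network delay
input_delay = D("0.20")   # external delay on the input port
uncertainty = D("0.10")   # clock uncertainty
setup = D("0.13")         # library setup time of the capture flop

data_valid = launch_edge + clock_delay + input_delay
required = period + clock_delay - uncertainty - setup
logic_budget = required - data_valid

arrival = D("4.47")       # reported data arrival time
slack = required - arrival
print(data_valid, required, logic_budget, slack)  # 2.80 4.47 1.67 0.00
```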

- Gate-level Pass

```
b5902086@cad29:~/project/RTL7
Warning! Timing violation
$setuphold<setup>{ posedge CK && (flag == 1):6300 PS, negedge D:6205 PS, 0.152 : 152 PS, -0.077 : -77 PS };
File: ./syn/tsmc13_neg.v, line = 23468
Scope: testfixture.u_RNN.\mul_34_reg[6]
Time: 6300 PS

Warning! Timing violation
$setuphold<setup>{ posedge CK && (flag == 1):6300 PS, negedge D:6205 PS, 0.152 : 152 PS, -0.077 : -77 PS };
File: ./syn/tsmc13_neg.v, line = 23468
Scope: testfixture.u_RNN.\mul_20_reg[6]
Time: 6300 PS

Warning! Timing violation
$setuphold<setup>{ posedge CK && (flag == 1):6300 PS, negedge D:6117 PS, 0.265 : 265 PS, -0.087 : -87 PS };
File: ./syn/tsmc13_neg.v, line = 18964
Scope: testfixture.u_RNN.\mul_20_reg[3]
Time: 6300 PS

Warning! Timing violation
$setuphold<setup>{ posedge CK && (flag == 1):6300 PS, negedge D:6148 PS, 0.187 : 187 PS, -0.108 : -108 PS };
File: ./syn/tsmc13_neg.v, line = 23418
Scope: testfixture.u_RNN.\mul_02_reg[3]
Time: 6300 PS

reset==0
busy==1
-----  

----- S U M M A R Y -----  

Congratulations! All data have been generated successfully! The result is PASS!!  

-----  

Simulation complete via $finish(1) at time 5358831281 PS + 0
./testfixture.v:171      #(`CYCLE/2); $finish;
ncsim> exit
(8 min 56 s) Thu Jun 11 04:28:40
(40)#/~/project/RTL7
(CAD)b5902086@cad29:[0]$ ncverilog testfixture.v RNN_syn.v -v syn/tsmc13_neg.v +define+SDF
```

- Transistor-level Floorplan



- Transistor-level Full placement



- Transistor-level Power Ring



- Transistor-level Power Stripe



```

[1 b5902086@cad29:~/project/RTL7/layout]
innovus 1> **WARN: (IMPCM-125): Option "-checkPinLayerForAccess" for command getPlaceMode is obsolete and will be ignored. It no longer has any effect and should be removed from your script.
*** Starting refinePlace (0:08:47 mem=3749.9M) ***
Density distribution unevenness ratio = 3.152%
Movement: Detail placement moves 0 insts, mean move: 0.00 um, max move: 0.00 um
Runtime: CPU: 0:00:00.2 REAL: 0:00:00.0 MEM: 3749.9MB
Summary Report:
Instances move: 0 (out of 17203 movable)
Instances Flipped: 0
Mean displacement: 0.00 um
Max displacement: 0.00 um
Runtime: CPU: 0:00:00.3 REAL: 0:00:00.0 MEM: 3749.9MB
*** Finished refinePlace (0:08:47 mem=3749.9M) ***
Density distribution unevenness ratio = 3.152%
innovus 1> *** Starting Verify Geometry (MEM: 3769.9) ***
**WARN: (IMPVG-257): verifyGeometry command is replaced by verify_drc command. It still works in this release but will be removed in future release. Please update your script to use the new command.
VERIFY GEOMETRY ..... Starting Verification
VERIFY GEOMETRY ..... Initializing
VERIFY GEOMETRY ..... Deleting Existing Violations
VERIFY GEOMETRY ..... Creating Sub-Areas
..... bin size: 8320
**WARN: (IMPVG-198): Area to be verified is small to see any runtime gain from multi-cpus. Use setMultiCpuUsage command to adjust the number of CPUs.
VERIFY GEOMETRY ..... SubArea : 1 of 1
VERIFY GEOMETRY ..... Cells : 0 Viols.
VERIFY GEOMETRY ..... SameNet : 0 Viols.
VERIFY GEOMETRY ..... Wiring : 0 Viols.
VERIFY GEOMETRY ..... Antenna : 0 Viols.
VERIFY GEOMETRY ..... Sub-Area : 1 complete 0 Viols. 0 Wrngs.
VG: elapsed time: 7.00
Begin Summary ...
Cells : 0
SameNet : 0
Wiring : 0
Antenna : 0
Short : 0
Overlap : 0
End Summary

Verification Complete : 0 Viols. 0 Wrngs.

*****End: VERIFY GEOMETRY*****
*** verify geometry (CPU: 0:00:06.2 MEM: 1.0M)

innovus 1>

```

```

[1 b5902086@cad29:~/project/RTL7/layout]
iso timing accuracy. To resolve this, check parasitics for completeness, re-extraction may be required.
**WARN: (IMPEI-3014): The RC network is incomplete for net mdata_w[0]. As a result, a lumped model will be used during delay calculation which may compromise timing accuracy. To resolve this, check parasitics for completeness, re-extraction may be required.
**WARN: (IMPEI-3014): The RC network is incomplete for net maddr[12]. As a result, a lumped model will be used during delay calculation which may compromise timing accuracy. To resolve this, check parasitics for completeness, re-extraction may be required.
Total Number of nets fetched: 17499
AAE INFO: Total number of nets for which stage creation was skipped for all views 0
End delay calculation. (MEM=5448.45 CPU=0:00:03.3 REAL=0:00:01.0)
End delay calculation (FullDC) (MEM=5448.45 CPU=0:00:03.5 REAL=0:00:01.0)
*** Done Building Timing Graph (cpu=0:00:04.4 real=0:00:01.0 totSessionCpu=0:09:02 mem=3657.0M)

-----
timeDesign Summary

Setup views included:
av_func_mode_max



| Setup mode       | all   | regReg | default |
|------------------|-------|--------|---------|
| WNS (ns):        | 0.338 | 0.374  | 0.338   |
| TNS (ns):        | 0.000 | 0.000  | 0.000   |
| Violating Paths: | 0     | 0      | 0       |
| All Paths:       | 2700  | 2657   | 2699    |



| DRVs       | Real           |           | Total          |
|------------|----------------|-----------|----------------|
|            | Nr nets(terms) | Worst Vio | Nr nets(terms) |
| max_cap    | 0 (0)          | 0.000     | 0 (0)          |
| max_tran   | 0 (0)          | 0.000     | 0 (0)          |
| max_fanout | 0 (0)          | 0         | 0 (0)          |
| max_length | 0 (0)          | 0         | 0 (0)          |



Density: 90.326%
Routing Overflow: 0.00% H and 0.00% V

Reported timing to dir timingReports
Total CPU time: 8.41 sec
Total Real time: 3.0 sec
Total Memory Usage: 3662.050781 Mbytes
innovus 1>

```

- Transistor-level CTS



```

b5902086@cad29:~/project/RTL7/layout$ timeDesign Summary
Setup views included:
av_func_mode_max

+-----+-----+-----+-----+
| Setup mode | all | regReg | default |
+-----+-----+-----+-----+
| WNS (ns): | 0.335 | 0.371 | 0.335 |
| TNS (ns): | 0.000 | 0.000 | 0.000 |
| Violating Paths: | 0 | 0 | 0 |
| All Paths: | 2700 | 2657 | 2699 |
+-----+-----+-----+-----+
+-----+-----+-----+
| Real | Total |
| Nr nets(terms) | Worst Vio | Nr nets(terms) |
+-----+-----+-----+
| max_cap | 0 (0) | 0.000 | 0 (0) |
| max_tran | 0 (0) | 0.000 | 0 (0) |
| max_fanout | 0 (0) | 0 | 0 (0) |
| max_length | 0 (0) | 0 | 0 (0) |
+-----+-----+-----+
Density: 90.328%
Routing Overflow: 0.00% H and 0.00% V
Reported timing to dir timingReports
Total CPU time: 7.12 sec
Total Real time: 3.0 sec
Total Memory Usage: 3942.777344 Mbytes
innovus 6>

b5902086@cad29:~/project/RTL7/layout$ timeDesign Summary
Hold views included:
av_func_mode_min av_scan_mode_min

+-----+-----+-----+-----+
| Hold mode | all | regReg | default |
+-----+-----+-----+-----+
| WNS (ns): | 0.043 | 0.043 | 0.639 |
| TNS (ns): | 0.000 | 0.000 | 0.000 |
| Violating Paths: | 0 | 0 | 0 |
| All Paths: | 2700 | 2657 | 2699 |
+-----+-----+-----+-----+
Density: 90.328%
Routing Overflow: 0.00% H and 0.00% V
Reported timing to dir timingReports
Total CPU time: 7.75 sec
Total Real time: 2.0 sec
Total Memory Usage: 3553.75 Mbytes
innovus 6>

```

- Transistor-level Special Route



```

b5902086@cad29:~/project/RTL7/layout
Number of Core ports routed: 318
Number of Followpin connections: 159
End power routing: cpu: 0:00:01, real: 0:00:01, peak: 440.00 megs.

Begin updating DB with routing results ...
Updating DB with 0 via definition ...
route created 477 wires.
ViaGen created 1590 vias; deleted 0 via to avoid violation.



| Layer  | Created | Deleted |
|--------|---------|---------|
| METAL1 | 477     | NA      |
| VIA12  | 318     | 0       |
| VIA23  | 318     | 0       |
| VIA34  | 318     | 0       |
| VIA45  | 318     | 0       |
| VIA56  | 318     | 0       |



Innovus 6>
Innovus 6> VERIFY_CONNECTIVITY use new engine.

***** Start: VERIFY_CONNECTIVITY *****
Start Time: Thu Jun 11 04:06:25 2020

Design Name: RNN
Database Units: 2000
Design Boundary: (0.0000, 0.0000) (497.2600, 663.3800)
Error Limit = 1000; Warning Limit = 50
Check specified nets
Use 32 pthreads

Begin Summary
Found no problems or warnings.
End Summary

End Time: Thu Jun 11 04:06:26 2020
Time Elapsed: 0:00:01.0

***** End: VERIFY_CONNECTIVITY *****
Verification Complete : 0 Viols. 0 Wrngs.
(CPU Time: 0:00:01.5 MEM: 20.008M)

Innovus 6>

```

- Transistor-level Nano Route



```

[1 b5902086@cad29:~/project/RTL7/layout]
End delay calculation. (MEM=5637.09 CPU=0:00:08.3 REAL=0:00:00.0)
End delay calculation (fullDC) (MEM=5637.09 CPU=0:00:09.0 REAL=0:00:00.0)
Loading CTE timing window with TwFlowType 0...(CPU = 0:00:00.0, REAL = 0:00:00.0, MEM = 3910.1M)
Add other locks and setupCteToAAEClockMapping during iter 1
Loading CTE timing window is completed (CPU = 0:00:00.2, REAL = 0:00:00.0, MEM = 3910.1M)
Starting ST iteration 2
Calculate late delays in OCV mode...
Calculate early delays in OCV mode...
Calculate late delays in OCV mode...
Calculate early delays in OCV mode...
Start delay calculation (fullDC) (32 T). (MEM=3918.21)
Glitch Analysis: View av_func_mode_min -- Total Number of Nets Skipped = 0.
Glitch Analysis: View av_func_mode_min -- Total Number of Nets Analyzed = 98.
Glitch Analysis: View av_scan_mode_min -- Total Number of Nets Skipped = 0.
Glitch Analysis: View av_scan_mode_min -- Total Number of Nets Analyzed = 98.
Total number of fetched objects: 17500
AAE INFO: Total number of nets for which stage creation was skipped for all views 0
AAE INFO-618: Total number of nets in the design is 17529, 0.0 percent of the nets selected for ST analysis
End delay calculation. (MEM=5730.63 CPU=0:00:00.3 REAL=0:00:00.0)
End delay calculation (fullDC). (MEM=5730.63 CPU=0:00:00.4 REAL=0:00:00.0)
*** Done Building Timing Graph (cpu=0:00:12.7 real=0:00:02.0 totSessionCpu=0:15:08 mem=5730.6M)

-----
timeDesign Summary
-----

Hold views included:
av_func_mode_min av_scan_mode_min

+-----+-----+-----+
| Hold mode | all | reg/reg | default |
+-----+-----+-----+
| WNS (ns): | 0.042 | 0.042 | 0.654 |
| TNS (ns): | 0.000 | 0.000 | 0.000 |
| Violating Paths: | 0 | 0 | 0 |
| All Paths: | 2700 | 2657 | 2699 |
+-----+-----+-----+
Density: 90.328%
Reported timing to dir timingReports
Total CPU time: 18.93 sec
Total Real time: 8.0 sec
Total Memory Usage: 3588.609375 Mbytes
Reset AAE Options
innovus 7>

```

```

[1 b5902086@cad29:~/project/RTL7/layout]
+-----+-----+-----+
| WNS (ns): | 0.042 | 0.042 | 0.654 |
| TNS (ns): | 0.000 | 0.000 | 0.000 |
| Violating Paths: | 0 | 0 | 0 |
| All Paths: | 2700 | 2657 | 2699 |
+-----+-----+-----+
Density: 90.328%
Reported timing to dir timingReports
Total CPU time: 18.93 sec
Total Real time: 8.0 sec
Total Memory Usage: 3588.609375 Mbytes
Reset AAE Options
innovus 7> *** Starting Verify Geometry (MEM: 3588.6) ***
**WARN: (IMPVG-257): verifyGeometry command is replaced by verify_drc command. It still works in this release but will be removed in future release. Please update your script to use the new command.
VERIFY GEOMETRY ..... Starting Verification
VERIFY GEOMETRY ..... Initializing
VERIFY GEOMETRY ..... Deleting Existing Violations
VERIFY GEOMETRY ..... Creating Sub-Areas
..... bin size: 8320
**WARN: (IMPVG-198): Area to be verified is small to see any runtime gain from multi-cpus. Use setMultiCpuUsage command to adjust the number of CPUs.
VERIFY GEOMETRY ..... SubArea : 1 of 1
VERIFY GEOMETRY ..... Cells : 0 Viols.
VERIFY GEOMETRY ..... SameNet : 0 Viols.
VERIFY GEOMETRY ..... Wiring : 0 Viols.
VERIFY GEOMETRY ..... Antenna : 0 Viols.
VERIFY GEOMETRY ..... Sub-Area : 1 complete 0 Viols. 0 Wrngs.
VG: elapsed time: 12.00
Begin Summary ...
Cells : 0
SameNet : 0
Wiring : 0
Antenna : 0
Short : 0
Overlap : 0
End Summary
Verification Complete : 0 Viols. 0 Wrngs.
*****End: VERIFY GEOMETRY*****
*** verify geometry (CPU: 0:00:12.2 MEM: 302.5M)
innovus 7>

```

- Transistor-level Pass

```

[1 b5902086@cad29:~/project/RTL7]
Warning! Timing violation
$setupHoldHold(< posedge CK && (flag == 1):6300 PS, negedge D:6231 PS, 0.253 : 253 PS, -0.066 : -66 PS );
File: ./syn/tsmc13.neg.v, line = 18064
Scope: testfixture.u_RNN.\mul_32_reg[0]
Time: 6317 PS

Warning! Timing violation
$setupHoldHold(< posedge CK && (flag == 1):6300 PS, negedge D:6232 PS, 0.252 : 252 PS, -0.066 : -66 PS );
File: ./syn/tsmc13.neg.v, line = 18064
Scope: testfixture.u_RNN.\mul_12_reg[0]
Time: 6318 PS

Warning! Timing violation
$setupHoldHold(< posedge CK && (flag == 1):6300 PS, negedge D:6233 PS, 0.253 : 253 PS, -0.066 : -66 PS );
File: ./syn/tsmc13.neg.v, line = 18064
Scope: testfixture.u_RNN.\mul_12_reg[0]
Time: 6319 PS

Warning! Timing violation
$setupHoldHold(< posedge CK && (flag == 1):6300 PS, negedge D:6233 PS, 0.252 : 252 PS, -0.066 : -66 PS );
File: ./syn/tsmc13.neg.v, line = 18064
Scope: testfixture.u_RNN.\mul_03_reg[0]
Time: 6319 PS

reset==0
busy==1

-----
----- S U M M A R Y -----
Congratulations! All data have been generated successfully! The result is PASS!

-----
Simulation complete via $finish(1) at time 5358831358 PS + 0
./testfixture.v:171      #(`CYCLE/2); $finish;
ncsim> exit
(10 min 52 s) Thu Jun 11 04:41:53
(42) #~/project/RTL7
(CAD)b5902086@cad29:[0]$ ncverilog testfixture.v layout/RNN_APP.v -v syn/tsmc13.neg.v +define+SDF +ncmaxdelays

```

