

## High Level Block Diagram





## Design Decision

No pipelining was used. With the architecture we were given, the rows of the transformed matrix had to be loaded one at a time. Because the COO entries can reference any row of the transformed matrix, in arbitrary order, the aggregation stage has to wait until the whole transformed matrix is complete.

If the column values of the matrix could have been used instead, I could have pipelined the design. It is also possible that a pipelining scheme exists for this architecture that I did not find.

| Stage                 |             |              |              |              |             |              |             |              |             |              |              |             |
|-----------------------|-------------|--------------|--------------|--------------|-------------|--------------|-------------|--------------|-------------|--------------|--------------|-------------|
| Input                 | FM0<br>Wgt0 | FM1<br>Wgt0  | FM2<br>Wgt0  | .....        | FM5<br>Wgt0 | FM0<br>Wgt1  | FM1<br>Wgt1 | .....        | FM5<br>Wgt1 | FM0<br>Wgt2  | .....        | FM5<br>Wgt2 |
| Matrix multiplication |             | FM0*<br>Wgt0 | FM1*<br>Wgt0 | FM2*<br>Wgt0 | .....       | FM5*<br>Wgt0 | FM0*<br>Wgt1 | FM1*<br>Wgt1 | .....       | FM5*<br>Wgt1 | FM0*<br>Wgt2 | .....       |
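To see why aggregation cannot start early under this schedule, the loop order in the table can be modeled in a few lines of Python. This is a sketch, not the RTL: it assumes `FMi` is row *i* of the feature matrix and `Wgtj` is column *j* of the weight matrix, so each step produces one element of the transformed matrix. The function names are hypothetical.

```python
# Hypothetical model of the transformation schedule shown in the table.
# Assumption: FMi = row i of the feature matrix, Wgtj = column j of the
# weight matrix; each step computes one dot product FMi * Wgtj.
from typing import Dict, List, Tuple

NUM_FM_ROWS = 6   # FM0..FM5 in the table
NUM_WGT_COLS = 3  # Wgt0..Wgt2 in the table


def transformation_schedule(num_rows: int = NUM_FM_ROWS,
                            num_cols: int = NUM_WGT_COLS) -> List[Tuple[int, int]]:
    """(row, col) handled at each step: weight column outer, FM row inner."""
    return [(i, j) for j in range(num_cols) for i in range(num_rows)]


def row_completion_step(schedule: List[Tuple[int, int]]) -> Dict[int, int]:
    """Step at which each transformed-matrix row receives its last element."""
    done: Dict[int, int] = {}
    for step, (i, _j) in enumerate(schedule):
        done[i] = step  # a later step for the same row overwrites the earlier one
    return done


if __name__ == "__main__":
    sched = transformation_schedule()
    print(sched[:3])
    print(row_completion_step(sched))
```

With six rows and three weight columns, row 0 is only complete at step 12 (of steps 0 to 17) and row 5 at the very last step, so under these assumptions no row of the transformed matrix is ready before the final weight pass.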

| Stage       |       |                                                                                            |       |       |       |       |       |       |       |       |       |       |
|-------------|-------|--------------------------------------------------------------------------------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Input       | Coo 0 | Coo 1                                                                                      | Coo 2 | Coo 3 | Coo 4 | Coo 5 | Coo 0 | Coo 1 | Coo 2 | Coo 3 | Coo 4 | Coo 5 |
| Aggregation |       | Adj matrix row (random) = vector addition (Adj matrix row (random), Transformed matrix row (random)) |       |       |       |       |       |       |       |       |       |       |

**Random: the row index depends on the COO values.**
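The aggregation step can be sketched in software as follows. The COO layout `(dst, src)` and the function `aggregate_coo` are assumptions made for illustration: the report does not say which COO index selects the adjacency row and which selects the transformed row, or what the adjacency rows start from (zeros here). The table also lists Coo 0 to Coo 5 twice; this sketch makes a single pass.

```python
from typing import List, Sequence, Tuple

Row = List[int]


def aggregate_coo(transformed: Sequence[Sequence[int]],
                  coo: Sequence[Tuple[int, int]],
                  num_rows: int) -> List[Row]:
    """One pass over a COO list, one vector addition per entry.

    Hypothetical layout: each entry is (dst, src), meaning
    adj[dst] = adj[dst] + transformed[src]. Both indices can be arbitrary,
    so every row of `transformed` must be available before the pass starts.
    """
    width = len(transformed[0])
    adj: List[Row] = [[0] * width for _ in range(num_rows)]
    for dst, src in coo:
        adj[dst] = [a + t for a, t in zip(adj[dst], transformed[src])]
    return adj


if __name__ == "__main__":
    transformed = [[1, 2], [3, 4], [5, 6]]
    coo = [(0, 2), (1, 0), (0, 1)]  # arbitrary row order, as in the "random" note
    print(aggregate_coo(transformed, coo, num_rows=3))
```

The first entry already needs the last transformed row, which is the situation that forces the design to wait for the complete transformed matrix.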

| Stage  |           |           |           |           |           |           |           |
|--------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| Input  | Adj row 0 | Adj row 1 | Adj row 2 | Adj row 3 | Adj row 4 | Adj row 5 |           |
| argmax |           | maxAddi 0 | maxAddi 1 | maxAddi 2 | maxAddi 3 | maxAddi 4 | maxAddi 5 |
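The argmax stage reduces each aggregated row to the index of its largest element, presumably the value reported as `max_addi_answer[i]` in the simulation log. A minimal sketch, assuming ties resolve to the lowest index (the hardware's tie-breaking rule is not stated in this report); `argmax_rows` is a hypothetical name.

```python
from typing import List, Sequence


def argmax_rows(adj: Sequence[Sequence[int]]) -> List[int]:
    """Index of the largest element in each row (ties -> lowest index, assumed)."""
    answers: List[int] = []
    for row in adj:
        best = 0
        for k in range(1, len(row)):
            if row[k] > row[best]:  # strict '>' keeps the first maximum on ties
                best = k
        answers.append(best)
    return answers


if __name__ == "__main__":
    print(argmax_rows([[3, 1, 2], [0, 5, 5], [1, 1, 4]]))
```

Since one row enters per step and one index leaves per step, this stage follows the aggregated rows with a one-step offset, as in the table.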

## GCN Block Behavioral Simulation

## Transformation Block



GCN\_TB



## After Synthesis



```
VSIM 5> run -all
# max_addi_answer[0]      DUT: 0      GOLD: 0
# max_addi_answer[1]      DUT: 0      GOLD: 0
# max_addi_answer[2]      DUT: 0      GOLD: 0
# max_addi_answer[3]      DUT: 1      GOLD: 1
# max_addi_answer[4]      DUT: 1      GOLD: 1
# max_addi_answer[5]      DUT: 2      GOLD: 2
#
#
# ** Note: $finish  : /afs/asu.edu/users/p/b/h/pbhoumik/asap7_rundir/lab4/Synthesis/RTL/tb/GCN_TB_post_syn_apr.sv(122)
#   Time: 60502 ps  Iteration: 0  Instance: /GCN_TB
# 1
# Break in Module GCN_TB at /afs/asu.edu/users/p/b/h/pbhoumik/asap7_rundir/lab4/Synthesis/RTL/tb/GCN_TB_post_syn_apr.sv line 122
VSIM 6>
```

Post-synthesis simulation confirms the correctness of the module.

**Total Latency:** 90.75 ns at 666.67 MHz (1500 ps clock period)
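The clock frequency follows from f = 1/T, and dividing the latency by the period expresses it in clock periods. The quick check below uses only the two numbers reported above; it says nothing about how the latency itself was measured.

```python
# Numbers copied from the report.
CLK_PERIOD_PS = 1500.0
TOTAL_LATENCY_NS = 90.75

freq_mhz = 1e6 / CLK_PERIOD_PS  # f = 1 / T; with T in ps, 1e6 / T gives MHz
latency_periods = TOTAL_LATENCY_NS * 1e3 / CLK_PERIOD_PS  # ns -> ps, then divide by T

print(f"{freq_mhz:.2f} MHz, latency = {latency_periods:.2f} clock periods")
```

This prints 666.67 MHz and a latency of 60.50 clock periods.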

Screenshot from ModelSim:



The DUT output matches the golden output; the match is also visible in the waveform.

**Power:** 3.59 mW

```

Total Power
-----
Total Internal Power: 1.26014933 35.1014%
Total Switching Power: 2.32876300 64.8675%
Total Leakage Power: 0.00111653 0.0311%
Total Power: 3.59002886

-----
Ended Static Power Report Generation: (cpu=0:00:00, real=0:00:01,
mem(process/total)=1479.99MB/1479.99MB)

Output file is ./POWER/GCN.rpt

```
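As a sanity check, the three components in the power report add up to the reported total, and their shares match the listed percentages. The values are copied from the report; treating them as mW follows the **Power** line above.

```python
# Values copied from the power report (mW, per the Power line in the text).
internal = 1.26014933
switching = 2.32876300
leakage = 0.00111653
reported_total = 3.59002886

total = internal + switching + leakage
shares = {name: 100 * value / total for name, value in
          [("internal", internal), ("switching", switching), ("leakage", leakage)]}

print(f"sum = {total:.8f}")
for name, pct in shares.items():
    print(f"{name:9s} {pct:.4f}%")
```

Switching power dominates at roughly 65% of the total, while leakage is negligible for this design.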

**Area**

Total chip area = 48,731.446 µm<sup>2</sup> (≈ 0.0487 mm<sup>2</sup>); standard cells + filler cells occupy 44,352.360 µm<sup>2</sup> of it.

```
=====
Floorplan/Placement Information
=====
Total area of Standard cells: 44352.360 um^2
Total area of Standard cells(Subtracting Physical Cells): 21816.579 um^2
Total area of Macros: 0.000 um^2
Total area of Blockages: 0.000 um^2
Total area of Pad cells: 0.000 um^2
Total area of Core: 44367.523 um^2
Total area of Chip: 48731.446 um^2
Effective Utilization: 1.00000e+00
```
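The chip area can be cross-checked against the design boundary printed in the VERIFY CONNECTIVITY log further down (220.752 µm × 220.752 µm), which also makes the µm² to mm² conversion explicit. The script only reuses numbers from the logs.

```python
# Dimensions and areas copied from the Innovus logs in this report.
DIE_W_UM = 220.752  # "Design Boundary: (0.0000, 0.0000) (220.7520, 220.7520)"
DIE_H_UM = 220.752

die_area_um2 = DIE_W_UM * DIE_H_UM  # should match "Total area of Chip"
die_area_mm2 = die_area_um2 / 1e6   # 1 mm^2 = 1e6 um^2

std_cells_um2 = 44352.360    # standard cells including physical (filler) cells
logic_cells_um2 = 21816.579  # standard cells excluding physical cells
logic_share = logic_cells_um2 / std_cells_um2

print(f"die: {die_area_um2:.3f} um^2 = {die_area_mm2:.4f} mm^2")
print(f"logic cells / all standard cells: {logic_share:.1%}")
```

The die area comes out at 48731.446 µm² (0.0487 mm²), and logic cells account for about 49% of the standard-cell area, the rest being physical cells.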

Snippet from the log



Screenshot of the design's x and y dimensions  
(\*Please ignore the ruler marking in the top-right corner.)

## Innovus Density

```
Density: 52.300%
Total number of glitch violations: 0
-----
**optDesign ... cpu = 0:00:41, real = 0:01:27, mem = 1349.4M, totSessionCpu=0:10:59 **
  ReSet Options after AAE Based Opt flow
*** Finished optDesign ***
Reading RCDB with compressed RC data.
Info: Destroy the CCOpt slew target map.
```

## Number of cells from summaryReport: 63311

```
innovus 1> reportGateCount
Gate area 0.6998 um^2
[0] GCN Gates=31173 Cells=11170 Area=21816.6 um^2
```
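The `Gates` figure from `reportGateCount` is an equivalent-gate count. Assuming it is the cell area divided by the reported `Gate area` (the area of the reference gate), the number can be reproduced approximately from the same output:

```python
GATE_AREA_UM2 = 0.6998   # "Gate area" line of reportGateCount
CELL_AREA_UM2 = 21816.6  # "Area" for GCN in the same output
REPORTED_GATES = 31173   # "Gates" for GCN in the same output

# Assumed definition: equivalent gates = cell area / reference gate area.
estimated_gates = CELL_AREA_UM2 / GATE_AREA_UM2

print(f"{estimated_gates:.0f} equivalent gates (reported: {REPORTED_GATES})")
```

The estimate agrees with the reported 31173 to within about 0.01%.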

## **Connectivity errors: 0**

```
***** Start: VERIFY CONNECTIVITY *****
Start Time: Wed Nov 22 13:55:31 2023

Design Name: GCN
Database Units: 4000
Design Boundary: (0.0000, 0.0000) (220.7520, 220.7520)
Error Limit = 1000; Warning Limit = 50
Check all nets
**** 13:55:31 **** Processed 5000 nets.
**** 13:55:32 **** Processed 10000 nets.

Begin Summary
  Found no problems or warnings.
End Summary

End Time: Wed Nov 22 13:55:32 2023
Time Elapsed: 0:00:01.0

***** End: VERIFY CONNECTIVITY *****
  Verification Complete : 0 Viols.  0 Wrngs.
  (CPU Time: 0:00:00.8  MEM: 0.000M)
```

## **DRC errors: 0**

```
VERIFY DRC ..... Sub-Area : 14 complete 0 Viols.
VERIFY DRC ..... Sub-Area: {110.592 165.888 165.888 220.752} 15 of 16
VERIFY DRC ..... Sub-Area : 15 complete 0 Viols.
VERIFY DRC ..... Sub-Area: {165.888 165.888 220.752 220.752} 16 of 16
VERIFY DRC ..... Sub-Area : 16 complete 0 Viols.
Verification Complete : 0 Viols.
*** End Verify DRC (CPU: 0:00:36.3  ELAPSED TIME: 37.00  MEM: 250.2M) ***
```

### Conn.rpt snippet

```
#####
# Generated by:      Cadence Innovus 17.12-s095_1
# OS:                Linux x86_64(Host ID eecad101)
# Generated on:     Wed Nov 22 12:34:23 2023
# Design:            GCN
# Command:          verify_drc
#####
```

No DRC violations were found.

### Geom.rpt snippet

```
#####
# Generated by:      Cadence Innovus 17.12-s095_1
# OS:                Linux x86_64(Host ID eecad101)
# Generated on:     Wed Nov 22 12:34:23 2023
# Design:            GCN
# Command:          verify_drc
#####
```

No DRC violations were found.

### **Worst Path Setup**

```
#####
# Generated by:      Cadence Innovus 17.12-s095_1
# OS:                Linux x86_64(Host ID eecad101)
# Generated on:     Wed Nov 22 12:44:58 2023
# Design:            GCN
# Command:          timeDesign -postRoute -pathReports -drvReports -slackReports -numPaths 50 -prefix GCN_postRoute -outDir timingReportsFinal
#####
Path 1: MET Setup Check with Pin trans/fmmw_mem_reg_1_1_15 /CLK
Endpoint:  trans/fmmw_mem_reg_1_1_15 /D (v) checked with leading edge of
'clk'
Beginpoint: data_in[356] (v) triggered by leading edge of
'clk'
Path Groups: {clk}
Analysis View: default_setup_view
Other End Arrival Time 79.400
- Setup 5.770
+ Phase Shift 1500.000
= Required Time 1573.620
- Arrival Time 1383.660
= Slack Time 190.020
Clock Rise Edge 0.000
+ Input Delay 0.100
+ Drive Adjustment 4.600
= Beginpoint Arrival Time 4.700
Timing Path:
```

### **Worst Path Hold**

```
#####
# Generated by: Cadence Innovus 17.12-s095_1
# OS: Linux x86_64(Host ID eecad101)
# Generated on: Wed Nov 22 12:45:19 2023
# Design: GCN
# Command: timeDesign -postRoute -hold -pathReports -slackReports -numPaths 50 -prefix GCN_postRoute -outDir timingReportsFinal
#####
Path 1: MET Hold Check with Pin trans/spad_memory_reg_37_2/_CLK
Endpoint: trans/spad_memory_reg_37_2/_D (v) checked with leading edge of
'clk'
Beginpoint: data_in[292] (v) triggered by leading edge of
'clk'
Path Groups: {clk}
Analysis View: default_hold_view
Other End Arrival Time 8.450
+ Hold 13.732
+ Phase Shift 0.000
= Required Time 22.191
Arrival Time 22.300
Slack Time 0.109
Clock Rise Edge 0.000
+ Input Delay 0.100
+ Drive Adjustment 3.600
= Beginpoint Arrival Time 3.700
Timing Path:
```

**Comments:** As you can see, the design has plenty of slack at a 1500 ps clock period (worst setup slack: 190.02 ps). I initially used a 2000 ps clock period, at which the power was 2.59 mW, so there is a clear tradeoff between frequency and power. Increasing the clock period further (by ~150 ps) would lower the power at the cost of higher latency.
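The slack and power numbers in the comment can be turned into rough estimates. This is a first-order sketch, assuming the setup slack shrinks one-for-one with the clock period on the same netlist and that "2000clk" means a 2000 ps period; a real answer needs re-synthesis and re-timing at the new period, and hold and other checks are ignored.

```python
PERIOD_PS = 1500.0        # clock period used for the reported run
SETUP_SLACK_PS = 190.020  # worst setup slack from the postRoute timing report

# First-order estimate: the period can shrink by roughly the setup slack
# before the worst path fails (assumes nothing else in the path changes).
min_period_ps = PERIOD_PS - SETUP_SLACK_PS
fmax_mhz = 1e6 / min_period_ps

# Power vs. frequency for the two runs in the comment (2000 ps assumed).
P_1500_MW, P_2000_MW = 3.59, 2.59
freq_ratio = 2000.0 / 1500.0  # f at 1500 ps / f at 2000 ps
power_ratio = P_1500_MW / P_2000_MW

print(f"min period ~{min_period_ps:.0f} ps (~{fmax_mhz:.0f} MHz)")
print(f"frequency x{freq_ratio:.2f}, power x{power_ratio:.2f}")
```

Between the two runs, power rose by about 39% for a 33% increase in frequency; the runs may also differ in their synthesis results, so this is only a coarse comparison.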