

### Architecture/High-level Block Diagram:



### Design decisions:

The GCN hardware design is organized into three key functional blocks: Transformation, Combination, and Argmax. Each block is tailored to handle a distinct phase—data preparation, graph-based feature aggregation, and result selection—supporting modular development and scalability.

The Transformation Block handles input preprocessing. A scratchpad buffers feature and weight data to minimize memory latency, and a finite state machine (FSM) coordinates dynamic read/write operations. Feature and weight counters automate tracking, and processed values are written into matrix memory for intermediate storage.

Next, the Combination Block integrates the graph structure by applying adjacency-based updates. Feature matrices are aligned with neighbor relationships via COO-format decoding and stored in dedicated memory. The core computation involves aggregating neighboring node features using weighted sums.

Finally, the Argmax Block identifies the most activated output for each node, which is the predicted label or class. The result (`max_addi_answer`) is output along with a done signal to indicate completion.

This design leverages efficient memory access patterns, FSM-driven control, and sparse data handling to optimize for both speed and flexibility.

Bhaumik Thakker  
1233701211

## 1. Transformation Block

The Transformation\_Block performs matrix-vector multiplication between features and weights.

It includes submodules: Vector\_Multiplier, Weight\_Counter, Feature\_Counter, Scratch\_Pad, Matrix\_FM\_WM\_Memory, and Transformation\_FSM.

- Feature and weight vectors are stored in local scratchpads.
- Counters select feature rows and weight columns.
- The Vector\_Multiplier performs parallel dot-products (WEIGHT\_COLS = 3 parallel multipliers).
- Results are stored in Matrix\_FM\_WM\_Memory.
- Transformation\_FSM controls read, multiply, and write sequences.
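The steps above can be sketched as a small behavioral model in Python. This is only an illustration, not the RTL: the function name `transform`, the toy matrix sizes, and the sequential loops are assumptions, and in hardware the three weight columns would be processed in parallel rather than one after another.

```python
# Behavioral sketch of the Transformation block (not the RTL).
# All sizes except WEIGHT_COLS are hypothetical toy values.
WEIGHT_COLS = 3  # from the report: three parallel multipliers


def transform(features, weights):
    """Return features x weights as a list of rows (the FM_WM matrix)."""
    fm_wm = []
    for feature_row in features:            # Feature_Counter selects a row
        out_row = []
        for col in range(WEIGHT_COLS):      # Weight_Counter selects a column
            acc = 0                         # Vector_Multiplier: one dot product
            for k, f in enumerate(feature_row):
                acc += f * weights[k][col]
            out_row.append(acc)
        fm_wm.append(out_row)               # write into Matrix_FM_WM_Memory
    return fm_wm


features = [[1, 2], [3, 4]]                 # 2 nodes x 2 features (toy data)
weights = [[1, 0, 2], [0, 1, 3]]            # 2 features x WEIGHT_COLS
fm_wm = transform(features, weights)
print(fm_wm)
```

Each output row has `WEIGHT_COLS` entries, one per weight column, which is what the Combination block later reads back.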

## 2. Combination Block

The Combination\_Block aggregates transformed features based on sparse graph connections (COO format).

It includes coo\_in, ROW\_Counter, Matrix\_FM\_WM\_ADJ\_Memory, and Combination\_FSM.

- coo\_in provides source-target node pairs.
- ROW\_Counter iterates over edges.
- Features are accumulated selectively according to graph edges.
- Sparse aggregation is handled efficiently to reduce unnecessary operations.
- Combination\_FSM manages control flow and signals completion.
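A minimal Python sketch of this edge-driven aggregation follows. It assumes zero-based node indices, an undirected graph (each edge updates both endpoints), and a COO input given as two parallel lists; none of these details are confirmed by the report, and `combine` is a hypothetical name.

```python
def combine(fm_wm, coo):
    """Accumulate FM_WM rows along each COO edge (assumed undirected)."""
    sources, targets = coo
    # Stand-in for Matrix_FM_WM_ADJ_Memory, cleared before accumulation
    adj = [[0] * len(row) for row in fm_wm]
    for src, dst in zip(sources, targets):  # ROW_Counter steps through edges
        for c in range(len(fm_wm[src])):
            adj[src][c] += fm_wm[dst][c]    # e.g. adj[0] += fm_wm[1]
            adj[dst][c] += fm_wm[src][c]    # mirrored update (assumption)
    return adj


fm_wm = [[1, 2, 8], [3, 4, 18], [0, 1, 0]]  # toy transformed features
coo = ([0, 1], [1, 2])                      # edges 0-1 and 1-2
adj = combine(fm_wm, coo)
print(adj)
```

Only rows named by an edge are touched, so the work scales with the number of edges rather than with the size of a dense adjacency matrix.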

## 3. Argmax Block

The argmax module selects the maximum feature per node after aggregation.

- Accepts aggregated feature vectors.
- Uses compare-and-select logic to find the maximum value per node.
- Stores maximum indices in max\_addi\_answer.
- Internal FSM controls idle, compare, and output states.
- Triggered after done\_comb from Combination Block.
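The compare-and-select step can be illustrated with a short Python sketch. The tie-breaking rule (the lowest index wins) and the function name `argmax_rows` are assumptions; the report does not say how the hardware resolves equal values.

```python
def argmax_rows(adj):
    """Return, for each node (row), the index of its largest value."""
    result = []
    for row in adj:
        best_idx = 0
        for idx in range(1, len(row)):
            # Strict comparison: on a tie the earlier index is kept (assumption)
            if row[idx] > row[best_idx]:
                best_idx = idx
        result.append(best_idx)
    return result


max_addi_answer = argmax_rows([[3, 4, 18], [9, 3, 8], [5, 5, 1]])
print(max_addi_answer)
```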

## Data Flow

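Transformation block:

- Start.
- Read weight data, stepping $0 \rightarrow 1 \rightarrow 2 \rightarrow 3 \rightarrow 4 \rightarrow 5$, with 3 increments.
- Read feature data, stepping $0 \rightarrow 1 \rightarrow 2 \rightarrow 3 \rightarrow 4 \rightarrow 5$, over 5 iterations.
- Increment the feature counter, then the weight counter.
- Signal that the transformation is done.

Implementation notes:

- While data is being read, temp and sum values hold the intermediate results.
- Multiplication is done by repetitive addition; a CSA implementation is planned for the future.
- Results are stored to the FM\_WM memory as they are produced.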
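Multiplication by repetitive addition, as mentioned in the data flow above, can be sketched in Python as follows. The roles given to `temp` and `acc_sum` are one interpretation of the "temp and sum" wording, the operands are assumed to be non-negative integers, and both function names are hypothetical.

```python
def multiply_by_repeated_addition(a, b):
    """Multiply two non-negative integers using only addition."""
    total = 0
    for _ in range(b):                      # add 'a' to the total 'b' times
        total += a
    return total


def dot_product(feature_row, weight_col):
    """Dot product built from repeated-addition multiplies."""
    acc_sum = 0                             # running 'sum' value
    for f, w in zip(feature_row, weight_col):
        temp = multiply_by_repeated_addition(f, w)  # 'temp' holds one product
        acc_sum += temp
    return acc_sum


result = dot_product([1, 2, 3], [4, 5, 6])
print(result)
```

The loop count grows with the operand value, which is the cost a faster adder or multiplier structure would aim to reduce.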

## Combination block

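- Wait for the transformation done signal.
- Read a COO record.
- Read row 1, then write row 1.
- Read row 2, then write row 2.
- Increment the ROW\_Counter.
- Assert done\_comb.

Accumulation logic:

$$fmwmadj[0] = fmwmadj[0] + fmwmrow[1]$$

The value in [ ] is the row number. The addition is performed in the Adj memory.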

## Behavioral Verilog – Simulation:



## Post\_Synthesis – Simulation:

### DC simulation:




### APR simulation:



### Total Latency:

Total latency: 79.5 ns at a clock frequency of 1000 MHz.
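As a quick check, the latency can be converted to clock cycles, given that 1000 MHz corresponds to a 1 ns period:

$$N_{\text{cycles}} = t_{\text{latency}} \times f_{\text{clk}} = 79.5\,\text{ns} \times 1\,\text{GHz} = 79.5$$

The half cycle presumably reflects where the testbench starts and stops its measurement relative to the clock edge; the report does not state this.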

### Screenshot of ModelSim:




### Power:

Total power (from Innovus): approximately 6.73 mW

```
2025-May-04 19:13:27 (2025-May-04 19:13:27 GMT): 60%
2025-May-04 19:13:27 (2025-May-04 19:13:27 GMT): 70%
2025-May-04 19:13:28 (2025-May-04 19:13:28 GMT): 80%
2025-May-04 19:13:28 (2025-May-04 19:13:28 GMT): 90%

Finished Calculating power
2025-May-04 19:13:28 (2025-May-04 19:13:28 GMT)
Ended Power Computation: (cpu=0:00:01, real=0:00:01, mem(process/total)=1289.49MB/1289.49MB)

Begin Processing User Attributes
Ended Processing User Attributes: (cpu=0:00:00, real=0:00:00, mem(process/total)=1289.49MB/1289.49MB)
Ended Power Analysis: (cpu=0:00:02, real=0:00:02, mem(process/total)=1289.49MB/1289.49MB)

Begin Static Power Report Generation

Total Power
-----
Total Internal Power: 2.27413741 33.8107%
Total Switching Power: 4.45073826 66.1712%
Total Leakage Power: 0.00121674 0.0181%
Total Power: 6.72669243
-----
Ended Static Power Report Generation: (cpu=0:00:00, real=0:00:00,
mem(process/total)=1289.49MB/1289.49MB)

Output file is Power/GCN.rpt
innovus 1>
innovus 1>
```

### Area:

Standard cells + Filler cells: 0.044079 mm<sup>2</sup>

```
=====
Floorplan/Placement Information
=====
Total area of Standard cells: 44079.656 um^2
Total area of Standard cells(Subtracting Physical Cells): 22927.225 um^2
Total area of Macros: 0.000 um^2
Total area of Blockages: 0.000 um^2
Total area of Pad cells: 0.000 um^2
Total area of Core: 44109.827 um^2
Total area of Chip: 53916.586 um^2
Effective Utilization: 1.0000e+00
Number of Cell Rows: 194
```

### Innovus density:

Before filler cell insertion: 0.554 (55.387%)

```
Density: 55.387%
Total number of glitch violations: 0
-----
```

### Number of gates:

Gates = 32783

Cells = 12243

```
innovus 1>
innovus 1> reportGateCount
Gate area 0.6998 um^2
[0] GCN Gates=32783 Cells=12243 Area=22943.1 um^2
innovus 2> |
```


## Layout:



## Post\_APR – DRC Check:



## Post\_APR – LVS Check:




### Screenshot of your timing report for the worst-case setup path:

```

GCN_postRoute_all.tarp
-----/asap7_rundir/Lab4_1233701211/APR_ATimingReports

# Generated by: Cadence Innovus 17.12-s095.1
# OS: Linux x86_64(Host ID nc-asu6-l07.apporto.com)
# Generated on: Mon May 5 04:56:29 2025
# Design: GCN
# Command: optDesign -postroute -hold

Path 1: MET Setup Check with Pin u_combination_adj_mem_inst_mem_reg_2_2_15_/
CLK
Endpoint: u_combination_adj_mem_inst_mem_reg_2_2_15_/_D (v) checked with
leading edge of 'clk'
Beginning point: u_combination_fsm_inst_current_state_reg_0/_QN (^) triggered by
leading edge of 'clk'
Path Groups: {freqreg}
Analysis View: default_setup_view
Other End Arrival Time 85.600
- Setup 10.157
+ Phase Shift 1000.000
= Required Time 1075.433
- Arrival Time 999.300
= Slack Time 76.133
Clock Rise Edge 0.000
+ Drive Adjustment 5.000
+ Source Insertion Delay -89.800
= Beginpoint Arrival Time -84.800
Timing Path:
+-----+
| Pin | Edge | Net | Cell | Delay | Arrival | Required |
|-----+-----+-----+-----+-----+-----+-----+
| clk | ^ | clk | BUFX12f ASAP7 75t_R | -84.800 | -8.667 |
| CTS ccl_a BUF clk G0 L1 1/A | ^ | clk | BUFX12f ASAP7 75t_R | 2.700 | -82.160 |
| CTS ccl_a BUF clk G0 L1 1/Y | ^ | CTS_120 | BUFX12f ASAP7 75t_R | 14.300 | -67.800 |
| CTS ccl_a BUF clk G0 L2 3/A | ^ | CTS_120 | BUFX3 ASAP7 75t_R | 8.700 | -59.100 |
| CTS ccl_a BUF clk G0 L2 3/Y | ^ | CTS_107 | BUFX3 ASAP7 75t_R | 25.300 | -33.800 |
+-----+

```

### Screenshot of your timing report for the worst-case hold path:

```

GCN_postRoute_all.hold.tarp
-----/asap7_rundir/Lab4_1233701211/APR_ATimingReports

# Generated by: Cadence Innovus 17.12-s095.1
# OS: Linux x86_64(Host ID nc-asu6-l07.apporto.com)
# Generated on: Mon May 5 04:58:29 2025
# Design: GCN
# Command: optDesign -postroute -hold

Path 1: MET Hold Check with Pin transform/scratch_pa_inst_memory_reg_16_4_/
CLK
Endpoint: transform/scratch_pad_inst_memory_reg_16_4/_D (v) checked with
leading edge of 'clk'
Beginning point: data_in[399] (v) triggered by
leading edge of 'clk'
Path Groups: {clk}
Analysis View: default_hold_view
Other End Arrival Time 5.064
+ Hold 13.633
+ Phase Shift 0.000
= Required Time 18.707
Arrival Time 19.300
Slack Time 0.593
Clock Rise Edge 0.000
+ Input Delay 0.100
+ Drive Adjustment 2.900
= Beginpoint Arrival Time 3.000
Timing Path:
+-----+
| Pin | Edge | Net | Cell | Delay | Arrival | Required |
|-----+-----+-----+-----+-----+-----+-----+
| data_in[399] | v | data_in_16_4 | NAND2xp33 ASAP7 75t_R | 0.100 | 3.100 | 2.407 |
| transform/U1152/B | ^ | data_in_16_4 | NAND2xp33 ASAP7 75t_R | 7.500 | 16.000 | 16.007 |
| transform/U1152/Y | ^ | transform/n169 | OAI21xp33 ASAP7 75t_R | 0.000 | 16.600 | 16.007 |
| transform/U1153/B | ^ | transform/n169 | OAI21xp33 ASAP7 75t_R | 8.700 | 19.300 | 18.707 |
| transform/U1153/Y | v | transform/n2113 | OAI21xp33 ASAP7 75t_R | 8.700 | 19.300 | 18.707 |
+-----+

```
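The slack values in the two reports can be reproduced from the required and arrival times. The sketch below copies the numbers from the reports above and assumes they are in picoseconds, which is consistent with a phase shift of 1000 at a 1000 MHz clock; the function names are hypothetical.

```python
def setup_slack(required, arrival):
    """Setup check: data must arrive no later than the required time."""
    return required - arrival


def hold_slack(required, arrival):
    """Hold check: data must arrive no earlier than the required time."""
    return arrival - required


# Values copied from the reports above (assumed to be in ps)
setup = setup_slack(required=1075.433, arrival=999.300)
hold = hold_slack(required=18.707, arrival=19.300)

print(f"setup slack = {setup:.3f} ps, hold slack = {hold:.3f} ps")
print("both checks met" if setup > 0 and hold > 0 else "violation")
```

Positive slack on both checks matches the `MET` status shown for each path.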