

## ECE 464/564 Fall 2025 Project

### Convolutional Neural Network Pipeline on a 1024x1024 Input with a 4x4 Kernel

Vishwarudhresh.C

VCHAND22



# Data path

Input stage:



Output stage:



# Convolution stage
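
A small Python golden model of the convolution stage can generate expected outputs for checking the hardware. It is a sketch built on assumptions that this report does not state: a valid (unpadded), stride-1 cross-correlation with no kernel flip, which is the usual convention for CNN layers, and plain integer pixels and weights. Under those assumptions, a 1024x1024 input with a 4x4 kernel produces a 1021x1021 output. The name `conv2d_valid` is hypothetical, and the model does not cover the design's real bit widths or any post-processing.

```python
from typing import List

Matrix = List[List[int]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid, stride-1 2D cross-correlation (no padding, no kernel flip).

    Assumption: this mirrors the convolution stage; the real design's
    padding, stride and arithmetic widths are not specified here.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1  # e.g. 1024 - 4 + 1 = 1021 per axis
    out = [[0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0
            # Multiply-accumulate over the kh x kw window anchored at (r, c).
            for i in range(kh):
                row, krow = image[r + i], kernel[i]
                for j in range(kw):
                    acc += row[c + j] * krow[j]
            out[r][c] = acc
    return out


if __name__ == "__main__":
    img = [[5 * i + j for j in range(5)] for i in range(5)]
    ones = [[1] * 4 for _ in range(4)]
    print(conv2d_valid(img, ones))  # [[144, 160], [224, 240]]
```

At full size this runs 1021 × 1021 × 16 ≈ 16.7 million multiply-accumulates, which is slow in pure Python, so the example above uses a 5x5 input.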



# Pooling stage



Range checks on the pooled average (pooling-stage figure): $\text{avg} > 128$ and $\text{avg} < -128$.
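
The range checks $\text{avg} > 128$ and $\text{avg} < -128$ suggest that the pooled average is limited to roughly an 8-bit range. The Python sketch below models that step under assumptions not stated in this report: 2x2 non-overlapping windows (stride 2), an average formed by an arithmetic right shift by 2 (floor, as a hardware shifter would give), and saturation to the signed 8-bit range [-128, 127]. The upper bound of 127 is an assumption and may not match the figure's `> 128` check. `avg_pool_2x2` and `saturate_int8` are hypothetical names.

```python
from typing import List

Matrix = List[List[int]]

INT8_MIN, INT8_MAX = -128, 127  # assumed signed 8-bit output range


def saturate_int8(value: int) -> int:
    """Clamp a value into the assumed signed 8-bit range."""
    return max(INT8_MIN, min(INT8_MAX, value))


def avg_pool_2x2(fmap: Matrix) -> Matrix:
    """Hypothetical 2x2, stride-2 average pooling with saturation.

    The four-value sum is divided by 4 with an arithmetic right shift,
    which rounds toward negative infinity (Python's >> on ints does the
    same). Odd trailing rows/columns are dropped.
    """
    out = []
    for r in range(0, len(fmap) - 1, 2):
        row_out = []
        for c in range(0, len(fmap[0]) - 1, 2):
            total = (fmap[r][c] + fmap[r][c + 1]
                     + fmap[r + 1][c] + fmap[r + 1][c + 1])
            row_out.append(saturate_int8(total >> 2))
        out.append(row_out)
    return out


if __name__ == "__main__":
    fmap = [[1000, 1000, -1, -2],
            [1000, 1000, -3, -4],
            [-1000, -1000, 4, 5],
            [-1000, -1000, 6, 7]]
    print(avg_pool_2x2(fmap))  # [[127, -3], [-128, 5]]
```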

# Control path



## Row\_Loaded



## Row convolution



## Column convolution



## Pool offset



## Control Signal Generation Truth Table

| State   | icmd[1:0] | ocmd[1:0] | ioe   | ooe   | rdy   | wact  | Description              |
|---------|-----------|-----------|-------|-------|-------|-------|--------------------------|
| IDLE    | 00        | 00        | 0     | 0     | 1     | 0     | Ready, waiting for start |
| LOAD_K  | 00        | 00        | 0     | 0     | 0     | 0     | Prepare to load filter   |
| PREP    | 00        | 00        | 0     | 0     | 0     | 0     | Prepare to load row      |
| RD_REQ  | 01        | 00        | 0     | 0     | 0     | 0     | Issue read command       |
| RD_WAIT | 00        | 00        | 0     | 0     | 0     | 0     | Wait for tRCD            |
| RD_EXE  | 00        | 00        | 0     | 0     | 0     | 0     | Read data from DRAM      |
| RD_DONE | 00        | 00        | 0     | 0     | 0     | 0     | Read complete            |
| PROC    | 00        | 00        | 0     | 0     | 0     | 0     | Processing pixels        |
| WR_REQ  | 00        | 10        | 0     | 0     | 0     | 1     | Issue write command      |
| WR_WAIT | 00        | 00        | 0     | wc>=4 | 0     | 1     | Wait for tRCD            |
| WR_EXE  | 00        | 00        | 0     | 1     | 0     | 1     | Write data to DRAM       |
| WR_DONE | 00        | 00        | 0     | 0     | 0     | 0     | Write complete           |
| FINISH  | 00        | 00        | 0     | 0     | 1     | 0     | Done, ready high         |
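
The output columns of the truth table can be written as a lookup-based reference model, which works as an executable specification when checking the controller. This is a sketch that covers only the per-state output decoding shown in the table. State transitions and state encodings are not modeled, and `Ctrl` and `control_outputs` are hypothetical names.

```python
from typing import Dict, NamedTuple


class Ctrl(NamedTuple):
    """Controller outputs for one state (fields follow the table columns)."""
    icmd: int  # icmd[1:0]
    ocmd: int  # ocmd[1:0]
    ioe: int
    ooe: int
    rdy: int
    wact: int


# Per-state outputs copied from the truth table. WR_WAIT's ooe is
# data-dependent (wc >= 4), so it is stored as 0 and resolved below.
_OUTPUTS: Dict[str, Ctrl] = {
    "IDLE":    Ctrl(0b00, 0b00, 0, 0, 1, 0),
    "LOAD_K":  Ctrl(0b00, 0b00, 0, 0, 0, 0),
    "PREP":    Ctrl(0b00, 0b00, 0, 0, 0, 0),
    "RD_REQ":  Ctrl(0b01, 0b00, 0, 0, 0, 0),
    "RD_WAIT": Ctrl(0b00, 0b00, 0, 0, 0, 0),
    "RD_EXE":  Ctrl(0b00, 0b00, 0, 0, 0, 0),
    "RD_DONE": Ctrl(0b00, 0b00, 0, 0, 0, 0),
    "PROC":    Ctrl(0b00, 0b00, 0, 0, 0, 0),
    "WR_REQ":  Ctrl(0b00, 0b10, 0, 0, 0, 1),
    "WR_WAIT": Ctrl(0b00, 0b00, 0, 0, 0, 1),
    "WR_EXE":  Ctrl(0b00, 0b00, 0, 1, 0, 1),
    "WR_DONE": Ctrl(0b00, 0b00, 0, 0, 0, 0),
    "FINISH":  Ctrl(0b00, 0b00, 0, 0, 1, 0),
}


def control_outputs(state: str, wc: int = 0) -> Ctrl:
    """Return the table's outputs for `state`; `wc` only matters in WR_WAIT."""
    outs = _OUTPUTS[state]
    if state == "WR_WAIT":
        # ooe follows the table's condition wc >= 4 in this state.
        outs = outs._replace(ooe=int(wc >= 4))
    return outs
```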

## Memory Write Data Selection Truth Table

| `bc[2:0]` | Selected byte | Output data (`odin`) |
|-----------|---------------|----------------------|
| 0         | `obuf[63:56]` | `obuf[(7-0)*8 +: 8]` |
| 1         | `obuf[55:48]` | `obuf[(7-1)*8 +: 8]` |
| 2         | `obuf[47:40]` | `obuf[(7-2)*8 +: 8]` |
| 3         | `obuf[39:32]` | `obuf[(7-3)*8 +: 8]` |
| 4         | `obuf[31:24]` | `obuf[(7-4)*8 +: 8]` |
| 5         | `obuf[23:16]` | `obuf[(7-5)*8 +: 8]` |
| 6         | `obuf[15:8]`  | `obuf[(7-6)*8 +: 8]` |
| 7         | `obuf[7:0]`   | `obuf[(7-7)*8 +: 8]` |
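
The table is the unrolled form of the indexed part-select `obuf[(7-bc)*8 +: 8]`, so `bc = 0` selects the most significant byte of `obuf`. Below is a small Python equivalent for generating expected write data. It assumes `obuf` is a 64-bit unsigned value and that `bc` counts up from 0 to 7 across a burst. The simulation log's `dqwidth=8, burstlen=8` appears consistent with eight byte-wide beats per 64-bit word. The function names are hypothetical.

```python
from typing import List


def select_write_byte(obuf: int, bc: int) -> int:
    """Return obuf[(7-bc)*8 +: 8] for a 64-bit obuf and 3-bit bc."""
    if not 0 <= bc <= 7:
        raise ValueError("bc must fit in 3 bits (0..7)")
    shift = (7 - bc) * 8  # bc = 0 selects obuf[63:56]
    return (obuf >> shift) & 0xFF


def burst_bytes(obuf: int) -> List[int]:
    """All eight bytes in bc order, assuming bc counts 0..7 during a burst."""
    return [select_write_byte(obuf, bc) for bc in range(8)]


if __name__ == "__main__":
    print([hex(b) for b in burst_bytes(0x0123456789ABCDEF)])
```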

Tail of the Design Compiler synthesis log:

```

# translate
# report_timing > ${::env(OUTPUT_DIR)}/reports/timing_max_fast_holdfixed_${type}.rpt
# report_timing -delay min > ${::env(OUTPUT_DIR)}/reports/timing_min_fast_holdfixed_${type}.rpt
# set target_library NangateOpenCellLibrary_PDKv1_2_v2008_10_typical_nldm.db
# set link_library NangateOpenCellLibrary_PDKv1_2_v2008_10_typical_nldm.db
# set link_library [concat $link_library dv_foundation.sldb]
# translate
# report_timing > ${::env(OUTPUT_DIR)}/reports/timing_max_TYP_HOLDfixed_${type}.rpt
# report_timing -delay min > ${::env(OUTPUT_DIR)}/reports/timing_min_TYP_HOLDfixed_${type}.rpt
#-----
# Write out area distribution for the final design
#-----
report_cell > ${::env(OUTPUT_DIR)}/reports/cell_report_final.rpt
#-----
# Write out the resulting netlist in Verilog format
#-----
change_names -rules verilog -hierarchy > ${::env(OUTPUT_DIR)}/logs/fixed_names_init
write -hierarchy -f verilog -o ${::env(OUTPUT_DIR)}/gl/${modname}_final.v
Writing verilog file '/mnt/ncsudrive/v/vchand22/proj/build/synth/gl/dut_final.v'.
# write -hierarchy -format verilog -output ${modname}_netlist_holdfixed_${type}.v #RAVI
#-----
# Write out the 'slowest' (maximum) timing file
# in Standard Delay Format. We might use this in
# later verification.
#-----
write_sdf ${::env(OUTPUT_DIR)}/sdf/${modname}_max.sdf
Information: Annotated 'cell' delays are assumed to include load delay. (UID-282)
Information: Writing timing information to file '/mnt/ncsudrive/v/vchand22/proj/build/synth/sdf/dut_max.sdf'. (WT-3)
1

Memory usage for this session 2211 Mbytes.
Memory usage for this session including child processes 2674 Mbytes.
CPU usage for this session 2903 seconds ( 0.81 hours ).
Elapsed time for this session 2927 seconds ( 0.81 hours ).

Thank you...
[100%] Built target synth
[vchand22@grendel43 build]$ cmake .. --preset run
Preset CMake variables:

```

Simulation log (CMake target `vsim-dut-c`); all six test cases pass:

```

# Loading sv_std.std
# Loading work.tb(fast)
# Loading work.sram1rw(fast)
# Loading work.dram(fast)
# Loading work.dram(fast_1)
# Loading work.dut(fast)
# Loading work.tri_state_driver(fast)
# run -all
# [TB] cfg: dqwidth=8, burstlen=8, rdlat=5
# [TB] start
# INFO[TB]: ##### Running Test: 0 #####
# INFO[TB]: ##### CLASS: ECE564 #####
# INFO:[dram] Reading memory file: '/mnt/ncsudrive/v/vchand22/proj/inputs/input0.dat'
# INFO:LVL0: Test: Passed
# INFO[TB]: ##### Running Test: 1 #####
# INFO[TB]: ##### CLASS: ECE564 #####
# INFO:[dram] Reading memory file: '/mnt/ncsudrive/v/vchand22/proj/inputs/input1.dat'
# INFO:LVL0: Test: Passed
# INFO[TB]: ##### Running Test: 2 #####
# INFO[TB]: ##### CLASS: ECE564 #####
# INFO:[dram] Reading memory file: '/mnt/ncsudrive/v/vchand22/proj/inputs/input2.dat'
# INFO:LVL0: Test: Passed
# INFO[TB]: ##### Running Test: 3 #####
# INFO[TB]: ##### CLASS: ECE564 #####
# INFO:[dram] Reading memory file: '/mnt/ncsudrive/v/vchand22/proj/inputs/input3.dat'
# INFO:LVL0: Test: Passed
# INFO[TB]: ##### Running Test: 4 #####
# INFO[TB]: ##### CLASS: ECE564 #####
# INFO:[dram] Reading memory file: '/mnt/ncsudrive/v/vchand22/proj/inputs/input4.dat'
# INFO:LVL0: Test: Passed
# INFO[TB]: ##### Running Test: 5 #####
# INFO[TB]: ##### CLASS: ECE564 #####
# INFO:[dram] Reading memory file: '/mnt/ncsudrive/v/vchand22/proj/inputs/input5.dat'
# INFO:LVL0: Test: Passed
# INFO[TB]: Total number of cases : 6
# INFO[TB]: Total number of passes : 6
# INFO[TB]: Finial Results      : 100.00
# INFO[TB]: Finial Time Result   : 208232100 ns
# INFO[TB]: Finial Cycle Result   : 41646420 cycles
#
# [TB] done
# ** Note: $finish    : /mnt/ncsudrive/v/vchand22/proj/srcs/tb/tb.sv(318)
#       Time: 208232100 ns Iteration: 1 Instance: /tb
# End time: 06:15:22 on Nov 24, 2025, Elapsed time: 0:01:02
# Errors: 0, Warnings: 1
[100%] Built target vsim-dut-c
[vchand22@grendel43 build]$

```
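
The log's time and cycle totals can be cross-checked against each other. Dividing the final simulation time by the cycle count gives the clock period the testbench used for this run. The constants are copied from the log, and the variable names are illustrative.

```python
SIM_TIME_NS = 208_232_100   # "Finial Time Result" in the log
SIM_CYCLES = 41_646_420     # "Finial Cycle Result" in the log

tb_clock_period_ns = SIM_TIME_NS / SIM_CYCLES
print(f"testbench clock period: {tb_clock_period_ns} ns")  # testbench clock period: 5.0 ns
```

The 5 ns result is the testbench's simulation clock for this run. It is a different figure from the 40 ns period that the synthesis timing reports use.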

## Timing reports

Hold check at the fast corner (`timing_min_fast_holdcheck_tut1.rpt`):

```
Operating Conditions: fast Library: NangateOpenCellLibrary_PDKv1  
Wire Load Model Mode: top  
Startpoint: rbat_reg[0]  
          (rising edge-triggered flip-flop clocked by clk)  
Endpoint: rbat_reg[0]  
          (rising edge-triggered flip-flop clocked by clk)  
Path Group: clk  
Path Type: min  
  
Point           Incr      Path  
-----  
clock clk (rise edge)    0.0000  0.0000  
clock network delay (ideal) 0.0000  0.0000  
rbat_reg[0]/CK (DFFR_X1) 0.0000 # 0.0000 r  
rbat_reg[0]/Q (DFFR_X1)  0.0730  0.0730 r  
U380287/ZN (AOI22_X1)   0.0182  0.0912 f  
rbat_reg[0]/D (DFFR_X1)  0.0000  0.0912 f  
data arrival time        0.0912  
  
clock clk (rise edge)    0.0000  0.0000  
clock network delay (ideal) 0.0000  0.0000  
clock uncertainty       0.0500  0.0500  
rbat_reg[0]/CK (DFFR_X1) 0.0000  0.0500 r  
library hold time        0.0020  0.0520  
data required time        0.0520  
-----  
data required time        0.0520  
data arrival time         -0.0912  
-----  
slack (MET)              0.0391  
  
```
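
A hold (min-delay) check requires the data to arrive no earlier than the required time, so hold slack is arrival minus required. The snippet below recomputes it from the values shown in the hold report. The variable names are illustrative.

```python
# Values copied from the fast-corner hold report (all in ns).
clock_edge = 0.0            # same edge launches and captures for a hold check
clock_uncertainty = 0.05
library_hold = 0.002        # rbat_reg[0] (DFFR_X1) hold requirement
clk_to_q = 0.0730           # rbat_reg[0]/Q
aoi22_delay = 0.0182        # U380287/ZN

data_arrival = clock_edge + clk_to_q + aoi22_delay               # 0.0912
data_required = clock_edge + clock_uncertainty + library_hold    # 0.0520
hold_slack = data_arrival - data_required                        # >= 0 means MET

print(f"arrival={data_arrival:.4f} required={data_required:.4f} slack={hold_slack:.4f}")
```

The result is 0.0392 ns instead of the reported 0.0391 ns. The report prints values rounded to four decimals, while the tool computes slack from the unrounded delays.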

Setup check at the slow corner, excerpt of the reported max-delay path (`timing_max_slow_holdfixed.tut1.rj`):

```
intadd_0/U9/C0 (FA_X1) 0.5172 31.7810 f
intadd_0/U8/C0 (FA_X1) 0.5172 32.2983 f
intadd_0/U7/C0 (FA_X1) 0.5172 32.8155 f
intadd_0/U6/C0 (FA_X1) 0.5172 33.3327 f
intadd_0/U5/C0 (FA_X1) 0.5172 33.8500 f
intadd_0/U4/C0 (FA_X1) 0.5172 34.3672 f
intadd_0/U3/C0 (FA_X1) 0.5172 34.8845 f
intadd_0/U2/C0 (FA_X1) 0.4983 35.3828 f
U178842/ZN (XNOR2_X2) 0.4035 35.7863 r
U381055/ZN (AOI22_X1) 0.2065 35.9928 f
U180134/ZN (OR4_X1) 0.2943 36.2871 f
U178841/ZN (OR2_X1) 0.2993 36.5864 f
U381089/ZN (NOR4_X1) 0.5710 37.1574 r
U381093/ZN (AOI211_X1) 0.1948 37.3522 f
U178839/ZN (NOR2_X1) 0.3823 37.7345 r
U381121/ZN (OAI21_X1) 0.2815 38.0159 f
U381181/ZN (OAI22_X1) 0.4364 38.4523 r
obuf_reg[10]/D (DFFR_X1) 0.0000 38.4523 r
data arrival time 38.4523

clock clk (rise edge) 40.0000 40.0000
clock network delay (ideal) 0.0000 40.0000
clock uncertainty -0.0500 39.9500
obuf_reg[10]/CK (DFFR_X1) 0.0000 39.9500 r
library setup time -0.3647 39.5853
data required time 39.5853
-----
data required time 39.5853
data arrival time -38.4523
-----
slack (MET) 1.1330

```
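
The setup (max-delay) check mirrors the hold check. The required time is the next capture edge minus clock uncertainty and the library setup time, and setup slack is required minus arrival. The snippet below recomputes it from the excerpt's values. The variable names are illustrative.

```python
# Values copied from the slow-corner setup report (all in ns).
clock_period = 40.0        # capture edge: clk rise at 40.0000
clock_uncertainty = 0.05
library_setup = 0.3647     # obuf_reg[10] (DFFR_X1) setup requirement
data_arrival = 38.4523     # arrival at obuf_reg[10]/D

data_required = clock_period - clock_uncertainty - library_setup
setup_slack = data_required - data_arrival   # >= 0 means MET

print(f"required={data_required:.4f} slack={setup_slack:.4f}")  # required=39.5853 slack=1.1330
```

A positive slack of about 1.13 ns means this path meets the 40 ns constraint with margin to spare.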

### Final cell report

Last rows of the final cell report (`cell_report_final.rpt`). Area is in µm², and attribute `n` marks a noncombinational cell.

| Cell               | Reference | Library                                            | Area (µm²)  | Attributes |
|--------------------|-----------|----------------------------------------------------|-------------|------------|
| wrp_reg[21]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| wrp_reg[22]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| wrp_reg[23]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| wrp_reg[24]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| wrp_reg[25]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| wrp_reg[26]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| wrp_reg[27]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| wrp_reg[28]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| wrp_reg[29]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| wrp_reg[30]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| wrp_reg[31]        | DFFR_X1   | NangateOpenCellLibrary_PDKv1_2_v2008_10_slow_nldm  | 5.5860      | n          |
| Total 277902 cells |           |                                                    | 456407.8647 |            |

Clock period achieved: 40 ns

Cell area achieved: 456407.86 µm²

Cycles achieved: 41646420 cycles

Test cases passed: 6 of 6
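
Multiplying the cycle count by the achieved clock period gives the total time the design would need at 40 ns. This assumes the simulated cycle count, reported after all six tests, stays the same at that clock. The variable names are illustrative.

```python
CLOCK_PERIOD_NS = 40        # clock period achieved in synthesis
CYCLES = 41_646_420         # "Finial Cycle Result" from the simulation log

runtime_ns = CLOCK_PERIOD_NS * CYCLES
print(f"{runtime_ns} ns = {runtime_ns / 1e9:.4f} s")  # 1665856800 ns = 1.6659 s
```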