

# [HW6\_prob1] VGGNet\_Hardware\_Mapping\_Tiling (10 pts)

- The VGGNet 3rd layer (assuming layers are indexed from 0)
- Weight-stationary mapping with a 16 × 16 array
- Input and output channels are both 64
- With 64 channels on each side and a 16 × 16 array, the layer must be tiled into 16 pieces (4 ic tiles × 4 oc tiles)
- The goal is to format the input, psum, and output as follows:
- Input (a\_tile) format: [ic\_tile, oc\_tile, ic index (row #), nij (time step)]
- psum (psum) format: [ic\_tile, oc\_tile, oc index (col #), nij (time step), kij]
- Output (out) format: [oc index, o\_nij]

*Abbreviation - ic: input channel, oc: output channel, o\_nij: nij index for output*

*Refer to “[Code14]\_VGGNet\_Hardware\_Mapping.ipynb” for an example that makes the indices clearer.*
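The three formats above can be sketched in plain Python. This is a minimal illustration and does not reproduce the notebook: the 3 × 3 kernel (kij = 0..8), the tiny 4 × 4 padded input, stride 1, the random integer data, the `w[kij][oc][ic]` weight layout, and the helper names `psum_entry` and `nij_of` are all assumptions made for the example.

```python
import random

IC, OC, ARRAY = 64, 64, 16        # channel counts and array size from the problem
KI = 3                            # assumption: 3x3 kernel, so kij runs 0..8
PAD_NI = 4                        # assumption: tiny 4x4 padded input to keep this fast
O_NI = PAD_NI - KI + 1            # output width/height for stride 1
N_TILE = IC // ARRAY              # 4 tiles per channel dimension, 4 * 4 = 16 tiles

random.seed(0)
# a[ic][nij]: padded input, flattened so that nij = row * PAD_NI + col
a = [[random.randint(-3, 3) for _ in range(PAD_NI * PAD_NI)] for _ in range(IC)]
# w[kij][oc][ic]: kernel, flattened so that kij = ky * KI + kx (assumed layout)
w = [[[random.randint(-3, 3) for _ in range(IC)] for _ in range(OC)]
     for _ in range(KI * KI)]

# a_tile[ic_tile][oc_tile][ic index (row #)][nij]; every oc_tile sees the same inputs
a_tile = [[[a[it * ARRAY + r] for r in range(ARRAY)]
           for _ot in range(N_TILE)]
          for it in range(N_TILE)]

def psum_entry(it, ot, c, nij, kij):
    """Column c of tile (it, ot): dot product over the 16 rows of the array."""
    oc = ot * ARRAY + c
    return sum(w[kij][oc][it * ARRAY + r] * a_tile[it][ot][r][nij]
               for r in range(ARRAY))

# psum[ic_tile][oc_tile][oc index (col #)][nij][kij]
psum = [[[[[psum_entry(it, ot, c, nij, kij) for kij in range(KI * KI)]
           for nij in range(PAD_NI * PAD_NI)]
          for c in range(ARRAY)]
         for ot in range(N_TILE)]
        for it in range(N_TILE)]

def nij_of(o_nij, kij):
    """Padded-input index that output pixel o_nij reads for kernel position kij."""
    row = o_nij // O_NI + kij // KI
    col = o_nij % O_NI + kij % KI
    return row * PAD_NI + col

# out[oc index][o_nij]: accumulate psums over all ic tiles and kernel positions
out = [[sum(psum[it][oc // ARRAY][oc % ARRAY][nij_of(o, kij)][kij]
            for it in range(N_TILE) for kij in range(KI * KI))
        for o in range(O_NI * O_NI)]
       for oc in range(OC)]
```

Each (ic\_tile, oc\_tile) pair corresponds to one pass through the 16 × 16 array; the final accumulation over ic\_tile and kij is what turns the time-indexed psums into output pixels.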

## [HW6\_prob2-1] mac\_tile Design (no need to simulate; compilation errors should be zero) (10 pts)

- Download from GitHub: hardware/w5/hw1\_2
- Equip ports: in\_w, out\_e, in\_n, out\_s, inst\_w (input), inst\_e (output), clk, reset (synchronous)
- Add latches: inst\_q[1:0], a\_q (activation), b\_q (weight), c\_q (psum), load\_ready\_q
- a\_q is connected to out\_e, and inst\_e is connected to inst\_q
- When reset == 1, inst\_q[1:0] becomes all 0 and load\_ready\_q becomes 1
- Always latch inst\_w[1] (execution) into inst\_q[1].
- When either inst\_w[0] or inst\_w[1] is 1, latch the new in\_w into a\_q.
- When inst\_w[0] (kernel load) == 1 and load\_ready\_q == 1, latch the new weight from the in\_w port into b\_q. At the same time, set load\_ready\_q to 0.
- When load\_ready\_q == 0, latch inst\_w[0] into inst\_q[0], which is connected to inst\_e. The kernel-load instruction therefore reaches the next tile only after this tile has captured its own weight.
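The register rules above can be checked with a small cycle-level model. The sketch below is a behavioral model in Python, not the RTL the problem asks for. The class name `MacTile` and the `step` method are hypothetical, and two behaviors are assumptions the bullets do not state: c\_q latches in\_n every cycle, and out\_s is a\_q × b\_q + c\_q. Each call to `step` plays the role of one rising clock edge, and every condition reads the register values from before that edge.

```python
class MacTile:
    """Cycle-level model of one PE; every attribute ending in _q is a register."""

    def __init__(self):
        self.inst_q = 0          # bit 1 = execute, bit 0 = kernel load
        self.a_q = 0             # activation
        self.b_q = 0             # weight
        self.c_q = 0             # psum coming from the north
        self.load_ready_q = 1

    @property
    def out_e(self):
        return self.a_q          # a_q is connected to out_e

    @property
    def inst_e(self):
        return self.inst_q       # inst_q is connected to inst_e

    @property
    def out_s(self):
        # Assumption: the MAC result leaves through out_s.
        return self.a_q * self.b_q + self.c_q

    def step(self, in_w, inst_w, in_n=0, reset=False):
        """Apply one clock edge with synchronous reset."""
        if reset:
            # Only the two registers named in the spec are reset here.
            self.inst_q = 0
            self.load_ready_q = 1
            return
        load_ready_old = self.load_ready_q   # value before the edge
        kernel_load = inst_w & 1
        execute = (inst_w >> 1) & 1
        inst0 = self.inst_q & 1
        if load_ready_old == 0:
            inst0 = kernel_load              # forward kernel load once loaded
        self.inst_q = (execute << 1) | inst0
        if kernel_load or execute:
            self.a_q = in_w
        if kernel_load and load_ready_old:
            self.b_q = in_w                  # capture the stationary weight
            self.load_ready_q = 0
        self.c_q = in_n                      # assumption: latched every cycle


tile = MacTile()
tile.step(0, 0, reset=True)
tile.step(in_w=5, inst_w=0b01)   # first kernel load: weight 5 is captured
tile.step(in_w=7, inst_w=0b01)   # second kernel load: weight kept, load forwarded
```

After these two kernel-load cycles the tile still holds weight 5 in b\_q, while out\_e carries 7 and inst\_e carries 01, so the tile to the east would capture 7 as its own weight on the next cycle.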

# Waveform for 2 PEs in a Row

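*[Figure: waveform for two PEs in a row. Two trace groups, both labeled tile0 in the source, each show clk, inst\_w[1:0], load\_ready\_q, in\_w[3:0], b\_q[3:0], and inst\_q[1:0]. At the captured cursor position both groups read inst\_w = 01, load\_ready\_q = 0, in\_w = x, and inst\_q = 01; b\_q reads F in the first group and 1 in the second. The figure also marks a "bubble cycle".]*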

## [HW6\_prob2-2] mac\_row & mac\_array (no need to simulate; compilation errors should be zero) (10 pts)

- Use a for loop to build mac\_row from the mac\_tile of prob2-1
  - The PE row should provide a “valid[col-1:0]” vector to notify the ofifo
  - inst\_w[1:0] is also propagated from col0 to col7
  - The valid bit for a column is that column's inst\_e[1]
  - See how the “temp” wire is used
- Similarly, use a for loop to build mac\_array from mac\_row
  - The array should likewise provide a “valid[col-1:0]” vector to notify the ofifo
  - Here, only the last row’s valid signals are used.
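How the instruction and the valid bits ripple along a row can be sketched with a behavioral Python model, again not the RTL itself. The function names `tile_next`, `row_step`, and `valid` are hypothetical, the psum path (in\_n, out\_s) is left out for brevity, and the eight columns follow the "col0 to col7" wording above. The `temp` list stands in for the “temp” wire: entry j is what enters column j from the west.

```python
COL = 8  # col0..col7, following the problem statement

def tile_next(state, in_w, inst_w):
    """Next register values of one PE, using the mac_tile rules."""
    inst_q, a_q, b_q, load_ready_q = state
    # inst_q[0] follows inst_w[0] only once this PE has its weight
    inst0 = (inst_w & 1) if load_ready_q == 0 else (inst_q & 1)
    if inst_w & 0b11:                      # kernel load or execute
        a_q = in_w
    if (inst_w & 1) and load_ready_q:      # first kernel load seen by this PE
        b_q, load_ready_q = in_w, 0
    return ((inst_w & 0b10) | inst0, a_q, b_q, load_ready_q)

def row_step(row, in_w, inst_w):
    """Advance every PE one clock; column j reads column j-1's registers."""
    # temp[j] = (data, instruction) entering column j from the west
    temp = [(in_w, inst_w)] + [(s[1], s[0]) for s in row[:-1]]
    return [tile_next(s, d, i) for s, (d, i) in zip(row, temp)]

def valid(row):
    """valid[j] is inst_e[1] of column j."""
    return [(s[0] >> 1) & 1 for s in row]

# State after reset: inst_q = 0, a_q = 0, b_q = 0, load_ready_q = 1
row = [(0, 0, 0, 1)] * COL

weights = [11, 12, 13, 14, 15, 16, 17, 18]
for wgt in weights:                  # kernel load: one weight per cycle
    row = row_step(row, wgt, 0b01)
for _ in range(COL - 1):             # idle cycles so the load reaches col7
    row = row_step(row, 0, 0b00)
loaded = [s[2] for s in row]         # b_q of each column

row = row_step(row, 3, 0b10)         # first execute cycle
first_valid = valid(row)
```

In this model column j ends up holding the j-th weight, and the execute bit reaches column j one cycle after column j-1, so the valid bits rise one column per cycle. A mac\_array would stack such rows and, as stated above, pass only the last row's valid vector to the ofifo.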