

# ECE284 FA25 Final Progress Report

| Item                                                                     | Current Status | Status during Poster Presentation | Note                                                                                                                                                                                 |
|--------------------------------------------------------------------------|----------------|-----------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Part1<br>Vanilla Version                                                 | Complete       | Complete                          | Vanilla 8x8 weight-stationary systolic array fully implemented and verified with testbench controller; FPGA synthesis completed with frequency and power reported                    |
| Part2<br>2bit and 4bit Lane<br>Reconfigurable SIMD<br>Systolic Array     | Complete       | In progress                       | Lane-reconfigurable SIMD WS array supporting 2-bit and 4-bit activations implemented; 2-bit mode processes 16 output channels via controller-level tiling without modifying the datapath. |
| Part3<br>Weight-stationary and<br>output stationary<br>reconfigurable PE | Complete       | In progress                       | WS/OS reconfigurable architecture implemented with shared datapath and FIFO support; OS mode verified using first convolution layer with partial channel and spatial mapping.        |
| Alpha 1<br>Optimized training for<br>Quantized VGG                       | Complete       | Complete                          | Optimized quantization-aware training using Adam, label smoothing, and cosine scheduler to improve model accuracy.                                                                   |
| Alpha 2<br>The C2F Pruning<br>Method (Mixed-Granularity)                 | Complete       | Complete                          | Mixed-granularity pruning method balancing sparsity and accuracy while remaining systolic-array friendly.                                                                            |
| Alpha 3<br>Output Stationary<br>Skip Optimization | Complete    | In progress | Output-stationary gate skipping implemented with fine-grained clock gating at the PE, row, and array levels; functionality validated by comparing testbench outputs against reference results. |
| Alpha 4<br>Scalable Multi-core<br>Tiling          | In progress | In progress | Multi-core tiling architecture implemented under weight-stationary execution; functional validation is ongoing due to remaining testbench issues.                                              |
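
The Alpha 1 note names label smoothing and a cosine scheduler but does not give their settings. As a rough illustration only, the sketch below writes out the textbook forms of both in plain Python. The function names, the smoothing factor `eps=0.1`, and the floor `min_lr=0.0` are assumptions made for this example and are not taken from the project's training code.

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate at `step` out of `total_steps`.

    Starts at base_lr (step 0) and decays smoothly to min_lr (final step).
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def label_smoothed_ce(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution.

    The true class receives probability 1 - eps + eps/K and every other
    class receives eps/K, where K is the number of classes.
    """
    k = len(logits)
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    # Build the smoothed target distribution.
    smooth = [eps / k] * k
    smooth[target] += 1.0 - eps
    return -sum(q * lp for q, lp in zip(smooth, log_probs))
```

With `eps=0.0` the loss reduces to ordinary cross-entropy. In a real training loop the learning rate from `cosine_lr` would be passed to the optimizer (Adam, per the note) at every step or epoch.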