
A parametric RTL code generator of an efficient integer MxM Systolic Array implementation for Xilinx FPGAs, with error detection capabilities.

NeuroFan/LABFT

 
 


Libano's Hardened Systolic Array Generator

A parametric RTL code generator of an efficient integer MxM Systolic Array implementation for Xilinx FPGAs.

This repository is an evil cousin to Libano's Systolic Array Generator, with error detection capabilities.

This repository is also part of an IEEE Transactions on Reliability paper that is currently under review.


Overview

In a systolic array, computation proceeds rhythmically: at every clock cycle, input data is pumped in and output data is pumped out. The term "systolic" is therefore a reference to the functioning of a biological heart[1].

A number of mathematical operations can be implemented using systolic arrays; the one implemented in this project is a weight-stationary matrix multiplier. Nowadays, systolic arrays are the architectural core of state-of-the-art neural network accelerators, such as Google's TPU[2] and Xilinx's DPU[3].
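To make the weight-stationary dataflow concrete, the following is a minimal behavioral sketch in Python, not the generated RTL. It assumes that activations enter from the left with a one-cycle skew per row and that partial sums flow downward; the function name `systolic_matmul` and the register names are hypothetical and chosen only for illustration.

```python
def systolic_matmul(X, W):
    """Return X @ W using a cycle-by-cycle model of a weight-stationary array.

    PE(k, j) permanently holds W[k][j]. Activations move left to right,
    partial sums move top to bottom, one hop per clock cycle.
    """
    M, K, N = len(X), len(W), len(W[0])
    a_reg = [[0] * N for _ in range(K)]  # activation register in each PE
    p_reg = [[0] * N for _ in range(K)]  # partial-sum register in each PE
    Y = [[0] * N for _ in range(M)]

    for t in range(M + K + N - 2):  # cycles until the last result drains
        new_a = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for j in range(N):
                if j == 0:
                    m = t - k  # array row k is fed with a skew of k cycles
                    a_in = X[m][k] if 0 <= m < M else 0
                else:
                    a_in = a_reg[k][j - 1]  # activation from the left neighbor
                p_in = p_reg[k - 1][j] if k > 0 else 0  # sum from the PE above
                new_a[k][j] = a_in
                new_p[k][j] = p_in + a_in * W[k][j]  # the MAC operation
        a_reg, p_reg = new_a, new_p
        for j in range(N):
            m = t - (K - 1) - j  # input row whose result leaves column j now
            if 0 <= m < M:
                Y[m][j] = p_reg[K - 1][j]
    return Y


if __name__ == "__main__":
    X = [[1, 2], [3, 4], [5, 6]]
    W = [[1, 0, 2], [-1, 3, 1]]
    print(systolic_matmul(X, W))
```

Each weight stays in its PE for the whole run, so only activations and partial sums move between neighbors, which is what keeps the interconnect local and regular.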

This implementation uses an 8-bit integer representation for the inputs, which allows two multiplications to be executed simultaneously in a single DSP[4]. Furthermore, a time-multiplexing scheme is employed on the DSPs[5][6], allowing them to run twice as fast as the rest of the logic. Thus, overall, each DSP is able to execute four 8-bit integer multiplications per clock cycle of the surrounding logic. The adders responsible for accumulation are implemented with CLB[7][8] elements, such as LUTs and CARRYs.
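The arithmetic behind sharing one multiplier between two 8-bit products can be sketched in Python. This is an illustration of the commonly used operand-packing trick, assuming the two operands that share a common factor are packed 18 bits apart into the wide multiplier port; the exact bit positions used by the generated RTL are not stated here, and `dual_mul` and `shift` are hypothetical names.

```python
def dual_mul(a, d, b, shift=18):
    """Compute a*b and d*b with a single wide multiplication.

    a, d, b are signed 8-bit integers. a and d are packed into one wide
    operand, so one multiply yields both products in separate bit fields.
    """
    for v in (a, d, b):
        assert -128 <= v <= 127, "operands must be signed 8-bit values"
    packed = (a << shift) + d          # wide operand: a in the upper field
    product = packed * b               # the single hardware multiplication
    low = product & ((1 << shift) - 1) # lower field holds d*b (unsigned view)
    if low >= 1 << (shift - 1):        # sign-extend the lower field
        low -= 1 << shift
    high = (product - low) >> shift    # removes the borrow caused by low < 0
    return high, low                   # (a*b, d*b)


if __name__ == "__main__":
    print(dual_mul(-7, 5, 3))
    print(dual_mul(127, -128, -128))
```

Subtracting the sign-extended lower field before shifting is the correction step: a negative lower product would otherwise borrow one unit from the upper product.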

Hence, the Processing Elements (PEs) that constitute the array are multiply-accumulate (MAC) units.

systolic-arch

labft-arch
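The error detection scheme itself is not described in this README. As a loosely related illustration only, the sketch below shows a classic checksum-based approach to detecting errors in a matrix product (algorithm-based fault tolerance, ABFT), which is an assumption about the general family of techniques and not a description of this repository's hardware. All function names are hypothetical.

```python
def matmul(X, W):
    """Plain reference matrix product of two lists of lists."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)]
            for row in X]


def rows_with_errors(X, W, Y):
    """Return indices of output rows whose checksum does not match.

    For Y = X @ W, the sum of each output row must equal the dot product of
    the matching input row with the row sums of W (a checksum column).
    """
    w_check = [sum(row) for row in W]  # checksum column of the weights
    flagged = []
    for m, (x_row, y_row) in enumerate(zip(X, Y)):
        expected = sum(x * w for x, w in zip(x_row, w_check))
        if sum(y_row) != expected:
            flagged.append(m)
    return flagged


if __name__ == "__main__":
    X = [[1, 2], [3, 4], [5, 6]]
    W = [[1, 0, 2], [-1, 3, 1]]
    Y = matmul(X, W)
    print(rows_with_errors(X, W, Y))  # no mismatch expected for a clean result
    Y[1][2] ^= 1 << 4                 # inject a single bit flip
    print(rows_with_errors(X, W, Y))  # the corrupted row is flagged
```

With integer arithmetic the checksum comparison is exact, so any single corrupted output value changes its row sum and is detected.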


Resource Utilization & Performance

Given a systolic array of size NxN:

  • DSPs: N² DSP48E1[5] or DSP48E2[6] slices (one per PE)
  • Operations/Cycle: 8N² (N² PEs, 2x2 MUL + 4 ADD per PE)
  • Frequency: Depends mostly on the target device, but can also depend on N (see /validation/)
    • 14x14 @ XC7Z020 @ 200MHz
    • 32x32 @ XCZU9 @ 300MHz
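The figures above can be combined into a peak throughput estimate. The short Python sketch below multiplies the 8N² operations per cycle by the quoted clock frequency; it assumes every PE is busy on every cycle, so it is an upper bound rather than a measured result, and the function names are hypothetical.

```python
def ops_per_cycle(n):
    """Peak operations per cycle for an n x n array: 8 per PE (4 MUL + 4 ADD)."""
    return 8 * n * n


def peak_ops_per_second(n, freq_hz):
    """Peak throughput, assuming all PEs are fully utilized every cycle."""
    return ops_per_cycle(n) * freq_hz


if __name__ == "__main__":
    for n, freq_hz, device in [(14, 200_000_000, "XC7Z020"),
                               (32, 300_000_000, "XCZU9")]:
        gops = peak_ops_per_second(n, freq_hz) / 1e9
        print(f"{n}x{n} @ {device}: {ops_per_cycle(n)} ops/cycle, "
              f"{gops:.1f} GOPS peak")
```

For the two validated configurations this gives 1568 operations per cycle (313.6 GOPS peak) for the 14x14 array and 8192 operations per cycle (2457.6 GOPS peak) for the 32x32 array.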

Repository Organization

  • /docs/: Relevant repository documentation.
  • /generator/: Python script for generating RTL (edit 'settings.py', run 'main.py', import '/RTL/import_me/*').
  • /validation/: Out-of-context (OOC) Vivado projects, scripts, and reports for synthesis/place/route of 14x14 and 32x32 arrays on Zynq-7000 and UltraScale+ devices.

References


Extras

systolic-demo

