

## Leveraging MLIR for Efficient Irregular-Shaped CGRA Overlay Design

Mohamed Bouaziz, Suhail A. Fahmy

King Abdullah University of Science and Technology (KAUST), Saudi Arabia

### Motivation



Fig. 1: Generic CGRA architecture

- Efficient as datapath architectures ✓
- Flexible as FPGAs ✓
- Communicates at word level ✓
- Computes using efficient FUs ✓
- Regular-shaped and homogeneous ✗
- Does not fit HPC data access patterns ✗

#### Example:

- HW underutilization of the top-right and bottom-left PEs
- Compiler oblivious to data access patterns
- Much more complex mapping search space

#### Generally, it applies to:

- Interconnects
- Memory and register structures
- Functional unit operations



Fig. 2: Mapping Example

→ Poor hardware reuse & increased deployment effort

### Related Work & Proposed Approach

|                   | HW optimization | SW optimization | Overlay Generation | Multi-targeting |
|-------------------|-----------------|-----------------|--------------------|-----------------|
| REVAMP, ASPLOS'22 | +               | -               | -                  | +               |
| FlexC, arXiv'23   | -               | +               | -                  | +               |
| OverGen, MICRO'22 | ±               | ±               | +                  | -               |
| This work         | +               | +               | +                  | +               |

Tab. 1: Comparison with Related Work



Fig. 3: Optimized Overlay Generation Process

### Preliminary Results



Fig. 4: Preliminary Results



Fig. 5: AMD Versal ACAP VCK5000