



Website



GitHub



Documentation

# Open-Source FPGA on Silicon

## Case Studies on PRGA

**Ang Li, Ting-Jung Chang, Fei Gao, David Wentzlaff**

Princeton University

[angl@princeton.edu](mailto:angl@princeton.edu)






CIFER<sup>[1]</sup> eFPGA and DECADES<sup>[2]</sup> eFPGA

- Open-Source Hardware
- 7K BLEs
- BRAM
- DSP
- Multi-Modal LUT6
- 12nm FinFET
- Open-Source Software
- Carry Chain
- Fast Configuration

1. T.-J. Chang\*, A. Li\*, F. Gao, T. Ta, G. Tziantzioulis, Y. Ou, M. Wang, J. Tu, K. Xu, P. Jackson, A. Ning, G. Chirkov, M. Orenes-Vera, S. Agwa, X. Yan, E. Tang, J. Balkind, C. Batten, and D. Wentzlaff, "CIFER: A 12nm, 16mm<sup>2</sup>, 22-Core SoC with a 1541 LUT6/mm<sup>2</sup>, 1.92 MOPS/LUT, Fully Synthesizable, Cache-Coherent, Embedded FPGA", CICC'23
2. F. Gao, T.-J. Chang, A. Li, M. Orenes-Vera, D. Giri, P. Jackson, A. Ning, G. Tziantzioulis, J. Zuckerman, J. Tu, K. Xu, G. Chirkov, G. Tombesi, J. Balkind, M. Martonosi, L. Carloni, and D. Wentzlaff, "DECADES: A 67mm<sup>2</sup>, 1.46TOPS, 55 Giga Cache-Coherent 64-bit RISC-V Instructions per second, Heterogeneous Manycore SoC with 109 Tiles including Accelerators, Intelligent Storage, and eFPGA in 12nm FinFET", CICC'23




# FPGAs are increasingly being used ...

For Open-Source Hardware Research & Prototyping

- OpenPiton
- Chipyard

In Long-Term Production Systems

- Project Catapult
- Project Brainwave
- EC2 F1 Cloud FPGAs
- Samsung SmartSSD




Research on FPGAs themselves?  
Silicon Prototyping of FPGAs?  
Domain/Application-Specific FPGAs?








# Princeton Reconfigurable Gate Array (PRGA)

## An Open-Source FPGA Prototyping and Research Framework




# Outline

- PRGA
- CIFER eFPGA & DECADES eFPGA
  - Architecture
  - Physical Design
- Evaluation
- Conclusion




# Princeton Reconfigurable Gate Array

- **Co-generation** of a custom FPGA and a bespoke CAD toolchain
  - Synthesizable: ASIC EDA + standard cells
  - Open-Source CAD: Yosys, VPR, FASM, iverilog, ...
- Flexible, scalable architecture
  - Modern FPGA features: BRAM, DSP, multi-modal logic elements, ...
  - Bring-your-own-modules
- Intuitive Python API
- Template-based, human-readable Verilog (Jinja)
- Automated simulation scripts at various levels of abstraction
- ...
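
As a rough illustration of the template-based Verilog generation listed above, the sketch below renders a K-input LUT module from a text template. It uses Python's standard-library `string.Template` as a stand-in for Jinja, and the module name, port names, and the `render_lut` helper are hypothetical; none of them are PRGA's actual templates or API.

```python
from string import Template

# Hypothetical LUT template; PRGA's real templates are written in Jinja.
LUT_TEMPLATE = Template("""\
module ${name} (
    input  wire [${in_msb}:0] lut_in,
    input  wire [${cfg_msb}:0] cfg_d,
    output wire out
);
    // The configuration bits form the truth table; the inputs index into it.
    assign out = cfg_d[lut_in];
endmodule
""")


def render_lut(k: int) -> str:
    """Render a K-input LUT whose truth table holds 2**K configuration bits."""
    if k < 1:
        raise ValueError("a LUT needs at least one input")
    return LUT_TEMPLATE.substitute(
        name=f"lut{k}",
        in_msb=k - 1,
        cfg_msb=(1 << k) - 1,
    )


if __name__ == "__main__":
    print(render_lut(6))
```

Because the output is plain Verilog text, the generated module can be read and reviewed like hand-written RTL, which is the point of keeping the templates human-readable.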





# Acknowledgements: CIFER Team



- David Wentzlaff
- Christopher Batten
- Jonathan Balkind
- Ting-Jung Chang
- Fei Gao
- Tuan Ta
- Georgios Tziantzioulis
- Yanghui Ou
- Moyang Wang
- Jinzheng Tu
- Kaifeng Xu
- Paul Jackson
- August Ning
- Grigory Chirkov
- Marcelo Orenes-Vera
- Shady Agwa
- Undergraduates: Xiaoyu Yan and Eric Tang

Princeton University  
Cornell Engineering




# Acknowledgements: DECADES Team



- David Wentzlaff
- Luca Carloni
- Margaret Martonosi
- Jonathan Balkind
- Fei Gao
- Ting-Jung Chang
- Marcelo Orenes-Vera
- Paul Jackson
- August Ning
- Georgios Tziantzioulis
- Joseph Zuckerman
- Jinzheng Tu
- Kaifeng Xu
- Grigory Chirkov
- Gabriele Tombesi

Princeton University  
Columbia Engineering, The Fu Foundation School of Engineering and Applied Science




# Architecture Overview

| Category        | Resource           | CIFER eFPGA                                  | DECADES eFPGA                                |
|-----------------|--------------------|----------------------------------------------|----------------------------------------------|
| Logic Resources | BLE (LUT6 + other) | 6,720                                        | 7,040                                        |
|                 | Routing Channel    | 160 (20× L1 + 15× L4)                        | 200 (20× L1 + 20× L4)                        |
|                 | Block RAM          | 432Kb (18× 24Kb)                             | 512Kb (32× 16Kb)                             |
|                 | Hard Multiplier    | -                                            | 32× INT40                                    |
| Configuration   | Bitstream Size     | ~168KB                                       | ~192KB                                       |
|                 | Config. Network    | 8-bit packet-switched + 1-bit shift-register | 8-bit packet-switched + 1-bit shift-register |
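
The totals in the table can be cross-checked with a few lines of arithmetic. The sketch below assumes one particular reading of the routing row: each L*n* entry counts wires that start in every tile per direction, each such wire occupies *n* tracks, and channels carry two directions. That reading is an assumption (it happens to reproduce both 160 and 200), and the `Segment` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    wires_per_tile: int  # assumed: wires of this type starting in each tile, per direction
    length: int          # number of tiles the wire spans


def channel_width(segments, directions=2):
    """Tracks per channel under the assumed interpretation."""
    return directions * sum(s.wires_per_tile * s.length for s in segments)


def bram_total_kbit(count, kbit_each):
    """Total block RAM capacity in Kbit."""
    return count * kbit_each


CIFER_SEGMENTS = [Segment("L1", 20, 1), Segment("L4", 15, 4)]
DECADES_SEGMENTS = [Segment("L1", 20, 1), Segment("L4", 20, 4)]

if __name__ == "__main__":
    print("CIFER channel width:", channel_width(CIFER_SEGMENTS))
    print("DECADES channel width:", channel_width(DECADES_SEGMENTS))
    print("CIFER BRAM (Kbit):", bram_total_kbit(18, 24))
    print("DECADES BRAM (Kbit):", bram_total_kbit(32, 16))
```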




# Basic Logic Element (BLE)

CIFER eFPGA BLE



DECADES eFPGA BLE





- Fracturable LUT
- Fast Adder/Carry Chain
- Bypass-able Registers
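
For readers unfamiliar with these BLE features, here is a minimal behavioral sketch of two of them. It models a generic fracturable LUT6 (two LUT5 halves that share the low five inputs, combined by a mux on the sixth) and a ripple carry chain built from full adders. This is a textbook model under those assumptions, not the actual CIFER or DECADES BLE circuit, and all function names are hypothetical.

```python
def lut_read(table: int, width: int, inputs: int) -> int:
    """Return the truth-table bit selected by `inputs` for a `width`-input LUT."""
    if not 0 <= inputs < (1 << width):
        raise ValueError("input vector out of range")
    return (table >> inputs) & 1


def fracturable_lut6(table64: int, inputs6: int):
    """Return (o5_a, o5_b, o6): two LUT5 outputs and the combined LUT6 output."""
    low5 = inputs6 & 0x1F
    sel = (inputs6 >> 5) & 1
    o5_a = lut_read(table64 & 0xFFFFFFFF, 5, low5)          # lower half of the table
    o5_b = lut_read((table64 >> 32) & 0xFFFFFFFF, 5, low5)  # upper half of the table
    o6 = o5_b if sel else o5_a                               # mux on the sixth input
    return o5_a, o5_b, o6


def full_adder(a: int, b: int, cin: int):
    """One carry-chain stage: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def carry_chain_add(a: int, b: int, width: int):
    """Add two `width`-bit values by rippling the carry through the stages."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

In LUT6 mode only `o6` is used; in dual-LUT5 mode `o5_a` and `o5_b` are two independent functions of the same five inputs.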



Website



GitHub



Documentation

# Hierarchical Design

## CIFER eFPGA Hierarchy



## DECADES eFPGA Hierarchy






**6 Unique Physical Blocks**

**7 Unique Physical Blocks**




# Configuration

## CIFER eFPGA



## DECADES eFPGA



**1-bit D-Flipflop Scan-chain within Sub-Arrays**




**8-bit Packet-Switched Network between Sub-Arrays**
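
The two-level configuration scheme can be pictured with a small behavioral model: byte-wide packets are delivered to a sub-array, which serializes each byte into its own 1-bit scan chain. The packet format (a destination id plus payload bytes), the MSB-first bit order, and the `ScanChain` and `load_packets` names are all assumptions made for illustration; the slides do not specify them.

```python
from collections import deque


class ScanChain:
    """A 1-bit shift register of D flip-flops holding configuration bits."""

    def __init__(self, length: int):
        self.regs = deque([0] * length, maxlen=length)

    def shift(self, bit: int) -> int:
        """Shift one bit in at the head; return the bit pushed out of the tail."""
        out = self.regs[-1]
        self.regs.appendleft(bit & 1)  # a full maxlen deque drops its last element
        return out


def load_packets(packets, chains):
    """Deliver (dest, payload) packets; each sub-array serializes bytes MSB first."""
    for dest, payload in packets:
        chain = chains[dest]
        for byte in payload:
            for i in range(7, -1, -1):
                chain.shift((byte >> i) & 1)


if __name__ == "__main__":
    chains = {0: ScanChain(8), 1: ScanChain(8)}
    load_packets([(1, b"\xC1"), (0, b"\x0F")], chains)
    for sub_array, chain in chains.items():
        print(sub_array, list(chain.regs))
```

Splitting the bitstream across sub-arrays this way keeps each scan chain short while the wider network moves data between them.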




# Overview

## CIFER eFPGA



## DECADES eFPGA



12nm FinFET

Standard Cells

SRAM Compiler




# Floorplanning

**CIFER eFPGA**



**DECADES eFPGA**



:( Narrow Aspect Ratio >8:1







# Timing Constraints

- Cycle-free FPGA<sup>[1]</sup>
  - Eliminates combinational loops in FPGA architectures
- Config clock
  - **H-tree did not work**
    - Clock buffering -> huge area
    - Couldn't get below 1,000 DRC violations
  - **Clock mesh**
    - >1GHz
    - Consumes a lot of power!

Illustration of the multi-source cfg. clk mesh  
(This clock structure is NOT a novelty of this work)

1. Ang Li, Ting-Jung Chang, and David Wentzlaff, "Automated Design of FPGAs Facilitated by Cycle-Free Routing", FPL'20
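
To make the cycle-free property concrete, the sketch below checks a routing graph for combinational loops with a plain depth-first search. It is a generic cycle detector, not the construction from the cited FPL'20 paper, and the graph encoding (a dict from each switch to the switches it can drive) and the switch names are assumptions.

```python
def find_cycle(graph):
    """Return one combinational loop as a list of nodes, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    for targets in graph.values():
        for t in targets:
            color.setdefault(t, WHITE)  # include nodes that only appear as sinks
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph.get(node, ()):
            if color[nxt] == GRAY:  # back edge: nxt is on the current path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(color):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    looped = {"sw_a": ["sw_b"], "sw_b": ["sw_c"], "sw_c": ["sw_a"]}
    print(find_cycle(looped))
```

An architecture with no such loop can be constrained and timed by standard static timing analysis without manually breaking timing arcs.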




# SYN, PnR, STA, DRC, LVS, LEC, DFM ...

- Same methodology as digital VLSI design using standard cell libraries




# Verification

- RTL testing: **no** problems
  - Emulation-over-simulation
- Gate-level simulation problems:
  - Zero-/unit-delay gatesim: glitch amplification
    - Combinational loops through LUTs
  - SDF-annotated gatesim: couldn't get it to work...
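
One way to see why loops through LUTs are troublesome in a unit-delay simulation is the toy model below, in which every gate output updates exactly one time unit after its inputs. A LUT configured as an inverter and fed by its own output never settles. This is only a simplified illustration under that assumption, not a reproduction of the failures seen on these chips, and the simulator and net names are hypothetical.

```python
def unit_delay_sim(gates, state, steps):
    """Simulate `steps` time units; `gates` maps net -> (function, input nets)."""
    history = [dict(state)]
    for _ in range(steps):
        nxt = dict(state)
        for net, (fn, inputs) in gates.items():
            # Every gate sees the values from the previous time unit.
            nxt[net] = fn(*(state[i] for i in inputs))
        state = nxt
        history.append(dict(state))
    return history


def invert(x):
    return 1 - x


if __name__ == "__main__":
    # A LUT configured as an inverter whose output loops back to its input.
    loop = {"lut_out": (invert, ["lut_out"])}
    trace = [h["lut_out"] for h in unit_delay_sim(loop, {"lut_out": 0}, 4)]
    print(trace)
```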




# CIFER eFPGA vs. Commercial eFPGA

|                      | Logic Resources |             |                                     | Performance |            |                | Efficiency   |
|----------------------|-----------------|-------------|-------------------------------------|-------------|------------|----------------|--------------|
|                      | LUT6            | BRAM (Kbit) | Logic Density (LUT6/mm<sup>2</sup>) | Fmax† (MHz) | INT8‡ GOPS | INT8‡ MOPS/LUT | INT8‡ GOPS/W |
| **Baseline**\*       | 8760            | 0           | 1991                                | 747         | 56.5       | 6.45           | 312.4        |
| **CIFER**            | 6720            | 432         | 1541                                | 300         | 12.9       | 1.92           | 148.1        |
| **CIFER / Baseline** | **76.71%**      | -           | **77.40%**                          | **40.16%**  | **22.83%** | **29.77%**     | **47.41%**   |

\* Baseline:  commercial eFPGA in TSMC **16nm** [1]

† Fmax benchmark: [*baseline*] INT16 FFT-32; [*CIFER*] 64-bit LFSR

‡ Performance/Efficiency benchmark: [*baseline*] GEMM; [*CIFER*] INT8-complex FFT-64
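
The ratio row and the MOPS/LUT column follow directly from the raw figures, as the short check below shows. All numbers are copied from the table; the dictionary keys and helper names are just labels chosen for this sketch.

```python
BASELINE = {"LUT6": 8760, "LUT6/mm^2": 1991, "Fmax (MHz)": 747,
            "GOPS": 56.5, "MOPS/LUT": 6.45, "GOPS/W": 312.4}
CIFER = {"LUT6": 6720, "LUT6/mm^2": 1541, "Fmax (MHz)": 300,
         "GOPS": 12.9, "MOPS/LUT": 1.92, "GOPS/W": 148.1}


def percent_of_baseline(design, baseline):
    """Each metric of `design` as a percentage of `baseline`, rounded to 2 places."""
    return {k: round(100.0 * design[k] / baseline[k], 2) for k in baseline}


def mops_per_lut(gops, luts):
    """Throughput per LUT in MOPS, rounded to 2 places."""
    return round(gops * 1000.0 / luts, 2)


if __name__ == "__main__":
    for metric, pct in percent_of_baseline(CIFER, BASELINE).items():
        print(f"{metric}: {pct:.2f}%")
    print("CIFER MOPS/LUT:", mops_per_lut(CIFER["GOPS"], CIFER["LUT6"]))
    print("Baseline MOPS/LUT:", mops_per_lut(BASELINE["GOPS"], BASELINE["LUT6"]))
```

Note that the two designs were measured on different benchmarks and process nodes (see the footnotes), so the percentages are indicative rather than a like-for-like comparison.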




# CIFER eFPGA $F_{\max}$






# CIFER eFPGA Area Breakdown






# DECADES eFPGA vs. CIFER eFPGA

|         | LUT Used | LUT Avail. | LUT Util. | Multiplier Used | Multiplier Avail. | Multiplier Util. | Efficiency (GOPS/W) |
|---------|----------|------------|-----------|-----------------|-------------------|------------------|---------------------|
| CIFER   | 6041     | 6720       | 89.9%     | 0               | 0                 | 0.0%             | 148.1               |
| DECADES | 2276     | 7040       | 32.3%     | 24              | 32                | 75.0%            | 170.6 (+15.2%)      |
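
The utilization and efficiency-gain figures can be reproduced from the used/available counts. In the sketch below, reporting 0.0% when no resources of a kind exist (CIFER has no hard multipliers) is a convention chosen to match the table, and the function names are hypothetical.

```python
def utilization_percent(used, available):
    """Resource utilization in percent; 0.0 when the resource does not exist."""
    if available == 0:
        return 0.0
    return round(100.0 * used / available, 1)


def relative_gain_percent(new, old):
    """Relative improvement of `new` over `old` in percent."""
    return round(100.0 * (new / old - 1.0), 1)


if __name__ == "__main__":
    print("CIFER LUT util.:", utilization_percent(6041, 6720))
    print("DECADES LUT util.:", utilization_percent(2276, 7040))
    print("DECADES multiplier util.:", utilization_percent(24, 32))
    print("Efficiency gain:", relative_gain_percent(170.6, 148.1))
```

The table suggests where the gain comes from: DECADES maps the same kind of work onto 24 hard multipliers and roughly a third of its LUTs, where CIFER needs about 90% of its LUTs.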





# Conclusion

- PRGA: silicon-proven, open-source FPGA IP
- CIFER eFPGA & DECADES eFPGA
- :( Notable gaps to commercial standards
- :) Lots of low-hanging optimizations available




# Thank You!