

# RISC-V SoC Microarchitecture Design & Optimization

## Design Review #3

**Group 23**

**Instructor & Sponsor:** Weikang Qian

**Group Members:** Li Shi, Jian Shi, Yichao Yuan, Yiqiu Sun, Zhiyuan Liu



UM-SJTU Joint Institute

# Team Members



Zhiyuan Liu

Li Shi

Yichao Yuan

Jian Shi

Yiqiu Sun

# Overview

- Introduction
- Detailed Design
  - Microarchitecture Design
  - Animation
- Discussion of Engineering Specifications
- Implementation Plan & Progress Review
- Conclusion

# 1. Introduction

# Introduction

## Design Problems

Domain-Specific Optimization: The Future of Computing



Fig 1. Tesla self-driving cars.

Source: [www.businessinsider.com/tesla-autopilot-full-self-driving-subscription-early-2021-elon-musk-2020-12](http://www.businessinsider.com/tesla-autopilot-full-self-driving-subscription-early-2021-elon-musk-2020-12)



# 2. Detailed Design

# Detailed Microarchitecture Design

## Simplified Pipeline Diagram (4-way Superscalar, 10 Stages)



Fig 2. Simplified pipeline diagram.

# Detailed Microarchitecture Design

## Superscalar Execution (Frontend)



**Instruction Fetch**  
4-way with branch prediction

**Instruction Decode**  
4-way

**Register Rename**  
4-way (12 ports)
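The 4-way rename step above can be illustrated with a small behavioral sketch. It is not the group's RTL: the names `Instr`, `Renamed`, and `rename_group`, the 64-entry physical register file, and the reading of "12 ports" as 4 instructions × (2 source reads + 1 destination write) are all assumptions made for illustration. The point it shows is that a younger instruction in the same group must see the mapping created by an older one.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_ARCH_REGS = 32   # RV32 integer registers x0..x31
NUM_PHYS_REGS = 64   # assumed size, not taken from the slides
RENAME_WIDTH = 4     # 4-way rename, as stated on the slide


@dataclass
class Instr:
    rd: Optional[int]    # destination architectural register, None if no write
    rs1: Optional[int]   # first source register, None if unused
    rs2: Optional[int]   # second source register, None if unused


@dataclass
class Renamed:
    pd: Optional[int]    # destination physical register
    ps1: Optional[int]   # first source physical register
    ps2: Optional[int]   # second source physical register


def rename_group(group: List[Instr], map_table: List[int],
                 free_list: List[int]) -> List[Renamed]:
    """Rename up to RENAME_WIDTH instructions in program order.

    Each instruction uses at most 2 map-table reads and 1 write, so a full
    group needs 4 * 3 = 12 accesses (one assumed reading of "12 ports").
    """
    assert len(group) <= RENAME_WIDTH
    renamed = []
    for ins in group:
        # Read sources first: they must see mappings written by older
        # instructions in the same group, but not this instruction's own rd.
        ps1 = map_table[ins.rs1] if ins.rs1 is not None else None
        ps2 = map_table[ins.rs2] if ins.rs2 is not None else None
        pd = None
        if ins.rd is not None and ins.rd != 0:   # x0 is hardwired to zero
            pd = free_list.pop(0)                # allocate a free physical register
            map_table[ins.rd] = pd
        renamed.append(Renamed(pd, ps1, ps2))
    return renamed


# Usage: four dependent instructions renamed as one group.
map_table = list(range(NUM_ARCH_REGS))                # identity mapping at reset
free_list = list(range(NUM_ARCH_REGS, NUM_PHYS_REGS))
group = [
    Instr(rd=5, rs1=1, rs2=2),      # x5 = f(x1, x2)
    Instr(rd=6, rs1=5, rs2=3),      # x6 = f(x5, x3), depends on the first
    Instr(rd=5, rs1=6, rs2=5),      # x5 = f(x6, x5), overwrites x5 again
    Instr(rd=None, rs1=5, rs2=0),   # store-like: reads the newest x5
]
result = rename_group(group, map_table, free_list)
```

In hardware the loop would be flattened into parallel comparators that bypass the map table within the group; the sequential loop here only models the resulting values.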

# Detailed Microarchitecture Design

## Superscalar Execution (Backend)



# Detailed Microarchitecture Design

## Instruction Dynamic Scheduling



Design-flow figure (recovered labels): Simulation & Verification, Run Synthesis, Circuit Layout Design.

# 3. Discussion of Engineering Specifications

# Discussion of Engineering Specifications

## Overview

(↑ Higher is better / ↓ Lower is better)

| Specification                                    | Unit  | Target Value | Measured Value | Comment           |
|--------------------------------------------------|-------|--------------|----------------|-------------------|
| Support RV32G instruction set architecture (ISA) | -     | Yes          | Partially      | RV32IM            |
| Core frequency on FPGA test platform             | MHz   | 100 ↑        | 74.88          | Synthesis Result  |
| Number of pipeline stages                        | -     | 9            | 10             |                   |
| Instructions executed per clock cycle (IPC)      | -     | 0.5 ↑        | -              | TODO              |
| Support instruction dynamic scheduling           | -     | Yes          | Yes            |                   |
| Typical total cache size                         | KB    | 32 ↑         | 0              | Under Development |
| Number of functional units                       | -     | 6 ↑          | 6              |                   |
| Usage of look-up tables (LUT) on FPGA            | k     | 120 ↓        | 74.72          | Synthesis Result  |
| Usage of block RAM (BRAM) on FPGA                | -     | 50 ↓         | 0              | Synthesis Result  |
| Usage of digital signal processing (DSP) slices on FPGA | - | 30 ↓      | 8              | Synthesis Result  |
| Power consumption on target FPGA test platform   | W     | 5 ↓          | -              | TODO              |
| Operations processed per unit energy             | MOp/J | 25 ↑         | -              | TODO              |
| Number of flexibly configured modules            | -     | 10 ↑         | 13             |                   |
| User guide and programmer's manual               | -     | Yes          | Partially      | Under Development |

Table 1. Engineering specifications.
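The frequency, IPC, power, and energy-efficiency rows of Table 1 are related by simple arithmetic, sketched below. The helper names are hypothetical, and the sketch assumes that one "operation" equals one executed instruction, which the slides do not state.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / freq_mhz


def mops_per_joule(ipc: float, freq_mhz: float, power_w: float) -> float:
    """Energy efficiency in MOp/J.

    ipc * freq_mhz is millions of instructions per second; dividing by watts
    (joules per second) gives millions of instructions per joule.
    Assumption: one operation == one executed instruction.
    """
    return ipc * freq_mhz / power_w


# Values taken from Table 1.
target_period = clock_period_ns(100.0)     # target core frequency
measured_period = clock_period_ns(74.88)   # synthesis result
boundary_efficiency = mops_per_joule(ipc=0.5, freq_mhz=100.0, power_w=5.0)
```

Under this assumed definition, the boundary targets (IPC 0.5, 100 MHz, 5 W) give 10 MOp/J, so reaching the 25 MOp/J target would need a higher IPC or frequency, or lower power, than those individual limits.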

# Discussion of Engineering Specifications

## Design Oversights - Energy Measurement

### Failure to measure the energy consumption of a single chip

Board energy consumption $\neq$ chip energy consumption

**Incorrect data  
from Xilinx Vivado EDA tool**



Fig 3. Vivado synthesis summary.

# Discussion of Engineering Specifications

## Design Oversights - Circuit Size

### Failure to run on-board tests

Verification on a large FPGA board is expensive

| Name        | CLB LUTs<br>(522720) | CLB Registers<br>(1045440) | CARRY8<br>(65340) | F7 Muxes<br>(261360) | F8 Muxes<br>(130680) | CLB<br>(65340) | LUT as Logic<br>(522720) | LUT as Memory<br>(161280) | DSPs<br>(1968) | Bonded IOB<br>(668) | HPIOB_M<br>(264) | HPIOB_S<br>(264) | HPIOB_SNGL<br>(44) | GLOBAL CLOCK<br>BUFFERS (940) |
|-------------|----------------------|----------------------------|-------------------|----------------------|----------------------|----------------|--------------------------|---------------------------|----------------|---------------------|------------------|------------------|--------------------|-------------------------------|
| top         | 74717                | 16226                      | 60                | 6241                 | 2061                 | 13669          | 74055                    | 662                       | 8              | 199                 | 98               | 97               | 4                  | 1                             |
| core (core) | 74717                | 16226                      | 60                | 6241                 | 2061                 | 13669          | 74055                    | 662                       | 8              | 0                   | 0                | 0                | 0                  | 0                             |

Fig 4. Vivado synthesis result.
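The used and available counts in Fig 4 can be turned into utilization percentages, which makes it easier to see how much of the large device the core occupies. This is a minimal sketch; `utilization_pct` is a hypothetical helper, and only a few of the columns are included.

```python
from typing import Dict

# Counts for the "core" row and the device totals from the column headers of Fig 4.
used: Dict[str, int] = {
    "CLB LUTs": 74717,
    "CLB Registers": 16226,
    "CLB": 13669,
    "DSPs": 8,
}
available: Dict[str, int] = {
    "CLB LUTs": 522720,
    "CLB Registers": 1045440,
    "CLB": 65340,
    "DSPs": 1968,
}


def utilization_pct(used: Dict[str, int], available: Dict[str, int]) -> Dict[str, float]:
    """Return per-resource utilization as a percentage of the device total."""
    return {name: 100.0 * used[name] / available[name] for name in used}


util = utilization_pct(used, available)
for name, pct in util.items():
    print(f"{name}: {pct:.2f}%")
```

The LUT count of 74717 corresponds to the 74.72 k reported as the measured value in Table 1.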

### Verification through software

Xilinx Vivado (EDA), Verilator (Digital Circuit Simulator), GTKWave (Waveform Viewer) ...



# 4. Implementation Plan & Progress Review

# Implementation Plan & Progress Review



# Modifications & Oversights

|                     | **Modifications**                                                       | **Oversights**                                                                 |
|---------------------|-------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| **Synthesis**       | Application-Specific Integrated Circuit (ASIC) instead of FPGA          | Inadequate placing/routing space on our available FPGA                         |
| **Simulation**      | Verilator instead of Vivado                                             | Open-source and more convenient for testing<br>Based on C++ testbench          |
| **Execution Units** | Open-source Intellectual Property (IP) cores instead of Xilinx IP cores | Open-source and more flexible<br>May require more work to integrate and verify |

Table 2. Project modifications and oversights.

# 5. Conclusion

# Conclusion



# Thank you!
