

# Ve370 Big RC

## Chapter 1

### Concepts:

1. Moore's Law
2. Hardware : physical structure
3. Software : set of instructions
  - a. Application software
  - b. System software



4. Levels of Program Code:
  - a. High-level (C++, Python)
  - b. Assembly language (mips, risc-v)
  - c. hardware representation (machine code, single stage, pipeline)

# Translating and Starting a Program



## Abstraction in modern computer systems



5. Instruction set architecture (ISA)
6. Integrated Circuit Cost

$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

Dies per wafer  $\approx$  Wafer area/Die area

$$\text{Yield} = \frac{1}{(1 + (\text{Defects per area} \times \text{Die area}/2))^2}$$

7. Performance

$$\begin{aligned} &\text{Performance}_X / \text{Performance}_Y \\ &= \text{Execution time}_Y / \text{Execution time}_X = n \end{aligned}$$

$$\begin{aligned} \text{CPU Time} &= \text{CPU Clock Cycles} \times \text{Clock Cycle Time} \\ &= \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}} \end{aligned}$$

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6s}$$

$$\begin{aligned}\text{Clock Cycles}_A &= \text{CPU Time}_A \times \text{Clock Rate}_A \\ &= 10s \times 2\text{GHz} = 20 \times 10^9\end{aligned}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6s} = \frac{24 \times 10^9}{6s} = 4\text{GHz}$$

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\begin{aligned}\text{CPU Time} &= \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} \\ &= \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}\end{aligned}$$

$$\begin{aligned}\text{CPU Time}_A &= \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A \\ &= I \times 2.0 \times 250\text{ps} = I \times 500\text{ps} \quad \boxed{\text{A is faster...}}$$

$$\begin{aligned}\text{CPU Time}_B &= \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B \\ &= I \times 1.2 \times 500\text{ps} = I \times 600\text{ps}\end{aligned}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\text{ps}}{I \times 500\text{ps}} = 1.2 \quad \boxed{\text{...by this much}}$$

$$\text{Clock Cycles} = \sum_{i=1}^n (\text{CPI}_i \times \text{Instruction Count}_i)$$

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^n \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

Instruction Count (IC)      CPI      Clock cycle time (Tc)

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

- Performance depends on
  - Algorithm: affects IC, possibly CPI
  - Programming language: affects IC, CPI
  - Compiler: affects IC, CPI
  - Instruction set architecture: affects IC, CPI, T<sub>c</sub>

## 7. Power

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

**Suggestions:** Have a quick look over the slides and the textbook.

## Chapter 2

Principles:

**Design Principle 1: Simplicity favours regularity**

**Design Principle 2: Smaller is faster**

**Design Principle 3: Make the common case fast**

RF and Memory

RISC-V has a  $32 \times 64$ -bit register file

- Use for frequently accessed data
  - 64-bit data is called a “doubleword”
    - $32 \times 64$ -bit general purpose registers x0 to x31
  - **32-bit data is called a “word”**
- A register is a group of flip-flops used to store a binary word



A 4 bit register file

- x0: the constant value 0
- x1: return address
- x2: stack pointer
- x3: global pointer
- x4: thread pointer
- x5 – x7, x28 – x31: temporaries
- x8: frame pointer
- x9, x18 – x27: saved registers
- x10 – x11: function arguments/results
- x12 – x17: function arguments

- Memory is byte addressed
  - Each address identifies an **8-bit** (byte)
- RISC-V is LittleEndian
  - Least-significant byte at least address of a word



Instructions:





Function calling:

- Six steps required
  1. Place parameters in registers x10 to x17
  2. Transfer control to procedure
  3. Acquire storage for procedure
  4. Perform procedure's operations
  5. Place result in register for caller
  6. Return to place of call (address in x1)

**Note:** Leaf and non-Leaf functions have different procedures

**Others:**

- `jalr x0, 0(x1) <-> jr x1`
- Like `jal`, but jumps to  $0 + \text{address in } x1$



**Suggestions:** Review Project 1. Beginning of RC in week 4 gives an example of calculateAverage. Function calling is strictly formative, memorizing the procedure is of critical importance.

Also note that in P1, you are using RV32, but for P2, P3 and exams, you are using RV64, beware the difference



## Chapter 5

1. 2's complement arithmetic (Ve270)
2. Multiplication and Division (Project 1), review the process chart on the slides

# Multiplication Hardware



~ 200 cycles/multiplication!



# Division Hardware



$$\text{Dividend} = \text{Quotient} * \text{Divisor} + \text{Remainder}$$

3. Corresponding RISC-V Extension set
4. Sign dealing in mul and div

## Chapter 6

# IEEE Floating-Point Format

single: 8 bits  
double: 11 bits

single: 23 bits  
double: 52 bits

| S | Exponent | Fraction |
|---|----------|----------|
|---|----------|----------|

$$x = (-1)^S \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$$

- S: sign bit (0  $\Rightarrow$  non-negative, 1  $\Rightarrow$  negative)
- Normalize significand:  $1.0 \leq |\text{significand}| < 2.0$ 
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is Fraction with the “1.” restored
- Exponent: excess representation: actual exponent + Bias
  - Ensures exponent is **unsigned**
  - Single: **Bias = 127**; Double: **Bias = 1203**

$$-235.125 = -100100101001.001_2 = -1.00100101001001_2 \times 2^{11}$$

Other aspects:

1. Range
2. Precision
3. Overflow
4. Special Representations
5. Arithmetic for float numbers
6. FP Extensions in RISC-V

**Suggestions:** Along with Chapter 5, check the textbook & slides and do a few exercises

## Chapter 7

See Ve270 or Slides

## Chapter 8



Performance issues:

Critical path: load instruction

All instructions have same execution time

Clock frequency determined by critical path

Violates "making common case fast"

**Suggestions:** Critically important, over 50% chance of having a question on this topic. Exercise on this helps a lot. Especially focus on the control signals.

## Chapter 9-10

Reason for having pipeline structure: **Performance**

Core for pipeline: Separate single-stage into multiple segments (so the whole processor will not be occupied by one instruction), like assembly line



5 stages: IF, ID, EX, MEM, WB. Bars between each two stages, named IF/ID, ID/EX, used for storing signals from previous stage

## Five stages, one step per stage

1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register





- shading on right half means “READ”
- shading on left half means “WRITE”
- white background means “NOT USED”
- dotted line means always “NO WRITE” and “NO READ” on ID and WB

Others:

Conceptual terminology

Pipeline diagram

#### Time calculation

**Suggestions:** Normally you should expect at least one question on this. **DO HAVE SOME PRACTICE YOURSELF**

## Chapter 11

- Data hazards
  - Instruction output needed before it's available
  - Need to wait for previous instruction to complete its data read/write
- Structural hazards
  - Same resource needed for different purposes at the same time (Possible: ALU, Register File, Memory)
  - Why do we have IM and DM separately?
- Control hazards
  - Flow of execution depends on previous instruction
  - One instruction affects whether another executes at all



Two kinds of data hazard, classified according to usage: EX hazard and MEM hazard

### EX hazard

#### Method 1: Stalling

Just delay the subsequent commands (nop)

Cons: use time

#### Method 2: Forwarding

Core: Earliest time possible

The latest usage time is (pre)EX stage

The earliest time possible is (aft)EX stage

If earliest time possible < latest time needed, then forwarding can be implemented

**How to implement:** Add a path from result of ALU (or other possible stages) to input of ALU, and use MUX and control signals to determine whether forwarding or not

# Forwarding Paths with ALUSrc



How to determine forwarding:

## Revised Forwarding Conditions

- 1&2 hazard (Type 1)
  - if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)  
and (EX/MEM.RegisterRd == ID/EX.RegisterRs1))  
**MUX select signal ForwardA = 10**
  - if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)  
and (EX/MEM.RegisterRd == ID/EX.RegisterRs2))  
**MUX select signal ForwardB = 10**
- 1&3 hazard (Type 2)
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)  
and (MEM/WB.RegisterRd == ID/EX.RegisterRs1))  
**MUX select signal ForwardA = 01**
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)  
and (MEM/WB.RegisterRd == ID/EX.RegisterRs2))  
**MUX select signal ForwardB = 01**



# Revised Forwarding Condition

- MEM hazard
  - if (MEM/WB.RegWrite  
and (MEM/WB.RegisterRd ≠ 0)  
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)  
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))  
and (MEM/WB.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 01
  - if (MEM/WB.RegWrite  
and (MEM/WB.RegisterRd ≠ 0)  
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)  
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))  
and (MEM/WB.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 01

Type 1

## Datapath with Hazard Detection



## MEM hazard (Load Use Hazard)

The ultimate choice is to **stall**

Another method to save time is code scheduling

# Question: how to implement



JOINT INSTITUTE  
交大密西根学院

## Chapter 12

Method 1: Stall

Method 2: Branch Prediction (Static and Dynamic)

Penalty for wrong prediction

Also move branch into earlier stages will help. But beware not to influence critical path (or critical stage), you might also need forwarding for this, too



| IF.Flush | ID.Flush | EX.Flush |
|----------|----------|----------|
| 0        | 0        | 0        |



## Putting it all together



**Good Luck to your Mid-Term Exam!**