

**Lecture 11:**

**Implementing a**

**Processor 3/5**

Assignment Project Exam Help

<https://tutorcs.com>

WeChat: cstutorcs

---

**Introduction to Computer Architecture**

**UC Davis EEC 170, Fall 2019**

# RISC-V Pipelined Datapath



# Pipeline registers

- Need registers between stages
  - To hold information produced in previous cycle
- If a signal needs to be valid across clock stages, you *must* store it in a register

Assignment Project Exam Help

<https://tutorcs.com>

WeChat: cstutorcs



# Pipeline Operation

- Cycle-by-cycle flow of instructions through the pipelined datapath
  - “Single-clock-cycle” pipeline diagram
    - Shows pipeline usage in a single cycle  
Assignment Project Exam Help
    - Highlight resources used  
<https://tutorcs.com>
    - c.f. “multi-clock-cycle” diagram  
WeChat: estutorcs
    - Graph of operation over time
- We'll look at “single-clock-cycle” diagrams for load & store

# IF for Load, Store, ...



# ID for Load, Store, ...



# EX for Load



# MEM for Load



# WB for Load

Id  
Write-back



# Corrected Datapath for Load



# EX for Store



# MEM for Store



# WB for Store

sd  
Write-back



# Multi-Cycle Pipeline Diagram

## ■ Form showing resource usage



# Multi-Cycle Pipeline Diagram

## ■ Traditional form



# Single-Cycle Pipeline Diagram

## ■ State of pipeline in a given cycle



# Pipelined Control (Simplified)



# Pipelined Control

- Control signals derived from instruction
  - As in single-cycle implementation



# Pipelined Control



# Data Hazards in ALU Instructions

- Consider this sequence:

```
sub  x2, x1, x3  
and  x12, x2, x5  
or   x13, x6, x2  
add  x14, x2, x2  
sd   x15, 100(x2)
```

WeChat: cstutorcs

- We can resolve hazards with forwarding

- How do we detect when to forward?

# Dependencies & Forwarding



# Detecting the Need to Forward



- Pass register numbers along pipeline
  - e.g., **ID/EX.RegisterRs1** = register number for Rs1 sitting in ID/EX pipeline register
- ALU operand register numbers in EX stage are given by
  - **ID/EX.RegisterRs1, ID/EX.RegisterRs2**
- Data hazards when
  - 1a. **EX/MEM.RegisterRd = ID/EX.RegisterRs1**
  - 1b. **EX/MEM.RegisterRd = ID/EX.RegisterRs2**
  - 2a. **MEM/WB.RegisterRd = ID/EX.RegisterRs1**
  - 2b. **MEM/WB.RegisterRd = ID/EX.RegisterRs2**
    - ^ I'm writing back this reg
    - ^ I read this reg

WeChat: cstutorcs



# Detecting the Need to Forward

- But only if forwarding instruction will write to a register!
  - EX/MEM.RegWrite, MEM/WB.RegWrite
- And only if Rd for that instruction is not x0
  - EX/MEM.RegisterRd  $\neq 0$  <https://tutorcs.com>
  - MEM/WB.RegisterRd  $\neq 0$  WeChat: cstutorcs

# Forwarding Paths



# Forwarding Conditions

| Mux control   | Source | Explanation                                                                                                                                                              |
|---------------|--------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ForwardA = 00 | ID/EX  | The first ALU operand comes from the register file.                                                                                                                      |
| ForwardA = 10 | EX/MEM | The first ALU operand is forwarded from the prior ALU result.<br><a href="#">Assignment Project Exam Help</a>                                                            |
| ForwardA = 01 | MEM/WB | <a href="https://tutorcs.com">https://tutorcs.com</a> The first ALU operand is forwarded from data memory or an earlier ALU result.<br><a href="#">WeChat: cstutorcs</a> |
| ForwardB = 00 | ID/EX  | The second ALU operand comes from the register file.                                                                                                                     |
| ForwardB = 10 | EX/MEM | The second ALU operand is forwarded from the prior ALU result.                                                                                                           |
| ForwardB = 01 | MEM/WB | The second ALU operand is forwarded from data memory or an earlier ALU result.                                                                                           |

# Double Data Hazard

- Consider the sequence:

add  $x_1, x_1, x_2$

add  $x_1, x_1, x_3$

add  $x_1, x_1, x_4$

- Both hazards occur

Assignment Project Exam Help

- Want to use the most recent

WeChat: cstutorcs

- Revise MEM (the earlier instruction)'s hazard condition

- Only fwd if EX (the later instruction)'s hazard condition isn't true

# Revised Forwarding Condition

## ■ MEM hazard

- if (MEM/WB.RegWrite  
and (MEM/WB.RegisterRd ≠ 0)

and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0))

Assignment Project Exam Help  
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))

and (MEM/WB.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 01

- if (MEM/WB.RegWrite

<https://tutorcs.com>

WeChat: cstutorcs

and (MEM/WB.RegisterRd ≠ 0)

and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0))

and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))

and (MEM/WB.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 01

writing to a reg

that reg is not 0

EX/MEM is not also

writing to that reg

we are reading that reg

# Datapath with Forwarding



# Load-Use Hazard Detection

- Check when using instruction is decoded in ID stage
- ALU operand register numbers in ID stage are given by
  - IF/ID.RegisterRs1, IF/ID.RegisterRs2
- Load-use hazard when
  - ID/EX.MemRead and <https://tutorcs.com>
  - $((ID/EX.RegisterRd = IF/ID.RegisterRs1) \text{ or } (ID/EX.RegisterRd = IF/ID.RegisterRs2))$
- If detected, stall and insert bubble
- This is the hardware solution. It is feasible (but not what RISC-V did) to just disallow a compiler from emitting this code.

# How to Stall the Pipeline



- Force control values in ID/EX register to 0
  - EX, MEM and WB do **nop (no-operation)**  
Assignment Project Exam Help
- Prevent update of PC and IF/ID register
  - Using instruction is decoded again  
WeChat: cstutoros
  - Following instruction is fetched again
  - 1-cycle stall allows MEM to read data for Id
    - Can subsequently forward to EX stage

# Load-Use Data Hazard



# Datapath with Hazard Detection



# Stalls and Performance

The BIG Picture

- **Stalls reduce performance**
  - But are required to get correct results
- **Compiler can arrange code to avoid hazards and stalls**
  - Requires knowledge of the pipeline structure  
Assignment Project Exam Help

<https://tutorcs.com>

WeChat: cstutorcs

# Branch Hazards

- If branch outcome determined in MEM



# Reducing Branch Delay

- Move hardware to determine outcome to ID stage
  - Target address adder
  - Register comparator

## ■ Example: branch taken

Assignment Project Exam Help

```
36: sub x10, x4, x8
40: beq x1, x3, 16 // PC-relative branch
        to 40+16*2=72
        WeChat://cstutors
44: and x12, x2, x5
48: or x13, x2, x6
52: add x14, x4, x2
56: sub x15, x6, x7
...
72: ld x4, 50(x7)
```

# Example: Branch Taken



# Example: Branch Taken



# Dynamic Branch Prediction

- In deeper and superscalar pipelines, branch penalty is more significant
- Use dynamic prediction. What's below is a 1-bit predictor:
  - Branch prediction buffer (aka branch history table)  
Assignment Project Exam Help
  - Indexed by recent branch instruction addresses  
<https://tutorcs.com>
  - Stores outcome (taken/not taken)  
WeChat: cstutorcs
  - To execute a branch
    - Check table, expect the same outcome
    - Start fetching from fall-through or target
    - If wrong, flush pipeline and flip prediction

# 1-Bit Predictor: Shortcoming

- # ■ Inner loop branches mispredicted twice!

outer: ...

...

inner: ...

... Assignment Project Exam Help

beq ..., ..., inner

...

beq ..., WeChat: cstutorcs

<https://tutorcs.com>

<https://tutorcs.com>

- Mispredict as taken on last iteration of inner loop
  - Then mispredict as not taken on first iteration of inner loop next time around

# 2-Bit Predictor

- Only change prediction on two successive mispredictions



# Calculating the Branch Target

- Even with predictor, still need to calculate the target address
  - 1-cycle penalty for a taken branch
- Branch target buffer (“BTB”)
  - Cache of target addresses
  - Indexed by PC when instruction fetched
    - If hit and instruction is predicted taken, can fetch target immediately

# Exceptions and Interrupts

- “Unexpected” events requiring change in flow of control
  - Different ISAs use the terms differently
- Exception
  - Arises within the CPU
    - e.g., undefined opcode, syscall, ...  
Assignment Project Exam Help  
<https://tutorcs.com>  
WeChat: cstutorcs
- Interrupt
  - From an external I/O controller
- Dealing with them without sacrificing performance is hard

# Handling Exceptions

- Save PC of offending (or interrupted) instruction
  - In RISC-V: Supervisor Exception Program Counter (SEPC)
- Save indication of the problem
  - In RISC-V: Supervisor Exception Cause Register (SCAUSE)  
Assignment Project Exam Help  
<https://tutorcs.com>
  - 64 bits, but most bits unused  
WeChat: cstutorcs
    - Exception code field: 2 for undefined opcode, 12 for hardware malfunction, ...
- Jump to handler
  - Assume at  $0000\ 0000\ 1C09\ 0000_{\text{hex}}$
  - Handler decides what to do based on SCAUSE

# An Alternate Mechanism

- Vectored Interrupts
  - Handler address determined by the cause
- Exception vector address to be added to a vector table base register:
  - Undefined opcode       $00\ 0100\ 0000_{\text{two}}$   
Assignment Project Exam Help  
<https://tutorcs.com>
  - Hardware malfunction:  $01\ 1000\ 0000_{\text{two}}$   
WeChat: cstutorcs
  - ....      ...
- Instructions either
  - Deal with the interrupt, or
  - Jump to real handler

# Handler Actions

- **Read cause, and transfer to relevant handler**
- **Determine action required**
- **If restartable**
  - **Take corrective action** Assignment Project Exam Help
  - **use SEPC to return to program** <https://tutotorcs.com>
- **Otherwise** WeChat: cstutorcs
  - **Terminate program**
  - **Report error using SEPC, SCAUSE, ...**

# Exceptions in a Pipeline

- Another form of control hazard
- Consider malfunction on add in EX stage

add x1, x2, x1

- Prevent x1 from being clobbered
  - Complete previous instructions
  - Flush add and subsequent instructions
  - Set SEPC and SCAUSE register values
  - Transfer control to handler
- Similar to mispredicted branch
  - Use much of the same hardware

Assignment Project Exam Help

<https://tutorcs.com>

WeChat: cstutorcs

# Pipeline with Exceptions



# Exception Properties

- Restartable exceptions
  - Pipeline can flush the instruction
  - Handler executes, then returns to the instruction
    - Refetched and executed from scratch  
Assignment Project Exam Help
    - Exceptions are not all bad! This is a useful mechanism  
<https://tutorcs.com>
- PC saved in SEPC register  
WeChat: cstutorcs
  - Identifies causing instruction

# Multiple Exceptions

- Pipelining overlaps multiple instructions
  - Could have multiple exceptions at once
- Simple approach: deal with exception from earliest instruction
  - Flush subsequent instructions  
Assignment Project Exam Help
  - “Precise” exceptions <https://tutorcs.com>
- In complex pipelines WeChat: cstutorcs
  - Multiple instructions issued per cycle
  - Out-of-order completion
  - Maintaining precise exceptions is difficult!

# Imprecise Exceptions

- Just stop pipeline and save state
  - Including exception cause(s)
- Let the handler work out
  - Which instruction(s) had exceptions  
Assignment Project Exam Help
  - Which to complete or flush  
<https://tutorcs.com>
    - May require “~~manual~~” completion
- Simplifies hardware, but more complex handler software
- Not feasible for complex multiple-issue out-of-order pipelines