

# Unit 10. Pipeline Multiple Instructions

$K \leftarrow$  number of pipes.  
 $n \leftarrow$  stage.

- Control Hazard
- Branch determines flow of control
  - Fetching next instruction depends on compare outcome
  - Pipeline can't always fetch correct instruction
- In MIPS pipeline
  - Need to compare registers and compute target early in the pipeline

Final : 08/20 Thur 6-9 PM  
40 Multiple Choices  
No cheat sheet  
After midterm: 65%



# Stall on Branch

- Wait until branch outcome determined before fetching next instruction



# Branch Prediction

- Longer pipelines can't readily determine branch outcome early
  - Stall penalty becomes unacceptable
- Predict outcome of branch
  - Only invalid if prediction is wrong
- In MIPS pipeline
  - Can predict branches not taken
  - Fetch instruction following branch, with no delay

has to  
predict



# MIPS with Predict Not Taken



# Pipeline Summary

## The BIG Picture

reason we use pipeline : take advantage of multi CPU

- Pipelining improves performance by increasing instruction throughput
  - Executes multiple instructions in parallel
  - Each instruction has the same latency
- Subject to hazards
  - Structure, data, control
- Instruction set design affects complexity of pipeline implementation

# MIPS Pipelined Datapath



# Pipeline registers

- Need registers between stages
  - To hold information produced in previous cycle  
*single cycle data path.*



# Pipeline Operation

- Cycle-by-cycle flow of instructions through the pipelined datapath
  - “Single-clock-cycle” pipeline diagram
    - Shows pipeline usage in a single cycle
    - Highlight resources used
  - c.f. “multi-clock-cycle” diagram
    - Graph of operation over time
- We'll look at “single-clock-cycle” diagrams for load & store

# IF for Load, Store, ...



# ID for Load, Store, ...



# EX for Load



# MEM for Load



# WB for Load

Iw  
Write back



Wrong  
register  
number

# Corrected Datapath for Load



# EX for Store



# MEM for Store



# WB for Store

SW  
Write-back



# Multi-Cycle Pipeline Diagram

- Form showing resource usage



# Multi-Cycle Pipeline Diagram

## Traditional form



# Single-Cycle Pipeline Diagram

## State of pipeline in a given cycle

add \$14, \$5, \$6      lw \$13, 24 (\$1)      add \$12, \$3, \$4      sub \$11, \$2, \$3      lw \$10, 20(\$1)

Instruction fetch      Instruction decode      Execution      Memory      Write-back



# Pipelined Control (Simplified)



# Pipelined Control

- Control signals derived from instruction
  - As in single-cycle implementation



# Pipelined Control



# Data Hazards in ALU Instructions

- Consider this sequence:

```
sub $2, $1, $3  
and $12, $2, $5  
or $13, $6, $2  
add $14, $2, $2  
sw $15, 100($2)
```



- We can resolve hazards with forwarding
  - How do we detect when to forward?

# Dependencies & Forwarding



# Detecting the Need to Forward

- Pass register numbers along pipeline
  - e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
- ALU operand register numbers in EX stage are given by
  - ID/EX.RegisterRs, ID/EX.RegisterRt
- Data hazards when
  - 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  - 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
  - 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  - 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

Fwd from  
EX/MEM  
pipeline reg

Fwd from  
MEM/WB  
pipeline reg

# Detecting the Need to Forward

- If the register name are the same on leading and following instructions
    - EX/MEM.RegWrite, MEM/WB.RegWrite
  - And if the control signal is not \$zero
    - EX/MEM.RegisterRd  $\neq 0$ ,  
MEM/WB.RegisterRd  $\neq 0$
- Output to registers  
(if not, why need to fwd?)*
- not output to \$zero (if output to \$zero, why need fwd?)*

# Forwarding Paths



b. With forwarding

# Forwarding Conditions

## EX hazard

- if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)  
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))  
 $\text{ForwardA} = 10$   
*# someone is waiting  
for Rd in their RS*
- if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)  
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))  
 $\text{ForwardB} = 10$   
*# some one is  
Waiting for Rd in  
their Rt*

## MEM hazard

- if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)  
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))  
 $\text{ForwardA} = 01$
- if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)  
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))  
 $\text{ForwardB} = 01$

only difference

Output to register

output is not \$zero

# Double Data Hazard

- Consider the sequence:

add \$1, \$1, \$2  
add \$1, \$1, \$3  
add \$1, \$1, \$4

*output affects both  
next instructions*

- Both hazards occur
  - Want to use the most recent

# Revised Forwarding Condition

## MEM hazard

- if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)  
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)  
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))  
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
- ForwardA = 01
- if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)  
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)  
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))  
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))

ForwardB = 01

+ this means if the value Rd can be provided in EX/IMEM then MiPs should use that value instead of getting from MEM/WB stage, which data might be older than EX/IMEM. (principle: take latest data EX/IMEM newer than MEM/WB)

# Datapath with Forwarding



# Load-Use Data Hazard



# Load-Use Hazard Detection

- Check when using instruction is decoded in ID stage
- ALU operand register numbers in ID stage are given by
  - IF/ID.RegisterRs, IF/ID.RegisterRt
- Load-use hazard when
  - ID/EX.MemRead and
    - ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))
- If detected, stall and insert bubble

# How to Stall the Pipeline

- Force control values in ID/EX register to 0
  - EX, MEM and WB do **nop** (no-operation)

In the one that needs to stall, all EX, MEM, WB bubble
- Prevent update of PC and IF/ID register
- Using instruction is decoded again
  - Following instruction is fetched again
  - 1-cycle stall allows MEM to read data for 1w
    - Can subsequently forward to EX stage

# Stall/Bubble in the Pipeline



# Datapath with Hazard Detection



# Stalls and Performance

## The BIG Picture

3 methods. : - stall(stop)  
- forward  
- reorder

- Stalls reduce performance
  - But are required to get correct results
- Compiler can arrange code to avoid hazards and stalls
  - Requires knowledge of the pipeline structure

# Branch Hazards

- branch outcome determined in MEM



# Reducing Branch Delay

## Example: branch taken

36: sub \$10, \$4, \$8  
40: beq \$1, \$3, 7  
44: and \$12, \$2, \$5  
48: or \$13, \$2, \$6  
52: add \$14, \$4, \$2  
56: slt \$15, \$6, \$7

jump ↗

72: lw \$4, 50(\$7)

$44 + 28 = 72$

← send NOP to invalid all these commands

# Example: Branch Taken



# Data Hazards for Branches

- If a comparison in EXE register is a destination of an instruction at least two stages earlier



- Can resolve using forwarding

# Data Hazards for Branches

- If a comparison register is a destination of preceding ALU instruction or 2<sup>nd</sup> preceding load instruction
  - Need 1 stall cycle

lw \$1, addr  
add \$4, \$5, \$6  
beq stalled  
beq \$1, \$4, target



# Data Hazards for Branches

- If a comparison register is a destination of immediately preceding load instruction
  - Need 2 stall cycles



# Dynamic Branch Prediction

- In deeper and superscalar pipelines, branch penalty is more significant
- Use dynamic prediction
  - Branch prediction buffer (aka branch history table)
  - Indexed by recent branch instruction addresses
  - Stores outcome (taken/not taken) *using HW device to store prediction history.*
  - To execute a branch
    - Check table, expect the same outcome
    - Start fetching from fall-through or target
    - If wrong, flush pipeline and flip prediction

static is base on past history.

# 1-Bit Predictor: Shortcoming

g2k

- 1 bit predictor will waste two times if wrong

↑  
error, if wrong → change.

- If mispredict on last time
- Then will mispredict again on first time because of the changed predict bit

# 2-Bit Predictor

- Only change prediction on two successive mispredictions



g 2k  
wait for 2 errors then  
change.

# Calculating the Branch Target

- Even with predictor, still need to calculate the target address
- The branch to target address

# Exceptions and Interrupts

- “Unexpected” events requiring change in flow of control
  - Different ISAs use the terms differently
- Exception
  - Arises within the CPU
    - e.g., undefined opcode, overflow, syscall, ...
- Interrupt
  - From an external I/O controller
- Dealing with them without sacrificing performance is hard

2nd type of error

3rd.

1st is hazard.

5 Exception.

- I/O

- ?

- overflow

- undefined

- HW not avail.

# Handling Exceptions

g2k

- In MIPS, exceptions managed by a System Control Coprocessor (CP0)
- Use “EPC” (“Exception Program Counter”) to save Exception instruction address
- Use “Cause Register” to save Exception reason
- Use “Rescue” register to Jump to rescue handler at 8000 00180

use HW called EPC to retrieve addr of the exception.

3 ways to handle exception

- 1) EPC
- 2) Cause register
- 3) Rescue

# An Alternate Mechanism

- Vectored Interrupts
  - Handler address determined by the cause
- Example:
  - Undefined opcode: C000 0000
  - Overflow: C000 0020
  - .... C000 0040
- Instructions either
  - Deal with the interrupt, or
  - Jump to rescue handler

# Handler Actions

- Read cause, and transfer to relevant handler
- Determine action required
- If restartable
  - Take corrective action
  - use EPC to return to program
- Otherwise
  - Terminate program
  - Report error using EPC, cause, ...

# Exceptions in a Pipeline

- Another form of control hazard
- Consider overflow on add in EX stage
  - add \$1, \$2, \$1
    - Prevent \$1 from being clobbered
    - Complete previous instructions
    - Flush add and subsequent instructions
    - Set Cause and EPC register values
    - Transfer control to handler
- Similar to mispredicted branch
  - Use much of the same hardware

# Pipeline with Exceptions



# Exception Properties

- Restartable exceptions
  - Pipeline can flush the instruction
  - Handler executes, then returns to the instruction
    - Refetched and executed from scratch
- PC saved in EPC register
  - Identifies causing instruction
  - Actually  $PC + 4$  is saved
    - Handler must adjust

# Exception Example



# Multiple Exceptions

\* hit more than 1 exception.

- Pipelining overlaps multiple instructions
  - Could have multiple exceptions at once
- Simple approach: deal with exception from earliest instruction
  - Flush subsequent instructions
  - “Precise” exceptions
- In complex pipelines
  - Multiple instructions issued per cycle
  - Out-of-order completion
  - Maintaining precise exceptions is difficult!

# Unit 10 Homework

1. Explain what is “pipelined instructions” and how does it work ? Use an example to compare the non-pipeline and pipeline difference ?
2. What are the potential hazard problems in the pipeline instructions and how to solve each problem ?
3. What are “forwarding” and “stall” ? When and how do we use them ?
4. What is the branch hazard ? How to fix the branch hazard ?
5. What is exception ? How does MIPS handles the exception problem ?