

# ECE 485/585

## Computer Organization and Design

### Lecture 10: Pipelining and Hazards

Fall 2022

Won-Jae Yi, Ph.D.

Department of Electrical and Computer Engineering  
Illinois Institute of Technology

### Summary So Far...

- First analyze the instructions... then build the processor
- If instructions take different amounts of time, multi-cycle is better
- Data path implemented using:
  - Combinational logic for arithmetic
  - State holding elements to remember bits
- Control implemented using:
  - Combinational logic for single-cycle implementation
  - More complicated logic for multi-cycle implementation
- We will Improve Performance by Pipelining

### Pipelining Analogy



### MIPS Pipeline

- Five stages, one step per stage
  - IF: Instruction fetch from memory
  - ID: Instruction decode & register read
  - EX: Execute operation or calculate address
  - MEM: Access memory operand
  - WB: Write result back to register



### Pipeline Performance

- Assume time for stages is
  - 100ps for register read or write
  - 200ps for other stages
- Compare pipelined datapath with single-cycle datapath

| Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time |
|----------|-------------|---------------|--------|---------------|----------------|------------|
| Iw       | 200ps       | 100 ps        | 200ps  | 200ps         | 100 ps         | 600ps      |
| sw       | 200ps       | 100 ps        | 200ps  | 200ps         |                | 700ps      |
| R-format | 200ps       | 100 ps        | 200ps  |               | 100 ps         | 600ps      |
| beq      | 200ps       | 100 ps        | 200ps  |               |                | 500ps      |

Registers

Jcc

### Pipeline Performance



### Pipeline Speedup

- If all stages are balanced
  - e.g., all take the same time
- Time between instructions<sub>pipelined</sub> = Time between instructions<sub>nonpipelined</sub> / Number of stages
- If not balanced, speedup is less
- Speedup due to increased throughput
  - Latency (time for each instruction) does not decrease

### Pipelining and ISA Design

- MIPS ISA designed for pipelining
  - All instructions are 32-bits
    - Easier to fetch and decode in one cycle
    - c.f. x86: 1- to 17-byte instructions
  - Few and regular instruction formats
    - Can decode and read registers in one step
  - Load/store addressing
    - Can calculate address in 3<sup>rd</sup> stage
    - Access memory in 4<sup>th</sup> stage
  - Alignment of memory operands
    - Memory access takes only one cycle

### Pipeline Hazards

- Situations that prevent starting the next instruction in the next cycle
  - Cannot execute in the following clock cycle due to dependencies
- Structure hazards
  - A required resource is busy
- Data hazards
  - Need to wait for previous instruction to complete its data read/write
- Control hazards
  - Deciding on control action depends on previous instruction





## Branch Prediction

- Longer pipelines can't readily determine branch outcome early
  - Stall penalty becomes unacceptable
- Predict outcome of branch
  - Only stall if prediction is wrong
- In MIPS pipeline
  - Can predict branches not taken ✓
  - Fetch instruction after branch, with no delay
  - Only stall if branches are taken



19

## MIPS with Prediction Not Taken



20

## More-Realistic Branch Prediction

### Static branch prediction

- Based on typical branch behavior
  - Example: loop and if-statement branches
  - Predict backward branches taken
  - Predict forward branches not taken

$\rightarrow \text{for } i=0; i<10; i++ \text{;}$

$\rightarrow \text{if } i>5 \text{ then } \dots \text{ end if;}$

### Dynamic branch prediction

- Hardware measures actual branch behavior
  - e.g., record recent history of each branch
- Assume future behavior will continue the trend
- When wrong, stall while re-fetching, and update history

$\rightarrow \text{for } i=0; i<10; i++ \text{;}$

$\rightarrow \text{if } i>5 \text{ then } \dots \text{ end if;}$

## MIPS Pipelined Datapath



22

## Pipeline Registers

- Need registers between stages
  - To hold information produced in previous cycle



23

## IF for Load, Store, ...



25

## ID for Load, Store, ...



26

## EX for Load, Store, ...



27

## MEM for Load



28

## WB for Load



29

## Corrected Datapath for Load



30

## EX for Store



31

## MEM for Store



32

## WB for Store



33

## Multicycle Pipeline Diagram

- Form showing resource usage



34

## Multicycle Pipeline Diagram

- Traditional Form



35

## Single-Cycle Pipeline Diagram

- State of pipeline in a given cycle



36

## Pipelined Control (Simplified)

*→ determined in IP stage*

*A must be carried to the next future stages.*



## Control Lines grouped into 3 Stages



## Pipelined Control



## Pipeline Summary

### The BIG Picture

- Pipelining improves performance by increasing instruction throughput
  - Executes multiple instructions in parallel
  - Each instruction has the same latency
- Subject to hazards
  - Structure, data, control
- Instruction set design affects complexity of pipeline implementation

## Dependencies & Forwarding



## Detecting the Need to Forward

- Pass register numbers along pipeline
  - e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
- ALU operand register numbers in EX stage are given by
  - ID/EX.RegisterRs, ID/EX.RegisterRt

(Condition 1)

Condition 1  
Data hazards when  
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs  
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt  
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs  
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt



43

## Pipelined Control

- Control signals derived from instruction
  - As in single-cycle implementation



38

39

## Data Hazards in ALU Instructions

- Consider this sequence:
  - ✓ Sub \$2, \$1, \$3 → cc1 cc2 cc3 cc4 cc5 cc6
  - ✓ and \$12, \$2, \$5 → cc1 cc2 cc3 cc4 cc5 cc6
  - ✗ or \$13, \$6, \$2 → cc1 cc2 cc3 cc4 cc5 cc6
  - ✗ add \$14, \$2, \$2 → cc1 cc2 cc3 cc4 cc5 cc6
  - ✓ sw \$15, 100(\$2) → cc1 cc2 cc3 cc4 cc5 cc6
- We can resolve hazards with forwarding
  - How do we detect when to forward?

forward w/  
2 csts

Page 45



42

## Detecting the Need to Forward

- But only if forwarding instruction will write to a register!
  - EX/MEM.RegWrite, MEM/WB.RegWrite
- And only if Rd for that instruction is not \$Zero
  - EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0



44

45

## Forwarding Paths



## Forwarding Conditions COND. 1 ~ 3

- EX hazard**
  - if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs))  
ForwardA = 10
  - if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt))  
ForwardB = 10
- MEM hazard**
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs))  
ForwardA = 01
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt))  
ForwardB = 01

46

47

48

## Revised Forwarding Condition

- MEM hazard**
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) and (MEM/WB.RegisterRd = ID/EX.RegisterRs))  
ForwardA = 01
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) and (MEM/WB.RegisterRd = ID/EX.RegisterRt))  
ForwardB = 01

49

50

51

## Load-Use Data Hazard Detection

- Check when using instruction is decoded in ID stage
- ALU operand register numbers in ID stage are given by
  - IF/ID.RegisterRs, IF/ID.RegisterRt
- Load-use hazard when
  - ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs or ID/EX.RegisterRt = IF/ID.RegisterRt))
- If detected, stall and insert bubble

*(MEM.RD=1)*  
*(lw)*

## How to Stall the Pipeline

- Force control values in ID/EX register to 0
  - EX, MEM and WB do nop (no-operation)
- Prevent update of PC and IF/ID register
  - Using instruction is decoded again
  - Following instruction is fetched again
  - 1-cycle stall allows MEM to read data for 1w
    - Can subsequently forward to EX stage

52

53

54

## Double Data Hazard

- Consider the sequence:
 

|                   |      |     |     |     |
|-------------------|------|-----|-----|-----|
| cc1               | cc2  | cc3 | cc4 | cc5 |
| add \$1, \$1, \$2 | → IF | ID  | EX  | MEM |
| add \$1, \$1, \$3 | → IF | ID  | EX  | MEM |
| add \$1, \$1, \$4 | → IF | ID  | EX  | MEM |
- Both hazards occur
  - Want to use the most recent
- Revise MEM hazard condition
  - Only forward if EX hazard condition isn't true

## Datapath with Forwarding



50

## Load-Use Data Hazard



## Stall/Bubble in the Pipeline





## Stall/Bubble in the Pipeline



## Stalls and Performance

### The BIG Picture

- Stalls reduce performance
  - But are required to get correct results
- Compiler can arrange code to avoid hazards and stalls
  - Requires knowledge of the pipeline structure

## Datapath with Hazard Detection

