

Copyrighted material for distribution

# ECE 485/585

## Computer Organization and Design

### Lecture 11: Branch Hazards and Exceptions

Fall 2022

Won-Jae Yi, Ph.D.

Department of Electrical and Computer Engineering  
Illinois Institute of Technology

#### Reducing Branch Delay $k_{\text{jet}} = PC + 4 + (\text{offset} \times 4)$

- Move hardware to determine outcome to ID stage
  - Target address adder
  - Register comparator  $R_S, R_T$
- Example: branch taken
  - $\rightarrow 36: \text{sub } \$10, \$4, \$8$
  - $\rightarrow 40: \text{beq } \$1, \$3, 7$
  - $\rightarrow 44: \text{and } \$12, \$2, \$5$
  - $\rightarrow 48: \text{or } \$13, \$2, \$6$
  - $\rightarrow 52: \text{add } \$14, \$4, \$2$
  - $\rightarrow 56: \text{sll } \$15, \$6, \$7$
  - $\rightarrow \dots$
  - $\rightarrow 72: \text{lw } \$4, 50(\$7)$



4

#### Datapath with Hazard Detection



2

#### Data Hazards for Branches

- If a comparison register is a destination of 2<sup>nd</sup> or 3<sup>rd</sup> preceding ALU instruction



7

#### Data Hazards for Branches

- If a comparison register is a destination of preceding ALU instruction or 2<sup>nd</sup> preceding load instruction
  - Need 1 stall cycle



8

#### Branch Hazards

- If branch outcome determined in MEM



3

#### Example: Branch Taken



4

#### Data Hazards for Branches

- If a comparison register is a destination of immediately preceding load instruction
  - Need 2 stall cycles



9

## Delayed Decision

- If the branch hardware has been moved to the ID stage, then we can eliminate all branch stalls with **delayed branches** which are defined as always executing the next sequential instruction after the branch instruction – the branch takes effect *after* that next instruction
  - MIPS compiler moves an instruction to immediately after the branch that is not affected by the branch (a **safe** instruction) thereby **hiding** the branch delay
- With deeper pipelines, the branch delay grows requiring more than one delay slot
  - Delayed branches have lost popularity compared to more expensive but more flexible (dynamic) hardware branch prediction
  - Growth in available transistors has made hardware branch prediction relatively cheaper

*beq ~~, ~*



10

11

## 1-Bit Predictor: Shortcoming

- Inner loop branches mispredicted twice!



13

14

## 2-Bit Predictor

- Only change prediction on two successive mispredictions



14

15

## Exceptions and Interrupts

- “Unexpected” events requiring change in flow of control
  - Different ISAs use the terms differently
- Exception
  - Arises within the CPU
    - e.g., undefined opcode, overflow, syscall, ...
- Interrupt
  - From an external I/O controller
- Dealing with them without sacrificing performance is hard



16

17

## Handling Exceptions

- In MIPS, exceptions managed by a System Control Coprocessor (CPO)
  - Save PC of offending (or interrupted) instruction
    - In MIPS: Exception Program Counter (EPC)
  - Save indication of the problem
    - In MIPS: Cause register
  - Jump to exception handler at 0x80000180
    - change PC
    - 0x8000 0000, 10

17

## Dynamic Branch Prediction

- In deeper and superscalar pipelines, branch penalty is more significant
- Use dynamic prediction
  - Branch prediction buffer (a.k.a. branch history table)
- Indexed by recent branch instruction addresses
- Stores outcome (taken/not taken)
- To execute a branch
  - Check table, expect the same outcome
  - Start fetching from fall-through or target
  - If wrong, flush pipeline and flip prediction

*Remember = branch history*

18

## Calculating the Branch Target

- Even with predictor, still need to calculate the target address
  - 1-cycle penalty for a taken branch
- Branch target buffer
  - Cache of target addresses
  - Indexed by PC when instruction fetched
    - If hit and instruction is branch predicted taken, can fetch target immediately



## Handler Actions

- Read cause and transfer to relevant handler
- Determine action required
- If restartable
  - Take corrective action
  - use EPC to return to program
- Otherwise
  - Terminate program
  - Report error using EPC, cause, ...

↙ PC

19

## Exceptions in a Pipeline

- Another form of control hazard
- Consider overflow on add in EX stage
  - add \$1, \$2, \$1
  - Prevent \$1 from being clobbered
  - Complete previous instructions
  - Flush add and subsequent instructions
  - Set Cause and EPC register values
  - Transfer control to handler
- Similar to mispredicted branch
  - Use much of the same hardware

## Pipeline with Exceptions



20

## Exception Properties

- Restartable exceptions
  - Pipeline can flush the instruction
  - Handler executes, then returns to the instruction
    - Refetched and executed from scratch
- PC saved in EPC register
  - Identifies causing instruction
  - Actually PC + 4 is saved
  - Handler must adjust

22

## Exception Example

- Exception on add in
    - 40 sub \$11, \$2, \$4
    - 44 and \$12, \$2, \$5
    - 48 or \$13, \$2, \$6
    - 50 add \$1, \$2, \$1
    - 54 sllt \$15, \$6, \$7
    - lw \$16, 50(\$7)
    - ...
  - Handler
    - PC → 0x80000180
    - sw \$26, 1000(\$0)
    - sw \$27, 1004(\$0)
    - ...
- ENC = 4C + 4  
= 50*
- overflow*
- Reserved for kernel.*
- \$K1*
- \$K1*
- stack*
- lw \$26, 1000(\$0)*
- lw \$27, 1004(\$0)*

23

## Exception Example



24

## Exception Example



25

## Multiple Exceptions

- Pipelining overlaps multiple instructions
  - Could have multiple exceptions at once
- Simple approach: deal with exception from earliest instruction
  - Flush subsequent instructions
  - "Precise" exceptions
- In complex pipelines
  - Multiple instructions issued per cycle
  - Out-of-order completion
  - Maintaining precise exceptions is difficult!

26

## Imprecise Exceptions

- Just stop pipeline and save state
  - Including exception cause(s)
- Let the handler work out
  - Which instruction(s) had exceptions
  - Which to complete or flush
    - May require "manual" completion
- Simplifies hardware, but more complex handler software
- Not feasible for complex multiple-issue out-of-order pipelines

27