

# Pipelining Hazards.

- Speedup from pipeline implementation:
  - Based on the assumption that new instructions can be "pumped" into pipeline every cycle.
- However, there are pipeline hazards:
  - Problems that prevent next instruction from immediately following previous instruction.
  - Structural Hazards:
    - Simultaneous use of a hardware resource (RegFile, Memory, ALU)
  - Data Hazards:
    - Data dependencies between instructions. (e.g. lw, sw)
  - Control Hazards:
    - Change in program flow. (e.g. beq)

Graphical Notation for Pipeline:



- Horizontal = the stages of an instruction
- Vertical = the instructions in different pipeline stages

# Structural Hazard

- If there is only a **single memory module**:



## Solution 1: Stall the Pipeline



## 2. Solution 2: Separate Memory

- Split memory into **Data** and **Instruction** memory



# Data Dependency

- When 2 instructions access (read/write) the same register (register contention)

## RAW

### "Read-After-Write"

- Occurs when a later instruction **reads** from the destination register **written** by an earlier instruction.

also "true data dependency"

- eg. i1: add \$1, \$2, \$3 # writes to \$1  
i2: sub \$4, \$1, \$5 # reads from \$1



- Effect of incorrect execution:

If i2 reads register \$1 before i1 can write back the result,  
i2 will get a **stale result (old result)**

- Also have:

- WAR: "Write-after-Read"
- WAW: "Write-after-Write"

} Do not cause any pipeline hazard;  
Affects processor only when instructions are executed out of program order.

MIPS is in sequential order

- Value from prior instruction is needed before write back



## Solution: C Branch Forwarding

- Forward the result to any trailing (later) instructions before it is altered in the register file.
- Bypass (replace) the data read from register file.



## 4.2 Data Hazards: LOAD Instruction



## 4.2 Data Hazards: LOAD Instruction Solution



# Control Dependency

An instruction  $j$  is control dependent on  $i$  if  $i$  controls whether or not  $j$  executes.

- Typically  $i$  would be a branch instruction.

- eg.  $i_1: \text{beq } \$3, \$5, \text{label} \# \text{branch}$   
 $i_2: \text{add } \$1, \$2, \$4. \quad \# \text{depends on } i_1.$

- Effect of incorrect execution:

- If  $i_2$  is allowed to execute before  $i_1$  is determined, register  $\$1$  may be incorrectly changed.



Solution:

- Add delays until decision is known  $\Rightarrow$  waiting 3 clock cycles.
- 3 clock cycle penalty (reduce):
  - Early Branch Resolution** - move branch decision calculation to earlier pipeline stage
  - Branch Prediction** - guess the outcome before it is produced. (Correct if wrong)
  - Delayed Branching** - do something useful while waiting for the outcome. (Calculate instr. not dependent on branch)

# Early Branch Resolution

- Make decision in **ID** stage instead of **MEM** — branch decision is known at the end of ID stage.
- Move branch target address calculation
  - Move register comparison → cannot use ALU for register comparison any more



- Wait until the branch decision is known:
  - Then fetch the correct instruction
- Reduced from 3 to **1 clock cycle delay**

# Problems:

- Decision making process is brought from MEM stage to ID stage.

- However, if the register(s) involved in the comparison is produced by preceding instruction:
  - Further stall is still needed!



## Solution:

- Add forwarding path from ALU to ID stage
- One clock cycle delay** is still needed



- Problem is worse with **load** followed by **branch**

## Solution:

- MEM to ID forwarding and 2 more stall cycles!
- In this case, we ended up with 3 total stall cycles
- ***no improvement!***



# Branch Prediction

- Simple prediction
  - All branchers are assumed to be **not taken**.
  - Fetch the successor instruction and start pumping it through the pipeline stages.

- When the actual branch outcome is known:

**Not Taken** - Guessed Correctly → no pipeline stall.

**Taken** - Guessed wrongly → Wrong instruction in the pipeline

→ **Flush** successor instruction from the pipeline.

→ **Stalls**.



# Delayed Branch

- Branch outcome takes  $X$  number of cycles to be known  
→  $X$  cycles stall.

- Idea:

- Move non-control dependent instructions into the  $X$  slots following a branch.
  - Known as the **branch-delay slot**.
- These instructions are executed regardless of the branch outcome

- In MIPS processor:

- Branch-Delay slot = 1 (with the early branch)

- E.g.



- The "or" instruction is moved into the delayed slot:
  - Get executed regardless of the branch outcome→ Same behavior as the original code! → Correctness should be preserved!

- Done by compiler.

- Best case scenario**

- There is an instruction preceding the branch which can be moved into the delayed slot
  - Program correctness must be preserved!

- Worst case scenario**

- Such instruction cannot be found  
→ Add a no-op (**nop**) instruction in the branch-delay slot

- Re-ordering** instructions is a common method of program optimization

- Compiler must be smart enough to do this
- Usually can find such an instruction at least 50% of the time

