

# Pipelined Architecture

---

Virendra Singh

Computer Architecture and Dependable Systems Lab

Department of Electrical Engineering

Indian Institute of Technology Bombay

<http://www.ee.iitb.ac.in/~viren/>

E-mail: [viren@ee.iitb.ac.in](mailto:viren@ee.iitb.ac.in)

*CS-683: Advanced Computer Architecture*



---

Lecture 8 (16 Aug 2021)

**CADSL**

# Ways to Handle Branch

---

- Stall or bubble
- Delayed branch
- Branch prediction:
  - Heuristics
    - Next instruction
    - Prediction based on statistics (dynamic)
    - Hardware decision (dynamic)
  - Prediction error: pipeline flush



# Branch Prediction ✓

Statically taken  
↓

taken      not taken  
70%      30%

- ① Instruction is a branch instruction ] Fetch.
- ② Target address  
    Record.

✓

| PC Add. | BTA  |
|---------|------|
| 100     | 4000 |
| 149     | 7500 |
| 273     | 2400 |
|         |      |
|         |      |
|         |      |

Branch History Table

Look up table



# Branch Prediction



# Branch Prediction



## Dynamic decisions

— use the behaviour of last time



Record the behaviour  
(in look-up table - BHT)

BHT

| PC  | BTA  | HB | C | History buf. |
|-----|------|----|---|--------------|
| 100 | 2400 | 10 | ✓ |              |
|     |      |    |   |              |

if ( $i < j$ )

can capture  
the date  
dependent  
behaviour

      

⇒ 80%

↑ read at fetch  
& modify at Execute



# Branch Prediction

---

- Useful for program loops.
- A one-bit prediction scheme: a one-bit buffer carries a “history bit” that tells what happened on the last branch instruction
  - History bit = 1, branch was taken
  - History bit = 0, branch was not taken



# Branch Prediction



# Branch Prediction for a Loop

1  
2  
3  
4  
5



Execution of Instruction 4

| Execution seq. | Old hist. bit | Next instr. |    |      | New hist. bit | Prediction |
|----------------|---------------|-------------|----|------|---------------|------------|
|                |               | Pred.       | I  | Act. |               |            |
| 1              | 0 ✓           | 5           | 1  | 2    | 1             | Bad        |
| 2              | ✓ 1           | 2           | 2  | 2    | 1             | Good       |
| 3              | 1             | 2           | 3  | 2    | 1             | Good       |
| 4              | 1             | 2           | 4  | 2    | 1             | Good       |
| 5              | 1             | 2           | 5  | 2    | 1             | Good       |
| 6              | 1             | 2           | 6  | 2    | 1             | Good       |
| 7              | 1             | 2           | 7  | 2    | 1             | Good       |
| 8              | 1             | 2           | 8  | 2    | 1             | Good       |
| 9              | 1             | 2           | 9  | 2    | 1             | Good       |
| 10             | 1             | 2           | 10 | 5    | 0             | Bad        |

h.bit = 0 branch not taken, h.bit = 1 branch taken.



$\overbrace{TTT \dots}^{\text{Loop}} T \overbrace{NTNT \dots}^{\text{State}} T \dots$   
 Taken type ) 2 bits.  
 $\overbrace{NTNTNT \dots}^{q_j (i < j)}$   
 $NT NT NT \dots - NT \dots T \dots NT NT \dots - T \dots$   
 $N \uparrow NT NT \dots - NT \dots T \dots T \dots T \dots$



# Two-Bit Prediction Buffer

- Can improve correct prediction statistics.



# Branch Prediction for a Loop

1

~~Fix~~

2

3

4

5



Execution of Instruction 4

| Execution seq. | Old Pred. Buf | Next instr. |    |      | New pred. Buf | Prediction |
|----------------|---------------|-------------|----|------|---------------|------------|
|                |               | Pred.       | I  | Act. |               |            |
| 1              | 10            | 2           | 1  | 2    | 11            | Good       |
| 2              | 11            | 2           | 2  | 2    | 11            | Good       |
| 3              | 11            | 2           | 3  | 2    | 11            | Good       |
| 4              | 11            | 2           | 4  | 2    | 11            | Good       |
| 5              | 11            | 2           | 5  | 2    | 11            | Good       |
| 6              | 11            | 2           | 6  | 2    | 11            | Good       |
| 7              | 11            | 2           | 7  | 2    | 11            | Good       |
| 8              | 11            | 2           | 8  | 2    | 11            | Good       |
| 9              | 11            | 2           | 9  | 2    | 11            | Good       |
| 10             | 11            | 2           | 10 | 5    | 10            | Bad        |



16 Aug 2021

80% Accuracy,  $CPI = 1 + 2 \times 1 \times 2 = 1.04$

12

CADSL

80% 1 bu<sup>r</sup>  
↓  
→ 90% 2 bu<sup>r</sup>  
92% 3 bu<sup>r</sup>  
93% 4 bu<sup>r</sup>



# Delayed Branch Exec.



16 Aug 2021

CS683@IITB

IF ID Err New WB

1.  $I_1$   ~~$B_2$~~   $\xrightarrow{\text{br}} \text{cmd}$
2.  $I_2$   $\xrightarrow{\text{cmd}}$   $I_1$   $\xleftarrow{\text{evaluate}}$
3.  $I_3$   $I_2$   $I_1$   $\xleftarrow{\text{evaluate}}$
4.  $I_{20}$   ~~$I_1$~~   $I_2 I_1$





- $I^F \quad I^D \quad \Rightarrow \text{Max Wt.}$
1.  $I_3$
  2.  $I_{\cancel{1}} \quad I_3$
  3.  $I_2 \checkmark \quad I_{\cancel{1}}$        $I_3$       evaluated
  4.  $I_{\cancel{2}} \quad I_2 \quad I_{\cancel{1}} \quad I_3$
  5.  $I_{\cancel{2}} \quad I_{\cancel{3}} \quad I_2 \quad I_{\cancel{1}} \quad I_3 \quad \checkmark$
- no - penalty





# Delayed Branch Example

151.

- Stall on branch

✓ [add \$4, \$5, \$6] ↵

beq \$1, \$2, skip

next instruction ↓

...

skip or \$7, \$8, \$9

MIPS) → ↑ branch delay  
↓ st

independent

- Delayed branch

beq \$1, \$2, skip

add \$4, \$5, \$6

next instruction

...

skip or \$7, \$8, \$9

Instruction executed irrespective  
of branch decision



# Delayed Branch



# Summary: Hazards

---

- Structural hazards
  - Cause: resource conflict
  - Remedies: (i) hardware resources, (ii) stall (bubble)
- Data hazards ✓
  - Cause: data unavailability
  - Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering
- Control hazards
  - Cause: out-of-sequence execution (branch or jump)
  - Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush

detect / correction

VLIW / Superscalar



15 minutes

1

→ 200

150 - 200

RUX8F6KI

2021 CS683

# Memory Organization

TLELXV



# Memory Performance Gap



# Why Memory Hierarchy?

---

- Need lots of bandwidth

$$\begin{aligned}BW &= \frac{1.0inst}{cycle} \times \left[ \frac{1Ifetch}{inst} \times \frac{4B}{Ifetch} + \frac{0.4Dref}{inst} \times \frac{4B}{Dref} \right] \times \frac{1Gcycles}{sec} \\&= \frac{5.6GB}{sec}\end{aligned}$$

- Need lots of storage
  - 64MB (minimum) to multiple TB
- Must be cheap per bit
  - (TB x anything) is a lot of money!
- These requirements seem incompatible



# Thank You

