

# CIS 5710: Computer Organization & Design

## Unit 11: Static & Dynamic Scheduling

Slides originally developed by  
Drew Hilton, Amir Roth, Milo Martin and Joe Devietti  
at University of Pennsylvania

# This Unit: Static & Dynamic Scheduling

---



- Code scheduling
  - To reduce pipeline stalls
  - To increase ILP (insn level parallelism)
- Static scheduling by the compiler
  - Approach & limitations
- Dynamic scheduling in hardware
  - Register renaming
  - Instruction selection
  - Handling memory operations

---

# **Code Scheduling & Limitations**

# Code Scheduling

---

- Scheduling: act of finding independent instructions
  - “Static” done at compile time by the compiler (software)
  - “Dynamic” done at runtime by the processor (hardware)
- Why schedule code?
  - Scalar pipelines: fill in load-to-use delay slots to improve CPI
  - Superscalar: place independent instructions together
    - As above, load-to-use delay slots
    - Allow multiple-issue decode logic to let them execute at the same time

# Compiler Scheduling

- Compiler can schedule (move) instructions to reduce stalls
  - Basic pipeline scheduling**: eliminate back-to-back load-use pairs
  - Example code sequence:  $a = b + c; d = f - e;$ 
    - $sp$  stack pointer,  $sp+0$  is “ $a$ ”,  $sp+4$  is “ $b$ ”, etc...

Before

```
ld [sp+4]→r2
ld [sp+8]→r3
add r2,r3→r1 //stall
st r1→[sp+0]
ld [sp+16]→r5
ld [sp+20]→r6
sub r6,r5→r4 //stall
st r4→[sp+12]
```

After

```
ld [sp+4]→r2
ld [sp+8]→r3
ld [sp+16]→r5
add r2,r3→r1 //no stall
ld [sp+20]→r6
st r1→[sp+0]
sub r6,r5→r4 //no stall
st r4→[sp+12]
```

# Compiler Scheduling Requires

---

- **Large scheduling scope**

- Independent instruction to put between load-use pairs
  - + Original example: large scope, two independent computations
  - This example: small scope, one computation

Before

```
ld [sp+4]→r2
ld [sp+8]→r3
add r2,r3→r1 //stall
st r1→[sp+0]
```

After (same!)

```
ld [sp+4]→r2
ld [sp+8]→r3
add r2,r3→r1 //stall
st r1→[sp+0]
```

- Compiler can create **larger** scheduling scopes
  - For example: loop unrolling & function inlining

# Scheduling Scope Limited by Branches

**r1 and r2 are inputs**

loop:

  jz r1, not\_found

  ld [r1+0] → r3

  sub r2, r3 → r4

  jz r4, found

  ld [r1+4] → r1

  jmp loop

Aside: what does this code do?



Legal to move load up past branch?

# Compiler Scheduling Requires

---

- **Enough registers**

- To hold additional “live” values
- Example code contains 7 different values (including `sp`)
- Before: max 3 values live at any time → 3 registers enough
- After: max 4 values live → 3 registers not enough

Original

```
ld [sp+4]→r2
ld [sp+8]→r1
add r1,r2→r1 //stall
st r1→[sp+0]
ld [sp+16]→r2
ld [sp+20]→r1
sub r2,r1→r1 //stall
st r1→[sp+12]
```

Wrong!

```
ld [sp+4]→r2
ld [sp+8]→r1
ld [sp+16]→r2
add r1,r2→r1 // wrong r2
ld [sp+20]→r1
st r1→[sp+0] // wrong r1
sub r2,r1→r1
st r1→[sp+12]
```

# Compiler Scheduling Requires

- **Alias analysis**

- Ability to tell whether load/store reference same memory locations
  - Effectively, whether load/store can be rearranged
- Previous example: easy, loads/stores use same base register (**sp**)
- New example: can compiler tell that **r8 != r9**?
- Must be **conservative**

Before

```
ld [r9+4]→r2
ld [r9+8]→r3
add r3,r2→r1 //stall
st r1→[r9+0]
ld [r8+0]→r5
ld [r8+4]→r6
sub r5,r6→r4 //stall
st r4→[r8+8]
```

Wrong(?)

```
ld [r9+4]→r2
ld [r9+8]→r3
ld [r8+0]→r5 //does r8==r9?
add r3,r2→r1
ld [r8+4]→r6 //does r8+4==r9?
st r1→[r9+0]
sub r5,r6→r4
st r4→[r8+8]
```

# A Good Case: Static Scheduling of SAXPY

---

- **SAXPY** (Single-precision A X Plus Y)
  - Linear algebra routine (used in solving systems of equations)

```
for (i=0;i<N;i++)  
    Z[i]=(A*X[i])+Y[i];
```

```
0: ldf [X+r1]→f1      // loop  
1: mulf f0,f1→f2      // A in f0  
2: ldf [Y+r1]→f3      // X,Y,Z are constant addresses  
3: addf f2,f3→f4  
4: stf f4→[Z+r1]  
5: addi r1,4→r1        // i in r1  
6: blt r1,r2,0          // N*4 in r2
```

- Static scheduling works great for SAXPY
  - All loop iterations independent
  - Use loop unrolling to increase scheduling scope
  - Aliasing analysis is tractable (just ensure X, Y, Z are independent)
  - Still limited by number of registers

# Unrolling & Scheduling SAXPY

- Fuse two (in general, K) iterations of loop
  - Fuse loop control: induction variable (*i*) increment + branch
  - Adjust register names & induction uses (constants → constants+4)
  - Reorder operations to reduce stalls

```
ldf [X+r1]→f1  
mulf f0,f1→f2  
ldf [Y+r1]→f3  
addf f2,f3→f4  
stf f4→[Z+r1]  
addi r1,4→r1  
blt r1,r2,0
```



```
ldf [X+r1]→f1  
mulf f0,f1→f2  
ldf [Y+r1]→f3  
addf f2,f3→f4  
stf f4→[Z+r1]
```



```
ldf [X+r1]→f1  
mulf f0,f1→f2  
ldf [Y+r1]→f3  
addf f2,f3→f4  
stf f4→[Z+r1]  
addi r1,4→r1  
blt r1,r2,0
```

```
ldf [X+r1]→f1  
ldf [X+r2+4]→f5  
mulf f0,f1→f2  
mulf f0,f5→f6  
ldf [Y+r1]→f3  
ldf [Y+r1+4]→f7  
addf f2,f3→f4  
addf f6,f7→f8  
stf f4→[Z+r1]  
stf f8→[Z+r1+4]  
addi r1,8→r1  
blt r1,r2,0
```

# Compiler Scheduling Limitations

---

- Scheduling scope
  - Example: can't generally move memory operations past branches
- Limited number of registers (set by ISA)
- Inexact “memory aliasing” information
  - Often prevents reordering of loads above stores by compiler
- Caches misses (or any runtime event) confound scheduling
  - How can the compiler know which loads will miss vs hit?
  - Can impact the compiler’s scheduling decisions

---

# **Dynamic (Hardware) Scheduling**

# Can Hardware Overcome These Limits?

---

- **Dynamically-scheduled processors**
  - Also called “out-of-order” processors
  - Hardware re-schedules insns...
  - ...within a sliding window of VonNeumann insns
  - As with pipelining and superscalar, ISA unchanged
    - Same hardware/software interface, appearance of in-order
- Increases scheduling scope
  - Does loop unrolling transparently!
  - Uses branch prediction to “unroll” branches
- Examples:
  - Pentium Pro/II/III (3-wide), Core 2 (4-wide), Alpha 21264 (4-wide), MIPS R10000 (4-wide), Power5 (5-wide)

# Example: In-Order Limitations #1

|                  | 0 | 1 | 2         | 3              | 4              | 5         | 6              | 7              | 8              | 9 | 10 | 11 | 12 |
|------------------|---|---|-----------|----------------|----------------|-----------|----------------|----------------|----------------|---|----|----|----|
| Ld [r1] → r2     | F | D | X         | M <sub>1</sub> | M <sub>2</sub> | W         |                |                |                |   |    |    |    |
| add r2 + r3 → r4 | F | D | <b>d*</b> | <b>d*</b>      | <b>d*</b>      | X         | M <sub>1</sub> | M <sub>2</sub> | W              |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F | D         | <b>d*</b>      | <b>d*</b>      | <b>d*</b> | X              | M <sub>1</sub> | M <sub>2</sub> | W |    |    |    |
| ld [r7] → r4     |   | F | D         | <b>p*</b>      | <b>p*</b>      | <b>p*</b> | X              | M <sub>1</sub> | M <sub>2</sub> | W |    |    |    |

- In-order pipeline, three-cycle load-use penalty
  - 2-wide
- Why not the following?

|                         | 0 | 1 | 2         | 3              | 4                    | 5                    | 6              | 7              | 8              | 9 | 10 | 11 | 12 |
|-------------------------|---|---|-----------|----------------|----------------------|----------------------|----------------|----------------|----------------|---|----|----|----|
| Ld [r1] → r2            | F | D | X         | M <sub>1</sub> | M <sub>2</sub>       | W                    |                |                |                |   |    |    |    |
| add r2 + r3 → <b>r4</b> | F | D | <b>d*</b> | <b>d*</b>      | <b>d*</b>            | X                    | M <sub>1</sub> | M <sub>2</sub> | W              |   |    |    |    |
| xor <b>r4</b> ^ r5 → r6 |   | F | D         | <b>d*</b>      | <b>d*</b>            | <b>d*</b>            | X              | M <sub>1</sub> | M <sub>2</sub> | W |    |    |    |
| ld [r7] → <b>r4</b>     |   | F | D         | <b>X</b>       | <b>M<sub>1</sub></b> | <b>M<sub>2</sub></b> | <b>W</b>       |                |                |   |    |    |    |

# Example: In-Order Limitations #2

|                  | 0 | 1 | 2         | 3              | 4              | 5         | 6              | 7              | 8              | 9 | 10 | 11 | 12 |
|------------------|---|---|-----------|----------------|----------------|-----------|----------------|----------------|----------------|---|----|----|----|
| Ld [p1] → p2     | F | D | X         | M <sub>1</sub> | M <sub>2</sub> | W         |                |                |                |   |    |    |    |
| add p2 + p3 → p4 | F | D | <b>d*</b> | <b>d*</b>      | <b>d*</b>      | X         | M <sub>1</sub> | M <sub>2</sub> | W              |   |    |    |    |
| xor p4 ^ p5 → p6 |   | F | D         | <b>d*</b>      | <b>d*</b>      | <b>d*</b> | X              | M <sub>1</sub> | M <sub>2</sub> | W |    |    |    |
| ld [p7] → p8     |   | F | D         | <b>p*</b>      | <b>p*</b>      | <b>p*</b> | X              | M <sub>1</sub> | M <sub>2</sub> | W |    |    |    |

- In-order pipeline, three-cycle load-use penalty
  - 2-wide
- Why not the following:

|                         | 0 | 1 | 2         | 3              | 4                    | 5                    | 6              | 7              | 8              | 9 | 10 | 11 | 12 |
|-------------------------|---|---|-----------|----------------|----------------------|----------------------|----------------|----------------|----------------|---|----|----|----|
| Ld [p1] → p2            | F | D | X         | M <sub>1</sub> | M <sub>2</sub>       | W                    |                |                |                |   |    |    |    |
| add p2 + p3 → <b>p4</b> | F | D | <b>d*</b> | <b>d*</b>      | <b>d*</b>            | X                    | M <sub>1</sub> | M <sub>2</sub> | W              |   |    |    |    |
| xor <b>p4</b> ^ p5 → p6 |   | F | D         | <b>d*</b>      | <b>d*</b>            | <b>d*</b>            | X              | M <sub>1</sub> | M <sub>2</sub> | W |    |    |    |
| ld [p7] → <b>p8</b>     |   | F | D         | <b>X</b>       | <b>M<sub>1</sub></b> | <b>M<sub>2</sub></b> | <b>W</b>       |                |                |   |    |    |    |

# Out-of-Order Pipeline



# Out-of-Order Execution

---

- Also called “Dynamic scheduling”
  - Done by the hardware on-the-fly during execution
- Looks at a “window” of instructions waiting to execute
  - Each cycle, picks the next ready instruction(s)
- Two steps to enable out-of-order execution:
  - Step #1: Register renaming – to avoid “false” dependencies
  - Step #2: Dynamically schedule – to enforce “true” dependencies
- Key to understanding out-of-order execution:
  - **Data dependencies**

# Dependence types

---

- **RAW** (Read After Write) = “true dependence” (true)



- **WAW** (Write After Write) = “output dependence” (false)



- **WAR** (Write After Read) = “anti-dependence” (false)



- WAW & WAR are “false”, Can be **totally eliminated** by “renaming”

# Register Renaming Algorithm

---

- Two key data structures:
  - `maptable[architectural_reg] → physical_reg`
  - Free list: allocate (new) & free registers (implemented as a queue)
- Algorithm: at “decode” stage for each instruction:

```
insn.phys_input1 = maptable[insn.arch_input1]
insn.phys_input2 = maptable[insn.arch_input2]
insn.old_phys_output = maptable[insn.arch_output]
new_reg = new_phys_reg()
maptable[insn.arch_output] = new_reg
insn.phys_output = new_reg
```

- At “commit”
  - Once all older instructions have committed, free register  
`free_phys_reg(insn.old_phys_output)`

# Out-of-order Pipeline



**Have unique register names**

**Now put into out-of-order execution structures**

**In-order commit**

---

# Register Renaming

# Register Renaming Algorithm (Simplified)

---

- Two key data structures:
  - `maptable[architectural_reg] → physical_reg`
  - Free list: allocate (new) & free registers (implemented as a queue)
    - ignore freeing of registers for now
- Algorithm: at “decode” stage for each instruction:

```
insn.phys_input1 = maptable[insn.arch_input1]  
insn.phys_input2 = maptable[insn.arch_input2]
```

```
new_reg = new_phys_reg()  
maptable[insn.arch_output] = new_reg  
insn.phys_output = new_reg
```

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p3 |
| r4 | p4 |
| r5 | p5 |

Map table

|     |
|-----|
| p6  |
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1



xor p1 ^ p2 →

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p3 |
| r4 | p4 |
| r5 | p5 |

Map table

|     |
|-----|
| p6  |
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

$\text{xor } r1 \wedge r2 \rightarrow r3$        $\longrightarrow$        $\text{xor } p1 \wedge p2 \rightarrow p6$   
 $\text{add } r3 + r4 \rightarrow r4$   
 $\text{sub } r5 - r2 \rightarrow r3$   
 $\text{addi } r3 + 1 \rightarrow r1$

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p3 |
| r4 | p4 |
| r5 | p5 |

Map table

|     |
|-----|
| p6  |
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

$\text{xor } r1 \wedge r2 \rightarrow r3$   
 $\text{add } r3 + r4 \rightarrow r4$   
 $\text{sub } r5 - r2 \rightarrow r3$   
 $\text{addi } r3 + 1 \rightarrow r1$

$\longrightarrow$   $\text{xor } p1 \wedge p2 \rightarrow p6$

|           |           |
|-----------|-----------|
| r1        | p1        |
| r2        | p2        |
| <b>r3</b> | <b>p6</b> |
| r4        | p4        |
| r5        | p5        |

Map table

|     |
|-----|
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add **r3** + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1



xor p1 ^ p2 → p6  
add **p6** + **p4** →

|           |           |
|-----------|-----------|
| r1        | p1        |
| r2        | p2        |
| <b>r3</b> | <b>p6</b> |
| <b>r4</b> | <b>p4</b> |
| r5        | p5        |

Map table

|     |
|-----|
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1



xor p1 ^ p2 → p6  
add p6 + p4 → **p7**

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p6 |
| r4 | p4 |
| r5 | p5 |

Map table

|           |
|-----------|
| <b>p7</b> |
| p8        |
| p9        |
| p10       |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1



xor p1 ^ p2 → p6  
add p6 + p4 → p7

|           |           |
|-----------|-----------|
| r1        | p1        |
| r2        | p2        |
| r3        | p6        |
| <b>r4</b> | <b>p7</b> |
| r5        | p5        |

Map table

|     |
|-----|
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1



xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 →

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p6 |
| r4 | p7 |
| r5 | p5 |

Map table

|     |
|-----|
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1



xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → **p8**

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p6 |
| r4 | p7 |
| r5 | p5 |

Map table

|           |
|-----------|
| <b>p8</b> |
| p9        |
| p10       |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1



xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8

|           |           |
|-----------|-----------|
| r1        | p1        |
| r2        | p2        |
| <b>r3</b> | <b>p8</b> |
| r4        | p7        |
| r5        | p5        |

Map table

|     |
|-----|
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi **r3** + 1 → r1

xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi **p8** + 1 →



|           |           |
|-----------|-----------|
| r1        | p1        |
| r2        | p2        |
| <b>r3</b> | <b>p8</b> |
| r4        | p7        |
| r5        | p5        |

Map table

|     |
|-----|
| p9  |
| p10 |

Free-list

# Renaming example

---

$\text{xor } r1 \wedge r2 \rightarrow r3$   
 $\text{add } r3 + r4 \rightarrow r4$   
 $\text{sub } r5 - r2 \rightarrow r3$   
 $\text{addi } r3 + 1 \rightarrow r1$

$\text{xor } p1 \wedge p2 \rightarrow p6$   
 $\text{add } p6 + p4 \rightarrow p7$   
 $\text{sub } p5 - p2 \rightarrow p8$   
 $\text{addi } p8 + 1 \rightarrow \mathbf{p9}$



|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table

|           |
|-----------|
| <b>p9</b> |
| p10       |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 → p9

|    |    |
|----|----|
| r1 | p9 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table



Free-list

# Register Renaming Overview

- To eliminate register conflicts/hazards
- “Architected” vs “Physical” registers – level of indirection
  - Names:  $r_1, r_2, r_3$
  - Locations:  $p_1, p_2, p_3, p_4, p_5, p_6, p_7$
  - Original mapping:  $r_1 \rightarrow p_1, r_2 \rightarrow p_2, r_3 \rightarrow p_3, p_4-p_7$  are “available”

|                                | MapTable                                                                                                                                                                                                                                                                                                                                                                                                                                                       | FreeList | Original insns | Renamed insns |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
|--------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|----------------|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|-----------------|------------|-------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|--------------------------------|--------------------------------|------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|--------------------------------|--------------------------------|------------------------------|
| Time                           | <table border="1"><tr><td><math>r_1</math></td><td><math>r_2</math></td><td><math>r_3</math></td></tr><tr><td><math>p_1</math></td><td><math>p_2</math></td><td><math>p_3</math></td></tr><tr><td><math>p_4</math></td><td><math>p_2</math></td><td><math>p_3</math></td></tr><tr><td><math>p_4</math></td><td><math>p_2</math></td><td><math>p_5</math></td></tr><tr><td><math>p_4</math></td><td><math>p_2</math></td><td><math>p_6</math></td></tr></table> | $r_1$    | $r_2$          | $r_3$         | $p_1$ | $p_2$ | $p_3$ | $p_4$ | $p_2$ | $p_3$ | $p_4$ | $p_2$ | $p_5$ | $p_4$ | $p_2$ | $p_6$ | <table border="1"><tr><td><math>p_4, p_5, p_6, p_7</math></td></tr><tr><td><math>p_5, p_6, p_7</math></td></tr><tr><td><math>p_6, p_7</math></td></tr><tr><td><math>p_7</math></td></tr></table> | $p_4, p_5, p_6, p_7$ | $p_5, p_6, p_7$ | $p_6, p_7$ | $p_7$ | <table><tr><td>add <math>r_2, r_3 \rightarrow r_1</math></td></tr><tr><td>sub <math>r_2, r_1 \rightarrow r_3</math></td></tr><tr><td>mul <math>r_2, r_3 \rightarrow r_3</math></td></tr><tr><td>div <math>r_1, 4 \rightarrow r_1</math></td></tr></table> | add $r_2, r_3 \rightarrow r_1$ | sub $r_2, r_1 \rightarrow r_3$ | mul $r_2, r_3 \rightarrow r_3$ | div $r_1, 4 \rightarrow r_1$ | <table><tr><td>add <math>p_2, p_3 \rightarrow p_4</math></td></tr><tr><td>sub <math>p_2, p_4 \rightarrow p_5</math></td></tr><tr><td>mul <math>p_2, p_5 \rightarrow p_6</math></td></tr><tr><td>div <math>p_4, 4 \rightarrow p_7</math></td></tr></table> | add $p_2, p_3 \rightarrow p_4$ | sub $p_2, p_4 \rightarrow p_5$ | mul $p_2, p_5 \rightarrow p_6$ | div $p_4, 4 \rightarrow p_7$ |
| $r_1$                          | $r_2$                                                                                                                                                                                                                                                                                                                                                                                                                                                          | $r_3$    |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| $p_1$                          | $p_2$                                                                                                                                                                                                                                                                                                                                                                                                                                                          | $p_3$    |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| $p_4$                          | $p_2$                                                                                                                                                                                                                                                                                                                                                                                                                                                          | $p_3$    |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| $p_4$                          | $p_2$                                                                                                                                                                                                                                                                                                                                                                                                                                                          | $p_5$    |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| $p_4$                          | $p_2$                                                                                                                                                                                                                                                                                                                                                                                                                                                          | $p_6$    |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| $p_4, p_5, p_6, p_7$           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| $p_5, p_6, p_7$                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| $p_6, p_7$                     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| $p_7$                          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| add $r_2, r_3 \rightarrow r_1$ |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| sub $r_2, r_1 \rightarrow r_3$ |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| mul $r_2, r_3 \rightarrow r_3$ |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| div $r_1, 4 \rightarrow r_1$   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| add $p_2, p_3 \rightarrow p_4$ |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| sub $p_2, p_4 \rightarrow p_5$ |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| mul $p_2, p_5 \rightarrow p_6$ |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
| div $p_4, 4 \rightarrow p_7$   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
|                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
|                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
|                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
|                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |
|                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                |          |                |               |       |       |       |       |       |       |       |       |       |       |       |       |                                                                                                                                                                                                  |                      |                 |            |       |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |                                                                                                                                                                                                                                                           |                                |                                |                                |                              |

- Renaming – conceptually write each register once
  - + Removes **false** dependences
  - + Leaves **true** dependences intact!
- When to reuse a physical register? After overwriting insn done

# Out-of-order Pipeline



Have unique register names  
Now put into out-of-order execution structures

---

# **Dynamic Scheduling Mechanisms**

# Dynamic Scheduling Overview



- Instructions fetch/decoded/renamed into *Instruction Buffer*
  - Also called “instruction window” or “instruction scheduler”
- Instructions (conceptually) check ready bits every cycle
  - Execute oldest “ready” instruction, set output as “ready”

# Dynamic Scheduling/Issue Algorithm

---

- Data structures:
  - Ready table[phys\_reg] → yes/no (part of “issue queue”)
- Algorithm at “issue” stage (prior to read registers):

foreach instruction:

```
if table[insn.phys_input1] == ready &&
    table[insn.phys_input2] == ready then
        insn is “ready”
```

select the oldest “ready” instruction

```
table[insn.phys_output] = ready
```

- Multiple-cycle instructions? (such as loads)
  - For an insn with latency of N, set “ready” bit N-1 cycles in future

# Dispatch

---

- Put renamed instructions into out-of-order structures
- Re-order buffer (ROB)
  - Holds instructions from Fetch through Commit
- Issue Queue
  - Central piece of scheduling logic
  - Holds instructions from Dispatch through Issue
  - Tracks ready inputs
    - Physical register names + ready bit
    - “AND” the bits to tell if ready



# Dispatch Steps

---

- Allocate Issue Queue (IQ) slot
  - Full? Stall
- Read **ready bits** of inputs
  - 1-bit per physical reg
- Clear **ready bit** of output in table
  - Instruction has not produced value yet
- Write data into Issue Queue (IQ) slot

# Dispatch Example

---

xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 → p9

**Ready bits**

|    |   |
|----|---|
| p1 | y |
| p2 | y |
| p3 | y |
| p4 | y |
| p5 | y |
| p6 | y |
| p7 | y |
| p8 | y |
| p9 | y |

**Issue Queue**

| Insn | Inp1 | R | Inp2 | R | Dst | Bday |
|------|------|---|------|---|-----|------|
|      |      |   |      |   |     |      |
|      |      |   |      |   |     |      |
|      |      |   |      |   |     |      |
|      |      |   |      |   |     |      |

# Dispatch Example

---

xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 → p9

**Ready bits**

|           |          |
|-----------|----------|
| p1        | y        |
| p2        | y        |
| p3        | y        |
| p4        | y        |
| p5        | y        |
| <b>p6</b> | <b>n</b> |
| p7        | y        |
| p8        | y        |
| p9        | y        |

**Issue Queue**

| Insn | Inp1 | R | Inp2 | R | Dst | Bday |
|------|------|---|------|---|-----|------|
| xor  | p1   | y | p2   | y | p6  | 0    |
|      |      |   |      |   |     |      |
|      |      |   |      |   |     |      |
|      |      |   |      |   |     |      |

# Dispatch Example

---

xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 → p9

**Ready bits**

|           |          |
|-----------|----------|
| p1        | y        |
| p2        | y        |
| p3        | y        |
| p4        | y        |
| p5        | y        |
| p6        | n        |
| <b>p7</b> | <b>n</b> |
| p8        | y        |
| p9        | y        |

**Issue Queue**

| Insn | Inp1      | R        | Inp2      | R        | Dst       | Bday     |
|------|-----------|----------|-----------|----------|-----------|----------|
| xor  | p1        | y        | p2        | y        | p6        | 0        |
| add  | <b>p6</b> | <b>n</b> | <b>p4</b> | <b>y</b> | <b>p7</b> | <b>1</b> |
|      |           |          |           |          |           |          |
|      |           |          |           |          |           |          |

# Dispatch Example

---

xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 → p9

**Ready bits**

|           |          |
|-----------|----------|
| p1        | y        |
| p2        | y        |
| p3        | y        |
| p4        | y        |
| p5        | y        |
| p6        | n        |
| p7        | n        |
| <b>p8</b> | <b>n</b> |
| p9        | y        |

**Issue Queue**

| Insn | Inp1 | R | Inp2 | R | Dst | Bday |
|------|------|---|------|---|-----|------|
| xor  | p1   | y | p2   | y | p6  | 0    |
| add  | p6   | n | p4   | y | p7  | 1    |
| sub  | p5   | y | p2   | y | p8  | 2    |
|      |      |   |      |   |     |      |

# Dispatch Example

---

xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 → p9

**Ready bits**

|    |   |
|----|---|
| p1 | y |
| p2 | y |
| p3 | y |
| p4 | y |
| p5 | y |
| p6 | n |
| p7 | n |
| p8 | n |
| p9 | n |

**Issue Queue**

| Insn | Inp1 | R | Inp2 | R | Dst | Bday |
|------|------|---|------|---|-----|------|
| xor  | p1   | y | p2   | y | p6  | 0    |
| add  | p6   | n | p4   | y | p7  | 1    |
| sub  | p5   | y | p2   | y | p8  | 2    |
| addi | p8   | n | ---  | y | p9  | 3    |

# Out-of-order pipeline

---

- Execution (out-of-order) stages
- **Select** ready instructions
  - Send for execution
- **Wakeup** dependents



# Dynamic Scheduling/Issue Algorithm

---

- Data structures:
  - Ready table[phys\_reg] → yes/no (part of issue queue)
- Algorithm at “schedule” stage (prior to read registers):

foreach instruction:

```
if table[insn.phys_input1] == ready &&
    table[insn.phys_input2] == ready then
        insn is "ready"
```

select the oldest “ready” instruction

```
table[insn.phys_output] = ready
```

# Issue = Select + Wakeup

---

- **Select** oldest of “ready” instructions
  - “xor” is the oldest ready instruction below
  - “xor” and “sub” are the two oldest ready instructions below
  - Note: may have resource constraints: i.e. load/store/floating point

| Insn | Inp1 | R | Inp2 | R | Dst | Bday |        |
|------|------|---|------|---|-----|------|--------|
| xor  | p1   | y | p2   | y | p6  | 0    | Ready! |
| add  | p6   | n | p4   | y | p7  | 1    |        |
| sub  | p5   | y | p2   | y | p8  | 2    | Ready! |
| addi | p8   | n | ---  | y | p9  | 3    |        |

# Issue = Select + Wakeup

---

- Wakeup dependent instructions
  - Search for destination (Dst) in inputs & set “ready” bit
    - Implemented with a special memory array circuit called a Content Addressable Memory (CAM)
  - Also update ready-bit table for future instructions

| Insn | Inp1      | R        | Inp2 | R | Dst       | Bday |
|------|-----------|----------|------|---|-----------|------|
| xor  | p1        | y        | p2   | y | <b>p6</b> | 0    |
| add  | <b>p6</b> | <b>y</b> | p4   | y | p7        | 1    |
| sub  | p5        | y        | p2   | y | <b>p8</b> | 2    |
| addi | <b>p8</b> | <b>y</b> | ---  | y | p9        | 3    |

Ready bits

|           |          |
|-----------|----------|
| p1        | y        |
| p2        | y        |
| p3        | y        |
| p4        | y        |
| p5        | y        |
| <b>p6</b> | <b>y</b> |
| p7        | n        |
| <b>p8</b> | <b>y</b> |
| p9        | n        |

- For multi-cycle operations (loads, floating point)
  - Wakeup deferred a few cycles
  - Include checks to avoid structural hazards

# Issue

---

- **Select/Wakeup** one cycle
- Dependent instructions execute on back-to-back cycles
  - Next cycle: add/addi are ready:

| Insn | Inp1 | R | Inp2 | R | Dst | Bday |
|------|------|---|------|---|-----|------|
|      |      |   |      |   |     |      |
| add  | p6   | y | p4   | y | p7  | 1    |
|      |      |   |      |   |     |      |
| addi | p8   | y | ---  | y | p9  | 3    |

- Issued instructions are removed from issue queue
  - Free up space for subsequent instructions

# 000 execution (2-wide)



# 000 execution (2-wide)



# 000 execution (2-wide)



# 000 execution (2-wide)



# 000 execution (2-wide)



# 000 execution (2-wide)



# OOO execution (2-wide)

Note similarity  
to in-order!



---

# **Renaming Revisited**

# Re-order Buffer (ROB)

---

- ROB entry holds all info for recovery/commit
  - **All instructions** & in order
  - Architectural register names, physical register names, insn type
  - Not removed until very last thing ("commit")
- Operation
  - Fetch: insert at tail (if full, stall)
  - Commit: remove from head (if not yet done, stall)
- Purpose: tracking for in-order commit
  - Maintain appearance of in-order execution
  - Needed to support:
    - **Misprediction recovery**
    - **Freeing of physical registers**

# Renaming revisited

---

- Track (or “log”) the “overwritten register” in ROB
  - Free this register at commit
  - Also used to restore the map table on “recovery”
    - Used for branch misprediction recovery

# Register Renaming Algorithm (Full)

---

- Two key data structures:
  - `maptable[architectural_reg] → physical_reg`
  - Free list: allocate (new) & free registers (implemented as a queue)
- Algorithm: at “decode” stage for each instruction:

```
insn.phys_input1 = maptable[insn.arch_input1]
insn.phys_input2 = maptable[insn.arch_input2]
insn.old_phys_output = maptable[insn.arch_output]
new_reg = new_phys_reg()
maptable[insn.arch_output] = new_reg
insn.phys_output = new_reg
```

- At “commit”
  - **Once all older instructions have committed, free register `free_phys_reg(insn.old_phys_output)`**

# Recovery

---

- Completely remove wrong path instructions
  - Flush from IQ
  - Remove from ROB
  - Restore map table to before misprediction
  - Free destination registers
- How to restore map table?
  - Option #1: log-based reverse renaming to recover each instruction
    - Tracks the old mapping to allow it to be reversed
    - Done sequentially for each instruction (slow)
    - See next slides
  - Option #2: checkpoint-based recovery
    - Checkpoint state of maptable and free list each cycle
    - Faster recovery, but requires more state
  - Option #3: hybrid (checkpoint for branches, unwind for others)

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p3 |
| r4 | p4 |
| r5 | p5 |

Map table

|     |
|-----|
| p6  |
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

→ xor p1 ^ p2 → [ p3 ]

|    |           |
|----|-----------|
| r1 | p1        |
| r2 | p2        |
| r3 | <b>p3</b> |
| r4 | p4        |
| r5 | p5        |

Map table

|     |
|-----|
| p6  |
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

→ xor p1 ^ p2 → p6 [ p3 ]

|    |           |
|----|-----------|
| r1 | p1        |
| r2 | p2        |
| r3 | <b>p6</b> |
| r4 | p4        |
| r5 | p5        |

Map table

|     |
|-----|
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

→ xor p1 ^ p2 → p6  
add p6 + p4 → [ p3 ]  
[ p4 ]

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p6 |
| r4 | p4 |
| r5 | p5 |

Map table

|     |
|-----|
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

→ xor p1 ^ p2 → p6  
add p6 + p4 → p7  
[ p3 ]  
[ p4 ]

|    |           |
|----|-----------|
| r1 | p1        |
| r2 | p2        |
| r3 | p6        |
| r4 | <b>p7</b> |
| r5 | p5        |

Map table

|     |
|-----|
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1



xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 →  
[ p3 ]  
[ p4 ]  
[ p6 ]

|    |           |
|----|-----------|
| r1 | p1        |
| r2 | p2        |
| r3 | <b>p6</b> |
| r4 | p7        |
| r5 | p5        |

Map table

|     |
|-----|
| p8  |
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1



xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p3 + 1 → p1

[ p3 ]  
[ p4 ]  
[ p6 ]

|    |           |
|----|-----------|
| r1 | p1        |
| r2 | p2        |
| r3 | <b>p8</b> |
| r4 | p7        |
| r5 | p5        |

Map table

|     |
|-----|
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1



xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 →

[ p3 ]  
[ p4 ]  
[ p6 ]  
[ p1 ]

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table

|     |
|-----|
| p9  |
| p10 |

Free-list

# Renaming example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 → p9

[ p3 ]  
[ p4 ]  
[ p6 ]  
[ p1 ]

|    |    |
|----|----|
| r1 | p9 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table



Free-list

# Recovery Example

**Now, let's use this info. to recover from a branch misprediction**

**bnz r1 loop**

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

**bnz p1, loop**

xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 → p9

[ ]

[ p3 ]

[ p4 ]

[ p6 ]

[ p1 ]

|    |    |
|----|----|
| r1 | p9 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table



Free-list

# Recovery Example

---

bnz r1 loop  
xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

bnz p1, loop [ ]  
xor p1 ^ p2 → p6 [ p3 ]  
add p6 + p4 → p7 [ p4 ]  
sub p5 - p2 → p8 [ p6 ]  
addi p8 + 1 → p9 [ p1 ]

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table



Free-list

# Recovery Example

---

bnz r1 loop  
xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3

bnz p1, loop  
xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8

[ ]  
[ p3 ]  
[ p4 ]  
[ p6 ]

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p6 |
| r4 | p7 |
| r5 | p5 |

Map table

|     |
|-----|
| p8  |
| p9  |
| p10 |

Free-list

# Recovery Example

---

bnz r1 loop  
xor r1 ^ r2 → r3  
add r3 + r4 → r4

bnz p1, loop  
xor p1 ^ p2 → p6  
add p6 + p4 → p7

[ ]  
[ p3 ]  
[ p4 ]

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p6 |
| r4 | p4 |
| r5 | p5 |

Map table

|     |
|-----|
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Recovery Example

---

bnz r1 loop

xor r1 ^ r2 → r3

bnz p1, loop

xor p1 ^ p2 → p6

[ ]  
[ p3 ]

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p3 |
| r4 | p4 |
| r5 | p5 |

Map table

|     |
|-----|
| p6  |
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Recovery Example

---

bnz r1, loop

bnz p1, loop

[ ]

|    |    |
|----|----|
| r1 | p1 |
| r2 | p2 |
| r3 | p3 |
| r4 | p4 |
| r5 | p5 |

Map table

|     |
|-----|
| p6  |
| p7  |
| p8  |
| p9  |
| p10 |

Free-list

# Commit

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

xor p1 ^ p2 → p6 [ p3 ]  
add p6 + p4 → p7 [ p4 ]  
sub p5 - p2 → p8 [ p6 ]  
addi p8 + 1 → p9 [ p1 ]

- Commit: instruction becomes **architected state**
  - In-order, only when instructions are finished
  - Free overwritten register (why?)

# Freeing over-written register

---

```
xor r1 ^ r2 -> r3  
add r3 + r4 -> r4  
sub r5 - r2 -> r3  
addi r3 + 1 -> r1
```

```
xor p1 ^ p2 -> p6  
add p6 + p4 -> p7  
sub p5 - p2 -> p8  
addi p8 + 1 -> p9
```

[ p3 ]  
[ p4 ]  
[ p6 ]  
[ p1 ]

- P3 was r3 **before** xor
- P6 is r3 **after** xor
  - Anything older than xor should read p3
  - Anything younger than xor should read p6 (until another insn writes r3)
- At commit of xor, **no older instructions exist**

# Commit Example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

xor p1 ^ p2 → p6  
add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 → p9

[ p3 ]  
[ p4 ]  
[ p6 ]  
[ p1 ]

|    |    |
|----|----|
| r1 | p9 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table



Free-list

# Commit Example

---

xor r1 ^ r2 → r3  
add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

xor p1 ^ p2 → p6 [ p3 ]  
add p6 + p4 → p7 [ p4 ]  
sub p5 - p2 → p8 [ p6 ]  
addi p8 + 1 → p9 [ p1 ]

|    |    |
|----|----|
| r1 | p9 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table

|     |
|-----|
| p10 |
| p3  |

Free-list

# Commit Example

---

add r3 + r4 → r4  
sub r5 - r2 → r3  
addi r3 + 1 → r1

add p6 + p4 → p7  
sub p5 - p2 → p8  
addi p8 + 1 → p9

[ p4 ]  
[ p6 ]  
[ p1 ]

|    |    |
|----|----|
| r1 | p9 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table

|     |
|-----|
| p10 |
| p3  |
| p4  |

Free-list

# Commit Example

---

sub r5 - r2 → r3  
addi r3 + 1 → r1

sub p5 - p2 → p8  
addi p8 + 1 → p9

[ p6 ]  
[ p1 ]

|    |    |
|----|----|
| r1 | p9 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table

|     |
|-----|
| p10 |
| p3  |
| p4  |
| p6  |

Free-list

# Commit Example

---

addi r3 + 1 → r1

addi p8 + 1 → p9

[ p1 ]

|    |    |
|----|----|
| r1 | p9 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table

|     |
|-----|
| p10 |
| p3  |
| p4  |
| p6  |
| p1  |

Free-list

# Commit Example

---

|    |    |
|----|----|
| r1 | p9 |
| r2 | p2 |
| r3 | p8 |
| r4 | p7 |
| r5 | p5 |

Map table

|     |
|-----|
| p10 |
| p3  |
| p4  |
| p6  |
| p1  |

Free-list

---

# **Dynamic Scheduling Example**

# Dynamic Scheduling Example

---

- The following slides are a detailed but concrete example
- Yet, it contains enough detail to be overwhelming
  - Try not to worry about the details
- Focus on the big picture:

**Hardware can reorder instructions  
to extract instruction-level parallelism**

# Recall: Motivating Example

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|----------------|----------------|---|---|----|----|----|
| Id [p1] → p2     | F | Di | I  | RR | X  | M <sub>1</sub> | M <sub>2</sub> | W              | C |   |    |    |    |
| add p2 + p3 → p4 | F | Di |    |    |    | I              | RR             | X              | W | C |    |    |    |
| xor p4 ^ p5 → p6 |   | F  | Di |    |    | I              | RR             | X              | W | C |    |    |    |
| Id [p7] → p8     |   | F  | Di | I  | RR | X              | M <sub>1</sub> | M <sub>2</sub> | W |   | C  |    |    |

- How would this execution occur cycle-by-cycle?
- Execution latencies assumed in this example:
  - Loads have two-cycle load-to-use penalty
    - Three cycle total execution latency
  - All other instructions have single-cycle execution latency
- “Issue queue”: hold all waiting (un-executed) instructions
  - Holds ready/not-ready status
  - Faster than looking up in ready table each cycle

# Out-of-Order Pipeline – Cycle 0

| Map Table |    | Ready Table |     |
|-----------|----|-------------|-----|
| r1        | p8 | p1          | yes |
| r2        | p7 | p2          | yes |
| r3        | p6 | p3          | yes |
| r4        | p5 | p4          | yes |
| r5        | p4 | p5          | yes |
| r6        | p3 | p6          | yes |
| r7        | p2 | p7          | yes |
| r8        | p1 | p8          | yes |
|           |    | p9          | --- |
|           |    | p10         | --- |
|           |    | p11         | --- |
|           |    | p12         | --- |

| Reorder Buffer | Insn | To Free | Done? |
|----------------|------|---------|-------|
|                | ld   |         | no    |
|                | add  |         | no    |
|                |      |         |       |
|                |      |         |       |

| <b>Insn</b> | <b>Src1</b> | <b>R?</b> | <b>Src2</b> | <b>R?</b> | <b>Dest</b> | <b>Bdy</b> |
|-------------|-------------|-----------|-------------|-----------|-------------|------------|
|             |             |           |             |           |             |            |
|             |             |           |             |           |             |            |
|             |             |           |             |           |             |            |
|             |             |           |             |           |             |            |

# Out-of-Order Pipeline – Cycle 1a

|                  | 0 | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|---|---|---|---|---|---|---|---|----|----|----|
| Id [r1] → r2     | F | Di |   |   |   |   |   |   |   |   |    |    |    |
| add r2 + r3 → r4 | F |    |   |   |   |   |   |   |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   |    |   |   |   |   |   |   |   |   |    |    |    |
| Id [r7] → r4     |   |    |   |   |   |   |   |   |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1 p8     | p1 yes      |
| r2 p9     | p2 yes      |
| r3 p6     | p3 yes      |
| r4 p5     | p4 yes      |
| r5 p4     | p5 yes      |
| r6 p3     | p6 yes      |
| r7 p2     | p7 yes      |
| r8 p1     | p8 yes      |
|           | p9 no       |
|           | p10 ---     |
|           | p11 ---     |
|           | p12 ---     |

Issue Queue

| Insn | Src1 | R?  | Src2 | R?  | Dest | Bdy |
|------|------|-----|------|-----|------|-----|
| Id   | p8   | yes | ---  | yes | p9   | 0   |
|      |      |     |      |     |      |     |
|      |      |     |      |     |      |     |
|      |      |     |      |     |      |     |

# Out-of-Order Pipeline – Cycle 1b

|                  | 0 | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|---|---|---|---|---|---|---|---|----|----|----|
| Id [r1] → r2     | F | Di |   |   |   |   |   |   |   |   |    |    |    |
| add r2 + r3 → r4 | F | Di |   |   |   |   |   |   |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   |    |   |   |   |   |   |   |   |   |    |    |    |
| Id [r7] → r4     |   |    |   |   |   |   |   |   |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p10         |
| r5        | p4          |
| r6        | p3          |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer | Insn | To Free | Done? |
|----------------|------|---------|-------|
|                | Insn | To Free | Done? |
|                | Id   | p7      | no    |
|                | add  | p5      | no    |
|                |      |         |       |
|                |      |         |       |
|                |      |         |       |

  

| Issue Queue |      |     |      |     |      |     |
|-------------|------|-----|------|-----|------|-----|
| Insn        | Src1 | R?  | Src2 | R?  | Dest | Bdy |
| Id          | p8   | yes | ---  | yes | p9   | 0   |
| add         | p9   | no  | p6   | yes | p10  | 1   |
|             |      |     |      |     |      |     |
|             |      |     |      |     |      |     |
|             |      |     |      |     |      |     |

# Out-of-Order Pipeline – Cycle 1c

| Map Table |     | Ready Table |     |
|-----------|-----|-------------|-----|
| r1        | p8  | p1          | yes |
| r2        | p9  | p2          | yes |
| r3        | p6  | p3          | yes |
| r4        | p10 | p4          | yes |
| r5        | p4  | p5          | yes |
| r6        | p3  | p6          | yes |
| r7        | p2  | p7          | yes |
| r8        | p1  | p8          | yes |
|           |     | p9          | no  |
|           |     | p10         | no  |
|           |     | p11         | --- |
|           |     | p12         | --- |

| Reorder Buffer | <b>Insn</b> | <b>To Free</b> |  | <b>Done?</b> |
|----------------|-------------|----------------|--|--------------|
|                | ld          | p7             |  | no           |
|                | add         | p5             |  | no           |
|                | xor         |                |  | no           |
|                | ld          |                |  | no           |

  

| <b>Insn</b> | <b>Src1</b> | <b>R?</b> | <b>Src2</b> | <b>R?</b> | <b>Dest</b> | <b>Bdy</b> |
|-------------|-------------|-----------|-------------|-----------|-------------|------------|
| ld          | p8          | yes       | ---         | yes       | p9          | 0          |
| add         | p9          | no        | p6          | yes       | p10         | 1          |
|             |             |           |             |           |             |            |
|             |             |           |             |           |             |            |

# Out-of-Order Pipeline – Cycle 2a

|                  | 0 | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|---|---|---|---|---|---|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I |   |   |   |   |   |   |   |    |    |    |
| add r2 + r3 → r4 | F | Di |   |   |   |   |   |   |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   |    | F |   |   |   |   |   |   |   |    |    |    |
| Id [r7] → r4     |   |    | F |   |   |   |   |   |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p10         |
| r5        | p4          |
| r6        | p3          |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer |  | Insn | To Free | Done? |
|----------------|--|------|---------|-------|
|                |  | Id   | p7      | no    |
|                |  | add  | p5      | no    |
|                |  | xor  |         | no    |
|                |  | Id   |         | no    |

  

| Issue Queue |      |     |      |     |      |     |
|-------------|------|-----|------|-----|------|-----|
| Insn        | Src1 | R?  | Src2 | R?  | Dest | Bdy |
| Id          | p8   | yes | ---  | yes | p9   | 0   |
| add         | p9   | no  | p6   | yes | p10  | 1   |
|             |      |     |      |     |      |     |
|             |      |     |      |     |      |     |
|             |      |     |      |     |      |     |

# Out-of-Order Pipeline – Cycle 2b

|                  | 0 | 1  | 2  | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|---|---|---|---|---|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  |   |   |   |   |   |   |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |   |   |   |   |   |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |   |   |   |   |   |   |   |    |    |    |
| Id [r7] → r4     |   | F  |    |   |   |   |   |   |   |   |    |    |    |

| Map Table |
|-----------|
| r1 p8     |
| r2 p9     |
| r3 p6     |
| r4 p10    |
| r5 p4     |
| r6 p11    |
| r7 p2     |
| r8 p1     |

| Ready Table |
|-------------|
| p1 yes      |
| p2 yes      |
| p3 yes      |
| p4 yes      |
| p5 yes      |
| p6 yes      |
| p7 yes      |
| p8 yes      |
| p9 no       |
| p10 no      |
| p11 no      |
| ---         |

| Reorder Buffer |         |       |  |  |  |  |
|----------------|---------|-------|--|--|--|--|
| Insn           | To Free | Done? |  |  |  |  |
| Id             | p7      | no    |  |  |  |  |
| add            | p5      | no    |  |  |  |  |
| xor            | p3      | no    |  |  |  |  |
| Id             |         | no    |  |  |  |  |

  

| Issue Queue |      |     |      |     |      |     |
|-------------|------|-----|------|-----|------|-----|
| Insn        | Src1 | R?  | Src2 | R?  | Dest | Bdy |
| Id          | p8   | yes |      | yes | p9   | 0   |
| add         | p9   | no  | p6   | yes | p10  | 1   |
| xor         | p10  | no  | p4   | yes | p11  | 2   |
|             |      |     |      |     |      |     |

# Out-of-Order Pipeline – Cycle 2c

|                  | 0 | 1  | 2  | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|---|---|---|---|---|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  |   |   |   |   |   |   |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |   |   |   |   |   |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |   |   |   |   |   |   |   |    |    |    |
| Id [r7] → r4     |   | F  | Di |   |   |   |   |   |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p12         |
| r5        | p4          |
| r6        | p11         |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer | Insn | To Free | Done? |      |    |      |     |
|----------------|------|---------|-------|------|----|------|-----|
|                | Insn | Src1    | R?    | Src2 | R? | Dest | Bdy |
| Id             | p7   |         |       |      |    |      |     |
| add            | p5   |         |       |      |    |      |     |
| xor            | p3   |         |       |      |    |      |     |
| Id             | p10  |         |       |      |    |      |     |

  

| Issue Queue |
|-------------|
|             |

# Out-of-Order Pipeline – Cycle 3

|                  | 0 | 1  | 2  | 3  | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|---|---|---|---|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR |   |   |   |   |   |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |   |   |   |   |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |   |   |   |   |   |   |    |    |    |
| Id [r7] → r4     |   | F  | Di | I  |   |   |   |   |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p12         |
| r5        | p4          |
| r6        | p11         |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer | Insn | To Free | Done? |      |     |      |     |
|----------------|------|---------|-------|------|-----|------|-----|
|                | Insn | Src1    | R?    | Src2 | R?  | Dest | Bdy |
|                | Id   | p7      | no    |      |     |      |     |
|                | add  | p5      | no    |      |     |      |     |
|                | xor  | p3      | no    |      |     |      |     |
|                | Id   | p10     | no    |      |     |      |     |
| Issue Queue    |      |         |       |      |     |      |     |
|                | Insn | Src1    | R?    | Src2 | R?  | Dest | Bdy |
|                | Id   | p8      | yes   |      | yes | p9   | 0   |
|                | add  | p9      | no    | p6   | yes | p10  | 1   |
|                | xor  | p10     | no    | p4   | yes | p11  | 2   |
|                | Id   | p2      | yes   | ---  | yes | p12  | 3   |

# Out-of-Order Pipeline – Cycle 4

|                  | 0 | 1  | 2  | 3  | 4  | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|---|---|---|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  |   |   |   |   |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    |   |   |   |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |   |   |   |   |   |    |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR |   |   |   |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p12         |
| r5        | p4          |
| r6        | p11         |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer | Insn | To Free | Done? |      |     |      |     |
|----------------|------|---------|-------|------|-----|------|-----|
|                | Insn | Src1    | R?    | Src2 | R?  | Dest | Bdy |
|                | Id   | p8      | yes   |      | yes | p9   | 0   |
|                | add  | p9      | yes   | p6   | yes | p10  | 1   |
|                | xor  | p10     | no    | p4   | yes | p11  | 2   |
|                | Id   | p2      | yes   |      | yes | p12  | 3   |

Issue Queue

# Out-of-Order Pipeline – Cycle 5a

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|---|---|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  | M <sub>1</sub> |   |   |   |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    | I              |   |   |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |                |   |   |   |   |    |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR | X              |   |   |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p12         |
| r5        | p4          |
| r6        | p11         |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer |  | Insn | To Free | Done? |
|----------------|--|------|---------|-------|
|                |  | Id   | p7      | no    |
|                |  | add  | p5      | no    |
|                |  | xor  | p3      | no    |
|                |  | Id   | p10     | no    |

  

| Issue Queue |      |     |      |     |      |     |
|-------------|------|-----|------|-----|------|-----|
| Insn        | Src1 | R?  | Src2 | R?  | Dest | Bdy |
| Id          | p8   | yes |      | yes | p9   | 0   |
| add         | p9   | yes | p6   | yes | p10  | 1   |
| xor         | p10  | yes | p4   | yes | p11  | 2   |
| Id          | p2   | yes |      | yes | p12  | 3   |

# Out-of-Order Pipeline – Cycle 5b

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|---|---|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  | M <sub>1</sub> |   |   |   |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    | I              |   |   |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |                |   |   |   |   |    |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR | X              |   |   |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p12         |
| r5        | p4          |
| r6        | p11         |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer | Insn | To Free | Done? |
|----------------|------|---------|-------|
|                | Id   | p7      | no    |
|                | add  | p5      | no    |
|                | xor  | p3      | no    |
|                | Id   | p10     | no    |

| Issue Queue | Insn | Src1 | R?  | Src2 | R?  | Dest | Bdy |
|-------------|------|------|-----|------|-----|------|-----|
|             | Id   | p8   | yes |      | yes | p9   | 0   |
|             | add  | p9   | yes | p6   | yes | p10  | 1   |
|             | xor  | p10  | yes | p4   | yes | p11  | 2   |
|             | Id   | p2   | yes |      | yes | p12  | 3   |

# Out-of-Order Pipeline – Cycle 6

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6              | 7 | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|----------------|---|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  | M <sub>1</sub> | M <sub>2</sub> |   |   |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    | I              | RR             |   |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |                |                | I |   |   |    |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR | X              | M <sub>1</sub> |   |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p12         |
| r5        | p4          |
| r6        | p11         |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer | Insn | To Free | Done? |
|----------------|------|---------|-------|
|                | Id   | p7      | no    |
|                | add  | p5      | no    |
|                | xor  | p3      | no    |
|                | Id   | p10     | no    |

Issue Queue

| Insn | Src1 | R?  | Src2 | R?  | Dest | Bdy |
|------|------|-----|------|-----|------|-----|
| Id   | p8   | yes |      | yes | p9   | 0   |
| add  | p9   | yes | p6   | yes | p10  | 1   |
| xor  | p10  | yes | p4   | yes | p11  | 2   |
| Id   | p2   | yes |      | yes | p12  | 3   |

# Out-of-Order Pipeline – Cycle 7

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|----------------|----------------|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  | M <sub>1</sub> | M <sub>2</sub> | W              |   |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    | I              | RR             | X              |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |                | I              | RR             |   |   |    |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR | X              | M <sub>1</sub> | M <sub>2</sub> |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p12         |
| r5        | p4          |
| r6        | p11         |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer | Insn | To Free | Done? |
|----------------|------|---------|-------|
|                | Id   | p7      | yes   |
|                | add  | p5      | no    |
|                | xor  | p3      | no    |
|                | Id   | p10     | no    |

Issue Queue

| Insn | Src1 | R?  | Src2 | R?  | Dest | Bdy |
|------|------|-----|------|-----|------|-----|
| Id   | p8   | yes |      | yes | p9   | 0   |
| add  | p9   | yes | p6   | yes | p10  | 1   |
| xor  | p10  | yes | p4   | yes | p11  | 2   |
| Id   | p2   | yes |      | yes | p12  | 3   |

# Out-of-Order Pipeline – Cycle 8a

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|----------------|----------------|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  | M <sub>1</sub> | M <sub>2</sub> | W              | C |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    | I              | RR             | X              |   |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |                | I              | RR             |   |   |    |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR | X              | M <sub>1</sub> | M <sub>2</sub> |   |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p12         |
| r5        | p4          |
| r6        | p11         |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer | Insn | To Free | Done? |
|----------------|------|---------|-------|
|                | Id   | p7      | yes   |
|                | add  | p5      | no    |
|                | xor  | p3      | no    |
|                | Id   | p10     | no    |

Issue Queue

| Insn | Src1 | R?  | Src2 | R?  | Dest | Bdy |
|------|------|-----|------|-----|------|-----|
| Id   | p8   | yes |      | yes | p9   | 0   |
| add  | p9   | yes | p6   | yes | p10  | 1   |
| xor  | p10  | yes | p4   | yes | p11  | 2   |
| Id   | p2   | yes |      | yes | p12  | 3   |

# Out-of-Order Pipeline – Cycle 8b

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|----------------|----------------|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  | M <sub>1</sub> | M <sub>2</sub> | W              | C |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    | I              | RR             | X              | W |   |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |                | I              | RR             | X |   |    |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR | X              | M <sub>1</sub> | M <sub>2</sub> | W |   |    |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p12         |
| r5        | p4          |
| r6        | p11         |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer | Insn | To Free | Done? |
|----------------|------|---------|-------|
|                | Id   | p7      | yes   |
|                | add  | p5      | yes   |
|                | xor  | p3      | no    |
|                | Id   | p10     | yes   |

Issue Queue

| Insn | Src1 | R?  | Src2 | R?  | Dest | Bdy |
|------|------|-----|------|-----|------|-----|
| Id   | p8   | yes |      | yes | p9   | 0   |
| add  | p9   | yes | p6   | yes | p10  | 1   |
| xor  | p10  | yes | p4   | yes | p11  | 2   |
| Id   | p2   | yes |      | yes | p12  | 3   |

# Out-of-Order Pipeline – Cycle 9a

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|----------------|----------------|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  | M <sub>1</sub> | M <sub>2</sub> | W              | C |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    | I              | RR             | X              | W | C |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |                | I              | RR             | X |   |    |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR | X              | M <sub>1</sub> | M <sub>2</sub> | W |   |    |    |    |

| Map Table |
|-----------|
| r1 p8     |
| r2 p9     |
| r3 p6     |
| r4 p12    |
| r5 p4     |
| r6 p11    |
| r7 p2     |
| r8 p1     |

| Ready Table |
|-------------|
| p1 yes      |
| p2 yes      |
| p3 yes      |
| p4 yes      |
| p5 ---      |
| p6 yes      |
| p7 ---      |
| p8 yes      |
| p9 yes      |
| p10 yes     |
| p11 yes     |
| p12 yes     |

| Reorder Buffer |  | Insn | To Free | Done? |
|----------------|--|------|---------|-------|
|                |  | Id   | p7      | yes   |
|                |  | add  | p5      | yes   |
|                |  | xor  | p3      | no    |
|                |  | Id   | p10     | yes   |

  

| Issue Queue |      |     |      |     |      |     |
|-------------|------|-----|------|-----|------|-----|
| Insn        | Src1 | R?  | Src2 | R?  | Dest | Bdy |
| Id          | p8   | yes |      |     | p9   | 0   |
| add         | p9   | yes | p6   | yes | p10  | 1   |
| xor         | p10  | yes | p4   | yes | p11  | 2   |
| Id          | p2   | yes |      |     | p12  | 3   |

# Out-of-Order Pipeline – Cycle 9b

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|----------------|----------------|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  | M <sub>1</sub> | M <sub>2</sub> | W              | C |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    | I              | RR             | X              | W | C |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |                | I              | RR             | X | W |    |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR | X              | M <sub>1</sub> | M <sub>2</sub> | W |   |    |    |    |

| Map Table |
|-----------|
| r1 p8     |
| r2 p9     |
| r3 p6     |
| r4 p12    |
| r5 p4     |
| r6 p11    |
| r7 p2     |
| r8 p1     |

| Ready Table |
|-------------|
| p1 yes      |
| p2 yes      |
| p3 yes      |
| p4 yes      |
| p5 ---      |
| p6 yes      |
| p7 ---      |
| p8 yes      |
| p9 yes      |
| p10 yes     |
| p11 yes     |
| p12 yes     |

| Reorder Buffer |         |       |  |  |  |     |
|----------------|---------|-------|--|--|--|-----|
| Insn           | To Free | Done? |  |  |  |     |
| Id             | p7      |       |  |  |  | yes |
| add            | p5      |       |  |  |  | yes |
| xor            | p3      |       |  |  |  | yes |
| Id             | p10     |       |  |  |  | yes |

  

| Issue Queue |      |     |      |     |      |     |
|-------------|------|-----|------|-----|------|-----|
| Insn        | Src1 | R?  | Src2 | R?  | Dest | Bdy |
| Id          | p8   | yes |      |     | p9   | 0   |
| add         | p9   | yes | p6   | yes | p10  | 1   |
| xor         | p10  | yes | p4   | yes | p11  | 2   |
| Id          | p2   | yes |      |     | p12  | 3   |

# Out-of-Order Pipeline – Cycle 10

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|----------------|----------------|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  | M <sub>1</sub> | M <sub>2</sub> | W              | C |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    | I              | RR             | X              | W | C |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |                | I              | RR             | X | W | C  |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR | X              | M <sub>1</sub> | M <sub>2</sub> | W |   | C  |    |    |

| Map Table | Ready Table |
|-----------|-------------|
| r1        | p8          |
| r2        | p9          |
| r3        | p6          |
| r4        | p12         |
| r5        | p4          |
| r6        | p11         |
| r7        | p2          |
| r8        | p1          |

| Reorder Buffer | Insn | To Free | Done? |
|----------------|------|---------|-------|
|                | Id   | p7      | yes   |
|                | add  | p5      | yes   |
|                | xor  | p3      | yes   |
|                | Id   | p10     | yes   |

Issue Queue

| Insn | Src1 | R?  | Src2 | R?  | Dest | Bdy |
|------|------|-----|------|-----|------|-----|
| Id   | p8   | yes |      | yes | p9   | 0   |
| add  | p9   | yes | p6   | yes | p10  | 1   |
| xor  | p10  | yes | p4   | yes | p11  | 2   |
| Id   | p2   | yes |      | yes | p12  | 3   |

# Out-of-Order Pipeline – Done!

|                  | 0 | 1  | 2  | 3  | 4  | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----|----------------|----------------|----------------|---|---|----|----|----|
| Id [r1] → r2     | F | Di | I  | RR | X  | M <sub>1</sub> | M <sub>2</sub> | W              | C |   |    |    |    |
| add r2 + r3 → r4 | F | Di |    |    |    | I              | RR             | X              | W | C |    |    |    |
| xor r4 ^ r5 → r6 |   | F  | Di |    |    |                | I              | RR             | X | W | C  |    |    |
| Id [r7] → r4     |   | F  | Di | I  | RR | X              | M <sub>1</sub> | M <sub>2</sub> | W |   | C  |    |    |

| Map Table |
|-----------|
| r1 p8     |
| r2 p9     |
| r3 p6     |
| r4 p12    |
| r5 p4     |
| r6 p11    |
| r7 p2     |
| r8 p1     |

| Ready Table |
|-------------|
| p1 yes      |
| p2 yes      |
| p3 ---      |
| p4 yes      |
| p5 ---      |
| p6 yes      |
| p7 ---      |
| p8 yes      |
| p9 yes      |
| p10 ---     |
| p11 yes     |
| p12 yes     |

| Reorder Buffer |  | Insn | To Free | Done? |
|----------------|--|------|---------|-------|
|                |  | Id   | p7      | yes   |
|                |  | add  | p5      | yes   |
|                |  | xor  | p3      | yes   |
|                |  | Id   | p10     | yes   |

  

| Issue Queue |      |     |      |     |      |     |
|-------------|------|-----|------|-----|------|-----|
| Insn        | Src1 | R?  | Src2 | R?  | Dest | Bdy |
| Id          | p8   | yes |      | yes | p9   | 0   |
| add         | p9   | yes | p6   | yes | p10  | 1   |
| xor         | p10  | yes | p4   | yes | p11  | 2   |
| Id          | p2   | yes |      | yes | p12  | 3   |

---

# Handling Memory Operations

# Recall: Types of Dependencies

---

- RAW (Read After Write) = “true dependence”



- WAW (Write After Write) = “output dependence”



- WAR (Write After Read) = “anti-dependence”



- WAW & WAR are “false”, Can be **totally eliminated** by “renaming”

# Also Have Dependencies via Memory

---

- If value in “r2” and “r3” is the same...
- RAW (Read After Write) – True dependency



- WAW (Write After Write)



- WAR (Write After Read)



WAR/WAW are “false dependencies”

- But can't rename memory in same way as registers
  - Why? Addresses are not known at rename
  - Need to use other tricks

# Let's Start with Just Stores

---

- Stores: Write data cache, not registers
  - Can we rename memory?
- No (at least not easily)
  - Cache writes unrecoverable
- Solution: write stores into cache only when certain
  - When are we certain? At "commit"

# Handling Stores

|                  | 0 | 1  | 2  | 3  | 4              | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----------------|----------------|----------------|----------------|---|---|----|----|----|
| mul p1 * p2 → p3 | F | Di | I  | RR | X <sub>1</sub> | X <sub>2</sub> | X <sub>3</sub> | X <sub>4</sub> | W | C |    |    |    |
| jump-not-zero p3 | F | Di |    |    |                |                | I              | RR             | X | W | C  |    |    |
| st p5 → [p3+4]   |   | F  | Di |    |                |                | I              | RR             | X | M | W  | C  |    |
| st p4 → [p6+8]   |   | F  | Di | I? |                |                |                |                |   |   |    |    |    |

- Can “st p4 → [p6+8]” issue in cycle 3?
  - Its register inputs are ready...

# Problem #1: Out-of-Order Stores

|                  | 0 | 1  | 2  | 3  | 4              | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----------------|----------------|----------------|----------------|---|---|----|----|----|
| mul p1 * p2 → p3 | F | Di | I  | RR | X <sub>1</sub> | X <sub>2</sub> | X <sub>3</sub> | X <sub>4</sub> | W | C |    |    |    |
| jump-not-zero p3 | F | Di |    |    |                |                | I              | RR             | X | W | C  |    |    |
| st p5 → [p3+4]   |   | F  | Di |    |                |                | I              | RR             | X | M | W  | C  |    |
| st p4 → [p6+8]   |   | F  | Di | I? | RR             | X              | M              | W              |   |   |    |    | C  |

- Can “st p4 → [p6+8]” write the cache in cycle 6?
  - “st p5 → [p3+4]” has not yet executed
- What if p3+4 == p6+8?
  - The two stores write the same address! WAW dependency!
  - Not known until their “X” stages (cycle 5 & 8)
- Unappealing solution: all stores execute in-order
- We can do better...

# Problem #2: Speculative Stores

|                  | 0 | 1  | 2  | 3  | 4              | 5              | 6              | 7              | 8 | 9 | 10 | 11 | 12 |
|------------------|---|----|----|----|----------------|----------------|----------------|----------------|---|---|----|----|----|
| mul p1 * p2 → p3 | F | Di | I  | RR | X <sub>1</sub> | X <sub>2</sub> | X <sub>3</sub> | X <sub>4</sub> | W | C |    |    |    |
| jump-not-zero p3 | F | Di |    |    |                |                | I              | RR             | X | W | C  |    |    |
| st p5 → [p3+4]   |   | F  | Di |    |                | I              | RR             | X              | M | W | C  |    |    |
| st p4 → [p6+8]   |   | F  | Di | I? | RR             | X              | M              | W              |   |   |    |    | C  |

- Can “st p4 → [p6+8]” write the cache in cycle 6?
  - Store is still “speculative” at this point
- What if “jump-not-zero” is mis-predicted?
  - Not known until its “X” stage (cycle 8)
- How does it “undo” the store once it hits the cache?
  - Answer: it can’t; stores write the cache only at **commit**
  - Guaranteed to be non-speculative at that point

# Store Queue (SQ)

---

- Solves two problems
  - Allows for recovery of speculative stores
  - Allows out-of-order stores
- Store Queue (SQ)
  - **At dispatch, each store is given a slot in the Store Queue**
  - First-in-first-out (FIFO) queue
  - Each entry contains: “address”, “value”, and “bday”
- Operation:
  - Dispatch (in-order): allocate entry in SQ (stall if full)
  - Execute (out-of-order): write store value into store queue
  - Commit (in-order): read value from SQ and write into data cache
  - Branch recovery: remove entries from the store queue
- Also solves problems with loads

# Store Queue Operation

|                   | 0 | 1  | 2  | 3  | 4              | 5              | 6              | 7              | 8              | 9              | 10 | 11 | 12 |
|-------------------|---|----|----|----|----------------|----------------|----------------|----------------|----------------|----------------|----|----|----|
| fdiv p1 / p2 → p9 | F | Di | I  | RR | X <sub>1</sub> | X <sub>2</sub> | X <sub>3</sub> | X <sub>4</sub> | X <sub>5</sub> | X <sub>6</sub> | W  | C  |    |
| st p4 → [p5+4]    | F | Di | I  | RR | X              | SQ             |                |                |                |                |    | C  |    |
| st p3 → [p6+8]    |   | F  | Di | I  | RR             | X              | SQ             |                |                |                |    |    | C  |

- Stores write to SQ, not M
  - similar to register renaming, where we allocated a new physical register for each insn
- What if the fdiv triggers a /0 exception?

# Memory Forwarding

|                   | 0 | 1  | 2  | 3  | 4              | 5              | 6              | 7              | 8              | 9              | 10 | 11 | 12 |
|-------------------|---|----|----|----|----------------|----------------|----------------|----------------|----------------|----------------|----|----|----|
| fdiv p1 / p2 → p9 | F | Di | I  | RR | X <sub>1</sub> | X <sub>2</sub> | X <sub>3</sub> | X <sub>4</sub> | X <sub>5</sub> | X <sub>6</sub> | W  | C  |    |
| st p4 → [p5+4]    | F | Di | I  | RR | X              | SQ             |                |                |                |                |    | C  |    |
| st p3 → [p6+8]    |   | F  | Di | I  | RR             | X              | SQ             |                |                |                |    |    | C  |
| ld [p7] → p8      |   | F  | Di | I? | RR             | X              | M <sub>1</sub> | M <sub>2</sub> | W              |                |    |    | C  |

- Can “ld [p7] → p8” issue and begin execution?
  - Why or why not?

# Memory Forwarding

|                   | 0 | 1  | 2  | 3  | 4              | 5              | 6              | 7              | 8              | 9              | 10 | 11 | 12 |
|-------------------|---|----|----|----|----------------|----------------|----------------|----------------|----------------|----------------|----|----|----|
| fdiv p1 / p2 → p9 | F | Di | I  | RR | X <sub>1</sub> | X <sub>2</sub> | X <sub>3</sub> | X <sub>4</sub> | X <sub>5</sub> | X <sub>6</sub> | W  | C  |    |
| st p4 → [p5+4]    | F | Di | I  | RR | X              | SQ             |                |                |                |                |    | C  |    |
| st p3 → [p6+8]    |   | F  | Di | I  | RR             | X              | SQ             |                |                |                |    |    | C  |
| ld [p7] → p8      |   | F  | Di | I? | RR             | X              | M <sub>1</sub> | M <sub>2</sub> | W              |                |    |    | C  |

- Can “ld [p7] → p8” issue and begin execution?
  - Why or why not?
- If the load reads from either of the stores’ addresses...
  - Load must get correct value, but stores don’t write cache until commit...
- Solution: “memory forwarding”
  - Load also searches the Store Queue (in parallel with cache access)
  - Conceptually like register bypassing, but different implementation
    - Why? Addresses unknown until execute

# Problem #3: WAR Hazards

|                  | 0 | 1  | 2  | 3  | 4              | 5              | 6              | 7              | 8 | 9              | 10             | 11 | 12 |
|------------------|---|----|----|----|----------------|----------------|----------------|----------------|---|----------------|----------------|----|----|
| mul p1 * p2 → p3 | F | Di | I  | RR | X <sub>1</sub> | X <sub>2</sub> | X <sub>3</sub> | X <sub>4</sub> | W | C              |                |    |    |
| jump-not-zero p3 | F | Di |    |    |                |                | I              | RR             | X | W              | C              |    |    |
| ld [p3+4] → p5   |   | F  | Di |    |                |                | I              | RR             | X | M <sub>1</sub> | M <sub>2</sub> | W  | C  |
| st p4 → [p6+8]   |   | F  | Di | I  | RR             | X              | SQ             |                |   |                |                |    | C  |

- What if “ $p3+4 == p6 + 8$ ”?
  - WAR: need to make sure that load doesn’t read store’s result
  - Need to get values based on “program order” not “execution order”
- Bad solution: require all stores/loads to execute in-order
- Good solution: add “age” fields to store queue (SQ)
  - Loads read from **youngest older matching** store
  - Another reason the SQ is a FIFO queue

# Memory Forwarding via Store Queue

- Store Queue (SQ)
  - Holds all in-flight stores
  - CAM: searchable by address
  - Age logic: determine youngest matching store older than load
- Store rename/dispatch
  - Allocate entry in SQ
- Store execution
  - Update SQ
    - Address + Data
- Load execution
  - Search SQ identify youngest older matching store
    - Match? Read SQ
    - No Match? Read cache



# Store Queue (SQ)

---

- On load execution, select the store that is:
  - To same address as load
  - Older than the load (before the load in program order)
- Of these matching stores, select the youngest
  - The store to the same address that immediately precedes the load

# When Can Loads Execute?

|                  | 0 | 1  | 2  | 3  | 4              | 5              | 6              | 7              | 8 | 9  | 10 | 11 | 12 |
|------------------|---|----|----|----|----------------|----------------|----------------|----------------|---|----|----|----|----|
| mul p1 * p2 → p3 | F | Di | I  | RR | X <sub>1</sub> | X <sub>2</sub> | X <sub>3</sub> | X <sub>4</sub> | W | C  |    |    |    |
| jump-not-zero p3 | F | Di |    |    |                |                | I              | RR             | X | W  | C  |    |    |
| st p5 → [p3+4]   |   | F  | Di |    |                |                | I              | RR             | X | SQ | C  |    |    |
| ld [p6+8] → p7   |   | F  | Di | I? | RR             | X              | M <sub>1</sub> | M <sub>2</sub> | W |    |    |    | C  |

- Can “ld [p6+8] → p7” issue in cycle 3
  - Why or why not?

# When Can Loads Execute?

|                  | 0 | 1  | 2  | 3  | 4              | 5              | 6              | 7              | 8 | 9  | 10 | 11 | 12 |
|------------------|---|----|----|----|----------------|----------------|----------------|----------------|---|----|----|----|----|
| mul p1 * p2 → p3 | F | Di | I  | RR | X <sub>1</sub> | X <sub>2</sub> | X <sub>3</sub> | X <sub>4</sub> | W | C  |    |    |    |
| jump-not-zero p3 | F | Di |    |    |                |                | I              | RR             | X | W  | C  |    |    |
| st p5 → [p3+4]   |   | F  | Di |    |                |                | I              | RR             | X | SQ | C  |    |    |
| ld [p6+8] → p7   |   | F  | Di | I? | RR             | X              | M <sub>1</sub> | M <sub>2</sub> | W |    |    |    | C  |

- Aliasing! Does  $p3+4 == p6+8$ ?
  - If no, load should get value from memory
    - **Can it start to execute?**
  - If yes, load should get value from store
    - By reading the store queue?
    - **But the value isn't put into the store queue until cycle 9**
- **Key challenge:** don't know addresses until execution!
  - One solution: require all loads to wait for all earlier (prior) stores

# Conservative Load Scheduling

---

- Conservative load scheduling:
  - All older stores have executed
    - Some architectures: split store address / store data
      - Only requires knowing addresses (not the store values)
  - Advantage: always safe
  - Disadvantage: performance (limits ILP)

# Conservative Load Scheduling

|                 | 0 | 1  | 2  | 3  | 4 | 5              | 6              | 7  | 8  | 9              | 10             | 11 | 12 | 13 | 14 | 15 |
|-----------------|---|----|----|----|---|----------------|----------------|----|----|----------------|----------------|----|----|----|----|----|
| ld [p1] → p4    | F | Di | I  | Rr | X | M <sub>1</sub> | M <sub>2</sub> | W  | C  |                |                |    |    |    |    |    |
| ld [p2] → p5    | F | Di | I  | Rr | X | M <sub>1</sub> | M <sub>2</sub> | W  | C  |                |                |    |    |    |    |    |
| add p4, p5 → p6 |   | F  | Di |    |   | I              | Rr             | X  | W  | C              |                |    |    |    |    |    |
| st p6 → [p3]    |   | F  | Di |    |   | I              | Rr             | X  | SQ | C              |                |    |    |    |    |    |
| ld [p1+4] → p7  |   |    | F  | Di |   |                | I              | Rr | X  | M <sub>1</sub> | M <sub>2</sub> | W  | C  |    |    |    |
| ld [p2+4] → p8  |   |    | F  | Di |   |                | I              | Rr | X  | M <sub>1</sub> | M <sub>2</sub> | W  | C  |    |    |    |
| add p7, p8 → p9 |   |    | F  | Di |   |                |                |    |    | I              | Rr             | X  | W  | C  |    |    |
| st p9 → [p3+4]  |   |    | F  | Di |   |                |                |    |    | I              | Rr             | X  | SQ | C  |    |    |

Conservative load scheduling: can't issue ld [p1+4] until cycle 7!

Might as well be an in-order machine on this example

Can we do better? How?

# Optimistic Load Scheduling

|                 | 0 | 1  | 2  | 3  | 4 | 5              | 6              | 7              | 8              | 9  | 10 | 11 | 12 | 13 | 14 | 15 |
|-----------------|---|----|----|----|---|----------------|----------------|----------------|----------------|----|----|----|----|----|----|----|
| ld [p1] → p4    | F | Di | I  | Rr | X | M <sub>1</sub> | M <sub>2</sub> | W              | C              |    |    |    |    |    |    |    |
| ld [p2] → p5    | F | Di | I  | Rr | X | M <sub>1</sub> | M <sub>2</sub> | W              | C              |    |    |    |    |    |    |    |
| add p4, p5 → p6 |   | F  | Di |    |   | I              | Rr             | X              | W              | C  |    |    |    |    |    |    |
| st p6 → [p3]    |   | F  | Di |    |   | I              | Rr             | X              | SQ             | C  |    |    |    |    |    |    |
| ld [p1+4] → p7  |   |    | F  | Di | I | Rr             | X              | M <sub>1</sub> | M <sub>2</sub> | W  | C  |    |    |    |    |    |
| ld [p2+4] → p8  |   |    | F  | Di | I | Rr             | X              | M <sub>1</sub> | M <sub>2</sub> | W  | C  |    |    |    |    |    |
| add p7, p8 → p9 |   |    | F  | Di |   |                | I              | Rr             | X              | W  | C  |    |    |    |    |    |
| st p9 → [p3+4]  |   |    | F  | Di |   |                | I              | Rr             | X              | SQ | C  |    |    |    |    |    |

**Optimistic load scheduling: can actually benefit from out-of-order!**  
**Let's speculate!**

# Load Speculation

---

- Speculation requires three things.....
  - 1. When do we speculate?
  - 2. How do we detect a mis-speculation?
  - 3. How do we recover from mis-speculations?
    - Squash offending load and all newer insns
    - Similar to branch mis-prediction recovery

# Load Queue

- Detects load ordering violations
- Load execution: Write address into LQ
  - Also note any store forwarded from
- Store execution: Search LQ
  - Younger load with same addr?
    - Did younger load forward from younger store? [See slide 149]



# Store Queue + Load Queue

---

- Store Queue: handles forwarding, allows OoO stores
  - Entry per store (allocated @ dispatch, deallocated @ commit)
  - Written by stores (@ execute)
  - Searched by loads (@ execute)
  - Read from SQ to write data cache (@ commit)
- Load Queue: detects ordering violations
  - Entry per load (allocated @ dispatch, deallocated @ commit)
  - Written by loads (@ execute)
  - Searched by stores (@ execute)
- Both together
  - Allows aggressive load scheduling
  - Stores don't constrain load execution

# Optimistic Load Scheduling Problem

---

- Allows loads to issue before older stores
  - Increases ILP
  - + Good: When no conflict, increases performance
  - Bad: Conflict => squash => worse performance than waiting
- Can we have our cake AND eat it too?

# Predictive Load Scheduling

---

- Predict which loads must wait for stores
- Fool me once, shame on you-- fool me twice?
  - Loads default to aggressive
  - Keep table of load PCs that have been caused squashes
    - Schedule these conservatively
  - + Simple predictor
  - Makes “bad” loads wait for **all** older stores
- More complex predictors used in practice
  - Predict which stores loads should wait for
  - “Store Sets” paper

---

# Load/Store Queue Examples

# Initial State

(Stores to different addresses)

1. St p1 → [p2]
2. St p3 → [p4]
3. Ld [p5] → p6

| RegFile |     | Load Queue |  |
|---------|-----|------------|--|
|         | Bdy | Addr       |  |
| p1      | 5   |            |  |
| p2      | 100 |            |  |
| p3      | 9   |            |  |
| p4      | 200 |            |  |
| p5      | 100 |            |  |
| p6      | --- |            |  |
| p7      | --- |            |  |
| p8      | --- |            |  |

Store Queue

|  | Bdy | Addr | Val |
|--|-----|------|-----|
|  |     |      |     |
|  |     |      |     |
|  |     |      |     |
|  |     |      |     |

Cache

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

| RegFile |     | Load Queue |  |
|---------|-----|------------|--|
|         | Bdy | Addr       |  |
| p1      | 5   |            |  |
| p2      | 100 |            |  |
| p3      | 9   |            |  |
| p4      | 200 |            |  |
| p5      | 100 |            |  |
| p6      | --- |            |  |
| p7      | --- |            |  |
| p8      | --- |            |  |

Store Queue

|  | Bdy | Addr | Val |
|--|-----|------|-----|
|  |     |      |     |
|  |     |      |     |
|  |     |      |     |
|  |     |      |     |

Cache

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

| RegFile |     | Load Queue |  |
|---------|-----|------------|--|
|         | Bdy | Addr       |  |
| p1      | 5   |            |  |
| p2      | 100 |            |  |
| p3      | 9   |            |  |
| p4      | 200 |            |  |
| p5      | 100 |            |  |
| p6      | --- |            |  |
| p7      | --- |            |  |
| p8      | --- |            |  |

Store Queue

|  | Bdy | Addr | Val |
|--|-----|------|-----|
|  |     |      |     |
|  |     |      |     |
|  |     |      |     |
|  |     |      |     |

Cache

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

# Good Interleaving

(Shows importance of address check)

- ## 1. St p1 → [p2]

|    | RegFile | Load Queue |
|----|---------|------------|
| p1 | 5       |            |
| p2 | 100     |            |
| p3 | 9       |            |
| p4 | 200     |            |
| p5 | 100     |            |
| p6 | ---     |            |
| p7 | ---     |            |
| p8 | ---     |            |

  

|  | Store Queue |      |     |
|--|-------------|------|-----|
|  | Bdy         | Addr | Val |
|  | 1           | 100  | 5   |
|  |             |      |     |
|  |             |      |     |

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

- ## 2. St p3 → [p4]

| RegFile |     | Load Queue |  | Store Queue |      |     |
|---------|-----|------------|--|-------------|------|-----|
|         | Bdy | Addr       |  | Bdy         | Addr | Val |
| p1      | 5   |            |  |             |      |     |
| p2      | 100 |            |  |             |      |     |
| p3      | 9   |            |  |             |      |     |
| p4      | 200 |            |  |             |      |     |
| p5      | 100 |            |  |             |      |     |
| p6      | --- |            |  | 1           | 100  | 5   |
| p7      | --- |            |  | 2           | 200  | 9   |
| p8      | --- |            |  |             |      |     |

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

- 1. St p1 → [p2]
  2. St p3 → [p4]
  3. Ld [p5] → p6

| RegFile |     | Load Queue |      | Store Queue |      |     |
|---------|-----|------------|------|-------------|------|-----|
|         |     | Bdy        | Addr | Bdy         | Addr | Val |
| p1      | 5   |            |      |             |      |     |
| p2      | 100 |            | 3    |             | 100  |     |
| p3      | 9   |            |      |             |      |     |
| p4      | 200 |            |      |             |      |     |
| p5      | 100 |            |      |             |      |     |
| p6      | 5   |            |      |             |      |     |
| p7      | --- |            | 1    | 100         | 5    |     |
| p8      | --- |            | 2    | 200         | 9    |     |

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

# Different Initial State

(All to same address)

1. St p1 → [p2]
2. St p3 → [p4]
3. Ld [p5] → p6

| RegFile |     | Load Queue |  | Store Queue |      |
|---------|-----|------------|--|-------------|------|
|         | Bdy | Addr       |  | Bdy         | Addr |
| p1      | 5   |            |  |             |      |
| p2      | 100 |            |  |             |      |
| p3      | 9   |            |  |             |      |
| p4      | 100 |            |  |             |      |
| p5      | 100 |            |  |             |      |
| p6      | --- |            |  |             |      |
| p7      | --- |            |  |             |      |
| p8      | --- |            |  |             |      |

  

| Cache |     |
|-------|-----|
| Addr  | Val |
| 100   | 13  |
| 200   | 17  |

| RegFile |     | Load Queue |  | Store Queue |      |
|---------|-----|------------|--|-------------|------|
|         | Bdy | Addr       |  | Bdy         | Addr |
| p1      | 5   |            |  |             |      |
| p2      | 100 |            |  |             |      |
| p3      | 9   |            |  |             |      |
| p4      | 100 |            |  |             |      |
| p5      | 100 |            |  |             |      |
| p6      | --- |            |  |             |      |
| p7      | --- |            |  |             |      |
| p8      | --- |            |  |             |      |

  

| Cache |     |
|-------|-----|
| Addr  | Val |
| 100   | 13  |
| 200   | 17  |

| RegFile |     | Load Queue |  | Store Queue |      |
|---------|-----|------------|--|-------------|------|
|         | Bdy | Addr       |  | Bdy         | Addr |
| p1      | 5   |            |  |             |      |
| p2      | 100 |            |  |             |      |
| p3      | 9   |            |  |             |      |
| p4      | 100 |            |  |             |      |
| p5      | 100 |            |  |             |      |
| p6      | --- |            |  |             |      |
| p7      | --- |            |  |             |      |
| p8      | --- |            |  |             |      |

  

| Cache |     |
|-------|-----|
| Addr  | Val |
| 100   | 13  |
| 200   | 17  |

# Good Interleaving #1

# (Program Order)

## 1. St p1 → [p2]

| RegFile | Load Queue |  |
|---------|------------|--|
| p1      | 5          |  |
| p2      | 100        |  |
| p3      | 9          |  |
| p4      | 100        |  |
| p5      | 100        |  |
| p6      | ---        |  |
| p7      | ---        |  |
| p8      | ---        |  |

  

| Store Queue |      |     |
|-------------|------|-----|
| Bdy         | Addr | Val |
| 1           | 100  | 5   |
|             |      |     |
|             |      |     |

## 2. St p3 → [p4]

| RegFile |     | Load Queue |      | Store Queue |  |  |
|---------|-----|------------|------|-------------|--|--|
| p1      | 5   | Bdy        | Addr |             |  |  |
| p2      | 100 |            |      |             |  |  |
| p3      | 9   |            |      |             |  |  |
| p4      | 100 |            |      |             |  |  |
| p5      | 100 |            |      |             |  |  |
| p6      | --- | Bdy        | Addr | Val         |  |  |
| p7      | --- | 1          | 100  | 5           |  |  |
| p8      | --- | 2          | 100  | 9           |  |  |

### 3. Ld [p5] → p6

|    | RegFile | Load Queue |      |   |
|----|---------|------------|------|---|
|    |         | Bdy        | Addr |   |
| p1 | 5       |            |      |   |
| p2 | 100     | 3          | 100  |   |
| p3 | 9       |            |      |   |
| p4 | 100     |            |      |   |
| p5 | 100     |            |      |   |
| p6 | 9       |            |      |   |
| p7 | ---     | 1          | 100  | 5 |
| p8 | ---     | 2          | 100  | 9 |

1. St p1 → [p2]
  2. St p3 → [p4]
  3. Ld [p5] → p6

# Good Interleaving #2

(Stores reordered)

1. St p1 → [p2]
2. St p3 → [p4]
3. Ld [p5] → p6

2. St p3 → [p4]

|    | RegFile | Load Queue |  |  |
|----|---------|------------|--|--|
|    | Bdy     | Addr       |  |  |
| p1 | 5       |            |  |  |
| p2 | 100     |            |  |  |
| p3 | 9       |            |  |  |
| p4 | 100     |            |  |  |
| p5 | 100     |            |  |  |
| p6 | ---     |            |  |  |
| p7 | ---     |            |  |  |
| p8 | ---     |            |  |  |

Store Queue

|  | Bdy | Addr | Val |
|--|-----|------|-----|
|  |     |      |     |
|  | 1   |      |     |
|  | 2   | 100  | 9   |

Cache

|  | Addr | Val |
|--|------|-----|
|  |      |     |
|  | 100  | 13  |
|  | 200  | 17  |

1. St p1 → [p2]

|    | RegFile | Load Queue |  |  |
|----|---------|------------|--|--|
|    | Bdy     | Addr       |  |  |
| p1 | 5       |            |  |  |
| p2 | 100     |            |  |  |
| p3 | 9       |            |  |  |
| p4 | 100     |            |  |  |
| p5 | 100     |            |  |  |
| p6 | ---     |            |  |  |
| p7 | ---     |            |  |  |
| p8 | ---     |            |  |  |

Store Queue

|  | Bdy | Addr | Val |
|--|-----|------|-----|
|  |     |      |     |
|  | 1   |      |     |
|  | 2   | 100  | 9   |

Cache

|  | Addr | Val |
|--|------|-----|
|  |      |     |
|  | 100  | 13  |
|  | 200  | 17  |

3. Ld [p5] → p6

|    | RegFile | Load Queue |  |  |
|----|---------|------------|--|--|
|    | Bdy     | Addr       |  |  |
| p1 | 5       |            |  |  |
| p2 | 100     |            |  |  |
| p3 | 9       |            |  |  |
| p4 | 100     |            |  |  |
| p5 | 100     |            |  |  |
| p6 | 9       |            |  |  |
| p7 | ---     |            |  |  |
| p8 | ---     |            |  |  |

Store Queue

|  | Bdy | Addr | Val |
|--|-----|------|-----|
|  |     |      |     |
|  | 1   | 100  | 5   |
|  | 2   | 100  | 9   |

Cache

|  | Addr | Val |
|--|------|-----|
|  |      |     |
|  | 100  | 13  |
|  | 200  | 17  |

# Bad Interleaving #1

(Load reads the cache)

1. St p1 → [p2]
2. St p3 → [p4]
3. Ld [p5] → p6

3. Ld [p5] → p6



2. St p3 → [p4]



# Bad Interleaving #2

(Load gets value from wrong store)

1. St p1 → [p2]
2. St p3 → [p4]
3. Ld [p5] → p6

## 1. St p1 → [p2]

| RegFile |     | Load Queue |      |                   | Store Queue |  |  |
|---------|-----|------------|------|-------------------|-------------|--|--|
| p1      | 5   | Bdy        | Addr | Bdy of Fwd. Store |             |  |  |
| p2      | 100 |            |      |                   |             |  |  |
| p3      | 9   |            |      |                   |             |  |  |
| p4      | 100 |            |      |                   |             |  |  |
| p5      | 100 |            |      |                   |             |  |  |
| p6      | --- | Bdy        | Addr | Val               |             |  |  |
| p7      | --- | 1          | 100  | 5                 |             |  |  |
| p8      | --- |            |      |                   |             |  |  |

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

### 3. Ld [p5] → p6

| RegFile |     | Load Queue |      |                   | Store Queue |  |  |
|---------|-----|------------|------|-------------------|-------------|--|--|
| p1      | 5   | Bdy        | Addr | Bdy of Fwd. Store |             |  |  |
| p2      | 100 | 3          | 100  | 1                 |             |  |  |
| p3      | 9   |            |      |                   |             |  |  |
| p4      | 100 |            |      |                   |             |  |  |
| p5      | 100 |            |      |                   |             |  |  |
| p6      | 5   | Bdy        | Addr | Val               |             |  |  |
| p7      | --- | 1          | 100  | 5                 |             |  |  |
| p8      | --- | 2          |      |                   |             |  |  |

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

## 2. St p3 → [p4]

| RegFile | Load Queue | Store Queue |
|---------|------------|-------------|
| p1      | 5          |             |
| p2      | 100        |             |
| p3      | 9          |             |
| p4      | 100        |             |
| p5      | 100        |             |
| p6      | 5          |             |
| p7      | ---        |             |
| p8      | ---        |             |

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

# Bad/Good Interleaving

(Load gets value from correct store, but does it work?)

2. St p3 → [p4]

|    | RegFile |      |                   | Load Queue |  |  |
|----|---------|------|-------------------|------------|--|--|
|    | Bdy     | Addr | Bdy of Fwd. Store |            |  |  |
| p1 | 5       |      |                   |            |  |  |
| p2 | 100     |      |                   |            |  |  |
| p3 | 9       |      |                   |            |  |  |
| p4 | 100     |      |                   |            |  |  |
| p5 | 100     |      |                   |            |  |  |
| p6 | ---     |      |                   |            |  |  |
| p7 | ---     |      |                   |            |  |  |
| p8 | ---     |      |                   |            |  |  |

  

|    | Store Queue |      |     |
|----|-------------|------|-----|
|    | Bdy         | Addr | Val |
| p5 | 100         |      |     |
| p6 | 9           |      |     |
| p7 | ---         |      |     |
| p8 | 2           | 100  | 9   |

3. Ld [p5] → p6

|    | RegFile |      |                   | Load Queue |  |  |
|----|---------|------|-------------------|------------|--|--|
|    | Bdy     | Addr | Bdy of Fwd. Store |            |  |  |
| p1 | 5       |      |                   |            |  |  |
| p2 | 100     |      |                   |            |  |  |
| p3 | 9       |      |                   |            |  |  |
| p4 | 100     |      |                   |            |  |  |
| p5 | 100     |      |                   |            |  |  |
| p6 | 9       |      |                   |            |  |  |
| p7 | ---     |      |                   |            |  |  |
| p8 | ---     |      |                   |            |  |  |

  

|    | Store Queue |      |     |
|----|-------------|------|-----|
|    | Bdy         | Addr | Val |
| p5 | 100         |      |     |
| p6 | 9           |      |     |
| p7 | ---         |      |     |
| p8 | 2           | 100  | 9   |

1. St p1 → [p2]
2. St p3 → [p4]
3. Ld [p5] → p6

1. St p1 → [p2]

|    | RegFile |      |                   | Load Queue |  |  |
|----|---------|------|-------------------|------------|--|--|
|    | Bdy     | Addr | Bdy of Fwd. Store |            |  |  |
| p1 | 5       |      |                   |            |  |  |
| p2 | 100     |      |                   |            |  |  |
| p3 | 9       |      |                   |            |  |  |
| p4 | 100     |      |                   |            |  |  |
| p5 | 100     |      |                   |            |  |  |
| p6 | 9       |      |                   |            |  |  |
| p7 | ---     |      |                   |            |  |  |
| p8 | ---     |      |                   |            |  |  |

  

|    | Store Queue |      |     |
|----|-------------|------|-----|
|    | Bdy         | Addr | Val |
| p5 | 100         |      |     |
| p6 | 9           |      |     |
| p7 | ---         |      |     |
| p8 | 2           | 100  | 9   |

Cache

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

Cache

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

Cache

| Addr | Val |
|------|-----|
| 100  | 13  |
| 200  | 17  |

---

# Out-of-Order Everywhere

# OoO Everywhere

---

- Joe's phone – iPhone 7
  - Apple A10 processor
  - details very closely guarded
  - definitely OoO
  - some things known about Cyclone (Apple A7 chip)
    - issue 6 insns per cycle
    - 192-entry ROB
    - 4 integer ALUs
    - 2 Load/Store units
    - 14-19 cycle branch misprediction penalty
    - <https://www.anandtech.com/show/7910/apples-cyclone-microarchitecture-detailed>

# OoO everywhere for a while now

---

- Joe's previous phone's CPU: Qualcomm Krait 400 processor
  - based on ARM Cortex A15 processor
  - **out-of-order** 2.5GHz quad-core
  - 3-wide fetch/decode
  - 4-wide issue
  - 11-stage integer pipeline
  - 28nm process technology
  - 4/4KB DM L1\$, 16/16KB 4-way SA L2\$, 2MB 8-way SA L3\$

# Modern OoO desktop/server CPU

---

- AMD Zen microarchitecture (2016)
  - AMD slides from HotChips 2016 conference follow



## ZEN MICROARCHITECTURE

- ▲ Fetch Four x86 instructions
- ▲ Op Cache instructions
- ▲ 4 Integer units
  - Large rename space – 168 Registers
  - 192 instructions in flight/8 wide retire
- ▲ 2 Load/Store units
  - 72 Out-of-Order Loads supported
- ▲ 2 Floating Point units x 128 FMACs
  - built as 4 pipes, 2 Fadd, 2 Fmul
- ▲ I-Cache 64K, 4-way
- ▲ D-Cache 32K, 8-way
- ▲ L2 Cache 512K, 8-way
- ▲ Large shared L3 cache
- ▲ 2 threads per core



## FETCH

- ▲ Decoupled Branch Prediction
- ▲ TLB in the BP pipe
  - 8 entry L0 TLB, all page sizes
  - 64 entry L1 TLB, all page sizes
  - 512 entry L2 TLB, no 1G pages
- ▲ 2 branches per BTB entry
- ▲ Large L1 / L2 BTB
- ▲ 32 entry return stack
- ▲ Indirect Target Array (ITA)
- ▲ 64K, 4-way Instruction cache
- ▲ Micro-tags for IC & Op cache
- ▲ 32 byte fetch



## DECODE

- ▲ Inline Instruction-length Decoder
- ▲ Decode 4 x86 instructions
- ▲ Op cache
- ▲ Micro-op Queue
- ▲ Stack Engine
- ▲ Branch Fusion
- ▲ Memory File for Store to Load Forwarding



## LOAD/STORE AND L2

- ▲ 72 Out of Order Loads
- ▲ 44 entry Store Queue
- ▲ Split TLB/Data Pipe, store pipe
- ▲ 64 entry L1 TLB, all page sizes
- ▲ 1.5K entry L2 TLB, no 1G pages
- ▲ 32K, 8 way Data Cache
  - Supports two 128-bit accesses
- ▲ Optimized L1 and L2 Prefetchers
- ▲ 512K, private (2 threads), inclusive L2



## ZEN CACHE HIERARCHY

- ▲ Fast private 512K L2 cache
- ▲ Fast shared L3 cache
- ▲ High bandwidth enables prefetch improvements
- ▲ L3 is filled from L2 victims
- ▲ Fast cache-to-cache transfers
- ▲ Large Queues for Handling L1 and L2 misses



## CPU COMPLEX

- ▲ A CPU complex (CCX) is four cores connected to an L3 Cache.
- ▲ The L3 Cache is 16-way associative, 8MB, mostly exclusive of L2.
- ▲ The L3 Cache is made of 4 slices, by low-order address interleave.
- ▲ Every core can access every cache with same average latency

# Summary: Scheduling

---



- Code scheduling
  - To reduce pipeline stalls
  - To increase ILP (insn-level parallelism)
- Static scheduling by the compiler
  - Approach & limitations
- Dynamic scheduling in hardware
  - Register renaming
  - Instruction selection
  - Handling memory operations
- Up next: multicore