

# **Lecture 11:**

# **Instruction Level Parallelism & Dynamic Scheduling**

# What is ILP? ↪

- fine-grained parallelism
- measure of inter-instruction dependency in an application
  - ✓ ILP assumes a unit-cycle operation, infinite resources, perfect front-end
  - ✓ ILP  $\neq$  IPC
    - $IPC = \# \text{ instructions} / \# \text{ cycles}$
  - ✓ ILP is the upper bound of attainable IPC
- limited by
  - ✓ data dependency
  - ✓ control dependency
- enabled and improved by RISC
  - ✓ more ILP of a RISC over CISC does not imply a better overall performance
  - ✓ CISC can be implemented like a RISC

# ILP Example

true dependency forces sequentiality

$$\text{ILP} = 3/3 = 1$$

false dependency removed

$$\text{ILP} = 3/2 = 1.5$$



c1: load r2, (r12)

c2: add r1, r2, #9



mul r8, r5, r6

# Window in Search of ILP



# Window in Search of ILP

$$R5 = 8(R6)$$

$$R7 = R5 - R4$$

$$R9 = R7 * R7$$

$$R15 = 16(R6)$$

$$R17 = R15 - R14$$

$$R19 = R15 * R15$$

# Window in Search of ILP

$$C1: R5 = 8(R6)$$

$$R15 = 16(R6)$$

$$C2: R7 = R5 - R4$$

$$R17 = R15 - R14$$

$$R19 = R15 * R15$$

$$C3: R9 = R7 * R7$$

- ILP =  $6/3 = 2$  better than 1 and 1.5
- larger window gives more opportunities
  - ✓ who exploit the instruction window?
  - ✓ what limits the window?

# Memory Dependency

- ambiguously dependency also forces “sequentiality”
- to increase ILP
  - ✓ needs dynamic memory disambiguation mechanisms that are either safe or recoverable
- ILP could be 1, could be 3, depending on the actual dependence



# ILP, Another Example

when only 4 registers available

$\downarrow$

$R1 = 8(R0)$

$R3 = R1 - 5$

$R2 = R1 * R3$

$24(R0) = R2$

$R1 = 16(R0)$

$R3 = R1 - 5$

$R2 = R1 * R3$

$32(R0) = R2$



ILP =

# ILP, Another Example

when more registers (or register renaming) available



ILP = 2

# Basic Blocks

a = array[i];  
b = array[j];  
c = array[k];  
d = b + c;  
while (d < t) {  
 a++;  
 c \*= 5;  
 d = b + c;  
}  
array[i] = a;  
array[j] = d;



i1: lw r1, (r11)  
i2: lw r2, (r12)  
i3: lw r3, (r13)  
i4: add r2, r2, r3  
i5: bge r2, r9, i9  
i6: addi r1, r1, 1  
i7: mul r3, r3, 5  
i8: j i4  
i9: sw r1, (r11)  
i10: sw r2, (r12)  
i11: jr r31

# Basic Blocks

```
a = array[i];
b = array[j];
c = array[k];
d = b + c;
while (d < t) {
    a++;
    c *= 5;
    d = b + c;
}
array[i] = a;
array[j] = d;
```



# Control Flow Graph



# ILP (w/o Speculation)

**BB1**

```
i1: lw r1, (r11)
i2: lw r2, (r12)
i3: lw r3, (r13)
```

lw r1, (r11)      lw r2, (r12)      lw r3, (r13)  
 BB1 = 3

add r2, r2, r3  
 jge r2, r9, i9  
 BB2 = 1

addi r1, r1, 1      mul r3, r3, 5      j i4  
 BB3 = 3

sw r1, (r11)      jr r31  
 sw r2, (r12)

8  
4

**BB2**

```
i4: add r2, r2, r3
i5: jge r2, r9, i9
```

**BB3**

```
i6: addi r1, r1, 1
i7: mul r3, r3, 5
i8: j i4
```

**BB4**

```
i9: sw r1, (r11)
i10: sw r2, (r12)
i11: jr r31
```

BB1 → BB2 → BB3

$$\text{ILP} = \frac{8}{4} = 2$$

BB1 → BB2 → BB4

$$\text{ILP} = \frac{8}{5} = 1.6$$

# ILP (w/ Speculation)

no control dependence



$BB1 \rightarrow BB2 \rightarrow BB3$



$BB1 \rightarrow BB2 \rightarrow BB4$



# Data Flow Execution Model

$i_1: r_2 \leftarrow 4(r_{22})$   
 ~~$i_2: r_{10} = 4(r_{25})$~~   
 $i_3: r_{10} = r_2 + r_{10}$   
 $i_4: 4(r_{26}) = r_{10}$   
 $i_5: r_{14} = 8(r_{27})$   
 $i_6: r_6 = (r_{22})$   
 $i_7: r_5 = (r_{23})$   
 $i_8: r_5 = r_6 - r_5$   
 $i_9: r_4 = r_{14} * r_5$   
 $i_{10}: r_{15} = 12(r_{27})$   
 $i_{11}: r_7 = 4(r_{22})$   
 $i_{12}: r_8 = 4(r_{23})$   
 $i_{13}: r_8 = r_7 - r_8$   
 $i_{14}: r_8 = r_{15} * r_8$   
 $i_{15}: r_8 = r_4 - r_8$   
 $i_{16}: (r_{28}) = r_8$



# Data Flow Execution Model

- to exploit maximal ILP
- an instruction can be executed immediately after
  - ✓ all source operands are ready
  - ✓ execution unit available
  - ✓ destination is ready (to be written)

X

late .  
look at  $\oplus h : >$

# Dynamic Scheduling

- to exploit ILP at run-time, execute instructions OOO

- HW will

- ✓ maintain true dependency (data flow manner)
- ✓ find ILP within an instruction window (pool)
  - need an accurate branch predictor

- pros

- ✓ scalable performance: allows code to be compiled on one platform, but also run efficiently on another
- ✓ handle cases where dep is unknown at compile-time

- cons

- ✓ hardware complexity (main argument from the VLIW/EPIC camp)

Out - of - order

# OOO Execution

- OOO execution  $\equiv$  out-of-order completion
- OOO execution  $\neq$  out-of-order retirement (commit)
- no (speculative) instruction allowed to retire until it is confirmed on the right path
- fetch, decode, issue (i.e., front-end) are still done in the program order



# IBM 360

- IBM 360 introduced
  - ✓ 8-bit = 1 byte
  - ✓ 32-bit = 1 word
  - ✓ byte-addressable memory
- Tomasulo algorithm
  - ✓ dynamic scheduling
  - ✓ register renaming

Pr.: Blauww

# Tomasulo Algorithm

- goal: high performance without special compilers
  - ✓ dynamic scheduling done completely by HW
  - ✓ generally use “superscalar processor” for such category as opposed to “VLIW” or “EPIC”
- why study? Lead to Pentium Pro/II/III/4, Core, Alpha 21264, MIPS R10000, HP 8000, PowerPC 604

# IBM 360/91 FPU w/ Tomasulo Algorithm

- not to stall FP instructions due to long latency
  - ✓ two function units — FP Add + FP Mult/Div
- three new mechanisms
  - ✓ reservation stations (RS)
    - 3 in FP add, 2 in FP mult/div
    - register name is discarded when issue to reservation station
  - ✓ tags
    - 4-bit tag for one of the 11 possible sources (5 RSs + 6 FLB for loads)
    - written for unavailable sources whose results are being generated by one of the sources (5 RS or 6 FLB)
    - new tag assignment eliminates false dependency
  - ✓ common data bus (CDB), driven by
    - 11 sources: 5 RS + 6 FLB
    - 17 destinations: 2\*5 RS + 3 SDB + 4 FLR

# Basic Principles

- do not rely on a centralized register file !
- RS fetches and buffers an operand as soon as it is available via CDB
  - ✓ eliminating the need to get it from a register (No WAF)
  - ✓ data flow execution model
- pending instructions designate the RS that will provide their input (renaming and maintain RAW)
- due to in-order issue, the register status table always keeps the latest write (no WAW issue)



# Key Representation

- $Op$  — operation to perform in the units
- $V_j$  — value of source 1 (called SINK in 360/91)
- $V_k$  — value of source 2 (called SOURCE in 360/91)
- $Q_j$  — RS (tag) will produce source 1
- $Q_k$  — RS (tag) will produce source 2
- $A(\text{ddress})$  — hold info for the memory address generation for a load or store
- $Q_i$  — whose value should be stored into the register

# IBM 360/91 FPU w/ Tomasulo Algorithm



# IBM 360/91 FPU w/ Tomasulo Algorithm



# RAW Example:

$$\begin{array}{l} \rightarrow i: R_2 \leftarrow R_0 + R_4 \quad (2) \\ \rightarrow j: R_8 \leftarrow R_0 + R_2 \quad (2) \end{array}$$

Cycle #0:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   |          | 3.5  |
| 4   |          | 10.0 |
| 8   |          | 7.8  |

Cycle #1: Issue i

| RS | Tag | Sink | Tag | Src  |
|----|-----|------|-----|------|
| 1  | 0   | 6.0  | 0   | 10.0 |
| 2  |     |      |     |      |
| 3  |     |      |     |      |

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   |          | 3.5  |
| 4   | 1        | 10.0 |
| 8   |          | 7.8  |

Cycle #2: Issue j

| RS | Tag | Sink | Tag | Src  |
|----|-----|------|-----|------|
| 1  | 0   | 6.0  | 0   | 10.0 |
| 2  | 0   | 6.0  | 1   | ---  |
| 3  |     |      |     |      |

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   | 1        | 1    |
| 4   |          | ---  |
| 8   | 1        | 2    |

# RAW Example:

i:  $R2 \leftarrow R0 + R4$  (2)  
j:  $R8 \leftarrow R0 + R2$  (2)

Cycle #3: Broadcasts tag and result: CDB\_a=<RS1,16.0>

| RS | Tag | Sink | Tag | Src  |
|----|-----|------|-----|------|
| 1  |     |      |     |      |
| 2  | 0   | 6.0  | 0   | 16.0 |
| 3  |     |      |     |      |

Adder



| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   |          | 16.0 |
| 4   |          | 10.0 |
| 8   | 1        | 2    |

Cycle #5: Broadcasts tag and result: CDB\_a=<RS2,22.0>

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder



| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   |          | 16.0 |
| 4   |          | 10.0 |
| 8   |          | 22.0 |

# WAR Example:

i:  $R4 \leftarrow R0 \times R8$  (3)  
 j:  $R0 \leftarrow R4 \times R2$  (3)  
 k:  $R2 \leftarrow R2 + R8$  (2)

Cycle #0:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   |          | 3.5  |
| 4   |          | 10.0 |
| 8   |          | 7.8  |

Cycle #1: Issue i

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  | 0   | 6.0  | 0   | 7.8 |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   |          | 3.5  |
| 4   | 1        | ---  |
| 8   |          | 7.8  |

Cycle #2: Issue j

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  | 0   | 6.0  | 0   | 7.8 |
| 5  | 4   | ---  | 0   | 3.5 |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   | 1        | ---  |
| 2   | 2        | 3.5  |
| 4   | 1        | ---  |
| 8   |          | 7.8  |

# WAR Example:

29

i:  $R4 \leftarrow R0 \times R8$  (3)  
j:  $R0 \leftarrow R4 \times R2$  (3)  
k:  $R2 \leftarrow R2 + R8$  (2)

Cycle #3: Issue k

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  | 0   | 3.5  | 0   | 7.8 |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  | 0   | 6.0  | 0   | 7.8 |
| 5  | 4   | --   | 0   | 3.5 |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   | 1        | 5    |
| 2   | 1        | 1    |
| 4   | 1        | 4    |
| 8   |          | 7.8  |

Cycle #4: Broadcasts CDB\_m=<RS4,46.8>;

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  | 0   | 3.5  | 0   | 7.8 |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  | 0   | 46.8 | 0   | 3.5 |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   | 1        | 5    |
| 2   | 1        | 1    |
| 4   |          | 46.8 |
| 8   |          | 7.8  |

Cycle #5: Broadcasts CDB\_a=<RS1,11.3>

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  | 0   | 46.8 | 0   | 3.5 |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   | 1        | 5    |
| 2   |          | 11.3 |
| 4   |          | 46.8 |
| 8   |          | 7.8  |

# WAR Example:

i:  $R4 \leftarrow R0 \times R8$  (3<sup>30</sup>)  
j:  $R0 \leftarrow R4 \times R2$  (3)  
k:  $R2 \leftarrow R2 + R8$  (2)

Cycle #7: Broadcasts CDB\_m=<RS5,163.8>

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data  |
|-----|----------|-------|
| 0   |          | 163.8 |
| 2   |          | 11.3  |
| 4   |          | 46.8  |
| 8   |          | 7.8   |

# WAW Example:

i:  $R4 \leftarrow R0 \times R8$  (3<sup>31</sup>)  
j:  $R2 \leftarrow R0 + R4$  (2)  
k:  $R4 \leftarrow R0 + R8$  (2)

Cycle #0:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   |          | 3.5  |
| 4   |          | 10.0 |
| 8   |          | 7.8  |

Cycle #1: Issue i

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  | 0   | 6.0  | 0   | 7.8 |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   |          | 3.5  |
| 4   | 1        | ---  |
| 8   |          | 7.8  |

Cycle #2: Issue j

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  | 0   | 6.0  | 4   | --- |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  | 0   | 6.0  | 0   | 7.8 |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   | 1        | ---  |
| 4   | 1        | ---  |
| 8   |          | 7.8  |

# WAW Example:

i:  $R4 \leftarrow R0 \times R8$  (3<sup>32</sup>)  
j:  $R2 \leftarrow R0 + R4$  (2)  
k:  $R4 \leftarrow R0 + R8$  (2)

## Cycle #3: Issue k

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  | 0   | 6.0  | 4   | --- |
| 2  | 0   | 6.0  | 0   | 7.8 |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  | 0   | 6.0  | 0   | 7.8 |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   | 1        | ---  |
| 4   | 2        | ---  |
| 8   |          | 7.8  |

## Cycle #4: Broadcasts CDB\_m=<RS4,46.8>

| RS | Tag | Sink | Tag | Src  |
|----|-----|------|-----|------|
| 1  | 0   | 6.0  | 0   | 46.8 |
| 2  | 0   | 6.0  | 0   | 7.8  |
| 3  |     |      |     |      |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   | 1        | ---  |
| 4   | 2        | ---  |
| 8   |          | 7.8  |

## Cycle #5: Broadcasts CDB\_a=<RS2,13.8>

| RS | Tag | Sink | Tag | Src  |
|----|-----|------|-----|------|
| 1  | 0   | 6.0  | 0   | 46.8 |
| 2  |     |      |     |      |
| 3  |     |      |     |      |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   | 1        | ---  |
| 4   |          | 13.8 |
| 8   |          | 7.8  |

# WAW Example:

i:  $R4 \leftarrow R0 \times R8$  (3<sup>33</sup>)  
j:  $R2 \leftarrow R0 + R4$  (2)  
k:  $R4 \leftarrow R0 + R8$  (2)

Cycle #6: Broadcasts CDB\_a=<RS1,52.8>

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   |          | 52.8 |
| 4   |          | 13.8 |
| 8   |          | 7.8  |

# Yet Another Example

Cycle #0:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

- i:  $R2 \leftarrow R2 + R4$  (2)  
j:  $R8 \leftarrow R0 \times R2$  (3)  
k:  $R2 \leftarrow R8 \times R2$  (3)

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          | 6.0  |
| 2   |          | 3.5  |
| 4   |          | 10.0 |
| 8   |          | 7.8  |

Cycle #1:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

Cycle #2:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

i:  $R2 \leftarrow R2 + R4$  (2)  
j:  $R8 \leftarrow R0 \times R2$  (3)  
k:  $R2 \leftarrow R8 \times R2$  (3)

### Cycle #3:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

### Cycle #4:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

### Cycle #5:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

36

i:  $R2 \leftarrow R2 + R4$  (2)  
j:  $R8 \leftarrow R0 \times R2$  (3)  
k:  $R2 \leftarrow R8 \times R2$  (3)

### Cycle #6:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

### Cycle #7:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

### Cycle #8:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

i:  $R2 \leftarrow R2 + R4$  (2)  
j:  $R8 \leftarrow R0 \times R2$  (3)  
k:  $R2 \leftarrow R8 \times R2$  (3)

### Cycle #9:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

### Cycle #10:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

### Cycle #11:

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 1  |     |      |     |     |
| 2  |     |      |     |     |
| 3  |     |      |     |     |

Adder

| RS | Tag | Sink | Tag | Src |
|----|-----|------|-----|-----|
| 4  |     |      |     |     |
| 5  |     |      |     |     |

Multiplier/Divider

| FLR | Busy Tag | Data |
|-----|----------|------|
| 0   |          |      |
| 2   |          |      |
| 4   |          |      |
| 8   |          |      |

# Issues in Tomasulo Algorithm

- CDB at high speed?
- precise exception issues
- speculative instructions
  - ✓ branch prediction enlarges instruction window
  - ✓ how to rollback when mispredicted?



# Tomasulo Summary

- prevents register as bottleneck
- avoids/removes WAR, WAW hazards
- lasting contributions
  - ✓ dynamic scheduling
  - ✓ register renaming (in what way does the register name change?)
  - ✓ load/store disambiguation

# Major Challenges

- exceptions/interrupts
  - ✓ can't identify a particular point in the program at which an interrupt/exception occurs
  - ✓ how do you know where to go back to after an interrupt handler completes?
- interaction with pipelined ALUs and cache memories
  - ✓ reservation station couldn't be released until instruction completes,
  - ✓ cache miss can take many cycles
  - ✓ would need many reservation stations – large instruction window

# Issue w/ Imprecise Interrupt

- add instructions take one cycle
  - ✓ e.g.,
    - Load (left side) induces a “data page fault”;
    - Add (right side) induces an “instruction page fault”
  - ✓ if out-of-order completion is allowed
    - r10, r12, (or r3, r4) ... will be modified
    - wrong values will be used by the re-issued load
  - ✓ interrupt classes
    - program interrupts (exceptions or traps)
    - external interrupts (asynchronous)



# Precise Interrupts

- to reflect a sequential architecture model  $\Rightarrow$  serially correct (think about a single issue, non-pipelined processor)
- keep **precise state** of an execution
  - ✓ all instructions before the interrupted instruction must be completed
  - ✓ state should appear as if no instruction issued after the interrupted instruction
  - ✓ interrupted PC should be presented to the interrupt handler (restartable)
- similar to branch misprediction handling
- out-of-order execution makes the ordering hard
  - ✓ undo what comes after an interrupt

# Why Supporting Precise Interrupts

- need to maintain a precise state (for recovery)
  - ✓ need to know where to resume execution when interrupt handler finishes
  - ✓ debugging -- present consistent view to programmer at breakpoints/errors
  - ✓ emulation of instructions in software

# Support Precise Interrupt

- buffer results
- reconstruct the scenario (state) as sequential execution
- restart from saved PC with saved PC state

# Announcement

- next lecture: dynamic scheduling (speculative execution)
  - ✓ Ch. 3.4 – 3.8 (HP2)
- MP assignment
  - ✓ MP3 checkpoint 1 due on 3/11 5pm

# Tomasulo Example (H&P Text)

## Instruction status:

| Instruction | j   | k   | Issue | Comp | Result |
|-------------|-----|-----|-------|------|--------|
| LD          | F6  | 34+ | R2    |      |        |
| LD          | F2  | 45+ | R3    |      |        |
| MULTD       | F0  | F2  | F4    |      |        |
| SUBD        | F8  | F6  | F2    |      |        |
| DIVD        | F10 | F0  | F6    |      |        |
| ADDD        | F6  | F8  | F2    |      |        |

|       | Busy Address |
|-------|--------------|
| Load1 | No           |
| Load2 | No           |
| Load3 | No           |

## Reservation Stations:

| Station Stations: |       | S1   | S2 | RS | RS |    |    |
|-------------------|-------|------|----|----|----|----|----|
| Time              | Name  | Busy | Op | Vj | Vk | Qj | Qk |
|                   | Add1  | No   |    |    |    |    |    |
|                   | Add2  | No   |    |    |    |    |    |
|                   | Add3  | No   |    |    |    |    |    |
|                   | Mult1 | No   |    |    |    |    |    |
|                   | Mult2 | No   |    |    |    |    |    |

These are RS,  
we have only  
one FU for each  
type (MUL, ADD,  
LD). We reduce  
Load from 6 to 3  
for simplicity. SDB  
is not shown either

## Register result status:

# Assumption

- INT (load) — 1 cycle
- MULT — 10 cycles
- ADD — 2 cycles
- DIVIDE — 40 cycles

# Tomasulo Example Cycle 1

## Instruction status:

| Instruction status: |     |     |       | Exec | Write  |
|---------------------|-----|-----|-------|------|--------|
| Instruction         | j   | k   | Issue | Comp | Result |
| LD                  | F6  | 34+ | R2    | 1    |        |
| LD                  | F2  | 45+ | R3    |      |        |
| MULTD               | F0  | F2  | F4    |      |        |
| SUBD                | F8  | F6  | F2    |      |        |
| DIVD                | F10 | F0  | F6    |      |        |
| ADDD                | F6  | F8  | F2    |      |        |

|       | Busy | Address |
|-------|------|---------|
| Load1 | Yes  | 34+R2   |
| Load2 | No   |         |
| Load3 | No   |         |

## Reservation Stations:

| Station Stations: |       | S1   | S2 | RS | RS |    |    |
|-------------------|-------|------|----|----|----|----|----|
| Time              | Name  | Busy | Op | Vj | Vk | Qj | Qk |
|                   | Add1  | No   |    |    |    |    |    |
|                   | Add2  | No   |    |    |    |    |    |
|                   | Add3  | No   |    |    |    |    |    |
|                   | Mult1 | No   |    |    |    |    |    |
|                   | Mult2 | No   |    |    |    |    |    |

## Register result status:

Timing diagram illustrating the sampling and update process:

- Clock**: Period = 1.
- Qi**: Sampling occurs at **F6**. At **F6**, the value is updated to **Load1**.

# Tomasulo Example Cycle 2

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  |
|-------------|-----|-----|-------|------|--------|
|             |     |     |       | Comp | Result |
| LD          | F6  | 34+ | R2    | 1    |        |
| LD          | F2  | 45+ | R3    | 2    |        |
| MULTD       | F0  | F2  | F4    |      |        |
| SUBD        | F8  | F6  | F2    |      |        |
| DIVD        | F10 | F0  | F6    |      |        |
| ADDD        | F6  | F8  | F2    |      |        |

|       | Busy | Address |
|-------|------|---------|
| Load1 | Yes  | 34+R2   |
| Load2 | Yes  | 45+R3   |
| Load3 | No   |         |

## Reservation Stations:

| Time | Name  | Busy | Op | S1 | S2 | RS | RS |
|------|-------|------|----|----|----|----|----|
|      |       |      |    | Vj | Vk | Qj | Qk |
|      | Add1  | No   |    |    |    |    |    |
|      | Add2  | No   |    |    |    |    |    |
|      | Add3  | No   |    |    |    |    |    |
|      | Mult1 | No   |    |    |    |    |    |
|      | Mult2 | No   |    |    |    |    |    |

## Register result status:

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|-----|-----|-----|-----|
| Qi    |    |    |    |    |    |     |     |     |     |
| 2     |    |    |    |    |    |     |     |     |     |

Load2

Load1

RS enables multiple outstanding loads

Load is calculating the effective address

# Tomasulo Example Cycle 3

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  | Busy  | Address   |
|-------------|-----|-----|-------|------|--------|-------|-----------|
|             |     |     |       | Comp | Result |       |           |
| LD          | F6  | 34+ | R2    | 1    | 3      | Load1 | Yes 34+R2 |
| LD          | F2  | 45+ | R3    | 2    |        | Load2 | Yes 45+R3 |
| MULTD       | F0  | F2  | F4    | 3    |        | Load3 | No        |
| SUBD        | F8  | F6  | F2    |      |        |       |           |
| DIVD        | F10 | F0  | F6    |      |        |       |           |
| ADDD        | F6  | F8  | F2    |      |        |       |           |

## Reservation Stations:

| Time | Name  | Busy | S1    | S2 | RS    | RS    |    |
|------|-------|------|-------|----|-------|-------|----|
|      |       |      | Op    | Vj | Vk    | Qj    | Qk |
|      | Add1  | No   |       |    |       |       |    |
|      | Add2  | No   |       |    |       |       |    |
|      | Add3  | No   |       |    |       |       |    |
|      | Mult1 | Yes  | MULTD |    | R(F4) | Load2 |    |
|      | Mult2 | No   |       |    |       |       |    |

## Register result status:

| Clock | F0    | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|-------|-------|----|-------|----|-----|-----|-----|-----|
| Qi    | Mult1 | Load2 |    | Load1 |    |     |     |     |     |
| 3     |       |       |    |       |    |     |     |     |     |

Regs names are “renamed” in RSs; MULT issued vs. scoreboard  
 Load1 completing; what is waiting for Load1?

# Tomasulo Example Cycle 4

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write | Busy      | Address  |
|-------------|-----|-----|-------|------|-------|-----------|----------|
| LD          | F6  | 34+ | R2    | 1    | 3     | 4         | Load1 No |
| LD          | F2  | 45+ | R3    | 2    | 4     | Load2 Yes | 45+R3    |
| MULTD       | F0  | F2  | F4    | 3    |       | Load3 No  |          |
| SUBD        | F8  | F6  | F2    | 4    |       |           |          |
| DIVD        | F10 | F0  | F6    |      |       |           |          |
| ADDD        | F6  | F8  | F2    |      |       |           |          |

## Reservation Stations:

| Time | Name  | Busy | Op    | S1    | S2    | RS    | RS    |
|------|-------|------|-------|-------|-------|-------|-------|
|      |       |      |       | Vj    | Vk    | Qj    | Qk    |
|      | Add1  | Yes  | SUBD  | M(A1) |       |       | Load2 |
|      | Add2  | No   |       |       |       |       |       |
|      | Add3  | No   |       |       |       |       |       |
|      | Mult1 | Yes  | MULTD |       | R(F4) | Load2 |       |
|      | Mult2 | No   |       |       |       |       |       |

## Register result status:

| Clock |    | F0    | F2    | F4 | F6    | F8   | F10 | F12 | ... | F30 |
|-------|----|-------|-------|----|-------|------|-----|-----|-----|-----|
| 4     | Qi | Mult1 | Load2 |    | M(A1) | Add1 |     |     |     |     |

Load1 write to CDB; Load2 completing; what is waiting for Load2?

# Tomasulo Example Cycle 5

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  | Busy | Address |
|-------------|-----|-----|-------|------|--------|------|---------|
|             |     |     |       | Comp | Result |      |         |
| LD          | F6  | 34+ | R2    | 1    | 3      | 4    | Load1   |
| LD          | F2  | 45+ | R3    | 2    | 4      | 5    | Load2   |
| MULTD       | F0  | F2  | F4    | 3    |        |      | Load3   |
| SUBD        | F8  | F6  | F2    | 4    |        |      |         |
| DIVD        | F10 | F0  | F6    | 5    |        |      |         |
| ADDD        | F6  | F8  | F2    |      |        |      |         |

## Reservation Stations:

| Time | Name  | Busy | S1    | S2    | RS    | RS    |    |
|------|-------|------|-------|-------|-------|-------|----|
|      |       |      | Op    | Vj    | Vk    | Qj    | Qk |
| 2    | Add1  | Yes  | SUBD  | M(A1) | M(A2) |       |    |
|      | Add2  | No   |       |       |       |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 10   | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | R(F6) | Mult1 |    |

## Register result status:

| Clock | F0 | F2    | F4    | F6 | F8 | F10  | F12   | ... | F30 |
|-------|----|-------|-------|----|----|------|-------|-----|-----|
| 5     | Qi | Mult1 | M(A2) |    |    | Add1 | Mult2 |     |     |

# Tomasulo Example Cycle 6

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write | Result | Busy  | Address |
|-------------|-----|-----|-------|------|-------|--------|-------|---------|
| LD          | F6  | 34+ | R2    | 1    | 3     | 4      | Load1 | No      |
| LD          | F2  | 45+ | R3    | 2    | 4     | 5      | Load2 | No      |
| MULTD       | F0  | F2  | F4    | 3    |       |        | Load3 | No      |
| SUBD        | F8  | F6  | F2    | 4    |       |        |       |         |
| DIVD        | F10 | F0  | F6    | 5    |       |        |       |         |
| ADDD        | F6  | F8  | F2    | 6    |       |        |       |         |

## Reservation Stations:

| Time | Name  | Busy | Op    | S1 Vj | S2 Vk | RS Qj | RS Qk |
|------|-------|------|-------|-------|-------|-------|-------|
| 1    | Add1  | Yes  | SUBD  | M(A1) | M(A2) |       |       |
|      | Add2  | Yes  | ADDD  |       | R(F2) | Add1  |       |
|      | Add3  | No   |       |       |       |       |       |
| 9    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |       |
|      | Mult2 | Yes  | DIVD  |       | R(F6) | Mult1 |       |

## Register result status:

| Clock |    | F0    | F2 | F4 | F6   | F8   | F10   | F12 | ... | F30 |
|-------|----|-------|----|----|------|------|-------|-----|-----|-----|
| 6     | Qi | Mult1 |    |    | Add2 | Add1 | Mult2 |     |     |     |

R(F6) was entered in Cycle 5

Issue ADDD here vs. scoreboard?

# Tomasulo Example Cycle 7

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write | Busy | Address |    |
|-------------|-----|-----|-------|------|-------|------|---------|----|
| LD          | F6  | 34+ | R2    | 1    | 3     | 4    | Load1   | No |
| LD          | F2  | 45+ | R3    | 2    | 4     | 5    | Load2   | No |
| MULTD       | F0  | F2  | F4    | 3    |       |      | Load3   | No |
| SUBD        | F8  | F6  | F2    | 4    | 7     |      |         |    |
| DIVD        | F10 | F0  | F6    | 5    |       |      |         |    |
| ADDD        | F6  | F8  | F2    | 6    |       |      |         |    |

## Reservation Stations:

| Time | Name  | Busy | Op    | S1    | S2    | RS    | RS |
|------|-------|------|-------|-------|-------|-------|----|
|      |       |      |       | Vj    | Vk    | Qj    | Qk |
| 0    | Add1  | Yes  | SUBD  | M(A1) | M(A2) |       |    |
|      | Add2  | Yes  | ADDD  |       | R(F2) | Add1  |    |
|      | Add3  | No   |       |       |       |       |    |
| 8    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | R(F6) | Mult1 |    |

## Register result status:

| Clock |    | F0    | F2 | F4 | F6   | F8   | F10   | F12 | ... | F30 |
|-------|----|-------|----|----|------|------|-------|-----|-----|-----|
| 7     | Qi | Mult1 |    |    | Add2 | Add1 | Mult2 |     |     |     |

Add1 completing; what is waiting for it?

# Tomasulo Example Cycle 8

## Instruction status:

| Instruction | j   | k   | Issue | Comp | Result |
|-------------|-----|-----|-------|------|--------|
| LD          | F6  | 34+ | R2    | 1    | 3      |
| LD          | F2  | 45+ | R3    | 2    | 4      |
| MULTD       | F0  | F2  | F4    | 3    |        |
| SUBD        | F8  | F6  | F2    | 4    | 7      |
| DIVD        | F10 | F0  | F6    | 5    |        |
| ADDD        | F6  | F8  | F2    | 6    |        |

|       | Busy Address |
|-------|--------------|
| Load1 | No           |
| Load2 | No           |
| Load3 | No           |

## Reservation Stations:

| Station Stations: |       |      | S1    | S2    | RS    | RS    |    |
|-------------------|-------|------|-------|-------|-------|-------|----|
| Time              | Name  | Busy | Op    | Vj    | Vk    | Qj    | Qk |
|                   | Add1  | No   |       |       |       |       |    |
| 2                 | Add2  | Yes  | ADDD  | M1-M2 | R(F2) |       |    |
|                   | Add3  | No   |       |       |       |       |    |
| 7                 | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|                   | Mult2 | Yes  | DIVD  |       | R(F6) | Mult1 |    |

## Register result status:

| Clock |    | F0    | F2 | F4   | F6    | F8    | F10 | F12 | ... | F30 |
|-------|----|-------|----|------|-------|-------|-----|-----|-----|-----|
| 8     | Qi | Mult1 |    | Add2 | M1-M2 | Mult2 |     |     |     |     |

# Tomasulo Example Cycle 9

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  | Busy | Address |
|-------------|-----|-----|-------|------|--------|------|---------|
|             |     |     |       | Comp | Result |      |         |
| LD          | F6  | 34+ | R2    | 1    | 3      | 4    | Load1   |
| LD          | F2  | 45+ | R3    | 2    | 4      | 5    | Load2   |
| MULTD       | F0  | F2  | F4    | 3    |        |      | Load3   |
| SUBD        | F8  | F6  | F2    | 4    | 7      | 8    |         |
| DIVD        | F10 | F0  | F6    | 5    |        |      |         |
| ADDD        | F6  | F8  | F2    | 6    |        |      |         |

## Reservation Stations:

| Time | Name  | Busy | Op    | S1    | S2    | RS    | RS |
|------|-------|------|-------|-------|-------|-------|----|
|      |       |      |       | Vj    | Vk    | Qj    | Qk |
|      | Add1  | No   |       |       |       |       |    |
| 1    | Add2  | Yes  | ADDD  | M1-M2 | R(F2) |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 6    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | R(F6) | Mult1 |    |

## Register result status:

| Clock | F0 | F2    | F4 | F6   | F8 | F10   | F12 | ... | F30 |
|-------|----|-------|----|------|----|-------|-----|-----|-----|
| 9     | FU | Mult1 |    | Add2 |    | Mult2 |     |     |     |

# Tomasulo Example Cycle 10

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write | Busy | Address |    |
|-------------|-----|-----|-------|------|-------|------|---------|----|
| LD          | F6  | 34+ | R2    | 1    | 3     | 4    | Load1   | No |
| LD          | F2  | 45+ | R3    | 2    | 4     | 5    | Load2   | No |
| MULTD       | F0  | F2  | F4    | 3    |       |      | Load3   | No |
| SUBD        | F8  | F6  | F2    | 4    | 7     | 8    |         |    |
| DIVD        | F10 | F0  | F6    | 5    |       |      |         |    |
| ADDD        | F6  | F8  | F2    | 6    | 10    |      |         |    |

## Reservation Stations:

| Time | Name  | Busy | Op    | S1    | S2    | RS    | RS |
|------|-------|------|-------|-------|-------|-------|----|
|      |       |      |       | Vj    | Vk    | Qj    | Qk |
|      | Add1  | No   |       |       |       |       |    |
| 0    | Add2  | Yes  | ADDD  | M1-M2 | R(F2) |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 5    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | R(F6) | Mult1 |    |

## Register result status:

| Clock | F0 | F2    | F4 | F6   | F8 | F10   | F12 | ... | F30 |
|-------|----|-------|----|------|----|-------|-----|-----|-----|
| 10    | FU | Mult1 |    | Add2 |    | Mult2 |     |     |     |

Add2 completing; what is waiting for it?

# Tomasulo Example Cycle 11

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  | Busy | Address |    |
|-------------|-----|-----|-------|------|--------|------|---------|----|
|             |     |     |       | Comp | Result |      |         |    |
| LD          | F6  | 34+ | R2    | 1    | 3      | 4    | Load1   | No |
| LD          | F2  | 45+ | R3    | 2    | 4      | 5    | Load2   | No |
| MULTD       | F0  | F2  | F4    | 3    |        |      | Load3   | No |
| SUBD        | F8  | F6  | F2    | 4    | 7      | 8    |         |    |
| DIVD        | F10 | F0  | F6    | 5    |        |      |         |    |
| ADDD        | F6  | F8  | F2    | 6    | 10     | 11   |         |    |

## Reservation Stations:

| Time | Name  | Busy | Op          | S1    | S2    | RS    | RS |
|------|-------|------|-------------|-------|-------|-------|----|
|      |       |      |             | Vj    | Vk    | Qj    | Qk |
|      | Add1  | No   |             |       |       |       |    |
|      | Add2  | No   |             |       |       |       |    |
|      | Add3  | No   |             |       |       |       |    |
| 4    | Mult1 | Yes  | MULTD M(A2) | R(F4) |       |       |    |
|      | Mult2 | Yes  | DIVD        |       | R(F6) | Mult1 |    |

## Register result status:

| Clock | F0 | F2    | F4 | F6            | F8 | F10   | F12 | ... | F30 |
|-------|----|-------|----|---------------|----|-------|-----|-----|-----|
| 11    | FU | Mult1 |    | (M1-M2+M(A2)) |    | Mult2 |     |     |     |

Write result of ADDD here vs. scoreboard?

All quick instructions complete in this cycle!

# Tomasulo Example Cycle 12

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  | Busy | Address |    |
|-------------|-----|-----|-------|------|--------|------|---------|----|
|             |     |     |       | Comp | Result |      |         |    |
| LD          | F6  | 34+ | R2    | 1    | 3      | 4    | Load1   | No |
| LD          | F2  | 45+ | R3    | 2    | 4      | 5    | Load2   | No |
| MULTD       | F0  | F2  | F4    | 3    |        |      | Load3   | No |
| SUBD        | F8  | F6  | F2    | 4    | 7      | 8    |         |    |
| DIVD        | F10 | F0  | F6    | 5    |        |      |         |    |
| ADDD        | F6  | F8  | F2    | 6    | 10     | 11   |         |    |

## Reservation Stations:

| Time | Name  | Busy | S1    | S2    | RS    | RS    |    |
|------|-------|------|-------|-------|-------|-------|----|
|      |       |      | Op    | Vj    | Vk    | Qj    | Qk |
|      | Add1  | No   |       |       |       |       |    |
|      | Add2  | No   |       |       |       |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 3    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | R(F6) | Mult1 |    |

## Register result status:

| Clock | F0 | F2    | F4 | F6 | F8 | F10   | F12 | ... | F30 |
|-------|----|-------|----|----|----|-------|-----|-----|-----|
| 12    | FU | Mult1 |    |    |    | Mult2 |     |     |     |

# Tomasulo Example Cycle 13

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  | Busy | Address |    |
|-------------|-----|-----|-------|------|--------|------|---------|----|
|             |     |     |       | Comp | Result |      |         |    |
| LD          | F6  | 34+ | R2    | 1    | 3      | 4    | Load1   | No |
| LD          | F2  | 45+ | R3    | 2    | 4      | 5    | Load2   | No |
| MULTD       | F0  | F2  | F4    | 3    |        |      | Load3   | No |
| SUBD        | F8  | F6  | F2    | 4    | 7      | 8    |         |    |
| DIVD        | F10 | F0  | F6    | 5    |        |      |         |    |
| ADDD        | F6  | F8  | F2    | 6    | 10     | 11   |         |    |

## Reservation Stations:

| Time | Name  | Busy | S1    | S2    | RS    | RS    |    |
|------|-------|------|-------|-------|-------|-------|----|
|      |       |      | Op    | Vj    | Vk    | Qj    | Qk |
|      | Add1  | No   |       |       |       |       |    |
|      | Add2  | No   |       |       |       |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 2    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | R(F6) | Mult1 |    |

## Register result status:

| Clock | F0 | F2    | F4 | F6 | F8 | F10   | F12 | ... | F30 |
|-------|----|-------|----|----|----|-------|-----|-----|-----|
| 13    | FU | Mult1 |    |    |    | Mult2 |     |     |     |

# Tomasulo Example Cycle 14

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write | Busy | Address |    |
|-------------|-----|-----|-------|------|-------|------|---------|----|
| LD          | F6  | 34+ | R2    | 1    | 3     | 4    | Load1   | No |
| LD          | F2  | 45+ | R3    | 2    | 4     | 5    | Load2   | No |
| MULTD       | F0  | F2  | F4    | 3    |       |      | Load3   | No |
| SUBD        | F8  | F6  | F2    | 4    | 7     | 8    |         |    |
| DIVD        | F10 | F0  | F6    | 5    |       |      |         |    |
| ADDD        | F6  | F8  | F2    | 6    | 10    | 11   |         |    |

## Reservation Stations:

| Time | Name  | Busy | Op          | S1    | S2    | RS    | RS |
|------|-------|------|-------------|-------|-------|-------|----|
|      |       |      |             | Vj    | Vk    | Qj    | Qk |
|      | Add1  | No   |             |       |       |       |    |
|      | Add2  | No   |             |       |       |       |    |
|      | Add3  | No   |             |       |       |       |    |
| 1    | Mult1 | Yes  | MULTD M(A2) | R(F4) |       |       |    |
|      | Mult2 | Yes  | DIVD        |       | R(F6) | Mult1 |    |

## Register result status:

| Clock | F0 | F2    | F4 | F6 | F8 | F10   | F12 | ... | F30 |
|-------|----|-------|----|----|----|-------|-----|-----|-----|
| 14    | FU | Mult1 |    |    |    | Mult2 |     |     |     |

# Tomasulo Example Cycle 15

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  | Busy | Address |    |
|-------------|-----|-----|-------|------|--------|------|---------|----|
|             |     |     |       | Comp | Result |      |         |    |
| LD          | F6  | 34+ | R2    | 1    | 3      | 4    | Load1   | No |
| LD          | F2  | 45+ | R3    | 2    | 4      | 5    | Load2   | No |
| MULTD       | F0  | F2  | F4    | 3    | 15     |      | Load3   | No |
| SUBD        | F8  | F6  | F2    | 4    | 7      | 8    |         |    |
| DIVD        | F10 | F0  | F6    | 5    |        |      |         |    |
| ADDD        | F6  | F8  | F2    | 6    | 10     | 11   |         |    |

## Reservation Stations:

| Time | Name  | Busy | S1    | S2    | RS    | RS    |    |
|------|-------|------|-------|-------|-------|-------|----|
|      |       |      | Op    | Vj    | Vk    | Qj    | Qk |
|      | Add1  | No   |       |       |       |       |    |
|      | Add2  | No   |       |       |       |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 0    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | R(F6) | Mult1 |    |

## Register result status:

| Clock | F0 | F2    | F4 | F6 | F8 | F10   | F12 | ... | F30 |
|-------|----|-------|----|----|----|-------|-----|-----|-----|
| 15    | FU | Mult1 |    |    |    | Mult2 |     |     |     |

# Tomasulo Example Cycle 16

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  | Busy | Address |
|-------------|-----|-----|-------|------|--------|------|---------|
|             |     |     |       | Comp | Result |      |         |
| LD          | F6  | 34+ | R2    | 1    | 3      | 4    | Load1   |
| LD          | F2  | 45+ | R3    | 2    | 4      | 5    | Load2   |
| MULTD       | F0  | F2  | F4    | 3    | 15     | 16   | Load3   |
| SUBD        | F8  | F6  | F2    | 4    | 7      | 8    |         |
| DIVD        | F10 | F0  | F6    | 5    |        |      |         |
| ADDD        | F6  | F8  | F2    | 6    | 10     | 11   |         |

## Reservation Stations:

| Time | Name  | Busy | Op   | S1   | S2    | RS | RS |
|------|-------|------|------|------|-------|----|----|
|      |       |      |      | Vj   | Vk    | Qj | Qk |
|      | Add1  | No   |      |      |       |    |    |
|      | Add2  | No   |      |      |       |    |    |
|      | Add3  | No   |      |      |       |    |    |
|      | Mult1 | No   |      |      |       |    |    |
| 40   | Mult2 | Yes  | DIVD | M*F4 | R(F6) |    |    |

## Register result status:

| Clock | F0 | F2   | F4 | F6 | F8 | F10   | F12 | ... | F30 |
|-------|----|------|----|----|----|-------|-----|-----|-----|
| 16    | FU | M*F4 |    |    |    | Mult2 |     |     |     |

**Faster than light computation  
(skip a couple of cycles)**

# Tomasulo Example Cycle 55

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  | Busy | Address |
|-------------|-----|-----|-------|------|--------|------|---------|
|             |     |     |       | Comp | Result |      |         |
| LD          | F6  | 34+ | R2    | 1    | 3      | 4    | Load1   |
| LD          | F2  | 45+ | R3    | 2    | 4      | 5    | Load2   |
| MULTD       | F0  | F2  | F4    | 3    | 15     | 16   | Load3   |
| SUBD        | F8  | F6  | F2    | 4    | 7      | 8    |         |
| DIVD        | F10 | F0  | F6    | 5    |        |      |         |
| ADDD        | F6  | F8  | F2    | 6    | 10     | 11   |         |

## Reservation Stations:

| Time | Name  | Busy | S1   | S2   | RS    | RS |    |
|------|-------|------|------|------|-------|----|----|
|      |       |      | Op   | Vj   | Vk    | Qj | Qk |
|      | Add1  | No   |      |      |       |    |    |
|      | Add2  | No   |      |      |       |    |    |
|      | Add3  | No   |      |      |       |    |    |
|      | Mult1 | No   |      |      |       |    |    |
| 1    | Mult2 | Yes  | DIVD | M*F4 | R(F6) |    |    |

## Register result status:

| Clock | F0 | F2 | F4 | F6 | F8 | F10   | F12 | ... | F30 |
|-------|----|----|----|----|----|-------|-----|-----|-----|
| 55    | FU |    |    |    |    | Mult2 |     |     |     |

# Tomasulo Example Cycle 56

## Instruction status:

| Instruction | j   | k   | Issue | Exec | Write  | Busy | Address |
|-------------|-----|-----|-------|------|--------|------|---------|
|             |     |     |       | Comp | Result |      |         |
| LD          | F6  | 34+ | R2    | 1    | 3      | 4    | Load1   |
| LD          | F2  | 45+ | R3    | 2    | 4      | 5    | Load2   |
| MULTD       | F0  | F2  | F4    | 3    | 15     | 16   | Load3   |
| SUBD        | F8  | F6  | F2    | 4    | 7      | 8    |         |
| DIVD        | F10 | F0  | F6    | 5    | 56     |      |         |
| ADDD        | F6  | F8  | F2    | 6    | 10     | 11   |         |

## Reservation Stations:

| Time | Name  | Busy | S1   | S2   | RS    | RS |    |
|------|-------|------|------|------|-------|----|----|
|      |       |      | Op   | Vj   | Vk    | Qj | Qk |
|      | Add1  | No   |      |      |       |    |    |
|      | Add2  | No   |      |      |       |    |    |
|      | Add3  | No   |      |      |       |    |    |
|      | Mult1 | No   |      |      |       |    |    |
| 0    | Mult2 | Yes  | DIVD | M*F4 | R(F6) |    |    |

## Register result status:

| Clock | F0 | F2 | F4 | F6 | F8 | F10   | F12 | ... | F30 |
|-------|----|----|----|----|----|-------|-----|-----|-----|
| 56    | FU |    |    |    |    | Mult2 |     |     |     |

Mult2 is completing; what is waiting for it?

# Tomasulo Example Cycle 57

## Instruction status:

| Instruction | j   | k   |    | Issue | Comp | Result | Busy  | Address |
|-------------|-----|-----|----|-------|------|--------|-------|---------|
| LD          | F6  | 34+ | R2 | 1     | 3    | 4      | Load1 | No      |
| LD          | F2  | 45+ | R3 | 2     | 4    | 5      | Load2 | No      |
| MULTD       | F0  | F2  | F4 | 3     | 15   | 16     | Load3 | No      |
| SUBD        | F8  | F6  | F2 | 4     | 7    | 8      |       |         |
| DIVD        | F10 | F0  | F6 | 5     | 56   | 57     |       |         |
| ADDD        | F6  | F8  | F2 | 6     | 10   | 11     |       |         |

## Reservation Stations:

| Time | Name  | Busy | Op | S1 | S2 | RS | RS |
|------|-------|------|----|----|----|----|----|
|      | Add1  | No   |    |    |    |    |    |
|      | Add2  | No   |    |    |    |    |    |
|      | Add3  | No   |    |    |    |    |    |
|      | Mult1 | No   |    |    |    |    |    |
|      | Mult2 | No   |    |    |    |    |    |

## Register result status:

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30    |
|-------|----|----|----|----|----|-----|-----|-----|--------|
| 57    | FU |    |    |    |    |     |     |     | Result |

Once again: In-order issue, out-of-order execution and completion.