



### *Instruction Level Parallelism*

Tomasulo algorithm

---

# HPPS

---



# Outline

- Tomasulo algorithm
- Comparison between Scoreboard and Tomasulo
- Tomasulo example with loops
- Hardware-based speculation
- In-order commit of instructions: reorder buffer
- Speculative Tomasulo

H  
P  
S

# Tomasulo Algorithm

- Another dynamic algorithm: allows execution to proceed in the presence of dependences
- Invented at IBM 3 years after CDC 6600 for the IBM 360/91
- Same Goal: high performance w/o special compilers
- Lead to:
  - ▶ Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604



# Tomasulo Algorithm vs. Scoreboard

- Control & buffers distributed with Function Units (FU) vs. centralized in scoreboard;
  - ▶ FU buffers called “reservation stations”; have pending operands
- Registers in instructions replaced by values or pointers to reservation stations(RS); called register renaming ;
  - ▶ avoids WAR, WAW hazards
  - ▶ More reservation stations than registers, so can do optimizations compilers can't
- Results to FU from RS, not through registers, over Common Data Bus that broadcasts results to all FUs
- Load and Stores treated as FUs with RSs as well
- Integer instructions can go past branches, allowing FP ops beyond basic block in FP queue



# Tomasulo Algorithm Basics

- The control logic and the buffers are distributed with FUs (vs. centralized in scoreboard)
- Operand buffers are called **Reservation Stations**
- Each instruction is an entry of a reservation station
- Its operands are replaced by values or pointers  
**(Register Renaming)**



# Tomasulo Algorithm Basics

- Register Renaming allows to:
  - ▶ Avoid WAR and WAW hazards
  - ▶ Reservation stations are more than registers (so can do better optimizations than a compiler).
- Results are dispatched to other FUs through a Common Data Bus
- Load/Stores treated as FUs

# Tomasulo Algorithm for an FPU



# Reservation Station Components

- Tag identifying the RS
- OP = the operation to perform on the component.
- V<sub>j</sub>, V<sub>k</sub> = Value of the source operands
- Q<sub>j</sub>, Q<sub>k</sub> = Pointers to RS that produce V<sub>j</sub>, V<sub>k</sub>  
Zero value = Source op. is already available in V<sub>j</sub> or V<sub>k</sub>
- Busy = Indicates RS Busy
- Note: Only one of V-field or Q-field is valid for each operand



# Other components

- RF and the Store buffer have a Value (V) and a Pointer (Q) field.  
Pointer (Q) field corresponds to number of reservation station producing the result to be stored in RF or store buffer  
If zero  $\Rightarrow$  no active instructions producing the result (RF or store buffer content is the correct value).
- Load buffers have an address field (A), and a busy field.
- Store Buffers have also an address field (A).  
● A: To hold info for memory address calculation for load/store. Initially contains the instruction offset (immediate field); after address calculation stores the effective address.

# Three stages of the Tomasulo Algorithm

## • ISSUE

Get an instruction  $I$  from the queue. If it is an FP op.  
Check if an RS is empty (i.e., check for structural  
hazards).

Rename registers;

**WAR resolution:** If  $I$  writes  $Rx$ , read by an  
instruction  $K$  already issued,  $K$  knows already the  
value of  $Rx$  or knows what instruction will write it.  
So the RF can be linked to  $I$ .

**WAW resolution:** Since we use in-order issue, the  
RF can be linked to  $I$ .



# Three stages of the Tomasulo Algorithm (1)

## Execution

When both operands are ready then execute. If not ready, watch the common data bus for results.

By delaying execution until operands are available, RAW hazards are avoided.

Notice that several instructions could become ready in the same clock cycle for the same FU.

## Load and stores: two-step process

- First step: compute effective address, place it in load or store buffer.
- Loads in Load Buffer execute as soon as memory unit is available; stores in store buffer wait for the value to be stored before being sent to memory unit.

# Three stages of the Tomasulo Algorithm (2)

- Loads and stores: kept in program order through effective address calculation - helps in preventing hazards through memory.
- To preserve exception behavior:
  - ▶ no instruction can initiate execution until all branches preceding it in program order have completed. If branch prediction is used, CPU must know prediction correctness before beginning execution of following instructions. Speculation allows more brilliant results!
- Write result

When result is available, write on Common Data Bus and from there into RF and into all RSs (including store buffers) waiting for this result; stores also write data to memory during this stage. Mark reservation stations available.



# The Common Data Bus

- A common data bus is a data+source bus.
- In the IBM 360/91  
Data=64 bits, Source=4 bits
- FU must perform associative lookup in the RS.

H  
P  
S

# Tomasulo algorithm (some details)

- Loads and stores go through a functional unit for effective address computation before proceeding to effective load and store buffers;
- Loads take a second execution step to access memory, then go to Write Result to send the value from memory to RF and/or RS;
- Stores complete their execution in their Write Result stage (writes data to memory)
- All writes occur in Write Result - simplifying Tomasulo algorithm.



# Tomasulo algorithm (some details)

- A Load and a Store can be done in different order, *provided they access different memory locations*
- Otherwise, a WAR (interchange load-store sequence) or a RAW (interchange store-load sequence) may result (WAW if two stores are interchanged). Loads can be reordered freely.
- To detect such hazards: data memory addresses associated with *any earlier memory operation* must have been computed by the CPU (e.g.: address computation executed in program order)



# Tomasulo algorithm (some details)

- Load executed out of order with previous store: assume address computed in program order. When Load address has been computed, it can be compared with A fields in active Store buffers: in the case of a match, Load is not sent to Load buffer until conflicting store completes.
- Stores must check for matching addresses in both Load and Store buffers (*dynamic disambiguation*, alternative to static disambiguation performed by the compiler)
- Drawback: amount of hardware required.
- Each RS must contain a fast associative buffer; single CDB may limit performance.



# Tomasulo Example

*Instruction status:*

| Instruction | j   | k   | Issue | Exec | Write |
|-------------|-----|-----|-------|------|-------|
|             |     |     |       |      |       |
| LD          | F6  | 34+ | R2    |      |       |
| LD          | F2  | 45+ | R3    |      |       |
| MULTD       | F0  | F2  | F4    |      |       |
| SUBD        | F8  | F6  | F2    |      |       |
| DIVD        | F10 | F0  | F6    |      |       |
| ADDD        | F6  | F8  | F2    |      |       |

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

*Reservation Stations:*

| Time | Name  | Busy | Op | S1 | S2 | RS | RS |
|------|-------|------|----|----|----|----|----|
|      | Add1  | No   |    |    |    |    |    |
|      | Add2  | No   |    |    |    |    |    |
|      | Add3  | No   |    |    |    |    |    |
|      | Mult1 | No   |    |    |    |    |    |
|      | Mult2 | No   |    |    |    |    |    |

*Register result status:*

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|-----|-----|-----|-----|
| 0     | FU |    |    |    |    |     |     |     |     |

# Tomasulo Example Cycle 1

*Instruction status:*

| Instruction | j   | k   | Issue | Exec | Write  |
|-------------|-----|-----|-------|------|--------|
|             |     |     |       | Comp | Result |
| LD          | F6  | 34+ | R2    | 1    |        |
| LD          | F2  | 45+ | R3    |      |        |
| MULTD       | F0  | F2  | F4    |      |        |
| SUBD        | F8  | F6  | F2    |      |        |
| DIVD        | F10 | F0  | F6    |      |        |
| ADDD        | F6  | F8  | F2    |      |        |

|       | Busy | Address |
|-------|------|---------|
| Load1 | Yes  | 34+R2   |
| Load2 | No   |         |
| Load3 | No   |         |

*Reservation Stations:*

| Time | Name  | Busy | Op | S1 | S2 | RS | RS |
|------|-------|------|----|----|----|----|----|
|      | Add1  | No   |    |    |    |    |    |
|      | Add2  | No   |    |    |    |    |    |
|      | Add3  | No   |    |    |    |    |    |
|      | Mult1 | No   |    |    |    |    |    |
|      | Mult2 | No   |    |    |    |    |    |

*Register result status:*

| Clock | F0 | F2 | F4 | F6 | F8    | F10 | F12 | ... | F30 |
|-------|----|----|----|----|-------|-----|-----|-----|-----|
| 1     |    |    |    |    | Load1 |     |     |     |     |

# Tomasulo Example Cycle 2

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | <i>Issue</i> | Exec        | Write         |
|-------------|----------|----------|--------------|-------------|---------------|
|             |          |          |              | <i>Comp</i> | <i>Result</i> |
| LD          | F6       | 34+      | R2           | 1           |               |
| LD          | F2       | 45+      | R3           | 2           |               |
| MULTD       | F0       | F2       | F4           |             |               |
| SUBD        | F8       | F6       | F2           |             |               |
| DIVD        | F10      | F0       | F6           |             |               |
| ADDD        | F6       | F8       | F2           |             |               |

|       | Busy | Address |
|-------|------|---------|
| Load1 | Yes  | 34+R2   |
| Load2 | Yes  | 45+R3   |
| Load3 | No   |         |

*Reservation Stations:*

| Time | Name  | Busy | SI        | S2        | RS        | RS        |           |
|------|-------|------|-----------|-----------|-----------|-----------|-----------|
|      |       |      | <i>Op</i> | <i>Vj</i> | <i>Vk</i> | <i>Qj</i> | <i>Qk</i> |
|      | Add1  | No   |           |           |           |           |           |
|      | Add2  | No   |           |           |           |           |           |
|      | Add3  | No   |           |           |           |           |           |
|      | Mult1 | No   |           |           |           |           |           |
|      | Mult2 | No   |           |           |           |           |           |

*Register result status:*

| Clock | F0 | F2    | F4 | F6 | F8    | F10 | F12 | ... | F30 |
|-------|----|-------|----|----|-------|-----|-----|-----|-----|
| 2     | FU | Load2 |    |    | Load1 |     |     |     |     |

Note: Unlike 6600, can have multiple loads outstanding

# Tomasulo Example Cycle 3

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | <i>Issue</i> | <i>Exec</i> | <i>Write</i>  | <i>Busy</i> | <i>Address</i> |
|-------------|----------|----------|--------------|-------------|---------------|-------------|----------------|
|             |          |          |              | <i>Comp</i> | <i>Result</i> |             |                |
| LD          | F6       | 34+      | R2           | 1           | 3             | Load1       | Yes 34+R2      |
| LD          | F2       | 45+      | R3           | 2           |               | Load2       | Yes 45+R3      |
| MULTD       | F0       | F2       | F4           | 3           |               | Load3       | No             |
| SUBD        | F8       | F6       | F2           |             |               |             |                |
| DIVD        | F10      | F0       | F6           |             |               |             |                |
| ADDD        | F6       | F8       | F2           |             |               |             |                |

*Reservation Stations:*

| Time | Name  | <i>Busy</i> | <i>Op</i> | <i>S1</i> | <i>S2</i> | <i>RS</i> | <i>RS</i> |
|------|-------|-------------|-----------|-----------|-----------|-----------|-----------|
|      |       |             |           | <i>Vj</i> | <i>Vk</i> | <i>Qj</i> | <i>Qk</i> |
|      | Add1  | No          |           |           |           |           |           |
|      | Add2  | No          |           |           |           |           |           |
|      | Add3  | No          |           |           |           |           |           |
|      | Mult1 | Yes         | MULTD     |           | R(F4)     | Load2     |           |
|      | Mult2 | No          |           |           |           |           |           |

*Register result status:*

| Clock |    | F0    | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|-------|-------|----|-------|----|-----|-----|-----|-----|
| 3     | FU | Mult1 | Load2 |    | Load1 |    |     |     |     |     |

Note: registers names are removed ("renamed") in Reservation Stations; MULT issued vs. scoreboard  
 Load1 completing: what is waiting for Load1?

# Tomasulo Example Cycle 4

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | Issue | Exec | Write  | Busy      | Address  |
|-------------|----------|----------|-------|------|--------|-----------|----------|
|             |          |          |       | Comp | Result |           |          |
| LD          | F6       | 34+      | R2    | 1    | 3      | 4         | Load1 No |
| LD          | F2       | 45+      | R3    | 2    | 4      | Load2 Yes | 45+R3    |
| MULTD       | F0       | F2       | F4    | 3    |        | Load3 No  |          |
| SUBD        | F8       | F6       | F2    | 4    |        |           |          |
| DIVD        | F10      | F0       | F6    |      |        |           |          |
| ADDD        | F6       | F8       | F2    |      |        |           |          |

*Reservation Stations:*

| Time | Name  | Busy | Op    | SI    | S2    | RS    | RS    |
|------|-------|------|-------|-------|-------|-------|-------|
|      |       |      |       | Vj    | Vk    | Qj    | Qk    |
|      | Add1  | Yes  | SUBD  | M(A1) |       |       | Load2 |
|      | Add2  | No   |       |       |       |       |       |
|      | Add3  | No   |       |       |       |       |       |
|      | Mult1 | Yes  | MULTD |       | R(F4) | Load2 |       |
|      | Mult2 | No   |       |       |       |       |       |

*Register result status:*

| Clock | F0 | F2    | F4    | F6 | F8    | F10  | F12 | ... | F30 |
|-------|----|-------|-------|----|-------|------|-----|-----|-----|
| 4     | FU | Mult1 | Load2 |    | M(A1) | Add1 |     |     |     |

Load2 completing; what is waiting for Load2?

# Tomasulo Example Cycle 5

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | <i>Issue</i> | <i>Exec</i> | <i>Write</i>  |
|-------------|----------|----------|--------------|-------------|---------------|
|             |          |          |              | <i>Comp</i> | <i>Result</i> |
| LD          | F6       | 34+      | R2           | 1           | 3             |
| LD          | F2       | 45+      | R3           | 2           | 4             |
| MULTD       | F0       | F2       | F4           | 3           |               |
| SUBD        | F8       | F6       | F2           | 4           |               |
| DIVD        | F10      | F0       | F6           | 5           |               |
| ADDD        | F6       | F8       | F2           |             |               |

| <i>Busy</i> | <i>Address</i> |
|-------------|----------------|
| Load1       | No             |
| Load2       | No             |
| Load3       | No             |

*Reservation Stations:*

| <i>Time</i> | <i>Name</i> | <i>Busy</i> | <i>Op</i> | <i>S1</i> | <i>S2</i> | <i>RS</i> | <i>RS</i> |
|-------------|-------------|-------------|-----------|-----------|-----------|-----------|-----------|
| 2           | Add1        | Yes         | SUBD      | M(A1)     | M(A2)     |           |           |
|             | Add2        | No          |           |           |           |           |           |
|             | Add3        | No          |           |           |           |           |           |
| 10          | Mult1       | Yes         | MULTD     | M(A2)     | R(F4)     |           |           |
|             | Mult2       | Yes         | DIVD      |           | M(A1)     | Mult1     |           |

*Register result status:*

| Clock |           | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|-------|-----------|-----------|-----------|-----------|-----------|-----------|------------|------------|-----|------------|
| 5     | <i>FU</i> | Mult1     | M(A2)     |           | M(A1)     | Add1      | Mult2      |            |     |            |

# Tomasulo Example Cycle 6

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> |    | Exec         |             |               | Write |  | Busy  | Address |
|-------------|----------|----------|----|--------------|-------------|---------------|-------|--|-------|---------|
|             |          |          |    | <i>Issue</i> | <i>Comp</i> | <i>Result</i> |       |  |       |         |
| LD          | F6       | 34+      | R2 | 1            | 3           | 4             |       |  | Load1 | No      |
| LD          | F2       | 45+      | R3 | 2            | 4           | 5             |       |  | Load2 | No      |
| MULTD       | F0       | F2       | F4 | 3            |             |               |       |  | Load3 | No      |
| SUBD        | F8       | F6       | F2 | 4            |             |               |       |  |       |         |
| DIVD        | F10      | F0       | F6 | 5            |             |               |       |  |       |         |
| ADDD        | F6       | F8       | F2 | 6            |             |               |       |  |       |         |

*Reservation Stations:*

| Time | Name  | Busy | S1        |           | S2        |           | RS        |  | RS |  |
|------|-------|------|-----------|-----------|-----------|-----------|-----------|--|----|--|
|      |       |      | <i>Op</i> | <i>Vj</i> | <i>Vk</i> | <i>Qj</i> | <i>Qk</i> |  |    |  |
| 1    | Add1  | Yes  | SUBD      | M(A1)     | M(A2)     |           |           |  |    |  |
|      | Add2  | Yes  | ADDD      |           | M(A2)     |           | Add1      |  |    |  |
|      | Add3  | No   |           |           |           |           |           |  |    |  |
| 9    | Mult1 | Yes  | MULTD     | M(A2)     | R(F4)     |           |           |  |    |  |
|      | Mult2 | Yes  | DIVD      |           | M(A1)     | Mult1     |           |  |    |  |

*Register result status:*

| Clock | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|-------|-----------|-----------|-----------|-----------|-----------|------------|------------|-----|------------|
| 6     | <i>FU</i> | Mult1     | M(A2)     |           | Add2      | Add1       | Mult2      |     |            |

Issue ADDD here vs. scoreboard?

# Tomasulo Example Cycle 7

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | <i>Issue</i> | <i>Exec</i> | <i>Write</i>  | <i>Busy</i> | <i>Address</i> |
|-------------|----------|----------|--------------|-------------|---------------|-------------|----------------|
|             |          |          |              | <i>Comp</i> | <i>Result</i> |             |                |
| LD          | F6       | 34+      | R2           | 1           | 3             | 4           | Load1          |
| LD          | F2       | 45+      | R3           | 2           | 4             | 5           | Load2          |
| MULTD       | F0       | F2       | F4           | 3           |               |             | Load3          |
| SUBD        | F8       | F6       | F2           | 4           | 7             |             |                |
| DIVD        | F10      | F0       | F6           | 5           |               |             |                |
| ADDD        | F6       | F8       | F2           | 6           |               |             |                |

*Reservation Stations:*

| Time | Name  | Busy | <i>S1</i> | <i>S2</i> | <i>RS</i> | <i>RS</i> |
|------|-------|------|-----------|-----------|-----------|-----------|
|      |       |      | <i>Op</i> | <i>Vj</i> | <i>Vk</i> | <i>Qk</i> |
| 0    | Add1  | Yes  | SUBD      | M(A1)     | M(A2)     |           |
|      | Add2  | Yes  | ADDD      |           | M(A2)     | Add1      |
|      | Add3  | No   |           |           |           |           |
| 8    | Mult1 | Yes  | MULTD     | M(A2)     | R(F4)     |           |
|      | Mult2 | Yes  | DIVD      |           | M(A1)     | Mult1     |

*Register result status:*

| Clock |           | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|-------|-----------|-----------|-----------|-----------|-----------|-----------|------------|------------|-----|------------|
| 7     | <i>FU</i> | Mult1     | M(A2)     |           |           | Add2      | Add1       | Mult2      |     |            |

Add1 completing; what is waiting for it?

# Tomasulo Example Cycle 8

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | <i>Issue</i> | <i>Exec</i> | <i>Write</i>  | <i>Busy</i> | <i>Address</i> |    |
|-------------|----------|----------|--------------|-------------|---------------|-------------|----------------|----|
|             |          |          |              | <i>Comp</i> | <i>Result</i> |             |                |    |
| LD          | F6       | 34+      | R2           | 1           | 3             | 4           | Load1          | No |
| LD          | F2       | 45+      | R3           | 2           | 4             | 5           | Load2          | No |
| MULTD       | F0       | F2       | F4           | 3           |               |             | Load3          | No |
| SUBD        | F8       | F6       | F2           | 4           | 7             | 8           |                |    |
| DIVD        | F10      | F0       | F6           | 5           |               |             |                |    |
| ADDD        | F6       | F8       | F2           | 6           |               |             |                |    |

*Reservation Stations:*

| Time | Name  | Busy | <i>S1</i> | <i>S2</i> | <i>RS</i> | <i>RS</i> |
|------|-------|------|-----------|-----------|-----------|-----------|
|      |       |      | <i>Op</i> | <i>Vj</i> | <i>Vk</i> | <i>Qj</i> |
|      | Add1  | No   |           |           |           |           |
| 2    | Add2  | Yes  | ADDD      | (M-M)     | M(A2)     |           |
|      | Add3  | No   |           |           |           |           |
| 7    | Mult1 | Yes  | MULTD     | M(A2)     | R(F4)     |           |
|      | Mult2 | Yes  | DIVD      |           | M(A1)     | Mult1     |

*Register result status:*

| Clock |           | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|-------|-----------|-----------|-----------|-----------|-----------|-----------|------------|------------|-----|------------|
|       | <i>FU</i> | Mult1     | M(A2)     |           | Add2      | (M-M)     | Mult2      |            |     |            |
| 8     |           |           |           |           |           |           |            |            |     |            |

# Tomasulo Example Cycle 9

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | <i>Issue</i> | <i>Exec</i> | <i>Write</i>  | <i>Busy</i> | <i>Address</i> |    |
|-------------|----------|----------|--------------|-------------|---------------|-------------|----------------|----|
|             |          |          |              | <i>Comp</i> | <i>Result</i> |             |                |    |
| LD          | F6       | 34+      | R2           | 1           | 3             | 4           | Load1          | No |
| LD          | F2       | 45+      | R3           | 2           | 4             | 5           | Load2          | No |
| MULTD       | F0       | F2       | F4           | 3           |               |             | Load3          | No |
| SUBD        | F8       | F6       | F2           | 4           | 7             | 8           |                |    |
| DIVD        | F10      | F0       | F6           | 5           |               |             |                |    |
| ADDD        | F6       | F8       | F2           | 6           |               |             |                |    |

*Reservation Stations:*

| Time | Name  | Busy | <i>S1</i> | <i>S2</i> | <i>RS</i> | <i>RS</i> |
|------|-------|------|-----------|-----------|-----------|-----------|
|      |       |      | <i>Op</i> | <i>Vj</i> | <i>Vk</i> | <i>Qj</i> |
|      | Add1  | No   |           |           |           |           |
| 1    | Add2  | Yes  | ADDD      | (M-M)     | M(A2)     |           |
|      | Add3  | No   |           |           |           |           |
| 6    | Mult1 | Yes  | MULTD     | M(A2)     | R(F4)     |           |
|      | Mult2 | Yes  | DIVD      |           | M(A1)     | Mult1     |

*Register result status:*

| Clock |           | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|-------|-----------|-----------|-----------|-----------|-----------|-----------|------------|------------|-----|------------|
| 9     | <i>FU</i> | Mult1     | M(A2)     |           | Add2      | (M-M)     | Mult2      |            |     |            |

# Tomasulo Example Cycle 10

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | Issue | Exec | Write  | Busy | Address |
|-------------|----------|----------|-------|------|--------|------|---------|
|             |          |          |       | Comp | Result |      |         |
| LD          | F6       | 34+      | R2    | 1    | 3      | 4    | Load1   |
| LD          | F2       | 45+      | R3    | 2    | 4      | 5    | Load2   |
| MULTD       | F0       | F2       | F4    | 3    |        |      | Load3   |
| SUBD        | F8       | F6       | F2    | 4    | 7      | 8    |         |
| DIVD        | F10      | F0       | F6    | 5    |        |      |         |
| ADD         | F6       | F8       | F2    | 6    | 10     |      |         |

*Reservation Stations:*

| Time | Name  | Busy | S1    | S2    | RS    | RS    |    |
|------|-------|------|-------|-------|-------|-------|----|
|      |       |      | Op    | Vj    | Vk    | Qj    | Qk |
|      | Add1  | No   |       |       |       |       |    |
| 0    | Add2  | Yes  | ADD   | (M-M) | M(A2) |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 5    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | M(A1) | Mult1 |    |

*Register result status:*

| Clock | F0 | F2    | F4    | F6 | F8   | F10   | F12   | ... | F30 |
|-------|----|-------|-------|----|------|-------|-------|-----|-----|
| 10    | FU | Mult1 | M(A2) |    | Add2 | (M-M) | Mult2 |     |     |

Add2 completing; what is waiting for it?

# Tomasulo Example Cycle 11

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> |    | <i>Issue</i> | <i>Exec</i>   | <i>Write</i> | <i>Busy</i> | <i>Address</i> |
|-------------|----------|----------|----|--------------|---------------|--------------|-------------|----------------|
|             |          |          |    | <i>Comp</i>  | <i>Result</i> |              |             |                |
| LD          | F6       | 34+      | R2 | 1            | 3             | 4            | Load1       | No             |
| LD          | F2       | 45+      | R3 | 2            | 4             | 5            | Load2       | No             |
| MULTD       | F0       | F2       | F4 | 3            |               |              | Load3       | No             |
| SUBD        | F8       | F6       | F2 | 4            | 7             | 8            |             |                |
| DIVD        | F10      | F0       | F6 | 5            |               |              |             |                |
| ADDD        | F6       | F8       | F2 | 6            | 10            | 11           |             |                |

*Reservation Stations:*

| Time | Name  | Busy | <i>S1</i> | <i>S2</i> | <i>RS</i> | <i>RS</i> |
|------|-------|------|-----------|-----------|-----------|-----------|
|      |       |      | <i>Op</i> | <i>Vj</i> | <i>Vk</i> | <i>Qk</i> |
|      | Add1  | No   |           |           |           |           |
|      | Add2  | No   |           |           |           |           |
|      | Add3  | No   |           |           |           |           |
| 4    | Mult1 | Yes  | MULTD     | M(A2)     | R(F4)     |           |
|      | Mult2 | Yes  | DIVD      |           | M(A1)     | Mult1     |

*Register result status:*

| Clock | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|-------|-----------|-----------|-----------|-----------|-----------|------------|------------|-----|------------|
| 11    | <i>FU</i> | Mult1     | M(A2)     | (M-M+M)   | (M-M)     | Mult2      |            |     |            |

Write result of ADDD here vs. scoreboard?

All quick instructions complete in this cycle!

# Tomasulo Example Cycle 12

*Instruction status:*

| Instruction | j   | k   | Issue | Exec | Write  | Busy | Address |    |
|-------------|-----|-----|-------|------|--------|------|---------|----|
|             |     |     |       | Comp | Result |      |         |    |
| LD          | F6  | 34+ | R2    | 1    | 3      | 4    | Load1   | No |
| LD          | F2  | 45+ | R3    | 2    | 4      | 5    | Load2   | No |
| MULTD       | F0  | F2  | F4    | 3    |        |      | Load3   | No |
| SUBD        | F8  | F6  | F2    | 4    | 7      | 8    |         |    |
| DIVD        | F10 | F0  | F6    | 5    |        |      |         |    |
| ADDD        | F6  | F8  | F2    | 6    | 10     | 11   |         |    |

*Reservation Stations:*

| Time | Name  | Busy | S1    | S2    | RS    | RS    |    |
|------|-------|------|-------|-------|-------|-------|----|
|      |       |      | Op    | Vj    | Vk    | Qj    | Qk |
|      | Add1  | No   |       |       |       |       |    |
|      | Add2  | No   |       |       |       |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 3    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | M(A1) | Mult1 |    |

*Register result status:*

| Clock | F0 | F2    | F4    | F6 | F8      | F10  | F12   | ... | F30 |
|-------|----|-------|-------|----|---------|------|-------|-----|-----|
| 12    | FU | Mult1 | M(A2) |    | (M-M+M) | M(M) | Mult2 |     |     |

# Tomasulo Example Cycle 13

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> |    | Exec  |      |        | Write |  | Busy  | Address |
|-------------|----------|----------|----|-------|------|--------|-------|--|-------|---------|
|             |          |          |    | Issue | Comp | Result |       |  |       |         |
| LD          | F6       | 34+      | R2 | 1     | 3    | 4      |       |  | Load1 | No      |
| LD          | F2       | 45+      | R3 | 2     | 4    | 5      |       |  | Load2 | No      |
| MULTD       | F0       | F2       | F4 | 3     |      |        |       |  | Load3 | No      |
| SUBD        | F8       | F6       | F2 | 4     | 7    | 8      |       |  |       |         |
| DIVD        | F10      | F0       | F6 | 5     |      |        |       |  |       |         |
| ADDD        | F6       | F8       | F2 | 6     | 10   | 11     |       |  |       |         |

*Reservation Stations:*

| Time | Name  | Busy | S1    |       | S2    |       | RS    |  | RS |  |
|------|-------|------|-------|-------|-------|-------|-------|--|----|--|
|      |       |      | Op    | Vj    | Vk    | Qj    | Qk    |  |    |  |
|      | Add1  | No   |       |       |       |       |       |  |    |  |
|      | Add2  | No   |       |       |       |       |       |  |    |  |
|      | Add3  | No   |       |       |       |       |       |  |    |  |
| 2    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |       |  |    |  |
|      | Mult2 | Yes  | DIVD  |       |       | M(A1) | Mult1 |  |    |  |

*Register result status:*

| Clock | F0 | F2    | F4    | F6 | F8      | F10   | F12   | ... | F30 |
|-------|----|-------|-------|----|---------|-------|-------|-----|-----|
| 13    | FU | Mult1 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |     |

# Tomasulo Example Cycle 14

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> |    | Exec Write   |             |               | Busy  | Address |
|-------------|----------|----------|----|--------------|-------------|---------------|-------|---------|
|             |          |          |    | <i>Issue</i> | <i>Comp</i> | <i>Result</i> |       |         |
| LD          | F6       | 34+      | R2 | 1            | 3           | 4             | Load1 | No      |
| LD          | F2       | 45+      | R3 | 2            | 4           | 5             | Load2 | No      |
| MULTD       | F0       | F2       | F4 | 3            |             |               | Load3 | No      |
| SUBD        | F8       | F6       | F2 | 4            | 7           | 8             |       |         |
| DIVD        | F10      | F0       | F6 | 5            |             |               |       |         |
| ADDD        | F6       | F8       | F2 | 6            | 10          | 11            |       |         |

*Reservation Stations:*

| Time | Name  | Busy | S1        |           | S2        |           | RS        |  |
|------|-------|------|-----------|-----------|-----------|-----------|-----------|--|
|      |       |      | <i>Op</i> | <i>Vj</i> | <i>Vk</i> | <i>Qj</i> | <i>Qk</i> |  |
|      | Add1  | No   |           |           |           |           |           |  |
|      | Add2  | No   |           |           |           |           |           |  |
|      | Add3  | No   |           |           |           |           |           |  |
| 1    | Mult1 | Yes  | MULTD     | M(A2)     | R(F4)     |           |           |  |
|      | Mult2 | Yes  | DIVD      |           | M(A1)     | Mult1     |           |  |

*Register result status:*

| Clock |           | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i>    | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|-------|-----------|-----------|-----------|-----------|--------------|-----------|------------|------------|-----|------------|
| 14    | <i>FU</i> | Mult1     | M(A2)     |           | (M-M+N(M-M)) | Mult2     |            |            |     |            |

# Tomasulo Example Cycle 15

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> |    | <i>Issue</i> | <i>Exec</i>   | <i>Write</i> | <i>Busy</i> | <i>Address</i> |
|-------------|----------|----------|----|--------------|---------------|--------------|-------------|----------------|
|             |          |          |    | <i>Comp</i>  | <i>Result</i> |              |             |                |
| LD          | F6       | 34+      | R2 | 1            | 3             | 4            | Load1       | No             |
| LD          | F2       | 45+      | R3 | 2            | 4             | 5            | Load2       | No             |
| MULTD       | F0       | F2       | F4 | 3            | 15            |              | Load3       | No             |
| SUBD        | F8       | F6       | F2 | 4            | 7             | 8            |             |                |
| DIVD        | F10      | F0       | F6 | 5            |               |              |             |                |
| ADDD        | F6       | F8       | F2 | 6            | 10            | 11           |             |                |

*Reservation Stations:*

| <i>Time</i> | <i>Name</i> | <i>Busy</i> | <i>Op</i> | <i>S1</i> | <i>S2</i> | <i>RS</i> | <i>RS</i> |
|-------------|-------------|-------------|-----------|-----------|-----------|-----------|-----------|
|             | Add1        | No          |           |           |           |           |           |
|             | Add2        | No          |           |           |           |           |           |
|             | Add3        | No          |           |           |           |           |           |
| 0           | Mult1       | Yes         | MULTD     | M(A2)     | R(F4)     |           |           |
|             | Mult2       | Yes         | DIVD      |           | M(A1)     | Mult1     |           |

*Register result status:*

| Clock | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|-------|-----------|-----------|-----------|-----------|-----------|------------|------------|-----|------------|
| 15    | <i>FU</i> | Mult1     | M(A2)     |           | (M-M+M)   | (M-M)      | Mult2      |     |            |

# Tomasulo Example Cycle 16

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> |    | Exec         | Write       | Busy | Address  |
|-------------|----------|----------|----|--------------|-------------|------|----------|
|             |          |          |    | <i>Issue</i> | <i>Comp</i> |      |          |
| LD          | F6       | 34+      | R2 | 1            | 3           | 4    | Load1 No |
| LD          | F2       | 45+      | R3 | 2            | 4           | 5    | Load2 No |
| MULTD       | F0       | F2       | F4 | 3            | 15          | 16   | Load3 No |
| SUBD        | F8       | F6       | F2 | 4            | 7           | 8    |          |
| DIVD        | F10      | F0       | F6 | 5            |             |      |          |
| ADDD        | F6       | F8       | F2 | 6            | 10          | 11   |          |

*Reservation Stations:*

| Time | Name  | Busy | Op   | <i>S1</i> | <i>S2</i> | <i>RS</i> | <i>RS</i> |
|------|-------|------|------|-----------|-----------|-----------|-----------|
|      |       |      |      | <i>Vj</i> | <i>Vk</i> | <i>Qj</i> | <i>Qk</i> |
|      | Add1  | No   |      |           |           |           |           |
|      | Add2  | No   |      |           |           |           |           |
|      | Add3  | No   |      |           |           |           |           |
|      | Mult1 | No   |      |           |           |           |           |
| 40   | Mult2 | Yes  | DIVD | M*F4      | M(A1)     |           |           |

*Register result status:*

| Clock |    | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|-------|----|-----------|-----------|-----------|-----------|-----------|------------|------------|-----|------------|
| 16    | FU | M*F4      | M(A2)     |           | (M-M+N    | (M-M)     | Mult2      |            |     |            |

**Faster than light  
computation  
(skip a couple of cycles)**

# Tomasulo Example Cycle 55

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | Issue | Exec | Write  | Busy | Address |    |
|-------------|----------|----------|-------|------|--------|------|---------|----|
|             |          |          |       | Comp | Result |      |         |    |
| LD          | F6       | 34+      | R2    | 1    | 3      | 4    | Load1   | No |
| LD          | F2       | 45+      | R3    | 2    | 4      | 5    | Load2   | No |
| MULTD       | F0       | F2       | F4    | 3    | 15     | 16   | Load3   | No |
| SUBD        | F8       | F6       | F2    | 4    | 7      | 8    |         |    |
| DIVD        | F10      | F0       | F6    | 5    |        |      |         |    |
| ADDD        | F6       | F8       | F2    | 6    | 10     | 11   |         |    |

*Reservation Stations:*

| Time | Name  | Busy | S1   | S2   | RS    | RS |    |
|------|-------|------|------|------|-------|----|----|
|      |       |      | Op   | Vj   | Vk    | Qj | Qk |
|      | Add1  | No   |      |      |       |    |    |
|      | Add2  | No   |      |      |       |    |    |
|      | Add3  | No   |      |      |       |    |    |
|      | Mult1 | No   |      |      |       |    |    |
| 1    | Mult2 | Yes  | DIVD | M*F4 | M(A1) |    |    |

*Register result status:*

| Clock | F0 | F2   | F4    | F6 | F8            | F10   | F12 | ... | F30 |
|-------|----|------|-------|----|---------------|-------|-----|-----|-----|
| 55    | FU | M*F4 | M(A2) |    | (M-M+M (M-M)) | Mult2 |     |     |     |

# Tomasulo Example Cycle 56

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | Issue | Exec | Write  | Busy | Address |
|-------------|----------|----------|-------|------|--------|------|---------|
|             |          |          |       | Comp | Result |      |         |
| LD          | F6       | 34+      | R2    | 1    | 3      | 4    | Load1   |
| LD          | F2       | 45+      | R3    | 2    | 4      | 5    | Load2   |
| MULTD       | F0       | F2       | F4    | 3    | 15     | 16   | Load3   |
| SUBD        | F8       | F6       | F2    | 4    | 7      | 8    |         |
| DIVD        | F10      | F0       | F6    | 5    | 56     |      |         |
| ADDD        | F6       | F8       | F2    | 6    | 10     | 11   |         |

*Reservation Stations:*

| Time | Name  | Busy | S1   | S2   | RS    | RS |    |
|------|-------|------|------|------|-------|----|----|
|      |       |      | Op   | Vj   | Vk    | Qj | Qk |
|      | Add1  | No   |      |      |       |    |    |
|      | Add2  | No   |      |      |       |    |    |
|      | Add3  | No   |      |      |       |    |    |
|      | Mult1 | No   |      |      |       |    |    |
| 0    | Mult2 | Yes  | DIVD | M*F4 | M(A1) |    |    |

*Register result status:*

| Clock | F0 | F2   | F4    | F6            | F8    | F10 | F12 | ... | F30 |
|-------|----|------|-------|---------------|-------|-----|-----|-----|-----|
| 56    | FU | M*F4 | M(A2) | (M-M+M (M-M)) | Mult2 |     |     |     |     |

Mult2 is completing; what is waiting for it?

# Tomasulo Example Cycle 57

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | Op | Exec  |      |        | Busy  | Address |
|-------------|----------|----------|----|-------|------|--------|-------|---------|
|             |          |          |    | Issue | Comp | Result |       |         |
| LD          | F6       | 34+      | R2 | 1     | 3    | 4      | Load1 | No      |
| LD          | F2       | 45+      | R3 | 2     | 4    | 5      | Load2 | No      |
| MULTD       | F0       | F2       | F4 | 3     | 15   | 16     | Load3 | No      |
| SUBD        | F8       | F6       | F2 | 4     | 7    | 8      |       |         |
| DIVD        | F10      | F0       | F6 | 5     | 56   | 57     |       |         |
| ADDD        | F6       | F8       | F2 | 6     | 10   | 11     |       |         |

*Reservation Stations:*

| Time | Name  | Busy | Op   | <i>V<sub>j</sub></i> | <i>V<sub>k</sub></i> | <i>Q<sub>j</sub></i> | <i>Q<sub>k</sub></i> |
|------|-------|------|------|----------------------|----------------------|----------------------|----------------------|
|      | Add1  | No   |      |                      |                      |                      |                      |
|      | Add2  | No   |      |                      |                      |                      |                      |
|      | Add3  | No   |      |                      |                      |                      |                      |
|      | Mult1 | No   |      |                      |                      |                      |                      |
|      | Mult2 | Yes  | DIVD | M*F4                 | M(A1)                |                      |                      |

*Register result status:*

| Clock | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i>    | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|-------|-----------|-----------|-----------|-----------|--------------|------------|------------|-----|------------|
| 56    | <i>FU</i> | M*F4      | M(A2)     |           | (M-M+N)(M-M) | Result     |            |     |            |

Once again: In-order issue, out-of-order execution and completion.

# Compare to Scoreboard Cycle 62

*Instruction status:*

| Instruction | <i>j</i> | <i>k</i> | Read         |             |             |               | Write        |             |               |
|-------------|----------|----------|--------------|-------------|-------------|---------------|--------------|-------------|---------------|
|             |          |          | <i>Issue</i> | <i>Oper</i> | <i>Comp</i> | <i>Result</i> | <i>Issue</i> | <i>Comp</i> | <i>Result</i> |
| LD          | F6       | 34+      | R2           | 1           | 2           | 3             | 4            | 1           | 3             |
| LD          | F2       | 45+      | R3           | 5           | 6           | 7             | 8            | 2           | 4             |
| MULTD       | F0       | F2       | F4           | 6           | 9           | 19            | 20           | 3           | 15            |
| SUBD        | F8       | F6       | F2           | 7           | 9           | 11            | 12           | 4           | 7             |
| DIVD        | F10      | F0       | F6           | 8           | 21          | 61            | 62           | 5           | 56            |
| ADDD        | F6       | F8       | F2           | 13          | 14          | 16            | 22           | 6           | 10            |

- Why take longer on scoreboard/6600?
  - Structural Hazards
  - Lack of forwarding

# Tomasulo (IBM) versus Scoreboard (CDC)

- Issue window size=14
  - No issue on structural hazards
  - WAR, WAW avoided with renaming
  - Broadcast results from FU
  - Control distributed on RS
  - Allows loop unrolling in hw
- Issue window size=5
  - No issue on structural hazards
  - Stall the completion for WAW and WAR hazards
  - Results written back on registers.
  - Control centralized through the Scoreboard.



# Tomasulo Loop Example

|         |    |      |    |
|---------|----|------|----|
| Loop:LD | F0 | 0    | R1 |
| MULTDF4 | F0 | F2   |    |
| SD      | F4 | 0    | R1 |
| SUBI    | R1 | R1   | #8 |
| BNEZ    | R1 | Loop |    |

- Assume Multiply takes 4 clocks
- Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit)
- To be clear, will show clocks for SUBI, BNEZ
- Reality: integer instructions ahead

# Loop Example

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu |
|------|-------------|----|----|------------|------|--------|--------|------|----|
|      |             |    |    | Issue      | Comp | Result |        |      |    |
| 1    | LD F0       | 0  | R1 |            |      |        | Load1  | No   |    |
| 1    | MULTD F4    | F0 | F2 |            |      |        | Load2  | No   |    |
| 1    | SD F4       | 0  | R1 |            |      |        | Load3  | No   |    |
| 2    | LD F0       | 0  | R1 |            |      |        | Store1 | No   |    |
| 2    | MULTD F4    | F0 | F2 |            |      |        | Store2 | No   |    |
| 2    | SD F4       | 0  | R1 |            |      |        | Store3 | No   |    |

*Reservation Stations:*

| Time | Name  | Busy | Op | S1 S2 RS |    |    | Code:          |
|------|-------|------|----|----------|----|----|----------------|
|      |       |      |    | Vj       | Vk | Qj |                |
|      | Add1  | No   |    |          |    |    | LD F0 0 R1     |
|      | Add2  | No   |    |          |    |    | MULTD F4 F0 F2 |
|      | Add3  | No   |    |          |    |    | SD F4 0 R1     |
|      | Mult1 | No   |    |          |    |    | SUBI R1 R1 #8  |
|      | Mult2 | No   |    |          |    |    | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|----|-----|-----|-----|-----|
| 0     | 80 | Fu |    |    |    |    |     |     |     |     |

**Rename Table!**

# Loop Example Cycle 1

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |            | Busy   | Addr | Fu |
|------|-------------|----|----|------------|------------|--------|------|----|
|      |             |    |    | Issue      | CompResult |        |      |    |
| 1    | LD F0       | 0  | R1 | 1          |            | Load1  | Yes  | 80 |
| 1    | MULTD F4    | F0 | F2 |            |            | Load2  | No   |    |
| 1    | SD F4       | 0  | R1 |            |            | Load3  | No   |    |
| 2    | LD F0       | 0  | R1 |            |            | Store1 | No   |    |
| 2    | MULTD F4    | F0 | F2 |            |            | Store2 | No   |    |
| 2    | SD F4       | 0  | R1 |            |            | Store3 | No   |    |

*Reservation Stations:*

| Time | Name  | Busy | Op | S1 | S2 | RS | Code:           |
|------|-------|------|----|----|----|----|-----------------|
|      |       |      |    | Vj | Vk | Qj |                 |
|      | Add1  | No   |    |    |    |    | LD F0 0 R1      |
|      | Add2  | No   |    |    |    |    | MULTD F4 F0 F2  |
|      | Add3  | No   |    |    |    |    | SD F4 0 R1      |
|      | Mult1 | No   |    |    |    |    | SUBI R1 R1 #8   |
|      | Mult2 | No   |    |    |    |    | BNEZ R1 R1 Loop |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|----|----|-----|-----|-----|-----|
| 1     | 80 | Fu | Load1 |    |    |    |     |     |     |     |

# Loop Example Cycle 2

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu |
|------|-------------|----|----|------------|------|--------|--------|------|----|
|      |             |    |    | Issue      | Comp | Result |        |      |    |
| 1    | LD F0       | 0  | R1 | 1          |      |        | Load1  | Yes  | 80 |
| 1    | MULTD F4    | F0 | F2 | 2          |      |        | Load2  | No   |    |
| 1    | SD F4       | 0  | R1 |            |      |        | Load3  | No   |    |
| 2    | LD F0       | 0  | R1 |            |      |        | Store1 | No   |    |
| 2    | MULTD F4    | F0 | F2 |            |      |        | Store2 | No   |    |
| 2    | SD F4       | 0  | R1 |            |      |        | Store3 | No   |    |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1 | S2    | RS    | Code:          |
|------|-------|------|-------|----|-------|-------|----------------|
|      |       |      |       | Vj | Vk    | Qj    |                |
|      | Add1  | No   |       |    |       |       | LD F0 0 R1     |
|      | Add2  | No   |       |    |       |       | MULTD F4 F0 F2 |
|      | Add3  | No   |       |    |       |       | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |    | R(F2) | Load1 | SUBI R1 R1 #8  |
|      | Mult2 | No   |       |    |       |       | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|-------|----|----|-----|-----|-----|-----|
| 2     | 80 | Fu | Load1 | Mult1 |    |    |     |     |     |     |

# Loop Example Cycle 3

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |            | Busy   | Addr | Fu    |
|------|-------------|----|----|------------|------------|--------|------|-------|
|      |             |    |    | Issue      | CompResult |        |      |       |
| 1    | LD F0       | 0  | R1 | 1          |            | Load1  | Yes  | 80    |
| 1    | MULTD F4    | F0 | F2 | 2          |            | Load2  | No   |       |
| 1    | SD F4       | 0  | R1 | 3          |            | Load3  | No   |       |
| 2    | LD F0       | 0  | R1 |            |            | Store1 | Yes  | 80    |
| 2    | MULTD F4    | F0 | F2 |            |            | Store2 | No   | Mult1 |
| 2    | SD F4       | 0  | R1 |            |            | Store3 | No   |       |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1 | S2 | RS | Code:          |
|------|-------|------|-------|----|----|----|----------------|
|      |       |      |       | Vj | Vk | Qj |                |
|      | Add1  | No   |       |    |    |    | LD F0 0 R1     |
|      | Add2  | No   |       |    |    |    | MULTD F4 F0 F2 |
|      | Add3  | No   |       |    |    |    | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |    |    |    | SUBI R1 R1 #8  |
|      | Mult2 | No   |       |    |    |    | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 3     | 80 | Fu | Load1 |    | Mult1 |    |     |     |     |     |

Implicit renaming sets up “DataFlow” graph

# What does this mean physically?



# Loop Example Cycle 4

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |            | Busy   | Addr   | Fu    |
|------|-------------|----|----|------------|------------|--------|--------|-------|
|      |             |    |    | Issue      | CompResult |        |        |       |
| 1    | LD F0       | 0  | R1 | 1          |            | Load1  | Yes 80 |       |
| 1    | MULTD F4    | F0 | F2 | 2          |            | Load2  | No     |       |
| 1    | SD F4       | 0  | R1 | 3          |            | Load3  | No     |       |
| 2    | LD F0       | 0  | R1 |            |            | Store1 | Yes 80 | Mult1 |
| 2    | MULTD F4    | F0 | F2 |            |            | Store2 | No     |       |
| 2    | SD F4       | 0  | R1 |            |            | Store3 | No     |       |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1 | S2    | RS    | Code:          |
|------|-------|------|-------|----|-------|-------|----------------|
|      |       |      |       | Vj | Vk    | Qj    |                |
|      | Add1  | No   |       |    |       |       | LD F0 0 R1     |
|      | Add2  | No   |       |    |       |       | MULTD F4 F0 F2 |
|      | Add3  | No   |       |    |       |       | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |    | R(F2) | Load1 | SUBI R1 R1 #8  |
|      | Mult2 | No   |       |    |       |       | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 4     | 80 | Fu | Load1 |    | Mult1 |    |     |     |     |     |

**Dispatching SUBI Instruction**

# Loop Example Cycle 5

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |            | Busy   | Addr   | Fu    |
|------|-------------|----|----|------------|------------|--------|--------|-------|
|      |             |    |    | Issue      | CompResult |        |        |       |
| 1    | LD F0       | 0  | R1 | 1          |            | Load1  | Yes 80 |       |
| 1    | MULTD F4    | F0 | F2 | 2          |            | Load2  | No     |       |
| 1    | SD F4       | 0  | R1 | 3          |            | Load3  | No     |       |
| 2    | LD F0       | 0  | R1 |            |            | Store1 | Yes 80 | Mult1 |
| 2    | MULTD F4    | F0 | F2 |            |            | Store2 | No     |       |
| 2    | SD F4       | 0  | R1 |            |            | Store3 | No     |       |

*Reservation Stations:*

| Time | Name  | Busy      | Op | S1          | S2 | RS | Code:          |
|------|-------|-----------|----|-------------|----|----|----------------|
|      |       |           |    | Vj          | Vk | Qj |                |
|      | Add1  | No        |    |             |    |    | LD F0 0 R1     |
|      | Add2  | No        |    |             |    |    | MULTD F4 F0 F2 |
|      | Add3  | No        |    |             |    |    | SD F4 0 R1     |
|      | Mult1 | Yes Multd |    | R(F2) Load1 |    |    | SUBI R1 R1 #8  |
|      | Mult2 | No        |    |             |    |    | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 5     | 72 | Fu | Load1 |    | Mult1 |    |     |     |     |     |

And, BNEZ instruction

# Loop Example Cycle 6

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |            | Busy   | Addr   | Fu    |
|------|-------------|----|----|------------|------------|--------|--------|-------|
|      |             |    |    | Issue      | CompResult |        |        |       |
| 1    | LD F0       | 0  | R1 | 1          |            | Load1  | Yes 80 |       |
| 1    | MULTD F4    | F0 | F2 | 2          |            | Load2  | Yes 72 |       |
| 1    | SD F4       | 0  | R1 | 3          |            | Load3  | No     |       |
| 2    | LD F0       | 0  | R1 | 6          |            | Store1 | Yes 80 | Mult1 |
| 2    | MULTD F4    | F0 | F2 |            |            | Store2 | No     |       |
| 2    | SD F4       | 0  | R1 |            |            | Store3 | No     |       |

*Reservation Stations:*

| Time | Name  | Busy      | Op | S1          | S2 | RS | Code:          |
|------|-------|-----------|----|-------------|----|----|----------------|
|      |       |           |    | Vj          | Vk | Qj |                |
|      | Add1  | No        |    |             |    |    | LD F0 0 R1     |
|      | Add2  | No        |    |             |    |    | MULTD F4 F0 F2 |
|      | Add3  | No        |    |             |    |    | SD F4 0 R1     |
|      | Mult1 | Yes Multd |    | R(F2) Load1 |    |    | SUBI R1 R1 #8  |
|      | Mult2 | No        |    |             |    |    | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 6     | 72 | Fu | Load2 |    | Mult1 |    |     |     |     |     |

Notice that F0 never sees Load from location 80

# Loop Example Cycle 7

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |            | Busy   | Addr | Fu    |
|------|-------------|----|----|------------|------------|--------|------|-------|
|      |             |    |    | Issue      | CompResult |        |      |       |
| 1    | LD          | F0 | 0  | R1         | 1          | Load1  | Yes  | 80    |
| 1    | MULTD       | F4 | F0 | F2         | 2          | Load2  | Yes  | 72    |
| 1    | SD          | F4 | 0  | R1         | 3          | Load3  | No   |       |
| 2    | LD          | F0 | 0  | R1         | 6          | Store1 | Yes  | 80    |
| 2    | MULTD       | F4 | F0 | F2         | 7          | Store2 | No   | Mult1 |
| 2    | SD          | F4 | 0  | R1         |            | Store3 | No   |       |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1 | S2    | RS    | Code:          |
|------|-------|------|-------|----|-------|-------|----------------|
|      |       |      |       | Vj | Vk    | Qj    |                |
|      | Add1  | No   |       |    |       |       | LD F0 0 R1     |
|      | Add2  | No   |       |    |       |       | MULTD F4 F0 F2 |
|      | Add3  | No   |       |    |       |       | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |    | R(F2) | Load1 | SUBI R1 R1 #8  |
|      | Mult2 | Yes  | Multd |    | R(F2) | Load2 | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 7     | 72 | Fu | Load2 |    | Mult2 |    |     |     |     |     |

Register file completely detached from iteration 1

# Loop Example Cycle 8

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |            | Busy   | Addr   | Fu    |
|------|-------------|----|----|------------|------------|--------|--------|-------|
|      |             |    |    | Issue      | CompResult |        |        |       |
| 1    | LD F0       | 0  | R1 | 1          |            | Load1  | Yes 80 |       |
| 1    | MULTD F4    | F0 | F2 | 2          |            | Load2  | Yes 72 |       |
| 1    | SD F4       | 0  | R1 | 3          |            | Load3  | No     |       |
| 2    | LD F0       | 0  | R1 | 6          |            | Store1 | Yes 80 | Mult1 |
| 2    | MULTD F4    | F0 | F2 | 7          |            | Store2 | Yes 72 | Mult2 |
| 2    | SD F4       | 0  | R1 | 8          |            | Store3 | No     |       |

*Reservation Stations:*

| Time | Name  | Busy      | Op | S1    | S2    | RS | Code:          |
|------|-------|-----------|----|-------|-------|----|----------------|
|      |       |           |    | Vj    | Vk    | Qj |                |
|      | Add1  | No        |    |       |       |    | LD F0 0 R1     |
|      | Add2  | No        |    |       |       |    | MULTD F4 F0 F2 |
|      | Add3  | No        |    |       |       |    | SD F4 0 R1     |
|      | Mult1 | Yes Multd |    | R(F2) | Load1 |    | SUBI R1 R1 #8  |
|      | Mult2 | Yes Multd |    | R(F2) | Load2 |    | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|-------|----|----|-----|-----|-----|-----|
| 8     | 72 | Fu | Load2 | Mult2 |    |    |     |     |     |     |

First and Second iteration completely overlapped

# What does this mean physically?

HPS



# Loop Example Cycle 9

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |            | Busy | Addr   | Fu           |
|------|-------------|----|----|------------|------------|------|--------|--------------|
|      |             |    |    | Issue      | CompResult |      |        |              |
| 1    | LD          | F0 | 0  | R1         | 1          | 9    | Load1  | Yes 80       |
| 1    | MULTD       | F4 | F0 | F2         | 2          |      | Load2  | Yes 72       |
| 1    | SD          | F4 | 0  | R1         | 3          |      | Load3  | No           |
| 2    | LD          | F0 | 0  | R1         | 6          |      | Store1 | Yes 80 Mult1 |
| 2    | MULTD       | F4 | F0 | F2         | 7          |      | Store2 | Yes 72 Mult2 |
| 2    | SD          | F4 | 0  | R1         | 8          |      | Store3 | No           |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1 | S2    | RS    | Code:          |
|------|-------|------|-------|----|-------|-------|----------------|
|      |       |      |       | Vj | Vk    | Qj    |                |
|      | Add1  | No   |       |    |       |       | LD F0 0 R1     |
|      | Add2  | No   |       |    |       |       | MULTD F4 F0 F2 |
|      | Add3  | No   |       |    |       |       | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |    | R(F2) | Load1 | SUBI R1 R1 #8  |
|      | Mult2 | Yes  | Multd |    | R(F2) | Load2 | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|-------|----|----|-----|-----|-----|-----|
| 9     | 72 | Fu | Load2 | Mult2 |    |    |     |     |     |     |

Load1 completing: who is waiting?  
Note: Dispatching SUBI

# Loop Example Cycle 10

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr   | Fu    |
|------|-------------|----|----|------------|------|--------|--------|--------|-------|
|      |             |    |    | Issue      | Comp | Result |        |        |       |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No     |       |
| 1    | MULTD F4    | F0 | F2 | 2          |      |        | Load2  | Yes 72 |       |
| 1    | SD F4       | 0  | R1 | 3          |      |        | Load3  | No     |       |
| 2    | LD F0       | 0  | R1 | 6          | 10   |        | Store1 | Yes 80 | Mult1 |
| 2    | MULTD F4    | F0 | F2 | 7          |      |        | Store2 | Yes 72 | Mult2 |
| 2    | SD F4       | 0  | R1 | 8          |      |        | Store3 | No     |       |

*Reservation Stations:*

| Time | Name  | Busy | Op                | S1 | S2    | RS    | Code:          |
|------|-------|------|-------------------|----|-------|-------|----------------|
|      |       |      |                   | Vj | Vk    | Qj    |                |
|      | Add1  | No   |                   |    |       |       | LD F0 0 R1     |
|      | Add2  | No   |                   |    |       |       | MULTD F4 F0 F2 |
|      | Add3  | No   |                   |    |       |       | SD F4 0 R1     |
| 4    | Mult1 | Yes  | Multd M[80] R(F2) |    |       |       | SUBI R1 R1 #8  |
|      | Mult2 | Yes  | Multd             |    | R(F2) | Load2 | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 10    | 64 | Fu | Load2 |    | Mult2 |    |     |     |     |     |

Load2 completing: who is waiting?

Note: Dispatching BNEZ

# Loop Example Cycle 11

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu    |
|------|-------------|----|----|------------|------|--------|--------|------|-------|
|      |             |    |    | Issue      | Comp | Result |        |      |       |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No   |       |
| 1    | MULTD F4    | F0 | F2 | 2          |      |        | Load2  | No   |       |
| 1    | SD F4       | 0  | R1 | 3          |      |        | Load3  | Yes  | 64    |
| 2    | LD F0       | 0  | R1 | 6          | 10   | 11     | Store1 | Yes  | 80    |
| 2    | MULTD F4    | F0 | F2 | 7          |      |        | Store2 | Yes  | 72    |
| 2    | SD F4       | 0  | R1 | 8          |      |        | Store3 | No   | Mult2 |

*Reservation Stations:*

| Time | Name  | Busy | Op                | S1 | S2 | RS | Code:          |
|------|-------|------|-------------------|----|----|----|----------------|
|      |       |      |                   | Vj | Vk | Qj |                |
|      | Add1  | No   |                   |    |    |    | LD F0 0 R1     |
|      | Add2  | No   |                   |    |    |    | MULTD F4 F0 F2 |
|      | Add3  | No   |                   |    |    |    | SD F4 0 R1     |
| 3    | Mult1 | Yes  | Multd M[80] R(F2) |    |    |    | SUBI R1 R1 #8  |
| 4    | Mult2 | Yes  | Multd M[72] R(F2) |    |    |    | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 11    | 64 | Fu | Load3 |    | Mult2 |    |     |     |     |     |

Next load in sequence

# Loop Example Cycle 12

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu    |
|------|-------------|----|----|------------|------|--------|--------|------|-------|
|      |             |    |    | Issue      | Comp | Result |        |      |       |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No   |       |
| 1    | MULTD F4    | F0 | F2 | 2          |      |        | Load2  | No   |       |
| 1    | SD F4       | 0  | R1 | 3          |      |        | Load3  | Yes  | 64    |
| 2    | LD F0       | 0  | R1 | 6          | 10   | 11     | Store1 | Yes  | 80    |
| 2    | MULTD F4    | F0 | F2 | 7          |      |        | Store2 | Yes  | 72    |
| 2    | SD F4       | 0  | R1 | 8          |      |        | Store3 | No   | Mult2 |

*Reservation Stations:*

| Time | Name  | Busy | Op                | S1 | S2 | RS | Code:          |
|------|-------|------|-------------------|----|----|----|----------------|
|      |       |      |                   | Vj | Vk | Qj |                |
|      | Add1  | No   |                   |    |    |    | LD F0 0 R1     |
|      | Add2  | No   |                   |    |    |    | MULTD F4 F0 F2 |
|      | Add3  | No   |                   |    |    |    | SD F4 0 R1     |
| 2    | Mult1 | Yes  | Multd M[80] R(F2) |    |    |    | SUBI R1 R1 #8  |
| 3    | Mult2 | Yes  | Multd M[72] R(F2) |    |    |    | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 12    | 64 | Fu | Load3 |    | Mult2 |    |     |     |     |     |

Why not issue third multiply?

# Loop Example Cycle 13

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu |
|------|-------------|----|----|------------|------|--------|--------|------|----|
|      |             |    |    | Issue      | Comp | Result |        |      |    |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No   |    |
| 1    | MULTD F4    | F0 | F2 | 2          |      |        | Load2  | No   |    |
| 1    | SD F4       | 0  | R1 | 3          |      |        | Load3  | Yes  | 64 |
| 2    | LD F0       | 0  | R1 | 6          | 10   | 11     | Store1 | Yes  | 80 |
| 2    | MULTD F4    | F0 | F2 | 7          |      |        | Store2 | Yes  | 72 |
| 2    | SD F4       | 0  | R1 | 8          |      |        | Store3 | No   |    |

*Reservation Stations:*

| Time | Name  | Busy | Op                | S1 | S2 | RS | Code:          |
|------|-------|------|-------------------|----|----|----|----------------|
|      |       |      |                   | Vj | Vk | Qj |                |
|      | Add1  | No   |                   |    |    |    | LD F0 0 R1     |
|      | Add2  | No   |                   |    |    |    | MULTD F4 F0 F2 |
|      | Add3  | No   |                   |    |    |    | SD F4 0 R1     |
| 1    | Mult1 | Yes  | Multd M[80] R(F2) |    |    |    | SUBI R1 R1 #8  |
| 2    | Mult2 | Yes  | Multd M[72] R(F2) |    |    |    | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 13    | 64 | Fu | Load3 |    | Mult2 |    |     |     |     |     |

# Loop Example Cycle 14

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu    |
|------|-------------|----|----|------------|------|--------|--------|------|-------|
|      |             |    |    | Issue      | Comp | Result |        |      |       |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No   |       |
| 1    | MULTD F4    | F0 | F2 | 2          | 14   |        | Load2  | No   |       |
| 1    | SD F4       | 0  | R1 | 3          |      |        | Load3  | Yes  | 64    |
| 2    | LD F0       | 0  | R1 | 6          | 10   | 11     | Store1 | Yes  | 80    |
| 2    | MULTD F4    | F0 | F2 | 7          |      |        | Store2 | Yes  | 72    |
| 2    | SD F4       | 0  | R1 | 8          |      |        | Store3 | No   | Mult2 |

*Reservation Stations:*

| Time | Name  | Busy | Op                | S1 | S2 | RS | Code:          |
|------|-------|------|-------------------|----|----|----|----------------|
|      |       |      |                   | Vj | Vk | Qj |                |
|      | Add1  | No   |                   |    |    |    | LD F0 0 R1     |
|      | Add2  | No   |                   |    |    |    | MULTD F4 F0 F2 |
|      | Add3  | No   |                   |    |    |    | SD F4 0 R1     |
| 0    | Mult1 | Yes  | Multd M[80] R(F2) |    |    |    | SUBI R1 R1 #8  |
| 1    | Mult2 | Yes  | Multd M[72] R(F2) |    |    |    | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 14    | 64 | Fu | Load3 |    | Mult2 |    |     |     |     |     |

- Mult1 completing. Who is waiting?

# Loop Example Cycle 15

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu    |
|------|-------------|----|----|------------|------|--------|--------|------|-------|
|      |             |    |    | Issue      | Comp | Result |        |      |       |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No   |       |
| 1    | MULTD F4    | F0 | F2 | 2          | 14   | 15     | Load2  | No   |       |
| 1    | SD F4       | 0  | R1 | 3          |      |        | Load3  | Yes  | 64    |
| 2    | LD F0       | 0  | R1 | 6          | 10   | 11     | Store1 | Yes  | 80    |
| 2    | MULTD F4    | F0 | F2 | 7          | 15   |        | Store2 | Yes  | 72    |
| 2    | SD F4       | 0  | R1 | 8          |      |        | Store3 | No   | Mult2 |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1    | S2    | RS | Code:          |
|------|-------|------|-------|-------|-------|----|----------------|
|      |       |      |       | Vj    | Vk    | Qj |                |
|      | Add1  | No   |       |       |       |    | LD F0 0 R1     |
|      | Add2  | No   |       |       |       |    | MULTD F4 F0 F2 |
|      | Add3  | No   |       |       |       |    | SD F4 0 R1     |
|      | Mult1 | No   |       |       |       |    | SUBI R1 R1 #8  |
| 0    | Mult2 | Yes  | Multd | M[72] | R(F2) |    | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 15    | 64 | Fu | Load3 |    | Mult2 |    |     |     |     |     |

- Mult2 completing. Who is waiting?

# Loop Example Cycle 16

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu |
|------|-------------|----|----|------------|------|--------|--------|------|----|
|      |             |    |    | Issue      | Comp | Result |        |      |    |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No   |    |
| 1    | MULTD F4    | F0 | F2 | 2          | 14   | 15     | Load2  | No   |    |
| 1    | SD F4       | 0  | R1 | 3          |      |        | Load3  | Yes  | 64 |
| 2    | LD F0       | 0  | R1 | 6          | 10   | 11     | Store1 | Yes  | 80 |
| 2    | MULTD F4    | F0 | F2 | 7          | 15   | 16     | Store2 | Yes  | 72 |
| 2    | SD F4       | 0  | R1 | 8          |      |        | Store3 | No   |    |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1 | S2    | RS    | Code:          |
|------|-------|------|-------|----|-------|-------|----------------|
|      |       |      |       | Vj | Vk    | Qj    |                |
|      | Add1  | No   |       |    |       |       | LD F0 0 R1     |
|      | Add2  | No   |       |    |       |       | MULTD F4 F0 F2 |
|      | Add3  | No   |       |    |       |       | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |    | R(F2) | Load3 | SUBI R1 R1 #8  |
|      | Mult2 | No   |       |    |       |       | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 16    | 64 | Fu | Load3 |    | Mult1 |    |     |     |     |     |

# Loop Example Cycle 17

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu |
|------|-------------|----|----|------------|------|--------|--------|------|----|
|      |             |    |    | Issue      | Comp | Result |        |      |    |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No   |    |
| 1    | MULTD F4    | F0 | F2 | 2          | 14   | 15     | Load2  | No   |    |
| 1    | SD F4       | 0  | R1 | 3          |      |        | Load3  | Yes  | 64 |
| 2    | LD F0       | 0  | R1 | 6          | 10   | 11     | Store1 | Yes  | 80 |
| 2    | MULTD F4    | F0 | F2 | 7          | 15   | 16     | Store2 | Yes  | 72 |
| 2    | SD F4       | 0  | R1 | 8          |      |        | Store3 | Yes  | 64 |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1 | S2    | RS    | Code:          |
|------|-------|------|-------|----|-------|-------|----------------|
|      |       |      |       | Vj | Vk    | Qj    |                |
|      | Add1  | No   |       |    |       |       | LD F0 0 R1     |
|      | Add2  | No   |       |    |       |       | MULTD F4 F0 F2 |
|      | Add3  | No   |       |    |       |       | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |    | R(F2) | Load3 | SUBI R1 R1 #8  |
|      | Mult2 | No   |       |    |       |       | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 17    | 64 | Fu | Load3 |    | Mult1 |    |     |     |     |     |

# Loop Example Cycle 18

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu    |
|------|-------------|----|----|------------|------|--------|--------|------|-------|
|      |             |    |    | Issue      | Comp | Result |        |      |       |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No   |       |
| 1    | MULTD F4    | F0 | F2 | 2          | 14   | 15     | Load2  | No   |       |
| 1    | SD F4       | 0  | R1 | 3          | 18   |        | Load3  | Yes  | 64    |
| 2    | LD F0       | 0  | R1 | 6          | 10   | 11     | Store1 | Yes  | 80    |
| 2    | MULTD F4    | F0 | F2 | 7          | 15   | 16     | Store2 | Yes  | 72    |
| 2    | SD F4       | 0  | R1 | 8          |      |        | Store3 | Yes  | 64    |
|      |             |    |    |            |      |        |        |      | Mult1 |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1 | S2    | RS    | Code:          |
|------|-------|------|-------|----|-------|-------|----------------|
|      |       |      |       | Vj | Vk    | Qj    |                |
|      | Add1  | No   |       |    |       |       | LD F0 0 R1     |
|      | Add2  | No   |       |    |       |       | MULTD F4 F0 F2 |
|      | Add3  | No   |       |    |       |       | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |    | R(F2) | Load3 | SUBI R1 R1 #8  |
|      | Mult2 | No   |       |    |       |       | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 18    | 64 | Fu | Load3 |    | Mult1 |    |     |     |     |     |

# Loop Example Cycle 19

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu |
|------|-------------|----|----|------------|------|--------|--------|------|----|
|      |             |    |    | Issue      | Comp | Result |        |      |    |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No   |    |
| 1    | MULTD F4    | F0 | F2 | 2          | 14   | 15     | Load2  | No   |    |
| 1    | SD F4       | 0  | R1 | 3          | 18   | 19     | Load3  | Yes  | 64 |
| 2    | LD F0       | 0  | R1 | 6          | 10   | 11     | Store1 | No   |    |
| 2    | MULTD F4    | F0 | F2 | 7          | 15   | 16     | Store2 | Yes  | 72 |
| 2    | SD F4       | 0  | R1 | 8          | 19   |        | Store3 | Yes  | 64 |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1 | S2    | RS    | Code:          |
|------|-------|------|-------|----|-------|-------|----------------|
|      |       |      |       | Vj | Vk    | Qj    |                |
|      | Add1  | No   |       |    |       |       | LD F0 0 R1     |
|      | Add2  | No   |       |    |       |       | MULTD F4 F0 F2 |
|      | Add3  | No   |       |    |       |       | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |    | R(F2) | Load3 | SUBI R1 R1 #8  |
|      | Mult2 | No   |       |    |       |       | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 19    | 64 | Fu | Load3 |    | Mult1 |    |     |     |     |     |

# Loop Example Cycle 20

*Instruction status:*

| ITER | Instruction | j  | k  | Exec Write |      |        | Busy   | Addr | Fu    |
|------|-------------|----|----|------------|------|--------|--------|------|-------|
|      |             |    |    | Issue      | Comp | Result |        |      |       |
| 1    | LD F0       | 0  | R1 | 1          | 9    | 10     | Load1  | No   |       |
| 1    | MULTD F4    | F0 | F2 | 2          | 14   | 15     | Load2  | No   |       |
| 1    | SD F4       | 0  | R1 | 3          | 18   | 19     | Load3  | Yes  | 64    |
| 2    | LD F0       | 0  | R1 | 6          | 10   | 11     | Store1 | No   |       |
| 2    | MULTD F4    | F0 | F2 | 7          | 15   | 16     | Store2 | No   |       |
| 2    | SD F4       | 0  | R1 | 8          | 19   | 20     | Store3 | Yes  | 64    |
|      |             |    |    |            |      |        |        |      | Mult1 |

*Reservation Stations:*

| Time | Name  | Busy | Op    | S1 | S2    | RS    | Code:          |
|------|-------|------|-------|----|-------|-------|----------------|
|      |       |      |       | Vj | Vk    | Qj    |                |
|      | Add1  | No   |       |    |       |       | LD F0 0 R1     |
|      | Add2  | No   |       |    |       |       | MULTD F4 F0 F2 |
|      | Add3  | No   |       |    |       |       | SD F4 0 R1     |
|      | Mult1 | Yes  | Multd |    | R(F2) | Load3 | SUBI R1 R1 #8  |
|      | Mult2 | No   |       |    |       |       | BNEZ R1 Loop   |

*Register result status*

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|-----|-----|-----|-----|
| 20    | 64 | Fu | Load3 |    | Mult1 |    |     |     |     |     |

# Why can Tomasulo overlap iterations of loops?

- Register renaming
  - ▶ Multiple iterations use different physical destinations for registers (dynamic loop unrolling).
  - ▶ Replace static register names from code with dynamic register “pointers”
  - ▶ Effectively increases size of register file
  - ▶ Permit instruction issue to advance past integer control flow operations.
- Crucial: integer unit must “get ahead” of floating point unit so that we can issue multiple iterations  
⇒ Branch Prediction

Other idea: Tomasulo building “DataFlow” graph.



# Unrolled Loop That Minimizes Stalls

|    |          |                           |
|----|----------|---------------------------|
| 1  | Loop: LD | F0 , 0 (R1)               |
| 2  | LD       | F6 , -8 (R1)              |
| 3  | LD       | F10 , -16 (R1)            |
| 4  | LD       | F14 , -24 (R1)            |
| 5  | ADDD     | F4 , F0 , F2              |
| 6  | ADDD     | F8 , F6 , F2              |
| 7  | ADDD     | F12 , F10 , F2            |
| 8  | ADDD     | F16 , F14 , F2            |
| 9  | SD       | 0 (R1) , F4               |
| 10 | SD       | -8 (R1) , F8              |
| 11 | SD       | -16 (R1) , F12            |
| 12 | SUBI     | R1 , R1 , #32             |
| 13 | BNEZ     | R1 , LOOP                 |
| 14 | SD       | 8 (R1) , F16 ; 8-32 = -24 |

14 clock cycles, or 3.5 per iteration

Used new registers => register renaming!

# Why issue in-order?

- In-order issue permits us to analyze data flow of program
  - ▶ Know which results flow to which **subsequent** instructions
  - ▶ If we issued out-of-order, we would confuse RAW and WAR hazards!
- This idea works perfectly well “in principle” with **multiple instructions issued per clock**:
  - ▶ Need to multi-port “rename table” and be able to rename a sequence of instructions together
  - ▶ Need to be able to issue to multiple reservation stations in a single cycle.
  - ▶ Need to have 2x number of read ports and 1x number of write ports in register file.
- In-order issue can be serious bottleneck when issuing **multiple instructions per clock-cycle**



# Now what about exceptions???

- Out-of-order commit really messes up our chance to get precise exceptions!
  - ▶ Register file contains results from later instructions while earlier ones have not completed yet.
  - ▶ What if need to cause exception on one of those early instructions??
- Need to “rollback” register file to consistent state:
  - ▶ Recall: “precise” interrupt means that there is some PC such that:
    - all instructions before have committed results
    - and none after have committed results.
- Technique for precise exceptions: *in-order completion or commit*
  - ▶ Must commit instruction results in same order as issue



# Limits to the Instruction Level Parallelism

- Branches
- Exceptions
  - ▶ (non-)Precise: operand integrity for the exception handler
  - ▶ (non-)Exact: handler modifications are seen by instructions after the exception



# Tomasulo Drawbacks

- Complexity
  - ▶ Large amount of hardware
  - ▶ delays of 360/91, MIPS 10000, IBM 620?
- Many associative stores (CDB) at high speed
- Performance limited by Common Data Bus
  - ▶ Multiple CDBs => more FU logic for parallel assoc stores

H  
P  
P  
S

# Summary #1

---

- HW exploiting ILP

- ▶ Works when can't know dependence at compile time.
- ▶ Code for one machine runs well on another

Key idea of Scoreboard: Allow instructions behind stall to proceed

(Decode => Issue instr & read operands)

- ▶ Enables out-of-order execution => out-of-order completion
- ▶ ID stage checked both for structural & data dependencies
- ▶ Original version didn't handle forwarding (CDC 6600 in 1963)
- ▶ No automatic register renaming
- ▶ Pipeline stalls for WAR and WAW hazards.
- ▶ Are these fundamental limitations??? (No)



# Summary #2

- Reservations stations: *renaming* to larger set of registers + buffering source operands
  - ▶ Prevents registers as bottleneck
  - ▶ Avoids WAR, WAW hazards of Scoreboard
  - ▶ Allows loop unrolling in HW
- Not limited to basic blocks  
(integer units gets ahead, beyond branches)
- Helps cache misses as well
- Lasting Contributions
  - ▶ Dynamic scheduling
  - ▶ Register renaming
  - ▶ Load/store disambiguation
- 360/91 descendants are Pentium II; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264

H  
P  
S

# Problem: “Fetch” unit

H  
P  
T  
P  
S



Instruction fetch decoupled from execution

Often issue logic (+ rename) included with Fetch

# Branches must be resolved quickly for loop overlap!

- In our loop-unrolling example, we relied on the fact that branches were under control of “fast” integer unit in order to get overlap!

| Loop: | LD    | F0 | 0    | R1 |
|-------|-------|----|------|----|
|       | MULTD | F4 | F0   | F2 |
|       | SD    | F4 | 0    | R1 |
|       | SUBI  | R1 | R1   | #8 |
|       | BNEZ  | R1 | Loop |    |

- What happens if branch depends on result of multd??
  - We completely lose all of our advantages!
  - Need to be able to “predict” branch outcome.
  - If we were to predict that branch was taken, this would be right most of the time.
- Problem **much** worse for superscalar machines!

# Prediction: Branches, Dependencies, Data

- Prediction has become essential to getting good performance from scalar instruction streams.
- We will discuss predicting branches. However, architects are now predicting everything:  
*data dependencies, actual data, and results of groups of instructions:*
  - ▶ At what point does computation become a probabilistic operation + verification?
  - ▶ We are pretty close with control hazards already...
- Why does prediction work?
  - ▶ Underlying algorithm has regularities.
  - ▶ Data that is being operated on has regularities.
  - ▶ Instruction sequence has redundancies that are artifacts of way that humans/compilers think about problems.
- Prediction ⇒ Compressible information streams?



# Hardware-Based Speculation

- When an instruction is no longer speculative, we allow it to update the register file or memory (**instruction commit**).
- The key idea behind speculation is to allow instructions to execute **out-of-order** but to force them to commit **in-order** and to prevent any irrevocable action (such as updating state or taking an exception) until an instruction commits.
- **Reorder buffer (ROB)** to hold the results of instructions that have completed execution but have not committed or to pass results among instructions that may be speculated.



# Relationship between precise interrupts and speculation

- Speculation is a form of guessing.
- Important for branch prediction:
  - ▶ Need to “take our best shot” at predicting branch direction.
  - ▶ If we issue multiple instructions per cycle, lose lots of potential instructions otherwise:
    - Consider 4 instructions per cycle
    - If take single cycle to decide on branch, waste from 4 - 7 instruction slots!
- If we speculate and are wrong, need to back up and restart execution to point at which we predicted incorrectly:
  - ▶ This is exactly same as precise exceptions!
- Technique for both precise interrupts/exceptions and speculation: *in-order completion or commit*

H  
P  
P  
S

# What about Precise Exceptions/Interrupts?

- Both Scoreboard and Tomasulo have:
  - ▶ *In-order issue, out-of-order execution, out-of-order completion*
- To allow for precise interrupts there is the need for a way to resynchronize execution with instruction stream (i.e. with issue-order)
  - ▶ Easiest way is with *in-order completion (i.e. reorder buffer)*
  - ▶ Other Techniques (Smith paper): Future File, History Buffer



# HW support for precise interrupts

- Concept of Reorder Buffer (ROB):
  - ▶ Holds instructions in FIFO order, exactly as they were issued
    - Each ROB entry contains PC, dest reg, result, exception status
  - ▶ When instructions complete, results placed into ROB
    - Supplies operands to other instruction between execution complete & commit  $\Rightarrow$  more registers like RS
    - Tag results with ROB buffer number instead of reservation station
  - ▶ Instructions **commit**  $\Rightarrow$  values at head of ROB placed in registers
  - ▶ As a result, easy to undo speculated instructions on mispredicted branches or on exceptions



# Reorder buffer



# HW support for more ILP

- *Reorder buffer also useful for speculation*
- *Speculation*: Allow an instruction *without* any consequences (including exceptions) if branch is not actually taken (“HW undo”); called “boosting”
- Combine branch prediction to choose which instructions to execute with dynamic scheduling to execute before branches resolved
- Separate *speculative* bypassing of results from real bypassing of results
  - ▶ When instruction no longer speculative, write boosted results (instruction commit) or discard boosted results
  - ▶ Execute out-of-order but commit in-order to prevent irrevocable action (update state or exception) until instruction commits



# ReOrder Buffer (ROB)

- Buffer to hold the results of instructions that have finished execution but non committed
- Buffer to pass results among instructions that can be speculated
- Support *out-of-order* execution but *in-order* commit
- Speculative Tomasulo Algorithm with ROB:
  - ▶ Pointers are directed toward ROB slot.
  - ▶ A register or memory is updated only when the instruction reaches the head of ROB (that is until the instruction is no longer speculative).



# ReOrder Buffer (ROB)

- ROB completely replaces the store buffers
- The renaming function of Reservation Stations is replaced by ROB
- Reservation Stations now used only to buffer instructions and operands to Fus (to reduce structural hazards).
- Pointers now are directed toward ROB slots
- Processors with ROB can dynamically execute while maintaining precise interrupt model because instruction commit happens in order.

H  
P  
P  
S

# ReOrder Buffer

- 4 fields:
  - ▶ **Instruction Type**
  - ▶ **Destination:**
    - RF number (for load and ALU ops)
    - Memory address (for stores)
  - ▶ **Value:** To hold the value of instruction results until the instruction commits
  - ▶ **Ready:** Indicates that the instruction has completed execution and the value is ready



# ReOrder Buffer

- Reorder buffer can be operand source  $\Rightarrow$  More registers like reservation station
- Use reorder buffer number instead of reservation station when execution completes
- Supplies operands between execution complete & commit
- Once operand commits, result is put into register
- Instructions commit
- As a result, its easy to undo speculated instructions on mispredicted branches or on exceptions



# ReOrder Buffer (ROB)

- Originally (1988) introduced to solve *precise interrupt* problem; generalized to grant sequential consistency;
- Basically, ROB is a *circular buffer* with *head pointer* (indicating next free entry) and *tail pointer* indicating the instruction that will *commit* (leaving ROB) *first*.

H  
P  
P  
S

# ReOrder Buffer (ROB)

- Instructions are written in ROB in *strict program order* - when an instruction is issued, an entry is allocated to it *in sequence*. Entry indicates *status of instruction*: *issued (i)*, *in execution (x)*, *finished (f)* (+ *other items!*),
- An instruction can **commit (retire)** iff
  1. It has finished, and
  2. All previous instructions have already retired.

# ReOrder Buffer (ROB)



# ReOrder Buffer (ROB)

- Only retiring instructions can *complete*, i.e., update architectural registers and memory;
- ROB can support *both speculative execution and exception handling*
- Speculative execution: each ROB entry is extended to include a *speculative status field*, indicating whether instruction has been executed speculatively;
- Finished instructions *cannot retire as long as they are in speculative status*.
- Interrupt handling: Exceptions generated in connection with instruction execution are made *precise* by accepting exception request only when instruction becomes “next to retire” (exceptions are processed *in order*)



# What are the hardware complexities with reorder buffer (ROB)?

IPS



How do you find the latest version of a register?

- Need associative comparison network
- Could use future file or just use the register result status buffer to track which specific reorder buffer has received the value

Need as many ports on ROB as register file

# Hardware-based Speculation

- Outcome of branches is *speculated* and program is *executed as if* speculation was correct (simple dynamic scheduling would only fetch and decode, not execute!)
- Mechanisms are necessary to handle incorrect speculation - *hardware speculation* extend dynamic scheduling.

H  
P  
S

# HW-based Speculation

- HW-based Speculation combines 3 ideas:
  - ▶ **Dynamic Branch Prediction** to choose which instruction to execute
  - ▶ **Speculation** to execute instructions before control dependences are resolved
  - ▶ **Dynamic Scheduling** supporting out-of-order execution but in-order commit to prevent any irrevocable actions (such as register update or taking exception) until an instruction commits

H  
P  
P  
S

# Hardware-based Speculation

- Issue an instruction dependent on branch before the branch result is known.
- Commit is always made *in order*.
- Commit of a speculative instruction is made only when the branch outcome is known.
- The same holds for exceptions (synchronous or asynchronous) deviations of control flow
- Follows the predicted flow of data values to choose when to execute an instruction;
- Essentially, a ***data flow*** mode of execution: instructions execute as soon as their operands are available.



# Hardware-based Speculation

- Adopted in PowerPc 603/604, MIPS R10000/R12000, Pentium II/III/4, AMD K5/K6 Athlon.
- Extends hardware support for Tomasulo algorithm: to support speculation, *commit* phase is separated from execution phase, *reorder buffer* is introduced.



# Hardware-based Speculation

- Basic Tomasulo algorithm: instruction writes result in register file, where subsequent instructions find it:
- With speculation, results are written only when instruction ***commits*** - and it is known whether the instruction had to be executed.
- Key idea: executing ***out of order***, committing ***in order***.
- *Boosting*

H  
P  
P  
S

# Four Steps of Speculative Tomasulo Algorithm

## 1. Issue—get instruction from FP Op Queue

If reservation station **and reorder buffer slot** free, issue instr & send operands **& reorder buffer no.** for destination (this stage sometimes called “dispatch”)

## 2. Execution—operate on operands (EX)

When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (sometimes called “issue”)

## 3. Write result—finish execution (WB)

Write on Common Data Bus to all awaiting FUs  
**& reorder buffer**; mark reservation station available.

## 4. Commit—update register with reorder result

When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called “graduation”)



# Speculative Tomasulo's Algorithm

- Tomasulo's “Boosting” needs a buffer for uncommitted results (reorder buffer).

Each entry in ROB contains **four fields**:

- **Instruction type field** - indicates whether instruction is a branch (no destination result), a store (has memory address destination), or a load/ALU (register destination)
- **Destination field**: supplies register number (for loads and ALU instructions) or memory address (for stores) where results should be written;
- **Value field** (used to hold value of result until instruction commits)
- **Ready field**: indicates that instruction has completed execution, value is ready



# ReOrder Buffer Extension

- ROB completely replaces store buffers: stores execute in two steps, the second one when instruction commits;
- Renaming function of reservation stations completely replaced by ROB;
- Reservation stations now only queue operations (and operands) to Fus between the time they issue and the time they begin execution;
- Results are tagged with ROB entry number rather than with RS number  $\Rightarrow$  ROB entry assigned to instruction must be tracked in the reservation stations.



# ReOrder Buffer Extension

- All instructions ***excluding incorrectly predicted branches*** (or incorrectly speculated loads) commit when reaching head of ROB;
- Incorrectly predicted branches reaches head of ROB  $\Rightarrow$  wrong speculation is indicated, ROB is flushed, execution restarts at correct successor of branch.
- Speculative actions are easily undone.
- Processors with ROB can dynamically execute while maintaining a precise interrupt model:
  - ▶ if instruction  $I_j$  causes interrupt, CPU waits until  $I_j$  reaches the head of the ROB and takes the interrupt, flushing all other pending instructions.



# Steps of Speculative Tomasulo's Algorithm (1)

1. **Issue:** get an instruction from the queue. RS && ROB must have a slot free. When Rob is full  $\Rightarrow$  Stop issuing instructions until an entry is free.  
Dispatch the operation indicating in which ROB slot it must write
2. **Execution:** When both operands ready, execute. If not watch in the CDB.
3. **Write Result:** Write on CDB and on ROB as well as to any RS waiting for this result.  
Mark the RS available.

H  
P  
S

# Steps of Speculative Tomasulo's Algorithm (2)

## 4. Commit: 3 different possible sequences:

1. **Normal commit**: instruction reaches the head of the ROB, result is present in the buffer. Result is stored in the register, instruction is removed from ROB;
2. **Store commit**: as above, but memory rather than register is updated;
3. **Instruction is a branch with incorrect prediction**: it indicates that speculation was wrong. ROB is flushed (“graduation”), execution restarts at correct successor of the branch.  
If the branch was correctly predicted, branch is finished



# Exception Handling

- Not recognize the exception until it is ready to commit
- If speculated instruction raises an exception record in the ROB
- If branch misprediction and the instruction should not have been executed flush exception
- If the instruction reaches the head of ROB and it is no longer speculative, the exception should be taken

H  
P  
P  
S

# Tomasulo With Reorder buffer

UTP



# Tomasulo With Reorder buffer

HIPS



# Tomasulo With Reorder buffer:

Done?

FPS

FP Op Queue



Reorder Buffer



Newest  
Oldest

Dest

|   |      |               |
|---|------|---------------|
| 2 | ADDD | R (F4) , ROB1 |
|   |      |               |

FP adders

Dest

|   |      |              |
|---|------|--------------|
| 3 | DIVD | ROB2, R (F6) |
|   |      |              |

FP multipliers

To Memory

from Memory

Dest

|   |       |
|---|-------|
| 1 | 10+R2 |
|   |       |

Reservation Stations

# Tomasulo With Reorder buffer:

Done?

FPS

Reorder Buffer



|     |                  |   | ROB7 |
|-----|------------------|---|------|
| F0  | ADDD F0, F4, F6  | N | ROB6 |
| F4  | LD F4, 0 (R3)    | N | ROB5 |
| --  | BNE F2, <...>    | N | ROB4 |
| F2  | DIVD F2, F10, F6 | N | ROB3 |
| F10 | ADDD F10, F4, F0 | N | ROB2 |
| F0  | LD F0, 10 (R2)   | N | ROB1 |

Newest



Oldest

Registers

Dest

|   |      |             |
|---|------|-------------|
| 2 | ADDD | R(F4), ROB1 |
| 6 | ADDD | ROB5, R(F6) |
|   |      |             |

Dest

|   |      |             |
|---|------|-------------|
| 3 | DIVD | ROB2, R(F6) |
|   |      |             |
|   |      |             |

Reservation Stations

FP adders

FP multipliers

To  
Memory

from  
Memory

Dest

|   |       |
|---|-------|
| 1 | 10+R2 |
| 6 | 0+R3  |
|   |       |

# Tomasulo With Reorder buffer:

FPS



# Tomasulo With Reorder buffer

HPS



# Tomasulo With Reorder buffer

HIPS

FP Op Queue



Reorder Buffer

|     |        |                  | Done? |
|-----|--------|------------------|-------|
| --  | M[10]  | ST 0 (R3) , F4   | Y     |
| F0  | <val2> | ADDD F0, F4, F6  | Ex    |
| F4  | M[10]  | LD F4, 0 (R3)    | Y     |
| --  |        | BNE F2, <...>    | N     |
| F2  |        | DIVD F2, F10, F6 | N     |
| F10 |        | ADDD F10, F4, F0 | N     |
| F0  |        | LD F0, 10 (R2)   | N     |

Newest



Oldest

Registers

Dest

|   |      |               |
|---|------|---------------|
| 2 | ADDD | R (F4) , ROB1 |
|   |      |               |

FP adders

Reservation Stations

Dest

|   |      |               |
|---|------|---------------|
| 3 | DIVD | ROB2 , R (F6) |
|   |      |               |

FP multipliers

To Memory

from Memory

Dest

|   |       |
|---|-------|
| 1 | 10+R2 |
|   |       |

# Tomasulo With Reorder buffer

HIPS



## Memory Disambiguation: Sorting out RAW Hazards in memory

- Question: Given a load that follows a store in program order, are the two related?
  - ▶ (Alternatively: is there a RAW hazard between the store and the load)?  
Eg:      st      0 (R2) , R5  
              ld      R6 , 0 (R3)
- Can we go ahead and start the load early?
  - ▶ Store address could be delayed for a long time by some calculation that leads to R2 (divide?).
  - ▶ We might want to issue/begin execution of both operations in same cycle.
  - ▶ Today: Answer is that we are not allowed to start load until we know that address  $0(R2) \neq 0(R3)$
  - ▶ Next Week: We might guess at whether or not they are dependent (called “dependence speculation”) and use reorder buffer to fixup if we are wrong.



# Hardware Support for Memory Disambiguation

- Need buffer to keep track of all outstanding stores to memory, in program order.
  - ▶ Keep track of address (when becomes available) and value (when becomes available)
  - ▶ FIFO ordering: will retire stores from this buffer in program order
- When issuing a load, record current head of store queue (know which stores are ahead of you).
- When have address for load, check store queue:
  - ▶ If *any* store prior to load is waiting for its address, stall load.
  - ▶ If load address matches earlier store address (associative lookup), then we have a *memory-induced RAW hazard*:
    - store value available  $\Rightarrow$  return value
    - store value not available  $\Rightarrow$  return ROB number of source
  - ▶ Otherwise, send out request to memory
- Actual stores commit in order, so no worry about WAR/WAW hazards through memory.



# Memory Disambiguation

HIPS



# Relationship between precise interrupts and speculation

- Speculation is a form of guessing
  - ▶ Branch prediction, data prediction
  - ▶ If we speculate and are wrong, need to back up and restart execution to point at which we predicted incorrectly
  - ▶ This is exactly same as precise exceptions!
- Branch prediction is a very important!
  - ▶ Need to “take our best shot” at predicting branch direction.
  - ▶ If we issue multiple instructions per cycle, lose lots of potential instructions otherwise:
    - Consider 4 instructions per cycle
    - If take single cycle to decide on branch, waste from 4 - 7 instruction slots!
- Technique for both precise interrupts/exceptions and speculation: *in-order completion or commit*
  - ▶ This is why reorder buffers in all new processors

H  
P  
S