

# Reduction of Data Hazards Stalls with Dynamic Scheduling

i.e. Current pipeline: In-order  
Single issue with FP support

- • So far we have dealt with data (RAW) hazards in instruction pipelines by:

- Result forwarding (register bypassing) to reduce or eliminate stalls needed to prevent RAW hazards as a result of true data dependence.
- Hazard detection hardware to stall the pipeline starting with the instruction that uses the result. *i.e. forward + stall (if needed)*
- Compiler-based static pipeline scheduling to separate the dependent instructions minimizing actual hazard-prevention stalls in scheduled code.
  - Loop unrolling to increase basic block size: More ILP exposed.

Dynamic = Done at runtime by CPU

i.e. Start of instruction execution is not in program order

- • Dynamic scheduling: (out-of-order execution)

- – Uses a CPU-based mechanism to reorder or rearrange instruction execution order to reduce stalls dynamically at runtime.
  - Better dynamic exploitation of instruction-level parallelism (ILP).
- + – Enables handling some cases where instruction dependencies are unknown at compile time (ambiguous dependencies).
- – Similar to the other pipeline optimizations above, a dynamically scheduled processor cannot remove true data dependencies, but tries to avoid or reduce stalling.

## Fully In-Order Integer Pipeline:



All instructions go through all pipelines stages in program order

Date Hazards: RAW: Possible WAR, WAW: Impossible

## In-Order Pipeline with FP Support:



Up to start of EX in program order

Can be reached out of program order

Date Hazards: RAW, WAW: Possible WAR: Impossible

## Out-of-Order Execution Pipelined Processor (Dynamically Scheduled):



In program order

Can be reached out of program order  
Including:

- Read Operands (Part of ID)
- Start of Execution

Date Hazards: RAW, WAW, WAR: All Possible



All Pipelines Are Still Single Issue

CMPE550 - Shaaban

# Dynamic Pipeline Scheduling: *The Concept*

Dynamic = Done at runtime by CPU

(Out-of-order execution)

i.e Start of instruction execution is not in program order

- **Dynamic pipeline scheduling overcomes the limitations of in-order pipelined execution by allowing out-of-order instruction execution.**
- **Instruction are allowed to start executing out-of-order as soon as their operands are available.**
  - Better dynamic exploitation of instruction-level parallelism (ILP).

## Example:

In the case of in-order pipelined execution  
SUB.D must wait for DIV.D to complete  
which stalled ADD.D before starting execution  
In out-of-order execution SUBD can start as soon  
as the values of its operands F8, F14 are available.



- This implies allowing out-of-order instruction commit (completion).
- May lead to imprecise exceptions if an instruction issued earlier raises an exception.
  - This is similar to pipelines with multi-cycle floating point units.

# Dynamic Pipeline Scheduling

- Dynamic instruction scheduling is accomplished by:

→ – Dividing the Instruction Decode ID stage into two stages or steps:

- 1• Issue: Decode instructions, check for structural hazards. In-order
  - + – A record of data dependencies is constructed as instructions are issued
    - – This creates a dynamically-constructed dependency graph for the window of instructions in-flight (being processed) in the CPU.
- 2• Read operands: Wait until data hazard conditions, if any, are resolved, then read operands when available (then start execution)  
  
(All instructions pass through the issue stage 1 in order but can be stalled or pass each other in the read operands stage 2).
  - In the instruction fetch stage IF, fetch an additional instruction every cycle into a latch or several instructions into an instruction queue.
  - Increase the number of functional units to meet the demands of the additional instructions in their EX stage.

- Two approaches to dynamic scheduling:

(Control Data Corp.)

- FYI → 1 – Dynamic scheduling with the Scoreboard used first in **CDC6600** (1963)
- 2 – The Tomasulo approach pioneered by the **IBM 360/91** (1966)

# Single Instruction Issue Per Cycle (In-Order)



**Issue:**  
Decode Instructions  
+  
Construct Dependency Graph  
Dynamically (At Runtime By CPU)  
As Instructions Are Issued in Program  
Order one per cycle (Single-Issue)

# Single Instruction Issue Per Cycle (In-Order)



# Single Instruction Issue Per Cycle (In-Order)



# Single Instruction Issue Per Cycle (In-Order)



# Dynamic Scheduling With A Scoreboard

- The scoreboard is a centralized hardware mechanism that maintains an execution rate of one instruction per cycle by executing an instruction as soon as its operands are available in registers and no hazard conditions prevent it.

No Forwarding

— e.g. Forming a single-issue out-of-order pipeline

EX Includes MEM

- It replaces ID, EX, WB with four stages: ID1, ID2, EX, WB
  - Includes MEM
  - Issue
  - Read Operands
- Every instruction goes through the scoreboard where a record of data dependencies is constructed (ID1 corresponds to instruction issue).

No changes to Instruction Fetch (IF)

In ID1  
(Issue)  
In-Order

→ In effect dynamically constructing the dependency graph by hardware for a window of instructions as they are issued one at a time in program order.

- A system with a scoreboard is assumed to have several functional units with their status information reported to the scoreboard.
- If the scoreboard determines that an instruction cannot execute immediately it executes another waiting instruction and keeps monitoring hardware units status and decide when the instruction can proceed to execute.
- The scoreboard also decides when an instruction can write its results to registers (hazard detection and resolution is centralized in the scoreboard).

Instruction Fetch (IF) is not changed

Order = Program Instruction Order

# Instruction Execution Stages with A Scoreboard

1 **Issue (ID1):** An instruction is issued if:

Stage 0 Instruction Fetch (IF): No changes, in-order

1  
Always done in program order

2  
3

- A functional unit for the instruction is available (**No structural hazard**).
- The instruction result destination register is not marked for writing by an earlier active instruction (**No WAW hazard**, i.e no output dependence)
- If the above conditions are satisfied, the scoreboard issues the instruction to a functional unit and updates its internal data structures. As indicated by instruction issue requirements, **structural and WAW hazards are resolved here by stalling the instruction issue.** (this stage replaces part of **ID** stage in the conventional MIPS pipeline).

Can be done out of program order

2 **Read operands (ID2):** The scoreboard monitors the availability of the source operands. A source operand is available when no earlier active instruction will write it. When all source operands are available the scoreboard tells the functional unit to *read* all operands from the registers at once (no forwarding supported) and start execution (**RAW hazards resolved here dynamically**). This completes **ID**.

From registers (No forwarding)

3 **Execution (EX):** The functional unit starts execution upon receiving operands. When the results are ready it notifies the scoreboard (replaces **EX, MEM** in MIPS).

4 **Write result (WB):** Once the scoreboard senses that a functional unit completed execution, it checks for **WAR hazards** and stalls the completing instruction if needed otherwise the write back is completed. The functional unit issued to the instruction is marked as available (not busy) after WB is completed.

# Scoreboard Stages

Including Instruction Fetch (IF): No changes, **still in-order**





The basic structure of a MIPS processor with a scoreboard

In Fourth Edition: Appendix A.7  
(In Third Edition: Appendix A.8)

FP units are not pipelined similar to CDC6600

CMPE550 - Shaaban

# Three Parts of the Scoreboard

- 1 **Instruction status:** Which of 4 steps the instruction is in.
- 2 **Functional unit status:** Indicates the state of the functional unit (FU).

Nine fields for each functional unit:

| Used/Needed<br>To Track Data<br>Dependencies | Busy | Op | Fi | Fj, Fk | Qj, Qk | Rj, Rk | 0 = No = Not ready<br>1 = Yes = Ready | Source-register numbers<br>i.e. Operand Registers | If Any | Functional units producing source (operand) registers Fj, Fk | Flags indicating when Fj, Fk are ready<br>Yes or = 1 means ready | (set to Yes after operand is available to read<br>both operands read at once from registers) | i.e when both Rj, Rk are set to yes<br>(both operands are ready) |
|----------------------------------------------|------|----|----|--------|--------|--------|---------------------------------------|---------------------------------------------------|--------|--------------------------------------------------------------|------------------------------------------------------------------|----------------------------------------------------------------------------------------------|------------------------------------------------------------------|
|----------------------------------------------|------|----|----|--------|--------|--------|---------------------------------------|---------------------------------------------------|--------|--------------------------------------------------------------|------------------------------------------------------------------|----------------------------------------------------------------------------------------------|------------------------------------------------------------------|

- 3 **Register result status:** Indicates which functional unit will write to each register, if one exists. Blank when no pending instructions will write that register.



Needed to check for possible WAW hazard and possibly stall issue

e.g.

|      |    |       |    |       |     |
|------|----|-------|----|-------|-----|
| F0   | F1 | F2    | F3 | ..... | F31 |
| Add1 | -- | Mult1 | -- | ..... | --  |

**CMPE550 - Shaaban**

# The Scoreboard: Detailed Pipeline Control

| Instruction status | Wait until                                                                                                                                                                           | Bookkeeping                                                                                                                                                                                                                                                                                                                                                                                       |
|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Issue              | Not busy (FU) and not result(D)                                                                                                                                                      | $\text{Busy(FU)} \leftarrow \text{yes}; \text{Op(FU)} \leftarrow \text{op};$<br>$\text{Fi(FU)} \leftarrow 'D'; \text{Fj(FU)} \leftarrow 'S1';$<br>$\text{Fk(FU)} \leftarrow 'S2'; \text{Qj} \leftarrow \text{Result('S1')};$<br>$\text{Qk} \leftarrow \text{Result('S2')}; \text{Rj} \leftarrow \text{not Qj};$<br>$\text{Rk} \leftarrow \text{not Qk}; \text{Result('D')} \leftarrow \text{FU};$ |
| Read operands      | Rj and Rk                                                                                                                                                                            | Rj $\leftarrow$ Yes Rk $\leftarrow$ Yes                                                                                                                                                                                                                                                                                                                                                           |
| Execution complete | Functional unit done                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                                                                                   |
| Write result       | $\forall f ((\text{Fj}(f) \neq \text{Fi(FU)})$<br>$\text{or } \text{Rj}(f) = \text{No}) \text{ & }$<br>$(\text{Fk}(f) \neq \text{Fi(FU)})$<br>$\text{or } \text{Rk}(f) = \text{No})$ | $\forall f (\text{if } \text{Qj}(f) = \text{FU} \text{ then } \text{Rj}(f) \leftarrow \text{Yes});$<br>$\forall f (\text{if } \text{Qk}(f) = \text{FU} \text{ then } \text{Rk}(f) \leftarrow \text{Yes});$<br>$\text{Result}(\text{Fi(FU)}) \leftarrow 0; \text{Busy(FU)} \leftarrow \text{No}$                                                                                                   |

DAP Spr. '98 ©UCB 30

# A Scoreboard Example

→ The following code is run on the MIPS with a scoreboard given earlier with:

| Functional Unit (FU)    | # of FUs | EX cycles |
|-------------------------|----------|-----------|
| Integer                 | 1        | 1         |
| Floating Point Multiply | 2        | 10        |
| Floating Point add/sub  | 1        | 2         |
| Floating point Divide   | 1        | 40        |



→ All functional units  
are not pipelined  
(similar to CDC6600)



→ Assume EX and MEM  
for L.D. done in one cycle

# Dependency Graph For Example Code

$\emptyset$



## Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

## Date Dependence:

(1, 4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4, 6)

## Output Dependence:

(1, 6)

## Anti-dependence:

(5, 6)

Real Data Dependence (RAW) →

Anti-dependence (WAR) →+←

Output Dependence (WAW) →○←

**CMPE550 - Shaaban**

# Dependency Graph For Example Code

Cycle 1

$\emptyset$



Issue Instruction 1 to  
Only Integer Unit

Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

Output Dependence:

Anti-dependence:

Real Data Dependence (RAW) →  
Anti-dependence (WAR) +→  
Output Dependence (WAW) -○-

CMPE550 - Shaaban

# Scoreboard Example: Cycle 1

FP EX Cycles: Add = 2 cycles, Multiply = 10, Divide = 40

## Instruction status

| Instruction | <i>j</i> | <i>k</i> | Issue | Read operands | Execution complete | Write result |
|-------------|----------|----------|-------|---------------|--------------------|--------------|
| L.D         | F6       | 34+      | R2    |               |                    |              |
| L.D         | F2       | 45+      | R3    |               |                    |              |
| MUL.DF0     | F2       | F4       |       |               |                    |              |
| SUB.DF8     | F6       | F2       |       |               |                    |              |
| DIV.D       | F10      | F0       | F6    |               |                    |              |
| ADD.DF6     | F8       | F2       |       |               |                    |              |



Means at end of Cycle 1

## Functional unit status

| Time | Name    | Busy | Op   | dest<br><i>Fi</i> | <i>S1</i><br><i>Fj</i> | <i>S2</i><br><i>Fk</i> | FU for <i>j</i> | FU for <i>k</i> | <i>Fj?</i><br><i>Rj</i> | <i>Fk?</i><br><i>Rk</i> |
|------|---------|------|------|-------------------|------------------------|------------------------|-----------------|-----------------|-------------------------|-------------------------|
| →    | Integer | Yes  | Load | F6                |                        | R2                     |                 |                 |                         | Yes                     |
|      | Mult1   | No   |      |                   |                        |                        |                 |                 |                         |                         |
|      | Mult2   | No   |      |                   |                        |                        |                 |                 |                         |                         |
|      | Add     | No   |      |                   |                        |                        |                 |                 |                         |                         |
|      | Divide  | No   |      |                   |                        |                        |                 |                 |                         |                         |

## Register result status

| Clock | 1 | FU | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|---|----|----|----|----|----|----|-----|-----|-----|-----|
|       |   |    |    |    |    |    |    |     |     |     |     |

Integer

→ Issue first L.D

# Dependency Graph For Example Code

Cycle 2

$\emptyset$



Read Operands (ID2)

Issue second L.D?  
No, stall on structural hazard.  
Single integer functional unit is  
busy.

|   | Example Code      |
|---|-------------------|
| 1 | L.D F6, 34(R2)    |
| 2 | L.D F2, 45(R3)    |
| 3 | MUL.D F0, F2, F4  |
| 4 | SUB.D F8, F6, F2  |
| 5 | DIV.D F10, F0, F6 |
| 6 | ADD.D F6, F8, F2  |

Date Dependence:

Output Dependence:

Anti-dependence:



CMPE550 - Shaaban

# Scoreboard Example: Cycle 2

End of

FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40

Instruction status

| Instruction | j   | k  |  |
|-------------|-----|----|--|
| L.D F6      | 34+ | R2 |  |
| L.D F2      | 45+ | R3 |  |
| MUL.DF0     | F2  | F4 |  |
| SUB.DF8     | F6  | F2 |  |
| DIV.D F10   | F0  | F6 |  |
| ADD.DF6     | F8  | F2 |  |

Functional unit status

| Time | Name    | dest |      | S1 | S2 | FU for j | FU for k | Fj? | Fk? |     |
|------|---------|------|------|----|----|----------|----------|-----|-----|-----|
|      |         | Busy | Op   | Fi | Fj | Fk       | Qj       | Qk  | Rj  | Rk  |
|      | Integer | Yes  | Load | F6 |    |          | R2       |     |     | Yes |
|      | Mult1   | No   |      |    |    |          |          |     |     |     |
|      | Mult2   | No   |      |    |    |          |          |     |     |     |
|      | Add     | No   |      |    |    |          |          |     |     |     |
|      | Divide  | No   |      |    |    |          |          |     |     |     |

Register result status

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|-----|-----|-----|-----|
| 2     |    |    |    |    |    |     |     |     |     |
| FU    |    |    |    |    |    |     |     |     |     |

- ? Issue second L.D? No, stall on structural hazard. Single integer functional unit is busy.

# Dependency Graph For Example Code

Cycle 3

∅ →



EX + MEM Access Done

→ Issue second L.D?  
No, stall on structural hazard.  
Single integer functional unit is  
still busy.

→ Issue MUL.D? No, cannot issue out of order  
(second L.D not issued yet)

Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

Output Dependence:

Anti-dependence:



CMPE550 - Shaaban

# Scoreboard Example: Cycle 3

End of

Instruction status

| Instruction | j   | k   |    |
|-------------|-----|-----|----|
| L.D         | F6  | 34+ | R2 |
| L.D         | F2  | 45+ | R3 |
| MUL.DF0     | F2  | F4  |    |
| SUB.DF8     | F6  | F2  |    |
| DIV.D       | F10 | F0  | F6 |
| ADD.DF6     | F8  | F2  |    |



From Assumption  
EX, MEM  
for L.D. in one cycle

Issue ?

Functional unit status

| Time | Name    | Busy | Op   | dest                 | S1                   | S2                   | FU for j             | FU for k             | Fj?                  | Fk?                  |
|------|---------|------|------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
|      |         |      |      | <i>F<sub>i</sub></i> | <i>F<sub>j</sub></i> | <i>F<sub>k</sub></i> | <i>Q<sub>j</sub></i> | <i>Q<sub>k</sub></i> | <i>R<sub>j</sub></i> | <i>R<sub>k</sub></i> |
|      | Integer | Yes  | Load | F6                   |                      | R2                   |                      |                      |                      | Yes                  |
|      | Mult1   | No   |      |                      |                      |                      |                      |                      |                      |                      |
|      | Mult2   | No   |      |                      |                      |                      |                      |                      |                      |                      |
|      | Add     | No   |      |                      |                      |                      |                      |                      |                      |                      |
|      | Divide  | No   |      |                      |                      |                      |                      |                      |                      |                      |

Register result status

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|-----|-----|-----|-----|
| 3     |    |    |    |    |    |     |     |     |     |

FU

Integer



- Issue MUL.D? No, cannot issue out of order  
(second L.D not issued yet)

# Dependency Graph For Example Code

Cycle 4

$\emptyset$



→ Issue second L.D?  
No, stall on structural hazard.  
Single integer functional unit is  
still busy this cycle

Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

Output Dependence:

Anti-dependence:



CMPE550 - Shaaban

# Scoreboard Example: Cycle 4

End of

Instruction status

| Instruction | <i>j</i> | <i>k</i> |    |
|-------------|----------|----------|----|
| L.D         | F6       | 34+      | R2 |
| L.D         | F2       | 45+      | R3 |
| MUL.DF0     | F2       | F4       |    |
| SUB.DF8     | F6       | F2       |    |
| DIV.D       | F10      | F0       | F6 |
| ADD.DF6     | F8       | F2       |    |

Issue ? →

Functional unit status

| Time                                                                                           | Name    |
|------------------------------------------------------------------------------------------------|---------|
| Actually free end<br>of this cycle 4<br><br>(available for<br>instruction issue<br>next cycle) | Integer |
|                                                                                                | Mult1   |
|                                                                                                | Mult2   |
|                                                                                                | Add     |
|                                                                                                | Divide  |

Register result status

| Clock | <i>FU</i> |
|-------|-----------|
| 4     |           |

Read      Execution      Write

| Issue | operands | complete | Result |
|-------|----------|----------|--------|
|-------|----------|----------|--------|

|   |   |   |   |
|---|---|---|---|
| 1 | 2 | 3 | 4 |
|   |   |   |   |

*dest*      S1      S2      FU for *j*      FU for *k*      *Fj?*      *Fk?*

| Busy | Op | <i>Fi</i> | <i>Fj</i> | <i>Fk</i> | <i>Qj</i> | <i>Qk</i> | <i>Rj</i> | <i>Rk</i> |
|------|----|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| No   |    |           |           |           |           |           |           |           |
| No   |    |           |           |           |           |           |           |           |
| No   |    |           |           |           |           |           |           |           |
| No   |    |           |           |           |           |           |           |           |
| No   |    |           |           |           |           |           |           |           |

|           |           |           |           |           |            |            |     |            |
|-----------|-----------|-----------|-----------|-----------|------------|------------|-----|------------|
| <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|           |           |           |           |           |            |            |     |            |

→ Issue second L.D?

Next Cycle →

# Dependency Graph For Example Code

Cycle 5

$\emptyset$



Issue Instruction 2 to Integer Unit

Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |



Date Dependence:

Output Dependence:

Anti-dependence:



CMPE550 - Shaaban

# Scoreboard Example: Cycle 5

End of

Instruction status

| Instruction | j   | k   |    |
|-------------|-----|-----|----|
| L.D         | F6  | 34+ | R2 |
| L.D         | F2  | 45+ | R3 |
| MUL.D       | F0  | F2  | F4 |
| SUB.D       | F8  | F6  | F2 |
| DIV.D       | F10 | F0  | F6 |
| ADD.D       | F6  | F8  | F2 |

Read    Execution    Write  
 Issue    operands complete    Result

| Issue | 1 | 2 | 3 | 4 |
|-------|---|---|---|---|
|       | 5 |   |   |   |

Functional unit status

| Time | Name    | Busy | Op   | dest                 | S1                   | S2                   | FU for j             | FU for k             | Fj?                  | Fk?                  |
|------|---------|------|------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
|      |         |      |      | <i>F<sub>i</sub></i> | <i>F<sub>j</sub></i> | <i>F<sub>k</sub></i> | <i>Q<sub>j</sub></i> | <i>Q<sub>k</sub></i> | <i>R<sub>j</sub></i> | <i>R<sub>k</sub></i> |
| →    | Integer | Yes  | Load | F2                   |                      | R3                   |                      |                      | Yes                  |                      |
|      | Mult1   | No   |      |                      |                      |                      |                      |                      |                      |                      |
|      | Mult2   | No   |      |                      |                      |                      |                      |                      |                      |                      |
|      | Add     | No   |      |                      |                      |                      |                      |                      |                      |                      |
|      | Divide  | No   |      |                      |                      |                      |                      |                      |                      |                      |

Register result status

| Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|-----|-----|-----|-----|
| 5     |    |    |    |    |    |     |     |     |     |

FU

Integer

→ Issue second L.D

# Dependency Graph For Example Code

Cycle 6



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |



Date Dependence:  
(2, 3)

Output Dependence:

Anti-dependence:



CMPE550 - Shaaban

# Scoreboard Example: Cycle 6

End of

Issue

Instruction status

| Instruction | j   | k   |    |
|-------------|-----|-----|----|
| L.D         | F6  | 34+ | R2 |
| L.D         | F2  | 45+ | R3 |
| MUL.D       | F0  | F2  | F4 |
| SUB.D       | F8  | F6  | F2 |
| DIV.D       | F10 | F0  | F6 |
| ADD.D       | F6  | F8  | F2 |

Functional unit status

| Time | Name    |
|------|---------|
|      | Integer |
| →    | Mult1   |
|      | Mult2   |
|      | Add     |
|      | Divide  |

Register result status

Clock

6

FU

|       |         |    |    |    |     |     |     |     |
|-------|---------|----|----|----|-----|-----|-----|-----|
| F0    | F2      | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| Mult1 | Integer |    |    |    |     |     |     |     |

→ Issue MUL.D

Producer of  
Result Needed

Ready Flags

| dest | S1   | S2 | FU for j |    | Fj?     | Fk? |    |     |
|------|------|----|----------|----|---------|-----|----|-----|
| Busy | Op   | Fi | Fj       | Fk | Qj      | Qk  | Rj | Rk  |
| Yes  | Load | F2 |          | R3 |         |     |    | Yes |
| Yes  | Mult | F0 | F2       | F4 | Integer |     | No | Yes |
| No   |      |    |          |    |         |     |    |     |
| No   |      |    |          |    |         |     |    |     |
| No   |      |    |          |    |         |     |    |     |

# Dependency Graph For Example Code

Cycle 7

$\emptyset$



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:  
(1,4) (2, 3) (2, 4)

Output Dependence:

Anti-dependence:



CMPE550 - Shaaban

# Scoreboard Example: Cycle 7

End of

Instruction status

| Instruction | j   | k   |    |
|-------------|-----|-----|----|
| L.D         | F6  | 34+ | R2 |
| L.D         | F2  | 45+ | R3 |
| MUL.D       | F0  | F2  | F4 |
| SUB.D       | F8  | F6  | F2 |
| DIV.D       | F10 | F0  | F6 |
| ADD.D       | F6  | F8  | F2 |

Read      Execution      Write  
 Issue      operands complete      Result

|   |   |   |   |
|---|---|---|---|
| 1 | 2 | 3 | 4 |
| 5 | 6 | 7 |   |
| 6 | ? |   |   |
| 7 |   |   |   |
|   |   |   |   |

Functional unit status

Time

| Name    | dest |    | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
|---------|------|----|----|----|----------|----------|-----|-----|
| Busy    | Op   | Fi | Fj | Fk | Qj       | Qk       | Rj  | Rk  |
| Integer | Load | F2 |    | R3 |          |          |     | Yes |
| Mult1   | Mult | F0 | F2 | F4 | Integer  |          | No  | Yes |
| Mult2   |      |    |    |    |          |          |     |     |
| → Add   | Sub  | F8 | F6 | F2 | Integer  | Yes      | No  |     |
| Divide  |      |    |    |    |          |          |     |     |

Register result status

Clock

7

FU

|       |         |    |    |     |     |     |     |     |
|-------|---------|----|----|-----|-----|-----|-----|-----|
| F0    | F2      | F4 | F6 | F8  | F10 | F12 | ... | F30 |
| Mult1 | Integer |    |    | Add |     |     |     |     |

- • Issue SUB.D
- • Read multiply operands?

# Dependency Graph For Example Code



Issue Instruction 5 to  
Mult2 Functional Unit

Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:  
(1,4) (1, 5) (2, 3) (2, 4)  
(3, 5)

Output Dependence:

Anti-dependence:

Real Data Dependence (RAW) →  
Anti-dependence (WAR) +→  
Output Dependence (WAW) ←○

Cycle 8a (First half of cycle 8)

CMPE550 - Shaaban

# Scoreboard Example: Cycle 8a

## (First half of cycle 8)

Instruction status

| Instruction | j   | k   |
|-------------|-----|-----|
| L.D         | F6  | 34+ |
| L.D         | F2  | 45+ |
| MUL.D       | F0  | F2  |
| SUB.D       | F8  | F6  |
| DIV.D       | F10 | F0  |
| ADD.D       | F6  | F8  |

Functional unit status

Time

| Name | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
|------|------|----|----|----------|----------|-----|-----|
| Op   | Fi   | Fj | Fk | Qj       | Qk       | Rj  | Rk  |
| Load | F2   |    | R3 |          |          |     | Yes |
| Mult | F0   | F2 | F4 | Integer  |          | No  | Yes |
| Add  |      |    |    |          | Integer  | Yes | No  |
| Div  | F10  | F0 | F6 | Mult1    |          | No  | Yes |

→ Divide

Register result status

Clock

8

|    | F0    | F2      | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|----|-------|---------|----|-----|----|--------|-----|-----|-----|
| FU | Mult1 | Integer |    | Add |    | Divide |     |     |     |

➔ • Issue DIV.D

# Dependency Graph For Example Code



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(3, 5)

Output Dependence:

Anti-dependence:



Cycle 8b (Second half of cycle 8)

CMPE550 - Shaaban

# Scoreboard Example: Cycle 8b

## (Second half of cycle 8)

End of Cycle 8

| Instruction status |     |     | Issue | Read operands | Execution complete | Write result |
|--------------------|-----|-----|-------|---------------|--------------------|--------------|
| Instruction        | j   | k   |       |               |                    |              |
| L.D                | F6  | 34+ | R2    | 1             | 2                  | 3            |
| L.D                | F2  | 45+ | R3    | 5             | 6                  | 7            |
| MUL.D              | F0  | F2  | F4    | 6             |                    |              |
| SUB.D              | F8  | F6  | F2    | 7             |                    |              |
| DIV.D              | F10 | F0  | F6    | 8             |                    |              |
| ADD.D              | F6  | F8  | F2    |               |                    |              |

  

| Functional unit status |         |      | dest | S1  | S2 | FU for j | FU for k | Fj? | Fk? |     |
|------------------------|---------|------|------|-----|----|----------|----------|-----|-----|-----|
| Time                   | Name    | Busy | Op   | Fi  | Fj | Fk       | Qj       | Qk  | Rj  | Rk  |
|                        | Integer | No   |      |     |    |          |          |     |     |     |
|                        | Mult1   | Yes  | Mult | F0  | F2 | F4       |          |     | Yes | Yes |
|                        | Mult2   | No   |      |     |    |          |          |     |     |     |
|                        | Add     | Yes  | Sub  | F8  | F6 | F2       |          |     | Yes | Yes |
|                        | Divide  | Yes  | Div  | F10 | F0 | F6       | Mult1    |     | No  | Yes |

  

| Register result status |    |  | F0    | F2 | F4 | F6  | F8     | F10 | F12 | ... | F30 |
|------------------------|----|--|-------|----|----|-----|--------|-----|-----|-----|-----|
| Clock                  | FU |  | Mult1 |    |    | Add | Divide |     |     |     |     |
| 8                      |    |  |       |    |    |     |        |     |     |     |     |

→ • Second L.D writes result to F2

# Dependency Graph For Example Code

Cycle 9



→ Issue ADD.D?  
No, stall on structural hazard.  
Single Add functional unit is busy.

Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:  
(1,4) (1, 5) (2, 3) (2, 4)  
(3, 5)

Output Dependence:

Anti-dependence:



CMPE550 - Shaaban

# Scoreboard Example: Cycle 9

End of

FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40

Instruction status

| Instruction | j   | k   |    |
|-------------|-----|-----|----|
| L.D         | F6  | 34+ | R2 |
| L.D         | F2  | 45+ | R3 |
| MUL.D       | F0  | F2  | F4 |
| SUB.D       | F8  | F6  | F2 |
| DIV.D       | F10 | F0  | F6 |
| ADD.D       | F6  | F8  | F2 |

Issue ?

| Issue | Read operands | Execution complete | Write result |
|-------|---------------|--------------------|--------------|
| 1     | 2             | 3                  | 4            |
| 5     | 6             | 7                  | 8            |
| 6     | 9             |                    |              |
| 7     | 9             |                    |              |
| 8     | ?             |                    |              |

Functional unit status

| Execution cycles | Time | Name | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
|------------------|------|------|------|----|----|----------|----------|-----|-----|
|                  |      |      | Fi   | Fj | Fk | Qj       | Qk       | Rj  | Rk  |
|                  | No   |      |      |    |    |          |          |     |     |
|                  | Yes  | Mult | F0   | F2 | F4 |          |          | Yes | Yes |
|                  | No   |      |      |    |    |          |          |     |     |
|                  | Yes  | Sub  | F8   | F6 | F2 |          |          | Yes | Yes |
|                  | Yes  | Div  | F10  | F0 | F6 | Mult1    |          | No  | Yes |

Register result status

Clock

9

FU

|       |    |     |    |        |     |     |     |     |
|-------|----|-----|----|--------|-----|-----|-----|-----|
| F0    | F2 | F4  | F6 | F8     | F10 | F12 | ... | F30 |
| Mult1 |    | Add |    | Divide |     |     |     |     |



- Read operands for MUL.D & SUB.D
- Issue ADD.D?

Ex starts next cycle for both instructions

CMPE550 - Shaaban

# Dependency Graph For Example Code

Cycle 11



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:  
(1,4) (1, 5) (2, 3) (2, 4)  
 (3, 5)

Output Dependence:

Anti-dependence:

➔ Issue ADD.D?  
 No, stall on structural hazard.  
 Single Add functional unit is still busy.



CMPE550 - Shaaban

# Scoreboard Example: Cycle 11

End of

Instruction status

| Instruction | j   | k   |    |
|-------------|-----|-----|----|
| L.D         | F6  | 34+ | R2 |
| L.D         | F2  | 45+ | R3 |
| MUL.D       | F0  | F2  | F4 |
| SUB.D       | F8  | F6  | F2 |
| DIV.D       | F10 | F0  | F6 |
| ADD.D       | F6  | F8  | F2 |

Read operands    Execution complete    Write result

| Issue | 1 | 2 | 3  | 4 |
|-------|---|---|----|---|
| 5     |   | 6 | 7  | 8 |
| 6     |   |   |    |   |
| 7     |   | 9 | 11 |   |
| 8     |   |   |    |   |
| ?     |   |   |    |   |

Functional unit status

| Time | Name | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
|------|------|------|----|----|----------|----------|-----|-----|
| Busy | Op   | Fi   | Fj | Fk | Qj       | Qk       | Rj  | Rk  |
| No   |      |      |    |    |          |          |     |     |
| Yes  | Mult | F0   | F2 | F4 |          |          | Yes | Yes |
| No   |      |      |    |    |          |          |     |     |
| Yes  | Sub  | F8   | F6 | F2 |          |          | Yes | Yes |
| Yes  | Div  | F10  | F0 | F6 | Mult1    |          | No  | Yes |

Register result status

Clock

11

FU

| F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|----|-----|----|--------|-----|-----|-----|
| Mult1 |    |    | Add |    | Divide |     |     |     |



- Issue ADD.D?

# Dependency Graph For Example Code

Cycle 12



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:  
(1,4) (1, 5) (2, 3) (2, 4)  
(3, 5)

Output Dependence:

Anti-dependence:

➔ Issue ADD.D? Next Cycle  
No, stall on structural hazard.  
Single Add functional unit is still busy.



CMPE550 - Shaaban

# Scoreboard Example: Cycle 12

End of

| Instruction status |     |     |    | Issue | Read operands | Execution complete | Write back result |
|--------------------|-----|-----|----|-------|---------------|--------------------|-------------------|
| Instruction        | j   | k   |    |       |               |                    |                   |
| L.D                | F6  | 34+ | R2 | 1     | 2             | 3                  | 4                 |
| L.D                | F2  | 45+ | R3 | 5     | 6             | 7                  | 8                 |
| MUL.D              | F0  | F2  | F4 | 6     | 9             |                    |                   |
| SUB.D              | F8  | F6  | F2 | 7     | 9             | 11                 | 12                |
| DIV.D              | F10 | F0  | F6 | 8     | ?             |                    |                   |
| ADD.D              | F6  | F8  | F2 | ?     |               |                    |                   |

  

| Functional unit status |      |      |  | dest | S1 | S2    | FU for j | FU for k | Fj? | Fk? |
|------------------------|------|------|--|------|----|-------|----------|----------|-----|-----|
| Execution cycles       | Time | Name |  | Fi   | Fj | Fk    | Qj       | Qk       | Rj  | Rk  |
| Busy                   | Op   |      |  |      |    |       |          |          |     |     |
| No                     |      |      |  |      |    |       |          |          |     |     |
| Yes                    | Mult | F0   |  | F2   | F4 |       |          |          | Yes | Yes |
| No                     |      |      |  |      |    |       |          |          |     |     |
| No                     | Div  | F10  |  | F0   | F6 | Mult1 |          |          | No  | Yes |

  

| Register result status | F0 | F2 | F4 | F6 | F8    | F10 | F12 | ... | F30 |
|------------------------|----|----|----|----|-------|-----|-----|-----|-----|
| Clock                  | F0 | F2 | F4 | F6 | Mult1 | F10 | F12 | ... | F30 |

- ? • Read operands for DIV.D?  
 → ? • Issue ADD.D?      Next Cycle →

# Dependency Graph For Example Code

Cycle 13



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)

|                            |       |
|----------------------------|-------|
| Real Data Dependence (RAW) | →     |
| Anti-dependence            | (WAR) |
| Output Dependence          | (WAW) |

CMPE550 - Shaaban

# Scoreboard Example: Cycle 13

End of

|       | Instruction status |     | Issue | Read operands |   |    |    | Execution complete |  | Write result |  |
|-------|--------------------|-----|-------|---------------|---|----|----|--------------------|--|--------------|--|
|       | Instruction        | j   | k     |               |   |    |    |                    |  |              |  |
| L.D   | F6                 | 34+ | R2    | 1             | 2 | 3  | 4  |                    |  |              |  |
| L.D   | F2                 | 45+ | R3    | 5             | 6 | 7  | 8  |                    |  |              |  |
| MUL.D | F0                 | F2  | F4    | 6             | 9 |    |    |                    |  |              |  |
| SUB.D | F8                 | F6  | F2    | 7             | 9 | 11 | 12 |                    |  |              |  |
| DIV.D | F10                | F0  | F6    | 8             |   |    |    |                    |  |              |  |
| ADD.D | F6                 | F8  | F2    | 13            |   |    |    |                    |  |              |  |

  

|                            | Functional unit status |        | dest | S1   |     | FU for j |    | FU for k |    | Fj? Fk? |     |
|----------------------------|------------------------|--------|------|------|-----|----------|----|----------|----|---------|-----|
|                            | Time                   | Name   |      | Fi   | Fj  | Fk       | Qj | Qk       | Rj | Rk      |     |
| Execution cycles Remaining | Time                   | Name   | No   |      |     |          |    |          |    |         |     |
|                            | 6                      | Mult1  | Yes  | Mult | F0  | F2       | F4 |          |    | Yes     | Yes |
|                            |                        | Mult2  | No   |      |     |          |    |          |    |         |     |
|                            | →                      | Add    | Yes  | Add  | F6  | F8       | F2 |          |    | Yes     | Yes |
|                            |                        | Divide | Yes  | Div  | F10 | F0       | F6 | Mult1    |    | No      | Yes |

  

| Clock | F0 | F2 | F4 | F6    | F8  | F10 | F12 | ... | F30    |
|-------|----|----|----|-------|-----|-----|-----|-----|--------|
| 13    |    |    |    | Mult1 | Add |     |     |     | Divide |

→ • Issue ADD.D, Add FP unit available at end of cycle 12 (start of 13)

# Dependency Graph For Example Code

Cycle 17



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)

|                            |    |
|----------------------------|----|
| Real Data Dependence (RAW) | →  |
| Anti-dependence (WAR)      | ↔  |
| Output Dependence (WAW)    | ○→ |

EX done last cycle.  
WB in this cycle?

CMPE550 - Shaaban

# Scoreboard Example: Cycle 17

End of

| Instruction status |     |     | Read  | Execution | Write    |        |    |
|--------------------|-----|-----|-------|-----------|----------|--------|----|
| Instruction        | j   | k   | Issue | operands  | complete | Result |    |
| L.D                | F6  | 34+ | R2    | 1         | 2        | 3      | 4  |
| L.D                | F2  | 45+ | R3    | 5         | 6        | 7      | 8  |
| MUL.D              | F0  | F2  | F4    | 6         | 9        |        |    |
| SUB.D              | F8  | F6  | F2    | 7         | 9        | 11     | 12 |
| DIV.D              | F10 | F0  | F6    | 8         |          |        |    |
| ADD.D              | F6  | F8  | F2    | 13        | 14       | 16     |    |

  

| Functional unit status |         |      | dest | S1  | S2 | FU for j | FU for k | Fj? | Fk? |     |
|------------------------|---------|------|------|-----|----|----------|----------|-----|-----|-----|
| Time                   | Name    | Busy | Op   | Fi  | Fj | Fk       | Qj       | Qk  | Rj  | Rk  |
|                        | Integer | No   |      |     |    |          |          |     |     |     |
| 2                      | Mult1   | Yes  | Mult | F0  | F2 | F4       |          |     | Yes | Yes |
|                        | Mult2   | No   |      |     |    |          |          |     |     |     |
|                        | Add     | Yes  | Add  | F6  | F8 | F2       |          |     | Yes | Yes |
|                        | Divide  | Yes  | Div  | F10 | F0 | F6       | Mult1    |     | No  | Yes |

  

| Register result status | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|------------------------|-------|----|----|-----|----|--------|-----|-----|-----|
| Clock 17 FU            | Mult1 |    |    | Add |    | Divide |     |     |     |

- • Write result of ADD.D? No, WAR hazard (DIV.D did not read any operands including F6)

# Dependency Graph For Example Code

Cycle 20



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)

Real Data Dependence (RAW) →

Anti-dependence (WAR) →

Output Dependence (WAW) →

WB in this cycle?

CMPE550 - Shaaban

# Scoreboard Example: Cycle 20

End of

| Instruction status |     |     | Issue | Read operands                               | Execution complete                            | Write result                  |
|--------------------|-----|-----|-------|---------------------------------------------|-----------------------------------------------|-------------------------------|
| Instruction        | j   | k   |       |                                             |                                               |                               |
| L.D                | F6  | 34+ | R2    | 1<br>5<br>6<br>7<br>8<br>9<br>11<br>12<br>? | 2<br>6<br>9<br>9<br>19<br>20<br>11<br>12<br>? | 3<br>7<br>19<br>11<br>12<br>? |
| L.D                | F2  | 45+ | R3    |                                             |                                               |                               |
| MUL.D              | F0  | F2  | F4    |                                             |                                               |                               |
| SUB.D              | F8  | F6  | F2    |                                             |                                               |                               |
| DIV.D              | F10 | F0  | F6    |                                             |                                               |                               |
| ADD.D              | F6  | F8  | F2    | 13                                          | 14                                            | 16                            |

  

| Functional unit status |         |      | dest | S1  | S2 | FU for j | FU for k | Fj? | Fk? |     |
|------------------------|---------|------|------|-----|----|----------|----------|-----|-----|-----|
| Time                   | Name    | Busy | Op   | Fi  | Fj | Fk       | Qj       | Qk  | Rj  | Rk  |
|                        | Integer | No   |      |     |    |          |          |     |     |     |
|                        | Mult1   | No   |      |     |    |          |          |     |     |     |
|                        | Mult2   | No   |      |     |    |          |          |     |     |     |
|                        | Add     | Yes  | Add  | F6  | F8 | F2       |          |     | Yes | Yes |
|                        | Divide  | Yes  | Div  | F10 | F0 | F6       |          |     | Yes | Yes |

  

| Register result status |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|------------------------|----|----|----|----|----|----|-----|-----|-----|-----|
| Clock                  | 20 |    |    |    |    |    |     |     |     |     |



- ? • Read operands for DIV.D?      Next Cycle →
- ? • Write result of ADD.D?

# Dependency Graph For Example Code

Cycle 21



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)



WB in this cycle?

CMPE550 - Shaaban

# Scoreboard Example: Cycle 21

End of

Instruction status

| Instruction | j   | k   |    |
|-------------|-----|-----|----|
| L.D         | F6  | 34+ | R2 |
| L.D         | F2  | 45+ | R3 |
| MUL.D       | F0  | F2  | F4 |
| SUB.D       | F8  | F6  | F2 |
| DIV.D       | F10 | F0  | F6 |
| ADD.D       | F6  | F8  | F2 |

Read operands

| Issue | 1 | 2  | 3  | 4  |
|-------|---|----|----|----|
| 5     |   | 6  | 7  | 8  |
| 6     |   | 9  | 19 | 20 |
| 7     |   | 9  | 11 | 12 |
| 8     |   | 21 |    |    |
| 13    |   | 14 | 16 | ?  |

ExecutionFunctional unit statusTimeNameIntegerMult1Mult2AddDivide

40

| Busy | Op  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
|------|-----|------|----|----|----------|----------|-----|-----|
|      |     | Fi   | Fj | Fk | Qj       | Qk       | Rj  | Rk  |
| No   |     |      |    |    |          |          |     |     |
| No   |     |      |    |    |          |          |     |     |
| No   |     |      |    |    |          |          |     |     |
| Yes  | Add | F6   | F8 | F2 |          |          | Yes | Yes |
| Yes  | Div | F10  | F0 | F6 |          |          | Yes | Yes |

Register result statusClock

21

FU

| F0 | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|----|----|----|-----|----|--------|-----|-----|-----|
|    |    |    | Add |    | Divide |     |     |     |

- • DIV.D reads operands, starts execution next cycle  
 → • Write result of ADD.D? Next Cycle →

# Dependency Graph For Example Code

Cycle 22



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)



CMPE550 - Shaaban

# Scoreboard Example: Cycle 22

End of

| Instruction status |     |     | Issue | Read operands | Execution complete | Write result |    |
|--------------------|-----|-----|-------|---------------|--------------------|--------------|----|
| Instruction        | j   | k   |       |               |                    |              |    |
| L.D                | F6  | 34+ | R2    | 1             | 2                  | 3            | 4  |
| L.D                | F2  | 45+ | R3    | 5             | 6                  | 7            | 8  |
| MUL.D              | F0  | F2  | F4    | 6             | 9                  | 19           | 20 |
| SUB.D              | F8  | F6  | F2    | 7             | 9                  | 11           | 12 |
| DIV.D              | F10 | F0  | F6    | 8             | 21                 |              |    |
| ADD.D              | F6  | F8  | F2    | 13            | 14                 | 16           | 22 |

  

| Functional unit status |         |      | dest | S1  | S2 | FU for j | FU for k | Fj? | Fk? |     |
|------------------------|---------|------|------|-----|----|----------|----------|-----|-----|-----|
| Time                   | Name    | Busy | Op   | Fi  | Fj | Fk       | Qj       | Qk  | Rj  | Rk  |
|                        | Integer | No   |      |     |    |          |          |     |     |     |
|                        | Mult1   | No   |      |     |    |          |          |     |     |     |
|                        | Mult2   | No   |      |     |    |          |          |     |     |     |
|                        | Add     | No   |      |     |    |          |          |     |     |     |
| 39                     | Divide  | Yes  | Div  | F10 | F0 | F6       |          |     | Yes | Yes |

  

| Register result status |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|------------------------|----|----|----|----|----|----|-----|-----|-----|-----|
| Clock                  | 22 |    |    |    |    |    |     |     |     |     |

→ First cycle DIV.D execution (39 more ex cycles)

→ ADD.D writes result in F6 (No WAR, DIV.D read operands in cycle 21)  
(Possible)

# Dependency Graph For Example Code

Cycle 61



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)

|                            |       |
|----------------------------|-------|
| Real Data Dependence (RAW) | →     |
| Anti-dependence            | (WAR) |
| Output Dependence          | (WAW) |

CMPE550 - Shaaban

# Scoreboard Example: Cycle 61

End of

Instruction status

| Instruction | j   | k   | Issue | Read operands | Execution complete | Write result |
|-------------|-----|-----|-------|---------------|--------------------|--------------|
| L.D         | F6  | 34+ | R2    | 1             | 2                  | 3            |
| L.D         | F2  | 45+ | R3    | 5             | 6                  | 7            |
| MUL.D       | F0  | F2  | F4    | 6             | 9                  | 19           |
| SUB.D       | F8  | F6  | F2    | 7             | 9                  | 11           |
| DIV.D       | F10 | F0  | F6    | 8             | 21                 | 61           |
| ADD.D       | F6  | F8  | F2    | 13            | 14                 | 16           |
|             |     |     |       |               |                    | 22           |

Functional unit status

## Time

| Name     | dest |     | S1  | S2 | FU for j | FU for k | Fj? | Fk? |
|----------|------|-----|-----|----|----------|----------|-----|-----|
| Busy     | Op   | Fi  | Fj  | Fk | Qj       | Qk       | Rj  | Rk  |
| Integer  | No   |     |     |    |          |          |     |     |
| Mult1    | No   |     |     |    |          |          |     |     |
| Mult2    | No   |     |     |    |          |          |     |     |
| Add      | No   |     |     |    |          |          |     |     |
| 0 Divide | Yes  | Div | F10 | F0 | F6       |          | Yes | Yes |

Register result status

## Clock

61

FU

|        |    |    |    |    |     |     |     |     |
|--------|----|----|----|----|-----|-----|-----|-----|
| F0     | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| Divide |    |    |    |    |     |     |     |     |



- DIV.D done executing

# Dependency Graph For Example Code

Cycle 62



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)



CMPE550 - Shaaban

# Scoreboard Example: Cycle 62

| Instruction status |     |     |    | Issue | Read operands | Execution complete | Write back result |
|--------------------|-----|-----|----|-------|---------------|--------------------|-------------------|
| Instruction        | j   | k   |    |       |               |                    |                   |
| L.D                | F6  | 34+ | R2 | 1     | 2             | 3                  | 4                 |
| L.D                | F2  | 45+ | R3 | 5     | 6             | 7                  | 8                 |
| MUL.D              | F0  | F2  | F4 | 6     | 9             | 19                 | 20                |
| SUB.D              | F8  | F6  | F2 | 7     | 9             | 11                 | 12                |
| DIV.D              | F10 | F0  | F6 | 8     | 21            | 61                 | 62                |
| ADD.D              | F6  | F8  | F2 | 13    | 14            | 16                 | 22                |

  

| Functional unit status |         |      |    | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
|------------------------|---------|------|----|------|----|----|----------|----------|-----|-----|
| Time                   | Name    | Busy | Op | Fi   | Fj | Fk | Qj       | Qk       | Rj  | Rk  |
|                        | Integer | No   |    |      |    |    |          |          |     |     |
|                        | Mult1   | No   |    |      |    |    |          |          |     |     |
|                        | Mult2   | No   |    |      |    |    |          |          |     |     |
|                        | Add     | No   |    |      |    |    |          |          |     |     |
|                        | Divide  | No   |    |      |    |    |          |          |     |     |

  

| Register result status |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|------------------------|----|----|----|----|----|----|-----|-----|-----|-----|
| Clock                  | 62 | FU |    |    |    |    |     |     |     |     |

• We have:

- In-order issue,
- Out-of-order read operands, execution, completion

# Dynamic Scheduling: The Tomasulo Algorithm

- ➔ • Developed at IBM and first implemented in IBM's 360/91 mainframe in 1966, about 3 years after the debut of the scoreboard in the CDC 6600.
- Dynamically schedule the pipeline in hardware to reduce stalls.
- Differences between IBM 360 & CDC 6600 ISA.
  - IBM has only 2 register specifiers/instr vs. 3 in CDC 6600.
  - IBM has 4 FP registers vs. 8 in CDC 6600 (part of ISA).
- ➔ • Some CPU architectures that can be considered descendants of the IBM 360/91 which implement and utilize a variation of the Tomasulo Algorithm include:

**RISC CPUs:** Alpha 21264, HP 8600, MIPS R12000, PowerPC G4 ..

**RISC-core x86 CPUs:** AMD Athlon, Intel Pentium III, 4, Xeon, ....

# Tomasulo Algorithm Vs. Scoreboard

- Control & buffers distributed with Functional Units (FUs) Vs. centralized in Scoreboard:
  - FU buffers are called “reservation stations” which have pending instructions and operands and other instruction status info (including data dependencies).
    - Reservations stations are sometimes referred to as “physical registers” or “renaming registers” as opposed to architecture ISA registers specified by the ISA.
- ISA Registers in instructions are replaced by either data values (if available) or pointers (renamed) to reservation stations (RS) that will supply the value later:
  - This process is called register renaming. Done in issue stage (in-order)
    - Register renaming eliminates WAR, WAW hazards (name dependence).
    - Allows for a *hardware-based* version of loop unrolling.
    - More reservation stations than ISA registers are possible, leading to optimizations that compilers can't achieve and prevents the number of ISA registers from becoming a bottleneck.
- Instruction results go (forwarded) from RSs to RSs, not through registers, over a broadcast bus, Common Data Bus (CDB) that broadcasts results to all waiting RSs (dependant instructions).
  - Loads and Stores are treated as FUs with RSs as well.

1

Register Renaming

2

Data Forwarding

# Reservation Station (RS) Fields

- **Op** Operation to perform in the unit (e.g., + or -)
- **V<sub>j</sub>, V<sub>k</sub>** Value of Source operands S1 and S2 When available
  - Store Reservation Stations (RSs) have a single **V** field indicating result to be stored.
- **Q<sub>j</sub>, Q<sub>k</sub>** Reservation stations (if any) producing source registers. (value to be broadcast). (i.e. operand values needed by instruction)
  - No ready flags as in Scoreboard;  $Q_j, Q_k = 0 \Rightarrow$  ready.
  - **Store buffers only have Q<sub>i</sub> for RS producing result.** to be stored
    - i.e. Store Reservation Stations
    - One
- **A:** Address information for loads or stores. Initially immediate field of instruction then effective address when calculated.
- **Busy:** Indicates reservation station is busy.
- **Register result status:** Q<sub>i</sub> Indicates which Reservation Station will write each register, if one exists.
  - Blank (or 0) when no pending instruction (i.e. RS) exist that will write to that register.

# Three Stages of Tomasulo Algorithm

**1 Issue (IS):** Get instruction from pending Instruction Queue (IQ).

- Instruction issued to <sup>2</sup>a free reservation station(RS) (no structural hazard).
  - Selected RS is marked busy.  Stage 0 Instruction Fetch (IF): No changes, in-order
  - Control sends **available instruction operands values** (from ISA registers) to assigned RS.  
/ Using Q Fields
  - Operands **not available** yet are renamed to RSs that will produce the operand (register renaming). (Dynamic construction of data dependency graph) 

## **2 Execution (EX): Operate on operands.**

- When both operands are ready then start executing on assigned FU.
  - If all operands are not ready, watch Common Data Bus (CDB) for needed result (forwarding done via CDB). (i.e. wait on any remaining operands, no RAW)

**3 Write result (WB): Finish execution.** Data dependencies observed  
And also to destination

- Write result on Common Data Bus (CDB) to all awaiting units (RSs)
    - Mark reservation station as available.
      - i.e. broadcast result on CDB (forwarding)
  - Normal data bus: data + destination (“go to” bus). → Note: No WB for stores or branches

**Common Data Bus (CDB):** data + source (“come from” bus):

- 64 bits for data + 4 bits for Functional Unit **source** address.
  - Write data to waiting RS if source matches expected RS (that produces result).
  - Does the result forwarding via broadcast to waiting RSs.

**Can be  
done  
out of  
program  
order**

## **In Fourth Edition: Chapter 2.4 (In Third Edition: Chapter 3.2)**

### Including destination register

- CMPE550 - Shaaban

# Stages of Tomasulo Algorithm

Including Instruction Fetch (IF): No changes, **still in-order**



# Dynamic Scheduling: The Tomasulo Approach



The basic structure of a MIPS floating-point unit using Tomasulo's algorithm

In Fourth Edition: Chapter 2.4  
 (In Third Edition: Chapter 3.2)

Pipelined FP units are used here

**CMPE550 - Shaaban**

# Dynamic Scheduling: The Tomasulo Approach



The basic structure of a MIPS floating-point unit using Tomasulo's algorithm

In Fourth Edition: Chapter 2.4  
(In Third Edition: Chapter 3.2)

Pipelined FP units are used here

**CMPE550 - Shaaban**

# Dynamic Scheduling: The Tomasulo Approach



In Fourth Edition: Chapter 2.4  
(In Third Edition: Chapter 3.2)

Pipelined FP units are used here

CMPE550 - Shaaban

# Dynamic Scheduling: The Tomasulo Approach



# Hi I am RS ADD1 and here's my Result

- CMPE550 - Shaaban

# Dynamic Scheduling: The Tomasulo Approach



What if two or more instructions are done executing in the same cycle and need to broadcast their results on the CDB?



The basic structure of a MIPS floating-point unit using Tomasulo's algorithm

**IBM 360/91**  
Tomasulo-based (1966)

Vs.

**CDC 6600**

Scoreboard-based (1963)

**Pipelined Functional Units**

(6 load, 3 store, 3 +, 2 x/÷)

window size:  $\leq 14$  instructions

No issue on structural hazard

→ WAW: renaming avoids it

→ WAR: renaming avoids it

→ Broadcast results from FU

→ (Implements forwarding)

Control: reservation stations  
distributed

Eliminated  
By register  
renaming

Over CDB

In Fourth Edition: Chapter 2.4 (In Third Edition: Chapter 3.2)

**Multiple Functional Units  
(Not pipelined)**

(1 load/store, 1 +, 2 x, 1 ÷)

 $\leq 5$  instructions

same

stall issue ID1

stall completion WB

Write/read registers

(Forwarding *not* supported)

central scoreboard

**CMPE550 - Shaaban**

# Steps in The Tomsulo Approach and The Requirements of Each Step

| Instruction status | Wait until                                    | Action or bookkeeping                                                                                                                                                                                                                                                                  |
|--------------------|-----------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Issue              | Station or buffer empty                       | <pre> if (Register['S1'].Qi ≠ 0)     {RS[r].Qj← Register['S1'].Qi) else {RS[r].Vj← S1; RS[r].Qj← 0}; if (Register[S2].Qi≠0)     {RS[r].Qk← Register[S2].Qi); else {RS[r].Vk← S2; RS[r].Qk← 0} RS[r].Busy← yes; Register['D'].Qi=r;</pre>                                               |
| Execute            | (RS[r].Qj=0) and<br>(RS[r].Qk=0)              | None—operands are in Vj and Vk                                                                                                                                                                                                                                                         |
| Write result       | Execution completed at r<br>and CDB available | <pre> ∀x(if (Register[x].Qi=r)      (Fx← result;     Register[x].Qi← 0)); ∀x(if (RS[x].Qj=r)  (RS[x].Vj← result;     RS[x].Qj ← 0)); ∀x(if (RS[x].Qk=r)  (RS[x].Vk← result;     RS[x].Qk ← 0)); ∀x(if (Store[x].Qi=r) (Store[x].V← result;     Store[x].Qi ← 0)); RS[r].Busy← No</pre> |

# Drawbacks of The Tomasulo Approach

- • **Implementation Complexity:**
  - Example: The implementation of the Tomasulo algorithm may have caused delays in the introduction of 360/91, MIPS 10000, IBM 620 among other CPUs.
- Many high-speed associative result stores using (CDB) are required.
- • Performance limited by **one Common Data Bus**
  - Possible solution:
    - **Multiple CDBs** → more Functional Unit and RS logic needed for parallel associative stores. (Even more complexity?)

# Tomasulo Approach Example

Using the same code used in the scoreboard example to be run on the Tomasulo configuration given earlier:

RS = Reservation Stations

Integer  
Floating Point Multiply/divide  
Floating Point add/sub

| # of RSs | EX Cycles |
|----------|-----------|
| 3        | 1         |
| 2        | 10/40     |
| 3        | 2         |

- |   |        |             |
|---|--------|-------------|
| 1 | L.D    | F6, 34(R2)  |
| 2 | L.D    | F2, 45(R3)  |
| 3 | MUL. D | F0, F2, F4  |
| 4 | SUB.D  | F8, F6, F2  |
| 5 | DIV.D  | F10, F0, F6 |
| 6 | ADD.D  | F6, F8, F2  |

→ Pipelined Functional Units

Real Data Dependence (RAW) →

Anti-dependence (WAR) →

Output Dependence (WAW) →

→ L.D processing takes two cycles: EX, MEM  
(only one cycle in scoreboard example)

# Dependency Graph For Example Code

$\emptyset$



## Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

## Date Dependence:

(1, 4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4, 6)

## Output Dependence:

(1, 6)

## Anti-dependence:

(5, 6)

|                            |                               |
|----------------------------|-------------------------------|
| Real Data Dependence (RAW) | $\rightarrow$                 |
| Anti-dependence            | $\rightarrow\leftarrow$       |
| Output Dependence          | $\leftarrow\circlearrowright$ |

The same code used is the scoreboard example

**CMPE550 - Shaaban**

# Tomasulo Example: Cycle 0

(i.e at end of cycle 0)

FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40

| Instruction status | IS          |     | EX    |                    | WB     |   | Load Reservation Stations (RSs) |         |  |
|--------------------|-------------|-----|-------|--------------------|--------|---|---------------------------------|---------|--|
|                    | Instruction | j   | Issue | Execution complete | Result | / | Busy                            | Address |  |
| L.D                | F6          | 34+ | R2    |                    |        |   | No                              |         |  |
| L.D                | F2          | 45+ | R3    |                    |        |   | No                              |         |  |
| MUL.D              | F0          | F2  | F4    |                    |        |   | No                              |         |  |
| SUB.D              | F8          | F6  | F2    |                    |        |   |                                 |         |  |
| DIV.D              | F10         | F0  | F6    |                    |        |   |                                 |         |  |
| ADD.D              | F6          | F8  | F2    |                    |        |   |                                 |         |  |

  

| RSs | Reservation Stations |      | S1   |    | S2 |    | RS for j |    | RS for k |  |
|-----|----------------------|------|------|----|----|----|----------|----|----------|--|
|     | Time                 | Name | Busy | Op | Vj | Vk | Qj       | Qk |          |  |
| 0   | Add1                 |      | No   |    |    |    |          |    |          |  |
| 0   | Add2                 |      | No   |    |    |    |          |    |          |  |
| 0   | Add3                 |      | No   |    |    |    |          |    |          |  |
| 0   | Mult1                |      | No   |    |    |    |          |    |          |  |
| 0   | Mult2                |      | No   |    |    |    |          |    |          |  |

  

| Register result status | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|------------------------|----|----|----|----|----|-----|-----|-----|-----|
| Clock 0                | FU |    |    |    |    |     |     |     |     |



# Dependency Graph For Example Code

Cycle 1

$\emptyset$



Issue Instruction 1 to  
Load1 RS

Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

Output Dependence:

Anti-dependence:



The same code used is the scoreboard example

CMPE550 - Shaaban

# Tomasulo Example Cycle 1

End of

FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40

| Issue | Instruction | j   | k   | IS    | EX                 | WB           |
|-------|-------------|-----|-----|-------|--------------------|--------------|
|       |             |     |     | Issue | Execution complete | Write Result |
|       | L.D         | F6  | 34+ | R2    | 1                  |              |
|       | L.D         | F2  | 45+ | R3    |                    |              |
|       | MUL.D       | F0  | F2  | F4    |                    |              |
|       | SUB.D       | F8  | F6  | F2    |                    |              |
|       | DIV.D       | F10 | F0  | F6    |                    |              |
|       | ADD.D       | F6  | F8  | F2    |                    |              |

|       | Busy | Address |
|-------|------|---------|
| Load1 | Yes  | 34+R2   |
| Load2 | No   |         |
| Load3 | No   |         |

| Reservation Stations |       | S1   | S2 | RS for j       | RS for k       |                |                |
|----------------------|-------|------|----|----------------|----------------|----------------|----------------|
| Time                 | Name  | Busy | Op | V <sub>j</sub> | V <sub>k</sub> | Q <sub>j</sub> | Q <sub>k</sub> |
| 0                    | Add1  | No   |    |                |                |                |                |
| 0                    | Add2  | No   |    |                |                |                |                |
|                      | Add3  | No   |    |                |                |                |                |
| 0                    | Mult1 | No   |    |                |                |                |                |
| 0                    | Mult2 | No   |    |                |                |                |                |

Register result status

| Clock | F0 | F2 | F4 | F6    | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|-------|----|-----|-----|-----|-----|
| 1     |    |    |    |       |    |     |     |     |     |
|       | FU |    |    | Load1 |    |     |     |     |     |

- • Issue first load to load1 reservation station (RS)

# Dependency Graph For Example Code

Cycle 2



Issue Instruction 2 to  
Load2 RS

| Example Code |       |             |
|--------------|-------|-------------|
| 1            | L.D   | F6, 34(R2)  |
| 2            | L.D   | F2, 45(R3)  |
| 3            | MUL.D | F0, F2, F4  |
| 4            | SUB.D | F8, F6, F2  |
| 5            | DIV.D | F10, F0, F6 |
| 6            | ADD.D | F6, F8, F2  |

Date Dependence:

Output Dependence:

Anti-dependence:



The same code used is the scoreboard example

CMPE550 - Shaaban

# Tomasulo Example: Cycle 2

End of

Issue

Instruction status

| Instruction | $j$ | $k$ | IS Issue | EX Execution complete | WB Write Result |
|-------------|-----|-----|----------|-----------------------|-----------------|
| L.D         | F6  | 34+ | R2       | 1                     |                 |
| L.D         | F2  | 45+ | R3       | 2                     |                 |
| MUL.D       | F0  | F2  |          | F4                    |                 |
| SUB.D       | F8  | F6  |          | F2                    |                 |
| DIV.D       | F10 | F0  |          | F6                    |                 |
| ADD.D       | F6  | F8  |          | F2                    |                 |

|       | Busy | Address |
|-------|------|---------|
| Load1 | Yes  | 34+R2   |
| Load2 | Yes  | 45+R3   |
| Load3 | No   |         |

Reservation Stations

| Time | Name  | Busy | Op | $V_j$ | $V_k$ | $RS \text{ for } j$ | $RS \text{ for } k$ |
|------|-------|------|----|-------|-------|---------------------|---------------------|
| 0    | Add1  | No   |    |       |       |                     |                     |
| 0    | Add2  | No   |    |       |       |                     |                     |
|      | Add3  | No   |    |       |       |                     |                     |
| 0    | Mult1 | No   |    |       |       |                     |                     |
| 0    | Mult2 | No   |    |       |       |                     |                     |

Register result status

| Clock | FU | F0 | F2    | F4 | F6 | F8 | F10   | F12 | ... | F30 |
|-------|----|----|-------|----|----|----|-------|-----|-----|-----|
| 2     |    |    | Load2 |    |    |    | Load1 |     |     |     |

- • Issue second load to load2 reservation station

Note: Unlike 6600, can have multiple loads outstanding

(CDC6600 only has one integer FU)

# Dependency Graph For Example Code

Cycle 3



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |



Date Dependence:  
(2, 3)

Output Dependence:

Anti-dependence:



The same code used is the scoreboard example

CMPE550 - Shaaban

# Tomasulo Example: Cycle 3

End of

Instruction status

| Instruction | j   | k   | Issue | Execution complete | WB  |
|-------------|-----|-----|-------|--------------------|-----|
| L.D         | F6  | 34+ | R2    | 1                  | → 3 |
| L.D         | F2  | 45+ | R3    | 2                  |     |
| MUL.D       | F0  | F2  | F4    | 3                  |     |
| SUB.D       | F8  | F6  | F2    |                    |     |
| DIV.D       | F10 | F0  | F6    |                    |     |
| ADD.D       | F6  | F8  | F2    |                    |     |

IS EX WB  
Issue Execution complete Write Result

| Busy  | Address |
|-------|---------|
| Load1 | 34+R2   |
| Load2 | 45+R3   |
| Load3 | No      |

Load processing takes 2 cycles (EX, Mem)

Reservation Stations

| Time | Name  | Busy | Op    | S1 | S2    | RS for j | RS for k |
|------|-------|------|-------|----|-------|----------|----------|
| 0    | Add1  | No   |       |    |       |          |          |
| 0    | Add2  | No   |       |    |       |          |          |
|      | Add3  | No   |       |    |       |          |          |
| → 0  | Mult1 | Yes  | MULTD | —  | R(F4) | Load2    | —        |
| 0    | Mult2 | No   |       |    |       |          |          |

Register result status

## Clock

3



- • Issue MUL.D to reservation station Mult1

# Dependency Graph For Example Code

Cycle 4



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:  
(1,4) (2, 3) (2, 4)

Output Dependence:

Anti-dependence:



The same code used is the scoreboard example

CMPE550 - Shaaban

# Tomasulo Example: Cycle 4

End of

Instruction status

| Instruction | j   | k   |
|-------------|-----|-----|
| L.D         | F6  | 34+ |
| L.D         | F2  | 45+ |
| MUL.D       | F0  | F4  |
| SUB.D       | F8  | F2  |
| DIV.D       | F10 | F6  |
| ADD.D       | F6  | F8  |



## WB Over CDB

| Busy | Address |
|------|---------|
| No   |         |
| Yes  | 45+R3   |
| No   |         |

Reservation Stations

| Time | Name  | Busy | Op     | V <sub>i</sub> | V <sub>k</sub> | R.S for j | R.S for k |
|------|-------|------|--------|----------------|----------------|-----------|-----------|
| 0    | Add1  | Yes  | SUB.D  | M(34+R2)       | —              | —         | Load2     |
| 0    | Add2  | No   |        |                |                |           |           |
|      | Add3  | No   |        |                |                |           |           |
| 0    | Mult1 | Yes  | MULT.D | —              | R(F4)          | Load2     | —         |
| 0    | Mult2 | No   |        |                |                |           |           |

Register result status

| Clock | 4 | F0 | F2    | F4    | F6       | F8   | F10 | F12 | ... | F30 |
|-------|---|----|-------|-------|----------|------|-----|-----|-----|-----|
|       |   | FU | Mult1 | Load2 | M(34+R2) | Add1 |     |     |     |     |

- • Issue SUB.D      i.e. register F6 has the loaded value from memory
- • Load2 completing; what is waiting for it?

# Dependency Graph For Example Code

Cycle 5



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:  
 $(1,4) (1, 5) (2, 3) (2, 4)$   
 $(3, 5)$

Output Dependence:

Anti-dependence:



The same code used is the scoreboard example

CMPE550 - Shaaban

# Tomasulo Example: Cycle 5

FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40

End of

## Instruction status

| Instruction | j   | k   | Issue | Execution complete | WB Result |
|-------------|-----|-----|-------|--------------------|-----------|
| L.D         | F6  | 34+ | R2    | 1                  | 3         |
| L.D         | F2  | 45+ | R3    | 2                  | 4         |
| MUL.D       | F0  | F2  | F4    | 3                  |           |
| SUB.D       | F8  | F6  | F2    | 4                  |           |
| DIV.D       | F10 | F0  | F6    | 5                  |           |
| ADD.D       | F6  | F8  | F2    |                    |           |



## WB Over CDB

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

Issue

## Reservation Stations

Time

| Time | Name  | Busy | Op    | Vj       | Vk       | RS for j | RS for k |
|------|-------|------|-------|----------|----------|----------|----------|
| 2    | Add1  | Yes  | SUBD  | M(34+R2) | M(45+R3) |          |          |
| 0    | Add2  | No   |       |          |          |          |          |
|      | Add3  | No   |       |          |          |          |          |
| 10   | Mult1 | Yes  | MULTD | M(45+R3) | R(F4)    |          |          |
| 0    | Mult2 | Yes  | DIVD  |          | M(34+R2) | Mult1    |          |

Execution cycles  
remaining  
(execution  
actually starts  
next cycle)

→

## Register result status

Clock

5

| Clock | F0 | F2    | F4       | F6 | F8       | F10  | F12 | ...   | F30 |
|-------|----|-------|----------|----|----------|------|-----|-------|-----|
| 5     | FU | Mult1 | M(45+R3) |    | M(34+R2) | Add1 |     | Mult2 |     |

- Load2 result forwarded via CDB to Add1, Mult1  
(SUB.D, MUL.D execution will start execution next cycle 6) ➔

- Issue DIV.D to Mult2 reservation station

# Dependency Graph For Example Code

Cycle 6

Start EX For Instructions 3 and 4



Issue Instruction 6 to Add2 RS

Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:  
 $(1,4) (1,5) (2,3) (2,4)$   
 $(2,6) (3,5) (4,6)$

Output Dependence:  
 $(1,6)$

Anti-dependence:  
 $(5,6)$



The same code used is the scoreboard example

CMPE550 - Shaaban

# Tomasulo Example Cycle 6

End of

**FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40**

Instruction status

| Instruction | j   | k   | Issue | Execution | WB     |
|-------------|-----|-----|-------|-----------|--------|
|             |     |     |       | complete  | Result |
| L.D         | F6  | 34+ | R2    | 1         | 4      |
| L.D         | F2  | 45+ | R3    | 2         | 5      |
| MUL.DF0     | F2  | F4  |       | 3         |        |
| SUB.DF8     | F6  | F2  |       | 4         |        |
| DIV.D       | F10 | F0  | F6    | 5         |        |
| ADD.DF6     | F8  | F2  | 6     |           |        |

| Busy  | Address |
|-------|---------|
| Load1 | No      |
| Load2 | No      |
| Load3 | No      |

Issue

Reservation Stations

| Time | Name  | Busy | Op    | Vj       | Vk       | RS for j | RS for k |
|------|-------|------|-------|----------|----------|----------|----------|
| 1    | Add1  | Yes  | SUBD  | M(34+R2) | M(45+R3) |          |          |
| 0    | Add2  | Yes  | ADDD  | —        | M(45+R3) | Add1     | —        |
|      | Add3  | No   |       |          |          |          |          |
| 9    | Mult1 | Yes  | MULTD | M(45+R3) | R(F4)    |          |          |
| 0    | Mult2 | Yes  | DIVD  | M(34+R2) | Mult1    |          |          |

Register result status

| Clock | F0 | F2    | F4       | F6   | F8   | F10   | F12 | ... | F30 |
|-------|----|-------|----------|------|------|-------|-----|-----|-----|
| 6     | FU | Mult1 | M(45+R3) | Add2 | Add1 | Mult2 |     |     |     |

- • ADD.D is issued here vs. scoreboard (in cycle 16)

# Dependency Graph For Example Code

Cycle 7



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)



The same code used is the scoreboard example

CMPE550 - Shaaban

# Tomasulo Example: Cycle 7

End of

FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40

| Instruction status |     |     | IS    | EX<br>Execution | WB<br>Write |      |         |
|--------------------|-----|-----|-------|-----------------|-------------|------|---------|
| Instruction        | j   | k   | Issue | complete        | Result      | Busy | Address |
| L.D                | F6  | 34+ | R2    | 1               | 3           | 4    | Load1   |
| L.D                | F2  | 45+ | R3    | 2               | 4           | 5    | Load2   |
| MUL.D              | F0  | F2  | F4    | 3               |             |      | Load3   |
| SUB.D              | F8  | F6  | F2    | 4               | 7           |      |         |
| DIV.D              | F10 | F0  | F6    | 5               |             |      |         |
| ADD.D              | F6  | F8  | F2    | 6               |             |      |         |

| Reservation Stations |       | S1   | S2    | RS for j | RS for k |       |    |
|----------------------|-------|------|-------|----------|----------|-------|----|
| Time                 | Name  | Busy | Op    | Vj       | Vk       | Qj    | Qk |
| 0                    | Add1  | Yes  | SUBD  | M(34+R2) | M(45+R3) |       |    |
| 0                    | Add2  | Yes  | ADDD  |          | M(45+R3) | Add1  |    |
|                      | Add3  | No   |       |          |          |       |    |
| 8                    | Mult1 | Yes  | MULTD | M(45+R3) | R(F4)    |       |    |
| 0                    | Mult2 | Yes  | DIVD  |          | M(34+R2) | Mult1 |    |

## Register result status

| Clock | F0 | F2    | F4       | F6 | F8   | F10 | F12  | ... | F30   |
|-------|----|-------|----------|----|------|-----|------|-----|-------|
| 7     | FU | Mult1 | M(45+R3) |    | Add2 |     | Add1 |     | Mult2 |

- • RS Add1 (SUB.D) completing; what is waiting for it?

# Dependency Graph For Example Code

Cycle 10



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)



The same code used is the scoreboard example

CMPE550 - Shaaban

# Tomasulo Example: Cycle 10

End of

| Instruction status     |       |       | IS       | EX        | WB       |       |    |         |       |     |     |
|------------------------|-------|-------|----------|-----------|----------|-------|----|---------|-------|-----|-----|
| Instruction            | j     | k     | Issue    | Execution | Write    |       |    |         |       |     |     |
|                        |       |       |          | complete  | Result   |       |    |         |       |     |     |
| L.D                    | F6    | 34+   | R2       | 1         | 3        | 4     |    |         |       |     |     |
| L.D                    | F2    | 45+   | R3       | 2         | 4        | 5     |    |         |       |     |     |
| MUL.D                  | F0    | F2    | F4       | 3         |          |       |    |         |       |     |     |
| SUB.D                  | F8    | F6    | F2       | 4         | 7        | → 8   |    |         |       |     |     |
| DIV.D                  | F10   | F0    | F6       | 5         |          |       |    |         |       |     |     |
| ADD.D                  | F6    | F8    | F2       | 6         | → 10     | →     |    |         |       |     |     |
| Reservation Stations   |       |       | RS for j |           | RS for k |       |    |         |       |     |     |
| Time                   | Name  | Busy  | Op       | Vj        | Vk       | Qj    | Qk |         |       |     |     |
| 0                      | Add1  | No    |          |           |          |       |    |         |       |     |     |
| 0                      | Add2  | Yes   | ADDD     | M()–M()   | M(45+R3) |       |    |         |       |     |     |
| 0                      | Add3  | No    |          |           |          |       |    |         |       |     |     |
| 5                      | Mult1 | Yes   | MULTD    | M(45+R3)  | R(F4)    |       |    |         |       |     |     |
| 0                      | Mult2 | Yes   | DIVD     |           | M(34+R2) | Mult1 |    |         |       |     |     |
| Register result status |       |       |          |           |          |       |    |         |       |     |     |
| Clock                  |       |       | F0       | F2        | F4       | F6    | F8 | F10     | F12   | ... | F30 |
| 10                     | FU    | Mult1 | M(45+R3) |           |          | Add2  |    | M()–M() | Mult2 |     |     |

**Done executing** [ ]

→ • RS Add2 (i.e. ADD.D) completed execution

# Dependency Graph For Example Code

Cycle 11



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)



The same code used is the scoreboard example

CMPE550 - Shaaban

# Tomasulo Example: Cycle 11

End of



- ➔ • Write back (WB) result of ADD.D in this cycle
- ➔ (What about anti-dependence over F6 with DIV.D ?)

# Dependency Graph For Example Code

Cycle 15



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)



The same code used is the scoreboard example

CMPE550 - Shaaban

# Tomasulo Example: Cycle 15

End of

|                    |     |     | IS    | EX                 | WB           | Memory |         |
|--------------------|-----|-----|-------|--------------------|--------------|--------|---------|
| Instruction status |     |     |       | Execution complete | Write Result | Busy   | Address |
| Instruction        | j   | k   | Issue |                    |              |        |         |
| L.D                | F6  | 34+ | R2    | 1                  | 3            | 4      | Load1   |
| L.D                | F2  | 45+ | R3    | 2                  | 4            | 5      | Load2   |
| MUL.D              | F0  | F2  | F4    | 3                  | → 15 →       |        | Load3   |
| SUB.D              | F8  | F6  | F2    | 4                  | 7            | 8      | No      |
| DIV.D              | F10 | F0  | F6    | 5                  |              |        | No      |
| ADD.D              | F6  | F8  | F2    | 6                  | 10           | 11     | No      |

| Reservation Stations |       |         | S1             | S2       | RS for j | RS for k |
|----------------------|-------|---------|----------------|----------|----------|----------|
| Time                 | Name  | Busy Op | Vj             | Vk       | Qj       | Qk       |
| 0                    | Add1  | No      |                |          |          |          |
| 0                    | Add2  | No      |                |          |          |          |
|                      | Add3  | No      |                |          |          |          |
| executing            | Mult1 | Yes     | MULTD M(45+R3) | R(F4)    |          |          |
| 0                    | Mult2 | Yes     | DIVD           | M(34+R2) | Mult1    |          |

| Register result status |           | <i>F0</i> | <i>F2</i> | <i>F4</i> | <i>F6</i> | <i>F8</i> | <i>F10</i> | <i>F12</i> | ... | <i>F30</i> |
|------------------------|-----------|-----------|-----------|-----------|-----------|-----------|------------|------------|-----|------------|
| Clock                  |           |           |           |           |           |           |            |            |     |            |
| 15                     | <i>FU</i> | Mult1     | M(45+R3)  |           | (M-M)+M() | M()-M()   | Mult2      |            |     |            |

➔ • Mult1 completed execution; what is waiting for it?

# Dependency Graph For Example Code

# Cycle 16



**The same code used is the scoreboard example**

#91 lec # 4 Spring 2023 2-13-2023

CMPE550 - Shaaban

---

#91 lec # 4 Spring 2023 2-13-2023

#91 lec # 4 Spring 2023 2-13-2023

# Tomasulo Example: Cycle 16

End of

FP EX Cycles : Add = 2 cycles, Multiply = 10, Divide = 40

| Instruction status     |             |      | IS       | EX    | WB                 |              |             |
|------------------------|-------------|------|----------|-------|--------------------|--------------|-------------|
|                        | Instruction | j    | k        | Issue | Execution complete | Write Result |             |
| L.D                    | F6          | 34+  | R2       | 1     | 3                  | 4            | Load1       |
| L.D                    | F2          | 45+  | R3       | 2     | 4                  | 5            | Load2       |
| MUL.D                  | F0          | F2   | F4       | 3     | 15                 | 16           | Load3       |
| SUB.D                  | F8          | F6   | F2       | 4     | 7                  | 8            |             |
| DIV.D                  | F10         | F0   | F6       | 5     |                    |              |             |
| ADD.D                  | F6          | F8   | F2       | 6     | 10                 | 11           |             |
| Reservation Stations   |             |      | S1       | S2    | RS for j           | RS for k     |             |
| Time                   | Name        | Busy | Op       | Vj    | Vk                 | Qj           | Qk          |
| 0                      | Add1        | No   |          |       |                    |              |             |
| 0                      | Add2        | No   |          |       |                    |              |             |
|                        | Add3        | No   |          |       |                    |              |             |
| 0                      | Mult1       | No   |          |       |                    |              |             |
| 40                     | Mult2       | Yes  | DIVD     | M*F4  |                    | M(34+R2)     |             |
| WB Over CDB            |             |      |          |       |                    |              |             |
| Register result status |             |      |          |       |                    |              |             |
| Clock                  |             | F0   | F2       | F4    | F6                 | F8           | F10 ... F30 |
| 16                     | FU          | M*F4 | M(45+R3) |       | (M-M)+M()          | M()-M()      | Mult2       |

Execution cycles remaining (execution actually starts next cycle)

→ Only Divide instruction remains

DIV.D execution will start next cycle (17)

CMPE550 - Shaaban

# Dependency Graph For Example Code

Cycle 57



Example Code

|   |       |             |
|---|-------|-------------|
| 1 | L.D   | F6, 34(R2)  |
| 2 | L.D   | F2, 45(R3)  |
| 3 | MUL.D | F0, F2, F4  |
| 4 | SUB.D | F8, F6, F2  |
| 5 | DIV.D | F10, F0, F6 |
| 6 | ADD.D | F6, F8, F2  |

Date Dependence:

(1,4) (1, 5) (2, 3) (2, 4)  
(2, 6) (3, 5) (4,6)

Output Dependence:

(1,6)

Anti-dependence:

(5,6)



CMPE550 - Shaaban

The same code used is the scoreboard example

# Tomasulo Example: Cycle 57

(vs. 62 cycles for scoreboard)



- We have:
- ➔ • In-order issue,
- ➔ • Out-of-order execution, completion

# Tomasulo Loop Example

## (Hardware/CPU-Based Version of Loop-Unrolling)

|       |        |                                 |
|-------|--------|---------------------------------|
| Loop: | L.D    | F0, 0(R1)                       |
|       | MUL.D  | F4,F0,F2                        |
|       | S.D    | F4, 0(R1)                       |
|       | DADDUI | R1,R1, # -8                     |
|       | BNE    | R1,R2, Loop ; branch if R1 ≠ R2 |



Note independent (parallel) loop iterations  
(the same loop used in loop unrolling example)

- Assume FP Multiply takes 4 execution clock cycles.
- Assume first load takes 8 cycles (possibly due to a cache miss), second load takes 4 cycles (cache hit). 3rd ....
- Assume R1 = 80 initially.
- Assume DADDUI only takes one cycle (issue)
- Assume branch resolved in issue stage (no EX or CDB write)
- Assume branch is predicted taken and no branch misprediction. i.e. Perfect branch prediction. How? Target? What if prediction Is wrong?
- No branch delay slot is used in this example.
- Stores take 4 cycles (ex, mem) and do not write on CDB
- We'll go over the execution to complete first two loop iterations. →

# Tomasulo Loop Example Dependency Graph (First three iterations shown)



Example Code

|                  |   |       |            |
|------------------|---|-------|------------|
| First Iteration  | 1 | L.D   | F0, 0 (R1) |
|                  | 2 | MUL.D | F4, F0, F2 |
|                  | 3 | S.D   | F4, 0(R1)  |
| Second Iteration | 4 | L.D   | F0, 0(R1)  |
|                  | 5 | MUL.D | F4, F0, F2 |
|                  | 6 | S.D   | F4, 0(R1)  |
| Third Iteration  | 7 | L.D   | F0, 0(R1)  |
|                  | 8 | MUL.D | F4, F0, F2 |
|                  | 9 | S.D   | F4, 0(R1)  |

Loop maintenance (DADDUI) and branches (BNE) not shown

Name dependencies between iteration 3 instructions and iteration 1 instructions are not shown in graph

# Loop Example Cycle 0

(i.e at end of cycle 0)

| Instruction status |    |    |           | IS    | EX       | WB     | Execution Write |         |  |
|--------------------|----|----|-----------|-------|----------|--------|-----------------|---------|--|
| Instruction        | j  | k  | iteration | Issue | complete | Result | Busy            | Address |  |
| L.D F0             | 0  | R1 | 1         |       |          |        | Load1           | No      |  |
| MUL.D F4           | F0 | F2 | 1         |       |          |        | Load2           | Green   |  |
| S.D F4             | 0  | R1 | 1         |       |          |        | Load3           | No      |  |
| L.D F0             | 0  | R1 | 2         |       |          |        | Store1          | No      |  |
| MUL.D F4           | F0 | F2 | 2         |       |          |        | Store2          | No      |  |
| S.D F4             | 0  | R1 | 2         |       |          |        | Store3          | No      |  |

| Reservation Stations |       |      | S1 | S2                       | RS for j | RS for k                     |    |                    |
|----------------------|-------|------|----|--------------------------|----------|------------------------------|----|--------------------|
| Time                 | Name  | Busy | Op | Vj                       | Vk       | Qj                           | Qk | Code:              |
| 0                    | Add1  | No   |    |                          |          |                              |    | L.D F0, 0(R1)      |
| 0                    | Add2  | No   |    |                          |          |                              |    | MUL.D F4,F0,F2     |
| 0                    | Add3  | No   |    | Operand Values<br>(Data) |          | Producer of<br>Result Needed |    | S.D F4, 0(R1)      |
| 0                    | Mult1 | No   |    |                          |          |                              |    | DADDUI R1, R1, #-8 |
| 0                    | Mult2 | No   |    |                          |          |                              |    | BNE R1,R2,loop     |

18

—

EX

**Includes  
MEM**

WR

- CMPE550 - Shaaban

# Tomasulo Loop Example Timing Diagram

Cycle

Iteration

|    | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
|----|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
| 1  | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
| 2  | MUL.D  | I |   |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
| 3  | S.D.   |   | I |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 4  | DADDUI |   |   | I |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 5  | BNE    |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 6  | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |
| 7  | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
| 8  | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 9  | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
| 10 | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 11 | L.D.   |   |   |   |   |   |   |   |   |    | I  | E  | E  | E  | E  | W  |    |    |    |    |    |
| 12 | MUL.D  |   |   |   |   |   |   |   |   |    |    | I  |    |    |    |    | E  | E  | E  | E  |    |
| 13 | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
| 14 | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 15 | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |
| 16 | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  | E  |    |
| 17 | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |
| 18 | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    | I  |    |
| 19 | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    | I  |
| 20 | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    | I  |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 1

End of

Issue

## Instruction status

| Instruction | j  | k  | iteration |
|-------------|----|----|-----------|
| L.D         | F0 | 0  | R1        |
| MUL.D       | F4 | F0 | F2        |
| S.D         | F4 | 0  | R1        |
| L.D         | F0 | 0  | R1        |
| MUL.D       | F4 | F0 | F2        |
| S.D         | F4 | 0  | R1        |

IS EX WB

Execution Write

Issue complete Result

EX Cycles remaining

| Busy | Address |
|------|---------|
| Yes  | 80      |
| No   |         |
| No   | Qi      |
| No   |         |
| No   |         |
| No   |         |

## Reservation Stations

| Time | Name  | Busy | Op | S1 | S2 | RS for j | RS for k |
|------|-------|------|----|----|----|----------|----------|
|      |       |      |    | Vj | Vk | Qj       | Qk       |
| 0    | Add1  | No   |    |    |    |          |          |
| 0    | Add2  | No   |    |    |    |          |          |
| 0    | Add3  | No   |    |    |    |          |          |
| 0    | Mult1 | No   |    |    |    |          |          |
| 0    | Mult2 | No   |    |    |    |          |          |

Code:

|        |             |
|--------|-------------|
| L.D    | F0, 0(R1)   |
| MUL.D  | F4,F0,F2    |
| S.D    | F4, 0(R1)   |
| DADDUI | R1, R1, #-8 |
| BNE    | R1,R2,loop  |

Issue

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6 | F8 | F10 | F12... | F30 |
|-------|----|----|-------|----|----|----|-----|--------|-----|
| 1     | 80 | Qi | Load1 |    |    |    |     |        |     |



First L.D issues, takes 8 cycles to complete execution (including mem access)

CMPE550 - Shaaban

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    | I  |    |    |    |    | E  | E  | E  | E  |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 2

End of

## Instruction status

| Instruction | j  | k     | iteration |
|-------------|----|-------|-----------|
| L.D         | F0 | 0 R1  | 1         |
| MUL.D       | F4 | F0 F2 | 1         |
| S.D         | F4 | 0 R1  | 1         |
| L.D         | F0 | 0 R1  | 2         |
| MUL.D       | F4 | F0 F2 | 2         |
| S.D         | F4 | 0 R1  | 2         |

IS EX WB

Execution Write

Issue complete Result

| Issue | 1 | 2 |
|-------|---|---|
|       |   |   |

Busy Address

|        |   |        |
|--------|---|--------|
| Load1  | 7 | Yes 80 |
| Load2  |   | No     |
| Load3  |   | No Qi  |
| Store1 |   | No     |
| Store2 |   | No     |
| Store3 |   | No     |

## Reservation Stations

| Time | Name  | Busy | Op    | S1 | S2    | RS for j | RS for k |
|------|-------|------|-------|----|-------|----------|----------|
|      |       |      |       | Vj | Vk    | Qj       | Qk       |
| 0    | Add1  | No   |       |    |       |          |          |
| 0    | Add2  | No   |       |    |       |          |          |
| 0    | Add3  | No   |       |    |       |          |          |
| 0    | Mult1 | Yes  | MULTD |    | R(F2) | Load1    |          |
| 0    | Mult2 | No   |       |    |       |          |          |

Code:

|        |             |
|--------|-------------|
| L.D    | F0, 0(R1)   |
| MUL.D  | F4,F0,F2    |
| S.D    | F4, 0(R1)   |
| DADDUI | R1, R1, #-8 |
| BNE    | R1,R2,loop  |

Issue

## Register result status

| Clock | R1 | F0 | F2    | F4    | F6 | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|-------|----|----|-----|---------|-----|
| 2     | 80 | Qi | Load1 | Mult1 |    |    |     |         |     |



First MUL.D issues, wait on first L.D (Load1) to write on CDB

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|--|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |  |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |  |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |  |
|           | S.D.   |   | I |   |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |  |
|           | DADDUI |   |   | I |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
| 2         | L.D.   |   |   |   | I | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |  |
|           | MUL.D  |   |   |   |   | I |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |  |
|           | S.D.   |   |   |   |   |   | I |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |  |
|           | DADDUI |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |  |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |    |  |
|           | MUL.D  |   |   |   |   |   |   |   |   |    | I  |    |    |    |    | E  | E  | E  | E  |    |    |  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |  |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  | E  |    |  |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |

I = Issue    E = Execute    W = Write Result on CDB

# Loop Example Cycle 3

End of

Issue →

## Instruction status

| Instruction | j  | k     | iteration |
|-------------|----|-------|-----------|
| L.D         | F0 | 0 R1  | 1         |
| MUL.D       | F4 | F0 F2 | 1         |
| S.D         | F4 | 0 R1  | 1         |
| L.D         | F0 | 0 R1  | 2         |
| MUL.D       | F4 | F0 F2 | 2         |
| S.D         | F4 | 0 R1  | 2         |

IS EX WB

Execution Write

Issue complete Result

| Issue | complete | Result |
|-------|----------|--------|
| 1     |          |        |
| 2     |          |        |
| 3     |          |        |

Busy Address

| Load1  | 6 | Yes 80       |
|--------|---|--------------|
| Load2  |   | No           |
| Load3  |   | No           |
| Store1 |   | Yes 80 Mult1 |
| Store2 |   | No           |
| Store3 |   | No           |

## Reservation Stations

| Time | Name  | Busy | Op    | RS for j |       | RS for k |    |
|------|-------|------|-------|----------|-------|----------|----|
|      |       |      |       | Vj       | Vk    | Qj       | Qk |
| 0    | Add1  | No   |       |          |       |          |    |
| 0    | Add2  | No   |       |          |       |          |    |
| 0    | Add3  | No   |       |          |       |          |    |
| 0    | Mult1 | Yes  | MULTD |          | R(F2) | Load1    |    |
| 0    | Mult2 | No   |       |          |       |          |    |

Code:

|        |             |
|--------|-------------|
| L.D    | F0, 0(R1)   |
| MUL.D  | F4,F0,F2    |
| S.D    | F4, 0(R1)   |
| DADDUI | R1, R1, #-8 |
| BNE    | R1,R2,loop  |

Issue

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 3     | 80 | Qi | Load1 |    | Mult1 |    |     |         |     |

→ First S.D issues, wait on first MUL.D (Mult1) to write on CDB

CMPE550 - Shaaban

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    | I  |    |    |    |    | I  | E  | E  | E  | E  |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 4

End of

|                               |          |          |                  | IS                   | EX                   | WB                   |                                |
|-------------------------------|----------|----------|------------------|----------------------|----------------------|----------------------|--------------------------------|
|                               |          |          |                  | Execution Write      |                      |                      |                                |
|                               |          |          |                  | Issue                | complete             | Result               |                                |
| Instruction                   | <i>j</i> | <i>k</i> | <i>iteration</i> |                      |                      |                      |                                |
| L.D                           | F0       | 0 R1     | 1                | 1                    |                      |                      | Load1 5<br>Busy Yes Address 80 |
| MUL.D                         | F4       | F0 F2    | 1                | 2                    |                      |                      | Load2 No                       |
| S.D                           | F4       | 0 R1     | 1                | 3                    |                      |                      | Load3 No Qi                    |
| L.D                           | F0       | 0 R1     | 2                |                      |                      |                      | Store1 Yes 80 Mult1            |
| MUL.D                         | F4       | F0 F2    | 2                |                      |                      |                      | Store2 No                      |
| S.D                           | F4       | 0 R1     | 2                |                      |                      |                      | Store3 No                      |
| <u>Reservation Stations</u>   |          |          |                  | S1                   | S2                   | RS for <i>j</i>      | RS for <i>k</i>                |
| Time                          | Name     | Busy     | Op               | <i>V<sub>j</sub></i> | <i>V<sub>k</sub></i> | <i>Q<sub>j</sub></i> | <i>Q<sub>k</sub></i>           |
| 0                             | Add1     | No       |                  |                      |                      |                      | L.D F0, 0(R1)                  |
| 0                             | Add2     | No       |                  |                      |                      |                      | MUL.D F4,F0,F2                 |
| 0                             | Add3     | No       |                  |                      |                      |                      | S.D F4, 0(R1)                  |
| 0                             | Mult1    | Yes      | MULTD            |                      | R(F2)                | Load1                | DADDUI R1, R1, #5              |
| 0                             | Mult2    | No       |                  |                      |                      |                      | BNE R1,R2,loop                 |
| <u>Register result status</u> |          |          |                  |                      |                      |                      |                                |
| Clock                         | R1       |          |                  | F0                   | F2                   | F4                   | F6                             |
| 4                             | 72       | Qi       |                  | Load1                |                      | Mult1                |                                |

→ First DADDUI issues (not shown)

CMPE550 - Shaaban

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    | I  |    |    |    |    | I  | E  | E  | E  | E  |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 5

End of

## Instruction status

| Instruction | j  | k     | iteration |
|-------------|----|-------|-----------|
| L.D         | F0 | 0 R1  | 1         |
| MUL.D       | F4 | F0 F2 | 1         |
| S.D         | F4 | 0 R1  | 1         |
| L.D         | F0 | 0 R1  | 2         |
| MUL.D       | F4 | F0 F2 | 2         |
| S.D         | F4 | 0 R1  | 2         |

## IS EX WB

### Execution Write

| Issue | complete | Result |
|-------|----------|--------|
| 1     |          |        |
| 2     |          |        |
| 3     |          |        |

| Busy   | Address | Q <sub>i</sub> |
|--------|---------|----------------|
| Load1  | Yes 80  |                |
| Load2  | No      |                |
| Load3  | No      | Q <sub>i</sub> |
| Store1 | Yes 80  | Mult1          |
| Store2 | No      |                |
| Store3 | No      |                |

## Reservation Stations

| Time | Name  | Busy | Op    | S1             | S2             | RS for j       | RS for k       |
|------|-------|------|-------|----------------|----------------|----------------|----------------|
|      |       |      |       | V <sub>j</sub> | V <sub>k</sub> | Q <sub>j</sub> | Q <sub>k</sub> |
| 0    | Add1  | No   |       |                |                |                |                |
| 0    | Add2  | No   |       |                |                |                |                |
| 0    | Add3  | No   |       |                |                |                |                |
| 0    | Mult1 | Yes  | MULTD |                | R(F2)          | Load1          |                |
| 0    | Mult2 | No   |       |                |                |                |                |

| Code:  |             |
|--------|-------------|
| L.D    | F0, 0(R1)   |
| MUL.D  | F4,F0,F2    |
| S.D    | F4, 0(R1)   |
| DADDUI | R1, R1, #-8 |
| BNE    | R1,R2,loop  |

Issue

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 5     | 72 | Qi | Load1 |    | Mult1 |    |     |         |     |

→ First BNE issues (not shown), assumed predicted taken

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|--|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |  |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |  |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |  |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |  |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |  |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    |    |    |    |    | E  | E  | E  | E  | W  |    |  |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    |    |    | E  | E  | E  |  |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |  |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |    |  |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  | E  | E  | E  | E  |    |  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |  |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  | E  |    |  |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 6

| Instruction status | Instruction | j  | k  | iteration | IS    | EX                 | WB           | End of            |
|--------------------|-------------|----|----|-----------|-------|--------------------|--------------|-------------------|
|                    |             |    |    |           | Issue | Execution complete | Write Result |                   |
| L.D F0             |             | 0  | R1 | 1         | 1     |                    |              | Load1 3<br>Yes 80 |
| MUL.D F4           |             | F0 | F2 | 1         | 2     |                    |              | Load2 4<br>Yes 72 |
| S.D F4             |             | 0  | R1 | 1         | 3     |                    |              | Load3<br>No       |
| L.D F0             |             | 0  | R1 | 2         | 6     |                    |              | Store1<br>Yes 80  |
| MUL.D F4           |             | F0 | F2 | 2         |       |                    |              | Store2<br>No      |
| S.D F4             |             | 0  | R1 | 2         |       |                    |              | Store3<br>No      |

  

| Reservation Stations | Time | Name  | Busy | Op    | S1 | S2    | RS for j | RS for k | Code:             |
|----------------------|------|-------|------|-------|----|-------|----------|----------|-------------------|
|                      |      |       |      |       | Vj | Vk    | Qj       | Qk       |                   |
|                      | 0    | Add1  | No   |       |    |       |          |          | L.D F0, 0(R1)     |
|                      | 0    | Add2  | No   |       |    |       |          |          | MUL.D F4,F0,F2    |
|                      | 0    | Add3  | No   |       |    |       |          |          | S.D F4, 0(R1)     |
|                      | 0    | Mult1 | Yes  | MULTD |    | R(F2) |          | Load1    | DADDUI R1, R1, #8 |
|                      | 0    | Mult2 | No   |       |    |       |          |          | BNE R1,R2,loop    |

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 6     | 72 | Qi | Load2 |    | Mult1 |    |     |         |     |

- Second L.D. issues (will take four ex cycles) Note: F0 never sees Load1 result
- WAW between first and second L.D on F0 eliminated by register renaming

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    | I  |    |    |    |    | I  | E  | E  | E  | E  |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 7

End of

## Instruction status

| Instruction | j  | k  | iteration |
|-------------|----|----|-----------|
| L.D F0      | 0  | R1 | 1         |
| MUL.D F4    | F0 | F2 | 1         |
| S.D F4      | 0  | R1 | 1         |
| L.D F0      | 0  | R1 | 2         |
| MUL.D F4    | F0 | F2 | 2         |
| S.D F4      | 0  | R1 | 2         |

IS      EX      WB  
Execution Write

| Issue | complete | Result |
|-------|----------|--------|
| 1     |          |        |
| 2     |          |        |
| 3     |          |        |
| 6     |          |        |
| 7     |          |        |

Busy    Address

|        |   |                    |
|--------|---|--------------------|
| Load1  | 2 | Yes    80          |
| Load2  | 3 | Yes    72          |
| Load3  |   | No    Qi           |
| Store1 |   | Yes    80    Mult1 |
| Store2 |   | No                 |
| Store3 |   | No                 |

## Reservation Stations

| Time | Name  | Busy | Op    | RS for j |       | RS for k |    | Code:                     |
|------|-------|------|-------|----------|-------|----------|----|---------------------------|
|      |       |      |       | Vj       | Vk    | Qj       | Qk |                           |
| 0    | Add1  | No   |       |          |       |          |    | L.D            F0, 0(R1)  |
| 0    | Add2  | No   |       |          |       |          |    | MUL.D        F4,F0,F2     |
| 0    | Add3  | No   |       |          |       |          |    | S.D            F4, 0(R1)  |
| 0    | Mult1 | Yes  | MULTD |          | R(F2) | Load1    |    | DADDUI      R1, R1, #-8   |
| → 0  | Mult2 | Yes  | MULTD |          | R(F2) | Load2    |    | BNE            R1,R2,loop |

## Register result status

| Clock | R1 | F0 | F2    | F4    | F6 | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|-------|----|----|-----|---------|-----|
| 7     | 72 | Qi | Load2 | Mult2 |    |    |     |         |     |

→ • Second MUL.D issues (to RS Mult2) Note: F4 never sees Mult1 result ←

→ • WAW between first and second MUL.D on F4 eliminated by register renaming

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    |    | E  | E  | E  | E  |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    | I  |    |    |    |    | I  |    | E  | E  | E  | E  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 8

End of

## Instruction status

| Instruction | $j$ | $k$ | iteration |
|-------------|-----|-----|-----------|
| L.D         | F0  | 0   | R1        |
| MUL.D       | F4  | F0  | F2        |
| S.D         | F4  | 0   | R1        |
| L.D         | F0  | 0   | R1        |
| MUL.D       | F4  | F0  | F2        |
| S.D         | F4  | 0   | R1        |

IS      EX      WB  
Execution Write

| Issue | complete | Result |
|-------|----------|--------|
| 1     |          |        |
| 2     |          |        |
| 3     |          |        |
| 6     |          |        |
| 7     |          |        |
| 8     |          |        |

| Busy   | Address        |
|--------|----------------|
| Load1  | 1 Yes 80       |
| Load2  | 2 Yes 72       |
| Load3  | No Qi          |
| Store1 | 1 Yes 80 Mult1 |
| Store2 | 2 Yes 72 Mult2 |
| Store3 | No             |

Issue →

## Reservation Stations

| Time | Name  | Busy | Op    | RS for $j$ |       | RS for $k$ |       |
|------|-------|------|-------|------------|-------|------------|-------|
|      |       |      |       | V $j$      | V $k$ | Q $j$      | Q $k$ |
| 0    | Add1  | No   |       |            |       |            |       |
| 0    | Add2  | No   |       |            |       |            |       |
| 0    | Add3  | No   |       |            |       |            |       |
| 0    | Mult1 | Yes  | MULTD |            | R(F2) | Load1      |       |
| 0    | Mult2 | Yes  | MULTD |            | R(F2) | Load2      |       |

Code:

|        |             |
|--------|-------------|
| L.D    | F0, 0(R1)   |
| MUL.D  | F4,F0,F2    |
| S.D    | F4, 0(R1)   |
| DADDUI | R1, R1, #-8 |
| BNE    | R1,R2,loop  |

Issue

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 8     | 72 | Qi | Load2 |    | Mult2 |    |     |         |     |



- Second S.D issues (to RS Store2)

# Tomasulo Loop Example Timing Diagram

| Iteration | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   | E  | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   | I |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   | I | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   | I |   |   |   |    | E  | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   | I |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    | I  |    |    |    |    | I  | E  | E  | E  |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 9

End of

## Instruction status

| Instruction | j  | k     | iteration |
|-------------|----|-------|-----------|
| L.D         | F0 | 0 R1  | 1         |
| MUL.D       | F4 | F0 F2 | 1         |
| S.D         | F4 | 0 R1  | 1         |
| L.D         | F0 | 0 R1  | 2         |
| MUL.D       | F4 | F0 F2 | 2         |
| S.D         | F4 | 0 R1  | 2         |

| Issue | Execution |   | WB Result |
|-------|-----------|---|-----------|
|       | complete  |   |           |
| 1     |           | 9 |           |
| 2     |           |   |           |
| 3     |           |   |           |
| 6     |           |   |           |
| 7     |           |   |           |
| 8     |           |   |           |

| Load   | Busy | Address | First Load EX Done |  |
|--------|------|---------|--------------------|--|
|        |      |         | Qi                 |  |
| Load1  | Yes  | 80      |                    |  |
| Load2  | Yes  | 72      |                    |  |
| Load3  | No   |         |                    |  |
| Store1 | Yes  | 80      | Mult1              |  |
| Store2 | Yes  | 72      | Mult2              |  |
| Store3 | No   |         |                    |  |

## Reservation Stations

| Time | Name  | Busy | Op    |
|------|-------|------|-------|
| 0    | Add1  | No   |       |
| 0    | Add2  | No   |       |
| 0    | Add3  | No   |       |
| 0    | Mult1 | Yes  | MULTD |
| 0    | Mult2 | Yes  | MULTD |

| S1 | S2 | RS for j | RS for k |
|----|----|----------|----------|
| Vj | Vk | Qj       | Qk       |

| Code:  |            |
|--------|------------|
| L.D    | F0, 0(R1)  |
| MUL.D  | F4,F0,F2   |
| S.D    | F4, 0(R1)  |
| DADDUI | R1, R1, #1 |
| BNE    | R1,R2,loop |

Issue

## Register result status

| Clock | R1 | F0 | F2    | F4    | F6 | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|-------|----|----|-----|---------|-----|
| 9     | 64 | Qi | Load2 | Mult2 |    |    |     |         |     |



- Issue second DADDUI (not shown)
- Load1 completing; what is waiting for it?

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|--|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |  |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |  |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |  |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |  |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |  |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |  |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |  |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |    |  |
|           | MUL.D  |   |   |   |   |   |   |   |   |    | I  |    |    |    | I  |    | E  | E  | E  | E  |    |  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |  |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  | E  |    |  |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 10

## Instruction status

| Instruction | $j$ | $k$   | iteration |
|-------------|-----|-------|-----------|
| L.D         | F0  | 0 R1  | 1         |
| MUL.D       | F4  | F0 F2 | 1         |
| S.D         | F4  | 0 R1  | 1         |
| L.D         | F0  | 0 R1  | 2         |
| MUL.D       | F4  | F0 F2 | 2         |
| S.D         | F4  | 0 R1  | 2         |

## IS EX WB

| Issue    | Execution | Write |
|----------|-----------|-------|
| complete | Result    |       |
| 1        | 9         | 10    |
| 2        |           |       |
| 3        |           |       |
| 6        | 10        |       |
| 7        |           |       |
| 8        |           |       |

Second Load EX Done

Busy Address

| Load1  | No           |
|--------|--------------|
| Load2  | 0 Yes 72     |
| Load3  | No Qi        |
| Store1 | Yes 80 Mult1 |
| Store2 | Yes 72 Mult2 |
| Store3 | No           |

## Reservation Stations

Time Name Busy Op

S1 S2 RS for  $j$  RS for  $k$

V $j$

V $k$

Q $j$

Q $k$

Code:

Execution cycles  
remaining  
(execution  
actually starts  
next cycle)

|   |       |                       |
|---|-------|-----------------------|
| 0 | Add1  | No                    |
| 0 | Add2  | No                    |
| 0 | Add3  | No                    |
| 4 | Mult1 | Yes MULTD M(80) R(F2) |
| 0 | Mult2 | Yes MULTD R(F2) Load2 |

WB Over CDB

L.D F0, 0(R1)  
MUL.D F4,F0,F2  
S.D F4, 0(R1)  
DADDUI R1, R1, #-8  
BNE R1,R2,loop

Issue

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 10    | 64 | Qi | Load2 |    | Mult2 |    |     |         |     |

- Load1 result forwarded via CDB to Mult1, execution will start next cycle 11
- Issue second BNE (not shown)
- Load2 completing; what is waiting for it?

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   |    | I  | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    | I  |    |    |    | I  |    | E  | E  | E  |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 11

End of

## Instruction status

| Instruction | <i>j</i> | <i>k</i> | iteration |
|-------------|----------|----------|-----------|
| L.D F0      | 0        | R1       | 1         |
| MUL.D F4    | F0       | F2       | 1         |
| S.D F4      | 0        | R1       | 1         |
| L.D F0      | 0        | R1       | 2         |
| MUL.D F4    | F0       | F2       | 2         |
| S.D F4      | 0        | R1       | 2         |

| IS | EX    |          | WB     |
|----|-------|----------|--------|
|    | Issue | complete | Result |
|    | 1     | 9        | 10     |
|    | 2     |          |        |
|    | 3     |          |        |
|    | 6     | 10       | 11     |
|    | 7     |          |        |
|    | 8     |          |        |

| Busy | Address  |
|------|----------|
| No   |          |
| No   |          |
| Yes  | 64 Qi    |
| Yes  | 80 Mult1 |
| Yes  | 72 Mult2 |
| No   |          |

## Reservation Stations

| Time | Name  | Busy | Op    |
|------|-------|------|-------|
| 0    | Add1  | No   |       |
| 0    | Add2  | No   |       |
| 0    | Add3  | No   |       |
| 3    | Mult1 | Yes  | MULTD |
| 4    | Mult2 | Yes  | MULTD |

| S1 | S2 | RS for <i>j</i> | RS for <i>k</i> |
|----|----|-----------------|-----------------|
| Vj | Vk | Qj              | Qk              |

| Code:  | Issue       |
|--------|-------------|
| L.D    | F0, 0(R1)   |
| MUL.D  | F4,F0,F2    |
| S.D    | F4, 0(R1)   |
| DADDUI | R1, R1, #-8 |
| BNE    | R1,R2,loop  |

Execution cycles remaining  
(execution actually starts next cycle)

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 11    | 64 | Qi | Load3 |    | Mult2 |    |     |         |     |

→ Load2 result forwarded via CDB to Mult2, execution will start next cycle 12  
 → Third iteration L.D. issues (to RS Load3)

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   |    | I  | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    | I  |    |    |    | I  | E  | E  | E  | E  |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 12

End of

## Instruction status

| Instruction | <i>j</i> | <i>k</i> | iteration |
|-------------|----------|----------|-----------|
| L.D F0      | 0        | R1       | 1         |
| MUL.D F4    | F0       | F2       | 1         |
| S.D F4      | 0        | R1       | 1         |
| L.D F0      | 0        | R1       | 2         |
| MUL.D F4    | F0       | F2       | 2         |
| S.D F4      | 0        | R1       | 2         |

| IS | Execution | WB       |        | Busy   | Address      |
|----|-----------|----------|--------|--------|--------------|
|    |           | complete | Result |        |              |
|    | Issue 1   | 9        | 10     | Load1  | No           |
|    | Issue 2   |          |        | Load2  | No           |
|    | Issue 3   |          |        | Load3  | Yes 64 Qi    |
|    | Issue 6   | 10       | 11     | Store1 | Yes 80 Mult1 |
|    | Issue 7   |          |        | Store2 | Yes 72 Mult2 |
|    | Issue 8   |          |        | Store3 | No           |

## Reservation Stations

| Time | Name  | Busy | Op    | RS for <i>j</i> |                | RS for <i>k</i> | Code:                   |
|------|-------|------|-------|-----------------|----------------|-----------------|-------------------------|
|      |       |      |       | V <sub>j</sub>  | V <sub>k</sub> |                 |                         |
| 0    | Add1  | No   |       |                 |                |                 | L.D F0, 0(R1)           |
| 0    | Add2  | No   |       |                 |                |                 | MUL.D F4,F0,F2 – Issue? |
| 0    | Add3  | No   |       |                 |                |                 | S.D F4, 0(R1)           |
| 2    | Mult1 | Yes  | MULTD | M(80)           | R(F2)          |                 | DADDUI R1, R1, #-8      |
| 3    | Mult2 | Yes  | MULTD | M(72)           | R(F2)          |                 | BNE R1,R2,loop          |

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 12    | 64 | Qi | Load3 |    | Mult2 |    |     |         |     |

➔ Issue third iteration MUL.D ?

No both Mult reservation stations busy

CMPE550 - Shaaban

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|--|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |  |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |  |
|           | MUL.D  |   | I |   |   |   |   |   |   |    |    |    | E  | E  | E  | E  | W  |    |    |    |    |  |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |  |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |  |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    |    |    |    |    |    | E  | E  | W  |    |    |  |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    |    |    | E  | E  | E  |  |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |  |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |    |  |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    | I  |    | E  | E  | E  | E  |    |  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |  |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  | E  |    |  |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 13

End of

## Instruction status

| Instruction | <i>j</i> | <i>k</i> | iteration |
|-------------|----------|----------|-----------|
| L.D F0      | 0        | R1       | 1         |
| MUL.D F4    | F0       | F2       | 1         |
| S.D F4      | 0        | R1       | 1         |
| L.D F0      | 0        | R1       | 2         |
| MUL.D F4    | F0       | F2       | 2         |
| S.D F4      | 0        | R1       | 2         |

| IS    | EX       | WB        |        | Busy | Address  |
|-------|----------|-----------|--------|------|----------|
|       |          | Execution | Write  |      |          |
| Issue | complete | Result    |        |      |          |
| 1     | 9        | 10        | Load1  | No   |          |
| 2     |          |           | Load2  | No   |          |
| 3     |          |           | Load3  | Yes  | 64 Qi    |
| 6     | 10       | 11        | Store1 | Yes  | 80 Mult1 |
| 7     |          |           | Store2 | Yes  | 72 Mult2 |
| 8     |          |           | Store3 | No   |          |

## Reservation Stations

| Time | Name  | Busy | Op    | S1             | S2             | RS for <i>j</i> | RS for <i>k</i> | Code:  |
|------|-------|------|-------|----------------|----------------|-----------------|-----------------|--------|
|      |       |      |       | V <sub>j</sub> | V <sub>k</sub> | Q <sub>j</sub>  | Q <sub>k</sub>  |        |
| 0    | Add1  | No   |       |                |                |                 |                 | L.D    |
| 0    | Add2  | No   |       |                |                |                 |                 | MUL.D  |
| 0    | Add3  | No   |       |                |                |                 |                 | S.D    |
| 1    | Mult1 | Yes  | MULTD |                | M(80)          | R(F2)           |                 | DADDUI |
| 2    | Mult2 | Yes  | MULTD |                | M(72)          | R(F2)           |                 | BNE    |

?

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 13    | 64 | Qi | Load3 |    | Mult2 |    |     |         |     |

→ Issue third iteration MUL.D, S.D ?

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|--|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |  |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |  |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |  |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |  |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |  |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |  |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |  |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |  |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | E  | W  |    |    |    |    |    |  |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    | I  |    | E  | E  | E  | E  |    |  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |  |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  | E  |    |  |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |  |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 14

End of

## Instruction status

| Instruction | j  | k     | iteration |
|-------------|----|-------|-----------|
| L.D         | F0 | 0 R1  | 1         |
| MUL.D       | F4 | F0 F2 | 1         |
| S.D         | F4 | 0 R1  | 1         |
| L.D         | F0 | 0 R1  | 2         |
| MUL.D       | F4 | F0 F2 | 2         |
| S.D         | F4 | 0 R1  | 2         |

| Issue | EX Execution |        | WB Result |
|-------|--------------|--------|-----------|
|       | complete     | Result |           |
| 1     | 9            | 10     | Load1     |
| 2     | 14           |        | Load2     |
| 3     |              |        | Load3     |
| 6     | 10           | 11     | Store1    |
| 7     |              |        | Store2    |
| 8     |              |        | Store3    |

## Busy Address

|     |    |       |
|-----|----|-------|
| No  |    |       |
| No  |    |       |
| Yes | 64 | Qi    |
| Yes | 80 | Mult1 |
| Yes | 72 | Mult2 |
| No  |    |       |

## Reservation Stations

| Time | Name  | Busy | Op    | S1 |       | RS for j |    | RS for k |  | Code:  |
|------|-------|------|-------|----|-------|----------|----|----------|--|--------|
|      |       |      |       | Vj | Vk    | Qj       | Qk |          |  |        |
| 0    | Add1  | No   |       |    |       |          |    |          |  | L.D    |
| 0    | Add2  | No   |       |    |       |          |    |          |  | MUL.D  |
| 0    | Add3  | No   |       |    |       |          |    |          |  | S.D    |
| 0    | Mult1 | Yes  | MULTD |    | M(80) | R(F2)    |    |          |  | DADDUI |
| 1    | Mult2 | Yes  | MULTD |    | M(72) | R(F2)    |    |          |  | BNE    |

Issue?

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 14    | 64 | Qi | Load3 |    | Mult2 |    |     |         |     |

- • Mult1 completing; what is waiting for it?

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   |    | I  | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    | I  |    |    |    | I  | E  | E  | E  | E  |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 15

| Instruction status |    |    |           | IS    | EX                 | WB           |                              |
|--------------------|----|----|-----------|-------|--------------------|--------------|------------------------------|
| Instruction        | j  | k  | iteration | Issue | Execution complete | Write Result |                              |
| L.D F0             | 0  | R1 | 1         | 1     | 9                  | 10           | Load1<br>No                  |
| MUL.D F4           | F0 | F2 | 1         | 2     | 14                 | 15           | Load2<br>No                  |
| S.D F4             | 0  | R1 | 1         | 3     |                    |              | Load3<br>Yes 64 Qi           |
| L.D F0             | 0  | R1 | 2         | 6     | 10                 | 11           | Store1<br>Yes 80 M(80)*R(F2) |
| MUL.D F4           | F0 | F2 | 2         | 7     | 15                 |              | Store2<br>Yes 72 Mult2       |
| S.D F4             | 0  | R1 | 2         | 8     |                    |              | Store3<br>No                 |

  

| Reservation Stations |         |      |       | S1    | S2    | RS for j | RS for k |
|----------------------|---------|------|-------|-------|-------|----------|----------|
| Time                 | Name    | Busy | Op    | Vj    | Vk    | Qj       | Qk       |
| 0                    | Add1    | No   |       |       |       |          |          |
| 0                    | Add2    | No   |       |       |       |          |          |
| 0                    | Add3    | No   |       |       |       |          |          |
| 0                    | Mult1   | No   |       |       |       |          |          |
| EX Done              | 0 Mult2 | Yes  | MULTD | M(72) | R(F2) |          |          |

  

| Register result status |    |    |       |
|------------------------|----|----|-------|
| Clock                  | R1 | F0 | F2    |
| 15                     | 64 | Qi | Load3 |

- Mult2 completing; what is waiting for it? Issue third multiply?
- Third iteration L.D done execution

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    | E  | E  | E  | E  |    |    |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   |    | I  | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    | I  |    |    |    | I  | E  | E  | E  | E  |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 16

End of

## Instruction status

| Instruction | <i>j</i> | <i>k</i> | iteration |
|-------------|----------|----------|-----------|
| L.D F0      | 0        | R1       | 1         |
| MUL.D F4    | F0       | F2       | 1         |
| S.D F4      | 0        | R1       | 1         |
| L.D F0      | 0        | R1       | 2         |
| MUL.D F4    | F0       | F2       | 2         |
| S.D F4      | 0        | R1       | 2         |

| IS<br>Issue | EX<br>Execution |       | WB<br>Result |
|-------------|-----------------|-------|--------------|
|             | complete        | Write |              |
| 1           | 9               | 10    | Load1        |
| 2           | 14              | 15    | Load2        |
| 3           |                 |       | Load3        |
| 6           | 10              | 11    | Store1       |
| 7           | 15              | 16    | Store2       |
| 8           |                 |       | Store3       |

| Busy | Address | WB Over CDB |
|------|---------|-------------|
| No   |         |             |
| No   |         |             |
| Yes  | Qi      |             |
| Yes  | 80      | M(80)*R(F2) |
| Yes  | 72      | M(72)*R(72) |
| No   |         |             |

## Reservation Stations

| Time | Name  | Busy | Op    | S1                   | S2                   | RS for <i>j</i>      | RS for <i>k</i>      |
|------|-------|------|-------|----------------------|----------------------|----------------------|----------------------|
|      |       |      |       | <i>V<sub>j</sub></i> | <i>V<sub>k</sub></i> | <i>Q<sub>j</sub></i> | <i>Q<sub>k</sub></i> |
| 0    | Add1  | No   |       |                      |                      |                      |                      |
| 0    | Add2  | No   |       |                      |                      |                      |                      |
| 0    | Add3  | No   |       |                      |                      |                      |                      |
| 0    | Mult1 | Yes  | MULTD |                      | R(F2)                | Load3                |                      |
| 0    | Mult2 | No   |       |                      |                      |                      |                      |

Code:

|        |            |
|--------|------------|
| L.D    | F0, 0(R1)  |
| MUL.D  | F4,F0,F2   |
| S.D    | F4, 0(R1)  |
| DADDUI | R1, R1, #8 |
| BNE    | R1,R2,loop |

Issue

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 16    | 64 | Qi | Load3 |    | Mult1 |    |     |         |     |

→ Issue third iteration MUL.D (to RS Mult1)

CMPE550 - Shaaban

# Tomasulo Loop Example Timing Diagram

| Iteration | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| 1         | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    | I  |    |    |    | I  |    | E  | E  | E  | E  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 17

## Instruction status

| Instruction | $j$ | $k$ | iteration |
|-------------|-----|-----|-----------|
| L.D F0      | 0   | R1  | 1         |
| MUL.D F4    | F0  | F2  | 1         |
| S.D F4      | 0   | R1  | 1         |
| L.D F0      | 0   | R1  | 2         |
| MUL.D F4    | F0  | F2  | 2         |
| S.D F4      | 0   | R1  | 2         |

## IS EX WB

### Execution Write

| Issue | complete | Result |
|-------|----------|--------|
| 1     | 9        | 10     |
| 2     | 14       | 15     |
| 3     |          |        |
| 6     | 10       | 11     |
| 7     | 15       | 16     |
| 8     |          |        |

End of

## Busy Address

| Load1  | No  | Qi             |
|--------|-----|----------------|
| Load2  | No  |                |
| Load3  | No  |                |
| Store1 | Yes | 80 M(80)*R(F2) |
| Store2 | Yes | 72 M(72)*R(72) |
| Store3 | Yes | Mult1          |

## Reservation Stations

| Time | Name  | Busy | Op    |
|------|-------|------|-------|
| 0    | Add1  | No   |       |
| 0    | Add2  | No   |       |
| 0    | Add3  | No   |       |
| 4    | Mult1 | Yes  | MULTD |
| 0    | Mult2 | No   |       |

S1 S2 RS for  $j$  RS for  $k$

V $j$  V $k$  Q $j$  Q $k$

Code:

|        |            |
|--------|------------|
| L.D    | F0, 0(R1)  |
| MUL.D  | F4,F0,F2   |
| S.D    | F4, 0(R1)  |
| DADDUI | R1, R1, #8 |
| BNE    | R1,R2,loop |

Issue

## Register result status

| Clock | R1 | Q $i$ | F0 | F2 | F4 | F6 | F8 | F10 | F12 ... | F30 |
|-------|----|-------|----|----|----|----|----|-----|---------|-----|
| 17    | 64 | Qi    | F0 |    |    |    |    |     |         |     |

WB Over CDB

M(64) R(F2)

→ Third iteration L.D writes on CDB (delayed one cycle due to CDB conflict) ←

→ Issue third iteration S.D (to RS Store3)

# Tomasulo Loop Example Timing Diagram

Cycle

|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
| Iteration | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
| 1         | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   | I |   |   |   |   |   |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   | I |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   | I |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   | I | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   | I |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    | E  | E  | E  | E  |    |
|           | DADDUI |   |   |   |   |   | I |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   | I | E  | E  | E  | E  | W  |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   | I  |    |    |    |    | I  |    | E  | E  | E  | E  |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 18

End of

## Instruction status

| Instruction | <i>j</i> | <i>k</i> | iteration |
|-------------|----------|----------|-----------|
| L.D         | F0       | 0 R1     | 1         |
| MUL.D       | F4       | F0 F2    | 1         |
| S.D         | F4       | 0 R1     | 1         |
| L.D         | F0       | 0 R1     | 2         |
| MUL.D       | F4       | F0 F2    | 2         |
| S.D         | F4       | 0 R1     | 2         |

## IS EX WB

*Execution Write*

| Issue | complete | Result |
|-------|----------|--------|
| 1     | 9        | 10     |
| 2     | 14       | 15     |
| 3     |          |        |
| 6     | 10       | 11     |
| 7     | 15       | 16     |
| 8     |          |        |

| Busy | Address     |
|------|-------------|
| No   |             |
| No   | Qi          |
| Yes  | M(80)*R(F2) |
| Yes  | M(72)*R(72) |
| Yes  | Mult1       |

## Reservation Stations

| Time | Name  | Busy | Op    | S1                   | S2                   | RS for <i>j</i>      | RS for <i>k</i>      |
|------|-------|------|-------|----------------------|----------------------|----------------------|----------------------|
|      |       |      |       | <i>V<sub>j</sub></i> | <i>V<sub>k</sub></i> | <i>Q<sub>j</sub></i> | <i>Q<sub>k</sub></i> |
| 0    | Add1  | No   |       |                      |                      |                      |                      |
| 0    | Add2  | No   |       |                      |                      |                      |                      |
| 0    | Add3  | No   |       |                      |                      |                      |                      |
| 3    | Mult1 | Yes  | MULTD |                      | M(64)                | R(F2)                |                      |
| 0    | Mult2 | No   |       |                      |                      |                      |                      |

Code:

|        |            |
|--------|------------|
| L.D    | F0, 0(R1)  |
| MUL.D  | F4,F0,F2   |
| S.D    | F4, 0(R1)  |
| DADDUI | R1, R1, #8 |
| BNE    | R1,R2,loop |

Issue

## Register result status

| Clock | R1 | F0 | F2 | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|----|----|-------|----|-----|---------|-----|
| 18    | 56 | Qi |    |    | Mult1 |    |     |         |     |

➔ Issue third iteration DADDUI

CMPE550 - Shaaban

# Tomasulo Loop Example Timing Diagram

Cycle

|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
| Iteration | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
| 1         | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    | E  | E  | E  | E  |    |    |
|           | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
|           | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   |    | I  | E  | E  | E  | E  | W  |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    | I  |    |    |    | E  | E  | E  | E  |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    | I  |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    | I  |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    | I  |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

First Loop Iteration Done

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

→ (First Loop Iteration Done)

# Loop Example Cycle 19

| Instruction status     | Instruction | j    | k     | iteration | IS    |                    |              | EX       |     | WB |             | Busy    | Address |
|------------------------|-------------|------|-------|-----------|-------|--------------------|--------------|----------|-----|----|-------------|---------|---------|
|                        |             |      |       |           | Issue | Execution complete | Write Result |          |     |    |             |         |         |
| L.D                    | F0          | 0    | R1    | 1         | 1     | 9                  | 10           | Load1    | No  |    |             |         |         |
| MUL.D                  | F4          | F0   | F2    | 1         | 2     | 14                 | 15           | Load2    | No  |    |             |         |         |
| S.D                    | F4          | 0    | R1    | 1         | 3     | 19                 |              | Load3    | No  |    | Qi          |         |         |
| L.D                    | F0          | 0    | R1    | 2         | 6     | 10                 | 11           | Store1   | No  |    |             |         |         |
| MUL.D                  | F4          | F0   | F2    | 2         | 7     | 15                 | 16           | Store2   | Yes | 72 | M(72)*R(72) |         |         |
| S.D                    | F4          | 0    | R1    | 2         | 8     |                    |              | Store3   | Yes | 64 | Mult1       |         |         |
| Reservation Stations   |             |      |       |           | S1    | S2                 | RS for j     | RS for k |     |    |             |         |         |
| Time                   | Name        | Busy | Op    |           | Vj    | Vk                 | Qj           | Qk       |     |    |             |         |         |
| 0                      | Add1        | No   |       |           |       |                    |              |          |     |    |             |         |         |
| 0                      | Add2        | No   |       |           |       |                    |              |          |     |    |             |         |         |
| 0                      | Add3        | No   |       |           |       |                    |              |          |     |    |             |         |         |
| 2                      | Mult1       | Yes  | MULTD |           |       | M(64)              | R(F2)        |          |     |    |             |         |         |
| 0                      | Mult2       | No   |       |           |       |                    |              |          |     |    |             |         |         |
| Register result status |             |      |       |           |       |                    |              |          |     |    |             |         |         |
| Clock                  | R1          |      |       |           | F0    |                    | F2           | F4       | F6  | F8 | F10         | F12 ... | F30     |
| 19                     | 56          | Qi   |       |           |       |                    |              |          |     |    |             |         |         |

→ First S.D done (No write on CDB for stores) First loop iteration done  
 → Issue third iteration BNE

CMPE550 - Shaaban

# Tomasulo Loop Example Timing Diagram

Cycle

|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
| Iteration | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
| 1         | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   | I |   |   |   |   |   |   |    |    |    |    |    |    |    |    | E  | E  | E  | E  |
|           | DADDUI |   |   | I |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   | I |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   | I | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   | I |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
|           | S.D.   |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    | E  | E  | E  | E  |
|           | DADDUI |   |   |   |   |   | I |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   | I |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   | I | E  | E  | E  | E  | W  |    |    |    |    |    |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   | I  |    |    |    |    | I  |    |    | E  | E  | E  | E  |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  | E  |    |    |
|           | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
|           | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|           | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

Second Loop Iteration Done

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# → (First Two Loop Iterations Done) Loop Example Cycle 20



## Instruction status

| Instruction | <i>j</i> | <i>k</i> | iteration | IS EX Execution Write |          |        | Busy   | Address      |
|-------------|----------|----------|-----------|-----------------------|----------|--------|--------|--------------|
|             |          |          |           | Issue                 | complete | Result |        |              |
| L.D         | F0       | 0        | R1        | 1                     | 9        | 10     | Load1  | 4<br>Yes 54  |
| MUL.D       | F4       | F0       | F2        | 1                     | 14       | 15     | Load2  | No           |
| S.D         | F4       | 0        | R1        | 1                     | 19       |        | Load3  | No Qi        |
| L.D         | F0       | 0        | R1        | 2                     | 10       | 11     | Store1 | No           |
| MUL.D       | F4       | F0       | F2        | 2                     | 15       | 16     | Store2 | 0<br>No      |
| S.D         | F4       | 0        | R1        | 2                     | 20       |        | Store3 | Yes 64 Mult1 |

## Reservation Stations

| Time | Name  | Busy | Op    | RS for <i>j</i> |                | RS for <i>k</i> |                | Code:              |
|------|-------|------|-------|-----------------|----------------|-----------------|----------------|--------------------|
|      |       |      |       | V <sub>j</sub>  | V <sub>k</sub> | Q <sub>j</sub>  | Q <sub>k</sub> |                    |
| 0    | Add1  | No   |       |                 |                |                 |                | L.D F0, 0(R1)      |
| 0    | Add2  | No   |       |                 |                |                 |                | MUL.D F4,F0,F2     |
| 0    | Add3  | No   |       |                 |                |                 |                | S.D F4, 0(R1)      |
| 1    | Mult1 | Yes  | MULTD | M(64)           | R(F2)          |                 |                | DADDUI R1, R1, #-8 |
| 0    | Mult2 | No   |       |                 |                |                 |                | BNE R1,R2,loop     |

## Register result status

| Clock | R1 | F0 | F2    | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|----|-------|----|-------|----|-----|---------|-----|
| 20    | 56 | Qi | Load1 |    | Mult1 |    |     |         |     |

→ Second S.D done (No write on CDB for stores) Second loop iteration done

→ Issue fourth iteration L.D (to RS Load1)

# Tomasulo Loop Example Timing Diagram

|           | Cycle  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
|-----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
|           | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| Iteration | L.D.   | I | E | E | E | E | E | E | E | W  |    |    |    |    |    |    |    |    |    |    |    |
| 1         | MUL.D  |   | I |   |   |   |   |   |   |    | E  | E  | E  | E  | W  |    |    |    |    |    |    |
| 1         | S.D.   |   |   | I |   |   |   |   |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
| 1         | DADDUI |   |   |   | I |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 1         | BNE    |   |   |   |   | I |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | L.D.   |   |   |   |   | I | E | E | E | E  | W  |    |    |    |    |    |    |    |    |    |    |
| 2         | MUL.D  |   |   |   |   |   | I |   |   |    |    | E  | E  | E  | E  | W  |    |    |    |    |    |
| 2         | S.D.   |   |   |   |   |   |   | I |   |    |    |    |    |    |    | E  | E  | E  | E  |    |    |
| 2         | DADDUI |   |   |   |   |   |   |   | I |    |    |    |    |    |    |    |    |    |    |    |    |
| 2         | BNE    |   |   |   |   |   |   |   |   | I  |    |    |    |    |    |    |    |    |    |    |    |
| 3         | L.D.   |   |   |   |   |   |   |   |   | I  | E  | E  | E  | E  | W  |    |    |    |    |    |    |
| 3         | MUL.D  |   |   |   |   |   |   |   |   |    | I  |    |    |    |    | E  | E  | E  | E  |    |    |
| 3         | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    | I  |    |    |    |    |    |    |
| 3         | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  |    |    |    |    |
| 3         | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | L.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    | I  | E  |    |    |    |
| 4         | MUL.D  |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    | I  |    |    |    |
| 4         | S.D.   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 4         | DADDUI |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |
| 4         | BNE    |   |   |   |   |   |   |   |   |    |    |    |    |    |    |    |    |    |    |    |    |

I = Issue    E = Execute    W = Write Result on CDB

CMPE550 - Shaaban

# Loop Example Cycle 21

End of

## Instruction status

| Instruction | j  | k     | iteration |
|-------------|----|-------|-----------|
| L.D         | F0 | 0 R1  | 1         |
| MUL.D       | F4 | F0 F2 | 1         |
| S.D         | F4 | 0 R1  | 1         |
| L.D         | F0 | 0 R1  | 2         |
| MUL.D       | F4 | F0 F2 | 2         |
| S.D         | F4 | 0 R1  | 2         |

## IS EX WB

### Execution Write

| Issue | complete | Result |
|-------|----------|--------|
| 1     | 9        | 10     |
| 2     | 14       | 15     |
| 3     | 19       |        |
| 6     | 10       | 11     |
| 7     | 15       | 16     |
| 8     | 20       |        |

## Busy Address

|        |   |     |          |
|--------|---|-----|----------|
| Load1  | 3 | Yes | 54       |
| Load2  |   | No  |          |
| Load3  |   | No  | Qi       |
| Store1 |   | No  |          |
| Store2 |   | No  |          |
| Store3 |   | Yes | 64 Mult1 |

## Reservation Stations

| Time | Name  | Busy | Op    | Vj    | Vk    | Qj    | Qk | Code:              |
|------|-------|------|-------|-------|-------|-------|----|--------------------|
| 0    | Add1  | No   |       |       |       |       |    | L.D F0, 0(R1)      |
| 0    | Add2  | No   |       |       |       |       |    | MUL.D F4,F0,F2     |
| 0    | Add3  | No   |       |       |       |       |    | S.D F4, 0(R1)      |
| 0    | Mult1 | Yes  | MULTD | M(64) | R(F2) |       |    | DADDUI R1, R1, #-8 |
| 0    | Mult2 | Yes  | MULTD |       | R(F2) | Load1 |    | BNE R1,R2,loop     |

## Register result status

| Clock | R1 | F0    | F2 | F4 | F6    | F8 | F10 | F12 ... | F30 |
|-------|----|-------|----|----|-------|----|-----|---------|-----|
| 21    | 56 | Load3 |    |    | Mult2 |    |     |         |     |

- Mult1 (third iteration MUL.D) completing; what is waiting for it?
- Issue fourth iteration MUL.D (to RS Mult2)

# Quiz # 4

On Lecture Notes Set # 4

→ Wednesday, March 1

→ Quiz On Tomasulo Only – Not on Scoreboard (FYI)