

# Exe 1 Scoreboard: the Code

I1: LD F6 32+ R2  
I2: ADDD F2 F6 F4  
I3: MULTD F0 F4 F2  
I4: SUBD F12 F2 F6  
I5: ADDD F0 F12 F2

## CONFLICTS

I1: LD F6 32+ R2

I2: ADDD F2 F6 F4

I3: MULTD F0 F4 F2

I4: SUBD F12 F2 F6

I5: ADDD F0 F12 F2

RAW F6 I1→I2

RAW F6 I1→I4

RAW F2 I2→I4

RAW F2 I2→I3

RAW F2 I2→I5

RAW F12 I4→I5

WAW F0 I3→I5
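The conflict list above can be reproduced mechanically. The following is a minimal sketch, not part of the exercise: the `(destination, sources)` encoding and the name `find_hazards` are illustrative assumptions, and every ordered pair of instructions is compared without checking for intervening redefinitions, which is how the list above is written.

```python
# Each instruction maps to (destination register, source registers).
# This encoding is an illustrative assumption, not the course's notation.
PROGRAM = {
    "I1": ("F6", ("R2",)),         # LD    F6 32+ R2
    "I2": ("F2", ("F6", "F4")),    # ADDD  F2 F6 F4
    "I3": ("F0", ("F4", "F2")),    # MULTD F0 F4 F2
    "I4": ("F12", ("F2", "F6")),   # SUBD  F12 F2 F6
    "I5": ("F0", ("F12", "F2")),   # ADDD  F0 F12 F2
}

def find_hazards(program):
    """Return a set of (kind, register, earlier, later) over all ordered pairs."""
    names = list(program)
    hazards = set()
    for pos, early in enumerate(names):
        dest_e, srcs_e = program[early]
        for late in names[pos + 1:]:
            dest_l, srcs_l = program[late]
            if dest_e is not None and dest_e in srcs_l:
                hazards.add(("RAW", dest_e, early, late))   # later reads what earlier writes
            if dest_e is not None and dest_e == dest_l:
                hazards.add(("WAW", dest_e, early, late))   # both write the same register
            if dest_l is not None and dest_l in srcs_e:
                hazards.add(("WAR", dest_l, early, late))   # later overwrites what earlier reads
    return hazards

hazards = find_hazards(PROGRAM)
for kind, reg, early, late in sorted(hazards):
    print(f"{kind} {reg} {early}->{late}")
```

For this code the sketch reports six RAW hazards, one WAW hazard and no WAR hazard.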


# Exe 1.2 Scoreboard: $\exists$ a configuration?

|                    | Issue | Read Op | Exec Co. | Write R. |
|--------------------|-------|---------|----------|----------|
| I1: LD F6 32+ R2   | 1     | 2       | 7        | 8        |
| I2: ADDD F2 F6 F4  | 2     | 9       | 11       | 12       |
| I3: MULTD F0 F4 F2 | 4     | 13      | 43       | 44       |
| I4: SUBD F12 F2 F6 | 3     | 9       | 11       | 12       |
| I5: ADDD F0 F12 F2 | 13    | 17      | 19       | 20       |

- Is there a “configuration” that can respect the shown execution?
- How many units? Which kind? What latency?


INVALID FOR SEVERAL PROBLEMS:

- ORDER NOT RESPECTED: I4 ISSUES AT CC 3, BEFORE I3 AT CC 4
- ATTEMPT TO ACQUIRE THE SAME SOURCE FROM DIFFERENT INSTRUCTIONS
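A proposed table can be screened with a few mechanical checks. The sketch below is only an illustration under stated assumptions: the names `TABLE`, `RAW_PAIRS` and `check_table` are hypothetical, a consumer is assumed to read its operands strictly after the producer's write, and two write-backs in the same cycle are flagged although they are a real conflict only if the write port is shared.

```python
# (issue, read_op, exec_complete, write_result) copied from the Exe 1.2 table.
TABLE = {
    "I1": (1, 2, 7, 8),
    "I2": (2, 9, 11, 12),
    "I3": (4, 13, 43, 44),
    "I4": (3, 9, 11, 12),
    "I5": (13, 17, 19, 20),
}
# (producer, consumer) pairs taken from the RAW entries of the conflict list.
RAW_PAIRS = [("I1", "I2"), ("I1", "I4"), ("I2", "I3"),
             ("I2", "I4"), ("I2", "I5"), ("I4", "I5")]

def check_table(table, raw_pairs):
    """Return a list of (problem, first, second) tuples found in the table."""
    problems = []
    names = list(table)
    # In-order issue: each instruction must issue strictly after the previous one.
    for prev, cur in zip(names, names[1:]):
        if table[cur][0] <= table[prev][0]:
            problems.append(("ORDER", prev, cur))
    # RAW: the consumer may read only after the producer has written its result.
    for producer, consumer in raw_pairs:
        if table[consumer][1] <= table[producer][3]:
            problems.append(("RAW", producer, consumer))
    # Same write cycle: a conflict only under the shared-write-port assumption.
    for pos, first in enumerate(names):
        for second in names[pos + 1:]:
            if table[first][3] == table[second][3]:
                problems.append(("SAME_WB_CYCLE", first, second))
    return problems

problems = check_table(TABLE, RAW_PAIRS)
print(problems)
```

On the table above this flags the I3/I4 issue order, the RAW on F2 between I2 and I4, and the shared write cycle 12 of I2 and I4.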

# Exe 1.3 Scoreboard: if not correct, write the right one

|    | Instruction    | ISSUE | READ OPERAND | EXE COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|--------------|--------------|----|---------|------|
| I1 | LD F6 32+ R2   |       |              |              |    |         |      |
| I2 | ADDD F2 F6 F4  |       |              |              |    |         |      |
| I3 | MULTD F0 F4 F2 |       |              |              |    |         |      |
| I4 | SUBD F12 F2 F6 |       |              |              |    |         |      |
| I5 | ADDD F0 F12 F2 |       |              |              |    |         |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FP ALU 3 CC LATENCY, SINGLE WRITE PORT FOR THE POOL  
1 MEM 2 CC LATENCY

# Exe 1.3 Scoreboard: if not correct, write the right one

|    | Instruction    | ISSUE | READ OPERAND | EXE COMPLETE | WB  | Hazards              | Unit     |
|----|----------------|-------|--------------|--------------|-----|----------------------|----------|
| I1 | LD F6 32+ R2   | 1     | 2            | $2+2=4$      | 5 * |                      | M.U.     |
| I2 | ADDD F2 F6 F4  | 2     | 5+1=6        | 6+3=9        | 10  | RAW F6               | FP. U. 1 |
| I3 | MULTD F0 F4 F2 | 3     | 11           | 14           | 15  | RAW F2               | FP. U. 2 |
| I4 | SUBD F12 F2 F6 | 4     | 11           | 14           | 16  | RAW F2<br>+STRUCT RF | FP. U. 3 |
| I5 | ADDD F0 F12 F2 | 16    | 17           | 20           | 21  | WAW F0<br>RAW F12    | FP. U. 4 |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FP ALU 3cc LATENCY, SINGLE WRITE PORT FOR THE POOL  
1 MEM 2cc LATENCY
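The filled table can be cross-checked with a small timing model. This is a sketch under explicit assumptions, not the reference solution: one instruction issues per cycle in program order, a unit becomes free the cycle after its instruction writes back, WAW blocks issue until the earlier write has completed, WAR delays the write until the cycle after the earlier read, and the older instruction wins the single write port. The function name `scoreboard` and the `(kind, dest, sources)` encoding are hypothetical.

```python
def scoreboard(program, units, latency):
    """program: list of (kind, dest, sources); returns (issue, read, exec, wb) rows."""
    free_at = {kind: [1] * count for kind, count in units.items()}
    taken_wb = set()      # cycles in which the single write port is already used
    rows = []
    for kind, dest, srcs in program:
        pool = free_at[kind]
        unit = pool.index(min(pool))          # earliest-available unit of this kind
        issue = max(pool[unit], rows[-1][0] + 1 if rows else 1)
        read = 0
        wb_floor = 0
        # zip stops at len(rows), so only earlier instructions are visited.
        for (_, p_dest, p_srcs), (_, p_read, _, p_wb) in zip(program, rows):
            if dest is not None and p_dest == dest:
                issue = max(issue, p_wb + 1)          # WAW: wait for the earlier write
            if p_dest is not None and p_dest in srcs:
                read = max(read, p_wb + 1)            # RAW: wait for the producer
            if dest is not None and dest in p_srcs:
                wb_floor = max(wb_floor, p_read + 1)  # WAR: wait for the earlier read
        read = max(read, issue + 1)
        done = read + latency[kind]
        wb = max(done + 1, wb_floor)
        while wb in taken_wb:                 # structural hazard on the write port
            wb += 1
        taken_wb.add(wb)
        pool[unit] = wb + 1
        rows.append((issue, read, done, wb))
    return rows

PROGRAM = [
    ("MEM", "F6", ("R2",)),        # I1: LD    F6 32+ R2
    ("FP", "F2", ("F6", "F4")),    # I2: ADDD  F2 F6 F4
    ("FP", "F0", ("F4", "F2")),    # I3: MULTD F0 F4 F2
    ("FP", "F12", ("F2", "F6")),   # I4: SUBD  F12 F2 F6
    ("FP", "F0", ("F12", "F2")),   # I5: ADDD  F0 F12 F2
]
rows = scoreboard(PROGRAM, units={"MEM": 1, "FP": 4}, latency={"MEM": 2, "FP": 3})
for name, row in zip(("I1", "I2", "I3", "I4", "I5"), rows):
    print(name, row)
```

With 4 FP units of 3 cc and 1 memory unit of 2 cc, the model yields the same Issue, Read Operand, Exe Complete and WB cycles as the table above.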

# Scoreboard Vs Tomasulo - Clk=1

S1: ADDD F0, F2, F4

S2: MULTD F2, F6, F8

S3: MULTD F10, F0, F2

S4: ADDD F0, F12, F14

$$\max(5+1, 8+1) = 9$$

Scoreboard, functional unit status:

| Name  | Busy | Op  | Fi | Fj | Fk | Qj | Qk | Rj  | Rk  | Et | Ist |
|-------|------|-----|----|----|----|----|----|-----|-----|----|-----|
| Mult1 |      |     |    |    |    |    |    |     |     |    |     |
| Mult2 |      |     |    |    |    |    |    |     |     |    |     |
| Add1  | YES  | ADD | F0 | F2 | F4 |    |    | YES | YES |    | S1  |
| Add2  |      |     |    |    |    |    |    |     |     |    |     |

Scoreboard, register result status:

| F0   | F2 | F4 | F6 | F8 | F10 | F12 | F14 |
|------|----|----|----|----|-----|-----|-----|
| Add1 |    |    |    |    |     |     |     |

Scoreboard, timing:

|                        | Issue          | Read Op | Exec Co. | Write R. |
|------------------------|----------------|---------|----------|----------|
| S1: ADDD F0, F2, F4    | 1              | 2       | 4        | 5        |
| S2: MULTD F2, F6, F8   | 2              | 3       | 7        | 8        |
| S3: MULTD F10, F0, F2  | 3              | *       | $9+4=13$ | 14       |
| S4: ADDD F0, F12, F14  | <del>5+6</del> | 7       | 9        | 10       |
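The `*` entry in the second multd row (S3) can be worked out by hand, assuming the scoreboard rule that operands are read in the cycle after the last producer writes back. S3 reads F0 from S1 (written at cycle 5) and F2 from S2 (written at cycle 8):

$$\text{ReadOp}(S3) = \max(\text{WB}(S1) + 1,\ \text{WB}(S2) + 1) = \max(5+1,\ 8+1) = 9$$

With a MUL latency of 4 cc, execution then completes at $9+4=13$ and the result is written at 14, as recorded in the table.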

Tomasulo, reservation stations:

| Name  | Op  | Vj    | Vk    | Qj | Qk | Etime |
|-------|-----|-------|-------|----|----|-------|
| Mult1 |     |       |       |    |    |       |
| Mult2 |     |       |       |    |    |       |
| Add1  | ADD | R(F2) | R(F4) |    |    |       |
| Add2  |     |       |       |    |    |       |

Tomasulo, register result status:

| F0   | F2 | F4 | F6 | F8 | F10 | F12 | F14 |
|------|----|----|----|----|-----|-----|-----|
| Add1 |    |    |    |    |     |     |     |

Tomasulo, timing:

|                        | Issue | Exec Co. | Write R. |
|------------------------|-------|----------|----------|
| S1: ADDD F0, F2, F4    | 1     | $1+2=3$  | 4        |
| S2: MULTD F2, F6, F8   | 2     | $2+4=6$  | 7        |
| S3: MULTD F10, F0, F2  | 3     | *        | 12       |
| S4: ADDD F0, F12, F14  | 4     | $4+2=6$  | 8        |

$$\max(7+4, 8+2) = 11$$

ADD 2cc, MUL 4cc

# Exe.2 The code

I1: LD \$F1, 0(\$R1)  
I2: FADD \$F2, \$F2, \$F3  
I3: ADDI \$R3, \$R3, 8  
I4: LD \$F4, 0(\$R2)  
I5: FADD \$F5, \$F4, \$F2  
I6: FMULT \$F6, \$F1, \$F4  
I7: ADDI \$R5, \$R5, 1  
I8: LD \$R6, 0(\$R4)  
I9: SD \$F6, 0(\$R5)  
I10: SD \$F5, 0(\$R1)  
I11: ADD \$R1, \$R6, \$R1

## CONFLICTS

I1: LD \$F1, 0(\$R1)  
I2: FADD \$F2, \$F2, \$F3  
I3: ADDI \$R3, \$R3, 8  
I4: LD \$F4, 0(\$R2)  
I5: FADD \$F5, \$F4, \$F2  
I6: FMULT \$F6, \$F1, \$F4  
I7: ADDI \$R5, \$R5, 1  
I8: LD \$R6, 0(\$R4)  
I9: SD \$F6, 0(\$R5)  
I10: SD \$F5, 0(\$R1)  
I11: ADD \$R1, \$R6, \$R1

RAW \$F1 I1→I6

RAW \$F2 I2→I5

RAW \$F4 I4→I5

RAW \$F4 I4→I6

RAW \$F5 I5→I10

RAW \$F6 I6→I9

RAW \$R5 I7→I9

RAW \$R6 I8→I11

WAR \$R1 I10→I11

# Exe.2 Scoreboard CC0

\*STRUCT RF

|     | Instruction            | ISSUE | READ OPERAND | EXE COMPLETE | WB      | Hazards                            | Unit    |
|-----|------------------------|-------|--------------|--------------|---------|------------------------------------|---------|
| I1  | LD \$F1, 0(\$R1)       | 1     | 2            | 2+3=5        | 6       |                                    | M.U.1   |
| I2  | FADD \$F2, \$F2, \$F3  | 2     | 3            | 3+4=7        | 8       |                                    | FP.U.1  |
| I3  | ADDI \$R3, \$R3, 8     | 3     | 4            | 4+1=5        | 7       | STRUCT RF                          | ALU U.1 |
| I4  | LD \$F4, 0(\$R2)       | 4     | 5            | 8            | 9       |                                    | M.U.2   |
| I5  | FADD \$F5, \$F4, \$F2  | 5     | 10           | 14           | 15      | RAW \$F4 I4->I5<br>RAW \$F2 I2->I5 | FP.U.2  |
| I6  | FMULT \$F6, \$F1, \$F4 | 6     | 10           | 14           | 15+1=16 | RAW \$F1 I1->I6<br>RAW \$F4 I4->I6 | FP.U.3  |
| I7  | ADDI \$R5, \$R5, 1     | 7     | 8            | 9            | 10      |                                    | ALU U.2 |
| I8  | LD \$R6, 0(\$R4)       | 8     | 9            | 12           | 13      |                                    | M.U.3   |
| I9  | SD \$F6, 0(\$R5)       | 9     | 17           | 20           | 21      | RAW \$F6 I6->I9<br>RAW \$R5 I7->I9 | M.U.1   |
| I10 | SD \$F5, 0(\$R1)       | 10    | 16           | 19           | 20      | RAW \$F5 I5->I10                   | M.U.2   |
| I11 | ADD \$R1, \$R6, \$R1   | 11    | 14           | 15           | 17      | RAW \$R6 I8->I11<br>WAR \$R1 I10->I11 | ALU U.1 |

3 MU, 3cc

3 FPUs, 4cc

2 Integer ALU, 1cc  
Single W port overall
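The STRUCT RF entries come from the single write port: only one instruction can write in a given cycle. A minimal sketch of that arbitration follows, assuming the older instruction in program order keeps the slot and the younger one slips to the next free cycle. The name `assign_write_cycles` is hypothetical, and WAR constraints are deliberately ignored here (for I11 the WAR on \$R1 with I10, which reads at cycle 16, leads to the same cycle 17).

```python
# EXE COMPLETE cycles from the table above, in program order I1..I11.
EXEC_COMPLETE = [5, 7, 5, 8, 14, 14, 9, 12, 20, 19, 15]

def assign_write_cycles(exec_complete):
    """Give each instruction the first free write cycle after it finishes executing."""
    taken = set()
    result = []
    for done in exec_complete:     # program order: older instructions choose first
        cycle = done + 1
        while cycle in taken:      # write port already used in this cycle
            cycle += 1
        taken.add(cycle)
        result.append(cycle)
    return result

write_cycles = assign_write_cycles(EXEC_COMPLETE)
print(write_cycles)
```

The result matches the WB column of the table: I3, I6 and I11 are each pushed back by one cycle.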

RAW F1 I1-I6  
RAW F2 I2-I5  
RAW F4 I4-I5  
RAW F4 I4-I6  
RAW F5 I5-I10  
RAW F6 I6-I9  
RAW R5 I7-I9  
RAW R6 I8-I11  
WAR R1 I10-I11

POLITECNICO  
MILANO 1863

# Exe .1 Scoreboard: Code

```
I1:  lw $f1, 0($r0)
I2:  faddi $f1, $f1, C1
I3:  faddi $f2, $f1, C2
I4:  sw $f2, 0($r0)
I5:  lw $f2, 4($r0)
I6:  fadd $f2, $f2, $f2
I7:  sw $f2, 4($r0)
```

## CONFLICTS

I1: lw \$f1, 0(\$r0)  
I2: faddi \$f1, \$f1, C1  
I3: faddi \$f2, \$f1, C2  
I4: sw \$f2, 0(\$r0)  
I5: lw \$f2, 4(\$r0)  
I6: fadd \$f2, \$f2, \$f2  
I7: sw \$f2, 4(\$r0)

RAW \$F1 I1-I2

RAW \$F1 I2-I3

RAW \$F2 I3-I4

RAW \$F2 I5-I6

RAW \$F2 I6-I7

WAW \$F1 I1-I2

WAW \$F2 I5-I6

WAW \$F2 I3-I5

WAW \$F2 I3-I6

WAR \$F2 I4-I6

WAR \$F2 I4-I5

# Exe Scoreboard

3 FPU, latency 2 cc  
4 LDU, latency 4 cc

| Instruction              | ISSUE | READ OPERAND | EXE COMPLETE | WB | Unit |
|--------------------------|-------|--------------|--------------|----|------|
| I1:lw \$f1, 0(\$r0)      |       |              |              |    |      |
| I2:faddi \$f1, \$f1, C1  |       |              |              |    |      |
| I3:faddi \$f2, \$f1, C2  |       |              |              |    |      |
| I4:sw \$f2, 0(\$r0)      |       |              |              |    |      |
| I5:lw \$f2, 4(\$r0)      |       |              |              |    |      |
| I6:fadd \$f2, \$f2, \$f2 |       |              |              |    |      |
| I7:sw \$f2, 4(\$r0)      |       |              |              |    |      |

I1 cache miss → Penalty of 2 cc

I5 cache miss → Penalty of 5 cc

# Exe Scoreboard

3 FPU, latency 2 cc  
4 LDU, latency 4 cc

| Instruction              | ISSUE | READ OPERAND | EXE COMPLETE | WB | Unit |
|--------------------------|-------|--------------|--------------|----|------|
| I1:lw \$f1, 0(\$r0)      | 1     | 2            | 2+4+2=8      | 9  | LDU1 |
| I2:faddi \$f1, \$f1, C1  | 10    | 11           | 13           | 14 | FPU1 |
| I3:faddi \$f2, \$f1, C2  | 11    | 15           | 17           | 18 | FPU2 |
| I4:sw \$f2, 0(\$r0)      | 12    | 19           | 23           | 24 | LDU1 |
| I5:lw \$f2, 4(\$r0)      | 19    | 20           | 20+4+5=29    | 30 | LDU2 |
| I6:fadd \$f2, \$f2, \$f2 | 31    | 32           | 34           | 35 | FPU1 |
| I7:sw \$f2, 4(\$r0)      | 32    | 36           | 40           | 41 | LDU3 |

I1 cache miss → Penalty of 2 cc

I5 cache miss → Penalty of 5 cc
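The EXE COMPLETE entries of the two loads add the miss penalty to the LDU latency. A short sketch of that arithmetic, where the helper name `load_exec_complete` is an illustrative assumption:

```python
LOAD_LATENCY = 4  # LDU latency in cc, from the unit list above

def load_exec_complete(read_op_cycle, miss_penalty=0):
    """Cycle in which a load finishes executing, including any cache-miss penalty."""
    return read_op_cycle + LOAD_LATENCY + miss_penalty

i1_done = load_exec_complete(2, miss_penalty=2)    # I1: 2 + 4 + 2
i5_done = load_exec_complete(20, miss_penalty=5)   # I5: 20 + 4 + 5
print(i1_done, i5_done)
```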

## Problem 1

Assume that the following code is executed on a CPU with SCOREBOARD and with the following units:

- 1 LOAD/STORE unit with latency = 1
- 2 MULT units with latency = 10
- 1 DIVIDE unit with latency = 30
- 1 ADD/SUBD unit with latency = 2

|                | Issue    | Read Op   | Exec Co.  | Write R.  |
|----------------|----------|-----------|-----------|-----------|
| LD F6 32+ R2   | 1        | <u>2</u>  | 3         | 4         |
| LD F2 45+ R3   | 5        | 6         | 7         | 8         |
| MULTD F0 F4 F2 | 6        | 9         | 19        | 20        |
| ADD F8 F2 F6   | 7        | 9         | <u>10</u> | <u>11</u> |
| DIVD F12 F8 F0 | 8        | <u>21</u> | 51        | 52        |
| SUBD F8 F6 F2  | <u>9</u> | 13        | 14        | <u>15</u> |

- A. List all the possible conflicts in the code.
- B. Is there a “configuration” that can respect the shown execution?  
How many units? Which kind? What latency?
- C. If the previous table was not correct, please, write the right one by having:  
 1 LOAD: 1cc  
 2 MULT: 10cc  
 1 DIV: 30cc  
 3 ADD/SUB: 1cc


|       |           |           |           |
|-------|-----------|-----------|-----------|
| LD    | F6        | 32+       | R2        |
| LD    | F2        | 45+       | R3        |
| MULTD | F0        | F4        | <u>F2</u> |
| ADD   | <u>F8</u> | <u>F2</u> | <u>F6</u> |
| DIVD  | F12       | <u>F8</u> | <u>F0</u> |
| SUBD  | <u>F8</u> | <u>F6</u> | <u>F2</u> |

WAW F8 I4-I6

RAW F6 I1-I4  
RAW F6 I1-I6

RAW F2 I2-I3  
RAW F2 I2-I4  
RAW F2 I2-I6

RAW F0 I3-I5

RAW F8 I4-I5  
WAR F8 I5-I6

**Answer 1.B**

NO: IT SUFFICES TO NOTICE THAT  
 THE WAR ON F8 (I5-I6) IS NOT MANAGED: SUBD WRITES F8 AT CC 15, BEFORE DIVD READS IT AT CC 21
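The WAR argument can be checked directly against the table of Problem 1. This is a minimal sketch: the names `TABLE` and `war_violations` are hypothetical, and the rule assumed is that the later writer must write strictly after the earlier instruction has read its operands.

```python
# (issue, read_op, exec_complete, write_result) from the Problem 1 table, I1..I6.
TABLE = {
    "I1": (1, 2, 3, 4),      # LD    F6 32+ R2
    "I2": (5, 6, 7, 8),      # LD    F2 45+ R3
    "I3": (6, 9, 19, 20),    # MULTD F0 F4 F2
    "I4": (7, 9, 10, 11),    # ADD   F8 F2 F6
    "I5": (8, 21, 51, 52),   # DIVD  F12 F8 F0
    "I6": (9, 13, 14, 15),   # SUBD  F8 F6 F2
}

def war_violations(table, war_pairs):
    """Return the (reader, writer) pairs whose writer writes before the reader has read."""
    return [
        (reader, writer)
        for reader, writer in war_pairs
        if table[writer][3] <= table[reader][1]
    ]

# WAR F8 I5-I6 from the conflict list: DIVD reads F8, SUBD overwrites it.
violations = war_violations(TABLE, [("I5", "I6")])
print(violations)
```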

**Answer 1.C**

**! NOTICE THAT ADD/SUBD, HERE, HAS A DIFFERENT LATENCY (1 CC) THAN BEFORE (2 CC)**

|                | Issue | Read Op | Exec Co. | Write R. | Notes                            |
|----------------|-------|---------|----------|----------|----------------------------------|
| LD F6 32+ R2   | 1     | 2       | 3        | 4        |                                  |
| LD F2 45+ R3   | 5     | 6       | 7        | 8        | STRUCT LDU: ONLY 1 LDU AVAILABLE |
| MULTD F0 F4 F2 | 6     | 9       | 19       | 20       |                                  |
| ADD F8 F2 F6   | 7     | 9       | 10       | 11       |                                  |
| DIVD F12 F8 F0 | 8     | 21      | 51       | 52       |                                  |
| SUBD F8 F6 F2  | 12    | 13      | 14       | 22       |                                  |
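The SUBD row can be verified by hand, assuming the usual scoreboard rules: issue stalls on a WAW hazard until the earlier write has completed, and the write stalls on a WAR hazard until the earlier instruction has read its operands. SUBD has a WAW on F8 with ADD (written at cycle 11) and a WAR on F8 with DIVD (read at cycle 21):

$$\text{Issue}(\text{SUBD}) = \text{WB}(\text{ADD}) + 1 = 11 + 1 = 12$$

$$\text{WB}(\text{SUBD}) = \max(\text{Exec}(\text{SUBD}) + 1,\ \text{ReadOp}(\text{DIVD}) + 1) = \max(14+1,\ 21+1) = 22$$

DIVD itself reads at cycle 21 because it waits for F0 from MULTD, which is written at cycle 20.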