

CS M151B Final, Junhang Wang (50494113)

$$ET = IC \cdot \underset{\uparrow}{CPI} \cdot CT$$

2



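cost of mis-prediction: 2

| Instruction | Frequency |
|-------------|-----------|
| R           | 50%       |
| LW          | 25%       |
| SW          | 10%       |
| BEQ / BNE   | 15%       |

| LW followed by   | Share of LW |
|------------------|-------------|
| DEP              | 25%         |
| ind., DEP        | 20%         |
| ind., ind., DEP  | 10%         |

I \$: 25% miss

D \$: 30% miss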



Branch resolves in EX

Branch predictor 20% miss

2

20

$$TCPI = BCPI + MCPI$$

$$BCPI = 1 + (\text{control hazard rate}) (\text{control hazard penalty})$$

$$+ (\text{data hazard rate}) (\text{data hazard penalty})$$

$$= 1 + 0.15 \cdot 0.2 \cdot 2 + 0.25 \cdot 0.25 \cdot 1$$

$$= 1.1225$$

$$MCPI = I\$M + D\$M$$

$$= 1 \cdot 0.25 (20 + 0.1 \cdot 150) + 0.25 \cdot 0.3 (20 + 0.1 \cdot 150)$$

$$= 11.375$$

$$TCPI = 1.1225 + 11.375$$

$$= 12.4975$$
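The arithmetic above can be re-checked with a short script. This is only a sketch: the numbers are copied from the derivation, while the variable names, and the reading of 20, 0.1 and 150 as L2 access cycles, L2 miss rate and memory access cycles, are assumptions.

```python
# Numbers are taken from the derivation; names and meanings are assumed.
BRANCH_FREQ = 0.15       # BEQ / BNE share of instructions
MISPREDICT_RATE = 0.20   # branch predictor miss rate
MISPREDICT_COST = 2      # cycles lost per misprediction
LOAD_FREQ = 0.25         # LW share of instructions
LOAD_USE_RATE = 0.25     # share of LWs followed by a dependent instruction
LOAD_USE_COST = 1        # stall cycles per load-use hazard
ICACHE_MISS = 0.25       # I$ miss rate
DCACHE_ACCESS = 0.25     # data accesses per instruction, as used above
DCACHE_MISS = 0.30       # D$ miss rate
L2_CYCLES = 20           # assumed: L2 access time
L2_MISS = 0.10           # assumed: L2 miss rate
MEM_CYCLES = 150         # assumed: main memory access time

# Base CPI: ideal CPI of 1 plus control and data hazard stalls.
bcpi = (1
        + BRANCH_FREQ * MISPREDICT_RATE * MISPREDICT_COST
        + LOAD_FREQ * LOAD_USE_RATE * LOAD_USE_COST)

# Every L1 miss pays the L2 time plus, on an L2 miss, the memory time.
miss_penalty = L2_CYCLES + L2_MISS * MEM_CYCLES

# One instruction fetch per instruction, DCACHE_ACCESS data accesses per instruction.
mcpi = 1 * ICACHE_MISS * miss_penalty + DCACHE_ACCESS * DCACHE_MISS * miss_penalty

tcpi = bcpi + mcpi
print(round(bcpi, 4), round(mcpi, 4), round(tcpi, 4))
```

Changing a single constant (for example the misprediction cost) shows how much each term contributes to the total.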



cost of misprediction : 3



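cost of data hazard, LW followed directly by a dependent instruction : 2

cost of data hazard, LW followed by one independent instruction, then a dependent one : 1

cost of data hazard, LW followed by two independent instructions, then a dependent one : 0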

$$TCPI = BCPI + MCPI$$

$$BCPI = 1 + (\text{control hazard rate}) (\text{control hazard penalty})$$

$$+ (\text{data hazard rate}) (\text{data hazard penalty})$$

$$= 1 + 0.15 \cdot 0.2 \cdot 3 + 0.25 \cdot 0.25 \cdot 2 + 0.25 \cdot 0.2 \cdot 1$$

$$= 1.265$$

$$MCPI = I\$M + D\$M$$

$$= 1 \cdot 0.1 \cdot (20 + 0.1 \cdot 150) + 0.25 \cdot 0.15 (20 + 0.1 \cdot 150)$$

$$= 4.8125$$

$$TCPI = 1.265 + 4.8125$$

$$= 6.0775$$
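The data hazard term here is a weighted sum over how far the dependent instruction sits behind the LW. A minimal sketch of that sum and of the resulting total follows. The numbers come from the derivation; the names, the list layout, and the interpretation of 20, 0.1 and 150 as L2 cycles, L2 miss rate and memory cycles are assumptions.

```python
# (share of LWs, stall cycles) for each load-use distance, from the costs above.
LOAD_USE_CASES = [
    (0.25, 2),  # LW followed directly by a dependent instruction
    (0.20, 1),  # one independent instruction in between
    (0.10, 0),  # two independent instructions in between
]
LOAD_FREQ = 0.25

# Expected data hazard stalls per instruction.
data_stalls = LOAD_FREQ * sum(share * stall for share, stall in LOAD_USE_CASES)

# Control hazard stalls: branch frequency * miss rate * misprediction cost of 3.
control_stalls = 0.15 * 0.20 * 3

bcpi = 1 + control_stalls + data_stalls

# Assumed meaning: 20-cycle L2 access, 10% L2 miss rate, 150-cycle memory.
miss_penalty = 20 + 0.1 * 150
mcpi = 1 * 0.10 * miss_penalty + 0.25 * 0.15 * miss_penalty

tcpi = bcpi + mcpi
print(round(bcpi, 4), round(mcpi, 4), round(tcpi, 4))
```

The zero-stall case is kept in the list so that the table of costs and the code stay in one-to-one correspondence.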

3

Core 1

sw \$21, 8(\$t0)  
\$t0 = 0x504, so the store address is 0x504 + 8 = 0x50C



(word at  $0x50C$ ) =  $0x77777777$

L1 miss

L2 hit

L2

| 1 | 1 | 0x508 | 0x DE AD BE EF DE FE C8 | 0x50F |
|---|---|-------|-------------------------|-------|

↓

77 77 77 77

Core 0's L1 detects the change

0x508: 0x DE AD BE EF DE FE C8 ED

Assume write-invalidate protocol
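The effect of the store on the cached line can be traced in a few lines. This is a sketch under stated assumptions: the line covers 0x508 through 0x50F with the bytes listed above, the base register holds 0x504, and Core 0's copy is modelled as a single hypothetical valid flag that the write-invalidate protocol clears.

```python
LINE_BASE = 0x508
# Bytes of the line at 0x508..0x50F as written for Core 0's copy.
line = bytearray.fromhex("DEADBEEFDEFEC8ED")

base_reg = 0x504           # value shown under the base register
offset = 8                 # immediate of the sw instruction
addr = base_reg + offset   # effective store address

# Core 1 writes the 4-byte word 0x77777777 at that address.
start = addr - LINE_BASE
line[start:start + 4] = (0x77777777).to_bytes(4, "big")

# Hypothetical model of write-invalidate: Core 0 snoops the write
# and drops its own copy instead of updating it.
core0_l1_valid = True
core0_l1_valid = False

print(hex(addr), line.hex(" ").upper(), core0_l1_valid)
```

Because all four stored bytes are 0x77, the byte order chosen in `to_bytes` does not affect the result.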

4

problem  
too many live variables

5





2 ...000101000000

3 ...0111010011000

4 ...000101001000

5 ...000001001100

6 ...0101010010000

7 ...0001010101100

8 ...0001010010000

9 ...0111010011000

10 ...0001010011000

11 ...0000010011000

12 ...0101010010000

L1

|    | U   | D#  | TAG     |
|----|-----|-----|---------|
|    | Idx |     |         |
| 1  | 01  | ... | 0001011 |
| 2  | 01  | ... | 0001010 |
| 3  | 01  | ... | 0111010 |
| 4  | 01  | ... | 0001010 |
| 5  | 01  | ... | 0000010 |
| 6  | 01  | ... | 0101010 |
| 7  | 01  | ... | 0001011 |
| 8  | 01  | ... | 0001010 |
| 9  | 01  | ... | 0111010 |
| 10 | 01  | ... | 0001010 |
| 11 | 01  | ... | 0000010 |
| 12 | 01  | ... | 0101010 |

L2

|   | Idx | TAG        |
|---|-----|------------|
| 1 | 101 | ...000101  |
| 2 | 001 | ...0001011 |
| 3 | 001 | ...011101  |
| 5 | 001 | ...000000  |
| 6 | 001 | ...010101  |

TLB

|   | TAG   |
|---|-------|
| 1 | ...00 |
| 2 | ...01 |

6

$$ET = \underset{\downarrow}{IC} \cdot \underset{-}{CPI} \cdot \underset{\uparrow}{CT}$$

7

## Branch delay slots

| LW / SW          | ALU / Branch         |                                                                    |
|------------------|----------------------|--------------------------------------------------------------------|
| lw \$t0, 8(\$s0) |                      | lw \$t0, 8(\$s0)                                                   |
|                  | addi \$s0, \$s0, 16  | add \$t1, \$t1, \$s1                                               |
|                  | add \$t0, \$t0, \$s1 | lw \$t1, 0(\$t1)                                                   |
| lw \$t0, 0(\$t0) | bne \$s0, \$s2, HERE | add \$t9, \$t9, \$t1<br>addi \$s0, \$s0, 32<br>bne \$s0, \$s2, HERE |
|                  | add \$t9, \$t9, \$t0 |                                                                    |

| LW / SW           | ALU / Branch         |
|-------------------|----------------------|
| lw \$t0, 8(\$s0)  | addi \$s0, \$s0, 32  |
| lw \$t1, 24(\$s0) | add \$t0, \$t0, \$s1 |
| lw \$t0, 0(\$t0)  | add \$t1, \$t1, \$s1 |
| lw \$t1, 0(\$t1)  | bne \$s0, \$s2, HERE |
|                   | add \$t9, \$t9, \$t0 |
|                   | add \$t9, \$t9, \$t1 |

Unroll once

lw \$t0, 8(\$s0)

add \$t0, \$t0, \$s1

lw \$t0, 0(\$t0)

~~add \$t9, \$t9, \$t0~~

lw \$t1, 24(\$s0)

add \$t1, \$t1, \$s1

lw \$t1, 0(\$t1)

add \$t9, \$t9, \$t1

addi \$s0, \$s0, 32

bne \$s0, \$s2, HERE

$$ET = \underset{\downarrow}{IC} \cdot \underset{\text{same}}{CPI} \cdot \underset{\text{same}}{CT}$$
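The drop in IC can be made concrete by counting the instructions in the two schedules above. The sketch below assumes that each table row is one two-issue packet, that the first schedule handles one element per iteration and the unrolled one handles two, and it counts only the LW / SW and ALU / Branch columns. The dictionary keys are hypothetical names.

```python
# Counts read off the two scheduling tables (first two columns only).
original = {"instructions": 6, "packets": 5, "elements": 1}
unrolled = {"instructions": 10, "packets": 6, "elements": 2}

def per_element(schedule, key):
    """Average count of `key` needed to process one element."""
    return schedule[key] / schedule["elements"]

for name, schedule in (("original", original), ("unrolled", unrolled)):
    print(name,
          per_element(schedule, "instructions"),
          per_element(schedule, "packets"))
```

Under these assumptions unrolling removes one addi and one bne per pair of elements, which is where the lower instruction count comes from.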

9

W A R

~~WA~~ or WT



10

| Label            | Instruction      | Outcomes   | (PC >> 2) % 4 |
|------------------|------------------|------------|---------------|
| HERE : ... 00000 | lw \$t0, 0(\$t1) |            |               |
|                  | add              |            |               |
|                  | lw               |            |               |
|                  | sub              |            |               |
|                  | beq ... THERE    | (T, NT, T) | 0             |
|                  | lw               |            |               |
|                  | add              |            |               |
| THERE :          | add              |            |               |
|                  | bne HERE         | (T, T, NT) | 0             |

BP

|    |    |
|----|----|
| 00 | 00 |
| 01 | 01 |
| 10 | 01 |
| 11 | 01 |

Both branches map to the same entry
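The aliasing can be checked by computing the table index for both branches. This sketch assumes the index is (PC >> 2) % 4 for a 4-entry table, 4-byte instructions, and HERE at an address whose low five bits are 00000. The upper address bits are elided in the original and are set to zero here only for illustration; they do not affect the index.

```python
def bp_index(pc, entries=4):
    """Index into the branch predictor table: word address modulo table size."""
    return (pc >> 2) % entries

HERE = 0b00000            # low bits of the loop start; upper bits unknown
BEQ_PC = HERE + 4 * 4     # beq is the 5th instruction after HERE
BNE_PC = HERE + 8 * 4     # bne is the 9th instruction after HERE

print(bp_index(BEQ_PC), bp_index(BNE_PC))
```

Both branches are a multiple of 16 bytes away from HERE, so bits 3:2 of their addresses match and they share entry 0.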

| current state | prediction | actual branch | new state |
|---------------|------------|---------------|-----------|
| 01            | NT         | BEQ1 T        | 11        |
| 11            | T          | BNE1 T        | 11        |
| 11            | T          | BEQ2 NT       | 10        |
| 10            | T          | BNE2 T        | 11        |
| 11            | T          | BEQ3 T        | 11        |
| 11            | T          | BNE3 NT       | 10        |
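The table can be replayed with a small simulation of the shared 2-bit entry. Only four transitions appear in the table (01 to 11, 11 to 11 and 10 to 11 on taken, 11 to 10 on not taken); the remaining entries of `NEXT_STATE` are assumptions added to complete the state machine, and states 10 and 11 are assumed to predict taken.

```python
# (state, taken) -> next state. Entries marked "assumed" are not in the table.
NEXT_STATE = {
    ("00", True): "01",   # assumed
    ("00", False): "00",  # assumed
    ("01", True): "11",
    ("01", False): "00",  # assumed
    ("10", True): "11",
    ("10", False): "00",  # assumed
    ("11", True): "11",
    ("11", False): "10",
}

# Interleaved outcomes: beq is (T, NT, T), bne is (T, T, NT).
OUTCOMES = [("BEQ1", True), ("BNE1", True), ("BEQ2", False),
            ("BNE2", True), ("BEQ3", True), ("BNE3", False)]

state = "01"
states = []
mispredictions = 0
for name, taken in OUTCOMES:
    predicted_taken = state[0] == "1"   # high bit set means predict taken
    if predicted_taken != taken:
        mispredictions += 1
    state = NEXT_STATE[(state, taken)]
    states.append(state)

print(states, mispredictions)
```

With the two branches sharing one entry, three of the six predictions in this trace are wrong.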

11



12

A-4  
A-3 X  
A-2 WB  
A-1 M  
A EX2  
A+1 EX1  
A+2

13

↓ in control hazard

14

2 instructions per cycle

$$\frac{20 + 60}{40} = 2$$

15.

L1

capacity miss



16



20