

# Homework 7 Yash Shah

## Problem 5.5

5.5.1 —

$$\text{offset} = 5 \text{ bits} \Rightarrow \text{block size} = 2^5 = 32 \text{ bytes}$$

$$\text{word size} = 32 \text{ bits} = 4 \text{ bytes}$$

$$\text{block size in words} = \frac{32 \text{ bytes}}{4 \text{ bytes/word}} = 8$$

8 words

5.5.2 —

$$\text{index field} = 5 \text{ bits} \Rightarrow 2^5 = 32 \text{ entries in cache}$$
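The field widths in 5.5.1 and 5.5.2 can be checked with a small Python sketch that splits a byte address into tag, index, and offset. It assumes the 5-bit offset and 5-bit index used above and 4-byte words; `split_address` is a hypothetical helper name.

```python
def split_address(addr: int, offset_bits: int = 5, index_bits: int = 5) -> tuple:
    """Split a byte address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr & ((1 << offset_bits) - 1)                 # byte within the block
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # which cache entry
    tag = addr >> (offset_bits + index_bits)                 # all remaining high bits
    return tag, index, offset


block_bytes = 1 << 5                 # 5 offset bits -> 32-byte blocks
words_per_block = block_bytes // 4   # assumed 4-byte words -> 8 words
entries = 1 << 5                     # 5 index bits -> 32 entries
print(block_bytes, words_per_block, entries)  # 32 8 32
print(split_address(0x884))                   # (2, 4, 4)
```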

5.5.3 —

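$$\text{data storage} = \# \text{ entries} \cdot \text{words/block} \cdot \text{bytes/word} \cdot \text{bits/byte}$$

$$= 32 \cdot 8 \cdot 4 \cdot 8$$

$$= 8192 \text{ bits}$$

Each entry also stores a tag and a valid bit. The tag is 54 bits (a 64-bit address minus the 5 index bits and 5 offset bits).

$$\text{total storage} = \underbrace{8192}_{\text{data bits}} + \underbrace{54(32)}_{\text{tag bits}} + \underbrace{1(32)}_{\text{valid bits}}$$

$$= 9952 \text{ bits}$$

$$\text{ratio} = \frac{9952}{8192} \approx 1.21$$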
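The storage ratio is quick to recompute for other geometries with a short sketch. It assumes 4-byte words, one valid bit per entry, and the 54-bit tag of a 64-bit address; `storage_ratio` is an illustrative name.

```python
def storage_ratio(entries: int, words_per_block: int, bytes_per_word: int,
                  tag_bits: int, valid_bits: int = 1) -> float:
    """Ratio of total cache bits (data + tag + valid) to data bits only."""
    data_bits = entries * words_per_block * bytes_per_word * 8
    overhead_bits = entries * (tag_bits + valid_bits)  # per-entry bookkeeping
    return (data_bits + overhead_bits) / data_bits


# 32 entries, 8 four-byte words per block, 54-bit tag, 1 valid bit
print(round(storage_ratio(32, 8, 4, 54), 2))  # 1.21
```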

5.5.4

All values are hexadecimal.

| Address | Tag | Index | Offset | H/M? | Replaced bytes |
|---------|-----|-------|--------|------|----------------|
| 00      | 0   | 00    | 00     | M    |                |
| 04      | 0   | 00    | 04     | H    |                |
| 10      | 0   | 00    | 10     | H    |                |
| 84      | 0   | 04    | 04     | M    |                |
| E8      | 0   | 07    | 08     | M    |                |
| A0      | 0   | 05    | 00     | M    |                |
| 400     | 1   | 00    | 00     | M    | 00 → 1F        |
| 1E      | 0   | 00    | 1E     | M    | 400 → 41F      |
| 8C      | 0   | 04    | 0C     | H    |                |
| C1C     | 3   | 00    | 1C     | M    | 00 → 1F        |
| 84      | 0   | 04    | 04     | H    |                |
| 884     | 2   | 04    | 04     | M    | 80 → 9F        |

5.5.5 —

$$4 \text{ hits out of } 12 = \frac{4}{12} \approx 0.33$$

33%

5.5.6 —

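Final cache contents, as (index, tag, data):

$$\langle 00,\ 3,\ \text{mem}[\text{C00}] \ldots \text{mem}[\text{C1F}] \rangle$$

$$\langle 04,\ 2,\ \text{mem}[\text{880}] \ldots \text{mem}[\text{89F}] \rangle$$

$$\langle 05,\ 0,\ \text{mem}[\text{A0}] \ldots \text{mem}[\text{BF}] \rangle$$

$$\langle 07,\ 0,\ \text{mem}[\text{E0}] \ldots \text{mem}[\text{FF}] \rangle$$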
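Parts 5.5.4 to 5.5.6 can be cross-checked by replaying the access sequence 00, 04, 10, 84, E8, A0, 400, 1E, 8C, C1C, 84, 884 (hex) through a simulated cache. This is a minimal sketch assuming the cache starts empty (all valid bits cleared) and allocates a block on every miss; `simulate_direct_mapped` is a hypothetical name.

```python
OFFSET_BITS = 5
INDEX_BITS = 5


def simulate_direct_mapped(trace):
    """Replay byte addresses through an initially empty direct-mapped cache.

    Returns the 'H'/'M' outcome of each access and the final {index: tag} map.
    """
    lines = {}       # index -> tag; an index absent from the dict is invalid
    outcomes = []
    for addr in trace:
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        if lines.get(index) == tag:
            outcomes.append("H")
        else:
            outcomes.append("M")
            lines[index] = tag   # allocate on miss, evicting whatever was there
    return outcomes, lines


trace = [0x00, 0x04, 0x10, 0x84, 0xE8, 0xA0, 0x400, 0x1E, 0x8C, 0xC1C, 0x84, 0x884]
outcomes, lines = simulate_direct_mapped(trace)
hits = outcomes.count("H")
print("".join(outcomes), f"{hits}/{len(trace)} = {hits / len(trace):.2f}")
for index in sorted(lines):
    tag = lines[index]
    base = (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS)  # first byte of block
    last = base + (1 << OFFSET_BITS) - 1                                 # last byte of block
    print(f"<{index:02X}, {tag}, mem[{base:X}]..mem[{last:X}]>")
```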

## Problem 5.10

5.10.1

$$\text{clock rate}_{P_1} = \frac{1}{0.66 \text{ ns}} \approx 1.52 \text{ GHz}$$

$$\text{clock rate}_{P_2} = \frac{1}{0.90 \text{ ns}} \approx 1.11 \text{ GHz}$$

5.10.2

MAT = main memory access time / L1 hit time, i.e. the main-memory latency in cycles, rounded up to a whole cycle

$$\begin{aligned} \text{MAT}_{P_1} &= \lceil 70/0.66 \rceil \\ &= 107 \text{ cycles} \end{aligned}$$

$$\begin{aligned} \text{MAT}_{P_2} &= \lceil 70/0.90 \rceil \\ &= 78 \text{ cycles} \end{aligned}$$

$$\text{AMAT} = (1 + (\text{L1 miss rate} \cdot \text{MAT})) \cdot \text{L1 hit time}$$

$$\text{AMAT}_{P_1} = (1 + (0.08 \cdot 107)) \cdot 0.66 = 9.56 \text{ cycles} \cdot 0.66 \text{ ns}$$

$$\text{AMAT}_{P_2} = (1 + (0.06 \cdot 78)) \cdot 0.90 = 5.68 \text{ cycles} \cdot 0.90 \text{ ns}$$

$$\text{AMAT}_{P_1} \approx 6.31 \text{ ns}$$

$$\text{AMAT}_{P_2} \approx 5.11 \text{ ns}$$

5.10.3

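$$\begin{aligned} \text{CPI}_{P_1} &= (1 + 0.08 \cdot 107) + 0.36 \cdot 0.08 \cdot 107 \\ &= 9.56 + 3.08 \end{aligned}$$

$\text{CPI}_{P_1} \approx 12.64$

$$\begin{aligned} \text{CPI}_{P_2} &= (1 + 0.06 \cdot 78) + 0.36 \cdot 0.06 \cdot 78 \\ &= 5.68 + 1.68 \\ \text{CPI}_{P_2} &\approx 7.36 \end{aligned}$$

The first term is the base CPI of 1 plus instruction-fetch stalls (the AMAT in cycles); the second is the data-access stalls from the 36% of instructions that reference memory.

P2 is faster. Because its cycle is longer, compare time per instruction rather than cycles: $12.64 \cdot 0.66 \approx 8.3$ ns for P1 versus $7.36 \cdot 0.90 \approx 6.6$ ns for P2.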
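The 5.10.1 to 5.10.3 arithmetic can be reproduced with a short Python sketch. It assumes, as the equations above do, that the L1 hit time is the cycle time, that latencies are rounded up to whole cycles, a base CPI of 1, and that 36% of instructions access data memory; `l1_only` is an illustrative name.

```python
import math

MEM_NS = 70.0        # main memory access time
MEM_FRACTION = 0.36  # fraction of instructions that access data memory


def l1_only(hit_ns: float, miss_rate: float):
    """Return (GHz, memory cycles, AMAT cycles, AMAT ns, CPI, ns per instruction)."""
    clock_ghz = 1.0 / hit_ns                   # cycle time = L1 hit time
    mem_cycles = math.ceil(MEM_NS / hit_ns)    # round up to whole cycles
    amat_cycles = 1 + miss_rate * mem_cycles   # 1-cycle hit + miss penalty
    # base CPI 1 and fetch stalls (= AMAT in cycles) plus data-access stalls
    cpi = amat_cycles + MEM_FRACTION * miss_rate * mem_cycles
    return clock_ghz, mem_cycles, amat_cycles, amat_cycles * hit_ns, cpi, cpi * hit_ns


for name, hit_ns, miss in [("P1", 0.66, 0.08), ("P2", 0.90, 0.06)]:
    ghz, mem, amat_c, amat_ns, cpi, ns_per_instr = l1_only(hit_ns, miss)
    print(f"{name}: {ghz:.2f} GHz, {mem} cycles to memory, "
          f"AMAT {amat_c:.2f} cycles = {amat_ns:.2f} ns, "
          f"CPI {cpi:.2f}, {ns_per_instr:.2f} ns/instr")
```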

5.10.4

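Addition of L2 cache  $\rightarrow$  95% local miss rate, hit time 5.62 ns $= \lceil 5.62/0.66 \rceil = 9$ cycles

$$\Rightarrow 0.66(1 + 0.08(9 + 107 \cdot 95\%)) = 0.66 \cdot 9.85 \approx 6.50 \text{ ns}$$

AMAT has gotten worse with the addition of L2: 6.50 ns versus $(1 + 0.08 \cdot 107) \cdot 0.66 \approx 6.31$ ns without it, because 95% of L1 misses still go to main memory and now also pay the 9-cycle L2 lookup.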

5.10.5

$$\begin{aligned} \text{CPI}_{P_1 \text{ w/L2}} &= (1 + 0.08(9 + 0.95 \cdot 107)) + 0.36 \cdot 0.08(9 + 0.95 \cdot 107) \\ &= 9.85 + 3.19 \\ &\approx 13.04 \end{aligned}$$
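A similar sketch covers P1 with the L2 attached. It assumes the 5.62 ns L2 hit time is rounded up to 9 cycles of P1's 0.66 ns clock and that 95% is the L2 local miss rate; `p1_with_l2` is a hypothetical helper name.

```python
import math

HIT_NS = 0.66        # P1 L1 hit time = cycle time
L1_MISS = 0.08
MEM_FRACTION = 0.36  # fraction of instructions that access data memory

mem_cycles = math.ceil(70 / HIT_NS)   # 70 ns main memory -> 107 cycles
l2_cycles = math.ceil(5.62 / HIT_NS)  # 5.62 ns L2 hit -> 9 cycles


def p1_with_l2(l2_local_miss: float):
    """Return (AMAT in cycles, AMAT in ns, CPI) for P1 with the L2 attached."""
    l1_penalty = l2_cycles + l2_local_miss * mem_cycles  # cycles per L1 miss
    amat_cycles = 1 + L1_MISS * l1_penalty
    cpi = amat_cycles + MEM_FRACTION * L1_MISS * l1_penalty
    return amat_cycles, amat_cycles * HIT_NS, cpi


amat_c, amat_ns, cpi = p1_with_l2(0.95)
print(f"AMAT {amat_c:.2f} cycles = {amat_ns:.2f} ns, CPI {cpi:.2f}")
```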

5.10.6

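P1 with L2 versus P1 without L2 (same clock, so compare in cycles; $r$ is the L2 local miss rate):

$$AMAT_{P_1 \text{ w/L2}} < AMAT_{P_1 \text{ w/o L2}}$$

$$1 + 0.08(9 + 107r) < 1 + 0.08 \cdot 107$$

$r < 98/107 \approx 0.916$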

5.10.7

P1 with L2 versus P2 (different clocks, so compare in ns):

$$AMAT_{P_1 \text{ w/L2}} < AMAT_{P_2}$$

$$0.66(1 + 0.08(9 + 107r)) < 5.11$$

$r < 0.70$
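Both thresholds are linear in $r$, so they can be solved directly. A sketch assuming the same 9-cycle L2 and 107-cycle memory latency; for 5.10.7 it converts P2's 5.11 ns AMAT into P1 cycles. `l2_miss_threshold` is an illustrative name.

```python
import math

HIT_NS = 0.66
L1_MISS = 0.08
MEM = math.ceil(70 / HIT_NS)   # 107 cycles
L2 = math.ceil(5.62 / HIT_NS)  # 9 cycles


def l2_miss_threshold(target_amat_cycles: float) -> float:
    """Any L2 local miss rate r below this value gives
    1 + L1_MISS * (L2 + r * MEM) < target_amat_cycles."""
    return ((target_amat_cycles - 1) / L1_MISS - L2) / MEM


r_p1 = l2_miss_threshold(1 + L1_MISS * MEM)  # 5.10.6: P1 without L2, same clock
r_p2 = l2_miss_threshold(5.11 / HIT_NS)      # 5.10.7: P2's 5.11 ns in P1 cycles
print(f"r < {r_p1:.3f} (vs P1 w/o L2), r < {r_p2:.2f} (vs P2)")
```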

## Problem 5.12

5.12.1

2 GHz machine  $\Rightarrow 0.5 \text{ ns cycle} \Rightarrow 100 / 0.5 = 200 \text{ cycles}$ to main memory

1) Main memory access time = 100 ns (200 cycles):

$$L1 \text{ only} : \text{CPI} + \text{cycles} \cdot L1 \text{ miss rate} = 1.5 + 200(0.07) = 15.5$$

$$\text{Direct-mapped } L2 : \text{CPI} + L1 \text{ miss rate} \cdot (L2 \text{ DM speed} + L2 \text{ miss rate} \cdot \text{cycles}) = 1.5 + 0.07(12 + 0.035 \cdot 200) = 2.83$$

$$8\text{-way set-associative } L2 : \text{CPI} + L1 \text{ miss rate} \cdot (L2 \text{ 8-way speed} + L2 \text{ miss rate} \cdot \text{cycles}) = 1.5 + 0.07(28 + 0.015 \cdot 200) = 3.67$$

2) Main memory access time doubled to 200 ns (400 cycles):

$$L1 \text{ only} : 1.5 + 400(0.07) = 29.5$$

$$\text{Direct-mapped } L2 : 1.5 + 0.07(12 + 0.035 \cdot 400) = 3.32$$

$$8\text{-way set-associative } L2 : 1.5 + 0.07(28 + 0.015 \cdot 400) = 3.88$$
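All of the 5.12.1 CPIs follow one nested formula, so a sketch can evaluate every configuration for any memory latency. It treats the L2 miss rates the same way the equations above do (the fraction of L1 misses that go on to main memory); the function names are illustrative.

```python
def cpi_l1_only(mem_cycles, base_cpi=1.5, l1_miss=0.07):
    """Every L1 miss goes straight to main memory."""
    return base_cpi + l1_miss * mem_cycles


def cpi_with_l2(mem_cycles, l2_time, l2_miss, base_cpi=1.5, l1_miss=0.07):
    """Every L1 miss pays the L2 time; a fraction l2_miss also pays main memory."""
    return base_cpi + l1_miss * (l2_time + l2_miss * mem_cycles)


CLOCK_GHZ = 2.0
for mem_ns in (100, 200):
    mem_cycles = mem_ns * CLOCK_GHZ  # ns * cycles/ns
    print(f"{mem_ns} ns = {mem_cycles:.0f} cycles: "
          f"L1 only {cpi_l1_only(mem_cycles):.2f}, "
          f"DM L2 {cpi_with_l2(mem_cycles, 12, 0.035):.2f}, "
          f"8-way L2 {cpi_with_l2(mem_cycles, 28, 0.015):.2f}")
```

Other latencies (for example, halving main memory access time) can be passed through the same loop.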

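5.12.2

$$\text{stall cycles}_{L3} = 0.07(12 + 0.035(50 + 0.13 \cdot 200)) = 1.03$$

Before adding the L3, the stall cycles were $0.07(12 + 0.035 \cdot 200) = 1.33$ per instruction (CPI 2.83), so adding the L3 cache lowers them to 1.03 (CPI 2.53); this reduced miss penalty is the advantage of an L3 cache. The disadvantage is that it takes a lot of space in the processor for a fairly modest improvement.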
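The L3 calculation extends the same nesting to any number of levels. A minimal sketch assuming each level below L1 is given as (access cycles, miss rate), with the values used above including the 0.13 L3 rate; `stall_cycles` is a hypothetical helper.

```python
def stall_cycles(l1_miss, levels, mem_cycles):
    """Memory stall cycles per instruction for a nested cache hierarchy.

    levels lists (access_cycles, miss_rate) pairs from L2 downward.
    """
    penalty = mem_cycles
    for access, miss in reversed(levels):
        # cost of an access that reaches this level: its own time plus
        # the chance of continuing down to the level below
        penalty = access + miss * penalty
    return l1_miss * penalty


without_l3 = stall_cycles(0.07, [(12, 0.035)], 200)
with_l3 = stall_cycles(0.07, [(12, 0.035), (50, 0.13)], 200)
print(f"stall cycles: {without_l3:.2f} without L3, {with_l3:.2f} with L3")
print(f"CPI: {1.5 + without_l3:.2f} -> {1.5 + with_l3:.2f}")
```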

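5.12.3

We want the off-chip L2 (50-cycle access, miss rate $r$) to match the CPI of the direct-mapped L2 from 5.12.1:

$$1.5 + 0.07(50 + 200r) < 2.83$$

$r < -0.155$, which is not physically possible: the 50-cycle access alone costs $0.07 \cdot 50 = 3.5$ stall cycles per instruction, more than the 1.33 the on-chip direct-mapped L2 spends in total, so no size of off-chip cache can match it.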
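Solving the 5.12.3 inequality for $r$ directly makes the sign of the bound easy to check. A minimal sketch with the same parameters (50-cycle off-chip access, 200-cycle main memory, base CPI 1.5, 7% L1 miss rate); `required_miss_rate` is an illustrative name. Pointing it at the 8-way target CPI of 3.67 also gives a negative bound.

```python
def required_miss_rate(target_cpi, access_cycles=50, mem_cycles=200,
                       base_cpi=1.5, l1_miss=0.07):
    """Solve base_cpi + l1_miss * (access_cycles + r * mem_cycles) = target_cpi for r."""
    return ((target_cpi - base_cpi) / l1_miss - access_cycles) / mem_cycles


for label, target in [("direct-mapped", 2.83), ("8-way", 3.67)]:
    r = required_miss_rate(target)
    # a negative bound means no miss rate, hence no cache size, can match
    print(f"{label}: r < {r:.3f}")
```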