

# Assignment 5

Due Date : May 24, 2021 (day)

Name : 易 翔  
 Student ID : 11912013  
 Score : \_\_\_\_\_

**Problem 1 (3.5) Score:** \_\_\_\_\_

**Solution:** (1) The cache size is  $2^{5-2} = 8$  (In words).

(2) Since  $\text{len(index)}=5$ , The number of entry of this cache is  $2^5 = 32$ .

(3) Total number of bits is  $2^n \times (\text{block size} + \text{tag size} + \text{valid size}) = 2^5 \times (8 \times 32 + 22 + 1) = 32 \times 279$ . And the bits for data storage is  $2^5 \times 8 \times 32 = 2^5 \times 256$ . So the final answer (ratio) is  $\frac{2^5 \times 256}{2^5 \times 279} \approx 0.918$ .

(4) First rewrite the address in binary form (only show the low 12 bits since other bits are all 0):

|         | address        |                |                |                |                |                |
|---------|----------------|----------------|----------------|----------------|----------------|----------------|
| decimal | 0              | 4              | 16             | 132            | 232            | 160            |
| binary  | 0000_0000_0000 | 0000_0000_0100 | 0000_0001_0000 | 0000_1000_0100 | 0000_1110_1000 | 0000_1010_0000 |
| decimal | 1024           | 30             | 140            | 3100           | 180            | 2180           |
| binary  | 0100_0000_0000 | 0000_0001_1110 | 0000_1000_1110 | 1100_0001_1100 | 0000_1011_0100 | 1000_1000_0100 |

We only need to check the 9-5 bits as index, 11-10 as tag (other part of tag are same):

|      | address        | index | tag | result           |
|------|----------------|-------|-----|------------------|
| 0    | 0000_0000_0000 | 00000 | 00  | Miss             |
| 4    | 0000_0000_0100 | 00000 | 00  | Hit              |
| 16   | 0000_0001_0000 | 00000 | 00  | Hit              |
| 132  | 0000_1000_0100 | 00100 | 00  | Miss             |
| 232  | 0000_1110_1000 | 00111 | 00  | Miss             |
| 160  | 0000_1010_0000 | 00101 | 00  | Miss             |
| 1024 | 0100_0000_0000 | 00000 | 01  | Miss and replace |
| 30   | 0000_0001_1110 | 00000 | 00  | Miss and replace |
| 140  | 0000_1000_1110 | 00100 | 00  | Hit              |
| 3100 | 1100_0001_1100 | 00000 | 11  | Miss and replace |
| 180  | 0000_1011_0100 | 00101 | 00  | Hit              |
| 2180 | 1000_1000_0100 | 00100 | 10  | Miss and replace |

From the table we can obtain that the answer is 4.

(5) Hit ratio =  $\frac{4}{12} \times 100\% \approx 33.3\%$

(6) The answer is shown as the following table

| index | tag | data                        |
|-------|-----|-----------------------------|
| 00000 | 11  | mem[11_00000]~mem[11_11111] |
| 00100 | 10  | mem[10_00100]~mem[11_00011] |
| 00101 | 00  | mem[00_00101]~mem[01_00100] |
| 00111 | 00  | mem[00_00111]~mem[01_00110] |

□

**Problem 2 (5.6) Score:** \_\_\_\_\_

**Solution:** (1) P1 cycle time:  $\frac{1}{0.66ns} = \frac{1}{0.66 \times 10^{-9}s} = 1.52GHz$   
 P2 cycle time:  $\frac{1}{0.90ns} = 1.11GHz$ .

(2) AMAT = hit time + miss rate  $\times$  miss price (daijia). Notice that the cycle is unable to divided. From the problem set description we can obtain that main memory accesses take 70 ns. The AMAT for P1 is  $1 + 8.0\% \times \lceil \frac{70ns}{0.66ns} \rceil = 9.56\text{cycle} = 6.3096ns$ .

The AMAT for P2 is  $1 + 6.0\% \times \lceil \frac{70ns}{0.90ns} \rceil = 5.68\text{cycle} = 5.112ns$ .

|    | L1 size | L1 Miss Rate | L1 Hit Time |
|----|---------|--------------|-------------|
| P1 | 2 KiB   | 8.0%         | 0.66ns      |
| P2 | 4 KiB   | 6.0%         | 0.90ns      |

- (3) According to the instruction counter  $I$ , The number of clock cycles lost due to instruction missing is  $0.08I \times 107$ , The number of clock cycles due to missing data at any time is  $0.08 \times 0.36 \times 107$ . In a word, the CPI for P1 is  $1 + 0.08 \times 107 + 0.08 \times 0.36 \times 107 \approx 12.64$ . Similarly, we can calculate that the CPI for P2 is 7.36.  $12.64 \times 0.66 \approx 8.34$ ,  $7.36 \times 0.90 \approx 6.62$ . Since  $6.62 < 8.34$ , then P2 is faster.
- (4) By the same way in sub-problem 2,  $AMAT = 1 + 8.0\% \times (\lceil \frac{5.62ns}{0.66ns} \rceil + 95\% \times 107) \approx 9.9$  cycle. Compare to 2, since  $9.9 > 9.56$ , then the AMAT is worse with the L2 cache.
- (5) By the same way in sub-problem 3,  $CPI = 1 + 8.0\% \times (\lceil \frac{5.62ns}{0.66ns} \rceil + 95\% \times 107) + 36\% \times 8.0\% \times (\lceil \frac{5.62ns}{0.66ns} \rceil + 95\% \times 107) = 13.03872 \approx 13.04$ .
- (6)  $13.04 \times 0.66 \approx 8.61 > 6.62$ , P2 is faster. Consider the miss rate  $t$  would P1 need in its L1 cache. For new CPI in P1 to match the performance of P2, we can obtain that  $\frac{6.64}{0.66} = 1 + t \times (\lceil \frac{5.62ns}{0.66ns} \rceil + 95\% \times 107) + t \times 36\% \times (\lceil \frac{5.62ns}{0.66ns} \rceil + 95\% \times 107)$ . After solving the equation, we can find the answer is  $0.06021 \times 100\% \approx 6\%$ .

□