

Name (Pinyin): Xu Hongtu

Email (Prefix): xuh1

# Computer Architecture I

## Homework 6

2021 Spring April 22

### Instructions:

Homework 6 covers the content of caches, please refer to the lecture slides. You can print it out, write on it and scan it into a pdf, or you can edit the PDF directly, just remember: you must create a **PDF** and upload to the **Gradescope**. Please assign the questions properly on Gradescope, otherwise you will lose 25% of points.

### [30 points] Question Set 1. Direct Mapped Cache

In a 32-bit machine (word size is 32 bit), the clock frequency is 2GHz. We have a direct mapped cache with properties as follows:

1. Cache size is 32 Bytes;
2. Block size is 8 Bytes;
3. Cache hit time is 2 cycles;
4. Cache miss penalty is 100 cycles;

1-A. What is the width (in bits) of each field of following address bit assignment?

|                 |                      |                         |
|-----------------|----------------------|-------------------------|
| TAG:<br>27 bits | Set index:<br>2 bits | Block offset:<br>3 bits |
|-----------------|----------------------|-------------------------|

Please provide info about how you came to this result. (Answer 6pt + Analysis 4pt)

$$\text{block offset : } \log_2 8 = 3 \text{ bits}$$

$$\text{set index : } \log_2 \frac{32}{8} = 2 \text{ bits}$$

$$\text{tag : } 32 - 3 - 2 = 27 \text{ bits}$$

1-B. We will access the data of addresses as follows. Fill in the blanks. It is about T/I/O(tag/index/offset, write down the value in decimal), whether there is a hit. (each line worth 1 pt.)

| Addresses (serially access) | T/I/O | Miss or Hit {"Miss", "Hit"} |
|-----------------------------|-------|-----------------------------|
| 0x00000004                  | 0/0/4 | Miss                        |
| 0x00000005                  | 0/0/5 | Hit                         |
| 0x00000068                  | 3/1/0 | Miss                        |
| 0x000000c8                  | 6/1/0 | Miss                        |
| 0x00000068                  | 3/1/0 | Miss                        |
| 0x000000dd                  | 6/3/5 | Miss                        |
| 0x00000045                  | 2/0/5 | Miss                        |
| 0x00000004                  | 0/0/4 | Miss                        |
| 0x000000c8                  | 6/1/0 | Miss                        |
| 0x00000004                  | 0/0/4 | Hit                         |

1-C. Calculations. (Show progress, worth 50% pts)

1-C-i: miss rate: (4 pt.)

From the table above, we know the miss rate is

$$\frac{8}{10} = 80\%$$

1-C-ii: AMAT (ns): (3 pt.)

$$AMAT = (2 + 80\% \times 100) \times \frac{1}{2 \times 10^9} = 41 \text{ ns}$$

1-C-iii: AMAT if we don't have this cache (ns): (3 pt.)

$100 \times \frac{1}{2 \times 10^9} = 50 \text{ ns}$  since each time we need to access the memory

## [30 points] Question Set 2. Four-Way Set Associative Cache

From Q1 (32bit machine, clock frequency is 2GHz), we implemented a four-way set associative cache. The parameters are shown as follows:

1. Cache size is 32 B;
2. Block size is 8 Bytes;
3. Cache hit time is 2 cycles;
4. Cache miss penalty is 100 cycles;

2-A. What the width of each field of following address bit assignment:

|              |                   |                      |
|--------------|-------------------|----------------------|
| TAG: 29 bits | Set index: 0 bits | Block offset: 3 bits |
|--------------|-------------------|----------------------|

Give me the progress how you think about it. (Answer 6pt + Analysis 4pt)

$$\text{block off set: } \log_2 8 = 3 \text{ bits}$$

$$\text{set index: } \log_2 \frac{32}{8} \times \frac{1}{4} = 0 \text{ bits}$$

$$\text{tag: } 32 - 3 - 0 = 29 \text{ bits}$$

2-B. We will access the data of addresses as follows. Fill in the blanks. It is about T/I/O (tag/index/offset, write down the value in decimal), and whether there is a hit. Use LRU algorithm. (each line worth 1 pt.)

| Addresses (serially access) | T/I/O  | Miss or Hit {"Miss", "Hit"} |
|-----------------------------|--------|-----------------------------|
| 0x00000004                  | 0/0/4  | Miss                        |
| 0x00000005                  | 0/0/5  | Hit                         |
| 0x00000068                  | 13/0/0 | Miss                        |
| 0x000000c8                  | 25/0/0 | Miss                        |
| 0x00000068                  | 13/0/0 | Hit                         |
| 0x000000dd                  | 27/0/5 | Miss                        |
| 0x00000045                  | 8/0/5  | Miss                        |
| 0x00000004                  | 0/0/4  | Miss                        |
| 0x000000c8                  | 25/0/0 | Miss                        |
| 0x00000004                  | 0/0/4  | Hit                         |

## 2-C. Calculations.

### 2-C-i. Miss rate: (5 pt.)

From the table above, we know the miss rate is

$$\frac{7}{10} = 70\%$$

### 2-C-ii. Calculate the AMAT in ns. (5 pt.)

$$AMAT = (2 + 100 \times 70\%) \times \frac{1}{2 \times 10^9} = 36 \text{ ns}$$

## [22 points] Question Set 3. Cache Friendly Programming

This C program is run on a processor with a direct-mapped data cache with a size of 1KiB and a block size of 16 bytes. Assume that your cache begins cold and no optimization. (32bit machine)

```
int i, j, A[256*256];
sum = 0;
for (i = 0; i < 256; i++) {
    for (j = 0; j < 256; j++) {
        sum += A[256*i + j];
    }
}
```

Assume no optimization and sizeof(int) == 4 and A = 0x4000.

### 3-A-i. Make use of what kind of locality (2 pt.)

Spatial locality and (temporal locality if sum is not in the register,

### 3-A-ii. Calculate the miss rate. Show progress (10 pt.)



the second loop, each step += 4 Byte

So, each 4 access we have 1 miss and 3 hit.

next 4, we map to next cache line and do the same process.

So, the miss rate is 25%

3-B. Calculate the miss rate.

```
int i, j, A[256*256];  
sum = 0;  
for (i = 0; i < 256; i++) {  
    for (j = 0; j < 256; j++) {  
        sum += A[256*j + i];  
    }  
}
```

Show your progress (10 pt.)

the second loop, each step  $\approx 256 \times 4 = 1024 \text{ Byte}$   
which is the same size as the cache size.  
So, each step will be map to the same cache line  
and since this is a direct-mapped cache, then each time will miss.  
So, the miss rate is 100%.

64      168

#### [18 points] Question Set 4. AMAT

4-A. Impact of increasing associativity with fixed blocks size and number of sets. Fill in with increase, decrease or unchange. (3 pt)

|              |          |
|--------------|----------|
| Hit time     | increase |
| Miss rate    | decrease |
| Miss penalty | unchange |

4-B. Suppose your L1\$ has a hit time of 2 cycles and a local miss rate of 20%. Your L2\$ has a hit time of 15 cycles and a global miss rate of 5%. Your main memory access needs 100 cycles.

4-B-i. Write down your local miss rate of L2\$. (5 pt)

$$\text{local L}_2 \times \text{local L}_1 = \text{global}$$
$$\Rightarrow \text{local L}_2 = \frac{5\%}{20\%} = 25\%$$

4-B-ii. Write down the AMAT of the system. (5 pt)

$$\text{AMAT} = 2 + 20\% \times (15 + 25\% \times 100) = 10 \text{ cycles},$$

4-B-iii. Suppose your newly added L3\$ has a hit time of 30 cycles and you want reduce your AMAT of the system to 8 cycle, what is the largest local miss rate? (5 pt)

let the local miss rate be  $x$ , then

$$\text{AMAT} = 2 + 20\% \times (15 + 25\% \times (30 + 100x)) = 8$$

$$\Rightarrow x = \frac{3}{10} = 30\%$$

So, the largest local miss rate is 30%