

## Homework 2

Q1) Fill in the blanks

- a) Instruction Level Parallelism
- b) control
- c) hardware
- d) array/vector
- e) vector
- f) threads
- g) distributed shared memory
- h) test top
- i) 35  $(2^5 \times 2^{10} \times 2^{10} \times 2^{10})$
- j) registers
- k) temporal
- l) valid

Q2) a) Hit rates (1 cycle clock)

$$I_{cache} = 98\% \quad D_{cache} = 92\% \quad \text{Penalty} = 10 \text{ cycles}$$

$$AMAT_{I_{cache}} = 1 + (1 - 0.98) \times 10 = 1 + 0.02 \times 10 = \frac{1.2}{\text{Ans}} \text{ cycles}$$

$$AMAT_{D_{cache}} = 1 + (1 - 0.92) \times 10 = 1 + 0.08 \times 10 = \frac{1.8}{\text{Ans}} \text{ cycles}$$

### Runtime

$$\text{Instruc}^n = 800,000 \times 0.98 = 784,000 \text{ hits (1 cycle)} = 784,000 \text{ cycles}$$

$$(800,000 - 784,000) \times (10+1) = 16,000 \times 11 = 176,000 \text{ cycles}$$

For Data cache, since we have already counted cache access previously so need to count only penalties for miss.

$$(100,000 - (100,000 \times 0.92)) \times 10 = 8000 \times 10 = 80,000 \text{ cycles}$$

$$\text{Total runtime} = 784,000 + 176,000 + 80,000 = \underline{\underline{1040,000 \text{ cycles}}} \rightarrow \text{Ans}$$

$$c) \text{IPC} = \frac{\text{No of Instructions}}{\text{No of cycles}} = \frac{800000}{1040000} = \frac{0.769}{\cancel{1040000}} \text{ Instructions Per Cycle}$$

Ans.

for ideal pipeline (with no stalls or delays), the number of cycles is equal to the number of instructions. If we have 'n' instructions, it takes 'n' cycles

$$\text{IPC} = \frac{n}{n} = \frac{1 \text{ Instruction Per Cycle}}{\text{Ans}}$$

(Q3)

a) Machine = 16-bit

256 entry cache

$$\therefore \text{No of line-bits} = \log_2 256 = 8 \text{ bits (00 to FF)}$$

Cache line Size = 16 bytes

1 Memory address = 1 byte

So we bring in 16 memory addresses in each line

$$\text{Offset bits required} = \log_2 16 = 4 \text{ bits}$$

$$\text{Tag bits} = 16 - (8+4) = 4 \text{ bits} \rightarrow \text{Ans}$$

(Q4)

b)

1) 6F8E  
Tag Line  
Offset  
(Cache miss)



|              | Tag | Data | Valid | Dirty |
|--------------|-----|------|-------|-------|
| 2) 4F8E      | F8  | X 4  | 1     | -     |
| (Cache miss) |     |      |       |       |
| 3) 4F8D      | F8  | 4    | 1     | -     |
| (Cache hit)  |     |      |       |       |
| 4) 4FC2      | FC  | 4    | 1     | -     |
| (Cache miss) |     |      |       |       |
| 5) 0000      | 00  | 0    | 1     | -     |
| (Cache miss) |     |      |       |       |
| 6) 0002      | 00  | 0    | 1     | -     |
| (Cache hit)  |     |      |       |       |
| 7) 2FD1      | F0  | 2    | 1     | -     |
| (Cache miss) |     |      |       |       |
| 8) 4FF8E     | F8  | 4    | 1     | -     |
| (Cache hit)  |     |      |       |       |
| 9) 2F67      | F6  | 2    | 1     | -     |
| (Cache miss) |     |      |       |       |
| 10) 2E38     | E3  | 2    | 1     | -     |
| (Cache miss) |     |      |       |       |
| 11) 3F69     | F6  | 2 3  | 1     | -     |
| (Cache miss) |     |      |       |       |
| 12) 2E32     |     | 2    | 1     | -     |
| (Cache hit)  |     |      |       |       |

## Final Cache Entries

|    | Tag | Data | Valid | Dirty |
|----|-----|------|-------|-------|
| 00 | 0   | -    | 1     | -     |
| E3 | 2   | -    | 1     | -     |
| F0 | 2   | -    | 1     | -     |
| F6 | 3   | -    | 1     | -     |
| F8 | 4   | -    | 1     | -     |
| FC | 4   | -    | 1     | -     |
| FF |     |      | 0     |       |

→ Ans

c) Time in no-cache machine =  $30 \times 12 = 360$  cycles

Cache hit = 1 cycle

Cache miss penalty = 30 cycle

8 cache misses & 4 cache hits.

$$\Rightarrow 30 \times 8 + 4 \times 1 = 240 + 4 = 252 \text{ cycles}$$

→ (30 penalty + 1 cache access)

$$\text{Time saved} = 360 - 252 = \underline{\underline{108 \text{ cycles saved}}}$$

→ Ans

d) There are 3 instances of spatial locality cache hits in this program

The 2nd instruction is 4F8E which brings the whole line (4F80 - 4F8F) into the cache memory since

The cache is of 16 byte line size.  
The 3<sup>rd</sup> instruction accesses 4F8D which is now already present in the cache at line ~~number~~ number F8. Similar case is for 0000, 0002 and 2E38 & 2E32.

For temporal locality cache hit, we see that memory address 4F8E is accessed in the 2<sup>nd</sup> as well as the 8<sup>th</sup> instruction. This is an example of temporal locality because the same address is accessed in a short time span twice.

- (Q4) Since, we cannot jump to a 28-bit displacement constant using BE&Z, we can jump to another label where we can use the unconditional jump J to jump to our actual destination.

-text

main :

```
# Some code here  
LI $r1, 0      # To perform the jump  
BE&Z $r1, my-label
```

my-label:

AJ long-disp

# Assuming long-disp works like B  
(i.e., not handling the PC relative counter)

# Some Code Here

LI \$v0, 10

syscall

-data

```
long-disp .word 0xFFFFFFF # 28 bit constant
```