

## Assignment Ht 4 COAL

Q1

• Direct Mapped Cache. maps each memory block to a

specific cache block/line, based on the memory address in the main memory. It uses a simple

mapping method where the index bits of the memory address determines the cache block. The tags are compared with the cache and memory addresses.

This makes it very fast and easy to implement because only one comparison is needed to check if the data is in the cache. However the main drawback is that it

has a higher conflict misses, meaning different memory addresses that map to the same cache block will replace each other leading to poor performance if there are many collisions. This design is ideal for systems that

prioritize speed and simplicity such as embedded devices or small cache. Used in microcontrollers (like printers etc)

• Set Associative Cache. ~~implementation~~ is better than the

direct mapped cache. It divides the cache into multiple

sets, with each set containing the same no. of cache blocks.

A memory block is mapped to a set, but can be placed in

any block within that set. This reduces conflict misses since

multiple blocks within the set. However, it requires more comparators, one per cache block in the set making it more complex than direct mapped cache. The hit rate <sup>also</sup> improves as there is a higher chance that the memory block will be found in one of the available blocks in the set. These caches are usually used in ~~most of the general purpose processors~~ general purpose processors where a balance b/w performance and hardware cost is needed. Access time is also moderate since the set may need to check multiple blocks before finding the data. Used in Intel cores etc.

- Fully Associative Cache: don't have any restrictions. A memory block can be placed anywhere in the cache. any block. This design has the best hit rate because there are no fixed locations and the cache can store any block in any line. However it requires more hardware because it needs to compare the memory address with all the cache tags (one for each line) making it the slowest of the three designs. It needs a more complex replacement policy by LRU or Random, to decide which data to replace when the cache is full. (also used by set associative cache); These are used in small, performance critical components like TLB's, where avoiding cache misses is more important than the hardware cost or speed.

Q2

$$\text{a. Cache size} = 16 \text{ KB} = 16 \times 1024 = 16384 \text{ bytes}$$

Block size = 32 bytes

Address width = 32 bits.

$$\text{no of blocks, Cache size} = \frac{16384}{32}$$

= 512 blocks.

b. Tag | index | offset.

From. Block size

$$\text{Offset bits} = \log_2(32) = 5 \text{ bits}$$

$$\text{Index bits} = \log_2(\text{nof blocks}) = \log_2(512) = 9 \text{ bits}$$

$$\text{Tag bits} = 32 - (5 + 9)$$

= 18 bits.

(1)  $0x00000000 \Rightarrow \begin{smallmatrix} 31 & & & & 0 \\ 00 & \dots & 00 \end{smallmatrix}$

- Tag = (18 zeros)  $\begin{smallmatrix} 31 & & & 0 \\ 0 & \dots & 0 \end{smallmatrix}$

- Index =  $\begin{smallmatrix} 13 & & 5 \\ 00 & \dots & 0 \end{smallmatrix}$

- Offset =  $\begin{smallmatrix} 4 & & 0 \\ 0 & \dots & 0 \end{smallmatrix}$

- Miss (As cache empty).

Miss.

(2)  $0x00000008 \Rightarrow \begin{smallmatrix} 31 & & & & 0 \\ 00 & \dots & 00 & \dots & 1000 \end{smallmatrix}$

- Tag =  $\begin{smallmatrix} 31 & & & 0 \\ 0 & \dots & 0 \end{smallmatrix}$

- Index =  $\begin{smallmatrix} 13 & & 5 \\ 0 & \dots & 0 \end{smallmatrix}$

- Offset =  $\begin{smallmatrix} 4 & & 0 \\ 01000 & \dots & 0 \end{smallmatrix}$

Hit (As. Index has the same.

Tag as 0x0...00).

(3)  $0x00000010 \Rightarrow 00 \dots 10000$   
Tag =  $\overset{31}{0} \dots \overset{14}{0}$   
Index =  $\overset{13}{0} \dots \overset{5}{0}$   
Offset =  $\overset{4}{1}0000$   
Hit (As Index has the same tag as  $0x00 \dots 00$ )  
(Data will be replaced)

(4)  $0x00000018 \Rightarrow 00 \dots 10000$   
Tag =  $\overset{31}{0} \dots \overset{14}{0}$   
Index =  ~~$\overset{13}{1}0000$~~   $\overset{13}{0} \dots \overset{5}{0}$   
Offset =  $\overset{4}{1}1000$   
Hit (Same Tag as  $0x0 \dots 10$ )

(5)  $0x00000020 \Rightarrow 00 \dots 10000$   
Tag =  $\overset{31}{0} \dots \overset{14}{0}$   
Index =  $\overset{13}{0} \dots \overset{5}{01}$   
Offset =  $\overset{4}{0}0000$   
Miss. (Nothing at index  $0 \dots 01$ )

(6)  $0x00000028 \Rightarrow 00 \dots 101000$   
Tag =  $\overset{31}{0} \dots \overset{14}{0}$   
Index =  $\overset{13}{0} \dots \overset{5}{1}$   
Offset =  $\overset{4}{0}1000$   
Hit (As Index has the same tag as  $0x0 \dots 20$ ).  
(Data will be replaced.)

(7)  $0x00000030 \Rightarrow 00 \dots 110000$   
Tag =  $\overset{31}{0} \dots \overset{14}{0}$   
Index =  $\overset{13}{0} \dots \overset{5}{1}$   
Offset =  $\overset{4}{1}0000$   
Hit (As Index has the same tag as  $0x0 \dots 30$ ).  
(Data will be replaced)

(8)  $0x00000038 \Rightarrow 0^{31} \dots 111000^0$   
Tag =  $0^{31} \dots 0^{14}$   
Index =  $0^{13} \dots 1^5$   
Offset =  $11000^0$

Hit.

⑨  $0x00000040 \Rightarrow 0^{31} \dots 0^1 000000$   
Tag =  $0^{31} \dots 0^{14}$   
Index =  $0^{13} \dots 1^5$   
Offset =  $0D000^0$

Miss.

(10)  $0x000000 - 0x00000048 \Rightarrow 0^{31} \dots 1001000$   
Tag =  $0^{31} \dots 0^{14}$   
Index =  $0^{13} \dots 1^5$   
Offset =  $0000^0$

Hit. (Same as.  $0x \dots$ )  
Data will be replaced.

C. Hit Rate = ?

$$\text{Hit Rate} = \frac{\text{No of hits}}{\text{Total no. of access.}} = \frac{7}{10} = 0.7$$

70% Hit Rate.

Q3

2nd Associative cache.

Total size, 8KB =  $8 \times 1024 = 8192$  bytes

Block size = 16 bytes.

32 bit memory address

a.

$$\text{No. of blocks} = \frac{\text{Cache size}}{\text{Block size}}$$

$$\Rightarrow \frac{8192}{16} = 512 \text{ blocks.}$$

$$\text{No. of sets} = \frac{\text{Total Blocks}}{\text{Blocks per set}} = \frac{512}{2}, 256 \text{ sets.}$$

b.

①  $0x00001000 \Rightarrow 0 \dots 10000000000000$

Offset bits =  $\log_2(16) = 4$  bits.

Set Index bits =  $\log_2(256) = 8$  bits.

Tag ~~offset~~ bits =  $32 - (4 + 8) = 20$  bits.

Tag bits =  $0 \dots 1^8$

Index bits = "0000000" : Set = 0. Miss.

Offset bits = "000"

②  $0x00001004 \Rightarrow 0 \dots 1000000000100$   
Tag = 0 ... 1  
Index = 0 ... 0 Set = 0 Hit  
Offset = 0100.

③  $0x00002000 \Rightarrow 0 \dots 1000000000000000$   
Tag = 0 ... 10  
Index = 0 ... 0 Set = 0 Miss  
Offset = 0000 (Different Tag)

④  $0x00002004 \Rightarrow 0 \dots 100000000000100$   
Tag = 0 ... 10  
Index = 0 ... 0 Set = 0 Hit.  
Offset = 0100

⑤  $0x00001000 \Rightarrow 0 \dots 1000000000000000$   
Tag = 0 ... 1 Miss Hit.  
Index = 0 ... 0 Set = 0  
Offset = 0000 (Different Tag)

⑥  $0x00001004 \Rightarrow 0 \dots 1000000000100$ .  
Tag = 0 ... 1  
Index = 0 ... 0 Set = 0 Hit.  
Offset = 0100.

(7) 0x00002000

Tag = 0 - 10

Index = 0 - 0

Set 0

Hit

Offset = 0000

(8) 0x00002004

Tag = 0 - 10

Index = 0 - 0

Set 0.

Hit

Offset = 0100.

C.

(1) 0x00001000 Miss

Block 1 Set 0 Block 2

Tag 1 Empty.

(2) 0x00001004 Hit <sup>(Most used)</sup> Tag 1 used again Empty.

(3) 0x00002000 Miss.

Tag 1 Tag 2.

(4) 0x00002004 ~~Miss~~ <sup>Hit</sup> ~~Most used~~ Tag 2 used again Tag 1 → LRU

(5) 0x00001000 Hit. → LRU

(6) 0x00001004 Hit → LRU

(7) 0x00002000 Hit. → LRU

(8) 0x00002004 Hit → LRU

Final State

Block 1: Tag 1 (Most recently used.)

Block 2: Tag 2. (Least Recently used)

2 blocks present.

$\xrightarrow{2^6 \rightarrow 26}$  memory address bits

Q4 Main Memory = 64M words  $\rightarrow 64 \times 2^{20} \text{ words} = 2^{26} \text{ words}$

Cache Size = 128k words  $\rightarrow 128 \times 1024 \times 4$

Block size = 8 words  $\rightarrow 8 \times 4$

Word size = 4 bytes

### a. Direct Mapped Cache.

Tag | Index | Offset

No of blocks  $\rightarrow \frac{128 \times 1024 \times 4}{8 \times 4} = 16384$

Index  $= \log_2(16384) = 14$  bits

Offset  $= \log_2(8) = 3$  bits

Tag  $= 26 - (14 + 3) = 9$  bits

### b. Fully Associative Cache.

Tag | Offset

Offset  $= \log_2(8) = 3$  bits

Tag  $= 26 - 3 = 23$  bits.

### c. Set 2-Way Set Associative Cache.

Tag | Set Index | Offset

Offset = 3 bits

No of sets, No of blocks  $\rightarrow \frac{16384}{2} = 8192$

Blocks per set

Index set =  $\log_2(8192) = 13$  bits.

Tag =  $26 - (13 + 3) = 10$  bits.

d) 8 way Set Associative Cache.

No of Sets =  $\frac{16384}{8} = 2048$

Index =  $\log_2(2048) = 11$  bits

Offset = 3 bits

Tag =  $26 - (3 + 11)$   
= 12 bits.

Q5 Tag 31-10 | Index (9-5) | offset (4-0)  
22 bits. 5 bits. 5 bits

(a) Cache block size depends on offsets.

Cache block size =  $2^5 = 32$  bytes / 4  
= 8 words

b. Entries = ?

? No of blocks = ?

Entries =  $2^5 = 32$  blocks

c. Index | tag | Data.

The cache has 32 blocks.

Each block stores 32 bytes.

$$\begin{aligned}
 \text{Total bits} &: 2^n \times (\text{Block size} + \text{tag size} + \text{valid field}) \\
 &= 32 \times (32 \times 8 + 22 + 1) \\
 &= 8928
 \end{aligned}$$

$$\begin{aligned}
 \text{Data Storage bits} &: 32 \times 32 \times 8 \\
 &= 8192.
 \end{aligned}$$

$$\begin{aligned}
 \text{Ratio} &\rightarrow \frac{\text{Total bits}}{\text{Storage bits}} = \frac{8928}{8192} \\
 &\rightarrow 1.09
 \end{aligned}$$

Q6 a.

$$(1) 0 \Rightarrow 0 \dots 0$$

| Tag | Index | Offset | Miss | (Inserted at Index) |
|-----|-------|--------|------|---------------------|
| 0   | 0     | 00000  | Miss | Tag 0               |

$$(2) 4 \Rightarrow 0 \dots 0100$$

| 0 | 0 | 00100 | Hit | (found at index) |
|---|---|-------|-----|------------------|
| 0 | 0 | 00100 | Hit |                  |

$$(3) 16 \Rightarrow 0 \dots 10000$$

| Tag | Index | Offset | Hit | (found at index) |
|-----|-------|--------|-----|------------------|
| 0   | 0     | 10000  | Hit |                  |

$$(4) 32 \Rightarrow 0$$

| Tag | Index | Offset | Miss | (Inserted at Index) |
|-----|-------|--------|------|---------------------|
| 0   | 0     | 00000  | Miss | Tag 0               |

⑤ 232

0 - 0 | 00111 | 01000 Miss (Inserted at  
Index 7 Tag 0)

⑥ 160

0 - 0 | 00101 | 00000 Miss (Inserted at Index 5  
Tag 0)

⑦ 1024

R 0 - 1 | 00000 | 00000 Miss (Replace at Index 0)  
Tag 1

⑧ 30

R 0 - 0 | 0 - 0 | 11110 Miss (Replace at Index 0)  
Tag 0

⑨ 140

0 - 0 | 00100 | 01100 Hit Miss (Replace)

⑩ 3100

R 0 - 11 | 00000 | 11100 Miss (Replace at Index 0)  
Tag 3

⑪ 180

0 - 0 | 00101 | 10100 Hit

⑫ 2180

R 0 - 10 | 00100 | 00100 Miss (Replace At Index 0)  
Tag 2

4 blocks replaced.

b. Hit Ratio ?

$$\frac{4 \text{ Hits}}{12} = 0.333 \\ 33.3\% \\ \text{Hit Ratio}$$

c. Index Tag Data.

~~0~~ 0 0 30

~~Index 1 empty.~~

34 2 2180.

5 0 160.

7 0 232.

Q2 a. 13 bit physical address

Block offset = 1 word - 4 bytes =  $2^2 = 2$  bits

Cache Index = ?

No of blocks = 32

No of sets = No of blocks  
Blocks per set.

$$= \frac{32}{4} = 8 \text{ sets}$$

Cache Index =  $\log_2(8) = 3$  bits

Tag bits =  $13 - (2 + 3)$   
= 8 bits.

b. 0x0D74

|      |      |      |       |
|------|------|------|-------|
| 0    | 0    | 7    | 4     |
| 0000 | 1101 | 0111 | 0100, |

Block offset bits = 2 bits

Index = 3 bits.

Tag = 8 bits.

| Tag       | Index. | offset |
|-----------|--------|--------|
| 01101011. | 101    | 00     |
| 107       | 5      | 0      |

Miss As. nothing in set 5.

c.

Set 7

|       | Tag     |    |
|-------|---------|----|
| Way 0 | 0x06.   | 1  |
| Way 1 | 0x1D    | 1  |
| Way 2 | Invalid | 0. |
| Way 3 | 0x0D.   | 1. |

0x06.      offset = 2      Tag = 8      Index 3

~~00000000 1001 10~~

Set 7

Index 111

Tag 0x06 = 00000110

0x06

| Tag      | Index | Offset |            |
|----------|-------|--------|------------|
| 00000110 | 111   | 00     | 0x 000000C |
| =        | =     | 01     | 0x 0000D   |
| =        | =     | 10     | 0x 000E    |
| =        | =     | 11     | 0x 00DF    |

0x1D

| Tag      | Index | Offset |        |
|----------|-------|--------|--------|
| 00011101 | 111   | 00     | 0x03BC |
| =        | 111   | 01     | 0x03BD |
| =        | 111   | 10     | 0x03BE |
| =        | 111   | 11     | 0x03BF |

0x0D

| Tag      | Index | Offset |        |
|----------|-------|--------|--------|
| 00001101 | 111   | 00     | 0x01BC |
| =        | 111   | 01     | 0x01BD |
| =        | 111   | 10     | 0x01BE |
| =        | 111   | 11     | 0x01BF |

Q8

Write through  $\Rightarrow$  whenever cache is written <sup>main</sup> memory is also written.

Advantage: - Cache and memory are always in sync.

- They is good for ~~multiple~~ & simple for multicore CPUs, as they will not cause

Disadvantages: - It is slower without write buffer.

as every write must wait for memory.

- This causes stalls.

→ Write back - writes cache at first, if ~~main~~ <sup>cache block</sup> memory is to be replaced from main memory is updated.

Adv: ~~reduces~~ writes faster (no waiting) and less memory traffic (good performance)

Dis: Risk of memory loss if power goes out before writing back  
Harder to manage for multi-core processors.

Og

Yes in some cases only, ~~then~~ depending on access pattern  
~~Indirect~~ Fully Associative Caches reduce conflict misses while direct mapped cache are more prone to them, but not all the time.

Cache - 2 blocks

Memory



A - Block-Index 0

B - Block-Index 1

C - Block-Index 0

| In Direct Mapped Cache |    |
|------------------------|----|
| Access. A              | A  |
| B                      | AB |
| A                      | AB |
| B                      | AB |
| C                      | CB |
| A                      | AB |
| B                      | AB |

miss

miss

hit

hit

miss

miss

Total Hits = 3 Miss = 4

In Full associative cache.

| Access | Cache | replace | M/H  | Total Hit = 2 | Miss = 5 |
|--------|-------|---------|------|---------------|----------|
| A      | A     | -       | miss |               |          |
| B      | AB    | -       | miss |               |          |
| A      | AB    | -       | hit  |               |          |
| B      | AB    | -       | hit  |               |          |
| C      | BC    | A       | miss |               |          |
| A      | AC    | B       | miss |               |          |
| B      | BC    | A       | miss |               |          |

Direct Cache  $\rightarrow$  Fully Associative Cache (Hit)

It all depends on the access pattern.