

# CSE301 – Computer Organization

## Lecture 6 – Cache Memory

### Characteristics

#### i. Structural Characteristics

| Characteristic          | What it Describes                  | Examples                                   |
|-------------------------|------------------------------------|--------------------------------------------|
| <b>Location</b>         | Where memory resides in the system | CPU registers, cache, main memory, SSD/HDD |
| <b>Capacity</b>         | How much data it can store         | 8 GB RAM, 1 TB SSD                         |
| <b>Unit of transfer</b> | How data moves in/out              | Word, block, cache line                    |
| <b>Access method</b>    | How memory is reached              | Sequential, Direct, Random, Associative    |

#### Access methods:

##### 1 Sequential Access

- Data is organized and accessed **in linear order**
- To reach a particular item, all preceding data must be traversed
- **Variable and slow access time**
- Used in: **Magnetic tape**, streaming storage

##### 2 Direct Access

- Memory is divided into **fixed-size blocks/records**
- The device can move **directly to the vicinity** of the block, then access sequentially within it
- Faster than sequential but **not uniform access time**
- Used in: **HDDs, optical disks**

##### 3 Random Access

- Each location has a **unique physical address**
- Any location can be accessed **directly and with uniform access time**
- Provides high performance at **higher cost**
- Used in: **RAM, CPU registers**

##### 4 Associative Access (Content-Addressable)

- Access based on **data content**, not address
- Parallel comparison across all locations → **very fast search**
- Hardware is costly, used for small specialized memories
- Used in: **Cache tag memory, TLB, CAM**

#### ii. Performance Characteristics

| Characteristic           | Meaning                                                                                                                     |
|--------------------------|-----------------------------------------------------------------------------------------------------------------------------|
| <b>Access time</b>       | Random: Delay between read request and delivery of data<br>Non-Random: time to position rd/wr mechanism at desired location |
| <b>Memory cycle time</b> | Time before memory is ready for the next operation                                                                          |
| <b>Transfer rate</b>     | How fast data is moved once access begins                                                                                   |

#### Transfer rate:

**Random:**

$$\text{cycle\_time} = T_{\text{access}} + T_{\text{recover}}$$

$$R = 1/\text{cycle\_time}$$

**Non-Random:**

$$R = N/(T_N - T_A)$$

where:

N: # of bits

$T_N$ : Av. time to rd/wr N bits

$T_A$ : Av. access time

**iii. Physical Characteristics**

| Category                                     | Description                                         | Examples                                                                                                                               |
|----------------------------------------------|-----------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| <b>Physical Type</b>                         | The technology used to store bits                   | Semiconductor memory (RAM/ROM), Magnetic surface memory (disk, tape), Optical / Magneto-optical (CD/DVD)                               |
| <b>Physical Characteristics (Volatility)</b> | Behavior of stored data with respect to power       | <b>Volatile:</b> Data lost without power (e.g., DRAM, SRAM) — <b>Non-volatile:</b> Data retained without power (e.g., Flash, ROM, HDD) |
| <b>Organization</b>                          | Physical arrangement of bits into larger data units | Bits grouped into <b>words</b> , blocks, or pages for access                                                                           |

**Locality of reference**

Locality of reference is the tendency of a program to access a relatively small portion of memory repeatedly over a short period of time.

**This is the principle that make us use cache**

**Cache Memory**

- Small amount of fast memory
- Based on SRAM - 1bit 6 transistors
- Located between CPU and Main Memory



### Cache Operation:

CPU requests data → Cache checks:

- **Hit:** fast access ( $T_{cache}$ )
- **Miss:** slow access from memory + load cache ( $T_{cache} + T_{memory}$ )



### Cache Addressing:

|                            | Logical (Virtual) Cache                                   | Physical Cache                                      |
|----------------------------|-----------------------------------------------------------|-----------------------------------------------------|
| <b>Addressing</b>          | Uses <b>virtual addresses</b>                             | Uses <b>physical addresses</b>                      |
| <b>Access Path</b>         | CPU accesses cache <b>before MMU translation</b>          | CPU accesses cache <b>after address translation</b> |
| <b>Speed</b>               | Faster access                                             | Slightly slower (translation required)              |
| <b>Context Switch</b>      | Must flush cache on each context switch                   | No flush needed                                     |
| <b>Application Sharing</b> | Same virtual addresses can be used by different processes | Physical addresses are unique across all memory     |

|                        |                                                                                                            |                                                                                                              |
|------------------------|------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|
| <b>Pros &amp; Cons</b> | + Faster access before MMU<br>- Requires flushing on context switch<br>- Can cause aliasing issues         | + Safe, no flush needed<br>- Slightly slower due to translation                                              |
| <b>Schematic</b>       |  <p>(a) Logical cache</p> |  <p>(b) Physical cache</p> |

## Mapping Functions

|                                               | <b>Direct Mapping</b>                                                                      | <b>Associative Mapping</b>                                                                   | <b>Set Associative Mapping</b>                                                      |
|-----------------------------------------------|--------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| <b>Schematic</b>                              |           |            |  |
| <b>Address format</b>                         | Tag (s-r bit) Line (r bit) Word (W bit)                                                    | Tag (s bit) Word (W bit)                                                                     | Tag (s-d bit) Set (d bit) Word (W bit)                                              |
| <b>Memory Size ≡ No. of addressable units</b> | $2^{S+W}$ word                                                                             | $2^{S+W}$ word                                                                               | $2^{S+W}$ word                                                                      |
| <b>Block size ≡ Line size</b>                 | $2^W$ word                                                                                 | $2^W$ word                                                                                   | $2^W$ word                                                                          |
| <b>No. of blocks in MM</b>                    | $2^S$ block                                                                                | $2^S$ block                                                                                  | $2^S$ block                                                                         |
| <b>No. of cache lines</b>                     | $2^r$ line                                                                                 | undet.                                                                                       | $k * 2^d$ line                                                                      |
| <b>Cache size</b>                             | $2^r \times 2^W$ word                                                                      | undet.                                                                                       | $k \times 2^d \times 2^W$ word                                                      |
| <b>Cache Sets</b>                             | -                                                                                          | -                                                                                            | $2^d$ set                                                                           |
| <b>Tag Size</b>                               | s-r bit                                                                                    | s bit                                                                                        | s-d bit                                                                             |
| <b>Pros &amp; Cons</b>                        | + Simple Addressing<br>+ Fixed Location<br>+ Fast Access<br>- Cache miss frequently happen | + Cache hit rate is much better<br>- Harder to implement comparison circuit<br>- Slow access | -                                                                                   |

## Replacement Algorithms

- Direct Mapping:** No replacement needed — each memory block maps to a fixed cache line.
- Associative & Set-Associative Mapping:** When a replacement is required:

1. **Least Recently Used (LRU):** Replace the block that has not been used for the longest time.
  2. **First In First Out (FIFO):** Replace the oldest block in the cache (based on entry time).
  3. **Least Frequently Used (LFU):** Replace the block accessed the least number of times (counter-based).
  4. **Random:** Replace a randomly chosen block.
- 

## Read/Write Policies

### Read Policy

- **Read Hit:**
  - Data is read directly from the cache.
- **Read Miss:**
  1. **Read-through:** Load the word from memory into cache and read.
  2. **No Read-through:** Read directly from memory without updating the cache.

### Write Policy

- **Write Hit:**
  1. **Write-through:** Write data to both cache and main memory immediately (word-by-word).
  2. **Write-back:** Write data to cache only; update memory later **only when the block is replaced** (block-by-block). Use a **dirty bit** to track changes.
- **Write Miss:**
  - Allocate the block in cache first, then follow the **write-hit policy**.