

# Operating System: Translation Lookaside Buffers

---

Sang Ho Choi ([shchoi@kw.ac.kr](mailto:shchoi@kw.ac.kr))

School of Computer & Information Engineering  
KwangWoon University

# Temporal Overhead of Paging

- Address translation is too slow
  - A simple linear page table doubles the cost of memory lookups
    - One for the page table, another to fetch the data or instruction
  - Multi-level page tables increase the cost further (discussed later)
- Goal: make address translation fast
  - Make fetching from a virtual address about as efficient as fetching from a physical address

Each reference first reads the page table, then fetches the value at the resulting physical address.

# TLB (Translation Lookaside Buffer)

- Part of the chip's memory-management unit (MMU)
- A hardware cache of popular virtual-to-physical address translations

It stores the translations of frequently accessed addresses, just as a cache would.



# TLB (Cont.)

- Address translation with TLB



# TLB Basic Algorithms

```
1: VPN = (VirtualAddress & VPN_MASK) >> SHIFT           // extract only the VPN
2: (Success, TlbEntry) = TLB_Lookup(VPN)                // look the VPN up in the TLB
3: if (Success == True) { // TLB Hit
4:     if (CanAccess(TlbEntry.ProtectBits) == True) {   // access is permitted
5:         Offset = VirtualAddress & OFFSET_MASK        // extract the offset
6:         PhysAddr = (TlbEntry.PFN << SHIFT) | Offset  // OR the cached PFN with the offset
7:         AccessMemory(PhysAddr)
8:     } else
9:         RaiseException(PROTECTION_ERROR)             // access is not permitted
```

- (line 1) Extract the virtual page number (VPN) from the virtual address
- (line 2) Check whether the TLB holds the translation for this VPN
- (lines 5-7) On a hit, take the page frame number (PFN) from the matching TLB entry, combine it with the offset to form the physical address, and access memory
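The bit manipulation in lines 1, 5, and 6 can be sketched in C. The constants below are assumptions chosen for illustration (32-bit virtual addresses and 4 KB pages); the slides do not fix an address width or page size, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumptions (not from the slides): 32-bit virtual addresses, 4 KB pages. */
#define SHIFT       12u
#define OFFSET_MASK 0x00000FFFu
#define VPN_MASK    0xFFFFF000u

/* Line 1: keep the upper bits of the address, then shift them down. */
static uint32_t vpn_of(uint32_t vaddr) {
    return (vaddr & VPN_MASK) >> SHIFT;
}

/* Line 5: the low bits pass through translation unchanged. */
static uint32_t offset_of(uint32_t vaddr) {
    return vaddr & OFFSET_MASK;
}

/* Line 6: place the frame number in the upper bits and OR in the offset. */
static uint32_t phys_addr(uint32_t pfn, uint32_t offset) {
    return (pfn << SHIFT) | offset;
}
```

Under these assumptions, virtual address `0x00003ABC` splits into VPN 3 and offset `0xABC`; if the TLB maps VPN 3 to PFN 7, the physical address is `0x00007ABC`.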

# TLB Basic Algorithms (Cont.)

- PTBR: page-table base register (the starting address of the page table)
- PTE: page table entry

```
11: } else { // TLB Miss
12:     PTEAddr = PTBR + (VPN * sizeof(PTE))           // PTBR: start of the page table
13:     PTE = AccessMemory(PTEAddr)
14:     (...)
15:
16:     TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)      // cache the translation just used
17:     RetryInstruction()
18: }
```

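- (lines 12-13) On a TLB miss, the hardware accesses the page table to find the translation
- (line 16) The hardware updates the TLB with the translation it just used
- (line 17) The instruction is retried, and this time the lookup hits in the TLB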
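The hit and miss paths together can be sketched as a small software model in C. This is a simplified illustration, not the hardware's implementation: the `Mmu` type, the table sizes, and the round-robin insert slot are all assumptions, protection checks are omitted, and instead of retrying the instruction the function returns the physical address directly after filling the TLB.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHIFT       12u      /* assumption: 4 KB pages */
#define OFFSET_MASK 0xFFFu
#define TLB_SIZE    4        /* assumption: tiny TLB for illustration */
#define NUM_PAGES   16       /* assumption: tiny linear page table */

typedef struct { bool valid; uint32_t vpn, pfn; } TlbEntry;
typedef struct { bool valid; uint32_t pfn; } Pte;

typedef struct {
    TlbEntry tlb[TLB_SIZE];
    Pte      page_table[NUM_PAGES];
    unsigned next;           /* next TLB slot to overwrite (round-robin) */
    unsigned hits, misses;
} Mmu;

/* Returns true and stores the physical address in *paddr on success;
 * returns false if the page is not mapped. */
static bool translate(Mmu *m, uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> SHIFT;
    uint32_t offset = vaddr & OFFSET_MASK;

    /* TLB hit path: search every entry for a matching VPN. */
    for (int i = 0; i < TLB_SIZE; i++) {
        if (m->tlb[i].valid && m->tlb[i].vpn == vpn) {
            m->hits++;
            *paddr = (m->tlb[i].pfn << SHIFT) | offset;
            return true;
        }
    }

    /* TLB miss path: consult the page table, then cache the translation. */
    m->misses++;
    if (vpn >= NUM_PAGES || !m->page_table[vpn].valid)
        return false;
    m->tlb[m->next] = (TlbEntry){ true, vpn, m->page_table[vpn].pfn };
    m->next = (m->next + 1) % TLB_SIZE;
    *paddr = (m->page_table[vpn].pfn << SHIFT) | offset;
    return true;
}
```

With a zero-initialized `Mmu` whose page table maps VPN 2 to PFN 9, the first call for address `0x2010` misses and fills the TLB, and a second call for another address on the same page hits.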

# Example: Accessing An Array

- How a TLB can improve performance



```
int sum = 0;
for (int i = 0; i < 10; i++) {
    sum += a[i];   // a is an array of 10 ints
}
```

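The TLB improves performance here because of **spatial locality**: neighboring array elements sit on the same page, so only the first access to each page misses.

The ten accesses produce 3 misses and 7 hits, so the **TLB hit rate** is 70%.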
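The 3-miss count can be reproduced with a short C sketch. The layout is an assumption, since the figure is not available here: 16-byte pages, 4-byte `int`s, and `a[0]` at virtual address 100. The sketch also assumes the TLB starts empty and is large enough to keep every page the loop touches; `count_tlb_misses` is a hypothetical helper name.

```c
#include <stdbool.h>

/* Counts TLB misses for a sequential walk over n elements.
 * Because the addresses only increase, an access misses exactly when
 * its virtual page number differs from that of the previous access. */
static int count_tlb_misses(unsigned base, unsigned elem_size,
                            int n, unsigned page_size) {
    int misses = 0;
    bool seen_any = false;
    unsigned last_vpn = 0;

    for (int i = 0; i < n; i++) {
        unsigned vpn = (base + (unsigned)i * elem_size) / page_size;
        if (!seen_any || vpn != last_vpn) {
            misses++;            /* first touch of this page */
            seen_any = true;
            last_vpn = vpn;
        }
    }
    return misses;
}
```

With the assumed values, `count_tlb_misses(100, 4, 10, 16)` returns 3: the array spans three pages, and the remaining 7 of the 10 accesses hit. A larger page size means fewer pages touched and therefore fewer misses.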

# Locality

- **Temporal Locality**
  - An instruction or data item that has been accessed recently will likely be accessed again soon

- **Spatial Locality**
  - If a program accesses memory at address $x$, it will likely soon access memory near $x$



# TLB Performance

---

- Effective Access Time with TLB
  - TLB lookup time: $\varepsilon$ time units
  - Memory access time: 1 time unit
  - Hit ratio: $\alpha$
    - Fraction of references whose translation is found in the TLB
  - Effective Access Time (a hit costs one memory access; a miss costs two, one for the page table entry and one for the data)

$$\text{EAT} = \alpha \times \text{hit time} + (1-\alpha) \times \text{miss time}$$
$$= \alpha (1 + \varepsilon) + (1 - \alpha)(2 + \varepsilon) = 2 + \varepsilon - \alpha$$
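As a quick numerical check of the formula, take $\alpha = 0.7$ (the hit rate from the array example) and assume $\varepsilon = 0.1$, an illustrative value that does not come from the slides:

$$0.7 \times (1 + 0.1) + 0.3 \times (2 + 0.1) = 0.77 + 0.63 = 1.40 = 2 + 0.1 - 0.7$$

Raising the hit ratio to $\alpha = 0.99$ with the same $\varepsilon$ gives $2 + 0.1 - 0.99 = 1.11$, which is close to the cost of a single memory access.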

# TLB entry

- A TLB is typically managed as a **fully associative** cache
  - A typical TLB might have 32, 64, or 128 entries
  - Any translation can be stored in any entry, so the hardware searches the entire TLB in parallel to find the desired translation
- A typical TLB entry holds the VPN, the PFN, and some other bits
  - Other bits: valid bit, protection bits, address-space identifier, dirty bit

# TLB Issue: Context Switching




Process A and Process B each have their own virtual memory, and both use VPN 10, mapped to different physical frames.

TLB Table

| VPN | PFN | valid | prot |
|-----|-----|-------|------|
| 10  | 100 | 1     | rwx  |
| -   | -   | -     | -    |
| 10  | 170 | 1     | rwx  |
| -   | -   | -     | -    |

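The hardware can't **distinguish** which entry is meant for which process.

One option is to flush the TLB on each context switch, but the cost is high: every process starts with an empty TLB each time it is scheduled and must take misses to refill it.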

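# Solving the Problem

- Provide an address space identifier (ASID) field in the TLB
  - Each entry is tagged with the ASID of the process that owns it, so entries with the same VPN from different processes can coexist without a flush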



TLB Table

| VPN | PFN | valid | prot | ASID |
|-----|-----|-------|------|------|
| 10  | 100 | 1     | rwx  | 1    |
| -   | -   | -     | -    | -    |
| 10  | 170 | 1     | rwx  | 2    |
| -   | -   | -     | -    | -    |
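A lookup that honors the ASID field can be sketched in C as follows. The struct layout and the `tlb_lookup` name are hypothetical, and the linear loop stands in for what the hardware does in parallel across all entries.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t vpn;
    uint32_t pfn;
    uint8_t  asid;   /* address space identifier of the owning process */
    bool     valid;
} TlbEntry;

/* Returns true and writes the frame number to *pfn when a valid entry
 * matches both the VPN and the ASID of the currently running process. */
static bool tlb_lookup(const TlbEntry *tlb, int n,
                       uint32_t vpn, uint8_t asid, uint32_t *pfn) {
    for (int i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn && tlb[i].asid == asid) {
            *pfn = tlb[i].pfn;
            return true;
        }
    }
    return false;
}
```

With the two entries from the table above, looking up VPN 10 yields PFN 100 under ASID 1 and PFN 170 under ASID 2, so neither process sees the other's translation.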

# Another Case

- Two processes **share a page**
  - Process 1 (P1) shares physical page 101 with Process 2 (P2)
  - P1 maps this page into the 10<sup>th</sup> page of its address space
  - P2 maps this page into the 50<sup>th</sup> page of its address space

| VPN | PFN | valid | prot | ASID |
|-----|-----|-------|------|------|
| 10  | 101 | 1     | rwx  | 1    |
| -   | -   | -     | -    | -    |
| 50  | 101 | 1     | rwx  | 2    |
| -   | -   | -     | -    | -    |

Sharing pages is **useful** because it reduces the number of physical pages in use.

# TLB Replacement Policy

- LRU (Least Recently Used)
  - Evict the entry that has gone unused for the longest time
  - Take advantage of *locality* in the memory-reference stream



Total: 11 TLB misses
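LRU eviction can be sketched in C with a per-entry timestamp. The 3-entry TLB size and the `lru_misses` helper are assumptions for illustration, and the reference streams mentioned afterwards are made up; they are not the stream behind the 11-miss count.

```c
#include <stdint.h>

#define TLB_SIZE 3   /* assumption: tiny TLB so evictions happen quickly */

/* Simulates a fully associative TLB with LRU replacement and returns
 * the number of misses for the given stream of virtual page numbers. */
static int lru_misses(const uint32_t *vpns, int n) {
    uint32_t entry[TLB_SIZE];      /* VPN cached in each slot */
    int      last_used[TLB_SIZE];  /* time of the most recent use */
    int used = 0, misses = 0;

    for (int t = 0; t < n; t++) {
        int slot = -1;
        for (int i = 0; i < used; i++) {
            if (entry[i] == vpns[t]) { slot = i; break; }
        }
        if (slot < 0) {            /* miss */
            misses++;
            if (used < TLB_SIZE) {
                slot = used++;     /* fill an empty slot first */
            } else {
                slot = 0;          /* otherwise evict the least recently used */
                for (int i = 1; i < TLB_SIZE; i++) {
                    if (last_used[i] < last_used[slot]) slot = i;
                }
            }
            entry[slot] = vpns[t];
        }
        last_used[slot] = t;       /* record this use on hit or miss */
    }
    return misses;
}
```

For the stream 1, 2, 3, 1, 4, 2 the sketch reports 5 misses: page 2 is evicted to make room for page 4 and then misses again. A loop over one more page than the TLB holds (1, 2, 3, 4, 1, 2, 3, 4) misses on every access, which is the known worst case for LRU.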

# A Real TLB Entry

All 64 bits of a TLB entry, using the MIPS R4000 as an example



| Field             | Content                                                                                     |
|-------------------|---------------------------------------------------------------------------------------------|
| 19-bit VPN        | User addresses come from only half of the address space; the rest is reserved for the kernel. |
| 24-bit PFN        | Systems can support up to 64 GB of main memory ($2^{24}$ pages of 4 KB each).               |
| Global bit (G)    | Used for pages that are globally shared among processes.                                    |
| ASID              | Lets the OS distinguish between address spaces.                                             |
| Coherence bit (C) | Determines how a page is cached by the hardware.                                            |
| Dirty bit (D)     | Marks when the page has been written.                                                       |
| Valid bit (V)     | Tells the hardware whether a valid translation is present in the entry.                     |
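The 64 GB limit follows directly from the PFN width, with each 4 KB page holding $2^{12}$ bytes:

$$2^{24}\ \text{pages} \times 2^{12}\ \text{bytes per page} = 2^{36}\ \text{bytes} = 64\ \text{GB}$$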