

15/08/20

W3

## Memory Management:

(Primary memory or Physical memory or RAM)

→ There are two views of memory

i) Physical view:



ii) Abstract view (programmer's view):



Q: If size of word is 'm' bits and no of bits in address is 'n'

$$\text{Then } \text{Memory Capacity (total words)} = 2^n = N$$

→ If ~~Address~~ N = 8 words

$$\Rightarrow n = 3 \text{ bits}$$

i.e.,  $N = 2^n \text{ words}$   
 $n = \log_2 N \text{ bits}$

$$\Rightarrow N = 16 \text{ GB}$$

$$\Rightarrow \text{i.e., } N = 16 \times 2^{30} = 2^{34}$$

$$n = 34$$

$\rightarrow$  If  $n=28$  bits

a max of  $2^{28}$  bytes can be addressed  
i.e., 256 MB

$\rightarrow$  If  $n=34$  bits and word length = 16 bits

find capacity of memory in terms of words, bytes, bits.

Sol:

$$2^n = 2^{34} \text{ words} \geq 16 \text{ GiB words}$$

$$\Rightarrow 2^{34} \times 2 \text{ bytes} = 2^{35} \text{ bytes} = 32 \text{ GiB}$$

$$2^{34} \times 16 \text{ bits} = 2^{38} \text{ bits} = 256 \text{ Gi-bits}$$

$\rightarrow$  let capacity of memory be 128 Tbits and word size be 64 bits

find capacity of memory in terms of bytes and words

Sol:

$$128 \text{ Tbits} = 2^7 \times 2^{40} \text{ bits}$$
$$\geq 2^{47} \text{ bits}$$

$$\text{Memory capacity in bytes} = \frac{2^{47}}{8} = 2^{44} \text{ bytes}$$
$$= 16 \text{ TB}$$

$$\text{Memory capacity in words} = \frac{2^{47}}{64} = 2^{41} \text{ words}$$
$$= 2 \text{ Ti-words}$$

## Loading vs Linking

↳ loading of executable program from disk to memory.

Loading:

Loading is of two types:

i) Static Loading:

\* Entire program is loaded before execution

### (ii) Dynamic Loading:

Modules can be loaded during runtime.

Eg: Consider

```
main()
{
    if (...) {
        f1();
    }
    else
        f2();
}
```

DA

In static loading we load main(), f<sub>1</sub>(), f<sub>2</sub>() into memory. However we use only one out of f<sub>1</sub> & f<sub>2</sub>. This is space inefficient.

Thus we can use dynamic loading and load only main() initially and load either f<sub>1</sub>, f<sub>2</sub> based on the requirement at the time of execution.

→ However execution time is more if we use dynamic loading.

→ Static loading: Space inefficient, time efficient

Dynamic loading: Space efficient, time inefficient

### Linking:

Linking is the process of resolving external references  
 ↳ function, variable (global)  
 ↳ or any objects.

kk.c

extern int x; // external reference

main() { bsd; int x = 10; }

{ f(); // since f() is defined after main() }

{ main() has address of f() }

f() is not known earlier. This is an unresolved reference.

kk.c

int x;

f()

scanf()

printf()

g() // BSA → unresolved reference

g()

h() // external reference

h()



Linking is of two types

(i) Static Linking

(ii) Dynamic Linking

The linker resolves addresses through logical addresses

### i) Static Linking:

\* Linking is done before runtime.



### Drawback:

\* Sometimes unnecessary modules may be linked and this is inefficient. for which we sometime may postpone linking till runtime (i.e., latebinding or dynamic linking)

\* However execution of program is faster in the case of static linking.

### ii) Dynamic Linking / Late Binding:

For every ~~piece~~ ~~un~~ unresolved reference compiler associates a piece of code called Stub. This stub when executes at runtime calls linker.

→ The libraries which are linked at runtime are called

### Dynamic Link Libraries (DLL)

→ It is more space efficient.

#### Benefits of dynamic linking

- \* space efficiency
- \* Reusability of libraries
- \* flexibility of modification

#### Drawback

- \* time inefficient
- \* security threat

→ static linking is more secure.

#### Modules needed by linker:

- object code
- external references
- relocation information
- \* ~~Linker does not need physical address of memory.~~

#### Address Binding

Association of program instructions and data units to memory locations is called address binding.

- I<sub>1</sub> : Load a, #1      binding  
I<sub>2</sub> : Load b, #5  
I<sub>3</sub> : Load R<sub>1</sub>, a  
I<sub>4</sub> : Load R<sub>2</sub>, b  
I<sub>5</sub> : Add R<sub>1</sub>, R<sub>2</sub>  
I<sub>6</sub> : Store c, R<sub>1</sub>



The time at which binding takes place is called binding time. 118

Binding can take place at three times:

i) Compile time } Static

ii) Load time

iii) Run time } Dynamic

### Compile time Binding:

- \* Compiler associates physical addresses.
- \* The program is loaded into fixed memory locations.

### Load time Binding:

\* Here loader does the work of binding.

\* Based on the value in base register, the instructions are loaded into memory.

### Runtime Binding (Dynamic Relocation)

\* Here loader does the work of binding at runtime.

→ If a process is swapped out

- \* it must be swapped into the same memory location if we use compile time binding or load time binding.
- \* it can be swapped into different memory location if we use runtime binding.

### Functions & Goals of Memory Manager:

#### Functions:

i) Allocation

ii) Protection

iii) Address Translation (done by MMU)

iv) Free Space Management

v) Deallocation

## Goals:

- Effective utilization of memory
- i.e., Minimize Fragmentation
- Manage the execution of larger program in a small memory area.

## → Memory management Techniques



## Protection:

\* base register contains starting address of the process.

\* limit register contains size of process.

Working of protection mechanism :-



trap to os (trap is an interrupt sent to OS saying an error has occurred)

# 1. Contiguous Memory Management Techniques:

## a) Overlays:

Assume we have a 2-pass Assembler.

Below are sizes of each module.

pass 1: 70 kB

pass 2: 80 kB

Symbol table: 30 kB

Common Routines: 20 kB

Overlay loader: 10 kB

(or) Overlay driver

total memory: 210 kB

Assume we have only 150 kB of free memory

- Pass 1 and Pass 2 are independent modules. I.e., pass 1 can run without any use of pass 2 and vice versa
- Now 1st overlay loader loads pass 1. Later pass 1 is overlaid (replaced) by pass 2

|                 |       |
|-----------------|-------|
| Symbol table    | 30 kB |
| Common Routines | 20 kB |
| Loader          | 10 kB |
| Pass 1          |       |
| Pass 2          |       |

## Limitation:

- Program must be possible to be divided into independent modules

Eg: Program (Overlay tree)



For the above overlay tree, what is the minimum amount of memory required for executing? the above?

Sq:

During pass 1 we may need A,B,C (i.e., 19KB)

During pass 2 we may need A,B,C,D (i.e., 18K)

In that way at some pass we may need A,C,D,I

$$\text{i.e., } 10 + 8 + 8 = 26K$$

∴ size of max pass = 26K

∴ Min memory required = 26K

\*  $\rightarrow$  Minimum memory required = Max { path lengths from root to the leaves }

## b) Partitions:

Partitions is of 2 types:

i) Fixed Partitioning (MFT: Multiprogramming with Fixed no of Tasks)

ii) Variable Partitioning (MVT: Multiprogramming with Variable no of Tasks)

### i) Fixed Partitions: (a) Static Partitioning

User area: It is divided into fixed no of partitions.



Partitions may be of different sizes.

Each partition holds exactly one process.

## Address translation & Protection:

### Protection

#### address translation



## Allocation Strategies:

### First fit:

→ search from first partition and store it wherever enough sized partition is found

### Best fit:

smallest free and big enough partition is chosen

### Worst fit:

Largest free and big enough partition is chosen

### Next fit:

Next fit works like fit except that it search for free partition starts from last allocation.

## Performance issues:

### i) Internal Fragmentation:

After allocating a process to a partition we might be left out with some space in the partition. The left out space even if it can accommodate a new process shouldn't be allocated. This problem is called internal fragmentation.

## 2) External Fragmentation:

No external fragmentation

3) Degree of multiprogramming is restricted to no of partitions

4) ~~size of max.~~ Maximum process size that is runnable is limited

5) Best fit is the most suitable allocation technique. cuz it has least amount of internal fragmentation.

## (ii) Variable Partitions (or) Dynamic Partitioning

partitions are created dynamically <sup>created</sup> at runtime

At time t assume we have below requests

70k; 120k; 300k; 25k; 80k;  
P<sub>1</sub> P<sub>2</sub> P<sub>3</sub> P<sub>4</sub> P<sub>5</sub>

Here OS maintains a table to keep track of which parts of memory are available

|                       |
|-----------------------|
| P <sub>1</sub> , 70k  |
| P <sub>2</sub> , 120k |
| P <sub>3</sub> , 300k |
| P <sub>4</sub> , 25k  |
| P <sub>5</sub> , 80k  |

→ If P<sub>5</sub> finishes its execution

we have two holes of 80k and 500k adjacent to each other. Now these

two will be joined. This joining of adjacent holes is called coalescing.

Assume after some time P<sub>2</sub> and P<sub>4</sub> has finished its execution and

we have a new process request to P<sub>6</sub> of 23k.

Now we may use first fit, best fit, worst fit, next fit

|                       |
|-----------------------|
| P <sub>1</sub> , 70k  |
| P <sub>6</sub> , 23k  |
| 97k                   |
| P <sub>3</sub> , 300k |
| 25k                   |
| P <sub>5</sub> , 80k  |
| 500k                  |

first fit

|                       |
|-----------------------|
| P <sub>1</sub> , 70k  |
| P <sub>6</sub> , 120k |
| P <sub>3</sub> , 300k |
| P <sub>6</sub> , 23k  |
| 2k                    |
| P <sub>5</sub> , 80k  |
| 500k                  |

best fit

## Performance issues

1. No internal fragmentation
2. we may have External fragmentation

Eg. for previous example,

assume we have a process req of 550k, we can't place in the memory. However total available space is more than 550k. Thus we say that we may have external fragmentation.

3. Deg. of M-Pr is flexible

4. Maximum size of runnable process is also flexible.

5. Using best fit creates small free holes. These holes are wasted in the form of external fragmentation.

Hence worst fit is best allocation strategy for dynamic partitioning. (In TB it is given that first & best are better than worst fit. first & best are equally good in storage utilization, however

Overcoming the problem of External fragmentation :

we have 2 ways for overcoming external fragmentation.

i) Compaction (or) Defragmentation :

At time t, assume memory is shown below

Now if we have a process ( $P_4$ ) request with size 250k we cannot allocate it.

→ Compaction moves all the processes to one place.

→ However compaction is possible only if we use runtime binding.

→ Also compaction takes more time to finish.

we say we have EF iff  
we can't allocate memory for process even if total available space is more than the size of process

Memory after compaction is



### (b) Non-Contiguous Allocation (Paging)

If we don't use contiguous allocation we can now ~~place~~ divide the program and place it in different holes.

H4/1



~~250 & 600~~ ∴ 600 k & 300 k

H4/2



first fit



Best fit



only first fit

∴ opt(b)

H4/3

126



To get denied always we provide maximum hole possible

$$\therefore 541 \text{ k}$$

(ii) To find smallest req to be denied we must create

maximum possible holes of equal sizes

$$\text{max possible holes} = 3$$

$$\text{total free space} = 540$$

$$\text{size of each hole} = \frac{540}{3} = 180$$



$\therefore 181 \text{ k}$  can be denied in this case

H4M

$$\text{Memory size} = 2^{46} \text{ bytes}$$

$$\text{no of partitions} = \frac{2^{46}}{2^{24}} = 2^{22} \text{ partitions}$$

$$= 4M \text{ partitions}$$

$$\text{partition address (ptr)} = 22 \text{ bits}$$

(i) size of pointer to the nearest byte

nearest byte to 22 bits is 3 bytes (24 bits)

(ii) size of each entry = size(ptr) + size(PID)

$$= \cancel{24} 3 + 4 \text{ bytes}$$

$$= 7 \text{ bytes}$$

$$\text{i.e., } 7 \times 500 = 3500 \text{ bytes}$$

2. merging all pd between overflows, prioritized smallest RT

H5 LS

Load time is the time when they are loaded into memory

The load time of

$P_1, P_2, P_3, P_4, P_7$  is 0

$P_5, P_6$  must wait until 20K free hole

is freed

| $P_1$ | $P_2$ | $P_3$ | $P_4$ | $P_7$ | $P_5$ | $P_6$ |
|-------|-------|-------|-------|-------|-------|-------|
| 0     | 4     | 14    | 16    | 17    | 19    | 29    |

load time of  $P_5 = 14$

load time of  $P_6 = 29$



| RQ | $P_1, P_2, P_3, P_4, P_7, P_5, P_6$ |
|----|-------------------------------------|
|    | $t=0 \ t=14 \ t=29$                 |

Completion times can be seen in the gantt chart.

gantt chart is oldest sproc and push all inserted at

show 'P' at 2AM is null, did 'P' at 11 to 20K

did 'P' at 8AM to 20K

show 'P' at 2AM is null, did 'P' at 11 to 20K

did 'P' at 8AM to end execution to 20K

## 2. Non-Contiguous Memory Allocation Techniques

### Logical Address Space (LAS) vs Physical Address Space (PAS)

(c)

(Virtual Address Space (VAS))



- \* → In all uniprogrammed OS, the address generated by the program is physical address
- \* → Also in Compile time binding & Load time binding, the address generated ~~is~~ by program is physical address
- \* → ~~In runtime binding~~ In runtime binding, the address generated by the program is logical address which is not equal to ~~to~~ the corresponding physical address.
- \* → In M.Pr systems we allow CPU to generate logical address and MMU translates it to PA. ~~This way~~ This approach provides protection because we are not letting executing program generate PA and access memory directly.



→ Based on the technique we use the hardware of MMU changes.

The ~~hardware~~ It may be page table (or) segment table

→ If size of PA is 'n' bits, then PAS is  $2^n$  words.  
also size of MAR is n bits.

→ If size of VA is 'm' bits, then LAS (or) VAS is  $2^m$  words  
also size of address bus of CPU is m bits.

→ The address binding used here is runtime binding.

129

→ LAS =  $2^{\text{LA}}$  words

LA =  $\log_2 \text{LAS}$  bits

→ PAS =  $2^{\text{PA}}$  words

PA =  $\log_2 \text{PAS}$  bits

→ theoretically

LAS  $\Rightarrow$

LAS > PAS

LAS < PAS

LAS = PAS

→ But practically

LAS > PAS



### i) Paging: (simple paging)

Paging helps in sharing code

Assume

$$\text{LAS} = 8\text{KB}$$

$$\text{PAS} = 4\text{KB}$$

$$\text{Page size} = 1\text{KB}$$

(PS)

$$\Rightarrow \text{LA} = 13 \text{ bits}$$

$$\text{PA} = 12 \text{ bits}$$

Pagesize is defined by H/W

Paging increases context switch time

#### Step 1: Organization of LAS over PAS

→ LAS is divided into equal size units called pages

→ Generally page ~~as pages~~ page size is power of 2 (but not necessary though)

$$\rightarrow \text{no of pages} = \frac{8\text{KB}}{1\text{KB}} = 8 \text{ pages}$$

i.e.,  $\text{no of pages (N)} = \frac{\text{LAS}}{\text{PS}}$



→ Every page is given page number or page id.

$$\rightarrow \boxed{\text{Page No (P)} = \log_2 8 = \log_2 N} \quad \text{i.e., } \log_2 N = \log_2 8 = 3 \text{ bit}$$

$$N = 2^P$$

$$\rightarrow \boxed{\text{Page offset (d)} = \log_2 \text{PS} \text{ (bits)}} \quad \text{i.e., pg. off} = \log 1\text{KB} = \log 2^{10} = 10$$

$$\text{PS} = 2^d$$

→ Here we generally select req page first, then using page offset then we go to required word.

→ Logical Address format:

|                                          |        |
|------------------------------------------|--------|
| Pg-No                                    | offset |
| $\leftarrow P \rightarrow d \rightarrow$ |        |



Eg: LA = 24 bits; PS = 4 KB

then

$$\text{No of pages (N)} = \frac{2^{24}}{2^{12}} = 2^{12} \text{ pages}$$

LA =  $2^{24}$  bits  $\Rightarrow$  24 bits  $\rightarrow$  12 bits for page number + 12 bits for offset

Page number  $P = \log_2 2^{12} = 12$  bits

for off  $d = 24 - 12 = 12$  bits  $\Rightarrow$  log PS =  $2^{12}$  bits

logical address format

|                             |    |
|-----------------------------|----|
| $\leftarrow 12 \rightarrow$ | LA |
| 12   12                     |    |

Eg: A system has ~~2k~~ 2k pages with page offset of 9 bits then

|    |                             |   |
|----|-----------------------------|---|
| LA | 11                          | 9 |
|    | $\leftarrow 20 \rightarrow$ |   |

$\Rightarrow LA = 20$   
 $LAS = 2^{20} = 1\text{MB}$

$P = 11$   
 $d = 9$

$2A2 - (4) \log 2^{11} = 11$   
 $P = \log 2^{11} = 11$

Ex: System supports a LA of 32 bits if page size is 16KB  
calculate no of pages in System

Sol:

$$LAS = 2^{32}$$

$$PS = 2^{14}$$

$$\text{no of pages} = \frac{2^{32}}{2^{14}} = 2^{18} \text{ pages}$$

$$\therefore N = 2^{18}$$

Method 2:



$$\Rightarrow f = 32 - 14 = 18$$

$$\Rightarrow \text{no of pages} = N = 2^{18}$$

### Step 2: Organization of PAS:

→ PAS is divided into equal size units known as frames (Page frames)

$$\rightarrow \boxed{\text{Frame-size (FS)} = \text{Page size}}$$

→ Any page can be stored in any frame (Non contiguous allocation)

In the example,

$$PAS = 4KB \quad PS = 1KB$$

$$\text{no of frames} = \frac{PAS}{PS} \quad (M)$$

| PAS |   |
|-----|---|
| 1   | 2 |
| 3   | 4 |
| 5   | 6 |
| 7   | 8 |

BS maintains frame table to keep track of information like which frames are allocated & which are free

$$\rightarrow \boxed{\text{frame no}(f) = \log_2 M \text{ bits}}$$

$$\rightarrow \boxed{\text{frame offset} = \text{page offset} = d}$$

→ physical address format:



Eg: PAS = 16KB; d = 9 bits

then

~~PAS~~ Phy address format



$$\Rightarrow f = 5$$

$$\Rightarrow \text{no of frames} = 2^5$$

$$\Rightarrow \text{page size} = 2^9 = 512 \text{ Bytes}$$

Eg: System has a LA of 35 bits with 4k pages. If it has 256 frames then calculate size of PA

Sol: 35 bits - 4,096 bytes to access

$$LAS = 2^{35} = 32 \text{ GB}$$

$$\text{no of pages} = 4 \text{K} = 2^{12}$$

$$\Rightarrow \text{page size} = \frac{\text{LAS}}{N} = \frac{2^{35}}{2^{12}} = 2^{23} \text{ bytes}$$

$$\Rightarrow \text{page offset (d)} = 23$$

$$\text{no of frames (M)} = 256$$

(Addressable memory) frame number (f)  $\approx \log_2 256 = 8$  bytes unit



$$23 \times 1 = 29 \quad 8 \times 1 = 8$$

$$23$$

frame off no



$$\therefore PA = 31$$

$$\frac{29}{8} = \text{constant} + 1$$

Q28) System has LA of 33 bits with a page size of 8 kB. If PA is 29 bits. calculate no of frames.

Sol:



$$\text{total M pages} = (2)^{13} = 8192$$

PA



$$\Rightarrow f = 16$$

$$\Rightarrow \text{no of pages frames} = 2^{16} = 64 \text{ K frames.}$$

### Step 3: Organization of MMU

MMU in paging is page table or page map table or address translation table.

- Each process has its own page table
- Page tables are generally stored in memory.
- Page tables is organized as a set of entries and each entry is known as page table entry.
- No of ~~page table~~ entries in PT = no of pages in LAS
- PT entries contains the frame no (f) in which the page is present.

| size of each entry |                     |
|--------------------|---------------------|
| 0                  | 01(f <sub>1</sub> ) |
| 1                  | -                   |
| 2                  | 11(f <sub>3</sub> ) |
| 3                  | 10(f <sub>2</sub> ) |
| 4                  | -                   |
| 5                  | 00(f <sub>0</sub> ) |
| 6                  | -                   |
| 7                  | -                   |

for Page table (MMU)



- PT entries are generally in bytes.

$$\boxed{\text{Page table size} = \text{No of pages in LAS} \times \text{size of each entry} = N \times e}$$

## Address translation (logical to physical)

136



∴ **101 | d** corresponds to **d<sup>th</sup>** word in **f<sub>d</sub>** frame 0 of memory.

(Q29)

LA = 34 bits; PA = 27 bits;

PS = 8 kB;

UHM to noitasingap : 2972

LAS = ?      Npages = ?      PA = ?      PS = ?

M<sub>frames</sub> = ?      P = ?      f = ?

Find format of logical address and physical address

Find size of a page table (approx) - q. in slot 0 to 09

Sol:  $\frac{2^{34}}{2^{13}} = 16 \text{ GB}$

$$PA = 2^{27} = 128 \text{ MB}$$

$$N_{\text{pages}} = \frac{2^{34}}{2^{13}} = 2^{21} \text{ pages}$$

$$M_{\text{frames}} = \frac{2^{27}}{2^{13}} = 2^{14} \text{ frames}$$

$$d = \log_2 PS = \log_2 2^{13} = 13$$

$$P = \log_2 N = 21$$

|        |   |
|--------|---|
| (A) 10 | 0 |
| (B) 11 | 1 |
| (C) 01 | 2 |
| (D) 00 | 3 |
|        | 4 |

$$f = \log_2 M = 14$$

logical address format



physical address format



size of page table = no of pages (entries)  $\times$  size of each entry

$$= 2^{21} \times 16 \text{ bits}$$

$$= 2^{22} \text{ bytes} \quad (\text{size entry} = 16)$$

$$\approx 2^{22} \cdot 4 \text{ MB}$$

- (Q3) Consider a ~~real~~ system supporting a 32 bit VAS with a page size of 4KB. If PAS is 64 MB calculate the approximate PT size in bytes:

Sol:

$$PS = 2^{12}$$

$$VAS = 2^{32}$$

$$PAS = 2^{26}$$

$$N_{\text{pages}} = \frac{2^{32}}{2^{12}} = 2^{20} \text{ pages}$$

$$N_{\text{frames}} = \frac{2^{26}}{2^{12}} = 2^{14} \text{ frames}$$

$$\text{size of PT entry} = 14 \text{ bits} \approx 2 \text{ bytes}$$

$$\text{size of PT} = 2^{20} \times 2 = 2 \text{ MB}$$

H4/6

$$PS = 2^{13} \text{ bytes}$$

$$PTS = 24 \times 2^{20} \text{ bytes}$$

$\in \underline{3 \times 2^{23} \text{ bytes}}$

$$e = 24 \text{ bits} = 3 \text{ bytes}$$

$$\text{no of pages} = \frac{24 \times 2^{20}}{3} = 8 \times 2^{20} = 2^{23} \text{ pages}$$

logical address:

|    |    |
|----|----|
| 23 | 13 |
|----|----|

$$LA = 23 + 13 = 36 \text{ bits}$$

$$LAS = 2^{36} = 64 \text{ GB}$$

H4/7

$$VA = l$$

$$N = Z \Rightarrow P = \log_2 Z$$

$$M = H \Rightarrow f = \log_2 H$$

LA:

|     |     |
|-----|-----|
| $P$ | $d$ |
|-----|-----|

$$l = P + d \Rightarrow d = l - P$$

$$d = l - \log_2 Z$$

PA:

|     |     |
|-----|-----|
| $f$ | $d$ |
|-----|-----|

$$PAS = 2^{f+d} = 2^{f+l-\log_2 Z} = \cancel{2^f} \cancel{\frac{2^l}{Z}} = \frac{2^{\log_2 H + l}}{Z} = \frac{H \cdot 2^l}{Z}$$

$$\therefore d = l - \log_2 Z, \quad PAS = \frac{2^{f+l}}{Z} = \frac{H \cdot 2^l}{Z}$$

H4/8

$$LA: 32 \text{ bits} \Rightarrow LAS = 2^{32}$$

$$PTES = 4 \text{ bytes}$$

$$PTS = 1 \text{ frame} = \text{page size} \quad \text{i.e., } PTS = PS$$

$$N \times e = PS$$

$$\frac{2^{32}}{\text{PS}} \times 4 = \text{PS}$$

$$\Rightarrow \text{PS}^2 = 2^{34}$$

$$\Rightarrow \text{PS} = 2^{17} \text{ bytes} = 128 \text{ KB}$$

### Performance of Paging:

#### 1) Temporal issue (Impact on time):

Address following the form  $(m_1 m_2 \dots m_n) \text{ PA} = \text{main memory address}$



Let main memory access time be 'm'.

so address translation time =  $m$  (overhead due to page table)  
Due to this throughput decreases  
data access time =  $m$

$\therefore \boxed{\text{Effective memory access time} = 2m}$

Now we reduce this overhead 'm' using the concept of TLB.

TLB: Translation Lookaside Buffer. (Logical address cache)

TLB is similar to cache.

→ TLB contains previously resolved VA and its corresponding PA

→ Let TLB access time be 'c' ( $c \ll m$ )

→ Now if corresponding VA is found in TLB, which takes less access time, we directly use the corresponding PA to access required data in the memory.

→ The event of finding required address in TLB is called TLB hit.

Here access time =  $C + M$

→ The event of missing required address in TLB is called TLB miss.

Here access time =  $C + 2M$  (Also '2m' if a parallel search is performed on TLB and memory)

→ TLB hit ratio 'x' is ~~not~~ fraction of hits ( $0 \leq x \leq 1$ )

$\Rightarrow$  TLB miss ratio =  $1-x$

Now,

$$\begin{aligned} \text{Effective Memory access time} \\ \text{with TLB} &= x(C+m) + (1-x)(C+2m) \end{aligned}$$

→ Hit ratio =  $\frac{\text{no of hits}}{\text{no of references}}$

Eg :  $m = 100\text{ns}$

$C = 20\text{ns}$

$x = 90\%$

$$MS = \text{miss access time}$$

Without TLB :  $AT = 2(100) = 200$

$$\text{Effec. Mem. AT} = 0.9(100+20) + 0.1(20+200)$$

$$= 108 + 22$$

$$= 130\text{ns} \quad (\text{less than without TLB})$$

If 'x' were 100%, then

$$\text{Effec. Mem. AT} = 0.1(120) + 0.9(220) = 210\text{ns} \quad (\text{more than without TLB})$$

→ The TLB we used is Logical address Cache.

If we use physical address cache

After obtaining required PA we need to access memory again.

To reduce this we can use physical address cache (PAC)

Physical address cache contains both PA and corresponding data.

→ with this now accessing instruction is



→ Let hit ratio of TLB =  $x_1$ ,

hit ratio of PAC =  $x_2$

access time of TLB =  $c_1$ ,

access time of PAC =  $c_2$

Memory access time =  $m$

Effective memory access time with both TLB & PAC

$$= x_1 [c_1 + x_2 c_2 + (1-x_2)(c_2+m)] \quad \begin{array}{l} \text{(This is valid if cache} \\ \text{is not In-line)} \end{array}$$

$$+ (1-x_1) [c_1 + m + x_2 c_2 + (1-x_2)(c_2+m)]$$

for PA

\* ~~not In-line cache~~ means both TLB & PAC are independent

142

\* In-line cache means TLB & PAC are dependent

For example

i.e., whenever we remove an entry from TLB (VA & PA)

the corresponding entry in PAC (PA & instruction) ~~is~~ <sup>or data</sup> also removed.

In this case if TLB miss occurs then there is no need to refer PAC (which would surely be a miss)

Effective memory access time with TLB & PAC (In-line cache)

$$= x_1 [c_1 + x_2 c_2 + (1-x_2)(c_2+m)] + (1-x_1)[c+2m]$$

\* (By default we consider not In-line)

H4/9

$$EMAT_1 = D = x_1 k + (1-x_1)(k+2m) \quad \text{--- ①}$$

$$EMAT_2 = Z = x_2 k + (1-x_2)(k+2m) \quad \text{--- ②}$$

$$\Rightarrow ① - ② \Rightarrow D - Z = (x_1 - x_2)k + (1-x_1 - 1+x_2)(k+2m)$$

$$D - Z = (x_1 - x_2)k - (x_1 - x_2)(k+2m)$$

$$D - Z = (x_1 - x_2)[k - k - 2m]$$

H4/9

without TLB

$$EMAT = D = 2m \Rightarrow m = \frac{D}{2}$$

with TLB

$$EMAT = Z$$

$$x\left(k + \frac{D}{2}\right) + (1-x)(k+D) = Z$$

$$xk + \frac{xD}{2} + k + D - xk - xD = Z$$

$$\frac{Dx}{2} = K + D - Z$$

$$Z = \frac{2(K+D-Z)}{D}$$

(174/10)

$$LA \text{ bits} = 32 \Rightarrow LAS = 2^{32}$$

$$PS = 2^{13}$$

word size = 4 bytes

e = Page table entry size = 4 bytes (1 word)

$$N = \frac{LAS}{PS} = 2^{19} \text{ pages}$$

$$\text{size of page table} = N \times e = 2^{19} \text{ words}$$

$$\text{time req. to load page table} = 2^{19} \times 100 \text{ nsec}$$

$$\text{total process runtime} = 100 \text{ msec} = 100 \times 10^6 \text{ nsec}$$

$$\text{req. fraction} = \frac{2^{19} \times 100}{100 \times 10^6} = \frac{2^{19}}{10^6} \approx 0.5242$$

i.e., 52.42%

## 2) Spatial Issue (space optimization)

Assume

$$LA = 32 \text{ bits}; PS = 4 \text{ KB}$$

$$\Rightarrow N_{\text{pages}} = 2^{20}$$

Page table size, PTS < Npages

$$PTS = 1 \text{ M entries}$$

If  $e(\text{PTEs})$  is 4 bytes

$$\text{then } PTS = 4 \text{ MB}$$

i.e., Every process will have a page table of 4MB

\* It is not desirable to have larger page tables. So we need to reduce page table size of a process.

## Reducing PT size of a process

i) Increase page size:

$$PTS = N \times e$$

$$\boxed{PTS \propto N \propto \frac{1}{PS} \Rightarrow PTS \propto \frac{1}{PS}}$$

However increasing PS has its own draw back.

Eg: Assume we have a program of size 1026 bytes

if  $PS = 1024$  bytes

here we require 2 pages in which we use only  
2 bytes of 2<sup>nd</sup> page

i.e., Internal fragmentation = 1022 bytes

if  $PS = 2$  bytes

then we require 513 pages.

i.e., Internal fragmentation

\* Increasing page size increases internal fragmentation

So, now, we need to come up with an optimal page size

optimal page size:

Let  $VAS = S$  bytes

$PS = p$  bytes

$PTE = e$  bytes

$$\Rightarrow PTS = \frac{VAS}{PS} \times PTE$$

$$\Rightarrow \boxed{PTS = \frac{S}{P} \times e}$$

Internal fragmentation can happen only in the last page

In the worst case IF could be almost equal to page size

$$\therefore \text{Avg. IF} = P/2$$

→ page table size & Avg IF are clearly overheads

∴ optimal page size is such that total overhead is minimum

i.e.,  $\frac{Se}{P} + \frac{P}{2}$  is minimum

$$\text{let } f = \frac{Se}{P} + \frac{P}{2}$$

$$\frac{df}{dP} = -\frac{Se}{P^2} + \frac{1}{2} = 0$$

$$\Rightarrow \frac{Se}{P^2} = \frac{1}{2} \Rightarrow P = \sqrt{2Se}$$

∴ optimal page size,  $P = \sqrt{2Se}$

w.r.t overhead of

PTs & IF

H4/11

$$VAS = 2^{16}$$

$$PS = 2^{12}$$

$$N = 16 \text{ pages}$$

$$N_{\text{text}} = 8 \text{ pages}$$

$$N_{\text{data}} = 5 \text{ pages}$$

$$N_{\text{stack}} = 4 \text{ pages}$$

} 17 pages

a)  $\therefore$  does not fit

a) Assume PS = 4 bytes

$$N_{\text{pages}} = \frac{2^{16}}{2^2} = 2^{14} \text{ pages} = 16384$$

$$\text{text} = \frac{2^{15}}{2^2} = 2^{13} \text{ pages} = 8912$$

$$\text{data} = \frac{16386}{4} = 4096.5 \Rightarrow 4097 \text{ pages}$$

$$\text{Stack} = \frac{15870}{4} = 3967.5 \Rightarrow 3968 \text{ pages}$$

Thus for PS = 4 bytes, the program fits

b) Assume max possible page size =  $2^x$  bytes

$$\Rightarrow N_{\text{pages}} = \frac{2^{16}}{2^x} = 2^{16-x} \text{ pages}$$

$$\text{text} : \frac{2^{15}}{2^x} = 2^{15-x} \text{ pages}$$

$$\text{data} : \left\lceil \frac{16386}{2^x} \right\rceil \text{ pages}$$

$$\text{Stack} : \left\lceil \frac{15870}{2^x} \right\rceil \text{ pages}$$

$$\Rightarrow 2^{15-x} + \left\lceil \frac{16386}{2^x} \right\rceil + \left\lceil \frac{15870}{2^x} \right\rceil \leq 2^{16-x}$$

$$2^{15-x} + \left( 2^{14-x} + 1 \right) + \left\lceil \frac{15870}{2^x} \right\rceil \leq 2^{16-x} \quad (\forall x \geq 2)$$

$$\Rightarrow \left\lceil \frac{15870}{2^x} \right\rceil \leq \frac{2^{16} - 2^{15} - 2^{14} - 1}{2^x}$$

$$\Rightarrow \left\lceil \frac{15870}{2^x} \right\rceil \leq \frac{2^4(4-2-1)}{2^x} - 1$$

$$\left\lceil \frac{15870}{2^x} \right\rceil \leq \frac{2^{14}-2^x}{2^x}$$

$$\left\lceil \frac{15870}{2^x} \right\rceil \leq 2^{14-x} - 1$$

put  $x=5$

$$496 \leq 512 - 1 \checkmark$$

put  $x=7$

$$124 \leq 128 - 1 \checkmark$$

put  $x=8$

$$62 \leq 64 - 1 \checkmark$$

put  $x=9$

$$31 \leq 32 - 1 \checkmark$$

put  $x=10$

$$16 \leq 16 - 1 \times$$

$\therefore$  Max page size such that the program could fit is

$$2^x = 2^9 = 512 \text{ bytes.}$$

## (ii) Hashed Paging | Paging with Hashing

\* This technique helps reduce size of page table.



→ Ad LAS & says max possible process size. However we may have smaller processes. In this case size of hash page table is very less than normal page table.

For example,

$$LAS = 2^{32} = 4GB \quad (\text{max possible process size})$$

$$PS = 4KB$$

In normal page table size of page table for any process is same.

Assume we have process of 128 KB

$$\Rightarrow \text{no. of pages} = \frac{128}{4} = 32$$

∴ Hash page table will have only 32 entries

However ~~normal~~ page table has  $2^{20}$  entries.

→ However hash page table may require extra time for searching through the list. So hash page table is space efficient but not time efficient.

### (iii) Inverted Paging / Reverse paging:

(This topic will be discussed in the end)

## 2. Multi-Level Paging / Hierarchical Paging:

\* The basic objective of multi-level paging is to reduce page table size overhead.

\* Multilevel paging is paging on page table.

\* Paging generally involves 3 steps:

i) divide the address space in pages

ii) Store the pages into frames of memory

iii) Access the pages of the address space through page table.

For example

Consider

$$LA = 32 \text{ bits}, PS = 4 \text{ KB}$$

$$\text{Now } N_{\text{pages}} = 2^{20} = 1 \text{ M pages}$$

without multilevel paging,  
the page table must be  
stored contiguously if it  
spans across several pages

Assume size of page table entry  $\approx 4$  bytes

$$\text{PTS} \approx 4 \text{ MB}$$

Now the page table itself is more than size of one page

$$\text{PT must be stored in } \frac{4 \text{ MB}}{4 \text{ KB}} = 1 \text{ K pages}$$

Now we page the page table using another page table (Outer page table) pointing to each slot of page table (Inner page table)

→ no of pages in inner

$$\text{Size of outer page table} = 1 \text{ K} \times 4 \text{ bytes}$$

$$= 4 \text{ KB}$$

i.e., 1 Page exactly



→ Now in the above scenario, assume we have a process of size 20 kB.

$$\text{i.e., } \frac{20 \text{ kB}}{4 \text{ KB}} = 5 \text{ pages}$$

~~normal page~~

Assume these 5 pages are

(2 code + 2 data + 1 stack)

→ Assume these 3 parts are present in 3 different location such that code is in 2 contiguous pages, data is in 2 contiguous pages and stack in one page such that we need 3 pages of page table. (As shown in the figure)

→ Now in the case of multilevel paging we need outer page table, and the 3 pages of inner page table for the execution of the processes. (i.e., (4KB), 1) (3KB each)

→ If we don't use multilevel paging then we must have

all 1M entries of page table (i.e., 4MB) in main memory.

→ Thus multilevel paging reduces the overhead of page table size that is to be stored in main memory.

### Addressing:



∴ Final logical address format for 2 level paging is



VA = 32 bits PS = 4 KB

152



→ Here 'f' is frame of main memory in which required page of inner page table is located.

→ 'f' is frame of main memory in which required page of process is stored.

→ For there offset 'd' is used to obtain required word.

H4/12



$$\Rightarrow d = 28 - 16 = 12$$

(i)  $\Rightarrow pg\ size = 2^{12} = 4\ KB$

$$N_{pages} = \frac{2^{28}}{2^{12}} = 2^{16}\ pages$$

in this problem  
page size of outer table & inner table are different

(ii) for translation we need outer pagetable & one page of inner page table in memory

i.e., page =  $9 \times 4\ KB = 8\ KB$

$$\begin{aligned}
 2^9 * 4B + 2^7 * 4B &= 2^{11} + 2^9 \\
 &= 2^9(5) \\
 &= 512(5) \\
 &= 2560 \text{ bytes} \\
 &= 2.5 \text{ KB}
 \end{aligned}$$

18/08/20

- \* If we use 2-level paging, we require 3 memory accesses in total.

Memory

$$\therefore \text{Effective memory access time} = 3m$$

without TLB

- \* Size for n-level paging without TLB

$$\text{EMAT} = (n+1)m$$

(H4/13)

$$VA = 46 \text{ bits} \Rightarrow VAS = 2^{46} = 64 \text{ TB}$$

$$\text{PTES} = 4 \text{ bytes} = 2^2$$

$$\text{let } PS = 2^x$$

$$N_{\text{pages}} = \frac{2^{46}}{2^x} = 2^{46-x}$$

$$\text{Size of 3rd level page table} = 2^{46-x} \times 2^2 = 2^{48-x} \text{ bytes}$$

$$\text{no of pages in 3rd level page table} = \frac{2^{48-x}}{2^x} = 2^{48-2x}$$

~~size~~ size of 2nd level page table =  $2^{48-2x} \times 2^2 = 2^{50-2x}$

$$\text{no of pages in 2nd lvl PT} = \frac{2^{50-2x}}{2^x} = 2^{50-3x}$$

$$\text{Size of 1st lvl page table} = 2^{50-3x} \times 2^2 = 2^{52-3x}$$

154

Given that

$$\text{Size of 1st lvl page table} = \text{size of frame}$$

$$\text{i.e., } 2^{52-3x} = 2^x$$

$$\Rightarrow 52 - 3x = x$$

$$\therefore x = 13$$

$$\therefore \text{page size} = 2^{13} = 8\text{KB}$$

~~logical~~ Page table entry size is same at all levels of page tables.

~~8 FT~~  $\therefore$  no of PT entries in one page of page table

$$= \frac{2^{13}}{4} = 2^{11}$$

$\therefore$  logical address format



Method 2:

Let  $2^x$  be req. PS

size of outer (1st lvl) P.T =  $2^x$

$$\text{no of entries} \leq \frac{2^x}{\text{PTEs}} = \frac{2^x}{4} = 2^{x-2}$$

LA:



Since after indexing to outer P.T, we go to a page of 2nd lvl

page P.T

no of entries in ~~second~~ a page 2nd lvl PT =  $2^{x-2}$

$\therefore$  LA



Since we get



$$\therefore \cancel{x} 3(x-2) + x = 46$$

$$\Rightarrow 4x = 52 \Rightarrow x = 13$$

$$\therefore \text{Page size} = 2^x = 2^{13} = 8 \text{ KB}$$

From the above question we can come up with a general formula for P.T with n level paging

Assume VAS =  $2^s$  bytes ; PS =  $2^x$  bytes ; PTE =  $2^c$  bytes

|                                                 |                                                                                    |
|-------------------------------------------------|------------------------------------------------------------------------------------|
| $\text{Size of OPT}$<br>$n\text{-level paging}$ | $= \left[ 2^{s-nx+nc} \right] \text{ bytes}$<br>(with uniform pages at all levels) |
|-------------------------------------------------|------------------------------------------------------------------------------------|

## II. Segmentation

Paging does not preserve users view of program

Program (segments)



- At user view program is divided into segments (each segment may be of diff sizes).
- Each segment has segment id. (assigned by loader)
- Different segments are stored non-contiguously. However a segment itself is stored totally at one location (contiguously).
- Here MMU is segment table.

|   | B    | L    |
|---|------|------|
| 0 | 500  | 300  |
| 1 | 1000 | 1500 |
| 2 | 2500 | 500  |
| 3 | 200  | 250  |
| 4 | 800  | 150  |
| 5 | 4000 | 2000 |

oldst B+Base address of segment  
L - length of segment

- Every process has its own segment table



Eg: For above example

$$LA(3,50) \rightarrow PA(200+50) = 250$$

Eg:  $LA(2,755)$   
     $\hookrightarrow (755 > 500)$   
    ∴ trap

## Performance of Segmentation

### (i) Temporal issue (impact on time):

- \* Exactly same as paging

i.e., ~~2M~~  
 $EMAT = 2M$  without TLB  
Silly we can add TLB & PAC (same equation as in the case of paging)

### (ii) Spatial issue

- \* Large segment table is undesirable
- \* One of ways to reduce size of segment table is to use paging on segment table.

#### Note:

- \* In case of segmentation, we don't divide PAS.

∴ PAS is organized with variable partitioning (MVT)

we know that variable partitioning suffers from external fragmentation

→ To overcome this external fragmentation we may use compaction or segmented-paging

reduces size of segment table

to overcomes the problem of EF

For memory allocation in segmentation also we can use first fit, best fit, worst fit & next fit

Note:

|              | IF | EF |
|--------------|----|----|
| Paging       | ✓  | ✗  |
| Segmentation | ✗  | ✓  |

IF - Internal Fragmentation

EF - External Fragmentation

(Q31)

Consider below segment table

| Segment | Base | Length |
|---------|------|--------|
| 0       | 219  | 600    |
| 1       | 2300 | 14     |
| 2       | 90   | 100    |
| 3       | 1327 | 580    |
| 4       | 1452 | 96     |

what the physical addresses for following logical addresses

a) 0,430

$$219 + (430 - 1) = 219 + 430 = 649$$

b) 1,10

$$2300 + 10 = 2310$$

c) 2,500

$$500 > 100$$

∴ trap

d) 3,400

$$1327 + 400 = 1727$$

e) 4,112

$$112 > 96$$

∴ trap

## Segmented - Paging:

Assume LAA  $\leq 34$  bits

$$\langle s, d \rangle = \langle 18, 16 \rangle \text{ bits}$$

$$\Rightarrow \text{Maximum segment size} = 2^{18} = 64 \text{ KB}$$

$$\text{Total no of such segments possible} = 2^{18} = 256 \text{ K segments.}$$

→ To avoid EF we apply paging on segment.

- i) Divide address space (segment) into pages
- ii) Store these pages in frames.
- iii) Access the frames through page table of a segment

Let page size = 1KB

$$\text{no of pages in a segment} = \frac{64 \text{ KB}}{1 \text{ KB}} = 64 \text{ pages}$$

→ Now PAS contains frames but not segments.

→ Now the page table of segment contains an entry for each page of the segment

→ Now every segment must be accessed through its corresponding page table.



no of Page tables = no of segments.

Thus we have 256k page tables.

Structure of segment table:



logical address format



we have applied paging on segment  
i.e., on 'd'



$2^{P_i}$  - no of entries in each page table

$2^{d_i}$  - size of page

d<sub>i</sub> say offset within the page

### Address Translation



\* Here we need to access segment table, then corresponding page table then the required data.

i.e., 3 memory accesses

$$\therefore \text{EMAT} = 3M$$

Like in the case of paging we can use TLB to reduce EMAT.

Here also the TLB contains VA & corresponding PA.

114/14 VAS = PAS =  $2^{16}$  bytes = 64 kB

no of segments in VAS = 8

size of each segment =  $\frac{2^{16}}{2^3} = 2^{13} = 8 \text{ kB}$

VA:

|   |    |
|---|----|
| S | d  |
| 3 | 13 |

Pg table entry size = 2 bytes

Let page size =  $2^x$

no of pages in a segment =  $\frac{2^{13}}{2^x} = 2^{13-x}$

i.e., no of entries in a PT =  $2^{13-x}$

size of PT =  $2^{13-x} * 2 = 2^{14-x}$

given size of PT = 1 frame size

i.e.,  $2^{14-x} = 2^x \Rightarrow x = 7$

$\therefore$  page size =  $2^x = 2^7$   
= 128 bytes

Assume segment table entry size = 4 bytes

size of segment table =  $8 \times 4 = 32 \text{ bytes}$

## Virtual Memory (VM):

VM gives an illusion to the programmer that a huge amount of memory is available for executing larger programs in small memory area.

- It is a technique that supports the goal of managing execution of larger programs in small memory area.
- CPU assumes that it can execute a program as big as VAS.

→ However PAS may not be as big as VAS.

So we need to store the program on disk.

From disk we load required pages into memory.

→ Sometimes required page may not be found in main memory. In that case required page is loaded into memory from the disk. This is known as demand paging.

### Demand paging:

Loading page from disk to memory at runtime based on demand is known as demand paging.

i.e., Virtual memory is implemented through demand paging.

### VM implementation through Demand Paging:

Let  $VAS = 8 \text{ kB}$ ;  $PAS = 4 \text{ kB}$ ;  $PS = 1 \text{ kB}$

$\Rightarrow N = 8$  ;  $M = 4$  ;  $d = 10$

#### Sharing:

- System libraries are shared by mapping system libraries into VAs.
- Multiple processes can share mem for process communication.

Shared memory region is considered to be part of both the processes' VAS



→ Whether the req page is present in the memory or not is known by valid/invalid (V/I) bit which is in the page table.

→ If required page is found in the memory then it is called page hit.

→ If required page is not found in the memory then it is called page miss or page fault.

|    |   |
|----|---|
| 10 | 1 |
| -  | 0 |
| -  | 0 |
| 01 | 1 |
| 0- | 0 |
| 00 | 1 |
| -  | 0 |
| 11 | 1 |

Page Table

## Page fault service:

1. Process that caused page fault gets blocked
2. Virtual memory manager now starts running (in kernel mode).
- Virtual memory manager tells disk manager about which page is missing.
3. Disk Managers communicates with device driver.
4. Device driver reads the page (through disk mechanism) and transfers required page to OS using DMA.
5. OS will try to load the page in one of the frames of user process.
6. If there is no empty frame to load new page then virtual memory manager has to initiate page replacement through which one of the pages in memory is replaced with required page. Here we have 2 cases:
  - (i) page that is selected for replacement is not modified then we can directly overwrite it with new page. (1 disk operation)
  - (ii) If it is modified (dirty page) then the modified page has to be saved in the disk. Later the new page is loaded (i.e., 2 disk operations)
7. After the new page is loaded we need to update page table.  
(Eg: changing V/I bit etc.)
8. Unblock the user process

Now when the process is scheduled again, it generates the same VA that it had generated previously and continues its execution.

This VM implementation through demand paging.

→ page fault service time is the time taken to service a page fault.

Here we need to reduce

This service time reduces throughput.  
So we need to reduce page fault rate.

If there is a TLB hit, then there is no page fault

→ Demand paging is of 2 types:



(By default we consider pure demand paging while solving questions)

Q32 The size of virtual memory supported by is eventually limited by size

Ans:

size of disk (Secondary Storage)

Q33 What

Note:

for real time implementation of virtual memory

$$PAS \leq VAS \leq \text{Disk AS}$$

## Performance of Virtual Memory

### (i) Temporal Issue: (time)

Assume

main memory access time = 'm'

page fault service time = 's' ( $s \ggg m$ )

page fault rate = 'p' ( $0 \leq p < 1$ )

page hit rate =  $1-p$

EMAT with

Demand paging

$$= (1-p)m + p*s$$

( $\because s \ggg m$ )

we don't take ~~s+m~~

In general pg. fault service time may range in milliseconds.

H4/16

$$PFST = 10ms$$

$$Mem. AT = 1\mu s$$

$$\text{page hit rate, } P = \frac{99.99}{100} = 0.9999$$

$$\text{EMAT} = \frac{99.99}{100}(1) + \frac{0.01(10 \times 10^3)}{100} \mu s$$

$$= \frac{99.99}{100} +$$

$$= \frac{99.99}{100} + 1$$

$$= 1.9999 \mu s \approx 2 \mu s$$

H4/17

$$\text{page fault rate} = \frac{1}{k}$$

$$\text{page hit rate} = \frac{k-1}{k}$$

$$\begin{aligned} \text{Effective instruction time } \cancel{\text{EMAT}} &= \frac{k-1}{k} i + \frac{1}{k} (i+j) = \frac{(k-1)i}{k} + \frac{i}{k} + \frac{j}{k} \\ &= i + \frac{j}{k} \end{aligned}$$

H4/18

PFST = 8 ms (empty frame or unmodified frame)

= 20 ms (modified frame)

Mem. AT = 100 ns

• page to be replaced is modified 70% of the time.

required ~~EMAT~~ EMAT  $\leq 2000 \text{ ns}$

• let p be page fault rate

$$\text{EMAT} = (1-p)(M) + p(0.7(20) + 0.3(8)) \quad \cancel{100}$$

$$= (1-p)(100) + p(14 + 2.4) =$$

$$= 100 - 100p + 16.4p$$

$$\Rightarrow 100 - 83.6p \leq 2000 \text{ ns}$$

$$\text{EMAT} = (1-p)(100) + p(0.7(20 \times 10^6) + 0.3(8 \times 10^6))$$

$$= 100 - 100p + p(14 \times 10^6 + 2.4 \times 10^6)$$

$$= 100 - 100p + 16.4p \times 10^6$$

$$\Rightarrow (164 \times 10^5 - 100)p \leq \cancel{1000} 2000 - 100$$

$$\Rightarrow (164000 - 1)(100p) \leq 1900$$

$$P \leq \frac{19}{163999}$$

max acceptable page fault rate =  $\frac{19}{163999}$

H4/19

Mem. AT = M (page hit)

= D (page fault)

for some processes

EMAT = X units.

page fault rate, P=?

$$X = (1-P)M + P(D)$$

$$X = M - MP + PD$$

$$\Rightarrow P(D-M) = X - M$$

$$P = \frac{X-M}{D-M}$$

Note:

Considering TLB & page fault rate the effective memory access time is

$$EMAT = x(c+m) + (1-x) \left[ \frac{(1-P)*m + P*s}{m+c} \right] \xrightarrow{\text{negligible}}$$

x - hit ratio of TLB

p - page fault rate

(ii) page replacement policies

Page reference string (More reference string)

\* set of successively unique pages referred in the given list of virtual addresses.

Eg: VAS:  $\langle 702, 755, 864, 084, 122, 560, 768, 934, 015, 125, 654 \rangle$

let page size = 100

Also all the above addresses are in decimal?

page  $P_0$  contains addresses 0 to 99

$P_1 \dots \dots \dots 100 \text{ to } 199$

$$\text{page number} = \left\lfloor \frac{VA}{PS} \right\rfloor$$

$$\text{page offset} = VA \% PS$$

∴ Reference string for above VA's is

$\langle 7, 8, 0, 1, 5, 7, 9, 0, 1, 6 \rangle$

(successive page numbers are unique)

Every reference string has 2 parameters:

i) length (i.e., 10 in above example)

ii) no of unique pages (~~no.~~ (n)) (i.e., 7 in above example)

### Frame Allocation Policies

Assume we have

'n' no of process ( $P_1, \dots, P_n$ ) -

demand of each process  $P_i$  be  $S_i$  no of frames

total demand for frames  $D = \sum_{i=1}^n S_i$

total frame available be:  $M$  ( $D > M$ )

frames allocated to process  $P_i$  be  $a_i$

Eg: Let

$$n=5, M=50$$

|       | $S_i$ | $a_i(1)$ | $a_i(2)$ |
|-------|-------|----------|----------|
| $P_1$ | 5     | 10       | 3        |
| $P_2$ | 20    | 10       | 10       |
| $P_3$ | 45    | 10       | 22       |
| $P_4$ | 18    | 10       | 9        |
| $P_5$ | 12    | 10       | 6        |

$$D: 100$$

$$\left[ \frac{AV}{29} \right] = 3.46 \text{ min page}$$

### ① Equal allocation policy

Every process gets equal no of frame

$$\therefore \frac{50}{5} = 10 \text{ frames}$$

### ② Proportional allocation

no of frames allocated is proportional to size

i.e.,  $\frac{s_i}{D} \times M$  no of frames for process  $P_i$

### ③ ~~x~~ 100% - rule

Every process is given  $x\%$  of the frames it has demanded.

This  $x$  is chosen based on main memory size and other factors.

## Page Replacement Techniques (pure demand paging by default)

Consider for a process

reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1

length of reference string  $= 20$

no of unique pages,  $n = 6$

### I) FIFO:

Assume no of frames allocate for the process = 3  
based on the time we loaded we replace

|   |   |   |   |   |
|---|---|---|---|---|
| 7 | 2 | 4 | 0 | 7 |
| 0 | 3 | 2 | 1 | 0 |
| X | 0 | 3 | 2 | 1 |

} frames

from above figure we have 15 page faults

$$\therefore \text{page fault rate} = \frac{15}{20} = 0.75$$

If we increase no of frame allocated, page faults decreases

If no of frames allocated = 4

|       |
|-------|
| X 8 2 |
| X 4 7 |
| X 0   |
| Z 8 1 |

: 10 page faults

$$\text{page fault rate} = \frac{10}{20} = 0.5$$

Ref-String II : {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5}

FIFO

no of frame allocate = 3

|       |
|-------|
| X 4 5 |
| Z X 3 |
| Z X 4 |

: 9 page faults

$$\text{Pg fault rate} = \frac{9}{12} = 0.75$$

If no of frames allocated = 4

|       |
|-------|
| X 8 4 |
| Z X 5 |
| Z 2   |
| A 3   |

|       |
|-------|
| X 8 4 |
| Z X 5 |
| Z 2   |

: page faults = ~~10~~ 8

$$\text{page fault rate} = \frac{8}{12} = 0.67$$

In this example we got more page faults with more no of frames which is an anomaly.

This is known as Belady's anomaly

→ with increase in no of frames to the process, the page fault rate increases

→ Belady's Anomaly is observed only in FIFO & LIFO based replacement.

10/08/20

## 2) Optimal replacement:

Optimal replacement says that replace that page which will not be used for longest duration of time in future references.

set string:  $\langle 7, 0, 1, 2, 0, 3, 0, 4, 1, 3, 0, 3, 2, 1, 1, 2, 0, 1, 1, 7, 0, 1 \rangle$

3 frames:

|   |   |   |
|---|---|---|
| X | 2 | 7 |
| X | 4 | 0 |
| X | 3 | 1 |

$\therefore 9$  page faults (optimal writes to something alt)

4 frames:

|   |   |   |
|---|---|---|
| X | 3 | 1 |
| O |   |   |
| X | 4 |   |
| Z | 7 |   |

$\therefore 8$  page faults

Ref string:  $\langle 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 \rangle$

3 frames:



∴ page faults = 7

4 frames:



∴ page faults = 6

→ Optimal replacement ~~gener~~ has the least page fault rate!

Drawback:

- Optimal page replacement is not practically implementable since we can never know the future references.
- However this can be used as benchmark to measure the performance of other algorithms.

### 3) Least Recently Used (LRU):

→ LRU replaces that pt page which hasn't been referred for the longest duration of time in the past.

∴ selection criteria: Time of reference

Ref string: < 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 >

3 frames:

|   |   |   |   |   |
|---|---|---|---|---|
| X | Z | 4 | 0 | 1 |
| X | Z | 0 | 4 | 1 |
| X | Z | 2 | 7 | 1 |

∴ no of page faults = 12

(0 most frequent first) (least frequent first)  
4 frames:

|   |   |   |
|---|---|---|
| X | Z | 7 |
| O |   |   |
| X | A | 1 |
| 2 |   |   |

∴ no of page faults = 8

#### 4) Most Recently Used (MRU)

\* Most recently used page is replaced

Ref string: < 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 >

3 frames:

|   |   |
|---|---|
| X | O |
| O | Z |
| X | Z |

∴ no of page faults = 16

4 frames:

|   |
|---|
| 7 |
| O |
| Z |
| O |

∴ no of page faults = 12

### 5) Counting Algorithms



- \* Frequency means no of times the page is ~~refered~~ referred since it has been brought into memory. (i.e., if we load a page that has been loaded previously we start the count from 0)
- \* FIFO may be used ~~to~~ to resolve a conflict

3 frames:

Reference string:  $\langle 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 \rangle$



3 frames

LFU:



no of page faults = 13

MFU:



∴ no of page faults = 15

4 frames :

LFU :



Queue

1 7 0 X 2 3 4 1

no of page faults = 9

4FU :



no of page faults = 9

12 is Ans

Note:

→ Practically LRU has been found to give less page faults  
(i.e., best performance & close to optimal)

Implementation of LRU:

\* Here we need to monitor time of reference (TOR)

There are two methods to monitor time of reference

(i) Counter method :

we maintain a counter register which is incremented for every memory reference.

every clock pulse. whenever we refer a page

we assign the value of counter as time of reference for that page. Initial value of counter is 0.

### limitation :

\* An n-bit counter can count only  $2^n$  values after which ~~counter~~ count starts from 0. If count goes to '0' then the algorithm fails.

### Lru Stack method

→ Here we use a stack into which we push every new page we replace.

→ At the time of replacement, we need to replace ~~with~~ the least recently used page. This page is present at the bottom of the stack. Thus we need to pop out all the elements and push them again (except bottom element) and finally push the latest page onto top of the stack.

→ If a page hit occurs, then the referred place might be present somewhere in the middle of the stack. This element has to be onto the top of the stack for which we need to perform push & pop operations again.

### Advantage:

\* Unlike counter method, it works good in any case.

### Disadvantage:

\* push & pop operation requires more time.

### LRU Approximations

\* LRU approximations are the class of replacement techniques that pretend to work like LRU. i.e., they approximate to the behaviour of LRU.

we have 4 types of LRU approximations

i) Reference bit (R):

- $R \leftarrow$
- 0 : page not referred so far during present epoch
  - 1 : page is referred ~~so far~~ atleast once so far during present epoch.

Consider below page table

| Pg No | frame No. | V/I | TOL | R |
|-------|-----------|-----|-----|---|
| 0     | a         | 1   | 2   | 1 |
| 1     | b         | 1   | 4   | 0 |
| 2     | -         | 0   | -   | - |
| 3     | c         | 1   | 0   | 1 |
| 4     | d         | 1   | 3   | 1 |
| 5     | e         | 1   | 1   | 0 |

while implementing this algorithm we may not have TOL column

(TOL - Time of loading)

(Here in the Pg table Pg P<sub>1</sub> & P<sub>5</sub> are not referred in the present epoch)

\* Time is divided into equal interval and each interval is called an epoch.

\* When an epoch is finished all the 'R' values are cleared.

Algorithm:

→ If page hit occurs, then set 'R' of the corresponding page table entry.

→ If page fault occurs, start searching for ~~R=0~~ from the first entry for R=0 and replace the first encountered entry's corresponding page.

→ If 'R' value is one for all the pages then the algorithm fails.

\* To overcome this limitation we use next LRU approximation which is additional reference bits.

(ii) ~~Additional~~

(ii) Additional Reference bits:

Here we have more than one reference bits, say 'n', indicating past 'n' references ~~is~~ of past 'n' epochs

For example take  $n=8$

$$P_i : \frac{R_1 R_2 \dots R_8}{10011111}$$

$$P_j : 01101111$$

$$R_k : 10111111$$

\* Least significant bit corresponds to current epoch

\* Most significant bit corresponds to most previous epoch

For above example if a page fault occurs  $P_j$  is replaced.

( $\because$  the 4 least significant of the 3 processes are '1')

However  $R_4$  of  $P_j$  is 0 and  $R_4$  of  $P_i$  &  $P_k$  are 1

$\therefore$  Replace  $P_j$ )

\* After every epoch we need to left shift the reference bits by one bit.

\* However this method also has a chance to fail when all the reference bits of all the processes are 1's. Anyway the probability for this is very less.

(iii) Second Chance (Clock Algorithm) (Read from TB)

Selection criteria: (Time of Loading & Reference bit)

→ The page with R value as 0 and least TOL (early loaded page among R=0 pages) is replaced.

The pages whose TOL is less than replaced page, their R value is set to '0'.

Also the arrival time of those pages is set to current time

→ when all R values are 1's, the earliest loaded page is replaced.

i.e., The algorithm, when all R values are 1, degenerates to FIFO

\* Hence this algorithm (FIFO based) also suffers from Belady's anomaly.

(iv) Enhanced Second chance / Not recently used (NRU): (Read from TB)

Selection criteria: (Reference bit & modified bit)  
 $\begin{matrix} \text{(R)} \\ \text{(M)} \end{matrix}$

Modified bit  $\begin{cases} 0 : \text{page is not modified since its loading} \\ 1 : \text{page is modified since the time of its loading} \end{cases}$   
 Dirty bit

this means the page  
~~was~~ is not  
 referred in this  
 epoch but is  
 referred, and  
 modified in some  
 other previous epoch

| R | M       |
|---|---------|
| 0 | 0 (I)   |
| 0 | 1 (II)  |
| 1 | 0 (III) |
| 1 | 1 (IV)  |

\* If a page

### Page Buffering Algorithms:

#### (i) Maintaining free frame pool:

i) choose victim; ii) read page into one of free frame from pool; iii) victim is written out later. (This will help restart process quicker). victim is added to free frame pool.

we may even maintain list of modified pages and write them on disk and reset their modified bit. This ↑ probability of find clean frame

we can even keep track of page present in frames of free frame pool. So if same page is req, it doesn't need to be read in. This will help in ↓ penalty for replacing page.

then page with  $RM = 00$  is replaced

if no such page  $RM = 01$  is replaced

if no such page  $RM = 10$  is replaced

if no such page  $RM = 11$  is replaced

\* The above search starts from the first entry.

H4/20

length of ref string: L

no of unique pages: k

allocated frames: Z

\* In the worst case every reference can be a page fault

i.e., L

\* Since it is pure demand page we must have atleast k page faults

~~In the best case we can have all page hits after loading the servicing the last page fault~~

Eg: Let  $Z=3$ ,  $k=5$

Ref string: 1, 2, 3, 4, 5, #, 1, 5



In the best case the above could happen.

H4/21

Ref ~~string~~:  $P_1, P_2, P_3, P_4 \dots P_{n-1}, P_n, P_n, P_{n-1}, \dots, P_3, P_2, P_1$

After referring till  $P_n$  we will have

↳ (This is not reference string since we have  $P_n$  twice successively)

$P_n, P_{n-1}, P_{n-2} \dots P_{n-(k-1)}$  in the frames

$\therefore$  The next k references ( $P_n, P_{n-1}, P_{n-2} \dots P_{n-(k-1)}$ ) will be hits

total references =  $2n$

no of hits =  $k$

$\therefore$  no of page faults =  $2n - k$

H4/22

Ref string: ~~1, 2, 4, 5, 1, 2, 2,~~  
1, 2, 4, 5, 1, 2, 3.

$\therefore$  length of ref string = 7

no of frames allocated = 1

$\therefore$  7 page faults

H4/23

$$VAS = PAS = 2^{16} \text{ bytes} = 64 \text{ KB}$$

$$PS = 512 \text{ bytes} = 2^9 \text{ bytes}$$

$$PTEs = 4 \text{ bytes} = 32 \text{ bits}$$

$$\text{no of frames} = \frac{2^{16}}{2^9} = 2^7$$

no of bits to store frame number = 7

$$V/I = 1$$

$$R = 1$$

$$\text{Modified} = 1$$

$$\text{Page Protection} = 3$$

{ 13 bits}

$$\therefore 32 - 13 = 19 \text{ bits}$$

19 bits can be used to store other information

$$\text{no of pages} = \frac{2^{16}}{2^9} = 2^7$$

$$\text{size of page table} = 2^7 \times 4 = 2^9 = 512 \text{ bytes}$$

## Thrashing:

\* Excessive / high paging activity

+  
act of servicing page fault

i.e., high page fault rate

\* Thrashing is undesirable

\* Thrashing cause more processes to block

## Cause of Thrashing

→ Lack of frames (i.e., Lack of memory) } Primary reasons for

→ High degree of multiprogramming } thrashing

(we also have some secondary reasons which we will explore later)



### Thrashing Control strategies:



\* Prevention is done by controlling degree of multiprogramming. (controlled by long term scheduler)

#### Detect & Recover

Detection : \* High degree M·Pr & low CPU utilization  
 $\Rightarrow$  thrashing  
 \* High paging (high disk utilization)

The above 3 conditions confirms the presence of thrashing

Recovery: Decrease the degree of M·Pr

(This done by suspending the processes)

### Secondary reasons for thrashing:

→ Page replacement policy

→ page size:

page size must be large to reduce page faults.

It is because with large PS we have less pages.

(Large PS  $\Rightarrow$  less pages  $\Rightarrow$  less page faults)

→ Programming techniques & Data structures used by programmers. This fact is explained through below case studies.

### Case study I:

integer  $A[1..128][1..128]$ ;

1) for  $i \leftarrow 1$  to 128

for  $j \leftarrow 1$  to 128

$A[j][i] = 1$ ;

2) for  $i \leftarrow 1$  to 128

for  $j \leftarrow 1$  to 128

$A[i][j] = 1$ ;

Assume we ~~are~~ store the array in row major order

also let page size = 128 words

and each element of array is one word

$\Rightarrow$  each row is stored in one page exactly

Assume we are using pure demand paging

and replacement technique FIFO.

$\rightarrow$  Assume the OS has allocated 127

frames to the process.



page faults with 1st program

$\rightarrow A[1][1], A[1][2], \dots, A[1][128], A[2][1], \dots, A[2][128], \dots, A[128][1], \dots, A[128][128]$

Here every reference causes a page fault

i.e., 128 page faults.

Similarly for 2nd column we get 128 page faults

In this way

$$\text{total no of page faults} = 128 \times 128$$

Page faults for 2<sup>nd</sup> program :

$$\rightarrow A[1][1], A[1][2] \dots A[1][128]$$

page fault      page hit

$\therefore$  1<sup>st</sup> row causes one page fault

∴ we get 1 page fault for 2<sup>nd</sup> row

$\therefore$  total p no of page faults = 128

$\rightarrow \therefore$  2<sup>nd</sup> program is 128 times faster than that of 1<sup>st</sup> program

Note:

- \* If we use row major order to store 2d array then it is better to access in that order and silly for column major order.

This is known as principle of locality

- \*  $\therefore$  A programmer must use data structures that ensure principle of locality.

### Case Study - II

- \* If we consider array & linked list,

array allocation is contiguous ~~and~~ prep

linked list allocation is non-contiguous.

$\therefore$  linked list may cause more ~~at~~ page faults.

- \* The elements of array obeys principle of locality.

### Case Study III

#### Linear Search vs Binary Search

- In demand paging, b binary search causes more page faults because we don't access elements contiguously.  
whereas in linear search ~~we~~ we search in ~~linear~~ contiguous way causing less page faults and obeying principle of locality.

### Case Study - IV

#### Stack:

Stack operation which are push & pop are performed at the same end. ~~and~~   
 $\therefore$  less page faults

#### Hashing :

Hashing distributes key across memory and is more likely to cause more page faults.

#### Note:

- In general a Data Structure or a programming technique in demand paging is said to be good iff it obeys the concept of locality of reference.

## Working Set Strategy

III. page 83

- \* The objective of working set strategy is to control thrashing and utilize memory ~~more~~ effectively.
- \* Working set strategy is based on principle of locality of reference.

Consider below program

```
main() // 10KB          f() // 5KB           g() // 30KB       h() // 3KB
{ : { : { : { :
    f();      g();      h();      printf();
    ;         ;         ;         ;
} } } }
```

Total progr. program size = 73 KB

Assume page size = 1 KB

Assume the process has been allocated 35 frames

\* This is static allocation.

→ As the process executes it changes from one locality to other locality (i.e., from one function to other function). When process is in main 10 frames would be enough. When it is executing in f(), 5 frames would be enough.

→ Thus allocating frames based on the locality in which the process is executing would reduce thrashing and utilize memory more effectively.

→ To meet this requirement we use dynamic frame allocation technique.

- \* The working set strategy's working is based on the locality of reference.

It estimates the locality in which the process is running and no of frames are allocated accordingly.

- \* Estimating the size of locality:

\* Consider the reference string for ~~between~~ process  $P_i$ :

$$P_i: \langle 7, 0, 1, 2, 3, 0, 2, 16, 18, 22, 18, 16, 35, 38, 39, 34, 35, 37, 38, 36, 34, 36, 37 \rangle$$

$\Delta = 10$

Assume these references are generated in time  $t$ . After ~~At any given~~ time  $t$ , we need to know the locality in which the process is executing and size of locality.

\* observing the patterns we can know localities & size of localities.

\* to estimate the size of locality we define a parameter called working set window (wsww) at time  $t$  for a process

$$\text{Working set window}_i^t = \left\{ \begin{array}{l} \text{set of unique pages referred in the past} \\ (\text{wsww}) \quad \Delta \text{ references} \end{array} \right\}$$

where  $\Delta$  is an integer.

For example take  $\Delta = 10$

$$\Rightarrow \text{wsww}_i^t = \{ 34, 35, 36, 37, 38, 39 \}$$

$$\Rightarrow \text{wsww-size}_i^t = 6$$

size of locality is 6 pages

$\therefore 6$  is the demand

### Framework:

→ Let 'n' be no. of processes

→  $s_i$ : Demand for frames by process  $P_i$  (Estimated as shown  
(i.e., wsw size $_i$ ) in the previously  
example)

$$\rightarrow \text{total demand, } D = \sum_{i=1}^n s_i = \sum_{i=1}^n \text{WSWS}_i$$

→ Available frames = M

#### Case (i):

$$D \geq M$$

i.e., Balanced & no thrashing

#### Case (ii):

$$D < M$$

i.e., No thrashing & we can further increase deg of M.pr

#### Case (iii):

$$D > M$$

i.e., Thrashing

∴ Working sets strategy helps control <sup>thrashing</sup> strategy and utilize memory better.

#### Note:

\* The success of Working Set Strategy depends on the value of  $\Delta$ .

\* If  $\Delta$  is too small  $\Rightarrow$  more page faults.

\* If  $\Delta$  is too large

$\Rightarrow$  we also hold pages of previous locality  
and this is ineffective use of memory

Page ref: c c d b c e c e a d

$$\Delta = 4$$

$$WST_0 = \{a, d, e\}$$



$$\therefore \{e, d, a\}$$

Minimum faults = 2 - 1 = 0  
i.e., {e, d, a}

Statement illustrating 0 faults with each bad scheme

Ref to reading ref id 3904 \* without the numbers of  
bytes written) - 2 - 1 0 , 2 3 4 5 6 7 8 9 10  
• • • • •  $\Rightarrow$  5 page faults

$$t_0: \langle e d a \rangle : 3$$

$$t_1: \langle e d a c \rangle : 4 \rightarrow \text{At time } t_1, 'c' \text{ is referred}$$

$$t_2: \langle d a c \rangle : 3 \quad \text{since WSW, at } t_0 \text{ it doesn't have}$$

$$t_3: \langle a c d \rangle : 3 \quad \text{it causes page fault}$$

$$t_4: \langle c d b \rangle : 3$$

$$t_5: \langle d b c \rangle : 3$$

$$t_6: \langle d b c e \rangle : 4$$

$$t_7: \langle b e c \rangle : 3$$

$$t_8: \langle c e \rangle : 2$$

$$t_9: \langle c e a \rangle : 3$$

$$t_{10}: \langle c e a d \rangle : 4$$

$\Rightarrow$  Only at time  $t_4$ , 'b' is referred

since 'b' is not present  
in WSW at  $t_3$  a page  
fault is generated

$$\text{Avg no of frames needed} = \frac{32}{10} = 3.2 \text{ frames}$$

22/6 - both to substitute logically

23/6 - both to substitute logically

24/6 - both to substitute logically

25/6 - both to substitute logically

26/6 - both to substitute logically

27/6 - both to substitute logically

28/6 - both to substitute logically

29/6 - both to substitute logically