

use hierarchical page table

## ② swapping

data in file area of  $2^{\circ}$  mem is accessed by file system interface while the swap area is accessed by OS w/o file sys. interface for faster access.

Address mechanism in  $2^{\circ}$  mem

[cylinder no / track no / sector no]

load prog using swap area:



after linking, move page to swap area and then move it to mm using swap in

Usually we swap only the stack & heap area of a process becoz we can load new program to the code area  $P_1$  and later when  $P_1$  is needed we load its code area

from 2 mem pikk stack and heap from  
swap area to improve swap area  
utilisation.

⇒ lazy swapper (in paged mem. management)  
is called "page".

swaps in or out one page at a time  
based on the current requirement of mm

Using page we create:

• virtual memory

virtual address space.

logical view of address space

of a program (0...n-1) n - size of program



virtual address {  $\frac{P_1}{0} \quad \frac{P_2}{0}$   
                  :      :  
                  m-1   m-1

Physical address

address of an instruction in  
mm.



Usual convention is that all the address space of a process must be loaded in m.m before its execution.

Now, LIMITATION DUE TO ALLOCATION OF ENTIRE LOGICAL ADD. SPACE

① Let  $\frac{P_1}{0}$   $\Rightarrow$  n. must be less than size of m.m

Limitation: virtual address space of program is less than size of m.m

But now, we know for  $i^{th}$  instruction to execute  $i^{th}$  instruction & its data must be there in m.m

② Also if we allocate both stack & heap, large free space will be present w/o use for some time.



③ Also we might load many interrupt or error routines to  $\text{mm}$  which may not always occur during execution & thus causes wastage of space

④ compile time data initialisation

e.g. matrix multiplication

$A[1000][1000]$

$B[1000][1000]$

$C[1000][1000]$

compiler allocates almost a million bytes of memory where we actually use only a part of it. (i.e., some virtual address space is large, for its execution large amount of  $\text{mm}$  is allocated, while we actually use only a part of it in execution)

⑤ some features of a program may seldom get executed in which case loading entire program causes wastage of memory.

Improvement. To avoid the above limitation

- Allocate phy. mem only when its needed  
(not entire at start itself).

## • Demand page mem. management.

~~Load a page only when its needed by a program.~~

## Implementation



eg: i: mov EAX, [DATA1]

EAX, [DATA]  
max no. of page fault = 2  
instruction fetch

here after execution we need to  
fetch the instr. again

• store IP after decrementing

b) for [DATA] page fault @ data fetch.  
Save & load states of process  
in PCB

eg ②: MOV EAX, ECX → instr. fetch fault

③ MOVSB move max 2<sup>56</sup> bytes from  
one mem loc to another.

more than 1 page fault

④ ADD EAX, ECX → here source operand  
itself is modified, i can't  
perform same instr again  
after page fault

(how handle it)

- ① Virtual memory allows us to execute a program without loading the entire address space to MM.
- ② Also we can run a program whose size is greater than size of MM and can load and execute more process at a time. (as each program only uses less space)
- ③ less I/O required to load/swap out a process. (less no. of pages need to be loaded as not all are needed at the moment.)

We implement VM using demand paging

logical page 

|        |   |
|--------|---|
| frame# | 0 |
|--------|---|

 → logical page was in 'frame#' but it got replaced later on. ∴ logical page is not present in MM.

So how is page fault implemented?

Wherever page fault occurs, user have no control over it. System generates an interrupt.

In 80x86, CR2 stores cause and address at which an exception like page fault has occurred. (present in all micros in some way)

When interrupt occurs, CPU status, registers, IP, CS are stored and popped back after the ISR is executed to

continue execution

In case of page fault exception:

→ push SS, ESP → <sup>stack base</sup> stack pointer → to kernel stack

→ push EFLAGS, CS, IP;

→ push error code

→ JMP ISR of page fault

What does ISR do?

TSS  
Task stack segment  
SS0 & SP0  
of  
page  
fault  
in 80x86

① first OS must check if the mem-access  
is a valid one or not in  $\text{CPTLR} \times 512$

② If yes, then we need to load the page  
from 2<sup>nd</sup> mem.

→ for that @ first: we save gen. purpose  
registers

→ then initiate a disk read operation

→ change status of process to WAITING

as many other process's read

req. might be there in queue

waiting for free memory in MM

→ When our disk read is initiated  
it has:

- a) seek time
- b) latency time
- c) transfer time



→ handle move to pos using stepper motor



tracks with track #

sectors

cylinder #  $\Rightarrow$  all tracks in all sides;

with track #  $a$

track #  $\Rightarrow$  surface # of cylinder

sector #  $\Rightarrow$  which sector of a surface

seek time: time need to move

the read/write arm from

previous state to current

position (usually 5 ms)



latency time: time needed

for disk's req. sector to

reach the read/write head (usually 3ms  
for half revolution)



Now, if we need 5 ms for one revolution

of a disk with  $n$  sectors, we need the

read/write head to get data for  $\frac{5}{n}$  ms

It's called transfer time

time to read  
a sector

total disk read time = seek + latency + transfer time

during this time, scheduler runs some other program. When data is transferred.

HW generates an interrupt

2<sup>o</sup> mm Then currently running process waits. we load the page read to mm & update the page table entry of the process. Then the earlier process which is in wait remain continues its execution. When our process is scheduled some other time, we execute the same instruction that caused page fault.

Now since instruction execution time is negligible when compared to seek time... Cost of page fault lies in disk read (wait time is assumed to be 0)

Now assume page fault occurs 1%

of time and cost 8 ms.

$$\text{Av. mem. access time} = 0.99 \times 1 \text{ ns} +$$

$$0.01 \times (1 + 8 \times 10^8 \text{ ns})$$

$$= 0.99 + 8 \times 10^4 \text{ ns}$$

Memory access time =  $\approx 8 \times 10^4 \text{ ns}$

Time of block access at high speed  
during if ever 1% page fault causes, A.M.A  
becomes 80,000 times slower.

## A Advantage of demand Paging



## copy-on-write

when fork() is executed parent & child will have same code space & data space.



Now whenever parent or child writes something, the one who wrote copies the page to a new frame in MM and updates their copy.



if parent modifies its 4<sup>th</sup> code page then that page is copied to some other location and parent writes there.

Now since in most of the cases,  
the code area is read only, it  
improves the performance of MM.  
(no need for separate address space for  
parent & child at all times)

---

Now whenever page fault occurs, suppose  
if no mem. is available to load page  
to, either we can

- ① terminate P<sub>i</sub> (not efficient)
- ② swap out P<sub>i</sub> (costly)
- ③ page replacement

→ replace an allocated page  
in MM with page to be loaded from 2<sup>nd</sup>  
mem.

here :

(i) the page replacement algo chooses  
a page in MM & swap out it to 2<sup>nd</sup> mem/  
updates the process's page table entry to  
invalid and update free frame list

(ii). Then, the free page is loaded to the free frame



Now, if Dirty bit is 1, before swap out we need to write back the content of swapping page to  $2^{\text{th}}$  mem to avoid data loss. Here page fault need  $2 \frac{1}{2}$ % write back and swap in  $\frac{1}{2}$ %

but if dirty bit is 0, it means that page hasn't been modified, and its content is same as its copy in  $2^{\text{th}}$  mem (loaded at first), no need to write back we only need to load new page to that frame ... only  $1 \frac{1}{2}$ % operation needed

Now how a page is selected for replacement?

- Select a fitting page from process that generated page fault
- or select any fitting page from any process

Now to select fitting page we use page reference string.

- It can be a randomly generated page# but that won't improve efficiency
- Create a sequence of (page ref#, offset) and using that generate a list of pages ref# that can generate page fault called reference string

e.g.:  $(0,0), (0,10), (1,5), (1,6), (10,7), (5,2), (8,3)$

as 0<sup>th</sup> page accessed just before  $\therefore$  it's there in MM

pages that can generate fault

Based on the no. of page fault occurred using reference string of mm, we can evaluate efficiency of a page replacement algo.

### ① FIFO

The first page entering mm will be swapped on page fault. (new pages are added to tail of FIFO queue.) or add a timer field

e.g.:  $\checkmark \checkmark \checkmark \checkmark \checkmark \checkmark \checkmark \checkmark \checkmark \rightarrow$  faults  
 $0, 5, 7, \underline{0}, 2, \underline{1}, 3, 1, 4, 1, 7, 8$   
already swap  
in mm  
∴ no fault. as it came 1st & never



FIFO queue:  $\emptyset \neq 7 \neq 3 \neq 4 \neq 7 \neq 8$

|   |
|---|
| 8 |
| 3 |
| 4 |
| 7 |

→ (10-) page fault

→ not efficient

→ easy implementation

Now, what if we ↑ the size of MM?

Usually it ↓ the # page fault

(MP) But FIFO algorithm actually increases the # page fault when certain seq. of page access is made even if we  $\downarrow$  mm size

### disadvantage of FIFO

→ It also  $\uparrow$  page faults as it always replace active pages (it doesn't look recent reference)

✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓  
1 2 3 4 1 2 5 1 2 3 4 5  $\Rightarrow$  9 faults

e.g:



mem. frame = 3

Now if we  $\uparrow$  frame does  $\#$  page fault ↓

✓ ✓ ✓ i i i 5 1 2 3 4 5



$\Rightarrow$  10 faults



even though # frames  $\uparrow$

# page faults  $\uparrow$

not an efficient algo.

① These anomalies are called Belady's Anomaly.

② Optimal Algorithm

In FIFO, when 35 comes we replace 1 but immediate reference is for 1.  $\therefore$  here we don't have info about future reference. Now, if we know about the future references, we could replace 5 with a page that is not referred for longest time.

eg:  $1 \xrightarrow{mm} 2 \xrightarrow{mm} 3 \xrightarrow{mm} 4 \xrightarrow{mm} 5 \xrightarrow{mm} 1 \xrightarrow{mm} 2 \xrightarrow{mm} 3 \xrightarrow{mm} 4 \xrightarrow{mm} 5 \xrightarrow{mm} = 6$

replace 5 with 4 since its the page in mm that is referred after 1 2 & 3.



replace 4 with 1, 2 or 3 as 5 is referred immediately.

But even though # page fault is less,  
its not possible to implement as it  
need info about future reference.

(can be used to compare efficiency)

③

L R U

Now here the least recently used page is  
replaced.

7 0 1 2 0 3 0 4 2 3 0 3 ? 2 1 2 5 1 7 0 1  
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓

In optimal algo, we look for a page that  
is not used for longest time in future.

In LRU algo, we look for a page that is  
not used for longest time in past.

(based on locality)

|    |    |
|----|----|
| 21 | 01 |
| 03 | 03 |
| 2  | 7  |

⇒ 12 page faults

how to implement?

We need an additional field to say time of most recent access.

① use counter

② stack.

① After each mem. access, counter - a hw - is incremented, and counter value is copied to the field corresponding to each page (in pagetable) that shows time of last access



The page with min (time of last access)

is the page to be replaced.

### Issues:

- ① need to check all mm entries in page table to find min(counter value) - i.e., search in all pt where ( $v=1 \& \text{counters} = \text{min counter}$ )
- ② each mem access needs an additional mem. access to write counter value to mm (costly!)

Now we can avoid checking all entries by using stack

i.e., LRU page is always @ bottom of stack. Whenever a page is reference (if its present in stack, its removed) and put on top of stack.

Now using doubly linked list we can implement the stack

If same page is accessed no updation. But for each new page we need 6 updations in memory to update linked list & links to remove an entry from & to add one to the end.

This is costly. ∴ modern system uses LRU approximation.

① It uses additional reference bits in page table.



Now at each time slice if a page is referenced we shift the additional ref. bit to right by 1 and store 1 at its leftmost bit else just shift (i.e., we add ref. bit value to leftmost end of additional ref. bit & shift it right).

at last after n time slice  
seg: 010101110111  
8 10110101101011 CRU = page 8

∴ we can find CRU using just integer comparison.

But if 2 have same ref bits use FIFO or some other algo to select.

But if there are ~1000 pages & only 8 bits, the no. of pages with same ref.

bits is high.

∴ to avoid ↑ # bits, ↑ mem. of page table. To avoid this decrease

the # reference bits.

② If # ref. bits = 0, that algo is called

second chance / clock algo. if it only has a single ref. bit (no additional bits)

|       |    |
|-------|----|
|       | 01 |
|       | 0  |
|       | 0  |
| Pages | 01 |
|       | 1  |
|       | 0  |

initially page fault  
forward page fault,  $\rightarrow 0 \rightarrow 1$  for

here whenever a page fault occurs.

we look for a page to be replaced by looking at its ref. bits. if its 1

then its made 0 (gives a second chance) and its made 0 (gives a second chance)

and look the ref. bit of next page.

When we encounter a page with ref.

bit 0, its made 1 and replaced, and move to next page (handle).

In worst case if all ref. bits are 0.

then since the page and is circular queue, after all R. are made 0, it again checks the first page, since it was made 0 in last check, its replaced

### ③ Enhanced 2<sup>nd</sup> chance algo.

here dirty bit is used along with ref. bit. ∴ each page get 4 chances

if no  $(0,0)$  → best page to be replaced  
choose  $\rightarrow (0,1)$  → (no new ref & no need for write back)  
if no  $(0,1)$  → next best (need to write back)  
if no  $(1,0)$  → next class (no write back). give second chance

After this step → mostly recently referenced & need to write back

replace the first non-empty class pages.

∴ to implement this we need multiple

traversal through auxilal queue.

1<sup>st</sup> check if any  $(0,0)$  page exist. else

check for subsequent classes.

## Counting based algorithm

has a counter field along with each page. so whenever a page is accessed its counter is incremented

Now how to choose page?

(a) LFU (least frequently used)

Whenever a page fault occurs, the page with least no. of access is replaced.

problem:

The page with least no. of access may have large no. of access in coming future

(b) MFU (most freq. used)

Replace page with max # references.  
i.e., assuming the recently brought to mem. pages with low counter may have many accesses in near future.

(not comparable to optimal algorithm)

## Page buffering algorithm

→ here OS keeps a free frame pool.



After the 80 frames are filled, process generates a page fault.

here when the OS is executing the page replacement algorithm, page moves the page to be accessed from  $2^{\text{nd}}$  mem to the free frame pool. if the victim then the victim page is moved to free frame and the page brought to mm to the victim's location. If the victim has dirty bit 1, then its written back & finally free frame is freed.

Again if we keep track of pages in

free frame table, we can avoid

moving page from 2<sup>o</sup> mem to mm if

its already there in free frame pool

(when it was victim, it was moved to there)

### A additional enhancement

Whenever page is empty (if 10 is free), page writes back a page with dirty bit 1 in mm & make it 0 to that it can be avoided when this page is selected as victim in the future.

### Allocation of frames

In virtual mem., when size of virtual mem is larger than size of physical memory how to allocate frames?

→ Depends on mem. architecture

① To get min. no. of frames.

Assume each byte is 1 word & mem op  
how many min. no. of page is required

to execute this?



2 pages → instr. → in code

mem. access

↳ in stack /

or heap.

e.g: ADD EAX, DATA1 → 2 (best)

ADD {DATA1}, DATA2 → 3 (worst)

ADD DATA1, {DATA2} → 4 (worst)

↳ mem. op. has address

to another location

∴ We must allocate min. # pages

How to allocate?

Allocation algo / fixed algo.

way  $12 \times 80$  way 8, mapping.

## ① fixed allocation

frame size constant



and no need to swap  
process.  $\frac{80}{10} = 8$

but here, for some process, it may

have 1000 pages & some have only 1 page

Then the one with 1000 page generate more

page fault.

## ② proportional allocation

ie memory allocated proportional to

size of pages of process.

let  $s_i$  be size of process.

$$S = \sum_{i=1}^n s_i \quad (\text{total size of } n \text{ process})$$

∴ process  $P_i$  gets  $\frac{m}{s_i} \times s_i$  pages.

$m$  - #frames (size of mm)

Also we can use priority as the basis of allocating mem. One with more priority gets more pages. Or we can use a combination of size & priority.

~~Consider~~ When we use swap in swap out mechanism, when heavy page faults occur, then # processes waiting for swap in/out ↑ & as a result # processes in ready queue ↓ drastically. Thinking its because of lack of enough process in mm, system adds more process, ↑ degree of multiprocessing, which adds to page

fault, due to less #frames, processes waiting for disc oper. ↑ of CPU utilisation & drastically. This situation is called thrashing.



### Solution

- ① local swapping: when page fault occur, replace a page of same process (not an ideal soln)
- ② Allocate enough memory to continue execution of process.

|    |      |
|----|------|
| Pr | main |
|    | sub1 |
|    | sub2 |
|    | ...  |
|    | data |

→ It is based on locality of reference.  
→ Initially to run P, we need pages to load main & data associated with locality of ref.

- when sub routine 1 is executed  
its locality of reference changes  
as it needs another set of data
- ~~1<sup>st</sup>~~ for subr2 and rest



→ i.e., initially there will large # of page fault but later it ↓ & reaches a steady state. But one locality change, # page fault again ↑

Based on this, we have 2 model;

### Working set model

△ working set window = the last n no. of memory access made by process

Working set: Actual #<sup>diff</sup> pages that are accessed in the working set window.



Now as window changes, working set also changes.

Let  $D = \sum_{i=1}^n w_{s,i}$  → working set of  $i^{th}$  process use  $w_{s,i}$  # pages

If # frames  $n < D$ , then # page fault is high

else if  $n > D$ , then all pages can be accommodated in mem. ∴ less page fault.

## IMPLEMENTATION WORKING SET MODEL

Now let timer = 100 for every

page there is an additional  
in memory reference bit



Now after the  $\Delta$  (working set window)

is passed if there is atleast 1

reference to that page in all three

of the ref. bit, we say that, that

page is in the working set.

If all three ref. bit is 0,

no ref. in last  $> 300$  time,

its not in working set &

hence can be removed.

# Page fault frequency



when # page fault is above threshold 1

OS adds frames to the process and  
if its less than threshold 2, it reclaims  
frames from that process to keep  
# page fault b/w the upper & lower  
bound.

based on local page replacement algo.  
it's own pages

## Kernel memory allocation

For kernel data structures & I/O  
buffer may need a page. How to allocate  
page?

# ① Buddy system

If kernel need 200 bytes.

Then it allocates 256 (smallest memory size  $\geq$  req size =  $2^n$ )

Suppose initially 4096 bytes is available

for kernel allocation



for 200 requirement

i.e., large block is divided into smaller blocks

allocated if req is 200

fill least memory is wasted

Now, when [256] block is freed, it combines blocks till largest sized free block is obtained.

### Cons

- memory can be allocated as a power of 2.
- internal fragmentation occurs.

### ② SLAB Allocation → one or more contiguous pages



kernel memory = x blocks

if any internal kernel data structure is stored in a cache, all PCBs are stored in

sg (PCB)

that cache.

A slab can be in free state - partially filled or completely allocated state.

Now, when a new process's PCB comes, it stored in a partial or free slab in the cache, allocated for PCB.

If no free slabs are there, OS requests an additional slab from kernel space.



# Device management



I/O device I/H & I/O controller



interface I/H that  
allows transaction of data b/w  
cpu & I/O device



all device controller  
has atleast 4  
reg.

# reg for data-in  
'out & status may  
vary depending  
on complexity

status reg has the states  
& control reg has control info  
used to control the I/O device

## Device addressing

① We have peripheral I/O i.e., separate address for I/O devices  
(64 KB address for 80x86)

(addressing is diff. from mem. address)

Also called isolated I/O

(Even though mem & dev. address are equal in value, its different w.r.t. mechanism of accessing)

## Memory mapped I/O

Suppose our system has 20 bit add.

bus, then MM can have  $2^{20}$  add.



Now a portion of this is reserved for device addressing

ii) each address in the 1<sup>st</sup> 1KB  
(in above example) is associated  
with an I/O device

e.g. move add., 10  
if add is in 1<sup>st</sup> 1KB, then  
10 can be given as I/p or  
O/p data to a device

[In isolated I/O, IN & OUT instruction  
are used for mem. related  
operations]

In isolated, only 64kB is there for I/O  
but in memory mapped I/O we can  
have as many as our requirement  
within  $2^{48-1}$  address space as per  
architecture.

## WAYS TO ACCESS TO I/O

① polling: CPU always reads status reg  
which indicate whether the I/p or O/p

is ready in data-in or data-out reg.  
If ready, then CPU transfer data  
to/from I/O.

i.e., continuous read of status reg  
& data of I/O

Cons: If speed of I/O is comparable to  
CPU, we can use this

② interrupt driven I/O: When CPU reg.  
I/O, it initiates it & controller gives it  
to I/O. Once the I/O is done or  
interrupt is given back to CPU to  
say its done & do further instr.

③ DMA: I/O can directly access  
MM

If its a block transfer (4096 bytes)  
interrupt driven & polling is  
inefficient ( $I/O \rightarrow CPU \text{ reg} \rightarrow MM$ )

~~Device driver~~ Device driver



~~if~~, ~~as~~ ~~hard~~ device is a s/w (part of os)

that manage discs using % controller

(for linking its .ldm - loadable class module)

Uniform interface for device driver:

The device driver provides a uniform interface for OS to interact with different types of devices & their controllers.

Types of % error

characters derive from block device

→ block devices - reads & write 1 block  
at a time  
eg: magnetic disk

character by devic: reads & writes  
a stream of bytes at a time  
eg: 1 byte @ a time

Linux I/O implemented based on file  
system I/O interface

suppose we want to read from /dev/disk

OS converts it to its node # (11<sup>th</sup> to a file)  
for drives also

node # has major # → type of device

it allows OS to load the device

based on the devic corresponding to

major number with minor number

as parameter (to know instant of that device

eg: major # says to write to a printer  
minor # " " " " " printer

writing & reading to/from file / %o devic  
has common interface in OS level.

# Virtualisation

Gives us an illusion of separate m/c with h/w, kernel and address space

Till now



user feels that the entire h/w is dedicated for him (even though time sharing is happening) → in same OS.

This type of virtualisation is called process virtualisation (on a process level)

In a VM



request level parallelism

multiple OS needs same H/W.  
each OS has same priority & if all try to access H/W at same time

chaos happens. So we need some

mechanism to manage this.

(no time so lets finish here! 😊)

Other considerations in VM management

(demand paging)

① Pre-paging: Whenever a process starts <sup>for 1st/after timer</sup> execution, there is a chance of more # page faults initially

so to avoid this, if these pages are already loaded into mm before a page fault occurs, by paging h/w

this is called pre paging

if P<sub>1</sub> and 1, 3, 7, 10 & 13, are loaded

even before P<sub>1</sub> generates page fault after

its scheduled each time

② Page-size: Average virtual fragmentation  
=  $\frac{1}{2}$  page size

⇒ If page size ↑, int frag: ↑.

if " " ↓ " " ↓

⇒ Also if page size ↑, I/O time ↑ &

⇒ in general versa.

⇒ # page fault ↑ with ↓ with  
page size

But can we ↑ page size at h/w level?

↑ RAM or OS-level?

A: OS cannot ↑ page size, it  
depends on h/w resources available

(modern system provide 4-8 kB pages as  
well as >4 MB page size)

can't modify page size much. But

affect A M A time

→ ③ TLB reach

| Page | frame # |
|------|---------|
|      |         |
|      |         |

If TLB can store 16 page info.

If size of page = 4096,

then TLB reach =  $16 \times 4096$ .

If locality of a process is  $<$  TLB reach, all the memory access is through TLB & AMA.

But if process locality  $>$  TLB reach

memory access may cause TLB miss & hence AMA ↑

∴ how to ↑ TLB reach?

① increase size of TLB (# entries in TLB)

then TLB hit ↑ & AMA ↓

② P size of page: If our system allows 1MB or 4MB pages, then it's difficult to handle TLB in hardware level (amount of info to be stored)

If needs to be handled by s/w.

But this is AMA.

#### ④ Inverted page table:



usually if  $(\text{pid} | \text{page}\#)$ , we check all entries in inv. page table. But how can we say if page fault occurred, were that page stored in 2<sup>o</sup> mem. To have this, we have an external page table per process



if  $(\text{pid} | \text{page})$  is not in inv. page table, it looks in external p.t & if the page is invalid, the

address of the area where that page is stored in 2<sup>o</sup> mem is obtained. Then its load to MM of external & inv. page table is updated.

## ⑤ Program structure

float  $A[0:24][0:24]$

for  $i = 0; i < 1024; i++$

    for  $j = 0; j < 1024; j++$

$$A[i][j] = 10$$

{ if page size = 4kB  
# page = 512 }

now, since matrix is stored in row major

$A[0][0] \rightarrow$  1 page fault

$A[1][0] \rightarrow$  page fault as  $A[1][0]$  is after  $4 \times 1024$  bytes from  $A[0][0]$   
 $A[2][0] \rightarrow$  1 page fault = 4kB.

$A[512][0] \rightarrow$  page fault & page replacement  
as mem has only 512 frames.

$$\text{no. of faults} = 1024 \times 1024$$

Now  $A[i][j]$  is changed to  $A[i][j]$

$A[0][0] \rightarrow$  1

$A[0][1] \rightarrow$  no fault

$A[1][0] \rightarrow$  1

$A[1][1] \rightarrow$  no fault

$A[1023][0] \rightarrow$  1

$[1] \rightarrow$  no fault

} total 1024

faults