

## Chapter 8: Main Memory

### COURSE INSTRUCTORS:

DR. MUHAMMAD NASEEM, ENGR. FARHEEN QAZI, MS. FALAK SALEEM

COURSE : OPERATING SYSTEMS (SWE-204)

BATCH : 2019      FALL : 2020

DEPARTMENT OF SOFTWARE ENGINEERING  
SIR SYED UNIVERSITY OF ENGINEERING & TECHNOLOGY

1

## Today's Agenda

- ▶ Background
- ▶ Swapping
- ▶ Contiguous Memory Allocation
- ▶ Paging
- ▶ Structure of the Page Table
- ▶ Segmentation

2

## Objectives

- ▶ To provide a detailed description of various ways of organizing memory hardware
- ▶ To discuss various memory-management techniques, including paging and segmentation
- ▶ To provide a detailed description of the Intel Pentium, which supports both pure segmentation and segmentation with paging

3

## Background

- ▶ Program must be brought (from disk) into memory and placed within a process for it to be run
- ▶ Main memory and registers are only storage CPU can access directly
- ▶ Memory unit only sees a stream of addresses + read requests, or address + data and write requests
- ▶ Register access in one CPU clock (or less)
- ▶ Main memory can take many cycles, causing a **stall**
- ▶ **Cache** sits between main memory and CPU registers
- ▶ Protection of memory required to ensure correct operation

4

### Base and Limit Registers

- ▶ A pair of **base** and **limit registers** define the logical address space
- ▶ CPU must check every memory access generated in user mode to be sure it is between base and limit for that user



5

### Hardware Address Protection



6

### Address Binding

- ▶ Programs on disk, ready to be brought into memory to execute from an **input queue**
  - Without support, must be loaded into address 0000
- ▶ Inconvenient to have first user process physical address always at 0000
  - How can it not be?
- ▶ Further, addresses represented in different ways at different stages of a program's life
  - Source code addresses usually symbolic
  - Compiled code addresses **bind** to relocatable addresses
    - i.e. "14 bytes from beginning of this module"
  - Linker or loader will bind relocatable addresses to absolute addresses
    - i.e. 74014
  - Each binding maps one address space to another

7

### Binding of Instructions and Data to Memory

- ▶ Address binding of instructions and data to memory addresses can happen at three different stages
  - **Compile time:** If memory location known a priori, **absolute code** can be generated; must recompile code if starting location changes
  - **Load time:** Must generate **relocatable code** if memory location is not known at compile time
  - **Execution time:** Binding delayed until run time if the process can be moved during its execution from one memory segment to another
    - Need hardware support for address maps (e.g., base and limit registers)

8

### Multistep Processing of a User Program



9

### Logical vs. Physical Address Space

- ▶ The concept of a logical address space that is bound to a separate **physical address space** is central to proper memory management
  - **Logical address** – generated by the CPU; also referred to as **virtual address**
  - **Physical address** – address seen by the memory unit
- ▶ Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in execution-time address-binding scheme
- ▶ **Logical address space** is the set of all logical addresses generated by a program
- ▶ **Physical address space** is the set of all physical addresses generated by a program

10

### Memory-Management Unit (MMU)

- ▶ Hardware device that at run time maps virtual to physical address
- ▶ Many methods possible, covered in the rest of this chapter
- ▶ To start, consider simple scheme where the value in the relocation register is added to every address generated by a user process at the time it is sent to memory
  - Base register now called **relocation register**
  - MS-DOS on Intel 80x86 used 4 relocation registers
- ▶ The user program deals with *logical addresses*; it never sees the *real physical addresses*
  - Execution-time binding occurs when reference is made to location in memory
  - Logical address bound to physical addresses

11

### Dynamic relocation using a relocation register

- Routine is not loaded until it is called
- Better memory-space utilization; unused routine is never loaded
- All routines kept on disk in relocatable load format
- Useful when large amounts of code are needed to handle infrequently occurring cases
- No special support from the operating system is required
  - Implemented through program design
  - OS can help by providing libraries to implement dynamic loading



12

### Dynamic Loading

- Routine is not loaded until it is called, all routines are kept on disk.
  - Useful when large amounts of code are needed to handle infrequently occurring cases, e.g. error routines.
- The main program is loaded into memory and executed.
  - When a program needs to call another routine, the calling routine first checks to see whether the other routine has been loaded.
  - If not, the relocatable linking loader is called to load the desired routine into memory and to update the program's address to reflect this change.
  - Control is passed to the newly loaded routine.



13

### Dynamic Linking

- **Static linking** – system libraries and program code combined by the loader into the binary program image
- Dynamic linking – linking postponed until execution time
- Small piece of code, **stub**, used to locate the appropriate memory-resident library routine
- Stub replaces itself with the address of the routine, and executes the routine
- Operating system checks if routine is in processes' memory address
  - If not in address space, add to address space
- Dynamic linking is particularly useful for libraries
- System also known as **shared libraries**
- Consider applicability to patching system libraries
  - Versioning may be needed



14

### Swapping

- A process can be **swapped** temporarily out of memory to a backing store, and then brought back into memory for continued execution
  - Total physical memory space of processes can exceed physical memory
- **Backing store** – fast disk large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images
- **Roll out, roll in** – swapping variant used for priority-based scheduling algorithms; lower-priority process is swapped out so higher-priority process can be loaded and executed
- Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped
- System maintains a **ready queue** of ready-to-run processes which have memory images on disk

15

### Swapping (Cont.)

- Does the swapped out process need to swap back in to same physical addresses?
- Depends on address binding method
  - Plus consider pending I/O to / from process memory space
- Modified versions of swapping are found on many systems (i.e., UNIX, Linux, and Windows)
  - Swapping normally disabled
  - Started if more than threshold amount of memory allocated
  - Disabled again once memory demand reduced below threshold

16

### Swapping (Cont.)

- *"if needed, the operating system can always make room for high-priority jobs, no matter what!"*



17

### Schematic View of Swapping



18

### Contiguous Allocation

- Main memory must support both OS and user processes
- Limited resource, must allocate efficiently
- Contiguous allocation is one early method
- Main memory usually into two **partitions**:
  - Resident operating system, usually held in low memory with interrupt vector
  - User processes then held in high memory
  - Each process contained in single contiguous section of memory

19

### Contiguous Allocation (Cont.)

- Relocation registers used to protect user processes from each other, and from changing operating-system code and data
  - Base register contains value of smallest physical address
  - Limit register contains range of logical addresses – each logical address must be less than the limit register
  - MMU maps logical address dynamically
  - Can then allow actions such as kernel code being **transient** and kernel changing size

20



21

## Real Memory Management Techniques

- ▶ Although the following simple/basic memory management techniques are not used in modern OSs, they lay the ground for a later proper discussion of virtual memory:
  - ▶ Fixed/Static Partitioning
  - ▶ Variable/Dynamic Partitioning
  - ▶ Simple/Basic Paging
  - ▶ Simple/Basic Segmentation

22



23

## Placement Algorithm with Partitions

- ▶ **Equal-size partitions:**
  - If there is an available partition, a process can be loaded into that partition –
    - because all partitions are of equal size, it does not matter which partition is used.
  - If all partitions are occupied by blocked processes, choose one process to swap out to make room for the new process.

24

### Placement Algorithm with Partitions

- ▶ **Unequal-size partitions, use of multiple queues:**
  - assign each process to the smallest partition within which it will fit.
  - a queue exists for each partition size.
  - tries to minimize internal fragmentation.
  - problem: some queues might be empty while some might be loaded.



25

### Placement Algorithm with Partitions

- ▶ **Unequal-size partitions, use of a single queue:**
  - when its time to load a process into memory, the smallest available partition that will hold the process is selected.
  - increases the level of multiprogramming at the expense of internal fragmentation.



26

### Dynamics of Fixed Partitioning

- ▶ Any process whose size is less than or equal to a partition size can be loaded into the partition.
- ▶ If all partitions are occupied, the OS can swap a process out of a partition.
- ▶ A program may be too large to fit in a partition. The programmer must design the program with overlays.

27

### Comments on Fixed Partitioning

- ▶ Main memory use is inefficient. Any program, no matter how small, occupies an entire partition. This can cause internal fragmentation.
- ▶ Unequal-size partitions lessens these problems but they still remain ...
- ▶ Equal-size partitions was used in early IBM's OS/MFT (Multiprogramming with a Fixed number of Tasks).

28



29



30

## Example#02

| 0                                                | operating system    |                |  |
|--------------------------------------------------|---------------------|----------------|--|
| 400K                                             |                     |                |  |
| 2560K                                            |                     |                |  |
|                                                  | job queue at time 0 |                |  |
| process                                          | memory              | time in system |  |
| P1                                               | 600K                | 10             |  |
| P2                                               | 1000K               | 5              |  |
| P3                                               | 300K                | 20             |  |
| P4                                               | 700K                | 8              |  |
| P5                                               | 500K                | 15             |  |
| How to schedule the job queue in a FCFS fashion? |                     |                |  |

31

## MVT Example – at time 5

| 0                                     | operating system |                |  |
|---------------------------------------|------------------|----------------|--|
| 400K                                  |                  |                |  |
| 1000K                                 |                  |                |  |
| 2000K                                 |                  |                |  |
| 2300K                                 |                  |                |  |
| 2560K                                 |                  |                |  |
|                                       | job queue        |                |  |
| process                               | memory           | time in system |  |
| P1                                    |                  |                |  |
| P2                                    |                  |                |  |
| P3                                    |                  |                |  |
| P4                                    | 700K             | 8              |  |
| P5                                    | 500K             | 15             |  |
| Now P2 is done, so replace with P4... |                  |                |  |

32



33



34



35



36

### Fragmentation

- ▶ **External Fragmentation** – total memory space exists to satisfy a request, but it is not contiguous
- ▶ **Internal Fragmentation** – allocated memory may be slightly larger than requested memory; this size difference is memory internal to a partition, but not being used

37

### Fragmentation (Cont.)

- ▶ Reduce external fragmentation by **compaction**
  - Shuffle memory contents to place all free memory together in one large block
  - Compaction is possible only if relocation is dynamic, and is done at execution time
  - I/O problem
    - Latch job in memory while it is involved in I/O
    - Do I/O only into OS buffers
- ▶ Now consider that backing store has same fragmentation problems

38

### Comments on Variable Partitioning

- ▶ Partitions are of variable length and number.
- ▶ Each process is allocated exactly as much memory as it requires.
- ▶ Eventually holes are formed in main memory. This can cause external fragmentation.
- ▶ Must use compaction to shift processes so they are contiguous; all free memory is in one block.
- ▶ Used in IBM's OS/MVT (Multiprogramming with a Variable number of Tasks).

39

### Paging

- ▶ Physical address space of a process can be noncontiguous; process is allocated physical memory whenever the latter is available
  - Avoids external fragmentation
  - Avoids problem of varying sized memory chunks
- ▶ Divide physical memory into fixed-sized blocks called **frames**
  - Size is power of 2, between 512 bytes and 16 Mbytes
- ▶ Divide logical memory into blocks of same size called **pages**
- ▶ Keep track of all free frames
- ▶ To run a program of size **N** pages, need to find **N** free frames and load program
- ▶ Set up a **page table** to translate logical to physical addresses
- ▶ Backing store likewise split into pages
- ▶ Still have Internal fragmentation

40

### Address Translation Scheme

- ▶ Address generated by CPU is divided into:
  - ▶ **Page number (p)** – used as an index into a **page table** which contains base address of each page in physical memory
  - ▶ **Page offset (d)** – combined with base address to define the physical memory address that is sent to the memory unit



- ▶ For given logical address space  $2^m$  and page size  $2^n$

41

### Paging Hardware



42

### Paging Model of Logical and Physical Memory



43

### Paging Example



44



45

### Implementation of Page Table

- ▶ Page table is kept in main memory
- ▶ **Page-table base register (PTBR)** points to the page table
- ▶ **Page-table length register (PTLR)** indicates size of the page table
- ▶ In this scheme every data/instruction access requires two memory accesses
  - One for the page table and one for the data / instruction
- ▶ The two memory access problem can be solved by the use of a special fast-lookup hardware cache called **associative memory** or **translation look-aside buffers (TLBs)**

46

### Implementation of Page Table (Cont.)

- ▶ Some TLBs store **address-space identifiers (ASIDs)** in each TLB entry – uniquely identifies each process to provide address-space protection for that process
  - Otherwise need to flush at every context switch
- ▶ TLBs typically small (64 to 1,024 entries)
- ▶ On a TLB miss, value is loaded into the TLB for faster access next time
  - Replacement policies must be considered
  - Some entries can be **wired down** for permanent fast access

47

### Associative Memory

- ▶ Associative memory – parallel search
- | Page # | Frame # |
|--------|---------|
|        |         |
|        |         |
|        |         |
|        |         |
- ▶ Address translation (p, d)
    - ▶ If p is in associative register, get frame # out
    - ▶ Otherwise get frame # from page table in memory

48

### Paging Hardware With TLB



49

### Effective Access Time (EAT)

- ▶ Associative Lookup =  $\varepsilon$  time unit
  - Can be < 10% of memory access time
- ▶ Hit ratio =  $\alpha$ 
  - Hit ratio – percentage of times that a page number is found in the associative registers; ratio related to number of associative registers
- ▶ Consider  $\alpha = 80\%$ ,  $\varepsilon = 20\text{ns}$  for TLB search,  $100\text{ns}$  for memory access
- ▶ **Effective Access Time (EAT)**

$$\begin{aligned} \text{EAT} &= (1 + \varepsilon)\alpha + (2 + \varepsilon)(1 - \alpha) \\ &= 2 + \varepsilon - \alpha \end{aligned}$$
- ▶ Consider  $\alpha = 80\%$ ,  $\varepsilon = 20\text{ns}$  for TLB search,  $100\text{ns}$  for memory access
  - EAT =  $0.80 \times 100 + 0.20 \times 200 = 120\text{ns}$
- ▶ Consider more realistic hit ratio  $\rightarrow \alpha = 99\%$ ,  $\varepsilon = 20\text{ns}$  for TLB search,  $100\text{ns}$  for memory access
  - EAT =  $0.99 \times 100 + 0.01 \times 200 = 101\text{ns}$

50

### Example

The percentage of times that a page number is not found in the associative registers is 33%. If it takes 150 nanoseconds to access primary memory, and takes 23 nanoseconds to search the associative registers, then what is the effective reference time?

#### Solution

$$\text{Miss ratio} = 33\% = 0.33$$

$$\text{Hit ratio} = 67\% = 0.67$$

$$\text{E.A.T} = 0.67 * 173 + 0.33 * 323$$

$$\text{E.A.T} = 131.48 + 106.59$$

$$\text{E.A.T} = 238.07 \text{ nanoseconds}$$

51

### Memory Protection

- ▶ Memory protection implemented by associating protection bit with each frame to indicate if read-only or read-write access is allowed
  - Can also add more bits to indicate page execute-only, and so on
- ▶ **Valid-Invalid** bit attached to each entry in the page table:
  - "valid" indicates that the associated page is in the process' logical address space, and is thus a legal page
  - "invalid" indicates that the page is not in the process' logical address space
  - Or use **page-table length register (PTLR)**
- ▶ Any violations result in a trap to the kernel

52

### Valid (v) or Invalid (i) Bit In A Page Table



53

### Shared Pages

#### ► Shared code

- One copy of read-only (**reentrant**) code shared among processes (i.e., text editors, compilers, window systems)
- Similar to multiple threads sharing the same process space
- Also useful for interprocess communication if sharing of read-write pages is allowed

#### ► Private code and data

- Each process keeps a separate copy of the code and data
- The pages for the private code and data can appear anywhere in the logical address space

54

### Shared Pages Example



55

### Structure of the Page Table

#### ► Memory structures for paging can get huge using straight-forward methods

- Consider a 32-bit logical address space as on modern computers
- Page size of 4 KB ( $2^{12}$ )
- Page table would have 1 million entries ( $2^{32} / 2^{12}$ )
- If each entry is 4 bytes  $\rightarrow$  4 MB of physical address space / memory for page table alone
  - That amount of memory used to cost a lot
  - Don't want to allocate that contiguously in main memory

#### ► Hierarchical Paging

#### ► Hashed Page Tables

#### ► Inverted Page Tables

56

## Hierarchical Page Tables

- ▶ Break up the logical address space into multiple page tables
- ▶ A simple technique is a two-level page table
- ▶ We then page the page table

57

## Two-Level Page-Table Scheme



58

## Two-Level Paging Example

- ▶ A logical address (on 32-bit machine with 1K page size) is divided into:
  - ▶ a page number consisting of 22 bits
  - ▶ a page offset consisting of 10 bits
- ▶ Since the page table is paged, the page number is further divided into:
  - ▶ a 12-bit page number
  - ▶ a 10-bit page offset
- ▶ Thus, a logical address is as follows:
 

|       |       |     |
|-------|-------|-----|
| $p_1$ | $p_2$ | $d$ |
| 12    | 10    | 10  |

 where  $p_1$  is an index into the outer page table, and  $p_2$  is the displacement within the page of the inner page table
- ▶ Known as **forward-mapped page table**

59

## Address-Translation Scheme



60

### 64-bit Logical Address Space

- ▶ Even two-level paging scheme not sufficient
- ▶ If page size is 4 KB ( $2^{12}$ )
  - ▶ Then page table has  $2^{52}$  entries
  - ▶ If two level scheme, inner page tables could be  $2^{10}$  4-byte entries
  - ▶ Address would look like
 

|            |            |             |
|------------|------------|-------------|
| outer page | inner page | page offset |
| $p_1$      | $p_2$      | $d$         |

42      10      12
  - ▶ Outer page table has  $2^{42}$  entries or  $2^{44}$  bytes
  - ▶ One solution is to add a 2<sup>nd</sup> outer page table
  - ▶ But in the following example the 2<sup>nd</sup> outer page table is still  $2^{34}$  bytes in size
    - ▶ And possibly 4 memory access to get to one physical memory location

61

### Hashed Page Tables

- ▶ Common in address spaces > 32 bits
- ▶ The virtual page number is hashed into a page table
  - This page table contains a chain of elements hashing to the same location
- ▶ Each element contains (1) the virtual page number (2) the value of the mapped page frame (3) a pointer to the next element
- ▶ Virtual page numbers are compared in this chain searching for a match
  - If a match is found, the corresponding physical frame is extracted
- ▶ Variation for 64-bit addresses is **clustered page tables**
  - Similar to hashed but each entry refers to several pages (such as 16) rather than 1
  - Especially useful for **sparse** address spaces (where memory references are non-contiguous and scattered)

62

### Hashed Page Table



63

### Inverted Page Table

- ▶ Rather than each process having a page table and keeping track of all possible logical pages, track all physical pages
- ▶ One entry for each real page of memory
- ▶ Entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page
- ▶ Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs
- ▶ Use hash table to limit the search to one — or at most a few — page-table entries
  - TLB can accelerate access
- ▶ But how to implement shared memory?
  - One mapping of a virtual address to the shared physical address

64



65



66



67



68

### Segmentation Architecture

- ▶ Logical address consists of a two tuple:  
    <segment-number, offset>;
- ▶ **Segment table** – maps two-dimensional physical addresses; each table entry has:
  - **base** – contains the starting physical address where the segments reside in memory
  - **limit** – specifies the length of the segment
- ▶ **Segment-table base register (STBR)** points to the segment table's location in memory
- ▶ **Segment-table length register (STLR)** indicates number of segments used by a program;  
    segment number **s** is legal if **s < STLR**

69

### Segmentation Architecture (Cont.)

- ▶ Protection
  - With each entry in segment table associate:
    - validation bit = 0  $\Rightarrow$  illegal segment
    - read/write/execute privileges
- ▶ Protection bits associated with segments; code sharing occurs at segment level
- ▶ Since segments vary in length, memory allocation is a dynamic storage-allocation problem
- ▶ A segmentation example is shown in the following diagram

70

### Segmentation Hardware



71

**End of Chapter 8**

**Thank you**