

## Introduction to CAD

refined to structural

Digital computer: A machine that can solve problems for people by carrying out instructions given to it.

### Computer Architecture

- ↳ The set of data types, operations and features at each level
- ↳ Deals with those aspects that are visible to programmers
  - ↳ Memory, control unit, ALU, I/O
- ↳ Architecture and organization are essentially the same.
- ↳ functional operation of the individual hardware units in a computer system, the flow of information among, and control of those units.
- ↳ Computer architecture and organization deals with structure and function of computers.
  - ↳ Structure is the way in which components relate to each other.
  - ↳ Function is the operation of individual components as part of the structure.

### Functions of Computer:

- Data processing
- Data storage
- Data movement
- Control

Functional view:  
Operating environment  
(Source & destination of data)



(S-structure)

## Structure of a Computer

QAD of microprocessor

Structure of computer consists of two main parts  
- i) CPU and II) Peripheral



## Computer Generations

### • Zeroth Generation

↳ Mechanical computers (1642-1945)

### • First Generation

↳ Vacuum Tubes (1945-1955)

### • Second Generation

↳ Transistors (1955-1965)

### • Third Generation

↳ Integrated circuits (1965-1980)

### • Fourth Generation

↳ Very Large Scale Integration (1980 - ?)

### • Fifth Generation

↳ Artificial Intelligence

### • Sixth Generation

↳ Quantum Computers.



## Von Neumann Machine:



### Moorre's Law

- ↳ Increased density of components on chip
- ↳ No. of transistors on a chip will double every year.
- ↳ Cost of a chip has remained almost unchanged.
- ↳ Higher packing density means shorter electrical paths, giving higher performance.
- ↳ Smaller size gives increased flexibility.
- ↳ Reduced power and cooling requirements.
- ↳ Fewer interconnections increase reliability.

### Program

- ↳ A sequence of steps
- ↳ For each step, an arithmetic or logical operation is done.
- ↳ For each operation, a different set of control signals is needed.

### Control Unit

- ↳ For each operation a unique code is provided (ADD, MOVE, etc.).
- ↳ A hardware segment accepts the code and issues the control signals.

### Components

- ↳ The control unit and the Arithmetic and logic unit constitute the CPU.
- ↳ Data and instructions need to get into the system and results out
  - ↳ Input/Output
- ↳ Temporary storage of code & results is needed
  - ↳ Main memory.

## Computer Components



## Instruction Cycle

- Fetch
- Execute



### Fetch cycle:

- ↳ PC holds address of next instruction to fetch.
- ↳ Processor fetches instruction from memory location pointed to by PC.
- ↳ Increment PC
- ↳ Unless told otherwise.

- ↳ Instruction loaded into IR.
- ↳ Processor interprets instruction and performs required actions.

↳ Address & Data + also performs program control  
program control

## A single bit memory using D latch:



## 4 Bit Register



Lecture - 6

### 4x8-Bit Register



### Fetch Operation

Lecture-7

2 memory chips with 4 registers each and chip select.



### Memory-model:



ANSWER

Addressing 8 registers with 4 address lines:



1000-1011  
 $(0x20 \rightarrow 0x23)$   
↳ Address range supported by chip-1

0x0F - 0x1F  
↳ (Address range not used)

0000 - 0011  
(0x00 - 0x03)  
↳ (Address range supported by chip-1)

## Fetch operation of memory



Instruction Cycle State DiagramsA simple 4b ISA:

- ↳ Each instruction is 4 bits
  - ↳ 2 bits specify the OPCODE
  - ↳ 2 bits specify the operand or address.

Registers

- Accumulator (Acc) : Holds the result of operations.
- Program Counter (PC) : Points to the next instruction to execute.
- Instruction Register (IR) : Holds the current instruction.
- Memory Address Register (MAR) : Holds memory addresses.
- Memory Data Register (MDR) : Holds data being transferred to/from memory.

Control Actions

- ↳ control signals :

- Fetch : Load the next instruction into the IR.
- Decode : Determine the operation to perform.
- Execute : Carry out the operation.

## Fetch cycle:

- ① MAR  $\leftarrow$  PC
- ② MDR  $\leftarrow$  Memory [MAR]
- ③ IR  $\leftarrow$  MDR
- ④ PC  $\leftarrow$  PC + 1

## Decode & Execute cycle:

- NOP : Do nothing

- LOAD R:
  - MAR  $\leftarrow$  IR [1:0]
  - MDR  $\leftarrow$  Memory [MAR]
  - ACC  $\leftarrow$  MDR

- STORE R:
  - MAR  $\leftarrow$  IR [1:0]
  - MDR  $\leftarrow$  ACC
  - Memory [MAR]  $\leftarrow$  MDR

- ADD R:
  - MAR  $\leftarrow$  IR [1:0]
  - MDR  $\leftarrow$  Memory [MAR]
  - ACC  $\leftarrow$  ACC + MDR

27/02/23 THI

## State Transition Diagram:

- State 0: Fetch instr.  $\rightarrow$  Go to State 1
- State 1: Decode instr.  $\rightarrow$  Go to the appropriate execution state based on the opcode.
- State 2: Execute LOAD  $\rightarrow$  Return to State 0.

## States:

- Execute STORE  $\rightarrow$  Return to State 0.
- Execute ADD  $\rightarrow$  Return to State 0.

## Instruction Cycle with Interrupts:



## INTERRUPTS

- ↳ Mechanism by which other modules (e.g., I/O) may interrupt normal sequence of processing.
- ↳ Program
  - ↳ e.g., overflow, division by zero
  - ↳ Attempt to execute an illegal instruction.
  - ↳ Referring outside the allowed memory space.
- ↳ Timer
  - ↳ To do something at regular intervals.
  - ↳ Generated by internal processor timer.
  - ↳ Used in pre-emptive multi-tasking.
- ↳ I/O
  - ↳ from I/O controller.
  - ↳ Intimating the computer about completion of tasks, arrival of signals, or reporting error conditions.
- ↳ Hardware failure
  - ↳ Reporting hardware failures or similar critical events.
  - ↳ Eg., Power failure or memory parity error.

## Program Flow Control:



## Interrupt Cycle

- ↳ Added to instruction cycle.
- Processor checks for interrupt
  - ↳ Indicated by an interrupt signal
- If no interrupt, fetch next instruction.
- If interrupt pending:
  - suspend execution of current program
  - Save context
  - Set PC to set address of interrupt handler routine
  - Process interrupt.
  - Restore context and continue interrupted program.

## Instruction Cycle (with Interrupt) - State Diagram



## Multiple Interrupts

- Disable interrupts
  - ↳ Processor will ignore further interrupts whilst processing one interrupt.
  - ↳ Interrupts remain pending and are checked after first interrupt has been processed.
  - ↳ Interrupts handled in sequence as they occur.
- Define priorities
  - ↳ Low priority interrupts can be interrupted by higher priority interrupts.
  - ↳ When higher priority interrupt has been processed, processor returns to previous interrupt.



### Multiple Interrupts - Sequential

User Program      Interrupt Handler X



### Multiple Interrupts - Nested

U.P.



### Hardware Traps

If processor traps to operating system

Processor traps to interrupt controller. interrupt controller traps to memory management unit. memory management unit traps to operating system.

Trap to interrupt controller. interrupt controller traps to memory management unit. memory management unit traps to operating system.

Leaving trap to user program

Trap to memory management unit. memory management unit traps to operating system.

## Lecture-1D

# PROCESSORS AND INTERCONNECTIONS

- The organization of a simple computer with one CPU, and two I/O devices.



## Connecting the Modules

- All the units must be connected by some sort of interface

  - Data Bus
  - Address Bus
  - Control Bus

- Two main types of bus:

  - Internal bus (within CPU)
  - External bus (CPU to peripherals / CPU)

- Different type of connection for different types of unit:
    - Memory
    - Input / Output
    - CPU



11-946693

## Bus

- A communication pathway connecting two or more devices.
- Usually broadcast W/ regards to bus width, no. of channels etc.
- often grouped
  - A no. of channels in a bus
  - 32-bit data bus is 32 separate single bit channels.
- Power lines may not be shown.

## Data Bus

- Carries data [No difference b/w 'data' & 'instruction' at this level]
- Width is a key determinant of performance.
  - ↳ 8, 16, 32, 64-bit.
- Data transfer capacity is determined by the databus.

## Address Bus

- Identify the source or destination of data.  
e.g., CPU needs to read an instruction (data) from a given location in memory.
- Bus width determines maximum memory capacity of system.  
e.g., 8080 has 16-bit address bus giving 64K address space.

## Control Bus

- Control and timing information
  - Memory & Read / Write signal
  - Interrupt request
  - Clock signals
  - Status signals

## Physical Realization of Bus Architecture:



## Single Bus Problems

- ↳ Lot of devices (I/O and memory) on one bus leads to
- ↳ Propagation delay → long data paths mean that co-ordination of bus use can adversely affect performance.
- Clock period for a 1 GHz clock frequency =  $1/1\text{ GHz} = 1\text{ ns}$ 
  - ↳ In long bus can cause 5ns delay @  $2 \times 10^8 \text{ m/s}$  signal speed
  - ↳ If bus length is 5cm, p.d. =  $0.25\text{ ns}$  → enough to delay a full clock for a 4 GHz processor clock.
- Aggregate data transfer capacity approaches bus capacity.

## Data Transfer Capacity of Data Bus

- ↳ Capacity limit: Max. data transfer capacity of a bus.
- Data transfer capacity of bus:  $D = W \times (H/C)$ 
  - width of bus
  - (word length)
  - clock speed of bus
  - cycles required per word cycle
  - [W bits]
  - [Hz]
  - [c cycle]

## Increasing Bus Capacity

- ① Increase clock frequency
  - ↳ Data transfer capacity can be increased proportionally.
  - ↳ Cannot be increased due to propagation delays, as well as design of the rest of the bus.
- ② Use higher bus width
  - ↳ 8 bits to 16 bits
  - ↳ Data transfer capacity is proportionately doubled.
- ③ Use shorter bus
  - ↳ Propagation delay is lower.
- ④ Use multiple/split buses
  - ↳ Each bus is shorter → Low propagation delay.
  - ↳ Collective capacity is higher → Traffic localization.

## Bus Design Elements

- ① Bus classification into various types.
- ② Based on the character of serial/parallel.
  - ↳ Bus can use one line (serial) or multiple lines (parallel).
    - ↳ VSB is a serial bus.
    - ↳ PCI is a parallel bus.
- ③ Based on information carried in a bus.
  - ↳ Address, Data or Control Bus.
- ④ Based on multiplexing character.
  - ↳ Dedicated vs. shared (or multiplexed)
- ⑤ Based on arbitration requirements.
  - ↳ Centralized or distributed (arbitration)
- ⑥ Based on synchronization requirement.
  - ↳ Synchronous or Asynchronous
- ⑦ Based on data transfer type permitted.
  - ↳ Read, write
  - ↳ Read-modify-write
  - ↳ Read-after-write
  - ↳ Block data transfer.

Lecture-III

### Dedicated

- ↳ Permanently assigned to one function or to a subsystem.
- ↳ Separate data and address lines.

### Multiplexed

- ↳ Time multiplexed
- ↳ Shared lines
- ↳ Address line or data valid control line
- ↳ Advantage: Fewer lines and pins
- ↳ Disadvantages:
  - More complex control
  - Performance limit.

## Bus Arbitration

- ↳ only one module may control bus at one time.
- ↳ More than one module controlling the bus
  - ↳ e.g., CPU and DMA controller.
- ↳ Arbitration helps in choosing the module (bus master).
- ↳ Arbitration
  - ↳ Decision on who gets the bus next.
  - ↳ Who will be the bus master (who holds the bus)?
    - ↳ Bus master can initiate a data transfer (e.g., read or write) with some other device.
    - ↳ May be centralised or distributed.

## Centralised Arbitration

- ↳ Single hardware device controlling bus access.
  - ↳ Bus controller
  - ↳ Arbiter
  - ↳ Special arbiter chip collects requests, decides.
  - ↳ Ensures fairness, but arbiter chip may itself become bottleneck.
- ↳ May be part of CPU or separate.
- ↳ Polling v.s. individually addressed.



## Distributed Arbitration

- ↳ Each module may claim the bus.
- ↳ Control logic on all modules.
- ↳ Back off and retry if not the highest priority request.
- ↳ No bottlenecks and fair, but needs a lot of control lines.



### Daisy-chain

- ↳ Devices connect to bus in a certain predefined priority order.
- ↳ High-priority devices interrupt/deny requests by low-priority
- ↳ Simple, bus slow and can't ensure fairness.
- ↳ A small centralized arbiter to service a large no. of devices that use a shared resource at a lower cost.



### Timing

- ↳ Coordination of events on bus.
- ↳ Synchronous
  - ↳ Events determined by clock signals.
  - ↳ Control Bus includes clock line.
  - ↳ A single I-O is a bus cycle.
  - ↳ All devices can read clock line.
  - ↳ Usually sync on leading edge.
  - ↳ Usually a single cycle for an event.

Bi-directional Buffer

Function Table:

| Enable G <sub>1</sub>      | Dirn Control D <sub>IR</sub> | Operation       |
|----------------------------|------------------------------|-----------------|
| Lowlevel $\rightarrow$ L   | L                            | B Data to A Bus |
| High level $\rightarrow$ L | H                            | A Data to B Bus |
| X                          | X                            | Isolation       |
|                            | ↑                            | Invert          |

(74LS245)

Synchronous Timing Diagram : ReadSynchronous Timing Diagram : Write

## Asynchronous Timing - Read Diagram



Lecture-13

(Page 21)

## Asynchronous Timing - Write Diagram



## Read-Modify-Write

- ↳ Simply a read followed immediately by a write to the same address.
- ↳ Address is only broadcast once at the beginning of the operation.
- ↳ Whole operation is typically indivisible. (Atomic operations)
  - ↳ To prevent any access to the data element by other potential bus masters.

- ↳ Protects shared memory resources in multiprogramming system
  - ↳ Prevents other processes from reading while you write.
  - ↳ Lock it for reading, so others will not read when writing.
  - ↳ Prevents deadlocks.
  - ↳ Prevents race conditions.



## Read-After Write Bus

- ↳ An indivisible/atomic operation.
- ↳ Address is only given once.
- ↳ Consisting of a write followed immediately by a read from the same address.

### Main purpose:

|         |            |           |
|---------|------------|-----------|
| Address | Data Write | Data Read |
|---------|------------|-----------|

- ↳ The read operation may be performed for checking purposes.
- ↳ Whether the data is written correctly.
  - ↳ Memory cell errors can be present.
  - ↳ Data Read must be same as Data Written.

## Block Data Transfer Bus

- ↳ One address cycle is followed by N data cycles.
- ↳ First data item is transferred to or from the specified address.
- ↳ The remaining data items are transferred to or from subsequent addresses.
- ↳ Main advantage:
  - ↳ High data transfer throughput
  - ↳ saves address transfer for ~~several~~ subsequent reads/writes.

|         |      |      |      |
|---------|------|------|------|
| Address | Data | Data | Data |
|---------|------|------|------|

## Multiple Bus Hierarchy

- ↳ Single buses face many issues.

### ① High propagation delay

- ↳ When many interfaces are connected.
- ↳ When the physical length is high.
- ↳ Errors may occur over the bus.

### ② Limited bus capacity

- ↳ Max bus capacity ( $C=1$ ) = Bus width  $\times$  clock speed

$$\text{8085 data bus capacity} = 8 \text{ bits} \times 3 \text{ MHz} = 24 \text{ Mbps}$$

- ↳ If bus transfer requires  $K$  clock cycles.

$$\text{Capacity} = 8 \text{ bits} \times (3 \text{ MHz}/K)$$

↳ Capacity when 3 clock cycles are required for a data transfer: 8 Mbps

↳ I/O or memory devices faster than these cannot be supported.

③ Increasing the capacity

↳ Increase bus width

↳ Increase clock frequency;

④ Multiple Buses can help with increasing capacity.

### Bandwidth / Data Rate characteristics of I/O devices

① Primary characteristic : Data rate or B/W.

② Contributing factors:

↳ Partner : Humans have slower input/output data rates than machines.

↳ Input or output or both (input/output).

### An Eg. of Multiple Bus Architecture (with cache)



### High Performance Bus



## Some Popular Bus Standards

- Industry Standards Architecture (ISA) Bus
  - PCI Bus
  - USB Bus
  - PCMCIA (Personal Computer Memory Card International Association) Bus
    - Extended ISA (EISA) Bus
    - Limited data rate (16.7 MB/s)
    - 133 MB/s required for video interface bus.
    - Express card Bus
  - Firewire (IEEE 1394)
  - Serial ATA (Serial AT Attachment)
  - SCSI (Small Computer System Interface)
- # PCI (Peripheral Component Interconnection) Bus
  - 133 MB/s (original)
  - 2.1 → 528 MB/sec.

## MEMORY

The internal or external module of CPU or I/O subsystem in a computer system where data is stored by a computer temporarily or permanently.

### Memory Characteristics:

#### ① Location

- CPU
  - Internal
  - External

#### ② Capacity

→ word size [natural unit of org.]

→ No. of words (or bytes).

#### ③ Unit of transfer

→ Internal → usually governed by data bus width.

→ External → usually a block which is much larger than a word.

→ Addressable unit → smallest location which can be uniquely addressed.

→ Word internally.

→ Cluster of memory disks.

#### ④ Access Method

#### ⑤ Performance

#### ⑥ Physical type

#### ⑦ Physical characteristics

#### ⑧ Organisation

## Access Methods

### ① Sequential

↳ start at the beginning and read through in order.

↳ Access time depends on location of data and previous location.

↳ High volume storage

↳ Archival storage

↳ Low cost of storage (e.g., tape).

### ② Direct

↳ Individual blocks have unique address.

↳ Access is by jumping to vicinity plus sequential search.

↳ Access time depends on current and previous location

(e.g., disk).

### ③ Random

↳ Individual addresses identify locations exactly.

↳ Access time is independent of location or previous access (e.g., RAM).

### ④ Associative

↳ Data is located by a comparison with contents of a portion of the state.

↳ Access time is independent of location or previous access (e.g., cache).

## Performance

### ① Access time

↳ time b/w presenting the address and getting the valid data.

### ② Memory cycle time

↳ time may be required for the memory to "recover" before next access.

↳ cycle time is access + recovery.

Lecture-15

③ Transfer Rate: Rate at which data can be moved

For Non-Random access memory:

$$T_N = T_A + n/R$$

,  $T_N$ : av. time to read or write  $N$ -bits  
 $T_A$ : av. access time  
 $n$ : no. of bits

R: transfer rate in bps

$$Jpbkot \cdot T_N = \frac{N}{\text{cycle time}}$$

## Physical Types

- ① Semiconductor - RAM
  - ② Magnetic - Disk & Tape
  - ③ Optical - CD & DVD
  - ④ Others - Hologram, Quantum states

## Physical Characteristics

- ① Volatility
    - ↪ Data loses when power supply is removed.
    - ↪ Magnetic & surface memories are non-volatile (decay possibility exists)
    - ↪ Semi-conductor memories
      - ↪ volatile
      - ↪ Non-volatile
        - ↪ Flash memory uses an electrically erasable programmable Read-only (EEPROM)
  - ② Erasable
    - ↪ Non-erasable memory can only be destroyed by physical elimination.
  - ③ Decay
    - ↪ Slow decline of the stored contents over a period of time.
  - ④ Bit-rotting
    - ↪ A new term identified for representing the loss of world's information stored in various memory media.
      - ↪ e.g. no hardware readers available → Data format is obsolete.
      - ↪ Data format is not known
      - ↪ Encoding is not known

## Memory organization

↳ Physical arrangement of bits into words.  
Not always obvious.

↳ Big endian vs. little endian.

↳ Dilemma of a memory system designer.

access time ↑ (fast) cost per bit ↑ capacity ↓ → Tradeoff

Address



↳ 8 bits = 1 byte  
MASK → 10000000000000000000000000000000  
addr & size → 11111111111111111111111111111111  
addr & size - 11111111111111111111111111111111  
what's mentioned, memory - word 0

↳ Organizing words within memory

Blocks vs interleaved memory



## Byte ordering

① Big Endian memory — SPARC, IBM → MSB stored at lowest memory address

② Little Endian memory — Intel Family → LSB stored at lowest memory address

Address Big endian

|    |    |    |    |   |
|----|----|----|----|---|
| 0  | 0  | 1  | 2  | 3 |
| 4  | 5  | 6  | 7  |   |
| 8  | 9  | 10 | 11 |   |
| 12 | 13 | 14 | 15 |   |

32-bit word ← Byte

Little endian

|    |    |    |    |
|----|----|----|----|
| 3  | 2  | 1  | 0  |
| 7  | 6  | 5  | 4  |
| 11 | 10 | 9  | 8  |
| 15 | 14 | 13 | 12 |

Address

|    |    |    |    |
|----|----|----|----|
| 0  | 1  | 2  | 3  |
| 4  | 5  | 6  | 7  |
| 8  | 9  | 10 | 11 |
| 12 | 13 | 14 | 15 |

32-bit word ← Byte

## Memory Hierarchy

### ① Registers

↪ In CPU

### ② Internal or Main memory

↪ May include one or more levels of cache  
↪ "RAM"

### ③ External memory

↪ Backing store

### Advantages:

- Decreased cost per bit
- Increased capacity
- Decreased frequency of access of memory by the processor.
- Average decreased access time

### Disadvantages:

- Increased access time for down (the hierarchy).
- Increased hardware complexity.

Eg. Suppose that the processor has access to two levels of memory.

Level 1 contains 1000 words and has an access time of  $0.01 \mu s$ ;  
level 2 contains 100,000 words and has an access time of  $0.1 \mu s$ .

Assume that if a word has to be accessed is in level 1, then the processor accesses it directly. If it is in level 2, the word is first transferred to level 1 and then accessed by the processor. For simplicity, we ignore the time required for the processor to determine whether the word is in level 1 or level 2. Figure shows the general shape of the curve that covers this situation; shows the average access time to a two-level memory as a function of the hit ratio  $H$ , where  $H$  is defined as the fraction of all memory accesses that are found in the faster memory (e.g., the cache).  $T_1$  is the access time to level 1,  $T_2$  to level 2. As can be seen, for high %age of level 1 access, the average total access time is much closer to that of level 1 than that of level 2.



Ques:



Suppose 95% of the memory accesses are found in the cache. Then, avg. time to access word :

$$\begin{aligned}
 & (0.95)(0.01\mu s) + (0.05)(0.01 + 0.1)\mu s \\
 & = 0.095 + 0.0055 \\
 & = 0.015\mu s \rightarrow \text{much closer to } 0.01\mu s \text{ than to } 0.1\mu s, \text{ as desired.}
 \end{aligned}$$

$$\begin{aligned}
 & \text{If Hit Ratio is 50\%:} \\
 & (0.5)(0.01\mu s) + (0.5)(0.1\mu s) \\
 & = 0.005 + 0.05 \\
 & = 0.055\mu s
 \end{aligned}$$

### Hierarchy List :

- Registers
- L<sub>1</sub> cache
- L<sub>2</sub> cache
- L<sub>3</sub> cache
- Main memory
- Disk cache
- Disk storage
- Optical storage
- Tape storage

### Locality of Reference

↳ During the course of the execution of a program, memory references tend to cluster.

↳ Eg loops

Cache → Small amount of fast memory

↳ Sits b/w normal main memory and CPU

↳ May be located on CPU chip or module.

Cache and Main MemoryTraditional Cache (ISA) (with cache):Cache Operation

- CPU requests contents of memory location.
- Check cache for this data.
- If present, get from cache (fast).
- If not present, read required block from main memory to cache, then deliver from cache to CPU.
- Cache includes tags to identify which block of main memory is in each cache slot.

Flowchart:



## Typical cache organization



## Cache Design

- ① Addressing
    - logical
    - Physical
  - ② Size
  - ③ Mapping function
    - Direct
    - Associative
    - Set Associative
  - ④ Replacement Algorithm
    - Least Recently Used (LRU)
    - First In First Out (FIFO)
    - Least Frequently Used (LFU)
    - Random
  - ⑤ Write Policy
    - Write Through
    - Write Back
    - Write Once
  - ⑥ Block Size (Line Size)
    - single or two level
    - Unified or split
  - ⑦ No. of Caches

## Cache Addressing (Logical and Physical)

- Where does cache sit?
  - ↳ Between processor and virtual memory management unit (MMU)
  - ↳ Between MMU and main memory.
- Logical cache (virtual cache) stores data using virtual addresses.
  - Processor accesses cache directly, not through physical cache.
  - Cache access faster, before MMU address translation.
  - Virtual addresses use same address space for different applications.
- Must flush cache on each context switch.



- Physical cache stores data using main memory physical addresses.

### Cache Size does matter

- Cost → Large cache is expensive.
- Speed

↳ Large cache is faster (upto a point)

↳ checking cache for data takes time.

## Mapping Function

↳ Defines the method of mapping b/w main memory address & cache location.

e.g.,

- Cache of 64 KBytes
- Cache block of 4 bytes

↳ Cache is 16K ( $2^M$ ) lines of 4 bytes

• 16 MB main memory → Direct Mapping → simple  
• 24 bit address → (Fully) Associative Mapping → improved hit rate  
 $(2^{24} = 16M)$  → set Associative Mapping → Hardware complexity / expensive

### Direct Mapping:

↳ Each block of main memory maps to only one cache line, i.e., if a block is in cache, it must be in one specific place.

↳ Address is in two parts.

↳ Least significant w bits identify unique word.

↳ Most significant s bits specify one memory block.

↳ The MSBs are split into a cache line field r and a tag of s-r (most significant).



First  $m$  blocks of main memory (equal to size of cache)

Cache memory

$b$ : length of block in bits  
 $t$ : length of tag in bits.

### Address Structure:

| Tag (s-r) | Line or slot (r) | word |
|-----------|------------------|------|
| 8         | 14               | 2    |

- 24 bit address
- 2 bit word identifier (4 byte block)
- 22 bit block identifier
  - ↳ 8 bit tag ( $= 22 - 14$ )
  - ↳ 14 bit slot or line
- No 2 blocks in the same line have same tag.
- check contents of cache by finding line & checking tag.



### Cache Line Table:

cache line

Main memory blocks held

0

0, m, 2m, 3m, ..., 2s-m

1

1, m+1, 2m+1, ..., 2s-m+1

⋮

m-1

m-1, 2m-1, 3m-1, ..., 2s-1

Lecture-17

### Direct Mapping Summary:

- Address length =  $(s+w)$  bits
- No. of addressable units =  $2^{(s+w)}$  words or bytes
- Block size = line size =  $2^w$  words or bytes
- No. of blocks in main memory =  $2^{(s+w)} / 2^w = 2^s$
- No. of lines in cache =  $m = 2^s$
- Size of tag =  $(s-w)$  bits

### Direct Mapping Pros & Cons:

- Simple
- Inexpensive (less hardware complexity)
- Fixed location for given block
  - If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high.
  - Repeatedly cache miss can occur due to the execution of the same blocks of code.
  - Cache trash.



for word size  
program size  
(size of page of loop)  
: greatest Nibble

below  
tid s

tid s for

word for word tid s does this benefit for tid s ←  
tid ref word of other mi writing for tid s benefit for program ←

## Victim Cache

↪ smaller than cache with similar speed, as the cache.



↪ Lower Miss Penalty

↪ Remember what was discarded

↪ Already fetched

↪ Use again with little penalty

↪ Fully associative

↪ 4 to 16 cache lines

↪ Between direct mapped L1 cache and next memory level.

## Associative Mapping

↪ A main memory block can load into any line of cache.

↪ Memory address is interpreted as tag and word.

↪ Tags uniquely identifies block of memory.

↪ Every line's tag is examined for a match.

↪ Cache searching gets expensive.

## Associative Mapping from Cache to Main Memory:



## Address Structure:



→ 22 bit tag stored with each 32 bit block of data.

→ Compare tag field with tag entry in cache to check for hit.

→ Least significant  $s$  bits of address identify which  $l$  bits word is required from 32 bit data block.

### Summary:

- Address length =  $(s+w)$  bits
- No. of addressable units =  $2^{s+w}$  words or bytes
- Block size = Line size =  $2^w$  words or bytes
- No. of blocks in main memory =  $\frac{2^{s+w}}{2^w} = 2^s$
- No. of lines in cache = undetermined
- Size of tag =  $s$  bits.

### Set Associative Mapping

↳ Cache is divided into a no. of sets.

↳ Each set contains a no. of lines.

↳ A given block maps to any line in a given set.

e.g., Block B can be in any line of set i.

↳ e.g., 2 lines per set (2 way)

↳ 2-way associative mapping

↳ A given block can be in one of 2 lines in only one set.

$$\rightarrow m = v \times k \quad \text{where } w_s = \log_2 m = \log_2 v \times k$$

$$\rightarrow i = j \bmod v \quad \text{where } v \text{ is no. of lines in each set}$$

i : cache set no.      j = address no. mod k

j : main memory block no.      v = no. of lines in each set

m : no. of lines in the cache      w\_s = log\_2 m = log\_2 v \times k

v : no. of sets

k : no. of lines in each set      k = 2^w

(K-way associative mapping)

→ Block Bj can be mapped to any of the lines of set j.

Eg. 13-bit set no.

Block no. in main memory is modulo  $2^{13}$

$$i = j \bmod v$$

$$v = 2^{13}$$

→ 000000, 00A000, 00B000, 00C000 ... map to same set.

← (32bit)  
(32bit)

→ 8-bit

→ 8-bit  
→ 8-bit

→ 8-bit  
→ 8-bit

## Mapping from main memory to cache:



## Address Structure:

|           |            |            |
|-----------|------------|------------|
| Tag 9 bit | Set 13 bit | 2 bit word |
|-----------|------------|------------|

- Use set field to determine cache set to look into.
- Compare tag field to see if we have a hit.

## Summary:

- Address length =  $(s+w)$  bits
- No. of addressable units =  $2^{sw}$  words or bytes.
- Block size = line size =  $2^w$  words or bytes
- No. of blocks in main memory =  $2^d$
- No. of lines in set =  $k$
- No. of sets =  $v = 2^d$
- No. of lines in cache =  $Kv = k * 2^d$
- No. of tag =  $(s-d)$  bits.

Lecture-18



## Direct & set Associative cache Performance Differences:

- Significant upto at least 64 KB for 2-way
- Difference b/w 2-way and 4-way at 4KB  $\ll$  4KB to 8KB.
- Cache complexity increases with associativity.
- Not justified against increasing cache to 8KB or 16KB.
- Above 32 KB gives no improvement.

## Locality of Reference in cache Design

### • Spatial locality

- ↪ Memory addresses numerically similar to a recently accessed memory location are likely to be accessed in the near future.
- ↪ Cache exploit this property.
  - ↪ by bringing in more data (block size or line size) than have been requested
  - ↪ future requests can be anticipated.

### • Temporal locality

- ↪ Occurs when recently accessed memory locations are likely accessed again.
- ↪ Eg:
  - memory locations near the top of the stack.
  - instructions inside a loop
- ↪ Temporal locality is exploited in cache designs primarily by the choice of what to discard on a cache miss.

## Replacement Algorithm

- ↪ Replacing the contents of the cache memory which is full when a new item is brought in.

## Replacement Algorithms in Direct Mapping caches:

- ↪ Mapping leaves no choice for the data block to be stored.
- ↪ Each block only maps to one unique line.
- ↪ Replace that line.

First  $m$  blocks of  
main memory  
(equal to)  
(cache size)

$B_{m-1}$



## Replacement Algorithms and Associativity of Set Associative Caches:

→ Replacement algorithms are relevant for Associative and Set-Associative Mapped caches.

→ Algorithms are implemented in hardware

→ Speed of operation is very critical.

Eg In 2 way set associative

→ which of the 2 block is LRU?

→ There are main algorithms:

• Least Recently Used (LRU)

• First In first Out (FIFO)

• Most Frequently Used (MFU) / Least Frequently Used

• Random

### Least Recently Used (LRU)

→ Maintain the information about the usage of cache lines.

→ A new memory request facing a cache miss results in memory access.

→ The least recently used item is replaced.

→ Most popular algorithms in cache replacement.

→ How to keep track of the usage

→ Easy to be implemented in a fully-associative cache

→ 1 bit for 2-way set associative caches (1 for the recently accessed item and 0 for the other item) and replace the older item.

→ Maintains a separate list of indexes to all the lines in the cache.

→ When an item is accessed, it is moved to the front of the list.

→ Item from the end of the list is removed when a new item is cached.



## First In First Out (FIFO)

- ↪ Replace block that has been in cache longest.
- ↪ First item stored in the cache is moved out when the cache is full and a new item is to be taken into the cache.
- ↪ Easy to implement using a circular or round-robin buffer.

## Least Frequently Used

- ↪ Replace block which has had fewest hits.
- ↪ Maintain a counter against each cache line.
- ↪ Increment the counter with each reference.
- ↪ Replace the item with lowest count with new data item.
- ↪ Reset the counter.

## Random Replacement

- ↪ Easy to implement.
- ↪ Faster processing.
- ↪ Performance is not very poor in comparison to the rest of the algorithms.

## Cache Write Policy

- ↪ Writing data into memory/cache faces some issues.
- ↪ Processor modifies the data and writes to cache/Memory.
- ↪ Data is first written into the cache.
  - ↪ It must also be updated on main memory.
- ↪ Cache content modified by the processor shall be written onto main memory.
- ↪ Must not overwrite a cache block unless main memory is up-to-date.
- ↪ Data inconsistency can be resulted due to a no. of issues.

- ↪ Multiple CPUs may have individual caches.
- ↪ I/O may address main memory directly which may update the main memory contents.

Cache Write Policy

→ Must not overwrite a cache block unless main memory is upto date.

→ Even if

• Multiple CPUs present and they may have individual caches or

• I/O may address main memory directly.

Write Through

→ All writes go to main memory as well as cache.

→ Main memory is always updated.

→ Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache upto date.

→ Lots of traffic

→ slows down writers.

Write Back

→ Updates initially made in cache only.

→ Update bit for cache slot is set

when update occurs.

→ If block is to be replaced, write to main memory only if update bit is set.

→ Other caches get out of sync.

→ I/O must access main memory through cache.

→ complex circuitry required.

→ Memory wasted

→ Normal applications : Only 15% (85% for READ)

→ HPC applications : 33-50%.

Example: Write Back v/s. Write Through ?

• Cache with line size :

– 32 bytes (8 words of 4 bytes each)

– Assuming each word gets a write at least once.

– Main memory data transfer rate : word (4 bytes) in 30 ns.

- Write-through case:

- No. of word writes :  $32/4 = 8$  word writes
- No. of write back line requires : 8 [Each word writes takes line]
- Each word write takes 30 ns.
- Total write time required :  $30\text{ns} \times 8 \text{ words} \times 8 \text{ writers}$
- Write-back case:
- No. of word writes :  $32/4 = 8$  writes
- No. of write back of line requires : 8 [Each line update takes 30 ns.]
- Each line update takes 30 ns.
- Total write time required :  $30\text{ns} \times 8 = 240\text{ns}$

## Shared Main Memory Systems

- When main memory is shared
  - ↳ By multiple processor-cache systems
  - ↳ Cache coherency is a major issue.
  - ↳ If one cache line is altered, main memory and other caches invalidate the data.
  - ↳ Present in write-through caches also.
- Method for Cache coherency
  - ↳ Bus watching with write-through
    - ↳ Each cache controller monitors the address bus.
    - ↳ If any other cache controller writes to a main memory block, the other caches invalidate the cache data (if present).
    - ↳ Useful for write-through caches.

### # Hardware transparency

### # Noncacheable memory

- ↳ Only a portion of the memory is shared among the processors [designated as "non-cacheable" memory].
- ↳ All access to this area face miss.

### # Hardware transparency

- ↳ Additional hardware circuitry is used to update all caches whenever there is a write to main memory through cache.

Line Size

- Retrieve not only desired word but a no of adjacent words as well.
- Increased block size will increase hit ratio at first.
  - the principle of locality
- Hit ratio ↓ as block size becomes even bigger.
  - Probability of using newly fetched info. becomes less than probability of Reusing/replaced.
- Larger blocks
  - No. of blocks that fit in cache ↓
  - Data overwritten shortly after being fetched.
  - Each additional word is less local so less likely to be needed.
- No definitive optimum value (8 to 64 bytes - reasonable).

Multilevel caches

- High logic density enables caches on chip
  - faster than bus access
  - frees bus for other transfers
- Common to use both one and off chip cache.
  - L<sub>1</sub> on chip, L<sub>2</sub> off chip in static RAM
    - access much faster than DRAM or ROM
    - often uses separate datapath.



Use of large L2 cache is numbered to avoid miss →  
program. miss of L1 cache is short because L2 cache  
is also significant

## Unified and Split caches

- Unified cache → one cache for data and instructions.
- Split cache
  - ↳ Two separate caches
  - ↳ one for data & one for instructions.
  - ↳ Advantage: Eliminates cache "contention" b/w instruction fetch/decode unit and execution unit.
    - ↳ Important in pipelining.
- Unified cache advantage:
  - ↳ High hit rate
    - ↳ Balances load of instruction and data fetch.
    - ↳ only one cache to design and implement.

Lecture-20

## Pipelining

A five-stage pipeline:

- ↳ The state of each stage as a function of time.



## Superscalar Architectures

- ↳ Dual five-stage pipelined with a common instruction fetch unit.



→ A superscalar processor with 5 functional units:



Fetch-50

privilege

: privilege state A



Superscalar Architecture



# I/O SUBSYSTEM

## Input/Output Problems

↳ Challenges in interfacing with I/O.

- Wide variety of peripherals

↳ Delivering different amounts of data

↳ at different speeds

↳ in different formats

- Most are slower than CPU and RAM

↳ Some may be even faster (e.g., optical fiber links)

- Need I/O modules that can handle the diversity



## Input/Output Modules

↳ Main functions:

↳ Interface to CPU and Memory

↳ Interface to one or more peripherals.

## Lecture-21



## Generic Model of I/O Module

### Diversity of External Devices

- Human Readable → Screen, printer, keyboard

- Machine Readable → Monitoring and control, Sensor data

- Communication → Modem, Network Interface Card (NIC)

- Based on Data flow direction :

↳ Input devices:

↳ Output devices:

↳ Input & Output Devices:

Communication interfaces



## Bandwidth/Data Rate Characteristics

→ Major types of I/O devices:

- Human readable

- Machine readable

- Communications

Arrived at subAM of

1

2

3

4

5

6

7

→ Contributing factors:

- Partner: humans have slower input/output data rates than machines.
- Input or output or both.

→ Physical characteristic:

- Data rate
- Response time (delay characteristics)
- Buffer requirement
- Any other special requirements.

## Functions of I/O Modules

- ① Control and Timing
- ② CPU Communication
- ③ Device (peripheral) communication.
- ④ Data Buffering
- ⑤ Error Detection

### I/O Steps (Typical)

- CPU checks I/O Module device status.
- I/O module returns status.
- If ready, CPU requests data transfer.
- I/O Module gets data from device.
- I/O Module transfers data to CPU.
- Variations for output, DMA, etc.



### I/O Module Decisions

- Hide or reveal device properties to CPU.
- Support multiple or single device.
- Control device functions or leave for CPU.
- Also OS decisions
  - ↳ e.g., Unix treats everything it can as a file.

## Input Output Techniques

- Programmed I/O operation
- Interrupt driven I/O operation
- Direct Memory Access (DMA) based I/O operation

### Programmed I/O

- CPU has direct control over I/O.
- Sensing status
- Read/Write commands
- Transferring data
- CPU waits for I/O module to complete operation.
- Wastes CPU time.

- CPU requests I/O operation.
- I/O module performs operation.
- I/O module sets status bits.
- CPU checks status bit periodically.
- I/O module does not inform CPU directly.
- I/O module does not interrupt CPU.
- CPU may wait or come back later.

### I/O Commands

- CPU issues address
  - Identifies module and device.
- CPU issues command
  - Control - telling module what to do.  
(e.g., spin up disk)
  - Test - check status  
(e.g., power?, ~~Error?~~)
  - Read/Write
    - Module transfers data via buffer from/to device.

### Addressing I/O Devices

- Under programmed I/O data transfer is very like memory access (CPU viewpoint).
- Each device given unique identifier.
- Each command contains Identifier (address).



Lecture-22

## I/O Mapping

→ Memory mapped I/O

- ↳ Devices and memory share an address space.
  - ↳ I/O looks just like memory read/write.
  - ↳ No special commands for I/O
    - ↳ Large selection of memory access commands available.
  - Isolated I/O or Peripheral I/O or I/O mapped I/O
    - ↳ Separate address spaces.
    - ↳ Need I/O or memory select lines.
    - ↳ special commands for I/O
      - ↳ limited set







## Mode 0: Simple Input/Output

- Ports A and B can be used as two simple 8-bit I/O ports
  - Port C as two 4-bit I/O ports
    - ↳ Each sub-ports can be programmed separately.
  - Outputs are latched (buffered)
  - Inputs are not latched.
  - Ports do not interrupt or handle interrupt capability.

Control word:





Port A =  $8000H$  ( $A_1=0, A_0=0$ )  
 Port B =  $8001H$  ( $A_1=0, A_0=1$ )  
 Port C =  $8002H$  ( $A_1=1, A_0=0$ )  
 Control Register =  $8003H$  ( $A_1=1, A_0=1$ )

LDA H80,A TIA  
 [PB] → ACC → PA

TIA

### Program:

MVI A, 83H → Load ACC with control word  
 STA 8003H → Write word in the control register to initialize the ports  
 LDA 8001H → Read Switches at port B  
 STA 8000H → Display the reading at port A  
 LDA 8002H → Read switches at port C  
 ANI OFH → Mask the upper 4 bits of PCA (not i/p data)  
 .B RLC → Rotate & place data in upper half of the Acc.  
 RLC  
 RLC  
 STA 8002H → Display data at port C  
 HLT

### BSR Mode:



e.g. Write a BSR control word subroutine to set bit PC<sub>7</sub> and PC<sub>3</sub> and reset them after 10 ms.

### BSR control words

|                              | D <sub>7</sub> | D <sub>6</sub> | D <sub>5</sub> | D <sub>4</sub> | D <sub>3</sub> | D <sub>2</sub> | D <sub>1</sub> | D <sub>0</sub> |
|------------------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| To set bit PC <sub>7</sub>   | = 0            | 0              | 0              | 0              | 1              | 1              | 1              | 1              |
| To reset bit PC <sub>7</sub> | = 0            | 0              | 0              | 0              | 1              | 1              | 1              | 0              |
| To set bit PC <sub>3</sub>   | = 0            | 0              | 0              | 0              | 0              | 1              | 1              | 1              |
| To reset bit PC <sub>3</sub> | = 0            | 0              | 0              | 0              | 0              | 1              | 1              | 0              |

$$0FFH = 0111 1111$$

$$0EH = 0001 1110$$

$$07H = 0000 1111$$

$$06H = 0000 1101$$

### Port Address:

Control ~~and~~ register address = 83H

### Subroutine:

BSR: MVI A,0FH → Load byte in Acc to set PC7  
OUT 83H → Set PC7=1

MVI A,07H

(0=A,0=7) OUT 83H = A bcd

(1=A,0=7) CALL DELAY → 10 ms delay

(0=A,1=7) MVI 0,06H

(1=A,1=7) → Load Acc with the byte to reset PC3

OUT 83H

→ Reset PC3

MVI A,0EH

OUT 83H

RET

breadstrokes Atm 32A board ← HEB, H1VM  
atreq ext enable of port A board ext int breadstrokes ← H5008 AT2

a req to arbiter board ← H1008 AT1

a req to arbiter ext pins →

a req to arbiter board ← H5008 AT3

Lecture-25

### Mode 1

- Ports A and B ~~are~~ at 8 bit I/O ports
- Port C lines are used for handshaking for Ports A & B.
- I/P and O/p are latched.
- Interrupt logic (handshake is supported).

### Mode 2 : Bi-directional data transfer

- For data transfer b/w two computers, storage controllers.

#### → Port A

- can be configured as bi-directional port

- uses 5 lines from Port C for handshaking.

#### → Port B

- Either in Mode 0 or Mode 1.

#### → Port C

- 5 lines are for Port A handshaking

- 3 lines can be used as simple I/O or handshake for port B.

Tid = 111

ext breadstrokes tid=100 of grtivated board board R28 is H1VM of  
int of rjfe mmt board board

breadstrokes R28

H70 = 1 1 1 0 0 0 0 = Tid + 100 of

H30 = 0 1 1 0 0 0 0 = Tid + 100 of

H50 = 1 1 0 0 0 0 0 = Tid + 100 of

Tid + 100 of

Tid + 100 of

## Interrupt Driven I/O

- CPU overcomes waiting.
- No repeated CPU checking of device.
- I/O module interrupts when ready.

### Basic operation:

- CPU issues read command.
- I/O module gets data from peripheral whilst CPU does other work.
- I/O module interrupts CPU.
- CPU requests data.
- I/O module transfers data.

### Design issues:

- Identifying the module issuing the interrupt.
  - Dealing with multiple interrupts (ie, an interrupt handler being interrupted)
  - Each interrupt line has a priority.
  - Higher priority lines can interrupt lower priority lines.
  - Bus mastering, only current master can interrupt.
  - Different lines for each module.
- ↓  
↳ PCI  
↳ limits no. of devices.

### Software poll

- ↳ CPU asks each module in turn.
- ↳ Slow

### Daisy chain or Hardware poll

- ↳ Interrupt Acknowledge sent down a chain.
- ↳ Module responsible places vector on bus.
- ↳ CPU uses vector to identify handler routine.

### Bus Master

- ↳ Module must claim the bus before it can raise interrupt.
- ↳ E.g., PCI & SCSI



## Internal Memory

→ Internal Memory

↳ Cache/Main memory

→ External Memory

↳ External storage devices

↳ HDDs/ Disks/ TAPE Devices

## Semiconductor Memory Types:

| Memory Type                         | Category           | Erasure                                     | Write Mechanism                  | Volatility   |
|-------------------------------------|--------------------|---------------------------------------------|----------------------------------|--------------|
| Random-access memory (RAM)          | Read-write memory  | Electrically, byte-level                    | Electrically                     | volatile     |
| Read-only memory (ROM)              | Read-only memory   | Not possible                                | Masks (Chip layout, high volume) |              |
| Programmable ROM (PROM)             |                    |                                             |                                  |              |
| Erasable PROM (EPROM)               | Read-mostly memory | UV Light, chip-level (exposure for 10-20nm) | Electrically                     | Non-volatile |
| Electrically Erasable PROM (EEPROM) |                    | Electrically, byte-level                    |                                  |              |
| Flash Memory                        |                    | Electrically, block-level                   |                                  |              |

## Memory-cell operation (bit):



RAM → Read/write

→ Volatile

→ Temporary storage

→ Static or dynamic

## Dynamic RAM

↳ Bit stored as charge in capacitors

↳ charges leak

↳ Need refreshing even when powered

↳ Simplex construction.

- Smaller area per bit
- Less expensive
- Need refresh circuits
- Slower
- Main memory
- Essentially analog

Level of charge determines value.

### DRAM Operation:

→ Address line active when bit is selected. Read or written.

↳ Transistor switch closed (current flows)

→ Write:

↳ Voltage to bit line (High  $\rightarrow$  1, Low  $\rightarrow$  0)

↳ Then signal address line

↳ Transfers charge to capacitor.

→ Read:

↳ Address line selected

↳ Transistor turns ON

→ Charge from capacitor fed via bit line to sense amplifier

↳ Compared with ref. value to determine 0 or 1.

→ Capacitor charge must be restored.



### DRAM Organisation: Many ways

↳ 16 Mb chip → organised as 1M of 16 bit words.

↳ about  $2048 \times 2048 \times 4$  bit array

Refreshing: Reduces no. of address pins.

↳ Refresh circuit included on chip.

↳ Disable chip.

↳ Count through rows.

↳ Read & write back.

↳ Takes time

↳ Slows down apparent performance.

### Static RAM

↳ Bits stored as on/off switches.

↳ Transistors used for bi-stable multi-vibrators

↳ Digital → uses flip-flops.

↳ No charges to leak.

↳ No refreshing needed when powered.

↳ Does not need refresh circuits.

↳ More complex construction.

- Larger size/area per bit.
- More expensive.
- ~~slow~~ faster.
- Suitable for cache memory.



### Static RAM operation:

→ Transistor arrangement gives logic state.

→ State 1 :  $C_1$  high,  $C_2$  low

$T_1, T_4$  OFF ;  $T_2, T_3$  ON

→ State 0 :  $C_2$  high,  $C_1$  low

$T_2, T_3$  OFF ;  $T_1, T_4$  ON

→ Address line transistors  $T_1$  and  $T_6$  are switch.

→ Write - apply value to B & complement to B buffer

→ Read - value is on line B.

## ROM

↳ Permanent storage

↳ Non-volatile

↳ Microprogramming

↳ Library subroutines

↳ Systems programs (BIOS)

↳ Function tables.

## External Memory

### Types:

- Magnetic Disk (HDD, Floppy Drive)
  - RAID (Redundant Array of Inexpensive/Independent Disks)
  - Removable
- Optical
  - CD-ROM
  - CD-Recordable (CD-R)
  - CR-R/W
  - DVD
- Magnetic Tape → Archival storage of large volume of data.

### Magnetic Disk

- ↳ Disk substrate coated with magnetizable material (Iron oxide).
- ↳ Substrate used to be aluminium [Now glass].

### Read-Write Mechanism

- ↳ Recording & retrieval via conductive coil called head
  - ↳ May be single or separate for Head/Write.
- ↳ Write: Current through coil produces magnetic field
  - ↳ Pulses sent to head
  - ↳ Magnetic pattern recorded on surface below.
- ↳ Read (Traditional)
  - ↳ Mag. field moving relative to coil produces current
  - ↳ same coil for Head/Write.
- ↳ (Contemporary)
  - ↳ Separate Head/Write head, close to write head.
  - ↳ Higher storage density and speed.
  - ↳ Partially shielded magneto resistive (MR) sensor.

### Data Organization and Formatting

- Concentric rings of tracks
  - ↳ Gaps b/w tracks (reduce to increase capacity).
  - ↳ Same no. of bits per track (variable packing density).
    - ↳ Constant angular velocity.
- Tracks divided into sectors.
- Min. block size is one sector.
- May have more than one sector per block.

BC-30101



(With help of diagram)

### Disk velocity

- ↳ Bit near centre of rotating disk passes fixed point slower than bit on outside of disk.
- ↳ Increase spacing b/w bits in different tracks.
- ↳ Rotate disk at constant angular velocity.
  - ↳ Individual tracks and sectors addressable.
  - ↳ Move head to given track and wait for given sector.
  - ↳ Waste of space on outer tracks.
- ↳ Lower data density.
- ↳ Can use zones to increase capacity [fixed bits per track].

### Characteristics

- ① Fixed Head (stationary)
  - ↳ One R/W per track
  - ↳ Head mounted on fixed bridged arm.
  - ↳ Movable head
    - ↳ One R/W per side
    - ↳ Mounted on a movable arm.
- ② Removable disk
  - ↳ Can be replaced
  - ↳ Permanently mounted
  - ↳ Easy data transfer in the drive
- ③ Single sided
  - Double sided
  - ↳ Usually
- ④ Single platter
  - Multiple platters
  - ↳ One head per side (joined & aligned)
  - ↳ Aligned tracks on each platter form cylinders
  - ↳ Data is striped by cylinder
  - ↳ Reduces head movement
  - ↳ Increases speed (transfers rate).



## ⑤ Head Mechanism

- contact (Floppy)
  - ↳ Head physically touched platter/media
  - ↳ Slow access
  - ↳ Low density of storage
  - ↳ Possibility of damage to the media
  - ↳ High wear and tear.
- Fixed Gap → not very common
  - ↳ Maintains gap b/w head and media, (as small as possible)
  - ↳ Complex to maintain the gap
  - ↳ Speed of access is faster.
- Flying (Winchester)
  - ↳ Head rests on media when no rotation.
  - ↳ When media/disk rotated at high speed, head is aerodynamically lifted and a gap is maintained.
  - ↳ Common in Hard disk drives.

## Speed of Data Transfer

- Seek time: Moving head to correct track
- (Rotational) latency: Waiting for data to rotate under head.
- Access time = Seek + Latency



- Seek time
  - Initial start up time
  - Time taken to traverse the tracks that have to be crossed.
  - Settling time.
  - Typically, <10ms for HDDs.
  - Mostly contributed by mechanism design.

- Rotational delay: 3600 RPM to 20000 RPM (speed)
  - ↳ R.D ~ 1.5 ms (average)

Transfer time,  $T = \frac{B}{RN}$ , B : no. of bytes to read  
N : no. of bytes on a track  
R : rotational speed in RPS.

$$\rightarrow \text{Access time} = \text{Av. seek time} + \text{Rotational delay} + \text{Data Transfer time}$$

$$\hookrightarrow T_a = T_{\text{seek}} + T_{\text{rot}} + \text{Transfer time}$$

$$= T_s + \frac{l}{2\pi r} + \frac{B}{N}$$

Lecture-29

## Solid State Drives

- ↳ NAND flash memory-based drives
- ↳ High voltage is able to change the configuration of a floating-gate transistor.
- ↳ State of the transistor interpreted as binary data.
- ↳ More resilient against physical damage.
- ↳ Greatly reduced power consumption (no mechanical moving parts).
- ↳ Much faster than hard drives
  - ↳ No penalty for random access
  - ↳ Each flash cell can be addressed directly.
  - ↳ No need to rotate or seek.
- ↳ Extremely high throughput.

## Performance of External Memory

- ↳ Data recoverability in the event of disk failures is critical.
- ↳ Two approaches
  - ① single large expensive disk
    - ↳ High capacity, high transfer rate
    - ↳ Low access time
  - ② Redundant Array of Inexpensive (Independent) Disks [RAID]
    - ↳ Stored data is striped among several physical drives
    - ↳ Redundant disks provide higher reliability, robustness and fault tolerance.
    - ↳ High data recoverability in the event of disk failure
    - ↳ O.S. sees a single logical drive.

## RAID

- ↪ Not a hierarchy
- ↪ Set of physical disks viewed as single logical drive by O/S.
- ↪ Data distributed across physical drives [by striping].
- ↪ Can use redundant capacity to store parity information.

## RAID 0

- No redundancy
- Data striped across all disks [Round Robin striping]
- Increase speed (disks seek in parallel)



## RAID 1

- Mirrored disks
- ↪ 2 copies of each strip on separate disks
- ↪ Read from either, write to both
- ↪ Simple recovery, expensive



## RAID 2

- Disks are synchronised
- ↪ very small stripes (often single byte/word)
- ↪ Error correction calculated across corresponding bits on disks.
- ↪ Multiple parity disks store Hamming code error correction in corresponding positions.
- ↪ Lots of redundancy, expensive, not used.



### RAID 3 → Similar to RAID 2

↳ only one redundant disk, no matter how large the array.



↳ simple parity bit for each set of corresponding bits.

↳ very high transfer rate.

Recovery:  $x_p = x_1 + x_2 + x_3$  [0, 1, 0]

$$\Rightarrow x_p = 0 + 1 + 0 = 1 \equiv \text{even parity}$$

Disk 2 is damaged  $\Rightarrow x_2 = x_p + x_1 + x_3$   
 $= 1 + 0 + 0 = 1 \rightarrow \text{Recovered.}$

### RAID 4 → Each disk operates independently.

↳ Not sync



↳ works with stripes (not words or bits).

↳ Good for high I/O request rate.

↳ Large stripes

↳ Bit-byte parity calculated across stripes on each disk.

↳ Parity stored on parity disk.

### RAID 5 → Similar to RAID 4 ; reduces parity disk bottleneck

↳ Avoids RAID 4 bottleneck at parity disk.

↳ commonly used in network servers.

↳ Complex Recovery



### RAID 6 → Two parity calculations

↳ stored in separate blocks on different disks.

↳ User requirement of N disks needs N+2.

↳ High data availability



Optical Storage: CD-RDM

- ↳ Poly carbonate coated with highly reflective coat, usually Al.
- ↳ Data stored as pits
- ↳ Read by reflecting laser
- ↳ Constant packing density and linear velocity



- ↳ Audio is single (laser transmit/receive speed 1.2 m/s).
- ↳ Other speeds are quoted as multiples (e.g., 21X) [Maxm speed].

CD-RDM format :

Mode 0 = blank data field

Mode 1 = 2048 byte data + error correction

Mode 2 = 2336 byte data.

Random Access on CD-RDM.

- ↳ Difficult; Move head to rough position
- ↳ Set correct speed
- ↳ ~~Read~~ Read address & data
- ↳ Adjust to required location (takes time).

Advantages :

- ↳ Large capacity
- ↳ Easy to mass produce
- ↳ Removable
- ↳ Robust

Disadvantages :

- ↳ Expensive for small runs
- ↳ Slow
- ↳ Read-only

CD-Recordable (CD-R)

- ↳ Write Once Read Many (WORM)
- ↳ Was affordable
- ↳ Compatible with CD-RDM drives.

CD-RW

- CD-RW
    - Erasable (50K-100K times rewritable)
    - Getting cheaper
    - Mostly CD-ROM drive compatible
    - Phase change → 2 different reflectivities in different phase states.

## Digital Video Disk (DVD)

- ↳ Digital Video Disk. → used to indicate player for movies.
  - ↳ Digital versatile Disk. → used to indicate computer drive.
    - ↳ can contain data and video.
  - ↳ Multi-layer
  - ↳ very high capacity (4.7 G per layer)
  - ↳ Movies carry regional coding
  - ↳ rewritable.

## High Definition optical disks

- ↳ For high definition videos.
  - ↳ Much higher capacity than DVD (shorter wavelength laser).
  - ↳ Blue ray → data layer closed to laser.
  - ↳ Available Head-only, recordable once, re-recordable.

## Magnetic Tape

- ↳ Serial access
  - ↳ Slow
  - ↳ Very high storage capacity
  - ↳ very cheap
  - ↳ Backup and archive
  - ↳ Linear Tape-open (LTO) Tape Drives

## Serpentine reading and writing of Tape drives:

Track 2 | | | | | | | | →

Recovery →  
Reform ←

Track 1 | | | | ~~Wavelength~~ ←

$\xrightarrow{\text{Diss of RW}}$   $\text{H}_2\text{O} - \text{base}$

Bottom  
edge of tape

Wardrobe (CD-RW) Wardrobe (CD-RW) Wardrobe (CD-RW)

# OPERATION OF A CPU

- Fetch instructions
- Interpret instructions
- Fetch data (Immediate)
- Process data
- Write data



## Registers

- ↳ working space (temporary storage)
- ↳ Top level of memory hierarchy

## User visible Registers

- General Purpose
- Data
- Address
- Condition codes (Flags)

## General Purpose Registers

- ↳ May be true GP or restricted
- ↳ Used for data or addressing
- ↳ Design:

↳ GP: increase flexibility & programmer options

↳ increase instr. size & complexity

↳ Specialised → Smaller (faster) instr.

↳ less flexible

↳ B/W 8-32

↳ Fewer → more memory ref.

↳ More → reduce mem. ref.

↳ Takes processor real estate.

## Registers

- ↳ Large enough to hold full address and full word.
- ↳ often possible to combine two data registers.

### Condition code Registers:

- ↳ set of individual bits (e.g., result of last operation was zero)
- ↳ can be read (implicitly) by programs (e.g., jump if zero)
- ↳ cannot (usually) be set by programs.

### Control and status Registers:

- ↳ Program counter
- ↳ Instruction decoding Register / Instruction Decoder (IR)
  - ↳ temporarily store instr. code while decoding, executing.
- ↳ Memory Address Register (MAR)
  - ↳ stores address of memory locations to and from which data is transferred.
- ↳ Memory Buffer Register (MBR)
  - ↳ stores data being transferred to and from immediate access storage (memory).

### Program Status Word

- ↳ A set of bits
- ↳ includes condition codes



### Data Flow (Instr. Fetch):

- ↳ PC contains address of next instr.
- ↳ Address moved to MAR.
- ↳ Address placed on address bus.
- ↳ Control unit requests memory read.
- ↳ Result placed on data bus, copied to MBR, then to IR.
- ↳ Meanwhile,  $PC = PC + 1$ .

→ 32-bit word

for program start → ROM →

for main code → RAM →

for memory transfer → RAM →

## Data Flow

→ IR

→ MBR

→ RAM

→ ROM

→ Main Bus

→ CPU

→ May

→ Data

→ RAM

→ ROM

→ CPU

→ Main Bus

→ CPU

→ RAM

→ ROM

→ CPU

→ Main Bus

→ CPU

→ RAM

→ ROM

→ CPU

→ Main Bus

→ CPU

→ RAM

→ ROM

→ CPU

→ Main Bus

→ CPU

→ RAM

→ ROM

→ CPU

→ Main Bus

→ CPU

→ RAM

→ ROM

→ CPU

→ Main Bus

→ CPU

→ RAM

→ ROM

→ CPU

→ Main Bus

→ CPU

→ RAM

→ ROM

## Data Flow (Data Fetch):

- ↳ IR is examined.
- ↳ If indirect addressing, indirect cycle is performed.
- ↳ Right most N bits of MBR transferred to MAR.
- ↳ Control unit requests memory read.
- ↳ Result (result of operand) moved to MBR.

## Data Flow (Execute):

May include:

- Memory read/write
- Input/Output
- Register transfers
- ALU operations.

## Data Flow (Interrupt):

- Simple, Predictable.

↳ Current PC saved to allow resumption after interrupt.

↳ Contents of PC copied to MBR.

↳ Special memory location (e.g., stack pointer) loaded to MAR.

↳ MBR written to memory.

↳ PC loaded with address of interrupt handling routine.

↳ Next instr. can be fetched.

lecture - 32

## Prefetch

↳ Fetching main memory.

↳ Execution can fetch next instruction during execution of current instruction.

↳ Improved performance

↳ But not doubled.

↳ Add more stages to improve performance.

## Pipelining

↳ Overlap operations.

↳ Two stage instruction pipelining.



Expanded view:



## Operations:

- ① Fetch Instruction (FI) → Read into buffer
- ② Decode Instruction (DI) → Determine opcode & operand specifiers.
- ③ Calculate Operands (CO) → calculate effective addr. of each source operand
- ④ Fetch Operands (FO) → From memory; need not from registers.
- ⑤ Execute Instruction (EI) → Perform indicated operation & store the result, if any, to specified destination.
- ⑥ Write Operand (WO) → Store result in memory.

## Timing Diagram

Time →

|          | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8 | 9 | 10 | 11 | 12 |
|----------|----|----|----|----|----|----|----|---|---|----|----|----|
| Instn. 1 | FI | DI | CO | FO | EI | WD |    |   |   |    |    |    |
| Instn. 2 |    | FI | DI | CO | FO | EI | WD |   |   |    |    |    |
| 3        |    | FI | DI | CO | FO | EI | WD |   |   |    |    |    |
| 4        |    | FI | DI | CO | FO | EI | WD |   |   |    |    |    |
| 5        |    | FI | DI | CO | FO | EI | WD |   |   |    |    |    |
| 6        |    | FI | DI | CO | FO | EI | WD |   |   |    |    |    |
| 7        |    | FI | DI | CO | FO | EI | WD |   |   |    |    |    |

7 instrns. in 12 clock cycles  
12/7 clock cycles per instrn.  
Compare this to  $7 \times 6 = 42$  cycles.

## Pipelining performance and clock cycle time

Cycle time,  $T = \max_i [T_i] + d$   $[1 \leq i \leq k]$

$T_i \rightarrow$  time delay in circuitry in  $i^{th}$  stage of pipeline.

$T_m \rightarrow$  max stage delay

$k \rightarrow$  no. of stages

$d \rightarrow$  time delay in latch [to advance signal to next stage].  
Equivalent to clock pulse

$T_m \gg d$

For  $n$  instr. with no branches, total time required for a pipeline with  $k$  stages to execute  $n$  instructions,

$T_{k,n} = [k + (n-1)]T$  (No branching)

For no pipelining, instruction cycle time =  $KT$

$$\text{Speed up factor, } S_K = \frac{T_{K,n}}{T_{K,n}} = \frac{nKT}{[K+(n-1)T]} = \boxed{\frac{nK}{K+(n-1)}}$$

Ex: A pipelined processor has a clock rate of 2.5 GHz and executes a program with 1.5 M instr.. The pipeline has 5 stages, and instr. are issued at a rate of one clock per clock cycle. Ignore penalties due to branch instr. and out-of-sequence executions.

(a) what is the speedup of this processor for this program compared to a non-pipelined processor?

(b) what is throughput (in MIPS) of the pipelined processor?

Soln: Since large no. of instr., we can exclude initial filling as well as end finishing up.

(a) So, speedup factor is 5.

(b) 1 instr. in 1 clock cycle  $\Rightarrow$  MIPS = 2500 MIPS.

Ex: A non-pipelined processor has a clock rate of 2.5 GHz and an average CPI (clocks per instr.) of 4. An upgrade to the processor introduces a 5-stage pipeline. However, due to internal pipeline delays, such as latch delay, the clock rate of the new processor has to be reduced to 2 GHz.

(a) what is the speedup achieved for a typical program?

(b) what is the MIPS rate for each processor?

Soln: (a)  $K=5, n=100$  (say)

$$\text{Speedup, } S = \frac{nK}{K+(n-1)} = \frac{500}{104} = 4.8$$

However, the pipelined 2 GHz CPU will have a reduced speed of a factor of 0.8 to the unpipelined 2.5 GHz CPU.

$$\text{So, overall speedup} = 4.8 \times 0.8 = 3.8$$

(b) Non-pipelined: Each instr. takes 4 clocks. So, MIPS =  $\frac{2500 \text{ MHz}}{4} = 625 \text{ MIPS}$ .

Pipelined: One instr./clock is completed. So, MIPS = 2000 MIPS.

ConsiderPipelining with Branching

Consider an instr. sequence of length  $n$  that is streaming through the instruction pipeline.

$p$ : probability of encountering a conditional or unconditional branch instruction.

$q$ : probability that execution of a branch instr. causes a jump to a non-consecutive address.

No. of instr. causing branching =  $n p q$

No. of instr. not causing branching =  $(1 - pq) n$

Non-branching case: Time taken by a pipelined system of  $K$  stages,  $n$  instructions and cycle time  $T$ ,  $\text{Time} = \frac{nK}{n + K - 1} T$

$$T_{K,n} = [K + (n-1)] T$$

Branching case:

$$T_K = pq n K T + (1-pq) [K + (n-1)] T$$

Speedup (non-branching):

$$S_K = \frac{T_{K,n}}{T_K} = \frac{n K T}{[K + (n-1)] T} = \frac{n K}{K + (n-1)}$$

(Branching) speedup:

$$S_K = \frac{T_1}{T_K} = \frac{n K T}{(pq) n K T + (1-pq) [K + (n-1)] T} = \frac{n K}{pq n K + (1-pq) [K + (n-1)]}$$

$$S_A = \frac{0.2}{0.1} = \frac{2}{1 + K} = 2 \cdot \frac{1}{1 + K}$$

$$S_E = 8.0 \times 8.4 = 67.2$$

$$\frac{0.2}{0.1} = 2 \cdot \frac{1}{1 + K} \Rightarrow 2 = \frac{1}{1 + K} \Rightarrow K = 0.5$$

$$1000 S = 29 IM \cdot 0.2 \cdot 67.2 \cdot 0.5 = 29 IM \cdot 67.2$$

## Pipeline Hazards

- ↳ Conditions in a pipelined machine that impede the execution of a subsequent instruction in a cycle.
- ↳ Types:
  - Resource
  - Data
  - Control

### Resource Hazards / Structural Hazard

- ↳ Two (or more) instructions in pipeline need same resource.
  - ↳ Executed in serial rather than parallel for part in pipeline.
- Eg: Instr. fetches & data reads and writes performed at once.
  - ↳ single ALU      ↳ Idle for one cycle (fetch)      (single main memory port)
  - ↳ Solutions: Increase available resources
    - ↳ Multiple main memory ports
    - ↳ Multiple ALUs.

### Data Hazards

- ↳ Conflict in access of an operand location.
- Eg: 2 instr. to be executed ~~in sequence~~ (in pipeline), operand value could be updated so as to produce different result from strict sequential execution.
  - ↳ Pipeline stall for 2 clock cycles.

#### Types:

- ① Read after write (RAW) or true dependency
  - ↳ If an instr. modifies a register or memory location; succeeding inst. read data from that location; hazard if read takes place before write complete.
- ② Write after read (WAR) or anti-dependency
  - ↳ An instr. reads a location; succeeding instr. written there; hazard if write completed before read takes place.
- ③ Write after write (WAH) or output dependency
  - ↳ Two instr. both write to same location; hazard if writes take place in reverse order.

Control Hazard / Branch Hazard.

- ↳ Pipeline makes wrong decision on branch pipeline prediction.
- ↳ Brings instr. into pipeline that must subsequently be discarded.

Dealing with Branched

- ① Multiple streams → 2 pipelined; prefetch each branch into separate pipeline.
- ② Prefetch Branch Target      ↳ Leads to contention delay for access to the registers and memory.
- ③ Loop Buffer
- ④ Branch Prediction
- ⑤ Delayed Branching.

Loop Buffer

- ↳ Contains the  $N$  most recently fetched instructions in sequence.
- ↳ Fetch check buffer before fetching from memory.
- ↳ Similar to cache.
- ↳ No additional memory access time.
- ↳ Well suited to dealing with loops or iterations.

Branch Prediction

Static

Dynamic

① Static Branch Prediction:

- ↳ Not using the execution history as well as the results of execution (use a predefined logic).
- ↳ Predict Never taken      ↳ assume branch does not happen.  
↳ Always fetch next instr.
- ↳ Predict Always taken      ↳ assume branch always happens.  
↳ Always fetch target instr.  
↳ 50% chance for branching.
- ↳ Predict by Opcode      ↳ Some instr. are more likely to result in a jump than others.  
↳ Can get upto 75% success.

② Dynamic Branch Prediction:

- ↳ History of conditional branches in the program was considered.

- (Refined by 2-level or correlation-based branch history)
- ↳ Taken / Not taken switch → one or more bits can be associated with each conditional branch instr. that reflect the recent history of the instr.
  - ↳ Based on previous history.
  - ↳ Good for loops.

- ↳ Correlation-based → In more complex structures, branch direction correlates with that of related branches.
- ↳ Use recent branch history as well.
- ↳ Two-bit branch prediction to record result of last 2 instances of the execution of the associated instruction.

### Branch Prediction State Diagram:



Delayed Branching → Do not jump until you have to.

- ↳ Rearrange instructions without Data hazard.
- ↳ The instr. location immediately following the branch is referred to as delay slot.

Superpipelining → Many pipeline stages perform tasks that require less than half a clock cycle.

- Thus, a doubled internal clock speed allows the performance of 2 tasks in one external clock cycle.

↳ Functions performed in each stage can be split into 2 non-overlapping parts.

↳ Degree 2.



Superscalar → Depends on the ability to execute multiple instructions in parallel.

Instruction-level parallelism: Degree to which, on average, the instruction of a program can be executed in parallel.

# OPERATING SYSTEM

OS Services:

- ① Program creation
- ② Program execution
- ③ Access to I/O devices
- ④ Controlled access to files
- ⑤ System access
- ⑥ Error detection and response.
- ⑦ Accounting

OS: Controls hardware by running inside it.  
 ↳ Just a program.

Types:

① Interactive: User directly interacts with the hardware

② Batch: A user's program is batched other user's.  
 ↳ Submitted by an operator.

③ Single Batch Program (Uni-programming):  
 ↳ Processor is used for executing one program at a time.

④ Multi-programming (Multi-tasking):  
 ↳ several programs are loaded to the memory.  
 ↳ Processor is made to switch b/w them rapidly.

↳ when one

## Single Batch System

↳ Resident Monitor program

↳ User submit jobs to Operator.

↳ Operator batches jobs.

↳ Monitor controls sequence of events to process batch.

↳ when one job is finished, control returns to Monitor which  
 reads next job.

↳ Monitor handles scheduling.

Job Control Language: Instructions to monitor

↳ usually denoted by \$.

↳ features: Monitor Protection

↳ Timer → To prevent a job monopolizing the system.

- ↳ Privileged instructions (e.g., I/O) → only executed by Monitor.
- ↳ Interrupt → Allows for relinquishing and regaining control.

## Multi-programmed Batch systems:

- ↳ I/O devices are very slow.
- ↳ when one program is waiting for I/O, another can use the CPU.

Uniprogramming : 

Multiprogramming: (2 programs)  
 P-A:   
 P-B:   
 Combined: 

Time sharing systems: Allows users to directly interact with the computer (i.e., interactive).  
 ↳ e.g., multiprogramming.

## Scheduling

→ Process: A program in execution.  
 → (In multi-prog.) Providing the sequence of execution among the processes / programs.

- Long term → Decision to add to the pool of processes to be executed.
- Medium term → Decision to add to the no. of processes that are partially or fully in main memory
- Short term → Usually based on need to manage multipro.
- I/O.

The decision as to which available process will be executed by the processor

- ↳ Dispatcher
- Scheduling the use of I/O devices
  - The decision as to which process's pending I/O request shall be handled by an available I/O device.

## 5-stage Process Model:



refined pd between processes for mutual exclusion of resources

### Process Control Block (PCB)

↳ OS maintains info about the state of the process and other info necessary for process execution.

- ↳ PCB →
- Identifier of partition it is running on and its state
  - State
  - Priority:  $\text{Low} \rightarrow \text{High} \rightarrow \text{Priority}$
  - Program Counter
  - Memory Pointers
  - Context Data
  - I/O Status
  - Accounting Info

→ OS gets back control when

① Process A issues a service call (e.g. I/O request) to the OS.

↳ Execution suspended until the call is satisfied by OS.

② Process A causes an interrupt (hardware-generated signal).

↳ Processor ceases to execute A and transfers to the interrupt

handler in OS.

Eg. Timeout, to prevent a process from monopolizing the processor.

③ Some event unrelated to process A that requires attention causes an interrupt.

↳ Eg. completion of an I/O operation.

### Key-Elements of OS:



## Process Scheduling:



## Memory Management:

- Uni-program: Memory split into 2 → one for OS (monitor) one for currently executing program.
- Multi-program: "User" part is sub-divided and shared among active processes.

## Swapping

Problem: I/O is so slow compared to CPU that even in multi-pro. system, CPU can be idle most of the time.

Solutions: Increase main memory → expensive leads to larger problems  
or swapping.

## Swapping:

- Long term queue of processes stored on disk.
- Process swapped in ab space becomes available.
- As a process completes it is moved out of main memory.
- If none of the processes in memory are ready (i.e., all I/O blocked)
  - Swap out a blocked process to immediate queue.
  - Swap in a ready or new process.
  - But swapping is a I/O process.

## Partitioning

↳ Splitting memory into sections to allocate to processes (including OS).

### Fixed Partitioning

↳ May not be equal size.

↳ Process is fitted into smallest hole that will take it (best fit).

↳ Some wasted memory.

↳ Leads to variable sized partitions.



Equal-size partitions      Unequal-size partitions

### Variable Size Partitions

↳ Allocate exactly the required memory to a process.

↳ Leads to a hole at the end of memory → too small to use less waste.

↳ When all processes are blocked, swap out a process and bring in another.

↳ New process may be smaller → Another hole.

↳ Eventually have lots of holes (fragmentation).

### Solution:

Coalesce: Join adjacent holes into one large hole

Compaction/de-fragmentation: From time to time, go through memory and move all holes into one free block.

## Relocation

## Relocation

↳ N

↳ gr

## Relocation

- ↳ No guarantee that process will load into same place in memory.
- ↳ Instructions contain addresses
  - ↳ Locations of data
  - ↳ Addresses for instructions (branching)
- ↳ Logical address: Relative to beginning of program.
- ↳ Physical address: Actual location in memory (this time).
- ↳ Automatic conversion using base address.

## Paging

- ↳ Split memory into equal sized, small chunks — page frames.
- ↳ Split program (process) into equal sized small chunks — pages.
- ↳ Allocate the required no. of page frames to a process.
- ↳ OS maintains list of free frames.
- ↳ A process does not require contiguous page frames.
- ↳ Use pagetable to keep track.

## Virtual Memory

- ↳ Demand paging
  - ↳ Do not require all pages of a process in memory.
  - ↳ Bring in pages as required.
- ↳ Page fault
  - ↳ Required page is not in memory.
  - ↳ OS must swap in required page.
  - ↳ May need to swap out a page to make space.
  - ↳ Select page to throw out based on recent history.

## Thrashing

- ↳ Too many processes in too little memory.
- ↳ OS spends all its time swapping.
- ↳ Little or no real work is done.
- ↳ Disk light is on all the time.

Solutions:

- Good page replacement algorithms
- Reduce no. of processes running
- Fit more memory.

Segmentation → visible to programmer.

- ↳ Paging is (usually) not visible to the programmer.
- ↳ Usually different segments allocated to program and data.
- ↳ May be a no. of program and data segments.
- ↳ Advantages: Simplifies handling of growing data structures.

↳ Allows programs to be altered and recompiled independently without re-linking and re-loading.

↳ Lends itself to sharing among processes.

↳ Lends itself to protection.

↳ Some systems combine segmentation with paging.

↳ Example: VAX/VMS uses segments of size 2^20 bytes and pages of size 2^10 bytes.

program level

Program is divided into segments for protection and sharing.

Segments are mapped in memory.

first slot

Program is first mapped in memory.

Slot mapping is done from 20 bits.

Each word of slot is two words of base memory.

It means no need two words of slot.