

## INDEX

| S.No | Topic Name                                              | Pg.No | CSA 12 - COMPUTER ARCHITECTURE                                                                                                | Pg.No |
|------|---------------------------------------------------------|-------|-------------------------------------------------------------------------------------------------------------------------------|-------|
| 1.   | Functional Units :                                      | 1     | 1. Functional Units :<br>Basic operation of computer<br>Bus structure<br>Types of bus structure                               | 1     |
| 2.   | Factors of performance                                  | 2     | 2. Factors of performance                                                                                                     | 2     |
| 3.   | Evaluation of computers                                 | 3     | 3. Evaluation of computers:<br>* History of computer<br>* Generation of computer<br>* Data representation                     | 3     |
| 4.   | INSTRUCTION SET, INSTRUCTION FORMATS, INSTRUCTION TYPES | 4     | 4. INSTRUCTION SET, INSTRUCTION FORMATS, INSTRUCTION TYPES                                                                    | 4     |
| 5.   | Addressing modes                                        | 5     | 5. Addressing modes                                                                                                           | 5     |
| 6.   | Integer Arithmetic                                      | 6     | 6. Integer Arithmetic:<br>Addition and subtraction<br>Carry look ahead adder                                                  | 6     |
| 7.   | Multiplication and Division                             | 7     | 7. Multiplication and Division:<br>Booth's multiplication<br>Booth's Restoring and non-restoring division                     | 7     |
| 8.   | Floating point                                          | 8     | 8. Floating point:<br>Floating point representation<br>→ single precision representation<br>→ Double precision representation | 8     |
| 9.   | Superscalar Processing:                                 | 14    | 9. Superscalar Processing:<br>Instr. Fetch unit prediction.                                                                   | 14    |
| 10.  | Floating point Multiplication                           | 10    | 10. Floating point Multiplication<br>Floating point Division                                                                  | 10    |
| 11.  | Fundamental concepts of Processing:                     | 11    | 11. Fundamental concepts of Processing:<br>Steps for instruction execution.<br>Execution of complete 'value'.                 | 11    |
| 12.  | Processor Architecture:                                 | 12    | 12. Processor Architecture:<br>Basic Concepts of pipelining.                                                                  | 12    |
| 13.  | Instruction Hazards:                                    | 13    | 13. Instruction Hazards:<br>Unconditional Branches<br>Addition Branch and prediction.                                         | 13    |
| 14.  | Superscalar Processing:                                 | 14    | 14. Superscalar Processing:<br>Instr. Fetch unit prediction.                                                                  | 14    |
| 15.  | Memory:                                                 | 15    | 15. Memory:<br>Random Access Memory<br>Types of RAM<br>Read only Memory<br>Types of ROM                                       | 15    |
| 16.  | Code Memory:                                            | 16    | 16. Code Memory:<br>Operation on code memory.                                                                                 | 16    |
| 17.  | Performance Measures:                                   | 17    | 17. Performance Measures:<br>Mapping function.                                                                                | 17    |
| 18.  | DMA:                                                    | 18    | 18. DMA:<br>DMA → Direct Memory Addressing                                                                                    | 18    |
| 19.  | Virtual Memory:                                         | 19    | 19. Virtual Memory:<br>Address translation restriction bits.<br>Address translation process.                                  | 19    |
| 20.  | Cache Memory                                            | 20    | 20. Cache Memory:<br>Magnetic surface Reading<br>Multilevel memory<br>Memory Allocation                                       | 20    |
| 21.  | CPU Performance                                         | 21    | 21. CPU Performance                                                                                                           | 21    |
| 22.  | Number System                                           | 22    | 22. Number System                                                                                                             | 22    |
| 23.  | Floating Point Operations                               | 23    | 23. Floating Point Operations                                                                                                 | 23    |
| 24.  | Pipelining                                              | 24    | 24. Pipelining                                                                                                                | 24    |
| 25.  | Virtual Memory                                          | 25    | 25. Virtual Memory                                                                                                            | 25    |
| 26.  | Cache Memory                                            | 26    | 26. Cache Memory                                                                                                              | 26    |

CSA12

COMPUTER

Computer architecture is the organisation of the components which make up a computer system and the meaning of the operations which judge its function.



## **COMPUTER TYPES:**

1. DIGITAL COMPUTER
  2. PERSONAL COMPUTER
  3. DESKTOP COMPUTER
  4. NOTEBOOK COMPUTER
  5. SUPERCOMPUTER
  6. MAINFRAMES



### Connections between and the Memory:



**COMPUTER :**  
Input unit → accepts information

## Types Of Bus Structure:

- \* Address Bus  
\* Data Bus  
\* Control Bus



\* Can attack peripheral.

- \* One transfer      \* one/more transfers
  - \* Low cost
  - \* Low performance      \* high cost

being locked. Allows the user to switch rapidly from one device to another. Interworking with several devices.



```

graph TD
    CPU[CPU] <--> Memory[Memory]
    CPU <--> IOUnit[I/O unit]
    Memory <--> IOUnit
    ControlBus[Control Bus] --- CPU
    ControlBus --- Memory
    ControlBus --- IOUnit

```

The diagram illustrates a computer system architecture. At the top is the CPU (Central Processing Unit). Below it are three main components: Memory, I/O unit, and Control Bus. The CPU is connected to both Memory and the I/O unit via bidirectional arrows. The Memory and I/O unit are also connected to each other via a bidirectional arrow. A separate Control Bus is shown at the bottom, with bidirectional arrows connecting it to the CPU, Memory, and I/O unit.

- Store in memory.

Control Bus:

- Memory Read/Write
- I/O Read/Write
- Ready state.

Fetch instruction from memory

Address Bus:  
—  
*Can do*

| DIFFERENCE BETWEEN SYNCHRONOUS AND ASYNCHRONOUS BUS |                                        |
|-----------------------------------------------------|----------------------------------------|
| SYNCHRONOUS                                         | ASYNCHRONOUS BUS                       |
| * Works at a fixed clock rate.                      | * Not dependent on a fixed clock rate. |
| * Affected by clock skew.                           | * Not affected by clock skew.          |
| * Data transfer takes                               | * Data transfer is                     |

Performance  $\rightarrow$  how fast the computer executes

The best way to measure performance is using time. The following aspects are used to measure the performance of the computer

① Execution time (response time):

the time between program starts and the completion of the program execution.

Elapsed time:

time for execution of the program

Processor time:

processor clock controlled by a timing signal

Clock cycle: regular time interval

Hertz (Hz): cycles per second

Throughput: total amount of work done in given time.

Best performance: ① Less execution time

ii) more throughput

Million denoted by mega (M)

Billion denoted by giga (G)

500 million cycles per second = 500 mega Hertz

1050 million cycles/sec = 1.05 Giga Hertz

clock periods = 2 and 0.8 nano seconds (ns)

## Formula - I FACTORS OF PERFORMANCE

$$\text{Performance} = \frac{1}{\text{Execution time}} \quad \text{--- (1)}$$

Reduction in execution time will increase throughput.

- In terms of instruction performance of a computer is directly related to throughput and hence it's reciprocal of execution time.

$$\text{Performance}_A > \text{Performance}_B \quad \text{--- (2)}$$

can be written as

$$\frac{1}{\text{Execution time}_A} > \frac{1}{\text{Execution time}_B}$$

$\Rightarrow A$  is faster than  $B$  or  $B$  is slower than  $A$

$$\frac{\text{Performance}_A}{\text{Performance}_B} = n$$

$$\Rightarrow \boxed{\text{Performance}_A = n \times \text{Performance}_B}$$

$\Rightarrow$  Performance of computer  $A$  is  $n$  times faster than performance of computer  $B$ .

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\frac{1}{\text{Execution time}_A}}{\frac{1}{\text{Execution time}_B}} = \frac{\text{Execution time}_B}{\text{Execution time}_A}$$

Example:

- ① If a computer A runs a program in 10 sec and computer B runs the same program in 25 sec. Who is faster and how much faster?

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n$$

$$= \frac{25}{10} = 2.5$$

A is 2.5 times faster than B

$$\text{CPU clock cycle for a program}$$

$$\text{CPU time} = \frac{\text{CPU clock cycle for a program}}{\text{Clock rate}}$$

for a program

Formula - II

- In terms of instruction

$$\text{CPU clock cycle} = \text{No. of instruction} \times \frac{\text{Clock cycle of instruction}}{\text{Clock rate}}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

$$\text{CPU clock cycle for a program} = \frac{\text{CPU clock cycle for a program}}{\text{Number of instruction}} \times \text{Clock rate}$$

- ② Computer A runs a program in 10 sec with 3GHz clock. We have to design a computer B such that it can run the same program within 9 sec. Determine the clock rate for computer B. Computer B's CPU design of computer B requires 1.5 times as many clock cycles as computer A.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

Such that it can run the same program within 9 sec. Determine the clock rate for computer B.

(2)



## INSTRUCTION SET:

### 2. Complex Instruction Set Computer (CISC)

- \* Code is 1s and 0s, or Machine language.
- \* Contains instruction or tasks that control the movement of bits and bytes within the processor.

Example of some instruction sets -

- \* ADD - Add two numbers together.
- \* JUMP - Jump to designated RAM address.
- \* LOAD - Load information from RAM to the CPU.

### Types of Instruction Set

#### 1. Reduced Instruction Set Computer (RISC)

The concept of RISC involves an attempt to reduce execution time by simplifying the instruction set of computers.

- Relatively few instructions.
- Relatively few addressing modes.

### General Instruction Format

| opcode | operand address | operand address |
|--------|-----------------|-----------------|
| 4 bit  | *               | bit             |

- Handwired rather than micro programmed control.
- Fixed length, easily decoded instruction format.

- Handwired rather than micro programmed control.

According to type of operation

No. of Addresses

## INSTRUCTION TYPES:

### Input & Output - Instruction

| Mode  | OPCODE | INPUT & OUTPUT OPERATION |
|-------|--------|--------------------------|
| 15 14 | 12 11  | 0                        |

| MODE  | OPCODE = 111 AND Addressing Mode I = 0 |
|-------|----------------------------------------|
| 15 14 | 12 11 0                                |

- ⇒ Data processing - ALU instruction
- ⇒ Data storage - Memory instruction
- ⇒ Data Movement - Data transfer instruction

### Instruction Formats

#### Zero Address Format

$$X = (A+B) * (C+D)$$

#### One Address Format

$$\begin{aligned} \text{PUSH } A & \text{ TOP} = A \\ \text{PUSH } A & \text{ TOP} = B \\ \text{ADD } TOP & = A+B \\ \text{PUSH } C & \text{ TOP} = C \\ \text{PUSH } D & \text{ TOP} = D \\ \text{ADD } TOP & = C+D \end{aligned}$$

#### Two Address Format

$$\begin{aligned} \text{LOAD } A & \text{ AC} = M[A] \\ \text{ADD } B & \text{ AC} = AC + M[B] \\ \text{STORE } T & M[T] = AC \\ \text{LOAD } C & AC = M[C] \\ \text{ADD } D & AC = AC + M[D] \\ \text{MUL } T & AC = AC * M[T] \\ \text{STORE } X & M[X] = AC \end{aligned}$$

#### Three Address Format

$$\begin{aligned} \text{MOV } R_1, A & \text{ R}_1 = M[A] \\ \text{ADD } R_1, B & R_1 = R_1 + M[B] \\ \text{MOV } R_2, C & R_2 = C \\ \text{ADD } R_2, D & R_2 = R_2 + D \\ \text{MUL } R_1, R_2 & R_1 = R_1 * R_2 \\ \text{MOV } X, R_1 & M[X] = R_1 \end{aligned}$$

### Instruction Formats

Groups of bits used to perform a particular operation on the data.

#### Elements of Instructions



#### 1. Reduced Instruction Set Computer (RISC)

The concept of RISC involves an attempt to reduce execution time by simplifying the instruction set of computers.

- Relatively few instructions.

- Relatively few addressing modes.

Memory access limited to load and store instructions.

All operations done within the register of the CPU.

Single-cycle instruction execution

- Fixed length, easily decoded instruction format.

- Handwired rather than micro programmed control.

- \* 2 address instruction
- 2 operands used

Eg: ADD A, B [A ← A+B]

- \* One address instruction
- Only one operand used

Eg: ADD B [AC] ← B + [AC]

AC = Accumulator

- \* Zero Address Instruction

Eg: CMA [AC] ← [AC]

#### Memory Reference Instruction

| Mode  | OPCODE  | MEMORY ADDRESS |
|-------|---------|----------------|
| 15 14 | 12 11 0 | 0              |

OPCODE = 000 To 110

#### Two Address Format

$$\begin{aligned} \text{MOV } R_1, A & \text{ R}_1 = M[A] \\ \text{ADD } R_1, B & R_1 = R_1 + M[B] \\ \text{MOV } R_2, C & R_2 = C \\ \text{ADD } R_2, D & R_2 = R_2 + D \\ \text{MUL } R_1, R_2 & R_1 = R_1 * R_2 \\ \text{MOV } X, R_1 & M[X] = R_1 \end{aligned}$$

#### Three Address Format

$$\begin{aligned} \text{ADD } L_1, A, B & R_1 = M[A] + M[B] \\ \text{ADD } R_2, C, D & R_2 = M[C] + M[D] \\ \text{MUL } X, R_1, R_2 & M[X] = R_1 * R_2 \end{aligned}$$



## Relation between Addition & Subtraction.

$$(A \pm B) = (\pm A) + (-B)$$

$$(\pm A) - (-B) = (\pm A) + (+B)$$

3 major representation of signed integers:-

\* Sign and magnitude

\* One's complement (1's)

\* Two's complement (2's)

## Binary Signed integer representation:-

| b3, b2, b1, b0 | Sign + magnitude | 1's complement | 2's complement |
|----------------|------------------|----------------|----------------|
| 0000           | +0               | +0             | +0             |
| 0001           | +1               | +1             | +1             |
| 0010           | +2               | +2             | +2             |
| 0011           | +3               | +3             | +3             |
| 0100           | +4               | +4             | +4             |
| 0101           | +5               | +5             | +5             |
| 0110           | +6               | +6             | +6             |
| 0111           | +7               | +7             | +7             |
| 1000           | -0               | -7             | -7             |
| 1001           | -1               | -6             | -6             |
| 1010           | -2               | -5             | -5             |
| 1011           | -3               | -4             | -4             |
| 1100           | -4               | -3             | -3             |
| 1101           | -5               | -2             | -2             |
| 1110           | -6               | -1             | -1             |
| 1111           | -7               | -0             | -0             |

## Addition in 2's complement :-

$$(4) \quad 0100 \quad (-4) \quad 1100 \\ + (3) \quad 0011 \quad + (3) \quad 0011 \\ \hline (-7) \quad 0111 \quad \underline{(-1)} \quad \underline{1111}$$

## Subtraction in 2's complement :-

$$\text{Step 1:- Take 2's complement of } B \\ \text{Step 2:- Result} \Rightarrow A + \text{Step 1 answer} \\ \text{Step 3:- If carry generated result} \rightarrow \text{ignoresign} \\ \text{Step 4:- If no carry, result} \rightarrow \text{neg}$$

## Integer Arithmetic - Addition & Subtraction.

Ex:-

$$(28)_0 - (15)_0$$

$$(28)_0 \quad 011100$$

$$2^5 \text{ complement } 110001$$

$$\begin{array}{r} \text{Ignore } 1 \\ \hline 001101 \end{array} \quad (13)_0$$

$$\text{Overflow: } (-1) + (+7) = -ve \\ \text{condition } (-) + (-) = +ve$$

(i) Half adder :-

| Inputs | Outputs |
|--------|---------|
| A      | Sum     |
| B      | Carry   |
|        | 0       |



(ii) Full adder

| Inputs | Outputs |
|--------|---------|
| A      | Sum     |
| B      | Carry   |
| Cin    |         |



## Carry Look Ahead Adder (CLA).

\* CLA improves Speed by reducing the amount of time required to determine carry bits.

$$\begin{aligned} C_4 &= G_3 + P_3 \cdot C_3 \\ &= G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 \end{aligned}$$

$$\begin{aligned} C_3 &= G_2 + P_2 \cdot C_2 \\ &= G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 \end{aligned}$$

$$\begin{aligned} C_2 &= G_1 + P_1 \cdot C_1 \\ &= G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0 \end{aligned}$$

$$\begin{aligned} C_1 &= G_0 + P_0 \cdot C_0 \\ &= G_0 + P_0 \cdot G_0 + P_0 \cdot P_0 \cdot C_0 \end{aligned}$$

\* Carry generator ( $G_i$ ) is used to generate the output carry regardless of input carry.

$$P_i = A_i \oplus B_i \quad \text{--- (1)}$$

$$G_i = A_i \cdot B_i \quad \text{--- (2)}$$

$$\begin{aligned} \text{Sum} &= P_i \oplus C_i \\ C_{i+1} &= G_i + P_i \cdot C_i \quad \text{--- (3)} \end{aligned}$$

$$\begin{aligned} \text{For designing } 4\text{-bit CLA, we} \\ \text{need } i = 0, 1, 2, 3 \end{aligned}$$

Figure 4 bit CLA using Full Adder (FA)

$$\begin{aligned} \text{Similarity} \\ C_{i+1} &= G_i + P_i \cdot C_i + P_i \cdot P_{i-1} \cdot G_{i-2} + \dots \\ &\dots + P_i \cdot P_{i-1} \dots P_0 \cdot G_0 + P_i \cdot P_{i-1} \dots P_0 \cdot C_0 \end{aligned}$$



### Multiplication of unsigned numbers:

\* Unsigned multiplication can be viewed as addition of shifted versions of the multiplicand.

\* Product of 2 n-bit numbers is at most a 2n-bit number.

$$\begin{array}{r} 1101 \text{ (3-bit multiplicand)} \\ \times 1101 \text{ (3-bit multiplier)} \\ \hline 10110111 \text{ (9-bit product)} \end{array}$$

\* In general, Booth Algorithm has 3 schemes

- (i) '0' to '1'  $\Rightarrow$  add '1' time shifted 'm'
- (ii) '1' to '0'  $\Rightarrow$  add '+' time shifted 'm'
- (iii) '0' to '0' or '1' to '1'  $\Rightarrow$  add 0

Ex:-

$$\begin{array}{r} 1101 \times (-6) \\ 01101 (+13) \\ \times 11010 (-6) \\ \hline 0000000000000000 \end{array}$$



$$\begin{array}{r} 01101 (m) \\ +1+1+10 \rightarrow \text{Recorded Value} \\ \hline 0000000000 \end{array}$$

Signed multiplication :-

negative

\* Take 2's complement if number is first multiplication is one, otherwise zeros should be padded.

Ex:-  $(-13) \times 6^{11}$

$$\begin{array}{r} 10011 \text{ (3-bit multiplier)} \\ \times 10111 \text{ (3-bit multiplicand)} \\ \hline 1110110010 \text{ (Product)} \end{array}$$

Booth multiplier table:-

| Multiplicand<br>i | i-1 | Version of multiplicand selected by bit i |               |
|-------------------|-----|-------------------------------------------|---------------|
|                   |     | 0                                         | 1             |
| 0                 | 0   | $0 \times M$                              | $+1 \times M$ |
| 1                 | 0   | $-1 \times M$                             | $0 \times M$  |
| 1                 | 1   |                                           |               |

Booth multiplier :-

$$\begin{array}{r} 1110000011111100 \\ \text{ordinary multiplier!} \\ 10001110011110010 \\ \text{Worst case multiplier!} \\ 010101010101010101 \\ \hline 1101110001 (-143) \\ \text{Product} \end{array}$$

### Booth Algorithm :-

\* To general, Booth Algorithm has

Division :-  
Decimal :-

Binary:-

$$\begin{array}{r} 13 \overline{) 21} \\ 26 \\ \hline 14 \\ 13 \\ \hline 1 \end{array}$$

$$\begin{array}{r} 1101 \overline{) 1010} \\ 1101 \\ \hline 10000 \\ 1101 \\ \hline 110 \end{array}$$

Shift 00001  
Subtract 11101  
Set  $q_0 0$  Restore 11

Shift 00010  
Subtract 11101  
Set  $q_1 1$  Restore 11

Shift 00001  
Subtract 11101  
Set  $q_2 0$  Restore 11

First cycle.

$$\begin{array}{r} 0000000000000000 \\ \text{Shift 00010} \\ 0000000000000000 \\ \text{Subtract 11101} \\ \text{Set } q_3 0 \text{ Restore 11} \\ \text{Shift 00001} \end{array}$$

Second cycle.

Circuit Arrangement for binary division :-



(i) Restoring Division:-

Step 1:- Shift A and Q left one binary position

Step 2:- Subtract M from A, A place result back in A

Steps:- If the sign of A is 1, set  $q_0$  to 0 and M back to A (Restore A); otherwise set  $q_0$  to 1

Step 4:- Repeat above steps n times.

Ex:-

$$\begin{array}{r} \text{Initially } 00000 1000 \\ \text{Shift 00001} 00000 \\ \text{Subtract 11101} 00000 \\ \text{Set } q_0 0 \text{ Restore 11} \end{array}$$

$$\begin{array}{r} 11100 00000 \\ \text{Shift 00011} 00000 \\ \text{Subtract 11101} 00000 \\ \text{Set } q_1 1 \text{ Restore 11} \end{array}$$

$$\begin{array}{r} 00000 00000 \\ \text{Shift 00001} 00000 \\ \text{Subtract 11101} 00000 \\ \text{Set } q_2 0 \text{ Restore 11} \end{array}$$

$$\begin{array}{r} 00000 00000 \\ \text{Shift 00011} 00000 \\ \text{Subtract 11101} 00000 \\ \text{Set } q_3 1 \text{ Restore 11} \end{array}$$

$$\begin{array}{r} 00000 00000 \\ \text{Shift 00001} 00000 \\ \text{Subtract 11101} 00000 \\ \text{Set } q_4 0 \text{ Restore 11} \end{array}$$

$$\begin{array}{r} 00000 00000 \\ \text{Shift 00011} 00000 \\ \text{Subtract 11101} 00000 \\ \text{Set } q_5 1 \text{ Restore 11} \end{array}$$

1st cycle

2nd cycle

3rd cycle

4th cycle.

Ex:-

$$\begin{array}{r} 8/3 \\ \text{11110000} \\ \text{11} \\ \hline 10 \\ \text{3: } 0011 \end{array}$$

## Floating point Representation

- \* The binary point and the numbers are called floating point numbers. IEEE Standard for Floating Point The standards for representing floating point numbers is given.

→ The floating point representation has 3 fields:-

\*SIES

### \* SIGNIFICANT DIGITS

\* EXPONENTI

**Example:-**

Let us

11181: 1888B11B to be Represented

### Floating Point Format

\* First binary point is shifted

to right of the first bit

\* Next the number is...  
" " Correct Scaling factor to

the same life.

get same view

Normalized form :-

010001 . 10111

卷之三

1. 1118100010 x 23  
- 2 → expect  
└ Scaling

1-1101010110 → Sign bit factor

•  digits

Note: Box of sealing material.

the following factors is

卷之二

the setting of significant digits

are known as Mantissa.

In the above example,

SIGN = 0

**EXPO** = **1000000**

Note:- Bias value is added to the true exponent. This solves the problem of negative exponent.

**IEEE Standard for Floating Point numbers**:-

- The standards for representing floating point numbers in 32-bit and 64-bits have been developed by IEEE.
- Two types :-
- \* 32-bit standard representation
- \* Single-precision Representation
- \* 64-bit Standard Representation
- Double Precision Representation

**SINGLE PRECISION REPRESENTATION**

occupies single 32 bit word

3 fields

SIGN → 1 bit

EXONENT → 8-bits

MANTISSA → 23-bits

$E' = E + \text{Bias}$

Where E → scaling factor  
Bias = 127.

**E' = E + 127** This representation is called excess -127 format.

E' ranges from 0 to 254.

Thus range of E' for normal values in Single Precision is  $\alpha E' < 254$ . i.e.  $-126 \leq E \leq 127$

Value =  $\pm 1.m \times 2^{E-127}$

.. 0 signifies +  
1 signifies -

|        |        |                 |
|--------|--------|-----------------|
| 1 sign | 8 bits | 23-bit mantissa |
| S      | E'     | M               |

Double Precision Representation

1) 64 bit standard representation  
→ occupies two 32-bit words

3 fields

|             |                  |                   |
|-------------|------------------|-------------------|
| SIGN → 1bit | EXPONENT → 1-bit | MANTISSA → 52 bit |
|-------------|------------------|-------------------|

3)  $E' = E + 1023$  here bias = 1023  
Hence, it is called excess-1023 format.

4) The range of  $E'$ .  
i.e.  $0 \leq E' \leq 2047$ .

The actual exponent  $E$  is  $-1023 \leq E \leq 1023$ .

Two ways to store Data:  
big endian and little endian

Starting Address (A)  
low order bytes  
high order bytes

Next Address (A+1)  
starting address (A)  
high order bytes  
low order bytes

Example: 1

Represent 125.9 - 125.10 in single precision and double precision formats  
Step 1: Convert decimal number in binary format.  
Binary Part : 12510  
Integer Part : 125910

## FLOATING POINT OPERATION

### Floating Point Operations

It consists of 4 types  
Addition, Sub, Multiplication, Division

### Floating Point Arithmetic operation

#### Floating Point Addition:

Divide the following two decimal numbers in scientific notation

$9.95 \times 10^1$

$8.10 \times 10^{-1}$

Step 1: Rewrite the smaller number such that its exponent matches with the exponent of larger number

$8.10 \times 10^{-1} \rightarrow 0.810 \times 10^0$

Step 2: Add the mantissas

$9.95 + 0.810 = 10.765$

Sum =  $10.765 \times 10^0$

Step 3: Put the result normalized from

$10.765 \times 10^0 = 1.0765 \times 10^1$  (normalised)

such that its exponent matches with the exponent of larger number

$1.0765 \times 10^1 \rightarrow 1.0765 \times 10^1$

Step 4: Round the result

If the mantissa does not fit in the space

needed for it, it has to be rounded off.

e.g.: Many 4 digits are allowed for mantissa

$1.00037 \times 10^0 \rightarrow 1.0004 \times 10^0$

Example 1: Addition in binary:

Perform  $0.5 + (-0.4375)$   $0.5 - 0.4375$

$1.0000 \times 2^{-1} - 0.1100 \times 2^{-4} = 0.0001 \times 2^{-1}$

Step 1: Normalize the sum, checking for overflow/underflow

$0.0001 \times 2^{-1} = 1.000 \times 2^{-4}$

Step 2: Add the mantissas

$0.5 - 0.4375 = 0.0625$

Check:  $1.000 \times 2^{-4} = 0.0625$  which is equal to  $0.5 - 0.4375$  correct!

Step 3: Round the result

$0.0625 \rightarrow 0.063$

Step 4: Set the sign of the result

Done

Step 3: Put the result normalized from

$10.037 \times 10^1 = 1.0037 \times 10^2$  (normalised)

such that its exponent matches with the exponent of larger number

$8.10 \times 10^{-1} \rightarrow 0.810 \times 10^0$

Step 4: Round the result

Its exponent matches with the exponent of larger number

$0.810 \times 10^0 \rightarrow 0.810 \times 10^0$

Step 5: Rewrite the smaller number

such that its exponent matches with the exponent of larger number

$-1.10 \times 2^{-2} = -0.110 \times 2^{-1}$

Step 6: Add the mantissas

$0.810 + (-0.110) = 0.700$

Step 7: Round the result

$0.700 \rightarrow 0.701$

Step 8: Set the sign of the result

Done

Step 1: Subtract the following two decimal numbers in scientific notation

$8.10 \times 10^{-1}$  with  $9.95 \times 10^0$

Step 2: Rewrite the smaller number

such that its exponent matches with the exponent of larger number

$-1.10 \times 2^{-2} = -0.110 \times 2^{-1}$

Step 3: Normalize the sum, checking for overflow/underflow

$0.810 \times 2^{-1} = 1.000 \times 2^{-4}$

Step 4: Round the sum, the sum fits 4 bits so rounding is not required

$1.000 \times 2^{-4} = 0.0625$  which is equal to  $0.5 - 0.4375$  correct!

Step 5: Round the result

$0.0625 \rightarrow 0.063$

Step 6: Set the sign of the result

Done

Step 1: Subtract the following two decimal numbers in scientific notation

$8.10 \times 10^{-1}$  with  $9.95 \times 10^0$

Step 2: Multiply the significands

Step 3: Normalize the product

Step 4: Overflow if yes, raise exception

Step 5: Round the significant to appropriate # of bits

Step 6: If not still normalized to back to step 5

Step 7: Set the sign of the result

Done

### Floating Point Subtraction

### Floating Point Multiplication

Step 1: Add exponents & subtract bias

Step 2: Multiply the significands

Step 3: Normalize the product

Step 4: Overflow if yes, raise exception

Step 5: Round the significant to appropriate # of bits

Step 6: If not still normalized to back to step 5

Step 7: Set the sign of the result

Done



Example-1

Multiply the following two numbers in scientific notation

$$1.10 \times 10^{10} \times 9.200 \times 10^{-5}$$



## FLOATING POINT MULTIPLICATION

- Add the exponents to find new exponent =  $10 + (-5) = 5$

If we add biased exponents, bias will be added twice therefore we need to subtract it once to compensate.

$$\text{Bias} = (6 + 127) + (-5 + 127) = 259$$

$$259 - 127 = 132 \text{ which is } C5 + 127.$$

- Multiply the mantissas:

$$1.10 \times 9.200 = 10.212000$$

Can only keep three digits to the right of the decimal point so the result is  $10.212 \times 10^5$ .

- Normalise the result

$$1.0212 \times 10^6$$

- Round it  $1.021 \times 10^6$

**Start**

Add exponents and subtract bias

Multiply the significands

Normalise the product

Overflow? Yes → Raise exception  
No → Normalised

Round the significant to # of bits  
No → Normalised  
Yes → set the sign of the result

**Stop**

## Example: a multiplication in binary

$$1.000 \times 2^{-1} \times -1.110 \times 2^{-2}$$

- raise exception

Step 5 Round the significant to the appropriate number of bits

Step 6 If not still normalised go back to step 3

Step 7

- Set the sign of the result.

$$10 - (-5) = 10 + 5 = 15$$

- Divide the significants

$$1.110 \div 9.200$$

$$= 0.1120652173$$

- Normalize the result

$$0.1120652173 \times 10^{15}$$

- Round it

$$1.110 \times 2^{-3}$$

- Subtract the biased exponent of the two numbers adding the bias from the sum to get the new biased exponent

$$-126 + \text{exponent} = 254$$

- Round the result

$$1.110 \times 2^{-3}$$

- Adjust the sign.

Since the original signs are different the result will be negative

$$-1.110 \times 2^{-3}$$

## Floating Point Division

- Subtract the exponents

and odd bias

- Divide the significands

Step 3 Normalise the quotient

Step 4 If there is a overflow,

- raise exception

Step 5 Divide the following two numbers in scientific notation

$$1.110 \times 10^{10} \div 9.200 \times 10^{-5}$$

- Subtract the exponent to find the new exponent

$$10 - (-5) = 10 + 5 = 15$$

- Normalize the quotient

$$1.110 \div 9.200$$

- Round it

$$0.1120652173 \times 10^{15}$$

- Normalize the quotient

$$0.1120652173 \times 10^{15}$$

- Round it

$$1.110 \times 2^{-3}$$

- Normalize the quotient

$$1.110 \times 2^{-3}$$

10

Done

## Fundamental Concepts of Processing

A Unit which executes machine instructions and coordinates the activities of other units is called Instruction Set Processor (ISP).

### Steps Involved in Instruction Execution:

- Fetch the contents of memory location pointed to by the PC.

$$IR \leftarrow [I[PC]]$$

- Increment the value of PC by 4.

$$PC \leftarrow [I[PC]] + 4$$

- Carryout the action according to IR.

### Single Bus Organization:



## 1. Register Transfers

Example: Move (R1), R2

Control Signals: 1. Read, MARin, Read

2. MDRin, WMFC

3. MDRout, R2in

- \* For each register 2 control signals are used
- \* Ex:  $R_1 \rightarrow R_{1in}, R_{1out}$ .  
MAR  $\rightarrow$  MARin, MARout.

### 2. Performing an ALU operation

Example: Move R1, R4

0 - class

- \* Equivalent control signals are

$$R_{1out}, R_{4in}$$

### 3. Fetching a Word from Memory:

Example for Register Transfer: 1- open

Move R1, R4

Control signals: 1. Read, MARin

2. R1out, MDRin, Write

3. MDRout, WMFC.

### Execution of a Complete Instruction:

Example: Add (R3), R1

Steps involved:

1. Fetch the instruction
2. Fetch the first operand
3. Perform the addition
4. Load the result into R1.

### Control Sequence for instruction Add (R3), R1



## Multiple Bus Organization (II)

\* One bus — one data transfer.

\* Multiple bus — multiple data transfer at same time.

- \* Register File — All general-Purpose Registers are combined into a single bus.



### 4. Storing a Word in Memory:

Example: Move R2, (R1).

Control signals: 1. Read, MARin

2. R2out, MDRin, Write

3. MDRout, WMFC.

### 5. Fetching a Word from Memory:

Control signals: 1. Read, MARin, Read

2. MDRin, WMFC

3. MDRout, R2in, End.

### 6. Increment the Value of PC by 4.

Control signals: 1. Read, MARin, Read

2. MDRin, WMFC

3. MDRout, R2in, End.

### 7. Increment the Value of PC by 4.

Control signals: 1. Read, MARin, Read

2. MDRin, WMFC

3. MDRout, R2in, End.

### 8. Increment the Value of PC by 4.

Control signals: 1. Read, MARin, Read

2. MDRin, WMFC

3. MDRout, R2in, End.

### Control Sequence for Instruction Add R4, R5

Step Action

1. PCout, MARin, Read, SelectA, Add, Zin

2. Read, PCin, Yin, WMFC

3. MDRout, IRin

4. R2out, MARin, Read,

5. Read, PCin, Yin, WMFC

6. MDRout, SelectY, Add, Zin

7. Zout, R1in, End.

### Control Sequence for Instruction Add R4, R5

Step Action

1. PCout, R=B, MARin, Read, Inc PC

2. WMFC

3. MDRoutB, R=B, IRin

4. R4out, RSoutB, SelectA, ADD, R5in, End.

## PROCESSOR ARCHITECTURE

- \* It has instruction unit which fetches instructions from main memory.
- \* Has separate processing units to deal with integer data & floating-point data.
- \* Data Cache & instruction Cache is inserted between these units & Main Memory.
- \* Processor is connected to the system bus.
- \* Can include several units to increase the potential for concurrent operations.



## Basic Concepts of Pipelining:

- \* One way to improve performance is to use faster circuit technology to build processor.

- \* Another way - to arrange the hardware so that more than one operation can be performed.
- \* Pipelining - Effective way of organizing concurrent activity in a computer system.

### 2 - Stage Pipelining



### (a) Sequential Execution



### (b) Hardware Organization



### (c) Pipelined Execution



### 4- Stage Pipelining

- \* For a variety of reasons, one of the pipeline stages may not be able to complete its processing tasks in the time allotted.

Hazard:  
\* If we change the result of one instruction to be forwarded directly for use in step F<sub>2</sub> of 2<sup>nd</sup> instruction.

### Types of Hazards:

1. Data Hazard
2. Instruction Hazard
3. Structural Hazard.



### Operand Forwarding Technique

- \* Delay can be reduced, or possibly eliminated if we change the result of one instruction to be forwarded directly for use in step F<sub>2</sub> of 2<sup>nd</sup> instruction.

## Data Hazard:

- \* It is a situation in which the pipeline is stalled because the data to be operated on are delayed for some reason.

### Example: 1

Example: 2

- A ← 3 + 4      A ← B × C  
B ← 4 × A      B ← 20 + C

### Example: 3

- Mul R2, R3, R4  
Add R5, R6, R6.



- \* Most of the modern processors use separate cache for instruction and data.
- \* Bus interface used to connect the processor with external environment.

## (a) Hardware Organization

### Handling Hazards using Software

- \* Compiler automatically introduces NOP (no operation) when it detects a dependency between instructions.

### Side Effects:

- \* When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect.

## INSTRUCTION HAZARDS

- \* Occurs if instruction stream is interrupted.

- \* Branch instruction may also cause the pipeline to stall.

### Unconditional Branches:

- \* Time lost as a result of branch instruction is known as branch penalty.
- \* Reducing the branch penalty required the branch address to be computed earlier.
- \* IF unit has dedicated hardware to identify branch instruction & compute target address.



- \* If unit keeps the instruction queue filled at all times to reduce delay.

## INSTRUCTION HAZARDS

### (b) Reordered Instructions

- \* Prediction decision may change depending on execution history.

- \* Objective: To reduce the probability of making a wrong decision.

### B1 - 2 State Algorithm



- \* Branch prediction deviation is always the same every time a given instruction is executed.

### (a) Original Program Loop

## B. Dynamic Branch Prediction: (12)

- \* Prediction decision may change depending on execution history.

- \* Objective: To reduce the probability of making a wrong decision.

### K1 - Branch is likely to be taken



- \* Either set or cleared by many instructions.

- \* Provides flexibility in reordering instructions.

- \* Effectively utilized by compilers.

## SUPERSCALAR PROCESSING

Instruction Fetch Unit - Capable of reading 2 instructions & store in IQ.

Out-of-order Execution:

Execution Completion:

(14)

- \* Equip the Processor with multiple Processing Units to execute in parallel.

- \* Multiple Issue: Several Instructions Start execution in the same clock cycle.
- \* Superscalar Processor: Instruction execution throughput more than one instruction per cycle.

- \* Many modern high-performance Processors use this approach.

- \* Instruction Queue should be filled always than one instruction at a time.
- \* Compiler can avoid many Hazards through judicious selection & ordering of Instructions



|              |       |       |          |          |          |       |
|--------------|-------|-------|----------|----------|----------|-------|
| $I_1$ (Fadd) | $F_1$ | $D_1$ | $E_{1A}$ | $E_{1B}$ | $E_{1C}$ | $W_1$ |
| $I_2$ (Add)  | $F_2$ | $D_2$ | $E_{2A}$ | ---      | $W_2$    |       |
| $I_3$ (Fsub) | $F_3$ | $D_3$ | $E_{3A}$ | $E_{3B}$ | $E_{3C}$ | $W_3$ |
| $I_4$ (Sub)  | $F_4$ | $D_4$ | ---      | $E_4$    | $W_4$    |       |

### (a) Delayed Write

clockcycle 1 2 3 4 5 6 7 → Time

|              |       |       |          |          |          |       |
|--------------|-------|-------|----------|----------|----------|-------|
| $I_1$ (Fadd) | $F_1$ | $D_1$ | $E_{1A}$ | $E_{1B}$ | $E_{1C}$ | $W_1$ |
| $I_2$ (Add)  | $F_2$ | $D_2$ | $E_{2A}$ | $E_{2B}$ | $E_{2C}$ | $W_2$ |
| $I_3$ (Fsub) | $F_3$ | $D_3$ | $E_{3A}$ | $E_{3B}$ | $E_{3C}$ | $W_3$ |
| $I_4$ (Sub)  | $F_4$ | $D_4$ | $E_{4A}$ | $E_{4B}$ | $E_{4C}$ | $W_4$ |

### Dispatch Operations:

- \* Will ensure whether all the resources needed for the execution of an instruction are available.

- \* Should instructions be dispatched out-of-order?

- \* If allowed - deadlock will arise.

- \* Issuing instructions out-of-order → increases the complexity of dispatch unit.
- \* Must Processors use only in-order dispatching

- \* Conclusion - Superscalar Processing increases the throughput of the execution of instructions out-of-order & update results in-order, which in turn increases the speed of the computation.

### A Processor with Two execution units

- \* To guarantee consistency - Results of instructions must be written in Program Order.
- \* By using Delayed write technique (or) by using temporary Registers, we can execute out-of-order & update results in-order, which in turn increases the speed of the computation.

## RANDOM ACCESS MEMORY :

- \* Accessed for a Read or Write operation.
- \* Consists of memory cells.

### Semi-Conductor Materials (VLSI)

- \* More reliable.
- \* Works faster.
- \* More Cost.



## LATCH.



- \* Read operations.
- \* Write operations.

### Advantages of SRAM:

- \* Consumes very low power.
- \* Refreshing Not required.
- \* Can be accessed very quickly.
- \* 8-16 times faster and costlier than DRAM.

### 2) Dynamic RAM (DRAM)

- \* Do not retain their state indefinitely.
- \* Less expensive - simpler cells are used.
- \* Information stored - charge on capacitor.
- \* Periodic refreshing is required.
- \* Problem - capacitor charge leakage.

### Internal Organization of Memory chips:

- \* Memory cells organized as arrays.
- \* Each cell stores one-bit information.
- \* Row- Word Line
- \* Column- bit line
- \* Sense/Write Circuit - sense/ read data
- \* Control signals - R/W, CS.



### Types Of RAM

#### 1) Static RAM (SRAM)

- \* Retains their state as long as power is applied.

## MEMORY

### 1.) Asynchronous DRAM:

- \* Operations not synchronized with clock.
- \* Special memory controller - RAM, CAS.
- \* Need refresh circuit.
- \* High density and low cost.

### 2.) Synchronous DRAM:

- \* Operations directly synchronized with clock.
- \* Has built-in refresh circuit.
- \* Different modes of operation.

### 3.) PROM:

- \* Programmable ROM
- \* Data can be loaded by the user.
- \* Initially empty - All cells contains '0'.
- \* Break the fuse as per requirement.
- \* Requires high-current pulse
- \* Stand data can be erased.
- \* New data can be loaded/ written.
- \* For erasing - expose to UV light.
- \* Physically removed from the circuit.

### 4.) EEPROM: Erasable Programmable ROM.



### READ - ONLY MEMORY (ROM)

- \* Retain information even it is switched off.
- \* Special writing process is needed.
- \* Used to "boot" the system.
- \* Non volatile and costly.



## Types of ROM (15)

### 1) ROM:

- \* WORM device - write once read many.
- \* Data is written by manufacturer.
- \* Not rewritable - break the fuse once.

### 2) EEPROM:

- \* Programmable ROM

### 3) EPROM:

- \* Erasable Programmable ROM

### 4) PROM:

- \* Data can be loaded by the user.

## CACHE MEMORY

### CACHE MEMORY:

- Speed of processors are high when compared with speed of main memory.
- The processor speed is the good performance parameter, so reduce waiting time by main memory.
- Cache memory is the efficient solution.

### EFFECTIVENESS OF CACHE MECHANISM

#### \* Locality of reference (LOR)

- Property of computer programs.
- + Many instructions in localized areas of program executed repeatedly during time period.

- + Remainder of the program accessed infrequently.

#### \* Types of LOR

1. Temporal LOR
2. Spatial LOR

- + Temporal LOR - Recently executed instruction is likely to be executed again soon.
- + Spatial LOR - Instructions in close proximity to recently executed instruction will be executed soon.

- \* By placing active segments of program in fast cache of memory, total execution time is reduced.

### OPERATION OF A CACHE MEMORY:

- \* The memory control circuit is designed to use LOR property.
- \* Temporal LOR - When instruction/data needed, bring from cache + hopefully remain.

- \* Spatial LOR - Fetching several items reside at adjacent addresses (Blocks of data)

- \* cache block / Cache line.



### READ / WRITE IN CACHE:

#### \* READ MISS / WRITE MISS

- (+ request word not in cache (read))
- (\* Block of words in requested word copied to cache.)

- (+) Particular word sent to processor.

- (2) Particular word sent to processor immediately read from main memory stored through Early restart

- (\* If requested word not in cache during write - Write Miss)

- (\* Write-back / write-through is used)

### MAPPING FUNCTIONS:

#### (1) Direct Mapping:



#### (2) Associative Mapping:

#### (3) Set-Associative Mapping:



### READ / WRITE IN CACHE:

#### \* READ MISS / WRITE MISS

- \* consider cache has 128 blocks

- main memory 4096 blocks
- cache memory value = block  $j \times 128$

- \* 0, 128, 256 - block 0 of cache

- \* 1, 129, 257 - block 1 of cache

16

- \* Tag bits
- \* cost is higher than direct mapped
- \* Associative search method used

### READ / WRITE IN CACHE:

#### \* READ MISS / WRITE MISS

#### (1) Direct Mapping:

#### (2) Associative Mapping:

#### (3) Set-Associative Mapping:

#### (4) Set-Associative Mapping:

#### (5) Set-Associative Mapping:

#### (6) Set-Associative Mapping:

#### (7) Set-Associative Mapping:

#### (8) Set-Associative Mapping:

#### (9) Set-Associative Mapping:

#### (10) Set-Associative Mapping:

#### (11) Set-Associative Mapping:

#### (12) Set-Associative Mapping:

#### (13) Set-Associative Mapping:

#### (14) Set-Associative Mapping:

#### (15) Set-Associative Mapping:

#### (16) Set-Associative Mapping:

#### (17) Set-Associative Mapping:

#### (18) Set-Associative Mapping:

#### (19) Set-Associative Mapping:

#### (20) Set-Associative Mapping:

#### (21) Set-Associative Mapping:

#### (22) Set-Associative Mapping:

#### (23) Set-Associative Mapping:

#### (24) Set-Associative Mapping:

#### (25) Set-Associative Mapping:

#### (26) Set-Associative Mapping:

#### (27) Set-Associative Mapping:

#### (28) Set-Associative Mapping:

#### (29) Set-Associative Mapping:

#### (30) Set-Associative Mapping:

#### (31) Set-Associative Mapping:

#### (32) Set-Associative Mapping:

#### (33) Set-Associative Mapping:

#### (34) Set-Associative Mapping:

#### (35) Set-Associative Mapping:

#### (36) Set-Associative Mapping:

#### (37) Set-Associative Mapping:

#### (38) Set-Associative Mapping:

#### (39) Set-Associative Mapping:

#### (40) Set-Associative Mapping:

#### (41) Set-Associative Mapping:

#### (42) Set-Associative Mapping:

#### (43) Set-Associative Mapping:

#### (44) Set-Associative Mapping:

#### (45) Set-Associative Mapping:

#### (46) Set-Associative Mapping:

#### (47) Set-Associative Mapping:

#### (48) Set-Associative Mapping:

#### (49) Set-Associative Mapping:

#### (50) Set-Associative Mapping:

#### (51) Set-Associative Mapping:

#### (52) Set-Associative Mapping:

#### (53) Set-Associative Mapping:

#### (54) Set-Associative Mapping:

#### (55) Set-Associative Mapping:

#### (56) Set-Associative Mapping:

#### (57) Set-Associative Mapping:

#### (58) Set-Associative Mapping:

#### (59) Set-Associative Mapping:

#### (60) Set-Associative Mapping:

#### (61) Set-Associative Mapping:

#### (62) Set-Associative Mapping:

#### (63) Set-Associative Mapping:

#### (64) Set-Associative Mapping:

#### (65) Set-Associative Mapping:

#### (66) Set-Associative Mapping:

#### (67) Set-Associative Mapping:

#### (68) Set-Associative Mapping:

#### (69) Set-Associative Mapping:

#### (70) Set-Associative Mapping:

#### (71) Set-Associative Mapping:

#### (72) Set-Associative Mapping:

#### (73) Set-Associative Mapping:

#### (74) Set-Associative Mapping:

#### (75) Set-Associative Mapping:

#### (76) Set-Associative Mapping:

#### (77) Set-Associative Mapping:

#### (78) Set-Associative Mapping:

#### (79) Set-Associative Mapping:

#### (80) Set-Associative Mapping:

#### (81) Set-Associative Mapping:

#### (82) Set-Associative Mapping:

#### (83) Set-Associative Mapping:

#### (84) Set-Associative Mapping:

#### (85) Set-Associative Mapping:

#### (86) Set-Associative Mapping:

#### (87) Set-Associative Mapping:

#### (88) Set-Associative Mapping:

#### (89) Set-Associative Mapping:

#### (90) Set-Associative Mapping:

#### (91) Set-Associative Mapping:

#### (92) Set-Associative Mapping:

#### (93) Set-Associative Mapping:

#### (94) Set-Associative Mapping:

#### (95) Set-Associative Mapping:

#### (96) Set-Associative Mapping:

#### (97) Set-Associative Mapping:

#### (98) Set-Associative Mapping:

#### (99) Set-Associative Mapping:

#### (100) Set-Associative Mapping:

#### (101) Set-Associative Mapping:

#### (102) Set-Associative Mapping:

#### (103) Set-Associative Mapping:

#### (104) Set-Associative Mapping:

#### (105) Set-Associative Mapping:

#### (106) Set-Associative Mapping:

#### (107) Set-Associative Mapping:

#### (108) Set-Associative Mapping:

#### (109) Set-Associative Mapping:

#### (110) Set-Associative Mapping:

#### (111) Set-Associative Mapping:

#### (112) Set-Associative Mapping:

#### (113) Set-Associative Mapping:

#### (114) Set-Associative Mapping:

#### (115) Set-Associative Mapping:

#### (116) Set-Associative Mapping:

#### (117) Set-Associative Mapping:

#### (118) Set-Associative Mapping:

#### (119) Set-Associative Mapping:

#### (120) Set-Associative Mapping:

#### (121) Set-Associative Mapping:

#### (122) Set-Associative Mapping:

#### (123) Set-Associative Mapping:

#### (124) Set-Associative Mapping:

#### (125) Set-Associative Mapping:

#### (126) Set-Associative Mapping:

#### (127) Set-Associative Mapping:

#### (128) Set-Associative Mapping:

#### (129) Set-Associative Mapping:

#### (130) Set-Associative Mapping:

#### (131) Set-Associative Mapping:

#### (132) Set-Associative Mapping:

#### (133) Set-Associative Mapping:

#### (134) Set-Associative Mapping:

#### (135) Set-Associative Mapping:

#### (136) Set-Associative Mapping:

#### (137) Set-Associative Mapping:

#### (138) Set-Associative Mapping:

#### (139) Set-Associative Mapping:

#### (140) Set-Associative Mapping:

#### (141) Set-Associative Mapping:

#### (142) Set-Associative Mapping:

#### (143) Set-Associative Mapping:

#### (144) Set-Associative Mapping:

#### (145) Set-Associative Mapping:

#### (146) Set-Associative Mapping:

#### (147) Set-Associative Mapping:

#### (148) Set-Associative Mapping:

#### (149) Set-Associative Mapping:

#### (150) Set-Associative Mapping:

#### (151) Set-Associative Mapping:

#### (152) Set-Associative Mapping:

#### (153) Set-Associative Mapping:

#### (154) Set-Associative Mapping:

#### (155) Set-Associative Mapping:

#### (156) Set-Associative Mapping:

#### (157) Set-Associative Mapping:

#### (158) Set-Associative Mapping:

#### (159) Set-Associative Mapping:

#### (160) Set-Associative Mapping:

#### (161) Set-Associative Mapping:

#### (162) Set-Associative Mapping:

#### (163) Set-Associative Mapping:

#### (164) Set-Associative Mapping:

#### (165) Set-Associative Mapping:

#### (166) Set-Associative Mapping:

#### (167) Set-Associative Mapping:

#### (168) Set-Associative Mapping:

#### (169) Set-Associative Mapping:

#### (170) Set-Associative Mapping:

#### (171) Set-Associative Mapping:

#### (172) Set-Associative Mapping:

#### (173) Set-Associative Mapping:

#### (174) Set-Associative Mapping:

#### (175) Set-Associative Mapping:

#### (176) Set-Associative Mapping:

#### (177) Set-Associative Mapping:

#### (178) Set-Associative Mapping:

#### (179) Set-Associative Mapping:

#### (180) Set-Associative Mapping:

#### (181) Set-Associative Mapping:

#### (182) Set-Associative Mapping:

#### (183) Set-Associative Mapping:

#### (184) Set-Associative Mapping:

#### (185) Set-Associative Mapping:

#### (186) Set-Associative Mapping:

#### (187) Set-Associative Mapping:

#### (188) Set-Associative Mapping:

#### (189) Set-Associative Mapping:

#### (190) Set-Associative Mapping:

#### (191) Set-Associative Mapping:

#### (192) Set-Associative Mapping:

#### (193) Set-Associative Mapping:

#### (194) Set-Associative Mapping:

#### (195) Set-Associative Mapping:

#### (196) Set-Associative Mapping:

#### (197) Set-Associative Mapping:

#### (198) Set-Associative Mapping:

#### (199) Set-Associative Mapping:

#### (200) Set-Associative Mapping:



## VIRTUAL MEMORY

### VIRTUAL MEMORIES :-

- \* Main Memory is small compared to processor.
- \* When program does not completely fit into main memory, not executed parts stored in secondary storage devices.
- \* When new segment of program coming inside main memory, need to replace existing content.
- \* Technique to move program & data into physical main memory is virtual memory.
- \* Binary addresses generated by processor is virtual/logical address.



INPUT/OUTPUT

ORGANIZATION

## Accessing I/O devices.

- Connect I/O devices
  - Three sets of lines



I/O Interface

- Control signals
    - Requested read or a write operation
    - Requested data are transferred over the data lines.

Memory Mapped I/O

- Memory Mapped I/O  
- Portions of address space are assigned to I/O devices -- reads and writes to those addresses are interpreted



Required to go

- The diagram illustrates the internal structure of a memory system. It features three main functional blocks arranged vertically:

  - Address decoder**: This block receives the **Address lines** from the left and generates control signals to the other blocks.
  - (Control) Circuits**: This block receives control signals from the Address decoder and sends control signals to the **Data and status register**.
  - Data and status register**: This block receives data from the **Memory** and sends data back to the **Output device**. It also receives control signals from the **(Control) Circuits**.

**Bus** lines are shown on the right side, representing the **Address lines**, **Data lines**, and **Control lines**. The **Address lines** connect the **Address decoder** to the **Memory**. The **Data lines** connect the **Memory** to the **Output device**. The **Control lines** connect the **(Control) Circuits** to both the **Address decoder** and the **Data and status register**.

## Interrupts

- Interrupts:
    - Signal generated by an external device requiring some urgent action to be taken by the CPU.
    - Enabling and Disabling Interrupts



1  
Focus

- Using Multiple Devices.** A number of initiating interrupt are connected to the processor. The processor poll all the I/O devices.

### Interrupt Service Routine.

- Interrupt Service Routine.

  - Multiple-level priority organization.
  - Depending upon the device's priority.
  - Assign a priority level to the processor.

(19)



Even

- ulation of user programs  
plementing security and protection  
tutes.

一

- 19



1

- Two methods are used to address

the device:  
1 Memory-mapped  
2: I/O Mapped I/O.





## NUMBER SYSTEM

Convert number  $5062_{10}$  to Binary System.

$$\begin{array}{r}
 2 \overline{) 5062} \\
 2 \overline{) 2531} - 0 \\
 2 \overline{) 1265} - 1 \\
 2 \overline{) 632} - 1 \\
 2 \overline{) 316} - 0 \\
 2 \overline{) 158} - 0 \\
 2 \overline{) 79} - 0 \\
 2 \overline{) 39} - 1 \\
 2 \overline{) 19} - 1 \\
 2 \overline{) 9} - 1 \\
 2 \overline{) 4} - 1 \\
 2 \overline{) 2} - 0 \\
 2 \overline{) 1} - 0
 \end{array}$$

$$\boxed{\text{Ans} = (1001111000110)_2}$$

Convert  $159_{10}$  to octal Number

$$159 \text{ to octal}$$

$$\begin{array}{r}
 8 \overline{) 159} \\
 8 \overline{) 19} - 7 \\
 8 \overline{) 2} - 3
 \end{array}$$

$$\boxed{\text{Ans} = (237)_8}$$

i) Convert  $380_{10}$  to hexadecimal

$380_{10}$  to Hexadecimal

$$16 \overline{) 380}$$

$$\begin{array}{r}
 16 \overline{) 23} - 12 \\
 16 \overline{) 7} - 7
 \end{array}$$

$$\boxed{(\text{Ans} = 14C)}$$

$$\boxed{\text{Ans} = 2C4_{16} = 108_{10}}$$

ii) Convert  $11001011_2$  to decimal number System.

$$11001011$$

$$\begin{array}{r}
 1 \times 2^0 = 1 \\
 0 \times 2^1 = 0 \\
 1 \times 2^2 = 0 \\
 0 \times 2^3 = 8 \\
 0 \times 2^4 = 0 \\
 1 \times 2^5 = 0 \\
 1 \times 2^6 = 64 \\
 1 \times 2^7 = 128
 \end{array}$$

$$\begin{array}{r}
 1001 \\
 1 \times 2^0 = 1 \\
 0 \times 2^1 = 0 \\
 0 \times 2^2 = 0 \\
 1 \times 2^3 = 8
 \end{array}$$

$$\begin{array}{r}
 1001 \\
 1 \times 2^0 = 1 \\
 0 \times 2^1 = 0 \\
 0 \times 2^2 = 0 \\
 1 \times 2^3 = 8
 \end{array}$$

$$\boxed{\text{Ans} = (11001011)_2 = (203)_{10}}$$

5) Convert octal Number  $114_8$  to decimal Number.

$$114$$

$$\begin{array}{r}
 8 \overline{) 114} \\
 8 \overline{) 1} - 1
 \end{array}$$

$$\boxed{\text{Ans} = (114)_8 = (460)_{10}}$$

b) Convert Hexadecimal Number  $264_{16}$  to decimal number System.

Number

$$\begin{array}{r}
 16 \overline{) 264} \\
 16 \overline{) 10} - 4 \\
 16 \overline{) 2} - 0
 \end{array}$$

$$\begin{array}{r}
 16 \overline{) 10} \\
 16 \overline{) 4} \\
 16 \overline{) 0}
 \end{array}$$

Take decimal part.

$$0.625 \times 2 = 0.3125$$

$$0.3125 \times 2 = 0.625$$

$$0.625 \times 2 = 1.25$$

$$1.25 \times 2 = 0.5$$

$$0.5 \times 2 = 0.0$$

$$0.0 \times 2 = 0.0$$

$$\begin{array}{r}
 16 \overline{) 2020} \\
 16 \overline{) 65} - 10(A) \\
 16 \overline{) 1} - 1(B)
 \end{array}$$

$$\boxed{\text{Ans} = (1BA)_{16}}$$

Step 1: Convert Binary to decimal.

$$\begin{array}{r}
 1001 \\
 1 \times 2^0 = 1 \\
 0 \times 2^1 = 0 \\
 0 \times 2^2 = 0 \\
 1 \times 2^3 = 8
 \end{array}$$

$$\begin{array}{r}
 1001 \\
 1 \times 2^0 = 1 \\
 0 \times 2^1 = 0 \\
 0 \times 2^2 = 0 \\
 1 \times 2^3 = 8
 \end{array}$$

$$\begin{array}{r}
 2 \overline{) 2020} \\
 2 \overline{) 10} - 0 \\
 2 \overline{) 5} - 0 \\
 2 \overline{) 2} - 1 \\
 2 \overline{) 1} - 0
 \end{array}$$

$$\begin{array}{r}
 2 \overline{) 2020} \\
 2 \overline{) 10} - 0 \\
 2 \overline{) 5} - 0 \\
 2 \overline{) 2} - 1 \\
 2 \overline{) 1} - 0
 \end{array}$$

(22)

v) Convert  $1001_2$  to octal number System.

$$\begin{array}{r}
 1001 \\
 1 \times 2^0 = 1 \\
 0 \times 2^1 = 0 \\
 0 \times 2^2 = 0 \\
 1 \times 2^3 = 8
 \end{array}$$

$$\begin{array}{r}
 1001 \\
 1 \times 2^0 = 1 \\
 0 \times 2^1 = 0 \\
 0 \times 2^2 = 0 \\
 1 \times 2^3 = 8
 \end{array}$$





## VIRTUAL MEMORY

1. A computer with virtual memory has

- \* an access time to main memory 50 ns,
- \* the time to transfer a block from the virtual into main memory is 10 ns. The probability for the page fault is  $10^{-6}$ .

What is the average access time, if the page-table is in the main memory?

- Solution :
- \*  $t_{avg} = 10 \text{ ns.}$
  - \*  $t_B = 10 \text{ ms.}$
  - \*  $(1 - H) = 10^{-6}.$
  - \*  $t_A = ?$

$$t_A = t_{avg} + t_B + (1 - H) \times t_B.$$

↓ Access to the Page-table.  
in main memory

$$t_A = (50 \times 10^{-9}) + (50 \times 10^{-9}) \times (10^{-6}) \times (10 \times 10^{-3})$$

$$= 100 \times 10^{-9} + 10 \times 10^{-9}$$

$$= 110 \times 10^{-9}$$

$$t_A = 110 \text{ ns.}$$

$\therefore$  The average access time = 110 ns.

2. Computer with virtual memory has

the following features.

- \* length of virtual address is 38-bits.
- \* page size is 16 kB.
- \* length of physical address is 32-bits.

(a) How many bits is the length of page descriptor, if in addition to the frame number (Fn), additional parameters occupy another 6 bits?

Solution :

- \*  $n = 38, f = 32.$
- \* Page Size =  $16 \text{ kB} = 2^P = 2^{14} \text{ B.}$

\* No. of Pages in Virtual Memory :

$$2^{n-p} = 2^n \div 2^P = 2^{38} \div 2^{14} = 2^{24}.$$

- \* Number of Page frames in Main Memory :
- \*  $2^{f-p} = 2^f \div 2^P = 2^{32} \div 2^{14} = 2^{18} \cdot (\text{Fn}).$

$$\text{Page descriptor} = \text{Frame Number (Fn)} + 6$$

$$= 18 + 6 = 24 \text{ bits.} = 3 \text{ B.}$$

|                       |               |
|-----------------------|---------------|
| 6 bits                | 18 bits       |
| Additional Parameters | Frame Number. |

3. If a process has 32-bit virtual address, 28-bit physical address, 2 KB address space is 4 KB and Page

- \* pages. How many bits are required for the virtual, physical page number?

(A). 14, 21  
(B). 21, 14  
(C). 6, 10  
(D). 10, 15

Solution :

Virtual Address = 32-bit

Virtual Address Space =  $2^{32} \text{ B.}$

Physical Address = 28-bit

Physical Address Space =  $2^{28} \text{ bit}$

Page size =  $2 \text{ kB} = 2^{11}$

No. of entries for Virtual Page Number

=  $2^{\text{Virtual Address}} / \text{Page size}$

=  $2^{32} / 2^{11} = 2^{21}.$

No. of entries for Physical Page Number

Number of entries for Physical Page Number

Physical Address / Page Size

=  $2^{28} / 2^{11} = 2^{17}.$

Size of Page Table = No. of Pages × Page descriptor

$\therefore$  No. of bits for Virtual Page Number = 21.

$\therefore$  No. of bits for Physical Page Number = 17.

4. In a Paging Scheme, Virtual address space is 4 KB and Page

- \* table entry size is 8 bytes. What should be the optimal Page size?

Solution :

Given,

- \* Virtual Address Space = 4 KB. (Processor size)
- \* Physical Page Table entry = 8 bytes.

$$(2 \times \text{Processor size} \times \text{Page Table entry size})^{1/2}$$

$$= (2 \times 4096 \times 8 \text{ bytes})^{1/2}$$

$$= (8192 \times 8 \text{ bytes})^{1/2}$$

$$= (65536 \text{ bytes})^{1/2}$$

$$= 256 \text{ bytes.}$$

## CACHE MEMORY

### CACHE MEMORY

#### VIRTUAL MEMORY:

⑤ A 32 bit address system uses 4 megabytes main memory. The page size is 1 Kbytes. What is the page size in bytes? What is the virtual page and the offset in the page for the virtual address 0x00030f40?

For a page size of N bytes, the number of bits in the offset field is  $\log_2 N$ .

In case of a 2Kbytes page,

$$\log_2 2^{11} = 11 \text{ bits.}$$

Therefore, number of bits for the page number is  $32 - 11 = 21$ , which means a total of  $2^{21} = 2$  Mpages.

Binary representation of the address is

|                       |                |
|-----------------------|----------------|
| 0000 0000 0000 0000 1 | 1111 0100 0000 |
| " " 10                | 0              |

Given virtual address identifies the virtual page number

$$0x61 = 97_{10}.$$

The offset inside the page is

$$0x740 = 1856_{10}.$$

⑥ A 32 bit address system has a 4 megabytes main memory. The page size is 1 Kbytes. What is the size of the page table?

The page table addressed with page number field in virtual address.

Number of lines in page table equals the number of virtual pages.

$$\text{Virtual Page} = \frac{\text{address space [bytes]}}{\text{page size [bytes]}}$$

$$= \frac{2^{32}}{2^{10}} = 2^2 = 4 \text{ Mpages}$$

\* Width of each line in the page table equals width of the page number field in virtual address.

Page number field in virtual address is  $2^{22}$ ,

\* If no. of virtual pages is  $2^{22}$ , then line width is 22 bits.

$$\text{Page table size} = \text{Number of Entries} + \frac{\text{Main Memory Size}}{\text{Line size}}$$

$$\text{Size of main memory} = 16384 \text{ blocks}$$

$$= 2^{22} * 2^{22} \text{ bits}$$

$$= 110.5 \text{ Mbytes}$$

$$= 2^{22} \text{ bytes}$$

$$\text{The } 10.0 \text{ bits required} \\ \text{to address main memory}$$

$$\text{Number of bits in block offset} \\ \text{we have}$$

Block size = 256 bytes =  $2^8$  bytes  
thus, number of bits in block = 8 bits

offset or word = 8 bits

Number of bits in set number = 8 bits

No. of sets in Cache = No. of lines in Cache / Set size

$$= 128 \text{ blocks} / 4 \text{ blocks} = 32 \text{ sets}$$

$$= 25 \text{ sets}$$

Thus, no. of bits in set number = 5 bits

Number of bits in Tag number

No. of blocks in Cache memory = 128  
No. of bits in physical address = (no. of bits in tag + no. of bits in set number + no. of bits in block)

$$= 22 \text{ bits} - (5 \text{ bits} + 8 \text{ bits})$$

$$= 22 \text{ bits} - 13 \text{ bits} = 9 \text{ bits}$$

thus, no. of bits in Tag = 9 bits

Therefore, the Physical Address

$$= 16384 * 256 \text{ bytes} = 4194304 \text{ bytes}$$

$$= 2^{22} \text{ bytes}$$

$$\left. \begin{array}{l} \text{The } 10.0 \text{ bits required} \\ \text{to address main memory} \end{array} \right\} = 22$$

$$\left. \begin{array}{l} \text{Number of bits in block offset} \\ \text{we have} \end{array} \right\} = 8$$

(26)