

Processor:-

20/12/21

- 1) General Pur. Reg :-
  - 2) Special Pur. Reg :-
    - 1) IR  $\rightarrow$  Instruction
    - 2) PC  $\rightarrow$  execution sequence  
Address of i/s
  - 3) MAR  $\rightarrow$  Address of Memory location  
Read data  $\rightarrow$  fun
  - 4) MDR  $\rightarrow$  consists of data  
data can be instructions also  
for Read & write
  - 5) ALU :- perform arithmetic & logical operations.
  - 6) CU :- control unit sends signals for r/w
- Read:-
- 1) MAR  $\rightarrow$  has address of what you want to read.
  - 2) CU  $\rightarrow$  send read signal.
  - 3) Copy data to MDR.

Write:-

- 1) MAR  $\rightarrow$  where to write
- 2) MDR  $\rightarrow$  what
- 3) CU  $\rightarrow$  write signal.

Bus:-

$\rightarrow$  wires technically



- 1) Data bus lines  $\rightarrow$  carrying words
- 2) Address bus  $\rightarrow$  carrying address bits
- 3) Control bus  $\rightarrow$  control bits.

→ To avoid the speed mismatch of I/P, O/P dev with CPU, we use buffer registers

→ These are a bit faster than I/P, O/P devices and a bit slower than CPU.

Software:-

→ 1) System software → OS, compilers, text editors

2) Application software

User interacts with system using application software.

→ System software helps for the life of application software.

→ Browser, word.

Steps to execute a prgrm

→ Load prgrm from disk to Mem

→ Read file from the disk.

→ Calculations on file data

→ Print the results.





- when CPU is free, use that time for another program → multiprogramming
- elapsed time → total time for prgrm execution
- Process time → CPU time.

Performance:-

- Measure efficiency of CPU
- Measure using clock.
- clock cycle length,  $P$
- clock Rate,  $R = \frac{1}{P}$  cycles per second. | Hertz
- In each clock cycle, it executes certain tasks.  
(each basic inst )

Program → High level lang → N ins. each has

↓  
Compiler  
S basic steps

↓  
Machine languages.

so, total steps  $= N \times S \times P$

$$\text{time } t = \frac{N \times S}{R} \text{ cycles} / (\text{cycles/sec}) = \text{sec}$$

→ For performance, decrease t  
So, decrease N<sub>S</sub> or increase R value.

(24-12-2021)

Executing an instruction

Cache

1) Fetch

2) decode

3) Execute

→ Max time of CPU will be wasted on fetch.

→ To reduce the avg fetch time, we use cache memory.

→ Generally, cache memory can be placed on the CPU.

→ Cache can hold some data/instructions that are present in main memory.

→ Repeatedly accessible part of data from main memory is placed in cache.

| M M                      | Cache                 |                        |
|--------------------------|-----------------------|------------------------|
| 1) direct access by CPU  | Not direct            |                        |
| 2) More Mem              | Less                  |                        |
| 3) DRAM                  | SRAM                  | Dynamic RAM static RAM |
| 4) size of DRAM is more. | size of SRAM is more. |                        |
| 5) speed is slow         | speed is fast         |                        |

→ Using cache, reduces the avg fetch time

$$\text{Performance eqn} = \frac{N \times S}{R}$$

→ We can decrease N & S (or) increase R.

For decreasing S:-

→ Using pipelining

→ Overlap stages of program's execution.

For decreasing NXS:-

→ Based on compiler

→ Compiler should be able to produce efficient machine language.

For increasing R:-

→ Improving the IC of the system.

For reducing N:-

→ Reduced Instruction set computer → easy instructions and more steps.

→ Complex Instruction set computer → complex instructions and less steps.

Performance Measurement:-

→ SPEC → System Performance Evaluation Corporation.

→ set of standard programs to analyse the set of programs.

$$\text{SPEC rating} = \frac{\text{time taken on ref computer}}{\text{time taken on test computer}}$$

$\text{SPEC} = (\text{SPEC}_i)^{1/n}$  to get final rating of our system.

Multiprocessor :- More CPU's but same main memory.

Multicomputer :- computers are connected using networking.

Historical aspects:-

- 1) first gen comp:- 1945-55 → vacuum tubes, assembly level programming
- 2) Second gen comp:- 1955-65 → transistor, magnetic HLR, compilers, tapes
- 3) third : 1965-75 → IC, OS, Multip., Cache
- 4) fourth : 1975 - present  
↳ VLSI, OS, MP, VM, pipelining.

Q) List the steps needed to execute machine instruction

Add LocA + R<sub>0</sub> in terms of transfers between components processor and memory and some simple

Assume that ins is stored in memory at location instr and the address is initially in the reg A.  
Assume address of next ins is instr+1.

→ Fetch the instruction at the address instr to memory. (MAR ← instr, read by CU, MDR ← data in instr)

→ Decode the terms.

→ Address of location 'A' to MAR

→ Control unit issues read.

→ Read the data to MDR

→ Addition is performed using ALU.

→ Store the result of addition to R<sub>0</sub>

PC  $\rightarrow$  MAR (Address)

$\downarrow$  CU read

MDR (data at location instr)

$\downarrow$

IR (store it in IR)

$\downarrow$

Decode (Identify operation, operands)

Add, loc A, R0

$\downarrow$

Fetching from loc A

$\hookrightarrow$  MAR  $\leftarrow$  loc A

$\rightarrow$  CU  $\rightarrow$  read

$\rightarrow$  MDR  $\rightarrow$  data at A

$\downarrow$

store that value in temporary reg

MDR  $\rightarrow$  R<sub>i</sub>

$\downarrow$  send the instruction to ALU

(A  $\downarrow$ )  $\rightarrow$  (B  $\downarrow$ )  $\rightarrow$  (C  $\downarrow$ )

(A  $\downarrow$ ) perform addition in ALU

(B  $\downarrow$ )  $\downarrow$

store that value in R<sub>o</sub>

2) Give a short sequence of Machine instructions for the task "Add the contents of Mem loc A to those of loc B and place the answer in loc C."

Instructions load loc, R<sub>i</sub> and store R<sub>i</sub>, loc are the only instructions available to transfer data b/w Mem & General purpose Reg. R<sub>i</sub>.

Don't destroy contents of either loc A or loc B.

|       |                      |                            |
|-------|----------------------|----------------------------|
| load  | locA, R <sub>0</sub> | $R_0 \leftarrow m[locA]$   |
| load  | locB, R <sub>1</sub> | $R_1 \leftarrow m[locB]$   |
| Add   | $R_0, R_1$           | $R_1 \leftarrow R_0 + R_1$ |
| store | $R_1, locC$          | $m[locC] \leftarrow R_1$   |

(27-12-2021)

Suppose that move and add instr are available with format move/add loc1, loc2.

The instr move/add a copy of the operand at 1<sup>st</sup> location to 2<sup>nd</sup> location, overwriting original operand at the 1<sup>st</sup> location.

The loc i can be memory /process reg set.

Is it possible to use fewer instr. to accomplish the prev task? If yes, give the new set of instr.

move locA, locC  $m[locC] \leftarrow m[locA]$   
~~move~~ locB, locC  $m[locC] \leftarrow m[locA]$   
 Add  $+m[locB]$

100101010

27-12-2021

## Machine Instructions:-

ISA  $\rightarrow$  Instruction Set Architecture

Instructions:-



→ Types of opcode, No. of operands etc are under ISA.

Numbers:-

1) fixed-point representation (integers)

2) floating-point representation (float)

positive numbers  $\rightarrow 0 (+)$  } signed bits

negative numbers  $\rightarrow 1 (-)$  } for n-bits

Unsigned :- All bits represent magnitude. (0 to  $2^n - 1$ )

Signed :- One bit (MSB) sign part

$$+101 = 64 + 32 + 4 + 1$$

$$= 0 | 1100101$$

sign      \ Magnitude.

→ Range of n bit unsigned number

$$\text{is } [0, 2^n - 1]$$

n bits

Signed :- 1) sign-magnitude



2) Signed 1's complement

sign bits

3) Signed 2's complement

bit mag

|      | 0  | +0 | +0 | +0 |
|------|----|----|----|----|
| 0000 | 0  | +0 | +0 | +0 |
| 0001 | 1  | +1 | +1 | +1 |
| 0010 | 2  | +2 | +2 | +2 |
| 0011 | 3  | +3 | +3 | +3 |
| 0100 | 4  | +4 | +4 | +4 |
| 0101 | 5  | +5 | +5 | +5 |
| 0110 | 6  | +6 | +6 | +6 |
| 0111 | 7  | +7 | +7 | +7 |
| 1000 | 8  | -0 | -7 | -8 |
| 1001 | 9  | -1 | -6 | -7 |
| 1010 | 10 | -2 | -5 | -6 |
| 1011 | 11 | -3 | -4 | -5 |
| 1100 | 12 | -4 | -3 | -4 |
| 1101 | 13 | -5 | -2 | -3 |
| 1110 | 14 | -6 | -1 | -2 |
| 1111 | 15 | -7 | -0 | -1 |

$[-(2^{n-1}), 2^{n-1}-1]$  sign Mag

$[-(2^{n-1}), 2^{n-1}-1]$  is complement

$[-2^{n-1}, 2^{n-1}-1]$  2's complement

+5 → 00000101

1's → 11111010

2's → 11111011

→ This is called sign extension.

$\begin{matrix} 13 & \xrightarrow{\text{1's}} \\ -13 & \xrightarrow{\text{2's}} \end{matrix}$   
 $\begin{matrix} +13 & \xrightarrow{\text{sign}} \\ -13 & \xrightarrow{\text{2's}} \end{matrix}$   
 it.  $\xrightarrow{\text{2's}}$

(i) Represent -256 in 2's using 8 bits.

Not representable

(ii) -256 in 2's using 16 bits (0)

(iii) -256 using 1's using 8 bits 0010  
0011

Not possible

(iv) 16384 using 16 bits in sign-mag, 1's, 2's

(v) -16384 " " "

(vi) +13 & -13 in all three systems using 16 bits

(vii) -256

+256 - 00000000100000000000

Sign Mag - 100 000 010 000 0000

1's → 111 111 101 111 0000

2's → 111 111 110 000 0000

} -256

0100 0000 0000 0000

(iv) 16384 → 00111 111 111 11111  
16384 8192 4096 2048 1024 512 256 128 64 32 16 8 4 2 1

1's → 00111 111 111 11111

2's → 00111 111 111 11111

2 | 16384

2 | 8192

2 | 4096

2 | 2048

2 | 1024

512

(v) -16384 → 10111 111 111 11111

1's → 11000 000 000 00000

2's → 11000 000 000 000001

(vi) +13 → 0000 0000 0000 1101

1's → 0000 0000 0000 1101

2's → 0000 0000 0000 1101

-13 → 1000 0000 0000 1101

1's → 1111 1111 1111 0010

$2^8 \rightarrow 1111\ 1110\ 1111$  0011

(iv)

16384 - 0100 0000 0000 0000

is - 0100 0000 0000 0000

$2^8$  - 0100 0000 0000 0000

(v)

-16384 - 10100 0000 0000 0000

is - 1011 1111 1111 1111

$2^8$  - 1100 0000 0000 0000

(30-12-2021)

→  $2^8$  complement is used because using this all the numbers in the range can be represented.

Addition/Subtraction on signed  $2^8$  complement data.

4-bits:-

A, B      A = +2 = 0010

B = +4 = 0100

A+B:- 0010

$$\begin{array}{r} 0 \quad 100 \\ + 0 \quad 100 \\ \hline 0 \quad 110 \end{array}$$

A = +2 = 0010

B = -4 = 1100

↓ binary

1010 = -2

A = -2 = 1110

B = +4 = 0100

$$\begin{array}{r} + \\ \hline 1 \quad 0010 \end{array} \rightarrow \text{omit carry}$$

= +2

$$A = -2 = 1110$$

$$-4 = \underline{1100}$$

$$\boxed{1} \underline{\quad 1010}$$

↓ binary

$$\begin{array}{r} 1101 \\ +1 \\ \hline 1110 \end{array} \rightarrow -6$$

$$A = +4 = 0100$$

$$B = +5 = \underline{0101}$$

$$\underline{1001} \rightarrow 1111 \rightarrow -7$$

1010

$$A = +4 = 0100$$

$$B = -5 = \underline{1011}$$

$$\begin{array}{r} 1111 \\ +1 \\ \hline 1000 \end{array} \rightarrow -1$$

$$A = -4 = 1100$$

$$B = +5 = \underline{0101}$$

$$A = -4 = 1100$$

$$B = -5 = \underline{1011}$$

$$\boxed{1} \underline{0111}$$

$\rightarrow +7 \rightarrow$  This is not in the range of 2's complement representation with 4 bits.

→ When both numbers are having the same sign and if we are adding them, then we are in a state of overflow.

Subtraction.

$$A = +2 = 0010$$

$$B = +4 = 0100$$

$$2's \text{ of } 4 = 1100$$

$$+2 \quad 0010$$

$$-4 \quad +1100$$

$$\underline{1110}$$

$$\xrightarrow{2's} 1010 (-2)$$

$$+6 \rightarrow 0100$$

$$-6 \rightarrow 01000$$

$$-8 \quad 10111$$

$$\begin{array}{r} +1 \\ \hline \boxed{1}000 \end{array}$$

$$+2 = 0010$$

$$-4 = 1100$$

$$2's \text{ of } -4 = 0100$$

$$+2 \rightarrow 0010$$

$$-4 \rightarrow 0100$$

$$\underline{0110} + 6$$

$$-2 \rightarrow 1110$$

$$+4 \rightarrow 0100$$

$$2's \text{ of } +4 \rightarrow 1100$$

$$-2 \rightarrow 1110$$

$$+4 \rightarrow 1100$$

$$\begin{array}{r} \boxed{1010} \\ \underline{1110} \end{array} \xrightarrow{2's} 1110$$

-b.

$$-2 \rightarrow 1110$$

$$-4 \rightarrow 1100$$

$$2's \text{ of } -4 = 0100$$

$$-2 \rightarrow 1110$$

$$-4 \rightarrow 0100$$

$$\begin{array}{r} \boxed{0010} \\ \underline{1110} \end{array} \rightarrow +2$$

→ To perform subtraction of two signed numbers, take the 2's complement of second numbers and add them.

$$-4 \rightarrow 1100$$

$$+5 \rightarrow 0101$$

$$2's \text{ of } +5 \rightarrow 1011$$

$$1100$$

$$1011$$

$$1010$$

$$\begin{array}{r} \boxed{0111} \\ \underline{1011} \end{array}$$

$$\rightarrow +7$$

→ When two numbers of different signs are subtracted, then there is a chance of overflow.

# Instruction storage in Memory :-



## Byte-Addressability :-

|     |     |     |     |     |
|-----|-----|-----|-----|-----|
| 3   | 2   | 1   | 0   | ← 0 |
| 7   | 6   | 5   | 4   | ← 4 |
| ⋮   |     |     |     |     |
| 255 | 254 | 253 | 252 | 255 |

Address 0 means all bytes at address 0.

Right to left → little Indian style

left to right → big Indian style

For 16-bit:-

|     |     |       |
|-----|-----|-------|
| 1   | 0   | ← 0   |
| 3   | 2   | ← 2   |
| ⋮   |     |       |
| 255 | 254 | ← 254 |

For 16-bit system.

→ Byte addressability is easier access.



i) Use 8 bits to perform addition of following  
sign -2's complement numbers.

(i) +63, -56 (all possible combinations)

$$+63 \rightarrow 111111 \rightarrow 00111111$$

$$\begin{array}{r} 11000000 \\ + 1 \\ \hline 11000001 \end{array}$$

$$-63 \rightarrow 11000001$$

$$+56 \rightarrow 00111000$$

$$00111000$$

$$-56 \rightarrow 11001000$$

$$11000111$$

$$+63 \rightarrow \begin{smallmatrix} 111 \\ 00111111 \end{smallmatrix}$$

$$\begin{array}{r} 00111000 \\ + 1 \\ \hline 11001000 \end{array}$$

$$+56 \rightarrow 00111000$$

$$\begin{array}{r} + \\ 01110111 \\ \hline 01110111 \end{array} \rightarrow +119$$

$$+63 \rightarrow 00111111$$

$$-56 \rightarrow 11001000$$

$$\begin{array}{r} + \\ \boxed{1} 00000111 \\ \hline \end{array} \rightarrow +7$$

$$-63 \rightarrow 11000001$$

$$\begin{array}{r} 00000110 \\ + 1 \\ \hline 00000111 \end{array}$$

$$+56 \rightarrow 00111000$$

$$\begin{array}{r} + \\ \boxed{1} 11111001 \\ \hline 10000111 \end{array} \xrightarrow{2's} \boxed{1} 00000111 \rightarrow -7$$

$$-63 \rightarrow 11000001$$

$$1110110$$

$$-56 \rightarrow 11001000$$

$$\begin{array}{r} + \\ \boxed{1} 10001001 \\ \hline \end{array} \xrightarrow{2's} 11110111 \quad \begin{array}{r} + \\ 1110111 \\ \hline \end{array} \quad \hookrightarrow -119$$

(31-12-2021)

Add loc1, loc2 → 12

load loc1, R0 → 8<sup>0</sup>

ISA:-

Memory operations:-

1) load → read locA, R0

2) store → write store R0, locA

second operand:- destination.

Instructions:-

→ Categorize the instructions based on opcode (4 types)

Opcode:-

1) Data transfer :- load, store, move

2) ALU based operations :- Add, subtract, multiply, divide, and, or, not

3) Program sequencing & control:- Branching, looping  
(If-else constructs, goto, for, while) Branching notequal  
jump (BNIE)

4) I/O based operations:- in operation, out operation

Register transfer Language:- (RTL) [Arrow notation]

1)  $R_i \leftarrow m[loc]$

2)  $R_i \leftarrow R_j + R_k$

Assembly language notation:- (Machine language)

1) add loc, R<sub>i</sub>

operation    operands

2) add R<sub>j</sub>, R<sub>k</sub>, R<sub>i</sub>

Instruction types based on number of operands:-

1) 3-address instructions → Add R<sub>i</sub>, R<sub>j</sub>, R<sub>k</sub>  $R_k \leftarrow R_i + R_j$

2) 2-address instructions → Add R<sub>i</sub>, R<sub>j</sub>  $R_j \leftarrow R_i + R_j$

3) 1-address instruction → Add R<sub>i</sub>  $AC \leftarrow AC + R_i$

4) 0-address instructions → push R<sub>i</sub>, push R<sub>j</sub>  
→ stack based operations Add



→ In 3-address instruction, for the other operand there will be default operand called accumulator.

$$Y = (A \times B) + (C \times D)$$

3-Address

multiply loc A, loc B, R<sub>1</sub>    R<sub>1</sub> ← A × B

|   |   |   |   |   |      |
|---|---|---|---|---|------|
| 0 | 1 | a | 1 | 0 | char |
|   |   |   |   |   |      |

multiply loc C, loc D, R<sub>2</sub>    R<sub>2</sub> ← C × D    Num char

add              R<sub>1</sub>, R<sub>2</sub>, loc Y    loc Y ← R<sub>1</sub> + R<sub>2</sub>

(multiplying)

2-Address

load loc A, R<sub>1</sub>    R<sub>1</sub> ← [loc A]

load loc C, R<sub>2</sub>    R<sub>2</sub> ← [loc C]

multiply loc B, R<sub>1</sub>    R<sub>1</sub> ← R<sub>1</sub> \* [loc B]

multiply loc D, R<sub>2</sub>    R<sub>2</sub> ← R<sub>2</sub> \* [loc D]

Add              R<sub>1</sub>, R<sub>2</sub>    R<sub>2</sub> ← R<sub>1</sub> + R<sub>2</sub>

store R<sub>2</sub>, loc Y    [loc Y] ← R<sub>2</sub> + [loc Y]

load loc A    ac ← [loc A]

store R<sub>1</sub>    R<sub>1</sub> ← ac

load loc B    ac ← [loc B]

Multiply R<sub>1</sub>    ac ← R<sub>1</sub> × ac

load loc C

store R<sub>2</sub>

load loc D

Multiply R<sub>2</sub>    R<sub>2</sub> ← R<sub>2</sub> × ac

store R<sub>2</sub>    ac ← R<sub>2</sub>

Add              R<sub>1</sub>    R<sub>1</sub> ← R<sub>1</sub> + ac

store R<sub>1</sub>    ac ← R<sub>1</sub>

store loc Y

|               |                                      |
|---------------|--------------------------------------|
| load locA     | $ac \leftarrow [locA]$               |
| store R1      | $R1 \leftarrow ac$                   |
| load locB     | $ac \leftarrow [locB]$               |
| Multiply R1   | $ac \leftarrow R1 * ac$              |
| store R1      | $R1 \leftarrow ac$                   |
| load locC     | $ac \leftarrow [locC]$               |
| store R2      | $R2 \leftarrow ac$                   |
| load locD     | $ac \leftarrow [locD]$               |
| Multiply R2   | $ac \leftarrow R2 * ac$              |
| #store R2     | $R2 \leftarrow ac$                   |
| Add R1        | $ac \leftarrow ac + R1$              |
| store locY    | $[locY] \leftarrow ac$               |
| load locA     | $ac \leftarrow m[locA]$              |
| Multiply locB | $ac \leftarrow ac * m[locB]$         |
| store R1      | $R1 \leftarrow ac$ $R1 = A \times B$ |
| load locC     | $ac \leftarrow m[locC]$              |
| Multiply locD | $ac \leftarrow ac * m[locD]$         |
| Add R1        | $ac \leftarrow ac + R1$              |
| Store locY    | $[locY] \leftarrow ac$               |

push locA  $\rightarrow$  Address

push locB

Multiply

push locC

push locD

Multiply

Add

pop locY



03-01-2022

## Program Sequencing:

Default :- 2-address, only one memory location.

Add A, B store to C       $C \leftarrow A + B$

Load locA, R1

↳ only one mem loc.

Add. locB, R,

1 word = 32-bit

store  $R_1$ , loc C

|     |       |
|-----|-------|
| i   | load  |
| i+4 | Add   |
| i+8 | store |
| :   |       |
| A   |       |
| B   |       |
| C   |       |

Assume instruction length = 4

→ Regs required:- PC, MAR, MDR, IR

→ Based on value of PC, we say whether it is sequential or branching.

## List of n numbers addition

|        |       |                   |
|--------|-------|-------------------|
| i      | Load  | $N_1, R_1$        |
| i+4    | Add   | $N_2, R_1$        |
|        | :     |                   |
|        | :     |                   |
| i+4(n) | Add   | $N_n, R_1$        |
| i+8n   | Store | $R_1, \text{SUM}$ |

```
for(i=1, i < n, i++)
    sum = sum + n[i]
```

|     |                                                                    |
|-----|--------------------------------------------------------------------|
|     | load N, R <sub>1</sub>                                             |
| i+4 | clear R <sub>2</sub>                                               |
| i+8 | loop → get the <sup>n</sup> th number<br>and add to R <sub>2</sub> |
|     | DEC R <sub>1</sub>                                                 |
|     | if R <sub>1</sub> >0 goto loop → Branching instruction             |
|     | store R <sub>2</sub> , sum                                         |

→ Branching alters the PC of the instruction.

PC ← i

i+4

i+8

i+12

i+16

(i+20) → i+8

i+12

i+16

(i+20) → i+8

1) Conditional Branch → check for the condition

2) Unconditional Branch - jump, goto

Conditional Codes:-

1) Negative - N

2) zero - Z

3) Overflow - O

4) Carry - C

→ These are like flags, if satisfies the condition it will be set to 1.

N-flag - 1 :-

Add R<sub>i</sub>, R<sub>j</sub> if R<sub>j</sub> < 0, N = 1

BN <address>

→ BN checks for the value of previous instr destination.

Branching Instruction

|        |                          |
|--------|--------------------------|
| opcode | address of instr to jump |
|--------|--------------------------|

BN → Branch if negative  
→ one address instr.

Z-flag - 4 :-

clear R<sub>0</sub>

BZ loc Y

DEC R<sub>1</sub>

BNZ loc loop

BN

BNN

BZ

BNZ

BO

BND

BC

BNC

0-flag - 1 :-

Add R<sub>1</sub>, R<sub>2</sub>

BNO loc Y

C-flag - 1 :-

Add R<sub>1</sub>, R<sub>2</sub>

BNC loc Y

(Addressing modes)

R<sub>1</sub> ← N

R<sub>2</sub> ← 0

R<sub>3</sub> ← get & Fetch

R<sub>2</sub> ← R<sub>2</sub> + R<sub>3</sub>

R<sub>1</sub> ← R<sub>1</sub> + 1

sum ← R<sub>2</sub>

06-01-2022

## Addressing Modes:-

- 1) Immediate Addressing mode.
- 2) Register Addressing mode.
- 3) Direct Addressing Mode
- 4) Indirect → Memory Addressing Mode  
    └→ Register Addressing Mode.
- 5) Index Addressing Mode
- 6) Index with base
- 7) Index with base & offset
- 8) Relative / PC
- 9) Auto - Increment
- 10) Auto decrement

→ Specifies where exactly operands are presents.

- 1) Variables, constants

H.L.L :- int x;

Immediate :-   

|        |      |
|--------|------|
| opcode | data |
|--------|------|

Move #2, R1

# → indicates that it is data but not address.

Register :-



→ Register has data directly.

Direct/Absolute :-



→ Data lies in memory.

load loc A, R1

Indirect :-

WRT Memory :-

→ Like a double pointer, has one more level of indirection.



Ex:- load (loc), R<sub>1</sub>

( ) → indicate the data is not present in that address but another address.

→ Effective Address is the address from where the data is obtained.

WRT Register :-



→ Single Memory access.

Ex:- load (R<sub>1</sub>), R<sub>2</sub>

MLL

load B, R<sub>1</sub>

load (R<sub>1</sub>), locA

$$A = *B$$

load  $N_1, R_1 \rightarrow$  clear  $R_2$ , load size,  $R_3$

Add  $(R_1), R_2$

INCR  $R_1$  by 4

if  $(R_3) > 0$  goto loop

store  $R_2$ , sum

=  
load size,  $R_3$

clear  $R_2$

load  $N_1, R_1$

Add  $(R_1), R_2$

INCR  $R_1$  by 4

if  $(R_3) > 0$  goto loop  $\xrightarrow{\text{DECR } (R_3)}$

store  $R_2$ , sum

=  
load size,  $R_3$

clear  $R_2$

load  $\#N_1, R_1$  : loop  $\rightarrow N_1 \rightarrow$  data at  $N_1$

Add  $(R_1), R_2$   $\#N_1 \rightarrow$  address of  $N_1$

Sub1  $R_1$

Add  $\#4, R_1 \rightarrow \#4$  exact value 4

DEC  $R_3$

BNZ loop

Store  $R_2$ , sum



Move 20(R1), R2

↓  
offset → Index Register.

Move #i, R0

Move 4(R0), R1

Move 8(R0), R2

Move 12(R0), R3

INC #16, R0

Move #i, R0

clear R5

Move #3, R5

Move 4(R0), R1

Move N, R0

clear R1

clear R2

clear R3

load #i, R4

Add 4(R4), R1 : loop

Add 8(R4), R2

Add 12(R4), R3



DEC R<sub>0</sub>  
Add #16, R<sub>4</sub>

BNZ loop

store R<sub>1</sub>, sum1

store R<sub>2</sub>, sum2

store R<sub>3</sub>, sum3

07-01-2022 Index with base.

MOVE (R<sub>i</sub>, R<sub>j</sub>), R<sub>k</sub>

opcode | BR, IR

BR + IR → Effective Address

Move (R<sub>1</sub>, R<sub>2</sub>), R<sub>3</sub>

Index with base and offset

Move 10(R<sub>i</sub>, R<sub>j</sub>), R<sub>k</sub> EA = R<sub>i</sub> + R<sub>j</sub> + 10

opcode | off(IR, BR)

→ If offset is huge, then pass it into another and use that.

PC related:- (Relative)



- This is generally used for instruction fetching.  
→ Used with branching instruction.



|                                       |      |
|---------------------------------------|------|
| ADD (R <sub>1</sub> ), R <sub>2</sub> | 1000 |
| INC #4, R <sub>1</sub>                | 1004 |
| DEC R <sub>3</sub>                    | 1008 |
| BNZ -16                               | 1012 |
| STORE R <sub>2</sub> , sum            | 1016 |

PC      PC ← 1016 - 16

→ Even if the program is replaced to another memory location, then also it performs correctly.

### Auto Increment Addressing Mode:-

→ used with loops

→ like post increment



After fetching the data from address, automatically the address stored in register is incremented.

Ex:- MOVE #1000, R<sub>1</sub>  
ADD (R<sub>1</sub>)<sub>+</sub>, R<sub>2</sub>

$$R_1 = 1004$$

$$R_2 \leftarrow R_2 + R_1$$

Auto Decrement

→ Pre Decrement

→ First decrement and then execute.



Move N, R<sub>0</sub> 0

clear R<sub>1</sub> 4

clear R<sub>2</sub> 8

clear R<sub>3</sub> 12

load (#i), R<sub>4</sub> 16

Add (R<sub>4</sub>)+, R<sub>1</sub> 20

Add (R<sub>4</sub>)+, R<sub>2</sub> 24

Add (R<sub>4</sub>)+, R<sub>3</sub> 28

DEC R<sub>0</sub> 32

BINL -12

store R<sub>1</sub>, loc A

store R<sub>2</sub>, loc B

store R<sub>3</sub>, loc C

Regs - R<sub>1</sub> and R<sub>2</sub> of comp contain the dec values 1200, 4600. What is the effective Add of memory operand?

1.) Load 20(R<sub>1</sub>), R<sub>5</sub> EA = R<sub>1</sub> + 20 = 1220

2.) Move #3000, R<sub>5</sub> EA = 3000 = R<sub>5</sub> = 3000

3.) Store R<sub>5</sub>, 30(R<sub>1</sub>, R<sub>2</sub>) EA = (R<sub>1</sub> + R<sub>2</sub> + 30) = 5830

4.) ADD -(R<sub>2</sub>), R<sub>5</sub> EA = R<sub>2</sub> - 4 = 4596.

5.) SUB -(R<sub>1</sub>) + , R<sub>5</sub> EA = R<sub>1</sub> = 1200

~~① EA~~

The list of student marks is changed to contain T-test scores for each student. Assume that there are N students. Write Assembly L code for computing the sums of the scores and store these sums at addresses sum, sum+4, sum+8, ...

The No: of tests T is larger than the No: of reg's.

Use two Nested loops. The inner loop → sum for one test and outer loop for No: of tests T. Assume that T is stored in Memory location T placed ahead of N.



load loc J , R<sub>0</sub>  
 load loc N , R<sub>1</sub>  
 load #i , R<sub>2</sub>  
 Add (#A, /R<sub>2</sub>) Add #4 , R<sub>2</sub>  
 Move (R<sub>2</sub>)+, R<sub>3</sub>  
 clear R<sub>4</sub>  
 Add 4J(R<sub>3</sub>)

---

|    |                                                     |                                                     |
|----|-----------------------------------------------------|-----------------------------------------------------|
| 16 | load loc J , R <sub>0</sub>                         | R <sub>0</sub> ← [J] i                              |
| 17 | Mv(load) #J , R <sub>1</sub>                        | R <sub>1</sub> ← J                                  |
| 18 | Add #1 , R <sub>1</sub> , R <sub>2</sub>            | R <sub>2</sub> ← R <sub>1</sub> +1                  |
| 19 | Mul #4 , R <sub>2</sub>                             | R <sub>2</sub> ← R <sub>2</sub> *4                  |
| 20 | Mv(load) #0 , R <sub>3</sub>   clear R <sub>3</sub> | R <sub>3</sub> ← 0                                  |
| 21 | load loc N , R <sub>4</sub>                         | R <sub>4</sub> ← [N] n                              |
| 22 | (Mpvcl 12(R <sub>1</sub> )/R <sub>5</sub> )         | R <sub>5</sub> ← R <sub>1</sub> +12                 |
| 23 | Add #12 , R <sub>1</sub> , R <sub>5</sub>           |                                                     |
| 24 | Add R <sub>3</sub> , R <sub>5</sub>                 | R <sub>5</sub> ← R <sub>3</sub> + R <sub>5</sub>    |
| 25 | clear R <sub>6</sub>                                | R <sub>6</sub> ← 0                                  |
| 26 | Add (R <sub>5</sub> ) , R <sub>6</sub>              | R <sub>6</sub> ← R <sub>6</sub> + [R <sub>5</sub> ] |
| 27 | Add R <sub>2</sub> , R <sub>5</sub>                 | R <sub>5</sub> ← R <sub>5</sub> + R <sub>2</sub>    |
| 28 | DEC R <sub>4</sub>                                  | R <sub>4</sub> ← R <sub>4</sub> - 1                 |
| 29 | BNZ -12                                             | loop to 40                                          |
| 30 | store R <sub>6</sub> , (sum , R <sub>5</sub> )      | [sum+R <sub>5</sub> ] ← R <sub>6</sub>              |
| 31 | Add # 4 , R <sub>3</sub>                            | R <sub>3</sub> ← R <sub>3</sub> + 4                 |
| 32 | DEC R <sub>0</sub>                                  | R <sub>0</sub> ← R <sub>0</sub> - 1                 |
| 33 | BNZ -48                                             | loop to 20                                          |

|    |                              |                          |                           |
|----|------------------------------|--------------------------|---------------------------|
| 0  | load locT , R0               | DIR                      | R0 $\leftarrow [T]$       |
| 4  | move #T , R1                 | JMM                      | R1 $\leftarrow T$         |
| 8  | Add #1 , R1 , R2             | JMM                      | R2 $\leftarrow R1 + 1$    |
| 12 | Mul #4, R2                   | JMM, Reg                 | R2 $\leftarrow R2 * 4$    |
| 16 | clear R3                     | Reg                      | R3 $\leftarrow 0$         |
| 20 | load (R1, A1/A2), 4(R1) , R4 | Index                    | R4 $\leftarrow [R1+4]$    |
| 24 | Add #12, R1 , R5             | JMM                      | R5 $\leftarrow R1 + 12$   |
| 28 | Add R3 , R5                  | Reg                      | R5 $\leftarrow R5 + R3$   |
| 32 | clear R6                     | Reg                      | R6 $\leftarrow b$         |
| 36 | Add (R5) , R6                | Reg Indirat              | R6 $\leftarrow R6 + [R5]$ |
| 40 | Add R2 , R5                  | Reg                      | R5 $\leftarrow R5 + R2$   |
| 44 | DEC R4                       | Reg                      | R4 $\leftarrow R4 - 1$    |
| 48 | BNZ -16                      | PC                       | loop to 36                |
| 52 | Store R6 , (sum, R3)         | [sum+R3] $\leftarrow R6$ |                           |
| 56 | Add #4, R3                   | Index with base<br>JMM   | R3 $\leftarrow R3 + 4$    |
| 60 | DEC R0                       | Reg                      | R0 $\leftarrow R0 - 1$    |
| 64 | BNZ -48                      | PC                       | loop to 20                |

Last instruction  
Branch, then  
value of PC

## Assembly language:-

(17-01-2023)

- Source language is used for source program
- Assembly language consists of some mnemonics like opcode, operands
- Object programming → code in 0's & 1's



syntax rules for A-L:-

- opcode → capital
- Right operand → destination.
- # → used for immediate data
- opcodes have some fixed notations in binary form

if 15 codes are there, we can use 4 bit code

opcode  $\square$  operands ( $[M]OP1 [M]OP2$ )  
↓  
bit codes      mode bits  
to give Addressing mode.

→ ADDI → Add immediate data to destination.

→ EQU, ORIGIN, RESERVE → directives.

→ The above words are assembly directives.

ORIGIN → Next storage of data. (Next address)

Reserve → store / require that much storage.

Return → program completed  
start, end

# Assembly language Instruction:

Label + operations + operands

→ We can have comments too.

Symbol table:

→ used by assembler for storing values of the labels in source program.

→ 2-pass assembler

1st pass go through names and store its addresses.

→ Loader



→ Loader loads the M.L program from hard disk to Main Memory.

→ Loader should know the length of program and starting address of program.

→ Debugger will help to recover the errors.

→ #10 → decimal Immediate

#%10 → binary Imm

#\$10 → Hexadecimal.

(21-01-2022)

Input/output:-

→ speed of I/O devices is small when compared to processor.

→ so, we use some interfaces to match the speed of these I/O with processor.

→ Modes of I/O :-

1) Polling

2) Interrupt

3) DMA

→ There will be some bits like sin, sout.

sin  $\Rightarrow$  indicates that processor should wait for the data transfer from device to device buffer.

sout  $\Rightarrow$  processor can access data in buffer.

sout  $\Rightarrow$  data transfer from buffer to display.

sout  $\Rightarrow$  processor can't transfer, monitor display

readwait Branch to readwait if sin = 0

Input from datain to R<sub>1</sub>

writewait Branch to write wait sout = 0

(if R<sub>1</sub> ≠ null) Send data<sub>1</sub> to dataout.  
Output from R<sub>1</sub>

Data Transfer Hw Peripheral devices & Processor Regs.

1) Memory Mapped I/O :- MOVEBYTE DATAIN R<sub>1</sub>

2) Isolated I/O

①  $\Rightarrow$  Buffer Regs are taken as Memory addresses.

INSTATUS → registers (u bits)  
↳ 3<sup>rd</sup> bit (sin bit)

WW Testbit #3 OUTSTATUS

Branch = 0 WW

MOVEBYTE R1, DATAOUT

RW Testbit #3 INSTATUS

Branch = 0 RW

MOVEBYTE DATAIN, R1

RW Testbit #3 INSTATUS

Branch = 0 RW

MOVEBYTE DATAIN, R1

CMP #CR, R1

BZ EXIT

STORE R1 (locA, R3) → Add #L, R3

WW Testbit #3 OUTSTATUS

Branch = 0 WW

MOVEBYTE R1, DATAOUT

BZ RW

EXIT

25-01-2022

Stacks:-

Subroutine:-

1) function calls

2) Loopings

→ Stacks store from higher addr to lower addr.  
→ Stack pointer (SP) points to the top of the stack.

→ So, to insert into stack

1) update SP

2) push new item to updated S.P.

SUB #4, SP } → PUSH  
MOVE R0, (SP) } → MOVE R0, -(SP)

MOVE (SP), R0 } → POP  
Add #4, SP } → MOVE (SP)+, R0

SUB #4, SP  
CMP #1500, SP  
~~BEN~~ EXIT  
MOVE R0, -(SP)  
EXIT

CMP #1500, SP } SAFEPUSH  
~~BEN~~ EXIT } BZ/BN EXIT  
MOVE R0, -(SP)

CMP #2000, SP } SAFEPOP  
BNN EXIT }  
MOVE (SP)+, R0

Queues:-

Subroutine:-

call:- store PC to link register.  
Branch to the sub routine by loading  
address of subroutine to PC.

$LR \leftarrow PC$

$PC \leftarrow$  Address of starting instr of  
sub routine.

Return:- Load PC with Link reg.

In case of sub-routines nesting; the link  
register may not be helpful.

Because the (PN) LR will get updated  
to latest SR PC; which leads to  
loss of (y/n) previous SR PC.

In case of sub-routines nesting; the link register may not be helpful.

Because the (PC) LR will get updated to latest SR PC; which leads to loss of (XNIB) previous SR PC.

(28-01-2022)

→ To pass parameters to sub routine from main method, we use registers.

call ListAdd

Here listadd is subroutine and we use this (reg11) instruction for calling subroutine

→ If there is a need to transfer more data between main and subroutine, then reg set may not be sufficient.

## Logical operations

01-02-2022

NOT, AND, OR

NOT R<sub>1</sub>

AND R<sub>1</sub>, R<sub>2</sub>

AND #\$FFFF 0000 , R<sub>1</sub>

The above instruction converts the last 16 bits of R<sub>1</sub> to zeroes.

32-bit ASCII code → R<sub>0</sub>

check if most significant byte is Z

Z ← 0101 1010 B

AND #\$FB00 0000 , R<sub>1</sub>

CMP #\$FA00 0000 , R<sub>1</sub>

OR R<sub>1</sub>, R<sub>2</sub>

Shift Operations:-

- 1) Logical (Left / Right)
- 2) Arithmetic (Left / Right)
- 3) Rotate (Left Rotate carry / Left Rotate)

Logical shift :-



Here we lose MSB.

LSHIFTL count, R<sub>j</sub>

LSHIFTR count, R<sub>j</sub>

Logical left shift :-

Arithmetic Right shift:-

→ Sign of the number will not change

10110001  
↓ 10110 → 0000

11011000

Rotate operation:-

Rotate Right:-

10110001

11011000

LSB → MSB



PACKED = 10010101

MOVEBYTE loc, R0      R0 = 0000 0101

MOVEBYTE loc+1, R1      R1 = 0000 1001

LSHIFTL #4, R1

ADD R0, R1, loc

LOAD loc, R0

LOAD loc+1, R1

AND #\$0000 000F, R0

AND #\$0000 000F, R1

LSHIFTL #4, R1

ADD, R1, R0, PACKED

- 1) Vector dot product
- 2) Sorting
- 3) Linked List

02-02-2022

### Encoding of Instructions.

Add R<sub>1</sub>, R<sub>2</sub>

7 4 4 4 4

|        |   |     |   |     |
|--------|---|-----|---|-----|
| Opcode | M | OP1 | M | OP1 |
|--------|---|-----|---|-----|

No: of opcodes supported

No: of Regs present

This information is required:

To store this information, we require

23 bits

Add 5(R<sub>1</sub>), R<sub>2</sub>

7 27

|        |   |             |   |                |   |                |
|--------|---|-------------|---|----------------|---|----------------|
| Opcode | M | Imm<br>data | M | R <sub>1</sub> | M | R <sub>2</sub> |
|--------|---|-------------|---|----------------|---|----------------|

7 4 [5] 4 4 4 4

Add Loc, R<sub>1</sub>

|        |   |         |   |                |
|--------|---|---------|---|----------------|
| Opcode | M | Address | M | R <sub>1</sub> |
|--------|---|---------|---|----------------|

7 4          4 4

↓  
13

But 13 bits are not enough to represent the loc address.

→ So, we use 2 words to store this instr.

→ This is called as CISC architecture.  
→ Instruction can take More than 1 word.

RISC  $\rightarrow$  1 WORD (only 1 word instruction)

0 MOV EAX, N  
4 clear EBX  
8 MOV ECX, NH  
12 Add EBX, [ECX]  
16 Add ECX, 4D

20 DEC EAX

24 JNZ -16

28 MOV SUM, EBX

04-02-2022

LEA EBX, NUM1

SUB EBX, 4

MOV ECX, N

MOV EAX, 0

ADD EAX, [EBX + ECX \* 4]  $\leftarrow$  STARTADD

LOOP STARTADD  $\begin{array}{l} \text{- Dec ECX and branch} \\ \text{back if } [ECX] > 0 \end{array}$

MOV SUM, EAX

LOOP instr is supported in IA-32 format

| OPCODE       | Add mode | Displace | Imm    |
|--------------|----------|----------|--------|
| 1 or 2 bytes | 1 or 2   | 1 or 4   | 1 or 4 |

$\rightarrow$  Min Instr. length is 1 byte.

consider the instruction

INC EDI

          

1 byte

so, for mentioning the register number, 3-bits  
are enough. (There are only 8 regs in IA-32)  
5-bits for identifying opcode.

MOV EAX , 1000 - 5 bytes

1 byte  $\rightarrow$  opcode + Reg

JMM data  $\rightarrow$  4 bytes

MOV ECX , LOC

$\downarrow$        $\downarrow$        $\downarrow$   
1 byte 1 byte 32 bits  
4 bytes

32+3  
+  
4

1000, 1003, 1004, 1005

Total :- 6 bytes.

ADD EAX, [EBX + EDI \* 4]      3 + 1 + 1 + 2  
 $\downarrow$                        $\downarrow$   
1 bytes      2 bytes

MOV DWORD PTR [EBP + ESI \* 4 + DISP], 10  
 $\downarrow$                        $\downarrow$   
1      2 + 4      4

Scaling factor can be of only 1, 2, 4 (or) 8

DD  $\rightarrow$  double word

Flags:- ZF, CF, SF, OF (zero, carry, sign, overflow)

(09-02-2022)

### Interrupt Nesting:-

- Use priorities to handle multiple interrupts.
- To change the priority, switch from user to privilege mode.
- Use multi-level priorities to handle.
- Use multiple bus lines to handle interrupts properly.
- To handle simultaneous interrupt, we use daisy chain.
- Only IA bus line and it passes through all devices.
- So, the device which electrically nearer to processor will be served first.

### Controlling device requests:-

- If we want any i/o device not give any interrupt, there will be some bits like enable bits which when set to 0, then that will n't be able to give interrupt.

Consider a processor that uses vectored Interrupt where starting address of is stored at INTVEC. Interrupts are enabled by setting to 1. and

which we assume it is bit 9. A keyboard and

display unit connected to the processor have the status control and data reg as we discussed. Assume that at some point in a program called main, we read input line from key board and store the line in successive bytes in the mem starting at loc 9.

- List out steps for initializing interrupt process.
- Steps by ISR.

- 1) store pc and regs to stack.
  - 2) Update PC to INTVEC
  - 3) Load the data/instr to Mem.
- 
- 1) Load the address in INTVEC to PC.
  - 2) Enable keyboard interrupt by setting bit 2 in reg CONTROL to 1.
  - 3) Set IE bit to 1 in processor status register.

(02-2022)

Exceptions:-

- Interrupts are a form of exceptions.
- The current instruction which is causing exception will not be completely executed.

Debugging:-

- Finding out errors.
- 1.) Trace :- Every instruction is going to be executed and going to cause exception to check for errors.
- 2.) Break points

↓  
Stop execution after every regular interval of instructions.

07/03/2022

## Cache Memory :-

- used to store the frequently accessed data.
- Fetching data from cache reduces the overhead.
- smaller than M.M but faster.
- If data is present in cache, it is called hit.
- If not miss.
- If data is missed, then a copy of the data is stored in cache memory.
- To map addresses from MM to cache:-

3 - tech

- (i) Direct Map
- (ii) Associative Map
- (iii) set-associative Mapping



Tag =  $\frac{\text{M.M blocks}}{\text{Cache M block}}$  gives the No: of

bits in tag field.

- In cache memory, there is a tag field for every block in cache field.

- ① In direct mapping, using the tag field, we directly store block there.

Check videos for Memory Mapping

25-03-2022

## Computer Arithmetic:-

Consider two integers, A & B in  $2^5 - CO$

$$A = (00010001)_2 \quad B = (00010110)_2$$

Show A + B

$$A = 00010001$$

$$B = 00010110$$

$$R = A+B = \underline{00100111}$$

$$A = (00011000)_2 \quad B = (1101111)_2$$

$$A = 00011000$$

$$B = 1101111$$

$$\begin{array}{r} \boxed{A} \\ \underline{-} \quad B \\ 00000111 \\ \downarrow \\ \text{omit} \\ \text{carry} \end{array}$$

## Subtraction

$$A = (00011000)_2 \quad B = (1101111)_2$$

$$2's \ of \ B = (00010001)_2$$

$$A = 00011000$$

$$B = 00010001$$

$$\begin{array}{r} \underline{00100001} \\ \downarrow \end{array}$$

$$A = (1101101)_2 \quad B = (00010100)_2$$

$$2's \text{ of } B := (1101100)_2$$

$$A = \begin{array}{r} 111 \\ 110 \quad 11101 \end{array}$$

$$B = \begin{array}{r} 11101100 \\ \hline \end{array}$$

$$\begin{array}{r} \boxed{1} & \begin{array}{r} 11001001 \\ \hline \end{array} \\ \downarrow \text{omit carry} & \end{array}$$

Overflow:-

→ When the signs of the addends are different, there is no chance of overflow.

Multiplication:-

Unsigned binary integers

$M$  - Multiplicand       $A \leftarrow$  temp reg to hold sum.

$B$  - Multiplier

$n$  - counter = no. of bits in  $B$

Ex:-  
 ~~$A = 1011$~~   
 $M = 1011, B = 1101$

$$\begin{array}{r} 0110 \\ 1011 \\ \hline \boxed{1} \quad \begin{array}{r} 0001 \\ \hline 1101 \end{array} \end{array}$$

If right most bit of  $B$  is 0, shift right the reg  $C, A + B$

①  $C=0$   
 $A = 1011 \quad B = 1101 \quad M = 1011$

Shift R  $\cdot A = 0101 \quad B = 1100$

②  $B_0=0$    Shift R    $C=0 \quad A = 0010 \quad B = 1111$

③  $B_0=1$  Add  $\begin{array}{r} A = 1101 \\ \text{shift R} \quad C=0 \\ \hline A = 0110 \end{array} \quad B = 1111 \quad C=D$

④  $C = 1$      $A = 0001$      $B = 111$

shift R C=0 A = 1000 Q = 111

Product = A B

= 1000 1111

$$M = 1101 \quad Q = 1111 \quad A = 0000$$

$$\textcircled{1} \quad C = D \quad A = 1101 \quad B = 1111$$

shiftR C=0 A=0110 Q=1111

$$\textcircled{2} \quad C = -1 \quad A = 0011 \quad B = 1111$$

shiftR c=0 A = 1001 .0 = 1111

$$\textcircled{3} \quad C = 1 \quad A = 0110 \quad Q = 1111$$

shiftR C = 0 A = 1011 B = 0111

$$\textcircled{4} \quad C = 1 \quad A = 1000 \quad Q = .0111$$

shift R C=0, A = 1100 B = 0011

$$P = 1100\ 0011$$

$$128+64+9+1$$

195

1101  
1111

1101

11.09

110.1

1101

1100011

1101  
1101



→ \* In Arithmetic shift Right, the sign bit is conserved.

|   |       |      |                 |      |                              |
|---|-------|------|-----------------|------|------------------------------|
|   | A     | Q    | Q <sub>-1</sub> | M    |                              |
| ① | 0000  | 0011 | 0               | 0111 | $Q_0 Q_{-1} = 10, A = A - M$ |
|   | A - M | 1001 | 0011            | 0    | 0111                         |
|   | ASR   | 1100 | 1001            | 1    | 0111                         |
| ② | ASR   | 1110 | 0100            | 1    | 0111 $Q_0 Q_{-1} = 11$       |
| ③ | ASR   | 1110 | 0100            | 1    | 0111 $Q_0 Q_{-1} = 01$       |
|   | ASR   | 0010 | 1010            | 0    | 0111                         |
| ④ | ASR   | 0001 | 0101            | 0    | 0111 $Q_0 Q_{-1} = 00$       |

$$\begin{array}{l}
 \textcircled{1} \quad 0111 = M \\
 \textcircled{2} \quad M = 0111 \\
 \textcircled{3} \quad M = 1001 \\
 \textcircled{4} \quad M = 1001 \\
 Q = 1101 \\
 P = 00010101 \\
 \textcircled{5} \quad Q = 1101 \\
 P = 00010101
 \end{array}$$

Floating point Numbers:-



→ IEEE 754 standard is used for representation of floating point number.



①  $M = 0111$ ,  $B = 0011$ ,  $A = 0000$

|      |      |          |      |
|------|------|----------|------|
| A    | B    | $B_{-1}$ | M    |
| 0000 | 0011 | 0        | 0111 |

- ① A-S-R    0000    0001    1    0111     $\square_0 \square_{-1} = 11$
- ② A-S-R    0000    0000    1    0111     $\square_0 \square_{-1} = 11$
- ③ A+M    0111    0000    1    0111     $\square_0 \square_{-1} = 01$   
 A-S-R    0011    1000    0    0111     $A = A + M$
- ④ ASR    0001    1100    0    0111     $\square_0 \square_{-1} = 00$
- 

|      |      |          |      |
|------|------|----------|------|
| A    | B    | $B_{-1}$ | M    |
| 0000 | 0011 | 0        | 0111 |

$$\begin{array}{r} 0000 \\ 1001 \\ \hline 1001 \end{array}$$

- ① A-M
- ②  $M = 0111$ ,  $B = 1101$
- |      |      |          |      |
|------|------|----------|------|
| A    | B    | $B_{-1}$ | M    |
| 0000 | 1101 | 0        | 0111 |
- $$\begin{array}{r} 0001 \\ 1001 \\ \hline 1010 \\ 0111 \\ \hline 0011 \end{array}$$
- ① A-M    1001    1101    0    0111     $\square_0 \square_{-1} = 10$
- ASR    1100    1110    1    0111
- ② AFM    0011    1110    1    0111     $\square_0 \square_{-1} = 01$
- ASR    0001    1111    0    0111
- ③ A-M    1010    1111    0    0111     $\square_0 \square_{-1} = 10$
- ASR    1101    0111    1    0111
- ④ ASR    1110    1011    1    0111

$$\therefore QM = 11101011$$

③

$$M = 1001 \quad Q = 0011$$

|       | A    | Q    | Q-1 | M    |
|-------|------|------|-----|------|
|       | 0000 | 0011 | 0   | 1001 |
| ① A-M | 0111 | 0011 | 0   | 1001 |
| ASR   | 0011 | 1001 | 1   | 1001 |
| ② ASR | 0001 | 1100 | 1   | 1001 |
| ③ A+M | 1010 | 1100 | 1   | 1001 |
| ASR   | 1101 | 0110 | 0   | 1001 |
| ④ ASR | 1110 | 1011 | 0   | 1001 |

$$QM = 11101011$$

④

$$M = 1001 \quad Q = 1101$$

|       | A    | Q    | Q-1 | M    |
|-------|------|------|-----|------|
|       | 0000 | 1101 | 0   | 1001 |
| ① A-M | 0111 | 1101 | 0   | 1001 |
| ASR   | 0011 | 1110 | 1   | 1001 |
| ② A+M | 1100 | 1110 | 1   | 1001 |
| ASR   | 1110 | 0111 | 0   | 1001 |
| ③ A-M | 0101 | 0111 | 0   | 1001 |
| ASR   | 0010 | 1011 | 1   | 1001 |
| ④ ASR | 0001 | 0101 | 1   | 1001 |

$$QM = 00010101$$

## Pipelining:-

(30-03-2022)

- It means parallel execution.
- When we execute prgs using pipelining, the no: of clock cycles taken to completely execute them is called pipelining. less
- decompose sequential process into sub-operations where each sub-process executes on separate segment so that concurrent execution is possible.
- This pipelining is implemented in processor.

### → Instruction

- 1) Fetch IF
- 2) Decode ID
- 3) Execute IE
- 4) Memory Access MA
- 5) Write back WB

5-stage pipeline

- 4-stages
- 1) Fetch
  - 2) Decode
  - 3) Execute
  - 4) Write Back

→ Instr can be divided into above seg.

→ These segments are called stages of pipeline.

→ Every stage has a separate dedicated reg.

Non-pipeline ← sequential execution of instr.

Ex:- Consider an assembly program with 5 instr which are to be executed on a 3 segment processor. Compare No: of cycles req using both pipelined and non-pipeline instr.

- Assume every instr is taking 3 clock cycles
- So, without pipelining, the program takes 15 clock cycles.

Pipelining:-

- With pipelining it takes lesser time.
- It is found out using phase-time diagram.

Phase-time diagram:-

| clock cycle | $s_1$ | $s_2$ | $s_3$                 | Segments |
|-------------|-------|-------|-----------------------|----------|
| 1.          | $I_1$ | -     | -                     |          |
| 2.          | $I_2$ | $I_1$ | $I_2$                 |          |
| 3.          | $I_3$ | $I_2$ | $I_1 \rightarrow I_1$ |          |
| 4.          | $I_4$ | $I_3$ | $I_2 \rightarrow I_2$ |          |
| 5.          | $I_5$ | $I_4$ | $I_3 \rightarrow I_3$ |          |
| 6.          | -     | $I_5$ | $I_4 \rightarrow I_4$ |          |
| 7.          | -     | -     | $I_5 \rightarrow I_5$ |          |

for execution with pipelining, it takes less number of cycles.

Execution time & Speed up:-

- A pipeline with k-stages
- No: of instr executed = n
- Non-pipelined =  $n \times k$  cycles
- Pipelined =  $n+k-1$
- Speedup =  $\frac{nk}{n+k-1}$

| ② Instr        | S <sub>1</sub> | S <sub>2</sub> | S <sub>3</sub> | S <sub>4</sub>   |
|----------------|----------------|----------------|----------------|------------------|
| I <sub>1</sub> | 2              | 1              | >              | 2                |
| I <sub>2</sub> | 1              | 1              | 2              | 3 (non-pipeline) |
| I <sub>3</sub> | 3              | 2              | 2              | 2 (non-pipeline) |
| I <sub>4</sub> | 1              | 1              | 1              | 1                |

→ This gives how many clock cycles an instr spends in each seg.

$$\text{Non-pipelining} = 8 + 7 + 9 + 4 \\ = 28$$

|    |                |                |                |                                 |
|----|----------------|----------------|----------------|---------------------------------|
| 1  | I <sub>1</sub> | -              | -              | -                               |
| 2  | I <sub>1</sub> | -              | -              | -                               |
| 3  | I <sub>2</sub> | I <sub>1</sub> | -              | -                               |
| 4  | I <sub>3</sub> | I <sub>2</sub> | I <sub>1</sub> | -                               |
| 5  | I <sub>3</sub> | "              | I <sub>1</sub> | "                               |
| 6  | I <sub>3</sub> | "              | I <sub>1</sub> | "                               |
| 7  | I <sub>4</sub> | I <sub>3</sub> | I <sub>2</sub> | I <sub>1</sub>                  |
| 8  | "              | I <sub>3</sub> | I <sub>2</sub> | I <sub>1</sub> → I <sub>1</sub> |
| 9  | -              | I <sub>4</sub> | I <sub>3</sub> | I <sub>2</sub>                  |
| 10 | -              | "              | I <sub>3</sub> | I <sub>2</sub>                  |
| 11 | -              | "              | "              | I <sub>2</sub> → I <sub>2</sub> |
| 12 | -              | -              | I <sub>4</sub> | I <sub>3</sub>                  |
| 13 | -              | "              | -              | I <sub>3</sub> → I <sub>3</sub> |
| 14 | -              | -              | -              | I <sub>4</sub> → I <sub>4</sub> |

$$\text{Speed up} = \frac{14}{7} = 2$$

(31-03-2022)

Cycle time:-

→ Cycle time is value of one clock cycle.

Case-i:-

All stages offer same delay

→ cycle time = delay offered by one stage including delay due to register.

Case-ii:-

All stages don't offer same delay

→ cycle time = Max delay.

A 4 stage pipeline has 150, 120, 160, 140 ns respectively. Reg in each stage have a delay of 5 ns. Assuming constant clocking rate.

Total time to process 1000 data items/instructions on pipeline will be.

Total time for 1000 cycles

=  $\uparrow$  clock cycles + 999  $\times$   $\uparrow$  clock cycles.

= 165495 ns

Data hazards:-

→ Using pipelining reduces clock cycles.

→ DisAdv is possibility of hazards.

i) Data hazards

ii) Instruction/control hazards

iii) Structural

hazards.

### Data Hazards:-

- occurs when the pipeline is stalled because the data to be operated on is delayed due to some reason.
- These hazards are mainly prone in decoding stage.

Ex:-

Consider  $A = 5$  and two instr's

$$I_1: A = 3 + A$$

$$I_2: B = 4 \times A$$

The expected result is 32 in B

But due to abnormality, 20 is stored in B

- If the instr's are independent, then there will be no data hazards.

### Instruction Hazards:-

- Pipeline is stalled due to unavailability of instruction.
- Occurs during instruction fetch stage.

### Control Hazards:-

- These occur due to branching instr.

### Structural Hazards:-

- When more than one instr requires the use of same hardware resource, then these hazards occur.