



multi computers & multi processors

- UMA (Uniform Memory Access)
- NUMA (Non Uniform " "
- COMA (Cache Only " "

UMA:



Here each processor takes the same time to access a memory (uniform time) (e.g. SM)

NUMA: Here memory is separated physically to each processor



P1 → sys. In. co. N/w → LM2  
 P1 → LM1  
 P2 → sys. In. N/w → LM1

Here there will be time diff. for accessing diff. memories by a processor, due to memory location.



CIN - cluster intercon. N/w

Here each cluster acts as UMA

COMA:



## Multi Computers:



**star:**



**bus**



**UMA** → Symmetric → If all those processors can be done by all processors then they are symmetric

Asymmetric



System Software:

Additional utilities

OS - interface b/w user & hardware

## Multi programming:

1. Get application from secondary memory.

2. exec → read data

compute  
write



for any pro.

Whenever the CPU is idle, it loads another application even when the 1st one is not completely executed.



## Number Representation

$$B = b_0 b_1 b_2 \dots b_n$$

$$V(B) = b_0 \times 2^{n-1} + b_1 \times 2^{n-2} + \dots + b_n \times 2^0$$

Sign magnitude

its complement

2's complement

Representations

Subtract the binary no. from  $2^n$

$$2^n - 1$$

So, add 1 to get 2's from 1's

Memory Operations:

load

store

n bits - word

memory organization



↳ 16 to 64 bits

1 byte

4 ASCII value of length 0-8 characters

can be stored in one word

byte addressable

0 → 1 byte  
1 → 2 bytes  
2 → 3 bytes



little Endian



byte addressable

Big Endian



$2^k$  - addressable locations if we have a k-bit register

Sign Extension: Adding 1's on the left hand side of a 2-bit

a 2's complemented bin. no. doesn't make any effect.

Based on operations (data transfer, conversions, logical, arithmetic)

Instructions:

OPCODE OPERAND

unsigned

signed

110101

110101

101010

101010

Eg: ADD R1, R2

logical left → 110101

right → 011010

111010

shift

arithmetic left

right → 111010

111010

ari. left: 110001

100111

100111

ari. right: 111001

111000

111000

ADD R1, R2, source, desti.

ADDI #25

Add → will be sent to immediate accumulator.

Sources  $\begin{cases} \text{registers} \\ \text{memory locations} \\ \text{immediate} \end{cases}$  destination  $\begin{cases} \text{register} \\ \text{memory location} \\ \text{immediate} \end{cases}$

Instructions: (Based on # of Operands)

① 3 address      OPCODE      Mem/reg      M/r      M/r

$$\text{Eg: } X = (P+Q) \times (R+S)$$

ADD R1, P, Q

ADD R2, R, S

MUL R, X, R1, R2

OPCODE M/r M/r

② 2 address

MOV R1, P

Add R1, Q

MOV R2, R

Add R2, Q

MUL R1, R2

MOV +X, R1

③ 1 address  
LOAD, STORE

LOAD P

~~OR ADD Q~~

STORE T

LOAD R

~~ADD STORE S~~

MUL T

STORE X

(finite) no. of bits

If we have definite ad  
required for opcod, operand  
then it becomes easy for  
operation

(no operand references)

④ zero address are used

(stack)

We use when we have

stack organization

in the system

operation - top 2 ele

PUSH P

PUSH Q

ADD

PUSH R

PUSH S

ADD

MUL

POP X

Sequencing:

|        |            |                                      |
|--------|------------|--------------------------------------|
| CE A+B | LOAD R1, A | considering each i/n to take 4 bytes |
|        | ADD R1, B  |                                      |
|        | STORE R1   |                                      |

Here instructions will be

executed linearly

i.e., after the execution of  $i^{th}$  instruction,  $(i+4)^{th}$  will be executed

Branching/Looping:

LOOP LOAD R1, N

{ add element to reg R0  
and increment R1  
decrement R1 }  
(Arithmetic overflow)

i.e., checks if the last instr. BRANCH > 0  
if R1 > 0 loop will be checked MOVE SUM, R0

Addressing modes: → Immediate  
 ↘ direct  
 ↘ indirect

Immediate:

MOVE A, #F50

OPCODE OPERAND

→ memory will be limited for data  
 ↳ accessing speed will be fast

Memory

Direct:

OPCODE ADDRESS

→ memory will be limited for address space

ADD R1, 4016H

Indirect:

OPCODE ADDRESS

Memory



not restricted with range of the memory.

Eg: ADD R, LOCA



Register direct:

OPCODE REGISTER

Register bank



range is limited  
faster access.

NO need to access data from memory.

Register indirect:

OPCODE REGISTER

Reg. bank



not limited in address range

PC mode

Relative:

OPCODE Address

memory

Eg: Address: 30

Adder → operand

PC



$$\text{Now: } \text{PC} = 900 + 2 + 30 \\ = 932$$

Stack mode:

Operates on top two data values. Swaps top two values. This address is used to access the operand at stack top.

Stack pointer → points to the top

ADD

↳ Adds top 2 ele.

Base addressing mode: Memory off register + base = 900 + 10

Here we add the address we have to the base address to get the physical address for the instruction to access the operand.



index register addressing mode:



auto increment:



auto decrement: similar to this ↑

Assembly Language:

Mnemonics: ADD, MOV, BR, LOC, INC

Mnemonics (symbols) + rules → converts prog. lang. to assembly lang.

Assembler: part of system sw. Set of i/n that helps if stored in memory.

A.L → must specify the addressing mode.

Assembler converts the code to binary code

& values. & symbols

Directives are used to do these.

LOOP EQU 100 ⇒ This directive tells that the value of LOOP is 100.

START ⇒ Directive that specifies the start point of execution

ORIGIN ⇒ point from where the addresses should be given for i/n.

END ⇒ end of execution.

define storage ⇒ reserves memory for some data items.

define constant ⇒ allows us to reserve some memory before the

program execution.

ORIGIN → how to interpret the names → Equate directory file

↳ where to allocate → origin directive

symbol table: contains the list of symbols along with their respective values.



~~READWAIT BRANCH TO READWAIT if  $SIN = 0$~~

Move input data into R from DATAIN

~~WRITEWAIT BRANCH WRITEWAIT if  $SOUT = 0$~~

Output to DATAOUT from R

Initially,  $SIN = 0$ ,  $SOUT = 1$

Memory mapped I/O:

MoveByte R, DATAIN

MoveByte DATAOUT, R

READ Testbit INSTATUS, #2

BRANCH=0 READ

MoveByte R, DATAIN

eg: 100100  
#2  
'0'  
so, the  
status  
flag=0

Status registers tell the status of each device.

WRITE Testbit OUTSTATUS, #2

BRANCH=0 WRITE

MoveByte DATAOUT, R

~~REPKD~~

MOV R0, LOC

READ Testbit INSTATUS, #3

BRANCH=0 READ

MoveByte R0, DATAIN

WRITE Testbit OUTSTATUS, #3

BRANCH=0 WRITE



MOVE BYTES "DATAOUT", #0  
compare CR, #0 + (-Auto increment) if equal branch

STACKS: Memory will allocated from higher to lower addresses

push:  
 move (SP)-, DATA  
 pop:  
 move DATA,(SP)+

Stack point register → top 1800  
 points to top  
 (SP) fixed bottom 2000

Compare (SP), #1000  
 destination, source  
 dest - sour  
 returns either +ve or -ve

If o/p +ve → stack → underflow

COMPARE (SP), #1000

BRANCH > 0 ~~IF~~ STACK UNDERFLOW

MOVE DATA, (SP)+

→ POP



stack underflow

the integer value of CR is 0  
 this is wrong with software  
 so we have to do something  
 to handle this problem  
 we can do this by adding a new instruction  
 ADDI, a simple add operation  
 to the program

~~MOVE -(SP), ITEM~~ → PUSHA supports auto increment & auto decrement  
~~SUBTRACT SP, #4~~  
~~MOVE SP, ITEM~~  
~~COMPARE (SP), #100~~  
~~BRANCH LO STACK OVERFLOW~~  
~~MOVE -(SP), ITEM~~

Queues: Memory allocation → lower to higher

Circular queue

insertion → ↓

deletion → ↓



## subroutine:

Set of instr which are to be performed many times at different locations, then we store them in a subroutine & branch to them whenever necessary.

(similar to ^fn's in prog.)

call instruction is used to call the subroutine from return state/instr.



here the ADD fn. returns to the address stored in LR.

As the fn. call is initialized, the PC now points to the instr in the ADD fn. (subroutine). So, after the fn. returns, PC to point to the nxt instr. we store the next instr. address in 'Link Register' before pointing to the sub-ro.

```

100 CALL ADD
104 MOVE SUM, R0
PC 104
↓
Link Register 104
(LR)
    ↓
    return.
  
```

But if the subrount. fn again calls sm other subrount. then the LR now gets replaced with the new address. so, we now lost our previous address.

To avoid this we use stacks to store the addresses.

processor stack

Passing parameters

- registers
- memory loc.
- stack

depending on the additional function of parameter passing

MOVE R1, N

MOVE R2, NUM1

100 CALL ADD

104 MOVE SUM, R0

ADD CLEAR R0

LOOP ADD R0, (R2) +

DEC R1

BRANCH >0 LOOP

RETURN





Stack frame - used when subroutine is called & is freed up when the subroutine returns.

frame pointer → used to access local variables & the parameters

|              |
|--------------|
| reg2         |
| reg1         |
| loc var      |
| loc var      |
| saved (FP)   |
| return addr. |
| param 3      |
| param 2      |
| param 1      |

↓  
 declared inside the subroutine  
 SP's location changes, if we  
 push/pop  
 FP's location is fixed  
 ↓  
 general purpose register



MOVE - (SP), FP  
MOVE FP, SP  
SUBTRACT SP, 12

(2+3) + (8+9) = 23

9+10 0000

8+9 0001

$$\sum_{i=1}^{n-1} A(i) \times B(i)$$



~~MOVEMULTIPLE -(SP), R0-R3~~

Move R3, 28(SP)

MOVE R2, 24(SP)

MOVE R1, 20(SP)

MOVE R4, R1+

MUL R4, (R2)+

ADD R0, R4

R1 10 bits

R2 10 bits

overlap init

MOVE R1, ALOC

MOVE R2, BLOC

MOVE R3, N

CLEAR R0

LOOP Move R4, (R1)+

MUL R4, (R2)+

ADD R0, R4

Decrement R3

Branch > 0 loop

Move ANS, R0

LshiftF

LshiftL dest, count

LshiftR dest, count

AshiftL

AshiftR

Mach. instr. encoding:

Assembly lang  $\xrightarrow{\text{Assembler}}$  Mach. instr.

OPCODE Dest Src OtherInfo

Addressing mode etc.

CISC  $\rightarrow$  RISC  $\rightarrow$  Reduced Instruction Set Computer

$\downarrow$  complex, variable

$\hookrightarrow$  fixed length  
insts.

Complex Inst. Set Comp.  $\rightarrow$  variable length instr.

Processor rev. ed. revision

Move (R3), (Loc)  $\rightarrow$  No. of memory references  $\Rightarrow 3$

$\downarrow$  Indirect

register

addressing mode

$$M = (P+Q) + (R+S)$$

LOAD R1, P

LOAD R2, Q

## Piplining:

Each problem is divided into sub problems  
(Inst.)

CISC Instr. is converted into  
complex

RISC

↳ simpler

bcoz of fixed length



CISC  
IBM 360/370

Intel X86/pentium  
VAX 11/780

Backward compatibility

DA32 :

## Registers:



Special Memory



instr.  
↑  
code  
stack  
data  
operand

globally

allocated  
memory  
(global  
variables)

double  
word  
registers



Selectors

↳ to point to  
certain mem. point

(e.g. code, stack)

Effective address

(Address which will be  
achieved by performing  
some operations during  
execution)

$$\text{Eg.: EA} = \text{Mem}[R1+12]$$

Instr. pointer

## Addressing modes:

### Immediate

8 / 32 bit operands

Direct: 32 bit



Register direct: We can use any of the 8 general purpose registers.

Register Indirect: We place a 32-bit integer in the register  
(any of the 8 gen. p.r.)

### Base with displacement:

Base address will be stored in a register (1 of 8)

EA = content of base + Displacement

→ reg. (1 of 8)

Index with disp.: EA = content of index \* scale + displacement  
Scale factor - to access off the array of elements (gives the bottom limit)  
→ reg. (8) of 8 (range).

EA = content of index \* scale factor + displacement

### Base with index:

2 out of 8 gen. reg. are used

EA = Base + index \* scale

### Base with index & displacement:

EA = Base + (index \* scale) + displacement

OPCODE Dest. Src

MOV R, #15

# - immediate

MOV R, [LOC]

[ ] → indicates direct add. mode

Add R<sub>1</sub>, [R<sub>2</sub>]

$$R_1 \leftarrow R_1 + \text{Mem}[R_2]$$

### Base with displacement

Register direct

Base reg.

1000

disp.

100



Base & Index with disp.

Opcode dest, src  
we can almost have any one of them (dest, src) as a memory reference

8-bit 16-bit 32-bit

|   |                |     |
|---|----------------|-----|
| A | AX             | EAX |
| B | BX             | EBX |
| C | CX or register |     |
| D | DX             |     |
| E |                |     |
| F |                |     |
| G |                |     |
| H |                |     |

gen.pur. regis.

file as criteria

MOVE AL, 10

↳ moves the src(10) to the lower 8-bits

AH → moves the content to the higher 8-bits

OFFSET

MOV EAX, OFFSET LOC

MOV EBX, [EAX]

LEA → Lowered Effective Address

LEA EAX, [EBX+50]

JZ LOOP

↳ Jump to location loop if the value of branch is 0

JG loop

↳ Branch > 0

Loop directing: LOOP ADD

M/c i/n format:

OPCODE Addressing mode displacement

1/2 bytes 1/2 bytes 1/4 bytes

If we use only 1 loc., then

(1 reg) 1 byte is suff.

Eg: MOV EAX, EBX

2 bytes → 2 registers → Eg: ADD EAX, [EBX + EDI \* 4]

EDI - Index Register

MOV EAX, 820  
1 byte  
32 bits  
4 bytes

MOV EBX,  
MOV AL, 8  
1 byte  
8 bits  
1 byte

push  
We can push/pop memory locations into the stack  
(data in it gets stored)

LEA EBX, NUM1  
MOV ECX, N  
MOV EAX, 0  
MOV EDI, 0  
ADD EAX, [EBX + EDI \* 4]  
INC EDI  
DEC ECX  
JG ADD  
MOV SUM, EAX

LEA EBX, NUM1  
MOV ECX, N  
MOV EAX, 0  
SUB EBX, 4  
ADD EAX, [EBX + ECX \* 4]  
LOOP ADD  
MOV SUM, EAX

data: (data section)

DD NUM1 (15, 7, 6, 3, -21)

DD N (15)

DD SUM (0)

MAIN: i/n that r translated to M/c Lang.

END MAIN

SF, ZF, OF, CF conditional flags.  
Status flag Zero flag Overflow flag carry flag  
arithmetic overflow

in sub:  
CF = 1 → borrow signal