

3/11/23

Date / /

Tue Jan

## Computer Architecture

1

Saathii



Block Diagram  
of Machine  
Computer

Information Processor  
⇒ Input, output are  
both information.

CPU site for all processing / information processing  
CPU interfaced with MM  
CPU has address lines, will generate  
address.

CPU will float address.

MM should also have address line.

Address line → Output of CPU  
→ Input of MM

Address bus → Output of CPU

+  
collection of address lines

Data bus → Input to CPU, output of CPU, both  
output of MM, Input of MM, both

Generally  
bidirectional

OR have data in and data out lines

For each CPU have 1 read write signal line  
or 1 read & 1 write line to  
identify which operation  
∴ Input / Instruction to MM.

G

C

Page No. [ ]



Scanned with OKEN Scanner

(2)

Saathi

Date / /



Any memory is  $m \times d$  memory  
It means that



$$8 \text{ GiB} \rightarrow \text{Byte} \Rightarrow d = 8$$

$$8 \text{ G} \rightarrow m = 8 \times 2^{30}$$

$m=1, d=1$  1 word of size 1 bit

$m=1, d=2$  1 word of size 2 bit

$m=2, d=1$  2 words of 1 bit size

$m=2, d=2$  2 words of size 2 bit



$a=0 \rightarrow \text{word 0}; a=1 \rightarrow \text{word 1}$

$m \leq 2^n$  where  $n = \text{No. of address lines}$

No. of data lines =  $d$

RD/WR  $\rightarrow 0 \rightarrow \text{write}; 1 \rightarrow \text{Read}$

Chip Select  $\rightarrow$  At 0 enabled, 1 disabled chip

Page No. [ ]



Scanned with OKEN Scanner

(9) Saathi

(3)

Saathi

74367 → Tristate  
Buffer

Date \_\_\_\_\_ / \_\_\_\_\_ / \_\_\_\_\_

Tristate ≡ Cutoff

When want to receive address/data  
get it,  
otherwise not receive → At tristate

First write, then read

74367

| $\overline{G_1}$ | 1A | 1Y     |
|------------------|----|--------|
| 0                | 0  | 0      |
| 0                | 1  | 1      |
| 1                | X  | High Z |

~~High Z~~ Tristate  
(No light flickered)

High Impedance Z



1.  $\overline{G_1} \rightarrow$  Yellow

2. 1A  $\rightarrow$  Blue

15  $\overline{G_2} \rightarrow$  Black

14 6A  $\rightarrow$  Red

| $\overline{G_1}$ | 1A | $\overline{G_2}$ | 6A | Output |
|------------------|----|------------------|----|--------|
| 0                | 0  | 0                | 0  | 0      |
| 0                | 1  | 0                | 1  | 1      |
| 0                | 0  | 1                | X  | 0      |
| 0                | 1  | 1                | X  | 1      |
| 1                | X  | 0                | 0  | 0      |
| 1                | X  | 0                | 1  | 1      |

Apart from CPU also there will be other processors.

CPU is useless without memory directly interfaced.

Memory attached to CPU is main/ primary memory.

If X access main memory CPU X process.



CPU follows instructions to process data.

Main memory is comparatively costly but comparatively faster.

Costly  $\Rightarrow$  There exist cheaper memory with similar processing.

Any memory has an access time.

If faster  $\Rightarrow$  Access time is less.

Main memory is chosen to be comparatively less volume.

Secondary memory is less costly, more slower than main memory.

$\therefore$  Cost  $\downarrow \rightarrow$   $\uparrow$  Access time

$\downarrow$   $\uparrow$  Volume

Date \_\_\_\_\_ / \_\_\_\_\_ / \_\_\_\_\_

Transparent  $\rightarrow$  X know what is behind.  
 $\downarrow$  X go through it.

Everybody

using computer

X know how

much main memory, secondary memory,  
volume (apart from bytes).Total cost of memory  $\rightarrow$  Should be min.

$$= C_m \times V_m + C_s \times V_s \rightarrow \text{Should be min.}$$

 $C_m \rightarrow$  Cost per unit vol. of main memory $V_m \rightarrow$  Volume of main memory $C_s \rightarrow$  $V_s \rightarrow$ 

Appears as if

total vol. of memory  $\approx V_m$ total cost/unit vol.  $\approx C_s$ 

$$\therefore \text{Total cost} = C_s V_m$$

Computer useless without input, output devices -

CPU:

ALU  $\rightarrow$  Where all arithmetic & logical operations done

Control Unit

Major part of CPU, controls sequence of info access from memory, what operation should be done  
 ALU receives instruction from here

6

Saathi

Date \_\_\_\_\_ / \_\_\_\_\_ / \_\_\_\_\_

Computation speed should meet demand.

+ Time for computation → Make CPU more complex

We are now in 5nm technology  
ie, effect ↓

↑ Fast, ↓ space

Technological developments

Change computer architecture

↓  
To get faster computation

Input output devices are much slower  
as compared to CPU.

Memory also slows down CPU  
Does not mean memory is slow

i: outside chip communication

Any outside chip communication is  
slow

5/1/23

Thu

RAM → Immediate Access Storage → 1<sup>st</sup> comp.

architecture

7

Saathi

Date / / → 1st electronic computer  
NEAC → 1st electronic computer

- Electronic Numerical Computer
- 1940s launched
- Able to store program
- Size: 30x50 sq. feet or 80x50 sq. feet
- 200 kW
- Not run more than 2 hrs
- Computation power: 5000 addition per second / simple operations per second
- 18000 vacuum tubes (approx.)
- weight 30 tonnes

### EDVAC

- 1st comp to store program
- Electronic Discrete Variable and computer
- 

Need to realise floating points & their operations which are more complex than integer operations / arithmetic.  
 And to also realise millions / billions per second)

### ~~EDVAC~~ → ~~EDPAC~~

- This occurs due to change in machine comp. language, comp. architecture

### 5 generations of computers

1st Generation → Processors designed with vacuum tubes

~~weight 30 tonnes~~

1 vacuum tube ≈ 1 transistor

11700 crore transistors in today's processor

Page No. \_\_\_\_\_ Page No. CPU's process



Scanned with OKEN Scanner

3<sup>rd</sup>  
Generation  
Date \_\_\_\_\_

3  
Saathi

3<sup>rd</sup>  
Generation  $\rightarrow$  Software were written  
type

$\rightarrow$  Memory is very limited  
Memory delay time increases  
 $\rightarrow$  Primitive I/O, ALP devices

Punch cards  $\rightarrow$  Used to give instructions to  
Machines / program

$\uparrow$  Size than piano.

$\rightarrow$  Programming language  $\rightarrow$  Machine lang.  
 $\rightarrow$  Only binary assembly lang.

4<sup>th</sup>  
Generation  
1950s - 1965  $\rightarrow$  2<sup>nd</sup> generation  
Machine comp.

deminated

Major change

$\rightarrow$  Transistors were introduced  
(invented in late 1940s) & then used  
in machine comp.s instead of  
vacuum tubes.

$\rightarrow$  Memory  $\rightarrow$  Mercury delay is replaced  
by Drum,

$\rightarrow$  I/O  $\rightarrow$  I/O is Sophisticated  
Keyboard introduced

$\rightarrow$  Fortan code/ introduced as programming  
long page.

## 2<sup>nd</sup> Generation

9

Saathi

- Batch Processing → Execution of programs in a sequence
- Jobs placed in batches

### 2<sup>nd</sup> Generation

#### 3<sup>rd</sup> Generation

- 1960 - 1970
- IC technology was introduced (Metal Change)
- SST, MST were in place
- Semiconductor memory introduced
- Pipelining & Computer clock introduced to improve speed of machine
- Cache also introduced
- High level language
- C also introduced

### Multiprogramming → Parallel processing

#### Multiprogramming → Cover in OS

- Many processes → Some waste processes occur when run other user programs
- 1 program system programming process
  - ↑ Than 1 user program simultaneously stay in system

#### Multiprogramming

## Time Sharing

Consider 3 programs simultaneously in Sys - E<sub>1</sub>, E<sub>2</sub>, E<sub>3</sub> (say)

All multiprogramming & time sharing but all time sharing  
✓ mult-programming

CPU so fast that X knew if CPU gave service to other in certain time

Some time E<sub>1</sub> given time for 10ms.

Next 10ms E<sub>2</sub>  
Next 10ms E<sub>3</sub>

(The 10ms boundary)

This is time quantum & Round Robin Schedule

10ms 20ms 30ms

E<sub>1</sub> E<sub>2</sub> E<sub>3</sub> E<sub>1</sub> E<sub>2</sub> E<sub>3</sub> E<sub>1</sub> E<sub>2</sub> E<sub>3</sub> E<sub>1</sub>

We identify for 20 ms give service to another

parallel processing

Introduced in 3rd generation



Maastricht  
University

- 1970 - 1980
  - 1st VLSI Ultra Large Scale Integration
  - Parallel processing known
  - Massively parallel processing introduced

→ How much parallel processing possible  
 → Very sophisticated if, so that  
can handle

- Sophisticated execution to handle  
 Scalable vector data introduced.
- Realisation of high speed computation,

### Fifth Generation:

- 1990 - Present
- Major changes / breakthroughs
  - Expert Sys replace human brain
  - Hardware with process knowledge,  
 voice, video, image
  - Ultra VLSI, Micro LSI, Wafer Scale  
 Integration came
  - Complete distributed form  
 Crane normal machine /  
 Von Neumann Machine

Pred to 2000

peripheral industry interested to  
 accept data flow machine  
 peripheral dominated

- 2 goals
  - Processor performance
  - Peripheral processor

Date

~~Micro computer was not dominated by supercomputer till 2005.~~

~~Supercomputer~~ with limited process

Micro computer was not dominated by supercomputer till 2005.

Supercomputer  $\rightarrow$  Large size

Till  $\times$  demand  
 $\times$  dominate

~~Want~~

Requirements

$\downarrow$  Size,  $\downarrow$  Power,  $\uparrow$  Computation Power

(Performance Mrozski)  $\downarrow$  Cost,  $\uparrow$  Speed



Moore's Law

In 1965 Moore proposed

The introduction of func-in chip will show exponential growth

No. of transistors in chip double in 24 months, but

everyone thought it would do so in 3 years

$\uparrow$  Transistor in single chip to  $\downarrow$  size

Compaction density  
No. of gates per sq. cm

This should be less more

$\uparrow$  Transistor in

$\Rightarrow$   $\uparrow$  Computation ?  
function includes

Every transistor generate heat

$\uparrow$  Compaction density  
 $\Rightarrow$   $\uparrow$  Heat in small area

Some area in chip hottest  $\rightarrow$  damage of chip.  $\therefore \times$  Easy to increase compaction density.



~~Address~~, if need ↑ compaction density we need to change scale of integration or technology used.

{  
25mm tech.  
18mm tech.  
7mm tech.  
5mm tech.  
4mm tech.

→ Scaling of tech. to ↑ compaction density

By 2025 may have reached least size

To ↑ compaction density we wouldn't be able to use scaling.

Power Consumption > Power diss.

1971 → Intel 4004 → 4 bit processor

$$\text{No. of transistors} = 23000$$

Now XTON5 ~~deep groove transistors~~  
→ 800 crore transistors

Cost:Codage LawPerformance of machine comp. of (Price)<sup>2</sup>Other PerformanceUser's Point of View (P, O, V)

↳ How much time to execute program

(i) Depends on

Program size ~~with ORP and others~~Say NI ~~instructions~~ instructionsFind no. of prog cycles required  
per instruction

Called CPI.

Find no. of progs from instruction

Find time taken per prog

$$NI \times CPI \times T = \text{Total time}$$

Total timeQuantum Cell Automaton (QCA)

Info flow through coulombic interaction

Key device is a cell

4 holes where  $e^-$   
can settleBy some means  
2  $e^-$ 's put in

same hole

Now 2 stable states


 $\rightarrow$  Max. dist. b/w  $e^-$   
for 1 stable

Based on this design diff logic gates

### Changes

Vacuum Tube

Big Size, weight  
↑ Cost, power consumption

IC

Comparatively lesser cost, size, power consumption

LSI

~~MOS~~ MST

VLSI

→ & A



11/1/23 Von Neumann Architecture

wed Jan



Accumulator → Temporarily stores the operand

1 operand → DR

other → Accumulator

Temporarily

stores output  
of ALU

DR → Data Register

MDR Memory Data Register

Data transfer b/w memory & CPU  
will be through DR

AR → Address Register

MAR Memory Address Register

Any memory access

Say CPU access an address

This address first stored in  
AR.

Address of memory location first  
stored in AR

DR, AR → Control transfer  
b/w memory, CPU

PC → Program Counter

Stores the address of the memory location  
from where next access will be done

i.e., Stores next executable instruction  
address

IR → Instruction Register

After instruction available at CPU, it should  
be decoded → Func. of IR, Decoder

Control P Depend on instruction control clk generates  
ckt. Sequence of signals to do operation

At a time 1 instruction faced  
decoded, executed

~~Program~~ Data & instruction will be in  
same place / memory

1946-48 von Neumann proposed &  
developed

I cache  
Main memory to  
CPU data transfer  
Operation cause  
to I cache

D cache  
Operand to D cache

Let machine have few no. of instructions  
like  $x \rightarrow$  memory location

LOAD ~~Load~~  $x$

Content of accumulator

will be  $x$

STORE ~~Store~~  $x$

Accumulator content go

to

ADD ~~Add~~  $a$

Accumulator content go  
to  $(x + DR)$ 's content

AND  $a$

Accumulator content go

to  $x$  AND DR's content

JUMP  $x$

Program counter content

is  $x$

JUMP  $x$

If accumulator content is  
0 jump to  $d$

RSHIFT

Accumulator content

shifted right by 1 and

Accumulator content

complement of accumulator

Comp

Macros instructions

Content of PC transferred to AR  
Instruction read from memory

When instruction fetch DR. any info  
has 2 codes  
 ↳ operation code  
 ↳ operator code

PC ready for next instruction

If 1 word  $\rightarrow$  PC increment by 1  
k words  $\rightarrow$  PC increment by k

Now decide if instruction is load  
If load, then perform operations

Load  $\rightarrow$  Fetch from memory location n

AR contains x  
Content of memory x go to PR  
Ultimate destination accumulator  
From PR to accumulator

Micro instructions  $\rightarrow$  set of instructions  
operations to perform 1 macro instruction

Load  $\rightarrow$  3 micro operations

For each macro instruction there  
is 1 micro program



21

Sarithi



- Now TAG architecture dominates
- hardware cost ↓

12/1/23 CPU should maintain sequence in which instructions should be executed, until hardware cost ↓



each instruction has own operation code.

Next instruction inside instruction itself

length of instruction becomes large, memory required ↑, memory was costly, width

CPU bus also should be very large through which CPU collects data from memory ⇒ taking in bus.

⇒ ↑ Size of CPU

- Discard putting next instruction in instruction
- Now PC stores next instruction address
- Reduce length of instruction

Date \_\_\_\_\_ / \_\_\_\_\_ / \_\_\_\_\_

Saathi

22

VLTW  $\rightarrow$  Very large magnetooptical word  
[Next] instruction in instruction.

Instruction Sequencing with help of PC:

## Mazeo Instructions



Check if CPU  
Surcharged or not

23

Saathi

Date \_\_\_\_\_ / \_\_\_\_\_ / \_\_\_\_\_



94

Sathish

Date / /

Load PC with starting address.

For each instruction,  
Have decoder  
to find which  
instruction

PC



Now 2nd instruction



For every instruction  
steps 0 - 3 common



Fetch instruction  
→ Decode instruction,  
→ Fetch operand  
→ Execute

4F 4D 4C F EX

Effectively each instruction  
passes above  
4 stages

Upgradations for Von Neumann machine

1. Less no. of registers in von Neumann AR, PC, DR, TR, Accumulator
  2. Today's machine have  $\uparrow$  no. of registers like General Purpose Registers, Special purpose registers, for ~~Accumulator~~

These are also index registers  
Daring operand addressing  $\rightarrow$  Direct  $\rightarrow$  Immediate

ADD Tx Y → Using index register  
Tx is index register  
and operand

8888888

The diagram illustrates the von Neumann architecture. At the top, a box labeled "Effective operand address" contains a value "AC". An arrow points from this box down to a horizontal bar representing memory. The memory bar is divided into two sections: "CPU" on the left and "AC" on the right. Below the memory bar, another arrow points down to a box labeled "Van Neumann \rightarrow Sample".



Today → Complicated operations like multipollution admissions

~~Cosily but fast~~

Von Neumann  $\rightarrow$  1 ALU

3. Today  $\rightarrow$  k partition ALU

ALU divided into parts

All Func. s of ALU may  
not be required  
simultaneously

Integer arithmetic  
Operations      Shifter      Floating point etc.  
operations

This satisfies 2 requirements

Comp-Arch.  $\rightarrow$  mostly designed by Sweeney  
~~much math.~~

If  $\times$  partition, all operations require  
same time say  $t$ .

If partition, each partition has diff  
computation time

Integer  $\rightarrow$   $t_1$ ; Float  $\rightarrow$   $t_2$

Obviously  $t_2 > t_1$

$t_2 > t_1$  but  $t > t_1$

$\therefore$  If use integer arithmetic  $\uparrow$  as compared  
to float arithmetic  $\rightarrow$  execution time  $\downarrow$

Multiple instructions can be handled in CPU all using ALU

If each instruction use diff partition of ALU  
can execute all instructions simultaneously  
∴ Execute faster

4. If no. of ALU's

Now execute k no. of instructions simultaneously.

5. If von Neumann has instructions

$I_1, I_2, I_3, \dots$

$I_2 \times$  Touched by  $I_1$ , completed

Any communication between CPU memory costly (takes lot of time)

Communication within CPU / within memory fasted.

If in CPU can have queue of registers.

Say  $I_1$  in execution then can get ready for execution of say  $I_2$ .

Instruction will be prefetched  
This is instruction prefetching

## 6. Pipelining



Say all ports take same time

When IF of  $J_1$  complete & go to ID, can start IF of  $J_2$

If first finish  $J_1$  & then start  $J_2$   
so on then for  $n$  instructions  
 $4T_m$  time

$4T + (n-1)T$  time of execution here

$$4nT \geq 4T + (n-1)T$$

When  $J_1 \rightarrow GF$

$J_2 \rightarrow ID$

then  $J_3 \rightarrow IF$

## 7. Parallel Processing:

Simultaneously more than 1 instruction executed  $\rightarrow$  say one executing, other fetching operand

Von Neumann One by one

## 8. Speculation / Speculative Executions

Speculation is imbibed in architecture of machine.

OS:

IP &amp; D = 0 then

$$r = 1 + 5$$

else

$$P = 1 + D$$

If OS finds 2 processing units available  
will execute if in 1 & else in  
other processor.

Keep all processors busy.

Speculation is similar to this

~~Control Speculation      Data Speculation~~

Say program with

$I_b \rightarrow$  Branch Instruction  $\rightarrow$  say

$I_{b+1}$   $\rightarrow$  go to  $I_t$

$I_{b+2}$

$I_t \rightarrow$  Target

$I_{t+1}$

When CPU execute  $I_b$ , will decide &  
deduce branch instruction.  
But X know how to proceed  
& X known, can't predict  
next instruction.

preferring

(30) Sathish



But  $\times$  afford

this delay

speculate,  
say cond<sup>n</sup> satisfied. Then go to  $T_4$

have to make speculation by  
necessarily false

If say speculation is false then  
should flush out data of ~~state~~  
execution when take speculation  
true

But this is not so simple  
So store if temporarily, so that  
can delete result of  
speculation wrong,

Data Speculation

Say  $T_1, T_2, T_3$

$T_1: \text{ary}$   $T_2 \times$  proceed  $T_3: \text{y}$   
 $T_2: \text{ary}$  done  
 $T_3: \text{ny}$  But have parallel processing

May be that all of  $T_1, T_2$  is ready  
to execute (processes available)

But have to wait for dependency -  
if know  $x, y$  can execute instruction

So speculate data

Again if speculation incorrect  
Data flush out

Today miss speculation is close to 0%

q. Queen computing

If speculation from instruction will be  
lapping same frame  
To use eager execution i.e., follow  
both paths simultaneously  
One correct → one incorrect  
Discarded incorrect → faster execution  
↓  
X Delay

The All instructions executed  
if  $f_1 > f_2 \rightarrow$  Power consumed  
Rule of queen computing → followed  
them

Now ~~power~~   
f1 position in ALU utilised → switched  
off to  
some power

Date / /

So replace 1 powerful core processor with ↑ power, ↑ fast computation

with multiple ↓ power processors (multicore).

Each core will have lesser ~~executing~~ execution speed than a single core.

Power  $\downarrow$   
 Static Power      Dynamic Power

$$P_s / P_d = C V^2 f \quad * \text{check}$$

Speed of execution depends on clock freq.

↑ Clock freq.  $\rightarrow$  ↑ Running time

↑ Cores with ↓ clock freq.  $\rightarrow$  ↓ power

\* reduce beyond certain threshold as otherwise noise will destroy system.

i. Can only control f.

CMP (Chip Microprocessors)

Within single chip, CPU includes billions of processing / processor cores.

Bottleneck von Neumann

↳ program, Data stored in same place



Early machine computer had only assembly, machine language i.e., hardware supports only simple structured information

In high level language have array but  
In earlier computer → & declare array

this translation requires additional cost  
→ computer  
→ System Software support  
→ Delay

have to use other way to do an array routines through which when translated base hardware can support

This gap is called semantic gap

Gap b/w language & hardware support

Date \_\_\_\_\_  
Page \_\_\_\_\_

Construction of stack  
n elements, each element size l

use shift register (bidirectional) n m  
no. to design it, Each element is 18 m bits  
long.



Push  $\rightarrow$  Shift 1 bit to right



Push  $\rightarrow$  Up down movement 1 & right shift  
Pop  $\rightarrow$  Up down counter, decrement 1 & left shift

35

Sarithi

Date / /

One way to → semantic gap is at  
Data Descriptions.

Say declare variable with type public/private  
protected or not,  
How to describe this to machine?



Each op (operator) is a variable  
In data descriptor for each operand  
we can have some information



Tag also helps to → semantic gap

Tag can be 1bit, 2bits, 3bits, long  
depending on use,

Tag may be used to tell type of  
variable like long, short, etc.  
n bit processor → Can process n bits at a  
time,

8086 → 8088 Both have 16 bit processor  
8088 → 16 bit width than 8086 → 8 bit of chip.



Scanned with OKEN Scanner

Date \_\_\_\_\_

 $m \times d$  memory $d=1 \rightarrow$  Bit organised $d=8 \rightarrow$  Byte organised $d=16 \rightarrow$  Word organised $d>8 \rightarrow$ 

Memory of CPU byte organised, but processor bit organised  
 How is information stored in memory?

→ Little Endian → Least significant byte at lowest memory address

→ Big Endian → Most significant byte at lowest memory address

Ex: 11001101 01011011

Big Endian & Little Endian

|     |          |          |     |
|-----|----------|----------|-----|
| L   | 11001101 | 01011011 | L+1 |
| L+1 | 01011011 | 11001101 | L   |

Most machines have Bi Endian i.e., both big & little endian, with a switch can access which one.

3 digit decimal  $\rightarrow 10^3$  patterns  
 3 digit binary  $\rightarrow 2^3$  patterns

$10^3 > 2^3 \Rightarrow$  Decimal store ↑ info.  
 than binary

But to store will require ↑ memory in decimal as here have to distinguish 10 values whereas in binary 2 values

Consider n digit no. in base b.  
 We have to find most efficient b value

$$b^n = N$$

Cost of n ; Cost of b  
 $\Rightarrow$  Cost =  $k \log_b N$

~~fixed~~ N fixed

want to find least cost

~~Derivative of cost = 0~~

$$\frac{dc}{db} = 0$$

$$\frac{d}{db} \left( k' \frac{\log_b N}{\log_e b} \right) = 0$$

$$k' = \frac{k \log_e N}{\log_e b}$$

$$\log_e b - 1 = 0$$

$$\log_e b = 1$$

If base is e then,

$$b = e \rightarrow \text{optimum}$$

$$b \approx 2.78 \quad \text{if } b \text{ can't be fraction}$$

b = 3 is unreliable

b = 2  $\rightarrow$  Some cost is sacrificed, but it reduces complications, ↑ reliable

19/11/23

Date \_\_\_\_\_

Thu  
Jan~~Information~~Data covered from 11<sup>th</sup>, 12<sup>th</sup>, digital logicInstruction Set:

Instruction Set should be:

~~① Efficient~~(1) Complete:

Should instructions included should be such that programmes can solve most common problems with them

(2) Efficient:

Frequently used instruction should execute fast

For this may have to ↑ delay of other × frequently used instructions.

(3) Regular:

Instruction set should be regular, say instruction I do task T

Then another instruction J, should be able to do T' i.e., opposite of T.

(4) Compatible:

At least part of design should be available in market, otherwise design cost ↑ as design everything then.

16 bit address in market

Can design 32 bit address from this

But for 24 bit address need new design with new fetch plan, routing, etc.

∴ Should be compatible,

Some standard instructions should be included like:

- Data movement (Move from 1 register to another, Loading instructions)
- Arithmetic instructions (Addition, subtraction)
- Logical instructions (and, or, compare)
- Branch control instructions (Jump, jumpz, jump positive, jump negative)
- Debugging instructions (break)
- Input output instructions
- Miscellaneous instructions

In von Neumann Architecture

For ADD  $\times$ : → 1 addressed instruction

$$AC \leftarrow AC + M(x) \rightarrow AC, AC, M(x) = 3 \text{ openg}$$



If ADD = 1  
then activated  
& add

2 addressed  
instruction

Date \_\_\_\_\_ / \_\_\_\_\_ / \_\_\_\_\_

Say have ADD  $x, y$  and want  
 $x \leftarrow x + y$   $\rightarrow 2x + 1y = 3$  opcodes

## Implementation



~~ADD X, Y, Z~~ →  $Z \leftarrow X + Y$        $\rightarrow X, Y, Z$  are operands  
3 addressed instruction



Say have ADD → Single address instruction

Add top most element of stack & store result at top of stack.

## Instruction Format



To optimise program  $\rightarrow$  Program should have  
 $\downarrow$  memory

$\downarrow$   
 Should need  $\downarrow$  memory to  
 store instruction

$\therefore$  Instruction length should be  
 optimum

Also no. of instructions should be  
 optimised

Say we have  $op = a + b c$  then 2 op

For ADD  $x, y \rightarrow$  2 operations

For ADD  $x, y, z \rightarrow$  3 operations

$\hookrightarrow$  First load  $x$ , then 3 additions

Explicit operand  $\rightarrow$  operand specified

Implicit operand  $\rightarrow$  operand  $x$  specified

In ADD  $x$  explicit  
 $AC \leftarrow AC + M(x)$   $\rightarrow$  2 explicit + 1 implicit  
 implicit

ADD  $x, y, z$   
 $z \leftarrow x + y \rightarrow$  All 3 explicit operands

19  
Thu

Date \_\_\_\_\_

First try to  $\downarrow$  length of individual instruction

### Components

- $\hookrightarrow$  operation code
- $\hookrightarrow$  operand fields
- $\hookrightarrow$  No. of operands
- $\hookrightarrow$  Size of operand

to  $\downarrow$  length of instruction may

- $\downarrow$  No. of operands
- $\downarrow$  Size of each operand
- $\downarrow$  Opcode

No. of operands depends on how many operands implicitly specifying

$\uparrow$  Implicit operand  $\rightarrow \downarrow$  Length of instruction

For this define ~~in addressed machine~~

In instruction set of machine computer all instructions such that no. of explicit operands  $\leq m$ ,

$m$  addressed  $\rightarrow$  Almost  $m$  explicit operands

In  $0$  addressed machine there are 2 single addressed instructions are these

- (i) Push
- (ii) Pop

In  $0$  addressed instruction length = Opcode

Drawback:  
To solve a problem may need to give  $\uparrow$  info.

Q.  $Z = w - x + y$

Find how many instructions to solve for  
3 addressed, 2 addressed, 1 addressed,  
0 addressed machine.

3 addressed:

2

SUB  $Z, w, x$   
ADD  $Z, \cancel{x}, z, y$

$$Z = w - x$$

$$Z = Z + y$$

2 addressed:

~~ADD, SUB, P~~  
3

ADD  $w, y$   
SUB  $w, x$   
ADD  $Z, w$

~~ADD, SUB, P~~

$$\begin{aligned} w &= w + y \\ w &= w - x \\ Z &= Z + w \end{aligned}$$

1 addressed:

4

0 addressed:

6

~~PUSH w~~  
PUSH x  
SUB  
PUSH y  
ADD  
POP Z

44

Saathi

Date \_\_\_\_\_

Operand Addressing Mode  $\rightarrow$  & Length of  
op code

absolute E

00000000

00000000

X = 00000000

00000000

absolute S

00000000

00000000

00000000

00000000

X = 00000000

00000000

00000000

00000000

00000000

absolute I

00000000

00000000

X = 00000000

00000000



Operand Addressing Mode  $\rightarrow$  Length of op code

25/1/23  
wed Jan

(ii)

$AND \alpha$   
 $AND \alpha, Y, Z$   $\rightarrow$  1 operand  
 $X \leftarrow Y \text{ AND } Z$   $\rightarrow$  3 operands  
 To  $\downarrow$  size of each operand

(a)

Immediate Operand Addressing

ADD 07

ADD  $\alpha$

$AC \leftarrow AC + 07$

Have

Opcode [ 07 ]

Supplying operand within instruction itself

07 is operand address

operand address

$\downarrow$  size

for 07 need 1 byte

Called

Immediate Operand Addressing

ADD immediate  $\rightarrow$  2 byte instruction

1 byte  
for 0

But in ADD  $\alpha$

[ Opcode |  $\alpha$  ]

Here give  $\alpha$ , & value of  $\alpha$

$\alpha \rightarrow$  depends on  
of address lines  
of CPU

2 byte if each 1 byte

Total = 1 + 2 = 3 byte

Called absolute/direct operand address

memory addressing  
operand address is  
memory

Date \_\_\_\_\_

45

Saathi

In direct operand addressing

- Size depends on no. of CPU address lines
- Access memory → Slower

But in immediate operand addressing  
it is faster at immediate access.

Registers

(b) ~~Access~~ direct addressing

✗ mention memory address

✓ mention ~~register~~ <sup>register</sup> address

Say which ~~register~~ <sup>register</sup> has operand.

Mov A, B  
A ← B

|        |   |   |
|--------|---|---|
| opcode | A | B |
|        | Y | V |

→ Has less storage than  
direct operand addressing  
A, B have size < 1 byte  
total storage

Addresses → Contain operands

8085 → 8 general purpose registers.

Registers

~~Registers~~ are fastest memory in system.  
inside CPU

Memory of latest tech is faster than CPU  
Access time of CPU from memory  
is slower than from ~~register~~ register

(e).1 Indirect Addressing → Memory Indirect

ADD I X

|        |     |   |   |
|--------|-----|---|---|
| opcode | ADD | I | X |
|--------|-----|---|---|

If X memory address, then say require 2  
bytes



Main Memory



Go to X

Get operand Y

Go to Y

Get operand

By changing Y can change operand

## 1.2 Register Indirect Addressing

ADD B

$$Ac \leftarrow Ac + B$$

Size :- registers

Also when go to B, it will point to some Y (given)

ADD I B

Y → Memory address



## (d) Relative Addressing

Say have 1234 5678 in kolkata

Relative in India, Absolute in kolkata  
 But if X in kolkata need to specify  
 123 03 to make in kolkata  
 ↳ Absolute w.r.t India

PC relative

$\text{ADD R } x \rightarrow$  Go to  $\frac{\text{PC}}{x}$ , move by  $x$   
 some address down,  
 get operand

Absolute  $\rightarrow$  Specify  $x \rightarrow 2$  byte

If  $\text{PC} \rightarrow 2$  byte  
 $x \rightarrow x$  be more than 8 bit

$\text{ADD R } x \rightarrow$  operand at  $\text{PC} + x$

|     |   |  |     |  |
|-----|---|--|-----|--|
| ADD | R |  | $x$ |  |
|-----|---|--|-----|--|

↓  
 Atmost 8 bit

$x$  2 byte

$x$  specify absolute address of operand

### (e) Indexed Addressing

Specify index register no.

$\text{ADD } Ix \ p$

Search for  $Ix$   
 Content will be accessed

### (f) Base Addressing

Sort of relative addressing

In CPU have base register

ADD B ] k~~k~~ [ ]

Actual operand  $\rightarrow$  Base address ~~step~~ Replaced by  $2n$

When we X program loaded in memory give address of variables



(g) Page ~~Page~~ Addressing

Page → Block of equal size

Whole program  $\times$  be accommodated in memory at particular time

If could, then difficult to do  
multiprogramming

Resident part } → Temp- part of OS that is  
of OS } loaded in main memory

Program divided in no. of blocks  
All blocks X available simultaneously  
Only required blocks in main memory  
at particular time.

If block sizes are diff., then called segments.



ADD pp → Starting from address of page, go pp distance

When page is collected, page register loaded by OS.



pp page size can't be very large

(iii) How define Opcode for instruction

(a) Fixed format

Say have 3 bit opcode  
 $\Rightarrow$  No. of instructions =  $2^3 = 8$

↓ size of opcode  $\rightarrow$  Limit no. of instructions

But can use 1 ~~level~~  
 level of decoding to find  
 instruction

(b)

Opcode Extension

Say opcode is 3 bit

Let opcode of RSHIFT be 111

When we use RSHIFT → store 111 in memory

Consider  
remaining  
storage of  
opcode  
extension

Remaining storage  
of operands is  
useless as  
have  $\times$  explicit  
operand.

If 111, next bits specify opcode

i.e., If 111, go to remaining bits

If 00000 consider RSHIFT }  
 If 00001 consider CLR }  
 ↑ no. of instructions      } each instruction  
                                 8 bit

Drawback

With ↓ no. of bits, can ↑ no. of  
instructions  
 But now 1 level of decoder &  
sufficient  
 as if 111 need another decoder

Decoding slaves

## Data Dependent Opcode

Say have 3 bits which specify ADD

Say ADD 2 instruction (i) ADD integers  
(ii) ADD float

need 2 opcodes

ADD  $x, y, z \rightarrow$  Depending on type  
 $x \leftarrow y + z$  of operand will go  
to integer/floating point unit

Here operand should have tag that is  
to be decoded

## Frequency Dependent Opcode

If opcode of all instructions fixed  
maybe some space of instruction  
wasted

ADD (B)  
Register  
Address

ADD (X)  
operand  
Address

X same length of my function

∴ We have instructions with diff. addressing  
modes & sizes.

byte  
LOAD (X) 2 bytes

↓  
1+2 = 3 byte  
instruction

2 bits Register

(MOV A,B)

+ 3 bit each for  
1 byte 8085  
instruction

Some instructions used frequently & some used less frequently

Depending on freq. → one with ↑ freq, try to reduce opcodes of instruction

Consider following freq. of instructions

| Data Movement | Logical | Arithmetic | Load | Branching    |
|---------------|---------|------------|------|--------------|
| 23            | DM      | 20 L       | 12 A | 11 Load 16 B |

| Store | <del>Composite</del> | Composite | Debug |
|-------|----------------------|-----------|-------|
| 8 S   | 6                    | C         | 4 D   |

Above are the branches of mode  
we will merge ones with least weight



access  
We want to fetch, ↑ no. of operations  
at a time

