

Ex - 1

## COA - CV instruction Pipelining



Inserts extra pipeline bus after fetch

Q. Why processor becomes so fast?

→ increase data bus

8085 - Non pipelined

→ increase clock frequency

8086 - Pipelined

→ Pipelining

### 2 stage pipelining



As we are taking 2 main phases of an instruction and working on the two instructions at same time → so becomes fast. This process is called pipelining.

## Lecture - 14

Date: 28.05.22

### 9 Stage Pipelining

5 instructions - 9 stage = 20 steps



Advantages: Fast

Dis Advantages: i) Control Hazard  $\xrightarrow{\text{removed by branch prediction algo}}$

ii) Data dependency

iii) Structural hazard

A Span or empty space for jumping is called pipeline bubble. If stages increases, discont increases.

Data dependency: destination of 1st instruction should not be source of 2nd instruction.

INC B      B = 20  
MOV A, B      B = 21

Solution: INC B

NOP  
MOV A, B

Structural hazard:

Sometimes data bus also used in execution at that time they cannot fetch other instructions.

Move, A, 20

host : coprocessor

Add, A, [loc]

host : coprocessor

RISC - If has many registers and lessen the use of memory.

PICT - Peripheral Inter Controller (better)



## Lecture-15

Date: 30.05.22

- Overlapping of fetching and execution can only happen, if execution doesn't require memory.

- RISC Processor (lots of reg) so most work in reg,

depending on memory reduce

- ii) PIC 18 processor



two separate memory for program and data.

while executing if we need, use other memory. So,

fetching and executing never clash with each other.

Addressing Mode

(address)

The manner by which an operand is given in instruction

operand: the number on which the operation is done is

called operand.

ADD AL, 25H  
operation  
→  
↑ operands

- fetch
- decode
- execution



### Addressing Mode

i) Immediate Addressing mode: Data is given in instruction. Used for initializing.

MOV BL, 25H      BL  $\leftarrow$  25H

MOV BX, 2000H      BX  $\leftarrow$  2000H

ii) Register Addressing Mode: Data is given in register.

MOV BL, CL      BL  $\leftarrow$  CL

MOV BX, CX;      BX  $\leftarrow$  CX

iii) Direct Addressing mode: Address is given in instruction

MOV BL, [2000H]      Data at address [2000H]

MOV BX, [2000H]

MOV AL, [2000H] — Direct

MOV BL, [3000H] —

Address of BL —

mov BL, [200H] ; address given in register  
mov CL, [300H]  
Add BL, CL      15 bit address + 8 bit value = 23 bit total  
Add BL, 55H      15 bit address + 8 bit value = 23 bit total  
mov [400H], BL      16 bit address + 8 bit value = 24 bit total

⑤ Indirect Addressing mode: Address is given in register.

i) Register indirect: address in reg. e.g.

[100+X8].10 VDM mov CL, [BL]

[100+X8].10 VDM mov CL, [BL+X8] ; base + offset = address

[100+X8+X8].10 VDM mov CL, [BL+X8+X8] ; base + offset + scale = address

[400+12+X8].10 VDM

- X8 : multiplier part

offset + scale of X8 address value 400 + 12 + 8 = 420

[400+X8].10 VDM

400 + 12 + 8 = 420 ; base + offset + scale

scale or factor part [12+X8].10 VDM

# Lecture-16

Date: 04.06.22

## Indirect addressing mode:

Q. Get data from location 2000 in CL

Q. Get data of hundred location starting from 2001.

i) Reg indirect: add in reg,  $MOV CL, [BX]$

ii) Reg relative:  $addr = addr + displacement$

iii) Base indexed:  $addr = base + index \text{ reg}$ ,  $MOV CL, [BX+SI]$

iv) Base relative plus indexed:  $addr = base + index + disp$

$MOV CL, [BX+SI+03H]$

Reg relative: Ex -

Get the 100th value after BX.

$MOV CL, [BX+100H]$

Base indexed: Get the value of 50 data after 50H.

$MOV CL, [BX+SI]$

$INC SI$

\* One act as base

\* other act as an index.

immediate: initialize

Reg: move between reg

direct: need to access few memory location

Indirect: looping.

(next values)

$MOV CL, [BLH]$

$INC BL$

Indirect: When need to access a bulk amount of data.

Fl. chart

## Instruction cycle state: transition diagram:



## Lecture-17

Date: 06.06.22

Program → a set of instruction → each instruction

requires tiny operations (microoperation)

when memory was Megahertz, clock pulse microsec.

But now frequency Gigahertz, clockpulse . nanosec.



Processor      memory

A → Address bus send add to memory  
 D → Data bus send the instructions to processor,



81 - unit 3

PC - Program Counter - fetch the next instruction

MAR - Memory Address Register - mem add reg temporary hold add

MBR - Memory Buffer Register - temporary hold the data

IR - Instruction Register -

GPR - General Purpose Register

CU - Control Unit - generate the control signal.

Register file stores bits of address to be used in memory & data from memory to be used in ALU

Activities exist parallel to each other

Register file has two responses to a word write

and number of words of wop written

of the most recent operation to be written

and number of next addition

and also sum of current addition result

RI of again most result

and number of next addition

## Lecture-18

Date: 08.06.22



Empirical activities with sharing - find bottleneck - yes

- Program is a set of operations and each operation needs a set of tiny activities. These tiny activities are known as microoperations.
- Address goes to memory by address bus -  
PC fetch the next instruction. Then from PC to MAR, then to address bus.
- Then instruction comes to MBR from data bus,  
then from MBR to IR

## micro-operation for fetch

D<sub>1</sub> RAM OUT

T1: MAR  $\leftarrow$  PC

mem  $\rightarrow$  MBR

[EX9]  $\rightarrow$  ROM

T2: MBR  $\leftarrow$  mem (instruction)

T3: IR  $\leftarrow$  MBR

PC  $\leftarrow$  PC + 1

} each clock pulse is  
a trigger to process  
to do a new activity

## micro operation instruction

### 1) Immediate addressing mode

MOV BL, 25H

### 2) Reg addressing mode

MOV BL, CL

$\Rightarrow$  T1: MAR  $\leftarrow$  PC

T2: MBR  $\leftarrow$  mem (Inst)

T3: IR  $\leftarrow$  MBR

PC  $\leftarrow$  PC + 1

T4: BL  $\leftarrow$  25H (IR)

T2: —

T3: —

T4: BL  $\leftarrow$  CL

when ADD BL, 25H

$\rightarrow$  T4: BL  $\leftarrow$  BL + CL

T4: BL  $\leftarrow$  BL + 25H (IR)

ST9

### 3) Direct addressing mode: MOV BL, [5000H]

T1: —

T2: —

T3: —

T4: MAR  $\leftarrow$  IR (add)

add (IR)

T5: MBR  $\leftarrow$  mem (data)

T4: BL  $\leftarrow$  [5000H] (IR)

T6: BL  $\leftarrow$  MBR [5000H]



5) Implied addressing mode:



CT2:

Adder, Booth multi, division,

subtractor, ~~pipelining~~

$\Rightarrow$  Why not use traditional mult, pipelining hazard, booth mul/div saturation

## Cache Memory Mapping Techniques

### 1) Fully Associative mapping

The process how block from main memory is placed on cache → called

cache memory mapping.

hit ratio

processor copies a block of data into cache while

fetching, that's how hit ratio ↑ will be.

For pentium processor →



Block size = 32 bytes

No. of blocks in main

$$\text{memory} = \frac{9\text{GB}}{32\text{bytes}}$$

$$= \frac{2^{32}}{2^5}$$

$$= 2^{27}$$

• tag: will indicate which

block from memory is

present in cache. (27 bit tag required)

$$\text{No. of block in cache} = \frac{8\text{KB}}{32\text{bytes}}$$

$$= \frac{2^{32}}{2^5}$$

$$= 2^8 = 256$$

Cache directory is the collection of all tags.

tag = 27

Search = 256

hit ratio = best

27      5       $\rightarrow$  to represent the block no.

Block no | location no

Prings and location no.

~~8/3~~ 8/3 of - represents location -3 and no of block -8.

if we get this address in 3<sup>rd</sup> shift in cache.

then the 3/3 will be the location and block  
item of cache.

drawback - slow as tag size



*E. gracilis* mont. foliol.

(birth prep for lid F2) . return at 10:00 AM

## 1) One way Associative mapping



Size of set = 8 KB

$$\text{No of set in main memory} = \frac{16\text{KB}}{8\text{KB}} = \frac{2^{13}}{2^3} = 2^{10}$$



tag = 19 bits

search = 1

hit ratio = worst case after repeated use of block 0 of different set size



worst case: after

In description - all calculation should be included.

with an example.

method you = often FIFO

more info will come up next class

Two way associative Mapping:

it divides the cache into two sets.



$$\text{No. of set in MM} = \frac{4\text{GB}}{4\text{KB}} = \frac{2^{32}}{2^{12}} = 2^{20}$$

$$\text{set size} = 4\text{KB}$$

$$\text{MM size} = 4\text{GB}$$

$$\text{No. of blocks in cache/set} = 128 = 2^7$$

$$\text{size of 1 block} = 32 \text{ bytes} = 2^5$$

| Set | Block | Location |
|-----|-------|----------|
|-----|-------|----------|

tag = 20 → represents 110 - no. of bits in tag  
set no.

search = 2

Hit ratio = way better

worst case: there are some blocks in repeated manner.



## Memory Hierarchy

Primary memory / physical / main mem. (original)

RAM → constant power supply, volatile

ROM → non-volatile

chip,  
memory,  
expensive,  
fast

Secondary mem (to increase storage capacity)

HD →  
FD (floppy disk)

} magnetic  
disk

writable  
non volatile,

CD →  
DVD →

} optical disk

pendrive - semiconductor

cheap  
need large power

HD, FD, CD, DVD

data acquiring rate =

rate of disk rotation (rpm)

(fast + cheap) → secondary mem

for operation      \  
for storage

Q. How speed increase of . by increasing

RAM size?

processor      primary



secondary      peripheral



(flowchart of) main address

