

পাঠ শুরু  
Execution unit COA  
CU

Subject \_\_\_\_\_  
Date: \_\_\_\_\_ Time: \_\_\_\_\_

## COA CU Instruction Pipelining

one of the reason why processor works fast.

COA → Computer Organization Architecture

CU → Control Unit (Processor এর একটি unit)

Fetch & Decode

Q: Why processor becomes so fast?

→ Increase data bus size (Now 64 bit is used)

→ Increase clock frequency / Trigger

→ Pipelining

8085 → Non pipelined

8086 → Pipelined

### 2 stage Pipelining:



Decode ক্ষেত্র  
বন্ধ time ক্ষেত্র  
So লিপ্তি নি,



As we are fetching two main process of an instruction & working on the two instructions at the same time, so if becomes fast, this process is called pipelining.

(Time taken by pipeline)  $\rightarrow$  80 ns  
(Time taken by non-pipeline)  $\rightarrow$  100 ns

Q: figure of pipelining and non-pipelining.



$$K + (N-1) \times t \rightarrow \text{Pipelining eqn}$$

number of instructions

$$K \times N \rightarrow \text{Non pipelining eqn}$$

Advantage & Disadvantage of 4-stage



- 1) Control Hazard (Branch Prediction)
- 2) Data Dependency
- 3) Structural hazard

Control Hazard :

branching [0000] A QCA

A space or empty spot was created which is called Pipeline

Bubble.



and pipeline failed

By Branch prediction algorithm it was overcome. It is  
accuracy

If stages increased & discard increases.

\* Data Dependency :

Destinoation of first instruction should not be

source of second instruction:

INC B  
MOV A, B



Increment करने से बाकी मिले prb क्या।

लेटा. 2 को 21 करने से 01b हो जाए।

## Advantage & Disadvantage of 4-stage

- ↓  
fast  
branch prediction  
data dependency  
structural hazard
- 1) Control Hazard (Branch Prediction)
  - 2) Data Dependency
  - 3) Structural hazard

### Control Hazard :

branching

A spam or empty spot was created which is called Pipeline

Bubble.



pipeline failed

By Branch prediction algorithm it was overcome. It 95% accuracy

If stages increased & discard increases.

### \* Data Dependency :

Destino Destination of first instruction should not be

source of second instruction.

INC B  
MOV A, B

Before,  
 $B = 20$   
Now,  
 $B = 21$

Increment 20 याकडे बदले जिले prb 23।

Delta n... on nrb तरीके।

Solution :

INC B → No operation

NOP  
MOV A,B

प्रक्रिया,  
efah, execution  
operations प्रक्रिया ना,

## \* Structural Hazard:

ADD A,B

ADD A, [2000]

to solve this  
structural  
hazard

1. RISC Processor      brief writing      ] full form  
2. PIC      "

2. PIC n

## Characteristics :

RISC has more register. It lessen the dependency

on memory.

PIC → Peripheral Inter Controller



Solution:

INC B

NOP → No operation

NOP  
MOV A, B

Causes a stall in execution  
operation prob

\* Structural Hazard:

A

ADD A, B

ADD A, [2000]

problem

forms no usage A

solve this  
structural  
hazard



1. RISC Processor

→ full form

2. PIC

simply used to control various devices

Characteristics:

RISC has more registers. It lessens the dependence on memory.

PIC → Peripheral Inter Controller



Solution



Overlapping of fetching & execution can only happen, if execution doesn't require memory. It increased the speed

Pipelining is imp as it helped to increased the speed of Microprocessor.

1. RISC Processor (lots of reg) so most work in register,  
depending on memory reduce.

2. PIC 18 processor



Two separated memory for program & data while executing. If we need data use other memory, so fetching and executing never clash with each other.

### Addressing Mode

Addressing mode is the manner by which an operand is given in instruction. Operand: the number on which the operation is done is called operand.

Add, AL 25H  
Operands

Operation → SUM, MOV, ADD

- Data direct immediate Add AL, 25H
- Register Add AL, BL
- Indirect/memory location Add AL, [BL]

## General purpose register :

|    |        |
|----|--------|
| AH | AL (8) |
| BH | BL     |
| CH | CL     |
| DH | DL     |

→ AX (16)

→ BX

→ CX

→ DX

25 H → 0010 0101

1235 H → 0001 0010 0011 0101

### ① Immediate Addressing Mode :

Data is given in instruction

[To initialize]

MOV BL, 25H ; BL ← 25H

MOV BL, 2000H ; BX ← 2000H

এই instruction এর মানের data পেরি যাবে (so it can go to immediate operation).

### ② Register Addressing Mode :

Data is stored in Reg.

MOV BL, CL ; BL ← CL

MOV BX, CX ; BX ← CX

processor এর পিতৃর কাটি হচ্ছে "It's smaller & faster."

Q: কোন addressing mode ক্ষমতা বেশি?

### III Direct Addressing Memory:

Address is given in instruction

MOV BL, [2000H] ; BL ←  
MOV BX, [2000H] ; BX ←

It's  
an address  
so 16 bit.  
Though the  
Reg is 8 bit.

(BL gets the content of)  
2000H



don't

\* When we know the data, we use this mode.

### IV Indirect Addressing MODE:

(BEST)

(BEST)

Address is given in the register.

1) Register Indirect : addr. in register

MOV CL, [BL]



2)

best way because it's easier to increment/decrement.

X0 → X1 ; X0, X1 ROM

\* ADD 25 + 35

MOV AL, 25H

MOV BL, 35H

ADD AL, BL

Reg A.M

\* [2000H], [3000H]

MOV AL, [2000H]

MOV BL, [3000H]

Add AL, BL

[14000H] to VOM

→ Direct Add. M

\*

2000 → BL

3000 [BX] → CL

X8 + BL

BL

4000 → [4000H] + 55

BL

MOV BL, [2000H] or most std 00L Freq with

MOV CL, [3000H] from 000 to std all with

ADD BL, CL

ADD BL, 55H

MOV [4000H], BL

most prot mode for std top 12  
It's also a direct addressing mode as address is given in the ins.

+ Operation      Destination, Source  
↓  
MOV    AX, BX

## Indirect addressing Mode:

immediate : initialize

register : move between reg

register : move between D  
direct : need to access few memory location



Give next 100 data from 2000 → Indirect  
Give the data of 2000 memory location → Direct

Q: get data from location 2000 m CL.

- Q: get data from location -
- Q: get data of hundred location starting from 2000.

## TE] Indirect Addressing Mode :

i) Reg. indirect : add in reg,  $MOV CL, [BX]$

ii) reg relative : add = adder + displacement  
 $MOV CL, [BX + 02H]$

iii) Base indexed : adder =  $MOV CL, [BX + 02H]$

$$\text{adder} = \text{base} + \text{index reg}$$

$MOV CL, [BX + Si]$

iv) Base relative plus indexed : add = base + index + disp  
 $MOV CL, [BX + Si + 03H]$

### Indirect

When we need to access a bulk amount of data.

SI = Source Index register

$MOV CL, [BX + 50H]$

$MOV CL, [BX - 1100H]$

M

2000

## Base indexed:

i) one register act as a base

ii) other reg act as an index



3 frames of R2000 of base or mode : base

## Indirect Addressing Mode:

## Instruction cycle State; transition diagram:



int main()

$$\begin{aligned} a &= 2 \\ b &= 3 \end{aligned}$$

```
{  
    cin >> a >> b;  
    c = a + b;  
    cout << "done";  
}
```

O/P :

done.

- \* ~~what exactly goes in processor~~ ~~CPU make~~ program → set of instruction
- \* To execute these programs, tiny operations are needed that are called Micro operations.  
(each instruction requires tiny tiny operations)
- \* Control unit control signal generate
- Q: What is micro operation?   
⇒ Clock speed microsec
- Now it's calculated in nanosec.

System Bus → Address Bus  
 → Data Bus  
 → Control Bus



name Counters  
 , the next  
 instruction)



Imp Topic for Marks 4.

## Control Unit (micro operations)



Q: What is μ-operations?



Q: Steps of Micro(μ) operations :

\* μ-operation for fetching :

$$T_1 : \text{MAR} \leftarrow \text{PC}$$

$$T_2 : \text{MBR} \leftarrow \text{mem (instruction)}$$

$$T_3 : \text{IR} \leftarrow \text{MBR}$$

$$\text{PC} \leftarrow \text{PC} + 1$$

sequentially execute করানোর পথ  
 Independent ৩টি 4টি না করে ৩টি Transaction হবে "

clock pulse तक परियोगी Transaction controlled 227"

- \* each clock pulse is a trigger to processor to do a new activity.

### μ-operation instruction :

#### ① Immediate Addressing Mode :

MOV BL, 25H

T<sub>1</sub> : MAR  $\leftarrow$  PC

T<sub>2</sub> : MBR  $\leftarrow$  mem (inst)

T<sub>3</sub> : IR  $\leftarrow$  MBR

PC  $\leftarrow$  PC + 1

T<sub>4</sub> : BL  $\leftarrow$  25H (IR)

#### ② Reg Addressing Mode :

MOV BL, CL

T<sub>1</sub> : —

same as before

T<sub>2</sub> : —

T<sub>3</sub> : —

→ RAM → AL

T<sub>4</sub> : BL  $\leftarrow$  CL

↑  
data given in register

Q: Add BL, 25 H ?

$\Rightarrow T_1:$

$T_2:$

$T_3:$

$T_4: BL \rightarrow BL + 25H \text{ (IR)}$

### 3) Direct Addressing Mode:

MOV BL, [5000H]

address instruction  
à রেজি মার্ক

$T_1:$  —

$T_2:$  —

$T_3:$  —

$T_4: MAR \leftarrow IR \text{ (address)}$

$T_5: MBR \leftarrow \text{mem}$

$T_6: BL \leftarrow MBR [5000]$

Q: ADD reg, [add] ?

$T_1:$

$T_2:$

$T_3:$

$T_4: MAR \leftarrow$

(IR)

$T_5: MBR \leftarrow \text{mem}$

(data)

$T_6: Reg \leftarrow Reg + MBR$

### 4) Indirect Addressing Mode:

MOV BL, [CX]

address register  
à রেজি মার্ক

$T_1:$  —

$T_2:$  —

$T_3:$  —

$T_4: MAR \leftarrow CX$

$T_5: MBR \leftarrow \text{mem}$  [box]

$T_6: BL \leftarrow MBR$  [box]

ADD, SUB  
etc.

## ⑤ Implied Addressing Mode

Subject..... Date..... Time.....

carry flag 1

কর্যত হবে,

T<sub>1</sub>:

T<sub>2</sub>:

T<sub>3</sub>:

T<sub>4</sub>: CF ← 1

Ep : STC



set the  
carry flag

~~data flow~~ ⑥

for example

[H000] : 18 VOM

— × — → [T]

[B000] : 18 RAM → [T]

[T] → [T]

CT-2 Booth's Algo (যোগাদা) RL → RAM : PT

Booth's Multiplication, Division → RAM : PT

Pipelining [B000] : RAM → RL : PT

for example ⑦

Adder, Subtractor

[Ex0] : 18 VOM

Lec - 5 - 10, Lec - 13, 14

— × — → [T]

X0 → RAM : PT

Mid Break starts Here ...

[Ex1] : 18 VOM → RAM : PT

পার্শ্ব

\* Cache: lies betw<sup>n</sup> processor and memory.

## Cache Memory Mapping Technique

### i) fully associative mapping:

Slow

The process how block from main memory

is placed on cache → called cache memory mapping.

95% ratio (hit)

cache এর প্রতি-  
কাশের সময়

\* hit ratio:

processor copies a block of data into cache while fetching, that's how hit ratio arise.

hit ratio এর মান ~ 95%

example of pentium processor:



~~exam~~ Size of Block = 32 bytes

No. of B. in MM =  $\frac{4 \text{ GB}}{32 \text{ bytes}} = \frac{2^{30}}{2^5} = 2^{25}$

No. of Block in cache =  $\frac{8 \text{ KB}}{32 \text{ byte}}$

$$= \frac{2^{13}}{2^5} = 2^8 = 256$$

flexible method & slow

tag: main memory থেকে কোর্ট বাধাতেরি।

tag will indicate which block from main memory

is present in cache.

tag জাতে 27 bit for pentium.

বেলে 2<sup>27</sup> মানে 2<sup>27</sup> পদ্ধতি বাধার লাগ।

cache directory

↓  
expensive

এই method use বৃদ্ধির cache size এতে খাড়া-ফালি-  
expense বেড়ে যাইছে।

tag = 27 bit

Search = 256 (worst case)

hit ratio = best



= 32 bit



813      tag

8 no. block & 3 no. location

location catch করে করে number Block 2 "

Q: cache location? = 313 (example একটি)

B: drawback? → slow, 256 বাই search করতে আবশ্যিক

time = O(log n)

② One way associative Mapping:

just একটি way মাত্র

মেমোরি cache (একটি set consider করুন)



size of set = 8 KB

$$\begin{aligned} \text{No. of set in M.M.} &= \frac{4 \text{ GB}}{8 \text{ KB}} \\ &= \frac{2^{32}}{2^{13}} \\ &= 2^{19} \end{aligned}$$

unflexible  
rigid  
fast

| setNo | BlockNo | location No |
|-------|---------|-------------|
| 2     | 3       | 4           |

tag = 19 bit

Search = 1

hit ratio = worst

worst case scenario : same block  $\Rightarrow$  alternatively use  
repeated use / alternative use

tag (set No) नाम्बर represent करता है !!

Q: Difference between the mapping

### (3) Two way associative method :

It divides the cache into two sets.

$$\text{No. of set in M.M} = \frac{4\text{GB}}{4\text{KB}} = \frac{2^{32}}{2^{12}} = 2^{20}$$

Set size = 4 KB

M.M = 4 GB

No. of Blocks in cache/set =  $128 = 2^7$

Size of block = 32 bytes =  $2^5$

| Set | Block | Location |
|-----|-------|----------|
| 20  | 7     | 5        |

tag = 20

search = 2

hit ratio = way better

worst case

three some  
~~there are some~~ blocks in repeated  
 manner.

two solve

the prob,  
 we can divide the cache  
 in some parts.

represent  
 set No.

tag



## Memory Hierarchy

Primary memory / physical / main mem: (original)

RAM → Constant power supply voltage

ROM  $\rightarrow$  X , non-volatile

chip  
memory,  
expensive,  
fast

Secondary Memory: (to increase storage capacity)

HD →

FD (floppy disk)

CP →

DVD →

portable

magnetic disk

878

optional disk

- ionizable
- non-volatile
- cheap
- need larger power

HD, FD, CD, DVD, pendrive (chip memory)

data acquiring rate =

rate of disk rotation (rpm)

(fast + cheap) → secondary memory

↓ ↓

for operation      for storage

Q: How speed increase of computer by increasing RAM size?

RAM size ?  
and RAM occupied ক্রমে - জাতি , speed ক্রমে যাই , RAM যাইজে  
সুবৃত্তি , so speed বাঢ়ে !

## Memory Hierarchy



General order of access: RAM, ROM, CD, HD, FD, DVD, DLT

= after primary level  
(longer) access time level to after

Access propose ← (read + write)

## Memory Hierarchy

pen-drive (flash ROM) ~ writable & non-volatile

SSD → Solid state disk → NO Rotation

ROM:



BIOS → Basic Input Output System → Booting Program

OP Run → load in RAM → a program fetch OP in RAM

→ Booting Program (BIOS) → ROM (Primary)

| SRAM                  |  | DRAM           |
|-----------------------|--|----------------|
| Static RAM            |  | Dynamic RAM    |
| Speed fast            |  | Slow           |
| cost high             |  | low            |
| used in cache         |  | M.M            |
| power consumption low |  | high           |
| use flip flop         |  | use capacitors |



Processor

Cache

Main Memory

2nd Memory

Processor → Main Memory → Bus → RAM

Bus slow তাই দ্রুত

(ROM) ROM → (information) আদর্শ প্রদান possible না।



Pentium

1993

Processor

→ এই prob solve কোথা

এটি আনা দুব্বল।

এটি fast. কেন??

বিনোদন

রিডেন