

4/9/18

CQA

set Association:

= 2 tag set association



Supp & use addressing = 00 001 Matched / hit

set associative addressing.

No. of replacement decreases

02 001 Hit

00 001 Hit.

Block  
unit  
search

searching is like set associative  
but mapping in direct mapping

01 001

miss condition

and replacement  
is done in  
tag 02.

miss

Hit / miss case or hits of tag / block etc

Scanned

5/04/18

Cache organisation for pentium - 9



System Bus ~~from~~ fetches word from the main memory.

L1 = splitted cache

L2 = unified cache

Out of order Execution logic - come inst.  
when dependency b/w inst / data  
~~comes then~~ ~~the related~~ The execution sometimes cannot enter to its execution unit during reading of sequential inst.  
~~inst + is executed~~ which inst will enter ~~to~~ to L2 Cache or, is decided by this. will be executed first

Integer file stores intermediate value calculated by integer + complex integer ALU

FP register file stores intermediate value calculated by complex integer ALU

80386 - does not include uncached cache

(unified)

> 80486

- include single uncached cache with size 8-12<sup>1</sup>

Pentium includes ~~FP~~ splitted uncached L1 Cache.

Pentium 4 includes L2 unified) and splited L1 cache.

Cache  
 $L2 = 8$  way set associative memory and size is  $128KB$   
multiple access memory.

Affiliative memory, also called parallel search memory  
Addressed by the content itself rather than  
content addressable memory

## Argument Registers

Key / mask register



To check if in free  
location or not in  
associating our  
not during our  
white operatn.

~~which will be selected if tag is high.~~

## Read operation

Data placed in As journal Register

Data in sometimes not the whole word of not required.

if all is read ie  
0 110 0



Mask  
Resistor  
anions

$$\begin{array}{r} 110011 \\ \hline 11 \end{array}$$

all are read from  
associative memory

If bit wise there is a match b/w argument register data & associative memory = found  
else not found, no read operation.  
if a matching is 1 or more then ~~that~~ its corresponding match HT is high

If multiple match = parallel access / all read.  
multiple location is accessed

④ searching is very fast



# Internal organisation of a cell



$$r_i = A_j F_{ij} + \overline{A_j} \overline{F_{ij}} \quad (\text{For } j \text{ bit})$$

$$M_i = r_1 \text{ and } r_2 \text{ and } r_3 \text{ and } r_4$$

↓  
for all bit matching.

Including Key bit:-

go through Morris manual

$$K_j = 1 \text{ then comparison is done.}$$

$$r_j = \text{obtained } r_j = A_j r_{ij} + \overline{A_j} \overline{r_{ij}}$$

$$K_j = 0 \text{ show expression}$$

$$r_j = r_j + K_j$$

$M_i =$



(P)

$$r_j^i = A_j F_{ij} + \bar{A}_j \bar{F}_{ij} + K_j$$



6/4/8



For use of OS.

For user's  
use.

OS stored in  
secondary  
storage

- ① This part is divided into diff parts called partition during multiple I/Os secondary storage to main memory
- ② swap out (main memory to secondary storage)  
swap in (secondary to main)

Q types of partitioning :-

may be equal size partition

(i) Fixed :- unequal size partition

(ii) Variable :-

different size of memory storage



Disadvantage

Process less than 10 K

more than 10 K

equal

10 K, - If a process of size  
is present in it OS do not makes risk  
and fits it to the next higher memory locatn

waste of  
memory

10 K



waste  
process

if a process of size

is present in it OS do not makes risk

and fits it to the next higher memory locatn

Fixed equal  
 In equal size partitioning all are equal, and if all processes are more than the size of individual size of partition though space is available but process cannot fit.

### Variable:-

partitioning is done as per the size of the process



Compaction :- If shifts the present process and try to make space in compact manner. ∵ It manages space.



Relocation of the process is done and it happens during compaction to make spaces.

#### disadvantages

- ④ performance reduce because OS takes times for shifting and this may reduce performance.
- ④ During the shifting, if within the program a subroutine comes, the subroutine will also get shifted, this may create mismatch of address. But OS takes care of it by logical address. not in physical address we save add in logical address.



⑧ physical address is actual memory space available to the user (actual add to main memory)

⑨ logical address is stored with the help of secondary storage = start add + offset val.

⑩ Total physical address available = memory space

Total logic address available: address space

The available memory is divided into small chunks and the available process is also divided into memory.



OS decides the size of chunks of process first memory. and then it decides the size of memory.

⑪ program is present in secondary memory

Division of

RAM

Demand Paging:-

when the page is demanded then it moves from the secondary memory to the main memory.

All the disadvantages is avoided by virtual memory.

Virtual memory takes care of secondary and main primary memory.

Physical memory space is very small and address.

It uses logical address.

Virtual memory logical address = 8 K

Let virtual address = 12 bits required to address

Let Real memory / Physical add = 4 K. (Actual add. avail. for execution).

Let each page = 1 K, : each Frames = 1 K

Let real memory space

13 bit address is there but actually required = 12 bit

From 13 bit  $\rightarrow$  12 bit (transformation required by mapping).



To select a particular line/word of a Frame = 10 bit

To select a Frame we need 2 bit

$$10 + 2 = 12 \text{ bit}$$

required



logical address



Memory mapping

Page Table

It takes care of pages which frame is in active or which is not active

Page Table

Buffer



virtual replacement

Algorithm

- (i) First come first serve
- (ii) Optimal replacement algorithm (intelligent one)  
Due to the experience of OS this also works.  
It will replace a page ~~from~~ <sup>on</sup> Frame if that  
will not be used for a longer time.

(iii)

LRU

Least Recently Used :-

which has not been used for a longer period of time, that page will be replaced from frame by new page.

## Segmentation - user

Paging is not visible to user  
segmentation is visible to user  
decided

main memory.



18/11/18

## COA [General Standardized DTE Interface]

- Interface in a connector
- ① we also use standardized 110 interface and early
  - ② for serial connection, new widely used interface is developed standardise 110 interface is AS - 232-C. The common may be synchronous or asynchronous.
- It has two additional pins for common and ground.



③ 25-pin D-type connector -  
25-pin D-type connector -  
communications capability  
and its communication distance  
is 20 K baud with soft distance.

positive logic in TTL  $+5V = 1$   
 $0V = 0$  logic  
It was developed prior to TTL.  
Signal level:-  $+3V$  to  $+15V = 1$ .  
 $-3V$  to  $-15V = 0$ .

RS232C  
No ground

This is a negative true logic

Minimum 3 pins required = pins: 2, 3, 7 for data communication, may be excessive.

For ex. if few pins are required and all other pins are unnecessary.



Pin 7: common ground

R3 - 232-C  
Capable

0V to +4V  
for 1 to 0  
0 to +5V  
for 0 to 1

When DTG wants to send '0' wire + 0V (in TTL)  
but because of AS-232-C, MC-1488 changes 0 to 1

0V to +4V and MC-1488 changes 1 to 0  
and it receives 0.

May or may not be collision-free in 3 pin  
structure.

Assignment :-

① IODE 1488 Interface.

② VSB.

Report

Data Trans.

Diff. Prog. Jack

④ ⑤ D

① :-

V

to

z

z

④

⑤

Interface  
And  
Reader

The  
keepin  
the

Data from you below (P.U. and peripheral).

models are there?

there :-

Diff. modes were used for data Transfer  
Programmed I/O, data Transfer initiated in interrupt mode data Transfer

Jeder ruft

Direct memory Access

1

10

various peripheral devices are connected to processor to transfer info.

Everything before D,  
whether the province

⑤ wastage of CPV time because of sys. dec.

To avoid

Interrupt :- Every device has interrupt flag. Every device has one interrupt. And when interrupt is generated by the CPU, it sends its interrupt word and received by the CPU.

and received) by the C.P.U. The C.P.U. will suspend its present work in stack any time the return address is reached. The program is executed. The interrupt :-  
(Interrupt service routine).

Depending

more than one device let interrupt flag  
CPU receives : priority wise decision is  
made for execution

The highest speed device = 1<sup>st</sup> priority  
lowest = n = lowest priority  
highest priority = disk | lowest priority = keyboard

Clock  
memoried > Electromagnetic > electronic



vectors  
additions

will pass through device and reach D1  
as it has request  
 $P_1 \text{ (D1) Priority } 1 | P_0 = 1$

~~Processor~~ therefore  $P_0 = 0$ . After the

Disadvantages  
Last device may  
prioritize its  
request at  
get chance or  
change or  
the next device  
gets chance

Algorithm: Software based priority interrupt.

DMA Data Transfer  
Without masking CPU busy.



2 types ↪ ① Burst mode ↪ To steal the bus cycle from CPU.  
② Cycle stealing ↪ To steal the bus cycle From CPU.

DMA controller

Diagram.

Digital comp

Depending on multiplicity of Data stream (Flynn's classification of 4 types suggested by comp.)

- 1) SIMD (single inst single data)
- 2) SIMD (multi plu inst single data)
- 3) MIMD (multiple inst multiple data)
- 4) MIMD (multiple inst multiple data)

SUD

$$\text{For } A_i = B_i + C_i$$

$$A_i = B_{i1} + C_1$$

$$A_i = B_{i2} + C_2$$

input stream      Data stream

for  $i = 1$  to  $n$

① uni processor  
classical  
sequential comp  
which follows  
van rovemans



- ② sequentially
- ① in this sys, parallelism can be achieved by pipelining.

## 2. SIMD



single int multiple data

PE = Processor element

MM = Memory module.

for,  $A_i = B_i + C_i$ ,  $i=1$  to  $n$ .

$$PE_1 \rightarrow MM_1 = A_1 = B_1 + C_1$$

$$PE_2 \rightarrow MM_2 = A_2 = B_2 + C_2$$

$$PE_3 \rightarrow MM_3 = A_3 = B_3 + C_3$$

## Array Processor

1st array process =  $1U14C - TV$

## 3. MISD



Result of ① pass off

to ② next is open if result by pass off

calculated ① passed

Result of ② pass off

• This architecture has not been used (hypothetical type).  
 • gives the feeling of pipeline.

### MLMD :-



also multiple SIMD, multi comp. system

Difference = intra connection network that in a communication ~~so that they~~ tapology.

## PIPELINING

Sub operations to complete a instruction (inst-cycle)

if we segment each sub operation into individual hardware unit



For non pipelining sys ntn time reqd  $\rightarrow$   $t_4 + t_1 + t_2 + t_3$

$t_4$  - time for 1 inst

in Uniprocessor by using pipelining , parallelism can be attain

There may be difference in time factor b/w diff hardware units, we use intermediate register, time taken by segmented register =  $t_p$ , highest this  $t_p$  must be decided after taking ~~stop~~ time for every required operation ~~and~~ for any instruction.



The value which is highest  $\rightarrow$

## Time Space Diagram of Pipelining :-

you.)

hardware



(Q1P)

$K = \text{no. of segment}$

$t_p = \text{segment time} \times \text{time} \rightarrow$

$n = \text{no. of inst.}$

After  $K$  to time we get 1st QP.

$$\boxed{\text{Total time} = Kt_p + (n-1)t_p}$$

allowable

Iteration,

start

time



non-pipelining

time

$t_m$

$t_m$

non-pipelining

$n t_m$

and step

$$\text{Ind in } \mathbb{Q} = \frac{n}{(k+1)^{\frac{1}{2}}}$$

$$= \frac{n^{\frac{1}{2}}}{(k+1)^{\frac{1}{2}}} \cdot \frac{1}{\sqrt{k+1}}$$

$\Theta$

On fields of first  $n^{\frac{1}{2}}$  spns

$$(1) = \frac{n^{\frac{1}{2}}}{(k+1)^{\frac{1}{2}}} - \frac{1}{\sqrt{k+1}}$$

$$\text{Ind} = \frac{n}{(k+1)^{\frac{1}{2}}}$$

$$S = \frac{n}{(k+1)^{\frac{1}{2}}} \equiv K \text{ (Ind cond)}$$

$$\text{Efficiency} (\%) = \frac{\text{Ind field}}{\text{Ind spns}} \cdot 100$$

$$I = \frac{\text{Ind field}}{\text{Ind spns}} = \frac{12}{12} = 1$$

$$Y_{\text{total}} = \frac{1}{n} \cdot \frac{1}{(k+1)^{\frac{1}{2}}}$$

Example not used in book  
for first time

See also by Helmut - Rolf -  $(k+1)^{\frac{1}{2}}$



$$\text{Ind limit} = \frac{n}{((k+1)^{\frac{1}{2}})^2} = \frac{n}{(k+1)^2}$$



Critical incident :-  
the behaviour

evaluating individual in critical  
situation e.g. in breakdown,  
of the engg. in such a case.

Assessment sender

~~20/4/18~~

COA

2 types of pipeline :-

- (1) Inst. Pipeline
- (2) Arithmetic pipeline

(Previous discussion.)

Arithmetic pipeline :-

$$A_i * B_i + C_i$$

At a time  
more  
than  
1  
arithmetic  
opera<sup>n</sup>.



- (\*) we cannot generate a general purpose comp. for arithmetic pipeline.
- (\*) It is a specialised comp / system.
- (\*) cost is high where a particular exp / calcn is repeated regularly.
- (\*) Designed acc. to organisational requirement

Assignment

Design arithmetic pipeline for floating pt Add / sub. and determine their stages

At a designated time, which inst. should be performed in what particular segment cannot be decided. This is called Pipeline Hazard.

3 types of pipeline hazard

- Structural hazard
- Data hazard
- Control

Structure Hazard :- Due to structure or architecture

a particular inst. which is designated to enter a particular segment at a particular clock but that cannot attain clock.

If has to wait for sometime



If at that time, we are splitted cache one for data other for Inst, then this hazard can be avoided.

Multiplicity of functional units - multiple no. of processing element & memory module.

Hazard Hardware interlocks :- which stops the the clock cycle.



Data Hazard :- pipeline sys due to data dependency

①

Add  
Sub

when data hazard occurs, the



data dependency

at this time available for subs R1 is operand

it is only available after execution is done at tp 1.

## operand forwarding:-

As soon as the operand is generated forward that to all the segments where it is necessary. This is operand forwarding.

## Compiler scheduling:-

Before execution compilation is done. Compiler changes / or reschedule the operations to avoid data hazard.

|     |                 |                                 |
|-----|-----------------|---------------------------------|
| Add | R <sub>4</sub>  | R <sub>2</sub> R <sub>3</sub>   |
| sub | R <sub>4</sub>  | R <sub>1</sub> R <sub>5</sub>   |
| AND | R <sub>10</sub> | R <sub>11</sub> R <sub>12</sub> |

independent from  
above 2.  
Reschedule the  
operations.

## Control Hazard:-

Hazard due to program control.



After decoding we can know about the jump but wrongly I must entered.

after decoding during operand fetch if it is a JUM<sub>160</sub> from first ad address 1600, is only executed but the direct corangle entered.

- ① Due to flow of control (i.e. program control) according technique:-
- ② Compiler scheduling
- ③ hardware interrupt



Delay slot is given to waste the first cycle to execute the next instruction.

sim(vt+R)

Transform :-

$$\text{DFT-ODA: } \frac{\text{DFT}}{\text{ODA}} = \frac{\text{DFT}}{\text{DFT} + \text{ODA}}$$

$$\text{A. DFT and Fourier Transform: } T_m = \text{DFT}$$

introduction

TRAP

Non native

A lot of

mechanical

if we want to

more about it then

it's in

RST in

SMS

introduction

defined by a set of rules

symbolic form of data

structured or unstructured

assigned to a task / source

subset of data

representation of data

using any programming language

introduction

classical

symbolic

numerical

processes

units

510

start of data

length of data

end of data

classical

introduction

generated by program

structured / unstructured

language of machines

powers supply

eg.

turning



hard ware

soft ware

introduction

External :- asynchronous

Internal :- synchronous

Software interrupt → It is an interrupt given by user which interrupt the system.

Every instruction has an opcode to temporarily suspend the work / execution.

SWIN →

RSTn

Software instruction n which are given by user to program for some specific purpose.

- ① maskable interrupt
- ② unmaskable

↳ Firmware interrupt

Eg:- maskable interrupt (here also there are some priority) RST 7.5, RST 6.5 etc.

Non maskable interrupt

TRAP

TRAP flag



Two types of ISRA :-

- ① vector
  - ② Non vector

The device which generates interrupt, provides the start address of the ISR and the ISR is executed.

The interrupt that is generated by the device is first decoded (such ISR are standard interrupt) and then it automatically goes to the start address of the ISR and is executed.

Types of addressing / 2 types of I/O mapping:

  - ① I/O mapped I/O / isolated I/O
  - ② Memory mapped I/O

Generally some I/O has generated address.

  - If CPU generates the address of I/O device then I/O mapped I/O -
  - If the address of I/O is obtained from the memory.

ie communicated  
in the dense connected list



## RISC/CISC

CISC :-  
Complex inst. cost comp. - ① designed during the hardware complexity inc.

- ① probable designed to enhance the performance.
- only 20% of the CISC inst is used 80% of time.
- 80% of complex inst are used by 20% of time
- Used for 10% time

To reduce of CISC architecture,

→ RISC :- reduced inst. set comp. -

- ① very simple inst and all inst are fixed cycle inst. All inst are single cycle inst except load and store.

register AM and storage is more

- ⑥ implied AM and register addressive
- Max. dist on register addressive

hardware intensive system

functionality

- ① because the function using operand using AM is fast
- ② hardware controlled inst (partial) some tiny memory sometimes register opera

① Microprogrammed or in hardw. How to decide which is better?

How to decide which is better?

contra verty

RISC/CISC controversy

① different arch  
Different compiler / Different Hardware arch diff inst set. so how to decide. It is difficult to common.

CISC RISC RISC/CISC convergence

~~① pipelining Algo.~~

② pipelining