

SESSIONAL - III

- for addition of two floating numbers.
- compare of EXP
  - shifting of mantissa
  - summing the mantissa
  - normalize the result.



$$\text{total delay} = 4T.$$

→ while  $x_1, y_1$  is in mantissa shift we can put  $x_2, y_2$  in another (exp comp.) phase.

→ while two numbers are in one phase, other two numbers we can put in see another stage (phase).

$$4T + T + T = GT$$

using pipelining = GT      for adding 3 number  
 without pipelining = 12T

To avoid collision buffer reg's are placed bet<sup>n</sup> two stages.



$$\rightarrow \text{delay} = \max\{T_i^o\} + TR \quad \text{buffer req.}$$

$$\rightarrow \text{Throughput} = \frac{1}{\text{Max}\{TP_3+TR\}}$$

$\rightarrow$  speed up factor =  $\frac{\text{Non pipelining proc}}{\text{pipelined system.}}$

$\rightarrow$  Non-pipelined proc.  $\rightarrow N$  no. of processes  
1 operation is completed in  $4T$  time.

- Non pipelined system =  $4NT$ .

$$\rightarrow \text{for pipelined system} = 4T + (N-1)T \\ = (8+N)T$$

$$\therefore \text{speedup factor} = \frac{4NT}{(N+3)T} = \boxed{\frac{4N}{N+3}}$$

4 stages

If 5 stages,  $= \frac{5N}{N+4}$

m stages  $\Rightarrow \frac{mN}{N+(m-1)}$

## 4 Stage pipeline

$E_1$  - exponent of 1<sup>st</sup> no.

$E_2$  - exponent of 2<sup>nd</sup> no.



$E_5$   
contains  
diff. of  
 $E_1 - E_2$

based upon  
 $M_1$  &  $M_2$   
shifted.

shifted values  
and stored in  
 $M_6$  &  $M_7$ .



Advantage: speed is high

Drawback: because of buffer registers cost will be increase

## Encoding of Micro Instruction

Inst's are of 2 types.

- i) Horizontal
- ii) Vertical.

In vertical format, 16 Inst's  $\Rightarrow$  4 bits



at a time only 1 is high.

In register file

FUXI MUX two selection lines should be provided at same time but using vertical, we can only activate one signal at a time.

In horizontal,



we can activate one or more signals at a time.

ExuInst^n'sI<sub>1</sub>I<sub>2</sub>I<sub>3</sub>I<sub>4</sub>I<sub>5</sub>I<sub>6</sub>I<sub>7</sub>I<sub>8</sub>

## control signals

C<sub>1</sub>, C<sub>2</sub>, C<sub>3</sub>, C<sub>4</sub>, C<sub>5</sub>C<sub>1</sub>, C<sub>3</sub>, C<sub>4</sub>, C<sub>6</sub>C<sub>5</sub>, C<sub>6</sub>C<sub>4</sub>, C<sub>5</sub>, C<sub>8</sub>C<sub>7</sub>, C<sub>8</sub>C<sub>1</sub>, C<sub>8</sub>, C<sub>9</sub>C<sub>2</sub>, C<sub>4</sub>, C<sub>8</sub>C<sub>1</sub>, C<sub>2</sub>, C<sub>9</sub>

Have to group these uInst^n so that it should be ~~ex~~ mutually exclusive with each other

$$M_1 = \{c_1, c_2, c_3, c_4, c_5, c_6, c_7, c_8, c_9\}$$

$M_2 = \{c_1, c_7\}$  can be grouped because it is ~~not~~ not mutually exclusive

$$= \{c_1c_7, c_2c_6, c_2c_7, c_2c_8, c_3c_7, c_3c_9, \\ c_4c_7, c_4c_9, c_5c_7, c_5c_9, c_6c_7, c_6c_8, \\ c_6c_9, c_7c_9\}$$

$$M_3 = \{c_2c_6c_7, c_2c_6c_8, c_3c_7c_9, c_4c_7c_9, \\ c_5c_7c_9, c_6c_7c_9\}$$

Groups:  
 c<sub>1</sub>c<sub>7</sub>                    c<sub>5</sub>c<sub>7</sub>c<sub>9</sub>  
 c<sub>2</sub>c<sub>6</sub>c<sub>7</sub>                    c<sub>6</sub>c<sub>7</sub>c<sub>9</sub>  
 c<sub>2</sub>c<sub>6</sub>c<sub>8</sub>  
 c<sub>3</sub>c<sub>7</sub>c<sub>9</sub>  
 c<sub>4</sub>c<sub>7</sub>c<sub>9</sub>



|               | $C_1$ | $C_2$ | $C_3$ | $C_4$ | $C_5$ | $C_6$ | $C_7$ | $C_8$ | $C_9$ |
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| $C_1 C_7$     | ✓     |       |       |       |       |       |       |       | ✓     |
| $C_2 C_6 C_7$ |       | ✓     |       |       |       |       | ✓     | ✓     |       |
| $C_2 C_6 C_8$ |       | ✓     |       |       |       |       | ✓     | ✓     |       |
| $C_2 C_7 C_9$ |       |       | ✓     |       |       |       |       |       | ✓     |
| $C_4 C_7 C_9$ |       |       |       | ✓     |       |       |       |       | ✓     |
| $C_5 C_7 C_9$ |       |       |       |       | ✓     |       |       | ✓     |       |
| $C_6 C_7 C_9$ |       |       |       |       |       | ✓     | ✓     |       |       |

Find minimal covers class

- Above rounded are minimal cover class.

$C_1 C_7$  - 1 bit to indicate only  $C_1$  because  $C_7$  is covered into another classes as well.

$C_2 C_6 C_8$  - 00 - None 2 bits  
01 -  $C_2$ , 10 -  $C_6$ , 11 -  $C_8$

$C_3 C_7 C_9$  - 2 bits 00 - None  
01 -  $C_3$ , 10 -  $C_7$ , 11 -  $C_9$

$C_4 C_7 C_9$  - 1 bit for  $C_4$  because  $C_7 C_9$  already covered.

$C_5 C_7 C_9$  - 1 bit for  $C_5$

1 | 2 bits | 2 bits |  $C_4 | C_5$   
C1 C2 C6 C8 C8 C7 C9

I1 = 10|01|11 include  $C_1, C_2, C_3, C_4, C_5$

Ex.

uInst's

I<sub>1</sub>I<sub>2</sub>I<sub>3</sub>I<sub>4</sub>

Control signals

a b c g

a c e h

a d f

b c f

$$M_1 = \{ \textcircled{a}, \textcircled{b}, \textcircled{f}, \textcircled{d}, \textcircled{e}, \textcircled{g}, \textcircled{s}, \textcircled{x}, \textcircled{h} \}$$

$$M_2 = \{ bd, be, bh, \textcircled{cd}, de, dg, dh, ef, eg, fg, fh, gh \}$$

$$M_3 = \{ bde, bdh, dgh, efg, fgh \}$$

|     | a | b | c | d | e | f | g | h |
|-----|---|---|---|---|---|---|---|---|
| a   | ✓ |   |   |   |   |   |   |   |
| cd  |   |   |   | ✓ | ✓ |   |   |   |
| bde |   | ✓ |   |   | ✓ | ✓ |   |   |
| bh  |   |   | ✓ |   |   | ✓ |   | ✓ |
| dgh |   |   |   |   | ✓ |   | ✓ | ✓ |
| efg |   |   |   |   |   | ✓ | ✓ | ✓ |
| fgh |   |   |   |   |   | ✓ | ✓ | ✓ |



## Memory Organization

DATE: \_\_\_\_\_  
PAGE: \_\_\_\_\_

- Address lines  $\rightarrow$   $n \rightarrow 2^n$ .
- proc: only knows primary memory.
- ~~size~~ secondary mem.  $>$  primary mem.
- Only <sup>activate</sup> programs are there into the primary mem.
- Cache memory are placed bet<sup>n</sup> proc. & main memory
- Secondary mem. - uses proc.  
primary mem - (5 to 10 times slower than)
- DRAM: are cheaper than SRAM  
<sup>or cache</sup>
- primary  $\rightarrow$  %S is high & access time is low  
secondary  $\rightarrow$  %S is low & access time is high.  
 $\downarrow$   
cost / bit
- write cycle takes more clockcycle than read cycle.

## Access Modes

- 1) serial Access
- 2) Random Access
- 3) SemiRandom Access: - Hard disk / CD drive used in

①  
serial  
Access  
Mode.



②



data is wherever it will take same amount of time to access.

③



- time taken to move R/W head on a given track is known as "seek time".
- R/W head is under given sector disk has to rotate → it is known as Rotational delay.
- disk controller is used to read data or write data with bus takes time → known as Access time

- Avg Rotational delay = Rotational delay  
 ↓  
 2.

- In semia random access, we don't have seek time only we have rotational delay & access time



track-0 on Platter 1      ] → makes cylinder 0  
 track-0 on Platter 2      ]  
 - - - - - Platter 5

Ex 1 Consider a Hard disk with 4 surfaces, 64 tracks/surface, 128 sectors per track, 256 bytes per sector.

i) Find capacity of the Hard disk.

$$\rightarrow \text{Capacity} = 8 \text{ Meger Byte}$$

$$= 4 * 64 * 128 * 256 \text{ bytes}$$

ii) Disk is rotating at 3600 rpm. What is a data transfer rate?

$$\rightarrow \text{Data transfer rate} = \text{No. of Rotations/sec}$$

$$\frac{\text{* track capacity *}}{\text{No. of surfaces}}$$

$$= \frac{3600}{60} \text{ rps} = 60 \text{ rotations per sec.}$$

$$\text{DTR} = 60 * 128 * 256 * 4 = 7.5 \text{ MB/sec.}$$

iii) Average access time = Avg. rotational delay  
~~= 8.3 msec~~  
~~=  $\frac{8000}{2}$~~

$$\text{Rotational latency} = \frac{1}{60} \text{ sec} = 16.67 \text{ msec}$$

$\downarrow$   
rotations/sec

$$\text{Average Rotational latency} = \left( \frac{16.67}{2} \right) = 8.3 \text{ msec}$$

## Classification of Memory Systems.

(a)

### Volatile

- stored data is lost when the power is switched off
- ex CMOS static mem.  
CMOS dynamic memory

### Non-volatile

- stored data is retained even when the power is switched off
- ex ROM, Magnetic disk, CDROM / DVD

(b)

### Random-access

- R/w time is independent of the memory location being accessed.

### Direct / Sequential Access

- stored data can only be accessed sequentially in a particular order

- Ex CMOS memory  
(RAM and ROM)

- Ex. Magnetic tape

- part of the access is sequential and part is random - direct access or semi-random access

(C)

ROM

RAM

- data is stored in permanent or semi-permanent.

- data access time is the same indep<sup>1</sup> of the location

- data written during manufacture or in the laboratory.

- data once written are retained as long as power is on.

- Ex ROM, PROM, EPROM

- Ex SRAM, DRAM.

Ans: 2

$$\text{No. of tracks} = 500$$

$$\text{No. of sectors/track} = 100$$

$$\text{No. of bytes/sector} = 500$$

Time taken by the head to move from one track to adjacent track = 1 ms

$$\text{Rotation speed} = 600 \text{ rpm} = 0.1 \text{ sec}$$

taken

What is the avg time for transferring 850 bytes from the disk?



Avg. time to transfer =

Avg. seek time + Avg. rotational delay  
+ Data transfer time.

- Now, Avg seek time =  $\frac{\sum(0+1+2+\dots+499)}{500}$

$$= [249.5 \text{ ms}]$$

time taken to move from,

1st track to 1st track = 0 ms

1st track to 2nd track = 1 ms, 2 ms, 3 ms, ..., 499 ms

- Avg. Rotational Delay =  $\frac{0.1}{2} = [50 \text{ ms}]$

- Data transfer time :

In 1 track rotation we can read data  
on one track =  $100 * 500$   
 $= 50,000 \text{ B}$  tracks

Now,  $50000 \text{ B} \rightarrow 0.1 \text{ sec.}$

$$250 \text{ B} \rightarrow \frac{250 \times 0.1}{50,000} = [0.5 \text{ ms}]$$

∴ Therefore, ATT =  $249.5 + 50 + 0.5$   
 $= [300 \text{ ms}]$

Ex.

A hard disk has 63 sectors per track, 10 platters  
each with 2 recording surfaces and 1000  
cylinders. Address of the sector is given as  
a triple  $(c, h, s)$ , where  $c$  = cylinder number,  
 $h$  = surface number,  $s$  = sector number. Thus, 0th  
sector is addressed as  $(0, 0, 0)$ . the 1st sector as

$(0,0,1)$  and so on. The address  $<400, 16, 29>$  corresponds to the sector number \_\_\_\_\_

→  $<400, 16, 29>$  represents 400 cylinders are passed. (0-399) and thus, for each cylinder 20 surfaces (10 platters \* 2 surface each) and each cylinder has 63 sectors per surface.

Hence, we have passed 0-399

=  $400 * 20 * 63$  sectors + In 400th cylinder we have passed 16 surfaces (0-15) each of which again contains 63 sectors per cylinder so  $16 * 63$  sectors + Now on the 16th surface we are on the 29th sector. So sector no.

$$= 400 * 20 * 63 + 16 * 63 + 29$$

$$= \boxed{505037}$$

Consider the data given in the previous question. The address of the 1039th sector is \_\_\_\_\_

→ 1 cylinder have  $63 * 20 = 1260$  no. of sectors  
 $\uparrow$   
 10 platters \* 2 surfaces

Here,  $1260 > 1039 \rightarrow$  so it would be in 1st cylinder  $\boxed{(\text{cylinder no.} = 0)}$

Now, surface no. (head no) =  $\frac{1039}{63} = \boxed{16}$

sector no. =  $1039 \% 63 = \boxed{37}$  sector no.

So,  $\boxed{(\text{CHS}) = (0, 16, 31)}$

## • Characteristic / Types of Memory

### 1. Destructive Read out RAM.

- whenever Read op. is done with content of memory data will be lost. To return data, same them is copied in Buffer register & after reading it needs to write in that place.

### 2. Dynamic - data is stored as charges in capacitor, periodically dynamic RAM needs to be refreshed.

### 3. Volatile - temporary data.

- In Non-destructive RAM - we did not have to write after read operation.
- Access time ( $t_A$ ) - time which memory takes to receive request & give output is known as access time.
- Time bet<sup>n</sup> 2 successive read or 2 successive write operation is called cycle time ( $t_H$ ).
- Cycle time & access time may not be same.
- $t_A$  (Access time) is same as cycle time in static memory.

## RAM IC :

8 bits  
= 8 datalines



$$4K = 2^2 \times 2^{10} = 2^{12}$$

4096 diff. memory

location

↪ 12 address lines are required

- LDA 2000, load into AC
  - ↪ memory read

\* WE - Write Enable input

when WE = 1 indicates it has to perform memory write.



WE = 0, read op.

\* OE - Output enable

whether data bus is ready to receive or send data.

OE = 0, Bus is not ready to receive or send data

size of 8085 processor =  $2^{16} = 64\text{KB}$

$$16 \text{ add. lines } \frac{2^{10} \cdot 2^6}{K} \rightarrow 64$$

Ques

I want to design 8085 uProc. using 4Kx8 memory

- 16 add. lines  $\rightarrow$  4Kx8 ICs are required to make 64KB

Ques

que.

1010 H.



ans

For 32 KB

↪ 8 IC required.

↓  
3x8 decoder , 3 input  $[A_7 - A_5]$   
required

que

4K x 8 memory bit require size = 16 bits

↓

8 bit word size.

improvement:



1 column 2 ICs each of 8 bit word size  
 $= 8 + 8 = 16$  bit wordsize



1-D form of Arrangement  
of Memory

• 2-D form of Arrangement of Memory.



$$\text{Total} = 2^8 + 2^8 = 2^{16}, \text{Drivers are required.}$$

- 2-D form reduces no. of circuit required.
- It is recommended to use 2-D form.
- To increase the speed of memory.



wordsize = 16 bits

But databus is of 8 bits

so, databus first access the  
1st 8 bits And then other 8.

- To access 16 bits more size of proc is required.
- Disadvantage: cost of circuit increase.



- B0 start executing, B1, B2, B3 are free.
- At same time, 2<sup>nd</sup> Address is placed at B1 which is free, processing speed increase.

(1) Assume that there are latency, it takes 1 clock cycle would be taken to reach address from CPU to main memory, main memory access time = 16 clock cycle, transfer time 1 clock cycle from memory to CPU. Find time to catch 41 words without memory interleaving.

$$\rightarrow \text{1 word size} = 15 + 1 + 1$$



$$4 \text{ words} = 17 \times 4 \\ = 68 \text{ clockcycle.}$$

But using Memory Banks, in case of not using memory interleaving.

$$\text{for } B_0 = 17$$

$$B_1 \rightarrow 15 + 1 = 16 \quad 16 \times 3 = 48$$

$$B_2 \rightarrow 15 + 1 = 16$$

$$B_3 \rightarrow 15 + 1 = 16$$

$$48 + 17 = 65 \text{ clock cycles}$$

$\rightarrow$  Using memory interleaving.

$$B_0 = 17$$

$$B_0 \boxed{17}$$

$$B_1 = 1$$

$$\boxed{17+1}$$

$$B_2 = 1$$

$$\boxed{18+1}$$

$$B_3 = 1$$

$$\boxed{19+1}$$

$$\text{Total} = 17 + 1 + 1 + 1 = \boxed{20 \text{ clock cycle}}$$

$$= 20$$



- Level - I memory nearest to memory (L<sub>1</sub>)
- cost per bit is very much less for higher order than lower order memory (L<sub>1</sub>)
- storage of lower level less than higher level memory.
- Access time of lower level memory is less than that of higher level memory



- Required data should be present in fastest memory (CM).
- whatever is present in MM is not present in CM.
- Hardware controller would take care of block transfer from MM to CM.
- In OS, we have memory management unit which would take transfer from SM to MM.

### Virtual Memory

- In assembly language, the data is stored in physical memory.
- User would find that it is having very vast memory.

- 10 programs running in processor (CPU) then all 10 programs should be present in MM, processor only knows MM.
- If size of prog. is too large, instead of placing all in MM at same time, that prog. is divided in no. of Blocks. One of the block place in MM, MM execute that Block (B1) & after executing, remove that Block, new Block is added in MM (B2), these B<sub>1</sub>, B<sub>2</sub>, ... Blocks needs to map in SM which forms virtual memory.
- The address (Blocks) present in MM are physical address.
- MM is build using dynamic RAM, requires period refreshing, CM is build using static RAM and it is nearer to processor, therefore CM is more faster than MM.

• Prediction of Addresses : locality of Reference address

1000 MVI A,10

1002 MOV B,A

1003 MVI C,10

1004 loop: DCR C

1007 JNZ loop

→ due to prediction  
it assume that  
next inst<sup>n</sup> is  
at very next  
add

- That type of locality where it predicts that it needs to be placed at very next add. is called spatial locality.

↳ consecutive inst<sup>n</sup> are placed at consecutive address.

- For loops case, entire loops should be copied from higher (secondary) level to lower level (L1).

for loop

{

stat 1

stat 2

—

3

] all statements needs to be copied.

- In temporal locality, in case of loop, entire loops need to be locality.

### Cost and Performance (Measure)

Assume, 2-level memory

$$\text{cost } C = \frac{C_1 S_1 + C_2 S_2}{S_1 + S_2}$$

where  $C_1$  = cost/bit of Memory M1

$S_1$  = storage capacity of M1

$C_2$  = cost/bit of M2

$S_2$  = storage capacity of M2

→ Hit ratio : prob. of Reference word if it is present in physical memory

$$H = \frac{N_1}{N_1 + N_2}$$

where  $N_1$  = no. of references of M<sub>1</sub>

$N_2$  = no. of references of M<sub>2</sub>.

(word) in MM.

if it is not present it is miss.

miss  $M = 1 - H$ .

max. Hit or miss is 1. (max. prob.)

→ Access time : time to access the content of memory

$$ta = Hta_1 + (1-H)ta_2 \quad (1)$$

avg. access time.

if word is present  $H=1$ .

$$ta = ta_1$$

$$\text{if miss } \rightarrow ta = ta_2 \\ (H=0)$$



copied whole block if it is not present in MM, which is required by CPU (by virtual add.)

Proc always only knows MM. SM is like IO device. proc. does not know about SM.

$$ta_2 = \text{Block transfer time} + \text{Access time of MM}$$

$$\boxed{ta_2 = t_B + ta_1}$$

put into C12

$$ta = Hta_1 + (1-H)(t_B + ta_1)$$

$$= Hta_1 + t_B + ta_1 - Ht_B - Hta_1$$

$$\therefore \boxed{ta = ta_1 + (1-H)t_B}$$

$$\rightarrow \text{Access ratio } r_2 = \frac{ta_2}{ta_1}$$

$\rightarrow$  Access efficiency (How far it is from the avg. access time)

$$e = \frac{ta_1}{ta}$$

$$= \frac{ta_1}{Hta_1 + (1-H)ta_2}$$

$$\therefore \boxed{e = \frac{1}{r_2 + (1-r_2)H}}$$

$\rightarrow$  Utilization factor  $U = \frac{\text{Useful space}}{\text{Total space}}$

$$\boxed{U = \frac{S_U}{S}}$$



once  $P_1$  is over  
it is removed  
from MM and  
another  $P_2$  is  
placed.

$P_1$  requires = 780 ,  $P_2$  = 550

① space ( $P_1 - P_2$ ) is wasted  $\rightarrow$  this is known as fragm

(anticipation)

- due to prediction if word w is required it would also copied next add w+1 but ✓ it might possible that proc. don't require because of it so it wastes the space. of MM. at that spacial locality. of MM.
- Instead of word by word copy, block by block copy is good.

- ② Prediction may go wrong
- ③ part of the MM is utilized by system s/w so it can't be useful.

### Address translation

done by

- 1) - Programmer (Assembly lang. prog.)  
- Requires when programmer writes prog. in ALP.
- If we use kit, we have to provide starting add., opcode of inst's.

2) Compiler

3) loader

4) Run the prog. (when we right recursion)

variable

fixed size blocks - is also known as pages.

variable size blocks - known as page ~~fix~~ frames  
fixed

| MM     | SM   |     |
|--------|------|-----|
| Page 0 | Pf 0 | 4KB |
| Pg 1   | Pf 1 | 4KB |
| Pg 2   | Pf 2 | 4KB |
|        | Pf 3 | 4KB |
|        | Pf 4 | 4KB |

by same  
size.

MM partitioned into page frames, page  
can be accommodated in page frames.



- Page table: per proc one page table would be maintained.

| (virtual) | (physical)  |                     |
|-----------|-------------|---------------------|
| Av        | Ap          |                     |
| 0         | 3           | physical add. = 48. |
| 1         | 1           |                     |
| 2         | 4           |                     |
| pages     | page frames |                     |

virtual add. = 1  $\Rightarrow$  physical add. = 49.

virtual add. = 0  $\Rightarrow$  physical add. = 48.