

# Computer Organization

1. Number System & Representation

2. Memory Organization : Hierarchy

Cache Memory - Address mapping

Updation

Replacement

Principles of virtual memory & Associate  
memory

3. Fixed Point Arithmetic, carry look ahead etc

4. Instructions, Addressing Modes, Instruction  
pipeline, control unit design

5. Addressing, Data Transfer Techniques  
Disk & tape memories

1,3,4: Computer Architecture - J.P. Hayes

2,5: Computer organization - Pauli Choudhury

- Q1: It deals with conceptual things of ~~system~~ system
- Instruction Set
  - Addressing Modes
  - Data format

Q2:

Instruction Set

CISC (Intel)

RISC (Reduced Inst. set) Risk

(PowerPC) "fixed length"

(Not advantage) Supports limited addressing mode

If any instr. will not complete at  $CPI = 1$ ,

then it is not a part of RISC  $\downarrow$  clock per instr.

No more registers are required

Ltd provider only flexibility

If instr. have - Opcode / Ref. opr

Reference of operand.

by Ref opr part, we can easily get the instr. address

If Data format deals with "How to interpret the binary string"

23/09/10

## Instruction Pipelining

Register,

→ Efficient usage of resources (Instruction phase are overlapped)

→ Performance of pipelining is given by speedup factor,  $S = \frac{\text{time without pipeline}}{\text{time with pipeline}}$

re. address

register

→ Ideal case with  $k$ -stage instruction pipeline

$$T_n = (K + n - 1) T_{cb} + T_{clockp} + S_{ideal} = K, \\ CPT_{avg} = 1$$

red in it.

→ Parameters influencing the performance

- Un-even stage delays

- Buffer overhead

- Dependencies among the instructions

→ Data dependencies occurs when the result of one instruction is to be used by its successor.

→ Instruction scheduling & stall cycles introduced by operand forwarding techniques deal with data dependencies.

→ Control dependencies occur when flow of control is altered by instruction cycle of another.

(Branch instructions) are the main resources for control dependencies.

→ Delayed-Branch, Multiple pipes & prediction are used to deal control dependencies.

(a) If processor register will be

81. of operand

(1) Index - then it is index register.

→ used for accessing the memory.

(2) Base address of operand - Base address

Register

→ used for relocatable objects.

→ prob occurs with JUMP instr.

Used for in

(3) Relative Addressing - PCs involved in it

Used for intrasegment branching.

(4) Based Index Relative

$A(0,0)$  $A(1,0)$  $A(2,0)$  $A(3,0)$  $(0,0)$  $(1,0)$  $(0,1)$  $(1,1)$ 

### Principle of Associate Memory

"The associate memory is also called a content addressable memory. To retrieve the info., the content or partial content is used; since no address is involved, it is the fattest memory."

The associate memory contains:

- (1) Argument Register (n)
- (2) Key Register (mask register (mask)) (n)
- (3) Associate memory array ( $m \times n$ )
- (4) Match logic ( $M \times 1$ )

→ To perform the read oprn, place that cont or partial content in the argument register. The match logic will generate either single or multiple match word.

→ In case of multiple matching, the matched words are accessed sequentially until the desired word is obtained.

→ To perform the write oprn, place the word in argument register, if it is not already existing then it is to be moved into one of the free locatn.

→ The match logic expression for  $j$ th word (Complete Matching)  $M_j = \prod_{i=0}^{n-1} (F_{ij} \odot A_{ij})$

|                                     |                       |                                                        |                                              |
|-------------------------------------|-----------------------|--------------------------------------------------------|----------------------------------------------|
| $A(0,0) \rightarrow M \leftarrow R$ | Column-Major<br>Order | Block 0                                                | $A(0,0)$<br>$A(1,0)$<br>$A(2,0)$<br>$A(3,0)$ |
| $A(0,1) \rightarrow M \leftarrow R$ | $H \leftarrow W$      | $A(0,0)$<br>$A(0,1)$<br>$A(0,2)$ occupies<br>400 words |                                              |
| Hits : 100                          |                       |                                                        | $A(8,0)$<br>$A(7,0)$<br>$A(0,1)$<br>$A(1,1)$ |

If the array is arranged in Column major order for every element, a block is moved from main memory to cache, total blocks moved are 100. For every block two references are made, one of them result Hit. No. of Hit = 100.

$$\text{Hit ratio} = 0.5 \quad (50\%)$$

→ Requires 1 extra memory ref.

Adv: Range of operand is limited by memory word size not by instr. length.  
or restricted  
personel] static variables can be accessed.

DisAdv - Limited Range of Addr.

### (3) Indirect Addressing Mode.

|                 |               |
|-----------------|---------------|
| Opcode   Ref-FA | 32 bits<br>EA |
|-----------------|---------------|

needed

→ Two extra memory references

are required (one for accessing operand)

Ref-FA & one for operand)

→ Single indirect requires memory references  
Hence, n-indirect requires n+1 extra memory ref.

→ Can access local variable

### (4) Computable Addressing mode

address



## 1. Non-Computable Addressing Mode

(effective) EA is to be obtained.

## 2. Computable Addressing mode

EA [effective address] + Address of operand  
is to be computed.

1

### Non-computable Addressing mode

A

(3)

#### 2 (i) Immediate Addressing mode

[opcode / operand]

→ No extra memory ref. for operand is needed.

→ Fastest.

→ Suitable for accessing constants.

Dis - Limited Range of operand. (bcz the field given to operand is fixed).  
Hence,

#### (2) Direct Addressing mode

Gives ref. of operand = effective address  
(Addr.) Known as

(4)

Index

Base

Prog

Coun

(Relati)

| opcode | Ref operand<br>(EA) | 32 bits<br>words |
|--------|---------------------|------------------|
| 0      | 102-1               |                  |
| -2     | 102-1               |                  |

72  
123  
4

$$P \rightarrow P_{\max} \text{ iff } Q \rightarrow Q_{\min}$$

384       $P \rightarrow P_{\min} \text{ iff } Q \rightarrow Q_{\max}$

$Q_{\min}$  &  $Q_{\max}$  here,  $1 \leq Q \leq 3$

6 instr. :  $\min Q = 1 \quad P + 1 \times 2^7 = 2^9$

$$P = 384$$

max. Q = 3     $P + 3 \times 128 = 512$

$$P = 512 - 384$$

$$P = 128$$

1-addr.       $P + Q \times 2^7 + R \times 2^{14} = 2^{16}$  - Total possible zero-  
 bits are      Zero Addr.      no. of 2-Addr. instr.      Addr. instr.  
 ence of      instr.      no. of 1-Addr.      (Invalid zero address instr. due  
 id of      instr. (Invalid zero address instr.) to the presence  
 1-Addr.      due to the presence of      of 2-Address  
 address      1-Address instr.)      instr.)

address

instr.      Addressing Modes

ence of      "The way reference of operand is given in  
 instr.)      the instr."

→ The addressing mode provides flexibility for  
 program construct.

→ CISC offer more addressing modes.

by the above machine?

- (a) 128,512      (b) 0,384      (c) 128,384  
 (d) 256,512

Soln -

|           | 15 bits |      |      |              |
|-----------|---------|------|------|--------------|
| For       | Opcode  | Ref1 | Ref2 | = 256 instr. |
| 1-Address | 4       | 7    |      |              |
| 2-Address | 2       | 7    | 7    |              |

To accommodate 1-2-address instr. 2<sup>7</sup> 1-address instr. are to be sacrificed (As 7-bits are allocated from opcode). The presence of 2-2 Address instr. results invalidation of 256 1-Address instr. Thus, Valid 1-Addr instr. are  $8 \times 12 = 256 \Rightarrow 256$  instr.

$$P + Q + 2^7 = 2^9 \quad \text{Total one address instr.}$$

Valid 1-Addr.      Invalid 2-Addr.

Instr. (due to the presence of 2 Two-address instr.)

$$Q = 2, P = 9$$

$$P \cdot 2^9 - 2^8 = 256$$

st<sup>n</sup> one addr

→ Smallest length (16 bit)

256 instr.

→ More cost. (disadv.)

|                  |
|------------------|
| 3) 1024<br>Words |
| 223              |

→ Zero - Address Instr. (or stack Addressing)

included)

→ Uses Stacks

→ Perform pop (for getting the  
Operande) first then push

→ Adv. - CPT C1

↓  
Clock pulse Tstr. (getting little faster)

More  
Disadv. - Complexity in the implementation

Rigid requirement on stack requirements  
(placing proper)



ts)

Q. A hypothetical system support 1-Address &  
2- address instr. The 16-bit instr is  
stored in 128-word memory. If there exist  
2-2 address instr. (A) what will be the no.  
of 1-Address instr supported by the machine?

(a) 128 bits

(b) 256 "

(c) 384 "

(d) 512 "

$$A = [A] + M[A]$$

(B) What will be the min/max 1-Address instr. supported

Prog - carrying sequence of instr' one after another

256 instr.

4 Address insr' instr'

(4 bits)

1024

words

→ Adr. Simplicity

1023

disAdr. Too lengthy

→ 3-Address instr. (Prog. Counter included)

→ 1) Ee

→ Pext

[ADD | A<sub>1</sub> | A<sub>2</sub> | A<sub>3</sub> ] + PC (36 bits)

operr

→ length become Reduced (Adr.)

→ Adr

Cost increased (disAdr.)

Dis

→ 2-Address Instr.

ADD | A<sub>1</sub> | A<sub>2</sub> | + PC

M[A<sub>1</sub>] ← M[A<sub>1</sub>] + M[A<sub>2</sub>] (28 bits)

Q. A

2-

→ Faster than above 2-Address instr. } Adr.

→ Length Reduced more than 3-address. }

→ Overwriting the operand. & Result will permanently  
lost. (disAdr.)

st

9-

1-

(a)

(b)

(c)

(d)

PC, Accumulator

ADD | A<sub>1</sub> |

A = [A] + M[A<sub>1</sub>]

(B)

$x - 1$  $X + 2^k$ 's complement

001

110

$$\boxed{111} = x - 1$$

 $Z_0$  $Z_1$ 

$$\text{E.g. } M_1 M_2 = 01$$

$$Z = X + 0001$$

$$Z = X + 1$$

$M_2 = 1$ , all the addrs give no.

 $Z_3$ 

$$\text{E.g. } M_1 M_2 = 11$$

$$Z = X + \bar{Y} + 1$$

$$= X - Y$$

Instructions, Addressing Mode, Unop & op

put

 $M_2 = 00$ , then

let 1 to all

its

therefore, on func

Format

functions

{

Instructions

opcode

(operand) Ref. RPR

↓

Number of references

4 Addresses instruction  
(No PC)ADD | A<sub>1</sub> | M<sub>2</sub> | A<sub>3</sub> | A<sub>4</sub>M[A<sub>1</sub>] ← M[A<sub>2</sub>] + M[A<sub>3</sub>]A<sub>4</sub> → null bcoz col

Classification based - Data Xfer

Arithmetic

Logical

Branching &amp; Condtn

Using Linked List

Ref to next instrucn (self-referencing)

21-09-10



$M_1 \quad M_2$

Operat<sup>o</sup>n

0 0

$x - 1$

0 1

$x + 1$

1 0

$x + y$

1 1

$x - y$

E.g.

$M_1 \quad M_2 = 0 \quad 0$

(G) 110 - Input

111 - If  $M_1 = 00$ , then

(S) 101 (S) added 1 to all

bits

Classification

When add anything it performs subtraction therefore, on function  
 $x - 1$ .

generated using

$$T_{CO} = 3T_g$$

$$T_{SG} = T_{EX-OR}$$

$$\begin{aligned} T_{CAT} &= 3T_g + T_{EX-OR} \\ &= 4T_g + T_g \\ &= 6T_g \quad (= 3T_g) \end{aligned}$$

$$x_0 + y_0$$

$$+ C_{in}(P_0)$$

$$C_o P_1$$

$$+ C_o P_1$$

$$x_i + y_i$$

- Q. Consider a 16-bit addition, here the behavior of EX-OR is similar to that of any other gate & consumes 5 ns. What is the time taken with -

① 16-bit CLA

② 8-bit CLA followed by 4-bit CLA.



Costliest - more h/w require  
bcz gate delay is far less



$$4T_g + 4T_g = 8T_g \Rightarrow 8 \times 5 = 40 \text{ ns}$$



$$4T_g + 4T_g = 8T_g \Rightarrow 8 \times 5 = 40 \text{ ns}$$



(cheapest)

$$4T_g + 4T_g + 4T_g = 12T_g \approx 60 \text{ ns}$$

less h/w required,  
bcz gate delay is high

## Carry Look Ahead Adder

The CLA provides constant time for addition.

It contains carry generate stage & sum generate stage.

$$T_{\text{car}} = T_g + T_{sg} \quad (4T_g / 6T_g)$$

gate delays

→ It improves the speed of addition.

With CLA, the performance of ALU will be increased.

The CLA requires more b/w. Hence, it is the costliest adder.

$$\text{Time Complexity} = O(2^n)$$

## Limitations of CLA

The fan-in constraints of the gate violates constant time requirement.

The combination of CLA or RGA (Ripple carry Adder) is used for larger size operands.

$$X = X_s(X_m)_2 + 2^{Ex-Bias}$$

fed for exponent

$$Y = Y_s(-Y_m)_2 + 2^{Ey-Bias}$$

wires

$$X+Y = Z, Z_s = \text{sign bit}$$

$Z_m = \text{Mantissa}$

$$Z = Z_s(Z_m)_2 + 2^{Ez-Bias}$$

$$Z = (X_s \oplus Y_s)$$

floating point  
pipelining).

if operands sign are same, there will be no result with zero.

ires

$$Z_m = X_m * Y_m$$

$$E_z = F_x + E_y$$

smaller  
variable

$$Z = (X_s \oplus Y_s)(-X_m * Y_m)_2 + 2^{Ez+Ey} - 2^{Bias}$$

for  $E_x + E_y$ , we add external bias. Bias in exponent bias field.

It will be  $f_2 - Bias$

for the biased exponent field in subtraction.  
Biased will be subtracted.

exam (addition)

normalize

$$Z = X/Y = Z_s(Z_m)_2 + 2^{Ez-Bias}$$

## Floating Pt. Arithmetic

The floating pt. arithmetic implemented for the operands which are in biased exponent form.

The addition-subtraction process requires,

- Exponent Alignment
- Operatn (Add. / Sub.)
- Normalization of the result.

To improve the speed of computation, floating pt. pipelines are used (functional pipelining).

The multiplication and division requires correction to the biased exponent.

Exponent Alignment - Equating the smaller to the larger variable.

$$0.05 \times 10^3$$

$$5 \times 10^{-1} \Rightarrow 5 \times 10^3$$

$$5 \times 10^0, 0.05 \times 10^1, 0.005 \times 10^2, 0.0005 \times 10^3$$

Ans -  $0.0505 \times 10^3$  // After operatn (additn)

$5.05 \times 10^2$  // After Normalizatn

no carry

n-bit

divisor

Successful  $\frac{n}{2}$

If unsuccessful, restore it.

$\frac{n}{2}$

↓ Restoration

Addition

Restoration

(addition)

$\frac{n}{2}$

$\frac{n}{2}$

Successful

On successful

$B_{\frac{n}{2}}$  (addition / Subtraction)

Non-Restoration

Dividend digit

$N$

Division digit

Compared before any subtraction.

10. If dividend digit  $>$  divisor digit, then subtraction is performed, otherwise not. ~~else~~  
only subtraction will be performed in this.

Subtraction's

- ii. for all d's & all i's i.e. 000 or 111, no opert will performed

Brute

### FIXED POINT DIVISION

The fixed pt div is implemented using ~~one~~ restor. division algo. & non-restor. div. algos.

The Restor. div algo. is simple & consumed 3/2 addition/ Subtractions on average.

The non-Restor. requires only n operations but consumes more h/w.

Restoration

$$V = Q + D + R$$

$$D) V(Q)$$

$$\begin{array}{r}
 287 \\
 -3 \\
 \hline
 6 \\
 +6 \\
 \hline
 0
 \end{array}$$

getting the dividend back if the restoration is not successful

Adv.

- (1) Sign of the multiplicand part is protected.
- (2) It enables to reuse the result. (A reused MSB result & its reused LSB result)
- (3) It is placed the current multiplier bit in Q position.

0111(7)

$$\begin{array}{r} 1101011 \\ \times 1011011 \\ \hline 1101011 = (-21)_{10} \end{array}$$

Operat<sup>n</sup>

$A \leftarrow A - M$  // Count denotes How many AR.S will performed

A.R.Shift

The above Booth's Algo is called as Radix-2 Booth's Algo. In the Radix-4

A.R.S. ARS,  
(7) Booth's Algo. 8-multiplier. Bits are considered and based on their pattern the arithmetic operations  $A + 2M$  or  $A + M$  are performed.

$A \leftarrow A + M$

If all the 3-bits are same

$$Q_1, Q_0, Q_{-1} = 000 \text{ or } 111$$

Then there is no Arithmetic operat<sup>n</sup> is performed.

$$\text{If } Q_1, Q_0, Q_{-1} = 011 \quad A \leftarrow A + 2M$$

$$Q_1, Q_0, Q_{-1} = 001 \quad A \leftarrow A + M$$

$$010$$

If complement the above bits

$$Q_1, Q_0, Q_{-1} = 100 \quad A \leftarrow A - 2M$$

$$Q_1, Q_0, Q_{-1} = 110 \text{ or } 101 \quad A \leftarrow A - M$$

the sign

$$1001 * [0][1][0]$$

$$1001 \leftarrow M \times 2^0$$

$$1001 \leftarrow M \times 2^1$$

$$0000 \leftarrow 0$$

$$-7 \times 3 = (-21)_{10}$$

$$M = 1001(-7)$$

$$(2\text{'s complement of } M) \#(M) = 0111(7)$$

| Count | A                    | D                    | $Q_{-1}$    | $Q_0 Q_1$      | Operatn                   |
|-------|----------------------|----------------------|-------------|----------------|---------------------------|
| 3     | 0000<br>0011<br>0111 | 0011<br>0111<br>0111 | 0<br>0<br>1 | 10<br>10<br>11 | $A \leftarrow A - M$ , // |
| 2     | 0011                 | 101                  | 1           |                | A.R.Shift                 |
| 2     | 0011                 | 101                  | 1           | 11             | A.R.S. ADQ, (7)           |
| 1     | 0001                 | 110                  | 1           |                |                           |
| 1     | 0001<br>1001         | 110                  | 1           | 01             | $A \leftarrow A + M$      |
|       | 1010                 | 110                  | 1           |                | A.Right S.                |
| 0     | [1]101<br>0100       | 011                  | 0           |                | A.DS;                     |

$$\overrightarrow{0100} + 4$$

$$10010 + 2 \quad \text{A right shift}$$

$$\text{Sign bit} \rightarrow 0100(-4)$$

$$\text{will remain } 0110(-2)$$

change

(shouldn't be affected)

∴ Arithmetic right shift preserved the sign bit.

D

19/09/10

start the

Start

(B)

patterns

with's Algo

lock

Partial

sum

 $M \leftarrow \text{Multiplicand}$   
(n) $Q \leftarrow \text{Multiples (m)}$  $Q \leftarrow Q(1)$  $A \leftarrow 00 \dots 0(n)$ Count  $\leftarrow m$ 

lock

P(A(d))

shift

8

3

6

8

patterns

Arithmetic Right Shift A(Q)  
(n+m)Count  $\leftarrow$  Count - 1

No

Count = 0

Yes

Result A(Q) (n+m)

LSB

MSB

Incorrect is needed

Stop

informal result

// LSB  $\rightarrow$  MSB is the process of multiplication

19/0

1. The more the blocks of 1's, the worst the performance with booth's algorithm.

E.g. Which of the following multiplier patterns gives the best performance with the booth's algo?

2 654 321 0 pos.

- (a) 0|1|11 1|1|0      // having 1 complete block.
- (b) 1|1|11 1|1|0      // partial blocks
- (c) 0|0|11 1|1|1      // 1 Complete block
- (d) 1|1|0 0|1|1|1      // 1 complete, 1 partial block

a. (b) is giving the best performance & (d) is giving the worst performance.

Add Sub shift

(a)  $2^7 - 2^1 = +126$       1 1 8

(b)  $-2^2 = -4$       0 1 2

(c)  $12^5 - 2^0 \Rightarrow 823$       1 1 6

(d)  $(+2^3 - 2^0) + (-2^8) \Rightarrow 8 - 1 - 32$       2 8  
= -25

Least performance = (d), (a), (c), (b)

(in ascending order)

Ascending order of performance of multiplier patterns wrt booth's algo.

## Booth's Algorithm

- Notation
  - Recording Process
  - Algorithm
    - ↳ features

(-ve or +ve) Sign bit 0 111 1110 +ve no.

7 6 5 4 3 2 1 0 Position (Pos)

|           |                          |                      |            |
|-----------|--------------------------|----------------------|------------|
| $\oplus$  | $1\ 1\ 1\ 1\ 1\ 1\ 1\ 0$ | $\leq 2^{k+1} - 2^k$ | 1-add      |
| $\ominus$ | $1\ 1\ 1\ 1\ 1\ 1\ 1\ 0$ | $= 2^{k+1} - 2^1$    | ition,     |
| $\ominus$ | $1\ 1\ 1\ 1\ 1\ 1\ 1\ 0$ | $2\ 2^7 - 2$         | 1-Subtract |
| $\ominus$ | $1\ 1\ 1\ 1\ 1\ 1\ 1\ 0$ | $2\ 128 - 2$         | 8-shift    |
| $\ominus$ | $1\ 1\ 1\ 1\ 1\ 1\ 1\ 0$ | $2\ 126$             | 6operatn.  |

OR

$2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 = (10110)_2$  // Conventional Method.  
 6 additions, 21 shift operations

conventional  
only

He done

$$\begin{array}{r} \cancel{3} \ 2 \ 1 \ 0 \\ \underline{0 \ 1 \ 0 \ 1} \Rightarrow 2^3 + 2^1 \Rightarrow (+5)_{10} \end{array}$$

2 additions, 3 shifts open at

$$\begin{array}{cccc|c} 3 & 2 & 1 & 0 & \text{PAS} \\ \hline \boxed{1} & \boxed{0} & \boxed{1} & \boxed{0} & \\ J_1 & J_2 & J_3 & J_4 & \\ \hline 2 & 2 & 0 & 0 & \end{array} = +2^{J_2+1} - 2^{J_3} + 2^{J_4+1} - 2^{J_1}$$

$$= 2^3 - 2^2 + 2^1 - 2^0$$

$$= (+5)_4$$

operations of additions. But this repeating pattern is convenient for converting the signed no. into decimal.

- Adding the two complement no's one subtract.

(F.G) . . . 3

(10) . . . 3

1010 + 0011

1010

1010

0000

0000

0011110

0011110 = 30

OR  
=

- We cannot use <sup>Normal</sup> multiplication or conventional multiplication for signed arithmetic; only addition & ~~subtraction~~ subtraction can be done.

exceeds

### Saturated Arithmetic -

- for this  $0111 \Rightarrow 7$

Results taken b/w  $-7 \text{ to } +7$  &  $-8 \text{ to } +8$

$0110$

$0010$

$\underline{002}$

wing

Result

- In satn arithmetic, whenever the results exceeds, it will be set to largest representable value, not correct to the overflow.

- In Non-saturated arithmetic, correct will be performed in case of overflow. Here the result is wrap-around with more bits.

- The addition & subtraction can be extended for both signed no's & unsigned no's; if signed no's are denoted in 1's complement notation. These is not possible for multiplication & division. The sign bit is getting disturbed in the process of multiplication & division.

- Booth's Algo. Is proposed for implementing fixed point signed multiplication. The booth's notation is similar to that of 1's complement notation but reduces effective no. of shift

Whatever be the notation, if addition exceeds the limit  $-127$  to  $127$  or  $-128$  to  $128$ , then

there is a overflow.

Correct it to the overflow -

In the presence of overflow, the following correction is to be taken,

- (1) Complement the sign bit of the result
- (2) Add  $+$  or  $-2^{n-1}$  to the result.

$$\begin{array}{r} 0110\ 0000 \\ 0100\ 0001 \\ \hline 1010\ 0001 \\ \text{Complement}(1) \\ 00100001 = +33 \end{array}$$

Correct  $\Rightarrow (+33) + 128$

33

128

161

Arithmetic -

(1) Saturation Arithmetic

(2) Non-Saturation (Wrap-around)

Results

$Inc = 1$        $Cin = 1$   
 $Sets = 1$        $Cout = 1$   
 Element 2's Complement  
 $s = 2^3$

$Z_3 = 1$        $\checkmark \text{Cout} = 0$

①  $\underline{\underline{1111}} \ 1111 \rightarrow (-0)$

FAC  
End Around Carry  
 $\underline{\underline{1111}} \ 1111 \rightarrow (-0)$

$\underline{\underline{1111}} \ 1110$

$\rightarrow 1$

$\underline{\underline{1111}} \ 1111 \rightarrow (-0)$

②  $\underline{\underline{0111}} \ 1111$

$\underline{\underline{0000}} \ 0000$

$\underline{\underline{1111}} \ 1111$

③  $\begin{array}{r} 6432 \ 168421 \\ 0110 \ 0000 \quad 96 \\ 0100 \ 0001 \quad 65 \\ 1010 \ 0001 \quad 161 \end{array}$

beyond the range -127

$\rightarrow +127$

If Add the no's of diff. sign,  $\therefore$  no overflow.

Ans - resulting overflow.

Q4

4 2's complement ignored FAC & its range is  
 $-128$  to  $127$

①

$\underline{\underline{1111}} \ 1111 \quad -127$

FAC  $\underline{\underline{1111}} \ 1111 \quad +127$

$\underline{\underline{1111}} \ 1110 \quad -2$

②

$\underline{\underline{0110}} \ 0000$

$\leftarrow \underline{\underline{0000}} \ 0001$  which is溢出 (overflow)

$\underline{\underline{1010}} \ 0001$

Q. Which of the following sign arithmetic results

overflow? (Signed Magnitude Notatn) ( $C_{in} = 1$ ) ( $C_{out} = 1$ ) ( $C_{in} = 2$ ) ( $C_{out} = 0$ )

|     | Signed M N            | $1's$ Complement | $2's$ Complement                                     |
|-----|-----------------------|------------------|------------------------------------------------------|
| (1) | 1111 1111 + 1111 1111 | ✓                | $x_s = y_s = 2^2$                                    |
| (2) | 0111 1111 + 1000 0000 | ?                |                                                      |
| (3) | 0110 0000 + 0100 0001 | ✓                | $y_s = z_s = 0$ $z_s = 1$ $C_{in} = 2$ $C_{out} = 0$ |
| (4) | 0000 0000 + 1111 1111 | ?                |                                                      |

(1) - Sign Position  
 $\begin{array}{r} 1 \\ \times 1111111 \\ \hline 1111111 \end{array}$  | Signed  
 $\begin{array}{r} 1111111 \\ \times 1111111 \\ \hline 1111110 \end{array}$  | Magnitude

(2)  $\begin{array}{r} 0111111 \\ + 0000000 \\ \hline 1111111 \end{array}$  ?

(3)  $\begin{array}{r} 1100000 \\ + 1000001 \\ \hline 0100001 \end{array}$  ?

(4)  $\begin{array}{r} 1111111 \\ + 0000000 \\ \hline 1111111 \end{array}$  ?

$1's$  Complement

4 bits

Unsigned Number Range : 0 to  $2^t - 1 \Rightarrow 0$  to 15

$$\begin{array}{r} 1010 \\ \text{Carry} \rightarrow 0000 \end{array}$$

0000

Signed Number Range : -7 to +7 / -8 to +7

4 bits

$$\begin{array}{r} 0100 \quad X_5 \\ 0101 \quad Y_5 \\ 1001 \quad Z_5 \end{array}$$

Add two +ve no., results -ve no., then it is overflow.

| Range | $x_s$ | $y_s$ | $z_s$ | $v$ |
|-------|-------|-------|-------|-----|
|       | 0     | 0     | 1     | 1   |
|       | 1     | 1     | 0     | 1   |

$$V(x_s y_s z_s) = x'_s y'_s z'_s + x_s y_s z'_s$$

$C_{ins} \leftarrow$  Carry into sign bit

$C_{outs} \leftarrow$  Carry out of sign bit

into sign

denote overflow

t is having opp. sign

say bits into

in & out of

are not same.

$$V(C_{ins}, C_{outs}) = C_{ins} \neq C_{outs}$$

$$= C_{ins} \oplus C_{outs}$$

## Fixed point Arithmetic

— BCDean Notation Burroughs Notation

— Burroughs Algorithm

- Arithmeticic
- Addition
  - Subtraction
  - Multiplication
  - Division

## 2's complement Notation

| unsigned |                               | signed (-8, 4, 2) |
|----------|-------------------------------|-------------------|
| (10)     | 10 10                         | -6                |
|          | 0001                          | 1                 |
| (11)     | $\leftarrow 1011 \rightarrow$ | -5                |

1 Arithmetic addition. Can exceed the Range  
It may be

overflow

Correction for overflow

4 Un-Signed Numbers - If arithmetic result carry  $\rightarrow$  overflow

Signed Number

- Signed Magnitude - Carry into sign
- Notation bit denote overflow

(i) Operands of sign & result is having opp. sign  
1's/2's complement (ii) Carry bits into  
Notation sign-position & out of  
sign position are not same.

ion

$$M = 01110 \dots 0$$

ue format

$$F = 187 = 3$$

$$F = (130)_{10}$$

$$= 10000010$$

$$128 \quad 64 \quad 32 \quad 16 \quad 8 \quad 4 \quad 2 \quad 1$$

$$1 \quad 0 \quad 0 \quad 0 \quad 0 \quad 1 \quad 0$$

tes

if denotes

$$\text{bit mask } M = 01110 \dots 0$$

483

Q. Consider, IEEE 754, Single precision format, the following 32 bit no. is interpreted as \_\_\_\_\_ value with the above format

|                                           |      |       |
|-------------------------------------------|------|-------|
| S                                         | Bias | M     |
| 32 bit No. - 1 <u>1000 0011 11000...0</u> |      |       |
| 31                                        | 30   | 23 22 |
| MSB                                       |      |       |

$$E = (18)_{10}$$

is beth

As  $E \neq 0$  to 255; this pattern denotes implicit normalized no.

$$v = (-1)^{s+1} (1 + 0.75) \times 2^{131-127}$$

$\nearrow$   
 $(1100\ldots0)$

$$= -(1.75) \times 2^4$$

$$= -1.75 \times 2^4$$

$$= -(\frac{27}{8})_{10} = (28)_{10}$$

Q. What is the hexadecimal pattern denotes 11.5 in IEEE 754 format?

0 10000000 01110...0

|   |   |   |   |     |
|---|---|---|---|-----|
| S | 1 | E | M | 4F8 |
|---|---|---|---|-----|

$$0 \times 41880000$$

$$(11.5)_{10} = (1011.1)_2$$

$$= (1.0111) \times 2^3$$

### Single Precision (32 bits) Base: 2

| S   | F       | M       | Value        |
|-----|---------|---------|--------------|
| (1) | (2)     | (23)    |              |
| 0/  | 000...0 | 000...0 | $\pm 0$      |
| 0/  | 111...0 | 000...0 | $\pm \infty$ |

Q1. IF E ≠ 0 & E ≠ 255

Ans:  $V = (-1)^S \cdot 1.M \cdot 2^{E-127}$       implicit normalized no.

floating

pred in

0/      E=0,      M≠0      fractional

on

$$V = (-1) \cdot f_2 \cdot 2^{-126}$$

string  
ying

Q2. other combinat?      Not a Number (NaN)

$$E=1 \quad M \neq 0$$

be

4 Bias is not 128, it is 127 because certain special patterns are used for exponent. therefore, Bias is reduced.

exponent  
(not a no.)

4 F is not used in fractional part

## Percentage of error

$$67 \rightarrow 3$$

$$100 \rightarrow \frac{100 \times 2}{67} \%$$

$$\frac{93}{67} \rightarrow 4.4\%$$

## IIEEE 754

→ It provides the standards for floating point no.

→ The floating point no. is stored in either a single precision (32 bits) or in double precision (64 bits).

→ It gives the provision for denoting + or - +0 and +∞ by sacrificing certain mantissa exponent pattern.

→ The floating point no. is can be denoted with either with implicit normalization or fractional form.

→ Certain combinations of mantissa exponent doesn't denote any no. e.g (NAN) (not a no.)

$$M = (111)_2$$

$(2)_8$

$$E - 8 = 1$$

base 8,

$$E = 9$$

$$(37)_{10}$$

$$\begin{array}{r} 32 \ 8 \ 4 \ 3 \ 2 \ 1 \\ 1 \ 0 \ 1 \ 0 \ 1 \end{array}$$

$$(67)_{10}$$

$$64 \ 32 \ 8 \ 4 \ 3 \ 2 \ 1$$

$$1 \ 0 \ 0 \ 0 \ 0 \ 0 \ 1$$

$$(0.001000011)_2 * 2^9$$

$$(103)_8 * 8^3$$

In the

$$E - 8 = 3$$

$$E = 11 = (1011)_2$$

$(Q37)_8$

$$M = 0010$$

$$0 \ 1011 \ 0010$$

$$\boxed{\begin{array}{|c|c|c|} \hline S & E & M \\ \hline \end{array}} = (262)_8 = (67)_{10}$$

$$8^{-1} = \frac{1}{8} = \frac{1}{2^3} = 0.5 \times 0.2^2$$

|   |      |      |            |
|---|------|------|------------|
| 0 | 1010 | 1000 |            |
| S | E.   | M    | $=(250)_8$ |

(4) It is normalized w.r.t. to base 8,

$S=0$  as the number is +ve,

$$\begin{aligned} (-1875)_{10} \times 2^5 &= (1011)_2 \times 2^5 \\ &= (1011)_2 \times 2^4 \\ &= (11)_2 \times 2^3 \\ &= (6)_8 \times 8^1 \end{aligned}$$

$$\Rightarrow E \cdot 8 = 1 \Rightarrow E = 1 = (100)_2$$

$$M = (1100)_2$$

$$0 \quad 1001 \quad 1100$$

|   |    |   |           |
|---|----|---|-----------|
| S | E. | M | $(234)_8$ |
|---|----|---|-----------|

Q. What is the pattern for 7.5 in the above register?

|   |   |   |             |
|---|---|---|-------------|
| S | E | M | $= (237)_8$ |
|---|---|---|-------------|

$$0 \quad 1001 \quad 111$$

$$(7.5)_{10} = (111.1)_2 \Rightarrow (111)_2 \times 2^3$$

$$(111)_2 \times 8^1$$

oint

(3) What is the pattern with implicit normalization?

overflow

$$V = (-1)^S (1.M)_2 \times 2^{E-\text{Bias}}$$

max

 $\approx 2^{63}$   
can't fit  
overlap  
here.

(4) Pattern if base of the system is 8.

Excess:

$$V = (-1)^S (1.M)_8 \times 8^{E-\text{Bias}}$$

then it

Excess-8 Exponent

(i) (2)  $S=0$  as the number is +ve

$$\begin{aligned} (0.9875)_{10} + 2^5 &= (0.0111)_2 \times 2^5 \\ (1.1)_2 \times (2)^{-2} \times 2^5 &= (1.11)_2 \times 2^3 \\ &= (1100)_2 \times 2^3 \end{aligned}$$

towards

(i)  $M = 1100$

Excess-8 value

(ii)  $E-8 = 3 \Rightarrow E = (11)_2$

$$\begin{array}{ccccccc} & & & & & 0 & 1011 \\ & & & & & \swarrow & \searrow \\ \text{(ii)} & \dots & | & S & | & E & | M \\ & & & & & & \dots \end{array} \Rightarrow (274)_8$$

tude fact

(iii) (3)  $S=0$  as the number is +ve,

$$(1.875)_{10} + 2^5 = (1.0001)_2 \times 2^5$$

$$(1.1)_2 \times 2^{-3} \times 2^5$$

$$(1.1)_2 \times 2^2$$

$$M = 1100$$

$$E-8 = 2 \Rightarrow E = (10)_2 = (1010)_2$$

The No. distribution of floating point representation



floating point repr. is not continuous.

If value comes between  $-2^{-65}$  to  $2^{+65}$  then it is underflow.

→ 0 cannot be covered.

→ The no. distribution is also not uniform the distance b/w the nos is very close towards zero & widely spread towards maxm value.

→ The effect of error is negligible towards zero & it is dominant towards maxm value.

Q.

S (1) E (4) M (4)

Mantissa is Normalized sign Magnitude fraction  
Exponent in Biased form

Base : 2

(1) Expression

(2) Pattern for  $= \frac{0.1011}{2^5} * 2^5$   
(Octal)

$$V_{\max} = (-1)^0 (1 - 2^{-8}) \times 2^{127-64}$$

$$V_{\max} = (1 - 2^{-8}) \times 2^{63} \approx 2^{63} (2^{63} - 2^{55})$$

truncated

2<sup>nd</sup> largest No.

What

is)

to



$$V_{\max} = (-1)^0 (1 - 2^{-7}) \times 2^{127-64}$$

$$= (1 - 2^{-7}) \times 2^{63}$$

$$V_{\max} = 2^{63} - 2^{58}$$

Difference betw 1<sup>st</sup> & 2<sup>nd</sup> largest No.

$$V_{\max} = (2^{63} - 2^{58}) - (2^{63} - 2^{55})$$

$$V_{\max} = 2^{55}$$

max

Pattern for smallest positive no.

$$0.10000000 \dots 1.0000000$$

S E M

$$V_{\min} = (-1)^0 (0.5) \times 2^{0-64}$$

$$= 2^{-65}$$

|    |      |        |         |
|----|------|--------|---------|
|    | 1    | 100011 | 1101111 |
| OX | C7DF |        |         |
| S  | E    | M      |         |

Since one of the mantissa bit is truncated  
 the repre. is having the error. What  
 was stored in the system is -(111.05)  
 Percentage of error is equal to

$$-111.75 = 0.25$$

$$100 \approx 100 \times 0.258$$

$$111.75 \approx 223.5$$

$$= \frac{250}{447}$$

$$= \frac{100}{447}$$

$$\approx 0.92\%$$

(c) If sign should be zero & all the bits  
 are 1 then the value will be max<sup>m</sup>.

(d) Pattern for max<sup>m</sup> value is

for largest (pos) No.

|   |        |        |
|---|--------|--------|
| 0 | 111111 | 111111 |
| S | E      | M      |

$$E = 127$$

$$(1 - 2^{-8})$$

sent the  
many  
signs  
for.

$$(1) \quad v = (-1)^s (M)_B \times B^{E-\text{Bias}}$$

$$\text{expression } v = (-1)^s (M)_2 \times 2^{E-64}$$

$$\text{Bias} = 64 = 2^{K-1}$$

$$2^6 = 2^{K-1}$$

$$K-1 = 6$$

$K=7$  bits

1st

no.

magnitude  
Base

| S | E | M |
|---|---|---|
|---|---|---|

$$0 \leq E \leq 2^7 - 1$$

$$(2) \quad (111.75)_{10}$$

$s = 1$  ( '-' sign ) or ( as the number  
is -ve )

$$(111.75)_{10}$$

Weight of F 1 1 0 1 1 1 1 1

$$(111.75) \quad 64 \quad 32 \quad 16 \quad 8 \quad 4 \quad 2 \quad 0.5 \quad 0.125$$

Integer portion

fraction

portion

$$③ \quad (11011111)_2 \times 2^7 \rightarrow \text{How many places } (.) \text{ is shifted}$$

$$E - 64 \div 2^7$$

$$E = (71)_{10} = (1000111)_2$$

$M = 11011111 \rightarrow$  truncated base M has

only 8 bits.

The above representation cannot represent the value '0' which is required for many computations. The IEEE 754 floating point standard has a provision for  $(+0 \& +\infty)$ .

1. IBM machine base is 16.

2. Intel machine base is 2.

Q. Consider the following 16-bit register representing the floating point no. with mantissa in normalized sign magnitude form. Exponent in excess-64 form. Base of the system is 2.

|   |                        |   |
|---|------------------------|---|
| S | E<br>(biased exponent) | M |
|---|------------------------|---|

(a) What is the expression that is giving the value 2?

(b) What is the 16-bit pattern that represent the value  $-(11.75)_{10}$ ?

(c) What is the largest value stored in the above register?

## FLOATING POINT REPRESENTATION

256 MB  
64 KB  
size of  
used for

The floating point no. contains integer portion & fractional portion. In the system, it is stored as mantissa & exponent pair. Most of the systems represent the mantissa in normalized in sign-magnitude form. The exponent is denoted in Biased form. The biased exponent is an unsigned no. which denotes signed true exponent (unsigned no.) If the Biased field is having  $k$  bits, then bias is  $2^{k-1}$

|   |      |   |
|---|------|---|
| S | E(k) | M |
|---|------|---|

Mantissa & exponent pair ( $+M, +E$ )

Value of the no. is ~~(+1)~~

$V = (-1)^S (1.M)_B * B^{E-Bias}$  (Explicit)  
where,  $B$  = Base of the system. Normalization is done (It is not stored in Memory)

The implicit normalization increases the accuracy. The value of expression is

$$V = (-1)^S (1.M)_B * B^{E-Bias}$$

↑  
"Implied bit"

~~x 143~~

~~10 -> 2<sup>13</sup> x 12~~

~~x 3  $\Rightarrow$  96 KB~~

1 Hand list is the virtual memory.

Q. Consider a 32-bit VA which refer 256 MB MM. Both are partitioned into 64-KB pages / frames. What will be the size of page table in bytes if paging is used for address translation?

$$VA = 32 \text{ bit}$$

$$\text{pages / frames} = 64 \text{ KB} \\ = 2^{16}$$

$$\text{Word offset} = 16 \text{ bits}$$

$$\text{page offset} = 16 \text{ bits}$$

$$MM = 256 \text{ MB} \\ = 2^{28} \text{ KB}$$

$$MM = 28 \text{ bits}$$

$$\text{No. of frames} = \frac{2^{28}}{2^{16}}$$

$$\text{No. of frames} = 4 \text{ KB}$$

page table contains  $2^{16}$  words

$$\text{Page table size (in bytes)} = \frac{2^{16} \times 3}{8 \cdot 2}$$

$$= 2^{16} \times 12 \times 2^{13}$$

$$= 2^{15} \times 3 \Rightarrow 96 \text{ KB}$$

Q. Consider a direct-mapped cache with 64 words & the main memory has to accommodate  $10 \times 10$  float array. Each floating point element occupies 4 words, the cache & main memory are partitioned into 16 word blocks. What will be the no. of hits if the following program segment is executed?

Let the array is stored in Row-major order

float [10][10]

for (i=0, i<10, i++)

    for (j=0, j<10, j++)

row major order  $A[i][j] = A[i][j] + 2;$

Ir  
th



Re  
ce

cache  
 cache  
 its of  
 dls are

Q. Consider a 2-way set-associative cache  
 with 4-blocks. The following memory reference  
 are made, what will be the min. no. of  
 faults with LRU strategy?

4 8 5 12 4 8 8

29, 63, R,

Assume that cache is empty initially.

block is

- (a) 4 (b) 5 (c) 6 (d) 7

LRU

$P=2$ ,  $N=4$  blocks

No. of set =  $\frac{N}{P} = \frac{4}{2} = 2$

set 0

set 1

o (K Mods)

K Mod 2 set

|       | F | F | F | F  | H | P |
|-------|---|---|---|----|---|---|
| 3 159 | 4 | 8 | 5 | 12 | 4 | 8 |
| 8     | 0 | 0 | 1 | 0  | 0 | 0 |

73 92

1 0

T

Q. Consider a 4-way set-associative cache which was empty initially. The cache contains 16 blocks and  $M$  consists of 256 blocks. Let the following MM blocks are referred.

0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155

Which of the following main memory block is not available in cache at the end of LRU replacement policy?

- (a) 3    (b) 8    (c) 129    (d) 216

$$P = 4, \quad N = 16 \text{ blocks}$$

$$\text{No. of Sets } S = 4$$

$k^{th}$  block of MN has to placed in  $(k \bmod S)$   
 $k \bmod 4$  set.

A 255

|             |   |     |   |   |   |     |     |     |
|-------------|---|-----|---|---|---|-----|-----|-----|
|             | 0 | 255 | 1 | 4 | 3 | 187 | 133 | 159 |
| $k \bmod 4$ | 0 | 3   | 1 | 0 | 3 | 0   | 1   | 3   |

|  |     |    |   |    |    |    |    |
|--|-----|----|---|----|----|----|----|
|  | 129 | 63 | 8 | 48 | 32 | 73 | 92 |
|  | 0   | 1  | 3 | 0  | 0  | 1  | 0  |

155  
3

Hit



replacement

Block present : 5, 7, 12, 13, ~~5, 7, 12, 13~~

Last Replacement: 4

Hits : 4

During MM

next

?

Direct Mapping

$$K \bmod N = K \bmod 4$$

: 4, 5, 7, 12, 13



last Recent

Hits = 3

reference.

Blocks present - 12, 13, 7, 5

Last replacement - 5

, 12, 13

The ascending order of the performance of mapping scheme w.r.t the above problem:

FIFO, Direct Mapped Cache, LRU,

### (i) Direct Mapping

$(k \bmod N)^{th}$  Block is chosen for replacement.

$k =$  recently referred from memory

- Consider a 4-block cache, the following 11 blocks are referred, which replacement strategy use the best performance?

ref : 4, 5, 7, 12, 4, 5, 13, 4, 5, 7, 12, 13

Cache

0

1

2

3

○ replaced.



Hit

○ Most Recent Reference.

FIFO: (4) (5) (7) (12) (4) (8) (3) (4), 5, 7, 12, 13

Hit Hit

4 left most side will be replaced.

TF

ma

Hits = 2

finally blocks present are 5, 7, 12, 13

Last Replacement = 4

## The Block replacement Techniques

The block replacement techniques are aimed to choose such a cache block, for replacement that may result less or no penalty (xtra time, performance get reduced due to the

### (1) FIFO

The block that spent longer time in cache is chosen for replacement. The Basis is arrival time.

The data structure used in this is queue



Blocks are entered from front & goes out from rear.

### (2) LRU

The block that is not recently used is chosen for replacement. Queue with slight adjustment on hit (recently used).

If there is not hit, LRU & FIFO are same.

It can be applied for both associate as as set-associative mapping.

Stukun

+

m

(c)

(1)

is

f

Date

B

F

(2)

ch

a

T

P

a

$$T_{avg} = \frac{3.106 + 1.58}{2} \cdot 2.9$$

$$= \frac{16.4}{2} + \frac{15.9}{2} + \frac{2.2}{2}$$

avg

$$= \frac{18.8}{2}$$

$$T_{avg} = 94 \text{ ns.}$$

$$\text{Performance} = \frac{1}{T_{avg}} \text{ words/sec.}$$

$\times 80\% +$

$$= \frac{1}{94}$$

$$\approx 10^9 \cdot \frac{1}{94} \text{ million words/sec}$$

$$\Rightarrow 10.63 \text{ million words/sec.}$$

$$\text{If } f_w = 55\%, f_r = 85\%$$

$$T_{avg} =$$

$I_1 + 20\%$

$I_{new} + T_c)$

27

For write back opn:

At any point of time, 20% cache blocks are modified and cache is full.

$$T_{avg} = f_r + T_{avg} + f_w \cdot T_{avgw}$$

$$\text{Performance} = \frac{1}{T_{avg}} \text{ words/sec}$$

$$T_{avg} = H_R \cdot T_c + (1-H_R) \underbrace{(T_B + T_c)}_{\substack{\text{dirty} \\ \text{clean}}} \cdot 80\% + 20\% \cdot (T_{Bold} + T_{Bnew} + T_c)$$

// If it is clean, overwrite on the existing Block

$$= 0.8 \cdot 10ns + 0.2 [410ns \cdot 0.8 + 810ns \cdot 0.2] \\ = 8 + 0.2[328 + 162] \\ = 8 + 0.2 \times 490 \\ = 8 + 98$$

|           |               |
|-----------|---------------|
| $T_{avg}$ | $\approx 106$ |
|-----------|---------------|

$$T_{avgw} = H_W \cdot T_c + (1-H_W) [80\% \cdot (T_B + T_c) + 20\% \cdot (T_{Bold} + T_{Bnew} + T_c)]$$

$$= 0.9 \times 10 + 0.1 \times 490 \\ = 9 + 49$$

|            |              |
|------------|--------------|
| $T_{avgw}$ | $\approx 58$ |
|------------|--------------|

which  
update?

if n. 8 (Read  
hit)

ie, then

from

or

use of  
strategy?

See

c)

10)

$$T_{avg} = H_w T_p + (1-H_w)(T_b + T_{update})$$

$$= 0.9 \times 100 \text{ ns} + (1-0.9)(400 + 100)$$

$$T_{update}, T_m = \max(T_c, T_m) \approx 100 \text{ ns}$$

$$= 10 + 0.1 \times 500$$

$$= 10 + 50$$

$$T_{avg} = 140 \text{ ns}$$

$$T_{avg} = \frac{25\% \times 90 + 75\% \times 140}{22.5} = \frac{90 \times 25}{100} + \frac{140 \times 75}{100}$$

$$= 22.5 + 105$$

$$T_{avg} = 127.5$$

$$T_{avg} = \frac{75\% \times 90 + 25\% \times 140}{100}$$

$$= \frac{75}{100} \times 90 + \frac{25}{100} \times 140 = 67.5 + 35$$

$$= 102.5 \text{ ns}$$

$$\text{Performance} \approx \frac{1}{100 \text{ ns/word}}$$

$$\approx \frac{10^9}{100} \text{ words/sec.}$$

$$= 10 \text{ million words/sec.}$$

- Q. Consider, a hypothetical processor which issues 25% of references for update. It uses two level memory hierarchy with  $T_c = 10\text{ ns}$ ,  $T_m = 100\text{ ns}$ ,  $H_R = 0.8$  (read hit),  $H_W = 0.9$  (write hit).
- If the referred word is not available, then a 4 word block is to be moved from Main to cache (either for read oprn or write oprn). What is the performance of this memory with write through strategy?

$$\text{Performance} = \frac{1}{T_{avg}} \text{ Words/sec}$$

$$T_{avg} = 25\% T_{avgr} + 75\% T_{avgw}$$

$$= f_r \times T_{avgr} + f_w \times T_{avgw}$$

$$T_{avgr} = H_R \times T_c + (1-H_R)(T_B + T_C)$$

$$T_B = 400\text{ ns}$$

$$= 0.8 \times 10 + (1-0.8)(400+10)$$

$$= 8 + 0.2 \times 410$$

$$= 8 + 82$$

$$T_{avgr} = 90\text{ ns}$$

be  
f fun  
entire

→ It gives the better performance for less no. of update.

→ In write-back update, the r/w update is done only when concerned updated block is chosen for replacement. If the block chosen for replacement is not mod. then incoming block can be simply overwriting in the cache. The Dirty bit is used to indicate the status of the block in cache.

$$T_{\text{update}} = T_B + T_C$$

↑ current update  
in case of  
clean block

↓  
New block

1. Current update always reflected in Cache memory, only.

$$T_{\text{update}} = 2T_B + T_C$$

in case of  
dirty block

$(B_{\text{old}} + B_{\text{new}})$

Copied back ←

dealt

2. In M/W, current update is reflected when update

$T_{B_{\text{old}}}$  is copied back.

(a) The  $k^{th}$  block of  $16M$  has to be placed in which cache set, if two way association is used for the entire  $2C$  cache blocks.

(a)  $K \text{ Mod } C$ .

(b)  $K \text{ Mod } 2C$ .

(c)  $2C \text{ Mod } K$ .

(d)  $C \text{ Mod } K$ .

No. of Cache blocks  $A = 2C$

$$S = \frac{N}{P} = \frac{2C}{2} = C$$

Cache block =  $\log_2 C$

$k^{th}$  block of  $16M$  has to be placed in  $K \text{ Mod } C$  set.

### Updat<sup>n</sup> Techniques

The updat<sup>n</sup> technique are used to dealt with cache coherence problem. The ~~is~~ write through updat<sup>n</sup> simultaneously update the cache memory & MM

Time  $T_{\text{update}} = \text{Max}(T_c, T_m)$

mapping)

Mapping)

ciative Mapping)

### Set-Associative Memory

1 bit +  $(2^3 \times 13)$  bytes

Q. Consider a cache memory which is applied with direct mapping. The no. TAG bits is equal to no. of blocks in cache. Each block contains  $N$  words where  $N$  is the total cache blocks. How many words are there in MM?

(a)  $N^2 \times \log N$

(b)  $N \times 2^N$

(c)  $N^2 \times 2^N$

(d) None.

### Direct Mapping

| Physical Address | Byte | Tag | Block offset | Word offset |
|------------------|------|-----|--------------|-------------|
|                  |      |     | $\log_2 N$   | $N$         |

Block offset =  $\log_2 N$

$K = \log_2 N + \log_2 N + N \Rightarrow N + \log_2 N^2$

bytes

The no. of words in MM =  $2^K$

$2^{\log_2 N^2}$

$2^N \cdot \log N^2$

$2^N \cdot N^2$  Any

(3) One TAG comp.

Size = 10 bits (Direct Mapping)

8 K TAG comp

Size = 13 bits (Associative Mapping)

8 TAG comp

Size = 13 bits (Set-Associative Mapping)

(4) Actual size of the Cache

data memory + TAG memory

$N \times \text{Tag bits}$

Direct Mapping

$$1 \text{ MB} + \left( \frac{2^{23} \times 10}{8} \right) \text{ Bytes}$$

$$1 \text{ MB} + \left( \frac{2^{23} \times 10}{2^3} \right)$$

$$1 \text{ MB} + 2^{20} \times 10$$

Phys  
Add

Associative Memory

$$1 \text{ MB} + \left( \frac{2^{23} \times 23}{8} \right) \text{ bytes}$$

Th

MM

4 GB =

$2^{30}$

$2^{30}$

$2^3$  (4 bits) per word.

$2^8$

256 Million

words

(1) Physical Address (bits)

$$= \log_2 2^{28} \Rightarrow 28 \text{ bits}$$

10 13 ( $\log_2 N$ ) 5

TAG C. Block offset Word offset

28 bits

Associate Mapping

23 bits 5

TAG Word offset

28 bits

8-way Set Associate Mapping

1M

13 10 ( $\log_2 S$ ) 5

TAG Set offset Word offset

28 bits

P = 8-way set-associative mapping

$$S = 2^{\log_2 P} = 2^3 = 8 \text{ blocks}$$

4 The higher the associativity ; More the TAG bits

8 K blocks

$$\frac{2^{13}}{2^3} = 2^{10}$$

$$CM : 1MB = 2^{20}$$

N  
O  
T  
A  
S  
F  
O  
R  
T  
A  
I  
B  
T  
Y  
O  
R  
Y

MM

$$4GB = 2^{30}$$

2<sup>30</sup>

$2^3$  (4 blocks per row)

2<sup>8</sup>

256 Million  
words

$$M = 2^{28} \text{ words}$$

82 words per block

$$= 2^{28}$$

$$N = 2^{23} \text{ blocks}$$

M = 8 million blocks (in MM)

CM :

$$1MB = 2^{20}$$

$$\frac{2^{20}}{2^2}$$

$$2^{18} \text{ words}$$

$$2^{18} \times 8K \text{ words}$$

$$N = 2^{10}$$

82 words per block

$$\frac{2^{10}}{2^2}$$

$$2^8 \text{ blocks} \Rightarrow 8K \text{ blocks}$$



In Block of  
of  $S$ ) set  
in cache

if  $P = 1 \Rightarrow$  Direct Mapping,  $S = N$   
if  $P = N \Rightarrow$  Associate Mapping,  $S = 1$

For associate  
the

and cache  
of the set

up to

physical

(as required)

- Q. Consider one-encoding 1MB cache memory.  
1GB MM, Both are divided into 82 word blocks. Each word contains 32 bits.
  - (1) What will be the no. of bits required to address the MM?
  - (2) What will be the no. of TAG bits for the following mapping techniques?
    - (a) Direct Mapping
    - (b) Associate Mapping
    - (c) 8-way set associate mapping

- (3) How many comparators & of what size required for each case?

- (4) What is the Actual size of the cache memory in  $B$  required for the above mapping?

$$CM = 1\text{MB}, MM = 1\text{GB}$$

Size of Word blocks = 32

thus Size of each word = 32 bits

## Set-Associative Mapping

In the set-Associate Mapping  $k^{\text{th}}$  Block of memory has to be placed in  $(k \bmod s)$  set where,

$s = \text{Total No. of sets in cache}$

$$s = \frac{N}{P} \quad \text{for } P\text{-way set associat} \quad (1)$$

$N = \text{total No. of Blocks in cache.}$

$P = \text{Gives the No. of sets}$

within the set, it can be placed anywhere.

The No. of TAG comparator = Size of the set  
(B)

4.  $P$  Blocks are accommodated in every set.

No. Tag is

For Normal Mapping,  $P < N$

The  $P$ -way set association divides the physical Address into 3 portions

- (1) Word offset (Size of the blocks are required)
- (2) Set offset ( $\log_2 s$  bits required)
- (3) TAG



→ We have comparators are referred to be more parallel to be also  
n/w is required. Therefore, it is very costliest.

No. of tag comparators = No. of Blocks in  
Needed for trapping Cache. 5 bit

Physical Add. having only 4m partition

5 4

|     |             |
|-----|-------------|
| TAG | Word offset |
|-----|-------------|

Physical Address = 9 bits (512 words)

→ Associative mapping requires more TAG bits.

Actual Size of Cache = Data memory + TAG memory

If No. of words in cache i.e. 64 words is fixed.

TAG Memory needed to be like this:

TAG memory =  $N \times \text{TAG bits}$ .

Where;  $N = \text{No. of blocks}$  (In this is fixed in Cache)

$\frac{32}{64}$

eg) 03:0

Direct mapping is very simple and requires less no. of tag bits.

+

0

E.g. 4-Blocks

CPU ref: 0 4 0 8 4 0 8 0  
Block fault F F F F F F F F

Mod 4

|   |   |                 |
|---|---|-----------------|
| 0 | X | X X X X X X X X |
| 1 |   |                 |
| 2 |   |                 |
| 3 |   |                 |

→ Block faults: 3/4 blocks of cache is not used.

→ It requires more time bcz it is slower.

→ It may require many block replacement.

Soln -

Any block of MM can be placed anywhere in the cache.

CPU : 0 4 0 8 4 0 8 0  
F F H F H H H H

Accessing from cache is taken less time, known as

Associative mapping.

"Any Block of MM can be placed anywhere."

It is very fastest mapping

0 0

1 4

2 8

3

35

32  
28

35

R. 92

256 128 64 32 16 8 4 2 1

0 0 1 1 1 1 0 0

8

Cache 3<sup>7</sup>

Tag info

block

Not available.

2

3

→

→

→

Ar

placed in  
main memory

placed in

be same  
in main memory is placed

it is

in the MM

in cache  
contain  
atly.

pointed by  
Q to 7.

Q. Check whether 200 word is available in cache or not.

200 word is  
9 bits for MM



Word 200 is not available in the cache  
bcz physical address tag is not matching with TAG bits of associated cache block

Q. Whether the word 121 is present in cache  
not?



121 is not available.

Q. 96



available

96  
64  
32



## ADDRESS MAPPING

1280  
9

The cache or rpm are divided into fixed size partitions (Blocks). The Address mapping decides which block of main memory (rpm) is to be placed in which cache position.

Q) 138.88

Direct mapping is the simplest one & requires less no. of TAG bits. The direct mapping needs many block replacements.

a)

Associate mapping is the fastest one but requires more hw. The set associate mapping used the advantages of direct mapping & associativity mapping. In four-way set associate mapping the cache is logically partitioned into sets with each set containing four cache blocks.

1 / 4

The one-way set association reduces to direct mapping, N-way set association reduces to associate mapping.

At any point of time, only 64 words are placed in cache memory.

| MM 512 words   |          |          |    |
|----------------|----------|----------|----|
| Direct Mapping |          | 16 Words | 0  |
|                | 64-words |          |    |
| 0              |          |          | 2  |
| 1              |          | 3        | 4  |
| 2              |          | 5        | 6  |
| 3              |          | 7        | 8  |
| 4              |          | 9        | 10 |
| 5              |          | 11       | 12 |
| 6              |          | 13       | 14 |
| 7              |          | 15       | 16 |
| N=4            |          |          |    |
|                |          |          | 81 |
|                |          | M=32     |    |

37

$$T_c = T_m \frac{137.88}{5} = 27.576$$

T<sub>target</sub> = 60°

$$60 = H_{\text{new}} + \frac{1250}{9}; (1 - H_{\text{new}}) \frac{1250}{9}$$

$$2 \frac{1250}{9} (H_{\text{new}} +$$

$$T_c = 60 = H_{\text{new}} + \frac{138.88}{5} + (1 - H_{\text{new}}) 138.88$$

$$11.104 H_{\text{new}} =$$

$$60 = 138.88 (H_{\text{new}} + 1 - H_{\text{new}})$$

$$(H_{\text{new}} + 5 - 5H_{\text{new}}) = \frac{5 \times 60}{138.88}$$

$$-4H_{\text{new}} + 5 = \frac{5 \times 60}{138.88}$$

$$H_{\text{new}} = \left( 5 - \frac{5 \times 60}{138.88} \right) / 4$$

$$H_{\text{new}} = 0.71$$

$$0.8 \rightarrow 0.08$$

$$100 \rightarrow ?$$

$$\frac{100 \times 0.08}{0.8} \Rightarrow 10\% \downarrow \text{decrease}$$

Dirce

Identify  
ceil

$$T_c = T_m \quad H = 0.8$$

blocks will

$$T_{avg} \leq 20\% \text{ of song}$$
$$\frac{20}{100} \times 50 \times 10^5$$
$$= 10 \times 10^5$$

$$T_{avgold} = \text{song}$$

avg time increased so hit ratio  
 $H_{new} = ?$  ( $< 0.8$ ) is reduced

recently

$$T_{avgnew} = T_{avgold} + 20\% \text{ of } T_{avgold}$$

$$= \text{song} + \text{long}$$

$$T_{avgnew} \geq \text{song}$$

$T_c$  &  $T_m$  will remain same

placed

$$T_{avgold} = H \times T_c + (1-H) T_m$$

faster than  
every time  
from song

$$SO = 0.8 \times \frac{T_m}{5} + (1-0.8) T_m$$
$$SO = 0.8 T_m \left( \frac{0.8}{5} + 1 - 0.8 \right)$$

$$T_m \geq SO$$

$$\frac{0.8}{5} + 1 - 0.8$$

$$= \frac{SO \times 5}{0.8 + 5 - 4} \Rightarrow \frac{250 \times 10^5}{4.8} \times 10^{-9}$$

$$= \frac{1250}{4.8} \Rightarrow 138.88$$

$$\begin{array}{r} 138.88 \\ \times 10^5 \\ \hline 138.88 \end{array}$$

→ memory management is aimed to identify the block that is to be replaced.

Memory management decides which block will be replaced.

Techniques to identify -

FIFO

LRU

Direct-Mapping

FIFO - Arrival time

LRU - The block which is not currently used that will be replaced.

Direct-Mapping -  $(R \bmod M)$

| MM                | Total Blocks<br>in Cache       |
|-------------------|--------------------------------|
| Block No.         | which is replaced              |
| $(R \bmod M) = 9$ | → This block will be replaced. |

(1) Consider a cache which is 5 times faster than MM with hit ratio 80%, the avg. access time for this cache is increased by 20% from 10ns

(a) What is the cache access time?

(b) What is the MM access time?

(c) What is the new hit ratio?

hierarchy

→ Cache used in client side memory for main addressing as well as server side memory.

refers to

→ Cache Coherence problem is dealt with  
- write through update  
- Write back update

e)

e)

2

// A Variable having same value in both main & cache memory

\* Cache Coherence - data inconsistency with main & cache memory

variable

(1) Write through update - not suitable for more or avoid at cache are update at same time write operat<sup>s</sup>. Problem is resolved bcoz no data consistency when MM &

(2) Write back update - It is postponing the write operat<sup>n</sup> for some time, main memory is not updated with cache memory.

running after 1<sup>st</sup>

memory)

= 11 )

Main memory is updated when the concerned block is to be taken out of cache then MM updated.

lity

Eg. for bulk write operat<sup>n</sup>. it is suitable  
Data consistency is rendered in this update.

8/09/10

## Cache Memory

- Smallest & fastest memory component in hierarchy and maintain Locality of reference
- Address mapping converts physical address to Cache address
  - Direct Mapping (Small cache size)
  - Associate " (Large cache size)
  - Set-Associate " (Medium n v)

Variables are of two types -

- (1) live variable → required for used in m.
- (2) Dead variable

Registers Allocation Register Check Active variable  
Dead

for ( $i=1$ ;  $i<1000$ ;  $i++$ )

$$P(i) = g(i) + r$$

Stored in cache memory for run 999 times after 1<sup>st</sup>  
time - slow (fetch from main memory)  
999 times - fast (11 v Cache - 11)

for  $i=1$ :

• Spatial locality by position

• temporal locality by execut<sup>n</sup>

E.g.

med mag  
complement  
complement

40

for -6

signed mag. - 1110

1's complement - 1001

2's " - 1010

A = 8 + 2

+ 0 = 1000 - 0

S.M. - 0000 1000

1's.c. - 0000 1111

2's.c - 10000 0000

for S.M.

 $n$ -bit  $\Rightarrow (a_{n-1} a_{n-2} \dots a_0)_2$ 

for magnitude

 $\sum_{i=0}^{n-2} a_i \cdot 2^i$  is true if  $a_{n-1} = 0$ .

weight 120

coeff.

$$[((-2^{n-1}) \cdot a_{n-1}) + \sum_{i=0}^{n-2} a_i \cdot 2^i]$$

weight for sign

bit

Range (n-bits)

$$V_{\min} = -(2^{n-1})$$

$$2^{n-1}$$

to

$$V_{\max} = (2^{n-1})$$

$$2^{n-1} - 1$$

Signed mag

1's complement

2's complement

Largest +ve no. &amp; min. value.

## Signed magnitude

denoted  
complement

$$\begin{array}{r} 1011 \\ - 0101 \\ \hline 1010 \end{array} = (-7)_{10}$$

significat  
the  
itern  
e repr.  
rent

the  
code.

notatn  
xit ( the

→ The 1's complement & 2's complement arithmetic uses the same h/w for both addition & subtract whereas separate h/w is needed for sign-magnitude arithmetic.

→ The 2's complement arithmetic is faster compared to 1's complement arithmetic bcoz it doesn't require the "scraping" to end around carry.

→ The signed magnitude & 1's complement cover the same range & have two patterns for zero, (-0 & +0) whereas 2's complement repr. has single pattern for zero bcoz of this 2's complement cover one extra magnitude, compare to the others.

E.g. for +6

|          |                |        |                  |
|----------|----------------|--------|------------------|
| wt. code | signed mag.    | - 0110 | All the pos. has |
| on w.co. | 1's complement | - 0110 | same repr. in    |
| wt. code | 2's            | - 0110 | all 3 patterns   |

## Fixed point signed repr.

The fixed point signed no's can be denoted in 1-Signed magnitude repr or its complement or 2's complement

→ All the three repr. gives the most significant bits to denote the sign, if it is 1, all the repr. declared the no. is -ve.

→ All the no's are having identical pattern & identical value in all the three repr.

→ The signed magnitude & 2's complement notation are the weighted codes whereas the 1's complement repr. is non-weighted code.

→ The 2's complement is the only notation that essence the weight to sign bit (the weight of the sign bit is -ve).

E.g. 2's complement repr.

$$\begin{array}{r} 1 \ 0 \ 1 \ 1 \ 1 \\ \downarrow \uparrow \downarrow \uparrow \downarrow \\ 1 \ 0 \ 1 \ 1 \ 1 \end{array} = (-9)_{10}$$

E.g.  
wt. 0

Non wt. co.

wt. code

→ Integer (Radix position LSB)

→ Fraction (Radix position before MSB)

Number → Fixed Point  
(Position of Radix is fixed)

Floating Point

Unsigned  
Signed

We:

Then if  
its then

Value Expr<sup>n</sup>

$(a_0, a_1, a_2, \dots, a_{n-1})_2$  → fixed point

unsigned integer

$$V = \sum_{i=0}^{n-1} a_i \times 2^i$$

at

E.g.

$$1011_2$$

$$1011_2 = (23)_{10}$$

$V \rightarrow V_{\min}$  iff  $a_i; a_i = 0$

$$V_{\min} = 0 + 2^0 + 0 \times 2^1 + \dots + 0 \times 2^{n-1} = 0$$

$V \rightarrow V_{\max}$  iff  $a_i; a_i = 1$

$$V_{\max} = 2^0 + 2^1 + 2^2 + \dots + 2^{n-1} = 2^n - 1$$

Range:  $V_{\min}$  to  $V_{\max} \Rightarrow (0 \text{ to } 2^n - 1)$

if  $n = 8$   
0 to 255

## NUMBER REPRESENTATION (Radix-2)

Range :  $V_{\min}$  to  $V_{\max}$       V-value

Value Expression

(1) If the value is in between limits then it  
is underflow and if outer the limits then  
it is overflow.

### Numbers

Fixed point

(Position of radix is fixed)

Integer

(Radix position after MSB)

1. Integer itself is a floating point no.

Floating point

E.g.  $(110.101)_2$

$(11.101)_2$

(2) Fraction (Radix position before MSB)

• 1 2 3 4

Range

Ex 3.2

Ques.

101 x 1000000

Soln.

101000000

(6)

Max<sup>m</sup> decimal value of 3-digit binary no. (7)

$$\text{E.g. } (777)_8 = 8^3 - 1 \Rightarrow (511)_{10}$$

n - digit no.

radix - r

max<sup>m</sup> decimal value  $(r^n - 1)_{10}$

digit no.

let us say, in que it requires k-bits,  
then

$$(2^k - 1)_{10} = (10^{20} - 1)_{10}$$

$\Rightarrow$

$$2^k = 10^{20}$$

$$k = 20 \log_{10} 2$$

$$k = 66$$

binary,

Full form:

If we have a

Any radix can be used as borrow in  
their subtraction like above or as example.

In decimal i.e. 10 can take as borrow.

(6)

E.g.

10 111.10

1001.01

0.11.10.00

(7)

$$(903)_{16} \rightarrow (16^2 \cdot 9 + 3 \cdot 16^0)_{10}$$

Weighted sum of the digits is always radix no.

$$\begin{array}{r} 9 \cdot 0 \cdot 0 \cdot 3 \\ \downarrow \quad \downarrow \quad \downarrow \\ (1001\ 0000\ 0000\ 0011)_2 \end{array}$$

$$Z = 12 \quad \text{No. of 0's} \Rightarrow$$

$$N = 4 \quad \text{No. of 1's}$$

$$\begin{array}{r} (32^5 \cdot 9 + 32^3 \cdot 3 + 3) \\ \downarrow \quad \downarrow \quad \downarrow 10 \\ (8) \quad (3) \quad (2) = 7 \end{array}$$

If above expression is converted into binary,  
then the no. of 1's in it is 7.

ed to  
binary

When we increasing the radix, the value  
should be increased.

of 7's  
complement

$$(5) \quad (x^2 + 2x + 3)_{10} = (9 + 6 + e)_{10}$$

$$x^2 + x - 12 = 0$$

$x \rightarrow 4$  x ? digit value & radix value  
 $3 \times$  radix is not maintained.

(2) (1) Convert the operands into octal.

(2) Perform octal subtraction.

opr1: C 012.25

(1100 0000 0001 0010. 0010 0101)<sub>2</sub>

14.00.42.2

14.00.22.112.0

opr2: 10111 001.110.101

27.16.5

X Y

Octal Subtraction

(8-1) borrow

14.00.22.112

27.16.5

138103.412

E.g. ①  $(212 \ 121 \ 112)_3 = (?)_9$

(1)

$$\begin{array}{r} \text{digit}_2 \rightarrow \\ 3 = 9 \end{array}$$

$$(212 \ 121 \ 112)$$

(2)

$$(2 \ 5 \ 5 \ 4 \ 8)_9$$

$$3^4 = 81$$

$$1^1 = 1$$

$$(12)_3 = (5)_{10} = (5)_9$$

(a)

(b)

(3) Poss.

②  $(101110011)_2 = (?)_{16}$

$$2^4 = 16$$

If there is any relation betw. the radices, we can use them like above examples.

(4)

$$\begin{array}{cccccc} & 1 & 2 & 1 & 7 & \\ & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \end{array}$$

(a) 12

098F

(5)

X

(a) 3

2m

$$\text{E.g. } (41.4)_8 = (?)_5$$

Conversion -

One radix to another radix

(1) Radix- $r$  to decimal

(2) decimal to radix- $r$

recurring  
element

(1) Radix- $r$  to decimal

$\sum w_i r^i$

recurring  
irrational

$w_i \Rightarrow r^i$  if  $d_i$  is integer

$\rightarrow r^i$  if  $d_i$  is fraction

$$\begin{array}{r} 4 \quad 1 \quad 4 \\ \downarrow \\ 8^1 \quad 8^0 \quad 8^{-1} \end{array}$$

$$8 \times 4 + 1 + 4 \times 8^{-1} = 32 + 1 + \frac{4}{8} \Rightarrow 33.5$$

(2) Decimal to Radix- $r$

divide the decimal no. (integer) successively by radix- $r$  and arrange remainders in reverse order (LIFO)

$$\begin{array}{r} 5 \sqrt{33} (6 \quad 5 \sqrt{61} \quad 5 \sqrt{10} \\ -30 \quad \rightarrow \quad 5 \rightarrow 0 \\ \hline \quad \quad \quad 1 \quad \quad \end{array}$$

$$0.5 \times 5 \Rightarrow 2.5 \Rightarrow 2$$

$$0.5 \times 5 \Rightarrow 2.5 \Rightarrow 0$$

$$(41.4)_8 = (11.3.222\ldots)_5$$

(6) Approximate number of bits needed to represent 20-digit decimal number in binary

- (a) 20 (b) 6.6 (c) 86 (d) 96

(7) What is the relat<sup>n</sup> bet<sup>n</sup> number of 1's & 0's if the following decimal expression is converted into binary

$$(16^3 + 9^2 + 3)_{10}$$

(a) 8 zeros are more than 1's

(b) 7

(c) 6

(d) none

Sol:

(3)

$$(4.2)_9 = (xy)_8$$

$$\begin{array}{r} 4 \quad 2 \\ \downarrow \quad \downarrow \\ 9^1 \quad 9^0 \end{array}$$

$$36 + 2 \Rightarrow 38$$

$$(38)_{10} = (xy+2) \quad \because x < y$$

$$\Rightarrow xy \leq 5,7$$

3 | 09 | 10

(1)  $(101011)_2 = (xyz)_3$

- x, y, z value (a) 1, 1, 1 (c) 1, 2, 1  
(b) 1, 1, 2 (d) 2, 2, 2

(2)  $(012.25)_8 = 1011100110.101_2$

- (a) 135103412 (b) 135103.205  
(c) 564411.412 (d) 564411.205

(3) Possible values of x if

$(42)_8 = (x3)_9$

- x = (a) 5, 2 (b) 1, 5  
(c) 3, 5 (d) 7, 85

(4)  $(1217)_8 =$

- (a)  $(1217)_{16}$  (b)  $(0B17)_{16}$  (c)  $(028F)_{16}$  (d)  $(2297)_{10}$

(5)  $(123)_x = (12x)_3$

x =

- (a) 3 (b) -3, 4 (c) -4, 3 (d) None

SISD - Single inst. single data stream

Eg.

(Array) SIMD -  $n \times n$  - multiple -  $n$

Cn

MISD - multiple -  $n$  - single -  $n$

C

(parallel) MIMO - multiple -  $n$  - multiple -  $n$

C

SISD - ICU, MU, 1-PFM

C2

SIMD - ICU, MU, N-PFM

PE - Processing

MISD - N-CU, MU, 1-PFM

element

MIMO - N-CU, MU, M-PFM

- Super scalar Arch contains separate processing unit for integer manipulation & fractional manipulation.

### Number System -

|     |                     |
|-----|---------------------|
| $R$ | Digit value < radix |
|-----|---------------------|

8x44



(2)

b

required

Dual core two core mirror images

C<sub>1</sub> C<sub>2</sub>



Having clock cycle 2 GHz

4 GHz



Non-dual core

More frequency more heat Hence, Dual core is effective

Classification of Computer Architecture

first Memory Architecture is proposed by Von neuman which supports the stored program concept (prog & data are to be stored prior to processing).

The Harvard arch. is similar to that of von-neuman arch., the only diff. is the placement of prog. & data is done in diff. memories

The basis of Flynn arch. is instr stream & data stream

Harvard

DM

IM CS

DM

DS

// Locality of reference or frequently required info are stored in cache memory.

// Aim of CLA (carry look ahead) -

→ reduced the time of addition



// Overlap the instruction phases



// Goal of Instr pipeline -

→ "Efficient usage of Resources"

Data transfer



of system

Co : Deals with logical design of the system

→ It consider the physical devices & their interconnect with the perspective of ~~improve~~ <sup>improve</sup> the performance

Performance of system -

1) Amount of work done in unit time  
or insta. per second (MIPS)

② floating pt. opr. per sec (FLOPS)

(Inst. set)

fixed length

addressing mode

CPI = 1

↓  
clock per

instr.

addr.

interpret

Integer fraction  
↑ ↑  
3.52

3  
Position of fractional point  
52

exponent represents memory location of fraction point.

Estimer ↑ Memory M/E → Exponent  
visiting

- Structural dependency is due to resource limitation.
- Resource duplication is used to deal with it.

Three types of dependency

- ① Data
- ② Control
- ③ Structural

We don't have any program without branch instruction.

K-clocks

| S |       | $I_1$ | $I_2$ |       | Total clocks =   |       |
|---|-------|-------|-------|-------|------------------|-------|
| E |       | $I_1$ | $I_2$ | $I_3$ | $(K+n-1)$ clocks |       |
| D | $I_1$ | $I_2$ | $I_3$ | $I_4$ | for $n$ instrucn |       |
| F | $I_1$ | $I_2$ | $I_3$ | $I_4$ | $I_F$            |       |
|   | 1     | 2     | 3     | 4     | 5                | clock |

of  
Instruct cycle is reduced.

$$T_{in} = (K+n-1) \text{ clock p}$$

$$\text{Clock p} = \text{Max (stage delay)}$$

+

Buffer overhead

### Buffer registers (PIPO registers)

resource

deal with it.



→ All data synchronously

If 100 clock is  $S = 97 + 4 \text{ clocks} \approx$   
there,  $100 \text{ clocks} \approx 3.88$

If 100 clock is  $S = 997 + 4 \approx$   
there,  $100 \text{ clocks} \approx 3.986$

out Branch

More instruction in pipeline → More the better performance.

Total clocks =  
 $(n-1) \text{ clocks}$

$\times n$  instructions  
then

$S_{ideal} = 4$

Pipeline system gives 4 times more performance than non-pipeline system.

$k = 20$   
decreased.

If the no of stages is increasing, after a particular time speedup factor is decreased i.e; performance decreases.

No. of optimal stages  
is always 6, it  
is always fixed  
for any processor.



(A) Consider 2-pipeline a & b. The pipeline a has 8-stages of uniform delay 2ns.  
The pipeline b has s=stages of

6 stages with

4ns, 1ns, 2ns, 3ns, 1ns

Even

(A) How much time is saved for 100 instr. if pipeline a is used instead of b?

(B) What will be the respective speedup factors & efficiencies

Uniform delay - ~~same time at every stage~~ <sup>uniform</sup>  
~~same time delay at every~~ <sup>stage</sup>

$$T_{clock\ p} = \max(\text{stage delay}) + \text{buffer overhead}$$

clock of pipeline

$$T_{clock\ PA} = 2\text{ ns}, T_{clock\ PB} = 4\text{ ns}$$

$$\textcircled{1} \quad n = 100 \text{ (instr.)}$$

$$T_n = (K+n-1) T_{clock\ p}$$

$$T_{nA} = (8+99) * 2\text{ ns} = 214\text{ ns}$$

$$T_{nB} = (5+99) * 4\text{ ns} = 416\text{ ns}$$

The pipeline  
by 2ns

of

Time saved if A is used instead = 202 ns  
of B (416 -)

(2)

$S_A$  = Time without Pipeline  
Time with pipeline.

2s

in instr. if

b?

up factors

Every instr. has to perform 8 phases in 2ns  
instr. cycle



uniform  
every time,  
say at every  
stage.  
overhead.

Amount of work is not changed in Time  
stages in A

$$S_A = \frac{16 \times 100}{214.25}$$

$$S_A = 7.476$$

4ns

$S_B$  =  $\frac{\text{time without pipeline}}{\text{time with pipeline}}$

$$= \frac{11 \times 100}{416.25}$$

$$4.64$$

$$S_B = 2.64$$

$S_{\text{ideal}}$  = Number of stages

1

100% efficiency

50

100% w.r.t pipeline A = 8

$$\text{w.r.t. } \eta = 11 - 2 = 7.46$$

$$2 \frac{7.46 \times 100\%}{8}$$

efficiency for A,  $\eta_A = 93.25\%$ 

100% w.r.t pipeline B = 5

$$9 - 11 = 2 = 2.64$$

$$2 \frac{2.64}{5}$$

$$= \frac{52.8}{100}$$

efficiency for pipeline B,  $\eta_B = 52.8\%$ 

Q. Consider  
each  
diff  
gives  
in

- Instr,
- Q An ~~to~~ stage pipeline is having speedup factor 10 while operating with 80% efficiency. What will be the no. of stages for the pipeline?

$$S_p = 10$$

$$\eta_p = 80\%$$

$$100\% =$$

~~$$80\% = 2/3$$~~

~~$$160\%$$~~

$$2 \frac{100\% \times 10}{2/3}$$

~~$$(10/2)$$~~

24 (b)  
clock

Sideral = No. of stages

$$\frac{100}{80} \Rightarrow \frac{100}{8} \\ = 12.5$$

213 stages

Q. Consider a 4 stage instru pipeline in which each instr. spends diff. no. of clocks at diff. no. of stages. The following table gives 4 instr. and associated required clock in the respective stages.

| Instruction    | F        | D | E | S | clocks |
|----------------|----------|---|---|---|--------|
| I <sub>1</sub> | 2<br>avg | 1 | 3 | 1 | 4      |
| I <sub>2</sub> | 1        | 2 | 2 | 1 | 6      |
| I <sub>3</sub> | 1        | 1 | 2 | 3 | 7      |
| I <sub>4</sub> | 1        | 1 | 1 | 1 | 4      |

24

for (i=1; i<2; i++)

{ I<sub>1</sub>; I<sub>2</sub>; I<sub>3</sub>; I<sub>4</sub> }

- (a) 24 (b) 28 (c) 20 (d) None

clocks required per instr. per stage is changed here.

### Thumb rule -

- Resource should not be allocated unless it freed by previous one
- No instant acquire the next before the completion of previous stage



$$\text{Here, } S_{\text{ideal}} = 4$$

$$S = \frac{\text{time without pipeline} - \text{time with pipeline}}{\text{time with pipeline}} = \frac{24 - 14}{14} = 1.71$$

4 stages

$$4 = 100\% \text{ eff.}$$

$$1.71 = \frac{1.71}{4 \times 100}$$

$$= 42.7\%$$

$$\eta_p = 43\%$$

Pipeline producing not any activity, we call it as stall.

More

be b1

for  $f_i = 1 : 2$

lated unless

$\{ I_1, I_2 \}$

before the

$I_3, I_4 \}$

8, 9, waiting

$S$  in

is pipeline

$E$  proc gear

$I_2, I_3$  is not

complete

$I_1, I_2, I_3, I_4$

$E, F, G$



shift Method

$\rightarrow$  shift

24  
14 = 171

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

14

### Dependencies -

→ The dependencies among the instr. requires re-organisation of instr. pipeline.

→ This inturn consumes extra time. Hence, the spreadup factor reduce.

→ The data dependencies are four types -

- RAW " True dependency "

- WAR " Anti "

- WAW " output "

- RAR " NO "

Instr. → T<sub>2</sub>

Read After write - RAW

If 2<sup>nd</sup> instr. has to read only after 1<sup>st</sup> instr. is read.

(i) RAW

E.g.      ADD R<sub>1</sub>, R<sub>2</sub>, R<sub>3</sub> ; R<sub>1</sub> ← (R<sub>2</sub>) + (R<sub>3</sub>)  
 ADD R<sub>1</sub>, R<sub>2</sub>, R<sub>5</sub> ; R<sub>1</sub> ← (R<sub>2</sub>) + (R<sub>5</sub>)

|                  | 1   | 2                               | 3                                     | 4              |
|------------------|-----|---------------------------------|---------------------------------------|----------------|
| I <sub>1</sub> : | ADD | R <sub>2</sub> , R <sub>3</sub> | (R <sub>2</sub> ) + (R <sub>3</sub> ) | R <sub>1</sub> |
| I <sub>2</sub> : | ADD | R <sub>1</sub> , R <sub>5</sub> | (R <sub>1</sub> ) + (R <sub>5</sub> ) | R <sub>4</sub> |

going to be

R<sub>1</sub> is available in 4<sup>th</sup> clock cycle & Instr.  
 2 is used it in 3<sup>rd</sup> of it. It's  
 used its previous value i.e; calendar dependency

## Rectification

requires

Hence,

types -

instr. is

$\hat{I}(R_3)$

$\hat{I} + (R_5)$

I

le & instr.

It's

has dependency

class 10

## Data dependency

RAN True  $D \cap R \neq \emptyset$ WAR ANTI  $R_2 \cap D \neq \emptyset$ WAW output  $R_2 \cap R \neq \emptyset$ RAR NO  $D_2 \cap D \neq \emptyset$   
It takes extra time  
than other

I) Inst

sche

Eth

(Per

WAW: 2nd instruction has to write when 1st instruction  
has to read it.  
(give like this statement)

domain : Range

SUB R<sub>1</sub>, R<sub>2</sub>, R<sub>3</sub> R<sub>1</sub>  $\leftarrow (R_2) - (R_3)$  R<sub>1</sub>, R<sub>3</sub> R<sub>1</sub> IP) storeOR R<sub>1</sub>, R<sub>4</sub>, R<sub>5</sub> R<sub>1</sub>  $\leftarrow (R_4) \vee (R_5)$  R<sub>1</sub>, R<sub>5</sub> R<sub>1</sub>

III) Oper

logical instruction takes less time.

Every stmt has 2 instrucn

Before

→ gt c

→ Most

(1) domain D - getting the operand (Read)

(2) Range R - storing the operand (Write opr)

g. ADD F

ADD R

If  $D \cap R \neq \emptyset$ , then there will be  
dependency

No penalty

Solut<sup>n</sup>s

## 2) Instruction Re-scheduling

Scheduling - Action taken in order to prevent

operations

times

Efficiency of compiler & program.

(Performance is depends upon it)

Instruction

(a)

I<sub>j</sub>

T<sub>j</sub>

T<sub>k</sub>

T<sub>2</sub>

new Range

R<sub>3</sub> R<sub>4</sub>

R<sub>5</sub> R<sub>2</sub>

If I<sub>j</sub> + t<sub>2</sub> pipeline  
Causing stalls

II) stall cycle instruction: No operation cycle introduced in both instructions.

III) Operand forwarding: Value of operand is provided to nearby stage before it is stored.

→ It consumes additional H/W

→ Most of the data dependencies are resolved

open

(Read) E.g. ADD R<sub>1</sub>, R<sub>2</sub>, R<sub>3</sub>

(Write opn) ADD R<sub>4</sub>, R<sub>1</sub>, R<sub>5</sub>

be

No penalty



(R<sub>2</sub>) + (R<sub>3</sub>) - called

as value or

result

data to B should be consumed in same clock, not going in other clock cycle.

### Control Dependency

Instruction cycle of other instruction is altered by

Sequence of execution is altered by instruction cycle of earlier one in pipeline.  $\rightarrow D$

T<sub>1</sub> B1NIT

T<sub>2</sub>

T<sub>3</sub>

1 2 3 4

| F                    | D | E      | S |  |
|----------------------|---|--------|---|--|
| I <sub>j</sub> : fin |   |        |   |  |
| F                    | D | E Work | S |  |
|                      | D | E      | S |  |
|                      |   | E      | S |  |

Valid

→ stall cycles

(A+1) clocks

No. of stages in pipeline

if Speed up factor is reduced.

only, not

No

$S_{eff}$

ideal

( $1 + \text{stall\_stall}$ )

+ stall freq.  $\times$  stall cycle

altered by

$S_{eff}$

K

( $1 + \text{stall freq.} +$ )

Stall-cycle

Instruction?  $\rightarrow$  Opcode - gives nature of instr.

we consider 8-stage instr. pipeline (F, D, A, write Back  $\rightarrow$  WB)

The instr. pipeline allows overlapping of all instr's except the branch instr. In the case of branch instr. The target is not available until the current branch instr. is completed. Let there are 80% branch instr. (Conditional & unconditional). Each stage consumes 10 ns delay.

(a) What is the avg. instr. time if there is no special consideration for this conditional branch instr.

(b) Let there are 80% instr's are conditional among the branch ones. A 30% doesn't satisfy the condtn. If there is no penalty for conditional branch instr whose condtn is met (branch is not taken). What is the

effective speedup factor?

Sol<sup>2</sup>(ii) All other except branch instr consume  
(on average) 1 clock for completion.  
for each branch instr. the target has to  
wait  $(k-1)$  clocks for scheduling & take 1-  
clock (on average) to complete.

$$K=8$$

$$T_{\text{clock}} = \text{Max stage delay} = 10\text{ns}$$

stall cycles for branch instr. =  $(k-1)$  clocks  
(waiting time)

$$2.4 \text{ clocks} \quad (b)$$

take for scheduling (waiting time)

$$T_{\text{avg}} = [80\% \cdot 1 \text{ clock} + 20\% (4 \text{ clocks} + 1 \text{ clock})]$$

$$= 1.8 \text{ clocks} \Rightarrow 18 \text{ ns}$$

OR:  $T_{\text{avg}} = (1 + \text{stall freq.} \times \text{stall cycles}) \text{ clocks}$

$$= (1 + 20\% \times 4) \text{ clocks} \quad (b)$$

$$= 1.8 \text{ clocks}$$

$$\approx 1.8 \times 10\text{ns}$$

$$T_{\text{avg}} = 18\text{ns}$$

effective speed up factor  $S = \frac{k}{1 + (\text{stall freq} * S)}$

consume  
selection

$t$  has to

take  $t -$

ions

( $k-1$ ) clocks

( $S-1$ )

4 clocks

utilizing (waiting time)

$s + 1$  clock))

(b)

branch instr. 20 ns

(decision tree)



50% 50%

un-conditional

(Branch is taken)

stalls: 4

Conditional (branch is taken)

(Condition satisfied 70%)

(Cond) not satisfied 30%

(Branch is not taken)

stalls: 0

$$T_{avg} = (1 + 20\% [50\% * 4 + 50\% (70\% * 4 + 30\% * 0)])$$

clocks

$$= (1 + 0.2 [0.5 * 4 + 0.5 (0.7 * 4)])$$

$$= (1 + 0.2 [2 + 0.5 * 2.8]) = (1 + 0.2 [2 + 1.4])$$

$$\frac{2}{68} \cdot \frac{2.8}{4} = \frac{2}{140}$$

$$= 1 + 0.2 * 3.4 = 1.68 \text{ clocks} \Rightarrow 1.68 * 20 \text{ ns} \Rightarrow 16.8 \text{ ns}$$

Here, Branch instr. is present which doesn't want penalty. So, the avg. time is given here.

$$S = K \\ 1.68$$

$$= S \\ 1.88$$

$$\boxed{S = 2.97}$$

$$\text{efficiency } \eta = \frac{1}{\text{Targ}} \rightarrow \frac{1}{16.8} \text{ ns per instr.} \\ \frac{10^9 \text{ instr./sec}}{16.8} \\ = \frac{10^3 \times 10^6 \text{ MIPS}}{16.8}$$

$$\eta = 89.5 \text{ MIPS}$$

Techniques to deal with ctrl dependencies-

(1) Delayed load

(2) Mul

The load of the branch instr. is either the next sequential one or the target instr. betw the branch & its load. Independent instrs. are scheduled since the load one

doesn't is getting delayed

is

$I_1$

$I_2$

$I_3$

$I_m$

Load  $I_2$  or  $I_3$

A. No. of instr. delayed depends upon stall cycles  
No.

B. Load is taken at end bcoz load is delayed  
i) Whether the condn. is satisfied or not, penalty  
is there or not will be



Instruction Queue

Tendencies

## (2) Multiple Pipelines

either target

The two pipeline sys. successfully resolve many data dependency. Both pipeline maintaining

2) indep-

the branch instr. in the beginning clock, &

the load one pipeline places  $I_2$  in the (next sequential) another

1(a)

places  $I_1$  (target for the next clock). Based on the cond<sup>n</sup>, one of the pipeline is allowed to continue & other one is stop. (3) Pn.

This 2 pipeline sys. is resolved

IF-THEN-ELSE statement. If free for Nested if conditions.

1 2

IF-THEN-ELSE

F  $I_1$ F  $I_2$ 

1 2

F  $I_1$ F  $I_2$ 

E

imp.

Alc

(denote

pre

110 | 12  
1) Based on control dependency  
ne is

s stop.

### (3) Prediction Techniques

for

EN ELSE

Static

Dynamic

decision when to change the prediction  
e.g. when ever previous two consecutive prediction are wrong then change pred

Branch Never takes

Fetch: Next Sequential (next clock)

Branch Always taken

Fetch: Target (next clock)

### Implementation of Dynamic Prediction

Flag : Taken = 0 (Branch is Not taken)  
(denote status of prediction)  
= 1 (Branch is Taken)



A, C = strong predict<sup>n</sup> status  
 B, D = weak predict<sup>n</sup> status

When the outcome is same & the predictor is not changed then it is strong predict<sup>n</sup>

④ Control dependency can't resolve with predict<sup>n</sup> tech.

### Structural Dependencies

The structural dependency result due to the limited availability of resources. Resource duplication is used to deal with structural dependency. The duplicate is subjected to cost and reliability constraints.

E.g. (i) In case of single port memory, fetching the instruct<sup>n</sup> cannot be overlapped with storing the result for another instruct<sup>n</sup>.

(ii) In case of single ALU unit, execution of one instruction can't be overlapped with fetch of another instruct<sup>n</sup>. If the fetch requires updating the prog. counter, ( $PC \leftarrow (PC) + k$ ).

|            | 1 | 2 | 3 | 4   | 5  |
|------------|---|---|---|-----|----|
|            | F | D | E | Mem | WB |
| predictor  |   |   |   | F   |    |
| redundancy |   |   |   | F   |    |

Single ALU  
 $(PC \leftarrow (PC) + K)$  Single port memory  
 only have one address & data bus

Due to less address & data bus, delayed in data & address sending which caused problem in efficiency. As well as due to

to the resource

ALU only one operatn is performed at a time which is also caused delayed.

$$R = R_1 R_2 = 0.89$$

| $R_1 = 0.9$    | $R_2 = 0.9$    |
|----------------|----------------|
| $R_{11} = 0.9$ | $R_{22} = 0.9$ |

now, if one module fail, other works.

for

| $R_1$          | $R_2$          |
|----------------|----------------|
| $R_{11} = 0.9$ | $R_{22} = 0.9$ |

$$R_1 = 1 - f_1$$

$$f_1 = f_{11}, f_{12}$$

$$= (1 - R_{11}) \cdot (1 - R_{12})$$

$$= 0.1 \times 0.1 = 0.01$$

$$R_1 = 0.99$$

$$R = R_1 R_2 = 0.99 \times 0.9 \approx 0.9$$

Q. Consider two instr. pipelines A & B, the pipeline having single port memory while B is having 2-port memory. Both pipelines are having equal no. of stages & allow all the instructions to be pipelined without penalty except the memory instr. In case of memory instr. if two memory operat's can't done simultaneously there will be 1-stall penalty. Let there are 90% instrutions are memory related. How many times the performance is enhanced if pipeline B is chosen instead of A.

| Pipeline     | stages | Memory      | stall freq. | seg |
|--------------|--------|-------------|-------------|-----|
| A            | K      | Single-port | 20%         | tex |
| B            | K      | two-port    | 20%         | ope |
| stall cycles | +      |             |             | mi  |
| +            |        |             |             | rec |
| 0            |        |             |             | ct  |
|              |        |             |             | use |
|              |        |             |             | (M) |

$$(\text{efficiency}) S_{\text{eff}} = S_{\text{ideal}}$$

$$\left[ \frac{(1 + \text{stall})}{(\text{freq.})} \times \frac{(1 + \text{stall})}{(\text{freq.})} \right]$$

K

$$1 + 20\% * \text{stall cycles}$$

B, the  
while B is

are having  
instn's  
upt the memory  
time

simultaneously  
let there  
be related

seen instead

$$S_A = K$$

$$1 + 20\% \times 1$$

$$S_B = K$$

$$1 + 20\% \times 0$$

$$\begin{matrix} S_B \\ S_A \end{matrix} = \frac{K}{1, 2}$$

$$S_B = 2 S_A$$

### Micro Operations

There are the elementary oprnts (atomic), which are used to move the data across registers. A group of microoperations implement a subphase of instruc'tns cycle. The seq transfer long. is used to encode the microoperations. The encoded version is called as micro program. Each micro operat'n in term required a set of simultaneous getting six (ctrl signals). The ctrl signals can be generated using h/w (-H/W ctrl unit design), memory (Microprogrammed ctrl unit design).

A program is a collectn of sequential instr.

leg



Control signal never generates anything.  
only helps to perform the signal &  
in flow of signals.

e.g. water taps in water pipelines helps to flow  
the water

Transfer Lang:  
Instruction



→ CPU program

Op 1: H/W

Op 2

Op n

Registers Transfer language  
e.g. Addition of  $R_1$  &  $R_2$  and

place the result in  $R_3$  at  
clock 2

Cond : no operat

$$T_1 : R_3 \leftarrow R_1 + R_2$$

H/W implementation -

⇒ (Getting  
Signal)

$$R_{1out} + R_{2out} \times R_{3in}$$

all  
as (Control) Signals  $T_1$

up thing  
signal &

data flow





$T_1 + T_2 T_3 : R_1 \leftarrow R_2, R_2 \leftarrow R_1 ; R_{1in}, R_{2out}$

$R_{1in}, R_{2out}$   
Both of them are done parallelly

# 4  
11 cl.

Q. Consider a basic sys. which has accumulator instr. register, prog. counter, memory address & data registers, which are connected by single bus architecture. The bus controller will decide who has to use the bus at that time.



→ fetch = get the instr into instr. registers  
It contains 8 bytes?

- Program counter
- ① place the address in MAR
  - ② get the instr. into MDR
  - ③ Place it in PR

Fetch Micro Program -

$$T_1 : \text{MAR} \leftarrow (\text{PC}) \quad (\text{Clock 1})$$

$$T_2 : \text{MDR} \leftarrow M[\text{MAR}] \quad (\text{Clock 2})$$

$$T_3 : \text{PR} \leftarrow (\text{MDR}) \quad (\text{Clock 3})$$

$$\text{PC} \leftarrow (\text{PC}) + 1 \quad (\text{5 values are op.})$$

or; R<sub>out</sub>

4 clocks are used in fetch instead of 3

or; R<sub>out</sub>

clock 4 is used to isolate Fetch & Execute  
parallelly

The execution of phase

for ADD in accumulator

The ADD R<sub>i</sub> instr. adds the content of memory locat<sup>n</sup> R<sub>i</sub> to the accumulator. By

the results are placed in accumulator.

11 decide

ime.

ADD R<sub>i</sub>

$$\text{Accum} \leftarrow M[R_i] + \text{Accum.}$$

bus  
controller

① Visit the memory to location R<sub>i</sub>.

② Get that content into MDR.

③ Add it to Accumulator.

TAR  
→ MDRout

instr.  
memory

MDR

3-10-1c

|       | opcode | set            |    |  | $\leftarrow ISZ$ |
|-------|--------|----------------|----|--|------------------|
| Fetch | ADD    | R <sub>1</sub> | IR |  | BUR<br>↓         |

 $T_1: MAR \leftarrow IR \text{ (Address)}$  $T_2: MDR \leftarrow M[MAR]$  $T_3: A \leftarrow (A) + (MDR)$ 

# Inc.

Han

To

are

&amp;

ctrl

for

it

All

Get the content of x

3-10-10

## Increment

Savie Back

## Incremental intents

$\leftarrow$  ISZ X Increment & Skip on Zero

Bun X Branch to X

— Skip the  
instruction  
result is 2

1

Opwate      Ref.

— Skip the  
instruction  
result is 2

/ Increment to memory local x

T<sub>1</sub>: MAR ← IR(~~off~~)

T<sub>2</sub> : MDR ← M[MAR]

$$T_0 : MDR \leftarrow (MDR) + 1$$

$T_4 : M[MAR] \leftarrow (MAR)$

$T_{PC} = \text{IF } ECMDR == 0 \text{ THEN } PC \leftarrow (PC) + 1$

|        |     |   |
|--------|-----|---|
| IR     | BUN | X |
| Opcode | Ref |   |

$T_1 : PC \leftarrow T_A(\text{Ref})$

## Hardware : (ontrol) Unit Design

In the h/w ctrl unit design, the ctrl signals are expressed as a sum & product expression & realised using dedicated h/w. The h/w ctrl unit offer fastest response & its suitable for real time application as it is not flexible. It can't be used for design & testing purposes. All the RISC processors employ the h/w ctrl (Pico)

(RISC)

unit design

Consider a CU (ctrl unit) which has to support two instructions  $I_1$  &  $I_2$  and uses 3-bit registers A, B, C.

The following table gives the ctrl signal request for each operation:-

| Op    | $I_1$                     | $I_2$           |
|-------|---------------------------|-----------------|
| $T_1$ | $A_{in}$ , Boot           | $A_{in}$ , Read |
| $T_2$ | $B_{in}$ , Cout           | $C_{in}$ , Boot |
| $T_3$ | $C_{in}$ , Cout, $B_{in}$ | $A_{in}$ , Read |
| $T_4$ | $A_{in}$ , Cout           | $B_{in}$ , Cout |
| $T_5$ | END                       | END             |

$$\text{Control}_x = f(I_1, I_2, T_1, T_2, T_3, T_4, T_5)$$

Every instruction has its own SOP of control signals.

Both instruction perform at clock 1

$$\text{SOP Expr} - A_{in} = T_1 * T_1 + I_2 * T_1 / T_2 + I_2 * T_3 + I_1 * T_4$$

$$\text{Hence, } A_{in} = T_1 + I_2 * T_3 + I_1 * T_4$$

$$C_{out} = T_5 + I_1 * T_3 + I_1 * T_2$$

has to  
id uses 3-bit

1 signal

1. If 1 bit opcode = 1, denotes Instr<sup>n</sup> I.  
otherwise I.



1) Max degree of parallelism  
No. of control signal simultaneously generated.

k<sub>1</sub>

I<sub>2</sub> \* T<sub>2</sub> +

$$\text{Bin} = I_1 T_2 + I_1 T_3 + I_2 T_3$$

$$f(P, Q, R) = \Sigma(3, 5, 6)$$

$$= \Sigma(3, 4, 5, 6, 7)$$

\* T<sub>2</sub>

## Microprogrammed Ctrl Unit Design

It offers flexibility. Here, the binary pattern of ctrl signals (either with encoding or without encoding) stored in a memory (ctrl memory) & b/w is used to realize the ctrl signals. Any modification requires changing the binary patterns. The b/w remain intact based on how many bits are used for each ctrl words. The u-programming can be termed as Horizontal u-programming.

or Vertical u-prog. The Horizontal u-programming consumes more bits for each ctrl word but offer maximum degree of parallelism.

In a vertical u-prog., the ctrl signals are first encoded. The generated patterns is stored in ctrl memory. It is required to decode this ctrl word before the signals are generated.

11

11

A

Dis

(1 bit / ctrl signals)

Horizontal map

• binary  
with

stored

in is

Is any

binary

Based

for each

be

examining  
horizontal

more bits

max.

Control program

ctrl word

It is generating

in order to

make ctrl

prog.

CLK

B<sub>0</sub>

-B<sub>1</sub>

COR

A<sub>in</sub>

A<sub>out</sub>

Cout

in out in out in

1 0 0 1 0

let control word

A<sub>in</sub> B<sub>out</sub>

ctrl signals  
patterns  
between

will never change while binary

are changed.

is required

The time taken to generate ctrl signal.

the

Time to access

time taken for

$$T_{cg} = T_{cm} + T_{Hw} - \text{using H/w}$$

System requires 6-ctrl signals. Hence,  
every ctrl word in COR is of 6-bit

If 1 ctrl word can generate more signals.

Adv - Max. degree of parallelism

Disadv - Lengthy ctrl word.

More no. of zero's, not efficient use of memory

If ctrl bit or ctrl signal  $\alpha$  is not open  
is performed or no signal is generated.

encoding

$$\begin{array}{l} A_{in} = 001 \\ \text{6-ctrl } A_{out} = 010 \\ \text{signals } B_{in} = 011 \\ \text{encoded } B_{out} = 100 \\ \text{in } C_{in} = 101 \\ \text{8-bits } C_{out} = 110 \end{array}$$

Vertical U-prog.



If ctrl signal can trigger only one signal.  
(generate)

$$T_{cg} = T_{cm} + T_{decoder} + T_{HW}$$

in operation  
generated.

The ctrl signals are encoded & stored & then decoded, so there is need for a decoder in vertical microprog. Hence, more time is consumed than horizontal while in horizontal microprogram the decoded ctrl signal is stored so, no external decoder is needed. Here

it/w is not changed in both microprogs.

Disadv - (1) More time required to access

(2) one word can generate only one signal

Sequencing the ctrl words - it is required to access the ctrl word in specific order, then only the instruction is implemented properly. Here the ctrl words are following the concept of node of a linked list.

Sequencing is done by PC.

- One add/mem instr is used to perform sequencing the ctrl words.



Ques - Consider a u-programmed C.I. design, which has to support 128 instrns. Each instrn on average consumes 8-ops. The system has to support 16-condns with horizontal u-prog. How many bits are required for each ctrl word & what is the size of ctrl memory (in bytes) required. The one addr. bit used is used for instr. sequencing.

4 bits repr. 16 condns.  $2^4$  bits  $\rightarrow$  1 word - 6 bits



|                | 8-word | 8 u.op.                               | 1 Nei |
|----------------|--------|---------------------------------------|-------|
| I <sub>0</sub> | 11     | 8 word x 128 instr.                   | 11    |
| I <sub>1</sub> | 21     | $2^7 \times 2^3 = 1024$ words (total) | 12    |
|                |        | Hence, 10-bit address                 | 13    |
| I <sub>0</sub> | "      | 64 control signals                    | 14    |
| Control Memory |        |                                       |       |

(i) Control word - 78 bits

(ii) Control memory -  $1024 \times 78$  bits

4-8 bits / Byte

$$= 128 \times 98$$

$$= 9984 \text{ B}$$

$$\begin{array}{r} 128 \\ \times 98 \\ \hline 1024 \\ 1152 \\ \hline 9984 \end{array}$$

design,  
instructions

8 - 110per

- flag

How -

b ctrl

memory

- ctrl

Q. Repeat the above with vertical  $\mu$ -prog

|  | Cond | 4 Control<br>signals | Next<br>Addr |
|--|------|----------------------|--------------|
|  | 4    | ④                    | 10           |

$$(i) \text{ Ctrl word} = 2^{\text{no. of bits}}$$

$$(ii) \text{ Ctrl memory} = \frac{1024 \times 2^{\text{no. of bits}}}{8 \times 2}$$

$$= 128 \times 2^{\text{no. of bits}}$$

$$= 2560 \text{ B}$$

A Neither horizontal  $\mu$ -prog., nor vertical  $\mu$ -prog. is preferred bcoz the first one requires more control memory & the 2nd acquires more " signals, ie, degree of parallelism is 4.

The Hybrid  $\mu$ -prog. is used in which the ctrl signals are partitioned based on mutually exclusive property. Some of the fields follow the horizontal  $\mu$ -prog while others follow vertical  $\mu$ -prog.

E.g. Consider 1-add r. ctrl instr. which has to support the control signals of the following behaviour.

R.L.R  
128

22

to 24

PS 6 x

- 1- either 1 or none of the 63 ctrl signals

2. At most 5 from the remaining ones, what will be the min bits required for ctrl bit?

In

G,

2

|  | Cond.            | Control field                            | Next       |
|--|------------------|------------------------------------------|------------|
|  | F <sub>1</sub> 6 | 1st sign bit<br>last in 1111111111111111 | 5 bit<br>9 |

fields

F<sub>1</sub> - either 1 or 6 (follows) H UP

F<sub>2</sub> = At most 8 from remaining ones  
(follows H UP)

ctrl field = 11 bits

H UP - 1 bit / ctrl signal.

Nc.

One-A ctrl instruction has to support 10 groups of control signals

|                |                |                |                |                |                |                |                |                |                 |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|-----------------|
| G <sub>1</sub> | G <sub>2</sub> | G <sub>3</sub> | G <sub>4</sub> | G <sub>5</sub> | G <sub>6</sub> | G <sub>7</sub> | G <sub>8</sub> | G <sub>9</sub> | G <sub>10</sub> |
| 3              | 6              | 5              | 11             | 1              | 6              | 13             | 3              | 5              | 7               |

Min<sup>m</sup> no. of bits saved for each control signal with respect to Horizontal shift reg.

Tb.

C to

J/E

m

S

word

cu

nar

$$\begin{aligned} \text{Min}^m \text{ no. of bits need in HUP} &= 3 + 6 + 8 + 11 + \\ &\quad 1 + 6 + 13 + 3 + 5 + 7 \\ &= 60 \text{ bits} \end{aligned}$$

con-

uni

what will  
ctrl bit?

In hybrid up

$G_1 \quad G_2 \quad G_3 \quad G_4 \quad G_5 \quad G_6 \quad G_7 \quad G_8 \quad G_9 \quad G_{10}$

2      3      3      4      1      3      4      2      3

totaled

min<sup>m</sup> bits are saved; for HUP = 32 bits

Max<sup>m</sup> degree of parallelism = 10

(up) ~~10 ctrl signals, each for each group.~~

Min<sup>m</sup> " " " " " 0

Nano-programmed CU design

The nano-programmed CU design uses 2+2 groups

ctrl memory (hi-ctrl memory, nano ctrl memory)

The ctrl words are sequenced in hi-ctrl

memory & generated in nano ctrl mem. It supports max<sup>m</sup> flexibility & modular pro-

gramming. The firmware based app<sup>n</sup> uses nano-pro-

grammed CU, the bootstrap prog. (BIOS) is stored in

nano-programmed CU since it uses two-level

controllable memory. It is the slowest ctrl

unit design.

20 bits

68

2-level design



bits are required to address  $H_w$  words

$$\log_2 H_w$$

$$H_m \approx \log_2 H_m$$

Control Memory Size = MCM size + NCM size

$$H_m [\log_2 H_m * H_w] + H_w * P$$

1 ctrl word accessed from  $T_m, T_n$  (Nano Ctrl M)

\* follow the Horizontal up and reduce effective control memory

$$T_{CS} = T_{ACM} + T_{NCM} + T_{HW}$$

Q. Let  $H_m = 1024$ ,  $H_w = 64$ ,  $P = 48$

$$\text{cm size} = 1024 [16] + 64 \times 48 \text{ bits}$$

$$\begin{aligned} & \quad \frac{8 \text{ bits / bytes}}{(2048 + 384) \text{ B}} \\ & = 24.32 \text{ B} \end{aligned}$$



// Using HUP for max degree of parallelism

words =   
 How many bytes are saved w.r.t. 1-  
 log  $H_m$  ctrl memory, which has to offer some  
 degree of parallelism. (Indirectly HUP)

$$m = n \times \log H_m$$

$$\text{Single level cm} = 1024 (48 + 10) \text{ bits}$$

$\rightarrow$   $n + m \text{ cm size}$

$$\text{in bytes} = \frac{1024 \times 58}{8} = 7424 \text{ B}$$

$$+ H_w \times P$$

$$= 94$$

(Nanoctrl M) // More bits are required for HUP compar  
 to NDUP

∴ HUP = 1-level design of memory

NUP = 2-level

Ascending order of flexibility of (st) unit design -

HW, HUP, VUP, Nano

If it will reversed, it repr. ~~access time~~  
ascending order of the speed of operat<sup>n</sup>.

Nano, VUP, HUP, HW.

time

(2)

### Assembly language Program Tracing

Tracing → finding the contents of  
specifying register

(3)

Time requirement (time to  
complete)

Space requirement

If behaviour of program dictate by  
key instr.

(words)

Size F

F C

(1) LD R<sub>0</sub>, R<sub>1</sub> (1m) 4 : 2 clock/word 4

(2) ADD R<sub>1</sub>, R<sub>0</sub>, R<sub>1</sub> 2 : 1 + 1 + 1 2

(3) ADD R<sub>0</sub>, R<sub>1</sub>, R<sub>0</sub> 2 : 1 + 1 + 1 2

(4) ST R<sub>0</sub> [R<sub>1</sub>], R<sub>0</sub> 4 : 2 + 1 1

ctr

Let  $R_1 = 100$

$$M[100] = 200$$

$$m[200] = 100$$

time

(1)

$$R_0 \leftarrow M(R_1 + 100)$$

zeroth

$R_1 \quad R_0$

100 100

(2)

$$R_1 \leftarrow (R_1) + (R_0)$$

tracing

$R_1 \quad R_0$

200 100

5. of

(3)

$$R_0 \leftarrow (R_1) + (R_0)$$

time to

$R_1 \quad R_0$

200 300

(4)

$$M(R_0 + 100) \leftarrow R_0$$

by

$$M(400) \leftarrow 300$$

5)

F

f. (clock)

$R_1 \quad R_0$

200 300

2 clock forward

4

1 m/p

2

1 1 1

7

2 \* 1

4

The tracing of the prog. result

$R_1$  contains 800,  $R_0$  contains 800, &

$$m(400) = 300$$

Space Req.

Let each word is occupying 32 bits  
 & the prog. is stored in a byte organised memory from the locat<sup>n</sup> 1000 onwards (decimal address). If an interrupt is occurred during the 1<sup>st</sup> ADD instr., what will be the addr. saved in the stack (take previous records)

|       |                               | Size (Words)        | Blks |
|-------|-------------------------------|---------------------|------|
|       |                               | 4                   | to   |
| 1000  | 4 B = 16 locatns              | 2                   |      |
| LD    |                               |                     |      |
| 1015  | Address of next instr. i.e; 2 |                     |      |
| ① ADD | 1016                          |                     |      |
| 1023  | 1024 is saved which           | 4                   |      |
| ② ADD | 1024                          |                     |      |
| 1031  | is to be return.              | 12 words            |      |
| ST    | 1032                          |                     |      |
|       |                               | $12 \times 4 = 48B$ |      |

How many clocks are required to complete above prog. (Time complexity)

| Size | A | F               | Total (Clocks)   | get |
|------|---|-----------------|------------------|-----|
| 4    | * | 2clock/word + 4 | $8 + 4 = 12$     |     |
| 2    | * | 1 + 2           | $2 + 2 = 4$      |     |
| 2    | * | 1 + 2           | $2 + 2 = 4$      |     |
| 4    | * | 2 + 4           | $8 + 4 = 12$     |     |
|      |   |                 | <u>32 clocks</u> |     |

After

82 bits  
organized  
words

## IO Organization

is  
2str, Wat  
stack



CPU can't connect directly to I/O because of -

- ① Speed mismatch
- ② Diff. device drivers

③ Diff. data format Sum take  
(8-bit pattern and some 7-bit)

words)

Black box is placed b/w them for their communication to each other.

Words  
= 48B



Interface  
I-module  
I-to-processor      Buffering (Input)  
I-to-processor      Latching (Output)

i) Processor gives data when it is latched or when complete.  
# take data when it is buffering.

ii) Buffering is nothing but selective amplification logic '1' signal gets amplified & logic '0' is get reduced.

$$0 \rightarrow 1 \quad '0'$$

$$8:5 \text{ to } 5 \quad '1'$$

2 clocks



After buffering logic '0' retained in '0' & '1' retained in '1'.

Logics will not change only quantity will change

11/a

### Interface ("No Error Correct")

Inte  
done

Based

on Serial 8251 USART, PCI (Programmable

Connecting (Universal syn. Asyn.) Direct Comm. Interface

device Parallel 8255 PPI (Programmable

providing connectivity peripheral Interface)

Sys

By placing the ctrl word into ctrl word register,  
the interface is programmed.

② I

Device is serial & processor is parallel. If serial interface is, it has to do additional mode conversions -

II

① Serial-to-parallel (Placed in receiver sect)

② parallel-to-serial (placed in transmitter sect)

interf

II device given data in serial & change it to parallel



TSS

II placed SISO register in receiver sect for gen S to P conversion.

① Ad

II placed PISO register in transmitter sect for P to S.

② Da

will charge

### Module : Error Control

Interface can't do error control - but mod. does

(Programmable  
mm. Interface)  
mmable  
eral Interface)

and register,

all in  
o additional  
ecives serial  
mitter serial)

ange it to

InP : if implement IO instruct  
(IO processor) (capable to deal with IO)

System has two processor.

- ① CPU processor
- ② IO processor

II IO processor deal with several module. Each module connected to several interfaces & each interface connected to several devices.



### ISSUES in IO

es seen for

er seen for

- ① Addressing (How to address IO device)
- ② Data transfer (Transfer from IO device)

72



(Addressing) A<sub>S</sub> → Partitioned into MS & DOS  
(memory space)



Sol<sup>n</sup>

Can't have same address for MS & DOS.

P

Adv - ① Memory & I/O address is distinct.

② All memory transfer modes can be used in I/O { Flexibility

1DA 8000 A ← [8000]

1DA 1000 A ← [1000]

Disadv - → I/O affect memory space

Sol<sup>n</sup> -



napped IO

Above  $SOI^n$  is named as Isolated IO/I

IO

Adv. of Above  $SOI^n$

Δ IOs doesn't effect MS

niver

disadv - Addresser are not distinct

Driven



both starts at '0'

$SOI^n$  Ia/M is used (int'l Address line)

It ctrl bit is used for IO device

in 0, 1 in memory.

It separates IO instructions

IN OUT

& IOs

X provides limited flexibility.

distinct

Program Driven

flexibility

One of the instr. checking the device status

If device is ready, it performs IO operation

technique called as program driven.

Adv - Simplicity (need not takes any h/w).

Disadv - processor is forced to wait till the device is ready i.e; inefficient usage of CPU.

### ③ Block transfer mode.

#### Burst Mode -

The Burst Mode delays the processor for longer time (System bus is returned only after entire data has been transferred).

1)

#### Cycle Stealing Mode -

The cycle stealing mode makes DMA to wait longer time (the sys. bus is returned by DMA controller after every word xfer & acquired after every instr. cycle).

#### Block Xfer Mode -

The Block xfer mode maintains the advantage of Burst mode & cycle stealing mode. If the block size is 1 word then it reduces to cycle stealing, if block covers entire data then it reduces to burst mode.

The processor performance is reduced due to DMA.

ys bus  
controller

$$\frac{1}{2} \text{ of time processor blocked is}$$

$$= \frac{T_x}{(T_x + T_y)} * 100$$

is returned where,

$T_x$  = word transfer time

$T_y$  = word preparat' time



Interrupt Driven - (Device Controlled)



Adv - efficient usage of CPU  
disadv - More complexity

DMA -

In a DMA, the processor is isolated from the data path. The DMA controller coordinate the data xfer, the bulk data can be xfered in a rapid rate using DMA, the sys bus is shared b/w processor & DMA controller in master-slave mode.

Based on when the system bus is returned, the DMA operating modes can be 1.

① Burst Mode

② Cycle stealing mode

24

Let take 2μs to place word.



System bus is not available at  $T_x + T_y$  time  
i.e., processor gets blocked.

$$\frac{2}{1} \times 10\%$$

10

CPU blocked = 20%

If sys. uses 2 system buses theoretically,  
sys. doesn't get blocked but practically  
it is having some waiting time & the  
costs are also increased.

### Issues in Interrupt driven - Implementation

Processor efficiency is increased  
but complexity is increased.

S is placeholder.

Temporary

atty time

oretically,  
practically

in the

lemental

|                                                           | CPU | Device                                                                              |
|-----------------------------------------------------------|-----|-------------------------------------------------------------------------------------|
| (1) When it should recognize                              |     | When the device shows interrupt // At any time                                      |
| (2) How                                                   |     | How to interrupt                                                                    |
| (3) What is recognized                                    |     | Change the signal level<br>(level triggered)<br>change the edge<br>(edge triggered) |
| 4. How to resolve                                         |     |                                                                                     |
| II. Interrupt is Asynchronous                             |     |                                                                                     |
| TRAP - both level & edge triggered.                       |     | current                                                                             |
| (1) After the completion of current instruction.          |     |                                                                                     |
| (2) INTR flag is used to recognize when interrupt occurs. |     |                                                                                     |
| (3) Branch to ISR (Interrupt Service Routine)             |     |                                                                                     |
| (4) Use priorities to resolve                             |     |                                                                                     |

Branch to ISR

Use Device information "Vector interrupt"

address

device itself giving the address, it is known as,

Default location - "Scalar interrupt"

Implementing the priorities -

The priorities are implemented either by using -

- ① Daisy chaining (Device chaining)
- ② Polling

Daisy chaining.

Devices are positioned such a way the highest priority is closer to processor & lowest priority is farther to processor.



due to lowest priority, starvation occurs.

Positions of priority are fixed so, it's also known as static or fixed priority.

Polling

Each device has TNTR & INTA lines. The buses are polled device by device until it receives interrupted one. By changing the order

then by

of putting dynamic position get changed.  
due to this instantaneity will reduced but  
is not economical.

### Secondary Memory

(1) Disk Memory - the magnetic disk is a random access memory. The unit of xfer is one sector. The disk sys. is associating with disks stayed together cylinder is the unit accessed in the disk system. The disk address refers the concerned sector.

| Drive offset | Surface offset | Track offset | Sector offset |
|--------------|----------------|--------------|---------------|
|--------------|----------------|--------------|---------------|

lowest priority

Sector is addressed by either logical or linear address

Surface

$\langle C, S, \text{Sector} \rangle \rightarrow$  linear address

cylinder

logical address

- which cylinder it belongs

- which surface it belongs

- a sector is in which surface

Based on the track capacity, the disk can be  
constituted using constant & linear recording cap.  
density variable

Angular Speed is going to change acc. to Velocity tracks

for inner track = less velocity

" outer " - more velocity

Concerned with variable recording density.  
Linear Velocity - It is concerned with constant recording density.

Capacity of track is changed but distance between bits is so constant

Q. Consider a disk which is having 10 equidistant tracks. The inner track diameter is 1 cm & outer track dia. is 10 cm.

- ① What is the capacity of the disk if it's using constant linear velocity
- ② Constant angular velocity

Answers - (a) 100 MB, 550 MB (c) 100 MB, 100 MB  
(b) 550 MB, 100 MB (d) 550 MB, 550 MB



capacity

changed i.e.

velocity

6-8 dB/m

8.30 - 10 - L7

R/W



The perimeter of track is varied.

R/W contains the conductor

$$T = T_{\text{seek}} + T_{\text{rotational latency}} + T_{\text{data transfer time}}$$

(fixed)

Seek time & rotational time are dominating

Semi-random access

Rotational latency - getting the sector under R/W head (Serial access)

As the perimeter is changing

No. of bits varies track to track

Constant recording density - Track capacity  
(distance is going to be changed) varies

Variable -  $\ell$  is changed

$e \leftarrow e_{\min}$  (for outer track)

$e \rightarrow e_{\max}$  (for inner track)

It is having constant track capacity