

## Computer Organization

### Architecture

Computer architecture define the need or purpose purpose of computers that to be desire.

Computer organization then realizes the computer to full fill the needs define by the corresponding architecture.

\* Adder :



$$\begin{array}{ccccccc}
 & x_0 & y_0 & c_0 & s_0 & c_0 \\
 & 0 & 0 & 0 & 0 & 0 \\
 & 0 & 0 & 1 & 1 & 0 \\
 & 0 & 1 & 0 & 1 & 0 \\
 & 0 & 1 & 1 & 0 & 1 \\
 & 1 & 0 & 1 & 0 & 1 \\
 & 1 & 1 & 0 & 1 & 1 \\
 & 1 & 1 & 1 & 1 & 1
 \end{array}$$

$$\begin{aligned}
 s_0 &= \bar{x}_0 \bar{y}_0 c_0 + \bar{x}_0 y_0 \bar{c}_0 + x_0 \bar{y}_0 \bar{c}_0 \\
 &\quad + x_0 y_0 c_0 \\
 c_0 &= \bar{x}_0 \bar{y}_0 \bar{c}_0 + x_0 \bar{y}_0 c_0 + x_0 y_0 \bar{c}_0 \\
 &\quad + x_0 y_0 c_0 \\
 &= x_0 y_0 + y_0 \bar{c}_0 + \bar{x}_0 \bar{c}_0 \\
 &= x_0 y_0 + (c_0 + x_0 + y_0)
 \end{aligned}$$

Thus, sum needs 3-gate delay and carry needs 2 gate delay.

4-bit full adder circuit (Carry Propagated Adder) :



→ Total delay  $\rightarrow d$  for 4 bit adder,  
 $\text{Carry } + 2^n$   
 $2^n \rightarrow 2^{n-1}$   
overflows  $\rightarrow 2^{n+1}$

→ For n bit adder delay will be  $(2n+1)d$ .

→ For 16 bit adder delay will be 33d,

This delay is prohibitive for high speed systems and increases with the size of adder.

The delay also creates the necessity to devise a new technique to increase the speed of operation known as

Carry - look - ahead. In this approach extra hardware is used to generate Carry ( $C_P, i > 0$ ) directly from  $C_0$ .

Let us recall that in a full adder the output  $C_P$  is related to ~~the~~ carry input  $C_P$  as follows:

$$\begin{aligned} C_{P+1} &= x_P y_P + x_i c_i + y_i c_i \\ &= x_P y_P + (x_P + y_P) c_i \\ &\equiv G_i + P_i c_i \end{aligned}$$

$$\text{where } G_i = x_P y_P \text{ and } P_i = (x_P + y_P)$$

$G_i$  is called Carry generate function since a carry is generated when  $x_P = y_P = 1$ .

$P_i$  is referred as Carry propagate function since if  $x_i$  or  $y_i$  is 1 then the input carry,  $C_i$  propagates to the next using  $G_i$  and  $P_i$ .  $C_1, C_2, C_3$  and  $C_4$  can be expressed

as follows:

$$C_1 = G_0 + P_0 C_0$$

$$C_2 = G_1 + P_1 C_1 = G_1 + P_1 + G_0 + P_0 C_0 = G_1 + P_1 G_0 + P_1 P_0 C_0$$

$$C_3 = G_2 + P_2 C_2 = G_2 + P_2 C_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$$

$$C_4 = G_3 + P_3 C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$$

The above results suggest that  $C_1, C_2, C_3$ , and  $C_4$  can be generated directly from  $C_0$ , thus these equations are called Carry-look-ahead equations, and the behavioral that implement the add-ahead add circuit (4CLC). Equations are called 4 stage carry look.



$$g_0 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$$

$$p_0 = P_3 P_2 P_1 P_0$$

A 4-CLC can be implemented as a two-level AND/OR logic circuit (FIFO level composed of AND gates and second level composed of OR gates) then needs 2 gate delays. All  $G_i, P_i$  points are generated parallelly from

$x_0 y_0$  with a need of 1 gate delay.

In particular the fan-in requirement for 4-CLC realization  $P_4 \oplus P_3 \oplus P_2 \oplus P_1 \oplus P_0 \oplus C_0$  and to generate  $C_4$  from to a 5 input OR gate  $P_5$  needed)

It must be emphasized that the realization of  $n$ -CLC requires an  $n+1$  input AND gates and an  $n+1$  input OR gate. Since the maximum fan-in offered by existing technology is  $8$ , it is not possible to build more than  $7$ -CLC. In this content 4-CLC is an excellent compromise.



{ n bit delay formula

$$nD = 2d + [n]x2d + 3d$$

$$[1d + \frac{n}{4}x2d + 3d]$$

Fan-out → max number

of output that one gate drives

Fan-in → max number of input that one gate can have.

If we have  $n$  stage C-L-C, the gate delay remains  $6d$  for  $n$  bit adder, where  $n$  could be

In particular the fan-in requirement for 4-realisation is 5 (5 input and gate to generate  $P_3 P_2 P_1 P_0 G_o$  to generate  $C_4$  from  $C_0 \oplus 5$  input OR-gate  $P_0$  sequence).

It must be emphasized that realisation of CLC requirement an  $(n+1)$  AND gate and  $(n+1)$  OR gate.

∴ The maximum fan-in offered by existing technology is 8 (TTL family), it is not possible to build more than 7 stage CLC. In this context, 4 stage CLC is an excellent compromise.

$$C_1 = G_0 + P_0 G_o$$

$$C_2 = G_1 + P_1 G_o + P_1 P_0 G_o$$

$$C_3 = G_2 + P_2 G_o + P_2 P_1 G_o + P_2 P_1 P_0 G_o$$

$$C_4 = G_3 + G_3 P_3 + G_3 P_2 P_1 + G_3 P_2 P_1 P_0 + P_3 P_2 P_1 P_0$$

$$G'_o = g_o + b_3 + b_2 P_3 + b_1 P_3 P_2 + b_0 P_3 P_2 P_1$$

$$P'_o = P_o = P_3 P_2 P_1 P_0$$

$$G_i = Y_{ii} = \text{constant}$$

$$C_1' = G_0' + P_0' C_0$$

$$= G_9 + G_2 P_3 + G_1 P_3 P_2 + G_0 P_3 P_2 P_1 + P_3 P_2 P_1 P_0 C_0$$

$$= C_9$$

$$C_2' = G_1' + G_0' P_1' + P_1' P_0' C_0$$

$$= (G_7 + G_6 P_7 + G_5 P_7 P_6 + G_4 P_7 P_6 P_5)$$

$$+ (G_3 + G_2 P_3 + G_1 P_3 P_2 + G_0 P_3 P_2 P_1) P_7 P_6 P_5 P_4$$

$$+ (P_7 P_6 P_5 P_4) (P_3 P_2 P_1 P_0) C_6$$

$$= (G_7 + G_6 P_7 + G_5 P_7 P_6 + G_4 P_7 P_6 P_5) + (G_3 P_7 P_6 P_5 P_4 +$$

$$G_2 P_7 P_6 P_5 P_4 P_3 + G_1 P_7 P_6 P_5 P_4 P_3 P_2 + G_0 P_7 P_6 P_5 P_4 P_3 P_2$$

$$+ P_7 P_6 P_5 P_4 P_3 P_2 P_1 P_0 C_6$$



out number of print in auto dev  
in Q) Find out the delay using single layer dev if

$$n_{th} = 1d + \left(\frac{n}{4}\right) \times 2d + 3d$$

$$= 1d +$$

$$g_8 = G_{11} + G_{10} P_{11} + G_9 P_{11} P_{10} + G_8 P_{11} P_{10} P_9$$

$$P_2' = P_1 P_{10} P_9 P_8$$

$$g_3 = G_{15} + G_{14} P_{15} + G_{13} P_{15} P_{14} + G_{12} P_{15} P_{14} P_9$$

$$P_3' = P_{15} P_{14} P_{13} P_{12}$$

Carry save addition

If there is a need to add more than two operands, a technique known as Carry save addition or Carry select addition is used. To see the effectiveness let us consider this example →

$$\begin{array}{r}
 44 \\
 28 \\
 32 \\
 19 \\
 \hline
 63 \\
 12 \\
 \hline
 183
 \end{array}$$

In this example 4 two digit decimal numbers are added. First the unit digits are added producing sum digit 3 & carry digit 2. Similarly the tens digit are added producing sum digit 6 and carry digit 1.

Since there is no carry propagation, these summations can be carried out in parallel to produce sum vector 63 8 carry vector 12.

When all operands are exhausted, the sum vector & shifted carry vector (1 position left shift) are added. In a conventional memory which produced final result.

and shifted manner which copy vector product are added in the conventional manner in which the final result.



b) Block cell

a) Block Diagram → sign-extended bits.



c) Non-invertive Semantic

Let  $m$  Consider the example  $Z = A + B + C + D$  where  $A, B, C,$   
and  $D$  are 3-bit 2's complement numbers, ~~where~~ the block diagram  
for this summation process sum vector's and carry vector's.

The sum vector S' and shifted carry vector C' and  
fourth operand  $D$  are applied to second CSA. The result produced  
by second CSA and were finally handle by CPA generate  $Z$ . The  
carry propagate only in the last step. So the total time required  
to add 4-operands is  $\rightarrow$

$$T(4) = 2 * (\text{CSA add time}) + (\text{CPA time})$$

In general, n operands can be added is

$$T(n) = (n-2) * (\text{CSA add time}) + (\text{CPA time})$$

The result can further improve by using CLA in the last stage replacing CPA.

Note  $\rightarrow$  A, B, C and D all are 3-bit 2's complement number. However the final result lies in the range -16 to +12, then  $Z$  requires 5 bits. These extra two bits are called Guard Bits. Thus, form first level addition. Last CPA/CLA will be of 4-bit and least significant bit does not participate in last stage addition.

Important note in size of adder, as grand bit plays an all, all, time.

No of required guard bits for n operators  $\approx \lceil \log_2 n \rceil$

The no of CSA stages can be reduced if the CSAs are organized in the form of a tree, popularly known as Wallace Tree.

The tree structure corresponding to a 6 operand summation is shown beside.



$$Z = A + B + C + D + E + F$$

Finally below:-



1101 → multiplier ( $M$ )

1011 → multiplicand ( $O$ )

1101 → PP<sub>1</sub>

1101  
100111 → PP<sub>2</sub>

0000 × ×  
—————  
100111 → PP<sub>3</sub>

1101 × × ×  
—————  
10001111 → PP<sub>4</sub>

Paper pencil method of multiplication



$m \rightarrow$  no of oper  
 $n \rightarrow$  no of bit

~~n bit adder~~

$$CPA = n + \lceil \log_2 m \rceil - 1$$

∴ no of adder in  $n$  bit no in CSA =  $n + \frac{\log m}{2}$

∴ no of adder in  $n$  bit no in CPA =  $(n + 1)^2$



## Multiplication of positive Numbers

The usual paper pencil algorithm for multiplication of integers represented in any positional number system is illustrated below for binary system, assuming two 4-bit operands.

The producing of two  $n$ -digit numbers can be accommodated in 2n digits, so a product in this example first just

8-bits

In the binary system, multiplication of the multiplicand by 1 bit of the multiplier is easy. If the multiplier bit is 1, the multiplicand is entered in the appropriate shifted position for addition with other shifted multiplicands to form the product. If the multiplier bit is 0, then 0's are entered.

It is possible to implement positive operand binary multiplication in a purely combinational two-dimensional (2-D) logic array as shown above. The main component in each cell is an AND gate in the cell determined whether or not a multiplicand bit  $m_i$  is to be added to the incoming partial product  $b_{pt}$ , based on the value of the multiplier bit  $g_j$ .

Although the above combinational multiplier is quite easy to understand but it is usually impractical to use in general-purpose computers because of its cost. There are two solution →

1. Computers don't support exclusive OR or division, i.e. they do not support hardware circuit implemented by loop program.

Q. provide hardware multiplier and divider by sequential digital circuits as follows.



This circuit performs multiplication by using simple adder of size  $n$  for  $n$  terms.

Initially,

Registers:  $A$  of size  $n$  bits set to zero.

Counter bit  $c_n$  cleaned to be 0. Register  $Q$  of size  $n$  holds the multiplicative result.

a) Register Configuration:



## Signed operation multiplication

According to multiple rule if both multiplicand and multiplier are negatives the result will be as good as both are positive.

Next let us consider the case of a positive multiplicand and a negative multiplicand, where we address a negative multiplicand to a partial product, one must extend the sign bit value of the multiplicand to the left as far as extent of the eventual product.

Let us consider the example of multiplying  $-13$  (multiplicand) with  $+11$  (multiplicand) below, whose sign extension of the multiplicand do as shown underline.

$$\begin{array}{r}
 10011 \quad (-13) \\
 01011 \quad (+11) \\
 \hline
 \end{array}$$

$$\begin{array}{r}
 \underline{\underline{1111}} \quad \underline{\underline{10011}} \\
 \underline{\underline{1111}} \quad \underline{\underline{10011}} \\
 00000000xx \\
 \underline{\underline{1110011}} \quad \underline{\underline{10011}} \\
 00000000xx \\
 \hline
 \underline{\underline{11011}} \quad \underline{\underline{10001}} \quad (-143)
 \end{array}$$

Significance of any number be positive or negative does not change on sign extension.

So here the partial products are negative, every partial product needs to be converted with sign extension to the size of 10-bit.

(Here multiplicand cannot be negative)

## Booth Algorithm

A powerful direct algorithm for signed-number multiplication is the Booth algorithm, it generates a 2n-bit product and treat positive and negative numbers uniformly.

Let us consider a multiplication operation in which a positive multiplier has a single block of 10

with at least 0 at each end for example 001110.

$(30)_{10}$  To derive the product we need to add four opportunities shifted multiplicand as in the standard process. However, this case can be seen as difference of two numbers as,

Follows:-

$$\begin{array}{r} 0100000 \quad (32)_{10} \\ - 0000010 \quad (2)_{10} \\ \hline 0011110 \quad (30)_{10} \end{array}$$

This suggests that the final product result can be achieved by one addition and one subtraction. For convenience now onward we will represent the standard scheme by writing the multiplier as 0 0 +1 +1 +1 +1 0 and the new recoding scheme by writing the multiplier as 0 +1 0 0 0 -1 0.

Note → the recoding scheme as (scan from right to left)

$$0 \rightarrow 0 \quad 0$$

Consider, 0 at the extreme right

$$0 \rightarrow 1 \Rightarrow -1$$

$$01101 \text{ equivalent } +1 0 -1 +1 -1$$

$$1 \rightarrow 0 \quad +1$$

(multiplying by -1 produces 2's complement of multiplicand)

$$1 \rightarrow 1 \quad 0$$

Normal 2 Booth multiplication scheme ( $45_{10} \times 30_{10}$ )

Booth multiplication can handle negative multipliers as well, in 2's complement obviously.

Normal method

$$\begin{array}{r}
 & & & & & 0 & 1 & 1 & 0 & 1 \\
 & & & & & 0 & 1 & 0 & 1 & 1 & 0 & (45)_{10} \\
 & & & & & 0 & 0 & 1 & 1 & 1 & 0 & (30)_{10} \\
 \hline
 & & & & & 0 & 0 & 0 & 0 & 0 & 0 & \\
 & & & & & 0 & 1 & 0 & 1 & 1 & 0 & x \\
 & & & & & 0 & 1 & 0 & 1 & 0 & 1 & xx \\
 & & & & & 0 & 1 & 0 & 1 & 1 & 0 & x \quad x \\
 & & & & & 0 & 1 & 1 & 0 & 1 & x & x \quad x \\
 & & & & & 0 & 1 & 1 & 0 & 1 & x & x \quad x \\
 & & & & & 0 & 0 & 0 & 0 & 0 & x & x \quad x \quad x \\
 & & & & & 0 & 0 & 0 & x & x & x & x \quad x \quad x \\
 \hline
 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0
 \end{array}$$

Booth Algorithm.

$$\begin{array}{r}
 & & & & & 0 & 1 & 0 & 1 & 1 & 0 & 1 \xrightarrow{\text{010010}} \\
 & & & & & 0 & 1 & 0 & 0 & 0 & -1 & 0 \\
 \hline
 & & & & & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 & & & & & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & x \\
 & & & & & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x & x \\
 & & & & & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x & x & x \\
 & & & & & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x & x & x \\
 & & & & & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & x & x & x & x \\
 & & & & & 0 & 0 & 0 & 0 & 0 & 0 & 0 & x & x & x & x & x & x \\
 \hline
 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0
 \end{array}$$

$$\begin{array}{r} 1 \ 1 \ 0 \ 1 \\ 1 \ 0 \ 1 \ 0 \end{array} \quad (\text{IB})$$

$$\begin{array}{r} 1 \ 1 \ 0 \ 1 \\ 1 \ 0 \ 1 \ 0 \end{array} \quad (-6)$$

$$\Rightarrow \begin{array}{r} 0 \ 1 \ 1 \ 0 \ 1 \\ 0 \ -1 \ +1 \ -1 \ 0 \end{array}$$

$$\begin{array}{r} 0 \ 1 \ 1 \ 0 \ 1 \\ 0 \ -1 \ +1 \ -1 \ 0 \\ \hline 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \ 0 \\ 1 \ 1 \ 1 \ 1 \ 1 \ 0 \ 0 \ 0 \ 1 \ 1 \rightarrow 2^{\text{'s complement}} \\ 0 \ 0 \ 0 \ 0 \ 1 \ 1 \ 0 \ 0 \ 0 \ 1 \ 1 \\ 1 \ 1 \ 1 \ 0 \ 0 \ 1 \ 1 \ 0 \ 0 \ 0 \ 0 \\ \hline 1 \ 1 \ 1 \ 0 \ 1 \ 1 \ 0 \ 0 \ 1 \ 1 \end{array}$$

Booth algorithm achieves two purpose:-

1 uniformly transforms multiplicand

booth positive and negative n-bit

2 ~~address~~ achieves some efficiency in the number of summands generated when the multiplier for few long booleans of 1's.

Booth speed-up technique :-

Now let us describe a multiplication speed-up technique which Guarantees that an n-bit multiplier will generate at most  $\frac{n}{2}$  summands and uniformly handled signed-operand cases.

This provides a multiplication speed-up by a factor of 2 over the worst case Booth algorithm situation. This new technique can be derived from the Booth

Technique where multiplier bit is selected on the right, this is a function of the bits to the left of the LBB portion of the product before the summand is added.

The basic idea of the speed of parallel bits  $\frac{q+1}{2}$  select one summand as a function of bit  $\frac{q-1}{2}$  to be added for bit  $P$ . Thus there will be  $\frac{n}{2}$  summands for  $n$ -bit multiplier.

Sign  $\rightarrow 11010 \circlearrowleft$  implied to add the right

extension  $\downarrow \begin{matrix} 0 & 0 \\ \downarrow & -1 \\ 1 & 0 \end{matrix}$

Multiplexer  
bit pair

multiplexed at Pth position  
on the right

multiplicand  
selected at Pth position

$\frac{q+1}{2}$  is multiplied

0 0

0 0

0 1

0 1

1 0

1 0

1 1

1 1





Integer division :-

$$\begin{array}{r}
 & 021 \\
 13 \overline{)274} & \text{(Dividend)} \\
 & 26 \\
 & \hline
 & 14 \\
 & 13 \\
 & \hline
 & 1 \quad \rightarrow \text{Quotient} \\
 & 0 \quad \rightarrow \text{Reminder}
 \end{array}$$

Decimal division

$$\begin{array}{r}
 000010101 \\
 1101 \quad | 10001\ 0010 \\
 & 1101 \\
 & \hline
 & 0010000 \\
 & 1101 \\
 & \hline
 & 0001110 \\
 & 1101 \\
 & \hline
 & 0001
 \end{array}$$

in Boolean in division

\* Restone in type division.

The important point to of trial determination of the automate in a logie circuit.

conclude is that the process Quotent digit is difficult to

The simplest circuit that implement binary division must methodically position of division with respect to the dividend and perform a subtraction. If the result of subtraction is zero or positive a Quotent bit of 1 is determine. The remainder is extended another bit of the dividend, the divisor re-position and another subtraction is perform. On the other hand, if the remainder is negative a Quotent bit of zero is determine. And dividend restone adding back. The division is reposition for the subtraction.

An n-bit positive integer divisor is loaded register M and an n-bit positive dividend is loaded into register A at the start of the operation. Register A is to 0. After the division is comp

In A. The required substractions are 0's comp arithmetic.

The extra bit position at the left end of both A and M is for the sign bit for these substractions.

\* Restoring algorithm →  
Do n times

- Shift A and Q left one binary position.
- Subtract M from A, placing the answer back in A.
- If the sign of A is 1, set  $q_0$  to 0 as add M back to A otherwise, set  $q_0$  to 1.

It is possible to improve on this algorithm by adding the need for restoring after subtraction.

Let us consider the sequential sequence of operation after subtraction operation in restoring algorithm that takes place

If A is positive, we shift it left and subtract M; that is we perform  $2A - M$ .

If A is negative, we restore it by performing  $A + M$  and then we shift it left and subtract M; that is we perform  $2A + M$ .

The  $q_0$  bit to appropriately set to 0 and 1.

# Non-Restoring Algorithm

S1: Do n times

Shift A and Q left one binary position. If sign of A is 0, subtract M from A otherwise add M to A. Next if sign of A is 0, set q<sub>0</sub> to 1 otherwise set q<sub>0</sub> to 0.

S2: If the sign of A is 1, add M to A.

Example: Restoring & Non-restoring

Initially

$$\begin{array}{r} A \ 00000 \\ - M \ 00011 \\ \hline 00001 \end{array}$$

Restoring Restoring

First cycle

$$\begin{array}{r} 11101 \\ - 1110 \\ \hline 00011 \end{array}$$

$$\begin{array}{r} 000100000 \\ - 000100000 \\ \hline 000000000 \end{array}$$

shift

$$\begin{array}{r} 000100000 \\ - 000100000 \\ \hline 000000000 \end{array}$$

Subtract

(A-M)

set q<sub>0</sub>

Restore

shift

Subtract

(A-M)

set q<sub>0</sub>

Second cycle

Third

$$\begin{array}{r} 11101 \\ - 1111 \\ \hline 00011 \end{array}$$

$$\begin{array}{r} 000100000 \\ - 000100000 \\ \hline 000000000 \end{array}$$

$$\begin{array}{r} 000100000 \\ - 000100000 \\ \hline 000000000 \end{array}$$

$$\begin{array}{r} 11101 \\ - 00001 \\ \hline 00001 \end{array}$$

shift 0 0 0 1 0      0 0 1 □  
 subtract A - M  
 zero 1 1 1 0 1  
 1 1 1 1 1  
 1 1 0 0 1 0  
 0 0 0 1 0      0 0 1 0  
 ↓                  ↓  
 Quotient      Remainder

Initially ~~M = 0 0 1 0 0~~      A      Q = 1 0 0 0 } First cycle  
 shift      0 0 0 1 1  
 subtract A - M      0 0 0 1 □  
 zero 1 1 1 0      0 0 0 0  
 1 1 1 0 0      0 0 0 □ } subtract

shift      1 1 1 0 0      0 0 0 0 □ } third cycle  
 Add (M+A)      0 0 0 1 1  
 set zero 1 1 1 1 0      0 0 0 0 □

shift      1 1 1 1 0      0 0 0 0 □ } fourth cycle  
 Add (M+A)      0 0 0 1 1  
 set zero 0 0 0 1 0      0 0 1 0      committed

Add.

$$\begin{array}{r}
 11111 \\
 00011 \\
 \hline
 00010
 \end{array}$$

} Reminder

### Floating point Number Representation

- Floating-point representation of a number has two parts
- 1) Mantissa, a signed fixed-point number (may be a fraction or an integer).
- 2) Exponent, designates the position of the decimal point.
- 3) If the mantissa  $m$  & exponent  $e$  then the two parts representation of a number obtained from multiplying  $m$  times a radix  $r$  raised to the value of  $e$ ; thus  $m \times r^e$ .
- Example: Assume a fraction representation and a radix 10. The decimal number 537.25 is represented in a register with  $m = 53725$  and  $e = 3$  and interpreted to represent the floating point number  $.53725 \times 10^3$ .
- Purpose → It increases the range of numbers that can be accommodated in a given register.

## Normalization of Floating point Number:

- A floating point number is normalized if the most significant digit of mantissa is non-zero.
- Mantissa contains the maximum possible number of significant digits.
- It doesn't have a non-zero digit.
- It is represented in floating point by all 0's in the mantissa & exponent.

## Type of Representation of Floating point Numbers:

- IEEE standard for floating point number:
- 1. IEEE Single precision format.
- 2. IEEE Double precision format.

## IEEE Standards for Floating point Numbers:

### 1. IEEE Single precision format (32 bits)

- Developed by IEEE.
- Standard describes both the representation & the way in which the four basic arithmetic operations are to be performed.
- The sign of the number is given in the 1st bit, followed by a representation for the exponent of 8 bits & last 23 bits representation the mantissa.

> Instead of the signed exponent, E, the value actually stored in the exponent field is an unsigned integer.  
 $E' = E + 127$ . This is called the excess-127 format.

>  $E'$  is in the range  $0 \leq E' \leq 255$ . The end values of this range - 0 and 255, are used to represent special values.

> Therefore, the range of  $E'$  for normal value is  $1 \leq E' \leq 254$ .

> This means that the actual exponent, E, is in the range  $-126 \leq E \leq 127$ .



\* Un-normalized value:

|          |                     |                           |
|----------|---------------------|---------------------------|
| 0        | 10001000            | 0010110...                |
| ↑        | ↑                   | ↑                         |
| Sign bit | Excess-127 exponent | 23-bit mantissa fraction. |

value representation =  $+0.0010110\ldots \times 2^9$

\* Normalized value:

|              |                     |           |
|--------------|---------------------|-----------|
| 0            | 10000101            | 0110...   |
| ↑            | ↓                   | ↓         |
| Signed n-bit | Excess-127 exponent | Mantissa; |

$$\text{value representation} = 1.0110 \times 2^6$$

The 32-bit standard representation is called a single precision word because it occupies a single 32-bit word.

Scale factor range:  $2^{-126}$  to  $2^{+127}$

The 23-bit mantissa provides approximately the same precision as a 7-digit decimal value.

A computer must provide at least single precision representation to conform to the IEEE Standard.

### IEEE Double precision format (64 bits)

To provide more precision and range for floating point numbers, IEEE specifies a double precision format.

The double precision format has increased exponent & mantissa range.

The sign of the number is given in the first bit followed by a representation for the exponent of 11 bits & last 52 represent the mantissa.

The 11-bits  $1 \leq E' \leq 2046$  excess -1023 exponent  $E'$  has the range for normal values with 0 and 2047 used to indicate special values, as before. Thus the actual exponent  $E$  is in range  $-1022 \leq E \leq 1023$ , providing scale factor of  $2^{-1022}$  to  $2^{+1023}$ .

Thus 53-bit mantissa provides a precision equivalent to about 16 decimal digit.



### Arithmetic operation on floating point Numbers

#### \* Add/Subtract Rule:

- 1) choose the number with the smaller exponent and shift the mantissa right by a number of steps equal to the difference in exponent.
- 2) set the exponent of the result equal to the large exponent.
- 3) Perform addition/subtraction on the mantissa and determine the sign of result.
- 4) Normalize the resulting value if necessary.

#### \* Multiply rule:

- ① Add the exponents & subtract 127.
- ② multiply the mantissas and determine the sign of the result.
- ③ Normalize the resulting values if necessary.

## Divide Rules :-

- 1) Subtract the exponent of divisor from the exponent of dividend & add 127.
- 2) Divide the mantissas and determine the sign of the result.
- 3) Divide the mantissa of dividend by the mantissa of divisor ~~to~~ and determine the sign the result.
- 4) Normalize the resulting value.

## Methods of Truncation

- 1) Chopping
  - 2) von Neumann Rounding
  - 3) Rounding.
- Chopping :- way to remove the guard bits and mantissa in retained bits.
- Suppose no changes
- Example :- Suppose we want to ~~to~~ truncate a fraction from  $Spx$  to fixed bits by this method. All fractions in the range  $0.b_1 b_2 b_3 \dots 00$  to  $0.b_1 b_2 b_3 \dots 111$  are truncated to  $0.b_1 b_2 b_3$ . The error in the 3-bit result ranges from 0 to 0.000111.
- \* The result of chopping is a biased approximation because the errors are not symmetric about 0.

## \* Von-Neumann Rounding :-

- \* The excess bits are removed and the LSB of the retained bits is set 1 irrespective of the bits which are removed.
- \* Error Range  $\rightarrow$  b/w -1 and +1 In the LSB position of the retained bits.
- \* It is an unbiased approximation.

## Rounding :-

- \* It achieves the closest approximation to the number being truncated and is an unbiased technique.
- \* Procedure  $\rightarrow$  A '1' is added to the LSB position of the bit to be retained if there is a '1' in MSB position of bits being removed. Thus  $0.b_1 b_2 b_3 1 \dots$  is rounded to  $0.b_1 b_2 b_3 + 0.001$  and  $0.b_1 b_2 b_3 0 \dots$  is rounded to  $0.b_1 b_2 b_3$ .
- \* Error Range  $\rightarrow$  The error range is approximately  $\pm \frac{1}{2}$  in the LSB position of the retained bits.

$$\begin{array}{r} 0. b_1 b_2 b_3 \\ + 0.001 \\ \hline 0. b_1 b_2 b_3 \end{array}$$

## Cache Memory

Analysis of a large number of typical programs has shown that the references to memory at any given interval of time tend to be confined with a few localized areas in memory.

The phenomenon is known as the property of Locality of Reference.

- 1) Temporal locality of reference.
- 2) Spatial locality of reference.

Temporal Locality :-

The current instruction which is being fetched may be need again soon. Ex → loop.

Spatial Locality :- The adjacent instruction to current instruction may be need soon. Ex → every program

In view of those two properties - while accessing the main memory for any instruction, instead of fetching just one instruction from main memory, several consecutive instructions are fetched together and stored in cache memory.

Cache read policy :-

Read through policy.  
Read back policy.

Cache write policy :-

Write through policy.  
Write back policy.

Read Through: If the addressed location is not in cache the content of the addressed location is forwarded to the cache as well as sent to the CPU for execution and remaining consecutive locations are copied to cache along with execution by CPU.

Read back: The entire block containing the address location transferred from main memory to cache and then the addressed location is forwarded to CPU for execution.

Write through: Of the content of the ~~address~~ address location needs to be changed during execution, the cache location is changed with the new value as well as the corresponding main memory location also update accordingly.

Write Back: Of the content of the ~~address~~ address location needs to be changed during execution the cache content is updated only and marked for the change. Finally at the termination of for replacement need the entire cache page is copied back to corresponding main memory and the same addressed location pages that in many cases the same page in the system cache are need is repeatedly update during the last change to be update the main memory.

The basic characteristic of cache memory is the fast access time. Therefore, very little or no time must be wasted. Therefore, very little or no time must be wasted when searching for word in the cache. The transfers of data from main memory to cache memory

Q5: reflexed to add a mapping procedure.

Three type of mapping procedures are of prob  
when considering the organization of cache memory:

- 1) Direct mapping
- 2) Associative - mapping

3) Set - Associative mapping



Cache performance → measured in terms of Hit Ratio

### Cache Hit

If the required word is found in cache.

$$\text{Hit Ratio} = \frac{\text{Hits}}{\text{Hit} + \text{miss}}$$

$$= \frac{\text{no of hits}}{\text{Total no of CPU reference}}$$

### Cache miss

If the required word is not found in cache.

$$\text{Miss Ratio} = \frac{\text{miss}}{\text{Hits} + \text{miss}}$$

$$= \frac{\text{no of miss}}{\text{Total no of CPU reference.}}$$

Cache Access Time: → Time required to access word from the cache.

miss Penalty: → Cache miss time penalty  
The time required to fetch the required block from main memory

Average Access Time of CPU =  $\text{Hit Ratio} \times \text{Cache Access Time} + (1 - \text{Hit Ratio}) \times \text{Main Memory Access Time}$

$$= h \times T_c + (1-h) \times T_m.$$

## Formulas

$$\rightarrow \text{Hit Ratio} = \frac{\text{no. of hits}}{\text{Total no. of CPU reference}}$$

(h)

$$\rightarrow \text{Miss Ratio} = \frac{\text{No. of misses}}{\text{Total no. C.P.U. reference}}$$

Average Access Time of CPU:  $h \times T_{ct} + (1-h) \times T_m$

$$\rightarrow \text{Block no.} = \frac{\text{Size of MM}}{\text{Size of Block}}$$

$$\rightarrow \text{Page no.} = \frac{\text{Size of CM}}{\text{Size of page}}$$

$$\rightarrow \text{Set no.} = \frac{\text{Page no.}}{(\text{no. of pages included in one set})}$$

Bus → A bus refers to a communication system that transfer data b/w components inside a computer or b/w computer parts. It's like a highway that allows different parts of a computer to communicate with each other.



# Cache Mapping and 916 type



Fully Addressed  
↓  
Direct

K-way set address  
↓

## Direct Mapping ↗

### Rule

$K \text{ mod } n$   
↓  
Block no  
no of line.

| Cache          | B <sub>0</sub> | B <sub>4</sub> | B <sub>8</sub>  |
|----------------|----------------|----------------|-----------------|
| L <sub>1</sub> | B <sub>1</sub> | B <sub>5</sub> | B <sub>9</sub>  |
| L <sub>2</sub> | B <sub>2</sub> | B <sub>6</sub> | B <sub>10</sub> |
| B <sub>0</sub> | B <sub>3</sub> | B <sub>7</sub> | B <sub>11</sub> |

16 words

0 mod 4

1 mod 4.

line range size = Block size

2 mod 4.

$$\frac{16}{4} = 4$$

0 - 101 3 bit =  $2^3 = 8$

0 - 3 F 4 bit = 4

000, 111, 10



128 words → ④

$\frac{4}{4} = 32$  words

|   |   |
|---|---|
| 5 | 2 |
|---|---|

Block no → Block

offset (Size)

Q) Consider a direct mapped cache of size 32 KB with block size 32 bytes. The CPU generates 32 bits address. The no of bit required for cache indexing and tag bits maps.



$$32 \text{ byte} = \log_2 32 = 5$$



Line = Block size

32 KB

$K \rightarrow 2^{10}$  byte  
 $M \rightarrow 2^{20}$  byte  
 $G \rightarrow 2^{30}$  byte  
 $T \rightarrow 2^{40}$  byte

Advantage and disadvantage of Direct method:



Conflict miss -

space available but miss mapping.

- $k \bmod n$
- $1 \bmod 4$
- $5 \bmod 4$
- $0 \bmod 4$
- $3 \bmod 4$

## Fully associative mapping



disadvantage  $\rightarrow$  (i) Comparison  
(ii) Tag value increase.

## Associative memory



a memory unit accessed by content is called an  
Associative memory.

$\rightarrow$  accessed simultaneously & in parallel.  
↓  
data content (basis of)

- no address give associative memory
- capable of finding an empty unused location.
- (read) → the content of word  
Part of word.
- parallel barrels,
- locate all words which match the specific pattern
- Expensive.





### Internal organization of cell $c_{pj}$



K-way set associative

$$\frac{128}{4} = 32$$

$$\frac{16}{4} = 4$$

(2 level set)

4 way set  
8 way

|       |                                |       |
|-------|--------------------------------|-------|
| $L_0$ | $B_0, B_1, B_2, \dots, B_{30}$ | $S_0$ |
| $L_1$ | $B_0, B_2, B_4, \dots, B_{30}$ | $S_0$ |
| $L_2$ | $B_1, B_3, B_5, \dots, B_{31}$ | $S_1$ |
| $L_3$ | $B_2, B_3, B_5, \dots, B_{31}$ | $S_1$ |

$$= \frac{\text{NO of line}}{K} = \frac{4}{2} = 2.$$

|       |                 |
|-------|-----------------|
| $B_0$ | 0 1 2 3         |
| $B_1$ | 4 5 6 7         |
| $B_2$ | 8 9 10 11       |
| $B_3$ | 12 13 14 15     |
| $B_4$ | 16 17 18 19 20  |
|       | 1               |
|       | 1               |
|       | 124 125 126 127 |

Direct  
 $K \bmod n$   
 $0 \bmod 2$   
 → associative



A 4-way set associative cache memory with a capacity of 16 KB is built using block size of 8 words. The size of physical space word length is 32 bits. The no of bits for the tag field?

$$9 \text{ GB} = 4 \times 2^{30} = 2^{32} \quad 1 \text{ word} = 8 \times 4 \text{ Byte}$$



16 KB



$$\frac{16 \text{ KB}}{2^{32 \text{ bits}}} = \frac{2^{10 \text{ bits}}}{2^{32 \text{ bits}}} = 2^{-22}$$

$$2^{\frac{-22}{2}} = 2^7$$



MM  
4 kB

1 byte = 8 bit

## Booth algorithm

5)

$$110011(-13) \times 110110 (-10)$$

$$\begin{array}{r}
 110011 \\
 \times 110110 \\
 \hline
 000000000000 \\
 1111001101 \\
 000000000000 \\
 000110011xx \\
 11001101x \\
 000000000xx \\
 \hline
 111000000000 \\
 \end{array}$$



$M \rightarrow$  Multiplier  
 $Q \rightarrow$  multiplier  
 $a_0 \rightarrow 0$   
 $A \rightarrow P$   $n \rightarrow$  nof bit

\* How to transfer data from Bus to Bus?

Address

Data

Control

\* Database  $\rightarrow 2^n \rightarrow$  nof bits

Cycle time  $\rightarrow$



Access time  $\rightarrow$

The time memory takes complete one operation (R/W) is the access time.

The minimum time must be double b/w two consecutive subtraction in CPU. Obvious!

Depending on the RAM chosen access time fixed  
Stone depends on cycle time

the derive

Writing (20-25%) and presentation (5%) = 25%

- ~~short~~ ~~middle~~ ~~middle~~

- ~~Wissen und~~ ~~Welt~~ ~~Wissen~~ ~~Wissen~~ ~~Wissen~~ ~~Wissen~~ ~~Wissen~~ ~~Wissen~~

WT  
Pawpaw

2000 (1) 1000

• Usability → user-centered

لهم انت السلام السلام السلام السلام السلام

2. 10. 1965. 100% 100% 100% 100%

at at Visit home Chaperone get

13. *sisterhood* as term ~~against the self~~

stop script window

the black bladder will smell so strongly.

the next seven weeks of your time with us.

Q. Statement with question mark

## Computer Instruction Set

An Instruction  
and a sequence of instruction  
in general, an Instruction have two components →  
Manipulate the stored data  
constitute a programme. In

a) ~~output~~ op-code

b) Operand / Address field.

The op-code tell specifies how data could ~~manipulate~~  
manipulate.

The data items may reside within a CPU or in  
main memory. The purpose operand field is to indicate  
the source ~~destination~~, and on destination of data.

When operation require data from 1 or  
2 addresses as store the result on some one address  
the operand tell will have to contain more than one address

Depending the number of require address  
an instruction can have different format

(1) → Contain

3 address : ADD R<sub>0</sub>, R<sub>1</sub>, R<sub>2</sub> (R<sub>0</sub>) + (R<sub>1</sub>) → (R<sub>2</sub>)

2 address : ADD Acc, R<sub>0</sub> (Acc) ← (Acc) + R<sub>0</sub>

1 address : ADD R<sub>0</sub> ↳ accumulator

ADD

Stack Oriented architecture.

0 - address

some

all instructions stored in main memory, optimized and

at define. So that instruction size is

have performing capability.

also

obviously CPU core architecture has considerable influence on specific instruction format.

Hypothetical Instruction set →

LDA addn

(Acc) ← (addr)

JTA addn

(Acc) ← (addr)

ADD addn

~~(Acc) + (addn)~~

AND addn

(Acc) ~~(Acc) & (addn)~~

CMA

(Acc) ← ~~(Acc)~~

INC

(Acc) ← ((Acc) + ~~(1)~~)

Jmp

address

unconditional branch address

CPU termination.

HLT

address

to each of code. This pattern must be assigned  
op-code encoding.

The simplest way to carry out op-code  
encoding to assigned fixed length of binary pattern  
to each op-code.

A k-bit binary pattern can be repeated  
 $2^k$  times to define op-code. The concept goes more logic code

Encoding.

The op-code of this hypothetical subtraction  
set can be decoded using a 3-to-8 decoder.



(More easily) → (001)

More easily available time

→ result of op

depends on