

After mSE

Page No. 30 Date 10/10

### Division Algorithm (Restoring Algorithm)

M - Divisor

A - Dividend

count - n

- 1 step - Load 2's complement of divisor into m
- load dividend into A , a register's .
- dividend should be  $2^n$  bit positive number.

- 2 step - Shift A, a left 1 bit position

- 3 step -  $A \leftarrow A - M$  Assign

- 4 step - a) If result is non-negative msb of A = 0.  
then 1 is assign to  $Q_0$

- b) If result is negative i.e. msb of A = 1 at the time  $Q_0$  must be assign 0. And restore previous values of A.

- 5 step - repeate 2-4 as many as bit a in Q.

- 6 step - remainder will be in A and Quotient in Q.

Q. 23 figure .

Do the following three steps n times :-

1. Shift A & Q left 1 bit position .
2. subtract M from A and place the answer back in A.
3. If the sign of A is 1 , set  $q_0$  to 0 and add M back to A , otherwise , set  $q_0$  to 1 .

eg.

11) 1000

11  
10

Initially

0 0 0 0 0      1 0 0 0

0 0 0 1 1

shift

0 0 0 0 1      0 0 0 □

Subtract

1 1 1 0 1

Set q0 restore

Q 1 1 1 0

restore

0 0 1 1 0      0 0 0 1 0

1st cycle

2nd cycle

3rd cycle

fourth

cycle

0 0 0 1 0      0 0 0 0 0

0 0 1 0 0      0 0 0 0 0

0 1 1 1 0      0 0 0 0 1

0 0 0 0 1      0 0 0 0 1

0 0 0 1 0      0 0 0 1 0

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

1 1

0 0 0 0 1      0 0 1 0 1

0 0 0 1 0      0 0 1 0 1

1 1 1 0 1      0 0 1 0 1

Q 1 1 1 1      0 0 1 0 1

</div



Page No. \_\_\_\_\_  
Date \_\_\_\_\_

shift      0 0 1 1 1      0 0 0 1      } fifth cycle  
 subtract    1 1 1 0 1      0 0 0 1 1  
 set qo     0 0 0 0 0      0 0 0 1 1 ↑  
 0 0 0 0 0      0 0 0 1 1  
 Reminder      Quotient

$$\text{eg. } 7 \div (-3) \quad (-2) \quad (1)$$

initial      0 0 0 0      1 1 1      }  
 shift      1 1 0 1      1 1      } 1st.  
 subtract    0 0 1 1      1 1      }  
 set qo     0 1 0 0      1 1 1 ↑  
 shift      0 1 0 0 1      1 1 1      } 2nd.  
 subtract    0 0 0 1      1 1 1      }  
 set qo     1 0 1 0 0      1 1 1 0 ↑  
 shift      1 0 1 0 1      1 0 1      } 3rd.  
 subtract    1 1 0 1      1 0 1      }  
 set qo     0 1 1 0      1 0 1 1 ↑

$$\begin{array}{r} 0110 \\ +1001 \\ \hline 1010 \end{array}$$

Restoring Division. (1 - sub) (0 - add) (shift, sub)

2)  $7 \div 3$       0 1 1 1      0 1 1      0 0 1 1      0 1 1  
 Initially.      3      7      1 1 1      0 0 1 1  
 shift      0 0 0 0 A      1 1 1      1 1 1      0 0 1 1  
 subtract    0 0 0 1      1 1 1      1 1 1      0 0 1 1  
 set qo     msb 1 1 1 0      1 1 1      1 1 1      0 0 1 1  
 restore    0 0 1 1 original      1 1 1      1 1 1      0 0 1 1  
 0 0 0 1      1 1 1      1 1 1      0 0 1 1  
 1 1 0      1 1 1      1 1 1      0 0 1 1  
 1st cycle      0 0 1 1      1 1 1      1 1 1      0 0 1 1  
 shift      0 0 1 1      1 0 1      1 0 1      1 0 1  
 subtract    1 1 0 1      1 0 1      1 0 1      1 0 1  
 set qo     0 0 0 0      1 0 1      1 0 1      1 0 1  
 2nd cycle.      1 0 1      1 0 1      1 0 1      1 0 1  
 shift      0 0 0 1      1 0 1      0 1 1      0 1 1  
 subtract    0 0 0 1      0 1 1      0 1 1      0 1 1  
 set qo     msb 1 0 1 0      0 1 1      0 1 1      0 1 1  
 restore    1 1      0 1 1      0 1 1      0 1 1  
 3rd cycle.      0 0 0 1      0 1 1      0 1 1      0 1 1  
 0 1 0      0 1 1      0 1 1      0 1 1  
 reminder      0 1 0      0 1 1      0 1 1  
 quotient

Note - There is no any fixed division for signed bit number.

$$A = S_A \cdot F_A \cdot M_A$$

$$B = S_B \cdot F_B \cdot M_B$$

$$(A + B)$$

1. If  $S_A = S_B = 0$  i.e both are positive numbers. Now we can add  $M_A$  and  $M_B$  when both biased exponents are same or equal.

$$F_A = 128 + 10$$

$$F_B = 128 + 9$$

$F_A - F_B = 1$  then Mantissa should be shifted by 1

Hence shift  $M_A$

The  $M_A$  we can write as  $010 \cdot 1010$

$F_A$  you can write as  $10001001$ .

The alternative of this can be :  $F_A = 128 + 10$ .

$M_B$  can be written as :  $00 \cdot 100010 \dots 000$ .

$$F_B = 10001010$$

2. We can shift any one  $M_A$  or  $M_B$  but its difference between the exponents is large then so many bit shifting is required.

Key :- losing LSB side is better than MSB side. Hence always made lower exponent convert to larger exponent.

$$\text{eg. } A = 414C0000H \quad B = 40F80000H$$

$$A = 0100\ 0001\ 0100\ 1100\ 0000 \quad B = 0100\ 0000\ 1110\ 1011\ 0000$$

| $F_A$    |                                   | $F_B$            |                           |          |                  |
|----------|-----------------------------------|------------------|---------------------------|----------|------------------|
| 0        | 10000010                          | 1001100000000000 | 0                         | 10000001 | 1101011000000000 |
| sign 130 | $M_A = 01 \cdot 10011 \dots 0000$ | sign 129         | $M_B = 01 \cdot 11010000$ |          |                  |

Shift  $M_B$  by 1 bit to weight.  $M_B = 00 \cdot 1110$

$M_B + M_A$ .

$$00 \cdot 1110 + 00 \cdot 10000 = MR = 10 \cdot 10000$$

$$+ 00 \cdot 10011$$

$$10 \cdot 10000$$

$$1.010000 \times 2^1 \text{ normalized form.}$$

normal we shift 1 that is add to get FR.

$$FR = FA + X$$

$$= 10000010 + 1$$

$$FR = \underline{10000011}$$

$$4 \cdot 10 \cdot 0$$

$$4 \times 10^3 + 1 \times 10^2 + 10 \times 10^{-1} \\ ax10^0$$

$$0 | 10000011 | 01000000 \dots 0$$

41A0 0000H  $\rightarrow$  convert it into decimal = 16800

$$A = 0 | 10000010 | 10011000000000000000 \dots 0$$

decimal is 12.75  $12.05625 \cdot 10^{-1101000} = 1.50703125$

$$B = 0 | 10000001 | 11010110000000000000 \dots 0$$

Decimal is 7.25

$$1 \cdot 1101011 = 1.8183254338$$

Decimal = 7.2733025432.

$$12.05625 + 7.2733025432 = 19.3095525432$$

A Decimal value = 12.75  $8 = 7.34875$

$$A + B = 20.09375 \text{ (binary)}$$

$$10100 \cdot 00000$$

Normalized form =  $1 \cdot 010000000 \times 2^9$

Exponent =  $4 + 127 = 131$  which is  $10000011$

Mantissa =  $010000000$

$$0 | 1000011 | 01000000000000000000 \dots 0$$

$$\underline{41A0 0000H} \checkmark$$

Same.

0111 10001

0011  
1100  
1101Page No.  
Date

eg.  $7 \div 3$

$m = 00011$

Dividend = 0111

Initially 0 0 0 0 0 1 1 1

shift 0 0 0 1 1 1 0 0

subtract 1 1 0 1

set qo to 0 1 1 0 0

restore 1 1 1 1 1 1 0 0

shift 0 0 0 1 1 1 0 0

subtract 1 1 0 1

shift 0 0 1 1 1 1 0 0

subtract 1 1 0 1

shift 0 0 1 1 1 1 0 0

subtract 1 1 0 1

shift 0 0 0 0 0 1 1 0 0

subtract 1 1 1 1

shift 0 0 1 1 1 1 0 0

subtract 0 1 1 1

shift 1 1 0 1

add 0 1 0 0

subtract 1 1 1 1

shift 0 1 1 1

add 1 1 1 0

$17 \div 5$

0101  
1010  
11011Page No.  
Date 08 10 24

## Non restoring Division.

stage 1: Do the following two steps n-times.

I. If the sign of A is 0, shift A and Q left one bit position and subtract M from A ; otherwise shift A and Q and add M to A.

II. Now, if the sign of A is 0 set qo to 1 ; otherwise set qo to 0.

stage 2: If the sign of A is 1 add M to A .

Stage 2 is needed to leave the proper positive remainder in A. after the number of sequence of stage 1.

eg.  $17 \div 5$  (non restoring) . (0-sub) (1-add)

Initially

000000 1001

000101

000001 0001 □

1<sup>st</sup> cycle

shift 000001 0001 □

subtract 111011

set qo 011100 0001 □

shift 111000 0010 □

add 000101

set qo 011101 0010 □

2<sup>nd</sup> cycle

shift 111010 0100 □

add 000101

set qo 011111 0100 □

3<sup>rd</sup> cycle

shift 111110 1000 □

add 000101

set qo 000011 1000 □

4<sup>th</sup> cycle

- IV)  $F = \text{all } 0\text{'s}$  and  $M = \text{all } 0\text{'s}$  (it represents a value 0)  
 V) If  $F = \text{all } 0\text{'s}$  but  $M \neq \text{all } 0\text{'s}$  (it gives a non zero value)  
 so represented number is closest to zero and denormalized  
 number.

maximum number representation in single bit precession

|   |   |    |
|---|---|----|
| 1 | 8 | 23 |
| 0 | 8 | 10 |

minimum number

$$0 \quad 00000001 \quad 0 \dots \dots \dots 0$$

$-1^3 \times 1 \cdot M \times 2^{127}$

- The Biased values is 1023 (double precision)

|    |        |         |
|----|--------|---------|
| 1b | 11 bit | 52 bits |
|----|--------|---------|

If you consider 11 bit you can represent 2048 as maximum  
2 are reserved so  $2048 - 2 = 2046 \div 2 = 1023$  is your  
biased.

- i) If  $E = \text{all } 1's$  (2047) all M = all 0's (0 decimal)

Tt represent's infinity value

- II) If  $F = \text{all } 1\text{'s}$   $M \neq \text{all } 0\text{'s}$  (ie  $m \neq \text{nonzero}$ ), it represents not a number.

- III) If  $R = \text{all } 0's$     $M = \text{all } 0's$    represented value is zero.

- IV) If  $F = \text{all } 0's$   $M \neq \text{all } 0's$ , it is represented as value closest to zero which is a denormalized no.

- II) If  $E$  lies between 1 and 2046 (i.e  $1 \leq E \leq 2046$ ) then  
the number represented is  $(-1)^E \times (1 \cdot m) \times 2^{E! - 2023}$

$$\text{Maximum} - 1 \cdot 1 \dots \times 2^{1023} \Rightarrow \pm 1 \cdot 8 \times 10^{308} \text{ cf decimal separator}$$

$$\text{min} - 1 \cdot 0000 \dots \times 2^{-1022} \Rightarrow \pm 2.2 \times 10^{-308}$$

$$\begin{array}{r} 0.6275 \\ \times 2 \\ \hline 1.2750 \end{array}$$

4008  
0.828125

### Exercist

- i) Represent 29.6875 in double precision IEEE 754 format.

11505 · 1011

$$1.11011011 \times 2^4 \rightarrow \text{normalized form}.$$

- Convert CO46B0000000000 H into decimal.

1100 0000 0100 0110 1011 0000 0000 0000 0000 0000  
0000 0000 0000 0000 0000 0000

sign

$$\text{sign} = 1$$
$$\text{exponent} = 1028 - 1023 = 5$$

$$\text{Mantissa} = 0 \times 1 + 1 \times \frac{1}{2} + 1 \times \frac{1}{4} + 0 \times \frac{1}{8} + 1 \times \frac{1}{16} + 0 \times \frac{1}{32} + \frac{1}{64} \times \frac{1}{128}$$

$$+ 1 \times \frac{1}{256} + \dots = 0.8359375 \quad 1.41796875$$

$$\text{Decimal value} = (-1)^1 \times (1 \times 0.\overline{8359375}) \times 2^{9005} = -45.375$$

- Standard / Normal form.

The most significant digit of the significand is nonzero.

14.752.140.75110.11 $\times 2^2$ 1.5111.011  $\times 2^1$ 1.11011  $\times 2^3$ 0.5111.011  $\times 2^2$ 0.111011  $\times 2^4$  $\times 2$ 1.11011  $\times 2^3$ 0.00  $\times 2^7$ 0x2

- 10.19 Expressible Numbers in Typical 32-bit formats.

### Expressible Integers



- IEEE 754-2008

### c) binary 128 format.



- Table 10.3 TFFF 754-format Parameter.

| Parameter     | 32   | 64    | 128    |
|---------------|------|-------|--------|
| exponent bit  | 92-8 | 84-11 | 128-15 |
| Exponent bias | 127  | 1023  | 16383  |

### Additional formats.

- Extended Precision Format
- Extendable Precision Format

### Table 10.4 TEEE formats.

### Table 10.5 , 10.6

$$\begin{array}{l}
 \text{eg: } 5.6875 \\
 101.1011 \\
 \hline
 0.1011011 \quad | \quad 7 \\
 \hline
 0.6875 \\
 \times 2 \\
 \hline
 1.3750 \\
 \times 2 \\
 \hline
 0.3750 \\
 \times 2 \\
 \hline
 0.75 \\
 \times 2 \\
 \hline
 1.50 \\
 \times 2 \\
 \hline
 0.50 \\
 \times 2 \\
 \hline
 1.0 \\
 \times 2 \\
 \hline
 0
 \end{array}$$

$$\begin{array}{l}
 129 \\
 \hline
 1 \quad 1000001 \quad 0110110000000000000000000000000
 \end{array}$$

a) single precision.

Assume 4 bit data  $D_0 D_1 D_2 D_3$

|       |       |       |       |       |       |       |       |
|-------|-------|-------|-------|-------|-------|-------|-------|
| $D_3$ | $D_2$ | $D_1$ | $D_0$ | $P_4$ | $P_3$ | $P_2$ | $P_1$ |
| 1     | 1     | 0     | 1     | $P_4$ | $P_3$ | $P_2$ | $P_1$ |

$P_1 \rightarrow D_3 D_1 D_0 P_1$        $P_2 \rightarrow D_3 D_2 D_0 P_2$

|   |   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 |
|---|---|---|---|---|---|---|---|

$$00111001 = M$$

Hamming code 1110

$$C_1 = 1 \oplus 0 \oplus 1 \oplus 1 \oplus 0 = 1$$

$$C_2 = 1 \oplus 0 \oplus 1 \oplus 1 \oplus 0 = 1$$

$$C_4 = 0 \oplus 0 \oplus 1 \oplus 0 = 1$$

$$C_8 = 1 \oplus 1 \oplus 0 \oplus 0 = 0$$

fig 5.9 william strauss

Hamming code 0001 resulting block 00110110111

CRC 110011

steps.

$$1) P(x) = x^4 + x^3 + 1 = 11001$$

2)  $P(x)$  is in order of 4  $\Rightarrow$  CRC 4

3) Modulo 2 (Ex-OR)

4) Message append CRC 0

$$\begin{array}{r} 11001 \\ ) 110011 \end{array}$$

000000

11001

0000010000

11001

01001  $\rightarrow$  CRC  $\rightarrow$  110011001

- Page No. \_\_\_\_\_ Date. \_\_\_\_\_
- 1) let the message = 110011 generate a polynomial  $P(x) = x^4 + x^3 + 1$  and we have string 11001
- 2) If the generated polynomial is of order 4 then CRC bits are also 4.
- 3) Modulo 2 division i.e Ex-ORing is done.
- 4) While doing step 3 original msg is appended with CRC bits 0.

eg Original msg = 1011011  
Polynomial =  $\frac{11011}{x^3 + x^2 + x^1 + 1}$

~~$$\begin{array}{r} 11011 \\ ) 1011011000 \\ 1101 \\ \hline 01000 \\ 0101 \\ \hline 00011 \\ 0000 \\ \hline 1100 \\ 1100 \\ \hline 0000 \end{array}$$~~

1100100      ①

~~$$\begin{array}{r} 11011 \\ ) 1011011000 \\ 1101 \\ \hline 01000 \\ 0101 \\ \hline 00011 \\ 0000 \\ \hline 1100 \\ 1100 \\ \hline 0000 \end{array}$$~~

1101111      ②

~~$$\begin{array}{r} 11011 \\ ) 1011011001 \\ 1101 \\ \hline 01000 \\ 0101 \\ \hline 00011 \\ 0000 \\ \hline 1101 \\ 1101 \\ \hline 0000 \end{array}$$~~

001100      ③

~~$$\begin{array}{r} 11011 \\ ) 1011011001 \\ 1101 \\ \hline 000110 \\ 000110 \\ \hline 1101 \\ 1101 \\ \hline 0000 \end{array}$$~~

0000 ✓ No error.

Generated polynomial selection criteria.

Generated polynomial should not divided by  $x$  but get divided by  $x+1$ .

- Represent given number in IEEE format
  - Represent 14.75 in IEEE Single Precision format.

Q. Find out decimal equivalent of this : 40B60000 Hex

→ 40B60000  
0100 0000 1011 0110 0000 0000 0000 0000

|                                                                                                  |                                                                                                                                                                                                           |                                                   |
|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------|
| 0                                                                                                | 100 00001                                                                                                                                                                                                 | 011 0110 0000 0000 0000 0000                      |
| sign = 0<br>n9-n2 sign 12g-127<br>E = 2                                                          | $\frac{1}{2}$                                                                                                                                                                                             | Mantisa = $(1 \cdot 0110110) 0000 0000 0000 0000$ |
| Decimal value =<br>$E = 2$                                                                       | $M = \frac{1 \times 1}{2} + 0 \times \frac{1}{2^1} + 1 \times \frac{1}{2^2} + 1 \times \frac{1}{2^3} + 0 \times \frac{1}{2^4} + 1 \times \frac{1}{2^5} + 1 \times \frac{1}{2^6} + 1 \times \frac{1}{2^7}$ | $= 3.6875$                                        |
| $\begin{array}{r} (-1)^{\text{sign}} \times 2^{\text{exponent}} \\ \hline -1.421875 \end{array}$ | $1 \times 1/1 + 0 \times 1/2 + 1 \times 1/4 + 1 \times 1/8 + 0 \times 1/16 + 1 \times 1/32$<br>$+ 1 \times 1/64 = 1.421875$                                                                               |                                                   |
| $(-1)^0 \times 2^2 \times 1.421875$                                                              |                                                                                                                                                                                                           |                                                   |
| $= 5.6875$                                                                                       |                                                                                                                                                                                                           |                                                   |
|                                                                                                  | $(-1)^s \times 1 \cdot M \times 2^{f'-127}$                                                                                                                                                               |                                                   |
|                                                                                                  | $(-1)^0 \times 1 \cdot 1.421875 \times 2^2$                                                                                                                                                               | $= 5.6875$                                        |

Q. Represent C0B60000 H find out equivalent decimal of the no.

$\rightarrow$       1100 0000 1011 0110 0000 0000 0000  
 (1) 10000001 0110110 0000 0000 0000 0000  
 sign exponent = 2 mantissa = 1.421875

$$\text{Decimal value} = (-1)^1 \times 2^2 \times 1.425875 = -5.6875$$

4160800

0100 0001 1100 1000 0000 0000 0000  
0100 0001 1110 1101 1000 0000 0000 0000  
0 | 100000011 | 110110110110000 0000 0000 0000

$$\text{Decimal} = 29.7109375 \times ((-1)^0 \times 2^4 \times 1.856993594) .$$

(c) Considering the eq<sup>n</sup>  $RN = \pm 1 \cdot M \times 2^{E-127}$  sign bit is positive or negative. (Single precision)

### m - fraction

E - excess 127

Q) If  $F = 11111111$      $M = 00000000$     then the represented no. if infinity might be +ve or -ve.

2)  $\exists t \in E = \text{all } l^{15}$  and  $M \neq \text{all } 0\text{'s}$  then not a number

3) If  $1 < F < 254$  then the value represented is  $(-1)^{\pi(F \cdot m)} x_2^{F \cdot m}$

4) If  $E = \text{allo's}$   $m = \text{allo's}$  represents a value zero.

5) If  $F = \text{all } 0's$   $M \neq \text{all } 0's$  represent the value closest to zero i.e some denormalized one.

1. The represented value equals to  $+1 - 1^5 \times 1 \cdot m \times 2^{\frac{E-127}{5}}$

2. T-f It is so sign represents whether a no. is true or -ve significant & it is usually a fraction.

3: Experiment F is in excess of 127 (biased value)

I Case - if  $F = \text{all } 1's$  (254) and  $M = \text{all } 0's$  (0 represents infinity) then or -ve depend on sign.

II - If F = all 1's (255) but M ≠ 0's & in the case the number is represented as Not a Number.

III - If  $F$  lies between 1 and 254 ( $1 < F < 254$ ) in the case the represented number is  $(-1)^S \times 1.m \times 2^{F-127}$  refers to single precision representation

- Meaning  
3 bits represents 8 typical blocks getting from main memory, 5 bits represents point 1 block means (32 bytes) out of 64 blocks in one typical block. 5 bits represent out of 1 block that is from 32 bytes which one byte I need to take.

Processor try to match

- if 3 bits matching - Hit
- if 3 bits not matching - miss

When miss happens cache must reload the required block by removing the existing block by applying certain policy

- If present cache memory block is modify then it must be kept to main memory first at proper position then fetch required new block.

If cache memory block is not modify then → overwritten 2 things

- 1) valid bit
- 2) Dirty bit.

- valid bit is corresponding to if data in main memory loaded in cache memory the valid bit is 1 if nothing is loaded in cache memory then valid bit 0

- Dirty bit - if data from this 32 bytes is modified then it is set to 1. otherwise set to 0.

e.g. D=1 say need to put block in mm before it replaced.

for each block of 32 bytes we can have a valid bit, dirty bit and tag bits to make proper identification

Consider main memory - 16 kB cache memory - 2K block size 32 bytes. How many bit address - 14 bits. Full Associative



14 bit MM address

9 bits of tag (msb) - represent the tag value. if it is matching the data from cache → then cache hit. if not matching → then block of mm is fetched.

Set Associative.

same example above + 2 block per set



14 bit MM address.

Set size is 2 blocks.

How mapping is done?

0-31 of MM can land 0-31 set of cache respectively.

32-63 of MM can land 0-31 set of cache respectively.

$$\text{So } \frac{2^9}{2^5} = 2^4 = 16 \text{ blocks.}$$

means any set may load with any of 16 blocks. Hence tag bit comprises of 4bit. 5bit identify in which set block is address and 5bit which bit to be taken from which block.

As these particular partition set consist of only 2 blocks this type of set associativity is called 2 way set associativity. no. of cores increases → associativity increases

(Q.1) Consider a computing system having 4 way set associative cache of 4 kB capacity,  $m = 64$  KB of block size of 16 bytes. describe cache mapping and explain how mm address interpreted in for referring the cache.

22-10-24

fig 4.9 fig 4.10 Direct Mapping Example.

eg.2) The main memory is of capacity 16 MB, then the cache can hold 64 kB of data. Data transfer between main memory and the cache is in block of 4 bytes each? This means that the cache is organized as  $16\text{K} = 2^{14}$  lines of 4 bytes each. The mm consist of 16 MB, with each byte directly addressable by 24-bit address. Thus for mapping purposes, we can consider mm to consist of 4M blocks of 4 bytes each.

$$16\text{ MB} = 2^4 \times 2^{20} = 2^{24}$$



#### Direct Mapping Summary.

- 1) Address length =  $(s+u)$  bits.
- 2) No. of addressable units =  $2^{stw}$  words or bytes.
- 3) Block size = line size =  $2^u$  words or bytes.
- 4) No. of blocks in main memory =  $2^{stw} / 2^u = 2^s$
- 5) No. of lines in cache =  $m = 2^r$
- 6) Size of tags =  $(s-r)$  bits.

fig 4.11 fully associative Cache organization.

fig. 4.12 , fig 4.13 . fig 4.14 -  $k$ -way SA Cache organization

c. Estimate time required to transfer 5 MB file?

D. What is burst transfer rate?

$$\rightarrow a: 8 \times 512 \times 64 \times 1KB = 262144 \times 512 \times 64 \times 1024 = 268423456 \\ = 2^3 \times 2^9 \times 2^6 \times 2^{10} = 2^{28} = 268435456$$

$$b: \frac{60}{3600} \times 2$$

3600

$$\text{Rotation latency} = \frac{\text{Rotation time}}{2} = \frac{60}{3600} \times \frac{0.0167}{2} = \frac{8.3}{2} \text{ ms}$$

$$e. \text{ average access time} = \text{seek time} + \text{rotational latency} = 8 + 8.3 \\ = 16.3 \text{ ms}$$

$$c: 8 \times 64 \text{ sectors} \times 1KB / \text{sector} = 512 \text{ KB} \rightarrow \text{each cylinder consists of 8 sectors} \\ \text{such 10 cylinders are required to transfer 5 MB file.} \\ 8 \times 60 = 133.3 \text{ ms. to read 8 different tracks of cylinder.}$$

$$\text{The access time} = \frac{\text{seek time}}{\text{shift required}} + \underbrace{8 \times (8.3 + 133.3 + 1.5)}_{\text{avg access time}} + \text{tot access time} \\ = 1437.5 \text{ ms. time required to transfer 5 MB file}$$

$$d. \text{ Burst rate} = \text{revolution / sec} \times \text{sectors / revolution} \times \text{bytes / sector} \\ = \frac{3600}{60} \times 64 \times 1KB = 3.84 \text{ mbps}$$

Q. Consider a single platter disk with following parameter.

Rotational speed is 7200 rpm

no. of tracks on 1 side of platter = 30000

no. of sectors per track = 600

seek time = 1 ms. for every 100 tracks travels. let the disk receive a request to access random sequence sector on random track. Assume the disk head start at track 0.

A. What is avg seek time?

B. —? Rotational latency?

C. What is transfer time for sector?

D. What is total avg time to satisfy request?

→ a. Head start at track 0, then the moment seek time is 0 if requested track is 29999 track then seek time is the time to travel these tracks.

$$\text{avg seek time} = \frac{29999}{2} = \frac{14999.5}{100} = \frac{149.995}{149.995} \text{ ms}$$

$$b. \text{ avg Rotational latency} = \frac{60}{7200} = \frac{8.333}{2} \text{ ms.} = \frac{4.1665}{4.167} \text{ ms}$$

$$c. \frac{8.333}{600} = 0.01389 \text{ ms transfer time.}$$

$$d. 149.995 + 4.167 + 0.01389 = 154.17589 \text{ ms}$$

• Error Correcting code is Hamming code.

$$2^k - 1 \geq M + K$$

$$(M=8) \quad K=3 : 2^3 - 1 \leq 8+3$$

$$K=4 : 2^4 - 1 > 8+4$$

a) i. F0010 (Hex)

Binary - 1111 0000 0000 0001 0000

tag - 1111 = 15

offset (4 bits) = 0000 = 0

index (12 bits) = 0000 0000 0001 = 1

ii. 01234

Binary - 0000 0001 0010 0011 0100

offset (4 bits) = 0100 = 4

index (12 bits) = 0001 0010 0011 = 12 (29) decimal Hex(12)

tag (4 bits) = 0000 = 0

iii. CABBE

Binary - 1100 1010 1011 1011 1110

offset = 1110 = 14

index = 1010 1011 1011 = 2747 decimal ABB Hex

tag = 1100 = 12

b) Given any two main memory addresses with different tags that maps to the same cache slot for a direct mapped cache.

c) for the main memory addresses of F0010 and CABBE give the corresponding tag, cache set and offset value for a full associative cache.

i) F0010

Binary - 1111 0000 0000 0001 0000

offset (4 bits) = 0000 = 0 decimal

tag (16 bits) = 1111 0000 0000 0001 = F001 Hex

= 61441

ii) CABBE

offset = 1110 = 14

tag = 1100 1010 1011 1011 = 51899

iii) For the main memory addresses of F0010 and CABBE, gives the corresponding tag, cache sets and offset value for a two way associative cache.

→ no. of Cache line =  $2^{12} = 4096$ .

No. of sets = 4096 = 2048

sets =  $\log_2(2048) = 11$  bits set index.

i) F0010

offset = 0000 = 0

set index (11 bits) = 000 0000 0001 = 1

tag = 1111 0 = 30

ii) CABBE

offset = 1110 = 14

set = 010 1011 1011 = 699

tag = 1100 1 = 25

$$= \text{Instructions} \times \frac{\text{Misses}}{\text{program}} \times \frac{\text{Miss penalty}}{\text{instruction}}$$

eg. Average Access Time.

Hit time is also important for performance. Average memory access Time (AMAT)

$$\text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$$

Example - CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, 1 - cache miss rate = 5%  
 $\rightarrow \text{AMAT} = 1 + 0.05 \times 20 = 2 \text{ ns}$ .  
 2 cycles per instructions.

- Associative Cache
  - full associative
  - n-way set associative

Spectrum of Associativity - figure.

Set Associative Cache Organization - figure.

Q. Assume that you have 4 Gbytes of mm at your disposal. 1 GB of the 4 GB has been reserved for process page table storage. Each page table entry consists of:

- A physical frame number.
- 1 valid bit
- 1 dirty bit
- 1 LRU status bit

Virtual addresses are 32 bits, Physical address are 26 bits, The page size is 8 KB. How many process page tables can fit in the 1 GB space?

$1 \text{ GB} = 2^{30} \text{ bytes}$  (  $2^3 \times 2^{10} = 2^{13}$  ) offset associated with a VA / PA.  
 Thus remaining 19 bits used to represent VPN & serves as PT.

Thus each PT has  $2^{19}$  entries

Next, each PA is 26 bit. (Physical address).

13 bits of PA come from offset, other 13 from PT lookup.

$$\therefore \text{of page table} = \frac{2^{30} \text{ bytes available}}{2 \text{ bytes}} \times \frac{(1 \text{ PT entry})}{2^{19} \text{ PT entries}} = 2^{10} \text{ or } 1024$$

Given that each PT entry consists of a PFN, a valid bit, a dirty bit, and a LRU bit, each PT entry will be 2 bytes.

Q. Consider a magnetic disk drive with 8 surfaces then 512 tracks per surface, 64 sectors per track and each sector size is 1 KB. The average seek time is 8ms, track-to-track is 1.5 ms. The drive rotates at speed 3600 rpm.

Successive tracks in the cylinder can be read without head moment.

- What is the disk capacity?
- What is the average access time? Assume this file is stored in successive sectors and tracks of successive cylinders, starting at sector 0 track 0 of certain cylinder i.

## Unit 3: Memory Hierarchy

Hamacher Chapter 8

figure 8.1 Connection of the memory to the processor.

figure 8.14 Memory Hierarchy.



Reading assignment  
Differentiate among different levels of cache that are in intel i7 and AMD Ryzen processor.  
Diff - total capacity, block size, replacement policy.

- Method of Accessing Units of Data
- Characteristics of Memory System

sequential direct      random      associative.

Capacity performance -  
Three performance parameter are used  
1) Access time    2) Memory cycle limit    3)

## Memory

When block available - cache hit

when block not available - cache miss.

which constraints are there in TLB

TLB is subset of page table.

## Q. Who decide block size → architecture

Write down or mentioned the block size in AMD.

William Stealing!

figure 4.4 Cache and main memory.

4.3

- a) single cache   b) Three level cache organization.

## figure 4.5 cache read operation.

Q. we are using what inclusive and exclusive.

05 - system software that manages performance of system.

Page No. \_\_\_\_\_  
Date 15 10 24

## characteristics of Memory Systems.



A process of throwing out the block that is not required is called block eviction.

Block size is fixed designed by architecture.

## Method of Accessing Units of Data.

## Capacity and Performance -

fig 4.1 The memory hierarchy.

fig 4.3 cache and Main Memory.

fig 4.4 cache / Main Memory structure.

fig 4.5 cache Read operation.

chapter 5 - last and fast:

- Direct Mapped Cache  
- only one choice

- Tags and valids bits

cache example - 8 blocks, 1 word/block, direct mapped  
- initial state.

| EX. | Index | V   | Tag | Data        |
|-----|-------|-----|-----|-------------|
|     | 000   | N Y | 10  | mem [10000] |
|     | 001   | N   |     |             |
| ✓   | 010   | N Y | 10  | mem [10010] |
|     | 011   | N Y | 00  | Mem [00011] |
|     | 100   | N   |     |             |
|     | 101   | N   |     |             |
|     | 110   | Y   | 10  | Mem [10110] |
|     | 111   | N   |     |             |

| 1) | word addr          | binary addr      | Hit/miss    | cache block |
|----|--------------------|------------------|-------------|-------------|
| 2) | word 22<br>addr 26 | 10 110<br>11 010 | Miss<br>Hit | 110<br>010  |

Both are available together

| 3) | 22 | 10 110 | Hit | 110 |
|----|----|--------|-----|-----|
|    | 26 | 11 010 | Hit | 010 |

| 4) | word addr | binarg addr | Hit/miss | cache block |
|----|-----------|-------------|----------|-------------|
|    | 16        | 10 000      | miss     | 000         |
|    | 3         | 00 011      |          | 011         |
|    | 16        | 10 000      | Hit      | 000         |

| 5) | 18 | 10 010 | miss | 010 |
|----|----|--------|------|-----|
|    |    |        |      |     |

- Address Subdivision figure. → Hennessy Paterson book.

eg larger Block size.

64 blocks, 16 bytes / blocks.

To what block number does address 1200 map?  
→ Block address =  $[1200 / 16] = 75$ 

Block no. = 75 modulo 64 = 11



- Block Size Considerations.

- Cache Misses

- Write - Through } — write Strategy  
Write - Back.

fig.



- Measuring Cache Performance

\* Components of CPU time

- program execution cycles :- includes cache hit time
- Memory stall cycles :- mainly from cache misses

Memory stall cycles = Memory accesses × miss rate × Miss penalty  
program

- fig 4.6 Typical cache organization.

- Table 4.2 Elements of Cache Design.

- Cache Addresses -

Virtual Memory

- fig 4.7 Logical and Physical Caches.

Reading Assign - Intel core i7 and AMD find what used?

- Table 4.3 Cache sizes of some processors.

- Mapping Functions.

We continue our introduction of caches by answering the 4 questions for the first level of the memory hierarchy:

- Where can a block be placed in upper level? (block placement)
- How is a block found if it is in the upper level? ( $\rightarrow$  identification)
- Which block should be replaced on miss? ( $\rightarrow$  replacement)
- What happens on a write? (write strategy).

- 1) direct map

(Block address)

Modulo

(No. of blocks in cache)

- full associative

modulo

(No. of sets in cache)

anywhere

- set associative

Modulo

(No. of sets in cache)

In order to pick something from main memory we employ something called mapping. Three types of mapping -

- 1) Direct mapping
- 2) Full associative
- 3) Set associative

Direct map

- a. Let main memory is 16 kB and cache memory is 2 kB and cache memory block size is 32 bytes. find out total no. of blocks in main memory and cache memory.
- MM 16 kB  $\sim 2^{14}$   
 CM 2 kB  $\sim 2^{11}$
- ↳ Block 32 bytes  $\sim 2^5$

$$\text{No. of blocks in main memory} = \frac{2^{14}}{2^5} = 2^9 = 512 \text{ blocks.}$$

$$\text{No. of blocks in cache memory} = \frac{2^9}{2^5} = 2^4 = 16 \text{ blocks}$$

- 32 byte block loaded into cache memory. Processor will output memory address w.r.t main memory block. That block is loaded into cache memory. Which part of main memory block will be loaded into cache memory is called mapping.

A block in cache memory will accommodate  $2^9 = 2^3 = 8$  typical block cache store one of the things  $2^6$  came from memory. These 8 typical blocks are 64 blocks apart. Then cache memory can accommodate any of the 64 blocks. is identify by 3 bits ( $2^3$ )?

These typical 8 blocks out of which one is in cache memory is identify by 3 bits. The main memory is 16 kB processor will output 14 bit address out of that 3 bits are tag bits 6 bits are block (index) remaining 5 bits are word or offset.  $3 \text{ bit} | 6 \text{ bit} | 5 \text{ bit} \Rightarrow 14 \text{ bit}$

Tag bit      block index bit      word / offset bit

### Full Associative Mapping Summary.

- 1) Address length =  $(s+w)$  bits.
- 2) No. of addressable units =  $2^{s+w}$  words or bytes.
- 3) Block size = line size =  $2^w$  words or bytes.
- 4) No. of blocks in mm =  $2^{s+w} / 2^w = 2^s$
- 5) No. of lines in cache - undetermined.
- 6) size of tag =  $s$  bits.

### Set Associative Mapping Summary.

- 1) Address length =  $(s+w)$  bits.
- 2) No. of addressable units =  $2^{s+w}$  words or bytes.
- 3) Block size = line size =  $2^w$  words or bytes.
- 4) No. of blocks in mm =  $2^{s+w} / 2^w = 2^s$
- 5) No. of lines in set =  $k$
- 6) No. of sets =  $v = 2^d$
- 7) No. of lines in cache =  $m = kv = k * 2^d$
- 8) Size of cache =  $k * 2^{d+w}$  words or bytes.
- 9) size of tag ( $s-d$ ) bits.

#### Q.2) two way SA.

|        |         |        |
|--------|---------|--------|
| 9 bits | 13 bits | 2 bits |
|--------|---------|--------|

$$\text{blocks in mm} = \frac{2^{24}}{2^{22}} = 2^{24-22} = 2^2$$

$$\text{tag bit} = (s-d) = (22-13) = \underline{\underline{9}}$$

$$Q.1) C = 4 \quad MM = 64KB \quad \text{block size} = 16$$

$$= 64 \times 2^{10} = 2^{16}$$

$$\text{block size} = 2^4$$

$$\text{block in mm} = \frac{2^{16+4}}{2^4} = 2^{16+w} = 16. \text{ addressable units.}$$

$$s=12 \quad 16-12 = 4 \text{ bits offset.}$$

$k = 4$  ways associative.

$$d=6$$

$$(12-6) = 6 \text{ bit tag.}$$



Q.3 Consider two-way set associative cache line offsets.  
Size 8 kB, 64 MB of MM & byte addressable  
format of MM address.

$$2^{20} \times 2^6 = 2^{26}$$

26 bit address.

22/10/24

Q.4 Consider a machine with a byte addressable main memory of 2<sup>16</sup> bytes & block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used with the machine.

a) How is a 16 bit memory address divided into tag, line number and byte number?

$$\rightarrow MM = 2^{16} \text{ bytes} = (64 \text{ kB})$$

Block size = 8 bytes

Cache = 32 lines (directed mapped)

i) byte number (block offset) =  $\log_2(8) = 3 \text{ bits}$

ii) line number =  $\log_2(32) = 5 \text{ bits}$

iii) Tag =  $16 - (3+5) = 8 \text{ bits}$

Total no. of bits = 16 bits

|        |        |        |
|--------|--------|--------|
| 8 bits | 5 bits | 3 bits |
|--------|--------|--------|

b) Into what line would bytes with each of the following addresses be stored?

1. 0001 0001 0001 011      3. 1101 0000 0001 1101

Tag = 0001 0001 (8 bits)       $\rightarrow$  Tag = 1101 0000

index (5 bits) = 00011      idx = 00011

offset (3 bits) = 011      offset = 101

2. 1100 0011 0011 0100      4. 1010 1010 1010 1010

Tag = 1100 0011      Tag = 1010 1010

index = 00110      index = 10101

offset = 100      offset = 010

c) Suppose the bytes with address 0001 1010 0001 1010 is stored in the cache. What are the addresses of the other bytes stored along with it?

→ Since size of blocks is 8 bytes, when the bytes at address 0001 1010 0001 1010 is stored in cache, the following addresses in the same block will also be stored with it.

$$\begin{array}{l} 0001 \ 1010 \ 0001 \ 1010 \ (\text{given}) \\ 0001 \ 1010 \ 0001 \ 1010 \ 1010 \\ 0001 \ 1010 \ 0001 \ 1100 \\ 0001 \ 1010 \ 0001 \ 1101 \end{array}$$

$$\begin{array}{l} 0001 \ 1010 \ 0001 \ 1110 \\ 0001 \ 1010 \ 0001 \ 1111 \\ 0001 \ 1010 \ 0010 \ 0000 \\ 0001 \ 1010 \ 0010 \ 0001 \end{array}$$

d) How many total bytes of memory can be stored in the cache?

→ The cache has 32 lines, and each line can store 1 block of 8 bytes. Therefore, the total memory stored in the cache is:  $32 \times 8 = 256$  bytes.

e) Why is the tag also stored in the cache? (read from book)

→ The tag is stored in the cache to uniquely identify which blocks of main memory is currently stored in the corresponding cache line. Since multiple blocks from main memory can map to the same cache line the tag helps distinguishing which specific block is currently stored in that line.

Q.5) Consider a memory system that uses a 32 bit address to address at the byte level, plus a cache that uses a 64-byte line size.

a) Assume a direct mapped cache with a tag field in the address of 20 bits. Show the address format and determine the following parameters: no. of addressable units, no. of blocks in main memory, no. of lines in a cache, size of tag.

→ DM - address is divided into 3 parts

$$\text{offset} = \log_2(64) = 6 \text{ bits}$$

$$\text{index} = 32 - 20 - 6 = 6 \text{ bits}$$

Tag - 20 bits



$$\text{No. of addressable units} = 2^{32} = 4,294,967,296 = 32$$

$$\text{No. of blocks in mm} = \frac{2^{32}}{2^5} = 2^{26} = 67,108,864$$

Size of tag = 20 bits.

b) Assume an associative cache show the address format and determine the following parameters: number of addressable units, number of blocks in mm, no. of lines in cache, size of tag.

total add - 32 bit

offset - 6 bits

$$\text{tag} = 32 - 6 = 26 \text{ bits}$$



c) Assume a four way set associative cache with a tag field in the address of 9 bits. Show the address format and determine the following parameters. i) no. of addressable units, ii) no. of blocks in mm, iii) no. of lines in set, iv) no. of sets in cache, v) no. of lines in cache, vi) size of tag.

2) no. of blocks in mm 3) no. of lines in set 4) no. of sets in cache 5) no. of lines in cache 6) size of tag

1) addressable bit = 32

2) Tag bit = 9 bit

3) index bit = remaining bits after counting the tag + offset.

4) No. of lines in cache =  $2^n$

5) Remaining bit =  $32 - 9 - 6 = 17$  bits for index offset

$$6) \text{No. of set} = 2^{17} = 131,072$$

$$7) \text{No. of line in cache} = 2^{17} \times 4 = 2^{19} = 524,288.$$



Q.6) Consider a computer with the following characteristics: total of 1 MB of main memory; word size of 1 byte; block size of 16 bytes; and cache size is 64 kB.

a) For the main memory addresses of F0010, 01234 and CABBE, give the correct corresponding tag, cache line & word offsets for a direct-mapped cache.

→ mm = 1 MB ( $2^{20}$  bytes)

word size = 1 byte

block size = 16 bytes ( $2^4$  bytes)

cache size = 64 kB ( $2^6 \times 2^{10} = 2^{16}$  bytes).

block size =  $\log_2(16) = 4$  bits → offset

No. of lines =  $\frac{\text{Cache size}}{\text{Block size}} = \frac{2^{16}}{2^4} = 2^{12}$  ← index

Thus we need 12 bits of index to identify cache lines.

finally remaining bits of address lines will be tag 4 bits

a)



## Pipelining Strategy

- Pipelining Analogy
- + fig 14.9 Two stage instruction pipeline
- + Additional Stages :- FT, DI, CO, FO, FT, WB



### Page order

|   |              |
|---|--------------|
| - | Inorder      |
| - | out of order |
| - | (00)         |
| = | (000)        |

- fig 14.10 Timing Diagram for instruction pipeline operation.
- fig 14.11 , 14.12 , 14.13 , 14.14 , 14.15 , 14.16 , 14.18 , 14.19
- Pipeline Performance

$$T = \max [T_i] + d = T_m + d \quad | \quad 1 \leq i \leq k$$

- Speedup Time of the pipeline can written as-

$$T + (n-1) \times \frac{T}{k}$$

$$\text{Time regular} = n \times T$$

$$\text{The speedup} = \frac{nT}{T + (n-1) \times \frac{T}{k}} = \frac{n}{1 + (n-1) \frac{1}{k}} = \frac{nk}{n-1+k}$$

### - fig Pipeline Hazards

Data Hazard RAW  
WAR  
WAW

Control Hazard

Structure Hazard

e.g.

Total answersheet 500 for 1 particular subject, we have 5 questions on different topics. Then one examiner takes 5 min per question so 25 min / answersheet. Total time 12500 min. If 5 examiners are working in pipeline how the show will look like.

|    | E1   | E2   | E3   | E4 | E5 |
|----|------|------|------|----|----|
| t1 | A1-1 |      |      |    |    |
| t2 | A2-1 | A1-2 |      |    |    |
| t3 | A3-1 | A2-2 | A1-3 |    |    |
| t4 |      |      |      |    |    |

1<sup>st</sup> examiner check only one question and give to another examinee so 5 examiner check only question each. After 25 min A<sub>1</sub> will be completely accessed. Therefore every 5 min other paper will complete.

$$\text{Total time required} = 25 + 5 \times 499 = 2020$$

### Data dependencies of four types

1. Flow dependence denoted as S<sub>1</sub> → S<sub>2</sub>
2. Antidependence : output of S<sub>2</sub> overlaps the input of S<sub>1</sub> and S<sub>2</sub> follows S<sub>1</sub> in program order.
3. Output dependence : produce same output variable
4. I/O dependence :

### e.g. Consider the following code fragment of four instruction.

|                  |             |                      |
|------------------|-------------|----------------------|
| S <sub>1</sub> : | Load R1, A  | / R1 ← memory(A) /   |
| S <sub>2</sub> : | Add R2, R1  | / R2 ← (R1) + (R2) / |
| S <sub>3</sub> : | Move R1, R3 | / R1 ← (R3) /        |
| S <sub>4</sub> : | Store B, R1 | / Memory(B) ← (R1) / |

S<sub>2</sub> is flow dependent on S<sub>1</sub>

S<sub>3</sub> is antidependent on S<sub>2</sub>

S<sub>3</sub> is output-dependent on S<sub>1</sub>

Note that dependence is a partial ordering relation.  
 $s_1$  and  $s_4$  are totally independent.

- b) Consider a code fragment involving I/O operation:

$s_1$ : Read(4), A(1) / Read array A from file 4 /  
 $s_2$ : Process / process data /  
 $s_3$ : Write(4), B(1) / write array B into file 4 /  
 $s_4$ : close(4) / close file 4 /

- c) Answer the following questions on program flow mechanisms and computer models:

a)  $s_1$ : load R1, 1024 /  $R_1 \leftarrow 1024$  /  
 $s_2$ : load R2, M(10) /  $R_2 \leftarrow \text{memory}(10)$  /  
 $s_3$ : Add R1, R2 /  $R_1 \leftarrow (R_1) + (R_2)$  /  
 $s_4$ : Store M(1024), R1 /  $\text{memory}(1024) \leftarrow (R_1)$  /  
 $s_5$ : Store M(R2), 1024 /  $\text{Memory}(64) \leftarrow 1024$  /

Where  $(R_i)$  means the content of register  $R_i$  and  $\text{memory}(10)$  contains 64 initially.

- a) Draw a dependence graph to show all the dependences  
b) Are there any resource dependences if any.

a)  $s_3$  is flow dependent  $s_2$  &  $s_4$

$s_4$  is flow dependent on  $s_3$

$s_5$  is flow dependent on  $s_2$

$s_3$  is output dependent on  $s_1$

b) Yes there is resource dependence if <sup>only one</sup> any of the 2 ~~any of the 2~~ instruction use load store<sup>op</sup> is available in CPU

Control Hazards.

Branch Prediction ↗ next page

various techniques can be used to predict whether a branch will be taken:

- 1. Predict never taken
- 2. always taken
- 3. by opcode

These approach are static  
do not depend on execution history  
up to the time of the conditional branch.

- 1. Taken / not taken switch
- 2. Branch history table

These approach are dynamic  
depend on execution history.

fig 14.20

Pipelining and ISA design

- RISC-V designing for pipelining.

Forwarding (aka Bypassing)



Load - Use Data Hazard

ld x1, 0(x2)

Code Scheduling to Avoid stalls.

Control Hazard.

• stall on Branch., to avoid control hazard need proper branch prediction

• Branch prediction RISC-V Pipelined Datapath.

• # Pipeline registers.

Pipeline operation ↗ single line

multi line → not for exam

TF for load, store

EX for load

mem for load

WB for load

- RISC style characteristics.

- CISC style characteristics.

Unit 4 - I/O organization read own.

### Data Bus.

Address Bus Control Bus.

fig. 3.16 Bus Interconnection Scheme

fig 3.17 Multicore Configuration using QPI.

• Point-to-point interconnect.

• Quick Path interconnect (QPI) - Multiple direct

layered protocol  
packet transfer

fig. 3.18, 3.21, 3.22, 3.23, 3.24, 3.25

The Peripheral Component Interconnect (PCI).

Table 3.2.

7.1 Bus Structure. (fig. 7.2)

7.2.1 Synchronous Bus (fig. 7.3). (fig 7.4)

7.2.2 Asynchronous Bus (fig 7.6)

7.3 Arbitration (fig 7.9)

7.4 Parallel Interface (fig 7.10), 7.13)

7.4.2 Serial Interface

7.5 Interconnections

7.5.1 Universal Bus (USB) (fig 7.17)

7.5.2 Firewire

7.5.3 PCI (fig 7.18) (Table 7.1)

7.5.4 SCSI Bus

7.5.5 SATA 7.5.6 SAS 7.5.7 PCI Express (fig 7.20)

Unit 6.

09/11/24

### Multiple Processor Organization.

17.1 fig. A Taxonomy of Parallel Processor Architectures

17.2 fig. Alternative Computer Organization.

#### Symmetric Multiprocessor (SMP).

fig 17.4 Generic Block Diagram of a Tightly Coupled multiprocessor.

fig 17.5 Symmetric multiprocessor organization.

Disadvantages of the bus organization.

Multiprocessor Directory protocol not there for exam. (as no part)  
snoopy protocol.

Write Update.

#### Multithreading & chip Multiprocessor

Definitions of Threads and Processors

Implicit and Explicit Multithreading

Approaches to Explicit Multithreading.

Fig 17.7, 17.8, Table 17.2, 17.9 fig., 17.10 fig., 17.11, 18.1 clusters. , Parallelizing computation.

clusters compared to SMP. , Motivation, Numa pros & cons.  
effective applications for multicore processor.

SIMD.

Heterogeneous Multicore Organization

Heterogeneous system Architecture.

8.13  
1.9  
1.8  
1.7  
1.6  
1.5  
1.4  
1.3  
1.2  
18.2  
have to  
look just  
at 1.5  
1.6  
1.7  
1.8  
1.9  
1.10