

2021-22

CSE-305

Section-B

Tabib

5 (a)

- (i) Simplicity favors regularity.  
(!)
- (ii) Smaller is faster.
- (iii) Simplicity favors regularity.
- (iv) Good design requires good compromise.
- (v) Make the common case faster.

5(b)

(i) out of syllabus ??

$$\frac{100}{60} = \frac{98}{n} + 2$$

$\Rightarrow n = -270$ , so not possible . } out of syllabus ??

(c) out of syllabus ?

6- (a)

fib: addi \$sp, \$sp, -12  
sw \$ra, 8(\$sp)  
sw \$s0, 4(\$sp)  
sw \$a0, 0(\$sp)  
bgt \$a0, \$zero, L1  
add \$v0, \$zero, \$zero  
jr ret

L1: addi \$t0, \$zero, 1  
bne \$t0, \$a0, rec  
add \$v0, \$zero, \$t0  
jr ret

rec : subi \$a0, \$a0, 1  
jal fib  
add \$s0, \$v0, \$zero  
subi \$a0, \$a0, 1  
jal fib  
add \$v0, \$v0, \$s0

`TCEt :`  
 lw \$a0, 0(\$sp)  
 lw \$s0, 4(\$sp)  
 lw \$ra, 8(\$sp)  
 addi \$sp, \$sp, 12  
 jr \$ra

6. (b)

- (i) first 4 bit of address would be same as PC address.  
So, not possible.
- (ii) not possible too. out of range covered by 16 bit.

6. (c)



for multiplication:

The product is initially at the left half of the product register (64 bits). Multiplier is at the second half. While multiplication process starts, after multiplying LSB of multiplier with the multiplicand, the multiplier is right shifted each time and the updated product's rightmost bit also shifted 1 bit right.

for division:

In remainder register, first 32 bits are initially 0 and the right-most 32 bits are populated with the dividend. It is left shifted each time. And compare the remainder with the divisor. If remainder-divisor > 0, the this subtracted <sup>(left now)</sup> result is stored on that very left side and a 1 is placed as quotient at the rightmost bit of 64 bit remainder register's right half. Otherwise, restore the past remainder on the left and place 0 at the rightmost bit of right side of 64 bit reg.

→ out of syllabus

g. a

ii

P<sub>r</sub>:

total clock cycle required =

$$\sum_{\text{class } x} (CPI) * \text{instc.cnt}$$

$$= 3 \times (200,000 \times 25\%) + (3 \times 200,000 \times 60\%) \\ + 2 \times (200,000 \times 15\%)$$

$$= 57000$$

P2:

total dock cycles required:

$$= 2 \times (200,000 \times 25\%)$$

$$+ 4 \times (200,000 \times 60\%) +$$

$$1 \times (200,000 \times 15\%)$$

$$= 61000$$

(11)  
=

P1: CPU time =  $\frac{\text{tot cycle}}{\text{clock rate}}$

$$= \frac{57000}{2.7 \times 10^9}$$

$$= 2.1 \times 10^{-9} \text{ s}$$

P2: CPU time =  $\frac{\text{tot cycle}}{\text{clock rate}}$

$$= \frac{61000}{3.5 \times 10^9}$$

$$= 1.74 \times 10^{-9} \text{ s}$$

$\underline{P_2}$  is faster.

(ii)

To run two times faster,

CPU time for  $P_2$

$$= \frac{1.74 \times 10^4}{2} = \frac{2 \times 200,000 \times 25\% + 4 \times 200,000 \times 60\% + 3 \times 200,000 \times 15\%}{3.5 \times 10^9}$$

$$\Rightarrow x = -9.18$$

not possible

(b) actual exponent in floating point representation:

exp-bits

$$= (2^8 - 1) - (2^7 - 1)$$

$$= +127$$

Highest number to be represented  $\approx 1.11\dots \times 2^{+127}$

$\therefore$  highest number  $\approx 2.0 \times 2^{+127}$

$$\approx 3.4 \times 10^{38}$$

So, the  $n$  should be  $\approx 2.4 \times 10^{38}$

### 8. (c) Addressing modes:

Immediate addressing: operand is a constant within the instruction itself.



Register addressing: operand is a register.



Base Addressing: operand is at the memory location whose address is the sum of a register and a constant in the instruction,



PC-Relative Addressing:

branch address is the sum of PC and a constant in the instruction



$\Sigma$  Pseudo-direct addressing : where the jump addn is the 26 bits of the instruction concatenated with the upper 4 bits of the PC.



Section B

2020-21

10  
10  
a

- Design for Moore's Law
- Use Abstraction to Simplify Design
- Make the Common Case Fast
- Performance via Parallelism
- Performance via Pipelining
- Performance via Prediction
- Hierarchy of Memories
- Dependability via Redundancy

} Brief description  
of each

10  
10  
b

A

| $S_3$ | $S_2$ | $S_1$ | $S_0$ | Required Functions     | X     | Y    | $Z_0$ |
|-------|-------|-------|-------|------------------------|-------|------|-------|
| 0     | 0     | 0     | 0     | $F = A$                | A     | 0    | 0     |
| 0     | 0     | 0     | 1     | $F = A - 1$            | A     | 1    | 0     |
| 0     | 0     | 1     | 0     | $F = A + B + 1$        | A     | B    | 1     |
| 0     | 0     | 1     | 1     | $F = A + B$            | A     | B    | 0     |
| 0     | 1     | 0     | 0     | $F = A - B$            | A     | B    | 1     |
| 0     | 1     | 0     | 1     | $F = A - B - 1$        | A     | $B'$ | 0     |
| 0     | 1     | 1     | 0     | $F = A + 1$            | A     | 0    | 1     |
| 0     | 1     | 1     | 1     | $F = A$                | A     | 1    | 0     |
| 1     | 0     | 0     | X     | $F = A'$               | $A'$  | 0    | 0     |
| 1     | 0     | 1     | X     | $F = AB'$              | $AB'$ | 0    | 0     |
| 1     | 1     | 0     | X     | $F = A \odot B$ [XNOR] | A     | $B'$ | 0     |
| 1     | 1     | 1     | X     | $F = A \vee B$ [OR]    | A     | 0    | 0     |

OR B

|                                                    |    |    |    |    |
|----------------------------------------------------|----|----|----|----|
| $+ S_3 S_2 S_1 B$                                  |    |    |    |    |
| $\equiv: S_3' A + S_3 S_2' S_1' A' + S_3 S_2' S_1$ |    |    |    |    |
| $S_3 S_2$                                          | 00 | 01 | 11 | 10 |
|                                                    | 00 | 1  | 1  | 1  |
|                                                    | 01 | 1  | 1  | 1  |
|                                                    | 11 | 1  | 1  | 1  |
|                                                    | 10 | 1  | 1  | 1  |
|                                                    | 00 | 1  | 1  | 1  |
|                                                    | 01 | 1  | 1  | 1  |
|                                                    | 11 | 1  | 1  | 1  |
|                                                    | 10 | 1  | 1  | 1  |

$$Y : S_3' S_0 + S_3' S_2' S_1 B + S_2 S_1' B'$$

|           |           |    |    |    |    |
|-----------|-----------|----|----|----|----|
| $S_3 S_2$ | $S_1 S_0$ | 00 | 01 | 11 | 10 |
|           |           | 00 | 0  | 1  | 1  |
|           |           | 01 | 1  | 1  | 1  |
|           |           | 11 | 1  | 1  | 1  |

$$Z_0 : S_3' S_1 S_0 + S_3' S_2 S_1 + S_3' S_2 S_0'$$

$AL = S_3 S_2$  → for logical

in every carry,  $s_3 s_2$  will be AND with the carry. It ensures carry flow only for arithmetic operations

ii. @

$$\text{SPECratio} = \frac{\text{reference}}{\text{measured}}$$

geometric mean:

$$\left( \frac{9770}{x} \times \frac{10490}{527} \times \frac{12100}{586} \times \frac{20720}{109} \times \frac{7020}{470} \times \frac{6900}{275} \right)^{\frac{1}{5}} = 28.74$$

$$\Rightarrow x = 207.63$$

ii. b) 3 micro operations are needed for addition in scratchpad memory employed processor.

$$R_1 \leftarrow R_2 + R_3$$

Read :  $A \leftarrow M[010]$   
 "  $B \leftarrow M[011]$

write :  $M[001] \leftarrow A + B$

Here two operations are needed to read values from  $R_2$  and  $R_3$ . Since address is coming from single path and then choosing destination register each time, we can reduce the operation placing two address bus and separate selector line for each register.



(c) testfunc :

addi \$sp, \$sp, -16

sw \$ra, 12(\$sp)

\$s<sub>1</sub> ← var i

sw \$a<sub>2</sub>, 8(\$sp),

\$s<sub>2</sub> ← var res

sw \$a<sub>1</sub>, 4(\$sp)

sw \$a<sub>0</sub>, 0(\$sp)

addi \$s<sub>1</sub>, \$zero, 0

addi \$s<sub>2</sub>, \$zero, 0

do-while:

slt \$t<sub>0</sub>, \$a<sub>1</sub>, \$s<sub>1</sub> //  $i > b$

bneq \$t<sub>0</sub>, \$zero, L<sub>1</sub>

add \$s<sub>2</sub>, \$s<sub>2</sub>, \$a<sub>2</sub>

L<sub>1</sub>: addi \$s<sub>1</sub>, \$s<sub>1</sub>, 1

slt \$t<sub>1</sub>, \$s<sub>1</sub>, \$a<sub>0</sub>

bneq \$t<sub>1</sub>, \$zero, doWhile

add \$v0, \$52. \$zero

lw \$a\_0, 0(\$sp)  
lw \$a\_1, 4(\$sp)  
lw \$a\_2, 8(\$sp)  
lw \$ra, 12(\$sp)  
addi \$sp, \$sp, 16  
jr \$ra

12. a) three types of instruction format . It is a compromise between providing more larger addresses and constants in instructions and keeping all instructions the same length .

b) Avg clock cycle for program A;B :

$$(\text{Avg CPI})_A = \frac{2 \times 3 + 1 \times 1 + 4 \times 2 + 2 \times 4}{2+1+4+2} = 2.56$$

$$(\text{Avg CPI})_B = \frac{3 \times 3 + 2 \times 1 + 3 \times 2 + 2 \times 4}{3+2+3+2} = 2.5$$



12. c) guard bit = 0  $\rightarrow$  always truncate

- Guard and round digits, and sticky bit
- When computing result, assume there are several extra digits available for shifting and computation. This improves accuracy of computation.
- Guard digit: first extra digit/bit to the right of mantissa -- used for rounding addition results
- Round digit: second extra digit/bit to the right of mantissa -- used for rounding multiplication results
- Sticky bit: third extra digit/bit to the right of mantissa – used for resolving ties such as 0.50...00 vs. 0.50...01

12. d)



Q. (a) IEEE std wanted to ensure simplified sorting while representing floating point numbers. Neg. exponents can pose a problem to it. So, via biasing, exponents are kept positive.

(b)  $s=0$  1 exp  $\rightarrow$  7 frx  $\rightarrow$  24

$$\text{bias} : 2^6 - 1 = 63$$

(c) smallest :  $1.00\underset{24}{\dots}0 \times 2^{-63}$

$$= 1.00\dots 0 \times 2^{-62}$$

largest :  $1.\underbrace{11\dots 1}_{24} \times 2^{126-63}$

$$\approx 2.0 \times 2^{63}$$

(iv) smallest denormalized no :  $0.00\dots 1 \times 2^{-63}$   
 24 bits

$$= 1.0 \times 2^{-62-24}$$

$$= 1.0 \times 2^{-87}$$

13. (b)

(PC+4) → sign extended offset < 2

13. (c)

add.d \$f16, \$f0, \$f2

\$f0 ← a  
\$f2 ← b

sub.d \$f18, \$f16, \$f4

\$f4 ← c

c.gt.d \$f18, \$zero

\$f8 ← w.

bc1f exit

\$f10 ← x

mul.d \$f20, \$f10, \$f12

\$f12 ← y  
\$f14 ← z

div.d \$f22, \$f20, \$f8

swc1 \$f22, 0(\$f10)

exit;



