

CAOS  
Test 1.

Mansi Uniyal.

19EE10039

①

CPI

| Q1. Clock rate | Class A | Class B | Class C | Class D. |
|----------------|---------|---------|---------|----------|
| P1 1.5 GHz     | 1       | 2       | 3       | 4        |
| P2 2.0 GHz     | 2       | 3       | 4       | 5        |

$10^6$  instructions : 10% Class A,  
20% Class B,  
50% Class C,  
20% Class D.

- (a) no. of clock cycles for execution by  
processors P1 and P2.

$\textcircled{P}_1 \rightarrow$  Instructions  $\times$  CPI  
 $= 10^6 \left[ \frac{10 \times 1 + 20 \times 2 + 50 \times 3 + 20 \times 4}{100} \right] = 2.8 \times 10^6$  clock cycles

$\textcircled{P}_2 \rightarrow$  Instructions  $\times$  CPI  
 $= 10^6 \left[ \frac{10 \times 2 + 20 \times 3 + 50 \times 4 + 20 \times 5}{100} \right] = 3.8 \times 10^6$  clock cycles

- (b) time required by P1 and P2 for execution.  
 time required =  $\frac{\text{clock cycles}}{\text{clock frequency}}$ .

$\textcircled{P}_1 \rightarrow \frac{2.8 \times 10^6}{1.5 \times 10^9} = 1.8667 \times 10^{-3} \text{ s} = 1.8667 \text{ ms}$

$\textcircled{P}_2 \rightarrow \frac{3.8 \times 10^6}{2 \times 10^9} = 1.9 \times 10^{-3} \text{ s} = 1.9 \text{ ms}$ .

(c) global CPI for  $P_1$  and  $P_2$ .

$CPI = \frac{\text{total clock cycles}}{\text{instructions}}$

$$P_1 \rightarrow \frac{2.8 \times 10^6}{10^6} = 2.8$$

$$P_2 \rightarrow \frac{3.8 \times 10^6}{10^6} = 3.8$$

- d) which implementation is faster.  
indicate speedup factor of one with other.

CPU time  $P_1 <$  CPU time  $P_2$ .

$P_1$  is faster than  $P_2$ .

$$\text{Speed up factor} = \frac{TP_2}{TP_1} = \frac{1.9}{1.8667} = 1.018$$

- Q2. Write MIPS assembly program. If (d)
- base address of  $A[n] \rightarrow \$50$   
 $B[n] \rightarrow \$51$   
 $C[n] \rightarrow \$52$  registers.

$\$53 \rightarrow$  loop index.

loop limit  $\rightarrow n \rightarrow \$54$ .

(3)

for ( $i=0$ ;  $i < n$ ;  $i++$ ) {

    if ( $A[i] == B[i]$ )  
         $C[i] = 0$ ;

    else if ( $A[i] < B[i]$ )

$C[i] = -1$ ;

    else  
         $C[i] = 1$ ;

}.

(4)

addi \$s5,\$zero,1  
 addi \$s6,\$zero,-1  
 addi \$s3,\$zero,\$zero # initialize i=1  
 loop:  
 slt \$t0,\$s3,\$s4 # t0=0 if ( $i \geq n$ )  
 beq \$t0,\$zero,exit # go to exit if ( $i \geq n$ )  
 sll \$t1,\$s3,2 #  $t1 = i * 4$   
 add \$t2,\$s0,\$t1 #  $t2 = A + (i * 4)$   
 add \$t3,\$s1,\$t1 #  $t3 = B + (i * 4)$   
 lw \$t4,0(\$t2) #  $t4 = A[i]$   
 lw \$t5,0(\$t3) #  $t5 = B[i]$   
 add \$t2,\$s2,\$t1 #  $t2 = C + (i * 4)$   
 bne \$t4,\$t5,else1 # go to else1 if ( $A[i] \neq B[i]$ )  
 sw \$zero,0(\$t2) #  $C[i] = 0$   
 j end # jump to end

else1:  
 slt \$t0,\$t4,\$t5 # t0=0 if ( $A[i] < B[i]$ )  
 beq \$t0,\$zero,else2 # go to else2 if ( $A[i] \geq B[i]$ )  
 sw \$s6,0(\$t2) #  $C[i] = -1$   
 j end # jump to end

else2:  
 sw \$s5,0(\$t2) #  $C[i] = 1$

end:  
 addi \$s3,\$s3,1 #  $i = i + 1$   
 j loop # jump to loop  
 exit # exit

(5)

- Q3. reconstruct C function from MIPS code.  
describe what the function does.

mipsfun:

addi \$t0, \$a2, 1

loop:

slt \$t3, \$a1, \$t0

addi \$t4, \$0, 1

beq \$t3, \$t4, exit

add \$t1, \$t0, \$t0

add \$t1, \$t1, \$t1

add \$t1, \$t1, \$a0

lw \$t2, 0(\$t1)

addi \$t1, \$t1, -4

sw \$t2, 0(\$t1)

addi \$t0, \$t0, 1

j loop

exit:

jr \$ra.

It left shifts the elements of an array  
at a0 from index a2 to index a1-1

void mipsfun(int A0[], int a1, int a2) {

for (int i=a2 ; i<a1 ; i++) {

A0[i] = A0[i+1]; }

C code

(6)

Q4. perform  $77 \div 6$  by first version of division hardware. with all iterations and sequence of steps clearly.

| I  | Step                                                | Q      | D         | R.        |
|----|-----------------------------------------------------|--------|-----------|-----------|
| 0  | Initial values                                      | 0000   | 0110 0000 | 01001101  |
| 1  | $\text{rem} = \text{rem} - \text{div}$              | 0000   | 0110 0000 | 11101101  |
|    | $2a + \text{div} \text{ SLL } \theta, \theta_0 = 0$ | 0000   | 0110 0000 | 01001101  |
|    | 3. Shift div right                                  | 000000 |           |           |
| 2. | $\text{rem} = \text{rem} - \text{div}$              | 000010 | 0011 0000 | 00011101  |
|    | $2a \cdot 8U \theta, \theta_0 = 1$                  | 0001   | 0011 0000 | 00011101  |
| 3. | $\text{rem} = \text{rem} - \text{div}$              | 0001   | 0000 1100 | 000011101 |
|    | $2a$                                                | 0011   | 0001 1000 | 0000 0101 |
|    | 3                                                   | 0011   | 0000 1100 | 0000 0101 |
| 4  | $\text{rem} = \text{rem} - \text{div}$              | 0011   | 0000 1100 | 1111 1001 |
|    | $2b$                                                | 0110   | 0000 1100 | 0000 0101 |
|    | 3                                                   | 0110   | 0000 0110 | 0000 0101 |
| 5  | $\text{rem} = \text{rem} - \text{div}$              | 0110   | 0000 0110 | 1111 1111 |
|    | $2b$                                                | 1100   | 0000 0110 | 0000 0101 |
|    | 3                                                   | 1100   | 0000 0011 | 0000 0101 |

Q5.



1 → 2 load hazard, we need to add a bubble because of S2 which is loaded in 1 has to be used again in EX stage in 2.

This can be done with forwarding 1 bubble

2 → 3 S3 obtained after EX of 2<sup>nd</sup> stage is needed before EX of 3<sup>rd</sup> stage.  
This can be solved by forwarding

3 → 4 Branch reset is obtained at end of EX only after which IF of 4 is forwarded

4 → 5 Similar to 2 → 3 here. S5 is needed in 5 and calculated in 5.

Total delay 3 time slots.

## Execution time.

Q6. (i) single cycle approach  
given,

$$t_{IF} = 300 \text{ ps}$$

$$t_{ID} = 400 \text{ ps}$$

$$t_{EX} = 350 \text{ ps}$$

$$t_{MEM} = 500 \text{ ps}$$

$$t_{WB} = 100 \text{ ps}$$

$$t_e = (300 + 400 + 350 + 500 + 100) = 1650 \text{ ps}.$$

$$f_{clock} = \frac{1}{t_e} = 6.060606 \times 10^8 \text{ Hz}.$$

(ii) 5 stage pipeline without forwarding.

$$\begin{aligned} t_c &= \max(t_{IF}, t_{ID}, t_{EX}, t_{MEM}, t_{WB}) \\ &= 500 \text{ ps}. \end{aligned}$$

$$f_{clock} = \frac{1}{t_c} = \frac{1}{500 \text{ ps}} = 2 \times 10^9 \text{ Hz}$$

(iii) 5 stage pipeline with data forwarding

$$\begin{aligned} t_c &= \max(t_{ID}, t_{IF}, t_{EX}, t_{MEM}, t_{WB}) \\ &= 500 \text{ ps} \end{aligned}$$

$$\text{total time} = 12 \times 500 = 6000 \text{ ps}$$

$$f_{clock} = \frac{1}{6000 \text{ ps}} = 2 \times 10^9 \text{ Hz}$$

Q7. Data hazards occur when pipeline must be stalled because one step must wait for another to complete.

A MEM/WB merged may occur betw ou instruction in EX stage and instruction from 2 cycles ago.

An example:

add \$t1, \$t2, \$t3  
add \$t1, \$t1, \$t4  
sub \$t5, \$t5, \$t1

register \$t1 is written by both of previous instructions but only the most recent has to be used.

add \$t1, \$t2, \$t3.



add \$t1, \$t1, \$t4



sub \$t5, \$t5, \$t1



∴ The hazard can becomes:

if  $|MEM|WB.\text{RegWrite} = 1$  and  
 $MEM|WB.\text{RegRd} = ID|Ex\text{RegRS}$  and  
 $|E|MEM.\text{RegRd} \neq ID|Ex\text{RegRS}$  or  
 $Ex|MEM\text{RegWrite}$

then forward A=1

if  $|MEM|WB.\text{Regwrite} = 1$  and  
 ~~$|MEM|WB.\text{RegistersRd} = ID|Ex.\text{Reg}$~~   
and  $|Ex|MEM.\text{Registers} \neq ID|Ex\text{Reg}$   
or  $Ex|MM.\text{RegWrite}$   
then forward B=1.

Q8. Decade instruction

$$r_s = 17$$

$$r_t = 20$$

$$\text{constant} = 60D$$

100011      10001

opcode      r<sub>s</sub>

10100

000000

r<sub>t</sub>

r<sub>d</sub>

01001

shamt

011000  
frt

$$600 \ll 2$$

$$= (600 \times 4)$$

$$= 240D$$

O/P of sign extend:

0000000 0000000 0000000 0000000 0000000 1001011000

O/P of  $\frac{1}{2}$  left shift

0000000 0000000 0000000 0000000 0010010101000000

(b) As it is lw instruction

ALU op = 0D

ALU control = 0010

desired instruction is add

(c) PC<sub>new</sub> = PC<sub>old</sub> + (600  $\ll$  2)

$$= PC_{old} + 240D$$

e) All inputs  $\rightarrow \$rs, \$rt$   
Both are 1000

f) register unit  $rs \rightarrow 17$ .  
 $rt \rightarrow 20$   
 $rd \rightarrow 0$

Q9. 2way associative cache design with 32 bit address.

(a) hardware implementation.

Tag: 31-10  
Index: 9-4



$$\begin{aligned}
 \text{(b) Cache block size} &= 2^{\text{offset}} \text{ bytes} \\
 &= 2^4 = 16 \text{ bytes} \\
 &= 4 \text{ words}
 \end{aligned}$$

$$\begin{aligned}
 \text{(c) no. of blocks} &= 2^{\text{no. of sets} \times \text{associations}} \\
 &= 2^6 \times 2 = 2^7 \\
 &= 128
 \end{aligned}$$

$$\begin{aligned}
 \text{no. of entries} &= \text{block size} \times \text{no. of blocks} \\
 &= 16 \times 128 \\
 &= \underline{512 \text{ words}}
 \end{aligned}$$

(d) Data storage supported by cache - 512 words

$$= 2^9 \times 2^2 \text{ bytes} = 2^11$$

$$= \underline{2 \text{ KiB}}$$

(e) Total overhead involved

$$= (\text{size of valid set} + \text{tag size}) \times \text{no. of blocks}$$

$$= (1 + 2^2) \times 128 \text{ bytes/bits}$$

$$= 2^9 \times 4 \text{ bits}$$

$$= \underline{368 \text{ Bytes}}$$

(f) (i) Block address = reference address  
Block size in bytes.

No. of blocks in Cache =  $\frac{\text{line}}{\text{block ID}}$

| Address | Line ID | Hit/Miss | Replace (Y/N) |
|---------|---------|----------|---------------|
| 160     | 10      | Miss     | N             |
| 200     | 12      | Miss     | N             |
| 172     | 10      | Hit      | N             |
| 1188    | 10      | Miss     | N             |
| 1224    | 12      | Hit      | N             |
| 2216    | 10      | Hit      | Y             |
| 2248    | 12      | Miss     | Y             |
| 4136    | 2       | Miss     | N             |
| 760     | 15      | Miss     | N             |
| 3268    | 10      | Hit      | N             |

(ii) no. of misses observed = 6

(iii) no. of block replacements observed = 2

(iv) Hit ratio =  $\frac{\text{no. of hits}}{\text{total}} = \frac{4}{10} = 0.4$ .