

①

Q1) (i) bge X6, X5, ELSE branch if  $h \geq g$   
addi X5, X5, 1  
add X7, X0, X0  
beg X7, X0, STOP  
ELSE: addi X6, X6, -1  
END

(ii) START: subt X6, X5, ELSE branch if  $h < g$ .  
andi X6, X6, 0  
beg X6, X0, STOP  
ELSE : andi X5, X5, 0  
STOP

(2)

Q2. swap contents of  $x_5, x_6$

eg:  $x_5 : 1 \quad x_6 : 2$ .

add  $x_5, x_5, x_6$  //  $x_5 : 3$ .

sub  $x_6, x_5, x_6$  //  $x_6 : 1 \rightarrow$  value of  $x_5$  now in  $x_6$ .

sub  $x_5, x_5, x_6$  //  $x_5 : 2 \rightarrow$  value of  $x_6$  now in  $x_6$

STOP

Here, the values of  $x_5$  and  $x_6$  have been swapped without using a 3rd register.

(3)

3.

3.1) addi:

- Imem to fetch instruction : 140
- Read registers  $x_2$  : 140 ( $70 \times 2$ )
- Write register : 60

→ addi reads

2 registers,  
writes in 1

340 pJ

for both single and 5-stage  
pipeline

3.2) ld:

- Imem to read instruction : 140
- Read registers  $x_2$  : 140 ( $70 \times 2$ )
- Read D-Mem Read : 140
- Registers write : 600A

→ ld reads from

memory as  
well as registers

480 pJ

3.3) beg:

- Imem to fetch instruction : 140

→ beg does not

- Read registers : 140 ( $70 \times 2$ )

writes to registers

280 pT

as memory,

it doesn't read from

memory either.

(4)

4) add  $x_1, x_2, x_3$       IF      ID      ALU1      ALU2      MEM      WB  
 add  $x_5, x_4, \textcircled{x}_1$       X      X      X      IF      ID      ...

If no forwarding is used, 3 NOPs will have to be inserted after instruction 1 to avoid data hazards.

4.1. Here, forwarding from ALU2 from the 1<sup>st</sup> addi instruction to the 2<sup>nd</sup> ALU1 of the 2<sup>nd</sup> addi instruction can be used to reduce the number of stalls and NOPs.

add  $x_1, x_2, x_3$ :      IF      ID      ALU1      ALU2      MEM      WB  
 X      IF      ID  $\rightarrow$  ALU1      ALU2      MEM      WB

Here, the 'X' indicates a bubble.

∴ The <sup>code</sup> instruction with full forwarding ( $ALU2 \rightarrow ALU1$ ) becomes:

add  $x_1, x_2, x_3$

nop

add  $x_5, x_4, x_1$

The initial code shows an EX to 1 data hazard.

(5)

4.2

CC0 CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8

add x1, x2, x3      IF    ID    ALU1    ALU2    MEM    WB

nop

add x5, x4, x1

add x5, x4, x1

IF    ID    ALU1    ALU2    MEM    WB

ALU2 if forwarding the  
result to ALU1

⑥

- 5> addi \$0, zero, 5      ① // 1  
 L1: bge zero, \$0, DONE      ② // 6  
 addi \$0, \$0 addi \$0, \$0, -1      ③ // 5  
 j L1      ④ // 5  
 DONE      // 1



Iterations :    \$0 : 5    \$0 : 4    \$0 : 3    \$0 : 2    \$0 : 1  
                \$0 : 4    \$0 : 3    \$0 : 2    \$0 : 1    \$0 : 0

Here, the loop will run 5 times  
 $\therefore$  number of cycles :

addi will run 1 time

L1 : will run 6 times

addi will run 5 times

j will run 5 times

END : 1 time

18 instructions

(7)

Ex: Assuming no data hazards and no structural hazards, the program will run in ~~18 cycles~~

$K + n - 1$  cycles

K: no. of stages

n: no. of instructions

$$\therefore S + \frac{18}{n-1}$$

$$\text{Given } K = 4 \text{ and } n = 18 \text{ instructions} \\ \therefore S + \frac{18}{18-1} = 18 \text{ cycles} \quad \underline{\underline{22 \text{ cycles}}}$$

$\therefore \text{CPI time} = \frac{\# \text{cycles}}{\# \text{instructions}}$

$$\text{CPI time} = \frac{18}{18} = 1.2 \quad \frac{22}{18} = 1.22$$

$$W \rightarrow M \quad J \quad J \quad J \quad J$$

$$W \rightarrow M \quad J \quad J \quad J \quad J$$

$$W \rightarrow M \quad J \quad J \quad J \quad J$$

$$W \rightarrow M \quad J \quad J \quad J \quad J$$

Time required = 12 cycles  $\times$  3 sec = 36 sec

Program completion time = 36 sec

(6)

- 6) xor \$1, \$2, \$3 ①  
 addi \$0, \$3, -4 ②  
 lw \$3, 16(\$2) ③  
 sw \$4, 20(\$1) ④  
 or t2, \$0, \$1 ⑤

Assuming that the hazard detection supports full forwarding and can stall instructions:

- There are no data hazards in these instructions.

As after xor, there are 2 instructions before the sw instruction and similarly, after the addi instruction, there are 2 instructions before the or instruction.

CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9



In cycle 5, register \$1 is being written into  
 $s1 = s2 + s3$

Here, the register that is being read is also \$1

Hence As the both read and write are happening in the same register, this is a

(9)

## 6) structural hazard

i. A ~~loop with~~ A ~~loop~~ will have to be inserted before instruction

(4)

→ This will resolve the structural hazard taking place in cycle 5 and 6.

ii. New code:

xor \$1, \$2, \$3

addi \$0, \$3, -4

lw \$3, 16(\$7)

nop

sw \$4, 20(\$1)

or \$1, \$0, \$1

10

7) 7.1 or  $X_{13}, X_{12}, X_{11}$

1d  $X_{10}, O(X_{13})$

EX to 1  $\rightarrow X_{13}$

1d  $X_{11}, S(X_{13})$

$X_{11}$ : MEM to 1

add  $X_{12}, X_{10}, X_{11}$

$X_{12}$ : MEM to 2

subi  $X_{13}, X_{12}, 16$

$X_{12}$ : EX to 1

∴ Resolution :

or  $X_{13}, X_{12}, X_{11}$

IF ID E M  $\boxed{X_{13}}$  + L

nop

nop

1d  $X_{10}, O(X_{13})$

$X_{10}$ : EX IF ID E M  $\boxed{X_{13}}$

1d  $X_{11}, S(X_{13})$

$X_{11}$ : EX IF ID E M  $\boxed{X_{13}}$

nop

nop

add  $X_{12}, X_{10}, X_{11}$

$X_{12}$ : EX IF ID E M  $\boxed{X_{13}}$

nop

nop

subi  $X_{13}, X_{12}, 16$

$X_{12}$ : EX IF ID E M  $\boxed{X_{13}}$

The 'X' to the right denote bubbles.

As there is no forwarding and no hazard detection, 6 nops will need to be added.

(11)

|      | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9 | 10 |
|------|----|----|----|----|----|----|----|----|---|----|
| or   | IF | ID | E  | M  | W  |    |    |    |   |    |
| ld   |    | IF | ID | E  | M  | W  |    |    |   |    |
| ld   |    |    | IF | ID | E  | M  | W  |    |   |    |
| nop  |    |    |    | IF | ID | E  | M  | W  |   |    |
| add  |    |    |    |    | IF | ID | E  | M  | W |    |
| add  |    |    |    |    |    | IF | ID | E  | M | W  |
| subi |    |    |    |    |    |    | IF | ID | E | M  |

E: EX , M : MEMEN , W: WB , The arrows show forwarding

Cycle

Forward A

Forward B

|   |   |   |   |   |   |  |  |  |  |  |
|---|---|---|---|---|---|--|--|--|--|--|
| 1 | 1 | 0 | 0 | X | X |  |  |  |  |  |
| 2 | 0 | 1 | 0 | X | X |  |  |  |  |  |
| 3 | 0 | 0 | 1 | 0 | 0 |  |  |  |  |  |
| 4 | 2 | 1 | 0 | 0 | 0 |  |  |  |  |  |
| 5 | 1 | 0 | 0 | 0 | 0 |  |  |  |  |  |
| 6 |   | X |   |   | X |  |  |  |  |  |
| 7 |   | 0 |   |   | 1 |  |  |  |  |  |
| 8 |   | 1 |   |   | 0 |  |  |  |  |  |

Here, nothing is in execute stage

(12)

Q) A

- Instructions/Programs : Here, the number of instructions will decrease as 2 addition operations can be performed at the same time. e.g. add rs1, rs1, rs2 and add rs1, rs1, rs3 can be performed in a single instruction.
- CPI : Here, number of cycles required will remain the same but the number of instructions will decrease. Hence, CPI will increase.
- Circuit complexity : Circuit complexity will increase as we need to read an extra register in this case.

B

- Instructions/Programs : Here, number of instructions will decrease as we will not have to perform  $PC = PC + 4$  separately. The PC will point to the next instruction after ~~automa~~ execution in ALU.
- CPI : Here, number of cycles will remain the same but no of instructions will decrease. Hence, CPI will increase.
- Circuit complexity : It will decrease as the address for PC will not be required.

C

- Instructions/programs : No change as changing the numbers will not affect the instruction size.
- CPI : no change as increasing the number of register will have no effect of number of cycles, neither number of instructions.
- Circuit complexity : This will increase as more registers have been added to the circuit.