

**Hacettepe University**  
**Department of Computer Engineering**  
**BBM234 Computer Organization – Spring 2021**  
**Homework 2 Solutions**

**Assigned date: 04.05.2021      Due: 16.5.2021 at 23:59:59**  
**Submit a single (zipped) PDF file via submit.cs.hacettepe.edu.tr**

**Q1.** We would like to add **bne** instruction to the single cycle architecture given below. **bne** instruction is a branch instruction and it loads branch target address (BTA) to the PC (PC=BTA) if [rs] != [rt].

- Show the necessary changes on the data-path in Figure 1 and explain your changes. Your new architecture should be able to execute both **beq** and **bne** instructions. [12]
- Fill the control signals in Table I. Add new control signal/signals to the table if necessary. [8]

a)



Figure 1: Single cycle processor

| Branch | Zero | Bne | PCsrc |
|--------|------|-----|-------|
| 0      | 0    | 0   | 0     |
| 0      | 0    | 1   | 0     |
| 0      | 1    | 0   | 0     |
| 0      | 1    | 1   | 0     |
| 1      | 0    | 0   | 0     |
| 1      | 0    | 1   | 1     |
| 1      | 1    | 0   | 1     |
| 1      | 1    | 1   | 0     |

We have to determine PCsrc signal function by using Branch, zero and newly added **bne** control signal. The truth table is given below. If PCsrc = 1 then PC=BTA. The new control logic function is drawn above.

**PCsrc=(Zero XOR bne) AND Branch**

b)

Table I: **bne** control signals

| Inst. | Op <sub>31:26</sub> | RegWrite | RegDst   | AluSrc   | Branch   | MemWrite | MemtoReg | ALUOp <sub>1:0</sub> | Bne      |
|-------|---------------------|----------|----------|----------|----------|----------|----------|----------------------|----------|
| bne   | <b>000101</b>       | <b>0</b> | <b>X</b> | <b>0</b> | <b>1</b> | <b>0</b> | <b>X</b> | <b>01 (Subtract)</b> | <b>1</b> |

**Q2.** You are given the following MIPS code. You have 5 stage pipelined MIPS processor running the code.

- a) If there is no forwarding unit in the MIPS processor, **insert enough nop's between the instructions** to have correct execution of the code. How many cycles does it take to execute all instructions? Show your calculations or explain how you found the cycle time.

#### # MIPS assembly code

```
lw $s0, 0($0)
lw $s1, 4($0)
nop
nop (the value of s1 is updated in the fourth cycle. Since add uses s1, we need 2 nops)
add $t0, $s0, $s1
or $t1, $s2, $s3
nop
nop (the value of t1 is updated in the fourth cycle. Since add uses t1, we need 2 nops)
and $t1, $t1, $t0
```

There are 9 instructions with nops. That takes 9 fetch cycles to fetch instructions. To finish the execution of last instruction, we need 4 more cycles. **The total of 13 cycles.**

- b) If there is a forwarding unit in the MIPS processor, insert enough nop's between the given instructions to have correct execution of the code. How many cycles does it take to execute all instructions? Show your calculations or explain how you found the cycle time.

```
lw $s0, 0($0)
lw $s1, 4($0)
nop (because of data forwarding, only one nop is necessary)
add $t0, $s0, $s1
or $t1, $s2, $s3
and $t1, $t1, $t0
```

There are 6 instructions with nops. That takes 6 fetch cycles to fetch instructions. To finish the execution of last instruction, we need 4 more cycles. **The total of 10 cycles.**

- c) If there is data hazard unit (data forwarding and stalling hardware) in the MIPS processor, rearrange the given code if possible to minimize the clock cycles and your code still executes correctly. How many cycles does it take to execute all instructions? Show your calculations or explain how you found the cycle time.

Or instruction is independent of the previous instructions. We can move it up, after the second lw.

```
lw $s0, 0($0)
lw $s1, 4($0)
or $t1, $s2, $s3
add $t0, $s0, $s1
and $t1, $t1, $t0
```

There are 5 instructions. That takes 5 fetch cycles to fetch instructions. To finish the execution of last instruction, we need 4 more cycles. **The total of 9 cycles.**

**Q3.** The distribution of the instructions for the program with one billion ( $10^9$ ) instructions is given below.

40% load,    10% store,    10% branch,    10% jump,    30% R-type

Suppose

- 50% load instruction results are used by the next instruction.
- 50% R-type instruction results are used by the next instruction.
- 50% branches are taken (i.e., mispredicted).
- All jumps flush the next instruction.

a) How many clock cycles does this program take in a single cycle MIPS processor?

$$\begin{aligned} \text{CPI (Single Cycle)} &= 1 \\ \text{CPU\_Time (SC)} &= 10^9 \times 1 = 10^9 \text{ cc} \end{aligned}$$

b) How many clock cycles does it take on a pipelined MIPS processor with no hazard unit (no forwarding and no early branch resolution)?

If there is no forwarding, the dependent instructions will take 3 cc.

$$\begin{aligned} \text{CPI (lw)} &= 0.50 \times 1 + 0.50 \times 3 = 2 \\ \text{CPI (R)} &= 0.50 \times 1 + 0.50 \times 3 = 2 \end{aligned}$$

If the branch is taken, 3 instructions will be flushed. Branch will take 4 cc (the branch instruction plus three flushed ones). Otherwise, it will take only one cc. Jump instructions always take 2 cc.

$$\begin{aligned} \text{CPI (branch)} &= 0.50 \times 1 + 0.50 \times 4 = 2.5 \\ \text{CPI (J)} &= 2 \end{aligned}$$

$$\text{CPI (avr)} = 0.40 \times 2 + 0.10 \times 1 + 0.10 \times 2.5 + 0.10 \times 2 + 0.30 \times 2 = 1.95$$

$$\text{CPU\_Time (Pipe\_no\_hazard\_unit)} = 10^9 \times 1.95 \text{ cc} = 1.95 \times 10^9$$

c) How many clock cycles does it take on a pipelined MIPS processor with hazard unit (with forwarding and early branch resolution)?

If there is forwarding and early branch resolution, the dependent instructions will take 2 cc for lw.

$$\begin{aligned} \text{CPI (lw)} &= 0.50 \times 1 + 0.50 \times 2 = 1.5 \\ \text{CPI (R)} &= 0.50 \times 1 + 0.50 \times 1 = 1 \end{aligned}$$

If the branch taken, 1 instruction will be flushed. Branch will take 2 cc (branch instruction plus one flushed instruction). Otherwise, it will take only one cc. Jump instructions always take 2 cc.

$$\begin{aligned} \text{CPI (branch)} &= 0.50 \times 1 + 0.50 \times 2 = 1.5 \\ \text{CPI (J)} &= 2 \end{aligned}$$

$$\text{CPI (avr)} = 0.40 \times 1.5 + 0.10 \times 1 + 0.10 \times 1.5 + 0.10 \times 2 + 0.30 \times 1 = 1.35$$

$$\text{CPU\_Time (Pipe\_hazard\_unit)} = 10^9 \times 1.35 \text{ cc} = 1.35 \times 10^9$$

**Q4.** The propagation delay of each stage for a CPU architecture is determined as shown in the table below. The delay of a pipeline register is found as 1ns. ( $\text{ns} = 10^{-9}$ )

| IF  | ID  | EX  | MEM | WB  |
|-----|-----|-----|-----|-----|
| 7ns | 7ns | 6ns | 9ns | 5ns |

- a) If we design a single-cycle processor, what would be the minimum clock cycle in ns?

$$\text{Clock\_cycle} = 7 + 7 + 6 + 9 + 5 = 34 \text{ ns (all stages need to complete in a single cc)}$$

- b) What would be the clock cycle of a 5 stage pipelined processor?

The longest stage is the MEM stage, so it decides the clock cycle time, as each stage needs to be able to complete successfully during a clock cycle. Additionally, we must add the delay of the pipeline register to the MEM's delay. Then:

$$\text{Clock\_cycle} = 9 + 1 = 10 \text{ ns}$$

- c) We would like to decrease the clock cycle of pipelined processor and we decided to divide one of the stages into two stages. Thus, the new pipelined processor will have 6 stages. Which stage would you divide to two? What would be the new clock cycle?

To reduce the clock period, we must divide the longest stage which is the MEM stage. Then, the longest stage becomes 7 ns (both IF and ID). So, the new clock cycle will be:

$$\text{Clock\_cycle} = 7 + 1 = 8 \text{ ns}$$

- d) We would like run 1000 instructions on above specified processors. Assume there is no data and control hazards. What are the CPU times of these 1000 instruction on these three processors?

$$\text{CPU time} = \text{Instruction count (# of instructions)} \times \text{CPI} \times \text{clock cycle time}$$

$$\text{CPU Time (Single Cycle)} = 1000 \times 1 \times 34 \cdot 10^{-9} = 34 \cdot 10^{-6} \text{ sn}$$

$$\text{CPU Time (five-stage pipelined)} = 1000 \times 1004/1000 \times 10 \cdot 10^{-9} = 10.04 \times 10^{-6} \text{ sn}$$

$$\text{CPU Time (six-stage pipelined)} = 1000 \times 1005/1000 \times 8 \cdot 10^{-9} = 8.04 \times 10^{-6} \text{ sn}$$

**Q5.** The MIPS program below is executed on the pipelined architecture given below. At a time instant, we observe that SrcAE=0x0000 0005.



a) At the same moment, write down the instructions in the following phases:

| Stage     | Instruction                 |
|-----------|-----------------------------|
| Fetch     | <b>and \$s5, \$t2, \$t3</b> |
| Decode    | <b>sub \$s4, \$t3, \$t4</b> |
| Execute   | <b>add \$s3, \$t1, \$t2</b> |
| Memory    | <b>lw \$s2, 40(\$0)</b>     |
| Writeback | <b>addi \$t4, \$0, 5</b>    |

b) At the same moment, what are the values of the following signals?

| Signal    | Value                                        |
|-----------|----------------------------------------------|
| InstrD    | <b>0x016CA022<br/>(sub \$s4, \$t3, \$t4)</b> |
| SrcBE     | <b>6 (value in \$t2)</b>                     |
| ALUOutM   | <b>40</b>                                    |
| MemWriteM | <b>0</b>                                     |
| WriteRegW | <b>0xC (12 or \$t4)</b>                      |