

# Assignment 4

Due Date : April 26, 2021 (day)

Name : 易 翔  
 Student ID : 11912013  
 Score : \_\_\_\_\_

**Problem 1 (4.8) Score:** \_\_\_\_\_

**Solution:**
1. In the pipelined processor, the clock cycle is $\max\{250, 350, 150, 300, 200\} = 350\text{ps}$. The clock cycle of the non-pipelined processor is $250 + 350 + 150 + 300 + 200 = 1250\text{ps}$.
2. In the pipelined processor, the latency of an instruction is $5 \times \text{clock cycle} = 5 \times 350 = 1750\text{ps}$, since there are 5 stages. The latency in the non-pipelined processor is $1 \times 1250 = 1250\text{ps}$.
3. The **ID** stage should be split because it has the longest latency. The new clock cycle of the pipelined processor is $\max\{250, \frac{350}{2}, 150, 300, 200\} = 300\text{ps}$, which is now set by the **MEM** stage.
4. Assume there are no stalls or hazards. The only instructions that use the data memory are **lw** and **sw**, so the utilization of the data memory is $20\% + 15\% = 35\%$.
5. Assume there are no stalls or hazards. The only instructions that use the write-register port of the Registers unit are **lw** and **alu**, so the utilization of the write-register port is $20\% + 45\% = 65\%$.
6. First, consider the cycle time of each organization:

| pipelined processor | non-pipelined processor | multi-cycle organization |
|---------------------|-------------------------|--------------------------|
| 350ps               | 1250ps                  | 350ps                    |

Now consider the multi-cycle organization. **lw** takes 5 cycles; **sw** and **beq** take 4 cycles because they skip **WB**; and **ALU** instructions take 4 cycles because they skip **MEM**. For the execution time, we consider only the ratio relative to the pipelined processor:

| pipelined processor | non-pipelined processor                     | multi-cycle organization                                                         |
|---------------------|---------------------------------------------|----------------------------------------------------------------------------------|
| 1                   | $\frac{1250\text{ps}}{350\text{ps}} = 3.57$ | $\frac{(4 \times 80\% + 5 \times 20\%) \times 350\text{ps}}{350\text{ps}} = 4.2$ |
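
The arithmetic in parts 1 to 6 can be checked with a short script. This is only a sketch: the stage names are paired with the latencies in the usual IF, ID, EX, MEM, WB order, the 20% share for **beq** is inferred as the remainder of the instruction mix, and the function names are ours.

```python
# Stage latencies in ps, in the order used in the solution (assumed IF..WB).
STAGE_PS = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}
# Instruction mix in percent; the beq share is inferred as 100 - 45 - 20 - 15.
MIX_PCT = {"alu": 45, "beq": 20, "lw": 20, "sw": 15}
# Cycles per instruction class in the multi-cycle organization (part 6).
MULTI_CYCLE_COUNT = {"alu": 4, "beq": 4, "lw": 5, "sw": 4}


def pipelined_cycle(stages):
    """The pipelined clock is set by the slowest stage."""
    return max(stages.values())


def single_cycle(stages):
    """The non-pipelined clock must cover all stages back to back."""
    return sum(stages.values())


def split_longest(stages):
    """Return new stage latencies after splitting the slowest stage in half."""
    name = max(stages, key=stages.get)
    rest = {k: v for k, v in stages.items() if k != name}
    half = stages[name] / 2
    rest[name + "a"] = half
    rest[name + "b"] = half
    return rest


def utilization(mix, users):
    """Percentage of cycles in which a unit is used, with no stalls."""
    return sum(mix[i] for i in users)


def multi_cycle_cpi(mix, cycles):
    """Average cycles per instruction, weighted by the instruction mix."""
    return sum(mix[i] * cycles[i] for i in mix) / 100


pipe = pipelined_cycle(STAGE_PS)
single = single_cycle(STAGE_PS)
print(pipe, single)                                   # parts 1: 350 1250
print(len(STAGE_PS) * pipe, single)                   # part 2: 1750 1250
print(pipelined_cycle(split_longest(STAGE_PS)))       # part 3
print(utilization(MIX_PCT, ["lw", "sw"]))             # part 4: 35
print(utilization(MIX_PCT, ["lw", "alu"]))            # part 5: 65
print(round(single / pipe, 2))                        # part 6, non-pipelined ratio
print(multi_cycle_cpi(MIX_PCT, MULTI_CYCLE_COUNT))    # part 6, multi-cycle ratio
```

Because the multi-cycle organization uses the same 350ps clock as the pipeline, its average cycle count per instruction is directly the execution-time ratio.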

□

**Problem 2 (4.9) Score:** \_\_\_\_\_

**Solution:**

| Time → | 1  | 2  | 3  | 4   | 5   | 6   | 7  |
|--------|----|----|----|-----|-----|-----|----|
| I1     | IF | ID | EX | MEM | WB  |     |    |
| I2     |    | IF | ID | EX  | MEM | WB  |    |
| I3     |    |    | IF | ID  | EX  | MEM | WB |

1. From the figure above, I2 and I3 both read r1, which is written by I1, so both depend on the result of I1. A hazard occurs if the register is read before it has been written. There is a further dependence on r1 between I2 and I3: I2 reads r1 and I3 later writes it, so the write must not take effect before the read.
- Similarly, there is a dependence from I2 to I3 on r2, since the value of r2 read by I3 is the result of I2.

2. The new instruction sequence is:

```

or r1, r2, r3   # I1: writes r1
nop
nop
or r2, r1, r4   # I2: reads r1 (from I1), writes r2
nop
nop
or r1, r1, r2   # I3: reads r1 (from I1) and r2 (from I2)

```
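
The dependences from part 1 and the padded sequence from part 2 can be reproduced with a small sketch. The tuple encoding `(dest, src1, src2)` and the function names are hypothetical. The gap of two slots between a producer and its consumer assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, which is consistent with the two nops used above.

```python
# Each instruction is (dest, src1, src2); all three are `or` instructions.
PROGRAM = [("r1", "r2", "r3"), ("r2", "r1", "r4"), ("r1", "r1", "r2")]


def dependences(program):
    """List (kind, register, earlier, later) for every pair of instructions.

    This is a simple pairwise check. It does not look for an intervening
    write between the two instructions, which is fine for this short sequence.
    """
    found = []
    for j, (dst_j, *srcs_j) in enumerate(program):
        for i in range(j):
            dst_i, *srcs_i = program[i]
            if dst_i in srcs_j:
                found.append(("RAW", dst_i, i + 1, j + 1))
            if dst_j in srcs_i:
                found.append(("WAR", dst_j, i + 1, j + 1))
            if dst_i == dst_j:
                found.append(("WAW", dst_i, i + 1, j + 1))
    return found


def insert_nops(program, gap=2):
    """Pad with nops so every reader issues at least gap + 1 slots after
    the most recent writer of each register it reads (assumed: no forwarding)."""
    out = []         # scheduled sequence: instruction tuples or "nop"
    last_write = {}  # register -> position in `out` of its latest writer
    for dst, *srcs in program:
        need = 0
        for s in srcs:
            if s in last_write:
                earliest = last_write[s] + gap + 1
                need = max(need, earliest - len(out))
        out.extend(["nop"] * need)
        last_write[dst] = len(out)
        out.append((dst, *srcs))
    return out


for dep in dependences(PROGRAM):
    print(dep)
for slot in insert_nops(PROGRAM):
    print(slot)
```

Only the RAW entries can cause hazards in this in-order five-stage pipeline; the WAR and WAW entries are reported for completeness.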

3. With full forwarding, every hazard in this sequence is resolved by forwarding, so no nop needs to be added to the sequence.
4. Without forwarding, the execution time is $(5 + (3 - 1) + 4) \times 250\text{ps} = 2750\text{ps}$. With full forwarding, the execution time is $7 \times 300\text{ps} = 2100\text{ps}$. The speedup is $\frac{2750\text{ps}}{2100\text{ps}} \approx 1.31$.

5. With ALU-ALU forwarding, an ALU result can be forwarded to the next instruction: r1 from I1 to I2, and r2 from I2 to I3. So no nop needs to be added to the sequence.
6. Without forwarding, the execution time is $(5 + (3 - 1) + 4) \times 250\text{ps} = 2750\text{ps}$. With ALU-ALU forwarding only, the execution time is $7 \times 290\text{ps} = 2030\text{ps}$. The speedup is $\frac{2750\text{ps}}{2030\text{ps}} \approx 1.35$.
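
The execution times in parts 4 and 6 follow from one formula: a sequence that occupies $n$ issue slots in a 5-stage pipeline finishes after $5 + (n - 1)$ cycles. A minimal sketch, with the function name `exec_time_ps` chosen here for illustration:

```python
STAGES = 5  # depth of the pipeline


def exec_time_ps(slots, cycle_ps, stages=STAGES):
    """Time for `slots` back-to-back issue slots (instructions plus nops)."""
    cycles = stages + (slots - 1)
    return cycles * cycle_ps


no_fwd = exec_time_ps(3 + 4, 250)   # 3 instructions + 4 nops, 250ps clock
full_fwd = exec_time_ps(3, 300)     # no nops, 300ps clock
alu_fwd = exec_time_ps(3, 290)      # no nops, 290ps clock

print(no_fwd, full_fwd, alu_fwd)    # 2750 2100 2030
print(round(no_fwd / full_fwd, 2))  # 1.31
print(round(no_fwd / alu_fwd, 2))   # 1.35
```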

□