

## IT5002 Computer Systems and Applications

### Tutorial 4

1. Suppose the four stages of a 4-stage pipeline take 2ns, 3ns, 4ns, and 2ns respectively. Given 1000 instructions, what is the speedup (to two decimal places) of the pipelined processor over the non-pipelined single-cycle processor?
2. Let's try to understand the pipelined processor by doing a detailed trace. Suppose the pipeline registers (also known as pipeline latches) store the following information:

| IF / ID           |   | ID / EX  |   | EX / MEM |   | MEM / WB |   |
|-------------------|---|----------|---|----------|---|----------|---|
| No Control Signal |   | MToR     |   | MToR     |   | MToR     |   |
| PC+4              |   | RegWr    |   | RegWr    |   | RegWr    |   |
| OpCode            |   | MemRd    |   | MemRd    |   | MemRes   |   |
| Rs                |   | MemWr    |   | MemWr    |   | ALURes   |   |
| Rt                |   | Branch   |   | Branch   |   | DstRNum  |   |
| Rd                |   | RegDst   |   | BrcTgt   |   |          |   |
| Funct             |   | ALUsrc   |   | isZero?  |   |          |   |
| Imm(16)           |   | ALUop    |   | ALURes   |   |          |   |
|                   |   | PC+4     |   | ALUOpr 2 |   |          |   |
|                   |   | ALUOpr 1 |   | DstRNum  |   |          |   |
|                   |   | ALUOpr 2 |   |          |   |          |   |
|                   |   | Rt       |   |          |   |          |   |
|                   |   | Rd       |   |          |   |          |   |
|                   |   | Imm(32)  |   |          |   |          |   |

Show the progress of the following instructions through the pipeline stages by filling in the content of pipeline registers.

- i. `0x8df80000 # lw $24, 0($15)`      `#Inst.Addr = 0x100`
- ii. `0x1023000C # beq $1, $3, 12`      `#Inst.Addr = 0x100`
- iii. `0x0285c822 # sub $25, $20, $5`      `#Inst.Addr = 0x100`

Assume that registers 1 to 31 have each been initialized to a value equal to $101 + \text{its register number}$, i.e. $[\$1] = 102$, $[\$31] = 132$, etc. You can put "X" in fields that are irrelevant for that instruction. Note that in reality these fields are still generated, but they are not used.
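To fill in the IF/ID register it helps to split each 32-bit instruction word into its fields. The following is a minimal sketch, assuming the standard MIPS R-format and I-format bit layouts; the function name `decode_mips` and the dictionary keys are illustrative and not part of the tutorial.

```python
def decode_mips(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its raw bit fields.

    All fields are extracted regardless of format. For an I-format
    instruction (e.g. lw, beq) only opcode, rs, rt and imm16 are meaningful;
    for an R-format instruction (e.g. sub) imm16 is not meaningful.
    """
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11
        "funct": word & 0x3F,           # bits 5..0
        "imm16": word & 0xFFFF,         # bits 15..0
    }


if __name__ == "__main__":
    for word in (0x8DF80000, 0x1023000C, 0x0285C822):
        fields = decode_mips(word)
        print(f"{word:#010x}", {name: hex(value) for name, value in fields.items()})
```

Fields that the instruction format does not use are the ones marked "X" in the tables; the hardware still extracts those bits, which is why the sketch returns them anyway.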

Part (i) has been worked out for you.

- i. `0x8df80000 # lw $24, 0($15)`      `#Inst.Addr = 0x100`

| IF / ID           |       | ID / EX  |       | EX / MEM |      | MEM / WB |          |
|-------------------|-------|----------|-------|----------|------|----------|----------|
| No Control Signal |       | MToR     | 1     | MToR     | 1    | MToR     | 1        |
| PC+4              | 0x104 | RegWr    | 1     | RegWr    | 1    | RegWr    | 1        |
| OpCode            | 0x23  | MemRd    | 1     | MemRd    | 1    | MemRes   | Mem(116) |
| Rs                | \$15  | MemWr    | 0     | MemWr    | 0    | ALURes   | X        |
| Rt                | \$24  | Branch   | 0     | Branch   | 0    | DstRNum  | \$24     |
| Rd                | X     | RegDst   | 0     | BrcTgt   | X    |          |          |
| Funct             | X     | ALUsrc   | 1     | isZero?  | X    |          |          |
| Imm(16)           | 0     | ALUop    | 00    | ALURes   | 116  |          |          |
|                   |       | PC+4     | 0x104 | ALUOpr 2 | X    |          |          |
|                   |       | ALUOpr 1 | 116   | DstRNum  | \$24 |          |          |
|                   |       | ALUOpr 2 | X     |          |      |          |          |
|                   |       | Rt       | \$24  |          |      |          |          |
|                   |       | Rd       | X     |          |      |          |          |
|                   |       | Imm(32)  | 0     |          |      |          |          |
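The data values in the worked answer for part (i) can be re-derived with a few lines of arithmetic. This is only a sketch under the tutorial's stated assumption that $[\$n] = 101 + n$; the helper names `REGS`, `sign_extend16` and `trace_lw` are hypothetical, and only the data fields of each pipeline register are modelled (control signals are omitted).

```python
# Tutorial assumption: registers 1..31 hold 101 + register number; $0 reads as 0.
REGS = [0] + [101 + n for n in range(1, 32)]


def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit immediate as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def trace_lw(inst_addr: int, rs: int, rt: int, imm16: int) -> dict:
    """Return the data fields latched by each pipeline register for `lw rt, imm(rs)`."""
    pc_plus_4 = inst_addr + 4
    alu_opr1 = REGS[rs]            # base address read from the register file
    imm32 = sign_extend16(imm16)   # offset after sign extension
    alu_res = alu_opr1 + imm32     # effective memory address
    return {
        "IF/ID": {"PC+4": pc_plus_4, "OpCode": 0x23, "Rs": rs, "Rt": rt, "Imm(16)": imm16},
        "ID/EX": {"PC+4": pc_plus_4, "ALUOpr1": alu_opr1, "Rt": rt, "Imm(32)": imm32},
        "EX/MEM": {"ALURes": alu_res, "DstRNum": rt},
        "MEM/WB": {"MemRes": f"Mem({alu_res})", "DstRNum": rt},
    }


if __name__ == "__main__":
    for stage, fields in trace_lw(0x100, rs=15, rt=24, imm16=0).items():
        print(stage, fields)
```

For `lw $24, 0($15)` at address 0x100 this reproduces PC+4 = 0x104, ALUOpr 1 = 116, ALURes = 116 and MemRes = Mem(116).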

3. Given the following three formulas (See Lecture #9, Section 5 Performance):

$$CT_{seq} = \sum_{k=1}^N T_k$$

$$CT_{pipeline} = \max(T_k) + T_d$$

$$Speedup_{pipeline} = \frac{CT_{seq} \times InstNum}{CT_{pipeline} \times (N + InstNum - 1)}$$

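Here $T_k$ is the time taken by stage $k$, $N$ is the number of stages, $T_d$ is the latency of a pipeline register, and $InstNum$ is the number of instructions executed. For each of the following processor parameters, calculate $CT_{seq}$, $CT_{pipeline}$ and $Speedup_{pipeline}$ (to two decimal places) for 10 instructions and for 10 million instructions.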

|    | Stages Timing (for 5 stages, in ps)   | Latency of pipeline register (in ps) |
|----|---------------------------------------|--------------------------------------|
| a. | 300, 100, 200, 300, 100 (slow memory) | 0                                    |
| b. | 200, 200, 200, 200, 200               | 40                                   |
| c. | 200, 200, 200, 200, 200 (ideal)       | 0                                    |
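The three formulas translate directly into code, which is handy for checking hand calculations. The sketch below is one possible transcription; the function names `cycle_times` and `speedup` are made up for illustration, and the sample numbers (a 3-stage pipeline with stages of 10, 20 and 30 ps and a 5 ps register latency) are deliberately not taken from the table above.

```python
def cycle_times(stages, latch=0):
    """Return (CT_seq, CT_pipeline) for the given stage timings and register latency."""
    ct_seq = sum(stages)            # single-cycle: one long cycle covering every stage
    ct_pipeline = max(stages) + latch  # pipelined: slowest stage plus register latency
    return ct_seq, ct_pipeline


def speedup(stages, inst_num, latch=0):
    """Speedup of the pipelined processor over the single-cycle one."""
    ct_seq, ct_pipeline = cycle_times(stages, latch)
    n = len(stages)
    # inst_num instructions need (n + inst_num - 1) pipeline cycles to drain completely.
    return (ct_seq * inst_num) / (ct_pipeline * (n + inst_num - 1))


if __name__ == "__main__":
    # Illustrative values only.
    stages = [10, 20, 30]
    print(cycle_times(stages, latch=5))             # (60, 35)
    print(f"{speedup(stages, 100, latch=5):.2f}")   # 1.68
```

As `inst_num` grows, the `(n + inst_num - 1)` term is dominated by `inst_num`, so the speedup approaches `ct_seq / ct_pipeline`; it reaches the number of stages only when all stages are balanced and the register latency is zero.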