

Tutorial 8 (Practice Problems)  
CSE 112 Computer Organization

**Q1.** A processor with five pipeline stages (Fetch, Decode, Execute, Memory, Writeback) that implements the RV32I ISA is designed. Answer the questions below for this processor. The processor has no forwarding hardware to forward the ALU result.

- Draw the pipeline diagram for the below-mentioned assembly programs.
- Point out the Data Hazards in the below-mentioned programs.
- Write the modified code after resolving the data hazards by inserting NOP.
  - You may use **add x0,x0,x0** as a possible NOP instruction.
- Calculate the total number of cycles taken to execute the assembly code.

| Assembly Code_A | Assembly Code_B | Assembly Code_C |
|-----------------|-----------------|-----------------|
| add x1,x0,x0    | add x1,x0,x0    | add x1,x0,x0    |
| sub x3,x2,x0    | sub x3,x2,x0    | sub x3,x2,x0    |
| slt x4,x7,x8    | slt x4,x7,x8    | slt x4,x7,x1    |
| or x9,x10,x11   | sw x1,10(x1)    | sw x1,10(x1)    |
| and x12,x13,x14 | or x9,x10,x11   | or x9,x3,x11    |
| sw x1,10(x1)    | and x12,x13,x14 | and x12,x9,x14  |

**Q2. Processor's Qualitative Performance Evaluation**

- Two processors, A and B, are working on the same ISA. The frequency of processor A is higher than the frequency of processor B. For a given program, does it imply that processor A always executes more instructions per second than processor B?
- Two processors, A and B, are working on the same ISA. For a given program, A executes  $x$  instructions per second and B executes  $y$  instructions per second. If  $x > y$ , does this imply that processor A is faster than processor B?

**Q3.** What are benchmark programs, and what is the definition of MIPS (Million Instructions Per Second)?

**Q4.** Two processors, A and B, designed for two different ISAs (ISA_A and ISA_B, respectively), are being evaluated against a chosen benchmark. Processor A operates at 5 GHz and processor B operates at 4 GHz. Answer the following.

- What is the performance of both processors in terms of total execution time?

|                 | Instruction Count<br>(ISA_A) | Processor_A<br>(Execution Time) | Instruction Count<br>(ISA_B) | Processor_B<br>(Execution Time) |
|-----------------|------------------------------|---------------------------------|------------------------------|---------------------------------|
| Optimization O1 | 10000                        |                                 | 12000                        |                                 |
| Optimization O2 | 8000                         |                                 | 6500                         |                                 |
| Optimization O3 | 6000                         |                                 | 6200                         |                                 |

Tutorial 8 (Rubric)  
CSE 112 Computer Organization

**Q1.** A processor with five pipeline stages (Fetch, Decode, Execute, Memory, Writeback) that implements the RV32I ISA is designed. Answer the questions below for this processor. The processor has no forwarding hardware to forward the ALU result.

- a. Draw the pipeline diagram for the below-mentioned assembly programs.
- b. Point out the Data Hazards in the below-mentioned programs.
- c. Write the modified code after resolving the data hazards by inserting NOP.
  - i. You may use **add x0,x0,x0** as a possible NOP instruction.
- d. Calculate the total number of cycles taken to execute the assembly code.

| Assembly Code_A | Assembly Code_B | Assembly Code_C |
|-----------------|-----------------|-----------------|
| add x1,x0,x0    | add x1,x0,x0    | add x1,x0,x0    |
| sub x3,x2,x0    | sub x3,x2,x0    | sub x3,x2,x0    |
| slt x4,x7,x8    | slt x4,x7,x8    | slt x4,x7,x1    |
| or x9,x10,x11   | sw x1,10(x1)    | sw x1,10(x1)    |
| and x12,x13,x14 | or x9,x10,x11   | or x9,x3,x11    |
| sw x1,10(x1)    | and x12,x13,x14 | and x12,x9,x14  |

**Solution:**

**Note:** The decode (D) cells annotated with a register name mark the first detection of a data hazard in that instruction.

| Assembly Code_A |   |   |   |   |   |   |   |   |   |    |
|-----------------|---|---|---|---|---|---|---|---|---|----|
| Clock Cycle     | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| add x1,x0,x0    | F | D | X | M | W |   |   |   |   |    |
| sub x3,x2,x0    |   | F | D | X | M | W |   |   |   |    |
| slt x4,x7,x8    |   |   | F | D | X | M | W |   |   |    |
| or x9,x10,x11   |   |   |   | F | D | X | M | W |   |    |
| and x12,x13,x14 |   |   |   |   | F | D | X | M | W |    |
| sw x1,10(x1)    |   |   |   |   |   | F | D | X | M | W  |

Total Number of Cycles taken for the execution of Assembly Code\_A is 10.
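The count of 10 follows from the usual relation for a stall-free pipeline, written out here as a short check. The symbols $N$ (number of instructions) and $k$ (number of pipeline stages) are introduced only for this illustration and are not part of the original problem statement.

$$
\text{Total cycles} = N + (k - 1) = 6 + (5 - 1) = 10
$$

The first instruction needs $k$ cycles to pass through all stages, and each following instruction completes one cycle later.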

| Assembly Code_B (Original Program) |   |   |   |   |                 |   |   |   |   |    |
|------------------------------------|---|---|---|---|-----------------|---|---|---|---|----|
| Clock Cycle                        | 1 | 2 | 3 | 4 | 5               | 6 | 7 | 8 | 9 | 10 |
| add x1,x0,x0                       | F | D | X | M | W               |   |   |   |   |    |
| sub x3,x2,x0                       |   | F | D | X | M               | W |   |   |   |    |
| slt x4,x7,x8                       |   |   | F | D | X               | M | W |   |   |    |
| sw x1,10(x1)                       |   |   |   | F | D (register x1) | X | M | W |   |    |
| or x9,x10,x11                      |   |   |   |   | F               | D | X | M | W |    |
| and x12,x13,x14                    |   |   |   |   |                 | F | D | X | M | W  |

| Assembly Code_B (After Inserting Minimum Number of NOP Instructions) |   |   |   |   |   |   |   |   |   |    |    |
|----------------------------------------------------------------------|---|---|---|---|---|---|---|---|---|----|----|
| Clock Cycle                                                          | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| add x1,x0,x0                                                         | F | D | X | M | W |   |   |   |   |    |    |
| sub x3,x2,x0                                                         |   | F | D | X | M | W |   |   |   |    |    |
| slt x4,x7,x8                                                         |   |   | F | D | X | M | W |   |   |    |    |
| <b>add x0,x0,x0</b>                                                  |   |   |   | F | D | X | M | W |   |    |    |
| sw x1,10(x1)                                                         |   |   |   |   | F | D | X | M | W |    |    |
| or x9,x10,x11                                                        |   |   |   |   |   | F | D | X | M | W  |    |
| and x12,x13,x14                                                      |   |   |   |   |   |   | F | D | X | M  | W  |

Total Number of Cycles taken for the execution of Assembly Code\_B is 11.

| Assembly Code_C |   |   |   |            |            |            |            |   |   |    |
|-----------------|---|---|---|------------|------------|------------|------------|---|---|----|
| Clock Cycle     | 1 | 2 | 3 | 4          | 5          | 6          | 7          | 8 | 9 | 10 |
| add x1,x0,x0    | F | D | X | M          | W          |            |            |   |   |    |
| sub x3,x2,x0    |   | F | D | X          | M          | W          |            |   |   |    |
| slt x4,x7,x1    |   |   | F | D (reg x1) | X          | M          | W          |   |   |    |
| sw x1,10(x1)    |   |   |   | F          | D (reg x1) | X          | M          | W |   |    |
| or x9,x3,x11    |   |   |   |            | F          | D (reg x3) | X          | M | W |    |
| and x12,x9,x14  |   |   |   |            |            | F          | D (reg x9) | X | M | W  |

| Assembly Code_C (After Inserting Minimum Number of NOP Instructions)  |   |   |   |   |   |   |   |   |   |    |    |    |    |    |    |
|-----------------------------------------------------------------------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|
| Clock Cycle                                                           | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| add x1,x0,x0                                                          | F | D | X | M | W |   |   |   |   |    |    |    |    |    |    |
| sub x3,x2,x0                                                          |   | F | D | X | M | W |   |   |   |    |    |    |    |    |    |
| <b>add x0,x0,x0</b>                                                   |   |   | F | D | X | M | W |   |   |    |    |    |    |    |    |
| <b>add x0,x0,x0</b>                                                   |   |   |   | F | D | X | M | W |   |    |    |    |    |    |    |
| slt x4,x7,x1                                                          |   |   |   |   | F | D | X | M | W |    |    |    |    |    |    |
| sw x1,10(x1)                                                          |   |   |   |   |   | F | D | X | M | W  |    |    |    |    |    |
| or x9,x3,x11                                                          |   |   |   |   |   |   | F | D | X | M  | W  |    |    |    |    |
| <b>add x0,x0,x0</b>                                                   |   |   |   |   |   |   |   | F | D | X  | M  | W  |    |    |    |
| <b>add x0,x0,x0</b>                                                   |   |   |   |   |   |   |   |   | F | D  | X  | M  | W  |    |    |
| <b>add x0,x0,x0</b>                                                   |   |   |   |   |   |   |   |   |   | F  | D  | X  | M  | W  |    |
| and x12,x9,x14                                                        |   |   |   |   |   |   |   |   |   |    | F  | D  | X  | M  | W  |

Total Number of Cycles taken for the execution of Assembly Code\_C is 15.
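The NOP placement used in the three diagrams can be reproduced with a short script. This is a minimal sketch, not part of the original rubric. It assumes, as the diagrams do, that a dependent instruction may decode only in a cycle after the producer's writeback, so a consumer must be issued at least four slots after its producer. The tuple encoding `(op, dest, sources)` and the names `insert_nops`, `total_cycles`, and `MIN_GAP` are hypothetical and chosen for this illustration.

```python
# Assumption: no forwarding, and a register written in stage W can be read
# in stage D only in a later cycle. With F, D, X, M, W in consecutive cycles,
# the consumer must be issued at least 4 slots after the producer.
MIN_GAP = 4
NOP = ("add", "x0", ("x0", "x0"))


def insert_nops(program):
    """Return a copy of the program padded with the minimum number of NOPs."""
    padded = []
    last_write = {}  # register name -> issue slot of its most recent producer
    for op, dest, sources in program:
        slot = len(padded)
        earliest = slot
        for reg in sources:
            # x0 is hard-wired to zero, so it never causes a hazard.
            if reg != "x0" and reg in last_write:
                earliest = max(earliest, last_write[reg] + MIN_GAP)
        padded.extend([NOP] * (earliest - slot))
        if dest not in (None, "x0"):
            last_write[dest] = len(padded)
        padded.append((op, dest, sources))
    return padded


def total_cycles(padded, stages=5):
    """Cycles needed once all hazards are resolved by NOPs (no stalls left)."""
    return len(padded) + stages - 1


# The store has no destination register; it reads x1 as base and as data.
SW = ("sw", None, ("x1", "x1"))
CODE_A = [("add", "x1", ("x0", "x0")), ("sub", "x3", ("x2", "x0")),
          ("slt", "x4", ("x7", "x8")), ("or", "x9", ("x10", "x11")),
          ("and", "x12", ("x13", "x14")), SW]
CODE_B = [("add", "x1", ("x0", "x0")), ("sub", "x3", ("x2", "x0")),
          ("slt", "x4", ("x7", "x8")), SW,
          ("or", "x9", ("x10", "x11")), ("and", "x12", ("x13", "x14"))]
CODE_C = [("add", "x1", ("x0", "x0")), ("sub", "x3", ("x2", "x0")),
          ("slt", "x4", ("x7", "x1")), SW,
          ("or", "x9", ("x3", "x11")), ("and", "x12", ("x9", "x14"))]

for name, code in (("A", CODE_A), ("B", CODE_B), ("C", CODE_C)):
    padded = insert_nops(code)
    print(f"Code_{name}: {padded.count(NOP)} NOP(s), {total_cycles(padded)} cycles")
```

Under these assumptions the script reports 0, 1, and 5 NOPs and 10, 11, and 15 cycles for Code_A, Code_B, and Code_C, matching the diagrams.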

## Q2. Processor's Qualitative Performance Evaluation

- a) Two processors, A and B, are working on the same ISA. The frequency of processor A is higher than the frequency of processor B. For a given program, does it imply that processor A always executes more instructions per second than processor B?
- b) Two processors, A and B, are working on the same ISA. For a given program, A executes  $x$  instructions per second and B executes  $y$  instructions per second. If  $x > y$ , does this imply that processor A is faster than processor B?

### Solution:

No information is given about the number of stages in the processors, so let's assume both are single-stage (single-cycle) processors.

- a) In a single-cycle processor, every instruction takes exactly one cycle to execute. The processor with more cycles per second therefore executes more instructions per second, so under this assumption the higher-frequency processor (A) executes more instructions per second.
- b) Similar to part (a), under the single-cycle assumption the answer is yes: both processors run the same ISA, so the given program has the same instruction count on each, and the processor that executes more instructions per second finishes first.
  - i) If the processors have more than one stage, and possibly different numbers of stages, the comparison becomes more complicated because pipelining introduces hazards. A single-stage (single-cycle) processor has neither control nor data hazards.
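The reasoning in both parts can be summarized with the standard performance relations, given here as a sketch. The symbols $f$ (clock frequency), $\text{CPI}$ (average cycles per instruction), and $\text{IC}$ (instruction count of the program) are notation added for this illustration and do not appear in the original question.

$$
\text{Instructions per second} = \frac{f}{\text{CPI}}, \qquad \text{Execution time} = \frac{\text{IC} \times \text{CPI}}{f}
$$

The single-cycle assumption fixes $\text{CPI} = 1$ for both processors, which is why a higher $f$ directly means more instructions per second in part (a). Without that assumption, a processor with a higher $f$ but a larger $\text{CPI}$ could execute fewer instructions per second.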

## Q3. What are benchmark programs, and what is the definition of MIPS (Million Instructions Per Second)?

### Solution

**Benchmark:** A benchmark is a set of applications with diverse behaviors. These applications are compiled for a chosen ISA and executed on the corresponding processor to evaluate the microprocessor design and the effectiveness of the ISA. Performance is measured as the total execution time of the application, or as the average number of instructions executed per cycle. Examples include PARSEC and SPLASH.

### MIPS (Million Instructions Per Second)

As the name implies, MIPS is the number of instructions, in units of millions, that a processor is able to execute within one second.
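The definition translates directly into a one-line computation. The sketch below is illustrative only: the function name `mips` and the sample numbers are assumptions made for this example, not measured benchmark results.

```python
def mips(instruction_count, execution_time_s):
    """Return the MIPS rating: millions of instructions executed per second."""
    return instruction_count / (execution_time_s * 1e6)


# Illustrative numbers (assumed, not measured):
# 10,000 instructions that finish in 2 microseconds.
rating = mips(10_000, 2e-6)
print(f"{rating:.0f} MIPS")
```

With these sample numbers the rating is about 5000 MIPS, which is what a single-cycle processor clocked at 5 GHz would achieve.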

**Q4.** Two processors, A and B, designed for two different ISAs (ISA_A and ISA_B, respectively), are being evaluated against a chosen benchmark. Processor A operates at 5 GHz and processor B operates at 4 GHz. Answer the following.

- a) What is the performance of both processors in terms of total execution time?

|                 | Instruction Count<br>(ISA_A) | Processor_A<br>(Execution Time) | Instruction Count<br>(ISA_B) | Processor_B<br>(Execution Time) |
|-----------------|------------------------------|---------------------------------|------------------------------|---------------------------------|
| Optimization O1 | 10000                        |                                 | 12000                        |                                 |
| Optimization O2 | 8000                         |                                 | 6500                         |                                 |
| Optimization O3 | 6000                         |                                 | 6200                         |                                 |

### Solution:

Nothing is mentioned about the number of stages in the processors, so we consider single-stage processors, in which every instruction takes one clock cycle.

Clock cycle time for processor A =  $1/5\text{GHz} = 0.2\text{ns}$

Clock cycle time for processor B =  $1/4\text{GHz} = 0.25\text{ns}$

|                 | Instruction Count<br>(ISA_A) | Processor_A<br>(Execution Time) | Instruction Count<br>(ISA_B) | Processor_B<br>(Execution Time) |
|-----------------|------------------------------|---------------------------------|------------------------------|---------------------------------|
| Optimization O1 | 10000                        | $10000 \times 0.2\text{ ns}$    | 12000                        | $12000 \times 0.25\text{ ns}$   |
| Optimization O2 | 8000                         | $8000 \times 0.2\text{ ns}$     | 6500                         | $6500 \times 0.25\text{ ns}$    |
| Optimization O3 | 6000                         | $6000 \times 0.2\text{ ns}$     | 6200                         | $6200 \times 0.25\text{ ns}$    |

|                 | Instruction Count<br>(ISA_A) | Processor_A<br>(Execution Time) | Instruction Count<br>(ISA_B) | Processor_B<br>(Execution Time) |
|-----------------|------------------------------|---------------------------------|------------------------------|---------------------------------|
| Optimization O1 | 10000                        | 2000 ns                         | 12000                        | 3000 ns                         |
| Optimization O2 | 8000                         | 1600 ns                         | 6500                         | 1625 ns                         |
| Optimization O3 | 6000                         | 1200 ns                         | 6200                         | 1550 ns                         |
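The entries in the solution tables can be checked with a few lines of code. This is a minimal sketch that uses the instruction counts from the solution tables and the single-stage assumption (one cycle per instruction). The names `execution_time_ns`, `COUNTS`, and the `cpi` parameter are hypothetical and introduced only for this example.

```python
def execution_time_ns(instruction_count, clock_ghz, cpi=1.0):
    """Execution time in ns: instructions x cycles per instruction x cycle time.

    The cycle time in nanoseconds is 1 / (clock frequency in GHz).
    cpi=1.0 encodes the single-stage (single-cycle) assumption.
    """
    return instruction_count * cpi / clock_ghz


# Instruction counts per optimization level: (ISA_A, ISA_B).
COUNTS = {"O1": (10000, 12000), "O2": (8000, 6500), "O3": (6000, 6200)}

for level, (count_a, count_b) in COUNTS.items():
    time_a = execution_time_ns(count_a, clock_ghz=5.0)
    time_b = execution_time_ns(count_b, clock_ghz=4.0)
    print(f"{level}: Processor_A {time_a:.0f} ns, Processor_B {time_b:.0f} ns")
```

Passing a different `cpi` shows how the comparison would change for processors that do not complete one instruction per cycle.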

