

University of Guelph  
School of Engineering  
ENG 3380

Computer Organization and Design  
Winter 2017

Solution

## Midterm Examination

Instructor: Shawki M. Areibi.

Location: THRN 1307

Date: Saturday, March 4th 2017

Time: 11:30 AM - 1:00 PM

Duration: 90 minutes.

Type: R—"Closed Book."

### Instructions:

1. There are 4 questions in total, answer all questions.
2. The total number of pages in the examination booklet is 15 pages.  
Please notify the invigilator immediately if your examination booklet does not contain the number of pages indicated.
3. You have to use Pens (no pencils allowed)!
4. Write your answers directly on the examination booklet.

|      |           |
|------|-----------|
| Name | Id number |
|      |           |

| For use of examiner |      |        |
|---------------------|------|--------|
|                     | mark | out of |
| Question 1          |      | 10     |
| Question 2          |      | 14     |
| Question 3          |      | 18     |
| Question 4          |      | 18     |
| Total               |      | 60     |

**Question 1. Performance [10 marks ]****Part A. CPI & Improvement [10 marks]**

Assume that for a given program 70% of the executed instruction are arithmetic, 10% are load/store, and 20% are branch.

- Given the instruction mix and the assumption that an arithmetic instruction requires 2 cycles, a load/store instruction takes 6 cycles, and a branch instruction takes 3 cycles, find the average CPI.
- For a 25% improvement in performance, how many cycles, on average, may an arithmetic instruction take if load/store and branch instructions are not improved at all.

You need to show your steps!!

$$\text{CPU Time} = \text{inst count} \times \text{CPI} \times \text{clock cycle time}$$

(i)

| OP           | Freq | CPI <sub>i</sub> | Freq x CPI <sub>i</sub> | 25% imp |
|--------------|------|------------------|-------------------------|---------|
| Arith        | 70%  | 2                | 1.4                     | ??      |
| Lo/ SW       | 10%  | 6                | 0.6                     | 0.6     |
| Branch       | 20%  | 3                | 0.6                     | 0.6     |
| $\sum = 2.6$ |      |                  |                         |         |



$$\text{Avg CPI} = \boxed{2.6}$$



$$(ii) 25\% \text{ improvement} \rightarrow \frac{\text{old CPI}}{\text{new CPI}} = \frac{2.6}{\text{new CPI}} = 1.25$$

$$\therefore \text{new CPI} = \frac{2.6}{1.25} = 2.08$$

$$\therefore \text{Avg Arith CPI} = 2.08 - 0.6 - 0.6 = \boxed{0.88}$$

## Question 2. MIPS Instruction Set [14 marks]

### Part A. MIPS Loop [14 marks]

Consider the following MIPS loop:

| Label | Instruction         | # Comments                                  |
|-------|---------------------|---------------------------------------------|
| LOOP: | slt \$t2, \$0, \$t1 | # if \$0 < \$t1 then \$t2 = 1 else \$t2 = 0 |
|       | beq \$t2, \$0, DONE | # ... Goto Done if \$t2 = 0                 |
|       | subi \$t1, \$t1, 1  | # ... \$t1 = \$t1 - 1                       |
|       | addi \$s2, \$s2, 2  | # ... \$s2 = \$s2 + 2                       |
|       | j LOOP              | # ... Goto Loop                             |

DONE:

- Complete the documentation (comments to remaining statements) of the above MIPS loop.
- Assume that the register \$t1 is initialized to the value 10. What is the value in register \$s2 assuming \$s2 is initially zero?
- For the MIPS loop code above, write the equivalent C code statements. Assume that the registers \$s1, \$s2, \$t1, and \$t2 are integers  $A$ ,  $B$ ,  $i$ , and  $temp$ , respectively. ( $\$s2=0$ ,  $\$t1=10$ ).
- For the loops written in MIPS assembly above, assume that the register \$t1 is initialized to the value  $N$ . How many MIPS instructions are executed?

(ii)  $\$t1 = 10, \$s2 = 0$

since we will go through the loop 10 times and each time  
we add "2" to  $\$s2 \therefore \$s2 = 20$

$i = 10; B = 0;$

2

(iii)

do

{

$B = B + 2;$

$i = i - 1;$

} while ( $i > 0$ )

6

(iv) for  $\underline{N}$  iterations ; 5 instructions per iteration  
Total =  $5 \times N = 5N$  iterations

### Question 3. Computer Organization [18 marks]

#### Part A. Single Cycle Computer [18 marks]

In the following questions, assume that we are starting with a datapath as shown in Figure 1.



Figure 1: Simplified Single Cycle Data Path

Assume the major components of the given datapath have the latencies and costs shown in Table 1

| Unit    | I-Mem | Add   | Mux     | ALU   | Regs  | D-Mem | Control |
|---------|-------|-------|---------|-------|-------|-------|---------|
| Label   | A     | B, C  | H, I, J | E     | D     | F     | G       |
| Latency | 400ps | 100ps | 30ps    | 120ps | 200ps | 350ps | 100ps   |
| Cost    | 1000  | 30    | 10      | 100   | 200   | 20000 | 500     |

Table 1: Latencies and Costs of Major Components

- Referring to the labels A through J in the datapath diagram in Figure 1, and assuming that a "lw" instruction takes the longest time compared to other instructions (arithmetic, branch, e.t.c).
  - List the steps to be taken to fetch, decode, execute the instruction lw \$t1, offset(\$t2).

- (b) What is the critical path for the *lw* instruction, and what is the total latency for the critical path?
- ii. When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance trade-off. Consider the addition of a multiplier to the ALU of Figure 1. This addition will add 300ps to the latency of the ALU and will add a cost of **600** to the ALU. The result will be 5% fewer instructions executed since we will no longer need to emulate the MUL instruction.
- What is the clock cycle time with and without this improvement?
  - What is the speedup achieved by adding this improvement?
  - Compare the cost/performance ratio with and without this improvement.

(i) (a) The instruction is fetched, pc is incremented

- A register \$t<sub>2</sub> is read from Regfile
- Alu computes the sum of the value read from Regfile and the sign extended, lower 16-bits of the instruction (offset)
- The sum from the Alu is used as the address for data Memory
- The data from the D-Mem unit is written into the Regfile. (the Reg destination is given by bits 20:16 of Regfile. (i.e \$t<sub>1</sub>)

(b) The critical path for "lw" instruction



(ii) (a) clock cycle time without improvement is 1330 ps  
 " " " with improvement is  $1330 + 300 = 1630 \text{ ps}$

(b) speedup achieved

$$\frac{\text{cpu time old}}{\text{cpu time new}} \rightarrow \frac{1 \times 1330}{0.95 \times 1630} = 0.8588$$

↓  
Slower!

... Continue Solution of Question 3 ...



③ cost/performance ratio

original cost

I-Mem, RegFile, control, ALU, D-Mem, 2 Adder, 3 Mux  
1000, 200, 500, 100, 2000, 2x30, 3x10

3890

new cost

$$3890 + 600 = 4490$$

$$\text{Relative cost} = \frac{4490}{3890} = 1.15$$

$$\text{cost/perf} = \frac{1.15}{0.85} = 1.3390$$

so we are paying significantly more  
for significantly worse performance

## Question 4. Computer Architecture [18 marks]

### Part A. Multi Cycle Computer [18 marks]

Figure 2 shows the complete data path along with necessary control signals.



Figure 2: Multi Cycle Data Path

- What are the main advantages of a Multi Cycle Processor over a Single Cycle Implementation (List at least 3)?
- The Multi Cycle Processor has to take several steps to fetch an instruction.
  - Write the RTL statements corresponding to fetching an instruction of a Multi Cycle datapath..
  - Identify the components in the data path that are involved in fetching the instruction. Explain briefly the role of each component you identified.
  - List the control signal issued by the Control Unit to control the different components (Circle these signals on the block diagram provided).

... Continue Solution of Question 4 ...

6

(i) Multi cycle implementation allows:

- ① faster clock rates per step
- ② different instructions to take different number of clock cycles
- ③ functional units to be used more than once per instruction  
as long as they are used on different clock cycles
  - only one memory
  - only one ALU/adder

(ii) (a) RTL  $\rightarrow$   $IR = \text{Memory}[pc]$

$$pc = pc + 4$$

W

b) components involved

- PC  $\rightarrow$  points to instruction memory
- MUX before memory  $\rightarrow$  choose between PC & ALUout Reg
- I-Mem/D-Mem  $\rightarrow$  contains instruction to be fetched
- IR  $\rightarrow$  opcode and related field of instruction are stored
- ALU  $\rightarrow$  add  $pc + 4$  (PC)
- MUX (ALUSRC A)  $\rightarrow$  choose SRC A for ALU (4)
- MUX (ALUSRC B)  $\rightarrow$  choose SRC B for ALU (4)
- MUX (controlling PC source)

c) control signals issued by control unit:

W

Iord  $\rightarrow$  0

MemRead  $\rightarrow$  1

IRwrite  $\rightarrow$  1

ALUSRC A  $\rightarrow$  0

ALUSRC B  $\rightarrow$  01

PCsource  $\rightarrow$  00

pcwrite  $\rightarrow$  1