

## 1) Fill in the blanks (3 points each)

- a) The ~~instruction set architecture~~ is an abstract interface between the hardware and the lowest level of software of a machine.
- b) The power wall has forced a dramatic change of microprocessors. This has made ~~parallelism~~ an attractive alternative for gaining higher performance in microprocessors.
- c) List three important features of a RISC architecture such as MIPS:  
~~large number of registers~~  
~~pipelining~~  
~~fixed-length instructions~~
- d) Assume a program that consists of only addition and multiplication operations each taking 50% of the overall execution time. By how much should we improve the speed of multiplication hardware to improve the overall execution time by 25%?  
~~by 5%~~ or improvement  $0.5a + 0.5m = 0.75$   
~~factor of 2~~
- e) Pipelining enhances performance by improving the time it takes to complete an individual instruction. (True or False)
- f) ~~CPU Execution time~~ is defined as the total time to complete a task including I/O access, memory access, disk access, and operating system overhead.
- g) In a multi-cycle system instructions are fetched, decoded, and executed in 3 to 6 cycles depending on the instruction type. The peak performance for this system if running at 1 Giga Hertz is  $3.33 \times 10^6$  instructions per second.  
 $\text{clock cycles} = \frac{\text{instructions}}{\text{cycles}}$
- h) Increasing ~~clock cycles~~ or lowering ~~clock rate~~ can improve CPU execution time.  
 $\text{CPU time} = \frac{\text{clock cycles}}{\text{clock rate}}$
- i) The ideal instruction execution rate for a single-issue pipelined datapath is  $1 \text{ instruction}$  per cycle.
- j) List three types of hazards that can degrade the performance of a pipelined datapath.
  - i) ~~data dependency hazard~~
  - ii) ~~loadword hazard~~
  - iii) ~~branch~~

logical  
 $A = a'b'01x$   
 $B = a'b'01x$   
 $c = ...$

$\stackrel{==}{\sim}$  → equal  $\rightarrow$   
 $\stackrel{==}{\sim}$  is equivalent to  
 $A \stackrel{==}{\sim} B \Rightarrow$  unknown, both 1  
 $A \stackrel{==}{\sim} B \Rightarrow 1$   
 $!=$  does not equal

CSc/CpE 142 Exam 1

Name.....

- 2) List all assembly instructions necessary for the execution of the following C code-segment on a 32-system designed using the 32-bit MIPS architecture. Assume that all variables are 32-bit unsigned numbers stored in main memory. Your answer should use proper MIPS assembly format. (15 points)

```

if(enable == 1)
  s = a + b + c;
else
  s = 0;

lbu $t0, 1($a)           // load integer 1
bne $t0, $t1, L1          // if not equal, go to L1
lw $t1, 0($b)             // load a
lw $t2, 0($c)             // load b
lw $t3, 0($d)             // load c
add $t0, $t1, $t2          // add a+b
add $t0, $t0, $t3          // add a+b+c, store
lw $t4, 0($a)             // load a
lw $t5, 0($b)             // load b
lw $t6, 0($c)             // load c
lw $t7, 0($d)             // load d
L1:                         } overflow
                            -1
  
```

5) Consider a hypothetical 16-bit assembly language with only three instructions. The format for these instructions is given below. Design a single-cycle CPU which can fetch and execute any of these instructions. Show all necessary components. Label each component (including its ports). Assume that the memory is byte-addressable. The internal design of the control logic is not required. You must use minimum number of components possible. (20 points)

|             | Bits<br>(15 - 12) | bits<br>(11 - 8) | bits<br>(7 - 4) | bits<br>(3 - 0) |                                                                                    |
|-------------|-------------------|------------------|-----------------|-----------------|------------------------------------------------------------------------------------|
| Instruction | opcode            | operand 1        | operand 2       | Operand 3       | operation                                                                          |
| SWAP        |                   | Reg 1            | Reg 2           | not used        | Swaps the contents of operand 1 and operand 2;<br>Reg 1 <= Reg 2<br>Reg 2 <= Reg 1 |
| ADD         | 0010              | Reg 1            | Reg 2           | Reg 3           | Reg 1 = Reg 3 + Reg 2                                                              |
| SUB         | 1000              | Reg 1            | Reg 2           | Reg 3           | Reg 1 = Reg 3 - Reg 2                                                              |



- 4) Consider a 1 Giga Hertz system based on architecture M1 with the following specifications:

| Instruction types available in M1 | CPI | Percentage of operations in a typical program |
|-----------------------------------|-----|-----------------------------------------------|
| Load                              | 6   | 15%                                           |
| Store                             | 5   | 15%                                           |
| integer arithmetic                | 4   | 50%                                           |
| floating point arithmetic         | 14  | 20%                                           |

- a) Compute peak performance of M1 in instructions per second. (10 points)

- b) Architecture M2 is a variant of M1 without the floating-point instructions. However, M2 can still perform floating-point operations by emulating them using integer instructions. It takes four M2 integer instructions to emulate each floating-point operation. An implementation of M2 has a 10% improved clock rate compared to that of M1. Calculate the CPU time of a 100-instruction program on M2. (10 points)

(a)

$$CPI = \frac{\text{clock cycles per second}}{\text{instructions per second}}, \quad \text{instructions per second} = \frac{\text{clock cycles}}{CPI}$$

$$\text{instructions per second} = \frac{1 \times 10^9}{29} = 3.44462 \times 10^7 \text{ instructions per second}$$

O<sub>3</sub>

(b)

$$CPU \text{ time} = \frac{\text{clock cycles}}{\text{clock rate}} = \frac{CPI \cdot \text{instructions}}{\text{clock rate}}$$

$$\boxed{\text{new CPI} = 6 + 5 + 4 + 4(4) = 31}$$

$$\text{new clock rate} = 1.1 \times 10^9 \text{ Hz}$$

use individual CPI's  
- 3

$$CPU \text{ time} = \frac{31 \cdot 100}{1.1 \times 10^9} = 3 \text{ MS.}$$

a)  $CPI = \frac{(4 \times 15) + (5 \times 15) + (4 \times 50) + (14 \times 20)}{100} = 6.45 \text{ [CPI]}$

Performance =  $\frac{\text{Clockrate}}{CPI} = \frac{1 \times 10^9}{6.45} = 15.5 \times 10^7 \text{ instructions/second}$

b)  $CPI = \frac{(4 \times 15) + (5 \times 15) + (4 \times 50) + (14 \times 50)}{100} = 11.65 \text{ [CPI]}$

CPU time =  $\frac{(\Sigma CPI \cdot \text{Instructions})}{\text{Clockrate}} = \frac{11.65 \cdot 100}{1.1 \times 10^9} = 1.06 \mu\text{s}$

logical  
 $A = 9^b / 10^x$   
 $B = 4^b \text{ mod } 10$

$\equiv \rightarrow \text{equal to}$   
 $\equiv \equiv \rightarrow \text{is equivalent to}$   
 $\neq \rightarrow \text{does not equal}$

Sc/CpE 142 Exam 1

Name.....

- Consider the pipelined system shown below. Find the number of cycles required to execute the following MIPS code. Justify your answer. (15 points)

|     | 1                  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 |
|-----|--------------------|----|----|----|----|----|----|----|----|----|----|----|
| 300 | SLT \$6, \$8, \$7  | IF | ID | EX | M  | WB |    |    |    |    |    |    |
| 304 | LW \$5, 100(\$6)   |    | IF | ID | EX | M  | WB |    |    |    |    |    |
| 306 | LW \$7, 100(\$15)  |    |    | IF | ID | EX | M  | WB |    |    |    |    |
| 308 | SUB \$9, \$7, \$4  |    |    |    | IF | ID | EX | M  | WB |    |    |    |
| 310 | SUB \$20, \$9, \$7 |    |    |    |    | IF | ID | EX | M  | WB |    |    |
| 314 | LW \$8, 100(\$9)   |    |    |    |    |    | IF | ID | EX | M  | WB |    |
| 316 | SW \$8, 100(\$9)   |    |    |    |    |    |    | IF | ID | EX | M  | WB |

-3



DAG

300  
↓h  
304

$306 \downarrow h$  ← register forward  
 $308 \downarrow h$  ← register forward  
 $310$   
 $314 \downarrow h$  ← hazard detection unit  
 $316$

$$\rightarrow 4n - 1 + [\text{delays}]$$

$$= 5 + 7 - 1 = 11 \text{ best case}$$

$$= 11 + [1(304 \text{ lw})]$$

= 12 cycles

4