

Stamati Morellas  
COM S 321 – Problem Set 1  
Due 9/16/19

COM S 321 - HW1 # 1.2, 1.3, 1.4, 1.5, 1.7, 1.9, 1.10, 1.12, 1.15

### 1.2

- a. Assembly lines in automobile manufacturing → Performance via Pipelining
- b. Suspension bridge cables → Dependability via Redundancy
- c. Aircraft/Marine navigation systems → Performance via Prediction
- d. Express elevators → Make the Common Case Fast
- e. Library reserve desk → Hierarchy of Memories
- f. Increasing gate area on CMOS transistor → Performance via Parallelism
- g. Electromagnetic aircraft catapults → Design for Moore's Law
- h. Self-driving cars... → Use Abstraction to Simplify Design

### 1.3

Translating a high-level program into instructions the hardware can execute takes several layers of software. The steps are as follows:

1. Start with a program written in a high-level language. A compiler translates its statements into assembly language.
2. An assembler then translates each assembly-language instruction into binary machine language, the form the computer hardware can execute.
3. The machine reads and executes the binary machine instructions.

### 1.4

a. Minimum size of frame buffer:  $1280 \times 1024 \times 3 \text{ bytes} = 3{,}932{,}160 \text{ bytes} \approx 3.9 \text{ MB}$

b.  $100 \text{ Mbit/s} = \frac{100}{8} \text{ MB/s} = 12.5 \text{ MB/s}$

$$\frac{1 \text{ sec}}{12.5 \text{ MB}} \times 3.93 \text{ MB} \approx 0.31 \text{ sec}$$
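The arithmetic for 1.4 can be checked with a short script. This is only a sketch: the function names are made up for illustration, and it assumes 1 byte per primary colour (3 bytes per pixel) and decimal megabytes.

```python
# Frame buffer size and transfer time for 1.4.
# Assumption: 8 bits (1 byte) for each of the 3 primary colours per pixel.
WIDTH, HEIGHT = 1280, 1024
BYTES_PER_PIXEL = 3


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Minimum number of bytes needed to store one frame."""
    return width * height * bytes_per_pixel


def transfer_seconds(num_bytes: int, link_bits_per_sec: float) -> float:
    """Time to send num_bytes over a link with the given bit rate."""
    return num_bytes * 8 / link_bits_per_sec


size = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
time_s = transfer_seconds(size, 100e6)  # 100 Mbit/s link
print(f"{size} bytes = {size / 1e6:.2f} MB, sent in {time_s:.3f} s")
```

The result is about 3.9 MB and roughly 0.31 s, in line with the hand calculation.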

### 1.5

Processors, compared by instructions per second:

$$\text{Instructions/sec} = \frac{\text{Clock Rate}}{\text{CPI}}$$

$$\begin{aligned} P_1 &: 3 \text{ GHz}, \ 1.5 \text{ CPI} & \rightarrow 2 \times 10^9 \text{ instructions/sec} \\ P_2 &: 2.5 \text{ GHz}, \ 1.0 \text{ CPI} & \rightarrow 2.5 \times 10^9 \text{ instructions/sec} \\ P_3 &: 4.0 \text{ GHz}, \ 2.2 \text{ CPI} & \rightarrow 1.82 \times 10^9 \text{ instructions/sec} \end{aligned}$$

a.  $P_2$  has the highest performance, since it executes the most instructions per second ($2.5 \times 10^9$)

b.

$$\begin{aligned} \# \text{ Cycles} &= \text{Time} \times \text{Clock Rate} & \# \text{instructions} &= \frac{\# \text{cycles}}{\text{CPI}} \\ P_1 &= 10 \times 3 \times 10^9 = 30e9 \text{ cycles} & P_1 &= \frac{30 \times 10^9}{1.5} = 20 \times 10^9 \text{ insts} \\ P_2 &= 10 \times 2.5 \times 10^9 = 25e9 \text{ cycles} & P_2 &= \frac{25 \times 10^9}{1.0} = 25 \times 10^9 \text{ insts} \\ P_3 &= 10 \times 4.0 \times 10^9 = 40e9 \text{ cycles} & P_3 &= \frac{40 \times 10^9}{2.2} = 18.18 \times 10^9 \text{ insts} \end{aligned}$$

c. Execution time is reduced by 30%, to  $0.7 \times 10 \text{ s} = 7 \text{ s}$, while CPI increases by 20%, to  $1.2 \times \text{CPI}$. From New CPU Time  $= \frac{(I \times CPI)}{\text{Clock Rate}}$, the required clock rate is  $\frac{I \times 1.2 \times CPI}{7 \text{ s}}$.

$$P_1: 1.2 \times 1.5 = 1.8 \text{ CPI}$$

$$\text{Clock Rate: } \frac{20 \times 10^9 \times 1.8}{7s} = 5.14 \text{ GHz}$$

$$P_2: 1.2 \times 1.0 = 1.2 \text{ CPI}$$

$$\text{Clock Rate: } \frac{25 \times 10^9 \times 1.2}{7s} = 4.29 \text{ GHz}$$

$$P_3: 1.2 \times 2.2 = 2.64 \text{ CPI}$$

$$\text{Clock Rate: } \frac{18.18 \times 10^9 \times 2.64}{7s} = 6.86 \text{ GHz}$$
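The three parts of 1.5 all follow from CPU time = instructions × CPI / clock rate, so they can be sketched in one script. The helper names and the `PROCESSORS` table layout are illustrative only; the clock rates, CPIs, and the 10 s run time are the values used above.

```python
# Clock rate (Hz) and CPI for each processor, as used in 1.5.
PROCESSORS = {
    "P1": (3.0e9, 1.5),
    "P2": (2.5e9, 1.0),
    "P3": (4.0e9, 2.2),
}


def instructions_per_second(clock_hz, cpi):
    """Throughput = clock rate / CPI."""
    return clock_hz / cpi


def cycles_and_instructions(seconds, clock_hz, cpi):
    """Cycles and instructions completed in the given time."""
    cycles = seconds * clock_hz
    return cycles, cycles / cpi


def required_clock_hz(instructions, cpi, seconds):
    """Clock rate needed to run `instructions` at `cpi` within `seconds`."""
    return instructions * cpi / seconds


results = {}
for name, (clock_hz, cpi) in PROCESSORS.items():
    rate = instructions_per_second(clock_hz, cpi)            # part a
    cycles, insts = cycles_and_instructions(10, clock_hz, cpi)  # part b
    # Part c: 30% less time (7 s) with a 20% higher CPI.
    new_clock = required_clock_hz(insts, 1.2 * cpi, 7.0)
    results[name] = (rate, cycles, insts, new_clock)
    print(f"{name}: {rate:.3g} inst/s, {cycles:.3g} cycles, "
          f"{insts:.4g} insts, new clock {new_clock / 1e9:.2f} GHz")
```

Since the instruction count does not change in part c, each new clock rate is simply the old one scaled by 1.2/0.7 ≈ 1.71.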

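### 1.7

a.

| Compiler | Execution Time | Inst. Count       |
|----------|----------------|-------------------|
| A        | 1.1 s          | $1 \times 10^9$   |
| B        | 1.5 s          | $1.2 \times 10^9$ |

With the given clock cycle time of 1 ns:

$$CPI = \frac{\text{Execution Time}}{\# \text{inst} \times \text{Clock cycle time}}$$

For A:

$$CPI = \frac{1.1}{1 \times 10^9 \times 1 \times 10^{-9}} \Rightarrow \boxed{CPI = 1.1}$$

For B:

$$CPI = \frac{1.5}{1.2 \times 10^9 \times 1 \times 10^{-9}} \Rightarrow \boxed{CPI = 1.25}$$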

b. Execution time = Instructions  $\times$  CPI  $\times$  Clock cycle time ($CCT$)

A execution:  $1 \times 10^9 \times 1.1 \times CCT_A$

B execution:  $1.2 \times 10^9 \times 1.25 \times CCT_B$

Setting the two execution times equal:

$$1 \times 10^9 \times 1.1 \times CCT_A = 1.2 \times 10^9 \times 1.25 \times CCT_B$$

$$CCT_A = 1.36 \times CCT_B$$

The processor running compiler A's code has a cycle time 1.36 times longer, so the clock of the processor running compiler B's code is 1.36 times faster than A's

c. New compiler:

- Instructions:  $6.0 \times 10^8$
- CPI: 1.1
- CPU execution time:  $6.0 \times 10^8 \times 1.1 \times 1 \text{ ns} = 0.66 \text{ s}$

$$\text{Speedup over A: } \frac{1.1}{0.66} = \boxed{1.67}$$

$$\text{Speedup over B: } \frac{1.5}{0.66} = \boxed{2.27}$$
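A quick numerical check of 1.7, as a sketch only. It assumes the 1 ns clock cycle time from the exercise for parts a and c, and the variable names are illustrative.

```python
CYCLE_TIME_S = 1e-9  # assumption: 1 ns clock cycle time for parts a and c


def cpi(exec_time_s, instructions, cycle_time_s=CYCLE_TIME_S):
    """CPI = execution time / (instruction count x clock cycle time)."""
    return exec_time_s / (instructions * cycle_time_s)


# Part a: average CPI for each compiler's code.
cpi_a = cpi(1.1, 1.0e9)
cpi_b = cpi(1.5, 1.2e9)

# Part b: equal execution times give CCT_A / CCT_B = (I_B * CPI_B) / (I_A * CPI_A).
cycle_time_ratio = (1.2e9 * cpi_b) / (1.0e9 * cpi_a)

# Part c: new compiler with 6.0e8 instructions and CPI 1.1.
new_time_s = 6.0e8 * 1.1 * CYCLE_TIME_S
speedup_over_a = 1.1 / new_time_s
speedup_over_b = 1.5 / new_time_s

print(f"CPI A = {cpi_a:.2f}, CPI B = {cpi_b:.2f}")
print(f"CCT_A / CCT_B = {cycle_time_ratio:.2f}")
print(f"new time = {new_time_s:.2f} s, speedups = "
      f"{speedup_over_a:.2f} (A), {speedup_over_b:.2f} (B)")
```

A cycle-time ratio above 1 means the processor running A's code has the longer cycle, i.e. the slower clock.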

### 1.9.1

$$\text{Execution Time (1)}: \frac{\text{Clock cycles}}{\text{Clock Rate}} = \frac{1.92 \times 10^{10} \text{ cycles}}{2 \times 10^9 \text{ cycles/sec}} = \boxed{9.6 \text{ sec}}$$

$$\text{Execution Time (2)}: \frac{1.408 \times 10^{10}}{2 \times 10^9} = \boxed{7.04 \text{ sec}}$$

$$\text{Execution Time (4)}: \frac{7.68 \times 10^9}{2 \times 10^9} = \boxed{3.84 \text{ sec}}$$

$$\text{Execution Time (8)}: \frac{4.48 \times 10^9}{2 \times 10^9} = \boxed{2.24 \text{ sec}}$$

$$\text{Speedup (1-2)}: \frac{9.6}{7.04} = 1.36 \quad \text{(2 processors are 1.36 times faster than 1)}$$

$$\text{Speedup (1-4)}: \frac{9.6}{3.84} = 2.5 \quad \text{(4 processors are 2.5 times faster than 1)}$$

$$\text{Speedup (1-8)}: \frac{9.6}{2.24} = 4.29 \quad \text{(8 processors are 4.29 times faster than 1)}$$

### 1.9.2

With the CPI of the arithmetic instructions doubled:

$$1 \text{ Processor: } \frac{2.176 \times 10^{10}}{2 \times 10^9} = \boxed{10.88 \text{ sec}}$$

$$2 \text{ Processors: } \frac{1.5909 \times 10^{10}}{2 \times 10^9} = \boxed{7.95 \text{ sec}}$$

$$4 \text{ Processors: } \frac{8.594 \times 10^9}{2 \times 10^9} = \boxed{4.30 \text{ sec}}$$

$$8 \text{ Processors: } \frac{4.94 \times 10^9}{2 \times 10^9} = \boxed{2.47 \text{ sec}}$$
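The cycle totals in 1.9.1 and 1.9.2 can be reproduced with the sketch below. Note the assumptions: the instruction counts ($2.56 \times 10^9$ arithmetic, $1.28 \times 10^9$ load/store, $2.56 \times 10^8$ branch), the CPIs (1, 12, 5), and the rule that arithmetic and load/store instructions per processor are divided by $0.7 \times p$ when $p > 1$ are taken from the exercise statement, not from this write-up. The function names are illustrative.

```python
CLOCK_HZ = 2e9

# Assumed from the exercise statement (not listed in the write-up above).
ARITH, LOAD_STORE, BRANCH = 2.56e9, 1.28e9, 2.56e8


def total_cycles(p, cpi_arith=1, cpi_ls=12, cpi_branch=5):
    """Cycles per processor when the program runs on p processors.

    Arithmetic and load/store work is divided by 0.7 * p for p > 1;
    the branch count per processor stays the same.
    """
    scale = 1.0 if p == 1 else 0.7 * p
    return (ARITH * cpi_arith + LOAD_STORE * cpi_ls) / scale + BRANCH * cpi_branch


def exec_time(p, **cpis):
    """Execution time in seconds on p processors."""
    return total_cycles(p, **cpis) / CLOCK_HZ


base = exec_time(1)
for p in (1, 2, 4, 8):
    t = exec_time(p)
    t_doubled = exec_time(p, cpi_arith=2)  # 1.9.2: arithmetic CPI doubled
    print(f"p={p}: {t:.2f} s (speedup {base / t:.2f}), "
          f"arithmetic CPI doubled: {t_doubled:.2f} s")
```

Under these assumptions the single-processor total comes out to $1.92 \times 10^{10}$ cycles, matching the first line of 1.9.1.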

### 1.10.1

$$(1) \text{ Wafer area: } \pi r^2$$

$$r = 15/2 = 7.5 \text{ cm}$$

$$\pi \cdot (7.5 \text{ cm})^2 = 176.7 \text{ cm}^2$$

$$\text{Die area} = \frac{\text{Wafer area}}{\text{num dies}}$$

$$\text{Die area} = \frac{176.7 \text{ cm}^2}{84} = 2.10 \text{ cm}^2$$

$$\text{Yield for wafer 1} = \frac{1}{(1 + (0.020 \times 2.10/2))^2} = \boxed{0.959}$$

(2)

$$r = 20/2 = 10 \text{ cm}$$

$$W_{\text{area}} = \pi \cdot (10 \text{ cm})^2 = 314 \text{ cm}^2$$

$$D_{\text{area}} = \frac{314 \text{ cm}^2}{100} = 3.14 \text{ cm}^2$$

$$\text{Yield for wafer 2} = \frac{1}{(1 + (0.031 \times 3.14/2))^2} = \boxed{0.909}$$

### 1.10.2

$$W_1, \text{ cost per die} = \frac{\text{cost per wafer}}{\text{dies per wafer} \times \text{yield}} = \frac{12}{84 \times 0.96} = 0.1488 \approx \boxed{0.15}$$

$$W_2, \text{ cost per die} = \frac{15}{100 \times 0.909} = 0.1650 \approx \boxed{0.17}$$

### 1.10.3

$$(1) \text{ Dies per wafer: } 84 + (84 \times 0.10) = 92.4$$

$$(1) \text{ Defects per unit area: } \frac{115}{100} \times 0.020 = \boxed{0.023}$$

$$(1) \text{ Die Area} = \frac{176.7}{92.4} = \boxed{1.91 \text{ cm}^2}$$

$$(1) \text{ Yield} = \frac{1}{(1 + (0.023 \times \frac{1.91}{2}))^2} = \boxed{0.96}$$

$$(2) \text{ Dies per wafer: } \frac{110}{100} \times 100 = 110$$

$$(2) \text{ Defects per unit area: } \frac{115}{100} \times 0.031 = 0.03565 \approx \boxed{0.036}$$

$$(2) \text{ Die Area: } \frac{314}{110} = \boxed{2.85 \text{ cm}^2}$$

$$(2) \text{ Yield: } \frac{1}{(1 + (0.03565 \times \frac{2.85}{2}))^2} = \frac{1}{1.051^2} = \boxed{0.91}$$
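The yield and cost formulas used in 1.10.1 through 1.10.3 can be collected into a few helpers. This is a sketch with illustrative names; it uses the unrounded value of $\pi$, so results may differ from the hand calculation in the third decimal place.

```python
import math


def wafer_area(diameter_cm):
    """Wafer area in cm^2 from its diameter."""
    return math.pi * (diameter_cm / 2) ** 2


def die_yield(defects_per_cm2, die_area_cm2):
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1 / (1 + defects_per_cm2 * die_area_cm2 / 2) ** 2


def cost_per_die(wafer_cost, dies_per_wafer, yield_):
    """Cost per good die."""
    return wafer_cost / (dies_per_wafer * yield_)


# (diameter in cm, dies per wafer, defects per cm^2, cost per wafer)
WAFERS = {"wafer 1": (15, 84, 0.020, 12), "wafer 2": (20, 100, 0.031, 15)}

results = {}
for name, (diameter, dies, defects, cost) in WAFERS.items():
    area = wafer_area(diameter)
    y = die_yield(defects, area / dies)             # 1.10.1
    c = cost_per_die(cost, dies, y)                 # 1.10.2
    # 1.10.3: 10% more dies per wafer, 15% more defects per unit area.
    y_new = die_yield(1.15 * defects, area / (1.10 * dies))
    results[name] = (y, c, y_new)
    print(f"{name}: yield {y:.3f}, cost per die {c:.3f}, new yield {y_new:.3f}")
```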

### 1.12.1

| Processor | Clock Rate | CPI  | Instructions    |
|-----------|------------|------|-----------------|
| P1        | 4 GHz      | 0.9  | $5 \times 10^9$ |
| P2        | 3 GHz      | 0.75 | $1 \times 10^9$ |

$$\text{CPU Time (P1)} = \frac{\text{CPI} \times \text{Instructions}}{\text{Clock Rate}} = \frac{(0.9)(5 \times 10^9)}{4 \times 10^9} = 1.125 \text{ s}$$

$$\text{CPU Time (P2)} = \frac{(0.75)(1 \times 10^9)}{3 \times 10^9} = 0.25 \text{ s}$$

P2 performs better: it finishes sooner even though P1 has the higher clock rate

### 1.12.2

$$\text{Execution (P1)}: \frac{0.9 \times 1 \times 10^9}{4 \times 10^9} = 0.225 \text{ s}$$

$$\text{Execution (P2)}: \frac{I \times 0.75}{3 \times 10^9} = 0.225 \text{ s} \Rightarrow I = \frac{0.225 \times 3 \times 10^9}{0.75} = 9 \times 10^8$$

P2 can execute  $9 \times 10^8$  instructions in the time P1 takes to execute  $1 \times 10^9$

### 1.12.3

$$\text{For P1: } T(P1) = \frac{0.9 \times 5 \times 10^9}{4 \times 10^9} = 1.125 \text{ s} \quad \text{MIPS} = \frac{4 \times 10^3}{0.9} = 4.44 \times 10^3$$

$$\text{For P2: } T(P2) = \frac{0.75 \times 1 \times 10^9}{3 \times 10^9} = 0.25 \text{ s} \quad \text{MIPS} = \frac{3 \times 10^3}{0.75} = 4 \times 10^3$$

P1 has the higher MIPS rating but the longer execution time, so a higher MIPS rating does not mean better performance
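The CPU time and MIPS figures in 1.12.1 through 1.12.3 can be checked as follows. A sketch with illustrative helper names; the clock rates, CPIs, and instruction counts are the ones from the table in 1.12.1.

```python
def cpu_time(instructions, cpi, clock_hz):
    """CPU time in seconds = instructions x CPI / clock rate."""
    return instructions * cpi / clock_hz


def mips(clock_hz, cpi):
    """Millions of instructions per second = clock rate / (CPI x 10^6)."""
    return clock_hz / (cpi * 1e6)


# 1.12.1: each processor running its own instruction count.
t_p1 = cpu_time(5e9, 0.9, 4e9)
t_p2 = cpu_time(1e9, 0.75, 3e9)

# 1.12.2: instructions P2 completes while P1 runs 1e9 instructions.
t_ref = cpu_time(1e9, 0.9, 4e9)
insts_p2 = t_ref * 3e9 / 0.75

# 1.12.3: MIPS ratings.
mips_p1 = mips(4e9, 0.9)
mips_p2 = mips(3e9, 0.75)

print(f"P1: {t_p1:.3f} s, {mips_p1:.0f} MIPS")
print(f"P2: {t_p2:.3f} s, {mips_p2:.0f} MIPS")
print(f"P2 executes {insts_p2:.2e} instructions in {t_ref:.3f} s")
```

The processor with the larger MIPS value is also the one with the larger CPU time here, which is the point of 1.12.3.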

### 1.12.4

$$FP_1 = 5 \times 10^9 \times 0.4 = 2 \times 10^9$$

$$FP_2 = 1 \times 10^9 \times 0.4 = 4 \times 10^8$$

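$$MFLOPS_1 = \frac{FP_1}{\text{CPU Time (P1)} \times 10^6} = \frac{2 \times 10^9}{1.125 \times 10^6} = 1.78 \times 10^3$$

$$MFLOPS_2 = \frac{FP_2}{\text{CPU Time (P2)} \times 10^6} = \frac{4 \times 10^8}{0.25 \times 10^6} = 1.6 \times 10^3$$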

### 1.15

| Processors | Execution Time per Processor (s) | Total Time with Overhead (s) | Speedup (100 / Total Time) | Actual / Ideal Speedup (Speedup / Processors) |
|-----------|----------------|------------|---------------------|--------------------------------------|
| 1         | 100            | 100        | 1.00                | 1.00                                 |
| 2         | 50             | 54         | 1.85                | 0.93                                 |
| 4         | 25             | 29         | 3.45                | 0.86                                 |
| 8         | 12.5           | 16.5       | 6.06                | 0.76                                 |
| 16        | 6.25           | 10.25      | 9.76                | 0.61                                 |
| 32        | 3.125          | 7.125      | 14.04               | 0.44                                 |
| 64        | 1.5625         | 5.5625     | 17.98               | 0.28                                 |
| 128       | 0.78125        | 4.78125    | 20.92               | 0.16                                 |
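The table can be regenerated with a short loop. This sketch assumes what the rows above imply: a 100 s single-processor time, and a fixed 4 s overhead added to the per-processor time whenever more than one processor is used. Names are illustrative.

```python
T1 = 100.0      # single-processor execution time in seconds
OVERHEAD = 4.0  # assumption: fixed overhead in seconds when p > 1


def total_time(p):
    """Per-processor time plus overhead (no overhead on one processor)."""
    return T1 if p == 1 else T1 / p + OVERHEAD


def speedup(p):
    """Speedup relative to the single-processor run."""
    return T1 / total_time(p)


rows = []
for p in (1, 2, 4, 8, 16, 32, 64, 128):
    rows.append((p, T1 / p, total_time(p), speedup(p), speedup(p) / p))

for p, per_proc, total, s, ratio in rows:
    print(f"{p:>4} {per_proc:>9.5f} {total:>9.5f} {s:>6.2f} {ratio:>5.2f}")
```

The last column falls steadily because the 4 s overhead does not shrink as processors are added, so it dominates the total time at high processor counts.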