

1)
   - a) Performance via Pipelining
   - b) Performance via Parallelism
   - c) Performance via Prediction
   - d) Make the Common Case Fast
   - e) Hierarchy of Memories
   - f) Design for Moore's Law
   - g) Use Abstraction to Simplify Design
   - h) Dependability via Redundancy

2) Three processors $P_1, P_2, P_3$ execute the same set of instructions (instruction count $C$)

| Processor | Clock Rate | CPI | Instruction Count |
|-----------|------------|-----|-------------------|
| $P_1$     | 3.0 GHz    | 1.5 | C                 |
| $P_2$     | 2.5 GHz    | 1.0 | C                 |
| $P_3$     | 4.0 GHz    | 2.2 | C                 |

a) Which processor has the highest performance (instructions per second)?

$$\text{CPU Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

\* Best performance means lowest CPU time for the same instruction count $C$

$$\text{CPU}_{T_1} = \frac{C \times 1.5}{3 \text{ GHz}}$$

$$\text{CPU}_{T_2} = \frac{C \times 1.0}{2.5 \text{ GHz}}$$

$$\text{CPU}_{T_3} = \frac{C \times 2.2}{4.0 \text{ GHz}}$$

$$\text{CPU}_{T_1} = \frac{C \times 1.5}{3 \times 10^9} = 5 \times 10^{-10} * C$$

$$\text{CPU}_{T_2} = \frac{C \times 1.0}{2.5 \times 10^9} = 4 \times 10^{-10} * C$$

$$\text{CPU}_{T_3} = \frac{C \times 2.2}{4.0 \times 10^9} = 5.5 \times 10^{-10} * C$$

∴ Processor 2 has the best performance
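The comparison above can be checked numerically. The following is a minimal Python sketch; the names `processors` and `time_per_instruction` are illustrative and not part of the original solution. Because every processor runs the same instruction count $C$, it compares CPU time per instruction (CPI divided by clock rate).

```python
# Clock rate (Hz) and CPI for each processor, taken from the table above.
processors = {
    "P1": {"clock_hz": 3.0e9, "cpi": 1.5},
    "P2": {"clock_hz": 2.5e9, "cpi": 1.0},
    "P3": {"clock_hz": 4.0e9, "cpi": 2.2},
}


def time_per_instruction(clock_hz: float, cpi: float) -> float:
    """CPU time per instruction in seconds: CPI / clock rate."""
    return cpi / clock_hz


# The shared instruction count C cancels out, so it is omitted here.
times = {name: time_per_instruction(**p) for name, p in processors.items()}
fastest = min(times, key=times.get)

for name, t in times.items():
    print(f"{name}: {t:.2e} s per instruction")
print("Fastest:", fastest)
```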

b) Each processor executes a program in 10 seconds. Find the number of instructions and the number of clock cycles.

$$CPU_{Time} = \frac{\# Instructions * CPI}{Clock Rate}$$

|                | Clock Rate (GHz) | CPI | CPU Time (s) |
|----------------|------------------|-----|--------------|
| P<sub>1</sub>  | 3.0              | 1.5 | 10           |
| P<sub>2</sub>  | 2.5              | 1   | 10           |
| P<sub>3</sub>  | 4.0              | 2.2 | 10           |

1) \# Instructions =  $\frac{CPU_{Time} * Clock Rate}{CPI}$

2) Clock cycles =  $\# Instructions * CPI = CPU_{Time} * Clock Rate$

| Processor     | # Instructions                                        | Clock cycles                           |
|---------------|-------------------------------------------------------|----------------------------------------|
| P<sub>1</sub> | $\frac{10 * 3.0E9}{1.5} = \frac{3.0E10}{1.5} = 2.0E10$ | $2.0E10 * 1.5 = 3.0E10$                |
| P<sub>2</sub> | $\frac{10 * 2.5E9}{1} = 2.5E10$                       | $2.5E10 * 1 = 2.5E10$                  |
| P<sub>3</sub> | $\frac{10 * 4.0E9}{2.2} = \frac{4.0E10}{2.2} \approx 1.82E10$ | $1.82E10 * 2.2 \approx 4.0E10$  |
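As a quick check of the instruction and cycle counts, here is a small Python sketch. The function name `instructions_and_cycles` is a made-up helper for illustration; it simply rearranges the CPU time formula for a 10 second run.

```python
def instructions_and_cycles(cpu_time_s, clock_hz, cpi):
    """Return (instruction count, clock cycles) for one run."""
    cycles = cpu_time_s * clock_hz   # cycles = CPU time * clock rate
    instructions = cycles / cpi      # instructions = cycles / CPI
    return instructions, cycles


# (name, clock rate in Hz, CPI) for the three processors of problem 2.
for name, clock_hz, cpi in [("P1", 3.0e9, 1.5), ("P2", 2.5e9, 1.0), ("P3", 4.0e9, 2.2)]:
    n, c = instructions_and_cycles(10.0, clock_hz, cpi)
    print(f"{name}: {n:.3g} instructions, {c:.3g} cycles")
```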

c) CPU time is reduced to 0.7 of its original value while CPI increases by a factor of 1.2. Let $x$ be the factor applied to the clock rate:

$$0.7 \, CPU_{Time} = \frac{\# Instructions * 1.2 \, CPI}{x * Clock Rate}$$

$$0.7x = 1.2$$

$$x = \frac{1.2}{0.7} = \frac{12}{7} \approx 1.71$$

$\therefore$  The clock rate would have to increase by about 71%

$$CR_{new} \approx 1.71 * CR_{old}$$
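The scaling factor can be applied to each processor from problem 2. This is a hedged sketch: it assumes 0.7 is the factor on execution time, 1.2 is the factor on CPI, and the instruction count is unchanged. The helper name `required_clock_factor` is hypothetical.

```python
def required_clock_factor(time_factor: float, cpi_factor: float) -> float:
    """Factor by which the clock rate must change.

    From time = instructions * CPI / clock_rate with a fixed
    instruction count: clock_factor = cpi_factor / time_factor.
    """
    return cpi_factor / time_factor


factor = required_clock_factor(time_factor=0.7, cpi_factor=1.2)
print(f"clock rate factor: {factor:.2f}")

# Apply the factor to the clock rates from problem 2 (in GHz).
for name, ghz in [("P1", 3.0), ("P2", 2.5), ("P3", 4.0)]:
    print(f"{name}: {ghz * factor:.2f} GHz")
```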

3) 15 cm diameter wafer, cost of 12, 84 dies, 0.020 defects/cm<sup>2</sup>

20 cm diameter wafer, cost of 15, 100 dies, 0.031 defects/cm<sup>2</sup>

| Wafer          | Diameter | Cost | Dies | Defects/cm<sup>2</sup> | Area                   |
|----------------|----------|------|------|------------------------|------------------------|
| W<sub>1</sub>  | 15 cm    | 12   | 84   | 0.020                  | 176.715 cm<sup>2</sup> |
| W<sub>2</sub>  | 20 cm    | 15   | 100  | 0.031                  | 314.159 cm<sup>2</sup> |

a) yield?

$$\text{DiesPerWafer} = \frac{\text{wafer-area}}{\text{Die area}}$$

$$W\text{Area} = \pi r^2 = \pi \left(\frac{d}{2}\right)^2$$

$$\text{Yield} = \frac{1}{(1 + (\text{Defects} \times \text{DieArea}/2))^2}$$

$$\text{DieArea} = \frac{\text{wafer-area}}{\text{Dies PerWafer}}$$

$$W\text{Area1} = \pi \left(\frac{15}{2}\right)^2 = 176.715 \text{ cm}^2 \quad W\text{Area2} = \pi \left(\frac{20}{2}\right)^2 = 314.159 \text{ cm}^2$$

$$\text{DieArea}_1 = \frac{176.715}{84} = 2.104 \text{ cm}^2 \quad \text{DieArea}_2 = \frac{314.159}{100} = 3.142 \text{ cm}^2$$

$$\text{Yield}_1 = \frac{1}{(1 + (0.020 \times \frac{2.104}{2}))^2} = 0.9592 = 95.92\%$$

$$\text{Yield}_2 = \frac{1}{(1 + (0.031 \times \frac{3.142}{2}))^2} = 0.9093 = 90.93\%$$

b)  $\text{cost\_per\_die} = \frac{\text{cost\_per\_wafer}}{\text{Dies\_PerWafer} \times \text{yield}}$

$$\text{Cost\_per\_die}_1 = \frac{12}{84 \times 0.9592} = 0.1489$$

$$\text{Cost\_per\_die}_2 = \frac{15}{100 \times 0.9093} = 0.1650$$
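The yield and cost-per-die arithmetic for both wafers can be reproduced with a short Python sketch. The function names `die_yield` and `cost_per_die` and the `wafers` dictionary are illustrative choices, not part of the original notes; the formulas are the ones written above.

```python
import math


def die_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Yield = 1 / (1 + defects * die_area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2


def cost_per_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Cost per good die = wafer cost / (dies per wafer * yield)."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


# (diameter in cm, wafer cost, dies per wafer, defects per cm^2)
wafers = {"W1": (15.0, 12.0, 84, 0.020), "W2": (20.0, 15.0, 100, 0.031)}

results = {}
for name, (diameter, cost, dies, defects) in wafers.items():
    wafer_area = math.pi * (diameter / 2.0) ** 2   # area of the round wafer
    die_area = wafer_area / dies                   # area of one die
    y = die_yield(defects, die_area)
    results[name] = (y, cost_per_die(cost, dies, y))
    print(f"{name}: yield={y:.4f}, cost per die={results[name][1]:.4f}")
```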

c) Dies per wafer scaled by 1.10 and defects per area scaled by 1.15:

$$\text{DiesPerWafer}_{new} = 1.10 * \text{DiesPerWafer}$$

$$\text{DieArea}_{new} = \frac{\text{WaferArea}}{1.10 * \text{DiesPerWafer}}$$

$$\text{Yield} = \frac{1}{(1 + (1.15 * \text{DefectsPerArea} \times \text{DieArea}_{new}/2))^2}$$

$$\boxed{\text{Yield} = \frac{1}{\left(1 + \left(1.15 * \text{DefectsPerArea} * \left(\frac{\text{WaferArea}}{1.10 * \text{DiesPerWafer}}\right)/2\right)\right)^2}}$$
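To put numbers on the boxed expression, here is a minimal Python sketch. It assumes 1.10 scales the dies per wafer and 1.15 scales the defect density, as the formula suggests; `scaled_yield` and its parameter names are hypothetical.

```python
import math


def scaled_yield(diameter_cm, dies_per_wafer, defects_per_cm2,
                 dies_factor=1.10, defects_factor=1.15):
    """Yield after scaling dies per wafer and defect density."""
    wafer_area = math.pi * (diameter_cm / 2.0) ** 2
    # More dies on the same wafer means a smaller die area.
    die_area = wafer_area / (dies_factor * dies_per_wafer)
    defects = defects_factor * defects_per_cm2
    return 1.0 / (1.0 + defects * die_area / 2.0) ** 2


yield_w1 = scaled_yield(15.0, 84, 0.020)
yield_w2 = scaled_yield(20.0, 100, 0.031)
print(f"W1: {yield_w1:.4f}, W2: {yield_w2:.4f}")
```

Setting both factors to 1.0 recovers the yields from part a, which is a convenient sanity check.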

4)

| Compiler | Dynamic Instruction Count | Execution Time | CPI  |
|----------|---------------------------|----------------|------|
| A        | 1.0E9                     | 1.1 s          | 1.1  |
| B        | 1.2E9                     | 1.5 s          | 1.25 |
a) Average CPI, given a clock cycle time of 1 ns

$$\text{CPU clock cycles} = \text{Instruction\#} * \text{CPI} = \frac{\text{CPU execution time}}{\text{Clock cycle time}}$$

Compiler A:

$$\# \text{CPU clock cycles} = \frac{1.1s}{1ns} = \frac{1.1}{1.0E-9} = 1.1E9 \text{ cycles}$$

$$\boxed{\text{CPI} = \frac{\# \text{clock cycles}}{\text{Instruction\#}} = \frac{1.1E9}{1.0E9} = 1.1}$$

Compiler B:

$$\# \text{CPU clock cycles} = \frac{1.5s}{1ns} = \frac{1.5}{1.0E-9} = 1.5E9 \text{ cycles}$$

$$\boxed{\text{CPI} = \frac{\# \text{clock cycles}}{\text{Instruction\#}} = \frac{1.5E9}{1.2E9} = 1.25}$$
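The two CPI values follow from dividing the cycle count by the instruction count. A small Python sketch, with `average_cpi` as an illustrative helper name:

```python
def average_cpi(execution_time_s, cycle_time_s, instruction_count):
    """Average CPI = (execution time / cycle time) / instruction count."""
    clock_cycles = execution_time_s / cycle_time_s
    return clock_cycles / instruction_count


# Values from the compiler table; clock cycle time is 1 ns.
cpi_a = average_cpi(1.1, 1e-9, 1.0e9)
cpi_b = average_cpi(1.5, 1e-9, 1.2e9)
print(f"CPI A = {cpi_a:.2f}, CPI B = {cpi_b:.2f}")
```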

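b) Each program runs on a different processor, and the execution times are the same. How much faster is the clock of the processor running A's code compared to the one running B's code?

$$\text{CPU execution time} = \# \text{CPU clock cycles} * \text{Clock Cycle time}$$

$$\text{Clock rate} = \frac{1}{\text{Clock Cycle time}} = \frac{\# \text{CPU clock cycles}}{\text{CPU execution time}}$$

With equal execution times, the clock rates scale with the cycle counts (Instruction# \* CPI):

$$\frac{\text{Clock rate}_A}{\text{Clock rate}_B} = \frac{1.0E9 * 1.1}{1.2E9 * 1.25} = \frac{1.1E9}{1.5E9} \approx 0.73$$

∴ The clock of the processor running A's code is about 27% slower; equivalently, the processor running B's code needs a clock about 1.36 times faster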

4(c) New compiler:  $6.0 \times 10^8$  instructions,  $CPI = 1.1$   
Clock cycle time  $C$

$$CPU_{T_{new}} = 6.0 \times 10^8 (1.1) C = 6.6 \times 10^8 C$$

$$CPU_{T_A} = 1.0 \times 10^9 (1.1) C = 1.1 \times 10^9 C$$

$$CPU_{T_B} = 1.2 \times 10^9 (1.25) C = 1.5 \times 10^9 C$$

$$\frac{CPU_{T_A}}{CPU_{T_{new}}} = \frac{1.1 \times 10^9}{6.6 \times 10^8} = 1.67$$

$$\frac{CPU_{T_B}}{CPU_{T_{new}}} = \frac{1.5 \times 10^9}{6.6 \times 10^8} = 2.27$$

$\therefore$  The speedup is 1.67 compared to compiler A  
$\therefore$  The speedup is 2.27 compared to compiler B
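The speedups can be cross-checked in a few lines of Python. This sketch assumes all three compilers target the same processor (so the clock cycle time cancels) and uses compiler B's CPI of 1.25 from part a; `cpu_cycles` is an illustrative helper.

```python
def cpu_cycles(instruction_count: float, cpi: float) -> float:
    """Total clock cycles = instruction count * CPI."""
    return instruction_count * cpi


cycles_new = cpu_cycles(6.0e8, 1.1)    # new compiler
cycles_a = cpu_cycles(1.0e9, 1.1)      # compiler A
cycles_b = cpu_cycles(1.2e9, 1.25)     # compiler B (CPI from part a)

# Same clock cycle time for all three, so speedup is a ratio of cycles.
speedup_vs_a = cycles_a / cycles_new
speedup_vs_b = cycles_b / cycles_new
print(f"vs A: {speedup_vs_a:.2f}, vs B: {speedup_vs_b:.2f}")
```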

5)

|                | Clock Rate | CPI (A) | CPI (B) | CPI (C) | CPI (D) |
|----------------|------------|---------|---------|---------|---------|
| P<sub>1</sub>  | 2.5 GHz    | 1       | 2       | 3       | 3       |
| P<sub>2</sub>  | 3.0 GHz    | 2       | 2       | 2       | 2       |

Program: 1.0E6 instructions  
Class mix: A 10%, B 20%, C 50%, D 20%

Instructions Divided into:

$$A: 1.0E6 * 0.10 = 1.0E5$$

$$B: 1.0E6 * 0.20 = 2.0E5$$

$$C: 1.0E6 * 0.50 = 5.0E5$$

$$D: 1.0E6 * 0.20 = 2.0E5$$

$$\text{CPU Time} = \frac{\# \text{instructions} * \text{CPI}}{\text{Clock Rate}}$$

Find the CPU time for each class on each processor, and add them together.

$$\text{CPU}_{T_{P_1}} = \left( \frac{1.0E5 * 1}{2.5E9} \right) + \left( \frac{2.0E5 * 2}{2.5E9} \right) + \left( \frac{5.0E5 * 3}{2.5E9} \right) + \left( \frac{2.0E5 * 3}{2.5E9} \right) = \frac{2.6E6}{2.5E9} = 1.04E-3$$

$$\text{CPU}_{T_{P_2}} = \left( \frac{1.0E5 * 2}{3.0E9} \right) + \left( \frac{2.0E5 * 2}{3.0E9} \right) + \left( \frac{5.0E5 * 2}{3.0E9} \right) + \left( \frac{2.0E5 * 2}{3.0E9} \right) = \frac{2.0E6}{3.0E9} \approx 6.67E-4$$

CPU time for P<sub>1</sub> = 1.04E-3 seconds  
CPU time for P<sub>2</sub> ≈ 6.67E-4 seconds

∴ P<sub>2</sub> is faster than P<sub>1</sub>
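The per-class sums above can be reproduced with a short Python sketch. The constant and function names (`INSTRUCTIONS`, `MIX`, `PROCESSORS`, `total_cycles`, `cpu_time`) are illustrative; the data comes from the table and the class mix stated earlier.

```python
INSTRUCTIONS = 1.0e6
MIX = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}

PROCESSORS = {
    "P1": {"clock_hz": 2.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 3}},
    "P2": {"clock_hz": 3.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}


def total_cycles(cpi_by_class):
    """Sum of (instructions in class * CPI of class) over all classes."""
    return sum(INSTRUCTIONS * MIX[c] * cpi_by_class[c] for c in MIX)


def cpu_time(proc):
    """CPU time in seconds = total cycles / clock rate."""
    return total_cycles(proc["cpi"]) / proc["clock_hz"]


cpu_times = {name: cpu_time(p) for name, p in PROCESSORS.items()}
for name, t in cpu_times.items():
    print(f"{name}: {t:.3e} s")
```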

a) Global CPI

|                | Clock rate | Time of exec (s) | # instructions | CPI |
|----------------|------------|------------------|----------------|-----|
| P<sub>1</sub>  | 2.5 GHz    | 1.04E-3          | 1.0E6          | ?   |
| P<sub>2</sub>  | 3.0 GHz    | 6.67E-4          | 1.0E6          | ?   |

$$\text{CPU Time} = \frac{\# \text{Instruction} * \text{CPI}}{\text{Clock Rate}}$$

$$\text{CPU Time} * \text{Clock rate} = \# \text{ instruction} * \text{CPI}$$

$$\text{CPI} = \frac{\text{CPU Time} * \text{Clock Rate}}{\# \text{instructions}}$$

$$\text{CPI}_{P_1} = \frac{1.04E-3 * 2.5E9}{1.0E6} = 2.6$$

$$\text{CPI}_{P_2} = \frac{6.67E-4 * 3.0E9}{1.0E6} \approx 2.0$$

b) What is # of clock cycles in each case?

$$\text{CPU Time} = \frac{\text{CPU clock cycles}}{\text{Clock rate}}$$

$$\text{CPU clock cycles} = \text{CPU Time} * \text{Clock rate}$$

$$\# \text{Clock Cycles}_{P_1} = 1.04E-3 * 2.5E9 = 2.6E6$$

$$\# \text{Clock Cycles}_{P_2} = 6.67E-4 * 3.0E9 \approx 2.0E6$$
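As a cross-check that avoids the rounded CPU times, the global CPI can also be computed as a weighted average of the per-class CPIs, and the cycle count as global CPI times instruction count. This is a sketch of that alternative route, not the method used above; `global_cpi` is a hypothetical helper.

```python
MIX = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}


def global_cpi(cpi_by_class, mix):
    """Weighted average CPI: sum of (class fraction * class CPI)."""
    return sum(mix[c] * cpi_by_class[c] for c in mix)


cpi_p1 = global_cpi({"A": 1, "B": 2, "C": 3, "D": 3}, MIX)
cpi_p2 = global_cpi({"A": 2, "B": 2, "C": 2, "D": 2}, MIX)

# Clock cycles = global CPI * instruction count (1.0E6 instructions).
cycles_p1 = cpi_p1 * 1.0e6
cycles_p2 = cpi_p2 * 1.0e6
print(f"P1: CPI={cpi_p1:.2f}, cycles={cycles_p1:.3g}")
print(f"P2: CPI={cpi_p2:.2f}, cycles={cycles_p2:.3g}")
```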

6) Pentium 4 processor: 3.6 GHz clock rate,  $V = 1.25V$ ; on average it consumes 10 W static power and 90 W dynamic power

Core i5 Ivy Bridge: 3.4 GHz clock rate,  $V = 0.9V$ ; on average it consumes 30 W static power and 40 W dynamic power

a) Find the average capacitive load for each

| Processor          | Clock rate | Voltage | Static power | Dynamic power |
|--------------------|------------|---------|--------------|---------------|
| Pentium 4          | 3.6 GHz    | 1.25 V  | 10 W         | 90 W          |
| Core i5 Ivy Bridge | 3.4 GHz    | 0.9 V   | 30 W         | 40 W          |

Only the dynamic power depends on the capacitive load:

$$\text{Dynamic power} = \frac{1}{2} * \text{Capacitive load} * V^2 * \text{Frequency switched}$$

$$\text{Capacitive load} = \frac{2 * \text{Dynamic power}}{V^2 * \text{Frequency switched}}$$

$$\text{Pentium 4} = \frac{2 * 90W}{(1.25)^2 * 3.6 \times 10^9}$$

$$= 3.2 \times 10^{-8} \text{ F}$$

$$\text{Core i5 Ivy Bridge} = \frac{2 * 40W}{(0.9)^2 * 3.4 \times 10^9}$$

$$\approx 2.9 \times 10^{-8} \text{ F}$$
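A short Python sketch of the capacitive load calculation follows. It assumes the standard relation dynamic power = ½ · C · V² · f, so only the dynamic power (90 W and 40 W) enters the formula; `capacitive_load` is an illustrative name.

```python
def capacitive_load(dynamic_power_w, voltage_v, frequency_hz):
    """C = 2 * P_dynamic / (V^2 * f), from P_dynamic = 1/2 * C * V^2 * f."""
    return 2.0 * dynamic_power_w / (voltage_v ** 2 * frequency_hz)


c_p4 = capacitive_load(90.0, 1.25, 3.6e9)   # Pentium 4
c_i5 = capacitive_load(40.0, 0.9, 3.4e9)    # Core i5
print(f"Pentium 4: {c_p4:.2e} F")
print(f"Core i5:   {c_i5:.2e} F")
```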

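b) Static power as a share of total dissipated power, and the ratio of static to dynamic power

$$\text{Dissipation} = \text{Static power} + \text{Dynamic power}$$

$$\text{Pentium 4}: \quad \frac{10\text{W}}{10\text{W} + 90\text{W}} = 10\%, \qquad \frac{\text{Static}}{\text{Dynamic}} = \frac{10\text{W}}{90\text{W}} \approx 0.11$$

$$\text{Core i5 Ivy Bridge}: \quad \frac{30\text{W}}{30\text{W} + 40\text{W}} \approx 42.9\%, \qquad \frac{\text{Static}}{\text{Dynamic}} = \frac{30\text{W}}{40\text{W}} = 0.75$$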

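c) Power is the product of voltage and current ($I$), with the current held constant:

$$\text{Power} = V * I$$

$$0.90 \text{ Power} = (0.90 \, V) * I$$

$\therefore$  To cut the power by 10% at the same current, the voltage must drop to 0.90 of its original value (a 10% reduction), not rise by $\frac{1}{0.90} = 1.111$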
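The relation above treats all power as a single voltage-times-current product. A hedged refinement, sketched below in Python, separates the two components. The model is an assumption, not something stated in these notes: static power = V · I_leak with the leakage current held fixed, dynamic power = ½ · C · V² · f with the capacitive load and frequency unchanged, and a target of 0.90 times the original total power. The function name `reduced_voltage` is hypothetical.

```python
import math


def reduced_voltage(static_w, dynamic_w, voltage_v, power_factor=0.90):
    """Voltage giving power_factor * total power, leakage current fixed.

    Assumed model (not from the original notes):
      static  = V * I_leak   with I_leak = static_w / voltage_v
      dynamic = k * V^2      with k = dynamic_w / voltage_v^2
    Solves k*V^2 + I_leak*V - target = 0 for the positive root.
    """
    i_leak = static_w / voltage_v
    k = dynamic_w / voltage_v ** 2
    target = power_factor * (static_w + dynamic_w)
    disc = i_leak ** 2 + 4.0 * k * target
    return (-i_leak + math.sqrt(disc)) / (2.0 * k)


v_p4 = reduced_voltage(10.0, 90.0, 1.25)   # Pentium 4
v_i5 = reduced_voltage(30.0, 40.0, 0.9)    # Core i5
print(f"Pentium 4: {v_p4:.2f} V, Core i5: {v_i5:.2f} V")
```

Under these assumptions the required voltage drop is smaller than 10%, because the dynamic term falls with the square of the voltage.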