

# Computer Architecture and Organization (01)

21B00612 Keonho Lim (김현호)

1.5 a.  $\text{Instructions / second} = \text{Instructions / clock cycle} \times \text{Clock cycles / second} = \text{Clock rate} / \text{CPI}$

$$P1 = 3.0 \text{ GHz} / 1.5 \text{ CPI} = 2 \times 10^9$$

$$P2 = 2.5 \text{ GHz} / 1.0 \text{ CPI} = 2.5 \times 10^9$$

$$P3 = 4.0 \text{ GHz} / 2.2 \text{ CPI} \approx 1.82 \times 10^9$$

$\therefore$ P2 has the highest performance in instructions per second.

b. Number of cycles $= \text{Clock cycles / second} \times \text{Time} = \text{Clock rate} \times \text{Time}$

$$\text{Number of instructions} = \text{IPS} \times \text{Time}$$

Number of cycles (Time = 10 s):

$$P1 = 3.0 \text{ GHz} \times 10 = 30 \times 10^9$$

$$P2 = 2.5 \text{ GHz} \times 10 = 25 \times 10^9$$

$$P3 = 4.0 \text{ GHz} \times 10 = 40 \times 10^9$$

Number of instructions:

$$P1 = 2 \times 10^9 \times 10 = 20 \times 10^9$$

$$P2 = 2.5 \times 10^9 \times 10 = 25 \times 10^9$$

$$P3 = 1.82 \times 10^9 \times 10 = 18.2 \times 10^9$$
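The arithmetic in parts a and b can be checked with a short Python sketch. The names `PROCESSORS`, `RUN_TIME_S`, `instructions_per_second`, and `cycles_and_instructions` are illustrative, and the 10-second run time is taken from the factor of 10 used in the calculations above.

```python
# Clock rate (Hz) and CPI for each processor, as used in 1.5 a.
PROCESSORS = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}
RUN_TIME_S = 10  # run time assumed for part b


def instructions_per_second(clock_rate_hz, cpi):
    """IPS = clock rate / CPI."""
    return clock_rate_hz / cpi


def cycles_and_instructions(clock_rate_hz, cpi, time_s=RUN_TIME_S):
    """Return (number of cycles, number of instructions) over time_s seconds."""
    cycles = clock_rate_hz * time_s
    instructions = instructions_per_second(clock_rate_hz, cpi) * time_s
    return cycles, instructions


for name, (rate, cpi) in PROCESSORS.items():
    cycles, instructions = cycles_and_instructions(rate, cpi)
    print(f"{name}: IPS={instructions_per_second(rate, cpi):.3g}, "
          f"cycles={cycles:.3g}, instructions={instructions:.3g}")
```

Comparing the printed IPS values shows which processor executes the most instructions per second.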

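c. Clock Rate = Clock Cycles / CPU Time, where Clock Cycles = Instruction count $\times$ CPI

Time per instruction at the original clock rate (CPI / clock rate, in ns):

$$P1 = 0.5, \quad P2 = 0.4, \quad P3 = 0.55$$

Reducing the execution time by 30% gives 0.35, 0.28, and 0.385 ns per instruction, and increasing the CPI by 20% gives CPIs of 1.8, 1.2, and 2.64.

$$\therefore P1 \text{'s clock rate} = 1.8 / 0.35 \approx 5.14 \text{ GHz}$$

$$P2 \text{'s clock rate} = 1.2 / 0.28 \approx 4.29 \text{ GHz}$$

$$P3 \text{'s clock rate} = 2.64 / 0.385 \approx 6.86 \text{ GHz}$$

$\therefore$ P1 needs a clock rate of about 5.14 GHz, P2 about 4.29 GHz, and P3 about 6.86 GHz.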
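As a cross-check on part c, here is a minimal Python sketch. It assumes the execution time is scaled by 0.7 and the CPI by 1.2 (so P3's new CPI is 2.2 × 1.2 = 2.64); `required_clock_rate` is a hypothetical helper name. Because the instruction count is unchanged, it cancels out and the new clock rate depends only on the old rate and the two factors.

```python
def required_clock_rate(clock_rate_hz, time_factor=0.7, cpi_factor=1.2):
    """Clock rate needed when CPU time scales by time_factor and CPI by cpi_factor.

    rate_new = (IC * CPI * cpi_factor) / (T * time_factor)
             = rate_old * cpi_factor / time_factor   (since T = IC * CPI / rate_old)
    """
    return clock_rate_hz * cpi_factor / time_factor


for name, rate in {"P1": 3.0e9, "P2": 2.5e9, "P3": 4.0e9}.items():
    print(f"{name}: {required_clock_rate(rate) / 1e9:.2f} GHz")
```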

1.6 a.

$$CPI = \sum_{i=1}^n (CPI_i \times \text{Instruction Count}_i / \text{Instruction Count})$$

$$\text{Clock Cycles} = \sum_{i=1}^n (CPI_i \times \text{Instruction Count}_i)$$

|    | Clock rate | CPI of A | CPI of B | CPI of C | CPI of D |
|----|------------|----------|----------|----------|----------|
| P1 | 2.5 GHz    | 1        | 2        | 3        | 3        |
| P2 | 3 GHz      | 2        | 2        | 2        | 2        |

Number of instructions executed in each instruction class ($10^6$ instructions in total):

$$A : 10^6 \times 10\% = 10^5$$

$$B : 10^6 \times 20\% = 2 \times 10^5$$

$$C : 10^6 \times 50\% = 5 \times 10^5$$

$$D : 10^6 \times 20\% = 2 \times 10^5$$

P1's clock cycles

$$(1 \times 10^5) + (2 \times 2 \times 10^5) + (3 \times 5 \times 10^5) + (3 \times 2 \times 10^5) = 2.6 \times 10^6$$

P2's clock cycles

$$(2 \times 10^5) + (2 \times 2 \times 10^5) + (2 \times 5 \times 10^5) + (2 \times 2 \times 10^5) = 2 \times 10^6$$

$$\therefore \underline{\text{P1's global CPI}} = 2.6 \times 10^6 / 10^6 = 2.6$$

$$\underline{\text{P2's global CPI}} = 2 \times 10^6 / 10^6 = 2$$

b.

$\therefore$ P1's clock cycle count is $2.6 \times 10^6$.  
P2's clock cycle count is $2 \times 10^6$.
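The weighted sums in 1.6 can be reproduced with a small Python sketch. The dictionary layout and the helper names `clock_cycles`, `global_cpi`, and `execution_time_s` are illustrative; the execution time is an extra derived value (clock cycles / clock rate) that goes one step beyond what the answer above states.

```python
INSTRUCTIONS = 1.0e6  # total dynamic instruction count
MIX = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}  # fraction per class
CPI = {
    "P1": {"A": 1, "B": 2, "C": 3, "D": 3},
    "P2": {"A": 2, "B": 2, "C": 2, "D": 2},
}
CLOCK_RATE_HZ = {"P1": 2.5e9, "P2": 3.0e9}


def clock_cycles(processor):
    """Sum of CPI_i * instruction count_i over all classes."""
    return sum(CPI[processor][c] * MIX[c] * INSTRUCTIONS for c in MIX)


def global_cpi(processor):
    """Total clock cycles divided by total instruction count."""
    return clock_cycles(processor) / INSTRUCTIONS


def execution_time_s(processor):
    """Derived value: clock cycles / clock rate."""
    return clock_cycles(processor) / CLOCK_RATE_HZ[processor]


for p in CPI:
    print(f"{p}: cycles={clock_cycles(p):.3g}, CPI={global_cpi(p):.2f}, "
          f"time={execution_time_s(p) * 1e3:.3f} ms")
```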

# Computer Architecture and Organization (01)

21B00612 Keonho Lim (05/21)

1.7 a.

| Compiler | Instruction count | Execution time (s) |
|----------|-------------------|--------------------|
| A        | $10^9$            | 1.1                |
| B        | $1.2 \times 10^9$ | 1.5                |

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$

$$\text{CPI} = \text{CPU Time} / (\text{Instruction Count} \times \text{Clock Cycle Time})$$

With a clock cycle time of 1 ns ($10^{-9}$ seconds):

$$\text{Average CPI of compiler A} = 1.1 / (10^9 \times 10^{-9}) = 1.1$$

$$\text{Average CPI of compiler B} = 1.5 / (1.2 \times 10^9 \times 10^{-9}) = 1.25$$

b.

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$

$$\text{A's execution time} = 10^9 \times 1.1 \times \text{A's clock cycle time}$$

$$\text{B's execution time} = 1.2 \times 10^9 \times 1.25 \times \text{B's clock cycle time}$$

Setting the two execution times equal:

$$\text{A's clock cycle time} = (1.2 \times 10^9 \times 1.25) / (10^9 \times 1.1) \times \text{B's clock cycle time}$$

$$= 1.5 / 1.1 \times \text{B's clock cycle time}$$

$$\approx 1.36 \times \text{B's clock cycle time}$$

∴ The clock of the processor running compiler B's code is about 1.36 times faster than the clock of the processor running compiler A's code.

c. The new compiler uses $6 \times 10^8$ instructions.

$$\text{CPU Time} = 0.6 \times 10^9 \times 1.1 \times 10^{-9} \text{ (seconds)}$$

$$= 0.66 \text{ (seconds)}$$

$$n = \text{Execution time}_Y / \text{Execution time}_X \quad (X \text{ is } n \text{ times faster than } Y)$$

$$\text{Compiler A}: 1.1 / 0.66 \approx 1.67 \quad \text{(the new compiler is about 1.67 times faster)}$$

$$\text{Compiler B}: 1.5 / 0.66 \approx 2.27 \quad \text{(the new compiler is about 2.27 times faster)}$$
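The three parts of 1.7 can be verified with the following Python sketch. It assumes the 1 ns clock cycle time used in part a and the 0.6 × 10^9 instruction count used in the CPU-time line of part c; the function and variable names are illustrative.

```python
CYCLE_TIME_S = 1e-9  # 1 ns clock cycle time


def average_cpi(cpu_time_s, instruction_count, cycle_time_s=CYCLE_TIME_S):
    """CPI = CPU time / (instruction count * clock cycle time)."""
    return cpu_time_s / (instruction_count * cycle_time_s)


def cpu_time(instruction_count, cpi, cycle_time_s=CYCLE_TIME_S):
    """CPU time = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Part a: average CPI for each compiler.
cpi_a = average_cpi(1.1, 1.0e9)
cpi_b = average_cpi(1.5, 1.2e9)

# Part b: ratio of cycle times (A's / B's) when the execution times are equal.
clock_ratio = (1.2e9 * cpi_b) / (1.0e9 * cpi_a)

# Part c: new compiler with 6.0e8 instructions and CPI 1.1.
new_time = cpu_time(6.0e8, 1.1)
speedup_vs_a = 1.1 / new_time
speedup_vs_b = 1.5 / new_time

print(cpi_a, cpi_b, clock_ratio, new_time, speedup_vs_a, speedup_vs_b)
```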

1.14

$$\text{Clock cycles} = \sum_{i=1}^n (\text{CPI}_i \times \text{Instruction Count}_i)$$

$$\text{CPU time} = (\text{Instruction Count} \times \text{CPI}) / \text{Clock Rate}$$

$$\therefore \text{Clock cycles} = (50 \times 10^6 \times 1) + (110 \times 10^6 \times 1) + (80 \times 10^6 \times 4) + (16 \times 10^6 \times 2) = 512 \times 10^6$$

$$\text{CPU Time} = (512 \times 10^6) / (2 \times 10^9) = 0.256 \text{ (seconds)}$$

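1.14.1

To run two times faster, the clock cycle count must be halved: $512 \times 10^6 / 2 = 256 \times 10^6$.

$$256 \times 10^6 = (50 \times 10^6 \times \text{CPI of FP}) + (110 \times 10^6 \times 1) + (80 \times 10^6 \times 4) + (16 \times 10^6 \times 2)$$

$$\text{CPI of FP} = (256 - 462) \times 10^6 / (50 \times 10^6) = -4.12$$

$\therefore$ The result is negative, so the program cannot be made two times faster by improving only the CPI of FP.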

1.14.2

$$256 \times 10^6 = (50 \times 10^6 \times 1) + (110 \times 10^6 \times 1) + (80 \times 10^6 \times \text{CPI of L/S}) + (16 \times 10^6 \times 2)$$

$$\text{CPI of L/S} = (256 - 192) \times 10^6 / (80 \times 10^6) = 0.8$$

$\therefore$ The CPI of L/S must be reduced from 4 to 0.8, an improvement by a factor of 5.

1.14.3

$$\text{Clock cycles} = (50 \times 10^6 \times 0.6) + (110 \times 10^6 \times 0.6) + (80 \times 10^6 \times 2.8) + (16 \times 10^6 \times 1.4) = 342.4 \times 10^6$$

$$\text{CPU Time} = (342.4 \times 10^6) / (2 \times 10^9) = 171.2 \times 10^{-3} = 0.1712 \text{ (seconds)}$$

$$0.256 - 0.1712 = 0.0848$$

$\therefore$ The execution time drops by 0.0848 seconds, from 0.256 to 0.1712 seconds, a speedup of about 1.50.
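The 1.14 calculations can be sketched in Python as follows. The class labels `INT` and `BR` for the 110 × 10^6 and 16 × 10^6 instruction groups are assumptions (only FP and L/S are named above), and `total_cycles` and `cpi_needed` are illustrative helper names. The improved L/S CPI is taken as 2.8, which is the value that yields the 342.4 × 10^6 cycle total.

```python
CLOCK_RATE_HZ = 2e9
COUNTS = {"FP": 50e6, "INT": 110e6, "LS": 80e6, "BR": 16e6}
BASE_CPI = {"FP": 1, "INT": 1, "LS": 4, "BR": 2}


def total_cycles(cpi):
    """Sum of CPI_i * instruction count_i."""
    return sum(COUNTS[c] * cpi[c] for c in COUNTS)


def cpi_needed(target_class, target_cycles):
    """CPI one class would need so that total cycles equal target_cycles."""
    others = sum(COUNTS[c] * BASE_CPI[c] for c in COUNTS if c != target_class)
    return (target_cycles - others) / COUNTS[target_class]


base_cycles = total_cycles(BASE_CPI)
base_time = base_cycles / CLOCK_RATE_HZ

cpi_fp = cpi_needed("FP", base_cycles / 2)  # 1.14.1: negative, so infeasible
cpi_ls = cpi_needed("LS", base_cycles / 2)  # 1.14.2

improved_cpi = {"FP": 0.6, "INT": 0.6, "LS": 2.8, "BR": 1.4}  # 1.14.3
improved_time = total_cycles(improved_cpi) / CLOCK_RATE_HZ

print(base_time, cpi_fp, cpi_ls, improved_time, base_time / improved_time)
```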


1.10.1

① Area of Circle =  $\pi r^2$

$$\therefore \text{wafer area} = \pi (7.5)^2 = 56.25 \pi (\text{cm}^2)$$

Die area  $\approx$  Wafer area / Die count

$$\approx 56.25\pi / 84$$

$$\approx 0.66964\pi \approx 0.66964 \times (3.14)$$

$$\approx 2.102 (\text{cm}^2)$$

Yield =  $1 / (1 + (\text{Defects per area} \times \text{Die area}/2))^2$

$$= 1 / (1 + (0.02 \times 0.66964\pi / 2))^2$$

$$\approx 1 / (1 + (0.02 \times 2.102 / 2))^2$$

$$\approx 1 / 1.04248$$

$$\approx 0.96$$

② wafer area =  $\pi (10)^2 = 100\pi (\text{cm}^2)$

Die area  $\approx 100\pi / 100 = \pi$

Yield  $\approx 1 / (1 + (0.031 \times \pi / 2))^2$

$$\approx 1 / (1 + (0.031 \times 3.14 / 2))^2$$

$$\approx 1 / 1.1$$

$$\approx 0.91$$

∴ The yields are about 0.96 and 0.91.

1.10.2

$$\text{Cost per die} = \text{Cost per wafer} / (\text{Dies per wafer} \times \text{Yield})$$

$$\text{First one's cost} = 12 / (84 \times 0.96) \approx 0.1488$$

$$\text{Second one's cost} = 15 / (100 \times 0.91) \approx 0.1648$$

∴ 0.1488, 0.1648.
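The yield and cost-per-die formulas from 1.10.1 and 1.10.2 can be sketched in Python as below. The helper names are illustrative, and `math.pi` is used instead of 3.14, so the results can differ from the hand calculation in the last digit.

```python
import math


def die_area_cm2(wafer_diameter_cm, dies_per_wafer):
    """Approximate die area: wafer area / die count."""
    return math.pi * (wafer_diameter_cm / 2) ** 2 / dies_per_wafer


def die_yield(defects_per_cm2, area_cm2):
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1 / (1 + defects_per_cm2 * area_cm2 / 2) ** 2


def cost_per_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


yield_1 = die_yield(0.02, die_area_cm2(15, 84))
yield_2 = die_yield(0.031, die_area_cm2(20, 100))
cost_1 = cost_per_die(12, 84, yield_1)
cost_2 = cost_per_die(15, 100, yield_2)

print(f"yields: {yield_1:.3f}, {yield_2:.3f}")
print(f"cost per die: {cost_1:.4f}, {cost_2:.4f}")
```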

1.10.3.

① Changed die area

$$= \text{Wafer area} / ((10\% \times \text{Die count}) + \text{Die count})$$

$$= \text{Wafer area} / (1.1 \times \text{Die count})$$

$$= 2.102 / 1.1 \approx 1.91 \ (\text{cm}^2)$$

$$\text{Changed yield} = 1 / (1 + ((15\% \times \text{Defect rate}) + \text{Defect rate}) \times \text{Changed die area}/2)^2$$

$$= 1 / (1 + 1.15 \times \text{Defect rate} \times \text{Changed die area}/2)^2$$

$$\approx 1 / (1 + 1.15 \times 0.02 \times 1.91/2)^2$$

$$\approx 1 / 1.0444$$

$$\approx 0.957$$

② Changed die area

$$= \pi / 1.1 \approx 2.85 \ (\text{cm}^2)$$

Changed yield

$$= 1 / (1 + 1.15 \times 0.031 \times 2.85/2)^2$$

$$\approx 1 / 1.10416$$

$$\approx 0.906$$

∴ 1.91, 0.957 / 2.85, 0.906
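A minimal Python sketch of the 1.10.3 scenario, assuming a 10% increase in dies per wafer and a 15% increase in defects per area. `changed_area_and_yield` is a hypothetical helper name. Because it uses `math.pi` and unrounded intermediate values, the results can differ from the hand calculation in the third decimal place.

```python
import math


def changed_area_and_yield(wafer_diameter_cm, dies_per_wafer, defects_per_cm2,
                           die_factor=1.10, defect_factor=1.15):
    """Return (die area, yield) after scaling the die count and defect rate."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    area = wafer_area / (die_factor * dies_per_wafer)
    defects = defect_factor * defects_per_cm2
    return area, 1 / (1 + defects * area / 2) ** 2


area_1, yield_1 = changed_area_and_yield(15, 84, 0.02)
area_2, yield_2 = changed_area_and_yield(20, 100, 0.031)

print(f"wafer 1: area={area_1:.2f} cm^2, yield={yield_1:.3f}")
print(f"wafer 2: area={area_2:.2f} cm^2, yield={yield_2:.3f}")
```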

1.10.4

$$\text{Yield} = 1 / (1 + \text{Defect rate} \times \text{Die area}/2)^2$$

$$= 1 / (1 + \text{Defect rate} \times 2/2)^2$$

$$= 1 / (1 + \text{Defect rate})^2$$

$$\text{Defect rate} = (1 / \sqrt{\text{Yield}}) - 1$$

$$\text{Before} = (1 / \sqrt{0.92}) - 1 \approx 0.0426 \text{ (defects/cm}^2\text{)}$$

$$\text{After} = (1 / \sqrt{0.95}) - 1 \approx 0.0260 \text{ (defects/cm}^2\text{)}$$
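The inversion of the yield formula can be checked numerically. This Python sketch assumes the 2 cm² die area used in the derivation above; `defect_rate` is an illustrative name, and the round-trip check simply plugs the result back into the yield formula.

```python
import math


def defect_rate(yield_fraction, die_area_cm2=2.0):
    """Solve Yield = 1 / (1 + d * A / 2)^2 for d (defects per cm^2)."""
    return 2 * (1 / math.sqrt(yield_fraction) - 1) / die_area_cm2


def die_yield(defects_per_cm2, die_area_cm2=2.0):
    """Forward formula, used here only as a round-trip check."""
    return 1 / (1 + defects_per_cm2 * die_area_cm2 / 2) ** 2


before = defect_rate(0.92)
after = defect_rate(0.95)
print(f"before: {before:.4f}, after: {after:.4f} defects/cm^2")
```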