

## Exercise 1.3

1.3.1 The overall performance can be assessed by IPS (Instructions per Second)

$$\text{For } P_1: ISP_{P1} = \frac{\text{Clock Rate}}{\text{CPI}} = \frac{2 \times 10^9}{1.5} \approx 1.33 \times 10^9$$

$$\text{For } P_2: ISP_{P2} = \frac{1.5 \times 10^9}{1.0} = 1.5 \times 10^9$$

$$\text{For } P_3: ISP_{P3} = \frac{3 \times 10^9}{2.5} = 1.2 \times 10^9$$

So, Processor P<sub>2</sub> has the highest performance.

1.3.2. Total cycles = Clock Rate × Time

Total instructions = Total cycles / CPI

$$\text{For } P_1: \text{Total cycles} = 2 \times 10^9 \times 10 = 2 \times 10^{10}$$

$$\text{Total instructions} = 2 \times 10^{10} / 1.5 = 1.33 \times 10^9$$

$$\text{For } P_2: \text{Total cycles} = 1.5 \times 10^9 \times 10 = 1.5 \times 10^{10}$$

$$\text{Total instructions} = 1.5 \times 10^{10} / 1.0 = 1.5 \times 10^{10}$$

$$\text{For } P_3: \text{Total cycles} = 3 \times 10^9 \times 10 = 3 \times 10^{10}$$

$$\text{Total instructions} = 3 \times 10^{10} / 2.5 = 1.2 \times 10^{10}$$

1.3.3 Assume the original time has 1 unit  
Instruction Count  $\times$  CPI

$$\text{Execution Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

Corresponding Modify:

$$\text{New Clock Rate} = \text{Original Clock Rate} \times \frac{1.2}{0.7}$$

$$\text{For P}_1: NCR = 2 \times 10^9 \times \frac{1.2}{0.7} = 3.43 \times 10^9$$

$$\text{For P}_2: NCR = 1.5 \times 10^9 \times \frac{1.2}{0.7} = 2.57 \times 10^9$$

$$\text{For P}_3: NCR = 3 \times 10^9 \times \frac{1.2}{0.7} = 5.14 \times 10^9$$

1.3.4 IPC =  $\frac{\text{Number of Instructions}}{\text{Total Cycles}}$

$$\text{For P}_1: IPC = \frac{20 \times 10^9}{2 \times 10^9 \times 7} = 1.43$$

$$\text{For P}_2: IPC = \frac{30 \times 10^9}{1.5 \times 10^9 \times 10} = 2$$

$$\text{For P}_3: IPC = \frac{90 \times 10^9}{3 \times 10^9 \times 9} = 3.33$$

1.3.5

$$\text{Clock Rate} = \frac{30 \times 10^9 \times \frac{1}{2}}{7} = 2.14 \times 10^9$$

$$1.3.6 \rightarrow \text{No Instructions} = \frac{1.5 \times 10^9 \times 9}{\frac{1}{2}} = 27 \times 10^9$$

### Exercise 1.4

**1.4.1**

$$\text{Execution Time} = \frac{\text{No Instructions} \times \text{Average CPI}}{\text{Clock Rate}}$$

For P1:

$$\text{Average CPI} = 0.1 \times 1 + 0.2 \times 2 + 0.5 \times 3 + 0.2 \times 4 = 2.8$$

$$\text{Execution Time} = \frac{10^6 \times 2.8}{1.5 \times 10^9} = 1.8 \times 10^{-3} \text{ s}$$

For P2:

$$\text{Average CPI} = 0.1 \times 2 + 0.2 \times 2 + 0.5 \times 2 + 0.2 \times 2 = 2$$

$$\text{Execution Time} = \frac{10^6 \times 2}{2 \times 10^9} = 1.0 \times 10^{-3} \text{ s}$$

Conclusion: Processor P2 is faster.

### 1.4.2

For P1: Global CPI = 2.8

For P2: Global CPI = 2

**1.4.3**

$$\text{Clock Cycles} = \text{No Instructions} \times \text{Average CPI}$$

$$\text{For P1: } 10^6 \times 2.8 = 2.8 \times 10^6$$

$$\text{For P2: } 10^6 \times 2.0 = 2.0 \times 10^6$$

1.4.4.

$$\text{Total Cycles} = 500 \times 1 + 150 \times 5 + 50 \times 2 = 1350$$
$$\text{Execution Time} = \frac{1350}{2 \times 10^9} = 6.75 \times 10^{-7}$$

1.4.5.

$$CPJ = \frac{1350}{700} = 1.93$$

1.4.6.

$$\text{Speed-up} = \frac{\text{Old Total Cycles}}{\text{New Total Cycles}} = \frac{1350}{1100} = 1.23$$

Exercise 2.4

2.4.1

For a. lw \$t0, 1b(\$s7);

add \$t1, \$s1, \$s2;

add \$s0, \$t1, \$t0;

For b.: lw \$t0, 1b(\$s7);

sll \$t0, \$t0, 2;

add \$t0, \$t0, \$s6;

lw \$t1, 0(\$t0);

sub \$s0, \$s1, \$t1;

2.4.2.

For a. 3 MIPS Instructions (One lw + Two add)

For b. 5 MIPS Instructions (Two lw + One sll + One add + One sub)

2.4.3.

For a, 6 different registers:

$\$S_0, \$S_1, \$S_2, \$S_7, \$t_0, \$t_1$

For b, 6 different registers:

$\$S_0, \$S_1, \$S_b, \$S_7, \$t_0, \$t_1$

2.4.4.

For a.  $f = f + g + h + i + j;$

For b.  $f = A[i]$

2.4.5.

To be truth, these 2 assembly codes are already the optimal and the instructions are minimum.

2.4.6.

Original Code Utilization:

For a - 5 registers used

$\$S_0, \$S_1, \$S_2, \$S_3, \$t_4$

For b - 2 registers used

$\$S_0, \$S_b$

These 2 are already optimized.

## Exercise 2.11

2.11.1-

For a.

1010 1110 0000 1011 1111 1111 1111 1100

For b.

1000 1101 0000 0000 1111 1111 1100 0000

2.11.2

For a. 2920022012

For b. 2366177216

2.11.3.

For a: sw \$11 0xFFFF(\$16)

For b: lw \$8 0xFFC0(\$8)

2.11.4.

For a: R-type

For b: I-type

2.11.5

For a: add \$S3, \$S1, \$T2

For b: sw \$T5, 4(\$S16)

2.11.6-

For a. 000000 0000 0010 0001 000000  
000000 0000 0000 0000 000000

For b. 101011 10000 00101

00000000000000000000000000000000

## Exercise 2.B

2.B.1

For a - Ox 57755778

For b - Ox FEEFFFEDE

2.B.2

For a, Ox 55555550

For b, Ox EADFEED0

2.B.3.

For a, Ox AAAAAA

For b, Ox BFGD

2.B.4.

For a, Ox 00015B5A

For b, Ox 000000D0

2.B.5

For a, Ox EFEF0000

for b, Ox 0000

2.B.6.

For a, Ox FFFFFF

For b, Ox 00FO

## Exercise 2.18

2.B.1

For a.



For b.



## 2.18.2

For a. li \$t0, 0

for-loop:

bge \$t0, 10, exit-for

add \$s0, \$s0, \$s1

addi \$t0, \$t0, 1

j for-loop

exit-for

For b.

while-loop:

bge \$s0, 10, exit-while

add \$t1, \$s1, \$s0

sw \$t1, 0(\$2)

addi \$s0, \$s0, 1

addi \$s2, \$s2, 4

j while-loop

exit-while:

## 2.18.3

For a. 1 + # of loop \* 4

For b. # of loop \* 6

After init a, b and D

For a. 1 + (4 \* 10 = 41)

For b. 1

2.184.

For a, 501

For b, 301

2.185.

For a, int i, result = 0;  
int \*MemArray = ...;  
for (i=0; i<100; i++)  
{ result += MemArray[i]; }

For b, int i, result = 0;  
int \*MemArray = ...;  
for (i=0; i<50; i++)  
{ result += MemArray[2\*i] + MemArray  
[2\*i+1]; }

2.186:

For a,

addi \$t1, \$s0, 100

addi \$s0, \$s0, 400

Loop:

subi \$s0, \$s0, 4

lw \$s1, 0(\$t0)

add \$s2, \$s2, \$s1

bne \$s0, \$t1, Loop.

For b.

addi \$t<sub>1</sub>, \$50, 400

Loop:

lw \$51, 0(\$50)

lw \$t<sub>2</sub>, 4(\$50)

add \$52, \$52, \$51

add \$52, \$52, \$t<sub>1</sub>

addi \$50, \$50, 8

bne \$50, \$t<sub>1</sub>, Loop.

Exercise 2.27

2.27.1.

For a. add \$50, \$51, \$52

For b. b29 \$t<sub>0</sub>, \$t<sub>1</sub>, label

The address of Label is calculated relative  
to the current PC

2.27.2.

For a. R-type

For b. I-type

2.27.3.

Benefits: Reduced Code Size

Efficiency in Branching

Position-Independent Code

Drawbacks: Limited Range

Potential for Errors

Loop\_start:

lw \$t0, 0(\$s0)  
addi \$s0, \$s0, 4  
bne \$t0, \$zero, Loop\_start

Benefit: The bne instruction uses PC-relative addressing mode to loop back efficiently

Drawback: The range is limited, can only branch within  $\pm 2^{15} - 1$  instruction words from the current PC

2.27.4.

For a. ① 0x3C100064

② 0x36100028

For b:

① 0x20080000

② 0x3D090400

2.27.5.

addi \$s0, \$zero, 0

ori \$s0, \$s0, part\_of\_data

2.27.6-

Typically require 2-3 extra instructions