

# Computer Organisation:

FECHA / /

clock cycle = Instruction count × cycle per instruction?

$$\text{CPU Time} = N \times \text{CPI} \times T / f$$

Q. A - cycle time = 250 ps CPI = 2.0

B - = 500 ps CPI = 1.2

$\rightarrow \text{CPU A} = f \times 2.0 \times 250$

$\text{CPU B} = 600 / f \times 1.2 \times 600$

$\therefore \frac{\text{CPU}_B}{\text{CPU}_A} = \frac{600}{500} = 1.2$  --- A is 1.2 times faster than B.

CPI -

- If diff. instruction classes take diff. numbers of cycle.

$$\text{clock cycle} = \sum_{i=1}^n (\text{CPI}_i \times \text{Instruction count}_i)$$

$$\bullet \text{CPI} = \frac{\text{clock cycle}}{\text{Instruction Count}} = \sum_{i=1}^n \left( \frac{\text{CPI}_i \times \text{Instruction count}_i}{\text{Instruction count}} \right)$$

Relative frequency

Q Alternative compiled code sequences using Instruct<sup>n</sup> in class A, B, C

|               |   |   |   |  |
|---------------|---|---|---|--|
| 1.02          | A | B | C |  |
| CPI for class | 1 | 2 | 3 |  |
| IC in seq 1   | 2 | 1 | 2 |  |
| IC in seq2    | 4 | 1 | 1 |  |

$$\textcircled{1} \text{ Seq 1 } IC = 5$$

$$\text{seq 2 } IC = 6$$

- clock cycles

$$2 \times 1 + 2 \times 1 + 3 \times 2 = 10$$

- Avg CPI = 2.0

- clk cycle

$$4 \times 1 + 2 \times 1 + 3 \times 1 =$$

\textcircled{1} A 400 MHz processor was used to execute a benchmark program with following Instruct<sup>n</sup> mix & clock cycle counts:

| Instruct <sup>n</sup> type | count   | clk cycle count |
|----------------------------|---------|-----------------|
| Int arithmetic             | 450,000 | 1               |
| Data Transfer              | 320,000 | 2               |
| Floating                   | 150,000 | 2               |
| Control Transfer           | 80,000  | 2               |

CPI, MIPS rate & execution time?

$$\text{Avg CPI} = 2.24$$

$$\textcircled{2} \text{ Execution Time: } N \times CPI \times T \\ = N \times (IC) \times CPI$$

$$= \frac{400 \times 10^6}{2.24} = 3.87 \text{ ms}$$

| Instruct <sup>n</sup> type | CPI | Instruct <sup>n</sup> mix |
|----------------------------|-----|---------------------------|
| Arithmetic                 | 1   | 60%                       |
| store with cache hit       | 2   | 18%                       |
| Branch                     | 4   | 12%                       |
| Memory reg                 | 8   | 10%                       |

$$CPI = \left[ \begin{array}{l} \text{Avg CPI} = 2.24 \\ (1 \times 0.6 + 2 \times 0.18 + 4 \times 0.12 + 8 \times 0) \end{array} \right]$$

$$MIPS = \frac{400 \times 10^6}{2.24 \times 10^6} = 175.57 \text{ MIPS}$$

MIPS rate:

$$MIPS = \frac{IC}{T \times 10^6} = \frac{f}{CPI \times 10^6} = \frac{f \times IC}{C \times 10^6}$$

$$\frac{400 \times 10^6}{1.55 \times 10^6} = \frac{400000}{155} = 258 \text{ million inst per second (MIPS)}$$

1.6

Q The execution time of 4 prog on Three comp  
Assume 10<sup>9</sup> instrucn were exc

| prog. | exec time (sec) |   |      |     | # |
|-------|-----------------|---|------|-----|---|
|       | A               | B | B    | C   |   |
| 1     | 1               |   | 10   | 20  |   |
| 2     | 1000            |   | 100  | 80  |   |
| 3     | 500             |   | 1000 | 50  |   |
| 4     | 100             |   | 800  | 100 |   |

→ Calculate MIPS of each & put on graph

Amdahl's law - Fixed load Model: defines

$$\text{Execution time} = \text{Execution time}_\text{old} \left[ (1 - \text{frac}^{\text{enhanced}}) \frac{f}{s} \right]$$

$$\text{frac}^{\text{enhance}} \leq 1 \quad (f)$$

$$\text{Speedup}_{\text{enhance}} > 1 \quad (s)$$

$$s = \frac{1}{(1-f) + 1/s}$$

Q.1

$$\text{fractionenhance} = 0.4 \quad \text{speedupenhance} = 10$$

$$\text{Speedupoverall} = \frac{1}{0.6 \times 0.4} \approx 1.56$$

Q.2 → for FP case

$$S = 1.6$$

$$f = 0.5$$

for FPSQR

$$f = 0.2$$

$$S = 10$$

$$\therefore \text{Overall speedup} = \frac{1}{(1-f) + f/S}$$

$$= \frac{1}{(1-0.5) + 0.5/1.6} \quad \text{--- for PP}$$

$$= 1.23$$

$$\text{Same for FPSQR} = 1.22$$

Overall speedup of PP is greater hence choose PP.

Q.3 → FP op = 25%

$$\text{Avg CPI} = 4.0$$

$$\text{CPI of other} = 1.33$$

$$\text{freq of FPSQR} = 2\%$$

$$\text{CPI of FPSQR} = 2.0$$

Solve by:

$$EI = N * (\text{CPI} * I) \quad \left\{ \begin{array}{l} \text{processor} \\ \text{performance} \\ \text{evaluation} \end{array} \right.$$

$$= \frac{N * \text{CPI}}{f}$$

→ CPI with new FPSQR = CPI original - 2% (CPI of new FPSQR)

$$= \text{CPI of new FPSQR only}$$

$$= 2.0 \times 2\% (2.0 - 2) = 1.64$$

we can compute CPI of enhancement of all FP instructions the same way or by summing the FP and

$$\text{Speedup new FP} = \frac{\text{CPU Time Original}}{\text{CPU Time new FP}}$$

$IC \times \text{Clockcycles} \times CPI_{\text{Old}}$

$IC \times \text{Clockcycles} \times CPI_{\text{New}}$

Q.4 I don't understand what's going on.

$$\textcircled{a} \quad -1.25 \quad \textcircled{b} \rightarrow \frac{1}{1-0.99+0.99} = 1.98$$

② speedup

(c) 4.19

⑧ 1.11

9

- a) when parallelizing an app, ideal speedup is speeding up by the number of processor. This is limited (why I am not understanding this)

## Q.7 Speedup from pipelining

$$= \frac{4.4 \text{ ns}}{1.2 \text{ ns}} = 3.7 \text{ times}$$

Amdahl's law:

$$S_n = \frac{n}{1 + (n-1)\alpha}$$





Glustafson's law



Scaled workload

fixed exec time



speedup with fixed load

$$S_n = \frac{w_1 + w_n}{w_1 + w_n/n} = \frac{w_1 + G(n)w_n}{w_1 + G(n)w_n/n}$$

Amdahl & Gustafson are special case of fixmem  
(memory bounded speedup)

Instructions

3 add

2 add

1 --

0 -,-

$$Z = k + b * c$$



d-dest

s-source



① OP  $\xrightarrow{st}$  - add contents with Acc  
store in Acc.

② OP  $\xrightarrow{st}$

(2)

(1)

FECHA

5/Aug

3 Address :  $A + B * C$ 

2 Address :

MPY Z, B, C

ADD Z, Z, K

2 Address :

MPY B, C  $\left\{ \begin{array}{l} M R_1, C \\ MUL R_1, C \end{array} \right.$ ADD B, B  $\left\{ \begin{array}{l} MUL R_1, B \\ ADD R_1, B \end{array} \right.$ MOV Z, B  $\left\{ \begin{array}{l} ADD R_1, K \\ MOV Z, R_1 \end{array} \right.$ 

1 Add :

(2 Instruction)

(4)

1 Address :

LOAD C

MUL B

ADD K

STORE Y

(4)

Postfix code  $Z = K + B * C = KBC * +$ 

0 Add - PUSH K

PUSH B

PUSH C

MUL

ADD

POP

(6)



$$Y = \frac{A - B}{C + (D * E)}$$

3-Address :

SUB Y, A

MPY T, D, E

ADD T, T, C

DIV Y, Y, T

MPY T, E

ADD T, C

DIV Y, T

2 Add

MOV Y, A

SUB Y, B

MPY E

ADD C

STORE Y

LOAD A

SUB B

DIV Y

STORE Y

1 Add

LOAD D

AC E

ADD

C

STORE Y

Y ← AC

LOAD A

SUB B

DIV Y

STORE Y

$$(A - B) / C + (D * E)$$

 $AB - C / DE * +$  is the postfix

PUSH A

PUSH B

SUB

PUSH C

DIV

PUSH D , PUSH E

MUL

ADD

POP

10 Instructions

FECHA \_\_\_\_\_

$$z = \frac{((P - (Q+S) * T)}{(U / (V-W))}$$

3 Add →

SUB R1, V, W

DIV R1, U, R1

ADD R2, Q, S

MUL R2, R2, T

SUB R2, P, R2

DIV Z, R1, R2

2 Add

LDI MOV R1, W

SUB R1, V

DIV R1, U

MOV R2, S

ADD R2, CL

MUL R2, T

SUB R2, P

DIV R2, R1

MOV RZ, R2

1 Address

LDW W

SUB V

DIV U

STOR U

(6)

SIR

MOV R0, S

ADD R0, Q

MOV R1, P

SUB R1, R0

MUL R1, T

MOV R0, V

SUB R0, W

MOV R2, U

DIV R2, R0

DIV R1, R2

MOV Z, R1

1 Add -  $\equiv 15$ 

LD @

ADD S

STORE T

LD P

S0B T

MUL T

LOD V

SUB W

STORE TI

LOD U

DIV TI

STORE TI

LOD T

DIV TI

STORE Za.

o Add

FECHA \_\_\_\_\_

PCOS + T ← UVW - 11

PUSH P

@

S

ADD

• Fixed length, variable length instruction.

R7

FECHA 8 4 2 1

OP operand

1 2 3

16 bits &amp; 16 R

then no number of type 3 add instruction?

30 type 2

28 type 1

64 type 0

type 3:

each operand field 3 bits:

 $\therefore 4 \times 3 = 12$  bits for operands.

0000 01 02 03

0000 01 02 03

0000 01 02 03

0000 01 02 03

0000 01 02 03

0000 01 02 03

0000 01 02 03

0000 01 02 03

0000 01 02 03

0000 01 02 03

Type 2: 2 operands  $\therefore 2 \times 4 = 8$  remaining 8 op code

1110 0000

1111 1101

Type 1 1 operand  $\therefore 1 \times 4 = 4$  remaining 12 op code.

0000

111111000000

1

(27)

16  
15  
14  
13  
12

1 2 3

Type 0: All 16 bits represent op

1111111111000000

1111111111111111

By solving these all to get total no. of add

Type 3  $- 14 \times 2^{12}$ + 80  $\times 2^8$ + 28  $\times 2^4$ 

+ 64

1848  $\times$ 655  $\equiv 2^{16}$ 

Example 1:

Consider a machine with 16 bit instruction &amp; 16 reg. Instuct format:

opcode + Mem address

→ If 1KB byte addressable mem we need 12 bit to specify add locatn

• Remaining for 4 bits opcode.



OP Add

opcode + Reg Add

need 4 bits to select one of the 16 available reg.

Suppose we have 4 bit op encode 16 diff ins w/ 3 operand  $3 \times 4 = 12$  bits

Ex. 2 Consider machine with 16 bit instruction & 16 registers. And wish to encode the following instruction.

- 15 instructn with 3 Add

14            2

31            1

16            0

Can we encode this instructn set in 16 bits?

→ Yes, if we can use expanding opcodes.

⇒ 0000 ----- } 3  
14 - 1110

escape code - 1111 0000 R<sub>1</sub> R<sub>2</sub> } 2  
1111 1101 ⑬ R<sub>1</sub> R<sub>2</sub>

7 masked 1111 1110 0000 } 1  
1111 1110-⑬

1111 1111 1111 0000 } 0  
-11-            1111 } 1

15 instl

$$+ 15 \times 2^4 \times 2^4 \times 2^4 = 15 \times 2^{12} = 61440$$

14 ins

$$14 \times 2^8 = 3584$$

31 ins

$$31 \times 2^4 = 496$$

16 inste

$$16 \times 2^0 = 16$$

$$65536 \equiv 2^{16}$$

we have exact match with no wastage Hence possible.

Ex. 3 Is it possible to design an expanding opcode to allow the following to be encoded with 12 bit? Assume Reg operand required 3 bits

4 inste with 3 seg 0000 R<sub>1</sub> R<sub>2</sub> R<sub>3</sub>  
255 inste with 1 seg 0000 -

16 inste with 0 seg

$$\rightarrow 21 \times 2^9 + 255 \times 2^3 + 16 = 14104 \quad \text{not po}$$

instruction length 12 bit  $\therefore 2^{12} = 4096$

000 R<sub>1</sub> R<sub>2</sub> R<sub>3</sub> || 011 111 111 --- } TYPE 1

1111 1111  
1111 1111 1111

82  
16  
68  
36  
4

32 16 8 4 2!

FECHA

Ans 64

Q4 Assume CPU having 16 Reg & instead of length is 16 bits. It should be having 14 of type's add instruction.

81 Type 2

12 Type 1

Calculate almost how many no. of Type 0 instruction should have. Show proper encoding.

Type 3: 0000 R1 R2 R3  
1101 R1 R2 R3

Type 2: 1110 0000 R1 R2  
1111 1110

Type 1: 1111 1110 0000 R1  
1111 1111 1011 R1

Type 0: 1111 1111 1100 0000  
1111 1111 1111 1000

$$81 \times 2^{12} + 31 \times 2^8 + 12 \times 2^{14} + 2^0 = 2^{16}$$

williamster

FECHA

### • Control unit functional Requirements

Control unit → generate control signal  
It performs basic task:

- ① sequence of micro-operations
- ② Executn

496 chapter 4 chapter 21 chapter:



control signals

External

ALU

• datapath

• Control path

Q when stored unstored is ALU involved?

chapter 20 716

Q Who generate Address?  
→ Processor

RISK P

NSC pattern book



Setting of control signal depends on.

- Contents of the step counter
- Contents of instruction register
- The result of a computation or comparison operation
- External input signals, such as interrupt requests.

Counter-enable



*Generator of control signals*

Datapath control signal:

$$RF\_write = TS \cdot (ALU + \text{load} + \text{call})$$

CISC cycle processor:

organisation of cisc-cycle processor.