

## Computer Organization

Page No. \_\_\_\_\_  
Date 23/07

### CPI Example :-

Computer A: Cycle Time = 250ps, CPI = 2.0

Computer B: Cycle Time = 500ps, CPI = 1.2

Same ISA.

Which is faster & how many much?

$$\begin{aligned} \text{CPU Time}_A &= \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A \\ &= 1 \times 2.0 \end{aligned}$$

$$\begin{aligned} \text{CPU Time}_B &= \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B \\ &= 1 \times 1.2 \times 500 \text{ ps} = 1 \times 600 \text{ ps} \end{aligned}$$

$$\frac{\text{CPU Time}_A}{\text{CPU Time}_B} = \frac{1 \times 600 \text{ ps}}{1 \times 500 \text{ ps}} = 1.2$$

o Clock Cycle =  $\sum_{i=1}^n (\text{CPI}_i \times \text{Instruction Count}_i)$  [If different instruction classes take different numbers of cycles]

### o Weighted Average CPI

$$\text{CPI} = \frac{\text{Clock Cycle}}{\text{Instruction Count}} = \sum_{i=1}^n \left( \frac{\text{CPI}_i \times \text{Instruction Count}_i}{\text{Instruction Count}} \right) \text{relative frequency}$$

o Alternative compiled code sequence using instructions in classes A, B, C.

| class                    | A | B | C |
|--------------------------|---|---|---|
| CPI for class            | 1 | 2 | 3 |
| TC in seq <sup>n</sup> 1 | 2 | 1 | 2 |
| TC in seq <sup>n</sup> 2 | 4 | 1 | 3 |

Sequence 1 :  $T_C = 5$

$$\begin{aligned} \text{clock cycle} \\ = 2 \times 1 + 1 \times 2 + 2 \times 3 \\ = 10 \end{aligned}$$

Avg. CPI =  $10/5 = 2.0$

Sequence 2 :  $T_C = 6$

$$\begin{aligned} \text{clock cycle} \\ = 4 \times 1 + 1 \times 2 + 1 \times 3 \\ = 9 \end{aligned}$$

Avg. CPI =  $9/6 = 1.5$

Problem 1: A 400-MHz processor was used to execute a benchmark program with the following instruction mix & clock cycle count.

Determine CPI, MIPS rate and execution time for program.

| Instruction type   | Instn count | Instn clock cycle count |
|--------------------|-------------|-------------------------|
| Integer arithmetic | 650000      | 1                       |
| Data transfer      | 320000      | 2                       |
| Floating point     | 150000      | 2                       |
| Control transfer   | 80000       | 2                       |

$$= 450000 \times 1 + 320000 \times 2 + 150000 \times 2 + 80000 \times 2$$

~~+ 1000000~~

$$= 450000 + 640000 + 300000 + 160000$$

~~+ 1000000~~

$$= 1550000$$

~~+ 7000000~~

Avg. CPI = 1.55

$PT = N \times CPI \times T$

$$= N \times CPI$$

f

$$= \frac{1000000 \times 1.55}{400 \times 10^6} = 0.81$$

MIPS rate =  $\frac{400 \times 10^6}{1.55 \times 10^6} = \frac{400}{1.55} = 258 \text{ MIPS.}$

| Instructn Type                   | CPI | Instructn mix |
|----------------------------------|-----|---------------|
| Arithmetic & logic               | 1   | 60%           |
| Load / store with cache hit      | 2   | 18%           |
| Branch                           | 4   | 12%           |
| Memory reference with cache miss | 8   | 10%           |

a) calculate average CPI

b) calculate the corresponding MIPS

$$\rightarrow a) = 1 \times 0.6 + 2 \times 0.18 + 4 \times 0.12 + 8 \times 0.1$$

~~18~~ 100

$$= 2.24$$

$$b) \text{ MIPS rate} = \frac{400 \times 10^6}{\frac{400 \times 10^6}{2.24 \times 10^6}} = \frac{400}{2.24} = \underline{\underline{178.57 \text{ MIPS}}}$$

eg. 3] The execution times of four programs on three computers are given below:

| Program | Execution Time (in seconds) |            |            |
|---------|-----------------------------|------------|------------|
|         | Computer A                  | Computer B | Computer C |
| prog 1  | 1                           | 10         | 20         |
| prog 2  | 100                         | 100        | 20         |
| prog 3  | 500                         | 1000       | 50         |
| prog 4  | 100                         | 800        | 100        |

$$\text{MIPS Rate} = \frac{Tc}{T \times 10^6} = f = \frac{f \times Tc}{C \times 10^6}$$

Amelahi's law

$$\text{Execution Time new} = \text{Execution Time old} \times \left[ \frac{(1 - \text{fraction enhanced}) + \text{fraction taken}}{\text{Speedup enhanced}} \right]$$

$$\text{Speedup overall} = \frac{\text{Execution time old}}{\text{Execution time new}} = \frac{1}{(1 - \text{fraction enhanced}) + \frac{\text{fraction enhanced}}{\text{Speedup enhanced}}}$$

$$S_o = \frac{1}{\left( (1-f) + \frac{f}{S} \right)} \quad \dots \text{short form}$$

Amelahi's law

Speed up = Performance for entire task using the enhancement when possible  
Performance for entire task using

Speed up =  
Execution time for entire task using the enhancement when possible

$$f < 1$$

$$S > 1$$

1. The fraction of the computational time in the original computer that can be converted to take advantage of the enhancement.

2. The improvement gained by the enhanced execution mode, that is how much faster the task would run if enhanced mode were for entire program.

eg. 1) Fraction enhanced = 0.4, speedup enhanced = 10,  
speedup overall =  $\frac{1}{0.6 + 0.4} = \frac{1}{1.0} \approx 1.56$

2)  $f = 0.5 \quad S = 1.6$   
 $\frac{1}{(1-0.5) + 0.5} = \frac{1}{0.5 + 0.3125} = \frac{1}{0.8125} = 1.23$

$S = 10 \quad f = 0.2$   
 $\frac{1}{(1-0.2) + 0.2} = \frac{1}{0.8 + 0.02} = \frac{1}{0.82} = 1.22$

3) Frequency of FP operation = 25%

Average CPI of FP operation = 4.0

Average CPI of other instruction = 1.33

Frequency of FPSQR = 2%

CPI of FPSQR = 20.

ET = N × CPI × T

Average CPI of all FP operations is 2.5

Decrease CPI of FPSQR to 2

Solve By processor performance equation

$$= \frac{N \times CPT}{f}$$

CPT with new FPSAR = CPT original - 24% of CPT oldFPSAR -  
CPT of new FPSAR only  
 $= 2.0 - 24\% \times (2.0 - 2) = 1.64$

$$\text{CPT new FP} = (75\% \times 1.33) + (25\% \times 2.5) = 1.625$$

$$\text{Speedup new FP} = \frac{\text{CPU time original}}{\text{CPU Time new FP}}$$

$$= \frac{TC \times \text{clock cycle} \times \text{CPT original}}{TC \times \text{clock cycle} \times \text{CPT new FP}}$$

$$= \frac{\text{CPT original}}{\text{CPT new FP}}$$

$$= \frac{2.0}{1.625} = 1.23$$

a) Speedup portion is 2 ;  $f = 40\% \approx 0.4$  ;  $f = 99\% \approx 0.99$   
 $\checkmark$  Speedup  $\frac{1}{0.6+0.4} = \frac{1}{1.0} = \frac{1}{0.6+0.2} = \frac{1}{0.8} = 1.25$

b) Speedup  $\frac{1}{1.099+0.99} = \frac{1}{2.098} = \frac{1}{0.01+0.495} = \frac{1}{0.505} = 1.98$

c) 40% of first application

$$\frac{1}{0.2+0.8 \times 0.6+0.8 \times \frac{0.4}{2}} = \frac{1}{0.2+0.48+0.16} = \frac{1}{0.84} = 1.19$$

d) 90% second application

$$\frac{1}{0.8+0.2 \times 0.1+0.2 \times 0.99} = \frac{1}{(0.8+0.002+0.099)} = \frac{1}{0.901} = 1.11093$$

i. a) What is the speedup with N processor if 80% of the application is parallelizable, ignoring the cost of communication?

$$F = 0.8 \quad S = N \quad \frac{1}{(1-0.8)+0.8} = \frac{1}{0.2+0.8} = \frac{S}{(1-f)+f} = \frac{1}{N}$$

b)  $\frac{1}{(1-0.8)+0.8} = \frac{1}{0.2+0.8} = \frac{1}{N}$

b)  $\frac{1}{0.2+8 \times 0.005+0.8} = \frac{1}{8} = \frac{1}{0.2+0.04+0.8} = \frac{1}{8} = \frac{1}{0.34} = 2.941$

c)  $\frac{1}{0.2+3 \times 0.005+0.8} = \frac{1}{8} = 0.125$

d)  $\frac{1}{0.2+100N \times 0.005+0.8} = \frac{1}{N}$

d)  $\frac{1}{4N \left( \frac{1}{1-p} + \log N \times 0.005 + \frac{p}{N} \right)} = 0$

$$S = \frac{F_{Tup}}{E_{Tp}}$$

7) Average instruction execution time = clock cycle  $\times$  Average CPT  
 $= 1\text{ ns} \times [(40\% + 20\%) \times 4 + 40\% \times 5]$   
 $= 1\text{ ns} \times 4.4 = 4.4\text{ ns}$

Speedup from pipelining = Average instruction execution time of up pipeline  
 $= \frac{4.4\text{ ns}}{1.2\text{ ns}} = 3.7$  times

10) a) what is the clock cycle time of the 5-stage pipelined machine?

$$\rightarrow 7 \times 1\text{ ns} = 2\text{ ns} + 0.1 = 2.1\text{ ns}$$

b) stall every 4 instruction, what is the CPT of the new machine?

$$\rightarrow 5 \text{ cycles / stages} = 1.25 \text{ instruction}$$

c) Speedup of pipelined machine over single cycle machine?

$$\rightarrow T \times CPT + \text{longirod}$$

$T \times CPT + \text{new}$

$$ET = T \times CPT \times \text{cycle time}$$

$$\text{Speedup} = \frac{(T \times 1 \times 7)}{(T \times 1.25 \times 2.1)} = 2.67$$

d)  $T \times 1 \times 7 / 1 \times 0.1 = 70$  Ignoring extra stall cycles it would be:

Kai Hawang Pg 135  
 is based on a fixed workload or a fixed problem size.  
 Amdahl's law :- Pg 0 13

$$S_n = n$$

$$W_i = \alpha \quad i = 1 \dots n$$

$$W_i = 0 \quad i = n+1 \dots k$$

$$i \neq n$$

a) fixed workload



b) decreasing execution-time



c) Speedup with fixed load



sequential diagram fraction of program

Kai Hawang Pg 136

• Fixed-time second model and Gustafson's law

Execution Time increase with machine size



a) scaled workload

b) Fixed execution time

| Sn | 1024x | 1014x | 1004x | 993x | 983x |
|----|-------|-------|-------|------|------|
| 0% | 1024x | 1014x | 1004x | 993x | 983x |
| 1% | 1024x | 1014x | 1004x | 993x | 983x |

$S^* = 1024 - 1023x$

c) Speedup with fixed load

pg. 140 kai book

$$S^* = \frac{W_1 + W^* n}{W_1 + W^* n} = \frac{W_1 + G(n) W_0}{W_1 + G(n) W_0} \quad (3.33)$$

Special cases of the fixed memory mode are Amdahl's & Gustafson's law.

CPU Time = Instruction  $\times$  clock cycles  $\times$  Seconds  
 Program Instructions Clock cycle

Read : 1st chapter of William 9th / 10th / 11th

Q Intel is variable length or fixed length?

| 1st address                 |              | Op → Operation                      | Eg                      |
|-----------------------------|--------------|-------------------------------------|-------------------------|
| 3) Op                       | $s \uparrow$ |                                     | $Z = K + R * C$         |
| Op                          | $d$          | 3rd address                         | $100 \rightarrow Z$     |
|                             | $d$          | dest <sup>n</sup>                   | $\rightarrow MPY R, C$  |
|                             |              | operand                             | $ADD R, Z$              |
| 2) Op                       | $d$          | 2nd add <sup>n</sup>                | $MPY R, R_0, R_1$       |
|                             | $d$          | $s \uparrow$                        | $ADD Z, R_0, R_1$       |
|                             |              | zero                                |                         |
|                             |              | Op↑                                 |                         |
| a) One address instructions |              |                                     |                         |
| Load                        | $D$          | Comment : $\rightarrow$ Accumulator | LOAD C                  |
| MPY E                       |              | AC $\leftarrow$ AC $\times E$       | MUL B                   |
| ADD C                       |              | AC $\leftarrow$ AC + C              | ADD K                   |
| STOR Y                      |              | Y $\leftarrow$ AC                   | STORE Z                 |
| LOAD A                      |              | AC $\leftarrow$ A                   |                         |
| SUB B                       |              | AC $\leftarrow$ AC - B              | K B, C $\rightarrow$ MA |
| DIV Y                       |              | AC $\leftarrow$ AC $\div Y$         | PUSH K                  |
| STOR Y                      |              | Y $\leftarrow$ AC                   | PUSH B                  |
|                             |              |                                     | PUSH C                  |
| b) Two address instruction  |              |                                     | 0 address               |
| MOV Y, A                    |              | Y $\leftarrow$ A                    | MOV                     |
| SUB Y, B                    |              | Y $\leftarrow$ Y - B                |                         |
| MOVE T, D                   |              | T $\leftarrow$ D                    | POP                     |
| MPY T, E                    |              | T $\leftarrow$ T $\times E$         |                         |
| ADD T, C                    |              | T $\leftarrow$ T + C                |                         |
| DIV Y, T                    |              | Y $\leftarrow$ Y $\div T$           |                         |

Three address

|             |                      |
|-------------|----------------------|
| SUB V, A, B | Comment<br>Y ← A - B |
| MUL T, D, E | T ← D × E            |
| ADD T, T, C | T ← T + C            |
| DIV Y, Y, T | Y ← Y ÷ T            |

program to Execute  $Y = A - B$   
 $A = C \times D + E$        $C + D \times E$

Zero address

Push A

Push B

SUB

DIV

Push C

ADD

Push D

Push E

MUL

Pop Y

$$EQ \quad z = (((p - (q + s)) * t) / (u(v - w)))$$

Three address

|             |           |
|-------------|-----------|
| ADD Z, Q, S | Z = Q + S |
| MUL Z, Z, T | Z = Z × T |
| SUB Z, P, Z | Z = P - Z |

|             |                   |
|-------------|-------------------|
| SUB T, V, W | T = V - W         |
| DIV T, U, T | T = U ÷ T         |
| MUL Z, Z, T | Z = $\frac{Z}{T}$ |

|            |  |
|------------|--|
| MVN R0, V  |  |
| SUB R0, W  |  |
| MVN R1, D  |  |
| DIV R1, R2 |  |
| MVN Z, R1  |  |

Two address

|          |       |
|----------|-------|
| MVN Z, Q | Z ← P |
|----------|-------|

|          |           |
|----------|-----------|
| SUB Z, S | Z ← Z - B |
|----------|-----------|

|          |  |
|----------|--|
| MVN R, P |  |
|----------|--|

|           |  |
|-----------|--|
| SUB R, R0 |  |
|-----------|--|

|          |  |
|----------|--|
| MUL R, T |  |
|----------|--|

|          |  |
|----------|--|
| MVN R, V |  |
|----------|--|

|          |  |
|----------|--|
| SUB R, W |  |
|----------|--|

|           |  |
|-----------|--|
| MVN R2, D |  |
|-----------|--|

|            |  |
|------------|--|
| DIV R1, R2 |  |
|------------|--|

|           |  |
|-----------|--|
| MVN Z, R1 |  |
|-----------|--|

One address

|            |             |
|------------|-------------|
| LOAD A     | STORE Temp1 |
| ADD S      | LOAD U      |
| MUL T      | DIV Temp1   |
| ST Temp    | STORE Temp1 |
| LD Temp    | LOAD Temp   |
| SBT        | STORE Temp  |
| SUB Temp   | LOAD Temp   |
| STORE Temp | DIV Temp1   |
| LOAD V     | STORE Z     |
| CBW W      |             |

Op                          Operands

16 bit R 4 16 bits  
 141 of type 3 add inst ✓       $14 \times 3$  12 bit for operands  
 20 of type 2 -1-      4 bits for opcode  
 28 of type 1 -1-      escape code  
 64 of type 0 -1-       $\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{array}$   
 $0 \rightarrow 63 \rightarrow 64$

0000    01    02    03  
 !        !        !        !  
 1101    - - - - -  
 escape code      4 bits opcode  
 1110 0000      4 bits operand      4 bits opcode  
 :        :        :        :  
 1111 1101      4 bits operand      12 bits operand  
 000 → 27      12 bits      5 good  
 left hand 7  
 remaining 5 zero's

By solving this all we can add to get total number of address

$$\text{Type } 0 = 14 \times 2^8 + 30 \times 2^4 + 28 \times 2^4 + 64 = 65536 = 2^{16}$$

- Q3) Consider a machine with 16 bit instructions and 16 register.
- The instruction format can have several structures:
  - opcode + memory address (such as MARIE).
  - If we have 4KB type addressable memory we need 12 bits to specify an address location.
  - The remaining 4 bits are used for the opcode : 16 instruction are hence available.

opcode      Address

- Opcode + Register Addresses

- We need 4 bits to select one of the 16 bit available register.
- Suppose we have 4 bits

opcode    add1    add2    add3

- Q3) Consider a machine with 16bit instructions 16 register. And we wish to encode the following instruction:
- 16 bit with 3 address
  - 14 with 2 regd
  - 31 with 1 regd
  - 16 0 regd
- Can we encode this instruction set in 16 bits?
- Yes, if we use expanding encoder.

|                     |             |                                 |
|---------------------|-------------|---------------------------------|
| 0000 R1 R2 R3       | $\} 15 - 3$ | $15 \times 2^8 + 14 \times 2^8$ |
| 1110 R1 R2 R3       | $\} 15 - 3$ | $15 \times 2^8 + 14 \times 2^8$ |
| 1111 0000 R1 R2     | $\} 14 - 2$ | $14 \times 2^8 = 3584$ bit      |
| 1111 1101 R1 R2     | $\} 14 - 2$ | $14 \times 2^8 = 3584$ bit      |
| 1111 1110 0000 R1   | $\} 31 - 1$ | $31 \times 2^4 = 496$ bit       |
| 1111 1111 1110 R1   | $\} 31 - 1$ | $31 \times 2^4 = 496$ bit       |
| 1111 1111 1111 0000 | $\} 16 - 0$ | $16 \times 2^{16} = 65536$      |
| 1111 1111 1111 1111 | $\} 16 - 0$ | $16 \times 2^{16} = 65536$      |

- Q3) Is it possible to design an expanding encoder to allow the following to encode with a 12bit instruction? Assume a register operand requires 3 bits.

$$\begin{array}{ll} 4 \text{ inst} & 3 \text{ regis} \\ 255 \text{ inst} & 1 \text{ reg} \\ 16 \text{ inst} & 0 \text{ reg} \end{array}$$

$$4 \times 2^8 + 255 \times 2^8 + 16 = 81880$$

$$4104 \dots$$

- Q4) Assume the CPU having 16 registers and instruction length is 16 bits. It should be having 14 of type -3

$$\begin{array}{ll} 31 & 2 \\ 12 & 1 \\ 1 & \dots \end{array}$$

Calculate at the most how many number of type 0 instruction it should have. Show appropriate encoding → 64

$$\begin{array}{ll} 0000 R1 R2 R3 & \} 14 \\ 1101 R1 R2 R3 & \} 14 \\ 1110 0000 R1 R2 & \} 31 \\ 1111 1110 R1 R2 & \} 31 \\ 1111 1111 1001 R1 & \} 12 \\ 1111 1111 1111 R1 & \} 12 \end{array}$$

$$14 \times 2^{12} + 31 \times 2^8 + 12 \times 2^4 + 64 = 65536$$

$$\begin{array}{ll} 1110 0000 R1 R2 & \} 31 \\ 1111 1110 R1 R2 & \} 31 \\ 1111 1111 1001 R1 & \} 12 \\ 1111 1111 1111 R1 & \} 12 \end{array}$$

$$1111 1111 1111 0000 \} 64$$

① 14 instruction 3 register type 3

$$0-18 \quad 0000 \quad R_1 \quad R_2 \quad R_3 \quad | \quad 2^4 \times 2^4 \times 2^4 = 2^{12} \times 14$$

0-30 31 instruction 2 register 2 type

$$\begin{array}{l} 1110 \quad 0000 \quad R_1 \quad R_2 \quad | \quad 2^4 \times 2^4 = 2^8 \times 31 \\ 1111 \quad 1111 \quad R_1 \quad R_2 \end{array}$$

0-11 12 instruction 1 register type 1

$$\begin{array}{l} 1111 \quad 1111 \quad 0000 \quad R_1 \quad | \quad 2^4 \times 12 \\ 1111 \quad 1111 \quad 1100 \quad R_1 \end{array}$$

EU instruction 0 register type 0

$$\begin{array}{l} 1111 \quad 1111 \quad 1111 \quad 0000 \quad | \quad 64 \times 2^0 \\ 1111 \quad 1111 \quad 1111 \quad 1111 \end{array}$$

② 14 inst<sup>r</sup> type 3 16 bits 4 bytes

30 inst<sup>r</sup> type 2

28 inst<sup>r</sup> type 1

64 inst<sup>r</sup> type 0

$$0-18 \quad 0000 \quad R_1 \quad R_2 \quad R_3 \quad | \quad 14 \quad 3 \quad 14 \times 2^{12} + 30 \times 2^8 + 28 \times 2^4 + 64 \\ 1100 \quad R_1 \quad R_2 \quad R_3 \quad | \quad = 65536$$

$$0-29 \quad 1111 \quad 0000 \quad R_1 \quad R_2 \quad | \quad 30 \quad 2 \\ 1111 \quad 1101 \quad R_1 \quad R_2$$

$$0-37 \quad 1111 \quad 1110 \quad 0000 \quad R_1 \quad | \quad 29 \quad 1 \\ 1111 \quad 1111 \quad 1011 \quad R_1 \\ 1111 \quad 1111 \quad 1111 \quad 0000 \quad | \quad 840$$

12 bit 3 bytes

$$\begin{array}{r} 4 \quad 3 \\ 255 \quad 1 \\ 16 \quad 0 \end{array}$$

$$\begin{array}{l} 0000 \quad R_1 \quad R_2 \quad R_3 \quad | \quad 4 \quad 3 \\ 1000 \quad R_1 \quad R_2 \quad R_3 \end{array}$$

$$\begin{array}{l} 100 \quad 000 \quad 000 \quad R_1 \quad | \quad 255 \quad 1 \\ 111 \quad 111 \quad 111 \quad R_1 \end{array}$$

$$\begin{array}{l} 111 \quad 110 \quad 000 \quad | \quad 16 \quad 0 \\ 111 \quad 111 \quad 111 \end{array}$$

Vernon's Pathodon - 5<sup>th</sup> edition

Q.

a. 7 no. of inst<sup>r</sup> type that have only single memory address as it fields

|   |   |   |   |   |      |                |                |
|---|---|---|---|---|------|----------------|----------------|
| 1 | 1 | 1 | 1 | 1 | 0000 | R <sub>1</sub> | R <sub>2</sub> |
| 1 | 1 | 1 | 1 | 1 | 0010 | R <sub>1</sub> | R <sub>2</sub> |

b. 7 no. of inst<sup>r</sup> of type that have two register fields

|   |   |   |   |      |                |                |
|---|---|---|---|------|----------------|----------------|
| 1 | 1 | 1 | 1 | 0000 | R <sub>1</sub> | R <sub>2</sub> |
| 1 | 1 | 1 | 1 | 0110 | R <sub>1</sub> | R <sub>2</sub> |

c. 7 no. of inst<sup>r</sup> of type that have only one register field

|      |                |
|------|----------------|
| 0000 | R <sub>1</sub> |
| 0110 | -              |
| 1111 | 1111 0000      |
| 1111 | 1111 110       |

d. 8 no. of instructions of type that have zero address.

|                |
|----------------|
| 1111111111 000 |
| 1111111111 111 |

Q.

a. 6 number of inst<sup>r</sup> of type that have 1 reg field & 1 memo address

|       |        |
|-------|--------|
| 10000 | 11 101 |
|-------|--------|

b. 6 number of inst<sup>r</sup> of type that have all 3 reg field &

|          |                                              |
|----------|----------------------------------------------|
| 1111 000 | R <sub>1</sub> R <sub>2</sub> R <sub>3</sub> |
| 1111 101 | -                                            |

It is possible to immediate

Control Unit William Stroustrup chapter 20

Date 12 08 24

- There are 4 particular sub-cycles.
- 1) Instruction cycle
- 2) Indirect cycle
- 3) Execution cycle
- 4) Interrupt cycle

Microoperation - that are functional or atomic operation.  
micro refers to simple and complete small or little work of task.



Fetch cycle -  $T_f$  occurs at beginning of each Inst<sup>n</sup> cycle and cause an inst<sup>n</sup> to be fetched from memory.

- |                                   |                   |
|-----------------------------------|-------------------|
| 1) MAR - Memory Address Register. | Register          |
| 2) MBR - Memory Buffer Register.  | Involved in Fetch |
| 3) Program counter - PC           |                   |
| 4) IR - Instruction register      |                   |

MAR E (PC) must point to memory  
MBR E (memory) and IR E (MBR) not in some section

IR = Once the instr<sup>n</sup> is fetched it fetched source operands.



Control Unit Functional Requirements

12-08

- The control unit performs two basic tasks:
- 1) Sequencing micro operations.
- 2) Execution.

Figure - The CPU with System Bus



- ✓ Three steps to process to lead characteristic of control unit
- 1) Define basic elements of the processor
- 2) Describe micro-operations processor performs
- 3) Determine the function that are the control unit must perform

Wimmerer Zuenko & Zaky

William Stalling  
Control Unit Operation Chap 4  
Page No. 716 pg  
Date Hoboken

to cause the micro-operations to be performed.

• Data path

• Control path

• Micro-operations & control signals. (Table can be found on pg no 719 in textbook)

① Why the execution cycle is not there in table?

Who generates the memory address → processor

• Instruction Execution

chapter 4 - The processor - 3

1<sup>st</sup> and 2<sup>nd</sup> from William Stalling 20  
performance

17/03/24

• Bus Processing Unit

A five stages organization

Stage 1 Introduction  
fetch

Stage 2 Source  
register

Stage 3 ALU

4 Memory access

5 Destination register

- ① What is block size and who decide it?  
② What is latency of ram?



✓ Branch-if [R5] = [R6] LOOP Figure 5.16 Chapter 5 (zaky) no 19

Call - Register R9 5.17

• Waiting for Memory



① From where return address are coming?

5.19 figure 5.20 figure

sequencing 1. PC = PC + 4

2. Branch-target inter.

3. Subroutine



#### Hardwired Control

2 approach

1. Hardwired control

Hardwired

2. microprogrammed control

Setting control signal depends on -

- Content of step counter
- Contents of instruction register
- The result of computation or comparison operation
- External input signals, such as interrupt requests.

Fig 5.21  $T_1 - T_5 \rightarrow IF, ID, TF, M, WB$   
control signal sequencing  $\downarrow - \rightarrow$  execution

counter-enable



#### 5.6.1 Datapath Control Signals

RF write =  $T_5 \cdot (ALU + \text{Load} + \text{Call})$  Controlling datapath

Counter enable =  $\overline{WMEC} + MFC$

wait for a memory operation to be completed

PC enable =  $T_1 \cdot MFC + T_3 \cdot BR$

#### 5.2.2 CISc-style processor



Table 20.3 A Decoder with 4 Inputs and 16 Outputs.  
(William Stallings)

✓ Control Unit Input:

- As per instruction in TR there is a need of getting sequencing of appropriate signal

It can be done by control signals two approaches

- Hardwired control
- Microprogram control

- Let us consider 3 instructions

1. INSTR1 → add R1, R0

Sequence thus : i) R1 OUT, X IN  
ii) R0 OUT, Add  
iii) Z OUT, R1 IN  
iv) END.

2. INSTR2 → XOR R0, R5

i) R0 OUT, X IN.  
ii) R5 OUT, XOR  
iii) Z0 OUT, R0 IN  
iv) END

3. INSTR3 → ADD [R0], R6

i) R0 OUT, MAR IN, MAR \* OUT, MDR \* IN  
ii) MDR OUT, X IN.  
iii) R0 OUT, ADD  
iv) Z OUT, MDR IN, MDR \* OUT, MAR X OUT, RD  
v) END.

$$Ro\ out = ((INSTR_1 \text{ AND } T_2) \text{ OR} \\ ((\dots \text{ AND } T_1) \text{ OR} \\ ((\dots \text{ AND } T_1) \text{ OR} \\ \dots \dots))$$

where INSTR<sub>1</sub> and T<sub>2</sub> means Ro out should be made 1 if there is instruction 1 and second clock cycle.

$$ADD = INSTR_1 \text{ AND } T_2 \text{ AND } T_3 \text{ OR } \dots$$

:

END.

$$INSTR_1 \text{ AND } T_4 \text{ OR } INSTR_1 \text{ AND } T_2$$

The AND combination is for activation of particular signal in context of single instruction.

Here, for every instruction there is one AND gate and giving to each AND gate one timing marker and instruction marker.

All AND gates are OR.



We required two types of signals for this purpose—

- 1) Time marker
- 2) Instruction marker



fig. Control Unit with Decoded Inputs.

i) Different instruction with different steps which means naturally diff no of clock cycles to execute.

At the most 8 bit required choose 3 bits.

Tf 8 steps 3 bit counter

Tf 16 steps 4 bit counter.

00 T<sub>0</sub>

01 T<sub>1</sub>

10 T<sub>2</sub>

11 T<sub>3</sub>

ii) End signal connected to n bit counter that indicates one instruction is completed and set n bit counter to (quated) zero.

g) After the previous step now IR is loaded with new instruction and counter starts counting the clock.

ii) The approach discussed in hardware based control unit design lacks till we throw the design or scrub the design.

No flexibility for change as it is permanently implemented in

### Hardware architecture .

#### Chapter 21 william stalling

Machine Instruction set for Wilkes example Table 21.1

#### a) Horizontal Microinstruction



#### b) Vertical Microinstruction



### Control Memory

#### Fig 21.2 Organization of Control Memory

jump to execute  
 :  
 jump to execute  
 :  
 jump to fetch  
 :  
 jump to op-code routine  
 :  
 jump to interrupt  
 :  
 jump to

Control Unit Micromicroarchitecture fig 21.3.



Control unit function are as follows:

- 1) To execute an instruction , the sequencing logic unit issues a RFA command to the control memory .
- 2) The word whose address is specified in the control address register is read into the control buffer register.

- 3) The content of the control buffer register generates control signals and next - address information for the sequencing logic unit.  
 4)

• fig 21.4 Functioning of Microprogrammed Control unit.



Table 21.2 : (Read).

Microinstruction sequencing.  
 Design Consideration.

Fig. 21.6 Branch control Logic : Two Address fields.  
 Fig 21.7 : Single Address fields.

Fig 21.8 - 1 : Variable Format



|                                           | Hardwarebased CU               | microprogrammed CU                 |
|-------------------------------------------|--------------------------------|------------------------------------|
| 1) speed                                  | faster                         | slower.                            |
| 2) cost of implementation                 | more costly                    | cheaper                            |
| 3) flexibility                            | Not at all flexible            | flexible to accommodate            |
| 4) Ability to handle complex instructions | Difficult                      | Easy to handle complex instruction |
| 5) Decoding                               | Complex decoding + seq logic   | Simple decoder + seq logic         |
| 6) Applicability                          | Used by RISC kind of processor | Used by CISC kind of processor     |
| 7) Instruction set size                   | Smaller                        | Larger                             |

|                                                             |                                                                  |                           |
|-------------------------------------------------------------|------------------------------------------------------------------|---------------------------|
| 8) control memory                                           | absent                                                           | Present                   |
| 9) cheap area requirement                                   | less because control memory absent                               | more because control      |
| 10) occurrence of error                                     | more                                                             | less                      |
| Horizontal Microinstruction(Horizontal)                     |                                                                  | Vertical Microinstruction |
| 1) Supports longer control word                             | Supports shorter control word.                                   |                           |
| 2) Higher degree of parallelism ie degree is either 0 or 1. | Allows low degree of parallelism                                 |                           |
| 3) No additional hardware is required                       | Additional hardware is required in the form of decoders          |                           |
| 4) faster                                                   | slower.                                                          |                           |
| 5) less flexible                                            | more flexible                                                    |                           |
| 6) less use of ROM encoding as compared to vertical.        | makes more use of ROM encoding to reduce length of control word. |                           |

## Chapter 2

### Arithmetic for Computers

#### Integer Addition

Eg  $7+6$

$$\begin{array}{r}
 & (1) & (0) & (0) & (0) & (0) \\
 (0) & (0) & (1) & (0) & (1) & (1) \\
 + & 0 & 0 & 0 & 1 & 1 \\
 \hline
 & 0 & 0 & 0 & 1 & 0 & (1)
 \end{array}$$

#### Overflow

- Adding +ve and -ve operands, no overflow
- Adding two +ve no. msb is 1 is overflow
- Adding two -ve no. msb is 0 is overflow

#### Integer Subtraction

Add negation of second operand

Eg  $7-6 = 7+(-6)$

$$\begin{array}{r}
 7 \quad 0000 \quad 0000 \dots 0000 \quad 0111 \\
 - 6 \quad 1111 \quad 1111 \dots 1111 \quad 1010 \\
 \hline
 1 \quad 0000 \quad 0000 \dots 0000 \quad 0001
 \end{array}$$

#### Overflow if result out of range

- subtracting two +ve or two -ve operands, no overflow
- subtracting +ve from -ve operand overflow if result sign is 0
- subtracting -ve from +ve operand overflow result is 1

#### Arithmetic & Logical Unit

#### Integer Representation

- binary number system
- The digits zero and one
- The minus sign (for negative)
- The period or radix point

#### Sign-Magnitude Representation

#### Table 10.1

Characteristics of Two's complement Representation and Arithmetic

|                           |                                                                                                                                                 |
|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| Range                     | $-2^{n-1}$ through $2^{n-1}-1$                                                                                                                  |
| Number of Representations | one                                                                                                                                             |
| of zero                   |                                                                                                                                                 |
| Negation                  | Take the boolean complement of each bit of corresponding positive number, then add 1 to the resulting bit pattern viewed as an unsigned integer |
| Expansion of Bit Length   | Add additional bit position to the left and fill in with the value of the original sign bit                                                     |
| Overflow Rule             | If two no. with same sign are added then overflow occurs if and only if the result has the opposite sign.                                       |
| Subtraction Rule          | To subtract B from A, take the two's complement of B and add it to A                                                                            |

- Table 10.2 Alternative Representations for 4-Bit Integers

$\begin{array}{c} \text{sign} \\ (-\text{ve}) \end{array}$

$\begin{array}{c} 1001 \\ \text{2's comp} \\ +1 \\ 0001 \end{array}$

$\begin{array}{c} -1 \\ 1001 \\ (0001 + 1) 1111 \end{array}$

#### Range Extension

- In sign-magnitude notation this is accomplished by moving the sign bit to the new leftmost position and fill in with zeros.

- The procedure

- 1) rule is to move the sign bit to the new leftmost position and fill it with copies of the sign bit
- 2) For positive no.'s fill with zeros
- 3) for -ve numbers, fill it with 1.  
This is called sign extension.

#### Fixed Point Representation

##### Negation Special Case 1

$$0 = 00000000 \text{ (twos complement)}$$

$$\text{Bitwise complement} = 11111111$$

$$\text{Addition} \quad + \quad 1$$

$$\text{RESULT} \quad 10000000$$

##### CASE 2

$$-128 = 10000000 \text{ (two comple)}$$

$$\text{Bitwise complement} \quad 01111111$$

$$\text{Add } 1 \text{ to LSB} \quad + \quad 1$$

$$10000000$$



|          |  |
|----------|--|
| Page No. |  |
| Date     |  |

✓ 21.1 to 21.3 william strawing  
 Page No. \_\_\_\_\_  
 Date \_\_\_\_\_  
 T1800X  
 2B minor X

$$SO: \quad -(-128) = -128 \times$$

Monitor MSB (sign bit)

It should change during negation

03-09-13

- Negate figure 10.3 Addition of Numbers in twos complement Representation.

$$\begin{array}{l} a) 1001 = -7 \quad b) 1100 = -4 \quad c) 0011 = 3 \\ + 0101 = 5 \quad + 0100 = 4 \quad + 0100 = 4 \\ 1110 = -2 \quad 1000 = 0 \quad 0111 = 7 \end{array}$$

$$\begin{array}{lll} d) 1100 = -4 & e) 0101 = 5 & f) 1001 = -7 \\ + 1111 = -1 & + 0100 = 4 & + 1010 = -6 \\ 11011 = -5 & 1001 = \text{overflow} & 10011 = \text{overflow} \\ & & \downarrow \text{avoid} \end{array}$$

Overflow Rules - two no. with same sign but result is of different / opposite sign.

#### fig 10.4 william

#### fig 10.5 Geometric Depiction of Twos Complement Integers



fig. 10.6

Overflow  
(OF)

EVERYTHING  
IS POSSIBLE

9.1.1 Addition / Subtraction Logic Unit

Fig 9.2 Logic for addition of binary numbers.

Fig 9.3 Binary addition / subtraction logic circuit.



9.2.1 Carry-lookahead addition

using (a) 4-bit stage Bit stage cells

(b) 4-Bit adder

g. u. as fig 9.5

8-09-24

10.7 Multiplication of Unsigned Binary Integer

$$\begin{array}{r}
 1011 \\
 \times 1101 \\
 \hline
 0000 \\
 1011 \\
 1011 \\
 \hline
 1000011
 \end{array}$$

10.8 Hardware Implementation of Unsigned Binary



(a) Block diagram

M - multiplicand register

A - multiplier register

C = 0 initially initiated

C - 1 bit register to hold carry out

10.9 Flowchart of Unsigned Binary Number Multiplication



| C | A    | A    | M    | initial value        |
|---|------|------|------|----------------------|
| 0 | 0000 | 1101 | 1011 |                      |
| 0 | 1011 | 1101 | 1011 | Add } first cycle    |
| 0 | 0101 | 1110 | 1011 | Shift }              |
| 0 | 0010 | 1111 | 1011 | Shift } second cycle |
| 0 | 1101 | 1111 | 1011 | Add } third cycle    |
| 0 | 0110 | 1111 | 1011 | Shift }              |
| 1 | 0001 | 1111 | 1011 | Add } fourth cycle   |
| 0 | 1000 | 1111 | 1011 | Shift }              |

fig. 10.8 b) Hardware implementation - UBM . Ex from fig 10.7  
( prout in A,A ) .

| C | A    | A    | M    | C |
|---|------|------|------|---|
| 0 | 0000 | 0011 | 1001 |   |
| 0 | 1001 | 0011 | 1001 |   |
| 0 | 1100 | 1001 | 1001 |   |