

② System bus : (Interconnection b/w CPU and memory)

- 1) Address bus
- 2) Data bus
- 3) Control bus



$$N = 16$$

$$\log_2 N$$

$$= \log_2 16$$

= 4 ⇒ size of address bus

③ Size of address bus

= 4 ⇒ because here each location requires  
4 bits.



⇒ Initially the instructions are stored in the memory in binary form.

④ Steps for instruction execution cycle :-

- 1) Fetch / Read the instruction from memory
- 2) Decoding (After decoding the instructions we know that this is 'add' instruction)
- 3) Fetch the operand (access the content of a & b)
- 4) Execute the operation.
- 5) storing the result.

⑩ Program Counter → A special type of register which stores the starting address of a program.



⑪ Memory write operation:



⇒ Address bus and control are unidirectional, but Data bus is bi-directional.

⑫ Instructions has two parts



⑬ First, ALU gets the control signal from CPU and then it performs all the arithmetic operation.

## # Van Neumann Machine (also called IAS computer)



→ stored program concept

→ Developed in Institute of Advanced Study, Princeton

⇒ Three concepts for Van Neumann machine —

1.) Data & inst<sup>n</sup> will be stored in single memory.

2.) Content of memory will be accessed by the address not by type of data

3.) Sequential execution.

## ④ Structure of IAS computer



IR → Instruction register

MAR → Memory address register

PC → Program control

AC → Accumulator

MQ → Multiplication or division

MBR → Memory buffered register

or

DR

Data register



## ④ PCU -

PC → MAR → Address bus → Fetch the instruction from memory  
 ↓  
 opcode in IR

### # Instruction format

→ The no. of address field in the instruction format of a computer depends on the internal organisation.

→ There are four types -

① Three address inst<sup>n</sup>

② Two " "

③ One " "

④ zero " "

$$\text{Ex: } X = (A + B) * (C + D)$$

A, B, C, D, X → Memory address

### ① three address inst<sup>n</sup> -

|     |                       |
|-----|-----------------------|
| Add | R <sub>1</sub> , A, B |
|-----|-----------------------|

opcode

$$R_1 \leftarrow M[A] + M[B]$$

|     |                       |
|-----|-----------------------|
| Add | R <sub>2</sub> , C, D |
|-----|-----------------------|

$$R_2 \leftarrow M[C] + M[D]$$

|     |                                    |
|-----|------------------------------------|
| MUL | X, R <sub>1</sub> , R <sub>2</sub> |
|-----|------------------------------------|

$$M[X] \leftarrow R_1 * R_2$$

### ⑦ Two address inst<sup>n</sup>

MOV R<sub>1</sub>, A       $R_1 \leftarrow M[A]$   
 Add R<sub>1</sub>, B       $R_1 \leftarrow R_1 + M[B]$   
 MOV R<sub>2</sub>, C       $R_2 \leftarrow M[C]$   
 Add R<sub>2</sub>, D       $R_2 \leftarrow R_2 + M[D]$   
 MUL R<sub>1</sub>, R<sub>2</sub>     $R_1 \leftarrow R_1 * R_2$   
 MOV X, R<sub>1</sub>        $M[X] \leftarrow R_1$

### ⑧ One address inst<sup>n</sup>

→ In case of one address inst<sup>n</sup>, accumulator is used

Load A       $AC \leftarrow M[A]$   
 Add B       $AC \leftarrow AC + M[B]$   
 STORE T<sub>0</sub>     $M[T_0] \leftarrow AC$   
 Load C       $AC \leftarrow M[C]$   
 Add D       $AC \leftarrow AC + M[D]$   
 MUL T       $AC \leftarrow AC * T$   
 STORE X     $M[X] \leftarrow AC$

### ⑨ zero address inst<sup>n</sup> / stack organised inst<sup>n</sup>

PUSH A      TOS  $\leftarrow M[A]$   
 PUSH B      TOS  $\leftarrow M[B]$

ADD           TOS  $\leftarrow M[C]$   
 PUSH C      TOS  $\leftarrow M[C]$

PUSH D      TOS  $\leftarrow M[D]$   
 ADD

MUL

POP X

$M[X] \leftarrow TOS$



TOS → Top of stack

## ④ Types of Instructions

### I) Data transfer instruction -

MOV, LOAD, STORE, PUSH, POP

### II) Data processing instruction -

#### a) Arithmetic & Logical

ADD, MUL, DIV, SUB AND, OR, NOT

### III) Program control instruction -

#### → Branch instruction

1001: ADD A

1002: JMP 2050

1003:

conditional



1001: ADD A

1002: JNC 2050

1003:

conditional

carry bit  
if ( $C = 0$ )  
goto 2050  
else  
goto 1003

JNC → Jump not carry

JC → Jump on carry

### # Addressing mode :-

The way in which operands are chosen during program execution is dependent on the addressing mode.

I) Immediate : When address field contains data directly.  
(Faster)

e.g ADD 50



instruction

$$AC \leftarrow AC + 50$$

II) Direct addressing : When address field contains address.

Effective address = address field

e.g ADD 1000



$$AC \leftarrow AC + M[1000]$$



ADD 1000, R<sub>1</sub>

III) Indirect : content of content of address field.

Effective address = (address field)

e.g ADD (2000)



$$AC \leftarrow AC + M[M[2000]]$$



- Have to access memory twice
- can access more memory location using single instruction.

IV) Register addressing : Data present in register.

ADD R<sub>1</sub>



$$AC \leftarrow AC + R$$

V) Register indirect :

Effective address = (R)

ADD (R)



$$AC \leftarrow AC + M[R]$$



## VII) Displacement addressing:

$$\text{Effective address} = A + (R_1)$$



→  $R_1$  may be PC, Index, base register

### ① Index addressing —

(i)  $R_1 \rightarrow$  index register

→ used for arrays



→ Effective address =  $A + R_1 * \text{size of data type}$ .

### ② Base addressing —

$R_1 \rightarrow$  base register

$$\text{Effective address} = A + \text{base}$$



→ used when program relocation is required

## ⑥ Relative addressing -

$R \rightarrow PC$

$$\text{Effective address} = PC + A$$

251

## ⑦ Unsigned and signed number

No specific bit  
for sign

|   |     |
|---|-----|
| s | n-1 |
|---|-----|

n bit

$$s=0 \Rightarrow +ve$$

$$s=1 \Rightarrow -ve$$



n = 4 bit

|       |       |       |       |
|-------|-------|-------|-------|
| _____ | _____ | _____ | _____ |
|-------|-------|-------|-------|

used for sign

⑧ +8 cannot be stored in 4-bit system,  
this is known as overflow.

$$+3 \quad 0011$$

$$+5 \quad 0101$$

$$0100$$

$$0100$$

$$0100$$

To identify overflow -

$$\begin{array}{cccccc} c_n & c_{n-1} & c_{n-2} & c_{n-3} & \dots & c_0 \\ c_4 & c_3 & c_2 & c_1 & c_0 & \\ 0 & 1 & 1 & 1 & 0 & \end{array}$$

$+3$   
 $+8$   
 $\hline$

$c_n \oplus c_{n-1} = 1 \rightarrow$  for overflow.

Ex:  $-7 + -2$

$$\begin{array}{r} 000 \\ 1001 \\ \hline 1110 \\ 10111 \end{array}$$

$c_4 \oplus c_3 = 1 \therefore$  overflow

$s \swarrow$   
 $Ans = -(1001)$

Ex:  $+7 + -3$

$$\begin{array}{r} 0111 \\ 1101 \\ \hline 10100 \end{array}$$

$c_4 \oplus c_3 \neq 1 \therefore$  nonoverflow

$s \swarrow$   
 $Ans$

Ex:  $+7 + 2$

$$\begin{array}{r} 0111 \\ 0010 \\ \hline 01001 \end{array}$$

$c_4 \oplus c_3 = 1 \therefore$  overflow

$s \swarrow$   
 $Ans$

# Addition modulo system

$N=16$   
4 bit

Addition modulo 16

0 to 15

0 to  $N-1$

→ For +ve nos.  
move clockwise

→ For -ve nos.  
move anti clockwise

Range (-8 to +7)



31/1



$$2^{16} \times 12$$

Address bus size = 16

Data bus = 12

data size



PC → 16 bits

MAR → 16 bits

MDR → 12 bits

IR → 12 bits

21

$$2^{20} \times 10$$

PC = 20 bits

MAR = 20 bits

MDR = 10 bits

IR = 10 bits

## # Arithmetic unit

### \* 4-bit ripple carry adder



Gate delay -



general



$$S_0 \rightarrow ①$$

$$S_1 \rightarrow ③ (2+1)$$

$$S_2 \rightarrow ⑤ (4+1)$$

$$S_3 \rightarrow ⑦ (6+1)$$

$$C_1 \rightarrow ②$$

$$C_2 \rightarrow ④ (2+2)$$

$$C_3 \rightarrow ⑥ (4+2)$$

$$C_4 \rightarrow ⑧ (6+2)$$

→ Adding n-bit numbers

$$\text{delay } C_{n-1} \rightarrow 2(n-1)$$

$$C_n \rightarrow 2n$$

$$S_n \rightarrow C_n - 1 \text{ (excluding } C_{in})$$

→ For overflow condition

$$C_n \oplus C_{n-1} = 1$$

↓

2n

(2n+1) → detecting overflow

#### ④ combinational adder/subtractor

control signal = 0 → add  
= 1 → sub.



$$2\text{'s complement of } B = (\bar{B} + 1)$$

gate delay -

$$C_n = 2^{n+1}$$

$$S_n = C_n - 1 \quad (\text{excluding } C_n)$$

For overflow

delay =  $2n + 2$

## ④ Carry look ahead adder

$$C_{i+1} = \pi_i y_i + y_i c_i + c_i x_i$$

$$= \pi_i y_i + c_i (x_i + y_i)$$

$$= G_i + C_i P_i$$

$$C_0 = \pi_0 y_0 ; \quad P_i = x_i + y_i$$

$$C_1 = G_0 + C_0 P_0$$

$$C_2 = G_1 + C_1 P_1$$

$$= G_1 + P_1 (G_0 + C_0 P_0)$$

$$C_3 = G_2 + P_2$$

$$= G_2 + P_2 (G_1 + P_1 (G_0 + C_0 P_0))$$

We can represent

$$\phi_i = \pi_i \oplus y_i \quad (\text{as the case of } 1,1 \text{ is taken care by the } C_i)$$

$$S_i = P_i \oplus C_i$$



bit store cell



CLA LOGIC

C4

gate delay:

$$c_1 = G_0 + P_0 C_0$$



$$c_1 = ① + ② + ③ = ④$$

$$S_0 = P_0 \oplus C_0 = ⑤$$

$\rightarrow P_i, G_i \rightarrow ① \rightarrow c_i \rightarrow ④$

$$c_2 = G_1 + P_1 G_0 + P_1 P_0 C_0$$

$$S_2 = P_2 \oplus c_2 \rightarrow ⑥$$

$$S_1, S_-, S_3 \rightarrow ⑥$$

\* 16-bit CLA adder:

I) Cascading CLA

II) Higher level carry propagation & generation

⇒



$$\text{II} \rangle \quad C_4 = \underbrace{(C_3 + P_3 G_3 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0)}_{G_0'} + \underbrace{P_3 P_2 P_1 P_0 C_0}_{P_0'}$$



$$G_0', P_0' \rightarrow \textcircled{1}$$

$$G_1', P_1' \rightarrow \textcircled{2}$$

Stage 1 and 2:  $\oplus$

Stage 3 and 4:  $\oplus$

Stage 5 and 6:  $\oplus$

Stage 7 and 1:  $\oplus$



7/2

Multiplication⑧ Booth's Algo.

- Extend the sign bit of partial product to the left as far as the product will extend.
- Applicable for the as well as one's complement multiplication.

## ④ Both Multiplier recording table -

| Multiplexer | version of multiplicand selected by bit $i$ |    |
|-------------|---------------------------------------------|----|
| Bit(i)      | Bit(i-1)                                    |    |
| 0           | 0                                           | 0  |
| 0           | 1                                           | +1 |
| 1           | 0                                           | -1 |
| 1           | 1                                           | 0  |

Ex. +13    01101

$$\begin{array}{r}
 -6 \\
 \underline{11010} \\
 0-1+1-10 \\
 \hline
 00000 \leftarrow 0
 \end{array}$$

$\Rightarrow 2^{\text{complement}}$  of multiplicand

$$\begin{array}{r}
 011111000110 \leftarrow -1 \\
 \downarrow \text{sign bit} \\
 110011 \leftarrow +1 \quad \Rightarrow \text{multiplicand}
 \end{array}$$

Extension

$$\begin{array}{r}
 00001101 \leftarrow +1 \\
 1110011 \\
 \hline
 0110110010 \leftarrow -1
 \end{array}$$

sign bit

Cout  
↓  
 $2^{\text{comp}} = (001001110)$

Ignore  
 $Ans = -(110011)(001001110)$

## ⑤ Bit pair recording multiplication

→ This technique reduces the number of partial products by two times.

→ For n-bit operand, no. of partial product will be  $n/2$  and it is directly derived from the booth's algo by grouping the multiplier bits in pair.

Table:

| pair  |  | result |                                                                        |
|-------|--|--------|------------------------------------------------------------------------|
| -1 0  |  | -2     | +13 01101                                                              |
| +1 0  |  | +2     | -6 11010                                                               |
| -1 +1 |  | -1     | By Booth's algo $\begin{array}{r} 0 -1 +1 \\ \hline -1 -2 \end{array}$ |
| +1 -1 |  | +1     |                                                                        |
| 0 0   |  | 0      | $-2 \rightarrow 2^{\text{'s complement}}$                              |
| 0 -1  |  | -1     | and left shift                                                         |
| 0 +1  |  | +1     | $+2 \rightarrow \text{left shift}$                                     |
|       |  |        | $-1 \rightarrow 2^{\text{'s complement}}$                              |

$$\begin{array}{r}
 -9 = 10111 \\
 +3 = 00011 \\
 \hline
 \begin{array}{r}
 \overset{0+1}{\cancel{0}} \overset{0-1}{\cancel{0}} \\
 +1 -1
 \end{array}
 \end{array}
 \begin{array}{r}
 \hline
 \begin{array}{r}
 111100110 \\
 1110011 \\
 \hline
 10110110010
 \end{array}
 \end{array}$$

$$\begin{array}{r}
 \begin{array}{r}
 \hline
 \begin{array}{r}
 111100110 \\
 1110011 \\
 \hline
 10110110010
 \end{array}
 \end{array}
 \end{array}
 \begin{array}{l}
 \leftarrow -2 \\
 \leftarrow -1
 \end{array}$$

carry sign

$$\begin{array}{r}
 \textcircled{1} 111100101 \\
 \downarrow \\
 \text{sign} \quad A_8 = - (000011011)
 \end{array}$$

8/2

carry some adder

$$\begin{array}{r}
 1101 \\
 1111 \\
 \hline
 \begin{array}{r}
 \textcircled{1} 101 \\
 1101 \\
 1101 \\
 1101 \\
 \hline
 1011101
 \end{array}
 \end{array}$$

we're getting 4 & 1s  
 we can't store it using  
 2 bit because we are  
 getting 3 bit result  
 as 100.

$$\begin{array}{r}
 \overline{0} \ 1 \ 0 \ 0 \ 1 \ 0 \ 1 \ 1 \leftarrow s \\
 0 \ 1 \ 1 \ 1 \ 1 \ 0 \ 0 \leftarrow c \\
 \hline
 1 \ 1 \ 0 \ 0 \ 0 \ 0 \ 1 \ 1
 \end{array}$$

$$\begin{array}{r}
 & & 1 \ 1 \ 1 \ 1 \\
 & \text{A} & 0 \ 1 \ 0 \ 0 \ 0 \\
 & \text{B} & 1 \ 1 \ 1 \ 1 \\
 & \text{C} & 1 \ 1 \ 1 \ 1 \\
 & \text{D} & 1 \ 0 \ 1 \ 1 \ 1 \\
 \hline
 & & 1 \ 0 \ 1 \ 1 \ 1 \leftarrow s
 \end{array}$$

while  $A + B + C$

$$\begin{array}{r}
 1 \ 1 \ 1 \ 1 \\
 1 \ 1 \ 1 \ 1 \\
 1 \ 1 \ 1 \ 1 \\
 \hline
 1 \ 0 \ 1 \ 1 \ 0 \leftarrow s \\
 0 \ 1 \ 1 \ 1 \ 0 \leftarrow c \\
 1 \ 1 \ 1 \ 1 \\
 \hline
 0 \ 0 \ 1 \ 1 \ 1 \leftarrow s \\
 1 \ 1 \ 1 \ 1 \ 0 \leftarrow c \\
 \hline
 1 \ 0 \ 0 \ 1 \ 0 \leftarrow s
 \end{array}$$

$$\begin{array}{r}
 1111 \quad A \\
 1111 \quad B \\
 1111 \quad C \\
 \hline
 101101 \leftarrow S_1 \\
 011110 \leftarrow C_1 \\
 \hline
 1111 \quad D \\
 \hline
 1101001 \leftarrow S_2 \\
 0111100 \leftarrow C_2 \\
 \hline
 1110001
 \end{array}$$

For 6 bit reg.



$$\begin{array}{r}
 111111 \\
 101010 \\
 \hline
 000000
 \end{array}$$

$$111111$$

$$000000$$

$$111111$$

$$000000$$

$$111111$$

$$D$$

$$E$$

$$F$$

Q2

Division

$$\begin{array}{r} 11 \\ \overline{)1000} \\ 11 \\ \hline 10 \end{array} \rightarrow \text{remainder}$$

$010 \rightarrow Q$

## ② Restoring technique



Steps:

i) shift A and Q left one bit position.

ii)  $A \leftarrow A - M$

iii) IF sign of A  $\neq 1$

$$q_0 \leftarrow 0, A \leftarrow A + M$$

else

$$q_0 \leftarrow 1$$

e.g.  $m \rightarrow \text{divisor} \rightarrow 11$

$Q \rightarrow \text{dividend} \rightarrow 1000$

|                      |        |                                                   |       |       |
|----------------------|--------|---------------------------------------------------|-------|-------|
|                      | M      | <table border="1"><tr><td>0011</td></tr></table>  | 0011  |       |
| 0011                 |        |                                                   |       |       |
|                      | A      | <table border="1"><tr><td>00000</td></tr></table> | 00000 |       |
| 00000                |        |                                                   |       |       |
| shift                |        | 00001                                             |       |       |
| $A \leftarrow A - M$ |        | <table border="1"><tr><td>11110</td></tr></table> | 11110 |       |
| 11110                |        |                                                   |       |       |
| $q_0 \leftarrow 0$   |        | <table border="1"><tr><td>10001</td></tr></table> | 10001 | 00001 |
| 10001                |        |                                                   |       |       |
| $A \leftarrow A + M$ | ignore |                                                   | 0000  |       |

|                      |                                                    |        |                                                   |       |         |
|----------------------|----------------------------------------------------|--------|---------------------------------------------------|-------|---------|
| shift                | 00010                                              | 000□   |                                                   |       |         |
| $A \leftarrow A - M$ | <table border="1"><tr><td>11111</td></tr></table>  | 11111  | <table border="1"><tr><td>00010</td></tr></table> | 00010 | cycle 2 |
| 11111                |                                                    |        |                                                   |       |         |
| 00010                |                                                    |        |                                                   |       |         |
| $q_0 \leftarrow 0$   | <table border="1"><tr><td>100010</td></tr></table> | 100010 | <table border="1"><tr><td>00000</td></tr></table> | 00000 |         |
| 100010               |                                                    |        |                                                   |       |         |
| 00000                |                                                    |        |                                                   |       |         |
| $A \leftarrow A + M$ | ignore                                             |        |                                                   |       |         |

|                      |                                                   |        |                                                   |       |         |
|----------------------|---------------------------------------------------|--------|---------------------------------------------------|-------|---------|
| shift                | 00100                                             | 1000 □ |                                                   |       |         |
| $A \leftarrow A - M$ | <table border="1"><tr><td>11001</td></tr></table> | 11001  | <table border="1"><tr><td>11001</td></tr></table> | 11001 | cycle 3 |
| 11001                |                                                   |        |                                                   |       |         |
| 11001                |                                                   |        |                                                   |       |         |
| $q_0 \leftarrow 1$   | <table border="1"><tr><td>00001</td></tr></table> | 00001  | <table border="1"><tr><td>0001</td></tr></table>  | 0001  |         |
| 00001                |                                                   |        |                                                   |       |         |
| 0001                 |                                                   |        |                                                   |       |         |
|                      |                                                   |        |                                                   |       |         |

|                      |                                                    |        |                                                  |      |         |
|----------------------|----------------------------------------------------|--------|--------------------------------------------------|------|---------|
| shift                | 00010                                              | 001 □  |                                                  |      |         |
| $A \leftarrow A - M$ | <table border="1"><tr><td>11001</td></tr></table>  | 11001  | <table border="1"><tr><td>1100</td></tr></table> | 1100 | cycle 4 |
| 11001                |                                                    |        |                                                  |      |         |
| 1100                 |                                                    |        |                                                  |      |         |
| $q_0 \leftarrow 0$   | <table border="1"><tr><td>000010</td></tr></table> | 000010 | <table border="1"><tr><td>0010</td></tr></table> | 0010 |         |
| 000010               |                                                    |        |                                                  |      |         |
| 0010                 |                                                    |        |                                                  |      |         |
| $A \leftarrow A + M$ | remainder                                          | 0010   | outrent                                          |      |         |

|                      | A       | Q         |         |
|----------------------|---------|-----------|---------|
| stop                 | 0 0 0   | 1 1 1 1 1 | initial |
| shift                | 0 0 1   | 1 0 □     | cycle 1 |
| $A \leftarrow A - M$ | ① 1 0 1 |           |         |
| $q_0 \leftarrow 0$   |         |           |         |
| $A \leftarrow A + M$ | 0 0 0 1 | 1 1 0     |         |
| <hr/>                |         |           |         |
| shift                | 0 0 1 1 | 1 0 □     |         |
| $A \leftarrow A - M$ | ① 1 1 1 |           |         |
| $q_0 \leftarrow 0$   |         |           |         |
| $A \leftarrow A + M$ | 0 0 1 1 | 1 0 0     |         |
| <hr/>                |         |           |         |
| shift                | 0 1 1 1 | 1 0 0 □   |         |
| $A \leftarrow A - M$ | ① 0 1 1 | 0 1 0 0 1 |         |
| $q_0 \leftarrow 1$   |         |           |         |

(a) Non-restoring

step 1: Do it n times

i) If the sign of A is 0

shift A and Q left one bit position

$$A \leftarrow A - M$$

else

left shift A and Q

$$A \leftarrow A + M$$

ii) If the sign of A is 1

$$q_0 \leftarrow 1$$

else

$$q_0 \leftarrow 0$$

step 2: If the sign of A is 1

$$A \leftarrow A + M$$

$$M \rightarrow 0011$$

step 1  
shift

$$\begin{array}{r} A \\ 0000000 \\ \hline 00001 \end{array}$$

$$\begin{array}{r} Q \\ 10010 \\ \hline 000\Box \end{array}$$

cycle 1

$$A \leftarrow A - M$$

$$\begin{array}{r} A \\ 1110 \\ \hline 0000 \end{array}$$

$$q_0 \leftarrow 0$$

shift

$$1110$$

$$000\Box$$

cycle 2

$$A \leftarrow A + M$$

$$q_0 \leftarrow 0$$

shift

$$11110$$

$$000\Box$$

cycle 3

$$A \leftarrow A - M$$

$$q_0 \leftarrow 1$$

shift

$$00010$$

$$001\Box$$

cycle 4

$$A \leftarrow A + M$$

$$q_0 \leftarrow 0$$

$$01111$$

$$0010$$

cycle 5

step 2:

$$A \leftarrow A + M$$

$$q_0 \leftarrow 0$$

$$100010$$

$$0010$$

rem.

Ans.

M 00101 01011

0 0000 1100  
0 0001 100   
11100 1000  
shift  
 $A \leftarrow A - M$   
 $Q_0 \leftarrow 0$

---

shift 11001 000   
 $A \leftarrow A + M$   
 $Q_0 \leftarrow 0$  11110 0000

---

shift 11100 000   
 $A \leftarrow A + M$   
 $Q_0 \leftarrow 1$  100001 0001  
↓ sign

---

shift 00010 001   
 $A \leftarrow A - M$   
 $Q_0 \leftarrow 0$  111.01 0010  
0010

---

Step 2  
 $A \leftarrow A + M$  1000101 0010  
dem 0000 01110  
0000 01110

~~15/2~~ Floating point representation

- Fractional no.
- Position of decimal point is not fixed.
- ① Exponent
- ② Mantissa
- ③ sign of exponent
- ④ sign of mantissa

#### ④ Normalized form:

Position of decimal point after one non-zero digit.

$$\underline{\underline{Ex}} \quad 4.567$$

$$\underline{\underline{Ex}} \quad 0.75$$

$$0.75 \times 2 = 1.50 \rightarrow 1$$

$$0.5 \times 2 = 1.00 \rightarrow 1$$

$$= (0.11)_2$$

$$= 0.1 \times 2^{-1} \rightarrow \text{normalized form}$$

hidden bit

$\Rightarrow$  In case of binary system, 1st non-zero digit is 1, so no need to represent it.

#### ⑤ IEEE 754 Standard

① single precision  $\rightarrow$  32 bits

② double "  $\rightarrow$  64 bits

#### ① single precision



$E' \rightarrow$  biased exponent

$E \leftarrow E' - 127$

$\downarrow 81 \geq 7 \geq 0$   
actual exponent

bias = 127

as  $k = 8.001 \div 2^{127}$

$s = 0 \Rightarrow +ve \text{ mantissa}$

$= 1 \Rightarrow -ve \text{ mantissa}$

$\rightarrow$  Bias

bias =  $2^{K-1} - 1$

(for exponent)

$0 \leq E' \leq 2^8 - 1$

$0 \leq E \leq 255$

$$\left. \begin{array}{l} E' = 0 \\ E' = 255 \end{array} \right\} \text{reserved}$$

so actual range

$$1 \leq E' \leq 254$$

Cases -

$$\textcircled{1} \quad E' = 0, M = 0$$

$$\Rightarrow N = 0$$

$N \rightarrow \text{Number}$

$$\textcircled{2} \quad E' = 0, M \neq 0$$

de normalized form

$$\text{Ex. } 0.0110$$

$$\textcircled{3} \quad E' = 255, M = 0$$

$$\Rightarrow N = \infty$$

$$\textcircled{4} \quad E' = 255, M \neq 0$$

$$\Rightarrow N = NaN \quad \text{eg. } \frac{0}{0}, \sqrt{-1}$$

range of E

$$-127 \leq E \leq 254 - 127$$

$$-126 \leq E \leq 127$$

$$\text{Ex. } 4.5 = 100.1 = 1.001 \times 2^2$$

$$E = 2$$

|       |          |          |
|-------|----------|----------|
| 0     | 10000001 | 00100... |
| 8 bit | 23 bit   |          |

$$E' = 2 + 127 = 129$$

$$\text{Ex. } 0.75 = 1.1 \times 2^{-1}$$

$$E' = 12.6$$

0|0111110|1000...

④ Range of single precision format

$$E' \geq 0 \quad \{ \text{reduced}\}$$

$$E' \leq 255 \quad \{ \text{number of mantissa bits} = 23 \}$$

$$\text{Min. } E' = 00000000$$

$$\text{Min. } N = 1.0 \times 2^2$$

$$\text{Min. } M = 000...00$$

(+ne)

0|00000001|000...

$$\text{Min. } N = -1.0 \times 2^2$$

(-ne)

1|00000001|000...

$$\text{Max. } E' = 254$$

$$E = 254 + 127 = 127$$

$$\text{Max. } M = \underbrace{111110}_{23}.$$

0|1111110|111...

$$\begin{array}{r} 111...01 \\ 000...12 \\ \hline 10.000... \rightarrow 2 \end{array}$$

1|1111110|111...

$$\text{Max. } N = (2-2^{-23}) \times 2^{127} \rightarrow 2.8$$

$$\text{Max. } N = -(2-2^{-23}) \times 2^{127} \rightarrow -2.8$$



$$-(2-2^{-23}) \times 2^{127}$$

$$(2-2^{-23}) \times 2^{127}$$

Comp. n → 2A

(Comp.) n → 2A

④ Double precision



19/2

Ques. The two word instruction "LOAD AC" is stored at location 200 with address field at location 201.

The address field contains the value ~~400~~ 500.

The register  $R_1$  contains 400. Evaluate the effective address and contents of AC if the following addressing mods are used -

- i) Direct    ii) immediate    iii) Indirect
- iv) Register indirect    v) Relative    vi) Index

Ans

LOAD AC 500

$$\text{i)} EA \rightarrow 500$$

$$AC \leftarrow M[500] = 30$$

$$\text{ii)} EA \rightarrow 201$$

$$AC \leftarrow M[500]$$

$$\text{iii)} EA \rightarrow 30$$

$$AC \leftarrow M[30]$$

$$\text{iv)} EA \rightarrow 400$$

$$AC \leftarrow M[400]$$

$$\text{v)} EA = 202 + 500$$

$$AC \leftarrow M[702]$$



Initial PC 200

New PC 202

\* PC increments after executing the instruction

$R_1$  400

$$\text{vi)} EA = 400 + 500$$

$$AC \leftarrow M[900]$$

## Main memory organization

Read = 1

Write = 0

Properties of cell



16x4

Row 0:   □ □ □ □

Row 1:   □ □ □ □

          □ □ □ □

          □ □ □ □

Row 5:   □ □ □ □

→ group of  $n$  bits/cell = word

→ size of address bus

→ word length = 4

→ size of data bus

→ word size = 4

→ Nos. of locations = M

Nos. of bits required =  $\log_2 M$

for a location

32x8

→ size of address bus =  $\log_2 32 = 5$

→ size of data bus = 8

∴ Address bus = 10

Data bus = 8

$$\text{so } \text{Memory} = 2^{10} \times 8 = 1024 \times 8$$

If output 32 bits required then bypass will be

10 bits from 32 bits of output 12 bits of data bus, 10 bits

with replace 8x8x16

## ④ Internal organization of memory chip.

16x4



RAM chip



ROM chip



$CS \rightarrow$  chip select line  
 $AD \rightarrow$  Address bus  
 $DB \rightarrow$  Data bus  
 $RD/W \rightarrow$  Read/write

$\Leftarrow$  Your computer system required 512 bytes of RAM and 512 bytes of ROM. Size of each RAM is  $128 \times 8$ . Design the system.



→ Memory address map:

Memory address map specifies the address assigned to each chip.

RAM 0 → 80 - FF  
 RAM 1 → Min → 0 0 0 0 0 0 0 → 00  
 Max → 0 0 1 1 1 1 1 → FF  
 Range → 00 - FF

RAM 2 → 00000000 - 01111111  
 RAM 3 → FFFF / 1997 FF  
 Range → 80 - FF

RAM 0 → 0 0 0 0 0 0 0  
 1 0 1 1 1 1 1  
 Range → 100 - 1FF

RAM 4 → 110000000  
 1 1 1 1 1 1 1 1  
 Range → 180 - 1FF



Address bus = 10 bits  
9th bit = 0  $\Rightarrow$  RAM

9-0  $\Rightarrow$  1111111100  $\Rightarrow$  1  $\Rightarrow$  ROM

Data bus = 8

for ROM

|                                                     |                                                                                                                                                                   |
|-----------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| $512 \times 8$<br>$512 \times 8$<br>$1028 \times 8$ | $9 \quad 8 - 0$<br>$1028 \quad 0000000000$<br>$2 \quad 0 \quad 0$<br>$Max \quad 1 \quad 1$<br>$3 \quad F \quad F$ |
|-----------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|

Range 200 - 3FF

## 22) Memory Hierarchy



size of regs < size of cache

speed → access time

speed of cache > speed of P.M. > speed of S.M.



$$\text{Access time} = t_2 + t_1$$

→ Cache memory: stores the portion of program needed data which is frequently needed by CPU.

→ Locality of reference property  
 Analysis of a large no. of typical programs has shown that references to memory within fixed interval of time tend to be confined within a few localized area of memory. This property is called Locality of reference property.

## ① Temporal Locality

Recently executed inst<sup>n</sup> are likely to be executed again very soon.  
ex. for (i=0; i<n; i++)  
    { fact = fact \* i }

## ② Spatial Locality

Inst<sup>n</sup> residing close to a recently executed inst<sup>n</sup> are likely to be executed soon.  
ex. for (i=0, i<n, i++)  
    st = a[i];

## → Basic operation of cache

CPU generates some address



After finding data in MM,  
it transfers to cache.

n-1 times cache memory will be accessed

→ If Data is in cache memory → called Hit

→ If not → called Miss

## ③ Hit ratio (HR)

$$= \frac{\text{no. of hits}}{\text{no. of hits} + \text{no. of misses}}$$

## ④ Avg. memory access time

$$= (\text{HR} \times \text{cache access time}) + (\text{miss rate} \times \text{miss penalty})$$

(1 - HR)

miss penalty = cache access time + MM access time

$$\begin{aligned} \text{our cache access time} &= 100 \text{ ns} \\ \text{MM " } &= 1000 \text{ ns} \\ H.R &= 0.9 \end{aligned}$$

set<sup>n</sup>

## ② cache mapping



$$\text{size of MM} = 2^4 = 16$$

$$\text{" " blocks} = 4 = 2^2$$

$$\text{No. of blocks} = \frac{2^4}{2^2} = 4$$

$$\text{No. of blocks in cache} = C$$

$$\text{" " " " MM} = M$$

$$C \ll M$$

## ③ Mapping Techniques

### 1) Direct mapping:

$$q = b \bmod m$$

$a$  = Cache block no.

$b$  = MM " "

$m$  = no. of blocks in cache

memory.

$$\text{Cache size} = 2 \text{K bytes}$$

$$4 \text{M} \Rightarrow 64 \text{ K bytes}$$

$$\text{block size} = 16 \text{ bytes}$$

$$\text{No. of blocks in cache} = \frac{2 \times 2^{10}}{2^4} = 2^7$$

$$\text{No. of blocks in MM} = \frac{2^6 \times 2^{10}}{2^4} = 2^{12}$$

| U.M. block | Cache block |
|------------|-------------|
| 0          | 0           |
| 1          | 1           |
| :          | :           |
| 127        | 127         |
| 128        | 0           |
| 129        | 1           |
| :          | :           |
| 255        | 127         |
| 256        | 0           |



Tag      Cache block      word/offset

16 bits



size of address bus (processor etc)



## ii) Associative Mapping:

Any block from MM can be mapped to any block of cache memory.

e.g. Cache size = 2 K bytes

MM size = 64 kB

block size = 16 bytes =  $2^4$

$$\text{nos. of blocks in cache} = \frac{2^R}{16} = 2^T = 128$$

size of MM address bus = 16. ~~16~~ bits

### ④ direct



### ⑤ associative



- No. of comparisons in case of direct = 1
- " " " " associative = 128  
(in this case)  
(comparisons will be 11 at)  
drawback  $\Rightarrow$  (cost will increase)

### iii) set associative mapping:

cache memory is divided into some sets and each set contains of some blocks.

K-way set associative

Each set contains k no. of blocks.

Ex:

4-way

$$\text{no. of sets} = \frac{128}{4}$$

$$= 32$$

$$\text{No. of comparisons} = 4$$



|     |     |        |
|-----|-----|--------|
| 7   | 5   | 4      |
| Tag | set | offset |

b<sub>127</sub>

Ex: cache size = 32 kB

· MM size = 16 MB

· block size = 512 bytes = 2<sup>9</sup>

shows that partition of MM address ~~into~~ into tag, index, offset

Ans. Total cache address size =  $\frac{15}{2} + 2^6$  bits = 26 bits  
 no. of blocks in cache =  $\frac{2^15}{2^6} = 2^9$  blocks = 512 blocks  
 size of address by  $\underline{\text{tag}} = 2^6$  bits for block address, then



Set associative  $\rightarrow$  4-way. No. of sets = 512/64 = 8

$$\text{no. of sets} = \frac{64}{4} = 16$$



Q3  
 Q. Consider a direct map cache with 64 blocks  
 and block size = 16 words.  
 find out cache block no. which may contain  
 the main memory address 1200

$$1200 \Rightarrow \text{MM block} = \frac{1200}{16} = 75$$

$$a = b \bmod m$$

$$\Rightarrow a = 75 \bmod 64$$

$$= 11$$

Ques consider two way set associative cache with 128 blocks & block size = 16 words. Find out the set which will contain MM address 1500.

Sol no. of cache set =  $\frac{128}{2} = 64$

no. of MM blocks =  $\frac{1500}{16} = 93.75 = 93$  blocks

$$a \equiv b \pmod{s}$$

$$\text{set no} = 93 \pmod{64}$$

$$= 29$$

14/3

$T_{CA} \rightarrow$  Cache Access Time

$T_{MS} \rightarrow$  MM " "

$\Rightarrow$  Write back

Miss penalty

$$\text{Read : } T_{CA} + T_{MS} + P_d T_{MS}$$

$\rightarrow$  Prob. that flag is set

$$\text{Avg. memory access time for read} = H \times T_{CA} + (1-H) \{ T_{CA} + (1+P_d) T_{MS} \}$$

$$\text{write : } 2T_{CA} + T_{MS} + P_d T_{MS}$$

$$\text{Avg. memory access time for write} = H \times T_{CA} + (1-H) \{ 2T_{CA} + (1+P_d) T_{MS} \}$$

$$\text{Cache access time} = 90 \text{ ns}$$

$$\text{MM " " } = 500 \text{ ns}$$

" Cache access time = 90 ns  
 MM " " = 500 ns  
 80% of the memory request are for read operation  
 and 20% for write operation. H.R. for read  
 is 0.9, H.R. for write is 0.8. Compute avg.  
 memory access time for read operation for  
 both write back and write through.

both write back and write through  
 ii) Avg. memory access time for both read and  
 write for both write back & write through.

$$(P_d = 0.5)$$

so i) For write back -

$$\begin{aligned} T_M &= 0.9 \times 90 + 0.1 (90 + (1+0.5) \times 500) \\ &= 0.9 \times 90 + 0.1 (90 + 1.5 \times 500) \\ &= 81 + 84 = 165 \text{ ns} \end{aligned}$$

For write through -

$$T_{\text{avg}} = H \times T_{\text{CA}} + (1-H) (T_{\text{CA}} + T_{\text{MS}})$$

$$= 0.9 \times 90 + 0.1 (90 + 500)$$

$$= 81 + 59 = 140 \text{ ns}$$

ii) For write back -

$$T_{\text{avg}} = 0.8 \times 90 + (1-0.8) (2 \times 90 + (1+0.5)500)$$

$$= 72 + 0.2 (180 + 750)$$

$$= 72 + 186 = 258 \text{ ns}$$

$$T_{\text{avg.}} = P_r \times T_r + P_w T_w$$

$P_r \rightarrow$  Probability of read  
 $P_w \rightarrow$  Probability of write.

$$= 0.8 \times 165 + 0.2 \times 258$$

$$= 183.6$$

For write through -

(whether new occurs or not)

$$T_{\text{avg}} = T_{\text{MS}}$$

MM & CM will be updated parallelly

$$= 500 \text{ ns}$$

$$T_{\text{avg}} = 0.8 \times 140 + 0.2 \times 500$$

$$= 112 + 100 = 212 \text{ ns}$$

i) Find out avg. access time for memory read operation

ii) both read & write

→ only write through

$$T_{CA} = 50 \text{ ns}, T_{TA} = 50 \text{ ns}$$

$$P_{D1} = 0.8, H_{D1} = 0.9$$

## ② Cache Replacement Algorithm

→ In case of miss, if the cache memory is already full, to write in cache memory, cache replacement algo is required to choose the block to be overwritten.

|                |    |
|----------------|----|
| b <sub>0</sub> | 4  |
| b <sub>1</sub> | 5  |
| b <sub>2</sub> | 10 |
| b <sub>3</sub> | 12 |
| b <sub>4</sub> | 15 |

→ Only for associative & set-associative.

→ 3 types -

- i) FIFO → Replace the block, which has the max. timer value.
- ii) LRU → Least recently used
- iii) Random

e.g. M.M. block

[5, 4, 2, 3, 5, 1, 5, 10, 3]

Note: When cache is empty & M.M. is called for first time, miss is known as compulsory miss.

i) FIFO → [5, 4, 2, 3, 5, 1, 5, 10, 3]  
 M M M M H M M M H

ii) LRU → [5, 4, 2, 3, 5, 1, 5, 10, 3]  
 M M M H M H M H

## ① Types of cache misses →

- I) Compulsory
- II) Capacity → When program requires more blocks frequently than the capacity of cache.
- III) Conflict → Direct mapping  
More than one block is mapped to the same block.  
→ set association  
In theory, more than 4 blocks are ~~ever~~ used frequently which map to the same set.

## ② Techniques to reduce cache misses →

- ① Increasing cache size
- ② Increasing block size
- ③ Higher associativity
  - 2-way "
  - 8-way "

## ④ Compiler optimization

~~loop interchange~~

i) loop interchange

for ( $j=0$ ,  $j < 1000$ ,  $j++$ )

    for ( $i=0$ ,  $i < 1000$ ,  $i++$ )

$$a[i][j] = 2 * a[i][j];$$

$a[0][0]$

$a[1][0]$

if instead of accessing block by block data, we access all the data of ~~a~~ a block and then next block, then miss will be reduced. For this we have to interchange the loops.

```
for (i=0, i<1000, i++)
    for (j=0, j<1000, j++)
        a[i][j] = L * a[i][j];
```

|                |                 |                 |                 |
|----------------|-----------------|-----------------|-----------------|
| b <sub>0</sub> | a <sub>00</sub> | a <sub>01</sub> | a <sub>02</sub> |
| b <sub>1</sub> | a <sub>10</sub> | a <sub>11</sub> | a <sub>12</sub> |
| b <sub>2</sub> | a <sub>20</sub> | a <sub>21</sub> |                 |
|                |                 |                 |                 |
| i              |                 |                 |                 |

### ii) Loop fusion

```
for (i=0, i<1000, i++)
    for (j=0, j<100, j++)
```

$$a[i][j] = b[i][j] + c[i][j]$$

~~a[i][j] = b[i][j] \* 2;~~  
 complete all the operation related to  $a[i][j]$   
 and then move to the next i or j.

### iii) Merging arrays

~~interleaving~~ combining two separate arrays (that might conflict for a block in cache memory) into one separate array.

```
- int val[10]; → b0
```

```
int key[10]; → b1
```

```
struct merge
{
    int val;
    int key;
}; To;
```

Ques

④ Memory Expansion

i) Vertical expansion →

Increasing no. of memory location.



Increase in size of address bus.

ii) Horizontal expansion →

Increasing word length



Increase in size of data bus.

Ques

= i) How many  $256 \times 8$  RAM chips are needed for the memory of  $1K \times 16$

Soln

$$\text{no. of chips} = \frac{1K \times 16}{256 \times 8} = \frac{2^{10} \times 2^4}{2^8 \times 2^3}$$

$$= (4) \times (2)$$

↓

vertical      horizontal

ii) How many lines of address bus must be used to access  $1K \times 16$  of memory.

A<sub>1</sub> - 10

iii) How many of this lines will be common for all chips.

A<sub>2</sub> - 8

iv) Specify the size of decoder.

M -  $2 \times 4$



## ① Virtual Memory

Virtual memory is a concept that allows the execution of a program that may not be completely in MN.

Page → block of V.M.



→ Address for virtual ~~address~~ → virtual address

→ Virtual memory is also divided into some blocks, called "page".



① MMU (Memory management unit) —

Translates virtual addresses to physical addresses.



$$\text{V.A.} = 48 \text{ bits}$$

$$\text{P.A.} = 36 \text{ bits}$$

$$\text{page size} = 2\text{KB}$$

How many virtual pages & physical frames exist in the system?

$$\text{V.P. size} = 2^{\frac{48}{36}}$$

$$\text{No. of pages} = \frac{2^{48}}{2^{12}} = \frac{2^{48}}{2^{11}} = 2^{37}$$

$$\text{No. of frames} = \frac{2^{36}}{2^{11}} = 2^{25}$$

19/3

## Input and Output device



④ Difference b/w CPU & I/O device

i) I/O electro-mechanical device, CPU is electronic device.

ii) Rate of data transfer (CPU is faster)

iii) Data format



⇒ Isolated I/O vs memory mapped I/O





→ Separate address spaces for memory and I/O



memory mapped I/O

→ common instruction can be used for memory & I/O



→ common control line

→ common address spaces



ex LOAD 500 → for I/O

LOAD 1500 → memory

## ① I/O command

- i) control
- ii) status
- iii) Data input
- iv) Data output



21/9

Input output device → peripheral device

## ② Modes of data transfer -

- I) Programmed I/O
- II) Interrupt Initiated I/O
- III) Direct memory Access (DMA)

## 21/9 I) Programmed I/O

### ③ Handshaking protocol -

- i) Data in data bus
- ii) Enables data valid line
- iii) Enables data accepted by D
- iv) disables data valid lines
- v) disables data accepted line





$F = 1 \rightarrow$  means data present in the data reg.

$F = 0 \rightarrow$  No data in data reg.

I/O device is not ready

If  $F = 1$ , CPU accepts data from data reg.

after that ~~data~~  $F$  will be reset.

Steps - (for write operation)

i) placing address in address bus by CPU

ii) ~~I/O~~ Data in ~~data~~ bus  $\rightarrow$  data reg.

iii)  $F = 1$ , ~~data valid line~~

iv) Data accepted line enabled by interface

v) CPU checks the flag bit

$F = 1 \rightarrow$  ready for data transfer

vi) Data ~~reg~~ from data reg. to data bus

vii) Data accepted line disable by interface

viii) Next set of data.

## ② Drawbacks of programmed I/O —

- i) CPU has to check for flag bit (0 or 1) continuously.

## II Interrupt initiated I/O

- skip
- ii) first CPU initiate I/O operation by placing address of I/O device in address bus.
- iii) Never checks the flag bit, continue its normal execution.
- iv) interrupt occur when I/O device ready for data transfer



ISR (Interrupt service Routine)

↳ program that store what to do when interrupt occurs.

⇒ Ways to chose the address of ISR by CPU -

- i) Non-Vectored interrupt → fixed location in memory
- ii) Vectored interrupt → I/O device gives the address of ISR

#### ④ Priority Interrupt -

I/O 1, I/O 3 → at some time interrupt the CPU; then the I/O having higher priority will be serviced first

⇒ Identify highest priority -

s/w → polling → check for the interrupt signal is enabled for the I/O devices ~~in sequence wise~~ (IF I/O 1 is enabled then no checking for I/O 2 and I/O 3)



VAD → vectored address

INT → interrupt

ACK → ~~Ack~~ acknowledgement

$P_I = 1$  for I/O 1

$P_O = 0$

blocks the signal. (if interrupt signal is high)  $\rightarrow$  for I/O 1.

If I/O 1 is not ready from data transfer

for I/O 2

$P_I = 1$

$P_O = 1 \rightarrow$  passes the signal to next I/O

or if I/O 2 is ready

for I/O 2

$P_I = 1$

$P_O = 0$

## ① Drawbacks

Programmed I/O & Interrupt I/O both required CPU interaction. I.e. time loss.

ref/3

## III DMA controller (Memory Map I/O)

Add. reg.: starting address of block of data to be read or written in memory.

① starting address of block of data to be read or written in memory.

② No. of words to be transferred.

Preview

$$A_1 = \cdot 3 = k t_p$$

$$A_2 = k t_p + t_p = 4$$

$$A_3 = \underbrace{k t_p}_{A_1} + \underbrace{t_p}_{A_2} + \underbrace{t_p}_{A_3} = 5$$

$$SU = \frac{n k t_p}{k t_p + (n-1) t_p} = \frac{n k}{\underbrace{k + n - 1}_n} = \frac{n k}{n} = k$$

for  $n \gg k-1$



You have large no. of operators

$$n \rightarrow \infty$$

$$n \gg k-1$$

$Speed\ up \approx k$

Ex:  $t_p = 20\ ns$

$$K = 4$$

$$n = 100$$

$$SU = \frac{n k t_p}{k t_p + (n-1) t_p} = \frac{n k}{\underbrace{k + n - 1}_n} = \frac{100 \times 4}{4 + 100 - 1}$$

$$= \frac{400}{103} = 3.88$$

② → max. speedup  $\approx K$

26/3 pipeline



$S \rightarrow$  segments.

# ① Data pipeline :

$A_i * B_i + C_i$  stages without feedback.

$i=1 \quad A_1 * B_1 + C_1$  initial value of stage 1 is  $C_0$

$i=2 \quad A_2 * B_2 + C_2$  value of stage 2 is  $C_1$

$\vdots$

$i=7 \quad A_7 * B_7 + C_7$  final value of stage 7 is  $C_6$

$$S_1 : R_1 \leftarrow A_1, R_2 \leftarrow B_1$$

$$S_2 : R_2 \leftarrow R_1 * R_2, R_3 \leftarrow C_1$$

$$S_3 : R_5 \leftarrow R_3 + R_4$$



| clock | $S_1$          | $S_2$          | $S_3$                      |                   |
|-------|----------------|----------------|----------------------------|-------------------|
| 1     | $R_1$<br>$A_1$ | $R_2$<br>$B_1$ | $R_3$<br>$A_1 * B_1 + C_1$ | $R_5$             |
| 2     | $A_2$          | $B_2$          | $C_2$                      | $A_1 * B_1 + C_1$ |
| 3     | $A_3$          | $B_3$          | $C_3$                      | $A_1 * B_1 + C_1$ |
| 4     | $A_4$          | $B_4$          | $C_4$                      | $A_1 * B_1 + C_1$ |
| 5     |                |                |                            | $A_1 * B_1 + C_1$ |
| 6     |                |                |                            | $A_1 * B_1 + C_1$ |
| 7     |                |                |                            | $A_1 * B_1 + C_1$ |

| clock | $S_1$          | $S_2$          | $S_3$                      |                   |
|-------|----------------|----------------|----------------------------|-------------------|
| 1     | $R_1$<br>$A_1$ | $R_2$<br>$B_1$ | $R_3$<br>$A_1 * B_1 + C_1$ | $R_5$             |
| 2     | $A_2$          | $B_2$          | $C_2$                      | $A_1 * B_1 + C_1$ |
| 3     | $A_3$          | $B_3$          | $C_3$                      | $A_1 * B_1 + C_1$ |
| 4     | $A_4$          | $B_4$          | $C_4$                      | $A_1 * B_1 + C_1$ |

| clock | $S_1$          | $S_2$          | $S_3$                      |                   |
|-------|----------------|----------------|----------------------------|-------------------|
| 1     | $R_1$<br>$A_1$ | $R_2$<br>$B_1$ | $R_3$<br>$A_1 * B_1 + C_1$ | $R_5$             |
| 2     | $A_2$          | $B_2$          | $C_2$                      | $A_1 * B_1 + C_1$ |
| 3     | $A_3$          | $B_3$          | $C_3$                      | $A_1 * B_1 + C_1$ |
| 4     | $A_4$          | $B_4$          | $C_4$                      | $A_1 * B_1 + C_1$ |

$$A_7 * B_7 + C_7$$

Total Cycle = 9

## ④ Instruction pipeline:

→ Instruction execution cycle

I) Fetch the instruction

II) Decode " "

III) Fetch the operands OF

IV) Execute the instruction EX

| Inst <sup>n</sup> | 1  | 2  | 3  | 4  | 5  | 6  | 7  |
|-------------------|----|----|----|----|----|----|----|
| I <sub>1</sub>    | FI | DO | OF | EX |    |    |    |
| I <sub>2</sub>    | -  | FI | DO | OF | EX |    |    |
| I <sub>3</sub>    | -  | -  | FI | DO | OF | EX |    |
| I <sub>4</sub>    | -  | -  | -  | FI | DO | OF | EX |

without pipeline

$$I_1 = 4, I_2 = 4 + 4, I_3 = 4 + 4 + 4$$

$$I_4 = 16$$

with pipeline

$$I_4 = 7$$

$$\text{Speed up} = \frac{16}{7}$$

## ② Pipeline hazards

### i) structural hazards (Resource conflict)

when more than one inst want to use the same resource at the same time



I<sub>1</sub> & I<sub>4</sub> both are using memory at CP-4  
for writing & reading resp.

To resolve this



Solution :- ② Delayed Load : (H/w)

Delayed loading conflicting operator.

NOP  $\Rightarrow$  No operation will be performed, just increase the clock cycle.

|   | 1   | 2  | 3  | 4  | 5  | 6                        |
|---|-----|----|----|----|----|--------------------------|
| I | FI  | DO | OF | EX |    |                          |
| J | NOP | FI | DO | OP | EX |                          |
|   |     |    |    |    |    | $R_1 = 15 \quad R_2 = 6$ |

1 CLK delay

③ S/W  $\Rightarrow$  compiler optimization

Compiler detect data hazard  $\rightarrow$  rearrange the instructions.

$$\begin{array}{l} R_1 \leftarrow R_2 + R_3 \\ R_4 \leftarrow R_1 * 4 \\ R_5 \leftarrow R_6 + R_7 \end{array} \left\{ \begin{array}{l} \text{RAWI} \\ \text{RWH} \end{array} \right\} \Rightarrow \begin{array}{l} I: R_1 \leftarrow R_2 + R_3 \\ J: R_5 \leftarrow R_6 + R_7 \\ K: R_4 \leftarrow R_1 * 4 \end{array}$$

- $\Rightarrow$  independent instn will be inserted b/w.
- $\Rightarrow$  no extra clk pulse is required.

④ Operand Forwarding (H/w)



Ex will be performed in ALU.

Hardware detect the data hazard  $\rightarrow$  Instead of storing the value in  $R_1$ , ALU bypass the value. No extra clk pulse is required.

II) WAH:

$$I: R_1 \leftarrow R_2 + R_3$$

$$J: R_2 \leftarrow R_5 * 3$$

hazard will occur if J writes before I reads.  
 $(R_2)$

### III WAW :

I :  $R_1 \leftarrow R_2 + R_3$

J :  $R_1 \leftarrow R_4 - R_5$

K :  $R_6 \leftarrow R_1 + 10$

Hazard will occur if J writes the variable ( $R_1$ ) before I writes the variable ( $R_1$ )

### ④ Control hazard :

2000 : 1 :  $R_1 \leftarrow M[1000]$

2001 : 2 : JMP 2050

2002 : 3 :  $R_4 \leftarrow R_2 + R_3$

2050  
PC

2050 :  $R_5 \leftarrow R_6 + 4$

i) unconditional branch Inst<sup>n</sup> → ex: JMP 2050

ii) conditional branch Inst<sup>n</sup> →

↓ ex: JNC 2050

condition check JC 2050

At the end of 5th CLK

PC 2050

|                | 1  | 2  | 3  | 4  | 5  | 6  |
|----------------|----|----|----|----|----|----|
| I <sub>1</sub> | FI | DO | OF | EX |    |    |
| I <sub>2</sub> |    | FI | DO | OF | EX |    |
| I <sub>3</sub> |    |    |    |    |    | FI |

↓

FI from 2002      ↓      FI for 2050

Normally      ↓      ↓

in case of branch

~~Branch Buffer~~

### ① Delayed Branch:

Next B do NOP.

(6-8) delay

### ② Branch Target Buffer

| Address of Branch | target address |
|-------------------|----------------|
| 3000              | 2050           |
| 2000              | 2000           |
| 1000              |                |

BTB

2000 Add A,B  $\rightarrow$  100P

2001: JMP 2000

No need for the gate delay, if the instruction address matches the BTB address of branch then it will start the target instruction execution.

### ③ Branch Prediction:

JNC 3000 CP

for ( $i=0, i < 99, i++$ )

$P = 1 \rightarrow$  go to 3000.  $C \leftarrow A+B$

$P = 0 \rightarrow$  next inst<sup>n</sup> 99 times prediction

$\Rightarrow$  for loop inst<sup>n</sup> will be correct

only 1 time ~~P~~ miss prediction will occur.

first before occurring 2001 of

2/4/18

## RISC Architecture

- small nos. of instruction sets.
- simple instruction
- Few addressing modes
- Fixed length inst<sup>n</sup> format
- Memory access is limited to load and store inst<sup>n</sup>.
- Large nos. of registers.  
All operations are done within reg. of CPU.

Ex. ADD R<sub>1</sub>, R<sub>2</sub> ← [faster becoz using reg.]

ADD

- Single cycle inst<sup>n</sup> execution, efficient for pipeline.
- Hardwired control unit.

## CISC (Complex inst<sup>n</sup> set computer)

- Large nos. of inst<sup>n</sup>.
- complex inst<sup>n</sup>.
- Large variety of addressing modes (5-20)
- Variable length inst<sup>n</sup> format
- Inst<sup>n</sup> manipulate operands in memory

$$M[Z] \leftarrow M[X] + M[Y]$$

- Small nos. of reg.
- Complex inst<sup>n</sup> may have multiple clock cycle
- Micro programmed control unit.

## 5/4. Control signal

- > Hardwired CU  $\rightarrow$  control signals are generated using "hardwired" circuit design.
- > Microprogrammed CU

↳ control signals are generated by using a programming.

### I Hardwired CU:

- $\rightarrow$  Decoder, encoder, Functional logic.



conditional codes are required for the branched instructions.

- $\rightarrow$  Fixed instruction sets.
- $\rightarrow$  Not flexible
- $\rightarrow$  Difficult to handle complex instr.
- $\rightarrow$  Complicated design process
- $\rightarrow$  Speed is faster.
- $\rightarrow$  complex decoding & branching (e.g. RISC)

### III Microprogrammed CU

- Control signals ← programming.
- Speed is slower
- More flexible
- Easier to handle complex instn.
- Simple design process

Ex. CISC

$$R_3 \leftarrow R_1 + R_2$$

①  $R_{out}, Y_{in}$

②  $R_{out}, \text{select } Y, \text{ add}, Z_{in}$

③  $Z_{out}, R_{in}$



Micro routine → set of micro instructions which actually forms a particular machine lang.

Control word → ~~the~~ content of any memory locations of control memory,

# Block diagram of microprogrammed CU



sequencer is also called next address generator.

CAR → control address reg.

CM → control memory

CDR → " data reg

CW → " word.

- ① increment the content of CAR
- ② Load the branch address
- ③ subroutine call and return
- ④ Mapping process