

17/01/18

- System bus :- (interconnection b/w CPU & memory)

- 1) Address bus
- 2) Data bus
- 3) Control bus

|           |   |   |                                  |
|-----------|---|---|----------------------------------|
| 0000 → 0  | 1 | 1 | $N = 16$                         |
| 0001 → 1  |   |   | $\log_2 N$                       |
| 0010 → 2  | 1 | 0 | $= \log_2 16$                    |
| ⋮         | ⋮ | ⋮ | $= 4 \Rightarrow 8 \text{ bits}$ |
| 1111 → 15 |   |   | of address bus                   |

Size of data bus

= 4 ⇒ because here each location requires 4 bits.



- » Initially the instructions are stored in the memory in binary form.
- Steps for instruction execution cycle :-
  - 1) Fetch / Read the instructions from memory.
  - 2) Decoding. (After decoding the instructions we know that this is 'add' instruction)
  - 3) Fetch the operand. (access the content of a 8-b)
  - 4) Execute the operation.
  - 5) Storing the result.

- Program Counter → a special type of register which stores the starting address of a program.

- Memory Read :-



- Memory Write Operation :-



- » Address bus & Control bus are unidirectional, but Data bus is bi-directional.

- Instructions has two parts —



@ First, ALU gets the control signal from CPU and then it performs all the arithmetic operation.

18/01/18

- Von - Neumann Machine :-

Stored Program Concept → is developed in

Institute for Advanced Studies,  
(IAS)  
Princeton



⇒ There are 3 main concept for Von - Neumann Machine

- 1) Data & Instruction will be stored in single memory.
- 2) Content of memory will be accessible by address, not by data-type.
- 3) Sequential execution.

This machine is also called IAS computer.

⇒ What is stored program concept?

- Structure of IAS computer :-

consists of CPU, memory, register.



$\begin{array}{l} \text{MBR} \rightarrow \\ \text{Memory Buffer} \\ \text{Registers / DR} \\ \text{AC} \rightarrow \text{Accumulator} \\ \text{MS register} \rightarrow \text{multi or div.} \end{array}$

$\begin{array}{l} \text{IR} \rightarrow \text{Instruction Register} \\ \text{MAR} \rightarrow \text{Memory Address Registers} \\ \text{PCU} \rightarrow \text{Program Control Unit} \end{array}$

$\begin{array}{l} \text{DPU} \rightarrow \text{Data Processing Unit} \end{array}$



~~MAR / DR~~ AC → temporary  
register



### • Instruction format

The number of address field in the instruction format of a computer depends on the internal organisation. There are 4 types of instruction format →

- 1) Three Address Instruction
- 2) Two
- 3) One
- 4) Zero

⇒  $x = (A+B) * (C+D)$  → evaluate using three address instruction  
 $A, B, C, D, x$  → represents memory address

|     |                       |
|-----|-----------------------|
| Add | R <sub>1</sub> , A, B |
|-----|-----------------------|

opcode address

R<sub>1</sub> → Register

$$R_1 \leftarrow M[A] + M[B]$$

↓  
content of memory location A,  
M[100]

|     |               |                              |
|-----|---------------|------------------------------|
| Add | $R_1, A, B$   | $R_1 \leftarrow M[A] + M[B]$ |
| Add | $R_2, C, D$   | $R_2 \leftarrow M[C] + M[D]$ |
| MUL | $x, R_1, R_2$ | $M[x] \leftarrow R_1 * R_2$  |

$\Rightarrow$  Two Address Instruction  $\leftarrow$  Load (Reading)  $\leftarrow$  always write

|     |            |                                                   |
|-----|------------|---------------------------------------------------|
| MOV | $R_1, A$   | $R_1 \leftarrow M[A]$                             |
| ADD | $R_1, B$   | $R_1 \leftarrow R_1 + M[B]$                       |
| MOV | $R_2, C$   | $R_2 \leftarrow M[C]$                             |
| ADD | $R_2, D$   | $R_2 \leftarrow R_2 + M[D]$                       |
| MUL | $R_1, R_2$ | $R_1 \leftarrow R_1 * R_2$                        |
| MOV | $x, R_1$   | $M[x] \leftarrow R_1 \Rightarrow$ Store (writing) |

$\Rightarrow$  One Address Instruction :-

In case of one address instruction Accumulator is used.

|       |       |                             |
|-------|-------|-----------------------------|
| Load  | A     | $AC \leftarrow M[A]$        |
| Add   | B     | $AC \leftarrow AC + M[B]$   |
| Store | $T_1$ | $M[T_1] \leftarrow AC$      |
| Load  | C     | $AC \leftarrow M[C]$        |
| Add   | D     | $AC \leftarrow AC + M[D]$   |
| Store | $T_2$ | $M[T_2] \leftarrow AC$      |
| Load  | $T_1$ | $AC \leftarrow M[T_1]$      |
| Mul   | $T_2$ | $AC \leftarrow AC * M[T_2]$ |
| Store | $T_3$ | $M[T_3] \leftarrow AC$      |
| Store | x     | $M[x] \leftarrow AC$        |

29/01/18

⇒ Zero Address Instruction :- / Stack organised instruction

Addition and multiplication instruction do not contain any address field.



• Types of Instruction :-

1) DATA TRANSFER :-

MOV, LOAD, STORE, PUSH, POP



2) DATA PROCESSING :-

Arithmetic, Logical

↓  
   ADD, MUL, DIV,  
   SUB

↓  
   XOR, XNOR, AND

### 3) PROGRAM CONTROL

#### Branch instruction:

1001 : Add A



1002 : JMP 2050

1003 : JNC 2050

→ ~~Jump~~ Jump not carry [ $C \neq 0$ ]

1004 : JC 2050 if ( $C = 1$ ) [ $C = 1$ ]

{ }

{ }

→ The way in which operands are accessed / chosen during program execution; is depended on the addressing mode.

#### 1) Immediate



→ faster, no need to address memory

Add 50

$$AC \leftarrow AC + 50$$



#### 2) Direct

Effective address = address field

→ Here one ~~address~~ memory address is required.

Address of operand

Add 1000  
address

$$AC \leftarrow M[1000] + AC$$



### 3) Indirect

Effective address = content of (address field)



Add (2000)  $\rightarrow$  indirect addressing

$$AC \leftarrow AC + M[M[2000]]$$

$\Rightarrow$  Advantage is  $\infty$



Using direct addressing we can access the contents of 1000 memory locations.

But in case of indirect we can access the contents of more memory locations.

### 4) Register addressing



Add  $R_1$

$$AC \leftarrow AC + R_1$$

Content of  $R_1$  and  $AC$  will be added

### 5) Register indirect

Effective address =  $(R_1)$



Address ( $R_1$ )

$$AC \leftarrow AC + M[R_1]$$

- ⇒ Register indirect is faster than memory indirect.
- ⇒ Memory direct → n → n register → n.

### 6) Displacement Addressing :-



This R may be be PC, index register, base register

### ② Index addressing :-



$R \rightarrow$  index register

Address field → starting address

### ⑥ Relative addressing :-

$R \rightarrow PC$  content of address of PC and content of Register will be added.

### ⑦ Base addressing :-

$R \leftarrow$  base register



### ⇒ 2's compliment addition :-

$$(-5) + (+3)$$

$$\begin{array}{r} +5 \rightarrow 0101 \\ +3 \rightarrow 0011 \\ \hline 0 \quad 1000 \quad (8) \end{array}$$

using 4 bit we can not stop  
(+8). This is called  
overflow.

25/01/18

### • Unsigned and signed number :-

|   |   |
|---|---|
| S |   |
| 0 | 1 |

 n bit,  $n-1$   
 $s=0 \rightarrow +ve$   
 $s=1 \rightarrow -ve$



$$\begin{array}{r}
 +3 \rightarrow 0011 \\
 +5 \rightarrow 0101 \\
 \hline
 & \text{s} \checkmark \\
 & 1000 (+8)
 \end{array}$$

$$\begin{array}{cccccc}
 c_n & c_{n-1} & & & & \\
 q_1 & q_3 & q_2 & q_1 & q_0 \\
 0 & 1 & 1 & 1 & 0
 \end{array}$$

$$\Rightarrow (+3) + (-1)$$

$$\begin{array}{r}
 q_1 q_2 q_3 q_0 \\
 \hline
 0011 \\
 + 1111 \\
 \hline
 10010 \\
 q_1 \quad (+2)
 \end{array}$$

Using 4 bit we can calculate upto +7, so +8 is overflow

$$c_n \oplus c_{n-1} = 1$$

Overflow occurred

$$\Rightarrow (-7) + (-2)$$

$$\begin{array}{r}
 1001 \\
 + 1110 \\
 \hline
 01101
 \end{array}$$

$$q_1 \oplus q_3$$

$$= 1 \oplus 0 = 1 \Rightarrow \text{Overflow will occur.}$$

$$\Rightarrow (+7) + (-3)$$

$$\begin{array}{r}
 0111 \\
 + 1101 \\
 \hline
 10100
 \end{array}$$

$$1 \oplus 1 \Rightarrow 0$$

(No overflow)

$$\begin{array}{r}
 q_1 \oplus q_3 = 1 \\
 q_2 = 1 \\
 q_3 = 1
 \end{array}$$

$$c_0 = 0$$

$$c_1 = 0$$

$$c_2 = 0$$

$$c_3 = 0$$

$$q_1 = 1$$

$$\begin{array}{r}
 0011 \\
 \rightarrow 1100
 \end{array}$$

$$c_0 = 0$$

$$c_1 = 1$$

$$c_2 = 1$$

$$c_3 = 1$$

$$q_1 = 1$$

$$\Rightarrow (+7) + (+2)$$

$$\begin{array}{r} 1100 \\ 0111 \\ \hline 1001 \\ 1001 \end{array}$$

So, this is signed bit

$$c_0 = 0$$

$$c_1 = 0$$

$$c_2 = 0$$

$$c_3 = 0$$

$$c_4 = 0$$

$$c_5 \oplus c_6 = 0 \oplus 1$$

$$= 1 \Rightarrow \text{Overflow}$$

- Addition - Modulo System :-

$N=16 \Rightarrow$  Using 4-bit we can represent  $2^4 = 16$ .

$\Rightarrow$  Addition Modulo 16



0 to 15

0 to  $(N-1)$

Sign

0 000  $\rightarrow 0$

0 001  $\rightarrow 1$

0 10  $\rightarrow$

0 11  $\rightarrow 7$

1 000  $\rightarrow -8$  (2's comp)

1 001  $\rightarrow -7$

1 11  $\rightarrow -1$

$+7$   $\rightarrow$  Move upto +7  
 $-3$  then go upto 3  
 $\frac{+4}{+100}$  places back (anticlockwise)

31/01/18



Address bus size = 16 = PC size

Data bus  $n = 12$

PC > MAR = 16 bits

MDR = 12 bits  $\rightarrow$  size of data

$2^{16} \times 12 \rightarrow$  Address space  $\rightarrow$  how many location  
the processor can support.

» In case of IAS computer  $\rightarrow$

IR only hold opcode.

» But in case of other computer  $\rightarrow$

IR holds total instruction (opcode + operand)

$\therefore$  So, here total size of ~~IR~~ IR = 12 bits

= width of each location

C2A+B

### Arithmetic & Logical Unit

| <u>P<sub>1</sub></u> | <u>P<sub>2</sub></u> | <u>Processor</u> | <u>Performance</u> |
|----------------------|----------------------|------------------|--------------------|
| $t_1 > t_2$          |                      |                  |                    |

#### Full Adder

$$\text{Sum} = A \oplus B \oplus C_{in}$$

$$\text{Carry} = AB + BC_{in} + C_{in}A$$



R → Ripple carry adder

• Gate delay



Gate delay = 2  
(c<sub>1</sub>)



Gate delay = 1  
(s<sub>0</sub>)

$$C_2 = 2 + 2 = 4$$

$$C_3 = 4 + 2 = 6$$

$$S_1 = 1 + 2 = 3$$

$$S_2 = 4 + 1 = 5$$

$$C_4 = 6 + 2 = 8$$

$$S_3 = 6 + 1 = 7$$

⇒ If we add 2 n bits number, then →

$$C_{n-1} = 2(n-1) \text{ time unit.}$$

$$C_n = 2n$$

$$\begin{matrix} C_n \oplus C_{n-1} = 1 \\ \downarrow \\ 2n \end{matrix}$$

1 G.D

$$\text{Total} \geq (2n+1) \text{ Gate Delay}$$

↳ for detecting overflow.

⇒ Control signal →

= 0 → Add

= 1 → Subtraction



$$c_3 = 2+1 = 3$$

$$s_0 = 1+1 = 2$$

$$c_2 = 3+2 = 5$$

$$s_1 = 2+3 = 5$$

$\bar{B}_0, \bar{B}_1, \bar{B}_2, \bar{B}_3$  will be available simultaneously for the 1st operation after one Gate delay.

$$c_3 = 5+2 = 7$$

$$s_2 = 5+1 = 6$$

$$c_1 = 7+2 = 9$$

$$s_3 = 7+1 = 8$$

|                        |                                      |
|------------------------|--------------------------------------|
| $c_{n-1} = 2(n-1) + 1$ | $\Rightarrow$ For overflow detection |
| $c_n = 2n + 1$         | time delay = $\frac{2n+2}{2}$        |

### • Carry-Look-Ahead Adder

$$\begin{aligned} c_{i+1} &= x_i y_i + y_i c_i + c_i x_i \\ &= x_i y_i + c_i (x_i + y_i) \\ &= G_i + C_i P_i \end{aligned}$$

$$\begin{cases} P_i = x_i + y_i \\ G_i = x_i y_i \end{cases}$$

$$c_1 = G_0 + C_0 P_0$$

$$c_3 = G_2 + C_2 P_2$$

$$c_2 = G_1 + C_1 P_1$$

$$= G_1 + P_1 (G_0 + C_0 P_0)$$

$$\begin{aligned} c_4 &= G_3 + C_3 P_3 \\ &= G_3 + (G_2 + C_2 P_2 (G_1 + P_1 (G_0 + C_0 P_0))) \\ &= G_3 + (G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0) \end{aligned}$$

$$\begin{aligned}
 Q_4 &= G_{13} + P_3 P_9 \\
 &= G_{13} + \{G_{12} + (G_1 + P_1 G_{10} + C_0 P_0 P_1) P_2\} P_3 \\
 &= G_{13} + P_3 G_{12} + P_3 P_2 G_1 + P_1 P_2 P_3 G_{10} + P_0 P_1 P_2 P_3 G_0
 \end{aligned}$$

01/02/18

### Carry Look Ahead Adder



Bit Stage cell



$$\begin{aligned}
 S_0 &= P_0 \oplus C_0 \\
 &= 1 + 1 \\
 &= 2
 \end{aligned}$$

$$\begin{aligned}
 C_4 &= G_0 + P_0 C_0 \\
 &= 1 + 1 + 1 \\
 &= 3
 \end{aligned}$$

$$C_2 = G_1 + P_1 (G_{10} + G_0 P_0)$$

$$= G_1 + P_1 G_{10} + P_0 P_1 C_0$$

$$C_2, C_3, C_4 = 3$$

$$S_1, S_2, S_3 = 1$$

• fan in :-

5 input gates are available.

disadvantage :-

for carry look ahead adder for 16 bits it requires gate of 15 i/p AND gate which is not possible.

• Cascading CLA :-



• Higher level Carry Look Ahead Adder :-



$$G_1 = \underbrace{G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0}_{G_{16}} + \underbrace{P_3 P_2 P_1 P_0 C_0}_{P_{16}'}$$

$G_0', P_0'$  → 3 G. Deb'y

$G_2', P_2'$  → 3 n

05/02/18

- 16-bit Adder

Ripple carry adder

$$\frac{C_{16}}{2^n} = 32$$

$$\frac{S_{15}}{(2^n - 1)} = 31$$

Cascading CLA

9

10

$$S_i = x_i \oplus y_i \oplus C_i$$

High level carry propagation & generation.

$$C_1, C_8, C_{12}, C_{14}$$

= 5

8

$$S_{15} = x_{15} \oplus y_{15} \oplus C_{15}$$

$C_{15}$

$C_{12}$

$$C_2 = G_1 + P_1 G_0 + P_0 P_1 C_0$$

$$C_3 = G_2 + P_2 G_1 + P_1 P_2 G_0 + P_0 P_1 P_2 C_0$$

$$C_{15} = G_2^I P_2^I G_1^I + P_2^I P_1^I G_0^I + P_2^I P_1^I P_0^I C_{12}$$

→ after 7 G.D.

$$S_{15} \leftarrow x_{15} \oplus y_{15} \oplus C_{15}$$

$$= 8 \text{ G.D.}$$

- Multiplication of positive numbers

n-bits

product → 2n bits

$$\begin{array}{r} 13 \\ \times 11 \\ \hline \end{array}$$

1101 → Multiplicand (M)

1011 → Multiplier (Q)

1101 → Partial product (PP<sub>0</sub>)

0000 → PP<sub>1</sub>

1101 → PP<sub>2</sub>

10001111 → PP<sub>3</sub>

P<sub>7</sub> P<sub>6</sub> P<sub>5</sub> P<sub>4</sub> P<sub>3</sub> P<sub>2</sub> P<sub>1</sub> P<sub>0</sub>



07/02/18

### • Hardware configuration :



Initially, CA  $\leftarrow 0$ .

Size of A, q  $\rightarrow n$  bit.

q  $\leftarrow$  Multiplier

M  $\leftarrow$  Multiplicand

$\Rightarrow$  If  $q_i = 1$

$\rightarrow$  add + shift

$\Rightarrow$  If  $q_i = 0 \rightarrow$  shift

Multiplicand

|            | M    | P    | Q                                  |
|------------|------|------|------------------------------------|
| e          | 1101 |      |                                    |
| o          | 0000 |      |                                    |
|            |      | P    | Q                                  |
| 1st cycle  | 0    | 1101 | 1011 → add                         |
|            | 0    | 0110 | 1101 → shift                       |
| 2nd cycle  | 1    | 0011 | 1101 → add $\frac{0110}{1101}$     |
|            | 0    | 1001 | 1110 → shift $\frac{1101}{110011}$ |
| 3rd cycle  | 0    | 0100 | 1111 → shift                       |
|            | 1    | 0001 | 1111 → add $\frac{0100}{1101}$     |
| 4-th cycle | 0    | 1000 | 1111 → shift $\frac{1101}{110001}$ |
|            |      |      | ↓                                  |
|            |      |      | <u>Result</u>                      |

- Booth's Algorithm :

Extend sign-bit of partial product to the left as far as the product will extend.

It is applicable for positive as well as negative product.

- Booth Multiplier recording Table :

| Multiplicand Bit(i) | Multiplicand Bit(i-1) | Version of multiplicand selected by bit i |
|---------------------|-----------------------|-------------------------------------------|
|---------------------|-----------------------|-------------------------------------------|

- Advantages of Booth's Algo :

- 1) It is equally applicable for +ve as well as -ve multiplier.
- 2) If the multiplier multiplier consists of sequence of 1's. In case of normal multiplication no. of partial product will be total no. of 1's, but in case of Booth's Algorithm no. of pp. will be reduced.

$$\begin{array}{r} \text{5 PP} \\ \hline 0.0100000 \end{array} \rightarrow 8$$

$$+ 0000010$$

$\downarrow$   
2 PP in Booth's algo

$$+13 = 01101$$

$$-6 = 11010$$

$$\begin{array}{r} x 0 -1 +1 -1 0 \\ \hline 0 0 0 0 0 \end{array}$$

$$111100110 \leftarrow$$

$$00001101000$$

$$1110010101000$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$1101010010100$$

$$\begin{array}{r} 0001001101 \\ +1 \\ \hline 0001001110 \end{array}$$

sign-bit

$$0001001110 \rightarrow (-) 78 \quad (\underline{\text{Ans}})$$

## Bit pair recording multiplication

This technique reduces no. of PP by 2 times

for n-bit operand no. of PP will be  $\frac{n}{2}$

And it is directly derived from Booth's

Algo by grouping the multiplier bits in pair.

|    |    |    |
|----|----|----|
| -1 | 0  | -2 |
| +1 | 0  | +2 |
| -1 | +1 | -1 |
| +1 | -1 | +1 |
| 0  | 0  | 0  |
| 0  | -1 | -1 |
| 0  | +1 | +1 |

-2 → 2's compliment

and shift left

+2 → left shift

↓

multiplication

right shift

↳ division

$$+13 = 011101$$

$$-6 = \overline{11010}$$

$$\begin{array}{r} 0 -1 +1 -1 \\ 1 \quad \quad \quad \quad \\ \hline 0 \quad -1 \quad -2 \end{array}$$

$$\begin{array}{r} 1111100110 \\ 11110011 \times \times \\ \hline 1110110010 \end{array}$$

$$-1 \rightarrow 2's \text{ compliment}$$

$$\overline{010110} = 614$$

$$\overline{010111} = 62$$

$$\begin{array}{r} -9 = 10111 \\ +3 = 00011 \\ \hline 00010 \end{array}$$

$$\begin{array}{r} 01000 \\ +1 \\ \hline 01001 \end{array}$$

$$\begin{array}{r} -9 = 10111 \\ +3 = 00011 \\ \hline 00100 \end{array}$$

$$\begin{array}{r} 00000 \\ +1 \\ \hline 00001 \end{array}$$

$$\begin{array}{r} 00000001001 \\ 111101111 \times \times \\ \hline 111101001010 \end{array}$$

$$\begin{array}{r} 0000011010 \\ +1 \\ \hline 00001010 \end{array}$$

$$-7 = 11001$$

$$\begin{array}{r} -4 = 11100 \\ \hline 00100 \end{array}$$

$$\begin{array}{r} 000000000000 \\ 000000111xx \\ \hline 000000111100 \end{array} = (+) 28$$

$$\begin{array}{r} 8421 \\ 0110 \\ 1000 \\ +1 \\ \hline 0100 \\ 1011 \\ \hline 1100 \end{array}$$

$$\begin{array}{r} 00110 \\ +1 \\ \hline 00111 \end{array}$$

08/02/18

### • Carry Save Adder

$$\begin{array}{r} 1101 \\ 1111 \\ \hline 1101 \\ 1101 \\ 1101 \\ \hline 100101 \rightarrow S \\ \text{RCA} \quad \boxed{0111100} \rightarrow C \\ \hline 1100011 \end{array}$$



$$\begin{array}{r} 11111 \\ 11111 \\ \hline 101101 \rightarrow S_1 \\ 011110 \rightarrow C_1 \\ \hline 11110 \\ 1101001 \rightarrow S_2 \\ 0111100 \rightarrow C_2 \\ \hline 11100001 (\text{Ans}) \end{array}$$

|       |   |   |   |   |   |   |
|-------|---|---|---|---|---|---|
|       | 1 | 1 | 1 | 1 | 1 |   |
|       | 1 | 0 | 1 | 0 | 1 | 0 |
| <hr/> |   |   |   |   |   | A |
|       | 1 | 1 | 1 | 1 | 1 | 0 |
|       | 0 | 0 | 0 | 0 | 0 | 0 |
| <hr/> |   |   |   |   |   | B |
|       | 1 | 1 | 1 | 1 | 1 | 0 |
|       | 0 | 0 | 0 | 0 | 0 | 0 |
| <hr/> |   |   |   |   |   | C |
|       | 1 | 1 | 1 | 1 | 1 | 0 |
|       | 0 | 0 | 0 | 0 | 0 | 0 |
| <hr/> |   |   |   |   |   | D |
|       | 0 | 0 | 0 | 0 | 0 | 0 |
|       | 1 | 1 | 1 | 1 | 1 | 0 |
| <hr/> |   |   |   |   |   | E |
|       | 1 | 1 | 1 | 1 | 1 | 1 |
|       | 0 | 0 | 0 | 0 | 0 | 0 |
| <hr/> |   |   |   |   |   | F |

|       |   |   |   |   |   |   |   |
|-------|---|---|---|---|---|---|---|
| 0     | 0 | 0 | 0 | 0 | 0 | 0 | A |
| 1     | 1 | 1 | 1 | 1 | 1 | 1 | B |
| 0     | 0 | 0 | 0 | 0 | 0 | 0 | C |
| <hr/> |   |   |   |   |   |   | D |
| 0     | 1 | 1 | 1 | 1 | 1 | 0 | E |
| 0     | 0 | 0 | 0 | 0 | 0 | 0 | F |
| <hr/> |   |   |   |   |   |   | G |
| 0     | 0 | 1 | 1 | 1 | 1 | 0 | H |
| 0     | 0 | 1 | 1 | 1 | 1 | 0 | I |
| <hr/> |   |   |   |   |   |   | J |
| 0     | 0 | 1 | 0 | 0 | 1 | 1 | K |
| 0     | 0 | 0 | 1 | 1 | 0 | 0 | L |
| <hr/> |   |   |   |   |   |   | M |
| 0     | 0 | 0 | 1 | 1 | 0 | 0 | N |
| 0     | 0 | 1 | 1 | 0 | 0 | 0 | O |
| <hr/> |   |   |   |   |   |   | P |

|       |   |   |   |   |   |   |   |   |                             |
|-------|---|---|---|---|---|---|---|---|-----------------------------|
| 0     | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | $\rightarrow s_3$           |
| 0     | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | $\rightarrow c_3$           |
| 1     | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | $\rightarrow s_2$           |
| <hr/> |   |   |   |   |   |   |   |   | S                           |
| 1     | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | $\rightarrow s_4$           |
| 0     | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | $\rightarrow c_4$           |
| <hr/> |   |   |   |   |   |   |   |   | T                           |
| 0     | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | $\rightarrow \text{Result}$ |
| <hr/> |   |   |   |   |   |   |   |   | U                           |
| 0     | 0 | 1 | 0 | 0 | 0 |   |   |   |                             |

14/02/18

### • Division :-

$$1000 \overline{)11}$$

1000 → Dividend

11 → Divisor

### » Longhand technique :-

$$11 \overline{)1000} \quad 0$$

-ve remainder

$$11 \overline{)1000} \quad | 01 \rightarrow \text{Quotient}$$

$$\qquad\qquad\qquad 11 \overline{)10} \rightarrow \text{Remainder}$$

### » Restoring :-

Do the following for  $n$  times. [ $n \rightarrow$  dividend's bits]

- Shift A and Q left one bit position.



$$\begin{array}{r}
 00100 \\
 \times 11101 \\
 \hline
 \end{array}$$

① 0001  
 0001

step - 3

$$\begin{array}{r}
 00010 \\
 \times 11101 \\
 \hline
 \end{array}$$

① 1111

step - 4

$$\begin{array}{r}
 00011 \quad 0010 \\
 \hline
 00010 \quad 0010 \\
 \hline
 \end{array}$$

remainder → Quotient → Answer

$$\begin{array}{r}
 \text{A} \\
 \hline
 0011 \\
 1100 \\
 \hline
 1001011
 \end{array}
 \quad
 \begin{array}{r}
 Q \\
 \hline
 10 \square
 \end{array}
 \quad
 \left. \begin{array}{l} \\ \\ \end{array} \right\} \text{Step - 2}$$
  

$$\begin{array}{r}
 00100 \quad 100 \\
 \hline
 0011 \quad 01 \square
 \end{array}
 \quad
 \left. \begin{array}{l} \\ \\ \end{array} \right\} \text{Step - 3}$$
  

$$\begin{array}{r}
 1100 \\
 \hline
 1111
 \end{array}
 \quad
 \begin{array}{r}
 0100 \quad 010 \\
 \hline
 0011 \quad 010
 \end{array}$$
  

$$\begin{array}{r}
 0110 \quad 10 \square
 \end{array}
 \quad
 \begin{array}{r}
 1100 \\
 \hline
 0010
 \end{array}$$
  

$$\begin{array}{r}
 0010 \quad 101 \\
 \hline
 0010 \quad 101
 \end{array}$$

• Non-restoring :-

(s-1) If the sign of A is 0, shift A and Q left one bit position.

$$A \leftarrow A - M$$

else

left shift A and Q

$$A \leftarrow A + M$$

(s-2) If sign of A is 1

$$q_0 \leftarrow 1$$

else

$$q_0 \leftarrow 0$$

If the sign of A is 1,

$$A \leftarrow A + M$$

M20101

$$\begin{array}{r} A \\ \hline 00000 \\ \text{shift} \\ 00001 \\ \hline 11011 \\ \hline 11100 \\ \hline 11100 \quad 10.00 \\ \hline 11001 \quad 000\Box \\ 01010 \\ \hline 00011 \\ \hline 00011 \quad 0001 \\ \hline 00110 \quad 001\Box \\ 11011 \\ \hline 00001 \\ \hline 00001 \quad 0001 \\ \hline 00010 \quad 001\Box \\ 11011 \\ \hline 11101 \\ \hline 11101 \quad 0010 \\ + 01010 \\ \hline 00111 \quad 0010 \end{array}$$

1010  
1011

①      ②      ③      ④

15/02/18

- Floating point representation →  
Fractional no.

① Exponent

② Mantissa

③ Sign of exponent  
n n mantissa

4.567 → Normalized form

$$0.75 \times 2 = 1.50$$

$$0.50 \times 2 = 1.0$$

$$(0.75)_2 = (0.11)_2$$

## IEEE - 754 format

- 1) Single precision  $\rightarrow$  32-bit
- 2) Double  $\rightarrow$  64-bit



1-bit  
for exponent  
 $bias = 127$

$$bias = 2^{n-1} - 1 \quad n=8$$

$E' \rightarrow$  biased exponent

$$E \leftarrow E' - 127$$

↓  
actual exponent

8-bits  
mantissa

23-bits  
mantissa

**E'**

$$0 \leq E' \leq 2^n - 1$$

$$0 \leq E' \leq 2^8 - 1$$

$$0 \leq E' \leq 255$$

$$1 \leq E' \leq 254$$

$E' > 0$       Reserved  
 $E' = 255$

$$\textcircled{1} \quad E' = 0, M \neq 0$$

$N \neq 0$

$$\textcircled{2} \quad E' = 0, M \neq 0$$

Denormalized form

0.0110

$\Rightarrow$

$$-127 \leq E \leq 254 - 127$$

$$\textcircled{3} \quad E' = 255, M \neq 0$$

$N \neq \infty$

$$-126 \leq E \leq 127$$

$$\textcircled{4} \quad E' = 255, M \neq 0$$

$N = \text{Not a Number (NaN)}$

Ex: 0/0,  $\sqrt{-1}$

$$\Rightarrow 4.5 \text{ in IEEE-754 32 bit format.}$$

$= 100.1$   
 $= 1.001 \times 2^2$  (Normalized form)

$$E = 2, E' = E + 127$$

$$= 2 + 127 = 129 = (1000000.0)_2$$

|   |          |                            |
|---|----------|----------------------------|
| 0 | 10000001 | 00100000000000000000000000 |
|   | 8-bits   | 23-bits                    |

$$\Rightarrow 1.1 \times 2^1$$

$$E = -1, E' = 126$$

$$= 111$$

$$= (01111110)_2$$

|   |     |
|---|-----|
| 2 | 126 |
| 2 | 63  |
| 2 | 31  |
| 2 | 15  |
| 2 | 7   |
| 2 | 3   |
| 2 | 1   |
| 2 | 0   |

|   |          |                          |
|---|----------|--------------------------|
| 0 | 01111110 | 100000000000000000000000 |
|---|----------|--------------------------|

$\Rightarrow$  Range of Single Precision format :-

$$E' = 00000001 \quad E' = 1 \quad E = 1 - 127$$

~~E Min~~  
 $M = 00000\ldots0$   
 $= -126$

|   |          |            |
|---|----------|------------|
| 0 | 00000001 | 000...0000 |
|---|----------|------------|

Actual no  $\in \pm 1.0 \times 2^{-128}$

(+ve) overflow

(-ve)  
underflow

(+ve) underflow

(+ve) overflow

## Double precision :-



- 1) 15.375
- 2) 62.1234

|     |         |
|-----|---------|
| 30  | 100     |
| 200 | LOAD AC |
| 201 | 500     |
|     |         |
| 702 | 90      |
| 500 | 30      |
| 900 | 60      |

1) Effective Address = 500

Content of AC = 30

2) Immediate :-

operand = 500.

EA = 201

Content of AC = 500.

3) Indirect :-

BA = M[500]

operand = M[30] = 100.

Content of AC = 100

4) Relative :-

EA = content of PC + address field

$$= 202 + 500$$

$$= 702$$

operand = M[702] = 90.

content of AC = 90.

5) Index :-

EA = content of index reg + address field

$$= 400 + 500$$

$$= 900$$

operand  $M[900] = 60$

content of  $Ac = 60$

6) Register indirect (use  $R_1$ ) :-

$$EA = (R_1)$$

$$= 100$$

operand  $= M[100] = 70$

content of  $Ac = 70$

21/02/18



Information required by CPU is stored in pri. memory.

If more than one software is present in computer & we are using one software currently  $\rightarrow$  the other softwares will be stored in secondary memory as backup.

• Boot Strap Loader :-

1) Loads OS from disk to primary memory

2) Prepare OS for general use

3) Transfer the control to OS.

Loaded in ROM, as initial program is required for starting.

08/03/18

## 1) Direct Mapping

$$a = b \bmod M$$

$a$  = cache block no.  
 $b$  = Main memory (MM) no.  
 $m$  = no. of blocks in cache

## 2) Associative Mapping

Any block from MM can be mapped to any block in cache memory.

$\Rightarrow$  cache size = 2K bytes

$$\text{MM } n = 64 \text{ KB} = 2^{16}$$

$$\text{Block } n = 16 \text{ bytes.}$$

$$= 2^4 \text{ bytes}$$

$$\therefore \text{no. of block in cache} = \frac{2K}{16} = 2^7 = 128$$

① no. of block in CM?

②  $n \cdot m \cdot m = \text{MM}?$

size of MM address bus = 16 bits

## Direct Mapping



[In case of direct mapping, block 0, 128, 256 all will be mapped to cache block 0.]



No. of comparison = 1

Drawback :-

So, previous data is overwritten.

## Associative Mapping :-



Tag      offset



No. of comparisons = all the ~~blocks~~ attached to cache memory  $\times$  no. of blocks in cache

$$= 1 \times 128 = 128$$

## Drawback :-

In terms of performance it is best, but cost will be increased.

## Set associative mapping $\rightarrow$ (k way)

Each set contains k no. of blocks.



$\therefore$  no of sets

$$= \frac{128}{4} = 32 = 2^5$$



$$\Rightarrow \text{cache size} = 32 \text{ KB}$$

$$\text{MM n} = 16 \text{ MB} = 2^4 \times 2^{20} = 2^{24}$$

$$\text{block n} = 512 \text{ bytes} = 2^9$$

Show the partition of MM address into tag, index, offset (direct) & 1 way set associative

$$\therefore \text{no. of cache block} = \frac{32 \times 1024}{2^9}$$

$$= \frac{2^5 \times 2^{10}}{2^9} = 2^6$$

$$= 64 \text{ bits}$$

$$\therefore \text{size of MM address bus} = 24$$

24-bits



Direct mapping

$$\therefore \text{no. of sets} = \frac{64}{4} = 16 = 2^4$$



Set associative

$$\text{Word} \quad 1200 \rightarrow \text{MM Block no.} = \frac{1200}{16} = 75$$

for direct mapping  $\rightarrow$

$$a \geq b \bmod m$$

$$= 75 \bmod 64$$

$$= 11$$

$\therefore$  cache block no. = 11. (Ans)

Consider a 2-way set associative cache with 128 blocks, & block size = 16 bytes. Find out the set which will contain MM address 1500?

Soln: Each set contains 2 no. of blocks.

$$\text{No. of sets} = \frac{128}{2} = 64$$

$$\therefore \text{the set} \rightarrow \text{MM block} = \frac{1500}{16} = 93$$

$$\therefore \text{set no} = 93 \bmod 64$$

$$= 29. (\text{Ans})$$

• Read Miss :-  $1200 \rightarrow$  address of MM

CPU will search data at address 1200. While reading, If the data is found in cache memory  $\rightarrow$

then it is called Read Hit.

• Cache Write :- MM memory address  $\geq 16$ .

CPU ~~will~~ always generate address for MM not for cache memory.

While writing - if the data address is found in cache memory  $\rightarrow$  then it is called Write Hit.

If data address is not found while writing then it is called, Write Miss.

- Write through :-

» While writing operation, MM & cache memory both will be updated.

- » Advantage :-

Here MM & cache memory both contains same data.

- Write back :- (faster) → advantage over write through.

: Only cache memory will be updated.

- ~~Disadvantage~~

When we are updating any location in cache memory, the flag bit is set.



→ If flag bit is 1 while refreshing cache memory, then only MM will be updated.

→ MM will not be updated while refreshing.

- Disadvantage :-

- ⇒ ~~Disadvantage~~ for write back, when one block is updated multiple times in cache memory, then the data in MM & cache memory does not remain same.

- Write allocate :- from MM.

Blocks may be brought into cache memory. After that it will be updated.

(in case of <sup>↑</sup> write miss)

- Write no-allocate & (in case of write miss)  
Blocks is updated in MM only.
- ⇒ Consider a fully associative write back policy.  
Some sequence of memory operation is given ↪

write M[100]

Find out no. of bits →

write M[100]

i) write allocate

Read M[200]

ii) n no ".

write M[200]

write M[100]

### (i) Write allocate ↪

3 bit

Miss, hit, miss, hit, hit

initially  
cache memory  
does not contain  
any data

because CM  
contains only  
address 100

|                |     |
|----------------|-----|
| b <sub>0</sub> | 100 |
| b <sub>1</sub> | 100 |
| b <sub>2</sub> | 200 |
| b <sub>3</sub> | 200 |
| b <sub>4</sub> | 200 |

cache

[address 100 is  
allocated after  
this step]

Now, address  
200 is allocated

### (ii) Write no-allocate &

1 bit

miss, miss, miss, hit, miss

empty (only MM)  
(is updated from read operation - address)

|     |
|-----|
| 200 |
|-----|

19/03/18

- Miss penalty :

$T_{CA} \rightarrow$  Cache Access Time  
 $T_{MS} \rightarrow$  MM Access Time

### Write back

» Read :

$$T_{CA} + T_{MS} + P_d T_{MS}$$

$[P_d = \text{Prob. that flag is set}]$



Avg. memory access time for read  $\rightarrow$

$$T_p = H \times T_{CA} + (1-H) \{ T_{CA} + (1+P_d) T_{MS} \}$$

Read Miss Penalty

» Write :

$$2 T_{CA} + T_{MS} + P_d T_{MS}$$

Avg. memory access time for write  $\rightarrow$

$$T_w = H \times T_{CA} + (1-H) \{ 2 T_{CA} + (1+P_d) T_{MS} \}$$

$\Rightarrow$  cache access time = 90 ns

MM access time = 500 ns,  $P_d = 0.5$

80% memory read, 20% write

hit ratio for write operation,  $HR = 0.9$

$HW = 0.8$

1) Calculate memory access time for read operation for both write back & write through.

2) Avg memory access for both read & write operation for both write back & write through.

Soln:  $P_d = 0.5$ . (Given)

$$\Rightarrow T_R = (0.9 \times 90) + 0.1 \{ 90 + 1.5 \times 500 \} \Rightarrow \text{Write back}$$

Write through :-

$$T_R = (H_R \times T_{CA}) + (1 - H_R)(T_{CA} + T_{MS})$$

$$\Rightarrow (0.9 \times 90) + 0.1(90 + 500) = 140 \text{ ns}$$

• Write back :-

$$T_W = (0.8 \times 90) + (1 - 0.8) \{ 2 \times 90 + 1.5 \times 500 \}$$

$$= 258 \text{ ns}$$

$$\begin{aligned} T_{avg} &= P_R \times T_R + P_W T_W \\ &= 0.8 \times 165 + 0.2 \times 258 \\ &= 183.6 \text{ ns} \end{aligned}$$

Write through :-

$$T_W = T_{MS} = 500$$

$$\begin{aligned} T_{avg} &= 0.8 \times 140 + 0.2 \times 500 \\ &= 212 \end{aligned}$$

$\Rightarrow$  cache access time = 50 ns

MM " " = 500 ns

$P_R = 0.8$ ,  $H_R = 0.9$

wrote through

- 1) Avg. access time for only Memory read operation.
- 2) " " " both read & write "

• Cache Replacement Algorithm :-

For Direct Mapping, no such algorithm is reqd.

|    |    |
|----|----|
| b0 | 4  |
| b1 | 5  |
| b2 | 13 |
| b3 | 11 |
| b4 | 12 |
| b5 | 13 |

cache

- 1) FIFO → replace the block which is accessed for longest time
- 2) LRU → Least Recently Used
- 3) Random → Select one block for replacement

Eg:- 5, 4, 2, 3, 5, 1, 5, 10, 3 → MM blocks.

|   |    |
|---|----|
| 5 | 1  |
| 4 | 5  |
| 2 | 10 |
| 3 | 3  |

1) Calculate no. of bits & no. of rows.

First time when cache memory is empty then always miss occurs. It is called compulsory miss. (for 5, compulsory miss)

i) FIFO →

M, M, M, M, H, M, M, M, H

$$H = 2$$

ii) LRU →

M M M M H M H M H  
H = 3

|      |
|------|
| 5    |
| X 1  |
| 2 10 |
| 3    |

→ Diff. b/w FIFO & LRU.

- Types of cache miss

1) Compulsory



2) Capacity

3) Conflict → direct mapping, → 0, 16, 32 all maps set associative to the same block.

- Techniques to reduce cache misses

1) Increasing cache size

2) n block n

3) Higher associativity

4) Compiler optimization

i) Loop interchange → column wise

```
for(j=0; j<1000; j++)
{
    for(i=0; i<1000; i++)

```



$$a[i][j] = 2 * a[i][j]; \quad 1000 \text{ blocks}$$

}

ii) Loop fusion & Combines 2 or more loops in one.

```
for(i=0; i<100; i++)
{
    for(j=0; j<100; j++)

```

*loop fusion* →  $a[i][j] = b[j][j] + c[i][j],$   
 $a[i][j] = 2 * a[i][j],$

③ Merging arrays : combines 2 separate arrays.  
(that might conflict for a block in cache  
memory).  
into one separate array.

int val[10];  $\rightarrow b_0$

int key[10];  $\rightarrow b_{16}$

struct merge

{ int val;

int key; ✓

} [10];

---

15/03/18

---

- Memory Expansion :

- ▷ Vertical expansion :

increasing no. of memory location or no. of memory word.

$\Rightarrow$  increase the size of address bus.

- ▷ Horizontal expansion :

increasing word length.

" size of data bus.

$\Rightarrow$  How many  $256 \times 8$  RAM chips are needed to provide memory capacity of  $1K \times 16$ ?

no. of chips  $\rightarrow$

$$= \frac{1K \times 16}{256 \times 8} \Rightarrow \textcircled{1} \times \textcircled{2}$$

vertical                          horizontal

- Q) How many lines of address bus must be used to access  $1K \times 16$  memory?  $\Rightarrow \log_2 1024 = 10$
- Q) How many of these lines will be common for all chips?  $\Rightarrow \log_2 256 = 8$   $(10 - 8) = 2 = n$
- Q) Specify the size of decoder.  $(2 \times 4)$  decoder



- Q) How many  $512K \times 8$  RAM chips are needed to provide a memory capacity of  $8M \times 32$ ?

$$\text{no. of chips} = \frac{8 \times 2^{20} \times 2^5}{512 \times 2^{10} \times 2^3} = 16 \times 2^2$$

- Virtual Memory  
It allows a execution of the program that may not be completely in MM (Physical Memory).



$P_1, P_2, P_3$  is currently utilizing by CPU. So it is loaded to Physical Memory (MM).

$P_4, P_5$  is not

- (\*) Address of virtual memory utilized.
  - Virtual Address.
  - are in Binary/Virtual Memory.
- (\*) Address of MM -
- Physical Address.

- (\*) Total virtual memory is divided into some blocks. Each block is called Page.

- (\*) Total physical memory
  - Each block is called frame.



- Memory Management Unit (MMU)  
It maps virtual memory to physical memory.



⇒ Virtual add = 48 bits  
 Phy. n = 36 n  
 Page size = 2 KB

How many virtual pg  
 & phy. frames exist  
 in the system?

9/03/18  
 • Input and Output Device



I/O device is called Peripheral device.

- CPU
- i) electronic
  - ii) rate of data transfer — ~~higher~~ higher
  - iii) data format.

I/O device enters the input as ASCII format.

I/O device

- i) electro-mechanical
- ii) lower.
- iii)



- Isolated I/O vs. memory mapped I/O
- » Isolated I/O



2 - control lines →

- » Memory Read      LOAD 2000
- » n Write

- » Separate instruction will be there for memory operation & I/O operation.

## Memory Mapped I/O



### Isolated I/O

- 1) different control line
- 2) ~~diff~~ instructions will be used for MM & I/O device.
- 3) diff address space.

### Memory mapped I/O

- 1) same control line.
- 2) same instruction.
- 3) common address space.



\* In the control Line, CPU specifies the I/O command

- 1) control → (memory read/write)
- 2) Status → checks if I/O device is ready or not. (for data transfer)
- 3) data input → if I/O device <sup>is ready</sup> then CPU place activate the data - input line.



21/6/3/18

## • Modes of Data Transfer :

- 1) Programmed I/O
- 2) Interrupt initiated I/O
- 3) Direct Memory Access (DMA)

### ① Programmed I/O :

#### » Handshaking Protocol :



→ S places data in data bus. It enables data valid line. When destination accepts the data valid line it ~~acknowledges~~ enables data accepted line (by D). Disables data valid line.  
 " " accepted line.



- ⇒ f=1 → Data present in Data register
- ⇒ f=0 → I/O device is not ready.

if ( $F = 1$ )

CPU accepts data from data reg.

Then data accepted line is disabled by interface.

### » Drawback :-

CPU has to continuously check flag bit whether  $F = 0$  or  $F > 1$ . So, normal execution of CPU operation will be halted.

### ② Interrupt initiated I/O :-

To overcome the drawback of programmed I/O.

» CPU initiates I/O operation by placing address of I/O device in address bus.

A program is a set of instructions. Each instruction access memory address. Suppose at memory location 1000  $\rightarrow$  add(a, b) is a instruction  $\rightarrow$  interruption occurs (means it gives signal to CPU that I/O device is ready for data transfer).



ISR (Interrupt Service Routine)

This program contains what to do when interruption occurs.

» How CPU gets interrupt service routine :-

» Priority interrupt  
 Suppose at the same time interruption occurs of two or more I/O device.



Here, I/O device with the highest priority will be serviced first.

- » Identify highest priority → 2 ways →  
 1) using programming → Polling → software technique  
 2) " hardware technique "



it is a program. It checks all I/O device & checks whether interrupt signal of the I/O device is set or not. It checks all I/O devices sequentially.

VAD → Vector Address

CPU accepts the interrupt by enabling Interrupt acknowledgement line.

$$P_i = 1$$

$$P_o = 0$$

} block the signal to next device.

→  $P_o = 1 \rightarrow$  means  $I/O - 1$  passes the signal to  $I/O - 2$ .

Prog. I/O and ~~int~~ interrupt initiated I/O requires CPU interaction.

But in case of DMA direct transfer of data from I/O device & memory.

19/08/18

• DMA controller :-



CPU place.

1) Starting Address of block of data to be read or written in memory.



Address reg.

Word count reg.

control reg.

Address register  
(present in)  
DMAc

2) no. of words to be transferred (eg: 10 words)

10 memory  
addresses

3) control to specify read/write operation.

4) n n start the DMA operation.

BR → Bus request

BG → Bus Grant

Advantage :- No CPU interaction req'd., DMA can control transfer data directly b/w I/O & MM.

» 2 types of Transfer :-

1) Burst Transfer :- (faster).

Transfer block of data at a time.

(consists of memory location).

2) Cycle Stealing :-

One word is transferred at a time.

• Pipeline :-



Answer Scripts

Examiner

(E<sub>1</sub>) E<sub>2</sub> E<sub>3</sub>

| time | E <sub>1</sub> | E <sub>2</sub> | E <sub>3</sub>     |
|------|----------------|----------------|--------------------|
| 1    | A <sub>1</sub> | -              | -                  |
| 2    | A <sub>2</sub> | A <sub>1</sub> | -                  |
| 3    | A <sub>3</sub> | A <sub>2</sub> | A <sub>1</sub> → 3 |
| 4    | A <sub>1</sub> | A <sub>3</sub> | A <sub>2</sub> → 4 |

But for only E<sub>1</sub>

A<sub>1</sub> → 3 time

A<sub>2</sub> → 3 + 3 = 6 time

A<sub>3</sub> → 9

A<sub>4</sub> → 12 ... A<sub>10</sub> → 30



(faster)

A<sub>1</sub> → 3

A<sub>2</sub> → 4

A<sub>3</sub> → 5

A<sub>4</sub> → 6

A<sub>5</sub> → 7

A<sub>10</sub> → 12

For pipeline,  
total time unit is  
reduced to check to

answer script

with pipeline

$$= \frac{30}{12}$$

no. of tasks =  $n$  (10)

$n$  segments =  $k$  (3)  $\rightarrow$  examiner

clock period =  $t_p$  (1)

Time reqd without pipeline =  $n k t_p$

$$\therefore \text{speed up} = \frac{n k t_p}{k t_p + (n-1) t_p}$$

$$= \boxed{\frac{n k}{k+n-1}} \approx \frac{n k}{n} = \boxed{k} \left[ \begin{array}{l} \text{when } n \rightarrow \infty \\ n \gg k-1 \end{array} \right]$$

$\Rightarrow t_p = 20 \text{ ms}$ ,  $k = 4$  -,  $n = 100$  tasks.

speed up = ?

$$\text{speed up} = \frac{100 \times 4}{4 + 100 - 1} = \frac{400}{103} = 3.88 \approx 4$$

26/03/18

### Data Pipeline :-

$$A_i * B_i + C_i$$

$i = 1, 2, \dots, 7$

$$i=1, A_1 * B_1 + C_1$$

$$i=2, A_2 * B_2 + C_2$$

$$i=7, A_7 * B_7 + C_7$$

Segment - 1 , S1  $\rightarrow$  1st sub-operation

$$S_1 : R_1 \leftarrow A_i, R_2 \leftarrow B_i;$$

$$S_2 : R_3 \leftarrow R_1 * R_2, R_4 \leftarrow C_i$$

$$S_3 : R_5 \leftarrow R_3 + R_4$$



- Instruction Execution Cycle :- (Instruction Pipeline)

- 1) Fetch the instruction (FI)
- 2) Decode n n (DE)
- 3) fetch n operands (OF)
- 4) Execute n instruction (EX)



$$I_1 \rightarrow 4, I_2 \rightarrow 4+4, I_3 \rightarrow 4+4+4, I_7 \rightarrow 4+4+4+4 \\ = 16$$

(without pipeline)

$$\text{Speed up} = \frac{16}{7} = 2.25$$

$\Rightarrow$  No. of instruction,  $n = 50$ ,  
 $K = 4$ .

$$\text{Soln: speed up} = \frac{nk}{n+k-1} = \frac{200}{53} \approx 4$$

$$\text{Maxm speed up} = K = 3.$$

- Pipeline hazards

3 types  $\rightarrow$

1) Structural hazards (Resource conflict)

2) Data

3) Control

(when more than 1 instruction use the same resource at same time.)

$\rightarrow$  To resolve  $\rightarrow$  (use diff block)

[Data memory]

[Instruction memory]

- Harvard Architecture  $\rightarrow$  (to resolve str. hazard)



28/03/48

### Von - Neumann

- 1) Only one memory.
- 2) one bus.  
(1 set of bus)
- 3) slower.



### Harvard

- 1) Instruction Memory, Data Memory
- 2) 2 buses. (2 set of bus)
- 3) faster.



### Data Hazard :-

#### Instructions

- ↓  
I :  
J :

- » 3 types of data hazard :-

#### i. Read After Write Hazard (RAW) :-

$$I : R_1 \leftarrow R_2 + R_3 \quad R_2 \xrightarrow{10} \quad R_3 \xrightarrow{5}$$

$$J : R_4 \leftarrow R_1 * A$$

Old value of  $R_1$   
= 200

|   | 1          | 2          | 3           | 4  |
|---|------------|------------|-------------|----|
| I | FI         | DE         | OF          | EX |
|   | $R_2 = 10$ | $R_3 = 15$ |             |    |
| J | FI         | DE         | $R_1 = 200$ | EX |

$$R_1 = 800 \rightarrow \text{Wrong answer}$$

first I writes some variables then J reads the variables.

#### Solution :-

- 1) Delayed Load.

(right answer will come)

|   | 1          | 2          | 3  | 4  |
|---|------------|------------|----|----|
| I | FI         | DE         | OF | EX |
|   | $R_2 = 10$ | $R_1 = 15$ |    |    |

  

|   | 1         | 2  | 3  | 4  |
|---|-----------|----|----|----|
| J | FI        | DE | OF | EX |
|   | $R_3 = 5$ |    |    |    |

NOP  $\Rightarrow$  Increase the clock frequency

2) Compiler Optimization :- Compiler detects data hazard, after that it rearrange the instruction sequence.



3) Operand Forwarding :- hardware detect the conflict.

$$\begin{array}{l}
 \text{I: } R_4 \leftarrow R_2 + R_3 \\
 \text{J: } R_4 \leftarrow R_4 * A
 \end{array}$$



(ii) Write After Read Hazard (WAR) :-

$$\text{I: } R_4 \leftarrow R_2 + R_3$$

[ 1st Read then write  $\rightarrow$  no problem (no hazard) ]

$$\text{J: } R_2 \leftarrow R_5 * 3$$

But if write occurs before read then hazard occurs. ]

(iii) Write After Write Hazard (WAWH) :-

$$\text{I: } R_4 \leftarrow R_2 + R_3$$

[ First I updates  $R_4$  then J in  $R_4$ . ]

$$\text{J: } R_4 \leftarrow R_4 - R_5$$

But if J updates  $R_4$  first before I updates  $R_4$  then hazard occurs. ]

(iv) No data hazard for PAR

- Control hazard & occurs in case of branch instruction



2 types of branch instructions →

- 1) Unconditional branch
- 2) Conditional branch



### » Solution

- 1) Delayed branch ( $6 - 3 = 3$  CLK pulse delay).
- 2) Branch Target Buffer



### 3) Branch Prediction :-

JNC 3000 < P >

for ( $i \geq 0$ ;  $i < 100$ ;  $i++$ )

{      $c \leftarrow A + B$  ;      $\Rightarrow$  Always branch prediction is  
    }     correct except  $i = 150$ .

$\downarrow$   
miss prediction  
occurs.

$P=1 \rightarrow$  goto 3000  
 $P=0 \rightarrow$  next instruction