

## DESIGNING AN ALU:-

⇒ So, from where are you start designing an ALU, from multiplication or division?

No.

From addition (or adder)

No

So, From addition (or adder)

OK

⇒ Before that you have to understand how to represent numbers if I say multiply  $8 \times 9$  using Booth's Algorithm

You may take 1001 as 9, and your multiplication will go wrong (01001) The issue is how to store -ve nos.

Signed

⇒ So, first we have to know how to represent a number.

⇒ There are 3 proposed ways to represent a signed number.

1) Signed magnitude

2) 1's complement

3) 2's complement

First tell me do you want to design your ALU only for +ve numbers or Signed nos.? OFCOURSE SIGNED Numbers

I am going to explain why your ~~-ve~~ nos are stored in 2's complement form.

| Decimal | Sign Magnitude | 1's complement | 2's complement |
|---------|----------------|----------------|----------------|
| 3       | 0 1 1          | 0 1 1          | 0 1 1          |
| 2       | 0 1 0          | 0 1 0          | 0 1 0          |
| 1       | 0 0 1          | 0 0 1          | 0 0 1          |
| 0       | 0 0 0          | 0 0 0          | 0 0 0          |
| <hr/>   |                |                |                |
| -1      | 1 0 1          | 1 1 0          | 1 1 1          |
| -2      | 1 1 0          | 1 0 1          | 1 1 0          |
| -3      | 1 1 1          | 1 0 0          | 1 0 1          |

# As you can observe in all ~~six~~ signed representation +ve numbers are stored in same way.  
because There is no issues in storing +ve numbers.  
However, when we write -ve number on paper they can be represented by a '-' sign. but '-' minus cannot be stored in memory.  
So, we require some way to store -ve numbers.

# In sign magnitude, your MSB for +ve numbers is zero. For -ve numbers, you just make your MSB one.

# Now, out of 8 combinations, which combination is left.

100 this number represents -0.

But there is nothing like -zero.

So, this representation has one ~~rubbish~~ value.

This is the reason why signed magnitude representation failed.

So, if you add 2 numbers  $\begin{array}{r} 011 \\ 001 \\ \hline 100 \end{array}$  Can you guarantee that this number is  $0+4$  or  $-0$ ?  
No,  $\therefore$  this system fails.

# Now, as signed magnitude representation failed let's discuss another ~~one~~ alternate representation.

+ve nos. will again be represented by having 0 as its MSB

For -ve nos. don't just store MSB as 1,

take 1's complement of that number.

What this representation says if you have a 3-bit number  
look at its MSB  $-|--$  if MSB is zero no. is +ve  
take it as it is  
if MSB is one no. is -ve  
if you want to know what no. it is  
don't take it on face value, take  
its one's complement.

Ex:- So we have this no. 011 stored in memory.

as MSB = 0 no. is +ve take it as its

we have another no. 100

as MSB = 1 no. is -ve, now don't take it as face value.

find its one's complement to understand its value.

1's complement of

100 is 011 this no. is -3

Now consider another no. ~~000~~ 111

as MSB = 1 no. is -ve, to find its actual value  
find its 1's complement

1's complement of 111 is 000.

Wrong. there can be no no. ~~000~~ -0.

So, this system also fails.

Ex:- we are adding two numbers

$$\begin{array}{r} \cancel{0} \ 1 \ 1 \\ \cancel{0} \ 0 \ 0 \\ \hline 1 \ 1 \ 1 \end{array}$$

now once this no. is stored

in memory after some time when i will retrieve

it should it represent +7 or -0?

+7 of course. This system fails.

Now, let's consider another number system. 2's complement

F Now, what is 2's complement  $\Rightarrow$  1's complement  
+ (plus)  
one

There is a shortcut to use 1's complement that works in every case.

$\Rightarrow$  From R.H.S. (right hand side) copy the no. as it is till 1st 1 is encountered after that just invert the digits

$\Rightarrow$  So, how 000 will be represented in 2's complement.

You will get  $\boxed{000}$

So, now you have a single representation for zero (and same)  
in both original and its 2's complement form.

So, whenever 000 is stored in memory and read out after some time it will still represent zero.

---

Therefore, hence proved negative nos are stored in its 2's complement form.

---

Q.



101 is this +5 NO,  
MSB is 1  
0101 is +5  
00101 is -5

So, if someone ask  
you to multiply  
+5s using Booth's  
Algorithm and you use  
101 as +5, you will  
get incorrect answer

## Designing An ALU (Adder circuit):-

$$\begin{array}{r} 3 \quad 0011 \\ +2 \quad 0010 \\ \hline 5 \quad 0101 \end{array}$$

To perform this addition I require how many bit adder?  
4-bit Adder.

Now, if I had to perform addition of two 8 bit nos. 100, 125  
what would I do?

I have to design an 8-bit adder.

Now, Is the logic of 4-bit addition different from 8-bit addition

No,

So, will the logic of 1-bit addition be different from 4-bit addition?

Yes / No Yes (for half adder)

No (for full adder)

No.

So, to understand and design <sup>'6'</sup> bits adder I should first understand how to ~~add~~ design 1-bit adder.

## # Half-adder:

Now, consider the following circuit.



Now, when will my sum be 1,

Q. when both of my inputs  $X$  &  $Y$  ~~= 0~~ are zero?

No,  $0+0=0$

Q. So, when both of my inputs  $X$  &  $Y$  are one?

No.  $1+1=0$  Sum  
Carry

Q. So, then when will my sum becomes ~~zero~~ <sup>one</sup>?

when exactly one of my input is one.

i.e. ~~both~~

My sum  $Z=1$ , if  $X \cdot \bar{Y} + \bar{X} \cdot Y$

$\hookrightarrow X$  is one AND  $Y$  is zero.

OR

$Y$  is one AND  $X$  is zero.

This expression  $Z = X \cdot \bar{Y} + \bar{X} \cdot Y$

is my XOR gate.

It can also be represented by a circuit.



SIMILARLY,

## HALF-ADDER (continued)

Similarly,

Q. When will my carry out bit will be 1

Q. When both of my inputs are zero?

No.

Q. When ~~X=0, Y=1~~ one of my input is zero?

No.

Q. Exactly, My carry <sup>out bit</sup> will only be one  
When both of my inputs are ~~zero~~. one

$$\text{i.e., } C_{\text{out}} = X \cdot Y$$

X AND Y.



$\therefore$  either my adder can be represented as

or



Now, So pens down? done and dusted? Finished learned A, B, C, D, E, F, G?

No, The task is not finished, let me tell you why?

What was my task my task was to design a 1-bit adder that can be used to perform '6'-bit addition in combination with other adders.

Now, take example

$$\begin{array}{r} 3 \\ + 2 \\ \hline 5 \end{array}$$



But this! This doesn't work with half-adder  
This also works with half-adder

This works with half-adder

It work with half-adder as I have no carry input that can accept carry from previous Stage/step.

So, What should I do? If I want to add two number along with carry inputs?

I will design a new adder called as FULL ADDER



Now, tell me when will my sum be 1?

Q. When ~~None~~ <sup>Both</sup> of the inputs are 1  
OR  
When all the inputs are 0.  
No.

Q. When Exactly one of the inputs is 1?

Yes.  $X \cdot \bar{Y} \cdot \bar{C}_{in} + \bar{X} \cdot Y \cdot \bar{C}_{in} + \bar{X} \cdot \bar{Y} \cdot C_{in}$

Q. When X is 1 AND Y is zero as well as Cin is zero.  
When X is 0 AND Y is one AND Cin is zero.  
When X is 0 AND Y is also zero AND Cin is one.

Q. Now tell me when else my sum will be 1?

Q. When Exactly 2 inputs are 1

No, at that time  $1+1=0$  (so sum will be zero carry will be 1).

## Full adder (Continued)

Q. So, when else my sum will be 1?

When all 3 of my inputs are 1?

Yes,

$$\begin{aligned} Z &= XY \bar{C}_{in} + X\bar{Y}\bar{C}_{in} + \bar{X}Y\bar{C}_{in} + \bar{X}\bar{Y}C_{in} \\ &= X(Y\bar{C}_{in} + \bar{Y}\bar{C}_{in}) + \bar{X}(\bar{Y}\bar{C}_{in} + Y\bar{C}_{in}) \end{aligned}$$

Similarly, let just consider carry out.  $= X(\bar{Y} \oplus \bar{C}_{in}) + \bar{X}(Y \oplus C_{in})$

Q. When will my carry out will be 1?

Q. When all of my inputs are 0?

No,

Q. When exactly one of my inputs is 1?

No.

Q. When exactly two of my inputs are 1?

Yes.

$$So, XY\bar{C}_{in} + X\bar{Y}C_{in} + \bar{X}YC_{in}.$$

Q. What happens when all of my 3 bit are 1?

Yes, it will also generate carry out.

$$So, my C_{out} = XYC_{in} + X\bar{Y}\bar{C}_{in} + \bar{X}YC_{in} + \bar{X}\bar{Y}C_{in}$$

Now do a simple Boolean algebra.

$$C_{out} = XY(C_{in} + \bar{C}_{in}).$$

$$= XY + X\bar{Y}C_{in} + \bar{X}YC_{in}$$

$$= XY + C_{in}(X\bar{Y} + \bar{X}Y)$$

$$= XY + C_{in}(X \oplus Y).$$

$$A + \bar{A} = 1$$

So, going through previous equation.

$X \oplus Y$  is what?

Sum of half-adder.

$X \cdot Y$  is what?

Carry out of half-adder.



## # Serial Adder / Ripple carry adder / Serial adder

So, we have seen how to add two 1-bit adder numbers,

Now, we will see how to make 4-bit adder using two-bit adder.



Inputs

$x_3 \ x_2 \ x_1 \ x_0$

$y_3 \ y_2 \ y_1 \ y_0$

$c_{im}$

Outputs

$z_3 \ z_2 \ z_1 \ z_0$

Carry.

// This is called a slow adder

// The main drawback is that I have to calculate  $C_0$  before I can use Full Adder 1 to generate correct output.

Similarly, I have to wait for  $C_1$  before I can use Full Adder 2 therefore, it is slow adder.

// But, I want fast adder. I cannot afford slow adder.

If I have 8-bit adder it will require 8 clock cycles

————— 16-bit ————— 16 clock cycles.

So, I want fast adders.

// Now, for that I require some special mechanism that can generate all the carry simultaneously.

In other words, some circuit that can predict these carries simultaneously -

# predict can also be termed as "looking ahead into the future".  
i.e. I have to design a Carry look ahead adder.

# So, Now the question is how a circuit can predict its carry.

# lets look at the inputs that I have

$$x_3 \ x_2 \ x_1 \ x_0$$

$$y_3 \ y_2 \ y_1 \ y_0$$

Cin

To predict  $C_0$ , what do I require.

$$C_0 = x_0 \cdot y_0 + g_0 \cdot \text{Cin} + \text{Cin} \cdot p_0$$

$$C_0 = x_0 \cdot y_0 + \text{Cin} \cdot (x_0 + y_0)$$

$$C_0 = g_0 + \text{Cin} \cdot p_0$$

generate      propagate.

$$C_1 = x_1 \cdot y_1 + g_1 \cdot C_0 + C_0 \cdot p_1$$

$$= x_1 \cdot y_1 + C_0 (x_1 + y_1)$$

$$C_1 = g_1 + C_0 \cdot p_1$$

$$= g_1 + P_1 (g_0 + \text{Cin} \cdot p_0)$$

$$= g_1 + P_1 g_0 + P_0 P_1 \cdot \text{Cin}$$

The ultimate conclusion is we can get  $C_0, C_1, C_2, C_3$  directly using our inputs.

$$x_0, y_1, x_1, y_2,$$

Y0, Y1, Y2, Y3, and Cin, so we can predict our carry before hand using a specialized circuit termed as carry look ahead generator.

$$C_2 = x_2 \cdot y_2 + g_2 \cdot C_1 + C_1 \cdot p_2$$

$$= x_2 \cdot y_2 + C_1 (x_2 + y_2)$$

$$C_2 = g_2 + C_1 \cdot p_2$$

$$= g_2 + P_2 (g_1 + C_0 \cdot p_1)$$

$$= g_2 + P_2 g_1 + C_0 \cdot P_1$$

# FAST SUBTRACTOR

4-bit subtractor using 4-bit Fast Adder.



Now what I have to do is, I have to add create a 4-bit subtractor.  
So, what is subtraction?

It is addition of 2's complemented no.

because 2's complement of any positive no. is its -ve no.

e.g.:  ~~$\frac{9}{-5}$~~   $\Rightarrow$  +ve 9  
 ~~$-5$~~   $\Rightarrow$  2's complement of 5 ] Add them.

Now, how can we get 2's complement of a no.?

It is 1's complement Plus 1.

So, all that I have to do is add a circuit to my 4-bit adder that can generate 2's complement of one of its input while other input remains same.

i.e.



| XOR |   |    |
|-----|---|----|
| A   | B | Op |
| 0   | 0 | 0  |
| 0   | 1 | 1  |
| 1   | 0 | 1  |
| 1   | 1 | 0  |

when one input is 1, XOR will act like buffer.

when one input is 0, XOR will invert.

to the circuit.   
 when  $S_0$  is 1,  $Y_0$  is the input and  $C_{in}$  is added.   
 if  $S_0$  is 0,  $Y_0$  is also added.   
 simultaneously.   
 $C_{in} = 0$ .   
 $S_{addition} = X + Y$ .

## # Multiplication using Booth's Algorithm

S (Multiplicand)  
 $\times 3$  (Multiplier)  
15 (Product)

$$\begin{array}{r}
 0101 \\
 -0011 \\
 \hline
 0101 \\
 \cancel{0101} \\
 \hline
 0000xx \\
 0000xxx \\
 \hline
 0001111
 \end{array}$$

This method is faster than repeated addition.

$$\begin{array}{r}
 15 \\
 \times 15 \\
 \hline
 225
 \end{array}$$

$$\begin{array}{r}
 15 \\
 + 15 \text{ (15 times addition)} \\
 \hline
 225
 \end{array}$$

Our method requires only 4 steps.

$$\begin{array}{r}
 255 \\
 \times 255 \\
 \hline
 225
 \end{array}$$

$$\begin{array}{r}
 255 \\
 + 255 \text{ (255 times)} \\
 \hline
 225
 \end{array}$$

Now, what kind of circuit do you need to implement this? Adder.

Accumulator

$$\begin{array}{r}
 \boxed{0000 \quad 0000} \\
 0101 \\
 \hline
 \text{Right shift} \\
 \hline
 \cdot 0101000 \\
 \hline
 01011000 \\
 01111000 \\
 \hline
 15 \quad 00111100 \\
 \hline
 0000 \\
 00111100 \\
 00011110 \\
 \hline
 0000 \\
 00011100 \\
 \hline
 000011111
 \end{array}$$



## Multiplication using Booth's Algorithm

⇒ What are you gonna do if I ask you to multiply two 4-bit numbers using adders only? Subtractor ONLY?

Sol ⇒ Let say  $5 \times 3$  what will you do?

Repeatedly add 5 3 times OR

Repeatedly add 3 5 times?

If yes, how much time does it takes 5 cycles, 3 cycles.

OR 15 cycles!

Largest 4-value that can be stored in 4-bit no. is 15.

$15 \times 15$  will require 15 repeated additions.

So, if ~~negate~~ 1 addition requires 1 cycle

15 additions will require 15 cycles.

This was for 4-bit no.

For 8-bit no. it might take 2 to 4 cycles.

For 16-bit no. it ~~—~~, ~~—~~ 6 to 8 cycles.

which is a very huge amount of time.

So, there must be a faster way to do this multiplication operation.

Hence, comes into picture your Booth's Algorithm.

Now, before starting booth's Algorithm you have to recall basic multiplication.

So, let take an example.

$$\begin{array}{r}
 5 \times 3 \quad 0101 \\
 \quad \quad \quad 0011 \\
 \hline
 \quad \quad \quad 0101 \\
 \quad \quad \quad 0101 \times \\
 \quad \quad \quad 0 \quad 0 \quad 0 \quad X \quad X \\
 \quad \quad \quad 0 \quad 0 \quad 0 \quad 0 \quad X \quad X \quad X \\
 \hline
 \quad \quad \quad 0 \quad 0 \quad 0 \quad 1 \quad 1 \quad 1 \quad 1
 \end{array}$$

What kind of circuit do you require to implement this circuit?

Addition and Shift Addition  
Addition and shift & —  
Shift & —  
Shift & —

We don't need multiplication here for obtaining the ~~no.~~ multiplication.

$\Rightarrow$  No. of ~~bits~~<sup>Steps</sup> = No. of bits in the ~~multiplier~~<sup>number</sup> - 1 =  $S$  (Step no.) bit.

$\Rightarrow$  At each step, examine the multiplier's bits from LSB.

if 1, take that no. as its

if 0, take all 0's

Shift the no. by  $\frac{\text{Total steps}}{\text{Step no.}}$  positions.

Add it to value obtained till previous step.

$\Rightarrow$  You keep on addition new value to previous step's value.

(In other words "you keep on accumulating the results")

$\Rightarrow$  The <sup>minimum</sup> size of the accumulator register is size of multiplicand + plus size of multiplier.

Now, let's see how this calculation is performed using the circuit?

$\Rightarrow$  Initially, your circuit Accumulator has value 00000000 (All zeros)

Accumulator

0000 0000

At step 1:-

As multiplier's 1st bit from LSB is 1  $\Rightarrow$  add 0101 to

0000 0000  
0101

0101 0000

Right Shift 0010 1000

At step 2:- 2nd bit from LSB

As multiplier's 2nd bit from LSB is 1  $\Rightarrow$  add 0101 to

0101 1000  
0101

0111 1000

Right Shift 0011 1000

At step 3:- As 3rd bit from LSB is 0  $\Rightarrow$  add 0000 to

Add 0000 is not required so directly shift Add 0000 to directly shift

At step 4:- As 4th bit from LSB is 0

0001 1110

0000



The generic multiplication fails when trying to multiply two a-terms

$$\begin{array}{r}
 \underline{\text{Cn}} - S & 0101 \\
 X-3 & \underline{1101} \\
 \hline
 & 0101 \\
 & 0000x \\
 & 0101xx \\
 \hline
 & \underline{1000001}
 \end{array}$$

this is in 2's complement form  
Now check what no. it is?

0·11111

$\therefore$  no. is -63. Wrong.

Why the answer comes out to be -ve? as we have considered these position as '0' (i.e. +ve by default)

∴ let's take an example,  
what Booth's Algorithm does well is it takes MSB and  
extends the no.  
by copying  
MSB.

Let's look at the tabular method of solving the algorithm

- 1) The no. of steps = no. of bits in the multiplier  
2) At every step examine the last 2 bits near the LSB of the multiplier  
(including imaginary position)

if bits are 00, subtract the multiplicand (add 2's complement of multiplicand)  
if bits are 01, add the multiplicand (as it is)  
if bits are 10 or 11 } Do Nothing.

3) Perform Right shift

4) Repeat steps 2 and 3 till it reaches the no. of steps.

Another way to write this Algo.

1) — " —

2) — " —

if transition is from 0 to 1, subtract the multiplicand  
(i.e. add 2's complement of the multiplicand)

if transition is from 1 to 0, add the multiplicand  
as it is

if <sup>No</sup> ~~trans~~ transition i.e. 1 to 1 or 0 to 0 } Do nothing

3) — " —

4) — " —

# TABULAR METHOD OF Booth's Algorithm for signed multiplication

Ex:-  $S \times 7$        $S = 0101$        $7 = 0111$   
 $-S = 1011$        $-7 = 1001$

| 7                               | A(0)        | Q(7)         | Imaginary | M(S)         |
|---------------------------------|-------------|--------------|-----------|--------------|
|                                 | ACCUMULATOR | Multiplicand | Q1        | Multiplicand |
| Transition 0 to 1<br>SUBTRACT M | 0000        | 0111         | 0         | 0101         |
|                                 | 1011        |              |           |              |
|                                 | 1011        | 0111         |           |              |
| RS                              | 1101        | 1011         | 1         |              |
| Transition 1 to 1<br>DO NOTHING |             |              |           |              |
| RS                              | 1110        | 1101         | 1         |              |
| Transition 1 to 1<br>DO NOTHING |             |              |           |              |
| RS                              | 1111        | 0110         | 1         |              |
| Transition 1 to 0<br>ADDM       | 0101        |              |           |              |
|                                 | 0100        | 0110         |           |              |
| RS                              | 0010        | 0011         | 0         |              |

This no. is +ve  $\therefore$  answer is  $35$  ( $2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0$ )  
 $0\ 0\ 1\ 0\ 0\ 0\ 1\ 1$

NOTE:- Things to remember to avoid mistakes.

- $\Rightarrow$  ① Write the nos./bits one below the other to avoid errors during shifting.
- $\Rightarrow$  ② Take both multiplier and multiplicand of same size (i.e., if multiplier is 4 bits and multiplicand is 5-bits take both nos. as 5-bits to avoid making errors)
- $\Rightarrow$  ③ Take both multiplier and multiplicand one bit more than they actually are example take  $5$  as  $0101$  instead of  $101$  this helps avoiding mistakes while taking 2's complement of these nos.

~~X1-2:-~~

~~SX-7~~

$$5 = 0101$$

$$-5 = 1011$$

$$7 = 0111$$

$$-7 = 1001$$

|                                                 | A (0)<br>ACCUMULATOR | Q (-7)<br>MULTIPLIER | Q-1 | M (5)<br>MULTIPLICAND |
|-------------------------------------------------|----------------------|----------------------|-----|-----------------------|
| Transition is<br>0 to 1                         | 0000                 | 1001                 | 0   | 0101                  |
| RS                                              | 1011                 | 1001                 |     |                       |
| Transition is from 1 to 0<br>$\therefore$ Add M | 1011                 | 1100                 | 1   |                       |
| RS                                              | 1101                 | 1100                 |     |                       |
| → No transition<br>$\therefore$ Doing nothing   | 0010                 | 1100                 |     |                       |
| RS                                              | 0001                 | 0110                 | 0   |                       |
| Transition from<br>0 to 1                       | 0000                 | 1011                 | 0   |                       |
| SUB. M                                          | 1011                 | 1011                 |     |                       |
| RS                                              | 1101                 | 1101                 | 1   |                       |
|                                                 |                      |                      |     | Result                |

To check its value i will take  
2's complement of this no.

It will comes out to be.

00100011

$\therefore$  no. is  $\Rightarrow -35$

88

Q:-  $S \times -3$ .

$S$  (multiplicand)  
 $-3$  (multiplier)

$5 = 0101$        $3 = 0011$   
 $-5 = 1011$        $-3 = 1101$

|                           | A (0)<br>ACCUMULATOR                     | Q (-3)<br>MULTIPLIER | $Q^{-1}$ | M (5)<br>MULTIPLICAND |
|---------------------------|------------------------------------------|----------------------|----------|-----------------------|
| Transition after<br>SUB M | $0\ 0\ 0\ 0$<br>$\underline{1\ 0\ 1\ 1}$ | $1\ 1\ 0\ 1$         | 0        | $0\ 1\ 0\ 1$          |
| RS.                       | $1\ 0\ 1\ 1$                             | $1\ 1\ 0\ 1$         |          |                       |
| Transition into<br>ADD M  | $1\ 1\ 0\ 1$<br>$\underline{0\ 1\ 0\ 1}$ | $1\ 1\ 1\ 0$         | 1        |                       |
| RS                        | $0\ 0\ 1\ 0$                             | $1\ 1\ 1\ 0$         |          |                       |
| Transition into<br>SUB M. | $0\ 0\ 0\ 1$                             | $0\ 1\ 1\ 1$         |          |                       |
| RS.                       | $1\ 0\ 1\ 1$<br>$\underline{1\ 1\ 0\ 0}$ | $0\ 1\ 1\ 1$         |          |                       |
| No Trans.                 | $1\ 1\ 1\ 0$                             | $0\ 0\ 1\ 1$         |          |                       |
| RS                        | $1\ 1\ 1\ 1$                             | $0\ 0\ 0\ 1$         |          |                       |
|                           |                                          |                      |          | Result.               |

2's complement

0 0 0 0 1 1 1

∴ my answer is  $-15$ . correct done

$$\begin{array}{r}
 S \times \quad 0101 \\
 -3 \quad \underline{-1101} \\
 \hline
 \end{array}
 \quad
 \begin{array}{r}
 0000 \quad 0101 \quad 0 \\
 \hline
 \end{array}
 \quad
 \begin{array}{r}
 1101 \\
 \hline
 \end{array}$$

If transition is from 0 to 1, subtract  $\Rightarrow$  Add 2's complement of 0101 i.e. 1011  
 If transition is from 1 to 0, Add.

$$\begin{array}{r}
 0000 \quad 0101 \\
 \underline{\# 1011} \\
 \hline
 1011 \quad 0101 \\
 \hline
 1101 \quad 1010 \quad 1
 \end{array}$$

Now, transition is from 1 to 0  $\Rightarrow$  Add. 0101

$$\begin{array}{r}
 0101 \\
 \hline
 0010 \quad 1010 \\
 \hline
 0001 \quad 0101 \quad 0
 \end{array}$$

Now, transition is from 0 to 1  $\Rightarrow$  subtract  $\Rightarrow$  Add 1011

$$\begin{array}{r}
 1011 \\
 \hline
 1100 \quad 0101 \\
 \hline
 1110 \quad 0010 \quad 1
 \end{array}$$

Now, transition is from 1 to 0  $\Rightarrow$  Add 0101

$$\begin{array}{r}
 0101 \\
 \hline
 0011 \quad 0010 \\
 \hline
 0001 \quad 1001 \quad 0
 \end{array}$$

This is +ve no.

## ~~Repeat~~ DESIGNING AN ALU (DIVISION)

Example:-

$$4 \overline{)64} \quad |1$$

Case 1.

$$\begin{array}{r} -4 \\ +2 \\ \hline \end{array}$$

So, our division is successful.

Case 2:  $8 \overline{)64} \quad |$

$$\begin{array}{r} -8 \\ -2 \\ \hline \end{array}$$

As our subtraction gives -ve no.  
So, our step is unsuccessful.

So, Now we have to get back

6, ~~for that~~ (This is why  
~~this division~~ method  
is called  
restoring division).

- So, how can we get 6 back
- $\Rightarrow$  You have subtracted 8,
- So, to restore back add 8.

There are two types of division Algorithms available to us.

- Restoring Division AND
- Non-Restoring division

We, will discuss the restoring division first

Algo.

1 The total no. of steps = No. of digits in the dividend.

2 At each step Left shift the dividend

↳ Make an attempt to subtract divisor

↳ if result is +ve, step is successful

↳ if result is -ve, step is not successful

↳ if step is unsuccessful restore  
the dividend by adding divisor to the  
current value ( ~~make quotient 0~~ )  
(append 0 to the quotient)

Append 1 to  
Quotient if  
Non-restoration  
is required

3

Repeat step 2 till ~~all~~ the total no. of steps are reached.

## # Designing of An ALU (DIVISION) :-

Q So, why just like repeated addition, for performing multiplication,  
why can't I use repeated subtraction, for performing division?

Ans As repeated addition was slow,  
similarly, repeated subtraction is slow.

Ex Example:- when you convert decimal to binary.

100 divide by 2

Now you will require so subtractions (approx.).

Therefore, we require special Algorithms that can perform division  
in lesser time / lesser steps.

OR.

Ex:-

$$\begin{array}{r} 100 \\ \hline 2 \end{array}$$

$\frac{-2}{-1}$

Not successful

Now, you want your no. back, so, just add 2.

Restoring.

Now, subtract 2 from 10 Can you do that Yes. how many times  
5 times

So, your answer is

$$\begin{array}{r} 50 \\ \hline 2 | 100 \\ -10 \\ \hline 00 \end{array}$$



Example :- Restoring division.

$$7 = 0111$$

$$-7 = 1001$$

$$3 = 0011$$

$$-3 = 1101$$

A (0)

|       | ACCUMULATOR                   | DIVIDEND | DIVISOR |
|-------|-------------------------------|----------|---------|
| LS    | 0 0 0 0                       | 0 1 1 1  | 0 0 1 1 |
| SUB M | 0 0 0 0<br>+ 1 1 0 1<br>_____ | 1 1 1 0  |         |

(1) 1 0 1  
As no. is -ve  
∴ uns successful  
∴ Q = 0  
Restore by  
Adding DIVISOR  
(3)

ADD M

$$+ 0 0 0 0 \quad 1 1 1 0$$

$$0 0 0 0$$

LS

$$0 0 0 1 \quad 1 1 0$$

$$+ 1 1 0 1$$

(1) 1 1 0  
MSB is 1,  
No is -ve,  
~~UNSUCCESSFUL~~  
SUBTRACTION - FULL  
DIVISION  
Q = 0  
Restore

ADD M

$$+ 0 0 1 1 \quad 1 1 0 0$$

$$0 0 0 1$$

LS

$$0 0 1 1 \quad 1 0 0$$

$$+ 1 1 0 1$$

0 0 0 0  
MSB is 0,  
No. is +ve,  
~~BUT DIVISION~~  
SUCCESSFUL  
Q = 1  
No Restoration

CS  
SUBM

|       |                             |          |  |
|-------|-----------------------------|----------|--|
|       | 1 1 1                       | 0 0 1 -  |  |
|       | 0 0 0 1                     |          |  |
|       | 1 1 0 1                     |          |  |
|       | <u>① 1 0 0</u>              |          |  |
|       | MSB is 1<br>no. is -ve      |          |  |
|       | DIVISION IS<br>UNSUCCESSFUL |          |  |
|       | $Q=0$                       |          |  |
|       | Restore.                    |          |  |
| ADD M | <u>0 0 1 0 1</u>            | 0 0 1 0  |  |
|       | <u>0 0 0 1</u>              |          |  |
|       | REMAINDER                   | QUOTIENT |  |

So, ~~7~~  $7 \div 2$  has given remainder = 1  
Quotient = 2

E  
Your remainder is in Ax (Accumulator)  
and your Quotient is in Dx (register).

## Non-Restoring DIVISION :-

So, Now why do we recognise non-restoring division?

If we already have restoring division.

→ Restoring Division is fine, but, typically in most of the division  
most of the subtraction attempts are unsuccessful  
So, we have to perform restoration w.r.t. each unsuccessful division.  
This creates an overhead.  
To, remove this overhead we are using Non-restoring division.

# NOTE:- Non-restoring division works only for +ve nos.  
i.e. for unsigned nos.

On the other hand,

Restoring division works for both +ve as well as -ve nos.  
i.e. for signed nos.

Therefore, in 8086 up (microprocessor) you have separate instructions for performing division for  
Signed (IDIV)  
Unsigned nos. (DIV)

### Algorithm for Non-restoring division:-

- ① The total no. of steps = No. of digits in the dividend
- ② At each step left shift the dividend
  - ↳ Make an attempt to subtract divisor
    - ↳ if result is +ve, step is successful, append 1 to Quotient.
    - ↳ if result is -ve, step is unsuccessful, append 0 to Quotient
  - ↳ and make the addition in the next step instead of subtraction.
- ③ Repeat step 2 till total no. of steps are reached
- ④ ONLY IF last step is unsuccessful restoration is needed.

## # Example Non-restoring division

$$7 = 0111$$

$$-7 = 1001$$

$$3 = 0011$$

$$-3 = 1101$$

|                                           | A(0)<br>ACCUMULATOR                                                       | Q(7)<br>DIVIDEND                              | M(3)<br>DIVISOR |
|-------------------------------------------|---------------------------------------------------------------------------|-----------------------------------------------|-----------------|
| LS                                        | 0000                                                                      | 0111                                          | 0011            |
| SUB M                                     | $\begin{array}{r} 0000 \\ +1\textcircled{1}01 \\ \hline 1101 \end{array}$ | 111-                                          |                 |
|                                           |                                                                           | No. is -ve<br>Division is unsuccessful        | 1110            |
| NEXT STEP<br>ADD M<br>instead<br>of SUB M |                                                                           |                                               |                 |
| LS                                        | 1011                                                                      | 110 <u>0</u>                                  |                 |
| ADD M                                     | $\begin{array}{r} 0011 \\ +1\textcircled{1}00 \\ \hline 1100 \end{array}$ |                                               |                 |
|                                           |                                                                           | No. is -ve<br>Division is unsuccessful        | 1100            |
| NEXT STEP<br>ADD M                        |                                                                           |                                               |                 |
| LS                                        | 1001                                                                      | 100-                                          |                 |
| ADD M                                     | $\begin{array}{r} 0011 \\ +0000 \\ \hline 0011 \end{array}$               |                                               |                 |
|                                           |                                                                           | No. is +ve<br>Division is successful          | 1001            |
| <del>Next step<br/>ADD M</del>            |                                                                           |                                               |                 |
| LS                                        | 0001                                                                      | 001-                                          |                 |
| SUB M                                     | $\begin{array}{r} +1101 \\ \hline 0110 \end{array}$                       |                                               |                 |
|                                           |                                                                           | -ve.<br>Division is unsuccessful              | 0010            |
| ADD M                                     | $\begin{array}{r} 0011 \\ +0001 \\ \hline 0001 \end{array}$               |                                               |                 |
|                                           |                                                                           | PERFORM<br>RESTORATION<br>AS IT IS LAST STEP. |                 |
|                                           |                                                                           |                                               |                 |
|                                           |                                                                           | QUOTIENT                                      | REMAINDER       |

## RESTORING DIVISION FOR SIGNED NUMBERS :-

1 let M register hold the divisor, Q register hold the dividend.

2 A register should be signed extension of Q.

3 On completion of the algorithm, Q will get the quotient and A will get the remainder

### Algorithm :-

The number of steps required is equal to the no. of bits in the dividend

1 At each step, left Shift the dividend by 1 position.

2 if sign of A and M is same then Subtract the divisor from A  
(Perform  $A - M$ ; Add 2's complement of M)

Else Add M to A.

3 After the operation,

if Sign of A remains the same

OR

the dividend (in A and Q) becomes zero,

then Step is successful

Q bit is ~~2~~ 1

No Restoration.

if Sign of A changes

then step is unsuccessful

Q bit is ~~0~~ 1

Restoration is required.

4 Repeat steps 1 to 3 till all the steps are completed.

NOTE :- The result of the algorithm is such that, the quotient will always be positive and the remainder will get same sign as the dividend.

## RESTORING DIVISION FOR SIGNED NUMBERS :-

-8/-4

$$8 = 01000$$

$$-8 = 11000$$

$$4 = 00100$$

$$-4 = 11100$$

|                                                            | A (SIGN EXTENSION)<br>ACCUMULATOR | Q (-8)<br>DIVIDEND | M (-4)<br>DIVISOR |
|------------------------------------------------------------|-----------------------------------|--------------------|-------------------|
| INITIAL VALUES                                             | ①1111                             | 11000<br>1000-     | ①1100             |
| LS                                                         | 11111                             |                    |                   |
| Sign A, M same as A-M<br>⇒ SUB M                           | + 00100                           |                    |                   |
| Sign of A changes<br>Step is unsuccessful.                 | ②0011                             | 10000<br>10000     | 11100             |
| Restore A<br>Add M                                         | + 11100                           |                    |                   |
| LS                                                         | 11111                             | 10000              |                   |
| SUB M                                                      | + 00100                           | 0000-              |                   |
| Sign of A changes<br>Step is unsuccessful.                 | ②0011                             | 00000              |                   |
| Restore A by<br>adding M                                   | + 11100                           |                    |                   |
| LS                                                         | 11111                             |                    |                   |
| SUB M                                                      | + 00100                           | 0000-              |                   |
| Sign of A changes<br>Step unsuccessful.                    | ②0010                             |                    |                   |
| Restore A<br>by adding M                                   | + 11100                           |                    |                   |
| LS                                                         | 11110                             | 0000-              |                   |
| SUB M                                                      | + 00100                           |                    |                   |
| DIVIDEND(A,Q)=0<br>Step is successful.<br>No restoration   | 00000                             | ②0001              |                   |
| LS                                                         | 00000                             | 0001-              |                   |
| Sign (A-M) Different from A-M                              | + 11100                           |                    |                   |
| Sign of A changes<br>Step unsuccessful                     | ①1100                             |                    |                   |
| Restore ADD A<br>By Sub M.<br>i.e. add 2's complement of M | + ②0100                           | 00010              |                   |
| LS                                                         | 00000                             |                    |                   |

# Em: (-19)/7

$$19 = 010011$$

$$-19 = 101101$$

$$7 = 000111$$

$$-7 = 111001$$

|                             | ACCUMULATOR (A) | DIVIDEND (Q)     | DIVISOR (M) |
|-----------------------------|-----------------|------------------|-------------|
| <u>Initial values.</u>      | 011111          | 101101           | 000111      |
| LS                          | 011111          | 01101 <u>0</u>   |             |
| Sign(A,M): Diff. A+M        | <u>000111</u>   |                  |             |
| A's Sign changes: Unchanged | 000110          |                  |             |
| Restore: Sub. A-M           | +111001         |                  |             |
| LS                          | 111111          |                  |             |
| Sign(A,M): Diff. A+M        | +000111         | 1101 <u>00</u> 0 |             |
| A's Sign changes: Unchanged | 000101          |                  |             |
| Restore: Sub. A-M           | +111001         |                  |             |
| LS                          | 111101          | 10100 <u>0</u>   |             |
| Sign(A,M): Diff. A+M        | +000111         |                  |             |
| A's Sign changes: Unchanged | 000100          |                  |             |
| Restore: Sub. A-M           | +111001         |                  |             |
| LS                          | 111101          |                  |             |

## EXECUTION OF AN INSTRUCTION:-

- ⇒ A typical computing task consists of a series of steps specified by a sequence of machine instructions that constitute a program.
- ⇒ A program is a set of instruction.
  - An instruction is a set of microinstruction.
  - A microinstruction is a set of nanoinstruction.
- ⇒ To execute a program, you have to execute several instructions.
- ⇒ To execute an instruction, you have to execute several microinstructions.
  - To execute a microinstruction, you have to execute a few nanoinstructions.
- ⇒ To execute a program,
  - the processor fetches one instruction at a time and performs the operations specified
- ⇒ Program Counter (PC) stores the address of next instruction to be executed.
- ⇒ Instruction Register (IR) ~~has~~<sup>stores current</sup> the instruction to be executed.
- ⇒ Suppose each instruction is of 4 bytes, stored in a byte addressable memory. Then, to execute an instruction a processor has to perform following steps:-
  - Fetch phase
    - ① Fetch the contents of the memory location pointed to by the PC. The contents of this location are interpreted as an instruction to be executed. Hence, 1 into IR.
$$IR \leftarrow [PC]$$
    - ② Increment PC.
$$PC \leftarrow [PC] + 4.$$
  - Execution phase
    - ③ Execute the instruction in the IR register.

if size of the instruction is more than 4 bytes, repeat step 1 and 2, till the entire instruction is fetched.

### Internal organization of a processor

- ⇒ In this simple organization there is a Single processor bus (internal bus)
- ⇒ The internal bus should not be confused with External bus; External bus connects ~~with~~ processor with Memory and I/O devices.
- ⇒ Note:- Memory Address Register (MAR)  
AND  
Memory data Register (MDR)  
has two inputs and two outputs.
- ⇒ MDR can take data from memory (Input) and ~~so~~  
Send ~~the same data~~ <sup>it</sup> to processor (Output)
- OR
- MDR can take data from processor (Input) and  
Send it to memory (Output)
- These ~~operations~~ <sup>instructions</sup> are decoded and controlled by the control unit.
- ⇒  $R_0, R_{(n-1)}$  may be used as general purpose or special purpose registers such as Accumulator <sup>register</sup>, Count register etc.
- ⇒ The registers Y, Z and temp are internal to the processor and not directly accessible to the programmer (user).

The ALU, registers, ALU, and the interconnecting buses are collectively called as referred as DATA PATH.

### # BASIC STEPS OF Execution:-

- Transfer a data from one register to another register (Register transfer)
- Perform an arithmetic or logic operation <sup>and</sup> store the obtained result in a processor register.
- Fetch the contents of a given memory location and load them into a processor register.
- Store the result back into the memory location.

### # Register transfer:-

⇒ The input and output of Register  $R_i$  are controlled by signals  $R_{i\text{in}}$  and  $R_{i\text{out}}$  respectively.

When  $R_{i\text{in}} = 1$ , data <sup>available on the bus</sup> is loaded into the register.

When  $R_{i\text{out}} = 1$ ; data of the register is ~~not~~ placed on the internal bus.

⇒ To transfer data from  $R_i$  to  $R_u$ .

Enable  $R_{i\text{out}} (=1)$

:

Enable  $R_{u\text{in}} (=1)$



Input and output gating of one register bit

⇒ The Q output of D flip-flop is connected to the bus via a tri-state gate.

when  $R_{out} = 0$ , gate's O/p is high impedance.

$R_{out} = 1$ , gate's O/p is 0 or 1 depending on value of Q.

## # Performing an arithmetic or logic operation.

Addl  $R_3, R_2, R_1$ .

1.  $R_1$  out,  $Y_{im}$
2.  $R_2$  out, SelectY, Add,  $Z_{im}$
3.  $Z$  out,  $R_3$  in.



Observe; which value is placed on the processor's bus.

## # Fetching a word from memory:-

Move(R<sub>1</sub>), R<sub>2</sub>



1. R<sub>1</sub> out, MAR<sub>in</sub>, Read

2. MDR<sub>in\_E</sub>, WMFC

3. MDR<sub>out</sub>, R<sub>2</sub> in.

Wait for  
memory function  
Completed.  
Wait for Memory  
fetch to  
complete

## # Storing a word in memory :-

Move R<sub>2</sub>, (R<sub>1</sub>)

1. R<sub>1</sub> out, MAR<sub>in</sub>

2. R<sub>2</sub> out, MDR<sub>in</sub>, Write

3. MDR<sub>out\_E</sub>, WMFC

## EXECUTION OF A COMPLETE INSTRUCTION :-

Add ( $R_3$ ),  $R_1$

1. PCout, MARin, Read, Select 4, Add, Zin
  2. Zout, PCin, Yin, WMFC
  3. MDRout, IRin  $\rightarrow$  Instruction fetch
  4. R3out, MARin, Read  $\#$  Address value of memory location given by  $R_3$  is read into MAR
  5. R1out, Yin, WMFC  $\#$  2nd operand  $R_1$  is placed on the bus.
  6. MDRout, Select Y, Add, Zin  $\#$  value of  $R_3$  is placed on the bus and added; result to store back the result it has to be stored in  $Zin$ .
  7. Zout, R1in, End.  $\#$  The result is stored back in  $R_1$ .
- Program counter is updated by a value to point to the next instruction.

### Execution of an unconditional branch Instruction:

1. PCout, MARin, Read, Select 4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. Offset-field-of-IRout, Add, Zin  $\#$  JBRP LABEL1  
1000 JBRP 36
5. Zout, PCin, End  $\#$  1040

Notes:- A branch instruction replaces the contents of the PC with the branch target address.

This address is usually obtained by adding an offset  $X$ , given by the branch instruction. (w.r.t. updated value of PC).

Ex:- if the branch instruction is at location 1000, and if branch target is 2040, the value of  $X$  must be 36.

## # Conditional branch :-

~~jlt~~ jltz (branch < 0)

then the 4th line will be updated as.

4. Offset-field-of IRout, Add, Zin, If N=0 then End.

i. if N=0, processor returns to step 1 immediately.

ii. if N=1, step 5 is performed.

## # three bus organization

Add R<sub>4</sub>, R<sub>5</sub>, R<sub>6</sub>

1. P(out),  $R = B$ , MAR<sub>im</sub>, Read, IncPC

2. WMFC

3. MDR out B,  $R = B$ , IR<sub>im</sub>

4. R<sub>4</sub> out A, R<sub>5</sub> out B, Select A, Add, R<sub>6im</sub>, End

The contents of PC are passed to bus C through ALU.  
Then same content is passed to MAR using bus C.  
as MAR is connected only to bus C.

# Operands comes from buses A and B and the results are transferred to the destination over bus C.

# ALU can pass one of its two input operands unmodified to Bus C. Such control signals are represented by  $R = A$ , or  $R = B$ .

# PC has a separate incrementor unit. Nevertheless, constant 4 at the ALU is still useful for instructions like load multiple store multiple (assuming we are storing loading/storing) 4 bytes at a time.

#

## RISC

## v/s

## CISC

- |                                                                                     |                                                                                |
|-------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| <u>1.</u> RISC architecture utilizes a small, highly optimized set of instructions. | <u>1.</u> CISC architectures utilizes a small specialized set of instructions. |
| <u>2.</u> Longer code<br>Short instructions                                         | <u>2.</u> Shorter code<br>Long instructions                                    |
| <u>3.</u> Emphasis is on developing hardware for complex instructions.              | <del><u>3.</u></del>                                                           |
| <u>3.</u> Emphasis is on developing software                                        | <u>3.</u> Emphasis is on developing hardware for complex set of instruction.   |
| <u>4.</u> Each instruction takes 1 clock cycle                                      | <u>4.</u> Each instruction <sup>may</sup> takes multiple clock cycles.         |
| <u>5.</u> Easy to pipeline                                                          | <u>5.</u> Hard to pipeline                                                     |
| <u>6.</u> Examples of RISC processors:-<br>Sun UltraSPARC,<br>IBM Power PC.         | <u>6.</u> Ex:- intel x86 family of Architectures,<br>Motorola 6800, 6809.      |

### Interstage buffers

Within each stage of pipeline a Buffer is needed to hold the information being passed from one stage to another.

This is a part of SYNCHRONOUS PIPELINE

# Asynchronous pipeline:- In asynchronous pipeline.

Transfer of information b/w two stages takes place when individual stages are ready.

In Asynchronous pipeline there is no concept of buffers, handshaking takes place using Ready and Ack signals.



⇒ ~~Different~~ This kind of pipelining is used when time required to execute one stage of pipelining is more than the another stage.

# Pipelining can be implemented in two manners Synchronously and Asynchronously.

$$\text{Throughput} = \frac{\text{Total}}{\text{no. of Instructions being executed}}$$

Total time taken to execute these instructions.

# Pipelining can be implemented when a task can be divided into two or more subtasks, which can be performed independently.

# Different subtasks are performed by different hardware blocks known as stages.

# The intermediate results generated by each stage is temporarily buffered in latches and then passed on to the next stage.

## # 2-stage Pipelining:-

F: fetch an instruction from the memory.

E: perform the operation specified by the instruction



$$\text{Total Throughput} = \left. \frac{4}{5} = 0.8 \right\} \begin{matrix} \text{if Non-pipeline execution} \\ = \frac{4}{8} = 0.5 \end{matrix}$$

## # 4-stage pipelining:-

F: read the instruction from the memory

D: decode the instruction and fetch the source operand

E: perform the operation specified by the instruction

W: store the result in the destination location

# Pipelining is most effective in improving performance if the task being assigned in different stages requires about the same amount of time.

$\Rightarrow$  The clock period should be chosen such that it corresponds to the longest duration among all the stages.

- # A pipeline stage ~~may~~ is not able to finish its task within the allocated time, it impacts the execution of entire pipelining.
- # A pipeline is said to have been stalled for  $\theta$  no. of cycles.
- # Any condition that causes a pipeline to stall is called a hazard.
- # Data hazard:- A data hazard is any condition in which either the source or the destination operands of an instruction are not available at the expected time in the pipeline
- # Instruction hazards:- A pipeline may be stalled due to unavailability of the an instruction due to an event such as cache miss, thereby requiring data instruction to be fetched from main memory, this type of ~~a~~ hazards are known as instruction hazards or control hazards
- # Structural hazards:- This type of hazards may occur when two or more instructions requires the use of a given hardware resource at the same time.

ex Load  $X(R_1), R_2$

Effect of an execution operation taking more than one clock cycle.

| Clock cycle    | 1              | 2               | 3              | 4              | 5              | 6              | 7              | 8              | 9              | 10             |
|----------------|----------------|-----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| I <sub>1</sub> | F <sub>1</sub> | D <sub>1</sub>  | E <sub>1</sub> | W <sub>1</sub> |                |                |                |                |                |                |
| I <sub>2</sub> |                | -F <sub>2</sub> | D <sub>2</sub> | E <sub>2</sub> | E <sub>2</sub> | E <sub>2</sub> | W <sub>2</sub> |                |                |                |
| I <sub>3</sub> |                |                 | F <sub>3</sub> | D <sub>3</sub> | id             | id             | E <sub>3</sub> | W <sub>3</sub> |                |                |
| I <sub>4</sub> |                |                 |                | F <sub>4</sub> | id             | id             | D <sub>4</sub> | E <sub>4</sub> | W <sub>4</sub> |                |
| I <sub>5</sub> |                |                 |                |                | id             | id             | F <sub>5</sub> | D <sub>5</sub> | E <sub>5</sub> | W <sub>5</sub> |

Form  
of  
Data  
hazard

Pipelined operation is stalled for two clock cycles.

Any condition that causes the pipeline to stall is called hazards

| Clock cycle    | 1              | 2              | 3              | 4              | 5              | 6              | 7              | 8    | 9 |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|------|---|
| I <sub>1</sub> | F <sub>1</sub> | D <sub>1</sub> | E <sub>1</sub> | W <sub>1</sub> |                |                |                |      |   |
| I <sub>2</sub> |                | F <sub>2</sub> | F <sub>2</sub> | F <sub>2</sub> | D <sub>2</sub> | E <sub>2</sub> | W <sub>3</sub> |      |   |
| I <sub>3</sub> |                |                |                |                | Idle           | Idle           | W <sub>3</sub> | Idle |   |
| I <sub>4</sub> |                |                |                |                |                |                |                |      |   |
| I <sub>5</sub> |                |                |                |                |                |                |                |      |   |

Idle 3, 4  
Idle 4, 5  
Idle 5, 6

These types of stalls are also called bubbles, once created they will move downstream until it reaches the last unit.

This is an example of instruction hazard due to unavailability of instruction due to event such as cache miss.

Scanned with CamScanner

The no. of cycles required to execute 'N' no. of instructions in a  $k$ -stage pipeline are.

$$k + (N-1).$$

|                | 1              | 2              | 3              | 4              | 5              | 6              | 7              | 8              | 9              | 10             | 11             | 12             |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| I <sub>1</sub> | F <sub>1</sub> | D <sub>1</sub> | E <sub>1</sub> | W <sub>1</sub> |                |                |                |                |                |                |                |                |
| I <sub>2</sub> |                | F <sub>2</sub> | D <sub>2</sub> | E <sub>2</sub> | W <sub>2</sub> |                |                |                |                |                |                |                |
| I <sub>3</sub> |                |                | F <sub>3</sub> | D <sub>3</sub> | E <sub>3</sub> | W <sub>3</sub> |                |                |                |                |                |                |
| I <sub>4</sub> |                |                |                | F <sub>4</sub> | D <sub>4</sub> | E <sub>4</sub> | W <sub>4</sub> |                |                |                |                |                |
| I <sub>5</sub> |                |                |                |                | F <sub>5</sub> | D <sub>5</sub> | E <sub>5</sub> | W <sub>5</sub> |                |                |                |                |
| I <sub>6</sub> |                |                |                |                |                | F <sub>6</sub> | D <sub>6</sub> | E <sub>6</sub> | W <sub>6</sub> |                |                |                |
| I <sub>7</sub> |                |                |                |                |                |                | F <sub>7</sub> | D <sub>7</sub> | E <sub>7</sub> | W <sub>7</sub> |                |                |
| I <sub>8</sub> |                |                |                |                |                |                |                | F <sub>8</sub> | D <sub>8</sub> | E <sub>8</sub> | W <sub>8</sub> |                |
| I <sub>9</sub> |                |                |                |                |                |                |                |                | F <sub>9</sub> | D <sub>9</sub> | E <sub>9</sub> | W <sub>9</sub> |

$$k = 4, N = 9$$

Total no. of cycles =  $k + (N-1) = 4 + (9-1) = 12$

Total time of execute 'N' instruction in  $k$  stage pipelining with each stage taking  $T$  unit of time.

$$= [k + (N-1)] T$$



However, it is not always possible to make time of each stage equal.

For ex:- in 3-stage organization, we have seen it require 3 control steps to ~~generate~~ fetch the instruction and ~~one~~ stage one control step to execute the instruction.

Therefore, in such a scenario where time of each stage cannot be made equal.

The clock period should be chosen such that it corresponds to the longest duration among all the stages.

$$\text{Speed up} = \frac{\text{Performance of pipelined processor}}{\text{Performance of Non-pipelined processor}}$$

=

$$\begin{aligned}\text{Speed} &= \frac{\text{distance}}{\text{time.}} = \frac{\cancel{\text{Time}} \text{ Cycles/time}}{\cancel{\text{Time}} \text{ Cycles/time.}} \\ &= \frac{\text{Cycles}}{\text{Time}}.\end{aligned}$$

$$= \frac{\text{Time taken by Non-pipelined processor}}{\text{Time taken by Pipelined processor}}$$

$$\therefore = \frac{n * k * T}{[k + (n-1)] T} = \frac{n * k}{k + (n-1)}$$

for large values of  $n$ ,  $\approx k$ .

$\therefore$  Performance of Pipeline increases with ~~the~~ no. of stages

However, this is not always true as we haven't considered buffer delay in this calculation  
in the pipeline