

Books :-

- Structured Computer Organization - Tanenbaum
- Computer Organization - Hamacher

75% → Theory

25% → Lab

Theory :

Quiz 20%

Mid-term 30%

End-term 50%

- Computer is a concept of stored programs

- Program : set of instructions

- Needs <sup>storage area</sup> ~~memory~~ to store a program, processing unit is <sup>also</sup> required

- Needs I/P, O/P devices

1<sup>st</sup> Gen Vacuum Tubes

2<sup>nd</sup> Gen Transistor

3<sup>rd</sup> Gen IC

4<sup>th</sup> Gen VLSI → very large scale integration

↳ more no of transistors / area

The no. of transistors that you can put per unit area will double every 2 years → Moore's law

(It has slowed down now due to fabrication technologies)

③ Embedded Computers

when a program is being executed, it becomes a process

i/p devices : keyboard, mouse, joystick, webcam, sensors

o/p devices monitor, printers, speakers, projectors



### Von Neumann architecture

After evolution, now, the architecture is :



### Single Bus architecture

Bus :- set of wires. Data travels over those wires.  
Over single wire, we can transmit 1 bit of data.

Multiprocessors :- Dualcore, Quadcore.

64 bit processor: can process 64 bits in one go.

Serial Bus :- Bits are going in a series.

ROM has boot  
↓  
non-volatile

Camlin Page

Date / /

int a = 8; : 4 bytes is given to it  
00000000 00000000 00000000 000001000

We have to send all 32 bits in one go. Hence, bus is required.

02-08-17

single bus can cause bottle neck and hence, make processor slower  
both I/O devices & memory want to communicate to CPU at same time.

We use double bus to avoid above kind of problems

CPU              Memory → volatile (Main/Primary/Physical/RAM)



- Programs are stored in Hard Disk (Secondary storage) (part of I/P - O/P scenario)
- Whenever we run a program, it gets picked up from HD to RAM  
Operating system is the manager here.  
assign specific spaces to all programs which are executed.
- OS : It is a software  $\Rightarrow$  It is also running  $\Rightarrow$  It should also be in RAM.
- On power on, # boot program in ROM which will put OS in RAM.
- OS is interface b/w user & hardware.



One by one, instructions will go to CPU & will be executed.

Machinery language : Binary language

Reuse digital circuits.  $\rightarrow$  (chapter 3)

Camlin Page

Date / /

- Basic architecture for each machine is different.

needs computer  
High level language :  $c = a + b$ ;

needs assembler  
Assembly language : (use mnemonics) ADD R1, R2, R3

Machine language : 0010 0001 0010 0011

- \* Assembly language is machine dependent, so is assembler.
- Modern day compilers directly convert HLL to Machine language.
- Program is stored in RAM in machine language.

10 CPU :



- Flip-flop : storage device which can store 1 bit of data.
- Registers : collection of flip-flops where data can be stored.  
20 can store 1 word of data
- 32-bit machine : 32 bits can be processed in one go.  
Usually, the word size is 32 bit. We need to store 32 bits. Hence, size of register : 32 bits. (32 flip flops)

25 int a = 8 ; stored in register as :

00000000000000000000000000000000

← word →

- Registers could be :

1) General Purpose (GPR)

2) Special Purpose

- New upgraded machine will always run the programs that could be run on earlier one  $\Rightarrow$  needs to be backward compatible

HD onto  $\rightarrow$  magnetic disk

- Flip-flops are primary memory
- All RAM, ROM, HD ~~and~~ have diff. manufacturing technologies



- To access a particular word in memory, we need its address
- Memory could be word addressable or byte addressable.  
each word has  $\frac{\text{no. of addresses}}{\text{no. of words}}$  an address
- suppose we have 4 words



need 2 bits to  
store address

- If there are  $2^k$  addresses ( $2^k$  words), then  $k$  bits of memory will be needed to store the addresses.

$2^{20}$  : Mega

$2^{30}$  : Giga

$2^{10}$  (1K)

4-8-17

- Embedded processors / microcontrollers are used in washing machines
- It depends on system we design that how the processor should be, ...
- Your job as architect is to design.

P.L.L.

O/S

ISA

Instruction set Architecture

Microprocessor

H/W

Architecture

Organisation



- In case of byte addressable memory, we can fetch a char. if we know its address (needs 1 byte)

→ 2 MB RAM  $\Rightarrow 2 \times 2^{20}$  Bytes =  $2^{21}$  Bytes  
 Byte addressable  $\Rightarrow$  we will need 21 bits of addresses.

By 32-bit machine

4 words addressable : 4 bytes = 1 word  
 $2^{21}$  bytes =  $2^{19}$  words

Word addressable  $\Rightarrow$  we will need 19 bits of addresses

- O.S. is at bottom location of RAM (stored in starting)

- When a program is executed, at the starting of instruction will be loaded in a SPR (from RAM) called Program Counter

always points to the next instruction to be executed

PC → Program Counter

multiple wires:  $\Rightarrow$  or  $\nrightarrow$

MAR → Memory Address Register.

MDR → .. data ..

Camlin Page

Date / /

(carrying  
similar type  
of data)

- The next instruction has to be fetched from the memory.
- PC gives this add. to MAR & at same time, PC will increment to point to next ~~word~~ <sup>instruction</sup> address (next word).

so it's word addressable

- each bit doesn't have an address.

- It is word addressable  $\Rightarrow$  next address after 2000H  $\rightarrow$  2001H

$\Downarrow$   
Each word has an address.

→ If it had been byte addressable memory  $\Rightarrow$  next address after 2000H

RAM

CPU

would be 2004H

Byte  
addressable



2 MB RAM  $\Rightarrow$   $2^{21}$  Bytes  $\Rightarrow$  21 bits needed for address

size needed of MAR = 21 bits.

Word addressable  $\Rightarrow$  size needed of MAR = 19 bits.

\* Program is divided into 4 parts



- MAR is accessing the RAM so arrow dir<sup>n</sup> MAR  $\rightarrow$  RAM & read signal is also sent.

2000H : 21 bit : 0 0000 0010 0000 0000 0000  
                   2      0      0      0

- \* The address bus should be unidirectional bcoz only processor  $\Downarrow$   
 o. only add. is ... is sending add. to memory, not vice versa  
 'passed by this'

- So, MAR will send add. to RAM. This content will go from to Processor through data bus.
- Each word has 32 bits data stored & this complete data travels through data bus  $\Rightarrow$  32 bits is required
- MDR receives the instruction.  
So data bus is bidirectional.
- \* signals are all controlled by the control Unit.
- Write, store : CPU  $\rightarrow$  Memory / other device  
Read, load : Memory  $\rightarrow$  CPU
- It is assumed that 1<sup>st</sup> thing brought is an instruction.  
So, it is taken to IR.
- Purpose of IR : To hold the current instruction which is currently executed.
- Decoder will decode & instruction will be executed. Then 2004 will be loaded in PC.
- This cycle is known as Fetch-Decode-Execute cycle.

Eg. ADD R1, R2, R3      R1, R2, R3 : GPR holding the variables  
 HLL : machine independent      AL : machine dependent

This ~~mean~~ can mean :

destination R1  $\leftarrow$  R2 + R3 source ADD : operation to be performed  
 or R3  $\leftarrow$  R1 + R2       $\hookrightarrow$  opcode

This is a 3 address instruction (As it has 3 operands)

2-add 30 ADD R1, R2 : Add R1 & R2 & store in R1/R2 (machine dependent)  
R1  $\leftarrow$  R1 + R2  
 or R2  $\leftarrow$  R1 + R2

⇒ Only machines with 3-add have spl. registers called "Accumulator".  
→ Add Add" is taking place in ALU.

1-add ADD R1 ; If a register : Accumulator which is taken as 2nd operand & result is stored over there only.  
 $Acc \leftarrow Acc + R1$

0-add ADD



1-addressable :

Registers File

special dedicated Register



How to put values in accumulator :

{

```
int x, y, z;
```

 $x = 200; y = 250;$ 
 $z = x + y;$ 

3

→ When we declare data, variables memory space is assigned to 3 variable (4 bytes each) ↓  
in the stack (local variables) follows LIFO.



stack

brought value of  $x$  to R1 ← "z = x + y" will be stored in RAM in Assembly language

LOAD R1, 3000H } have to bring x & y in processor

LOAD R2, 3600H }

ADD R3, R2, R1 } Loading from the memory to processor

STORE R3, 4000H } memory that has been assigned to z.

11/08/17

Some assembly languages use -

LOAD R1, @3000H or LOAD R1, #3000H

→ LOAD R1, 3000 R1 out, R2 out, ADD, ALU.Temp.in

LOAD R2, 3200 ALU.Temp.out, R1, END U

ADD R1, R2 U

STORE R1, 3350

Micro instructions :

actually executed in hardware. Each micro-instruction takes 1 clock

(The whole line is 1 microinstruction)

everything in one clock cycle

## Clock cycle

Camlin Page

Date / /

- everything happens on a clock pulse.



(It depends on implementation by the system how many clock cycles will be taken)

If memory has to be accessed, a micro instruction will take more than 1 clock cycle.

ADD R3, LOCA

LOCA  $\rightarrow$  3000



\* The instruction ADD R3, LOCA will go through data bus to MDR.

After going to IR, it will get decoded and then, it will doesn't have LOCA data bcz it is in the memory (while R3 data in the processor).

Now, we have to again access the memory to get data from memory.

So, we will put the address 4000 in the MAR while PC doesn't loose the track of the program.

So, 50 will go to MDR and then to other temporary register while R3 data will go to one Temp. register & after op<sup>n</sup> in ALU,

it will reach the destination.

- The decoding unit will decode that to add. of LOC A is 1000 and send it to MAR
  - PC will not pass the add. of next add. to MAR till the current instruction ends.

- 16 GPR  $\Rightarrow$  need 4 bits to  $\rightarrow$  for registers  
 $\Rightarrow$  it can perform only 2 op's sub / Add  $\Rightarrow$  2 bits needed  
 $\therefore$  3 bits

3 bits : 8 operation's ALU

5. bits : 32 bit Registers

21 bit memory address  $\Rightarrow$  2 MB

| operand           | Register | memory             | : This is how instruction is decoded |
|-------------------|----------|--------------------|--------------------------------------|
| 3                 | 5        | 21                 |                                      |
| 011<br>(for Add") | 00011    | 4000               |                                      |
|                   |          | (address of loc A) |                                      |

12/18/17

\* 20 ADD R3, LOCA : We have to access the memory  
2 times here

- 1) To fetch the instruction
  - 2) To fetch operand from LOC A from the memory.

→ The registers inside processor are connected by internal bus very fast

→ CPU is connected to memory by system bus which slows the processor as it has to communicate outside very slow.

→ MUL A, B : Easy instruction for user

→ Processor speed : depends on clock cycle.

|        |      |
|--------|------|
| Camlin | Page |
| Date   | / /  |

Here, hardware is doing add repeated "add" internally  
(let your hardware be of that kind, not direct mult).

so, MULT A,B will be complex instruction.

- this instruction will have lot many microinstruction
- ease of programming but lot of time is wasted.

• Complex Instruction set computers CISC

- This affects the performance, so, introduced

Reduced Instruction set computers RISC

But → we can't simplify jump to RISC from CISC & start from the start.

so, every machine is always backward compatible.

RISC :

- Usually fixed length instruc's, easy to decode.

To maximize the efficiency of processor :

- 1) Set of instructions should be simple so that it is easy to decode.  
Minimize memory access
- 2) Less use of memory because it takes lots of time to access the memory. So, more GPR should be present.
- 3) More use of Cache. (an internal or intermediate memory).
- 4) Have instructions that can be directly executed (1 microinstruction) on the hardware.

→ Now days we are using hybrid. Old architectures to run old program and new to run new programs.

Multiprocessors :-



- P1, P2 and P3 are interconnected with each other as there might be a possibility that they need to communicate (if programs is running on all processors which has many computations).



Single processor : Quad core (has 4 cores)  
(multicore)

- III : Only 2 processors can communicate with the memory at a time in this system.



- A processor connected with a memory is a computer.
- \* There could be possibility of having a shared memory also.  
(in case of multi computer system)

Workstation :

High End Computer

(high frequency, high memory RAM, coprocessor.)

- In a network of computers, 1 particular computer handles all the computers in the network  $\rightarrow$  main frame/ server.

Array Processors : same kind of computation is to be performed on all the data.



specific app's only (not in our daily computers)

Vector Processors : When oper's are performed on a pair of data.



$\rightarrow$  Till now, both instructions & data are stored in the same memory. (only 1 can be executed at one go)

Split Memory :



Harvard Architecture

Here, we can access both DM & IM at the same time.

## Boolean Algebra :

$\rightarrow -8 \quad 1000 \quad 0$

3 ways to represent negative no. ?

1) Signed bit magnitude :

MSB : represent sign

J

for sign

4 bit : Range :  $-7$  to  $+7$

n bit : Range :  $-(2^{n-1}-1)$  to  $(2^{n-1}+1)$

Problems have 2 representations for 0 (+0 & -0)

2) 1's complement :

1's complement

|    |      |      |    |
|----|------|------|----|
| +0 | 0000 | 1000 | -7 |
| +1 | 0001 | 1001 | -6 |
|    | :    | :    |    |
|    | 0110 | 1111 | -0 |
| +7 | 0111 |      |    |

We can't represent 8

as 1000 (used to represent -7)

Range :  $-7$  to  $+7$

Again same problem.

3) 2's complement : (2's complement : 1 in MSB) 1111

|    |      |      |    |
|----|------|------|----|
| +0 | 0000 | 1000 | -8 |
| +1 | 0001 | 1001 | -7 |
|    | :    | :    |    |
| +7 | 0111 | 1111 | -1 |

-0 : 1111  $\Rightarrow$  Range :  $-8$  to  $7$

1000 : same representation for 0

Range :  $-(2^{n-1})$  to  $+(2^{n-1}-1)$

→ signed no : If positive no. as well as negative no.

→ unsigned no : If only positive no.

4 bits : No. of unsigned no. available : 16

1111 : 15

Range : 0 to 15

$\Rightarrow$  0 to  $2^4 - 1$

8 bits :

0 to  $2^8 - 1$

0 to 255

signed bit magnitude :

Range : -127 to +127

2's complement : -128 to +127

Unsigned,

→ floating point representation :

$x_3 x_2 x_1 x_0 \cdot x_{-1} x_{-2}$

$x_0 \times 2^0 + x_1 \times 2^1 + x_2 \times 2^2 + x_3 \times 2^3, x_1 \times 2^{-1} + x_2 \times 2^{-2} + \dots$

8.5

unsigned representation :  $8 \times 2^0 \underline{5 \times 2^{-1}} \overline{2}$

1000 . 1

5

→ -9 4bit rep" using 2's complement : Not possible

→ 1001 . signed no. value : -7

#

→  $-(1 \times 2^3) + 0 \times 2^2 + 0 \times 2^1 + (1 + 2^0)$  (add - sign in front of MSB in case of signed no)  
 $= -7$

If add 1 in front:

11 001      00 111

Note:  
 {Signed no.}

Extending the sign bit (whatever value is given in MSB) doesn't change the no. This is called as sign extension.

→ Half adder : I/p don't have carry, diff. variable i/p

→ Full adder : process carry bit also.  
 ↑ → not taken as i/p

$$\begin{array}{r}
 11 \ 0 \ 0 \\
 11 \ 0 \ 1 \\
 \hline
 1 \ 0 \ 0 \ 0 \ 1
 \end{array}
 \quad \text{(Half adder)} \quad \text{(I/p has no carry)}$$

$$\begin{array}{r}
 11 \ 0 \ 0 \\
 11 \ 0 \ 1 \\
 \hline
 1 \ 1 \ 1 \ 0 \ 1
 \end{array}
 \quad \text{(Full adder)}$$

→ 1 full adder was taking care of 1 bit only.

⇒ Shifter : moves the bits of a nibble one position to the left

18/8/17

Sequential circuit: have some memory & remember their previous states. (Usually have flip-flop to remember their previous state.)

- A "clock" is a spcl. circuit that sends electrical pulses through a circuit.
- Clocks produce electrical waveforms



set of logic gates to implement boolean func": Combinational Circuit

|        |      |
|--------|------|
| Camlin | Page |
| Date   | / /  |



Edge triggered : change state on rising edge, falling edge.

Level triggered : change state at highest or lowest level

→ To retain their state values, these circuits rely on feedback.



If it is 0, it will always remain 0. If 1, always 1.

- S-R flip flop : 1-1 : not defined.

S → Set

R → Reset

- D flip-flop : used in memory bcoz doesn't change o/p

Slide 60: 3 flip-flops in each word  $\Rightarrow$  3 bits

→ Need 2 bits (2 select lines  $S_0$  &  $S_1$ ) to store address of 4 words.

I/O lines : sending data into the memory

O/P lines : Taking data out of memory

Slide 2:  $7 + 6 = 13$   
(signed bit)

0 1 1 1

0 1 1 0  
1 1 0 1 x

can't represent 13 in  
4 bits in case of add"  
of signed bits

0 0 1 0  
0 0 1 1 : - 3

32 bits of  $x$  ]  $\Rightarrow$  32 FA needed to add  
32 bits of  $y$

↳ have 4-bit add FA.

5 32 bit of  $x$  ]  $\Rightarrow$  8 FA needed to add  
32 bit of  $y$   $x$  and  $y$ .

K n-bit adders :

- K n-bit no. can be added by cascading K n-bit adders.

- Carries ripple or propagate through blocks,  
Blocked Ripple Carry Adder

n-bit subtractor :

$$15 \quad x - y \Rightarrow x + 2\text{'s complement of } y \\ \Rightarrow x + 1\text{'s complement of } y + 1$$

$$x - y = x + \bar{y} + 1$$

20 n-bit adder / subtractor

$$\begin{array}{r} 11010010 \\ 01011101 \\ \hline 01011101 \end{array}$$

Add"  $\Rightarrow$  control line : 0  
 21's complement  $\rightarrow$  10100010

control line : 0

1

↳ sub": add 1, so  $c_0 = 1$  (control line = 1)

Overflow :

- Overflows can only occur when signs of both the operands  
30 are same

- Overflows can occur if the sign of result is diff.  
from sign of operands. ( $7 + 6 = 13$ )

$$\begin{array}{r} 7 \\ + 6 \\ \hline 13 \end{array}$$

$$\begin{array}{r} x_{n-1} \\ \downarrow \\ \rightarrow 0 \ 1 \ 1 \ 1 \\ y_{n-1} \rightarrow 0 \ 1 \ 1 \ 0 \\ \downarrow 1 \ 1 \ 0 \ 1 \\ s_{n-1} \end{array}$$

$$\text{Overflow} = x_{n-1} y_{n-1} \bar{s}_{n-1} + \bar{x}_{n-1} \bar{y}_{n-1} s_{n-1}$$

$0+1=1 \Rightarrow \text{Overflow has occurred.}$

$$\text{Overflow} = c_n \oplus c_{n-1}$$

$0 \oplus 1 = 1 \Rightarrow \text{Overflow has occurred.}$

Computing the adder time : Ripple adder

- \* For n-bit adder,  $s_{n-1}$  is available after  $(2n-1)$  gate delays  
 $c_n$  is available after  $2n$  gate delays.

21/8/17

$$\rightarrow s_i = x_i \oplus y_i \oplus c_i$$

$$c_{i+1} = x_i y_i + x_i c_i + y_i c_i$$

$$\text{or } c_{i+1} = x_i y_i + (x_i + y_i) c_i$$

$$\therefore c_{i+1} = g_i + p_i c_i \quad \text{--- (1)}$$

20  $\downarrow$  AND op<sup>n</sup> for  $x \wedge y$   
 for  $x \wedge y$  (propagate func<sup>n</sup>)  
 (generate func<sup>n</sup>)

$g_i$  &  $p_i$  don't depend upon  $c_{i+1}$

From (1),

$$c_{i+1} = g_i + p_i g_{i-1} + p_i p_{i-1} g_{i-2} + \dots + p_i p_{i-1} \dots p_0 g_0 + p_i p_{i-1} \dots p_0 c_0$$

only this term depends on  $c_0$

∴ This is called carry lookahead adder

## Multiplication of unsigned no.

multiplicand  $\rightarrow$  1101 (13)multiplier  $\rightarrow$   $\times$  1011 (11)

$$\begin{array}{r}
 & 1101 \\
 & \times 1011 \\
 \hline
 & 1101 \\
 & 0000 \\
 & 1101 \\
 \hline
 & 10001111
 \end{array}
 \quad \begin{matrix} \text{→ called partial product} \\ -\textcircled{2} \\ (143) \end{matrix}$$

→ We have to  
of bits or are  
6 32-bit machine  
Product

Signed M

- Product of 2 n-bit no. is at most a  $2n$ -bit no.
- we have to ensure that the register that stores the o/p is of double size

- Unsigned multiplication can be viewed as add<sup>n</sup> of shifted versions of the multiplicand.

- Register stores  $\textcircled{2}$  also as 1101, but at time of add<sup>n</sup>, it gets shifted by one place.
- Now, multiplier bit is 0, so it will directly shift without performing any add<sup>n</sup>.

(Start from  
LSB)

It multiplier bit = 1, initialize product = 0, add multiplicand to product & shift it right (not shift partial product left)

If multiplier bit = 0, simply shift the product to the right.

→ Register : 

|   |   |   |   |   |   |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 1 | 1 |
|---|---|---|---|---|---|

after this bit has been checked,  
shift right so that again LSB has  
to be checked

|   |   |   |   |
|---|---|---|---|
| 1 | 1 | 0 | 1 |
|---|---|---|---|

→ We have to run the loop as many times as many no. ~~zero~~  
of bits are in the multiplier.

6 32-bit machine : 2 int. mult.  $\Rightarrow$  32 bit multiplier  
Product : 64 bits  $\Rightarrow$  2 registers will be used to store product.

Signed multiplication :-

$$\begin{array}{r} 1 & 0 & 0 & 1 & 1 \\ \times & 0 & 1 & 0 & 1 & 1 \\ \hline & (-13) & & & & \\ & (+11) & & & & \end{array}$$

Sign Ext<sup>n</sup> 1111110011  
1111111001  
0000000000

## Booth Algorithm

$$30 = 32 - 2 \quad (\text{as power of } 2)$$

$$1100000 - 0000010 = 0111100$$

$$M \times (30) = M (32 - 2)$$

$$\begin{array}{r}
 \text{for } 32 \\
 (\text{add}) \quad \leftarrow \begin{array}{ccccccccc} 0 & 1 & 0 & 1 & 1 & 0 & 1 \end{array} \xrightarrow{\text{for } 2 \text{ (subtract)}} \\
 \begin{array}{c} 0 \\ +1 \end{array} \begin{array}{ccccccc} 0 & 0 & 0 & 0 & 1 & 0 \end{array} \\
 \hline
 0_{25} \quad \begin{array}{ccccccccccccc} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array} \\
 \hline
 11111110100011 \Rightarrow \begin{array}{l} \text{2's complement} \\ \text{of the multiplicand} \end{array}
 \end{array}$$

~~anthomoea~~

next bit & previous bit & 0 & 0 : 0

1 & 0 : - 1

0 & -1

1 & 1 : 0

$$\begin{array}{c|cc} & 1 & 2 & 1 \\ \hline 0 & 1 & & \\ +1 & & -1 & 0 \\ \hline & -1 & & \end{array}$$

→ In general, it reduces the no. of add's required.

$$\begin{array}{r} 01101 \quad (+13) \\ 11010 \quad (-6) \end{array} \rightarrow \begin{array}{r} 01101 \\ \times 0-1+1-10 \end{array}$$

### Bit-Pair Recoding of Multiplier

$$0.101 \Rightarrow 0.625$$

$$11010 \quad 0110 \quad 1A654H$$

Eg.  $5 \times (-6)$  using Booth algo.

$$-6 = 1010$$

$$\begin{array}{r} 11111010 \\ \Rightarrow 0000-11-10 \end{array}$$

$$M : \boxed{01|01}$$

$$\begin{array}{r} 1010 \\ -1011 \\ \hline 1011 \end{array}$$

A (Partial Product)

Q

$$0000$$

shift

$$-1 + 1 - 1 0$$

$$0000$$

$$0 - 1 + 1 - 1$$

+ 1011 (Add 2's complement) 2's complement shift (1)

while shifting, 1011

take care of sign extension → 1101

+ 0101

$$0 - 1 + 1 - 1$$

$$1 0 - 1 + 1$$

shift (2)

1

$$\begin{array}{r} 0010 \\ 1001 \\ + 1011 \\ \hline 0010 \end{array}$$

$$1 0 - 1 + 1$$

$$0 1 0 - 1$$

shift (3)

1

$$\begin{array}{r} 0100 \\ 1010 \\ \hline 0010 \end{array}$$

$$0 1 0 - 1$$

$$0 0 1 0$$

shift (4)

$\Rightarrow$  NO 10100010

01011161  
+1

0 1 0 1 1 1 1 0  
6 5 4 3 2 1 0

$$16 + 8 + 4 + 2$$

100

## Bit - Pair Recording of Multiplier

$$\begin{array}{cccccc} 0 & 0 & -1 & +1 & -1 & 0 \\ \hline & & \underbrace{\phantom{0}}_{-1} & \underbrace{\phantom{0}}_{+1} & \underbrace{\phantom{0}}_{-1} & \end{array}$$

I think as we

$$\frac{x_1}{x_0} = \frac{x_0}{x_1} = \dots = \frac{x_n}{x_{n+1}}$$

value of  $x_1$  will be 2 times more than no

$$\begin{array}{ccccccc} & 0 & 0 & , & \underbrace{-1 & +1} & & -1 & 0 \\ 0 \times 2 + 0 \times 1 & = 0 & -1 \times 2 + 1 & = -1 & -1 \times 2 + 0 & = -2 \end{array}$$

Slide 38 : First Booth, then bit pair receding.

$$\text{eg. } \begin{array}{r} \overbrace{1 \quad 0 \quad 0} \\ -1 \quad 0 \\ \hline -2 \end{array} \qquad \begin{array}{r} \overbrace{1 \quad 1 \quad 1} \\ 0 \quad 0 \\ \hline 0 \end{array} \qquad \text{or } \boxed{1 \quad 1 \quad 1}$$

$$\text{Eq. } \underline{13x - 6}$$

Using bit-pair recoding

$$\begin{array}{r} 0 \ 1 \ 1 \ 0 \ 1 \\ 0 \ -1 \ -2 \xrightarrow{\text{add 2's complement}} \text{2 times} \end{array}$$

$$\begin{array}{r}
 01101 \\
 10010 \\
 + \underline{1} \\
 \hline
 (-M) 10011 \\
 \hline
 10011 \\
 \hline
 100110
 \end{array}$$

Semicont  
the machine  
is the same

$$\begin{array}{r}
 0 \ 1 \ 1 \ 0 \\
 0 \ -1 \ -2 \\
 \hline
 1 \ 0 \ 0 \ 1 \ 1 \ 0 \\
 0 \ 0 \ 1 \ 1 \\
 \hline
 0 \ 0 \ 0 \\
 \hline
 1 \ 1 \ 0 \ 1 \ 0 \ 0 \ 1 \ 0
 \end{array}$$

Here, we are getting 2 shifts in partial product (we are pairing up 2 bits)

Carry Save Addition of Summands  
adding 3 no one by one



We can use here a look ahead adder

Slide no. 44 :

Partial Products :



→ not level of carry save add as we don't have 3 no. here.

Manual division :

$$13 \overline{)274}$$

26

14  
- 13  
—  
1

How computer will do it

0 21

$$\begin{array}{r}
 13 \overline{)274} \\
 -13 \\
 \hline
 14 \\
 -13 \\
 \hline
 1
 \end{array}$$

: since we are getting -ive no., Q=0. Now, we have to restore the no. ("2"), so, add the divisor to the remainder.

Example

$$7 \div 4$$

$$0111 \div$$

Init

Shi

Sut

Set

Res

## Simulator

The machine will perform division of binary no. in the same way.

$$\begin{array}{r}
 1101 \\
 -1101 \\
 \hline
 -ive \\
 +1101 \\
 \hline
 10 \\
 -1101 \\
 \hline
 \end{array}$$



whenever  $R > 0$  : put 1 in Q  
shift left, add 1 more bit  
 $R < 0$  : put 0 in Q,  
restore the R  
shift left, add 1 more bit

Example :

$$25 \quad 7 \div 4$$

$$0111 \div 0100$$



8 : using 4 bits  
but 1 bit is taken extra in case of overflow / underflow

$$01000 : 5 \text{ bits}$$

Initially :

$$0000$$

$$111$$

$$0100$$

Shift

$$0001$$

$$11\boxed{ }$$

2's comp. :

Subtractor

$$\underline{1100}$$

$$\begin{array}{r}
 1011 \\
 + 1 \\
 \hline
 1100
 \end{array}$$

Set Q.

$$\boxed{1}101$$

$$\begin{array}{r}
 1110 \\
 \hline
 1110
 \end{array}$$

Restore

$$\begin{array}{r}
 100 \\
 \hline
 0001
 \end{array}$$

$$\begin{array}{r}
 1110 \\
 \hline
 1110
 \end{array}$$

|                    |         |               |
|--------------------|---------|---------------|
| Shift              | 00 11   | 1 0 □         |
| Subtract           | 11 00   |               |
| Set q <sub>0</sub> | 1 1 1   |               |
| Restore            | +01 00  | 1 0 □         |
|                    | 00 11   | 1 0 0         |
| Shift              | 0 1 1 1 | 0 0 □         |
| Subtract           | 1 1 0 0 |               |
| Set q <sub>0</sub> | 0 0 1 1 |               |
|                    | 0 0 1 1 | 0 0 1         |
|                    |         | Remainder : 3 |

Problem : If negative R, restore.

### Non Restoring Division

$$1000 \div 11$$

$$111\ 0\ 0$$

|                                |       |       |
|--------------------------------|-------|-------|
| Initially                      | 00000 | 1000  |
| Shift                          | 00001 | 000□  |
| sign bit<br>rth: 0<br>subtract | 11101 |       |
| $\frac{1}{2} \# q_0 = 0$       | 11110 | 000 0 |

30/8/17

→ 32

|       |              |           |          |
|-------|--------------|-----------|----------|
| shift | A = 1<br>add | 1 1 1 0 0 | 00 0 □ □ |
|       |              | 0 0 0 1 1 |          |

$$1 \# q_0 = 0 \quad 1 1 1 1 1 \quad 00 0 0 0$$

|                        |              |           |           |
|------------------------|--------------|-----------|-----------|
| shift                  | A = 1<br>add | 1 1 1 1 0 | 0 0 0 □ □ |
|                        |              | 0 0 0 1 1 |           |
| D = q <sub>0</sub> = 1 | 0 0 0 0 1    | 0 0 0 0 1 |           |

|         |             |         |
|---------|-------------|---------|
| Shift   | 1 0 0 0 1 0 | 0 0 1 □ |
| subtrah | 1 1 1 0 1   |         |
|         | 1 1 1 1 1   |         |

13 ÷ 3

$$\begin{array}{r}
 \text{shift} \quad 00000 \\
 \text{subtract} \quad 00001 \\
 \hline
 11101
 \end{array}
 \quad
 \begin{array}{r}
 1101 \\
 101 \square
 \end{array}$$

$$\begin{array}{r}
 \text{shift} \quad 11110 \\
 \hline
 11101 \\
 0101 \square
 \end{array}$$

$$\begin{array}{r}
 \text{shift} \quad 00000 \\
 \hline
 00000 \\
 10101
 \end{array}$$

$$\begin{array}{r}
 \text{shift} \quad 00000 \\
 \text{subtract} \quad 00000 \\
 \hline
 11101 \\
 10101
 \end{array}$$

$$\begin{array}{r}
 \text{shift} \quad 11101 \\
 \hline
 11011 \\
 10101
 \end{array}$$

$$\begin{array}{r}
 \text{Add} \quad 00011 \\
 \hline
 11110 \\
 01010
 \end{array}$$

$$\begin{array}{r}
 00011 \\
 00001
 \end{array}$$

30/8/17

### Fractions

→ 32 bit no. : 0.

Range : 0.0000 ... 0 to 0.111 ... 1

$$1 - 2^{-32} = 0.9999999999999999$$

significant digits

$$x = \underbrace{m_1 m_2 m_3 \times}_{\substack{\text{mantissa} \\ \downarrow}} b^{e \rightarrow \text{exponent}}$$

$$0 \leq v(b) \leq 1 - 2^{-n}$$

→ In binary system, base = 2

We already know the base & we don't have to store it. (<sup>2:</sup> implied base)

↓  
can be -ive, +ive  
→ use 2's complement

↳ signed bit is used



### Float value representation (Sample) \*

$$\rightarrow -0.75 \quad 0.5 + 0.25 = 0.75$$

$-0.11$  ; Binary conversion

To get it in scientific notation, we should shift it.

$$\rightarrow -1.1 \times 2^1 \Rightarrow \boxed{\text{Normalised form}}$$

↓  
implied base

always we have to represent in such a way that this position has '1'. Hence, don't need to store this also.

We have to only store 1 → this '1' & exponent in 2's complement form



To check which no. it is:

$$\rightarrow 0 \boxed{1} \Rightarrow -$$

$$\rightarrow \underline{11111} \Rightarrow -1 \Rightarrow \underline{1} \times 2^{-1}$$

$\hookrightarrow$  2's complement

$$\rightarrow 10 \dots, 0 \qquad \qquad \qquad 1 \cdot 100 \dots \times 2^{-1}$$

\* we can't use all 0's and all 1's in <sup>fractional</sup> mantissa, they are reserved

\* If 3 bits, smallest non-zero no. :  $\underline{0} \underline{0} \underline{1} = 0.125$   
 next smallest  $\sim \underline{0} \underline{1} \underline{0} = \underline{0.25}$  : increment of  $0.125 = 0.25$   
 next  $\sim \underline{1} \underline{0} \underline{0} = \underline{0.375}$  ...

$$1.0011\ldots \times 10^{63} \rightarrow 0.10011\ldots \times 10^{-64}$$

But we don't want negative exponent.  
So, we do an Excess-p coding.

max Range :  $-64$  to  $+63$  : 7 bits allotted to exponent  
 $\Rightarrow$  We will add  $+64$  to all exponents.

14. 8 bit :  $-128$  to  $+127 \Rightarrow$  add  $+127$

so  $2^{-1} -1 + 63 = 62 \Rightarrow$  we will put 62 over these

benefit : we can compare 2 floats easily

• We don't have to take 2's complement

$\rightarrow$  make lowest exponent ( $-64$ ) 0 by adding 64.

15. IEEE 754 : We have to add  $+127$  (8-bit)



In case of 8-bit, we have to add  $+128$ . (least no :  $-128$ )

$\rightarrow 21.25$

$$\begin{array}{r} 21.25 \\ \downarrow \\ 16+5 \end{array} \quad \begin{array}{c} 1010101.01 \\ \hline 1.010101 \times 2^4 \end{array} \quad +4+127=131$$

Floating point

~~0100111100~~ → now, we have no problem of signed bit

|     |   |   |     |   |
|-----|---|---|-----|---|
| 124 | 0 | 2 | 132 | 0 |
| 62  | 0 | 2 | 65  | 0 |
| 31  | 1 | 2 | 33  | 1 |
| 15  | 1 | 2 | 16  | 0 |
| 7   | 1 | 2 | 7   | 0 |
| 3   | 1 | 2 | 3   | 0 |
| 1   |   |   | 1   |   |

|   |     |   |
|---|-----|---|
| 2 | 131 | 1 |
| 2 | 65  | 1 |
| 2 | 32  | 0 |
| 2 | 16  | 0 |
| 2 | 8   | 0 |
| 2 | 4   | 0 |
| 2 | 2   | 0 |
|   | 1   | 1 |

10000011

→ 4 bit

|   |          |           |
|---|----------|-----------|
| 0 | 10000011 | 010101... |
|---|----------|-----------|

|        |   |          |   |            |   |
|--------|---|----------|---|------------|---|
| Excess | 1 | 00010101 | ↓ | 1010110... | 0 |
|--------|---|----------|---|------------|---|

-ive

$$16 + 5 = 21$$

$$21 - 127$$

$$-106$$

$$1 \cdot 1010110... \times 2^{-106}$$

$$\Rightarrow 2.09375$$

$$\begin{array}{r} 12 \\ 3125 \\ \times 5 \\ \hline 15625 \end{array}$$

$$\begin{array}{r} 0.51 \\ 0.125 \\ 0.3125 \\ 0.15625 \\ 1.09375 \end{array}$$

1/09/17

Excess notation : We want our exponent to be always positive

→ 7-bit exponent  $\Rightarrow$  -64 to +63

To get all the exponents, we should add 64

$\Rightarrow$  0 to 127 : Excess 64 coding

hence, we can represent 128 no as all ~~bits~~ are unsigned no.

While denormalizing, subtracting 64 from the exponent.

$$E' = E_{\text{true}} + p$$

→ 4 bit for exponent  $\Rightarrow$  excess 8-bit coding. (-8 to +7)  
 0 to +15

Suppose we decide to use excess 7-coding

\* -1 to +14  $\Rightarrow$  0 to +14  
 ↓  
 not true      True exponent : -7 to +7

(only 15 mantissa)

excess 6-coding (add +6) -2 to +13  $\Rightarrow$  0 to 13

$E_{\text{true}}$  : -6 to +7

only this much range I can represent.

### IEEE Notation :-

Bits available for exponent : 8

→ Range : -128 to +127

IEEE has decided excess -127 coding

Range : -1 to +254

actual range : 0 to +254 (Not representing negative no.)

However, IEEE reserves 0 & 255 for some other purpose.

→ available range : +1 to +254

$E_{\text{true}}$  (-127) -126 to +127  $\Rightarrow$  True range available to us.

How can we represent 0,  $\infty$ , NAN (Not a no.) ??  
 ↳ not a real no.

Sign

Exponent

Mantissa

0

00000000

00.....0

Pos Zero

±

00000000

00.....0

Neg Zero

(Any machine using signed bit repn / 1's complement representation would need this)

0

1111111

00.....0

Pos. Infinity

1

"

"

Neg. Infinity

"

Non-zero no.

NaN Nan

→ If exponent e is represented in excess-p notation :

$$a \cdot 10^{(x+p)} * b \cdot 10^{(y+p)} = (a \cdot b) 10^{(x+y+2p)}$$

Ideally, it should have "x+y". But acc. to IEEE,  
it should have been "x+y+p".

But here, we have "x+y+2p". So, we have  
to subtract p to avoid Bias.

Eg.  $2.3516 \times 10^3$ 

$$= 2351.6$$

↓

 $4.25896 \times 10^2$ 

$$\downarrow = 425.896$$

↓

\* First, convert in binary form.

Excess P

Normalize

 $1.0 \times 2^{-}$  $1.0 \times 2^{-}$ 

S E+p M

- \* While shifting the mantissa, don't forget 1 before the decimal which we remove at the time of representation.

Guard Bits :-

1.011

x 1.101

larger mantissa but, only can store 3 bits  $\Rightarrow$  losing some digits

- $\Rightarrow$  Keeping some extra digits bits called Guard Bits

Floating Point Operations : FLOPs

Truncation / Rounding

- 1) Straight chopping
- 2) Von Neumann Rounding
- 3) Rounding (Used in IEEE)

If MSB of Guard bit : 1

$\Rightarrow$  1 is added to LSB of the retained bits.

20

25

30

## INSTRUCTION SET ARCHITECTURE (ISA)

- ↳ Each machine has its own ISA, which actually runs on the machine.



We may have our program in any language, the compiler will convert it in ISA & then, compiled into object module.

Object module :



After compiling, program is converted into Assembly language, & then into machine language.

→  $c = a + b$

|           |
|-----------|
| a → 4000  |
| b → 40004 |
| c → 4008  |

LOAD R1, 4000

LOAD R2, 4004

ADD R1, R2

STORE R2, 4008

→ ISA : Interface b/w software & hardware.

→ In Pentium 4, some registers are



32-bit  
EAX

3 byte data

16 bit of data is used

(earlier programs which use this much data are able to get executed)

Upgraded machine must be backward compatible.

### ① Memory Model :- ISA

Sometimes, these documents are available for everyone to see,  
sometimes not (Intel doesn't)

(assuming 32-bit memory model)



Now, if we have to store an integer, if we start storing at 1, if it is word addressable, we have to access 2 words & get the int. However, here, in byte addressable, it will be non-aligned memory.

non-aligned memory

advantage : saving space

disadvantage : involve complexity in accessing & decoding the data  
(can use in case of smaller programs)

Other option :- always start at word boundary



→ aligned memory

→ First you have to choose this. Then, byte addressable / word addressable

$$\rightarrow 1 \text{ MB} = 2^{20} \text{ bytes}$$

byte addressable  $\Rightarrow$  20 bits to store address  
Word " "  $\Rightarrow$  18 " "

- $\rightarrow$  Same memory, but no. of bits in address has reduced.  
Advantages ?? — ①

Byte addressable memory :

- $\rightarrow$  accessibility improved (advantage) } disadvantage of word addressable memory

If word addressable, we have to  
 transfer whole word to CPU & then  
shift & get it to lowest bit, so, additional  
work instructions have to be executed.

- ④ Some bits have to be given to address

Opcode Reg MLOC

$\downarrow$        $\downarrow$  to store address, bits have to be given to this.  
7      5      20 bits      Let total bits = 32

- $\rightarrow$  If there are 5 bits to store address of registers, max no. of registers  $= 2^5 = 32$  (R0 - R31)
- $\leftarrow$  Only 7 bits for opcodes, max. opcodes possible = 128 (not sufficient).

test )

0000000 → ADD Reg, Reg

0000001 → ADDI Reg, # N

0010000 → ADDM Reg, LOCA : adding value to registers, storing in regm

Usually not allowed in  
the processors  
CISC instruction }

: adding value directly from memory  
to register & store.

- $\rightarrow$  Op Code in all 3 cases will be diff. otherwise the decoding unit won't be able to perform properly

① ADD R3,  
0000000, 00  
1101101  
The decoder  
add 2 Reg  
& 2nd Reg  
in o/p r

② ADDI R

But if

0000000

The de  
ALU o

ALU

③ If c  
regis  
tho

24

30

31

① ADD R3, R5

0000000, 00011    00101  
 bits 7                5                5                X

: designing ISA

The decoding unit will find it is add<sup>n</sup> op<sup>n</sup> (1<sup>st</sup> 7 bits) which add 2 Reg. It will bring data from R3 (next 5 bits) to IR & data from R5 (next 5 bits) to 2nd IR & bring op in o/p register

② ADDI R3, # 1023 → Immediate    (Immediately available in the instruction, don't have to look for it in registers)

available : 20 bits

max no. we can give :  $2^{20}$

But if we need to store no upto  $2^{10}$ . (1024)

000000 | 00011    111111111  
 X

The decoding unit will get data from R3 (next 5 bits) to IR in ALU and directly send next 10 bits will get stored in IR in ALU

③ If o/p has to be sent back to memory, o/p from o/p register in ALU to MDR & add will be stored in MDR & then data will be sent to memory through data bus