

## Comp Architecture :-

Comp Arch is Comp + it's organisation

Comp has 5 basic components:

1) Processor — control unit + Data path  
Data path - data travelling from one hardware component to another.

3 types of data path : control bus, data and address bus.

whether data should go from memory to R or vice versa  
whether read/write operation should be performed is done by control bus.

2) Memory

3) I/O devices Input

4) Output

18101089 computer = combination of hardware and software

Computer Architecture : computer + it's instruction set organisation

(slide) Computer Architecture

→ Layer of Computer or relationship among software and hardware and instruction set.

Above green : software related

Below " : hardware . II

How we can go from low level to high and vice versa or relation of hardware/software with



Compiler  
is system  
software

instruction set is called computer arch.  
SAP :-

memory lower level ; instruction store

higher " , data store

You must know memory organisation  
As a CSE engineer, you have to consider cost + needs. (you need to have hardware knowledge).

→ In MIDS arch, it is assembly lang.

a[i], b[i] → array // content should be moved to register.  
c → variable stored in register

↳ 14 register

Arithmetic operations must involve register  
not memory in MIDS arch.

Read  $\rightarrow$  load word (lw) } In MIDS  
write  $\rightarrow$  store word (sw)

We need to know instruction format to convert high  
to low level.

Why?

We need to know about generations, OS, application,  
arch changes for technological changes.  
Depending on performance and technology, new  
generations of comp is developed.

Why?

1st parameter  $\rightarrow$  Performance

How to measure performance?

Time Comparison.

Performance (slide)

Ques 1:

Depending on the work and hardware requirements,  
diff processor have diff performance, your personal  
requirements matter.

Ques 2:

OS does resource allocation, (new machine)

For new student, new pc should be added,  
for faster performance, upgrade processor.



Role of performance :-  
parameters should be kept in mind.

$$\text{Response time} = \text{CPU time} + \text{execution time}$$

$$\text{Throughput} = \frac{\text{Total work load}}{\text{Time}}$$

In a program, how many instructions are there, and how much time is required to execute all instructions.

Relative performance - comparison with other systems.

(CPI - Clock per Instruction,

CPU clocking, instruction count - how many ins in a program.

→ more passenger + more speed = better performance.  
(slide) Gmp Performance :-

\* RT → Time it takes to execute ~~individual~~ program ins written by user.

→ how long you have to wait for database query.

CPU Time = OS allocation + Execution Time

Database Query - Collecting data from resource + time taken to run the data together is called response time.

Throughput - n ins execution response time is total workload + time

- \* throughput (Ans) new processor

- \* Response time (Ans) new machine

→ user CPU time concern more as when we're working for an application in an IDE or text editor and we compile and then we see output.

$P_x > P_y$  The lesser the execution time, the better performance.

$$\frac{1}{E_x} > \frac{1}{E_y}$$

$E_x < E_y$ , So X is better.

Q. Afia = 10 sec Tahmid = 15 sec  
performance better by how much?

ET of A = 10 sec Execution Time

" " T = 15 sec

$$P_{\text{of } A} = \frac{1}{10}, \quad \frac{P_A}{P_T} = n, \quad P_A > P_T$$

$$P_{\text{of } T} = \frac{1}{15}$$

$$n = 1.5$$

After 1<sup>st</sup> page:

- Every instruction has opcode for identification from instruction register.
- This identification is same for all instructions.
- If in a processor, fetch cycle requires 2 cycles, all instruction will require 2 cycles.
- Many processor accesses memory to perform operations and many processor uses register. We use register to perform operations.
- We do have to memory read/write, to read from memory and move to register, to write from register to memory, that's why memory reference  $\rightarrow M+R$ . requires. So the cycle will be diff here.
- Register requires less time. So register refs is faster as you don't need to access memory or find out the memory address.
- when programming, if you work with registers it'll be faster.
- Base cycle and Execution Cycle, when you'll have to implement in hardware, you have to keep in mind small things.
- Every block should be designed by you. Whether your processor is taking single or multi cycle for instruction execution. There is also pipelining, which is also parallel processing. And single multi cycle is sequential which means at a time the processor will execute multi instruction

(not multi cycle).

Cycle - single cycle Instruction will execute in single cycle.

" " " " " multi cycle.

~~No~~ Pipelining - At a time, multiple instruction will be executed.

Identification - whether the searched data is in cache memory or main memory (RAM)

Placement - if in cache, then can be accessed from cache. If not, if in main, then where will you place? What will be the strategy?

Replacement - If cache is full, which data should be replaced by new data? There are algo for this.

If in cache, it'll be faster. How you will identify hit rate? What are the parameters?

mid - 20% - 10% - viva - CO-wise  
 → when ques asked, must switch on camera.

## Chapter - 1

- 1) CPU Performance and it's factors
- 2) Instruction performance and it's factors

### Clock cycles:

Last lecture we have calculated performance based on execution time.

- In a program, how many instructions are present, (not in seconds), but in cycle time.
- How much time is required to run a program, is CPU execution time.

$$\textcircled{1} \quad \frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

$$\text{CPU execution time} = \frac{\text{No. of cycles}}{\text{clock}} \times \text{cycle time}$$

- Cycle time = time between the ticks.
- Clock cycle time is inverse of clock rate.

1<sup>st</sup> performance equation.

If instead clock cycle time, clock rate is given.  
Inverse to find clock cycle time.

Perform inc, if clock cycle dec and clock rate inc  
Perform inc  $\frac{\text{time}}{\text{no. of cycles}}$

→ No. of cycles differs depending on the type of instruction.

→ Mul is one kind of addition.

→ Diff hardware design of mul and addition.

CPU

Execution time for A = 10 sec      clock rate = 400 MHz

" " " for B = 6 sec

No. of cycles for A =  $\frac{10}{400 \text{ MHz}} = 10 \times 400 \text{ MHz}$   
~~400 MHz~~ = 4000 M cycles

CPU time B =  $1.2 \times \text{clock cycles of A}$

clock rate B =  $1.2 \times 4000 \text{ M}$

$$\frac{1.2 \times 4000 \times 6}{= 800 \text{ MHz}}$$

clock rate inc, performance inc. (clock cycles dec, performance inc.)

Viva: What clock rate should we tell the designer to target?

To run the same program in 6 sec, B must have twice the clock rate of A.

Instruction Performance:

CPU clock cycles : instruction for a program  $\times$  avg. clock cycles per instruction (CPI)

(MIPS) - millions of ins per sec  
cycles per sec - clock rate

instructions per sec - ins rate

$$\text{Execution time} = \text{Ins} \times \text{avg CPI} \times \text{cycle time}$$

clock cycle replaced.

ISA - Ins set arch.

~~$$A \text{ c.c.t} = 10 \text{ ns} \quad \text{CPI} = 2.0$$~~

~~$$E.T(A) = I \times 2 \times 10 \text{ ns} = 20 \times 10^{-9} \text{ s}$$~~

~~$$E.T(B) = I \times 1.2 \times 20 \text{ ns} = 24 \text{ ns}$$~~

Machine B faster by  $24 \text{ ns} - 20 \text{ ns} = 4 \text{ ns}$ .

~~$$CPU \text{ C.C} = \text{Ins} \times \text{Avg CPI}$$~~

~~$$C.C(A) = I \times 2.0$$~~

~~$$C.C(B) = I \times 1.2$$~~

~~$$CPU \text{ time} = C.C \times \text{cycle time}$$~~

~~$$time(A) = I \times 2 \times 10 \text{ ns} = 20 \text{ Ins}$$~~

~~$$time(B) = I \times 1.2 \times 20 \text{ ns} = 24 \text{ Ins}$$~~

$$\frac{P_A}{P_B} = \frac{E_B}{E_A} = \frac{I \times 1.2 \times 20 \text{ ns}}{I \times 2 \times 10 \text{ ns}} \geq 1.2$$

A is faster

A is 1.2 times faster than B.

CPU performance  $\Rightarrow$  CPU time = Ins. count  $\times$   
 equation CPI  $\times$  c.c.t

(or)

CPU time = Ins. count  $\times$  CPI, as the clock  
 clock rate rate is inverse  
 of c.c.t

CPI Example II.

same machine, two code sequences

3 types of ins.: A, B, C

$$A = 1 \text{ cycle}$$

$$B = 2 \text{ "}$$

$$C = 3 \text{ "}$$

Instruction:

$$A = 2$$

$$B = 1$$

$$C = 2$$

$$A = 4$$

1<sup>st</sup> code

2<sup>nd</sup> code

$$B = 1$$

$$C = 1$$

1<sup>st</sup>

code

CPU time

$$S = 5 \times CPI$$

$$CPI = 1$$

$$\text{time} = 5 \times \cancel{1} \times \cancel{x}$$

18101089

$$\begin{aligned} \text{CPU C.C.} &= \sum_{i=1}^n (\text{CPI}_i \times C_i) \\ &= (1 \times 2) + (2 \times 1) + (3 \times 2) \\ &= 10 \text{ cycles} \end{aligned}$$

(c.s.2)

$$\begin{aligned} \text{CPU C.C.} &= (4 \times 1) + (1 \times 2) + (1 \times 3) \\ &= 9 \text{ cycles (faster)} \end{aligned}$$

$$\begin{aligned} \text{CPI (c.s.1)} &= \frac{10}{5} \\ &= 2.0 \end{aligned}$$

CPI

$$(\text{c.s.2}) = \frac{9}{6} = 1.5$$

$\therefore \text{C.P.I. less}$ ,

Performance will

be faster.

MIPS Example H.W

18101089

## Chapter - 3 Instructions : Language of the Machine

MIDS is a processor, micro comp without interloop pipelining states

MIPS - million ins per sec.

MIPS arithmetic ins[!] uses exactly 3 variables.  
that's why it is said to be restrictive.

3 operands : des, 1<sup>st</sup> source operand, 2<sup>nd</sup> " "  
register or register  
or operand.

→ There must be 3 operands.

If,  $A = x + y - z$  // assigned to register.

$$A = x + y - z ;$$

$\downarrow \quad \downarrow \quad \downarrow$

$S_0 \quad S_1 \quad S_2 \quad S_3$

there are 32 registers.

s - means same register.

- Add \$to, \$S<sub>1</sub>, \$S<sub>2</sub>; to - temporary register
- Sub \$S<sub>0</sub>, \$to, \$S<sub>3</sub>;

⇒ MIPS arithmetic provides restriction 3  
operands must be present, for simplification of  
hardware design.

4 design principle:

1<sup>st</sup> simplicity favours regularity.

→ every arithmetic ins format is same.

2<sup>nd</sup> Operands must be in registers (0-31)

from variables, high level to low level has to be converted. Data length, memory length, register length must be represented in bits.

→ smaller is faster. If you can perform all operations and work within 32 bits, it be faster.

→ The more hardware, the more clock cycle time.

→ So, smaller is faster.



register splits to memory if not enough registers.

memory organisation:  
(byte addressing) - 8 bit length for every address



base and offset = physical.

base + offset = "

segment + offset. = "

MIPS, word = 4 bytes or 32 bits

Load / Store Instructions:

→ both are memory reference instruction

load → memory to register (read)

store → R to m (write)

A [8] — memory's 8 no slot content

for MIPS



load → des, source

store → source, des

S3 - A [ ] base register

32 - offset ( $8 \times 4$ )

MIPS data 4 byte  
length is 32 bit which  
is 4 byte in byte  
addressing.

offset = index  $\times$  4

for MIPS  
(byte address)

$$A[12] = x + A[3] - y;$$

1<sup>st</sup> step: variables to be assigned to registers.

MIPS Assembly code: [25 - 20 marks in mid]

1<sup>st</sup> instructions:

formatting wrong - 0 marks

lw \$t0, 12(\$\$s<sub>0</sub>);

Add \$t0, \$s<sub>1</sub>, \$t0;

Sub \$t0, ~~\$s<sub>2</sub>~~ \$t0, \$s<sub>3</sub>;

sw \$t0, 48(\$\$s<sub>0</sub>);

→ Variables gets directly assigned to registers after compilation.

→ memory reference instructions should be read / write in register (Array).

memory read → load, memory write → store

$$A[2] = A[12] + B[3] + C[2];$$

lw \$t0, 48(\$\$s<sub>0</sub>);

lw \$t1, 12(\$\$s<sub>1</sub>);

lw \$t2, 8(\$\$s<sub>2</sub>);

Add \$t0, \$t0, \$t1;

Add \$t0, \$t0, \$t2;

sw \$t0, 8(\$\$s<sub>0</sub>);

$$A[i] = x - y - A[i+1] \text{ where } i=100.$$

$s_0 \leftarrow s_1 \leftarrow s_2 \leftarrow s_0 \leftarrow$

lw \$t\_0, 404(\$s\_0);  $A[100] = x - y - A[101];$   
 Sub \$t\_0, \$s\_2, \$t\_0;

Sub \$t\_0, \$s\_1, \$t\_0;

sw \$t\_0, 400(\$s\_0);

CSE 317 High level lang



Machine language:

There are 3 class of instruction format:

- 1) R-type      2) I-type      3) J-type
- (Register reference)      (Memory reference)      (Jump instruction, regular instruction)

1) R-type instruction format : (Arithmetic Instructions)

ALU has 2 parts

$\rightarrow$  AU (Arithmetic Unit)

$\rightarrow$  LU (Logical Unit)

$\rightarrow$  AU, LU has different functions, where is the to which function it's going, execution is taking place for which type of instruction or which type of operation

For Arith Ins, the format is 32 bit long.

|    |    |    |    |       |               |                |
|----|----|----|----|-------|---------------|----------------|
| OP | rs | rt | rd | shift | function code | 6 fields total |
|----|----|----|----|-------|---------------|----------------|

MIPS = 32

registers

① Operation code (length = 6 bits)

source 1 (rs)

Add  $\underline{\$t_0}$ ,  $\underline{\$s_1}$ ,  $\underline{\$s_2}$ ;

function  
defined

destination  
(rd)

Data length =

32 bit.

Every register is 5 bits.

| Op     | rs | rt | rd | shift | func. code | total = 32 bits |
|--------|----|----|----|-------|------------|-----------------|
| 6 bits | 5  | 5  | 5  | 5     | 6 bits     |                 |

① Operation code

② 1st source operand or source register

③ 2nd source operand or register

④ Destination operand or register

⑤ shift amount

⑥ specific arithmetic function

Diff between func. code and op code?

→ By looking at OP code, you can decide it is arithmetic operation or not.

→ If arithmetic, in arith which particular func that will be selected by func code.

→ In ALU,  $S_2 \rightarrow$  Mode selection, AU or LU

→ Every ins. 32 bits long.

$A = B + C$ ; write machine code for this ins

so,  $S_0 = 0$ ,  $S_1 = 1$ ,  $S_2 = 1$  (OP code for arithmetic ins  $\approx 0$ )

function code = 0 in 6 bits = 000000 // R-type  
function code = 35 for add ins. OP code

register no.  $\{ S_0 = 16$  // No need to memorise  
 $S_1 = 17$  OP code or register no.  
 $S_2 = 18$  Technically assume reg no.

Op code and func code in 6 bits in 5 bits.

In format,  
// assigning

|   |    |    |    |   |    |
|---|----|----|----|---|----|
| 0 | 17 | 18 | 16 | 0 | 35 |
|---|----|----|----|---|----|

Add \$ \$ \$ , \$ \$ \$ , \$ \$ \$ ;  
rd rs rt

In case of  
add sub,  
shift = 0.

| ① Assigning | ② Represent in binary machine code. |       |       |       |           |
|-------------|-------------------------------------|-------|-------|-------|-----------|
| OP          | rs                                  | rt.   | rd    | shift | func code |
| 0000000     | 10001                               | 10010 | 10000 | 00000 | 100011    |

For sub, same reg can be used except sub op code needed. sub's func code needed / op code will be same.

Op code - R, I, J type Func code - Add, Sub, inc, dec for R-type.  
→ helps to identify or convert low level to high level.

$$A = D + E + F ;$$

$\downarrow s_0 \quad \downarrow s_1 \quad \downarrow s_2 \quad \downarrow s_3$

Add \$t0, \$s1, \$s2;  
Add \$s0, \$t0, \$s3;

1 full set question from this topic

2 statement, 2 machine code.

2) I-type instruction.       $\xrightarrow{\text{cons}} \xrightarrow{\text{base}}$   
 $\xrightarrow{\text{lw}} \xrightarrow{\text{des.}}$

sw \$t0, 12(\$s0);      high level  $\rightarrow A[3] = B + A[8]$



Eg:-  $lw = 42, sw = 45 \}$  op code.  $t0 = 8, s0 = 16$

|    |    |    |   |    |
|----|----|----|---|----|
| i) | 42 | 16 | 8 | 32 |
|----|----|----|---|----|

|   |        |       |       |  |
|---|--------|-------|-------|--|
| = | 101010 | 10000 | 01000 |  |
|---|--------|-------|-------|--|

|     |    |    |   |    |
|-----|----|----|---|----|
| ii) | 45 | 16 | 8 | 12 |
|-----|----|----|---|----|

000000000000100000

|        |       |       |  |                     |
|--------|-------|-------|--|---------------------|
| 101101 | 10000 | 01000 |  | → 00000000000001100 |
|--------|-------|-------|--|---------------------|

Example:

$s_0 \leftarrow A[3] = X + B[100] - Y$ ; Convert this high level statement into MIPS Assembly machine code.

$lw \$t_0, t_0 = 8, s_0 = 16, s_1 = 17, s_2 = 18,$   
 $s_3 = 19, Add = 32, Sub = 35,$   
 $lw = 42, sw = 45.$

$lw \$t_0, 400(\$s_2);$

$Add \$t_0, \$s_1, \$t_0;$

$Sub \$t_0, \$t_0, \$s_3;$

$sw \$t_0, 12(\$s_0);$

MIPS Machine code:

|    |    |   |     |   |        |
|----|----|---|-----|---|--------|
| 42 | 18 | 8 | 400 | → | binary |
|----|----|---|-----|---|--------|

|   |    |   |   |   |    |          |
|---|----|---|---|---|----|----------|
| 0 | 17 | 8 | 8 | 0 | 32 | → binary |
|---|----|---|---|---|----|----------|

|   |   |    |   |   |    |          |
|---|---|----|---|---|----|----------|
| 0 | 8 | 19 | 8 | 0 | 35 | → binary |
|---|---|----|---|---|----|----------|

|    |    |   |    |          |
|----|----|---|----|----------|
| 45 | 16 | 8 | 12 | → binary |
|----|----|---|----|----------|

Overview: shift left or right

- i) SLL, SLR → R
- ii) Addi, Subi → add immediate → I      conditional
- iii) b eq, bne → branch equal, " not " → Jump → I
- iv) AND, OR, NOT... → logical operation → R
- v) Jump → unconditional Jump → J

list of registers

J Exit;

J Address → Label;

opcode  
program counter → counts ins and indicates next ins.

Format of J-type:

OP | 17 Address | 2 is fixed for Jump

|   |  |
|---|--|
| 2 |  |
|---|--|

17 6 bits 26 bits

SLL = 0 SLR = 2

SLL \$S<sub>1</sub>, \$S<sub>2</sub>, 4;  
rd      rt

$$S_1 = S_2 \ll 4$$

|    |    |    |    |   |   |
|----|----|----|----|---|---|
| 0  | 0  | 18 | 17 | 4 | 0 |
| rs | rt | rd |    |   |   |

|     |                  |    |                  |   |    |
|-----|------------------|----|------------------|---|----|
| 0   | 0                | 18 | 17               | 3 | 2  |
| str | \$S <sub>1</sub> | ,  | \$S <sub>2</sub> | , | 3; |

Book 2.4 Representing Instructions in the Computer.

→ Binary to Hex and Back.

→ OP code → 0 (R), 2 (J), Value (I)

Design principles explained in books.

Loop is unconditional Jump (J-type)

Pg :- 78 MIPS machine language table.

Figure 2.18 MIPS register conventions.

Pg : 101 See before exam.

CT Syllabus : 1) Intro 2) Performance  
3) Lang of Comp → Instructions

$R \rightarrow \text{str} \rightarrow 2 \text{ bits}$     $R \rightarrow S \text{ } 11 \rightarrow 3 \text{ bits}$

|   |   |   |   |   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 6 | 1 | 7 | 1 | 2 | / | 2 |
|---|---|---|---|---|---|---|---|---|---|---|

|   |   |   |   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 6 | 1 | 7 | / | 3 | 0 |
|---|---|---|---|---|---|---|---|---|---|

Ques type : Performance math, short ques, problem solving  
high level to low level statement, low level to high level conversion.

Definition : Comp Arch, Ins set, Comp layer, abstraction.

Answer to Question 1 :-

Comp A

Clock cycle time = 250 ps   CPI = 1.8

Comp B

Clock cycle time = 500 ps   CPI = 1.2

For Comp A :

CPU clock cycle = Instructions  $\times$  Avg CPI

$$= I \times 1.8$$

CPU execution time =  $\frac{\text{clk}}{\text{cycle}} \times \text{cycle} \times \text{clk cycle time}$

$$= I \times 1.8 \times 250 \text{ ps}$$

$$= I \times 450 \text{ ps}$$

For Comp B :

"

$$= " \times "$$

$$= I \times " \times 1.2 \times "$$

$$= I \times " \times 1.8 \times "$$

$$500 \text{ ps} = I \times 900 \text{ ps}$$

$$\frac{P_A}{P_B} = \frac{E_B}{E_A} = \frac{2 \times 900 \text{ ps}}{2 \times 450 \text{ ps}}$$

$$= 2$$

~~A is faster~~ A is 2 times faster than B.

Answer to Question 2:

3 types of instructions:

1) R-type

2) I-type

3) J-type

1) R-type

|    |    |    |    |       |               |
|----|----|----|----|-------|---------------|
| OP | rs | rt | rd | shift | function code |
|----|----|----|----|-------|---------------|

2) I-type

|    |    |    |        |
|----|----|----|--------|
| OP | rs | rt | offset |
|----|----|----|--------|

3) J-type

|    |         |
|----|---------|
| OP | address |
|----|---------|

Answer to Question 3 :-

My ID = 18101089

$$t_0 = 17, S_1 = 18, S_3 = 19 \\ S_0 = 20$$

$$x = 89 \quad \begin{matrix} \nearrow s_1 & \nearrow s_0 \\ A[89] = 1 + A[32] - j; & \nearrow s_3 \end{matrix}$$

① lw \$t0, 128(\$S0);

② Add \$t0, \$S1, \$t0;

③ Sub \$t0, \$t0, \$S3;

④ sw \$t0, 356(\$S0);

①

|    |    |    |     |
|----|----|----|-----|
| 35 | 20 | 17 | 128 |
|----|----|----|-----|

②

|   |    |    |    |   |    |
|---|----|----|----|---|----|
| 0 | 18 | 17 | 17 | 0 | 32 |
|---|----|----|----|---|----|

③

|   |    |    |    |   |    |
|---|----|----|----|---|----|
| 0 | 17 | 19 | 17 | 0 | 34 |
|---|----|----|----|---|----|

④

|    |    |    |     |
|----|----|----|-----|
| 43 | 20 | 17 | 356 |
|----|----|----|-----|

17 18 19 17 0 34

|   |   |   |   |   |   |
|---|---|---|---|---|---|
| 1 | 7 | 1 | 8 | 1 | 9 |
|---|---|---|---|---|---|

17 18 19 17 0 34

|   |   |   |   |   |   |
|---|---|---|---|---|---|
| 1 | 7 | 1 | 8 | 1 | 9 |
|---|---|---|---|---|---|

17 18 19 17 0 34

|   |   |   |   |   |   |
|---|---|---|---|---|---|
| 1 | 7 | 1 | 8 | 1 | 9 |
|---|---|---|---|---|---|

(4)

|    |    |    |     |
|----|----|----|-----|
| 43 | 18 | 17 | 356 |
|----|----|----|-----|

=

|        |       |       |                  |
|--------|-------|-------|------------------|
| 101011 | 10010 | 10001 | 0000000101100100 |
|--------|-------|-------|------------------|

Arithmetic for Computers:

## Chapter 3

\* watch lecture from beginning

ALU's base is Adder. Ins executions takes place with the help of adders.

→ 2 represent by doing 2's complement.

$$A - B = A + B' + 1$$

(slides skipped)

→ MUX - "input" signals selection (multiple input single output). ( $2^n : 1$ ) If 2 unit input one 32 bit output.

→ MIPS data bus length - 32 bits.

→ HA can calculate single bit. When you work with more than one bit, you'll need FA( $n$  bits).

→ In HA, Carry in doesn't work as input

16 no slide, not needed much for exam

But when we study mul, all will be needed.

→ For mul in MIPS, there are 3 versions. Update for faster mul. 4th version for unsigned number also.

m bits  $\times$  n bits  $\geq$  m+n bits product

ver-2

$M_n \leftarrow 00110 \times 00100 \rightarrow M_x$   
6x4 using 5 bits multiplier

|   | step                                                                                              | $M_x$ | $M_n$ | Product     |
|---|---------------------------------------------------------------------------------------------------|-------|-------|-------------|
| D | 1. Test $m_x = 0$<br>2. NOP<br>3. $\overrightarrow{P}$<br>4. $\overrightarrow{M_x}$               | 00100 | 00110 | 00000 00000 |
| 1 | 1. Test $m_x = 0$<br>2. NOP<br>3. $\overrightarrow{P}$<br>4. $\overrightarrow{M_x}$               | 00010 |       | 00000 00000 |
| 2 | 1. Test $m_x = 1$<br>2. $P_L = P_L + M_n$<br>3. $\overrightarrow{P}$<br>4. $\overrightarrow{M_x}$ | 00001 |       | 00110 00000 |
| 3 | 1. Test $m_x = 0$<br>2. NOP<br>3. $\overrightarrow{P}$<br>4. $\overrightarrow{M_x}$               | 00000 |       | 00011 00000 |
| 4 | 1. Test $m_x = 0$<br>2. NOP<br>3. $\overrightarrow{P}$<br>4. $\overrightarrow{M_x}$               | 00000 |       | 00001 10000 |
|   |                                                                                                   |       |       | 00000 11000 |

Ver-1 of Mul:  $M_x = 32$  bits  $M_n$  and  $P = 2 \times 32$  bits (64)

$$M_x = M_0 M_1 \dots M_{31}$$

1. LSB of  $M_x = 0$ , then step 2

a) " " = 1, Add  $P = M_n + P$ , then step 2 ( $M_n$ )

$$\begin{array}{c} \xleftarrow{M_n} \\ 2. \quad \xrightarrow{M_x} \\ 3. \quad \xleftarrow{M_n} \end{array}$$

$$M_x = n \text{ bits} \quad P_{\text{prod}} = M_n = 2 \times n \text{ bits}$$

e.g.: - Multiplier = 4 bits (4 iterations)

$$M_n \rightarrow 0101 \times 0011 = 15 (1111)$$

| Iteration | steps                                                                                     | (8)<br>$M_n$ | (4)<br>$M_x$ | $P$ (8 bits) |
|-----------|-------------------------------------------------------------------------------------------|--------------|--------------|--------------|
| 0         | Initialisation                                                                            | 0000<br>0101 | 0011         | 0000 0000    |
| 1         | 1. $M_{x_0} = 1$<br>1.a) $P = P + M_n$<br>2. $\xleftarrow{M_n}$<br>3. $\xrightarrow{M_x}$ | 0000<br>1010 | 0001         | 0000 0101    |
| 2         | 1. $M_{x_0} = 1$<br>1.a) $P = P + M_n$<br>2. $\xleftarrow{M_n}$<br>3. $\xrightarrow{M_x}$ | 0001<br>0100 | 0000         | 0000 1111    |
| 3         | 1. $M_{x_0} = 0$<br>1.a) NOP<br>2. $\xleftarrow{M_n}$<br>3. $\xrightarrow{M_x}$           | 0010<br>1000 | 0000         | 0000 1111    |

|   |                                                                                        |                    |         |                             |
|---|----------------------------------------------------------------------------------------|--------------------|---------|-----------------------------|
| 4 | 1. $M_{x_0} = 0$<br>1. a) NOP<br>2. $\overleftarrow{M_n}$<br>3. $\overrightarrow{M_x}$ | 0 1 0 1<br>0 0 0 0 | 0 0 0 0 | 0 0 0 0 1 1 1<br>Ans:- (15) |
|---|----------------------------------------------------------------------------------------|--------------------|---------|-----------------------------|

- Product register one side always unused, so every bit generations requires power consumption less bits used, faster operations performed.
- It requires more CLK cycles.

Ver - 2 LSB of  $M_x$

$$\textcircled{1} \quad P(L)$$

$$= P(L) + M_n \text{ if } 1$$

$$M_n = M_x = 32 \text{ bits}$$

$$\textcircled{2} \quad \overrightarrow{P}$$

$$P = 64 \text{ bits}$$

$$\textcircled{3} \quad \overrightarrow{M_x}$$

$$M_n \text{ and } M_x = n \quad P = 2 \times n$$

Ver - 3

$$M_n = M_x = n \text{ bits}$$

$$P = 2 \times n \text{ bits}$$

① Test lsib of  $P$  = step 2 if 0

$$\text{iteration} = M_x \text{ bits}$$

$$\textcircled{2} \quad \overrightarrow{P}$$

$$= P(L) = P(L) + M_n \text{ if } 1.$$

$$P(R) = M_x$$

No. ③ step.

$$M_n = n$$

$$P = 2 \times n$$

Motivating Booth's Algo : (for unsigned number mul)

→ Diff between arithmetic right shift and logical right shift.

→ arithmetic - msb will be copy of previous bit.

If  $M_x = n$ , iteration  $n$ ;

Till Mul for Mid.

Ver 3

5 x 5 using multiplier as 5 bits

| iteration | steps                                                                         | Mn    | P                                                          |
|-----------|-------------------------------------------------------------------------------|-------|------------------------------------------------------------|
| 0         | init                                                                          | 00101 | $P_L$<br>00000<br>$P_R$<br>00101                           |
| 1         | 1. $P_0 = 1$<br>1a) $P_{(L)} =$<br>$P_{(L)} + Mn;$<br>2. $\overrightarrow{P}$ | 00101 | $00101$<br>$00101$<br>$\rightarrow 00010 \ 10010$          |
| 2         | 1. $P_0 = 0$<br>1a) NOP<br>2. $\overrightarrow{P}$                            | 00101 | 00001<br>01001                                             |
| 3         | 1. $P_0 = 1$<br>1a) $P_{(L)} =$<br>$P_{(L)} + Mn;$<br>2. $\overrightarrow{P}$ | 00101 | $\Rightarrow 00110 \ 01001$<br>$\Rightarrow 00011 \ 00100$ |
| 4         | 1. $P_0 = 0$<br>1a) NOP<br>2. $\overrightarrow{P}$                            | 00101 | 00001<br>10010                                             |
| 5         | - 1. $P_0 = 0$<br>1a) NOP<br>2. $\overrightarrow{P}$                          | 00101 | 00000<br>11001                                             |

# Booth's multiplier (+/-)



$3 \times (-7) \rightarrow 5$  bits  
 $M_n \leftarrow M_n$  should be -ve

$$A + B' + 1$$

$$= A - B$$

$$M_n = 00011$$

$$M_n' = 11100$$

$$-M_n = \underline{+1}$$

$$11101 \rightarrow -M_n$$

$$\begin{aligned} 1. & NOP \\ 2. & P_{(L)} = P_{(L)} - M_n; 10 \\ 3. & P_{(L)} = P_{(L)} + M_n; 01 \end{aligned}$$

$$\begin{array}{r} -7 \rightarrow 00111 \\ 11000 \\ \hline 11001 \end{array}$$

$$11001 = -7$$

| iteration | steps                                                                                       | $M_n$                | P · extra bit                              |
|-----------|---------------------------------------------------------------------------------------------|----------------------|--------------------------------------------|
| 0         | initialisation                                                                              | 00 <sup>3</sup> 0.11 | 000000 110010 -7                           |
| 1         | 1. $P_0 P_{0-1} = 10$<br>1a) $P_{(L)} = \overline{P_{(L)}}$<br>2. $M_n; \overrightarrow{P}$ | 00011<br>00011       | 11101 110010<br>11110 111001               |
| 2         | 1. $P_0 P_{0-1} = 01$<br>1a) $P_{(L)} = P_L + M_n;$<br>2. $\overrightarrow{P}$              | 00011                | 00001 111001<br>00000 111100               |
| 3         | 1. $P_0 P_{0-1} = 00$<br>1a) NOP<br>2. $\overrightarrow{P}$                                 | 00011                | 00000 011110                               |
| 4         | 1. $P_0 P_{0-1} = 10$<br>1a) $P_{(L)} = P_{(L)} - M_n;$<br>2. $\overrightarrow{P}$          | 00011                | 11101 011110<br>$\rightarrow 11110 101111$ |

5

$$\begin{aligned} 1. P_0 P_{0-1} = & 1 \\ \text{1a) } N \oplus & \end{aligned}$$

 $\xrightarrow{2.P}$ 

00011

11111

01011 

Result

 $21 \rightarrow 0000010101$ 

1111101010

1111101011

4 Ques 1 will OR Q4  $\rightarrow -21$ 

1, 2 mandatory