



# D.Y. PATIL COLLEGE OF ENGINEERING

## DETAILED LECTURE NOTES

### UNIT III

Page No. ①

#### Central Processing Unit

CPU - Part of Computer that performs the bulk of data processing operations, is called CPU.

Three parts:

- ① ALU
- ② Registers
- ③ CU.



#### General Register organisation

- When large number of registers are included in the CPU, it is most efficient to connect them through a common bus system.
- The registers communicate with each other not only for direct data transfer but also while performing various micro operations.



$$\underline{R_3 + R_2} \rightarrow \underline{R_1}$$

1. MUX A selector - to select the content of R2 into Bus A.
2. MUX B selector (SEL B) to place the content of R5 into Bus B.
3. ALU operation selector: to provide the arithmetic addition.
4. Decoder destination selector (SEL D) to transfer the content of the output bus into R1.



# POORNIMA

## COLLEGE OF ENGINEERING

### DETAILED LECTURE NOTES

PAGE NO 2

Control word —

Encoding of Register Selection fields.

Binary code

000

001

010

011

100

101

110

111

SEL A

Input

R1

R2

R3

R3

R4

R5

R6

R7

R8

SEL B

Input

R1

R2

R3

R4

R5

R6

R7

SEL D

No A -

R1

R2

R3

R4

R5

R6

R7

DPR

Select

000000

000011

000100

001011

Operation

Transfer A

Increment A

Add A+B

Subtract A-B

Symbol

TSFA

INCA

ADD

SUB.

$R_2 - R_3 \rightarrow R_1$

Control word.

## STACK organization -

- STACK is LIFO in first out
- SP(stack pointer) - holds the address for the top of stack
- Two operations of stack are Insertion & deletion. (PUSH, POP)

→

## Register STACK

A stack can be placed in a portion of a large memory or it can be organised as a collection of finite no. of memory words. or Register.

- ⇒ SP Cleared to 0 EMPTY set to 1.
- ⇒ full is cleared to zero.

$$SP \leftarrow SP + 1$$

$$M[SP] \leftarrow DR$$

if ( $SP = 0$ ) then ( $full \leftarrow 1$ )

$$\quad \quad \quad \text{Empty} \leftarrow 0$$

    Mark the stack not empty

→ check if stack is full



$$DR \leftarrow [SP]$$

$$SP \leftarrow SP - 1$$

if ( $SP = 0$ ) then  
 $full \leftarrow 0$

$(empty \leftarrow 1)$  → check stack is empty

→ mark the stack not full



# POORNIMA

## COLLEGE OF ENGINEERING

### DETAILED LECTURE NOTES

PAGE NO. 3

#### MEMORY STACK

Assigning a portion of memory to stack operations and using a processor register as a stack pointer

$$\text{SP} \leftarrow \text{SP}-1$$

$$M[\text{SP}] \leftarrow DR$$

$$\text{DR} \leftarrow M[\text{SP}]$$

$$\text{SP} \leftarrow \text{SP}+1$$



Reverse Polish Notation  $\Rightarrow$   
Stack organization is very effective for evaluating arithmetic operations.

$$A * B + C * D \Rightarrow AB * CD * +$$

Infix notation

pre fix notation

Post fix notation

+ AB

AB +

$\rightarrow$  RPNC (reverse Polish notation)

(Conversion  
to RPN

$$(A+B) * [ C * (D+E) + F ].$$

$$\begin{aligned} &\Rightarrow (AB+) * [ C * (DE+) + F ] \\ &= (AB+) * [ 0 \ DE + C * F + ] \\ &= \underline{AB+ \ DE + C * F + *} \end{aligned}$$

### Evaluation

$$3 \times 4 + 5 \times 6.$$



### Instruction format :-

- ⇒ Computer has variety of instruction code formats.
- ⇒ It is the function of control unit within the CPU to interpret the instruction.
- ⇒ The bits of the instruction are divided into groups called fields. The common fields are
  - An operation code field that specifies the operation to be performed
  - An address field that designates a memory address or a processor register
  - A mode field that specifies the way the operand or the effective address is determined

Computers may have instructions of several different lengths containing varying number of addresses.

The number of address fields in the instruction format of a computer depends on the internal organisation of its registers.



# POORNIMA

COLLEGE OF ENGINEERING

## DETAILED LECTURE NOTES

① Single Accumulator organisation

Add X

Page No. 4

② general Register organisation

ADD R1, R2, R3.

③ STACK organisation

PUSH X

Three Address Instruction

ADD R1, A, B

ADD R2, C, D

MUL X, R1, R2

Two Address Instruction

MOV R1, A

ADD R1, B

MOV R2, C

One Address Instruction

LOAD A

ADD B

STORE T

Zero Address

PUSH A

PUSH B

## Addressing mode:-

The way of choosing operands during program execution is dependent on the addressing mode.

The control unit of a Computer is designed to go through an instruction cycle that is divided into three major phases.

1. fetch the instruction from memory.
2. Decode the instruction
3. execute the instruction.

### ① Implied Mode

Ex CMA

Operands are specified implicitly in the definition of instruction.

### ② Immediate Add. Mode

In this operands are actual data, rather than address field.

Ex MVI A, 30H.      A  $\leftarrow$  30H

### ③ Register mode: Operands are register within CPU

Add R1, R2                  R1 + R2  $\rightarrow$  R1

### ④ Direct Add Mode: The operand resides in memory and its address is given directly by the address field of the instruction.

Ex LXI B, 2000H

### ⑤ Indirect Add Mode - In this mode the address field of the instruction gives the address where effective address is stored in the memory

effective Add = add Part of Instruction + Content of CPU registers.



# POORNIMA COLLEGE OF ENGINEERING DETAILED LECTURE NOTES

PAGE NO. 5

## Relative Address Mode →

In this mode the content of the program counter is added to the address part of the instruction in order to obtain the effective address.

## Indexed Addressing Mode

Content of the index register is added to the address part of the instruction to obtain the effective address.

## Base Register Addressing Mode →

Content of base register is added to the address part of the instruction to obtain the effective address.

## Auto Increment / Decrement

This is similar to the register in direct mode except that the register is incremented or decremented after its value is used to access memory.

Data

# Data Transfer & Manipulation

Computer provides an extensive set of instructions to give the user the flexibility to carry out various computational tasks.

classified into three categories

- ① Data Transfer
- ② Data Manipulation
- ③ Program Control Instruction.

## ① Data Transfer

| Name     | Mnemonic | Mode              | Assembly Conversion | Register Transfer       |
|----------|----------|-------------------|---------------------|-------------------------|
| Load     | LD       | Direct            | LD ADR              | AC ← M[ADR]             |
| Store    | ST       | Indirect          | LD @ADR             | AC ← M[M[ADR]]          |
| Move     | Mov      | Relative          | LD \$ADR            | AC ← M[PCTADR]          |
| Exchange | XCHG     | Immediate         | LD #NBR             | AC ← NBR                |
| Input    | IN       | Register          | LD RI               | AC ← RI                 |
| Output   | OUT      | Register Indirect | LD (RI)             | AC ← M[RI]              |
| Push     | PUSH     | Auto Increment    | LD (RI)+            | AC ← M[RI]<br>RI ← RI+1 |
| Pop      | POP      |                   |                     |                         |

## Data Manipulation Instructions

- ① Arithmetic Instructions
- ② Logical & bit manipulation instructions.
- ③ Shift Instructions.

①



# POORNIMA

## COLLEGE OF ENGINEERING

### DETAILED LECTURE NOTES

PAGE NO. 6

#### Arithmetic Instruction

Increment

INC

Decrement

DEC

Add

ADD

Subtract

SUB

Multiply

MUL

Divide

DIV

Add with carry

ADDC

Subtract with borrow

SUBB

Negate C's complement

NEG.

#### ① Logical & Bit manipulation

Clear

- CLR

Complement

- COM

AND

- AND

OR

- OR

Exclusive-OR

- XOR

Clear Carry

- CLRC

Set

- SETC

Carry

- COMC

Next Carry

- -

|                   |    |
|-------------------|----|
| Enable Interrupt  | EI |
| Disable Interrupt | DI |

### ③ Shift operations:

|                                 |      |
|---------------------------------|------|
| logical shift right             | SHR  |
| logical shift left              | SHL  |
| Arithmatic shift right          | SHRA |
| Arithmatic shift left           | SHLA |
| Rotate Right                    | ROR  |
| Rotate Left                     | ROL  |
| Rotate Right Through Carry RORC |      |
| Rotate Left Through Carry ROLC  |      |

### Program control

|                  |        |
|------------------|--------|
| Branch           | - BR   |
| Jump             | - JMP  |
| skip             | - SKIP |
| call             | - CALL |
| Return           | - RET  |
| Compare          | - CMP  |
| Test (by Anding) | - TST  |

### Status Bits





# POORNIMA

## COLLEGE OF ENGINEERING

### DETAILED LECTURE NOTES

Conditional

Branch Instruction

PAGE NO. 7

BZ                    $Z = 1$

BNZ                   $Z = 0$

BC                    $C = 1$

BNC                  $C = 0$

BP                    $S = 0$

BM                    $S = 1$

BV                    $V = 1$

BNV                  $V = 0$

Un signed      Compare

Condition  
higher

signed      Compare Condition

BHI                  $A > B$

BGT                 greater than

BHE                  $A \geq B$

BGE                 greater or equal

BLD                  $A < B$

less or equal

BLT                 less than

BLOE                $A \leq B$

less or equal

BLE                 less or equal

BE                    $A = B$

equal

BE                   equal

BNE                  $A \neq B$

not equal

BNE                 not equal

$SP \leftarrow SP - 1$

$M[SP] \leftarrow PC$

$PC \leftarrow \text{effective Address}$

Sub routine

① Call

② Return

## Interrupts -

Program interrupt refers to the transfer of program control from a currently running program to another service program as a result of external or internal generated request.

⇒ PSW: collection of all status bits in the CPU is sometimes called a program word or PSW.

## Types

① External Interrupt - from timing device, input-output device, from power supply or from any other external source.

② Internal Interrupts. arise from illegal or erroneous use of an instruction or data. Internal interrupts are also called traps.

The difference b/w internal & external interrupt is that the internal interrupt is initiated by some exceptional condition caused by the program itself rather than by an external event.

③ Software Interrupt: it is a special GHI instruction that behaves like an interrupt rather than a subroutine call. It can be used by the programmer to initiate an interrupt procedure at any desired point in the program.

Ex Super visor call instruction.



# POORNIMA

## COLLEGE OF ENGINEERING

### DETAILED LECTURE NOTES

#### Reduced Instruction set Computer

PAGE NO. 8

The concept of RISC architecture involves an attempt to reduce execution time by simplifying the instruction set of the computer. The major characteristics of RISC processor are.

1. Relatively few instructions
2. Few Addressing Modes
3. memory Access limited to Load & Store instr.
4. All operations are done within the registers of the CPU
5. fixed Length, easily decoded instruction format
6. Single cycle instruction execution
7. Hardware rather than microprogrammed control.

#### Complex Instruction set Computer

Complex Instruction set Computer typically has 100 to 250 instructions.

- ① Large number of Instructions - typically from 100 to 250 instructions.
- ② Some instructions that perform specialised tasks and are used frequently.
- ③ A large variety of addressing modes.
- ④ Variable length instruction format
- ⑤ Instruction set that manipulates operands to memory.

## Pipe line and Vector processing

Parallel Processing is a term used to denote a long class of techniques that are used to provide simultaneous data processing task for the purpose of increasing the computational speed of a computer system.

→ The purpose of PP is to speed up the computing process and increase in throughput that is the amount of processing that can be accomplished during a given interval of time.

### M.J. Flynn classification -

The normal operation of a computer is to fetch instructions from the memory & execute them in processor. The sequence of instruction read from memory constitutes an instruction stream. The operations performed on the data in the processor constitutes an data stream. Parallel processing may occur in instruction stream, Data Stream or both.

- ① single instruction, Single data Stream(SISD)
- ② single instruction, multiple data streams (SIMD)
- ③ multiple instruction, multiple data streams (MIMD)
- ④ multiple instruction, single data stream (MISD)



# Poornima

## COLLEGE OF ENGINEERING

### DETAILED LECTURE NOTES

PAGE NO. 9

SISD — single computer containing a control unit, a processor unit and a memory unit.

- Parallel processing is achieved by pipelining.

SIMD — many processors under the supervision of a common control unit

- All processor receive same instruction but different data items.

MIMD — organisation refers to a computer system capable of processing several programs at the same time.

MISD — no practical system only theoretical concept

Three types of parallel processing

① Pipe line processing

② Vector processing

③ Array Processors

## Pipelining -

Pipelining is a technique of decomposing a sequential process into suboperations with each subprocess being executed in a special dedicated segment that operates concurrently with all other segments.

$$A_i \leftarrow B_i + C_i \quad \text{for } i = 1, 2, 3, \dots, n$$

Segment 1       $R_1 \leftarrow A_i$ ,  $R_2 \leftarrow B_i$       Input  $A_i \& B_i$

Segment 2       $R_3 \leftarrow R_1 \times R_2$ ,  $R_4 \leftarrow C_i$       multiply & input  $C_i$

Segment 3       $R_5 \leftarrow R_3 + R_4$       Add  $C_i$  to product

### clock pulse

#### Segment 1

|                         |                         |
|-------------------------|-------------------------|
| <u><math>R_1</math></u> | <u><math>R_2</math></u> |
|-------------------------|-------------------------|

1

|                         |                         |
|-------------------------|-------------------------|
| <u><math>A_1</math></u> | <u><math>B_1</math></u> |
|-------------------------|-------------------------|

2

|                         |                         |
|-------------------------|-------------------------|
| <u><math>A_2</math></u> | <u><math>B_2</math></u> |
|-------------------------|-------------------------|

3

|                         |                         |
|-------------------------|-------------------------|
| <u><math>A_3</math></u> | <u><math>B_3</math></u> |
|-------------------------|-------------------------|

#### Segment 2

|                         |                         |
|-------------------------|-------------------------|
| <u><math>R_3</math></u> | <u><math>R_4</math></u> |
|-------------------------|-------------------------|

|   |   |
|---|---|
| - | - |
|---|---|

|                  |       |
|------------------|-------|
| $A_1 \times B_1$ | $C_1$ |
|------------------|-------|

|                  |       |
|------------------|-------|
| $A_2 \times B_2$ | $C_2$ |
|------------------|-------|

|                  |       |
|------------------|-------|
| $A_3 \times B_3$ | $C_3$ |
|------------------|-------|

#### Segment 3

|   |   |
|---|---|
| - | - |
|---|---|

|                        |   |
|------------------------|---|
| $A_1 \times B_1 + C_1$ | - |
|------------------------|---|

|                        |                        |
|------------------------|------------------------|
| $A_2 \times B_2 + C_2$ | $A_1 \times B_1 + C_1$ |
|------------------------|------------------------|

|                        |   |
|------------------------|---|
| $A_3 \times B_3 + C_3$ | - |
|------------------------|---|

The speed up ratio due to pipelining is

$$S = \frac{n t_n}{(K+n-1) t_p}$$

$n t_n \rightarrow$  total time required for  $n$  task

to complete  $n$  task with  $K$  pipeline is  
 $(K+n-1) t_p$

if  $n$  is  
large

$$S = \frac{t_n}{t_p}$$

$$t_n = K t_p$$

$$S = \frac{K t_p}{t_p}$$

$$\boxed{S = K}$$



# Poornilma

COLLEGE OF ENGINEERING

## DETAILED LECTURE NOTES

Arithmetic pipe line  $\Rightarrow$

PAGE NO. 10

Used in very high speed computers for arithmetic operations of floating point.

$$X = A \times 2^a$$

$$Y = B \times 2^b$$

1. Compare exponent

2. Align the Mantissa.

3. Add or Subtract the Mantissa.

Normalise the result

$$X = 0.9504 \times 10^3 \quad \underline{3-2=1}$$

$$Y = 0.8200 \times 10^2$$

$$X = 0.9504 \times 10^3 \rightarrow \text{Align.} \\ Y = 0.0820 \times 10^3$$

$$Z = 1.0324 \times 10^3 \quad \text{Add.}$$

$$Z = 0.10324 \times 10^4 \quad \text{Normalisation}$$



# Instruction pipe line

- Pipeline processing can occur in Instruction Stream as well
- An instruction pipeline reads consecutive instruction from memory while previous instructions are being executed in other segment.
- A CISC require other than fetch & decode process of 1 instruction completely
- Some are
  - ① fetch the instruction from memory
  - ② Decode the instruction
  - ③ calculate the effective address
  - ④ fetch the operands from memory
  - ⑤ execute the instruction
  - ⑥ store the result in the proper place.



## RISC Pipeline

RISC is its ability to use an efficient pipelining because of fixed length instruction format, the decoding of operation can occur at the same time as the register selection.

All data manipulation instruction have the register to register operation, so need to calculate effective address or fetching of operand from memory. therefore Two or Three segment instruction pipeline can be used.

One segment fetches the instruction from memory, Second executes the instruction in ALU and third may be store of result into memory

### Three segment Instruction Pipeline

- I - Instruction fetch
- A - ALU operation
- E - Execute instruction

1. LOAD  $R_1 \leftarrow M[\text{Add}_1]$
2. LOAD  $R_2 \leftarrow M[\text{Add}_2]$
3. ADD  $\oplus R_3 \leftarrow R_1 + R_2$   
 $M[\text{Add}_3] \leftarrow R_3$
4. STORE





# POORNIMA COLLEGE OF ENGINEERING DETAILED LECTURE NOTES

PAGE NO. \_\_\_\_\_

FI → Fetch instruction

DA → Decode and calculate effective address

FO → Fetch the operand

Ex → Execute the instruction.

| Step | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 |
|------|----|----|----|----|----|----|----|----|----|----|----|----|
| 1.   | FI | DA | FO | Ex |    |    |    |    |    |    |    |    |
| 2.   |    | FI | DA | FO | Ex |    |    |    |    |    |    |    |
| 3.   |    |    | FI | DA | FO | Ex |    |    |    |    |    |    |
| 4.   |    |    |    | FI |    |    | FI | DA | FO | Ex |    |    |
|      |    |    |    |    |    |    |    | FI | DA | FO | Ex |    |
|      |    |    |    |    |    |    |    |    | FI | DA | FO |    |

In general three major difficulties in instruction pipeline

1. Resource conflicts.
2. Data dependency.
3. Branch difficulties.



# Poornima COLLEGE OF ENGINEERING DETAILED LECTURE NOTES.

## Vector Processing:

PAGE NO 12

- Many scientific Problem require arithmetic operations on large arrays of numbers.
- These numbers are usually formulated as vectors and matrices of floating point numbers.

Ex — Initialise  $I = 0$

Level 1: Read  $A(I)$

Read  $B(I)$

Store  $C(I) = A(I) + B(I)$

Increment  $I = I + 1$

if  $I \leq 100$  goto "Level 1"

Continue.

- To constitutes a program loop that reads a pair of operands from arrays A & B and perform floating point operation/ addition. The loop control variable  $I$  is then updated and the step repeat continues.
- A computer capable of vector processing eliminates the overhead associated with the time it takes to fetch & executes the instruction in the program loop.

$$C(1:100) = A(1:1w) + B(1:1w)$$

### Matrix Multiplication

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \times \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix}$$

$$c_{ij} = \sum_{k=1}^3 a_{ik} \times b_{kj}$$

### Applications

- Long range weather forecasting
- Petroleum explorations
- Seismic data Analysis
- Medical diagnosis
- Aerodynamics and space flight simulations.
- Artificial Intelligence & Expert systems
- Mapping the human genome
- Image Processing



# POORNIMA COLLEGE OF ENGINEERING DETAILED LECTURE NOTES

## Array Processors -

PAGE NO.

- Performing computation of large array of data

### Types

- An Attached array processor is an auxiliary processor attached to a general purpose computer.
- It is intended to improve the performance of host computer in specific numerical computation tasks.
- An SIMD array processor is a processor that has a single instruction multiple data organization.
- It manipulates vector instruction by means of multiple functional unit responding to a common instruction.

### Attached Array Processor



- The host Computer is general Purpose Computer and attached Processor is a back end Machine driven by the host Computer
- The Array Processor is Attached through an Input/OP Controlled to the Computer and treat it like an external Interface.
- The data from Main memory to local memory is transferred through high speed bus for Attached Processor.

### SIMD Array Processor



The processing units are synchronized to perform the same operation under the control of a control unit thus providing a single instruction stream, multiple data stream organization.

PE → Processing Element

M → Local Memory

END