

$$⑭ (+15) \times (-13) \quad 13 \rightarrow 01101$$

$$BR = 01111$$

$$\overline{BR+1} = 10001$$

$$10000$$

$$-13 \rightarrow 10010$$

$$-13 \rightarrow 10011$$

$$QR = 10011$$

| In Out |                          | AC                   | QR    | $\theta_{out}$ | SC  |
|--------|--------------------------|----------------------|-------|----------------|-----|
|        | initial                  | 00000                | 10011 | 0              | 101 |
| 10     | Subtract BR              | 10001                |       |                |     |
|        | ashr                     | 10001                |       |                |     |
|        | <del>ashr(ACC, BR)</del> | 11000                | 11001 | 1              | 100 |
| 11     | ashr                     | 11100                | 01100 | 1              | 011 |
| 01     | Add BR                   | 01111<br>+ 01011     |       |                |     |
|        | ashr                     | 00101                | 10110 | 0              | 010 |
| 00     | ashr                     | 00010                | 11011 | 0              | 001 |
| 10     | Subtract BR              | 10001                |       |                |     |
|        |                          | 10011                |       |                |     |
|        | ashr                     | 11001                | 11101 | + 000          |     |
|        |                          |                      |       |                |     |
|        |                          | 00110.00010          |       |                |     |
|        |                          | 0011000011           |       |                |     |
|        |                          | ← 128 64 32 16 8 4 2 |       |                |     |
|        |                          |                      |       |                | 195 |

$j^k$ ,  $k$ -bit adders to produce a product of  $j^k$  bits.

Consider  $j=3$ ,  $k=4 \rightarrow$  Multiplier bits  
 $j \times k = 12$  AND gates.

$j-1 = 2$  four-bit adders

$k=4$ ,  $j=3$  4 bit by 3 bit multiplier



~~2+~~

→ It is a  $4 \times 3$  multiplier in which,  $2^{(j-1)}$  4-bit adders do produce a product of  $4+3=7$  bits output.

### \* Division Algorithms:

→ Algorithm for division of signed magnitude numbers

Nr → Dividend  
Dv → Divisor

→ Dividend is stored in A & Q

Divisor is in B

Re

→ After division,

Remainder is contained in A

Quotient is contained in register Q

→ The sign of remainder is same as the sign of dividend

→ The sign of quotient will be +ve if dividend and divisor have same sign and -ve if they have diff. u.

Sign of divisor is in B<sub>5</sub>  
a u dividend = a A<sub>5</sub>

## Flowchart

Provide operation



Eg: 248

B- 17

B = 0001

B-41-09111

- The divide quotient overflow occurs when
  - $A \geq B$ . quotient will indicate divide overflow.
  - 'n' indicates no. of bits in quotient including sign bit.
  - E indicates register A with  $E=FF$ . (If carry is generated, it goes to E)
  - Divide overflow FF will be set to 1 when there is overflow.
- Again Adding B after subtracting B is called restoring remainder.

| <u>Initial</u>              | <u>E</u> | <u>A</u> | <u>Q</u> | <u>SC</u> |
|-----------------------------|----------|----------|----------|-----------|
| Initial                     |          | 01110    | 00000    | 5         |
| shl E+Q                     | 0        | 11100    | 00000    |           |
| add B-1                     |          | 01111    |          |           |
| $E=1$                       | 1        | 01011    | 00001    |           |
| Set $Q_n=1$                 | 1        | 01011    | 00001    | 4         |
| shl E+Q                     | 0        | 10110    | 00010    |           |
| add B-1                     |          | 01111    |          |           |
| <del><math>E=1</math></del> | 1        | 00101    |          |           |
| Set $Q_n=1$                 | 1        | 00101    | 00011    | 3         |
| shl <del>E+Q</del>          | 0        | 01010    | 00110    |           |
| add B-1                     |          | 01111    |          |           |
| $E=0$ , leave $Q_n=0$       | 0        | 11001    |          |           |
| Add B, leaving $Q_n=0$      |          | 10001    |          |           |
| Restoring remainder         |          | 01110    |          | 2         |

|                | <u>E</u> | <u>A</u>            | <u>Q</u>            | <u>Sc.</u> |
|----------------|----------|---------------------|---------------------|------------|
| shl EA         | 0        | 10100               | 01100               |            |
| Add BT1        | 1        | <u>00011</u>        |                     |            |
| E=1            | 1        | 00011               | 01101               |            |
| set Qn=1       | 0        | 00110               | 11010               |            |
| shl EA         | 0        | <u>01111</u>        |                     |            |
| Add BT1        | 0        | <u>10101</u>        |                     |            |
| E=0 leave Qn=0 | 0        | 10001               |                     |            |
| Add B.         | 1        | <u>00110</u>        | 11010               | ①          |
|                |          | 6<br>↓<br>remainder | 26<br>↓<br>quotient |            |

### Floating-Point Arithmetic Operations:



Fig: Register for floating-point arithmetic operations

B<sub>s</sub> — sign of mantissa

B — magnitude of mantissa

b — exponent

Together it is called BR.

Similarly 'A', 'Ac' and 'a' indicates Ac

## Mapping functions!

### Direct Mapping:



Tag Direct mapped cache

Here, each block contains 16 words.

$$\text{Total no. of words} = 4096 \times 16$$

= 64 K locations

$$= 64 \times 1024 \\ = 2^6 \times 2^{10}$$

∴ 16 address bits are required to access  
64 K locations.

Cache memory contains 128 blocks

$$\therefore \text{No. of words} = 128 \times 16$$

$$= 2048 \text{ words} \\ = 2^10 \text{ blocks}$$

Block 1 of main memory is placed in block 'j modulo 128' of cache memory

Block 0, 128, 256... of main memory is placed in block 0 of cache

likewise, Block 1, 128, 256... in block 1...

→ 16 bits in the figure indicates address of word of main memory.

→ 7 bits contains the block of cache memory where the word may be present.

→ Tag bit indicates which block of main memory is contained in block of cache (0, 128, 256)

→ If the tag bit in main memory matches with " " " in cache memory, then it indicates that the required word is contained in cache memory (It is brought into cache if it is not present.)

### Associative mapping:

Any block of main memory can be placed in cache

Main memory address indicates the word of which is to be accessed.

Word is the data stored in a location.

→ Tag bits of block of cache memory indicate

which block of main memory is contained in that block.

→ If the tag bits of main memory matches with the tag bits of cache memory, it indicates that the word is present in cache memory.



Fig. Associative mapped cache.

→ When new block is to be brought into cache memory (from main memory), one block of cache memory should be removed when the

cache is full.

→ Algorithms which decides which ~~of~~ block of cache memory is to be replaced when the cache is full are called replacement algorithms.

→ One of the algorithm is LRU (Least Recently Used) block (which is accessed long back.)

→ Another algorithm is oldest algorithm (where the oldest block is removed.)

→ 4-bits in main memory address indicates which word in the block (16 words in a block) is to be accessed.

### Set-Associative Mapping:



Fig: Set associative mapped cache memory with two blocks per set.

→ In this mapping, in cache memory, some no. of blocks are taken as a set.

Here, each set contains 2 blocks

$$128 \text{ blocks} \Rightarrow 64 \text{ sets}$$

Block 0, 64, 128, ..., 4096 are placed in set 0 of cache memory (Block 0 on block 1)

→ 16 bit main memory address is divided into 3 (parts) fields

6 bits ( $2^6 = 64$ ) are used to indicate set no; set field.

→ In the corresponding set (e.g. set 0) the tag bits which matches ~~Indicates~~ with the tag bits in main memory add. is the required word (Block 0 on block 1)

### Read only Memory:

RAM (Random Access Memory) → we can read or write.

ROM can only read.

### Types of ROM:

Volatile → information is lost when the power is off

## Virtual Memories



Fig Virtual memory organisation

- Initially, files will be stored in disk.
- To execute that file, it should be brought into main memory. For faster access, cache memory is used i.e. data from main memory is transferred to cache.
- Transfer b/w disk and main memory is done in terms of pages where each page contains 2K - 16K records.
- Initially, the address generated by processor is called **virtual address**. It contains page number.

and offset. Page num indicates in which page required word is contained, offset indicates Pn where location u

→ Memory Management Unit (MMU) converts Virtual address into Physical address

→ Using physical address, required word of main memory address. It contains address of page in the main memory

Eg. If virtual gives page 0

location of page 0 is given by physical address.

→ Physical address contains page frame and offset. Page frame indicates the starting location of page in the main memory and offset indicates the location of word in page

\* Address translation:

(It is the process of converting virtual address into physical address)



→ Page table contains the address of each page

in main memory

→ Page table base register contains the starting address of the page table.

Eg: If page no. is 2, 2 is added to page table base register value and it gives the

address of the page is in main memory

(which is in page table)

→ Page Table also contains control bits which indicates the corresponding current status of the page. Status indicates whether the page is actually existing & in the main memory or not. By, the control bits it's whether the page has been modified or not in main memory.

→ The location bits of page tables gives the address of page in memory i.e. page frame.

→ Starting address of page in main memory is called page frame.

Eg. If page frame = 1000, offset = 4  
word in loc 904 is accessed.

→ Page table is present in main memory.

→ Conversion of virtual to physical address is done by MMU.

→ As page-table is contained in main memory, accessing time of virtual address to physical address will be more.

In order to reduce the time to convert from virtual to physical address, a portion of the page table is placed in MAMU in the cache memory. The portion of the page table which is contained in MAMU is called TLB

(Translation Lookaside Buffer)

Virtual Memory:



TLB contains page-table entries of recently accessed pages.

Eg: If TLB contains page 1, 4, 5, then virtual page num will be 1 or 4 or 5.

→ Virtual page num in virtual add. is compared with virtual page num in TLB.

If it matches with virtual page num, the required info. is contained in TLB.

Or, if it is not in TLB and has to be taken from page table.

TLB contains only recently accessed pages.

~~21/9/17~~ Pipelining and Vector processing:





Fig Multiple functional units in CIC

(P<sub>1</sub>, P<sub>2</sub>, P<sub>3</sub>, P<sub>4</sub>)

Here 4 floating pt. no's are required to achieve the speed through pipelining

#### Arithmetic pipeline:

Here, adding floating pt. no's are added or subtracted using pipelining.

Adding / subtraction of float floating point no's is divided into 4 suboperations.

1) compare the exponents.

2) Align the mantissa (shifting right the mantissa by increasing the lower exponent to make both the exponents equal).

3) add or subtract the mantissas

4) Normalize the result. (Most significant digit of floating pt is non zero  $\rightarrow$  non 2 normalized)

In case binary floating pt. no., MSB bit  
of binary number 'n' should be  
equal to 1.



## Instruction Pipeline:

It is used to process multiple instructions simultaneously. In order to process the instructions, following steps ~~are~~ to be followed are:

- 1) Fetch the instruction from memory
- 2) Decode the instruction
- 3) Calculate the effective address. (to find the location of operand.)

Eg: In case of branch instruction, eff. address indicates where to branch.

- 4) Fetch the operands from memory

- 5) Execute the instruction:

Eg: Add, Mul, data transfer etc

- 6) Store the result in proper place.

Eg: In case of addition, result is stored in memory of register

These steps are divided into 4 segments.

For conditional BR instruction, condition is

verified in segment 4. It will verify whether the condition is true. Processing takes place



Fig: Four-segment CPU pipeline

from branching address. After executing some instruction, before executing next instruction, it will check whether there is any interrupt. and if there is an interrupt ISR (Interrupt service Routine) is executed,

and PC is updated with starting address of DSR. ~~For~~

→ For unconditional BR, PC is updated with branching address, and pipe is emptied. Branching instruction is prefetched (before) ~~before~~ and condition is checked.

→ If the condition is false in case of cond. branching, the next instruction after BR instruct is executed

| Step          | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 |
|---------------|----|----|----|----|----|----|----|----|----|----|----|----|----|
| Instruction 1 | FI | DA | FO | EX |    |    |    |    |    |    |    |    |    |
| 2             | FI | DA | FO | EX |    |    |    |    |    |    |    |    |    |
| (Branch)      | 3  |    | FI | DA | FO | EX |    |    |    |    |    |    |    |
| 4             |    |    | FI | —  | —  | FI | DA | FO | EX |    |    |    |    |
| 5             |    |    |    | —  | —  | —  | FI | DA | FO | EX |    |    |    |
| 6             |    |    |    |    |    |    |    | FI | DA | FO | EX |    |    |
| 7             |    |    |    |    |    |    |    |    | FI | DA | FO | EX |    |

Fig. Firing of instruction pipeline  
If the decoded instruction is BR instruction,

the next instruction is halted till the

execution of BR instruction. If the condition

is true, the first segment of instruction in branching add. is executed. Old, ~~the~~ the fetched

instruction is decoded. During decoding of first instruction after branching, second instruction after branching is decoded.

diff

Pipeline conflicts:

which arise

Problems in pipeline are called pipeline conflict

1) Resource conflicts

→ When 2 segments of instruction pipeline try to access the memory at the same time, it is called as resource conflicts;

(Depends on instructions in pipeline)

2) Data dependency conflicts:

→ When one instruction depends upon the result of previous instruction which is not yet available, data dependency conflict occurs.

Eg: The instruction is delayed till the result in previous is obtained.

→ In order to delay the instruction,

a hardware chip is used called as hardware interlocks.

→ Another way is to delay the instruction, i.e.,

by using compiler which inserts the No operation

instruction to delay the instruction. This is called called as delayed load.

→ Instead of storing the result of prior previous

instruction in destination register from which

the next instruction has the access, it is directly forwarded by bypassing the register

forwarded to next instruction. The operation is called

as operand forwarding.

③ Branch

③ Branch difficulties:

During pipelining, operation of pipelining is

interrupted or halted which is called as branch

penalty. It degrades the performance of pipeline

In order to reduce minimize the degradation of

the performance of pipeline during branching

instruction, we will be prefetching the instruction

completion of

from branching address before the execution of  
the instruction. It is done through  
hardware. It is one way to handle  
branch difficulties  
→ In order to have a continuous flow of pipeline  
comp during branching instruction, compiler will  
be either rearranging the instructions or  
inserts the No operation instruct<sup>n</sup>. This which  
is done in RISC computer. This technique  
is called as delayed branch)

Examples for delayed load and delayed branch using

\* RISC Pipeline:  
Instruction corresponding to RISC computer is called  
RISC pipeline. It is divided into 3 segments  
Segment  
I : Instruction fetch  
A : ALU operation → decoding and performing ALU operation.  
E : Execute instruction

→ RISC computer consists of 3 types of

instructions.

- 1) Data manipulation instructions.  $\underline{\text{Ex: Add (+, -, \times, \div)}}$
- 2) Data transfer  $\quad \quad \quad \text{Ex: Load and store}$
- 3) Program control  $\quad \quad \quad \text{Ex: branch}$

→ During data manipulation instruction, we perform data manipulation operation in 2<sup>nd</sup> segment and result of ALU is transferred to destination register in 3<sup>rd</sup> segment.

→ For data transfer instructions, (load or store) in the 2<sup>nd</sup> segment, effective address is calculated for load or store instruction.

Ex:  $M[x] \rightarrow R$   
 $\downarrow$   
 $\underbrace{-\text{eff}}$   
Eff. Add.

During 3<sup>rd</sup> segment, effective add. is transferred to the data memory for load or store operation.

→ For program control instruction, during 2<sup>nd</sup> segment, branch address is calculated by ALU. During 3<sup>rd</sup> segment, branch add. is

transferred to PC.

### Delayed load:

Eg:-

$$① \text{LOAD} : R_1 \leftarrow M[\text{address } 1]$$

$$② \text{LOAD} : R_2 \leftarrow M[\text{address } 2]$$

$$③ \text{ADD} : R_3 \leftarrow R_1 + R_2$$

$$④ \text{STORE} : M[\text{address } 3] \leftarrow R_3$$

### Pipeline

| clock cycles                          | 1 | 2 | 3 | 4 | 5 | 6 |
|---------------------------------------|---|---|---|---|---|---|
| 1. Load M                             | S | A | E |   |   |   |
| 2. Load R <sub>2</sub>                | S | A | E |   |   |   |
| 3. Add R <sub>1</sub> +R <sub>2</sub> | S | A | E |   |   |   |
| 4. Store R <sub>3</sub>               |   | S | A | E |   |   |

~~a) Pipeline timing with data conflict~~  
During 4<sup>th</sup> clock cycle, the value of

R<sub>2</sub> is not ready to be as R<sub>2</sub> is loaded into destination register and data

dependency conflict occurs due to

R<sub>1</sub>+R<sub>2</sub> and the 3<sup>rd</sup> instruction is delayed

- Instruction
- 1st P<sub>1</sub> is loaded by compiler by auto-memcopy
  - 2nd instruction No. 2 and 3rd and 4th instruction
  3. In 5th clock cycle, P<sub>1</sub> is performed, and 6th clock cycle

| clock cycle                            | 1 | 2 | 3 | 4 | 5  | 6 | 7 | 8 | 9 |
|----------------------------------------|---|---|---|---|----|---|---|---|---|
| 1) Load P <sub>1</sub>                 | 2 | 4 | 6 | 8 | 10 |   |   |   |   |
| 2) Load P <sub>2</sub>                 |   | 3 | 1 | 5 |    |   |   |   |   |
| 3) <u>No operation</u>                 |   |   | 2 | 4 | 5  |   |   |   |   |
| 4) Add P <sub>1</sub> , P <sub>2</sub> |   |   |   | 7 | 1  | 5 |   |   |   |
| 5) Store P <sub>2</sub>                |   |   |   |   | 2  | 4 | 5 |   |   |

→ This technique is called as delayed load.

### Vector

#### Vector Processing

→ Vector is a collection of data items

of same or diff. type (homogeneous type)

→ Vector processing deals with operations on vectors

### Vector

→ There are two types of processor to perform any operation, i.e.,

#### Scalar processor:

In scalar processor we can process

only single data item at a time.

(No arithmetic pipeline)  $\Rightarrow$ : only single data can be processed

It processes multiple data items at a time

It uses aerostatic pipeline

→ speed of operation is more in vector

processor then scalar processor.

Eg: Vector Instruction For adding vectors.

$$c(1:100) = A(1:100) + B(1:100)$$

$\vec{A}$  is vector containing 100 data items

44 44 44 44 44 44 44

C u a a

→ vector processor uses vector instructions.

**Scalar**       $\in$       **scalar**       $\in$        $\dots$

The above exp. indicates vector  $u$

→ Some operation can also be performed using scalar instruction with more than one instruction using loops.

→ In scalar processor, single instruction will be operated on single data item, at a time.

→ In vector processor, single instruction will be operated on multiple data items at a time due to arithmetic pipeline.

Eg. Mat

Matrix Multiplication:



To prove To get inner product (element in product matrix),

mat of row vector and column vector of matrices are multiplied.

$$C = A_1 B_1 + A_2 B_2 + A_3 B_3 + \dots + A_k B_k$$

~~A<sub>1</sub>, A<sub>2</sub>, ..., A<sub>k</sub>~~ are row vectors

~~B<sub>1</sub>, B<sub>2</sub>, ..., B<sub>k</sub>~~ are column vectors

C - inner product.

k - can be any value

→ Thus inner product is obtained by

vector processor which uses pipelining.



Fig. Pipeline for calculating an inner product

→ There are 4 segments in multiplier pipeline and adder pipeline

→ During first cycle, first segment of multiplier pipeline contains  $A_7B_1$ ,

→ Also during second cycle, first segment contains  $A_2B_2A_3B_3$  and second segment contains  $A_1B_1$ .

→ During 4<sup>th</sup> cycle, the segments of multiplier pipeline contains  $A_4B_4, A_3B_3, A_2B_2, A_1B_1$

→ During 5<sup>th</sup> cycle, initially adder pipeline consists of 0's and 0's are added to  $A_1B_1$  which is transferred to Adder pipeline

→ After 8 cycles, the

|          |          |          |          |
|----------|----------|----------|----------|
| $A_8B_8$ | $A_7B_7$ | $A_6B_6$ | $A_5B_5$ |
|----------|----------|----------|----------|

|          |          |          |          |
|----------|----------|----------|----------|
| $A_4B_4$ | $A_3B_3$ | $A_2B_2$ | $A_1B_1$ |
|----------|----------|----------|----------|

→ During 9<sup>th</sup> cycle, first segment of adder pipeline is added to  $A_1B_1$ , e.g.,  $A_5B_5 + A_1B_1$

→ 10<sup>th</sup> cycle  $\rightarrow A_2B_2 + A_8B_8$

11<sup>th</sup>  $\rightarrow A_3B_3 + A_7B_7$

12<sup>th</sup>  $\rightarrow A_4B_4 + A_8B_8$

→ Finally, partial products are obtained

$$C = A_1 B_1 + A_5 B_5 + A_9 B_9 + A_{13} B_{13} + \dots \quad \text{--- (1)}$$

$$+ A_2 B_2 + A_6 B_6 + A_{10} B_{10} + A_{14} B_{14} + \dots \quad \text{--- (2)}$$

$$+ A_3 B_3 + A_7 B_7 + A_{11} B_{11} + A_{15} B_{15} + \dots$$

$$+ A_4 B_4 + A_8 B_8 + A_{12} B_{12} + A_{16} B_{16}$$

Finally, 4 partial products in 4 segments  
of adder pipeline are added.

### Super scalar processor

It contains multiple functional units or processor units such that multiple instructions are operated on multiple (separate) data items at the same time.

### Super computer

A commercial computer with vector instructions

and pipelined floating point arithmetic operations

is called a super computer.

→ In vector processor, data items are represented using floating point.

## Array processors

- They are specially used to perform operations on arrays. (collection of data items of same type)
- There are 2 types of array processors.
  - 1) Attached array processor
  - 2) SIMD (Single instruction Multiple data) array processor.

### Attached Array Processor



Fig Attached array processor with host computer.

- It is attached to the general purpose computer. It acts as I/O device to the general purpose computer.

- It will access the data from local memory.
- The transfer of data from main memory and local memory is done using high speed bus.
- It contains one or more pipeline floating pt. adders and multipliers.

### SIMD Array Processor:



- In this processor, it contains multiple processing elements (PE) which contains ALU and registers.
- (Floating Arithmetic unit)

→  $M_1, M_2, \dots, M_n$  indicates local memories

→ Master control unit decodes the instruction and

controls the operation of processing elements.

→ Scalar instructions and program control instructions are executing using master control unit.

Vector instructions are executed by processing elements.

→ In case of vector instruction mentioned before

→ First element of vector A is given to  $M_1$ ,

Second " " " " " " " " " " " "  $M_2$

;

My first " " " " " " " " " " " "  $M_y$

→ ∴ first local memory contains first element

of vector A and first element of vector B

→ Main memory contains the program

∴ Data elements items are simultaneously

processed.)

## Multiprocessors:

- If a system contains multiprocessors then it is called multiprocessor system.
- In a multiprocessor system different jobs are performed simultaneously (or) single job is partitioned into multiple tasks which are performed simultaneously.
- There are 2 types of multiprocessor systems
  - 1) Shared memory (or) tightly coupled multiprocessor system.
  - 2) distributed memory (or) loosely coupled multiprocessor system.

In this each processor uses its own local private memory.

→ We have several structures for interconnecting the components of the multiprocessor system.

### Interconnection structures:

#### 1) Time shared, common bus



Fig: Time-shared common bus organisation

- It indicates single bus (common) b/w the processors and memory
- When one processor uses memory, the other has to wait
- S/P → S/p off processor which is used to perform I/P, O/P transfer

## 2) Multipoint memory

Memory modules.



In this, separate buses b/w processor and memory module.

Memory modules are parts of memory unit.

Memory module contains 4 ports. Each port is connected to 1 CPU. As it contains multiple ports, it is called as multipoint memory.

It comes under shared memory (MM1, MM2...).

core connected  
core connected to CPU1, CPU2...

### 3) Crossbar switch:



Here, single port is used for memory module.

Here, switches are used to establish the

path b/w CPU and Memory module.

So all the switches along the path should be path on for the transfer to take place.

#### 4) Multistage Switching Networks



Fig: Butterfly tree with 2x2 set switches.  
P<sub>1</sub>, P<sub>2</sub> indicate the processors. If connection is to be established b/w processor and 1010101

#### 5) Hypercube Interconnection



indicates loosely coupled or distributed memory multiprocessor system

for 1 cube,  $2^1$  nodes.

for  $n$  cube,  $2^n$  nodes.

each node indicates 1 processor

For  $n$  cube, we are interconnecting  $2^n$  processors

00, 01, 10, 11 indicates address of nodes

Add. of successive nodes differ by only 1 bit



4) Data Input command: It is used to bring the data in the interface register into the data lines.

→ The bus which is used by processor and devices is called ~~I/O~~ bus, which contains address and control lines by processor and memory → ~~memory~~ bus

→ There are 3 ways of using buses.

1) Using separate bus for I/O and memory

2) Using one common bus " " " " with separate control lines which is called as isolated I/O.

3) Using same common bus for I/O and memory with common control lines.

with common control which is called as memory mapped I/O.

Mem

### (Modes of Transfer)

Either data is transferred from input device to memory or from memory to O/P device. This can be done through with or without CPU.

→ To transfer data from I/O device to memory, it is transferred from I/O to processor and from processor to memory.

If to transfer data from memory to I/O device, it is transferred from ~~I/O device~~ to <sup>memory</sup> processor and processor to ~~I/O~~ I/O device.  
It is done through CPU.

∴ There is one way to transfer

→ Second way is directly transferring data from memory and I/O device which is called as direct memory access.

### 3 Modes of transfer:

- 1) Programmed I/O } through CPU
- 2) Interrupt Initiated I/O
- 3) Direct Memory access (DMA) → without CPU

1) In programmed I/O, as CPU is fast and I/O is slow, CPU has to continuously check whether the device is ready to perform data transfer. By this, CPU time is wasted, which means that, we are not efficiently using CPU.  
2) To overcome this, interrupt initiated I/O is used.

in which we are using interrupt facility where the interface will be interrupting the CPU whenever the device is ready to perform data transfer. Meanwhile, CPU can be used to execute another program. In this CPU need not continuously check whether the device is ready or not.

2) DMA — We are directly transferring memory between memory and I/O.

### 缺点

### Priority Interrupt

To perform the data

- 1) Programmed
  - Programmed
  - Programmed
  - Interrupted
  - Interrupt initiated I/O
- In programmed I/O, CPU has to check whether I/O device is ready to transfer data or not. This is the drawback.

This can be overcome.

The time delay can be reduced.



Block diagram of DMA controller

It is one of modes of memory transfer

b/w memory and I/O.

→ In DMA, without the intervention of CPU,

directly data is transferred b/w I/O

and memory. For ~~DM~~ directly direct

access, DMA controller is used.

→ There are 3 registers, which are initially

→ (before performing DMA transfer)

initialised by CPU.

### 1) Address register

- To read (from memory → CPU) and write (CPU to memory)
- It indicates the starting location of memory from which we have to read or write

### 2) Word count registers

- It indicates no. of words that have to be read/stored into memory during read/write operation.

### 3) Control register

- It ~~indeed~~ specifies the mode of transfer i.e. whether read or write. and also to start the DMA transfer.

~~If there is a request from I/O device~~

→ ~~If there is a request from I/O device to perform DMA transfer, initially memory buses are under control of CPU).~~

~~DMA controller should get the control of memory buses by sending bus request signal to the bus requesting the~~

~~the control of signal to the CPU.~~

- When bus request signal is sent, CPU terminates the execution of instruction (at that instant) and keeps the memory buses in 'high impedance state' which means that the memory buses are disconnected from CPU.
- Then, bus grant signal is sent by CPU. (which means memory buses are under control DMA controller)
- During ~~initial~~ initialisation of the 3 registers Data select pin. and register select are enabled
- Read and write pins are bidirectional i.e. they act as  $I_p$  and  $O_p$ . These pins act as  $I_p$  pins to CPU and  $O_p$  pins to memory in order to load data into
- When data is to be loaded 3 registers by CPU, Read/write pins are enabled.

→ After each word transfer, word count becomes 300

→ When all words are transferred from wordcount register to I/O device & memory interrupt is enabled so and wordcount will be 300.

→ In order to read the wordcount register, Read PS is enabled i.e. CPU sends the read signal.

→ DMA acknowledge will be generated to I/O device by cont DMA controller

When I/O device receives DMA acknowledge, if it is an I/P device, it will place the data on the data bus. If it is a O/P device, it will take the data from data bus and word count register will be decremented. and word count value ≠ 0, it will be

→ If the I/O device is fast it will send a DMA request and DMA controller check the whether there is a DMA request or not it will send a DMA request as soon as a DMA request is activated as soon as

previous word is transferred and viceversa.

→ In case of slow I/O devices, DMA controller disables BR signal and bus grant signal is disabled by CPU, i.e. memory buses are under control of CPU, and the CPU executes the program where it has stopped.

This is called as cycle ~~stealing~~.

Handing over the memory buses to CPU after each word transfer in case of slow I/O device is called cycle stealing.

It does not occur during case of fast DMA controller

I/O devices. It will take handover the control of memory buses only after transferring all the words. This is called burst transfer.

→ The 3 registers are initialized by CPU through data bus. value word count register is read by CPU through data bus.

After completion of all words, Interrupt is sent to CPU by DMA controller. CPU reads value of word count register  $\Rightarrow$  all words are transferred.

### \* Input-output Processor



Fig. Block diagram of a computer with I/O processor

- Purpose of Input-output processor is to perform data transfer b/w memory and peripheral devices (I/O devices) i.e. Memory  $\rightarrow$  DOP, IOP  $\rightarrow$  I/O (or) I/O  $\rightarrow$  DOP, IOP  $\rightarrow$  Memory unit
- When there is no IOP, CPU will transfer data from memory to I/O device.
- When DOP is performing data transfer, CPU can be used to execute another program
- CPU sends CTL information to I/O. IOP to know where whether the off I/O device

98 - ready or not,

## CW-Top constraints



→ CPU sends info. instruction to DOP to know whether the other D/O device is ready to perform data transfer or not. In response to this instruction sent by CPU, DOP sends status records to some location in memory.

which indicates that whether the I/O device is ready to perform data transfer.

- CPU reads the status words in memory and knows whether the device is ready to perform data transfer.
- If status word is Ok  $\rightarrow$  I/O device is ready.
- Then CPU sends ~~an~~ start I/O instruction to DOP.
- When DOP receives start I/O instruction, DOP accesses I/O program from memory.
- While DOP performs I/O program, CPU can be used to execute another program.
- Status report indicates whether ~~there is~~ any error occurred during data transfer.
- After the completion of data transfer, DOP interrupts CPU. The purpose of interrupting CPU is to inform CPU that data transfer is completed. Then CPU requests the DOP status report from DOP and that status report is sent to memory location. From memory location, <sup>CPU</sup> CPU reads the status word (report).

→ After reading status words, CPU knows whether any error has occurred or not.

### \* Interrupt initiated I/O

→ In this I/O, I/O interrupts the CPU when the I/O device is ready to perform data transfer.

→ When CPU is interrupted, it executes interrupt service routine (ISR) which is nothing but I/O program.

→ To know the address of ISR, there are 2 ways.

1) Vectorized interrupt

2) Nonvectorized interrupt

→ So in this case, the branch address (the address to which we have to jump in order to execute ISR) is contained in some fixed location in the memory.

∴ CPU has to take the branch address from memory.

→ In vectored interrupt, the branch address is provided by interrupt source. (one which is producing interrupt i.e.  $\text{I/O}$  or  $\text{P/I}$  device).

• This branch ~~interrupt~~ information is called as interrupt vector.

∴ This interrupt vector indicates the starting address of the interrupt service routine i.e.,  $\text{I/O}$  program which is used to perform  $\text{I/O}$  or  $\text{P/I}$  operation. (or) we get the address of address starting address of ISR in that address (direct or indirect.)