

# COA UNIT 1 Notes

Computer Organization & Architecture (Dr. A.P.J. Abdul Kalam Technical University)

## Functional units and their interconnections:

→ Functional units of a computer / digital system are:

- Input unit
- Output Unit
- Central Processing Unit (CPU)
- Memory
- Bus structure.



Note: Theory about these components are given in "Fundamental of COA" Notes and Unit 1 Notes (about bus)

## Buses, bus Architecture, types of buses and bus arbitration

### Bus:

- Bus is a communication system that helps data transfer between different modules of the computer.
- A bus is a communication pathway connecting two or more devices.
- A bus is a group of electrical lines/wires that carry computer signals/bits.
- A bus consists of multiple lines. Each line is capable of transmitting signals representing 1 or binary 0.

A bus that connects major computer components (processor, memory, I/O) is called a system bus.

- On any bus lines can be classified into three function groups:-

- ① Data Bus
- ② Address Bus
- ③ Control Bus.

Data Bus: - Data lines provides path for moving data between components.

- Bidirectional

→ Width of databus is number of lines in databus.

→ Each line can carry only one bit at a time.

### Address Bus: ( Unidirectional )

→ The address lines are used to designate (know) the source or destination of the data on the data bus.

For example: if the processor wishes to read or write a memory word, it puts the address of desired word on the address lines.

The width of address bus determines the maximum possible memory capacity of the system.

### control Bus: ( Bidirectional )

- The control lines are used to control the access to and <sup>the</sup> use of the data and address lines.
- control signals transmit both command and timing information

Timing signals ⇒ indicate the validity of data and address information

command signals ⇒ specify operations to be performed.

- Typical control lines include:-

- Memory write
- Memory Read
- I/O write
- I/O Read
- Transfer Ack
- Bus Request
- Bus Grant
- Interrupt Request
- Interrupt Ack
- clock
- etc.



Bus interconnection scheme.

### Types of Buses:

- Dedicated
- Multiplexed

- Dedicated bus line is permanently assigned either to one function or to a physical subset of computer components.
- Example of functional dedication :- the use of separate dedicated address and data lines.
- Multiplexed : Address and data information may be transmitted over the same set of lines using an Address Valid control line.

### For example:

At the beginning of a data transfer, the address is placed on the bus and Address valid line is activated. At this point the address is then removed from the bus and the same bus are used for subsequent read or write data transfer. (This known as time multiplexing)

## Bus Arbitration:

- More than one module may need control of the bus e.g. CPU and DMA controller.
- e.g. I/O module may need to read or write directly to the memory without sending the data to the processor.
- The process by which multiple requests are recognized and given priority given to one of them is called arbitration.
- Arbitration can be — Centralized / localized
  - Decentralized / Distributed
- In a centralized scheme, a single hardware device, referred to as a bus controller or arbiter, is responsible for allocating time on the bus. The device may be a separate module or part of the processor.
- In distributed scheme, there is no central controller. Rather, each module contains access control logic and the modules act together to share the bus.
- With both methods of arbitration, the purpose is to designate one device, either the processor or an I/O module, as master. The master may initiate a data transfer (e.g. read or write) with some other device, which acts as slave for this particular exchange.

## ~~Timing~~

Timing Timing Refers to the way in which events are coordinated on the bus.

- Buses use
  - ① Synchronous Timing
  - ② Asynchronous Timing

### Synchronous Timing:

- With synchronous timing the occurrence of the events on the bus is determined by a clock.
- The bus includes a clock line upon which a clock transmits a regular sequence of alternative 0's and 1's of equal duration. A single 1-0 transmission is referred to as a clock cycle or bus cycle and defines a time slot.
- All events starts at the beginning of a clock cycle.



## Asynchronous Timing:

- with asynchronous timing, the occurrence of one event on a bus follows and depends on the occurrence of a previous event.
- No clock.

# Bus Arbitration Techniques:

Daisy chain



Polling



Independent Requesting



## Daisy-chain Method or Serial Method:

- ⇒ The request of bus usage can be placed by any module connected
- ⇒ The bus controller grants the bus only to the first module connected.
- ⇒ If the module granted bus, who has been granted bus to, does not have not requested the bus, will forward the grant to the next module.
- ⇒ If the module who is having the grant wants to use bus, makes the busy signal 'set' and use the bus. After using bus, the module unset the busy signal.

Disadvantage: Modules has the fixed Priority.

## Polling Method:

Polling count line =  $n$

Max Number of connected modules  $\Rightarrow 2^n$ .

- ⇒ Modules requests the bus using 'Bus Request' line.
- ⇒ Bus controller places any one of  $2^n$  sequences on the poll count lines.
- ⇒ The module, for which polling sequence is allocated by the bus controller, can set busy signal and use the bus.
- ⇒ Priority is set by the bus controller.
- ⇒ It is time consuming method.
- ⇒ Non-productive Polling  $\Rightarrow$  Bus controller generated the polling sequence for the module which did not request the bus.

## Independent Request Arbitration Technique:

- In this method, every module has connected to the bus controller by separate 'Bus grant' and 'Bus Request'
- ⇒ Individual module can request to the bus controller by its own 'Bus Request' line. And, Bus controller can grant the bus to the module which has requested for the bus.
- ⇒ But this method increases the cost of the system.
- ⇒ Better Performance.

## Bus and Memory Transfer:

- A typical digital computer has many registers
- A path must be provided to transfer information from one register to another.
- common bus structure
  - using Multiplexers.
  - using Tri-state Buffer.

### Bus system using Multiplexer:

Example  $\Rightarrow$  For four registers.

Number of multiplexers = size of register.

Size of multiplexers = Number of registers



Function Table:

| S <sub>1</sub> | S <sub>0</sub> | Register Selected |
|----------------|----------------|-------------------|
| 0              | 0              | A                 |
| 0              | 1              | B                 |
| 1              | 0              | C                 |
| 1              | 1              | D                 |

12

## Memory Transfer:

~~Memory Read: The transfer of information to be stored into memory.~~

Memory Read: The transfer of information from memory to the outside environment is called a read operation

$$\text{Read: } DR \leftarrow M[AR]$$

$DR \Rightarrow$  Data Register

$M[AR] \Rightarrow$  memory location specified by Address Register AR.

Memory Write: The transfer of new information to be stored into the memory is called a write operation

~~$\text{Write: } M[AR] \leftarrow DR$~~

# Registers

- Registers are used to store data temporarily

→ The register in CPU performs two roles



## User Visible Registers:

- A user visible register may be referenced by means of machine language that the processor executes.
- categories - General Purpose
  - Data Register
  - Address Register
  - Condition Codes

### General Purpose Registers:

- can be used by the programmer.
- contains operands for any opcode.
- can be used for addressing functions (e.g. Register indirect, displacement).

Data Register: Data Register may be used only to hold the data

## Address Register:

Address Register may be devoted to a particular addressing mode. Examples -

- Segment Pointer
- Index Register
- Stack Pointer.

- Segment Pointer holds the address of the base of the segment in segmented addressing.
- Index Registers are used for indexed addressing and may be auto indexed.
- Stack Pointer points to the top of the stack. This allows implicit addressing that is push, pop and other stack instructions need not ~~not~~ contain an explicit stack operand.

## Condition codes (Also referred to as Flags):

Condition codes are the bits set by the processor hardware as the result of operation.

- Example:
- positive or negative result
  - Zero Result
  - Overflow
  - etc.

## control and status Register:

- Registers employed to control the operation of processor.
- Not visible to the user.
- Four Registers are essential to instruction execution
  - ① Program Counter
  - ② Instruction Register
  - ③ Memory Address Register (MAR)
  - ④ Memory Data Register (MDR)

- Program counter contains the address of the instruction to be fetched.
- Instruction Register contains the instruction most recently fetched.
- MAR contains the address of a location in the memory.
- MDR or MBR (Memory Buffer Register) contains a word of the data to be written to memory or the word most recently read.

## PSW : Program Status Word :

Many processors include a set of registers known as PSW, that contain the status information

- condition code and other status information

Sign: contains the sign bit of the result.

Zero: set when the result is zero.

Carry: set if an operation resulted in a carry into or borrow out of a higher order bit.

- 16
- Equal: set if a logical compare result is equal.
- Overflow: used to indicate arithmetic overflow.
- Interrupt Enable/ disable: used to enable or disable interrupts
- supervisor:- indicates whether the processor is executing supervisor or user mode.  
certain privileged instructions can be executed only in supervised mode, and certain memory areas can be accessed only in supervisor mode.

# Processor Organization:

Processor contains ALU, CU and Registers

- ALU ⇒ The ALU (Arithmetic Logic Unit) does the actual computation or processing of the data
- CU: control unit:- CU controls the movement of data and instruction into and out of the processor and controls the operation of ALU.
- Registers: minimal internal memory.



## Internal structure of CPU

- ① Processor Organization means how the components of processor are connected and accomplish their tasks.

A processor does the following things:

- Fetch Instruction

- Interpret instruction

- Fetch Data

- Process Data

- Write Data

- Fetch Instruction: The processor reads an instruction from memory.

- Interpret Instruction: The instruction is decoded to determine what action is required.

- Fetch data: The execution of an instruction may require reading data from memory or ~~I/O~~ an I/O module.

- Process Data: The execution of an instruction may require performing some arithmetic or logical operation on the data.

- Write Data: The result of an execution may require writing data to memory or I/O module.

## Topic

### General Register Organization:

→ A bus organization for Seven CPU registers.



Register set with common ALU

- The output of each Register is connected to two multiplexers to form the buses A and B.
  - The selection lines in each multiplexer selects one register or input for the particular bus.
- The buses A and B form the inputs to a common ALU.
- The operation selected in the ALU determines the arithmetic or logic microoperation that is to be performed.
- The result is available for output data and goes into the inputs of ~~the~~<sup>all</sup> registers. Register for output is selected by a decoder.
- The decoder activates one of the register load inputs.

Registers Selection:

| Binary Codes | SEL A | SEL B | SEL D |
|--------------|-------|-------|-------|
| 000          | input | input | None  |
| 001          | R1    | R1    | R1    |
| 010          | R2    | R2    | R2    |
| 011          | R3    | R3    | R3    |
| 100          | R4    | R4    | R4    |
| 101          | R5    | R5    | R5    |
| 110          | R6    | R6    | R6    |
| 111          | R7    | R7    | R7    |

# Table of operations performed by ALU

| OPR Select | Operation     |
|------------|---------------|
| 0          | Transfer A    |
| 1          | Increment A   |
| 2          | Add A+B       |
| 5          | Subtract A-B  |
| 6          | Decrement A   |
| 8          | AND A, B      |
| 10         | OR A, B       |
| 12         | XOR A, B      |
| 14         | complement A  |
| 16         | shift Right A |
| 24         | shift Left A  |

This table specifies some selected operants only  $\Rightarrow 11$   
 five bits can represent total 32 operation

## Topic.

### Stack Organization:

- stores information in LIFO order (Last In First Out)
- The register that holds the address of the stack is called Stack Pointer.
- Stack Pointer's value always points to the top element.
- Two operations on stack ① Push  
② Pop
- Push ⇒ to insert an element in the stack  
Pop ⇒ to delete an element from the stack
- Two status Full ⇒ set to 1 when stack is full  
Empty ⇒ set to 1 when stack is empty.
- ⇒ Stack can be ⇒ collection of memory words or registers
- ⇒ Example 64-word register stack



location 0 always stores LAST element

initially  
SP  $\leftarrow$  0  
EMPTY  $\leftarrow$  1  
FULL  $\leftarrow$  0

if (Full = 0)  $\Rightarrow$  new item is inserted with push operation  
if (EMPTY = 0)  $\Rightarrow$  an item can be deleted from stack with pop operation

### Push operation :-

SP  $\leftarrow$  SP + 1 increment stack pointer  
M[SP]  $\leftarrow$  DR write item on the top of stack  
if (RATHER SP = 0) then FULL  $\leftarrow$  1  
EMPTY  $\leftarrow$  0

M[SP] denotes the memory word specified by the address presently being available in the SP.

$\Rightarrow$  The first item is stored in stack at address 1.  
The last item is stored in the stack at address 0.

### Pop operation :

A new item is deleted from the stack if stack is not empty

DR  $\leftarrow$  M[SP]  $\rightarrow$  Read the data from the top of stack  
SP  $\leftarrow$  SP - 1  $\rightarrow$  Decrement SP  
if (SP = 0) then EMPTY  $\leftarrow$  1  
FULL  $\leftarrow$  0  $\Rightarrow$  check if the stack is empty  
Mark the stack not full.

## ADDRESSING MODES:

Two issues: ① How is the address of an operand specified?  
 ② How the bits of an instruction organized to define the operand addresses and operation of that instruction.

The operation field of an instruction specifies the operation to be performed. This operation must be executed on some data stored in the computer registers or memory words.

The way the operands are chosen during the program execution depends on the addressing mode of the instruction.

The addressing mode specifies a rule for interpreting or modifying the address field of the instruction before the operand is actually referenced.

## Most Common Addressing Techniques:

- Immediate Addressing
- Direct Addressing
- Indirect addressing
- Register Addressing
- Register Indirect Addressing
- Displacement → Relative Addressing
- Base Register Addressing
- Indexing
- Stack addressing

## ① Immediate Addressing:

- In this addressing mode, the operand value is present in the instruction.
- Advantage: no memory reference other than the instruction fetch is required to obtain the operand.
- Disadvantage: limited operand magnitude.



## ② Direct Addressing:

In this addressing mode, the address field contains the actual/effective address of the operand.

$$EA = A$$

$A \Rightarrow$  content of an address field of in the instruction



### Advantage:

- It requires only one memory reference and no special calculation.
- Simple

Disadvantage: limited address space.

## 25 (3) Indirect Addressing:

In this addressing mode, address field refers to the address of a word in the memory.



$$EA = (A)$$

- \* parentheses meaning here  
→ "content of"
- \* EA = Actual / effective address of the location containing the referenced operand.

Advantage: Large address space:  
→ for a word length of  $N$ , an address space of  $2^N$  is now available.

Disadvantage: requires two memory reference

- one for address of operand
- one for to get its value (operand's)

## (4) Register Addressing:

Register addressing is similar to direct addressing.  
The only difference is that the address field refers to a register rather than a main memory address.

$$EA = R$$

If the content of a register address field in an instruction is 5, then register R5 is intended address, and the operand value is contained in R5.



## Advantages:

- only a small address field in instruction is needed
- no time consuming memory references.

## Disadvantages:

- very limited register address space.

"Content of the register will be operand" in Register Addressing

## ⑤ Register Indirect Addressing:

- similar to indirect addressing
- But here, the address field of instruction specifies a register whose contents give the address of the operand in the memory.  
 $EA = (R)$

### Advantage:

- fewer bits to select a register than a memory address
- Large address space.

### Disadvantage:

- Extra memory Reference.



## 6 Displacement Addressing:

This addressing mode combines the capabilities of direct addressing and register indirect addressing.

$$EA = A + (R)$$

- Three most common uses of displacement addressing
  - Relative addressing
  - Base-register addressing
  - Indexing

### Relative Addressing:

- Also called PC-relative addressing
- In this mode the content of the program counter is added to the address part of the instruction in order to obtain the effective address.

Example: assuming  $PC = 825$

address part in instruction = 24

The instruction at location 825 is read from memory during the fetch phase and the program counter is then incremented by one to 826.

The effective address computation for relative address

$$\text{mode is } 826 + 24 = 850.$$

This is 24 memory locations forward from the address of the next instruction.

- often used with branch type instructions
- saves address bits in the instruction

**Effective address = content of Program counter + address in instruction**

## 28) Base Register Addressing:

In this mode the content of a base register is added to the address part of instruction to get the effective address.

- A base register is assumed to hold the base address and the address part of the instruction gives a displacement relative to this base address.

$$\boxed{\text{Effective Address} = \text{content of Base Register} + \text{address in instruction}}$$

## Indexing or Indexed Addressing mode:

- In this mode, the content of an index register is added to the address part of an instruction to obtain the effective address.
- The index register is a special CPU register that contains an index value. The address field of the instruction defines the beginning address of a data array in the memory.
- Each operand in the array is stored in the memory relative to the beginning address.

$$\boxed{\text{Effective address} = \text{content of index Register} + \text{address in instruction}}$$

## 7) Stack Addressing :

- A stack is a linear array of locations.
- A stack is also referred to as a push down list or last-in-first-out queue.
- LIFO concept.
- The stack mode of addressing is a form of implied addressing. The machine instruction need not include memory reference but implicitly operate on the top of stack.

EA = top of the stack

Advantages: No memory reference

Disadvantage: Limited applicability

## 8) Implied Mode or Implicit mode :

- Operands are specified implicitly
- Example instruction such as CMA (complement Accumulator)
- In fact all register reference instructions that use an accumulator are implied mode instructions.
- Zero address instructions in stack organized computer are implied mode instructions

## ⑨ Auto increment or Autodecrement Mode:

The autoincrement mode is similar to register indirect mode except that the register (content) is incremented after the execution of the instruction.

Effective address = content of register + address in instruction

Now increment the content of register  $R = R + 1$

In the autodecrement mode, the content of register is decremented before the execution of instruction.

$$R = R - 1$$

Effective address = content of register + address in instruction

Effective address means  $\Rightarrow$  address of operand

### Numerical Example for Addressing Modes:

PC [ 200 ]

RI [ 400 ]

XR [ 100 ]

AC [ ]

| Address | Memory            |
|---------|-------------------|
| 200     | Load to AC   Mode |
| 201     | Address 500       |
| 202     | Next Instruction  |
| 399     | 450               |
| 400     | 700               |
| 500     | 800               |
| 600     | 900               |
| 702     | 325               |
| 800     | 300               |

- The two word instruction at address 200 and 201 is a Load to AC | Mode | Address 500
- The first word of the instruction specifies the operation code and mode, and the second word specifies the address part.
- PC (Program Counter) has the value 200 for fetching this instruction
- The content of pointer register RI is 400 and the content of index register XR is 400.
- AC (Accumulator) receives the operand after the instruction is executed.

37  
⇒ For each possible mode, we calculate the effective address and the operand that must be loaded into AC.

① Direct mode: ⇒ the effective address is the part of instruction.  
Effective Address = 500      operand ⇒ 800

② Immediate mode: operand is the part of instruction.  
operand = 500  
EA ⇒ ~~address~~ 201

③ Indirect Mode: ⇒ Effective address is stored at the address part of instruction '500'  
Effective Address ⇒ 800      operand = 300

④ Relative Mode: Effective address ⇒ content of PC + instruction address part.  
$$\begin{aligned} EA &= 202 + 500 \\ &= 702 \\ \text{operand} &= 325 \end{aligned}$$

⑤ Index Mode: EA = content of index register + instruction address part.  
$$\begin{aligned} &= 100 + 500 \\ &= 600 \\ \text{operand} &= 900 \end{aligned}$$

### ⑥ Register Mode:

⇒ There is no effective address. content of the register R1 will be loaded to AC.  
operand = 400

### ⑦ Register Indirect mode:

Register R1 contains the effective address.

EA = 400      operand = 700

### ⑧ Auto-increment Mode:

same as Register indirect mode, but the content of Register is incremented after the execution of the instruction

EA = 400, operand = 700, R1 = 401

### ⑨ Auto-Decrement Mode:

same as Register indirect mode, but the content of Register is decremented before the execution of the instruction

Effective Address = 309      operand = 450

Table

|   | Addressing Mode | Effective address | operand |
|---|-----------------|-------------------|---------|
| ① | Direct          | 500               | 800     |
| ② | Immediate       | 201               | 500     |
| ③ | Indirect        | 800               | 300     |
| ④ | Relative        | 702               | 325     |
| ⑤ | Indexed         | 600               | 900     |

|   | Addressing mode   | effective address | operand |
|---|-------------------|-------------------|---------|
| ⑥ | register          | -                 | 400     |
| ⑦ | Register indirect | 400               | 700     |
| ⑧ | Auto increment    | 400               | 700     |
| ⑨ | Auto-decrement    | 399               | 450     |

## COA UNIT-2 Notes

Computer Organization & Architecture (Dr. A.P.J. Abdul Kalam Technical University)

Adders:

- Half Adders
- Full Adders
- n-bit Binary Parallel Adder
- Carry Look-ahead Adder (or Fast adder or speed adder).

Half Adder:

Half adder is a combinational circuit that adds two bits.

Truth Table of Half Adder:Circuit of Half adder

| input |   | output |       |
|-------|---|--------|-------|
| A     | B | Sum    | carry |
| 0     | 0 | 0      | 0     |
| 0     | 1 | 1      | 0     |
| 1     | 0 | 1      | 0     |
| 1     | 1 | 0      | 1     |

Full Adder:

Full adder is a combinational circuit that adds 3 bits.

Truth Table:

| A | B | C | Sum | Cout |
|---|---|---|-----|------|
| 0 | 0 | 0 | 0   | 0    |
| 0 | 0 | 1 | 1   | 0    |
| 0 | 1 | 0 | 1   | 0    |
| 0 | 1 | 1 | 0   | 1    |
| 1 | 0 | 0 | 1   | 0    |
| 1 | 0 | 1 | 0   | 1    |
| 1 | 1 | 0 | 0   | 1    |
| 1 | 1 | 1 | 1   | 1    |

$$\text{sum} = A \oplus B \oplus C$$

$$\text{cout} = AB + BC + AC$$

## Logic diagram of Full adder:



$$C_{out} = \bar{A}\bar{B}C + A\bar{B}C + A\bar{B}\bar{C} + ABC$$

Properties :

$$= \bar{A}\bar{B}C + A\bar{B}C + A\bar{B}\bar{C} + ABC + ABC + ABC$$

$$\left\{ \because x+x=x \right\}$$

$$= BC[\bar{A}+A] + AC[\bar{B}+B] + AB[\bar{C}+C]$$

$$\left\{ x+\bar{x}=1 \right\}$$

$$= BC \cdot 1 + AC \cdot 1 + AB \cdot 1$$

$$\left\{ x \cdot 1 = x \right\}$$

$$C_{out} = BC + AC + AB$$

## n-bit Binary Parallel Adder:

→ An n-bit binary parallel adder adds 2 n-bit numbers.

→ An n-bit binary adder uses n full adders.

Example An 4-bit binary adder adds two 4-bit binary numbers

$$A = A_3 A_2 A_1 A_0$$

$$B = B_3 B_2 B_1 B_0$$

Addition

$$\begin{array}{r}
 & C_3 & C_2 & C_1 & C_0 = 0 & \Rightarrow \text{carry input} \\
 A_3 & A_2 & A_1 & A_0 & & \\
 + B_3 & B_2 & B_1 & B_0 & & \\
 \hline
 S_3 & S_2 & S_1 & S_0 & \Rightarrow \text{sum} \\
 C_4 & C_3 & C_2 & C_1 & \Rightarrow \text{carry output.}
 \end{array}$$

4-bit binary adder diagram.

A 4-bit binary parallel adder consists of 4 full adders in cascade, with output carry from one full adder, connected to the carry input of the next full adder.

## Carry look ahead Adder:

### Drawback of n-bit parallel adder -

- The carry output of one stage (or full adder) is connected to the carry input of the next higher stage (or full adder). This carry is called ripple carry.
- Therefore, the sum and carry can not be produced until the input carry occurs.
- This lead to a time delay in addition process. This delay is known as carry propagation delay.

## Carry-Lookahead Adder:

This method utilizes logic gates to look at the lower order bits of augend and addend to see if a higher order carry is to be generated or not

Carry look-ahead concept uses two functions

- ① Carry Propagate ( $P_i$ )
- ② Carry Generate ( $G_i$ )

## Full adder using two half adders:



$$\text{Carry Propagation } P_i = A_i \oplus B_i$$

$$\text{Carry Generation } G_i = A_i \cdot B_i$$

Both RHS does not contain any  $C_i$

The output  $S_i = P_i \oplus C_i$

$$C_{i+1} = G_i + P_i C_i$$

we can create n-bit ~~look~~ carry look-ahead adder using these two equations.

### Example

4 bit carry look-ahead adder:

For 4-bit,  $i$  is 0 to 3.

To remove the dependency of  $C_{i+1}$  on  $C_i$ , we need to express  $C_{i+1}$  into  $A_i, B_i$  and  $C_0$ .

$$C_{i+1} = G_i + P_i C_i \quad \text{--- (1)}$$

Put

$$i=0 \quad C_{0+1} = G_{0+1} + P_0 C_0$$

$$C_1 = G_{0+1} + P_0 C_0$$

$i=1$

$$C_{1+1} = G_{1+1} + P_1 C_1$$

$$C_2 = G_{1+1} + P_1 [G_{0+1} + P_0 C_0]$$

$i=2$

$$C_{2+1} = G_{2+1} + P_2 C_2$$

$$C_3 = G_{2+1} + P_2 [G_{1+1} + P_1 [G_{0+1} + P_0 C_0]]$$

$i=3$

$$C_{3+i} = G_3 + P_3 C_3$$

$$C_4 = G_3 + P_3 [G_2 + P_2 [G_1 + P_1 [G_0 + P_0 C_0]]]$$

4-bit carry look-ahead ~~Adder~~ Carry Generation circuit



## 4 bit carry lookahead adder circuit:



# Signed Multiplication : (Booth's Algorithm)

Flowchart:



NOTE

if MSB (or sign bit)  
is zero  $\Rightarrow$

Result is positive and  
the magnitude of  
the product.

if MSB (or sign bit 1)

Result is negative  
take 2's complement to  
get the magnitude and  
place negative sign

Signed Booth Multiplication Example:Example  $(-7) \times (+3) = (-21)$ 

$$M = (-7)_{10} = 2's(0111) \\ = (1001)_2$$

$$-M = (0111)_2$$

Tracing Table

|       |   | $(+3)$ | $Q_n$ | $q_0$ | Action/Comment                       |
|-------|---|--------|-------|-------|--------------------------------------|
| STEP① | 4 | 0000   | 0011  | 0     | initialization                       |
|       |   | 0111   | 0111  | 0     | $A = A - M \Rightarrow A = A + (-M)$ |
|       |   | 0011   | 1001  | 1     | ASR $AQq_0$                          |
| STEP② | 3 | 0011   | 1001  | 1     | $n = n-1$                            |
|       |   | 0001   | 1100  | 1     | ASR $AQq_0$                          |
| STEP③ | 2 | 0001   | 1100  | 1     | $n = n-1$                            |
|       |   | 1010   | 1100  | 1     | $A = A + M$                          |
|       |   | 1101   | 0110  | 0     | ASR $AQq_0$                          |
| STEP④ | 1 | 1101   | 0110  | 0     | $n = n-1$                            |
|       |   | 1110   | 1011  | 0     | ASR $AQq_0$                          |
|       | 0 | 1110   | 1011  | 0     | $n = n-1$                            |

Result  $\Rightarrow A Q$ 

11101011

↑ sign bit is one. So the result is negative, take 2's complement to get the magnitude and place negative sign.

Final result:  $(-21)_{10}$ 11101011  
↓ 2's complement

00010100

↓ +1

00010101  $\Rightarrow (21)_{10}$

## Array Multiplier:

Binary Multiplication is done by doing additions. Partial Products are calculated by multiplying the multiplicand by each bit of the multiplier and then summing the partial products.

Ques: Design  $4 \times 3$  binary or Array multiplier.

$$\text{Given } A \Rightarrow A_3 A_2 A_1 A_0$$

$$B \Rightarrow B_2 B_1 B_0$$

$$\text{Number of bits in product} \Rightarrow 4+3=7$$

$$\begin{array}{r}
 & A_3 & A_2 & A_1 & A_0 \\
 & \times & B_2 & B_1 & B_0 \\
 \hline
 & A_3 B_0 & A_2 B_0 & A_1 B_0 & A_0 B_0 \\
 & A_3 B_1 & A_2 B_1 & A_1 B_1 & A_0 B_1 \\
 & A_3 B_2 & A_2 B_2 & A_1 B_2 & A_0 B_2 \\
 \hline
 & P_6 & P_5 & P_4 & P_3 & P_2 & P_1 & P_0
 \end{array}$$



## 4x4 Binary Array Multiplier:

$$\begin{array}{r} A = A_3 \ A_2 \ A_1 \ A_0 \\ \times B = B_3 \ B_2 \ B_1 \ B_0 \end{array}$$



circuit Diagram of 4x4 Array Multiplier

# Division of unsigned Integer: Restoring Division Algorithm:

Flowchart:



Example: Divide 11 by 3.

(Q) Dividend = 11

(M) Divisor = 3

$M=3 \Rightarrow (00011)_2 \Rightarrow$  Number of bits in M =  $n+1$

$Q=11 \Rightarrow (1011)_2$

$-M = (11101)_2$

### Tracing Table

| $n$ | M     | A     | Q    | Action / comment                              |
|-----|-------|-------|------|-----------------------------------------------|
| 4   | 00011 | 00000 | 1011 | Initialization                                |
|     |       | 00001 | 011? | shift left AQ                                 |
|     |       | 11110 | 011? | $A = A - M \Rightarrow 00001 + 11101 = 11110$ |
|     |       |       |      | $A[n] == 1$                                   |
| 3   |       | 00001 | 0110 | $Q[0] \leftarrow 0$ Restore A<br>$n = n-1$    |
|     |       | 00010 | 110? | shift left AQ                                 |
| 2   |       | 11111 | 110? | $A = A - M$                                   |
|     |       | 00010 | 1100 | $Q[0] \leftarrow 0$ , Restore A               |
|     |       | 00010 | 1100 | $n = n-1$                                     |
|     |       | 00101 | 100? | SL AQ                                         |
|     |       | 00010 | 100? | $A = A - M$                                   |
| 1   |       | 00010 | 1001 | $Q[0] \leftarrow 1$                           |
|     |       | 00101 | 001? | $n = n-1$                                     |
|     |       | 00010 | 001? | SL AQ                                         |
|     |       | 00010 | 001? | $A = A - M$                                   |
|     |       | 00010 | 0011 | $Q[0] \leftarrow 1$                           |

| $n$ | $M$   | $A$   | $Q$  | Action / comment |
|-----|-------|-------|------|------------------|
| 11  | 00011 | 00010 | 0011 | $n=n-1$          |

Now Quotient in  $Q = 0011 \Rightarrow 3$

Remainder in  $A = 00010 \Rightarrow 2$

11 (Dividend) / 3 (Divisor)  $\Rightarrow$  3 Quotient  
2 Remainder

# Flow chart of Unsigned Integer Division

## Non-Restoring Method



Example Divide 11 by 3.

Minalshi

(Q) Dividend = 11

Number of bits  $\Rightarrow n = 4$

(M) Divisor = 3

Number of bits  $M = n+1 = 5$

$$-M = 11101$$

Tracing Table:

| <u>n</u> | <u>M</u> | <u>A</u> | <u>Q</u> | Action / comment                                                 |
|----------|----------|----------|----------|------------------------------------------------------------------|
| 4        | 00011    | 00000    | 1011     | Initialization                                                   |
|          |          | 00001    | 011?     |                                                                  |
|          |          | 11110    | 011?     |                                                                  |
|          |          | 11110    | 0110     | $SL \ A \ Q$                                                     |
|          |          | 11110    | 0110     | $A = A - M$                                                      |
|          |          | 11110    | 0110     | $Q[0] \leftarrow 0$                                              |
|          |          | 11110    | 0110     | $n = n - 1$                                                      |
| 3        |          | 11100    | 110?     |                                                                  |
|          |          | 11111    | 110?     | $SL \ A \ Q$                                                     |
|          |          | 11111    | 1100     | $A = A + M$                                                      |
|          |          | 11111    | 1100     | $Q[0] \leftarrow 0$                                              |
| 2        |          | 11111    | 1100     | $n = n - 1$                                                      |
|          |          | 11111    | 100?     | $SL \ A \ Q$                                                     |
|          |          | 00010    | 100?     | $A = A + M$                                                      |
|          |          | 00010    | 100?     | $\begin{array}{r} 11111 \\ + 00011 \\ \hline 100010 \end{array}$ |
|          |          | 00010    | 100?     | ↑ discard it                                                     |
| 1        |          | 00010    | 1001     | $Q[0] \leftarrow 1, n = n - 1$                                   |
|          |          | 00101    | 001?     | $SL \ A \ Q$                                                     |
|          |          | 00010    | 001?     | $A = A - M$                                                      |
|          |          | 00010    | 001      | $Q[0] \leftarrow 1$                                              |
|          |          | 00010    | 0011     | $n = n - 1$                                                      |
| 0 (Yes)  |          | 00010    | 0011     |                                                                  |

MSB of A = 0  $\Rightarrow$  then Result:

$$\text{Quotient in } Q = (0011)_2 = 3$$

$$\text{Remainder in } A = (00010)_2 = 2$$

Ques Divide 7 by 3 using Non-Restoring method.

$$\begin{aligned} Q &= \text{Dividend} = 7 \Rightarrow 111 \\ M &= \text{Divisor} = 3 \Rightarrow 0011 \\ -M &= 1101 \end{aligned} \quad ] \quad n=3$$

0 → Subt  
1 → Add.

| <u>n</u>        | A     | Q    | Action                    |
|-----------------|-------|------|---------------------------|
| 3               | 0000  | 111  | Initialization            |
|                 | ↑     |      |                           |
|                 | 0001  | 11?  |                           |
|                 | +1101 |      | $SL AD,$<br>$A = A - M$   |
|                 | 1110  | 11?  | $Q[0] \leftarrow 0$       |
|                 | ↑     |      |                           |
|                 | 1110  | 110  |                           |
| 2               | 1101  | 10?  | Shift Left<br>$A = A + M$ |
|                 | 0011  |      |                           |
| carry discarded | 0.000 | 101. | $Q[0] \leftarrow 1$       |
| 1               | 0001  | 01?  |                           |
|                 | 1101  |      | $A = A - M$               |
|                 | 1110  | 010  | $Q[0] \leftarrow 0$       |
| 0               |       |      |                           |
|                 | ↓     | ↓    |                           |
|                 |       |      | Remainder Quotient        |

$$A \Rightarrow 1110$$

↑

$$A \Rightarrow 1110$$

$$\begin{array}{r} + M \Rightarrow +0011 \\ \hline 0001 \end{array}$$

$$\text{Remainder} = 1 \quad \text{Quotient} = 2$$

## Floating Point Representation:

→ we can represent a floating point number in the form

$$\pm s \times B^{\pm E}$$

$B \Rightarrow$  Base, For binary  $B=2$

⇒ This number can be stored in a binary word with three fields

- Sign (plus or minus)
- significand or Mantissa (S)
- Exponent E.

| Sign | Biased Exponent | Mantissa or significand |
|------|-----------------|-------------------------|
|------|-----------------|-------------------------|

## IEEE 754 Format:

- ① Single Precision (32 bit)
- ② Double Precision (64 bit)

|                  | Signbit | Biased Exponent bits | Bias | Fraction bits |
|------------------|---------|----------------------|------|---------------|
| single Precision | 1       | 8                    | 127  | 23 bits       |
| double Precision | 1       | 11                   | 1023 | 52 bits       |

Two functions for representing floating point numbers

→ Normalization

→ Biasing

|                    |                        |
|--------------------|------------------------|
| Sign $\Rightarrow$ | 0 for positive numbers |
|                    | 1 for negative numbers |

A normalized number is one in which the most significant digit of the significand is one.

For base 2 representation

$$1.bbb\dots b \times 2^{E}$$

$$1.M \times 2^{E}$$

where  $M \Rightarrow$  Mantissa / fraction / significand

Because the most significant bit is always one, it is unnecessary to store.

Given a number that is not normalized, the number may be normalized by shifting the radix point to the right of left most 1 bit and adjusting the exponent accordingly.

Biasing: Exponent is stored in biased representation.

In single precision case, 8 bits field yields the numbers 0 through 255.

with a bias of 127 ( $2^7 - 1$ ), the true exponent values are in range -127 to +128

In double precision case, 11 bits field yields the numbers 0 through 2047 with a bias of 1023 ( $2^{10} - 1$ ), the true exponent values are in range -1023 to +1024

Why Biasing: So that the bits in exponents can be treated as unsigned or nonnegative integers.

Instead of storing exponent in 2's complement, we can store it as unsigned number with the help of biasing

Ques: Represent  $(-1234.125)_{10}$  into single precision and double precision representation.

Ans:- ① First convert 1234.125 into binary number.

$$(1234.125)_{10} = 10011010010.001$$

② Now Normalize this number  $\Rightarrow$  move the decimal point right to the leftmost 1.

$$\text{leftmost} = 1.0011010010001 \times 2^{+10}$$

single precision: 32 bit

single precision: 32 bit  
• (ii) calculate biased exponent.

$$\begin{aligned}
 \text{Biased Exponent } E' &= E + \text{Bias} \\
 &= +10 + 127 \\
 &= 137 \\
 &= 10001001
 \end{aligned}$$

sign = negative = 1

| sign | Biased Exponent | Mantissa (23)              |
|------|-----------------|----------------------------|
| 1    | 10001001        | 00110100100010000000000000 |

Double Precision 64 bits

Double Precision 64 bits:  
 (ii) calculate biased exponent  $E' = E + \text{Bias}$   
 $= 10 + 1023$

$$= 10 + 1023$$

$$= 1033$$

$$= 10000001001$$

| sign | Exponent (E') | Mantissa (52)              |
|------|---------------|----------------------------|
| 1    | 10000001001   | 0011010010001000000- -0000 |

Ques Represent 000100110101.001101 using single precision and double precision.

Given number 000100110101.001101

Normalized Number  $\Rightarrow 1.00110101001101 \times 2^{+9}$

Biased Exponent (SP):  $E' = E + 127$

$$= 9 + 127$$

$$\Rightarrow 136$$

$$\Rightarrow 10001000$$

Biased Exponent (DP):  $E' = E + 1023$

$$= 9 + 1023 \Rightarrow 1032$$

$$\Rightarrow 10000001000$$

single Precision

|   |          |                           |
|---|----------|---------------------------|
| 0 | 10001000 | 0011010100110100000000000 |
|---|----------|---------------------------|

Double Precision

|   |                                    |
|---|------------------------------------|
| 0 | 10000001000100110101001101000---00 |
|---|------------------------------------|

Ques: Represent  $-(0.0000010001100110)_2$  using single precision and double precision.

Normalize the number.

$$\Rightarrow 1.00011001101 \times 2^{-6}$$

Biased Exponent in SP:  $E' = E + \text{Bias} \Rightarrow -6 + 127$   
 $\Rightarrow 121$   
 $\Rightarrow 01111001$

Biased Exponent in DP:  $E' \Rightarrow E + \text{Bias} \Rightarrow -6 + 1023$   
 $\Rightarrow 1017$   
 $\Rightarrow 0111111001$

single Precision  $\Rightarrow$

|   |         |                          |
|---|---------|--------------------------|
| 1 | 0111001 | 000110011010000000000000 |
|---|---------|--------------------------|

Double Precision:-

|   |            |                       |
|---|------------|-----------------------|
| 1 | 0111111001 | 0001100110100000--000 |
|---|------------|-----------------------|

## Floating-Point Addition and Subtraction:

- Four basic phases of the algorithm for addition and subtraction.

1. Check for zeros.
2. Align the significands
3. Add or subtract the significands
4. Normalize the result.

| Floating Point Numbers                               | Addition and subtraction                                                                                                                                                                                                                                           |
|------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| $X = X_s \times B^{X_E}$<br>$Y = Y_s \times B^{Y_E}$ | $X+Y = (X_s \times B^{X_E - Y_E} + Y_s) \times B^{Y_E}$ $X-Y = (X_s \times B^{X_E - Y_E} - Y_s) \times B^{Y_E}$ $\left. \begin{array}{l} X+Y = (0.3 \times 10^2 + 0.2) \times 10^3 \\ X-Y = (0.3 \times 10^2 - 0.2) \times 10^3 \end{array} \right\} X_E \leq Y_E$ |

### Example:

$$X = 0.3 \times 10^2 = 30$$

$$Y = 0.2 \times 10^3 = 200$$

$$X+Y = (0.3 \times 10^{2-3} + 0.2) \times 10^3 = (0.03 + 0.2) \times 10^3$$
$$= 0.23 \times 10^3$$
$$= 230$$

$$X-Y = (0.3 \times 10^{2-3} - 0.2) \times 10^3 = (0.03 - 0.2) \times 10^3$$
$$= (-0.17) \times 10^3$$
$$= -170$$

Phase ① : Zero check: If either operand is 0, the other is reported as the result.

Phase ② : Significand alignment: To manipulate the numbers so that the two exponents are equal.

Alignment may be achieved by shifting either the smaller number to right (increasing its exponent) or shifting the larger number to left.

- often small number is picked for shifting

Phase ③ Addition: Next, two significands are added together taking into account their signs.

Phase ④ Normalization: This phase normalize the result.

Normalization consists of shifting significand digits left until the most significant digit is nonzero. Each shift causes a decrement of the exponent and thus could cause an exponent underflow. Finally, the result must be rounded off and the reported.

For floating-point addition and subtraction, it is necessary to ensure both the operands have the same exponent value. This may require shifting the radix point on one of the operands to achieve alignment.

A floating-point may produce one of these conditions:

- Exponent overflow
- Exponent Underflow
- Significand underflow
- Significand overflow

Exponent Overflow: A positive exponent exceeds the maximum possible exponent value. In some systems, this may be designated as  $+\infty$  or  $-\infty$ .

Exponent Underflow: A negative exponent is less than the minimum possible exponent value. (e.g.  $-200$  is less than  $-127$ ) This means the number is too small to be represented and it may be reported as zero.

Significand underflow: In the process of aligning significands digits may flow off the rightend of the significand. So, some form of rounding is required.

Significand overflow: The addition of two significands of the same sign may result in a carry out of the most significant bit. This can be fixed by realignment.

# Flow chart for Floating Point Addition and Subtraction



Figure 9.22 Floating-Point Addition and Subtraction ( $Z \leftarrow Z \pm Y$ )

# Flow chart for Floating Multiplication



Figure 9.23 Floating-Point Multiplication ( $Z \leftarrow X \times Y$ )

## Floating-Point Multiplication:

- if either operand is 0(zero), zero is reported as result.
- The next, ~~step~~ step is to add the exponents. if the exponents are stored in biased form, the exponent sum would have doubled the bias. Thus, the bias must be subtracted from the sum. The result could be either an exponent underflow or exponent overflow, which would be reported.

- if the exponent of the product is within the range, the next step is to multiply the significand, taking into account their signs.
- After the product is calculated, the result is normalized and rounded.

# flow3.jpg

## Flow chart for Floating Division



### Floating Point division:-

- Test for zero → if divisor is zero, an error is issued or result is set to infinity.  
→ if dividend is zero, the result is zero.
- Next, the divisor exponent is subtracted from the dividend exponent. This removes bias, which must be added back in.
- Tests are made for exponent underflow or overflow.
- The next step is to divide the significands.
- Followed by normalization and rounding.

## Logic Microoperations :

→ logic microoperations specify binary operations ~~for~~ for strings of bits stored in registers. These operations consider each bit of the register separately and treat them as binary variables.

### For Example

- AND ( $\wedge$ )
  - OR ( $\vee$ )
  - NOT ( $\neg$ )
  - NOR
  - NAND
  - X-OR ( $\oplus$ )
  - X-NOR ( $\ominus$ )

$$R_1 = 1010$$
$$R_2 = 1100$$

$$\begin{array}{r} \text{AND} \\ \hline \end{array} \Rightarrow \begin{array}{r} 1010 \\ 1100 \\ \hline 1000 \end{array}$$

$$\begin{array}{r} 0R \Rightarrow 1010 \\ 1100 \\ \hline 1110 \end{array}$$

NOT      1010  
                0101

$$\begin{array}{r}
 \underline{\text{NOR:}} \quad 1010 \\
 \quad \quad \quad 1100 \\
 \hline
 \quad \quad \quad 0001
 \end{array}$$

$$\begin{array}{r}
 \text{NAND:-} \\
 \begin{array}{r}
 1010 \\
 1100 \\
 \hline
 0111
 \end{array}
 \end{array}$$

$$\begin{array}{r}
 \underline{\text{X-OR}} \\
 1010 \\
 1100 \\
 \hline
 \underline{0110}
 \end{array}$$

$$\begin{array}{r}
 \underline{\text{X-NOR}} \\
 \begin{array}{r}
 1010 \\
 1100 \\
 \hline
 1001
 \end{array}
 \end{array}$$

## Shift Microoperations

→ Shift Micro operations are used for serial transfer of data.

→ Types of Shift operations

- Logical shift
  - shift left
  - shift right
- Circular shift or Rotate
  - circular Right shift
  - circular left shift
- Arithmetic
  - Arithmetic Right shift
  - Arithmetic left shift

### Logical shift left:



### Logical shift Right:



### Example:

1 0 0 1 0 1 1 1  
0 0 1 0 1 1 1 0      After shift left

1 0 0 1 0 1 1 1  
0 1 0 0 1 0 1 1      After shift right

## Circular Shift: (Also known as Rotate operation).

→ circulates the bits of the register around the two ends without loss of information.

### circular left shift

|                |                |                |                |                |                |                |                |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| b <sub>7</sub> | b <sub>6</sub> | b <sub>5</sub> | b <sub>4</sub> | b <sub>3</sub> | b <sub>2</sub> | b <sub>1</sub> | b <sub>0</sub> |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|

↓ After circular left shift

|                |                |                |                |                |                |                |                |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| b <sub>6</sub> | b <sub>5</sub> | b <sub>4</sub> | b <sub>3</sub> | b <sub>2</sub> | b <sub>1</sub> | b <sub>0</sub> | b <sub>7</sub> |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|

### Circular Right Shift

|                |                |                |                |                |                |                |                |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| b <sub>7</sub> | b <sub>6</sub> | b <sub>5</sub> | b <sub>4</sub> | b <sub>3</sub> | b <sub>2</sub> | b <sub>1</sub> | b <sub>0</sub> |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|

After ↓ circular Right shift

|                |                |                |                |                |                |                |                |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| b <sub>0</sub> | b <sub>7</sub> | b <sub>6</sub> | b <sub>5</sub> | b <sub>4</sub> | b <sub>3</sub> | b <sub>2</sub> | b <sub>1</sub> |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|

### Example

10010111  
/ / / / / / /  
00101111  
Circular Left Shift

10010111  
/ / / / / /  
11001011 → Circular Right Shift

## Arithmetic Shift:

- An Arithmetic shift operation shift a signed binary number to the left or right
- Arithmetic shift leaves the sign bit unchanged.

### Arithmetic Left Shift



## Arithmetic Right Shift:



## Arithmetic Logic Unit:

- CPU contains ALU, CU and Registers
- ALU is responsible for arithmetic and logical operations
- Basically ALU is a digital circuit that performs arithmetic operations like addition, subtraction, division and multiplication, and logical operations like AND, OR, XOR, NOT, etc.
- Types of ALU ⇒ ① Combinational ALU  
② Sequential ALU.

## combinational ALU's





4 bit Arithmetic Circuit performing operations  
like addition, subtraction, increment, decrement  
and transfer etc.

| Sq | S <sub>1</sub> | S <sub>0</sub> | Cin | Input     | Output D = A + Y + Cin                   |
|----|----------------|----------------|-----|-----------|------------------------------------------|
| 0  | 0              | 0              | 0   | B         | D = A + B → Add                          |
| 0  | 0              | 1              | 0   | B         | D = A + B + 1 → Add with carry           |
| 0  | 1              | 0              | 0   | $\bar{B}$ | D = A + $\bar{B}$ → Subtract with Borrow |
| 0  | 1              | 1              | 0   | $\bar{B}$ | D = A + $\bar{B}$ + 1 → Subtract         |
| 1  | 0              | 0              | 0   | 0         | D = A → Transfer A                       |
| 1  | 0              | 1              | 0   | 0         | D = A + 1 → Increment A                  |
| 1  | 1              | 0              | 1   | 1         | D = A - 1 → <u>Decrement A</u>           |
| 1  | 1              | 1              | 1   | 1         | D = A → Transfer A                       |

when S<sub>1</sub>=1 and S<sub>0</sub>=1 then  
input is  $(B_3 B_2 B_1 B_0) 1111$   
which is 2's complement  
of 0001 ⇒ It makes

$$\begin{array}{r}
 \begin{array}{cccc} A_3 & A_2 & A_1 & A_0 \\ +1 & 1 & 1 & 1 \end{array} \xrightarrow{\text{+2's compl}} \begin{array}{c} A \\ \hline \end{array} \\
 \Rightarrow A + (-1) \\
 \Rightarrow A - 1
 \end{array}$$

one stage of logic circuit:-



Function Table

| S <sub>1</sub> | S <sub>0</sub> | Logic operation (E)               |
|----------------|----------------|-----------------------------------|
| 0              | 0              | A <sub>i</sub> AND B <sub>i</sub> |
| 0              | 1              | A <sub>i</sub> OR B <sub>i</sub>  |
| 1              | 0              | A <sub>i</sub> XOR B <sub>i</sub> |
| 1              | 1              | NOT A <sub>i</sub>                |

## One stage of arithmetic logic shift unit



## Sequential logic circuit based ALU



- Two registers X and Y stores data or operands.
- select/control selects the appropriate arithmetic or logic operation which is performed over the data stored on register X and Y.
- After the execution of operation the result will be stored in result register.
- After the execution of operation flags may be set such as carry, zero result, positive or negative result, overflow, division by zero etc.

## COA Unit 3 Notes

Computer Organization & Architecture (Dr. A.P.J. Abdul Kalam Technical University)

## Instruction:

- The operation of the processor is determined by the instruction, it executes, referred to as machine instructions or computer instruction
- The collection of different instructions that processor can execute is referred to as the processor's instruction set.
- A computer instruction is a binary code that instruct the computer to perform a specific operation.

## Elements of a Machine instruction:

- operation code
  - source operand reference
  - Result operand reference
  - Next instruction reference
- An operation code known as **opcode** which specifies operation to be performed.
- Source operand reference: Reference to operands on which the operation is to be performed.
- Result operand reference: A reference to operand which will store the result produced by the instruction
- A reference to the next instruction to be executed



A simple instruction format

- source ~~and~~ and result can be in one of four areas.
  - main or virtual memory
  - Processor register
  - Immediate ( contained in instruction directly)
  - I/O device.

- opcodes are represented by abbreviations, called mnenomics such as ADD, SUB, LOAD, STOR etc.
- An instruction may have many fields. A layout of a simple instruction is known as "Instruction format"

### Types of Instruction:

Most computer instructions can be classified into three categories:

- Data Transfer instructions
- Data Manipulation instructions
- Program control instruction

Data transfer instructions cause transfer of data from one location to another without changing the binary information content.

Data manipulation instructions are those that perform arithmetic, logic, and shift operations.

Program control instruction provide decision making capabilities and change the path taken by the program when executed in the computer.

## Data Transfer:

- Data transfer instructions move data from one location to another without changing the data content.
- These instructions are used to bring data to and from memory to registers.

## Examples:

- MOVE : designates a transfer from one register to another.
- Store : transfer ~~and~~ from a processor register into memory.
- Load : a transfer from memory to register
- Exchange: swaps information between two registers or a register and memory.
- Input : transfer from input terminal to processor register.
- Output: transfer from process register to output terminal.
- Push: transfer from process register to the top of stack.
- Pop: transfer from top of stack to process register.
- Clear: transfer of word 0's (zeros) to the destination.
- Set: transfer of word 1's (ones) to the destination.

## Data Manipulation:

- Data manipulation instructions perform operations on the data.
- These can be divided into three types.

- ① Arithmetic
- ② Logical and bit manipulation instructions
- ③ Shift microoperations

## Arithmetic Instructions:

These instructions are used to perform arithmetic operations.

### Example:

Add → compute the sum of two operands

Subtract → Compute difference of two operands

Multiply → compute the product of two operands

Divide → Compute quotient of two operands.

Add with carry → compute the sum of two operands with carry.

Subtract with borrow → compute the difference with borrows.

Negate (2's complement) → change the sign of the operand.

Increment → Add 1 to operand

Decrement → Subtract 1 from operand.

## Logical Instruction:

- logical and bit manipulation instruction.

- logical instructions perform binary operations on string of bits stored in registers.

### Example:

AND → performs AND of operands

OR → performs OR of operands

Complement → takes complement  
or NOT

X-OR → performs X-OR of operands

clear selected bits  $\Rightarrow$  to clear a bit or group of selected bits to zero.

set selected bits  $\Rightarrow$  to set a bit or selected bits to 1.

complement  $\Rightarrow$  to complement a bit or selected bits.

selected bits

Test  $\Rightarrow$  Test specified condition (set flags)

compare  $\Rightarrow$  make logical comparison of two operands.

Enable

Interrupt  $\Rightarrow$  set control variables for interrupt

Disable

Interrupt  $\Rightarrow$  handling, timer control etc.

### Shift Instructions:

→ Shifts are operations in which the bits of the word are moved to the left or right.

#### Example:

logical shift Right

logical shift left

Arithmetic shift Right

Arithmetic shift left

Rotate Right

Rotate Left

## Program Control: or Transfer of Control instructions

- A program control type of instructions, when executed, may change the address value in the program counter and cause the flow of control to be altered.
- Program control instructions specify conditions for altering the content of the program counter.

### Example Program control instruction

Jump(branch) → Unconditional transfer, load PC with specified address.

Jump(Conditional) → Test specified condition, either load PC with specified address, or do nothing based on condition

Jump to subroutine → Place current program control information in known location, or CALL  
Jump to specified address.

Return → Replace contents of PC and other registers from the known locations.

Skip → Increment to PC to skip next instruction.

Skip(conditional) → Test specified conditions, either skip or do nothing based on condition

Halt → Stop program execution.

Wait (hold) → Stop program execution, test specified condition repeatedly; resume execution when condition is satisfied.

No operation → No operation is performed, but program execution is continued.

### Input and output Instructions:

input (read) → Transfer data from input terminal to a destination

output (write) → Transfer data from specified source to output terminal

start I/O → Transfer instruction to I/O processor to initiate operation

Test I/O → Transfer status information I/O system to the specified destination.

### conversion Instruction:

• convert or translate value or words.

#### Example

Translate → Translates value in a section of memory based on table of correspondence

convert → converts the content of word from one form to another.

Example

## Conditional Branch instructions

Branch if zero

Branch if not zero

Branch if carry

Branch if not carry

Branch if plus

Branch if minus

Branch if overflow

Branch if no overflow

## STORED PROGRAM ORGANIZATION: OR

### Single Accumulator Organization

OR

### Processor Organization with Accumulator

- To organize a computer in this method, Processor has a register called accumulator and an instruction code format with two parts.
- |        |                    |
|--------|--------------------|
| Opcode | Address of operand |
|--------|--------------------|
- The first part specifies the operation to be performed and the second specifies an address of operand in the memory.
- Another. The other operand will be stored in the processor register

### Example

Memory 4096X16



Number of bits in address = ~~12~~ 12

$$\begin{aligned} \text{Number of memory locations} &= 2^{12} \\ &= 4096 \end{aligned}$$

The number of bits of operand or data  
= 16

In this method, processor performs the operation specified by the opcode on the data specified by stored on the memory location and content of AC Register.

## Instruction Format:

Most common fields of instructions are -

- Operation field which specifies the operation to be performed
- Address field which contains the location of operand i.e., register or memory location
- Mode field which specifies how operand is to be found.

Generally CPU organizations are

- Single Accumulator Organization
- General Register Organization
- Stack Organization

→ An instruction is of various length depending on the number of addresses it contains

On the basis of number of addresses instruction can be classified

- Three address instruction
- Two address instruction
- One address instruction
- Zero address instruction

instruction formats

## Three address instruction :

This format of instruction has three addresses to specify a register or memory location.

|        |                     |                |                     |
|--------|---------------------|----------------|---------------------|
| Opcode | Destination address | Source Address | Source address Mode |
|--------|---------------------|----------------|---------------------|

Examples:      ADD R1, R2, R3  
                        OR  
                        ADD R1, A, B

## Two address instruction:

This format has only two address to specify a register or memory location.

|        |                     |                |      |
|--------|---------------------|----------------|------|
| Opcode | Destination address | Source address | Mode |
|--------|---------------------|----------------|------|

Examples:      MOV R1, A      OR  
                        ADD R1, B

## One - address instruction :

- It has only one address to specify a register or a memory location
- This uses an implied 'ACCUMULATOR' or AC Register to store one operand / result.

|        |                    |      |
|--------|--------------------|------|
| Opcode | Address of operand | Mode |
|--------|--------------------|------|

Example      LOAD A

OR

This document is available free of charge on

ADD B

Downloaded by Sangam gupta (sangamg7318@gmail.com)

## Zero Address Instruction:

- This instruction format does not contain any address field.
- A stack based computer do not use address field in instruction to perform operation because the operation is performed on the two top items of the stack.



Example

ADD  
~~op~~

Note In stack-base organization, to evaluate an expression first it is converted to Postfix notation or Reverse Polish Notation

Example for instruction formats to evaluate an expression

Example ① Expression  $X = (A+B)*(C+D)$

Three address Format:

Description:

ADD R1, A, B

$R1 \leftarrow M[A] + M[B]$

ADD R2, C, D

$R2 \leftarrow M[C] + M[D]$

MUL ~~X~~, R1, R2

$M[X] \leftarrow R1 * R2$

Here R1, R2 are registers

$M[J] \rightarrow$  memory locations

## Two address instructions

$$X = (A+B) * (C+D)$$

|            | Description               |
|------------|---------------------------|
| MOV R1, A  | $R1 \leftarrow M[A]$      |
| ADD R1, B  | $R1 \leftarrow R1 + M[B]$ |
| MOV R2, C  | $R2 \leftarrow M[C]$      |
| ADD R2, D  | $R2 \leftarrow R2 + M[D]$ |
| MUL R1, R2 | $R1 \leftarrow R1 * R2$   |
| MOV X, R1  | $M[X] \leftarrow R1$      |

## One address instruction

|         | Description               |
|---------|---------------------------|
| LOAD A  | $AC \leftarrow M[A]$      |
| ADD B   | $AC \leftarrow AC + M[B]$ |
| STORE T | $M[T] \leftarrow AC$      |
| LOAD C  | $AC \leftarrow M[C]$      |
| ADD D   | $AC \leftarrow AC + M[D]$ |
| MUL T   | $AC \leftarrow AC * M[T]$ |
| STORE X | $M[X] \leftarrow AC$      |

where  $AC$  ~~is~~<sup>is</sup> accumulator register

$M[T]$  is temporary memory location to hold the intermediate result

$M[J] \leftarrow$  memory location

## Zero address Instruction:

$$X = (A+B)*(C+D)$$

Postfix notation  $X = A\ B + C\ D + *$

|      |   | Description                       |
|------|---|-----------------------------------|
| PUSH | A | TOP = <del>A</del> $\leftarrow A$ |
| PUSH | B | TOP = B                           |
| ADD  |   | TOP = A+B                         |
| PUSH | C | TOP = C                           |
| PUSH | D | TOP $\leftarrow D$                |
| ADD  |   | TOP $\leftarrow (C+D)$            |
| MUL  |   | TOP $(A+B)*(C+D)$                 |
| POP  | X | $M[X] = TOP$                      |

Ques: write a program to evaluate  $\frac{(A-B)+C*(D*E-F)}{G+H*K}$

Ans: Three address instruction:

|     |            | Description                 |
|-----|------------|-----------------------------|
| SUB | R1, A, B   | $R1 \leftarrow M[A] - M[B]$ |
| MUL | R2, D, E   | $R2 \leftarrow M[D] * M[E]$ |
| SUB | R2, R2, F  | $R2 \leftarrow R2 - M[F]$   |
| MUL | R2, R2, C  | $R2 \leftarrow R2 * M[C]$   |
| ADD | R1, R1, R2 | $R1 \leftarrow R1 + R2$     |
| MUL | R3, H, K   | $R3 \leftarrow M[H] + M[K]$ |
| ADD | R3, R3, G  | $R3 \leftarrow R3 + M[G]$   |
| DIV | X, R1, R3  | $M[X] \leftarrow R1 / R3$   |

## Two address instructions

|     |        | Description               |
|-----|--------|---------------------------|
| MOV | R1, A  | $R1 \leftarrow M[A]$      |
| SUB | R1, B  | $R1 \leftarrow R1 - M[B]$ |
| MOV | R2, D  | $R2 \leftarrow M[D]$      |
| MUL | R2, E  | $R2 \leftarrow R2 * M[E]$ |
| SUB | R2, F  | $R2 \leftarrow R2 - M[F]$ |
| MUL | R2, C  | $R2 \leftarrow R2 * M[C]$ |
| ADD | R1, R2 | $R1 \leftarrow R1 + R2$   |
| MOV | R3, H  | $R3 \leftarrow M[H]$      |
| ADD | R3, G  | $R3 \leftarrow R3 + M[G]$ |
| DIV | R1, R3 | $R1 \leftarrow R1 / R3$   |
| MOV | X, RI  | $M[X] \leftarrow R1$      |

## One address instruction

|       |   | Description               |
|-------|---|---------------------------|
| LOAD  | A | $AC \leftarrow M[A]$      |
| SUB   | B | $AC \leftarrow AC - M[B]$ |
| STORE | T | $M[T] \leftarrow AC$      |
| LOAD  | D | $AC \leftarrow M[D]$      |
| MUL   | E | $AC \leftarrow AC * M[E]$ |
| SUB   | F | $AC \leftarrow AC - M[F]$ |
| MUL   | C | $AC \leftarrow AC * M[C]$ |
| ADD   | T | $AC \leftarrow AC + M[T]$ |
| STORE | T | $M[T] \leftarrow AC$      |
| LOAD  | H | $AC \leftarrow M[H]$      |
| MUL   | K | $AC \leftarrow AC * M[K]$ |
| ADD   | G | $AC \leftarrow AC + M[G]$ |

STORE T1  $M[T1] \leftarrow AC$   
 LOAD T  $AC \leftarrow M[T]$   
 DIV T1  $AC \leftarrow AC / M[T1]$   
 STORE X  $M[X] \leftarrow AC$

## Zero address instruction

Postfix Notation AB - CDE \* F - \* + GHK \* + /

|        | <u>Description</u>                                              |
|--------|-----------------------------------------------------------------|
| PUSH A | Top $\Rightarrow A$                                             |
| PUSH B | Top $\Rightarrow B$                                             |
| SUB    | Top $\Rightarrow (A - B)$                                       |
| PUSH C | Top $\Rightarrow C$                                             |
| PUSH D | Top $\Rightarrow D$                                             |
| PUSH E | Top $\Rightarrow E$                                             |
| MUL    | Top $\Rightarrow (D * E)$                                       |
| PUSH F | Top $\Rightarrow F$                                             |
| SUB    | Top $\Rightarrow ((D * E) - F)$                                 |
| MUL    | Top $\Rightarrow C * ((D * E) - F)$                             |
| ADD    | Top $\Rightarrow (A + B) + (C * ((D * E) - F))$                 |
| PUSH G | Top $\Rightarrow G$                                             |
| PUSH H | Top $\Rightarrow H$                                             |
| PUSH K | Top $\Rightarrow K$                                             |
| MUL    | Top $\Rightarrow (H * K)$                                       |
| ADD    | Top $\Rightarrow G + (H * K)$                                   |
| DIV    | Top $\Rightarrow ((A - B) + C * ((D * E) - F)) / (G + (H * K))$ |
| POP X  | $X \leftarrow \text{Top}$                                       |

## Tutorial sheet

Ques: ① write a program for expression  $A + B * C$  using one address and Two address instruction

Ans: Expression  $\Rightarrow D = A + B * C$

Program using one address instruction

|       |   |                           |
|-------|---|---------------------------|
| LOAD  | B | $AC \leftarrow M[B]$      |
| MUL   | C | $AC \leftarrow AC * M[C]$ |
| ADD   | A | $AC \leftarrow AC + M[A]$ |
| STORE | D | $M[D] \leftarrow AC$      |

Program using Two address instruction

|                |       |                           |
|----------------|-------|---------------------------|
| MOV            | RI, B | $RI \leftarrow M[B]$      |
| <del>MUL</del> | RI, C | $RI \leftarrow RI * M[C]$ |
| ADD            | RI, A | $RI \leftarrow RI + M[A]$ |
| STORE          | D, RI | $M[D] \leftarrow RI$      |
| MOV            |       |                           |

Program using Zero address instruction

~~ADD~~ A BC \* + =

|        |                    |
|--------|--------------------|
| PUSH A | $Top = A$          |
| PUSH B | $Top = B$          |
| PUSH C | $Top = C$          |
| MUL    | $Top = B * C$      |
| ADD    | $Top = A + B * C$  |
| POP D  | $M[D] = A + B * C$ |

## New Topic

→ How would you find the number of bits?

Example (Ans) if a system has memory of 4096 words 16 bits per word  
then ⇒

| Register             |    | Size or<br>Number of bits |
|----------------------|----|---------------------------|
| Program Counter      | PC | 12                        |
| Address Register     | AR | 12                        |
| Instruction Register | IR | 16                        |
| Data Register        | DR | 16                        |
| Accumulator          | AC | 16                        |

Number of locations  
= 4096

Number of bits to  
identify 4096  
locations ⇒ 12

→ Number of bits in address bus ⇒ 12  
Number " " " data bus = 16

Ques A computer uses a memory of 256 K words of 32 bit each.  
A binary instruction code is stored in a one word of memory.

The instruction has four parts

- An indirection bit
- An operation code
- A register part to specify 64 registers and
- An address part

- (i) How many bits are there in the operation code, the register code and address part?
- (ii) Draw the instruction format and indicate the number of bits in each part.
- (iii) How many bits are there in the data and address inputs/bus of memory

Ans Number of words in memory =  $256K \Rightarrow 256 \times 1024 \Rightarrow 2^8 \times 2^{10}$   
Size of the word  $\Rightarrow 32$  bit

Size of the instruction = 32 { because instruction is stored in one word of memory}

No. of Indirection bits  $\Rightarrow 1$

Number of registers = 64  $\Rightarrow 2^6$

bits to identify 64 registers = 6

Number of bits in address = 18

instruction format Given



$$\begin{aligned} \text{Number of bits in opcode} &= 32 - (1 + 6 + 18) \\ &= 7 \text{ bits} \end{aligned}$$

- (i) Number of bits in opcode = 7  
Number of bits in register code = 6  
Number of bits in address part = 18

(ii) Format of instruction



- (iii) Number of bits in the address input = 18  
Number of bits in data input = 32

Ques For 256 instructions and 1024 word memory, find out the number of bits required in each type of instruction.

Four address instruction has

- Opcode
- Reference to operands
- Reference of the next instruction

Aus:

Four address instruction

|        |                |                |                |                |
|--------|----------------|----------------|----------------|----------------|
| Opcode | A <sub>1</sub> | A <sub>2</sub> | A <sub>3</sub> | A <sub>4</sub> |
|--------|----------------|----------------|----------------|----------------|

A<sub>1</sub>, A<sub>2</sub>, A<sub>3</sub> → operand/ Result reference

A<sub>4</sub> ⇒ Next instruction reference.

Number of possible instruction = 256

Number of bits in opcode = 8

$$2^8 = 256$$

Number of locations = 1024

Number of bits in address = 10

$$\{ \because 2^{10} = 1024 \}$$

Four address instruction

|               |                         |                         |                   |                        |
|---------------|-------------------------|-------------------------|-------------------|------------------------|
| 8 bits opcode | 10 Address <sub>1</sub> | 10 address <sub>2</sub> | 10 A <sub>3</sub> | 10 bits A <sub>4</sub> |
|---------------|-------------------------|-------------------------|-------------------|------------------------|

Number of bits in instruction = 48

## instruction

|        |          |          |           |
|--------|----------|----------|-----------|
| Opcode | operand1 | operand2 | - - - - - |
|--------|----------|----------|-----------|

- if operands are stored in register, then it is called register reference instruction.
- if operands are stored in memory, then it is called a memory reference instruction.

Ques: if 16-bit fixed length of instruction is stored in 128 word memory, then find out the number bits in opcode and total number of possible operations in 1 address instruction format and 2 address instruction format.

Ans: One address instruction format.

Number of locations in memory = 128

Number of bits in address to identify 128 location =  $n$

$$2^n = 128$$

$$2^n = 2^7$$

$$\boxed{n = 7}$$

Number of bits in address field = 7



→ Number of bits in opcode = 9

Number of possible operations =  $2^9 \Rightarrow 512$

## COMMON BUS SYSTEM:



## \* Instruction cycle:

An instruction cycle consists of an instruction fetch, followed by zero or more operands fetches, followed by zero or more operand stores, followed by an interrupt check (if interrupts are enabled).

→ An instruction cycle includes the following stages/subcycles.

- Fetch
- Execute
- Interrupt

- ① Fetch: Read the next instruction from memory into processor
- ② Execute: Interpret the opcode and perform the indicated operation
- ③ Interrupt: If interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt.
- ④ Indirect (additional) only when indirect addressing involved.

After an instruction is fetched, it is examined if any indirect addressing is involved or not. If involved, the required operands are fetched using indirect addressing. Following execution, an interrupt may be processed before next instruction fetch.



Instruction cycle  
(indirect addressing involved)

## Instruction cycle State diagram:



The exact sequence of events during an instruction cycle depends on the design of the processor.

## Instruction Sub-cycle's Data flow:

**Fetch cycle:** During fetch cycle, an instruction is read from memory. PC (program counter) contains the address of next instruction to be fetched. This address is moved to MAR and placed on the address bus.

The control unit requests a memory read and the result is placed on the data bus and copied to MBR and moved to IR. And then PC is incremented by 1.



Data flow fetch cycle

### Indirect Cycle:

After fetch cycle, control unit checks IR if it contains an operand specifier using indirect addressing, if so an indirect cycle is performed.

- The eight most N bits of MBR, which contains the address reference are transferred to the MAR.
- Then control unit requests a memory read, to get the desired address of the operand into the MBR.



Dataflow  
indirect  
cycle

## Interrupt cycle:

If an interrupt has occurred, the current content of PC must be saved so that the processor can resume normal activity after the interrupt.

Thus the content of PC are transferred to the MBR to be written into the memory. The special memory location reserved for this purpose is loaded into MAR from the control unit.

The PC is loaded with the address of the interrupt routine. As a result, the next instruction cycle will begin by fetching the appropriate instruction.



Data flow, interrupt cycle

## Flow chart of instruction cycle:

- As programme In basic computer, each instruction cycle consists of the following.
  - 1- Fetch the instruction from memory
  - 2- Decode the instruction
  - 3- Read the effective address from the memory if indirect address
  - 4- Execute the instruction.

After step 4, the control goes back to step 1 to fetch, decode and execute the next instruction.

This process continues unless a HALT instruction is encountered



The flowchart represents initial configuration of instruction cycle and shows how the control determines the instruction type after the decoding.

### Types of instructions:

#### Memory Reference Instructions

opcode from 000 to 110 i.e  $D_7 = 0$



#### Register Reference Instruction

opcode 111 i.e  $D_7 = 1, I = 0$



#### Input-output instruction

opcode 111 i.e  $D_7 = 1, I = 1$



#### Register Reference Instructions'

|   |     |                                                                                       |                                   |
|---|-----|---------------------------------------------------------------------------------------|-----------------------------------|
| ① | CLA | $AC \leftarrow 0$                                                                     | clear AC                          |
| ② | CLE | $E \leftarrow 0$                                                                      | clear E $E \Rightarrow$ End carry |
| ③ | CMA | $AC \leftarrow \overline{AC}$                                                         | complement AC                     |
| ④ | CME | $E \leftarrow \overline{E}$                                                           | complement E                      |
| ⑤ | CIR | $AC \leftarrow \text{Shr AC}$<br>$AC(\text{IS}) \leftarrow E$<br>$E \leftarrow AC(0)$ | circulate AC right                |

|      |     |                                                                                 |                      |
|------|-----|---------------------------------------------------------------------------------|----------------------|
| (6)  | CIL | $AC \leftarrow \text{shl } AC$<br>$AC(0) \leftarrow E$<br>$E \leftarrow AC(15)$ | circulate left .     |
| (7)  | INC | $AC \leftarrow AC + 1$                                                          | increment AC         |
| (8)  | SPA | if $(AC(15) == 0)$ then<br>$PC \leftarrow PC + 1$                               | skip if positive     |
| (9)  | SNA | if $(AC(15) == 1)$ then<br>$PC \leftarrow PC + 1$                               | skip if negative     |
| (10) | SXA | if $(AC == 0)$ then<br>$PC \leftarrow PC + 1$                                   | skip if zero AC Zero |
| (11) | SZE | if $(E == 0)$ then<br>$PC \leftarrow PC + 1$                                    | skip if E zero       |
| (12) | HLT | $S \leftarrow 0$ ( S is a start-stop flip flop )                                | Half computer        |

## Memory Reference Instructions:

operation Decoder:

|     |     |                |                                                                                                                                                                  |
|-----|-----|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| (1) | AND | D <sub>0</sub> | D <sub>1</sub> T <sub>4</sub> : DR $\leftarrow M[AR]$<br>D <sub>0</sub> T <sub>5</sub> : AC $\leftarrow AC \wedge DR$ , SC $\leftarrow 0$                        |
| (2) | ADD | D <sub>1</sub> | D <sub>1</sub> T <sub>4</sub> : DR $\leftarrow M[AR]$<br>D <sub>1</sub> T <sub>5</sub> : AC $\leftarrow AC + DR$ ,<br>E $\leftarrow C_{out}$ , SC $\leftarrow 0$ |
| (3) | LDA | D <sub>2</sub> | D <sub>2</sub> T <sub>4</sub> : DR $\leftarrow M[AR]$<br>D <sub>2</sub> T <sub>5</sub> : AC $\leftarrow DR$ , SC $\leftarrow 0$                                  |

↑ Load to AC

(4) STA  $D_3$   $D_3 T_4:$   $M[ARJ] \leftarrow AC, SC \leftarrow 0.$   
 ↑  
 Store AC

(5) BUN  $D_4$   $D_4 T_4:$   $PC \leftarrow AR, SC \leftarrow 0$   
 Branch  
unconditionally

(6) BSA  $D_5$   $D_5 T_4:$   $M[ARJ] \leftarrow PC, AR \leftarrow AR+1$   
 ↑  
 Branch and  
Save Return address

(7) ISZ  $D_6$   $D_6 T_4:$   $DR \leftarrow M[ARJ]$   
 $D_6 T_5:$   $DR \leftarrow DR + 1$   
 $D_6 T_6:$   $M[ARJ] \leftarrow DR,$   
 if ( $DR = 0$ ) then  $PC \leftarrow PC+1$   
 $SC \leftarrow 0$   
 Increment  $DR$  and  
skip if zero

Note that we need Seven time signals ( $T_0$  to  $T_6$ ) to execute  
the longest instruction ISZ

## Input - Output Instructions

|     |                                                |                      |
|-----|------------------------------------------------|----------------------|
| INP | $AC(0-7) \leftarrow INPR$ , $FGI \leftarrow 0$ | input character      |
| OUT | $OUTR \leftarrow AC(0-7)$ $FGO \leftarrow 0$   | output character     |
| SKI | $if(FGI = 1) then$<br>$PC \leftarrow PC + 1$   | skip on input flag   |
| SKO | $If(FGO = 1) then$<br>$PC \leftarrow PC + 1$   | skip on output flag  |
| ION | $IEN \leftarrow 1$                             | interrupt enable ON  |
| IOF | $IEN \leftarrow 0$                             | interrupt enable off |

## input - output configuration

input - output terminal

Serial communication interface

computer Register and flip flops



INPR  $\Rightarrow$  8 bit input Register

AC  $\Rightarrow$  Accumulator

FGI  $\Rightarrow$  1 bit control flip flop. This flag is set to 1 when new information is available. And, this flag is clear to zero when no information is accepted by the computer.

OUTR  $\Rightarrow$  output Register (8 bits)

FGO  $\Rightarrow$  is 1 bit control flip flop. If FGO is 1 then information from AC is transferred to OUTR. And then FGO is clear to zero.

---

## Instruction codes:-

- \* The organization of a computer can be defined by its internal registers, the timing and control structure, and the set of instructions it uses.
- \* The internal organization of a digital system is defined by the sequence of microoperations it performs on data stored in its registers.
- A computer instruction is a binary code that specifies a sequence of microoperations for the computer.
- Instruction codes together with the data are stored in memory. (call stored program concept)  
The computer reads each instruction from memory ~~and~~. The control then interprets the binary code of instruction and proceeds to execute it by using a sequence of microoperations. Every computer has its own unique instruction set.
- Instruction code is a group of bits that instructs the computer to perform a specific operation. It is usually divided into ~~two~~ parts
- operation codes operation code part of an instruction is a group of bits that define operations such as add, subtract, shift, complement or increment etc.

The number of bits in opcode =  $n$

$$2^n \Rightarrow \text{number of operations}$$

will depend on the number of operations ~~in~~ in computer

## Control Unit:

control unit generates control signals. control signals provide control inputs for the multiplexers in the common bus, control inputs in processor registers, and microoperations for the accumulator.



→ components of control unit are

① → Two decoders

② → One sequence counter

③ → control logic gates.

- An instruction is read from memory is placed in the IR register.
- In control unit the IR is divided into three parts
  - I bit (Indirect bit)
  - the operation code bit (12-14)
  - ~~the~~ bit 0 to 11
- The operation code bits are decoded by a  $3 \times 8$  decoder as outputs  $D_0$  to  $D_7$ .
- Bit 15 is transferred to a flip flop I.
- Bits 0 through 11 are applied to the control logic gates.
- The 4-bit sequence counter can count in binary from 0 to 15. The outputs of counter are decoded into 16 timing signals ~~to~~  $T_0$  to  $T_{15}$

The sequence counter SC can be incremented or cleared. Most of the time, the counter is incremented to provide the sequence of timing signals. Once in awhile, the counter is cleared to zero, causing the next active timing signal to be  $T_0$ .

Example consider a case where SC is incremented to provide the signals ~~to be~~  $T_0, T_1, T_2, T_3$  and  $T_4$  in sequence. At the time of  $T_4$ , SC is cleared to 0 if decoder output  $D_3$  is active. This is expressed symbolically.

$$D_3 T_4 : SC \leftarrow 0$$

## Micro operation

→ The execution of an instruction involves the execution of a sequence of substeps, generally called cycles.

For example, an execution may consist of fetch, indirect, execute and interrupt cycles. Each cycle is in turn made up of a sequence of more fundamental operations called microoperations.

A single microoperation generally involves a transfer between registers, transfer between a register and an external bus or a simple ALU operation



Ques: what is difference between operation (macro operation) and micro-operation .

Ans: An operation is the part of instruction. An operation is a binary code that tells the computer to perform the specific task or operation.

The control unit receives the instruction from memory and interprets the operation code bits. It then issues a sequence of control signals to initiate microoperations in internal computer registers.

For every operation code, the control issues a sequence of microoperations. For this reason, an operation code is sometimes called a macro operation because it specifies a set of micro operations.

## Hardwired and Micro programmed control:

- There are two types of control organization. A control unit can be implemented using two techniques:
  - Hardwired control unit
  - Microprogrammed control unit

### Hardwired control:

- In hardwired organization, the control unit is implemented with gates, flip flops, decoders and other digital circuits.
- When the control signals are generated by the hardware using conventional logic design techniques, the control unit said to be hardwired.



- Hardwired control unit uses a fixed logic to interpret an instruction and generate appropriate signals.

## Microprogrammed control

A microprogrammed control unit is implemented using programming approach.

→ The control unit initiates a series of sequential microoperations. The control variables can be represented by a string of 1's and 0's called a control word.

A microprogrammed control unit is a control unit whose binary control variables are stored in memory

Control memory: A memory that is part of control unit is called control memory. Control memory is the storage in the microprogrammed control unit to store the micro program.

micro-instruction: Each word in control ~~was~~ memory contains a microinstruction. A microinstruction specifies one or more microoperations for the system.

A sequence of microinstructions constitutes a ~~micro~~ microprogram.

Microcode: A logically coherent portion of micro program is called micro-code.

A micro instruction can cause execution of one or more micro operations and a sequence of microinstruction (micro program) can cause the execution of an instruction.

## Microprogrammed control organization



- The control memory is assumed to be a ROM, within which all control information is permanently stored.
- The control address register specifies the address of the microinstruction and control data register holds the microinstruction read from memory.
- The microinstruction contains a control word that specifies one or more micro operations for the processor. Once these operations are executed, the control must determine the next address.
- The location of the next micro instruction may be the one next in the sequence or it may be located somewhere else in the control memory.
- The next address generator is sometimes called a micro program sequencer as it determines the address sequence that is read from the control memory.

## Organization of Control Memory:



This shows control words or microinstruction arranged in control memory. The microinstruction in each routine are to be executed sequentially. Each routine ends with a branch or jump instruction indicating where to go next.

## Microinstruction Format :

Example:



→  $F_1, F_2, F_3 \Rightarrow$  specify micro operations  
each F defines seven distinct micro operations  
None for zero (000)  
Different operations for (1 to 6)

→  $CD \Rightarrow$  Field selects status bit conditions  
→  $BR \Rightarrow$  field specifies the type of branch to be used  
→  $AD \Rightarrow$  This field contains a branch address. ~~This~~ The size of this field depends on the control memory size.  
Here AD has 7 bits that means  $\Rightarrow 2^7$  words in control memory.

## Horizontal Microprogrammed Control unit:

- The control signals are represented in decoded binary format that is 1 bit per one control signal.
- If 53 control signals are present in the processor then 53 bits are required.
- More than 1 control signal can be enabled at a time.
- It supports longer control word.



- It requires no additional hardware (decoders).
- It is faster than Vertical microprogrammed.

## Vertical Microprogrammed control unit:



- The control signals are represented in the encoded binary format.

- For  $N$  control signals  $\log_2 N$  bits are required.
- It supports shorter control words.
- It requires an additional hardware (decoders) to generate control signals.
- It is slower than horizontal microprogrammed.

Ques: suppose there are 64 signal control signals in the system and ~~and~~ 1024 word control memory then find out the size of microinstructions horizontal and vertical both.

Horizontal :-

- 64 bits for 64 control signals
- Address bits  $\Rightarrow 10$  because  $words = 1024 = 2^{10}$
- size of microinstruction  $\boxed{[control\ signal\ bits \mid Address\ bits]}$

|    |    |
|----|----|
| 64 | 10 |
|----|----|

$\Rightarrow 74$  bits

vertical :-

- control signals  $= 64 \Rightarrow 2^6$
- Number of bits for control signals  $= 6$
- Address bits  $\Rightarrow 10$ . because  $1024 \Rightarrow 2^{10}$
- size of vertical micro instruction  $\Rightarrow$

|   |    |
|---|----|
| 6 | 10 |
|---|----|

$\Rightarrow 16$  bits

Ques: consider a Microprogrammed CU, where 1024 word control memory is used. The CU has to support 50 control signals and 16 flag conditions.

- (i) How many bits required in control word.
- (ii) what is the size of control memory

Ans:

Format of control word or microinstruction

|                 |                 |         |
|-----------------|-----------------|---------|
| control signals | condition flags | Address |
| bits            | bits            | bits    |

Horizontal microprogramming:

size of control word or microinstruction  $\Rightarrow [50 \mid 4 \mid 10] \Rightarrow 64$

Number of bits for control signals  $\Rightarrow 50$  (bit per control signal)

Number of bits for flags  $\Rightarrow 4$  because  $16 \Rightarrow 2^4$

Number of address bits  $\Rightarrow 10$  because  $1024 = 2^{10}$ .

The size of each control word  $\Rightarrow 64$  bit.

control memory has words  $\Rightarrow 1024$

$$\begin{aligned} \text{Size of control memory} &= 1024 \times 64 \text{ bits} \\ &\Rightarrow 1 \text{ KB} \times 8 \text{ B} \\ &\Rightarrow 8 \text{ KB} \end{aligned}$$

Vertical Microprogramming:

control signal bits  $\Rightarrow 6$  because

Control signals  $\Rightarrow 50$

flag bits  $\Rightarrow 4$

Address bits  $\Rightarrow 10$

size of control word  $\Rightarrow 20$  bits  $\Rightarrow (6+4+10)$

size of control memory  $\Rightarrow 20 \times 1024$

size of control memory  $\Rightarrow 20 \times 1024$

## → Micro program Sequencer:

- The next address generator is sometimes called a microprogram sequencer, as it determines the address sequence that is read from control memory.
- Typical functions of a microprogram sequencer are:
  - Incrementing the control address register by one
  - Loading into the control register an address from control memory
  - Transferring an external address
  - Loading an initial address to start the control operations.

## → Address Sequencing:

- Microinstructions are stored in control memory in groups, with each group specifying a routine.
- Each computer instruction has its own microprogram routine in control memory to generate the microoperations that execute the instruction.
- steps that the control must undergo during the executing of a single computer instruction.
  - An initial address is loaded into the control register address register when power is turned on in the computer
  - This address is usually the address of the first microinstruction that activates the instruction fetch routine.

→ The control unit must go through the routine that determines the effective address of the operand.

→ The next step is to generate the micro operations that execute the instruction fetched from the memory.

Transformation from the instruction code bits to an address in control memory where the routine is located is referred to as mapping process

→ Address sequencing capabilities required in a control memory are:

→ Incrementing the CAR

→ Unconditional branch or conditional branch, depending on status bit conditions

→ Mapping process from the bits of the instruction to an address for control memory.

→ A facility for subroutine call or return.

→ The micro-instruction contains

→ a set of bits to initiate microoperations in control registers.

→ other bits to specify the method by which the next address is obtained.



microprogram sequences for control memory

### Explanation:

- The first multiplexer selects an address from one of four sources (incremented External, from micro instruction or SBR) and routes it into CAR.
- The second multiplexer tests the value of a selected status bit and the result of test is applied to input logic.
- $SBR \Rightarrow$  SubRoutine Register

CD field of microinstruction selects one of the status bits in the second multiplexer.

| CD | condition  | Symbol |                      |
|----|------------|--------|----------------------|
| 00 | always = 1 | U      | unconditional branch |
| 01 |            | I      | Indirection bit      |
| 10 |            | S      | Signbit              |
| 11 |            | Z      | Zero value in AC     |

BR  $\Rightarrow$  Branch type.

|    |      |
|----|------|
| 00 | JMP  |
| 01 | CALL |
| 10 | RET  |
| 11 | MAP  |

Input logic design  $\Rightarrow$

$$S_1 = I_1$$

$$S_0 = I_1 \cdot I_0 + \bar{I}_1 T$$

$$L = \bar{I}_1 I_0 T$$

## Reduced Instruction Set Computer (RISC)

The concept of RISC architecture involves an attempt to reduce execution time by simplifying the instruction set.

### Characteristics of RISC Processors:

- Relatively few instructions
- Relatively few addressing modes
- Memory access limited to load and store instructions
- All operations done within the register of CPU
- Fixed length, easily decoded instruction format
- Single cycle instruction execution
- Hardwired rather than microprogrammed.
- Simple load and store operations for memory access.

## Complex Instruction Set Computers (CISC)

The essential goal of CISC architecture is to provide a single machine instruction for each statement that is written in high level language.

### Characteristics of CISC architecture:

- A large number of instructions (from 100 to 250 typically)
- Some instructions that perform specialized tasks and are used infrequently.
- A large variety of addressing modes - typically from 5 to 20 different modes.
- Variable length instruction formats.
- Instructions that manipulate operands in memory.

## Pipelining:

- Pipelining is an implementation technique whereby the multiple instructions are overlapped in execution.
- Pipeline processing is an implementation technique where arithmetic suboperations or the phases of a computer instruction cycle overlap in the execution.
- Pipeline is divided into stages or segments and these stages are connected with one another to form a pipe like structure. Instructions enter from one end and exit from another end.
- Pipeline increases the overall instruction throughput.
- In pipeline system, each segment consists of an input register followed by a combinational circuit. The register is used to hold data and combinational circuit performs operations on it. The output of the next segment's combinational circuit is applied to the input register of the next segment.



## Types of pipeline:

- Arithmetic Pipeline
- Instruction Pipeline

### Arithmetic pipeline:

- found in very high speed computers
- Arithmetic suboperations overlapping
- used to implement floating point operations, multiplication of fixed point numbers etc.

### Instruction pipeline:

- In this stream of instructions can be executed by overlapping fetch, decode and execute phases of an instruction cycle.
- increases throughput of the system

### Advantages of pipelining:

- The cycle time of processor is reduced.
- It increases throughput of the system.
- It makes system reliable.

### Disadvantages of pipelining

- The design of a pipelined processor is complex and costly to manufacture.
- Instruction latency is more.

## Example for pipeline organization

- Example of performing combined multiply and add operations with a stream of numbers.

$$A_i \rightarrow B_i + C_i \quad \text{for } i = 1, 2, 3, \dots, 7.$$

Each suboperation is to be implemented in a segment with a pipeline.

### Suboperations

$$R_1 \leftarrow A_i \quad R_2 \leftarrow B_i$$

$$R_3 \leftarrow A_i * B_i \quad R_4 \leftarrow C_i$$

$$R_5 \leftarrow R_3 + R_4$$

The five registers are loaded with new data every clock pulse.



| Clock pulse | Segment 1      |                | Segment 2                      |                | Segment 3                                      |    |
|-------------|----------------|----------------|--------------------------------|----------------|------------------------------------------------|----|
|             | R1             | R2             | R3                             | R4             | R5                                             | Ci |
| 1           | A <sub>1</sub> | B <sub>1</sub> |                                |                |                                                | -  |
| 2           | A <sub>2</sub> | B <sub>2</sub> | A <sub>1</sub> *B <sub>1</sub> | C <sub>1</sub> |                                                | -  |
| 3           | A <sub>3</sub> | B <sub>3</sub> | A <sub>2</sub> *B <sub>2</sub> | C <sub>2</sub> | A <sub>1</sub> *B <sub>1</sub> +C <sub>1</sub> |    |
| 4           | A <sub>4</sub> | B <sub>4</sub> | A <sub>3</sub> *B <sub>3</sub> | C <sub>3</sub> | A <sub>2</sub> *B <sub>2</sub> +C <sub>2</sub> |    |
| 5           | A <sub>5</sub> | B <sub>5</sub> | A <sub>4</sub> *B <sub>4</sub> | C <sub>4</sub> | A <sub>3</sub> *B <sub>3</sub> +C <sub>3</sub> |    |
|             | ⋮              | ⋮              | ⋮                              | ⋮              | ⋮                                              | ⋮  |
|             | so on -        |                |                                |                |                                                |    |

## General Considerations :

- ⇒ some applications need to repeat the same task many times with different sets of data.
- ⇒ The operands pass through all the segments in a fixed sequence. Each segment consists of a combinational circuit  $S_i$  that performs a suboperation over the data stream flowing through the pipe.
- ⇒ The segments are separated by registers  $R_i$  that hold the intermediate results between the stages.
- ⇒ Information flows between adjacent stages under the control of a common clock pulse applied to all registers simultaneously.

Task: a task as the total operation performed going through all the segments in the pipeline.

## Space-time diagram : (For four pipeline stages)

- ⇒ shows the segment utilization as a function of time.



Assumed Number of task = 6

## Instruction Pipeline

⇒ The phases of a computer instruction cycle overlap in the execution.

Instruction cycle ⇒ 5 phases.



- FI - Fetch instruction
- DI - Decode instruction
- FO - Fetch operands
- EO - Execute operation
- WR - Write result

Suppose there are three instructions to execute  $I_i, I_{i+1}, I_{i+2}$

## Instruction execution in nonpipeline architecture



Number of cycles to complete 3 tasks  $\Rightarrow 3 \times$  cycle to complete 1 task.

$$\begin{aligned} &\Rightarrow 3 \times 5 \\ &= \cancel{15} \end{aligned}$$

Time to execute 3 tasks  $\Rightarrow 3 \times t_w$   
= 3 × time to execute or complete one task

# Instruction execution in pipeline architecture



Number of cycle required to complete  $n$  tasks in  $k$ -stage pipeline  $= k+n-1$

$n=3$  (3 instructions)

$k=5$  (5 stages of instruction cycle)

$$\text{cycles} \Rightarrow 3 + 5 - 1$$



$t_p \Rightarrow$  time taken by a segment or phase.

$t_m \Rightarrow$  time taken by combination circuit of segment.

$t_d \Rightarrow$  propagation delay of register.

$$t_p = \max_{i=1}^k (t_i) + t_d$$

$$t_p = \max \text{ time taken by any segment} + \text{Register delay}$$

5-Segment pipeline

## Formulas:

(or phases)

consider  $k$ -segment pipeline  $\Rightarrow$  means Number of segments  $\Rightarrow k$

Number of tasks to be executed =  $n$

Clock cycle time =  $t_p$

Time taken by task 1 to complete  $\Rightarrow k \cdot t_p$

The remaining  $(n-1)$  tasks, time equal to  $(n-1) \cdot t_p$

The time taken by  $n$  tasks using  $k$ -segment pipeline

$$= k \cdot t_p + (n-1) \cdot t_p$$

$$= [k + (n-1)] \cdot t_p$$

Number of cycles by  $n$  tasks using  $k$ -segment pipeline

$$\Rightarrow k + (n-1)$$

Consider a non-pipeline unit that performs the same task and takes a time equal to  $t_n$  to complete each task  
then time required to complete  $n$  tasks.  $\Rightarrow n t_n$

Speedup ratio  $S = \frac{\text{time taken by non-pipeline}}{\text{time taken by pipeline system}}$

As the number of tasks increases  $n$  becomes much larger than  $k-1$ , and  $k+n-1$  approaches the value of  $n$   
under this condition the speedup  $\Rightarrow$

$$S = \frac{t_n}{t_p}$$

we assume that the time it takes to process a task is the same in the pipeline and non-pipeline circuit, we will have

$$t_h = K t_p \Rightarrow$$

Including this assumption the speedup reduces to

$$S = \frac{K t_p}{t_p} = K$$

This shows that theoretical maximum speedup that a pipeline can provide is  $K$  (number of segments in pipeline)

$$\text{Frequency} = \frac{1}{t_p}$$

$$\text{Throughput } H_K = \frac{n}{(K+h-1)t_p} \quad \text{or} \quad \frac{nf}{(R+n-1)}$$

Throughput: The number of tasks performed per time unit

$$\text{Efficiency} = \frac{S_K}{K}$$

Latency: The number of time units (clock cycle) between two initiations of a pipeline is the latency between them.

Collision: Any attempt by two or more initiations to use the same pipeline stage at the same time will cause a collision.

Forbidden latencies:- that will cause collision

Permissible latencies:- that will not cause collision

Reservation Table: A reservation table is a two-dimensional table.  
Row  $\Rightarrow$  pipeline stage or processing element  
Column  $\Rightarrow$  represents cycles in the execution of a task  
An X is placed in the entries corresponding to the stages required by the task during that cycle.

$\Rightarrow$  Forbidden latencies  $\Rightarrow$  distance between two X (check marks) in the same row of the reservation table.

|                | 1 | 2 | 3 | 4 | 5 | 6 |                    |
|----------------|---|---|---|---|---|---|--------------------|
| S <sub>1</sub> | X |   |   |   |   | X | $\Rightarrow 1, 6$ |
| S <sub>2</sub> |   | X |   | X |   | . | $\Rightarrow 2, 4$ |
| S <sub>3</sub> |   |   | X |   | X |   | $\Rightarrow 3, 5$ |

$6-1 \Rightarrow 5$   
 $4-2 \Rightarrow 2$   
 $5-3 = 2$

$$\text{Forbidden latencies} = \{2, 5\}$$

$$\text{Permissible latencies} = \{1, 3, 4\}$$

- if reservation table has  $n$  columns, the maximum forbidden latency is  $m \leq (n-1)$
- permissible latency  $p$  will satisfy  $1 \leq p \leq m-1$

collision vector collision vector is an  $m$ -bit vector  
 $m = \text{maximum forbidden latency}$

$$C = C_m C_{m-1} \dots C_2 C_1$$

$C_i = 1$  if latency  $i$  is a forbidden latency.

$C_i = 0$  otherwise

previous Example Forbidden latencies = {2, 5}

$$C = C_5 C_4 C_3 C_2 C_1$$

$$1 \ 0 \ 0 \ 1 \ 0$$

### state Diagram:

From collision vector, we can construct a specifying the permissible state transitions among the successive iterations

⇒ collision vector forms the initial state

⇒ The next state at time  $t+p$  is obtained by shifting the present state  $p$ -bits to the right and ORing with the initial collision vector.



5+ means 5 or more than 5 bits shift

Latency Sequence: A latency sequence is a sequence of permissible non forbidden latencies between successive task initiations.

Latency Cycle: A latency cycle is a latency sequence which repeats the same subsequence (cycle) indefinitely.

constant cycle:- A constant cycle is a latency cycle which contains only one latency value.

\* Simple cycle: A simple cycle is a latency cycle in which each state appear only once.

Greedy cycle: Some of the simple cycles are greedy cycles. A greedy cycle is one whose edges are all made with the minimum latencies from their respective starting states. Greedy cycles are formed only using outgoing edges with minimum latencies.

Suppose state



minimum latency (permissible) = 1



min latency  $\Rightarrow 2$ .



min latency = 3

Note

Minimum latency respect respective to state is marked with \* in state diagram.

- when number of shifts is  $m+1$  or greater, all transitions are redirected back to the initial state.

## Ques: How to draw state diagram

collision vector  $10010 \Rightarrow$  forms initial state  
it has permissible latency  $\Rightarrow 1, 3, 4$

initial state  $10010$

case 1  $\Rightarrow$

$10010$   
 $\downarrow 1$  shift Right

$01001$   
 $10010$  ORing  
 $\underline{11011}$  next state

- ① do shifts 1, 3, or 4
- ② ORing with CV

case 3  $\Rightarrow$

$10010 - CV$   
 $00010 - 3$  shift Right  
 $10010 - CV$  (ORing)  
 $\underline{10010}$  — next state

case 4  $\Rightarrow$

$10010$  — 4-shift Right  
 $00001$   
 $10010$   
 $\underline{10011}$  — next state



State  $\Rightarrow$  11011

case 3 shift of 3.

$$\begin{array}{r} 00011 \\ 10010 \\ \hline 10011 \end{array} \Rightarrow \text{shift of 3.}$$
$$10010 \Rightarrow \text{CV}$$
$$\text{ORing} \Rightarrow \text{next state (already present state)}$$

State : 10011

~~shift~~ Permissible latency  $\Rightarrow 3, 4$ .

case 3:- shift of 3

$$\begin{array}{r} 10011 \\ 00010 \\ 10010 \\ \hline 10010 \end{array} \xrightarrow{\text{3 shifts Right}} \text{ORing}$$

case 4: shift of 4.

$$\begin{array}{r} 10011 \\ 00001 \\ 10010 \\ \hline 10011 \end{array} \xrightarrow{\text{4 shifts Right}} \text{ORing}$$

complete state diagram

minimum permissible latency of this state = 1  
So marked with star.



min permissible latency  $\Rightarrow 3$

min permissible latency = 3

Simple cycle:

(3), (5), (1,5), (1,3,5), (4,3), (4,5) and so on.

Constant cycle:

(3), (5)

Greedy cycles:- Examples  $\Rightarrow$  (1,5), (3,4)

## MAL Minimum Average Latency

Simple cycles.

Average latency

3  
5  
(1,5)

$$\frac{1+5}{2} = 3$$

$$\text{So } \text{MAL} = 3$$

(1,3,5)

$$\frac{1+3+5}{3} = 3$$

(4,3)

$$\frac{4+3}{2} = 3.5$$

Throughput =  $\frac{1}{\text{MAL}} = \frac{1}{3} \Rightarrow 16.67$  million operations per second.

Throughput = if minimum constant cycle used.  
minimum constant cycle = 3.

$$\text{Throughput} = \frac{1}{3} \text{ MOPS.}$$

### Reservation Table for Function X:

|                | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----------------|---|---|---|---|---|---|---|---|
| S <sub>1</sub> | X |   |   |   |   | X |   | X |
| S <sub>2</sub> |   | X |   | X |   |   |   |   |
| S <sub>3</sub> |   |   | X |   | X |   | X |   |

### Question

Forbidden latencies  
 $\Rightarrow 2, 4, 5, 7$

### Reservation Table for function Y:

|                | 1 | 2 | 3 | 4 | 5 | 6 |
|----------------|---|---|---|---|---|---|
| S <sub>1</sub> | Y |   |   |   | Y |   |
| S <sub>2</sub> |   |   | Y |   |   |   |
| S <sub>3</sub> |   | Y |   | Y |   | Y |

Forbidden latencies  
 $\Rightarrow 2, 4$

- $\Rightarrow$  If reservation table has  $n$  columns, the maximum forbidden latency is  $m \leq (n-1)$
- $\Rightarrow$  permissible latency  $p$  will satisfy  $1 \leq p \leq m-1$



possible cycles  $\Rightarrow (1, 8)$   
 $(3)$   
 $(6)$

$$MAL = 3.$$

Average latency

$$\frac{1+8}{2} \Rightarrow 4.5$$

$$\Rightarrow 3$$

$$\Rightarrow 6$$

|                                 |
|---------------------------------|
| $(1, 5) \Rightarrow 3.0$        |
| $3 \Rightarrow 3.0$             |
| $(3, 5) \Rightarrow 4.0$        |
| $MAL \Rightarrow \underline{3}$ |

## COA Unit 4 notes

Computer Organization & Architecture (Dr. A.P.J. Abdul Kalam Technical University)

## Unit-4 (Part-1)

### Memory Hierarchy:

The memory hierarchy system consists of all storage devices employed in a computer system from the slow (but high storage) auxiliary memory to very fast (but low storage) registers.



fig! 1 Memory hierarchy

- ① Registers: are fastest memory with very low storage and most expensive.
- ② Cache Memory: slower than registers, but faster than main memory & has more storage than registers. used to hold program & data that are heavily used.
- ③ Main Memory: storage more than Cache Memory but less than Auxiliary memory. Speed less than Cache memory but more than auxiliary memory.
- ④ Auxiliary Memory: It has maximum storage capacity but slowest among all devices. It is also cheapest among all devices.

CPU can directly communicate with main memory and also via cache memory. CPU can not directly communicate with Auxiliary memory.



- Main Memory
- ① It is the central storage unit in a computer system.
  - ② It is used to store program and data during the computer operation.
  - ③ Based on semiconductor ~~Integrated Circuits~~ integrated circuits.
  - ④ Types of Main Memory:



### RAM (Random Access Memory)

- (i) It is a volatile memory.
- (ii) Read / write Access is random.
- (iii) Both read and write operation can be performed.
- (iv) RAM has two types:

(a) SRAM (Static RAM): It makes use of flip-flops to store binary information. It is called static RAM because data once stored will remain same (does not change) until power is on. The memory density (memory cell / Area) is relatively less. So less access time is taken in read/write operation. So it is faster, due to this, it is used in Cache memory.

(b) DRAM (Dynamic RAM): It makes use of capacitors for storage. The presence of electric charge on capacitor indicates value '1'. The stored charge on capacitors decreases with time. So value '1' may change to '0' if electrical charge finished. Due to this, it is called dynamic RAM. The memory density (memory cell per unit Area) is relatively high. Due to this, it has larger access time but more

storage capacity. Due to this, it is used mainly in main memory. (3)

(IV) ROM (~~RAM~~ Read Only Memory): It is also a type of main memory. In this, data/program both are read only. No write operation can be performed in this (But versions of ROM now performs write operations also). Random access ROM is also random access. It is also non-volatile (No data lost, if power off) in nature. The types of ROMs are:-

- (i) EPROM (Programmable ROM): This ROM chip is one time programmable. Once programmed, it cannot be erased.
- (ii) EEPROM (Erasable Programmable ROM): This ROM chip is multi time programmable. Once programmed, user can erase data using Ultra-violet rays to erase whole chip & then programme again. Any change require, whole chip to be erased.
- (iii) EEPROM (Electrical Erasable PROM): This ROM chip is also multiple time programmable. In this chip, electrical signals are used to erase data at byte or block level rather than erasing whole chip & program again. faster than EEPROM.
- (iv) Flash Memory: It is almost similar to EEPROM. The main difference from EEPROM is that in this, a bit level erasing & rewriting of data is possible which make it very fast. Due to this, it is called flash memory.

## Numerical on RAM & ROM chips

Q1 How many  $128 \times 8$  RAM chips are needed to provide a memory capacity of 2048 bytes.

Ans

$$\text{Size of one RAM chip} = 128 \times 8$$

$$\begin{aligned}\text{Total memory capacity} &= 2048 \text{ bytes} \\ &= 2048 \times 8\end{aligned}$$

$$\begin{aligned}\text{No. of RAM chips} &= \frac{\text{Total memory size}}{\text{One RAM chip size}} \\ &= \frac{2048 \times 8}{128 \times 8} = \frac{2^11}{2^7} \\ &= 2^{11-7} = 2^4 = 16\end{aligned}$$

Q2 How many ROM chips of  $1024 \times 8$  size are required for Total ROM of 4K bytes.

Ans

$$\text{No. of ROM chips} = \frac{\text{Total ROM size}}{\text{One ROM chip size}}$$

$$= \frac{4 \text{ Kbytes}}{1024 \times 8}$$

$$= \frac{4 \times 1024 \times 8}{1024 \times 8}$$

$$\begin{aligned}&= 4 \text{ ROM chips} \\ &\underline{\underline{\text{Ans}}}\end{aligned}$$

# RAM chips & ROM chips configuration

(5)

## RAM chip



Typical RAM chip of size 128x8

CS<sub>1</sub>, CS<sub>2</sub> → Two chip select control inputs

RD → Read control input

WR → Write control input

AD<sub>0</sub>-AD<sub>6</sub> → 7 bit address inputs  
8 bit data bus

Function table for RAM chips

| CS <sub>1</sub> , CS <sub>2</sub> | RD | WR | Memory function | State of the data bus |
|-----------------------------------|----|----|-----------------|-----------------------|
| 0 0                               | X  | X  | Inhibit         | High Impedance        |
| 0 1                               | X  | X  | Inhibit         | High Impedance        |
| 1 0                               | 0  | 0  | Inhibit         | High Impedance        |
| 1 0                               | 0  | 1  | Write           | Input data to RAM     |
| 1 0                               | 1  | X  | Read            | Output data from RAM  |
| 1 1                               | X  | X  | Inhibit         | High Impedance        |

Function table of RAM chip

Address and control inputs

Address and control methods

Address and control methods

## ROM chip:



Function table for ROM chip

| CS <sub>1</sub> | CS <sub>2</sub> | RD | WR | Memory function | State of data bus    |
|-----------------|-----------------|----|----|-----------------|----------------------|
| 0               | 0               | X  | X  | Inhibit         | High Impedance       |
| 0               | 1               | X  | X  | Inhibit         | High Impedance       |
| 1               | 0               | X  | X  | Read            | Output data from ROM |
| 1               | 1               | X  | X  | Inhibit         | High Impedance       |

Numerical  
Example-2

① How many lines of address bus must be used to access 2048 bytes of memory,

~~ROM size is~~

② How many chips of size 256x8 RAM

are required.

③ How many address lines are common to all chips.

④ How many lines must be decoded for chip select? Specify the size of decoder

⑤ Show how connection are made with detailed configuration.

① No. of address lines

$$\begin{aligned}
 &= 2048 \text{ bytes} \\
 &= 2048 \times 8 \\
 &= 2^{11} \times 8 \\
 &\quad \downarrow \quad \downarrow \\
 &\quad \text{address lines} \quad \text{data lines}
 \end{aligned}$$

No. of data lines = 8

② No. of chips = Total memory

$$\begin{aligned}
 &\text{One RAM size} \\
 &= \frac{2048 \times 8}{256 \times 8} = \frac{2^{11}}{2^8} = 2^3 \\
 &= 8 \text{ chips}
 \end{aligned}$$

③



④ 3 address line ( $A_8, A_9, A_{10}$ ) must be denoted for chip select.

Decoder size =  $3 \times 8$  decoder

⑤ Detailed Connection are already shown on Ans ③ -

12.2 A computer uses RAM chips of  $1024 \times 1$  capacity

(a) How many chips are needed, and how should their address lines be connected to provide a memory capacity of 1024 bytes.

(b) How many chips are needed to provide a memory capacity of 16 K bytes. How the chips are to be connected to the address bus.

$$(a) \text{ No. of chips needed} = \frac{\text{Total memory}}{\text{One RAM size}} = \frac{1024 \text{ bytes}}{1024 \times 1} = 1 \text{ chip}$$



Memory Connection diagram

8 chips

$$\text{Total memory} = 1024 \times 8$$

$$(b) \text{ No. of chips needed} = \frac{16 \text{ Kbytes}}{1024 \times 1} = \frac{16 \text{ Kbytes}}{1024 \times 1} = \frac{16 \text{ Kbytes}}{1024 \times 1} = 16 \text{ chips}$$



Memory connection

## Memory Address Map:

- \* Memory address map is the pictorial representation of assigned address space for each chip in the system.
  - \* Memory Address Map consists of three columns
    - (a) Components
    - (b) Hexadecimal address
    - (c) Address Bits (Address Bus)
  - \* Components specifies whether a RAM, ROM or I/O device
  - \* Hexadecimal Address column specifies the assigned range of hexadecimal equivalent address for each chip.
  - \* Address Bits (Address Bus) column specifies the no. of bits required in the address bus with value of each address bits. The hexa decimal address of each chip is obtained from the information under the address bus assignment.

| Components     | Hexadecimal Address | Address Bits (Address Bus)                                                                  |
|----------------|---------------------|---------------------------------------------------------------------------------------------|
| RAM chip (1-4) | 0000H-01FFH         | 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1<br>0 0 0 0 0 0 0 ← XXXX XXXX<br>↓<br>2x4 RAM decoder |
| ROM chip (1)   | 0200H-03FFH         | 0 0 0 0 0 0 1 X XXXX XXXX<br>↓ ROM                                                          |

Sample Example of Memory address Map for 4 RAM Chips and 1 ROM Chip

\* In the RAM chip, for RAM size  $128 \times 8$  RAM



Bit (8-9) used to represent 4 RAM chips as system has 4 RAM chips.

→ used  $2 \times 4$  decoder for assigning unique

value to RAM chips

→  $\frac{9 \text{ } 8 \text{ (Address bits) }}{0 \ 0 \quad \text{RAM 1}}$

0 1 RAM 2

1 0 RAM 3

1 1 RAM 4

Bit (10) → used to differentiate between RAM and ROM -

0 → RAM

1 → ROM

~~Bit (11-16)~~ are set to zero(0) as they are not used in this problem

for ROM chip of size  $512 \times 8$  ROM

$2^9 \times 8$  ROM

9 address bits are represented as  $X$

$X \rightarrow$  may be 0 or 1

starting address  $9 8 7 6 5 4 3 2 1$   
 $0 0 0 0 0 0 0 0 0$

Final  $\rightarrow 1 1 1 1 1 1 1 1 1$

12.4

Consider a system with RAM size  $128 \times 8$  and ROM of size  $512 \times 8$ . How many RAM chips are required for ~~draw memory address map~~ for a total memory of a total RAM of 4096 bytes and 4096 bytes of ROM. Draw Memory address Map.

Q8 (i) No. of RAM chips =  $\frac{4096 \text{ bytes}}{128 \times 8} = \frac{4096 \times 8}{128 \times 8} = \frac{2^12}{2^7} = 2^5 = 32$

No. of ROM chips =  $\frac{4096 \text{ bytes}}{512 \times 8} = \frac{4096 \times 8}{512 \times 8} = \frac{2^{12}}{2^9} = 2^3 = 8$

(ii)



# Memory Address Map

(10)

| Components | Memory Addresser | Address Lines                                                                                                                                                                                                                          |
|------------|------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| RAM        | 0000-0FFFH       | A <sub>15</sub> A <sub>14</sub> A <sub>13</sub> A <sub>12</sub> A <sub>11</sub> A <sub>10</sub> A <sub>9</sub> A <sub>8</sub> A <sub>7</sub> A <sub>6</sub> A <sub>5</sub> A <sub>4</sub> A <sub>3</sub> A <sub>2</sub> A <sub>1</sub> |
| ROM        | 1000H-1FFFH      | 0000 0 $\xleftarrow{5 \times 32}$ decoder X X X X X X X<br>0001 $\xleftarrow{3 \times 8}$ decoder X X X X X X X                                                                                                                        |

- Q2.3 A ROM chip of  $1024 \times 8$  bits has four select inputs and operates from a 5V power supply. How many pins are needed for the IC package. Draw a block diagram and label all inputs & output terminals.



$$\text{No. of address lines} = 1024 \times 8$$

$$= 2^{10} \times 8$$

↓      ↓      ↓      ↓  
10      8      data      address  
address lines      lines

$$\text{No. of data lines} = 8$$

$$\text{Power input} = +5V, \quad \text{select inputs} = 4$$

$$\text{Ground pin} = GND,$$

$$\text{Total pins} = 10 + 8 + 1 + 1 + 4 = \underline{\underline{24 \text{ pins}}}$$

12.5

System with RAM size 128<sup>2</sup> (11)

A Computer employs RAM chips of  $256 \times 8$  and ROM chips of  $1024 \times 8$ . The Computer system needs 2K bytes of RAM. 4K bytes of ROM and 4 Interface units, each with four registers. A memory mapped I/O configuration is used. The two highest order bits of the address bus are assigned 00 for RAM, 01 for ROM and 10 for Interface registers.

(a) How many RAM & ROM chips are needed?

(b) Draw a memory address map for the system.

(c) Give the address range in hexadecimal for RAM, ROM and Interface.

Solution

Available on YouTube video

"Memory address map with example"

on "LS Academy for Technical Education"

## Cache Memory-

- Cache memory is a Static Random-Access Memory.
- The main advantage of cache memory is its very fast speed.
- It can be accessed by the CPU at much faster speed than main memory.
- Cache memory bridges the speed mismatch between the processor and the main memory.
- Cache memory lies on the path between the CPU and the main memory.
- It facilitates the transfer of data between the processor and the main memory at the speed which matches to the speed of the processor.



## **Cache and Main Memory**

- Data is transferred in the form of words between the cache memory and the CPU.
- Data is transferred in the form of blocks or pages between the cache memory and the main memory.

## Purpose-

The fast speed of the cache memory makes it extremely useful.

- It is used for bridging the speed mismatch between the fastest CPU and the main memory.
- It does not let the CPU performance suffer due to the slower speed of the main memory.

## Principle of Locality (Temporal)

- Programs access a small proportion of their address space at any time (Locality in Time)
- Temporal locality
  - Items accessed recently are likely to be accessed again soon
  - Keep most recently accessed data items closer to the processor
  - e.g., instructions in a loop, induction variables



# Principle of Locality (Spatial)

- Programs access a small proportion of their address space at any time (Locality in Space)
- Spatial locality
  - Items near those accessed recently likely to be accessed soon
  - Move blocks consisting of **contiguous words** to the upper levels
  - E.g., sequential instruction access, scanning an array data

```
for (i=0; i<1000; i++)  
    x[i] = x[i] + s;
```



## Terminology

- **Hit:** data is in some block in the upper level (**Blk X**)
  - **Hit Rate:** fraction of memory accesses found in upper level
  - **Hit Time:** Time to access the upper level which consists of
    - SRAM access time + Time to determine hit/miss



- **Miss:** data is not in the upper level so needs to be retrieved from a block in the lower level (**Blk Y**)
  - **Miss Rate** = 1 - (Hit Rate)
  - **Miss Penalty:** Time to bring in a block from the lower level and replace a block in the upper level with it
    - + Time to deliver the block to the processor
  - **Hit Time** << Miss Penalty

## Cache Mapping-

- Cache mapping defines how a block from the main memory is mapped to the cache memory in case of a cache miss.

**OR**

- Cache mapping is a technique by which the contents of main memory are brought into the cache memory.

The following diagram illustrates the mapping process-



- Main memory is divided into equal size partitions called as **blocks** or **frames**.
- Cache memory is divided into partitions having same size as that of blocks called as **lines**.
  - During cache mapping, block of main memory is simply copied to the cache and the block is not actually brought from the main memory.

### Cache Mapping Techniques



# Direct Mapping

Block  $j$  of main memory maps onto block  $j \bmod 128$  of the cache

4: one of 16 words. (each block has  $16 = 2^4$  words)

7: points to a particular block in the cache ( $128 = 2^7$ )

5: 5 tag bits are compared with the tag bits associated with its location in the cache.

Identify which of the 32 blocks that are resident in the cache ( $4096/128$ ).



# Associative Mapping



Figure 5.16. Associative-mapped cache.

Main  
memor



# Set-Associative Mapping

4: one of 16 words. (each block has  $16=2^4$  words)

6: points to a particular set in the cache ( $128/2=64=2^6$ )

6: 6 tag bits is used to check if the desired block is present ( $4096/64=2^6$ ).

Figure 5.17. Set-associative-mapped cache with two blocks per

| Tag | Set | Wor |
|-----|-----|-----|
| 6   | 6   | 4   |

Main memory  
address  
portion

## Write back Vs Write through Method In Cache Memory

### Write Through Method

- ① In this method, Cache memory as well as main memory is updated at the same time.
- ② Implementation is easy.
- ③ Simpler logic circuit.
- ④ Redundancy of data.
- ⑤ High Reliability.
- ⑥ Data Consistency.
- ⑦ More Memory bandwidth used.
- ⑧ More power is required.

### Write Back Method

- ① In this method, Cache memory is only updated. The main memory will be updated later on.
- ② Implementation is difficult.
- ③ Complex logic circuit.
- ④ No redundancy of data.
- ⑤ Low Reliability.
- ⑥ Data inconsistency.
- ⑦ Less memory bandwidth is used.
- ⑧ Less power is required.

Example on next Page

## Write Back Vs Write Through in Cache Memory



$X=300$

## Associative Memory

①

- \* A memory unit accessed by content is called "Associative memory" or "Content Addressable Memory".
- \* This type of memory is accessed simultaneously and in parallel on the basis of data content rather than by specific memory address or location.
- \* When a word is written in an associative memory, no address is given. The memory is capable of finding an empty unused location to store the word.
- \* When a word is to be read from an associative memory, then the content of the word or part of the word is specified.
- \* The memory locates all words that match the specified content and marks them for reading.
- \* It takes relatively less time to find an item based on Content rather than by an address.

## Block diagram of Associative Memory:



Figure 1(a):



Argument Register (A)



Key Register (k.)

$m$  words  
of  
 $n$  bits  
each



Match Logic

Fig. 1(b) Detailed Block diagram of Associative Memory.

$$A = 101 \quad | \quad | \quad | \quad | \quad 00$$

$$k = 111 \quad 0000 \quad 00$$

$$w_1 = 100 \quad 1111 \quad 00 \quad (\text{Match})$$

$$w_2 = \underline{101} \quad 0000 \quad 01 \quad (\text{No Match})$$

- \* Key Register provides a mask for choosing a particular field or key in the argument word.
- \* Only that part of Argument Register (A) is compared for which key register contains 1 bits.
- \* Only 101 portion of argument Register (A) is compared with word ( $w_1$ ) and word ( $w_2$ ).  
101 portion of Argument register (A) matches with word ( $w_2$ ).

## Cell Structure of Associative Memory with match logic

(3)

- \* Each cell (memory cell) consists of a flip-flop storage element ( $F_{ij}$ ) and circuit for reading, writing and matching the cell.
- \* The input bit is transferred into the storage cell during a write operation.
- \* The bit stored is read out during a read operation.
- \* The match logic compares the content of the storage cell with the corresponding unmasked bit of the argument and decides whether the bits matcher or not.



Figure 2(a): Cell Structure of Associative Memory

## \* Match Logic:

(4)

- \* The match logic for each word can be derived from the boolean function

$$X_j = A_j F_{ij} + A'_j F'_{ij}$$

(i) If the bits are equal,  $X_j = 1$

(ii) If the bits are not equal,  $X_j = 0$

- \* Each memory word ( $W_i$ ) is a combination of individual memory cells ( $F_{i1}, F_{i2}, F_{i3}, \dots, F_{in}$ )

$$W_i = F_{i1} F_{i2} F_{i3} \dots F_{ij} \dots F_{in}$$

- \* for a word  $i$  to be equal to the argument in  $A$ , we must have all  $X$  variables equal to 1.

$$M = X_1 X_2 X_3 X_4 \dots X_n = 1$$

\* If key bit ( $K_j = 0$ ), then  $A_j$  and  $F_{ij}$  needs no comparison.

\* If key bit ( $K_j = 1$ ), then  $A_j$  and  $F_{ij}$  are compared.

\* So  $M_i = (X_1 + K_1') (X_2 + K_2') (X_3 + K_3') \dots (X_n + K_n')$

It can be re-written as

$$M_i = \sum_{j=1}^n (X_j + K_j')$$

Putting value of  $X_j$

$$M_i = \sum_{j=1}^n (A_j F_{ij} + A'_j F'_{ij} + K_j')$$

So the logic circuit for matching one word is as given:

5



Figure: Match Logic for one word of associative memory

5

## Auxiliary Memory (Secondary Storage) ⑥

There are mainly two types of auxiliary devices that are commonly used in Computer System:

- ① Magnetic disk
- ② Magnetic tape

① Magnetic disk: Magnetic disk provides a large amount of secondary storage to modern computers.

### Disk Structure



fig: Disk Structure

Platter: A round surface (disk) containing magnetic coating 7

Track: A circle on the disk surface containing data.

Cylinder: Set of tracks accessed simultaneously from read/write head.

Sector: A portion of track on the disk surface. Each sector stores same no. of bytes. So the density of bits is more in sectors near the centre of disk.

Block: The smallest unit of data that can be written or read to/from the disk.

Read/write head: A transducer attached to an arm for reading/writing data to/from the disk surface

Arm Assembly: A mechanical unit holding all heads and arms.

Drive Motor: The Drive (DC) motor rotates the platters at a fixed rate (e.g. 3600 rpm).

Head motion: A mechanism is required to move the head assembly in/out.

Seek time: Time taken by the read/write head to reach a specific track is called seek time.

Latency time: Time taken by the correct sector to arrive under R/W head is called "Rotational Delay" or "Latency time".

$$\text{Average latency time} = \frac{1}{2} \times \text{rotational speed}$$

## Average Access time:

Average access time = average seek time + average latency time  
 + Data transfer time + Controller overhead  
 + Queue delay.

## Data Transfer time:

$$\text{Data Transfer time} = \frac{1}{\text{No. of sectors} \times \text{rotational speed}}$$

## Data Transfer Rate:

$$\text{Data transfer rate} = \frac{\text{No. of rotations} \times \text{track} \times \text{No. of head capacity}}{\text{sec}}$$

- \* Disks that are permanently attached to the arm assembly (and can not be removed by the user) are called "hard disk".
- \* A disk drive with removable disk are called floppy disk/Compact disk(CD)/DVD etc.
- \* Magnetic disks used direct access methods.

## Magnetic tape:

- \* Magnetic tape is a medium for magnetic recording made of a thin, magnetizable coating on a long, narrow strip of plastic film.
- \* In this, only one side of the ribbon is used for storing data.
- \* It uses sequential access method (reading/writing) on memory location in a serial order.
- \* Data Read/Write operation is slower, because of sequential access.
- \* Magnetic tape has storage capacity ranging from 100 MB to 200 GB.

## Advantages:

- ① These are inexpensive.
- ② It provides backup or archival storage.
- ③ It can be used for copying data from hard disk.
- ④ It is a reusable memory.
- ⑤ It is compact and easy to store.

## Disadvantages:

- ① It provides sequential access i.e. any location can not be accessed directly which creates delay.
- ② It requires caring for storage, humidity, dust also influence data reliability.
- ③ Difficult to update or modify data.

## Virtual Memory:

(16)

- \* Virtual memory is a concept that gives an illusion to the user that large memory space is available to the user to execute any program however actually such memory space is not available actually.
- \* Virtual memory ~~space~~ system provides a mechanism for translating program-generated addresses into correct main memory locations.
- \* Address Space: An address used by the program or generated by CPU is called Virtual address/Logical address and the set of such addresses is called Address Space.
- \* Memory Space: An address in main memory is called Physical address/Memory address and the set of such addresses is called Memory Space.

Conversion from Virtual (Logical) address to Physical address is done by using a mapping table.



## Implementation of Virtual memory:

The Concept of Virtual memory can be implemented using

### ① Address Mapping using pages:

- \* The logical address space / logical memory is divided into equal size of parts called "Pages".
- \* The physical address space / Physical memory is divided into equal size of parts called frames / blocks.
- \* Page size is equal to the frame size.
- \* No. of pages in address space is not equal to the no. of blocks in memory space.
- \* Ex: If address space = 8K, memory space = 4K and ~~no. of~~ Page size = 1K then

~~Block size ≠ Page size~~

Block size = Page size = 1K words

$$\text{No. of pages} = \frac{\text{address space}}{\text{Page size}} = \frac{8K}{1K} = 8$$

$$\text{No. of blocks} = \frac{\text{memory space}}{\text{Block size}} = \frac{4K}{1K} = 4$$



Address Space = 8K



Memory space = 4K

Virtual address (Logical Address)

001 0101010010



Word No.

Page Table

| Page No. | Block No. | Valid bit |
|----------|-----------|-----------|
| 000      | 00        | 0         |
| 001      | 11        | 1         |
| 010      | 00        | 1         |
| 011      | 00        | 0         |
| 100      | 00        | 0         |
| 101      | 01        | 1         |
| 110      | 10        | 1         |
| 111      | 00        | 0         |



Physical address (12)



| Block No. | Block   |
|-----------|---------|
| 00        | Block 0 |
| 01        | Block 1 |
| 10        | Block 2 |
| 11        | Block 3 |

Main (Physical) memory

## Memory Mapping using Paging

\* Virtual address (Page No., offset) has value

Page No. = 001  $\rightarrow$  3 bits as total Pages = 8

Offset = 0101010010  $\rightarrow$  10 bits

Offset specify the address of word in within the page/Block. Since Page size = 1K words = 1024 words

so No. of address bits =  $2^{10} = 1024$

10 bits

\* Since Page No.(001) has Block No.(11) in the Page table. So Valid bit is said to 1 for it as this page have entry in physical memory.

\* Offset value is same in both Virtual address and physical address.

Offset  $\rightarrow$   
= 0101010010



Page 1

## ② address mapping using Segmented-Paging Method

- \* In segmented paging, each program is considered as a collection of segments. with each segment is further divided into pages. segments are of variable size.
- \* The logical address is partitioned into 3 fields:

| S           | P        | W                    |
|-------------|----------|----------------------|
| Segment No. | Page No. | word No.<br>(offset) |

- \* Physical address consists of 2 fields:

|           |                      |
|-----------|----------------------|
| Block No. | Word No.<br>(offset) |
|-----------|----------------------|

- \* Each Segment has its own page table.

| Page 0 |
|--------|
| Page 1 |
| Page 2 |
| Page 3 |
| Page 4 |
| Page 5 |
| Page 6 |
| Page 7 |

} Segment 0

} Segment 1

} Segment 2

| Block 0 |
|---------|
| Block 1 |
| Block 2 |
| Block 3 |

Physical Memory  
(Memory Space)

Page size = block size

Logical Memory  
(Address space)

| S | P | W |
|---|---|---|
| 2 | 3 | 1 |

Virtual address

| S BRC Base Address |     |
|--------------------|-----|
| 0                  | 10  |
| 1                  | 30  |
| 2                  | 70  |
| 3                  | 125 |
| :                  | :   |

Segment table

5 → Segment No.

BA → Base address

(Starting address)

S → Segment No.  
P → Page No.  
W → word No.

| P   | B |
|-----|---|
| 70  | 3 |
| 71  | 2 |
| 72  | 1 |
| 73  | 4 |
| 74  | : |
| 75  | : |
| 76  | : |
| 77  | : |
| 78  | : |
| 79  | : |
| 80  | : |
| 81  | : |
| 82  | : |
| 83  | : |
| 84  | : |
| 85  | : |
| 86  | : |
| 87  | : |
| 88  | : |
| 89  | : |
| 90  | : |
| 91  | : |
| 92  | : |
| 93  | : |
| 94  | : |
| 95  | : |
| 96  | : |
| 97  | : |
| 98  | : |
| 99  | : |
| 100 | : |

Page Table for Segment 2

→ 4 | 1

| Block 0 |
|---------|
| Block 1 |
| Block 2 |
| Block 3 |
| Block 4 |
| :       |
| :       |
| :       |
| :       |

Physical  
(main)  
memory

## Page Replacement

(14)

Before understanding page fault, a few concepts are needed to understand!

Page fault: Page fault occurs, when the desired page is not available in main memory.

Pure Page fault: Pure page fault occurs at the starting of program execution, when no page is available in the main memory.

Page Replacement: Page replacement occurs after page fault in which the required page brings in from the secondary memory to the main memory and the page that is not required have been moved out from main memory to the secondary memory.

However page replacement is only needed when the ~~secondary~~ main memory is full and in order to bring the desired page from secondary memory to main memory, the unnecessary pages are removed from main memory to secondary memory.

### Page Replacement Algorithms:

Page replacement algorithm decides which page need to be removed from the main memory to make space for the new page.

## ① FIFO (First in First out) Page Replacement algorithm

In this algorithm, ~~for~~ The page which comes first in the main memory is to be removed first.

example: If address space = 8K

memory space = 4K

Page size = 1K

No. of blocks = 4

No. of pages = 8

(a) If the page reference string sequence is



No. of page faults = 10

## ② LRU (Least Recently Used)

In this algorithm, the page that has not been used in recent time will be replaced first.

(b) Page Reference string Sequence



No. of page fault = 11

## Address Mapping Using Segmented Paging

16

(Numerical) M. Mano (Problem: 12-23)

The logical address space in a computer system consists of 128 segments. Each segment consists of 32 pages of 4K words in each. Physical memory consists of 4K blocks of 4K words in each. Formulate the logical and physical address formats.

Ans



Logical address

$$\text{No. of Segments} = 128 = 2^7 \rightarrow \text{No. of segments bits}$$

$$\text{No. of Pages} = 32 = 2^5 \rightarrow \text{No. of page bits}$$

in each segment

$$\text{No. of words} = 4K = 2^{12} \rightarrow \text{No. of word bits}$$

in each page



$$\text{No. of blocks} = 4K = 2^{12} \rightarrow \text{No. of block bits}$$

(17)

## No. of address space Numerical

(M. Mano: Problem No: 12-19)

12.19

An address space is specified by 24 bits and the corresponding memory space by 16 bits.

- (a) How many words are there in the address space.
- (b) How many words are there in the memory space.
- (c) If a page consists of  $2^k$  words, how many pages and blocks are there in the system.

Ans (a) No. of words in address space =  $\underline{\underline{2^{24}}}$   
as address space has each address of 24 bits

(b) No. of words in memory space =  $\underline{\underline{2^{16}}}$   
as memory space has each address of 16 bits

(c) Page size =  $2^k$  words.

So block size = Page size =  $2^k$  words

$$\begin{aligned} \text{No. of pages} &= \frac{\text{No. of words in address space}}{\text{No. of words in each page}} \\ &= \frac{2^{24}}{2^k} = \underline{\underline{\frac{2^{24}}{2^k}}} = \underline{\underline{2^{13}}} = \underline{\underline{2^{10} \times 2^3}} \\ &\quad = \underline{\underline{k \times 8}} = \underline{\underline{8k}} \end{aligned}$$

$$\begin{aligned} \text{No. of blocks} &= \frac{\text{No. of words in memory space}}{\text{No. of words in each block}} \\ &= \frac{2^{16}}{2^k} \end{aligned}$$

$$\begin{aligned} &= \frac{2^{16}}{2^{10}} = \underline{\underline{\frac{2^{16}}{2^{10}}}} = \underline{\underline{2^6}} = \underline{\underline{64}} \end{aligned}$$

Optical disk: An optical disk is any computer disk that uses optical storage techniques and technology to read and write data. It is the computer storage disk that stores data by digitally and uses laser beams to read and write data.

(18)

- \* Non-Volatile.
- \* laser technology extremely fast.
- \* typical capacity = 700 MB.

#### Advantages of optical disk

- \* Cheap Media
- \* CD and DVD can hold lots of media.
- \* Easy to carry
- \* Small in size.

#### Disadvantages of optical disk

- \* Easy to scratch
- \* Easy to break
- \* Can get damaged
- \* Smaller capacity as compared to Pendrive.

## Numerical on Virtual Memory

19

A virtual memory has a page size of 1k words. There are 8 pages and 4 blocks. The associative memory page table contains the following entries

| Page | Block |
|------|-------|
| 0    | 3     |
| 1    | 1     |
| 4    | 2     |
| 6    | 0     |

Make a list of all virtual addresses (decimal) that will cause the page fault if used by CPU.

Ans

| Address   | Page   |
|-----------|--------|
| 0-1023    | Page 0 |
| 1024-2047 | Page 1 |
| 2048-3071 | Page 2 |
| 3072-4095 | Page 3 |
| 4096-5119 | Page 4 |
| 5120-6143 | Page 5 |
| 6144-7167 | Page 6 |
| 7168-8191 | Page 7 |

| Block   | Size |
|---------|------|
| Block 0 | 1K   |
| Block 1 | 1K   |
| Block 2 | 1K   |
| Block 3 | 1K   |

Blocks

Range Pages

Page size = block size = 1K = 1024 words

(0 to 1023) = 1024 words

(20)

Pages that will not be found in main memory will cause page fault

| Page No. | Virtual Addresses (decimal) |
|----------|-----------------------------|
| 2        | 2048 - 3071                 |
| 3        | 3072 - 4095                 |
| 5        | 5120 - 6143                 |
| 7        | 7168 - 8191                 |

## Difference between 2D RAM & 2.5D RAM

### 2D RAM:

- \* In 2D RAM organization, a RAM consists of  $2^n \times m$  consisting of  $2^n$  memory words of m bits each.
- \* It has a row decoder of size  $n \times 2^n$  to select 1 out of  $2^n$  memory words.
- \* Each row (memory word) have m columns (1 column for each individual bit).
- \* In 2D RAM, hardware is fixed.
- \* It requires more no. of logic gates.
- \* 2D RAM organization is more complex relatively.



$32 \times 4 \Rightarrow 2^5 \times 4 \rightarrow$  No. of address lines

Organization of  $32 \times 4$  2D RAM

## 2.5 D RAM :

- \* In 2.5 D RAM organization, the no. of address lines are divided into approximately equal parts, one for row select decoder and another for column select decoder.
- \* In 2.5 D RAM, hardware is reusable.
- \* It requires less no. of logic gates.
- \* 2.5 D RAM is less complex than 2D RAM.



# COA UNit 5 Notes

Computer Organization & Architecture (Dr. A.P.J. Abdul Kalam Technical University)

### Peripheral Devices:-

An external device connected to an I/O module is often referred to as a peripheral device or simply a peripheral.

### 3 categories of external devices:-

#### ① Human Readable Device:

suitable for communicating with the computer user (e.g. VDT (Video Display ~~or~~ Terminal), Printer, Keyboard, etc).

#### ② Machine Readable Devices:-

suitable for communicating with equipment. e.g. magnetic disk, sensor etc.

#### ③ Communication Device:

suitable for communicating remote devices.

Eg - The remote device can be the human readable device such as terminal, a machine readable device or another computer.



Generic Model of an I/O Module

## I/O Interface:

- ⇒ I/O interface provide a method for transferring information between CPU and external devices.
- ⇒ The input output interface resolves the differences that exist between the CPU and each external device.



## Functions of I/O Module:

- Major functions for an I/O Module fall into following categories
  - control and timing
  - Processor Communication
  - Device Communication
  - Data Buffering
  - Error Detection

## Block Diagram of I/O Module



I/O Port: A connection point that acts as an interface b/w the computer and external devices like mouse, keyboard, printer, modem etc.

Ports are of two types:-

Internal Port: It connects motherboard to internal device like Hard disk drive, internal modem etc.

External Port: It connects the motherboard to external device like, mouse, printer, and keyboard etc.

## Example of Interface:



Interfaces

Serial Interface

Parallel Interface

- \* In parallel interface, there are multiple connecting I/O module and the peripheral, and multiple bits are transmitted simultaneously over the data bus. (all of bits of data word transmitted simultaneously)
- \* In serial interface, there is only one line used to transmit data, and bits must be transmitted one at a time.



## I/O Systems:

User programs never directly interact with I/O devices. All I/O operations are performed with the help of system calls.

There are 4 types of I/O commands:-

- Control
- Test Status
- Read
- Write

## Modes of Data Transfer ( I/O Techniques )

- Programmed I/O
- Interrupt Driven I/O
- Direct Memory Access

### Programmed I/O

- Input-output instructions written in computer program.
- Each data item transfer is initiated by an instruction in the program.
- Transferring data under program control requires a constant monitoring of peripherals by CPU. (Once initiated when to transfer)
- ⇒ ~~Time initiated~~
- ⇒ In this method, the CPU stays in a program loop until I/O unit indicates that it is ready for the data transfer.
- ⇒ Time consuming
- ⇒ ~~Less efficient method~~ less efficiency.



## Programmed I/O



## Interrupt Driven I/O

## Interrupt Driven I/O

- special commands to inform the interface to issue an interrupt request signal when the data is available from the device. meantime the CPU can proceed to execute another program. The interface meanwhile keeps monitoring the device.
- ⇒ when the interface determines the device is ready to data transfer, it generates an interrupt request to the computer.
- ⇒ upon detecting the interrupt signal, the CPU momentarily stops the task it is processing, branches to a service program to process I/O transfer, and returns to the task it was actually performing.

## Isolated I/O vs. Memory Mapped I/O

When the processor, main memory and I/O share a common bus, two modes of addressing are possible:

- Memory Mapped I/O
- Isolated I/O (Input-output mapped I/O).

### Memory Mapped I/O :-

- There is a single address space for memory locations and I/O devices.
- Processor uses same read instructions to access both the memory and Input-output devices.
- No specific I/O instructions.
- One set of read and write signals



General structure of memory Mapped I/O

## Isolated I/O or Input-Output Mapped I/O

(9)

- There are separate address spaces for memory or I/O device.
- CPU specifies that the address on lines is for memory or I/O
- Distinct input output commands.
- ~~separately~~ separate read and write lines.



## comparison between programmed and interrupt driven I/O

### Programmed I/O

- can be implemented without any additional hardware
- based on busy waiting CPU keeps checking the status of I/O device.
- low efficiency
- only one activity can be handled using programmed I/O

### Interrupt Driven I/O

- Additional hardware required for interrupt handling
- CPU, switches to another task without waiting
- higher efficiency than programmed I/O
- Multiple I/O activity can be handled in overlapped fashion here

## Comparison between Input-output Mapped (Isolated) and Memory Mapped I/O

### Memory Mapped I/O

- same address space for memory and I/O devices.
- uses same instructions for both I/O and memory operations
- one set of read and write signals for memory and I/O
- Memory Mapped I/O is less efficient

### Peripheral Mapped I/O

or

### Input-output Mapped I/O

- two (separate) address spaces for memory and I/O devices.
- separate commands for memory operations and different commands for I/O.
- separate read and write signals for memory and I/O
- More efficient

## Interrupt Processing :

The basic method of interrupting the CPU is done by activating a control line that connects the interrupt source to the CPU.

CPU  $\Rightarrow$  \* recognizes the presence of interrupt

- \* executes a specific interrupt handling program.

- \* determines the source of interrupt

- \* determines the address (branch address) of the interrupt handling program.

### Simple interrupt Processing



## Design issues:

Two design issues in implementing ~~I/O~~ Interrupt I/O.

- ① → To determine which device issued the interrupt from multiple devices.
- ② if multiple devices interrupts have occurred, how does the processor decide which one to process.

## Issue: Device identification

4 Techniques exist:

- Multiple interrupt lines
- Software Poll
- Daisy chain (Hardware Poll, vectored)
- Bus Arbitration (vectored)

## Multiple Interrupt Lines:-



- ⇒ immediate recognition of device
- ⇒ separate Interrupt request line
- ⇒ impractical approach

multiple interrupt line system



### Software Poll:

- When processor detects the interrupt, it branches to an interrupt service-routine whose job is to poll each I/O module to determine which module caused the interrupt.
- Time consuming
- Priority can be implemented by defining polling sequence.

### Vectored Interrupt using Daisy chaining : (Hardware Poll)

- All I/O module shares common interrupt request line
- The interrupt acknowledgement line is daisy chained through the modules.
- When a processor senses the interrupt, it sends out an interrupt acknowledge propagating through a series of I/O module until it gets a requesting module.
- The requesting module responds by placing a word on data lines: This word is referred to as vector.

## Bus Arbitration Techniques:- (vectored interrupts)

- with bus arbitration technique, an I/O module first gains control of bus before it can raise the interrupt request line.
- Thus only one module can raise the line at a time.

## Types of Interrupts:

### Program interrupts

These are generated by some condition that occurs as a result of an instruction execution such as:

- Arithmetic Overflow
- Division by zero
- Execution of illegal machine instruction
- Segment limit violation
- Execution of privileged instruction

### Timer Interrupts

These are generated within the processor. This allows operating system to perform certain operations on regular basis.

### Input-output interrupts

These are generated for initiation or completion of I/O operations. I/O failure or I/O error too can generate an interrupt.

### Hardware Failure Interrupts

These are generated by a failure; such as power failure or memory parity error.

### Interrupts vs. Exceptions:

An interrupt is generated by a signal from hardware, and it may occur at random times during execution of a program.

An exception is generated from software, and it is provoked by the execution of an instruction.

## Hardware and software interrupts

- When microprocessors receive interrupt signals through pins (hardware) of microprocessor. These are known as Hardware interrupts.
- Software interrupts are those which are inserted in the between the program which means these are mnemonics of the microprocessor.

## Vectored and non vectored interrupts

- In vectored interrupts, the source (or device) that interrupts supplies the branch information to the computer.
- In a non-vectored interrupt, the branch address is assigned to a fixed location in the memory.

## Maskable and non-maskable:

- Maskable interrupts are those which can be disabled or ignored by the microprocessor.
- Non-maskable: which can not be disabled or ignored by the microprocessor.

## Types of interrupts :

4 categories

- Program interrupts
- Timer interrupts
- Input/output interrupts
- Hardware failure

## Direct memory Access:-

- Both interrupt driven and programmed I/O require involvement of CPU for data transfer.
- CPU can be better utilized for program execution.
- I/O activities are slow and involvement of CPU will not have a desired effect on the system performance.
- DMA increases the speed of I/O transfer by eliminating the role played by CPU ~~in~~ in I/O operations.
- When large amount of data is stored/transferred from CPU, a DMA module can be used.

## DMA operates in the following ways:

When I/O is requested, the CPU instructs the DMA module about the operation, by providing information:

- Source address
- Target address
- Read / write
- Number of words to be read or written.

⇒ CPU loads the DMA registers 'Data count' and 'Address Register' with number of bytes to be transferred and starting address memory location respectively.

when the DMA controller is ready to transmit or receive data, it activates the DMA request line to CPU. CPU wait for the next break point. CPU now leaves the control of system bus and activates DMA acknowledge.

- The DMA controller transfers the data directly from or to main memory. After a word is transferred. Data count (or word count register) and Address register are updated.
- If the data count register has not reached zero but the I/O device is not ready to send or receive the data, the DMA controller releases the system bus to CPU by deactivating DMA request line.
- if data count register is decremented to zero, DMA controller finally relinquishes the control of system bus. It may also send an interrupt signal to CPU to indicate the end of data transfer.





Block Diagram of DMA



- Bus Request (BR) input is used by the DMA controller to request to CPU to leave the control of system bus.
- The CPU activates the 'Bus Grant' (BG) output to inform the DMA controller.
- When DMA terminates the transfer, it disable the bus request line. CPU disables the bus grant, takes control of buses and returns it to its normal operation.

### DMA Data Transfer modes:

- DMA block transfer
- Cycle stealing mode
- Transparent DMA.

#### ① DMA Block Transfer:-

- In DMA block transfer technique, a block of data of arbitrary length can be transferred in a single burst.
- This DMA mode is needed for secondary memories like disk drives, where data transmission can not be stopped or slowed without loss of data.

## ② cycle stealing Mode:

In cycle stealing mode, the DMA controller is allowed to transfer one word at a time, after which it must return the control of the bus to the CPU.

- \* DMA module must use the bus only when the processor does not need it; or it must force the processor to suspend operation temporarily.

DMA module transfers a word and returns the control to the processor. Note that this is not an interrupt; the processor does not save a context and do something else. Rather, processor pauses for one bus cycle. The overall effect is to cause the processor to execute more slowly. Nevertheless, for a multiple-word I/O transfer, DMA is far more efficient than interrupt driven or programmed I/O.

## ③ Transparent DMA:

In transparent DMA, DMA is allowed to steal only those cycles when the CPU is not using the system bus.

CPU does not require to have control of system bus during Decode instruction or execute instruction phase.

- \* DMA transferring data during transparent DMA does not have any adverse effect on CPU performance.

## Input - output channels:

Input - output channels or input - output processors, virtually retrieves CPU from all I/O related activities. It has complete control over device. It can also communicate with CPU.

- \* I/O processor is a processor like CPU with limited number of instructions. An I/O processor has the ability to execute the I/O instructions. It can also carry out the work of data conversion required by the input - output device.

There are two types of I/O channels:

- ① Selector channel
- ② Multiplexer channel.

① Selector channel:

Address & data to / from  
main memory

control signals  
to / from CPU



High speed  
input - output  
devices.

- A selector channel can handle multiple high speed devices.
- Only one device is selected at a time for communication.
- A selector channel is useful in high speed transfer of data in burst mode.

## ② A multiplexer channel :



A multiplexer channel can handle I/O with a number of devices at the same time. If the device is slow then byte multiplexing is used.

Example There are three devices which need to send individual bytes are

$B_1, B_2, \dots, B_5 \rightarrow$  data from 1<sup>st</sup> device.  
 $X_1, X_2, \dots, X_5 \rightarrow$  data from 2<sup>nd</sup> device.  
 $Y_1, Y_2, \dots, Y_5 \rightarrow$  data from 3<sup>rd</sup> device.

Then on a byte multiplexer channel may send the data bytes as  $B_1 X_1 Y_1 B_2 X_2 Y_2 B_3 X_3 Y_3 B_4 X_4 Y_4 \dots$

## Data Transfer:

- Synchronous
- Asynchronous

Synchronous: If registers in the interface share a common clock with CPU registers, the transfer between two units is said to be synchronous.

Asynchronous: The internal timing in each unit is independent from the other in that each uses its own private clock for internal registers, in this case the two units are said to be asynchronous.

## Asynchronous Data Transfer:

- two independent units transmit control signals to indicate the time at which data is being transmitted.
- Two methods
  - Strobe
  - Handshaking

Strobed Method (one way) One of communicating device supplies all the control signals.

- Strobe can be activated by either source or destination



Timing Diagram

The strobe is a single line that informs the destination unit when a valid data word is available in the bus.

- \* The source unit first places the data on the databus. After a brief delay to ensure that the data settle a steady value, the source activates the strobe pulse.

- \* In destination initiated data transfer, destination unit activates the strobe pulse, informing the source to provide the data.

Disadvantage: In strobe method, the source unit that initiates the transfer has no way of knowing whether the destination unit has actually received the data item.

## Handshaking: (Two way)

- \* The disadvantage of one way control is that it does not verify that the data transfer has been completed successfully. Data may be lost.
- \* Control of data transfer is based on the use of handshake between processor and the device selected for input-output.

### Block Diagram



Source unit

Destination unit



## Synchronous Bus:

- include a clock in the control lines
- A fixed protocol for communication that is relative to the clock

Advantage: - involves very low logic and can run very fast.

Disadvantage: Every device on the bus must run at the same clock rate.

## Synchronous timing

- The occurrence of events on the bus is determined by a clock
- A clock is a regular sequence of alternating 1's and 0's of equal duration.



- T<sub>1</sub> ⇒ processor places address on address lines and may assert various status lines.  
⇒ once the address lines have stabilized, the processor issues address enable signal.

### For Read operation

T<sub>2</sub> ⇒ Processor issues a read command.

T<sub>3</sub> ⇒ Memory recognizes the address and places the data on the data lines.

### For write operation

T<sub>2</sub> ⇒ processor puts the data on data lines.

⇒ once the data on data lines have stabilized, the processor issues a write command.

T<sub>3</sub> ⇒ Memory copies the information from data lines

## Asynchronous Bus:

- ⇒ No clock
- ⇒ The occurrence of one event on a bus follows and depends on the occurrence of previous event.

### Sequence of events for read operation:



- The processor places address and status signals on the bus.
- After signals have stabilized, the processor issues a read command.
- After address decoding, memory places the data on data lines.
- After stabilized data, memory asserts an acknowledge to the processor.

### Synchronous Timing

- Simple to implement and test
- less flexible
- Devices must be on same clock gate

### Asynchronous Timing

- Hard to implement and test
- More flexible
- combination of slow and fast devices.

## Sequence of events of Write operation:



- The processor places address and status signals on the busses.
- The master places the data lines at the same time.
- The processor issues write command.
- Memory module responds to the write command by copying the data from the data lines and asserting the acknowledge line.
- The master then drops the write signal and memory module drops the acknowledge signal.

## Serial communication

or

## Serial Transmission

→ Serial data transmission occurs by transmitting data one bit at a time in sequential order over a bus.



start bit    data bits    stop bits → To know the start and end of transmission.

## Parallel Transmission

In parallel transmission several bits can be transmitted at the same time.



Serial Communication can be

- Simplex
- Half Duplex
- Full Duplex

Simplex Transmission: signals travels in one direction only.  
eg. Keyboard



### Half Duplex:

→ capable of sending signals in both the directions, but in only one direction at a time

e.g. ~~Telephone~~ Walkie Talkie.



### Full Duplex: (Bidirectional Transmission)

→ This method allows signal transmission in both the directions simultaneously.

e.g. Telephone IP service.