

## Syllabus

- 1) Functional blocks of CPU
- 2) Data Representation
  - \* fixed point and floating point number representation
- 3) Instructions
- 4) Addressing modes ✓
- 5) Stored Program Concept ✓
- 6) Instruction execution
- 7) Memory Hierarchy
- 8) Virtual memory ✓
- 9) Associative memory
- 10) Cache memory
- 11) I/O Organization
- 12) methods of data transfer.
  - \* programmed I/O
  - \* interrupt initiated I/O
  - \* DMA Transfer
- 13) IOP (Input Output Processing)
- 14) Pipeline and vector processing
- 15) Flynn's classification

## Computer organization

It is concerned with study of hardware components and how these hardware components were connected together to make a computer system

## Computer architecture

It is concerned with the study of structural and behavioural of a computer as ~~seen~~ seen by the user.

## functional units of a computer



## Basic operational concepts



The (a) fetching of address of program will kept in **PC** and from

NOTE

PC Navigates the program



Now control unit decide mode now  
(b) memory read  
(c) memory write

now data will be moved from MDR  $\rightarrow$  IR

In IR The Realisation of instruction will be done.

i.e. what is opcode, address and what will



(2<sup>n</sup> distinct) operations

$\rightarrow$  Next the instruction will be moved to ALU and calculation will be done and again write back to memory



MAR:

$\rightarrow$  It holds the address of next instruction which is to be executed

PC:

$\rightarrow$  It generates the address of next instruction to be executed.

~~It~~ One instruction is 3 bytes and stored at 5000h. If it is the current value of PC next exec is 5001h not 5000h.

MDR: It holds the content of memory location which is pointed by MAR.

IR: The instruction realises its format in IR



$$2^{10} = 1024 \text{ Kilo}$$

$$2^{20} = 1024 \text{ Kilo} = 1 \text{ Megabyte}$$

$$2^{30} = 1024 \text{ M} = 1 \text{ Gigabyte}$$

$$2^{40} = 1024 \text{ G} = 1 \text{ Petabyte}$$

ECE

MM size  $\approx$  2048/32

what is the size of PC and IR

(a)  $2048 = 2^{11} \times 2^5$

address = 11 bits

so PC size = 11 bits



and given data is (a) data size = 32 bits

so the size of MDR  $\Rightarrow$  32

IR = 32 because data will be loaded in IR box

|                  |                |              |
|------------------|----------------|--------------|
| Reg = 32         | [ 1. . . . . ] | MDR          |
| Accumulator = 32 | [ . . . . . ]  | IR (to Reg)  |
|                  | [ . . . . . ]  | Reg (to Acc) |

### Program execution

1. Fetch  $\rightarrow$  fetch Routine

T0: MAR  $\leftarrow$  PC

T1: MDR  $\leftarrow$  m[MAR], PC  $\leftarrow$  PC + 1

T2: IR  $\leftarrow$  MDR

2. Decode

DR:

T3:

|    |                |          |
|----|----------------|----------|
| I  | opcode         | /address |
| 15 | 14 13 12 11 10 | 0 (LSB)  |

$\leftarrow$  16 bits  $\rightarrow$

## Execute

- memory reference (for knowing address of variable)
- register reference (for getting values of operands)
- I/O reference (any scanf() used)

~~Registers~~

1 (RTL)

Microoperation and Register Transfer Language

### \* M operation:

→ The operations which are formed on the data stored with in register called microoperations

### \* RTL G

#### Notations

1. Capital letters  $\in R_1, R_2, R_3, MAR, PC$

2. Arrow mark ( $\leftarrow$ )



3. Parenthesis ( )



$$R_2 \leftarrow R_1(0-7)$$

4. comma, separate two microoperations in one one clock cycle e.g.

$$\text{MDR} \leftarrow M[\text{MAR}], \text{PC} \leftarrow \text{PC} + 1$$

## Microoperations ( $M$ )

1. Arithmetic microoperations
2. Logical microoperations
3. Shift microoperations

### Arithmetic microoperations :

- \* Basic :
  - 1) Add
  - 2) SUB
  - 3) INC
  - 4) DEC

### 4-bit Binary adder



### 4-bit Binary subtractor:

$$A - B = A + \bar{B} + 1$$



when standard

$\bar{D}|P=0$  The output is  $1$  because  $P$

When standard  $D|P=1$   
The output complements

# 4 bit binary adder - subtractor



if  $\text{mod} = 0 \rightarrow$  it will act as a 4 bit binary

adder

if  $\text{mod} = 1$  it will act as a 4 bit binary  
subtractor

## 4 bit binary subtraction

A = 11

B = A + 1

$$\begin{array}{r} 00 \\ \underline{- 0100} \\ 1101 \end{array}$$

Take input so we have both  
have half adder



## 4 bit binary subtraction

A = 01

B = 00

A < B;

so

answ ~~0100~~  
answ ~~1111~~

A - B =

$\rightarrow 1111$  (minuend - 111)

3 minuend  
 $\begin{array}{r} 0100 \\ 1111 \\ - 0101 \\ \hline \end{array}$  b = borrow



→ for n bit decremetel  $\rightarrow$  n full adder

## A Logical micro-operation circuits



| $S_1$ | $S_0$ |     |
|-------|-------|-----|
| 0     | 0     | AND |
| 0     | 1     | OR  |
| 1     | 0     | XOR |
| 1     | 1     | NOT |

Shift microoperations:

### 1) Logical Shift

Left: The zero will be inserted at weight position

$$\begin{array}{r} \text{e1} \\ \begin{array}{ccc} 1010 & 1101 \\ \swarrow\searrow & \swarrow\searrow \\ 0000 & 0001 \end{array} \end{array}$$

$$\begin{array}{r} \text{Right} \\ \begin{array}{c} 01011010 \\ \overline{\quad\quad\quad\quad\quad\quad\quad\quad\quad} \\ 01010110 \end{array} \end{array}$$

## ② Arithmetic shift

Left

MSB treated as sign bit  
0 → +ve  
1 → -ve

Right shift

$\begin{array}{r} 10101101 \\ \rightarrow 10101101 \end{array}$  ①

Sign Preserved

B Left shift [ zero inserted at weight position



• XOR gate → used for sign reflection

## ③ Circular shift

Left 10101101

R<sub>24</sub>

10101101

Right ① 0101101

① 0101101 ① → (a)

## Better Representation:

1. Fixed point
2. Floating point

## Performance metrics

MIPS → Million Instructions Per second

MFlops → Million floating point operations



\* The minimum number for 8 bit register is 127  
minimum number = -1

## Floating point

- True halfway
- -ve
- 1. Signed magnitude
- 2. Signed one's complement
- 3. Signed two's complement

## Sign magnitude

Take complement of sign bit only  
of positive number and copy the magnitude

$$\text{Ex } -14 = +14 = \begin{array}{r} 01110 \\ +14 = \underline{11110} \end{array}$$

$$+14 = 11110$$

Signed 1st complement.

Take total complement of the number

$$\text{Ex } -14 = +14 = \begin{array}{r} 01110 \\ +14 = 10001 \end{array} \quad \text{including sign bit}$$

$$-14 = 10001$$

Signed 2's complement

$\rightarrow -14 = \text{One's complement} + 1$

$$-14 = -15 \text{ complement} = 10001 + 1$$

$$\boxed{10010}$$

$$\text{Ex } -89$$

$$-89$$

Signed magnitude

$$-89 = \begin{array}{r} 110011001 \\ \text{sign} \end{array}$$

$$\begin{array}{r} 10011001 \\ 64 \underline{32} 16 8 4 2 \end{array}$$

$$+89 = \boxed{010011001}$$

Signed One's complement

$$(-10011001) \rightarrow \text{Total complement of } +89$$

Signed 2's complement

$$= 101100110$$

$$\cancel{101100111} = -89 = 1011$$

## Signed 2's complement

- 89 -

$$\begin{array}{r}
 +89 = 01001100 \\
 2^{\text{'s comp}} \rightarrow \underline{10110011} \\
 -89 = 10110011
 \end{array}$$

## Overflow detection circuit

→ The maximum ~~possible~~ value which can store in an 8 bit register: ~~11111111~~ +127

Consider a system to perform a 8 bit operation with a register capacity as 8 bit



When and Then O/p = Now

processor supplying an additional flip-flop

• if now O/p = +150

→ when two mbit values are added and the result is (n+1) bits, the additional bit cannot be accommodated in to regular size. Situation will need to be informed to the processor with help of overflow detection circuit

- ⇒ An AND gate is used to determine the overflow (borrow out)
- When overflow is being detected moreover supplies an additional flip-flop to one accommodate the additional bit and test it has sign bit
- The (n+1) bits is shown as a result to the user.

### \* Floating point representation:

mix<sup>t</sup>  
↓  
float  
number

Exponent  
2. without unnecessary 0's per

0.0000 → memory



0.1000 →  
↓  
float  
number



no leading 0's

→ due to floating point loss

memory loss reduced

To reduce the IEEE floating point memory location they introduce IEEE floating point

- ① Single precision format (32-bit)
- ② Double precision format (64-bit)



\*  $(87.67)_{10}$  convert to single point precision format

steps  
① Convert the given decimal to binary

$$(87.67)_{10} \quad 87 = \underline{0101 \ 0111}$$

~~0.67~~

$$+ 87 = \underline{01010111}$$

$$\begin{array}{r} 87 \\ \hline 2 | 43-1 \\ 2 | 21-1 \\ 2 | 10-1 \\ 2 | 5-0 \\ 2 | 2-1 \\ 2 | 1-0 \end{array}$$

$$\begin{aligned}
 (0.67)_{10} &= 0.67 \times 2 = 1.34 - 1 \\
 &\quad 0.34 \times 2 = 0.68 - 0 \\
 &\quad 0.68 \times 2 = 1.36 - 1 \\
 &\quad 0.36 \times 2 = 0.72 - 0 \\
 &\quad 0.72 \times 2 = 1.44 - 1 \\
 &\quad 0.44 \times 2 = 0.88 - 0 \\
 &= 101010
 \end{aligned}$$

$$(0.67)_{10} = 0.101010$$

$$(0.67)_{10} = (01010111 \cdot 101010)_2$$

② Normalize the result

B  $01010111 \cdot 101010 \times 2^0$   
 we have move left ~~left~~ right such that point before  
 number is non-zero

$$0101011101010 \times 2^6$$

③ calculate the biased exponent

$$\begin{aligned}
 B_E &= E + 12 \\
 &= 6 + 12
 \end{aligned}$$

$$E = \text{Exponent}$$

+133

$$(133)_{10} = (1000010)_2$$

(4) Represent them in format

|   |          |                              |
|---|----------|------------------------------|
| 0 | 10000101 | 010 1111010   0000 0000 0000 |
|   |          |                              |

→ (4) Now group it into four bits in order to represent in the hexadeciaml

42AF5000H

home work :

$$1) (-97.66)_{10}$$

$$2) (-129.27)_{10}$$

8 D -97.66

SOL  $-97 = \boxed{1} \ 1100001$   
SIGN

$$= 11100001$$

$$0.66 = 0.66 \times 2 = 1.32 - 1$$

$$0.32 \times 2 = 0.64 = 0$$

$$0.64 \times 2 = 1.28 = 1$$

$$0.28 \times 2 = 0.56 = 0$$

$$0.48 \times 2 = 0.96 = 0$$

$$0.96 \times 2 = 1.92 = 1$$

$$0.56 \times 2 = 1.12 = 1$$

$$0.12 \times 2 = 0.24 = 0$$

$$\begin{array}{r} .2 \\ \hline 97 \\ -48 \\ \hline 49 \\ -48 \\ \hline 1 \\ -2 \\ \hline 1 \\ -2 \\ \hline 1 \\ -2 \\ \hline 0 \\ -2 \\ \hline 0 \\ -2 \\ \hline 0 \\ -2 \\ \hline 1 \end{array}$$

$$= (1100001)$$

$$\begin{array}{r} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ \hline 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{array}$$

$$(-99.60) \cdot (110001.1010)$$

(2) Normalize

$$1.1100011010 \times 2^8$$

$$1.1100011010 \times 2^8$$

(4)  $\frac{1}{2} + 127$

$$2^{17} = 128 = 1000\ 0000$$

$$\begin{array}{r} 1100001101000 \\ \underline{\quad 0 \quad} \quad \underline{\quad 0 \quad} \quad \underline{\quad 0 \quad} \quad \underline{\quad 0 \quad} \end{array}$$

(5) Convert from 10 to final

expand

|   |         |               |          |   |   |   |
|---|---------|---------------|----------|---|---|---|
| 1 | 1000110 | 1100001101000 | 00000000 |   |   |   |
| 6 | 3       | 6             | 1        | 8 | 0 | 0 |

$$= (6361A8000)_{16}$$

a) -129.27

$$\underline{\text{Sd}} \quad -129 = 11000\ 0001$$

$$0.27 =$$

$$0.27 \times 2 = 0.54 \rightarrow 0$$

$$0.54 \times 2 = 1.08 \rightarrow 1$$

$$0.81 \times 2 = 1.62 \rightarrow 1$$

$$0.2 \times 2 = 0.4 \rightarrow 0$$

$$0.4 \times 2 = 0.8 \rightarrow 0$$

$$1000\ 0001$$

min 6th 6 times

$$\begin{array}{r} 129 \\ 2 \overline{) 641} \\ 2 \overline{) 320} \\ 2 \overline{) 160} \\ 2 \overline{) 80} \\ 2 \overline{) 40} \\ 2 \overline{) 20} \\ 2 \overline{) 10} \\ 2 \overline{) 5} \\ 2 \overline{) 1} \end{array}$$

## Instructions:

### I/O organization:

#### I/O commands:

→ Input command, output command, control command and status command.

#### Peripheral Devices:

- 1) The devices which are connected externally to the computer are known as peripheral device
- 2) CPU use the following set of commands <sup>I/O</sup> to interact with I/O device

① Data input - The I/O device supply data to the processor

2) Data output CPU supplies data to ~~I/O~~ an I/O device

3) Control Command : The human readable format is converted into stream of 0's and 1's for the machine to understand and perform the necessary operation

+ Once the control word is loaded into device interface then only the device starts functioning

#### Status command:

The amount of work done will be informed to user.



### Necessity of an I/O interface

- 1) Synchronization mechanism is required
- 2) Transfer rates
- 3) Word formats of CPU and I/O device are different
- 4) & difficult to memory all ~~I/O~~ <sup>device</sup> ~~operations~~ <sup>operations</sup> I/O

Explanation for above:

→ CPU is an electronic device which work with speed of electrons where as I/O devices are electromechanical where it takes time to transfer (or) receive data from I/O device. Therefore Synchronization mechanism is required

→ The rate at which CPU transfers the data is very high where an I/O device cannot accept at the rate ~~at~~ ~~at~~ time

The rate at which CPU transfers can be accepted by the interface and stepdown to an I/O device speed and transfers to it.

→ The word formats understood by the CPU and I/O devices are different. The interface manages the block transfers (256b) in to word transfers

→ As the no of I/O devices increases it will be difficult for the CPU to remember and operate the no of I/O devices.

The interface maintains the device address and reduce the burden of the CPU instead of remembering I/O addresses



# example I/O of Interface



| CS | RS1 | RS0 |                                    |
|----|-----|-----|------------------------------------|
| 0  | x   | x   | NO operation                       |
| 1  | 0   | 0   | Port A Register                    |
|    | 0   | 1   | Port B                             |
| 1  | 0   |     | <del>Port A</del> Control register |
| 1  | 1   |     | Status register                    |

→ The interface will be selected with help of chip select input. after that the required register is selected with the help of Register Select Input (R<sub>0</sub> and R<sub>1</sub>)

- It can perform an "I/O read" and "I/O write operations" with the help of RD and WR Pin
- So the data to and ~~from~~ from the CPU will be transferred through data bus buffers
- portA and portB registers can act as both input and output
- The control register accommodates a control word based on which the I/O device starts ~~from~~ performing I/O operation
- These status register will make the CPU to know the status of the work completed

A synchronous Data transfer Method :-

Synchronous:

The communication is ~~to~~ handled between 2 devices having a common clock

## Asynchronous

The interaction between 2 devices under the different clock pulse



## In asynchronous Data transfer

- ① Using strobe (signal)
- ② Using handshake

### (1) Using Strobe :-

#### a) Source-initiated Strobe :-



#### b) Destination-initiated Strobe



The disadvantage with these method the source never knows whether the data is received by the destination or not.

To overcome these disadvantages 2nd method is proposed using Handshake.

## ② Using handshake

### ① Source-initiated handshake



- The data is placed on database and transmitted by source and enables datavalid signal
- The destination after receiving the data from database sends an acknowledgement to the source
- After receiving the acknowledged signal the source disables the datavalid signals and put another set of data onto the database and again enables datavalid signal
- These process will be continued until destination stops sending the acknowledgement signal

## b) Destination-initiated handshake



In this method the destination sends a ready to receive data signal to source. Upon source receiving the ~~reading~~ 'ready to receive' data signal. It put the data on to the databus, and enables datavalid signal.

→ After the data is received by destination, ~~it will~~ it's send ready to receive data signal again upon receiving it source disables the data valid signal and put another set of data on to the databus and validates the datavalid signal. These process will be continued until destination stops sending ready to receive data signal.

## Modes of Data transfer

- 1) Programmed I/O

→ JA Here  
 defined . set of  
 CPU follows that instru  
 program



→ The disadvantage with these method is  
CPU wait for an I/O device to transfer data,  
making the speed of the system come down  
to the speed of an I/O device.

→ When an OS finds CPU ~~busy~~ to give some  
task it appears to be "busy needless state"

### Interrupt Initiated I/O :-

The disadvantage of CPU waiting for an I/O  
device will end up using these method i.e.  
Whenever I/O device is ready it sends an interrupt  
to CPU, when then CPU completes the current  
instruction execution and process the I/O device  
request.



## interrupt



~~Priority~~

## \* Priority interrupt:

### Static:

① Daisy Chaining

② Static priority:

The priority is set during manufacturing time.

### Dynamic Algorithm

### Dynamic priority

→ priority varies



$PI$  = priority in

$PO$ : priority out

VAD = A vector address

B) If  $PI=1, PO=0$   
+ It means that  
current device has  
given interrupt  
and processing by  
CPU (device)

~~PI=1~~  
eventually ~~not~~ device 3 has made request  
Then highest priority is given  
to device. First it will check PI  
if

→ Interrupts of 2 categories

i) vectorized:

The interrupting source has to supply the address where the control has to transfer

## Non vectored:

CPU is aware of the address where the control has to transfer.

- \* The daisy chaining method is used in the computer to allow multiple devices to interrupt CPU, based upon the device priority. The process can process the demand.
- \* The highest priority device is acknowledged every time when the interrupt line goes high if he doesn't want to use the service of the CPU, it passes the priority to the next device.
- \* CPU processes the device whose priority  $P_1 = 1$  and  $P_0 = 0$ .
- \* The device can disable priorities of devices which are having lesser than the current devices and the devices interrupt request which are having more than the current device priority cannot be disabled.

## Arbitration logic:

conflict resolving logic

→ whenever more than one device process request, the processor has to judge to which device it has to process.

## DMA

(Direct memory access)

→ Removing CPU from picture and letting the peripheral device to transfer data directly to memory

Control signals of DMA

• BR: Bus Request

• BG: Bus Grant

INTR: Interrupt

chip of DMA



Address Register:

If stores

→ The address of the memory wh location (00000000) of the memory where data needed to be transfer

→ It will be incremented

→ DS → device selected (chip select)

RS → Register select (either Address Register (A) or word counter)

# Data Transfer



① peripheral device sends a DMA request to the DMA controller

② Upon receiving it DMA issues Bus request signal to the CPU.

③ After receiving the Bus request CPU grants

the Address register and word count register of DMA controller and then grants the bus by issuing busgrant signal

(4)

→ DMA controller acknowledges the peripheral device that the request is going to be processed

(5) & (6)

The DMA takes the bus and visits memory location and then transfers data from memory to peripheral device (or) from peripheral device to memory

\* for each one word of data transfer the content of address register will be incremented by 1 and word count register will be decremented

bx1

(7)

The DMA controller ~~issues~~ issues INTR signal to the CPU once the word count register value becomes "0"

(8)

After receiving INTR signal CPU examines the word count register value if it really becomes '0' then the CPU takes the busback

DMA performs 2 types of data transfer

- 1) cycle stealing
- 2) burst transfer

### \* Cycle Stealing:

It performs one word data transfer at a time

\* It is preferred for transfers less amount of data

### \* Burst transfer:

Bulk amount of data is transferred

The DMA almost performs burst transfer

### IOP (Input output processes):



The IOP and CPU communicate through shared memory

The CPU will request IOP to send its ~~data~~ status.

- IOP put in memory and CPU reads from memory and when status is ok
- now when CPU grants permission IOP will access to IOP ~~functions~~ <sup>requesting</sup> and performs the I/O operations

### CPU - IOP Communication



NOTE: example for ~~8086~~ 8086

for 8086, 8089 is IOP and IBM 360/370 is IOP

NOTE:

- Peripheral devices uses interrupt initiated I/O method to interrupt Input output processor (IOP)

→ IOP uses the DMA method for transferring data between memory and I/O devices

→ CPU and IOP communicate with each other by using "Shared memory organization" method

## \* Memory Organization \*



### Memory hierarchy

① ROM

$$16 \text{ K} \times 8 = 512 \times 8$$

Then no of chips  $(128 \times 8)$  chip required to build memory

Storage of RAM -  $128 \times 8 - (1)$

$128 \times 8 - (2)$

$128 \times 8 - (3)$

$128 \times 8 - (4)$

So we require 4  $128 \times 8$  chips for RAM



Main memory [Composed of both Random Access Cells]

→ 2 types of RAM

|                                              | Static RAM                                                              | Dynamic RAM |
|----------------------------------------------|-------------------------------------------------------------------------|-------------|
| → It is made up of flipflops (most reliable) | → It is made up of capacitors + Transistor = flipflop                   |             |
| → Density of storage is less                 | → Density of storage is more                                            |             |
| → NO Periodical Refreshment is required      | → Has capacitors tends to leak, so periodically refreshment is required |             |
| → Costlier                                   | cheaper                                                                 |             |
| → Retains data for longer periods of time    | shorter Retains data for short duration                                 |             |
| more faster (high speed)                     | Comparatively slower than SRAM                                          |             |

used cache  
memories

used for main memory

## Memory Chip:

### RAM chip



### ROM chip:



OE - Output enable

### NOTE

ROM occupies more storage than the RAM size

CPU - Connection to main memory



Consider a computer system ~~to~~ consist of  $1024 \times 8$  has main memory, out of which exactly half is RAM and half memory is ROM. The RAM chip available  $128 \times 8$

and ROM chip available is  $512 \times 8$ . how many chips are required to form main memory and specify the size of decoders used.

So



Note on Mainmemory [both RAM and ROM]  
i.e.

If there are multiple ROM chips then will be a decoder

→ The decoder size is dependent upon the no of chips



$\rightarrow$  Used to select ROM (or) RAM  
Memory Addressing

|      |            |
|------|------------|
| Ram1 | 0 - 127    |
| Ram2 | 128 → 255  |
| Ram3 | 256 → 383  |
| Ram4 | 384 → 511  |
| Ram5 | 512 → 1023 |

Q) How many 128x8 RAM chips are needed to provide memory capacity of 2048 bytes?

1A) How many lines of address bus must be used to access 2048 bytes of memory how many of these lines will be common to all chips?

1C) How many lines must be decoded for chip select? Specify the size of decoder.

Answers



4 - ~~2~~ 2<sup>7</sup>

6 - 9

9 - 4  
20 - 2

4 - 2<sup>0</sup>

1A)

$$128 \times 8 \rightarrow 2^7 \times 8$$

$$2^7 \times 8 \rightarrow 2^{11}$$

$$\therefore x = 2^4 = 16 \text{ chips}$$

$$x = 16 \text{ chips}$$

How much  
I should  
be multiplied  
to get  
2048 bytes

1B) + 11  $\rightarrow$  address lines

+ 8  $\rightarrow$  data lines

+ all common to all

1C) 4:16

Extend the memory  
RAM and ROM by 1024 bytes each.  
List the memory address map and indicate what  
size decoders are needed

8 RAM chip size : (1096)

ROM chip size : (1096)

\* \* calculation for RAM \*

$$\rightarrow 2^7 \times 12 = 2^{12}$$

$$(n=2^5)$$

[x=32] 128x8 chips required

$$2^2 \times 10$$

$$2^{12} = 2^5$$

→ Decoder size

$$5:32$$

\* \* calculation for the ROM

$$- 2^{12} \times 8 = 2^9 \times 8$$

$$n = 2^3$$

8 512 ROM chips required



3:8 decoder.

20M chips of  $1024 \times 8$ . The computer system needs  $2^4$  bytes of RAM,  $4K$  bytes of ROM, and 4 interface units, each with a register. A memory mapped I/O configuration is used

- How ~~many~~ many Ram and Rom chips are needed
- Draw a memory address for the system.

| <u>RAMs</u> chips                                                                                       | <u>ROM</u> chips                                                                                                                                                                    |
|---------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| $= 256 \times 8 = 2^8 \times 2^3 = 2^{11}$ $= 2^8 \times 2^3 = 2^{11}$ $= 2^8 \times 2^3 = 2^{11}$      | $= 2^{12} = 2^{10} \times 2^2$ <p style="border: 1px solid black; padding: 2px; display: inline-block;">4 ROM chips</p> <p style="margin-left: 20px;"><math>2^{11}</math> bytes</p> |
| $= 8 \text{ RAMchips}$ <p style="border: 1px solid black; padding: 2px; display: inline-block;">3:8</p> |                                                                                                                                                                                     |

Total main memory size is 61K

- for this type of question Given information
- RAM
- RAM, ROM space (O) directly gives total MM size and memory condition
  - available RAM and ROM chips
  - finds No. of RAMchips and ROMchips
  - find O size of decoder for RAM and ROM

## Auxiliary memory

(Hard disk)

- Magnetic disk
- magnetic tape



Rotational delay: delay caused by 120 rpm rotation of magnetic disk  
seek delay: delay caused by R/w header

## Disk consists of

### \* platters:

These platters consists of concentric circles known as tracks. These tracks are further divided in to sectors and sectors consists of blocks

→ To access data from the disk 2 types of delays are encountered

(i) Rotational delay: It is the delay caused by moving disk

(ii) Seek delay:

It is delay caused by R/w header

→ R/w header restricts its movement towards north and south direction directions only

- whereas disk can rotate 360°
- It is an electromechanical device where mechanical operations are involved in accessing data
- The rate at which the data is ~~is~~ saved should be equal to the rate at which it is retrieved
- If there is a mismatch then loss of data will occur
- It is a semi-random access device meaning that the file header can be placed randomly at any sector, but processing will be done sequentially

### Magnetic tapes :-

- These devices are obsolete (<sup>not used</sup>)
- These are sequentially accessed devices
- These are used for Bulk storage of data at a cheaper price.

### \* Associative Memory \*

- It is also known as content addressable memory

# (C) SUT-DHAK

Argument registers

Key registers

conditions



Q1:

content of significance  
of key reg

significance of key reg

1011011 Argument Register

1111111 → indicates  
for searchability

|   |   |   |   |   |   |   |  |   |
|---|---|---|---|---|---|---|--|---|
| 0 | 1 | 1 | 0 | 1 | 0 | 1 |  | 0 |
| 1 | 0 | 1 | 0 | 1 | 0 | 1 |  | 0 |
| 1 | 0 | 1 | 1 | 0 | 1 | 1 |  | 1 |

OP = 0010

D/P = 1011011

1011011

11110000

1st index  
second  
6 search bits

|   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 | 1 | 1 |

OP = 0110

OP = 101010  
10111

→ Here we are ignoring  
remaining bits.

- Associative memory is used
- 1) insert
  - 2) Delete
  - 3) search operation
- The content need to be inserted, so it is inserted (or search will be inserted in) to argument register, key register will be inserted in to argument register, key register
- The size of the argument register and memory word are equal in size
- and memory word's word size
- The key register is optional,  $\rightarrow$  Bx default if consists of all 1's
- match register consists of single bit of storage for each word of memory array
- If a match is found the corresponding match bit will be one hence else 0
- associative memory is costlier memory because match logic is associated with each and every cell
- When a portion of data is to be searched only those corresponding bits in the key register should be made 1 and remaining bits will be 0's
- The content of argument register will be searched parallelly among all the memory locations in the memory array.

## Locality of reference



$$\text{Hit ratio} = \frac{\text{No of hits}}{\text{Total no of references}}$$

$$\text{Avg Memory access time} = H(\text{Cache}) + (1-H)[\text{Cache time} + \text{main memory access time}]$$

$H = \text{Hit ratio}$

$(1-H) \rightarrow \text{miss ratio}$

C  $\rightarrow$  Cache

→ CPU generates the main memory address with that address it visits the cache and finds that instruction then it will return to CPU, if some instruction is not found in cache, it visits main memory, while it is taking that instruction to CPU it drops a copy in to the cache memory for future reference.

### Mapping:

Transformation of data from main memory to cache is known as mapping.

### Type of mapping:

- 1) Associative mapping
2. Direct mapping
- 3 Set-associative mapping

### Associative mapping:

In these mapping the cache memory is made up of associative memory.



The size of the argument register is equal to  
the size of the address part.

- The associative memory array is made with address part and Data part.
- CPU Generated address will be kept in to the argument register, then argument register content will be searched in the address part of memory ~~array~~, if a match is found its corresponding data is returned to CPU, it is termed as hit.

→ If no match is found it visits main memory and drops a copy of address and data in to the cache memory for future reference.

### Direct Mapping:

As the Associative memory is costlier (because match logic is associated with each and every cell), the designer thought of implementing cache memory with same semiconductor material like Main Memory.



→ ~~one~~ constraint for direct mapping

~~one~~ for one index only one tag

Set Associative (P7)

for one index we can have

many tag values



Ex:



set associative:  
the disadvantage of direct mapping is for one index only one tag value can be stored in cache, with that the probability of finding the required data in the cache memory will be difficult.

for that reason set associative mapping technique come in to existence. In these techniques more than one set ( $\langle$  Tag, Data  $\rangle$ ) is accommodated in the cache.

Here for one index value  $\rightarrow$  more than one tag value  $\rightarrow$  ~~one tag value will be assigned~~

→ A two way set associative cache memory uses blocks of 4 words. The cache can accommodate a total of 2048 words from main memory. The main memory size  $128K \times 32$

a) How many bits are there in tag, index, block and word fields of the address format.

b) How many bits are there in each word of cache and how are they divided in to function

c) how many blocks can be cache accommodate

d) What is the size of cache memory



Ans  $128K \times 32$

\* 17 → Address lines

\* 32 → data lines

CPU generates 17 bit



| Two way |      |         |  |
|---------|------|---------|--|
| Tag     | data | Tag / d |  |
| 6       | BL   |         |  |
| 11      | 32   |         |  |

2048 words

$$= 2^6 \cdot 2^{11}$$

→ 11 words

B

~~functions~~

In 17 bits 11 bits will be index

3

6 bits will be tag

d) The size of cache

$$= 2048 \times 76$$

$$(6 + 32 + 6)$$

$$\frac{138}{26}$$

a) In tag → 6 bits, index = 11 bits

word bytes → ~~2~~ 2

Tag / index



2048 × 3



- TO address ~~word~~ block  $2^{10} = 2^9 = 512$  bits
- TO address word in block  $4 \times 2^2 = 16$  bits

a) 10 bit Ad

$$\text{Total block} = \frac{2^{11}}{2^2} = 2^9 = 512$$

given size of block - 512 bytes

a)

Tag = 6  
Index = 11  
Block = 9  
Word = 2

b) 76 bits



$$\frac{2^{11}}{2^2} = 2^9 = 512 \text{ bytes}$$

d) \* 2018 & 76

- Q) A digital computer has a memory unit of 64Kx16 and a cache memory of 1kwords. The cache uses direct mapping with a block size of 256 words.

a) How many bits are free

- (i) Tag = 6
- (ii) index = 10
- (iii) block = 256
- (iv) word fields of address format



b) how ~~many~~ many bits are there in each word of cache and how are they divided in ~~function~~?

Include a valid bit.

- c) how many block can be cache accommodate
- d) What is the size of cache memory

2



direct mapping



CPU generated address - 16 bits

Instruction Index = 10 bits

Tag = 6 bits

a) Tag = 6 bits

Index = 10 bits

block = 8 ( $2^8 = 2^3$ )

word = 2

b) bits - ~~Tag~~ Tag = 6

data = 16

= 23 bits

We have included valid bit  
given



c) No of blocks = 256

d) Size of cache = 1K x 23

## Virtual memory:

An extension to main memory which uses portions of secondary memory is known as



→ Total MS secondary memory can be virtual but all of them should be in DUDANED.

→ Address space: The address generated by programmer is known as logical address. Collection of such logical addresses is known as address space.

→ Memory space: CPU generates physical address. Collection of those addressed is known as memory space.

### Question

An address space is specified by 24 bits

and corresponding memory space by 16 bits.

- No. of words are there in address space
- No. of words are there in memory space
- If a page of 2 words, how many pages and blocks are there in system

$$a) 2^{24} = 16 \text{ MB}$$

$$b) 2^{16} = 8 \text{ Gb}$$

$$c) \text{No of Pages} = \frac{16 \text{ MB}}{2 \text{ KB}}$$

$$\boxed{\text{No of Pages} = \frac{\text{No of frames}}{\text{No of block}}} = \frac{16 \times 1024}{8 \times 1024} = 2^{13}$$

b) No of blocks

$$\frac{82}{2^{10}} = 32 \text{ blocks}$$

blocks

## page Replacement algorithms

\* The Two most commonly use page replacement Algorithms

1) FFO

2) LRU

FFO

→ The FFO algorithm selects for replacement the page that has been in the memory the longest time

→ LRU → It based on the assumption that Least Recently Used page is a better choice for removal than the least recently loaded pages as in FFO

(b) A Virtual memory System has an address space of 8K words, a memory space of 4K words and page and block size of 1K words. The following page reference changes occur during a given time interval

4, 2, 0, 1, 2, 6, 1, 4, 0, 1, 2, 3, 5, ?

Determine the no of page faults and the pages that were resident in mm after each page reference change if the replacement algorithm used is ~~a) LRU~~ a) FIFO

b) LRU

[pagefault] - If a referenced page is found in cache it is termed as a page hit

→ If a required page is not found in cache then it is termed as pagefault

A)      4    2    0    1    2    6    1    4    0    1    2    3    5  
 fib:      \*    \*    \*    hit    \*    hit    \*    hit    hit    hit    \*    \*

|   |   |
|---|---|
| 4 | 5 |
| 2 | 7 |
| 0 | 2 |
| 1 | 3 |

$$\begin{aligned} \text{pagefaults} &= 10 \\ \text{pagehit(s)} &= 5 \end{aligned}$$

Some of block starting 4, 2, 0, 1  
will not consider as pagefaults

Given memory space = 4K word ✓

Size of each frame = 1K word

No of frames  $\frac{4\text{Kword}}{1\text{Kword}} \rightarrow 4$  frames

So

We can consider pagefaults 6

e.g. by depending on options

420 126 140 102 357



V 12-V

4. 2 0 1 2 6 1 4 0 102

357



LPU  
It will replace which is not used for longest period of time

MPU  
If will replace which is used just now

4. 2 0 1 2 6 1 4 0 102

|   |   |
|---|---|
| 4 |   |
| 2 | 6 |
| 0 |   |
| 1 | 2 |

returnall

4 2 0 1 2 6 1 4 0 102 357

|   |    |
|---|----|
| 4 | 27 |
| 2 | 6  |
| 0 | 3  |
| 1 | 5  |

q → pagefaults  
G → page hits

for these we can't say ~~how many~~ how many will receive in mm (monthly)

## Flexness classification

→ M.J. Flynn classified computers on the basis into four categories based upon their instruction fetch and instruction execution.

(i) SISD [Single ~~one~~ instruction stream : Single Data Stream] → uniprocessor

(ii) SIMD (Single instruction stream multiple data stream)

(iii) MISD (Multiple ~~inter~~ instruction Stream single data stream)

(iv) MIMD (multiple instruction stream multiple data stream) → multiprocessor

## Pipelining:

→ Decomposing a sequential process into sub operations where each sub operation is given dedicated segment such that all segments work ~~at~~ concurrently.

→ ~~preliminary~~ Pipelining can be applied on 2 concepts

1) ALU Operations (data stream)

2) Instruction execution (instruction stream)

## Arithmetic Pipelining :

$$A_i = (A_i + B_i) + C_i, \quad i=1 \text{ to } 7 \text{ (execute 7 times)}$$

Space time diagram ( illustrates behaviour of Arithmetic pipeline )



node  
(DEK  
pulse)

without pipelining

- |                           |                                                                                                           |
|---------------------------|-----------------------------------------------------------------------------------------------------------|
| 1. $A_i, B_i$             | $\left\{ \begin{array}{l} 7 \times 3 = 21 \rightarrow \text{clock pulse} \\ \text{ } \end{array} \right.$ |
| 2. $A_i \times B_i, c_i$  |                                                                                                           |
| 3. $A_i \times B_i + c_i$ |                                                                                                           |



### INSTRUCTION PIPELINE +

steps involved in executing a instruction

- ① fetch the instruction
- ② Decode the instruction and calculate effective address if required
- ③ fetch the Operand
- ④ execute the instruction
- ⑤ store the Result.

| Pipeline Stage | FI <sub>i</sub> | D&EA <sub>i</sub> | FO <sub>i</sub> | E & SR <sub>i</sub> |
|----------------|-----------------|-------------------|-----------------|---------------------|
| 1              | FI <sub>1</sub> | -                 | -               | -                   |
| 2              | FI <sub>2</sub> | D&EA              | -               | -                   |
| 3              | FI <sub>3</sub> | D&EA <sub>2</sub> | FO <sub>1</sub> | -                   |
| 4              | FI <sub>4</sub> | W&EA              | FO <sub>2</sub> | E & SR <sub>2</sub> |

Note: pipelining starts at 4th clock cycle = no of segments

Optimisation of Pipelining at = no of segments

Pipeline Conflicts / Difficulties in implementing Instruction

Unit for instruction pipelining

### Resource Conflicts (Bus)

1. Bus Dependencies ( )
2. Branching Difficulties ( ) → add()

add()



The value of C will not be 9 it will be 6 (because it's only 5 instns.)

### Resource Conflict

- ① When the bus is been used to perform instruction fetch and operand fetch in the same clock cycle a conflict arises in allocating Bus to perform which operation

### Data dependencies

- ② When a compiler finds the symbol which appearing on LHS also appears immediately on RHS a conflict arises which is known as data dependency

## Solutions to overcome data dependency

- 1) Hardware interlocks ✓
- 2) Operand forwarding ✓
- 3) Delayed load ✓

## 3) Branching difficulties :-

→ As the pipelining concept execute the instruction sequentially, there are instructions which will make the program to run out of sequence.

e.g. add(1);  
      ^  
      |  
      3  
      |  
      100 add(1)  
      |  
      {  
      |  
      5

basically if we apply pipeline  
example: If we execute 5th instruction  
but we need 100th value

## 4) RISC Pipelining (Reduced instruction set computer)

→ These processor ~~will~~ implement the instruction pipeline in 3 segments

~~D~~ T : instruction fetch

A : ALU operation

E : execute instruction

## Vector Processing

Solutions for branching difficulties :-

- 1) prefetch target instruction
- 2) Branch-target Buffer
- 3) Branch prediction
- 4) delayed Branch.

## Vector Processing

- ~~problems~~ A high dimension vector calculation will be done on high end machine in very less amount of time, that are beyond the capability of a conventional computer
- These problems are characterized by the fact that there required a vast number of computations and will take a conventional computer days to complete
- in many and for engineering applications the problems can be formulated in terms of vectors and matrices that lend themselves to vector processing

## applications

- 1) long range weather forecasting
- 2) petroleum explorations
- 3) seismic data analysis
- 4) medical diagnosis
- 5) Aerodynamics & spaceflight simulations
- 6) Artificial intelligence & expert system.
- 7) ~~Image~~ processing

## Vector Operations

- 1) Loop operations
- 2) matrix multiplication
- 3) memory interleaving
- 4) superscalar processors
- 5) super computers

Register mode

Processor always to locate an operand  
register

Immediate mode

The content is loaded into the register itself

Ex: mov R0, #10h (16 bit immediate)

(constant memory)

2) Direct mode:

If an address is contained in register with  
by an offset

MOV R0, #E

3) Indirect mode:

MOV R0, @R1

( $\Rightarrow$  Base is register  
and its value  
contents in higher  
of content  
+ offset)

4) Register / Constant mode

If both the address and content

in register it is termed as register

5) Register indirect mode:

MOV R0, @R1



6) Auto method / Auto direct mode:

It is computer ~~now~~ provided indirect mode  
where the content is passed by the processor itself  
decrements to the next address

→ to open [open]  
→ to close [close]

~~Search mode :~~

Point 2<sup>nd</sup>  
Enter address + edit by pushing the memory  
instead of pc key pressed on index  
register is used in calculation of effective  
address

Ex:

Index register = 5001      I1 0001/43

$$\begin{array}{r} EA = 5001 \\ 431 \\ \hline \underline{\text{I1}} \end{array}$$

## Some of the memory

- Q) The memory which is used to store the copy of data
- (i) instructions stored in Larger memory inside  
Processor cache called "Level 1 cache"
- (ii) The Larger ~~memory~~ memory placed between the primary cache and memory is called
- Level 2 cache (main cache)
  - ~~Level 2 cache [secondary cache or external cache]~~
  - EEPROM
  - FLLB

(iii) The next level of memory hierarchy after L2 cache is

- SMT
- TLB
- OMM
- Registers



New Level means above Level

Level 1 is a memory which is inside processor (or) CPU

→ It is implemented with the use of SRAM

→ SRAM ~~use~~

Level 2: It is outside processor and is between the primary memory and CPU

Cache manages the both L1 and L2

fastness of of



## Q2 Cache memory

Reason for the implementation of the cache memory is \_\_\_\_\_

- a) To increase the internal memory of the system
- b) The difference in speeds of operation of the processor and memory
- c) To reduce the memory access and execute time of the memory
- d) None of the mentioned

The effectiveness of the cache memory is based on the property of \_\_\_\_\_

- a) Locality of reference
- b) Memory Localisation
- c) memory size
- d) None of the mentioned

D Locality of reference  
when there are more operands

The temporal aspect of the locality of reference means \_\_\_\_\_

The recently executed instruction will be executed soon again

4. The spatial aspect of the locality of reference means \_\_\_\_\_

Ans That the instruction in close proximity [clustering] of the instruction executed will be executed in future.

Explanation .

→ nearby instruction is more likely to be executed in future

5) The correspondence between the main memory blocks and those in the cache is given by mapping function

7) The write-through procedure is used

To write directly on the memory and the cache simultaneously.

④ Dirty bit is used to signify that the cache location is updated.

9) copy-back protocol is used

a) To copy the contents of the memory onto the cache

b) To update the contents of the memory from cache

4. The spatial aspect of the locality of reference means \_\_\_\_\_

Ans That the instruction in close proximity [clustered] of the instruction executed will be executed in future

Explanation:

→ nearby instruction is more likely to be executed in future

5) The correspondence between the main memory blocks and those in the cache is given by mapping function

6) The walk-through procedure is used

To write directly on the memory and the cache simultaneously.

7) Dirty bit is used to signify that the cache location is updated.

8) copy-back protocol is used

a) To copy the contents of the memory onto the cache

b) To update the contents of the memory from cache

- D) To remove the contents of cache and push all  
on to memory

Q) none

Explanation -

This is another way of performing the write operation, where in cache manipulation first and then the memory.

- B) The approach where the memory contents are transferred directly to the processor from memory is called Early-start

---

which memory is difficult to interface with processor?

- A) static memory
- B) Dynamic memory
- C) ROM
- D) none

Ans:

Because it requires periodical refreshes

---

The minimum time delay required between initiation of two successive memory operations is called memory cycle time

for a memory system the cycle time is is some as  
longer than access time

$$\boxed{\text{Cycle time} = \text{Latency time} + \text{transferring}}$$

Latency time = Latency time is overhead of finding the right memory location and preparing to access

Transmission time : Time required to transfer the data

⑧ The Refreshing rate of dynamic RAMs is approximately once in 2 millisecond"

and dynamic RAM has higher bit density

a

⑨ ii) The access time of magnetic bubble memory is approximately

- A) 30 nanosecond
- B) 30 micro second
- C) 30 pico second
- D) 0.5 second

12) The use of "hardware" in Memory management  
↳ through "segment relocation" and protocols

SE → To perform address translation to reduce size of  
→ memory / " " . . . . executable

memory refreshing may be done

A. By the CPU that contains a special refresh counter, only

10. Q1 (bx external refresh)

1) CPU register, cache, main memory are internal memory of the system

2) An ~~advantage~~ advantage of memory interfaces is that

A) Large memory is obtained

B) Effective speed of the memory is increased

C) The cost of the memory is reduced

D) volatile memory is obtained

2) Which of the following need extra hardware  
for DRAM refreshing?

1085  
55 Motorola - 68000

10

NOTE

→ Property of Locality of reference may fail  
in a program how

- A Many conditional jumps
- B Many unconditional jumps
- C  many operations
- D many operations

Q4 what is the correct sequence of time delays to trust happen during a data transfer from a disk to memory

Sol: seektime → latencytime → transfer time

It is the time required to read/write data to move from one track to other access time

The time a program or device takes to locate a single piece of information and make it available to computer for processing

the time it takes to position the proper sector  
[In sector tracks will be true] under read/write  
head

25) How many RAM chips of size (256K x 1bit) are required to build 1M Byte memory?

- A. 8
- B. 32
- C. 24
- D. 32

$$256k = 2^18$$

$$\underline{2^{18} \times 1} \times n = 2^{10} \times 8$$

↓

In order to earn 1 bit

$$n = \underline{(8 \times 1)} \rightarrow \text{one chip}$$

If it is earning 8 bits (no clocking)  
↓  
So 32 chips are require

latency time

the time it takes to position the proper sector  
[In sector tracks will be true] under read/write  
head

25) How many RAM chips of size (256K x 1bit) are required to build 1M Bxk memory?

- A. 8
- B. 32
- C. 24
- D. ~~32~~

$$256K = 2^{18}$$

$$\underline{2^{18} \times 1} \times x = 2^{10} \times 8$$

↓

⊗

In order to earn 1 bit

$$n = (8 \times 4) \rightarrow \text{one chip}$$

If it is earned in 8 bits (no clatency)  
↓  
so 32 chips are require

Four memory chips of 16by 16 have three address buses connected to it. The size will be  
of size 16x16

How many addresses are require for 25x60 video?

Q 1000 [25x60 = 1500]

A flip flop can be converted to T flip flop  
by additional logic circuit

B  $D = T \oplus \bar{D}$

C  $D = T \cdot \bar{D}$

D  $D = T \oplus \bar{D} = \bar{D}$

→ If main memory of 8k bytes and its cache is 2k words. It uses absolute mapping  
then each word of cache shall be

Q  $2 * 8 = 16 \text{ bits}$

optimal memory management aspects of a hardware  
implementing booth's algorithm have (11101) & (1100)

(11010)

memory unit is accessed by contents called

absolute memory

Aging register are

counter which indicate how long these  
absolute page have been referred

48, 52, 53, 58, 59, 64

i) The address of PTT table is pointed by  
page table base register

The performance of cache measured on terms of  
hit ratio

→ A R bit field can specify  $2^k$   
registers

Translation from symbols is to Binary is done in —

## 2 phary

→ 159, 160, 173, 176, 181, 184, 186, 191, 198, 199, 200  
→ fragmenting dividing file  
size fragments  
→ memory to equal  
→ 165, 166, 167, 168

→ dedicated computer means which assigns file  
one and only one task

→ interrupts initiated through instruction called software  
interrupts

→ Memory access in RISC architecture is limited to  
instructions STA and LDA

→ CPU doesn't perform the operation

- A) Data transfer
- B) Logic operation
- C) Arithmetic operation
- D) All of above

, AS CPU  
contains  
ALU, IF  
will perform  
arithmetic and  
logic