

## Computer Organization

# Computer Organisation & Design

(1)

## Module 1: Computer Functionality

- (1) Basics : (i) Computer (ii) Program (iii) Instruction  
 (iv) Encoding (v) Decoding

## ② Designer and User View of Computer



(a) Designer View



(b) User View

## ③ Data Representation

### (i) Data

### (ii) Representation chart



$$\begin{array}{ccccccc}
 & n & (n-1) & (n-2) & \cdots & 0 \\
 \boxed{1} & \boxed{1} & \boxed{1} & \boxed{1} & \cdots & \boxed{0}
 \end{array}$$

2's comp:  $(-2)^n + \sum_{i=0}^{n-1} b_i \times 2^i$

(iii) Unsigned Number:  $(0 \text{ to } 2^n - 1)$

unsigned

(a) There is no overflow case.

(b) If range is exceeded the carry flag is set.



(iv) Signed Data:  $(-2^{n-1} \text{ to } 2^{n-1} - 1)$

(a) There is overflow case in it, so an overflow flag is required to handle range exceeding.

(v) Addition:  $(n\text{-bit}) + (n\text{-bit}) = (n+1)\text{ bit}$

(vi) Multiplication:  $(n\text{-bit}) * (n\text{-bit}) = (2n)\text{ bit}$  ss total multiplication possible  
[ $2^n \times 2^n$ ] ( $2n$ ) bits ROM

(vii) Overflow Case Signed Data:

$$\begin{array}{r} x \ x_1 \ x_2 \ x_3 \\ + y \ y_1 \ y_2 \ y_3 \\ \hline z \ z_1 \ z_2 \ z_3 \end{array} \rightarrow \text{Overflow} = (x \bar{y} \bar{z} + \bar{x} \bar{y} z)$$

Adding two negative we get + (which is not possible)

Adding two positive we get negative no. (which is not possible)

(b) We only consider Overflow

flag in case of Signed Number, no matter if carry flag is set or not. (Result of addition is also one bit)

(c) If OV=1 (Flag is set) then:

if  $\bar{x} \bar{y} z = 1$  (positive overflow) : MSB = 0 | Data

.. if  $x \bar{y} \bar{z} = 1$  (negative overflow) : MSB = 1 | Data

include sign bit also

(viii) Program Status Word: Used to represent data format



n = 8 bit

(ix) Floating Point Representation

$$\text{Excess Bias} = \left(\frac{2^n}{2}\right)$$

$$(a) \pm M * B^{\pm e} \quad M = \text{Mantissa}$$

B = Base

e = exponent

$$(b) \text{Normalized form: } \pm 1.M * B^{\pm e}$$

Implicit

$$\begin{array}{l} 111.111 \Rightarrow \text{Normalize} \Rightarrow 1.1111 \times 2^{+2} \\ 00000111 \Rightarrow \text{Normalize} \Rightarrow 1.11 \times 2^{-4} \end{array}$$

move right  
move left

(c) Single Precision :



$$\text{Bias} = + (2^{n-1} - 1) = +127$$

$$BE = \text{Actual E} + \text{Bias}$$

(d) Double Precision :



$$\text{Bias} = + (2^{n-1} - 1) = +1023$$

$$BE = \text{Actual E} + \text{Bias}$$

(3)

## ③ Range of Floating Point

$$\pm M * 2^{(B.E - \text{Bias})}$$

Bias: +127

Sign

 $\boxed{+}$  BE

BE

Range = { -127 to +127 }

Mantissa range = { 1 to  $(2 - 2^{-23})$  }  $\leftarrow (2^n - 1) * 2^{\text{Range}}$ 

(Normalize)

Bias: +1023

Sign

 $\boxed{+}$  BE

BE range = { -1023 to +1024 }

Mantissa range = { 1 to  $(2 - 2^{-52})$  } $(1111\ldots 53 \text{ times} * 2^{-52})$ 

Normalize

## ④ Computer Architecture

### i) Von-Neumann Architecture

- \* Dynamic over program
- \* Program must present in main memory.



\* CPU generates physical address for execution of program.

ex: 8085 MP, 8086 MP

### ii) Harvard Architecture

- \* Instructions are permanently present in Code Mem.
- \* Data is dynamic
- \* Execute only one type of operation at a time.



## ⑤ Memory Organization

memory chip is collection of bit

i)



ii) Memory requests generate by CPU:



iii) Memory Config:



iv) 9mside RAM



v) Nbit-CPU : It means CPU process N bit data at a time.  
It is also word length of CPU. If CPU is 8 bit :  
then word length is 8 bit. 8 If it is 16 bit, word length is 16 bit.

Word Addressable Memory : It means each cell of memory stores 1 W.

Byte Addressable Memory : It means each cell of memory stores 8 bit.

nibble Addressable Memory : It means each cell of memory stores 4 bit.

Bit Addressable Memory : It means each cell of memory stores 1 bit.

\* These all has nothing to do with Address line.

\*  $2^n \times 16$  : It means :

Address line : n bit

Cell size : 16 bit

④ In memory data always stored in <sup>byte</sup> addressable Format. ex: Byte Addressable Memory means each memory cell stores 8 bit.

⑤ CPU always process data in Word length format.

ex: 32-bit CPU it means CPU process 32 bit data at a time.

⑥ To make data in word length parallel decoders are used.

ex: Byte Addressable Memory & 32 bit CPU



\* use merge 4 cell together to make it 32 bit word length.

⑦ Interfacing of cell's to make it word bits, it cause problem in addressing to solve this problem two methods are used:

a) Little Endian : LSB store on lower address and MSB on higher

b) Big Endian : LSB store on Higher address and ~~MSB~~ MSB on lower



Data: 20 80  
MSB LSB

## ⑧ CPU Architecture

i) Processor contains set of hardware pins that are used to operate with externally connected component.

ii) 3 types of hardware pin

(a) Active High : when clock signals are high.

(b) Active Low : when clock signals are low.

(c) Tristate Pin : this pin is used to carry multiple data but not at same time.

Address Latch Enable (ALE) : It is hardware used for time multiplexing so that bus carry multiple data.

$ALE = 0$  = Bus carrying data

$ALE = 1$  = Bus carrying address

(b) ALE contains 2 pins: STB pin (Active High) (STROBE PIN)  
 $\overline{OE}$  pin (Output enable) (Active Low)

STB = 0 = I/P line disable  
 1 = I/P line enable



(iv) Machine Cycle: No. of CPU cycles required to complete one machine instruction

(b) It depends on Designer.

(v) Memory interfacing [8085] [Mem]



(vi) System Bus: It is used to provide communication b/w major component of system i.e. Memory & I/O

(b) Types of lines in SB

\* Address Line:

- Unidirectional
- with help of it Mem capacity of system can determine

Ex:  $A_{15} \dots A_0$ : 8085 MP  
 $A_{19} \dots A_0$ : 8086 MP

\* Data Line

- Bidirectional
- with help of it word length of CPU can determine

Ex:  $A_7 \dots A_0$ : 8085 MP  
 $A_{15} \dots A_0$ : 8086 MP

\* Control Line

- Control Signal: RD or WR
- Timing Signal: used for Sync. b/w Mem/I/O device.

(c) Common bus cause Ambiguity. To solve it following Bus config are used:

\* I/O Processor (IOP)

- Separate BUS
- Common Control Signal & Address Bus
- Different bus for Mem & I/O

\* Isolated I/O [Jo Mapped I/O]

- Extra Pin
- Common data bus & Address bus
- Different CS for Mem & I/O.
- (I/O & M) Pin is used.

\* Memory mapped I/O

- Shared Mem
- Common CS & Address Bus.
- Shared Mem with I/O.
- Complete mem nature.

| Pins | [J10 / M] | RD | WR | LOAD | STORE |
|------|-----------|----|----|------|-------|
| 0    | 0         | 0  | 0  | IN   |       |
| 1    | 1         | 1  | 0  |      | OUT   |
| 2    | 0         | 0  | 1  |      |       |



## ① Instruction Cycle

i) Execution sequence of program.

ii) Types:

a) Fetch cycle (FC)



b) Execution cycle (EC)

## iii) Fetch cycle & Execution cycle:

c) Fetch instruction based on Program Counter  
CPU                            from memory

d) PC is register, which holds address of instruction that is going to execute by CPU. It always increments by Step size. Step size is fixed constant depends on CPU word length, because CPU always execute instruction ~~in~~ in word length manner.

e) -t : starting address of program } To start execution starting Address sent to accumulator (trace)

f) [opcode = word length]      [opcode < word L]      [opcode > word length]  
(not more)

g) [1B instruction]      [2B instruction]      [3B instruction]  
[opcode = 8 bit]      [opcode = 16 bit]      [opcode = 24 bit]  
[intermediate = 8 bit]      [intermediate = 16 bit]

h) Before starting of execution cycle, program counter change to valid Program counter of next instruction. For example, before starting execution of instruction  $I_1$ , PC changes to  $I_2$ , then execution of  $I_1$  begin.



## iv) Execution cycle

a) CPU organization classified into 3 part based on ALU Data path:

\* Stack CPU   \* Accumulator   \* General Register CPU

## D) Stack CPU

- All Operands present on stack.
- It supports 0-Addressing mode only OPCODE requires.
- Opcode: PUSH, POP, ADD, MUL, DIV, SUB
- Stack memory require.
- Always fetch data from ~~TOP~~ TOP of stack.

$S_1 \quad S_2 \quad D$   
 TOS      TOS      TOS



## (C) Accumulator CPU

- In it One Operand is present in Accumulator and other can be present in ~~any~~ register or memory.

- It supports 1-Addressing mode, because other operand.

- Opcode: LOAD, STORE, ADD ~~OR~~, MUL ~~OR~~ ...

|                                                      |                                                                                |
|------------------------------------------------------|--------------------------------------------------------------------------------|
| $S_1 \quad S_2 \quad D$<br>ACC    Reg.    ACC<br>Mem | $S_1 = \text{Source 1}$<br>$S_2 = \text{Source 2}$<br>$D = \text{Destination}$ |
|------------------------------------------------------|--------------------------------------------------------------------------------|

- Register File: It stores ~~all~~ all register.

*1. Under addressable Register file = N, it means N registers are present.*

- Memory Spill: Storing of ~~any~~ data from Register to Memory.  
Unwanted memory access.

- Order of execution matter to optimize code.

*ex:  $(A+B) / (C+D)$  → It requires 2 mem spill.*

## ② Expand Opcode Technique

\* It is useful in fixed length instruction set.

\* In it we merge free bits to Opcode, so maximum opcode possible.

$$\text{Instruction} = \frac{\text{Opcode}}{\text{Size}} + \frac{\text{Address}}{\text{Size}}$$

- \* Rules:
  - Identify higher order instruction, i.e. smallest opcode in instruction
  - Identify total instruction possible
  - Identify total free instruction after primitive instruction.
  - Assign free ~~bits~~ instructions to lower order instruction.

$$\boxed{\text{# of instruction} = \text{No. of Free Opcode} \times 2^{\text{Address bit}}}$$

(9)

## General Purpose CPU Registers CPU

(i) Two kind of encoding scheme possible:

- \* Register to Memory interface
- \* Register to Register interface

(ii) If it is 2-Address Type Instruction.

Inst : [Op-Code | Reg | Addr | Addr ]

(iii) Register to Memory interface.

Inst : [Op-Code | Reg | Mem ]      S<sub>1</sub>      S<sub>2</sub>      D  
                                        Reg      Mem      Reg

(iv) Register to Register interface

Inst : [Op-Code | Reg | Reg ]      S<sub>1</sub>      S<sub>2</sub>      D  
                                        Reg      Reg      Reg

(v) Register to Register Reference CPU (Memory Ref)

(i) If it is 3-Address type of instruction, all are register operand.

Inst : [Op-Code | Reg<sub>1</sub> | Reg<sub>2</sub> | Reg<sub>3</sub> ]      S<sub>1</sub>      S<sub>2</sub>      S<sub>3</sub>  
                                                Reg<sub>1</sub>      Reg<sub>2</sub>      Reg<sub>3</sub>

(vi) 4-Address Instruction Format

[Op-Code | Address | Address | Address | Address to Next Inst]  
Operation      D      S<sub>1</sub>      S<sub>2</sub>

(7) Instruction Execution Sequence



SOP & PD combined to ALU

IF → Mem

ID → Decoder

OF & PD → ALU

WB → Register or Mem

(i) Instruction Fetch (IF)

(ii) Instruction Decode (ID)

(iii) Operand Fetch (OF)

(iv) Process Delta (PD)

(v) Write Back (WB)

\* CPU Fetch Instruction from memory in Word Size format.



### ③ Program Status Word

i) It is collection of Accumulators & Flags.

ii) Flag: It is a Flip Flop hold only one bit.

Types of Flags:

a) Control Flag ( $CPU \rightarrow User$ )

\* Carry Flag:

Extra bit out of MSB.

\* Parity Flag:

Even no. of one's in ALU O/P.

\* Auxiliary Carry: (Used in BCD)

- Extra bit from lower nibble to higher nibble.

- Carry bit is used to hold range exceeding condition of higher nibble.

- Unpacked BCD & Packd BCD

- 1 digit  $\rightarrow$  (8 bit)      1 digit  $\rightarrow$  (4 bit)

- $CY = 1$  or  $AC = a$ , then add 6 to make valid BCD.

(at result)

\* Zero Flag:

- ALU Output is 0.

\* Signed Flag:

- MSB of ALU content.

\* Overflow Flag:

- $y_1 y_2 + y_1 y_2$

b) Conditional Flag ( $User \rightarrow CPU$ )

\* Trap Flag

0: Output after execution.

1: Step by step output

$\rightarrow 0 \rightarrow 0$  (go)

$\rightarrow 1 \rightarrow 1$  (+1 CLK)

\* Interrupt Flag

0: Don't check any interrupt (Disable)

1: Check interrupt & continues (Enable)

\* Direction Flag

0: Autoincrement (Clear direction)

1: Auto Decrement (Set direction)

→ Starting Address

→ Last Address.

### ⑩ Addressing Mode

i) It gives location of object.

ii) It contains Effective address or address which load to EA.

iii) Effective address is address of actual Object which is either data or instruction.

iv) Mode of Addressing      a) Implicit      b) Explicit

opcode itself.  
indicate type  
of AM

opcode | AM Address

↓  
↓ to use to indicate type  
of Addressing mode.

✓ I/#  $\rightarrow$  Immediate AM  
Regname: Register AM

[ ]  $\rightarrow$  Direct AM

@ | ()  $\rightarrow$  Indirect AM

Index  
Register  $\rightarrow$  Indexed AM

+/-  $\rightarrow$  Autoindex AM

# Frequencies of Addressing Mode

## ① Sequential Control Flow AM

① It focuses on data location. So data transfer and data manipulation both are design with this AM.

### (a) Implied / Implicit AM :



ex: ADD ;(on stack)  
(CPU)



ex: STC ;(carry)  
CLC

### (b) Immediate AM :



ex: MOV R10 #23

• Range Limited: Unsigned data

constant

& Signed Constant EA : where program store #23

### (c) Register AM .



EA: RegName



### (d) Direct / Absolute AM :



EA: Addr

### (e) Indirect AM



EA: Address present at [Addr]

[IIR, IMR, IRR]

ex: MOV R10, @4000

### (f) Register Indirect AM



EA: Present in RegName

### (g) Indexed Based AM

\* Post Auto Inc

\* Post Auto Dec

\* Pre Auto Inc

\* Pre Auto Dec

[IRR, IALU, IMR]



BaseAddress(R11)

+ BaseAddr + R11

EA

[EA] → Data

ex: MOV R11, #4

[MOV R10 | 2000(R11)]

2000(R11) → 2000+R11

ex: MOV R11, #FFFF

[MOV R10, +R11]

### (h) Indexed Indirect AM

[IRR, IALU, IMR, AMR]

Addr(R11)

→ Addr + R11

[pos when EA Present]

EA → [EA] → DATA

### (i) Autoindexd Addressing Mode

BASE Address (+/-) stepsize

[Base Addr]

[IRR, IALU, IMR]

[Opcode | Base Adr]

## (ii) Transfer of Control AM

- ① Its purpose is location of Next instruction.
- ② During execution program transfer from current location to target location.
- ③ It is Transfer of Control instruction (TOC). To goto (Branch Address)
- ④ We divide memory in segments for better utilization.
- ⑤ If Transfer of Control occurs within same segment then it is Intrasegment TOC.
- ⑥ If Transfer of Control occurs between two segments then it is Intersegment TOC.

(a)

- ⑦ To implement Intrasegment TOC [PC Relative Addressing Mode] is used.

PC Relative Addressing Mode:  $EA = PC + \text{Relative Address}$

\* Relative Address present in Address field of instruction:  $\boxed{\text{Opcode} \mid \text{Add}^n}$   $\rightarrow$  Relative value.  $(\text{current location} - \text{target location})$

(b)

- ⑧ To implement the Intersegment TOC [Base Register AM] is used

$$EA = [\text{BA}] + \text{Relative Address}$$

$\downarrow$  Present in Address field of instruction.  
 Content of Base register  $\downarrow$  (Best for independent code)

## (ii) Instruction Set

### (i) Types of Instructions

#### (a) Data Manipulation Instruction

\* ADD, MUL, DIV, SUB, INC, DEC, CMP, AND, OR, XOR, ISL, ASL, LSR, ASR,

\* Arithmetic inst,  $\rightarrow$  ROL, ROR, Logical inst, RCL, RCR  
Shift inst.

\* CMP:  $Op_2 - Op_1$

$CY = 1 \rightarrow Op_2$  is smaller

$Z = 0 \rightarrow Op_2$  <sup>not equal to</sup>  $Op_1$

$Z = 1 \rightarrow Op_2 == Op_1$

#### (b) Data Transfer Instruction

\* MOV, LOAD, STORE, PUSH, POP, IN, OUT

#### (c) Data Control Instruction

\* Conditional TOC  
\* Unconditional TOC  
→ Based on previous flag <sup>if</sup>, conditional TOC is evaluated. When condition true then control transfers to given location.

## SUB Programs (Functions in Higher Language) (Known as)

(13)

- Single Entry - Single Exit
- Main program is suspended during execution of sub programs
- CALL & RET
  - ↓      ↓
  - Invoke    Invoke
  - PUSH      POP, and return
  - int stack      reset to PC.
  - before function
  - after control

(12) Interrupt Cycle : It is a signal generated by hardware.

(b)



(c) CPU checks INTRPIN after each execution cycle.

(d) When interrupt occurs PC value stored in ~~stack~~ and vector address is loaded into PC.

(e) Nested interrupt.

(f) Types of interrupt:

i) Hardware interrupt

- External interrupt
- Internal interrupt
- RST 4.5, 5.5, 6.5, 7.5

EH      ML      H: High  
INR      ESR      Extreme: E

• Maskable & Non Maskable interrupt

• Vector & Non vector

• Level trigger &  
Edge trigger  
interrupt.

ii) Software interrupt

- System calls.

Time required to  
Service ~~time~~: ~~Unacceptable Response time~~  
~~+ Service Time~~

+ Service Time

13

## RISC and CISC :

- More registers
- Imm addressing mode
- Fixed length instruction
- It support lesser full pipeline
- Cycles per instruction (CPI) = 1
- It used in real time app.

### RISC

(Reduce Instruction Set (CISC))

- It contains smaller instruction set.
- It supports microprogram control unit.

### CISC (Complex Inst. Set (Comp))

- Supports less register
- It supports more address mode
- It supports variable length instruction
- It supports complex full pipeline
- CPI  $\neq 1$
- It used in general app.
- It uses larger instruction set.

13

## 14 Register Organisation RISC Computer

- i) RISC register classified into 4 Groups.

- Global register (G)
- Local register (L)
- gen register (C)
- Out register (O)

### (ii) Window Size

It is formed by grouping Local, gen and Out register

$$\text{window} = L + 2C + O$$

iii) Global register re-use by all.

iv) Window organize in overlapped manner, one window out is used as in for others.

v) Register File Size :  $[W(L+C) + Cr]$  W: # of windows.

### Module 2

## 15 Computer Components

### i) CPU

### ii) Memory

### iii) I/O

i) CPU : It is processing unit.

It contains 3 things:

#### a) Register

- \* Program Counter (PC)
- \* Instruction Register (IR)
- \* Accumulator (Acc)
- \* Temporary Reg. (TR)
- \* Memory Address Reg (MAR)
- \* Memory Buffer Reg (MBR)
- \* Stack Pointer (SP)
- \* Flag Reg
- \* General Purpose Reg. (GPR)

#### b) ALU

#### c) Control Unit



Micro-Operations : (a) It is fundamental operation in Base hardware. (15)

(b) Those operation which are directly executed on base hardware without further decomposition is micro-operation.

(c) Operations run on base hardware in 1 cycle is microoperation.

(d) For user  $\mu$ -operation is register to register transfer.

(e) Micro instruction is collection of one or more micro-operations, still each Micro instruction take 1 cycle only because all micro-operations are running in parallel.

(f) Control signal required to execute  $\mu$ -operations.

M/C Instn  $\rightarrow$  Register Transfer  $\rightarrow$  Hardware  $\rightarrow$  Microoperation  $\rightarrow$  Microprogram  $\rightarrow$  Control Signal  
Language

\* Microprogram:

(a) It is sequence of operations which is used to perform meaningful ~~task~~ action.

(b) In instruction cycle Common Microprogram is used to execute instruction;

• Instruction Fetch • Instruction Decode (ID) • Operand Fetch (OF) • Write Back (WB) • Microprogram Initialization

\* Instruction Fetch:



$$\begin{aligned} T_1 &: PC \rightarrow MAR; \\ T_2 &: M[\text{MAR}] \rightarrow MBR; \\ T_3 &: PC + \text{Step Size} \rightarrow PC \\ T_4 &: MBR \rightarrow IR \end{aligned}$$

\* Instruction Decode



$$T_1: IR \rightarrow \text{Decoder}$$

\* Operand Fetch



$$\begin{aligned} T_1 &: IR[\text{Addr}] \rightarrow MAR \\ T_2 &: M[\text{MAR}] \rightarrow MBR \\ T_3 &: MBR \rightarrow ACC \end{aligned}$$



$$\begin{aligned} T_1 &: IR[\text{Addr}] \rightarrow MAR \\ T_2 &: M[\text{MAR}] \rightarrow MBR \\ T_3 &: MBR \rightarrow MAR \\ T_4 &: M[\text{MAR}] \rightarrow MBR \\ T_5 &: MBR \rightarrow ACC \end{aligned}$$

## ① Write Back

[STA] 200



T<sub>1</sub>: ACC → MBR

T<sub>2</sub>: JR [Addr] → MAR

T<sub>3</sub>: MBR → M [MAR]

This two statements can be swapped without any effect.

## ② Interrupt SUB Program



T<sub>1</sub>: PC → MBR

T<sub>2</sub>: SP → MAR

T<sub>3</sub>: MBR → M [MAR] { System Bus }

Vector → PC { Local Bus }

## ③ Control Unit

Require:

- (a) How many Control Signal present in Base H/w.
- (b) How many instruction are implemented in Base H/w.
- (c) How many Moperation are require for each instruction.
- (d) What are control signal require for each Moperation.

(A) Hardwired Control Unit: Used in RISC Control unit.

• It is the fastest CU. \* It is used in Real time application.

→ It is expressed in term of SOP. For each Control Signal we find SOP.

• Control Signal perform operation in hardware on the basis of Moperations.

If we have n control signal then we can make  $2^n$  Instruction. We can also take its subset but at max  $2^n$  Operation we can make.

Ex: If ~~we have~~ there is 32 instruction and only 2 control signals then it is not possible because with 2 CS we can make at max 4 instruction.

Instruction → Moperation → CS  
(Ex: +, -, /, \*)

• SOP Form: Control Signal  $\rightarrow$  Moperation, (Instruction) + Moperation (Signal which is using CS) + ...

If CS used by all instruction, we just

write Moperation no.

\* It is by default means

all instruction are using it.



# Microprogrammed Control Unit

- \* In this design microprograms are present in Control Memory.
- \* In it a sequencer unit is present, in which decoded instruction is input and on the basis of instruction it have information about what are M operation require to complete the instruction. So, it generate M operation in Microinstruction address which is present in Memory.



## Types of Microprogrammed Instruction

→ Horizontal Microprogrammed Instruction

\* N-bits are required to represent N control signal

Instruction No:

Microinstruction<sup>1</sup>: 

|   |   |                 |                 |                 |   |
|---|---|-----------------|-----------------|-----------------|---|
| - | - | CS <sub>1</sub> | CS <sub>2</sub> | CS <sub>3</sub> | - |
|---|---|-----------------|-----------------|-----------------|---|

Microinstruction<sup>2</sup>: 

|   |   |                 |                 |                 |   |
|---|---|-----------------|-----------------|-----------------|---|
| - | - | CS <sub>1</sub> | CS <sub>2</sub> | CS <sub>3</sub> | - |
| 0 | 0 | 0               | 1               | -               | - |

\* High degree of parallelism.

\* Faster due to no decoder

\* It is not used.

\* By default Microinstruction Design with 1 control signal.

+ It cause effect in Vertical. If n signal active at a time  $\Rightarrow$   $n \log_2 N$  bits

Requirement

- \* How many Control Signal Present in Base H/w?
- \* How many instruction is present?
- \* How many M operation an instruction require for each instruction?
- \* What Control Signal reqd. for each Microinstruction to execute.

(By default we consider vertical to design vertical Microprogram instruction)

\*  $\lceil \log_2 N \rceil$  bits are required to represent for each N Control Signal.

10 → Signal (4 bit)

Instruction No  
Operation: 

|   |   |    |    |    |   |
|---|---|----|----|----|---|
| - | - | 00 | 01 | 10 | - |
|---|---|----|----|----|---|

at a time 3 active

|                |                |                |
|----------------|----------------|----------------|
| S <sub>1</sub> | S <sub>2</sub> | S <sub>3</sub> |
|----------------|----------------|----------------|

  
4bit 4bit 4bit  
-12bit  
→ 4 MByte

\* Low degree of parallelism.  
\* Slower due to decoder.

\* It uses in CISC computer. i.e. what by default design

design with 1 signal

\* Function Code : Microinstruction ( $\log_2 N$ )

\* By default Microinstruction Design with 1 control signal.

## (16) Performance Analysis

i) Performance  $\propto \frac{1}{\text{Execution}}$

ii) Execution time = No. of second / prog

$$\text{No. of second} = \frac{\text{No. of instruction}}{\text{Prog}} * \frac{\text{No. of cycle required}}{\text{for each instruction}} * \frac{\text{No. of clock}}{\text{cycle}}$$

$$ET = \text{Instruction Count} * \text{Cycle Per Instruction} * \text{Cycle Time}$$

$$ET = IC * CPI * \text{Cycle Time}$$

$$ET = (\sum IC_i * CPI_i) * \text{Cycle Time}$$

iii) Speed UP : It is a factor which is use to measure performance gain.

$$S = \frac{\text{Performance of } X}{\text{Performance of } Y} = \frac{\frac{1}{ET_X}}{\frac{1}{ET_Y}} = \frac{ET_Y}{ET_X}$$

## (17) Amdahl's Law

\* According to Amdahl's, rather than developing new system.

improve the performance of existing system. it divided into two part.

\* Due to enhancement, of current system, it divided into two part.  
Unenhanced part and enhanced part.

\* Overall System performance :  $S_{\text{Overall}} = \frac{ET_{\text{Old}}}{ET_{\text{New}}} = \frac{(\text{Performance of new})}{(\text{Performance of old})}$

\*  $ET_{\text{new}} = ET_{\text{of Unenhanced Part}} + ET_{\text{of Enhanced Part}}$ .

\* Enhance those part which used most.

\*  $ET_{\text{new}}$  depends on (Fraction enhance) i.e how much part of system

enhance and (SpeedUp enhance) i.e how much performance gain by enhancement.

F : enhance Fraction

(1-F) : unenhance Fraction

$$S = \frac{ET_{\text{of Old F}}}{ET_{\text{of New F}}} = \frac{ET_{\text{of Old F}}}{ET_{\text{of New F}} * \frac{100\%}{\text{Speed Up}}}$$

$$S_{\text{Overall}} = \frac{ET_{\text{Old}}}{ET_{\text{of unenhanced}} + ET_{\text{of enhanced}}} = \frac{100\%}{(100\% - F) + \frac{F}{S}} = \frac{1}{(1-F) + \frac{F}{S}} = \frac{1}{((1-F) + \frac{F}{S})^{-1}} = ((1-F) + \frac{F}{S})^{-1}$$

↓ Unenhanced fraction

$$\text{Multiple enhance S}_{\text{Overall}} = \left[ (1 - \sum F_i) + \sum \frac{F_i}{S_i} \right]^{-1}$$

## High Performance CPU Design

① Concurrency is present, means 2 or more process can run at same time.

② Four type of Architecture according to Flynn's:

① Serial ~~Control~~ Stream and serial ~~Data~~ Stream (SISD)

Instruction

② Serial Instruction Stream and Multiple Data Stream (SIMD)

③ Multiple Instruction Stream and Single Data Stream (MISD)

④ Multiple Instruction Stream and Multiple Data Stream (MIMD)

⑤ MISD is never used and not implement.

⑥ SIMD & MIMD need additional hardware, so it is by default support concurrency.

⑦ SISD is having no extra hardware support, so with help of pipeline it is going to have concurrency.



## 14 Pipelining

~~Parallel Pipeline~~ It is hypothetical pipeline concept.

• It is successful pipeline

• It works for fixed length instruction

### ① Basic Pipeline:

① It is consider to have 4 stages i.e IF, ID, Ex and WB.

IF works on memory.

ID works on decoder

Ex works on ALU

WB works on memory or register.

② In Non-Pipeline each instruction execute serially. one after another, so in each cycle only one operation of one instruction is present.

### ③ With Pipeline:



## (d) Functions of Pipeline

- i) It decompose problem into independent subproblem.
- ii) It then run all those independent subproblem on independent hardware.
- iii) It provide Overlapping Execution, means it insert new input before the completion of old input.  
This way new input always inserted in each clock cycle, so CPI for pipeline is 1.

### Space-time diagram of Pipeline



## (e) Delays in pipeline:

- i) If all stages have uniform delay.  
Cycle time ( $t_p$ ) = stage delay  
$$\text{cycle time } (t_p) = \text{stage delay}$$
- ii) If all stages have non-uniform delay.  
$$\text{cycle time } (t_p) = \text{Max}(\text{Stage Delay})$$
- iii) If buffer delay present in pipeline  
$$\text{cycle time } (t_p) = \text{Max}(\text{Stage Delay} + \text{Buffer delay})$$
- iv) If setup time or skew time present in pipeline  
$$\text{cycle time } (t_p) = t_p + \text{overhead}$$

$$ET_{\text{Pipeline}} = (K + (n-1)) \cdot t_p \quad | \quad ET_{\text{Nonpipeline}} = n \cdot t_n \quad \text{So, } S(\text{Performance gain}) = \frac{\text{Performance of Pipe}}{\text{Performance of nonpipe}}$$

$$S = \frac{ET_{\text{Nonpipeline}}}{ET_{\text{Pipeline}}} \Rightarrow \left[ S = \frac{n \cdot t_n}{(K + (n-1)) \cdot t_p} \right] \text{ when, } n \text{ is finite and } CPI \neq 1, \quad \left[ S = \frac{t_n}{t_p} \right] \text{ when } n \rightarrow \infty \quad CPI = 1,$$

Perfectly balanced pipeline:  $t_n = K \cdot \text{cycle} \Rightarrow t_n = K \cdot t_p$ , so  $S = K$

Efficiency  $(\eta) = S/K$ , and Throughput  $= n/(K + (n-1)) t_p$

## (f) Types of Pipeline:

- Linear Pipeline:
  - It is synchronized.
  - Its latency is 2 (difference b/w two successive initiation in pipeline)
  - It contains pipeline in ~~and~~ only forward completion, pipeline
- Non-linear Pipeline:
  - It is asynchronous.
  - Its latency depends on reservation table.
  - Reservation table helps in execution order.

|    | 1 | 2 | 3 | 4 | 5 | 6 |
|----|---|---|---|---|---|---|
| s1 | x |   |   |   |   |   |
| s2 |   | x |   |   |   |   |
| s3 |   |   | x | x |   |   |
| s4 |   |   |   |   | x |   |

$s_1 + s_2 + s_3 + s_4$

# RISC Pipelining

- i) To analyze what are the problems associated with pipeline, we consider a successful pipeline implementation i.e RISC pipeline.

## Instruction Set of RISC

### (i) Data Transfer Instruction

- \* It uses Load and Store
- \* It uses indexed addressing mode to access memory.  
ex: Load  $n_0, 3(n_1)$   
Store  $3(n_1), n_0$

### (ii) Data Manipulation Instruction

- \* In it ALU operation perform only on register data.
- \* ex: Add  $n_0, n_1, n_2$

### (iii) Transfer of Control Instruction

- \* Unconditional TOC
- \* Conditional TOC: RISC has its own instruction for it.  
ex: DJNZ  $n_0, 2000$

### (c) RISC uses 5 Stage Pipeline:

- i) Instruction Fetch (Memory) {Along with Fetch PC increment for next stage due to fixed length instruction}
- ii) Instruction Decode (Decoder) {Branch Fetch & Branch Instruction}
- iii) Execute Instruction (ALU)
- iv) Memory Reference (Mem)
- v) Register Access (Write Back)



- ② \* Dependency is major problem in pipeline.  
 \* Dependency cause extracycle and in these extracycle no new instruction is loaded this is also known as Stall cycle.  
 \* When stall cycle present then  $CPI \neq 1$ .

### (e) Types of Dependencies:

#### (i) Structural Dependencies

- \* It occurs due to resource conflict in pipeline.
- \* Resource Conflict is when two or more instruction try to use same memory or function or any functional unit.

- \* In RISC memory resource conflict is present between SR Fetch and ALU Memory as well as in Stage 4 we also access memory.
  - \* To solve memory conflict, memory divided into 2 parts Code Memory and Data Memory, so in Fetch time Code memory is access and Stage 4 data memory is access.
  - \* If we not divide memory in 2 parts, all the instructions have to wait before accessing memory, it should be free, that wait cause stalls.
  - \* Dividing memory is also called as Precycling Mechanism.
  - \* When pipeline full always 4 instruction overlaps.
- Problem  $\xrightarrow{\text{value}} \text{STALLS} \xrightarrow{\text{optimize}} \text{Final Solution}$   
(Structural Hazard)

### (ii) Data Dependency:

- \* It occurs when a instruction  $i$  refers to result dependent data before instruction  $i+1$  writes it. It造成 unsafe condition.
- \* e.g.  $\begin{array}{l} S1_4 = P1 + P2 \\ S1_3 = S1_4 + P2 \end{array} \quad \begin{array}{l} \text{Serial (non-pipeline)} : \text{no problem} \\ \text{Pipeline} : \text{Data dependency problem} \\ (\text{I read data at same time}) \\ \text{when I executing} \end{array}$

| TOMSOL ALG (Algorithm) |           |        |        |           |           |        | Processor latency is zero because<br>of true data dependency. |       |
|------------------------|-----------|--------|--------|-----------|-----------|--------|---------------------------------------------------------------|-------|
| Stage                  | Operation | Opnd 1 | Opnd 2 | Dependent | Dependent | Status | Allocation                                                    | Stall |
| 1                      | ADD       | S1_0   | S1_1   | -         | -         | 0      | $S_0 = \text{value}_0$<br>$S_1 = \text{value}_1$              |       |

- \* With help of this algorithm data dependency problem solved, but it causes (Race hazard or Data Hazard).

- \* To solve (Race on Data Hazard) hardware technique is use i.e. Optimised Forwarding. Big grant circuiting for RSB Buffer is use.

### (iii) Control Dependency: It occurs when unwanted instruction is executes in pipeline.

- \* To solve this problem if branch operation is used, it inserts NOP (no operation) instruction in place of unwanted instruction. (Control Hazard)
- \* If unwanted instruction depends on, on which stage branch statement is handle. For RISC Branch handled on second stage, so unwanted instruction is always 1. It is also known as penalties (no of unwanted instn)
- \* Branch stalls on NOP =  $(\text{Branch Frequency}) * (\text{Branch Penalties})$
- \* Branch prediction: If this technique is used then branch target available at first stage.
- \* Software technique (Combination):
  - ① Code reordering (Safe Sequence is our target not order)
  - ② NOP substitution

## Instruction Scheduling

- (i) Inorder execution: If any instruction is dependent then rest of instruction also share stalls.
- (ii) If Decoding forwarding not support, then run independent instructions first, but it cause problem known as Out of Order execution.
- True data dependency (RBW) { read before write } (adjacent instn)
- (a) ANTI Data Dependency (WRB) { write before read }
- (b) Output Data Dependency (WBW) { write before write }
- (iii) Hazards:
- (a) Read Before Write → Problem  
→ Solution → Hazard
  - (b) Write Before Read → Problem  
→ Solution → Hazard
  - (c) Write Before Write → Problem  
→ Solution → Hazard
- Annotations:  
 I: lock and go  
 adjacent instruction  
 OI: O  
 JO: J  
 J: J  
 } True Data Loss  
 } ANTI Data Loss  
 } Output Data Loss

## Performance Analysis:

$$S = \frac{CPI_{\text{Non-pipe}} * \text{Cycle Time}}{(1 + \# \text{stalls} / \# \text{instn}) * \text{Cycle time}}$$

$$ET_{avg} = \frac{(1 + \# \text{stall} / 4n)}{\# \text{cycle time}}$$

When perfectly balanced: ( $T_n = K \cdot t_p$ )

$$S = \frac{K}{1 + \# \text{stall} / \# \text{instn}}$$

when,  $\eta = 100\%$ .

$$S = \frac{K}{1+0} = [S = K]$$

$$\# \text{line in CM} = \frac{\text{CM size}}{\text{Block size}}$$

$$\# \text{line in MM} = \frac{\text{MM size}}{\text{Block size}}$$

## Memory Organization (Module 3)

\* Sequential Memory Organization  
\* Hierarchical Memory Organization

### ① Hit ratio: ( $\# \text{Hits} / \text{Total} \# \text{of access}$ )

### ② Cache Memory



Tag Space: Extra memory to store tag information.

Tag Directory Size = # lines in CM \* tag space in line

Data Mem Size = # lines in CM \* Block size

CM size = Tag Directory Size + Data Mem Size

(b) Status of Cache = column content of Cache  
 Status of Line = row content of Cache.

### (c) Mapping

Main Mem:



Cache Mem:



\* original Tag Space is empty.

\* Tag space is used to identify whether block is present or not.

### (d) Direct Cache Mapping ( $K \bmod N = i$ )

\* Compulsory Miss  $\rightarrow$  First time access

\* Conflict Miss  $\rightarrow$  suppose to present but not found

\* Direct Mapping, there is chance of thrashing, due to  $B$  bits

\* more no. of conflict misses

\* Sequential Mem access:

$$T_{avg} = H_1 T_1 + (1-H_1) H_2 T_2 + \dots + (1-H_1)(1-H_2) \dots H_n T_n$$



MM:

CM:

Tag Space:  $b - C$  bits

$$C = \log_2 \left( \frac{CMSIZE}{B \text{ loc size}} \right)$$

$a = \text{word offset}$   
 $b = \text{tag bit}$

\* Hierarchical Memory access:



$$T_{avg} = H_1 T_1 + (1-H_1) H_2 (T_2 + T_1) + (1-H_1)(1-H_2) H_3 (T_3 + T_2 + T_1) + \dots + (1-H_1)(1-H_2) \dots H_n (T_n + T_{n-1} + T_{n-2} + \dots + T_1)$$

### (e) Associative Cache

\* NO fixed binding like Direct Map.

\* it is called as content addressable memory

\* it can be map to any cache block in sequence starting from top.

Address binding



Directory Size = # lines in CM \* tag space



\* In this cache, replacement algorithms are used because of no fixed binding. Replacement algo (LRU, FIFO).

$B$  bits  
 [---]  
 memory location

MM:

CM:



? Main operation performed on it

→ illusion

## Set associative Cache

\* In this cache lines are grouped into set and accommodate multiple blocks in set. (25)

$$\# \text{ Sets}(S) = \frac{N}{P\text{-way}} ; N \Rightarrow \text{No. of lines} \\ P \Rightarrow \text{No. of line in one set.}$$



where  
 $S = \text{No. of rows}$   
 $P = \frac{\text{No. of columns}}{\text{How many blocks can accommodate in one row}}$



\* Mod function is used to map the block with set after mapping with set. block can accommodate any line in sequence.

$$K \bmod S = i$$

\* Replacement algorithm used inside set.

\* Replacement algorithm used inside set.

\* This cache design is combination of Direct and Associative mapping.



$$\text{Tag Director's size} = \# \text{ lines} * \text{Tag space} \\ = N * (b-d) \text{ bits}$$

$$\text{or} \\ = (S \times P) * (b-d) \text{ bits}$$

$$\text{where: } d = \log_2 \left( \frac{N}{P \text{ way}} \right) \\ \text{No. of sets}$$

$$N = S \times P$$

$$1 < P < N$$

$$\text{P-way : } P = 1$$

$$P = N$$

→ Work like Direct Mapping

→ Work like associative mapping.

## (ii) Updating Techniques

- \* When CPU tries to read from memory location or write on memory location and if that block present in Cache, it is known as Read Hit and Write Hit.
- \* When block not present in Cache, it is known as Read Miss and Write Miss.  
And when Memory copy from Main to Cache for read and when Memory copy from Cache to Main for write purpose is known as Read allocate and for write purpose known as Write allocate.
- \* Updating technique required because of inconsistency in data present in Cache as well as in main memory.
- \* Two policies are used for update technique, i.e.
  - \* Write-Through protocol
  - \* Write-back protocol

- \* Write-Through Protocol
  - CPU performs simultaneous write operation on both Cache and MM.
  - Inclusion always succeeds.
  - Simultaneous Write time =  $\text{Max} \left( \frac{\text{word update time in cache}}{\text{time in cache}}, \frac{\text{word update time in MM}}{\text{time in MM}} \right)$

Read:

$$T_{avg\ read} = H_n T_c + (1-H_n) \left[ T_m + T_c \right]$$

↓  
if present  
in Cache.  
 $T_c$ : read time

read  
allocate  
time of time required to transfer data  
from MM to CM }

Write:

$$T_{avg\ write} = H_n T_w + (1-H_n) \left[ T_m + T_w \right]$$

↓  
if present  
in Cache

main memory

Average mem access time:

$$T_{avg} = \left( \text{freq}_{read} * T_{avg\ read} \right) + \left( \text{freq}_{write} * T_{avg\ write} \right)$$

When simultaneous memory organization is there then (27)  
 Hit ratio for write is always 1 [Hw = 1]

$$\text{So, } T_{avg\ write} = \frac{T_n}{MM\ access\ time}$$

$$T_{avg\ read} = H_1 T_1 + (1-H_1) \cdot H_2 \cdot T_2 + \dots$$

→ If Question contain : CM with Unit-Through Policy  
 if  $H_w = 1$ , then use Simultaneous Mem. Organ.  
 $H_w \neq 1$ , then use Hierarchical Mem Organ.

\* Write Back Protocol  
 → In this method CPU maintains one extra bit i.e. Update bit  
 If it is set, this means modification done on ~~read~~ Cache Block  
 and if it is 0, then no modification done.

→ So, when CPU going to replace a block, it first checks its  
 Update bit, if it is 1, then CPU first writes it back to  
 Main memory then replace it.  
 update bit → 1 (Dirty bit) } % chance that it is dirty bit } × D  
 update bit → 0 (Clean bit) } % chance that it is clean bit } × C

Read Time:

$$T_{avg\ read} = H_R T_C + (1-H_R) \left\{ \%D * \left( T_M + T_M + T_C \right) + \%C * (T_M + T_C) \right\}$$

↑  
main mem access time  
↓  
write back time      read time  
allocate time

Write Time

$$T_{avg\ write} = H_w T_C + (1-H_w) \left\{ \%D * (T_M + T_M + T_C) + \%C * (T_M + T_C) \right\}$$

Average time

$$T_{avg\ wB} = (freq_{read} \times T_{avg\ read}) + (freq_{write} \times T_{avg\ write})$$

\* If hit ratio of write is not given then we consider it as 1.

## IV Multilevel Cache

④ Miss Penalty: Time required to transfer data from higher level to lower level. When there is no miss in lower level.

$$\textcircled{b} \quad L_1 \text{ CM size} < L_2 \text{ CM size} < L_3 \dots$$

$$L_1 \text{ CM access time} < L_2 \text{ CM access time} < \dots$$

⑤ Two kind of Miss rate:  $= (\# \text{ misses in cache} / \text{total CPU reference})$

\* Global Miss Rate

$$* \text{ Local Miss Rate} = (\# \text{ misses in cache} / \text{Total access to cache})$$

$$T_{avg} = \frac{\text{Hit ratio}^*}{\text{Hit Time } L_1} + (\text{Miss Rate } L_1 * \text{Miss Penalty of } L_1)$$

$$\text{Miss Penalty } L_1 = \frac{\text{Hit ratio}^*}{\text{Hit Time } L_2} + (\text{Miss Rate } L_2 * \text{Miss Penalty of } L_2)$$

$$\text{Miss Penalty } L_2 = \text{Main Memory access time.}$$

$$\begin{aligned} \text{Average mem stall per instruction} &= \\ &= (\# \text{ of misses in } L_1 / \text{ref} * \text{hit time } L_2) + (\# \text{ of misses in } L_2 / \text{ref} * \text{miss penalty of } L_2) \end{aligned}$$

## V Types of Misses

- \* Compulsory Miss
- \* Capacity Miss
- \* Conflict MISS.

⑥ Cache Inclusion: If there are multilevel cache, ~~then~~ if content of higher level cache also present in lower level cache then ~~the~~ cache is said to be inclusive. whatever changes occur in lower level cache must reflect in higher level cache so for this both cache must use write-through policy.

# I/O Organization (Module 4)

29

## ① interrupt driven I/O.

## ⑪ I/O Modes

## ④ Programmed I/O

ex: Bunking System

- \* I/O operations are programmed in CPU.
  - \* CPU waiting time. x Depends on speed of I/O device.

## ⑥ Interrupt Driver I/O

- ↳ Depends on latency of interface.



### ⑥ Direct Memory Access (DMA)

- ④ Direct Memory Access (DMA)  
\* Bulk amount of data transfer from I/O to MM, without involvement of CPU.

## Access Sequence:

- Access Sequence:

  - \* CPU initializes DMA module along with I/O command.
  - \* DMA Control logic interprets the command and enable I/O operation.
  - \* Based on speed of I/O device, it consume the time to prepare data, after that it enable DMAREQ signal to DMA.
  - \* DMA after receiving DMAREQ, it enable HOLD signal to CPU, then CPU generate HOLDA to DMA and DMA after receiving HOLDA signal, it get control of DMA after receiving HOLDA signal, it generate DMAREQA



System BUS and after that DMA goes to device, after which data transfers from I/O to MM, to device, after which data transfers from I/O to MM, and continue till COUNT REG value become 0. After COUNT REG become 0, DMA cycle completed and link re-establish b/w CPU & System BUS.

\* If DMA CPU present in 2 states:

→ BUSy State: In this state I/O device prepare data.

Let, X "Preparation time" & Y "Transfer time".

$$\% \text{ of time CPU Busy} = \left( \frac{X}{X+Y} \right) \times 100\%$$

→ Blocked State: In this state Data transfer take place CPU not have access to system Bus.

$$\% \text{ of time CPU Block} = \left( \frac{Y}{X+Y} \right) \times 100\%$$

## \* Modes of DMA

→ Burst Mode (Waiting time more)

Takes SB → Bulk Transfer → Return SB

→ Cycle Stealing Mode (Waiting time less than BM)

Takes SB → Small Data unit transfer → Return SB

→ Interleaving Mode (Efficient Utilization)

Use SB when CPU Busy with local operation.

Instruction cycle: IF (Mem) IO (Mem) OF (Mem) PD (Mem) WB (Mem)



# Secondary Storage



→ TRACKS (White Space)

Inner track (Blue line)  
Inter tracks  
Distance



{ Track 0 of every surface of every platter completely form a cylinder }

$$\text{Rotational Time} = \frac{\text{RT}}{2}$$



$$\# \text{ tracks on surface} = \frac{\text{Storage area}}{\text{Inter track Distance}}$$

Track width mention

$$\# \text{track} = \# \text{track} / \text{track width}$$

$$\# \text{of cylinder on disk} = \# \text{tracks}$$

Total time req to access data from DISK = Access time + Average Rotational lat.

$$+ \text{Transfer time} + \text{Over head}$$

$$\text{Track Capacity} = \# \text{Sector/track} \times \text{Sector Capacity}$$

$$\text{Cylinder Capacity} = \# \text{Surface} \times \text{Track Capacity}$$

$$\text{Disk Capacity} = \# \text{Cylinder} \times \text{Cylinder capacity}$$

$$\text{Disk capacity} = \# \text{Surface} \times \# \text{track/surface} \times \# \text{Sector/track} \times \text{Sector capacity}$$

$$\# \text{Surface} = 2 \times \# \text{of Platter}$$

$$\text{Sector \#} = (i, j, k) \rightarrow \begin{matrix} \text{Sector} \\ \downarrow \\ \text{cylinder} \end{matrix} \begin{matrix} \text{Surface} \\ \downarrow \\ \text{Surface No.} \end{matrix}$$



### 30 Spatial Locality

- i) It means adjacent data in block access in sequence.
- ii) When array is present then there is chance of spatial locality

| Storage Sequence | Accessing | Spatial Locality |
|------------------|-----------|------------------|
| Row-wise         | Row-wise  | Yes (50% miss)   |
| Col-wise         | Col-wise  | Yes (50% miss)   |

$$\text{# of Miss / Row} = \text{No. of Blocks / Row}$$

$$\text{# of Miss / Col} = \text{No. of Blocks / Col}$$

ex: Element size =  $AB$        $\{ \text{Row-wise storing and rowwise access} \}$

Block size =  $2B$

# Element / Block = 2

Row size = 4

# Block / Row =  $4/2 = 2 \text{ Block}$

Total # Miss / Row =  $\# \text{Row} \times \# \text{Block / Row}$

=  $4 \times 2$

= 8, Miss (50%)

$\{ \text{Row wise storage and column wise access} \}$

Element size =  $AB$

Block size =  $2B$

# Element / B = 2

Column size = 4

Row size = 4

# Block / Row =  $4/2 = 2 \text{ Block}$

But access is col. so,

# Block / Row = 4

Total # Miss / Col =  $\# \text{Col} \times \# \text{Block / Col}$

=  $4 \times 4$

= 16 Miss