

# COMPUTER ORGANISATION

$1024$  bits ( $2^{10}$  bits) :  $1\text{ Kb}$

$2^{20}$  bits :  $1\text{ Mb}$

$2^{30}$  bits :  $1\text{ Gb}$

$2^{40}$  bits :  $1\text{ Tb}$

$4\text{ bits} = 1\text{ nibble}$ ,  $8\text{ bits} = 1\text{ byte}$

\* Reasons for memory hierarchy:

[Access time  $\downarrow$ , cost  $\uparrow$ ] (Trade off)

→ Accessing from Semiconductor mem.

is fast (Cache/MM). (But cost  $\uparrow$ )



\* Memory Hierarchy:

Time



- Access Time incr  $\uparrow$ .
- Size incr  $\uparrow$
- Cost decrease  $\downarrow$
- Frequency dec  $\downarrow$

## \* MAIN MEMORY

→ Can word  $\Rightarrow$  organized in fixed  $\Rightarrow$  RAM.

→ Rows

$2^n$

→ 1 word

→ Address [n bits]  $\Rightarrow$  [Word]

[Word on stud]

[Eq]

$\rightarrow$  Pg 13 [Q2]

$\rightarrow$  Pg 14 [Q2]

$\rightarrow$  Pg 16

$\rightarrow$  Pg 17

$\rightarrow$  Pg 18

$\rightarrow$  Pg 19

$\rightarrow$  Pg 20

$\rightarrow$  Pg 21

$\rightarrow$  Pg 22

$\rightarrow$  Pg 23

$\rightarrow$  Pg 24

$\rightarrow$  Pg 25

$\rightarrow$  Pg 26

$\rightarrow$  Pg 27

$\rightarrow$  Pg 28

$\rightarrow$  Pg 29

$\rightarrow$  Pg 30

$\rightarrow$  Pg 31

$\rightarrow$  Pg 32

$\rightarrow$  Pg 33

$\rightarrow$  Pg 34

$\rightarrow$  Pg 35

$\rightarrow$  Pg 36

$\rightarrow$  Pg 37

$\rightarrow$  Pg 38

$\rightarrow$  Pg 39

$\rightarrow$  Pg 40

$\rightarrow$  Pg 41

$\rightarrow$  Pg 42

$\rightarrow$  Pg 43

$\rightarrow$  Pg 44

$\rightarrow$  Pg 45

$\rightarrow$  Pg 46

$\rightarrow$  Pg 47

$\rightarrow$  Pg 48

$\rightarrow$  Pg 49

$\rightarrow$  Pg 50

$\rightarrow$  Pg 51

$\rightarrow$  Pg 52

$\rightarrow$  Pg 53

$\rightarrow$  Pg 54

$\rightarrow$  Pg 55

$\rightarrow$  Pg 56

$\rightarrow$  Pg 57

$\rightarrow$  Pg 58

$\rightarrow$  Pg 59

$\rightarrow$  Pg 60

$\rightarrow$  Pg 61

$\rightarrow$  Pg 62

$\rightarrow$  Pg 63

$\rightarrow$  Pg 64

$\rightarrow$  Pg 65

$\rightarrow$  Pg 66

$\rightarrow$  Pg 67

$\rightarrow$  Pg 68

$\rightarrow$  Pg 69

$\rightarrow$  Pg 70

$\rightarrow$  Pg 71

$\rightarrow$  Pg 72

$\rightarrow$  Pg 73

$\rightarrow$  Pg 74

$\rightarrow$  Pg 75

$\rightarrow$  Pg 76

$\rightarrow$  Pg 77

$\rightarrow$  Pg 78

$\rightarrow$  Pg 79

$\rightarrow$  Pg 80

$\rightarrow$  Pg 81

$\rightarrow$  Pg 82

$\rightarrow$  Pg 83

$\rightarrow$  Pg 84

$\rightarrow$  Pg 85

$\rightarrow$  Pg 86

$\rightarrow$  Pg 87

$\rightarrow$  Pg 88

$\rightarrow$  Pg 89

$\rightarrow$  Pg 90

$\rightarrow$  Pg 91

$\rightarrow$  Pg 92

$\rightarrow$  Pg 93

$\rightarrow$  Pg 94

$\rightarrow$  Pg 95

$\rightarrow$  Pg 96

$\rightarrow$  Pg 97

$\rightarrow$  Pg 98

$\rightarrow$  Pg 99

$\rightarrow$  Pg 100

$\rightarrow$  Pg 101

$\rightarrow$  Pg 102

$\rightarrow$  Pg 103

$\rightarrow$  Pg 104

$\rightarrow$  Pg 105

$\rightarrow$  Pg 106

$\rightarrow$  Pg 107

$\rightarrow$  Pg 108

$\rightarrow$  Pg 109

$\rightarrow$  Pg 110

$\rightarrow$  Pg 111

$\rightarrow$  Pg 112

$\rightarrow$  Pg 113

$\rightarrow$  Pg 114

$\rightarrow$  Pg 115

$\rightarrow$  Pg 116

$\rightarrow$  Pg 117

$\rightarrow$  Pg 118

$\rightarrow$  Pg 119

$\rightarrow$  Pg 120

$\rightarrow$  Pg 121

$\rightarrow$  Pg 122

$\rightarrow$  Pg 123

$\rightarrow$  Pg 124

$\rightarrow$  Pg 125

$\rightarrow$  Pg 126

$\rightarrow$  Pg 127

$\rightarrow$  Pg 128

$\rightarrow$  Pg 129

$\rightarrow$  Pg 130

$\rightarrow$  Pg 131

$\rightarrow$  Pg 132

$\rightarrow$  Pg 133

$\rightarrow$  Pg 134

$\rightarrow$  Pg 135

$\rightarrow$  Pg 136

$\rightarrow$  Pg 137

$\rightarrow$  Pg 138

$\rightarrow$  Pg 139

$\rightarrow$  Pg 140

$\rightarrow$  Pg 141

$\rightarrow$  Pg 142

$\rightarrow$  Pg 143

$\rightarrow$  Pg 144

$\rightarrow$  Pg 145

$\rightarrow$  Pg 146

$\rightarrow$  Pg 147

$\rightarrow$  Pg 148

$\rightarrow$  Pg 149

$\rightarrow$  Pg 150

$\rightarrow$  Pg 151

$\rightarrow$  Pg 152

$\rightarrow$  Pg 153

$\rightarrow$  Pg 154

$\rightarrow$  Pg 155

$\rightarrow$  Pg 156

$\rightarrow$  Pg 157

$\rightarrow$  Pg 158

$\rightarrow$  Pg 159

$\rightarrow$  Pg 160

$\rightarrow$  Pg 161

$\rightarrow$  Pg 162

$\rightarrow$  Pg 163

$\rightarrow$  Pg 164

$\rightarrow$  Pg 165

$\rightarrow$  Pg 166

$\rightarrow$  Pg 167

$\rightarrow$  Pg 168

$\rightarrow$  Pg 169

$\rightarrow$  Pg 170

$\rightarrow$  Pg 171

$\rightarrow$  Pg 172

$\rightarrow$  Pg 173

$\rightarrow$  Pg 174

$\rightarrow$  Pg 175

$\rightarrow$  Pg 176

$\rightarrow$  Pg 177

$\rightarrow$  Pg 178

$\rightarrow$  Pg 179

$\rightarrow$  Pg 180

$\rightarrow$  Pg 181

$\rightarrow$  Pg 182

$\rightarrow$  Pg 183

$\rightarrow$  Pg 184

$\rightarrow$  Pg 185

$\rightarrow$  Pg 186

$\rightarrow$  Pg 187

$\rightarrow$  Pg 188

$\rightarrow$  Pg 189

$\rightarrow$  Pg 190

$\rightarrow$  Pg 191

$\rightarrow$  Pg 192

$\rightarrow$  Pg 193

$\rightarrow$  Pg 194

$\rightarrow$  Pg 195

$\rightarrow$  Pg 196

$\rightarrow$  Pg 197

$\rightarrow$  Pg 198

$\rightarrow$  Pg 199

$\rightarrow$  Pg 200

$\rightarrow$  Pg 201

$\rightarrow$  Pg 202

$\rightarrow$  Pg 203

$\rightarrow$  Pg 204

$\rightarrow$  Pg 205

$\rightarrow$  Pg 206

$\rightarrow$  Pg 207

$\rightarrow$  Pg 208

$\rightarrow$  Pg 209

$\rightarrow$  Pg 210

$\rightarrow$  Pg 211

$\rightarrow$  Pg 212

$\rightarrow$  Pg 213

$\rightarrow$  Pg 214

$\rightarrow$  Pg 215

$\rightarrow$  Pg 216

$\rightarrow$  Pg 217

$\rightarrow$  Pg 218

$\rightarrow$  Pg 219

$\rightarrow$  Pg 220

$\rightarrow$  Pg 221

$\rightarrow$  Pg 222

$\rightarrow$  Pg 223

$\rightarrow$  Pg 224

$\rightarrow$  Pg 225

$\rightarrow$  Pg 226

$\rightarrow$  Pg 227

$\rightarrow$  Pg 228

$\rightarrow$  Pg 229

$\rightarrow$  Pg 230

$\rightarrow$  Pg 231

$\rightarrow$  Pg 232

$\rightarrow$  Pg 233

$\rightarrow$  Pg 234

$\rightarrow$  Pg 235

$\rightarrow$  Pg 236

$\rightarrow$  Pg 237

$\rightarrow$  Pg 238

$\rightarrow$  Pg 239

$\rightarrow$  Pg 240

$\rightarrow$  Pg 241

$\rightarrow$  Pg 242

$\rightarrow$  Pg 243

$\rightarrow$  Pg 244</

## 3 Level Memory System

(~~imp~~)

(~~imp~~)

(~~imp~~)

(~~imp~~)



(~~imp~~)

(~~imp~~)

(~~imp~~)

$$\text{Total Access Time} = H_1 T_1 + (1 - H_1) H_2 T_2 + (1 - H_1)(1 - H_2) T_3$$



$$\text{Total Access Time} = H_1 T_1 + (1 - H_1) H_2 T_2 + (1 - H_1)(1 - H_2) (T_1 + T_2 + T_3)$$

(~~imp~~) → PQ 23 [with/without L1]

NOTE: Hit ratio doesn't influence individual memory access time.

Access time: fixed during manufacturing.

(~~imp~~)

\* Speed up = Time without cache - Time with cache

(~~imp~~) → PQ 27 [Find Speed up]

(~~imp~~) → PQ 28/29 [Bus Transfer time] (Given Bus Size) (~~imp~~)

(~~imp~~) → PQ 29 [Bus Transfer time] (Given Bus Size) (~~imp~~)

(~~imp~~)

(~~imp~~)

## \* CACHE MEMORY

- small & fast (SRAM)

- frequently used data & instructions are present in cache to minimize Tavg.

copy out when



Processor

$$HR = \frac{\text{no. of hits}}{\text{no. of hits} + \text{no. of misses}}$$

(~~imp~~)

\* Cache works on LOR (Load Or Replace).



(~~imp~~)

Black size ↑ HR ↑





Pg 61 Microprocessor 8086  
64 KB cache with word size = 32 bits  
Block Size = 16 words  
Word Addressable MM → Word Addressable [imp]

Size of PA Space =  $2^P$  bytes  
Word Length =  $2^W$  bytes  
No of words =  $2^{P-W}$  :  $(P-W)$  bit PA

| Tag | Sgt | Word | M.        |
|-----|-----|------|-----------|
|     |     |      | $P = n/k$ |

## \* Delay of Tag Comparator

Delay of Tag comparator = Settling time / Hit latency / Hit delay

$\Rightarrow$  A 2 way SAM with  $2 \times 1$  mux.

→ Pg 56 [DM cache & a 2 way SAM cache] ↗ F(wr)

## \* Cache Replacement Policies

- Aims to minimize miss penalty for future references. **imp**
  - [FIFO, LRU, LFU]
  - No replacement strategy for DM

| Mus        | DM | TEAM | SAM |
|------------|----|------|-----|
| Compulsory | ✓  | ✓    | ✓   |
| Conflict   | ✓  | X    | ✓   |
| Capacity   | ✓  | ✓    | ✓   |

→ Pg 69 [With the mixer for each report] ✓

$\rightarrow P_0 \neq 0$  [Associative Mapping]

Pg 71 [4 way Set Associative Cache] ~~4~~

$$HR = \frac{\text{no. of hits}}{\text{Total reference}}$$

In AM  $\rightarrow$  no Conflict Misses

## ~~\* CACHE, MEMORY & ARRAY~~

~~return A = C~~ to find root -  
~~start~~ true if active & no more -

## Memory Structure of Arrays

## Cache Structure

AF(2) A(1) A(2)

|      |      |      |      |     |       |
|------|------|------|------|-----|-------|
| A(0) | A(1) | A(2) | A(3) | ... | A(99) |
|------|------|------|------|-----|-------|

*Leptothrix* • *Leptothrix*

~~• Yet to go on week 2 most done~~

- eg → Pg 84 [Both Qs] ✓
- \* 2D Array  $A[10][10]$
- Array Structure:  $00|01|02|03|04|05|06|07|08|09|10|11\dots$
  - Memory Structure (1 block brought) (p. 97)
  - Access array Pg 87
- eg → Pg 87
- eg → Pg 91 [Vimp Q] ✓

## \* SECONDARY STORAGE

### (i) Magnetic Disk

- Very thin circular metal plate
- Basic unit of MD = 1 sector
- Same no. of Sectors in each track.



Storage capacity is constant

- \* Storage density:  $\rho \uparrow$  (outer to inner)
- $\rho = \text{No. of Bytes/cm}^2$
- \* Data transfer time:  $D = \text{No. of Bytes/sec}$
- \* Seek time: time to move r/w head to desired track no. ( $t_s$ )
- \* Rotational latency: time to move r/w head to desired sector beginning ( $t_r$ )
- $t_r = 1/2 * \text{Rotation time}$
- \* Access time of disk (Avg):  $t_{avg} = t_s + t_r + t_d + t_{disk seek}$

→ Pg 97 [Multi-disk organisation]

### \* Types of Magnetic disks:

- Single Sided / double Sided disks: [5 disk → 10 surfaces]
  - fixed r/w Head / movable r/w Head
- 1 r/w Head per surface [1 rotation covers 1 track]  $n$  Tracks,  $n = \text{surfaces}$   $\uparrow$  Hardware





\* No. of cylinders = No. of Tracks (imp)

eg Pg 99 [Q1] [Solve all Qs] X

\* 1 rotation covers 1 track  
[data transfer rate = no. of bytes / s]

⇒ Fixed r/W Head = no. of \* movable transfer rate (imp)

\*  $w$  = width of storage

$r$  = width of track

$y$  = Gap

→ No. of tracks =  $w/r + y$  (i)

→ If no Gap =  $w/r$  (ii)

→ If only Gap =  $w/r + y$  (iii)

eg Pg 104 (imp)

\* Max Storage density = 2000 bits/cm.

[Refer to circumference of innermost track]



\*  $\langle c, k, s \rangle$  (imp)

- ↳ No. of sectors / track
- ↳ No. of track surfaces / cylinder
- ↳ No. of cylinders (i)

eg Pg 106 [ $\langle 400, 16, 29 \rangle$  corresponds to which sector number] X

### \* Data Organization on Disk

Surface:  $\langle$  Surface no., track no., sector no.  $\rangle$

Cylinder:  $\langle$  cylinder no., track surface no., sector no.  $\rangle$  (i)

eg Pg 110 (i) [rotation latency & data transfer Rate] (imp)

eg Pg 113 [Avg. Access time] (imp)

$$T_{avg} = \frac{1}{2} (t_{lat} + t_{tr} + t_{OH} + t_{dt})$$

time in Pow(10)

### \* Disk Structures

number of bits in minimum sector (i)

Constant Angular Velocity (i)

[Same Track Capacity] (i)

[f varies] (i)

[low Access time] (i)

Constant Linear Velocity (i)

[f constant] (i)

[Track capacity varies with  $r$ ] (i)

[More Access time] (i)

## \* I/O INTERFACING



### → Data Transfer modes :

- (i) Programmed i/o / programmed control
  - i/o device don't have direct access to mem
  - CPU involved for:

device status : status of transfer  
polling : i/o initialization

data transfer : i/o initialization  
- insufficient.

### (ii) Interrupt driven i/o

- when device ready, sends interrupt
- CPU involved for:

i/o initialization : status of transfer.

data transfer : i/o initialization

→ 1 with less overhead

### (iii) DMA Transfer

- large volumes of data transfer.

- processor relieved completely

→ only i/o initial? done by CPU.

dis

## \* DMA [ Direct Memory Access ]

\* DMA Modes : The way DMA takes control of CPU / System bus.

1. Cycle Stealing mode : DMA returns bus after a word transfer.

2. Block mode : return bus after block transfer.

3. Burst mode : after complete data transfer.

4. Interleaved mode :  $\frac{1}{2}$  cycle  $\rightarrow$  DMA

$\frac{1}{2}$  cycle  $\rightarrow$  CPU

NOTE: During DMA, CPU idle.



$$\% \text{CPU} = \frac{x}{x+y} \times 100 \quad \% \text{CPU} = \frac{y}{x+y} \times 100$$

e.g. → Pg 132 [ In Cycle Stealing mode, DMA controller just returns bus after 1 word transfer ]

Given : mem cycle time (y) :  $\frac{1}{2}$  ms

→ Pg 135 [ 16 bit counter register ]

e.g. → Pg 132 [ In Cycle Stealing mode, DMA controller just returns bus after 1 word transfer ]

Given : mem cycle time (y) :  $\frac{1}{2}$  ms

→ Pg 135 [ 16 bit counter register ]

e.g. → Pg 132 [ In Cycle Stealing mode, DMA controller just returns bus after 1 word transfer ]

Given : mem cycle time (y) :  $\frac{1}{2}$  ms

→ Pg 135 [ 16 bit counter register ]

e.g. → Pg 132 [ In Cycle Stealing mode, DMA controller just returns bus after 1 word transfer ]

Given : mem cycle time (y) :  $\frac{1}{2}$  ms

→ Pg 135 [ 16 bit counter register ]

e.g. → Pg 132 [ In Cycle Stealing mode, DMA controller just returns bus after 1 word transfer ]

Given : mem cycle time (y) :  $\frac{1}{2}$  ms

→ Pg 135 [ 16 bit counter register ]

e.g. → Pg 132 [ In Cycle Stealing mode, DMA controller just returns bus after 1 word transfer ]

Given : mem cycle time (y) :  $\frac{1}{2}$  ms

→ Pg 135 [ 16 bit counter register ]

eg → Pg 136 [2 Qs] ✓ Vimp  
 ⇒ Peak data transfer = no. of words trans.  
 rate rate in bytes/sec SIC init  
 Transfer time = Time to transfer 1 word/bits  
 → Pg 139 [Vimp] ✓

### \* I/O TRANSFER

Synchronous (Comp N/W) → Asynchronous (Comp System).  
 Serial → Serial Parallel (I/O, CPU, MM) → (MM, pipeline)

1) Synchronous Serial  
 - Synchronous characters must be excluded from data transfer Rate.



Sync. Characters.

2) Asynchronous Serial  
 - Sync. bits are included in data transfer Rate.



\* Rate at which serial info is transmitted  
 ⇒ baud Rate.

3) Asynchronous Parallel → Strobe, Handshake



(i) Retransmission

(ii) Handshake Protocol



eg

→ Pg 144 ✓ Vimp

### \* MACHINE INSTRUCTIONS

How CPU executes 1 instruction.

dray → Pg 145 ✓

Vimp

PC: holds address of next inst.

IR: holds current inst code

MAR: holds address of memory to be referenced

MDR: holds data to be read / write



## \* INSTRUCTION CYCLE

→ Every inst. undergoes some stages.

1) Instruction Fetch (IF)

MAR  $\leftarrow$  PC  
 PC  $\leftarrow$  PC + 1 Microoperations.  
 Read: IR  $\leftarrow$  MDR. jmp

Aim: Getting curr. inst. in IR.

## 2) Instruction Decode (ID)

ADD R<sub>i</sub>, X. Time  
 Immediate Time  
 operand Add Part Time

1011 → Decoder → Adder circuit activates

## 3) Operand Fetch (OF)

MAR  $\leftarrow$  LOC(X) X: operand.  
 READ MDR Time Ri: register.  
 R<sub>j</sub>/ALU  $\leftarrow$  MDR

## 4) Execute (EX)

perform ALU op.

## 5) Write Back (WB)

MAR  $\leftarrow$  LOC(Z)  
 MDR  $\leftarrow$  R<sub>i</sub>-R<sub>j</sub>/ALU/R<sub>i</sub> add.  
 WRITE MDR Time  
 [Write in Add. loc Z in MM]

## 6) Indirect Cycle

ADD R<sub>i</sub>, #300

Position (2 main cycles) X



7) Interrupt Cycle: If interrupt occurs, CPU needs to serve interrupt.

NOTE Processor performs any microop in reg only (be it arithmetic/logic/Shift/register transfer).

If an inst. takes 7 CC, it means it has 7 micro-operations. [7 steps]

1 microoperation = 1 CC

## INTERRUPT PROCESSING

→ Current program suspended Time  
 → Stores (Register Value, CPU Mode (PSW), Status Regs, Return Address) in Stack (Present in MM) Time  
 → IP is an audience to CPU. Branch Init.

NOTE Prog: [I<sub>1</sub>, I<sub>2</sub>, I<sub>3</sub>, I<sub>4</sub>] Time

New interrupt occurs when CPU is executing I<sub>2</sub>. When CPU first completes I<sub>2</sub>, then stores Return Address (I<sub>3</sub>) (gets from PC). Time

\* CPU Modes

- User (Non-Privileged Mode)
- Kernel/System (Privileged mode).
- Perform OS Related SW.

### \* Interrupt Types

#### (i) External / H/W Interrupt:

- due to timer / I/O device  
- priority readers. (simultaneous I/P activation).

#### (ii) Internal / Prag interrupt:

[divide by zero] [jmp]  
[Invalid opcode]  
[register overflow]

#### (iii) Software interrupt:

Aimed due to context switching

$$T_p = N \cdot cc * t_c$$

$T_p$ : Time to complete the prag

$t_c$ : CC time

eg Pg 164 [Byte Addressable mem]

- If interrupt occurs during H.A.U.L.T.  
(last inst.) ; PC is not updated

eg Pg 165 [Prog given]

### \* Pipelining

- it helps to speed up execution

. (If non-pipeline) - when 0.99 %  
Int.  $\rightarrow \{ IF, ID, EX, WB \}$  (4 Phases)

Assume  $\rightarrow$  Each Phase : 1CC

$$3 \text{ inst? } \Rightarrow 3 \times 4 = 12 \text{ CC}$$

\* Non-Pipeline: At any time, only one instruction is valid.

imp

### \* Phase-Time diag. (Pipeline)

[9 instructions]

- each unit of CPU takes care of each phase (parallel functioning)

$\rightarrow$  Pg 13 [At CLK7,  $I_7$  fetched int,  $I_6$  in ID,  $I_5$  has begin execution, &  $I_4$  result is write back]

: 4 inst. simult. at 1 CLK

or, Aim of Pipelining is to make:

$$CPI \approx 1$$

or, 1S = 1 instruction

(Max after 4 th clock, CPI=1)

### \* Performance of Pipeline

$$\text{Speed up} = \frac{T_w}{T_p}$$

$T_w$  (Time without Pipeline)  
 $T_p$  (Time with Pipeline)

$$\Rightarrow \text{Speed up} = \frac{4 \times 9 \text{ CC}}{12 \text{ CC}} = 3$$

$$m * \text{efficiency} = \text{Speed up}$$

$$\text{no. of stages} : \text{efficiency} = \frac{3}{4}$$

$$(Mux box / Total box)$$

$$\therefore \text{efficiency} = 1, \text{Speed up} = m$$

Practically, Speedup  $< m$

\* Speed Up: on the number of stages + number of instructions

$n = \text{Total instruction}$

$m = \text{Stage} \rightarrow \text{all } m \text{ clock cycles}$

$$T_p = 1 \times m + (n-1) \times 1$$

$$T_{WP} = m \times n$$

$$\text{Speed up} = \frac{T_{WP}}{T_p} = \frac{mn}{m + (n-1)} = \frac{mn}{m + (m-1)} = \frac{mn}{2m - 1}$$

$$\text{Speed up} = \frac{mn}{2m - 1}$$

\* No-Pipeline (CPU  $\rightarrow$  1 unit) ( $T_1$ )

Pipeline (CPU divided into units & buffers  $\rightarrow$  delay) ( $T_2$ )

if 1 instruction  $\rightarrow$   $T_1 = 21 \text{ ns}$

$$T_2 \geq T_1 \quad (\text{Pipeline takes more time})$$

\* If efficiency =  $3/4$ :

ideally it could have been 4 if  $n \rightarrow \infty$   
but we made it 3.  $\rightarrow$  (m)

$$\rightarrow \text{Pg 18 [23 CC]} \times$$

\* INSTRUCTION PIPELINE  $\rightarrow$  H/W

m Stage Pipeline: CPU is divided into m independent modules

Buffer registers: stores result of each stage.

Central Unit



jmp

Jumps largest stage delay as 1 CLK  
 $1 \text{ CLK} = 15 \text{ ns} \therefore 1 \text{ CC} = 15 \text{ ns}$

$$m=4, [150, 120, 160, 140]$$

$$\text{register delay} = 5 \text{ ns}$$

$$1 \text{ CC} = 165 \text{ ns}$$

$$T_p = [1 \times m + (n-1) \times 1] \text{ CCs}$$

$$\rightarrow (4 + 999) \text{ CCs}$$

$$\rightarrow \text{Pg 20 [Speed up?]} \rightarrow \text{CPI=1}$$

$$\text{CLK} = 12 \text{ ns} \rightarrow 12 \text{ ns per Inst}$$

Time without Pipeline

[Add the delays]



Speed up = Time without pipeline

Time with Pipeline

$$30 \text{ ns per unit} \rightarrow (5 + 6 + 11 + 8) \text{ ns} = 28 \text{ ns}$$

$$12 \text{ ns per unit} \rightarrow 12 \text{ ns}$$

$$\rightarrow \text{Pg 21 [2015 Q]} \times$$

In Pipelined unit, like CPI=1

## \* Dependencies

- We want to be able to achieve CPI = 1 in (Pipeline), due to Hazards
- Dependencies

### (1) Structural dependency

- when 1/more stage of the pipeline try to use the same resources
- eg: IF, ALU both trying to access the same resource/resource



Solution: resource replication/duplication  
[have multiple copies of resources]

### \* Resource replication in MIPS RX000

Call → 9th Pic → 1st instruction unit

- (2) Control Dependency → caused due to branching
- will arise when flow of control becomes unpredictable
- ✓ conditional branches
- ✓ unconditional branches
- ✓ function calls/return
- ✓ flag control int. that may affect PC

→ due to this, CPI ≠ 1

diagram

Pg 25 ✓

At 6th CLK, we get the next int. Address, but by that time, already S4, S3, S2, S1 of int. computed.

### \* Flushing

[we have cycles/pipeline] [CPI > 1]

### \* Solution

(i) flushing

(ii) stalling Pipeline (leaving it empty till CLK 6)

(iii) No op (Compiler recognises BT & int. no operations). → [branch delay]

Reschedule/rearrange: some int. moved in NOP, whose movement doesn't affect prog as a whole. ✓

\* Let there be 100 int. [CPI = 1]

[80 N Branch, 20 Branch Int.]

→ Op of Branch Int. will be known at the end of 3rd Stage. [CPI = 2]

|       | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     | 10 |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----|
| $I_1$ | $S_1$ | $S_2$ | $S_3$ | $S_4$ |       |       |       |       |       |    |
| $I_2$ |       | $X$   | $S_1$ | $S_2$ | $S_3$ | $S_4$ |       |       |       |    |
| $I_3$ |       |       |       |       | $S_1$ | $S_2$ | $S_3$ | $S_4$ |       |    |
| $I_4$ |       |       |       |       |       | $S_1$ | $S_2$ | $S_3$ | $S_4$ |    |

(After 4 CLK, 1 int. coming out) [CPI = 2]

$$\text{Avg} = \frac{80 \times 1 + 20 \times 2 - 1 \cdot 2}{100} \text{ CPI} = 1.4$$

→ [From Pg 27: Refer book (Notes)  
[Ravi]]