

R<sub>1</sub> =

create\_playlist input, R<sub>1</sub>

Add input1, input2

input(1) Playlist

input(2) → song

6/Oct/2017

Output device will send an interrupt to the processor that it is ready to receive the data.

### Interrupt Driven Program

Giving charge of input and output to another device called DMA (Direct Memory Access)



- No input/output can access the memory directly.
- There is a controller with every input/output device.
- Processor checking input/output device and wasting its time so now we have DMA controller.

DMA



Initialize DMA control register.

Processor send information to DMA controller.



DMA need

1. Address of input / output device.
2. Address of the memory location.
3. Number of bytes.
4. Read / Write operation.

Input / output devices are given priority ~~for~~ for data bus access (if Processor need to access data bus)

No. of cycle lost in the cycle  $\rightarrow$  Cycle Stealing.



### Interrupt Acknowledge

Interrupt vector table

ISR      Interrupt Service Routine

Each device will be handled in a separate panel by functions

When device sends small no. on the bus.

Processor check machine. Finds the address in the table. 2, 3 device \$ can be shared by same ISR.

Processor running a program. An interrupt is there so it will store all the values so that it can resume its task after working with ISR.

Go to ISR. ISR will process DMA controller may be initialised then come back overheads when dealing with interrupt.

Put back all the stored data in their respective register and resume their task.

I<sub>1</sub>, interrupt (of priority 3)

Timeline

on ISR D0.5 device 1  
user

T<sub>2</sub> - S

T<sub>1</sub> - 3

Priority → T<sub>1</sub> - 3

D<sub>1</sub> = 3 (is waiting)

D<sub>2</sub> = S (no)

D<sub>3</sub> = I (20)



Trap

Trying to open memory address which is given to another program → SIGSEV

Trap Handling.

We have a trap handler:

Traps are synchronous & Problem will occur at the same time

7/10/2017



If 255 allowed and 1000 given then it has to be initialised.

flow of control →



time →

PC now changed

Interrupt → Is there because of DMA



It is talking about only PC, it is not talking about OS.

- ⊕ ALP (Assembly level Program)
  - If you have access to the machine, the hardware can check the flags.
  - These are faster to execute
  - For embedded application where memory is less program is written using ALP.
  - Compiler is the form of a translator converting other high level language program to assembly language program. Compiler may have assembler to convert the program to machine language.
  - After conversion instruction are read one by one.

Hardisk → RAM → Processor → Fetch & Decoded Executed  
 When one instruction finishes it goes to next instruction.  
 This is known as interpretation.

### Translator

→ Compiler are  
translator

### Interpreter

- Every sentence interpreted
- Instruction by instruction
- fetch decode execute

In 100 sec of a program.

10% of the program utilising 90% of the time

Rewrite this part in assembly language

known as performance tuning.

### Program Time (yr)

ALP            50

TIL            10

### Execution Time (sec)

30

100

Before tuning

Critical 10% 1 yr            90

Rest 90% 9 yr.            10

$$\text{Critical 10\%} \quad 1 + 50 \times \frac{1}{10} = 6 \text{ yrs}$$

$$\text{Rest} \quad 90 \times \frac{9}{10} = 81 \text{ yrs}$$

### Pseudoinstructions (Assembler Directives)

from

X EQU 100

→ Pseudoinstruction

→ Machine language

Go through the list of Pseudoinstruction

ADD R1, R2

CMP #0

JNE LOOP

≡

ADD R1, R2

CMP #0

JNE LOOP

≡

Macro

Expansion

As soon as the program finds this it  
replaces FN with the whole definition  
of FN

FN MACRO

ADD R1, R1

CMP #0

JNE LOOP

ENDM

≡

FN

≡

FN

← MACR call

• What is the difference between Macro and Procedure.

- Avoid procedure because of the overheads included in it.
- Macro is not expanded during runtime, macro call will not take place during call time. It will change as soon as the assembler converts to machine language. Macro is has been converted at this time Expansion is taking place when assembler is converting to machine language.

Macro

- During assembly time
- If called 5 times then 5 copies of the macro
- No return statement.

Procedure

- During runtime
- Only one copy in the address
- There is a return statement.

Date \_\_\_\_\_  
Page \_\_\_\_\_

|     |            |   |                                                                                |
|-----|------------|---|--------------------------------------------------------------------------------|
| ILC |            |   |                                                                                |
| 0   | ADD R1, R2 | = | Assembler takes instruction find<br>opcode and convert to machine<br>language. |
| 2   | JMP L1     | = |                                                                                |
| 5   | CALL B     | = | forward referencing<br>(don't know the value of L1 Right now)                  |

L1 stored in the symbol table.

L1: OUT R3, 0

HLT

### Two pass assembler

Going to read your ALP program two times.  
Every time it reads it is called a pass.  
Read one time → Pass 1.

### ILC (Instruction Location Counter)

#### I Pass

Opcode Table

Symbol Table

Macro Table

#### Opcode Table

| Opcode | Oprand 1 | Oprand 2 | Hexadecimal code<br>for this instruction<br>hex code | Length | Type |
|--------|----------|----------|------------------------------------------------------|--------|------|
|        |          |          |                                                      |        |      |

- If a macro encountered then it is expanded in macro table.

- Clueless about B.

#### II Pass

It will check the table and execute it.

- At the end of PASS II all the thing complete and object module is ready.



Structure of an object module



Linker

It combines

LOADER → LOADS.



LINKER   LOADER  
Performs both  
the task.

## Relocation problem

## External Reference problem

- Relocation Problem : Address (base address) and jmp statement are according to the address (base address) of their module.
- External Reference problem : Module A does not know the address of rest of the module and vice-versa.

| Module Name | length | Starting Address |
|-------------|--------|------------------|
| A           | 800    | 1000             |
| B           | 200    | 1800             |
| C           | 500    | 2000             |

Thus JMP B. converts to JMP 1800  
 JMP 150 converts to JMP 1950.

14/Oct/2017

From memory to processor → Fetch.

Fetch - Decode - Execute cycle

Processor Unit.

Micro Architecture.

PC → MAR → Address Bus → location accessed → MDR

Internal bus of the processor.





only one bus inside processor so only one can access the bus.

- If ALU is working on two operands

Bring R0 store it in the Temporary register in ALU then bring R1, add R0 and R1 store it in the Temporary register till the bus is ready to use.

↑ ↑ ↑ Control Signal



Each Register is having an in control signal and an out control signal

|      |             |
|------|-------------|
|      | STORE R3,R5 |
| 2004 | LOAD@R1,R2  |
|      | ADD R3,R4   |
| 2000 | MOV R3,R1   |

Run: if this is high whatever value in bus will be stored in the register.



Run: if this is high whatever value in register will go into the bus.

| PC out | → control signal for the program which is allowing it to transfer data to the bus.

16 bit bus → 16 wires.

Control unit decides in or out.

- MAR in activated then Read signal needs to be generated. Read operation is for the memory. It will tell to read memory.
- Information from PC has gone to MAR as well as ALU
- Pass 4 through the MUX
  - ↓ size of one instruction.
- MUX select 4 add it, Zim enabled.

### Fetch Phase

Clock Cycle ①      PCout, MARin, Read, Select 4, Add, Zim  
 Data bus      2000      MDRin-E

Clock Cycle ②      Zout, PCin, Ym, /WMFC

- \* First check cache then check in main memory, if not in cache then program send to cache since accessing the main memory takes time but cache can be accessed & easily.

Wait for memory function complete.

Waiting for data in MDR from memory.

MDR → Bidirectional communication of with Internal as well as external bus



Clock Cycle 3 MDRout, IRin

Now fetch ~~for~~ phase is completed.

MOV R3 R1 → This data has been converted to binary. By decoder, control signals have been generated for this instruction.

Decode Phase

Clock Cycle 4. ~~I<sub>out</sub>~~, ALUin, MARin, MDRin  
Execution. R3out, R1in, END

Next Fetch Cycle will be executed

This was the micro program or the micro routine for the instruction.

Fetch Phase

: (Same)

Now next instruction has been decoded by decoder. Control signals have been generated for this instruction

ADD R1, R4

Decode Phase

Clock Cycle R1out, Yin

Clock Cycle R4out, Select Y, Add, Zin

Clock Cycle Zout, R1in, END

Fetch Phase

: (Same)

LOAD @R1, R2

$[IR] \leftarrow [PC]$   
 $[R] \leftarrow [R1]$

### Decode Phase

Clock Cycle

R1 out, MDR in, MAR in, Read Signal Generated

Clock Cycle

MDR in, WMFC

Clock Cycle

MDR out, R2 in, END

### Fetch Phase

(Same)

STORE R3, @RS

$[RS] \leftarrow R3$

Clock Cycle

R3 out, MAR in, MDR in

RS out, MAR in

### Decode Phase

Clock Cycle

~~R3 out, MDR in~~

Process of accessing the memory  
location done first

Clock Cycle

~~RS out, MAR in~~

Clock Cycle

RS out, MAR in, Write

Clock Cycle

R3 out, MDR in, MDR out - EX, WMFC, END

RISC

CISC Instruction

ADD @R7, R8 → HW.

16/ Oct / 2017

R7 out, MAR in, Read Signal, WMFC  
R7 out

### Decode Phase

Clock Cycle

~~R7 out, MAR in, RFA D~~

~~R7 out, MDR in - EX, MMF~~



clock cycle R<sub>out</sub>, M<sub>ARIN</sub>, READ  
clock cycle M<sub>DRIN-FAT</sub>, W<sub>MFC</sub> R<sub>out</sub>, Y<sub>in</sub>  
clock cycle M<sub>DRout</sub>, Y<sub>in</sub> M<sub>DRout</sub>, select Y, Add, Z<sub>in</sub>  
clock cycle R<sub>out</sub>, SELECT → Z<sub>out</sub>, R<sub>in</sub>, END

ADD R<sub>1</sub>, R<sub>2</sub>

JMP L0

L0 : =

If L0 at 40 and JMP L0  
at 20 then offset is 26

- OS will load the program wherever it finds space, it may vary everytime.
- offset remain same

JMP S5

PC~~o~~

Decoder Phase

R<sub>out</sub>, Y<sub>in</sub>

clock cycle Offset field of IR<sub>out</sub>, select Y, Add, Z<sub>in</sub>  
clock cycle Z<sub>out</sub>, PC<sub>IN</sub>, Y<sub>in</sub>, END.

In Y<sub>in</sub> we already have ~~PC~~ value thus

~~PC<sub>out</sub>~~, Y<sub>in</sub>

Decoder is connected to ALU.

if L0 above JMP statement then offset will be negative.

(Branch < 0) 26

### Decode Phase

clock cycle offset field of  $Z_{out}$ , Select Y, Add,  $Z_{in}$  If  $Z \neq 0$   
clock cycle  $Z_{out}$ , PC<sub>in</sub>, Y<sub>in</sub>, END

JZ 26

### Decode Phase

clock cycle offset field of  $Z_{out}$ , Select Y, Add,  $Z_{in}$ , If  $Z = 0$ , END  
clock cycle  $Z_{out}$ , PC<sub>in</sub>, Y<sub>in</sub>, END

END always starts a new fetch cycle.

Tables



Files (collection of records)  
L<sub>fixed</sub>

23/Oct/2017

### Three Bus Architecture

#### Fetch Phase

1. PC<sub>outB</sub>, R = B, MAR<sub>IN</sub>, Read, Jn PC  
Input given in Multiplexer.
2. WMFC
3. MD<sub>outB</sub>, R = B, IR<sub>IN</sub>



Decode Phase

ADD R3, R4

1.  $R3_{outB}, R4_{outA}$ , Select A, Add,  $R4_{IN}$ , END.

Decode Phase

LOAD @R3, R4

1.  $R3_{outB}$ , Select A ( $R=A$ ), MAR<sub>IN</sub>, Read
2. WMFC
3.  $MDR_{outB}, R4_{inout}$ , Selected, Adds END

Decode Phase

Store R7, @R3

- ~~- 1. R7 out, Select A ( $R = A$ ), MAR IN  $\times$  R3
  - 2. WMFC
  - 3. MDR~~

1.  $R_3 \text{ out } B, R = B, \text{ MAR IN}, \cancel{\text{Write}}$

2.  $R_7 \text{ out } B, R = B, \text{ MDR IN}, \text{ Write}$

3. WMFC

4. END



Interrupt Service Routine

Check External Input

Check WMFC (this is also an external signal)

At a given point only one high either  $T_1, T_2, \dots, T_n$

Run signal Generator that is it enable different sing signal may be  $T_1$  becomes  $T_2$  WMFC signal Run = 0 Halt If in  $T_3$  will remain in  $T_3$ .

After all the instruction end generated Regt took place.

Let  
 1 → 7 steps.  
 2 → 10 steps.

Size of Accelerator by 16 decoder.

29/oct/2017

Add Reg, Reg.

1 Zin  
 2  
 3  
 4  
 5 Zin, END  
 LOAD

ADDI Reg, #N

1 Zin  
 2  
 3  
 4  
 5  
 6 END.

Store  
 1 Zin.  
 2  
 3  
 4  
 5  
 6  
 7 END

SUB  
 1 Zin  
 2  
 3  
 4  
 5  
 6  
 7 END

Zin activated in  $T_1$  in all instruction.  
 Add  $T_5$ .



for end



If there are two end like



Maximum No of 7

Decoder 3 by 8

Hardwired logic : If we want speed we go for hardwired logic.

Alternate for hardwired logic is microprogram control.

PC<sub>IN</sub>, PC<sub>out</sub>, MAR<sub>IN</sub>, MAR<sub>out</sub>, MDR<sub>IN</sub>, MDR<sub>out</sub>, Yin,  
Yout, Zin, Zout, R<sub>IN</sub>, R<sub>out</sub>, ..., R<sub>8in</sub>, R<sub>8out</sub>, ADD, SUB,  
Read, Write, Select 4, WMFC, END

For 1 step of Add



All the instruction in MicroRoutine will be stored  
so in control store.



11111111 → All instruction

Whenever 1 all microRoutine corresponding  
its instruction will be activated.

Fetch cycle of is same for all.



0. PCout, MARin, Zin, ADD, Select 4
- 1.
2. MDRout, Rin
3. Branch to starting add of microroutine

27/07/2017

20. R3out, Yin
21. R4out, Select Y, Add, Zin,
22. Zout, Rin, JMP to Microroutine etc.

Groups will be formed such that signals become mutually exclusive if like bad write  
only one will be high at a time

If 12 signals then 4 bits needed

We can group all the out signals together so only one will be high at a time.



In horizontal format 16 bits needed if  
 16 operation are there  
 In Encoded format 4 bits needed.

|   |       |       |
|---|-------|-------|
| 1 | PCout | 0.000 |
| 2 | Zin   | 0001  |
| 3 | MARin | 0110  |

### Micro Program Control

27/07/2017

### Pipelining



Instruction → B1, B2, B3.  
 Buffers.

Sns 2ns 3ns 2ns  
 Clock Cycle 3 : Decoding 3<sup>rd</sup> instruction  
 Executing 2<sup>nd</sup>  
 Storing of 1<sup>st</sup>.

During clock cycle 3 whatever in clock cycle 2  
 will be there in buffer that is B2 buffer  
 has decoded instruction 1.  
 B1 have fetched instruction of the 2-

CLK → 1 2 3 4 5 6 7 8

- 1.) I<sub>1</sub> has been fetched
- 2.) I<sub>2</sub> will be decoded and I<sub>3</sub> will be fetched.
- 3.) I<sub>3</sub> has been fetched

| CLK            | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----------------|---|---|---|---|---|---|---|---|
| I <sub>1</sub> | F | D | E | W |   |   |   |   |
| I <sub>2</sub> |   | F | D | E | W |   |   |   |
| I <sub>3</sub> |   |   | F | D | E | W |   |   |
| I <sub>4</sub> |   |   |   | F | D | E | W |   |
| I <sub>5</sub> |   |   |   |   | F | D | E | W |

$$T = 5 \text{ ns}$$

If no pipelining been made time taken.

Pipeline

Time taken

Latency

After every 5ns instruction comes up.  
Total time taken is still 20 ns.  
1 instruction every 5ns.

$\frac{1}{5 \times 10^{-9}}$  in 1 sec.

$2 \times 10^6$  instruction/sec  
throughput.

| CLK            | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----------------|---|---|---|---|---|---|---|---|
| I <sub>1</sub> | F | D | E | W |   |   |   |   |
| I <sub>2</sub> |   | F | D | E | F | E | W |   |
| I <sub>3</sub> |   |   | F | D |   | E | W |   |
| I <sub>4</sub> |   |   |   | F |   | D | E | W |

Pipeline has been stalled



Want to access memory at some time in some cycle then these are called Hazards and for this pipeline is stalled.

- > Cache is existing so that memory access takes only one clock cycle.

Cache miss  $\Rightarrow$

MP



30 Oct/2017

Non-availability of data causes stall

Hazards

1. Data Hazard
2. Control
3. Structural

{ main()

int  $x, y, z = 10;$

$x = 2 * z \rightarrow \text{MUL } R_1, R_2, R_3$

$y = 3 + z \rightarrow \text{ADD } R_2, R_4, R_5$

}



Cache miss  $\rightarrow$  Instruction not available in Cache  
so pipeline is stalled.

$$x = 2 + 3$$

$$y = 3 + x$$



RAW Kind Hazard  $\rightarrow$  Read-after-write hazard.  
WAW  
WAR

after 3 bus architecture.



✓ Operand forwarding so that stalling is not needed.



From output Buffer to input Buffer

LOAD R2, 10(R3)

→ ADD R1, R3, R2.



Write after Write Hazard.

F D F M W

F D F W

→ delay writing stage  
Put a stall over here.

MUL R1, R2, R3 NOP

ADD R3, R4, R5 NOP

SUB R7, R8, R9

→ No Operation.

We can reorder the instruction



For RAW

- Stalling
- Decoder checking hazard and hardware exists (Operand Forwarding)
- No operation when compiler detect error but no hardware. Then it can also go for reordering
- Renaming of registers also helps

1/November/2017

MAR<sub>IN</sub>, PCout, C<sub>3</sub>, C<sub>4</sub>, C<sub>5</sub>, C<sub>6</sub>, C<sub>7</sub>.



|                |     |
|----------------|-----|
| C <sub>3</sub> | 010 |
| C <sub>4</sub> | 001 |
| C <sub>5</sub> | 100 |
| C <sub>7</sub> | 110 |

2 1 1

ADD R1, R2, R3  
SUB R1, R4, R3

Instruction / Control Hazard

Split Cache

*Data*

*Instruction*

If there is a cache miss then a instruction taken through memory which took time.

I. F D E W  
 I<sub>2</sub> F F E D E W  
 ↓ F D E W.  
 checking main memory  
 if total cache Miss found)

### Branch Instructions

loop ADD R1, R2 ADD R1, R2  
 DCR R3 JMP label  
 CMP R3, #0 =  
 JNE loop. label.  
 = unconditioned  
 we don't know the jump  
 target address.

→ Changes the flow of the program these are  
 called Branch instruction.

I. F D E W Remove classified  
 I<sub>2</sub> F D E W from the pipeline.  
 I<sub>3</sub> F D E W  
 I<sub>4</sub> F D F W X  
 I<sub>5</sub> F D E W X  
 F D

For branch instruction address is available at  
 the end of execution.

Branch Penalty : task of removing execution due to  
 Branch Delay Slots Branch instruction.

F D E W

F D E W

Instruction / Control Hazard.  
Instruction Queue and Prefetching.



Here 6 in Fetch unit it is checked if the instruction brought is a branch instruction or not.

|    | 1 | 2 | 3 | 4 | 5   |
|----|---|---|---|---|-----|
| I. | F | D | E | W |     |
| I. |   | F |   |   | D   |
| I. |   |   | F | D | E W |
|    |   |   |   | F |     |
|    |   |   |   |   | F   |

Branch Folding : Branch instruction concurrently executed with other instruction.

Results of these instruction will be put in correct order only.

If there is other instruction then only branch folding possible.

ADD R1, R2, R3  
 SUB  
 AND  
 OR

ADD → 0 0 0 0  
 SUB → 0 0 0 1  
 AND → 0 0 1 0  
 OR → 0 0 1 1

R1 R2 R3 R4 R5 R7 R8 R9

|         |                                   |       |       |       |  |  |  |  |
|---------|-----------------------------------|-------|-------|-------|--|--|--|--|
| 0 0 0 0 |                                   |       |       |       |  |  |  |  |
| 0 0 0 1 |                                   |       |       |       |  |  |  |  |
| 0 0 1 0 |                                   |       |       |       |  |  |  |  |
| 0 0 1 1 |                                   |       |       |       |  |  |  |  |
| 0       | 0   1   0   2   3   4   5   6   7 |       |       |       |  |  |  |  |
| 6   7   | 1   0                             | 1   1 | 1   9 | 1   5 |  |  |  |  |

ADD R1, R2, R3, f

~~K I T D~~ ~~I I I D~~

31 Nov 2017

Conditional Branch

Delayed Branch

Branch Prediction

Static

Dynamic

- I<sub>1</sub> loop: SHL R1  
I<sub>2</sub> DCR R2  
I<sub>3</sub> JNZ loop  
I<sub>4</sub> ADD R3, R4

|                |   |   |   |   |                          |
|----------------|---|---|---|---|--------------------------|
| I <sub>1</sub> | F | D | E | W | → Forget Instruction     |
| I <sub>2</sub> |   | F | D | E | W                        |
| I <sub>3</sub> |   |   | F | D | E W                      |
| I <sub>4</sub> |   |   |   | F | D X → Branch Instruction |
| I <sub>5</sub> |   |   |   | F | D X X                    |
| I <sub>6</sub> |   |   |   |   | F                        |

To solve this problem we can do this ↴

- I<sub>1</sub> loop: DCR R2  
I<sub>2</sub> JNZ loop  
I<sub>3</sub> SLL R1  
I<sub>4</sub> ADD R3, R4

for Delayed Branch → Reordering instruction in such a way that one of the instruction is brought up to reduce delay.

Assume that Branch will never be taken and if branch is taken remove the unwanted prediction instruction. So the time logic will be correct.



If there is condition checking at the end of the loop then all probability branch will be taken.

If condition checked at the starting of the loop then in all probability branch will not be taken.

These are static condition prediction

In dynamic prediction it check the previous history and predict not taken if not taken previously and vice versa



Speculative Execution

Given:

If wrong prediction then have to remove the instruction.

Structural hazard



Split Cache is the way to overcome this problem of structural hazard.

How can addressing modes can cause hazards in pipeline or not cause delay in the pipeline?

POP PUSH operation may change the stack pointer  
Complex instruction always degrade the performance of pipeline.

LOAD (R1), R2  
 LOAD I<sub>o</sub>(R1), R2 ← Needs ALU.  
 LOAD ((Offset + R1)) \*, R2.

R1 = 3000  
 Offset = 100  
 1st memory access.

ADD offset, R1, R1 F D F M F m w  
 LOAD @ R1, R3 F  
 LOAD @ R3, R2 D  
 Instruction Cache.





6/Nov/2017



### Memory Hierarchy



### Processor Chip



Cache are made up of 6 transistor



K address lines

n data lines what are the maximum number of memory location that I can have?



R=4 16 addresses

32 bit machine it is word addressable memory. 16 words.  
byte addressable 4 words.

Within a word locat two bit are going to give the byte address

32 bit address men  $2^{32}$  → 1 GB memory

If cache miss then check main memory if not in main memory then from hard disk the program is brought.

Processor wants something check cache there is a miss then main memory then if miss men through hard disk..

Load through : Writing in cache and sending to processor at the same time.

How mat miss and hit checked? (later)

Cache works on the principle of locality. Locality of two type either spatial locality or temporal locality.

time

If loop is running. Means after a period of time we need that instruction again known as temporal locality.

## Main memory

256 words

8 bit for addressing



11111111 B31

B1

block 0.

Fully Associative  
Algorithm

## Cache



block size 8 words

Fully associative mapping: If we put the block (brought from memory) anywhere in cache.



Have to follow some kind of  
Algorithm (FIFO may be)

7/Nov/2017

## Cache

## RAM

~~Byte~~  
~~1 bit~~  
Cache mapping



bit address

Lab Evaluation Week.

Fully Associative  
Algorithm

Fully Associative Memory: Cache filled in random manner and when cache is full then any algorithm is used to remove a block so that other block can be Rept.

LRU: least Recently used Algorithm is used mostly and ~~not~~ not LIFO

Less used block sent back to memory.

When processor checking block found then it is hit. We have counter for all blocks initialised with 0 and when there is a hit in the block then counter of other block are incremented by 1.

Counter also increases when new block are brought



This will be removed if any other block comes in.

If B7 is used (a hit then)





Processor check the tag  
Bit.



~~direct mapping~~ Direct Mapping : There is a fixed place for a particular block.

Cache Block = (Main memory Block) mod (No of blocks in cache)

4 Block in cache.

|   |   |    |    |    |
|---|---|----|----|----|
| 3 | 7 | 11 | 15 | 19 |
| 2 | 6 | 10 | 14 | 18 |
| 1 | 5 | 9  | 13 | 17 |
| 0 | 4 | 8  | 16 | 12 |

Add 2, 33, 7, 35  
8 8 8 8 ← fully associative  
0 4 0 4 ← Block in RAM  
B0 B4 B0 B4  
M M M M ← direct mapping

If fully associative mapping

At start always a miss

8/Nov/2017

| Address | Main Block | Cache Block | Hit/Miss (Associative) | Hit/Miss | Data |
|---------|------------|-------------|------------------------|----------|------|
| 4       | 0          | 0           | M                      | Miss     |      |
| 33      | 4          | 0           | M                      | Miss     |      |
| 7       | 0          | 0           | H                      | Hit      |      |
| 35      | 4          | 0           | H                      | Hit      |      |



tag will be different

To check if 0, 4, 8, 12 or 16 tag is checked.

~~Set Associative Mapping~~ Set Associative mapping: Something b) as direct and associative  
2 way & set associative means two blocks will form a set.

Cache Block = (Main Memory Block) mod (No of sets in cache)



8/Nov/2017

→ data belongs to correct program.

Valid tag

→ data belongs to previous program.

Size of the cache

$$256 \times 32 + 5 \times 2 + 1 \times 2$$

miss penalty

Cache → Read  
Cache → Write

Hit miss Main memory → Cache → MP  
main memory → Cache → MP

Load Through



If Miss then extra time to bring that data known as  
Miss Penalty



Cache Coherence : Whatever is there in the cache should be in the main memory (same as it is)

Write back Protocol :

Dirty block : When changes only in cache and not in main memory

Modified or dirty bit checked then at the time of replacement block will be written back otherwise block ~~is~~ removed as it is.

What will be the average access time of the processor?

$h \rightarrow$  hit rate

$1-h \rightarrow$  miss rate

miss Penalty

$$\text{Avg. Access Time} = hC + (1-h)M$$

\* time required to access the cache.



Solution

with cache

without

Date \_\_\_\_\_  
Page \_\_\_\_\_

$$\text{Time} = h_1 C_1 + (1-h_1) \left[ h_2 C_2 + (1-h_2)m \right]$$

- c. 30% of the instruction in programme are memory read write operations.

hit Rate

$$\text{Instruction} = 0.95$$

$$\text{Data Cache} = 0.9$$

Had there been no cache time required to access Main memory is 10 clock cycle.

Miss Penalty 17 clock cycle

Cache Access 1 clock cycle

Calculate Speed up With Cache  
Without Cache

Find Ratio

Solution:

$$\text{With Cache} 100 [0.95 \times 1 + (0.1 - 0.95) \times 17] + 30[0.9 + 0.1 \times 17]$$

$$\text{Without Cache} (100 + 30) \times 10$$

$$\text{Speed up} = \frac{\text{Time w/o cache}}{\text{Time with cache}}$$

10/Nov/2017

Class  
Page

## Virtual Memory



Program in Disk. Suppose 32 bit machine then  
 $2^{32}$  addresses are possible.

OS does not let the user know that it has less memory. It gives an impression that it has a large memory.

Direct Memory Address : DMA

OS will load a part of it to the physical address

MMU: Memory management Unit: Going to generate the actual physical address

Virtual memory



Virtual Memory divided into  
virtual pages.

Page Frame and Virtual pages same size

Here also replacement same as Cache Main Memory  
here it is RAM and Virtual Memory.

Page Fault : Virtual page 20 Not available for  
Program 1. Page is not there in the RAM.

When there is a page fault replacement takes place

32 bit address → logical Address

Physical memory depends upon the size of physical memory.

Ram - 8 4KB pages

32 KB memory.  $\geq 2^{15}$  Bits

Now map 32 win is.



Here  $2^{32}$  → size of virtual memory.

No of virtual pages

$$\frac{2^{32}}{2^2 \times 2^{10}} = 2^{20}$$

Bits for Virtual Page = 30 bits.

offset 12 bits.

VP | offset  
20 | 12

offset bits are mapped exactly in the same manner

## Page Frame Table



Valid bit      Dirty / modified bit  
Read Write Permission.

In execution also No 'Write' can take place

32 bit logical Address  $\rightarrow$  15 bit physical address.



If control bit 1  
then virtual page has been brought  
into the RAM

Bit not 1 + Page Default