

Explain working of RIP w/ example.  
Draw tables, routes and how convergence occurs.

## co Lecture

Set associative mapping

Offset → which block out from CA we have pick

Memory address is classified into 3 things

Tag → whether data resides in that cache level or not.

|     |       |        |
|-----|-------|--------|
| Tag | Index | Offset |
|-----|-------|--------|

④ Hit rate and Miss ratio

$$L = 1 - \text{hit rate}$$

In particular state  
where bits are  
found in three levels  
of cache.



\* Direct Map, Full Map

L1 → 16~192 KB  
L2 → 10~100 KB  
L3 → 1~8 MB



Three-level cache organisation

Pages are serviced from secondary memory

① To main memory.

② From main memory blocks are taken to the CPU.

\* Set Associative Mapping

↳ combination of direct and associative approaches.

① Cache consists of no. of sets.

② Each set contains a no. of lines.

③ A given block maps to any line in a given set.

④ Eg: 2 lines per set.

▫ 2 way associative mapping.

▫ A given block can be in one of 2 lines  
in only one set.



off-chip cache

In this is not concept in current generations.  
Cache → always on-chip in recent generations.

Data searching process:

if in L1 cache:

break

elif in L2 cache:

break

elif in L3 cache:

break

else:

go to main memory and get the data

\* Memory address structure

|     |                         |                                |
|-----|-------------------------|--------------------------------|
| Tag | Set<br><del>index</del> | word offset<br><del>word</del> |
|-----|-------------------------|--------------------------------|

① Direct map

② Full map

③ Set associative → can be 1-way, 2-way,  
4-way, 8-way set associative

Tag → denotes whether particular block is present  
in the one level cache or not.

→ if present, then find in cache

→ if not present, then find in next level.

|     |       |        |
|-----|-------|--------|
| Tag | index | offset |
|-----|-------|--------|

|||

|        |        |        |
|--------|--------|--------|
| Tag    | set    | word   |
| s bits | 1 bits | 2 bits |

- (1) Address length =  $(s + w)$  bits.
- (2) No. of address =  $2^{s+w}$  bytes or words.
- (3) Block size = line size =  $2^w$  words or bytes
- (4) No. of blocks in main memory =  $2^S$ .
- (5) No. of lines in set =  $k$ .
- (6) No. of sets =  $U = 2^d$
- (7) No. of lines in cache =  $m = kU = k \cdot 2^d$ .
- (8) Size of cache =  $k \cdot 2^{d+w}$  words or bytes
- (9) Size of tag =  $(s - d)$  bits.

Q7. A set associative cache consist of 64 lines of slots, divide into 4 lines set. Main memory contain 4K blocks of 128 words each. Show the format of main memory address.



$$\text{Main memory} = 4 \times 128 \text{ kb}$$

$$\therefore S = 12$$

$$\text{Block size} = 128$$

$$w = 7$$

|     |       |        |
|-----|-------|--------|
| Tag | index | offset |
|     |       |        |

|        |        |        |
|--------|--------|--------|
| Tag    | set    | word   |
| s bits | 1 bits | 2 bits |

- (1) Address length =  $(s+w)$  bits.
- (2) No. of address =  $2^{s+w}$  bytes or words.
- (3) Block size = line size =  $2^w$  words or bytes
- (4) No. of blocks in main memory =  $2^s$ .
- (5) No. of lines in set =  $k$ .
- (6) No. of sets =  $u = 2^d$
- (7) No. of lines in cache =  $m = ku = k \cdot 2^d$ .
- (8) Size of cache =  $k \cdot 2^{d+w}$  words or bytes
- (9) size of tag =  $(s-d)$  bits.

Q7 A set associative cache consist of 64 lines as slots, divide into 4 lines set. Main memory contain 4K blocks of 128 words each. Show the format of main memory address.



$$\text{Main memory} = 4 \times 128 \text{ kb}$$

$$4 \times 1024 = 2^S$$

$$\therefore S = 12$$

$$\text{Block size} = 128$$

$$2^W = 128$$

$$W = 7$$

$$\rightarrow \text{tag} = 12 - 2 = 10 \text{ bits}$$

\* Direct mapped

↳ (Block address) % (Blocks in cache)

# Blocks is a power of 2

# Use low  $n - m$  order address bits.

\* Full Map



Eg: 8-blocks, 1 word / block, direct mapped.  
Initial state:

| Index | $U$ | Tag | Data |
|-------|-----|-----|------|
| 000   | N   |     |      |
| 001   | N   |     |      |
| 010   | N   |     |      |
| 011   | N   |     |      |
| 100   | N   |     |      |
| 101   | N   |     |      |
| 110   | N   |     |      |
| 111   | N   |     |      |

2010123  
monday

## Co Lecture

\* Principle of Inclusion.

\* Direct Mapped Cache



Block mod off of  
miss rate is very high

\* Fully associative / Fully Mapped.



\* Set associative



Block mod # no of sets.

\* set associative

So it can go in any one of given sets.



12 bit address

16 bytes per block  $\Rightarrow$  4 LSB's used to determine the desired byte/word offset within the block

- \* Fully associative address scheme
- Any block from memory can be put in any cache block i.e. [no mapping scheme]

Condition: cache block where it is going should be empty.

When all blocks fitted then to accommodate new data  $\rightarrow$  either we can use FIFO or LIFO scheme to remove [it already a block from empty data from the block, and then place data there.]

[These dirty bit is also important]

★ LFU and LRU policies are also imp.

★ Also random is those replacement.

IF  $DF=1$ ,  
then store  
the data to  
be evicted into  
main memory  
and then evict  
that.

✓ IF  $DF=0$ ,  
directly evict  
it.

★ Block placement vs  
Block replacement

dirty bit and Tag  $\Rightarrow$  used to determine whether a data is in cache or not.

\* Direct mapping

↳  $b \bmod 4$

↓  
Block  
no.

↳ no. of blocks in  
cache.



|     |     |
|-----|-----|
| MB0 | 0%4 |
| MB1 | 1%4 |
| MB2 | 2%4 |
| MB3 | 3%4 |
| MB4 | 4%4 |
| MB5 | 5%4 |
| MB6 | 6%4 |
| MB7 | 7%4 |
| MB8 | 8%4 |

\* Direct mapping implementation



\* set Associative Mapping

\* Summary

→ ① Fully Associative

- Most flexible

- Longest search time  $O(N)$

② Direct Mapped cache

→ Least flexible (more evictions)      → more evict  
- shortest search time  $O(1)$       → more miss

\* Direct mapping

$L \mod 4$

Block no.

$\lfloor \frac{L}{\text{no. of blocks in cache}} \rfloor$



\* Direct mapping implementation

\* set Associative Mapping

\* Summary

→ ① Fully Associative

- Most flexible

- Longest search time  $O(N)$

② Direct Mapped cache

→ More evictions

- Least flexible (more evictions)

more

- shortest search time  $O(1)$ .

misses

### ③ K-way set associative

Compromise

- 1-way set associative
- N-way set associative.

Search time is  $O(k)$  usually small enough to be done in parallel  $\Rightarrow O(1)$ .

e.g. this problem is written on Pg no ③ at bottom

→

Here

16 block set main memory has  
will be formed. 4<sup>12</sup> blocks.

Block 0      S0 4 lines  
Block 1

Each block has  
128 words.

Block 0      S1 4 lines  
Block 1

① Therefore 4 bits are  
needed to identify set  
no.

+ 4K blocks

Block 0      S2 4 lines  
Block 1

② Main memory has  $2^{12} = 4096$   
blocks.

↳ so it generates 12 bits.

↓  
These set + tag length = 12 bits

no. of blocks  
in main memory

S1S 4 lines. Each block has 128 words

↳ so we need 7

for it

specifying word

| TAG    | SET    | WORD   |
|--------|--------|--------|
| 8 bits | 4 bits | 7 bits |

Q. Consider a machine with a byte addressable main memory of  $2^{16}$  bytes and block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used with this machine.

a) How is a 16-bit memory address divided into tag, line no. and byte no?

→ Main memory =  $2^{16}$  bytes → 16 bit address.

Here each block has 8 bytes.

$$\frac{2^{16}}{8} = 2^{13} \text{ blocks of main memory.}$$

$$\therefore \text{Set length} + \text{tag length} = 13 - 0.$$

Here cache has 32 lines. Here

| Tag    | Offset | Index  | Word Offset |
|--------|--------|--------|-------------|
| 8 bits | 5 bits | 3 bits |             |

$$32 = 2^5$$

are no. of lines in directly mapped.

b) Into what line would bytes with each of following will be stored?

① 0001 0001 0001 1011

$$\text{Tag} = 17$$

$$\text{Offset} = 3$$

$$\text{Word} = 3$$

↓

↓

this means

~~17th block~~

Line 3

a) consider a machine with a byte addressable main memory of  $2^{16}$  bytes and block size of 8 bytes. Assume that a direct mapped cache consisting of 32 lines is used with this machine.

b) How is a 16-bit memory address divided into tag, line no. and byte no?

→ main memory =  $2^{16}$  bytes → 16 bit address.

Here each block has 8 bytes.

$$\frac{2^{16}}{8} = 2^{13} \text{ blocks of main memory.}$$

∴ Set length + tag length = 13 → 0.

Here cache has 32 lines. Here

| Tag    | Offset | Word offset |
|--------|--------|-------------|
| 8 bits | 5 bits | 3 bits      |

$$32 = 2^5$$

↓  
are no. of lines in directly mapped.

b) Into what line would bytes with each of following will be stored?

① 0001 0001 0001 1011  
↓            ↓            ↓            ↓  
Tag = 17      offset = 5      word = 3.

↓  
this means Line 3

No. of bytes stored in cache block = No. of bytes stored in main memory block

Q) Suppose the byte with address 0001 1010 0001 1010 is stored in cache. What are the addresses of other bytes stored along with it.

d) How many total bytes of memory can be stored in the cache?

e) Why is the tag also stored in cache?

→ e) 0001 1010 0001 1010  
  |   |  
Line 3 word no. 2

↓

so word stored in this

0001 1010 0001 1000 (0th word)  
001 (1st word)  
010 (2nd word)  
011 (3rd word)  
100 (4th word)  
101 (5th word)  
110 (6th word)  
111 (7th word)

d)  $32 \times 8 = 256$  bytes.

32 lines } each line/block will store

8 bytes.

e] Because two items with two different memory addresses can be stored in the same place in the cache. The tag is used to distinguish between them.

## MSD Lecture

### Data Analytics steps



### Pre-processing steps:

$$s(x) = \frac{\bar{x} - \mu}{S_{\bar{x}}/\sqrt{n}} \quad n \rightarrow \infty \Rightarrow \text{For variance } \rightarrow \text{unknown}$$

Here  $s_x^2 = \frac{1}{n-1} (x_i - \bar{x})^2$

scribbles  
wednesday

CS Lecture

- a) considers a memory system that uses a 32-bit address to address at byte level, plus a cache that uses a 64-byte line size.
- b) Assume a direct mapped cache with a tag field in address of 20 bits. Show address format

$$\rightarrow \begin{array}{lll} \text{Block size} & = \text{Block} & \\ \text{in main} & \text{size} & = 2^6 = 64 \\ \text{memory} & \text{in cache} & \end{array}$$

No. of blocks is

$$\begin{array}{lll} \text{main memory} & = & \boxed{\text{tag + index}} \\ 32 & = & 20 \\ 2 & & 2 \end{array}$$

So:

| Tag     | index  | offset |
|---------|--------|--------|
| 20 bits | 6 bits | 6 bits |

b]. Assume an associative cache. Show the address format and determine following parameters

c]. 4 way set-associative cache with tag = 9 bits

| Tag           | index          | Word           | No. of lines in cache                                                              |
|---------------|----------------|----------------|------------------------------------------------------------------------------------|
| 9 bits<br>tag | 17 bits<br>Set | 6 bits<br>Word | No. of sets = $2^{17}$ No. of lines in cache = $2^{17} \times 128$<br>undetermined |

Soln of b]. Tag = 26 bits Word = 6 bits.

each set has  
4 lines

e.g. consider a computer with following characteristics: total of 1MB main memory, word size = 1 byte, block size = 16 bytes [one block has 16 words] and cache size of 64 kbytes.

d]. Give main memory address of F0010, 01234 and CABBE give all things for direct mapped cache.

$$2^{20} \rightarrow 1 \text{ Mb.}$$

↳ So address length = 20 address lines.

$$20 - 4 = 16$$

So there are 16 blocks in main memory.



For 01234.



For CABBE



① No. of block size in cache should always be equal to block size in main memory.

$$\# \text{ block in cache} = \frac{2^{16}}{2^4} = 2^{12}$$

$$\# \text{ of block in MM} = \frac{2^{20}}{2^4} = 2^{16}$$

$$\text{Tag} = \frac{2^{16}}{2^{12}} = 2^{(4)} \rightarrow \text{Tag}$$

b) Give any two main memory addresses with different tags that map to same cache slot for a direct-mapped cache



Indirectly mapped:

| Tag | Line | Offset |
|-----|------|--------|
|-----|------|--------|

For this question for 2 addresses.

Tag

↓

Line

↓

Offset

↓

Take different should

may or may  
not be same

Eg: 0FFF F and

3FFF I.

Q No. of blocks in main memory can be or cannot be equal to no. of blocks in main memory

d) For the main memory addresses of F0010 and CABBE, give the corresponding tag and offset values

→ As this is fully associative, hence there is no lines distinction.

→ F0010



CABBE



d) For the main memory addresses of F0010 and CABBE, give the corresponding tag, cache set and offset values for a two-way set associative

→ No. of blocks in cache =  $2^{12}$ .

Each Set has 2 blocks.

So total no. of sets =  $\frac{2^{12}}{2^1} = \frac{11}{2}$ .

For representing 11 we need 11 bits set tag. set 2 word



→ Similar to what we do  
in page replacement policy in OS

- \* Replacement algorithms.
  - ① FIFO and LIFO
  - ② LRU and LFU
  - ③ random

Page Fault ~ Block miss  
in cache  
in OS

↳ on miss, a new block must be brought in.

② this requires evicting a chosen block residing in the cache.

Q. A computer system uses 16-bit memory address [ $2^{16}$  addresses = 64KB memory]. It has a 2KB cache in a direct-mapped fashion with 64 bytes per cache block. Give a. calculate structure

↳  $2^{12} \rightarrow 12$  address lines in cache.

$$\frac{2^{12}}{2^6} = 25 \text{ blocks in cache.}$$

| tag    | lines  | offset |
|--------|--------|--------|
| 5 bits | 5 bits | 6 bits |

Memory Address Structure.

6) Initially cache is empty, for each of above  
indicate hit or miss.

128, 144, 2176, 2180, 128, 2176

3Ks → 

- ① cold or compulsory
- ② Capacity
- ③ Conflict
- ④ Coherence

 } Types of cache misses.

① For  $(128)_{10} = (0000 \ 0000 \ 1000 \ 0000)_2$ ,  
Tag Lines Offset

since initially cache was empty so it is a miss.

After this miss, tag field for cache block 00010 is set to 00000.

② For  $(144)_{10} = (0000 \ 0000 \ 1001 \ 0000)_2$ ,  
Tag Lines Offset

since tag field for cache block 00010 is 00000 before this access, this will be a cache hit.

③ For ~~(2176)~~  $(2176)_{10} = (0000 \ 1000 \ 10000000)_2$ ,  
Tag Line Offset

Here miss.

so line 00010 is not set to tag 00001

$$④ (128)_{10} = (\underbrace{0000}_{\text{tag}} \underbrace{0000}_{\text{line}} \underbrace{(000 \ 0000)}_{\text{offset}})_2$$

Here miss at 128 because at line 00010 tag was 00001 but it [due to 2176] the should what we want was 00000

so miss and now line 00010 has tag 00000.

$$⑤ (2176)_{10} = (\underbrace{0000}_{\text{tag}} \underbrace{1000}_{\text{line}} \underbrace{(000 \ 0000)}_{\text{offset}})_2$$

Here also miss.

As tag at block 00010 tag is 00000

↓  
So it is miss.

And now it will get updated.

26/10/23  
Thursday

## — AI Lecture

\* → Normal Form in FOL

Generalised modes ponen out:  
(Incomplete)

↑ First order logic

School 2/3  
Tuesday

## co Lecture

2. A set-associative cache consists of 64 lines of 64 slots, divided into four lines. Set. Main memory address contains 4K blocks of 128 words each. Show the format of main memory.



Q1] For hexadeciml main memory addresses 111111, 666666, BBBBAA, show following info in hexadecimal format:

a) Tag, line and word values for a direct-mapped cache using format of Fig 4.10.

| 6bits | 10bits | 2bitd | → Given |
|-------|--------|-------|---------|
| Tag   | Line   | Word  |         |



similarly you can do for 666666 and BBBBAA

Q2] A two way-set associative cache has lines of 16 bytes and a total size of 8KB. The 64MB main memory is byte addressable. Show format of main memory address.

→ Main memory = 64 KB

↳ So need 26 bit address.

Blocksize in cache = Block size in memory = 16 bytes

→ So requires 4 lines

$$\text{No. of blocks in main memory} = \frac{2^{26}}{2^4} = 2^{22}$$

$$\text{No. of blocks in cache} = \frac{2^3 \times 2^{10}}{2^9} = 2^9$$

|       |      |      |
|-------|------|------|
| 13bit | 1bit | 1bit |
|-------|------|------|

↳ wrong sol

Main memory = 64 MB  
 ↳ 26 bit address line.

Block size = Block size = 16 bytes.  
 in cache              in memory

$$\text{No. of blocks} = \frac{2^{26}}{2^4} = 2^{22}$$

$$= \frac{2^{13}}{2^4} = 2^9 = 512$$

∴ Set + tag =

Each set has 2 blocks.

↳ 256 set

↳ To represent this we need 8 lines.

|         |     |        |
|---------|-----|--------|
| 14 bits | set | 4 bits |
| 10 bits |     | offset |



cache coherence

↳ similar to dirty read, dirty write problem in the DBMS  
 ↳ Shared memory problem in OS.

Protocol to solve this: Cache coherence protocol

| Time step | Event               | CPU A's cache | CPU B's cache | Memory |
|-----------|---------------------|---------------|---------------|--------|
| 0         |                     |               |               | o      |
| 1         | CPU A reads x       | o             |               | o      |
| 2         | CPU B reads x       | o             | o             | o      |
| 3         | CPU A writes 1 to x | 1             |               | 1      |

Main memory

↳ State copy



{L1, L2} p  
 {L3} p



④ Cache  
coherence  
protocol

hardware based

software based

snooping based  
directory based

necessary conditions to keep memory system  
coherent:

① Read: p to X not followed by a write.  
p to X and write by another processor.

② Read: p to X followed by write: p to X  
and no write by another processor.

③ [Incomplete].

Snoopy Bus [Incomplete].  
[means checking]

Here we only send

p0  
load x

p1  
load x

invalidation

message.

Based

only we give  
↑ invalidation  
Invalidation msg  
Based

updation

Based

we also  
rectify/  
update the  
data

p0  
write #3,x  
p1

x=3

x=1

x=1

memory

Invalidate

x=1

memory.

How we update also



- \* Variants of snoopy bus Need to study this.
- MSI → Modify shared invalid
- MESI → Modify exclusive shared invalid
- MOESI → Modify own exclusive shared invalid
- MERSI → ~~Modify~~  
  MO SI

MESIF [used in Intel i7]

↳ forward exclusive shared invalid forward.

- a) Two processes require access to the same line of data from data memory. Processes have a cache and use the MESI protocol. Initially both cache are empty.

Figure below depicts the consequence



Data in shared memory is only for reading.

To modify it we need the permission of the cache controllers.

Then it will modify and broadcast the message.

Protocols

- invalidation based → mostly used nowadays
- update based [consumes lot of time and resources]

Invalid → line in the cache does not contain valid data.

We cannot use

this for processing → as it was recently updated by some cache

tmp

| valid flag | dirty flag | Tag | state |
|------------|------------|-----|-------|
|------------|------------|-----|-------|

- If we want to access modified data which is private property of the cache, then first convert it to shared state and then next process takes place.

private properties of cache

|                             | Modified           | Exclusive          | Shared                           | Invalid              |
|-----------------------------|--------------------|--------------------|----------------------------------|----------------------|
| valid cache                 | Yes                | Yes                | Yes                              | No                   |
| memory copy                 | out of date        | valid              | valid                            | -                    |
| copies exist in other cache | No                 | No                 | Maybe                            | Maybe                |
| a write to this line...     | does not go to bus | does not go to bus | goes to bus<br>and updates cache | goes directly to bus |

] Difference between consistency and coherency

Deals with multiple blocks of memory

Deals with single block of memory





 = dirty line copyback

## Bus generated events

means before modifying the block  
write it to the main memory

Q. A snapshot of state associated with blocks on 2 separate cores in a centralized shared memory system is shown below. In this system, cache coherency is maintained with an MSI snooping protocol. You can assume direct mapped cache.

| PC:     | Tag  | Data word 1 | Data word 2 | Data word 3 |
|---------|------|-------------|-------------|-------------|
| Block 0 | 1000 | 10          | 20          | 30          |
| Block 1 | 4000 | 500         | 600         | 700         |
| :       | :    | :           | :           | :           |
| Block N | 200  | 2           | 4           | 6           |

| Data word 4 | Cohesency state |
|-------------|-----------------|
| 40          | M               |
| 500         | S               |
| :           |                 |
| 8           | S               |

Q. If Po wants to write Block 0, what happens to its cohesency state?

↳ Remains in modified as it want to write in its own

5)

| PC:     | Tag  | DW1 | DW2 | DW3 | DW4 | Cohes stat |
|---------|------|-----|-----|-----|-----|------------|
| Block 0 | 1000 | 10  | 10  | 10  | 10  | I          |
| Block 1 | 4000 | 500 | 600 | 700 | 800 | S          |
| :       | :    | :   | :   | :   | :   |            |
| Block N | 2000 | 2   | 4   | 6   | 8   | S          |

b) If P1 writes to block 1, is block 1 on P0 invalidated? Why or why not?

soln 23

Monday

### DBMS Lecture

#### \* Dependency Preservation

↓

If we take union of decomposition's FD's  
then we should exactly get the original  
FD's set.

$$(f_1 \cup f_2 \cup f_3 \cup f_4)^+ = (F)^+$$

↓  
FD's of decomposition  
set

But very lengthy proof

- \* Testing for dependency preservation
- \* To check if a dependency  $\alpha \rightarrow \beta$  is preserved in a decomposition of R into  $R_1, R_2, \dots, R_n$  we apply the following test (with attribute closure done with respect to F).  
result =  $\alpha$

\* Testing for BCNF

\* testing DeComposition for BCNF.

↳ Not able to understand.

### CO Lecture

continuing question of Pg. no. 63 and 64

b) the blocks will not get invalidated as the tags are different.

c) If tags were same then it would have got invalidated

d) If PI brings in Block M for reading, and no other cache has a copy, what state is it cached in?

↳ It will be in shared state in MSI protocol.  
If it was in MESI protocol, then it would have been exclusive state.

Q.2] Action sequence form:

P#: <op> <addr> [<value>] w|R  
↓      ↳  
write    Read  
operation



Each iteration here is independent

Here <address> denotes memory address  
<value> indicates the value.

One diagram: This uses MSI protocol

| P0              | P1        | P3             |
|-----------------|-----------|----------------|
| coherency state | Add       | Date           |
| I               | 100 00 10 | B0 I 100 00 10 |
| S               | 108 00 08 | B1 S 128 00 08 |
| M               | 110 00 10 | B2 M 110 00 10 |
| I               | 118 00 10 | B2 S 118 00 10 |

Here memory diagram is  
remaining.

- a) [10] <5.2> P0: read 120 ↳ this address
- b) P0: write 120 ← 80
- c) P3: write 120 ←

→ Sol'n for a Here add's tag 120 is residing  
in P3

↳ so we don't need to bring it from main memory

In P3 120 is present in S state.

so no need to change its state, as  
it is already available for reading.

now when we are bringing it for reading,  
we must first replace some block from P0  
remove

to make space for this coming block

✓  
1st strategy:

replace the  
block with  
invalid state

↓  
2nd strategy:

replace the  
block with  
M state

→ 3rd strategy:

replace the  
block with  
S state

we follow this  
strategy

↓  
if there are many  
then replace the first  
invalid block.

so now: B O I 100 00 10

↓ Replaced by

B O S 120 00 20

sol for These we assume initial state and  $08^{th}$  are  
indep.  
b] Pg: write data at address 10 to add 120.

↓ so tag 120 is present in P3 in  
shaded [S] state.

First get it in the P0 state and

that too in modified state. cache  
for modify state seek permission from control

But now when we are bringing it for reading, we must first replace some block from P0, remove

to make space for this coming block.

✓  
1<sup>st</sup> strategy:  
replace the  
block with  
invalid state

↓  
2<sup>nd</sup> strategy:

Replace the  
block with  
M state

3<sup>rd</sup> strategy:  
Replace the  
block with  
S state.

we follow this  
strategy

↓  
IF there are many  
then replace the first  
invalid block.

So now: BO I 100 00 10

↓ Replaced by  
BO S 120 00 20

soln for These we assume initial state and 080100  
b] P0: write data off pages so to add 120. indepe

↳ so tag 120 is present in P3 in  
shaded [S] state.

↓  
First get it in the P0 state and

that too in modified state, cache  
for modify state seek permission from control

Here also we remove first invalid block from P0 and then bring this new block in new state modified.

↓  
And now send the invalidate message for that tag

↓  
So whenever tag 120 is present [except P0] mark them to invalid [I] state.

so in  
for P3: write so to add 120.

↳ Already 120 is present in P3 tag

↓  
So directly convert it to modify state and write so to add & tag 120 location

↓

No need of seeking permission from cache controller.

d] P1: read 110

e] P0: write 108  $\leftarrow$  48

f] P0: write 130  $\leftarrow$  78

g] P3: write 130  $\leftarrow$  78

→

→ main foo d] Read add's tag10 from p1

L, but add's tag10 is not present in p1

Now search in other caches

↓  
it is also

It is present in local cache P1 but  
in invalid state.

↓

So now search in other caches  
Here it is present in P0 in M state

✓ P3 in I state

We will not use this as  
this is in I state.

Now read from P0 share cache

↓

First convert P0 110 add's tag into  
share state, bring it into local  
cache [P1] and replace it with P1 110  
add's tag which is in I state.

e. P0: write 48 to add's tag 108.

It is present in local cache in I state.

Seek it in the permission of cache controller  
to change it to M state.

Modify it bring it and replace it with  
some invalid block and then broadcast

f) P0: write 130 ← 78

↳ Here 130 add~~s~~ tag is not present in  
the cache.

↓

so seek it from the main memory  
in 's' state and side back  
recently used block in P0 back to  
modified memory.

↓

then seek permission from cache  
controller to convert it to m-state  
and write 78 to that location

↓

Here no need of ~~modification~~ invalid  
broadcasting state as we seeked this from the  
memory

g) P3: write 78 to add~~s~~ tag 130 location

↳ Here 130 add~~s~~ tag is not present in  
any of cache

↓

so seek it from the main memory  
in 's' state and side back

the invalid state block back to  
memory.  
First

then seek permission from cache controller  
to convert it to M state and write  
LR to that location

↓  
Here no need of invalidation or invalid  
broadcasting as we seeked from the  
memory

\* Average Access Time

$$\rightarrow \text{AMAT} = \text{Hit Time} + \text{Miss rate} \times \text{Miss penalty}$$

↑ time required  
for hit

e.g. CPU with 1ns clock, hit time = 1cycle, miss  
miss penalty = 20 cycles, L-cache miss rate = 5%

↓  
when data is not  
available in upper  
hierarchy, it then refers  
to PLB and then goes  
to next cache hierarchy!

In this process whatever clock cycles  
are wasted, it is miss penalty.

$$\text{AMAT} = 1 + 0.05 \times 20 = 2 \text{ ns}$$

∴ 2 cycles per instruction



→ modified CPU time

$$\text{CPU time} = \left( \frac{\text{CPU execution}}{\text{clock cycles}} + \frac{\text{Memory stall}}{\text{clock cycles}} \right)$$

+ modified average access time

$$= \frac{\text{Hit time of L1}}{\text{miss rate of L1}} + \left( \frac{\text{Hit time for L2}}{\text{miss rate of L2}} + \frac{\text{miss penalty L2}}{\text{miss rate of L2}} \right)$$

30/10/23  
Monday

## Robotics Lecture

ESE → many numericals

↳ based on trajectory

29/08/23 [in this part I have missed]

\* Automation and Robotic Economics

① Basic components of robotic industrial

- controllers
- Manipulators
- Power supply
- End effectors

01/11/2023  
Wednesday

\* z-test

↳ used when variance is known

### co Lecture

\* DRAM Refreshing techniques

- ① RAS only refresh
- ② Hidden refresh
- ③ Distribute refresh.

- 
- ① RAS means it will refresh cells after a particular period. All cells in row get refreshed at a time. Also known as first refresh
  - ② Once if we tried a particular location, then RAS is made high. The reading data sight is not disturbed, but other rows are refreshed at same time.
  - ③ RAM is not available for read or write during refresh. So first refresh, then wait then again refresh. In wait, read or write can be possible

- \* In chip present on DDR, first bit is address, one side is banks and second is banks.
  - \* On each RAM, there are 8 chips that are simultaneously working.
  - \* DRAM Dual inline memory module
  - \* Channel
  - \* DIMM
  - \* RANK RANK
  - \* CHIP
  - \* BANK
  - \* Row column.
- (Imp)
- RAM → Has n<sub>1</sub> → Each bank → Each bank has n<sub>2</sub> chips → Each chip has n<sub>3</sub> banks
- Each bank has n<sub>4</sub> rows and n<sub>5</sub> columns
- \* Channel is socket where DDR gets inserted.
  - \* DIMM is a module.
  - \* RANK 0 and RANK 1 are two sides and on each side chips are present.
  - \* On each side [RANK] 8 chips are present.
  - \* Each chip consists of 8 banks.
  - \* In bank there are row and column addresses.



CS (0=1)  
chip select (R0 OR R1)

\* 8 chips on a bank



\* Rows and columns  
in DRAM.

0-7

- while getting 64-bit data from SDRAM, 8-bit is taken from each chip.
  - Here the activated bank no. for each chip is same.
  - Also row and col<sup>n</sup> no. of activated bank in each chip is same.
  - For 8 bits from each bank, 8 column of that bank are used.
- e) Assume DDR with storage = 1GB  $\Rightarrow$   
consist of 1 RAMX, 8 chip, 8 banks per chip  
If rows are 14, find no. of columns
- $1\text{GB} = 2^{30}$  bytes. one location  $\rightarrow$  1 byte

So  $2^{30}$  locations.

$$8 \times 8 \times 2^x \times 14 \times 2^x = 2^{30}$$

↓      ↓      ↓      ↓      ↓  
 No. of banks    No. of    No. of    No. of  
 chips    per chip    rows in    columns  
 chip    columns    one chip    in  
 rows    one chip    bank    one bank.

$$x = 30 - 14 - 6$$

$$x = 10$$

∴ so 10 columns



$2^{32}$  bytes

Q. 2] storage capacity = 4GB.

2 ranks, 8 chips per rank, 8 banks per chip.  
column = 10, find rows?

$$2^1 \times 2^3 \times 2^3 \times 2^{10} \times 2^x = 2^{32}$$

$$x = 32 - 10 - 3 - 3 - 1$$

$$= 15$$

$$x = 15.$$

\* Error correction.  
Imp

one relation

$$2^{k-1} \geq M+k$$

This inequality gives the no. of bits...

[Incomplete]

M = data length

k = syndrome length.

- 1) If syndrome is all 0's  $\Rightarrow$  No errors.
- 2) If syndrome contains one and only one bit set to 1, then an error is occurred in one of the 4 check bits. No correction is needed.

- ④ If syndrome contains more than one bit set to 1, then the numerical value of the syndrome indicates the position of the data bit in error. This data bit is inverted.

The check bits are calculated as follows:

'⊕' → denote exclusive-OR.

$$C_1 = D_1 \oplus D_2 \oplus D_4 \oplus D_5 \oplus D_7$$

$$C_2 = D_1 \oplus D_3 \oplus D_4 \oplus D_6 \oplus D_7$$

$$C_4 = D_2 \oplus D_3 \oplus D_4 \oplus D_8$$

$$C_8 = \quad \oplus D_5 \oplus D_6 \oplus D_7 \oplus D_8$$

Each check bit operates on every data bit whose position number contains a 1 in the same [Incomplete].

a) Data = 00111001  
 $\begin{array}{c} \downarrow \\ D_8 \end{array}$        $\begin{array}{c} \downarrow \\ D_1 \end{array}$

$C_1 = D_1 \oplus D_2 \oplus D_4 + D_5 \oplus D_7$

$$= 1 \oplus 0 \oplus 1 \oplus 1 \oplus 0$$

$$= 1$$

$$C_2 = D_1 \oplus D_3 \oplus D_4 \oplus D_6 \oplus D_7$$

- ② If syndrome contains more than one bit set to 1, then the numerical value of the syndrome indicates the position of the data bit in error. This data bit is inverted.

The check bits are calculated as follows:

$\oplus$  → denote exclusive-or.

$$C_1 = D_1 \oplus D_2 \oplus D_4 \oplus D_5 \oplus D_7.$$

$$C_2 = D_1 \oplus D_3 \oplus D_4 \oplus D_6 \oplus D_7.$$

$$C_4 = D_2 \oplus D_3 \oplus D_4 \oplus D_8.$$

$$C_8 = D_5 \oplus D_6 \oplus D_7 \oplus D_8.$$

Each check bit operates on every data bit whose position number contains a 1 in the same [Incomplete].

e.g. Data = 00111001

$$\begin{array}{c} \downarrow \quad \downarrow \\ D_8 \quad D_1 \end{array}$$

$$\rightarrow C_1 = D_1 \oplus D_2 \oplus D_4 + D_5 \oplus D_7$$

$$= 1 \oplus 0 \oplus 1 \oplus 1 \oplus 0$$

$$= 1$$

$$C_2 = D_1 \oplus D_3 \oplus D_4 \oplus D_6 \oplus D_7$$

$$= 1 \oplus 0 \oplus 1 \oplus 1 \oplus 0 = 1.$$

$$c_4 = d_2 \oplus d_3 \oplus d_4 \oplus d_8$$

$$= 0 \oplus 0 \oplus 1 \oplus 0 = 1.$$

$$c_8 = d_5 \oplus d_6 \oplus d_7 \oplus d_2$$

$$= 1 \oplus 1 \oplus 0 \oplus 0 = 0.$$

→ Suppose now  $d_3$  sustains an error and changes from 0 to 1, so re-calculated check bits.

$$c_0 = 0 \quad c_1 = 1$$

$$c_2 = 0 \quad c_3 = 0$$

And now

$$\begin{array}{cccc} c_8 & c_4 & c_2 & c_1 \\ 0 & 0 & 0 & 1 \\ + & 0 & 1 & 1 & 1 \\ \hline 0 & 1 & 1 & 0 \end{array}$$

↳ this means  $d_3$  is erroneous  
by diagram given by Sir J.

SPPA  
2nd  
Semesters

SGPA

OF 3rd

Semester

CGPA of

4th semester

predict their CGPA of 4th semester  
design 3 layers neural network

### Co Lecture

July 3  
Thursday

- Hamming Parity Code → Hamming bits are added at powers of 2.  
Assuming 4 bits

1101

D<sub>3</sub> D<sub>2</sub> D<sub>1</sub> D<sub>0</sub>

|                |                |                |                |                |                |                |
|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| D <sub>3</sub> | D <sub>2</sub> | D <sub>1</sub> | P <sub>4</sub> | D <sub>0</sub> | P <sub>2</sub> | P <sub>1</sub> |
| 1              | 1              | 0              | P <sub>4</sub> | 1              | P <sub>2</sub> | P <sub>1</sub> |

→ 0/P = 100 \* s.

For P<sub>1</sub> → start from P<sub>1</sub> and pick one and skip one.

D<sub>3</sub> D<sub>2</sub> D<sub>1</sub> D<sub>0</sub> P<sub>1</sub>

1 0 1 P<sub>1</sub>

Take P<sub>1</sub> such that no. of ones are even.

1010.

similarly go for P<sub>2</sub>, P<sub>4</sub>, P<sub>8</sub>...

④ Then for P<sub>2</sub>:

Start from P<sub>2</sub> and go like pickup 2, skip.  
Also pick P<sub>2</sub> such that  
it is completely  
has even no. of  
1's.

D<sub>3</sub> D<sub>2</sub> D<sub>0</sub>, P<sub>2</sub>  
| | | |

⑤ Then for P<sub>4</sub>:

start from P<sub>4</sub> and go like pickup 4, skip.

D<sub>3</sub> D<sub>2</sub> D<sub>1</sub> P<sub>4</sub>  
| | 0 0

↳ Keep no. of 1's even.

Ans

11 0 0 1 1 0

If D<sub>3</sub> bit is getting corrupted due to some reason, so it got changed from 1 to 0.

Original: 11 0 0 1 1 0

Corrupted: 01 0 0 1 1 0

1<sup>st</sup> check

~~0010~~  $\Rightarrow$  Failed the condition  
of even no.  
of ones.

7<sup>th</sup> bit is getting corrupted due to some reason, so it got changed from 1 to 0.

[0101] Now on find  $P_4^{'}, P_2^{'}$  and  $P_1^{'}$  we get

$$P_4^{'} = 1, P_2^{'} = 0 \text{ and } P_1^{'} = 1$$

take XOR of old and new bits

$$\begin{array}{r} 010 \\ \oplus 101 \\ \hline 111 \end{array}$$

$\Rightarrow$  so 7<sup>th</sup> bit is an error.

i.e D<sub>3</sub> is an error.

↓  
so flip it.

[1101]  $\rightarrow$  so D<sub>3</sub> bit is corrected.

\* CRC — cyclic Redundancy check

Eg: Let message be 110011.

$$P(x) = x^4 + x^3 + 1$$

$$= 1x^4 + 1x^3 + 0x^2 + 0x + 1$$

$$= 11001$$

$$\begin{array}{r}
 & 1 & 0 \\
 11001 & \boxed{1} & 1 & 0 & 0 & 1 & 1 \\
 & 1 & 1 & 0 & 0 & 1 & \downarrow \\
 \hline
 & \cancel{\phantom{0}} & 0 & 0 & 0 & 0 & 1 \\
 & & 0 & 0 & 0 & 0 & 0 \\
 \hline
 & \cancel{\phantom{0}} & 0 & 0 & 0 & 1 & \\
 \end{array}$$

Wrong.

$$\begin{array}{r}
 1000001 \quad \text{→ Wrong.} \\
 \hline
 11001 \quad \boxed{1} & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
 \oplus & 1 & 1 & 0 & 0 & 1 & \downarrow \\
 \hline
 & \cancel{\phantom{0}} & 0 & 0 & 0 & 0 & 1 \\
 \oplus & 0 & 0 & 0 & 0 & 0 & \downarrow \\
 \hline
 & \cancel{\phantom{0}} & 0 & 0 & 0 & 1 & 0 \\
 \oplus & 0 & 0 & 0 & 0 & 0 & \downarrow \\
 \hline
 & \cancel{\phantom{0}} & 0 & 0 & 1 & 0 & 0 \\
 \oplus & 0 & 0 & 0 & 0 & 0 & \downarrow \\
 \hline
 & \cancel{\phantom{0}} & 0 & 1 & 0 & 0 & 0 \\
 & & 0 & 0 & 0 & 0 & 0 \\
 \hline
 & \cancel{\phantom{0}} & 1 & 0 & 0 & 0 & 0 \\
 & & 0 & 0 & 0 & 0 & 0 \\
 \hline
 & & 1 & 0 & 0 & 0 & 0
 \end{array}$$

Wrong.

$$\begin{array}{r}
 \hline
 & \cancel{\phantom{0}} & 0 & 0 & 0 & 0 & 1 \\
 \oplus & 1 & 1 & 0 & 0 & 1 & \downarrow \\
 \hline
 & \cancel{\phantom{0}} & 0 & 0 & 0 & 0 & 1 \\
 \oplus & 0 & 0 & 0 & 0 & 0 & \downarrow \\
 \hline
 & \cancel{\phantom{0}} & 0 & 0 & 0 & 1 & 0
 \end{array}$$

$$\begin{array}{r}
 \hline
 & \cancel{\phantom{0}} & 0 & 0 & 1 & 0 & 0 \\
 \oplus & 0 & 0 & 0 & 0 & 0 & \downarrow \\
 \hline
 & \cancel{\phantom{0}} & 0 & 0 & 1 & 0 & 0
 \end{array}$$

$$\begin{array}{r}
 \hline
 & \cancel{\phantom{0}} & 0 & 1 & 0 & 0 & 0 \\
 & & 0 & 0 & 0 & 0 & 0 \\
 \hline
 & \cancel{\phantom{0}} & 1 & 0 & 0 & 0 & 0 \\
 & & 0 & 0 & 0 & 0 & 0 \\
 \hline
 & & 1 & 0 & 0 & 0 & 0
 \end{array}$$

$$\begin{array}{r}
 100001 \\
 \text{---} \\
 100 | 1100110000 \\
 \oplus 1100110000 \\
 \hline
 \cancel{0}00001 \\
 \oplus 00000 \\
 \hline
 \cancel{0}000100 \\
 \oplus 00000 \\
 \hline
 \cancel{0}01000 \\
 \oplus 00000 \\
 \hline
 \cancel{0}00000 \\
 \oplus 11001 \\
 \hline
 \cancel{0}1001
 \end{array}$$

msg = 1100111001  
 data      remainder

[Revise from DC].



④ data  
 $m_{sg} = 101101$   
 generator  $f(x) = x^3 + x^2 + x + 1$   
 $= 1101$

→ For n-bit polynomial add  $(n=0)$  bits to data

$$\begin{array}{r}
 1101 \\
 \hline
 1011011000 \\
 \oplus 1101 \\
 \hline
 01100 \\
 \oplus 1101 \\
 \hline
 00011 \\
 \oplus 0000 \\
 \hline
 00111 \\
 \oplus 0000 \\
 \hline
 01110 \\
 \oplus 1101 \\
 \hline
 00110 \\
 \oplus 0000 \\
 \hline
 01100 \\
 \oplus 1101 \\
 \hline
 0001
 \end{array}$$

↑

coefficient for generator polynomial  
generator polynomial should not get divided  
by  $x$  but not divided by  $(x+1)$

- \* Secondary storage → read about info
- \* Magnetic Read and write mechanisms.
- \* Disk Data Layout
- \* constant Angular velocity
- \* Multiple zones.
- \* Winchester Disk Format
- \* Physical characteristics of disk systems.  
Head Motion      Platters

Disk Portability      Head Mechanism

sides

- \* Disk classification
- \* Typical Hard Disk Drive Parameters
- \* Timing of a disk I/O transfer.
- \* Disk Performance Systems Parameters.

2nd year  
Saturday

### (Co Lecture)

- \* RAID → Redundancy Array of Independent Disks
- ↳ With respect to RAID 3, RAID 4, RAID 5 [width interleaving parity]

- ① Using one parity bit for strips
- ↳ So first strip from all 4 disks parity is calculated, and parity is kept at 5th hard-disk in first position location.

- \* RAID 4 [Block Level Parity]

- ① The strip size is very large
- ② 1st parity for each block in same index in each hard-disk is calculated separately
- ③ After it we have 4 parity bits.
- ④ These 4 parity bits are stored in 5th hard-disk with same index as strip or block.

- \* RAID 5 [Block Level Distributed Parity]

- ① Like RAID 4, parity strip access all disk.
- ② Round Robin allocation for parity strip.

- ③ Avoid RAID 4 bottleneck at parity disk.

Commonly used in network services.

- ④ RAID 6.

- \* RAID 6 [Dual Redundancy]

two parity calculations.  
stores different blocks on different disk  
uses requirements of RAID requires  
2+2 disks.  
High data availability.

\* RAID comparison → CCAPD Table 6.47.

\* Comparison of SSD and Hard Disk Drives.

\* SSD architecture \* Practical Issues of SSD that are not in HHD.

MBR and GPT disk

↓  
Master Boot Record.

↳ Its boot process

① System initialisation with firmware system  
BIOS [Basic I/O System].

② BIOS looks for boot loaders on MBR,  
then executes it.

④ Boot loader reads the partition table

FAT, NEFTS

Ext4, Ext3

↓

file system  
in windows

↓

Ubuntu file  
systems

⑤ Conventional Windows/DOS MBR search  
for one primary partition.

④ Load the OS.

- ① At power on  $CS = FFFFH$  and  $IP = 0000H$
- ② ROM BIOS present at  $FF00H$ . Once we power on the reading on ROM BIOS starts
- ③ While reading the ROM BIOS, it is made to read first sectors of HDD.
- ④ Very first sectors of HDD = boot sectors.

[ $CS:0000H$ ,

loaded from  
 $07C00H$ .

$IP: 7C00H$ ]

- ⑤ In this the very first sectors 512 bytes there is small boot program.

↳ Boot program finds out on HDD where is the active partition.

- ⑥ Active Partition means that OS is need to be loaded.

\* MBR

- i. The MBR that is located on first sector of hard-disk.





|           |             |                                 |
|-----------|-------------|---------------------------------|
| 000 - 445 | 000H - 1BDH | 445 bytes code area.            |
| 446 - 509 | 1BEH - 1FDH | 64 bytes master partition table |
| 510 - 511 | 1FEH - 1FFH | 2 bytes boot record signature   |

partition table breakdown:

|           |           |              |
|-----------|-----------|--------------|
| 446 - 461 | 1BE - 1C0 | 16 Bytes #1  |
| 462 - 477 | 1CE - 1D0 | 16 Bytes #2  |
| 478 - 493 | 1DE - 1E0 | 16 bytes #3. |
| 494 - 509 | 1EE - 1F0 | 16 bytes #4  |

→ 0 1B Boot Indicators [SOH]

1-3 3 Bytes starting sectors [CHS]

4 1 Bytes Partition type

5-7 3 Bytes Ending sectors [CHS]

8-11 4 Bytes starting sector [LBA]

12-15 4 bytes indicate partition size  
in sectors

→ i) If it is MBR hard disk, atmost 4 primary partitions possible.

ii) If more partitions are needed then one of primary partition is used as extended partition

iii) The part is :



q) In extended partition there may be linked list structure of extended partitions.

If anyone of all link is destroyed, then we may lose all extended partitions.

\* LBA → 32 bytes

∴  $2^{32}$  max sectors can be pointed by LBA.

Each sector 512 bytes.

$$\therefore 2^{32} \times 2^9 = 2^{41} \text{ Hence } = 2^{40} \times 2^1$$

Hence the entire capacity  $2^{40} - [1TB = 2^{40}]$

FAT = File Allocation table.



Loc 32 bit FAT using INT<sup>13</sup> as extension.

more types:

- ① 0E using 16 bit FAT using INT13 extension.
- ② 82 linux swap partition
- ③ 83 linux native file system.
- ④ EE GPT protective MBR
- ⑤ EFI/UEFI System partition.
  - ↳ globally uniquely ID.
- \* GPT [GUID partition table].
- initially intel came with IDR, that will remove limitation of 2TB disc.
- the proposed model is Extensible firmware interfaces [EFI].
- 3] EFI IDR is accepted by all industries.

Now it is UEFI

Unified extensible  
firmware...]

GPT → default partition style for X64 based systems.

↓  
USES EFI to boot drives.

| Protective MBR | GPT Headers | Partition Table | Partition Areas | Backup Area |
|----------------|-------------|-----------------|-----------------|-------------|
|----------------|-------------|-----------------|-----------------|-------------|

- uses primary and partition backup partition table to improve integrity
- supports up to 128 partitions.



- 1. First sector is protective MBR.
- 2. Next sector is GPT header.
- 3. Next few sectors are LBA's.
- 4. Next starts actual partition area.
- 5. At end, few are backup area.

Backup of primary GPT header & present in  
secondary GPT header.

④ LBA 2 - 33 if upper part is copied to  
LBA 2' - 33' in lower part, near secondary  
GPT for backup.

at max 2<sup>8</sup> partitions and each sector store  
information about 4 partitions i.e. LBA 2-33

$$\frac{128}{4} = 32 \text{ sectors are required.}$$

⑤  $\frac{512}{4} = 128$  bytes [Storage for each partition  
info].

⑥ After LBA - 33, actual position starts.

### Virtual Memory organisation

process  $\leftrightarrow$  Cache

Cache  $\leftrightarrow$  main memory      One block contain multiple words

Cache  $\leftrightarrow$  main memory

One page contain multiple block

One frame contain multiple pages

~~Cache~~ main memory  $\leftrightarrow$  Secondary memory.

Main memory to Secondary memory we  
don't have the logic of write through.

As access time is much  
higher.



Backup of primary GPT header present in secondary GPT header

② LBA 2 - 32 in upper part is copied to LBA 2' - 32' in lower part, near secondary GPT for backup.

③ At max 32 partitions and each sector store information about 4 partitions i.e. LBA 2-33

$$\frac{128}{4} = 32 \text{ sectors are required.}$$

$$\textcircled{4} \quad \frac{512}{4} = 128 \text{ bytes [storage for each partition info]}$$

⑤ After LBA - 33, actual partition starts.

### \* Virtual Memory organisation

→ process  $\leftrightarrow$  Cache

Cache  $\leftrightarrow$  main memory

One page contains multiple blocks

One frame contains multiple pages

Each main memory  $\leftrightarrow$  Secondary memory

In Main memory to Secondary memory we don't have the logic of write through.

As access time is much higher.

- CO 62 | Basics on P&M  
Date: [REDACTED]
- \* Virtual Memory Address Translation
    - ↳ Page Fault, Virtual Address, Physical Address
  - ④ The unit that decides/moves from Secondary memory to Main memory or vice versa is called the page.
  - \* TLB - inner core is TLB
    - [web.cs.wpi.edu]
  - \* TLB → Translational lookup buffer.
    - ↳ aside special type of n-way associative cache

MMU → Memory Management Unit

- ↳ mapping of logical address to

### CO Lecture

Skills

Monday

\* Virtual Memory

→ Use main memory as a "cache" for secondary (disk) storage.

- ↳ Managed by CPU hardware and OS.

Programs share main memory.  
Each gets a private virtual address space holding its frequently used code and data.

CPU and OS translates virtual addresses to physical addresses.

- ① VM block is called a page
- ② VM translation "miss" is called a page fault.



page = fixed sized unit which moves from secondary to primary memory.

RAM consists of some blocks whose entries are in TLB.

Page

P → consist info about blocks which resides in the main memory.

Page exchange



for this scenario, main memory acts as the cache.

- \* Address Translation
- Fixed - size pages [Eg: 4K]



### \* Page Fault

↳ On page fault, the page must be fetched from disk.

↳ Takes millions of clock cycles.

↳ Handled by OS.

### \* Page Table

*↳ resides in the main memory*

↳ Stores following info:

① Contains page table entries, indexed by virtual page numbers.

② Page table register in CPU points to page table in physical memory [RAM]

page is present in main memory:  
① PTE stores physical page no.  
② also stores other status bits like dirty,  
valid, etc.

translating using a Page Table

Mapping Pages to storage \* Replacement and writes  
① LRU replacement policy

Here we have various bits

- a) Reference bit in PTE set to 1 on access to page.
- b) Periodically cleared to 0 by OS.
- c) [Incomplete].

Fast Translation using SLB

Address translation would appear to require extra memory references.

- a) one to access the PTE.
- b) Then the actual memory access.

But access to page tables has good locality.

- a) So use a fast cache of PTE's within the ~~buffer~~ CPU.
- b) Called a TLB [Translational look-aside Buffer]

- \* TLB Misses
  - \* TLB and Cache Interaction
  - \* Use of swapping      → Dynamic Partitioning
  - \* Allocation of Free Frames
  - \* Logical and Physical Address
    - Logical Address
    - Physical Address
      - Page numbers
      - relative address within page
    - relative address within frame



- \* Virtual Memory and demand Paging
  - X86-based processors makes use of this

① Each page of a process is brought in only when it is needed.

## Principle of locality

- 4. Find First in local storage, if present use it else triggers the page fault if it is not present in local storage.

## concept of thrashing



$$\begin{aligned}1\text{ KB} &= 2^{10} \text{ Bytes} \\1\text{ MB} &= 2^{20} \text{ Bytes} \\1\text{ GB} &= 2^{30} \text{ Bytes} \\1\text{ TB} &= 2^{40} \text{ Bytes}\end{aligned}$$

class  
Monday

co lecture

- \* TLB and cache operations
- i) multiple programs can be executed simultaneously
- ii) Any program / software that requires more memory than available in main memory can also be serviced.
- \* Segmentation
  - ① visible to programmers.
  - ② Hence programmes can think of various segments in address space.
  - ③ Includes hardware for both segmentation and page.





48 - 000000

- 7. Each virtual address
  - ↳ 16 bit segment reference.
  - ↳ 32 bit offset
- ↳ unsegmented virtual memory =  $2^{32} = 4\text{GB}$
- ↳ segmented Virtual memory =  $2^{48} = 64\text{TB}$
- 8. physical address space employs a 32 bit address for a max of 4GB.
- 9. virtual address space is divided into two parts:
  - a) one-half is global, shared by all processes.
  - b) the remainder is local and is distinct for each process.

- \* segment protection
- \* associated with each segment are two forms
  - ↓      ↗
  - privilege      Access
  - level      attribute

- \* Intel x86 Memory Management Formats.
- \* Address Translation Mechanisms.

— X Memory Management Ends X —

- \* Pipelining starts \*
- \* Instruction cycle state diagram
- \* Data-flow, fetch cycle



[Incomplete Diagram]

- ① The stages in instruction processing include fetch the instruction [IF], decode the instruction [ID], calculate operand address, fetch the operand, execute instruction [IE], write result [WR]

Q.1] Total Students = 170

Total answerbooks = 500 → Each answer sheet has 5 questions.  
Avg 5min/ans book question

$$500 \times 5 = 2500 \text{ mins}$$

for one question → 5 min  
one.

for ansbook → 25 min

$$\text{for all papers} \rightarrow 25 \times 500 \\ = 12500 \text{ mins.}$$

was case when only individual examiner  
is checking the papers.

|    | Ex1  | Ex2  | Ex3  | Ex4  | Ex5 |
|----|------|------|------|------|-----|
| t1 | A1-1 |      |      |      |     |
| t2 | A2-1 | A1-2 | A1-3 |      |     |
| t3 | A3-1 | A2-2 | A2-3 | A1-4 |     |

If 5 examiners are working in pipelined way i.e 1<sup>st</sup> examiner checks only 1<sup>st</sup> que and give to next examiner

Initially at starting, after 25 min A1 will be completely assessed.

then after every 5min, next answer book is completely assessed.

$$\text{Total time } t = 25 + 5 \times 4 = 25 + 20$$

↓  
First  
question  
will take  
25 mins

Pipeline → allows to improve overall throughput.

$$\text{formula} = T + (n-1) \left( \frac{I}{K} \right) \quad ①$$

for time  
using  
pipeline

\*  $T_{\text{regulates}} = nT - \Theta$   
↓  
without  
using pipeline

$$\begin{aligned} * \text{ Speed up} &= \frac{T_{\text{old}}}{T_{\text{new}}} = \frac{nT}{T + (n-1) \left( \frac{T}{K} \right)} \\ &= \frac{n}{1 + \frac{n-1}{K}} \\ &= \frac{nK}{nK - 1} \end{aligned}$$

### \* Pipelining for the instructions



First in

completion of 1st instruction in 6 clock cycles.  
After that one clock cycle each another instruction will be completed.

for executing instructions → need 10 clock cycles  
↳ But all this is in ideal scenario.

But there are many dependencies:

a) Flow dependence

↳ A statement  $s_2$  is flow-dependent on statement  $s_1$  if an execution path exists from  $s_1$  to  $s_2$  and if at least one o/p of  $s_1$  is feed as an i/p to  $s_2$ .

b) Antidependence

↳ statement  $s_2$  is antidependent on statement  $s_1$  if  $s_2$  follows  $s_1$  in program order and if output of  $s_2$  overlaps the input to  $s_1$ .

c) O/P dependence    e) Unknown dependence

d) I/O dependence

Two statements are o/p dependence if they produce (write) the same o/p variable.

Read and write are I/O statements. I/O dependence occurs not because the same variable is involved but because the same file is referenced by both I/O statements.

Hazards:

- ① RAW → read after write
- ② WAR → write after read
- ③ WAW → write after write

\* Data dependence in programs:

- S1: Load R1, A
- S2: Add R2, R1
- S3: Move R1, R3
- S4: Store B, R1

- ① S1 and S2 flow dependant.
- ② S2 and S3 are o/p dependant.
- ③ S2 and S3 are anti-dependant

clu123  
monday

### Robo Lecture

\* Robot languages and programming

WAVE and AL

question concome  
on this in ESE

① VAL

② AML

③ MCL

④ RAIL

⑤ AUTOPASS

① textual  
robot language  
User-friendly  
Provides arm movement  
in joint, world and tool  
rates, gripping and speed control

11/12/23

Wednesday

[collected]

- \* Branch predictions.
- \* unconditional branching



\* Decode stage or  
the pipeline

- \* Instruction address generator:



[incomplete diagram]

- \* Branch delay slot

Add R7, R8, R9

Branch if [R3 == 0] Target

I<sub>j+1</sub>

→ means finding target address

IF we calculate this in encode stage then branch penalty is more than that as compared to calculate in decode stage

- a) original sequence of instructions containing a conditional branch instruction.

Branch if  $P_0 == 0$  Target  
add R7, R2, R9  
 $i_{j+1}$

target:  $i_k$

b) placing the add instruction in the branch delay slot where it is always executed.

↳ Delayed branching technique

\* Branch delay slot

↳ Here place a instruction which is independent of whether branch is taken or not.

And the execution of this instruction is compulsory

↓  
↳ becoz it depends on branch condition satisfaction.

↳ While choosing this ensure that data dependancies are retained (preserved)

↓  
If no such independent instruction exist, then we insert 'NOP' in it.

↳ [No operation]

- \* control hazard (branch hazard)
  - i) occurs when pipelines make wrong decision on branch prediction.
  - ii) Dealing with branches
    - a) Multiple streams
    - b) Prefetch branch target
      - L1 prefetch both data and instructions.
    - c) Branch prediction
    - d) Delayed branch



- \* Multiple streams
  - \* Branch prediction
  - \* Loop buffers.

### \* Branch prediction

Approach 1  
Predict never taken

Approach 2

Predict always taken

Approach 3  
Predict by Opcode

These are static approaches

DO not depend on execution history.

$\leftarrow$  Branch

Approach 4: Taken/not taken switch  $\rightarrow$  approaches use  
Approach 5: Branch history table

Branch prediction flow chart

Read next  
conditional  
branch  
instruction

Predict taken

YES Branch  
taken?

Approach 6: History

Approaches use  
dynamic  
and depend on  
execution  
history.

[Incomplete  
diagram]

\* Stall on Branch.

↳ can come in PSE.

\* RISC-V Pipelined  
Datapath.

Unit of Pipelining  
Ends

CN Lecture

IEEE 802.11 WiFi Frame Format

|       |          |           |           |           |
|-------|----------|-----------|-----------|-----------|
| Frame | Duration | Address 1 | Address 2 | Address 3 |
|-------|----------|-----------|-----------|-----------|

|          |           |      |          |
|----------|-----------|------|----------|
| Sequence | Address 4 | Data | Checksum |
|----------|-----------|------|----------|

frame





Q: Difference between 802.11 and 802.11  
Ethernet WiFi  
format.

\* ARP protocol in wired access

- ① Switch don't have MAC id.
- ② Router has MAC id.

ARP protocol → so let's suppose we want to send data to a host whose IP we know, but we don't know MAC of him. So for finding his MAC we need ARP protocol.

↳ So for this it sends

the host  
may be any  
in subnet  
↑  
\* DNS resolver  
↓  
Domain name ↔ IP  
mapping

U/S ARP.

Mac Address ↔ IP  
mapping

Here we want  
host in same  
subnet

ARP querying is done by switch and request is sent by host to switch.



IF A wants to send a packet to B but it doesn't know the B's MAC.  
So:

- ① A sends a broadcast ARP request to switch 1
- ② Switch 1 then does ARP, and sends a discovering request to ALL.
- ③ Router also receives discovering request so it also now does ARP.
- ④ Now Switch 2 also receives discovering request from Router, so it again does the ARP.
- ⑤ And when switch 2 gets it ~~also~~ does backtrack.

Link Layer Switches - Filtering and Forwarding

self learning

Virtual Local Area Networks [VLAN'S]

↓  
Concept is related to the proxy server traffic  
↳ Isolation of networks

\* DHCP ~~uses~~ uses UDP.

\* How to search www.google.com  
→ New node

DHCP

ARP

Google

ARP

→ required everywhere

for detail  
to modify question in EOE

150