

24/7/19



\* Program :- holds the address of the next counter instruction.

\* Stack :- stores the result of the previous PC pointer



MOV R<sub>1</sub>, R<sub>2</sub>  
opcode.

output enabled → to be stored in database  
input enabled → reads the info from database





## \* Memory hierarchy



Info needs to be transferred to main memory from secondary memory.

25/7/19

## \* Simple CPU Design





## # Instruction Read

- fetch opcode
- opcode decode
- execute instn.

I<sub>0</sub>: { ADD  $R_1$  → repeated addition  $\Rightarrow$  multiplication

I<sub>1</sub>: { CMP → has nand (uses flip-flops) performs subtraction

I<sub>2</sub>: { AND  $R_1$  → Logical & NAND from CMP takes care of other logical op.

I<sub>3</sub>: JMP

I<sub>4</sub>: MOV  $R_1, R_2$

I<sub>5</sub>: MOV  $R_2, R_1$

I<sub>6</sub>: MOV Acc,  $R_1$  → destination set depends on instruction

I<sub>7</sub>: MOV Acc,  $R_2$  → source of indirect address

I<sub>8</sub>: MOV  $R_1, Acc$

I<sub>9</sub>: MOV ,  $R_2, Acc$

I<sub>10</sub>: MOV Acc, M (memory to accumulator)

I<sub>11</sub>: MOV M, Acc

I<sub>12</sub>: MVI



[Acc  $\leftarrow \overline{Acc}$ ]

M → Acc VOM

[RAM] M  $\rightarrow$  Acc

:DT

no decimal width  $\rightarrow$

width for operand



Instruction register

Eg.: examples of instructions in memory

T11 → register ref. instn =

001 → MOV Acc, M

010 → MOV M, Acc

011 → JMP

\* For any instr<sup>n</sup> to be executed

To: MAR  $\leftarrow$  PC

T<sub>1</sub>:  $\begin{cases} IR \leftarrow M[\text{MAR}]; \text{ Instn res. should be loaded with the instn in MAR} \\ \text{now, opcode is selected.} \end{cases}$   
 $PC \leftarrow PC + 1$  Assuming every instr<sup>n</sup> to take only 1 memory location

T<sub>2</sub>:  $\begin{cases} DCD(IR); \text{ Decode the instn resistor.} \\ \text{MAR} \leftarrow IR_{0:11}, IR_1 \end{cases}$

Load the MAR with content of memory,  $IR_{0:11}$

T<sub>3</sub>: ADD R<sub>1</sub>

DR  $\leftarrow$  R<sub>1</sub>, DR is loaded with contents of R<sub>1</sub> so that ALU can access it.

T<sub>4</sub>: ACC  $\leftarrow$  ALU<sub>ADD</sub> (ACC, DR), To(result is given down to ALU)  
(Time stamps)

move back to T<sub>0</sub> (timestamp)  
next instn fetched.

### # MOV R<sub>1</sub>, R<sub>2</sub>

T<sub>5</sub>: R<sub>1</sub>  $\leftarrow$  R<sub>2</sub>

↓  
now next T<sub>0</sub>,

### # MOV Acc, M

T<sub>6</sub>: ACC  $\leftarrow$  M [ MAR ]

To;

Acc gets the contents of the memory address, which is already loaded to  $IR_{0:11}$  and (meaning loaded to MAR)

Sequence counter (4 bit)

its size depends on complexity of instn

16 time stamps. Assuming that the entire task will not take more than

T<sub>15</sub>

To

16 bit of timestamp.

Sequence counter

CLR everytime we go back to T<sub>0</sub>, we clear the sequence counter  
INCR  
CLK clock.

•  $I_0, I_{15}$



\* Usually in 8085 machine the address has a jump inst<sup>n</sup>  
Jumps to application area

Timestamp T<sub>0</sub>

PC:  $O_E = T_0 + t_A$  (enabled)  
MAR:  $LD = T_0 + t_A$  (enabled)

Timestamp T<sub>1</sub>

PC: INCR = T<sub>1</sub> +

IR: LD = T<sub>1</sub> + ~~for a particular report ← last location read~~ \*

Timestamp T<sub>2</sub>

MAR: LD = T<sub>2</sub> +

IR: OE = T<sub>2</sub> +

For ADD R<sub>1</sub> , Timestamp T<sub>3</sub>

DR: LD = T<sub>3</sub> D<sub>7</sub> Io +

Acc = T<sub>4</sub> D<sub>7</sub> Io + [regiser ref. inst<sup>n</sup> (OCD) → 111]

R<sub>1</sub>: OE = T<sub>3</sub> D<sub>7</sub> Io +

ALU = ADD = T<sub>4</sub> D<sub>7</sub> Io +

SEQCTR:INCR =

$T_0 + T_1 + T_2 + T_3 D_7, I_0 + T_4 D_7, I_0 +$

For MOV ~~Acc, R<sub>2</sub>~~, Timestamp T<sub>3</sub>

CLR = T<sub>4</sub> D<sub>7</sub> Io +

Acc: R<sub>2</sub>: LD = T<sub>3</sub> D<sub>7</sub> I<sub>5</sub> +

T<sub>2</sub>: OE = T<sub>3</sub> D<sub>7</sub> I<sub>5</sub> +

For  $MOV R_1, R_2$ ,

$$R_1: LD = T_3 D_7 I_{14} +$$

$$R_2: DE = T_3 D_7 I_{14} +$$

For  $MOV Acc, R_1$ ,

$$R_1: DE = T_3 D_7 I_6 +$$

$$Acc: LD = T_3 D_7 I_6 +$$

31/7/19

MOV Acc, M

T<sub>3</sub>,

$$Acc: \cancel{LD} \leftarrow LD = T_3 D_7 I_{10} +$$

MAR,

$$T_3: Acc \leftarrow M [MAR], I_0$$

$$Acc: LD = T_4 D_7 I_0 + T_3 D_7 I_0 +$$

$$ALU: ADD = T_4 D_7 I_0 +$$

SEQCTR: INCR =

$$CLR = T_4 D_7 I_0 + T_3 D_7 I_0 +$$



MOV Acc, M

MOV1 Acc, M

$$Acc: LD = T_3 D_7 I_{15} +$$

(base) (indirect add.) RAM



Indirect memory location.

MOV1 Acc, M

Acc: LD = T\_3 D\_7 I\_{15} +

(base) (indirect add.) RAM

\* Hardwired control unit  $\rightarrow$  Proper circuitry is used to generate control signals.

Adv: Very fast

Disadv: Its not scaled.

\* Microprogrammed ctrl  $\rightarrow$  alternate unit design.

addr scalable as programming is allowed. I, O, T, S, D

disaddr, programming allowed.

memory is used up and hence slow.

\* Central memory  $\rightarrow$  control words. (CM)

In Time Stamp T<sub>0</sub>

LD MAR — C<sub>0</sub> designated 0th bit of CM

OE<sub>PC</sub> — C<sub>1</sub> " 1st bit of CM

0...---1101

Branch add.

(BA)

during T<sub>1</sub>,

LD IR — C<sub>2</sub>

RD<sub>M</sub> — C<sub>3</sub>

(read memory)

INCR<sub>PC</sub> — C<sub>4</sub>

0...---01110002

During T<sub>3</sub>

OE<sub>IR</sub> — C<sub>5</sub>

0...---0100001xx

sum(Carry C<sub>5</sub>)



7/8/19



After decode & decode, we don't know which memory add to jump to to execute next inst<sup>n</sup>.

Microprogram sequencer → all other units except the ctrl memory.

+ control memory = Microprogrammed Ctrl unit.

Timestamps;

T<sub>0</sub>: MAR ← PC

T<sub>1</sub>: IR ← M[MAR]; PC ← PC + 1

T<sub>2</sub>: DCD(IR);

MAR ← I<sub>0,11</sub>

A LD MAR = C<sub>0</sub>

OE<sub>PC</sub> = C<sub>1</sub>

LD<sub>IR</sub> = C<sub>2</sub>

RD<sub>M</sub> = C<sub>3</sub>

PC<sub>INCR</sub> = C<sub>4</sub>

OE<sub>IR</sub> = C<sub>5</sub>.

\* C<sub>5</sub> C<sub>4</sub> C<sub>3</sub> C<sub>2</sub> C<sub>1</sub> C<sub>0</sub> BA M  
 For T<sub>0</sub>, 0 0 0 0 1 1 F~~0~~ 01 1

For T<sub>1</sub>, 0 1 1 1 0 0 02 1

For T<sub>2</sub>, 1 0 0 0 0 1 XX 0

we do not know what instr to execute, so we move to internal add-generator.

Eg: MOV R<sub>1</sub>, R<sub>2</sub>

BA Mode  
1 + 0 ... . 0 0 0 1

OE R<sub>2</sub> = C<sub>6</sub>

LD R<sub>1</sub> = C<sub>7</sub>

Eg:  
ADD R<sub>1</sub>  
T<sub>3</sub>: DR  $\leftarrow$  R<sub>1</sub>

C<sub>12</sub> C<sub>11</sub> C<sub>10</sub> C<sub>9</sub> C<sub>8</sub> C<sub>7</sub> C<sub>6</sub> ... BA Mode.  
0 ... 0 1 1 0 0 ... 0 0 1 T<sub>3</sub>

T<sub>n</sub>: ACC  $\leftarrow$  ALU<sub>AD</sub> (ACC, DR); To

1 1 1 0 ... . 0 0 0 1

LD DR = C<sub>8</sub> ALU<sub>AD</sub> = C<sub>11</sub>

OE R<sub>1</sub> = C<sub>9</sub> OE ALU = C<sub>12</sub>

LD ACC = C<sub>10</sub>

## FF Microinstruction design

n, no of bits [log<sub>2</sub>n]

Horizontal μ programming signal inst<sup>n</sup> to be decoded/encoded

Vertical μ programming multiple inst<sup>n</sup>s to be decoded/encoded

Diagonal μ programming (somewhere in between)  
requires > bits than VμP

< bits than HμP

### \* Compatibility class

The compatibility class is a set of control signals such that the ctrl signals within that set are pairwise compatible i.e., no two ctrl signals are active simultaneously.

### \* Maximal compatibility class

It is a compatibility class to which no other ctrl signal can be added without introducing incompatibility.

### \* Minimal cover

It is a minimal subset of maximal compatibility class, which includes all the control signals. (We will go for encoding in Minimal cover in case of diagonal μP).

## # Control signals

$I_1 : ab \wedge g$

$I_2 : ace \wedge h$

$I_3 : adf$

$I_4 : bcf$

$a, b, c, d, e, f, g, h,$   
ctrl signals reqd.

$S_1 : a, b, e, d, f, g, h.$  next, after 1st, strike off all signals not app. to gates

$S_2 : bd, be, bh, cd, de, dg, dh, ef, eg, fg, fh, gh$   
pairs of signals  
which are never  
together

$S_3 : \text{make a group of 3 ctrl signals}$

~~bde, bdh, edg, egh, ecd, edg, edh, def, deg, dgh~~

$S_4 : \text{ctrl signals together}$

8/8/19

Cover table

|             | $\bar{a}$ | $b$ | $\bar{c}$ | $d$ | $e$ | $f$ | $g$ | $h$ |
|-------------|-----------|-----|-----------|-----|-----|-----|-----|-----|
| $K_1 = a$   | x         |     |           |     |     |     |     |     |
| $K_2 = cd$  |           | x   | x         |     |     |     |     |     |
| $K_3 = bde$ | x         |     |           | x   | x   |     |     |     |
| $K_4 = bdh$ |           | x   |           | x   |     | x   |     |     |
| $K_5 = deg$ |           |     | x         |     | x   | x   |     |     |
| $K_6 = dgh$ |           |     |           | x   | x   | x   |     |     |
| $K_7 = efg$ |           |     | x         | x   | x   | x   |     |     |
| $K_8 = fgh$ |           |     |           | x   | x   | x   | x   |     |

essential  
covers.

strike off dominating columns.  
So,  $K_1, K_2$  are essential covers.

## Reduced countable

|              | b | d | e | f | g | h |
|--------------|---|---|---|---|---|---|
| $K_3 = bde$  | x |   | x |   |   |   |
| $K_4 = bch$  | x |   |   |   |   |   |
| $-K_5 = deg$ |   |   | x |   |   |   |
| $-K_6 = dgh$ |   |   |   | x |   |   |
| $K_7 = efg$  |   | x | x |   |   |   |
| $K_8 = fgh$  |   | x |   | x |   |   |

we will strike off dominated rows  
and not dominating rows.

$$\text{Minimal cover} = \{(K_1, K_2), K_3, K_8\}$$

or

$$\{(K_1, K_2), K_4, K_7\}$$

"all the control signals have been covered"

we need to choose these in such a way that all the alphabets are covered and not struck off in the reduced countable.

## \* Pipelining

Assuming 4 inst's executed.

| I <sub>1</sub> | EX |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |
|----------------|----|----------------|----------------|----------------|----------------|----------------|
| I <sub>2</sub> | OD |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |
| I <sub>3</sub> | OF | I <sub>1</sub> |                | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |
| I <sub>4</sub> | IF | I <sub>1</sub> |                | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |

{instn fetch IF  
opcode fetch OF  
opcode decode OD  
execution EX}

16 time stamps reqd. for 4 instns  
non pipeline method.

Best case scenario :-

IR gets from I<sub>1</sub>)  
is fetched

| EX |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |  |
|----|----------------|----------------|----------------|----------------|----------------|--|
| OD |                | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |  |
| OF | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |                |  |
| IF | I <sub>1</sub> | I <sub>2</sub> | I <sub>3</sub> | I <sub>4</sub> |                |  |

7 time stamps reqd.

Worst case :-

### Data dependency Hazards :

RAW - read after write

WAR - write after read

WAW - write after write

∅  
I<sub>0</sub>: MOV R<sub>1</sub>, R<sub>2</sub> (R<sub>1</sub> written on)

I<sub>1</sub>: ADD R<sub>1</sub> R<sub>1</sub> ← A<sub>1</sub> + R<sub>1</sub> (R<sub>1</sub> is now read)

I<sub>2</sub>: MOV R<sub>1</sub>, R<sub>3</sub> (R<sub>1</sub> is being written on)

I<sub>3</sub>: ADD R<sub>2</sub>, R<sub>3</sub> (R<sub>3</sub> is now read)

(I<sub>0</sub>, I<sub>1</sub>) → RAW dependency

(I<sub>1</sub>, I<sub>2</sub>) → WAR

(I<sub>0</sub>, I<sub>2</sub>) → WAW

(I<sub>0</sub>, I<sub>3</sub>) → WAR

ADD R<sub>2</sub>, R<sub>4</sub>  
MOV R<sub>5</sub>, R<sub>6</sub>

Both of them are reading from same source, this is not dependency.

Q. Find all possible dependencies present in the following program fragment when you are using a 4-stage pipeline.

I<sub>0</sub>: MOV R<sub>1</sub>, R<sub>2</sub> ① ②

I<sub>1</sub>: ADD R<sub>3</sub>, R<sub>4</sub> ③ ④

I<sub>2</sub>: SUB R<sub>1</sub>, R<sub>5</sub> ① ⑤

I<sub>3</sub>: ADD R<sub>2</sub> ②

I<sub>n</sub>: MOV R<sub>5</sub>, R<sub>6</sub> ⑤ ⑥

I<sub>5</sub>: ADD R<sub>5</sub> ⑤

I<sub>6</sub>: MOV R<sub>7</sub>, R<sub>1</sub> ① ⑦

Soln:-

(I<sub>0</sub>, I<sub>2</sub>) → WAWS (I<sub>0</sub>, I<sub>3</sub>) → X

(I<sub>0</sub>, I<sub>6</sub>) → RAW (I<sub>2</sub>, I<sub>6</sub>) → WAR

(I<sub>2</sub>, I<sub>6</sub>) → TRAW (I<sub>2</sub>, I<sub>5</sub>) → X

(I<sub>4</sub>, I<sub>5</sub>) → RAW

\* Assuming  $I_0$  &  $I_8$  dependent on each other.

i. To make sure the instn that one depends on releases its products not before the one is executed.

9/8/19

## MEMORY MANAGEMENT

### \* Virtual Memory

Made to believe that we have available

Paging }  
Segmentation } using these  
we can apply  
the concept of  
virtual memory  
in our system.



### # Paging

"Main mem. can be divided into same sized pages"



When there is a demand for mem. & the MM cannot provide it, then:

swap out :- some pages are swapped out from MM  $\Rightarrow$  SM

swap in :- some pages are swapped in from SM  $\Rightarrow$  MM (empty slot)

Choosing page to swap  $\rightarrow$  on the basis of some Algorithm.

### # Page Replacement algorithms

- FIFO (First in first out)
- LRU (Least recently used)
- MFU (Most frequent used)
- Optimal replacement

Eg: 1, 4, 5, 1, 6, 9, 4, 2, 3, 5, 7

page reference stream

Assuming  $\rightarrow$  each page  $\xrightarrow{\text{size}} 1K$   
main memory  $\xleftarrow{\text{accommodate}} 4\text{ pages}$ .



FIFO:

(like queue)



XPN:-

\* No. of page missed.

1, 4, 5, 1, 6, 9, 4, 2, 5, 3, 1, 7, 9, 3, 8, 5, 2, 5, 6, 7, 8.

Serial FIFO



OPT. TAP.



+8 +4 → 12

RF FIFO



13 + 4 → 17



## # Segmentation



aim:- go for minimum fragmentation.

[11K, 1K, 5K, 10K, 4K]

## External fragmentation



→ First fit

[11K, 1K, 5K, 10K, 4K]



→ Worst fit

aim:- maximize fragmentation

[11K, 1K, 5K, 10K, 4K]



DRAM (Dynamic RAM)  
[made of capacitors] → Main memory

SRAM (Static RAM) → Cache memory  
[costly]





22/8/19

- Q. Consider a CPU where all instructions require 7 clock cycles to complete execution. There are 140 instructions in the instruction set. It is found that 125 control signals are needed to be generated by the control unit. While designing the horizontal microprogrammed control unit, single address field format is used for branch control logic. What is the maximum size of the control word & control address register?

Soln - each instruction takes 7 cycles

$$\therefore 140 \text{ instructions take} = (140 \times 7) \text{ cycles} \\ = 980 \text{ cycles}$$

$$\text{Now for } 2^m \geq 980$$

$$m \geq 10$$

$$\therefore (10 + 125) \text{ bits} \rightarrow \text{control word}$$

10 units control address register.

Q. Consider the following sequence of micro operations

MBR  $\leftarrow$  PC  
MAR  $\leftarrow$  X  
PC  $\leftarrow$  Y  
Memory  $\leftarrow$  MBR

- A. Instruction fetch
- B. Operand fetch
- C. Conditional branch
- D. Initiation of interrupt service.

which one of the following is a possible operation performed by this sequence?

Ans MBR - Memory buffer register

It stores the data being transferred to and from the immediate access store.

MAR - Memory address register.

It holds the memory location of data that needs to be accessed.

PC - Program counter.

It contains the address of the instruction being executed at the current time.

The 1st instruction places the value of PC onto MBR.

The 2nd instruction places an address X into MAR

The 3rd instruction places an address Y into PC.

The 4th instruction places the value of MBR (old PC value) into memory.

Q.

23/8/19

## \* Cache Memories      Associative

SRAM (stack RAM)

makes up cache memory.

CM < MM < SM

Cache lines



$\{(size\ of\ page)\} = \text{Cache memory}$   
Block in each page



$2^{16}$

1

0

.....

0

user programs cannot be placed in cache memory: CPU cannot access it directly

CPU  $\xrightarrow{\text{first search}}$  CM  $\xrightarrow{\text{then search}}$  MM  
cache miss / cache hit (and not)  
(does not get) (get it)

microprocessor = address



Tag memory stores address of blocks in CM

Tag (13 bits)



CM



OS divides the 16 bits to search through CM

This part is called Tag

Any blk of the MM can be placed in any cache line

We search the Tag sequentially for a block till we get it  
 $\xrightarrow{\text{if we don't}} \text{Cache miss.}$

Remove the Tag & place it in Argument register.



- One bit associated with each tag



- Initially all bits are 0.
- As we have not cleared CM, then CM can have some garbage value.*

When a block is sent from MM  $\rightarrow$  CM then its V Bit is set to 1.



## \* Direct Mapped Cache

rest. which block will be allowed to move to which cache line.

Now, cache is not 1D structure



Microprocessor  $\xrightarrow{\text{generate address}}$



(Block is not available in cache memory. So that block needs to be moved from memory to that particular cache line.)

Eff Set Associative Cache

The architecture of the Web Remoting 2D group.



- Q. A set associative cache memory consists of 128 blocks divided into 4 sets. The main mem. consists of 16384 blocks and each block contains 256 8bit words. How many bits are reqd. to address the main memory? How many bits are reqd. to identify the tag, the set and the block?

Ans- No. of blocks  $\rightarrow 2^{17} = 16384$  blocks. 8 bits = 1 byte  
 each block  $\rightarrow 256 \times 8$  bits.

$16384 \times 256 = 2^{22}$  bits  
 M.M.

To identify block  $\rightarrow 256$   
 $128$

8 bits for block identification.

No. of Block in cache = 128

$$128/4 \rightarrow 32 \text{ octo.}$$

$$\approx 32$$

$$2^5$$

5 bits reqd. for oct identification.

Bits reqd. for tag identification  $\rightarrow 22 - (5+8)$

Q. Computer has an 8gb memory with 64 bit word size. Each block of memory stores 16 words. Computer has a direct mapped cache of 128 blocks and uses word level addressing. What is the addressing format if we change the cache to a 4 way set associative cache, what is the new address format?

sol. word length  $\rightarrow$  64 bit.

each block  $\rightarrow$  16 words.

$\rightarrow 16 \times 64$  bits.

$$2^4 \times 2^6 \rightarrow 2^{10}$$

$$\text{For } 1 \text{ gb} \rightarrow 2^{30} \text{ bits. } 8 \text{ gb} \rightarrow 8 \times 2^{30} \rightarrow 2^{33} + 2^0$$



30 bits reqd. to address main memory.

Each block  $\rightarrow$  16 words.

4 bits for ~~id~~ identification [Offset].

• 4 way set ass

block identification, (128 blocks)  $\rightarrow 2^7$

(7 bits reqd)

tag group  $\rightarrow 30 - (7+4)$

in 1 way set ass  $\rightarrow \frac{128}{4} \rightarrow$  divided into 4 sets each.  
 $32 \rightarrow 2^5$ .

|     |   |   |
|-----|---|---|
| Tag | 5 | 4 |
|-----|---|---|

4/9/19

SC12 Session 10

## \* MULTI LEVEL CACHES

L<sub>1</sub> — On chip cache [hence fastest]

L<sub>2</sub> } unified / split  $\rightarrow$  it is in motherboard [slower than L<sub>1</sub>]

L<sub>3</sub> } requires data path to memory

but distance travelled is less & hence faster than mem.

L<sub>1</sub>  $\leftarrow$  instruction related info.

L<sub>2</sub>  $\leftarrow$  all other data goes to L<sub>2</sub>.

Cache map (dirty bit)

CPU

|   |   |         |
|---|---|---------|
| 1 | X | changed |
| 2 | Y | changed |
| 3 | Z |         |
| 4 | Q |         |

I/O

MM

|   |   |
|---|---|
| 1 | X |
| 2 | Y |
| 3 | Z |
| 4 | Q |

write through cache  $\rightarrow$  any change in the cache is written through MM

write back cache  $\rightarrow$  we associate dirty bit with each cache line

dirty bit

any change  $\rightarrow 1$

no change  $\rightarrow 0$

check if a block in MM is in CM & then check if

dirty bit is set = If 1, then write back of the cache is written.

(L1 + L2)

S  $\leftarrow$  (L1 + L2)  $\rightarrow$  combination add

(L1 + L2)

(L1 + L2) - Q  $\rightarrow$  result

Q1. A set associative cache consists of 8 sets divided into 2 line sets. The main memory contains 4K blocks of 128 words each. Show the format of main memory address.

Q2. Consider computer with the following characteristics. MM = 1mb, word size = 1 byte. Block size = 16 bytes. Cache size = 64 kb.

For the main memory addresses of F0010, 01234, give the corresponding tag, cache line address, and the word offset for a direct mapped cache.

| <u>Soln 1.</u> Main memory $\rightarrow$ 4K 1024 blocks.<br>$\underbrace{\hspace{10em}}$<br>128 words.<br>$= 2^{12} \times 2^{10} \times 2^7$<br>$= 2^{19}$<br><u>19 bits</u> reqd to address main mem. | Set ass. cache. $\rightarrow$ $64/2$<br>$\Rightarrow 32$<br>$= 2^5$<br><div style="border: 1px solid black; padding: 2px; display: inline-block;">5 bits.</div><br>6 blocks $\rightarrow$ 4x1024 $\Rightarrow 2^{12} \times 2^{10} \times 2^7$<br>Block offset $\rightarrow$ 128<br>$= 2^7$ .<br><u>7 bits.</u> |               |          |               |   |   |   |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------|---------------|---|---|---|
| $\text{Tag} = 19 - (7+5)$<br>$= 7 \text{ bits}$                                                                                                                                                         | <table border="1" style="border-collapse: collapse; text-align: center;"> <tr> <th>Tag</th> <th>Set ass.</th> <th>Block offset.</th> </tr> <tr> <td>7</td> <td>5</td> <td>7</td> </tr> </table>                                                                                                                 | Tag           | Set ass. | Block offset. | 7 | 5 | 7 |
| Tag                                                                                                                                                                                                     | Set ass.                                                                                                                                                                                                                                                                                                        | Block offset. |          |               |   |   |   |
| 7                                                                                                                                                                                                       | 5                                                                                                                                                                                                                                                                                                               | 7             |          |               |   |   |   |

Soln 2. Main memory  $\rightarrow 1mB = 2^{20}$   
 word size  $\rightarrow 1 \text{ byte}$ .  
 block size  $\rightarrow 16 \text{ bytes} \rightarrow 2^4$   
 Cache size  $\rightarrow 64 \text{ KB}$   
 $\text{no. of cache lines} = \frac{64 \text{ KB}}{16 \text{ bytes}} = 4 \times 2^{10} = 2^{12}$ .

$$\text{Tag} = 20 - (12+4) = 4$$

| Tag | Set ass. | Block # |
|-----|----------|---------|
| 4   | 12       | 4 .     |

F0010

1111    0000 0000 0001 0000  
 Tag.                  cache line add.                  Word offset.

01234    0000 0001 0010 0011 0100