

INDIAN INSTITUTE OF ENGINEERING SCIENCE AND TECHNOLOGY,  
SHIBPUR

B.TECH 5<sup>th</sup> SEMESTER MID-TERM EXAMINATION, OCTOBER 2021

COMPUTER ARCHITECTURE AND ORGANIZATION [CS3103]

Name: Abhirup Mukherjee

Enrolment No: 510519109

G-Suite ID: 510519109.abhirup@students.iiestg.ac.in

Date : 7<sup>th</sup> October 2021

Q1) A) Given 4 frames in cache

→ Block access sequence 0, 8, 0, 6, 8

→

→ Compare direct, 2-way set associative & full associative in terms of no. of misses

(i) direct

→ Initially



(empty cache)

(no. of frame = 4)

(i) → 0 block access.

→ compulsory miss

→ 0<sup>th</sup> block in 0/1 = 0<sup>th</sup> frame

|   |     |
|---|-----|
| 0 | 0 B |
| 1 |     |
| 2 |     |
| 3 |     |

(ii) → 8 block access.

→ compulsory miss

→ 8<sup>th</sup> block goes 8/1 = 0<sup>th</sup> frame.

|   |     |
|---|-----|
| 0 | 8 B |
| 1 |     |
| 2 |     |
| 3 |     |

(iii) 0<sup>th</sup> block access

→ Conflict miss

→ 0<sup>th</sup> block goes to 0% $\cdot$ 4 = 0<sup>th</sup> frame

|    |   |
|----|---|
| OB | 0 |
|    | 1 |
|    | 2 |
|    | 3 |

(iv) 6<sup>th</sup> block access

→ compulsory miss

→ 6<sup>th</sup> block goes to 6% $\cdot$ 4 = 2<sup>nd</sup> frame

|    |   |
|----|---|
| OB | 0 |
|    | 1 |
| 6B | 2 |
|    | 3 |

(v) 8<sup>th</sup> block access

→ conflict miss

→ goes to 0<sup>th</sup> frame

|    |   |
|----|---|
| 8B | 0 |
|    | 1 |
| 6B | 2 |
|    | 3 |

∴ Total misses = 5

## I) 2 way set associative

~~2 way~~

Initial



i) 0<sup>th</sup> block

→ compulsory miss

→ goes to 01.2 = 0<sup>th</sup> set



ii) 8<sup>th</sup> block

→ compulsory miss

→ goes to 87.2 = 0<sup>th</sup> set



iii) 0<sup>th</sup> block

→ hit

iv) 6<sup>th</sup> block

→ compulsory miss

→ goes to 67.2 = 0<sup>th</sup> set

→ assuming oldest accessed block goes first.  
→ 8<sup>th</sup> block gone



v) 8<sup>th</sup> block

→ conflict / capacity miss

→ assuming oldest accessed block goes first

→ 0<sup>th</sup> block gone

∴ Total miss = 4



### III) fully associative

Initial

|  |   |
|--|---|
|  | 0 |
|  | 1 |
|  | 2 |
|  | 3 |

i) 0<sup>th</sup> block

→ compulsory miss

|    |   |
|----|---|
| OB | 0 |
|    | 1 |
|    | 2 |
|    | 3 |

ii) 3<sup>rd</sup> block

→ compulsory miss

|    |   |
|----|---|
| OB | 0 |
| BB | 1 |
|    | 2 |
|    | 3 |

iii) 0<sup>th</sup> block

→ hit

iv) 6<sup>th</sup> block

→ compulsory miss

|    |   |
|----|---|
| OB | 0 |
| BB | 1 |
| GB | 2 |
|    | 3 |

v) 8<sup>th</sup> block

→ hit

No. of miss = 3

IV) Summary : in terms of misses in this case

Full associative < 2-way set associative < direct

(B) 'False sharing miss normally occurs in a system realizing the large block size & write-invalidate scheme for cache coherence'

→ This happens because in \* large block size frame, ~~the~~ modification to one value in block invalidates whole block for other processes. This leads to misses for other values in block, even if they are unmodified

Eg



| activity                                                    | T | C <sub>1</sub>        | C <sub>2</sub>        | remark                                                     |
|-------------------------------------------------------------|---|-----------------------|-----------------------|------------------------------------------------------------|
| ① initial                                                   |   | $X_1 = 20$ $X_2 = 10$ | $X_1 = 20$ $X_2 = 10$ |                                                            |
| ② P <sub>1</sub> write $X_1$<br>→ invalidate C <sub>2</sub> |   | $X_1 = 30$ $X_2 = 10$ | —                     |                                                            |
| ③ P <sub>2</sub> read $X_2$<br>→ miss (false)               |   | $X_1 = 30$ $X_2 = 10$ | $X_1 = 30$ $X_2 = 10$ | got read miss for $X_2$<br>even when $X_2$ not modified    |
| ④ P <sub>2</sub> write $X_2$                                |   | —                     | $X_1 = 30$ $X_2 = 20$ |                                                            |
| ⑤ P <sub>1</sub> <del>write</del> $X_1$<br>→ write miss     |   | $X_1 = 40$ $X_2 = 20$ | —                     | got write miss for $X_1$ ,<br>even when $X_1$ not modified |

Q.) c) → three processor ~~C~~  $P_1, P_2, P_3$   
with caches  $C_1, C_2, C_3$

→ caches can accommodate one block at a time.

→ MM stores two block  $B_1 \& B_2$



→ Snoopy MESI protocol [write-invalidate]

→ initially  $C_1, C_2, C_3$  empty

Activity

|                                                                         | $C_1$                        | $C_2$ | $C_3$                       | MM    |       |       |                     |         |
|-------------------------------------------------------------------------|------------------------------|-------|-----------------------------|-------|-------|-------|---------------------|---------|
|                                                                         | date                         | state | date                        | state | date  | state | $B_1$               | $B_2$   |
| ① initial                                                               | -                            | I     | -                           | I     | -     | I     | *valid              | valid   |
| ② $P_1$ read $x$<br>→ read miss                                         | <del><math>B_1</math></del>  | E     | -                           | I     | -     | I     | valid               | valid   |
| ③ $P_2$ read $x$<br>→ read miss                                         | $B_1$                        | S     | $B_1$                       | S     | -     | I     | valid               | valid   |
| ④ $P_2$ update $y = y + 500$<br>→ write hit<br>→ invalidate             | -                            | I     | $B_1$<br>$x=100$<br>$y=700$ | M     | -     | I     | invalid             | valid   |
| ⑤ $P_3$ read $X$<br>→ read miss<br>→ $P_2$ gives $X$ to MM<br>→ $P_3$   | -                            | I     | $B_1$                       | S     | $B_1$ | S     | $x=100$<br>$y=700$  | $z=400$ |
| ⑥ $P_1$ updates $x = x - 200$<br>→ read miss<br>→ then write-invalidate | $B_1$<br>$x=-100$<br>$y=700$ | M     | -                           | I     | -     | I     | invalid             | valid   |
| ⑦ $P_3$ read $y$<br>→ read miss<br>→ $P_1$ gives $B_1$ to MM & $P_3$    | $B_1$                        | S     | -                           | I     | $B_1$ | S     | $x=-100$<br>$y=700$ | valid   |



Q2) A) 5 state pipe → IF, ID/OF, EX, MEM, WB



2) date dependencies

- (i)  $I_1 I_2 \rightarrow \text{RAW}$
  - (ii)  $I_2 I_3 \rightarrow \text{WAR}$

b) If no date forwarding implemented, there will be date hazard

$I_2$  reads  $R_1$  before  $R_1$  set by  $I_1$

) possible solution with stall & data forwarding.



2) B) outer loop 10, inner loop 20 time



a) always branch taken

outer loop  $\rightarrow$  10 time right, 1 wrong.

inner loop  $\rightarrow$  20 time right, 1 wrong.

$$\text{Total} = \cancel{22} + \underbrace{(20+1) \times 10}_{\text{incorrect}} + 1 + 10 = 221$$

Correct = 210

$$\text{Accuracy} = \frac{210}{221} = 95\%.$$

b) 1 bit history

$\rightarrow$  for all inner loop, branch taken, except last

$\rightarrow$  pred = 'not taken' after 20 iter inner loop

$\rightarrow$  the mispredict pred = taken again

$\rightarrow$  So in 9 iter of outer loop  $\rightarrow$  wrong = 2 + 9 = 11

$\rightarrow$  in last iter  $\rightarrow$  20 correct, 1 incorrect, 1 correct

no. of wrong = 19

$$\text{Accuracy} = \frac{221 - 19}{221} = 91\%.$$

2) c)

given original code.

I<sub>1</sub> : load R<sub>1</sub>, XI<sub>2</sub> : decrement R<sub>2</sub>, II<sub>3</sub> : Br zero R<sub>2</sub>, I<sub>5</sub>I<sub>4</sub> : Add R<sub>3</sub>, R<sub>4</sub>I<sub>5</sub> : Sub R<sub>5</sub>, R<sub>6</sub>

2) before branch prediction.

I<sub>1</sub> : load R<sub>1</sub>, XI<sub>3</sub> : Br zero R<sub>2</sub>, I<sub>5</sub> ↗ swapped.I<sub>2</sub> : decrement R<sub>2</sub>, R<sub>1</sub>I<sub>4</sub> : Add R<sub>3</sub>, R<sub>4</sub>I<sub>5</sub> : Sub R<sub>5</sub>, R<sub>6</sub>

b) Target address

I<sub>1</sub> : load R<sub>1</sub>, XI<sub>2</sub> : decrement R<sub>2</sub>, II<sub>3</sub> : Br zero R<sub>2</sub>, I<sub>5</sub>I<sub>5</sub> : Sub R<sub>5</sub>, R<sub>6</sub>I<sub>4</sub> : Add R<sub>3</sub>, R<sub>4</sub>

→ for predict taken scheme

→ both ② &amp; ③ perform equally well