

# CSE-312 COMPUTER ARCHITECTURE

YASH VERMA

19075083

CSE B.tech

1.)

(A) Cache size =  $2^R$

My roll no. = 19075083

$$\text{Now, } R = 1 + (3 \% 3)$$
$$= 1$$

: Cache size =  $2^1 = 2$

→ Initial State:

| Index | V | Tag | Data |
|-------|---|-----|------|
| 0     | 0 |     |      |
| 1     | 0 |     |      |

→ Access 000100 → Cache Miss

| Index | V | Tag    | Data               |
|-------|---|--------|--------------------|
| 0     | 1 | 000102 | memory<br>(000102) |
| 1     | 0 |        |                    |

→ Access 100101 → Cache Miss

| Index | V | Tag                | Data                         |
|-------|---|--------------------|------------------------------|
| 0     | 1 | 00010 <sub>2</sub> | Memory(000100 <sub>2</sub> ) |
| 1     | 1 | 10010 <sub>2</sub> | Memory(100101 <sub>2</sub> ) |

→ Access 010110 → Cache Miss

| Index | V | Tag                | Data                          |
|-------|---|--------------------|-------------------------------|
| 0     | 1 | 01011 <sub>2</sub> | Memory(010110 <sub>2</sub> )  |
| 1     | 1 | 10010 <sub>2</sub> | Memory(0100101 <sub>2</sub> ) |

→ Access 001111 → Cache Miss

| Index | V | Tag                | Data                         |
|-------|---|--------------------|------------------------------|
| 0     | 1 | 01011 <sub>2</sub> | Memory(010110 <sub>2</sub> ) |
| 1     | 1 | 00111 <sub>2</sub> | Memory(001111 <sub>2</sub> ) |

→ Access 101110 → Cache Miss

| Index | V | Tag                | Data                         |
|-------|---|--------------------|------------------------------|
| 0     | 1 | 10111 <sub>2</sub> | Memory(101110 <sub>2</sub> ) |
| 1     | 1 | 00111 <sub>2</sub> | Memory(001111 <sub>2</sub> ) |

(1)

(B) 2-way set associative caching

Initially: Way - 1

Way - 0

| Set | LRU | V | Tag | V | Tag |
|-----|-----|---|-----|---|-----|
| 000 | 0   | 0 |     | 0 |     |
| 001 | 0   | 0 |     | 0 |     |
| 010 | 0   | 0 |     | 0 |     |
| 011 | 0   | 0 |     | 0 |     |
| 100 | 0   | 0 |     | 0 |     |
| 101 | 0   | 0 |     | 0 |     |
| 110 | 0   | 0 |     | 0 |     |
| 111 | 0   | 0 |     | 0 |     |

Memory - address accesses:

(i) 000100 - cache miss

| Set | LRU | V | Tag | V | Tag              |
|-----|-----|---|-----|---|------------------|
| 100 | 1   | 0 |     | 1 | 000 <sub>2</sub> |

(ii) 100101 - Cache miss

| Set | LRU | V | Tag | V | Tag              |
|-----|-----|---|-----|---|------------------|
| 101 | 1   | 0 |     | 1 | 100 <sub>2</sub> |

(iv) 010110 - Cache Miss

Update in Table: Way - 1 . Way - 0

| Set | LRU | V | Tag | V | Tag     |
|-----|-----|---|-----|---|---------|
| 110 | 1   | 0 |     | 1 | $010_2$ |

(iv) 001111 - Cache miss

| Set | LRU | V | Tag | V | Tag     |
|-----|-----|---|-----|---|---------|
| 111 | 1   | 0 |     | 1 | $001_2$ |

(v) 101110 - Cache Miss

| Set | LRU | V | Tag     | V | Tag     |
|-----|-----|---|---------|---|---------|
| 110 | 0   | 1 | $101_2$ | 1 | $010_2$ |

Final Table:

| Set | LRU | Way - 1 |           | Way - 0 |           |
|-----|-----|---------|-----------|---------|-----------|
|     |     | V       | Tag       | V       | Tag       |
| 000 | 0   | 0       | $(101)_2$ | 0       |           |
| 001 | 0   | 0       | $(101)_2$ | 0       |           |
| 010 | 0   | 0       | $(101)_2$ | 0       |           |
| 011 | 0   | 0       | $(101)_2$ | 0       |           |
| 100 | 1   | 0       | $(101)_2$ | 1       | $(000)_2$ |
| 101 | 1   | 0       | $(101)_2$ | 1       | $(100)_2$ |
| 110 | 0   | 1       | $(101)_2$ | 1       | $(010)_2$ |
| 111 | 1   | 0       | $(101)_2$ | 1       | $(001)_2$ |

4-way set associative Caching

| Set | Way 3 |   | Way 2 |   | Way 1 |   | Way 0 |   |     |
|-----|-------|---|-------|---|-------|---|-------|---|-----|
|     | LRU   | V | Tag   | V | Tag   | V | Tag   | V | Tag |
| 00  | 0     | 0 |       | 0 |       | 0 |       | 0 |     |
| 01  | 0     | 0 |       | 0 |       | 0 |       | 0 |     |
| 10  | 0     | 0 |       | 0 |       | 0 |       | 0 |     |
| 11  | 0     | 0 |       | 0 |       | 0 |       | 0 |     |

iii) Access 000100 - Cache Miss

| Set | LRU | V | Tag | V | Tag | V | Tag | V | Tag                 |
|-----|-----|---|-----|---|-----|---|-----|---|---------------------|
| 00  | 1   | 0 |     | 0 |     | 0 |     | 1 | (0001) <sub>2</sub> |
| 01  | 0   | 0 |     | 0 |     | 0 |     | 0 |                     |
| 10  | 0   | 0 |     | 0 |     | 0 |     | 0 |                     |
| 11  | 0   | 0 |     | 0 |     | 0 |     | 0 |                     |

iv) Access 100101 - Cache Miss

| Set | LRU | V | Tag | V | Tag | V | Tag | V | Tag                 |
|-----|-----|---|-----|---|-----|---|-----|---|---------------------|
| 00  | 1   | 0 |     | 0 |     | 0 |     | 1 | (0001) <sub>2</sub> |
| 01  | 1   | 0 |     | 0 |     | 0 |     | 1 | (1001) <sub>2</sub> |
| 10  | 0   | 0 |     | 0 |     | 0 |     | 0 |                     |
| 11  | 0   | 0 |     | 0 |     | 0 |     | 0 |                     |

(iii) Access 010110 - Cache Miss

|     | Way-3 |   |     | Way-2 |     | Way-1 |     | Way-0 |            |
|-----|-------|---|-----|-------|-----|-------|-----|-------|------------|
| Set | LRU   | V | Tag | V     | Tag | V     | Tag | V     | Tag        |
| 00  | 1     | 0 |     | 0     |     | 0     |     | 1     | $(0001)_2$ |
| 01  | 1     | 0 |     | 0     |     | 0     |     | 1     | $(1001)_2$ |
| 10  | 1     | 0 |     | 0     |     | 0     |     | 1     | $(0101)_2$ |
| 11  | 0     | 0 |     | 0     |     | 0     |     | 0     |            |

(iv) Access 001111 - Cache Miss

| Set | LRU | V | Tag | V | Tag | V | Tag | V | Tag        |
|-----|-----|---|-----|---|-----|---|-----|---|------------|
| 00  | 1   | 0 |     | 0 |     | 0 |     | 1 | $(0001)_2$ |
| 01  | 1   | 0 |     | 0 |     | 0 |     | 1 | $(1001)_2$ |
| 10  | 1   | 0 |     | 0 |     | 0 |     | 1 | $(0101)_2$ |
| 11  | 1   | 0 |     | 0 |     | 0 |     | 1 | $(0011)_2$ |

(v) Access 101110 - Cache Miss

| Set | LRU | V | Tag | V | Tag | V | Tag | V | Tag        |
|-----|-----|---|-----|---|-----|---|-----|---|------------|
| 00  | 1   | 0 |     | 0 |     | 0 |     | 1 | $(0001)_2$ |
| 01  | 1   | 0 |     | 0 |     | 0 |     | 1 | $(1001)_2$ |
| 10  | 2   | 0 |     | 0 |     | 0 |     | 1 | $(0101)_2$ |
| 11  | 0   | 0 |     | 0 |     | 1 |     | 1 | $(0101)_2$ |
|     |     |   |     |   |     | 0 |     |   | $(0011)_2$ |

(2)

(A) Page size =  $2^R$  KB

R = Last digit of Roll No.  $(19075083) \% 3$ .

$$= 3 \% 3$$

$$\boxed{R = 0}$$

$$\text{Page size} - 2^0 = 1 \text{ KB} = 2^{10} = 1024$$

Entry in Page table = 4 bytes

We have 64-bit addressing scheme

$$\rightarrow \text{No. of page table entries} = \frac{\text{address space size}}{\text{Page size}}$$
$$= \frac{2^{64}}{2^{10}} = 2^{54}$$

$\rightarrow$  Size of page table = No. of entries  $\times$  entry size

$$= 2^{54} \times 4$$

$$= 2^{54} \times 2^2$$

$$= 2^{56} \text{ bytes}$$

(2) Page Offset = 12 Page size = 1 kB =  $2^{10}$  bytes

(B)



Page Table: Page size 1kB, virtual address space 4 GB, physical memory 1 GB.

(3)  
(A)



Given Instruction:

add \$1, \$2, \$3

Steps for implementation of Single Cycle:

- First we need to fetch the instruction, we need adder, register to data access , ALU for performing arithmetic operations and some multiplexers , memory .
- After setting datapath for single cycle implementation . Instruction memory read address of program counter and in decode phase . Instructions are decoded to loaded to registers .
- \$2 register is read first then \$3 register is read for data 2 . destination register is given by instruction [15:11] .
- \$2 and \$3 values are loaded to ALU and with help of ALU OP , operation is performed .
- Now , ALU result will be stored in register instead of data memory .
- PC will be increased simultaneously .

Q3(B)  
sw \$t5,8(\$t3) in multi-cycle implementation



Multiple Cycle Approach requires

- single memory for data & instruction
- single ALU, no extra address
- extra register to hold data between cycles.

We break the instructions into following potential execution steps:

- Instruction fetch and PC increment (IF)
- Instruction decode & Register fetch (ID)
- Execution, memory address, concatenation or branch completion. (Ex)
- Memory access or R-type instruction completion (MEM)
- Memory write completion

(4)

(A) functional unit delays:

memory - 3 ns

ALU and address - 3 ns

FPU add - 8 ns

FPU multiply - 16 ns

Register file access (read/write) - 2 ns

$$R = 1 + (3 \times 5) = 1 + 3 \boxed{R = 4} \quad [\text{Rate no. } 19075083]$$

Instruction:

Instruction memory  $\rightarrow$  A

Register read  $\rightarrow$  B

ALU operation  $\rightarrow$  C

Data Memory  $\rightarrow$  D

Register Write  $\rightarrow$  E

FPU add  $\rightarrow$  F (<sup>or</sup> ~~and~~ sub)

FPU multiply  $\rightarrow$  G (or divide)

| <u>Instruction</u> | A | B | C | D | E  | F | G | Total |
|--------------------|---|---|---|---|----|---|---|-------|
| Load               | 3 | 2 | 3 | 3 | 2  |   |   | 13    |
| Store              | 3 | 2 | 3 | 3 |    |   |   | 11    |
| R-format           | 3 | 2 | 3 | 0 | 2  |   |   | 10    |
| Branch             | 3 | 2 | 3 |   |    |   |   | 8     |
| Jump               | 3 |   |   |   | 2  |   |   | 3     |
| FP add/sub         | 3 | 2 |   | 2 | 8  |   |   | 15    |
| FP mul/divide      | 3 | 2 |   | 2 | 16 |   |   | 23    |

(a) a single cycle implementation using a fixed period clock.

Ans

Ans: longest time (of an instruction)  
= i.e. 23

(b) Variable Period clock:

$$\begin{aligned}
 &= 13 \times 30\% \text{ (load)} + 11 \times 20\% \text{ (store)} + 10 \times 27\% \text{ (R-format)} \\
 &+ 8 \times R\% \text{ (branch)} + 3 \times (7-R)\% \text{ (Jump)} + 15 \times 8\% \text{ (add)} \\
 &+ 23 \times 8\% \text{ (mul/div)}
 \end{aligned}$$

Now,  $R \in S$ ,  $R = 4$

$$\begin{aligned} &= 13 \times \frac{3}{10} + 11 \times \frac{2}{10} + 10 \times \frac{27}{100} + 8 \times \frac{4}{100} + 3 \times \frac{3}{100} \\ &\quad + 15 \times \frac{8}{100} + 23 \times \frac{8}{100} \quad , \quad , \\ &\approx 3.9 + 2.2 + 2.7 + 0.32 + 0.09 + 1.2 + 1.84 \\ &\approx 12.25 \text{ ns} \end{aligned}$$

Ans: 12.25 ns

Comparison:

$$\frac{\text{Performance}(a)}{\text{Performance}(b)} = \frac{23}{12.25} = 1.879$$

$$\approx 1.88$$

(4)  
(b) In computing, Page Fault is an exception that the memory management unit raises when a process accesses a memory page without proper preparations.

A page fault occurs when a program attempts to access data or code that is ~~in~~ in its address space, but is not currently located in the system RAM. So, when page fault occurs, following sequence of events occur:

- The Computer hardware traps to the kernel and program counter (PC) is saved on the stack. Current instruction state & information is saved in CPU registers.
- An assembly program is started to save the general registers and other volatile ~~in~~ information to keep the OS from destroying it.
- OS finds that page fault has occurred and tries to find out which virtual page is needed.
- Once virtual address caused page fault is known, system checks to see if address is valid and checks if there is no protection access problem.

- If virtual address is valid, the system checks to see if a page frame is free.  
If no frames are free, the page replacement algorithm is run to remove a page.
- If frame selected is dirty, page is rescheduled for transfer to disk, context switch takes place, fault process is suspended and another process is made to run until disk transfer is completed.
- As soon as, page frame is clean, <sup>OS</sup> look up disk address where needed page is, schedules disk operation to bring it in.
- When disk interrupt indicates page has arrived, page tables are updated to reflect its position, and frame marked as being in normal state.
- Faulting instruction is backed upto state it had when it began and PC is reset. Faulting is scheduled, OS returns to subroutine that called it.
- Assembly routine unloads registers and other state information, returns to user space to continue execution.

(5)

(A) For this consider the following Computational Task :-

To compute the dot product of 2 vectors:-

Let  $V_1$  and  $V_2$  be the vectors of length  $l$ . Then, their dot product is defined as

$$\therefore \text{Dot Product} = \sum_{i=0}^{n-1} V_1(i) \times V_2(i)$$

Thus, the program for computing the dot product of 2 vectors is:-

|              |                           |                                          |
|--------------|---------------------------|------------------------------------------|
| Move         | $\# V_1 \text{ Vec}, R_1$ | ; $R_1$ points to vector $V_1$           |
| Move         | $\# V_2 \text{ Vec}, R_2$ | ; $R_2$ points to vector $V_2$           |
| Move         | $\# N, R_3$               | ; $R_3$ serves as Counter                |
| Clear        | $R_0$                     | ; $R_0$ accumulates the dot product      |
| Loop Move    | $(R_1)+, R_4$             | ; Compute the product of next components |
| Multiply     | $(R_2)+, R_4$             | ; Components                             |
| Add          | $R_4, R_0$                | ; Add to previous sum                    |
| Decrement    | $R_3$                     | ; Decrement the Counter                  |
| Branch $> 0$ | Loop                      | ; Loop again if not done                 |
| Move         | $R_0, \text{DOTPROD}$     | ; store dot Product in memory.           |

→ The above program computes the dot product and stores the result in memory to location DOTPROD. The

The first elements of each vector,  $V_1(0)$  and  $V_2(0)$ , are stored at memory locations  $V_1\text{Vec}$  and  $V_2\text{Vec}$ , with the remaining elements in the following word location.

The above programs was 5 registers. Instead of using  $R_0$  to accumulate the sum, the sum can be accumulated directly into DOTPROD.

This means that the last Move instruction in the program can be removed, but DOTPROD is read and written on each pass through the loop, significantly increasing memory access.

The 4 registers,  $R_1, R_2, R_3$  and  $R_4$ , are still needed to make this program efficient, and they are all used in the loop. Suppose that  $R_1$  and  $R_2$  are retained as pointers to the  $A \& V_1$  and  $V_2$  vectors, counter register  $R_3$  and temporary storage register  $R_4$ . Could be replaced by memory location in a 2-52-registers machine, but the number of memory access would increase significantly.

Hence, we can say - "Having a large number of processor registers makes it possible to reduce the no. of memory accesses needed to perform complex tasks."

5. b) \$s0 holds the value  $128_{\text{ten}}$ .

(i) Instruction:

add \$t0, \$s0, \$s1

Consider the following instruction set:

add \$t0, \$s0, \$s1

The memory allocated to one instruction is 4 bytes.

The range of the number can be calculated as  $2^{n-1}$ .

n = no. of bits

n = 32

The range is  $2^{31} - 1$  to  $-2^{31}$

The range is 2,147,483,647 to -2,147,483,648

The overflow condition is as follows:

Range of \$s1 =  $2,147,483,647 - 128$   
 $= 2147483519$

After this value, the condition will overflow.

Thus, the overflow range of \$s1 is as follows:

Overflow range (\$s1) = [2,147,483,520] to [2,147,483,647]

(ii) Value of  $\$50 = 128_{10}$

Consider the following instruction set:

sub \$t0, \$50, \$s1

The memory allocated to one instruction is 4 bytes.

The range of number can be calculated as  $2^{n-1}$

$$n=32$$

The range is  $2^{31}-1$  to  $-2^{31}$

The range is 2,147,483,647 to -2,147,483,648

The overflow condition is as follows:

$$\begin{aligned} \text{Range of } \$s1 &= -2,147,483,648 + 1.28 \\ &= -2,147,483,8520 \end{aligned}$$

Thus, overflow range of \$s1 is:

Overflow range ( $\$s1$ ) = [-2,147,483,648 to -2,147,483,8520]