

# Computer Architecture - Unit 2

Final Exam, 22.Jul.2021, 10:30AM

Name: \_\_\_\_\_ Last name: \_\_\_\_\_

Infostud ID: \_\_\_\_\_

**Exercise 1.** How many clock cycles are needed to complete the following code on the RISC-V pipelined architecture? (Figure on the next page.) Remember that:

- Execution completes when the last instruction exits the pipeline.
- If you read and write the same register of the register file in the same clock cycle, the value that is read is the value that is being written.

Show a figure of the pipeline with bubbles and forwardings.

```
sub s1, s0, s0
lw s0, 4(s1)
lw s1, 8(s0)
add s1, s3, s3
sw s1, 12(s2)
sub s3, s1, s0
lw s7, 0(s3)
add s1, s3, s3
lw s5, 4(s7)
```



*(Figure: pipeline diagram of the nine instructions above, with bubbles and forwardings.)*
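A minimal Python sketch for checking the cycle count. It assumes the architecture in the figure is the standard five-stage RISC-V pipeline with full forwarding (EX/MEM and MEM/WB to EX) and a hazard detection unit, so the only stall left is a load immediately followed by an instruction that reads the loaded register. `PROGRAM` and `count_cycles` are illustrative names, not part of the exam.

```python
# Hypothetical encoding of Exercise 1: (text, destination register, source registers, is_load)
PROGRAM = [
    ("sub s1, s0, s0", "s1", ["s0", "s0"], False),
    ("lw s0, 4(s1)",   "s0", ["s1"],       True),
    ("lw s1, 8(s0)",   "s1", ["s0"],       True),
    ("add s1, s3, s3", "s1", ["s3", "s3"], False),
    ("sw s1, 12(s2)",  None, ["s1", "s2"], False),
    ("sub s3, s1, s0", "s3", ["s1", "s0"], False),
    ("lw s7, 0(s3)",   "s7", ["s3"],       True),
    ("add s1, s3, s3", "s1", ["s3", "s3"], False),
    ("lw s5, 4(s7)",   "s5", ["s7"],       True),
]

STAGES = 5  # IF, ID, EX, MEM, WB


def count_cycles(program):
    """Total cycles on a 5-stage pipeline with full forwarding (assumed).

    With EX/MEM and MEM/WB forwarding, the only remaining stall is a load
    followed immediately by an instruction that reads the loaded register
    (one bubble). A load two or more instructions earlier is reached by
    MEM/WB forwarding or by the register file (write and read in the same cycle).
    """
    stalled = []
    for i in range(1, len(program)):
        _, prev_dest, _, prev_is_load = program[i - 1]
        _, _, srcs, _ = program[i]
        if prev_is_load and prev_dest in srcs:
            stalled.append(i)  # one bubble inserted before instruction i
    # the first instruction takes STAGES cycles, each later one adds 1, each bubble adds 1
    return STAGES + (len(program) - 1) + len(stalled), stalled


cycles, stalled = count_cycles(PROGRAM)
print(cycles, [PROGRAM[i][0] for i in stalled])
```

Under these assumptions the only bubble goes before `lw s1, 8(s0)`, which needs `s0` from the load right before it. The other dependences (for example `sub s1` → `lw s0, 4(s1)`, `add s1` → `sw s1`, and `lw s7` → `lw s5, 4(s7)` two slots later) are covered by forwarding, giving 5 + 8 + 1 = 14 cycles.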



**Exercise 2.** Consider a two-way set-associative cache with two sets and blocks of 8 words. The cache is initially empty and blocks are replaced using LRU. Show which of the following memory accesses are hits and which are misses: 16, 136, 184, 252, 160, 80, 252, 80, 48, 136, 60, 204, 100, 88, 196, 60, 32, 176, 80, 224.

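- 2 ways
- 2 sets
- block size: 8 words × 4 bytes = 32 bytes
- block number = address / 32 (integer division); set = block number mod 2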

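Blocks held by each way over time (a block is overwritten by the next one listed; LRU decides which way is replaced):

| Set | Way A     | Way B     |
|-----|-----------|-----------|
| 0   | 0 → 2 → 6 | 4 → 2     |
| 1   | 5 → 1 → 7 | 7 → 3 → 5 |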

Accesses in order (M = miss, H = hit):

| #  | Address | Block (addr / 32) | Set (block mod 2) | Result | Evicted block |
|----|---------|-------------------|-------------------|--------|---------------|
| 1  | 16      | 0                 | 0                 | M      |               |
| 2  | 136     | 4                 | 0                 | M      |               |
| 3  | 184     | 5                 | 1                 | M      |               |
| 4  | 252     | 7                 | 1                 | M      |               |
| 5  | 160     | 5                 | 1                 | H      |               |
| 6  | 80      | 2                 | 0                 | M      | 0             |
| 7  | 252     | 7                 | 1                 | H      |               |
| 8  | 80      | 2                 | 0                 | H      |               |
| 9  | 48      | 1                 | 1                 | M      | 5             |
| 10 | 136     | 4                 | 0                 | H      |               |
| 11 | 60      | 1                 | 1                 | H      |               |
| 12 | 204     | 6                 | 0                 | M      | 2             |
| 13 | 100     | 3                 | 1                 | M      | 7             |
| 14 | 88      | 2                 | 0                 | M      | 4             |
| 15 | 196     | 6                 | 0                 | H      |               |
| 16 | 60      | 1                 | 1                 | H      |               |
| 17 | 32      | 1                 | 1                 | H      |               |
| 18 | 176     | 5                 | 1                 | M      | 3             |
| 19 | 80      | 2                 | 0                 | H      |               |
| 20 | 224     | 7                 | 1                 | M      | 1             |

Total: 9 hits, 11 misses.
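A short Python sketch that replays the trace with true LRU, assuming byte addresses and 4-byte words (so 32-byte blocks). `simulate` and `TRACE` are hypothetical names used only for this check.

```python
from collections import OrderedDict


def simulate(addresses, num_sets=2, ways=2, block_bytes=32):
    """Return 'H'/'M' for each access of a set-associative cache with LRU replacement."""
    # One OrderedDict per set: keys are block numbers, ordered from LRU to MRU.
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for addr in addresses:
        block = addr // block_bytes
        lru = sets[block % num_sets]
        if block in lru:
            lru.move_to_end(block)       # hit: block becomes most recently used
            results.append("H")
        else:
            if len(lru) == ways:
                lru.popitem(last=False)  # miss in a full set: evict the LRU block
            lru[block] = True
            results.append("M")
    return results


TRACE = [16, 136, 184, 252, 160, 80, 252, 80, 48, 136,
         60, 204, 100, 88, 196, 60, 32, 176, 80, 224]
result = simulate(TRACE)
print("".join(result), result.count("H"), "hits")
```

Under these assumptions it reports 9 hits and 11 misses (`MMMMHMHHMHHMMMHHHMHM`). Keeping each set as an `OrderedDict` makes LRU cheap: a hit moves the block to the end, and `popitem(last=False)` evicts the block at the front.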

**Exercise 3.** Consider the following code (**single-clock architecture**):

*A node is 8 bytes (a value word and a pointer to the next node), and every iteration handles one node.*

```
.data
n000: .word 8, n001      # list of 1024 integers
n001: .word -3, n002
n002: .word 5, n003
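[...]
lista: .word n000          # pointer to the head of the list
```

```
.text
    lui  s0, 0x10012       # s0 = 0x10012000 = &lista
    lw   s1, 0(s0)         # s1 = lista, the head of the list
    li   s2, 0             # s2 = node counter
loop: beq  s1, zero, fine  # stop at the end of the list
      lw   s1, 4(s1)       # s1 = next field of the current node
      addi s2, s2, 1       # count the node
      bne  s1, zero, loop
fine: li   a7, 10          # ecall service 10: exit
      ecall
```

**VM page faults** (4 KB pages). The list occupies 1024 nodes × 8 B = 2048 words × 4 B:

$$4\,\text{B} \cdot 2048 = 8192\,\text{B} = 2 \text{ pages}$$

`lista` follows the list at 0x10012000 (the address built by `lui s0, 0x10012`), on a third page, and the code needs one more page:

$$3 \text{ pages (data)} + 1 \text{ page (code)} = 4 \text{ page faults}$$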
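The page count can be checked by listing the distinct 4 KB pages the program touches. This is a sketch assuming demand paging with every page initially absent, the RARS default segment bases (`.text` at 0x00400000, `.data` at 0x10010000), and nodes stored one after the other; the constants below are illustrative.

```python
PAGE = 4096
TEXT_BASE, DATA_BASE = 0x00400000, 0x10010000  # assumed RARS default segment bases
NODES, NODE_BYTES = 1024, 8
CODE_BYTES = 40  # roughly ten 4-byte instructions; anything under 4 KB gives one page


def page(addr):
    return addr // PAGE


# lw s1, 4(s1) reads the next field of every node
node_pages = {page(DATA_BASE + i * NODE_BYTES + 4) for i in range(NODES)}
lista = DATA_BASE + NODES * NODE_BYTES         # first byte after the list
data_pages = node_pages | {page(lista)}
code_pages = {page(TEXT_BASE + off) for off in range(0, CODE_BYTES, 4)}
print(hex(lista), len(data_pages), len(code_pages), len(data_pages) + len(code_pages))
```

With these assumptions `lista` lands at 0x10012000, matching the `lui s0, 0x10012` in the code, and the program touches 3 data pages plus 1 code page.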

This program counts the number of nodes in the list.

1. **Question A:** What is the approximate total miss rate if you have an instruction cache and a data cache, both one-way associative (direct-mapped) with 8 blocks of 64 bytes?
2. **Question B:** What is the speed-up (how much faster is it) if we use the RISC-V multiple-issue architecture with the loop unrolled 4 times instead of the standard pipelined architecture? Show the code.

### Question A

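Instruction cache: ≈ 0% miss rate. The whole program is about ten instructions (the loop body is 4 instructions, 16 bytes), so it fits in one 64-byte block: after the first compulsory miss every fetch hits.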



*(Figure: sketch of a singly linked list, just to recall how it is made: each node holds a value and a pointer to the next node.)*



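Block capacity: 64 bytes; 1 node = 8 bytes (2 words: value and next pointer), so 64 / 8 = 8 nodes fit in a block.

The nodes are laid out one after the other and each iteration handles one node (8 bytes), so the data cache misses on the first node of each block and hits on the next 7: one data miss every 8 iterations.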

| Iteration | Instruction cache | Data cache |
|-----------|-------------------|------------|
| it0       | H                 | M          |
| it1       | H                 | H          |
| it2       | H                 | H          |
| ⋮         |                   |            |
| it8       | H                 | M          |

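Every 8 iterations there is 1 data miss, so the data miss rate is $\frac{1}{8} = 12.5\%$. Each iteration performs 5 memory accesses (4 instruction fetches and 1 data load), so over the 1024 iterations:

$$\text{total miss rate} \approx \frac{1 \cdot \frac{1024}{8}}{5 \cdot 1024} = \frac{1}{40} = 0.025 \;(2.5\%)$$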
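A small simulation of the two direct-mapped caches can confirm the estimate. It is a sketch under stated assumptions: RARS default segment bases, nodes stored consecutively with each `next` pointing to the following node and the last one to 0, and instruction offsets taken from the order of the code (about ten instructions in total). `DirectMappedCache` is a hypothetical helper.

```python
BLOCK_BYTES, LINES = 64, 8                      # each cache: 8 blocks of 64 bytes, direct-mapped
TEXT_BASE, DATA_BASE = 0x00400000, 0x10010000   # assumed RARS default segment bases
NODES, NODE_BYTES = 1024, 8


class DirectMappedCache:
    def __init__(self):
        self.blocks = [None] * LINES  # block number currently held by each line
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        block = addr // BLOCK_BYTES
        line = block % LINES
        self.accesses += 1
        if self.blocks[line] != block:
            self.blocks[line] = block
            self.misses += 1


icache, dcache = DirectMappedCache(), DirectMappedCache()
for off in (0, 4, 8):                           # lui, lw, li before the loop
    icache.access(TEXT_BASE + off)
dcache.access(DATA_BASE + NODES * NODE_BYTES)   # lw s1, 0(s0): reads lista
for i in range(NODES):
    for off in (12, 16, 20, 24):                # beq, lw, addi, bne
        icache.access(TEXT_BASE + off)
    dcache.access(DATA_BASE + i * NODE_BYTES + 4)  # lw s1, 4(s1): next field of node i
for off in (28, 32):                            # exit sequence: li a7, 10 and ecall
    icache.access(TEXT_BASE + off)

data_rate = dcache.misses / dcache.accesses
total = (icache.misses + dcache.misses) / (icache.accesses + dcache.accesses)
print(icache.misses, dcache.misses, round(data_rate, 4), round(total, 4))
```

It counts 1 instruction miss and 129 data misses (128 node blocks plus `lista`), so the data miss rate stays close to 1/8 and the total close to 2.5%, as in the estimate.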

### Question B

Schedule of the loop unrolled 4 times on the two-issue RISC-V (one ALU/branch slot and one load/store slot per cycle):

| Cycle | ALU/branch slot      | Load/store slot |
|-------|----------------------|-----------------|
| 0     | `beq s1, zero, fine` | `lw s1, 4(s1)`  |
| 1     | `addi s2, s2, 1`     |                 |
| 2     | `beq s1, zero, fine` | `lw s1, 4(s1)`  |
| 3     | `addi s2, s2, 1`     |                 |
| 4     | `beq s1, zero, fine` | `lw s1, 4(s1)`  |
| 5     | `addi s2, s2, 1`     |                 |
| 6     | `beq s1, zero, fine` | `lw s1, 4(s1)`  |
| 7     | `addi s2, s2, 1`     |                 |
| 8     | `bne s1, zero, loop` |                 |

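Unrolled code:

```
.text
    lui  s0, 0x10012        # s0 = &lista
    lw   s1, 0(s0)          # s1 = head of the list
    li   s2, 0              # s2 = node counter
loop:
    beq  s1, zero, fine     # copy 1
    lw   s1, 4(s1)
    addi s2, s2, 1
    beq  s1, zero, fine     # copy 2
    lw   s1, 4(s1)
    addi s2, s2, 1
    beq  s1, zero, fine     # copy 3
    lw   s1, 4(s1)
    addi s2, s2, 1
    beq  s1, zero, fine     # copy 4
    lw   s1, 4(s1)
    addi s2, s2, 1
    bne  s1, zero, loop
fine:
    li   a7, 10             # ecall service 10: exit
    ecall
```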

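Speed-up: ignoring branch penalties, the standard pipeline runs the original loop at 4 cycles per iteration (4 instructions, one per cycle, no stalls because `bne` is two instructions after the `lw` and gets `s1` by forwarding), while the two-issue version needs 9 cycles every 4 iterations:

$$\frac{4 \cdot 1024}{9 \cdot \frac{1024}{4}} = \frac{16}{9} \approx 1.78 \times \text{ faster}$$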
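A greedy in-order packer can reproduce the 9-cycle schedule and the speed-up. This is a sketch with assumed parameters: one ALU/branch slot and one load/store slot per cycle, ALU results usable in the next packet, loaded values usable two packets later, and (as in the handwritten schedule) an instruction may share a packet with the branch before it. `schedule`, `BODY` and `LATENCY` are hypothetical names.

```python
from fractions import Fraction

# (text, issue slot, destination register, source registers)
# "alu" = ALU/branch slot, "mem" = load/store slot (assumed two-issue layout)
BODY = [
    ("beq s1, zero, fine", "alu", None, ["s1"]),
    ("lw s1, 4(s1)",       "mem", "s1", ["s1"]),
    ("addi s2, s2, 1",     "alu", "s2", ["s2"]),
]
UNROLLED = BODY * 4 + [("bne s1, zero, loop", "alu", None, ["s1"])]

# Cycles after issue before a result can feed a later packet (assumed):
# ALU results are forwarded to the next packet, loads leave a one-packet gap.
LATENCY = {"alu": 1, "mem": 2}


def schedule(code):
    """Greedy in-order packing into two-issue packets; returns {cycle: [instructions]}."""
    ready = {}      # register -> first cycle its new value can be read
    taken = set()   # (cycle, slot) pairs already filled
    packets = {}
    earliest = 0    # in-order issue: never earlier than the previous instruction
    for text, slot, dest, srcs in code:
        c = max([earliest] + [ready.get(r, 0) for r in srcs])
        while (c, slot) in taken:
            c += 1
        taken.add((c, slot))
        packets.setdefault(c, []).append(text)
        if dest is not None:
            ready[dest] = c + LATENCY[slot]
        earliest = c
    return packets


packets = schedule(UNROLLED)
dual_cycles = max(packets) + 1                   # cycles for 4 unrolled iterations
N = 1024
speedup = Fraction(4 * N, dual_cycles * N // 4)  # baseline: 4 cycles per iteration
for c in sorted(packets):
    print(c, " | ".join(packets[c]))
print(dual_cycles, float(speedup))
```

Under these assumptions the 13 unrolled instructions pack into 9 cycles, so the speed-up is 4096 / 2304 = 16/9 ≈ 1.78. Each copy of the body costs 2 cycles because the next `beq` must wait for the `s1` loaded in the previous packet.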

