

---

# Discussion 8

Feb 27



# Outline

- DINOCPU assignment
- VM quiz

---

## What is the new condition?

New conditions:

- ① data memory is processing the current memory request
- ② inst memory is processing the current memory request

plus (+) the existing ones:

- ③ branch/jump is taken
- ④ load-to-use hazard

Case: only one condition is happening.

|                         | F             | D     | E     | M     | W     |
|-------------------------|---------------|-------|-------|-------|-------|
| a. ① data mem is busy   | stall         | stall | stall | stall | stall |
| b. ② inst mem is busy   | stall         |       |       |       |       |
| c. ③ branch/jump        | PC from taken | flush | flush | flush |       |
| d. ④ load-to-use        | stall         | stall | flush |       |       |

Case: ① data memory is busy.

|                  | F     | D     | E     | M     | W     |
|------------------|-------|-------|-------|-------|-------|
| e. ② (inst mem)  | stall | stall | stall | stall | stall |
| f. ② ③           | stall | stall | stall | stall | stall |
| g. ② ④           | stall | stall | stall | stall | stall |
| h. ③             | stall | stall | stall | stall | stall |
| i. ④             | stall | stall | stall | stall | stall |

Case: ② inst memory is busy.

|                   | F             | D     | E     | M     | W            |
|-------------------|---------------|-------|-------|-------|--------------|
| j. ③ branch/jump  | PC from taken | flush | flush | stall | taken jump c |
| k. ④ load-to-use  | stall         | stall | flush |       |              |
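The priority implied by the tables above can be sketched in Python. This is only an illustration, assuming the F/D/E/M/W columns name the action applied at each stage. The function name `pipeline_actions` and its flags are hypothetical and are not part of DINOCPU. Combinations that the tables do not clearly specify (a taken branch/jump together with ② or ④) are deliberately left unimplemented.

```python
STAGES = ("F", "D", "E", "M", "W")


def pipeline_actions(data_mem_busy=False, inst_mem_busy=False,
                     branch_taken=False, load_to_use=False):
    """Return the per-stage action for one cycle (hypothetical helper)."""
    # Rows a and e-i: a busy data memory stalls every stage,
    # regardless of any other condition.
    if data_mem_busy:
        return dict.fromkeys(STAGES, "stall")

    actions = dict.fromkeys(STAGES, "")

    if branch_taken and (inst_mem_busy or load_to_use):
        # Not clearly specified in the tables, so not modeled here.
        raise NotImplementedError("combination not covered by this sketch")

    if branch_taken:
        # Row c: redirect fetch and flush the younger instructions.
        actions.update(F="PC from taken", D="flush", E="flush", M="flush")
    elif load_to_use:
        # Rows d and k: same action with or without a busy inst memory.
        actions.update(F="stall", D="stall", E="flush")
    elif inst_mem_busy:
        # Row b: only fetch waits.
        actions.update(F="stall")

    return actions


if __name__ == "__main__":
    print(pipeline_actions(data_mem_busy=True, branch_taken=True))
    print(pipeline_actions(load_to_use=True))
```

Checking `data_mem_busy` first mirrors the second table, where every row is all stalls no matter which other condition is active.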



# VM quiz

## Question 1

1 pts

Which of the following are reasons why we want to virtualize processes?

- To increase performance of the process
- Run a process on a machine with a larger amount of physical memory than the ISA-defined address space
- Share physical hardware between multiple processes
- View the system as if each process was running on its own
- Isolate each process's data
- Run a process on a machine with a smaller amount of physical memory than the ISA-defined address space
- Protect one process from reading or writing another process's data

## Question 2

1 pts

What controls the virtual to physical address translation?

- The operating system
- The compiler
- The user
- The hardware
- The process

## Question 3

1 pts

When implementing segmentation, each segment has three registers, a base, a bounds, and an offset. Are these virtual or physical addresses?

Base: [ Select ] Virtual

Bounds: [ Select ] Virtual

Offset: [ Select ] Physical

## Question 4

1 pts

Fill in the sentence below to make it true.

The segmentation registers [ Select: are ] part of the architectural state and [ Select: must ] be saved when doing a context switch. They [ Select: are ] part of the ISA.

## Question 5

2 pts

Assume the following flat translation table. The following is given like a C array (e.g., the 0th entry is the leftmost entry).



Assuming the page size is 64KiB, translate the following addresses. Give your answers in hex with no leading 0s. (E.g., 0xabcd)

$$64\text{ KiB} = 64 \times 1024 \text{ bytes} = 2^{16} \text{ bytes}$$

$$\log_2(64 \times 1024) = 16$$

So the page offset is 16 bits.



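What is the size of the virtual address space in KiB?

$$4 \times 64\text{ KiB} = 256\text{ KiB}$$

In general, the size is $2^{\text{len(VAddr)}}$ bytes; here that is $2^{18}$ bytes, i.e. an 18-bit virtual address.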
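Flat translation with 64KiB pages can be sketched in Python as follows. The table contents below are hypothetical placeholders, not the entries from the quiz, and the sketch assumes each entry holds a physical page number (PPN) rather than a full base address.

```python
PAGE_SIZE = 64 * 1024                       # 64 KiB
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # log2(64 KiB) = 16

# Hypothetical flat table: index = virtual page number, value = PPN.
# The 0th entry is the leftmost entry, like a C array.
TABLE = [0x12, 0x34, 0x56, 0x78]


def translate(vaddr, table=TABLE):
    """Translate a virtual address using a flat (single-level) table."""
    vpn = vaddr >> OFFSET_BITS              # upper bits select the entry
    offset = vaddr & (PAGE_SIZE - 1)        # lower 16 bits pass through
    return (table[vpn] << OFFSET_BITS) | offset


def virtual_space_kib(table=TABLE):
    """One page of virtual address space per table entry."""
    return len(table) * PAGE_SIZE // 1024


if __name__ == "__main__":
    print(hex(translate(0x1abcd)))   # VPN 1, offset 0xabcd
    print(virtual_space_kib())       # 4 entries x 64 KiB
```

With four entries, `virtual_space_kib()` reproduces the 4 × 64KiB = 256KiB calculation.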

## Question 6

2 pts



Use the following information to answer the questions below.

- The base page size is 1KiB.
- Each PTE is 32 bits (= 4 bytes).
- There are 4 levels in the page table.
- Every level (chunk) of the page table is the same size as the base page size (like in rv32, rv64, and x86).

The number of bits to index each level of the page table:

$$\frac{1 \text{ KiB}}{4 \text{ bytes per PTE}} = \frac{2^{10} \text{ bytes}}{2^{2} \text{ bytes}} = 2^{8} \text{ PTEs per level} \Rightarrow 8 \text{ bits per index}$$

Unit reminders:

$$2^{10} \text{ bytes} = 1 \text{ KiB}$$

$$2^{40} \text{ bytes} = 1 \text{ TiB}$$

What is the size of the virtual address space? Give your answer in tebibytes (TiB).

With 4 levels, each indexing $2^8$ entries, and 1 KiB pages:

$$\underbrace{2^8 \times 2^8 \times 2^8 \times 2^8}_{2^{32} \text{ pages}} \times 1 \text{ KiB} = 2^{32} \times 2^{10} \text{ bytes} = 2^{42} \text{ bytes}$$

$$\frac{2^{32} \times 2^{10} \text{ bytes}}{2^{40} \text{ bytes/TiB}} = 2^2 = 4 \text{ TiB}$$
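The same arithmetic can be checked with a short Python sketch. The function names are made up for illustration; the only assumption is the one stated in the question, that each level of the page table fills exactly one base page.

```python
def log2_exact(n):
    """log2 of a power of two, using integer arithmetic only."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def bits_per_index(page_bytes, pte_bytes):
    # One level holds page_bytes / pte_bytes PTEs.
    return log2_exact(page_bytes // pte_bytes)


def virtual_address_bits(page_bytes, pte_bytes, levels):
    # One index per level, plus the page offset.
    return levels * bits_per_index(page_bytes, pte_bytes) + log2_exact(page_bytes)


if __name__ == "__main__":
    page, pte, levels = 1024, 4, 4           # 1 KiB pages, 32-bit PTEs
    va_bits = virtual_address_bits(page, pte, levels)
    print(bits_per_index(page, pte))         # bits per index
    print(va_bits)                           # virtual address width
    print(2 ** va_bits // 2 ** 40, "TiB")    # address space in TiB
```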

## Question 7

1 pts

A system has the following characteristics:

- The base page size is 1KB.
- Each PTE is 64 bits.
- There are four levels in the page table.
- Every level (chunk) of the page table is the same size as the base page size (like in rv32, rv64, and x86).

Compared to rv32 with a base page size of 4KB, how would you expect this new address translation design to compare?

Select all that are true.

[rv32 has a 2-level page table and a 4KiB page size.]

- There would be less fragmentation in memory with the new design compared to rv32.
- The overhead from the page table would be higher with this design. I.e., this new design requires more memory than the rv32 design.
- For the same size TLB, the new design would have more misses than rv32.
- The new design requires fewer memory accesses for every TLB miss than rv32.

## Question 8

2 pts

For this question, use the following assumptions:

- Memory latency is 70 cycles
- The average page walk time is 200 cycles
- The TLB latency is 0 cycles (it is fully pipelined with the L1 cache access)
- The TLB hit ratio is 97%
- The L1 cache has a hit ratio of 90%
- The L1 cache has a hit time of 2 cycles

What is the AMAT of this system in cycles? (Note, correct answers within 0.5 cycles will be counted correct.)

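AMAT without translation:

$$2 + 10\% \times 70 = 9 \text{ cycles}$$

AMAT with translation:

$$9 + 3\% \times 200 = 15 \text{ cycles}$$

Equivalently, weighting TLB hits and misses separately:

$$97\% \times (2 + 10\% \times 70) + 3\% \times \bigl(200 + (2 + 10\% \times 70)\bigr) = 8.73 + 6.27 = 15 \text{ cycles}$$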
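A small Python sketch of the AMAT calculation, using the assumptions listed in the question. The function and parameter names are illustrative only. It computes both forms (adding the page-walk penalty, and weighting TLB hits and misses) to show that they agree.

```python
def amat_no_translation(l1_hit_time, l1_hit_ratio, memory_latency):
    """Hit time plus miss rate times miss penalty."""
    return l1_hit_time + (1 - l1_hit_ratio) * memory_latency


def amat_with_translation(l1_hit_time, l1_hit_ratio, memory_latency,
                          tlb_hit_ratio, page_walk_time):
    """TLB latency is 0 cycles, so only TLB misses add the page walk."""
    base = amat_no_translation(l1_hit_time, l1_hit_ratio, memory_latency)
    return base + (1 - tlb_hit_ratio) * page_walk_time


def amat_weighted(l1_hit_time, l1_hit_ratio, memory_latency,
                  tlb_hit_ratio, page_walk_time):
    """Same quantity, written as a weighted sum over TLB hit and miss."""
    base = amat_no_translation(l1_hit_time, l1_hit_ratio, memory_latency)
    return tlb_hit_ratio * base + (1 - tlb_hit_ratio) * (page_walk_time + base)


if __name__ == "__main__":
    args = dict(l1_hit_time=2, l1_hit_ratio=0.90, memory_latency=70,
                tlb_hit_ratio=0.97, page_walk_time=200)
    print(round(amat_with_translation(**args), 2))
    print(round(amat_weighted(**args), 2))
```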

The following table shows a map of a subset of memory. Since I can't show you all 4GiB of memory, I have shown a few addresses and the 4 bytes stored starting at that address.

Some information about this system:

- The page size is 64KiB ($2^{16}$ bytes)
- The physical address size is 32 bits
- The virtual address size is 26 bits
- There is a two level page table (each level is the same width)
- All PTEs are 32 bits.
- The internal PTEs only contain the address for the next level
- The leaf PTEs only contain the PPN, not the full address

| Physical address | 4 bytes of data |
|--------------|------------|
| 0xffd86e68   | 0x51c3775c |
| 0xffd81c04   | 0x000013fc |
| 0xc8d0143c   | 0x73e9e000 |
| ① 0xc8d0141c | 0xb50ce400 |
| 0xc8d01418   | 0xc81c9000 |
| 0xc8d0140c   | 0xffd81c00 |
| 0xc8d01400   | 0x743fec00 |
| 0xc81c903c   | 0x000012a0 |
| 0xc81c6eac   | 0xeedbc44  |
| ② 0xb50ce428 | 0x00002afa |
| 0xb50ce240   | 0xfaf2eb54 |
| 0xb08cec78   | 0xd760ae7c |
| 0x743fec20   | 0x00002e82 |
| 0x743fb9cc   | 0xf6d699f8 |
| 0x73ea3408   | 0x874d8b50 |
| 0x73e9e068   | 0x00000258 |

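Worked translation of the effective address 0x00eaec78 (26-bit virtual address, 16-bit offset, so 10 bits of VPN split into two 5-bit indices):

- Offset: 0xec78. VPN: 0b0011101010.
- VPN1 = 0b00111 = 7. Byte offset into the first level: 7 × 4 = 0b11100 = 0x1C. PTE address: 0xc8d01400 + 0x1C = 0xc8d0141c (①), which holds 0xb50ce400.
- VPN2 = 0b01010 = 10. Byte offset into the second level: 10 × 4 = 0b101000 = 0x28. PTE address: 0xb50ce400 + 0x28 = 0xb50ce428 (②), which holds the PPN 0x2afa.
- PAddr = PPN followed by the offset = 0x2afaec78 (③).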

|              |
|--------------|
| 0x5a65ec78   |
| 0x526aec78   |
| 0xc3a63bb8   |
| 0x432bcc6c   |
| ③ 0x2afaec78 |
| 0xbcabb920   |
| 0x2919ec78   |
| 0x3608cb08   |
| 0x13fcce78   |
| 0x019dee60   |
| 0x12a0ec78   |
| 0x513f5f3c   |
| 0x0258ec78   |
| 0xa34139f0   |

The base of the page table (e.g., satp or CR3 register) points to 0xc8d01400

Given this information, what data is returned when executing the following load?

ld x4, 0(x1)

Assume the effective address is 0x00eaec78.  
Give your answer in hex (e.g., 0xabcd).

The page offset is 16 bits.

x4: 0xbcabb920
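The two-level walk can be reproduced with a short Python sketch. The `MEMORY` dictionary holds only the three locations the walk touches: the two PTE values come from the memory map above, and the final data word is the answer given for x4. The function name `walk` and the constants are illustrative, not part of any real tool.

```python
OFFSET_BITS = 16                              # 64 KiB pages
VA_BITS = 26
PTE_BYTES = 4                                 # 32-bit PTEs
INDEX_BITS = (VA_BITS - OFFSET_BITS) // 2     # two equal levels: 5 bits each

# Subset of physical memory: address -> 4 bytes stored there.
MEMORY = {
    0xc8d0141c: 0xb50ce400,   # level-1 PTE: address of the next level
    0xb50ce428: 0x00002afa,   # leaf PTE: PPN only
    0x2afaec78: 0xbcabb920,   # data at the translated address
}


def walk(base, vaddr, memory=MEMORY):
    """Translate vaddr with a two-level page table rooted at base."""
    index_mask = (1 << INDEX_BITS) - 1
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    vpn = vaddr >> OFFSET_BITS
    vpn1 = (vpn >> INDEX_BITS) & index_mask
    vpn2 = vpn & index_mask

    level2_base = memory[base + vpn1 * PTE_BYTES]   # internal PTE
    ppn = memory[level2_base + vpn2 * PTE_BYTES]    # leaf PTE
    return (ppn << OFFSET_BITS) | offset


if __name__ == "__main__":
    paddr = walk(0xc8d01400, 0x00eaec78)
    print(hex(paddr))
    print(hex(MEMORY[paddr]))
```

Keeping only the touched locations in `MEMORY` means a lookup of any other address raises `KeyError`, which makes a wrong index calculation easy to spot.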