

|                                                |                                                                                                                    |                                                                                                                |                                                               |                                                       |
|------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|-------------------------------------------------------|
| $1\text{ ms} = 10^{-3}\text{ s}$ (millisecond) | 2's complement $n-bits$<br>Range: $[-2^{(n-1)}, 2^{(n-1)}]$                                                        | 8 Reg<br>4B Reg word<br>2 <sup>16</sup> memory<br>8 insts                                                      | 32 Reg<br>8 B Reg words<br>2 <sup>64</sup> memory<br>40 insts | little endian<br>0 DA<br>1 OB<br>2 OC<br>3 OB<br>4 DA |
| $1\text{ us} = 10^{-6}\text{ s}$ (microsecond) | ① <u>Sign bit</u> : 0 (+), 1 (-)<br>② <u>exponent</u> : 8 b floating point                                         | Biased base -127 encoding                                                                                      | ARM-LEG                                                       | big endian<br>0 DA<br>1 OB<br>2 OC<br>3 OB<br>4 DA    |
| $1\text{ ns} = 10^{-9}\text{ s}$ (nanosecond)  | Add 127 to the value of the actual exponent to encode: E.g. -127 $\rightarrow$ 0 $\underline{\underline{1}}$ 0 (0) | $2^0 = 2, 2^1 = 4, 2^2 = 8, 2^3 = 16$<br>$2^4 = 32, 2^5 = 64, 2^6 = 128, 2^7 = 256$<br>$2^8 = 512, 2^9 = 1024$ | 2 KB is $2^{10}\text{ B}$                                     |                                                       |

Hz = cycle / s    1 kHz =  $10^3$  Hz  
 $1\text{ mHz} = 10^{-3}\text{ Hz}$     1 GHz =  $10^9$  Hz  
in bytes:  
char (1) short (2) int (4)  
float (4) double (8) long (8-64)  
pointer < 4 - 32b  
            8 - 64b  
most significant bits after decimal

$$\text{Ex. } (10.625) \xrightarrow{10} (1010.101) \xrightarrow{2} (1.010101 \times 2^3)_2 \Rightarrow 0 \underline{10000010} \underline{0101010} \dots$$

RISC  
E.g. LC2K, RISC-V, ARM  
1. fewer, simpler insts  
\* small CPI  
2. encoding of insts are usually the same size  
more popular  
3. simpler hardware  
4. Program is larger, since every inst is simple, require more insts overall

### CISC

- more, complex insts  
\* higher CPI
- encoding of insts are usually in different sizes
- complex hardware
- Short, expressive programs.  
easier to write by hand - each inst is doing more things

write w/ cache

two possibilities for a physically-addressed:

TLB access time + cache time + memory time

| store w/o allocate       | write-back<br>write cache<br>write to memory<br>if dirty, write to memory | write-through<br>write cache + mem<br>write to mem<br>Do nothing |
|--------------------------|---------------------------------------------------------------------------|------------------------------------------------------------------|
| 2H miss<br>replace block |                                                                           |                                                                  |

① TLB hit  $\rightarrow$  cache miss  $\rightarrow$  go memory

② TLB miss  $\rightarrow$  go memory

"page fault"

memory (PT) hit  $\rightarrow$  cache hit

miss  $\rightarrow$  update TLB and PT

miss  $\rightarrow$  disk access

miss  $\rightarrow$  update TLB

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Data to processor

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB

miss  $\rightarrow$  update TLB, PT and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\rightarrow$  update TLB and memory

miss  $\rightarrow$  update cache

miss  $\rightarrow$  Context switch = clear TLB + cache

miss  $\$

## Detect and stall:

| Time:     | 1  | 2  | 3   | 4   | 5  | 6   | 7   | 8  | 9  | 10 | 11 | 12 | 13 |
|-----------|----|----|-----|-----|----|-----|-----|----|----|----|----|----|----|
| add 1 2 3 | IF | ID | EX  | ME  | WB |     |     |    |    |    |    |    |    |
| nor 3 4 5 |    | IF | ID* | ID* | ID | EX  | ME  | WB |    |    |    |    |    |
| add 6 3 7 |    | IF | IF  | IF  | ID | EX  | ME  | WB |    |    |    |    |    |
| lw 3 6 10 |    |    |     |     | IF | ID  | EX  | ME | WB |    |    |    |    |
| sw 6 2 12 |    |    |     |     | IF | ID* | ID* | ID | EX | ME | WB |    |    |

## \* D latch

- When the Gate (clock) is 1, Q (output) = D (input)  
meaning the latch is transparent
  - When the Gate (clock) is 0, Q latches to the value of D at that instant and remembers, even if D changes later
- \* D flip-flop  
Only store the value the instant the clock changes (edge)
- \* optimal achievable serial runtime:

add/nor: read memory (fetch) + read register file + ALU + write register file

lw: read memory (fetch) + read register file + ALU + read memory (loading) + write register

sw: read memory (fetch) + read register file + ALU + write memory

beq: read memory (fetch) + read register file + ALU (comparison)

noop / halt: read memory (fetch)

\* direction prediction for jalr is not worse than beq because jalr is always taken

\* The TLB often has a miss rate that is greater than 90%

\* single-cycle: longest inst

multi-cycle: longest component (write memory)

\* Detect and forward: when counting # bits

executed, we don't consider lookup time and larger overhead

add/nor: forward res from MEM to the start of EX and forward from WB (MEMWB) to the EX

\* FA cache = easy to implement but longer

\* Directly-mapped cache: quick lookup, no overhead need for LRU - but constant cache evictions  $\Rightarrow$  higher miss rate

## \* SRAM:

\* volatile: need constant power to keep data

\* Fast:  $\sim 1\text{ ns}$  read time

DRAM: \* volatile \* slower  $\sim 50\text{ ns}$

\* non-volatile = holds information even when no power is supplied

\* Reducing a processor's frequency decreases main memory latency measured in terms of processor's cycles.

\* It is possible for a write-through cache to write less total bytes of memory than a write-back cache when running the same program  $\Rightarrow$  clock period  $\uparrow$

Write-through and write-back becomes the same if we only access a particular block one time before we exit.  $\Rightarrow$  time = clock period  $\cdot$  # clocks  $\downarrow$

Write-back works the best if we access a particular block multiple times before exiting



\* Block offset bits =  $\log_2$  (block size)

Set index bits =  $\log_2$  (# sets)

Tag bits = address size - set index bits - block offset bits

Address size =  $\log_2$  (size of memory)

Cache size = # cache blocks  $\cdot$  block size

\* PTs use VPNs to index into the PT to find PPN

\* 1. When data is in the data cache: # cycles to access cache

2. When data must be fetched from memory:

① hit in TLB: cycles for cache + cycles for memory

② miss in TLB and hit in two PTs: cycles for cache + cycles for 2 PTs + cycles for memory

\* Assuming we have 2-levels of PTs

3. When data must be fetched from disk

① miss in 1st level PT = cycle for cache + cycle for 1st level PT + cycle for disk

② miss in 2nd level PT = cycle for cache + cycle for 2 PTs