

Two's Complement

• Flip bits, add 1.

Integer Multiplication

$$1000 = 8_{10}$$

$$\times 1001 = 9_{10}$$

$$\underline{1000}$$

$$+ 1000$$

$$1001000 = 72_{10}$$



Floating Point.

|            |                |                 |
|------------|----------------|-----------------|
| S          | E(xponent)     | S(significand)  |
| 1 bit sign | Biased Integer | Integer         |
| 32b        | 1 bit          | 8 bits (-127)   |
| fffb       | 1 bit          | 11 bits (-1023) |

Memory

FFFF FFFF



malloc (size\_t)  
 calloc (size\_t, size\_t)  
 realloc (void\*, size\_t) returns void\*  
 free (void\*)  
 variables declared outside functions

~ 0000 0000

Boolean Algebra

$$x\bar{x} = 0$$

$$x \cdot 0 = 0$$

$$x \cdot 1 = x$$

$$x \cdot x = x$$

$$x \cdot y = y \cdot x$$

$$(x \cdot y) \cdot z = x \cdot (y \cdot z)$$

$$x \cdot (y + z) = x \cdot y + x \cdot z$$

$$x \cdot y + x \cdot z = x$$

$$x \cdot y + x = x + y$$

$$(x + y) \cdot x = x \cdot y$$

$$x \cdot y + x = x + y$$

$$(x + y) = \bar{x} \cdot \bar{y}$$

$$(\bar{x} \cdot y) = \bar{x} + \bar{y}$$

$$x + \bar{x} = 1$$

$$x + 1 = 1$$

$$x + 0 = x$$

$$x + x = x$$

$$x + y = y + x$$

$$(x + y) + z = x + (y + z)$$

$$x + y + z = (x + y) + z$$

$$(x + y) \cdot x = x \cdot y$$

$$x \cdot y + x = x + y$$

$$x \cdot y + x = x + y$$

$$(x + y) = \bar{x} \cdot \bar{y}$$

$$(\bar{x} \cdot y) = \bar{x} + \bar{y}$$

$$(\bar{x} \cdot y) = \bar{x} + \bar{y}$$

Transistor energy efficiency

$$P = \frac{dE}{dt}, E_{0-to-1} = E_{-to-0} = \frac{1}{2} C V_{dd}^2$$

total capacitance to be switched

$$P_{sw} = \frac{1}{2} dC V_{dd}^2 F \rightarrow \text{clock frequency (Hz)}$$

operating voltage

"activity factor" = average percentages of capacitors switching per cycle.

$E_{sw} \propto V_{dd}^2$ ,  $V_{logic} \propto V_{dd}$   $\Rightarrow$  Improve by Vdd and compensate performance by adding cores.

Cache

Temporal Locality & Spatial Locality



Misses

1. Compulsory (first time) 2. Capacity

3. Conflict (solved with fully associative)

4. Coherency

- Extra  
 1. valid bit  
 2. Dirty bit  
 3. Shared bit

$$AMAT = \text{Hit Time} + \text{Miss Rate} \times \text{Miss Penalty}$$

Implicit exponent  $- (2^{e-1} - 1) + 1 \Rightarrow -126$  in 32b  
 with implicit leading 1 changed to  $-1022$  in 64b  
 on implicit leading 0.

MOESI Coherency Policy



## Cache Design Choices

### 1. Increasing Associativity

- Hit time  $\uparrow$  from DM to  $\geq 2$  ways (since mux correct way to processor)
- Hit time  $\sim$  (slightly increased) for further increase in associativity
- Miss rate  $\downarrow$  : reduced conflict misses ( $1 \rightarrow 2 \rightarrow 4$ )
- Miss penalty mostly unchanged since replacement policy runs in parallel with fetching missing line from memory

### 2. Increasing # of entries.

- Hit time  $\uparrow$  : reading tags and data from larger mem structures
- Miss rate  $\downarrow$  : reduced capacity and conflict miss. Miss rate  $\downarrow \sim 2x$  / capacity  $\uparrow \sim 4x$
- Some point increase in hit time may overcome improvement in hit rate.

### 3. Increasing block size.

- Hit time unchanged but may be slightly reduced as # of tags is reduced
- Miss rate  $\downarrow$  at first (spatial locality), then  $\uparrow$  because of conflict miss  $\uparrow$
- Miss penalty  $\uparrow$  but with fixed constant initial latency it is amortized over the whole block.

### 4. Add a "victim cache"

CSR: 

$$\text{Speedup} = \frac{\text{Time w/o E}}{\text{Time w/ E}}$$

$$\text{Time w/o E} = \text{Time w/o E} \times \left[ (1-F) + \frac{F}{S} \right]$$

$$\text{Speedup} = \frac{1}{(1-F) + \frac{F}{S}}$$

## CALL (Compilers $\Rightarrow$ Assemblers $\Rightarrow$ Linker $\Rightarrow$ Loader)

### Compiler $\leftarrow$ Computationally Heavy

- Lexes input  $\rightarrow$  tokens
- Parses tokens  $\rightarrow$  abstract syntax tree
- Semantic Analysis and Optimization
- Code Generation.

### Assembler

- foo.s  $\rightarrow$  foo.o
- Reads and uses Directives
- Produces machine lang.
- takes two pass (remember  $\rightarrow$  fill) to solve forward reference problem.
- Outputs Symbol Table and Relocation Table.

### Linker

- Puts multiple .o file  $\rightarrow$  .out file

### Loader

- Reads header.
- Creates new address space
- Copy instructions and data to mem
- Copies arguments onto stack
- Initializes registers (SP)
- Jumps to start-up routine
  - copies arguments from stack  $\rightarrow$  reg
  - sets PC.
- Main routine exits  $\longrightarrow$  cleans up
- Links DLL.

### Hard Disk



$$\text{Seek Time} = \frac{\# \text{ of tracks}}{3} \cdot \frac{S}{\text{track}}$$

$$\text{Rot Time} = \frac{1}{2} \text{ time of a cycle.}$$

$$\text{MTTF} / \text{MTTR} / \text{MTBF} = \text{MTTF} + \text{MTTR}$$

$$\text{Availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} = \frac{\text{MTTF}}{\text{MTBF}}$$

### RAID 0: Striping

### RAID 1: Disk Mirroring

### RAID 3: Single Parity Disk (bitwise)

### RAID 4: high I/O Rate Parity (blockwise)

### RAID 5: high I/O Rate Interleaved Parity

### RAID 6: + parity block per stripe $\uparrow$

### Warehouse

$$\text{PUE} (\text{Power Usage Effectiveness}) = \frac{\text{Power of total building}}{\text{Power of IT equipment}}$$