

## Logical Not (operator: !)

• result is 0 or 1

if (!x)

if left is 1, then right is  
not reached

07/31



**Masking** - Using bitwise operations on integers of various sizes to capture and manipulate bits.

- Bitwise Or with a mask pattern to ensure certain bits are turned on (i.e. set to 1)
- Bitwise And with a mask pattern to ensure certain bits are turned off (i.e. set to 0)

ex) Capture Low order 23 bits  
of a 32 bit number

$X \& 0X007FFFFF$



ex) Capture Low order 52  
bits of 64 bit number

$X \& 0X0000FFFF FFFF FFFF$

\*Bitwise and is  
an identity operator.

### Operating on single bits:

- Set bit N=1 in X  $\rightarrow X |= (1 \ll N)$
- Set bit N=0 in X  $\rightarrow X \&= \sim(1 \ll N)$
- Toggle bit N in X  $\rightarrow X ^= (1 \ll N)$
- Is Bit N in X 1 or 0?  $\rightarrow X \& (1 \ll N) \rightarrow$  gives zero (false) or nonzero (true)

If bit N was 0,  
then now 1.  
If bit N was 1,  
then now 0.

ex) 1101100100011

$\& 00000010000000$  ← only one bit  
is 1

00 00000000000

Swap values of ints A and B

temp = A

A = B

B = temp

Swap A and B (no temp)

A  $\leftarrow$  B

B  $\leftarrow$  A

A  $\leftarrow$  B

Endianness:



\* Big Endian = msb first, decreasing numeric significance as byte address increases



\* Little Endian = Lsb first, increasing numeric significance as byte address increases

Ex) Little Endian 0X2710



Big Endian Ex) 0X01020304



VS Little Endian Ex) 0X01020304



Intel Register (Little Endian)



\* Backward compatibility  
(only in theory, not practical)

%RAX occupies the low-order bytes of the %RDX register

## ★ + know sizeof()

C bitfields accessing individual bits or groups of bits

STRUCT BF example {

    unsigned int F1:10;      size in bits

    unsigned int F2:3;

    unsigned int :4;      ok to have  
                          unnamed bitfield  
                          (hardware)

    unsigned int F3:7;

}

STRUCT BF example BEX;

BEX F1 = 30;      assign what  
                         fits

BEX F2 = 0;

;

if (BEX.F1 == 20 && BEX.F2 != 14)

    Union {



        unsigned char C;

        STRUCT RegBite {

            unsigned int RA:4;      size in bits

            unsigned int RB:4;

        } RPART



? can access  
groups of bits

} ;

\* Layout of exactly where the bits are implementation dependent.

\* Bitfields do not have addresses.

## ★ ★ ★

Caching:

\* CPU are faster than memory access

\* Faster memory is more expensive.

→ Allow faster and cheaper memory cost.



## \* Principles of locality:

1. Spacial locality - If specific memory location was accessed then nearby memory location are more likely to be accessed soon.
2. Temporal Locality - If specific memory location was accessed then that same memory location is more likely to be accessed again soon.



## Memory Hierarchy:



## \* Caching terminology:

- Cache: a small, fast storage device that contains a subset of large and slow storage device
  - Block: a fixed-size chunk
  - Line: meta information about one block consisting of tag bits, selector bits, Valid bit \*
- what's in the cache* → boolean

- Set: a collection of one or more lines
- Cache Capacity: total number of bytes a Cache can hold

$$\text{Capacity } C = S * E * B$$

↑                      ↑                      ↑  
 Number of sets      Block size  
in bytes      Number of  
lines per set

← ~~☆☆☆~~  
know how to  
use this.

- Placement Policy: where to put meta information for each block in the cache and where to put the actual data blocks

### General Organization of cache:



Main memory addresses of  $m$  bits are broken into 3 parts



Meta information for Cache block is

- Valid bit
- Tag



Initial state of cache is empty (I.E. valid bit for every block is zero.)

ex) Block Size :  $4096 = 2^{12} \Rightarrow$  Block offset is 12 bits



## ~~Caching~~ Hits and misses:

- Hit: Desired block is already in the cache
- Compulsory / Cold miss: Could not find desired block because cache is empty
- Conflict miss: Multiple blocks keep mapping to the same line
- Capacity miss: Number of blocks needed exceed the size of cache



## Direct-mapped Cache:



- One line per set
- Simplest kind of set
- Search Cache by mapping selector bits to line of Cache.
- ~~2~~ Selector bits imply  $2^2$  lines in the direct-mapped cache.
- All Addresses with the same selector bit values go to the same line of cache.



Swapping in/out Cache lines

\* Thrashing = Constantly throwing a block and putting a block in the same block

\* Tag bit that tells which block to access to

\* Selector = What line

\* Some lines may not be used even though some get used multiple times

Ex) 64 bit address

4096 byte block

8 blocks in cache

|   |     |
|---|-----|
| 0 | TAG |
| 1 | TAG |
| 2 | TAG |
| 3 | TAG |
| 4 | TAG |
| 5 | TAG |
| 6 | TAG |
| 7 | TAG |



Read Addresses

1234 ABCD7777 3000

1234 ABCD7777 3FFF

Block offset

All in Same block

ABCD1234 7777 3000

FEDCBA90 6666 3000

## Set Associative Cache:



- Main memory addresses broken into tag, selector, block offset bits
- Each set holds more than one line.
- Any line in a set can hold any block that maps to that set.

### 4 way set associative



\* Hardware is going to do a parallel search.

- \* Replacement Policy: Select victim block to evict
  - { 1. Random Selection
  - 2. Get rid of least recently used
- \* Better than direct-mapped, b/c less bits needed for selector bits, but way more complicated.

08/02

- Cache tries to find a block that matches an address
- If the requested block is in the Cache → "HIT"
- If the requested block is not in the Cache → "Miss"
- # of Hits / Total requests = Hit Ratio.

\* Goal - to maximize hit ratio.

### Set Associative Cache



- Selector bits from address determine set.
- Any line in the set will do.
- Parallel Search within a set for matching tag bits.
- Hardware parallel search
- Still some Potential underutilization.
- Because selector bits associate to specific sets.

\* Direct map cache use underutilization

