

CS 251 - S23

# Computer Organization and Design

## Full Course Notes

With Prof Zille Huma Kumal

---

In-class notes were tricky in this class. I didn't write everything, but I hope this helps.

---

# CS 251

(1)

Branching  
4, B #1 → 5

64-bit  
architecture

## 1) Introduction + ARM



## 2) ARM Simulations + Performance

WORD: 4 bytes. ← INSTRUCTIONS



GPR: general purpose register.

XZR: zero.

PC: program counter (SPR)

SPR: special purpose register.

#: immediate.  $-2^{24} \rightarrow 2^{25}$ .

### Instruction Formats

#### R-Format

- 3 operands, in registers
  - Store in first register
- ADD X3, X1, X2

#### D-Format

- 3 operands, 2 in GPR
- LDUR X1, [X2, #24]  
 $X1 = (X2 + 24)^{*}_{X2[3]}$

#### I-Format

- 3 operands, 1 is #
- SUBI X1, X2, #100

#### B-Format

- unconditional goto
- B #28 → (word offset)  
 $PC = PC + 4 \times 28$

#### CB-Format

- conditional goto
- CBNZ X1, #8  
not zero  
compare to 0

Computer design: the choice of ISA

Computer architecture: ISA + computer organization

### HOW COMPUTERS WORK

User Program

High Level Language

Assembly

Operating System

Machine Language

### ARM is our ISA



## 3) Performance

CPU Time: # of clock cycles × clock time

Clock Rate: 1 / clock period

Generally, we count total instructions.

## 4) Digital Logic Design



Karnaugh alternative:  
Don't Cares

## 5) Boolean Logic

XXX Ü

## 6) Logic Gates

### ONLY NAND

- SOP form
- invert between

### XOR

$\Rightarrow D$   
1 if odd # of 1's

### Multiplexor



### Decoder



## 7) Transistors, Clocks, CMOS

Metal Oxide Semiconductor Field-Effect Transistor  
Unipolar material

src = Gate DRAIN current is drain to src  
P-channel substrate

input resistance high

A = 0 A = 1 Q = resistance

V<sub>DD</sub>: input voltage (drain)

A: gate power!

nMOS naturally inverting

pMOS not inverts

A: input

V<sub>DD</sub>: output

CMOS Complimentary!

input A = 0 A = 1

resistance low high

EG:



3-state buffer  
Gate X C F

C X | Q<sub>1</sub> Q<sub>2</sub> F  
---|---|---|---  
00 | H H Z  
01 | H H Z  
10 | L L O  
11 | L L 1

SEL X F

## 8) Sequential Logic



conceptually understand that

### SR Latch (NOR)



### D-Latch



| C | D | t+1                   |
|---|---|-----------------------|
| 0 | 0 | Q <sub>t</sub>        |
| 0 | 1 | Q̄ <sub>t</sub>       |
| 1 | 0 | Q <sub>t</sub>        |
| 1 | 1 | Q (reset)<br>Q̄ (set) |

## 9) Edges, DFF, Sequential Circuits



## 10) Data Representation/Manipulation. 1-Bit ALU

**Two's Compliment!**

**Full Adder**

| Cin | A | B | Cout | Sum |
|-----|---|---|------|-----|
| 0   | 0 | 0 | 0    | 0   |
| 0   | 0 | 1 | 0    | 1   |
| 0   | 1 | 0 | 0    | 1   |
| 0   | 1 | 1 | 1    | 0   |
| 1   | 0 | 0 | 0    | 1   |
| 1   | 1 | 0 | 1    | 0   |
| 1   | 1 | 1 | 1    | 1   |



## ARM instructions

LSL - logical shift Left  
ASL - Arithmetic shift Left  
LSR - 0's on left  
ASR - sign on left

## 11) Floating Point (FP)

Normalized: fraction starts w/ 1.

Overflow: exponent too big.

Underflow: exponent too small.

### IEEE 754 FP Standard



### Special cases

| E     | F     | Value       |
|-------|-------|-------------|
| 0     | 0     | 0           |
| all 1 | 0     | $\pm\infty$ |
| all 1 | non 0 | NaN         |

### Bias notation

bias = 127  
 $2^{-8} \rightarrow E = 119 - \text{bias}$

## 12) Single Cycle Processors

Byte addressing: either LSB or MSB, in ARM

### Primary Storage/State elements

ROM Registers SRAM DRAM

- DRAM slower + cheaper than SRAM
- Refresh controller must allow read/write access
- Possibility of getting more bytes (page-mode RAM)

## 13) Single Cycle Processor Implementation

Future Josiah; read the slides and remember the "high level" ideas.



# CS 251 ②

## 15) Pipelining

Pipelining: Parallel execution of instructions.

### Data Path Stages

- ① IF Instruction Fetch — fetch instruction.
- ② ID Instruction Decode — read regs & decode.
- ③ EX Execute — execute operation.
- ④ MEM Memory — get operand in data mem.
- ⑤ WB Write Back — write result to a reg.



NOP  
no operation

"forwarding"  
is feeding EX intb  
next IP, for example.

"stalling"  
waiting another step  
in case it's impossible

Data Hazard: result of an instruction is needed by another.

Structural Hazard: hardware can't support the combo we need.

Control Hazard: CB instruction might change next instructions.

→ control hazard solutions:

- ① Stall
- ② Predict ← "dynamic prediction" uses past behaviour.
- ③ Delay Decision ← more unaffection instr to after

(Pipeline Registers) store intermediate values between clock cycles.

## 16) Forwarding

Forwarding: to the EX stage

- Pass ALU Result from a later pipeline back into EX forwarding unit.

Needed When: (both 1 and 2)

- ① RegWrite = 1 in MEM or WB.
- ② Destination is not X31

Notation?  
IDK, see slides

EX hazards

MEM hazards

## 17) Control Hazards

We have a Hazard Detection Unit that stalls if a hazard is detected.

↳ set all controls to zero, especially mem write rewrite.

Load-Use Hazard: LDUR isn't fast enough!

Flush: Don't run this instruction again!

CBZ: doesn't need ALU!

### Branching Types

Branch Not Taken: assume it doesn't branch (always).

Branch Taken: assume it does branch, always.

#### Dynamic Branch Prediction

1-bit: use a bit to remember if branched last time. 1 = was taken. init = ??

2-bit: use the DFA



## 18) Code Rearrangement

Code Rearrangement: swap lines of arm code to remove hazards.

Loop unrolling: remove values only used for CB instructions and take care of the looping yourself! (bad for caching).

Notes: STUR should be unchanged. Don't swap STUR→LDUR.

use branch-delay slots instead of flushing. Don't swap forward or backward dependencies. Don't move instructions in/out of a loop.

### Exceptions

- External event (eg I/O device) changes control flow.
- Internal event (eg overflow) changes control flow.

Solution ☑ Save PC for interrupted instruction! Service the instruction.

ELR: Exception Link Register

ESRs: Exception Syndrome Register

Direct-Mapped-Cache:



Cache: probably SRAM. Memory but on the chip!



Block: minimum info unit

Hit: info present when requested

Miss: info not present, must be copied up.  
↳ stalls processor.

## 19) More Cache

### Block-Size Read/Write

Just read/write blocks at a time!

V = valid bit: iff has right value!

D = dirty bit: iff block's been changed.

### 2-way Set-associative Cache

- Look at LSB's for the tag
- Have 2 values in each spot (tag)!

↳ Fully associative: maximally "#-way"

Caches search tags in parallel.

## 20) More Cache

When cache is full? What to evict?

LRU: least recently used. ← needs to track order.

↳ "use" bit → the LRU word.

Index: General slot (from LSB) for an address.

Tag: Differentiator for associative

Linesize: Number of items in each row.

### AMAT - Average Memory Access Time

AMAT = Time for hit + Miss Rate × Miss Penalty.

★ I didn't write any real notes on page tables or TLBs, so I recommend going through the tutorials/lectures for that.

| Tag | Index | Block   | Byte |
|-----|-------|---------|------|
| 63  | ...   | 5 4 3 2 | 1 0  |

A  $2^n$ -way cache of size  $2^k$  bytes with a block size of  $2^b$ :

- $2^b$  byte-bits
- $b$  block-bits (linesize = bytes + blocks)
- index bits:  $i = k - b - n = k - ls - n$
- tag bits remain:  $64 - 2 - b - i = 64 - ls - i$

## 21) Virtual Memory

Virtual Memory: between main memory and secondary storage (disks). (non-physical).

Page: virtual memory block

Page Fault: virtual memory miss

USED TO

- share memory between programs
- allow single user to exceed main mem size

Process: running program

Time Slice: block of time each process gets to run

### Page Replacement Schemes

LRU: Least Recently Used

LFU: Least Frequently Used

FIFO: First In First Out

Random: Random

Page Table:

virtual page → physical memory

LEC  
22

TLB: Translation Lookaside Buffer

Stores translations between virtual and real addresses.

(TLB is a cache specifically for translations.)



↑ my ideal daily routine ↑

↓ everyone happily studying for 251 together ↓

