

GPIO

- General Purpose Input/Output (GPIO)
  - can read/write w/ software

## • Functions

- Digital In
- Digital Out
- Other (ex: DAC or ADC)

## • Diagram:



## • Push Pull vs Open Drain

- PP = both P-Mos + N-Mos activate
- OD = only N-mos operates (when 1 is written, pin has high imp.)

## • Output Speed Control - rising/falling speeds

- $\Delta t$  =  $\frac{\Delta V}{\text{Speed}}$  =  $\frac{\Delta V}{\text{EMI Noise}}$  and  $\uparrow$  power.
- Consis. based on peripheral speed.

- **Slew Rate:**  $\frac{\Delta V}{\Delta t}$  (amount of time under max voltage) to change

## • Pull up &amp; down resistors to drain-the buffer.

## • Memory Mapped I/O:

System  $\rightarrow$  NVIC, Timers

Ext. Device  $\rightarrow$  SD card

Ext. RAM  $\rightarrow$  offchip memory

Peripheral  $\rightarrow$  Timers, GPIO

SRAM  $\rightarrow$  on chip RAM

Code  $\rightarrow$  on chip Flash.

Interrupts

- Interrupts standard program sequence
  - Specific type of an exception (hardware invoked sub-routine call)
- Code is translated from C to assembly to machine code.
- Program Counter keeps track where you are in code.
- Microcontrollers are single threaded (no simul. ops.)
- Basics
  - Peripherals can interrupt
  - Can be interrupted by more than one thing.
    - ↳ Calls diff function Interrupt Service Routine (ISR)
    - ↳ for each one.
  - Can be enabled and disabled
  - Have priorities, it can skip over lower to go first
  - Pending interrupt = Raised but not handled interrupt.
    - ↳ Handler must tell periph. it acknowledges the request
- Flow of interrupt
  1. Peripheral raises particular interrupt
  2. CPU checks if its enabled
  3. If EN: mark "N" as pending
  4. checks priority level of "N" vs curr. CPL.
  5. If  $\text{Prio-N} > \text{CPL}$ :  $\text{CPL} = \text{Prio-N}$ , CPU state  $\rightarrow$  Stack
    - Nth entry  $\rightarrow$  PC of vectorable
  6. Running ISR.
- Exceptions will not stop unless higher prio one is raised.

Timers

- Used to periodically do things
  - Sys tick is special, but several timers do exist.  
x Generates SysTick interrupts in fixed intervals.

## General Timer Construction



- Prescaler divides clk.
- Reload val. decides how far Counter counts.
- $f_{clk-PSC} = \frac{f_{clk}}{\text{prescaler} + 1}$
- We can use these timers to execute code regularly.

Multiplexing  
+  
Debouncing

$$f_{\text{out}} = \frac{f_{clk}}{(PSC + 1)(\text{reload} + 1)}$$

- Electrical Debouncing is easy but hard for large # of buttons.
- For keypads, we can apply voltage to each row fast and then read out the columns that receive voltage.  
x we can use a Timer ISR.
- Debouncing a matrix can be done through software.
  - We can keep a history of outputs from the matrix.
  - For 8 history byte, 00000001 - key press  
11111110 - key release  
11111111 - press + stable  
00000000 - release + stable
  - We want to scan the keypad faster than human reaction to record separate key actions while slower than total bounce time so multiple incorrect bits in a row are not recorded.
- We can multiplex displays too, apply power to each digit and ground the segments to light up.

Direct Memory Access (DMA)

- DMA allows you to move data without the CPU actively taking part. This speeds up communication rate.
- Can move data between Memory buffer + Peripheral after CPU read/write.



- DMA Flow-through vs Fly-by
  - Flow-through stores data as an intermediate buffer (to change data size or math.)
  - Fly-by only connects bus between memory buffer and peripherals.
- DMA controller sits on bus, stores source + destination, etc.

Digital-to-Analog Converter (DAC)

- Converts digital data into a voltage signal w/ a N-bit DAC:

$$V_{DAC\text{out}} = V_{ref} \cdot \frac{\text{Digital Value}}{2^N - 1}$$

(operates as a digital potentiometer)

- Can be used for digital audio + waveform generator
- Can optimize speed + resolution + power dissipation

- Binary fractions in fixed pt. notation is inexact
  - 16-bit value

$$\overline{128} \ \overline{64} \ \overline{32} \ \overline{16} \ \overline{8} \ \overline{4} \ \overline{2} \ \overline{1} \circ \ \overline{y_2} \ \overline{y_1} \ \overline{y_0} \ \overline{y_{32}} \ \overline{y_{64}} \ \overline{y_{128}}$$

- DAC arch. (op. at speed of light)



- Limits

- Resistor tolerance
- Minor errors become enlarged

- Wave table of digital values to convert can operate well to generate desired waves.

$$f_s = \frac{\text{Samples/sec}}{\text{wave table samples/Cycle (of wave)}}$$

Analog to Digital Conversion (ADC)

## • ADC

- Resolution: Number of bits in ADC output <sup>binary</sup>

$$\times \boxed{V_{out\text{ADC}} = \text{round}((2^N - 1) \cdot \frac{V_{in}}{V_{Ref}})}$$

- Quantization Error: max error from digital value vs analog.

$$\times \boxed{\frac{V_{Ref}}{(2^N - 1)(2)} = \text{Max error Quantization}}$$

(i.e. the LSB (least significant bit))

- Sampling rate: how often the analog value is recorded.

$$\times \boxed{f_{sampling} \geq 2 \cdot f_{signal}}$$

## • How it works

- Converts to value by adding bit by bit and checking if > or < actual value.

- Input values are clipped at  $V_{Ref}$  and  $V_{SS}$

• How to beat jitter in analog signals: ~~to~~

- Boxcar averaging: Take lot of samples, average the last few for the new value at a slower update rate

ex. 1kHz update = 128k samp/sec + average last 128 samp.

Advanced Timers + Pulse Width Modulation (PWM)

- Advanced timers include a compare and capture register (CCR) where it triggers an output when CCR = timer counter.
  - Active high mode = output is high signal
  - Toggle mode = output switches from high  $\leftrightarrow$  low.
  - PWM mode = is high  $<$  CCR, is low  $>$  CCR.
  - Also can have up count/down count or both.
- PWM Duty Cycle:  $\frac{CCR}{Reload+1}$ , Period =  $(1+ARR) \cdot \text{Clock period}$ 
  - wave form generation with high and low digital signals
  - can program duty cycle + frequency
- GPI0 as a DAC using PWM
  - varying duty cycle can do this
  - averaging the area under can generate a tone w/ a low pass filter.
  - wave freq needs to be way higher than the target signal.
  - analog control of a transistor through DAC is exact., (power wise) PWM is only  $\approx$  so it removes this issue.
- No low pass filter for human perception and/or other signal dampeners.

Serial-Parallel Interface (SPI)

- Serial data transfers reduces wires.
- Synchronous: SPI, I<sup>2</sup>C, USART, - Async: UART
- SPI operates with min. one master + multiple slaves
  - master controls clk.
  - shifts in bits each clk cycle.
  - one clk. pulse per bit.
- Band rate: bits/s (speed of transfer)
- Master initializes all data transfer, drives clk & Slave Select pins.
  - pins MOSI, MISO, SCK, NSS (Master will have multiple slave selects if multiple slaves exist)
- Shift reg's can be hooked up to a master device for data storage.
- Bidirectional SPI
  - Each transaction exchanges 2 words of 4-16 bits each between two devices
    - \* usually master sends command + slave device responds + is ignored,
  - Master instead sends something to strobe SCK.

- Programmable + Async

- Device 1      Device 2



- Use a start bit + stop bit for framing

- use ~~just~~ parity bit to prevent interference (all bits + par.)  
is even/odd

Format : { start, 0xXX, par., stop }

- There's more advanced codes than parity

- Baud rate is time for transaction

$$\text{Data Rate} = \frac{\text{Baud Rate}}{\text{Size of Trans.}}$$

- Standards:

- Idle high, start bit + , stop bit ↓

- Sent LSB first , parity can be even/odd/none

- Stop bit can be .5, 1, 1.5, 2 bits long

- Word size 7-8 bits, parity not req.

- Almost everyone uses 8N1 (8 bits no parity, one stop bit)

- RX → RX , devices can transmit simultaneously

- Each devices have sep clocks to read, but should be similar  
(5% tolerance)

- Errors

- Framing Error: Didn't see stop bit (clk out of tol. or disagree)  
in packet format

- Receiver Overrun: Didn't read received byte before new one  
Started (sys is too slow to read??)

- Parity Error: 1 bits don't add prop. (Noise)

Inter-Integrated Circuit (I<sup>2</sup>C)

- Characteristics (useful bc only 2 wires)
  - Serial, byte-oriented
  - Multi-master, multi-slave
  - Two bidirectional open-drain lines: Serial Data Line (SDA)  
They need pull up resistors      Serial Clock Line (SCL)



- Devices pull low for 0, pull-up res. keep it high for 1.
- Band rate:
  - Stand. 100 kbit/s, 400 kbit/s (Fast), 1mbit/s (Fast+)
  - 10 kbit/s (low speed mode), 3.4 mbit/s (high speed) (not rec.)
  - Limited by bus capacitance, pullup res. strength; length of network
- CLK pulse for every bit (two lines)
- Devices can rec., transmitt, or both
- Slower than SPI, but more convenient.
- SDA transaction (MSB first)
  - { Start bit, 7'b (slave addr), R/W, ACK ↓ (from SI), 8'b (command, Ack, 8'b (Data), Ack, pg 0, write 1, read

0 /

No command, all data replaces command w/ data  
2nd Ack is from master and last Ack is NAck from master
- CLK can be stretched inbetween Ack and Msb of next byte.
  - if slave can't keep up w/ master.

Computer Abstraction

## • Basic Division of Hardware

## - Space Division



## - Time Division

X Fetch, decode, read operands, perform operation, write, det. next instruction

-



## • Instruction Set Architectures (ISAs)

- Connects Hardware to software w/ assembly
- X We use RISC-V

ISA Basics

## • Registers - storage within the core, separate from memory

~~X Word size storage (8-64 bit) storage~~

- RISC-V 32-bit ISA has 32-bit reg. X 0-31

## - Conventions for RISC-V

|   |                                                |
|---|------------------------------------------------|
| X | x0/xzr - Hard-wired zero                       |
| X | x1/xra - return address                        |
| X | x2/xsp - stack pointer                         |
| X | x3/xgp - global pointer                        |
| X | x4/xtp - thread pointer                        |
| X | x5-x7/x0-2 - temporaries                       |
| X | x8/xso/xfp - Saved reg/frame pointer           |
| X | x9/x32 - saved reg                             |
| X | x10-x11/x0-1 - Function arguments/return vals. |
| X | x12-x17/x2-7 - Function arguments              |
| X | x18-x27/x2-11 - saved registers                |
| X | x28-x31/x3-6 - Temporaries                     |

## • ISAs have limited, expensive registers

- Memory solves this. w/ byte addressing

## • Addressing Mode - How operand gets its values

- Register Addressing - read value from register

- Immediate Addressing - read value from constant in instruction

- Offset Addressing - Read value from memory. Address from imm + reg.

ISA Basics contd

- Big endian (MSB @ 0), Little endian (LSB @ 0)
- RISC Instructions are 32 bits or 4 bytes long

- R-Type (Res/Res) - funct7, rs2, rs1, funct3, rd, opcode  
 7bit 5bit 5bit 3bit 5bit 7bit

X Opcode partially specs instr., funct7+3 defines operation

X rs2 is 2nd operand, rs1 is 1st op., rd is dest. reg

- I-Type (imm with + load) - immediate, rs1, funct3, rd, opcode  
 12 bits 5bit 3bit 5bit 7bit

X rs1 - source base address, imm - const operand or addy offset  
 w/ 2's compliment sign extended

- S-Type (stores) - imm [11:5], rs2, rs1, funct3, imm [4:0], opcode  
 7bits 5bit 5bit 3bit 5bits 7bit

X rs1: base address added to address, rs2 - source op. reg #, imm - offset

• RISC can do arithmetic, load: 8b, 16b, 32b, shift, and do logical ops.

• Final uncovered inst type for now:

- U-Type (upper imm. format) - imm [31:12], rd, opcode  
 20bit 5bit 7bit

- Examples:
- |        |              |                                                    |
|--------|--------------|----------------------------------------------------|
| - qadd | $x7, x6, x5$ | $\downarrow \downarrow \downarrow$<br>rs1      rs2 |
| - lw   | $x8, 0(x9)$  | $\uparrow$ imm $\nwarrow$ rs1                      |
| - sll  | $x5, 0(x6)$  | $\uparrow$ imm $\nwarrow$ rs2                      |

ISA Control Flow

- We can create blocks to jump to

- block 1: ...  
 ...

block 2: ...

- Using branch instr. we can jump based on values in reg.

- bne  $x0, x5, \text{block1}$ ,  $\text{beq } x0, x5, \text{block2}$  ( $\neq$  and  $=$ )

- bge  $x0, x5, \text{block1}$ ,  $\text{blt } x0, x5, \text{block2}$  ( $\geq$  and  $<$ )

- Branching can be used for if/else, loops, etc.

Procedures

- How to call a function.
  - Pass in arguments w/  $\times 10 - \times 17$ , return w/  $\times 10 - \times 11$ .
  - Call jal, stores where to ret. to and jumps.
  - After finishing call jalr to load picv addy and jump.
- Since only 8 temp reg's exist, use stack in memory to store data.
  - To use stack you subtract 4 for each new reg
  - Else sw/lw and offset w/ the stack pointer
  - add back stack usage at the end.
- Frame pointer shows where each function's data begins
- Memory space division:



- Application Binary Interface (Defines the assembly  $\rightarrow$  binary protocol)

PC - Relative

- Translating a C program:



- ASCII is used to encode strings & chars.
- Assembler translates to m1, and provides info for building program.
- Object Modules produces an executable image that will be loaded later

Single Cycle Processor

- Datapath (single cycle)

- Uses Muxes, ALU, register files, etc.

- \* Comb Elements



- \* Storage Elements



## Register

## RegisterFile

## Memory

- WE - write enable, RW - write to new, Put a reg on bus A, put RB reg on bus B.

- Computer Outline:



- Processor Implementation

- Impl. def. Clocks per instruction (CPI) and clock cycle time (Cycletime).
- ISA + compiler def how many instructions in a program

- Datapath - single cycle: Get one instruction done in one long clock cycle.
  - steps: Fetch, decode operands, execute, memory, writeback

Single Cycle Processor cont'd

## Steps

- Fetch Instructions: then update PC at end of every cycle



- ALU Instructions:  $R[rd] \leftarrow R[rs1] \text{ op } R[rs2]$
- Load Instructions:  $R[rd] \leftarrow \text{Mem}[R[rs1]] + \text{Sign Ext}[imm12]$
- Store Instruction:  $\text{Mem}[R[rs1]] + \text{Sign Ext}[imm11:5 || imm4:0] \leftarrow R[rs2]$
- Cond. Branch Instruction:  $IR = \text{Mem}[PC]$   
 $\text{if } (R[rs1] \text{ op } R[rs2])$   
 $\quad PC \leftarrow PC + (\text{Sign Ext}(imm12) \text{ cc } 1)$   
 $\text{else}$   
 $\quad PC \leftarrow PC + 4$

Arithmetic

## • Unsigned vs signed

- Unsigned - normal + range of  $[0, 2^{32}-1]$

- Signed # representation -  $[-2^{31}, 2^{31}-1]$  range (All reps have MSB as -)

- x Sign Mag (MSB is -)  $[-3, 3], \pm 0$  for 3 bits

- x Ones Complement (- is opposite of +)  $[-3, 3], \pm 0$  for 3 bits

- x Twos Complement (~~Opp + 1~~)  $[-4, 3]$  for 3 bits

- x Sign extension copies msb for new bit.

## • ALU bit-slice - and, or, add



- For subs, add a signal to xor w/  
B.
- Overflow detection w/ xor of  
MSB carry in/out.
- Ripple carry adder chains  
together 1-bit full adders,  
but is less efficient.

## • Carry-lookahead adder - efficient adding. (Propagate or gen. carry ahead)

- Ripple through tree of k-bit blocks (logarithmic vs linear)

- Leaf Nodes:  $P_i, g_i = 1 \text{ gate delay}, S \leq 2 \text{ gate delays}$



usually expand C expression  
to minimize delay  
(C<sub>0</sub> doesn't connect to  
# of C<sub>s</sub> and C<sub>i</sub>)  
S/P dependent on block size

## - Intermediate Nodes



## - Ex.



- Sum bit delay =  $(\text{bit\#}/\text{block size}) * 2 + 1$

Arithmetic Control

- Shift Logic: uses Multiplexers

- $\text{Max}(\text{shift}[0], d, \{\text{d}[6:0], 0\})$
- $\text{Max}(\text{shift}[1], \text{staseo}, \{\text{staseo}[5:0], 00\})$   
etc.

- Addins Multiple #'s

- Chain adders
- Or use carry save into a Trad adder
  - & carry save is 3 sums added (1 in cin) and carry is input for next bit a line down



- Multiplication

- Grade school (how you do multiplication mentally)

contd on next page



- Booth's Algorithm Flowchart



Arithmetic contd

## • Multiplication

- Grade School:



Control:



## • Multiplication

- V2:



- V3 - insert + multiplier in product[31:0]

- Signs - abs val and calc sign after.  $\neq$  (not always)

## • Booth's Algorithm.

- Booth encoding - go from LSB to MSB,  $0 \rightarrow 1 = \bar{T}$ ,  $1 \rightarrow 1 = 0$ ,  $1 \rightarrow 0 = T$ - Multiply while shifting other term left.  $\bar{T}$  means  $x = -1$ .

## • Division

- Grade School Division: (Dividend starts in remainder)



Arithmetic contd

## • Division

- V<sub>2</sub>:  $\text{Rem}[31:0] = \text{Dividend} @ \text{start}$



- V<sub>3</sub>: Right side will be q, left will be rem.

$\text{R}[31:0] = \text{Dividend} @ \text{start}$



x Bot half will be quotient, Top half will be remainder

→ All these methods are restoring division (undo remainder subtraction)

## • Non-restoring Division

- (+) Divisor  
x subtract if (+)R, Add if (-)R

- (-) Divisor  
x Add if (+)R, Subtract if (-)R

- If rem at end is - add divisor one more time.

## • Floating point.

- IEEE-754 - Store S, E, F (sign, exponent, float)  
 $x 2^e \cdot f \cdot 2^S$

x 16 for S

x 8b for float, 11b for double (e)

x 23b for float, 52b for double (f)

- Further systems are needed for rounding, floating point addition, multiplication, division.