

Introduction

- \* Instruction Set Architecture (ISA) - Instruction Set
  - I/O
  - Mem. Addressing, endianness

- \* Computer Organization - Specifications
  - Inter-Computer Comm.

Brief History : Vacuum Tube → Transistor → Integrations (IC)

→ IBM 370 Arch., ...

System Interconnection (Bus)

Sometimes, Multiprocessor in Server Computer.

\* Cache Memory

## Instructions (Basic)



→ "The Beginning" : Intel 8086

Architectures (ISA) [ CISC (Many ops. happen per instruction) → Intel x86  
RISC (1 Operation per instruction) → ARM ]

*Complex*      *Micro operations*  
*Reduced*      *(or few)*      *(operation)*

Embedded System : Internal system in a larger system or a standalone system.

\* Real-time constraints

Internet of Things : ES with Internet

- Information Technology
- Operational Technology
- Personal
- Sensor / Actuator

## Microcontroller



## Performance Enhancement

### 1. Pipelining



### 2. Branch Prediction :



3. Superscalar Execution : Many instructions per clock cycle
4. Data Flow Analysis : Data depends on result  $\rightarrow$  នៅក្នុងអេដ្ឋីណែនាំ ?
5. Speculative Execution : 2. + 4. នៅក្នុង execution engine នៃ

### Amdahl's Law

Assume workload  $T$



### Little's Law

: In Queueing System,  $\langle N \rangle = \langle \dot{N} \rangle \times t$

$N$  : Number of items

$\dot{N}$  : Rate which items arrive.

→ សម្រាប់ multiple queues, ...

Clock :



សម្រាប់ការបន្ទាន់បច្ចេកទេស : MIPS (million instructions per second)

FLOPS (floating-point logic operations per second)

L MFLOPS, GFLOPS, TFLOPS, ...

OPS (operations per second)

Benchmarking : និមួយនាម ត្រូវបាន G.M., និង p-test

- Easily measured
- High-level, portable
- Wide-distribution
- Represents prog. lang. paradigms

e.g. SPEC Benchmark

## Computer Interconnections & Functions

\* Neumann Architecture : sequential instruction.

- Single R/W mem.
- Addressable mem.



## Fetch-Execute Cycle



1. Pull instruction from mem.
2. PC increments.
3. Load instruction into IR. (Instruction Register)

## Actions

1. Processor  $\leftrightarrow$  Memory (Cacheable)
2. Processor  $\leftrightarrow$  I/O (Not cacheable)
3. Control : Instruction specifying sequence of exec.
4. Data Processing : Arithmetic Operation, Logic Operation, ... .

Instruction Fetch & Exec. of the cycle (in 1/10 Cycles) :- Mem. R/W time: 1.  
i.e., In RISC Arch. : No. of Regs.



Levels of Interrupt: Program e.g. 0#DIV, Ref. outside alloc. mem., ...

Timer : Do something now. / Switch between processes.

I/O : Signal completions/errors to CPU by I/O controller.

Hardware Failure : Power failure / Parity error in Mem.

Interrupt Processing:



I/O Functions

» I/O communicates directly with processor.

Want: I/O comm. with mem. directly. : DMA (Direct Memory Access)



( Ideally, but not practical.  
Introducing "Bus")

**(Bus)** Data Bus : Moving data between modules with line (parallel according to bit width)  
 e.g. 128 lines, → 0-63 → 64-127 →

Problem : Line length

Aufteilung in Segmente!



Address Bus : Signalling locations.

Control Bus : Signalling functions.

### Point-to-Point Interconnection

QPI (2008) : Pairwise connections (P2P)

Layered protocol (e.g. OSI Network Layer)

Packet of data : header - data - Ecc .



QPI layers

Lane : Group of data path / set of wires.



PCI : Peripheral Component Interconnection

Separate Bus ,  $\therefore$  Bus-based schematics

PCIe (PCI Express) ໃຊ້ P2P ແລະ PCIe controller / Switch  
 $\therefore$   $\therefore$  Multilane.



\* Transaction ດີເລີໂຈ set transaction ທີ່ຕົວສູງສາມາດສຳເນົາ. ຖໍ່ໄຟ Rollback  
 $\therefore$  ທີ່ຕົວເນັດ.

e.g.  $\text{ທີ່ຕົວ} \rightarrow \text{ປໍລິຍານີ້} \rightarrow \text{ວິທີ}$   
 $\downarrow$   
 $\text{ດີເລີໂຈ} \quad \text{if failed}$   
 $(\text{Rollback operation})$



PCIe TL (Transaction Layers) – Request, Mem. Access, Config, I/O, ...

\* PCIe ပုဂ္ဂနယ် ဆောင်ရွက်မှု DMA တဲ့ e.g. GPU, Accelerator, TPV, ...

မြို့မြို့ TL Layer : Encapsulate higher layer onto lower layer).

Implementation via PCIe စား NVMe

---

## Communication Protocols

e.g. EDID in HDMI

Asynchronous : No clock at all.  
Synchronous : with clock from Master

Serial : Data stream  
Parallel : Data width



## UART (Universal Asynchronous Receiver/Transmitter)

Protocol, Cheap Hardware Implementation  
Standards like UART: RS-232, RS-485, ... between Voltage, Comparison, ...  
use low #bits :: clock difference / deviation → offset.  
can be any bits (e.g. 5 bit, 8 bit, 7 bit ...)



\* Standards recommend 2 ms timeout.

- RXD (from TXD)

- TXD (from RXD)

- RTS # Ready to send

- CTS # Clear to send



DTE សែរ RTS និង DCE

ពួកគេចងាយបញ្ចូន DTE → DCE

DCE នូវការបញ្ចូន CTS និង DTE

±12 V Lower Interference,  
High Signal Integrity.

UART  $\xrightarrow{\phi}$  RS-232  $\phi$ : Inverse mapping:  $[0, +5] \rightarrow [+12, -12]$

SPI (Serial Peripheral Interface) — ~~multiple slaves~~, clock  $\approx$  1 MHz.

Synchronous, Full-Duplex, Master-Slave

Multiple Slaves

|   |                       |
|---|-----------------------|
| O | SCK                   |
| O | MOSI                  |
| O | MISO                  |
| O | $\overline{SS}$ (NSS) |



Daisy chain Multiple Slaves

\* Daisy chaining

multiple slaves in one bus.



### Communication (Hardware)

MOSI  $\longrightarrow$  } Data Path  
 MISO  $\longleftarrow$

SCK  $\longrightarrow$  } Control Path  
 $\overline{SS}$   $\longrightarrow$



Data Validity : Design w/ Data change into Falling CLK  
 Data Latch(Read) into Rising }

# I<sup>2</sup>C (Inter IC / Inter Integrated Circuit)

Synchronous, Half-Duplex, Multi-Master-slave.

- SDA : Data Line
- SCL : Clock Line

- Software Addressable
- Multi-master/slave with CD (collision detection)
- Serial, 8-bit, Bi-directional, 4-mode
- On-chip filtering

Wired-AND (Open-Drain) : To pull-up resistor.



\* Data changes on clock L., latches on clock H.

Wire-and Signal for  
data transmission  
into signal reception

I<sup>2</sup>C → MSB first  
LSB last (D<sub>7</sub>D<sub>6</sub>D<sub>5</sub>...D<sub>1</sub>D<sub>0</sub>)



Acknowledge on bit 9 (with SDA Low)

If ack is slow, clock from master can "stretch".



From slide

## Clock Synchronization (Multi-master)

(

One clock will wait for slower clock.

Data Arbitration : If 2 devices crash from calling same address, devices always check for data, if it is wrong, that device will stop transmitting.

Data payload :

Frame

|     |            |
|-----|------------|
| 1-7 | Addr.      |
| 8   | R/W (H/L)  |
| 9   | \Ack(NACK) |
|     |            |
| 1-7 | Data       |
| 8   | Data       |
| 9   | \Ack.      |



\* NACK w/o Addr, R/W = no slave

\* NACK w/o Data = slave rejects / (W)  
NO ACCEPT

\* NACK w/o Data = to slave rejects (R)

## I<sup>2</sup>C Speed Standard

Wire-AND

|            |          |
|------------|----------|
| Standard   | 100 kbps |
| Fast       | 400 kbps |
| Fast+      | 1 Mbps   |
| High Speed | 3.4 Mbps |

Bidirectional

Ultra fast mode 5 Mbps (Unidirectional),  
uses push-pull driver.

## CAN (Controller Area Network)

| Asynchronous, half duplex, Multi master/slave.



- Error Counters

### Identifier/Address

11-bit → 29-bit

### Concepts

1. Messages Prioritization
2. Latency Guarantee (min. time)
3. Multicast with time sync.



4. Error detection, signaling
5. Auto Re-transmission.

## USB (Universal Serial Bus)

| Asynchronous, half-duplex?, master-slave.

| Power management

\* Upstream, Downstream Connection

1.x - 2.0 :  
 Vcc  
 D+  
 D-  
 GND

3.0 :  
 SSTX+/-  
 SSRX+/-

Devices:  
 1. Host  
 2. Hub / Root Hub  
 3. Peripherals

Topology



\* USB Enumeration [ ]  
 into identify peripherals  
 (16bit address)

Hub  
 1. Bus-powered  
 2. Self-powered

Peripherals  
 1. Standalone (1 addr.)  
 2. Compound (many addr. in 1 dev.)

(USB)

## Power Management

\* Host must be able to disable power from 1 device.

[Initiator implement, Initiator.]

Endpoints : 16 endpoints  
each 4-bit addr.

Transfer types

|                |                                                                                                                                                                     |                                       |
|----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------|
| 1. Control     | :                                                                                                                                                                   | Config. & setup                       |
| 2. Bulk        | :                                                                                                                                                                   | Sends data over time. e.g. files, ... |
| 3. Isochronous |   | Mouse / keyboard, data streaming      |
| 4. Interrupt   |   | Polling                               |

---

## Controls & Feedback

(Regulation or Command  
Goal (Target value))

- Open-loop control : User → Goal

User tries to achieve goal

Fwd. path → Output

- Closed-loop control : User



Linear control

$$X_i - BX_o = \frac{X_i}{A} \rightarrow G = \frac{X_o}{X_i} = \frac{A}{1+AB} \text{ open-loop gain}$$

overall gain  
"closed-loop gain"

$(G > A)$  AB neg. → positive feedback (e.g. set闹钟, 空调) → unstable

$(G < A)$  AB pos. → negative feedback

If  $AB \gg 1$  :  $G \approx \frac{A}{AB} = \frac{1}{B}$  好像 Fwd. path



feedback only

$$G \cdot \text{Bandwidth} = \text{const.}$$

## Intro to Operational Amplifier



↓  
OpAmp.



Voltage Divider (V. Div)

$$\frac{V_I}{a+b} = \frac{V_O}{b}$$

$$\therefore V_O = \left(\frac{b}{a+b}\right) V_I$$

## Operational Amplifier



- 1: Inverting input
- 2: Non-Inverting input
- 3: Output

↳ Active / IC  
Models / Initiatives ( $V_{CC}$ )

Simple Amplifier



## Ideal OpAmp



1. Infinite Input Impedance (Disconnected)
2. Zero output impedance
3. Zero common-mode gain
4. Infinite open-loop gain ( $A$ )
5. Infinite bandwidth

$$V_O = A(V_+ - V_-)$$

$$V_O \propto \Delta V_{in}, \text{ where } |V|$$

- Inverting closed-loop configuration (ideal OpAmp)



- Weighed summer



- Inverter 2 outputs  $\rightarrow$  non-invert

V<sub>O</sub> • Non-inverting config.



- Voltage lowerer (Unity-gain buffer) / Follower



## • Difference Amplifier



$$v_{O1} = -\frac{R_2}{R_1} V_{I1}$$



$$\begin{aligned} v_{O2} &= \left(1 + \frac{R_2}{R_1}\right) v_I \\ &= \left(1 + \frac{R_2}{R_1}\right) \left(\frac{R_4}{R_3 + R_4}\right) v_{I2} \Rightarrow \frac{R_2}{R_1} V_{I2} \end{aligned}$$

$\frac{R_1}{R_2} = \frac{R_4}{R_3}$



$$\therefore v_O = v_{O1} + v_{O2} = \frac{R_2}{R_1} (V_{I1} - V_{I2})$$

e.g.



With C :



(Integrator)



(Differentiator)



(Filter)

### Comparators



#### Schmitt Trigger



(Inverting)



: Noise Margin Threshold



#### (Non-inverting)



$$V_0 = -\frac{R_1}{R_1+R_2} V_{in}$$

### Homework

LT Spice

## PID Control

\* Feedback Loop :



- Simple

↳ PWM On if lower, Off if higher than setpoint. → oscillate



- Proportional (P) : Power

↳ output =  $K_p \cdot \text{error}$ .

Problem: Too much → oscillate

Too low → never reach setpoint.



- Proportional-Integral (PI) : Position

↳ output =  $K_p \cdot \text{error} + K_I \cdot \text{accumulated\_error} \cdot \Delta t$



- Proportional-Derivative (PD) : Monitors speed by looking for change  
: speed

↳ output =  $K_p \cdot \text{error} + K_D \cdot \frac{(e(t) - e(t_{last}))}{\Delta t}$



→ PID : Good position + Good speed

\* Saturation (clamp output!) :  $\text{MIN\_OP} \leq \text{output} \leq \text{MAX\_OP}$

\* Timesteps :  $\tilde{\tau}$  (approximation timestep) von  $K_I, K_D$

Tuning  
P : 0 → should be at full power  
I : 0 → until steady state is OK.  
D : 0 → dampen overshoot & improve responsiveness

Application / Uses : "Cascaded PID"

\* Using OP Amp. to make PID

---

## Booting Process

Know: Interrupt Vector :  $\text{addr. } \text{of } \text{I} \text{ in } \text{addr. } \text{of } \text{I} \text{O}$

## Interrupt - DMA

- I/O - Programmed : Polling
  - Interrupt-driven
  - Direct Memory Access  $\rightarrow$  Uses DMA controller
- } need CPU intervention
- \* Interrupt from I/O  
\* Exception from CPU

## DMA Controller Config.



ARM Interrupts ← Non-Maskable Interrupt (NMI) : ignore  $\% \text{ of } \text{it}$   
Interrupt Request (IRQ) / Masking

Tail-chaining : Push next interrupts onto the stack.  
Then pop all at once once done.

IVT (vector table) : Lookup table for types, priority, loc. of handler

