



# McGill

## ECSE 421 Lecture 11: Memories, Communication, Output

ESD Chapter 3

© Peter Marwedel, Brett H. Meyer

# Last Time

---

- Embedded systems employ hardware in a loop
- VLIW Architecture
  - Reduce the cost of instruction-level parallelism
- Re-configurable Logic
  - If custom HW is too expensive and SW is too slow

# Where Are We?

| W  | D | Date        | Lec.  | Topic                                      | ESD         | PES        | Out | In  | Notes             |
|----|---|-------------|-------|--------------------------------------------|-------------|------------|-----|-----|-------------------|
| 1  | T | 12-Jan-2016 | L01   | Introduction to Embedded System Design     | 1.1-1.4     |            |     |     |                   |
|    | R | 14-Jan-2016 |       | Introduction to Embedded System Design     | 1.1-1.4     |            |     |     |                   |
| 2  | T | 19-Jan-2016 | L02   | Specifying Requirements / MoCs / MSC       | 2.1-2.3     |            |     |     |                   |
|    | R | 21-Jan-2016 | L03   | CFSMs                                      | 2.4         |            |     |     |                   |
| 3  | T | 26-Jan-2016 | L04   | Data Flow Modeling                         | 2.5         | 3.1-5,7    | LA1 |     |                   |
|    | R | 28-Jan-2016 | L05   | Petri Nets                                 | 2.6         |            |     |     | Thru Slide 21     |
| 4  | T | 2-Feb-2016  | L06   | Discrete Event Models                      | 2.7         | 4          |     |     | G: Zaid Al-bayati |
|    | R | 4-Feb-2016  | L07   | DES / Von Neumann Model of Computation     | 2.8-2.10    | 5          | LA2 | LA1 |                   |
| 5  | T | 9-Feb-2016  | L08   | Sensors                                    | 3.1-3.2     | 7.3,12.1-6 |     |     |                   |
|    | R | 11-Feb-2016 | L09   | Processing Elements                        | 3.3         | 12.6-12    |     |     |                   |
| 6  | T | 16-Feb-2016 | L10   | More Processing Elements / FPGAs           |             |            |     | LA2 |                   |
|    | R | 18-Feb-2016 | L11   | Memories, Communication, Output            | 3.4-3.6     |            | LA3 |     |                   |
| 7  | T | 23-Feb-2016 | L12   | Embedded Operating Systems                 | 4.1         |            |     |     |                   |
|    | R | 25-Feb-2016 |       | **Midterm exam: in-class, closed book**    |             |            | P   |     | Chapters 1-3      |
|    | T | 1-Mar-2016  |       | **No class**                               |             |            |     |     | Winter break      |
|    | R | 3-Mar-2016  |       | **No class**                               |             |            |     |     | Winter break      |
| 8  | T | 8-Mar-2016  | L13   | Middleware                                 | 4.4-4.5     |            |     | LA3 |                   |
|    | R | 10-Mar-2016 | L14   | Performance Evaluation                     | 5.1-5.2     |            |     |     |                   |
| 9  | T | 15-Mar-2016 | L15   | More Evaluation and Validation             | 5.3-5.8     |            |     |     |                   |
|    | R | 17-Mar-2016 | L16   | Introduction to Scheduling                 | 6.1-6.2.2   |            |     |     |                   |
| 10 | T | 22-Mar-2016 | L17   | Scheduling Aperiodic Tasks                 | 6.2.3-6.2.4 |            |     |     |                   |
|    | R | 24-Mar-2016 | L18   | Scheduling Periodic Tasks                  | 6.2.5-6.2.6 |            |     |     |                   |



# Today

- Embedded system hardware is frequently used in a loop (“hardware in a loop”):



# Memory

---

- Efficiency is once again a concern
  - Speed (latency and throughput); predictable timing
  - Energy efficiency
  - Size and cost
  - Other attributes (volatile vs. persistent, *etc.*)

# Where is the Power Consumed?

- Consumer portable systems -



© ITRS, 2010

- Memory and logic, static and dynamic power: all are relevant
- Current trends will violate the maximum power constraint (0.5-1 W)

# Where is the Power Consumed?

- Mobile phones -



Source: Siemens

[O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source.



# Where is the Power Consumed?

- Stationary systems -



- Dynamic power in logic dominates
- Overall power consumption a nightmare for environmentalists

# Power is Consumed in Memories!



IEEE Journal of SSC Nov. 96



[Based on slide by and ©: Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, U. of Massachusetts, Amherst, 2001]

[Segars 01 according to Vahid@ISSS01]

# Energy Consumption and Access Time

- Scratchpad SRAM vs. DDR2 DRAM -



- <http://www.hpl.hp.com/research/cacti/>
- 16 bit read; size in bytes;
- 65 nm for SRAM, 80 nm for DRAM

Source: Olivera Jovanovic, TU Dortmund, 2011

# Energy Consumption and Access Times

- Multi-Ported Register Files -

Panels (figures not reproduced): cycle time (ns), area ( $\lambda^2 \times 10^6$ ), and power (W).

# CPU and Memory Performance Gap

- Growing gap between CPU and main DRAM speed



- Similar problems in
    - Embedded systems
    - MPSOCs
- ⇒ Memory access times >> processor cycle times  
⇒ “Memory wall” problem



[P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]

# Single Core Performance Growth



Copyright © 2011, Elsevier Inc. All rights Reserved.

[Hennessy/Patterson: Computer Architecture, 5th ed., 2011]

# Multi-Core Performance Growth




[Hennessy/Patterson: Computer Architecture, 5th ed., 2011]

# Hierarchical Scratch Pad Memories (SPM)

SPMs are small, physically separate memories mapped into the address space.
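
As a rough illustration of what "mapped into the address space" means, the sketch below decodes an address into the memory that would serve it. The base address, the SPM size, and the name `select_memory` are assumptions made for this example; they are not taken from any particular chip.

```python
# Hypothetical address map: a 4 KiB scratch pad at the bottom of the address
# space, everything else served by (slower, more energy-hungry) main memory.
SPM_BASE = 0x0000_0000   # assumed base address of the scratch pad
SPM_SIZE = 4 * 1024      # assumed scratch pad size in bytes


def select_memory(addr):
    """Return which memory serves the given byte address."""
    if SPM_BASE <= addr < SPM_BASE + SPM_SIZE:
        return "spm"
    return "main"


# No tags and no comparators are involved: the address alone decides.
print(select_memory(0x0000_0100))
print(select_memory(0x0001_0000))
```

Unlike a cache, the placement of code and data in the SPM is decided by software (the programmer or the compiler), which is what makes the access time predictable.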

## Hierarchy



## Address Space



## Example



ARM7TDMI cores, well-known for low power consumption



# Off-chip SRAM vs. On-chip SPM

ATMEL board with  
ARM7TDMI and ext. SRAM



# Why Not Just Use Caches?

- Energy overhead
  - Parallel set accesses, tag comparators, muxes



[R. Banakar, S. Steinke, B.-S. Lee, 2001]

# Energy and Associativity



# Communication: Requirements

---

- Real-time behavior
- Efficient, economical  
(*e.g.*, centralized power supply)
- Appropriate bandwidth and communication delay
- Robustness (*e.g.*, operates in extreme conditions)
- Fault-tolerance
- Diagnosability, maintainability
- Security

# Priority-based Arbitration

For example, consider a simple bus



- Bus arbitration is frequently priority-based
- Communication delay depends on communication traffic of other partners
- No tight real-time guarantees (except for highest priority partner)
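
The effect described in these bullets can be sketched with a small cycle-by-cycle simulation. This is a simplified model under stated assumptions: every transfer occupies the bus for exactly one cycle, partners are indexed by priority (0 is highest), and the function name `simulate_priority_bus` is made up for the example.

```python
def simulate_priority_bus(request_times, n_cycles):
    """Simulate fixed-priority arbitration on a single shared bus.

    request_times[p] lists the cycles at which partner p (0 = highest
    priority) issues a one-cycle transfer request.
    Returns, per partner, the waiting time of each granted transfer.
    """
    pending = [[] for _ in request_times]  # arrival cycles still waiting
    delays = [[] for _ in request_times]   # observed waiting times
    for cycle in range(n_cycles):
        # Register the requests that arrive in this cycle.
        for partner, times in enumerate(request_times):
            if cycle in times:
                pending[partner].append(cycle)
        # Grant the bus to the highest-priority partner with a pending request.
        for partner, queue in enumerate(pending):
            if queue:
                arrival = queue.pop(0)
                delays[partner].append(cycle - arrival)
                break
    return delays


# Partner 0 requests in cycles 0..3; partner 1 requests once, in cycle 0.
delays = simulate_priority_bus([[0, 1, 2, 3], [0]], n_cycles=6)
print(delays)
```

In this run the high-priority partner never waits, while the low-priority request is delayed for as long as the high-priority traffic lasts; its delay therefore has no bound that is independent of the other partners.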

# Real-time Behavior

---

- Carrier-sense multiple-access/collision-detection  
**(CSMA/CD, Standard Ethernet)**
  - No guaranteed response time
- Alternatives:
  - Token rings, token busses
  - Carrier-sense multiple-access/collision-avoidance  
**(CSMA/CA)**

# CSMA/CA

---

- WLAN technique
  - Request precedes transmission
- Each partner gets an ID (priority)
- Protocol
  - Partners try setting their ID on the bus
  - Partners detecting higher ID disconnect themselves
  - Highest priority partner gets guaranteed response time; others only if they are given a chance
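
The protocol steps above can be sketched as a bit-by-bit arbitration on a wired-OR bus. The sketch assumes unique IDs, that a `1` overrides a `0` on the bus (so the highest ID wins, as in the bullets), and that IDs are sent most significant bit first; the name `bitwise_arbitration` is hypothetical. Real busses may use the opposite polarity, for example CAN, where a `0` is dominant and the lowest identifier wins.

```python
def bitwise_arbitration(ids, width):
    """Return the ID that wins arbitration among the contending partners.

    ids:   unique integer IDs of the partners that want to transmit
    width: number of bits in an ID
    """
    contenders = set(ids)
    if not contenders:
        raise ValueError("at least one partner must contend for the bus")
    for bit in reversed(range(width)):          # most significant bit first
        # Wired-OR: the bus carries 1 if any remaining contender drives 1.
        bus = any((ident >> bit) & 1 for ident in contenders)
        if bus:
            # Partners that sent 0 but read 1 see a higher ID and back off.
            contenders = {i for i in contenders if (i >> bit) & 1}
    (winner,) = contenders                      # unique IDs leave one winner
    return winner


print(bitwise_arbitration([5, 9, 3], width=4))
```

No transmission is destroyed during arbitration, which is why the highest-priority partner gets a guaranteed response time while the others must retry later.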

# Time Division Multiple Access (TDMA)

- Each communication partner is assigned a fixed time slot



<http://www.ece.cmu.edu/~koopman/jtdma/jtdma.html#classical>

[E. Wandeler, L. Thiele: Optimal TDMA Time Slot and Cycle Length Allocation for Hard Real-Time Systems, ASP-DAC, 2006]

- Master sends sync
- Some waiting time
- Each slave transmits in its time slot
- Variations: truncating unused slots, >1 slots per slave
- TDMA resources have a deterministic timing behavior
- TDMA provides QoS guarantees in networks-on-chips
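
The deterministic timing of TDMA can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions: all slots have the same length, the sync message and the waiting time after it are lumped into `sync_len`, and a message that is not ready at the start of its own slot waits for the next cycle. The function names are made up for the example.

```python
def tdma_slot_owner(t, slot_len, n_partners, sync_len=0.0):
    """Return the index of the partner owning the bus at time t.

    Returns None while the master's sync phase is in progress.
    """
    cycle_len = sync_len + n_partners * slot_len
    offset = t % cycle_len                 # position inside the current cycle
    if offset < sync_len:
        return None
    return int((offset - sync_len) // slot_len)


def worst_case_wait(slot_len, n_partners, sync_len=0.0):
    """Upper bound on the wait for the partner's next slot: one full cycle."""
    return sync_len + n_partners * slot_len


# Three partners, slots of 2 time units, sync phase of 1 time unit.
for t in range(8):
    print(t, tdma_slot_owner(t, slot_len=2, n_partners=3, sync_len=1))
```

The bound returned by `worst_case_wait` depends only on the configuration, not on the traffic of the other partners, in contrast to priority-based arbitration.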

# FlexRay

---



- Developed by the FlexRay consortium (BMW, Ford, Bosch, DaimlerChrysler, ...)
- Specified in SDL
- Improved error tolerance and time-determinism
- Meets requirements with transfer rates >> CAN  
High data rate can be achieved:
  - initially targeted for ~10Mbit/sec;
  - design allows much higher data rates
- TDMA protocol
- Cycle subdivided into a static and a dynamic segment

# TDMA in FlexRay

- Each node has exclusive access for a time
- Dynamic segment for transmission of variable length data
- Fixed priorities in dynamic segment
  - Minislots for each potential sender
  - Bandwidth used only when it is actually needed



# Time Intervals in FlexRay



© Prof. Form, TU Braunschweig, 2007

Source: Vector Informatik GmbH

- **Microtick ( $\mu t$ )** = Clock period within a partner; may differ between partners
- **Macrotick ( $mt$ )** = Basic unit of time, synchronized between partners ( $=r_i \times \mu t$ ,  $r_i$  varies between partners  $i$ )
- **Slot** = Interval allocated per sender in static segment ( $=p \times mt$ ,  $p$ : fixed (config))
- **Minislot** = Interval allocated per sender in dynamic segment ( $=q \times mt$ ,  $q$ : variable)  
Short minislot if no transmission needed; starts after previous minislot
- **Cycle** = Static segment + dynamic segment + network idle time
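
The minislot mechanism of the dynamic segment can be sketched as follows. This is a simplified model, not the FlexRay specification: time is counted in units of one empty (short) minislot, senders are visited in fixed priority order, and a frame that no longer fits into the segment is simply skipped in this cycle. The function name and parameters are assumptions made for the example.

```python
def dynamic_segment_schedule(frame_lengths, segment_len):
    """Assign start times to the frames sent in one dynamic segment.

    frame_lengths[i]: length of the frame sender i wants to transmit,
                      in units of an empty minislot (0 = nothing to send)
    segment_len:      length of the dynamic segment in the same unit
    Returns a dict mapping sender index to the start time of its frame.
    """
    schedule = {}
    position = 0
    for sender, length in enumerate(frame_lengths):
        if position >= segment_len:
            break                          # dynamic segment is used up
        if length > 0 and position + length <= segment_len:
            schedule[sender] = position    # minislot stretches to frame length
            position += length
        else:
            position += 1                  # short minislot: nothing is sent
    return schedule


print(dynamic_segment_schedule([0, 4, 0, 3], segment_len=10))
print(dynamic_segment_schedule([0, 4, 0, 3], segment_len=8))
```

Senders with nothing to transmit cost only one short minislot each, so bandwidth is consumed only when it is needed; the price is that a low-priority sender may find too little of the segment left and has to wait for a later cycle.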



# Structure of FlexRay Networks

- Bus guardian protects the system against failing processors
  - *E.g.*, so-called “babbling idiots”



# Other Busses

---

- IEEE 488: Designed for laboratory equipment.
- Sensor/actuator busses: connecting sensors/actuators, low rates
- Field busses
- CAN: *Controller Area Network*; controller bus for automotive applications
- LIN: Low cost bus for sensors/actuators in the automotive domain
- MOST: Multimedia bus for the automotive domain (not a field bus)
- MAP: Bus designed for car factories
- Process Field Bus (Profibus): used in smart buildings
- The European Installation Bus (EIB): bus designed for smart buildings
  - CSMA/CA; low data rate
- Standard Ethernet
  - Timing predictability is an issue

# Wireless Communication: Examples

---

- IEEE 802.11 a/b/g/n
- UMTS; HSPA
- DECT
- Bluetooth
- ZigBee
- Timing predictability of wireless communication?

# Review: Kirchhoff's Current Law

- For any point in an electrical circuit,
  - the sum of currents flowing into that point is equal to
  - the sum of currents flowing out of that point
- *Principle of conservation of electric charge*

Formally, for any node in a circuit:

$$\sum_k i_k = 0$$

Count current flowing away from node as negative

Example:



$$i_1 + i_2 + i_4 = i_3$$

$$i_1 + i_2 - i_3 + i_4 = 0$$

[Jewett and Serway, 2007]

# Review: Kirchhoff's Voltage Law

- Conservation of energy implies that:
  - the sum of the voltages across elements in any closed circuit must be zero

Formally, for any loop in a circuit:

$$\sum_k V_k = 0$$

Count voltages traversed against arrow direction as negative

Example:



$$V_1 - V_2 - V_3 + V_4 = 0$$

$V_3 = R_3 \times I_3$  if current counted in the same direction as  $V_3$

$V_3 = -R_3 \times I_3$  if current counted in the opposite direction

[Jewett and Serway, 2007]

# Review: Operational Amplifiers

- Operational amplifiers (op-amps) are devices amplifying the voltage difference between two input terminals by a large gain factor  $g$



$$V_{\text{out}} = (V_+ - V_-) \cdot g$$

High impedance input terminals  
⇒ Currents into inputs  $\approx 0$

For an **ideal** op-amp:  $g \rightarrow \infty$

(In practice:  $g$  may be around  $10^4 \dots 10^6$ )

# Review: Op-Amps with Feedback

- In circuits, negative feedback is used to define the actual gain



$$V_{\text{out}} = -g \cdot V_- \quad (\text{op-amp feature})$$

$$I \cdot R_1 + V_{\text{out}} - V_- = 0 \quad (\text{loop rule})$$

$$\Rightarrow I \cdot R_1 - g \cdot V_- - V_- = 0$$

$$\Rightarrow (1+g) \cdot V_- = I \cdot R_1$$

Due to the feedback to the *inverted* input,  $R_1$  reduces voltage  $V_-$ . To which level?


$$\Rightarrow V_- = \frac{I \cdot R_1}{1+g}$$

$$V_{-,ideal} = \lim_{g \rightarrow \infty} \frac{I \cdot R_1}{1+g} = 0$$

$V_-$  is called **virtual ground**: its voltage is (approximately) 0 even though the terminal is not directly connected to ground
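
To get a feeling for how close $V_-$ comes to 0 in practice, the sketch below evaluates $V_- = \frac{I \cdot R_1}{1+g}$ for gains in the range quoted earlier. The current of 1 mA and the resistance of 10 kΩ are arbitrary values chosen for illustration.

```python
def inverting_input_voltage(current, r1, gain):
    """V_- = I * R1 / (1 + g) for an op-amp with negative feedback."""
    return current * r1 / (1 + gain)


current = 1e-3   # assumed feedback current I in amperes
r1 = 10e3        # assumed feedback resistance R1 in ohms

for gain in (1e4, 1e5, 1e6):
    v_minus = inverting_input_voltage(current, r1, gain)
    print(f"g = {gain:.0e}: V_- = {v_minus:.2e} V")
```

With $I \cdot R_1 = 10$ V, the inverting input stays within about a millivolt of ground even for the smallest gain considered, and shrinks further as $g$ grows.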

# Digital-to-Analog (D/A) Converters

- Various types, can be quite simple, e.g.:



# Generating Current Proportional to $x$

Loop rule (with  $V_- \approx 0$ , the virtual ground of the op-amp):

$$x_0 \cdot I_0 \cdot 8 \cdot R + V_- - V_{ref} = 0$$

$$\Rightarrow I_0 = x_0 \times \frac{V_{ref}}{8 \times R}$$

In general:  $I_i = x_i \times \frac{V_{ref}}{2^{3-i} \times R}$



Junction rule:  $I = \sum_i I_i$

$$\Rightarrow I = x_3 \times \frac{V_{ref}}{R} + x_2 \times \frac{V_{ref}}{2 \times R} + x_1 \times \frac{V_{ref}}{4 \times R} + x_0 \times \frac{V_{ref}}{8 \times R} = \frac{V_{ref}}{8 \times R} \times \sum_{i=0}^3 x_i \times 2^i$$

$I \sim \text{nat}(x)$ , where  $\text{nat}(x)$ : natural number represented by  $x$

# Generating Voltage Proportional to $x$

Loop rule\*:  $y + R_1 \times I' = 0$

Junction rule<sup>o</sup>:  $I = I'$

$$\Rightarrow y + R_1 \times I = 0$$

From the previous slide

$$I = \frac{V_{ref}}{8 \times R} \times \sum_{i=0}^3 x_i \times 2^i$$

Hence:

$$y = -V_{ref} \times \frac{R_1}{8 \times R} \times \sum_{i=0}^3 x_i \times 2^i = -V_{ref} \times \frac{R_1}{8 \times R} \times \text{nat}(x)$$



Op-amp turns current  $I \sim \text{nat}(x)$  into a voltage  $\sim \text{nat}(x)$
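
The two derivation steps (branch currents summed by the junction rule, then converted to a voltage by the op-amp) can be checked numerically. The sketch below follows the formulas of the 4-bit converter; the component values in the example call are arbitrary assumptions, chosen so that $\frac{V_{ref} \times R_1}{8 \times R} = 1$ V.

```python
def dac_output(value, v_ref, r, r1):
    """Output voltage y of the 4-bit weighted-resistor D/A converter.

    value: natural number nat(x) in 0..15, with bits x_3 .. x_0
    v_ref: reference voltage, r: base resistance R, r1: feedback resistance
    """
    if not 0 <= value <= 15:
        raise ValueError("value must fit into 4 bits")
    total_current = 0.0
    for i in range(4):
        x_i = (value >> i) & 1                            # bit i of x
        total_current += x_i * v_ref / (2 ** (3 - i) * r)  # branch current I_i
    return -r1 * total_current                            # y + R1 * I = 0


# Assumed values: V_ref = 8 V, R = R1 = 1 kOhm, so y is about -nat(x) volts.
for value in (0, 1, 10, 15):
    print(value, dac_output(value, v_ref=8.0, r=1000.0, r1=1000.0))
```

The output is negative because the op-amp is used in its inverting configuration; the magnitude grows linearly with $\text{nat}(x)$.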

# Output Generated from Signal $e_3(t)$



\* Assuming “zero-order hold”

Is it possible to reconstruct the input signal?



# Reconstructing Input at the Output



- Assume the Nyquist criterion is met
- Let  $\{t_s\}$ ,  $s = \dots, -1, 0, 1, 2, \dots$  be times we sample  $g(t)$
- Assume a constant sampling rate of  $1/p_s$   
( $\forall s: p_s = t_{s+1} - t_s$ )
- According to sampling theory, we can approximate the input signal as follows:

$$z(t) = \sum_{s=-\infty}^{\infty} \frac{y(t_s) \sin \frac{\pi}{p_s} (t - t_s)}{\frac{\pi}{p_s} (t - t_s)}$$

Weighting factor for the influence of  $y(t_s)$  at time  $t$

[Oppenheim, Schafer, 2009]



# Weighting Factor

$$\text{sinc}(t - t_s) = \frac{\sin\left(\frac{\pi}{p_s}(t - t_s)\right)}{\frac{\pi}{p_s}(t - t_s)}$$
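
A numerical sketch of this interpolation is given below. It necessarily truncates the infinite sum to a finite number of samples, so it only approximates $z(t)$; the 1 Hz test tone, the sampling period of 0.1 s, and the function names are assumptions made for the example.

```python
import math


def sinc_weight(t, t_s, p_s):
    """Weighting factor of the sample taken at t_s, evaluated at time t."""
    x = math.pi / p_s * (t - t_s)
    if x == 0.0:
        return 1.0                      # limit of sin(x)/x for x -> 0
    return math.sin(x) / x


def reconstruct(t, sample, p_s, n_terms):
    """Truncated sum for z(t), using the samples s = -n_terms .. n_terms."""
    total = 0.0
    for s in range(-n_terms, n_terms + 1):
        t_s = s * p_s
        total += sample(t_s) * sinc_weight(t, t_s, p_s)
    return total


def signal(t):
    """Test signal: 1 Hz sine, well below half the 10 Hz sampling rate."""
    return math.sin(2 * math.pi * 1.0 * t)


p_s = 0.1                               # sampling period in seconds
t = 0.05                                # halfway between two sampling instants
print(signal(t), reconstruct(t, signal, p_s, n_terms=500))
```

At a sampling instant only the sample taken there contributes noticeably, because all other weighting factors are (numerically almost) zero. Between sampling instants many samples contribute, and the truncation error shrinks only slowly as more terms are included.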



# Contributions from Sampling Instances



# Attempted Reconstruction



# How Is $\text{sinc}( )$ Computed?

- Filter theory
  - Interpolation by an ideal low-pass filter  
( $\text{sinc}$  is the Fourier transform of a unit pulse)

$$z(t) = \sum_{s=-\infty}^{\infty} \frac{y(t_s) \sin \frac{\pi}{p_s} (t - t_s)}{\frac{\pi}{p_s} (t - t_s)}$$



The filter removes high frequencies present in  $y(t)$

# Limitations

---

$$z(t) = \sum_{s=-\infty}^{\infty} \frac{y(t_s) \sin \frac{\pi}{p_s} (t - t_s)}{\frac{\pi}{p_s} (t - t_s)}$$

- Reconstruction with  $\text{sinc}( )$  is precise
- Actual filters do not compute  $\text{sinc}( )$   
In practice, filters are used as an approximation.  
Computing good filters is an art itself!
- All samples must be known to reconstruct  $e(t)$  or  $g(t)$   
 $\Rightarrow$  Waiting indefinitely before output generation!
  - In practice, only a finite set of samples is available
- Quantization noise cannot be removed

# Output

---

- **Displays**
  - Display technology is extremely important
  - Major research and development efforts
  - State-of-the-art: dual-technology displays
- **Electro-mechanical devices**
  - Influence the environment through motors and other electro-mechanical equipment
  - Frequently require analog input

# Actuators

---

- Huge variety of actuators and output devices
- Microsystems motors as examples (© MCNC)






# Actuators (2)



Courtesy and ©: E. Obermeier,  
MAT, TU Berlin

<http://www.piezomotor.se/pages/PWtechnology.html>

<http://www.elliptec.com/fileadmin/elliptec/User/Produkte/Elliptec_Motor/Elliptecmotor_How_it_works.h>



# Secure Hardware

---

- Security needed for communication and storage
- Security often requires cryptography and cryptographic logic
  - *E.g.*, to resist *side-channel* attacks like
    - Measurements of the supply current or
    - Electromagnetic radiation
- Physical protection (shielding, sensors that detect tampering with the modules)
- Smart cards: special case of secure hardware
  - Have to run with a very small amount of energy.
- In general, we have to distinguish between different levels of security and knowledge of “adversaries”

# Summary

---

- Embedded systems employ hardware in a loop
- Embedded Memory
  - Must be efficient
  - Why not use caches?
- Communication
  - Many important requirements
  - Must be robust
  - Must be able to satisfy real-time constraints
- Output
  - DAC using Op-Amps and a filter

# Next Time

---

- Embedded Operating Systems
  - Chapter 4.1