

# **EE 437/538B: Integrated Systems**

## **Capstone/Design of Analog Integrated Circuits and Systems**

### **Lecture 4: Overview of Basic TRx Blocks**

Prof. Sajjad Moazeni

[smoazeni@uw.edu](mailto:smoazeni@uw.edu)

Spring 2022

# Outline

---

- **Signaling Basics**
  - Single-ended vs. differential
  - “Current-mode” vs. “Voltage-mode” signaling
  - Termination
- **TX Circuit Design**
  - Z control
  - CML, VM drivers
  - Power vs. swing
  - Serialization options
- **RX Circuit Design**
  - Comparator review
  - Deserialization options

# Single-Ended Signaling



- RX: comparing against a shared reference
  - Reference may be implicit (i.e., ground/supply)
  - Mismatch between shared and individual lines
- TX: generates large variations on power supply
  - SSO – simultaneous switching outputs
- No crosstalk immunity
- Supply noise

# Classic Debate

---

- “Differential must be twice as fast as single-ended in order to win”
- Reality more complicated
  - E.g., power supply to signaling pin ratio higher in S.E.
- Short “answer”
  - Differential a lot easier to build and get right the first time
  - Can make S.E. work – but often a lot more painful

# Single-Ended Interconnects

- Optical links
- In-package I/O
  - XSR/USR links (112G, ...)
  - Memory-processor (<5 Gb/s)

# Differential Interconnects



- High-speed wireline I/O for backplanes (PCIe, NVLink, ...)
- Intra-rack links (short range)

# “Voltage-Mode” vs. “Current-Mode”



- Transmission line has both voltage and current...
- Terminology unfortunately heavily overloaded
  - Whether or not  $Z_0$  of driver is high
  - How  $Z_0$  of driver is set
  - What sets output swing

# “Voltage-Mode” vs. “Current-Mode”



$$\left\{ \begin{array}{l} Z_T = R_T = 50\,\Omega \\ V_{SW,pp} = 2 I_D R_T \end{array} \right.$$
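A quick numeric check of the current-mode swing relation $V_{SW,pp} = 2 I_D R_T$. The 50 Ω termination and the 4 mA and 800 mV operating points are hypothetical example values, not numbers from the lecture.

```python
def cm_swing_pp(i_d: float, r_t: float) -> float:
    """Peak-to-peak swing of the current-mode model: V_SW,pp = 2 * I_D * R_T."""
    return 2.0 * i_d * r_t


def cm_current_for_swing(v_sw_pp: float, r_t: float) -> float:
    """Invert the same relation: drive current needed for a target swing."""
    return v_sw_pp / (2.0 * r_t)


# Hypothetical example values (assumed, not from the lecture).
R_T = 50.0  # termination resistance in ohms
print(f"I_D = 4 mA -> V_SW,pp = {cm_swing_pp(4e-3, R_T) * 1e3:.0f} mV")
print(f"V_SW,pp = 800 mV -> I_D = {cm_current_for_swing(0.8, R_T) * 1e3:.1f} mA")
```

In this model the swing scales directly with $R_T$, so any error in the termination value appears one-for-one as a swing error.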



# Another View



- RX is the opposite of TX: TX is the source, RX is the load
- Signal integrity implications?

canceling the TX term → reduces the swing!

# External vs. Internal Termination



- Internal: makes package L, pad C part of T-line
- External: chip/package become a stub
  - If on-die termination is wanted, its value needs to be controlled...
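On-die termination needs a controlled value because any mismatch between the termination and the line impedance $Z_0$ reflects part of every edge. The sketch below evaluates the standard reflection coefficient $\Gamma = (R_T - Z_0)/(R_T + Z_0)$. The ±20% spread around 50 Ω is an assumed illustration of an untrimmed on-die resistor, not a figure from the lecture.

```python
def reflection_coefficient(r_term: float, z0: float = 50.0) -> float:
    """Voltage reflection coefficient of a resistive termination on a line of impedance z0."""
    return (r_term - z0) / (r_term + z0)


# Assumed +/-20% process spread on a nominal 50-ohm on-die termination.
for r_term in (40.0, 50.0, 60.0):
    gamma = reflection_coefficient(r_term)
    print(f"R_T = {r_term:4.0f} ohm -> Gamma = {gamma:+.3f} "
          f"({abs(gamma) * 100:.1f}% of the edge reflected)")
```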


# T-Coil Output/Input Stage



[Kossel JSSC 2008]



- Output T-coil between driver and pad allows for splitting of driver, ESD, and pad capacitance
- Provides significant bandwidth enhancement and improved return loss

Used for 3D-integrated optical TRX

[Steffan ISSCC 2017]



# Active Terminations



- Triode-region transistor termination
  - ✓ Tunable
  - ✗ Non-linear
- Design concerns: tuning range, parasitics
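The ✓/✗ trade-off for a triode-biased termination device can be illustrated with the long-channel square-law model $I_D = k\,(V_{ov} V_{DS} - V_{DS}^2/2)$. The device constant $k$ and the overdrive values below are hypothetical, and short-channel devices deviate from this model. The trend still holds: the resistance is tunable through $V_{ov}$, and it changes with $V_{DS}$, that is, with the signal itself.

```python
def triode_resistance(v_ds: float, v_ov: float, k: float) -> float:
    """Large-signal resistance V_DS / I_D of a square-law MOSFET in triode.

    Model: I_D = k * (V_ov * V_DS - V_DS**2 / 2), valid for 0 <= V_DS < V_ov.
    """
    if not 0.0 <= v_ds < v_ov:
        raise ValueError("square-law triode model requires 0 <= V_DS < V_ov")
    return 1.0 / (k * (v_ov - v_ds / 2.0))


K = 0.04  # hypothetical k'(W/L) in A/V^2
for v_ov in (0.4, 0.5, 0.6):  # tuning knob: gate overdrive
    values = [triode_resistance(v_ds, v_ov, K) for v_ds in (0.0, 0.1, 0.2)]
    print(f"V_ov = {v_ov:.1f} V: R = " + ", ".join(f"{r:.1f}" for r in values)
          + " ohm at V_DS = 0, 0.1, 0.2 V")
```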



# AC vs. DC Termination

- With diff. can terminate to complement
  - High Z → lower power
  - See more shortly
- TX sets common-mode
  - Can be inconvenient
  - May need wide CM range
- AC-coupled + AC-term
  - Places some requirements on the data, though (e.g., DC balance, limited run length)



# TX Design: Series vs. Parallel Termination



# CML TX + RX Term



Double-terminated on-chip

# CML Power Consumption



$$P = V_{TT} \cdot I_b$$

$$V_{SW,pp} = I_b \cdot R_T \rightarrow P = V_{TT} \cdot \frac{V_{SW,pp}}{R_T}$$

$$V_{TT,min} = V_{tail}^* + V_{drive}^* + V_{SW,amp}$$

$$P_{min} = V_{SW,amp}^2 \left( 1 + \frac{2 V_{Dsat}}{V_{SW,amp}} \right) \frac{2}{R_T}$$

with $V_{SW,amp} = V_{SW,pp}/2$ and $V_{tail}^* = V_{drive}^* = V_{Dsat}$

*Overhead of Diff pair*
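Plugging example numbers into the minimum CML power expression $P_{min} = V_{SW,amp}^2 (1 + 2V_{Dsat}/V_{SW,amp}) \cdot 2/R_T$ shows how large the diff-pair overhead term can be. The swing, $R_T$, and $V_{Dsat}$ values are assumptions chosen for illustration.

```python
def cml_min_power(v_sw_amp: float, r_t: float, v_dsat: float) -> float:
    """Minimum CML driver power: V_SW,amp^2 * (1 + 2*V_Dsat/V_SW,amp) * 2 / R_T."""
    return v_sw_amp**2 * (1.0 + 2.0 * v_dsat / v_sw_amp) * 2.0 / r_t


def cml_power_no_overhead(v_sw_amp: float, r_t: float) -> float:
    """Same expression with the diff-pair headroom term (V_Dsat) set to zero."""
    return 2.0 * v_sw_amp**2 / r_t


# Hypothetical operating point: 250 mV amplitude, 50-ohm termination, 150 mV V_Dsat.
V_AMP, R_T, V_DSAT = 0.25, 50.0, 0.15
p_min = cml_min_power(V_AMP, R_T, V_DSAT)
p_ideal = cml_power_no_overhead(V_AMP, R_T)
print(f"P_min = {p_min * 1e3:.2f} mW (vs. {p_ideal * 1e3:.2f} mW without diff-pair overhead)")
```

The overhead factor $1 + 2V_{Dsat}/V_{SW,amp}$ grows as the swing shrinks, so CML becomes relatively less efficient at low swings.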

# Differential VM TX + RX

- Main motivation: can reduce power for same swing/supply



# Simplified Model And Power



$$\left\{ \begin{array}{l} V_{SW,amp} = V_{TT}/2 \\ I_{TT} = V_{TT}/4R_T \end{array} \right.$$

$$P = V_{TT} \cdot I_{TT} = \frac{V_{TT}^2}{4R_T}$$

Better than CML drivers

$$\Rightarrow P_{min} = \frac{V_{SW,amp}^2}{R_T}$$
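The two simplified models can be compared at the same output amplitude. The VM model gives $P = V_{TT}^2/4R_T$ with $V_{SW,amp} = V_{TT}/2$, while the CML minimum-power expression includes the diff-pair overhead. The values below (250 mV amplitude, 50 Ω, 150 mV $V_{Dsat}$) are hypothetical.

```python
def vm_power(v_tt: float, r_t: float) -> float:
    """Simplified differential VM driver power: P = V_TT^2 / (4 R_T)."""
    return v_tt**2 / (4.0 * r_t)


def vm_supply_for_amplitude(v_sw_amp: float) -> float:
    """Invert V_SW,amp = V_TT / 2 from the simplified VM model."""
    return 2.0 * v_sw_amp


def cml_min_power(v_sw_amp: float, r_t: float, v_dsat: float) -> float:
    """Minimum CML power: V_SW,amp^2 * (1 + 2*V_Dsat/V_SW,amp) * 2 / R_T."""
    return v_sw_amp**2 * (1.0 + 2.0 * v_dsat / v_sw_amp) * 2.0 / r_t


# Hypothetical values (assumed): 250 mV amplitude, 50-ohm termination, 150 mV V_Dsat.
V_AMP, R_T, V_DSAT = 0.25, 50.0, 0.15
p_vm = vm_power(vm_supply_for_amplitude(V_AMP), R_T)  # equals V_AMP**2 / R_T
p_cml = cml_min_power(V_AMP, R_T, V_DSAT)
print(f"VM: {p_vm * 1e3:.2f} mW, CML: {p_cml * 1e3:.2f} mW, ratio {p_cml / p_vm:.1f}x")
```

With $V_{Dsat} = 0$, the CML expression reduces to $2V_{SW,amp}^2/R_T$, which is twice the VM result. The diff-pair headroom term widens the gap further.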

# Bad News: Extra Complexity

- Driver impedance (termination) now set totally by devices
  - Some sort of impedance control is critical
- “High-swing” driver:



[Elad Alon, EE290C Lecture 3]

# Impedance Control

---


[Elad Alon]
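One common way to implement impedance control (not necessarily the scheme shown on the original slide) is to build the driver from parallel binary-weighted legs. The enable code is then chosen by a successive-approximation (SAR) search against a reference, which calibrates out process corners. The sketch below is a behavioral model under simplified assumptions: each leg is an ideal resistor, leg $i$ has conductance $2^i/R_{unit}$, and the comparator is ideal. The helper names `driver_resistance` and `sar_calibrate`, the 6-bit code width, and the corner values are all hypothetical.

```python
def driver_resistance(code: int, r_unit: float) -> float:
    """Hypothetical segmented driver: enabled legs sum to conductance code / r_unit."""
    return float("inf") if code <= 0 else r_unit / code


def sar_calibrate(r_target: float, r_unit: float, n_bits: int) -> int:
    """SAR search for the largest code whose resistance is still >= r_target.

    Each step tentatively sets the next lower bit; an (ideal) comparator keeps
    the bit if the driver resistance has not yet dropped below the target.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if driver_resistance(trial, r_unit) >= r_target:
            code = trial
    return code


# Hypothetical per-unit-leg resistance at fast / typical / slow corners.
for corner, r_unit in (("fast", 800.0), ("typ", 1000.0), ("slow", 1130.0)):
    code = sar_calibrate(50.0, r_unit, n_bits=6)
    print(f"{corner:>4}: code = {code:2d}, R = {driver_resistance(code, r_unit):.1f} ohm")
```

In this model, the residual error after calibration is at most one LSB of conductance. Finer legs or more code bits reduce it.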

# Serialization: Input vs. Output

---

- **On-chip clocks often slower than off-chip data-rates**
  - Need to take a set of parallel on-chip data and serialize it
- **Can serialize either at input of TX or at final output**

# TX Multiplexer – Full Rate

4:1

- Tree-mux architecture with cascaded 2:1 stages often used
- Full-rate architecture relaxes clock duty-cycle requirements, but limits max data rate
  - Need to generate and distribute high-speed clock
  - Need to design high-speed flip-flop
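Here is a behavioral (timing-free) sketch of the 4:1 tree mux. Two first-stage 2:1 muxes each interleave two lanes, so their outputs run at half the line rate. The final 2:1 stage interleaves those outputs at the full line rate. The lane-to-mux assignment (lanes 0/2 and 1/3) is an assumption, chosen so that bit 0 of each parallel word is sent first.

```python
from typing import List, Sequence


def mux2(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """2:1 mux: interleave two streams into one stream at twice the rate."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    out: List[int] = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def tree_mux_4to1(words: Sequence[Sequence[int]]) -> List[int]:
    """Serialize 4-bit parallel words with two cascaded 2:1 stages.

    words[k][i] is bit i of the k-th parallel word; bit 0 is sent first.
    """
    lanes = [[w[i] for w in words] for i in range(4)]  # per-lane bit streams
    stage1_a = mux2(lanes[0], lanes[2])  # half-rate stream carrying bits 0 and 2
    stage1_b = mux2(lanes[1], lanes[3])  # half-rate stream carrying bits 1 and 3
    return mux2(stage1_a, stage1_b)      # full-rate output: bits 0, 1, 2, 3, ...


print(tree_mux_4to1([[1, 0, 1, 1], [0, 0, 1, 0]]))
```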



# TX Multiplexer – Half Rate

- Half-rate architecture eliminates high-speed clock and flip-flop
- Output eye is sensitive to clock duty cycle
- Critical path no longer has flip-flop setup time
- Final mux control is swapped to prevent output glitches
  - Can also do this in preceding stages for better timing margin



[Sam Palermo]
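The half-rate output eye is sensitive to clock duty cycle because the final 2:1 mux sends one bit while the clock is high and the next while it is low. The two bit widths are therefore $D \cdot T_{clk}$ and $(1-D) \cdot T_{clk}$, with $T_{clk} = 2\,$UI. The 10 Gb/s rate and the duty-cycle values below are hypothetical.

```python
from typing import Tuple


def half_rate_bit_widths(data_rate_gbps: float, duty_cycle: float) -> Tuple[float, float]:
    """Widths in ps of the two bits a half-rate mux emits per clock period."""
    t_clk_ps = 2.0 * 1e3 / data_rate_gbps  # half-rate clock: one period spans 2 UI
    return duty_cycle * t_clk_ps, (1.0 - duty_cycle) * t_clk_ps


UI_PS = 1e3 / 10.0  # unit interval at the assumed 10 Gb/s
for duty in (0.50, 0.47, 0.45):
    high, low = half_rate_bit_widths(10.0, duty)
    print(f"duty = {duty:.2f}: bits of {high:.0f} ps and {low:.0f} ps, "
          f"narrowest = {min(high, low) / UI_PS:.2f} UI")
```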

# Serialization: Input vs. Output



- Input ser. requires on-chip circuitry to run at full line rate
  - May lead to high power consumption
  - In older technologies (0.35 µm), high-frequency clocks were hard to support
- Output ser. moves burden to the pad
  - At the time, gave the highest BW
- Limit in both designs: edge rate
  - Either for the clock or for the data



# Basic RX

- Simplest: RX is just a comparator @  $f_{\text{bit}}$ 
  - (Clocking later)



- Key things to watch out for:
  - High sensitivity (low noise, low offset/hysteresis)
  - Common-mode input range
  - Supply/common-mode rejection
  - Max. clock rate
  - Power consumption →  $P \propto f$

# Typical Design



# RX Clocked Comparators

- Also called regenerative amplifier, sense-amplifier, flip-flop, latch
- Samples the continuous input at clock edges and resolves the differential input to a binary 0 or 1



[Sam Palermo]

# Important Comparator Characteristics

---

- Offset and hysteresis
- Sampling aperture, timing resolution, uncertainty window
- Regeneration gain, voltage sensitivity ($\Delta V_{min}$), metastability
- Random decision errors, input-referred noise

Example: 50 GS/s → 20 ps per bit; with quarter-rate (QDR) clocking, each comparator has 80 ps

[Safe Decision]

# Dynamic Comparator Circuits



[J. Kim]

## A Strong-Arm Latch ✓

- To form a flip-flop
  - After strong-arm latch, cascade an R-S latch
  - After CML latch, cascade another CML latch
- Strong-Arm flip-flop has the advantage of no static power dissipation and full CMOS output levels



[Toifl]

## B CML Latch



[Sam Palermo]

# StrongARM Latch Operation

[J. Kim TCAS1 2009]



- 4 operating phases: reset, sampling, regeneration, and decision

[Sam Palermo]

# StrongARM Latch Operation – Sampling Phase

[J. Kim TCAS1 2009]

- Sampling phase starts when clk goes high,  $t_0$ , and ends when PMOS transistors turn on,  $t_1$
- M1 pair discharges X/X'
- M2 pair discharges out+/-

$$\frac{v_{out}(s)}{v_{in}(s)} = \frac{g_{m1}g_{m2}}{sC_{out}C_x \left( s + \frac{g_{m2}(C_{out} - C_x)}{C_{out}C_x} \right)}$$

$$\cong \frac{g_{m1}g_{m2}}{s^2C_{out}C_x} = \frac{1}{s^2\tau_{s1}\tau_{s2}}$$

where  $\tau_{s1} \equiv C_x/g_{m1}$ ,  $\tau_{s2} \equiv C_{out}/g_{m2}$
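Under the double-integrator approximation $v_{out}(s)/v_{in}(s) \approx 1/(s^2\tau_{s1}\tau_{s2})$, an input held constant during sampling produces $v_{out}(t) = v_{in}\,t^2/(2\tau_{s1}\tau_{s2})$. This is the inverse Laplace transform of $v_{in}/(s^3\tau_{s1}\tau_{s2})$. The sketch evaluates this sampling-phase gain at $t_1 - t_0$. The capacitances, transconductances, and sampling times are hypothetical, and the approximation drops the second term in the denominator of the exact expression.

```python
def sampling_gain(t_s: float, c_x: float, g_m1: float, c_out: float, g_m2: float) -> float:
    """Sampling-phase gain v_out/v_in = t_s^2 / (2 * tau_s1 * tau_s2).

    tau_s1 = C_x / g_m1 and tau_s2 = C_out / g_m2; valid only under the
    double-integrator approximation of the StrongARM sampling phase.
    """
    tau_s1 = c_x / g_m1
    tau_s2 = c_out / g_m2
    return t_s**2 / (2.0 * tau_s1 * tau_s2)


# Hypothetical values: 10 fF nodes, 1 mS devices (tau = 10 ps each).
for t_s in (10e-12, 20e-12, 30e-12):
    gain = sampling_gain(t_s, 10e-15, 1e-3, 10e-15, 1e-3)
    print(f"t1 - t0 = {t_s * 1e12:.0f} ps -> gain = {gain:.2f}")
```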



# StrongARM Latch Operation – Regeneration

[J. Kim TCAS1 2009]

- Regeneration phase starts when PMOS transistors turn on,  $t_1$ , until decision time,  $t_2$
- Assume M1 is in the linear region, so the circuit is no longer sensitive to  $v_{in}$
- Cross-coupled inverters amplify signals via positive-feedback:

Speed  $\propto \frac{1}{\tau_R} = \frac{g_{m2,r} + g_{m3,r}}{C_{out}}$

$$G_R = \exp\left(\frac{t_2 - t_1}{\tau_R}\right)$$

$$\tau_R = C_{out} / (g_{m2,r} + g_{m3,r})$$
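The regeneration gain $G_R = \exp((t_2 - t_1)/\tau_R)$ can be inverted to give the time needed to amplify a small input $\Delta V$ up to a full logic level: $t_2 - t_1 = \tau_R \ln(V_{full}/\Delta V)$. Inputs too small to finish regenerating in the available time are the metastable cases. $C_{out}$, the regeneration transconductances, and $V_{full}$ below are hypothetical.

```python
import math


def tau_regen(c_out: float, g_m2_r: float, g_m3_r: float) -> float:
    """Regeneration time constant tau_R = C_out / (g_m2,r + g_m3,r)."""
    return c_out / (g_m2_r + g_m3_r)


def regeneration_gain(t_reg: float, tau_r: float) -> float:
    """G_R = exp((t2 - t1) / tau_R) for a regeneration interval t_reg = t2 - t1."""
    return math.exp(t_reg / tau_r)


def time_to_resolve(dv_in: float, v_full: float, tau_r: float) -> float:
    """Regeneration time needed to grow dv_in into v_full."""
    return tau_r * math.log(v_full / dv_in)


# Hypothetical: 20 fF output node, 2 mS + 2 mS regeneration g_m -> tau_R = 5 ps.
tau = tau_regen(20e-15, 2e-3, 2e-3)
for dv in (10e-3, 1e-3, 0.1e-3):
    print(f"dV = {dv * 1e3:5.1f} mV needs {time_to_resolve(dv, 1.0, tau) * 1e12:.1f} ps to reach 1 V")
```

Each factor-of-10 reduction in $\Delta V$ costs an extra $\tau_R \ln 10 \approx 2.3\,\tau_R$ of regeneration time.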



# Conventional RS Latch

- RS latch holds output data during latch pre-charge phase

- In a conventional RS latch, the rising output transitions first, followed by the falling transition



[Sam Palermo]

# RX Sensitivity

- RX sensitivity is a function of the input referred noise, offset, and minimum latch resolution voltage

$$v_S^{pp} = 2v_n^{rms} \sqrt{SNR} + v_{min} + v_{offset}$$

- Gaussian (unbounded) input referred noise comes from input amplifiers, comparators, and termination
  - A minimum signal-to-noise ratio (SNR) is required for a given bit-error-rate (BER)

For BER =  $10^{-12}$  ( $\sqrt{SNR} = 7$ )

- Minimum latch resolution voltage comes from hysteresis, finite regeneration gain, and bounded noise sources

Typical  $v_{min} < 5mV$

- Input offset is due to circuit mismatch (primarily  $V_{th}$  mismatch) and is the most significant component if uncorrected

[Sam Palermo]
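Here is a sketch of the sensitivity budget $v_S^{pp} = 2v_n^{rms}\sqrt{SNR} + v_{min} + v_{offset}$. It is paired with the Gaussian relation $\text{BER} = \tfrac{1}{2}\,\text{erfc}(Q/\sqrt{2})$, where $Q = \sqrt{SNR}$, which links $\sqrt{SNR} \approx 7$ to BER $\approx 10^{-12}$. The noise, $v_{min}$, and offset values are hypothetical.

```python
import math


def ber_from_q(q: float) -> float:
    """BER for Gaussian noise when the decision margin is q standard deviations."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def rx_sensitivity_pp(v_n_rms: float, sqrt_snr: float, v_min: float, v_offset: float) -> float:
    """Peak-to-peak RX sensitivity: 2 * v_n,rms * sqrt(SNR) + v_min + v_offset."""
    return 2.0 * v_n_rms * sqrt_snr + v_min + v_offset


print(f"BER at sqrt(SNR) = 7: {ber_from_q(7.0):.2e}")
# Hypothetical budget: 1 mV rms noise, 5 mV v_min, 10 mV uncorrected offset.
print(f"v_S,pp = {rx_sensitivity_pp(1e-3, 7.0, 5e-3, 10e-3) * 1e3:.0f} mV")
print(f"v_S,pp with offset corrected = {rx_sensitivity_pp(1e-3, 7.0, 5e-3, 0.0) * 1e3:.0f} mV")
```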

# Higher Speeds

---

# Demultiplexing RX

- Demultiplexing allows for lower clock frequency relative to data rate
- Gives extra regeneration and pre-charge time in comparators
- Need precise phase spacing, but not as sensitive to duty-cycle as TX multiplexing



[Sam Palermo]

# 1:4 Demultiplexing RX Example



- Increased demultiplexing allows for higher data rate at the cost of increased input or pre-amp load capacitance
- A higher demultiplexing factor is more sensitive to clock phase offsets (in degrees)

[Sam Palermo]
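In a 1:N demultiplexing RX, each comparator gets $N$ UI per decision. The sampling clocks, however, have period $N \cdot$UI, so each degree of phase error corresponds to $N/360$ UI of timing error. The sketch reproduces the 50 GS/s example (20 ps UI, 80 ps per comparator with four-way interleaving) and compares phase-error sensitivity across demux factors. The 5° error is a hypothetical value.

```python
def comparator_period_ps(data_rate_gbps: float, demux: int) -> float:
    """Time available per comparator decision in a 1:demux RX (demux UIs)."""
    ui_ps = 1e3 / data_rate_gbps
    return demux * ui_ps


def phase_error_in_ui(phase_error_deg: float, demux: int) -> float:
    """Timing error in UI for a clock phase error given in degrees of the clock period."""
    return phase_error_deg / 360.0 * demux


print(f"50 Gb/s, 1:4 demux: {comparator_period_ps(50.0, 4):.0f} ps per comparator")
for n in (2, 4, 8):
    print(f"1:{n} demux: 5 deg phase error = {phase_error_in_ui(5.0, n):.3f} UI")
```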