



# High-Speed Link Circuits and Systems for Chiplet

---

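江文宁  
Frontier Institute of Chip and System (芯片与系统前沿技术研究院)

博学而笃志 切问而近思 (Rich in knowledge and tenacious of purpose; inquiring with earnestness and reflecting with self-practice)

# Contents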

01

### Introduction

What is a high-speed link for chiplets?

What are the advanced features?

Why do we need it?

02

### Signal Integrity

What is signal integrity?

The sources

The impact

03

### TX/RX

What is RX?

The architecture and circuit level implementation

The advanced techniques

04

### TX/RX (Continued)

More concerns

The examples

1. ECEN720 from Sam Palermo, TAMU, "High speed wireline links circuit design"
2. ECE 546 from Jose E. Schutt-Aine, UIUC, "High-Speed Links"
3. Other internet info



FUDAN UNIVERSITY

# 01 Introduction

# What is an Interface?



Sound, Optical, Electricity, Cable, Magnetism, ...








Transmitter/Receiver










Dual In-line Package (DIP)

Small Outline Package  
SOP/TSOP/SSOP/VSOP

Quad Flat Package  
(QFP 2.0-3.6mm/LQFP 1.4mm/  
TQFP 1.0mm)

Quad Flat No-leads Package (QFN)

Land Grid Array (LGA)  
Metal contact pads; removable

Ball Grid Array (BGA)  
Smaller solder joints; permanently soldered

- Separate chips
- Different packages: BGA, QFN, DIP...
- Long path: PCB trace, cable, wireless...
- Diverse standards: HDMI, USB Type-C, LVDS, SerDes...



## Parallel: Multiple connections between chips

- Consumes more power
- Bigger ICs with complex packages
- Susceptible to EM interference
- Challenging skew balancing requirements
- Practically no latency

## Serial: Single connection pair

- Saves power
- Fewer pins make for a compact IC
- Robust EM performance
- Clock can be recovered from data
- Adds latency

- **Area → Cost!**

# What is a parallel link?



System synchronous interface



source synchronous interface



- Total data rate = $\text{data}_{\text{1 line}} \times N_{\text{channel}}$
- Example: DDR4 is a parallel link

# How to improve parallel data rate?



System synchronous interface



source synchronous interface



## 1. Higher clock rate?

- TX skew 50ps, channel skew 50ps, clock jitter  $\pm 50\text{ps}$ , RX sample 200ps
- **Max clock frequency limit 2.5G(DDR), 1.25G(SDR)**
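The timing budget above can be checked with a few lines of arithmetic. The sketch below assumes the four terms add linearly and that the ±50 ps clock jitter consumes 100 ps of the bit period; the function name is hypothetical and only illustrates the calculation.

```python
def max_parallel_bit_rate_gbps(tx_skew_ps, channel_skew_ps, jitter_peak_ps, rx_window_ps):
    """Return the highest per-line bit rate (Gb/s) that fits the timing budget."""
    # Assumption: +/- jitter eats twice its peak value out of every bit period.
    min_bit_time_ps = tx_skew_ps + channel_skew_ps + 2 * jitter_peak_ps + rx_window_ps
    # 1000 ps per ns, so 1000 / (bit time in ps) gives Gb/s.
    return 1000.0 / min_bit_time_ps


rate = max_parallel_bit_rate_gbps(50, 50, 50, 200)
print(f"minimum bit time = {1000.0 / rate:.0f} ps, maximum bit rate = {rate:.2f} Gb/s")
```

With the slide's numbers the minimum bit time is 400 ps, which corresponds to 2.5 Gb/s per line.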

## 2. More traces?

- Large area (IO PAD/ESD/package/cable) → cost!
- Simultaneous switching output (SSO) noise
- Crosstalk



## 2. More traces?

- Large area → cost!
- Simultaneous switching output (SSO) noise → differential traces?  
→ small inductance?



## SERializer/DESerializer






### Parallel link

- Off-chip traces → expensive ☹
- Clock rate limit ☹

### SerDes

- On-chip traces → convenient ☺
- No clock line (CDR) ☺
- Differential link
- Equalization

# What is a high-speed link in a chiplet?



- **Advanced Package ( $\leq 2\,\text{mm}$ )**
- **High density, Low latency**
- **High bandwidth, High efficiency**
- **Low BER, Universal**

# What is UCIe?

## Board Members

Leaders in semiconductors, packaging, IP suppliers, foundries, and cloud service providers are joining together to drive the open chiplet ecosystem.



UCIe 1.0 on March 2, 2022  
UCIe 1.1 on August 8, 2023

- **Universal Chiplet Interconnect Express**
- **Open specification for die-to-die interconnect and serial bus between chiplets.**



**FDI: Flit-aware die-to-die (D2D) interface**

**CRC: Cyclic redundancy check**

**RDI: Raw Die-to-Die interface**

**AFE: Analog front-end**



(c) Two CXL stacks multiplexed inside the adapter

**Multi-protocol to one adapter**



## Standard Package Module



**No lane repair**

**Sideband:**

1. Out of channel for link training and interface;
2. Access of registers
3. Link management packets
4. Parameter exchanges

## x64 and x32 Advanced Package Module

Multi-die Advanced Package Module



- 1 redundant for Valid
- 1 redundant for Clock and Track
- 4 redundant for 64 Data line  
(2 redundant for 32 Data line)

## Valid

**Valid framing example**



### 4-7. Clock gating



1. TX byte framing (data valid); 2. Clock gating (fast response or idle)

## Track



**Runtime clock retimer in RX**

# What is UCIe?

## Track

### Forwarded clock frequency and phase

| Data rate (GT/s) | Clock freq. (fCK) (GHz) | Phase-1 (deg) | Phase-2 (deg) | Deskew (Req/Opt) |
|------------------|-------------------------|----------|---------|------------------|
| 32               | 16                      | 90       | 270     | Required         |
|                  | 8                       | 45       | 135     | Required         |
| 24               | 12                      | 90       | 270     | Required         |
|                  | 6                       | 45       | 135     | Required         |
| 16               | 8                       | 90       | 270     | Required         |
| 12               | 6                       | 90       | 270     | Required         |
| 8                | 4                       | 90       | 270     | Optional         |
| 4                | 2                       | 90       | 270     | Optional         |

### Runtime clock retimer in RX

## Groups for different bump pitches

Advanced package



| Bump Pitch (um)  | Minimum Frequency (GT/s) | Expected Maximum Frequency (GT/s) |
|------------------|--------------------------|-----------------------------------|
| Group 1: 25 - 30 | 4                        | 12                                |
| Group 2: 31 - 37 | 4                        | 16                                |
| Group 3: 38 - 44 | 4                        | 24                                |
| Group 4: 45 - 55 | 4                        | 32                                |

**Advanced package → small bump pitch → low frequency, low power, small area, high density**

# What is UCIe?

| Parameter                                                        | Advanced Package<br>(x64)                        |       |         | Standard Package   |                    |                    |                    |
|------------------------------------------------------------------|--------------------------------------------------|-------|---------|--------------------|--------------------|--------------------|--------------------|
| Data Width (per module)                                          | 64                                               | 64    | 64      | 16                 | 16                 | 16                 | 16                 |
| Data Rate (GT/s)                                                 | 4/8/12                                           | 16    | 24/32   | 4-16               | 4/8/12             | 16                 | 24/32              |
| Power Efficiency Target (pJ/b)                                   | See Table 1-3                                    |       |         |                    |                    |                    |                    |
| Latency Target (TX+RX) (UI) <sup>1</sup><br>(target upper bound) | 12                                               | 12    | 16      | 12                 | 12                 | 12                 | 16                 |
| Idle Exit/Entry Latency (ns)<br>(target upper bound)             | 0.5                                              | 1     | 1       | 0.5                | 0.5                | 1                  | 1                  |
| Idle Power<br>(% of peak power)<br>(target upper bound)          | 15                                               | 15    | 15      | 15                 | 15                 | 15                 | 15                 |
| Channel Reach (mm)                                               | 2                                                | 2     | 2       | 2-10               | 25                 | 25                 | 25                 |
| Die Edge Bandwidth Density<br>(GB/s/mm) <sup>2</sup>             | See Table 1-3                                    |       |         |                    |                    |                    |                    |
| Bandwidth Area Density<br>(GB/s/mm<sup>2</sup>)                  | 158/316/473                                      | 631   | 710/947 | 21-85              | 21/42/64           | 85                 | 109/145            |
| PHY Dimension Width (um) <sup>3</sup>                            | 388.8                                            | 388.8 | 388.8   | 571.5 <sup>4</sup> | 571.5 <sup>4</sup> | 571.5 <sup>4</sup> | 571.5 <sup>4</sup> |
| PHY Dimension Depth (um) <sup>5</sup>                            | 1043                                             | 1043  |         | 1320               | 1320               | 1320               | 1540               |
| ESD <sup>6</sup>                                                 | 30V CDM (anticipating going to 5-10V in future)  |       |         |                    |                    |                    |                    |

# What is UCIe?

## UCIe Key Performance Targets



| Metric                                                   | Link Speed/<br>Voltage    | Advanced Package<br>(x64) | Standard Package |
|----------------------------------------------------------|---------------------------|---------------------------|------------------|
| Die Edge Bandwidth Density <sup>1</sup><br>(GB/s per mm) | 4 GT/s                    | 165                       | 28               |
|                                                          | 8 GT/s                    | 329                       | 56               |
|                                                          | 12 GT/s                   | 494                       | 84               |
|                                                          | 16 GT/s                   | 658                       | 112              |
|                                                          | 24 GT/s                   | 988                       | 168              |
|                                                          | 32 GT/s                   | 1317                      | 224              |
| Energy Efficiency <sup>2</sup><br>(pJ/bit)               | 0.7 V<br>(Supply Voltage) | 0.5 (<=12 GT/s)           | 0.5 (4 GT/s)     |
|                                                          |                           | 0.6 (>=16 GT/s)           | 1.0 (<=16 GT/s)  |
|                                                          |                           | -                         | 1.25 (32 GT/s)   |
|                                                          | 0.5 V<br>(Supply Voltage) | 0.25 (<=12 GT/s)          | 0.5 (<=16 GT/s)  |
|                                                          |                           | 0.3 (>=16 GT/s)           | 0.75 (32 GT/s)   |
| Latency Target <sup>3</sup>                              |                           | <=2ns                     |                  |

Latency includes the latency of the Adapter and the Physical Layer (FDI to bump delay) on Tx and Rx
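The die-edge bandwidth density figures in the table can be reproduced from the lane count, the data rate, and the PHY width listed in the parameter table (388.8 um for the x64 advanced package module, 571.5 um for the x16 standard package module). The sketch below assumes that the density counts both directions (TX plus RX) and that one module occupies exactly one PHY width of die edge; both assumptions are inferred from the numbers rather than quoted from the specification, and the function name is hypothetical.

```python
def die_edge_bandwidth_density(lanes, rate_gt_per_s, phy_width_um):
    """Return die-edge bandwidth density in GB/s per mm of die edge."""
    one_direction_gbytes = lanes * rate_gt_per_s / 8.0  # 8 bits per byte
    both_directions_gbytes = 2.0 * one_direction_gbytes  # assumption: TX + RX counted
    return both_directions_gbytes / (phy_width_um / 1000.0)


for rate in (4, 8, 12, 16, 24, 32):
    advanced = die_edge_bandwidth_density(64, rate, 388.8)
    standard = die_edge_bandwidth_density(16, rate, 571.5)
    print(f"{rate:2d} GT/s: advanced {advanced:7.1f} GB/s/mm, standard {standard:6.1f} GB/s/mm")
```

Under these assumptions the results round to the tabulated values, for example 165 and 28 GB/s/mm at 4 GT/s.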

# What is UCIe?

## Characteristics of UCIe on Standard Package

| Index                                 | Value                                             |
|---------------------------------------|---------------------------------------------------|
| Supported speeds (per Lane)           | 4 GT/s, 8 GT/s, 12 GT/s, 16 GT/s, 24GT/s, 32 GT/s |
| Bump Pitch                            | 100 um to 130 um                                  |
| Channel reach (short reach)           | 10 mm                                             |
| Channel reach (long reach)            | 25 mm                                             |
| Raw Bit Error Rate (BER) <sup>1</sup> | 1e-27 (<= 8 GT/s)<br>1e-15 (>= 12 GT/s)           |

**Table 1-2. Characteristics of UCIe on Advanced Package**

| Index                                 | Value                                              |
|---------------------------------------|----------------------------------------------------|
| Supported speeds (per Lane)           | 4 GT/s, 8 GT/s, 12 GT/s, 16 GT/s, 24 GT/s, 32 GT/s |
| Bump pitch                            | 25 um to 55 um                                     |
| Channel reach                         | 2 mm                                               |
| Raw Bit Error Rate (BER) <sup>1</sup> | 1e-27 (<=12GT/s)<br>1e-15 (>=16GT/s)               |

## Raw BER requirements

| Package Type     | Data Rate (GT/s) |       |       |       |       |       |
|------------------|------------------|-------|-------|-------|-------|-------|
|                  | 4                | 8     | 12    | 16    | 24    | 32    |
| Advanced Package | 1E-27            | 1E-27 | 1E-27 | 1E-15 | 1E-15 | 1E-15 |
| Standard Package | 1E-27            | 1E-27 | 1E-15 | 1E-15 | 1E-15 | 1E-15 |

**FEC: Forward Error Correction**

**CRC: Cyclic Redundancy Check**

**Low BER but large latency**

## 5.7 Ball-out and Channel Specification

UCIe interconnect channel needs to meet the requirement of minimum rectangular eye open as specified in [Table 5-9](#) under channel compliance simulation conditions with noiseless and jitter-less behavioral TX and RX models.

**Figure 5-14. Example Eye diagram**



**Table 5-9. Eye requirements**

| Data Rate (GT/s)            | Eye Height (mV) | Eye width (UI) |
|-----------------------------|-----------------|----------------|
| 4, 8, 12, 16 <sup>1 3</sup> | 40              | 0.75           |
| 24, 32 <sup>1 2 3</sup>     | 40              | 0.65           |

1. Rectangular mask.
2. With equalization enabled.
3. Based on minimum Tx swing specification.

# What is an eye diagram?



NRZ: 1 bit per clock cycle

1 Level  
0 Level



PAM4: 2 bits per clock cycle

3 Level  
2 Level  
1 Level  
0 Level



## Non-return to zero (NRZ) and pulse-amplitude modulation (PAM)

# What is different about a D2D link?



**Is the PCIe D2D link parallel or serial?**

# What is different about a D2D link?



**Hybrid: a parallel link with a serialized on-chip bus!**

# What is different about a D2D link?



SerDes connection



Chiplet D2D connection

[芯砾 D2D interface]

JUNE 12<sup>th</sup>

- **What is a high-speed link for chiplets?**

A parallel link with a serialized on-chip bus

- **What are the advanced features?**

High density, low latency, high bandwidth, high efficiency,  
low BER, universal

- **Why do we need it?**

Chiplets are a new application that differs from conventional  
parallel and serial links

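# Technical Requirements for Chiplet Interface Bus (《小芯片接口总线技术要求》)



ICS 31.200  
CCS L56

## Group Standard

T/CESA 1248-2023

### Technical requirements for chiplet interface bus

Issued 2023-01-13

Effective 2023-02-13

Issued by the China Electronics Standardization Association (中国电子工业标准化技术协会)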

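## Foreword

This document was drafted in accordance with GB/T 1.1-2020, Directives for standardization, Part 1: Rules for the structure and drafting of standardizing documents.

Note that some content of this document may involve patents. The issuing body of this document assumes no responsibility for identifying patents.

This document was proposed by the China Electronics Standardization Institute (中国电子技术标准化研究院).

This document is under the jurisdiction of the China Electronics Standardization Institute and the China Electronics Standardization Association.

Drafting organizations: 中国电子技术标准化研究院、无锡芯光互连技术研究院有限公司、无锡芯光集成电路互连技术产业服务中心、中国科学院计算技术研究所、芯耀辉科技有限公司、海光信息技术股份有限公司、山东云海国创云计算装备产业创新中心有限公司、无锡众星微系统技术有限公司、芯动科技（珠海）有限公司、苏州锐杰微科技集团有限公司、牛芯半导体（深圳）有限公司、宁波德图科技有限公司。

Principal drafters: 郝沁汾、李永耀、彭弘瑞、彭一弘、展永政、曹江城、方刘禄、吴止境、林江、程永波、曾令刚、吕佳杰、金伟强、何鑫、蒲菠、孔宪伟、任翔、尹航、刘军、赵明、李仁刚。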

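# Technical Requirements for Chiplet Interface Bus

The scenarios to which this document applies are shown in Figure 1.



Figure 1. Types of application scenarios for chiplet interface technology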

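# Technical Requirements for Chiplet Interface Bus

## 4.4 Architecture

The architecture of the chiplet interface bus technology is shown in Figure 2. It mainly comprises the Data Link Layer (DLL), the Physical Adaptation Layer (PAL), and the Physical Layer (PHY); hereafter the full terms and their abbreviations are used interchangeably.



Figure 2. Architecture of the standard's content

The data link layer provides functions such as physical-layer initialization, event management, state machines for information exchange, and buffering.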

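# Technical Requirements for Chiplet Interface Bus



Figure 3. Logical interface block diagram of the basic configuration unit of the chiplet interface bus

The logical interface block diagram of the chiplet interface bus is shown in Figure 3, where red (top) and green (bottom) denote the PAL transmit and receive signals, and blue (middle) denotes the PHY-layer control signals. The transmit and receive signals in Figure 3 are those of the basic configuration unit mode.

The PHY transmit and receive signals differ with the PHY channel interface type. When the PHY uses the parallel bus interface, the transmit and receive signals are the 16-lane transmit data port TXDQ[15:0] and the 16-lane receive data port RXDQ[15:0], with selectable rates of 2 GT/s, 4 GT/s, 6 GT/s, 8 GT/s, 12 GT/s, and 16 GT/s. When the PHY uses the differential serial bus interface, the transmit and receive signals are TXP[3:0], TXN[3:0] and RXP[3:0], RXN[3:0], with selectable rates of 2 GT/s, 4 GT/s, 6 GT/s, 8 GT/s, 12 GT/s, 16 GT/s, 20 GT/s, 24 GT/s, 28 GT/s, and 32 GT/s. The correspondence between interface type and rate is given in Table 1.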

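# Technical Requirements for Chiplet Interface Bus



This section describes the key performance indicators of the single-ended and differential interfaces. The physical conditions behind these indicators are as follows:

1. Bandwidth linear density takes ×16 as the example, with a bump pitch of 150 μm for standard packaging and 55 μm for advanced packaging;
2. Energy efficiency includes the power of all physical-layer-related circuits;
3. Latency covers the adaptation layer and the physical layer, measured as the TX-to-RX loopback delay;
4. Bit error rate includes the bit error rates of both TX and RX.

The key performance indicators of the single-ended and differential interfaces are given in Tables 2 and 3.

Table 2. Key performance indicators of the single-ended interface

| Metric                   | Condition                      | Advanced package | Standard package | Unit    |
|--------------------------|--------------------------------|------------------|------------------|---------|
| Bandwidth linear density | 2 GT/s                         | 537.48           | 85.33            | GT/s/mm |
|                          | 4 GT/s                         | 1075             | 170.67           | GT/s/mm |
|                          | 6 GT/s                         | 1612.4           | 256              | GT/s/mm |
|                          | 8 GT/s                         | 2150             | 341.33           | GT/s/mm |
|                          | 12 GT/s                        | 3224.8           | 512              | GT/s/mm |
|                          | 16 GT/s                        | 4300             | 682.67           | GT/s/mm |
| Energy efficiency        | ≤12 GT/s                       | 1                | 1.25             | pJ/bit  |
|                          | ≥16 GT/s                       | 0.75             | 1                | pJ/bit  |
| Latency                  | TX+RX with FEC @ <8 GT/s       | 26.00            | 26.00            | ns      |
|                          | TX+RX with FEC @ 8~16 GT/s     | 13.00            | 13.00            | ns      |
|                          | TX+RX without FEC @ <8 GT/s    | 10.00            | 10.00            | ns      |
|                          | TX+RX without FEC @ 8~16 GT/s  | 5.00             | 5.00             | ns      |
| Bit error rate           | With FEC                       | 1.00E-15         | 1.00E-15         | -       |
|                          | Without FEC                    | 1.00E-12         | 1.00E-12         | -       |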

# Technical Requirements for Chiplet Interface Bus

Table 3. Key performance indicators of the differential interface

| Metric                   | Condition                       | Advanced package | Standard package | Unit    |
|--------------------------|---------------------------------|------------------|------------------|---------|
| Bandwidth linear density | 2 GT/s                          | 268.74           | 42.67            | GT/s/mm |
|                          | 4 GT/s                          | 537.5            | 85.33            | GT/s/mm |
|                          | 6 GT/s                          | 806.2            | 128              | GT/s/mm |
|                          | 8 GT/s                          | 1075             | 170.67           | GT/s/mm |
|                          | 12 GT/s                         | 1612.4           | 256              | GT/s/mm |
|                          | 16 GT/s                         | 2150             | 341.33           | GT/s/mm |
|                          | 20 GT/s                         | 2687.5           | 426.65           | GT/s/mm |
|                          | 24 GT/s                         | 3224.8           | 512              | GT/s/mm |
|                          | 28 GT/s                         | 3762.5           | 597.31           | GT/s/mm |
|                          | 32 GT/s                         | 4300             | 682.67           | GT/s/mm |
| Energy efficiency        | ≤12 GT/s                        | 2                | 2.5              | pJ/bit  |
|                          | ≥16 GT/s                        | 1.5              | 2                | pJ/bit  |
| Latency                  | TX+RX with FEC @ <8 GT/s        | 26.00            | 26.00            | ns      |
|                          | TX+RX with FEC @ 8~16 GT/s      | 13.00            | 13.00            | ns      |
|                          | TX+RX with FEC @ 16~32 GT/s     | 9.00             | 9.00             | ns      |
|                          | TX+RX without FEC @ <8 GT/s     | 10.00            | 10.00            | ns      |
|                          | TX+RX without FEC @ 8~16 GT/s   | 5.00             | 5.00             | ns      |
|                          | TX+RX without FEC @ 16~32 GT/s  | 5.00             | 5.00             | ns      |
| Bit error rate           | With FEC                        | 1.00E-15         | 1.00E-15         | -       |
|                          | Without FEC                     | 1.00E-12         | 1.00E-12         | -       |

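# Technical Requirements for Chiplet Interface Bus



## 5.1.1 Parallel bus interface

The parallel bus interface signals are listed in Table 4.

Table 4. Parallel bus interface signal list

| Symbol        | Type   | Description                                 |
|---------------|--------|---------------------------------------------|
| RXDQ[15:0]    | Input  | Receive-direction data                      |
| TXDQ[15:0]    | Output | Transmit-direction data                     |
| RXCLKP/RXCLKN | Input  | Receive-direction clock (differential)      |
| TXCLKP/TXCLKN | Output | Transmit-direction clock (differential)     |

## 5.1.2 Differential serial bus interface

The differential serial bus interface signals are listed in Table 5.

Table 5. Differential serial bus interface signal list

| Symbol               | Type              | Description                                    |
|----------------------|-------------------|------------------------------------------------|
| RXP[3:0]<br>RXN[3:0] | Input             | Receive-direction data signals (differential)  |
| TXP[3:0]<br>TXN[3:0] | Output            | Transmit-direction data signals (differential) |
| RXCLKP<br>RXCLKN     | Input (optional)  | Receive-direction clock (differential)         |
| TXCLKP<br>TXCLKN     | Output (optional) | Transmit-direction clock (differential)        |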

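# Technical Requirements for Chiplet Interface Bus

Single-ended parallel ×16 interface, bidirectional mode, bump pitch of 150 μm, bump pitch between staggered columns of 250 μm:



Figure 38. Bump layout for die-to-die interconnect in a conventional package (single-ended parallel ×16 interface)

# Technical Requirements for Chiplet Interface Bus

Differential serial ×16 interface, bidirectional mode, bump pitch of 150 μm, bump pitch between staggered columns of 250 μm:



Figure 39. Bump layout for die-to-die interconnect in a conventional package (differential serial ×16 interface)

# Technical Requirements for Chiplet Interface Bus



Figure 53. Bump layout for die-to-die interconnect in an advanced package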



# 02 Signal Integrity



# IC Package Examples

- Wirebonding is most common die attach method
- Flip-chip packaging allows for more efficient heat removal
- 2D solder ball array on chip allows for more signals and lower signal and supply impedance



[Sam Palermo, Texas A&M]

## Bondwires

- $L \sim 1\text{nH/mm}$
- Mutual $L$ (coupling "$K$")
- $C_{couple} \sim 20\text{fF/mm}$



## Package Trace

- $L \sim 0.7-1\text{nH/mm}$
- Mutual $L$ (coupling "$K$")
- $C_{layer} \sim 80-90\text{fF/mm}$
- $C_{couple} \sim 40\text{fF/mm}$





- Flip-chip (FCB) packaging gives a much lower chip interface impedance than wirebonding

[Intel]

| Electrical Parameter            | Wirebond Package Type |              |              | Flip-chip Package Type |             |
|---------------------------------|-----------------------|--------------|--------------|------------------------|-------------|
|                                 | CPGA                  | PPGA         | H-PBGA       | OLGA                   | FC-PGA      |
| Bondwire/Die bump R (mohms)     | 126 - 165             | 136 - 188    | 114 - 158    | 2                      | 0.06        |
| Bondwire/Die bump L (nH)        | 2.3 - 4.1             | 2.5 - 4.6    | 2.1 - 4.1    | 0.02                   | 0.013       |
| Trace R (mohms/cm)              | 1200                  | 66           | 66           | 500                    | 120         |
| Trace L (nH/cm)                 | 4.32                  | 3.42         | 3.42         | 3.07                   | 2.329       |
| Trace C (pF/cm)                 | 2.47                  | 1.53         | 1.53         | 1.66                   | 1.707       |
| Trace Z_0 (ohms)                | 42                    | 47           | 47           | 43                     | 38.5        |
| Pin/Land R (mohms)              | 20                    | 20           | 0            | 8                      | 20          |
| Pin/Land L (nH)                 | 4.5                   | 4.5          | 4.0          | 0.75                   | 2.9         |
| Plating Trace R (mohms/cm)      | 1200                  | 66           | 66           | N/A                    | N/A         |
| Plating Trace L (nH/cm)         | 4.32                  | 3.42         | 3.42         | N/A                    | N/A         |
| Plating Trace C (pF/cm)         | 2.47                  | 1.53         | 1.53         | N/A                    | N/A         |
| Plating Trace Z_0 (ohms)        | 42                    | 47           | 47           | N/A                    | N/A         |
| Trace Length Range (mm)         | 8.83 - 26.25          | 6.60 - 42.64 | 4.41 - 22.24 | 3.0 - 18.0             | 10.0 - 42.6 |
| Plating Trace Length Range (mm) | 1.91 - 10.50          | 1.91 - 16.46 | 0.930 - 8.03 | N/A                    | N/A         |

- Components soldered on top (and bottom)
- Typical boards have 4-8 signal layers and an equal number of power and ground planes

Max 100 layers

- Backplanes can have over 30 layers



- Signals typically on top and bottom layers
- GND/Power plane pairs and signal layer pairs alternate in board interior
- Typical copper trace thickness
  - “0.5oz” (17.5um) for signal layers
  - “1oz” (35um) for power planes



# Connectors

- Important to maintain proper differential impedance through connector
- Crosstalk can be an issue in the connectors



- Vias are used to connect PCB layers
- Made by drilling a hole through the board which is plated with copper
  - Pads connect to signal layers/traces
  - Clearance holes avoid power planes
- Expensive in terms of signal density and integrity
  - Consume multiple trace tracks
  - Typically lower impedance and create "stubs"



# Impact of Via Stubs at Connectors



- **Legacy BP** has default straight vias
  - Create severe nulls that kill signal integrity
- **Refined BP** has expensive backdrilled vias



50-150um

Data rates $\geq 2\,\text{Gbps}$ need backdrilling



## Transmission Line Parameters

Cross-sectional view of typical uniform interconnects:



- Capacitance between conductors,  $C$  (F/m)
- Inductance of conductor loop,  $L$  (H/m)
- Resistance of conductors (conductor loss),  $R$  ( $\Omega$ /m)
- Shunt conductance (dielectric loss),  $G$  (S/m)



$R, L, G, C$  are specified as per-unit-length parameters

## Propagation Speeds for Typical Dielectrics

| Dielectric        | Rel. Dielectric Constant<br>$\epsilon_r$ | Propagation speed<br>(cm/nsec) | Delay time per unit length<br>(ps/cm) |
|-------------------|------------------------------------------|--------------------------------|---------------------------------------|
| Polyimide         | 2.5 – 3.5                                | 16-19                          | 53 - 62                               |
| Silicon dioxide   | 3.9                                      | 15                             | 66                                    |
| Epoxy glass (PCB) | 5.0                                      | 13                             | 75                                    |
| Alumina (ceramic) | 9.5                                      | 10                             | 103                                   |
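The table entries follow from the standard relation $v = c / \sqrt{\epsilon_r}$ for a wave travelling in a homogeneous dielectric. The sketch below recomputes speed and delay for the tabulated dielectric constants; the function name is hypothetical.

```python
import math

C0_CM_PER_NS = 29.9792458  # speed of light in vacuum, in cm/ns


def propagation(eps_r):
    """Return (speed in cm/ns, delay in ps/cm) for relative permittivity eps_r."""
    speed = C0_CM_PER_NS / math.sqrt(eps_r)
    delay = 1000.0 / speed  # 1000 ps per ns
    return speed, delay


for name, eps_r in [("Silicon dioxide", 3.9), ("Epoxy glass (PCB)", 5.0), ("Alumina (ceramic)", 9.5)]:
    speed, delay = propagation(eps_r)
    print(f"{name:18s} eps_r={eps_r:4.1f}  v={speed:5.2f} cm/ns  delay={delay:6.1f} ps/cm")
```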

# Transmission Line Model

- Model Types
  - Ideal
  - Lumped C, R, L
  - RC transmission line
  - LC transmission line
  - RLGC transmission line



- Condition for LC or RLGC model (vs RC)

$$f_0 \geq \frac{R}{2\pi L}$$

| Wire                                        | R       | L       | C       | $f_0$ (LC model above) |
|---------------------------------------------|---------|---------|---------|--------------|
| AWG24 Twisted Pair                          | 0.08Ω/m | 400nH/m | 40pF/m  | 32kHz        |
| PCB Trace                                   | 5Ω/m    | 300nH/m | 100pF/m | 2.7MHz       |
| On-Chip Min. Width M6<br>(0.18μm CMOS node) | 40kΩ/m  | 4μH/m   | 300pF/m | 1.6GHz       |
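The last column of the table is the condition $f_0 \geq R / (2\pi L)$ evaluated for each wire. A minimal sketch of that calculation, using the per-unit-length values from the table (the function name is hypothetical):

```python
import math


def lc_model_min_frequency(r_per_m, l_per_m):
    """Frequency (Hz) above which inductance dominates resistance, f0 = R / (2*pi*L)."""
    return r_per_m / (2.0 * math.pi * l_per_m)


wires = {
    "AWG24 twisted pair": (0.08, 400e-9),   # ohm/m, H/m
    "PCB trace": (5.0, 300e-9),
    "On-chip min-width M6": (40e3, 4e-6),
}

for name, (r, l) in wires.items():
    print(f"{name:22s} f0 = {lc_model_min_frequency(r, l):.3g} Hz")
```

Below $f_0$ the wire behaves like an RC line; above it, the LC or RLGC model is the appropriate one.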

## Example T-Line Structures

Parallel-Plate Line



$$C = \epsilon_0 \epsilon_r \frac{w}{s} \quad L = \mu_0 \frac{s}{w}$$

$$Z_0 = \sqrt{\frac{\mu_0}{\epsilon_0 \epsilon_r}} \frac{s}{w} \quad R_{DC} = \frac{2\rho}{wt}$$

Coaxial Line



$$C = \frac{2\pi\epsilon_0\epsilon_r}{\ln(b/a)} \quad L = \frac{\mu_0}{2\pi} \ln\left(\frac{b}{a}\right)$$

$$Z_0 = \sqrt{\frac{\mu_0}{\epsilon_0 \epsilon_r}} \frac{\ln(b/a)}{2\pi}$$

$$R_{DC} = \frac{\rho}{\pi a^2} + \frac{\rho}{\pi(c^2 - b^2)}$$

$\rho$ : Resistivity

$\epsilon_0 \approx 8.85 \times 10^{-12} \text{ F/m}$
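As a quick numerical check of the coaxial-line formulas, the sketch below computes $C$, $L$, and $Z_0$ and confirms that $Z_0 = \sqrt{L/C}$. The dimensions and the dielectric constant ($b/a = 3.5$, $\epsilon_r = 2.25$) are illustrative assumptions, not values from the slides.

```python
import math

MU_0 = 4e-7 * math.pi   # H/m
EPS_0 = 8.854e-12       # F/m


def coax_lc(a, b, eps_r):
    """Per-unit-length capacitance (F/m) and inductance (H/m) of a coaxial line."""
    c = 2.0 * math.pi * EPS_0 * eps_r / math.log(b / a)
    l = MU_0 / (2.0 * math.pi) * math.log(b / a)
    return c, l


def coax_z0(a, b, eps_r):
    """Characteristic impedance (ohm) of a lossless coaxial line."""
    return math.sqrt(MU_0 / (EPS_0 * eps_r)) * math.log(b / a) / (2.0 * math.pi)


# Hypothetical geometry: inner radius 1 (any unit), outer radius 3.5, eps_r = 2.25.
c, l = coax_lc(1.0, 3.5, 2.25)
print(f"C = {c * 1e12:.1f} pF/m, L = {l * 1e9:.1f} nH/m, Z0 = {coax_z0(1.0, 3.5, 2.25):.1f} ohm")
```

Only the ratio $b/a$ matters, so the radii can be given in any consistent unit.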

[Oregon State U]

- The resistive ( $\alpha_R$ ) and dielectric ( $\alpha_D$ ) loss terms cause a signal propagating down a transmission-line to become attenuated with distance

$$\frac{V(x)}{V(0)} = e^{-(\alpha_R + \alpha_D)x}$$



- Resistive loss term is due to conductor skin effect
- Dielectric loss term is due to dielectric absorption
- Both terms increase with frequency, although at different rates

# Skin Effect (Resistive Loss)

- High-frequency current density falls off exponentially from conductor surface
- Skin depth,  $\delta$ , is the depth at which current density falls to  $e^{-1}$  of its value at the conductor surface
  - Decreases in proportion to  $1/\sqrt{\text{frequency}}$
- Relevant above the critical frequency  $f_s$  where skin depth equals half the conductor height (or radius)
  - Above  $f_s$  resistance/loss increases in proportion to  $\sqrt{\text{frequency}}$



[Dally]

$$J = e^{-\frac{d}{\delta}} \quad \delta = (\pi f \mu \sigma)^{-\frac{1}{2}}$$

For rectangular conductor:

$$f_s = \frac{\rho}{\pi \mu \left( \frac{h}{2} \right)^2}$$

$$R(f) = R_{DC} \left( \frac{f}{f_s} \right)^{\frac{1}{2}}$$

$$\alpha_R = \frac{R_{DC}}{2Z_0} \left( \frac{f}{f_s} \right)^{\frac{1}{2}}$$
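To put numbers on the skin effect, the sketch below evaluates $\delta = (\pi f \mu \sigma)^{-1/2}$ and the critical frequency obtained by setting $\delta = h/2$, that is $f_s = \rho / (\pi \mu (h/2)^2)$. The copper resistivity of $1.68 \times 10^{-8}\ \Omega\text{m}$ and the 17.5 um ("0.5oz") trace height are assumptions chosen for illustration, and holding $R(f)$ at $R_{DC}$ below $f_s$ is a simplification.

```python
import math

MU_0 = 4e-7 * math.pi   # H/m, non-magnetic conductor assumed
RHO_CU = 1.68e-8        # ohm*m, assumed room-temperature copper resistivity


def skin_depth(freq_hz, rho=RHO_CU, mu=MU_0):
    """Skin depth in metres: delta = sqrt(rho / (pi * f * mu))."""
    return math.sqrt(rho / (math.pi * freq_hz * mu))


def skin_frequency(height_m, rho=RHO_CU, mu=MU_0):
    """Frequency (Hz) at which the skin depth equals half the conductor height."""
    return rho / (math.pi * mu * (height_m / 2.0) ** 2)


def resistance_ratio(freq_hz, f_s):
    """R(f) / R_DC, held at 1 below f_s (simplifying assumption)."""
    return max(1.0, math.sqrt(freq_hz / f_s))


f_s = skin_frequency(17.5e-6)
print(f"skin depth at 1 GHz = {skin_depth(1e9) * 1e6:.2f} um")
print(f"f_s for a 17.5 um trace = {f_s / 1e6:.1f} MHz")
print(f"R(10 GHz) / R_DC = {resistance_ratio(10e9, f_s):.1f}")
```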



- Reduce impedance: silver/gold-plating
- Reduce trace path and area
- Multiple small traces in parallel

Cross-section of a power cable



# Dielectric Absorption (Loss)

- An alternating electric field causes dielectric atoms to rotate and absorb signal energy in the form of heat
- Dielectric loss is expressed in terms of the loss tangent
- Loss increases directly proportional to frequency

$$\tan \delta_D = \frac{G}{\omega C}$$

TABLE 3-4 Electrical Properties of PC Board Dielectrics

| Material                                       | $\epsilon_r$ | $\tan \delta_D$ |
|------------------------------------------------|--------------|-----------------|
| Woven glass, epoxy resin ("FR-4")              | 4.7          | 0.035           |
| Woven glass, polyimide resin                   | 4.4          | 0.025           |
| Woven glass, polyphenylene oxide resin (GETEK) | 3.9          | 0.010           |
| Woven glass, PTFE resin (Teflon)               | 2.55         | 0.005           |
| Nonwoven glass, PTFE resin                     | 2.25         | 0.001           |

[Dally]

$$\begin{aligned} \alpha_D &= \frac{GZ_0}{2} = \frac{2\pi f C \tan \delta_D \sqrt{L/C}}{2} \\ &= \pi f \tan \delta_D \sqrt{LC} \end{aligned}$$
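A rough feel for the magnitude of $\alpha_D$ comes from substituting $\sqrt{LC} = \sqrt{\epsilon_r}/c$, which holds for a line embedded in a homogeneous dielectric (an assumption; microstrip would need an effective $\epsilon_r$). The sketch below uses the FR-4 row of the table, with a hypothetical function name.

```python
import math

C0 = 299792458.0                 # speed of light in vacuum, m/s
NEPER_TO_DB = 20.0 / math.log(10.0)


def dielectric_alpha(freq_hz, eps_r, tan_delta):
    """Dielectric attenuation constant in Np/m: pi * f * tan(delta) * sqrt(LC)."""
    sqrt_lc = math.sqrt(eps_r) / C0  # assumption: homogeneous dielectric
    return math.pi * freq_hz * tan_delta * sqrt_lc


for f in (1e9, 5e9, 10e9):
    alpha = dielectric_alpha(f, eps_r=4.7, tan_delta=0.035)
    print(f"FR-4 at {f / 1e9:4.0f} GHz: {alpha:.2f} Np/m = {alpha * NEPER_TO_DB:.1f} dB/m")
```

The loss scales linearly with frequency, in contrast to the $\sqrt{f}$ dependence of the skin-effect term.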



**Reflection coefficient (from A to B):**

$$\Gamma = \frac{Z_B - Z_A}{Z_B + Z_A}, \quad \Gamma \in [-1, 1]$$



$$V_i = 1V \left( \frac{50}{50+50} \right) = 0.5V$$

$$k_{rT} = \frac{50 - 50}{50 + 50} = 0$$

$$k_{rS} = \frac{50 - 50}{50 + 50} = 0$$



[Sam Palermo, Texas A&M]




$$V_i = 1V \left( \frac{50}{50+50} \right) = 0.5V$$

$$k_{rT} = \frac{\infty - 50}{\infty + 50} = +1$$

$$k_{rS} = \frac{50 - 50}{50 + 50} = 0$$

$$R_s = 50\Omega$$

$$Z_0 = 50\Omega, t_d = 1\text{ns}$$

$$R_T \sim \infty (1\text{M}\Omega)$$



[Sam Palermo, Texas A&M]



$$V_i = 1V \left( \frac{50}{50+50} \right) = 0.5V$$

$$k_{rT} = \frac{0-50}{0+50} = -1$$

$$k_{rS} = \frac{50-50}{50+50} = 0$$

$$R_s = 50\Omega$$

$$Z_0 = 50\Omega, t_d = 1\text{ns}$$

$$R_T = 0\Omega$$





$$V_i = 1V \left( \frac{50}{400+50} \right) = 0.111V$$

$$k_{rT} = \frac{600 - 50}{600 + 50} = 0.846$$

$$k_{rS} = \frac{400 - 50}{400 + 50} = 0.778$$

$$R_s = 400\Omega$$

$$Z_0 = 50\Omega, t_d = 1\text{ns}$$

$$R_T = 600\Omega$$



[Sam Palermo, Texas A&M]

Source-end voltage after each round trip:

$$V_{10} = 0.111$$

$$0.094 = 0.111 \times 0.846$$

$$V_{11} = 0.111 + 0.094 \times (1 + 0.778) = 0.278$$

$$0.062 = 0.073 \times 0.846$$

$$V_{12} = 0.278 + 0.062 \times (1 + 0.778) = 0.388$$

$$0.041 = 0.048 \times 0.846$$

$$V_{13} = 0.388 + 0.041 \times (1 + 0.778) = 0.461$$
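The bounce-diagram arithmetic for $R_s = 400\Omega$, $Z_0 = 50\Omega$, $R_T = 600\Omega$ can be automated. The sketch below tracks the source-end voltage after each round trip for an ideal lossless line driven by a 1 V step; the function name is hypothetical and the model ignores rise time and loss.

```python
def source_end_voltages(v_step, r_source, r_term, z0, round_trips):
    """Return the source-end voltage after 0, 1, 2, ... round trips."""
    k_term = (r_term - z0) / (r_term + z0)        # reflection coefficient at the load
    k_source = (r_source - z0) / (r_source + z0)  # reflection coefficient at the source
    forward = v_step * z0 / (r_source + z0)       # initial wave launched into the line
    voltage = forward
    history = [voltage]
    for _ in range(round_trips):
        backward = forward * k_term     # wave reflected at the termination
        forward = backward * k_source   # re-reflected at the source
        voltage += backward + forward   # source node sees both waves
        history.append(voltage)
    return history


steps = source_end_voltages(1.0, 400.0, 600.0, 50.0, round_trips=40)
print([round(v, 3) for v in steps[:4]])
print(f"settles to {steps[-1]:.3f} V (DC divider: {600.0 / (400.0 + 600.0):.3f} V)")
```

The sequence rings up toward the DC divider value $R_T / (R_s + R_T) = 0.6\,\text{V}$, which is the expected behavior when both reflection coefficients are positive.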





$$R_s = 400\Omega$$

$$Z_0 = 50\Omega, t_d = 1\text{ns}$$

$$R_T = 600\Omega$$





Reflection coefficient (from A to B):

$$\Gamma = \frac{Z_B - Z_A}{Z_B + Z_A}, \quad \Gamma \in [-1, 1]$$



$$V_{10} = 0.83\text{V}; V_{20} = 0\text{V}$$

$$V_{11} = 0.83\text{V}; V_{21} = 0 + 0.83 + 0.83 * \rho_2 = 1.66\text{V}$$

$$V_{12} = 0.83 + 0.83 + 0.83 * \rho_1 = 1.106\text{V}; V_{22} = 1.66\text{V}$$

$$V_{13} = 1.106\text{V}; V_{23} = 1.66 + (-0.554) + (-0.554 * \rho_2) = 0.552\text{V}$$

$$V_{14} = 1.106 + (-0.554) + (-0.554 * \rho_1) = 0.922\text{V}; V_{24} = 0.552\text{V}$$

$$V_{15} = 0.922\text{V}; V_{25} = 0.552 + 0.37 + 0.37 * \rho_2 = 1.292\text{V}$$

$$V_{16} = 0.922 + 0.37 + 0.37 * \rho_1 = 1.045\text{V}; V_{26} = 1.292\text{V}$$

$$V_{17} = 1.045\text{V}; V_{27} = 1.292 + (-0.247) + (-0.247 * \rho_2) = 0.798\text{V}$$

$$V_{18} = 1.045 + (-0.247) + (-0.247 * \rho_1) = 0.963\text{V}; V_{28} = 0.798\text{V}$$

$$V_{19} = 0.963\text{V}; V_{29} = 0.798 + 0.165 + 0.165 * \rho_2 = 1.128\text{V}$$

Voltage (V)



Voltage (V)



- **Overshoot and Ringing**
- $V_{\text{termination}} = V_s$
- **RLC resonance**

# Termination Reflection Patterns



$$R_s = 25\Omega, R_T = 25\Omega$$

$$k_{rS} < 0 \ \text{and}\ k_{rT} < 0$$

**Voltages Converge**

$$R_s = 25\Omega, R_T = 100\Omega$$

$$k_{rS} < 0 \ \text{and}\ k_{rT} > 0$$

**Voltages Oscillate**

$$R_s = 100\Omega, R_T = 25\Omega$$

$$k_{rS} > 0 \ \text{and}\ k_{rT} < 0$$

**Voltages Oscillate**

$$R_s = 100\Omega, R_T = 100\Omega$$

$$k_{rS} > 0 \ \text{and}\ k_{rT} > 0$$

**Voltages Ring Up**

# Termination Reflection Patterns



- Shunt C discontinuity



- Series L discontinuity



$$t_r = 10\text{ps}$$



**Peak voltage spike magnitude:**

$$\frac{\Delta V}{V} = \left( \frac{\tau}{t_r} \right) \left[ 1 - e^{\left( -\frac{t_r}{\tau} \right)} \right]$$
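The spike formula depends only on the ratio of the discontinuity time constant $\tau$ to the rise time $t_r$. The sketch below evaluates it for $t_r = 10\,\text{ps}$ and a few values of $\tau$ that are chosen purely for illustration; how $\tau$ relates to the lumped $C$ or $L$ is not assumed here.

```python
import math


def spike_fraction(tau_s, rise_time_s):
    """Peak reflection spike as a fraction of the incident step."""
    ratio = tau_s / rise_time_s
    return ratio * (1.0 - math.exp(-1.0 / ratio))


rise_time = 10e-12
for tau in (1e-12, 5e-12, 20e-12, 100e-12):  # hypothetical time constants
    print(f"tau = {tau * 1e12:5.0f} ps -> spike = {spike_fraction(tau, rise_time):.3f} of the step")
```

A discontinuity much faster than the edge ($\tau \ll t_r$) produces a small spike of roughly $\tau / t_r$, while a slow one ($\tau \gg t_r$) reflects nearly the full step.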

# Termination

- No Termination
  - Little to absorb line energy
  - Can generate oscillating waveform
  - Line must be **very short** relative to signal transition time
    - $n = 4 - 6$
  - Limited off-chip use
- Source Termination
  - Source output takes 2 steps up
  - Used in moderate speed point-to-point connections



$$t_r > n T_{\text{round-trip}} = 2nl\sqrt{LC}$$



# Termination

- Receiver Termination
  - No reflection from receiver
  - Watch out for intermediate impedance discontinuities
    - Little to absorb reflections at driver
- Double Termination
  - Best configuration for min reflections
    - Reflections absorbed at both driver and receiver
  - Get half the swing relative to single termination
  - Most common termination scheme for high performance serial links





Reflection coefficient (from A to B):

$$\Gamma = \frac{Z_B - Z_A}{Z_B + Z_A}, \quad \Gamma \in [-1, 1]$$



## Lumped vs. Distributed Circuits

### Lumped-Element Circuits:

- Physical dimensions of the circuit are such that the voltage across and the current through the conductors connecting elements do not vary.
- Current in a two-terminal lumped circuit element does not vary (**phase change and transit time are neglected**)



**Lumped-parameter electrical circuit → Z/Y-parameters**

**Physical dimensions ($d$) $\ll$ signal wavelength ($\lambda$)**

# Signal Integrity Analysis

## Lumped vs. Distributed Circuits

### Distributed Circuits:

- Current varies along conductors and elements;
  - Voltage across points along conductor or within element varies
- phase change or transit time **cannot be neglected**

Example:



Distributed-parameter electrical circuit → S-parameters

Physical dimensions (d) not  $\ll$  signal wavelength ( $\lambda$ )
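A quick numeric check of the lumped/distributed boundary is sketched below. Two things are assumptions rather than content from the slides: the $d < \lambda / 10$ threshold (a common rule of thumb) and the effective dielectric constant of 4. The function names are hypothetical.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def wavelength(freq_hz, eps_r_eff=4.0):
    """Signal wavelength in a medium with effective dielectric constant eps_r_eff."""
    return C0 / (freq_hz * eps_r_eff ** 0.5)


def is_lumped(d_m, freq_hz, eps_r_eff=4.0, ratio=0.1):
    """True if dimension d is small compared with the wavelength (d < ratio * lambda)."""
    return d_m < ratio * wavelength(freq_hz, eps_r_eff)


print(is_lumped(1e-3, 1e9))     # 1 mm structure at 1 GHz
print(is_lumped(100e-3, 10e9))  # 100 mm trace at 10 GHz
```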



**Two-port network: S-parameters are defined for a specific reference port impedance (50 Ω)**

**S11: input port voltage reflection coefficient (return loss)**

**S21: forward voltage gain (insertion loss)**

**S12: reverse voltage gain (insertion loss)**

**S22: output port voltage reflection coefficient (return loss)**



$$\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$$

**Two port network:**

$S_{11}: (b_1/a_1)|_{a_2=0}$  smaller  $\rightarrow$  better

$S_{12}: (b_1/a_2)|_{a_1=0}$  approaching 0dB  $\rightarrow$  better

$S_{21}: (b_2/a_1)|_{a_2=0}$  (gain) approaching 0dB  $\rightarrow$  better

$S_{22}: (b_2/a_2)|_{a_1=0}$  smaller  $\rightarrow$  better

**Reciprocal network (e.g. a passive interconnect):  $S_{12}=S_{21}$**

**Simulation:**  
 Cadence: Allegro PCB SI  
 Agilent : Advanced Design System (ADS)  
**Measurement:**  
 Network analyzer

# Scattering parameter



GRM32ER60E337ME05, In Production, 2022/02/16, s11, DC0V 25degC series

| Frequency [Hz] | S11 [dB]     |
|----------------|--------------|
| 100            | -23.05951722 |
| 104.5792151    | -23.44324079 |
| 109.3681224    | -23.82715631 |
| 114.376324     | -24.21124013 |
| 119.6138619    | -24.59546963 |
| 125.091238     | -24.9798233  |
| 130.8194349    | -25.36428082 |

Ceramic capacitor:

**S11=-23dB@100Hz, Mag(S11)=0.0708**

$$dB(S_{11}) = 20 \log_{10} [\text{Mag } (S_{11})] = -23\text{dB}$$
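The dB/magnitude conversion used for the capacitor example can be sketched as follows. The function names are hypothetical; the relation is the one shown above, $dB = 20 \log_{10}(\text{Mag})$.

```python
import math


def mag_to_db(mag):
    """Convert a linear voltage-ratio magnitude to dB."""
    return 20.0 * math.log10(mag)


def db_to_mag(db):
    """Convert dB back to a linear voltage-ratio magnitude."""
    return 10.0 ** (db / 20.0)


print(round(db_to_mag(-23.0), 4))   # magnitude for S11 = -23 dB
print(round(mag_to_db(0.0708), 1))  # and back again
```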



**Four port network:**

**S11/S22/S33/S44: return loss**

**S12/S21/S34/S43: insertion loss**

**S13/S31/S24/S42: near-end crosstalk (NEXT)**

**S14/S41/S23/S32: far-end crosstalk (FEXT)**

- Why S Parameters?
  - Easy to measure
  - Y, Z parameters need open and short conditions
  - S parameters are obtained with nominal termination
  - S parameters based on incident and reflected wave ratio




# 03 TX/RX




Depend on Parallel / Serial







Use a precise clock to chop the data into equal periods



Overlay each period onto one plot



**[Walker]**





**Eye diagrams are a layered view of every bit transition combination**





## Non-ideal Real-Time Eye



Noise: voltage detection error






$$\text{Quality Factor} = \frac{\text{Level}_1 - \text{Level}_0}{\sigma_1 + \sigma_0}$$
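The quality factor can be linked to an error rate under additional assumptions that are not on this slide: Gaussian voltage noise on both levels, equally likely bits, and an optimally placed decision threshold, in which case $\text{BER} \approx \frac{1}{2}\,\text{erfc}(Q/\sqrt{2})$. A minimal sketch with hypothetical function names:

```python
import math


def q_factor(level1, level0, sigma1, sigma0):
    """Eye quality factor from the mean levels and their 1-sigma noise."""
    return (level1 - level0) / (sigma1 + sigma0)


def ber_from_q(q):
    """BER estimate assuming Gaussian noise and an optimal threshold."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


# Hypothetical eye: 1 V separation, 50 mV rms noise on each level.
q = q_factor(1.0, 0.0, 0.05, 0.05)
print(q, ber_from_q(q))
```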

## Non-ideal Real-Time Eye



### Jitter:

- The deviation of the significant instants of a signal from their ideal positions in time
- Random jitter (unbounded, specified as RMS) / deterministic jitter (bounded, specified as peak-to-peak)



**Deterministic Jitter:**  
predictable and repeatable behavior



## Jitter and the Real-Time Eye



**Random Jitter → Gaussian distribution**  
**Thermal noise, flicker noise or shot noise**

## Jitter Components



### Acronyms

- DDJ: Data-Dependent Jitter
- BUJ: Bounded Uncorrelated Jitter
- ABUJ: Aperiodic Bounded Uncorrelated Jitter





**Figure 1:** The dual-Dirac jitter distribution. In (a) the DJ and RJ distributions and, in (b), their convolution.



$$\text{BER}(x) = \rho_T \int_x^{\infty} \text{PDF}(x') dx' + \rho_T \int_{-\infty}^x \text{PDF}(x'-T) dx'$$

$\rho_T$  is the logic transition density (i.e., the ratio of the number of transitions to the number of bits)

$$\text{TJ(BER)} = 2Q_{BER} \times \text{RJ}(\delta\delta) + \text{DJ}(\delta\delta)$$

$$\text{RJ}(\delta\delta) = \sigma \text{ and } \text{DJ}(\delta\delta) = \mu_R - \mu_L.$$

$Q_{BER}$  is a constant determined by the target BER:

| BER        | $Q_{BER}$ |
|------------|-----------|
| $10^{-10}$ | 6.3       |
| $10^{-11}$ | 6.7       |
| $10^{-12}$ | 7.0       |
| $10^{-13}$ | 7.4       |
| $10^{-14}$ | 7.7       |
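As a worked use of the dual-Dirac relation, the sketch below computes total jitter from the table values above. The jitter numbers in the example are hypothetical, and the function and dictionary names are assumptions.

```python
# Q_BER values copied from the table above, keyed by the BER exponent.
Q_BER = {-10: 6.3, -11: 6.7, -12: 7.0, -13: 7.4, -14: 7.7}


def total_jitter(rj_sigma, dj_dd, ber_exponent=-12):
    """TJ(BER) = 2 * Q_BER * RJ(dd) + DJ(dd), in the same unit as the inputs."""
    return 2.0 * Q_BER[ber_exponent] * rj_sigma + dj_dd


# Hypothetical link: 1 ps rms random jitter, 10 ps dual-Dirac deterministic jitter.
tj_ps = total_jitter(rj_sigma=1.0, dj_dd=10.0, ber_exponent=-12)
print(tj_ps)
```

Because RJ is multiplied by roughly 14 at a BER of $10^{-12}$, a small rms value often contributes more to TJ than a much larger DJ.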



Figure 2: An eye diagram with, (a) no jitter, (b) dual-Dirac DJ, (c) RJ and dual-Dirac DJ, and (d) bathtub plot,  $\text{BER}(x)$ .











# Inter-Symbol Interference (ISI)

- Residual state from previous bits can distort the current bit, resulting in inter-symbol interference (ISI)
- ISI is caused by
  - Reflections, Channel resonances, Channel loss (dispersion)



- At channel input (TX output), eye diagram is wide open
- As data pulses propagate through channel, they experience dispersion and have significant ISI
  - Result is a closed eye at channel output (RX input)



[Meghelli (IBM) ISSCC 2006]

# Inter-Symbol Interference (ISI)



## "ISI" of Bitstream "11011" for a 10G Backplane

# Inter-Symbol Interference (ISI)







# RX Continuous-Time Linear Equalizer

- Passive R-C (or L) can implement high-pass transfer function to compensate for channel loss
- Cancel both precursor and long-tail ISI
- Can be purely passive or combined with an amplifier to provide gain



# RX Continuous-Time Linear Equalizer

- Passive structures offer excellent linearity, but no gain at Nyquist frequency



$$H(s) = \frac{R_2}{R_1 + R_2} \frac{1 + R_1 C_1 s}{1 + \frac{R_1 R_2}{R_1 + R_2} (C_1 + C_2) s}$$

$$\omega_z = \frac{1}{R_1 C_1}, \quad \omega_p = \frac{1}{\frac{R_1 R_2}{R_1 + R_2} (C_1 + C_2)}$$

$$\text{DC gain} = \frac{R_2}{R_1 + R_2}, \quad \text{HF gain} = \frac{C_1}{C_1 + C_2}$$

$$\text{Peaking} = \frac{\text{HF gain}}{\text{DC gain}} = \frac{\omega_p}{\omega_z} = \frac{R_1 + R_2}{R_2} \frac{C_1}{C_1 + C_2}$$
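The passive CTLE expressions above can be checked numerically. The component values in this sketch are hypothetical and chosen only for illustration; the function names are assumptions.

```python
def passive_ctle_params(r1, r2, c1, c2):
    """Zero, pole, gains and peaking of the passive RC equalizer."""
    r_par = r1 * r2 / (r1 + r2)
    w_z = 1.0 / (r1 * c1)
    w_p = 1.0 / (r_par * (c1 + c2))
    dc_gain = r2 / (r1 + r2)
    hf_gain = c1 / (c1 + c2)
    return {"w_z": w_z, "w_p": w_p, "dc_gain": dc_gain,
            "hf_gain": hf_gain, "peaking": hf_gain / dc_gain}


def passive_ctle_mag(w, r1, r2, c1, c2):
    """|H(jw)| evaluated directly from the transfer function."""
    s = 1j * w
    r_par = r1 * r2 / (r1 + r2)
    h = (r2 / (r1 + r2)) * (1 + r1 * c1 * s) / (1 + r_par * (c1 + c2) * s)
    return abs(h)


# Hypothetical values: 400 ohm / 100 ohm, 100 fF / 100 fF.
p = passive_ctle_params(400.0, 100.0, 100e-15, 100e-15)
print(p["dc_gain"], p["hf_gain"], p["peaking"])
```

Both gains are below 1, which illustrates the point that a passive structure shapes the response by attenuating low frequencies rather than amplifying high ones.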

# RX Continuous-Time Linear Equalizer

- Input amplifier with RC degeneration can provide frequency peaking with gain at Nyquist frequency
- Potentially limited by gain-bandwidth of amplifier
- Amplifier must be designed for input linear range
  - Often TX eq. provides some low frequency attenuation
- Sensitive to PVT variations and can be hard to tune
- Generally limited to 1<sup>st</sup>-order compensation



[Gondi JSSC 2007]

$$H(s) = \frac{g_m}{C_p} \frac{s + \frac{1}{R_s C_s}}{\left( s + \frac{1 + g_m R_s / 2}{R_s C_s} \right) \left( s + \frac{1}{R_d C_p} \right)}$$

$$\omega_z = \frac{1}{R_s C_s}, \quad \omega_{p1} = \frac{1 + g_m R_s / 2}{R_s C_s}, \quad \omega_{p2} = \frac{1}{R_d C_p}$$

$$\text{DC gain} = \frac{g_m R_d}{1 + g_m R_s / 2}, \quad \text{Ideal peak gain} = g_m R_d$$

$$\text{Ideal Peaking} = \frac{\text{Ideal peak gain}}{\text{DC gain}} = \frac{\omega_{p1}}{\omega_z} = 1 + g_m R_s / 2$$
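A short numeric sketch of the active CTLE response follows. All component values are hypothetical; the code only evaluates the transfer function and pole/zero expressions given above.

```python
def active_ctle_params(gm, rs, cs, rd, cp):
    """Zero, poles, DC gain and ideal peaking of the degenerated amplifier."""
    w_z = 1.0 / (rs * cs)
    w_p1 = (1.0 + gm * rs / 2.0) / (rs * cs)
    w_p2 = 1.0 / (rd * cp)
    dc_gain = gm * rd / (1.0 + gm * rs / 2.0)
    return {"w_z": w_z, "w_p1": w_p1, "w_p2": w_p2,
            "dc_gain": dc_gain, "ideal_peaking": 1.0 + gm * rs / 2.0}


def active_ctle_mag(w, gm, rs, cs, rd, cp):
    """|H(jw)| evaluated from the pole/zero form of the transfer function."""
    p = active_ctle_params(gm, rs, cs, rd, cp)
    s = 1j * w
    h = (gm / cp) * (s + p["w_z"]) / ((s + p["w_p1"]) * (s + p["w_p2"]))
    return abs(h)


# Hypothetical design: gm = 10 mS, Rs = 400 ohm, Cs = 200 fF, Rd = 200 ohm, Cp = 50 fF.
vals = dict(gm=10e-3, rs=400.0, cs=200e-15, rd=200.0, cp=50e-15)
params = active_ctle_params(**vals)
print(params["dc_gain"], params["ideal_peaking"])
```

The realized peaking is lower than the ideal value whenever the output pole $\omega_{p2}$ is not far above $\omega_{p1}$, which is the gain-bandwidth limitation mentioned in the bullets.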

# RX Continuous-Time Linear Equalizer



- Pros
  - Provides gain and equalization with low power and area overhead
  - Can cancel both precursor and long-tail ISI
- Cons
  - Generally limited to 1st order compensation
  - Amplifies noise/crosstalk
  - PVT sensitivity
  - Can be hard to tune



- Tune degeneration resistor and capacitor to adjust zero frequency and 1<sup>st</sup> pole which sets peaking and DC gain
- Increasing  $C_S$  moves zero and 1<sup>st</sup> pole to a lower frequency w/o impacting (ideal) peaking
- Increasing  $R_S$  moves zero to lower frequency and increases peaking (lowers DC gain)
  - Minimal impact on 1<sup>st</sup> pole



$$\omega_z = \frac{1}{R_S C_S}, \quad \omega_{p1} = \frac{1 + g_m R_S / 2}{R_S C_S}$$



- Pros
  - With sufficient dynamic range, can amplify high frequency content (rather than attenuate low frequencies)
  - Can cancel ISI in pre-cursor and beyond filter span
  - Filter tap coefficients can be adaptively tuned without any back-channel
- Cons
  - Amplifies noise/crosstalk
  - Implementation of analog delays
  - Tap precision

Eye-Pattern Diagrams at 1Gb/s on CAT5e\*



\*D. Hernandez-Garduno and J. Silva-Martinez, "A CMOS 1Gb/s 5-Tap Transversal Equalizer based on 3<sup>rd</sup>-Order Delay Cells," ISSCC, 2007.

# RX Analog FIR Equalization

- 5-tap equalizer with tap spacing of  $T_b/2$



3<sup>rd</sup>-order delay cell



1Gb/s experimental results



D. Hernandez-Garduno and J. Silva-Martinez, "A CMOS 1Gb/s 5-Tap Transversal Equalizer based on 3<sup>rd</sup>-Order Delay Cells," ISSCC, 2007.

# RX Digital FIR Equalization

- Digitize the input signal with high-speed low/medium resolution ADC and perform equalization in digital domain
  - Digital delays, multipliers, adders
  - Limited to ADC resolution
- Power can be high due to very fast ADC and digital filters





- 12.5GS/s 4.5-bit Flash ADC in 65nm CMOS [Harwood ISSCC 2007]
- 2-tap FFE & 5-tap DFE
- XCVR power (inc. TX) = 330mW, Analog = 245mW, Digital = 85mW



- **Driving/Equalization/Termination**
- **Techniques:**
  - **Swing enhancement techniques,**
  - **Impedance control**
  - **Pad bandwidth extension**
  - **Slew-rate control**

- Finite supply impedance causes significant Simultaneous Switching Output (SSO) noise (xtalk)
- Necessitates large amounts of decoupling capacitance for supplies and reference voltage
  - Decap limits I/O area more than circuitry





- A voltage or current difference is sent between two lines
- Requires 2x signal lines relative to single-ended signaling, but fewer return pins
- Advantages
  - Signal is self-referenced
  - Can achieve twice the signal swing
  - Rejects common-mode noise
  - Return current is ideally only DC

# Controlled-Impedance Drivers

- Signal integrity considerations (min. reflections) require a  $50\Omega$  driver output impedance
- To produce an output drive voltage
  - Current-mode drivers use Norton-equivalent parallel termination
    - Easier to control output impedance
  - Voltage-mode drivers use Thevenin-equivalent series termination
    - Potentially  $\frac{1}{2}$  to  $\frac{1}{4}$  the current for a given output swing



**Current-Mode**



**Voltage-Mode**

# Current-Mode Logic (CML) Driver



- Used in most high performance serial links
- Low voltage operation relative to push-pull driver
  - High output common-mode keeps current source saturated
- Can use DC or AC coupling
  - AC coupling requires data coding
- Differential pp RX swing is  $\pm IR/2$  with double termination



$$V_{d,1} = (I/2)R$$

$$V_{d,0} = -(I/2)R$$

$$V_{d,pp} = IR$$

$$I = \frac{V_{d,pp}}{R}$$



$$V_{d,1} = (I/4)(2R)$$

$$V_{d,0} = -(I/4)(2R)$$

$$V_{d,pp} = IR$$

$$I = \frac{V_{d,pp}}{R}$$

## Single-Ended Termination



$$V_{d,1} = (V_s / 2)$$

$$V_{d,0} = -(V_s / 2)$$

$$V_{d,pp} = V_s$$

$$I = (V_s / 2R)$$

$$I = \frac{V_{d,pp}}{2R}$$

## Differential Termination



$$V_{d,1} = (V_s / 2)$$

$$V_{d,0} = -(V_s / 2)$$

$$V_{d,pp} = V_s$$

$$I = (V_s / 4R)$$

$$I = \frac{V_{d,pp}}{4R}$$

# Current-Mode vs Voltage-Mode

| Driver/Termination | Current Level   | Normalized Current Level |
|--------------------|-----------------|--------------------------|
| Current-Mode/SE    | $V_{d,pp}/Z_0$  | 1x                       |
| Current-Mode/Diff  | $V_{d,pp}/Z_0$  | 1x                       |
| Voltage-Mode/SE    | $V_{d,pp}/2Z_0$ | 0.5x                     |
| Voltage-Mode/Diff  | $V_{d,pp}/4Z_0$ | 0.25x                    |

- An ideal voltage-mode driver with differential RX termination enables a *potential* 4x reduction in driver power
- *Actual* driver power levels also depend on
  - Output impedance control
  - Pre-driver power
  - Equalization implementation
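The comparison table can be turned into numbers for a given swing. This is a minimal sketch of the ideal current levels only; the 50 Ω impedance and the 1 V swing in the example are hypothetical, and the function name is an assumption.

```python
def driver_currents(v_dpp, z0=50.0):
    """Ideal driver current for each driver/termination combination in the table."""
    return {
        "Current-Mode/SE": v_dpp / z0,
        "Current-Mode/Diff": v_dpp / z0,
        "Voltage-Mode/SE": v_dpp / (2.0 * z0),
        "Voltage-Mode/Diff": v_dpp / (4.0 * z0),
    }


# Hypothetical 1 V peak-to-peak differential swing into 50 ohm lines.
currents = driver_currents(1.0)
for name, amps in currents.items():
    print(f"{name}: {amps * 1e3:.1f} mA")
```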

- Linear RX equalizers don't discriminate between signal, noise, and cross-talk
  - While signal-to-distortion (ISI) ratio is improved, SNR remains unchanged





# RX Decision Feedback Equalizer

- DFE is a **non-linear** equalizer
- Slicer makes a **symbol decision**, i.e. quantizes input
- ISI is then directly subtracted from the incoming signal via a feedback FIR filter

$$z_k = y_k - w_1 \tilde{d}_{k-1} - \dots - w_{n-1} \tilde{d}_{k-(n-1)} - w_n \tilde{d}_{k-n}$$
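The feedback equation can be illustrated with a small behavioral model. The sketch below is noise-free and uses a hypothetical pulse response whose post-cursors close the eye; the function names are assumptions, and the taps are set equal to the post-cursor values rather than adapted.

```python
def channel(symbols, pulse):
    """y_k = sum_i pulse[i] * d_{k-i}: main cursor plus post-cursor ISI."""
    return [sum(pulse[i] * symbols[k - i] for i in range(len(pulse)) if k - i >= 0)
            for k in range(len(symbols))]


def slicer(z):
    """Symbol decision: quantize to +1 / -1."""
    return 1.0 if z >= 0 else -1.0


def dfe(samples, taps):
    """z_k = y_k - w_1*d_{k-1} - ... - w_n*d_{k-n}, followed by the slicer."""
    history = [0.0] * len(taps)          # previous decisions, most recent first
    decisions = []
    for y in samples:
        z = y - sum(w * d for w, d in zip(taps, history))
        d = slicer(z)
        decisions.append(d)
        history = [d] + history[:-1]
    return decisions


tx = [1.0, -1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0]
rx = channel(tx, pulse=[1.0, 0.7, 0.4])   # hypothetical pulse response
raw = [slicer(y) for y in rx]             # decisions without equalization
eq = dfe(rx, taps=[0.7, 0.4])             # decisions with a 2-tap DFE
```

In this example the unequalized slicer makes errors because the post-cursor sum exceeds the main cursor, while the DFE recovers the transmitted sequence. A wrong decision would feed back into later symbols, which is the error-propagation risk of this structure.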



# RX Decision Feedback Equalizer



- **Pros**
  - No noise and crosstalk amplification
  - Filter tap coefficients can be adaptively tuned without any back-channel
- **Cons**
  - Cannot cancel pre-cursor ISI
  - Critical feedback timing path
  - Timing of ISI subtraction complicates CDR phase detection



**Track the previous bit decisions, predict the ISI they contribute to the current bit, then subtract it**

# RX Decision Feedback Equalizer



[Liu ISSCC 2009]

- DFE with 2-tap FIR filter in feedback will only cancel ISI of the first two post-cursors

# RX Decision Feedback Equalizer



- A DFE with FIR feedback requires many taps to cancel ISI
- Smooth channel long-tail ISI can be approximated as exponentially decaying
  - Examples include on-chip wires and silicon carrier wires

# RX Decision Feedback Equalizer

[Liu ISSCC 2009]



- Large 1<sup>st</sup> post-cursor  $H_1$  is canceled with normal FIR feedback tap
- Smooth long tail ISI from 2<sup>nd</sup> post-cursor and beyond is canceled with low-pass IIR feedback filter
- Note: channel needs to be smooth (not many reflections) in order for this approach to work well


# TX Feed Forward Equalizer

For 10Gbps :  $W(z) = -0.131 + 0.595z^{-1} - 0.274z^{-2}$



$$\mathbf{W} = [-0.131 \quad 0.595 \quad -0.274]$$

**Low Frequency Response (Sum Taps)**

$$[\dots \quad 1 \quad 1 \quad 1 \quad \dots] * [-0.131 \quad 0.595 \quad -0.274] = [\dots \quad 0.190 \quad 0.190 \quad 0.190 \quad \dots]$$

**Nyquist Frequency Response (Sum Taps w/ Alternating Polarity)**

$$[\dots \quad -1 \quad 1 \quad -1 \quad \dots] * [-0.131 \quad 0.595 \quad -0.274] = [\dots \quad 1 \quad -1 \quad 1 \quad \dots]$$



17" Refined Server 10Gb/s Pulse Response



pre and main cursors with tap coefficients to emphasize the main cursor

# TX Feed Forward Equalizer



**Nyquist Frequency Response**  $\left( f = \frac{1}{2T_s} \right)$

$$z = \cos(\pi) + j \sin(\pi) = -1 \Rightarrow W\left(f = \frac{1}{2T_s}\right) = -1 \Rightarrow 0dB$$

- **Equalizer has 14.4dB of frequency peaking**
  - Attenuates DC by 14.4 dB and passes the Nyquist frequency at 0 dB

Note:  $T_s = T_b = 100ps$
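The DC and Nyquist gains of the 3-tap filter can be verified by evaluating $W(z)$ on the unit circle. The sketch below uses the tap values from the slide; the function names are hypothetical.

```python
import cmath
import math


def ffe_response(taps, f_norm):
    """W(z) at z = exp(j*2*pi*f*Ts), with f_norm = f*Ts (0.5 is Nyquist)."""
    return sum(w * cmath.exp(-2j * math.pi * f_norm * n) for n, w in enumerate(taps))


def to_db(h):
    """Magnitude of a complex response in dB."""
    return 20.0 * math.log10(abs(h))


taps = [-0.131, 0.595, -0.274]
dc_db = to_db(ffe_response(taps, 0.0))        # z = 1: taps simply add
nyquist_db = to_db(ffe_response(taps, 0.5))   # z = -1: taps add with alternating sign
peaking_db = nyquist_db - dc_db
print(round(dc_db, 1), round(nyquist_db, 1), round(peaking_db, 1))
```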




**Emphasis:** pre-emphasis and de-emphasis

**pre-emphasis:** compensate the high-frequency loss

**de-emphasis:** reduce mid-low frequency swing



$$y[k] = \sum_{n=0}^2 c_n x[k-n]$$

$$H(z) = \frac{Y(z)}{X(z)} = c_0 + c_1 z^{-1} + c_2 z^{-2}$$

De-emphasis is the more commonly used option: it reduces signal amplitude, which saves power and reduces electromagnetic interference (EMI).  
No symbol decision is involved; the compensation acts directly on the signal amplitude.
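The FIR relation $y[k] = \sum_n c_n x[k-n]$ can be applied to a short bit pattern to see de-emphasis in the time domain. The sketch below uses a hypothetical 2-tap setting (main tap 0.75, post-cursor tap -0.25) rather than the 3-tap values from the earlier slide; the function name is an assumption.

```python
def ffe_filter(symbols, taps):
    """y[k] = sum_n taps[n] * x[k-n]; samples before the start are treated as 0."""
    return [sum(c * symbols[k - n] for n, c in enumerate(taps) if k - n >= 0)
            for k in range(len(symbols))]


# NRZ symbols (+1 / -1) and a hypothetical 2-tap de-emphasis setting.
bits = [1, 1, 1, -1, -1, 1]
out = ffe_filter(bits, taps=[0.75, -0.25])
print(out)  # [0.75, 0.5, 0.5, -1.0, -0.5, 1.0]
```

Bits that follow a transition are sent at full swing (±1.0), while repeated bits are reduced to ±0.5. The first output sample differs only because the filter starts with an empty history.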

# TX Feed Forward Equalizer



w/o TX FFE



w/ TX FFE



High pass filter in TX

# TX Equalization



- Pros
  - Simple to implement
  - Can cancel ISI in pre-cursor and beyond filter span
  - Doesn't amplify noise
  - Can achieve 5-6bit resolution
- Cons
  - Attenuates low frequency content due to peak-power limitation
  - Need a "back-channel" to tune filter taps

