

# A 56-Gb/s PAM4 Wireline Transceiver Using a 32-Way Time-Interleaved SAR ADC in 16-nm FinFET

Yohan Frans, Jaewook Shin, Lei Zhou, Parag Upadhyaya, Jay Im, Vassili Kireev, Mohamed Elzeftawi, Hiva Hedayati, *Member, IEEE*, Toan Pham, Santiago Asuncion, Chris Borrelli, Geoff Zhang, Hongtao Zhang, and Ken Chang, *Senior Member, IEEE*

**Abstract**—A 56-Gb/s PAM4 wireline transceiver testchip is implemented in 16-nm FinFET. The current mode logic transmitter incorporates an auxiliary current injection at the output nodes to maintain PAM4 amplitude linearity. The ADC-based receiver incorporates hybrid analog and digital equalization. The analog equalization is performed using two identical stages of continuous time linear equalizer, each having a constant $\sim 0$-dB dc gain and up to $\sim 7$ dB of peaking at 14 GHz. A 28-GSample/s 32-way time-interleaved SAR ADC converts the equalized analog signal into the digital domain for further equalization using digital signal processing. The transceiver achieves $<1e-8$ bit error rate over a backplane channel with 31-dB loss at 14 GHz and 3.5-mV<sub>rms</sub> additional crosstalk, using a fixed $\sim 10$-dB TX equalization and an adaptive hybrid RX equalization, with the DSP configured to have a 24-tap feed forward equalizer and a 1-tap decision feedback equalizer. The transceiver consumes 550-mW power at 56 Gb/s, excluding the power of the on-chip configurable DSP that cannot be accurately measured as it is implemented as part of a larger test structure.

**Index Terms**—56 Gb/s, ADC, PAM4, transceiver, wireline.

## I. INTRODUCTION

THE emergence of Internet of Things and cloud computing has triggered a rapid increase in bandwidth demand in data centers and telecommunication infrastructures. The increasing bandwidth demand has recently prompted the industry to propose a new electrical interface standard capable of operating up to 56 Gb/s per lane [1]. In order to avoid costly infrastructure upgrades, the interface needs to support legacy channels (i.e., backplane) designed for current generation electrical interface (e.g., up to 28 Gb/s in traditional non-return-to-zero (NRZ) signaling). These legacy channels often have a very large insertion loss (IL) beyond 14 GHz with significant reflections. Fig. 1 shows an example of such a backplane channel. If NRZ signaling were used to operate the link at 56 Gb/s, the IL at the Nyquist frequency (28 GHz) would be $\sim 60$ dB, and the IL-to-crosstalk ratio (ICR) from one aggressor at 28 GHz would be $\sim 0$ dB. These constraints make it very difficult to implement NRZ signaling at 56 Gb/s with reasonable

Manuscript received August 8, 2016; revised November 10, 2016; accepted November 10, 2016. Date of publication January 9, 2017; date of current version March 23, 2017. This paper was approved by Guest Editor Brian Ginsburg.

The authors are with Xilinx, Inc., San Jose, CA 95124 USA (e-mail: yohanf@xilinx.com).

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2016.2632300



Fig. 1. IL and crosstalk profile of legacy backplane designed for 28-Gb/s NRZ signaling.

power efficiency. The PAM4 signaling [2] is a better choice for these legacy channels, since the signal frequency content can be limited to 14 GHz. For the channel shown in Fig. 1, the IL is  $\sim 31$  dB and the ICR is  $\sim 30$  dB at 14 GHz. Even though many of the PAM4 signaling challenges (e.g., smaller signal power due to transmitter's peak power constraint, sensitivity to residual ISI) still need to be addressed, these IL and ICR numbers fall within the equalization capability of the state-of-the-art transceivers.
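The frequency argument above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function name is hypothetical, and the amplitude penalty assumes an ideal PAM4 eye that is one third of the full swing.

```python
import math

def nyquist_ghz(bit_rate_gbps: float, levels: int) -> float:
    """Nyquist frequency in GHz of a PAM-N link: half the symbol rate."""
    bits_per_symbol = math.log2(levels)
    symbol_rate_gbaud = bit_rate_gbps / bits_per_symbol
    return symbol_rate_gbaud / 2.0

nrz_nyquist = nyquist_ghz(56.0, 2)   # NRZ carries 1 bit per symbol
pam4_nyquist = nyquist_ghz(56.0, 4)  # PAM4 carries 2 bits per symbol

# An ideal PAM4 eye is one third of the full swing, so at the same peak
# swing the amplitude penalty relative to NRZ is 20*log10(3) dB.
pam4_eye_penalty_db = 20.0 * math.log10(3.0)
```

With these numbers the PAM4 link pays roughly 9.5 dB in eye amplitude but moves the Nyquist frequency from 28 GHz to 14 GHz, where the channel in Fig. 1 has about 29 dB less loss.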

In order to relax the raw bit-error-rate (BER) requirement, the proposed standard allows the use of a forward error correction (FEC) mechanism in the targeted links/systems. Error-correcting block codes such as the Reed–Solomon code (Fig. 2) are used to encode/decode the data, such that low BER (e.g., $<1e-18$) can still be achieved even when the raw (pre-FEC) BER is relatively high (e.g., $1e-4$ to $1e-6$) [3], at the expense of higher total latency in the links/systems. A raw BER of $<1e-4$ is currently proposed as standard [1].

This paper presents the design of a 56-Gb/s ADC-based PAM4 transceiver with a moderate target BER (e.g.,  $1e-6$  to  $1e-8$ ) over legacy channels to be used with FEC. Section II covers the PAM4 transmitter design. The ADC-based receiver is described in Section III. The experimental results are summarized in Section IV.



Fig. 2. BER improvement using block codes.



Fig. 3. PAM4 transmitter output levels and linearity definition.

## II. TRANSMITTER

There are two new challenges in designing a PAM4 transmitter [4] when compared with designing an NRZ transmitter. First, in general, a transmitter designed for PAM4 signaling must be able to deliver a higher swing, because the corresponding receiver effectively needs to be able to resolve smaller amplitude compared with NRZ signaling (in an ideal case, the receiver needs to be able to resolve one third of the transmitted swing, as shown in Fig. 3). Second, the transmitter must maintain good linearity when transmitting the four PAM4 signaling levels. The dc-level distortions due to transmitter nonlinearity directly translate into smaller effective eye height seen by the receiver. The proposed standard [1] specifies transmitter linearity requirement in terms of ratio of level mismatch (RLM), as shown in Fig. 3.
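As a rough illustration of the linearity metric, the sketch below computes a level-mismatch ratio from four measured dc levels. The definition in Fig. 3 is not reproduced in this text, so the code assumes one common form (three times the smallest adjacent-level spacing divided by the outer-level swing), which may differ in detail from the figure. The function name and the example voltages are hypothetical.

```python
def level_mismatch_ratio(levels):
    """Assumed RLM form: 3 * (smallest adjacent spacing) / (V3 - V0).

    Equals 1.0 for four equally spaced levels and drops below 1.0 as
    any of the three eyes is compressed.
    """
    v = sorted(levels)
    if len(v) != 4:
        raise ValueError("PAM4 needs exactly four levels")
    spacings = [hi - lo for lo, hi in zip(v, v[1:])]
    return 3.0 * min(spacings) / (v[3] - v[0])

ideal_rlm = level_mismatch_ratio([-3.0, -1.0, 1.0, 3.0])

# Hypothetical measured differential levels (volts) for a 1.2 V diff-pp swing.
example_rlm = level_mismatch_ratio([-0.6, -0.194, 0.2, 0.6])
```

In the hypothetical example the middle eye is the smallest (0.394 V instead of the ideal 0.4 V), which gives a ratio of about 0.985.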

Fig. 4 shows the block diagram of the three-tap (one main-tap, one pretap, and one posttap) transmitter used in this design. A pattern generator generates either a pseudorandom bit sequence (PRBS) data sequence (PRBS-7, PRBS-15, or PRBS-31) or a programmable 128-b data pattern. The 128-b output of the pattern generator is serialized to a 16-b data that feeds a finite impulse response (FIR) generator block. The FIR generator block generates three streams of 4-b data, one data stream for each FIR tap. A transmitter



Fig. 4. Transmitter block diagram.



Fig. 5. Transmitter front end.

front-end block receives these 4-b data streams and generates FIR-equalized PAM4 signals. A 14-GHz differential current mode logic (CML) input clock generated by a phase-locked loop (PLL) is converted into a regulated-CMOS level and sent to the transmitter front-end block. A duty cycle detection/correction loop detects duty cycle distortion at the input of the transmitter front end and corrects the distortion in regulated-CMOS domain [5].
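The three-tap FIR equalization can be sketched behaviorally as follows. This is a minimal model, not the circuit: the binary-weighted level mapping follows the 2× MSB plus 1× LSB current summing of the driver, while the tap weights and function names are hypothetical.

```python
def pam4_level(msb: int, lsb: int) -> int:
    """Map an (MSB, LSB) pair to one of the levels -3, -1, +1, +3."""
    return 2 * (2 * msb + lsb) - 3

def tx_fir(symbols, c_pre, c_main, c_post):
    """Three-tap FIR: y[n] = c_pre*x[n+1] + c_main*x[n] + c_post*x[n-1]."""
    out = []
    for n in range(len(symbols)):
        nxt = symbols[n + 1] if n + 1 < len(symbols) else 0
        prev = symbols[n - 1] if n >= 1 else 0
        out.append(c_pre * nxt + c_main * symbols[n] + c_post * prev)
    return out

# Hypothetical tap weights whose magnitudes sum to 1 (peak-swing constraint).
equalized = tx_fir([-3, 3, 3], c_pre=-0.05, c_main=0.8, c_post=-0.15)
```

The transition from -3 to +3 is emphasized (2.7) relative to the repeated +3 symbol that follows (1.95), which is the high-frequency boost that the pretap and posttap provide.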

The transmitter front end (Fig. 5) consists of three segments (one segment for each transmitter tap) whose outputs share a common T-coil-enhanced termination network. The size of the main-tap segment is two times larger than the size of the pretap and posttap segments. Each transmitter tap segment consists of two stages: a 4-to-2 MUX/pre-driver stage and a final CML driver stage. The 4-to-2 MUX/pre-driver serializes a 4-b data input (the output of FIR generator) into a 2-b data (one MSB and one LSB). The final CML driver stage converts the 2-b data into PAM4 signals. The driver current of each segment can be controlled using a 5-b digital control to enable fine adjustments of the transmitter swing and/or transmitter equalizations.

Each tap of the PAM4 final CML driver (Fig. 6) is realized by summing the currents of a 2× driver (MSB) and a 1× driver (LSB). The transmitter must maintain linearity between the four output levels while delivering up to 1.2 V diff-pp swing. Since the main power supply level of the driver (AVTT) is only 1.2 V, the output common mode will be too low to keep the differential pairs N0 and N1 and N2 and



Fig. 6. CML driver.



Fig. 7. 4:2 MUX/predriver circuit.

N<sub>3</sub> in saturation when the driver delivers high current, hence distorting the output linearity. In order to raise the output common mode, some (about 25% typically) of the driver current is sourced using current sources from an auxiliary supply (1.8 V). Since only $\sim 25\%$ of the driver current is sourced from the 1.8-V auxiliary supply, the power penalty is only 12.5% of the total driver power. Open-loop compensation using a replica ($N_{rep}$) of the driver input differential pair helps maintain optimum output common mode over process, voltage, and temperature (PVT). As an example, at the FF process corner and high temperature, $N_{rep}$ provides additional current that further increases the output common mode and increases the $V_{ds}$ of the N<sub>0</sub>/N<sub>1</sub> and N<sub>2</sub>/N<sub>3</sub> differential pairs to improve their saturation margins. Small cascode devices (N<sub>cas0</sub>, N<sub>cas1</sub>) are placed above the tail current sources to increase their output impedance, which further improves dc linearity and reduces ac distortion.
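The 12.5% figure follows from the two supply voltages and the 25% current split. The short check below assumes that, without the auxiliary path, all of the driver current would be drawn from AVTT; the function name is hypothetical.

```python
V_AVTT = 1.2         # main driver supply (V)
V_AUX = 1.8          # auxiliary supply (V)
AUX_FRACTION = 0.25  # typical share of driver current sourced from V_AUX

def driver_power_per_amp(aux_fraction: float) -> float:
    """Supply power per ampere of driver current for a given split."""
    return (1.0 - aux_fraction) * V_AVTT + aux_fraction * V_AUX

baseline = driver_power_per_amp(0.0)           # all current from AVTT
with_aux = driver_power_per_amp(AUX_FRACTION)  # 75% AVTT, 25% auxiliary
power_penalty = (with_aux - baseline) / baseline
```

The extra 0.6 V applies to only a quarter of the current, so the penalty is 0.25 × 0.6 / 1.2 = 12.5%.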

The combined 4-to-2-MUX/predriver circuit (Fig. 7) incorporates a pseudo-H-bridge scheme with positive feedback. Pseudodifferential regulated-CMOS clocks (clk and clkb) are used to select the multiplexer output between pseudodifferential CMOS data d<sub>0</sub>/d<sub>0b</sub> and d<sub>1</sub>/d<sub>1b</sub>. The load structure of this circuit consists of cross-coupled PMOS devices (P<sub>0</sub> and P<sub>1</sub>) in parallel with passive resistors R<sub>0</sub> and R<sub>1</sub>. In order to extend the bandwidth, a small T-coil network is used. The circuit has a high gain at zero crossing, which improves rise/fall



Fig. 8. Receiver block diagram.

times and helps suppress clock switching noise at the driver output.

## III. RECEIVER

One of the main challenges in PAM4 signaling is its sensitivity to residual ISI. Since there are four possible levels, the worst case impact occurs when residual ISI from major transitions (i.e., from the lowest level to the highest level or vice versa) is superimposed on the main signal (which even in an ideal case is only one third of the major transition). In order to minimize the residual ISI, a large number of equalization taps (covering both precursor taps and post-cursor taps) are needed. However, implementing a large number of equalization taps in analog domain (at the receiver front-end) requires multistage feed forward equalizer (FFE) and/or decision feedback equalizer (DFE) summing circuits with enough gain to overcome slicer sensitivity and with good linearity over the range of PAM4 signaling levels. The design of this multistage analog front end can be very challenging at a higher line rate such as 56 Gb/s. In the proposed receiver, the large number of equalization taps is implemented in the digital domain using digital signal processing (DSP), thus avoiding linearity issues associated with the analog solution.

Fig. 8 shows the receiver block diagram. The on-die termination (ODT) includes a T-coil structure that compensates for parasitic capacitances from the pad, ESD protection devices, and analog front-end input gates. The receiver analog front end [two stages of automatic gain control (AGC) and two identical stages of continuous time linear equalizer (CTLE)] provides signal equalization and conditioning, which reduces the resolution and full-scale-range requirements of the ADC [6], [7]. The 28-GSample/s 32-way time-interleaved (TI) ADC converts the differential analog input into 8-b digital values. The ADC outputs are then retimed to a single 875-MHz clock domain. One set of these retimed outputs is sent to an on-chip configurable DSP and error-checker block. For testing and debug purposes, another set of the retimed outputs is sampled periodically and stored in a 64-kb (8k symbols) buffer. A replica of the on-chip DSP block (implemented in an off-chip FPGA) periodically samples these 8k symbols and generates equalized PAM4 symbols. The off-chip FPGA also performs equalization adaptation, clock and data recovery (CDR), and ADC offset/gain/skew calibrations based on the sampled ADC outputs. The FPGA adaptation logic concurrently adapts the analog front end (CTLE and AGC settings) and the DSP coefficients. The FPGA CDR logic



Fig. 9. Constant dc-gain and constant peaking frequency CTLE.



Fig. 10. 32-way TI ADC block diagram.



Fig. 11. 7-GHz clock skew correction circuit.

implements a second-order baud-rate CDR using the Mueller-Muller algorithm.
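A baud-rate Mueller-Muller loop needs only one sample per symbol. The sketch below shows the textbook timing-error estimate and a generic proportional-plus-integral (second-order) loop filter. It is not the FPGA implementation: the loop gains, class name, and function names are hypothetical.

```python
PAM4_LEVELS = (-3.0, -1.0, 1.0, 3.0)

def slice_pam4(y: float) -> float:
    """Nearest-level PAM4 decision."""
    return min(PAM4_LEVELS, key=lambda level: abs(y - level))

def mm_timing_error(y_prev, y_curr, d_prev, d_curr):
    """Mueller-Muller estimate: y[k]*d[k-1] - y[k-1]*d[k].

    On average this is proportional to (first post-cursor) minus
    (first pre-cursor) of the sampled pulse response.
    """
    return y_curr * d_prev - y_prev * d_curr

class SecondOrderLoop:
    """Proportional + integral loop filter driving a phase control word."""

    def __init__(self, kp: float, ki: float):
        self.kp = kp
        self.ki = ki
        self.freq = 0.0   # integral path tracks frequency offset
        self.phase = 0.0  # accumulated phase sent to the phase interpolator

    def update(self, error: float) -> float:
        self.freq += self.ki * error
        self.phase += self.kp * error + self.freq
        return self.phase
```

For example, with decisions 3 then -1 and a 10% post-cursor leaking into the second sample (y = -0.7), the estimate is 0.9, while ISI-free samples give exactly zero.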

The T-coil-enhanced ODT provides $\sim 27$-GHz bandwidth and less than $-8$ dB return loss at 14 GHz (meeting the return loss requirement defined in [1]). Each of the AGC stages provides up to 10 dB of programmable dc gain range. Each of the two identical CTLE stages is designed to have a constant dc gain ($\sim 0$ dB) and programmable high-frequency gain peaking (up to 7 dB) with constant peaking frequency at $\sim 14$ GHz. The offset from AGC and CTLE random device mismatches is corrected at these analog front-end stages so that the effective ADC full-scale range is not compromised. Compared with a constant high-frequency gain CTLE (with programmable dc gain), the constant-dc-gain CTLE can reduce the required AGC gain at high-loss channels and/or improve the linearity of the subsequent stage at low-loss channels. Furthermore, this approach minimizes the interaction between the AGC adaptation loop and the CTLE adaptation loop. Constant peaking frequency around Nyquist ($\sim 14$ GHz) is chosen to suppress early post-cursor ISI as much as possible to help



Fig. 12. (a) 7-GHz clock skew correction range and (b) correction step.



Fig. 13. ADC timing diagram.

reduce ADC dynamic range requirement, while minimizing crosstalk and thermal noise beyond Nyquist frequency. As shown in Fig. 9, the CTLE uses an  $RC$  source-degeneration topology. In order to obtain constant dc gain and constant peaking frequency, the values of the source-degeneration capacitance  $C_s$  and the values of the output load capacitance  $C_d$  are set for every gain-peaking setting. Higher gain peaking is obtained by increasing the value of  $C_s$  and decreasing the value of  $C_d$ .
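The behavior of an $RC$ source-degenerated stage can be explored with the textbook one-zero, two-pole model. The sketch below is not the circuit of Fig. 9: it assumes the usual half-circuit expression with $R_s$ and $C_s$ connected between the two source nodes, and every component value is hypothetical.

```python
import math

def ctle_gain_db(f_hz, gm, r_d, r_s, c_s, c_d):
    """Gain in dB of a source-degenerated differential stage.

    H(s) = gm*Rd/(1 + gm*Rs/2) * (1 + s*Rs*Cs)
           / ((1 + s*Rs*Cs/(1 + gm*Rs/2)) * (1 + s*Rd*Cd))
    """
    s = 1j * 2.0 * math.pi * f_hz
    degeneration = 1.0 + gm * r_s / 2.0
    dc_gain = gm * r_d / degeneration
    zero = 1.0 + s * r_s * c_s
    pole_source = 1.0 + s * r_s * c_s / degeneration
    pole_load = 1.0 + s * r_d * c_d
    return 20.0 * math.log10(abs(dc_gain * zero / (pole_source * pole_load)))

# Hypothetical values chosen only so that the dc gain is 0 dB.
GM, R_D, R_S = 20e-3, 100.0, 100.0
gain_dc_db = ctle_gain_db(0.0, GM, R_D, R_S, c_s=200e-15, c_d=20e-15)
gain_14g_db = ctle_gain_db(14e9, GM, R_D, R_S, c_s=200e-15, c_d=20e-15)
```

In this model the dc gain depends only on $g_m$, $R_d$, and $R_s$, so changing $C_s$ and $C_d$ alters the peaking without moving the dc gain, which is consistent with the constant-dc-gain behavior described above.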

Fig. 10 shows the block diagram of the 8-b 28 GSample/s, 600 mV diff-pp full-scale range, 32-way TI SAR ADC used in the receiver. There are two stages of time interleavers. The first stage is a four-way time interleaver, where the input is buffered by B0/B1 buffers and sampled and held using four-phase 7-GHz sampling pulses. The sampling pulses sharing the same buffer (CLK7G\_0 and CLK7G\_2 sharing B0 buffer and CLK7G\_1 and CLK7G\_3 sharing B1 buffer) are designed to be nonoverlapping to avoid charge sharing. The second stage is an eight-way time interleaver, where each of the signals sampled by the 7-GHz sampling pulses is



Fig. 14. ADC clocking diagram.



Fig. 15. 875-MHz SAR sub-ADC circuit.

further sampled and held using eight-phase 875-MHz sampling pulses and converted into digital values using eight instances of the 875-MHz SAR sub-ADC. The outputs of the 32 SAR sub-ADC instances are then retimed to a single 875-MHz clock domain and sent to a 64-kb storage.
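The two-level interleaving can be summarized as a mapping from sample index to sub-ADC. The sketch below assumes that samples rotate through the four front-end phases first and then through the eight sub-ADCs of each group in order. That ordering and the function name are assumptions.

```python
ADC_RATE_HZ = 28e9
FRONT_END_PHASES = 4    # first-stage interleaver (7-GHz sampling pulses)
SUB_ADCS_PER_PHASE = 8  # second-stage interleaver (875-MHz sub-ADCs)

def route_sample(n: int) -> tuple[int, int]:
    """Return (front-end phase, sub-ADC index within the group) for sample n."""
    phase = n % FRONT_END_PHASES
    sub_adc = (n // FRONT_END_PHASES) % SUB_ADCS_PER_PHASE
    return phase, sub_adc

front_end_rate_hz = ADC_RATE_HZ / FRONT_END_PHASES         # per-phase S/H rate
sub_adc_rate_hz = front_end_rate_hz / SUB_ADCS_PER_PHASE   # per sub-ADC rate
```

Each of the 32 consecutive samples lands on a different sub-ADC, and the pattern repeats every 32 samples, which matches the 7-GHz and 875-MHz rates quoted in the text.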

The timing skew calibration [8]–[10] of the 7-GHz clocks and the gain/offset calibration of the 875-MHz sub-ADC instances are performed using the pseudorandom data input [9], in contrast to the use of sinusoidal tones for calibration in [8]. This approach allows live data calibration, because scrambled pseudorandom data are common in most high-speed interfaces. In order to keep the area of the sub-ADC circuit small, the gain and offset corrections of the sub-ADCs are performed in the digital domain at the expense of ~10% reduction of the ADC full-scale range. Fig. 11 shows the circuit diagram of the 7-GHz clock skew correction block (only one clock phase is shown). The main path is composed of two stages of inverters with a controlled $RC$ network between them. NMOS M1 controls the charging and discharging current of the capacitor C1 (60 fF). The gate voltage of NMOS M1 ($V_{ctrl}$) is controlled by a digital skew correction code through the resistor-ladder (R2R) DAC. Since the R2R DAC has a high output impedance (4 k$\Omega$), a bypass capacitor C2 (3 pF) is added to reduce noise coupling to the control voltage $V_{ctrl}$. The worst case differential nonlinearity (DNL) of the R2R DAC with three-sigma over PVT variations is 0.46 LSB, which is below 1 LSB, so the DAC is monotonic and the skew correction step is always positive. Fig. 12 shows the simulation results of the skew correction knob characteristics over PVT corners. The skew correction range is more than -5.5 to 5.5 ps without any mismatch (random and systematic). When random and systematic mismatch is considered, the skew correction range shrinks to -3.7 to 3.7 ps, which is still wide enough to cover clock distribution mismatches up to the first stage sample-and-holds (S/Hs). The skew correction step is designed to be less than 100 fs to target 38-dB signal-to-noise-and-distortion ratio (SNDR, 6-b ENOB) with margin for the Nyquist frequency of 14 GHz.
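The link between a 100-fs step and a 38-dB target can be checked with the standard timing-error-limited SNR expression for a sinusoid, SNR = -20·log10(2π·f·σ). Treating the residual skew as an rms error equal to the full step is a pessimistic assumption of this sketch, not a statement from the paper; the function names are hypothetical.

```python
import math

def skew_limited_snr_db(f_in_hz: float, sigma_s: float) -> float:
    """SNR limit set by an rms sampling-time error sigma_s at frequency f_in."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_s)

def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02

snr_at_100fs = skew_limited_snr_db(14e9, 100e-15)  # about 41 dB
target_enob = enob_from_sndr(38.0)                 # about 6 bits
```

Even with the whole step counted as rms error, the limit at 14 GHz is about 41 dB, roughly 3 dB above the 38-dB target.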

Fig. 13 shows the timing diagram of the TI ADC. The top four signals are front-end four-phase 7-GHz sampling pulses (CLK7G\_0, CLK7G\_1, CLK7G\_2, and CLK7G\_3). Each of the 7-GHz sampling pulses has an approximately 40% duty cycle, which ensures there is no overlap between CLK7G\_0 and CLK7G\_2 (and similarly between CLK7G\_1 and CLK7G\_3). One group of 875-MHz sampling pulses (consisting of eight different phases) is generated from each phase of the four-phase 7-GHz sampling pulses (as shown in Fig. 13, CLK875M\_0\_0 to CLK875M\_0\_7 are derived from CLK7G\_0). Hence, there are 32 different phases (four groups and eight phases per group) of 875-MHz pulses in total, each corresponding to one instance of an 875-MHz SAR sub-ADC. The 875-MHz sampling pulses have a $\sim 3$-UI pulsewidth in order to avoid overlaps between them. This gives a $\sim 29$-UI period for the sub-ADC to perform conversion. The rising edges of the 875-MHz sampling pulses are aligned with the rising edge of the corresponding 7-GHz sampling pulse. If they are not aligned well, the falling edge of the previous 875-MHz sampling pulse may occur after the rising edge of the 7-GHz clock. In this case, the previously sampled signal is mixed with the currently sampled signal, which significantly increases signal distortion.
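The timing budget in unit intervals (UI) follows directly from the clock rates. The sketch below redoes that arithmetic; the variable names are hypothetical, and it assumes the two phases sharing a buffer are 180° apart.

```python
SYMBOL_RATE_HZ = 28e9
FRONT_END_CLOCK_HZ = 7e9
SUB_ADC_CLOCK_HZ = 875e6

ui_ps = 1e12 / SYMBOL_RATE_HZ                               # one UI in ps
front_end_period_ui = SYMBOL_RATE_HZ / FRONT_END_CLOCK_HZ   # 4 UI
sub_adc_period_ui = SYMBOL_RATE_HZ / SUB_ADC_CLOCK_HZ       # 32 UI

# A 40% duty cycle on a 4-UI period gives a 1.6-UI sampling pulse.
front_end_pulse_ui = 0.40 * front_end_period_ui

# Phases sharing a buffer are assumed to be half a 7-GHz period apart.
same_buffer_spacing_ui = front_end_period_ui / 2.0
non_overlap_gap_ui = same_buffer_spacing_ui - front_end_pulse_ui

sampling_window_ui = 3.0
conversion_window_ui = sub_adc_period_ui - sampling_window_ui  # 29 UI
```

One UI is about 35.7 ps, so the non-overlap gap between pulses on the same buffer is about 0.4 UI (roughly 14 ps) and each sub-ADC has about 1.04 ns to finish its conversion.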

The block diagram of the TI ADC clock generation and timing circuit is shown in Fig. 14. It generates four-phase 7-GHz sampling pulses for the front-end S/H and 32-phase 875-MHz sampling pulses for the SAR sub-ADCs. A quadrature phase interpolator generates four phases of 7-GHz clocks with 50% duty cycle. The phase interpolator delay is controlled by the CDR loop. Each of the 7-GHz clock phases has a dedicated path to the front-end S/H. A skew correction block adjusts the delay of each phase of the 7-GHz clocks based on the ADC timing-skew calibration loop. The outputs of the skew correction block are split into two paths: a front-end S/H path and a sub-ADC path. In the front-end S/H path, a 7-GHz clock generation block generates 40%-duty-cycle sampling pulses from the four-phase clocks using AND gates. The 7-GHz clock generation block also inserts some delay in order to control timing (rising-edge alignment) between the front-end 7-GHz sampling pulses and the corresponding back-end sub-ADC sampling pulses over PVT variations. After the delay chain, the 40%-duty-cycle sampling pulses are converted into CMOS-level pseudodifferential signals to drive the front-end S/Hs. The generation of 32-phase, 3-UI-wide 875-MHz sub-ADC sampling pulses starts by changing the duty cycle of the 7-GHz clocks from 50% to 25%. Two adjacent phases of 7-GHz clocks with 50% duty cycle are used to generate one phase of 7-GHz clock with 25% duty cycle. These 25%-duty-cycle clocks are then converted into CMOS-level pseudodifferential signals and sent to an 875-MHz clock generation block, which is composed of four groups of a synchronous divide-by-eight block and eight NOR gates.

Fig. 15 shows the block diagram of the 875-MSa/s 8-b asynchronous SAR sub-ADC. It consists of 127 units of



Fig. 16. Sub-ADC comparator.



Fig. 17. Die micrograph.

capacitors, one comparator, an SAR logic block, and a retimer. A differential capacitor DAC (CDAC) structure is adopted to keep the common mode at the comparator input constant during conversion. In order to achieve a high sampling rate with minimum power consumption, the size of the CDAC and any parasitic capacitors needs to be minimized. A top-plate sampling scheme [11] is adopted to reduce the CDAC requirement from 8 to 7 b, therefore reducing the CDAC size by half. The CDAC unit capacitor is also sized just large enough ($\sim 0.4$ fF) to meet the matching requirement. Since the sub-ADC only needs to have moderate resolution (8 b), the CDAC matching requirement is not very stringent. Instead of using one of the available power supplies, a low-voltage, low-impedance reference voltage ($V_{\text{ref}}$) is used in the CDAC to achieve the same full-scale range without increasing the total capacitor size. The relationship between full-scale range voltage ($V_{\text{fs}}$) and $V_{\text{ref}}$ is shown as follows:

$$\begin{aligned} V_{\text{fs}} &= \frac{2^7 C_u}{C_{\text{tot}}} \cdot 2 V_{\text{ref}} \\ C_{\text{tot}} &= 2^7 C_u + C_{\text{par}} \end{aligned} \tag{1}$$

where  $C_u$  is the CDAC unit capacitance,  $C_{\text{tot}}$  is the total capacitor size, and  $C_{\text{par}}$  is the parasitic scaling capacitor. From (1), for a given full-scale range and CDAC unit capacitor, lower  $V_{\text{ref}}$  value allows for lower parasitic scaling capacitor. A single programmable CDAC reference voltage ( $V_{\text{ref}}$ ) is generated on-chip to drive all 32 sub-ADCs for better matching.



Fig. 18. Transmitter output eye diagram.



Fig. 19. Measured ADC performance with 181.5-MHz and 13.99-GHz sinusoidal inputs.

The  $V_{ref}$  output impedance is designed to be low in order to avoid disturbance from 32 DACs switching.
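Equation (1) can be rearranged to show how the reference voltage trades against the parasitic scaling capacitor. The sketch below uses the 0.4-fF unit capacitor and the 600-mV full-scale range quoted in the text, and assumes that $V_{\text{fs}}$ in (1) is the differential peak-to-peak full-scale range. The candidate $V_{\text{ref}}$ values and function names are hypothetical.

```python
N_BITS = 7
C_UNIT_FF = 0.4  # CDAC unit capacitor (fF)
V_FS = 0.6       # full-scale range (V), assumed differential peak-to-peak

def full_scale_v(v_ref: float, c_par_ff: float) -> float:
    """Eq. (1): V_fs = 2^7*C_u / (2^7*C_u + C_par) * 2*V_ref."""
    c_dac = (2 ** N_BITS) * C_UNIT_FF
    return c_dac / (c_dac + c_par_ff) * 2.0 * v_ref

def c_par_for_vref_ff(v_fs: float, v_ref: float) -> float:
    """Eq. (1) solved for C_par: C_par = 2^7*C_u * (2*V_ref/V_fs - 1)."""
    c_dac = (2 ** N_BITS) * C_UNIT_FF
    return c_dac * (2.0 * v_ref / v_fs - 1.0)

c_par_low_vref = c_par_for_vref_ff(V_FS, 0.45)  # hypothetical low reference
c_par_supply = c_par_for_vref_ff(V_FS, 0.9)     # reference tied to a 0.9-V supply
```

Under these assumptions a 0.45-V reference needs 25.6 fF of scaling capacitance, while using a 0.9-V supply directly would need 102.4 fF, twice the 51.2-fF CDAC itself.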

The comparator is composed of a preamplifier followed by a StrongArm (SA) latch (Fig. 16). The preamplifier is added to reduce the input-referred noise of the comparator to below 0.5 LSB. A reset switch is added to the preamplifier to implement overdrive recovery, which clears the memory effect from the previous bit. The preamplifier is enabled earlier than the SA latch to allow the output to settle before the SA latch starts. A pair of dummy NMOS devices with their drains and sources shorted together is connected to the input and output of the preamplifier to reduce kickback from the preamplifier output to the CDAC. Together with the SAR logic delay and the CDAC settling time, the comparator regeneration time constant ($\tau$) determines the metastability error rate of the SAR ADC [12]. $\tau$ is designed to be $\sim 4$ ps to achieve $< 1e-15$ metastability error rate (significantly lower than the raw BER target of the link), assuming uniform distribution of the input signal.

In order to avoid distributing a high-frequency clock and to increase the SAR conversion speed, asynchronous SAR logic with programmable delays is implemented. To increase speed further, the SAR logic is implemented using custom-built digital circuits.

## IV. EXPERIMENTAL RESULTS

Fig. 17 shows the die micrograph of the testchip fabricated in 16-nm FinFET. There are two TX/RX lanes sharing a common PLL, clock distribution, and bias block. The first RX lane has full receiver functionalities. The CTLE in the second RX lane is bypassed to facilitate direct measurements of the ADC performance.

Fig. 18 shows the transmitter output eye diagram over a $\sim 5$-dB channel. The transmitter is configured to transmit PRBS7 and PRBS31 patterns with $\sim 4.5$-dB posttap equalization. The measured random jitter is 200 fs-rms. The transmitter linearity measurement is performed by measuring the average dc voltage of the four PAM4 levels and calculating RLM using the equation shown in Fig. 3. The transmitter achieves an RLM of 0.97 at 1.2 V diff-pp swing.



Fig. 20. Pre-DSP received eye diagram at the ADC input for (a) 6-dB channel and (b) 32-dB channel.

The stand-alone ADC performance is measured on the second RX lane with CTLE bypassed. The ADC gain, offset, and timing skew calibrations are first performed by passing PAM4 PRBS31 data sequence through the ADC. Once the calibrations are completed, the ADC inputs are connected to a differential sinusoidal signal at various frequencies—generated using an RF signal generator and a single-ended-to-differential phase splitter. FFT is performed on the ADC outputs captured in the 64-kb storage. The ADC achieves the ENOB of 6.3 at 180 MHz and 4.9 at 14 GHz, as shown in Fig. 19. The ADC performance at 180 MHz is limited by the thermal noise of the single-slice SAR ADC comparator. The ADC performance at 14 GHz is limited by residual timing skew of the four-phase 7-GHz sampling pulses.

The link functionality is tested by connecting the transmitter output to the receiver through different kinds of backplane channels. The off-chip FPGA periodically reads the ADC outputs stored in the 64-kb buffer and performs ADC calibration, CDR, equalization adaptation, and DSP. Fig. 20 shows eye diagrams at the CTLE/AGC outputs (captured using the ADC as a digital oscilloscope) and the post-DSP histograms of the four PAM4 levels (sampled at the CDR lock point) over 200k symbols. The CTLE/AGC outputs show an open eye with the ~6-dB channel [Fig. 20(a)] and a closed eye with the 32-dB channel [Fig. 20(b)]. In both cases, the DSP opens the eye in the post-DSP PAM4-level histograms. The estimated BER is $<1e-8$ based on extrapolation of the histograms.
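One simple way to extrapolate an error rate from level histograms is to fit each PAM4 level with a Gaussian and integrate the tails that cross the midpoint thresholds. The sketch below illustrates that idea only. The paper does not state which extrapolation method was used, and the Gaussian fit, function names, and example statistics are assumptions.

```python
import math

def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pam4_symbol_error_rate(means, sigmas):
    """Symbol error rate for equiprobable levels with Gaussian noise.

    Thresholds sit midway between adjacent level means. Each threshold
    contributes the upper tail of the lower level and the lower tail of
    the upper level.
    """
    pairs = sorted(zip(means, sigmas))
    total = 0.0
    for (mu_lo, sig_lo), (mu_hi, sig_hi) in zip(pairs, pairs[1:]):
        threshold = 0.5 * (mu_lo + mu_hi)
        total += q_function((threshold - mu_lo) / sig_lo)
        total += q_function((mu_hi - threshold) / sig_hi)
    return total / len(pairs)

# Hypothetical post-DSP statistics: ideal level means, equal sigma.
ser = pam4_symbol_error_rate([-3.0, -1.0, 1.0, 3.0], [0.2] * 4)
```

With equal spacing and equal sigma this reduces to the familiar 1.5·Q(d/σ), where d is half the level spacing. With Gray-coded symbols the bit error rate would be roughly half the symbol error rate.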

In order to measure the link performance at full 56-Gb/s throughput (without data subsampling), the on-chip DSP and the on-chip error checker are used. The off-chip FPGA still performs ADC calibrations, CDR, and equalization adaptation.



Fig. 21. BER bath-tub curve for 31-dB channels at various crosstalk levels.

TABLE I  
PERFORMANCE SUMMARY

| Parameter | Value |
|---|---|
| Technology | 16-nm FinFET CMOS |
| Power supply ($V_{avcc}$, $V_{avtt}$, $V_{aux}$) | 0.9 V, 1.2 V, 1.8 V |
| Dual transceiver active area | 2.8 mm<sup>2</sup> |
| Max TX swing | 1.2 V diff-pp |
| TX RJ (PRBS7, major transition) | 200 fs |
| ADC ENOB | 6.5 @ 0.18 GHz, 4.9 @ 14 GHz |
| ADC power (including ADC clocks) | 280 mW |
| BER at 56 Gb/s (31-dB loss @ 14 GHz, 3.5-mV<sub>rms</sub> crosstalk) | $<1\times10^{-8}$ |
| Power per lane at 56 Gb/s (does not include DSP) | 550 mW (140 mW TX, 370 mW RX, 40 mW PLL/clock distribution) |

The resulting ADC calibration and equalization coefficients are used to set the corresponding coefficients in the on-chip DSP. The link performance measurement results are described in the form of BER bath-tub curve around the CDR lock point (Fig. 21). In this case, the ball grid array (BGA)-to-BGA channel loss is  $\sim 31$  dB at 14 GHz. The transmitter FIR settings are fixed and the receiver analog and digital equalizations are adapted. The DSP is configured to have 24-tap FFE and 1-tap DFE. Additional crosstalk is generated by sending a PRBS aggressor signal through a backplane channel next to the channel under test, measuring the rms of the crosstalk-induced voltage near the receiver, and adjusting the amplitude of the aggressor signal until the desired crosstalk rms voltage is obtained. The link BER is  $<1e-15$  without additional crosstalk at the receiver and  $<1e-8$  with 3.5 mV<sub>rms</sub> additional crosstalk at the receiver. The timing window at 1e-4 BER (specified by the proposed standard) is  $>0.2$  UI.
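The FFE-plus-DFE structure used in the DSP can be sketched behaviorally as below. This is a floating-point illustration, not the on-chip DSP. The tap values, the position of the main cursor, and the function name are hypothetical; only the 24-tap FFE and 1-tap DFE sizes come from the text.

```python
PAM4_LEVELS = (-3, -1, 1, 3)

def ffe_dfe_equalize(samples, ffe_taps, dfe_tap, n_pre):
    """Apply a feed-forward equalizer followed by a 1-tap decision feedback.

    ffe_taps[n_pre] is the main cursor; earlier entries weight future
    samples (pre-cursor taps), later entries weight past samples.
    """
    decisions = []
    prev_decision = 0
    for k in range(len(samples)):
        acc = 0.0
        for i, coeff in enumerate(ffe_taps):
            idx = k + n_pre - i
            if 0 <= idx < len(samples):
                acc += coeff * samples[idx]
        acc -= dfe_tap * prev_decision  # cancel first post-cursor ISI
        decision = min(PAM4_LEVELS, key=lambda level: abs(acc - level))
        decisions.append(decision)
        prev_decision = decision
    return decisions

# Toy channel with a single 0.4 post-cursor; 24-tap FFE set to pass-through.
tx = [3, -1, 1, -3, 3, 1, -1, -3]
rx = [tx[k] + 0.4 * (tx[k - 1] if k else 0) for k in range(len(tx))]
ffe = [0.0] * 24
ffe[3] = 1.0  # hypothetical main cursor with three pre-cursor taps
recovered = ffe_dfe_equalize(rx, ffe, dfe_tap=0.4, n_pre=3)
```

In this toy case the DFE alone removes the post-cursor and the transmitted symbols are recovered exactly. In a real link the FFE coefficients would be adapted to handle pre-cursor and longer post-cursor ISI.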

Table I shows the performance summary. The transceiver consumes 550-mW total power (excluding the on-chip configurable DSP power which cannot be accurately measured as it is implemented as part of a larger test structure and not optimized in this design). The TX consumes 140 mW, the RX consumes 370 mW (of which 280 mW is consumed by the ADC), and the PLL/clock distribution/bias block consumes 40 mW.
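The power breakdown can be converted into an energy-per-bit figure with simple arithmetic. The sketch below uses only the numbers reported above; the variable names are hypothetical, and the DSP power is excluded as in the text.

```python
TX_MW = 140.0
RX_MW = 370.0
PLL_CLOCK_MW = 40.0
ADC_MW = 280.0          # part of the RX power
DATA_RATE_GBPS = 56.0

total_mw = TX_MW + RX_MW + PLL_CLOCK_MW
energy_pj_per_bit = total_mw / DATA_RATE_GBPS  # mW per Gb/s equals pJ/bit
adc_share_of_rx = ADC_MW / RX_MW
```

This gives about 9.8 pJ/bit for the transceiver without the DSP, with the ADC accounting for roughly three quarters of the receiver power.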

## V. CONCLUSION

The transceiver described in this paper is able to achieve <1e-8 raw BER (significantly better than the 1e-4 raw BER targeted by the standard [1]) over a legacy backplane channel with 31-dB loss and 3.5-mV<sub>rms</sub> additional crosstalk. Given the significant BER margin achieved in this design relative to the standard requirement, we will look for opportunities to further optimize power and performance in the future. Specific to the ADC-based receiver design, this will include analyzing the impact of ADC resolution on the link BER and analyzing the partition between the analog and digital portions of the equalization.

## ACKNOWLEDGMENT

The authors would like to thank the Xilinx SerDes design and validation teams for their contributions to circuit design and silicon measurements.

## REFERENCES

- [1] Optical Internetworking Forum (OIF), "CEI-56G-LR-PAM4 long reach implementation agreement draft text," Opt. Internetworking Forum Contrib., Tech. Rep. 2014.380.03, Jun. 2016.
- [2] V. Stojanovic *et al.*, "Autonomous dual-mode (PAM2/4) serial link transceiver with adaptive equalization and data recovery," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 1012–1026, Apr. 2005.
- [3] Otto *et al.*, "Proposal for CEI-56G FEC requirements section," Opt. Internetworking Forum Contrib., Tech. Rep. 2015.302.02, Jul. 2015.
- [4] M. Bassi, F. Radice, M. Bruccoleri, S. Erba, and A. Mazzanti, "A 45Gb/s PAM-4 transmitter delivering 1.3Vppd output swing with 1V supply in 28nm CMOS FDSOI," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Jan. 2016, pp. 66–67.
- [5] Y. Frans *et al.*, "A 40-to-64Gb/s NRZ transmitter with supply-regulated front-end in 16nm FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2016, pp. 68–70.
- [6] D. Cui *et al.*, "A 320mW 32Gb/s 8b ADC-based PAM-4 analog front-end with programmable gain control and analog peaking in 28nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Jan. 2016, pp. 58–59.
- [7] E. H. Chen, R. Yousry, and C.-K. K. Yang, "Power optimized ADC-based serial link receiver," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 938–951, Apr. 2012.
- [8] L. Kull *et al.*, "A 90GS/s 8b 667mW 64× interleaved SAR ADC in 32nm digital SOI CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2014, pp. 378–379.
- [9] C. Erdmann, "Time skew extraction of interleaved analog-to-digital converters," U.S. Patent 8830094, Sep. 9, 2014.
- [10] H. Wei, P. Zhang, B. D. Sahoo, and B. Razavi, "An 8 bit 4 GS/s 120 mW CMOS ADC," *IEEE J. Solid-State Circuits*, vol. 49, no. 8, pp. 1751–1761, Aug. 2014.
- [11] S. Gambini and J. Rabaey, "Low-power successive approximation converter with 0.5 V supply in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 42, no. 11, pp. 2348–2356, Nov. 2007.
- [12] A. Waters, J. Muhlestein, and U.-K. Moon, "Analysis of metastability errors in asynchronous SAR ADCs," in *Proc. IEEE Int. Conf. Electron., Circuits, Syst. (ICECS)*, Cairo, Egypt, Dec. 2015, pp. 547–550.



**Yohan Frans** received the B.S. degree in electrical engineering from the Bandung Institute of Technology, Indonesia, in 1995, and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2001.

From 2001 to 2012, he was with Rambus Inc., Sunnyvale, CA, where he was involved in high-performance and low-power serial links and memory interfaces as a Circuit Design Engineer, a Circuit Architect, and a Design Manager. Since 2012, he has been with Xilinx Inc., San Jose, CA, USA. He is currently leading design teams as a Senior Engineering Director with the SerDes Technology Group, Xilinx, San Jose, CA, USA, where he is involved in developing high-speed wireline transceivers for advanced FPGAs. His current research interests include high-speed mixed-signal circuit design, serial link architecture, transmitter/receiver design, PLL/DLL, memory interfaces, and low-power circuit architectures.



**Jaewook Shin** received the B.S., M.S., and Ph.D. degrees in electrical engineering from Kwangwoon University, Seoul, South Korea, in 2004, 2006, and 2011, respectively.

In 2011, he joined the Electrical Engineering Department, University of California at Los Angeles, Los Angeles, CA, USA, as a Post-Doctoral Researcher. Since 2014, he has been with Xilinx Inc., San Jose, CA, USA. He is currently involved in analog/mixed-signal circuits and systems, such as data converters and phase-locked loops. His current research interests include low-power, low-noise circuits and systems in serial link and communication systems.



**Lei Zhou** received the B.Eng. degree in electrical engineering from the Huazhong University of Science and Technology, China, in 2002, the M.Eng. degree from the Department of Electrical and Computer Engineering, National University of Singapore, Singapore, in 2005, and the Ph.D. degree in electrical and computer engineering from the University of California at Irvine, Irvine, CA, USA, in 2010.

He joined Broadcom, Irvine, CA, and Quantenna, Fremont, CA, where he was involved in developing RF front-end circuits for wireless LAN transceivers. He interned with Atheros, San Jose, CA, and Broadcom in 2006, 2007, and 2009. He is currently a Senior Design Engineer with the SerDes Technology Group, Xilinx, San Jose, CA, USA, where he is involved in developing high-speed analog and mixed signal circuits for NRZ and PAM4 transceivers. He has authored or co-authored over 10 journal and conference publications. His current research interests include analog, mixed-signal, and RF/MMW integrated circuit design for wireline and wireless transceivers.



**Parag Upadhyaya** was born in Kathmandu, Nepal. He received the B.S.E.E., M.S.E.E., and Ph.D. degrees (Hons.) in electrical engineering from Washington State University, Pullman, WA, USA, in 2000, 2005, and 2008, respectively.

From 2001 to 2003, he was with Cypress Semiconductor, Austin, TX, USA, where he was involved in the design of high-speed wireline/optical transceivers. He is currently a Senior Design Manager with Xilinx, San Jose, CA, USA, and leads development of high-speed transceivers for FPGA applications. He has authored or co-authored over 40 journal, conference, and book chapter publications. His current research interests include high-speed wireline and wireless transceivers, semiconductor device physics, and low-power RFICs.

Dr. Upadhyaya was a recipient of the Outstanding Ph.D. Student Award in 2008 at Washington State University, the Best Paper Award in 1998 for his work on tribology, and the Best Poster Awards in 2004 and 2006 for his work on a subharmonic mixer and on novel mixed-signal circuits for wireline/wireless transceivers, respectively.



**Jay Im** received the B.S. degree in electronics engineering and the M.S. degree in physics from Seoul National University, Seoul, South Korea, and the Ph.D. degree in physics from Ohio State University, Columbus, OH, USA, in 2001. His thesis was on nanometer-scale study of metal contacts to wide-bandgap semiconductor materials, including SiO<sub>x</sub>, SiC, and GaN.

He was with Silicon Storage Technology Inc., Sunnyvale, CA, USA, from 2001 to 2004, as a Process/Device Engineer, where he was involved in developing CMOS Inc., San Jose, CA, USA, in 2004, where he was involved in various projects, including embedded nonvolatile memory technology development, E-fuse PDK development, custom design of OTP macro embedded in field-programmable gate array, RTL design of interface block for foundry IP, and led the preproduction diagnostic module group designing and testing silicon test vehicles for the new technology nodes. He is currently leading a group of circuit designers developing analog front-end and high-speed digital circuit blocks for high-speed SerDes PHY.



**Vassili Kireev** received the M.S. degree in quantum electronics from the Moscow Institute of Physics and Technology, Moscow, Russia, in 1987, and the Ph.D. degree in microelectronic physics and technology, with a focus on induced carrier concentration methods of semiconductor characterization, from the Institute of Problem of Microelectronic Technology and High Purity Materials, Chernogolovka, Moscow Region, Russian Federation, in 1997.

From 1987 to 1999, he was with the Institute of Microelectronic Technology, Russian Academy of Sciences, Moscow, Russian Federation. From 1999 to 2005, he was a Visiting Scholar with the University of California at Los Angeles, Los Angeles, CA, USA, and the Oak Ridge National Laboratory, Oak Ridge, TN, USA, where he was involved in dynamic fracture mechanics, semiconductor lasers, and MEMS sensors. In 2006, he joined Xilinx Inc., San Jose, CA, USA. He is currently involved in high-speed data converters for optical data communications. His current research interests include the design of SerDes analog front-end, clocking, and passive components.



**Mohamed Elzeftawi** received the B.Sc. degree in electronics and communication and the M.Sc. degree in engineering physics from Cairo University, Egypt, in 2004 and 2007, respectively, and the Ph.D. degree in electrical and computer engineering from the University of California at Santa Barbara, Santa Barbara, CA, USA, in 2012, with a focus on designing compact low-power analog circuits and IR-UWB transmitter for neural brain-recording implants.

He was involved in developing new products for 60-GHz wireless communication backhauls and LTE base stations during his 2010 and 2011 internships at the MoseleySB Research and Development Department, Santa Barbara, CA. He is currently a Staff IC Design Engineer with the SerDes Technology Group, Xilinx Inc., San Jose, CA, USA, where he is involved in developing 56-Gb/s NRZ and PAM4 transceivers in FinFET technology. He has authored several publications and a book chapter. He holds four U.S. patents and has one patent pending.



**Hiva Hedayati** (M'02) received the Ph.D. degree from Arizona State University, Tempe, AZ, USA, in 2009.

In 2009, he joined the Marvell DataCom Group, Santa Clara, CA, USA, where he was involved in analog, RF, and mixed-signal front ends for wireless and wireline communication ICs. In 2012, he joined the SerDes Technology Group, Xilinx, San Jose, CA, USA, where he took on a lead role designing high-speed analog and mixed-signal building blocks for NRZ and PAM4 transceivers. In 2016, he joined Corporation, Santa Clara, CA, where he leads the design of high-speed communication systems.

Dr. Hedayati has been a Technical Program Committee Member of the IEEE RFIC Conference.



**Toan Pham** received the B.S. degree in electrical engineering and computer sciences from the University of California at Berkeley, Berkeley, CA, USA, in 1996.

From 1996 to 2010, he was with Trident Microsystems, Sunnyvale, CA, Oki Semiconductor, Sunnyvale, LSI Logic, San Jose, CA, and AMD, Sunnyvale. He is currently a Logic Designer with the SerDes Technology Group, Xilinx, Inc., San Jose, CA, USA. His current research interests include logic design and high-speed digital circuit design.



**Santiago Asuncion** received the B.S. degree in electrical engineering and computer science from the University of California at Berkeley, Berkeley, CA, USA, in 1999.

In 1999, he joined Xilinx, San Jose, CA, USA, as a Product Application Engineer, providing support of Xilinx Products. From 2001 to 2007, he was with the IC Design Team, where he was involved in the Input/Output Block and SerDes. In 2008, he joined the SerDes Application/Validation team, where he was involved in validating and characterizing various SerDes blocks.



**Christopher Borrelli** received the B.S. degree in computer engineering from Villanova University, Villanova, PA, USA, in 1998.

In 1998, he joined Xilinx, San Jose, CA, USA, where he held various positions, including Technical Support Engineer, RTL Engineer, and SerDes Applications Engineer. He is currently the Director of SerDes System Engineering and Applications with Xilinx.



**Geoff Zhang** received the Ph.D. degree in microwave engineering and signal processing from Iowa State University, Ames, IA, USA, in 1997.

He was with HiSilicon, Shenzhen, China, Huawei Technologies, Shenzhen, LSI, Allentown, PA, USA, Agere Systems, Allentown, Lucent Technologies, Allentown, and Texas Instruments, Dallas, TX, USA. In 2013, he joined Xilinx Inc., San Jose, CA, USA. He is currently a Distinguished Engineer and a Supervisor with the SerDes Technology Group, Xilinx, San Jose, CA, USA, where he is involved in transceiver architecture and modeling. His current research interests include transceiver architecture modeling and system-level end-to-end simulation, both electrical and optical.



**Hongtao Zhang** received the B.S. degree in applied physics from Tsinghua University, Beijing, China, in 2001, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of California at San Diego, San Diego, CA, USA, in 2004 and 2006, respectively.

He was involved in SerDes characterization with Texas Instruments, Dallas, TX, USA. From 2010 to 2013, he was with the SerDes Design Team, Oracle Corporation, where he was involved in circuit design and architecture modeling. He is currently a Senior Staff Design Engineer with the SerDes Technology Group, Xilinx Inc., San Jose, CA, USA, where he is involved in SerDes architecture development and circuit design. His current research interests are SerDes architecture and modeling, high-speed mixed-signal design and optimization, and system-level simulation.



**Ken Chang** (M'99–SM'14) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1994 and 1999, respectively.

From 1999 to 2010, he was with Rambus Inc., Sunnyvale, CA, where he led several projects, including the FlexIO interface for CELL processors and 16-Gb/s and 20-Gb/s low-power memory interfaces exploring various signaling techniques. Since 2010, he has been with Xilinx Inc., San Jose, CA, USA, where he has led the SerDes Technology Group and has been involved in developing multistandard SerDes IPs for field-programmable gate arrays. He has authored or co-authored over 30 IEEE publications and holds over 20 U.S. patents in the high-speed link area. His current research interests include high-speed mixed-signal CMOS circuit design, transmitter and receiver design, CDR, equalization, PLL/DLL design, circuit noise analysis, signal integrity analysis, and mixed-signal design methodology.