

# A 40-to-56 Gb/s PAM-4 Receiver With Ten-Tap Direct Decision-Feedback Equalization in 16-nm FinFET

Jay Im, Dave Freitas, Arianne Bantug Roldan, Ronan Casey, *Member, IEEE*, Stanley Chen, *Member, IEEE*, Chuen-Huei Adam Chou, Tim Cronin, Kevin Geary, Scott McLeod, Lei Zhou, Ian Zhuang, Jaeduk Han, *Student Member, IEEE*, Sen Lin, Parag Upadhyaya, *Member, IEEE*, Geoff Zhang, Yohan Frans, and Ken Chang, *Member, IEEE*

**Abstract**—A 40–56 Gb/s PAM-4 receiver with ten-tap decision-feedback equalization (DFE) targeting chip-to-module and board-to-board cable interconnects is designed in 16-nm FinFET. The design implements direct feedback of the first post-cursor ( $h_1$ ) DFE tap to reduce the number of slicers. The  $h_1$  feedback signals are directly tapped from the master latch output of the StrongArm-based slicers. A CMOS amplifier with delayed pre-charge release is used to boost and pre-condition the  $h_1$  feedback signals before being applied to current-mode logic tap cell for optimum DFE summer settling time. The receiver achieves less than 1E-12 PRBS31 bit error rate (BER) over a channel with 10-dB loss at 14-GHz consuming 230 mW. Fully adapted by off-chip software, the receiver performance demonstrates the effectiveness of direct  $h_1$  loop and the need for higher DFE taps to achieve a required BER over channels with reflections. Receiver performance over higher loss channels up to 23 dB and/or under emulated cross-talk noise injection cases are also presented.

**Index Terms**—16-nm FinFET, analog receiver, CMOS, decision-feedback equalization (DFE), direct  $h_1$  feedback, four-level pulse amplitude modulation (PAM-4), wireline transceiver.

## I. INTRODUCTION

THE increasing bandwidth demand in data centers and telecommunication infrastructures has prompted new electrical interface standards capable of operating up to 56-Gb/s per lane. The industry has recently proposed four-level pulse amplitude modulation (PAM-4) signaling standards for such high-speed transceivers. For example, CEI-56G-VSR-PAM4 and CEI-56G-MR-PAM4 standards [1] define such PAM-4 signaling at 56-Gb/s targeting 10- or 20-dB channel losses at the Nyquist frequency (14 GHz). The

Manuscript received May 1, 2017; revised August 3, 2017; accepted August 24, 2017. Date of publication November 8, 2017; date of current version November 21, 2017. This paper was approved by Guest Editor Mounir Meghelli. (*Corresponding author: Jay Im.*)

J. Im, D. Freitas, S. Chen, C.-H. A. Chou, T. Cronin, S. McLeod, L. Zhou, I. Zhuang, P. Upadhyaya, G. Zhang, Y. Frans, and K. Chang are with Xilinx, Inc., San Jose, CA 95124 USA (e-mail: jay.im@xilinx.com).

A. B. Roldan is with Xilinx, Inc., Singapore 486040.

R. Casey and K. Geary are with Xilinx, Inc., Cork, Ireland.

J. Han and S. Lin are with the Berkeley Wireless Research Center, Berkeley, CA 94704 USA.

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2749432



Fig. 1. Example VSR channel with ~10 dB measured IL (from PCB TX pad to RX pad) at 14 GHz.

standards are applicable for chip-to-chip/module connections or board-to-board connections with cable interconnects. In PAM-4 signaling, each symbol carries 2 bits of information, effectively reducing the bandwidth requirement of the channel and the front-end data paths of the transceiver by 50% as compared to a non-return-to-zero (NRZ) signaling scheme. Since the signal is split into four levels in a PAM-4 scheme, however, the impact from residual ISI or cross-talk noise becomes more severe compared to NRZ signaling.
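The trade-off stated above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the eye-height penalty assumes equal peak-to-peak swing for both modulation formats and ideal, equally spaced levels.

```python
import math


def nyquist_ghz(bit_rate_gbps: float, bits_per_symbol: int) -> float:
    """Nyquist frequency (half the symbol rate) in GHz."""
    symbol_rate_gbaud = bit_rate_gbps / bits_per_symbol
    return symbol_rate_gbaud / 2.0


def eye_height_penalty_db(levels: int) -> float:
    """Per-eye amplitude loss versus two-level signaling at equal peak swing.

    With `levels` equally spaced levels the full swing is shared by
    (levels - 1) stacked eyes, so each eye is 1/(levels - 1) as tall.
    """
    return 20.0 * math.log10(levels - 1)


nrz_nyquist = nyquist_ghz(56.0, bits_per_symbol=1)   # NRZ at 56 Gb/s
pam4_nyquist = nyquist_ghz(56.0, bits_per_symbol=2)  # PAM-4 at 56 Gb/s
pam4_penalty = eye_height_penalty_db(4)              # three stacked eyes
```

Under these assumptions PAM-4 halves the Nyquist frequency from 28 to 14 GHz, while each of its three eyes is about 9.5 dB smaller than the single NRZ eye. This is why residual ISI and cross-talk weigh more heavily on a PAM-4 link.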

To illustrate this point, Fig. 1 shows the measured insertion loss (IL), or  $S_{21}$  as a function of frequency, for a test board trace with connectors resembling a very-short-reach (VSR) channel. The measured IL is ~10 dB at 14 GHz, from the TX printed circuit board (PCB) pads to the RX PCB pads where the ball grid array (BGA) package balls make contact. Including the package loss, the total IL becomes ~12 dB at 14 GHz from the TX controlled collapse chip connection (C4) pads to the RX C4 pads. With this VSR channel and package, the simulated single-pulse response



Fig. 2. Simulated single-pulse response at RX front-end output without DFE correction for the VSR channel (S21 with the package in Fig. 1).

through a TX finite impulse response (FIR) filter and an RX continuous-time linear equalizer (CTLE) is plotted in Fig. 2. Although the channel IL is merely  $\sim 10$  dB at 14 GHz, the single-pulse response exhibits significant reflections from impedance discontinuities, mainly from package and PCB traces. Specifically, the reflection at around the 6th symbol unit interval (UI) post-cursor (h6) is attributed to the impedance discontinuity at the package BGA ball to device socket or PCB pad interface. Another noticeable reflection at around the 21st UI post-cursor (h21) is attributed to the interface between the PCB pad and the breakout cable harness socket in the bench setup. In an NRZ signaling scheme, these reflections are benign enough to not need additional equalization to meet the required bit error rate (BER). However, they are detrimental to a PAM-4 signaling scheme and cannot be effectively equalized by an RX CTLE or by a TX feed-forward equalizer (FFE) with limited taps.

In order to effectively equalize the PAM-4 signal from such channels to meet the minimum target BER required by the standards, multi-tap FFE or decision-feedback equalization (DFE) is essential. In analog-to-digital converter (ADC)-based solutions [2], [3], the PAM-4 signal, after analog signal conditioning from the CTLE, is sampled and digitized using a high-speed ADC providing a digital representation of the sampled waveform value to a digital signal processing (DSP) block. Subsequently, the DSP performs necessary FFE or DFE functions and data/error slicing. While the ADC-based solution provides a higher level of flexibility in terms of the number of taps to cover both pre- and post-cursor ISI, the power consumption and clock and data recovery (CDR) loop latency of the ADC-based solution cannot easily match that of an analog-based solution. In order to avoid these challenges of an ADC-based receiver architecture, this paper presents the design of a PAM-4 receiver using a ten-tap direct DFE targeting VSR and medium reach (MR) channels [4], and demonstrates that the RX power consumption can be reduced by  $\sim 40\%$  or more from a recently published ADC-based 56-Gb/s PAM-4 receiver power of 370 mW (excluding DSP performing FFE equalization) [3].
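As a quick check of the power comparison, the sketch below uses the 230 mW reported in the abstract and the 370 mW quoted for the ADC-based receiver of [3] (which excludes its FFE DSP). The helper name is hypothetical, and energy per bit is simply power divided by data rate.

```python
def energy_per_bit_pj(power_mw: float, bit_rate_gbps: float) -> float:
    """Energy efficiency in pJ/bit; mW divided by Gb/s is numerically pJ/bit."""
    return power_mw / bit_rate_gbps


THIS_WORK_MW = 230.0   # receiver power reported in the abstract
ADC_BASED_MW = 370.0   # ADC-based receiver of [3], excluding FFE DSP
BIT_RATE_GBPS = 56.0

power_reduction = 1.0 - THIS_WORK_MW / ADC_BASED_MW
this_work_pj = energy_per_bit_pj(THIS_WORK_MW, BIT_RATE_GBPS)
adc_based_pj = energy_per_bit_pj(ADC_BASED_MW, BIT_RATE_GBPS)
```

The ratio works out to roughly 38%, or about 4.1 pJ/bit against 6.6 pJ/bit. This is consistent with the quoted "~40% or more", given that the ADC-based figure leaves out the DSP power.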

## II. RECEIVER DESIGN

The PAM-4 receiver chip, manufactured using 16-nm FinFET technology, is a part of a bigger structure which also contains a PAM-4 TX and a shared *LC*-phase-locked loop (PLL) with high-speed clock distribution. The on-chip *LC*-PLL provides four phase (I, Ib, Q, and Qb) 14-GHz clocks to the RX clock correction and phase interpolator (PI). The shared *LC*-PLL, TX, and high-speed clock distribution blocks are described in more detail in [2] and [3].

### A. Receiver Top Level

The top-level receiver block diagram is shown in Fig. 3. The analog front-end (AFE) consists of two stages of CTLE followed by one programmable gain amplifier (PGA) stage. The two-stage peaking amplifiers provide both high-frequency boost (for near cursor ISI cancellation) and mid-to-low frequency boost (for long-tail ISI cancellation) [5], which reduces the required DFE tap range. Each AFE stage is implemented in a current-mode logic (CML) topology with passive load resistors and bandwidth-extending peaking inductors. The gain control of the PGA stage and the peaking control of the two CTLE stages are achieved by the *RC* source degeneration technique previously reported in [5]. The typical stage bandwidth is targeted at  $\sim 24$  GHz with the peak gain frequency centered at the Nyquist frequency (14 GHz). Total AFE peaking (the gain at 14 GHz relative to the dc gain) ranges from  $\sim 0$  dB to  $\sim 16$  dB. Further details on the CTLE design considerations and implementation can be found in [2], [3], and [5]. The RX input pad network (for C4 bumps) consists of calibrated on-die termination (ODT) resistors, ESD protection diodes, and T-coil peaking inductors. From the RX input pad network, the received PAM-4 signal is sent to the CTLE input devices. The ODT resistor value is digitally calibrated using an off-chip high-precision reference resistor. The ODT calibration code is also used to calibrate the load resistors of the high-speed CML buffers in the CTLE and DFE summer stages. Using calibrated resistor loads minimizes the gain and bandwidth tolerances due to silicon process variation as well as undesirable ringing from inductive peaking.

The analog RX data signal, partially equalized by the CTLE, is sent to a half-rate ten-tap DFE block that performs mixed-signal DFE summing and PAM-4 data and error slicing. A total of ten DFE taps are implemented to provide equalization range for the typical trace length ( $\sim 10$ – $30$  mm) between the silicon C4 signal bumps and their corresponding BGA package balls. The DFE is clocked by a single-phase differential clock whose phase is controlled by a PI. The captured data and error bit streams are converted from thermometer code to binary code and are sent to the deserializer (DES). After the deserializer, the data bus and error bus are forwarded to an on-chip pseudo-random binary sequence (PRBS) checker and storage buffer.

An off-chip controller, implemented in field-programmable gate array (FPGA) firmware (marked by the gray box in Fig. 3), periodically fetches the captured



Fig. 3. Receiver block diagram.



Fig. 4. PAM-4 slicer naming convention and data/error decoding.

PAM-4 data and error samples from the on-die storage buffer to perform error calculations for various calibration and adaptation loops. A baud-rate CDR loop, formed with off-chip logic, controls the output clock phase of the on-chip PI to sample data and error at the optimal sampling point. A PAM-4-level adaptation scheme adjusts the thresholds of the data and error slicers based on the voltage levels of the slicer input. The front-end offset correction code is adapted by monitoring the time average of the settled error slicer DAC codes. A standard minimum mean square error algorithm is used for adapting DFE tap coefficients and CTLE peaking parameters. The PGA gain setting is adapted based on a set target cursor height which is limited by the receiver front-end linearity through the automatic gain control (AGC) loop.
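The DFE tap adaptation described above can be illustrated with a small behavioral model. The paper states only that a standard minimum mean square error algorithm is run by the off-chip software. The sketch below substitutes sign-sign LMS, a common stochastic-gradient approximation that needs only the sign of the error sample, and should not be read as the authors' implementation. The channel post-cursors, step size, noise level, and all function names are hypothetical.

```python
import random

LEVELS = (-3, -1, 1, 3)  # PAM-4 symbols with the unit level spacing normalized


def slice_pam4(v):
    """Three-threshold PAM-4 decision (thresholds at -2, 0, +2)."""
    if v > 2:
        return 3
    if v > 0:
        return 1
    if v > -2:
        return -1
    return -3


def sign(x):
    return (x > 0) - (x < 0)


def adapt_dfe(post_cursors, n_symbols=20000, mu=1e-3, noise_sigma=0.02, seed=1):
    """Adapt DFE taps with sign-sign LMS against a known post-cursor channel."""
    rng = random.Random(seed)
    n_taps = len(post_cursors)
    taps = [0.0] * n_taps
    sent = [0] * n_taps      # transmitted symbols a[k-1], a[k-2], ...
    decided = [0] * n_taps   # receiver decisions d[k-1], d[k-2], ...
    for _ in range(n_symbols):
        a = rng.choice(LEVELS)
        # Received sample: unit cursor plus post-cursor ISI plus Gaussian noise.
        y = a + sum(h * s for h, s in zip(post_cursors, sent))
        y += rng.gauss(0.0, noise_sigma)
        # Decision feedback: subtract the ISI estimate built from past decisions.
        z = y - sum(w * d for w, d in zip(taps, decided))
        d = slice_pam4(z)
        e = z - d  # residual error relative to the decided level
        # Sign-sign LMS: step each tap along sign(error) * sign(past decision).
        taps = [w + mu * sign(e) * sign(dp) for w, dp in zip(taps, decided)]
        sent = [a] + sent[:-1]
        decided = [d] + decided[:-1]
    return taps


channel = [0.15, -0.06, 0.03]   # hypothetical h1..h3, relative to the cursor
adapted = adapt_dfe(channel)
```

In this model the adapted taps settle close to the channel post-cursors and then dither around them by an amount set by the step size `mu`. In the actual receiver the same kind of update would operate on the data and error samples fetched from the on-die storage buffer.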

### B. Decision-Feedback Equalization

Fig. 4(a) illustrates the PAM-4 signal and the naming convention of the corresponding data and error slicers used in this paper. For every sampling UI, three data slicers (DH, DZ, and DL) produce 3-bit thermometer-coded data corresponding to one of four PAM-4 symbols. The thermometer-coded data is later translated to two binary digits. For every sampling UI, four error slicers (EHP, ELP, EHN, and ELN) produce a 4-bit thermometer-coded error. Based on the detected current symbol value, only one of the error outputs is selected as the valid error bit. The translation of data and error slicer outputs is tabulated for clarity in Fig. 4(b). The half-rate DFE block (Fig. 5) shows the split data paths, even and odd. To meet the stringent DFE feedback timing, each half consists of two cascaded CML-based current-summing stages (SUM1 for DFE taps h3 through h10 and SUM2 for DFE taps h1 and h2) [6].
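The data and error decoding can be sketched as a small lookup. Fig. 4(b) is not reproduced in the text, so the mapping below is an assumption: each slicer outputs 1 when the input exceeds its threshold, symbols are numbered 0 (lowest) to 3 (highest) and emitted in plain binary, and the error slicers are paired with the levels in the order EHN, ELN, ELP, EHP from lowest to highest. The actual table in the paper may differ (for example, by using Gray coding).

```python
# Valid thermometer codes, keyed as (DL, DZ, DH), mapped to a symbol index.
VALID_THERMOMETER = {
    (0, 0, 0): 0,  # below all three thresholds
    (1, 0, 0): 1,
    (1, 1, 0): 2,
    (1, 1, 1): 3,  # above all three thresholds
}


def decode_pam4(dl, dz, dh, ehn, eln, elp, ehp):
    """Return (msb, lsb, error) for one symbol UI.

    Assumed pairing: the error slicer whose threshold sits at the detected
    symbol level supplies the valid error bit; the other three are ignored.
    """
    try:
        symbol = VALID_THERMOMETER[(dl, dz, dh)]
    except KeyError:
        raise ValueError("invalid thermometer code (bubble): %r" % ((dl, dz, dh),))
    error = (ehn, eln, elp, ehp)[symbol]
    return (symbol >> 1) & 1, symbol & 1, error


# Input between the DZ and DH thresholds, with ELP reporting "above level".
example = decode_pam4(dl=1, dz=1, dh=0, ehn=0, eln=0, elp=1, ehp=0)
```

The seven raw slicer bits thus collapse to three bits per symbol (two data bits and one error bit), which is the combined code forwarded to the deserializer.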

To further reduce the summing node loading, critical in reducing power consumption and design complexity, direct feedback of h1 is implemented, as opposed to the widely used speculative scheme. Using this scheme, each even/odd half of SUM2 drives seven slicers: three data slicers (DH, DZ, and DL) and four error slicers (ELN, EHN, ELP, and EHP). In contrast, conventional speculative h1 for PAM-4



Fig. 5. DFE block diagram.



Fig. 6. (a) Direct versus (b) speculative h1 DFE in PAM-4 (full-rate case shown for clarity; assuming StrongArm sense-amp followed by keeper latch structure).

signaling would have required 12 data slicers because the previous symbol has four possible outcomes. As an illustration, Fig. 6 depicts the h1 DFE loop (in the full-rate case for simplicity) using (a) direct h1 DFE and (b) speculative h1 DFE. The speculative DFE scheme requires four times more data slicers ( $4 \times$  for each of DH, DZ, and DL) than the direct feedback scheme, corresponding to the four possible PAM-4 levels of the previous symbol. With speculative h1, the summer buffer (Gm2) has to drive  $4 \times$  more data slicers while maintaining linearity at a larger signal swing because the h1 ISI is not canceled at the summer output node. Maintaining the bandwidth and meeting the more stringent linearity requirement of the summer necessitate extra power consumption. Also, the high-speed sampling clock loading increases four times with the additional slicers, further increasing clock power consumption. Another challenge in speculative h1, specifically for PAM-4, is that the select signal of the unroll MUXes needs to be decoded from thermometer-coded raw PAM-4 data. Having such decoding logic drive unroll MUXes with large loading in the critical 1UI timing path further diminishes the benefit of loop unrolling. Unlike data slicers, error slicers can be time multiplexed in the speculative h1 scheme, obviating the need to increase the slicer count from 4 to 16 per summer branch, but only at the cost of degraded CDR and adaptation performance. Based on these considerations, h1 is implemented with direct feedback in this paper to simplify the design while minimizing power consumption.
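The slicer counts quoted above follow from simple bookkeeping, sketched here with hypothetical helper names. The only inputs are the four PAM-4 levels, the three data thresholds, and the four error slicers named in the text.

```python
PAM4_LEVELS = 4
DATA_THRESHOLDS = PAM4_LEVELS - 1   # DH, DZ, DL
ERROR_SLICERS = 4                   # EHP, ELP, EHN, ELN


def slicers_per_branch(speculative, time_multiplexed_error=True):
    """Return (data, error) slicer counts for one summer branch.

    A speculative (loop-unrolled) h1 needs one slicer copy per possible
    previous symbol; direct feedback needs a single copy.
    """
    copies = PAM4_LEVELS if speculative else 1
    data = DATA_THRESHOLDS * copies
    if speculative and not time_multiplexed_error:
        error = ERROR_SLICERS * copies
    else:
        error = ERROR_SLICERS
    return data, error


direct = slicers_per_branch(speculative=False)
unrolled = slicers_per_branch(speculative=True)
unrolled_full = slicers_per_branch(speculative=True, time_multiplexed_error=False)
```

Direct feedback gives 3 + 4 = 7 slicers per branch. Speculation raises the data slicers to 12, and the error slicers to 16 unless they are time multiplexed.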

Data for taps h1, h2, h3, and h4 use thermometer encoding (raw data slicer output from DH, DZ, and DL) in order to minimize the latency of the high-speed DFE feedback signals. Data for taps h5 through h10 are converted to binary-weighted form to reduce the number of latches and clock loading, reducing power consumption. The location of the data format transition in the DFE FIR latch chain is selected based on two competing design considerations: 1) the timing closure of the DFE loops must be achieved and 2) the earlier the thermometer-to-binary conversion takes place, the fewer parallel latches are needed, which reduces power consumption and silicon area. In the main data path from the slicer array to the deserializer, the 3-bit thermometer-coded PAM-4 data samples and the 4-bit error samples are converted into a combined 3-bit (2-bit data and 1-bit error, per PAM-4 symbol UI) binary code [Fig. 4(b)], where only the valid one out of four error samples is chosen based on the value of data samples, prior to deserialization. Decoding data and error bits prior to deserialization reduces the number of data paths, saving power.

### C. Data Slicer and Direct h1 Loop

Meeting the direct h1 feedback timing to the SUM2 output node within 1UI poses serious challenges to circuit implementation. Fig. 7 illustrates the direct h1 loop in a full-rate structure (assuming a slicer implemented by a StrongArm sense-amp latch followed by a keeper latch), where the 1UI = 35.7 ps budget is split three ways. These three elements of the timing budget are the slicer Tcq, the propagation delay from the change in digitized h1 value to the change in tap differential current in the summer, and finally the settling time of the summer output node in response to the tap current change. At such high speed, a clear partitioning of the three timing arcs is not meaningful, and the sub-block circuits need to be co-optimized at a high level, with all three components in the loop (data slicer, DFE tap cell, and summer) carefully optimized together.

Requirements for PAM-4 data slicer are very stringent, needing low latency to satisfy direct h1 timing and high resolution in both temporal and voltage domains over large



$T_{cq} + T_{h1\_delay} + T_{sum\_settling} < 1\ \text{UI} = 35.7\ \text{ps}$ (at 56 Gb/s)

Fig. 7. h1 direct DFE timing budget.

signal and slice levels. For example, a typical unit symbol cursor height of 100 mV (in diff-z2p) implies that the slicer sensitivity needs to be much smaller than  $\sim 10$  mV to secure error-free decision in the presence of residual ISI and noise. Slice level linearity range must be as large as 300 mV for all possible PAM-4 symbols to be captured with reasonable margin. The sampling aperture, defined by the time window within which the slicer input data signal needs to be stable enough not to corrupt the slicer output, must be much smaller than 1UI for error-free decision, while maintaining the slicer Tcq requirement for the h1 DFE loop to be closed within 1UI.
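The h1 timing budget of Fig. 7 can be written as a short check. The unit interval follows directly from the data rate and the two bits carried per PAM-4 symbol. The three-way split passed to the margin function below is purely hypothetical, since the paper stresses that the three arcs are co-optimized rather than budgeted separately.

```python
def unit_interval_ps(bit_rate_gbps, bits_per_symbol=2):
    """Symbol period in picoseconds for the given bit rate."""
    return 1000.0 * bits_per_symbol / bit_rate_gbps


def h1_loop_margin_ps(t_cq_ps, t_h1_delay_ps, t_sum_settling_ps, bit_rate_gbps=56.0):
    """Slack left in T_cq + T_h1_delay + T_sum_settling < 1 UI (negative = violated)."""
    total = t_cq_ps + t_h1_delay_ps + t_sum_settling_ps
    return unit_interval_ps(bit_rate_gbps) - total


ui_56 = unit_interval_ps(56.0)   # upper end of the 40-to-56 Gb/s range
ui_40 = unit_interval_ps(40.0)   # lower end of the range

# Hypothetical split of the loop delay, for illustration only.
margin_56 = h1_loop_margin_ps(t_cq_ps=12.0, t_h1_delay_ps=6.0, t_sum_settling_ps=15.0)
```

At 56 Gb/s the unit interval is about 35.7 ps, so the assumed 33 ps of loop delay would leave under 3 ps of slack. At 40 Gb/s the same loop has a 50 ps interval to work with.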

A StrongArm sense amplifier [7] with a speed-optimized h1 path was designed (Fig. 8) to satisfy these requirements. The data slicer sends the master latch output (QP, QN) directly to the h1 DFE tap of SUM2 via a CMOS inverter amplifier marked in the dashed box (h1 amp) with delayed pre-charge release. The h2 output comes from a conventional keeper latch, which stores captured data during the reset phase. The keeper latch is implemented in the standard clocked differential CMOS latch topology, thereby requiring a local clock inversion as illustrated in Fig. 8. On the falling edge of the clock, both polarities of the h1 outputs are pre-charged to supply, injecting only common-mode tap current into the SUM2 output (Fig. 9). This helps maintain the tail-node voltage of the h1 CML tap cell and speeds up h1 settling. On the rising edge of the sampling clock, the h1 pseudo-differential outputs start to develop and eventually settle to complementary rail-to-rail signals that steer the h1 tap current into the SUM2 output. The h1 CML tap cell consists of three parallel unit tap cells of equal weight, driven by the three slicer outputs from DH, DZ, and DL, respectively. The common-mode levels of the slicer data input from the DFE summer and the slicer threshold digital-to-analog converter (DAC) output need to be well matched for the linear operation of the slicer. This requirement is achieved by employing a DFE summer replica in the DAC.

The h1 amp pre-charge release is delayed relative to the sampling edge by two inverters in order to keep the tail node of the SUM2 h1 CML tap from dipping during the sense-amp regeneration phase, which can cause over-correction and thus longer SUM2 settling. Fig. 10 illustrates simulated



Fig. 8. StrongArm data slicer with h1 amplifier.



Fig. 9. h1 loop (only even to odd branch path shown for clarity). Slicer h1 amplifier resets to high for faster h1 summer settling time.

waveforms with and without such delay. Without the delay, both of the pseudo-differential h1 amp outputs momentarily dip together below the supply level (the pre-charged value) before they split into the intended differential values [Fig. 10(c)]. This common-mode dip causes the tail node of the h1 CML tap [Fig. 10(b)] to dip as well, by as much as 200 mV, and



Fig. 10. Delayed pre-charge release of slicer h1 amp. (a) h1 tap current, (b) h1 CML tap tail node, (c) h1 amp output, and (d) slicer sampling clock edges for current (solid line) and the next (dashed line) symbols. Y-axes are in the unit of volts, except (a) which is in mA.



Fig. 11. Half-rate direct h1 DFE timing diagram.

subsequent differential inputs would cause a transient surge in the over-drive ( $V_{od}$ ) of the tap switching device. As a result, the h1 differential currents injected into the summing node [Fig. 10(a)] would overshoot by as much as 75% before eventually settling to the target value, but not before the next UI sampling aperture opens. The overshoot is by nature an uncontrolled high-speed disturbance and can be sensitive to process-voltage-temperature (PVT) conditions. With a properly matched delay of the pre-charge release, the h1 tap current is much better controlled (albeit with slightly more latency than in the case without delay), resulting in faster h1 settling overall. The CMOS clock delay tracks the StrongArm sense-amp well, so the h1 settling behavior is well controlled over PVT variation. Note that the h1 amp experiences a slightly longer crowbar current period due to the delayed pre-charge release. However, the increase in power consumption due to this extra crowbar current is insignificant.

Overall timing diagram of half-rate direct h1 DFE scheme is depicted in Fig. 11 with a simplified block diagram. In the



Fig. 12. Simulated slicer input waveforms (a) before and (b) after  $h_1 = 55$  mV and  $h_2 = 20$ -mV corrections are turned on. PRBS7 pattern generated by a Verilog-A module with  $h_1$  and  $h_2$  ISI injection. All vertical axes are in the unit of volts.

timing diagram,  $D_k$  represents the analog value of the  $k$ th PAM-4 symbol before  $h_1$  ISI cancellation, the solid blue arrows represent slicing clock edges, the dashed green arrows represent digitizing action ( $d_k$  represents the sampled PAM-4 digital value of  $D_k - h_1 \cdot d_{k-1}$ ), and the dotted-dashed red arrows represent direct  $h_1$  feedback action. Fig. 12 plots simulated waveforms of the slicer clocks (Clk and Clk<sub>b</sub>) and the even and odd summer outputs (a) before and (b) after  $h_1$  and  $h_2$  corrections are turned on. The summer is driven by a PRBS7 pattern generator with artificially injected  $h_1$  and  $h_2$  ISI using a Verilog-A module. The simulation is performed by a fast circuit simulator using the schematic-based design without wire load resistances taken into account. The slicer kickback is seen in the summer outputs a few picoseconds after the capturing clock rising edge. The impact of the slicer kickback would be seen primarily as an apparent  $h_0$  cursor amplitude reduction and an increased compression to a higher order, which the individually adapted slicer levels will track and which the AGC loop will correct by boosting the PGA gain.
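The relation  $d_k = \text{slice}(D_k - h_1 \cdot d_{k-1})$  and the even/odd hand-off of the half-rate loop can be sketched behaviorally. This is a noise-free toy model with hypothetical names and an exaggerated  $h_1$ , intended only to show that each branch uses the decision most recently produced by the other branch. It does not model circuit timing.

```python
import random

LEVELS = (-3, -1, 1, 3)


def slice_pam4(v):
    """Three-threshold PAM-4 decision (thresholds at -2, 0, +2)."""
    if v > 2:
        return 3
    if v > 0:
        return 1
    if v > -2:
        return -1
    return -3


def half_rate_direct_dfe(samples, h1):
    """Alternate even/odd branches; each subtracts h1 times the other's last decision."""
    last = {"even": 0, "odd": 0}   # most recent decision held by each branch
    decisions = []
    for k, D in enumerate(samples):
        branch, other = ("even", "odd") if k % 2 == 0 else ("odd", "even")
        d = slice_pam4(D - h1 * last[other])   # d_k = slice(D_k - h1 * d_{k-1})
        last[branch] = d
        decisions.append(d)
    return decisions


rng = random.Random(7)
H1 = 0.4                                        # hypothetical, relative to unit level
symbols = [rng.choice(LEVELS) for _ in range(2000)]
# D_k: cursor plus first post-cursor ISI from the previous symbol.
received = [a + H1 * (symbols[k - 1] if k else 0) for k, a in enumerate(symbols)]

errors_without = sum(slice_pam4(D) != a for D, a in zip(received, symbols))
errors_with = sum(d != a for d, a in zip(half_rate_direct_dfe(received, H1), symbols))
```

With  $h_1$  this large the uncorrected slicer misreads many symbols, whereas the direct-feedback loop recovers all of them in this idealized setting.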

The four error slicers are not required to meet such critical timing and are therefore implemented using a conventional master-slave StrongArm sense-amp without an  $h_1$  amp, with smaller device sizes than the data slicers to save power. The sampling aperture timing of the data and error slicers is characterized and matched (Fig. 13) in order to avoid a systematic offset of the sampling clock phase away from the optimal CDR lock position. The simulated sampling aperture timing of the data and error slicers is matched to well within 1 ps, with an aperture window of  $\sim 3.5$  ps at half peak height, across PVT conditions.

### D. DFE Summer

Each of the two DFE summer stages consists of an NMOS CML main amplifier and DFE tap cells sharing a calibrated load resistor ( $R_L$ ), as shown in Fig. 14. The output net is heavily loaded ( $\sim 100$  fF wire loading), requiring inductive peaking to maintain both a forward path bandwidth greater than 24 GHz and a DFE feedback path settling time less than  $\sim 20$  ps. The forward gain of each of the two summer buffers is set at 3 dB per stage at nominal condition in order to relax the linearity requirement of the front-end stages. The device noise



Fig. 13. Simulated sensitivity curves for the data (solid lines) and the error slicers (circles) for (a) typical, (b) fast, and (c) slow PVT conditions, respectively, showing <1 ps sampling aperture misalignments across PVT. The X-axis is the delay between the zero crossings of the slicer data input step and the corresponding sampling clock edge. The Y-axes are normalized sensitivity.

contribution from the summer stage at the slicer output is not significant compared to the CTLE stages.

Another important design parameter is the common mode of the summer output. Too low a common mode reduces forward gain and adversely affects summer linearity, which is of critical importance in DFE performance, especially for PAM-4 signaling with its 3× larger linearity range requirement over NRZ. On the other hand, too high a common mode will degrade the StrongArm slicer performance because the effective Gm of the slicer input device is reduced. The lower Gm of the slicer increases input-referred noise and slows down T<sub>cq</sub>. The summer forward path Gm device is biased with a constant current density scheme, which intrinsically maintains the output common mode thanks to the calibrated load resistor ( $R_L$ ). The only systematic common-mode variation comes from the total DFE tap current, which can vary from channel to channel depending on the amount of ISI corrected by the DFE taps. In this paper, an open-loop compensation scheme is used to tightly control the summer output common mode, as illustrated in Fig. 14. In this scheme, the replicas of the h1 and h2 tap currents are summed together and mirrored to be injected into the node between  $R_L$  and the peaking inductor. The mirror ratio takes into account the effect of the finite dc resistance of the peaking inductor ( $R_{ind}$ ). The current sources providing the DFE tap compensation current are powered by 1.8 V so that thin-oxide cascode devices can be inserted with good head room while minimizing the summer bandwidth degradation. The same open-loop common-mode compensation scheme is also used in the SUM1 stage. Overall, the residual summer common-mode variation contributed by the supply variation and random device mismatch is estimated to be well within the range of the required input common mode of the slicers.

Due to the multi-level signaling nature of the PAM-4 scheme, the linearity requirement of the AFE stages, comprising the CTLE and DFE summers, is more stringent than for NRZ. Improving the linearity of an AFE stage without reducing the available signal range inevitably involves gain reduction and/or more power consumption. For a given power budget allocated to the AFE, however, placing as much of the large-signal-range amplification as possible after ISI equalization makes more efficient use of the available linearity range. In other words, ISI cancellation is best achieved with a smaller signal range as long as the resulting SNR degradation does not become the determining factor of the minimum achievable BER. This is another reason for implementing the DFE summer in two stages (SUM1 and SUM2). Any residual AFE non-linearity, plus TX non-linearity if present, would show up as non-uniform adapted slicer level spacing or unequal eye openings in the eye diagram, reducing PAM-4 BER margin.

## III. EXPERIMENTAL RESULTS

Receiver wafer dice are assembled in BGA packages, following the standard mass production very large scale integration packaging flow. Bench testing of the manufactured PAM-4 receiver is performed over VSR and MR channels (Fig. 15) for BER bathtub sweeps and eye-scan plotting. Cross-talk noise performance is also tested for both channels. The test board with DUT socket and breakout cable connectors is controlled by another general-purpose FPGA board that fetches data and status bits from the DUT and sends commands back to it. Control software that performs PAM-4 RX calibration and closed-loop corrections is programmed onto the FPGA. An external reference clock generator drives the on-die LC-PLL in the DUT, which provides clocks to the DUT (receiver) as well as the PAM-4 transmitter on the same silicon die. The PAM-4 TX generates a 56-Gb/s PRBS31 signal with programmable amplitude and a three-tap TX FIR setting. For the tested VSR and MR channels, only the pre-cursor TX FIR tap is used.

Fig. 16 reports the receiver performance measured over the example VSR channel described in Figs. 1 and 15(a), comprising a 6" PCB trace (embedded in Rogers RO4003C dielectric material), four connectors, two 24" cables, and a 50-GHz dc blocking cap. The total measured S<sub>21</sub> is -9.6 dB, BGA-to-BGA, and -12.3 dB at 14 GHz including TX/RX package traces (~25 mm total length embedded in an organic laminate substrate). For VSR testing, the TX sends a 250-mV-dpp 56-Gb/s PRBS31 signal with 3.38-dB pre-cursor pre-emphasis. The measured PAM-4 eye in Fig. 16(a) shows three distinct eyes around the CDR sampling clock phase (X = 0UI), vertically centered around the three data slicer threshold levels (DH, DZ, and DL). Since no dedicated eye-scan slicers are present, the eye opening is measured by sweeping the sampling clock phase of the data and error slicers together, while independently sweeping the error slicer thresholds for the upper, middle, and lower eyes, in sequence. The asymmetry in upper and



Fig. 14. DFE summer stage (one branch of SUM2 shown; SUM1 is similar).



Fig. 15. Example (a) VSR and (b) MR channels and corresponding IL curves measured from TX BGA pads to RX BGA pads.

lower eyes is attributed to various non-idealities in channel and data path circuits, including slicer sensitivity mismatch and RX input signal imbalance.

The PAM-4 BER timing and voltage bathtub plots are taken using an on-chip BER counter for three different cases: no DFE, h1 DFE only, and h1-h10 DFE. The timing bathtub plot in Fig. 16(c) shows  $<1\text{E-}12$  of BER at optimal position and  $\sim 0.2\text{UI}$  margin at  $1\text{E-}6$  BER required by CEI-56G-VSR-PAM4 standard [1]. The  $1\text{E-}6$  raw BER level is acceptable in the standard because the standard allows the use of a forward error correcting (FEC) block code, such as the Reed-Solomon code, that brings down the link BER to an acceptable level (e.g.,  $<1\text{E-}18$ ) [8].
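To see how a raw BER of 1E-6 can be brought to a much lower level by FEC, the sketch below estimates the probability that a Reed-Solomon codeword contains more symbol errors than the code can correct. The text names only "the Reed-Solomon code". The RS(544,514) parameters with 10-bit symbols are an assumption chosen for illustration, and the model further assumes independent random bit errors, which bursty DFE error propagation would violate.

```python
import math


def log_binom_pmf(n, i, p):
    """Natural log of the binomial probability of exactly i successes in n trials."""
    log_comb = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return log_comb + i * math.log(p) + (n - i) * math.log1p(-p)


def uncorrectable_codeword_prob(raw_ber, n=544, k=514, bits_per_symbol=10):
    """Probability that more than t = (n - k) // 2 symbols of a codeword are in error.

    Assumes independent bit errors; n, k, and bits_per_symbol default to an
    illustrative RS(544,514) code that is not specified in the paper.
    """
    t = (n - k) // 2
    # A symbol is wrong if any of its bits is wrong: 1 - (1 - BER)^bits.
    p_sym = -math.expm1(bits_per_symbol * math.log1p(-raw_ber))
    return sum(math.exp(log_binom_pmf(n, i, p_sym)) for i in range(t + 1, n + 1))


p_fail_1e6 = uncorrectable_codeword_prob(1e-6)
p_fail_1e4 = uncorrectable_codeword_prob(1e-4)
```

Under these idealized assumptions the failure probability at a raw BER of 1E-6 lies far below the 1E-18 level mentioned above. The result is only an indication of why such a raw BER is acceptable, not a statement about any particular standard's FEC.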

The voltage BER bathtub plot in Fig. 16(b) is taken by varying the DH slicer threshold from its nominal value while keeping DZ and DL fixed and measuring the PAM-4 BER; it shows  $<1\text{E-}12$  BER at the optimal slicer threshold and  $\pm 30\text{ mV}$  margin at  $1\text{E-}6$  BER. Both timing and voltage bathtub plots show that the h1 direct feedback effectively suppresses the 1UI post-cursor shown in Fig. 2. Even after h1 correction, un-cancelled higher tap ISI would significantly increase the PAM-4 BER at the sampling point, necessitating the use of higher taps.

The three sets of eye scans and BER bathtubs in Fig. 17 are taken by turning on ISI cancellation with (a) CTLE only, (b) CTLE and h1, and (c) CTLE and all ten DFE taps. The progressively increasing eye opening and lower BER graphically demonstrate the effectiveness of direct h1 DFE and the need for higher taps, even for the VSR channel. From Fig. 17(a), where only the CTLE is used, to Fig. 17(b), where the h1 loop is also turned on, more than three orders of magnitude of BER improvement is achieved, proving the effectiveness of direct h1 DFE. It is only after the higher DFE taps are turned on, as shown in Fig. 17(c), that  $<1\text{E-}12$  BER is achieved. The adapted DFE coefficients are:  $h1 = 13.16\%$ ,  $h2 = 0.00\%$ ,  $h3 = -0.33\%$ ,  $h4 = -0.33\%$ ,  $h5 = -0.98\%$ ,  $h6 = 0.65\%$ ,  $h7 = -1.96\%$ ,  $h8 = -0.65\%$ ,  $h9 = -0.65\%$ , and  $h10 = -0.98\%$ , all relative to  $h0$ . Fig. 18 compares the adapted DFE coefficients (in squares) with the post-cursor ISI from the simulated single-bit response (in circles) in Fig. 2. The predicted reflection at around  $h6$  and  $h7$  is well correlated with the



Fig. 16. (a) 56-Gb/s PRBS31 PAM-4 eye-diagram and corresponding (b) voltage and (c) timing BER bathtub curves taken on the VSR channel in Fig. 15(a). (b) and (c) BER curves show three cases: no DFE in triangles; h1 only case in squares; and all ten-DFE taps in circles.



Fig. 17. 56-Gb/s PRBS31 PAM-4 eye-diagram and timing BER bathtub curves taken on the VSR channel in Fig. 15(a). (a) All DFE taps are turned off. (b) Only h1 is turned on. (c) All ten-DFE taps are turned on.

adapted coefficients. The adapted h2, h3, and h6 coefficients turned out to be smaller than the simulated values, which is attributed to inaccuracies in the channel modeling and AFE characteristics.
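The adapted coefficients reported for the VSR channel give a rough sense of how much ISI the higher taps remove. The sketch below sums the magnitudes of h2 through h10. The conversion to a fraction of the eye is an assumption: it takes  $h0$  to be the outer-level cursor height, so that each PAM-4 eye has a half-height of  $h0/3$  and the worst case occurs when every earlier symbol sits at an outer level. The paper does not state the normalization in this detail.

```python
# Adapted DFE coefficients for the VSR channel, in percent of h0 (from the text).
ADAPTED_PCT = {
    1: 13.16, 2: 0.00, 3: -0.33, 4: -0.33, 5: -0.98,
    6: 0.65, 7: -1.96, 8: -0.65, 9: -0.65, 10: -0.98,
}


def peak_isi_pct(coeffs_pct, first_tap, last_tap):
    """Sum of tap magnitudes (peak-distortion bound) over an inclusive tap range."""
    return sum(abs(v) for k, v in coeffs_pct.items() if first_tap <= k <= last_tap)


higher_taps_pct = peak_isi_pct(ADAPTED_PCT, 2, 10)
h1_pct = peak_isi_pct(ADAPTED_PCT, 1, 1)

# Assumed normalization: half of one PAM-4 eye is h0 / 3, i.e. 33.3% of h0.
HALF_EYE_PCT = 100.0 / 3.0
higher_taps_eye_fraction = higher_taps_pct / HALF_EYE_PCT
```

The magnitudes of h2 through h10 add up to about 6.5% of  $h0$ , roughly half of h1 alone. Under the stated normalization this corresponds to about a fifth of one eye's half-height in the worst case, which is consistent with the higher taps being needed to reach the lowest BER.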

For the MR channel with 23-dB loss shown in Fig. 15(b) (3-m 26AWG cable), BER performance degrades from that of the VSR channel, as shown in Fig. 19(b). At 1E-6 BER level required by CEI-56G-MR-PAM4 standard, UI opening



Fig. 18. Adapted DFE coefficients (in squares) in comparison with the simulated post-cursor ISI (in circles) from the single-bit response in Fig. 2. VSR 56-Gb/s PRBS31.



Fig. 19. 56-Gb/s PRBS31 PAM-4 (a) eye-diagram and (b) timing BER bathtub curves taken on the MR channel in Fig. 15(b).



Fig. 20. 56-Gb/s PRBS31 PAM-4 cross-talk performance for (a) VSR and (b) MR channels, respectively.

is  $\sim 0.1\text{UI}$  without cross-talk injection. The adapted DFE correction level for the 1UI post-cursor ( $h_1$ ) amounts to about 60% of the equalized main cursor height ( $h_0$ ), yet no sign of DFE error propagation was observed.
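The UI opening at a given BER level can be read from a timing bathtub by interpolation. The following is a minimal sketch of that step, assuming the bathtub is available as sampled (phase, BER) pairs; the function name `opening_at_ber` and the sample values are hypothetical and are not measured data from this receiver.

```python
import math

def opening_at_ber(phases_ui, bers, target):
    """Width in UI of the region around the best phase where BER <= target.

    Edges are found by linear interpolation of log10(BER) between samples.
    Returns 0.0 if the bathtub never reaches the target BER.
    """
    logs = [math.log10(max(b, 1e-30)) for b in bers]
    t = math.log10(target)
    best = min(range(len(logs)), key=logs.__getitem__)
    if logs[best] > t:
        return 0.0

    def edge(step):
        i = best
        # Walk outward while the next sample still meets the target.
        while 0 <= i + step < len(logs) and logs[i + step] <= t:
            i += step
        j = i + step
        if not 0 <= j < len(logs):
            return phases_ui[i]  # still open at the scan limit
        frac = (t - logs[i]) / (logs[j] - logs[i])
        return phases_ui[i] + frac * (phases_ui[j] - phases_ui[i])

    return edge(+1) - edge(-1)

# Synthetic, symmetric bathtub used only to exercise the function.
phases = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
bers = [1e-2, 1e-4, 1e-8, 1e-10, 1e-8, 1e-4, 1e-2]
width = opening_at_ber(phases, bers, 1e-6)
print(f"opening at 1E-6 BER: {width:.3f} UI")
```

Interpolating in log10(BER) rather than in BER follows the way bathtub curves are plotted, where the walls are close to straight lines on a logarithmic axis.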

The cross-talk performance of ultra-high-speed wireline transceivers, such as the 56-Gb/s PAM-4 receiver reported in this paper, has become very important due to stronger aggressor-to-victim coupling from sharper transition edges and reduced



Fig. 21. (a) PAM-4 eye scan and (b) cursor histogram (56-Gb/s PRBS31 30-dB channel) with adapted slicer DAC codes in arrows.

|                | This work                              | Shibasaki et al. ([9])          | Han et al. ([10])                               | Frans et al. ([2], [3])         |
|----------------|----------------------------------------|---------------------------------|-------------------------------------------------|---------------------------------|
| Technology     | CMOS 16nm FinFET                       | 28nm CMOS                       | 65nm CMOS                                       | CMOS 16nm FinFET                |
| Data Rate      | 40-56Gb/s                              | 56.2Gb/s                        | 60Gb/s                                          | 56Gb/s                          |
| Modulation     | PAM-4                                  | NRZ                             | NRZ                                             | PAM-4                           |
| Equalization   | CTLE<br>10-Tap DFE<br>(h1,...,h10)     | CTLE<br>1-Tap DFE               | CTLE<br>2-Tap FFE<br>3-Tap DFE                  | CTLE<br>ADC based DFE &<br>FFE  |
| Area           | 0.364mm <sup>2</sup>                   | 1.4mm <sup>2</sup><br>(RX & TX) | 0.16mm <sup>2</sup>                             | 2.8mm <sup>2</sup><br>(2 TX/RX) |
| Power          | 230mW @56Gb/s                          | 142mW<br>@56Gb/s (RX)           | 173mW                                           | 370mW<br>(RX excl. DSP)         |
| Power Supplies | 0.9V/1.2V/1.8V<br>(digital/analog/aux) | 0.96V                           | 1.2V/1.0V                                       | 0.9V/1.2V/1.8V                  |
| Channel        | 10dB                                   | 18.4dB                          | 1.54<br>(V <sub>IS</sub> /V <sub>CURSOR</sub> ) | 25dB                            |
| BER            | < 1E-12 (PRBS31)                       | < 1E-12                         | < 1E-12                                         | ~1E-8                           |

Fig. 22. Performance summary and comparison to prior works.

SNR margin in PAM-4 signaling. Silicon characterization of the cross-talk performance of such links provides valuable information to guide system-level link design and component specification. Fig. 20 plots BER timing bathtubs measured while the injected noise is swept from 0 to 3 mV-rms in 1-mV steps for the (a) VSR and (b) MR channels. A broadband noise generator box injects power-calibrated white noise



Fig. 23. Receiver power break-down and supply domains.

(with 16-GHz bandwidth) using a pair of directional couplers into the RX cables of the DUT board. The total integrated noise power (in mV-rms) applied at the RX input is calculated from the measured noise generator power and the measured IL between the noise injection point and the DUT's RX BGA balls. The VSR channel performance in Fig. 20(a), as measured by the UI opening at 1E-6 BER, degrades very little up to 2 mV and stays within 0.1UI up to 3 mV. The MR performance shown in Fig. 20(b) degrades more rapidly than the VSR case as the noise power increases, and the UI opening at 1E-6 BER closes beyond 2-mV-rms noise. This is the expected result because of the relatively larger SNR degradation at the RX inputs for more lossy channels.
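The conversion from generator power and IL to noise voltage at the RX input can be sketched as follows. This is only an illustration of the arithmetic: it assumes the integrated noise power is expressed in dBm and referred to a 50-Ω impedance, and the function names and example numbers are hypothetical rather than taken from the measurement setup.

```python
import math

def noise_mv_rms(gen_power_dbm, il_db, r_ohm=50.0):
    """RMS noise voltage (mV) at the RX input.

    gen_power_dbm: integrated noise power at the injection point (dBm)
    il_db:         insertion loss from injection point to RX input (dB)
    r_ohm:         reference impedance (assumed 50 ohm)
    """
    p_watt = 10 ** ((gen_power_dbm - il_db) / 10) * 1e-3
    return math.sqrt(p_watt * r_ohm) * 1e3

def gen_power_dbm_for(target_mv_rms, il_db, r_ohm=50.0):
    """Generator power (dBm) needed for a target RMS noise at the RX input."""
    p_watt = (target_mv_rms * 1e-3) ** 2 / r_ohm
    return 10 * math.log10(p_watt / 1e-3) + il_db

# Hypothetical example: power needed for 1, 2, and 3 mV-rms with 6 dB of IL.
for mv in (1.0, 2.0, 3.0):
    print(f"{mv:.0f} mV-rms -> {gen_power_dbm_for(mv, il_db=6.0):.2f} dBm")
```

Each doubling of the target noise voltage requires about 6 dB more generator power, which is why a linear sweep in mV-rms is not a linear sweep in dBm.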

The PAM-4 eye-scan images shown in Figs. 16, 17, and 19 are taken by a “destructive” method, meaning that the data link is disrupted while the data are collected. This method samples the waveform crossing probability within a voltage/timing aperture by sweeping the sampling clock phase and slice level (therefore disrupting data sampling). Technically, this method is not necessarily equivalent to the standard BER contour mapping method, which compares the output of a dedicated eye-scan slicer with the simultaneously taken data slicer output (anchored at the optimal sampling clock phase and slicing threshold). On the other hand, the timing and voltage BER bathtubs shown in this paper are rigorously measured using an on-die PRBS error checker and a calibrated time base. A closer examination of the eye-scan images and the independently taken BER sweeps confirms that both the horizontal and vertical openings match very well with the timing and voltage bathtubs.



Fig. 24. Die photograph.

Fig. 21 compares (a) the PAM-4 eye scan and (b) the independently measured cursor histogram at the CDR lock position (corresponding to  $X$ -axis = 70 PI code in the eye scan) for the same DUT, measured at 56-Gb/s PRBS31 over a 30-dB channel. The cursor histogram was taken by counting waveforms that pass a narrow slice level interval [V<sub>th</sub>, V<sub>th</sub>+dV] at the CDR lock position. Clear PAM-4 levels are visible in the cursor histogram, well correlated with the accompanying eye scan. The seven slice levels, indicated by the seven arrows in Fig. 21(b) positioned vertically at the adapted DAC codes for the data and error slicer thresholds, match well with the observed peaks and valleys, validating the slice-level adaptation algorithm.
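The counting procedure behind the cursor histogram can be sketched in a few lines. The example below is a simplified software model, not the on-chip measurement: it assumes voltage samples at the lock position are available as a list, uses synthetic PAM-4 samples with Gaussian noise on normalized levels (±1, ±3), and the function name `cursor_histogram` is hypothetical.

```python
import random

def cursor_histogram(samples, v_min, v_max, dv):
    """Count samples falling in each interval [Vth, Vth + dV).

    Vth is swept from v_min to v_max in steps of dv; samples outside the
    swept range are ignored.
    """
    n_bins = int(round((v_max - v_min) / dv))
    counts = [0] * n_bins
    for v in samples:
        i = int((v - v_min) // dv)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Synthetic PAM-4 cursor samples: four levels plus Gaussian noise (assumption).
random.seed(1)
LEVELS = (-3.0, -1.0, 1.0, 3.0)
samples = [random.choice(LEVELS) + random.gauss(0.0, 0.2) for _ in range(20000)]

V_MIN, V_MAX, DV = -4.0, 4.0, 0.25
hist = cursor_histogram(samples, V_MIN, V_MAX, DV)
for i, count in enumerate(hist):
    print(f"{V_MIN + i * DV:+.2f} {'#' * (count // 100)}")
```

In such a histogram the four peaks sit at the signal levels, where the error slicer thresholds are expected to settle, and the three valleys between them correspond to the data slicer thresholds.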

Fig. 22 summarizes the receiver performance and compares it with recently published transceivers operating at similar data rates. The receiver consumes 230 mW at 56 Gb/s for the VSR channel (Fig. 23), including high-speed clock buffers and clock correction circuits. The majority of the power is consumed in the DFE, including the slicer array, summer buffers, DFE FIR latches, and clock buffers. For higher loss channels with higher DFE tap strength, the power consumption increases by ~3–5 mW, depending on the CTLE parameter settings that control the AFE bandwidth. Three power domains are used throughout the receiver: 0.9 V for CMOS digital blocks, 1.2 V for high-speed CML stages (CTLE, PGA, and DFE summer) and regulated CMOS clock buffers, and 1.8 V for the regulator error amplifier and DFE summer common-mode correction current sources. Notably, the high-bandwidth CML circuits use thin-oxide MOSFETs for the best gain-bandwidth product and  $f_T$  available from the 16-nm FinFET process design kit (PDK). Device reliability is ensured by carefully designing the dc operating points, both in normal operation mode and in power-down mode, such that device terminal voltage differences do not exceed the maximum rating allowed for thin-oxide devices. Fig. 24 shows the die micrograph of the fabricated chip, taken by a back-side infrared imaging technique, with the major sub-blocks of the analog PAM-4 receiver demarcated by boxes. The total RX area is 0.364 mm<sup>2</sup>.
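As a quick cross-check of the power numbers in Fig. 22, the energy per bit of each design can be computed from the listed power and data rate. The sketch below uses only values from the table; the helper name `energy_pj_per_bit` is hypothetical, and the designs differ in modulation, channel loss, and in which blocks are included in the power figure, so the results are indicative rather than a like-for-like comparison.

```python
# Power (mW) and data rate (Gb/s) as listed in Fig. 22.
DESIGNS = {
    "This work": (230.0, 56.0),
    "Shibasaki et al. [9] (RX)": (142.0, 56.0),
    "Han et al. [10]": (173.0, 60.0),
    "Frans et al. [2], [3] (RX excl. DSP)": (370.0, 56.0),
}

def energy_pj_per_bit(power_mw, rate_gbps):
    """Energy per bit in pJ/b; mW divided by Gb/s gives pJ/b directly."""
    return power_mw / rate_gbps

for name, (power_mw, rate_gbps) in DESIGNS.items():
    print(f"{name:40s} {energy_pj_per_bit(power_mw, rate_gbps):5.2f} pJ/b")

# Ratio of this work to the ADC-based receiver, using table values only.
this_mw = DESIGNS["This work"][0]
adc_mw = DESIGNS["Frans et al. [2], [3] (RX excl. DSP)"][0]
print(f"power relative to ADC-based RX (excl. DSP): {this_mw / adc_mw:.2f}")
```

Because the 370-mW figure for the ADC-based receiver excludes DSP power, the ratio printed here understates the difference between the two complete receivers.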

## IV. CONCLUSION

The analog PAM-4 receiver described in this paper, fabricated in 16-nm FinFET technology, achieves <1E-12 raw BER over a CEI-56G-VSR-PAM4 channel with 10-dB loss and <1E-9 raw BER over a CEI-56G-MR-PAM4 channel with 23-dB IL, meeting the <1E-6 target BER specified by the standards. The use of FEC encoding and decoding layers, as allowed by such standards, would bring the post-FEC link BER well below 1E-25 [3]. The main focus of the circuit optimization is power efficiency. Compared with an ADC-based solution of comparable performance [2], [3], this solution reduces the power consumption by 40% or more. The critical h1 timing was solved by a power-efficient direct feedback scheme, as opposed to the speculative scheme more commonly adopted in NRZ DFEs [5]. The h1 analog settling behavior of the summing node is carefully controlled by delaying the release of the slicer digital output from its reset value, allowing predictable and robust DFE feedback across PVT within the stringent 1UI timing budget. Future improvement plans include a power-efficient RX pre-cursor cancellation scheme to reduce the need for TX FIR adaptation for optimal pre-cursor ISI cancellation.

## ACKNOWLEDGMENT

The authors would like to thank the Xilinx SERDES team for their contributions in design, verification, layout, and bench test.

## REFERENCES

- [1] *Optical Internetworking Forum (OIF), CEI-56G-VSR-PAM4 Very Short Reach Interface, Contribution*, document OIF 2014.230.07, Jun. 2016.
- [2] Y. Frans *et al.*, “A 56 Gb/s PAM4 wireline transceiver using a 32-way time-interleaved SAR ADC in 16 nm FinFET,” in *Proc. Symp. VLSI Circuits*, 2016, pp. 1–2.
- [3] Y. Frans *et al.*, “A 56-Gb/s PAM4 wireline transceiver using a 32-way time-interleaved SAR ADC in 16-nm FinFET,” *IEEE J. Solid-State Circuits*, vol. 52, no. 4, pp. 1101–1110, Apr. 2017.
- [4] J. Im, D. Freitas, A. Roldan, R. Casey, S. Chen, and A. Chou, “A 40-to-56Gb/s PAM-4 receiver with 10-tap direct decision-feedback equalization in 16 nm FinFET,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 114–115.
- [5] J. Savoj *et al.*, “A wide common-mode fully-adaptive multi-standard 12.5 Gb/s backplane transceiver in 28 nm CMOS,” in *Proc. Symp. VLSI Circuits*, 2012, pp. 104–105.
- [6] B. Zhang *et al.*, “A 28 Gb/s multi-standard serial-link transceiver for backplane applications in 28 nm CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 52–53.
- [7] J. Montanaro *et al.*, “A 160-MHz, 32-b, 0.5-W CMOS RISC micro-processor,” *IEEE J. Solid-State Circuits*, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.
- [8] K.-H. Otto *et al.*, “Proposal for CEI-56G FEC requirements section,” Opt. Internetworking Forum Contrib., Otto, Hamburg, Germany, Tech. Rep. 2015.302.02, Jul. 2015.

- [9] T. Shibasaki *et al.*, “A 56 Gb/s NRZ-electrical 247mW/lane serial-link transceiver in 28 nm CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2016, pp. 64–65.
- [10] J. Han *et al.*, “A 60 Gb/s 173 mW receiver frontend in 65 nm CMOS technology,” in *Proc. Symp. VLSI Circuits*, Jun. 2015, pp. C230–C231.



**Jay Im** received the B.S. degree in electronics engineering and the M.S. degree in physics from Seoul National University, Seoul, South Korea, and the Ph.D. degree in physics from the Ohio State University, Columbus, OH, USA, in 2001. His dissertation topic was nanometer-scale study of metal contacts to wide-bandgap semiconductor materials, including SiO, SiC, and GaN.

He was with Silicon Storage Technology, Inc., Sunnyvale, CA, USA, from 2001 to 2004 as a Process/Device Engineer, where he was involved in developing CMOS compatible flash memory IP. He joined Xilinx Inc., San Jose, CA, USA, in 2004, where he has been involved in embedded non-volatile memory technology development, E-fuse PDK development, custom design of OTP macro in field-programmable gate array (FPGA), RTL design of interface block for foundry IP, and led the pre-production diagnostic module group. He has been leading a Design Team, developing analog front end for high-speed SERDES PHY to be embedded in FPGA, and is in charge of the SERDES IP development for the next generation FPGA.



**Dave Freitas** received the B.S.E.E. degree from San Jose State University, San Jose, CA, USA, and the M.S.E.E. degree from the University of California at Davis, Davis, CA, USA.

He was with the Disk Drive Storage Division, IBM, San Jose, CA, USA, until 2004, where he was involved in analog circuits for the disk drive servo system and the read/write preamp. In 2004, he joined the IBM SERDES Group, USA, where he was involved in analog receiver circuits. He joined Xilinx, Inc., San Jose, CA, USA, in 2013, where he was involved in decision feedback equalizers and strong arm latches. He is currently with the Xilinx high speed SERDES group, where he is involved in continuous-time linear equalizers.



**Arianne Bantug Roldan** received the B.S. degree in computer engineering and the M.S. degree in electrical engineering from the University of the Philippines, Quezon City, Philippines, in 2001 and 2003, respectively.

From 2003 to 2007, she was with Canon Information Technologies Philippines, Quezon City, Intel Technology Philippines, and Chartered Semiconductor (now GlobalFoundries) consecutively. She participated in design, verification, and silicon validation of various IPs such as memory controller, flash memory, IOs, and standard cells. She joined Xilinx in 2008, where she is currently a Staff IC Design Engineer with the SERDES Technology Group. Her current research interests include high-speed, low-power circuit design and methodology development.



**Ronan Casey** (M’12) received the B.E. degree in microelectronics from University College Cork, Cork, Ireland, in 2005.

From 2005 to 2008, he was with Silicon and Software Systems (S3), Cork, Ireland, where he was involved in mixed-signal RF baseband circuits and USB receive circuits. From 2008 to 2012, he was with Analog Devices Inc., Limerick, Ireland, where he was involved in developing low-noise high-speed phase-locked loop products. He is currently a Staff Design Engineer with the Xilinx SerDes group, Cork, where he is involved in high-speed wireline transceivers.



**Stanley Chen** (M'01) received the B.S. degree in electrical engineering from National Taiwan University, Taiwan, in 2001, and the Ph.D. degree in electrical engineering and computer science from the University of California at Berkeley, Berkeley, CA, USA, in 2012.

He is currently a Senior Design Manager with Xilinx, Inc., San Jose, CA, USA, where he is involved in high-speed wireline transceivers. His current research interests include analog and mixed-signal circuits and systems.

Dr. Chen received the Vodafone-U.S. Foundation Fellowship for the academic year 2003–2004.



**Chuen-Huei Adam Chou** received the B.S. degree in electrical engineering from National Tsing-Hua University, Hsin-Chu, Taiwan, in 1991, and the M.S. degree in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1994.

From 1995 to 1998, he was with Toshiba America Electronic Component, where he was involved in USB and HSTL transceiver macro. From 1998 to 2013, he was with Rambus, Inc., where he was involved in high-speed chip-to-chip and memory interfaces. He is currently with Xilinx Inc., San Jose, CA, USA, where he is involved in high-speed serial links and analog-digital co-simulation.



**Tim Cronin** was born in Cork, Ireland. He received the B.E. degree in electrical and electronic engineering from University College Cork, Cork, in 2012, and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2014.

In 2014, he joined Xilinx, San Jose, CA, USA, as an Analog Design Engineer with the SerDes technology group.



**Kevin Geary** received the B.E. degree in electrical and electronic engineering and the M.Eng.Sc. degree in microelectronics from University College Cork, Cork, Ireland, in 2002 and 2004, respectively.

From 2003 to 2010, he was with Silicon and Software Systems (now S3Group), Cork, where he developed circuits for both wired and wireless transceiver designs. In 2010, he joined Powervation (now ROHM Semiconductor), Cork, where he was involved in designing mixed-signal ICs for digital pulse width modulation controllers. In 2012, he joined the Xilinx SerDes Group, Cork, where he is involved in gigabit transceivers for wireline applications. His current research interests include phase-locked loops, active filters, and high-speed mixed-signal circuit design.

**Scott McLeod**, photograph and biography not available at the time of publication.



**Lei Zhou** received the B.Eng. degree in electrical engineering from the Huazhong University of Science and Technology, China, in 2002, the M.Eng. degree from the Department of Electrical and Computer Engineering, National University of Singapore, Singapore, in 2005, and the Ph.D. degree in electrical and computer engineering from the University of California at Irvine, Irvine, CA, USA, in 2010.

He was with Broadcom and Quantenna, where he developed RF front-end circuits for wireless LAN transceivers. He interned at Atheros and Broadcom in 2006, 2007, and 2009. He is currently with Xilinx, Inc., San Jose, CA, USA, as a Senior Design Engineer with the Serdes Technology Group, developing high-speed analog and mixed signal circuits for NRZ and PAM4 transceivers. He has authored or co-authored over 10 journal and conference publications. His current research interests include analog, mixed-signal, and RF/MMW integrated circuit design for wireline and wireless transceivers.



**Ian Zhuang (Yi Zhuang)** received the B.E. degree in computer engineering from the University of Melbourne, Melbourne, VIC, Australia, in 2007, and the M.Eng. degree in electrical and computer engineering from Cornell University, Ithaca, NY, USA, in 2009.

In 2011, he joined Xilinx, Inc., San Jose, CA, USA, as a Design Engineer with the SerDes Technology Group, where he was involved in silicon validation and characterization, test automation, and IBIS-AMI model development for high-speed transceivers. He is currently involved in the design and development of high-speed low-power clock distribution and custom digital blocks for wireline transceivers.



**Jaeduk Han** (S'15) received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, South Korea, in 2007 and 2009, respectively. He is currently pursuing the Ph.D. degree in electrical engineering with the University of California, Berkeley, CA, USA.

He was a Circuit Design Engineer with TLI, Seongnam, South Korea, from 2009 to 2012. He has held various engineering intern positions at Altera, San Jose, CA, USA, Intel, Hillsboro, OR, USA, Xilinx, San Jose, and Apple, Cupertino, CA, USA, in 2012, 2014, 2015, and 2016, respectively, where he was involved in high-speed wireline communication circuits and power management circuits. His current research interests include high-speed wireline communication circuit design and analog circuit design automation.



**Sen Lin** received the B.S. degree in microelectronics from Tsinghua University, Beijing, China, in 2012. He is currently pursuing the Ph.D. degree in electrical engineering and computer science with the University of California, Berkeley, CA, USA.

He was an intern at Xilinx, San Jose, CA, from 2015 to 2016, where he focused on the design of high-speed PAM4 receivers and optical interconnects. In 2016, he joined Ayar Labs, San Francisco, CA, USA, as an intern, where he designed high-speed silicon photonic transceivers. His current research interests include modeling and design of integrated silicon photonic systems, high-speed optical transceivers, and mixed-signal integrated circuits.



**Parag Upadhyaya** (S'00–M'07) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Washington State University in 2000, 2005, and 2008, respectively.

From 2001 to 2003, he was with Cypress Semiconductor, where he was involved in the development of high-speed wireline and optical transceivers. Since 2008, he has been with Xilinx, Inc., San Jose, CA, USA, where he is currently the Director of Engineering with the SERDES Technology Group and leads the development of high-speed transceivers for field-programmable gate array applications. He has authored or co-authored over 54 journal, conference, and book chapter publications. He holds more than 20 U.S. patents. His current research interests include high-speed mixed signal circuits for wireline, wireless, and optical transceivers, high-speed data converters, and phase-locked loops.



**Geoff Zhang** received the Ph.D. degree in microwave engineering and signal processing from Iowa State University, Ames, IA, USA, in 1997.

He was with HiSilicon, Huawei Technologies, LSI, Agere Systems, Lucent Technologies, and Texas Instruments. He joined Xilinx Inc., in 2013, where he is currently a Distinguished Engineer and Supervisor, managing transceiver architecture and modeling with the SerDes Technology Group. His current research interests include transceiver architecture modeling and system level end-to-end simulation, both electrical and optical.

**Yohan Frans**, photograph and biography not available at the time of publication.



**Ken Chang** (M'99–SM'14) received the B.S. degree from National Taiwan University, Taipei, Taiwan, in 1990, and the M.S. and Ph.D. degrees from Stanford University, Stanford, CA, USA, in 1994 and 1999, respectively, all in electrical engineering.

From 1999 to 2010, he was with Rambus Inc. He led several projects, including 5-Gb/s/lane 12-Gbytes FlexIO interface for CELL processors, 16- and 20-Gb/s low-power memory interfaces exploring various signaling techniques. Since 2010, he has been with Xilinx, Inc., San Jose, CA, USA, where he has led the SerDes Technology Group, focused on developing multistandard SerDes IPs for FPGAs, covering top line rates of 10, 28, and 56 Gb/s, all capable of long reach transmission. He has authored or co-authored over 40 IEEE conference/journal publications and holds over 30 U.S. patents in the high-speed link area. His current research interests include high-speed mixed-signal CMOS circuit design, transmitter and receiver design, CDR, equalization, PLL/DLL design, circuit noise analysis, signal integrity analysis, and mixed signal design methodology.

Dr. Chang is the co-author of the 2008 and 2014 CICC best regular papers. He has served on technical program committees for ISSCC and CICC. He is the Technical Program Co-Chair of the 2017 VLSI Circuit Symposium.