

# A 32.75-Gb/s Voltage-Mode Transmitter With Three-Tap FFE in 16-nm CMOS

Kok Lim Chan, Kee Hian Tan, Yohan Frans, Jay Im, Parag Upadhyaya, Siok Wei Lim, Arianne Roldan, Nakul Narang, Chin Yang Koay, Hongyuan Zhao, Ping-Chuan Chiang, and Ken Chang, *Senior Member, IEEE*

**Abstract**—This paper describes a 32.75-Gb/s voltage-mode transmitter (TX) with three-tap feed forward equalization that is fabricated in a 16-nm FinFET CMOS technology. The TX uses a dual regulator architecture to allow independent control of output swing, output common-mode, and equalization. A hybrid impedance control scheme is presented where the total number of driver slices is used for coarse impedance control, and analog loop-based voltage control is used for fine impedance control of the TX. A finite-impulse response compensation circuit to compensate for the data dependent current due to equalization is also presented. The TX consumes 120.8 mW with 0.9- and 1.2-V supplies, provides 0.25–0.9 V<sub>pp</sub> output swing, and achieves total jitter of 6.49 ps-pp and random jitter of 220 fs-rms at 32.75 Gb/s with bit error rate of 1e-12.

**Index Terms**—Equalization, source-series-termination (SST), transmitter (TX), voltage-mode driver.

## I. INTRODUCTION

A field-programmable gate array (FPGA) must serve various market segments such as aerospace, defense, data centers, automotive, industrial, and medical. Therefore, a transmitter (TX) designed for an FPGA needs to support a wide range of data rates and key protocols [1], [2], [15]. These include 10G SFP+ and 10G CX1 at the chip-to-module interface, 40G XLAUI and 10G XFI at the chip-to-chip interface, 40G nPPI and 40G CR4 from front-panel IO to line card, 10G KR for backplanes, OIF CEI-25G-LR and CEI-28G-SR for chip to optics, and PCIe Gen 3 and Gen 4 for general-purpose serial connections [1]–[4]. Furthermore, industrial and defense applications impose extreme temperature and strict long-term reliability requirements [1].

To meet these protocols, the TX should have three main features. First, the TX impedance should ideally be independent of the output swing. Second, the TX should support various styles of equalization: for example, in PCIe the main-cursor setting depends on the pre- and post-cursor settings, whereas in 10G-KR the main cursor can be controlled independently. Finally, the TX output swing should be adjustable [5].

Manuscript received February 13, 2017; revised May 12, 2017; accepted May 28, 2017. Date of publication June 30, 2017; date of current version September 21, 2017. This paper was approved by Guest Editor Jaeha Kim. (*Corresponding author: Kok Lim Chan*)

K. L. Chan, K. H. Tan, S. W. Lim, A. Roldan, N. Narang, C. Y. Koay, H. Zhao, and P.-C. Chiang are with Xilinx Asia Pacific Pte. Ltd., Singapore 486040 (e-mail: koklimc@xilinx.com).

Y. Frans, J. Im, P. Upadhyaya, and K. Chang are with Xilinx, Inc., San Jose, CA 95124 USA.

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2714180



Fig. 1. (a) Simplified voltage-mode driver. (b) Conceptual voltage-mode driver.

These design requirements are further complicated by the use of advanced CMOS technologies [1], [2]. Typically, an advanced CMOS process such as 16-nm FinFET is tailored for high-speed digital design rather than high-precision analog design. In addition, the double patterning technology developed for enhanced photolithography in this process poses major challenges in the layout of the back-end layers.

In this paper, the design and challenges of a highly flexible low-power voltage-mode TX with three-tap feed forward equalization (FFE) in 16-nm CMOS technology are presented. The TX supports data rates from 0.5 to 32.75 Gb/s, and output swing from 0.25 to 0.9 V<sub>pp</sub>. It also supports the equalization requirements of various protocols. The TX uses a dual regulator architecture to allow the swing, common-mode, and equalization to be controlled independently. The architecture of the TX and the swing control scheme are presented in Section II. In Section III, a hybrid impedance control scheme with both slice-based and analog loop-based impedance control is presented. The hybrid impedance scheme allows reduced power consumption while achieving good impedance matching across process, voltage, and temperature (PVT) variations. To avoid degradation of the deterministic jitter (DJ) in the TX due to data-dependent current through the regulators, a finite-impulse response (FIR) compensation scheme is introduced in Section IV. Section V summarizes the experimental results, and Section VI gives the conclusion.

## II. TRANSMITTER ARCHITECTURE

In this paper, a voltage-mode driver is adopted because it is 4× more power efficient than its current-mode counterpart [5], [6]. Fig. 1(a) shows a simplified differential voltage-mode driver. On each side, the driver



Fig. 2. (a) Transmitter architecture. (b) 100GBase-CR4 channel response example.



Fig. 3. Transmitter datapath and clock buffers/dividers.

consists of $N$ pull-up/down devices. Each device can be switched to $V_{refp}$ or gnd, and when the devices on $V_{out+}$ are switched to $V_{refp}$, the devices on $V_{out-}$ are switched to gnd. Conceptually, the voltage-mode driver forms a resistor divider with the load as shown in Fig. 1(b). The driver is designed such that the total conductance of all the devices is twice the load conductance, i.e., $N \times G = 2 \times G_L$. Equalization is implemented by controlling $M$ of the devices with a delayed (post-cursor) or advanced (pre-cursor) version of the input. At times, $M$ devices may be switched in the opposite direction, and the output amplitude $V_{out}$ is given by [6]

$$V_{out} = \frac{N - 2M}{2N} V_{refp}. \quad (1)$$

Since the total number of transistors connected to the output that are turned ON remains constant, the impedance of the driver also remains constant. One disadvantage of this approach is that the current drawn from $V_{refp}$ depends on $M$. To avoid the data-dependent current, an additional shunt path between $V_{out+}$ and $V_{out-}$ can be added to maintain a constant current from $V_{refp}$ [3], [6]. However, in [3], because of the nonlinear mapping between $V_{out}$ and the simultaneous impedance calibration, complicated decoding logic and a large number of segments are required [6]. In this paper, an FIR compensation scheme is proposed to address the data-dependent current due to equalization. This is elaborated in Section IV.
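As a rough numerical check of (1) and of the data-dependent supply current described above, the following Python sketch models the conceptual resistor divider of Fig. 1(b) with ideal resistors. The function names, the 100-Ω differential load, the choice $N = 80$, and the supply-current expression are assumptions derived here from that idealized model; they are not taken from the fabricated design.

```python
def vout_amplitude(n, m, vrefp):
    """Differential output amplitude from (1): (N - 2M) / (2N) * Vrefp."""
    return (n - 2 * m) / (2 * n) * vrefp


def supply_current(n, m, vrefp, r_load=100.0):
    """Current drawn from Vrefp for the ideal divider of Fig. 1(b).

    Assumes N * G = 2 * G_L, so each side of the driver presents r_load / 2.
    """
    g_unit = 2.0 / (r_load * n)          # conductance of one device (assumed ideal)
    vout = vout_amplitude(n, m, vrefp)   # differential output voltage
    v_pos = vrefp / 2 + vout / 2         # voltage at the Vout+ node
    v_neg = vrefp / 2 - vout / 2         # voltage at the Vout- node
    # Devices tied to Vrefp: (N - M) on the Vout+ side and M on the Vout- side.
    return g_unit * ((n - m) * (vrefp - v_pos) + m * (vrefp - v_neg))


if __name__ == "__main__":
    for m in (0, 10, 20, 40):
        print(m, vout_amplitude(80, m, 0.9), supply_current(80, m, 0.9))
```

In this idealized model the supply current is smallest when $M = 0$ (largest output amplitude) and rises as more devices are switched in the opposite direction, which is the data dependence that the compensation scheme of Section IV targets.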

Fig. 2(a) shows the block diagram of the TX architecture. The TX includes three main functional blocks: the TX front end, the TX datapath, and the TX clock buffers/dividers. The datapath takes a 64-bit input and generates the input streams to the driver slices. The TX front end consists of 85 parallel TX driver slices. These slices are grouped into pre-cursor (16 slices), main-cursor (36 slices), and post-cursor (33 slices) groups to implement a three-tap FFE. The clock buffer/divider generates all the clocks for the datapath. Fig. 2(b) shows a 100GBase-CR4 channel response example [16]. Typically, the TX FFE is adjusted in conjunction with the RX CTLE to equalize the channel response. The number of taps and the resolution of the FFE are chosen to meet product specifications and to support multiple protocols.
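To illustrate how slice allocation translates into FFE output levels, the sketch below extends (1) to three taps under an ideal-resistor model. It assumes that pre- and post-cursor slices are driven by the inverted next and previous bit, respectively, and that `n_pre`, `n_main`, and `n_post` are the effective numbers of slices steered to each stream; the function name and the example counts are hypothetical and chosen only for illustration.

```python
def ffe_levels(bits, n_pre, n_main, n_post, vref=0.9):
    """Differential output per UI for a three-tap SST FFE (ideal model).

    bits: sequence of 0/1 values. The first and last bits are used only as
    neighbors, so the result has len(bits) - 2 entries.
    """
    n_total = n_pre + n_main + n_post
    sym = [2 * b - 1 for b in bits]  # map 0/1 to -1/+1
    out = []
    for k in range(1, len(sym) - 1):
        # Net number of slices pulling in the direction of the current bit.
        net = n_main * sym[k] - n_pre * sym[k + 1] - n_post * sym[k - 1]
        out.append(net / (2 * n_total) * vref)
    return out


if __name__ == "__main__":
    # Hypothetical setting: 17 of 85 slices used for post-cursor de-emphasis.
    print(ffe_levels([0, 0, 1, 1, 1, 0, 0], n_pre=0, n_main=68, n_post=17))
```

With these illustrative counts, a bit that follows a transition reaches the full ±0.45 V amplitude, while repeated bits settle at ±0.27 V, which is the de-emphasis behavior expected from a post-cursor tap.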



Fig. 4. (a) Retimer and 2-to-1 MUX. (b) 2-to-1 MUX timing. (c) Bathtub curve.



Fig. 5. (a) Clock buffer with DCC. (b) DCC circuit. (c) MUX output with DCD.

#### A. TX Datapath and Clock Buffers

The details of the datapath are shown in Fig. 3. The datapath has a 64-bit input, which is fed to the FIR block to generate the pre- and post-cursor streams. The Coeff MUX allows the different streams to be selected for the pre-, post-, and main-cursor slices. In order to program the amount of pre-/post-cursor equalization, the inputs of the pre- and post-cursor slices can be switched between the pre-/post-cursor data and the main-cursor data. The input to the main-cursor slices can also be tied to static high and low to implement a shunting path that reduces the main-cursor amplitude. For example, suppose $M$ of the $N$ drivers are tied to static high and low. Then, regardless of the input data, there will always be $M$ shunt devices as in Fig. 1(b), and the output will be reduced as given by (1). This allows the main-cursor amplitude to be



Fig. 6. Transmitter front end.



Fig. 7. (a) Slice-based impedance control. (b) Analog loop-based impedance control.

adjusted independently to support the 10G-KR protocol. Each of the pre-, main-, and post-cursor data streams consists of odd and even data paths spaced 1 UI apart. These six half-rate (2T) data streams are then fed to the 2-to-1 MUX, clocked by the half-rate clock (16.375 GHz), to generate the full-rate data. The clocks for the various blocks are divided down from a 16.375-GHz clock. The most critical blocks in the datapath are the retimer and the 2-to-1 MUX. Fig. 4(a) and (b) show the 2-to-1 MUX and its associated timing diagram, respectively. The clock selecting the odd and even samples at the 2-to-1 MUX needs to be properly centered such that the MUX output jitter depends only on the clock $\text{CK}_{2T}$. Fig. 4(c) shows the bathtub curve of jitter versus setup time at the MUX output. At the two ends of the curve, the jitter increases due to insufficient timing margins ($T_{\text{setup}}$ and $T_{\text{hold}}$). As the clock to the 2-to-1 MUX determines the jitter performance of the TX, the duty cycle of



Fig. 8. (a) Typical  $I_d$  versus  $V_{ds}$  of FET. (b) AC/DC resistance of FET.



Fig. 9. Hybrid impedance control scheme.

the clock must be 50% to avoid duty cycle distortion (DCD). This is achieved by adding a duty cycle correction (DCC) circuit to the clock buffer, as shown in Fig. 5(a) and (b). The design is similar to that in [8]. The DCC circuit consists of a bank of tristate buffers in parallel with the main buffer inverter, where each pull-up and pull-down cascade transistor is controlled independently. Fig. 5(c) shows the impact of DCD on the MUX output eye diagram.

#### B. TX Front End

Fig. 6 shows the design of the TX front end. The topology of the driver slices shown is known as a source-series-termination (SST) TX [7]. For each driver, the terminating resistance comprises a field-effect transistor (FET) switch and a series resistor. Ideally, the resistance of the series resistor should dominate the total resistance, as the resistance of the FET switch is rather nonlinear and susceptible to process variations [7]. However, having a large FET switch increases the overall power consumption due to the fan-out requirement for the pre-drivers and clock buffers. Thus, there is a tradeoff between the switch size and the series resistor size. In Fig. 6, the SST driver slice comprises the FET switches ($M_P$ and $M_N$), series resistors ($R_S$), and FET programmable resistors ($M_{ctrlp}$ and $M_{ctrln}$). Out of the 85 driver slices, ten are programmable (that is, they can be turned ON or OFF). The programmable driver slices



Fig. 10. Global resistor calibration.

provide coarse impedance control, and the FET programmable resistors ($M_{ctrlp}$ and $M_{ctrln}$) provide fine impedance control. More details of the impedance control scheme are provided in Section III. The programmability is implemented by using NAND and NOR gates as the pre-drivers (Fig. 6). The NAND/NOR gates are designed with a fan-out of 1.5 and are optimized for the high-speed path by sizing up the series device in the NAND/NOR that is connected to the static enable signal. The front end also contains impedance control loops and FIR compensation blocks, which are explained in detail in the following sections. The TX driver outputs are protected by ESD diodes, and the output bandwidth is enhanced with on-chip series inductors [7].

The TX driver uses two regulators to control the output swing and common-mode: the $V_{refp}$ regulator and the $V_{refn}$ regulator. Since $V_{refp}$ and $V_{refn}$ can be set independently, the swing and common-mode are independently controlled. For example, for a swing of 0.9 V<sub>pp</sub>, $V_{refp}$ is set at 0.9 V and $V_{refn}$ is grounded. For a swing of 0.25 V<sub>pp</sub>, $V_{refp}$ is set at 0.575 V and $V_{refn}$ is set at 0.325 V. In both cases, the common-mode is 0.45 V. The $V_{refp}$ regulator consists of a high-gain, low-bandwidth amplifier with a thin-oxide nMOS source follower ($M_{N0}$) as the power device. The nMOS source follower provides lower output impedance for good return loss in the mid-frequency range. The dominant pole is set at the gate of $M_{N0}$ to obtain a stability margin of >75°. Thanks to the high intrinsic gain of the FinFET process, the regulator relies on $M_{N0}$ to provide the line regulation and >20-dB PSRR from AVTT to $V_{refp}$. The $V_{refn}$ regulator consists of a high-gain, two-stage amplifier with $M_{N1}$ as the power device. Miller compensation is used to ensure that the dominant pole is set at the gate of $M_{N1}$ to obtain a stability margin of >70°. For high output swing (e.g., 0.8 V<sub>pp</sub> and above), the $V_{refn}$ regulator is powered down and transistor $M_{N1}$ is used as a switch to gnd.
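The two swing examples above follow from placing $V_{refp}$ and $V_{refn}$ symmetrically about the desired common-mode. The short Python sketch below reproduces that arithmetic; the function name and the error check are illustrative assumptions, not part of the described design.

```python
def regulator_setpoints(swing_vpp, vcm=0.45):
    """Return (Vrefp, Vrefn) for a differential peak-to-peak swing and common-mode."""
    vrefp = vcm + swing_vpp / 2
    vrefn = vcm - swing_vpp / 2
    if vrefn < 0:
        # Vrefn cannot go below ground in this simple model.
        raise ValueError("swing too large for the requested common-mode")
    return vrefp, vrefn


if __name__ == "__main__":
    for swing in (0.25, 0.6, 0.9):
        print(swing, regulator_setpoints(swing))
```

For 0.9 V<sub>pp</sub> the sketch returns $V_{refn} = 0$ V, which corresponds to the case where the $V_{refn}$ regulator is bypassed and $M_{N1}$ acts as a switch to gnd.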

## III. IMPEDANCE CONTROL LOOP

The output impedance of a TX needs to match the characteristic impedance of the channel to minimize reflection-induced intersymbol interference [9]. In addition, for many protocols, the TX needs to meet the specification defined by a return loss mask. Furthermore, for a voltage-mode TX, the output swing is determined by the impedance ratio between the driver and the output load. Thus, impedance control is necessary to meet both the return loss and output swing requirements. Two frequently used approaches are the slice-based approach as in [7] and the analog loop-based approach as in [6], [9], and [10].

#### A. Slice-Based Impedance Control

Fig. 7(a) shows an example of a slice-based approach. In this approach, the number of driver slices enabled can be programmed depending on the impedance of a unit slice. In 16-nm FinFET technology, the series resistor can have a variation of 16% and the transistor resistance can change by 20%. Thus, if we rely only on slice programming, the number of slices needs to change by as much as 36%. This results in a substantial increase in power, as the pre-driver and clock buffers need to be correspondingly sized up due to the added gate load capacitance. Another disadvantage is that any pull-up/pull-down imbalance due to skewed pMOS and nMOS impedances cannot be compensated. This results in an increase in output common-mode ripple, which many protocols require to be kept below certain values. Furthermore, slice-based impedance control is generally a foreground technique, as it is difficult to sense the impedance of the driver during normal operation.

#### B. Analog Loop-Based Impedance Control

Another approach to impedance control is to adjust the impedance of the driver based on an analog impedance loop [Fig. 7(b)]. The analog impedance loop makes use of a replica circuit to tune the impedance of an FET. In [6], the supply of the pre-driver is used to tune the impedance of the transistor switch by adjusting its $V_{gs}$. In [9] and [10], another FET acting as a programmable resistor is added. In Fig. 7(b), the two impedance loops tune $M_{UP}$ and $M_{DN}$ such that $Z_{UP}$ and $Z_{DN}$ are both 50 Ω, and $V_{ref}/2$ is dropped across the 100-Ω replica load. One advantage of using impedance loops is that, because a replica loop is used, the impedance control can be done in the background. With an impedance loop, the impedance range of the FET programmable resistors ($M_{UP}$ and $M_{DN}$) must be large enough to compensate for the variations of the SST's FET switches ($M_P$ and $M_N$) and series resistors ($R_P$ and $R_N$). This leads to two problems. First, due to the reduced supply level and headroom in advanced CMOS technology and the requirement to support a wide range of TX output swing (from 0.25 to 0.9 V<sub>pp</sub>), it is difficult to keep the FET in the driver slice in the linear region over a wide range of impedance values. Second, the $V_{ds}$ across the FET changes with the series resistor variations. For example, when the resistance of the series resistor ($R_P$ and $R_N$) is low, its voltage drop decreases since the load current is fixed by the output load ($V_{REF}/(2 \times R_L)$). This in turn causes the $V_{ds}$ of the FETs ($M_{UP}$, $M_{DN}$, $M_P$, and $M_N$) to increase, as the voltage drop across the driver slice is fixed at $V_{REF}/4$ by the impedance control loop. Fig. 8(a) and (b) show the typical $I_d$ versus $V_{ds}$ of an FET, and its dc and ac resistance, respectively. From Fig. 8(b), as the $V_{ds}$ of the FET increases beyond 100 mV, the small-signal (ac) and large-signal (dc) impedances of the FET start to deviate. Due to this divergence, good swing control that depends on dc resistance, and good return loss



Fig. 11. FIR compensation circuits.



Fig. 12. Waveforms for FIR compensation circuits.

that depends on ac resistance cannot be simultaneously met for large  $V_{ds}$  across the FET. Thus, it is essential to keep  $V_{ds}$  across the FET small for good swing and return loss performance. Furthermore, analog impedance control loops [Fig. 7(b)] typically only tune the large-signal (dc) impedance of the driver slices.

#### C. Hybrid Impedance Control Scheme

In this paper, a hybrid approach is proposed to control the impedance of the TX (Fig. 9). Slice-based impedance control is used to compensate for only $\pm 10\%$ of the series resistor variation. The remaining FET impedance variations and $\pm 6\%$ of the series resistor variations are compensated using analog loop-based impedance control. Since the coarse tuning is already done by the slice-based impedance control, the required range for the FET programmable resistor in the analog loop-based impedance control is much reduced. With the hybrid approach, the $V_{ds}$ across the FET can be kept small to ensure good swing and return loss performance. Furthermore, the analog impedance control loop is also able to compensate for any skewed pMOS and nMOS impedance mismatches.



Fig. 13. (a) Simulated supply current variation through regulator with and without FIR compensation. (b) Simulated  $V_{\text{refp}} - V_{\text{refn}}$  variation of transmitter with and without FIR compensation.



Fig. 14. Die photograph for transmitter.

Fig. 9 shows the implementation of the hybrid impedance control scheme. Slice-based impedance control is implemented by adjusting the number of driver slices from 75 to 85 based on the resistor variations of the process. The resistor variations are determined via a global resistor calibration (RCAL) circuit within the FPGA (Fig. 10). The RCAL uses a feedback loop to match an on-chip programmable resistor to an external 100-Ω resistor, and is performed during startup. Meanwhile, the analog loop-based impedance control is implemented as shown in Fig. 9 with a replica driver slice. The replica branch is sized to have a 1/40 scaling ratio relative to the main driver when there are 80 driver slices (corresponding to the typical resistor corner) to achieve a good tradeoff between sensitivity to mismatch and power consumption. Monte Carlo simulations are performed to verify that the $3\sigma$ variation of the TX swing is less than 5%, and that the return loss meets the composite return loss



Fig. 15. Measurement setup.

masks for key protocols. Since the impedance loop will ensure that  $(V_{\text{refp}} - V_{\text{refn}})/4$  is dropped across the pMOS and nMOS replica driver slice,  $V_{\text{ctrlp}}$  and  $V_{\text{ctrln}}$  will be adjusted until

$$TX_{P,\text{rep}} = V_{\text{refp}} - \frac{1}{4}(V_{\text{refp}} - V_{\text{refn}}) = \frac{3}{4}V_{\text{refp}} + \frac{1}{4}V_{\text{refn}} \quad (2)$$

and

$$TX_{N,\text{rep}} = \frac{1}{4}(V_{\text{refp}} - V_{\text{refn}}) + V_{\text{refn}} = \frac{1}{4}V_{\text{refp}} + \frac{3}{4}V_{\text{refn}}. \quad (3)$$

The inputs to the impedance loop, $V_{\text{Ratio},p}$ and $V_{\text{Ratio},n}$, are used not only to set the values of $TX_{P,\text{rep}}$ and $TX_{N,\text{rep}}$, but also to compensate for the change in the scaling ratio between the replica branch and the main driver when the number of main driver slices changes due to on-chip resistor corner variations. This also eliminates the need for replica load trimming. An on-chip passive resistor $R_{L,\text{rep}}$ is used instead as the load resistor in the replica branch. Although $R_{L,\text{rep}}$ will not track the load impedance $R_L$ across PVT, its variations can also be compensated by adjusting $V_{\text{Ratio},p}$ and $V_{\text{Ratio},n}$. The required adjustment can be determined since the on-chip resistor variation is known via RCAL.



Fig. 16. Measured TX output swing.



Fig. 17. (a) Measured differential-mode return loss at 250-mVpp TX swing. (b) Measured common-mode return loss at 250-mVpp TX swing.

Suppose that, from RCAL, we know the on-chip resistance has increased by 10%. Since we still want the driver slice resistance to remain the same, the factor of 1/4 in (2) and (3) needs to be adjusted. The new voltage drop across the pMOS and nMOS replica SST slices is no longer $50/(50 + 100 + 50) = 0.25$ of $V_{\text{refp}} - V_{\text{refn}}$, but $50/(50 + 110 + 50) = 0.238$ of $V_{\text{refp}} - V_{\text{refn}}$.
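The adjustment in this example can be reproduced with a few lines of Python. The sketch below computes the divider ratio and the resulting replica node targets in the form of (2) and (3); the function name is hypothetical, and resistances are expressed at the unscaled 50-Ω/100-Ω level because the replica scaling factor cancels in the ratio.

```python
def replica_targets(vrefp, vrefn, r_slice=50.0, r_load_rep=100.0):
    """Target replica node voltages for given slice and replica-load resistances.

    With r_load_rep = 100 this reproduces the factor 1/4 in (2) and (3).
    """
    ratio = r_slice / (r_slice + r_load_rep + r_slice)
    delta = vrefp - vrefn
    tx_p_rep = vrefp - ratio * delta   # pull-up replica node, cf. (2)
    tx_n_rep = vrefn + ratio * delta   # pull-down replica node, cf. (3)
    return ratio, tx_p_rep, tx_n_rep


if __name__ == "__main__":
    print(replica_targets(0.9, 0.0))                    # nominal replica load
    print(replica_targets(0.9, 0.0, r_load_rep=110.0))  # on-chip resistor +10%
```

The first call gives the nominal 1/4 ratio, and the second gives the reduced ratio of about 0.238 that the loop inputs would need to encode when RCAL reports a 10% increase in on-chip resistance.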

## IV. FIR COMPENSATION CIRCUIT

When FFE is implemented by switching between the pre-, post-, and main-cursor data streams to the TX driver slices (Figs. 1 and 2), it is straightforward to adjust the amount of desired equalization. However, this approach has the disadvantage that the average current drawn from the regulators can be data dependent. When there is a change in the average current, there will be a disturbance at $V_{\text{refp}}$ and $V_{\text{refn}}$. This disturbance in turn can cause an increase in the DJ of the TX driver. In this paper, an FIR compensation circuit is proposed to generate a data-pattern-dependent dummy current that on average cancels out the variation in the current drawn by the TX driver slices. As a result, the total average current drawn from the regulators is constant and independent of the data pattern.

Fig. 11 shows the voltage-mode TX with the FIR compensation circuit. The compensation circuit comprises two main components: the event detector (consisting of two XNOR



Fig. 18. (a) Measured differential-mode return loss at 900-mVpp TX swing. (b) Measured common-mode return loss at 900-mVpp TX swing.



Fig. 19. Measured TX output with pre-cursor sweep.

gates), and two current DACs connected across $V_{\text{refp}}$ and $V_{\text{refn}}$. The DACs are 5-bit programmable current sources that generate the compensation current for different output swing and FIR settings of the TX. With pre- or post-cursor equalization, the output amplitude is large whenever the main input $D_{\text{in1T}}$ is different from its subsequent or previous bit, respectively. The current through the regulator $I_{\text{TX}}$ also changes correspondingly, drawing the smallest current when the output amplitude is largest. Fig. 12 shows an example with post-cursor equalization. Thus, in order to maintain a constant average current through the regulators, the FIR compensation circuit needs to generate a compensation current $I_{\text{comp,ideal}}$ that draws the maximum current whenever the main input $D_{\text{in1T}}$ is different from its previous bit, as shown in Fig. 12. Since we are only interested in the average current, we can implement the detector in an energy-efficient way by using the 2T data streams. With the main 2T odd and even streams ($D_{\text{in2T,odd}}$ and $D_{\text{in2T,even}}$), the conditions to generate the compensation current can be detected using XNOR gates. In Fig. 12, whenever there is an event where the main input $D_{\text{in1T}}$ is different from its previous bit, the DACs are enabled to draw a dummy current from the regulated supply. Each XNOR gate drives



Fig. 20. Measured TX output with post-cursor sweep.



Fig. 21. Measured TX output with main-cursor sweep.

a current DAC, and their outputs are combined to give the compensation current $I_{\text{comp}}$. Although $I_{\text{comp}}$ is not the same as the ideal compensation current $I_{\text{comp,ideal}}$, their averages are the same. The value of the current DACs can be programmed (with sel<0:4>) to match the strength of the equalization in the TX.
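The claim that the half-rate detector matches the ideal compensation current on average can be illustrated with a simple behavioral model. The Python sketch below is only one plausible reading of the scheme: it assumes two detectors that compare adjacent even/odd samples, hold each decision for 2 UI, and each enable a DAC sized at half the ideal pulse amplitude. The function names, the DAC sizing, the pulse alignment, and the use of an inequality test (rather than a specific XNOR polarity) are all assumptions made for illustration.

```python
def ideal_comp(bits, i_unit=1.0):
    """1-UI pulse of i_unit whenever a bit differs from its previous bit."""
    return [i_unit if bits[n] != bits[n - 1] else 0.0 for n in range(1, len(bits))]


def half_rate_comp(bits, i_unit=1.0):
    """Compensation current built from the even/odd (2T) streams.

    Each detector holds its decision for 2 UI and enables a DAC of
    i_unit / 2 (assumed sizing), so the two DAC outputs overlap in time.
    """
    even, odd = bits[0::2], bits[1::2]
    current = [0.0] * (len(bits) + 1)
    for k in range(len(odd)):
        if even[k] != odd[k]:                            # event d[2k] -> d[2k+1]
            current[2 * k] += i_unit / 2
            current[2 * k + 1] += i_unit / 2
        if k + 1 < len(even) and odd[k] != even[k + 1]:  # event d[2k+1] -> d[2k+2]
            current[2 * k + 1] += i_unit / 2
            current[2 * k + 2] += i_unit / 2
    return current


if __name__ == "__main__":
    pattern = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
    print(sum(ideal_comp(pattern)), sum(half_rate_comp(pattern)))
```

Under these assumptions the two waveforms differ sample by sample, but their sums over the pattern (and hence their averages) are equal, since each detected event contributes a pulse of half the amplitude for twice the duration.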

Fig. 13 shows the simulation results of a voltage-mode TX with the FIR compensation circuit. Two data patterns are used to create a change in the averaged current. Without the FIR compensation, it can be seen that when the averaged current changes (at around 50 ns), there is a corresponding perturbation in the regulator voltage. With FIR compensation, the averaged current is now constant, and there is no perturbation in the regulator voltages. For this example, the FIR compensation circuit has a power overhead of about 15%.



Fig. 22. Measured DCD versus DCC code of transmitter.



Fig. 23. (a) Measured TX eye diagram without equalization at 32.75 Gb/s. (b) Measured TX eye diagram with equalization at 32.75 Gb/s.

However, the actual power overhead depends not only on the equalization setting, but also on the actual data pattern.

## V. EXPERIMENTAL RESULTS

The TX is fabricated in a 16-nm FinFET CMOS process. This process utilizes double patterning for both the base layers and the lower-level metal layers. To facilitate layout effort, the lower double-patterned metal is routed in orthogonal directions to the extent possible. The TX is assembled in a flip-chip package. Fig. 14 shows the die photograph of the TX. It measures $480 \mu\text{m} \times 460 \mu\text{m}$. Slice-based layout is used in the driver and datapath to improve matching. The skews between the driver slices are minimized by matching the layout of the clock traces and output traces.

The measurement setup is shown in Fig. 15. Time-domain analysis is performed with an Agilent Technologies DSA-X 93204A 80-GS/s real-time scope, and return loss results are captured with an Agilent Technologies N5245A network analyzer. Fig. 16 shows the low-frequency output pattern with different swing settings (250, 600, and 900 mV<sub>pp</sub>). Figs. 17 and 18 show the measured differential-mode and common-mode return losses for TX output swings of 250 and 900 mV<sub>pp</sub>, respectively. The measurement is done with temperature variations from $-40^\circ\text{C}$ to $100^\circ\text{C}$, and with $\pm 5\%$ variations in supplies. The composite return loss masks for key protocols are overlaid in Figs. 17 and 18, and it can be seen that the transmitter meets the return loss requirements. The high swing accuracy and good return loss performance confirm that the impedance control scheme is working properly. Figs. 19 and 20 show the low-frequency output pattern with different pre- and post-cursor settings, respectively. The pre-cursor is swept from 0% to 18.75%, and the post-cursor is swept from 0% to 38.75%. Fig. 21 shows



Fig. 24. Measured worst jitter performance at 31.2-Gb/s across temperature and supply variations.

TABLE I  
TRANSMITTER PERFORMANCE COMPARISON

| References                       | Chiang 2014 [11] | Hafez 2013 [12] | Menolfi 2012 [13] | Kim 2015 [14] |       | This Work              |       |
|----------------------------------|------------------|-----------------|-------------------|---------------|-------|------------------------|-------|
| <b>CMOS Technology</b>           | 65nm             | 65nm            | 32nm              | 14nm          |       | 16nm                   |       |
| <b>Driver Topology</b>           | CML              | CML             | SST               | SST           |       | SST                    |       |
| <b>TX FFE</b>                    | 2-tap            | No EQ           | 4-tap             | 4-tap         |       | 3-Tap                  |       |
| <b>Data Rate (Gb/s)</b>          | 60               | 48              | 28                | 28            | 40    | 28                     | 32.75 |
| <b>Power (mW)*</b>               | 374.9            | 59.4**          | 217               | 195           | 518   | 101.84                 | 120.8 |
| <b>Energy Efficiency (pJ/b)*</b> | 6.25             | 1.24            | 7.75              | 6.95          | 12.95 | 3.64                   | 3.69  |
| <b>RJ (ps-rms)</b>               | 0.461            | 0.251           | N/A               | 0.33          | 0.51  | 0.22                   | 0.22  |
| <b>TJ(BER=10^-12)(ps-pp)</b>     | 5.33             | N/A             | 6***              | 10.72         | 12.89 | 6.52                   | 6.49  |
| <b>ISI(ps-pp)</b>                | N/A              | N/A             | N/A               | 4.13          | 4.46  | 2.68                   | 2.44  |
| <b>Area(mm^2)</b>                | 2.1              | 0.4             | 0.036             | 0.028         |       | 0.22\*\*\*\*           |       |

\* PLL power excluded

\*\* TX is not fully featured

\*\*\* Measurements were done by on-wafer probing

\*\*\*\* Includes decap for 0.9V/1.2V supplies and Vrefp/Vrefn/Clock Regulators

the low-frequency output pattern with different main-cursor settings. The independent control of the main-cursor setting provides support for the 10G-KR style of equalization. The DCD performance of the TX versus DCC code is shown in Fig. 22. The 6-bit DCC allows the duty cycle to be adjusted from 48% to 52.5%. The minimum DCD measured is 60 fs. Fig. 23 shows the output eye diagram of the TX measured at 32.75 Gb/s with PRBS7 input. With equalization, the ISI improves from 5.59 to 2.44 ps-pp. The measured total jitter (TJ) and random jitter (RJ) are 6.49 ps-pp and 220 fs-rms, respectively. The measurements are done with a channel loss of about 2 dB due to the PCB trace, connector, and cable in the setup (Fig. 15). Fig. 24 shows the worst jitter performance at 31.2 Gb/s with PRBS7 input, measured across temperature variations from $-40^{\circ}\text{C}$ to $100^{\circ}\text{C}$ and with $\pm 5\%$ variations in supplies, for TX output swings of 250, 600, and 900 mV<sub>pp</sub>.

The TX consumes 120.8 mW, giving an energy efficiency of 3.69 pJ/b. Of the 120.8-mW power consumption, the



Fig. 25. Power consumption breakdown of transmitter.

contributions of the TX datapath, the clock buffers, and the TX front end are 39.6, 72.7, and 8.5 mW, respectively (Fig. 25). Table I shows the performance comparison with recent TXs.
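The headline figures quoted in this section can be cross-checked with simple arithmetic. The Python sketch below recomputes the energy efficiency from power and data rate, sums the power breakdown, and expresses the measured TJ as a fraction of the unit interval; the variable and function names are illustrative, and the UI-fraction figure is a derived quantity that is not reported in the measurements.

```python
def energy_efficiency_pj_per_bit(power_mw, rate_gbps):
    """Power in mW divided by data rate in Gb/s gives pJ/b directly."""
    return power_mw / rate_gbps


# Power breakdown as quoted in the text (mW).
breakdown_mw = {"datapath": 39.6, "clock buffers": 72.7, "front end": 8.5}
total_mw = sum(breakdown_mw.values())

# Unit interval in ps at 32.75 Gb/s, and the measured TJ as a fraction of it.
ui_ps = 1e3 / 32.75
tj_ui = 6.49 / ui_ps

if __name__ == "__main__":
    print(round(total_mw, 1))
    print(round(energy_efficiency_pj_per_bit(120.8, 32.75), 2))
    print(round(energy_efficiency_pj_per_bit(101.84, 28.0), 2))
    print(round(tj_ui, 3))
```

The breakdown sums to the reported 120.8 mW, and the computed efficiencies agree with the 3.69 and 3.64 pJ/b entries for this work in Table I.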

## VI. CONCLUSION

In this paper, a 32.75-Gb/s voltage-mode TX with three-tap FFE is presented. The TX uses two regulators to set its output swing and common-mode. A hybrid impedance control scheme, which uses slice-based control for coarse impedance tuning and analog loop-based control for fine impedance tuning, is also presented. With the hybrid impedance control scheme, good swing accuracy and return loss performance can be achieved across a wide range of TX output swing (from 0.25 to 0.9 V<sub>pp</sub>). The TX also has an FIR compensation circuit to compensate for the data-dependent current due to equalization. The TX is fabricated in 16-nm FinFET CMOS, and consumes 120.8 mW from 0.9- and 1.2-V supplies, giving an energy efficiency of 3.69 pJ/b. The TX achieves TJ of 6.49 ps-pp and RJ of 220 fs-rms at 32.75 Gb/s with a bit error rate of 1e-12.

## ACKNOWLEDGMENT

The authors would like to thank S. Pan and C. H. Lee for measurement support, and R. D. Cruz, B. Wan, and M. Yee for layout support.

## REFERENCES

- [1] Y. Frans *et al.*, “A 0.5–16.3 Gb/s fully adaptive-flexible reach transceiver for FPGA in 20 nm CMOS,” *IEEE J. Solid-State Circuits*, vol. 50, no. 8, pp. 1932–1944, Aug. 2015.
- [2] P. Upadhyaya *et al.*, “A fully-adaptive wideband 0.5–32.75 Gb/s FPGA transceiver in 16 nm FinFET CMOS technology,” in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2016, pp. 1–2.
- [3] N. Kocaman *et al.*, “A 3.8 mW/Gbps quad-channel 8.5–13 Gbps serial link with a 5 tap DFE and a 4 tap transmit FFE in 28 nm CMOS,” *IEEE J. Solid-State Circuits*, vol. 51, no. 4, pp. 881–892, Apr. 2016.
- [4] J. F. Bulzacchelli *et al.*, “A 28-Gb/s 4-tap FFE/15-tap DFE serial link transceiver in 32-nm SOI CMOS technology,” *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 3232–3248, Dec. 2012.
- [5] S. Saxena, R. K. Nandwana, and P. K. Hanumolu, “A 5 Gb/s energy-efficient voltage-mode transmitter using time-based de-emphasis,” *IEEE J. Solid-State Circuits*, vol. 49, no. 8, pp. 1827–1836, Aug. 2014.
- [6] Y. Lu, K. Jung, Y. Hidaka, and E. Alon, “Design and analysis of energy-efficient reconfigurable pre-emphasis voltage-mode transmitters,” *IEEE J. Solid-State Circuits*, vol. 48, no. 8, pp. 1898–1909, Aug. 2013.
- [7] M. Kossel *et al.*, “A T-coil-enhanced 8.5 Gb/s high-swing SST transmitter in 65 nm bulk CMOS with <−16 dB return loss over 10 GHz bandwidth,” *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2905–2920, Dec. 2008.
- [8] Y. Frans *et al.*, “A 40-to-64 Gb/s NRZ transmitter with supply-regulated front-end in 16 nm FinFET,” *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 3167–3177, Dec. 2016.
- [9] Y.-H. Song, R. Bai, K. Hu, H.-W. Yang, P. Y. Chiang, and S. Palermo, “A 0.47–0.66 pJ/bit, 4.8–8 Gb/s I/O transceiver in 65 nm CMOS,” *IEEE J. Solid-State Circuits*, vol. 48, no. 5, pp. 1276–1289, May 2013.
- [10] Y. H. Song and S. Palermo, “A 6-Gbit/s hybrid voltage-mode transmitter with current-mode equalization in 90-nm CMOS,” *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no. 8, pp. 491–495, Aug. 2012.
- [11] P.-C. Chiang, H.-W. Hung, H.-Y. Chu, G.-S. Chen, and J. Lee, “60 Gb/s NRZ and PAM4 transmitters for 400 GbE in 65 nm CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 42–43.
- [12] A. A. Hafez *et al.*, “A 32-to-48 Gb/s serializing transmitter using multiphase sampling in 65nm CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 38–39.
- [13] C. Menolfi *et al.*, “A 28 Gb/s source-series terminated TX in 32 nm CMOS SOI,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 334–335.
- [14] J. Kim *et al.*, “A 16-to-40 Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14nm CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 60–62.
- [15] K. L. Chan *et al.*, “A 32.75-Gb/s voltage mode transmitter with 3-tap FFE in 16 nm CMOS,” in *IEEE A-SSCC Dig. Tech. Papers*, Nov. 2016, pp. 233–236.
- [16] L. Yu *et al.*, “25G long reach cable link equalization optimization,” in *Proc. DesignCon*, Jan. 2016.



**Kok Lim Chan** received the B.Eng. and M.Eng. degrees in electrical engineering from Nanyang Technological University, Singapore, in 1998 and 2000, respectively, and the Ph.D. degree in electrical engineering from the University of California, San Diego, CA, USA, in 2008.

He is currently a Staff Engineer with Xilinx, Singapore. His current research interests include high-speed high-performance data converters and analog front ends for high-speed serial links.



**Kee Hian Tan** received the B.S. and M.Eng. degrees in electrical engineering from the National University of Singapore, Singapore, in 2000 and 2001, respectively.

He joined Marvell Asia, Singapore, in 2001 and LSI, Singapore, in 2008, where he was involved in hard-disk preamplifier design. Since 2012, he has been leading a team of circuit designers developing analog front-end and high-speed digital circuit blocks for SerDes PHY with Xilinx, Singapore.



**Yohan Frans** received the B.S. degree in electrical engineering from the Bandung Institute of Technology, Bandung, Indonesia, in 1995, and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 2001.

From 2001 to 2012, he was with Rambus Inc., USA, where he was involved in high-performance and low-power serial links and memory interfaces as a Circuit Design Engineer, a Circuit Architect, and a Design Manager. Since 2012, he has been with Xilinx Inc., San Jose, CA, USA. He is currently leading design teams as a Senior Engineering Director with the Xilinx Serdes Technology Group, developing high-speed wireline transceivers for advanced field-programmable gate arrays. His current research interests include high-speed mixed-signal circuit design, serial-link architecture, transmitter/receiver design, phase-locked loop/delay-locked loop, memory interfaces, and low-power circuit architectures.



**Siok Wei Lim** received the B.E. degree in electrical engineering from the University of Malaya, Malaysia, in 2006, and the M.Sc. degree in electrical engineering from the National University of Singapore, Singapore, in 2013.

She was with Spansion, Malaysia, for 1.5 years. She joined Xilinx, Singapore, in 2008, where she is currently an IC Design Engineer with the SerDes Technology Group.



**Jay Im** received the B.S. degree in electronics engineering and the M.S. degree in physics from Seoul National University, Seoul, South Korea, and the Ph.D. degree in physics from Ohio State University, Columbus, OH, USA, in 2001. His thesis was on nanometer-scale study of metal contacts to wide bandgap semiconductor materials, including SiO<sub>2</sub>, SiC, and GaN.

From 2001 to 2004, he was a Process/Device Engineer with Silicon Storage Technology Inc., Sunnyvale, CA, USA, where he was involved in developing CMOS compatible flash memory IP. In 2004, he joined Xilinx Inc., San Jose, CA, USA, where he was involved in various projects, including embedded nonvolatile memory technology development, E-fuse process design kit development, custom design of one time programmable macro embedded in field-programmable gate array, and register transfer language design of interface block for foundry IP, and led the preproduction diagnostic module group designing and testing silicon test vehicles for the new technology nodes. He is currently leading a group of circuit designers developing analog front-end and high-speed digital circuit blocks for high-speed SERDES PHY, and is responsible for the development of next-generation SerDes IP to be embedded in Xilinx field-programmable gate array.



**Arianne Roldan** received the B.S. degree in computer engineering and the M.S. degree in electrical engineering from the University of the Philippines, Quezon City, Philippines, in 2001 and 2003, respectively.

From 2003 to 2007, she was with Canon Information Technologies, Philippines, Intel Technology, Philippines, and Chartered Semiconductor (now Global Foundries), Singapore, consecutively. She participated in design, verification, and validation of various IPs such as memory controller, flash memory, IOs, and standard cells. In 2008, she joined Xilinx, Singapore, where she was involved in different field-programmable gate array building blocks. She is currently a Staff Design Engineer with the SERDES Technology Group, Xilinx. Her current research interests include high-speed, low-power circuit design and methodology development.



**Nakul Narang** received the B.Tech. degree from the Jaypee University of Information Technology, Solan, India, in 2007, and the M.S. degree from Nanyang Technological University, Singapore, in 2015.

He is currently a Designer with Xilinx Asia Pacific Pte. Ltd, Singapore. His current research interests include high-speed regulators, high-speed mixed-signal circuits, and transceiver architectures.



**Chin Yang Koay** received the B.S. degree in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2013, where he is currently pursuing the M.Sc. degree in electrical engineering.

In 2013, he joined the SerDes Technology Group, Xilinx, Singapore, where he has been involved in high-speed wireline transceiver circuits. His current research interests include wireline transceiver and mixed-signal circuit design.



**Hongyuan Zhao** was born in Shenyang, China, in 1987. He received the B.S. degree in electronics and information engineering from Dalian Jiaotong University, Dalian, China, in 2010, and the M.Sc. degree in microelectronics engineering from the National University of Singapore, Singapore, in 2015.

In 2015, he joined Xilinx Asia Pacific Pte. Ltd, Singapore, where he was involved in high-speed mixed-signal design and high-speed digital circuit design. His current research interests include SerDes architecture, and high-speed mixed-signal design and optimization.



**Parag Upadhyaya** was born in Kathmandu, Nepal. He received the B.S.E.E., M.S.E.E., and Ph.D. (Hons.) degrees in electrical engineering from Washington State University, Pullman, WA, USA, in 2000, 2005, and 2008, respectively.

From 2001 to 2003, he was with Cypress Semiconductor, Austin, TX, USA, where he was involved in the design of high-speed wireline/optical transceivers. He is currently a Senior Design Manager with Xilinx, San Jose, CA, USA, and leads the development of high-speed transceivers for field-programmable gate array applications. He has authored or co-authored over 40 journal, conference, and book chapter publications. His current research interests include high-speed wireline and wireless transceivers, semiconductor device physics, and low-power RFICs.

Dr. Upadhyaya was a recipient of the Outstanding Ph.D. Student Award in 2008 at Washington State University, the Best Paper Award in 1998 for his work on tribology, and the Best Poster Awards in 2004 and 2006 for his works on subharmonic mixer and novel mixed-signal circuits for wireline/wireless transceiver, respectively.



**Ping-Chuan Chiang** was born in Yilan, Taiwan, in 1987. He received the B.S.E.E. degree from National Tsing Hua University (NTHU), Hsinchu, Taiwan, in 2009, the M.S.E.E. degree from National Chiao-Tung University (NCTU), Hsinchu, in 2011, and the Ph.D.E.E. degree from National Taiwan University (NTU), Taipei, Taiwan, in 2015.

From 2015 to 2016, he served as an Administrative Cadre at the Taiwan Military Service. In 2016, he joined the SerDes Technology Group, Xilinx, Singapore, where he has been involved in high-speed wireline transceiver circuits. His current research interests include RF techniques, high-caliber SerDes transceivers, wireline communication theory, and optoelectronic integrated circuits.

Dr. Chiang received the Presidential Award from NTHU and NCTU in 2006 and 2008, respectively, the Outstanding Undergraduate Student Award from Ho-Ping power company in 2008, the IEEE Solid-State Circuits Society Student Travel Grant Award at the 2013 ISSCC, the NTU Outstanding Student Research Award in 2014, and the NOVATEK Fellowship in 2010, 2013, and 2014. He has been a reviewer for the *IEEE Journal of Solid-State Circuits*, the *IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS*, and the *IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS*, since 2016.



**Ken Chang** (M'99–SM'14) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1994 and 1999, respectively.

From 1999 to 2010, he was with Rambus Inc., USA. He led several projects including 5 Gb/s/lane–12 Gb FlexIO interface for CELL processors and 16 and 20 Gb/s low-power memory interfaces exploring various signaling techniques. Since 2010, he has been with Xilinx Inc., San Jose, CA, USA, and led the SerDes Technology Group, focused on developing multistandard SerDes IPs for field-programmable gate arrays, covering top line rates of 10, 28, and 56 Gb/s, all capable of long reach transmission. His current research interests include high-speed mixed-signal CMOS circuit design, transmitter and receiver design, clock data recovery, equalization, phase-locked loop/delay-locked loop design, circuit noise analysis, signal integrity analysis, and mixed-signal design methodology. He has authored or co-authored more than 40 IEEE conference/journal publications and holds more than 30 U.S. patents in the high-speed link area. He co-authored the 2008 and 2014 CICC best regular papers.

Dr. Chang is the Technical Program Co-Chair of the 2017 VLSI circuit symposium and has served on technical program committees for ISSCC and CICC.