

# A 64-Gb/s 1.4-pJ/b NRZ Optical Receiver Data-Path in 14-nm CMOS FinFET

Ilter Ozkaya, *Student Member, IEEE*, Alessandro Cevrero, *Member, IEEE*,  
 Pier Andrea Francese, *Senior Member, IEEE*, Christian Menolfi, *Member, IEEE*,  
 Thomas Morf, *Senior Member, IEEE*, Matthias Brändli, Daniel M. Kuchta, *Senior Member, IEEE*,  
 Lukas Kull, *Senior Member, IEEE*, Christian W. Baks, Jonathan E. Proesel, *Senior Member, IEEE*,  
 Marcel Kossel, *Senior Member, IEEE*, Danny Luu, *Student Member, IEEE*,  
 Benjamin G. Lee, *Senior Member, IEEE*, Fuad E. Doany, Mounir Meghelli, *Member, IEEE*,  
 Yusuf Leblebici, *Fellow, IEEE*, and Thomas Toifl, *Senior Member, IEEE*

**Abstract**—A 64-Gb/s high-sensitivity non-return to zero receiver (RX) data-path is demonstrated in the 14-nm-bulk FinFET CMOS technology. To achieve high sensitivity, the RX incorporates a transimpedance amplifier whose gain and bandwidth are co-optimized with a 1-tap decision feedback equalization (DFE). The DFE, which operates at quarter-rate, features a look-ahead speculation to relax DFE timing to 4 unit-interval. The analog front end includes a transadmittance transimpedance inductorless variable gain amplifier, resulting in a low power and compact front end. The RX, wirebonded to a discrete GaAs photodiode, achieves an energy efficiency of 1.4 pJ/bit and  $-5\text{-dBm}$  optical modulation amplitude while recovering PRBS-7 data (bit-error-rate  $< 10^{-12}$ ) modulated by a VCSEL driver with a 2-tap feed forward equalization (FFE) (main + precursor) over 7 m of graded-index 50/125- $\mu\text{m}$  multimode fiber. The measured sensitivities at 56 and 32 Gb/s are  $-9$ - and  $-13\text{-dBm}$  optical modulation amplitude, respectively.

**Index Terms**—Decision feedback equalization (DFE), I/O link, non-return to zero (NRZ), optical receiver, receiver (RX), self-timed comparator, sensitivity, shunt feedback, transimpedance amplifier (TIA), variable gain amplifier (VGA).

## I. INTRODUCTION

WITH the ever-increasing growth of cloud computing and big data applications, serial data-rates beyond 50 Gb/s/lane will eventually be required in wireline communications both inside and between racks in data-centers [1], [2]. The industry currently has developed standards that cover 56-Gb/s (OIF-CEI-56G) electrical interfaces to meet those demands. However, copper interconnects experience frequency-dependent losses, which require complex and power hungry equalization techniques. For instance, a 30-dB additional channel loss decreases the energy efficiency of the overall transceiver by roughly ten times [3]. On the contrary, optical interconnects have negligible frequency-dependent loss and can transport data over long distances with little or no equalization. As a consequence, optical links are potential candidates to replace copper interconnects for distances as short as 1 m in the near future [4]. To become a viable alternative to well-established electrical links, optical solutions must have competitive power and area efficiency metrics.

Manuscript received April 24, 2017; revised July 4, 2017; accepted July 20, 2017. Date of publication November 10, 2017; date of current version November 21, 2017. This paper was approved by Guest Editor Azita Emami. This work was supported in part by the European Union's Seventh Framework Programme under Grant FP7/2007-2013 and in part by the ADDAPT Project under Grant 619197. (*Corresponding author: Ilter Ozkaya*.)

I. Ozkaya is with IBM Research–Zurich, CH-8803 Rüschlikon, Switzerland, and also with the Microelectronic Systems Laboratory, Swiss Federal Institute of Technology, CH-1015 Lausanne, Switzerland (e-mail: ilt@zurich.ibm.com).

A. Cevrero, P. A. Francese, C. Menolfi, T. Morf, M. Brändli, L. Kull, M. Kossel, D. Luu, and T. Toifl are with IBM Research–Zurich, CH-8803 Rüschlikon, Switzerland (e-mail: ace@zurich.ibm.com).

D. M. Kuchta, C. W. Baks, J. E. Proesel, B. G. Lee, F. E. Doany, and M. Meghelli are with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598 USA.

Y. Leblebici is with the Microelectronic Systems Laboratory, Swiss Federal Institute of Technology, CH-1015 Lausanne, Switzerland.

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2734913

Commercially available optical engines are typically located on the board-edge, which still requires an electrical link (such as CEI-56G-VSR) to reach the host chip (either a CPU, a switch ASIC, or an FPGA). Integrating the optical transceiver in the first-level host chip package will improve I/O power efficiency and provide higher bandwidth density. Processor modules with integrated optics have been proposed since 2005 for high-performance computing [5] but have not been considered for data-centers due to high cost. However, at 56 Gb/s and beyond, replacing electrical cables with optics is expected to have similar or lower cost per Gb/s.

Implementing such a system is not without challenges. First, the optical RX circuits need to achieve high sensitivity to accommodate the optical loss in the signal path. Furthermore, low-power operation is a key requirement to meet the thermal design power constraints of the package. For server chips, this is on the order of 150–200 W. To address these requirements, this paper describes a low-power optical RX, measured up to 64 Gb/s with high sensitivity [ $-5\text{-dBm}$  optical modulation amplitude (OMA) at 64 Gb/s] in 14-nm CMOS FinFET. High sensitivity is achieved thanks to a low bandwidth analog front end (AFE) co-optimized with a 1-tap decision feedback equalization (DFE). DFE timing is met thanks to a look-ahead speculative implementation running at quarter-rate (16 GHz). While many state-of-the-art links rely on SiGe circuits [6], this paper proposes a CMOS implementation, since it provides the capability to add digital logic (such as fully digital retiming circuits) and enables integration in a large CMOS die.



Fig. 1. RGC and SFR TIA topology and circuit implementation.

Moreover, the front end has been optimized to reduce inductor count, leading to an extremely compact design ( $0.028 \text{ mm}^2$ ) suitable for multi-channel implementations.

The remainder of this paper is organized as follows. Section II reviews transimpedance amplifier (TIA) architectures, introduces the proposed design methodology to achieve high sensitivity, and compares the proposed techniques with previous work. Section III presents the RX architecture describing in detail circuit implementation of each building block. Section IV describes the experimental setup and presents measurements of the fabricated optical RX. Finally, Section V provides the conclusion.

## II. DESIGN CONSIDERATIONS FOR HIGH SENSITIVITY OPTICAL RX

Since the overall sensitivity of an optical link is determined by the RX TIA, an in-depth signal and noise analysis of the TIA will be provided in this section. First, we derive the optimum SNR in the absence of equalization. Then, we describe how SNR can be improved with a TIA co-optimized with a DFE. The insights described here can guide circuit designers in the TIA optimization for high-speed optical RXs before resorting to time-consuming circuit simulations.

In principle, a simple resistor connected to ground or a common-mode voltage can be used to convert the current of a photo-diode (PD) into a voltage. However, the large capacitance associated with pad and PD severely limits the bandwidth. Hence, to improve gain-bandwidth (GBW) product, active structures are commonly employed. Regulated cascode (RGC) [4] and shunt feedback resistor (SFR) are the most common TIA circuit topologies [7] (Fig. 1). Both RGC and SFR architectures have been compared in the 14-nm FinFET technology. The two designs were optimized for maximum SNR at 64 Gb/s for a given dc gain (50 dB in this case). Our investigations showed that while RGC and SFR TIAs have comparable GBW products and power dissipation, SFR TIA has better SNR, since the output integrated noise of the RGC is approximately 20% higher. The main reason stems from the noise generated by the bias current source, as described in [7]. It must be noted that the dc current source, which is used to cancel the average PD current, is much smaller (a few hundred  $\mu\text{A}$ ) than the bias current required for a high-bandwidth RGC TIA (at least 2–3 mA), resulting in much smaller noise contribution. Moreover, the average PD current must be subtracted from both SFR and RGC TIAs, which means the current that biases the RGC TIA is an extra noise source. Furthermore, in advanced technology nodes operating at low supply voltages (below 1 V), RGC design is challenging due to limited voltage headroom. Thus, the SFR topology was used for the 64-Gb/s optical RX and will be the focus of the rest of this paper.

### A. Signal Analysis

An SFR TIA together with its model is shown in Fig. 2.

In the circuit,  $C_{\text{pin}}$  and  $C_{\text{out}}$  are the sum of PD capacitance and pad capacitance, and the load capacitance driven by TIA, respectively. In the model,  $C_{\text{in}}$  includes the gate-to-source capacitances ( $C_{\text{gs}}$ ) of the transistors in addition to  $C_{\text{pin}}$ .  $C_f$  consists of the drain-to-gate capacitance ( $C_{\text{gd}}$ ), whereas  $R_{\text{out}}$  is the equivalent output resistance of the inverter. The total transconductance of the CMOS inverter is denoted as  $gm$ . The transimpedance of the TIA can be expressed in terms of those parameters as

$$Z_t(s) = \frac{R_{\text{out}}(gmR_{\text{fb}} - 1 - sC_fR_{\text{fb}})}{1 + gmR_{\text{out}} + sD_{t1} + s^2D_{t2}} \quad (1)$$

where

$$D_{t1} = C_{\text{in}}(R_{\text{out}} + R_{\text{fb}}) + C_fR_{\text{fb}}(1 + gmR_{\text{out}}) \quad (2)$$

$$D_{t2} = R_{\text{fb}}R_{\text{out}}C_{\text{in}}(C_{\text{out}} + C_f). \quad (3)$$

The number of parameters in this equation can be reduced, since some of them are coupled together by the technology node. We can write

$$A = gmR_{\text{out}} \quad (4)$$

$$f_t = \frac{gm}{2\pi C_{\text{gate}}} \quad (5)$$

where  $A$  is the intrinsic gain and  $f_t$  is the transit frequency of the transistors. For a given biasing condition ( $V_{DD}/2$  in this case),  $A$  and  $f_t$  are known. Note that  $C_{\text{gate}}$  is the total gate capacitance, and transistor-level simulations show that it can be distributed between  $C_{\text{gs}}$  and  $C_{\text{gd}}$  with a ratio of 2/3 and 1/3.  $C_{\text{gd}}$  corresponds to  $C_f$ , whereas  $C_{\text{gs}}$  contributes to the input capacitance,  $C_{\text{in}}$ . It must be emphasized that the inclusion of  $C_f$  is extremely important due to the Miller effect. Omitting this capacitance would result in oversimplified models, which may invalidate the further analysis.

We can further reduce the number of parameters by fixing the value of  $C_{\text{pin}}$ , which is determined by the pad and PD capacitances. In the implemented design, the PD and pad capacitances were approximately 60 and 40 fF, respectively. Hence,  $C_{\text{pin}} = 100 \text{ fF}$ . In the given two-pole system, a numerical analysis shows that a high  $C_{\text{out}}/C_{\text{pin}}$  ratio results in peaking in the transfer function. For a maximally flat response, a ratio of less than 0.25 must be satisfied. Therefore,  $C_{\text{out}}$  is taken as 25 fF and it defines the input capacitance of the following stage.
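The parameter reduction above can be sketched numerically. The following Python fragment is a minimal illustration, assuming placeholder values for the transit frequency (`f_t = 300e9`) and intrinsic gain (`a_int = 5.0`) that are not taken from this paper; the function names are likewise hypothetical. It derives the model capacitances from $gm$ through (4) and (5) and evaluates (1)–(3) as printed.

```python
import math

def sfr_tia_model(gm, r_fb, f_t=300e9, a_int=5.0, c_pin=100e-15, c_out=25e-15):
    """Reduce the SFR TIA model to gm and R_fb using (4) and (5).

    f_t and a_int are illustrative placeholders, not values from the paper.
    """
    c_gate = gm / (2 * math.pi * f_t)      # total gate capacitance, from (5)
    return {
        "gm": gm,
        "r_fb": r_fb,
        "r_out": a_int / gm,               # from (4)
        "c_in": c_pin + 2 * c_gate / 3,    # C_pin plus C_gs (2/3 of C_gate)
        "c_f": c_gate / 3,                 # C_gd (1/3 of C_gate)
        "c_out": c_out,
    }

def transimpedance(freq_hz, m):
    """Evaluate Z_t(s) at s = j*2*pi*f using (1)-(3) as printed."""
    s = 2j * math.pi * freq_hz
    a = m["gm"] * m["r_out"]
    d_t1 = m["c_in"] * (m["r_out"] + m["r_fb"]) + m["c_f"] * m["r_fb"] * (1 + a)
    d_t2 = m["r_fb"] * m["r_out"] * m["c_in"] * (m["c_out"] + m["c_f"])
    num = m["r_out"] * (m["gm"] * m["r_fb"] - 1 - s * m["c_f"] * m["r_fb"])
    return num / (1 + a + s * d_t1 + s * s * d_t2)

model = sfr_tia_model(gm=50e-3, r_fb=250.0)
dc_gain_ohm = abs(transimpedance(0.0, model))
print(f"dc transimpedance: {dc_gain_ohm:.1f} ohm")
```

With these placeholders, only `gm` and `r_fb` remain free, which mirrors the two-parameter design space discussed in the text.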

Fig. 2. SFR TIA circuit diagram and small signal model.

Fig. 3. TIA 3-dB bandwidth for various $gm$ values.

Fig. 4. $gm$ and $R_{fb}$ values for a target TIA bandwidth.

As a result, the full design space can be defined by two parameters: $R_{\text{fb}}$ and $gm$. In Fig. 3, $R_{\text{fb}}$ is swept for three different $gm$ values to find the 3-dB bandwidth of the TIA. Since the curves are monotonic, any given bandwidth and $gm$ pair corresponds to a unique $R_{fb}$ value. Fig. 4 shows $gm$ versus $R_{fb}$ curves with four different TIA bandwidths. The plot clearly shows that for constant bandwidth, $R_{fb}$ needs to be reduced toward large $gm$. This is due to self-loading. The parasitic feedback capacitor $C_f$ increases due to large transistor size and $R_{fb}C_f$ becomes the dominant pole.
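A sweep of this kind can be reproduced with a short script. The sketch below is only an illustration: it evaluates (1)–(3) as printed, uses assumed placeholder values for $f_t$ (300 GHz) and $A$ (5) that do not come from this paper, and locates the 3-dB point with a simple geometric frequency scan (about 1% resolution) rather than any method used by the authors.

```python
import math

def zt_magnitude(freq_hz, gm, r_fb, r_out, c_in, c_f, c_out):
    """|Z_t| from (1)-(3) as printed, evaluated at s = j*2*pi*f."""
    s = 2j * math.pi * freq_hz
    a = gm * r_out
    d_t1 = c_in * (r_out + r_fb) + c_f * r_fb * (1 + a)
    d_t2 = r_fb * r_out * c_in * (c_out + c_f)
    num = r_out * (gm * r_fb - 1 - s * c_f * r_fb)
    return abs(num / (1 + a + s * d_t1 + s * s * d_t2))

def bandwidth_3db(gm, r_fb, f_t=300e9, a_int=5.0, c_pin=100e-15, c_out=25e-15):
    """First frequency at which |Z_t| falls to 1/sqrt(2) of its dc value.

    f_t and a_int are placeholder assumptions, not values from the paper.
    """
    c_gate = gm / (2 * math.pi * f_t)
    args = (gm, r_fb, a_int / gm, c_pin + 2 * c_gate / 3, c_gate / 3, c_out)
    target = zt_magnitude(0.0, *args) / math.sqrt(2)
    freq = 1e8
    while freq < 1e12 and zt_magnitude(freq, *args) > target:
        freq *= 1.01          # geometric scan, roughly 1% resolution
    return freq

for r_fb in (250.0, 500.0, 1000.0):
    bw_ghz = bandwidth_3db(50e-3, r_fb) / 1e9
    print(f"R_fb = {r_fb:6.0f} ohm -> f_3dB ~ {bw_ghz:.1f} GHz")
```

Because the bandwidth falls monotonically with `r_fb` for a fixed `gm`, a bisection on `r_fb` could be layered on top of `bandwidth_3db` to trace constant-bandwidth curves in the manner of Fig. 4.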

Fig. 5(a)–(c) shows main cursor [ $V_{\text{Tap}(0)}$ ] together with pre and post cursors at 64 Gb/s across $gm$, $R_{fb}$ pairs on a constant bandwidth curve. The main cursor and inter-symbol-interference (ISI) components are derived from the pulse response (1-$\mu$A amplitude current pulse applied) of the system described by (1). As expected, lowering the bandwidth increases the main cursor due to larger transimpedance gain, which is approximately $R_{fb}$. At the same time, bandwidth limitation increases ISI terms. The worst case signal can be calculated as follows:

$$V_{\text{Signal}} = V_{\text{Tap}(0)} - |V_{\text{Tap}(-1)}| - \sum_{n=1}^{\infty} |V_{\text{Tap}(n)}| \quad (6)$$

as shown in Fig. 5(d) for three different TIA bandwidths. Without equalization, the optimal bandwidth is approximately 0.4 times the data-rate (25 GHz at 64 Gb/s). Interestingly, for a given bandwidth, the point with the largest $V_{\text{Signal}}$ does not correspond to the $gm$ with the largest $R_{fb}$ due to peaking in the second-order system.
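As a small worked illustration of (6), the following Python sketch computes the worst-case signal from a list of UI-spaced pulse-response samples. The tap values are invented for the example and do not correspond to any curve in Fig. 5; the function name is hypothetical.

```python
def worst_case_signal(taps, main_index):
    """Worst-case signal per (6): main cursor minus the first precursor
    and all postcursors, each taken in absolute value.

    taps       -- pulse-response samples spaced by one UI
    main_index -- position of V_Tap(0) within taps
    """
    main = taps[main_index]
    pre = abs(taps[main_index - 1]) if main_index >= 1 else 0.0
    post = sum(abs(v) for v in taps[main_index + 1:])
    return main - pre - post

# Invented example taps (arbitrary units): V_Tap(-1), V_Tap(0), V_Tap(1), ...
example_taps = [0.02, 0.50, 0.15, 0.05, 0.01]
print(worst_case_signal(example_taps, main_index=1))
```

In practice the infinite sum in (6) is truncated once the postcursors have decayed to a negligible level.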

### B. Noise Analysis

The analysis above considers only the signal. To find the TIA specifications that provide optimum SNR, the noise characteristics must also be analyzed.

The main noise sources of the implemented TIA are amplifier and  $R_{fb}$  thermal noises, which are shown on the TIA model in Fig. 6(a) as  $I_{\text{ngm}}^2$  and  $I_{\text{nR}}^2$ , respectively.

The power spectral densities (PSDs) can be expressed as

$$I_{\text{ngm}}^2 = 4kT\gamma gm \quad (7)$$

$$I_{\text{nR}}^2 = \frac{4kT}{R_{fb}} \quad (8)$$

where $\gamma$ is the noise excess factor of the transistor. It must be noted that the noise generated by the output resistance ($R_{\text{out}}$) of the inverter can be omitted, since its PSD is a factor of $A$ smaller than the $gm$ PSD.

In order to simplify the equivalent output noise calculation, we can split the  $I_{\text{nR}}^2$  noise source as proposed in [8] [Fig. 6(b)]. The output squared noise can be expressed as

$$V_{\text{nR}}^2 = \int_0^{\infty} I_{\text{nR}}^2 |Z_t + Z_o|^2 \, df \quad (9)$$

$$V_{\text{ngm}}^2 = \int_0^{\infty} I_{\text{ngm}}^2 |Z_o|^2 \, df \quad (10)$$

$$V_{\text{nout}}^2 = V_{\text{nR}}^2 + V_{\text{ngm}}^2 \quad (11)$$

where $Z_t$ is the transimpedance of the TIA expressed in (1), and $Z_o$ is the output impedance of the TIA, which can be calculated as

$$Z_o(s) = \frac{R_{\text{out}}(1 + sR_{\text{fb}}(C_f + C_{\text{in}}))}{1 + gmR_{\text{out}} + sD_{o1} + s^2D_{o2}} \quad (12)$$

where

$$D_{o1} = C_{\text{in}}(R_{\text{out}} + R_{\text{fb}}) + C_{\text{out}}R_{\text{out}} + C_fR_{\text{fb}}(1 + gmR_{\text{out}}) \quad (13)$$

$$D_{o2} = R_{\text{fb}}R_{\text{out}}(C_{\text{in}}C_{\text{out}} + C_{\text{in}}C_f + C_fC_{\text{out}}). \quad (14)$$

Fig. 5. Tap sizes for different TIA bandwidths. (a) 15 GHz. (b) 25 GHz. (c) 35 GHz. (d) Signal.

Fig. 6. (a) Noise sources in TIA. (b) Equivalent split model.

Thus, for any $gm$ and $R_{\text{fb}}$, we can calculate $V_{\text{nout}}$ and find the $\text{SNR}_{\text{wc}}$ under worst case ISI condition as follows:

$$\text{SNR}_{\text{wc}} = \frac{V_{\text{Signal}}^2}{V_{\text{nout}}^2}. \quad (15)$$
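The noise integrals (9)–(11) and the ratio in (15) lend themselves to direct numerical evaluation. The sketch below is one possible way to do so, not the authors' tool flow: it uses a midpoint rule truncated at 1 THz, assumes $\gamma = 1$ and $T = 300$ K, and takes (1)–(3) and (12)–(14) exactly as printed. The component values in the usage lines are placeholders.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def impedances(freq_hz, gm, r_fb, r_out, c_in, c_f, c_out):
    """Return (Z_t, Z_o) at s = j*2*pi*f, following (1)-(3) and (12)-(14)."""
    s = 2j * math.pi * freq_hz
    a = gm * r_out
    d_t1 = c_in * (r_out + r_fb) + c_f * r_fb * (1 + a)
    d_t2 = r_fb * r_out * c_in * (c_out + c_f)
    d_o1 = c_in * (r_out + r_fb) + c_out * r_out + c_f * r_fb * (1 + a)
    d_o2 = r_fb * r_out * (c_in * c_out + c_in * c_f + c_f * c_out)
    z_t = r_out * (gm * r_fb - 1 - s * c_f * r_fb) / (1 + a + s * d_t1 + s * s * d_t2)
    z_o = r_out * (1 + s * r_fb * (c_f + c_in)) / (1 + a + s * d_o1 + s * s * d_o2)
    return z_t, z_o

def output_noise_sq(gm, r_fb, r_out, c_in, c_f, c_out,
                    gamma=1.0, temp_k=300.0, f_max=1e12, steps=20000):
    """Integrate (9) and (10) with a midpoint rule and sum them as in (11).

    gamma, temp_k, f_max and steps are assumptions made for this sketch.
    Returns (V_nout^2, V_nR^2, V_ngm^2) in V^2.
    """
    i_ngm_sq = 4 * K_B * temp_k * gamma * gm   # (7)
    i_nr_sq = 4 * K_B * temp_k / r_fb          # (8)
    df = f_max / steps
    v_nr_sq = 0.0
    v_ngm_sq = 0.0
    for k in range(steps):
        z_t, z_o = impedances((k + 0.5) * df, gm, r_fb, r_out, c_in, c_f, c_out)
        v_nr_sq += i_nr_sq * abs(z_t + z_o) ** 2 * df
        v_ngm_sq += i_ngm_sq * abs(z_o) ** 2 * df
    return v_nr_sq + v_ngm_sq, v_nr_sq, v_ngm_sq

def snr_worst_case(v_signal, v_nout_sq):
    """SNR_wc as defined in (15)."""
    return v_signal ** 2 / v_nout_sq

# Placeholder component values, chosen only to exercise the functions.
total_sq, from_r, from_gm = output_noise_sq(50e-3, 250.0, 100.0, 118e-15, 9e-15, 25e-15)
print(f"integrated output noise ~ {math.sqrt(total_sq) * 1e6:.0f} uV rms")
```

The worst-case signal passed to `snr_worst_case` would come from the pulse response per (6), scaled to the same input current as the noise calculation.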

Fig. 7. $\text{SNR}_{\text{wc}}$ for three different bandwidths without equalization.

Fig. 7 shows $\text{SNR}_{\text{wc}}$ for different TIA bandwidths. The figure clearly demonstrates that the optimum $\text{SNR}_{\text{wc}}$ corresponds to roughly 0.4 times the data-rate [25 GHz for 64 Gb/s non-return to zero (NRZ)]. Note that $\text{SNR}_{\text{wc}}$ given in Fig. 7 is normalized to $1\text{-}\mu\text{A}$ input. One can easily calculate the optical sensitivity by using the equation

$$\text{Sens} = \text{dBm}\left( \frac{2N}{\sqrt{\text{SNR}|_{1\,\mu\text{A}}}\;\text{Res}} \right) \quad (16)$$

where  $N$  is the number of standard deviations ( $\sigma$ ) required to reach certain bit-error-rate (BER) (for  $10^{-12}$  BER  $N = 7$ ), and  $Res$  is the responsivity of the PD in terms of  $\mu A/W$ .
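To make (16) concrete, the Python sketch below converts a normalized SNR into an optical sensitivity. It assumes the responsivity multiplies the square root of the SNR in the denominator (as dimensional consistency requires, since $\text{SNR}$ is a power ratio), and it accepts the responsivity in A/W and converts units internally. The numbers in the usage line are invented for illustration.

```python
import math

def sensitivity_dbm(snr_1ua, res_a_per_w, n_sigma=7.0):
    """Optical sensitivity (OMA, in dBm) following (16).

    snr_1ua     -- worst-case SNR (power ratio) for a 1-uA input
    res_a_per_w -- photodiode responsivity in A/W
    n_sigma     -- number of standard deviations (7 for a BER of 1e-12)
    """
    i_pp_ua = 2 * n_sigma / math.sqrt(snr_1ua)   # required peak-to-peak current in uA
    p_oma_w = i_pp_ua * 1e-6 / res_a_per_w       # required optical swing in W
    return 10 * math.log10(p_oma_w / 1e-3)       # W -> dBm

# Invented example: sqrt(SNR) of 0.14 per uA and a 0.5-A/W photodiode.
print(f"{sensitivity_dbm(0.0196, 0.5):.2f} dBm")
```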

### C. Equalization

The SNR of an optical RX can be further improved by lowering the bandwidth (which increases dc gain) below 0.4 times the data-rate and using equalization techniques to recover the bandwidth loss, provided that the noise added by the equalizer is small enough. For instance, in [9], a continuous time linear equalizer (CTLE) is cascaded to a low bandwidth TIA to achieve a sensitivity of $-8$-dBm OMA with a PD of $0.45$ A/W at $25$ Gb/s. When projected to a $64$-Gb/s design, this approach has several shortcomings: 1) at high speed, pushing the CTLE bandwidth to high frequency is power hungry and often requires the use of passive inductors, which are undesired in compact multi-channel optical RXs; 2) at high frequency, the CTLE has limited capability to compensate for the multiple poles in the signal path; and 3) the CTLE amplifies high-frequency noise and generates noise itself, degrading SNR compared with an ideal equalizer where only ISI is cancelled.

In [10] and [11], a low bandwidth front end is combined with double sampling and dynamic offset modulation to achieve  $-4.7$ -dBm sensitivity at  $24$  Gb/s and  $-6.8$ -dBm sensitivity at  $25$  Gb/s, respectively. Reference [10] uses a resistor to convert the photo-current into a voltage, while in [11], the resistor is replaced by an SFR TIA leading to higher sensitivity. The drawback of this technique at higher speeds is the difficulty of driving the sample and hold capacitors. Moreover, transient effects, such as kickback noise, are expected to become more and more important as the timing between the two consecutive samples shrinks.



Fig. 8. Maximum SNR for  $n$ -tap DFE.

DFE is another well-known technique, which has the capability to remove postcursor ISI with little or no noise penalty. Infinite-impulse-response (IIR) DFE approximates a long tail of the pulse response using a passive $RC$ circuit as feedback filter, and subtracts the approximated tail from the input signal, removing all the postcursors with a single tap. However, the pulse response characteristics of the passive $RC$ network should match those of the TIA for accurate ISI cancellation. This restricts the design of the TIA to first-order circuits. A more important limitation comes from the requirement that the total feedback delay must be less than 1 unit-interval (UI). References [4] and [12] implemented this technique to achieve $-5$-dBm sensitivity at $9$ Gb/s and $-5.8$-dBm sensitivity at $20$ Gb/s, respectively. DFE can also be implemented with an FIR filter in the feedback path. Assuming that the first $m$ postcursors are equalized by the DFE, the signal after equalization can be expressed as follows:

$$V_{Signal_{Eq}} = V_{Tap(0)} - |V_{Tap(-1)}| - \sum_{n=m+1}^{\infty} |V_{Tap(n)}|. \quad (17)$$

Fig. 8 shows SNR plots versus TIA bandwidth for different number of DFE taps based on the model provided earlier. The figure shows the maximum achievable SNR and the required TIA bandwidth. Without equalization, the optimal bandwidth is  $20$ – $25$  GHz. A 1-tap DFE lowers the TIA bandwidth for maximum SNR down to  $15$  GHz while improving the SNR by approximately a factor of  $\sqrt{2}$ , which corresponds to a  $1.5$ -dB sensitivity improvement. Additional DFE taps provide marginal SNR gain, while increasing the power cost of the underlying circuit implementation.
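The effect of the number of DFE taps in (17) can be illustrated with a few lines of Python. The tap values below are invented, and the dB conversion assumes that the quoted factor of $\sqrt{2}$ applies to the signal-to-noise amplitude ratio, so that the required optical power scales by the same factor.

```python
import math

def equalized_signal(taps, main_index, m):
    """Worst-case signal after an m-tap DFE, following (17).

    The first m postcursors are assumed to be cancelled perfectly;
    the first precursor and the remaining postcursors still close the eye.
    """
    main = taps[main_index]
    pre = abs(taps[main_index - 1]) if main_index >= 1 else 0.0
    residual = sum(abs(v) for v in taps[main_index + 1 + m:])
    return main - pre - residual

def optical_improvement_db(amplitude_ratio):
    """Optical-power improvement (dB) for a given amplitude-SNR ratio."""
    return 10 * math.log10(amplitude_ratio)

# Invented low-bandwidth pulse response: V_Tap(-1), V_Tap(0), V_Tap(1), ...
example_taps = [0.02, 0.50, 0.15, 0.05, 0.01]
for m in range(4):
    print(m, round(equalized_signal(example_taps, 1, m), 3))
print(f"{optical_improvement_db(math.sqrt(2)):.2f} dB")
```

In this made-up example, most of the benefit comes from the first tap, which is qualitatively the trend reported for Fig. 8.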

Similar analysis can be found in [4] and [13] leading to similar conclusions. However, the circuit model in [4] omits parasitic feedback capacitance ($C_f$) and isolates the output node from feedback resistor $R_{fb}$ via an ideal buffer neglecting the loading effect of $R_{fb}$. Moreover, in both publications, the zero and pole locations of noise transfer function were set based on certain assumptions. Although the analysis provides good insight for certain conditions, it does not derive the complete analytical solution. In our approach, we deduce all the equations from any given set of parameters without simplifying the TIA model, which covers a larger design space.

Fig. 9. Top-level schematic of the RX.

Fig. 10. TIA schematic. (a) Proposed. (b) Replica.

DFE can be implemented either as direct feedback or as speculative DFE. On the one hand, direct feedback DFE enables many taps to be equalized with relatively low complexity. But the feedback loop delay still needs to be less than 1 UI. On the other hand, with speculative DFE, complexity grows exponentially with the number of taps, yet the timing restriction can be relaxed using certain techniques as will be explained in Section III. Since more than 1-tap DFE gives only marginal advantage in terms of SNR while increasing circuit complexity and power consumption significantly, we decided to use a 1-tap speculative DFE.

## III. ARCHITECTURE AND CIRCUITS

The block diagram of the RX architecture is shown in Fig. 9. At the input of the RX, the average photo-current is cancelled via a 12-bit current DAC, whose control bits are set off-chip. During measurements, the input of the current DAC is swept to find the value that minimizes the offset at the output of the AFE. The ac portion of the signal is converted into a voltage signal by the TIA and amplified by a VGA afterward. The amplified signal is sampled by four-way time-interleaved slicers to generate 1-tap speculative decisions together with edge information for baud-rate CDR (the CDR loop is not included in this design and will be added in future versions). The total number of slicers driven by the AFE is 12. After that, the speculative decisions are aligned to a single quarter-rate clock and are resolved by the look-ahead DFE logic. After being demultiplexed from quarter-rate to 1/32 rate, 32 final decisions are fed to an on-chip pseudo random bit sequence (PRBS) checker, which is synchronized to the C32 clock (2 GHz) to calculate BER. The quarter-rate clock phases $\phi_{0,90,180,270}$ are generated through a broadband IQ generator, which is driven by a quarter-rate clock generated off-chip.

### A. TIA

The schematic of the proposed SFR TIA is given in Fig. 10(a). The feedback path is composed of a 1.1-kΩ resistor and NMOS transistors in parallel to adjust the equivalent resistance down to 250 Ω. Since the signal swing is small, the transistors stay in linear region behaving as linear resistances. This approach reduces the parasitic capacitance compared with a solution that consists of an array of passive resistors with switches, because passive resistors typically have larger parasitics than transistors. Moreover, the switches need to be large enough to minimize the ON-resistance, further increasing the area and capacitive load.

Fig. 11. Bandwidth extension with series inductance. (a) Frequency response. (b) Pulse response.

Fig. 12. Self-referenced TIA transient pulse response.

A series inductance is added to extend the bandwidth of the TIA. TIA transimpedance as a function of the series peaking inductance value is shown in Fig. 11(a) together with the pulse response in Fig. 11(b). A 400-pH inductance provides a maximally flat response, which corresponds to minimum group delay.

In this design, the input node of the TIA is used as the negative output ($V_{outn}$) to serve as a differential signal to the output node of the TIA ($V_{outp}$), rather than placing a replica TIA to generate a reference voltage [14], as shown in Fig. 10(b). As a result, the transimpedance gain becomes $R_{fb}$ instead of $R_{fb}[A_{eq}/(A_{eq} + 1)]$, where $A_{eq}$ is equal to $gm(R_{fb} \parallel R_{out})$. This improvement is shown in Fig. 12. Both single-ended outputs and the differential voltage ($V_{outp} - V_{outn}$) are given in Fig. 12. Note that ($V_{outp} - V_{outn}$) is shifted to the right in order to match the sampling points for better comparison of the two cases. The main cursor [$V_{Tap(0)}$] of $V_{outp} - V_{outn}$ is larger than the main cursor of the single-ended output ($V_{outp}$), whereas $V_{Tap(-1)}$ and $V_{Tap(2)}$ are the same. Since $V_{Tap(1)}$ will be equalized by DFE, the increase in this ISI term does not degrade signal integrity.
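The gain difference between the two referencing schemes can be checked with a short calculation. The sketch below uses hypothetical component values ($gm = 50$ mS, $R_{fb} = 700\ \Omega$, $R_{out} = 100\ \Omega$) chosen only to show the size of the effect; they are not design values from this paper.

```python
def parallel(r_a, r_b):
    """Parallel combination of two resistances."""
    return r_a * r_b / (r_a + r_b)

def replica_referenced_gain(gm, r_fb, r_out):
    """dc transimpedance when only the TIA output is used: R_fb * A_eq / (A_eq + 1)."""
    a_eq = gm * parallel(r_fb, r_out)
    return r_fb * a_eq / (a_eq + 1)

def self_referenced_gain(r_fb):
    """dc transimpedance when the output is taken across R_fb."""
    return r_fb

# Hypothetical values for illustration only.
gm, r_fb, r_out = 50e-3, 700.0, 100.0
single = replica_referenced_gain(gm, r_fb, r_out)
diff = self_referenced_gain(r_fb)
print(f"replica-referenced: {single:.0f} ohm, self-referenced: {diff:.0f} ohm")
```

The lower the loop gain $A_{eq}$, the larger the relative benefit of taking the output differentially across the feedback resistor.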

Another advantage of using the self-referenced TIA is that it generates less noise compared with the replica design.



Fig. 13. Self-referenced TIA NSD.

In Fig. 13, three noise spectral densities (NSDs) are given. The red solid line is the NSD of the TIA with a replica with no filtering capacitor ($C_{FLT}$) at its output. Adding 600 fF of $C_{FLT}$ shapes the NSD as indicated by the blue dotted curve. The green dashed curve is the NSD of the proposed self-referenced TIA. There are two main reasons for the reduction in noise. The first is that there is no replica to generate noise. Note that the replica generates as much noise as the TIA itself, increasing the integrated noise by a factor of $\sqrt{2}$. High-frequency noise of the replica TIA can be filtered out by using a large capacitance at the output node. However, this would prevent the replica TIA from tracking the main TIA behavior for high-frequency supply disturbances, compromising power supply rejection ratio (PSRR). The second reason for noise reduction is that in the self-referenced TIA, the low-frequency noise components of the transistors are converted into common-mode noise. This explains why no flicker noise is observed in the NSD of the self-referenced TIA illustrated in Fig. 13.

Fig. 14. Self-referenced TIA schematic for PSRR analysis.

To investigate the PSRR of the self-referenced TIA, it is critical to separate the input and output capacitances connected to either VDD or GND, as shown in Fig. 14. It is easy to deduce that the currents $i_i$ and $i_o$ become zero if the following condition is met:

$$\frac{C_{i1}}{C_{i2}} = \frac{gm_p}{gm_n} = \frac{C_{o1}}{C_{o2}}. \quad (18)$$

In that case, the current through the feedback resistor becomes zero, resulting in perfect cancellation of power supply ripple. In our implementation, the TIA drives a VGA whose input stage consists of a CMOS inverter with equal-sized PMOS and NMOS transistors. Thus, the TIA output capacitance is divided equally between VDD and GND ($C_{o1} = C_{o2}$). Also, the PMOS and NMOS transistors that compose the CMOS inverter of the TIA are sized equally, which matches the two transconductances in this technology ($gm_p = gm_n$). Dividing the input capacitance equally between VDD and GND is more challenging. It consists of three parasitic capacitances. The first one is the $C_{gs}$ of the transistors, which is already split equally between GND and VDD due to sizing of the transistors. The second parasitic capacitance at the input node is the pad capacitance. In general, this capacitance couples the pad to substrate (connected to GND), creating an imbalance between $C_{i1}$ and $C_{i2}$. One solution to circumvent this problem is to add a power grid below the pad in the lowest metal layer to couple the pad equally to VDD and GND. In the used 13-level metal stack, this corresponds to a pad capacitance increase of approximately 5%, which has no impact on sensitivity. The last portion of the input capacitance comes from the PD. This capacitance is coupled to the supply voltage of the PD outside the chip and cannot be balanced as required for perfect PSRR. However, it is decoupled from the TIA input by both the bond wire and peaking inductances at high frequencies. On the other hand, in a replica TIA design, the PD capacitance and the bond wire inductance also create an imbalance unless a dummy PD is placed in the packaging, which may not be desirable in practical applications, since it would increase both the cost of packaging and the pitch width of the multi-channel design. The PSRR simulation results of the self-referenced TIA and the replica TIA are compared in Fig. 15. The worst PSRR performances across all process corners for both cases are also provided in the plot. As expected, the worst case is the slow-NMOS-fast-PMOS corner (fast-NMOS-slow-PMOS performs only slightly better), since it is the corner that degrades the $gm_p/gm_n$ ratio the most.

To summarize, the proposed self-referenced TIA provides larger swing, lower noise, and similar PSRR as compared with a replica TIA while consuming half the power and layout area. Moreover, the TIA has zero offset by design.

Fig. 15. PSRR comparison of self-referenced TIA and replica TIA.

### B. VGA

The high losses in the optical path may result in very small current signals on the PD. As an example, a $-12\text{-dBm}$ OMA signal on a $0.5\text{-A/W}$ responsivity PD corresponds to a $32\text{-}\mu\text{A}_{\text{pp}}$ photo-current. This signal is converted into a voltage signal with a dc gain of around $700\ \Omega$, resulting in $22.4\text{ mV}$ at the output of the TIA. ISI further reduces the signal down to $10\text{--}15\text{ mV}$. Moreover, as explained in Section II-A, the capacitive load at the output node of the TIA must be low, which means the slicers, creating approximately $100\text{-fF}$ load, cannot be driven by the TIA directly.
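The arithmetic in this example can be retraced with a short Python sketch. The function names are hypothetical; the inputs are the values quoted in the text, and the unrounded results come out marginally below the rounded figures used there.

```python
def dbm_to_watts(power_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10 ** (power_dbm / 10)

def tia_output_swing(oma_dbm, responsivity_a_per_w, gain_ohm):
    """Return (photo-current, TIA output voltage), both peak-to-peak."""
    i_pp = dbm_to_watts(oma_dbm) * responsivity_a_per_w
    return i_pp, i_pp * gain_ohm

i_pp, v_pp = tia_output_swing(-12.0, 0.5, 700.0)
print(f"photo-current: {i_pp * 1e6:.1f} uA pp, TIA output: {v_pp * 1e3:.1f} mV pp")
```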

In order to amplify the signal and drive the slicers, a VGA was designed and placed after TIA. The schematic of the VGA is given in Fig. 16. It consists of two transadmittance transimpedance (TAS-TIS) stages. This structure is also known as the Cherry-Hooper amplifier in literature [15]. The first TAS-TIS stage is a CMOS-based design to match the common-mode output of the TIA, which is around  $VDD/2$ . It must be noted that the voltage gain on the TAS is very small (around 1) due to the low input impedance of the TIS stage. This reduces the Miller effect on the  $C_{gd}$  of the input transistors, minimizing the total input capacitance. The dc gain of the first stage is  $gm_1 R_{f1}$  and can be controlled by changing  $R_{f1}$ .

The output common mode of the first stage is adjusted to match the input common-mode requirements of the second stage by injecting current into the input of the TIS stage creating a voltage drop on the feedback resistors $R_{f1}$. The output signal of the first VGA stage is still pseudo differential. That is, the TIA outputs $V_{outn}$ and $V_{outp}$ are amplified separately (by the same gain), resulting in a larger swing in $Y_p$. As a result, the formal definition of the output common mode [$(Y_p + Y_n)/2$] is not a constant signal. Instead, the output common mode is sensed from the low swing output node via a low-pass filter, as shown in Fig. 16.

The second TAS-TIS is current mode logic (CML)-based and converts the pseudo differential signal at its input into a fully differential signal at its output. The input is connected to two differential pairs. The inner pair is sized at a quarter of the outer pair. And the current generated by the inner pair is multiplied by two on the NMOS mirrors to double the transconductance provided by the outer pair. Compared with a standard CML stage, the effective transconductance increases by a factor of 2, whereas the power consumption and input capacitance increase only by a factor of 1.25. The resistor $R_p$ extends the bandwidth of the VGA. In the TIS, both NMOS and PMOS differential pairs contribute to gain by reusing the tail current, which minimizes the power consumption. The gain of the second TAS-TIS can be controlled both by changing the feedback resistors $R_{f2}$ and by changing the tail current $I_{SS}$.

Fig. 16. VGA schematic.

Fig. 17. Slicers and look-ahead DFE block diagram.

In nominal settings, the VGA provides a gain of around 20 dB with a bandwidth of 20 GHz while driving a load of 100 fF. The total power consumption of the VGA is 21 mW and its input referred noise is around 350  $\mu\text{V}_{\text{rms}}$ . Since the TIA output integrated noise is 820  $\mu\text{V}_{\text{rms}}$ , the VGA noise reduces the SNR by 10%.
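The SNR penalty quoted above can be estimated by adding the two noise contributions in quadrature. The sketch below assumes the TIA output noise and the VGA input-referred noise are uncorrelated and referred to the same node; under that assumption the amplitude SNR drops by roughly 8%, which is of the same order as the 10% stated in the text.

```python
import math

def combined_noise_rms(*sources_rms):
    """Root-sum-square of uncorrelated rms noise contributions."""
    return math.sqrt(sum(v ** 2 for v in sources_rms))

def amplitude_snr_loss(base_rms, added_rms):
    """Fractional reduction of signal-to-noise amplitude ratio caused by added_rms."""
    return 1 - base_rms / combined_noise_rms(base_rms, added_rms)

tia_noise_uv = 820.0   # TIA output integrated noise, from the text
vga_noise_uv = 350.0   # VGA input-referred noise, from the text
total = combined_noise_rms(tia_noise_uv, vga_noise_uv)
loss = amplitude_snr_loss(tia_noise_uv, vga_noise_uv)
print(f"total noise ~ {total:.0f} uV rms, amplitude SNR loss ~ {loss * 100:.1f}%")
```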

### C. Speculative DFE

In conventional speculative DFE implementations, the following timing requirement should be met [16], [17]:

$$t_{c2q} + t_{mux} + t_{setup} < 1UI \quad (19)$$

where  $t_{c2q}$  is the clock-to- $Q$  delay,  $t_{mux}$  is the mux delay time, and  $t_{setup}$  is the setup time of the latch. In addition to that, in a quarter-rate design, the  $t_{c2q}$  of the comparator should be smaller than 2 UI, which comes as an additional timing constraint. By moving the DFE into the digital domain, both of those problems can be avoided at the cost of increased circuit complexity and power consumption.

Fig. 18. Comparator schematic.

Fig. 19. Clock path. (a) Block diagram. (b) Quadrature corrector stage.

Fig. 20. RX micrograph and layout.

Fig. 17 shows the block diagram of the implemented 1-tap DFE. The differential input ($V_{p,n}$) is sampled by the comparators, which are driven by quarter-rate clocks. After that, the signals are aligned to a single quarter-rate clock and the look-ahead DFE [1], [18] resolves the speculation. Then, the look-ahead signals $L_H(n)$ and $L_L(n)$ are calculated from the speculative decisions $D_H(n)$ and $D_L(n)$. Finally, $D(3)$ resolves the speculation. The dependence of each bit on the previous bit is broken in the new speculative array, which results in a relaxed timing constraint of

$$t_{c2q} + t_{mux} + t_{setup} < 4UI. \quad (20)$$
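To put constraints (19) and (20) in absolute terms, the short sketch below converts the data-rate into the time available for $t_{c2q} + t_{mux} + t_{setup}$. The function names are ours and the listed data-rates are simply those discussed in this paper.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Duration of one unit interval (UI) in picoseconds."""
    return 1e3 / data_rate_gbps


def loop_budget_ps(data_rate_gbps: float, n_ui: int) -> float:
    """Time available for t_c2q + t_mux + t_setup when the loop may take n_ui UI."""
    return n_ui * unit_interval_ps(data_rate_gbps)


for rate in (32.0, 56.0, 64.0):
    conventional = loop_budget_ps(rate, 1)   # constraint (19)
    look_ahead = loop_budget_ps(rate, 4)     # constraint (20)
    print(f"{rate:4.0f} Gb/s: 1 UI = {conventional:6.2f} ps, 4 UI = {look_ahead:6.2f} ps")
```

At 64 Gb/s this gives 15.625 ps for the conventional loop and 62.5 ps for the look-ahead loop.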

Fig. 21. Measurement setup.

All the digital logic up to $L_H(n)$ and $L_L(n)$ is feed-forward and can be pipelined to meet the timing. In our application, a two-stage pipeline was required to close timing. The clock is also fed forward to allow deeper logic between the two flip-flops. In $RC$-extracted simulations, the look-ahead DFE logic was functional up to 85 Gb/s at 800-mV supply. Therefore, the DFE does not limit the data-rate of the optical RX.
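The look-ahead principle can be illustrated with a small behavioral model. This is a generic sketch of a block-of-four 1-tap look-ahead DFE, not the gate-level logic of Fig. 17; the names `slice_bit`, `h1`, and the threshold sign convention are assumptions made for illustration. The model checks that the look-ahead structure, in which only the last decision of a block feeds the next block, produces the same bits as a bit-by-bit DFE.

```python
import random


def slice_bit(sample: float, prev_bit: int, h1: float) -> int:
    """1-tap DFE decision: the threshold is +h1 or -h1 depending on the previous bit."""
    threshold = h1 if prev_bit else -h1
    return 1 if sample > threshold else 0


def sequential_dfe(samples, h1, prev_bit=0):
    """Reference model: every decision waits for the previous one (1-UI loop)."""
    out = []
    for x in samples:
        prev_bit = slice_bit(x, prev_bit, h1)
        out.append(prev_bit)
    return out


def look_ahead_dfe(samples, h1, prev_bit=0, block=4):
    """Block-based speculative DFE; only the last bit of a block feeds the next block."""
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        # Speculative decisions: D_H assumes the previous bit was 1, D_L assumes 0.
        d_h = [slice_bit(x, 1, h1) for x in chunk]
        d_l = [slice_bit(x, 0, h1) for x in chunk]
        # Look-ahead signals: bit n given that the bit before the block was 1 (L_H) or 0 (L_L).
        l_h, l_l = [d_h[0]], [d_l[0]]
        for n in range(1, len(chunk)):
            l_h.append(d_h[n] if l_h[n - 1] else d_l[n])
            l_l.append(d_h[n] if l_l[n - 1] else d_l[n])
        # Feedback path: the last decision of the previous block selects one branch.
        decided = l_h if prev_bit else l_l
        out.extend(decided)
        prev_bit = decided[-1]
    return out


random.seed(0)
samples = [random.uniform(-1.0, 1.0) for _ in range(64)]
print(look_ahead_dfe(samples, h1=0.3) == sequential_dfe(samples, h1=0.3))
```

Everything up to `l_h` and `l_l` depends only on the current block of samples, which is why that part can be pipelined freely; the single select driven by `prev_bit` is the only loop, and it has a full block of four bits to settle.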

Fig. 22. Pulse response at 64 Gb/s and frequency response.

Fig. 23. Eye diagrams. (a) 36 Gb/s. (b) 56 Gb/s. (c) 64 Gb/s. (d) DFE equivalent eye at 64 Gb/s.

The schematic of the comparator is given in Fig. 18. The first stage consists of two differential pairs whose sources are connected directly to VDD. The clock transistors are connected as cascode as in a Lewis–Gray comparator [19]. Compared with a conventional dynamic comparator, where the clock transistors are located on the tail of the input pair, this topology allows a higher common mode for the same CK-Q delay. The nominal common mode of this latch is 350 mV, which matches well with the output common mode of the AFE. The second stage is a self-timed latch with cross-coupled inverters. The threshold of each comparator is controlled via a 9-bit voltage DAC (VDAC) whose control bits are stored in on-chip registers. The VDAC covers a range of $\pm \text{VDD}/2$ with a resolution of $\text{VDD}/512$. In order to minimize the layout area, the resistor string network of the VDAC is realized using the parasitic resistances of low-level metals [17].

The integral non-linearity of a test VDAC was measured to be below  $\pm 1$  LSB ( $\pm 1.75$  mV).
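The VDAC range and resolution quoted above can be related to a register code as follows. This is a minimal sketch: the 0.9-V supply is taken from the VDAL column of Table I, while the offset-binary coding (mid-scale code gives zero threshold) and the function name are hypothetical, since the coding is not specified here.

```python
VDD = 0.9                    # volts; assumed VDAC supply (VDAL in Table I)
N_BITS = 9
LSB = VDD / 2 ** N_BITS      # resolution of VDD/512


def vdac_threshold(code: int, vdd: float = VDD) -> float:
    """Threshold in volts for an offset-binary code (hypothetical coding)."""
    if not 0 <= code < 2 ** N_BITS:
        raise ValueError("code must fit in 9 bits")
    return (code - 2 ** (N_BITS - 1)) * vdd / 2 ** N_BITS


print(f"LSB       : {1e3 * LSB:.2f} mV")
print(f"min / max : {vdac_threshold(0):+.3f} V / {vdac_threshold(511):+.3f} V")
```

With a 0.9-V supply, one LSB is about 1.76 mV, consistent with the INL bound quoted above.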

### D. Clocking

Fig. 19(a) shows the block diagram of the clock path. The optical RX receives a quarter-rate differential clock (Ckp–Ckn). After being buffered by a CML driver, it is converted into a pseudo differential CMOS clock by a CML-to-CMOS converter. Then, coarse estimates of I and Q clocks are generated via CMOS delay stages.

These estimated I and Q clocks pass through three cascaded quadrature corrector stages to reduce the I-Q error gradually. Each quadrature corrector stage consists of a two-stage differential injection-locked oscillator with four injection points, as shown in Fig. 19(b) [20]. The inverters labeled A, B, and C in Fig. 19(b) have different driving strengths. Specifically, the ratio C/B plays an important role in the oscillator characteristics: a C/B ratio that is too large or too small may cause latching instead of oscillation. The size of A determines the tradeoff between locking range and IQ error correction coefficient. As A gets stronger, the locking range increases, but the ability to correct the IQ error drops. In our implementation, the sizes of A, B, and C were 1, 4, and 5, respectively.

The locking range of the IQ-generator is extended by the additional capacitors  $C_S$  connected to differential nodes of the oscillator. Those capacitors reduce the natural oscillation frequency of the injection-locked oscillator for lower data-rates. The clock path covers a range of 22–67 Gb/s.

## IV. MEASUREMENT RESULTS

The RX, implemented in 14-nm bulk FinFET technology, was wirebonded to a custom PCB for testing. The surface-illuminated GaAs p-i-n diode has a diameter of 16  $\mu\text{m}$ , 69-fF capacitance, and 0.52-A/W responsivity. The bandwidth of the PD is around 25 GHz [21].

The micrograph and layout of the optical RX are given in Fig. 20. The active area of the RX is around 150 $\mu\text{m} \times 190\ \mu\text{m}$ thanks to the use of only a single inductor. The small size of the optical RX enables it to be used in multi-channel designs.

The measurement setup is given in Fig. 21. A 56-Gb/s bit pattern generator overclocked to 64 Gb/s drives a SiGe driver with 2-tap FFE (one precursor and one main cursor with a ratio of around 0.45). The SiGe chip modulates a high-speed 850-nm VCSEL, which is connected to a 7-m OM2 MM fiber. The performance characteristics of the VCSEL and SiGe driver can be found in [6]. An optical attenuator is connected in the signal path for sensitivity measurements. OMA is calculated as follows:

$$\text{OMA} = 2 \frac{Av_{\text{cur}}}{Res} \frac{ER - 1}{ER + 1} \quad (21)$$

where ER is the extinction ratio (measured:  $ER = 1.8$ ), Res is the responsivity of the PD ( $Res = 0.52$ ), and  $Av_{\text{cur}}$  is the average current of the PD.
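Equation (21) can be evaluated directly from a reading of the average photodiode current. The sketch below uses the responsivity and extinction ratio given above; the 250-$\mu$A current is a hypothetical reading chosen only for illustration, and the function names are ours.

```python
import math

RESPONSIVITY_A_PER_W = 0.52   # Res, from the text
EXTINCTION_RATIO = 1.8        # ER, from the text (linear ratio)


def oma_watts(avg_current_a: float,
              responsivity: float = RESPONSIVITY_A_PER_W,
              er: float = EXTINCTION_RATIO) -> float:
    """OMA = 2 * (Av_cur / Res) * (ER - 1) / (ER + 1), as in (21)."""
    avg_power_w = avg_current_a / responsivity
    return 2.0 * avg_power_w * (er - 1.0) / (er + 1.0)


def to_dbm(power_w: float) -> float:
    """Convert a power in watts to dBm."""
    return 10.0 * math.log10(power_w / 1e-3)


# Hypothetical average photodiode current (not a measured value).
i_avg = 250e-6
oma = oma_watts(i_avg)
print(f"I_avg = {1e6 * i_avg:.0f} uA -> OMA = {1e3 * oma:.3f} mW = {to_dbm(oma):.2f} dBm")
```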

The RX is driven by a differential quarter-rate clock. TX and RX clocks are phase-locked together by a high-precision 1-GHz reference signal. A digital PRBS checker and correlator engine running at 2 GHz (1/32 of the baud-rate) is integrated on the chip to assist with measurements. Measurement results stored in on-chip registers are then transmitted off-chip via a three-wire serial interface.

Fig. 24. BER versus DFE slicing level and phase contour plot at 60 Gb/s.

TABLE I  
RX POWER BREAKDOWN AT 64 Gb/s

|            | VDAH (1 V) | VDAL (0.9 V) | VDD (0.9 V) | Energy (fJ/b) |
|------------|------------|--------------|-------------|---------------|
| TIA        | 3.5 mA     |              |             | 55            |
| VGA        | 21.5 mA    |              |             | 335           |
| 12 SLICERS |            | 18 mA        |             | 253           |
| CLK BUF    |            | 25 mA        |             | 351           |
| ALIGNER    |            | 9.7 mA       |             | 136           |
| VDACs      |            | 1 mA         |             | 14            |
| DFE logic  |            |              | 12.3 mA     | 172           |
| DMUX 4:32  |            |              | 7.1 mA      | 100           |
| Total      | 25 mA      | 53.7 mA      | 19.4 mA     | 1416          |

The pulse response of the link at the output of the AFE is measured by applying a repeated 8-bit pattern of “10000000” to the optical TX. Note that this pulse response includes all the bandwidth limitations of the VCSEL, PD, and AFE of the RX as well as the FFE at the TX. The correlator is configured to count the ratio of “1”s to the total number of bits received from a single comparator. The measurement is repeated by changing the threshold of the comparator via the VDAC, and the ratios of “1”s are recorded for each point. When combined, the recorded points give a cumulative distribution function (cdf) with respect to the threshold voltage at a certain phase. Taking the derivative of the cdf gives a probability density function (pdf), whose mean value corresponds to the signal level and whose standard deviation is the rms noise at that point. The pdf measurement is repeated by stepping the phase of the RX external clock source.

Fig. 22 shows the single bit response of the link at 64 Gb/s and the corresponding transfer function. The 3-dB bandwidth matches very well with the optimum value shown in Fig. 8 for 1-tap DFE. As noted above, this pulse response is the combined response of all elements in the data-path, including the SiGe driver, VCSEL, PD, and RX AFE.
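The threshold-sweep procedure described above can be sketched numerically. The example below is illustrative only: it generates a synthetic sweep for a Gaussian-distributed signal (the 50-mV level, 5-mV rms noise, and 0.5-mV threshold step are made-up values), differentiates the resulting cdf, and recovers the level and the rms noise. The function name is ours.

```python
import math


def level_and_noise(thresholds, ones_ratio):
    """Estimate signal level (mean) and rms noise (std) from a threshold sweep.

    thresholds : comparator thresholds in volts, in ascending order
    ones_ratio : measured fraction of "1" decisions at each threshold
    """
    # The ratio of "1"s is P(signal > threshold), so the cdf is its complement.
    cdf = [1.0 - r for r in ones_ratio]
    # A finite-difference derivative of the cdf gives the probability mass per bin.
    mids, mass = [], []
    for k in range(len(thresholds) - 1):
        mids.append(0.5 * (thresholds[k] + thresholds[k + 1]))
        mass.append(cdf[k + 1] - cdf[k])
    total = sum(mass)
    mean = sum(m * p for m, p in zip(mids, mass)) / total
    var = sum(p * (m - mean) ** 2 for m, p in zip(mids, mass)) / total
    return mean, math.sqrt(var)


# Synthetic sweep with illustrative values (not measured data).
MU, SIGMA = 0.050, 0.005
thresholds = [k * 0.5e-3 for k in range(0, 201)]          # 0 to 100 mV
ratios = [0.5 * math.erfc((t - MU) / (SIGMA * math.sqrt(2))) for t in thresholds]
mean, sigma = level_and_noise(thresholds, ratios)
print(f"level = {1e3 * mean:.2f} mV, rms noise = {1e3 * sigma:.2f} mV")
```

Repeating this at each clock phase yields one point of the pulse response together with its noise.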

The eye diagrams are obtained by applying the procedure described above with a PRBS-7 input. The results are given in Fig. 23. At 36 Gb/s, the signal is not affected by ISI and the RX operates error free without DFE [Fig. 23(a)]. As the data-rate increases, ISI starts to close the unequalized eye, but the DFE speculative eyes are still open up to 64 Gb/s [see Fig. 23(c) and (d)]. The measured eye diagram matches the RC-extracted simulation very well, as shown in Fig. 23(b).

Fig. 24 shows the BER contour plot at  $10^{-9}$ .



Fig. 25. Bathtub curves. (a) 56 Gb/s. (b) 64 Gb/s.

TABLE II  
PERFORMANCE SUMMARY

|                          | [22]*      | [14]**         | [23]        | [6]*          | This Work         |
|--------------------------|------------|----------------|-------------|---------------|-------------------|
| Technology               | 65-nm CMOS | 32-nm CMOS SOI | 28-nm FDSOI | 130-nm BiCMOS | 14-nm FinFET Bulk |
| Data-Rate (Gb/s)         | 25–28      | 25             | 32          | 71            | 64                |
| Slicer Included          | no         | yes            | yes         | no            | yes               |
| Efficiency (pJ/b)        | 4.9        | 4.4            | 0.15        | 12            | 1.4               |
| Sensitivity (dBm OMA)*** | -9.7       | -10.9          | -8.8        | -4.2          | -5.5              |
| PD capacitance (fF)      | NA         | NA             | 120         | 61            | 69                |
| PD responsivity (A/W)    | 0.8        | 0.5            | 0.9         | 0.49          | 0.52              |
| Area ( $\text{mm}^2$ )   | 0.32       | 0.06           | 0.003       | NA            | 0.028             |

\*RX includes full rate output driver

\*\*RX includes burst mode CDR

\*\*\*Sensitivity reported at maximum rate



Fig. 26. Sensitivity and eye opening at -5-dBm OMA versus data-rate.

Fig. 25(a) and (b) show BER bathtub curves measured with a PRBS-7 sequence at 56 and 64 Gb/s, respectively, at -5- and -8-dBm OMA. The sensitivity of the RX, which is defined as the minimum optical power that satisfies a BER of $<10^{-12}$ in at least one point on the bathtub curve, is found for the data-rates between 32 and 64 Gb/s, as shown in Fig. 26, together with the eye opening at -5-dBm OMA. The sensitivities at 32, 56, and 64 Gb/s are -13, -9, and -5.5 dBm, respectively. The measured energy efficiency is 1.4 pJ/b at 64 Gb/s. The detailed power breakdown is given in Table I.
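The energy column of Table I follows from the supply voltages and currents divided by the data-rate. The short script below reproduces it from the table entries; the variable and function names are ours, and the computed total can differ from the tabulated 1416 fJ/b by a few fJ/b because the table rounds each row.

```python
DATA_RATE_BPS = 64e9

# (block, supply voltage in V, supply current in mA), copied from Table I.
BLOCKS = [
    ("TIA",        1.0,  3.5),
    ("VGA",        1.0, 21.5),
    ("12 slicers", 0.9, 18.0),
    ("CLK buffer", 0.9, 25.0),
    ("Aligner",    0.9,  9.7),
    ("VDACs",      0.9,  1.0),
    ("DFE logic",  0.9, 12.3),
    ("DMUX 4:32",  0.9,  7.1),
]


def energy_fj_per_bit(voltage_v: float, current_ma: float,
                      rate_bps: float = DATA_RATE_BPS) -> float:
    """Energy per bit in femtojoules: supply power divided by data-rate."""
    power_w = voltage_v * current_ma * 1e-3
    return power_w / rate_bps * 1e15


total_fj = 0.0
for name, volts, milliamps in BLOCKS:
    e = energy_fj_per_bit(volts, milliamps)
    total_fj += e
    print(f"{name:<11s} {e:6.1f} fJ/b")
print(f"{'Total':<11s} {total_fj:6.1f} fJ/b  (~{total_fj / 1e3:.1f} pJ/b)")
```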

The performance of the presented RX is compared with other high-speed optical RXs in Table II. To the best of our knowledge, this design is two times faster than previously published CMOS optical RXs and comparable to the fastest SiGe design while consuming less energy. Furthermore, it has better sensitivity than the state-of-the-art CMOS RXs at the same data-rate.

## V. CONCLUSION

A power-efficient NRZ optical RX has been integrated using the 14-nm FinFET bulk technology and characterized up to 64 Gb/s. High sensitivity is achieved by combining a low-bandwidth SFR TIA with a 1-tap speculative DFE. A methodology to optimize an inverter-based SFR TIA for maximum SNR was presented. Moreover, the SNR improvement versus the number of DFE taps was studied. The conclusion of this paper is that 1-tap DFE improves SNR by 3 dB, while keeping the speculative DFE logic relatively simple and low power. Any further increase in the number of taps provides only marginal SNR improvement, whereas the DFE logic complexity and power consumption increase exponentially due to speculation. The findings were also confirmed by the transistor-level circuit simulation. Thanks to the techniques presented in this paper combined with the use of advanced CMOS technology, we were able to achieve 64-Gb/s data-rate with 1.4-pJ/b energy efficiency while occupying 0.028-mm<sup>2</sup> silicon area.

## VI. ACKNOWLEDGMENT

The research leading to these results has been supported in part by the European Union's Seventh Framework Programme (FP7/2007–2013) in the ADDAPT project under grant agreement 619197.

## REFERENCES

- [1] T. Shibasaki *et al.*, “A 56-Gb/s receiver front-end with a CTLE and 1-tap DFE in 20-nm CMOS,” in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2014, pp. 1–2.
- [2] N. Tracy and T. Wuth. *OIF Next Generation Interconnect Framework*, Accessed on Apr. 2017. [Online]. Available: <http://www.oiforum.com/wp-content/uploads/OIF-FD-Client-400G-1T-01.0-1.pdf>
- [3] ISSCC. (2016) *Trends*. [Online]. Available: <http://isscc.org/doc/2016/ISSCC2016_TechTrends.pdf>
- [4] A. Sharif-Bakhtiar and A. C. Carusone, “A 20 Gb/s CMOS optical receiver with limited-bandwidth front end and local feedback IIR-DFE,” *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2679–2689, Nov. 2016.
- [5] E. G. Colgan *et al.*, “Direct integration of dense parallel optical interconnects on a first level package for high-end servers,” in *Proc. Electron. Compon. Technol. (ECTC)*, vol. 1. May 2005, pp. 228–233.
- [6] D. M. Kuchta *et al.*, “A 71-Gb/s NRZ modulated 850-nm VCSEL-based optical link,” *IEEE Photon. Technol. Lett.*, vol. 27, no. 6, pp. 577–580, Mar. 15, 2015.
- [7] B. Razavi, *Design of Integrated Circuits for Optical Communications*, 1st ed. New York, NY, USA: McGraw-Hill, 2003.
- [8] F. Y. Liu *et al.*, “10-Gbps, 5.3-mW optical transmitter and receiver circuits in 40-nm CMOS,” *IEEE J. Solid-State Circuits*, vol. 47, no. 9, pp. 2049–2067, Sep. 2012.
- [9] K. Yu *et al.*, “25 Gb/s hybrid-integrated silicon photonic receiver with microring wavelength stabilization,” in *Proc. Opt. Fiber Commun. Conf. Exhib. (OFC)*, Mar. 2015, pp. 1–3.
- [10] M. H. Nazari and A. Emami-Neyestanak, “A 24-Gb/s double-sampling receiver for ultra-low-power optical communication,” *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 344–357, Feb. 2013.
- [11] S. Saeedi and A. Emami, “A 25Gb/s 170  $\mu$ W/Gb/s optical receiver in 28 nm CMOS for chip-to-chip optical communication,” in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, Jun. 2014, pp. 283–286.
- [12] J. Proesel, A. Rylyakov, and C. Schow, “Optical receivers using DFE-IIR equalization,” in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2013, pp. 130–131.
- [13] A. Sharif-Bakhtiar, M. G. Lee, and A. C. Carusone, “Low-power CMOS receivers for short reach optical communication,” in *Proc. IEEE Custom Integr. Circuits Conf.*, May 2017, pp. 1–8.
- [14] A. Rylyakov *et al.*, “22.1 A 25 Gb/s burst-mode receiver for rapidly reconfigurable optical networks,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [15] E. M. Cherry and D. E. Hooper, “The design of wide-band transistor feedback amplifiers,” *Proc. Inst. Electr. Eng.*, vol. 110, no. 2, pp. 375–389, Feb. 1963.
- [16] D. Z. Turker, A. Rylyakov, D. Friedman, S. Gowda, and E. Sanchez-Sinencio, “A 19 Gb/s 38 mW 1-tap speculative DFE receiver in 90 nm CMOS,” in *Proc. Symp. VLSI Circuits*, Jun. 2009, pp. 216–217.
- [17] P. A. Francese *et al.*, “23.6 A 30 Gb/s 0.8 pJ/b 14 nm FinFET receiver data-path,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan. 2016, pp. 408–409.
- [18] K. K. Parhi, “Design of multigigabit multiplexer-loop-based decision feedback equalizers,” *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 13, no. 4, pp. 489–493, Apr. 2005.
- [19] T. B. Cho and P. R. Gray, “A 10 b, 20 Msample/s, 35 mW pipeline A/D converter,” *IEEE J. Solid-State Circuits*, vol. 30, no. 3, pp. 166–172, Mar. 1995.
- [20] K.-H. Kim *et al.*, “A 20-Gb/s 256-Mb DRAM with an inductorless quadrature PLL and a cascaded pre-emphasis transmitter,” *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 127–134, Jan. 2006.
- [21] N. Dupuis *et al.*, “Exploring the limits of high-speed receivers for multimode VCSEL-based optical links,” in *Proc. OFC*, Mar. 2014, pp. 1–3.



**Ilter Ozkaya** (S'16) received the B.Sc. degree in electronics engineering from Middle East Technical University, Ankara, Turkey, in 2007, and the M.Sc. degree in electronics engineering from Istanbul Technical University, Istanbul, Turkey, in 2010. He is currently pursuing the Ph.D. degree with the Swiss Federal Institute of Technology, Lausanne, Switzerland.

He joined IBM Research–Zurich, Rüschlikon, Switzerland, in 2014, where he has been conducting research on analog circuit design for high-speed IO links. His current research interests include high-speed optical and electrical communications and mixed signal circuit design.



**Alessandro Cevrero** (M'16) received the M.Sc. degree in nanotechnology and the Ph.D. degree in electrical engineering from the Swiss Federal Institute of Technology, Lausanne, Switzerland, in 2007 and 2014, respectively.

He joined IBM Research–Zurich, Rüschlikon, Switzerland, in 2012, where he has been involved in analog circuit design and silicon validation of high-speed energy-efficient I/O links in advanced CMOS technologies. He has authored or co-authored over 35 technical publications in his research areas. His current research interests include high speed analog circuit design, 3-D integration, and semiconductor manufacturing.



**Pier Andrea Francese** (M'01–SM'17) received the degree in electrical engineering (*cum laude*) from the Politecnico di Milano, Milan, Italy, and the Ph.D. degree from the Swiss Federal Institute of Technology, Zürich, Switzerland, in 1993 and 2005, respectively.

He worked in the field of IC product development for Teradyne, Milan, Philips Semiconductors, Milan, and National Semiconductor, Munich, Germany. In 2010, he joined the IBM Research–Zurich, Rüschlikon, Switzerland, where he develops circuits for energy-efficient high-speed I/O links in advanced CMOS technologies.



**Christian Menolfi** (S'97–M'99) received the Dipl.Ing. degree and the Ph.D. degree in electrical engineering from the Swiss Federal Institute of Technology (ETH), Zürich, Switzerland, in 1993 and 2000, respectively.

From 1993 to 2000, he was with the Integrated Systems Laboratory, ETH Zürich, as a Research Assistant, where he was involved in highly sensitive CMOS VLSI data-acquisition circuits for silicon-based microsensors. Since 2000, he has been with IBM Research–Zurich, Rüschlikon, Switzerland, where he is involved in the design of multi-gigabit low-power communication circuits in advanced CMOS technologies.



**Thomas Morf** (S'89–M'90–SM'09) received the B.S. degree from the Zürich University of Applied Science, Zürich, Switzerland, in 1987, the M.S. degree in electrical and computer engineering from the University of California at Santa Barbara, Santa Barbara, CA, USA, in 1991, and the Ph.D. degree from the Swiss Federal Institute of Technology (ETH), Zürich, in 1996.

From 1996 to 1999, he led a research group in the area of InP-HBT circuit design and technology also with ETH Zürich. In 1999, he joined IBM Research–Zurich, Rüschlikon, Switzerland. He has co-authored over 150 papers and is co-inventor of over 30 issued patents. His current research interests include ESD circuit protection, electrical and optical high-speed high-density interconnects, and THz antennas and detectors.



**Matthias Brändli** received the Dipl.Ing. (M.Sc.) degree in electrical engineering from the Swiss Federal Institute of Technology (ETH), Zürich, Switzerland, in 1997.

From 1998 to 2001, he was with the Integrated Systems Laboratory, Swiss Federal Institute of Technology, working on deep-submicron technology VLSI design challenges, digital video image processing for biomedical applications, and testability of CMOS circuits. In 2001, he joined the Microelectronics Design Center, ETH Zürich, where he was involved in numerous digital and mixed-signal ASIC design projects, worked on EDA design automation, and contributed to teaching. In 2008, he joined the IBM Research–Zurich, Rüschlikon, Switzerland, where he has been involved in multi-gigabit/s and low-power communication circuits in advanced CMOS technologies.



**Daniel M. Kuchta** (SM'97) received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer science from the University of California at Berkeley, Berkeley, CA, USA, in 1986, 1988, and 1992, respectively.

He subsequently joined IBM at the Thomas J. Watson Research Center, Yorktown Heights, NY, USA, where he has worked on high-speed VCSEL characterization, multimode fiber links, and parallel fiber optic link research. He is currently a Research Staff Member with the Communications and Computation Subsystems Department at IBM T. J. Watson. He is an author/co-author of over 135 technical papers and inventor/co-inventor of at least 20 patents.




**Lukas Kull** (S'10–M'14–SM'17) received the M.Sc. degree in electrical engineering from the Swiss Federal Institute of Technology (ETH), Zürich, Switzerland, in 2007, and the Ph.D. degree from the Swiss Federal Institute of Technology, Lausanne, Switzerland, in 2014.

He joined IBM Research–Zurich, Rüschlikon, Switzerland, in 2010, where he has been involved in analog circuit design for high-speed low-power ADCs. He has authored or co-authored over 20 patents and 40 technical publications in his research areas. His current research interests include analog circuit design, hardware for cognitive workloads, IR and THz imaging.



**Christian W. Baks** received the B.S. degree in applied physics from Fontys College of Technology, Eindhoven, The Netherlands, in 2000, and the M.S. degree in physics from the State University of New York, Albany, NY, USA, in 2001.

He joined the IBM T. J. Watson Research Center, Yorktown Heights, NY, USA, as an Engineer in 2001, where he is involved in high-speed optoelectronic package and backplane interconnect design specializing in signal integrity issues.



**Jonathan E. Proesel** (M'10–SM'16) received the B.S. degree in computer engineering from the University of Illinois at Urbana–Champaign, Urbana, IL, USA, in 2004, and the M.S. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 2008 and 2010, respectively.

He joined the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA, in 2010, where he is currently a Research Staff Member involved in analog and mixed-signal circuit design for optical transmitters and receivers. He held internships with IBM Microelectronics, Essex Junction, VT, USA, in 2004, and IBM Thomas J. Watson Research Center in 2009. His current research interests include high-speed optical and electrical communications, silicon photonics, data converters, and bioelectronics.

Dr. Proesel is a member of the IEEE Solid-State Circuits Society. He serves on the Technical Program Committee for the Symposium on VLSI Circuits. He was a recipient of the Analog Devices Outstanding Student Designer Award in 2008, the SRC Techcon Best in Session Award for Analog Circuits in 2009, and a co-recipient of the Best Student Paper Award at the 2010 IEEE Custom Integrated Circuits Conference. He has also received multiple technical awards at IBM.



**Marcel Kossel** (S'99–M'02–SM'09) received the Dipl.Ing. and Ph.D. degrees in electrical engineering from the Swiss Federal Institute of Technology, Zürich, Switzerland, in 1997 and 2000, respectively.

He joined IBM Research–Zurich, Rüschlikon, Switzerland, in 2001, where he is involved in analog circuit design for high-speed serial links. His current research interests include analog circuit design and RF measurement techniques. He has also worked in the field of microwave tagging systems and radio-frequency identification systems.



**Danny Luu** (S'17) received the B.Sc. and M.Sc. degrees in electrical engineering and information technology from the Swiss Federal Institute of Technology (ETH), Zürich, Switzerland, in 2013, where he is currently pursuing the Ph.D. degree.

He joined IBM Research–Zurich, Rüschlikon, Switzerland, in 2013, where he has been conducting research into analog circuit design for high-speed, high-resolution, and low-power ADCs in collaboration with ETH Zürich toward his Ph.D. degree.



**Benjamin G. Lee** (M'04–SM'14) received the B.S. degree from Oklahoma State University, Stillwater, OK, USA, in 2004, and the M.S. and Ph.D. degrees from Columbia University, New York, NY, USA, in 2006 and 2009, respectively, all in electrical engineering.

In 2009, he became a Post-Doctoral Researcher with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA, where he is currently a Research Staff Member. He is also an Assistant Adjunct Professor of electrical engineering with Columbia University. His current research interests include silicon photonic devices, integrated optical switches and networks for high-performance computing systems and datacenters, and highly parallel multimode transceivers.

Dr. Lee is a member of the Optical Society and the IEEE Photonics Society. He currently serves on the Board of Governors for the Photonics Society.



**Fuad E. Doany** received the Ph.D. degree in chemical physics from the University of Pennsylvania, Philadelphia, PA, USA, in 1984.

Following a Post-Doctoral Fellowship at the California Institute of Technology, Pasadena, CA, USA, he joined the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA, in 1985. As a Research Staff Member at IBM, he worked on laser spectroscopy, applied optics, projection displays, and laser material processing. Since 2000, he has focused on high speed optical interconnects and optoelectronic packaging. He is an author or co-author of over 120 technical papers and holds over 70 U.S. patents.



**Yusuf Leblebici** (M'90–SM'98–F'10) received the B.Sc. and M.Sc. degrees in electrical engineering from Istanbul Technical University, Istanbul, Turkey, in 1984 and 1986, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Champaign, IL, USA, in 1990.

Since 2002, he has been a Chair Professor with the Swiss Federal Institute of Technology, Lausanne, Switzerland, and the Director of the Microelectronic Systems Laboratory, Lausanne. He is the co-author of six textbooks, as well as more than 300 articles published in various journals and conferences. His current research interests include the design of high-speed CMOS digital and mixed-signal integrated circuits, computer-aided design of VLSI systems, intelligent sensor interfaces, modeling and simulation of semiconductor devices, and VLSI reliability analysis.

Dr. Leblebici has served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II and the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATED SYSTEMS. He was elected as a Distinguished Lecturer of the IEEE Circuits and Systems Society for 2010/2011.



**Mounir Meghelli** (M'07) received the M.S. degree in electronics and automatics from the University of Paris Orsay, Paris, France, in 1992, the Engineering degree in telecommunication from the ENST-Paris, in 1994, and the Ph.D. degree from the University of Paris VI, after a four-year research program with the CNET France Telecom Research Center, Paris, working on the design of high-speed ICs for optical communications in GaAs and InP HBT technologies.

From 1998 to 2005, he was with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA, as a Research Staff Member, where he was involved in the design of high frequency ICs in SiGe BiCMOS and CMOS technologies for wireline and wireless applications. From 2006 to 2011, he was with the IBM Server and Technology Group, where he held a position of Senior Technical Staff Member leading the design of advanced serial links for storage, networking, and server applications. He is currently managing the Mixed Signal Communication IC Design group at the IBM T. J. Watson Research Center.



**Thomas Toifl** (S'97–M'99–SM'09) received the Dipl.Ing. (M.S.) degree and the Ph.D. degree (with highest honors) in electrical engineering from the Vienna University of Technology, Vienna, Austria, in 1995 and 1999, respectively.

In 1996, he joined the European Research Center for Particle Physics, Microelectronics Group, Geneva, Switzerland, where he developed radiation-hard circuits for detector synchronization and data transmission, which were integrated in the four particle detector systems of the new Large Hadron Collider. In 2001, he joined the IBM Research–Zurich, Rüschlikon, Switzerland, where he has been working on multi-gigabit per second, low-power communication circuits in advanced CMOS technologies. In that area he has authored or co-authored 19 patents and more than 50 technical publications. Since July 2008, he has managed the I/O Link Technology Group, IBM Research–Zurich.

Dr. Toifl received the Beatrice Winner Award for Editorial Excellence at the 2005 IEEE International Solid-State Circuits Conference.