

# Loop Gain Adaptation for Optimum Jitter Tolerance in Digital CDRs

Joshua Liang, Ali Sheikholeslami, Senior Member, IEEE, Hirotaka Tamura, Fellow, IEEE,  
Yuuki Ogata, and Hisakatsu Yamaguchi

**Abstract**—A loop gain adaptation technique is proposed, which optimizes the jitter tolerance (JTOL) of a 28 Gb/s phase interpolator (PI)-based clock and data recovery (CDR) circuit implemented in 28 nm CMOS. The technique increases the CDR’s loop gain to suppress the most jitter while monitoring the autocorrelation function of the bang-bang phase detector (BB-PD) output to prevent the CDR from becoming too underdamped. The proposed technique requires no knowledge of the CDR’s loop latency or input jitter characteristics.

**Index Terms**—Adaptive loop filter (LF), clock and data recovery (CDR), jitter.

## I. INTRODUCTION

AS DATA rates continue to increase, jitter can become the determining factor in the bit error rate (BER) of a wireline link. This is particularly important in digital clock and data recovery (CDR) circuits, which rely on bang-bang phase detectors (BB-PDs), whose gain can vary depending on the CDR’s input jitter [1]. Consequently, the CDR’s loop gain and bandwidth can vary, affecting the CDR’s stability and jitter tolerance (JTOL). When jitter is too small, the BB-PD gain can become larger than expected, leading to an underdamped loop response and undershoot in the JTOL, as seen in Fig. 1. On the other hand, larger-than-expected jitter can reduce the BB-PD gain, reducing the CDR’s loop gain and jitter tracking ability, which also reduces JTOL. In this paper, we propose a phase interpolator (PI)-based CDR whose loop gain  $K_G$  is adjusted adaptively to optimize JTOL under any condition.

As will be discussed in this paper, prior works adapt the CDR’s loop filter (LF) by either directly measuring or estimating the amplitude or bandwidth of the CDR’s jitter [2]–[5]. These approaches require either dedicated jitter measurement circuits, which can be costly to implement, or some prior knowledge of the expected jitter profile seen by the CDR.

In contrast, in this paper, we simply increase  $K_G$  and, therefore, the CDR’s loop bandwidth to suppress the most jitter and maximize JTOL. To prevent the CDR from becoming too underdamped, we monitor the autocorrelation function of the CDR’s BB-PD output for any ringing, which appears if the CDR’s phase margin (PM) drops too low [6].

Manuscript received December 19, 2017; revised April 8, 2018; accepted May 8, 2018. Date of publication June 29, 2018; date of current version August 27, 2018. This paper was approved by Associate Editor Pavan Kumar Hanumolu. This work was supported in part by NSERC. (Corresponding author: Joshua Liang.)

J. Liang and A. Sheikholeslami are with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3H7, Canada (e-mail: joshua.liang@mail.utoronto.ca).

H. Tamura, Y. Ogata, and H. Yamaguchi are with the Fujitsu Laboratories Ltd., Kawasaki 211-8588, Japan.

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2018.2839038

Fig. 1. (a) Conventional bang-bang CDR and proposed adaptive loop gain CDR and the impact of adaptation on JTOL when jitter is (b) too small or (c) too large.

The remainder of this paper expands on [6] and is organized as follows. Section II provides background on the existing adaptive loop gain strategies. In Sections III and IV, we model and analyze the effect of loop gain on the CDR’s jitter and loop bandwidth. Section V then describes how the proposed adaptation technique prevents the CDR from becoming too underdamped. Section VI describes the test chip implemented in 28 nm CMOS, and Section VII describes measurement results that confirm that the proposed technique optimizes high-frequency JTOL for several different jitter profiles.

## II. BACKGROUND

Several prior works have adapted the loop gain of a phase-locked loop (PLL) or CDR to minimize its jitter or BER. In [7], jitter is measured off-chip and fed to a gradient descent algorithm to optimize a PLL’s LF. This technique could be combined with on-chip jitter measurement circuits such as eye monitors [8], leading to the system shown in Fig. 2(a). Although effective, adding jitter measurement circuits increases power consumption and area, and can also degrade circuit performance by increasing capacitance on high-speed clock and data paths.

Fig. 2. Existing concepts for CDRs with adaptive LFs using (a) direct jitter measurement, (b) Kalman filter theory, and (c) estimation of jitter bandwidth.

Other techniques avoid directly measuring jitter. Instead, they monitor the PD or LF output [2]–[5]. In [2], the jitter's magnitude is estimated from the fast Fourier transform (FFT) of the LF output as shown in Fig. 2(b). It is then used to calculate the optimum gain of the CDR's first-order LF using Kalman theory [2], by assuming that the dominant source of jitter is phase noise from a voltage-controlled oscillator (VCO) that has a  $-20$ -dB/decade roll-off. Unfortunately, this technique only applies to this particular jitter profile and LF. Its reliance on FFT (performed off-chip in [2]) also makes it challenging to integrate on-chip.

Another approach, shown in Fig. 2(c), is to estimate the jitter's bandwidth from the PD output. In [3], the amplitude of the low-pass filtered PD output is monitored. Since higher frequency jitter is attenuated more, the amplitude of the filter output can indicate whether the jitter's bandwidth is high or low. In [4] and [5], the number of repeating PD outputs is counted over a given observation time. If many repeated PD outputs are observed within a short time, the jitter is assumed to be predominantly at low frequency. In each of these techniques, the CDR increases its loop gain whenever the bandwidth of the jitter is too low and decreases it otherwise. The limitation of these techniques is that the optimum jitter bandwidth is difficult to determine unless the jitter sources are known *a priori*. For example, the jitter bandwidth that leads to optimal performance can depend on whether jitter is dominated by a VCO, whose jitter is concentrated at low frequencies, or by intersymbol interference (ISI)-induced jitter, which is more broadband.

Fig. 3. (a) Block diagram of a PI-based CDR. (b) Definition of relative jitter  $\psi_{ER}$ .

In this paper, we use linear analysis and simulation to show that maximizing the CDR's loop gain, and therefore loop bandwidth, leads to near-optimal jitter for a variety of jitter profiles, while also achieving the high loop bandwidth that is often needed to meet JTOL requirements. To prevent the CDR from becoming too underdamped, the autocorrelation function of the BB-PD output is monitored. As we will explain, unlike other recent works that have used autocorrelation for jitter minimization [9]–[11], the proposed work is fully adaptive, requiring no prior knowledge of the CDR's jitter profile or loop latency, and is also robust to the wideband jitter often encountered in wireline applications.

## III. ANALYSIS OF JITTER IN PI-BASED CDR

We develop our loop gain adaptation technique for the PI-based digital CDR shown in Fig. 3(a), which consists of a BB-PD and demultiplexer (DMUX), followed by a majority voting (MV) stage and second-order digital LF that allows the CDR to track frequency offset. To minimize the CDR's BER, we must minimize the relative jitter between the input data and the CDR's recovered clock, which we denote as  $\psi_{ER}$  as shown in Fig. 3(b). To this end, we first identify and quantify the jitter sources contributing to  $\psi_{ER}$ , then analyze how they are affected by the loop gain  $K_G$  (in Section IV).

### A. Linear Jitter Model

$\psi_{ER}$  is affected by the following jitter sources: data jitter ( $\psi_{DAT}$ ), reference clock jitter ( $\psi_{REF}$ ), phase quantization error from the PI ( $\psi_{PI}$ ), and jitter caused by the BB-PD ( $\psi_{PD}$ ) and MV blocks ( $\psi_{MV}$ ). We incorporate these jitter sources into a linear model as shown in Fig. 4. Note that we have assumed that enough random jitter (RJ) is present to allow this system to be linearized [1], [12]. The combination of the demultiplexing and MV operation leads to a downsampling operation after voting since it only produces one output for every  $N$  PD outputs. This downsampling can cause aliasing of high-frequency jitter. Since the digital LF also operates at a lower frequency than the PI clock, its output is upsampled and followed by a zero-order hold (ZOH) to convert its output back to a full-rate signal.



Fig. 4. Jitter model of PI-based CDR.

### B. PI Quantization Noise

$\psi_{PI}$  is caused by the nonlinearity and quantization error of the PI. Ignoring the nonlinearity for simplicity and treating the PI as an ideal quantizer [13] with  $N_{PI}$  phase steps per unit interval (UI), we can arrive at a lower bound for  $\sigma_{PI}$ , the standard deviation of  $\psi_{PI}$  in units of UI

$$\sigma_{PI} \geq \frac{1}{\sqrt{12}\, N_{PI}}. \quad (1)$$

In this paper,  $N_{PI}$  is 64 steps per UI, which gives  $\sigma_{PI} \geq 4.5$  mUI or 0.16 ps at 28 Gb/s. For simplicity, we approximate  $\psi_{PI}$  as white noise, although in reality, PI nonlinearity can also lead to deterministic jitter in the presence of frequency offset. In Fig. 4,  $\psi_{PI,ZOH}$  includes the effect of the ZOH on  $\psi_{PI}$ .
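As a quick numerical check, the short Python sketch below evaluates the ideal-quantizer bound: a uniform error over one PI step of $1/N_{PI}$ UI has an rms value of $(1/N_{PI})/\sqrt{12}$. Like the text, it ignores PI nonlinearity; the function name `pi_quantization_sigma` is a hypothetical helper introduced only for illustration.

```python
import math

def pi_quantization_sigma(n_pi_steps: int) -> float:
    """Lower bound on PI quantization jitter in UI: rms of a uniform error over one step."""
    step_ui = 1.0 / n_pi_steps          # PI phase step in UI
    return step_ui / math.sqrt(12.0)    # rms of a uniform distribution of width step_ui

data_rate = 28e9                        # 28 Gb/s
ui_ps = 1e12 / data_rate                # 1 UI in picoseconds (about 35.7 ps)
sigma_ui = pi_quantization_sigma(64)    # N_PI = 64 steps per UI
print(f"sigma_PI >= {sigma_ui * 1e3:.2f} mUI = {sigma_ui * ui_ps:.3f} ps")
```

This reproduces the 4.5 mUI (about 0.16 ps) figure quoted above.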

### C. Bang-Bang PD Model

The BB-PD can be modeled as a linear gain  $K_{PD}$  with additive quantization noise  $\psi_{PD}$  at its output [1]. As discussed earlier, the BB-PD gain depends on the probability density function (PDF) of  $\psi_{ER}$ . If  $\psi_{ER}$  is Gaussian,  $\alpha_T$  is the transition density of the input data, and  $\sigma_{ER}$  is the standard deviation of  $\psi_{ER}$ , then  $K_{PD}$  and  $\sigma_{PD}$ , the standard deviation of  $\psi_{PD}$ , can be derived as [1]

$$K_{PD} = \sqrt{\frac{2}{\pi}} \frac{\alpha_T}{\sigma_{ER}} \quad (2)$$

$$\sigma_{PD} = \sqrt{\alpha_T - \frac{2}{\pi}\alpha_T^2}. \quad (3)$$

For jitter with other distributions, caused, for example, by ISI or other deterministic jitter,  $K_{PD}$  can be derived similarly by examining the corresponding PDF. Note also that although we have assumed that  $\psi_{ER}$  is Gaussian for the purposes of this analysis, our proposed adaptation technique does not rely on any calculated parameters and does not depend on the PDF of  $\psi_{ER}$ .
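The sketch below evaluates (2) and (3) and, as a sanity check, fits the same linear model to a simulated ternary BB-PD driven by Gaussian $\psi_{ER}$. The Monte Carlo setup (independent transitions with probability $\alpha_T$, 200 000 samples, a fixed seed) is an assumption made for illustration, and both function names are hypothetical.

```python
import math
import numpy as np

def bbpd_linear_model(alpha_t, sigma_er):
    """K_PD from (2) and sigma_PD from (3), assuming Gaussian psi_ER (in UI)."""
    k_pd = math.sqrt(2 / math.pi) * alpha_t / sigma_er
    sigma_pd = math.sqrt(alpha_t - (2 / math.pi) * alpha_t**2)
    return k_pd, sigma_pd

def bbpd_monte_carlo(alpha_t, sigma_er, n=200_000, seed=1):
    """Fit the same linear model to a simulated ternary BB-PD."""
    rng = np.random.default_rng(seed)
    psi_er = rng.normal(0.0, sigma_er, n)              # Gaussian relative jitter (UI)
    transition = rng.random(n) < alpha_t               # is a data transition present?
    out = np.where(transition, np.sign(psi_er), 0.0)   # early/late vote, 0 without a transition
    k_pd = np.mean(out * psi_er) / np.mean(psi_er**2)  # least-squares linear gain
    sigma_pd = np.std(out - k_pd * psi_er)             # residual plays the role of psi_PD
    return k_pd, sigma_pd

print(bbpd_linear_model(0.5, 0.02))   # alpha_T = 0.5, sigma_ER = 20 mUI
print(bbpd_monte_carlo(0.5, 0.02))
```

The analytic and simulated values agree to within Monte Carlo error. Note also that $K_{PD}^2\sigma_{ER}^2 + \sigma_{PD}^2 = \alpha_T$, which is the variance of a zero-mean ternary output that is nonzero with probability $\alpha_T$; halving $\sigma_{ER}$ doubles $K_{PD}$, which is the gain variation discussed in Section I.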

### D. Majority Voting Noise Model

Majority voting among  $N$  consecutive BB-PD outputs consists of summing the PD outputs and taking the sign of the result. It can, therefore, be modeled as a moving average filter  $M(z^{-1})$ , whose output  $\psi_A$  is followed by a slicer, where

$$M(z^{-1}) = \sum_{k=1}^N z^{-k}. \quad (4)$$

The slicer can be modeled in the same way as a BB-PD, as a linear gain  $K_{MV}$  followed by a quantization noise source  $\psi_{MV}$  [14]. While  $K_{MV}$  can be found through simulation as in [15], it can also be estimated by assuming that  $\psi_{BB}$ , the output of the BB-PD, is a random process with independent and identically distributed samples and that  $N$  is large enough for  $\psi_A$  to be approximately Gaussian [14].  $K_{MV}$  and  $\sigma_{MV}$  can then be estimated as

$$K_{MV} = \sqrt{\frac{2}{\pi}} \frac{1}{\sigma_A} \quad (5)$$

$$\sigma_{MV} = \sqrt{1 - \frac{2}{\pi}}. \quad (6)$$

While  $\sigma_A$  could be higher if the BB-PD outputs are in fact correlated with each other, treating them as uncorrelated allows us to find a lower bound for  $\sigma_A$  as

$$\sigma_A \geq \sqrt{N} \sigma_{BB}. \quad (7)$$

Since the BB-PD output can only be +1, -1, or 0, and is nonzero only when a data transition occurs, its variance  $\sigma_{BB}^2$  equals  $\alpha_T$  [1], i.e.,  $\sigma_{BB} = \sqrt{\alpha_T}$ .
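A minimal sketch of the MV model, assuming iid ternary BB-PD outputs as in the derivation above: it sums non-overlapping blocks of $N$ outputs (the moving sum of (4) followed by downsampling), slices the result, and compares the least-squares slicer gain with (5) evaluated using the bound (7). It relies on the fact that a zero-mean ternary sequence that is nonzero with probability $\alpha_T$ has variance $\alpha_T$. Tie handling (a zero sum gives a zero vote), the seed, and all numeric values are assumptions of this sketch, and `k_mv_estimate` is a hypothetical helper.

```python
import math
import numpy as np

def k_mv_estimate(alpha_t, n):
    """K_MV from (5), using the lower bound (7) for sigma_A.

    For iid ternary BB-PD outputs the variance sigma_BB**2 equals alpha_T.
    """
    sigma_a = math.sqrt(n) * math.sqrt(alpha_t)
    return math.sqrt(2 / math.pi) / sigma_a

rng = np.random.default_rng(7)
alpha_t, n, n_votes = 0.5, 16, 50_000
size = n * n_votes
# iid ternary BB-PD outputs: +1 or -1 with probability alpha_t/2 each, else 0
bb = np.where(rng.random(size) < alpha_t, rng.choice([-1.0, 1.0], size), 0.0)
psi_a = bb.reshape(n_votes, n).sum(axis=1)   # sum of n outputs per non-overlapping block
votes = np.sign(psi_a)                        # slicer; a tied sum gives 0 in this sketch
k_mv_sim = np.mean(votes * psi_a) / np.mean(psi_a**2)   # least-squares slicer gain

print(k_mv_estimate(alpha_t, n), k_mv_sim)
```

Because the sum of $N$ ternary samples is a lattice variable rather than a true Gaussian, the expected simulated gain sits slightly below (5); for $N = 16$ and $\alpha_T = 0.5$ the expected gap is under 1%.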

### E. Calculation of Jitter PSD

We can now use this model to calculate  $S_{ER}(f)$ , the power spectral density (PSD) of  $\psi_{ER}$

$$S_{ER}(f) = \left[ S_{DAT}(f) + S_{REF}(f) + S_{PI,ZOH}(f) \right] \left| H_{JTOL}^{-1}(f) \right|^2 + \left[ \frac{S_{MV}(f)}{|K_{PD}M(f)K_{MV}|^2} + \frac{S_{PD}(f)}{|K_{PD}|^2} \right] \left| H_{JTRAN}(f) \right|^2. \quad (8)$$

Here, we have assumed that the jitter sources are mutually uncorrelated, so that their PSDs can be added.  $H_{JTOL}^{-1}(f)$  and  $H_{JTRAN}(f)$  are given as

$$H_{JTOL}^{-1}(f) = \frac{1}{1 + LG(f)} \quad (9)$$

$$H_{JTRAN}(f) = \frac{LG(f)}{1 + LG(f)} \quad (10)$$

where  $LG(f)$  is the CDR's loop gain and  $LF(f)$  is the LF response. The digital LF's clock period is  $NT$ , and its latency is  $N_D NT$  ( $1/T$  is the data rate). The MV response  $M(f)$  is (4) evaluated at  $z = e^{j2\pi f T}$ .

$$LG(f) = K_{PD} K_G M(f) K_{MV} LF(f) N_{PI}^{-1} e^{-j2\pi f N_D NT}. \quad (11)$$

$LF(f)$  includes the effect of the phase accumulator:

$$LF(f) = \left( K_P + \frac{K_I}{1 - e^{-j2\pi f NT}} \right) \frac{K_{PA}}{1 - e^{-j2\pi f NT}}. \quad (12)$$



Fig. 5. Response of (a)  $H_{JTOL}^{-1}(f)$  and (b)  $H_{JTRAN}(f)$  as  $K_G$  is increased.



Fig. 6. Calculated contributions to  $\sigma_{ER}$  from each jitter source for (a) Case I and (b) Case II as  $K_G$  is varied.

$H_{JTOL}^{-1}(f)$  has a high-pass response while  $H_{JTRAN}(f)$  has a low-pass response. Since the output of the LF is already low-pass, the ZOH operation has little effect on its output and is therefore ignored. Because  $K_{PD}$  and  $K_{MV}$  depend on the standard deviations of their inputs, this model must be solved iteratively, recalculating  $K_{PD}$  and  $K_{MV}$  at each step.
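The sketch below evaluates the loop gain of (11) and (12), with $M(f)$ from (4), and the transfer functions (9) and (10), then reports a crude unity-gain frequency and phase margin from a grid search. $K_{PD}$ and $K_{MV}$ are held fixed rather than solved iteratively as described above, and every numeric parameter is hypothetical, not a value from this design.

```python
import numpy as np

# Hypothetical parameters for illustration only (not this paper's design values)
T = 1 / 28e9              # UI in seconds at 28 Gb/s
N = 32                    # BB-PD outputs per LF clock (LF period N*T)
N_D = 4                   # LF latency in LF clock cycles
N_PI = 64                 # PI steps per UI
K_PD, K_MV = 20.0, 0.28   # linearized gains, held fixed instead of iterated
K_P, K_I, K_PA = 1 / 64, 1 / (64 * 256), 1.0

def loop_gain(f, k_g):
    """LG(f) from (11), with M(f) from (4) and LF(f) from (12); f is an array in Hz."""
    z_ui = np.exp(-2j * np.pi * f * T)            # one-UI delay
    z_lf = np.exp(-2j * np.pi * f * N * T)        # one-LF-clock delay
    m = np.sum(z_ui[:, None] ** np.arange(1, N + 1), axis=1)   # M(f)
    integ = 1 / (1 - z_lf)                        # digital accumulator
    lf = (K_P + K_I * integ) * K_PA * integ       # (12)
    return K_PD * k_g * m * K_MV * lf / N_PI * z_lf**N_D

def jitter_transfer(f, k_g):
    lg = loop_gain(f, k_g)
    return 1 / (1 + lg), lg / (1 + lg)            # H_JTOL^-1 (9), H_JTRAN (10)

def unity_gain_and_pm(k_g, f=None):
    """Crude grid search for f_t (Hz) and the phase margin (degrees)."""
    if f is None:
        f = np.logspace(3, np.log10(0.5 / (N * T)), 4000)
    lg = loop_gain(f, k_g)
    i = np.argmax(np.abs(lg) < 1)                 # first grid point below unity gain
    phase_deg = np.degrees(np.unwrap(np.angle(lg)))
    return f[i], 180 + phase_deg[i]

for k_g in (1, 2, 4):
    f_t, pm = unity_gain_and_pm(k_g)
    print(f"K_G={k_g}: f_t = {f_t / 1e6:.1f} MHz, PM = {pm:.0f} deg")
```

With these illustrative numbers, raising $K_G$ pushes $f_t$ up and the PM down, the same tradeoff explored in Section IV.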

As noted in the preceding analysis, we have modeled the CDR as a linear system. In reality, the BB-PD and MV operations are highly nonlinear. Other blocks such as the PI may also contain nonlinearities in practice and can, therefore, produce harmonics or other nonlinear effects not predicted by this model. The model is most accurate where RJ sources dominate the overall jitter of the system.

## IV. FINDING OPTIMAL LOOP GAIN

We now use the jitter model to develop a strategy to optimize  $K_G$ . We first examine how to minimize  $\sigma_{ER}$ , and then consider how loop gain affects the CDR's PM and loop bandwidth.

### A. Jitter Versus Loop Gain

Increasing  $K_G$  raises the corner frequencies of  $H_{JTOL}^{-1}(f)$  and  $H_{JTRAN}(f)$  as shown in Fig. 5. This reduces the contributions to  $\psi_{ER}$  from  $\psi_{DAT}$ ,  $\psi_{REF}$  and  $\psi_{PI}$ , but increases the contributions from  $\psi_{PD}$  and  $\psi_{MV}$ . We explore this tradeoff using two cases.

In Case I,  $\psi_{DAT}$  consists of 500-fs rms white RJ and  $\psi_{REF}$  comes from an oscillator with  $-80$ -dBc/Hz phase noise at 1 MHz offset and a  $-20$ -dB/decade roll-off. We use the jitter model from Section III to calculate and integrate the PSD of each jitter source in MATLAB. Fig. 6(a) plots each jitter contribution as  $K_G$  varies. Raising  $K_G$  from 1 to 5 significantly lowers  $\sigma_{ER,DAT+REF}$ , the jitter contributed by  $\psi_{DAT}$  and  $\psi_{REF}$ . The total jitter  $\sigma_{ER}$  also decreases, since the contributions  $\sigma_{ER,MV}$  and  $\sigma_{ER,PD}$  remain small. However, when  $K_G$  is increased too much, peaking in  $H_{JTOL}^{-1}(f)$  and  $H_{JTRAN}(f)$  causes  $\sigma_{ER,DAT+REF}$ ,  $\sigma_{ER,PI}$ , and the overall jitter  $\sigma_{ER}$  to increase. In this example,  $\sigma_{ER}$  can, therefore, be minimized by increasing  $K_G$  until peaking in  $H_{JTOL}^{-1}(f)$  starts to increase the overall jitter.

Fig. 7. Jitter and PM as  $K_G$  is varied for (a) Case I and (b) Case II.

However, if  $\psi_{DAT}$  and  $\psi_{REF}$  are very small, or relatively broadband, increasing  $K_G$  may not necessarily reduce  $\sigma_{ER}$ , as seen in Case II, where  $\psi_{REF}$  represents the jitter of a PLL with  $-110$ -dBc/Hz in-band phase noise and a 3-dB bandwidth of 5 MHz, while  $\psi_{DAT}$  is 500-fs rms white RJ. Although  $\sigma_{ER,PD}$  and  $\sigma_{ER,MV}$  make minimal contributions to  $\sigma_{ER}$ , increasing  $K_G$  still increases  $\sigma_{ER}$ , as shown in Fig. 6(b). To arrive at our proposed technique, we also consider the impact of  $K_G$  on the CDR's loop bandwidth and PM.

### B. Loop Bandwidth Versus Loop Gain

A high CDR loop bandwidth is desirable and often required to meet stringent JTOL masks [16]. In view of this, Fig. 7 re-plots the results of Fig. 6 showing  $\sigma_{ER}$  along with the CDR's PM and unity-gain frequency ( $f_t$ ) versus  $K_G$ . In Case II, although  $\sigma_{ER}$  is minimized when  $K_G$  is at its lowest value, increasing  $K_G$  can significantly improve  $f_t$  with only a small degradation in jitter. As shown in Fig. 7(b), raising  $K_G$  from 1 to 3 increases  $\sigma_{ER}$  by less than 1% while the unity-gain frequency improves by 2.8×. Jitter only starts to increase significantly when the CDR's PM drops below roughly 60°. Fig. 7(a) shows a similar trend for Case I, where jitter is also minimized when the CDR's PM is approximately 60°.

These examples show that even if the absolute lowest jitter is not achieved, maximizing the CDR's loop gain can still achieve near-optimal jitter performance, while attaining the high loop bandwidth desired for meeting JTOL requirements.

## V. PROPOSED LOOP GAIN ADAPTATION STRATEGY

Given the results of Section IV, our proposed strategy is to increase  $K_G$  as much as possible without allowing the CDR's PM to degrade too much. To do this, we need a method to monitor any instability in the CDR.



Fig. 8. (a) Normalized impulse response and (b) ideal JTOL of CDR for different PMs.

### A. Monitoring CDR Stability With PD Autocorrelation

Due to loop latency, increasing  $K_G$  too much degrades the PM of digital CDRs, and increases jitter as we have seen. Poor PM also leads to undershoot in JTOL and ringing in the impulse response of the CDR as shown in Fig. 8. When the ringing becomes large in amplitude, its period, which we denote as  $2n_{\text{peak}}$ , approaches the inverse of  $f_{180}$ , the frequency at which the phase of the CDR loop gain reaches  $-180^\circ$  (or  $-\pi$  in radians), and at which the CDR can oscillate. We write  $n_{\text{peak}}$  in units of UI as follows:

$$n_{\text{peak}} \approx \frac{1}{2f_{180}T} \quad (13)$$

$$\angle LG(f_{180}) = -180^\circ. \quad (14)$$

Any ringing in the CDR's impulse response causes corresponding damped oscillations to occur in  $\psi_{\text{ER}}$ , as well as producing the peaking in  $H_{\text{JTOL}}^{-1}(f)$  and  $H_{\text{JTRAN}}(f)$  as we have seen. This can be observed by measuring the CDR's step response [17], or by observing  $H_{\text{JTOL}}^{-1}(f)$  or  $H_{\text{JTRAN}}(f)$ . However, neither of these can be easily accomplished on-chip. Instead, the ringing in  $\psi_{\text{ER}}$  can be observed by monitoring the autocorrelation function of the BB-PD output  $R(n)$ .

To understand this, first note that the PSD of the BB-PD output,  $S_{\text{BB}}(f)$ , contains the PSD of  $\psi_{\text{ER}}$  and can be written as

$$S_{\text{BB}}(f) = |K_{\text{PD}}|^2 S_{\text{ER}}(f) + S_{\text{PD}}(f). \quad (15)$$

From (8), we also see that  $S_{\text{ER}}(f)$  is shaped by both  $|H_{\text{JTOL}}^{-1}(f)|^2$  and  $|H_{\text{JTRAN}}(f)|^2$ . Any peaking in either transfer function will, therefore, be observable in  $S_{\text{ER}}(f)$  and  $S_{\text{BB}}(f)$ . While  $S_{\text{BB}}(f)$  would be quite difficult to measure on-chip, we can more easily observe this ringing in the time domain, by estimating the autocorrelation function  $R(n)$ , which is also the inverse Fourier transform of  $S_{\text{BB}}(f)$ .

To illustrate this, Fig. 9 compares  $H_{\text{JTOL}}^{-1}(f)$  and  $H_{\text{JTRAN}}(f)$  as well as the inverse Fourier transforms of  $|H_{\text{JTOL}}^{-1}(f)|^2$  and  $|H_{\text{JTRAN}}(f)|^2$ , simulated with the CDR having several values of PM. As shown in the plots, when PM falls below approximately  $60^\circ$  and peaking starts to occur in  $H_{\text{JTOL}}^{-1}(f)$  and  $H_{\text{JTRAN}}(f)$ , corresponding damped oscillations are observed in the inverse Fourier transforms. In particular, one can see in Fig. 9(c) and (d) that as PM decreases, the values of both inverse Fourier transforms at  $n = n_{\text{peak}}$  decrease and eventually become negative. Since  $n_{\text{peak}}$  corresponds to half of the CDR's oscillation period when underdamped, these waveforms reach their lowest values at  $n = n_{\text{peak}}$  and their highest values at  $n = 0$ . We expect that this trend will also be observable in  $R(n)$ , which we verify next through simulation.

Fig. 9. Simulated response of (a)  $H_{\text{JTOL}}^{-1}(f)$  and (b)  $H_{\text{JTRAN}}(f)$  and the inverse Fourier transforms of (c)  $|H_{\text{JTOL}}^{-1}(f)|^2$  and (d)  $|H_{\text{JTRAN}}(f)|^2$  for different values of PM.

Fig. 10. (a) Spectrum and (b) corresponding autocorrelation function of BB-PD for various loop gain settings showing peaking caused by excessive loop gain.

Fig. 10 plots  $S_{\text{ER}}(f)$  and  $R(n)$  as simulated for several values of  $K_G$  and therefore PM. As the CDR becomes underdamped, ringing becomes observable in  $R(n)$ , causing the value of  $R(n_{\text{peak}})$  to decrease and become negative, as expected.

Fig. 11. Overview of proposed adaptive loop gain CDR.

The results demonstrate that we can, therefore, prevent the CDR from becoming too underdamped by monitoring  $R(n)$  and preventing any ringing in its response.

### B. Proposed Adaptive Loop Gain CDR

The proposed adaptive loop gain CDR is shown in Fig. 11. The adaptation logic monitors the autocorrelation function of the low-pass filtered PD output and increases the CDR loop gain  $K_G$ , so long as  $R(n_{\text{peak}})$  is greater than the decision threshold  $R_{\text{TH}}$ , decreasing it otherwise. Based on simulation results such as those in Fig. 8, setting  $R_{\text{TH}} = 0$  leads to a PM of approximately  $60^\circ$ , providing a good tradeoff between CDR bandwidth and undershoot in JTOL. The value of  $R_{\text{TH}}$  can also be varied to achieve higher or lower PM as desired. As will be seen in Section V-C, with appropriate filtering of the PD output, the value of  $R(n_{\text{peak}})$  is roughly proportional to the PM of the system. For example, to achieve PM closer to  $70^\circ$ ,  $R_{\text{TH}}$  can be chosen to be closer to 0.2.
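The decision rule above can be sketched behaviorally as follows. Here `r_peak_of_kg` is a hypothetical stand-in for the measured $R(n_{\text{peak}})$ at a given $K_G$ (on-chip it would come from the autocorrelation block), and the accumulator threshold `acc_limit` models the integrating adaptation LF; the step size, limits, and the linear $R(n_{\text{peak}})$ model are all assumptions, not the chip's logic.

```python
def adapt_loop_gain(r_peak_of_kg, r_th=0.0, kg_init=1, kg_min=1, kg_max=16,
                    acc_limit=8, n_iter=2000):
    """Sign-based K_G search: push K_G up while R(n_peak) > R_TH, down otherwise.

    The accumulator integrates the +/-1 comparison results and only changes
    K_G when it reaches +/-acc_limit, which heavily damps the adaptation loop.
    """
    kg, acc, history = kg_init, 0, []
    for _ in range(n_iter):
        acc += 1 if r_peak_of_kg(kg) > r_th else -1
        if acc >= acc_limit:
            kg, acc = min(kg + 1, kg_max), 0
        elif acc <= -acc_limit:
            kg, acc = max(kg - 1, kg_min), 0
        history.append(kg)
    return kg, history

# Hypothetical stand-in: R(n_peak) falls as K_G (and hence ringing) increases
kg_final, hist = adapt_loop_gain(lambda kg: 0.45 - 0.1 * kg)
print(kg_final)
```

In this toy model $K_G$ settles into a small limit cycle around the setting where $R(n_{\text{peak}})$ crosses $R_{\text{TH}}$, which is the expected behavior of a sign-based search followed by an integrating filter.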

As shown in Fig. 11,  $R(n)$  is estimated by correlating the low-pass filtered output of the BB-PD,  $\psi_{\text{BB}}$ , with delayed versions of itself and taking the average. This gives the time-averaged autocorrelation function

$$R_{\text{Estimated}}(n) = \frac{1}{L} \sum_{k=1}^L \psi_{\text{BB}}(k) \psi_{\text{BB}}(k-n). \quad (16)$$

Since the BB-PD only outputs  $\pm 1$  or 0, the correlation operation is greatly simplified. By only counting samples where  $\psi_{\text{BB}} \neq 0$ , the measured  $R(n)$  waveform always has a fixed maximum amplitude equal to 1 (i.e.,  $R(0) = 1$ ).
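A direct software rendering of the time-averaged estimate in (16) is sketched below. Normalizing by the sum of squared samples is our assumption for how $R(0) = 1$ is enforced; for a ternary $\pm 1$/0 sequence that sum equals the number of nonzero samples, which matches the counting described above. The function name is hypothetical.

```python
import numpy as np

def autocorr_estimate(psi_bb, max_lag):
    """Time-averaged autocorrelation R(0..max_lag) of a PD output, normalized so R(0) = 1.

    For a ternary (+1/-1/0) sequence, dot(psi, psi) is the number of nonzero
    samples, so this matches counting only samples where psi_BB != 0.
    """
    psi = np.asarray(psi_bb, dtype=float)
    energy = np.dot(psi, psi)
    if energy == 0:
        raise ValueError("sequence has no nonzero samples")
    # sum over k of psi(k) * psi(k - n), for each lag n
    return np.array([np.dot(psi[n:], psi[: len(psi) - n]) / energy
                     for n in range(max_lag + 1)])

print(autocorr_estimate([1, 0, -1, 0] * 25, 4))
```

In hardware, each product of two ternary values is itself ternary, so the multiply-accumulate reduces to an up/down count, which is why the correlation operation is so simple.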

In addition to the averaging that takes place in the  $R(n)$  measurement block, an LF integrates the output of the comparison between  $R(n_{\text{peak}})$  and  $R_{\text{TH}}$ . As a result, the adaptation loop is heavily damped and operates similarly to a linear search for the desired value of  $R(n_{\text{peak}})$ . Next, we explain why the BB-PD output must be low-pass filtered before estimating  $R(n)$ .

### C. PD Filtering

As shown in Fig. 10(b),  $R(n)$  generally includes a delta function, which is the autocorrelation function of any white RJ in the BB-PD output, including PD and MV quantization noise and ISI jitter. This large delta function often masks the autocorrelation of the other jitter sources and makes it difficult to detect ringing, as shown in Fig. 12(a), where we have plotted  $R(n)$  for different  $K_G$  values when uniformly distributed white RJ of 0.1 UIpp is present. The changes in  $R(n)$  caused by varying  $K_G$  are difficult to detect.

Fig. 12.  $R(n)$  for different  $K_G$  values measured using (a) raw and (b) low-pass filtered BB-PD output in the presence of high white RJ and (c) corresponding values of  $R(n_{\text{peak}})$ .

To prevent this, we pass the PD output through a low-pass filter (LPF) to suppress white jitter, reducing the height of the delta function compared to the rest of  $R(n)$ . The result is shown in Fig. 12(b) where filtering makes the ringing in  $R(n)$  much more evident. The effect is further illustrated in Fig. 12(c), which plots  $R(n_{\text{peak}})$  with and without filtering of the PD output. The slope of  $R(n_{\text{peak}})$  as a function of PM is dramatically increased when the PD output is filtered.
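The effect of PD filtering can be reproduced with a simple Monte Carlo sketch. It treats the PD output as a continuous proxy: a slowly varying AR(1) component (standing in for correlated jitter the loop responds to) plus a much stronger white component (standing in for PD and MV quantization noise). The 16-tap moving-average LPF, the lag of 32, the seed, and all noise levels are assumptions; the actual filter used on-chip is not specified here.

```python
import numpy as np

def normalized_autocorr(x, lag):
    """Sample autocorrelation at lag > 0, normalized by the lag-0 value."""
    x = x - x.mean()
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))

rng = np.random.default_rng(3)
n_samples, rho = 200_000, 0.99

# Slowly varying component: AR(1) process (hypothetical stand-in)
w = rng.standard_normal(n_samples)
slow = np.empty(n_samples)
slow[0] = w[0]
for k in range(1, n_samples):
    slow[k] = rho * slow[k - 1] + w[k]

# White component with about 10x the stationary variance of the slow part
white = rng.normal(0.0, np.sqrt(10 / (1 - rho**2)), n_samples)
raw = slow + white

lpf_len, lag = 16, 32                        # lag chosen beyond the LPF span
filtered = np.convolve(raw, np.ones(lpf_len) / lpf_len, mode="valid")

r_raw = normalized_autocorr(raw, lag)
r_filt = normalized_autocorr(filtered, lag)
print(f"R({lag}) raw: {r_raw:.3f}, filtered: {r_filt:.3f}")
```

The white component only contributes at lag 0, so after normalization it shrinks every other lag; averaging over 16 samples cuts its variance by roughly 16, and the normalized autocorrelation at lags beyond the filter span rises accordingly.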

### D. Limitations of Proposed Technique

The proposed technique does have some limitations. First, if the jitter of the reference clock, PI, and data is not much larger than the jitter caused by MV, the proposed adaptation will become suboptimal in minimizing  $\sigma_{\text{ER}}$ .

In addition, in observing  $R(n_{\text{peak}})$ , it is assumed that any ringing in  $R(n)$  is caused by the CDR's impulse response. Ringing or periodic jitter caused by other jitter sources could interfere with adaptation by producing oscillations in  $R(n)$  that could be incorrectly attributed to the CDR becoming too underdamped. This could occur if, for example, the reference clock is generated by a severely underdamped PLL, whose jitter already contains ringing. Periodic jitter could also arise from power-supply noise, from PI nonlinearity in the presence of frequency offset or spread spectrum clocking (SSC), or from other nonlinear CDR behavior such as limit cycles. If the CDR receives repetitive data patterns, the resulting ISI-induced jitter could also contain periodic content. Such periodic jitter can interfere with adaptation if its amplitude is significant compared to the CDR's other jitter sources, in which case it could create oscillatory waveforms in  $R(n)$ . While higher-frequency jitter is suppressed by low-pass filtering the PD output, lower-frequency jitter, close to or below the CDR's self-oscillation frequency  $f_{180}$ , is not easily suppressed without degrading the observability of possible CDR oscillations. The effect of such jitter depends on whether its frequency is inside or outside of the CDR's tracking bandwidth.

If the periodic jitter is within the CDR's tracking bandwidth, the CDR will track and suppress it, reducing its effect on  $R(n)$ . However, if the periodic jitter is beyond the CDR's tracking bandwidth but below  $f_{180}$ , the periodicity in  $R(n)$  becomes more difficult to differentiate from that induced by CDR oscillation. This could cause the adaptation to act incorrectly, by attempting to suppress oscillations not in fact caused by the CDR.

Although the simple adaptation algorithm used here has difficulty with this situation,  $R(n)$  still provides valuable observability of the CDR's jitter. More elaborate adaptation schemes could, therefore, potentially be developed to handle such cases. For example, in [6], we also showed how  $R(n)$  can be used to detect and estimate the frequency of sinusoidal jitter (SJ), which when compared to the maximum bandwidth of the CDR, can help determine how to best suppress it.

Finally, the profile of the data and reference clock jitter can also affect the value of  $R(n_{\text{peak}})$ . In Section V-A, we mentioned that any peaking seen in the CDR response becomes observable in  $S_{\text{BB}}(f)$  and in  $R(n)$ . However, the shape of  $S_{\text{BB}}(f)$  and therefore  $R(n)$  are also influenced by the PSDs of the various jitter sources. To demonstrate this, we plot  $R(n)$  as simulated when the CDR is driven by jitter with four different spectral profiles: white, bandlimited with a first-order rolloff, having a constant  $-20\text{-dB/decade}$  rolloff, and having a constant  $-30\text{-dB/decade}$  rolloff.

Fig. 13 plots the simulated spectral profiles of CDR input jitter and the corresponding  $R(n)$  curves when  $R(n_{\text{peak}})$  is close to zero. As shown in Fig. 13(b), when  $R(n_{\text{peak}})$  reaches zero, the simulated PM ranges between  $66^\circ$  and  $54^\circ$  (or  $\pm 10\%$  around  $60^\circ$ ). Progressing through the four jitter profiles, the dominant jitter content is increasingly concentrated at lower frequencies. This reduces the CDR jitter's spectral content around  $f_{180}$ , making the CDR appear more damped despite the decreasing PM, which leads to lower-than-expected PM after adaptation. Conversely, when jitter is concentrated at higher frequencies, it can emphasize any ringing, leading to PMs higher than  $60^\circ$ .

Fig. 13. (a) Spectral profiles of CDR input jitter and (b) corresponding  $R(n)$  plots when  $R(n_{\text{peak}}) \approx 0$  in all cases.

We now describe how to determine  $n_{\text{peak}}$ . We first provide some intuition by estimating  $n_{\text{peak}}$  using a simplified analysis, before describing how  $n_{\text{peak}}$  can be found more reliably by measuring it on-chip.

### E. Estimating $n_{\text{peak}}$ Analytically

We have defined  $n_{\text{peak}}$  as half of the oscillation period when the CDR undergoes a damped oscillation. While nonlinear analysis [18] has been used to analyze such oscillations, we adopt a simple linear analysis. Assuming that  $f \ll 1/(2\pi NT)$ , we can approximate  $e^{-j2\pi f NT}$  in (12) as  $1 - j2\pi f NT$ . Ignoring the effect of MV for simplicity, and also assuming that the CDR has some additional analog delay  $T_D$ , gives the simplified CDR loop gain

$$\text{LG}(f) \approx \frac{K_{\text{PD}} K_G K_I K_{\text{PA}}}{N_{\text{PI}}} \left[ \frac{1 + j2\pi f NT \frac{K_P}{K_I}}{-4\pi^2 f^2 (NT)^2} \right] e^{-j2\pi f (N_D NT + T_D)}. \quad (17)$$

The CDR can oscillate if  $|\text{LG}(f_{180})| = 1$ . By setting  $\angle \text{LG}(f_{180}) = -\pi$ ,  $f_{180}$  can then be found from

$$\tan^{-1} \left( 2\pi f_{180} NT \frac{K_P}{K_I} \right) = 2\pi f_{180} (N_D NT + T_D). \quad (18)$$

If  $K_P \gg K_I$  and we approximate the LF as being first order, the left-hand side of (18) is approximately  $\pi/2$ , giving

$$f_{180} \approx \frac{1}{4(N_D NT + T_D)} \quad (19)$$

$n_{\text{peak}}$  is then calculated from (13) as

$$n_{\text{peak}} \approx \frac{2(N_D NT + T_D)}{T} \quad (20)$$

Equation (20) shows that  $n_{\text{peak}}$  is mainly a function of the CDR loop latency  $N_D NT + T_D$ , but this result relies on several simplifications, ignoring the effect of  $K_I$  and MV. Furthermore,  $T_D$  must include the delays of all PI, clock tree, retiming, DMUX, and digital circuitry within the CDR feedback loop. These delays make  $T_D$  difficult to characterize accurately, and  $T_D$  may also be sensitive to process, voltage, and temperature (PVT) variation. Instead of relying on simulation or calculations, we therefore find  $n_{\text{peak}}$  adaptively on-chip.

Fig. 14. (a) Concept behind adaptation of  $n_{\text{peak}}$ . (b) Feedback loop used to identify  $n_{\text{peak}}/2$ .
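The simplified analysis can be checked numerically by solving (18) with bisection and comparing the result with the approximation (19), and with $n_{\text{peak}}$ from (13) and (20). Times are in UI ($T = 1$), and the values of $N$, $N_D$, $T_D$, and $K_P/K_I$ are hypothetical; `f180_exact` is an illustrative helper, not part of the design.

```python
import math

def f180_exact(nt, kp_over_ki, latency, tol=1e-12):
    """Solve (18) for f_180 by bisection; times in UI, so f_180 is in cycles per UI."""
    def g(f):
        return math.atan(2 * math.pi * f * nt * kp_over_ki) - 2 * math.pi * f * latency
    # g > 0 near f = 0 when nt * kp_over_ki > latency, and g < 0 at 1/(4*latency)
    lo, hi = 1e-9 / latency, 1 / (4 * latency)
    if g(lo) <= 0:
        raise ValueError("no positive root: need N*T*K_P/K_I > loop latency")
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

T = 1.0                       # one UI
N, N_D = 32, 4                # hypothetical: 32 UI per LF clock, 4-cycle LF latency
T_D = 20.0                    # hypothetical extra analog delay, in UI
latency = N_D * N * T + T_D   # N_D*N*T + T_D

f_approx = 1 / (4 * latency)                  # (19)
f_exact = f180_exact(N * T, 256, latency)     # (18), assuming K_P/K_I = 256
n_peak_approx = 2 * latency / T               # (20)
n_peak_exact = 1 / (2 * f_exact * T)          # (13)
print(f_exact / f_approx, n_peak_approx, n_peak_exact)
```

With $K_P/K_I = 256$ the exact root lies within about 1% of (19) and is always slightly lower, because the arctangent in (18) never quite reaches $\pi/2$; the latency-dominated form of (20) is therefore a slight underestimate of $n_{\text{peak}}$.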

### F. Adaptation of $n_{\text{peak}}$

We determine  $n_{\text{peak}}$  by initially setting  $K_G$  to its highest value and finding the period of the resulting damped oscillation that can be observed in  $R(n)$ .  $R(n_{\text{peak}})$  could be found as the minimum of  $R(n)$ , but that would require a relatively slow gradient descent or search algorithm. Instead, we use the fact that  $R(n_{\text{peak}}/2) \approx 0$  to estimate  $n_{\text{peak}}/2$  using a feedback loop as shown in Fig. 14. Here,  $n_{\text{peak}}$  starts at zero and gradually increases for as long as  $R(n_{\text{peak}}/2)$  remains greater than zero, causing  $n_{\text{peak}}$  to converge when  $R(n_{\text{peak}}/2) \approx 0$ . Once  $n_{\text{peak}}$  is measured, it is stored as  $n_{\text{peak},\text{REF}}$  and used for the rest of the adaptation process.
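The $n_{\text{peak}}$ search can be sketched as a simple counter loop. `r_of_n` is a hypothetical callable returning $R(n)$; the damped cosine used below is a synthetic stand-in for the ringing observed when $K_G$ is at its maximum, and the unit step and iteration limit are assumptions. A real implementation would also have to contend with measurement noise in $R(n)$.

```python
import math

def find_n_peak(r_of_n, n_max=10_000):
    """Increase n_peak while R(n_peak/2) stays positive (Fig. 14 concept)."""
    n_peak = 0
    while r_of_n(n_peak // 2) > 0:
        n_peak += 1
        if n_peak > n_max:
            raise RuntimeError("R(n) never crossed zero")
    return n_peak

def ringing(n):
    """Hypothetical damped ringing with a half-period of 300 UI."""
    return math.exp(-n / 200) * math.cos(math.pi * n / 300)

n_peak_ref = find_n_peak(ringing)
print(n_peak_ref)
```

The loop stops at the first $n_{\text{peak}}$ for which $R(n_{\text{peak}}/2)$ is no longer positive, i.e., at the quarter-period zero crossing of the ringing, so here it returns approximately 300, to within the step granularity.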

### G. Comparison to Other Works Using Autocorrelation

As mentioned earlier, several other recent works have also used the autocorrelation function of the BB-PD output, deriving various conditions on  $R(n)$  in an attempt to minimize the jitter of a PLL [9], [10] or CDR [11]. We now compare these approaches with ours.

The approach of [9] is similar to this work in that it attempts to drive the autocorrelation function of a PLL's BB-PD output to zero at  $n = 2D + 1$ . Here,  $D$  is the delay of the digital LF, making  $2D + 1$  similar to  $n_{\text{peak}}$  defined earlier when the nonlinear analysis of [18] is used. Although Jang *et al.* [9] claim that this minimizes the PLL's output jitter, our analysis indicates that this holds only if the reference clock jitter is extremely low (i.e., when  $\psi_{\text{DAT}}$  is small), which is not the case, for example, in PLLs used to filter jitter or in CDRs with jittery input data. In [10],  $R(n)$  for a PLL is observed at several arbitrarily chosen points based on the analysis in [19], which assumes that the PLL always remains stable; it thus ignores the possibility of instability, which, as we have shown, can be the limiting factor in minimizing jitter. The CDR in [11] monitors  $R(D + 1)$  and attempts to drive it to zero. Since  $D + 1$  is approximately  $n_{\text{peak}}/2$  in our analysis,  $R(D + 1)$  will generally be near zero even when the CDR's PM is poor, making it a poor criterion for optimizing the CDR loop gain.

All of these works rely on observing  $R(n)$  at fixed, precalculated values of  $n$  rather than adapted ones, making them sensitive to variations in loop latency as discussed previously. Because they do not filter the BB-PD output, they are also sensitive to white jitter such as ISI jitter, making them ill-suited for wireline applications.

## VI. IMPLEMENTATION

### A. Analog Front End

The proposed adaptive loop gain scheme was implemented in a 28 Gb/s half-rate PI-based digital CDR shown in Fig. 15. The CDR's analog front end includes a continuous-time linear equalizer (CTLE), which combines active feedback [20] with an inverter-based second stage to improve bandwidth and dc gain. Half-rate quadrature clocks are generated from an external reference clock using a two-stage injection-locked oscillator (ILO) that feeds 7-bit CMOS PIs. The half-rate data, edge, and eye monitor samples are demuxed by 16, forming 32-bit buses that are sent to the synthesized digital core operating at 875 MHz.
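The 875 MHz core clock follows from the rates above: at half rate, two data bits are sampled per 14 GHz clock cycle, and demuxing each of the two data-sampler streams by 16 gives 32 bits (32 UI) per core cycle. A short Python check of this arithmetic, which also converts the four-cycle digital latency stated in Section VI-B into UI (the constant names are ours):

```python
# Rate bookkeeping for the half-rate, demux-by-16 architecture.
DATA_RATE_BPS = 28e9
HALF_RATE_CLK_HZ = DATA_RATE_BPS / 2              # 14 GHz: two bits per cycle
DEMUX_FACTOR = 16                                  # each data stream demuxed by 16
UI_PER_DIGITAL_CYCLE = 2 * DEMUX_FACTOR            # 32 bits per core clock cycle
DIGITAL_CLK_HZ = HALF_RATE_CLK_HZ / DEMUX_FACTOR   # 875 MHz core clock
DIGITAL_LATENCY_CYCLES = 4                         # total digital latency (Sec. VI-B)
DIGITAL_LATENCY_UI = DIGITAL_LATENCY_CYCLES * UI_PER_DIGITAL_CYCLE

print(f"core clock: {DIGITAL_CLK_HZ / 1e6:.0f} MHz")
print(f"UI per core cycle: {UI_PER_DIGITAL_CYCLE}")
print(f"digital loop latency: {DIGITAL_LATENCY_UI} UI")
```

If  $N = 32$  and  $T$  is one UI (our reading of the symbols), the 128 UI digital latency is the  $N_D NT$  term of (20). Under that approximation, it alone accounts for about 256 UI of  $n_{\text{peak}}$ .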

### B. Digital Backend

The PD takes each set of 32 demuxed edge and data samples and generates 32 corresponding early and late signals. MV then converts these to a single 2-bit value of +1, 0, or -1. This value is then fed to the adaptation block and the LF, whose implementations are greatly simplified as a result. The second-order digital LF is shown in Fig. 16. Because MV is used, implementing the gain  $K_G$  only requires a mux instead of a multiplier. In this work,  $K_G$  is a 4-bit value that changes in linear steps from 1 to 15. A finer  $K_G$  step size could be used, but that would also increase power consumption. To avoid using multipliers, the integral and proportional gains,  $K_I$  and  $K_P$ , are powers of two. The outputs of both the MV and decoder blocks are resampled, adding two cycles of latency and giving a total digital latency of four cycles.
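The datapath just described can be sketched behaviorally in Python. The MV reduction follows the text. The LF structure is an assumption for illustration, not a transcription of Fig. 16: a proportional path and an integral path summed into a phase accumulator that would drive the PI code. So are the shift amounts and the polarity assigned to +1 and -1. Keeping all gains as left shifts, so that everything stays integer, is a further simplification.

```python
from typing import Sequence


def majority_vote(pd_decisions: Sequence[int]) -> int:
    """Reduce per-bit BB-PD decisions (+1, -1, or 0 for no transition)
    to one ternary value: the sign of their sum."""
    s = sum(pd_decisions)
    return (s > 0) - (s < 0)


class DigitalLF:
    """Hypothetical loop: proportional + integral paths on the MV output,
    followed by a phase accumulator.

    Because the MV output is -1, 0, or +1, multiplying by K_G reduces to a
    3:1 mux, and K_P = 2**kp_shift, K_I = 2**ki_shift reduce to shifts.
    """

    def __init__(self, k_g: int, kp_shift: int, ki_shift: int):
        if not 1 <= k_g <= 15:
            raise ValueError("K_G is a 4-bit code in the range 1..15")
        self.k_g, self.kp_shift, self.ki_shift = k_g, kp_shift, ki_shift
        self.integ = 0   # integral-path state
        self.phase = 0   # accumulated phase code

    def step(self, vote: int) -> int:
        g = {1: self.k_g, 0: 0, -1: -self.k_g}[vote]      # the K_G "mux"
        self.integ += g << self.ki_shift                   # integral path
        self.phase += (g << self.kp_shift) + self.integ    # P + I update
        return self.phase


lf = DigitalLF(k_g=3, kp_shift=4, ki_shift=0)
votes = [majority_vote(v) for v in ([1, 1, -1, 0], [1, -1], [-1, -1, 1])]
print(votes, [lf.step(v) for v in votes])
```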

The implementation of  $R(n)$  measurement is shown in Fig. 17. The block takes the 2-bit output of the MV block and further low-pass filters it using another FIFO and MV stage. MV conveniently provides the desired low-pass filtering effect while maintaining a binary output, which is easily processed by the  $R(n)$  block. The LPF bandwidth is adjusted by selecting the range of FIFO samples over which voting occurs. The filtered signal is then correlated with itself delayed by an adjustable FIFO. As in [21],  $R(n)$  itself is generated by two counters, one that counts the total number of transitions and another that counts the number of correlated samples, producing  $R(n)$  as a digital code. In this paper, 11-bit counters were used. Using larger counters leads to less error in measuring  $R(n)$  but also slows down adaptation and increases power consumption.
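A behavioral Python sketch of this measurement chain follows. Two rules in it are assumptions: how the second MV stage handles ties, and the counting convention, in which one counter tallies all sample pairs considered and the other the pairs that agree in sign. The text itself specifies only the FIFO/MV filtering, the two counters, and their 11-bit width.

```python
from collections import deque
from typing import Iterable, List


def mv_lowpass(votes: Iterable[int], window: int) -> List[int]:
    """Second MV stage (assumed form): sign of the sum of the last `window`
    ternary votes. Ties repeat the previous output so the result stays +/-1."""
    buf = deque(maxlen=window)
    out, last = [], 1
    for v in votes:
        buf.append(v)
        s = sum(buf)
        if s != 0:
            last = 1 if s > 0 else -1
        out.append(last)
    return out


def measure_R(x: List[int], n: int, counter_bits: int = 11) -> float:
    """Estimate R(n) of a +/-1 sequence with two counters.

    `total` counts sample pairs (x[k], x[k-n]); `agree` counts pairs with equal
    sign. For +/-1 data, R(n) = (agree - disagree)/total = 2*agree/total - 1.
    Accumulation stops when `total` reaches its full-scale count.
    """
    total = agree = 0
    full_scale = (1 << counter_bits) - 1
    for k in range(n, len(x)):
        if total == full_scale:
            break
        total += 1
        agree += int(x[k] == x[k - n])
    return 2 * agree / total - 1 if total else 0.0


x = ([1] * 4 + [-1] * 4) * 100          # toy square wave, period 8 samples
print(measure_R(x, 4), measure_R(x, 8))
```

Widening `window` lowers the filter bandwidth, and raising `counter_bits` averages over more pairs, trading measurement error against adaptation speed as described above.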

## VII. MEASUREMENT RESULTS

The test chip was fabricated in 28 nm CMOS and consumes 106.6 mW, or 3.82 pJ/bit at 28 Gb/s, with the eye monitor circuits (only used for diagnostics) consuming roughly 12% of the total power. Although the eye monitors cannot be completely disabled, removing them should improve power efficiency to 3.35 pJ/bit. The die photo with area and power breakdowns is shown in Fig. 18.

Fig. 15. Block diagram of adaptive loop gain CDR.

Fig. 16. Block diagram of digital LF.

Fig. 17. Digital implementation of  $R(n)$  measurement.



Fig. 18. Die photo with area and power breakdowns. Total power is measured while percentage breakdown is based on simulation.

As can be seen in the power breakdown, the digital core accounts for more than 30% of the overall power consumption. This high power consumption was due to the inclusion of many test features, such as the ability to directly observe signals within the LF and adaptation circuits at up to 437.5 MHz. Since these additional features account for roughly 40% of the overall digital gate count, simplifying them or adding clock gating would lead to significant power savings in the digital core.

### A. Test Setup and Chip Functionality

The test chip was wire-bonded to a PCB, making the bond wires the dominant source of attenuation on the input data. As shown in Fig. 19, the half-rate 14 GHz reference clock is supplied by a Rohde & Schwarz SMB100A signal generator. The SMB100A was phase- and frequency-modulated with random noise from a NoiseCom NC6110 noise source to generate different phase noise profiles.

Fig. 19. Measurement setup.

Fig. 20. (a) Recovered quarter-rate data and (b) half-rate clock with CDR locked to 28 Gb/s PRBS31 data.

Fig. 20 shows the quarter-rate (7 Gb/s) PRBS31 data and half-rate (14 GHz) clock recovered by the CDR. Error-free operation was verified by feeding the recovered quarter-rate data to a Centellax TG1B1-A bit error rate tester (BERT). However, since error checking could not be performed simultaneously on all four quarter-rate recovered data streams, the external BERT results were optimistic compared to those from the on-chip BERT that checks the entire deserialized data stream. The on-chip BERT was, therefore, used for the presented measurements.

### B. Adaptation Performance

The first step of adaptation is finding  $n_{\text{peak}}$ . As discussed, this is done by setting  $K_G$  to its highest setting and enabling the  $n_{\text{peak}}$  adaptation block. The result of adaptation is shown in Fig. 21(a) while Fig. 21(b) shows the complete  $R(n)$  waveform measured for the same condition. Since the adaptation circuits operate on the output of the MV block, the precision of  $n_{\text{peak}}$  is limited to the nearest 32 UI.  $n_{\text{peak}}$  converges to an average value of approximately 300 UI, which is close to where  $R(n)$  reaches its minimum in Fig. 21(b). After adaptation,  $n_{\text{peak,REF}}$  is rounded to 320 UI.

The effect of low-pass filtering on the  $R(n)$  measurement is shown in Fig. 22. To generate  $R(n)$  without filtering, sub-sampled PD data were taken off-chip and correlated on a field-programmable gate array (FPGA). Filtering with 1x BW corresponds to  $R(n)$  measured on-chip after the first MV stage in the CDR. The 1/3x and 1/9x BW curves are taken after additional filtering by the second MV stage, whose voting-window size was varied to achieve different filter bandwidths. Note that all of the filtered  $R(n)$  curves use downsampled data and have a time resolution of only 32 UI. Fig. 22 confirms that without filtering, the oscillations in  $R(n)$  are almost impossible to discern, as the unfiltered  $R(n)$  curve is dominated by a large delta function and has large ripple caused by duty cycle distortion (DCD). Filtering with 1x and 1/3x BW improves the visibility of the oscillation, while filtering with 1/9x BW is excessive, adding undesirable phase distortion and shifting the zero-crossing and peak locations.

Fig. 21. (a) Adaptation of  $n_{\text{peak}}$  while  $K_G$  is set to maximum and (b)  $R(n)$  plotted for the same condition, showing that  $n_{\text{peak}}$  adapts to the correct value.

Fig. 22. Measured  $R(n)$  with and without low-pass filtering of the PD output.

Fig. 23. (a) Phase noise of CDR reference clock used in adaptation measurements and (b) measured  $K_G$  adaptation curves for three test cases.

The performance of the adaptive loop gain technique was measured with the CDR reference clock having three different phase noise profiles and frequency offsets. In Case 1, the SMB100A was phase-modulated to give a low-pass phase noise characteristic similar to that of a PLL, with  $-80$  dBc/Hz in-band phase noise. In Cases 2 and 3, the SMB100A was frequency-modulated, producing reference clock phase noise profiles with  $-20$  dB/decade roll-off similar to that of a free-running VCO. Due to equipment limits, the modulation bandwidth was limited to 1 MHz. Fig. 23(a) plots each of the phase noise profiles.

Fig. 24. (a) JTOL for adapted, min, and max  $K_G$ . (b) Minimum JTOL measured between 10–100 MHz after adaptation. (c) Recovered clock jitter versus  $K_G$  for three test cases.

The measured  $K_G$  adaptation curves for the three test cases are plotted in Fig. 23(b). The adaptation process is relatively slow due to the extensive use of filtering and a slow LF in the adaptation logic. Hysteresis was used to prevent dithering once adaptation has converged.
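This Python sketch does not reproduce the paper's adaptation criterion. It only illustrates how a hysteresis dead band stops a bang-bang gain loop from dithering between adjacent codes. It assumes, hypothetically, that the loop compares a measured value of  $R(n_{\text{peak,REF}})$  against a target, stepping  $K_G$  up while the value is above the band and down while it is below. The target, the band width, and the toy model of  $R$  versus  $K_G$  are all illustrative.

```python
from typing import Callable


def adapt_kg(r_at_npeak: Callable[[int], float], k_g: int = 1,
             r_target: float = 0.0, band: float = 0.05,
             k_min: int = 1, k_max: int = 15, iters: int = 100) -> int:
    """Hypothetical sign-sign K_G loop with a hysteresis dead band."""
    for _ in range(iters):
        r = r_at_npeak(k_g)
        if r > r_target + band and k_g < k_max:
            k_g += 1      # above the band: assume room to raise the gain
        elif r < r_target - band and k_g > k_min:
            k_g -= 1      # below the band: loop too underdamped, back off
        # inside the band: hold K_G, so the code does not toggle every update
    return k_g


def toy_r(k_g: int) -> float:
    """Illustrative model only: R(n_peak,REF) falls as K_G rises."""
    return (8 - k_g) / 10


print(adapt_kg(toy_r, k_g=1), adapt_kg(toy_r, k_g=15))  # both settle at 8
```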

Comparing the adapted values of  $K_G$  with the phase noise plots in Fig. 23(a), we observe that they are roughly proportional to the effective bandwidths of the reference clock phase noise profiles. As shown in Fig. 23(a), the phase noise in Case 2 has the smallest bandwidth, rolling off below  $-80$  dBc/Hz at the lowest frequency, while the phase noise in Cases 1 and 3 passes  $-80$  dBc/Hz at a higher frequency. Because of this, in Case 2 the adaptation algorithm chooses a lower value of  $K_G$ , which is sufficient to suppress the applied reference clock phase noise. In Cases 1 and 3, the phase noise extends to higher frequencies, so a larger CDR loop gain is needed to suppress jitter over a wider bandwidth. While the bandwidth of the phase noise is quite similar in Cases 1 and 3, Case 3 has more low-frequency content. Accordingly, the adaptation chooses the largest value of  $K_G$  in Case 3, providing better suppression of the low-frequency jitter.

To characterize the performance of the adaptation technique, JTOL was measured with  $K_G$  set to minimum, maximum, and following adaptation. The results plotted in Fig. 24(a) show that maximizing  $K_G$  causes undershoot in the JTOL in all three test cases. When  $K_G$  is minimized, the BER remains above  $10^{-12}$  in all cases, as the CDR is unable to suppress the phase noise of the reference clock. After adaptation, well-behaved JTOL is achieved in all three cases.

To better quantify the adaptation's performance, and given the CDR's bandwidth of approximately 10 MHz, we examine the lowest out-of-band JTOL measured from 10 to 100 MHz for a BER  $< 10^{-12}$  and PRBS31 data. Taking the lowest JTOL over this range ensures that any undershoot is captured by the metric. As shown in Fig. 24(b), adaptation leads to near-optimal high-frequency JTOL in all cases. Note that in these measurements,  $K_G$  is adapted prior to applying SJ and held at the adapted value ( $K_{G,REF}$ ) during JTOL tests.

The relationship between  $K_G$  and the jitter of the CDR's recovered half-rate clock is also plotted in Fig. 24(c). The CDR clock jitter was measured with an Agilent DCA-86100D sampling scope with a precision time base module, as well as with an Agilent EXA 9010A spectrum analyzer by integrating the measured phase noise from 10 to 100 MHz.
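Converting an integrated phase noise measurement into jitter follows the standard path: single-sideband phase noise is integrated to RMS phase, which is then converted to time. The Python sketch below assumes ascending frequency points and plain trapezoidal integration on a linear frequency axis. Phase noise tools often interpolate on a log-log scale instead, which would give slightly different numbers for sloped profiles. The flat  $-100$  dBc/Hz profile in the usage lines is hypothetical.

```python
import math
from typing import Sequence


def rms_jitter_s(freqs_hz: Sequence[float], l_dbc_hz: Sequence[float],
                 f_carrier_hz: float, f_lo: float, f_hi: float) -> float:
    """RMS jitter (s) from single-sideband phase noise L(f) in dBc/Hz.

    Integrates 10**(L/10) over [f_lo, f_hi] with the trapezoidal rule,
    doubles it for both sidebands, and converts RMS phase (rad) to time:
    sigma_t = sqrt(2 * integral) / (2 * pi * f_c).
    """
    pts = [(f, 10 ** (l / 10)) for f, l in zip(freqs_hz, l_dbc_hz)
           if f_lo <= f <= f_hi]
    if len(pts) < 2:
        raise ValueError("need at least two points inside [f_lo, f_hi]")
    area = sum((f2 - f1) * (s1 + s2) / 2
               for (f1, s1), (f2, s2) in zip(pts, pts[1:]))
    return math.sqrt(2 * area) / (2 * math.pi * f_carrier_hz)


# Hypothetical flat -100 dBc/Hz profile on a 14 GHz clock, integrated 10-100 MHz.
freqs = [10e6 + k * 1e6 for k in range(91)]
jit = rms_jitter_s(freqs, [-100.0] * len(freqs), 14e9, 10e6, 100e6)
print(f"{jit * 1e15:.0f} fs rms")
```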

The results show that the adapted  $K_G$  values achieve close to the lowest achievable clock jitter. However, they also reveal that clock jitter is a relatively weak predictor of the JTOL performance plotted in Fig. 24(b). In all cases, setting  $K_G$  to its highest value leads to significant undershoot in the JTOL response, but in Cases 1 and 3 the recovered clock jitter remains close to optimal.

## VIII. CONCLUSION

In this paper, an adaptive loop gain PI-based CDR has been demonstrated, using the autocorrelation of the BB-PD output to maximize loop gain while preventing the CDR from becoming too underdamped. The proposed CDR adapts to achieve near-optimal high-frequency JTOL and recovered clock jitter without requiring knowledge of the input jitter profile. Unlike prior works using autocorrelation, the proposed work does not use any pre-calculated parameters, making it robust to variations and easily portable between CDR designs. Low-pass filtering of the BB-PD output also makes the system robust to large white random jitter (RJ).

### ACKNOWLEDGMENT

The authors would like to thank CMC Microsystems for providing CAD tools and test equipment.

### REFERENCES

- [1] M.-J. Park and J. Kim, "Pseudo-linear analysis of bang-bang controlled timing circuits," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 6, pp. 1381–1394, Jun. 2013.
- [2] J.-Y. Lee, J.-H. Yoon, and H.-M. Bae, "A 10-Gb/s CDR with an adaptive optimum loop-bandwidth calibrator for serial communication links," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 8, pp. 2466–2472, Aug. 2014.
- [3] H.-J. Jeon, R. Kulkarni, Y.-C. Lo, J. Kim, and J. Silva-Martinez, "A bang-bang clock and data recovery using mixed mode adaptive loop gain strategy," *IEEE J. Solid-State Circuits*, vol. 48, no. 6, pp. 1398–1415, Jun. 2013.
- [4] H. Song, D.-S. Kim, D.-H. Oh, S. Kim, and D.-K. Jeong, "A 1.0–4.0-Gb/s all-digital CDR with 1.0-ps period resolution DCO and adaptive proportional gain control," *IEEE J. Solid-State Circuits*, vol. 46, no. 2, pp. 424–434, Feb. 2011.
- [5] H. Lee, A. Bansal, Y. Frans, J. Zerbe, S. Sidiropoulos, and M. Horowitz, "Improving CDR performance via estimation," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2006, pp. 1296–1303.
- [6] J. Liang, A. Sheikholeslami, H. Tamura, Y. Ogata, and H. Yamaguchi, "A 28 Gbps digital CDR with adaptive loop gain for optimum jitter tolerance," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 122–123.
- [7] M. Mansuri, A. Hadiashar, and C.-K. K. Yang, "Methodology for on-chip adaptive jitter minimization in phase-locked loops," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 50, no. 11, pp. 870–878, Nov. 2003.
- [8] B. Analui, A. Rylyakov, S. Rylov, M. Meghelli, and A. Hajimiri, "A 10-Gb/s two-dimensional eye-opening monitor in 0.13- $\mu\text{m}$  standard CMOS," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2689–2699, Dec. 2005.
- [9] S. Jang, S. Kim, S. H. Chu, G. S. Jeong, Y. Kim, and D. K. Jeong, "An optimum loop gain tracking all-digital PLL using autocorrelation of bang-bang phase-frequency detection," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 62, no. 9, pp. 836–840, Sep. 2015.
- [10] T. K. Kuan and S.-I. Liu, "A bang-bang phase-locked loop using automatic loop gain control and loop latency reduction techniques," *IEEE J. Solid-State Circuits*, vol. 51, no. 4, pp. 821–831, Apr. 2016.
- [11] S.-W. Kwon *et al.*, "An automatic loop gain control algorithm for bang-bang CDRs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 12, pp. 2817–2828, Dec. 2015.
- [12] N. Da Dalt, "Linearized analysis of a digital bang-bang PLL and its validity limits applied to jitter transfer and jitter generation," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 11, pp. 3663–3675, Dec. 2008.
- [13] T. Chan Carusone, D. A. Johns, and K. Martin, *Analog Integrated Circuit Design*, 2nd ed. Hoboken, NJ, USA: Wiley, 2011.
- [14] J. Liang, A. Sheikholeslami, H. Tamura, and H. Yamaguchi, "On-chip jitter measurement using jitter injection in a 28 Gb/s PI-based CDR," *IEEE J. Solid-State Circuits*, vol. 53, no. 3, pp. 750–761, May 2018.
- [15] J. L. Sonntag and J. Stonick, "A digital clock and data recovery architecture for multi-gigabit/s binary links," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1867–1875, Aug. 2006.
- [16] G. R. Gangasani *et al.*, "A 32 Gb/s backplane transceiver with on-chip AC-coupling and low latency CDR in 32 nm SOI CMOS technology," *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2474–2489, Nov. 2014.
- [17] D. Fischette, R. DeSantis, and J. H. Lee, "An on-chip all-digital measurement circuit to characterize phase-locked loop response in 45-nm SOI," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2009, pp. 609–612.
- [18] N. Da Dalt, "A design-oriented study of the nonlinear dynamics of digital bang-bang PLLs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 52, no. 1, pp. 21–31, Jan. 2005.
- [19] T. K. Kuan and S.-I. Liu, "A loop gain optimization technique for integer- $N$  TDC-based phase-locked loops," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 7, pp. 1873–1882, Jul. 2015.
- [20] S. Galal and B. Razavi, "10-Gb/s limiting amplifier and laser/modulator driver in 0.18- $\mu\text{m}$  CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2138–2146, Dec. 2003.
- [21] J. Liang, M. S. Jalali, A. Sheikholeslami, M. Kibune, and H. Tamura, "On-chip measurement of clock and data jitter with sub-picosecond accuracy for 10 Gb/s multilane CDRs," *IEEE J. Solid-State Circuits*, vol. 50, no. 4, pp. 845–855, Apr. 2015.



**Joshua Liang** received the B.A.Sc. degree in engineering science and the M.A.Sc. and Ph.D. degrees in electrical engineering from the University of Toronto, Toronto, ON, Canada, in 2007, 2009, and 2017, respectively.

From 2009 to 2011, he was an Analog Designer with Zarlink Semiconductor (now Microsemi), Ottawa, ON, where he worked on circuits for low-jitter clock synthesis. Since 2012, he has been engaged in research on wireline transceivers. In 2017, he joined Huawei, Markham, ON, Canada, where he has been focused on the research and development of high-speed interfaces.

Dr. Liang was a recipient of the Outstanding Student Paper Award at the IEEE CICC 2017 and the Analog Devices' Outstanding Student Designer Award for 2016.



**Ali Sheikholeslami** (S'98–M'99–SM'02) received the B.Sc. degree from Shiraz University, Iran, in 1990 and the M.A.Sc. and Ph.D. degrees from the University of Toronto, Canada, in 1994 and 1999, respectively, all in electrical engineering.

In 1999, he joined the Department of Electrical and Computer Engineering at the University of Toronto where he is currently Professor and Associate Chair, Research. He was on research sabbatical with Fujitsu Labs in 2005–2006, and with Analog Devices, Toronto, ON, Canada, in 2012–2013. His research interests are in analog and digital integrated circuits, high-speed signaling, and VLSI memory design. He has coauthored over 70 journal and conference papers, 10 patents, and a graduate-level textbook entitled "*Understanding Jitter and Phase Noise*".

Dr. Sheikholeslami served on the Memory, Technology Directions, and Wireline Subcommittees of the ISSCC in 2001–2004, 2002–2005, and 2007–2013, respectively. He currently serves as the Education Chair for both ISSCC and SSCS, and as a member of SSCS Administration Committee. As the SSCS Education Chair, he oversees the SSCS Distinguished Lecturer Program, Webinars, Circuit Contests, and other educational activities. He is an Associate Editor for the *Solid-State Circuits Magazine*, in which he has a regular column entitled "Circuit Intuitions". He was an Associate Editor for the IEEE TCAS-I for 2010–2012, and the program chair for the 2004 IEEE ISMVL.

Dr. Sheikholeslami has received numerous teaching awards including the 2005–2006 Early Career Teaching Award and the 2010 Faculty Teaching Award, both from the Faculty of Applied Science and Engineering at the University of Toronto. He is a registered professional engineer in Ontario, Canada.



**Hirotaka Tamura** (M'02–SM'10–F'13) received the B.S., M.S., and Ph.D. degrees in electronic engineering from Tokyo University, Tokyo, Japan, in 1977, 1979, and 1982, respectively.

In 1982, he joined the Fujitsu Laboratories Ltd., Kawasaki, Japan. After being involved in the development of different exploratory devices such as Josephson junction devices and high-temperature superconductor devices, he moved into the field of CMOS high-speed signaling in 1996 and got involved in the development of a multi-channel high-speed I/O for server interconnects. Since 1996, he has been working in the area of architecture- and transistor-level design for CMOS high-speed signaling circuits. Since 2014, he has been expanding his area to cover devices, circuits, and architectures for post-Moore-era computing.



**Yuuki Ogata** was born in Ehime, Japan, in 1983. He received the B.S. and M.S. degrees from the Department of Electrical and Electronic Engineering, Faculty of Engineering, University of Tokushima, Tokushima, Japan, in 2007 and 2009, respectively.

He joined the Fujitsu Laboratories Ltd., Kawasaki, Japan, in 2009, where he has been engaged in the research and development of high-speed IO interfaces.



**Hisakatsu Yamaguchi** received the B.S. degree in electrical engineering from the Tokyo University of Science, Chiba, Japan, in 1994, and the M.S. degree in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1996.

In 1996, he joined the Fujitsu Laboratories Ltd., Kawasaki, Japan, where he was engaged in research on DRAMs with high-speed I/Os and was responsible for developing MPEG4 Codec ICs. He is currently working on developing high-speed I/Os for high-end servers and super-computers.

Mr. Yamaguchi served on the Technical Program Committees of ISSCC from 2012 to 2016.