

# A 450 fs 65-nm CMOS Millimeter-Wave Time-to-Digital Converter Using Statistical Element Selection for All-Digital PLLs

Ahmed I. Hussein, *Student Member, IEEE*, Sriharsha Vasadi, and Jeyanandh Paramesh, *Senior Member, IEEE*

**Abstract**—This paper presents a time-to-digital converter (TDC) that operates with a 20–64 GHz input and underpins the phase digitization function in a millimeter-wave all-digital fractional-N frequency synthesizer. A self-calibrated inductor-less frequency divider using dynamic CML latches provides an eight-phase input to a 3-bit “coarse” TDC, which is interfaced to a 5-bit “fine” TDC through a sub-sampling coarse-fine interface circuit. A wide bandwidth low dropout (LDO) on-chip regulator is used to decrease the effect of supply noise on TDC performance. A synthesized digital engine implements calibration using statistical element selection with mean adaptation to alleviate TDC nonlinearity that results from random mismatches and PVT variations. The TDC is fabricated in 65-nm CMOS along with the divider and calibration circuits, and achieves 450-fs resolution. The measured DNL and INL of the TDC are 0.65 and 1.2 LSB, respectively. The TDC consumes 11 mA from 1-V supply voltage. The TDC features a figure-of-merit of 0.167 (0.47) pJ per conversion step without (with) the frequency divider. A single-shot experiment shows that the on-chip LDO reduces the effect of TDC noise by reducing the standard deviation from 0.856 to 0.167 LSB for constant input. The prototype occupies an active area of  $502 \times 110 \mu\text{m}^2$  excluding pads.

**Index Terms**—Fine resolution time-to-digital converter (TDC), fractional-N digital phase-locked loop (PLL), millimeter-wave frequency synthesizer, statistical element selection (SES), TDC nonlinearity calibration.

## I. INTRODUCTION

Time-to-digital converters (TDCs), originally developed for application in laser range finders [1], time-of-flight, and timing jitter measurement [2], [3], have been investigated for use in digital phase-locked loops (PLLs) [4]–[13] to serve as a digital replacement for the phase-frequency detector and the charge pump. Despite a steady increase in the popularity of fractional-N frequency synthesizers [5], [11], [12], the time resolution, linearity, and conversion range of the TDC remain the main obstacles to achieving low in-band phase noise and low spurious content, especially in millimeter-wave synthesizers.

Manuscript received April 30, 2017; revised August 11, 2017; accepted September 27, 2017. Date of publication November 28, 2017; date of current version January 25, 2018. This work was supported in part by National Science Foundation under Grants ECCS-1343324 and CCF-1314876. This paper was approved by Associate Editor Pietro Andreani. (*Corresponding author: Jeyanandh Paramesh.*)

The authors are with the Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: paramesh@ece.cmu.edu).

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2762698

Essentially, the resolution of the TDC determines the in-band phase noise of the synthesizer while both resolution and linearity of the TDC determine the spur level. Also, the dynamic range of the TDC must be sufficiently large to cover at least one period of the input signal of the TDC. Thus, if the input frequency to the TDC is low, as would be the case if a long frequency divider were inserted before the TDC, its dynamic range would have to be high. However, inserting a frequency divider does not relax the TDC’s time resolution specification required by the PLL to meet a certain phase noise target [14]. Thus, both the resolution and the dynamic range specifications are stringent. Achieving this specification with sufficient linearity is extremely challenging. On the other hand, if the loop is designed with a low-modulus frequency divider before the TDC, the dynamic range requirement is relaxed due to the shorter input period but the resolution specification remains the same. However, the TDC must now be designed to accommodate a very high input frequency with sufficiently high linearity. This is also extremely challenging. Indeed, TDC’s that operate at input frequencies above 10 GHz with sufficient resolution and linearity have not been demonstrated. This is an important reason why millimeter-wave digital PLL’s are extremely uncommon. The two millimeter-wave digital synthesizers reported to date (see [15], [16]) use the phase-domain architecture with a long, asynchronous divider chain ( $\div 32$ ) that reduces the input frequency of the TDC to around 2 GHz.

This paper describes a two-step TDC that operates over an input frequency range of 5–16.5 GHz. It achieves sufficiently wide dynamic range (over one period at 5 GHz) and fine resolution (8 bits) to enable its use following a fixed divide-by-4 circuit in a 50–66 GHz phase-domain digital PLL. A synthesized digital calibration engine based on statistical element selection (SES) [17] alleviates the TDC nonlinearity that results from PVT variations and random mismatches. This paper is organized as follows. Section II presents a review of TDC architectures and calibration techniques appropriate for the aforementioned millimeter-wave digital PLL. Sections III and IV describe the proposed TDC architecture and circuit design considerations for the constituent blocks, respectively. Section V describes the SES linearization technique, and Section VI describes the calibration implementation. Section VII presents the characterization results from a 65-nm CMOS prototype. Section VIII concludes the paper.



Fig. 1. Simplified block diagram of proposed millimeter-wave two-step TDC.

## II. REVIEW OF TDC ARCHITECTURES FOR DIGITAL PLL'S

The design of TDC's with high resolution, linearity, and dynamic range has been the focus of numerous recent publications [18]–[35]. Vernier delay line-based TDC's can achieve sub-gate delay resolution, but in order to achieve a reasonable dynamic range, they require a large number of stages, which also makes them vulnerable to mismatches and PVT variations. The 2-D Vernier delay line [28], [32], [35] can achieve high resolution with low power consumption and high dynamic range with fewer stages compared to a conventional Vernier delay line. However, high routing complexity causes large parasitics at the internal nodes and limits the conversion rate. The two-step TDC is an attractive architecture to simultaneously achieve wide dynamic range and fine resolution. Here, a coarse TDC (CTDC) covers the desired input range and a fine TDC (FTDC) with fewer stages (compared to a Vernier delay line for the same dynamic range) measures the residual time following coarse measurement. The main challenge in a two-step TDC is the interface between fine TDC stages and the storage of the residual time from the coarse measurement. Time amplification [23], [26] can relax the resolution requirement of the FTDC in a two-step TDC by amplifying the residual time, typically by using a cross-coupled structure in the metastable region or the input-dependent time delay of an SR latch [26]. However, time amplifiers suffer from limited input range, low gain, and high sensitivity to PVT variations. The gated ring oscillator [36] and switched ring oscillator [20] topologies use noise shaping to achieve fine resolution and large dynamic range with low power consumption and without additional calibration, but these techniques operate at low input frequencies due to the need for oversampling.

Calibration methods to mitigate non-linearity in TDC's due to mismatches among delay elements have also received considerable attention [12], [28], [29], [39], [40]. In [12], a random sequence is added at the input of the TDC to dither the quantization error at each step. Dithering reduces the power of fractional spurs at the PLL's output at the expense of increased in-band phase noise. In addition, TDC non-linearity causes folding of the quantization noise generated by the dithering, which further increases the phase noise. In [29], the time amplifier in a cyclic TDC is calibrated with the goal of maintaining stable gain across a wide input range. A replica-based self-calibration [41] scheme is used in conjunction with chopping at the time amplifier input to compensate for offsets in the calibration process. However, small errors in the chopping result in inaccurate gain calibration and increase TDC non-linearity significantly. Another type of calibration is used in the so-called stochastic TDC's; here, the output code of the TDC is measured in order to create a histogram of the TDC output [37], [38]. Then, the output of each stage is mapped and corrected [38]. To generate a histogram, the input signal should have a uniformly distributed delay relative to the reference over the entire dynamic range of the TDC. In an ADPLL, this can be achieved by running the DCO at a fractional frequency relative to the reference, so that the phase difference between the DCO and the reference covers 0 to $2\pi$ in uniformly distributed steps.

## III. ARCHITECTURE OF PROPOSED TDC

Fig. 1 shows a simplified schematic of the TDC proposed for phase digitization of millimeter-wave inputs in a phase-domain millimeter-wave PLL. A synchronous divide-by-4 reduces the millimeter-wave oscillator frequency to a range suitable for the following stage and generates eight phases spaced $45^\circ$ apart. A CTDC selects from these eight phases the one whose edge is closest to the edge of a reference signal REF. The selected edge, together with the reference edge, is passed through a subsampling coarse-fine interface (CFI) to a period-normalized FTDC, which quantizes the residual time difference. The thermometer-coded outputs of the CTDC and the FTDC are converted to binary and combined to form the overall TDC output.

### A. TDC Driver and Coarse TDC

A more detailed schematic of the TDC is shown in Fig. 2. Corresponding waveforms are shown in Fig. 3. A divide-by-4 circuit [Fig. 2(a)] with dynamic CML (DCML) latches using the source coupling and current bleeding locking-range enhancement techniques [42] produces eight $45^\circ$ spaced phases $PH\_DIV<7:0>$ at 5–16 GHz from a 20–64 GHz input. The outputs of the divider are fed to a two-step 8-bit TDC comprising a TDC driver, a CTDC stage, a CFI stage, and a pair of FTDC's, as shown in Fig. 2. The TDC driver [Fig. 2(a)] converts the eight phases produced by the DCML divider ($PH\_DIV<7:0>$) to CMOS levels ($PH<7:0>$), which are passed to the CTDC and CFI stages along with the reference signal. The TDC driver also generates the set (SETB) and reset (RESB) signals of the CTDC and the CFI stage with appropriate delays (not shown in Fig. 2 for simplicity). The CTDC uses the eight phases $PH<7:0>$ as delayed versions of the input phase $PH<0>$. The CTDC comprises eight arbiters (time comparators) which determine how many phases arrive earlier than the reference signal. A transition detector converts the CTDC output to a one-hot code which is then encoded into a 3-bit word that represents the three MSB's of the TDC's output word.

In a phase-domain digital PLL, the quantity of interest is the phase difference between an edge of the TDC's input and a reference edge. Since a conventional delay-based TDC measures the time delay between a reference edge and the TDC's input edge, it is necessary to normalize its digital output to the TDC's input period [14]. In contrast, in the CTDC, since the “delays” are precisely set by frequency division, the quantization step of the CTDC scales in proportion to the input period. Therefore, its digital output directly represents the phase of the TDC's input (in this case, quantized to 3-bit resolution), and no further normalization is necessary [43]–[45].

Fig. 2. (a) DCML divider-by-4 and TDC Driver. (b) 17 GHz two-step TDC.

### B. Coarse-Fine Interface

The CFI stage performs two key functions in the two-step TDC: (1) to accurately store and pass the time residue from the CTDC to the FTDC, as shown in Fig. 3, and (2) to reduce the switching frequency of $PH<7:0>$ in order to reduce the power consumption of the FTDC. Accordingly, the CFI comprises a sub-sampling circuit which samples $PH<7:0>$ using an early version REF-early of the reference signal. It is important to maintain the same delay difference between the eight sub-sampled phases $PH<7:0>$-S and the reference signal. Therefore, the reference signal REF is also retimed using REF-early, as shown in Figs. 2(b) and 3. Thus, the switching frequency of the eight phases is reduced from around 16 GHz to 100 MHz, which reduces power consumption in the FTDC. The eight signals $S_0$–$S_7$ generated by the CTDC are then used by the multiplexer MUX in the CFI stage to pass on to the FTDC the time residue between the reference signal REF-S and the phase $PH<N>$-S that immediately leads it, as shown in Fig. 3. The reference signal (REF-S) passes through a dummy multiplexer in the CFI stage in order to equalize the delays between the selected signal phase $PH<N>$-S and the reference signal. Additionally, the CFI passes on to a second FTDC the two phases $PH<N+1>$-S and $PH<N+2>$-S which immediately lag REF-S (Fig. 3), in order to measure the delay difference between two consecutive phases (i.e., one quantization period of the CTDC). This is done to estimate the input period of the TDC, which is subsequently used for period normalization of the FTDC.

Fig. 3. Time waveforms of TDC. (a) Eight phases from DCML divider (16 GHz). (b) Reference signal (100 MHz). (c) Sub-sampler outputs (100 MHz). (d) Input of first FTDC. (e) Input of second FTDC.

The timing requirement of the delay $t_d$ between REF and REF-early merits close attention. This delay should be greater than the quantization period of the CTDC (the time difference between two consecutive phases of the divider's outputs). In addition, $t_d$ should be smaller than one period of the divider output ($t_{\text{div}}$) so that all eight phases are sampled within the time window of interest around the reference edge corresponding to what is observed in the CTDC; otherwise, there will be an error of one divider cycle after the sub-sampler, which would saturate the FTDC. The timing condition can thus be expressed as $t_{\text{div,MAX}}/8 < t_d < t_{\text{div,MIN}}$. In this design, the nominal delay $t_{d,\text{NOM}}$ is designed to be 32 ps, which lies within the acceptable delay range (25–60 ps) needed to cover the input frequency range of 20–66 GHz. Monte Carlo simulation estimates the standard deviation of $t_d$ to be 1.1 ps, which is much smaller than the margin between the nominal delay and the boundaries of the aforementioned acceptable delay range.
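The bounds on $t_d$ can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name and its arguments are hypothetical, and it assumes a divider output range of 5–16.5 GHz (the 20–66 GHz input divided by 4).

```python
def cfi_delay_window_ps(f_div_min_ghz, f_div_max_ghz, phases=8):
    """Return (lower, upper) bounds on t_d in picoseconds.

    lower = t_div,MAX / phases  (one CTDC quantization step at the lowest frequency)
    upper = t_div,MIN           (one divider period at the highest frequency)
    """
    t_div_max_ps = 1e3 / f_div_min_ghz  # longest divider period
    t_div_min_ps = 1e3 / f_div_max_ghz  # shortest divider period
    return t_div_max_ps / phases, t_div_min_ps


lower, upper = cfi_delay_window_ps(5.0, 16.5)
print(f"{lower:.1f} ps < t_d < {upper:.1f} ps")
```

Under these assumptions the window evaluates to roughly 25–60.6 ps, in line with the acceptable range quoted in the text.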

### C. Fine TDC

Each FTDC comprises a Vernier delay line consisting of 32 stages in order to cover one quantization step of the CTDC. Each stage is designed to have a programmable resolution of 200 fs–1.4 ps. The FTDC's each generate 32 output signals that are converted to a one-hot code and then encoded into the 5-bit words $FTDC1<4:0>$ and $FTDC2<4:0>$. In the normalization operation, $FTDC1<4:0>$ is divided by $FTDC2<4:0>$ using a look-up table (LUT), which outputs a 5-bit word that, along with the auto-normalized 3-bit output of the CTDC, forms the final 8-bit TDC word. The basic specifications of the CTDC and the FTDC are summarized in Table I.
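One way to picture the normalization step is as a 32 × 32 table indexed by the two fine codes. The following is a minimal sketch, not the implemented LUT: the scaling to 32 codes, the floor rounding, the saturation at code 31, and the handling of $FTDC2 = 0$ are all assumptions made for illustration, and the function names are hypothetical.

```python
FINE_BITS = 5
FULL_SCALE = 1 << FINE_BITS  # 32 fine codes


def build_norm_lut():
    """Precompute floor(32 * FTDC1 / FTDC2), saturated to 5 bits."""
    lut = [[0] * FULL_SCALE for _ in range(FULL_SCALE)]
    for f1 in range(FULL_SCALE):
        for f2 in range(1, FULL_SCALE):  # f2 == 0 is left at 0 (assumed invalid)
            lut[f1][f2] = min(FULL_SCALE - 1, (f1 * FULL_SCALE) // f2)
    return lut


def tdc_word(coarse, ftdc1, ftdc2, lut):
    """Concatenate the 3-bit coarse code with the normalized 5-bit fine code."""
    return (coarse << FINE_BITS) | lut[ftdc1][ftdc2]


lut = build_norm_lut()
print(tdc_word(3, 10, 20, lut))  # residue is half of one coarse step
```

Here a residue code of 10 measured against a period code of 20 maps to fine code 16, that is, half of one coarse step regardless of the absolute input period.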

## IV. CIRCUIT IMPLEMENTATION

### A. Coarse TDC

The CTDC comprises eight arbiters which determine how many phases arrive earlier than the reference signal. A differential, fully symmetric sense amplifier [Fig. 4(a)] is used to ensure equal delays in the data and the clock paths, thereby minimizing the time offset in the CTDC core. Before the comparison starts, the arbiter outputs are pulled high together by the PMOS devices ($M_{p3}$–$M_{p4}$) when SETB is low. $M_{p3}$ and $M_{p4}$ are sized to be very small in order to minimize loading on the output node; however, they are sufficiently large to pre-charge the output within half the reference cycle. When SETB becomes high, the input transistors ($M_{n1}$ and $M_{n2}$) sense the leading edge, whereupon the cross-coupled NMOS ($M_{n3}$–$M_{n4}$) and PMOS ($M_{p1}$–$M_{p2}$) pairs regenerate the latch outputs in the direction of the leading edge. The input transistors ($M_{n1}$–$M_{n2}$) are sized large enough to reduce the effect of mismatches, to increase arbiter gain, and to reduce the effects of metastability. Post-layout simulation shows a blackout time (the time difference between the data and clock edges below which the arbiter takes excessively long to settle) of less than 37 fs with a supply variation of $\pm 100$ mV. After the output is latched, the arbiter no longer responds to changes in the input and needs to be pre-charged again for the next comparison. Therefore, SETB is an early version of the reference [REF in Fig. 2(b)], so that all the arbiters are set and ready a short time before the reference arrives. Moreover, to reduce the switching current due to the high frequency of $PH<7:0>$, an NMOS switch ($M_{n0}$) controlled by SETB disconnects the current path from ground when the arbiter is inactive.

After sampling the input, a transition detector finds the stage at which the sampled output changes from 1 to 0 and provides a one-hot code which is later encoded into a 3-bit binary output. To avoid bubble errors, a three-input AND gate with $Q_{N-1}$, $Q_N$, and $QB_{N+1}$ is used in the transition detector. Since $PH<7:0>$ rotate every cycle, the first and last stages are also considered to be sequential. In other words, it is assumed that the last stage is followed by the first stage or, equivalently, that the first stage is preceded by the last one.
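The transition detector described above can be modeled behaviorally. This sketch is an illustration rather than the gate-level circuit: it assumes that an arbiter output of 1 means the phase led the reference, that at least two adjacent arbiters output 1, and the function names are hypothetical.

```python
def transition_detect(q):
    """One-hot 1-to-0 transition detector with circular wrap-around.

    Output N is Q[N-1] AND Q[N] AND NOT Q[N+1]; requiring Q[N-1] = 1
    suppresses an isolated (bubble) 1 in the arbiter outputs.
    """
    n = len(q)
    return [q[(i - 1) % n] & q[i] & (1 - q[(i + 1) % n]) for i in range(n)]


def encode_3bit(one_hot):
    """Encode the position of the single 1 as the coarse code."""
    return one_hot.index(1)


clean = [1, 1, 1, 0, 0, 0, 0, 0]
bubble = [1, 1, 1, 0, 1, 0, 0, 0]   # isolated 1 at stage 4
wrapped = [1, 0, 0, 0, 0, 0, 1, 1]  # transition crosses the last/first boundary

for q in (clean, bubble, wrapped):
    print(q, "->", encode_3bit(transition_detect(q)))
```

The first two patterns both resolve to stage 2, showing that the isolated 1 is ignored, and the third resolves to stage 0 because the last stage is treated as the predecessor of the first.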

### B. Coarse-Fine Interface

An array of sub-sampler circuits in the CFI accurately preserves the delay between the eight phases $PH<7:0>$ and the reference REF while reducing the switching rate of $PH<7:0>$, thereby saving power. The sub-sampling circuits use TSPC flip-flops [Fig. 4(b)] where the input is tied to the supply and the high-speed phases $PH<7:0>$ are connected to the CLK terminals. The early reference REF-early and its complement are connected to $RES$ and $RES_B$, respectively, to reset the sub-sampler's output, as shown in Fig. 4(b). At the rising edge of REF-early, the internal reset devices ($M_{p2}$, $M_{n4}$, and $M_{p5}$) of the TSPC flip-flop are released and the sub-sampler is activated. The output of the flip-flop then goes high at the first rising edge of the input clock (high-speed signal) and remains high until the next reset signal. In other words, the rising edge is synchronized with the input signal while the falling edge is synchronized with the early reference. Hence, the frequency is reduced but useful timing information is preserved. Post-layout simulation shows that the static current of each sub-sampler stage is 92 $\mu$A at 16 GHz input frequency and 100 MHz reference frequency. Note that the static current can be eliminated by using additional NMOS transistors controlled by $RES_B$ [green dotted transistors in Fig. 4(b)] at the expense of larger CLK-to-Q delay, but this was not done in the current prototype.

Since the output code of the CTDC should be ready at the input of the interface multiplexers [Fig. 2(b)] before the arrival of the sub-sampler outputs, LVT devices were used in the CTDC path (i.e., core, transition detector, and CTDC buffer), while HVT devices were used in the CFI, with an extra buffer after the sub-samplers. Within the CFI, LVT devices were used only in the sub-sampling circuit to accommodate 16-GHz input signals. Post-layout worst-case simulation (HVT = ff, LVT = ss, temperature = 125 °C, and V<sub>DD</sub> = 1.1 V) shows that there is a 32 ps margin between the MUX inputs and its selection.

TABLE I  
BASIC SPECIFICATIONS OF CTDC AND FTDC

| Stage | Input Frequency | Dynamic Range | Resolution | Conversion Rate | Normalization  |
|-------|-----------------|---------------|------------|-----------------|----------------|
| CTDC  | 12.5–17 GHz     | 58–80 ps      | 7.4–10 ps  | FRef (100 MS/s) | Pre-normalized |
| FTDC  | FRef (100 MHz)  | 10 ps         | < 400 fs   | FRef (100 MS/s) | Using LUT      |

Fig. 4. (a) Schematic of CTDC arbiter. (b) Schematic of sub-sampler TSPC flip-flop. Green dotted transistors are not used in the actual implementation (short circuit).

Fig. 5. Schematic of FTDC stage.

### C. Fine TDC

Fig. 5 shows the schematic of the Vernier FTDC. Cascading a large number of delay units degrades the TDC linearity due to mismatches. In order to alleviate this non-linearity, combinatorial redundancy using SES [17] is introduced in each FTDC stage, as shown in Fig. 6. Each SES delay unit consists of two inverter stages, and each inverter comprises ten switched branches of smaller inverters controlled by a digital word, as shown in Fig. 6. These small inverters are enabled or disabled, to perform SES, using the calibration technique whose details are presented in the next two sections. Implementing a Vernier delay difference between the input signals using different explicit or parasitic capacitances, or loading differences, leads to a large dependence on input amplitude, which exacerbates TDC non-linearity. Therefore, the SES delay elements are designed to be identical in the fast and the slow delay lines. The Vernier delay is set using a 3-bit current DAC, which helps to guarantee a monotonic delay difference versus control word and to implement the mean adaptation technique that assists SES calibration. The mean delay difference per stage is designed to be programmable over the range of 0.2–1.4 ps, including post-layout parasitics. This implies that the TDC core, in principle, can work over input frequencies from 11.2 to 78 GHz. Each FTDC stage uses the same arbiter as in the CTDC, but here the reset signal is generated by AND-gating the input signals (Fig. 6). Post-layout simulations at the worst-case process corner show that the delay of the reset-signal generation structure (Fig. 6) is less than 15 ps while the delay of the SES unit is around 75 ps; this provides sufficient margin for the arbiter to sample the input signals accurately.
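The quoted 11.2–78 GHz range follows from the programmable stage delay. The sketch below reproduces the arithmetic under the assumption that the 32 FTDC stages exactly span one CTDC step, that is, one eighth of the divider period, and that the divider ratio is 4; the function name is hypothetical.

```python
def tdc_input_range_ghz(lsb_min_ps, lsb_max_ps, stages=32, phases=8, divide_by=4):
    """Input frequency range for which `stages` Vernier steps fill one coarse step."""
    t_div_min_ps = lsb_min_ps * stages * phases  # shortest divider period covered
    t_div_max_ps = lsb_max_ps * stages * phases  # longest divider period covered
    f_max_ghz = divide_by * 1e3 / t_div_min_ps
    f_min_ghz = divide_by * 1e3 / t_div_max_ps
    return f_min_ghz, f_max_ghz


f_min, f_max = tdc_input_range_ghz(0.2, 1.4)
print(f"{f_min:.1f} GHz to {f_max:.1f} GHz")
```

With a 0.2–1.4 ps stage delay this gives approximately 11.2 GHz to 78.1 GHz, matching the range stated in the text.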

## V. FTDC NON-LINEARITY AND ITS MITIGATION

### A. Sources of Non-Linearity

Fig. 6. Schematic of FTDC stage based on SES with mean adaptation.

Fig. 7. Mismatch variation effect on delay difference of (a) first FTDC stage and (b) final FTDC stage.

Mismatches in the delay elements and differences in arbiter offsets in the FTDC cause non-linearity in its overall transfer characteristic. The effects of these mismatches are illustrated in Fig. 7(a), which shows a histogram generated using Monte Carlo simulation of the first Vernier stage in the FTDC. The nominal delay difference is set to 600 fs, which is required to achieve in-band phase noise of $-103$ dBc/Hz at 66 GHz carrier and 100 MHz reference frequencies [14], [31]. This simulation includes the mismatches of the delay control unit, time arbiter, and SES unit with a single inverter enabled. It is seen that the standard deviation of the Vernier delay (i.e., the difference between the mean delays, or equivalently, the LSB of the FTDC) of one stage exceeds twice the desired resolution. This simulation also reveals that the delay variation is dominated by mismatch in the inverter delay elements. Note that simply increasing transistor dimensions in the delay inverters to improve linearity is not a viable option. Monte Carlo simulations show that quadrupling $W$ and $L$ in the delay elements halves the standard deviation, as expected. However, the target resolution cannot be achieved by sizing alone since the delay variation accumulates along the line, as discussed in the following.
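The link between the 600-fs delay difference and the $-103$ dBc/Hz target can be checked numerically. The sketch below assumes the commonly used TDC quantization-noise expression for phase-domain PLLs, $\mathcal{L} = \frac{(2\pi)^2}{12}\left(\frac{\Delta t_{\text{res}}}{T_V}\right)^2 \frac{1}{f_R}$; whether this is exactly the form used in [14], [31] is an assumption, and the function name is hypothetical.

```python
import math


def inband_phase_noise_dbc_hz(t_res_s, f_carrier_hz, f_ref_hz):
    """TDC-quantization-limited in-band phase noise (assumed textbook form)."""
    ratio = t_res_s * f_carrier_hz  # resolution as a fraction of the carrier period
    noise = (2 * math.pi) ** 2 / 12 * ratio ** 2 / f_ref_hz
    return 10 * math.log10(noise)


pn = inband_phase_noise_dbc_hz(600e-15, 66e9, 100e6)
print(f"{pn:.1f} dBc/Hz")
```

For 600 fs, a 66 GHz carrier, and a 100 MHz reference this evaluates to about $-102.9$ dBc/Hz, consistent with the figure quoted above.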

Fig. 8. Failure probability versus number of selected delay elements for target variation $< \sigma/100$ using (a) SES and (b) SES with mean adaptation.

The standard deviation of the total delay difference at the $n$th step of the FTDC transfer function can be expressed approximately as $\sigma_{n,\text{TDC}} = \sqrt{n} \times \sigma_{\text{stage}}$, where $\sigma_{\text{stage}}$ is the standard deviation of the Vernier delay of a single stage. Fig. 7(b) shows Monte Carlo simulations of the delay variation of the final stage. It is observed that the standard deviation, which follows the above equation, is 13 times the target LSB. Finally, it is noted that circuit noise is insignificant compared to device mismatches; this was verified through simulation.
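The square-root accumulation can be illustrated with a small Monte Carlo experiment. This is a sketch under the assumption of independent, zero-mean Gaussian stage errors; the function names, trial count, and seed are arbitrary choices made for the example.

```python
import math
import random
import statistics


def accumulated_sigma(sigma_stage, n):
    """Closed form: sigma after n independent stages."""
    return math.sqrt(n) * sigma_stage


def simulate_accumulated_sigma(sigma_stage, n, trials=5000, seed=0):
    """Empirical sigma of the summed error of n independent Gaussian stages."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(0.0, sigma_stage) for _ in range(n))
              for _ in range(trials)]
    return statistics.pstdev(totals)


print(accumulated_sigma(1.0, 32))           # about 5.66 stage sigmas
print(simulate_accumulated_sigma(1.0, 32))  # close to the closed form
print(13 / math.sqrt(32))                   # implied single-stage sigma, in LSB
```

Working backwards, a final-stage deviation of 13 LSB over 32 stages implies a single-stage deviation of about 2.3 LSB, which agrees with the earlier observation that one stage exceeds twice the desired resolution.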

### B. FTDC Calibration Using Statistical Element Selection

In this work, SES [17], [46] is introduced to calibrate the FTDC. SES [47] is a redundancy-based technique that can mitigate the effects of random mismatches in analog circuits [17]. In SES, given a population of  $N$  identical elements, a subset of  $k$  elements can be chosen such that their combination minimizes the standard deviation of a particular parameter. Here, SES is introduced to reduce the standard deviation of the Vernier delay due to random transistor mismatches within the Vernier stages of the FTDC. Specifically, SES is used to ensure that the resolution of each FTDC stage is within some acceptable error from its desired value. Fig. 6 shows simplified schematics of a single FTDC stage and the SES delay element which consists of a cascade of two SES inverters; each SES inverter comprises  $N$  identical inverter slices that can be individually enabled/disabled by a digital calibration engine. The objective of the calibration is to select from the  $N$ -element population the best  $k$ -element subset that minimizes the difference between the desired and actual resolution in a Vernier FTDC stage. An additional requirement is that the DNL and INL of the TDC should be less than one LSB.

The choice of the population size  $N$  is an important consideration in the design of the FTDC. The methodology used to determine  $N$  with the FTDC stage specifications summarized in Table II is described in the following.

- 1) The standard deviation ($\sigma_{\text{slice}}$) of the delay of a cell comprising a single slice of the two-inverter cascade and the delay-control current DAC (Fig. 6) [29], [30] was extracted using Monte Carlo post-layout simulations with statistical device models provided by the foundry.
- 2) Monte Carlo simulations were conducted in MATLAB for a single Vernier stage. In each Monte Carlo instantiation, two populations of $N$ elements with standard deviation $\sigma_{\text{slice}}$ were generated. The mean delays of the two populations were adjusted to achieve a target Vernier delay using the delay control DAC.
- 3) The deviation of each stage delay from its mean value is assumed to be the sum of the deviations of the elements in each $k$-element subset from the mean delay of each buffer slice.
- 4) As a measure of the effectiveness of SES, a failure probability [17] was defined as the statistical frequency with which the deviation of the Vernier delay in each stage from the nominal (desired) value exceeded a given specification.
- 5) The simulated failure probability of an FTDC stage was used to estimate the required number of combinations (i.e., $N$ and $k$) to achieve a target yield of the FTDC. It should be noted that the yield of the cascaded stages reduces exponentially with the number of stages. The yield of an $n$-stage cascade can be expressed as

$$\text{Yield} = (1 - P_{\text{failure}})^n.$$
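The yield expression can be evaluated directly to see how strongly the cascade penalizes per-stage failures. The sketch below uses hypothetical function names and rounded per-stage failure probabilities; it is not tied to the simulated values behind Table III, so the numbers need not match that table exactly.

```python
def cascade_yield(p_failure, n_stages):
    """Yield of n independent stages, each failing with probability p_failure."""
    return (1.0 - p_failure) ** n_stages


def max_stage_failure(target_yield, n_stages):
    """Largest per-stage failure probability that still meets the target yield."""
    return 1.0 - target_yield ** (1.0 / n_stages)


print(f"{cascade_yield(0.01, 32):.3f}")      # 99% stage yield over 32 stages
print(f"{cascade_yield(0.001, 32):.3f}")     # 99.9% stage yield over 32 stages
print(f"{max_stage_failure(0.90, 32):.4f}")  # budget for 90% overall yield
```

With these rounded inputs, a 99% stage yield gives roughly 72.5% over 32 stages, while 90% overall yield requires a per-stage failure probability below about 0.33%.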

Fig. 8(a) shows the simulated $P_{\text{failure}}$ of a single FTDC stage using SES versus the size of the selected subset $k$ for different population sizes $N$. This plot is generated using $10^5$ Monte Carlo simulations in MATLAB. It can be observed that to achieve an overall yield of more than 90%, $N$ should be higher than 15 with $k$ of 4. With $k = 5$ the yield improves to 91.3%. This means that there are more than 1365 or 3003 possible combinations for $k = 4$ and $k = 5$, respectively.
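The subset counts quoted in this section are binomial coefficients $\binom{N}{k}$ and can be confirmed with the standard library. The helper name below is hypothetical.

```python
import math


def subset_count(n, k):
    """Number of distinct k-element subsets of an n-element population."""
    return math.comb(n, k)


for n, k in [(15, 4), (15, 5), (14, 4), (16, 4), (9, 3), (9, 4), (11, 3), (11, 5)]:
    print(f"N={n:2d}, k={k}: {subset_count(n, k)} subsets")
```

For example, a population of 15 yields 1365 subsets for $k = 4$ and 3003 for $k = 5$, which illustrates how quickly the search space grows with $N$ and $k$.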

TABLE II  
SUMMARY OF SIMULATION PARAMETERS FOR FAILURE PROBABILITY

| Parameter | $\sigma_{\text{Slice}}$ | Spec (Stage) | Target LSB | Target INL |
|-----------|-------------------------|--------------|------------|------------|
| Value     | 1.05 ps                 | 5 fs         | 600 fs     | < 1 LSB    |

TABLE III  
REQUIRED NUMBER OF N AND K FOR DIFFERENT YIELDS OF FTDC BASED ON SES WITH AND WITHOUT MEAN ADAPTATION

| Parameter                | $N_{\text{Min}}$ | $k$ | ${}^{N}C_{k}$ | Yield of 1-stage | Yield of 32-stage |
|--------------------------|------------------|-----|---------------|------------------|-------------------|
| SES                      | 14               | 4   | 1001          | 99%              | 70.6%             |
|                          | 16               | 4   | 1820          | 99.9%            | 96.1%             |
| SES with mean adaptation | 9                | 3–4 | 84–126        | 99%              | 70.9%             |
|                          | 11               | 3–5 | 165–462       | 99.9%            | 96.1%             |



Fig. 9. Simplified conceptual diagram of SES with mean adaptation. (a) Normal case of delay distribution of fast and slow line. (b) Delay distributions with larger delay spread due to PVT or local mismatch variation. (c) Delay with larger delay spread after mean adaptation.

### C. SES With Mean Adaptation

SES using random device mismatches alone can be used to find pairs of delays that meet the target resolution of each FTDC stage. In other words, the means of the delay distributions in the fast and slow delay cells can be fixed, and a search can be conducted to find a subset from the randomly mismatched populations of each delay cell that best meets the target resolution. This case is illustrated in Fig. 9(a). The probability of finding the best-matched $k$-element subset can be increased by increasing the population size $N$. This is confirmed by the Monte Carlo simulations of Fig. 8(a), the results of which are summarized in Table III.

However, using SES alone in the manner described above has two limitations. First, uncertainty in the means of the delay distributions can significantly decrease the probability of finding a combination that meets the delay difference spec. This case is illustrated in Fig. 9(b). Such uncertainties are caused by PVT variations or other sources of mismatch within the FTDC stage, for example, in the arbiter or the delay control transistors. Second, choosing a large population size $N$ with the intent of using SES alone increases the die area, power consumption, calibration time, and implementation complexity.

To overcome the above limitations, the SES control algorithm is combined with adaptation of the mean of the Vernier delay. In essence, each delay cell is designed so that the mean delay can be adjusted in steps that are a fraction of $\sigma_{\text{slice}}$, thereby allowing the distributions of the stage delays to be adjusted. This enables a large calibration range, and allows meeting the yield specification with a smaller population size and fewer combinations compared to SES alone. This case is shown in Fig. 9(c). In this design, the nominal Vernier delay can be adjusted from 0.2 to 1.4 ps (0.2 $\sigma_{\text{slice}}$ to 1.4 $\sigma_{\text{slice}}$) with 3-bit control.

The effectiveness of SES with mean adaptation is characterized using Monte Carlo simulations in MATLAB with statistical mismatch data obtained by running Monte Carlo circuit simulations. The simulation conditions are the same as those used before and are summarized in Table II. The search algorithm described previously is used to find the best $k$-element subset that meets the target Vernier delay of each FTDC stage. If a $k$-element subset that meets the target spec is not found, the calibration algorithm adapts the mean of the delay distribution. Then, two new populations of $N$ elements are generated with the new standard deviation $\sigma_{\text{slice}}$ (calculated from Monte Carlo circuit simulations at the new mean value), and the search is repeated.
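The search-then-adapt loop can be sketched in a few lines. This is a simplified illustration, not the MATLAB model used in the paper: it assumes the stage error is the mean error plus the difference of the summed slice deviations, it keeps the same slice deviations after a mean shift instead of regenerating populations with a new $\sigma_{\text{slice}}$, and the function names, the 8-ps starting error, and the shift values are all hypothetical.

```python
import itertools
import random


def best_subset_error(slow, fast, k, offset):
    """Smallest |Vernier-delay error| (ps) over all pairs of k-subsets.

    slow, fast: per-slice delay deviations of the two delay lines (ps).
    offset: error of the mean Vernier delay before selection (ps).
    """
    slow_sums = [sum(c) for c in itertools.combinations(slow, k)]
    fast_sums = [sum(c) for c in itertools.combinations(fast, k)]
    return min(abs(offset + s - f) for s in slow_sums for f in fast_sums)


def ses_with_mean_adaptation(slow, fast, k, offset, spec, mean_shifts):
    """Try each mean shift in turn; stop at the first one that meets spec."""
    best_err, best_shift = float("inf"), None
    for shift in mean_shifts:
        err = best_subset_error(slow, fast, k, offset + shift)
        if err < best_err:
            best_err, best_shift = err, shift
        if err <= spec:
            break
    return best_err, best_shift


rng = random.Random(0)
N, K, SIGMA_SLICE, SPEC = 9, 3, 1.05, 0.005  # ps; values taken from Tables II and III
slow = [rng.gauss(0.0, SIGMA_SLICE) for _ in range(N)]
fast = [rng.gauss(0.0, SIGMA_SLICE) for _ in range(N)]
offset = 8.0  # hypothetical mean error, e.g. from PVT drift

plain = best_subset_error(slow, fast, K, offset)
adapted, shift = ses_with_mean_adaptation(
    slow, fast, K, offset, SPEC, [0.0, -2.0, -4.0, -6.0, -8.0])
print(f"SES only: {plain:.4f} ps, with mean shift {shift} ps: {adapted:.4f} ps")
```

Because the first candidate shift is zero, the adapted result can never be worse than plain SES in this sketch; re-centering the distributions simply gives the subset search more candidates near the target.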

Fig. 8(b) shows the simulated  $P_{\text{failure}}$  of a single FTDC stage using SES with mean adaptation for different  $N$  and  $k$ . It can be observed that to achieve a stage yield higher than 90%,  $N$  should be higher than 9 with  $k$  between 3 and 4, which can be achieved by only 84/126 combinations. Table III compares the required number of combinations of each FTDC stage, based on SES with and without mean adaptation, to achieve



Fig. 10. Simplified schematic of on-chip TDC calibration engine.

yield of 99% and 99.9%. Clearly, by using mean adaptation with SES, the calibration time, area, and power consumption can be reduced significantly. Moreover, a large improvement in area and complexity of the on-chip calibration engine can be achieved as the number of possible subsets of selected elements decreases. In addition, SES with mean adaptation helps to improve calibration robustness if there is a large difference between the target TDC resolution and the center of the delay distribution.
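The combination counts quoted above (84 and 126) agree with the number of  $k$ -element subsets of a 9-element population. The short check below is a sketch under that assumption: the reading that the per-stage count equals the binomial coefficient is an interpretation rather than something the text states, and the function name `ses_combinations` is hypothetical.

```python
from math import comb


def ses_combinations(n: int, k: int) -> int:
    """Number of distinct k-element subsets of an n-element population."""
    return comb(n, k)


# N = 9 with k = 3 or k = 4, the values quoted for a stage yield above 90%.
counts = {k: ses_combinations(9, k) for k in (3, 4)}
print(counts)  # {3: 84, 4: 126}
```

Because the count grows quickly with  $N$ , a smaller population directly shrinks the search space that the calibration engine has to cover.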

## VI. CALIBRATION IMPLEMENTATION

### A. FTDC Calibration Principle

Suppose that two digital signals with slightly different but precisely defined periods  $T_S$  and  $T_F$  are applied to the FTDC in such a manner that their rising edges are aligned at time  $t = 0$ . Assume that  $T_S > T_F$  and define the difference between periods  $\Delta t = (T_S - T_F)$ . Also define  $LSB_T$  to be the nominal Vernier delay in each FTDC stage, and assume that the period difference and the Vernier delay are related to each other as  $\Delta t = LSB_T/N$ , where  $N$  is an integer. In other words,  $LSB_T$  is the ideal quantization step of the FTDC. Now, suppose that all mismatches in the first FTDC stage have been perfectly calibrated such that its Vernier delay is exactly equal to  $LSB_T$ . After  $C$  edges of  $T_F$  have elapsed, the edge of  $T_F$  leads the edge of  $T_S$  by an amount  $\Delta T(C) = C(T_S - T_F)$ . It is easy to see that the first stage output will transition from 0 to 1 when  $\Delta T(C)$  exceeds  $LSB_T$ . In other words, the number of edges after which the first stage toggles is given by  $C(1) = N$ . This observation can now be generalized to find the number of  $T_F$  edges  $C(M)$  after which the output of the  $M$ th FTDC stage will transition from 0 to 1; assuming that all FTDC stages up to and including the  $M$ th stage have been perfectly calibrated, it can be easily shown that  $C(M) = M \cdot N$ .

This principle can be used to sequentially calibrate each FTDC stage starting with the first. By applying two signals with slightly different periods, as described above, a “time ramp” can be applied to the FTDC. The time step of the ramp, which is the delay difference between the  $T_S$  and  $T_F$  edges accrued every  $T_F$  cycle, can be set to a fraction of the desired resolution by adjusting  $N$ . Ideally, the output of the  $M$ th stage will toggle after  $C(M) = M \times N$  rising edges, but, due to mismatches, this transition might occur after a different number of rising edges. The calibration engine then measures the resolution of a particular stage for the available SES combinations and finds the combination that most closely meets the spec by choosing the one for which the actual count most closely matches the expected value  $C(M)$ . This process is repeated for all stages in the FTDC, thereby minimizing the overall DNL/INL.
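The relation  $C(M) = M \times N$  can be illustrated with a small numerical sketch of the time ramp. The code below assumes that a stage toggles on the first  $T_F$  edge at which the accumulated ramp reaches the cumulative Vernier delay of stages 1 to  $M$  (the text says "exceeds", but  $C(1) = N$  implies that equality already counts). The function name and the mismatched delay values are hypothetical and purely illustrative.

```python
from itertools import accumulate


def toggle_counts(stage_delays_fs, dt_fs):
    """For each stage M, return the first edge count C such that the ramp
    C * dt reaches the summed Vernier delay of stages 1..M (integer fs)."""
    counts = []
    for threshold in accumulate(stage_delays_fs):
        # Integer ceiling division: smallest C with C * dt_fs >= threshold.
        counts.append(-(-threshold // dt_fs))
    return counts


LSB_T_FS = 400  # target Vernier delay
DT_FS = 100     # period difference T_S - T_F, so N = LSB_T / dt = 4

ideal = toggle_counts([LSB_T_FS] * 4, DT_FS)
print(ideal)       # [4, 8, 12, 16], i.e. C(M) = M * N

# Hypothetical stage delays with mismatch (not measured data).
mismatched = toggle_counts([400, 450, 350, 420], DT_FS)
print(mismatched)  # [4, 9, 12, 17]
```

In the mismatched case, the second and fourth stages toggle one edge late, which is the kind of count deviation the calibration engine uses to rank SES combinations.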

### B. FTDC Calibration Engine

Fig. 10 shows a schematic of the SES calibration engine. The calibration controller initiates calibration of the  $M$ th FTDC stage by providing a trigger signal to a dual-channel function generator that outputs calibration signals with periods  $T_S$  and  $T_F$  and period difference  $\Delta t$ . At the triggering instant, the rising edges of the two calibration signals are aligned. Following offset and glitch removal, these signals are fed directly to the FTDC, with the CFI bypassed. The output of the  $M$ th stage is selected by the 32-to-1 mux. A  $32 \times 16$  register file holds a 16-bit control word for each FTDC stage, with 10 bits for the SES buffers and 3 bits for each of the two delay control DACs. The goal of calibration is to perform element selection such that the transition of the  $M$ th stage occurs at a rising edge at  $T_r = (M-1/K) \times LSB_T$  after the calibration of a particular stage is initialized. Here,  $K$  is a programmable calibration parameter that determines the acceptable error of each FTDC stage.

### C. FTDC Calibration Algorithm

The algorithm implemented in the FTDC calibration engine can be explained using two exemplary scenarios, as shown in Fig. 11 with calibration parameters summarized in Table IV. The calibration engine operates in two states: monitoring state and running state.



Fig. 11. Simplified timing waveforms for FTDC stage calibration for case of (a) success (achieve the desired spec) and (b) failure (violate the desired spec).

TABLE IV  
SUMMARY OF FTDC CALIBRATION PARAMETERS

| Parameter | Target LSB<br>( $LSB_T$ ) | Calibration address<br>(Stage # 1) | $\Delta t$ | Expected count<br>(N) | Target error<br>( $K=2$ ) |
|-----------|---------------------------|------------------------------------|------------|-----------------------|---------------------------|
| Value     | 400 fs                    | 0 ( $M=1$ )                        | 100 fs     | 4                     | < LSB/2                   |

*1) Monitoring State:* The test signals are initialized to have zero phase difference (Fig. 11). The word Cal\_address<4:0> determines which stage is being calibrated. The transition of this stage is monitored using a 32-to-1 multiplexer (Fig. 10). The 16-bit control word of the stage being calibrated is generated from a pattern word generator and stored in the corresponding location in the  $(32 \times 16)$  register file. Also, a replica of the input signal is used to trigger a counter to calculate the number of clock cycles (Actual\_count) between start of calibration and the transition instant. Once the transition occurs, Actual\_count is compared with Expected\_count ( $C(M) = M \times N$ ) as follows:

$$Out = \begin{cases} 1 & |Actual\_count - Expected\_count| < \frac{N}{K} \\ 0 & \text{otherwise.} \end{cases}$$

In the illustration of Fig. 11(a), the first stage is monitored and once the transition occurs, the Actual\_count (4 in this example) is sampled and compared with Expected\_count. Then, the comparison result is fed to set the calibration engine into the Running State.
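The comparison rule can be written as a one-line predicate. The sketch below uses the Table IV values ( $N = 4$ ,  $K = 2$ , first stage); the function name `meets_spec` and the failing count of 6 are hypothetical choices for illustration.

```python
def meets_spec(actual_count: int, expected_count: int, n: int, k: int) -> bool:
    """Out = 1 when |Actual_count - Expected_count| < N / K, else 0."""
    # Multiply through by k so the comparison stays in integer arithmetic.
    return k * abs(actual_count - expected_count) < n


N, K, M = 4, 2, 1
expected = M * N  # C(M) = M x N

print(meets_spec(4, expected, N, K))  # True: error 0, as in Fig. 11(a)
print(meets_spec(6, expected, N, K))  # False: error 2 is not below N/K = 2
```

Note that the inequality is strict, so a count that is off by exactly  $N/K$  edges is rejected.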

*2) Running State:* Based on the comparison result, the calibration engine moves to one of two cases.

- 1) Success Case: An example of this case is shown in Fig. 11(a), where the transition occurs while the difference between Actual\_count and Expected\_count is 0, which indicates that the target accuracy specification has been met. Fig. 12(a) shows a simplified flow chart of the calibration algorithm in the success case, when the comparison result indicates that the stage currently being calibrated meets the desired spec. In this case, the eight best calibration words of the stage are found, ranked, and stored in Memory1 (Fig. 10). Once Memory1 is full, its contents are moved to Memory2. Then, Memory1 is erased and the calibration address is updated. Calibration of the next stage is then initialized by sending a trigger to the external signal generator to re-align the calibration inputs (i.e., reset the phase difference to zero).

- 2) Failure Case: An example of this case is shown in Fig. 11(b), where the transition occurs while the difference between Actual\_count and Expected\_count is 2, which violates the target accuracy spec. Fig. 12(b) shows a flow chart of the calibration algorithm in the failure case, when the comparison result indicates that the current calibration word for the current FTDC stage violates the desired spec. The calibration address is held fixed and the pattern word generator produces the next 16-bit calibration word. Calibration of the current stage is re-initiated by sending a trigger to the external signal generator to re-align the calibration inputs; calibration then moves



Fig. 12. Simplified flow chart of FTDC stage calibration for cases of (a) calibration word success, stage success, and whole calibration success and (b) calibration word failure, stage calibration failure, and whole calibration failure.

to the monitoring state. For a given stage, if no combination is found to meet the desired spec, the calibration engine updates the calibration address to the previous stage and an alternative calibration word is loaded from Memory2 into the register file. The calibration address is then set to the current FTDC stage and its calibration



Fig. 13. Simplified block diagram of on-chip CFI calibration engine.



Fig. 14. (a) Simplified block diagram of millimeter-wave TDC chip. (b) Chip die photograph.

is repeated. For two consecutive stages, if there are no possible combinations that meet the spec, the target calibration accuracy is relaxed and the whole calibration is repeated for the entire FTDC.
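The per-stage part of this algorithm can be sketched as a simple search loop. The code below is a simplified behavioral illustration, not the synthesized engine: it covers only the word-by-word search and the "eight best words" termination, and it omits the fallback to the previous stage via Memory2 and the relaxation of the target accuracy. All names (`calibrate_stage`, `measure_count`, `toy_measure`) and the toy measurement model are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def calibrate_stage(
    stage: int,
    words: Iterable[int],
    measure_count: Callable[[int, int], int],
    n: int,
    k: int,
    keep: int = 8,
) -> List[Tuple[int, int]]:
    """Try calibration words for one stage and return up to `keep` passing
    (error, word) pairs ranked by error. An empty list means stage failure."""
    expected = stage * n  # C(M) = M x N
    passing: List[Tuple[int, int]] = []
    for word in words:
        error = abs(measure_count(stage, word) - expected)
        if k * error < n:  # same test as Out = 1
            passing.append((error, word))
            if len(passing) == keep:  # corresponds to Memory1 being full
                break
    return sorted(passing)


def toy_measure(stage: int, word: int) -> int:
    """Made-up stand-in for the on-chip counter (illustration only)."""
    return stage * 4 + (word % 5) - 2


best = calibrate_stage(1, range(16), toy_measure, n=4, k=2)
print(best[0])    # (0, 2): word 2 gives zero count error in this toy model
print(len(best))  # 8
```

With the toy model, the search stops as soon as eight passing words have been collected, so later words are never measured.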

### D. CFI Calibration

As shown in Fig. 2(b), the interface stage includes nine paths for the eight phases ( $PH<7:0>$ ) and the reference signal (REF), and any mismatch in these paths affects the accuracy of the FTDC and leads to nonlinearity regrowth. Since the FTDC has been calibrated, it is possible to accurately measure any time offsets in different paths of the interface stage. To calibrate the mismatch between these paths, current-starved tunable delay elements are inserted at the end of the interface stage, as shown in Fig. 13. A common external signal (Cal\_test) is applied through the input multiplexer (MUX Cal) to all nine paths (eight phases and reference) of the interface stage. Ideally, the FTDC will measure the same delay difference between all delay paths and the output of the FTDC should be zero. Therefore, the output of the FTDC is monitored by the on-chip calibration engine, which adapts the tunable delay elements sequentially by changing the bias voltage of the PMOS and NMOS transistors through voltage DACs, until the output of the FTDC is zero for any two consecutive paths.

### E. Practical Considerations for Calibration

Several practical issues must be considered in the on-chip implementation of the proposed calibrations. These are discussed in this section.

- 1) Static delay between the lag and lead calibration sources can arise from routing mismatch, static mismatch in the multiplexing circuit, and test setup mismatches in this prototype. This can lead to calibration failure. Therefore, the static delay offset is explicitly removed during calibration. Initially, the static delay offset at the inputs to the FTDC stage is measured using a dummy FTDC stage. This measured offset is subtracted during the calibration of each FTDC stage. In this implementation, static offsets up to 4 ns can be cancelled.
- 2) Noise in the supply voltage, device noise, and jitter in the calibration test signals can affect the accuracy of FTDC delay line. Therefore, the behavior of FTDC delay line during calibration might not replicate the actual delay during the operating conditions. In this



Fig. 15. Measurement setup of millimeter-wave TDC chip.



Fig. 16. Output code of millimeter-wave TDC with input frequencies of (33.33 GHz and 83.334 MHz).

prototype, calibration of each stage can be averaged up to 8192 times to mitigate random noise.

The FTDC resolution is also sensitive to supply noise. Therefore, the power supply to the FTDC and CFI is provided by a dedicated low dropout (LDO) regulator, while the CTDC and the divider use another LDO. The LDO topology is similar to [50] and uses a folded-cascode error amplifier. The LDO consumes 1.6 mA from a 1.2-V supply and has a loop phase margin of  $60^\circ$  and a unity-gain bandwidth higher than 40 MHz.

- 3) Jitter in the calibration test signals can limit the post-calibration linearity of the FTDC. Behavioral simulations, conducted to derive jitter specifications, revealed that jitter of up to 4-ps rms can be tolerated while still achieving  $\text{INL} < 300$  fs. Note that this can easily be achieved by signal generators (see Fig. 15). It is also straightforward to design on-chip test sources with this jitter performance, thereby enabling a completely integrated implementation of the proposed calibration. For example, the calibration sources can be implemented using two on-chip ring-oscillator-based PLLs operating from the reference crystal oscillator. Also note that the CFI calibration needs a single clock, which can be generated in a similar fashion.

- 4) Temperature Effects. The proposed calibration techniques operate in the foreground to alleviate the effects of process variations, transistor mismatch and on-chip supply drop. In a digital PLL, calibration is performed with the PLL configured in open loop [51]. However, temperature variation effects are not resolvable and generally require periodic calibration, especially when the chip is first powered up. Fortunately, the effect of temperature variation on the TDC is relatively small, as shown in [51], which describes a digital PLL using this TDC. Also, it is noted that due to the completely digital nature of calibration, a possible way to overcome temperature variations is to include a temperature sensor on chip, and to store calibration data over temperature in a LUT. Alternatively, a modification that pushes the proposed calibration technique into the background may be more desirable, but is outside the scope of this paper.
- 5) Calibration Time: In Section V (summarized in Table III), it was shown that in order to achieve overall yield higher than 95% with  $\text{INL}$  less than  $\text{LSB}/2$ , each stage requires 1820 (165–462) combinations with SES (SES with mean adaptation). However, as discussed in the previous section, the calibration of any stage is terminated once the best eight control words are



Fig. 17. Measured (a) DNL and (b) INL of mm-wave TDC with input frequencies of (33.33 GHz and 83.334 MHz).



Fig. 18. Output code of millimeter-wave TDC with high input frequency of 64 GHz.

found and stored in Memory1. Therefore, in practice, the calibration time varies depending on internal calibration parameters, mismatch level, and desired TDC resolution. Behavioral (Verilog/Verilog-A AMS) simulations including mismatch in the FTDC stages were performed to estimate the calibration time under different simulated conditions. With  $1/T_F = 100$  MHz,  $\Delta t = T_S - T_F = 100$  fs, and  $LSB_T = 400$  fs, the best- and worst-case calibration times were estimated to be 0.2 and 85 s, respectively.
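To give a feel for how the calibration parameters scale the calibration time, the sketch below adds up only the time spent waiting for each stage to toggle, using  $C(M) = M \times N$  edges of  $T_F$  per ramp. It is a rough illustration under stated assumptions: trigger, re-alignment, and memory overheads are ignored, the stage count of 32 is taken from the 32-to-1 multiplexer, and the function names are hypothetical. It does not attempt to reproduce the 0.2 s and 85 s estimates, which come from behavioral simulations with mismatch.

```python
def ramp_time_s(stage: int, n: int, t_f_s: float) -> float:
    """Time for one ramp to reach the expected toggle of stage M: C(M) * T_F."""
    return stage * n * t_f_s


def sweep_time_s(stages: int, n: int, t_f_s: float,
                 words_per_stage: int = 1, averages: int = 1) -> float:
    """Total ramp time when every stage tries `words_per_stage` words and
    each word is measured `averages` times (all other overheads ignored)."""
    per_pass = sum(ramp_time_s(m, n, t_f_s) for m in range(1, stages + 1))
    return per_pass * words_per_stage * averages


T_F = 1 / 100e6  # 1/T_F = 100 MHz
N = 4            # LSB_T / delta_t = 400 fs / 100 fs

print(ramp_time_s(1, N, T_F))    # roughly 40 ns for the first stage
print(sweep_time_s(32, N, T_F))  # roughly 21 us for one pass over 32 stages
```

The total grows linearly with both the number of words tried per stage and the number of averages, which is why terminating each stage after eight good words matters in practice.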

## VII. MEASUREMENT RESULTS

Fig. 14(a) and (b) show the simplified block diagram and die photograph of the 65-nm CMOS prototype chip. The input is provided externally via an on-chip balun. The 1 V supply voltages for the divider core and TDC are provided by separate wide bandwidth on-chip LDO regulators to reduce their sensitivity to supply noise. Fig. 15 shows the measurement setup, along with test instrument details. The divider and the two-step TDC consume 7.3 and 3.9 mA from a 1 V supply at 64 GHz and 100 MHz, respectively. The FTDC and CFI are calibrated at 100 MHz. After the chip is powered up, the calibration flags of the divider are monitored. After the divider calibration is complete [42], calibration of the FTDC and CFI is enabled. Fig. 16 shows the un-normalized transfer function of the TDC (CTDC + FTDC1) before and after calibration. A 33.333 GHz input to the divider and an 83.3334 MHz reference are applied to achieve a time-domain ramp input with 100-fs time step. A cumulative code count with 128K samples is used to overcome the effect of external signal jitter on the TDC output code and to average out the TDC noise. It can be observed that before calibration there are several missing codes, which are not present in the calibrated TDC. Fig. 17 shows the measured DNL and INL of the TDC before and after calibration. It is seen that calibration reduces the maximum DNL from 6.2 to 0.65 LSB and the INL from 9 to 1.25 LSB. DNL peaks are seen to occur at the transition between two consecutive divider phases; this is due to unresolvable mismatch between these phases. The results in Figs. 16 and 17 represent the worst case for linearity, which covers the entire range of the TDC's output code at the finest measurable resolution of 450 fs. The dynamic range of FTDC1 is roughly 14.8 ps, which covers the time difference between two consecutive phases of the CTDC's inputs.

The TDC is further characterized at different input frequencies between 20 and 64 GHz, and reference frequencies up to 200 MHz. Fig. 18 shows the un-normalized output of the TDC (CTDC + FTDC1) before and after calibration of the FTDC and CFI when a 64 GHz input and a 159.9975-MHz reference are applied to achieve a time-domain ramp input with 100-fs time step. In this case, the spacing of two consecutive phases of the CTDC's inputs is roughly 7.82 ps, which is approximately half the range of the FTDC when its calibrated resolution is set to 450 fs. Therefore, a large jump is observed in the un-normalized output code of CTDC + FTDC1 when the reference signal crosses a quantization bin of the CTDC (i.e., the CTDC output decimal equivalent increases by one) although the FTDC

TABLE V  
PERFORMANCE SUMMARY AND COMPARISON WITH STATE-OF-THE-ART TDCs

| Reference | | JSSC [32] | JSSC [29] | VLSIC [23] | JSSC [22] | JSSC [24] | JSSC [20] | RFIC [33] | This Work |
|---|---|---|---|---|---|---|---|---|---|
| Architecture | | 2D Vernier | Cyclic | Two Step | Async. Pipeline | True Pipeline | SRO | Flash $\Delta\Sigma$ | Two Step |
| Resolution (ps) | | 4.8 | 1.25 | 3.75 | 1.76 | 1.12 | 0.32 | 1.6 | 0.45 |
| Number of Bits | | 7 | 8 | 7 | 10 | 9 | 13 | 7 | 8 |
| Conv. rate (MS/s) | | 50 | 50 | 200 | 300 | 250 | 50 | 50 | 50-200 |
| Input freq. (GHz) | | 0.05 | 0.1 | 0.2 | 0.3 | 0.25 | 0.25 | <0.25 | 17\*<br>64\*\* |
| Range (ps) | | 610 | 320 | 480 | 1800 | 573.4 | 2000 | 320 | 200 |
| Linearity (LSB) | DNL | 1 | 0.7 | 0.9 | 0.6 | 0.6 | - | - | 0.65 |
| | INL | 3.3 | 3 | 2.3 | 1.9 | 1.7 | - | 0.875 | 1.25 |
| Single-Shot (LSB) | | - | - | - | 0.7 | 0.69 | - | - | 1.7 |
| ENOB | | 4.9 | 6 | 5.28 | 8.46 | 7.57 | - | 6.1 | 6.89 |
| FOM (pJ/conv. step) | | 1.139 | 1.34 | 0.463 | 1.086 | 0.325 | - | 0.386 | 0.167\*<br>0.47\*\* |
| Calibration | | Yes | Yes | No | Yes | No | No | No | Yes |
| Supply (V) | | 1.2 | 1.2 | 1.2 | - | 1.2 | 1 | 1.1 | 1 |
| Power (mW) | | 3 | 4.3 | 3.6 | 115 | 15.4 | 1.5 | 1.32 | 3.9\*<br>11\*\*<br>12.6\*\*\* |
| Area (mm<sup>2</sup>) | | 0.07 | 0.07 | 0.02 | 0.88 | 0.14 | 0.02 | 0.08 | 0.02\#<br>0.089\#\# |
| Tech. (nm) | | 65 | 130 | 65 | 130 | 65 | 90 | 40 | 65 |

$FOM = \text{Power} / (2^{ENOB} \times F_s)$  (pJ/conv. step)   $ENOB = \text{Bits} - \log_2(\text{INL} + 1)$

\* Without DCML divider   \*\* With DCML divider   \*\*\* With DCML divider and LDO

\# Without calibration structures   \#\# With calibration structures

output does not reach its maximum range. This problem can be eliminated and a normal staircase output can be obtained if the normalized FTDC word from the LUT normalizer is read instead. However, in this chip, the LUT output was not accessible due to a design error in its encoder. This error was fixed in the TDC that was used in the ADPLL reported in [51]. Nevertheless, Fig. 18 shows that calibration of the FTDC and CFI eliminates missing codes. Within a single quantization bin of the CTDC (i.e., between a single pair of consecutive phases of  $PH<7:0>$ ), DNL < 0.6 LSB and INL < 0.89 LSB were measured after calibration. Furthermore, by changing the desired resolution of the FTDC stage, the two-step TDC achieves programmable resolution between 450 fs and 1.4 ps, which results in an overall dynamic range up to 307.2 ps (i.e., corresponding to 13 GHz). In this prototype, however, the maximum measurable dynamic range was 200 ps (corresponding to 20 GHz), limited by the lower end of the operating range of the divider. Single-shot measurements were performed to characterize the noise of the FTDC.

Conventionally, an external signal is split into two paths and independent delays are applied to each path. However, the jitter due to the external setup usually dominates the code distribution despite the signal source being common. To avoid this issue, two consecutive phases  $PH<3:2>$  were applied to FTDC2 and 800K output samples were collected. Fig. 19(a) and (b) show the output code distribution without (with) the LDO, where the standard deviation of the TDC noise is 0.856 (0.167) LSB. Experimental results are benchmarked and summarized against recent TDCs in Table V.
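The ENOB and FOM entries of Table V follow from the two definitions given with the table. The sketch below evaluates them in Python. The sampling rate of 200 MS/s used for this work is an assumption (the upper end of the 50-200 MS/s range listed in the table), and the function names are hypothetical; with that assumption the results land close to the reported 0.167 and 0.47 pJ per conversion step.

```python
from math import log2


def enob(bits: float, inl_lsb: float) -> float:
    """ENOB = Bits - log2(INL + 1), as defined with Table V."""
    return bits - log2(inl_lsb + 1)


def fom_pj_per_step(power_w: float, enob_bits: float, fs_hz: float) -> float:
    """FOM = Power / (2^ENOB * Fs), returned in pJ per conversion step."""
    return power_w / (2 ** enob_bits * fs_hz) * 1e12


# Cross-check on the JSSC [29] column: 8 bits with INL = 3 LSB gives ENOB = 6.
print(enob(8, 3))  # 6.0

# "This Work" column: ENOB = 6.89, assumed Fs = 200 MS/s.
print(round(fom_pj_per_step(3.9e-3, 6.89, 200e6), 3))  # TDC without divider
print(round(fom_pj_per_step(11e-3, 6.89, 200e6), 3))   # TDC with divider
```

Small differences from the tabulated FOM values are expected because the table entries are rounded.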

With the finest achievable resolution of 450 fs, the in-band phase noise contribution of the TDC in a digital PLL can be calculated to be  $-105$  dBc/Hz for 100-MHz reference frequency and 66-GHz output frequency [14]. However, the digital PLL reported in [52] that uses this TDC achieves a phase noise of  $-82$  dBc/Hz at 100-kHz offset. It is important to note that the in-band phase noise of the aforementioned PLL is dominated by the reference phase noise, with less significant contributions from other



Fig. 19. Single shot experiment of FTDC: (a) Without LDO. (b) With LDO.

sources including divider noise, DCO quantization noise and thermal noise of the TDC and counter. In particular, in order to meet  $-100$  dBc/Hz in-band phase noise at 60-GHz output with 100-MHz reference, the phase noise of reference clock should be better than  $-156$  dBc/Hz; however, the reference used to measure the PLL has phase noise of only  $-138$  dBc/Hz at 1 MHz, and contributes about  $-82$  dBc/Hz at 60-GHz output frequency.
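The phase noise figures in this discussion can be checked numerically. The sketch below assumes the widely used quantization-noise expression for a TDC in a digital PLL,  $\mathcal{L} = \frac{(2\pi)^2}{12} \left( \frac{\Delta t_{\text{res}}}{T_V} \right)^2 \frac{1}{f_R}$ , which is presumably the calculation behind the citation of [14], together with the usual  $20\log_{10}(f_{\text{out}}/f_R)$  scaling of reference phase noise to the output. The function names are hypothetical.

```python
from math import log10, pi


def tdc_inband_phase_noise_dbc(res_s: float, f_out_hz: float, f_ref_hz: float) -> float:
    """Quantization-limited in-band phase noise of a TDC, in dBc/Hz:
    L = (2*pi)^2 / 12 * (res / T_out)^2 / f_ref."""
    t_out = 1.0 / f_out_hz
    return 10 * log10((2 * pi) ** 2 / 12 * (res_s / t_out) ** 2 / f_ref_hz)


def reference_noise_at_output_dbc(ref_dbc: float, f_out_hz: float, f_ref_hz: float) -> float:
    """Reference phase noise referred to the output: add 20*log10(f_out / f_ref)."""
    return ref_dbc + 20 * log10(f_out_hz / f_ref_hz)


# 450-fs resolution, 66-GHz output, 100-MHz reference.
print(round(tdc_inband_phase_noise_dbc(450e-15, 66e9, 100e6), 1))  # about -105.4

# Reference noise of -138 dBc/Hz and the -156 dBc/Hz target, at 60-GHz output.
print(round(reference_noise_at_output_dbc(-138, 60e9, 100e6), 1))  # about -82.4
print(round(reference_noise_at_output_dbc(-156, 60e9, 100e6), 1))  # about -100.4
```

The roughly 55.6 dB multiplication factor from 100 MHz to 60 GHz is what makes the reference, rather than the TDC, the dominant in-band contributor here.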

## VIII. CONCLUSION

This paper has described a two-step TDC which measures the time difference between a reference signal and an input signal whose frequency can span 5–20 GHz. The resolution of the TDC can be programmed over the range 200 fs to 1.4 ps, and has 8-bit dynamic range. Together with a synchronous divide-by-4, the TDC underpins the fractional counter in a phase-domain digital PLL, and can in principle accommodate input frequencies spanning 20–80 GHz (but is limited by the operating frequency range of the divider). Sub-sampling in the CFI stage helps to maintain the target TDC resolution while significantly reducing power consumption. Digital calibration is extensively used to linearize the FTDC, to equalize delays in the interface stage between the coarse and fine stages, and to achieve robust, power-optimized operation of the frequency divider. Characterization of a 65-nm CMOS prototype demonstrates a TDC that has the highest operation frequency and finest time resolution reported to date at this input frequency and conversion rate, and achieves the best figure-of-merit among state-of-the-art CMOS TDCs.

## ACKNOWLEDGMENT

The authors would like to thank Dr. S. Saberi from iSono Health for her technical assistance.

## REFERENCES

- [1] K. Maatta and J. Kostamovaara, "A high-precision time-to-digital converter for pulsed time-of-flight laser radar applications," *IEEE Trans. Instrum. Meas.*, vol. 47, no. 2, pp. 521–536, Apr. 1998.
- [2] J.-C. Hsu and C. Su, "BIST for measuring clock jitter of charge-pump phase-locked loops," *IEEE Trans. Instrum. Meas.*, vol. 57, no. 2, pp. 276–285, Feb. 2008.
- [3] K. Nose, M. Kajita, and M. Mizuno, "A 1-ps resolution jitter-measurement macro using interpolated jitter oversampling," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2911–2920, Dec. 2006.
- [4] Z. Xu, M. Miyahara, K. Okada, and A. Matsuzawa, "A 3.6 GHz low-noise fractional-N digital PLL using SAR-ADC-based TDC," *IEEE J. Solid-State Circuits*, vol. 51, no. 10, pp. 2345–2356, Oct. 2016.
- [5] L. Vercesi, L. Fanori, F. D. Bernardinis, A. Liscidini, and R. Castello, "A dither-less all digital PLL for cellular transmitters," *IEEE J. Solid-State Circuits*, vol. 47, no. 8, pp. 1908–1920, Aug. 2012.
- [6] A. Elkholy, T. Anand, W. S. Choi, A. Elshazly, and P. K. Hanumolu, "A 3.7 mW low-noise wide-bandwidth 4.5 GHz digital fractional-N PLL using time amplifier-based TDC," *IEEE J. Solid-State Circuits*, vol. 50, no. 4, pp. 867–881, Apr. 2015.
- [7] J. Hong *et al.*, "A  $0.004 \text{ mm}^2 250 \mu\text{W } \Delta \Sigma$  TDC with time-difference accumulator and a  $0.012 \text{ mm}^2 2.5 \text{ mW}$  bang-bang digital PLL using PRNG for low-power SoC applications," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2012, pp. 240–242.
- [8] P.-Y. Wang, J.-H. C. Zhan, H.-H. Chang, and H.-M. S. Chang, "A digital intensive fractional-N PLL and all-digital self-calibration schemes," *IEEE J. Solid-State Circuits*, vol. 44, no. 8, pp. 2182–2192, Aug. 2009.
- [9] M.-H. Hsieh, L.-H. Chen, S.-L. Liu, and C. C.-P. Chen, "A  $6.7 \text{ MHz}$ -to- $1.24 \text{ GHz}$   $0.0318 \text{ mm}^2$  fast-locking all-digital DLL in 90 nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2012, vol. 45, no. 2, pp. 244–246.
- [10] F. Opteynde, "A 40 nm CMOS all-digital fractional-N synthesizer without requiring calibration," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, vol. 50, no. 12, pp. 346–347.
- [11] E. Temporiti, C. Weltin-Wu, D. Baldi, R. Tonietto, and F. Svelto, "A 3 GHz fractional all-digital PLL with a 1.8 MHz bandwidth implementing spur reduction techniques," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 824–834, Mar. 2009.
- [12] E. Temporiti, C. Weltin-Wu, D. Baldi, M. Cusmai, and F. Svelto, "A 3.5 GHz wideband ADPLL with fractional spur suppression through TDC dithering and feedforward compensation," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2723–2736, Dec. 2010.
- [13] C.-R. Ho and M. S.-W. Chen, "A digital PLL with feedforward multi-tone spur cancellation scheme achieving  $<-73$  dBc fractional spur and  $<-110$  dBc reference spur in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 3216–3230, Dec. 2016.
- [14] R. B. Staszewski and P. T. Balsara, *All-Digital Frequency Synthesizer in Deep-Submicron CMOS*. Hoboken, NJ, USA: Wiley, 2006.
- [15] W. Wu, R. B. Staszewski, and J. R. Long, "A 56.4-to-63.4 GHz multi-rate all-digital fractional-N PLL for FMCW radar applications in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 49, no. 5, pp. 1081–1096, May 2014.

- [16] H. Sakurai, Y. Kobayashi, T. Mitomo, O. Watanabe, and S. Otaka, "A 1.5 GHz-modulation-range 10 ms-modulation-period 180 kHz<sub>rms</sub>-frequency-error 26 MHz-reference mixed-mode FMCW synthesizer for mm-wave radar application," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2011, pp. 292–294.
- [17] G. Keskin, J. Proesel, J.-O. Plouchart, and L. Pileggi, "Exploiting combinatorial redundancy for offset calibration in flash ADCs," *IEEE J. Solid-State Circuits*, vol. 46, no. 8, pp. 1904–1918, Aug. 2011.
- [18] S. Alahdab, A. Mantyniemi, and J. Kostamovaara, "Review of a time-to-digital converter (TDC) based on cyclic time domain successive approximation interpolator method with sub-ps-level resolution," in *Proc. IEEE Nordic-Mediterranean Workshop Time Digit. Converters (NoMe TDC)*, Oct. 2013, pp. 42–46.
- [19] Y. Cao, P. Leroux, W. De Cock, and M. Steyaert, "A 1.7 mW 11b 1–1–1 MASH ΔΣ time-to-digital converter," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2011, vol. 44, no. 4, pp. 480–481.
- [20] A. Elshazly, S. Rao, B. Young, and P. K. Hanumolu, "A noise-shaping time-to-digital converter using switched-ring oscillators—analysis, design, and measurement techniques," *IEEE J. Solid-State Circuits*, vol. 49, no. 5, pp. 1184–1197, May 2014.
- [21] W.-W. Ji, P.-F. Liu, Y.-Y. Niu, W. Li, N. Li, and J.-Y. Ren, "A high-resolution, high-linearity, two-step time-to-digital converter for wideband counter-assisted ADPLL in 0.13 μm CMOS," in *Proc. IEEE 11th Int. Conf. Solid-State Integr. Circuit Technol. (ICSICT)*, Nov. 2012, pp. 4–6.
- [22] J.-S. Kim, Y.-H. Seo, Y. Suh, H.-J. Park, and J.-Y. Sim, "A 300-MS/s, 1.76-ps-resolution, 10-b asynchronous pipelined time-to-digital converter with on-chip digital background calibration in 0.13-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 516–526, Feb. 2013.
- [23] K. Kim, Y. Kim, W. Yu, and S. Cho, "A 7b, 3.75 ps resolution two-step time-to-digital converter in 65 nm CMOS using pulse-train time amplifier," in *Proc. Symp. VLSI Circuits (VLSIC)*, Jun. 2012, pp. 192–193.
- [24] K. Kim, W. Yu, and S. Cho, "A 9 bit, 1.12 ps resolution 2.5 b/stage pipelined time-to-digital converter in 65 nm CMOS using time-register," *IEEE J. Solid-State Circuits*, vol. 49, no. 4, pp. 1007–1016, Apr. 2014.
- [25] M. Kim *et al.*, "High-resolution and wide-dynamic range time-to-digital converter with a multi-phase cyclic Vernier delay line," in *Proc. Eur. Solid-State Circuits Conf.*, Sep. 2013, pp. 311–314.
- [26] M. Lee and A. A. Abidi, "A 9 b, 1.25 ps resolution coarse–fine time-to-digital converter in 90 nm CMOS that amplifies a time residue," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 769–777, Apr. 2008.
- [27] A. Liscidini, L. Vercesi, and R. Castello, "Time to digital converter based on a 2-dimensions Vernier architecture," in *Proc. Custom Integr. Circuits Conf.*, vol. 9, Sep. 2009, pp. 45–48.
- [28] P. Lu, Y. Wu, and P. Andreani, "A 2.2-ps two-dimensional gated-Vernier time-to-digital converter with digital calibration," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 11, pp. 1019–1023, Nov. 2016.
- [29] Y.-H. Seo, J.-S. Kim, H.-J. Park, and J.-Y. Sim, "A 1.25 ps resolution 8 b cyclic TDC in 0.13 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 3, pp. 736–743, Mar. 2012.
- [30] Y.-H. Seo, J.-S. Kim, H.-J. Park, and J.-Y. Sim, "A 0.63 ps resolution, 11 b pipeline TDC in 0.13 μm CMOS," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2011, pp. 152–153.
- [31] R. B. Staszewski, S. Vemulapalli, P. Vallur, J. Wallberg, and P. T. Balsara, "1.3V 20ps time-to-digital converter for frequency synthesis in 90-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 53, no. 3, pp. 220–224, Mar. 2006.
- [32] L. Vercesi, A. Liscidini, and R. Castello, "Two-dimensions Vernier time-to-digital converter," *IEEE J. Solid-State Circuits*, vol. 45, no. 8, pp. 1504–1512, Aug. 2010.
- [33] Y. Wu, P. Lu, and R. B. Staszewski, "A 103 fs<sub>rms</sub> 1.32 mW 50 MS/s 1.25 MHz bandwidth two-step flash-ΔΣ time-to-digital converter for ADPLL," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, May 2015, pp. 95–98.
- [34] Z. Xu, S. Lee, M. Miyahara, and A. Matsuzawa, "A 0.84 ps-LSB 2.47 mW time-to-digital converter using charge pump and SAR-ADC," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, vol. 1, Sep. 2013, pp. 3–6.
- [35] J. Yu and F. F. Dai, "A 3-dimensional Vernier ring time-to-digital converter in 0.13 μm CMOS," in *Proc. Custom Integr. Circuits Conf.*, vol. 1, Sep. 2010, pp. 3–6.
- [36] M. Z. Straayer and M. H. Perrott, "A multi-path gated ring oscillator TDC with first-order noise shaping," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1089–1098, Apr. 2009.
- [37] A. Ravi *et al.*, "A 9.2–12 GHz, 90 nm digital fractional-N synthesizer with stochastic TDC calibration and -35/-41dBc integrated phase noise in the 5/2.5 GHz bands," in *Proc. Symp. VLSI Circuits*, Jun. 2010, pp. 143–144.
- [38] A. Samarah and A. C. Carusone, "A digital phase-locked loop with calibrated coarse and stochastic fine TDC," *IEEE J. Solid-State Circuits*, vol. 48, no. 8, pp. 1829–1841, Aug. 2013.
- [39] M. Zanuso, P. Madoglio, S. Levantino, C. Samori, and A. L. Lacaita, "Time-to-digital converter for frequency synthesis based on a digital bang-bang DLL," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 3, pp. 548–555, Mar. 2010.
- [40] M. Zanuso, S. Levantino, A. Puggelli, C. Samori, and A. L. Lacaita, "Time-to-digital converter with 3-ps resolution and digital linearization algorithm," in *Proc. ESSCIRC*, Sep. 2010, pp. 262–265.
- [41] S.-K. Lee, Y.-H. Seo, H.-J. Park, and J.-Y. Sim, "A 1 GHz ADPLL with a 1.25 ps minimum-resolution sub-exponent TDC in 0.18 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2874–2881, Dec. 2010.
- [42] A. I. Hussein and J. Paramesh, "Design and self-calibration techniques for inductor-less millimeter-wave frequency dividers," *IEEE J. Solid-State Circuits*, vol. 52, no. 6, pp. 1521–1541, Jun. 2017.
- [43] S. Saberi, "Wideband millimeter-wave frequency generation in CMOS," Ph.D. dissertation, ECE, CMU, Pittsburgh, PA, USA, 2014.
- [44] F.-W. Kuo *et al.*, "A 12 mW all-digital PLL based on class-F DCO for 4G phones in 28 nm CMOS," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2014, pp. 1–2.
- [45] J.-W. Lai *et al.*, "A 0.27 mm<sup>2</sup> 13.5 dBm 2.4 GHz all-digital polar transmitter using 34%-efficiency class-D DPA in 40 nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2013, pp. 342–343.
- [46] G. Keskin, "Self-healing circuits using statistical element selection," Ph.D. dissertation, ECE, CMU, Pittsburgh, PA, USA, 2010.
- [47] G. Keskin, J. Proesel, and L. Pileggi, "Statistical modeling and post manufacturing configuration for scaled analog CMOS," in *Proc. Custom Integr. Circuits Conf.*, Sep. 2010, pp. 26–29.
- [48] M. Maymandi-Nejad and M. Sachdev, "A digitally programmable delay element: Design and analysis," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 11, no. 5, pp. 871–878, Oct. 2003.
- [49] M. Maymandi-Nejad and M. Sachdev, "A monotonic digitally controlled delay element," *IEEE J. Solid-State Circuits*, vol. 40, no. 11, pp. 2212–2219, Nov. 2005.
- [50] A. Amer and E. Sánchez-Sinencio, "A 140 mA 90 nm CMOS low dropout regulator with -56 dB power supply rejection at 10 MHz," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2010, pp. 1–4.
- [51] A. Hussein, S. Vasadi, M. Soliman, and J. Paramesh, "A 50-to-66 GHz 65 nm CMOS all-digital fractional-N PLL with 220 fs<sub>rms</sub> jitter," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 326–327.
- [52] A. Hussein, S. Vasadi, and J. Paramesh, "A 50–66 GHz phase-domain digital frequency synthesizer with low phase noise and low fractional spurs," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, Dec. 2017.



**Ahmed I. Hussein** (S'15) received the B.Sc. (Hons.) and M.Sc. degrees in electrical engineering from Cairo University, Cairo, Egypt, in 2011 and 2014, respectively, and the Ph.D. degree in electrical engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 2017.

From 2011 to 2014, he was an RF/Analog Design Engineer at Silicon-Vision, Cairo, where he was involved in designing high-performance clocking circuits, high-speed serial links, and power management systems. He was also a Teaching and Research Assistant at Cairo University from 2011 to 2014. In 2016, he joined the RFIC team at Apple, Cupertino, CA, USA, as an RF Design Intern, where he was involved in the design of high-performance frequency multipliers and digital PLLs. He is currently an RF Design Engineer with Apple, where he is developing the circuits and the architectures of high-performance analog/digital frequency synthesizers for the next generation of communication standards. His current research interests include millimeter-wave frequency synthesizers, high-speed serial links, digital PLLs, wide tuning range millimeter-wave oscillators, and high-resolution time-to-digital converters.

Dr. Hussein received the Carnegie Institute of Technology Dean's Tuition Fellowship from Carnegie Mellon University (2014–2015), the Axel Berny Presidential Graduate Fellowship Award (2016–2017), and the A. G. Milnes Award (Best Ph.D. Degree Award) from Carnegie Mellon University in 2017.



**Sriharsha Vasadi** received the B.Tech. degree in electrical engineering from IIT Madras, Chennai, India, in 2009, and the M.S. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 2015.

In 2010, he joined ST Ericsson, where he was involved in power management for audio applications. In 2011, he joined Aura Semiconductor, Bengaluru, India, where he was involved in RF IC transmitter and synthesizer designs. In 2015, he joined Silicon Labs, where he is involved in low-power RF IC design for IoT products. He holds two granted patents, one pending patent, and one ISSCC publication.



**Jeyanandh Paramesh** (SM'10) received the B.Tech. degree in electrical engineering from IIT Madras, Chennai, India, the M.S. degree in electrical engineering from Oregon State University, Corvallis, OR, USA, and the Ph.D. degree in electrical engineering from the University of Washington, Seattle, WA, USA.

He held product development and research positions with AKM Semiconductor (Analog Devices), Motorola, and Intel. He is currently an Associate Professor of electrical and computer engineering with Carnegie Mellon University, Pittsburgh, PA, USA. His current research interests include the design of RF and mixed-signal integrated circuits and systems for a wide variety of applications.