



Figure 21.1.1: Evolution of sensory signal processing systems: (top) Traditional Systems, (middle) Recent Analog-to-Information based innovations and (bottom) Proposed mixed signal interface for compressive analog-to-non-linear digital transformation.



Figure 21.1.3: Architecture of the programmable non-linear mixed-signal interface highlighting iterative computation of non-linearity in a mixed signal loop, the related equations and a 10b binary DAC layout.



Figure 21.1.5: Measured quantization error for the transfer-characteristics depicted in Fig. 21.1.4. Quantization error corresponding to only one transfer-curve from each non-linearity shape (except envelope in top-right) in Fig. 21.1.4 is shown here to ensure clarity.



Figure 21.1.2: Conceptual illustration of sensory signal transformation through a mixed-signal non-linearity to emphasize information content and enable data compression. Impact of parameters  $\varepsilon$ ,  $\alpha$ , and  $n_{it}$  on the shape of non-linearity is highlighted on the non-linear transfer curve.



Figure 21.1.4: Measured transfer-characteristics depict logarithmic (top-left), exponential (top-right), tangent-hyperbolic (bottom-left) and inverse-tangent-hyperbolic (bottom-right) shapes. All parameters  $\varepsilon$ ,  $\alpha$  and  $n_{it}$  are independently programmable. The graphs also depict the total measured power consumption. Input signal range ( $V_{in}$ ) is normalized from 0 to 1.



Figure 21.1.6: Two application measurements to show the versatility of the proposed interface. (top) Beat detection from muscle-noise-corrupted ECG shows a 2x reduction in rms error while reducing the digital back-end data by 50% compared to a traditional interface. (bottom) Distortion correction of open-loop amplification shows a 20dB improvement in SINAD performance.

|           | Tech.              | Non-Linearity                         | Transfer Char.      | Max. Data Compression | Output Width / Max. Resolution | Power @ Fs<br>Static / Dynamic              |
|-----------|--------------------|---------------------------------------|---------------------|-----------------------|--------------------------------|---------------------------------------------|
| This Work | 90 nm CMOS         | Digital - Programmable Standard Cells | Widely Configurable | 50%                   | Prog. 5 to 10b / 9.5b          | 6.3 $\mu$W @ 33 kS/s<br>Dynamic             |
| [3]       | 0.18 $\mu$m CMOS   | Analog - Non-Binary Cap. Array        | Fixed - Exponential | 25%                   | Fixed 8b / 10.5b               | 40 $\mu$W @ 25 kS/s<br>Partially Dynamic    |
| [4]       | 0.18 $\mu$m CMOS   | Analog - Non-Linearly Spaced Vref.    | Fixed - Logarithmic | 30%                   | Fixed 8b / 13b                 | 2500 $\mu$W @ 22000 kS/s<br>Static          |
| [5]       | 1.5 $\mu$m BiCMOS  | Analog - Diode Non-Linearity          | Fixed - Logarithmic | NA                    | Fixed 8b / 8b                  | 3 $\mu$W @ 0.3 kS/s<br>Static               |



Figure 21.1.7: (top) Comparison to state-of-the-art designs and (bottom) die micrograph highlighting different sections on chip.

## 21.2 A 1μW Voice Activity Detector Using Analog Feature Extraction and Digital Deep Neural Network

Minhao Yang, Chung-Heng Yeh, Yiyin Zhou, Joao P. Cerqueira, Aurel A. Lazar, Mingoo Seok

Columbia University, New York, NY

Voice user interfaces (UIs) are highly compelling for wearable and mobile devices. They have the advantage of using compact and ultra-low-power (ULP) input devices (e.g., passive microphones). Together with ULP signal acquisition and processing, voice UIs can give energy-harvesting acoustic sensor nodes and battery-operated devices the sought-after capability of natural interaction with humans. Voice activity detection (VAD), which separates speech from background noise, is a key building block in such voice UIs; e.g., it can enable power gating of higher-level speech tasks such as speaker identification and speech recognition [1]. As an *always-on* block, the VAD must minimize power consumption while maintaining high classification accuracy. Motivated by the high power efficiency of analog signal processing, a VAD system using analog feature extraction (AFE) and a mixed-signal decision tree (DT) classifier was demonstrated in [2]. While it achieved a record low power of 6μW, the system requires machine-learning-based calibration of the DT thresholds on a chip-to-chip basis due to ill-controlled AFE variation. Moreover, the 7-node DT may deliver inferior classification accuracy, especially under low input SNR and difficult noise scenarios, compared to more advanced classifiers such as deep neural networks (DNNs) [1,3]. Although the heavy computational load of conventional floating-point DNNs prevents their adoption in embedded systems, the binarized neural networks (BNNs) with binary weights and activations proposed in [4] may pave the way to ULP implementations. In this paper, we present a 1μW VAD system utilizing AFE and a digital BNN classifier with an event-encoding A/D interface. The whole AFE is 9.4x more power-efficient than the prior art [5] and 7.9x more than the state-of-the-art digital filter bank [6], and the BNN consumes only 0.63μW. To avoid costly chip-wise training, a variation-aware Python model of the AFE was created and the generated features were used for offline BNN training. Measurements show 84.4%/85.4% mean speech/non-speech hit rate with 1.88%/4.65% 1-$\sigma$ standard deviation among 10 dies using the same weights for 10dB SNR speech with restaurant noise.

Figure 21.2.1 shows the system architecture. The audio signal from a microphone is amplified by a low noise amplifier (LNA), and then sent to 16 parallel channels. Each channel is composed of a bandpass filter (BPF), a full-wave rectifier (FWR), and an integrate-and-fire (IAF) event encoder. The central frequencies of the BPFs are geometrically scaled from about 100Hz to 5kHz. The IAF in each channel produces asynchronous events whose rate is roughly proportional to the signal energy of the respective band. The BNN input layer has 48 neurons, derived from the 16-channel AFE output. The three hidden layers have 60, 24 and 11 neurons, respectively. The output layer consists of 2 neurons; a frame is classified as voice when one neuron's activation is larger than the other's, and as noise otherwise.
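As a rough illustration of the numbers in this paragraph, the sketch below generates 16 geometrically spaced centre frequencies and counts the 1-bit weights implied by the stated layer widths. It assumes the endpoints are exactly 100Hz and 5kHz (the text only says "about") and that the network is fully connected with no bias terms, which the text does not state; all variable names are illustrative.

```python
# Geometrically spaced BPF centre frequencies (illustrative sketch).
# Assumption: endpoints are exactly 100 Hz and 5 kHz.
N_CHANNELS = 16
F_LOW_HZ = 100.0
F_HIGH_HZ = 5000.0

# Constant ratio between neighbouring channels of a geometric series.
ratio = (F_HIGH_HZ / F_LOW_HZ) ** (1.0 / (N_CHANNELS - 1))
center_freqs_hz = [F_LOW_HZ * ratio ** k for k in range(N_CHANNELS)]

# Layer widths from the text: 48 inputs, hidden layers of 60/24/11, 2 outputs.
# Assumption: fully connected layers, no bias terms.
layer_sizes = [48, 60, 24, 11, 2]
n_weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

print([round(f) for f in center_freqs_hz])
print("1-bit weights:", n_weights)
```

Under these assumptions each channel sits about 1.3x above its lower neighbour, and the whole classifier needs only a few thousand weight bits.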

Figure 21.2.2 shows the capacitive LNA. The gain is programmable from 24dB to 42dB in 6dB steps via the input capacitor  $C_{in}$ . A current-reuse inverter-based input stage is used in the main amplifier to enhance noise efficiency. Existing designs employ two tail transistors, one supplying the bias current and the other providing CMFB. However, stacking four transistors between supply and ground makes it challenging, if not impossible, to keep all transistors in saturation over PVT under a low supply, which in turn may severely degrade the LNA closed-loop gain. This design eliminates one tail transistor, and sets the input DC voltage and bias current via a scaled, diode-connected replica of the input inverter.  $C_c$  and  $R_c$  form the pseudo-cascode compensation for stability. With the load of the 16 BPFs' input capacitance, sufficient phase margin (PM) requires a large bias current in the second stage of the main amplifier, which conflicts with the microwatt system power budget. To resolve this dilemma, positive feedback via  $C_f$  and  $R_f$  is used to boost the 3dB bandwidth, so that  $C_c$  can be increased for PM while keeping the bias current low. The input DC of the main amplifier is established by the DC-servo-loop (DSL) amplifier. The DSL amplifier is biased with pA-level current to give the LNA a high-pass corner frequency below 100Hz.

Figure 21.2.3 (upper) shows the super-source-follower-based 2<sup>nd</sup>-order BPF. Its central frequency  $f_0$ , quality factor  $Q$ , and peak gain  $A_0$  are derived as:

$$f_0 = \frac{1}{2\pi} \sqrt{\frac{g_{m1}g_{m2}}{C_1C_2}} \quad Q = \sqrt{\frac{g_{m2}C_2}{g_{m1}C_1}} \quad A_0 = \frac{C_2}{C_1}$$

where  $g_{m1}$  and  $g_{m2}$  are the transconductance of pFETs and nFETs, respectively. To mitigate the signal swing constraints aggravated by PVT variation, a diode-connected pFET replica with its source fixed at  $V_{refup}$  is used to provide the DC gate voltage of the input pFETs. Figure 21.2.3 (lower) shows the FWR and IAF. Buffered by source followers, the BPF output voltage is converted to current by a differential OTA, and then rectified by the cross-coupled precision current rectifier. The cross-coupled topology halves the output swing of the OTA compared to the single-ended version, essential for low supply voltage operation. To alleviate the dead-zone problem of the rectifier exacerbated by PVT variation, the gate voltages of the transistors are set by a single-ended replica biased at  $I_{leak}$ . The OTA is in DC closed-loop to avoid quiescent output offset current, and its output common-mode DC is set to  $V_{mid}$ , the same as the source voltages of the transistors in the rectifier replica. The rectified current is integrated on  $C_{int}$ , and whenever the integrated voltage crosses above  $V_{refdn}$ , an event is generated at the comparator output, and the integration starts over from ground potential.
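The integrate-and-fire behaviour described above can be captured by a first-order model: with a constant rectified current $I$, the capacitor charges from ground to $V_{refdn}$ in $t = C_{int}V_{refdn}/I$, so the event rate is $I/(C_{int}V_{refdn})$. The sketch below is an idealized model only (no comparator delay, reset time or leakage), and the component values are hypothetical placeholders rather than values from the chip.

```python
def iaf_event_rate_hz(i_rect_a, c_int_f, v_refdn_v):
    """Event rate of an ideal integrate-and-fire stage driven by a constant current.

    Assumes integration restarts instantly from ground after each event.
    """
    if i_rect_a <= 0:
        return 0.0  # no current, no events
    # Time per event is C*V/I, so the rate is I/(C*V).
    return i_rect_a / (c_int_f * v_refdn_v)


# Hypothetical values for illustration only (not taken from the paper).
C_INT_F = 1e-12      # 1 pF integration capacitor
V_REFDN_V = 0.1      # 100 mV threshold
FRAME_S = 0.025      # 25 ms counting frame, as in the text

rate_hz = iaf_event_rate_hz(100e-12, C_INT_F, V_REFDN_V)
events_per_frame = rate_hz * FRAME_S
print(f"{rate_hz:.1f} events/s, about {events_per_frame:.1f} events per frame")
```

In this model the event rate scales linearly with the rectified current, which is how the event count in a frame comes to track the signal level in each band.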

Figure 21.2.4 shows the BNN implementation. The parallel event streams from the AFE are collected by ripple counters. Events are counted over each 25ms frame with a 10ms frame shift, and each frame is stored in the DMEM, replacing the values in one 16×9 block. Three DMEM blocks, holding the current frame  $n$  and the previous frames ( $n-3$ ) and ( $n-6$ ), compose the 48 input neurons used to classify frame ( $n-3$ ). This technique of incorporating neighboring contextual information improves classification performance [3]. To compute the pre-activations of hidden neurons, the input operand of each accumulation is either the data directly from the DMEM or from the 84-bit register file (RF) that temporarily stores the computed activations of the previous layer, or the negated value of either, depending on the 1-bit weight from the WMEM. The hard sigmoid activation function HS(·) [4] is a simple negation of the sign bit of the pre-activations. The classification output of each frame is obtained by comparing the activations of the 2 output neurons without applying HS(·).
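The datapath described above can be mimicked in a few lines of software. The sketch below is not the chip's implementation: it assumes a weight bit of 1 means "add" and 0 means "subtract", represents binary activations as +1/-1, and concatenates the three context frames in the order n, n-3, n-6. Function names are illustrative.

```python
def stack_context(frames, n):
    """Concatenate the 16-channel counts of frames n, n-3 and n-6 into 48 inputs.

    The concatenation order is an assumption; n must be at least 6.
    """
    return frames[n] + frames[n - 3] + frames[n - 6]


def binary_layer_preact(inputs, weight_bits):
    """Accumulate each input or its negation, selected by a 1-bit weight."""
    preacts = []
    for row in weight_bits:          # one row of weight bits per output neuron
        acc = 0
        for x, w in zip(inputs, row):
            acc += x if w else -x    # assumed mapping: bit 1 adds, bit 0 subtracts
        preacts.append(acc)
    return preacts


def sign_activation(preacts):
    """Negated sign bit as a binary activation: +1 if non-negative, else -1."""
    return [1 if p >= 0 else -1 for p in preacts]


# Toy example with made-up numbers.
frames = [[i] * 16 for i in range(7)]     # 7 frames of 16 event counts each
inputs = stack_context(frames, 6)         # 48 input values
toy_preacts = binary_layer_preact([3, 5], [[1, 0], [1, 1]])   # [-2, 8]
toy_acts = sign_activation(toy_preacts)                       # [-1, 1]
print(len(inputs), toy_preacts, toy_acts)
```

Because every weight is a single bit, each multiply reduces to a conditional negation, so a layer needs only an adder and an accumulator.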

The chip was fabricated in 0.18μm CMOS with a core area of 1.66×1.52mm<sup>2</sup>. Figure 21.2.7 shows the chip micrograph. Figure 21.2.5 (upper left) shows the LNA transfer function, together with the input-referred noise spectral density at gains of 42dB and 24dB. At 0.6V, the calculated noise efficiency factor (NEF) and power efficiency factor (PEF) are 1.73 and 1.80 at 42dB gain, and 4.39 and 11.6 at 24dB gain. Figure 21.2.5 (upper right) shows the number of events counted every 25ms (one frame) at the IAF outputs of all 16 channels as a function of input frequency. For classification evaluation, 300 randomly selected clean utterances from the AURORA4 dataset are concatenated [4] into a 37-minute recording and mixed with the DEMAND noise dataset for training; another 300 utterances, with the non-speech periods balanced and lasting 1 hour in total, are used for testing. Figure 21.2.5 (lower left and lower right) shows the speech/non-speech hit rates for 10dB SNR speech with restaurant noise and 5dB SNR speech with metro noise, respectively, measured over 10 dies using the same weights without any AFE calibration. Figure 21.2.6 compares the AFE and the VAD system with prior works.
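The reported NEF and PEF values can be cross-checked against each other. The sketch below assumes the commonly used definition PEF = NEF² · V<sub>DD</sub>, which the text does not spell out, and uses the 0.6V supply quoted above.

```python
def power_efficiency_factor(nef, vdd_v):
    """PEF under the assumed definition PEF = NEF^2 * VDD."""
    return nef ** 2 * vdd_v


VDD_V = 0.6
# (NEF, PEF) pairs as reported in the text.
reported = [(1.73, 1.80), (4.39, 11.6)]

computed_pefs = [power_efficiency_factor(nef, VDD_V) for nef, _ in reported]
for (nef, pef_reported), pef_calc in zip(reported, computed_pefs):
    print(f"NEF={nef}: computed PEF={pef_calc:.3f}, reported PEF={pef_reported}")
```

Both computed values round to the reported ones, so the two NEF/PEF pairs are consistent with this definition.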

### Acknowledgements:

This work was supported by Swiss National Science Foundation (SNF) Early Postdoc Mobility Fellowship and Columbia University Research Initiatives in Science and Engineering (RISE). The authors thank N. Mesgarani, Y. Tsividis, M. Verhelst, X.-L. Zhang for discussions and help.

### References:

- [1] M. Price, et al., "A Scalable Speech Recognizer with Deep-Neural-Network Acoustic Models and Voice-Activated Power Gating," *ISSCC Dig. Tech. Papers*, pp. 244-245, Feb. 2017.
- [2] K. Badami, et al., "Context-Aware Hierarchical Information-Sensing in a 6μW 90nm CMOS Voice Activity Detector," *ISSCC Dig. Tech. Papers*, pp. 430-431, Feb. 2015.
- [3] X. -L. Zhang, et al., "Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection," *IEEE/ACM Trans. Audio, Speech, Language Process.*, vol. 24, no. 2, pp. 252-264, 2016.
- [4] I. Hubara, et al., "Binarized Neural Networks," *Adv. Neural Inf. Process. Syst.*, pp. 1-9, 2016.
- [5] M. Yang, et al., "A 0.5V 55μW 64×2-Channel Binaural Silicon Cochlea for Event-Driven Stereo-Audio Sensing," *ISSCC Dig. Tech. Papers*, pp. 388-389, Feb. 2016.
- [6] H. -S. Wu, et al., "A 13.8μW Binaural Dual-Microphone Digital ANSI S1.11 Filter Bank for Hearing Aids with Zero-Short-Circuit-Current Logic in 65nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 348-349, Feb. 2017.



Figure 21.2.1: System architecture of the VAD.



Figure 21.2.2: Circuit schematic of the LNA.



Figure 21.2.3: Circuit schematics of the BPF (top), and the FWR and the IAF (bottom).



Figure 21.2.5: Chip Measurements.



Figure 21.2.4: Architecture of the digital BNN.

| Feature Extractor              | This Work                                                                     | Wu, ISSCC 2017                        | Yang, ISSCC 2016                         | Badami, ISSCC 2015                                                      |
|--------------------------------|-------------------------------------------------------------------------------|---------------------------------------|------------------------------------------|-------------------------------------------------------------------------|
| Technology (nm)                | 180                                                                           | 65                                    | 180                                      | 90                                                                      |
| Feature Type                   | Analog to events                                                              | Digital                               | Analog to events                         | Analog                                                                  |
| Channel Number                 | 16                                                                            | 18x4                                  | 64x2                                     | 16                                                                      |
| Frequency Range (Hz)           | 100 - 5k                                                                      | 160 - 8k                              | 8 - 20k                                  | 75 - 5k                                                                 |
| Power (μW)                     | 0.38                                                                          | 13.8                                  | 55                                       | 6                                                                       |
| Power/Channel (nW)             | 24                                                                            | 190                                   | 430                                      | 380                                                                     |
| Normalized Power <sup>a</sup> (nW) | 85                                                                        | N/A                                   | 800                                      | 1480                                                                    |
| Area/Channel (mm²)             | 0.1                                                                           | 0.00934                               | 0.26                                     | 0.13                                                                    |
| Dynamic Range (dB)             | ~40 @IAF <sup>b</sup>                                                         | N/A                                   | 55 @BPF<br>Q=1.3, THD=1%                 | 45 @LNA, THD=5%                                                         |
| Building Blocks                | LNA, BPF, FWR, IAF                                                            | BPF                                   | BPF, ADM                                 | LNA, BPF, FWR, LPF                                                      |
| <b>Voice Activity Detector</b> | <b>This Work</b>                                                              | <b>Price, ISSCC 2017</b>              | <b>Esser, PNAS 2016</b>                  | <b>Badami, ISSCC 2015</b>                                               |
| Technology (nm)                | 180                                                                           | 65                                    | 28                                       | 90                                                                      |
| System Input                   | Passive mic                                                                   | Digital sound                         | Digital feature                          | Passive mic                                                             |
| Feature Type                   | Digital                                                                       | Digital                               | Off-chip software                        | Digital                                                                 |
| Classifier                     | Digital                                                                       | Digital                               | Digital                                  | Mixed-signal                                                            |
|                                | Binarized deep neural network                                                 | Fixed-point deep neural network       | Spiking neural network                   | Decision tree                                                           |
| Power (μW)                     | 1.0 <sup>c</sup>                                                              | 22.3                                  | 26100                                    | 6                                                                       |
| Classification Rate (/s)       | 100                                                                           | 100                                   | 1539                                     | 32600                                                                   |
| Classification Dataset         | AURORA4 mixed w/ DEMAND                                                       | AURORA2                               | TIMIT mixed w/ NOISEX                    | NOISEUS                                                                 |
| Classification Accuracy        | $\mu_s=84.4\%$ , $\mu_{ns}=85.4\%$ , $\sigma_s=1.88\%$ , $\sigma_{ns}=4.65\%$ | 10% EER @7dB SNR, unspecified context | 95.42% accuracy @unspecified SNR/context | Speech/non-speech hit rate 89%/85% @12dB SNR, babble noise <sup>d</sup> |
|                                | $\mu_s=82.9\%$ , $\mu_{ns}=84.5\%$ , $\sigma_s=1.38\%$ , $\sigma_{ns}=1.81\%$ |                                       |                                          | 97% accuracy @unspecified SNR/context                                   |

a. calculated according to the equation in [5]; b. the ratio of max event rate at input  $V_{pk}=1mV$  and LNA gain of 30dB to min event rate at  $V_{pk}=0mV$ ; c. measured at analog  $V_{dd}=0.6V$  and digital  $V_{dd}=0.55V$ ; d. averaged over 10 dies.

Figure 21.2.6: Comparison table.



Figure 21.2.7: Chip micrograph.

## 21.3 32GHz Resonant-Fin Transistors in 14nm FinFET Technology

Bichoy Bahr<sup>1</sup>, Yanbo He<sup>2</sup>, Zoran Krivokapic<sup>3</sup>, Srinivasa Banna<sup>3</sup>, Dana Weinstein<sup>2</sup>

<sup>1</sup>Massachusetts Institute of Technology,  
now at Kilby Labs - Texas Instruments, Dallas, TX  
<sup>2</sup>Purdue University, West Lafayette, IN; <sup>3</sup>GLOBALFOUNDRIES, Santa Clara, CA

Monolithic integration of microelectromechanical resonators in IC technology has been explored extensively over the past few decades in the effort to achieve on-chip clocks, RF filters, and physical, chemical, and biological sensors. Various approaches for resonator integration, actuation, and sensing have been proposed and demonstrated, targeting high  $Q$  and high frequency performance [1]. These include the “MEMS-last” approach with low temperature materials deposited on complete CMOS chips [2,3], and “MEMS-in-the-middle” in which resonators are defined with post-processing of the front end and back end materials of the IC stack [4]. These methods require additional processing and packaging steps that make frequency scaling challenging and may limit resonator  $Q$ . Meanwhile, in the case of monolithically integrated MEMS resonators, transistor sensing can be used to enhance transduction efficiency and reduce parasitics, which can prohibit 2-port detection for high frequency resonance [5]. In this work, we introduce the Resonant Fin Transistor (RFT) fabricated in GLOBALFOUNDRIES 14nm FinFET technology, leveraging the vertical 3D geometry of FinFETs (Fig. 21.3.1) to efficiently confine, drive, and sense acoustic vibrations in the solid (unreleased) CMOS stack with no post-processing or custom packaging. We demonstrate 32GHz resonators with  $Q\sim49,000$ .

Resonance cavities with omnidirectional energy confinement are essential for high- $Q$  resonators. This represents a challenge in the design of unreleased resonators in commercial CMOS processes. As demonstrated in [6,7], confinement from the Back-End-of-Line (BEOL) side of the CMOS die can be achieved by means of phononic crystals (PnCs) patterned into the BEOL layers. However, for frequencies well beyond 10GHz, the required dimensions of such PnCs become prohibitively small. For the 32GHz resonators in this work, continuous sheets of copper in all BEOL layers were used to form an acoustic reflector. Although complete bandgaps are not feasible with such structures, a partial bandgap from 29GHz to 36GHz was found. This provides acoustic confinement from the BEOL stack. Confinement from the silicon substrate is achieved through index guiding: the periodic fin structure forms a slow wave structure sitting on top of the bulk silicon (faster sound velocity), which allows for total internal reflection for wave-vectors around  $k_x=\pi/a$ , for fin pitch  $a$ . This vertical confinement creates a horizontal periodic waveguide with dispersion relation shown in Fig. 21.3.2. Guided modes can be observed at  $k_x=\pi/a$  with frequencies of 29.0 and 33.7GHz. These modes are below the sound cone ( $\omega = ck_x$ , where  $c$  is the sound velocity in bulk silicon), which prohibits their propagation in the bulk, allowing only evanescent modes [6]. Only the 33.7GHz mode can be actuated with the application of a uniform stress across the FinFET gate as the 29.0GHz mode has asymmetric stress across the waveguide period. To achieve horizontal confinement, the periodic fin structure is abruptly terminated at both ends of the waveguide, which results in large specular reflection. Significant scattering is also expected. However, this resonator consists of 154 fins which makes energy loss due to scattering insignificant compared to the energy stored, resulting in high- $Q$  resonance.
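The sound-cone condition can be checked numerically. At $k_x=\pi/a$ the cone frequency is $f = c/(2a)$, so a guided mode must lie below that value. The sketch below uses the 44nm fin pitch given later in the text; the bulk velocity is a placeholder assumption, since the relevant velocity in silicon depends on wave polarization and crystal direction and is not stated here.

```python
# Sound-cone frequency at the Brillouin-zone edge k_x = pi/a (illustrative sketch).
FIN_PITCH_M = 44e-9          # fin pitch a, from the text
C_BULK_M_PER_S = 5000.0      # ASSUMED bulk-silicon sound velocity (placeholder)

# omega = c * k_x with k_x = pi/a  ->  f = omega / (2*pi) = c / (2*a)
f_cone_ghz = C_BULK_M_PER_S / (2.0 * FIN_PITCH_M) / 1e9

guided_modes_ghz = [29.0, 33.7]   # guided-mode frequencies from the text
below_cone = [f < f_cone_ghz for f in guided_modes_ghz]

print(f"sound cone at k_x = pi/a: {f_cone_ghz:.1f} GHz")
print("modes below cone:", below_cone)
```

With this placeholder velocity both modes sit well below the cone, which is consistent with the evanescent decay into the bulk described above.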

The RFT is driven by electrostatic transduction, leveraging the FinFET gate capacitance [5]. Changing voltage on the gate capacitance causes a change in the stored energy, which generates a stress in the RFT resonant cavity. Active FET sensing is used to detect the mechanical vibration as a modulation in the drain current of a FinFET embedded in the RFT cavity and biased in the linear regime. A single and continuous metal gate is shared among all the RFT fins, including both the driving and sensing fins. This ensures structural periodicity, with minimal perturbations (except for manufacturing variations), which results in reduced scattering and energy leakage. The RF driving signal is then applied to the corresponding source/drain epitaxial (S/D epi) contacts of the driving fins, while the RF output current is obtained from the drains of the sensing FinFETs. Also, for waveguiding, it is essential for the driving stress to have large spatial components near  $k_x=\pi/a$ , which requires the actuation of alternate fins with out-of-phase RF signals [6]. However, owing to the small fin dimensions, most FinFET technologies do not support contacting individual fins' S/D regions. To overcome this limitation, the S/D epi of 3 fins are connected as a single driving electrode, while 4 subsequent fins are skipped (DRC prohibits skipping 2 fins). The RFT differential unit cell is built out of 14 fins, such that three are excited with the RF driving voltage  $+0.5V_{drive}$ , four are not contacted, three are driven with  $-0.5V_{drive}$ , and another four skipped. This distribution creates sufficiently large actuation force components at  $k_x=\pi/a$ . Figure 21.3.1 shows the RFT differential unit cell with the epi regions for the contacted fins and the applied driving voltage.
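To see why the 3-skip-4 pattern still excites the waveguide, one can take the spatial Fourier component of the per-fin drive at $k_x=\pi/a$. The sketch below is a simplified illustration that treats each fin as a point source with amplitude equal to its drive voltage in units of $V_{drive}$; the actual stress distribution would need the FEM model.

```python
import cmath
import math


def spatial_component(pattern, k_times_a):
    """Discrete spatial Fourier component of a per-fin drive pattern.

    pattern[n] is the drive on fin n; k_times_a is the wave number times fin pitch.
    """
    return sum(p * cmath.exp(-1j * k_times_a * n) for n, p in enumerate(pattern))


# 14-fin differential unit cell from the text, in units of V_drive:
# three fins at +0.5, four skipped, three at -0.5, four skipped.
unit_cell = [+0.5] * 3 + [0.0] * 4 + [-0.5] * 3 + [0.0] * 4

amp_zone_edge = abs(spatial_component(unit_cell, math.pi))  # k_x = pi/a
amp_uniform = abs(spatial_component(unit_cell, 0.0))        # k_x = 0

print(f"|F(pi/a)| = {amp_zone_edge:.3f}, |F(0)| = {amp_uniform:.3f}")
```

In this idealization the differential pattern has no uniform ($k_x=0$) component, while the two driven groups add constructively at $k_x=\pi/a$ because they start on fins of opposite parity.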

A 2D FEM model for the cross-section of a single differential RFT unit cell is constructed in COMSOL Multiphysics. Figure 21.3.2 shows simplified model geometry, as well as the Floquet periodic boundary conditions imposed on the edges of the unit cell. Differential 1MPa uniform driving stress is applied to the driving gates dielectric (according to the driving voltage polarity). Isotropic and uniform material damping was introduced to limit the  $Q$  to 5,000 for numerical purposes. The X-stress field distribution for the 33.7GHz resonance mode is shown in Fig. 21.3.2. The vibrational mode is well-confined within the transistor fins, with evanescent decay in the bulk and BEOL layers. The full RFT resonator is formed out of a differential sensing unit cell, with five differential driving unit cells on both sides for a total of 154 fins (Fig. 21.3.3). With a 44nm fin pitch, the RFT has a footprint of  $6.8\mu\text{m} \times 200\text{nm}$ .
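The fin count and footprint quoted above follow directly from the unit-cell layout, as this short check shows. The constant names are illustrative.

```python
# Fin count and device length from the unit-cell layout described in the text.
FINS_PER_UNIT_CELL = 14      # differential unit cell (3-skip-4 pattern)
SENSE_CELLS = 1              # one differential sensing unit cell
DRIVE_CELLS_PER_SIDE = 5     # five differential driving unit cells on each side
FIN_PITCH_NM = 44

total_cells = SENSE_CELLS + 2 * DRIVE_CELLS_PER_SIDE
total_fins = total_cells * FINS_PER_UNIT_CELL
length_um = total_fins * FIN_PITCH_NM / 1000.0

print(total_cells, "unit cells,", total_fins, "fins,", round(length_um, 1), "um long")
```

Eleven unit cells give 154 fins, and 154 fins at a 44nm pitch span about 6.8μm, which matches the stated footprint length.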

Fully-differential 2-port S-parameters were measured using a Keysight parametric network analyzer (PNA) N5225A with integrated true-mode stimulus application (iTMSA). Three Keithley 2400 SMUs are used to set the DC operating point of the FinFETs. Both drive and sense FETs have gate voltage  $V_G=0.8\text{V}$ . The sense FinFETs are biased in the linear regime with  $V_{DS}=200\text{mV}$ , yielding  $I_{DS}=118\mu\text{A}$ . The driving FinFETs are biased with  $V_D=V_S=40\text{mV}$ . The RF power is  $-20\text{dBm}$ , with an IFBW of 500Hz and 30-point averaging. The de-embedded electro-mechanical transconductance ( $g_{m,\text{mech}}=Y_{21dd}-Y_{12dd}$ ) is shown in Fig. 21.3.4, with a resonance peak of  $14\text{mS}$  at  $32.0\text{GHz}$  (5% difference from FEM prediction) and  $Q\sim49,000$  for a resonator figure of merit (FOM)  $f_0\cdot Q\sim1.57\times10^{15}$ . Figure 21.3.5 shows  $g_{m,\text{mech}}$  and  $Q$  as functions of the driving DC bias voltage. Figure 21.3.6 shows a comparison of the presented device with state-of-the-art CMOS-integrated MEMS resonators [2-8], while Fig. 21.3.7 shows an SEM image of its cross-section. The presented device advances the state of the art with  $10\times$  higher resonance frequency,  $3.7\times$  improvement in  $Q$ ,  $37\times$  higher FOM, and  $10^2\times$  improvement in output signal than the CMOS resonators in [7].
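The figure of merit and the improvement factors over [7] can be reproduced from the values listed in Fig. 21.3.6. The sketch below takes $f_0$ and $Q$ for this work and for [7] from that table; variable names are illustrative.

```python
# f0*Q figure of merit and improvement ratios, using values from Fig. 21.3.6.
F0_THIS_HZ, Q_THIS = 32.0e9, 49_000      # this work
F0_REF7_HZ, Q_REF7 = 3.19e9, 13_300      # resonator in [7]

fom_this = F0_THIS_HZ * Q_THIS
fom_ref7 = F0_REF7_HZ * Q_REF7

freq_ratio = F0_THIS_HZ / F0_REF7_HZ
q_ratio = Q_THIS / Q_REF7
fom_ratio = fom_this / fom_ref7

print(f"FOM (this work): {fom_this:.3g}")
print(f"frequency x{freq_ratio:.1f}, Q x{q_ratio:.1f}, FOM x{fom_ratio:.0f}")
```

The ratios round to the 10x, 3.7x and 37x improvements quoted in the text.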

This work marks the first MEMS resonators embedded in commercial FinFET technology. Leveraging gate capacitance drive, active FET sensing, and acoustic confinement in the CMOS stack, we demonstrate 32GHz resonance with  $Q\sim49,000$ , marking the highest FOM to date for any Si-based MEMS resonator at room temperature.

### References:

- [1] A. C. Fischer, et al., “Integrating MEMS and ICs,” *Microsyst. & Nanoeng.*, vol. 1, 15005, pp. 1-16, 2015.
- [2] M. Aissi, et al., “A 5.4GHz 0.35μm BiCMOS FBAR resonator oscillator in above-IC technology,” *ISSCC Dig. Tech. Papers*, pp. 1228-1229, Feb. 2006.
- [3] S. Razafimandimby, et al., “A 2GHz 0.25μm SiGe BiCMOS Oscillator with Flip-Chip Mounted BAW Resonator,” *ISSCC Dig. Tech. Papers*, pp. 580-581, Feb. 2007.
- [4] C.-S. Li, et al., “A Low-Voltage CMOS-Microelectromechanical Systems Thermal-Piezoresistive Resonator With  $Q > 10,000$ ,” *IEEE EDL*, vol. 36, no. 2, pp. 192-194, 2015.
- [5] D. Weinstein, S. A. Bhave, “Acoustic Resonance in an independent-gate FinFET,” *Solid-State Sensors, Actuators and Microsystems Workshop*, pp. 459-462, 2010.
- [6] B. Bahr, R. Marathe, D. Weinstein, “Theory and design of phononic crystals for unreleased CMOS-MEMS Resonant Body Transistors,” *IEEE JMEMS*, vol. 24, no. 5, pp. 1520-1533, 2015.
- [7] B. Bahr, D. Weinstein, “Vertical acoustic confinement for high- $Q$  fully-differential CMOS-RBTs,” *Solid-State Sensors, Actuators and Microsystems Workshop*, pp. 88-91, 2016.
- [8] H. M. Lavasani, et al., “A  $76\text{dB}\Omega$  1.7GHz  $0.18\mu\text{m}$  CMOS tunable transimpedance amplifier using broadband current pre-amplifier for high frequency lateral micromechanical oscillators,” *ISSCC Dig. Tech. Papers*, pp. 318-319, Feb. 2010.



Figure 21.3.1: RFT differential unit cell with 14 fins showing the differential driving voltage for a 3-skip-4 fin connection.



Figure 21.3.2: Dispersion relation of the RFT simulated in FEA (left) for single fin period with periodic boundary conditions and continuous metal reflector above the silicon fin (right).



Figure 21.3.3: Circuit configuration for drive and sense fins in RFT. Each pair of differential drive or sense FETs represents a unit cell of 14 fins as depicted in Fig. 21.3.1.



Figure 21.3.4: Measured RFT electro-mechanical transconductance at  $V_{DC\text{-drive}} = 40\text{mV}$ , exhibiting resonance at 32GHz with  $Q \sim 49,000$ .



Figure 21.3.5: Electromechanical transconductance and quality factor as a function of DC driving voltage. As  $V_{DC\text{-drive}}$  increases,  $V_{GS}$  on the driving FETs decreases, hence decreasing the actuation force and  $g_{m\text{-mech}}$ .

|                         | This work             | HH2016 [7]           | EDL 2014 [4]          | HH2010 [5]            | ISSCC2010 [8]        | ISSCC2007 [3]         | ISSCC2006 [2]          |
|-------------------------|-----------------------|----------------------|-----------------------|-----------------------|----------------------|-----------------------|------------------------|
| CMOS Process            | 14 nm FinFET          | 32 nm SOI            | 0.35 μm               | Custom SOI            | 0.18 μm              | 0.25 μm               | 0.35 μm                |
| f <sub>0</sub> [GHz]    | 32.0                  | 3.19                 | 5.14 MHz              | 37.1                  | 1.0                  | 2.124                 | 5.46                   |
| Q                       | 49,000                | 13,300               | 11,916                | 560                   | 7,100                | 600                   | 300 <sup>t</sup>       |
| f <sub>0</sub> xQ       | 1.57x10 <sup>15</sup> | 4.2x10 <sup>13</sup> | 6.12x10 <sup>10</sup> | 2.08x10 <sup>13</sup> | 7.1x10 <sup>12</sup> | 1.27x10 <sup>12</sup> | 1.638x10 <sup>12</sup> |
| g <sub>m-mech</sub>     | 14 mS                 | 140 μS               | 8.6 μS*               | 20 μS                 | 6.66 mS*             | -                     | 0.4 S * <sup>t</sup>   |
| Top Confinement         | Solid Cu Sheets       | PnC                  | Free Surface          | Free surface          | Free surface         | Free surface          | Free surface           |
| Bottom Confinement      | Index Guiding         | Index Guiding        |                       |                       |                      | Bragg Mirror          | Free surface           |
| Horizontal Confinement  | Abrupt Termination    | PnC                  | Abrupt Termination    | Abrupt Termination    | Abrupt Termination   | Abrupt Termination    | Abrupt Termination     |
| Length [μm]             | 6.8                   | 13                   | 160                   | 1.7                   | 300                  | 200                   | 200                    |
| Width [μm]              | 0.2                   | 4                    | 100                   | 0.5                   | 100                  | 120                   | 170                    |
| Area [μm <sup>2</sup> ] | 1.36                  | 52                   | 16,000                | 0.85                  | 30,000               | 24,000                | 34,000                 |
| Drive                   | Electrostatic         | Electrostatic        | Thermal               | Electrostatic         | Piezo-electric       | Piezo-electric        | Piezo-electric         |
| Sense                   | Active FET            | Active FET           | Piezo-resistive       | Active FET            | Piezo-electric       | Piezo-electric        | Piezo-electric         |

\*Estimated equivalent conductance or  $g_{m\text{-mech}}$   
<sup>t</sup>Value listed in E. Tournier, "5.4 GHz, 0.35 μm BiCMOS FBAR-Based Single-Ended and Balanced Oscillators in Above-IC Technology," MEMS-based circuits and systems for wireless communication, Integrated circuits and systems, Springer, 167-177 (2013).

Figure 21.3.6: Performance comparison with state of the art CMOS-integrated MEMS resonators.
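The $f_0 \times Q$ and area rows of the table are derived quantities, so they can be recomputed from the rows above them. The following is a minimal sketch of that check; the dictionary layout and function names are illustrative and not part of the paper, and only three of the seven columns are copied in.

```python
# Values copied from the comparison table: (f0 in Hz, Q, length in um, width in um).
RESONATORS = {
    "This work":    (32.0e9, 49_000, 6.8, 0.2),
    "HH2016 [7]":   (3.19e9, 13_300, 13.0, 4.0),
    "EDL 2014 [4]": (5.14e6, 11_916, 160.0, 100.0),  # f0 is listed in MHz for this entry
}


def f0_q_product(f0_hz, q):
    """Frequency-quality-factor product in Hz."""
    return f0_hz * q


def footprint_um2(length_um, width_um):
    """Rectangular device footprint in square micrometres."""
    return length_um * width_um


for name, (f0, q, length, width) in RESONATORS.items():
    print(f"{name:13s} f0*Q = {f0_q_product(f0, q):.3g} Hz, "
          f"area = {footprint_um2(length, width):.4g} um^2")
```

The same two helpers apply unchanged to the remaining columns of the table.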



Figure 21.3.7: Scanning electron micrograph of the RFT, showing multiple periods of the device with continuous metal sheet reflectors on top of the fins (left). Zoom-in of FinFETs showing a 3-skip-4 connection pattern (right).

## 21.4 A 10Gb/s Si-Photonic Transceiver with 150µW 120µs-Lock-Time Digitally Supervised Analog Microring Wavelength Stabilization for 1Tb/s/mm<sup>2</sup> Die-to-Die Optical Networks

Yvain Thonnart<sup>1</sup>, Mounir Zid<sup>1</sup>, José Luis Gonzalez-Jimenez<sup>1</sup>, Guillaume Waltener<sup>1</sup>, Robert Polster<sup>1</sup>, Olivier Dubray<sup>1</sup>, Florent Lepin<sup>1</sup>, Stéphane Bernabé<sup>1</sup>, Sylvie Menezo<sup>1</sup>, Gabriel Parés<sup>1</sup>, Olivier Castany<sup>1</sup>, Laura Boutafa<sup>1</sup>, Philippe Grosse<sup>1</sup>, Benoît Charbonnier<sup>1</sup>, Charles Baudot<sup>2</sup>

<sup>1</sup>CEA-LETI-MINATEC, Grenoble, France

<sup>2</sup>STMicroelectronics, Crolles, France

Silicon photonics has allowed cost reduction and performance improvement for optical interconnects for the past few years, and short-reach wavelength-division-multiplexed (WDM) links have recently emerged thanks to the introduction of microring modulators and filters [1-5]. Nevertheless, the promise of optical networks-on-chip foreseen in [1] has to face the integration challenges of scalable low-footprint elementary drivers and robust operation under heavy thermal stress due to self-heating of the cores with varying loads. This work presents a 3D-stacked CMOS-on-Si-photonic transceiver chip, which includes base building-blocks targeting die-to-die WDM optical communication for multicore processors: 10Gbps 2.5V<sub>pp</sub> OOK modulator driver, associated receiver, and digitally-supervised analog wavelength stabilization using microring heaters and remapping for 0-to-90°C operating range, for a total footprint of 0.01mm<sup>2</sup> per microring.

Figure 21.4.1 presents five different experiments integrated in the chip to validate our approach, ranging from a single modulator to a complete 4-wavelength WDM link in the 1310nm band. As the microrings have a sharp temperature-dependent resonance with Q-factors up to 30000, they can be mapped to different wavelengths on the same waveguide, but each ring must be fine-tuned [2-5] to its laser wavelength by controlling its temperature to within 0.1°C through Joule heating in a resistive heater inside the ring. Each wavelength is modulated on the Tx side using carrier-depletion PN-rings and filtered using thermally-tuned passive rings to a germanium photodiode whose photocurrent is demodulated on the Rx side. Vertical-fiber grating couplers inject one or more laser sources into the Si-photonic chip and collect the residual or modulated optical output for monitoring. Figure 21.4.2 presents such an output showing eye diagrams of the optical signal after 2.5V<sub>pp</sub> OOK Tx modulation up to 10Gbps.

The major innovation in this paper is the new thermal tuning microarchitecture presented in Fig. 21.4.3. It uses a digitally-supervised analog feedback loop monitoring the resonance from the drop port of a microring to apply robust proportional-integral (PI) control on the voltage of the resistive heater via a power NMOS transistor. Analog control provides smooth sub-pm tuning to a reference optical power level with neither DC offset nor quantization noise. An operational amplifier with configurable RC feedback fixes the control loop gain: proportional and integral parts are set respectively by a controlled segmented 180kΩ polysilicon resistor and a 5pF on-chip capacitor, which take half the area of the 40×40µm<sup>2</sup> tuning controller. A small digital finite-state machine controlling the feedback loop handles initialization and remapping. Normal operation uses negative feedback, where the NMOS gate voltage increases when the optical power decreases, resulting in more heat injected into the ring and a redshift of the resonant wavelength. This stabilizes the wavelength on the sharper edge of the resonant peak induced by non-linear ring dynamics [5], where the modulation efficiency will be maximum as long as the optical power is below the ring bistability point. However, when threshold comparators detect that the ring is either overheated (the thermal budget will not allow keeping up with the environmental temperature drop) or underheated (as the ring cannot be cooled down to hide the temperature increase), the feedback loop sign is inverted using a pass-gate switch to swap the optical power reference and the TIA output. This leads to a controlled stabilization on the other edge of the resonant peak, after which the loop sign is restored to map the ring on the sharp edge of the next wavelength. A slight reference adjustment enforces the shift direction: positive for blueshift, negative for redshift. A loop gain adjustment via the segmented resistor is needed because of the non-linear gain of the ring along the edge. Finally, the control is also adjusted depending on the transmission of data on the wavelength.
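The normal-operation branch of this loop can be illustrated with a small behavioural model. The sketch below is not the authors' circuit: it assumes a symmetric Lorentzian resonance (the measured peak is distorted by non-linear ring dynamics), an instantaneous thermal response, a discrete-time PI update, and hypothetical gains `kp` and `ki` and a hypothetical initial detuning of 100pm. Only the 1310nm wavelength, the Q-factor of 30000 and the 0.04nm/mW heater efficiency come from the paper.

```python
# Behavioural sketch of a PI heater loop that locks a microring to a
# drop-port power reference. Line shape, gains and detuning are assumptions.

WAVELENGTH_PM = 1310e3                 # 1310 nm band, expressed in pm
Q_FACTOR = 30_000                      # ring Q-factor quoted in the text
FWHM_PM = WAVELENGTH_PM / Q_FACTOR     # resonance linewidth, about 43.7 pm
HEATER_PM_PER_MW = 40.0                # 0.04 nm/mW heater efficiency (Fig. 21.4.6)


def drop_power(heater_mw, cold_detuning_pm):
    """Normalised drop-port power for a laser placed cold_detuning_pm to the
    red of the unheated resonance; heating redshifts the ring toward it."""
    detuning_pm = cold_detuning_pm - HEATER_PM_PER_MW * heater_mw
    x = 2.0 * detuning_pm / FWHM_PM
    return 1.0 / (1.0 + x * x)


def lock(reference, cold_detuning_pm, kp=0.1, ki=0.2, steps=500, max_mw=5.0):
    """Run the PI loop and return (heater power in mW, drop-port power)."""
    integral = 0.0
    heater_mw = 0.0
    for _ in range(steps):
        # Negative feedback: too little optical power calls for more heat.
        error = reference - drop_power(heater_mw, cold_detuning_pm)
        integral += ki * error                 # integral branch
        heater_mw = kp * error + integral      # proportional branch added
        heater_mw = min(max(heater_mw, 0.0), max_mw)  # a heater cannot cool
    return heater_mw, drop_power(heater_mw, cold_detuning_pm)


heater, power = lock(reference=0.5, cold_detuning_pm=100.0)
print(f"heater = {heater:.3f} mW, drop-port power = {power:.3f}")
```

With these assumed values the loop settles at roughly 1.95mW, where the ring sits half a linewidth away from the laser. The clamp at zero heater power reflects the remark in the text that the ring cannot be actively cooled, which is why an underheated ring needs remapping rather than further control effort.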

Optical transfer function measurements with a laser wavelength sweep are shown in Fig. 21.4.4 for a ring controlled at different reference optical powers. The first and last curves show the ring transmission spectrum for a completely cool and a completely hot microring, respectively, with the expected distorted peak. The second dashed curve shows the impact of CMOS heat dissipation on an uncontrolled ring, resulting in a resonance shift of 30pm. The other curves show the ring first heated to the maximum as the optical drop power is below the reference, then suddenly cooled down to the minimum as the peak is crossed to stabilize as expected on its sharper edge, with increasing thermal budget until the maximum is reached. These curves show a 0.5pm flat plateau over about 40pm of sweep, demonstrating the tuning accuracy. The tuning range, nevertheless, is not as large as expected because no substrate removal was performed on the microrings [5], letting heat flow off the ring. Figure 21.4.5 presents the dynamic behavior of the thermal tuning. Lock-time is measured after activating the controller, and shows a smooth convergence to a reference level corresponding to -2dB on the through port of a Tx ring, where the extinction ratio will be maximal. The time-domain response to a periodic 30pm<sub>pp</sub> environmental fluctuation of the wavelength shows perfect stabilization of the ring up to 900Hz.

Figure 21.4.6 presents the transceiver performance summary and a comparison table. The authors of [2] follow the same 3D face-to-face integration approach for their WDM transceiver, but still rely on manual external thermal tuning of the microring resonant wavelengths. In [3-4], the electrical and photonic ICs are die-to-die wire-bonded, focusing on 100Gbps Ethernet transceiver modules rather than die-to-die optical communication. The authors of [5] present a complete monolithic CMOS-Si-photonic platform based on a standard 45nm process, well suited to on-chip/die-to-die communication, although they do not describe WDM demultiplexing with ring filters. In [3-5], thermal tuning is performed using digital closed-loop control with  $\Sigma\Delta$  DAC or SAR ADC, at 100MHz for [3-4], and 76kHz for [5], leading either to high activity or long latency, with a minimum quantization noise of 5pm. The proposed digitally-supervised analog control loop allows for a 50× faster lock-time of 120µs and a demonstrated stability under 900Hz environmental fluctuation, for only 150µW tuning controller power. Digital supervision at 100kHz allows full closed-loop remapping from one wavelength to another in less than 1ms in case of large temperature changes, where previous designs would need to open the loop and scan for a new wavelength. This new remapping capability makes it possible to consider real-time operation with low interruption time under the 0-to-90°C temperature range commonly found in server chips with varying loads.
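The lock-time comparison can be cross-checked against the values in Fig. 21.4.6. The sketch below recomputes the speed-up ratios and the number of 100kHz supervision periods that fit in the 1ms remapping bound; the helper names are illustrative only.

```python
# Tuning lock times from the comparison table (Fig. 21.4.6), in seconds.
LOCK_TIME_S = {"[3,4]": 700e-3, "[5]": 6.7e-3, "this work": 120e-6}
SUPERVISION_HZ = 100e3   # digital supervision rate quoted in the text
REMAP_BOUND_S = 1e-3     # upper bound on remapping time quoted in the text


def lock_time_speedup(baseline):
    """Ratio of a prior design's lock time to the 120 us reported here."""
    return LOCK_TIME_S[baseline] / LOCK_TIME_S["this work"]


def supervision_periods(bound_s=REMAP_BOUND_S, rate_hz=SUPERVISION_HZ):
    """Number of supervision clock periods available within the remap bound."""
    return bound_s * rate_hz


for ref in ("[3,4]", "[5]"):
    print(f"speed-up over {ref}: {lock_time_speedup(ref):.1f}x")
print(f"supervision periods per remap: {supervision_periods():.0f}")
```

Against [5] the ratio is about 56, which the text rounds down to 50×. The 1ms remapping bound corresponds to at most about 100 supervision periods.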

A micrograph of the CMOS chip is shown in Fig. 21.4.7, with the different sub-blocks highlighted, and the flip-chip integrated 10Gbps transceiver wire-bonded on a PCB with a fiber array positioned on the optical grating couplers. Inductors are part of the 50Ω output buffers and not in the functional drivers. The footprint for a complete per-wavelength Tx or Rx driver is 9600µm<sup>2</sup>, identical on the CMOS and Si-photonic die, and determined by the copper-pillar pitch of 40µm for 6 terminals (ring modulation, ring tuning, photodiode on drop port). This results in a bandwidth-density figure of merit of 1Tbps/mm<sup>2</sup> for a wavelength-locked electro-optical die-to-die communication, paving the way for die-to-die optical networks on photonic interposers for future high-performance computation applications.
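The footprint and bandwidth-density figures follow from the pillar pitch and terminal count. A minimal sketch of the arithmetic, assuming each of the 6 terminals occupies one pitch-by-pitch tile (the function names are illustrative):

```python
PILLAR_PITCH_UM = 40.0   # copper-pillar pitch quoted in the text
TERMINALS = 6            # terminals per wavelength driver quoted in the text
DATA_RATE_GBPS = 10.0    # per-wavelength data rate


def driver_footprint_um2(pitch_um, terminals):
    """Footprint if every terminal takes one pitch-by-pitch tile (assumption)."""
    return terminals * pitch_um ** 2


def bandwidth_density_tbps_per_mm2(rate_gbps, area_um2):
    """Data rate divided by footprint, converted to Tbps/mm^2."""
    area_mm2 = area_um2 * 1e-6
    return rate_gbps / area_mm2 / 1000.0


area = driver_footprint_um2(PILLAR_PITCH_UM, TERMINALS)
density = bandwidth_density_tbps_per_mm2(DATA_RATE_GBPS, area)
print(f"footprint = {area:.0f} um^2, density = {density:.2f} Tbps/mm^2")
```

This reproduces the 9600µm<sup>2</sup> footprint and a density slightly above 1Tbps/mm<sup>2</sup>.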

### References:

- [1] Young, et al., "Optical I/O Technology for Tera-Scale Computing," *IEEE J. Solid-State Circuits*, vol. 45, no. 1, pp. 235-248, Jan. 2010.
- [2] Rakowski, et al., "A 4×20Gb/s WDM Ring-Based Hybrid CMOS Silicon Photonics Transceiver," *ISSCC Dig. Tech. Papers*, pp. 408-409, Feb. 2015.
- [3] Li, et al., "A 25Gb/s 4.4V-Swing AC-Coupled Si-Photonic Microring Transmitter with 2-Tap Asymmetric FFE and Dynamic Thermal Tuning in 65nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 410-411, Feb. 2015.
- [4] Yu, et al., "A 24Gb/s 0.71pJ/b Si-Photonic Source-Synchronous Receiver with Adaptive Equalization and Microring Wavelength Stabilization," *ISSCC Dig. Tech. Papers*, pp. 406-407, Feb. 2015.
- [5] Sun, et al., "45nm CMOS-SOI Monolithic Photonics Platform," *IEEE J. Solid-State Circuits*, vol. 51, no. 4, pp. 893-907, April 2016.



Figure 21.4.1: CMOS+Si-photonic Chip architecture showing the 5 sub-experiments implemented.



Figure 21.4.2: Tx electro-optical modulation eye diagrams and signal-to-noise ratio at 8Gbps and 10Gbps, for a power consumption of 640 fJ/bit.



Figure 21.4.3: Microring thermal control micro-architecture, with closed-loop analog PI feedback and digital supervision fixing reference set-point and loop feedback sign for controlled wavelength remapping.



Figure 21.4.4: Thru-port transmission spectrum of a microring with different thermal tuning configurations, showing the heater locked (on Drop port) to a reference level during wavelength sweep.



Figure 21.4.5: Transient thermal control behaviour showing heater lock on target power level after activation and steady-state tracking under environmental variation, here laser wavelength.

|                      | [2] Rakowski ISSCC2015      | [3,4] Li, Yu ISSCC2015            | [5] Sun JSSC2016                           | This work                               |
|----------------------|-----------------------------|-----------------------------------|--------------------------------------------|-----------------------------------------|
| Integration scheme   | 3D face-to-face             | 2D proximity wirebonding          | Monolithic                                 | 3D face-to-face                         |
| Technology           | 130nm SOI SiPh<br>40nm CMOS | 130nm SOI SiPh<br>65nm CMOS       | 45nm CMOS SOI<br>85nm CMOS                 | 100nm SOI SiPh<br>85nm CMOS             |
| Datarate             | 20 Gbps                     | 24 Gbps                           | 10 Gbps                                    | 10 Gbps                                 |
| Ring Q factor        | ~5500                       | ~5000 Tx, ~18000 Rx               | ~11600                                     | ~30000                                  |
| Wavelength           | 1550nm                      | 1550nm                            | 1180nm                                     | 1310nm                                  |
| WDM channels         | 4                           | 5                                 | 11                                         | 4                                       |
| Thermal tuning       | Open-loop                   | Digital closed-loop avg/peak det. | Digital closed-loop bit-statist. (Tx only) | Analog closed-loop w. digital reconfig. |
| Wavelength remapping | No / External               | No / External                     | No / External                              | Integrated <1ms remap time*             |
| Heater efficiency    | 0.16nm/mW                   | 0.16nm/mW                         | 1.25nm/mW                                  | 0.04nm/mW                               |
| Tuning ctrl. power   | N.A. - External             | 170µW                             | 720µW                                      | 150µW                                   |
| Tuning precision     | N.A. - External             | Not reported                      | 5pm                                        | 0.5pm                                   |
| Tuning lock-time     | N.A. - External             | 700ms                             | 6.7ms                                      | 120µs                                   |
| Tuning bandwidth     | N.A. - External             | ~1Hz (from tr. meas.)             | ~1Hz (from tr. meas.)                      | 900Hz                                   |
| Tuning area / ring   | 0.04mm² (pad area)          | 0.03mm²                           | 0.0024mm²                                  | 0.0016mm²                               |
| Tot. driver area / λ | 0.14mm² (Tx or Rx)          | 0.1 / 0.06 mm² (Tx / Rx)          | 0.0205mm² (Tx)                             | 0.0098mm² (Tx or Rx)                    |
| Bandwidth density    | 142 Gbps/mm²                | 300 Gbps/mm²                      | 391 Gbps/mm²                               | 1 Tbps/mm²                              |

\* Remapping simulated for heater efficiency of 1.25nm/mW (not attained due to lack of selective substrate removal as in [5])

Figure 21.4.6: 10Gbps Si-photonic thermally-tuned transceiver performance summary and comparison table.



Figure 21.4.7: Die micrographs of CMOS Transceiver chip with 6 Tx drivers and 5 Rx drivers including ring tuning, associated Flip-chip face-to-face assembly on Si-photonic chip, and board and fiber integration.

## 21.5 A 286F<sup>2</sup>/Cell Distributed Bulk-Current Sensor and Secure Flush Code Eraser Against Laser Fault Injection Attack

Kohei Matsuda<sup>1</sup>, Tatsuya Fujii<sup>2</sup>, Natsu Shoji<sup>2</sup>, Takeshi Sugawara<sup>2</sup>, Kazuo Sakiyama<sup>2</sup>, Yu-ichi Hayashi<sup>3</sup>, Makoto Nagata<sup>1</sup>, Noriyuki Miura<sup>1</sup>

<sup>1</sup>Kobe University, Kobe, Japan

<sup>2</sup>University of Electro-Communications, Chofu, Japan

<sup>3</sup>Nara Advanced Institute of Science and Technology, Ikoma, Japan

A sense-and-react closed-loop countermeasure is proposed against Laser Fault Injection (LFI) attacks on a cryptographic processor core. A 286F<sup>2</sup>/cell distributed bulk-current sensor detects laser injection by abnormal current conduction at bulk contacts. Upon detection, a flush code eraser prevents exposure of laser-induced faulty ciphertext by shunting the core supply within nanoseconds. A protected AES core in 0.18μm CMOS successfully disables the LFI attack with only a 28% area penalty.

LFI is one of the most powerful physical attacks on cryptographic cores such as AES (Fig. 21.5.1). By intentionally making a cryptographic operation process faulty data, the secret key information can be disclosed through the difference between the correct and the faulty ciphertexts, a technique known as Differential Fault Analysis (DFA). LFI can significantly improve the efficiency of DFA by exploiting the high temporal and spatial resolution of the laser. A fault operation with a single-bit FF data flip at the 8th round of AES encryption discloses 121b of the 128b secret key with only one single focused laser shot [1]. A duplication-based logic-level countermeasure can enhance resiliency against LFI [2]; however, the associated layout area penalty is >100% of the unprotected cryptographic core (i.e. the layout area becomes >2× larger). Another solution is an integrated sensor-based physical countermeasure to detect LFI. An integrated photo detector (or a temperature sensor) is one possible way to detect an abnormal event caused by LFI. However, in order to detect the focused laser shot with its limited light and heat spot, a dense sensor array arrangement is needed, resulting in a huge area penalty.

In this paper, a compact LFI sensor solution is proposed, where abnormal bulk current is detected as an LFI attack (Fig. 21.5.1). Since the current is spread all over the shared well and substrate, a sparse sensor array arrangement is possible for layout area saving. This paper also presents a reactive closed-loop countermeasure against the LFI attack. In response to the attack detection, the cryptographic core supply is shunted down instantaneously to erase the laser-induced faulty intermediate value. The overall layout area penalty is only 28%.

Figure 21.5.2 depicts the details of the LFI sensor. The single-bit FF data flip occurs due to laser-induced photocurrent generated at the PN junction of the OFF-state transistor ( $M_N$  in case of Fig. 21.5.2). This photocurrent flows into bulk contacts due to the potential slope between the drain node of  $M_N$  at  $V_{DD}$  level and the bulk contacts at  $V_{SS}$  level (holes in laser-induced electron-hole pairs are captured by the bulk contacts). Since the current conduction at the bulk contact is normally very small, the abnormal mA-order photocurrent flowing into the bulk contact can be easily detected by a small resistor and a voltage amplifier. A soft-error detection circuit [3] can be employed as the LFI sensor core. The sensor front-end is composed of only 4 transistors for the resistor-amplifier pairs in the pull-up and pull-down photocurrent paths. The layout area penalty is only 286F<sup>2</sup>/cell (~2.6 gate equivalents of a 2-input NAND). The sensor front-end is distributed across the entire cryptographic core for 100% detection coverage. The sensor sensitivity and detection range are characterized once in advance by a preliminary measurement on a test chip [4]. Neither a device simulator nor foundry-confidential parameters are needed for design. The outputs of the sensor front-end cells are wire-ORed into a sensor back-end logic to finally generate an alarm signal for the attack detection. Both the front- and back-end are redundantly distributed to enhance the security level against sensor-disabling attacks.

Figure 21.5.3 describes the details of the reactive countermeasure circuit, namely the flush code eraser. Upon LFI attack detection by the sensor, the power switches are turned off. An additional shunt switch is then turned on for instantaneous power down. The sensitive laser-induced faulty data is quickly erased and never disclosed, which disables DFA. The proposed code eraser is faster than code erasure by FF data reset: the code retention time is reduced by the latency of the combinational logic path. In addition, the proposed eraser is more secure than the reset scheme. Since the core supply is electrically isolated from the global supply line, a power side-channel attack to disclose the faulty ciphertext information becomes very difficult. In case of FF data reset, data-dependent side-channel information is leaked through the supply connection.
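The sense-and-react sequence described above (wire-OR of the front-end outputs, then power switches off and shunt switch on) can be summarised as a small behavioural model. This is only an illustrative sketch: the class and function names are hypothetical, and analog behaviour such as the supply discharge is not modelled.

```python
from dataclasses import dataclass


@dataclass
class CoreSupply:
    """Switch states around the core supply (names are hypothetical)."""
    power_switch_on: bool = True    # connects the core supply to the global supply
    shunt_switch_on: bool = False   # discharges the core supply when closed


def alarm(front_end_outputs):
    """Wire-OR of all sensor front-end outputs, as done in the back-end logic."""
    return any(front_end_outputs)


def react(supply, front_end_outputs):
    """Isolate the core from the global supply, then shunt it, on an alarm."""
    if alarm(front_end_outputs):
        supply.power_switch_on = False
        supply.shunt_switch_on = True
    return supply


quiet = react(CoreSupply(), [False] * 336)
attacked = react(CoreSupply(), [False] * 335 + [True])
print(quiet)
print(attacked)
```

A single asserted front-end output is enough to trigger the eraser, which is what allows a sparse sensor distribution to protect the whole core.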

A test chip was designed and fabricated in 0.18μm standard CMOS (Fig. 21.5.4). The distributed bulk-current sensor and the flush code eraser were integrated together with a 128bit-key AES cryptographic processor core. The protected AES core was designed based on a standard digital design flow with a commercial EDA tool chain. Based on the preliminary characterization, the sensor front-end cells were distributed with a pitch of 25μm in the X-direction and 10μm in the Y-direction to stably detect LFI over the entire AES core area. Since the photocurrent is spread over the shared well and substrate, sparse distribution is acceptable for layout area saving. In total, 336 sensor front-end cells and 23 back-end cells were integrated. The overall layout area penalty including the flush code eraser is only +28% of an unprotected AES core area. The test chip was mounted on a test board and the LFI attack was performed using an IR laser source, whose backside attack capability bypasses a metal shielding on top of the cryptographic core.
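To put the front-end cell size in context, the 286F<sup>2</sup> figure can be converted to absolute area and compared with the area of one pitch tile. The sketch below assumes F = 0.18µm and exactly one front-end cell per 25µm × 10µm tile; both are assumptions made for illustration, and the result is only an estimate of the front-end share, not a breakdown of the reported 28% total.

```python
FEATURE_UM = 0.18                      # F, assumed equal to the process node
CELL_AREA_F2 = 286                     # front-end cell area in F^2 (from the text)
PITCH_X_UM, PITCH_Y_UM = 25.0, 10.0    # sensor pitch (from the text)
FRONT_END_CELLS = 336                  # number of front-end cells (from the text)


def cell_area_um2(area_f2=CELL_AREA_F2, feature_um=FEATURE_UM):
    """Absolute area of one front-end cell."""
    return area_f2 * feature_um ** 2


def tile_area_um2(pitch_x_um=PITCH_X_UM, pitch_y_um=PITCH_Y_UM):
    """Area guarded by one front-end cell, assuming one cell per pitch tile."""
    return pitch_x_um * pitch_y_um


fill = cell_area_um2() / tile_area_um2()
covered = FRONT_END_CELLS * tile_area_um2()
print(f"cell area = {cell_area_um2():.2f} um^2")
print(f"front-end fill per tile = {100 * fill:.1f} %")
print(f"area spanned by all tiles = {covered:.0f} um^2")
```

Under these assumptions each front-end cell takes about 9.3µm<sup>2</sup>, a few percent of its tile.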

Figure 21.5.5 shows the LFI attack results. A 50× optical lens is used to focus the laser spot to 2μm in diameter. The laser-injection point was scanned in 2D with a 1μm step all around a single-bit data FF in an unprotected AES core. A faulty ciphertext was obtained by LFI at the 8th AES round and the secret key was successfully disclosed. The fault occurred only at one single LFI point even with the maximum available laser power in our test setup. Figure 21.5.5 also presents the measured dependence of fault probability on the energy of the 60ns laser pulse injected at this laser-injection point. The minimum laser energy required to induce a fault was measured to be around 4.2nJ. Figure 21.5.6 presents the sensor sensitivity map measured by LFI into the protected AES core. Thanks to the preliminary characterization, the laser energy detected by the distributed sensor is far smaller than that needed for fault injection. This confirms that the sensor detection range is well designed for good coverage across the AES core. Finally, the sense-and-react closed-loop operation was tested. An on-chip monitor was integrated for testing purposes to observe the functionality of the flush code eraser by monitoring the core supply,  $V_{DD,CORE}$,  and the internal data FF outputs. In response to the LFI attack, the core supply was successfully shunted down within 2ns. The ciphertext handled in the AES core was erased accordingly, successfully disabling the LFI attack.

### Acknowledgements:

This work is supported by JSPS Grants-in-Aid for Scientific Research under Grant 15H01688. The authors are grateful to Information-technology Promotion Agency (IPA) for the laser test setup and technical assistance.

### References:

- [1] K. Sakiyama, et al., "Information-Theoretic Approach to Optimal Differential Fault Analysis," *IEEE Trans. on Information Forensics and Security*, vol. 7, no. 1, pp. 109-120, Feb. 2012.
- [2] M. Doulcier-Verdier, et al., "A Side-Channel and Fault-Attack Resistant AES Circuit Working on Duplicated Complemented Values," *ISSCC Dig. Tech. Papers*, pp. 274-275, Feb. 2011.
- [3] E. Neto, et al., "Using Bulk Built-in Current Sensors to Detect Soft Errors," *IEEE Micro*, vol. 26, no. 5, pp. 10-18, May 2006.
- [4] K. Matsuda, et al., "On-Chip Substrate-Bounce Monitoring for Laser-Fault Countermeasure," *IEEE Asian Hardware-Oriented Security and Trust (AsianHOST)*, pp. 1-6, Dec. 2016.



Figure 21.5.1: Conceptual sketch of LFI attack and proposed sense-and-react countermeasure.



Figure 21.5.2: Circuit detail of distributed bulk-current sensor for LFI detection.



Figure 21.5.3: Circuit detail of flush code eraser.



Figure 21.5.4: Die micrograph and measurement setup.



Figure 21.5.5: Measured laser sensitivity map and fault probability dependence on laser energy.



Figure 21.5.6: Measured minimum laser energy detected by distributed sensor.



Figure 21.5.7: Measured core supply waveform and average HD between ciphertext before and after LFI.

## 21.6 An 8-Channel 13GHz ESR-on-a-Chip Injection-locked VCO-array achieving 200µM-Concentration Sensitivity

Anh Chu<sup>1,3</sup>, Benedikt Schlecker<sup>1,3</sup>, Klaus Lips<sup>2</sup>, Maurits Ortmanns<sup>1</sup>, Jens Anders<sup>1,3</sup>

<sup>1</sup>University of Ulm, Ulm, Germany

<sup>2</sup>Helmholtz-Zentrum Berlin für Materialien und Energie, Berlin, Germany

<sup>3</sup>University of Stuttgart, Stuttgart, Germany

Thanks to their unmatched specificity, methods based on magnetic resonance effects are amongst the most powerful spectroscopic techniques available today. Out of these methods, due to the availability of improved electronics at the required frequencies in the tens of GHz region, electron spin resonance (ESR) spectroscopy is gaining significant attention in the research community as a tool in life science and materials science research.

The principle of a frequency-sensitive ESR measurement is illustrated as part of Fig. 21.6.1. In such a measurement, optimizing the sensor's limit of detection (LOD) is crucial because inductively detected ESR suffers from the very small sample polarization under typical experimental conditions. To overcome this sensitivity problem, significant research efforts have been devoted to designing miniaturized ESR detectors that exploit the improved spin sensitivity (i.e. the LOD in spins /  $\sqrt{\text{Hz}}$ ) of small-size detectors [1-3]. Unfortunately, the scaled-down size of these sensors drastically degrades their concentration sensitivity (CS) (i.e. the LOD in M /  $\sqrt{\text{Hz}}$ ), leading to a hard design tradeoff between spin and concentration sensitivity.
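The trade-off follows from the conversion between a number of spins and a molar concentration, $c = N_{\text{spins}} / (N_A V)$: for a fixed spin sensitivity, a smaller sensitive volume $V$ means a worse (larger) concentration limit. The sketch below illustrates this scaling; the spin sensitivity and the volumes are hypothetical placeholder values, not figures from the paper.

```python
AVOGADRO = 6.02214076e23   # 1/mol, exact SI value


def concentration_lod_molar(spin_lod, volume_litre):
    """Concentration limit in mol/L for a spin limit and a sensitive volume."""
    return spin_lod / (AVOGADRO * volume_litre)


SPIN_LOD = 1e9             # spins per sqrt(Hz), hypothetical value
SMALL_VOLUME_L = 1e-9      # 1 nL, hypothetical single-coil volume
LARGE_VOLUME_L = 8e-9      # 8 nL, hypothetical eight-coil volume

small = concentration_lod_molar(SPIN_LOD, SMALL_VOLUME_L)
large = concentration_lod_molar(SPIN_LOD, LARGE_VOLUME_L)
print(f"small volume: {small:.3e} M/sqrt(Hz)")
print(f"large volume: {large:.3e} M/sqrt(Hz)")
print(f"improvement from the larger volume: {small / large:.1f}x")
```

At equal spin sensitivity, the concentration limit improves in direct proportion to the sensitive volume, which motivates enlarging the active volume without enlarging the individual detectors.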

To overcome this tradeoff, an improved sensor concept is presented, which extends the frequency-sensitive VCO-based ESR detector presented in [3] to arrays of VCOs for an increased active sensor volume. Arrays of uncoupled VCOs have been previously presented for frequency-sensitive dielectric spectroscopy sensing at high GHz-frequencies with extended field-of-views, cf. [4]. In contrast, in this paper, the use of arrays of injection-locked (IL) VCOs is proposed according to Fig. 21.6.1 to further enhance performance. While in [4] specific care was taken to avoid frequency locking between the individual array oscillators to preserve the localized information contained in their individual frequencies, here, the entire array is intentionally locked to a single joint oscillation frequency. Thereby, two goals are achieved at the same time: First, the phase noise (PN) (in power) contained in the joint oscillation frequency of the array is lowered by approximately the number of array oscillators  $N$  and, second, the readout complexity is drastically reduced to a single readout signal. This latter fact is particularly important when scaling the approach up to large channel counts. The reduced PN of an IL VCO array can be intuitively explained by the correction force which the  $N - 1$  remaining oscillators exert on the joint oscillation phase when the phase perturbation of a single oscillator tries to make the array's phase deviate from its nominal value. Since a lowered noise floor is only one factor towards an improved SNR in a frequency-sensitive detector, attention has also to be paid to the effect of the injection locking on the array sensitivity in Hz/spin, i.e. the frequency change induced by a single spin. To study this effect, the model of the coupling between the electron spin ensemble and the array oscillators shown in blue in Fig. 
21.6.2 has been used to perform transistor level simulations for the prototype realization using Keysight's GoldenGate CR analysis. The results of these simulations are shown in Fig. 21.6.3 for the three cases of sample material being placed on (i) a single coil, (ii) four coils or (iii) all array coils at the same time. From these simulations we find: (i) Despite the (very) nonlinear nature of the IL sensor array, it displays a linear change in its output over a wide range of spins and (ii) the frequency deviation produced by a given number of spins is maximum if the sample is homogeneously distributed over all array coils. This latter effect occurs because as more and more VCOs experience the same ESR-induced frequency shift, less and less restoring force is exerted by the remaining VCOs without ESR signal. To verify that this maximum frequency change is not reduced compared to a single VCO-based detector, identical simulations have been performed for a detector as presented in [3]; the results demonstrate the non-reduced maximum array sensitivity compared to a single VCO, cf. Fig. 21.6.3. However, for mass limited samples it might not always be possible to cover the entire array homogeneously with sample. In these cases, it is possible to observe the oscillation amplitude (which is similarly affected by the ESR effect as the frequency) of each individual VCO in addition to the joint oscillation frequency. Since these AM signals are mostly independent (a residual parasitic coupling exists between them), the array provides  $N$  independent outputs equivalent to  $N$  isolated sensors in addition to the joint FM output. Moreover, when biasing a cross-coupled LC tank VCO with a current source, an AM-demodulated version of the oscillation amplitude is available in the voltage  $v_X$  shown in the boxed VCO of Fig. 21.6.2, cf. [5], which removes the need for an additional AM demodulator per channel. Therefore, by observing both the array frequency and the individual oscillator amplitudes, a sensor can be designed which provides both global sample information with an excellent spin and concentration sensitivity in the FM output and localized information with excellent spin sensitivity in the AM outputs.

To verify the above sensor concept, a first prototype has been manufactured in a 130nm CMOS technology using the architecture of Fig. 21.6.4 (left). Here, the joint array output is frequency downconverted from the high ESR frequency (in this prototype adjustable between  $\sim 11.8\text{GHz}$  and  $14.2\text{GHz}$ ) to an IF around 100 MHz (to avoid injection locking between the array and the LO) using a second on-chip VCO. The LO oscillator signal is also made available off-chip in a frequency divided ( $N_{\text{div}} = 64$ ) form to facilitate its embedding into a PLL.
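The frequency plan of the readout can be written down in two lines. In the sketch below, the 13GHz array frequency and the 12.9GHz LO frequency are example values chosen to give the 100MHz IF mentioned in the text; only the divider ratio of 64 is taken from the paper.

```python
N_DIV = 64   # on-chip LO divider ratio quoted in the text


def intermediate_frequency_hz(f_array_hz, f_lo_hz):
    """IF obtained by mixing the joint array output with the LO."""
    return abs(f_array_hz - f_lo_hz)


def divided_lo_hz(f_lo_hz, n_div=N_DIV):
    """Frequency of the divided LO signal made available off-chip."""
    return f_lo_hz / n_div


F_ARRAY_HZ = 13.0e9   # example array frequency inside the tuning range
F_LO_HZ = 12.9e9      # example LO frequency, offset to avoid injection locking

print(f"IF = {intermediate_frequency_hz(F_ARRAY_HZ, F_LO_HZ) / 1e6:.1f} MHz")
print(f"divided LO = {divided_lo_hz(F_LO_HZ) / 1e6:.4f} MHz")
```

With these example values, the divided LO output lies near 200MHz, a convenient input frequency for an external PLL.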

As a first experiment with this prototype, its frequency noise (FN) spectrum has been measured and compared against that of a single VCO similar to the one presented in [3]. The results of Fig. 21.6.3 show an improvement factor of  $\sim 10$  (theoretical prediction:  $\sqrt{N_{\text{coils}}} = \sqrt{8}$ ) in FN of the array compared to a single VCO.
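The theoretical scaling can be reproduced numerically. The sketch below evaluates the predicted improvement for $N = 8$ and checks it with a small Monte Carlo experiment. Treating the joint phase perturbation of the locked array as the mean of $N$ independent, equally strong perturbations is a simplifying assumption made here for illustration and is not the authors' circuit model.

```python
import math
import random
import statistics


def pn_improvement_db(n_osc):
    """Predicted phase-noise reduction in power for N locked oscillators, in dB."""
    return 10.0 * math.log10(n_osc)


def fn_improvement_factor(n_osc):
    """Predicted reduction of the frequency-noise amplitude, sqrt(N)."""
    return math.sqrt(n_osc)


def simulated_variance_ratio(n_osc, n_samples=20_000, seed=0):
    """Variance of one oscillator's perturbation divided by the variance of
    the averaged perturbation of n_osc independent oscillators."""
    rng = random.Random(seed)
    single = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    joint = [sum(rng.gauss(0.0, 1.0) for _ in range(n_osc)) / n_osc
             for _ in range(n_samples)]
    return statistics.pvariance(single) / statistics.pvariance(joint)


N_COILS = 8
print(f"predicted PN improvement: {pn_improvement_db(N_COILS):.2f} dB")
print(f"predicted FN improvement: {fn_improvement_factor(N_COILS):.2f}x")
print(f"simulated variance ratio: {simulated_variance_ratio(N_COILS):.2f}")
```

The simulated variance ratio comes out close to $N$, consistent with a noise-power reduction by the number of array oscillators and an amplitude reduction by its square root.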

Then, the experimental setup of Fig. 21.6.4 (right) has been used to perform a number of experiments in the target ESR application. The corresponding results are shown in Fig. 21.6.5, where Fig. 21.6.5 (left) shows the spectrum of a 4 pL DPPH sample (per coil), which was used to estimate the spin sensitivity, Fig. 21.6.5 (center) shows the spectrum of a 5mM sample of TEMPOL, a common ESR spin trap (estimated CS 200µM), and Fig. 21.6.5 (right) shows measurements which prove the possibility to use the AM and FM array outputs for simultaneously measuring spatially resolved and global spectra. In the latter experiments, two different samples (DPPH and solid TEMPOL) were placed on two different array coils. In the resulting spectra shown in Fig. 21.6.5 (right), the FM output shows both the TEMPOL (big peak on the left) and the DPPH peak, while the AM output shows the remotely located TEMPOL sample with greatly ( $\gg 10\times$ ) reduced intensity. The key performance metrics extracted from these measurements are compared against the state-of-the-art in IC-based ESR detection in Fig. 21.6.6, highlighting the greatly improved sensitive volume of the proposed approach, without a (significant) degradation in spin sensitivity, which results in an overall greatly improved concentration sensitivity. In this comparison, only the FM output is used.

Overall, an approach towards simultaneously achieving excellent spin and concentration sensitivity in ESR experiments has been presented, which utilizes arrays of miniaturized IL VCOs to produce a reduced FN floor compared to individual VCOs and at the same time extends the excellent spin sensitivity of the small-size detectors over a large sensitive volume for an improved concentration sensitivity.

### References:

- [1] T. Yalcin, et al., "Single-Chip Detector for Electron Spin Resonance Spectroscopy," *Rev. Sci. Instrum.*, 79(9), pp. 1-6, 2008.
- [2] J. Anders, et al., "K-Band Single-Chip Electron Spin Resonance Detector," *J. Magn. Reson.*, 217, pp. 19-26, 2012.
- [3] J. Handwerker, et al., "A 14GHz Battery-Operated Point-of-Care ESR Spectrometer Based on a 0.13$\mu$m CMOS ASIC," *ISSCC Digest Tech. Papers*, pp. 476-477, Feb. 2016.
- [4] T. Mitsunaka, et al., "CMOS Biosensor IC Focusing on Dielectric Relaxations of Biological Water with 120GHz and 60GHz Oscillator Arrays," *ISSCC Digest Tech. Papers*, pp. 478-479, Feb. 2016.
- [5] P. Kinget, "Amplitude Detection Inside CMOS LC Oscillators," *IEEE ISCAS*, 4 pp., 2016.
- [6] Y. Xuebei, et al., "A Single-Chip Dual-Mode CW/Pulse Electron Paramagnetic Resonance Spectrometer in 0.13$\mu$m SiGe BiCMOS," *IEEE IMS*, pp. 1-4, 2013.



Figure 21.6.1: Illustration of the ESR working principle and concept drawing of the proposed coupled VCO array for the detection of spin traps in ESR.



Figure 21.6.2: Schematic of the top row of the IL oscillator array (black) and circuit model of the coupling between an electron spin ensemble and the VCOs.



Figure 21.6.3: Left to right: Comparison of the simulated array sensitivity for different sample distributions over the 8-element array and the sensitivity of the single-element VCO-based ESR detector presented in [3]. Comparison of the measured frequency noise of the presented injection-locked 8-element array and the design presented in [3].



Figure 21.6.4: Left to right: Architecture of the presented prototype of an injection-locked VCO array for ESR detection. Experimental setup used to perform all ESR experiments.



Figure 21.6.5: Left to right: Measured ESR spectra of a DPPH sample with equal size placed on a single coil and 4 array coils, respectively. Measured spectrum of a 5mM sample of the conventional ESR spin trap TEMPOL (sample volume: 90nl). Measured ESR spectra for two small samples of DPPH and solid TEMPOL placed on two different array coils recorded with the global FM output and the AM output of the VCO containing the DPPH sample.

|                                                      | This work       | [3]             | [1]                | [2]             | [6]                                         |
|------------------------------------------------------|-----------------|-----------------|--------------------|-----------------|---------------------------------------------|
| Field strength [ $\text{T}$ ]                        | 0.5             | 0.5             | 0.35               | 0.96            | 0.034                                       |
| Operating frequency [ $\text{GHz}$ ]                 | 11.8 – 14.2     | 13.2 – 14.3     | 9                  | 27              | 0.77 – 0.97                                 |
| Spin sensitivity [ $\text{spins}/\sqrt{\text{Hz}}$ ] | $8 \times 10^9$ | $4 \times 10^9$ | $1 \times 10^{10}$ | $2 \times 10^8$ | Not specified                               |
| Sensitive volume                                     | 120 nl          | 27 nl           | 1 nl               | 1 nl            | External micro resonators of different size |
| ASIC power consumption [mW]                          | 15/ch           | 15              | 160                | 75              | not specified                               |
| Complete spectrometer system size and complexity     | low             | low             | high               | high            | low                                         |
| Technology                                           | 0.13 μm CMOS    | 0.13 μm CMOS    | 0.35 μm CMOS       | 0.13 μm CMOS    | 0.13 μm BiCMOS                              |

Figure 21.6.6: Table comparing the presented work with the state-of-the-art in IC-based ESR detection. Only papers showing measured ESR data are included in the comparison.



Figure 21.6.7: Annotated chip micrograph of the ESR-on-a-chip array ASIC.

# Session 22 Overview: *Gigahertz Data Converters*

## DATA CONVERTER SUBCOMMITTEE



**Session Chair:**  
*Kostas Doris*  
NXP, Eindhoven, The Netherlands



**Associate Chair:**  
*Jan Westra*  
Broadcom, Bunnik, The Netherlands

**Subcommittee Chair:** *Un-Ku Moon*, Oregon State University, Corvallis, OR

Extensive calibrations, the use of FinFET technology and architectural innovations continue to push the bandwidth and dynamic range envelopes of high-speed data converters. This session covers gigahertz data converters with resolutions from 8b up to 16b and sampling rates up to 72GS/s.

In the first paper, an ADC designed in 14nm CMOS extends hierarchical time-interleaving techniques combined with high-speed low-power SAR ADCs and extensive calibrations to enable a record sampling rate for 8b converters with only 235mW power dissipation.

In the second paper, a Nyquist DAC designed in 16nm CMOS combines a current-source calibration scheme with dynamic element matching to push the boundaries of linearity across the whole Nyquist range.

Finally, a hybrid DAC introduces a bandpass Delta-Sigma modulator architecture with pre-distortion techniques for agile signal generation of RF carrier signals up to 6GHz.



8:30 AM

**22.1 A 24-to-72GS/s 8b Time-Interleaved SAR ADC with 2.0-to-3.3pJ/conversion and >30dB SNDR at Nyquist in 14nm CMOS FinFET**

*L. Kull, IBM Zurich Research Laboratory, Rueschlikon, Switzerland*

In Paper 22.1, IBM presents a 24 to 72GS/s 8b ADC in 14nm CMOS that achieves 30dB SNDR at Nyquist. This ADC employs hierarchical interleaving with 16 parallel sampling switches each driving 4 sub ADCs. The 64 asynchronous 8b SAR ADCs use separate comparators per bit and are extensively calibrated for offset, gain and timing errors.



9:00 AM

**22.2 A 16b 6GS/s Nyquist DAC with IMD <-90dBc up to 1.9GHz in 16nm CMOS**

*C-H. Lin, Broadcom, Irvine, CA*

In Paper 22.2, Broadcom presents a 16b 6GS/s Nyquist DAC in 16nm FinFET technology. Utilizing calibration and Dynamic Element Matching, this DAC achieves an IMD <-90dBc up to 1.9GHz and SFDR >80dBc up to 900MHz, occupying an area of 0.52mm<sup>2</sup>, while consuming 350mW.



9:30 AM

**22.3 A 16b 12GS/s Single/Dual-Rate DAC with Successive Bandpass Delta-Sigma Modulator Achieving <-67dBc IM3 Within DC-to-6GHz Tunable Passbands**

*S. Su, University of Southern California, Los Angeles, CA*

In Paper 22.3, the University of Southern California presents a 16b 12GS/s bandpass Delta-Sigma DAC in 65nm CMOS whose center frequency is tunable up to 6GHz. A timing and amplitude error correction scheme combined with inverse-sinc pre-distortion enables it to achieve an IM3 from -85 to -67dBc up to Nyquist and an SFDR >60dBc at 4.2GHz.


## 22.1 A 24-to-72GS/s 8b Time-Interleaved SAR ADC with 2.0-to-3.3pJ/conversion and >30dB SNDR at Nyquist in 14nm CMOS FinFET

Lukas Kull<sup>1</sup>, Danny Luu<sup>1,2</sup>, Christian Menolfi<sup>1</sup>, Matthias Braendli<sup>1</sup>, Pier Andrea Francese<sup>1</sup>, Thomas Morf<sup>1</sup>, Marcel Kossel<sup>1</sup>, Alessandro Cevrero<sup>1</sup>, Ilter Ozkaya<sup>1,3</sup>, Thomas Toifl<sup>1</sup>

<sup>1</sup>IBM Zurich Research Laboratory, Rueschlikon, Switzerland

<sup>2</sup>ETH Zurich, Zurich, Switzerland; <sup>3</sup>EPFL, Lausanne, Switzerland

Optical communication standards, such as ITU OTU-4, OIF 112G and 100/400Gb/s Ethernet, require ADCs with more than 50GS/s and at least 5 ENOB to enable complex digital equalization, and a growing number of appropriate designs have been presented [1-4], mostly time-interleaved SAR ADCs. Most of these ADCs were not intended for input frequencies up to Nyquist and report an input range up to approximately 20GHz, often equivalent to the analog 3dB bandwidth. Ultimately, the analog bandwidth is less relevant than SNDR at high frequencies because an FIR filter can equalize amplitude degradation, but not increase SNDR. The design presented in this paper does not focus on the 3dB bandwidth, but it is optimized for best SNDR at the Nyquist frequency of up to 36GHz. Low power and area are critical for many applications and are achieved by an optimized SAR that allows low supply voltages while still maintaining high speed and accuracy. At 72GS/s, the ADC achieves 39.3dB at low input frequencies and 30.4dB at Nyquist. It consumes 235mW at 72GS/s and 97mW at 48GS/s, which results in 3.3pJ and 2.0pJ per conversion, respectively. The ADC is implemented in an area of 0.15mm<sup>2</sup> in 14nm CMOS FinFET technology.
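The energy-per-conversion figures quoted above follow directly from the reported power and sampling rate. The short Python check below reproduces them; the function name is illustrative and not taken from the paper.

```python
def energy_per_conversion_pj(power_mw: float, fs_gsps: float) -> float:
    """Energy per conversion in pJ.

    mW / (GS/s) = 1e-3 J/s / (1e9 1/s) = 1e-12 J, i.e. picojoules.
    """
    return power_mw / fs_gsps


e72 = energy_per_conversion_pj(235.0, 72.0)  # reported operating point at 72GS/s
e48 = energy_per_conversion_pj(97.0, 48.0)   # reported operating point at 48GS/s
print(f"72GS/s: {e72:.2f} pJ/conversion, 48GS/s: {e48:.2f} pJ/conversion")
```

Rounded to one decimal place, the two results agree with the 3.3pJ and 2.0pJ stated in the text.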

The differential input of the ADC in Fig. 22.1.1 is protected by reduced ESD diodes connected to a T-Coil and terminated by  $2 \times 50\Omega$ . It directly connects to 16 parallel sampling switches [1] that each feed buffered samples to 4 sub-ADCs. The output of the 64 sub-ADCs is captured by an on-chip shift-register-based memory that stores 16,384 samples. Only 4 CMOS clock phases at quarter rate are required to drive the ADC. In this test chip, these phases are derived from a half-rate clock with a CML divider, skew-adjusted and converted to CMOS outside the ADC macro. A block inside the ADC generates all internal clock signals, including sampling and sub-sampling clocks. Inline demux switching as in [3] would result in a second pole and steeper amplitude roll-off, and is therefore not favorable to achieve higher SNDR beyond the 3dB bandwidth.
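To make the interleaving hierarchy concrete, the following sketch derives the per-stage rates from the numbers given above (16 sampling switches, 4 sub-ADCs each, quarter-rate input clocks, 16,384-sample memory). The even split of the memory across sub-ADCs is an assumption for illustration, and all names are hypothetical.

```python
N_SAMPLERS = 16            # parallel sampling switches
SUBADCS_PER_SAMPLER = 4    # sub-ADCs fed by each sampling switch
MEMORY_SAMPLES = 16_384    # on-chip capture memory depth


def interleaver_rates(fs_gsps: float) -> dict:
    """Per-stage rates of the two-level interleaver for a given aggregate rate."""
    n_sub = N_SAMPLERS * SUBADCS_PER_SAMPLER
    return {
        "sub_adcs": n_sub,
        "sampler_rate_gsps": fs_gsps / N_SAMPLERS,
        "sub_adc_rate_gsps": fs_gsps / n_sub,
        "input_clock_ghz": fs_gsps / 4,  # quarter-rate CMOS clock phases
        # Assumption: the capture memory is divided evenly between sub-ADCs.
        "samples_per_sub_adc": MEMORY_SAMPLES // n_sub,
    }


rates = interleaver_rates(72.0)
for name, value in rates.items():
    print(f"{name}: {value}")
```

At 72GS/s each sampling switch therefore runs at 4.5GS/s and each SAR sub-ADC at 1.125GS/s, which is what allows low supply voltages on the SAR array.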

A single NMOS switch with feed-through compensation samples the input signal onto a sampling capacitor. A source follower as in [3] buffers the sampled voltage and connects to 4 bootstrapped switches, which sub-sample the voltage onto the SAR ADCs. A reset transistor is activated after sub-sampling to eliminate ISI.

Generating low-duty-cycle sampling clocks is challenging as low jitter and high skew accuracy are required for good SNDR performance at high input frequencies. To reduce skew, each of the 4 input clock signals is used to derive 4 sampling clocks with a minimal number of components, as shown in Fig. 22.1.2. A pass-gate driven by a slower enable signal  $en_{16}<0>$  and its complement  $enb_{16}<0>$  is used to cut out every 4<sup>th</sup> pulse of the incoming ¼-rate clock  $ck_4<0>$ . Two inverters buffer the signal to drive the sampling switch with a short rise and fall time. A small mismatch from the routing of  $ck_4$  to the 4 pass-gates and  $V_t$ -mismatch of pass-gate and inverter transistors accumulate to a few 100fs skew between channels. Several transistors with connected capacitors in front of the first inverter enable fine-grained skew compensation. Placing transistors above the capacitors instead of below towards ground results in a higher on-off ratio of the capacitive load on  $ck_{16in}<0>$  after layout extraction. A total of 5b on the skew calibration was found to be sufficient to cover mismatch with 10-20fs skew compensation steps, which is much less than the expected random jitter standard deviation.
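As a rough plausibility check of the skew calibration range, the sketch below computes the span covered by a 5b setting with 10-20fs steps and, separately, the SNR bound that a given rms timing error imposes on a sine-wave input. The bound uses the standard relation SNR = -20 log10(2π f_in σ_t), which is a textbook formula rather than something stated in the paper; the function names are illustrative.

```python
import math


def skew_dac_range_fs(bits: int, step_fs: float) -> float:
    """Total adjustment span of a linear skew DAC with the given step size."""
    return (2 ** bits - 1) * step_fs


def timing_limited_snr_db(f_in_hz: float, sigma_t_s: float) -> float:
    """SNR bound for a full-scale sine sampled with rms timing error sigma_t."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_t_s)


span_min = skew_dac_range_fs(5, 10.0)   # 5b, 10fs steps
span_max = skew_dac_range_fs(5, 20.0)   # 5b, 20fs steps
print(f"skew calibration span: {span_min:.0f} fs to {span_max:.0f} fs")

# Example only: a 10fs rms residual timing error at a 36GHz input.
snr_10fs = timing_limited_snr_db(36e9, 10e-15)
print(f"SNR bound for 10fs rms at 36GHz: {snr_10fs:.1f} dB")
```

The resulting span of roughly 300-600fs is consistent with the "few 100fs" channel skew that the calibration has to cover.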

Clock signals  $ck_{16}<n>$  sample the input, reset sampled voltages, and generate clock signals for the bootstrapped switches connecting source followers to the SAR ADCs. Dynamic logic reduces the load on  $ck_{16}<10>$  to generate these sub-sampling clocks. A loop of 2 small inverters eliminates drift at low-speed operation caused by leakage. A 2<sup>nd</sup> pass-gate driven by  $en_{64}<0>$  and  $enb_{64}<0>$  cuts out pulses of  $ck_{16sub}<0>$  to form the clock signals  $ck_{64}$  for the bootstrapped switches.

The SAR ADC in Fig. 22.1.3 stores the sub-sampled signal on a binary capacitive DAC (CDAC). The CDAC is fully differential for the first 4b to prevent large common-mode variations at the input to the sense-amp latch comparators. Bit 5 of the CDAC has only a capacitor on one side, resulting in a small common-mode step. The smaller units of ½ and ¼ capacitors are implemented as single-sided with a reduced reference voltage of ½ and ¼. Implementing the last 3b as non-differential and with fractional references reduces the size of the CDAC significantly. Eight comparators are used to eliminate a demux stage towards the CDAC to accelerate operation. The same clock that drives the bootstrapped switch triggers the 1<sup>st</sup> comparator. Each comparator triggers the next one, and the last one triggers the reset and calibration logic. All comparators are alternatingly offset-compensated in an auto-zero step after reset at the end of the conversion period. Asynchronous operation of the SAR ADC ensures maximum conversion speed at low power and excellent metastability performance [5].
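The successive-approximation principle with one comparator decision per bit can be sketched as follows. This is an idealized, single-ended behavioral model under stated assumptions: it ignores the differential CDAC, the fractional references of the last bits, comparator offset and the asynchronous timing of the actual design, and all names are hypothetical.

```python
def sar_convert(v_in: float, v_ref: float = 1.0, n_bits: int = 8):
    """Idealized unipolar SAR conversion.

    Returns the output code and the list of per-bit comparator decisions
    (MSB first), mimicking a chain in which comparator k resolves bit k
    and then hands over to the next comparator.
    """
    code = 0
    decisions = []
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)                 # tentatively set this bit
        dac_level = trial * v_ref / (1 << n_bits)  # ideal binary CDAC level
        decision = v_in >= dac_level               # dedicated comparator for this bit
        if decision:
            code = trial
        decisions.append(int(decision))
    return code, decisions


code, bits = sar_convert(0.3)
print(code, bits)
```

Because each bit has its own comparator, no result has to be multiplexed back to a single comparator between bit trials, which is the speed benefit described above.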

The SAR ADC offset is background-calibrated on chip. The offset between SAR channels is subtracted off chip. Each SAR ADC has its own resistive DAC to set the reference voltage of each CDAC separately, thus enabling global gain calibration and inter-channel mismatch compensation while providing high noise isolation to neighboring SAR ADCs. ADC internal timing mismatch on the 16 sampling channels is compensated as described in Fig. 22.1.2. All calibration settings for this prototype are derived from off-chip sine fitting.

The ADC shows best efficiency if the supply of the interleaver ( $V_{DDI}$ ) and the SAR ADCs ( $V_{DDA}$ ) are set separately. A higher supply on the interleaver yields a higher bandwidth for a given sampling speed because of the higher overdrive on the sampling switches. The supply on the SAR ADCs can be set as low as possible as long as the SAR ADCs still complete their operation. At 72GS/s, the power consumption of 235mW splits into 77mW for the interleaver and 158mW for the SAR ADC array. At 48GS/s, VDD and interleaver source follower bias were set for optimal efficiency, resulting in a total power consumption of 97mW, equivalent to 2.0pJ/conversion.

Figure 22.1.4 shows that the ADC is fully operational from 24GS/s to 76GS/s. SNDR at Nyquist input frequency (½ sampling speed) drops significantly and is in line with the SNDR vs. input frequency plot of Fig. 22.1.5. Power consumption strongly depends on the supply voltages, as can be seen from the lower half of Fig. 22.1.4, and is almost proportional to the sampling frequency. Loss in amplitude at higher input frequencies is the main source of SNDR degradation, see Fig. 22.1.5. The 3dB frequency depends on the sampling speed and interleaver supply and is above 20GHz. Even though the ADC signal is down by 10-12dB at 40GHz, the ADC still achieves around 4.5 ENOB, sufficient for most applications. The power spectrum in Fig. 22.1.6 at 72GS/s shows that an SFDR of 44dB is achieved.

The ADC is manufactured in a 14nm CMOS FinFET process and occupies 250×590µm<sup>2</sup>, see Fig. 22.1.7. One SAR ADC occupies only 80×12µm<sup>2</sup>, but resistive DACs, local configuration memory and routing increase the size of the sub-ADC array to 80% of the total area.

The comparison table in Fig. 22.1.6 shows recent advances in high-speed CMOS ADCs. This design shows the highest SNDR up to and beyond 20GHz, and is the first design to report an input frequency range >25GHz. Its FoM is 3x better than that of previous designs. The total area of the ADC is 3x smaller than the smallest >40GS/s 8b ADC [6].
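The figure-of-merit entries for this work can be cross-checked from the reported power, sampling rate and SNDR. The sketch assumes the table uses the conventional Walden definition, FoM = P / (2^ENOB · f_s) with ENOB = (SNDR − 1.76dB) / 6.02dB; the paper does not spell out the formula, so this is an assumption, and the function names are illustrative.

```python
def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_mw: float, fs_gsps: float, sndr_db: float) -> float:
    """Walden figure of merit in fJ per conversion step.

    mW / (GS/s) gives pJ per conversion; the factor 1000 converts to fJ.
    """
    return 1000.0 * power_mw / (fs_gsps * 2.0 ** enob(sndr_db))


fom_48_low = walden_fom_fj(97.0, 48.0, 40.9)    # 48GS/s, low input frequency
fom_72_low = walden_fom_fj(235.0, 72.0, 39.3)   # 72GS/s, low input frequency
fom_72_nyq = walden_fom_fj(235.0, 72.0, 30.4)   # 72GS/s, near Nyquist
print(f"{fom_48_low:.1f}, {fom_72_low:.1f}, {fom_72_nyq:.1f} fJ/conv.-step")
```

Rounded to integers, these values match the 22, 43 and 121 fJ/conv.-step listed for this work in Fig. 22.1.6, which supports the assumed definition.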

### References:

- [1] Y. M. Greshishchev, et al., "A 40GS/s 6b ADC in 65nm CMOS," ISSCC, pp. 390–391, Feb. 2010.
- [2] Fujitsu Semiconductor Europe, "LUKE-ES 55–65 GSa/s 8 bit ADC", Mar. 2012, Accessed in Sept. 2017, <http://www.fujitsu.com/downloads/MICRO/fme/documentation/c63.pdf>.
- [3] L. Kull, et al., "A 90GS/s 8b 667mW 64× interleaved SAR ADC in 32nm digital SOI CMOS," ISSCC, pp. 378–379, Feb. 2014.
- [4] J. Cao, et al., "A transmitter and receiver for 100Gb/s coherent networks with integrated 4×64GS/s 8b ADCs and DACs in 20nm CMOS," ISSCC, pp. 484–485, Feb. 2017.
- [5] L. Kull, et al., "A 10b 1.5GS/s pipelined-SAR ADC with background second-stage common-mode regulation and offset calibration in 14nm CMOS FinFET," ISSCC, pp. 474–475, Feb. 2017.
- [6] B. Murmann, Stanford University, "ADC Performance Survey 1997-2017," Accessed in Sept. 2017, <http://www.stanford.edu/~murmann/adcsurvey.html>.



Figure 22.1.1: ADC block diagram with schematic of the front-end sampler.



Figure 22.1.2: Sampling and sub-sampling clock generation with skew compensation.



Figure 22.1.3: Asynchronously clocked SAR ADC with binary capacitive DAC based on unit capacitors.



Figure 22.1.4: Measured SNDR and power vs. sampling frequency.



Figure 22.1.5: Measured SNDR and amplitude vs. input frequency. Corresponding VDD settings are given in Fig. 22.1.4.



| Specifications                                   | [1]      | [2]           | [3]    | [4]    | This work (48GS/s; 72GS/s) |
|--------------------------------------------------|----------|---------------|--------|--------|----------------------------|
| Architecture                                     | Ti-SAR   | Ti-SAR        | Ti-SAR | Ti-SAR | Ti-SAR                     |
| CMOS Technology (nm)                             | 65       | 40            | 32     | 20     | 14                         |
| Resolution (bits)                                | 6        | 8             | 8      | 8      | 8                          |
| Sampling Speed (GS/s)                            | 40       | 65            | 90     | 64     | 48; 72                     |
| Supply Voltage (V)                               | 1.0/1.2* | $\pm 0.9/1.8$ | 1.2    | 1.2    | 0.65/0.75*; 0.8/0.9*       |
| Input Range (V<sub>pp-diff</sub>)                | 1.2      | 0.7           | 0.8    | 0.5    | 0.65                       |
| SNDR low f<sub>n</sub> (dB)                      | 34.9     | 36.1**        | 36.0   | 39.7   | 40.9; 39.3                 |
| SNDR high f<sub>n</sub> (dB)                     | 25.2     | 33.0          | 32.5   | 34.4   | 30.4                       |
| f<sub>n</sub> for SNDR high f<sub>n</sub> (GHz)  | 18       | 19.9          | 19     | 24.1   | 36.1                       |
| 3dB Bandwidth (GHz)                              | -        | 20            | 22     | 20     | 21                         |
| Power (mW)                                       | 1500     | 1200          | 667    | 950    | 97; 235                    |
| FOM low f<sub>n</sub> (fJ/conv.-step)            | 829      | 355           | 144    | 188    | 22; 43                     |
| FOM high f<sub>n</sub> (fJ/conv.-step)           | 2512     | 203           | 300    | 47     | 121                        |
| Area (mm<sup>2</sup>)                            | 16       | 0.45          | -      | -      | 0.15                       |

\*SAR ADCs/Interleaver & Clocking

\*\* SNDR(FS): 8GHz, -6dBFS

Figure 22.1.6: Power spectrum near Nyquist for 72GS/s sampling frequency and performance comparison table.



Figure 22.1.7: Chip micrograph and layout in 14nm CMOS FinFET.

## 22.2 A 16b 6GS/s Nyquist DAC with IMD <-90dBc up to 1.9GHz in 16nm CMOS

Chi-Hung Lin, Koon Lun Jackie Wong, Tae-Youn Kim, Guangxi Ray Xie, Donald Major, Greg Unruh, Sunny Raj Dommaraju, Hans Eberhart, Ardie Venes

Broadcom, Irvine, CA

Advanced communication systems require DACs with high linearity over a wide bandwidth while consuming low power and small area [1] - [6]. In this work, a 16b 6GS/s Nyquist current-steering DAC in 16nm CMOS is presented. Utilizing bounded INL calibration and thermometer DEM to tackle voltage and timing errors, this DAC achieves an INL<+/-0.25LSB, IMD<-90dBc up to 1.9GHz and SFDR>80dBc up to 900MHz while occupying an area of 0.52mm<sup>2</sup> and dissipating 350mW from 1.0V and 3.0V supplies.

Figure 22.2.1 shows a block diagram and main cell of the DAC. The DAC cell contains a cascoded current source (CS), current switches, thick-oxide output cascodes and bleeding currents. Traditionally, current calibration was done by comparing the CS to a reference current, which yielded the best DNL. However, it is the INL that determines the linearity of the DAC, especially at low frequencies. To calibrate the DAC INL without complex hardware, we applied a bounded INL calibration by making the reference current adjustable at a higher resolution.

LSB (b9-b4) calibration is applied before MSB calibration and uses a similar procedure. The bounded INL calibration for the 63 thermometer-coded MSBs (b15-b10) proceeds as follows:

Step 1: Calibrate the reference current,  $i_{ref}$ , by comparing it to the sum of all LSBs + 1LSB. The calibrated reference current is thus equal to the target MSB current,  $i_D$ , and the DNL between LSBs and MSBs is minimized. Once the  $i_{ref}$  is calibrated, the LSBs are disconnected from the positive input of the current comparator, and one of the MSBs is connected instead. In the meantime, the reference current,  $i_{ref}$ , is kept at the negative input of current comparator. The offset of the comparator is included in  $i_{ref}$ , and gets cancelled out.

Step 2: Construct the ideal transfer function for 63 MSBs based on  $i_D$ .

Step 3: Tune  $i_{ref}$  to match the 1<sup>st</sup> MSB current,  $i_1$ . The digital code of  $i_{ref}$  is the digital representation of the uncalibrated current  $i_1$ . Assuming for now that  $i_{ref}$  has infinite precision, the desired correction would be  $(i_D - i_1)$ . Calibrate  $i_1$  by applying a current source adjustment, CS Adj, to make it closest to  $i_D$ . After calibration, the residual INL,  $inl_1$ , is no greater than half the step size of CS Adj. In this work, CS Adj has 8b programmability with a step size of 0.5 LSB of the DAC resolution. This yields a total correcting range of +/-64 LSB at 16b levels. To reduce the non-linear effect of CS Adj, the calibrated current  $i_{1c}$  is re-measured by tuning  $i_{ref}$  to re-calculate  $inl_1 = i_D - i_{1c}$ .

Step 4: Tune  $i_{ref}$  to match the 2<sup>nd</sup> MSB current,  $i_2$ . The desired correction is  $inl_1 + (i_D - i_2)$ , which accumulates  $inl_1$  from the previous MSB and the error of the current MSB. It is then corrected by a quantized CS Adj<sub>2</sub>, and the calibrated  $i_{2c}$  is re-measured. The residual INL,  $inl_2 = inl_1 + (i_D - i_{2c})$ , is again determined by the quantization error of CS Adj, which will be within +/-0.5 step of CS Adj.

Step 5: Repeat step 4 for all remaining MSB currents.

Step 6: Distribute the full-scale current difference, $inl_{63}$, which can be as large as 0.5 step of CS Adj, across the 63 MSB currents. This very small error can be safely ignored in the LSB segments. After calibration, the entire INL curve will be bounded by +/-0.25 LSB of 16b.
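Steps 3 to 5 can be illustrated with a small behavioral model that compares the bounded INL calibration against a conventional per-cell (DNL-style) calibration. The sketch makes several assumptions that are not from the paper: Gaussian cell mismatch with a 5 LSB standard deviation, an ideal (infinite-precision) measurement through $i_{ref}$, a perfectly linear CS Adj with a signed 8b code range, and no modeling of Step 6. All names are hypothetical.

```python
import random

LSB_PER_MSB = 1024            # one thermometer MSB cell equals 2**10 LSBs at 16b
ADJ_STEP = 0.5                # CS Adj step in 16b LSBs (from the text)
ADJ_MIN, ADJ_MAX = -128, 127  # assumed signed 8b CS Adj code range


def quantize_adj(desired: float) -> int:
    """Nearest CS Adj code for a desired correction, clipped to 8b."""
    return max(ADJ_MIN, min(ADJ_MAX, round(desired / ADJ_STEP)))


def bounded_inl_calibration(currents, i_d=LSB_PER_MSB):
    """Correct each cell for its own error plus the INL accumulated so far."""
    inl_prev, inl = 0.0, []
    for i_k in currents:
        desired = inl_prev + (i_d - i_k)               # Step 4: accumulated error
        i_kc = i_k + quantize_adj(desired) * ADJ_STEP  # calibrated cell current
        inl_prev = inl_prev + (i_d - i_kc)             # residual INL after this cell
        inl.append(inl_prev)
    return inl


def dnl_calibration(currents, i_d=LSB_PER_MSB):
    """Conventional approach: trim every cell towards i_d independently."""
    running, inl = 0.0, []
    for i_k in currents:
        i_kc = i_k + quantize_adj(i_d - i_k) * ADJ_STEP
        running += i_d - i_kc                          # residuals add up cell by cell
        inl.append(running)
    return inl


rng = random.Random(1)
cells = [LSB_PER_MSB + rng.gauss(0.0, 5.0) for _ in range(63)]
bounded = bounded_inl_calibration(cells)
conventional = dnl_calibration(cells)
print("max |INL| bounded:     ", max(abs(x) for x in bounded))
print("max |INL| conventional:", max(abs(x) for x in conventional))
```

In this model the bounded scheme keeps every residual within half a CS Adj step (0.25 LSB) because each correction absorbs the previous residual, whereas the independently trimmed cells let their quantization residuals accumulate along the transfer curve.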

Quantization of $i_{ref}$ with finite precision limits the minimum calibrated INL. Figure 22.2.2 shows this impact. With equal precision for $i_{ref}$ and CS, this INL calibration approximates conventional DNL calibration due to the same quantization errors. As seen in Fig. 22.2.2, when $i_{ref}$ has 5b higher resolution than CS, the impact of its quantization errors becomes negligible and the INL will be bounded by +/-0.25 LSB. Measured INL in blue confirms the simulated predictions in red. There are many ways to realize the additional 5b accuracy in $i_{ref}$. In this work the 8b CS Adj is reused and dithered with a sequence that has a tunable duty-cycle. An average of 4K samples is used to suppress the noise to achieve 13b accuracy. Because only the linearity within 0.5LSB from $i_D$ matters, digital dithering is best suited to achieve high linearity at small swing.
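The way duty-cycle dithering adds 5b of resolution to an 8b setting can be sketched as follows. The accumulator-based toggling pattern is an assumption chosen for illustration (the paper only states that the sequence has a tunable duty cycle), and measurement noise, which the 4K-sample average is meant to suppress, is not modeled.

```python
def dithered_sequence(code: int, frac_32: int, n: int = 4096):
    """Toggle between `code` and `code + 1` with duty cycle frac_32 / 32.

    A first-order accumulator spreads the `code + 1` samples evenly,
    so every block of 32 samples contains exactly frac_32 of them.
    """
    if not 0 <= frac_32 < 32:
        raise ValueError("frac_32 must be a 5b value")
    acc, seq = 0, []
    for _ in range(n):
        acc += frac_32
        if acc >= 32:
            acc -= 32
            seq.append(code + 1)
        else:
            seq.append(code)
    return seq


seq = dithered_sequence(code=100, frac_32=5)
average = sum(seq) / len(seq)
print(average)  # effective setting between two adjacent 8b codes
```

Averaged over 4096 samples, the effective setting is `code + frac_32 / 32`, i.e. 32 sub-steps between adjacent 8b codes, which corresponds to the 13b accuracy mentioned above.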

A well-matched DAC can usually achieve high linearity up to several hundred MHz. Once the signal frequency is in the GHz range, the effect of timing mismatch,  $\Delta t$ , takes over and limits the DAC performance. Besides timing calibration [3-4], dynamic element matching, DEM, has been shown to effectively handle  $\Delta t$  issues through averaging [5]. However, during DEM operation, not only does the timing mismatch get averaged, but the voltage mismatch also gets averaged resulting in an elevated noise floor. By performing DEM after a DAC has gone through current calibration, the impact on the noise floor can be substantially reduced.

Traditionally, the complexity of DEM grows exponentially with the number of bits. Our 2D thermometer coded DEM combines column-and-row thermometer-coded logic [6] with local DEM [5] to minimize the glitch energy and substantially increases the randomness with only moderate hardware cost. The DEM is segmented into two groups: 6b MSB (b15-b10) and next 3b (b9-b7). The 6b MSB inputs are first grouped into 3b columns and 3b rows that then get converted to thermometer-coded columns ( $C_j$ ) and rows ( $R_i$ ).  $C_j$  and  $R_i$  are shuffled independently by a local DEM engine as shown in Fig. 22.2.1. Note that the logic to turn on a cell is  $C_{j+1} + C_j * R_i$ . Considering a non-shuffling 2-dimensional decoder,  $C_{j+1}$  can easily tap off from the next column to the right. As in Fig. 22.2.1,  $C_3$  is sitting at the right side of the  $C_2$ . However, when  $C_0$  to  $C_7$  are shuffled,  $C_3$  is no longer sitting next to  $C_2$ . Therefore, an extra local DEM is needed to shuffle  $C_{j+1}$  in the same way as  $C_j$ . A copy of  $C_3$  is sent to the 3<sup>rd</sup> column input of local DEM ( $C_{j+1}$ ), and is brought to the last column along with  $C_2$ . To shuffle Local DEM ( $C_j$ ) and ( $C_{j+1}$ ) in the same way, the same PRBS seed, rand\_c, is used. Row DEM uses a different seed, rand\_r. The rand\_c and rand\_r are generated by two independent PRBS-23 blocks with different initial values. The implementation of this 2D DEM engine only requires three 3b DEM blocks, and can operate at high speed with relatively low power due to its simplicity. The next 3b is shuffled by a 3b DEM like the one used in the local DEM shown in Fig. 22.2.1. The last 7 LSBs are binary weighted and not shuffled.
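The cell-enable logic $C_{j+1} + C_j * R_i$ and the need to shuffle $C_j$ and $C_{j+1}$ identically can be checked with a short behavioral model. The sketch assumes an 8×8 grid of cell positions, thermometer conventions $C_j = 1$ for column code $\geq j$ and $R_i = 1$ for row code $> i$, and uses random permutations as a stand-in for the PRBS-driven local DEM engines; these details and all names are illustrative rather than taken from the paper.

```python
import random

N = 8  # 3b column code and 3b row code give 8 x 8 cell positions


def decode_cells(code: int, col_perm=None, row_perm=None):
    """Return an N x N matrix (rows x columns) of on/off cell states."""
    if not 0 <= code < N * N:
        raise ValueError("code must be a 6b value")
    col, row = code >> 3, code & 7
    c = [int(col >= j) for j in range(N)]           # C_j
    c_next = [int(col >= j + 1) for j in range(N)]  # C_{j+1}
    r = [int(row > i) for i in range(N)]            # R_i
    col_perm = list(range(N)) if col_perm is None else col_perm
    row_perm = list(range(N)) if row_perm is None else row_perm
    # C_j and C_{j+1} are shuffled with the SAME permutation (same seed).
    c_s = [c[j] for j in col_perm]
    c_next_s = [c_next[j] for j in col_perm]
    r_s = [r[i] for i in row_perm]                  # rows use a different one
    return [[c_next_s[p] | (c_s[p] & r_s[q]) for p in range(N)]
            for q in range(N)]


def cells_on(code: int, rng: random.Random) -> int:
    """Number of enabled cells for one randomly shuffled decode."""
    col_perm = rng.sample(range(N), N)  # stand-in for rand_c
    row_perm = rng.sample(range(N), N)  # stand-in for rand_r
    return sum(map(sum, decode_cells(code, col_perm, row_perm)))


rng = random.Random(0)
print([cells_on(code, rng) for code in (0, 1, 9, 63)])
```

For every 6b code the number of enabled cells equals the code regardless of the shuffling, which only holds because the copy of $C_{j+1}$ travels with its $C_j$. If the two were permuted differently, a column could be marked full without being the active column's predecessor and the cell count would no longer match the input.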

Figure 22.2.3 shows the measured spectrum of a two-tone sine wave centered at 1.91GHz and sampled at 6GS/s. A two-tone direct digital frequency synthesizer (DDFS) was implemented on-chip to serve as the signal source. The IMD equals -90.68dBc at the worst-case tone. Figure 22.2.4 shows the IMD versus output frequency compared to state-of-the-art multi-GS/s CMOS DACs. This DAC achieves an IMD<-90dBc up to 1.9GHz and an IMD<-80dBc up to 3.9GHz. Figure 22.2.5 shows the SFDR results. This DAC achieves an SFDR>80dBc up to 900MHz and an SFDR>70dBc up to 2.3GHz. A performance summary and a comparison with prior art are shown in Fig. 22.2.6. Figure 22.2.7 shows the die micrograph.

### References:

- [1] C. Erdmann, et al., "A 330mW 14b 6.8GS/s Dual-Mode RF DAC in 16nm FinFET Achieving -70.8 dBc ACPR in a 20MHz Channel at 5.2GHz", ISSCC, pp. 280-281, Feb. 2017.
- [2] S. Su, et al., "A 12b 2GS/s Dual-Rate Hybrid DAC with Pulsed Timing-Error Pre-Distortion and In-Band Noise Cancellation Achieving >74dBc SFDR up to 1GHz in 65nm CMOS", ISSCC, pp. 456-457, Feb. 2016.
- [3] E. Bechthum, et al., "A 5.3GHz 16b 1.75GS/s Wideband RF Mixing-DAC Achieving IMD<-82dBc up to 1.9GHz", IEEE JSSC, vol. 51, no. 6, pp. 1374-1384, June 2016.
- [4] H. Van de Vel, et al., "A 240mW 16b 3.2GS/s DAC in 65nm CMOS with <-80dBc IM3 up to 600MHz", ISSCC, pp. 206-207, Feb. 2014.
- [5] W.-T. Lin, et al., "A 12-bit 40nm DAC Achieving SFDR > 70 dB at 1.6 GS/s and IMD < -61 dB at 2.8 GS/s With DEMDRZ Technique", IEEE JSSC, vol. 49, no. 3, pp. 708-717, Mar. 2014.
- [6] C.-H. Lin, et al., "A 12 bit 2.9GS/s DAC with IM3<-60dBc Beyond 1GHz in 65nm CMOS", IEEE JSSC, vol. 44, pp. 3285-3293, Dec. 2009.



Figure 22.2.1: DAC block diagram with bounded INL calibration and thermometer DEM to tackle voltage and timing errors.



Figure 22.2.2: Simulated and measured INL vs. codes for the 6b MSB cells for various Ref Adj resolutions.



Figure 22.2.3: Measured spectrum of a two-tone sine wave centered at 1.908GHz and sampled at 6GS/s. The IMD equals -90.68dBc at the worst-case tone.



Figure 22.2.4: IMD versus output frequency with and without CAL/DEM compared to state-of-the-art multi-GS/s CMOS DACs.



Figure 22.2.5: SFDR versus output frequency with and without CAL/DEM compared to state-of-the-art multi-GS/s CMOS DACs.

|               |                 | This work                    | [1] ISSCC 2017            | [2] ISSCC 2016 | [3] JSSC 2016             | [4] ISSCC 2014 | [5] ISSCC 2012 |
|---------------|-----------------|------------------------------|---------------------------|----------------|---------------------------|----------------|----------------|
| Process node  | nm              | 16                           | 16                        | 65             | 65                        | 65             | 180            |
| Resolution    | b               | 16                           | 14                        | 12             | 16                        | 16             | 14             |
| Sampling Rate | GS/s            | 6                            | 6.8                       | 2/8            | 1.75                      | 3.2            | 3/6            |
| Supply        | V               | 1.0/3.0                      | -                         | 1.0/2.5        | 1.2/3.3                   | 1.2/3.3        | 1.8/3.3        |
| Full Scale    | mA              | 40                           | 20                        | 16             | 20                        | 20             | 20             |
| Area          | mm <sup>2</sup> | 0.52                         | 0.86                      | 0.57           | 1.6                       | 1              | 4              |
| Power         | mW              | 350                          | 330                       | -              | 380                       | 240            | 600            |
| IMD@1.9GHz    | dBc             | -91                          | -82                       | -              | -82                       | -68            | -68            |
| IMD@3.9GHz    | dBc             | -80                          | -71                       | -              | -62                       | -              | -              |
| SFDR@0.4GHz   | dBc             | 88                           | 72                        | 77             | -                         | 62             | 63             |
| SFDR@2GHz     | dBc             | 74                           | 69                        | -              | 62                        | -              | -              |
| NSD@250MHz<br>FS=3.5dBm | dBm/Hz | -165 (DEM=0)<br>-162 (DEM=1) | -                         | -              | -                         | -              | -              |
| NSD@2.6GHz<br>FS= -3.7dBm | dBm/Hz | -162 (DEM=0)<br>-159 (DEM=1) | -160@5.2GHz<br>FS= -10dBm | -              | -165@1.9GHz<br>FS= -12dBm | -              | -              |

Figure 22.2.6: Performance summary and comparison.



Figure 22.2.7: Die micrograph.

## 22.3 A 16b 12GS/s Single/Dual-Rate DAC with Successive Bandpass Delta-Sigma Modulator Achieving <-67dBc IM3 Within DC-to-6GHz Tunable Passbands

Shiyu Su, Mike Shuo-Wei Chen

University of Southern California, Los Angeles, CA

The agile allocation of signal bands over RF frequencies and high in-band spectral purity (both SFDR and NSD) can enable higher-order modulation in high-throughput flexible wireless/wireline transmitters, where signals are often channelized at certain center frequencies. Using a Nyquist DAC to cover the entire signal spectrum is thus unnecessary, as this trades achievable dynamic range for bandwidth. For such applications, a narrowband Nyquist DAC followed by a mixer, an RF mixing DAC [1], or a DAC exploiting higher Nyquist zones [2] is typically adopted; these are limited either by deliverable output power or by nearby spectral images due to lower input data rates. A dual-rate hybrid DAC [3] uses a delta-sigma modulated LSB path to compensate for the non-idealities of the Nyquist path, allowing high linearity and a low noise floor within the Nyquist band while limiting out-of-band quantization noise. Still, this approach only synthesizes baseband signals and is difficult to extend to wide RF spectra due to the DSM OSR requirements.

This paper presents a hybrid DAC with a tunable bandpass DSM that addresses the above DAC architecture constraints for high-linearity low-noise waveform synthesis over wide frequency spans. To achieve high-speed operation, a successive pipelined bandpass DSM structure is applied with shortened critical paths. We also use inverse-Sinc-shaped digital pre-distortion (DPD) to overcome roll-off from the finite DSM clock frequency, extending the effective calibration bandwidth by ~6X compared to the DPD scheme in [3]. The bandpass DAC architecture is reconfigurable for dual-rate hybrid operation, similar to [3] except for a different noise transfer function (it gains higher linearity at the cost of a narrower Nyquist band compared to single-rate operation). A proof-of-concept 16b 12GS/s prototype is implemented in 65nm CMOS that achieves <-170dBm/Hz in-band NSD and -85 to -67dBc IM3 over a DC-to-6GHz Nyquist band, both lower than those of state-of-the-art high-rate (>10GS/s) CMOS DACs.

Figure 22.3.1 shows the DAC implementation; 8 channels of 16b digital data are generated via 16 on-chip DDSs, each operating at 1.5GHz. A data-rate control module is used to: a) split an input word into 4b MSB and 12b LSB for DSM; and b) adjust the data rate of the MSB path to be 1, 2, 4, or 8x lower than the LSB path for dual-rate hybrid mode operation. This allows tradeoffs between DAC linearity and spectral-image location. (The higher the MSB data rate, the farther the image frequency.) The MSB paths are decoded to thermometer form and randomized via data weighted averaging (DWA). The mismatched amplitude and timing errors of those MSB current cells are measured and stored in on-chip memory and used by the inverse-Sinc-shaped DPD module. The output of the DPD is 13b wide and compressed to 4b via the successive pipelined 2<sup>nd</sup>/3<sup>rd</sup>-order bandpass DSM. The DSM is only applied to the LSB portion of the input data, so the shaped noise of this hybrid DAC is far lower than conventional single-bit DSM DACs. The passbands of the bandpass DSM are also reconfigurable among 8 of 64 possible locations across the whole Nyquist band. Delay equalization is then used to align the MSB and LSB data phases, which are serialized into a single-channel 12GS/s data stream via a MUX controlled by multi-phase clocks and then synchronized by the CML latches prior to the current-steering cells.
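The 4b/12b word split and the thermometer decoding of the MSB path can be illustrated with a minimal sketch. It assumes unsigned 16b input words; the function names are hypothetical, and DWA, DPD, and the dual-rate hold are not modeled.

```python
def split_word(word, msb_bits=4, total_bits=16):
    """Split an unsigned input word into its MSB and LSB segments."""
    lsb_bits = total_bits - msb_bits
    msb = word >> lsb_bits                 # top 4 bits drive the Nyquist (MSB) path
    lsb = word & ((1 << lsb_bits) - 1)     # bottom 12 bits go to the DSM (LSB) path
    return msb, lsb


def thermometer(msb, msb_bits=4):
    """Decode an MSB code into 2^msb_bits - 1 unary cell controls."""
    n_cells = (1 << msb_bits) - 1
    return [1 if i < msb else 0 for i in range(n_cells)]


msb, lsb = split_word(0xABCD)
cells = thermometer(msb)
```

With a 4b MSB segment the decoder drives 15 unit cells, and the number of active cells equals the MSB code.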

High-speed DSM operation is crucial in this DAC architecture to tune the center frequency of the bandpass response over a wide frequency range, but conventional pipelining/unrolling techniques are often inapplicable for feedback systems with nonlinear truncation operations. Reference [4] proposes a high-speed lowpass DSM architecture, but that method cannot be applied in bandpass DSM design because: a) the operation along the feedback loop cannot be pipelined; and b) the critical path grows with the unrolling operation. This work presents a successive pipelined DSM structure that divides the DSM into 3 stages, each compressing the incoming bit width by only a small amount with the scaling factor shown in Fig. 22.3.2. This sharply reduces the critical path of the DSM from 13b to 3b arithmetic operations in each stage and allows high-speed operation. The 3b quantization noise from each stage is also correlated; the total quantization noise at the last-stage output is thus equivalent to that of a single-stage 13b DSM, as confirmed by our analysis and simulation. In each stage, the 3 LSBs of input data are processed by the error feedback DSM, while the other MSBs are added at the stage output, where extra pipelining is applied. To further shorten the critical path, the feedforward path of the DSM is pipelined and the same latency is inserted in the output adder for delay equalization, as shown in Fig. 22.3.2. The center frequency of the DSM is selected by changing the 3 coefficients ($a_1 \sim a_3$) of the feedback multipliers and enabling/disabling the chopping mixer inserted at the beginning/end of the DSM chain. In the multiplier implementation, parallel shift-and-sum blocks and a multiplexer are utilized for high-speed operation. In chopping mode, the input signal is chopped twice with $F_{\text{dsm}}/2$, while the DSM quantization noise is only chopped once, mirroring the shaped noise profile about $F_{\text{dsm}}/4$, where $F_{\text{dsm}}$ is the data rate of a single-channel DSM. Critical path and multiplier complexity are thus reduced due to replicated channels via chopping.

Figure 22.3.3 shows the inverse-Sinc-shaped DPD scheme. Reference [3] introduced a DPD scheme leveraging the fine timing and amplitude resolution of a DSM, which approximates the short timing pulse with digital pulses generated by the oversampling clock of the DSM. The approximated error pulses have wider duration compared with the actual error pulses and are equivalent to the actual errors shaped by a Sinc filter with the first notch frequency at  $F_s$ , where  $F_s$  is the overall data rate of DAC. The approximation is thus only accurate within the  $F_s/8$  band and deviates further from the actual error as frequency increases. This imposes severe limits for the intended bandpass DAC operation, as the signal can operate close to  $F_s/2$ . The inverse-Sinc-shaped DPD scheme further processes the approximated error pulses with an inverse-Sinc filter to compensate the signal attenuation due to Sinc roll-off, especially for high-frequency harmonics. The inverse-Sinc filter is implemented with an FIR structure, where the coefficients are limited to the power of two for simplicity while maintaining sufficient accuracy for the expected filter response. As long as the filter gain can compensate the Sinc attenuation, the harmonic reduction will be enhanced.
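To give a feel for the roll-off that the inverse-Sinc filter has to compensate, the sketch below evaluates the magnitude of a Sinc response whose first notch is at $F_s$, as stated above. It is a plain numerical illustration; the FIR coefficients of the actual filter are not given in the text and are not reproduced here, and the function names are hypothetical.

```python
import math


def sinc_gain(f_over_fs):
    """|sin(pi f/Fs) / (pi f/Fs)|: Sinc response with first notch at Fs."""
    x = math.pi * f_over_fs
    return 1.0 if x == 0 else abs(math.sin(x) / x)


def to_db(gain):
    return 20.0 * math.log10(gain)


for frac in (1 / 8, 1 / 4, 1 / 2):
    loss_db = to_db(sinc_gain(frac))
    print(f"f = {frac:.3f} Fs: Sinc gain {loss_db:+.2f} dB, "
          f"inverse-Sinc gain needed {-loss_db:+.2f} dB")
```

The attenuation is roughly 0.2dB at $F_s/8$ but close to 3.9dB at $F_s/2$, which is consistent with the statement that the uncompensated approximation is only accurate at low frequencies.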

The test chip is fabricated in 65nm CMOS and packaged in a 64-pin QFN. The DAC output is differentially loaded with a  $50\Omega$  resistor and delivers an output current of 16mA with the data rate set to 12GS/s. Sixteen 16b 1.5GS/s DDSs are implemented on-chip to synthesize full-scale digital sinusoids for single- or two-tone tests. Amplitude and timing errors are each measured by leveraging their orthogonality [5]. The inverse-Sinc-shaped DPD is capable of correcting the amplitude and timing errors within 250nA and 50fs accuracy, respectively. The NSD is measured <-170dBm/Hz in the passbands with DPD and <-128dBm/Hz in the out-of-bands.

Figure 22.3.4 shows spectrum snapshots with a 3015MHz signal frequency at 9GS/s and the measured SFDR in different frequencies/operation modes. The DWA and DPD can improve the SFDR by >18dB at low frequencies and maintain the improvement of 5–9 dB close to Nyquist frequency. The SFDR remains >65dB up to 3.1GHz at 9GS/s and >60dB up to 4.2GHz at 12GS/s.

Figure 22.3.5 shows spectrum snapshots of a two-tone test and the measured IM3 versus signal frequency. The IM3 is measured from -85 to -67dBc over the Nyquist band at 12GS/s and from -87 to -70dBc over the DC-to-3.2GHz band at 9GS/s.

Figure 22.3.6 shows the measured SFDR vs. signal frequency in dual-rate mode with the DSM operating at 9GHz, demonstrating higher SFDR than single-rate mode operation, as expected. The performance is summarized and compared with state-of-the-art DACs in the same figure. Figure 22.3.7 shows the die micrograph.

### References:

- [1] C. Erdmann, et al., "A 330mW 14b 6.8GS/s Dual-Mode RF DAC in 16nm FinFET Achieving -70.8dBc ACPR in a 20MHz Channel at 5.2GHz," *ISSCC Dig. Tech. Papers*, pp. 280–281, Feb. 2017.
- [2] L. Duncan, et al., "A 10b DC-to-20GHz multiple-return-to-zero DAC with >48dB SFDR," *ISSCC Dig. Tech. Papers*, pp. 286–287, Feb. 2017.
- [3] S. Su, et al., "A 12-Bit 2GS/s Dual-Rate Hybrid DAC With Pulsed-Error Pre-distortion and In-band Noise Cancellation Achieving >74dBc SFDR and <-80dBc IM3 up to 1GHz in 65nm CMOS," *IEEE JSSC*, vol. 51, no. 12, pp. 2963–2978, Dec. 2016.
- [4] S. Su, et al., "A 12 bit 1GS/s Dual-Rate Hybrid DAC with an 8GS/s Unrolled Pipeline Delta-sigma Modulator Achieving >75dB SFDR over the Nyquist Band," *IEEE JSSC*, vol. 50, no. 4, pp. 896–907, Apr. 2015.
- [5] Y. Tang, et al., "A 14 bit 200MS/s DAC With SFDR >78dBc, IM3 <-83dBc and NSD <-163dBm/Hz across the Whole Nyquist Band Enabled by Dynamic-mismatch Mapping," *IEEE JSSC*, vol. 46, no. 6, pp. 1371–1381, June 2011.



Figure 22.3.1: Architecture of the single/dual-rate bandpass DSM DAC.



Figure 22.3.2: Implementation of the tunable successive bandpass DSM.



Figure 22.3.3: Inverse-Sinc shaped pre-distortion scheme.

Figure 22.3.4: Measured spectra for  $F_{\text{sig}} = 3015\text{MHz}$  in single-rate mode and measured SFDR over the Nyquist band with/without pre-distortion.

Figure 22.3.5: Measured spectra for two-tone test and measured IM3 over the Nyquist band.



Figure 22.3.6: Measured SFDR in dual-rate hybrid mode and performance comparison with state-of-the-art high-speed CMOS DACs.



Figure 22.3.7: Die micrograph.

# Session 23 Overview: *LO Generation*

## RF SUBCOMMITTEE



**Session Chair:**  
**Hyunchol Shin**  
*Kwangwoon University, Seoul, Korea*



**Associate Chair:**  
**Andrea Bevilacqua**  
*University of Padova, Padova, Italy*

**Subcommittee Chair: Piet Wambacq**, imec, Belgium

The session presents LO-generation systems aimed at 5G communications and sub-mm-wave sensing systems. The first three papers focus on mm-wave CMOS LOs for multiband 5G systems and highlight the importance of injection-locked frequency multipliers and accurate quadrature generation for the 28-to-44GHz band. Then, the session continues with a BiCMOS 301.7-to-331.8GHz source and a 4GHz inverse-Class-F CMOS VCO. A quad-core BiCMOS 15GHz VCO is presented next, while a 7.4-to-14GHz CMOS PLL concludes the session.



10:15 AM

### 23.1 A -31dBc Integrated-Phase-Noise 29GHz Fractional-N Frequency Synthesizer Supporting Multiple Frequency Bands for Backward-Compatible 5G Using a Frequency Doubler and Injection-Locked Frequency Multipliers

*H. Yoon, Ulsan National Institute of Science and Technology, Ulsan, Korea*

In Paper 23.1, the Ulsan National Institute of Science and Technology presents a multiband LO generator for 5G systems covering 2.7-to-4.2GHz, 5.2-to-6GHz, and 25-to-30GHz bands and showing -31.4dBc integrated phase noise and 206fs<sub>rms</sub> jitter for a 29GHz carrier.



10:45 AM

### 23.2 A >40dB IRR, 44% Fractional-Bandwidth Ultra-Wideband mm-Wave Quadrature LO Generator for 5G Networks in 55nm CMOS

*F. Piri, University of Pavia, Pavia, Italy*

In Paper 23.2, the University of Pavia presents a circuit capable of precise generation of quadrature signals over a wide frequency range for 5G communication systems. Over the 28-to-44GHz frequency range, the circuit achieves a quadrature error of less than 1°.



11:00 AM

### 23.3 A 22.8-to-43.2GHz Tuning-Less Injection-Locked Frequency Tripler Using Injection-Current Boosting with 76.4% Locking Range for Multiband 5G Applications

*J. Zhang, University of Electronic Science and Technology of China, Chengdu, China*

In Paper 23.3, the University of Electronic Science and Technology of China presents an injection-locked frequency tripler operating from 22.8 to 43.2GHz to support multiband 5G applications. The signal generator achieves -40dBc integrated phase noise at 28GHz with a 41.8mW power consumption.



11:15 AM

### 23.4 A 301.7-to-331.8GHz Source with Entirely On-Chip Feedback Loop for Frequency Stabilization in 0.13µm BiCMOS

*C. Jiang, University of Michigan, Ann Arbor, MI and Cornell University, Ithaca, NY*

In Paper 23.4, the University of Michigan, Cornell University, and STMicroelectronics describe a frequency stabilization technique for mm-wave/THz oscillators. The paper presents a BiCMOS 300GHz oscillator having a very stable frequency output without the need for a PLL or an off-chip crystal and providing a peak output power of -13.9dBm across a 302-to-332GHz range.



11:30 AM

### 23.5 An Inverse-Class-F CMOS VCO with Intrinsic-High-Q 1<sup>st</sup>- and 2<sup>nd</sup>-Harmonic Resonances for 1/f<sup>2</sup>-to-1/f<sup>3</sup> Phase-Noise Suppression Achieving 196.2dBc/Hz FOM

*C-C. Lim, University of Macau, Macau, China and University of Malaya, Kuala Lumpur, Malaysia*

In Paper 23.5, the University of Macau describes a 4GHz very-low-phase-noise inverse-Class-F CMOS VCO that significantly improves phase noise in 1/f<sup>2</sup> and 1/f<sup>3</sup> regions. The VCO exhibits an FOM of 196.2dBc/Hz with a tuning range of 25.5% between 3.5 and 4.5GHz.



11:45 AM

### 23.6 A Quad-Core 15GHz BiCMOS VCO with -124dBc/Hz Phase Noise at 1MHz Offset, -189dBc/Hz FOM, and Robust to Multimode Concurrent Oscillations

*F. Padovan, Infineon Technologies, Villach, Austria*

In Paper 23.6, Infineon Technologies and University of Padova present an ultra-low-phase-noise 15GHz quad-core oscillator. Over a 16% tuning range, the phase noise is -124dBc/Hz at a 1MHz offset, and the oscillator FOM is -189dBc/Hz.




12:00 PM

### 23.7 A 7.4-to-14GHz PLL with 54fs<sub>rms</sub> Jitter in 16nm FinFET for Integrated RF-Data-Converter SoCs

*D. Turker, Xilinx, San Jose, CA*

In Paper 23.7, Xilinx describes a 7.4-to-14GHz PLL in a 16nm FinFET process. The design provides 54fs<sub>rms</sub> jitter, in-band noise of -120dBc/Hz at a 100kHz offset, and a -75.5dBc reference spur with a power dissipation of 45mW.

### 23.1 A -31dBc Integrated-Phase-Noise 29GHz Fractional-N Frequency Synthesizer Supporting Multiple Frequency Bands for Backward-Compatible 5G Using a Frequency Doubler and Injection-Locked Frequency Multipliers

Heein Yoon<sup>1</sup>, Juyeop Kim<sup>1</sup>, Suneui Park<sup>1</sup>, Younghyun Lim<sup>1</sup>, Yongsun Lee<sup>1</sup>, Jooeun Bang<sup>1</sup>, Kyoohyun Lim<sup>2</sup>, Jaehyouk Choi<sup>1</sup>

<sup>1</sup>Ulsan National Institute of Science and Technology, Ulsan, Korea  
<sup>2</sup>FCI, Seongnam, Korea

To address the increasing demand for high-bandwidth mobile communications, 5G technology is targeted to support data-rates up to 10Gb/s. To reach this goal, one of the challenging tasks for wireless transceivers is to generate millimeter-wave (mmW) band LO signals that have an ultra-low integrated phase noise (IPN). The IPN of an LO signal should be reduced to less than -30dBc to satisfy the EVM requirements of high-order modulations, such as 64-QAM. Figure 23.1.1 shows the frequency spectrum for cellular systems, including existing bands below 6GHz and new mmW bands for 5G. A key goal of the evolution of mobile communications is to ensure interoperability with past-generation standards, and this is expected to continue for 5G. Thus, LO generators eventually will be designed to cover existing bands as well as mmW bands. There are many PLLs that can generate mmW signals directly [1,2], but their ability to achieve low IPN is limited. This is because they are susceptible to increases in in-band phase noise due to their large division numbers and in out-of-band phase noise due to the low Q-factors of mmW VCOs. They also require a significant amount of power to operate high-frequency circuits, such as frequency dividers. In addition, they must divide frequencies again to support bands below 6GHz, which consumes additional power.

In this work, we present an LO generator that can support multiple frequency bands concurrently. First, a fractional-*N* PLL uses a low-noise reference-frequency doubler (RFD) and a high-Q VCO to generate an ultra-low-IPN signal in a GHz range. Then, injection-locked frequency multipliers (ILFMs) increase these relatively low frequencies to higher bands without the degradation of IPN. By sharing a low-power frequency-tracking loop (FTL) that corrects frequency drifts, the burden for designing multiple ILFMs is reduced significantly.

Figure 23.1.2 shows the architecture of the LO generator. To minimize the in-band phase noise level of the PLL, the proposed low-noise RFD is used to double the reference frequency ( $f_{REF}$ ). In this work, one of two ILFMs, i.e. ILFM\_×15 or ILFM\_×3, can be used to increase the PLL output frequency to the target band. When one ILFM is selected, the quadrature signals from the divide-by-2 divider are delivered to pulse generators (PGs) that produce pulses that are injected into the QVCOs. To ensure a low IPN while using the minimum power, the two ILFMs share an ultra-low-power FTL, as presented in [3], which continues to correct the frequency drifts of the QVCO ( $f_{QVCO,M}$ ). To detect the deviation of  $f_{QVCO,M}$  ( $M$  is either 3 or 15) from the target frequency, i.e.  $M \cdot f_{INJ}$ , where  $f_{INJ}$  is the frequency of the injection pulses, the FTL compares the overlapped area of  $INJ_M\text{-}I+$  and  $OUT_M\text{-}Q+$  with that of  $INJ_M\text{-}I+$  and  $OUT_M\text{-}Q-$  at the moment of injection of  $INJ_M\text{-}I+$ . If  $f_{QVCO,M}$  deviates from  $M \cdot f_{INJ}$ , the quadrature relationship between the QVCO outputs is momentarily distorted, i.e.  $INJ_M\text{-}I+$  becomes closer to either  $OUT_M\text{-}Q+$  or  $OUT_M\text{-}Q-$ , causing the difference in the two areas. The magnitudes of the areas are converted to DC voltages,  $V_{AO+}$  and  $V_{AO-}$ , and continuously monitored by the loop. Although this work uses two additional QVCOs, they cause no large increase in the silicon area because of the small sizes of the inductors. ILFM\_×3 uses a three-turn inductor, and the inductor for ILFM\_×15 is intrinsically small due to the high resonant frequency.

To achieve ultra-low IPN at mmW frequencies, a low-noise design of the RFD is vital. In general, the noise added by a frequency doubler is proportional to the delay of the new edge from the original edge. Thus, in terms of noise, the best strategy for designing a frequency doubler is to use both the original rising and falling edges of the reference clock. Then, the increase in noise can be minimized since only slight adjustments of the timings of the edges are required to reduce the reference spur. Figure 23.1.3 shows the schematics of the RFD using a dual-PG (DPG) and the duty cycle-correcting loop (DCCL) that continuously calibrates the duty cycle of the signal $S_{0,5}$. The DCCL detects the deviation of the duty cycle of $S_{0,5}$ by comparing the DC levels of $S_{DZ}$ and $S_{DZb}$, i.e. $V_{DZ}$ and $V_{DZb}$, extracted by low-pole RC filters. Since either $V_{DZ}$ or $V_{DZb}$ must be higher than the other when the duty cycle deviates from 50%, the comparator can decide whether the duty cycle should increase or decrease. The correction of the duty cycle is performed by the inverter-based duty corrector, consisting of six delay cells, $D_k$ ($k = 0$ to 5). $D_k$ has two inverters, both of which include slow and fast PMOSs and slow and fast NMOSs. The W/L ratio of the fast transistor is $2^{k+1}$ times that of the slow transistor. According to $UD_k\langle 1:0\rangle$ from the duty-correction logic (DCL), $D_k$ can have one of three configurations. When $UD_k\langle 1:0\rangle$ is '00', the fast PMOSs and the fast NMOSs are used. In this case, the duty cycle is not changed, since both the rising and falling edges of the signal undergo the minimum delay while passing through the inverters. When $UD_k\langle 1:0\rangle$ is '01', the first inverter consists of the fast PMOS and the slow NMOS, whereas the second inverter has the slow PMOS and the fast NMOS, as in the blue lines. Since rising edges are delayed more than falling edges, the duty cycle decreases. When $UD_k\langle 1:0\rangle$ is '10', the inverters are configured as in the red lines, and the duty cycle increases. Since the ratio of the strengths of the fast and the slow paths is doubled as $k$ increases by one, the amount of the change in the duty cycle also is doubled. When decoding $C_{DCC}\langle 6:0\rangle$, the DCL is designed to minimize the number of $UD_k\langle 1:0\rangle$ codes set to '01' or '10' to remove redundant delays, thereby minimizing added noise. To prevent the periodic toggling at $S_{0,5}$ due to the change in $C_{DCC}\langle 6:0\rangle$ in a steady state, a dead zone is implemented using an additional delay cell, $D_{DZ}$, whose step size is between those of $D_0$ and $D_1$. When the comparator output toggles between +1 and -1, only $UD_{DZ}$ is changed, and $C_{DCC}$ is not updated, suppressing the spur at the operating frequency of the DCCL. When the comparator produces consecutive 1s or 0s, $C_{DCC}$ is updated, and the duty cycle of $S_{0,5}$ is readjusted. The precision of the RFD is defined by the step size of $D_{DZ}$, which was 0.45% in this work. To remove the input offset, the auto-zeroing technique was used for the comparator.

The proposed multiband quadrature LO generator is implemented in a 65nm CMOS process and consumes 36.4mW in the ILFM\_×15 mode. Figure 23.1.4 shows the measured PN of the 3.6GHz output signal when the PLL was in the integer-*N* mode. Compared to the case in which the RFD was bypassed, when 120MHz  $f_{REF}$  was doubled by the RFD, the PLL achieved a much lower IPN, i.e. -50.5dBc, due to the reductions in the division number and the in-band phase noise. The level of the spur at 120MHz was -103dBc, which indicated the RFD effectively calibrated the duty cycle. The presence of the spur at 468.8kHz is due to the RFD that operates at that frequency. Figure 23.1.5 shows the measured PNs of the output signals of ILFM\_×15, ILFM\_×3, and the PLL in the fractional-*N* mode. The measured IPN and rms jitter of the 29.22GHz signal were -31.4dBc and 206fs, respectively. The IPN at 5.84GHz was -44.1dBc. The difference in the PNs of the PLL and each ILFM was close to  $20\log(M)$ , showing that the noise added by the ILFMs was negligible. The table in Fig. 23.1.6 compares the performances of mmW fractional-*N* frequency synthesizers. This work provided multiple-band frequencies and achieved the lowest values of IPN, in-band phase noise, reference spur, and FOM<sub>JIT</sub>. To increase the frequency of the output signal of a GHz-range PLL without the use of FTLs, [4] used a chain of ILFMs with small multiplication factors and extremely wide lock ranges, but it had large power consumption and large spurs.
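The reported numbers can be cross-checked with a short calculation. The sketch below assumes that the IPN is a single-sideband quantity (hence the factor of 2 when converting to rms phase) and that the ILFM multiplication factor $M$ is counted from the divide-by-2 injection frequency, as the frequencies in the caption of Fig. 23.1.5 suggest; the function names are hypothetical.

```python
import math


def ipn_to_rms_jitter(ipn_dbc, f_carrier_hz):
    """RMS jitter (s) from single-sideband integrated phase noise (dBc)."""
    phase_rms_rad = math.sqrt(2.0 * 10.0 ** (ipn_dbc / 10.0))
    return phase_rms_rad / (2.0 * math.pi * f_carrier_hz)


def ideal_multiplier_penalty_db(m):
    """Phase-noise increase of a noiseless frequency multiplier by m."""
    return 20.0 * math.log10(m)


f_pll = 3.897e9                    # fractional-N PLL output (Fig. 23.1.5)
f_inj = f_pll / 2.0                # after the divide-by-2 divider (assumed)
f_x3, f_x15 = 3 * f_inj, 15 * f_inj

jitter = ipn_to_rms_jitter(-31.4, 29.22e9)
print(f"x3: {f_x3 / 1e9:.3f} GHz, x15: {f_x15 / 1e9:.3f} GHz")
print(f"jitter from IPN: {jitter * 1e15:.0f} fs")
print(f"ideal penalty: x3 {ideal_multiplier_penalty_db(3):.1f} dB, "
      f"x15 {ideal_multiplier_penalty_db(15):.1f} dB")
```

Under these assumptions the -31.4dBc IPN at 29.22GHz corresponds to a jitter of about 207fs, in line with the measured 206fs.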

#### Acknowledgements:

This work was supported by Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-IT1702-02.

#### References:

- [1] M. Ferriss, et al., "A 13.1-to-28GHz Fractional-N PLL in 32nm SOI CMOS with a  $\Delta\Sigma$  Noise-Cancellation Scheme," ISSCC, pp. 1–3, Feb. 2015.
- [2] A. Hussein, et al., "A 50-to-66GHz 65nm CMOS All-Digital Fractional-N PLL with 220fs<sub>rms</sub> Jitter," ISSCC, pp. 326–327, Feb. 2017.
- [3] S. Yoo, et al., "A PVT-Robust -39dBc 1kHz-to-100MHz Integrated-Phase-Noise 29GHz Injection-Locked Frequency Multiplier with a 600 $\mu$ W Frequency-Tracking Loop Using the Averages of Phase Deviations for mm-Band 5G Transceivers," ISSCC, pp. 324–325, Feb. 2017.
- [4] A. Li, et al., "A 21–48 GHz Subharmonic Injection-Locked Fractional-N Frequency Synthesizer for Multiband Point-to-Point Backhaul Communications," IEEE JSSC, vol. 49, no. 8, pp. 1785–1799, Aug. 2014.



Figure 23.1.1: Frequency spectrum for cellular systems and the proposed single-PLL-based LO generator supporting multiple frequency bands



Figure 23.1.3: Schematics and duty-correction principle of the proposed reference-frequency doubler (RFD).



Figure 23.1.5: Measured phase noises of the output signals of the PLL in the fractional-N mode ( $f_{PLL} = 3.897\text{GHz}$ ), the ILFM\_X3 ( $f_{QVCO,3} = 5.845\text{GHz}$ ), and the ILFM\_X15 ( $f_{QVCO,15} = 29.228\text{GHz}$ ).



Figure 23.1.2: Overall architecture of the proposed LO generator and the principle of the frequency-tracking loop (FTL).



Figure 23.1.4: Measured phase noises of the 3.6GHz-output signals of the PLL in the integer-N mode in three cases, i.e. black: RFD bypassed; blue: RFD on, DCCL off; and red: both RFD and DCCL on.

|                                                | This work                             | ISSCC'15 [1] M. Ferriss        | ISSCC'17 [2] A. Hussein     | ISSCC'13 W. Wu              | JSSC'14 [4] A. Li                       | JSSC'16 T. Siriburanon     |
|------------------------------------------------|---------------------------------------|--------------------------------|-----------------------------|-----------------------------|-----------------------------------------|----------------------------|
| Process                                        | 65nm CMOS                             | 32nm SOI                       | 65nm CMOS                   | 65nm CMOS                   | 65nm CMOS                               | 65nm CMOS                  |
| Architecture                                   | RFD + GHz-PLL + ILFMs                 | Analog/Digital Hybrid PLL      | All-Digital PLL             | All-Digital PLL             | GHz-PLL + ILFM chain                    | 20GHz SS-PLL + 60GHz QILO  |
| Type                                           | Fractional-N                          | Fractional-N                   | Fractional-N                | Fractional-N                | Fractional-N                            | Integer-N                  |
| Quadrature                                     | YES                                   | NO                             | NO                          | NO                          | NO                                      | YES                        |
| Multiple Freq. Bands                           | YES                                   | NO                             | NO                          | NO                          | YES                                     | NO                         |
| Output Freq. (GHz)                             | 25.0 - 30.0<br>5.2 - 6.0<br>2.7 - 4.2 | 13.1 - 28.0                    | 50.2 - 66.5                 | 56.4 - 63.4                 | 20.6 - 48.2<br>10.1 - 18.3<br>3.4 - 6.1 | 55.6 - 65.2                |
| Ref. Freq. $f_{REF}$ (MHz)                     | 120                                   | 104.5                          | 100                         | 100                         | 100                                     | 40                         |
| Jitter<sub>rms</sub> @ Output Freq. in GHz (Integ. Range) | 206fs @ 29.22 (1k - 100MHz)           | 1.03ps* @ 22.25 (10k - 100MHz) | 258fs @ 65.35 (10k - 40MHz) | 590fs @ 61.87 (10k - 10MHz) | 1.02ps** @ 28.5 (10k - 40MHz)           | 290fs @ 60.5 (10k - 40MHz) |
| IPN (dBc) @ Output Freq. in GHz (Integ. Range) | -31.4 @ 29.22 (1k - 100MHz)           | -19.8* @ 22.25 (10k - 100MHz)  | -22.5 @ 65.35 (1k - 40MHz)  | -15.8 @ 61.87 (10k - 10MHz) | -17.8* @ 28.5 (10k - 10MHz)             | -22.2 @ 60.5 (10k - 40MHz) |
| IPN (dBc) Norm. to 28GHz (Integ. Range)        | -31.8 (1k - 100MHz)                   | -17.8* (10k - 100MHz)          | -22.7 (1k - 40MHz)          | -29.9 (10k - 10MHz)         | -22.7 (10k - 10MHz)                     | -17.9* (10k - 40MHz)       |
| In-band noise @ 100MHz (dBc/Hz), at Output Freq. in GHz | -88.6 @ 29.22                         | -71.0 @ 22.25                  | -78.7 @ 65.35               | -75.0 @ 61.87               | -54.0 @ 28.5                            | -78.5 @ 60.5               |
| In-band noise @ 100MHz (dBc/Hz) Norm. to 28GHz | -89.0                                 | -69.0                          | -86.1                       | -81.9                       | -54.1                                   | -85.2                      |
| Reference spur (dBc)                           | -83 @ 29.22GHz                        | NA                             | NA                          | -74                         | -33                                     | -73                        |
| Power Cons. ( $P_{DC}$ ) (mW)                  | 36.4 (X15 mode)<br>35.1 (X3 mode)     | 31.0                           | 46.0                        | 48.0                        | 148.3                                   | 32.0                       |
| Active Area ( $\text{mm}^2$ )                  | 0.95                                  | 0.24                           | 0.45                        | 0.48                        | 2.09                                    | 1.08 w/ pads               |
| $\text{FOM}_{\text{JIT}}$ (dB)\*\*\*           | -238.1                                | -224.8                         | -235.1                      | -227.8                      | -218.1                                  | -235.7                     |

\* Calculated from the measurement results. \*\* Calculated from the PN graph in Fig. 22(b) of [4]. \*\*\* $\text{FOM}_{\text{JIT}} = 10 \log\left[(\sigma_t / 1\text{s})^2 \cdot (P_{DC} / 1\text{mW})\right]$ (dB)

Figure 23.1.6: Performance comparison with mm-wave-band fractional-N frequency synthesizers.
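As a sanity check on the comparison table, the jitter figure of merit and the normalization to 28GHz can be recomputed from the tabulated jitter, power, and carrier frequency. The sketch uses the common definition $10\log[(\sigma_t/1\text{s})^2 \cdot (P_{DC}/1\text{mW})]$ and a $20\log$ frequency scaling; the function names are hypothetical.

```python
import math


def fom_jit_db(jitter_rms_s, power_w):
    """Jitter FOM: 10*log10((sigma_t / 1 s)^2 * (P_DC / 1 mW))."""
    return 10.0 * math.log10(jitter_rms_s ** 2 * (power_w / 1e-3))


def normalize_to_carrier_db(value_db, f_from_hz, f_to_hz=28e9):
    """Scale a phase-noise figure from one carrier frequency to another."""
    return value_db + 20.0 * math.log10(f_to_hz / f_from_hz)


fom_this_work = fom_jit_db(206e-15, 36.4e-3)           # this work, x15 mode
ipn_norm = normalize_to_carrier_db(-31.4, 29.22e9)     # IPN referred to 28GHz
inband_norm = normalize_to_carrier_db(-88.6, 29.22e9)  # in-band noise at 28GHz
print(f"{fom_this_work:.1f} dB, {ipn_norm:.1f} dBc, {inband_norm:.1f} dBc/Hz")
```

For this work the calculation returns values that agree with the tabulated -238.1dB, -31.8dBc, and -89.0dBc/Hz to within rounding.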



| Block                          | Sub-block             | Power (mW) |
|--------------------------------|-----------------------|------------|
| **RFD**                        |                       | **2.5**    |
| **PLL (20.1)**                 | LC-VCO                | 6.0        |
|                                | PFD+CP                | 2.2        |
|                                | Divider + DSM         | 3.0        |
|                                | VCO buf. + Quad. gen. | 8.9        |
| **ILFM ×15 (13.8)**            | QVCO                  | 10.4       |
|                                | PGs                   | 2.5        |
|                                | FTL                   | 0.9        |
| **ILFM ×3 (12.5)**             | QVCO                  | 9.1        |
|                                | PGs                   | 2.5        |
|                                | FTL                   | 0.9        |
| **Total (RFD+PLL+ILFM\_×15)**  |                       | **36.4**   |
| **Total (RFD+PLL+ILFM\_×3)**   |                       | **35.1**   |
Figure 23.1.7: Die micrograph and power-breakdown table.

## 23.2 A >40dB IRR, 44% Fractional-Bandwidth Ultra-Wideband mm-Wave Quadrature LO Generator for 5G Networks in 55nm CMOS

Farshad Piri<sup>1</sup>, Matteo Bassi<sup>1,2</sup>, Niccolò Lacaia<sup>1</sup>, Andrea Mazzanti<sup>1</sup>, Francesco Svelto<sup>1</sup>

<sup>1</sup>University of Pavia, Pavia, Italy

<sup>2</sup>now with Infineon Technologies, Villach, Austria

The development of next-generation 5G networks is ongoing. The large available bandwidth at mm-waves allows increasing channel capacity well beyond the levels offered by LTE. Wide ranges of spectra, with sub-bands centered at 28GHz, 37GHz, and 39GHz, have been appointed for 5G development to facilitate international roaming and intra-network connections [1]. In this scenario, generation of ultra-low phase-noise quadrature (IQ) signals with >40dB image rejection ratio (IRR) over >40% fractional bandwidth is key to efficiently deliver extreme data-rates through high-order spectrally efficient modulations. Quadrature voltage-controlled oscillators are disregarded because of their limited tuning range and a severe trade-off between phase noise and phase accuracy. Solutions leveraging single-phase VCOs followed by quadrature generators are seen as a better strategy. Still, the challenging phase-noise requirement of higher-order modulations, which trades off with tuning range, mandates at least two VCOs, each covering half of the bandwidth. For quadrature generation, distributed couplers, e.g., Lange couplers, are bulky and not amenable to integration. Hybrid couplers based on coupled inductors offer a compact footprint with low loss, but they are disregarded because a few percent variation in the coupling coefficient, k, leads to unacceptable phase deviations. Polyphase filters (PPFs) and their improvements are widely adopted at RF [2]. In [3], the PPF operation at mm-waves is proven through careful layout techniques. Still, wideband operation can be achieved only by cascading several stages, severely increasing signal loss and power consumption.

In this work, a phase detector senses the phase deviation from quadrature of signals generated by a single-stage PPF, and a feedback circuit continuously tunes the filter center frequency to the input signal frequency. In this way, losses intrinsic to the wideband operation can be mitigated, and performance can be achieved over a large fractional bandwidth and over process variations. Prototypes in a 55nm CMOS technology achieve an IRR better than 40dB over the whole 28-to-44GHz frequency range, i.e. a fractional bandwidth of 44%.

The complete diagram is shown in Fig. 23.2.1. A type-a PPF structure [3] is used, which provides zero amplitude imbalance at any frequency but zero phase error only at the center frequency. Resistive elements are realized with four NMOS transistors biased in triode. The phase detector, based on a double-balanced Gilbert cell, senses the quadrature accuracy, amplifies the error, and drives all four gates of the NMOS transistors with the same voltage. In this way, the center frequency of the PPF is always tuned to the input LO frequency, ensuring minimum quadrature error. Monte-Carlo simulations indicate that offsets due to mismatches in the Gilbert-cell phase detector have negligible impact on quadrature error. The PPF is designed such that the center frequency is $f_c=33.5\text{GHz}$ when $V_{tune}=V_{dd}/2$. At the edges of the bandwidth of interest (i.e., 28 to 39GHz), the phase error in open loop, $\phi_{ol}$, is 10 to 15°. To provide an IRR better than 40dB with margin, the phase error in closed loop, $\phi_{cl}$, has to be reduced to ~0.25°, requiring a loop gain $G_{loop}=(\phi_{ol}/\phi_{cl}-1)$ of roughly 32dB. Since the voltage-to-phase-shift gain $G_{PPF}$ of the PPF through $V_{tune}$ is ~150deg/V, and the phase detector gain $G_{PD}$ is 20μA/deg, the transimpedance amplifier is designed to provide a transimpedance gain $G_{TIA}>82\text{dB}$.
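The gain budget quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes a simple DC loop model in which the loop gain is the product $G_{PD}\cdot G_{TIA}\cdot G_{PPF}$, and the function names are hypothetical.

```python
import math

def db20(x):
    """Express a ratio as 20*log10(x), in dB."""
    return 20.0 * math.log10(x)

def required_loop_gain(phi_ol_deg, phi_cl_deg):
    """Loop gain needed to shrink an open-loop phase error to phi_cl."""
    return phi_ol_deg / phi_cl_deg - 1.0

def required_tia_gain_ohm(g_loop, g_ppf_deg_per_v, g_pd_a_per_deg):
    """Transimpedance (ohm) such that G_PD * G_TIA * G_PPF equals g_loop.

    Assumption: the loop gain is the plain product of the three block gains.
    """
    return g_loop / (g_ppf_deg_per_v * g_pd_a_per_deg)

# Values taken from the text: 10 deg open-loop error, 0.25 deg target,
# G_PPF ~ 150 deg/V, G_PD = 20 uA/deg.
g_loop = required_loop_gain(phi_ol_deg=10.0, phi_cl_deg=0.25)
g_tia = required_tia_gain_ohm(g_loop, g_ppf_deg_per_v=150.0, g_pd_a_per_deg=20e-6)

print(f"G_loop = {g_loop:.0f} ({db20(g_loop):.1f} dB)")
print(f"G_TIA  = {g_tia / 1e3:.1f} kOhm ({db20(g_tia):.1f} dB-ohm)")
```

With a 10° open-loop error this gives a loop gain of 39 (about 32dB) and a transimpedance of about 13kΩ (about 82dBΩ), consistent with the figures above; the 15° band-edge case would call for a few dB more.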

Interface buffers are modeled as transconductors with parallel combinations of $C_o$ and $r_o$, and of $C_i$ and $r_i$, respectively, as shown in Fig. 23.2.2(a). Transformers resonate with the buffer and PPF capacitances, introducing complex pole pairs in the transfer function that minimize signal loss while maintaining a wide bandwidth. A higher coupling coefficient k yields a larger separation between the two complex pole pairs, resulting in a wider bandwidth at the expense of increased in-band ripple, as depicted in Fig. 23.2.2(b), where $T(j\omega)=|(Q_p-Q_n)/I_{in}|$ is plotted with $k_i=k_o=k$ as a parameter. Inductors $L_i$ and $L_o$ are selected to resonate with $C_i+C$ and $C_o+C$, respectively, at 33.5GHz. For $k=0.7$, the ripple is limited to 1dB and a bandwidth in excess of 45% is obtained. Once the reactive elements are resonated out, the expression for $T(j\omega_c)$ in Fig. 23.2.2(c) results. By selecting $R = 2\sqrt{r_i r_o}$, an optimum is reached, maximizing the gain for a given bandwidth. The buffers cascaded with the PPF need 12mA, and the simulated voltage gain from the PPF input to the buffer outputs is 8dB.

In a conventional low-frequency PPF design, the layout resembles the device placement shown in Fig. 23.2.3(a). The asymmetric grey interconnect introduces a parasitic inductance $L_{par}$, responsible for a differential phase error between the PPF outputs $I_p$ and $I_n$, as shown in Fig. 23.2.3(c). The output of the phase detector, realized as a double-balanced Gilbert cell [6,7], is shown as a function of the input quadrature phase error in Fig. 23.2.3(d) in the presence of $L_{par}$. Even for $L_{par}$ values as low as 30pH, the transfer function is shifted by -1°, introducing a corresponding systematic closed-loop quadrature phase error. To mitigate this effect, we introduce the PPF floorplan depicted in Fig. 23.2.3(b), where symmetric input lines connect the input transformer to the PPF differential inputs. At the output, the dummy cross $CR_1$ replicates cross $CR_2$, yielding symmetric routing of the PPF components and greatly reducing the systematic quadrature phase error. Full EM simulations were performed on the structure, and the resulting phase-detector transfer function is also plotted in Fig. 23.2.3(d).

Prototypes of the quadrature generator were fabricated in a 55nm CMOS technology by STMicroelectronics. A die photograph is shown in Fig. 23.2.7. The core active area is 590μm×330μm. The input and output buffers consume 36mW from $V_{dd}=1.2\text{V}$, while the calibration loop consumes only 3mW. To measure the IRR of the quadrature generator, the baseband inputs are fed with 500kHz quadrature signals and the output is connected to an Agilent N9030A spectrum analyzer. The LO input power is 0dBm. Figure 23.2.4 shows a screenshot of the output spectrum at $f=36\text{GHz}$, where the IRR is ~43dB and the LO leakage is -23dBc. The same measurement has been repeated on 3 different samples by sweeping the input frequency from 26 to 46GHz with the loop turned ON and OFF. In the former case, $V_{tune}$ is automatically set by the loop, while all the other parameters are kept constant. In the latter case, $V_{tune}$ is manually set to the value at which the loop settles at $f=36\text{GHz}$. Results are shown in Fig. 23.2.5(a), where the IRR is better than 40dB from 28 to 44GHz, demonstrating that the loop recovers more than 20dB of IRR with a minimal penalty in area and power consumption. Figure 23.2.5(b) shows the phase noise at the input and output of the quadrature generator, confirming that this work adds negligible noise to the input signal.
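To relate the measured IRR to the underlying quadrature error, the standard image-rejection expression for a gain ratio $A$ and phase error $\phi$, $\mathrm{IRR}=(1+2A\cos\phi+A^2)/(1-2A\cos\phi+A^2)$, can be evaluated directly. The sketch below is a textbook calculation, not the authors' measurement procedure, and it assumes perfect amplitude balance ($A=1$) unless stated otherwise.

```python
import math

def irr_db(phase_err_deg, gain_ratio=1.0):
    """Image-rejection ratio (dB) for a given I/Q phase error and gain ratio."""
    phi = math.radians(phase_err_deg)
    a = gain_ratio
    num = 1.0 + 2.0 * a * math.cos(phi) + a * a
    den = 1.0 - 2.0 * a * math.cos(phi) + a * a
    return 10.0 * math.log10(num / den)

def max_phase_err_deg(irr_target_db):
    """Largest phase error meeting an IRR target with perfect amplitude balance.

    With gain_ratio = 1 the IRR reduces to cot^2(phi/2).
    """
    return 2.0 * math.degrees(math.atan(10.0 ** (-irr_target_db / 20.0)))

for phi in (15.0, 10.0, 1.0, 0.25):
    print(f"phase error {phi:5.2f} deg -> IRR {irr_db(phi):5.1f} dB")
print(f"40 dB IRR needs |phase error| < {max_phase_err_deg(40.0):.2f} deg")
```

Under these assumptions, open-loop errors of 10 to 15° correspond to an IRR of roughly 18 to 21dB, while 40dB requires an error of about 1.1° or less, which is in line with the improvement of more than 20dB reported with the loop ON.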

Results are summarized and compared to prior work in Fig. 23.2.6. The designs in [4] and [5] feature a high fractional bandwidth but a poor minimum IRR. Compared to the quadrature generators above 20GHz in Fig. 23.2.6, the introduced scheme yields the best quadrature accuracy (IRR >40dB) over a large 44% fractional bandwidth.

### Acknowledgment:

The authors thank the RF Department of HiSilicon for technical and financial support and Integrand Software for the EMX simulator.

### References:

- [1] Federal Communications Commission, "FCC – Use of Spectrum Bands Above 24 GHz For Mobile Radio Services," Accessed July 2016, <https://apps.fcc.gov/edocs_public/attachmatch/FCC-16-89A1.pdf>.
- [2] J. Park and H. Wang, "A Transformer-Based Poly-Phase Network for Ultra-Broadband Quadrature Signal Generation," *IEEE TMTT*, vol. 63, no. 12, pp. 4444-4457, Dec. 2015.
- [3] S. Kulkarni, et al., "Design of an Optimal Layout Polyphase Filter for Millimeter-Wave Quadrature LO Generation," *IEEE TCAS-II*, vol. 60, no. 4, pp. 202-206, Apr. 2013.
- [4] S. Sah, et al., "Design and Analysis of a Wideband 15–35-GHz Quadrature Phase Shifter with Inductive Loading," *IEEE TMTT*, vol. 61, no. 8, pp. 3024-3033, Aug. 2013.
- [5] K. Koh and G. Rebeiz, "0.13-μm CMOS Phase Shifters for X-, Ku-, and K-Band Phased Arrays," *IEEE JSSC*, vol. 42, no. 11, pp. 2535-2546, Nov. 2007.
- [6] E. Monaco, et al., "A 2-11 GHz 7-Bit High-Linearity Phase Rotator Based on Wideband Injection-Locking Multi-Phase Generation for High-Speed Serial Links in 28-nm CMOS FDSOI," *IEEE JSSC*, vol. 52, no. 7, pp. 1739-1752, July 2017.
- [7] L. Wu, et al., "A 4-Path 42.8-to-49.5 GHz LO Generation with Automatic Phase Tuning for 60GHz Phased-Array Receivers," *IEEE JSSC*, vol. 48, no. 10, pp. 2309-2322, Oct. 2013.



Figure 23.2.1: Schematic diagram of the quadrature generator.

Figure 23.2.2: (a) PPF model with input/output buffers; (b) transfer function plot with $k_i=k_o=k$ as parameter; and (c) transfer function expression at resonance as a function of $r_i$, $r_o$, and the PPF resistance $R$.

Figure 23.2.3: (a) Classic layout placement of a PPF filter with parasitic inductor $L_{par}$ highlighted; (b) proposed highly symmetric layout floorplan. (c) Plot of the differential phase error versus $L_{par}$ for the configuration in (a). (d) Comparison between phase detector transfer functions of classic layout in (a) and introduced layout in (b).

Figure 23.2.5: (a) Measured IRR as a function of frequency for 3 samples with the loop ON and the loop OFF (with $V_{tune}$ set for best IRR at 36GHz). (b) Measured phase-noise spectrum at the input and output of the IQ generator.

Figure 23.2.6: Quadrature generator performance comparison.



Figure 23.2.7: Die micrograph.

### 23.3 A 22.8-to-43.2GHz Tuning-Less Injection-Locked Frequency Tripler Using Injection-Current Boosting with 76.4% Locking Range for Multiband 5G Applications

Jingzhi Zhang, Huihua Liu, Chenxi Zhao, Kai Kang

University of Electronic Science and Technology of China, Chengdu, China

Future cross-network and international roaming are attractive in mm-wave fifth-generation (5G) wireless networks with multiband operation. Generating an ultra-wide-bandwidth, ultra-low-phase-noise (PN) local-oscillator (LO) signal in massive multiple-input multiple-output (MIMO) transceivers that support spectra around 28GHz, 37GHz, and 39GHz becomes a significant challenge. An injection-locked frequency tripler (ILFT) is a good candidate for LO generation owing to its low PN, but it suffers from a narrow locking range. Varactors are often used to tune the free-running frequency and thereby increase the bandwidth [1]. However, the PN performance degrades when the target frequency is far from the free-running frequency, which means that a complex calibration mechanism must be applied [2,3]. Moreover, an ILFT with such a self-calibration circuit still suffers from a narrow locking range and cannot support multiband operation. To simplify the system design and meet the multiband requirement, a tuning-less ILFT with an ultra-wide locking range is an appropriate solution for mm-wave multiband 5G applications.

In this work, we propose a multiband ILFT using injection-current boosting (ICB) to improve the locking range. Compared with the injection current ($I_{inj, ICB}$) generated by the injection devices, the effective injection current ($I_{eff, ICB}$) injected into the core oscillator is boosted by a strongly coupled n:1 transformer. Moreover, instead of the LC resonator used in a conventional ILFT, a 6<sup>th</sup>-order weakly coupled transformer resonator is proposed to support multiband operation [4] (top of Fig. 23.3.1). As a result, the locking range is increased significantly even under low injected-power conditions, which may simplify the design of the input driver stage and save power. The circuit realization of the proposed ICB-ILFT is shown on the left of Fig. 23.3.2. With complementary NMOS and PMOS injection devices, the third harmonic is generated by biasing the transistors with a conduction angle of 120° to maximize the third-order transconductance. To support the zero-IF transceiver architecture for 5G systems, a quadrature signal generator is designed based on the proposed ICB-ILFT (bottom of Fig. 23.3.1). A two-stage poly-phase filter (PPF), with buffers compensating for signal loss, is used to generate quadrature injection signals with low phase mismatch and a wide bandwidth. The quadrature injection signals, I+, I-, Q+, and Q-, are injected into the ICB-ILFT to generate the required frequencies. Driving a 50Ω load with sufficient power at low power consumption is challenging with the ICB-ILFT alone. Therefore, a quadrature IL buffer with a 4<sup>th</sup>-order transformer resonator is applied to achieve a wide bandwidth and large gain [1], as shown on the right of Fig. 23.3.2. To avoid injection pulling, a cascode input stage is used to improve the isolation between the input and the output of the IL buffer. A quadrature coupling network [5] connecting the two cross-coupled pairs is applied to further improve the quadrature accuracy.

The locking range is often limited by the “phase condition” as plotted on the top left of Fig. 23.3.3. According to the phasor diagram of the ILFT, there are two methods to increase the locking range: flattening the phase response or increasing the effective injection current. As mentioned in [4,6], a high-order resonator can create a phase plateau and flatten the phase response; a 6<sup>th</sup>-order resonator is selected in this work. The strongly coupled ICB transformer shown in Fig. 23.3.1 is inserted into the circuit to increase the effective injection current. For an ideal n:1 transformer, the current in the secondary coil is n times larger than that in the primary coil, which is the key mechanism used for current boosting. Figure 23.3.3 (top right) gives the simulation results of the current-boosting behavior, in which the ICB-ILFT is compared with a conventional ILFT under the same resonating and start-up conditions. The effective current ($I_{eff, ICB}$, red curve), observed in the core oscillator, is boosted significantly compared to both the injection current ($I_{inj, ICB}$, gray curve) and the injection current in a conventional ILFT ($I_{inj, conv}$, blue curve) that uses devices identical to those of the ICB-ILFT. The effective current-boosting gain ($Gain_{ICB, eff}$, black curve) is larger than 0dB from 29 to 41.5GHz, with a maximum gain of 27dB at 40.2GHz. $Gain_{ICB, eff}$ shows a bandpass behavior and filters out the low-frequency components, which decreases the fundamental current injected into the core oscillator. As a result, the fundamental tone in the output spectrum is rejected; the rejection ratio is plotted on the bottom left of Fig. 23.3.3. With 0dBm injection power, the fundamental tone of the ICB-ILFT is 20dB lower than in the conventional ILFT. The measured input-sensitivity curve is plotted on the bottom right of Fig. 23.3.3. With 0dBm injection power, the locking range extends from 22.8 to 51GHz (76.4% of the center frequency). Although the ICB-ILFT exhibits an in-band loss-of-lock phenomenon [4], caused by the high-order load resonator, when the injection power is lower than -1.5dBm, the locking range still satisfies the system-operation bandwidth requirement.
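The two headline quantities in this paragraph, the fractional locking range and the current gain of an ideal n:1 transformer, are easy to check numerically. The following is a minimal sketch; the turns ratios are hypothetical (the text does not state n), and the ideal-transformer figure ignores the resonant effects that shape the simulated $Gain_{ICB, eff}$.

```python
import math

def fractional_range_pct(f_lo_ghz, f_hi_ghz):
    """Frequency range as a percentage of the center frequency."""
    return 100.0 * (f_hi_ghz - f_lo_ghz) / (0.5 * (f_lo_ghz + f_hi_ghz))

def ideal_current_gain_db(n):
    """Current gain (dB) of an ideal, lossless n:1 transformer."""
    return 20.0 * math.log10(n)

# Locking range and preferred operating range quoted in the text.
print(f"22.8-51.0 GHz -> {fractional_range_pct(22.8, 51.0):.1f} %")
print(f"22.8-43.2 GHz -> {fractional_range_pct(22.8, 43.2):.1f} %")

# Hypothetical turns ratios, for illustration only.
for n in (2, 3, 4):
    print(f"n = {n}: ideal current boost {ideal_current_gain_db(n):.1f} dB")
```

The first two lines reproduce the 76.4% locking range and the 61.8% operating bandwidth listed in Fig. 23.3.6.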

Two chips were fabricated in a 65nm CMOS process. The first chip is a differential ICB-ILFT. It occupies 0.47mm<sup>2</sup> and consumes 5.0mW without a buffer. The second chip is a quadrature signal generator based on the proposed ICB-ILFT. It occupies 0.84mm<sup>2</sup> and consumes 41.8mW. Figure 23.3.4 plots the measured PN. At 28GHz, the quadrature signal generator achieves a PN of -97.75dBc/Hz at 100kHz offset and an integrated PN (IPN, from 1kHz to 10MHz) of -40.03dBc. At 39GHz output frequency, the measured PN and IPN are -94.58dBc/Hz and -36.88dBc, respectively. The PN is measured from 22.8 to 45.6GHz, as shown at the bottom of Fig. 23.3.4, and degrades significantly above 43.2GHz. Thus, the preferred operation bandwidth of the proposed quadrature ICB-ILFT is 22.8 to 43.2GHz, where the PN is low, although its locking range extends from 22.8 to 51GHz as shown in Fig. 23.3.3. The measured output spectra, with fundamental spurs of -47.2dBc at 28GHz and -21.6dBc at 39GHz, are shown in Fig. 23.3.5. The measured fundamental power is less than -50dBm in the proposed frequency bands (bottom of Fig. 23.3.5), which shows the good rejection provided by the ICB transformer. Compared with the results in [3], the proposed ICB-ILFT-based signal generator has a 5.2× larger locking range (61.8%, Fig. 23.3.6) while consuming 2× more power due to the large loss of the two-stage PPF.
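For readers unfamiliar with the IPN metric, the sketch below shows one common way to obtain it: integrating the single-sideband phase-noise profile $\mathcal{L}(f)$ over the stated offset range, with straight-line interpolation on a log-log plot between measured points. This is an assumed post-processing method, not necessarily the one used for the reported numbers (some authors also double the result to account for both sidebands), and the sample profile is made up.

```python
import math

def integrated_phase_noise_dbc(offsets_hz, pn_dbc_hz):
    """Integrate single-sideband phase noise L(f) over the given offsets.

    Each segment is modeled as L(f) = L1 * (f / f1)**s, i.e. a straight line
    on a log-log plot, and integrated in closed form.
    """
    total = 0.0
    pts = list(zip(offsets_hz, pn_dbc_hz))
    for (f1, l1_db), (f2, l2_db) in zip(pts, pts[1:]):
        l1 = 10.0 ** (l1_db / 10.0)  # linear power ratio per Hz
        s = (l2_db - l1_db) / (10.0 * math.log10(f2 / f1))  # segment exponent
        if abs(s + 1.0) < 1e-12:
            total += l1 * f1 * math.log(f2 / f1)  # 1/f segment
        else:
            total += l1 * f1 / (s + 1.0) * ((f2 / f1) ** (s + 1.0) - 1.0)
    return 10.0 * math.log10(total)

# Hypothetical profile from 1 kHz to 10 MHz (NOT the measured data).
offsets = [1e3, 1e4, 1e5, 1e6, 1e7]           # Hz
pn = [-85.0, -92.0, -97.0, -112.0, -130.0]    # dBc/Hz
print(f"IPN = {integrated_phase_noise_dbc(offsets, pn):.1f} dBc")
```

As a sanity check, a flat -100dBc/Hz profile over 1kHz to 10MHz integrates to about -30dBc, since the bandwidth contributes 10·log10(10⁷) ≈ 70dB.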

#### Acknowledgements:

This work was supported by the National Science Fund of China under Grants 61422104, 61771115, and 61331006. The last author is the corresponding author.

#### References:

- [1] Y. Chao, et al., “An 86-to-94.3GHz Transmitter with 15.3dBm Output Power and 9.6% Efficiency in 65nm CMOS,” ISSCC, pp. 346-347, Feb. 2016.
- [2] D. Shin, et al., “A Mixed-Mode Injection Frequency-Locked Loop for Self-Calibration of Injection Locking Range and Phase Noise in 0.13μm CMOS,” ISSCC, pp. 50-51, Feb. 2016.
- [3] S. Yoo, et al., “A PVT-Robust -39dBc 1kHz-to-100MHz Integrated-Phase-Noise 29GHz Injection-Locked Frequency Multiplier with a 600μW Frequency-Tracking Loop Using the Averages of Phase Deviations for mm-Band 5G Transceivers,” ISSCC, pp. 324-325, Feb. 2017.
- [4] J. Zhang, et al., “A 27.9-53.5-GHz Transformer-Based Injection-Locked Frequency Divider with 62.9% Locking Range,” IEEE RFIC, pp. 324-327, June 2017.
- [5] X. Yi, et al., “A 57.9-to-68.3 GHz 24.6mW Frequency Synthesizer with In-Phase Injection-Coupled QVCO in 65 nm CMOS Technology,” IEEE JSSC, vol. 49, no. 2, pp. 347-359, Feb. 2014.
- [6] A. Li, et al., “A 21-48 GHz Subharmonic Injection-Locked Fractional-N Frequency Synthesizer for Multiband Point-to-Point Backhaul Communications,” IEEE JSSC, vol. 49, no. 8, pp. 1785-1799, Aug. 2014.



Figure 23.3.1: Simplified model of the ICB-ILFT (top) and the quadrature-signal-generator architecture (bottom).



Figure 23.3.2: Proposed ICB-ILFT (left) and IL-buffer (right) implementations with their layout views.



Figure 23.3.3: Operating principle of the ICB-ILFT (top left) with the simulated fundamental-rejection properties (bottom left), and current-boosting gain (top right) with the measured input sensitivity curve (bottom right).



Figure 23.3.4: Measured phase-noise plot at output frequencies of 28GHz (top left) and 39GHz (top right), and the phase-noise performance with the phase-noise degradation vs frequency (bottom).



Figure 23.3.5: Measured output spectrum at output frequencies of 28GHz (top left) and 39GHz (top right); the output power and fundamental rejection vs frequency (bottom).

|  | Q. Wu<br>ISSCC'13 | H. Kim<br>RFIC'17 | D. Shin<br>ISSCC'16 [2] | S. Yoo<br>ISSCC'17 [3] | This Work<br>(Chip 1) | This Work<br>(Chip 2) |
|---|---|---|---|---|---|---|
| Injection Signal | N/A | Off-Chip PLL | On-Chip QVCO | Off-Chip PLL | Off-Chip PLL + PPF | Off-Chip PLL + PPF |
| Topology | Fundamental Oscillator | Conventional (Varactor-Tuned) ILFT | Non-Real-Time Calibration ILFT | Real-Time Calibration ILFM | No-Calibration ICB-ILFT | No-Calibration ICB-ILFT |
| Process | 130nm SiGe | 28nm CMOS | 130nm CMOS | 65nm CMOS | 65nm CMOS | 65nm CMOS |
| Phase | Diff. | Quad. | Quad. | Quad. | Diff. | Quad. |
| Output Freq. (GHz) | 27.9 – 37.8 | 25.8 – 28.0 | 26.5 – 29.7 | 27.4 – 30.8 | 22.8 – 43.2 | 22.8 – 43.2 |
| Bandwidth (%)<br>[w/o Calibration] | 30.1 | 8.2<br>[1] | 11.4<br>[1] | 11.7<br>[0.7] | 61.8 | 61.8 |
| PN at 100kHz (dBc/Hz) | -80 @ 28G | -109.6 @ 28G | -86.8 @ 28.5G | -92.6 @ 29.3G | -94.7 @ 28G | -94.7 @ 28G |
| PN at 1MHz (dBc/Hz) | -104 @ 28G | -108.5 @ 28G | -106.7 @ 28.5G | -115.6 @ 29.3G | -114.0 @ 28G | -114.0 @ 28G |
| IPN* (dBc)<br>[Integ. Range] | -28.4 @ 28G<br>[100kHz – 10MHz] | -38.7 @ 28G<br>[100kHz – 250MHz] | -31.8 @ 28.5G<br>[1kHz – 10MHz] | -36.8 @ 29.3G<br>[1kHz – 10MHz] | -40.0 @ 28G<br>[1kHz – 10MHz] | -40.0 @ 28G<br>[1kHz – 10MHz] |
| Total Power (mW) /<br>ILFM Power** (mW) | 10 (VCO Only) | N/A | 49.7 / 23.32 | 24.3 (ILFM Only) | 14.8 / 5.0 | 41.8 / 9.6 |
| Chip Area (mm<sup>2</sup>) /<br>ILFM Area*** (mm<sup>2</sup>) | 1.93 (VCO Only) | 3.8 / 1.67 | 1 / 0.09 | 0.72 / 0.11 | 0.47 / 0.09 | 0.84 / 0.15 |

\*Captured from measured phase noise plot.

\*\*With all auxiliary circuits used in ILFM, including the calibration circuit.

\*\*\*Including all auxiliary circuits used in ILFM and estimated from chip photograph.

Figure 23.3.6: Performances in comparison with prior-art mm-wave signal generators.



Figure 23.3.7: Die micrograph.

### 23.4 A 301.7-to-331.8GHz Source with Entirely On-Chip Feedback Loop for Frequency Stabilization in 0.13μm BiCMOS

Chen Jiang<sup>1,2</sup>, Mohammed Aseeri<sup>3</sup>, Andreia Cathelin<sup>4</sup>, Ehsan Afshari<sup>1</sup>

<sup>1</sup>University of Michigan, Ann Arbor, MI; <sup>2</sup>Cornell University, Ithaca, NY

<sup>3</sup>King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia,

<sup>4</sup>STMicroelectronics, Crolles, France

The THz band offers unique characteristics and great potential in many applications. Harmonic oscillators are commonly used to generate THz signals; however, due to supply noise and changes in the ambient environment, free-running THz oscillators exhibit large spectral linewidth and frequency drift. As a result, frequency stabilization is required in many systems. Conventionally, PLLs are used for this purpose [1-3]; however, major challenges arise when the frequency reaches the THz band. Since the division ratio ($N$) is large in THz PLLs ($10^3$ to $10^4$) and the reference phase noise is multiplied by $N^2$ at the output, a high-quality off-chip crystal oscillator is needed to keep the noise contribution low enough. Moreover, the large $N$ causes significant VCO-noise folding, potentially degrading the in-band output phase noise. Finally, injection-locking frequency dividers (ILFDs) in the THz range provide insufficient locking range, which limits the achievable output frequency range. To boost the ILFD locking range, higher injection power or multiphase injection [1,4] is needed, which significantly increases the power consumption. In this paper, an entirely on-chip frequency-stabilization feedback loop for mm-wave/THz sources is presented. The feedback loop eliminates the need for both frequency dividers and off-chip crystal oscillators, resulting in much lower cost and power consumption. To show the feasibility, a 301.7-to-331.8GHz source is designed in the STMicroelectronics 0.13μm SiGe:C BiCMOS technology.
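The $N^2$ penalty mentioned above translates into $20\log_{10}N$ in dB, which makes the burden on the reference easy to quantify. The short sketch below evaluates it for the quoted range of division ratios; the target in-band output phase noise is a hypothetical number chosen only to illustrate the arithmetic.

```python
import math

def reference_noise_gain_db(n):
    """In-band phase-noise gain from reference to PLL output (power ~ n^2)."""
    return 20.0 * math.log10(n)

def required_reference_pn_dbc_hz(target_output_pn_dbc_hz, n):
    """Reference phase noise needed so that it alone meets the output target."""
    return target_output_pn_dbc_hz - reference_noise_gain_db(n)

target = -75.0  # dBc/Hz, hypothetical in-band output target
for n in (1_000, 10_000):
    print(f"N = {n:>6}: +{reference_noise_gain_db(n):.0f} dB, "
          f"reference must reach {required_reference_pn_dbc_hz(target, n):.0f} dBc/Hz")
```

For $N$ between $10^3$ and $10^4$ the reference noise is raised by 60 to 80dB, which is why a high-quality crystal reference is normally required in a THz PLL.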

The architecture of the prototype source is shown in Fig. 23.4.1. The VCO has a fundamental frequency of 79GHz, and its 2<sup>nd</sup> harmonic is sent to front-end (FE) and back-end (BE) buffer amplifiers. The FE buffer drives a frequency doubler, which generates the 316GHz output. The BE buffer delivers the signal used for frequency detection and for generation of the VCO control feedback. The detailed principle of the frequency-detection scheme is shown in Fig. 23.4.2. The BE buffer output ($P_{BE}$) is split into two signal paths. The upper path includes a bandpass filter, which is implemented with capacitively coupled series resonators. With proper design, its phase-frequency response exhibits a steep slope within the desired bandwidth (Fig. 23.4.2). The phase shift of the lower path does not change significantly with frequency. As a result, the phase difference between $P_3$ and $P_4$ is a strong function of frequency. This means that, after power combining, the output power $P_{det}$ can be used as a frequency-detection product. To ensure that $P_{det}$ is a monotonic function of frequency and that the optimum detection gain is obtained, $TL_3$ is added to adjust the phase shift of the upper path. A bipolar power detector converts $P_{det}$ to $I_{det}$, and $I_{det}$ generates the VCO control-voltage feedback across the RC filter (Fig. 23.4.1). In the lower signal path, $TL_1$ and $TL_2$ transform the impedance to $Z_3=115\Omega$, so that after splitting and experiencing different losses in the two paths, the power ratio at the combiner input ($P_3/P_4$) is ~1. This ratio helps to obtain a near-maximum frequency-detection gain and a minimum dynamic range for $P_{det}$, alleviating the trade-off among responsivity, output noise, and compression point of the power detector. It is noteworthy that the phase-frequency response of this impedance transformation slightly reduces the detection gain and increases the monotonic frequency range of $P_{det}$. With a $P_{BE}$ of 1mW, the $P_{det}$- and $I_{det}$-to-frequency gains are simulated to be ~45.5fW/Hz and ~226fA/Hz, respectively (Fig. 23.4.2). The achieved high gain helps to reduce the noise contribution of the frequency-detection circuit blocks. An off-chip low-noise tunable current $I_{tune}$ is also injected into the RC loop filter to change the steady-state $I_{det}$, thus adjusting the output frequency. In the proposed scheme, the VCO frequency is referenced to the phase-frequency response of the bandpass filter, which is insensitive to supply-voltage noise and ambient environment changes. Since the bandpass filter is made of the top metal with large geometries, it is also much less sensitive to process variation.
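The detection principle, converting a frequency-dependent phase difference into a power level by combining two coherent paths, can be illustrated with an idealized model. The sketch below assumes a lossless, matched two-way in-phase combiner, for which the combined power is $\tfrac{1}{2}(P_3+P_4)+\sqrt{P_3P_4}\cos\Delta\theta$; the normalized powers and the phase slope are hypothetical, and the on-chip combiner and filter will deviate from this ideal.

```python
import math

def combined_power(p3, p4, dtheta_rad):
    """Output power of an ideal 2-way in-phase combiner for coherent inputs."""
    return 0.5 * (p3 + p4) + math.sqrt(p3 * p4) * math.cos(dtheta_rad)

def detection_gain(p3, p4, dtheta_rad, dtheta_df):
    """dP_det/df for a phase difference whose slope is dtheta_df (rad/Hz)."""
    return -math.sqrt(p3 * p4) * math.sin(dtheta_rad) * dtheta_df

# Same total input power, split equally vs. unequally (normalized units).
slope = 1.0  # rad per unit frequency, hypothetical
for p3, p4 in ((0.5, 0.5), (0.8, 0.2)):
    g = detection_gain(p3, p4, math.pi / 2, slope)
    p_min = combined_power(p3, p4, math.pi)
    print(f"P3/P4 = {p3 / p4:.2f}: |gain| = {abs(g):.2f}, min P_det = {p_min:.2f}")
```

In this model an equal split maximizes $\sqrt{P_3P_4}$ for a given total power, which is consistent with the statement that $P_3/P_4\approx 1$ gives a near-maximum detection gain, and $P_{det}$ is monotonic in frequency as long as $\Delta\theta$ stays within one half-period of the cosine.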

To obtain a wider frequency tuning range, a differential Colpitts oscillator [2], shown in Fig. 23.4.3, is designed. A cascode stage $Q_3/Q_4$ is placed on top of the core transistors $Q_1/Q_2$ to provide the required low impedance at the $Q_1/Q_2$ collectors. $M_1/M_2$ supply the bias current. $TL_1/TL_2$ transform the capacitive impedance at the drains of $M_1/M_2$ into a high impedance to reduce the loading on the VCO core. The extracted 2<sup>nd</sup> harmonic is sent to both the FE buffer and the BE buffer. A cascode topology is used in the buffer amplifiers for better reverse isolation and stability. Besides the varactor, the VCO frequency can additionally be tuned by changing the tail NMOS bias, $V_G$. In simulation, 1.3mW of power is generated at the output of the FE buffer to drive the doubler. At the doubler input, a mode-filtering wideband balun [5] is used to convert the single-ended input to balanced waves for 2<sup>nd</sup>-harmonic generation in $Q_1/Q_2$ (Fig. 23.4.3). In this balun, with excitation on the input line, the current flowing on the edges of the gap structure forms a return current beneath the output line, which induces a differential-mode signal on the output ports. To support different potentials on the edges of the gap, two $\lambda/4$ slots are placed on top and bottom, which are folded to reduce size. Since only differential-mode waves are supported, the balun shows negligible phase and amplitude imbalance over a wide bandwidth [5]. The input power is injected into the bases of the transistors differentially, and the 2<sup>nd</sup> harmonic is extracted from the collectors. The output probing pad acts as part of the matching network to reduce signal loss [6].

To measure the performance of the source, a WR-3 waveguide probe is used to probe the output signal. In the frequency measurement, an even-harmonic mixer mixes the probed signal with the 16<sup>th</sup> harmonic of an LO to down-convert it to an IF below 2GHz. The IF signal is then measured with a spectrum analyzer. Figure 23.4.4 shows the measured spectrum. With the frequency-stabilization feedback loop, a much narrower line is observed. The phase noise of the output is also measured in both cases (Fig. 23.4.4). When the VCO is free-running, the phase-noise measurement can only be performed down to an offset frequency of 300kHz because of frequency drift. With the feedback loop, the output frequency is stabilized, and the phase noise can be measured to a much lower offset frequency. At 100kHz and 1MHz offset, the measured phase noise is -72.4dBc/Hz and -78.5dBc/Hz, respectively. To accurately characterize the probed output power of the chip, an Erickson PM4 power meter is used. Figure 23.4.5 shows the measured output power versus output frequency. With the VCO tail bias, $V_G$, changed from 0.5 to 0.65V, the total output frequency range achieved is 301.7 to 331.8GHz, which is ~9.5% with respect to the center frequency of 316GHz. Within this range, the output power varies from -19.1 to -13.9dBm. At different VCO bias points, the total DC power consumption varies from 51.7 to 84.1mW, with a detailed breakdown shown in Fig. 23.4.5. A performance comparison table is given in Fig. 23.4.6. Thanks to the elimination of narrowband ILFDs and power-consuming ILFD locking-range-extending schemes, the source demonstrates the largest output frequency range and the lowest power consumption while achieving phase noise and output power comparable to those of the prior works in Fig. 23.4.6. Owing to the small but non-zero temperature dependence of the dielectric constant of the BEOL insulating layers, the proposed passive frequency-sensing structure has a larger temperature coefficient than crystal oscillators. However, with significantly lower system integration cost and power consumption, this scheme is attractive for many applications that require only a short coherence time, such as short-range mm-wave/THz FMCW radar.
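The frequency plan and the measurement setup described above involve only simple arithmetic, sketched below for convenience. The LO frequency is a hypothetical value chosen so that the IF falls below 2GHz; it is not stated in the text.

```python
def output_frequency_ghz(f_vco_ghz):
    """VCO fundamental -> extracted 2nd harmonic -> frequency doubler."""
    return f_vco_ghz * 2 * 2

def harmonic_mixer_if_ghz(f_rf_ghz, f_lo_ghz, harmonic=16):
    """IF produced by mixing the RF with the given harmonic of the LO."""
    return abs(f_rf_ghz - harmonic * f_lo_ghz)

def tuning_range_pct(f_lo_ghz, f_hi_ghz):
    """Tuning range as a percentage of the center frequency."""
    return 100.0 * (f_hi_ghz - f_lo_ghz) / (0.5 * (f_lo_ghz + f_hi_ghz))

f_out = output_frequency_ghz(79.0)
print(f"output frequency: {f_out:.0f} GHz")

f_lo = 19.7  # GHz, hypothetical LO setting
print(f"IF with 16 x {f_lo} GHz LO: {harmonic_mixer_if_ghz(f_out, f_lo):.2f} GHz")

print(f"tuning range: {tuning_range_pct(301.7, 331.8):.1f} %")
```

The last line reproduces the ~9.5% tuning range reported for the 301.7-to-331.8GHz span.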

#### Acknowledgements:

The authors acknowledge Qualcomm Inc. and King Abdulaziz City for Science and Technology for support, as well as STMicroelectronics for silicon fabrication.

#### References:

- [1] P. Chiang, et al., "A 300GHz Frequency Synthesizer with 7.9% Locking Range in 90nm SiGe BiCMOS," *ISSCC*, pp. 260-261, Feb. 2014.
- [2] R. Han, et al., "A 320GHz Phase-Locked Transmitter with 3.3mW Radiated Power and 22.5dBm EIRP for Heterodyne THz Imaging Systems," *ISSCC*, pp. 446-447, Feb. 2015.
- [3] Y. Zhao, et al., "An Integrated 0.56THz Frequency Synthesizer with 21GHz Locking Range and -74dBc/Hz Phase Noise at 1MHz Offset in 65nm CMOS," *ISSCC*, pp. 36-37, Feb. 2016.
- [4] T. Chi, et al., "A Multi-Phase Sub-Harmonic Injection Locking Technique for Bandwidth Extension in Silicon-Based THz Signal Generation," *IEEE JSSC*, vol. 50, no. 8, pp. 1861-1873, Aug. 2015.
- [5] C. Wang and R. Han, "Rapid and Energy-Efficient Molecular Sensing Using Dual mm-Wave Combs in 65nm CMOS: A 220-to-320GHz Spectrometer with 5.2mW Radiated Power and 14.6-to-19.5dB Noise Figure," *ISSCC*, pp. 302-303, Feb. 2017.
- [6] C. Jiang, et al., "An Efficient 210GHz Compact Harmonic Oscillator with 1.4dBm Peak Output Power and 10.6% Tuning Range in 130nm BiCMOS," *IEEE RFIC*, pp. 194-197, May 2016.



Figure 23.4.1: The 316GHz source with the proposed frequency-stabilization feedback loop.



Figure 23.4.2: The proposed entirely-on-chip frequency-detection scheme (top), the simulated coupled-resonator bandpass-filter frequency response (bottom left) and frequency-detection gain (bottom right).



Figure 23.4.3: VCO and the buffer amplifiers (left) as well as the frequency doubler (right).



Figure 23.4.4: The measured down-converted spectrum and phase noise with and without the frequency stabilization feedback loop.



Figure 23.4.5: The measured output power over the output frequency range (top), as well as the DC power consumption breakdown (bottom).

| References                    | ISSCC 2014 [1]       | ISSCC 2015 [2]              | ISSCC 2016 [3]     | JSSC 2015 [4]         | This Work                  |
|-------------------------------|----------------------|-----------------------------|--------------------|-----------------------|----------------------------|
| Source Type                   | Osc. + PLL           | PLL Inj. Lock. Osc. Array   | Osc. + PLL         | Inj. Lock. Osc. Chain | Osc. + Freq. Det. Feedback |
| Frequency (GHz)               | 280~303              | 317                         | 539~560            | 485~511               | 302~332                    |
| Tuning Range                  | 7.9%                 | \                           | 3.8%               | 5.1%                  | 9.5%                       |
| Frequency Purity              | Phase-Locked         | Phase-Locked                | Phase-Locked       | Free-Running          | Frequency-Stabilized       |
| Phase Noise @ 100kHz (dBc/Hz) | -77.8                | \                           | -71                | \                     | -72.4                      |
| Phase Noise @ 1MHz (dBc/Hz)   | -82.5                | -79                         | -74                | -87                   | -78.5                      |
| Output Power (dBm)            | -14 (Probed)         | 5.2 <sup>†</sup> (Radiated) | -27 (Radiated)     | -16.6 (Probed)        | -13.9 (Probed)             |
| DC Power (mW)                 | 376                  | 610                         | 172                | 388                   | 51.7                       |
| Area (mm <sup>2</sup> )       | 1.6×1.6              | 1.6×1.3                     | 1.8×1.55           | 0.72×0.7 (Core)       | 1×0.85                     |
| Technology ( $f_{max}$ )      | 90nm BiCMOS (315GHz) | 130nm BiCMOS (280GHz)       | 65nm CMOS (240GHz) | 90nm BiCMOS (350GHz)  | 130nm BiCMOS (280GHz)      |

<sup>†</sup> Total power of 16 radiator cells.

Figure 23.4.6: Performance summary and comparison with previously published sources.
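
The tuning-range row of the comparison can be cross-checked against the frequency row. The sketch below assumes the common definition of fractional tuning range (frequency span divided by center frequency). The helper name `tuning_range_percent` is illustrative, and not every cited work necessarily uses this definition.

```python
def tuning_range_percent(f_min_ghz: float, f_max_ghz: float) -> float:
    """Fractional tuning range relative to the center frequency, in percent."""
    center = 0.5 * (f_min_ghz + f_max_ghz)
    return 100.0 * (f_max_ghz - f_min_ghz) / center

# Frequency ranges (GHz) read from the comparison table.
ranges = {
    "ISSCC 2014 [1]": (280.0, 303.0),
    "ISSCC 2016 [3]": (539.0, 560.0),
    "This Work": (302.0, 332.0),
}

for name, (f_lo, f_hi) in ranges.items():
    print(f"{name}: {tuning_range_percent(f_lo, f_hi):.1f}%")
```

Under this definition the three entries come out close to the 7.9%, 3.8% and 9.5% listed in the table.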



Figure 23.4.7: Die micrograph.

### 23.5 An Inverse-Class-F CMOS VCO with Intrinsic-High-Q 1<sup>st</sup>- and 2<sup>nd</sup>-Harmonic Resonances for 1/f<sup>2</sup>-to-1/f<sup>3</sup> Phase-Noise Suppression Achieving 196.2dBc/Hz FOM

Chee-Cheow Lim<sup>1,2</sup>, Jun Yin<sup>1</sup>, Pui-In Mak<sup>1</sup>, Harikrishnan Ramiah<sup>2</sup>, Rui P. Martins<sup>1,3</sup>

<sup>1</sup>University of Macau, Macau, China

<sup>2</sup>University of Malaya, Kuala Lumpur, Malaysia

<sup>3</sup>Instituto Superior Tecnico/University of Lisboa, Lisbon, Portugal

Second-harmonic common-mode (CM) resonance has been explored for LC oscillators to improve their phase noise (PN) in the past. Its implementation evolves from an explicit design [1] that relies on an extra tail tank, to a recent implicit design [2], where the resonator itself offers a CM impedance peak at 2× the oscillation frequency ( $F_{LO}$ ):

Explicit design (Fig. 23.5.1-upper): a high-Q tail tank ( $Q_{TAIL}$ ) is desirable to raise its impedance  $|Z_{TAIL}|$  at  $2F_{LO}$  and to prevent the loss of  $L_{TAIL}$  from penalizing the PN in the  $1/f^2$  region [3]. To compare with the theoretical limit (FOM<sub>MAX</sub>), the FOM in the  $1/f^2$  PN region is plotted against  $L_{TAIL}$  at different  $Q_{TAIL}$ . Closing the gap between FOM<sub>MAX</sub> and FOM imposes an excessive  $Q_{TAIL}$  of 20 or beyond, which can hardly be achieved and maintained over a wide tuning range.

Implicit design (Fig. 23.5.1-lower): a CM resonance at  $2F_{LO}$  can avoid the physical area and loss of the tail tank. Besides, if the 2<sup>nd</sup>-harmonic current is trapped in the resistive impedance at  $2F_{LO}$  due to the CM resonance, the flicker-noise upconversion can be suppressed [4]. Yet, its practical effectiveness is limited by the presence of PN, which renders both  $F_{LO}$  and  $2F_{LO}$  to drift randomly around their resonances. If the 2<sup>nd</sup>-harmonic current spends more time “trapped” in the resistive part of the tank, the effect of flicker-noise upconversion can be further reduced, but it demands a high-Q impedance at  $2F_{LO}$ . Unfortunately, the implicit CM impedance  $|Z_{CM}|$  is relatively low for two reasons: 1) CM inductance  $L_{CM}=L(1-k)$  is small due to magnetic-flux cancellation and 2) the CM resonance has a low Q at  $2F_{LO}$  up to ~1/4 of its differential-mode counterpart [4] (i.e.  $Q_2/Q_1 \approx 0.25$ ). Although lowering k returns a higher  $Z_{CM}$ , more die area is sacrificed, and the required CM capacitance becomes exceedingly small.

This paper proposes an inverse-Class-F CMOS VCO free of CM resonance. A transformer-based dual-band LC resonator generates two intrinsic-high-Q resonances, one at  $F_{LO}$  (quality factor  $Q_1$ ) and one at  $2F_{LO}$  (quality factor  $Q_2$ ). A high  $Q_2$  corresponds to low PN in both the  $1/f^2$  and  $1/f^3$  regions, while the transformer voltage gain helps suppress the thermal noise contributed by the -g<sub>m</sub> transistors. The high-Q resonances lead to high FOMs at both 100kHz offset (191-to-192.5dBc/Hz) and 10MHz offset (195.6-to-196.2dBc/Hz) over 3.49 to 4.51GHz, while showing low frequency pushing (4.5 to 15MHz/V).

Let us consider a single-ended NMOS VCO using a 1:n transformer (Fig. 23.5.2) to satisfy Barkhausen's criterion. The transformer model [5] reveals that two impedance peaks can be produced despite the VCO being single-ended. To place them at  $F_{LO}$  and  $2F_{LO}$ , we have to satisfy  $16\xi^2 - 68\xi + 100\xi k_m^2 + 16 = 0$  for a mutual coupling factor  $0.3 \leq k_m \leq 0.6$ , where  $\xi$  is the product of the transformer inductance ratio ( $L_s/L_p$ ) and capacitance ratio ( $C_s/C_p$ ), arriving at the inverse-Class-F operation. It implies that the effective inductance at  $F_{LO}$  ( $2F_{LO}$ ) is determined not only by  $L_p$  ( $L_s$ ), but also by  $\xi$  and  $k_m$ . Increasing the tank impedance at  $2F_{LO}$  is feasible by sizing  $\xi > 1$ . Based on Fig. 23.5.2, a small  $k_m$  (0.34) is desired to raise  $\xi$  (3.2), leading to a higher  $Q_2/Q_1$  ratio (0.8) than that of [2] by 3.2× and a large transformer voltage gain ( $A_v$ ), resulting in a low-PN VCO design. For a low-power VCO design, a moderate  $\xi$  is desired for a high impedance at  $F_{LO}$ , without disproportionately degrading  $A_v$  and  $Q_2$ .
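
The resonance-alignment condition can be checked numerically. The following is a minimal sketch, assuming the standard two-resonance expression for a transformer-coupled LC tank. Under that assumption, a 2:1 ratio between the upper and lower resonance frequencies gives the quadratic  $16\xi^2 - (68 - 100k_m^2)\xi + 16 = 0$ , with the coupling factor entering squared. The function names are illustrative and not from the paper.

```python
import math

def xi_for_octave_resonances(k_m: float):
    """Solve 16*xi^2 - (68 - 100*k_m^2)*xi + 16 = 0 and return (small root, large root)."""
    a = 16.0
    b = -(68.0 - 100.0 * k_m ** 2)
    c = 16.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("no real xi places the two resonances one octave apart")
    root = math.sqrt(disc)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)

def resonance_ratio(xi: float, k_m: float) -> float:
    """Upper/lower resonance-frequency ratio of a transformer-coupled tank (assumed model)."""
    s = 1.0 + xi
    d = math.sqrt(s * s - 4.0 * xi * (1.0 - k_m ** 2))
    return math.sqrt((s + d) / (s - d))

xi_small, xi_large = xi_for_octave_resonances(0.34)
print(f"xi roots for k_m = 0.34: {xi_small:.3f}, {xi_large:.3f}")
print(f"frequency ratio at the larger root: {resonance_ratio(xi_large, 0.34):.3f}")
```

For  $k_m = 0.34$  the larger root is close to the  $\xi = 3.2$  quoted in the text, and the two roots are reciprocals of each other, which reflects swapping the roles of the primary and secondary tanks.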

By stacking such a single-ended NMOS ( $M_N$ ) VCO with its PMOS ( $M_p$ ) counterpart, and merging their respective tanks as a single transformer, an inverse-Class-F CMOS VCO is developed with  $V_{GP}$  and  $V_{GN}$  as the differential outputs (Fig. 23.5.3). Shorting the center taps of the two coils ( $L_s$  and  $L_p$ ) realizes self-biasing at  $V_{DD}/2$ , provided that  $M_p$  and  $M_N$  have a matched large-signal transconductance ( $G_m$ ). The VCO can also be viewed as a 2-port resonator. By ensuring  $M_{P,N}$  are under positive feedback, oscillation only occurs at  $F_{LO}$  even if there is a higher impedance at  $2F_{LO}$  than at  $F_{LO}$ .  $C_s$  (5 bits, LSB: 20fF) mainly sets  $F_{LO}$ , such that  $C_p$  (6 bits, LSB: 11fF) can be tuned to align  $F_{LO}$  and  $2F_{LO}$ .

Unlike the differential VCO where the -g<sub>m</sub> transistors are switched on-and-off alternately, here they are switched on (first half period) and off (second half period) simultaneously. Since the fundamental and 2<sup>nd</sup> harmonic currents are in-phase here (Fig. 23.5.3), when they see their respective resistive paths at  $F_{LO}$  and  $2F_{LO}$  resonances, they result in an asymmetric voltage waveform at the drain, showing a flat span at  $V_{DD}/2$ . A large voltage gain at the gate provided by the transformer drives both  $M_p$  and  $M_N$  into the deep triode region, clipping their drain voltage close to  $V_{DD}$  and 0V, respectively. Using the linear time-variant model [6], the Impulse Sensitivity Function (ISF,  $\Gamma$ ) is around zero in these two regions. Hence, the noise factor induced by the tank ( $2\Gamma_{RMS}^2$ ) is lowered from 1 (Class-B & C) to 0.37 (inverse-Class-F). Specifically, the falling and rising times at the drain voltage exhibit different maximum derivatives, resulting in asymmetric ISF between the upper and lower regions. Besides, since channel-length modulation dominates in deep sub-micron CMOS, a harmonically shaped drain voltage gives rise to asymmetric  $G_m$ . Interestingly, the product of a higher  $G_m$  and a lower ISF (and vice versa) in the vicinity of 0 ( $\pi$ ) phase yields a zero DC value for the effective ISF of  $G_m$  and thereby prevents the flicker-noise upconversion.

Due to the presence of harmonic voltages, the FOM of a VCO can be expressed as

$$FOM = 10 \log_{10} \left( \frac{2Q_1^2 \alpha_i \alpha_v}{10^3 kT \Sigma F} \right)$$

where T is the absolute temperature, k is the Boltzmann constant,  $\alpha_i = I_{LO}/I_{DC}$  is the current efficiency,  $\alpha_v$  is the voltage efficiency defined as the ratio of the oscillation amplitude at the drain ( $V_{DP,DN}$ ) to  $V_{DD}$ , and  $\Sigma F$  is the total noise factor of the VCO contributed by the resonator and transistors (Fig. 23.5.3). If  $F_{LO}$  and  $2F_{LO}$  can be perfectly aligned at an optimum point of  $C_p=0.57\text{pF}$  and assuming  $Q_1=12$ , the calculated FOM (196.3dBc/Hz) of this work is beyond that of Class-F<sub>3</sub> (192.7dBc/Hz at  $\alpha_i=0.63$ ,  $\alpha_v=0.8$  and  $\Sigma F=1.88$ ) and Class-C (192.5dBc/Hz at  $\alpha_i=0.9$ ,  $\alpha_v=0.7$  and  $\Sigma F=2.46$ ). At this point,  $\alpha_i=1.35$  and  $\alpha_v=0.47$ , and  $\Sigma F$  is as low as 1.04 by taking the ISF into account. When  $C_p$  increases slightly, the FOM in the  $1/f^2$  region goes up to 196.8dBc/Hz because  $Q_1$  rises before  $\Sigma F$  dominates the degradation. Since  $A_v$  and  $Q_1$  are both functions of  $\xi$  [5], FOM<sub>MAX</sub> varies with  $C_p$ . A high  $Q_2$  also reduces the frequency pushing from  $V_{DD}$  caused by the Groszkowski effect. Thus, the frequency pushing of our VCO is dominated only by the inevitable nonlinear capacitances (varactor and parasitic).
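
The calculated FOM values can be reproduced from the expression above. This is a sketch under two assumptions that are not stated explicitly as a pair in the text:  $Q_1 = 12$  for all three oscillator classes and  $T = 300\text{K}$ . Both were chosen because they reproduce the quoted figures, and the function name `fom_db` is illustrative.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K

def fom_db(q1, alpha_i, alpha_v, noise_factor, temp_k=300.0):
    """FOM = 10*log10(2*Q1^2*alpha_i*alpha_v / (1e3*k*T*sum_F))."""
    ratio = (2.0 * q1 ** 2 * alpha_i * alpha_v) / (
        1e3 * K_BOLTZMANN * temp_k * noise_factor
    )
    return 10.0 * math.log10(ratio)

# (alpha_i, alpha_v, sum_F) as quoted in the text; Q1 = 12 is assumed for every class.
cases = {
    "inverse-Class-F": (1.35, 0.47, 1.04),
    "Class-F3": (0.63, 0.80, 1.88),
    "Class-C": (0.90, 0.70, 2.46),
}

for name, (a_i, a_v, f_sum) in cases.items():
    print(f"{name}: {fom_db(12.0, a_i, a_v, f_sum):.1f} dBc/Hz")
```

With these inputs the three results land within about 0.1dB of the 196.3, 192.7 and 192.5dBc/Hz quoted above, which suggests that most of the advantage comes from the lower  $\Sigma F$  and the higher current efficiency.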

The VCO occupies 0.14mm<sup>2</sup> in 65nm CMOS. It utilizes a 2-to-4-turn transformer with  $L_p=2.2\text{nH}$  and  $L_s=4.2\text{nH}$ . The simulated  $Q_1$  is 11.5 and  $Q_2$  is 9.2. Figure 23.5.4 shows the measured PN at 3.49GHz ( $f_{MIN}$ ) and 4.51GHz ( $f_{MAX}$ ). Consistent FOMs at 100kHz offset (191 to 192.5dBc/Hz) and 10MHz offset (195.6 to 196.2dBc/Hz) are achieved. The  $1/f^3$  corner is 100kHz at  $f_{MIN}$  and up to 300kHz at  $f_{MAX}$  due to the AM-PM conversion when the varactor and parasitic capacitance dominate. Figure 23.5.5 shows the PN and FOM plots across the frequency tuning range and its pushing by  $V_{DD}$ . The FOM in the  $1/f^2$  region varies within ±0.5dB and peaks at 196.2dBc/Hz. The frequency pushing is 4.5MHz/V at  $f_{MIN}$ , dominated by the varactor capacitance, and -15MHz/V at  $f_{MAX}$ , dominated by the capacitor-bank parasitics. The latter is still 2× lower than that of [2], which is up to 30MHz/V.

Benchmarking with the prior art (Fig. 23.5.6), our inverse-Class-F CMOS VCO exhibits the highest FOMs at both 100kHz and 10MHz offsets over a comparable tuning range, while achieving low frequency pushing, which is among the best reported in Fig. 23.5.6. The die micrograph of the VCO is depicted in Fig. 23.5.7.

#### Acknowledgements:

The authors thank Macau Science and Technology Development Fund (FDCT) - SKL Fund and University of Macau - MYRG2017-00185-AMSV for financial support.

#### References:

- [1] E. Hegazi, et al., “A Filtering Technique to Lower LC Oscillator Phase Noise,” *IEEE JSSC*, vol. 36, no. 12, pp. 1921-1930, Dec. 2001.
- [2] D. Murphy, et al., “Implicit Common-Mode Resonance in LC Oscillators,” *IEEE JSSC*, vol. 52, no. 3, pp. 812-821, Mar. 2017.
- [3] M. Garampazzi, et al., “Analysis and Design of a 195.6 dBc/Hz Peak FoM P-N Class-B Oscillator with Transformer-Based Tail Filtering,” *IEEE JSSC*, vol. 50, no. 7, pp. 1657-1668, July 2015.
- [4] M. Shahmohammadi, et al., “A 1/f Noise Upconversion Reduction Technique Applied to Class-D and Class-F Oscillators,” *ISSCC*, pp. 444-445, Feb. 2015.
- [5] A. Mazzanti and A. Bevilacqua, “On the Phase Noise Performance of Transformer-Based CMOS Differential-Pair Harmonic Oscillators,” *IEEE TCAS-I*, vol. 62, no. 9, pp. 2334-2341, Sept. 2015.
- [6] A. Hajimiri and T. Lee, “A General Theory of Phase Noise in Electrical Oscillators,” *IEEE JSSC*, vol. 33, no. 2, pp. 179-194, Feb. 1998.



Figure 23.5.1: Upper: VCO with explicit  $2F_{LO}$  resonance [1] and its FOM dependence on  $Q_{TAIL}$ . Lower: VCO with implicit  $2F_{LO}$  resonance [2] to avoid an extra LC tank, but the implicit CM resonance has low impedance and low  $Q$  at  $2F_{LO}$ .



Figure 23.5.2: A single-ended NMOS VCO using a 1: $n$  transformer; the transformer model reveals that 2 impedance peaks can be generated at  $F_{LO}$  and  $2F_{LO}$ .  $\xi=3.2$  and  $k_m=0.34$  yield a high  $Q_2/Q_1$  ratio (0.8) and  $A_v$ .



Figure 23.5.3: Proposed inverse-Class-F CMOS VCO. By varying  $C_p$  and  $C_s$ ,  $F_{LO}$  and  $2F_{LO}$  can be aligned to suppress the flicker-noise upconversion.



Figure 23.5.4: Measured phase noise and FOM at different offset frequencies.



Figure 23.5.5: Upper: Measured phase noise and FOM vs. the oscillation frequency. Lower: Measured frequency pushing at  $f_{\text{min}}$  and  $f_{\text{max}}$ .

|                             | This Work | ISSCC'17 [C-C. Li et al.] | JSSC'17 [2] | ISSCC'15 [4] | JSSC'13 [M. Babaie et al.] | JSSC'08 [A. Mazzanti et al.] | JSSC'01 [1] |
|-----------------------------|-----------|---------------------------|-------------|--------------|----------------------------|------------------------------|-------------|
| Topology                    | CMOS Inverse-Class-F | Trifilar Coil | NMOS CM resonance | Class-F <sub>2,3</sub> | Class-F | Class-C | NMOS + 2 $F_{LO}$ tail inductor |
| CMOS Tech.                  | 65nm | 16nm | 28nm | 40nm | 65nm | 130nm | 0.35μm |
| Tuning Range (GHz)          | 3.49-4.51 (25.5%) | 3.2-4.0 (22%) | 2.85-3.75 (27.2%) | 5.4-7 (25%) | 2.95-3.8 & (25%) | 4.9-5.65 (14%) | 1-1.2 (18%) |
| $V_{DD}$ (V)                | 0.6 | 0.4 | 0.9 | 1 | 1.25 | 1 | 2.5 |
| Freq. (GHz)                 | 3.49 / 4.51 | 3.23 | 2.89 | 5.4 | 3.7 | 4.9 | |
| Power (mW)                  | 1.2 / 1.14 | 3.8 | 6.8 | 12 | 15 | 1.4 | |
| PN (dBc/Hz) @ 100kHz/10MHz  | -102.4/-145.6 (at 3.49GHz); -98.5/-143.7 (at 4.51GHz) | -99 */-145 | -108.1/-152 | -105.3/-146.7 | -103.6/-152.8 | -99 */-142 # | |
| FOM (dBc/Hz) @ 100kHz/10MHz | 192.5/195.6 (at 3.49GHz); 191/196.2 (at 4.51GHz) | 183.3/190 | 188/192.5 | 189.1/190.5 | 183.2/192.4 | 191.3/194.4 | |
| Freq. Pushing (MHz/V)       | 4.5 / 15 | 3.6-27.3 | 6-30 | 12-23 | 18-50 | N/A | |
| $1/f^3$ Corner (kHz)        | 100 / 300 | 120 | 200 | 60 | 700 | 200 | |
| Die Area (mm <sup>2</sup> ) | 0.14 | 0.11 | 0.19 | 0.13 | 0.12 | N/A | N/A |

\*Estimated from PN plot   #Normalized from 1 MHz offset   & After on-chip div-by-2

FOM =  $-\text{PN} + 20 \log_{10}(f_o/\Delta f) - 10 \log_{10}(P_{DC}/\text{mW})$

Figure 23.5.6: Performance benchmark with the previously published VCOs.
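
The FOM definition under the benchmark table can be applied directly to the measured values. The sketch below reads the "This Work" entries as two operating points (3.49GHz at 1.2mW and 4.51GHz at 1.14mW), which is an interpretation of the table layout. The function name `fom_dbc_hz` is illustrative.

```python
import math

def fom_dbc_hz(pn_dbc_hz, f0_hz, offset_hz, p_dc_mw):
    """FOM = -PN + 20*log10(f0/offset) - 10*log10(P_DC / 1 mW)."""
    return (
        -pn_dbc_hz
        + 20.0 * math.log10(f0_hz / offset_hz)
        - 10.0 * math.log10(p_dc_mw)
    )

# (carrier in Hz, DC power in mW, PN at 100 kHz, PN at 10 MHz) for the two band edges.
points = [
    (3.49e9, 1.20, -102.4, -145.6),
    (4.51e9, 1.14, -98.5, -143.7),
]

for f0, p_mw, pn_100k, pn_10m in points:
    fom_100k = fom_dbc_hz(pn_100k, f0, 100e3, p_mw)
    fom_10m = fom_dbc_hz(pn_10m, f0, 10e6, p_mw)
    print(f"{f0 / 1e9:.2f} GHz: FOM = {fom_100k:.1f} / {fom_10m:.1f} dBc/Hz")
```

The computed values agree with the tabulated 192.5/195.6 and 191/196.2dBc/Hz to within rounding.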



Figure 23.5.7: Die micrograph of the fabricated VCO in 65nm CMOS.

### 23.6 A Quad-Core 15GHz BiCMOS VCO with -124dBc/Hz Phase Noise at 1MHz Offset, -189dBc/Hz FOM, and Robust to Multimode Concurrent Oscillations

Fabio Padovan<sup>1</sup>, Fabio Quadrelli<sup>1,2</sup>, Matteo Bassi<sup>1</sup>, Marc Tiebout<sup>1</sup>, Andrea Bevilacqua<sup>2</sup>

<sup>1</sup>Infineon Technologies, Villach, Austria

<sup>2</sup>University of Padova, Padova, Italy

The relentless development of next-generation communication and radar systems sets increasingly stringent requirements on the spectral purity of local oscillators. Decreasing phase noise is crucial to support efficient modulation formats with large symbol constellations, as well as to enable innovative radar applications, e.g., anti-collision, gesture recognition, and medical imaging. To minimize phase noise, bipolar transistors offer some advantages over ultra-scaled CMOS: higher supply voltage (thus larger oscillation amplitudes), lower 1/f noise, higher-Q passives (due to higher resistivity substrate and, possibly, thicker metals), and higher  $f_T$ ,  $f_{max}$  for a given technology node, which results in a cost advantage for a variety of medium-volume applications (e.g., infrastructure transceivers). For a given supply voltage, a tank showing a smaller resistance at resonance yields lower phase noise. As a result, the minimum phase noise achievable by a single voltage-controlled oscillator (VCO) is ultimately bounded by the smallest realizable inductor displaying the highest Q. To achieve significantly lower phase noise levels, bilaterally coupling  $N$  oscillators [1-3] is a viable option. However, to fully preserve the  $10\log(N)$  phase-noise advantage, while avoiding undesired multitone concurrent oscillations, the coupling network must be carefully designed. This work presents a quad-core bipolar VCO achieving phase noise as low as -124dBc/Hz at 1MHz offset from the 15GHz carrier, -189dBc/Hz figure-of-merit (FOM), and 16% tuning range. Insights are given into the design of the resistive network employed to couple the four oscillators, a key element in achieving the reported performance.

The block diagram of the proposed quad-core VCO is shown in Fig. 23.6.1. Four identical buffers are connected to  $V_{out}$  of each VCO, and one of them drives the pad for measurement. Each differential-VCO core is optimized for minimum phase noise. Class-C operation is leveraged to maximize the amplitude of oscillation, have higher power efficiency, and reduce the noise contribution of the tail generator and of the device base resistance [4]. The VCO is chosen to operate in the  $K_u$  band to allow for agile frequency generation when combined with frequency dividers/multipliers to address the needs of 5G and radar applications.

Frequency tuning at mm-waves in bipolar/BiCMOS technologies is a delicate task. The varactor showing the highest quality factor is often a pn-junction varactor, as in the employed technology. If the junction turns on during a part of the oscillation cycle, it may clip the tank voltage, drastically degrading the tank Q, and limiting the maximum attainable amplitude of oscillation. A step-down magnetic transformer is thus used to couple the varactor to the tank, as shown in Fig. 23.6.1. This arrangement yields several advantages, as the voltage swing across the varactor is reduced compared to the tank voltage, thus limiting the chance of forward conduction. Moreover, the tuning voltage is not required to exceed the supply voltage, and the galvanic isolation of the varactor decreases the oscillator pushing.

When employing transformer-based resonators, two modes of oscillation are in general possible [5]. The VCO may start up at an undesired frequency, or concurrent oscillations at harmonically unrelated frequencies may occur [6]. In a single-core design, robust operation at the lower-frequency mode is enforced by maximizing the magnetic coupling in the transformer [4]. Alternatively, the tank voltage can be fed back to the transistor bases by tapping it at the transformer secondary, i.e., across the varactor. This arrangement selects the oscillation mode in a very robust way [5]. However, since in this design the transformer is a step-down one, such a topology would result in a phase-noise penalty, so it is discarded.

When two (or more) cores are coupled, the active devices of one core interact with the resonator of the other, which may lead to multimode oscillations. As shown in Fig. 23.6.2 for two coupled cores, the resonator of core 1 is terminated at port 1 on its own negative resistance ( $-1/G_{m1}$ ), while at port 2 it is loaded by the coupled core 2. At start-up, the small-signal termination conductance at port 2 ( $G_{12}$ ) can potentially be negative. The consequence is that both oscillation modes may be excited [5], as  $G_{12}$  shifts all the poles of the system into the right-half plane, regardless of the value of  $G_{m1}$ . Whether a single oscillation mode dominates, or concurrent oscillations occur, depends on the nonlinearity of the active device [6]. To avoid concurrent oscillations,  $G_m$  can be decreased to have a smaller oscillator loop gain, yielding a dominant mode by design. However, this means reducing the bias current and thus the oscillation amplitude, losing the phase noise advantage gained by coupling multiple cores. To maintain the phase noise advantage of the multicore architecture, while avoiding oscillations at the higher frequency mode, resistors  $R_c$  are used to couple the cores (see Fig. 23.6.1), thus damping  $G_{12}$  (see Fig. 23.6.2). The issue of concurrent oscillations in multicore VCOs is critical, as simulations in Fig. 23.6.3 illustrate. When  $R_c=0\Omega$ , large spectral components corresponding to both tank resonance frequencies are observed. If  $R_c$  is instead increased to  $30\Omega$ , a single-mode oscillation is robustly obtained. The value of the coupling resistance is chosen as a trade-off between guaranteeing reliable single-frequency oscillation and avoiding phase noise penalties in the presence of mismatches between the tanks of the coupled oscillators [2].

Prototypes of the quad-core VCO are implemented in a  $0.13\mu m$  BiCMOS technology (see Fig. 23.6.7). The chip area including the pads is  $2mm^2$ , half of which is taken by the oscillator. The current consumption of each core from the 3V supply is 6mA. To ensure robust and reliable operation, a dynamic biasing scheme [4] employing a CMOS Miller OTA is used (Fig. 23.6.1). For comparison, a single-core VCO is also realized in a separate test chip. Figure 23.6.4 shows the phase noise of the quad-core VCO, measured at a carrier frequency of 15GHz. In comparison to the performance of the single-core oscillator, a 6dB improvement is observed, as expected from the  $10\log(N)$  scaling with  $N=4$ . When  $V_{tune}$  is varied between 0 and 3.6V, the oscillation frequency spans from 11.8 to 15.6GHz. A phase noise variation as low as 2dB is achieved over a 16% tuning range, for  $0.6V < V_{tune} < 3V$ . If a larger phase-noise variation is tolerable, the VCO can operate over a wider 28% tuning range. The robustness of the proposed design with respect to temperature variation is reported in Fig. 23.6.5. The measured phase noise at 1MHz offset from a 15GHz carrier varies by less than 2dB in the temperature range from -35 to 85°C. The measured 1/f noise corner is  $<50kHz$  across temperature and tuning range. The measured VCO pushing is as low as 60MHz/V.
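
Several of the reported numbers follow from simple arithmetic on quantities given in the text. The sketch below checks three of them: the ideal  $10\log(N)$  coupling advantage for four cores, the full tuning range, and the total DC power. It assumes the tuning range is defined relative to the center frequency, and the helper names are illustrative.

```python
import math

def ideal_coupling_gain_db(n_cores: int) -> float:
    """Ideal phase-noise improvement from coupling n identical oscillators."""
    return 10.0 * math.log10(n_cores)

def tuning_range_percent(f_min_ghz: float, f_max_ghz: float) -> float:
    """Tuning range relative to the center frequency, in percent (assumed definition)."""
    return 200.0 * (f_max_ghz - f_min_ghz) / (f_max_ghz + f_min_ghz)

def total_dc_power_mw(n_cores: int, i_core_ma: float, v_supply: float) -> float:
    """Total core power: mA times V gives mW."""
    return n_cores * i_core_ma * v_supply

print(f"coupling gain, 4 cores: {ideal_coupling_gain_db(4):.2f} dB")
print(f"tuning range, 11.8-15.6 GHz: {tuning_range_percent(11.8, 15.6):.1f} %")
print(f"DC power, 4 x 6 mA at 3 V: {total_dc_power_mw(4, 6.0, 3.0):.0f} mW")
```

The results are consistent with the measured 6dB improvement over the single-core VCO, the 28% wide tuning range, and the 72mW listed in the performance summary.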

The performance of the quad-core VCO is summarized in Fig. 23.6.6, and compared to low-phase-noise VCOs operating above 10GHz by normalizing the phase noise to a common 15GHz carrier. The proposed resistive coupling network enables four transformer-based oscillator cores to be combined in a multicore architecture in a robust fashion. As a result, the reported VCO improves the phase noise performance by more than 2dB with respect to the prior designs in Fig. 23.6.6, without degrading FOM or tuning range. Minimal performance variation over the 16% tuning range and the -35 to 85°C temperature span ensures this VCO can be effectively employed as a high-spectral-purity oscillator for next-generation communication and radar systems.

#### References:

- [1] S. Ahmadi-Mehr, et al., "Analysis and Design of a Multi-Core Oscillator for Ultra-Low Phase Noise," *IEEE TCAS-I*, vol. 63, no. 4, pp. 529-539, Apr. 2016.
- [2] L. Iotti, et al., "Insights into Phase-Noise Scaling in Switch-Coupled Multi-Core LC VCOs for E-Band Adaptive Modulation Links," *IEEE JSSC*, vol. 52, no. 7, pp. 1703-1718, July 2017.
- [3] D. Ghosh, et al., "A 10 GHz Low Phase Noise VCO Employing Current Reuse and Capacitive Power Combining," *IEEE CICC*, pp. 1-4, Sept. 2010.
- [4] F. Padovan, et al., "Design of Low-Noise K-Band SiGe Bipolar VCOs: Theory and Implementation," *IEEE TCAS-I*, vol. 62, no. 2, pp. 607-615, Feb. 2015.
- [5] A. Bevilacqua, et al., "Transformer-Based Dual-Mode Voltage-Controlled Oscillators," *IEEE TCAS-II*, vol. 54, no. 4, pp. 293-297, Apr. 2007.
- [6] A. Goel and H. Hashemi, "Frequency Switching in Dual-Resonance Oscillators," *IEEE JSSC*, vol. 42, no. 3, pp. 571-582, Mar. 2007.



Figure 23.6.1: Quad-core-VCO block diagram and schematic.

Figure 23.6.2: Multicore coupling and issue of multi-mode concurrent oscillation; effect of coupling resistance  $R_c$ .

Figure 23.6.3: Impact of coupling resistance: concurrent oscillations ( $R_c=0\Omega$ ), and single-tone oscillation ( $R_c=30\Omega$ ).

Figure 23.6.4: Measured quad-core-VCO phase noise and tuning range and comparison with a single-core VCO.



Figure 23.6.5: Measured quad-core-VCO phase noise as a function of temperature.

|                                                               | Iotti JSSC 2017 [2] | Ghosh CICC 2010 [3] | Padovan TCAS-I 2015 [4] | Nakamura ESSCIRC 2009 | Moroni JSSC 2014 | This Work    |
|---------------------------------------------------------------|---------------------|---------------------|-------------------------|----------------------|------------------|--------------|
| Technology                                                    | 55nm BiCMOS         | 45nm CMOS           | SiGe HBT                | 0.18μm BiCMOS        | CMOS 65nm        | 130nm BiCMOS |
| $f_0$ [GHz]                                                   | 20                  | 10                  | 23                      | 19.4                 | 54               | 15           |
| Tuning Range [%]                                              | 15                  | 14.6                | 17                      | 16.6                 | 9                | 16           |
| Phase Noise at 1MHz [ $\text{dBc}/\text{Hz}$ ]                | -118.5              | -123                | -114                    | -116                 | -111             | -124         |
| Eq. Phase Noise at 1MHz from 15GHz [ $\text{dBc}/\text{Hz}$ ] | -121.5              | -119.5              | -118                    | -118                 | -122             | -124         |
| Phase Noise variation across tuning range [dB]                | 2                   | N/A                 | 4                       | 6                    | 3.5              | 2            |
| Type                                                          | Quad-Core           | Dual-Core           | Single-Core             | Single-Core          | Quad-Core        | Quad-Core    |
| $V_{cc}$ [V]                                                  | 1.2                 | 1.3                 | 3.3                     | 1                    | 1.2              | 3            |
| $P_{DC}$ [mW]                                                 | 50                  | 30                  | 18                      | 7.5                  | 146              | 72           |
| FOM [ $\text{dBc}/\text{Hz}$ ]                                | -187.5              | -188                | -189                    | -193                 | -184             | -189         |
| FOMT [ $\text{dBc}/\text{Hz}$ ]                               | -191                | -191                | -193                    | -197                 | -182             | -193         |
| Area [mm <sup>2</sup> ]                                       | 0.6                 | 0.67                | 0.4                     | 0.16                 | 1.69             | 1            |

Figure 23.6.6: Performance summary and comparison with the previously published VCOs operating at  $f_0 > 10\text{GHz}$ .
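
The figures of merit and the carrier normalization in the comparison table can be reproduced from the other rows. The summary does not spell out the formulas, so the sketch below assumes the conventional definitions: FOM from phase noise, carrier, offset and DC power; FOM<sub>T</sub> as FOM adjusted by the tuning range relative to 10%; and phase noise scaled by  $20\log$  of the carrier ratio. All function names are illustrative.

```python
import math

def fom_dbc_hz(pn_dbc_hz, f0_hz, offset_hz, p_dc_mw):
    """FOM = PN - 20*log10(f0/offset) + 10*log10(P_DC / 1 mW)."""
    return (
        pn_dbc_hz
        - 20.0 * math.log10(f0_hz / offset_hz)
        + 10.0 * math.log10(p_dc_mw)
    )

def fom_t_dbc_hz(fom, tuning_range_percent):
    """FOM_T = FOM - 20*log10(TR / 10%), the usual tuning-range-aware variant."""
    return fom - 20.0 * math.log10(tuning_range_percent / 10.0)

def pn_at_reference_carrier(pn_dbc_hz, f0_hz, f_ref_hz):
    """Scale phase noise from carrier f0 to f_ref, assuming 20*log10 frequency scaling."""
    return pn_dbc_hz + 20.0 * math.log10(f_ref_hz / f0_hz)

fom_this = fom_dbc_hz(-124.0, 15e9, 1e6, 72.0)
print(f"This Work FOM:   {fom_this:.1f} dBc/Hz")
print(f"This Work FOM_T: {fom_t_dbc_hz(fom_this, 16.0):.1f} dBc/Hz")
print(f"Ghosh [3] at 15 GHz: {pn_at_reference_carrier(-123.0, 10e9, 15e9):.1f} dBc/Hz")
```

Under these assumed definitions the "This Work" column gives about -189 and -193dBc/Hz, and the 10GHz design of [3] normalizes to about -119.5dBc/Hz, matching the tabulated entries.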



Figure 23.6.7: Quad-core-VCO micrograph. VCO intrinsic area is  $1\text{mm}^2$ ; chip area with pads is  $2\text{mm}^2$ .

### 23.7 A 7.4-to-14GHz PLL with 54fs<sub>rms</sub> Jitter in 16nm FinFET for Integrated RF-Data-Converter SoCs

Didem Turker<sup>1</sup>, Ade Bekele<sup>1</sup>, Parag Upadhyaya<sup>1</sup>, Bob Verbruggen<sup>2</sup>, Ying Cao<sup>1</sup>, Shaojun Ma<sup>1</sup>, Christophe Erdmann<sup>2</sup>, Brendan Farley<sup>2</sup>, Yohan Frans<sup>1</sup>, Ken Chang<sup>1</sup>

<sup>1</sup>Xilinx, San Jose, CA; <sup>2</sup>Xilinx, Dublin, Ireland

Direct-RF data converters [1,2] have seen increased adoption in remote-radio-head TX and RX, due to their unparalleled bandwidth and flexibility. However, since these converters need to directly synthesize and sample multi-GHz radio signals, the sampling clock must exhibit excellent phase-noise performance, to minimize self- and adjacent-channel mixing, and strong suppression of reference and harmonic spurs, to meet stringent out-of-band emissions and minimize aliased energy. Furthermore, a wide range of sampling frequencies is required for the flexibility to cover multiple bands. Due to these stringent requirements, typically, external PLLs are employed, adding to the BOM cost. This work presents techniques for a fully integrated 7.4-to-14GHz PLL in 16nm FinFET that has 54fs<sub>rms</sub> jitter to satisfy the low noise requirements of RF data converters.

The design of a wide-range PLL, which implies wide-range LC VCOs for cost optimization, presents significant challenges. The VCO typically requires large capacitor banks, compromising the quality factor (Q) of the LC tank. Some phase-noise reduction techniques like the use of AC coupling capacitors to optimize varactor DC operating points and the use of harmonic filtering are not feasible in a wide-range VCO design. In addition to the wide-range-VCO design challenges, the increased flicker noise of FinFET devices impacts the performance of every PLL sub-block, particularly the VCO, which has 1/f<sup>2</sup> rather than 1/f slope at low offset frequencies. To meet the overall phase-noise requirement, a wide PLL bandwidth is preferred, which means all sub-blocks that contribute to the in-band phase noise of the PLL must have low noise.

Figure 23.7.1 shows the RF SoC architecture, where the PLL, followed by a divider and clock distribution buffers, provides clocks to the DAC and ADC slices. The PLL uses two LC VCOs to generate continuous frequency coverage from 7.4 to 14GHz. A charge pump (CP), a loop filter, and CMOS circuits, such as a feedback divider (/N), a phase frequency detector (PFD), a reference-clock input buffer, and reference clock distribution, all contribute to the PLL in-band phase noise. In order to keep their flicker- and thermal-induced phase noise low, the CMOS circuits require fast edge rates. All flip-flops and the buffers in the feedback divider and the PFD direct path have less than 10ps rise/fall time, and the PFD dead-zone mitigation pulse width is between 30 and 40ps over all PVT conditions. The supply for the CMOS circuits is isolated from the rest of the design in order to mitigate spurs associated with self-induced supply noise from the fast edge rates. Moreover, the CP and VCO each have separate deep n-wells to improve substrate noise isolation.

Figure 23.7.2 shows the charge-pump (CP) circuit. The CMOS input from the PFD is converted to a differential signal to drive a PMOS-and-NMOS current-steering pair, which operates from the regulated supply. To keep a large dynamic range, 18 CP unit slices are used in parallel to deliver the large current required to keep noise low. A replica bias ensures that the up and down currents are matched to within 1% over the CP dynamic range. The replica is sized at 4x the CP unit cell for better matching and lower noise. An RC filter is added to the PMOS current source to reduce output noise. A unity-gain feedback opamp keeps the charge-pump output common mode constant to reduce dynamic CP mismatch. The current sources are stacked for high output impedance and low noise. The small current mismatch, coupled with the use of small resistor values and low-leakage MOM capacitors in the loop filter, significantly improves the PLL spurious response.

To mitigate the impact of flicker noise on the VCO voltage regulator, the noise of the bandgap and bias generation circuitry, which delivers the regulator reference voltage Vref, is filtered by a passive RC filter, as illustrated in Fig. 23.7.3. In addition, a programmable FET resistor in sub-threshold operation is placed at the output of the OTA to realize a large (several mega-ohms) resistance and implement a <10kHz low-pass filter with a small silicon footprint. The use of a large resistor in the regulator feedback path is possible due to the low gate leakage of the FinFET process.
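The footprint argument can be made concrete with the first-order RC corner formula, f = 1/(2πRC). The sketch below is only illustrative: the paper states "several mega-ohms" and a corner below 10kHz, so the 5MΩ value used here is an assumption, not a reported design value.

```python
import math


def rc_corner_hz(r_ohm: float, c_farad: float) -> float:
    """Return the -3 dB corner frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def cap_for_corner(r_ohm: float, f_corner_hz: float) -> float:
    """Return the capacitance that places the RC corner at f_corner_hz."""
    return 1.0 / (2.0 * math.pi * r_ohm * f_corner_hz)


R_FET = 5e6        # ASSUMPTION: 5 MOhm stands in for "several mega-ohms"
F_CORNER = 10e3    # upper bound on the corner frequency quoted in the text

c_needed = cap_for_corner(R_FET, F_CORNER)
print(f"Capacitance for a {F_CORNER / 1e3:.0f} kHz corner: {c_needed * 1e12:.2f} pF")
print(f"Check: corner = {rc_corner_hz(R_FET, c_needed) / 1e3:.2f} kHz")
```

With a resistance in the mega-ohm range, only a few picofarads are needed for a corner of this order, which is consistent with the small-footprint claim.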

Two LC VCOs are designed to cover 7.4 to 14GHz PLL range with >1GHz frequency overlap to ensure continuous frequency coverage over PVT. In the 16nm FinFET process, the flicker noise of a PMOS transistor is higher than that of an NMOS transistor; therefore, an all-NMOS VCO architecture with stacked gm pair and current bias is implemented. A closed-loop coarse tuning FSM sets a 6-bit and a 5-bit coarse tuning word to lower-band (LB) and upper-band (UB) VCOs, respectively, to control binary weighted high-Q MOM capacitor banks to choose the proper frequency band. The coarse tuning capacitor array consists of MOM capacitor units with an NMOS switch M1 and two-stack NMOS pull devices as illustrated in Fig. 23.7.3. When the unit is on, the source/drain (S/D) nodes A and B of M1 are pulled to ground through the high impedance pull devices to achieve higher Q. When the unit is off, nodes A and B are pulled high to minimize Q degradation due to S/D leakage of M1, especially at higher temperatures.

The MOM capacitor units are placed between the inductor coil legs in the layout, and the units of different binary bits are distributed in a common-centroid arrangement to mitigate the distributive effect of the inductor legs and to ensure sufficient overlap between adjacent coarse frequency bands, as illustrated in Fig. 23.7.3. This also allows the use of smaller tuning varactors, which are the lowest-Q elements in the LC tank, further improving the VCO phase-noise performance. To ensure that the VCO stays within the same coarse frequency band when the temperature drifts after the initial coarse band tuning, temperature compensation is applied using varactors controlled by a bandgap circuit that generates a voltage, V<sub>te</sub>, proportional to the temperature. The V<sub>te</sub> voltage is RC-filtered to limit its noise contribution to the LC VCO. A small-silicon-footprint inductor (coil diameter < 90μm) with Q > 17 is used for the LC tank.
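To illustrate why overlap between adjacent coarse bands needs attention, the following sketch evaluates the ideal tank frequency f = 1/(2π√(LC)) over a 6-bit binary-weighted capacitor code. All component values (`L_TANK`, `C_FIXED`, `C_UNIT`) are hypothetical and chosen only to land in a plausible lower-band range; they are not taken from the paper.

```python
import math


def tank_frequency_hz(l_henry: float, c_fixed: float, c_unit: float, code: int) -> float:
    """Ideal LC resonance with a binary-weighted bank: C = C_fixed + code * C_unit."""
    c_total = c_fixed + code * c_unit
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_total))


# ASSUMPTIONS: illustrative values only, not reported in the paper.
L_TANK = 200e-12    # tank inductance (H)
C_FIXED = 1.05e-12  # fixed + parasitic capacitance (F)
C_UNIT = 20e-15     # capacitance of one LSB of the coarse bank (F)
N_BITS = 6          # the lower-band VCO uses a 6-bit coarse word

freqs = [tank_frequency_hz(L_TANK, C_FIXED, C_UNIT, code) for code in range(2 ** N_BITS)]
gaps = [freqs[k] - freqs[k + 1] for k in range(len(freqs) - 1)]
max_gap = max(gaps)

print(f"Band centers span {freqs[-1] / 1e9:.2f} to {freqs[0] / 1e9:.2f} GHz")
print(f"Largest spacing between adjacent bands: {max_gap / 1e6:.1f} MHz")
```

Because frequency scales as C^(-1/2), the spacing between adjacent bands is largest at the high-frequency end. In this simplified model, the fine-tuning varactor range must exceed that largest spacing for the bands to overlap.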

The RF SoC with the integrated PLL is fabricated in a 16nm FinFET process. Figure 23.7.4 shows the phase noise measured at the divide-by-2 output, with the PLL running at 12.5GHz using a 500MHz reference. The PLL achieves phase noise of -120dBc/Hz at 100kHz and -123dBc/Hz at 1MHz offset. The rms jitter integrated from 10kHz to 10MHz is 53.6fs. The regulator-output FET filter improves phase noise by 3.5dB around 300kHz and the rms jitter by 10fs. Figure 23.7.5 shows the spurious response of the PLL observed at the DAC output with the PLL running at 12.5GHz and its output divider set to 2. A sinusoidal data pattern is applied to the DAC such that, when sampled by a 6.25GHz clock, the DAC output tone is at 1.052GHz. The measured reference spur level is less than -75dBc at F<sub>ref</sub>=500MHz offset and less than -80dBc for the higher-order reference harmonics.

Figure 23.7.5 also shows the phase noise of the PLL at 14, 12.5, 10, and 8GHz, measured after the divide-by-2 circuit with a 500MHz reference, demonstrating consistently low in-band noise throughout the PLL operating range. The PLL dissipates 45mW of total power at 12.5GHz, including the reference distribution buffers. Figure 23.7.6 summarizes the PLL performance along with previously published RF PLLs [3-6], where phase noise is normalized to 1GHz for comparison. A figure-of-merit (FOM<sub>T</sub>) that takes integrated jitter, power consumption, and tuning range into account is defined, and this work exhibits an FOM<sub>T</sub> of -246.8dB. Figure 23.7.7 shows the micrographs of the RF SoC and PLL, where the PLL occupies 0.35mm<sup>2</sup> area.
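As a cross-check, FOM<sub>T</sub> can be recomputed from the jitter, power and tuning-range figures in the comparison table of Fig. 23.7.6, using the definition given there: 10·log10((σ<sub>rms</sub>/1s)² × (P/1mW) / TR), with TR = (f<sub>max</sub> − f<sub>min</sub>)/f<sub>mid</sub>. The sketch assumes f<sub>mid</sub> is the arithmetic midpoint of the tuning range, which the paper does not state explicitly.

```python
import math


def fom_t_db(jitter_s: float, power_mw: float, f_min_ghz: float, f_max_ghz: float) -> float:
    """FOM_T = 10*log10((sigma/1s)^2 * (P/1mW) / TR), TR = (f_max - f_min) / f_mid."""
    f_mid = 0.5 * (f_min_ghz + f_max_ghz)  # ASSUMPTION: arithmetic midpoint
    tuning_range = (f_max_ghz - f_min_ghz) / f_mid
    return 10.0 * math.log10(jitter_s ** 2 * power_mw / tuning_range)


# Inputs taken from the comparison table (jitter in s, power in mW, range in GHz).
fom_this_work = fom_t_db(53.6e-15, 45.0, 7.4, 14.0)
fom_ref4 = fom_t_db(159e-15, 8.2, 2.7, 4.3)
fom_ref6 = fom_t_db(164e-15, 29.2, 9.0, 18.0)

for name, value in [("This work", fom_this_work), ("[4]", fom_ref4), ("[6]", fom_ref6)]:
    print(f"{name}: FOM_T = {value:.1f} dB")
```

Under the midpoint assumption, the computed values agree with the tabulated FOM<sub>T</sub> entries for this work, [4] and [6] to one decimal place.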

#### Acknowledgments:

The authors would like to thank Xilinx SerDes layout, verification, and application teams for making this project possible.

#### References:

- [1] J. Wu, et al., "A 4GS/s 13b Pipelined ADC with Capacitor and Amplifier Sharing in 16nm CMOS", ISSCC, pp. 466-467, Feb. 2016.
- [2] C. Erdmann, et al., "A 330mW 14b 6.8GS/s Dual-Mode RF DAC in 16nm FinFET Achieving -70.8dBc ACPR in a 20MHz Channel at 5.2GHz", ISSCC, pp. 280-281, Feb. 2017.
- [3] X. Gao, et al., "A 2.2GHz Sub-Sampling PLL with 0.16ps<sub>rms</sub> Jitter and -125dBc/Hz In-Band Phase Noise at 700μW Loop-Components Power", IEEE Symp. VLSI Circuits, pp. 139-140, June 2010.
- [4] X. Gao, et al., "A 2.7-4.3GHz 0.16ps<sub>rms</sub> Jitter -246.8dB FOM Digital Fractional-N Sampling PLL in 28nm CMOS", ISSCC, pp. 174-175, Feb. 2016.
- [5] C. Yao, et al., "A 14nm Fractional-N Digital PLL with 0.14ps<sub>rms</sub> Jitter and -78dBc Fractional Spur for Cellular RFICs", ISSCC, pp. 422-423, Feb. 2017.
- [6] M. Raj, et al., "A 164fs<sub>rms</sub> 9-to-18GHz Sampling Phase Detector Based PLL with In-Band Noise Suppression and Robust Frequency Acquisition in 16nm FinFET", IEEE Symp. VLSI Circuits, pp. 182-183, June 2017.



Figure 23.7.1: Direct-RF data-converter SoC architecture with integrated PLL.



Figure 23.7.2: Charge-pump circuit diagram.



Figure 23.7.3: LC-VCO and voltage-regulator circuit diagram.



Figure 23.7.4: Phase-noise measurement of the 12.5GHz PLL at the divide-by-2 output w/wo regulator FET filter.



Figure 23.7.5: PLL spurious response measured at the DAC output and phase noise for 4-to-7GHz output frequencies.

|                                          | [3]                   | [4]              | [5]             | [6]                  | This Work           |
|------------------------------------------|-----------------------|------------------|-----------------|----------------------|---------------------|
| PLL Architecture                         | Integer N, SSPD based | FracN, SSPD DPLL | FracN, DPLL     | Integer N, SPD based | Integer N, CP based |
| Technology                               | 180nm                 | 28nm             | 14nm FinFET     | 16nm FinFET          | 16nm FinFET         |
| Frequency Range (GHz)                    | 2.21                  | 2.7 – 4.3        | 5.38            | 9 – 18               | 7.4 – 14            |
| Measurement Frequency (GHz)              | 2.21                  | 5.82             | 2.69            | 18                   | 6.25                |
| Phase Noise @100kHz (dBc/Hz)             | -125 (@200kHz)        | -105.5           | -113.6          | -104.1 (@200kHz)     | -120                |
| Phase Noise @1MHz (dBc/Hz)               | -125 (from figure)    | -115.4           | -122.45         | -107.3               | -123.2              |
| Phase Noise @100kHz (normalized to 1GHz) | -131.9 (@200kHz)      | -120.8           | -122.2          | -129.2 (@200kHz)     | -135.9              |
| Phase Noise @1MHz (normalized to 1GHz)   | -131.9                | -130.7           | -131            | -132.4               | -139.1              |
| RMS Jitter (fs)                          | 160 (10k – 100M)      | 159 (10k – 40M)  | 137 (10k – 10M) | 164 (1k – 100M)      | 53.6 (10k – 10M)    |
| Reference Spur (dBc)                     | -56                   | -78              | -87.6           | N.A                  | -75.5*              |
| Power (mW)                               | 2.5                   | 8.2              | 13.4            | 29.2                 | 45                  |
| Area (mm²)                               | 0.2                   | 0.3              | 0.257           | 0.39                 | 0.35                |
| FOM <sub>T</sub> (dB)                    | N.A                   | -243.4           | N.A             | -239.3               | -246.8              |

\* including DAC, measured at 1.052GHz DAC output

$$FOM_T = 10 \log \left( \left( \frac{\sigma_{rms}}{1s} \right)^2 \times \frac{P}{1mW} \times \frac{1}{TR} \right) \text{ where } TR = \left( \frac{f_{max} - f_{min}}{f_{mid}} \right)$$

Figure 23.7.6: Performance summary and comparison with previously published PLLs.
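The "normalized to 1GHz" rows of the table follow from the usual carrier-frequency scaling of phase noise, 20·log10(f<sub>carrier</sub>/1GHz). A minimal sketch, with a function name chosen here for illustration, is:

```python
import math


def normalize_to_1ghz(phase_noise_dbc_hz: float, carrier_ghz: float) -> float:
    """Scale phase noise measured at carrier_ghz to an equivalent 1 GHz carrier."""
    return phase_noise_dbc_hz - 20.0 * math.log10(carrier_ghz)


# Table entries for this work, measured at the 6.25 GHz divide-by-2 output.
pn_100k_norm = normalize_to_1ghz(-120.0, 6.25)
pn_1m_norm = normalize_to_1ghz(-123.2, 6.25)
# Table entry for [3], measured at 2.21 GHz.
pn_ref3_norm = normalize_to_1ghz(-125.0, 2.21)

print(f"This work @100kHz: {pn_100k_norm:.1f} dBc/Hz")
print(f"This work @1MHz:   {pn_1m_norm:.1f} dBc/Hz")
print(f"[3]:               {pn_ref3_norm:.1f} dBc/Hz")
```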



Figure 23.7.7: RF SoC and PLL micrograph (4.4mm × 2.35mm).

# Session 24 Overview: *GaN Drivers and Converters*

## POWER MANAGEMENT SUBCOMMITTEE



**Session Chair:**  
***Yogesh Ramadas***  
*Texas Instruments, Santa Clara, CA*



**Associate Chair:**  
***Gerard Villar Piqué***  
*NXP Semiconductors,  
Eindhoven, The Netherlands*

**Subcommittee Chair: *Axel Thomsen*, Cirrus Logic, Austin, TX**

GaN power devices have garnered significant attention for their reduced switching losses leading to small-form-factor high-frequency switching converters. However, issues related to reliable GaN gate drivers, level shifters with high common-mode immunity and zero-voltage switching detection remain. This session presents recent advances in gate drivers and ZVS detection schemes for GaN-based power converters.



8:30 AM

**24.1 A 2MHz 150-to-400V Input Isolated DC-DC Bus Converter with Monolithic Slope-Sensing ZVS Detection Achieving 13ns Turn-On Delay and 1.6W Power Saving**

*L. Cong*, University of Texas at Dallas, Richardson, TX and Texas Instruments, Santa Clara, CA

In Paper 24.1, the University of Texas at Dallas describes a 2MHz 150-to-400V isolated phase-shifted full-bridge bus converter using gate-drivers that perform slope-sensing-based ZVS detection. The adaptive dead-time circuitry reduces converter loss by 1.6W and enables 2MHz switching frequency which is 14 $\times$  better than state of the art for comparable power efficiency.



9:00 AM

**24.2 A Fully Integrated Three-Level 11.6nC Gate Driver Supporting GaN Gate Injection Transistors**

*B. Wicht*, Reutlingen University, Reutlingen, Germany

In Paper 24.2, Reutlingen University presents a 1.5A GaN gate driver that provides the capability of bipolar/three-level gate drive without the need for any external capacitors. The integrated buffer capacitors enable the gate driver to support 11.6nC gate charge and the ability to drive gate-injection GaN transistors.



9:30 AM

**24.3 A 3-to-40V  $V_{IN}$  10-to-50MHz 12W Isolated GaN Driver with Self-Excited  $t_{dead}$  Minimizer Achieving 0.2ns/0.3ns  $t_{dead}$ , 7.9% Minimum Duty Ratio and 50V/ns CMTI**

*X. Ke*, University of Texas at Dallas, Richardson, TX

In Paper 24.3, the University of Texas at Dallas describes a GaN DC-DC converter for automotive application, which employs an isolated gate driver with dead-time and duty-ratio minimization schemes. The dead times can be reduced to 0.2ns while supporting a 40V-to-3.3V conversion. The on-die ground-isolation scheme can handle CMTI rates up to 50V/ns.

## 24.1 A 2MHz 150-to-400V Input Isolated DC-DC Bus Converter with Monolithic Slope-Sensing ZVS Detection Achieving 13ns Turn-On Delay and 1.6W Power Saving

Lin Cong<sup>1,2</sup>, Hoi Lee<sup>1</sup>

<sup>1</sup>University of Texas at Dallas, Richardson, TX

<sup>2</sup>Texas Instruments, Santa Clara, CA

The growing development of industrial power supplies demands DC-DC converters that are increasingly efficient and reliable. Industrial bus supplies that operate from a front-end PFC can experience fluctuations from 150V to 400V [1]. This requires isolated DC-DC converters to function over a wide range of high input voltages. Zero-voltage switching (ZVS) with proper dead-time control is essential in these converters to remove the dominant switching power loss and achieve better power efficiency. Specifically, each power FET should be controlled to turn on right after its ZVS operation is complete. If the dead time is set too short, the power FET is turned on with a large drain-to-source voltage, resulting in a large turn-on loss. If the dead time is excessive, the converter suffers from large reverse-conduction loss in the power FETs [2,3]. Since the ideal dead time varies with input voltage and load current, the capability of generating adaptive dead time with minimal turn-on delay ( $t_{don}$ ) in ZVS converters is important, especially for high-frequency converter operation, but is challenging to realize. Previously, a valley switching scheme was reported to provide adaptive dead-time control under different conditions. However, it requires an off-chip auxiliary winding that complicates the magnetic design, and it needs  $t_{don} \geq 100$ ns, which limits converter operation in the MHz range [4].

This paper presents a synchronous gate driver with a fully integrated ZVS detector (**ZVSD**) in a GaN isolated DC-DC converter to support 150 to 400V input. Adaptive dead time with  $\sim 13$ ns  $t_{don}$  is achieved to enable MHz converter operation and minimize various power losses under different input voltages and load currents. A level shifter with a differential-mode noise-blanking (**DMNB**) scheme is also developed to enhance the converter reliability at high input voltages.

Figure 24.1.1 shows the structure of the full-bridge isolated DC-DC converter that consists of four power FETs ( $M_{HA}$ ,  $M_{LA}$ ,  $M_{HB}$  and  $M_{LB}$ ) on the primary side.  $M_{HA}$  and  $M_{LA}$  are driven out of phase compared to  $M_{HB}$  and  $M_{LB}$ , respectively, to control the output voltage. A 2<sup>nd</sup>-order LC lowpass filter formed by  $L_F$  and  $C_0$  reduces the output voltage ripple. During the switching transitions, transient currents of the leakage inductance  $L_R$  charge or discharge the switching nodes  $V_{SWA}$  and  $V_{SWB}$  to realize ZVS of the different power FETs. Figure 24.1.1 also depicts the architecture of the gate driver for  $M_{HA}$  and  $M_{LA}$ , in which the turn-on and turn-off mechanisms of each power FET are determined by the proposed ZVSD and the PWM controller, respectively. The proposed ZVSD senses the rising (falling) slope of node voltage  $V_{SWA}$  and ends the dead time by turning on  $M_{HA}$  ( $M_{LA}$ ) when the slope changes from positive (negative) to 0 at the end of the ZVS transition. Moreover, the fully integrated implementation of this slope-sensing ZVSD (**SS-ZVSD**) can significantly reduce  $t_{don}$  as compared to the prior discrete-based valley switching detection [4]. Since the reverse conduction of GaN FETs occurs during  $t_{don}$ , as shown in Fig. 24.1.1, the proposed SS-ZVSD with short  $t_{don}$  enables minimal reverse conduction loss over wide ranges of the input voltage and load current.

Figure 24.1.2 shows both high- and low-side SS-ZVSD circuits to control the turn-on of  $M_{HA}$  and  $M_{LA}$ , respectively. Each SS-ZVSD circuit mainly consists of a voltage-slope detector (**VSD**) and a current comparator. Capacitor  $C_P$  in the VSD is the parasitic capacitance of an always-off HV NMOS  $M_D$ , for converting the slope of  $V_{SWA}$  to a current pulse  $I_{SEN}$  in both high- and low-side ZVSD circuits. Without loss of generality, the operation of turning on  $M_{HA}$  is described as follows. After the low-side  $M_{LA}$  is off, the transient current of  $L_R$  flows into the node  $V_{SWA}$  such that both voltages  $V_{SWA}$  and  $V_{BT}$  increase towards  $V_{IN}$ , and thus  $C_P$  generates a positive current pulse  $I_{SEN}$  ( $= C_P dV_{SWA}/dt$ ). The current comparator then triggers a pulse  $I_{SENH}$  to reset  $V_{DH}$  to logic low. After the dead time (the duration of  $V_{DH}$  being logic low),  $M_{HA}$  is turned on when  $V_{SWA}$  reaches  $V_{IN}$  (full ZVS) or its peak value (partial ZVS) for momentarily reducing the slope of  $V_{SWA}$  to 0. The operation of the SS-ZVSD indicates that the dead time would be adaptively generated when the input  $V_{IN}$  (thus  $V_{SWA}$ ) or load  $I_0$  (thus  $L_R$  current) changes. As both SS-ZVSD circuits guarantee correct operation if  $|dV_{SWA}/dt| > 0.5$ V/ns in this design, the converter can automatically realize ZVS operation over wide ranges of the input voltage and load current. Moreover, the capability of the SS-ZVSD to support both full- and partial-ZVS operations enhances the converter reliability especially in the high input-voltage condition.
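The slope-sensing principle can be mimicked behaviorally: differentiate a sampled switching-node waveform, convert the slope into a sense current I<sub>SEN</sub> = C<sub>P</sub>·dV<sub>SWA</sub>/dt, and flag the end of the ZVS transition once that current collapses after a ramp has been seen. The sketch below is a simplified numerical model, not the circuit itself. The 1pF value for C<sub>P</sub>, the idealized linear ramp, the 10V/ns slope and the reuse of the 0.5V/ns figure as a comparator threshold are all assumptions made for illustration.

```python
def make_vsw_ramp(v_in, slope_v_per_ns, t_start_ns=5.0, dt_ns=0.5, t_stop_ns=100.0):
    """Idealized V_SWA: 0 V, then a linear ramp, then clamped at V_IN (full ZVS)."""
    n = int(t_stop_ns / dt_ns) + 1
    times = [k * dt_ns for k in range(n)]
    volts = [min(max(0.0, slope_v_per_ns * (t - t_start_ns)), v_in) for t in times]
    return times, volts


def detect_zvs_end_ns(times_ns, v_sw, c_p_farad=1e-12, slope_min_v_per_ns=0.5):
    """Return the first sample time at which |I_SEN| drops below threshold after a ramp.

    c_p_farad is a HYPOTHETICAL sensing capacitance; the threshold current is
    derived from slope_min_v_per_ns (used here as an illustrative comparator level).
    """
    i_threshold = c_p_farad * slope_min_v_per_ns * 1e9  # amperes
    ramp_seen = False
    for k in range(1, len(times_ns)):
        dt_s = (times_ns[k] - times_ns[k - 1]) * 1e-9
        i_sen = c_p_farad * (v_sw[k] - v_sw[k - 1]) / dt_s  # I_SEN = C_P * dV/dt
        if abs(i_sen) > i_threshold:
            ramp_seen = True           # ZVS transition in progress
        elif ramp_seen:
            return times_ns[k]         # slope back to ~0: transition finished
    return None


# Same assumed 10 V/ns slope, two input voltages: detection time tracks V_IN.
t_end_400 = detect_zvs_end_ns(*make_vsw_ramp(v_in=400.0, slope_v_per_ns=10.0))
t_end_150 = detect_zvs_end_ns(*make_vsw_ramp(v_in=150.0, slope_v_per_ns=10.0))
print(f"ZVS end detected at {t_end_400} ns (400 V) and {t_end_150} ns (150 V)")
```

The detected instant moves with the input voltage without any stored timing value, which is the sense in which the dead time is adaptive.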

High-frequency converter operation requires minimizing  $t_{don}$ , which is the duration needed to turn on  $M_{HA}$  or  $M_{LA}$  after the end of the ZVS transition. Since the high- and low-side SS-ZVS detections are performed directly in the high- and low-voltage domains, respectively, the delay of the level shifting is bypassed. Additionally, on-chip capacitors  $C_H$  and  $C_L$  in Fig. 24.1.2 provide significant dynamic current to discharge the capacitance at the drain terminals of  $M_{n2}$  and  $M_{n4}$  in the current comparator after the ZVS transition finishes. Hence, the high- and low-side ZVS detection delays are reduced to  $< 8$ ns and  $< 3$ ns, respectively, under different input voltages and load currents. Together with the buffer and logic delays, the high- and low-side  $t_{don}$  values are  $< 13$ ns and  $< 6$ ns, respectively. The total power dissipation of both SS-ZVSD circuits is 200 $\mu$ W, which is only 0.0002% of the converter minimum output power of 100W.

The reliability of the HV converter also depends on the design of the level shifter. As shown in Fig. 24.1.3, with the large  $dv/dt$  at node  $V_{SWA}$  at high input voltages, the large parasitic capacitances of HV LDMOS  $M_{n5}$  and  $M_{n6}$  would pull  $V_{HS}$  and  $V_{HR}$  to logic low, resulting in false gate-drive signals during start-up. The conventional short-pulse blanking scheme also cannot be used for the ZVS gate drivers because the  $V_{SWA}$  ramp-up occurs earlier than the turn-on of the RS latch. To enhance the reliability of the HV level shifter, the DMNB circuit shown in Fig. 24.1.3 is proposed. Specifically, all transistors  $M_{p7}$  to  $M_{p10}$ ,  $M_{n7}$ ,  $M_{n10}$  in the DMNB circuit are turned on when  $V_{SWA}$  experiences large  $dv/dt$ . With a specific size ratio,  $M_{p7}$  (or  $M_{p10}$ ) provides a higher current than  $M_{n7}$  (or  $M_{n10}$ ), so both outputs  $V_{HS}'$  and  $V_{HR}'$  of the DMNB circuit can be held at logic low for blanking the  $dv/dt$  noise. If only one voltage  $V_{HS}$  (or  $V_{HR}$ ) is pulled low in normal operation,  $V_{HS}'$  (or  $V_{HR}'$ ) simply follows with negligible delay. Hence, the reliability of the level shifter, and thus the converter, is improved by the proposed DMNB circuit, which dissipates only 120 $\mu$ W.

With the proposed gate driver, the isolated ZVS converter with four 650V eGaNs (GS66502B) as power FETs and a planar transformer using EQ30-3F45 can support  $V_{IN}$  from 150V to 400V and deliver output power up to 250W. Figure 24.1.4 shows that the measured turn-on delays ( $t_{don}$ ) of the high-side  $M_{HA}$  and low-side  $M_{LA}$  are within 13ns and 6ns, respectively, for both inputs of 200V and 400V. Figure 24.1.5 further shows that the variation of  $t_{don}$  for turning on  $M_{HA}$  is only 1ns when  $V_{IN}$  varies from 150V to 400V and  $I_0$  changes from 2A to 5A. The converter is also proven to operate at 1MHz with  $V_{IN} = 400$ V and 2MHz with  $V_{IN} = 150$ V. Moreover, Fig. 24.1.6 shows that the proposed gate driver with the SS-ZVSD reduces the converter power loss by up to 1W and 1.6W under ( $V_{IN} = 400$ V,  $f_{SW} = 1$ MHz) and ( $V_{IN} = 150$ V,  $f_{SW} = 2$ MHz), respectively, as compared to utilizing the prior valley switching scheme with 100ns  $t_{don}$ . Comparisons with prior HV converters with similar input and output voltages are also provided in Fig. 24.1.6. The proposed gate driver with the SS-ZVSD is the only one that has fully integrated adaptive dead-time control. With  $> 7.6$ x reduction in  $t_{don}$ , this GaN isolated ZVS converter can operate at up to 14x higher frequency with similar power efficiency over wider ranges of input voltage and output power as compared to state-of-the-art counterparts [1,4]. Figure 24.1.7 shows the micrograph of the proposed gate driver in a 1.45 $\mu$ m 700V BCD process.
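The headline ratios quoted above can be reproduced from the numbers already given in the text and in the comparison table of Fig. 24.1.6. The short sketch below only restates that arithmetic; the variable names are chosen here for illustration.

```python
# Values quoted in the paper.
T_DON_PRIOR_NS = 100.0   # prior valley-switching scheme [4]
T_DON_THIS_NS = 13.0     # high-side turn-on delay of this work
F_SW_PRIOR_KHZ = 140.0   # switching frequency listed for [4]
F_SW_THIS_KHZ = 2000.0   # maximum switching frequency of this work
ZVSD_POWER_W = 200e-6    # total SS-ZVSD power
P_OUT_MIN_W = 100.0      # minimum converter output power

t_don_reduction = T_DON_PRIOR_NS / T_DON_THIS_NS
frequency_gain = F_SW_THIS_KHZ / F_SW_PRIOR_KHZ
zvsd_share_percent = 100.0 * ZVSD_POWER_W / P_OUT_MIN_W

# Fraction of one 2 MHz switching period taken up by a single turn-on delay.
period_ns = 1e6 / F_SW_THIS_KHZ
share_this = T_DON_THIS_NS / period_ns
share_prior = T_DON_PRIOR_NS / period_ns

print(f"t_don reduction: {t_don_reduction:.2f}x, frequency gain: {frequency_gain:.1f}x")
print(f"SS-ZVSD power share: {zvsd_share_percent:.4f}% of minimum output power")
print(f"t_don / period at 2 MHz: {share_this:.1%} (this work) vs {share_prior:.0%} (100 ns)")
```

The last line shows why a 100ns delay is restrictive at 2MHz: a single delay would already occupy a fifth of the 500ns switching period.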

### Acknowledgements:

This work was supported by SRC/TxACE with the contract number 1836.048. The authors would like to thank Wei Zhang and Zhong Ye of Texas Instruments for technical suggestions and help, and Texas Instruments for the IC fabrication support.

### References:

- [1] VICOR, "Micro Family 300V Input DC-DC Converter Module," *datasheet*, 2017.
- [2] Y. Xia and R. Ayyanar, "Inductor Feedback ZVT Based, Low THD Single Phase Full Bridge Inverter with Hybrid Modulation Technique," *IEEE APEC*, pp. 3444-3450, March 2017.
- [3] J. Strydom, "Dead Time Optimization for Maximum Efficiency," *EPC white paper*, 2013.
- [4] MPS, "HFC0100 - Quasi Resonant Controller," *datasheet*, 2015.



Figure 24.1.1: Proposed isolated DC-DC converter with fully integrated ZVS adaptive dead-time control for minimizing both turn-on delay and reverse-conduction loss.



Figure 24.1.2: Slope-sensing ZVSD circuits and key operation waveforms for adaptive dead-time control.



Figure 24.1.3: Schematic and advantage of the dynamic HV level shifter with the differential-mode noise-blanking scheme.



Figure 24.1.4: Measured ZVS turn-on delays for both high-side and low-side power FETs under  $V_{IN}$  of 200V and 400V.



Figure 24.1.5: Measurements of variation of turn-on delay and steady-state converter operation waveforms under different conditions.



Figure 24.1.6: Power-efficiency measurements and converter performance comparisons.

|                        | VICOR<br>V300C48C150BL [1]     | MPS<br>HFC0100 [4]                       | This work                                                 |
|------------------------|--------------------------------|------------------------------------------|-----------------------------------------------------------|
| $V_{IN}$ [V]           | 180 – 375 DC                   | 85 – 265 AC                              | 150 – 400 DC                                              |
| $V_O$ [V]              | 48                             | 24                                       | 48                                                        |
| Max $P_{OUT}$ [W]      | 150                            | 120                                      | 250                                                       |
| $F_{sw}$ [kHz]         | Not reported                   | 140                                      | Up to 2000                                                |
| Power switch           | MOSFET                         | MOSFET                                   | 650V GaN                                                  |
| Switching method       | ZVS                            | ZVS                                      | ZVS                                                       |
| Gate driving Method    | Isolated driver w/ transformer | Integrated driver for low-side FET       | Integrated synchronous driver for high- and low-side FETs |
| ZVS detection method   | Not available                  | Auxiliary winding based valley detection | Integrated $V_{SW}$ slew rate detection                   |
| Peak power efficiency  | 88.2%                          | 90.5% (220VAC, 1.5A)                     | 90.2% (400V, 1MHz, 4A)<br>88.6% (150V, 2MHz, 3.3A)        |



Figure 24.1.7: Die micrograph of the proposed gate driver in a 1.45 $\mu$ m 700V BCD process with total die area of 9mm<sup>2</sup> excluding the testing circuit.

## 24.2 A Fully Integrated Three-Level 11.6nC Gate Driver Supporting GaN Gate Injection Transistors

Achim Seidel<sup>1</sup>, Bernhard Wicht<sup>1,2</sup>

<sup>1</sup>Reutlingen University, Reutlingen, Germany

<sup>2</sup>Leibniz University Hannover, Hannover, Germany

Due to their superior fast-switching performance, GaN transistors show enormous potential to enable compact power electronics in applications like renewable energy, electric cars and home appliances by shrinking the size of passives. However, fast switching poses challenges for the gate driver.

Since GaN transistors have a low threshold voltage  $V_t$  of ~1V, an unintended turn-on can occur with unipolar gate control, as shown for a typical half-bridge in Fig. 24.2.1 (top left). This is due to coupling via the gate-drain capacitance (Miller coupling) when the low-side driver turns on, causing a peak current into the gate. This is usually tackled by applying a negative gate voltage to enhance the safety margin towards  $V_t$ , resulting in a bipolar gate-driving scheme. In many power-electronics applications, GaN transistors operate in reverse conduction, carrying the inductor current during the dead time  $t$ , when the high-side and low-side switches are both off (as illustrated for a high-side switch in Fig. 24.2.1, bottom left). As there is no real body diode as in silicon devices, the GaN transistor turns on in reverse operation with a voltage drop  $V_F$  across the drain-source terminals (quasi-body diode behavior). As a negative gate voltage adds to  $V_F$ , 63% higher reverse-conduction losses were measured for a typical GaN switch in bipolar gate-drive operation. This drawback is addressed by a three-level gate voltage (positive, 0V, negative), which at the same time provides robustness against unintended turn-on similar to the bipolar gate driver, as proven in [1] for a discrete driver.

Conventional gate drivers with bipolar gate voltages (Fig. 24.2.1 top right) often consist of a unipolar gate driver IC, while the source of the GaN transistor is connected to an intermediate voltage level. Two external capacitors buffer the gate charge for the positive and negative voltage levels. An additional charge pump with an external capacitor or a bulky and expensive isolated supply (transformer) is required to generate the negative  $V_{GS}$ . The discrete bipolar driver in [2] requires no additional charge pump, but still needs two external capacitors. [3] presents a microwave-driven bipolar gate driver without any external capacitors; however, it does not buffer the transmitted energy, limiting the gate current to ~30mA.

This work presents a gate driver, which utilizes a full-bridge architecture (Fig. 24.2.1 bottom right) to provide a bipolar / three-level gate drive voltage. As GaN transistors operate within a typical  $V_{GS}$ -range of ±5V, the four transistors of the driver output stage can be designed with just low-voltage transistors. Depending on the bridge polarity, a positive, negative and zero gate-source voltage is applied to the GaN device. To avoid the disadvantages of an external  $C_{DRV}$ , the concept of high-voltage energy storing (HVES) with integrated buffer capacitor [4] is used at both the gate and the source. The integrated HVES circuit is able to deliver high-current pulses based on a resonant discharging of a high-voltage capacitor over an inductor to the gate. Since the energy in the capacitor is proportional to the square of its voltage (e.g. 15V), the circuit can provide a high amount of charge in a small die area. A capacitor-sharing charge pump ( $C_{HV}$  CP) on the source side enables the recharge of the high-voltage buffer capacitor  $C_{HVN}$  from a non-isolated driver supply, during driver off-state. By integrating all capacitors, the presented work reduces the pin count as well as bond wires in the gate loop and provides more flexibility for the PCB and system design.
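The area argument behind high-voltage energy storing rests on E = ½CV². The sketch below compares the energy held by the same capacitance at the 15V example level and at a 5V drive level. The 0.6nF value is the C<sub>HVP</sub> figure the paper quotes for the implementation, while storing directly at 5V is a hypothetical baseline introduced here for comparison.

```python
def stored_energy_j(c_farad: float, v_volt: float) -> float:
    """Energy stored in a capacitor: E = 0.5 * C * V^2."""
    return 0.5 * c_farad * v_volt ** 2


C_HV = 0.6e-9   # C_HVP value quoted in the paper
V_HV = 15.0     # example storage voltage from the text
V_LV = 5.0      # ASSUMPTION: baseline of storing at the ~5 V gate-drive level

e_hv = stored_energy_j(C_HV, V_HV)
e_lv = stored_energy_j(C_HV, V_LV)
energy_ratio = e_hv / e_lv

print(f"Energy at {V_HV:.0f} V: {e_hv * 1e9:.1f} nJ")
print(f"Energy at {V_LV:.0f} V: {e_lv * 1e9:.1f} nJ")
print(f"Ratio for the same capacitance: {energy_ratio:.0f}x")
```

For a fixed capacitance, tripling the storage voltage yields nine times the stored energy, which is why a high-voltage buffer capacitor can be kept small enough to integrate.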

Figure 24.2.2 (top) shows the circuit of the implemented three-level gate driver. To turn on the GaN switch,  $MN_p$  is turned off by the driver control signal  $DRV_{INP}$ . In case of a bipolar driver operation,  $DRV_{INN}$  turns 'high', switching off  $MP_n$  and turning on  $MN_n$  to discharge the source S.  $M_{HVP}$  turns on, triggering the 'Gate driver (with HVES)' [4], charging the GaN gate G resonantly from the high-voltage capacitor  $C_{HVP}=0.6nF$  through the integrated inductor  $L_{HVP}\sim14nH$  and an active rectifier. Triggered by the GaN gate node G,  $MP_p$  turns on, keeping G with a low-ohmic and low-inductive connection at  $V_{DRV}$  level in the static state. For driver turn-off,  $MP_p$  turns off and  $MN_p$  on, controlled by the driver signal  $DRV_{INP}$ , discharging node G. Triggered by  $DRV_{INN}$ , the high-voltage transistor  $M_{HVN}$  initiates a resonant current pulse from  $C_{HVN}$  over  $L_{HVN}$ , charging node S. The current path is closed via  $MN_p$  and the low-voltage charge pump transistor  $M_{CP}$ , which is in the on-state during driver turn-off. Controlled by the rising node S,  $MP_n$  connects S to  $V_{DRV}$ . In three-level mode,  $DRV_{INN}$  turns 'high' before the next driver turn-on event, turning  $MP_n$  off and discharging node S via  $MN_n$ .
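The way a full bridge yields three gate-source levels can be summarized by treating the gate node G and the source node S as each tied either to V<sub>DRV</sub> or to chip ground in the static state. The sketch below is a behavioral abstraction of that idea, not a model of the resonant transitions. The 5V value for V<sub>DRV</sub> and the phase names are assumptions for illustration.

```python
V_DRV = 5.0  # ASSUMPTION: illustrative drive voltage within the typical +/-5 V range


def gate_source_voltage(gate_high: bool, source_high: bool, v_drv: float = V_DRV) -> float:
    """Static V_GS of the GaN device for a given full-bridge state.

    gate_high:   node G tied to V_DRV (True) or chip ground (False)
    source_high: node S tied to V_DRV (True) or chip ground (False)
    """
    v_g = v_drv if gate_high else 0.0
    v_s = v_drv if source_high else 0.0
    return v_g - v_s


# One switching cycle in three-level mode, following the sequence in the text:
# on-state, off-state with S raised, then S discharged before the next turn-on.
cycle = [
    ("on", True, False),
    ("off (negative)", False, True),
    ("off (zero, before next turn-on)", False, False),
]
levels = [gate_source_voltage(g, s) for _, g, s in cycle]

for (phase, _, _), v_gs in zip(cycle, levels):
    print(f"{phase:34s} V_GS = {v_gs:+.1f} V")
```

Because each node only ever sits between 0V and V<sub>DRV</sub>, the bridge transistors see at most V<sub>DRV</sub>, which is consistent with the statement that the output stage can be built from low-voltage devices.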

In case of an isolated supply (supply ground referred to chip ground),  $C_{HVP}/C_{HVN}$  can be connected to the chip ground. For recharging  $C_{HVP}/C_{HVN}$  from  $V_{SUP}$  (typ. ~15V), the back-to-back switches  $B2B_p/B2B_n$  turn on and the current path closes over the capacitors and the chip ground. In case of a non-isolated supply (supply ground referred to the source S), the source node S is connected to  $V_{DRV}$  during driver off-state (negative gate level). If the bottom plates of capacitors  $C_{HVP}/C_{HVN}$  were connected to the chip ground, the charging current would flow through  $C_{DRV}$  and  $MP_n$  to S, discharging  $C_{DRV}=3.6nF$ . To avoid this, the bottom plate of  $C_{HVP}$  can be connected to the source node S of the GaN transistor. However, for  $C_{HVN}$  this is not possible, because it needs to have a chip ground connection during driver turn-off. For this reason, the implemented  $C_{HV}$  charge pump ( $C_{HV}$  CP), comprising  $M_{CP}$  and  $D_{CP}$ , reuses  $C_{HVN}$  as a charge pump capacitor. Figure 24.2.2 (bottom right) shows its function. During driver turn-off, node G is shorted to chip ground, while  $M_{CP}$  is active in order to charge node S. The resonant characteristics of the gate driver with HVES cause the voltage at node S to exceed  $V_{DRV}$ , such that  $C_{DRV}$  recharges via the body diode of  $MP_n$ . The same  $C_{DRV}$  recharge scheme applies on the gate side at driver turn-on. After driver turn-off,  $M_{CP}$  turns off and  $C_{HVN}$  is recharged via  $D_{CP}$  and  $MP_n$  directly to the source. In case of a low-side driver, the driver supply is also available during driver on-state. Since the chip ground is connected to node S via  $MN_n$ , the proposed concept allows all capacitors to be recharged referred to S.

The gate driver IC was fabricated in a 0.18µm HV BiCMOS technology. In order to verify its driving capability, a capacitor  $C_{GS,add}$  was added, which artificially increases the gate capacitance (Fig. 24.2.3 right). A maximum gate charge of  $Q_{max}=11.6nC$  can be delivered and maximum gate drive currents of  $I_{Gsource,max}=1.5A$  and  $I_{Gsink,max}=1.3A$  were confirmed.

The proposed three-level driver also supports GaN gate injection transistors (GIT), Fig. 24.2.3. An LDO delivers a DC gate current  $I_G$  in the mA-range during driver on-state. For non-GIT GaN devices, the LDO acts as a voltage regulator for  $V_{DRV}$ . In contrast, solutions like [2] would require an additional current-consuming resistor in parallel to  $C_{GS}$ .

The double pulse test, Fig. 24.2.4, confirms fast transients of >80V/3ns for a GIT in three-level mode at a switching voltage  $V+=100V$  and  $I_L=10A$  load.

The gate driver and the $C_{HV}$ charge pump (Fig. 24.2.2) deliver the charge every driver switching cycle. In some power applications, it is necessary to provide a negative gate voltage during long driver turn-off phases or at startup of the particular application. To supply $V_{DRV}$, an additional charge pump ($V_{DRV}$ Charge Pump) can be activated (Fig. 24.2.5), which needs to deliver only the static current drawn from $V_{DRV}$. The bottom part of Fig. 24.2.5 shows the benefit of the $V_{DRV}$ CP in supporting long driver turn-off phases, in contrast to the case with the $V_{DRV}$ CP disabled (Fig. 24.2.5 bottom left). The driver is supplied by the $C_{HV}$ charge pump, and additionally from $V_{SUP}$ in a low-side configuration during driver on-state. Figure 24.2.5 top right shows a driver startup measurement.

The comparison to prior art and the die micrograph are depicted in Fig. 24.2.6 and Fig. 24.2.7, respectively. The proposed driver is the only fully integrated gate driver that provides a three-level gate drive voltage (positive, 0V, negative). Without any external capacitors, it delivers up to 11.6nC gate charge with a high gate-drive-current capability of 1.5A. The driver also supports a limited on-state gate current, for driving GaN gate injection transistors (GIT).

### References:

- [1] Z. L. Zhang, et al., "Three-Level Gate Drivers for eGaN HEMTs in Resonant Converters", *IEEE Trans. Power Electronics*, pp. 5527-5538, July 2017.
- [2] D. Bortis, et al., "Comprehensive Evaluation of GaN GIT in Low- and High-Frequency Bridge Leg Applications," *IPMC-ECCE Asia*, pp. 21-30, May 2016.
- [3] S. Nagai, et al., "A DC-Isolated Gate Drive IC with Drive-by-Microwave Technology for Power Switching Devices," *ISSCC*, pp. 404-406, Feb. 2012.
- [4] A. Seidel, et al., "A 1.3A Gate Driver for GaN with Fully Integrated Gate Charge Buffer Capacitor Delivering 11nC Enabled by High-Voltage Energy Storing," *ISSCC*, pp. 432-433, Feb. 2017.
- [5] M. K. Song, et al., "A 20V 8.4W 20MHz Four-Phase GaN DC-DC Converter with Fully On-Chip Dual-SR Bootstrapped GaN FET Driver Achieving 4ns Constant Propagation Delay and 1ns Switching Rise Time," *ISSCC*, pp. 302-303, Feb. 2015.
- [6] X. Ke, et al., "A 3-to-40V 10-to-30MHz Automotive-Use GaN Driver with Active BST Balancing and VSW Dual-Edge Dead-Time Modulation Achieving 8.3% Efficiency Improvement and 3.4ns Constant Propagation Delay," *ISSCC*, pp. 302-304, Feb. 2016.



Figure 24.2.1: Concepts for multiple gate-drive schemes including the proposed three-level gate driver.



Figure 24.2.2: Schematic of the proposed gate driver.



Figure 24.2.3: Operation principle of  $C_{HV}$  charge pump, GIT driving and characterization of gate charge / current.



Figure 24.2.4: Transient measurements of the gate driver with three-level gate voltage operation.



Figure 24.2.5:  $V_{DRV}$  Charge Pump circuit and its influence on driver performance at start-up / long off-time.

|                               | ISSCC'12 [3]  | ISSCC'15 [5]           | ISSCC'16 [6]            | ECCE Asia'16 [2]                | ISSCC'17 [4]          | This Work                                    |
|-------------------------------|---------------|------------------------|-------------------------|---------------------------------|-----------------------|----------------------------------------------|
| Technology                    | GaN/ Sapphire | Si, 0.35 $\mu$ m BCD   | Si, 0.35 $\mu$ m BCD    | Discrete                        | Si, 180nm BCD         | Si, 180nm BCD                                |
| Max. $I_{drv}$                | 30mA          | n.r.*                  | n.r.*                   | n.r.*                           | 1.3A                  | 1.5A                                         |
| Bipolar gate voltage          | Yes           | No                     | No                      | Yes                             | No                    | Yes                                          |
| Three-level gate voltage      | No            | No                     | No                      | No                              | No                    | Yes                                          |
| Support of GaN GIT switches   | Yes           | No                     | No                      | Yes                             | Yes                   | Yes                                          |
| Buffer capacitors             | 0             | n.r.*                  | n.r.*                   | 10nF, 2 <sup>nd</sup> cap n.r.* | 1.7nF, 0.6nF          | 3.6nF ( $V_{DRV}$ ), 2x0.6nF ( $V_{HVN,p}$ ) |
| Max. gate charge $Q_G$        | N/A           | 133pC ***              | 133pC ***               | N/A                             | 11nC                  | 11.6nC                                       |
| Buffer capacitor area         | 0             | 0.15mm <sup>2</sup> ** | 0.718mm <sup>2</sup> ** | N/A                             | 1.4mm <sup>2</sup>    | 2.9mm <sup>2</sup>                           |
| $Q_G$ / buffer capacitor area | N/A           | 0.9nC/mm <sup>2</sup>  | 0.19nC/mm <sup>2</sup>  | N/A                             | 7.6nC/mm <sup>2</sup> | 4nC/mm <sup>2</sup>                          |
| Max. $V_{SW}$ slew rate       | n.r.*         | 20V/1ns                | 40V/1.5ns               | 400V/2ns<br>40V/1.2ns           | n.r.*                 | 80V/3ns<br>80V/2.8ns                         |

\* not reported, \*\* extracted from chip micrograph, \*\*\* datasheet EPC8002, \*\*\*\* extracted from graph



Figure 24.2.7: Die micrograph.

### 24.3 A 3-to-40V $V_{IN}$ 10-to-50MHz 12W Isolated GaN Driver with Self-Excited $t_{dead}$ Minimizer Achieving 0.2ns/0.3ns $t_{dead}$ , 7.9% Minimum Duty Ratio and 50V/ns CMTI

Xugang Ke, D. Brian Ma

University of Texas at Dallas, Richardson, TX

High-frequency ($f_{SW}$), wide-input ($V_{IN}$) power converters have gained increasing popularity in automotive applications due to the growing demand for low profile, high power density and fast dynamic response [1]. To implement such converters, it is widely believed that Gallium Nitride (GaN) technology will replace conventional silicon technology in the future due to its far-superior figure of merit ($R_{DS(on)} \times Q_G$) [2,3]. Hence, there is an urgent need to develop compatible GaN driving techniques and circuits that enable high-efficiency, high-conversion-ratio (CR) and high-reliability power conversion at high $f_{SW}$. Among the many challenges in accomplishing these goals, gate driver dead-time ($t_{dead}$) control and ground-interference isolation require the most immediate attention.

Figure 24.3.1 shows a conventional GaN gate driver with a fixed $t_{dead}$ control [4,5]. It senses the gate signal $V_{GH}$ of the high-side switch $M_H$ before turning on the low-side switch $M_L$ to prevent catastrophic shoot-through current in $M_H$ and $M_L$. However, this causes a long $t_{dead}$ at the falling edge of the switching node $V_{SW}$, contributed by the down-level shifter delay (marked as ④), the $t_{dead}$ generator delay (②) and the $M_L$ driver stage delay (⑤). Similarly, the delays of the $t_{dead}$ generator (①), the up-level shifter (③) and the $M_H$ driver stage (⑥) lead to the $t_{dead}$ at the $V_{SW}$ rising edge. This excessively long $t_{dead}$ incurs a high reverse conduction loss in $M_L$ (which has no body diode), degrading efficiency dramatically as $f_{SW}$ increases. A design in [6] proposed an adjustable $t_{dead}$ control. However, due to the delay mismatch of rising and falling edges, $t_{dead}$ is not minimized and the effective on-time ($T_{ON}$) is reduced ($T_{ON} < t_{ON,PWM}$). As $t_{ON,PWM}$ is pushed as short as the total delay of ④ and ① in Fig. 24.3.1, $M_H$ can no longer be turned on. This limits the minimum $t_{ON,PWM}$ that can pass to $V_{GH}$. Furthermore, because of circuit delay and signal distortion, the ON-duty times of the gate drive voltages $V_{GH}$ and $V_{GL}$ deviate from those of the respective inputs $V_H$ and $V_L$, causing regulation errors at the output $V_o$. More seriously, at high CR, the ON-duty time becomes very short, as previously stated. For example, for a 40V-to-3.3V converter at 10MHz, the effective ON-duty time of $V_{PWM}$ is only 8.25ns. On the other hand, automotive electronic devices share a common chassis ground involving large parasitic inductance/resistance ($L_{GND}, R_{GND}$) in the ground return trace.
With a high level of DC return current from power loads ($I_{load}$) and AC return current from switching converters, the voltage level of the local ground panel ($V_{GND2}$) can be shifted significantly from $V_{GND1}$ (tied to the cathode of the battery $V_{IN}$). In high-$f_{SW}$, high-$I_{load}$ operation, $V_{IR} + V_{AC}$ could potentially surpass the supply voltage, imposing a transmission error onto $V_{PWM}$ in Fig. 24.3.1. Meanwhile, as load power reaches tens of watts, a large di/dt transient at $M_L$ incurs a large voltage bounce across the parasitics ($L_{PP}, R_{PP}$) of PGND, significantly lengthening the turn-on/off periods and leading to high power loss [7]. Lastly, with high $V_{IN}$, the high dv/dt transient on $V_{SW}$ injects a large coupling current into the level shifters through $C_{par,ls}$, which could cause false triggering at $V_{GH}$. Thus, it would be highly desirable to isolate the ground of the gate drivers to suppress the DC ground shift and AC ground bounce for system reliability.
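The 8.25ns figure quoted above follows from the ideal buck relation between conversion ratio and on-time. The short sketch below reproduces that arithmetic; it assumes a lossless buck converter in continuous conduction, and the function name `on_duty_time_ns` is purely illustrative.

```python
def on_duty_time_ns(v_in: float, v_out: float, f_sw_hz: float) -> float:
    """Ideal buck ON-duty time in nanoseconds: (V_out / V_in) / f_sw.

    Assumes a lossless converter in continuous conduction mode.
    """
    duty = v_out / v_in          # ideal conversion ratio
    return duty / f_sw_hz * 1e9  # seconds -> nanoseconds


# 40V-to-3.3V conversion at 10MHz, the example used in the text
t_on = on_duty_time_ns(v_in=40.0, v_out=3.3, f_sw_hz=10e6)
print(f"ON-duty time: {t_on:.2f} ns")
```

Under the same assumption, raising $f_{SW}$ shortens the available on-time proportionally, which is why propagation-delay mismatch becomes critical at high CR.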

Figure 24.3.2 illustrates the proposed GaN driver circuit. It consists of H- and L-channel ON-duty mirrors, two self-excited $t_{dead}$ minimizers across the channels and a ground-isolated gate drive stage. To mitigate the $t_{dead}$ challenge, the ON-duty mirrors and the $t_{dead}$ minimizers play critical roles. The ON-duty mirrors are designed to adaptively map the ON-duty times of $V_{GH}$ and $V_{GL}$ from the respective inputs $V_H$ and $V_L$. Consider the L-channel ON-duty mirror in Fig. 24.3.2 as an example. When the ON-duty of $V_{GL}$ is larger than that of $V_L$ ($t_{ON}$), the ON-duty mirror discharges a compensation capacitor $C_{cmp}$. This action reduces a charge current $I_{d1}$ in the rising-edge generator, but increases a charge current $I_{d2}$ in the falling-edge generator of the ON-duty mirror circuit. As $I_{d1}$ and $I_{d2}$ define the charge rates that modulate the dual-edge delays, the decrease of $I_{d1}$ creates a delay $t_{rr}$ to postpone the rise of $V_{GL}$, while the increase of $I_{d2}$ forces $V_{GL}$ to drop earlier by a period of $t_{fr}$. The ON-duty of $V_{GL}$ is thus narrowed until it matches that of $V_L$. At startup, the ON-duty of $V_{GL}$ is always modulated starting from zero to prevent shoot-through current in the GaN switches. Similarly, when the ON-duty time of $V_{GL}$ is smaller than $t_{ON}$, the mirror circuit operates in the opposite way to extend the ON-duty time until it matches the $t_{ON}$ of $V_L$. On the other hand, the $t_{dead}$ minimizers minimize the dead times between the H- and L-channels. For example, as depicted in Fig. 24.3.2, in the $V_{SW}$ falling edge $t_{dead}$, a pulse of $t_{dm}$ is generated by the D flip-flop in the circuit, which excites the following circuit to charge up the node $V_E$, and adds a delay time into $V_{GH}$. The operation repeats until the falling edge of $V_{GH}$ meets the rising edge of $V_{GL}$ and $t_{dead}$ is minimized close to zero. A large resistor $R_H$ (which creates an offset) sets a minimum $t_{dead}$, and initiates the $t_{dead}$ minimizer at startup. A similar operation applies to the $V_{SW}$ rising edge $t_{dead}$ minimizer.
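The iterative behavior of the $t_{dead}$ minimizer can be illustrated with a purely behavioral sketch. It is not the authors' circuit: it assumes that every switching cycle adds one fixed delay step to $V_{GH}$ and that the $R_H$ offset acts as a hard floor. The step size (100ps) is a hypothetical value, and the start and floor values simply reuse the measured 8.4ns and 0.3ns falling-edge figures. Times are kept as integer picoseconds to avoid floating-point drift.

```python
def minimize_t_dead(t_dead_init_ps: int, step_ps: int, t_floor_ps: int,
                    max_cycles: int = 1000) -> list[int]:
    """Return the dead time after each switching cycle of a step-based loop.

    Behavioral assumption: each detected dead-time pulse extends V_GH by one
    fixed step until another step would push t_dead below the floor.
    """
    t_dead = t_dead_init_ps
    history = [t_dead]
    for _ in range(max_cycles):
        if t_dead - step_ps < t_floor_ps:
            break  # a further step would cross the floor and risk overlap
        t_dead -= step_ps  # delay the V_GH falling edge by one step
        history.append(t_dead)
    return history


# Hypothetical 100ps step; 8400ps start and 300ps floor are illustrative
trace = minimize_t_dead(t_dead_init_ps=8400, step_ps=100, t_floor_ps=300)
print(f"settled at {trace[-1]} ps after {len(trace) - 1} cycles")
```

A real implementation adjusts an analog node voltage rather than a discrete counter, so the settling trajectory would differ, but the stop condition (edges meet, limited by an offset) is the same idea.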

To mitigate the damaging DC shift and AC voltage bounce from the chassis ground, Fig. 24.3.3 presents the  $M_H$  and  $M_L$  gate driver stages with on-die ground isolation and high-dv/dt-immune level shifters. Rather than having AGND and DGND share the same p-well potential, the driver stages are placed in separate p-wells isolated by high-voltage (HV) n-tubs from the AGND well. The ON-duty mirrors and the  $t_{dead}$  minimizers are placed in the AGND well, sharing the same ground  $V_{GND1}$  with the PWM controller, whereas PGND and DGND share the ground  $V_{GND2}$  ( $V_{ISO}=V_{GND2}-V_{GND1}$ ). To contend with high AC ground bounce at PGND, a source-return terminal (SR) of  $M_L$  is Kelvin-connected from the source terminal (S) and returns to DGND to ensure a reliable driving voltage  $V_{GS}$  for  $M_L$ . Moreover, four high-dv/dt-immune level shifters are designed to transmit control signals across the isolated wells. For an up-level shifter in Fig. 24.3.3, during the  $M_H$  turn-on,  $V_H$  and  $V_{H3}$  rise sequentially to trigger the turn-on of  $M_{p2}$ . Meanwhile, a narrow pulse is generated at  $V_{GN}$  to dramatically increase the current in  $M_{TN}$ .  $V_{H4}$  goes high with a short delay and sets the HV latch cell in the BST rail. In the instant of  $M_H$  turn-on, the BST rail rises quickly, triggering a high coupling current to charge the drain nodes of the HV drain-extended NMOS  $M_{d1}$  and  $M_{d2}$ . Two damping resistors  $R_{dr1}, R_{dr2}$  are placed in series with the diode-connected  $M_{p1}, M_{p2}$ , limiting  $V_{SW}$  dv/dt-induced current from being injected to the HV latch cell to improve common-mode transient immunity (CMTI). When  $V_H$  goes low, the down-level shifter operates in a similar manner to switch off  $M_H$ . Similar switching operations apply to the level shifters linked to low-side switch  $M_L$  as well.

A prototype of the GaN driver was fabricated in a 0.35μm HV BCD process. Two discrete enhancement-mode GaN FETs (EPC8009) were employed as power switches. The converter supports a wide $V_{IN}$ ranging from 3V to 40V, with a programmable $f_{SW}$ from 10MHz to 50MHz and a maximum output power of 12W. The quiescent current of the driver is 2.3mA. Figure 24.3.4 shows the measured minimized $t_{dead}$, ON-duty time, and $V_{SW}$ dv/dt transient. Thanks to the proposed ON-duty mirrors and the $t_{dead}$ minimizers, $t_{dead}$ for the $V_{SW}$ rising edge is reduced from 10.8ns to 0.2ns, whereas $t_{dead}$ for the $V_{SW}$ falling edge drops from 8.4ns to 0.3ns. A minimum duty ratio of 7.9% is observed. As shown in Fig. 24.3.5, in response to a -3V to 30V DC ground shift, 1.2V AC ground bounce, and 50V/ns $V_{SW}$ dv/dt transient, $V_{GH}$ and $V_{GL}$ maintain robust and valid switching states owing to the proposed ground isolation technique. In Fig. 24.3.6, the efficiency peaks at 89.1% for the 12V-to-8V conversion at 10MHz, and stays above 74% over 90% of the full 12W power range. Compared to the prior art [4-6] in the table, $t_{dead}$ has been reduced by more than 18x, at 1.6x higher $f_{SW}$. It accomplishes 5x higher $t_{dm}$ in comparison with [4], and supports on-die isolated gate driving at an over 1.4x faster $V_{SW}$ slew rate. The die micrograph is shown in Fig. 24.3.7, with a die area of 0.61mm<sup>2</sup>.
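To give a rough sense of what the measured dead-time reduction means for reverse conduction loss, the sketch below applies the common first-order estimate $P \approx V_{SD} \cdot I_{load} \cdot t_{dead} \cdot f_{SW}$ to the dead times reported above. The reverse-conduction voltage (2V) and load current (1A) are assumed illustrative values, not figures from the paper, so the absolute numbers are only indicative.

```python
def reverse_conduction_loss_w(v_sd: float, i_load: float,
                              t_dead_total_ns: float, f_sw_hz: float) -> float:
    """First-order dead-time loss estimate: V_SD * I_load * t_dead * f_sw."""
    return v_sd * i_load * t_dead_total_ns * 1e-9 * f_sw_hz


V_SD = 2.0    # assumed GaN reverse-conduction voltage drop (V), not from the paper
I_LOAD = 1.0  # assumed load current (A), not from the paper
F_SW = 10e6   # 10MHz, the lower end of the reported f_SW range

# Sum of the rising- and falling-edge dead times per switching cycle
loss_before = reverse_conduction_loss_w(V_SD, I_LOAD, 10.8 + 8.4, F_SW)
loss_after = reverse_conduction_loss_w(V_SD, I_LOAD, 0.2 + 0.3, F_SW)
print(f"before: {loss_before * 1e3:.0f} mW, after: {loss_after * 1e3:.0f} mW")
```

Because the estimate is linear in $t_{dead}$ and $f_{SW}$, the relative saving is independent of the assumed $V_{SD}$ and $I_{load}$.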

#### Acknowledgement:

The authors would like to thank Texas Instruments for IC fabrication support.

#### References:

- [1] Texas Instruments Application Notes: "Automotive Wide VIN DC/DC, Power Solutions for Emerging Applications."
- [2] B. J. Baliga, "Gallium Nitride Devices for Power Electronic Applications," *Semicond. Sci. & Technol.*, vol. 28, no. 7, pp. 1-8, 2013.
- [3] A. Lidow, et al., "GaN Transistors for Efficient Power Conversion," 2<sup>nd</sup> Ed, John Wiley & Sons, West Sussex, UK, 2015.
- [4] Efficient Power Conversion Application Note: AN015, "Introducing a Family of eGaN FETs for Multi-Megahertz Hard Switching Applications," Accessed: 2014. [online] Available: <http://epc-co.com/epc/documents/product-training/AN105eGaNFETsforMulti-MegahertzApplications.pdf>.
- [5] M. K. Song, et al., "A 20V 8.4W 20MHz Four-Phase GaN DC-DC Converter with Fully On-Chip Dual-SR Bootstrapped GaN FET Driver Achieving 4ns Constant Propagation Delay and 1ns Switching Rise Time," *ISSCC*, pp. 302-303, Feb. 2015.
- [6] X. Ke, et al., "A 3-to-40V 10-to-30MHz Automotive-Use GaN Driver with Active BST Balancing and  $V_{SW}$  Dual-Edge Dead-Time Modulation Achieving 8.3% Efficiency Improvement and 3.4ns Constant Propagation Delay," *ISSCC*, pp. 302-303, Feb. 2016.
- [7] X. Ke, et al., "A 10MHz 3-to-40V VIN Tri-Slope Gate Driving GaN DC-DC Converter with 40.5dBµV Spurious Noise Compression and 79.3% Ringing Suppression for Automotive Applications," *ISSCC*, pp. 430-431, Feb. 2017.



Figure 24.3.1:  $t_{dead}$  and ground interference challenges in a conventional high-voltage, high- $f_{SW}$  GaN driver.



Figure 24.3.2: Schematic and operation of ON-duty mirror and self-excited  $t_{dead}$  minimizer.



Figure 24.3.3: On-die ground-isolated gate driver with high dv/dt-immune level shifters.



Figure 24.3.4: Measured ON-duty times, minimized  $t_{dead}$ , and  $V_{SW}$  dv/dt transient in GaN gate driver.



Figure 24.3.5: Measured transient behaviors of  $V_{GH}$  and  $V_{GL}$  with on-die ground-isolated gate driving.



Figure 24.3.6: Efficiency plot and performance comparison.



Figure 24.3.7: Die micrograph.

# Session 25 Overview:

## *Clock Generation for High-Speed Links*

### WIRELINE SUBCOMMITTEE



**Session Chair:**  
**Roberto Nonis**  
Infineon, Villach, Austria



**Associate Chair:**  
**Pavan Hanumolu**  
University of Illinois,  
Urbana-Champaign, Urbana, IL

**Subcommittee Chair:** **Frank O'Mahony**, Intel, Hillsboro, OR

Clock generation circuits are ubiquitous building blocks in all electronic systems and are the fundamental performance limiters in many of them. This session covers the latest advances in clock generation for high-speed links. The first paper addresses a precision quadrature generator in the latest CMOS process, making use of injection-locking techniques. The second paper presents a technique for generating high-frequency reference clocks by quadrupling the frequency of commonly used, low-cost, crystal oscillators. The third paper demonstrates a fractional PLL that uses reference clock dithering and calibrated dither cancellation in the feedback loop to effectively attenuate fractional spurs. And the final paper describes a digital ring PLL that uses a fast phase correction method and proportional pulse calibration to reduce jitter.



10:15 AM

**25.1 A 4-to-16GHz Inverter-Based Injection-Locked Quadrature Clock Generator with Phase Interpolators for Multi-Standard I/Os in 7nm FinFET**

*S. Chen, Xilinx, San Jose, CA*

In Paper 25.1, Xilinx presents a combined injection-locked quadrature clock generator and phase interpolator in a 7nm FinFET process. The design achieves a continuous range from 4 to 16GHz with less than 1° quadrature phase error and INL of the phase interpolator <1.5LSB at 16GHz, while dissipating 22.4mW.



10:45 AM

**25.2 A 5GHz 370fs<sub>rms</sub> 6.5mW Clock Multiplier Using a Crystal-Oscillator Frequency Quadrupler in 65nm CMOS**

*K. M. Megawer, University of Illinois, Urbana, IL*

In Paper 25.2, the University of Illinois presents a crystal oscillator frequency quadrupler in cascade with a 5GHz injection-locked multiplier in 65nm CMOS. The quadrupler and the injection-locked multiplier achieve 129fs<sub>rms</sub> and 366fs<sub>rms</sub> integrated jitter, respectively, while consuming 1.45mW and 5mW.



11:15 AM

**25.3 A Fractional-N Digital PLL with Background-Dither-Noise-Cancellation Loop Achieving <-62.5dBc Worst-Case Near-Carrier Fractional Spurs in 65nm CMOS**

*C-R. Ho, University of Southern California, Los Angeles, CA*

In Paper 25.3, the University of Southern California presents a 3-to-5GHz fractional-N digital PLL with a spur reduction technique implemented in 65nm CMOS. The proposed method uses dithering of the reference clock to mitigate the fractional spurs and performs dither cancellation in background. The PLL measures >40dB spur improvement and achieves a worst-case fractional spur of -62.5dBc.



11:30 AM

**25.4 A -242dB FOM and -75dBc-Reference-Spur Ring-DCO-Based All-Digital PLL Using a Fast Phase-Error Correction Technique and a Low-Power Optimal-Threshold TDC**

*T. Seong, Ulsan National Institute of Science and Technology, Ulsan, Korea*

In Paper 25.4, Ulsan National Institute of Science and Technology (UNIST) presents a fast phase-error correction technique that reduces the impact of oscillator noise on PLL output jitter. Using a TDC with optimally spaced time thresholds, the prototype DPLL achieves 320fs<sub>rms</sub> jitter with an FOM of -242dB.

## 25.1 A 4-to-16GHz Inverter-Based Injection-Locked Quadrature Clock Generator with Phase Interpolators for Multi-Standard I/Os in 7nm FinFET

Stanley Chen, Lei Zhou, Ian Zhuang, Jay Im, Didem Melek, Jinyung Namkoong, Mayank Raj, Jaewook Shin, Yohan Frans, Ken Chang

Xilinx, San Jose, CA

As ever-increasing bandwidth demand pushes wireline transceiver data-rates beyond 25Gb/s, the clocking solution for multi-protocol support over a wide range of data-rates becomes a key design challenge. In [1], an injection-locked multi-phase clock generator demonstrated wideband operation and a high-resolution phase rotator using CML in 28nm FDSOI CMOS. However, in 7nm FinFET technology, the CML implementation suffers from the reduced supply level and output impedance degradation at high temperatures. In order to scale power consumption with data-rate, a CML implementation also needs to employ bias current and load programmability, further impacting its performance. For these reasons, a supply-regulated inverter-based clocking scheme is proposed. Furthermore, the full inverter-based clock chain generates smaller random jitter (RJ) because of the faster edge-rate compared to a CML implementation. Supply regulation, applied as part of the calibration loop, mitigates the sensitivity of inverter delay and edge-rate to process and temperature variations. This design, benefiting from its mostly-digital structure, adopts a "sea of gates" layout style with optimized via patterns and uniform metal tracks, which effectively alleviate the significant parasitic resistance variations on low-level metals fabricated by multiple patterning in 7nm FinFET.

To achieve a wide frequency range and to scale the power consumption with the data-rate, the proposed architecture uses only inverter-based circuits after an on-chip clock source (Fig. 25.1.1). Only a single differential phase (I-Phase) clock is distributed to the local injection-locked ring oscillator (ILRO). The ILRO generates the multi-phase clocks for the phase interpolators on each channel. The phase interpolator is used to rotate the multi-phase clocks based on the transmitter or receiver front-end usage. The proposed architecture can support half-rate NRZ transceivers up to 32Gb/s or PAM-4 transceivers up to 64Gb/s.

As illustrated in Fig. 25.1.2, the ILRO consists of a ring oscillator made of four identical pseudo-differential delay cells, injection locked by an input clock at frequency $f_{inj}$. To accommodate the wideband operation of 4 to 16GHz, while avoiding a large $K_{VCO}$ and false sub-harmonic locking, the ILRO breaks the operation range into three bands. In the low-frequency band (4-7GHz), the direct path without the feedforward inverters is used. At the two higher-frequency bands (7-12GHz, and 12-16GHz), the segmented feedforward path [2] is turned on to boost the oscillation frequency, $f_{osc}$. The injection buffer consists of two stages. The first level-shifting AC-coupled self-biased inverter stage reduces the duty-cycle distortion of the injected clock, which would otherwise directly induce uncorrectable residual phase error in the output clocks. The second programmable buffer stage adjusts the injection strength, which affects injection bandwidth, locking range and output clock phase noise. The four differential output phases from the ILRO are uniformly spaced by exactly $\pi/4$ ($45^\circ$) only if the oscillator is locked to the injection clock ($f_{inj}=f_{osc}$). Otherwise a relative phase error, proportional to the frequency mismatch, is introduced. The quadrature phase error was shown to contain frequency error information [4]. If the initial loop $V_{ctrl}$ is far from its final locking value, the ILRO may take a longer time to settle, or it may settle to incorrect frequencies. In order to solve this issue, a coarse frequency-tracking loop (FTL) and a fine quadrature-locked loop (QLL) are used to achieve the wide locking range while maintaining low quadrature phase error.
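The three-band split described above amounts to a simple lookup from injection frequency to band setting. The sketch below illustrates it; the band names, the `select_band` helper and the treatment of the shared band edges (7GHz and 12GHz are assigned to the lower band) are assumptions made for illustration, since the text does not specify the band overlap.

```python
# (band name, lower edge in GHz, upper edge in GHz, feedforward path enabled)
BANDS = [
    ("low", 4.0, 7.0, False),    # direct path only
    ("mid", 7.0, 12.0, True),    # segmented feedforward path on
    ("high", 12.0, 16.0, True),  # segmented feedforward path on
]


def select_band(f_inj_ghz: float) -> tuple[str, bool]:
    """Return (band name, feedforward enabled) for an injection frequency."""
    for name, lo, hi, feedforward in BANDS:
        if lo <= f_inj_ghz <= hi:
            return name, feedforward  # first match wins at shared edges
    raise ValueError(f"{f_inj_ghz} GHz is outside the 4-16GHz range")


# Nominal output phases of the four pseudo-differential stages when locked
nominal_phases_deg = [k * 45 for k in range(8)]

print(select_band(5.0), select_band(10.0), select_band(16.0))
```

In the actual design the band search is part of the coarse frequency-tracking procedure, so the band would be chosen from measured frequencies rather than from a fixed table.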

As shown in Fig. 25.1.2, during initial coarse frequency tracking, the off-chip control logic searches for the correct band settings of the ILRO and sweeps the on-chip FTL DAC by comparing the injection frequency with the free-running frequency of the ILRO. The settled 5b FTL DAC code sets $V_{ctrl}$ of the ILRO and brings the free-running frequency within the locking range of the fine QLL. Then the fine QLL is enabled to further lock the frequency of the ILRO by quadrature phase error cancellation [4]. The fine QLL consists of a quadrature phase detector (QPD), a V-to-I converter, a loop filter, and a voltage regulator. An XOR-based QPD takes eight clock phases for sensing the instantaneous deviation from quadrature between pairs of clock signals. The resistors in the QPD are carefully optimized to reduce the loading effect added to the ring oscillator without excessively attenuating the detected phase error signals. The outputs, $V_{det\_p}$ and $V_{det\_n}$, are averaged out by the V-to-I converter and the loop filter to generate $V_{ctrl}$. To suppress the supply noise, a voltage regulator is used to regulate the ILRO supply, $V_{reg\_ILRO}$, which tracks $V_{ctrl}$.
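The coarse frequency-tracking step can be sketched as an exhaustive sweep of the 5b DAC code that picks the code whose free-running frequency is closest to the injection frequency. The linear code-to-frequency model and its 12 to 16.5GHz span are hypothetical placeholders for the measured tuning curve, and the 10% check at the end is an assumed locking-range criterion used only for illustration.

```python
def free_running_ghz(code: int, f_min_ghz: float = 12.0,
                     f_max_ghz: float = 16.5) -> float:
    """Hypothetical linear DAC-code-to-frequency model for one ILRO band."""
    return f_min_ghz + (f_max_ghz - f_min_ghz) * code / 31


def coarse_ftl_sweep(f_inj_ghz: float, n_bits: int = 5) -> int:
    """Sweep all DAC codes and return the one closest to the injection clock."""
    codes = range(2 ** n_bits)
    return min(codes, key=lambda c: abs(free_running_ghz(c) - f_inj_ghz))


f_inj = 16.0
code = coarse_ftl_sweep(f_inj)
residual = abs(free_running_ghz(code) - f_inj) / f_inj

# Assumed criterion: the fine loop can take over once the residual
# frequency error is inside a +/-10% locking range.
print(f"code={code}, residual error={residual:.2%}, in range={residual < 0.10}")
```

With a wide locking range, even a 5b sweep leaves ample margin, which is consistent with the coarse DAC needing only modest resolution.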

The measured ILRO free-running frequencies over all three bands are shown in Fig. 25.1.3. The locking range is $>\pm 10\%$ of the target frequencies, which relaxes the resolution specification of the coarse FTL DAC. Comparing the phase noise of the free-running ILRO with that of the quadrature-locked ILRO demonstrates the effective phase noise suppression from injection locking and the fine analog QLL, with phase noise of -136.22dBc/Hz and -137.9dBc/Hz at 10MHz and 1GHz offsets, respectively, when characterized using a clean off-chip clock at 16GHz. The integrated RJ from 100kHz to 8GHz of the ILRO is 105fs<sub>rms</sub>. The transient I/Q clock waveforms from the oscilloscope show less than 1° I/Q phase error at 16GHz.

The 7b inverter-based CMOS PI, shown in Fig. 25.1.4, consists of two dual-core units to generate pseudo-differential outputs, where each unit hosts two PI core slices for generating the single-ended data clock (Dclk) and edge clock (Xclk), which are nominally 90° apart for the 2x oversampling CDR architecture. Each PI core slice consists of four stages: octant selection mux, slew-rate controller, phase mixer, and level-shifter. The PI takes eight clock phases from the ILRO. The slew-rate control adjusts the slope of the clock edges according to the band settings. The 3 MSBs of the PI code are decoded into one-hot format for selecting the octant [3]. The remaining 4 LSBs are decoded into 16b thermometer codes for controlling the mixer weighting. The PI operates under the regulated supply, $V_{reg\_PI}$, which tracks $V_{ctrl}$ of the ILRO. The mixer outputs drive AC-coupled self-biased inverters to level-shift the clocks to the full CMOS swing. The measured PI transfer function curves, DNL and INL curves over three bands are shown in Fig. 25.1.5. The DNL and INL are 0.87 and 1.44 LSB, respectively, at 16GHz, and the worst-case INL is 2.58 LSB at 7GHz.
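The code decoding described above (3 MSBs to a one-hot octant select, 4 LSBs to a thermometer-coded mixer weight) can be sketched as follows. The exact thermometer convention is an assumption: here a weight of $n$ sets the first $n$ of 16 bits, so the 4b values 0 to 15 never set the last bit. The function name `decode_pi_code` is illustrative.

```python
def decode_pi_code(code: int) -> tuple[list[int], list[int]]:
    """Split a 7b PI code into a one-hot octant and a 16b thermometer weight."""
    if not 0 <= code < 128:
        raise ValueError("PI code must be a 7-bit value (0..127)")
    octant = code >> 4   # 3 MSBs select one of eight octants
    weight = code & 0xF  # 4 LSBs set the mixer weighting
    one_hot = [1 if i == octant else 0 for i in range(8)]
    # Assumed convention: the first `weight` thermometer bits are set
    thermometer = [1 if i < weight else 0 for i in range(16)]
    return one_hot, thermometer


one_hot, thermometer = decode_pi_code(0b1010011)  # octant 5, weight 3
print(one_hot)
print(thermometer)
```

Thermometer coding of the mixer weight means that adjacent codes differ by a single switched slice, which is a common way to keep a mixer monotonic.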

Figure 25.1.6 shows the phase noise measurement for the full on-chip clock chain with a dual-mode wide-band transformer-based LCVCO [6] used as a clock source. The integrated jitter at the output of the full clock chain over 100kHz-to-1GHz and 100kHz-to-8GHz is 80fs and 143fs, respectively, assuming first-order filtering with a 3MHz bandwidth from the CDR and/or PLL is applied. The ILRO and PI contribute only an additional 5fs in output jitter integrated from 100kHz to 1GHz. The noise floor is below -135dBc/Hz at 1GHz offset and beyond. The performance comparison with prior art is shown in Fig. 25.1.6. Measured total power of the full clock path at 16GHz (including LCVCO, one ILRO, and D/X PIs) is 48mW. The ILRO and one PI consume 22.4mW, a 15% improvement in power efficiency and >40% improvement in RJ compared with [1]. This prototype is fabricated in 7nm FinFET technology with an active area of 0.105mm<sup>2</sup>. Figure 25.1.7 shows the chip microphotograph and the power breakdown.

### Acknowledgment:

The authors would like to thank the Xilinx SerDes team for their contributions in circuit design, chip bring-up, and measurements.

### References:

- [1] E. Monaco, et al., "A 2–11 GHz 7-Bit High-Linearity Phase Rotator Based on Wideband Injection-Locking Multi-Phase Generation for High-Speed Serial Links in 28-nm CMOS FDSOI," *IEEE JSSC*, vol.52, no.7, pp. 1739–1752, July 2017.
- [2] K. Chang, et al., "A 0.4–4-Gb/s CMOS quad transceiver cell using on-chip regulated dual-loop PLLs," *IEEE JSSC*, vol.38, no.5, pp. 747–754, May 2003.
- [3] M. Chen, et al., "A 0.1–1.5 GHz 8-bit Inverter-Based Digital-to-Phase Converter Using Harmonic Rejection," *IEEE JSSC*, vol.48, no.11, pp. 2681–2692, Nov. 2013.
- [4] M. Raj, et al., "A 4-to-11GHz Injection-Locked Quarter-Rate Clocking for an Adaptive 153fJ/b Optical Receiver in 28nm FDSOI CMOS," *ISSCC*, pp. 404–406, Feb. 2014.
- [5] J. Chien, et al., "A Pulse-Position-Modulation Phase-Noise-Reduction Technique for a 2-to-16GHz Injection-Locked Ring Oscillator in 20nm CMOS," *ISSCC*, pp. 52–53, Feb. 2014.
- [6] M. Raj, et al., "A 7-to-18.3GHz compact transformer based VCO in 16nm FinFET," *IEEE Symp. VLSI Circuits*, pp.1-2, June 2016.



Figure 25.1.1: Block diagram of the wide-range multi-phase clock generation for serial links using an ILRO and CMOS PI.



Figure 25.1.2: Circuit Implementation of the ILRO with a passive QPD and coarse/fine tuning loops.



Figure 25.1.3: Measured ILRO free-running frequency, locking range, phase noise and quadrature phase error.



Figure 25.1.4: CMOS PI with 7b phase resolution for quadrature clock generation Dclk and Xclk.



Figure 25.1.5: Measured PI delay vs. code curve, DNL and INL over 16GHz, 12GHz, 7GHz, and 4GHz.



Figure 25.1.6: Measured phase noise of the full clock chain at 16GHz and the performance comparison table.



Figure 25.1.7: Chip micrograph and power breakdown.

## 25.2 A 5GHz 370fs<sub>rms</sub> 6.5mW Clock Multiplier Using a Crystal-Oscillator Frequency Quadrupler in 65nm CMOS

Karim M. Megawer, Ahmed Elkholy, Daniel Coombs, Mostafa G. Ahmed, Ahmed Elmallah, Pavan Kumar Hanumolu

University of Illinois, Urbana, IL

Phase noise performance of ring-oscillator-based (RO-based) clock multipliers is typically limited by oscillator noise. The most power-efficient method for improving the phase noise of such clock multipliers is to increase the oscillator noise suppression bandwidth ($F_{BW}$). While $F_{BW}$ depends on the type of clock multiplier, the maximum achievable $F_{BW}$ is limited by the reference frequency ($F_{REF}$). For instance, in phase-locked loops (PLLs) $F_{BW} = F_{REF}/10$, while multiplying delay-locked loops (MDLLs) [1] and injection-locked clock multipliers (ILCMs) [2] can achieve $F_{BW}$ of $F_{REF}/4$ and $F_{REF}/6$, respectively. Exploiting this behavior, the MDLL in [1] and the ILCM in [2] achieved excellent performance at the expense of using a high-frequency low-noise reference (REF) clock and a small multiplication factor ($N < 10$). One promising way to reduce $F_{REF}$ in MDLLs/ILCMs involves increasing the injection rate by using both the positive and negative edges of the REF clock [3, 4], but at the cost of making jitter/spurious performance susceptible to duty cycle errors in the REF clock. While [3] demonstrated an effective means to correct such errors, it still needed a relatively high $F_{REF}$ of 125MHz. In view of this, we present a method to quadruple the frequency of a conventional 54MHz Pierce XO and demonstrate its application using an RO-based ILCM achieving less than 370fs<sub>rms</sub> integrated jitter at a 5GHz output. The proposed quadrupler acts as a low-noise XO frequency multiplier and can also be used to increase the bandwidth of MDLLs and ring/LC-based integer- or fractional-N PLLs.
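The benefit of quadrupling the reference can be seen by plugging the rule-of-thumb ratios quoted above into a small calculation. The sketch below compares the maximum noise-suppression bandwidth for a 54MHz reference with that for its quadrupled 216MHz version; the divisor table simply restates the ratios from the text, and the helper name is illustrative.

```python
# F_BW ~ F_REF / divisor, using the ratios quoted in the text
BW_DIVISOR = {"PLL": 10, "MDLL": 4, "ILCM": 6}


def max_bandwidth_mhz(kind: str, f_ref_mhz: float) -> float:
    """Maximum oscillator-noise suppression bandwidth for a multiplier type."""
    return f_ref_mhz / BW_DIVISOR[kind]


F_XO_MHZ = 54.0                 # Pierce XO frequency from the text
F_QUAD_MHZ = 4 * F_XO_MHZ       # quadrupled reference, 216MHz

for kind in BW_DIVISOR:
    direct = max_bandwidth_mhz(kind, F_XO_MHZ)
    quadrupled = max_bandwidth_mhz(kind, F_QUAD_MHZ)
    print(f"{kind}: {direct:.1f} MHz -> {quadrupled:.1f} MHz")
```

Since the bandwidth limit scales linearly with $F_{REF}$, each multiplier type gains a 4x wider suppression bandwidth from the quadrupled reference.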

Figure 25.2.1 shows a conceptual block diagram of the proposed XO quadrupler and illustrates its operation using idealized waveforms. Assuming the XO output,  $V_{XO}$ , to be a sine wave of amplitude  $V_A$ , we first generate two signals  $V_{1,2}$ , with 25% duty cycles by slicing  $V_{XO}$  with two comparators whose threshold voltages,  $V_{TH1,2}$ , are set to  $\pm V_A/\sqrt{2}$ . This results in  $V_{1,2}$  being separated by half the XO period ( $T_{XO}/2$ ) and EXOR-ing them produces a 50% duty cycle clock,  $V_3$ , at twice the XO frequency ( $2F_{XO}$ ). We then double the frequency of  $V_3$  by EXOR-ing  $V_3$  with its delayed version, thereby ideally quadrupling the XO frequency with minimal jitter degradation. However, in practice, any changes in the waveform shape or amplitude of  $V_{XO}$ , along with deviations in  $V_{TH1,2}$  and path mismatches introduced by inevitable slicer/XOR imperfections manifest as duty cycle errors, which directly appear as jitter at the output (see Fig. 25.2.1). For instance, denoting the duty cycle of  $V_1$  as  $D_1 = 0.25 + \Delta T_1/T_{XO}$ , reveals that duty cycle error ( $\Delta T_1/T_{XO}$ ) appears as output jitter of the form  $[\Delta T_1, -\Delta T_1, 0, 0, \dots]$  that has a period of  $T_{XO}$ . Similarly, duty cycle errors of  $V_2$  ( $\Delta T_2/T_{XO}$ ) result in periodic output jitter of the form  $[0, 0, \Delta T_2, -\Delta T_2, \dots]$ . In addition to duty cycle errors in  $V_{1,2}$ , any change in the spacing between the negative edge of  $V_1$  and the positive edge of  $V_2$  from  $T_{XO}/4$ , causes the duty cycle of  $V_3$  to deviate from 50% ( $D_3 = 0.5 + 2\Delta T_3/T_{XO}$ ), which introduces additional period jitter in the following manner:  $[\Delta T_3, \Delta T_3, -\Delta T_3, -\Delta T_3, \dots]$ . 
Because all the errors in the quadrupler ( $\Delta T_{1,2,3}$  and others, if any) appear as period jitter with a unique signature as described above, we surmise that each of the errors can be separately detected, by measuring the quadrupler period, and corrected as described later (see Fig. 25.2.3).
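The edge timing described above can be checked with a short numerical sketch. It assumes an ideal sine-wave  $V_{XO}$  and ideal slicers; the function names and the normalized units (amplitude and period of 1) are illustrative choices, not part of the design. It computes the four  $V_3$  edge times per XO period from the two thresholds and reports each edge's deviation from the ideal  $T_{XO}/4$  grid.

```python
import math

def quadrupler_edges(vth1, vth2, v_a=1.0, t_xo=1.0):
    """Four V3 edge times in one XO period for V_XO = v_a*sin(2*pi*t/t_xo).

    V1 is high while V_XO > vth1 and V2 is high while V_XO < vth2, so
    V3 = V1 XOR V2 toggles at every threshold crossing. The quadrupled
    output has one pulse per V3 edge.
    """
    a1 = math.asin(vth1 / v_a) / (2 * math.pi) * t_xo    # V1 rising edge
    a2 = math.asin(-vth2 / v_a) / (2 * math.pi) * t_xo   # V2 offset from T/2
    return [a1, t_xo / 2 - a1, t_xo / 2 + a2, t_xo - a2]

def edge_errors(edges, t_xo=1.0):
    """Deviation of each edge from the ideal grid T/8, 3T/8, 5T/8, 7T/8."""
    ideal = [t_xo * (2 * k + 1) / 8 for k in range(4)]
    return [e - i for e, i in zip(edges, ideal)]

def duty_cycle_v1(edges, t_xo=1.0):
    """Duty cycle of V1, i.e. the fraction of the period with V_XO > vth1."""
    return (edges[1] - edges[0]) / t_xo

# Ideal thresholds +-V_A/sqrt(2): edges fall exactly on the T/4 grid.
ideal_edges = quadrupler_edges(1 / math.sqrt(2), -1 / math.sqrt(2))
# A detuned upper threshold moves only the two V1 edges, in opposite directions.
detuned_edges = quadrupler_edges(0.75, -1 / math.sqrt(2))
```

With the detuned upper threshold, `edge_errors(detuned_edges)` has the shape  $[d, -d, 0, 0]$ , which is the signature attributed to a  $V_1$  duty cycle error in the text.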

Figure 25.2.2 shows the complete clock multiplier. It is composed of the XO quadrupler, an ILCM, and background calibration circuitry that corrects duty cycle errors. The XO quadrupler consists of a single-ended low-noise Pierce XO whose output,  $V_{XO}$ , is split into two ac-coupled paths and sliced at  $V_{TH1,2}$  to produce  $V_{1,2}$ . Instead of using comparators that compare  $V_{XO}$  with  $V_{TH1,2}$ , simple low noise CMOS inverters  $I_{1,2}$  with a fixed threshold voltage of  $V_{TH}$  are used and the desired slicer levels,  $V_{TH1,2}$ , are realized by level shifting  $V_{XO}$  by  $V_{B1,2}$ . A least mean square (LMS) algorithm is used to adaptively set  $V_{B1,2}$ . A calibration unit running in the background implements the LMS algorithm and its outputs,  $D_{CAL1,2}$ , are converted to voltages,  $V_{B1,2}$ , using high-resolution  $\Delta\Sigma$  DACs. Compared to [5], which uses a differential XO, skewed inverters built with zero-V<sub>T</sub> and high-V<sub>T</sub> MOS devices as variable-threshold comparators, and an analog duty cycle correction scheme that is itself susceptible to PVT variations, the proposed quadrupler produces 2X lower noise, is more robust, and can work with standard single-ended XOs and regular V<sub>T</sub> transistors.

The ILCM consists of an RO that is injection-locked to the quadrupler output by injecting narrow pulses similar to [3]. A frequency tracking loop (FTL) based on pulse-gating logic (PGL) [3] is used to keep the free-running frequency of the RO close to the desired output frequency to guarantee low jitter and good spurious performance in the presence of PVT variations.

Figure 25.2.3 shows the details of calibration. Leveraging the fact that all of the errors appear in the form of period jitter with well-defined patterns, each of the errors is detected and corrected using its corresponding error template. Period error is detected using a bang-bang phase detector (PD) and is correlated with two different error template sequences  $T_1[k] = [1, -1, 0, 0, \dots]$  and  $T_2[k] = [0, 0, 1, -1, \dots]$  corresponding to  $\Delta T_1$  and  $\Delta T_2$  errors, respectively. The results are accumulated and scaled to generate the calibration signals,  $K_{CAL1,2}$ , that are used to adjust  $V_{B1,2}$ . Note that changing  $V_{B1,2}$  alters the effective  $V_{TH1,2}$  and changes duty cycles  $D_{1,2}$  as desired. Because error due to  $\Delta T_3$  cannot be corrected via  $V_{B1,2}$ , it is corrected separately using a digitally controlled delay line (DCDL<sub>CAL</sub>) whose delay control word is generated by correlating the PD output with a  $T_3[k] = [1, 1, -1, -1, \dots]$  sequence as shown in Fig. 25.2.3. While it is possible to correct for  $\Delta T_{1,2}$  errors also using DCDL<sub>CAL</sub>, it would require a prohibitively large delay range (covering  $\pm 10\%$  error in  $D_{1,2}$  across PVT would require a delay range of 2ns). However, DCDL<sub>CAL</sub> provides sub-ps resolution and is therefore better suited for precise error correction than  $V_{B1,2}$  tuning. In view of this, the  $\Delta T_{1,2}$  errors are coarsely corrected through  $V_{B1,2}$  and fine correction is performed using the DCDL<sub>CAL</sub>. A dead-zone comparator with hysteresis is used to seamlessly transfer control between the coarse and fine tuning paths.
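A minimal behavioral sketch of the template-correlation idea follows. It is not the implemented calibration unit: the coarse/fine split, the  $\Delta\Sigma$  DACs and the delay line are all abstracted into three scalar correction values, the update is applied once per XO cycle, and the step size `mu` and the injected error values are hypothetical. Only the sign of the residual timing error is observed, as with a bang-bang PD.

```python
# Error templates from the text (one entry per quadrupled-clock edge).
T1 = [1, -1, 0, 0]
T2 = [0, 0, 1, -1]
T3 = [1, 1, -1, -1]

def sign(x):
    """Bang-bang decision: +1, 0 or -1."""
    return (x > 0) - (x < 0)

def calibrate(err1, err2, err3, mu=1e-3, n_cycles=5000):
    """Drive three corrections toward the injected errors using only PD signs.

    err1..err3 are the (unknown) weights of the T1..T3 patterns in the edge
    timing error; c1..c3 are the accumulated corrections.
    """
    c1 = c2 = c3 = 0.0
    for _ in range(n_cycles):
        acc1 = acc2 = acc3 = 0
        for k in range(4):
            residual = ((err1 - c1) * T1[k]
                        + (err2 - c2) * T2[k]
                        + (err3 - c3) * T3[k])
            pd = sign(residual)      # only the sign is available
            acc1 += pd * T1[k]       # correlate with each template
            acc2 += pd * T2[k]
            acc3 += pd * T3[k]
        c1 += mu * acc1              # accumulate and scale
        c2 += mu * acc2
        c3 += mu * acc3
    return c1, c2, c3
```

In this noiseless model each correction settles to within a few step sizes of its injected error, because each template responds only to its own error pattern once the others are smaller.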

A prototype consisting of an XO, a quadrupler, and an ILCM, along with the calibration circuitry, is implemented in a 65nm CMOS process. Measured performance of the quadrupler is reported in Fig. 25.2.4. Measuring quadrupler output phase noise accurately is difficult because of its narrow duty cycle [5]. The phase noise of the doubler output ( $V_3$  in Fig. 25.2.2) is therefore shown instead; its integrated jitter (10kHz to 10MHz) is 77fs<sub>rms</sub>. A sensitivity plot of the quadrupler spur performance to duty cycle error reveals that spurious tones as large as -45dBc occur at  $V_3$  when  $V_{B1,2}$  are slightly detuned. Measured noise and spurious performance at a 4.752GHz output frequency are shown in Fig. 25.2.5. The integrated jitter (10kHz to 30MHz) is 366fs<sub>rms</sub>. When the calibration loop is turned off, spurs at  $F_{XO}$  and  $2F_{XO}$  are at -27.3dBc and -28.8dBc; they reduce to -53dBc and -51.8dBc, respectively, when the calibration loop is running in the background. The total power consumption is 6.5mW. A detailed power breakdown along with a performance summary and comparison tables of the quadrupler and the entire clock multiplier are shown in Fig. 25.2.6. Compared to [5], the proposed quadrupler achieves more than 2X improvement in noise performance while consuming 3X less power. The clock multiplier achieves comparable FoM to some of the best reported designs while operating with the lowest  $F_{REF}$  and the largest multiplication factor. The die micrograph is shown in Fig. 25.2.7 and the active area is 0.16mm<sup>2</sup>.
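The reported numbers can be cross-checked with a few lines of arithmetic. The sketch below uses the jitter-power figure of merit as defined in the comparison table of Fig. 25.2.6 (jitter referred to 1s, power referred to 1mW); the function and variable names are illustrative.

```python
import math

def jitter_fom_db(sigma_rms_s, power_w):
    """FoM1 in dB: 10*log10((sigma_rms / 1 s)^2 * (P / 1 mW))."""
    return 10 * math.log10(sigma_rms_s ** 2 * (power_w / 1e-3))

f_out = 4.752e9                 # measured output frequency [Hz]
f_xo = 54e6                     # crystal oscillator frequency [Hz]
n_total = f_out / f_xo          # overall multiplication from the XO
n_ilcm = n_total / 4            # factor left for the ILCM after the quadrupler

fom1 = jitter_fom_db(366e-15, 6.5e-3)   # 366 fs rms at 6.5 mW
```

The result is an overall multiplication factor of 88 (22 in the ILCM) and a FoM<sub>1</sub> of about -240.6dB, in line with the tabulated -240.5dB.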

### Acknowledgement:

This work was in part supported by Analog Devices. We thank Mentor Graphics for providing the Analog Fast Spice (AFS) simulator.

### References:

- [1] A. Elshazly, et al., "Clock multiplication techniques using digital multiplying delay-locked loops," *IEEE JSSC*, vol. 48, no. 6, pp. 1416-1428, June 2013.
- [2] S. Choi, et al., "A 185fs<sub>rms</sub>-integrated-jitter and -245dB PVT-robust ring-vco-based injection-locked clock multiplier with a continuous frequency-tracking loop using a replica-delay cell and a dual-edge phase detector," *ISSCC*, pp. 194-195, Feb. 2016.
- [3] D. Coombs, et al., "A 2.5-to-5.75GHz 5mW 0.3psrms-jitter cascaded ring-based digital injection-locked clock multiplier in 65nm CMOS," *ISSCC*, pp. 152-153, Feb. 2017.
- [4] H. Kim, et al., "A 2.4GHz 1.5mW digital MDLL using pulse-width comparator and double injection technique in 28nm CMOS," *ISSCC*, pp. 328-329, Feb. 2016.
- [5] M. M. Ghahramani, et al., "A 192MHz differential XO based frequency quadrupler with sub-picosecond jitter in 28nm CMOS," *IEEE RFIC*, pp. 59-62, May 2015.



Figure 25.2.1: Principle idea and conceptual block diagram of the proposed XO quadrupler.



Figure 25.2.2: Block diagram of the complete clock multiplier.



Figure 25.2.3: Block diagram of the background calibration unit.



Figure 25.2.4: Measured phase noise and spur level sensitivity of the quadrupler.



Figure 25.2.5: Measured spur performance w/o and w/ calibration and phase noise after calibration of the whole clock multiplier.

|                                                                    | Elshazly<br>ISSCC'12 [1] | Kim<br>ISSCC'16 [4] | Elkholy<br>JSSC'16 | Choi<br>ISSCC'16 [2] | Coombs<br>ISSCC'17 [3] | This Work      |
|--------------------------------------------------------------------|------------------------|---------------------|--------------------|----------------------|------------------------|----------------|
| Architecture                                                       | MDLL                   | X2 + MDLL           | DPLL               | ILCM                 | X2 + ILCM              | XO + X4 + ILCM |
| Technology [nm]                                                    | 130                    | 28                  | 65                 | 65                   | 65                     | 65             |
| Freq. Range [GHz]                                                  | 1.5                    | 2.4                 | 2.0-5.5            | 0.96-1.44            | 2.5-5.75               | 2.6-5.2        |
| Ref. Freq. [MHz]                                                   | 375                    | 75                  | 50                 | 120                  | 125                    | 54             |
| Mult. Factor [N]                                                   | 4                      | 32                  | 40-110             | 8-12                 | 20-46                  | 48-96          |
| Ref. Spur [dBc]                                                    | -55.6                  | -51.4               | -44                | -53                  | -45                    | -53            |
| Output Jitter [fs_rms]                                             | [10k - 100MHz]         | [1k - 40MHz]        | [10k - 100MHz]     | [10k - 40MHz]        | [10k - 40MHz]          | [10k - 30MHz]  |
| Output Freq. [GHz]                                                 | 1.5                    | 2.4                 | 5                  | 1.2                  | 5                      | 4.752          |
| Power [mW]                                                         | 0.89                   | 1.51                | 4                  | 9.5                  | 5.3                    | 6.5            |
| Area [mm²]                                                         | 0.25                   | 0.024               | 0.084              | 0.06                 | 0.09                   | 0.16           |
| FoM <sub>1</sub> [dB]                                              | -248.5                 | -241.3              | -228.5             | -244.9               | -242.4                 | -240.5         |
| FoM <sub>2</sub> [dB]                                              | -254.4                 | -256.3              | -248.4             | -254.8               | -258.1                 | -260           |
| FoM <sub>1</sub> [dB] = $10\log_{10}(\sigma_{rms}^2 \cdot P_{mW})$ |                        |                     |                    |                      |                        |                |
| FoM <sub>2</sub> [dB] = $10\log_{10}(\sigma_{rms}^2 \cdot P_{mW} / N)$ |                        |                     |                    |                      |                        |                |

|                             | Ghahramani<br>RFIC'15 [5]                       | This Work                                       |
|-----------------------------|-------------------------------------------------|-------------------------------------------------|
| Technology [nm]             | 28                                              | 65                                              |
| Supply Voltage [V]          | 1.0                                             | 1.0                                             |
| XO                          | Differential @48MHz                             | Single-ended @54MHz                             |
| Current [mA]                | 5.5                                             | 1.45                                            |
| Area [mm²]                  | 0.045                                           | 0.07                                            |
| 2X CLK Phase Noise [dBc/Hz] | -139.8 @10kHz<br>-148.3 @100kHz<br>-151.9 @1MHz | -141.8 @10kHz<br>-154.1 @100kHz<br>-158.8 @1MHz |
| 2X CLK Jitter [fs_rms]      | 184 [10k - 10MHz]                               | 77 [10k - 10MHz]                                |



Figure 25.2.6: Performance comparison with state-of-the-art ring-based clock multipliers and frequency quadruplers.



Figure 25.2.7: Die micrograph.

## 25.3 A Fractional-N Digital PLL with Background-Dither-Noise-Cancellation Loop Achieving <-62.5dBc Worst-Case Near-Carrier Fractional Spurs in 65nm CMOS

Cheng-Ru Ho, Mike Shuo-Wei Chen

University of Southern California, Los Angeles, CA

Fractional-N digital phase-locked loops (DPLLs) are highly reconfigurable, scalable, and useful for synthesizing clocks with fine frequency resolution for modern RF, mixed-signal and digital VLSI systems. One critical design challenge is the associated fractional-N spurs, which result from the quantization error of the time-to-digital converter (TDC) or delta-sigma modulator when using multi-modulus fractional dividers. Those unwanted spurs can cause degraded deterministic jitter, reciprocal mixing, and undesired spectrum emissions, depending on the actual PLL application. Various techniques have been used to mitigate fractional spurs. One is to apply dithering [1–3] in the input reference path to randomize the periodic phase error pattern, hence reducing spur magnitude. However, the dithering signal also increases the phase noise floor. A two-level dithering scheme [3] has been adopted to limit noise degradation (but with less spur reduction), or a foreground noise-cancellation technique [2] can be applied at the cost of higher complexity without real-time voltage/temperature tracking ability. Alternatively, the technique in [4] directly cancels spurs and avoids elevating the noise floor, but it increases the logic complexity for near-carrier fractional spurs. To resolve those bottlenecks, we dither the input reference clock and leverage an adaptive dither-noise-cancellation loop operating continuously in the background. Thus, it allows for a larger dithering signal with minimal impact on the noise floor, leading to a more randomized phase error pattern and hence lower spur magnitude. The background operation compensates for PVT variation in real time (response time <0.9μs). It also uses relatively low logic complexity (~16k gates, 0.017mm<sup>2</sup> active area), especially when mitigating near-carrier fractional spurs. 
This proof-of-concept prototype in 65nm CMOS shows >40dB spur improvement and achieves a worst-case fractional spur of -62.5dBc at 1.83kHz frequency offset, lower than state-of-the-art PLLs applying dither approaches. The jitter and spur fluctuations after dither noise cancellation are within 2% and 2.5dB, respectively, when measured over 6 chips, supply voltage (±10%) and temperature (27–60°C) variations, showing the technique's robustness.

The overall DPLL architecture is given in Fig. 25.3.1. Input-reference-path dithering is implemented via a tunable delay buffer chain controlled by a Hadamard-code generator to delay the reference clock phase before comparison with the LC-tank digitally controlled oscillator (LC-DCO) phase by  $D_{\text{dither}} \cdot G_{\text{DTC}}$ , where  $D_{\text{dither}}$  is the Hadamard dithering code and  $G_{\text{DTC}}$  is the digital-to-time converter (DTC) gain, i.e., 1 LSB of the DTC. A least-mean-square (LMS)-based background-dither-noise-cancellation loop is inserted before the digital loop filter (DLF) to cancel the induced dithering noise. The DLF output controls the oversampled delta-sigma modulator to toggle a bank of varactors in the LC-DCO, which helps reduce in-band frequency quantization noise. Afterwards, the injection-locked time-to-digital converter (IL-TDC) tracks DCO frequency and phase variation over PVT and provides 28 levels of fine phase quantization within one DCO period. Finally, the accumulated frequency control word (FCW) is subtracted at the TDC output for fractional-N operation.

Figure 25.3.2 shows the causes of fractional spurs and the dithering scheme used in this work. The finite quantization error between the TDC output and the accumulated FCW path results in a periodic saw-tooth error pattern. This error pattern then modulates the DCO and generates fractional spurs at the DPLL output. Since the IL-TDC subdivides one DCO period into 28 quantization levels, the peak dithering amplitude is set above one DCO period so that the actual quantization may use any of the 28 quantization levels depending on the dither code. Moreover, the TDC's differential nonlinearity (DNL) error can be further randomized, i.e., it takes longer for the DNL pattern to repeat itself. However, such a large dither significantly raises the noise floor, necessitating the background-dither-noise-cancellation scheme, which will be discussed below. Assuming the dither-induced noise can be removed completely, it should be noted that the entire spurious tone energy spreads out in the spectrum, appearing as random noise confined within 1 LSB of TDC quantization. In other words, the fundamental noise floor limit should still be determined by the designed TDC resolution.
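The periodic error pattern and its randomization can be illustrated with a toy model. The sketch below makes several simplifying assumptions that are not from the paper: the fractional FCW is 1/64, the dither is an idealized continuous uniform value spanning exactly one DCO period (rather than a 4-bit Hadamard code), and the known dither is assumed to be removed perfectly, so only the TDC quantization error remains. The 28-level quantizer is taken from the text.

```python
import random

LEVELS = 28  # IL-TDC quantization levels per DCO period (from the text)

def tdc_error(phase):
    """Quantization error of a phase in [0, 1) DCO periods, in DCO periods."""
    return phase - int(phase * LEVELS) / LEVELS

def error_sequence(frac_den, n, dither=False, seed=0):
    """Residual TDC error per reference cycle for a fractional FCW of 1/frac_den."""
    rng = random.Random(seed)
    out = []
    for k in range(n):
        phase = (k % frac_den) / frac_den       # accumulated fractional FCW
        d = rng.random() if dither else 0.0     # idealized uniform dither
        out.append(tdc_error((phase + d) % 1.0))
    return out

def lag_correlation(x, lag):
    """Normalized autocorrelation of x at the given lag."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[lag:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

plain = error_sequence(64, 12800)
dithered = error_sequence(64, 12800, dither=True)
```

Without dither the error repeats every 64 reference cycles, so its autocorrelation at lag 64 is close to 1, which corresponds to a discrete spur. With the idealized dither the same autocorrelation is close to 0, i.e., the error behaves like noise bounded by one TDC LSB.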

Figure 25.3.3 shows that  $D_{\text{dither}}$  is first converted into the phase domain as  $\phi_{\text{Dither}}$  and is then shaped by the DTC transfer function  $H_{\text{DTC}}(z)$  via a tunable delay buffer. The same dithering information  $\phi_{\text{Dither}}$  is fed into an adaptive filter and its output is added after the FCW subtraction node,  $\phi_{\text{sum}}$ . For perfect cancellation, the cancellation signal that the adaptive filter injects at  $\phi_e$  must match the magnitude of the dither noise arriving at  $\phi_{\text{sum}}$ . Therefore, we examine the transfer function from  $\phi_{\text{Dither}}$  to  $\phi_{\text{sum}}$  (denoted as  $H_{\text{Loop}}(z)$ ), which is a function of the DLF response, DTC, DCO, and TDC gain. According to the analytical derivation, it presents a high-pass transfer function and the frequency response can vary over PVT because gain values change. On activating the cancellation loop, the frequency response of the adaptive filter  $H_{\text{can}}(z)$  gradually approaches  $H_{\text{Loop}}(z)$ . The adaptive filter uses an LMS algorithm that continuously minimizes the mean square error at  $\phi_e$ . Thus, convergence speed and settling accuracy depend on the step size of the iterative loop. Background operation allows compensation of the frequency response variation due to PVT drift.

Figure 25.3.4 shows the implementation of the background-dither-noise-cancellation loop. The dither signal is generated with a Hadamard-code generator, creating a random sequence of 4-bit binary dither codes ( $D_{\text{dither}}$ ). These are converted into thermometer codes, controlling a bank of 16 unitary metal-oxide-metal (MOM) capacitors, and retimed to the falling edge of the reference clock. This retiming scheme keeps glitches (due to switching the delay of the CML buffer) away from the rising edges of the reference clock. In parallel, we remove the bias of  $D_{\text{dither}}$  and feed the result into the LMS filter. The cancellation filter response is composed of two parts: 1) a fixed high-pass filter  $H_{\text{HFF}}(z)$  to duplicate the DPLL closed-loop high-pass response; and 2) a two-tap adaptive FIR filter that adjusts gain and phase to compensate for the variation from the analog portion of the dithering circuit. The DPLL loop filter is deterministic and depends only on the programmed DLF parameters, i.e., proportional and integral path gain for type-II response, so it will not drift with PVT. Therefore, it is not necessary to adapt the high-pass filter coefficient of  $H_{\text{HFF}}(z)$  in real time. The coefficients of the two-tap FIR filter are adjusted based on the product of the phase error,  $\phi_e$ , and the output of  $H_{\text{HFF}}(z)$ , which leads to better convergence stability according to numerical simulations.
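The coefficient update described above can be sketched as a standard two-tap LMS loop. This is a simplified open-loop model under stated assumptions, not the implemented circuit: a first-difference filter stands in for the fixed high-pass  $H_{\text{HFF}}(z)$ , the unknown analog path is modeled as a hypothetical two-tap response `h_true`, the dither codes are drawn from a uniform random generator instead of a Hadamard-code generator, and the step size and noise level are arbitrary.

```python
import random

def run_lms(h_true=(0.7, -0.2), mu=0.01, n=20000, noise_std=0.05, seed=1):
    """Adapt a two-tap FIR so that it cancels the dither seen in the phase error."""
    rng = random.Random(seed)
    w = [0.0, 0.0]                # adaptive FIR coefficients
    d_prev = 0.0
    u_prev = 0.0
    for _ in range(n):
        code = rng.randrange(16)              # 4-bit dither code
        d = (code - 7.5) / 7.5                # bias removed, scaled to [-1, 1]
        u = d - d_prev                        # stand-in for the H_HFF(z) output
        taps = (u, u_prev)
        # Dither that leaks into the loop through the (unknown) analog path.
        leak = h_true[0] * taps[0] + h_true[1] * taps[1]
        cancel = w[0] * taps[0] + w[1] * taps[1]
        # Phase error after cancellation, plus unrelated loop noise.
        phi_e = leak - cancel + rng.gauss(0.0, noise_std)
        for i in range(2):
            w[i] += mu * phi_e * taps[i]      # LMS: step along phi_e * filter output
        d_prev, u_prev = d, u
    return w

weights = run_lms()
```

In this model the adapted coefficients approach `h_true`, at which point the dither contribution to  $\phi_e$  vanishes and only the unrelated noise remains. A larger `mu` converges faster but leaves more coefficient jitter, which mirrors the step-size trade-off noted in the text.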

The prototype occupies an active area of 0.338mm<sup>2</sup> in 65nm CMOS and dissipates 18.1mW in total, of which the dither and noise-removal circuits consume 0.6mW from a 1V supply. Figure 25.3.5 shows a snapshot of the near-carrier fractional spur at an FCW of 120+2<sup>-14</sup> before and after enabling the cancellation loop. In this setting, the equivalent integrated RMS jitter of 1.78ps is measured before and after cancelling the dither noise, which demonstrates complete cancellation of the introduced dither noise. The phase noise profile of the DPLL operated at an FCW of 120+2<sup>-10</sup> is shown in Fig. 25.3.5 as well. With the dithering scheme activated, the noise floor is overwhelmed by the dither noise. After removing the dither noise, the noise floor is improved by 23dB while 30dB of spur rejection is observed. In Fig. 25.3.6, the measured worst-case fractional spurs range from -62.5 to -79.3dBc across different FCW settings, indicating at least 20dB improvement. The response time of the implemented adaptive filter is <0.9μs. Figure 25.3.6 also compares this work with state-of-the-art PLLs applying dither techniques to mitigate fractional spurs. This work shows the most spur reduction, in addition to background tracking capability. Figure 25.3.7 shows the chip micrograph and summarizes the chip performance.

### Acknowledgement:

The authors would like to thank the DARPA RF-FPGA program for funding support.

### References:

- [1] R. B. Staszewski, et al., "Spur-Free Multirate All-Digital PLL for Mobile Phones in 65nm CMOS," *IEEE JSSC*, vol. 46, no. 12, pp. 2904-2919, Dec. 2011.
- [2] E. Temporiti, et al., "A 3.5GHz Wideband ADPLL with Fractional Spur Suppression Through TDC Dithering and Feedforward Compensation," *IEEE JSSC*, vol. 45, no. 12, pp. 2723-2736, Dec. 2010.
- [3] Y. He, et al., "A 673uW 1.8-to-2.5GHz Dividerless Fractional-N Digital PLL with an Inherent Frequency-Capture Capability and a Phase-Dithering Spur Mitigation for IoT Applications," *ISSCC*, pp. 420-421, Feb. 2017.
- [4] C. R. Ho, et al., "A digital PLL with Feedforward Multi-tone Spur Cancellation Loop achieving <-73dBc Fractional Spur and <-110dBc Reference Spur in 65nm CMOS," *IEEE JSSC*, vol. 51, no. 12, pp. 3216-3230, Dec. 2016.



Figure 25.3.1: Digital PLL architecture with reference-path dithering and LMS-based background-dither-noise-cancellation loop.



Figure 25.3.2: Time-domain illustration of the dithering scheme to randomize the fractional spur error pattern.



Figure 25.3.3: Loop response of the dithering and cancellation path over PVT variation.



Figure 25.3.4: Implementation of the background-dither-noise-cancellation loop.



Figure 25.3.5: Measured spectrum of fractional spurs at FCW of  $120 + 2^{-14}$  and phase-noise profile at FCW of  $120 + 2^{-10}$  before/after dither noise cancellation.



Figure 25.3.6: Measured worst-case fractional spur across different FCW settings, settling behavior at FCW of  $120 + 2^{-14}$  and comparison with state-of-the-art PLLs with dithering techniques for fractional-spur cancellation.



|                                   |                       |
|-----------------------------------|-----------------------|
| PLL frequency                     | 3 – 5.2 GHz           |
| Reference clock ( $F_{REF}$ )     | 11 – 40 MHz           |
| Feedback divider range            | 1 – 256               |
| Technology                        | CMOS 65nm             |
| Active Area                       |                       |
| Analog                            | 0.27 mm <sup>2</sup>  |
| DPLL Loop                         | 0.051 mm <sup>2</sup> |
| Cancellation Loop                 | 0.017 mm <sup>2</sup> |
| Power                             |                       |
| Analog core                       | 16.4 mW               |
| DPLL Loop                         | 1.1 mW                |
| Dither + Cancellation             | 0.6 mW                |
| Supply voltage                    | 1 V                   |
| Reference spur @ 30MHz offset     | -102.32 dBc           |
| Fractional spur @ 1.83kHz offset  | -62.47 dBc            |
| Settling time of cancellation loop | 0.9 μs               |
| Integrated jitter @ 3.6GHz        | 856 fs                |
| Integrated jitter @ 3.60018GHz    | 1.78 ps               |
| In-band phase noise @ 100kHz      | -96 dBc/Hz            |
| Out-band phase noise @ 3MHz       | -120 dBc/Hz           |

Figure 25.3.7: Die micrograph and performance summary.

## 25.4 A -242dB FOM and -75dBc-Reference-Spur Ring-DCO-Based All-Digital PLL Using a Fast Phase-Error Correction Technique and a Low-Power Optimal-Threshold TDC

Taeho Seong, Yongsun Lee, Seyeon Yoo, Jaehyouk Choi

Ulsan National Institute of Science and Technology, Ulsan, Korea

To improve efficiency in the use of silicon, there have been many efforts to develop ring-oscillator-based clock generators with low jitter. A PLL using a fast phase-error correction (FPEC) technique [1] is one promising architecture. By emulating the phase-realignment mechanism of an injection-locked clock multiplier (ILCM), the FPEC PLL can achieve ultra-low jitter that is almost comparable to that of ILCMs. In addition, since the FPEC PLL has an integrator in its transfer function, it can also achieve a low reference spur and a high multiplication factor ( $N$ ), unlike ILCMs. However, the analog implementation of the FPEC PLL in [1] has difficulty maintaining optimal loop characteristics, which vary easily due to PVT variations or a change in the output frequency. To facilitate the calibration of loop characteristics, the FPEC can be implemented in an all-digital PLL (ADPLL), increasing the control word of a DCO,  $D_{DCW}$ , in a very short duration,  $T_{FPEC}$ , as shown in Fig. 25.4.1. Since the FPEC technique can rapidly remove the accumulated jitter of the DCO from the previous reference period,  $T_{REF}$ , the variance of the output jitter,  $VAR[J_{OUT}](t)$ , becomes saw-tooth-shaped along with the accumulating jitter. In a conventional ADPLL, the accumulated jitter is removed gradually over  $T_{REF}$ , so the variance of the jitter is nearly constant [2]. This difference enables the FPEC ADPLL to have much lower RMS jitter,  $\sigma_{RMS}$ . However, the FPEC ADPLL is limited in its ability to achieve extremely low jitter, i.e., it cannot reduce  $\sigma_{RMS}$  as much as analog FPEC PLLs can. This is because typical ADPLL TDCs provide less precise information regarding the oscillator jitter than a PD does in analog PLLs. When it detects a timing error,  $\tau_{ERR}$ , a TDC generates a digitized value,  $D_{TDC}$ ; thus, the amount of error to be corrected becomes  $\hat{\tau}_{ERR}$  rather than  $\tau_{ERR}$ . 
This results in a quantization error,  $\tau_0$ , thereby increasing  $\sigma_{RMS}$ . To minimize  $\tau_0$  (or  $E[\tau_0^2]$ ), the resolution of a TDC must be improved significantly to a level at which the quantity of jitter can be distinguished, but this is difficult when a typical CMOS process is used. Even if the design itself were possible, additional power would be required to generate many evenly spaced time thresholds.

This work presents an FPEC ADPLL that can concurrently achieve an ultra-low jitter and a low reference spur. A low-power optimal-threshold (OT) TDC is proposed as the core circuit to minimize  $E[\tau_0^2]$  effectively. By optimizing the spacing between the time thresholds of the TDC and the phase-correction gain of the loop,  $E[\tau_0^2]$  can be reduced dramatically, while using a small number of thresholds ( $N_{TDC}$ ) and a small amount of power. Since the FPEC technique is implemented digitally, the variables that define the loop characteristics can be corrected precisely using simple background calibrators.

Figure 25.4.2 shows how the optimal values of the related variables can be obtained. In the analysis,  $N_{TDC}$  is assumed to be three, and the time thresholds of the TDC are represented as  $-\tau_{TH}$ , 0, and  $+\tau_{TH}$ , where  $\tau_{TH}$  is the spacing between two adjacent thresholds. Depending on  $D_{TDC}$  and the phase-correction gain,  $K$ ,  $\hat{\tau}_{ERR}$  can be one of  $\pm K$  and  $\pm 3K$ . To find the optimal values of  $\tau_{TH}$  and  $K$  that can minimize  $E[\tau_0^2]$  for the given probability density function (PDF) of  $\tau_{ERR}$ , the Lloyd-Max algorithm [3] is used. Using the minimum-mean-square-error criterion, this algorithm was developed to find the optimal decision thresholds and representative levels that minimize the variance of the quantization error for any PDF; thus, it can provide correct solutions to the problem of this work. From this algorithm, the optimal values of  $\tau_{TH}$  and  $K$  are obtained as  $\sigma_{ERR}$  and  $0.5\sigma_{ERR}$ , respectively, where  $\sigma_{ERR}$  is the standard deviation of the PDF. The reason the solutions have such simple forms is that  $\tau_{ERR}$  has a Gaussian distribution because we only consider the thermal noise of the DCO. In this case, the probabilities,  $P(\tau_{ERR} < -\tau_{TH})$ ,  $P(-\tau_{TH} < \tau_{ERR} < +\tau_{TH})$ , and  $P(\tau_{ERR} > +\tau_{TH})$  are 0.16, 0.68, and 0.16, respectively, and these values are used to calibrate  $\tau_{TH}$  and  $K$ . In reality, due to the feedback of  $\tau_0$  to  $\tau_{ERR}$ , the PDF of  $\tau_{ERR}$  could deviate from being a Gaussian distribution. This could cause errors in the estimation of  $\tau_{TH}$  and  $K$ , but they should be insignificant, since the portion of  $\tau_0$  out of  $\tau_{ERR}$  is relatively small in steady state.
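The quoted optimum can be reproduced with a small Lloyd-Max style iteration. The sketch below assumes a zero-mean, unit-variance Gaussian  $\tau_{ERR}$  and restricts the quantizer to the structure described in the text (thresholds  $0, \pm\tau_{TH}$  and levels  $\pm K, \pm 3K$ ); the closed-form centroid expression in the code follows from that restriction and is a derivation made here, not taken from the paper. Function names are illustrative.

```python
import math

def phi(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def q(x):
    """Upper tail probability P(X > x) of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def optimal_threshold_and_gain(iters=200):
    """Alternate the two Lloyd-Max conditions for levels {+-K, +-3K}.

    Returns (tau_th, k) in units of sigma_ERR.
    """
    tau = 1.5   # arbitrary starting threshold
    k = 0.0
    for _ in range(iters):
        # Centroid step: K = E[x*c(x)] / E[c(x)^2], with c(x) = +-1 inside
        # the thresholds and +-3 outside, for a standard normal x.
        k = 2 * (phi(0) + 2 * phi(tau)) / (1 + 16 * q(tau))
        # Nearest-neighbour step: threshold midway between levels K and 3K.
        tau = 2 * k
    return tau, k

tau_th, gain = optimal_threshold_and_gain()
p_inner = 1 - 2 * q(tau_th)   # P(-tau_th < tau_err < +tau_th)
```

The iteration settles near  $\tau_{TH} \approx 0.996\sigma_{ERR}$  and  $K \approx 0.498\sigma_{ERR}$ , with an inner-bin probability of about 0.68, consistent with the values stated above.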

We performed simulations using the behavioral model to verify the validity of the solution from the Lloyd-Max algorithm and observed  $\sigma_{RMS}$  as the values of  $\tau_{TH}$  and  $K$  were changed arbitrarily. From the simulations, the values of  $\tau_{TH}$  and  $K$  that provided the minimum  $\sigma_{RMS}$  were  $1.03\sigma_{ERR}$  and  $0.51\sigma_{ERR}$ , respectively, which were in good agreement with the foregoing analysis. To observe the change of the minimum  $\sigma_{RMS}$  according to  $N_{TDC}$ , we also performed the simulation for different  $N_{TDC}$ , using the values of  $\tau_{TH}$  and  $K$  obtained from the algorithm. As shown in the graph of the normalized  $\sigma_{RMS}$ , the conventional ADPLL has a 66% higher  $\sigma_{RMS}$  than the ideal FPEC PLL with zero  $E[\tau_0^2]$ . The values of  $\sigma_{RMS}$  decrease gradually as the FPEC is used with increasingly larger  $N_{TDC}$ . This is because the quantization by the TDC becomes more continuous, thereby further reducing  $E[\tau_0^2]$ . The decrease of  $\sigma_{RMS}$  is noticeably slower for  $N_{TDC}$ s greater than three; thus, we decided to fix  $N_{TDC}$  at three to reduce design complexity and power consumption. Figure 25.4.2 also shows the diagram of the proposed OT-TDC using three DFF/delay-cell sets. The upper and lower delay cells are based on a digital-to-time converter (DTC), generating relatively positive and negative delays, i.e.,  $+\tau_U$  and  $-\tau_L$ , respectively. Using the fact that the averages of the outputs of the upper DFF,  $E[D_U]$ , and lower DFF,  $E[D_L]$ , must be  $-0.68$  and  $0.68$ , respectively, the amounts of  $\tau_U$  and  $\tau_L$  can be optimally adjusted.

Figure 25.4.3 shows the architecture of the proposed FPEC ADPLL. The OT-TDC quantizes a timing error between  $S_{DIV}$  and  $S_{REF}$  and provides  $D_{TDC}$ . The digital-loop filter (DLF) consists of an integral path and a proportional path with gains of  $\alpha$  and  $\beta$ , respectively. To prevent jitter peaking,  $\alpha$  is designed to be small, whereas  $\beta$  is designed to be very large. The effect of the FPEC is generated by turning on the proportional path only for a short time, according to  $S_{FPEC}$  from the FPEC-DLF logic; the duration of  $T_{FPEC}$  is one-eighth that of  $T_{REF}$ . To maintain the minimized  $E[\tau_0^2]$ , the values of  $\tau_U$ ,  $\tau_L$ , and  $K$  are calibrated by two digital controllers. To evaluate the current value of  $\tau_U$  (or  $\tau_L$ ), the  $\tau_{TH}$ -controller compares  $E[D_U]$  (or  $E[D_L]$ ) with  $-0.68$  (or  $0.68$ ) using an FIR filter and a sample-and-hold circuit. Then,  $D_{DCW,U}$  (or  $D_{DCW,L}$ ) is updated based on the results. To optimize  $K$ , the  $K$ -gain controller adjusts  $\beta$  until the autocorrelation of  $D_M$  becomes zero. This condition is equivalent to adjusting  $K$  to  $0.5\sigma_{ERR}$  and can ensure that the loop gain is optimal [4].
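The idea of steering a loop gain until an autocorrelation reaches zero can be illustrated with a deliberately simplified model. The sketch below is an assumption-laden toy, not the FPEC loop: it uses a linear first-order error recursion `e[n+1] = (1 - gain) * e[n] + w[n]` with white Gaussian noise and no TDC quantization, and it observes the error itself rather than  $D_M$ . In this toy model the lag-1 autocorrelation is positive when the error is under-corrected and negative when it is over-corrected, so its sign indicates the direction in which to adjust the gain.

```python
import random

def lag1_autocorr(gain, n=20000, seed=1):
    """Lag-1 autocorrelation coefficient of e[n+1] = (1 - gain) * e[n] + w[n]."""
    rng = random.Random(seed)
    e = 0.0
    samples = []
    for _ in range(n):
        samples.append(e)
        e = (1.0 - gain) * e + rng.gauss(0.0, 1.0)
    mean = sum(samples) / n
    num = sum((a - mean) * (b - mean) for a, b in zip(samples, samples[1:]))
    den = sum((a - mean) ** 2 for a in samples)
    return num / den

def tune_gain(start=0.5, step=0.05, iters=20):
    """Sign-based adaptation: raise the gain while the error stays correlated."""
    gain = start
    for k in range(iters):
        r = lag1_autocorr(gain, seed=k)
        gain += step if r > 0 else -step
    return gain

print("under-corrected:", lag1_autocorr(0.5) > 0)
print("over-corrected:", lag1_autocorr(1.5) < 0)
print("tuned gain:", round(tune_gain(), 2))
```

In the toy model the zero crossing sits at unity gain; the correspondence to  $K = 0.5\sigma_{ERR}$  stated in the text depends on the quantized loop analyzed in [4] and is not reproduced here.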

The proposed FPEC ADPLL, fabricated with a 65nm CMOS process, uses 6.0mW power and 0.055mm<sup>2</sup> area. Figure 25.4.4 shows that our approach achieved very low phase noise (PN) and RMS jitter with a wide noise-reduction bandwidth, i.e., almost 20MHz. The output frequency,  $f_{OUT}$ , was 2.4GHz, and the reference frequency,  $f_{REF}$ , was 75MHz. The PN and RMS jitter were reduced further when  $N_{TDC}$  changed from 1 to 3; the decrease in the 100kHz PN was approximately 5dB, and RMS jitter was reduced from 454 to 320fs. When  $f_{OUT}$  was changed from 2.3 to 2.5GHz with different  $f_{REF}$ , RMS jitter stayed near 320fs. Figure 25.4.5 shows that the reference spur was  $-75$ dBc, which varied by less than 5dB for different  $f_{OUT}$ . The table (top) in Fig. 25.4.6 shows that the proposed FPEC ADPLL using the OT-TDC concurrently achieved very low RMS jitter,  $FOM_{JIT}$ , and reference spur, compared to state-of-the-art ring-oscillator-based clock generators. Also, it had much better  $FOM_{JIT}$  than state-of-the-art BB-ADPLLs (bottom).

### Acknowledgement:

This work was supported by Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-IT1702-02.

### References:

- [1] T. Seong, et al., "A -242-dB FOM and -71-dBc Reference Spur Ring-VCO-Based Ultra-Low-Jitter Switched-Loop-Filter PLL Using a Fast Phase-Error Correction Technique," *IEEE Symp. VLSI Circuits*, pp. C186-C187, June 2017.
- [2] J. Borremans, et al., "A Low-Complexity, Low-Phase-Noise, Low-Voltage Phase-Aligned Ring Oscillator in 90 nm Digital CMOS," *IEEE JSSC*, vol. 44, no. 7, pp. 1942-1949, July 2009.
- [3] S. Lloyd, "Least squares quantization in PCM," *IEEE Trans. Information Theory*, vol. 28, no. 2, pp. 129-137, Mar. 1982.
- [4] T. K. Kuan, et al., "A Loop Gain Optimization Technique for Integer- $N$  TDC-Based Phase-Locked Loops," *IEEE TCAS-I*, vol. 62, no. 7, pp. 1873-1882, July 2015.



Figure 25.4.1: Concept of the FPEC ADPLL and comparison of jitter reduction in a conventional ADPLL.



Figure 25.4.2: Analysis and simulations of optimal time thresholds,  $\tau_{\text{TH}}$ , of a TDC and phase-correction gain,  $K$ , that minimize  $E[\tau_0^2]$ , and a block diagram of the proposed optimal-threshold (OT) TDC.



Figure 25.4.3: Overall architecture of the proposed FPEC ADPLL and diagrams of digital calibrators.



Figure 25.4.4: Measured phase noises of the 2.4GHz output signal of the FPEC ADPLL when  $N_{\text{TDC}} = 3$  and  $N_{\text{TDC}} = 1$ , and the measured RMS jitter at the output frequencies from 2.3 to 2.5GHz with a different  $f_{\text{REF}}$  ( $N = 32$ ).



Figure 25.4.5: Measured spectrum of the 2.4GHz output signal of the FPEC ADPLL and the measured reference spur at output frequencies from 2.3 to 2.5GHz with a different  $f_{\text{REF}}$  ( $N = 32$ ).

| Comparison with state-of-the-art ring-oscillator-based integer-N clock generators ( $N > 15$ ) |            |                         |                     |                           |                       |
|------------------------------------------------------------------------------------------------|------------|-------------------------|---------------------|---------------------------|-----------------------|
| Reference                                                                                      | This Work  | VLSI'17 [1]<br>T. Seong | ISSCC'15<br>L. Kong | ISSCC'16<br>K. Y. J. Shen | ISSCC'17<br>D. Coombs |
| Technology                                                                                     | 65nm       | 65nm                    | 65nm                | 14nm                      | 65nm                  |
| Architecture                                                                                   | FPEC ADPLL | Analog FPEC PLL         | SLF PLL             | ILCM                      | ILCM                  |
| Loop BW calibration                                                                            | Y          | N                       | N                   | Y                         | N                     |
| $f_{\text{OUT}}$ (GHz)                                                                         | 2.4        | 3.008                   | 2.4 (2.0 – 3.0)     | 4 (0.4 – 5.0)             | 5 (2.5 – 5.75)        |
| $f_{\text{REF}}$ (MHz)                                                                         | 75         | 47                      | 22.6                | 100                       | 125                   |
| Multi. Factor (N)                                                                              | 32         | 64                      | 106                 | 40                        | 40                    |
| PN@1MHz(dBc/Hz)                                                                                | -119.8     | -121.6                  | -113.8              | -112                      | -115.9                |
| Jitter <sub>av</sub> (fs)                                                                      | 320        | 357                     | 970                 | 1264                      | 340                   |
| (1k – 100M)                                                                                    | (1k – 80M) | (1k – 200M)             | (100k – 1G)         | (10k – 40M)               | (10k – 40M)           |
| Spur (dBc)                                                                                     | -75        | -71                     | -65                 | NA                        | -45                   |
| Power (mW)                                                                                     | 6.0        | 4.6                     | 4.0                 | 2.6                       | 5.3                   |
| FOM <sub>av</sub> (dB)*                                                                        | -242.1     | -242.3                  | -234.1              | -233.9                    | -242.4                |
| Area (mm <sup>2</sup> )                                                                        | 0.055      | 0.047                   | 0.015               | 0.021                     | 0.09                  |

| Comparison with state-of-the-art ring-DCO-based BB-ADPLLs |               |                     |                       |                     |                    |
|-----------------------------------------------------------|---------------|---------------------|-----------------------|---------------------|--------------------|
| Reference                                                 | This Work     | ISSCC'17<br>T. Jang | ISSCC'16<br>C. W. Yeh | ISSCC'15<br>M. Song | ISSCC'14<br>B. Kim |
| Technology                                                | 65nm          | 28nm                | 40nm                  | 14nm                | 65nm               |
| Architecture                                              | FPEC ADPLL    | BB-ADPLL            | BB-ADPLL              | BB-ADPLL            | BB-ADPLL           |
| $f_{\text{OUT}}$ (GHz)                                    | 2.4           | 2.4 (0.8 – 3.2)     | 3.2                   | 2 (0.032 – 2)       | 1.6 (0.4 – 1.6)    |
| $f_{\text{REF}}$ (MHz)                                    | 75            | 50                  | 200                   | 50                  | 266                |
| Multi. Factor (N)                                         | 32            | 48                  | 16                    | 40                  | 6                  |
| PN@1MHz(dBc/Hz)                                           | -119.8        | -104                | -104                  | -96                 | -97                |
| Jitter <sub>av</sub> (fs)                                 | 320           | 2520                | 3540                  | 18800               | 2800               |
| (1k – 100M)                                               | (100k – 100M) | (10k – 300M)        | (100k – 100M)         | (20k – 2M)          | (10k – 40M)        |
| Spur (dBc)                                                | -75           | NA                  | NA                    | NA                  | -75                |
| Power (mW)                                                | 6.0           | 5.0                 | 2.9                   | 2.06                | 2.7                |
| FOM <sub>av</sub> (dB)*                                   | -242.1        | -225.1              | -224.4                | -211.4              | -226.7             |
| Area (mm <sup>2</sup> )                                   | 0.055         | 0.049               | 0.022                 | 0.009               | 0.019              |

\* FOM<sub>av</sub> (dB) = 10 log[(Jitter<sub>av</sub>/1s)<sup>2</sup> · (Power/1mW)]

Figure 25.4.6: Performance comparisons with state-of-the-art ring-oscillator-based clock generators and ring-DCO-based BB-ADPLLs.
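The figure-of-merit column can be reproduced from the jitter and power rows using the conventional jitter FoM, 10·log10[(jitter/1s)² · (power/1mW)]. The short sketch below applies it to two table entries; the function name is an illustrative assumption.

```python
import math

def fom_jitter_db(jitter_rms_s, power_mw):
    """Jitter figure of merit in dB: 10*log10((jitter / 1 s)^2 * (power / 1 mW))."""
    return 10.0 * math.log10((jitter_rms_s / 1.0) ** 2 * (power_mw / 1.0))

# Values taken from the comparison table (RMS jitter in seconds, power in mW).
this_work = fom_jitter_db(320e-15, 6.0)
analog_fpec = fom_jitter_db(357e-15, 4.6)
print(round(this_work, 1), round(analog_fpec, 1))
```

Lower (more negative) values are better: the metric rewards low jitter quadratically and low power linearly.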



| Power Consumption (mW)                              |      |
|-----------------------------------------------------|------|
| DCO                                                 | 5.16 |
| Divider                                             | 0.30 |
| FPEC DLF<br>+ K-Gain Controller<br>+ FPEC-DLF Logic | 0.47 |
| OT-TDC<br>+ τ <sub>TH</sub> -Controller             | 0.07 |
| Total                                               | 6.00 |

Figure 25.4.7: Micrograph of the FPEC ADPLL and power-breakdown table.

# Session 26 Overview:

## *RF Techniques for Communication and Sensing*

### RF SUBCOMMITTEE



**Session Chair:**  
**Giuseppe Gramegna**  
*Huawei, Mougins, France*



**Associate Chair:**  
**Hua Wang**  
*Georgia Institute of Technology, Atlanta, GA*

### Subcommittee Chair: **Piet Wambacq**, imec, Leuven, Belgium

The increasing need to improve transmitter efficiency, enhance receiver linearity and robustness, and migrate electronic interfaces closer to the antenna continues to stimulate research in advanced RF techniques. The first paper in this session presents a broadband hybrid-coupler circulator for full-duplex operation. The following two papers demonstrate mm-wave antenna and power amplifier co-integration to increase efficiency and reduce cost. Next, we have three papers that advance power-amplifier architectures for 5G and NB-IoT applications. The three papers towards the end of this session demonstrate techniques for high-linearity receivers. A real-time, near-field THz imager concludes the session.



**1:30 PM**  
**26.1 A 0.55-to-0.9GHz 2.7dB NF Full-Duplex Hybrid-Coupler Circulator with 56MHz 40dB TX SI Suppression**

*S. Jain, Oregon State University, Corvallis, OR*

In Paper 26.1, Oregon State University presents a balanced hybrid circulator RX that achieves 2.7dB NF, while providing >40dB TX-to-RX baseband isolation across 32MHz BW (56MHz BW for an average of 40dB isolation), and >30dB isolation across 190MHz bandwidth. An on-chip balancing network is not required. The improvement in NF in this work is >3dB, and the isolation bandwidth is increased >2.7 $\times$  compared to state-of-the-art circulator-based RX.



**2:00 PM**  
**26.2 A 62-to-68GHz Linear 6Gb/s 64QAM CMOS Doherty Radiator with 27.5%/20.1% PAE at Peak/6dB-Back-off Output Power Leveraging High-Efficiency Multi-Feed Antenna-Based Active Load Modulation**

*H. T. Nguyen, Georgia Institute of Technology, Atlanta, GA*

In Paper 26.2, Georgia Institute of Technology leverages a high-efficiency multi-feed antenna-based active-load-modulation scheme. The paper reports a multi-feed 65GHz on-chip Doherty radiator in 45nm SOI-CMOS that supports Gb/s modulations and generates 19dBm P<sub>1dB</sub> with 27.5%/20.1% PAE at 0dB/6dB back-off. The best back-off efficiency enhancement among 60-to-80GHz silicon-based PAs is demonstrated using a new output network.



**2:30 PM**  
**26.3 A 69-to-79GHz CMOS Multiport PA/Radiator with +35.7dBm CW EIRP and Integrated PLL**

*B. Abiri, California Institute of Technology, Pasadena, CA*

In Paper 26.3, the California Institute of Technology demonstrates an on-chip multiport radiator and an integrated PLL with a locking range of 69-to-79GHz that is capable of providing a continuous 35.7dBm EIRP and 24.4dBm of total radiated power with 15.2% DC-to-radiation efficiency. The approach is also scalable and extracts high power levels from low-voltage CMOS transistors by combining power from multiple PAs and lowering the radiator driving impedance.



**2:45 PM**  
**26.4 A 28GHz 41%-PAE Linear CMOS Power Amplifier Using a Transformer-Based AM-PM Distortion-Correction Technique for 5G Phased Arrays**

*S. N. Ali, Washington State University, Pullman, WA*

In Paper 26.4, Washington State University presents a transformer to correct the AM-PM distortion of a Class-AB PA for 5G applications. The 65nm CMOS PA achieves <0.7 degree AM-PM distortion, 41% PAE, and 15.6dBm  $P_{out}$  at 28GHz CW. Tested under a 512/256/64QAM signal with a 20/50/340MSym/s data-rate at 28GHz, the PA has a linear PAE at 28GHz of 18.2% for a 64QAM, 340MHz BW signal ( $P_{avg}=9.8$ dBm, EVM=-26.4dBc and ACPR=-30dBc).



**3:15 PM**  
**26.5 A Compact Dual-Band Digital Doherty Power Amplifier Using Parallel-Combining Transformer for Cellular NB-IoT Applications**

*Y. Yin, Fudan University, Shanghai, China*

In Paper 26.5, Fudan University demonstrates a dual-band high-power digital Doherty PA for cellular NB-IoT in 55nm CMOS. The PA reaches 28.9dBm peak  $P_{out}$  with 36.8% PAE in the low band and 27dBm peak  $P_{out}$  with 25.4% PAE in the high band. For a 12-subcarrier 180kHz NB-IoT signal,  $P_{out}$  is 24.4dBm with an average PAE of 29.5% and -21.6dB EVM.



**3:30 PM**  
**26.6 A Continuous-Mode Harmonically Tuned 19-to-29.5GHz Ultra-Linear PA Supporting 18Gb/s at 18.4% Modulation PAE and 43.5% Peak PAE**

*T-W. Li, Georgia Institute of Technology, Atlanta, GA*

In Paper 26.6, Georgia Institute of Technology presents a PA that achieves a  $P_{sat}$  1dB bandwidth of 19 to 29.5GHz (43.3%) or  $S_{21}$  3dB BW of 17.7 to 32.3GHz (58.4%) for multiband 5G MIMO. It supports 18Gb/s 64QAM and 8Gb/s 256QAM with >8.7dBm  $P_{avg}$ , >16.3% total PAE, and >20% PAE under modulation. The design is based on a continuous-mode harmonically tuned linear mm-wave PA that features an ultra-compact on-chip PA output network in only one transformer footprint. The continuous-mode terminations at the fundamental, 2<sup>nd</sup>, and 3<sup>rd</sup> harmonics are wideband with no tunable elements or switches.



**4:00 PM**  
**26.7 A Coupled-RTWO-Based Subharmonic Receiver Front-End for 5G E-Band Backhaul Links in 28nm Bulk CMOS**

*M. Vigilante, KU Leuven, Heverlee, Belgium*

In Paper 26.7, KU Leuven demonstrates an E-Band direct-conversion subharmonic receiver (SHRX) that achieves 8.3dB noise figure, 12.5GHz RF bandwidth, and -25dBm ICP<sub>1dB</sub>, while consuming <100mW. The receiver leverages on-chip coupled RTWOs to generate 8 differential phases at  $f_{LO}=f_{RF}/4$ . It is implemented in a 28nm bulk CMOS technology without an ultra-thick top metal.



**4:15 PM**  
**26.8 A 12mW 70-to-100GHz Mixer-First Receiver Front-End for mm-Wave Massive-MIMO Arrays in 28nm CMOS**

*L. Iotti, University of California, Berkeley, CA*

In Paper 26.8, the University of California, Berkeley, presents an E-Band mixer-first RX with  $S_{11}<-10$ dB, 8-to-12.7dB NF, 19.5-to-25.3dB gain, and -16.8dBm maximum ICP<sub>1dB</sub> across 70-to-100GHz. The high-input-impedance passive mixer is matched at the RF input using a frequency-translational feedback and a broadband matching network. Power consumption is 12mW.



**4:30 PM**  
**26.9 A 13<sup>th</sup>-Order CMOS Reconfigurable RF BPF with Adjustable Transmission Zeros for SAW-Less SDR Receivers**

*P. Song, University of Southern California, Los Angeles, CA*

In Paper 26.9, the University of Southern California describes a 13<sup>th</sup>-order CMOS reconfigurable RF bandpass filter that is tunable from 0.8 to 1.1GHz. The IIP3-OOB is +24dBm at a 40MHz offset. The filter has adjustable close-by transmission zeros. The 3dB BW is tunable from 30 to 50MHz, and a 100dB/100MHz transition-band roll-off enables close-by blocker rejection.



**4:45 PM**  
**26.10 A 128-Pixel 0.56THz Sensing Array for Real-Time Near-Field Imaging in 0.13μm SiGe BiCMOS**

*P. Hillger, University of Wuppertal, Wuppertal, Germany*

In Paper 26.10, the University of Wuppertal presents a silicon-based super-resolved real-time terahertz sensing system that includes THz illumination, detection, evanescent field sensing, and readout on a single chip. It comprises 128 split-ring-resonator-based 0.56THz near-field sensors. Chopping techniques and an integrated lock-in amplifier are employed to achieve 93dB dynamic range (1Hz BW) in an analog readout mode, and 37dB at 28fps in a digital readout mode.

## 26.1 A 0.55-to-0.9GHz 2.7dB NF Full-Duplex Hybrid-Coupler Circulator with 56MHz 40dB TX SI Suppression

Sanket Jain, Abhishek Agrawal, Manoj Johnson, Arun Natarajan

Oregon State University, Corvallis, OR

Simultaneous transmit-and-receive (STAR) radios enable higher spectrum efficiency and dynamic spectrum access. The integration of a shared antenna interface is attractive for small system form factor and MIMO channel estimation and has led to demonstrations of compact reciprocal (electrical balance duplexers (EBD)) and non-reciprocal (circulator) interfaces with self-interference cancellation (SIC) [1-5]. In this paper, we address the challenge of low-noise wideband SIC by demonstrating a wideband hybrid-coupler-circulator antenna interface using N-path mixers that achieves low NF while preserving the linearity of a passive-mixer-first RX. The 0.55-to-0.9GHz hybrid-coupler-based circulator-RX achieves 2.7dB NF at 750MHz with +14dBm OOB IIP3 as part of an antenna interface that achieves 2.6-to-4.1dB TX→ANT insertion loss (IL) from 550 to 900MHz. The wideband coupler approach provides >40dB TX-to-RX baseband (BB) isolation across a 32MHz BW (56MHz BW for averaged 40dB isolation) and >30dB isolation across a 190MHz bandwidth without an on-chip balancing network, representing a >3dB improvement in NF and >2.7× increase in cancellation bandwidth compared to previously published circulator-RX.

Magnetic-free integrated circulators, based on the phase non-reciprocal behavior of two-port N-path filters embedded in a transmission-line (t-line) structure, have been demonstrated in [2,3]. The architecture has been extended to mm-wave using a wideband gyrator in [4]. However, intrinsic asymmetry in the t-line structure leads to amplitude and phase imbalance, which can limit both the TX SIC nulls as well as the TX SIC bandwidth. A balance network has been demonstrated to increase SIC, but the resulting increase in quadrature imbalance can lead to NF degradation [2]. The requirements placed on an additional balancing network can be considerably relaxed if the intrinsic quadrature balance in the passive structure is improved. Figure 26.1.1 shows the baseband voltage for a two-port N-path filter, driven by quadrature clocks, as a function of phase difference,  $\theta$ , between the input signals. The N-path mixers constructively combine signals at the LO frequency with  $\theta = -90^\circ$  while rejecting all other signals (ignoring harmonics) and specifically nulling signals with  $\theta = 90^\circ$  at the LO frequency. The proposed balanced 90° hybrid-coupler-based non-reciprocal structure is shown in Fig. 26.1.1 in which the two-port N-path mixer driven by quadrature clocks is connected to the thru and coupled ports while the TX and ANT signals are applied to the input and isolation ports. Signals from the TX and ANT ports are in quadrature at the thru and coupled ports but with opposite phase progressions (Fig. 26.1.1). The phase non-reciprocal two-port N-path mixer nulls the TX signal creating a reflective short at the thru and coupled ports with respect to the TX signal. This leads to the TX signal propagating to the antenna port. On the other hand, signals from the ANT port see a high impedance at the thru and coupled ports leading to RX reception on the baseband capacitors. 
Notably, the impedance translation of N-path structures ensures a short for all signals from the TX port and for out-of-band signals at the RX port, thereby providing TX as well as OOB rejection. Achieving quadrature phase and amplitude balance is critical for wideband nulling. As shown in Fig. 26.1.1, a 90° hybrid coupler can achieve wideband quadrature balance, enabling wideband TX nulling. The TX-to-ANT IL depends upon hybrid-coupler IL and mixer-switch  $R_{SW}$ . Given compact commercial SMD couplers with 0.2dB IL [6] and CMOS switches with  $R_{SW} \sim 1.6\Omega$ , the proposed approach can simultaneously achieve wide SIC bandwidth and low IL while ensuring small form factor and simplified packaging.
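The nulling mechanism and its sensitivity to quadrature balance can be illustrated with an idealized phasor model. The sketch below makes several assumptions that are not taken from the paper: a lossless 90° hybrid with one common phase convention, a two-port N-path block modeled as a fixed 90° shift followed by summation, and  $\theta$  defined as the thru-port phase minus the coupled-port phase. All function names are illustrative. Under these assumptions the signal applied at the input port cancels on the baseband node, the signal applied at the isolated port adds constructively, and a 1° quadrature phase error limits the null to roughly 41dB.

```python
import cmath
import math

A = 1.0 / math.sqrt(2.0)  # 3 dB split of an ideal hybrid

def hybrid_outputs(v_input, v_isolated):
    """Ideal 90-degree hybrid (assumed phase convention).

    input -> thru: -90 deg, input -> coupled: -180 deg
    isolated -> thru: -180 deg, isolated -> coupled: -90 deg
    """
    v_thru = A * (-1j * v_input - 1.0 * v_isolated)
    v_coupled = A * (-1.0 * v_input - 1j * v_isolated)
    return v_thru, v_coupled

def npath_baseband(v_thru, v_coupled):
    """Idealized 2-port N-path: add 90 deg to the thru-side signal, then sum."""
    return abs(v_coupled + 1j * v_thru)

def isolation_db(phase_error_deg, gain_error_db=0.0):
    """Null depth relative to full constructive combining for imperfect balance."""
    g = 10.0 ** (gain_error_db / 20.0)
    residual = abs(1.0 - g * cmath.exp(1j * math.radians(phase_error_deg)))
    if residual == 0.0:
        return math.inf  # perfect balance gives an ideal null
    return 20.0 * math.log10(2.0 / residual)

tx_bb = npath_baseband(*hybrid_outputs(1.0, 0.0))   # signal at the input port
ant_bb = npath_baseband(*hybrid_outputs(0.0, 1.0))  # signal at the isolated port
print("TX at baseband:", round(tx_bb, 6))
print("ANT at baseband:", round(ant_bb, 6))
print("isolation for 1 deg phase error (dB):", round(isolation_db(1.0), 1))
```

The last line shows why balance dominates the achievable cancellation: reaching a 40dB null requires the two paths to stay within about a degree of ideal quadrature across the band of interest.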

Additionally, the potential for achieving low RX NF using an integrated circulator structure is presented in Fig. 26.1.2. Similar to the structure in [2], the two-port N-path circuit constructively combines the ANT signals arriving at the thru and coupled ports on the baseband capacitors. The down-converted RX signal is hence available on the N-path capacitors, which serves as the RX baseband port. This is equivalent to a circulator operating with the RX port mismatched to a high impedance. Therefore, power matching for the ANT port is provided by the TX port termination. If the TX port is terminated in  $50\Omega$ , a wideband RX match (limited by coupler bandwidth) is achieved. However, ANT→TX isolation is no longer present since the ANT port is matched by the TX port impedance. Importantly, noise emanating from the TX termination is cancelled at the RX port, similar to other wideband noise-cancelling LNAs. In addition, the hybrid-coupler structure results in a passive voltage gain at the RX port. Since the baseband RX can be designed without matching considerations, it is feasible to achieve low NF by reducing the input-referred noise of the baseband circuits while preserving the linearity of passive-mixer-first RX. Finally, quadrature balance in the hybrid-coupler enables TX SIC without the balancing network in [2], avoiding the associated degradation in RX gain and NF.

The schematic of the implemented hybrid-circulator receiver is shown in Fig. 26.1.2. The 65nm CMOS implementation includes 8-phase N-path switches and baseband inverter-based  $G_M$ -cells that drive off-chip TIA and summing amplifiers. Dual-edge-triggered flip-flops are used to generate the 8-phase non-overlapping pulses for I and Q LO signals from a 4x LO. The switch drain and source voltages are biased to improve switch linearity and the clock drive is AC coupled to the mixer gates. The IC occupies 1mm<sup>2</sup> and is packaged with a commercial wideband 700MHz SMD hybrid coupler (5.1mm × 6.4mm) [6] using a chip-on-board packaging approach.
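A 4x LO has eight edges per LO period, so a circuit that advances on both edges can produce eight non-overlapping 12.5%-duty pulses. The sketch below models this with a one-hot ring that rotates on every edge; the ring structure and function name are assumptions made for illustration and are not taken from the chip's clock-generation circuit.

```python
def eight_phase_pulses(num_edges=16):
    """One-hot ring advanced on every edge (rising and falling) of a 4x LO.

    Each returned row is the state of the 8 phase outputs during one
    half-cycle of the 4x LO; 8 rows make up one LO period.
    """
    state = [1, 0, 0, 0, 0, 0, 0, 0]
    rows = []
    for _ in range(num_edges):
        rows.append(tuple(state))
        state = state[-1:] + state[:-1]  # rotate the single '1' by one position
    return rows

for row in eight_phase_pulses(8):
    print("".join(str(bit) for bit in row))
```

Exactly one phase is high in every half-cycle, which is the non-overlap property an N-path mixer needs so that only one switch conducts at a time.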

The measured S-parameters with respect to the TX (Port 1) and ANT (Port 2) ports are shown in Fig. 26.1.3. A wideband input match from 300MHz to 1GHz is observed with IL of 2.6dB, 3.1dB, and 4.7dB at 550MHz, 700MHz, and 900MHz, respectively (LO at 770MHz). As expected, the high impedance at the RX port results in a lack of isolation for ANT→TX. In this design, the N-path capacitors are sized for 80MHz RF BW. The expected phase asymmetry due to the non-reciprocal structure can be seen in Fig. 26.1.3. The RX conversion gain and linearity are measured with the  $G_M$  cells loaded with  $50\Omega$ . The measured NF (with  $G_M$  cells driving  $50\Omega$  to exclude the TIA impact) at 750MHz LO is shown in Fig. 26.1.3. The passive-mixer-first circulator-RX provides high linearity, with an IIP3 of +14dBm for out-of-band signals. The high-impedance baseband approach leads to the RX achieving 2.7dB NF (Fig. 26.1.3), demonstrating the feasibility of achieving low NF with shared antenna interface functionality. The RX can achieve 0.2dB improved NF in TDD mode if switch drive settings are optimized for NF rather than TX→ANT IL.

TX SIC measurements are carried out for an LO of 770MHz with an impedance tuner providing <1.2 VSWR transformation on the ANT port that is fixed across frequency. The small-signal and large-signal TX→RX BB isolation (referred to the ANT port) are shown in Fig. 26.1.4. The ANT port VSWR and LO I/Q phase settings are optimized for +5.5dBm TX power. The symmetric hybrid-coupler approach results in wide SIC bandwidths for circulator-RX without the need for balancing networks that degrade RX NF or IL. Measurements show >20dB SIC across >400MHz BW and >30dB SIC across 160MHz BW for small-signal and large-signal TX. In the case of +5.5dBm TX, SIC > 40dB is achieved for 32MHz BW while average SIC > 40dB is achieved across 56MHz BW.

The passive-mixer-first RX architecture, accompanied by a 40dB wideband SIC, supports high linearity with respect to the TX port while only placing the typical  $50\Omega$  matching constraint on the TX. The measured RX 1dB compression due to an in-band TX signal is shown in Fig. 26.1.4 and demonstrates a blocker P1dB of +5.5dBm. The TX→RX BB SIC results in lower inter-mod products due to the TX. A two-tone TX input shows an effective small-signal IB-IIP3 of 4dBm referred to the ANT port (Fig. 26.1.4). Two-tone tests show TX→ANT IIP3 of 25dBm (Fig. 26.1.4), which is in line with expectations due to the high isolation.

The impact of varying ANT VSWR on SIC is measured using a manual impedance tuner. SIC is a function of the magnitude and angle of the load reflection coefficient with >20dB cancellation for VSWR < 1.35. The efficacy of the proposed structure in rejecting modulated TX signals is also shown in Fig. 26.1.5. A modulated multi-tone signal with a null-to-null BW of ~20MHz is applied at the TX port. Figure 26.1.5 shows the normalized baseband signal at 3.5dBm and -11.5dBm input power. While the higher power level leads to increased inter-mod products, the measured signal power at baseband demonstrates >40dB RF SIC across high power levels for the wideband signal.
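The VSWR limits quoted in the measurements map directly to a reflection-coefficient magnitude through the standard relation |Γ| = (VSWR − 1)/(VSWR + 1). The short sketch below converts the two VSWR values mentioned in the text; the function names are illustrative.

```python
import math

def vswr_to_gamma(vswr):
    """Reflection-coefficient magnitude for a given VSWR (VSWR >= 1)."""
    return (vswr - 1.0) / (vswr + 1.0)

def return_loss_db(vswr):
    """Return loss in dB; undefined for a perfect match (VSWR = 1)."""
    return -20.0 * math.log10(vswr_to_gamma(vswr))

for vswr in (1.2, 1.35):
    print(vswr, round(vswr_to_gamma(vswr), 3), round(return_loss_db(vswr), 1))
```

A VSWR of 1.35 corresponds to an antenna reflection of roughly 15% in amplitude, which gives a sense of how much mismatch the structure tolerates while holding >20dB cancellation.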

The measured performance of the proposed structure is compared to prior works in Fig. 26.1.6 while the die micrograph is shown in Fig. 26.1.7. The hybrid-circulator RX achieves significantly improved NF compared to prior antenna-interface RX while also providing wideband SIC and scalability to higher frequencies.

### Acknowledgements:

This project was funded by the DARPA ACT program and NSF #1554720.

### References:

- [1] B. van Liempd et al., "A +70dBm IIP3 Single-Ended Electrical Balance Duplexer in 0.18μm SOI CMOS," ISSCC, pp. 32-33, Feb. 2016.
- [2] N. Reiskarimian et al., "Highly-Linear Integrated Magnetic-Free Circulator-Receiver for Full-Duplex Wireless," ISSCC, pp. 316-318, Feb. 2017.
- [3] J. Zhou et al., "RX with integrated magnetic-free N-path-filter-based non-reciprocal circulator and baseband SI cancellation for FD wireless," ISSCC, pp. 178-180, Feb. 2016.
- [4] T. Dinc and H. Krishnaswamy, "A 28GHz magnetic-free non-reciprocal passive CMOS circulator based on spatio-temporal conductance modulation," ISSCC, pp. 294-295, Feb. 2017.
- [5] D. Yang et al., "A Wideband Highly Integrated and Widely Tunable Transceiver for In-Band Full-Duplex Communication," IEEE JSSC, vol. 50, no. 5, pp. 1189-1202, May 2015.
- [6] Anaren, "Hybrid Coupler 3 dB 90°," Model X3C07P1-03S Datasheet, Rev C. Accessed on July 24, 2017.
- [7] T. Zhang et al., "A 1.7-to-2.2GHz Full-Duplex Transceiver System with >50dB Self-Interference Cancellation over 42MHz Bandwidth," ISSCC, pp. 314-316, Feb. 2017.
- [8] D. Broek et al., "An In-Band FD Radio RX With a Passive Vector Modulator Downmixer for SI Cancellation," IEEE JSSC, vol. 50, no. 12, Dec. 2015.




Figure 26.1.1: The 2-port N-path structure combines signals with  $\theta = -90^\circ$  while nulling signals with  $\theta = 90^\circ$ . Embedding this structure in a  $90^\circ$  hybrid coupler results in wideband TX signal nulling at mixer capacitors.



Figure 26.1.3: Measured S-parameters showing input power match and TX-to-ANT insertion loss. Measured in-band and out-of-band IIP3 and measured NF demonstrate passive-mixer-first type linearity with low NF.



Figure 26.1.5: Measured cancellation across the ANT port VSWR. Wideband SI cancellation testing using a modulated multi-tone input shows approximately linear increase in SI power and >40dB TX-to-BB isolation (ref. to ANT).



Figure 26.1.2: The 2-port N-path capacitors serve as the high-impedance RX port in circulator RX. Noise and signal from the TX port are nulled at the RX port. Schematic of the proposed hybrid-coupler circulator-RX.



Figure 26.1.4: Measured TX-to-BB isolation (referred to the ANT port) shows wideband cancellation. Measured blocker-1dB tests show +5.5dBm TX power handling. Measured two-tone TX test shows +25dBm IIP3 for TX-to-ANT path.

| Specification                           | This Work                                                    | ISSCC 2017 [2]                          | ISSCC 2017 [7]                                   | ISSCC 2016 [3]                          | JSSC 2015 [5]                           | JSSC 2015 [8]                           |
|-----------------------------------------|--------------------------------------------------------------|-----------------------------------------|--------------------------------------------------|-----------------------------------------|-----------------------------------------|-----------------------------------------|
| Arch.                                   | Hybrid-Coupler based N-path Circulator RX                    | T-line based N-path Mixer Circulator RX | Dual Path Adapter Filter                         | T-line based N-path Mixer Circulator RX | Mixer-First TRX with Baseband Duplexing | Mixer First SI-cancelling VM-down mixer |
| Frequency                               | 0.55-0.9GHz                                                  | 0.61-0.975GHz                           | 1.7-2.2GHz                                       | 0.6-0.8GHz                              | 0.1-1.5GHz                              | 0.15-3.5GHz                             |
| Antenna Interface                       | Yes                                                          | Yes                                     | No                                               | Yes                                     | Yes                                     | No                                      |
| RX Gain (dB)                            | 15dB                                                         | 28-43dB                                 | 20-35dB                                          | 42dB                                    | 53dB                                    | 24dB max.                               |
| RX NF                                   | 2.7dB                                                        | 6.3dB                                   | 4dB                                              | 5.8-8dB                                 | 5.8dB                                   | 6.3dB                                   |
| In-band IIP3                            | -4dBm @ 15dB gain                                            | -18.4dBm at 28dB gain                   | NR                                               | -33dBm at 42dB gain                     | -38.7dBm at 53dB gain                   | +9/+19dBm (+ve cond. on/off)            |
| OOB IIP3                                | +14dBm                                                       | +15.4dBm                                | NR                                               | +19dBm                                  | +22.5dBm                                | +22dBm                                  |
| Analog SIC Domain                       | RF                                                           | RF                                      | RF + BB                                          | RF + BB                                 | BB                                      | RF                                      |
| RF/Analog SI suppression                | 40dB RF SIC across 56MHz BW; >40dB 32MHz BW; >30dB 190MHz BW | 40dB SIC across 20MHz BW <sup>(a)</sup> | >30dB RF SIC over 40MHz; >50dB SIC RF & BB 42MHz | 42dB SIC across 12MHz BW <sup>(a)</sup> | 33dB across 300kHz TX BW                | 27dB SIC across 16.25 MHz               |
| TX Port Power Handling                  | +5.5dBm                                                      | +7dBm                                   | NR                                               | -7dBm                                   | -17.3dBm                                | N/A                                     |
| TX → ANT IL                             | 2.6dB @ 550MHz<br>3.1dB @ 700MHz<br>4.7dB @ 900MHz           | 1.8dB to 3.2dB                          | N/A                                              | 1.7dB to 3.2dB                          | N/A                                     | N/A                                     |
| Effective IIP3 with respect to TX power | +4dBm small-signal                                           | -10dBm small-signal                     | NR; canceller                                    | +1dBm at 42dB gain                      | 0dBm at 43/53dB gain                    | 1.5dBm                                  |
| RX + SI Canceller Power                 | 25mW                                                         | 72mW <sup>(d)</sup>                     | 33.5mW                                           | 100mW <sup>(e)</sup>                    | 43.56mW incl. TX                        | 23.56mW                                 |
| Antenna Interface Power                 | 24mW                                                         | 36mW <sup>(g)</sup> at 0.7GHz           | N/A                                              | 59mW <sup>(f)</sup> at 0.7GHz           | RX power incl. interface                | N/A                                     |
| Technology                              | 65-nm CMOS                                                   | 65-nm CMOS                              | 40-nm CMOS                                       | 65-nm CMOS                              | 65-nm CMOS                              | 65-nm CMOS                              |

<sup>(a)</sup> additional digital SI cancellation also demonstrated; <sup>(b)</sup> linearity of canceller reported; <sup>(c)</sup> IIP3 with respect to TX at BB not reported; <sup>(d)</sup> includes on-chip phase control; <sup>(e)</sup> includes two-stage baseband amplification; <sup>(f)</sup> includes BB canceller.

Figure 26.1.6: Performance summary and comparison with recent prior-art full-duplex/shared-antenna-interface RXs.