



Figure 26.1.7: Die micrograph of the IC (occupying 1mm<sup>2</sup> in 65nm CMOS) packaged with a 90° hybrid coupler (not shown).

## 26.2 A 62-to-68GHz Linear 6Gb/s 64QAM CMOS Doherty Radiator with 27.5%/20.1% PAE at Peak/6dB-Back-off Output Power Leveraging High-Efficiency Multi-Feed Antenna-Based Active Load Modulation

Huy Thong Nguyen, Taiyun Chi, Sensen Li, Hua Wang

Georgia Institute of Technology, Atlanta, GA

Extreme throughput requirements on future mm-wave systems, e.g., 5G links, necessitate the use of spectrum-efficient modulations that often come with high peak-to-average power ratios (PAPRs). Therefore, there is an increasing need for mm-wave power amplifiers (PAs) with high power back-off (PBO) efficiency, so that wideband large-PAPR signals can be transmitted with high energy efficiency. Among various PA PBO efficiency-enhancement techniques, Doherty PAs support wideband modulations with low baseband DSP overhead, making them particularly promising candidates for high-speed mm-wave systems.

William H. Doherty, back in 1936, proposed two generic PA architectures employing either parallel or series combiners to realize active load modulation for PA PBO efficiency enhancement [1]. Unlike the parallel Doherty combiner, which intrinsically scales up the load impedance, the series Doherty combiner naturally scales down the load and is particularly appealing for high-power PAs in voltage-limited Si processes. However, mm-wave series Doherty combiners entail various practical design challenges (Fig. 26.2.1). For example, transformer series combiners face compromised performance at high mm-wave frequencies and degraded balancing due to strong coil-coil capacitive coupling, while $\lambda/4$ transmission-line (T-line) approaches are area-consuming and cannot easily support differential PAs. Existing on-chip series combiners also exhibit poor passive efficiency that substantially reduces the Doherty PBO efficiency enhancement in practice.

Several Si-based mm-wave Doherty PAs have been reported [2,3]. However, either the PA PBO efficiency increase is marginal compared to a Class-B PA, or there is significant degradation of the peak PA efficiency, both of which are mainly due to lossy Doherty combiners and imperfect main/auxiliary PA cooperation.

We propose a mm-wave Doherty radiator topology that exploits a multi-feed on-chip antenna to achieve antenna-based close-to-ideal series power combining and thus high-performance Doherty operation. Merging the series Doherty combiner with the antenna also yields a compact radiator design within a single-antenna footprint and reduced on-chip passive networks, making it especially suitable for massive MIMO. Moreover, different from spatial outphasing or spatial I/Q power combining [4,5], our radiator directly achieves its Doherty operation on the antenna before any radiation, ensuring a consistent Doherty performance and undistorted modulations over a wide field of view (FoV). In addition, GHz-bandwidth adaptive biasing is employed in the auxiliary path to enhance Doherty main/auxiliary PA cooperation.

To construct an on-antenna series power combiner, we select two symmetrical locations on a one- $\lambda$  wire loop antenna to form two ports (Fig. 26.2.1). The driving impedance of this 2-port loop antenna can be analyzed using its [Y] matrix. When port 1 is excited with a voltage  $V_e$  and port 2 is short-circuited, a standing wave current is established on the antenna. Assuming no loss for simplicity, the one- $\lambda$  total loop length and symmetry ensure that the two currents  $I_1$  and  $I_2$  are equal in both magnitude and phase. Consequently, the 2-port one- $\lambda$  loop-antenna [Y] matrix is identical to that of an ideal series combiner, so that it realizes on-antenna series power combining and simultaneously radiates out the combined power (Fig. 26.2.1). Moreover, as the power is combined directly on the antenna, it typically offers lower loss than conventional on-chip power combining structures.
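The driving impedances implied by this [Y] matrix can be checked numerically. The following is a minimal sketch, assuming the idealized lossless form in which every [Y] entry equals $1/R_L$ for a series combiner with load $R_L$; the 100Ω load value and the function names are illustrative and not taken from the design.

```python
def port_currents(v1, v2, r_load):
    """Port currents of an ideal (lossless) series combiner.

    Assumed [Y] matrix: (1 / r_load) * [[1, 1], [1, 1]], so both ports
    carry the same loop current (v1 + v2) / r_load.
    """
    y0 = 1.0 / r_load
    i1 = y0 * v1 + y0 * v2
    i2 = y0 * v1 + y0 * v2
    return i1, i2


def main_driving_impedance(v_main, v_aux, r_load):
    """Impedance seen at the main port (port 1) for given port voltages."""
    i_main, _ = port_currents(v_main, v_aux, r_load)
    return v_main / i_main


if __name__ == "__main__":
    R_LOAD = 100.0  # hypothetical load resistance in ohms
    for v_aux in (0.0, 0.5, 1.0):
        z = main_driving_impedance(1.0, v_aux, R_LOAD)
        print(f"V_aux/V_main = {v_aux:.1f} -> Z_main = {z:.1f} ohm")
```

With the auxiliary port short-circuited (zero voltage) the main port sees the full $R_L$, and with equal port voltages it sees $R_L/2$, which is the 2:1 load modulation expected from a series Doherty combiner.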

To complete the series Doherty combiner, the auxiliary path adopts a  $\lambda/4$  impedance inverting network as a capacitively loaded T-line, so that the T-line length is only 35° for size reduction. We also shape the antenna ground to increase the instantaneous bandwidth of the on-chip antenna. 3D EM simulations show that the antenna-based Doherty network achieves the desired Doherty load modulation (Fig. 26.2.2) with a total passive efficiency of 76% including antenna radiation efficiency on high resistivity SOI substrate.
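One textbook way to shorten a $\lambda/4$ inverter is to place a shunt capacitor at each end of a shorter, higher-impedance line. The sketch below is not the authors' network; it assumes this symmetric shunt-C / line / shunt-C form, a hypothetical 50Ω inverter impedance, and lossless elements, and it uses ABCD matrices to check that a 35° loaded line reproduces the ABCD matrix of an ideal 90° line.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def shunt_admittance(y):
    """ABCD matrix of a shunt admittance y."""
    return [[1, 0], [y, 1]]


def tline(z_line, theta_rad):
    """ABCD matrix of a lossless line with impedance z_line and length theta."""
    c, s = math.cos(theta_rad), math.sin(theta_rad)
    return [[c, 1j * z_line * s], [1j * s / z_line, c]]


def loaded_inverter(z0, theta_deg):
    """Shunt-C / short line / shunt-C network equivalent to a 90-degree line of z0."""
    theta = math.radians(theta_deg)
    z_line = z0 / math.sin(theta)   # raised line impedance
    b_cap = math.cos(theta) / z0    # susceptance (omega * C) of each shunt capacitor
    shunt = shunt_admittance(1j * b_cap)
    return matmul2(matmul2(shunt, tline(z_line, theta)), shunt)


if __name__ == "__main__":
    Z0 = 50.0  # hypothetical inverter impedance in ohms
    abcd = loaded_inverter(Z0, 35.0)
    print("A =", abcd[0][0], " B =", abcd[0][1])
    print("C =", abcd[1][0], " D =", abcd[1][1])
```

For an ideal quarter-wave inverter the expected result is A = D = 0, B = jZ0, and C = j/Z0; under the stated assumptions the 35° loaded line matches these at the design frequency.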

For robust wireless communication, the antenna radiation pattern and the gain of the proposed Doherty radiator must not change during the Doherty operation over the antenna FoV. Otherwise, spatially dependent AM-AM/AM-PM errors will appear to corrupt the transmitted modulations. We set up a 3D EM simulation to model the full wireless communication link, where our Doherty radiator acts as a transmitter, and two far-field receiver dipoles are placed with one in the boresight ($\varphi = 0^\circ$, $\theta = 180^\circ$) and the other in a non-boresight ($\varphi = 0^\circ$, $\theta = 225^\circ$) direction (Fig. 26.2.2). Assuming main/auxiliary PAs follow the idealistic Class-B operation, the received signals in both directions show excellent large-signal linearity with $<0.2\text{dB}$ AM-AM and $<1.5^\circ$ AM-PM variations through the Doherty operation (Fig. 26.2.2).

In the active circuit designs, we use an identical two-stage PA design for the main and auxiliary PAs, each consisting of a common-source driver and a cascode PA. Compact transformer matching is used at the input and inter-stage. We also employ two high-Q T-line inductors to resonate out the PA device output capacitance. In parallel, we design a high-speed adaptive biasing circuit in the auxiliary path to improve main/auxiliary PA cooperation. It is worth mentioning that using the adaptive biasing only at the auxiliary PA is often inadequate: even at deep PBO, the auxiliary driver output can be large enough to unintentionally turn on the auxiliary PA. In this work, we apply the adaptive biasing to both the auxiliary driver and the PA to ensure that the auxiliary path is completely turned off below 6dB PBO and is rapidly turned on above 6dB PBO. At 0dB PBO, the auxiliary PA gate bias is raised to a value similar to that of the main PA (0.3V).

The 62-to-68GHz Doherty radiator is implemented in a 45nm CMOS SOI process, occupying $1.7 \times 1.9\text{mm}^2$ including the antenna. The IC is flip-chip packaged on a Rogers CLTE-AT™ laminate to perform back-side radiation. We first characterize its continuous-wave performance using a horn antenna and a power sensor in the far field. The output power ($P_{\text{out}}$) of the Doherty radiator is the measured EIRP minus the antenna gain. A separate on-chip antenna test structure is used to measure the antenna gain. At 65GHz, the measured antenna gain is 4.5dBi. Next, the Doherty radiator EIRP is measured, based on which the Doherty radiator $P_{\text{out}}$ is obtained and summarized in Fig. 26.2.4. The desired Doherty PBO PAE enhancement and linear large-signal performance are consistently achieved across the operating frequency range. At 65GHz, the measured saturated EIRP/$P_{\text{sat}}/P_{1\text{dB}}$ are 23.9/19.4/19.2dBm in the main lobe direction. The measured total PAE at peak/6dB PBO is 27.5/20.1%, respectively, including the power consumption of the adaptive biasing circuits, demonstrating a 1.46× PAE enhancement at 6dB PBO over an idealistic Class-B PA. From 62 to 68GHz, the Doherty PAE enhancement at 6dB PBO is 1.45 to 1.53× over an idealistic Class-B PA. Next, we characterize the dynamic performance of the Doherty radiator. Figure 26.2.5 shows the 64-QAM complex-modulation tests over the antenna FoV. Without digital pre-distortion (DPD), at an average EIRP/$P_{\text{out}}$ of +19/14.5dBm, the measured EVM and ACPR are -28dB and -26.97dBc, respectively, in the boresight direction for 3Gb/s 64-QAM modulation, and consistent EVMs are observed over the antenna FoV (approx. $-60^\circ$ to $+60^\circ$), verifying its spatially consistent Doherty PBO efficiency enhancement and linear transmission.
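The 65GHz figures above can be reproduced with a few lines of arithmetic. This is a sketch under one stated assumption: the idealistic Class-B reference has the same PAE as this work at the reference power level, and its efficiency scales with output voltage amplitude, so it drops by a factor of $10^{-6/20}$ (about one half) at 6dB PBO. The function names are illustrative.

```python
def pout_dbm_from_eirp(eirp_dbm, antenna_gain_dbi):
    """Radiator output power: measured EIRP minus measured antenna gain."""
    return eirp_dbm - antenna_gain_dbi


def class_b_pae_at_backoff(pae_ref_percent, backoff_db):
    """PAE of an idealistic Class-B PA backed off from its reference point.

    Assumes efficiency is proportional to output voltage amplitude.
    """
    return pae_ref_percent * 10 ** (-backoff_db / 20.0)


def pbo_enhancement_ratio(pae_pbo_percent, pae_ref_percent, backoff_db):
    """Measured back-off PAE relative to the idealistic Class-B reference."""
    return pae_pbo_percent / class_b_pae_at_backoff(pae_ref_percent, backoff_db)


if __name__ == "__main__":
    psat = pout_dbm_from_eirp(23.9, 4.5)
    ratio = pbo_enhancement_ratio(20.1, 27.5, 6.0)
    print(f"P_sat = {psat:.1f} dBm")
    print(f"PAE enhancement at 6dB PBO = {ratio:.2f}x")
```

Using the reported 23.9dBm EIRP, 4.5dBi antenna gain, and 27.5%/20.1% PAE values, the sketch returns 19.4dBm and a ratio that rounds to 1.46, in line with the text.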

Compared with the reported 60-to-80GHz silicon PAs/transmitters [2-8] in Fig. 26.2.6, this work achieves the best PAE PBO enhancement ratio at 6dB PBO, the highest PAE at 6dB PBO, and the highest average PAE for 3Gb/s and 6Gb/s 64-QAM modulation transmission, demonstrating the PBO efficiency advantage of Doherty architecture.

### Acknowledgements:

We would like to thank GlobalFoundries for chip fabrication and members of the Georgia Tech GEMS group for their technical support.

### References:

- [1] W. Doherty, "A New High Efficiency Power Amplifier for Modulated Waves," *Proc. of IRE*, vol. 24, no. 9, pp. 1163-1182, Sept. 1936.
- [2] K. Greene et al., "A 60-GHz Dual-Vector Doherty Beamformer," *IEEE JSSC*, vol. 52, no. 5, pp. 1373-1387, May 2017.
- [3] E. Kaymaksut et al., "Transformer-Based Doherty Power Amplifiers for mm-Wave Applications in 40-nm CMOS," *IEEE TMTT*, vol. 63, no. 4, pp. 1186-1192, Apr. 2015.
- [4] C. Liang and B. Razavi, "Transmitter Linearization by Beamforming," *IEEE JSSC*, vol. 46, no. 9, pp. 1956-1969, Sept. 2011.
- [5] J. Chen et al., "A Digitally Modulated mm-Wave Cartesian Beamforming Transmitter with Quadrature Spatial Combining," *ISSCC*, pp. 232-233, Feb. 2013.
- [6] T. Chi et al., "A 60GHz On-Chip Linear Radiator with Single-Element 27.9dBm Psat and 33.1dBm Peak EIRP Using Multifeed Antenna for Direct On-Antenna Power Combining," *ISSCC*, pp. 296-297, Feb. 2017.
- [7] K. Khalaf et al., "Digitally Modulated CMOS Polar Transmitters for Highly Efficient mm-Wave Wireless Communication," *IEEE JSSC*, vol. 51, no. 7, pp. 1579-1592, July 2016.
- [8] A. Komijani and A. Hajimiri, "A Wideband 77-GHz, 17.5-dBm Fully Integrated Power Amplifier in Silicon," *IEEE JSSC*, vol. 41, no. 8, pp. 1749-1756, Aug. 2006.



Figure 26.2.1: The realization of a Doherty power amplifier with a series power combiner, proposed on-chip one- $\lambda$  wire loop antenna as series combiner, and [Y] matrix derivation of the proposed antenna-based series combiner.



Figure 26.2.2: Proposed antenna-based Doherty PA output stage, 3D EM model of the proposed Doherty-radiator output stage, simulated Doherty active load modulation, and simulated amplitude and phase response received by far field dipoles assuming Main/Auxiliary PAs follow an ideal Class-B PA.



Figure 26.2.3: Top-level schematic of the proposed Doherty radiator, schematic of the Main/Auxiliary PAs, schematic of high-speed adaptive bias circuit for the auxiliary Driver and PA, and performance of adaptive bias outputs versus input power.



Figure 26.2.4: CW measurement of the proposed Doherty radiator at 63/65/67GHz and the CW performance summary from 62 to 68GHz (the PAE enhancement ratio at 6dB PBO is compared to an idealistic Class-B PA with the same PAE at  $P_{1dB}$ ).



Figure 26.2.5: Modulation results of the Doherty radiator with 64-QAM modulation at 0.5Gsym/s at boresight direction and in various directions over the antenna Field-of-View.

| | This work | [2] Greene, ISSCC '17 | [3] Kaymaksut, TMTT '15 | [5] Chen, ISSCC '13 | [6] Chi, ISSCC '17 | [7] Khalaf, JSSC '16 | [8] Komijani, JSSC '07 |
|---|---|---|---|---|---|---|---|
| Architecture | Antenna-based Doherty | Doherty | Doherty | Spatial IQ combining | On antenna power combiner | Digital Polar Tx | Class AB |
| Frequency (GHz) | 65 | 62 | 72 | 60 | 60 | 60 | 77 |
| $V_{DD}$ (V) | 1 (Driver)<br>1.9 (PA) | 2.8 (Driver)<br>3.6 (PA) | 1.5 | 1 | 2 (Driver)<br>2 (PA) | 0.9 | 1.8 |
| $P_{sat}$ (dBm) | 19.4 | 17.5* | 21 | 9.6 | 27.9 | 10.8 | 17.5 |
| $P_{1dB}$ (dBm) | 19.2 | 17.1 | 19.2 | 9.6 | 25 | 7.4 | 14.5 |
| Peak PAE | 28.3% | 23.7% | 13.6% | 28.5% | 23.4% | 29.8% | 12.8% |
| PAE @ $P_{1dB}$ | 27.5% | 23.7% | 12.4% | 28.5% | 16.2% | 15%* | 11.6%* |
| PAE @ 6dB PBO | 20.1% | 13% | 7% | 14.25% | 6%* | 4.5%* | 3%* |
| PAE Enhancement Ratio at 6dB PBO** | 1.46 | 1.10 | 1.13 | 1 | 0.74 | 0.6 | 0.51 |
| Mod. Scheme<br>Data Rate<br>EVM<br>$P_{avg}$<br>$PAE_{avg}$ | 64-QAM<br>3Gb/s<br>-28dB EVM<br>14.5dBm<br>21.2% | N/A | 64-QAM<br>0.6Gb/s<br>-25.6dB EVM<br>+15.9dBm<br>7.2% PAE | 16-QAM<br>4.8Gb/s<br>-16.2dB EVM<br>+25.4dBm<br>+8dBm<br>16.5% PAE | 64-QAM<br>6.7Gb/s<br>-18.7dB EVM<br>+19.3dBm | 16-QAM<br>6.7Gb/s<br>-18.1dB EVM<br>+3.6dBm | N/A |
| Technology | 45nm CMOS SOI | 130nm SiGe BiCMOS | 40nm CMOS | 65nm CMOS | 45nm CMOS SOI | 40nm CMOS | 120nm SiGe BiCMOS |

\* Estimated from reported figures

\*\* Compared to an idealistic class-B PA with the same PAE at  $P_{1dB}$

Figure 26.2.6: Table of comparison with prior PAs/transmitters operating from 60 to 80GHz.



Figure 26.2.7: Die micrographs of the Doherty Radiator, the antenna test structure, and photos of the flip-chip packaged PCBs.

## 26.3 A 69-to-79GHz CMOS Multiport PA/Radiator with +35.7dBm CW EIRP and Integrated PLL

Behrooz Abiri, Ali Hajimiri

California Institute of Technology, Pasadena, CA

Low-cost mm-wave silicon integrated signal generation and processing enable many applications, such as silicon-based automotive radars for self-driving cars and wireless communications. Some challenges encountered in the commercialization of such systems are the high packaging and testing costs and the high sensitivity to antenna parameters, which can diminish the advantage of integrated silicon solutions. On-chip antennas have been proposed as a solution to reduce the packaging costs [1,2]. Link budget analysis of systems (e.g., radar) necessitates high-power (high EIRP) transmitters, while system resolution analysis suggests a higher frequency of operation for better spatial resolution. The scaling of CMOS transistors facilitates the latter requirement, but, unfortunately, the lower breakdown voltage of the transistors reduces their maximum power handling capabilities at a given radiator impedance. Several approaches have already been implemented to address this issue, each with its own shortcoming. Power-combining multiple PA outputs with passive on-chip power combiners [3] adds extra loss and reduces the overall efficiency, while spatial power combining using phased arrays [4] consumes a large die area. Power combining at the antenna [5,6] has been proposed as an approach to address these challenges. In this paper, we propose a spatial PA/radiator power combining approach with optimal PA-load design using strongly coupled antennas in close proximity. This approach utilizes techniques of power combining in free space, resulting in favorable drive-point impedance design, and uses on-chip PAs and radiators to achieve high radiated output power.

Figure 26.3.1 shows the concept of the proposed strongly coupled radiator and its associated impedance scaling. A single slot antenna has a radiation impedance of around $520\Omega$ and thus only radiates around 1mW if driven with a 1V swing. If a second strongly coupled slot in close proximity to the first one is driven in phase, the total radiated power increases fourfold; similarly, three slots radiate 9mW, and the quadratic trend holds for slots placed in such a fashion so long as the overall dimension is smaller than roughly one wavelength. The quadratic increase in radiated power is due to the simultaneous reduction of the drive-point impedance and the increased number of power sources. That is, the addition of a strongly coupled slot radiator increases the total radiated power not only by increasing the number of power sources but also by increasing the radiated power of each slot through lowering the drive-point impedance. Unlike previous spatial power-combining methods, the proposed array of slots does not significantly change the radiation pattern compared to a single-element antenna.
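The quadratic trend can be sketched numerically. The model below assumes that the 1V swing is a peak amplitude (so a single 520Ω slot radiates about 1mW) and that the drive-point impedance of each slot falls as $520\Omega/N$ for $N$ strongly coupled in-phase slots; both are simplifying assumptions, and the function name is illustrative. The model only applies while the array stays smaller than roughly one wavelength.

```python
def total_radiated_power_w(n_slots, v_peak=1.0, r_single=520.0):
    """Total radiated power of n_slots strongly coupled, in-phase slots.

    Assumes each slot sees a drive-point impedance of r_single / n_slots
    and is driven with the same peak voltage v_peak.
    """
    r_drive = r_single / n_slots
    p_per_slot = v_peak ** 2 / (2.0 * r_drive)
    return n_slots * p_per_slot


if __name__ == "__main__":
    p1 = total_radiated_power_w(1)
    for n in (1, 2, 3):
        p = total_radiated_power_w(n)
        print(f"N = {n}: {p * 1e3:.2f} mW ({p / p1:.0f}x a single slot)")
```

Under these assumptions the total power scales as $N^2$: one factor of $N$ comes from the number of sources and the other from the lower impedance seen by each source.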

We implemented a 16-element slot array of the proposed radiator in the top aluminum layer of a 65nm bulk CMOS process. Figure 26.3.2 shows the dimensions of the implemented radiator as well as the topology of the 8 PAs and their transistor sizing. Each PA is a pseudo-differential cascode stage with a 24pH shunt inductor to resonate out the parasitic capacitance of the cascode node and drives two slots by utilizing a virtual ground between them, as shown in Fig. 26.3.2. By properly designing the slot length, we are able to completely absorb the parasitic capacitance of the PA output node into the antenna structure. The antenna also provides the DC power to the PA, eliminating the need for RF chokes and improving the PA efficiency. All the PAs are driven in-phase through a differential 77GHz binary-tree clock-distribution network.

To allow for FMCW operation, a PLL with a multiplication ratio of 32 was implemented to generate a 10GHz BW chirp using a synthesized 2.156-to-2.469GHz reference signal. The block diagram of the PLL is shown in Fig. 26.3.3. The closed-loop BW of the PLL is higher than 20MHz, allowing fast chirp rates. The VCO has a tuning range of 67.9 to 80.6GHz, while the PLL locking range is 69 to 79GHz. The measured phase noise of the PLL is -96.4dBc/Hz at 1MHz offset. This is obtained by downconverting the radiated power using the 5<sup>th</sup> harmonic of a PMP MOD-WM harmonic mixer.
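The relation between the reference sweep and the chirp follows directly from the multiplication ratio. A minimal sketch, with illustrative function and constant names:

```python
PLL_MULTIPLICATION_RATIO = 32  # from the text


def pll_output_ghz(f_ref_ghz, ratio=PLL_MULTIPLICATION_RATIO):
    """Locked PLL output frequency for a given reference frequency."""
    return f_ref_ghz * ratio


def chirp_bandwidth_ghz(f_ref_start_ghz, f_ref_stop_ghz, ratio=PLL_MULTIPLICATION_RATIO):
    """Output chirp bandwidth produced by sweeping the reference."""
    return pll_output_ghz(f_ref_stop_ghz, ratio) - pll_output_ghz(f_ref_start_ghz, ratio)


if __name__ == "__main__":
    f_lo = pll_output_ghz(2.156)
    f_hi = pll_output_ghz(2.469)
    bw = chirp_bandwidth_ghz(2.156, 2.469)
    print(f"Output range: {f_lo:.3f} to {f_hi:.3f} GHz, chirp BW = {bw:.3f} GHz")
```

The 2.156-to-2.469GHz reference maps to roughly 69 to 79GHz at the output, which matches the stated locking range and the 10GHz chirp bandwidth.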

Electromagnetic simulations indicate that the radiation efficiency of the radiator can be improved from 46% to 52% by using a low-cost 6.35mm diameter alumina hemispherical lens, which increases the directivity of the antenna from 5 to 10dBi and also improves heat dissipation. Full 3D pattern measurements of the radiator with the lens were performed over the full frequency span from which the directivity of the radiator was obtained. Figure 26.3.4 shows the highlights of these measurements.

The EIRP of the radiator at broadside was measured using Agilent V8486A and W8486A power sensors with 15dBi WR-15 and 25dBi WR-12 standard horns, respectively. The calculated gain of the horn antennas vs. frequency [7] matches the datasheet at the provided frequency points. Total radiated power (TRP) was calculated from the measured EIRP and the measured radiator directivity. The PA efficiency and total output power can be calculated from the TRP and the simulated antenna efficiency. Figure 26.3.5 shows these measurement results. The radiator achieves a peak EIRP of +35.7dBm at 71.25GHz when running continuously from a $V_{DD}=1.8\text{V}$ supply while drawing 1006mA of current. The measurements correspond to a maximum TRP of +24.4dBm with a peak directivity of 12.2dBi. The combined PA peak output power is +27.4dBm with a drain efficiency of 30.8%. Simulations indicate that the output power varies by less than 20% across the PAs due to the presence of edge effects in the structure.
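The chain from TRP to PA output power and drain efficiency can be sketched as follows. This is a rough consistency check rather than the authors' procedure: it assumes a single antenna efficiency of 52% (the simulated value with the lens) applied to the maximum TRP, and it takes the DC power as the product of the stated supply voltage and current. The helper names are illustrative.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert power from dBm to milliwatts."""
    return 10 ** (p_dbm / 10.0)


def mw_to_dbm(p_mw):
    """Convert power from milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)


def pa_output_dbm_from_trp(trp_dbm, antenna_efficiency):
    """PA output power implied by the TRP and an assumed antenna efficiency."""
    return mw_to_dbm(dbm_to_mw(trp_dbm) / antenna_efficiency)


def drain_efficiency(p_out_dbm, vdd_v, idd_a):
    """Drain efficiency: RF output power divided by DC supply power."""
    p_dc_mw = vdd_v * idd_a * 1e3
    return dbm_to_mw(p_out_dbm) / p_dc_mw


if __name__ == "__main__":
    print(f"PA DC power: {1.8 * 1.006:.2f} W")
    print(f"PA output from TRP: {pa_output_dbm_from_trp(24.4, 0.52):.2f} dBm")
    print(f"Drain efficiency at +27.4dBm: {drain_efficiency(27.4, 1.8, 1.006):.1%}")
```

With these simplifications the sketch gives about 1.81W of PA DC power, a PA output within a few tenths of a dB of the reported +27.4dBm, and a drain efficiency of roughly 30%, close to the reported 30.8%.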

A comparison table provided in Fig. 26.3.6 summarizes the results of this work and compares them against other works.

### Acknowledgment:

The authors would like to thank Dr. Florian Bohn for helpful discussions and Dr. Amirreza Safaripour for assistance in testing. This work was supported by Caltech Innovation Initiative (CI<sup>2</sup>) research grant.

### References:

- [1] S. Bowers et al., "An integrated traveling-wave slot radiator," *IEEE RFIC*, pp. 369-372, June 2014.
- [2] A. Babakhani et al., "A 77GHz 4-Element Phased Array Receiver with On-Chip Dipole Antennas in Silicon," *ISSCC*, pp. 629-638, Feb. 2006.
- [3] C. Chappidi and K. Sengupta, "A Frequency-Reconfigurable mm-Wave Power Amplifier with Active-Impedance Synthesis in an Asymmetrical Non-Isolated Combiner," *ISSCC*, pp. 344-345, Feb. 2016.
- [4] W. Shin et al., "A 108–114 GHz 4x4 Wafer-Scale Phased Array Transmitter with High-Efficiency On-Chip Antennas," *IEEE JSSC*, vol. 48, no. 9, pp. 2041-2055, Sept. 2013.
- [5] A. Natarajan et al., "A 77GHz Phased-Array Transmitter with Local LO-Path Phase-Shifting in Silicon," *ISSCC*, pp. 639-648, Feb. 2006.
- [6] T. Chi et al., "A 60GHz On-Chip Linear Radiator with Single-Element 27.9dBm Psat and 33.1dBm Peak EIRP Using Multifeed Antenna for Direct On-Antenna Power Combining," *ISSCC*, pp. 296-297, Feb. 2017.
- [7] RF Wireless World, "Horn Antenna Calculator," Accessed on Aug. 10, 2017, <<http://www.rfwireless-world.com/calculators/Horn-Antenna-Calculator.html>>
- [8] B. Sadhu et al., "A 60GHz Packaged Switched Beam 32nm CMOS TRX with Broad Spatial Coverage, 17.1dBm Peak EIRP, 6.1dB NF at < 250mW," *IEEE RFIC*, pp. 342-343, May 2016.
- [9] P. N. Chen et al., "A 94GHz 3D-Image Radar Engine with 4TX/4RX Beamforming Scan Technique in 65nm CMOS," *ISSCC*, pp. 146-147, Feb. 2013.



Figure 26.3.1: Impedance- and radiated-power scaling of tightly coupled slot antennas.



Figure 26.3.2: Eight differential cascode PAs drive 16 slots. The radiator is designed to absorb the parasitic capacitance of PA output.



Figure 26.3.3: PLL block diagram and measurement results.



Figure 26.3.4: Measured and simulated radiation pattern of the radiator.



Figure 26.3.5: Radiator and PA performance.

| Process                                         | This Work<br>65nm<br>CMOS        | [1]<br>32nm<br>SOI | [4]<br>180nm<br>SiGe | [6]<br>45nm<br>SOI | [8]<br>32nm<br>SOI | [9]<br>65nm<br>CMOS |
|-------------------------------------------------|----------------------------------|--------------------|----------------------|--------------------|--------------------|---------------------|
| Frequency(GHz)                                  | 69-79                            | 134.5              | 108-114              | 53-63              | 58.3-60.5          | 88-99               |
| EIRP(dBm)                                       | 35.7                             | 6.0                | 24.5                 | 33.1               | 17.1               | 35                  |
| On Chip Antenna<br>Radiator Directivity<br>(dB) | 12.2                             | 7.1                | 17                   | 6.9                | 32.5               | 36<br>(Disk Ant.)   |
| Tot. Rad. Power (dBm)                           | 24.4                             | -1.3               | 7.5                  | 26.2               | N/A                | 0                   |
| Antenna Efficiency                              | 52%                              | 39%                | 45%                  | N/A                | N/A                |                     |
| PA P <sub>sat</sub> (dBm)                       | 27.4                             | N/A                | 11                   | 27.9               | N/A                | N/A                 |
| PA Efficiency                                   | 30.8%<br>(Drain Eff.)            | N/A                | N/A                  | 23.4%<br>(PAE)     | N/A                | N/A                 |
| Phase Noise (dBc/Hz)                            | -96.4@1MHz                       | N/A                | N/A                  | N/A                | -113<br>@10MHz     | -85.6<br>@1MHz      |
| Total DC Power(W)                               | PA+PLL/Distrib<br>1.81+0.55=2.36 | 0.17               | 3.4                  | N/A                | 0.23               | 0.6                 |
| Chip Area(mm <sup>2</sup> )                     | 2.9                              | 1.2                | 39                   | 10.5               | 9<br>(RX+TX)       | 4.32<br>(RX+TX)     |

Figure 26.3.6: Comparison with other works.



Figure 26.3.7: Die micrograph.

## 26.4 A 28GHz 41%-PAE Linear CMOS Power Amplifier Using a Transformer-Based AM-PM Distortion-Correction Technique for 5G Phased Arrays

Sheikh Nijam Ali<sup>1</sup>, Pawan Agarwal<sup>2</sup>, Joe Baylon<sup>1</sup>, Srinivasan Gopal<sup>1</sup>, Luke Renaud<sup>1</sup>, Deukhyoun Heo<sup>1</sup>

<sup>1</sup>Washington State University, Pullman, WA, <sup>2</sup>MaxLinear, San Diego, CA

To fulfill the insatiable demand for high data-rates, the millimeter-wave (mmW) 5G communication standard will extensively use high-order complex-modulation schemes (e.g., QAM) with high peak-to-average power ratios (PAPRs) and large RF bandwidths. High-efficiency integrated CMOS power amplifiers (PAs) are highly desirable for portable devices for improved battery life, reduced form factor, and low cost. To meet simultaneous requirements for high efficiency and reasonable linearity, PAs intended for use with complex modulation are often operated in Class-AB mode [1,2]. For small input amplitudes in Class-AB, the device is turned on and has an input capacitance ($C_{gs}$) of $\sim(2/3)WLC_{ox}$. As the input amplitude becomes large, the device turns off for part of the RF cycle, thus reducing its effective input capacitance. This input capacitance-modulation effect creates an input-amplitude-dependent phase shift in Class-AB mode, resulting in amplitude-modulation to phase-modulation (AM-PM) distortion [2]. Consequently, it degrades linearity metrics (e.g., error vector magnitude (EVM), adjacent channel power ratio (ACPR)) in complex-modulation systems. External linearization techniques (e.g., digital pre-distortion) are often used in transmitters to meet linearity requirements, but they are complex in nature and expensive to implement. Apart from these, a few works at low-GHz frequencies have been reported to improve the PA's intrinsic linearity using varactor- or PMOS-based AM-PM correction methods [1,2]. These works reduce the design overhead of external linearization systems; however, the inclusion of an additional capacitive element to correct AM-PM degrades gain and efficiency, which is not optimal for mmW frequencies [1,2].

To address linearity, without degrading performance or introducing dramatic design complexity, we propose a 2-stage linear PA architecture where a compensation transformer is integrated into the amplifier chain to correct AM-PM distortion while maintaining high power efficiency. Figure 26.4.1 shows the conceptual architecture and waveforms of the proposed technique. The 1<sup>st</sup> and 2<sup>nd</sup> amplifiers are driver (DA) and power (PA) stage, respectively, and both are biased at Class-AB. A harmonically tuned, continuous Class-F load network is used in the 2<sup>nd</sup>-stage for high efficiency [3]. The transformer ( $T_c$ ) acts as an analog pre-distortion network, which is used to compensate for the AM-PM phase shift from the 2-stage amplifier. The proposed  $T_c$  samples the RF signal from the input of the DA and generates a nonlinear phase response of  $\theta_c$ . The magnitude response of the  $\theta_c$  is designed to be larger than the inherent phase response of the DA,  $\theta_{DA}$  (i.e.  $|\theta_c| > |\theta_{DA}|$ ). The 2<sup>nd</sup> stage has a phase response of  $\theta_{PA}$ , and the design conditions (such as device size, bias level etc.) of this stage are set such that  $\theta_{PA} = -\theta_{SI}$ , where  $\theta_{SI} = \theta_c + \theta_{DA}$ . Thus, the net AM-PM distortion is reduced.

Figure 26.4.2 shows the complete schematic of the proposed architecture. The primary coil of the transformer is connected in the input signal path, while the secondary coil of the transformer is connected to an NMOS switch ($M_{sw}$). The gate terminal of $M_{sw}$ taps into the gate of the DA's device ($M_1$), while the drain and source nodes are connected to a large resistor for a well-defined dc bias. As the input power ($P_{in}$) rises in magnitude (large-signal), the net impedance of the $T_c$ primary coil follows an inverse characteristic in comparison to the net impedance created by $C_{gs1}$ in the DA. By appropriately selecting the size of $T_c$ and thereby controlling the net impedance change across $P_{in}$, a desired phase shift of $\theta_{SI}$ can be generated at the output of the DA. The layout and the equivalent circuit of the transformer for $M_{sw}$ on and off are shown in Fig. 26.4.2. If $M_{sw}$ is off, there is no current flow in the secondary coil of $T_c$; therefore, the net inductance in the primary coil remains unchanged (i.e. $L_{eq,off} = L_{cp}$). Conversely, when $M_{sw}$ is on, the secondary coil develops an opposing net current of $I_{cs}$, which reduces the net inductance in the primary coil to a lower value, $L_{eq,on} = L_{cp}(1-k^2)$. The net impedance waveform for a large signal is shown in Fig. 26.4.2 for both $T_c$ and $C_{gs1}$ across $P_{in}$.
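The two limiting inductances can be recovered from the standard coupled-inductor input impedance. The sketch below models $M_{sw}$ as a plain resistance terminating the secondary coil (0Ω when fully on, very large when off); this switch model and all component values (100pH coils, $k = 0.5$) are hypothetical and chosen only for illustration.

```python
import math


def primary_impedance(freq_hz, l_p, l_s, k, r_sw):
    """Input impedance of the primary coil with the secondary closed by r_sw.

    Uses Z_in = j*w*L_p + (w*M)^2 / (r_sw + j*w*L_s), M = k*sqrt(L_p*L_s).
    """
    w = 2.0 * math.pi * freq_hz
    m = k * math.sqrt(l_p * l_s)
    return 1j * w * l_p + (w * m) ** 2 / (r_sw + 1j * w * l_s)


def effective_inductance(freq_hz, l_p, l_s, k, r_sw):
    """Equivalent series inductance seen at the primary coil."""
    w = 2.0 * math.pi * freq_hz
    return primary_impedance(freq_hz, l_p, l_s, k, r_sw).imag / w


if __name__ == "__main__":
    F = 28e9             # operating frequency from the text
    L_CP = L_CS = 100e-12  # hypothetical coil inductances
    K = 0.5              # hypothetical coupling coefficient
    l_on = effective_inductance(F, L_CP, L_CS, K, r_sw=0.0)
    l_off = effective_inductance(F, L_CP, L_CS, K, r_sw=1e9)
    print(f"L_eq,on  = {l_on * 1e12:.1f} pH")
    print(f"L_eq,off = {l_off * 1e12:.1f} pH")
```

In the shorted limit the result equals $L_{cp}(1-k^2)$, and in the open limit it returns to $L_{cp}$, consistent with the two expressions in the text. Intermediate switch resistances would give values in between, along with some added loss.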

The proposed AM-PM correction technique offers three key benefits for enhancing PA large-signal performance. First, the control signal of the transformer is directly tapped from the input of the DA's gate terminal, thus there is no need to implement any additional control circuitry. Since the tapped signal is in the RF domain, the correction process is instantaneous and synchronized with the DA's large-signal behavior, unlike the phase adjusting method in [2] limited to baseband operation. Second, designers can allow lower quiescent bias current in the DA, thus achieving higher efficiency in contrast to a typically designed low-efficiency Class-A DA. Finally, due to the inductive linearization, the net capacitive impedance at the input of the DA reduces; hence, high gain and efficiency can be achieved in the DA compared to other capacitive-based linearization methods as in [1].

To demonstrate the proposed technique, a PA prototype is fabricated in a 65nm CMOS process. Deeply scaled CMOS devices operating at moderate-to-high $P_{o,sat}$ levels (e.g., $>10$dBm) pose stability concerns due to an increased Miller effect from a large gate-drain capacitance ($C_{gd}$). Hence, PAs often need to operate at compromised performance levels. To cope with this adverse effect, we integrate a tunable gate-drain transformer feedback neutralization network ($T_{N1}$ and $T_{N2}$), as implemented in [4], in both stages using a switched-substrate-shield layout (SSL) technique [5]. The continuous Class-F output matching network is designed as a multi-order tuned network with 3<sup>rd</sup>-order harmonic matching [3]. Furthermore, a tunable inductor ($L_{int}$) using the SSL technique is integrated into the inter-stage matching network. These tunable components ($T_{N1}$, $T_{N2}$, and $L_{int}$) provide flexibility to compensate for PVT variations and mismatch, ensuring high performance.

The AM-PM phase distortion of the 1<sup>st</sup> stage (DA), the 2<sup>nd</sup> stage (PA), and the complete 2-stage amplifier is shown in Fig. 26.4.3. Less than 0.7° measured phase distortion is achieved at $P_{1dB}$ for the 2-stage amplifier at 28GHz. The distortion is about 1.3° near $P_{sat}$ at 28GHz, enabling amplification of large-PAPR signals such as 64/256-QAM with low EVM. The PA achieves <1° measured phase distortion at $P_{1dB}$ across 27 to 31GHz. The large-signal performance for 1-tone signals is presented in Fig. 26.4.4. A PAE<sub>sat</sub> of 41% is achieved at 28GHz for a $P_{o,sat}$ of 15.6dBm. The PAE<sub>sat</sub> varies between 38 and 41% from 26 to 29GHz while maintaining $P_{sat}>15$dBm. The PA is tested under 64/256/512-QAM signals at 340/50/20MSym/s, with measurement results summarized in Fig. 26.4.5. Owing to the low AM-PM distortion from $P_{1dB}$ to $P_{sat}$, the PA shows high average power efficiency for high-order QAM signals without any external phase pre-distortion, while maintaining excellent EVM and ACPR. The proposed technique substantially improves the PA large-signal performance while reducing the complexity and implementation cost compared with traditional approaches.

Figure 26.4.6 summarizes recently reported silicon PAs intended for 5G band. At 28GHz, the proposed linear PA amplifies a 340MSym/s 64-QAM signal with -26.4dB EVM and -30dBc ACPR while achieving PAE of 18.2% at +9.8dBm of  $P_{o,avg}$ . Figure 26.4.7 shows the die micrograph of the PA with an active area of only 0.24mm<sup>2</sup>.
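For readers who prefer EVM as a percentage, the usual conversion is EVM(%) = 100·10^(EVM(dB)/20). The snippet below is a small sketch applying it to the 64-QAM figure quoted above; the function name `evm_db_to_percent` is our own.

```python
def evm_db_to_percent(evm_db):
    """Convert an EVM value in dB to an RMS percentage."""
    return 100.0 * 10.0 ** (evm_db / 20.0)


# 64-QAM result reported at 28GHz and +9.8dBm average output power.
evm_64qam_db = -26.4
print(f"{evm_64qam_db} dB EVM = {evm_db_to_percent(evm_64qam_db):.2f} %")
```

A value of -26.4dB corresponds to an RMS error of a little under 5% of the reference constellation magnitude.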

### Acknowledgements:

This work was supported in part by the U.S. NSF under Grant CNS-1705026, CNS-1564014, the Joint Center for Aerospace Technology Innovation (JCATI), and the NSF Center for Design of Analog-Digital Integrated Circuits (CDADIC). We also thank Keysight Technologies for measurement support.

### References:

- [1] C. Wang et al., "A Capacitance-Compensation Technique for Improved Linearity in CMOS Class-AB Power Amplifiers," *IEEE JSSC*, vol. 39, no. 11, pp. 1927-1937, Nov. 2004.
- [2] Y. Palaskas et al., "A 5-GHz 20-dBm Power Amplifier with Digitally Assisted AM-PM Correction in a 90-nm CMOS Process," *IEEE JSSC*, vol. 41, no. 8, pp. 1757-1763, Aug. 2006.
- [3] S.N. Ali et al., "A 42–46.4% PAE Continuous Class-F Power Amplifier with  $C_{gd}$  Neutralization at 26–34 GHz in 65 nm CMOS for 5G Applications," *IEEE RFIC*, pp. 212-215, June 2017.
- [4] S.N. Ali et al., "Reconfigurable High Efficiency Power Amplifier with Tunable Coupling Coefficient Based Transformer for 5G Applications," *IEEE IMS*, pp. 1177-1180, June 2017.
- [5] P. Agarwal et al., "Switched Substrate-Shield-Based Low-Loss CMOS Inductors for Wide Tuning Range VCOs," *IEEE TMTT*, vol. 65, no. 8, pp. 2964-2976, Aug. 2017.
- [6] S. Shakib et al., "A Highly Efficient and Linear Power Amplifier for 28-GHz 5G Phased Array Radios in 28-nm CMOS," *IEEE JSSC*, vol. 51, no. 12, pp. 3020-3036, Dec. 2016.
- [7] S. Hu et al., "A 28GHz/37GHz/39GHz multiband linear Doherty power amplifier for 5G massive MIMO applications," *ISSCC*, pp. 32-33, Feb. 2017.
- [8] B. Park et al., "Highly Linear mm-Wave CMOS Power Amplifier," *IEEE TMTT*, vol. 64, no. 12, pp. 4535-4544, Dec. 2016.
- [9] M. Vigilante and P. Reynaert, "A 29-to-57GHz AM-PM compensated class-AB power amplifier for 5G phased arrays in 0.9V 28nm bulk CMOS," *IEEE RFIC*, pp. 116-119, June 2017.



Figure 26.4.1: Architecture of the proposed AM-PM-distortion correction technique for mmW CMOS PA.

Figure 26.4.2: Schematic of the proposed 2-stage linear PA network incorporating a transformer-based AM-PM-distortion correction network.

Figure 26.4.3: Results of  $C_{gs}$ ,  $L_{eq}$ , AM-PM, and small-signal S-parameters.

Figure 26.4.4: Measurement results of large-signal and linearity metrics.

Figure 26.4.5: Measurement results of demodulated signals at 28GHz.

| Ref.                                 | This Work                                                                                                                                                                       | JSSC'16 [6]                                         | ISSCC'17 [7]                                               | TMTT'16 [8]                                              | RFIC'17 [9]                                       |
|--------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------|------------------------------------------------------------|----------------------------------------------------------|---------------------------------------------------|
| Tech.                                | 65nm CMOS                                                                                                                                                                       | 28nm CMOS                                           | 130nm SiGe                                                 | 28 nm CMOS                                               | 28nm CMOS                                         |
| Freq. (GHz)                          | 28                                                                                                                                                                              | 30                                                  | 28                                                         | 28                                                       | 34                                                |
| V <sub>DD</sub> (V)                  | 1.1                                                                                                                                                                             | 1.0                                                 | 1.5                                                        | 1.1                                                      | 0.9                                               |
| Gain (dB)                            | 15.8                                                                                                                                                                            | 15.7                                                | 18.2                                                       | 10                                                       | 20.8                                              |
| P <sub>o,1dB</sub> (dBm)             | 14                                                                                                                                                                              | 13.2                                                | 15.2                                                       | 14                                                       | 13.4                                              |
| P <sub>o,sat</sub> (dBm)             | 15.6                                                                                                                                                                            | 14                                                  | 16.8                                                       | 14.8                                                     | 16.6                                              |
| PAE <sub>1dB</sub> (%)               | 34.7                                                                                                                                                                            | 34.3                                                | 19.5                                                       | 35.2                                                     | 12.6                                              |
| PAE <sub>max</sub> (%)               | 41                                                                                                                                                                              | 35.5                                                | 20.3                                                       | 36.5                                                     | 24.2                                              |
| AM-PM (deg.) @ P <sub>1dB</sub>      | 0.7                                                                                                                                                                             | NA*                                                 | NA                                                         | NA                                                       | 1.1*                                              |
| Modulated Signal Measurement Results | 64-QAM 340MSym/s +9.8dBm -26.4dB EVM -30dBc ACPR 18.2%PAE<br>256-QAM 50MSym/s +9.4dBm -7.8dB EVM -28dBc ACPR 16.3%PAE<br>512-QAM 20MSym/s +9.0dBm -11.2dB EVM -28dBc ACPR 9%PAE | 64-QAM 250MSym/s +4.2dBm -25dB EVM -28.4dBc ACPR NA | 64-QAM 500MSym/s +9.2dBm -27dB EVM -28.4dBc ACPR 16.5% PAE | 64-QAM 80MSym/s +6.8dBm -27.4dB EVM -34dBc ACPR 5.8% PAE | 64-QAM 337MSym/s +10.1dBm -25dB EVM -32.1dBc ACPR |
| Active Area (mm <sup>2</sup> )       | 0.24                                                                                                                                                                            | 0.16                                                | 1.76**                                                     | 0.28                                                     | 0.16                                              |
| Topology                             | Transformer based AM-PM correction                                                                                                                                              | Inductive source degeneration                       | Doherty                                                    | 2 <sup>nd</sup> harmonic short, common-source            | PMOS varactor based AM-PM compensation            |

\*Including pads (total area). \*Not available. \*\*Estimated from graph

Figure 26.4.6: Comparison with mmW silicon PAs.



Figure 26.4.7: Die micrograph in a 65nm CMOS technology.

## 26.5 A Compact Dual-Band Digital Doherty Power Amplifier Using Parallel-Combining Transformer for Cellular NB-IoT Applications

Yun Yin, Liang Xiong, Yiting Zhu, Bowen Chen, Hao Min, Hongtao Xu

Fudan University, Shanghai, China

Narrowband Internet-of-Things (NB-IoT) is a newly developed 3GPP protocol optimized for low-power wide-area IoT applications and is evolving toward future fifth-generation (5G) mobile communication. It specifies at least 23dBm maximum output power for long-range communication and a stringent emission mask compatible with guard-band or in-band scenarios, and it supports multiple operation bands from 699 to 915MHz (LB) and from 1710 to 1980MHz (HB). For cost reduction, longer battery life, and fast time to market, on-chip integration of high-power, high-efficiency power amplifiers (PAs) is in great demand. To benefit from advanced CMOS technology, the digital polar transmitter has become a very attractive architecture for NB-IoT applications [1]. To support dual bands for user flexibility, the traditional solution is to implement two separately optimized PAs [2], which requires extra design effort and increases die area. An ultra-compact single-transformer-based parallel power combiner proposed in [3] provides optimum load transformation in the two operation bands. Moreover, to support higher throughput and achieve better spectral efficiency, high peak-to-average-power-ratio (PAPR) multi-subcarrier modulation is adopted in NB-IoT, which requires the PA to be efficient not only at peak power but also at power back-off (PBO) to extend battery life. Efficiency-boosting techniques for digital Doherty PAs have been shown in [4-6], but two transformers are needed in the passive network. In this work, a high-power digital Doherty PA for NB-IoT applications is proposed; it introduces a parallel-combining-transformer (PCT) power combiner for dual-band coverage, back-off efficiency enhancement, and ultra-compact implementation.

Figure 26.5.1 illustrates the operation of the proposed Doherty PA with a PCT combiner. The 3-coil current-mode combiner acts as a 4-way power combiner, where the two primary coils are driven by two differential PA pairs to collect their output currents and provide optimum load transformation. The transformer combiner is configured as shown in Fig. 26.5.1 to keep both PA pairs quasi-differential and close to each other, which is critical to minimize differential mismatch and to reduce effective ground parasitics. At peak power, both PAs are fully switched on and the currents flow through the two primary coils in-phase at full amplitude. Here, the single-ended impedance seen by each PA is $50\Omega$. If both PAs are controlled synchronously, the load impedance of each PA stays the same as the output power is gradually backed off from peak. In Class-B operation at 6dB PBO, the output power drops to one quarter while the DC power consumption is only reduced by half, so the efficiency degrades by 50%. In this work, each PA pair is controlled independently to perform as a Doherty PA. As the output power decreases, the output power from PA2 is gradually reduced, and the single-ended impedance seen by PA1 increases until it reaches $100\Omega$. An efficiency peak at 6dB PBO is then achieved when PA2 is fully switched off, thereby enhancing the average efficiency.
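The back-off behavior described above can be illustrated with the textbook efficiency expressions for an ideal Class-B PA and an ideal symmetric two-way Doherty PA built from Class-B cells. This is only a sketch of the general trend: the PA in this work uses switched-capacitor Class-D cells, so the absolute numbers do not apply to it, and the function names are our own.

```python
import math


def backoff_to_voltage(pbo_db):
    """Normalized output voltage amplitude (1.0 at peak) for a back-off in dB."""
    return 10.0 ** (-pbo_db / 20.0)


def class_b_efficiency(v):
    """Ideal Class-B efficiency: output power scales as v^2, DC power as v."""
    return (math.pi / 4.0) * v


def doherty_efficiency(v):
    """Ideal symmetric two-way Doherty efficiency with Class-B cells."""
    if v <= 0.5:
        # Only the main PA is active and it sees twice the load impedance.
        return (math.pi / 2.0) * v
    # Both PAs are active; the main PA stays at full voltage swing.
    return (math.pi / 2.0) * v * v / (3.0 * v - 1.0)


for pbo_db in (0.0, 3.0, 6.0, 9.0):
    v = backoff_to_voltage(pbo_db)
    print(f"{pbo_db:4.1f} dB PBO: Class-B {100 * class_b_efficiency(v):5.1f} %, "
          f"Doherty {100 * doherty_efficiency(v):5.1f} %")
```

In this idealized model the Class-B efficiency at 6dB PBO is about half of its peak value, whereas the Doherty curve returns to its peak value at the point where the auxiliary PA is fully off.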

Figure 26.5.2 shows the block diagram of the proposed digital Doherty PA. A switched-capacitor digital PA is employed for its good linearity and efficiency. To meet the stringent NB-IoT emission mask, a total resolution of 10 bits is adopted, where PA1 and PA2 are identical 9b sub-PAs. To minimize layout mismatch and achieve better resolution, each sub-PA is constructed using a hybrid unary and binary array. Each sub-PA is divided into 16 groups by the 4 MSBs, where each group consists of 7 thermometer-coded cells and 2b LSB binary-coded cells. A cascode inverter Class-D topology is employed in the unit PA cell to distribute the $2 \times V_{DD}$ voltage stress among 4 transistors and provide high output power. The single-ended PM signal is converted by an S-to-D block into differential signals, which are further level-shifted to two voltage domains. In the off-state, the differential outputs of the unit PA are both connected to ground to avoid different fluctuations between supply and ground. In addition, metal-oxide-metal (MOM) switched capacitors formed by middle metal layers are used to reduce parasitic capacitance. In the floorplan, each sub-PA is arranged in order of groups, and “snake” traverse movements are performed among groups to improve the differential nonlinearity (DNL).
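The hybrid unary/binary segmentation can be pictured as a split of the 9b sub-PA code into three fields. The sketch below assumes one plausible bit assignment (4 MSBs select the group, the next 3 bits drive the 7 thermometer-coded cells, and the 2 LSBs drive the binary-coded cells); the text states only the field sizes, so the exact mapping and the function name `split_sub_pa_code` are assumptions.

```python
def split_sub_pa_code(code):
    """Split a 9-bit amplitude code into (group, thermometer bits, binary bits).

    Assumed layout: [8:5] group select, [4:2] unary count, [1:0] binary LSBs.
    """
    if not 0 <= code < 512:
        raise ValueError("sub-PA code must fit in 9 bits")
    group = code >> 5                 # 4 MSBs -> one of 16 groups
    unary_count = (code >> 2) & 0x7   # 3 bits -> 0..7 thermometer cells on
    binary = code & 0x3               # 2 LSBs -> binary-weighted cells
    thermometer = [1 if i < unary_count else 0 for i in range(7)]
    return group, thermometer, binary


group, thermometer, binary = split_sub_pa_code(0b101011001)
print(group, thermometer, binary)
```

Thermometer coding changes only one cell per step within a group, which is one common reason for preferring it over pure binary weighting when monotonicity and DNL matter.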

Figure 26.5.3 shows the implementation of the dual-band passive matching network. The 3-coil parallel-combining transformer is implemented within a compact single-transformer footprint. PA1 and PA2 are placed on the same side of the transformer to maintain good differential symmetry. The output currents from PA1 and PA2 flow in parallel directions in the two primary coils, which achieves magnetic enhancement and increases the effective primary inductance, thus contributing to further size reduction and lower loss. This is important for on-chip matching design at sub-GHz. The passive matching network, consisting of the PA device parasitic capacitors, switched capacitors ($C_0$), a 3-coil transformer, and 3 capacitors ($C_1$ and $C_2$), transforms the optimum load for PA1 and PA2 and realizes two power peaks over LB and HB. The total passive loss from the sub-PA to the $50\Omega$ load is less than 1.6dB over a wide frequency range. The balun function is implemented on-chip with a single-ended output pad. In this work, a total of 1.2nF of decoupling capacitance is integrated to attenuate supply and ground ringing while improving the linearity and noise performance.

The proposed dual-band digital Doherty PA is implemented in a 55nm CMOS process (Fig. 26.5.7) and occupies an area of $1.26 \times 0.88\text{mm}^2$ including all decoupling capacitors and ESD I/O pads. The PA is packaged in a QFN package and surface mounted on an evaluation PCB with a single-ended LO input and an RF output. The chip operates with dual power supplies of 2.4V and 1.2V, and the power consumption of all IO buffers, LO distribution drivers, logic blocks, and PAs is included in the PAE calculation. The measured dual-band continuous-wave (CW) results are shown in Fig. 26.5.4, where the LB achieves 28.9dBm peak power with 36.8% peak PAE, and the HB obtains 27.0dBm peak power with 25.4% peak PAE. Relatively flat output power is achieved over LB and HB, with maximum power deviations of 0.3dB and 0.2dB within the respective frequency bands. Owing to the Doherty load modulation, PAEs of 29.9% and 16.8% are achieved at 6dB PBO for 850MHz and 1.7GHz, respectively, showing the average efficiency enhancement.

In modulation tests, memoryless digital pre-distortion look-up tables are used to linearize the PA. With 12-subcarrier NB-IoT signals, the PA output spectrum meets the stringent NB-IoT emission-mask requirements across both the LB and HB, as demonstrated in Fig. 26.5.5. With -21.6dB EVM, the PA achieves a $P_{avg}$ of 24.4dBm with an average PAE of 29.5% at 850MHz, and a $P_{avg}$ of 23.0dBm with an average PAE of 17.9% at 1.7GHz. Moreover, this PA can be applied to wideband communication, such as WLAN and LTE. With 20MHz WLAN signals, the PA obtains 22.9dBm $P_{avg}$, 26.1% average PAE, and -25.3dB EVM for 64-QAM, and 20.8dBm $P_{avg}$, 22.7% average PAE, and -30.5dB EVM for 256-QAM. The measured performance of the proposed PA is summarized in Fig. 26.5.6 and compared with prior works. This work delivers a close-to-Watt-level peak output power and the best average PAE with on-chip matching at sub-GHz among the results in Fig. 26.5.6. By using a parallel-combining-transformer power combiner, dual-band frequency coverage, high output power, and back-off efficiency enhancement are simultaneously achieved with the smallest footprint, making the design well suited to low-cost NB-IoT applications.
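To put the dBm figures quoted above on a linear scale, the conversion P(mW) = 10^(P(dBm)/10) can be applied directly. The short sketch below uses only the peak and average NB-IoT powers reported in the text; the helper name `dbm_to_mw` is our own.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


# Peak (CW) and average (NB-IoT) output powers reported for LB and HB.
reported_dbm = {"LB peak": 28.9, "HB peak": 27.0, "LB avg": 24.4, "HB avg": 23.0}
for label, p_dbm in reported_dbm.items():
    print(f"{label}: {p_dbm} dBm = {dbm_to_mw(p_dbm):.0f} mW")
```

The LB peak of 28.9dBm corresponds to roughly 0.78W, which is the sense in which the peak output power is close to the Watt level.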

### Acknowledgements:

The authors would like to thank the State Key Laboratory of ASIC and System at Fudan University for measurement support.

### References:

- [1] Z. Song et al., “A Low-Power NB-IoT Transceiver with Digital-Polar Transmitter in 180-nm CMOS,” *IEEE TCAS-I*, vol. 64, no. 9, pp. 2569-2581, Sept. 2017.
- [2] J. Moreira et al., “A Single-Chip HSPA Transceiver with Fully Integrated 3G CMOS Power Amplifiers,” *ISSCC*, pp. 162-163, Feb. 2015.
- [3] J. Park et al., “A Highly Linear Dual-Band Mixed-Mode Polar Power Amplifier in CMOS with an Ultra-Compact Output Network,” *IEEE JSSC*, vol. 51, no. 8, pp. 1756-1770, Aug. 2016.
- [4] S. Hu et al., “Design of A Transformer-Based Reconfigurable Digital Polar Doherty Power Amplifier Fully Integrated in Bulk CMOS,” *IEEE JSSC*, vol. 50, no. 5, pp. 1094-1106, May 2015.
- [5] V. Vorapipat et al., “A Class-G Voltage-Mode Doherty Power Amplifier,” *ISSCC*, pp. 46-47, Feb. 2017.
- [6] H. Xu et al., “Systems and Methods for Combining Power through A Transformer,” US Patent 9,306,503, Apr. 2016.



Figure 26.5.1: Digital Doherty PA operation at 0dB/6dB PBO using parallel-combining transformer.



Figure 26.5.2: Block diagram of the proposed dual-band digital Doherty PA.



Figure 26.5.3: Implementation of dual-band passive matching network and its simulation results.



Figure 26.5.4: Measured dual-band CW results of output power, PAE and frequency response.



Figure 26.5.5: Measured PA NB-IoT output spectra and average  $P_{out}$  and PAE versus frequency.

|                              | This work                                    | TCAS-I 2017 [1] <sup>*</sup>      | ISSCC 2015 [2]                    | JSSC 2016 [3] <sup>*</sup> | JSSC 2015 [4] <sup>*</sup> | ISSCC 2017 [5]         |
|------------------------------|----------------------------------------------|-----------------------------------|-----------------------------------|----------------------------|----------------------------|------------------------|
| Freq. (GHz)                  | 0.85/1.7                                     | 0.891                             | 0.95/1.95                         | 2.6/4.5                    | 3.82                       | 3.5                    |
| On-chip Balun                | Yes<br>(1 transformer)                       | No<br>(Off-chip matching)         | Yes<br>(2 transformer)            | Yes<br>(1 transformer)     | Yes<br>(2 transformer)     | Yes<br>(2 transformer) |
| Peak Pout (dBm)              | 28.9/27.0                                    | 23.2                              | 32                                | 28.1/26.0                  | 27.3                       | 25.3                   |
| Peak PAE (%)                 | 36.8/25.4                                    | 44.5                              | -                                 | 35/21.2                    | 32.5 (DE)                  | 30.4                   |
| PAE @ 6dB PBO (%)            | 29.9/16.8                                    | -                                 | -                                 | -                          | 22 (DE) <sup>**</sup>      | 25.3                   |
| Modulation                   | 12 Subcarriers<br>180kHz Bandwidth<br>NB-IoT | 20MHz<br>64QAM/256QAM<br>WLAN @LB | Single Carrier,<br>3.75kHz NB-IoT | HSPA                       | 8MS/s<br>256QAM            | 500ks/s 16QAM          |
| Pavg (dBm)                   | 24.4/23.0                                    | 22.9/20.8                         | 18.87                             | >26/26                     | 20.37/18.53                | 21.8                   |
| PAE (%)                      | 29.5/17.9                                    | 26.1/22.7                         | 33.4                              | 23.7/23.8                  | 16.26/13.42                | 22.1 (DE)              |
| EVM (dB)                     | -21.6                                        | -25.3/-30.5                       | -28.2                             | -26.6/-23.2                | -36.3/-34.6                | -25.0                  |
| Supply (V)                   | 1.2/2.4                                      | 2                                 | 3                                 | 1.5/3                      | 1.2/3                      | 1.2/2.4                |
| Chip Size (mm <sup>2</sup> ) | 1.11                                         | 1.75                              | 3 <sup>†</sup>                    | 2.25                       | 2.09                       | 1.2                    |
| Technology                   | 55nm CMOS                                    | 180nm CMOS                        | 65nm CMOS                         | 65nm CMOS                  | 65nm CMOS                  | 45nm CMOS SOI          |

\*Results from the second prototype. <sup>†</sup>Estimated total area of RF DACs, LB PA and PA from chip micrograph.

<sup>\*\*</sup>Results measured with GSG probe. <sup>\*\*</sup>Estimated from Fig. 13 in [4].

Figure 26.5.6: Performance summary and comparison with prior works.



Figure 26.5.7: Die micrograph.

## 26.6 A Continuous-Mode Harmonically Tuned 19-to-29.5GHz Ultra-Linear PA Supporting 18Gb/s at 18.4% Modulation PAE and 43.5% Peak PAE

Tso-Wei Li, Ming-Yu Huang, Hua Wang

Georgia Institute of Technology, Atlanta, GA

The 5<sup>th</sup> generation (5G) mm-wave systems are expected to support wideband spectrum-efficient modulations (e.g., 64-QAM or 256-QAM) to achieve Gb/s-link-throughput revolution. These complex modulation schemes, however, often come with high-density constellations that demand stringent linearity, i.e. AM-AM and AM-PM, on the mm-wave front-end circuits, in particular, the power amplifiers (PAs). In addition, to support future massive MIMOs, the mm-wave front-ends should be ultra-efficient in both their energy efficiency and area usage, posing even more constraints on the PA designs [1-5].

Practical mm-wave PAs, especially silicon-based PAs, always face a steep trade-off between efficiency and linearity. Conventional linear PAs, e.g., Class-AB PAs, offer design simplicity and good linearity. However, their over-simplified output harmonic terminations often limit the peak efficiency. Millimeter-wave time-domain switching PAs, e.g., Class-E PAs, show high peak efficiency but poor linearity. They cannot support complex modulations without major digital pre-distortion (DPD) computations, which require substantial power and complexity at Gb/s speeds. An alternative is to use overdriven linear PAs with harmonic terminations. Several recent designs show that Class-J or Class-F/Class-F<sup>-1</sup>-like harmonic terminations of linear PAs can boost their peak efficiency and still preserve high linearity. However, these designs either have limited bandwidth due to narrowband harmonic terminations or require area-consuming passive networks, constraining their use in broadband massive MIMO systems.

We propose a two-stage differential continuous-mode harmonically tuned ultra-linear mm-wave PA that achieves wideband operation (19 to 29.5GHz, 43.3%), high peak PAE (43.5%), high modulation PAE (18.4% for 3GSym/s 64-QAM and 16.3% for 1GSym/s 256-QAM), and high average output power  $P_{out}$  (9.8dBm for 3GSym/s 64-QAM and 8.7dBm for 1GSym/s 256-QAM). Our PA realizes an on-chip continuous-mode wideband harmonic-termination output network, simultaneously for the fundamental, 2<sup>nd</sup>-, and 3<sup>rd</sup>-order harmonics, with only one transformer footprint and no need for any tunable elements or switches, which achieves an ultra-compact PA design for broadband 5G massive MIMOs (Fig. 26.6.1).

Unlike conventional single-frequency-PA harmonic tuning, the continuous-mode PA substantially expands the frequency range over which the desired PA output harmonic terminations can be achieved for efficiency enhancement. However, most existing continuous-mode PA output networks require multiple passive components and transmission lines for multi-resonance tuning [6-8], inevitably increasing the network complexity, losses, and size. Our PA output network exploits and enhances parasitic elements in an on-chip transformer to achieve continuous-mode harmonic tuning at both differential- and common-mode with substantial network simplification and area-saving. It consists of one 1:1 transformer and three harmonic tuning capacitors ( $2 \times C_d$  and  $C_c$ ). Also, it utilizes two symmetrically embedded branches  $L_d$  inside the transformer for the 3<sup>rd</sup>-order harmonic impedance tuning in differential-mode, and two extended branches  $L_{c1}$  and  $L_{c2}$  for the 2<sup>nd</sup>-order harmonic impedance tuning in common-mode (Fig. 26.6.1). The schematic of our PA and the EM model and the simplified circuits of the proposed PA output network are shown in Fig. 26.6.1. To ensure PA linearity, a harmonic trap network is added at the PA input to provide a low 2<sup>nd</sup>-order harmonic source impedance.

Next, the PA output harmonic termination network is explained. In Fig. 26.6.2, $L_{dm1}/L_{cm1}$ and $L_{dm2}/L_{cm2}$ are the differential-/common-mode half-circuit inductances of the transformer, and the output leads are absorbed into the transformer secondary coil. Also, $L_{m1}$ and $L_{k1}$ are the magnetizing and leakage inductances of the transformer in the differential-mode half-circuit. In the differential mode, the center-tap of the transformer is a virtual ground, so $L_{c1}/L_{c2}$ and $C_c$ do not affect the fundamental and 3<sup>rd</sup>-order harmonic terminations. $C_d-L_d-L_{dm2}$ form a multi-resonance tank $Z_1$ with a high-frequency resonance. At the fundamental frequency, the series network $C_d-L_d$ behaves as a small capacitor, which presents a high impedance to the transformer. The transformer performs its matching with the PA output capacitance $C_{out}$ and provides the desired fundamental load impedance to the PA. At the 3<sup>rd</sup>-order harmonic, the series network $C_d-L_d$ operates slightly below its series resonance, which shorts out $L_{dm2}$ and forms a series resonance of $C_d-L_d-L_{m1}-L_{k1}$ to produce a desired low impedance. In the common-mode half-circuit, the network of $C_c/2$, $2L_{c1}$, and $2L_{c2}$ forms a multi-resonance tank $Z_2$. At the 2<sup>nd</sup>-order harmonic, $Z_2$ provides a high impedance, and the remaining series tank of $C_d-L_d$ behaves as a capacitor. Therefore, with proper tuning, the 2<sup>nd</sup>-order harmonic impedance is dominated by $C_{out}$, $L_{cm1}$, and the effective capacitance of the series $C_d-L_d$ network, which together achieve the desired continuous-mode 2<sup>nd</sup>-order harmonic impedance. The trajectories of the half-circuit load impedance at the fundamental, 2<sup>nd</sup>-, and 3<sup>rd</sup>-order harmonics with the PA output capacitance $C_{out}$ are shown on the Smith Chart in Fig. 26.6.2.
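The frequency-dependent role of the series $C_d-L_d$ branch can be illustrated numerically. The sketch below evaluates the reactance and effective capacitance of an ideal, lossless series LC branch; the component values (100pH and 30fF) are hypothetical and were picked only so that the series resonance lands slightly above the 3<sup>rd</sup> harmonic of 28.5GHz. They are not the values used in the design.

```python
import math


def series_resonance_hz(l_h, c_f):
    """Series resonance frequency of an ideal LC branch."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def series_lc_reactance(freq_hz, l_h, c_f):
    """Reactance X = wL - 1/(wC) in ohms; negative means capacitive."""
    w = 2.0 * math.pi * freq_hz
    return w * l_h - 1.0 / (w * c_f)


def effective_capacitance(freq_hz, l_h, c_f):
    """Equivalent capacitance of the branch below its series resonance."""
    w = 2.0 * math.pi * freq_hz
    return c_f / (1.0 - w * w * l_h * c_f)


L_D = 100e-12  # hypothetical L_d in henries
C_D = 30e-15   # hypothetical C_d in farads
F0 = 28.5e9    # carrier frequency used in the measurements

print(f"series resonance: {series_resonance_hz(L_D, C_D) / 1e9:.1f} GHz")
for n in (1, 2, 3):
    x = series_lc_reactance(n * F0, L_D, C_D)
    c_eff = effective_capacitance(n * F0, L_D, C_D)
    print(f"harmonic {n}: X = {x:7.1f} ohm, C_eff = {c_eff * 1e15:6.1f} fF")
```

With these assumed values the branch looks like a small capacitor (large negative reactance) at the fundamental, remains capacitive at the 2<sup>nd</sup> harmonic, and approaches a short near the 3<sup>rd</sup> harmonic, which matches the qualitative behavior described above.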

The PA continuous-mode harmonic terminations are further explained in Fig. 26.6.2. The fundamental load impedance is mostly inductive at lower frequencies ($0 \leq \xi \leq 1$) and capacitive at higher frequencies ($-1 \leq \xi \leq 0$), and vice versa for the 2<sup>nd</sup>-order harmonic impedance. The fundamental and 2<sup>nd</sup>-order harmonic impedances of the upper operation bandwidth follow the constant-conductance circles, while the 3<sup>rd</sup>-order harmonic impedance is kept low. These aspects verify that the PA achieves continuous-mode Class-F<sup>-1</sup>-like harmonic terminations for its fundamental, 2<sup>nd</sup>-, and 3<sup>rd</sup>-order impedances [6]. In addition, the harmonic trap and the inter-stage network together provide a low 2<sup>nd</sup>-order harmonic source impedance for the PA to ensure its linearity.

Our proposed PA is implemented in a 0.13μm SiGe BiCMOS process with a 0.91×0.32mm<sup>2</sup> core area excluding pads, as shown in Fig. 26.6.7. The small-signal and the continuous-wave (CW) large-signal measurements are shown in Fig. 26.6.3. At 28.5GHz, it achieves  $P_{sat}$  of 17dBm and  $P_{1dB}$  of 15.2dBm, the power gain  $G_p$  of 20dB, peak PAE<sub>total</sub> of 43.5%, and peak PAE<sub>PA</sub> of 50%. For a fair comparison with reported 2-stage and 1-stage PAs, we use PAE<sub>total</sub> to represent the overall PAE of the 2-stage PA including both the driver and PA output stage, while PAE<sub>PA</sub> stands for the 1-stage PA PAE, i.e. the PA output stage only. The measured  $P_{sat}$  is 16.4 to 17.4dBm for 19 to 29.5GHz, achieving a 43.3% large-signal  $P_{out}$  1dB bandwidth. Figure 26.6.3 also shows the AM-PM distortion of only 3° and the AM-AM gain peaking within 0.43dB up to  $P_{1dB}$  at 28.5GHz.
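The quoted 43.3% bandwidth is consistent with the usual definition of fractional bandwidth, i.e., the band width divided by the arithmetic center frequency. A minimal check, assuming that definition, is sketched below.

```python
def fractional_bandwidth(f_low, f_high):
    """Fractional bandwidth relative to the arithmetic center frequency."""
    center = (f_low + f_high) / 2.0
    return (f_high - f_low) / center


# P_out 1dB band edges reported for the PA, in GHz.
fbw = fractional_bandwidth(19.0, 29.5)
print(f"fractional bandwidth = {100 * fbw:.1f} %")
```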

Our proposed PA is first measured using 64-QAM signals at 1GSym/s (6Gb/s), 1.5GSym/s (9Gb/s), 2.5GSym/s (15Gb/s), and 3GSym/s (18Gb/s) at a carrier frequency ( $f_{carrier}$ ) of 28.5GHz, as shown in Fig. 26.6.4. With no DPD, the measured EVM is -25dB or better for all data rates. At 3GSym/s (18Gb/s), the EVM is -25dB with an average output power  $P_{avg}$  of 9.8dBm and PAE<sub>total</sub> of 18.4%. Next, the PA is measured using 256-QAM signals at 0.5GSym/s (4Gb/s), 0.8GSym/s (6.4Gb/s), and 1GSym/s (8Gb/s) at the same  $f_{carrier}$  of 28.5GHz, as shown in Fig. 26.6.5. The EVM is -30dB or better for all data rates. At 1GSym/s (8Gb/s), the EVM is -30dB with  $P_{avg}$  of 8.7dBm and PAE<sub>total</sub> of 16.3%.
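
The data rates in parentheses follow from the symbol rate and the number of bits per QAM symbol. A minimal sketch, assuming raw (uncoded) bit rates and using an illustrative helper name:

```python
import math


def bit_rate_gbps(symbol_rate_gsym: float, qam_order: int) -> float:
    """Raw bit rate in Gb/s: symbol rate times log2(constellation size)."""
    bits_per_symbol = math.log2(qam_order)
    return symbol_rate_gsym * bits_per_symbol


# Symbol rates quoted in the text for each modulation
rates_64qam = [bit_rate_gbps(r, 64) for r in (1.0, 1.5, 2.5, 3.0)]
rates_256qam = [bit_rate_gbps(r, 256) for r in (0.5, 0.8, 1.0)]

print("64-QAM  (Gb/s):", rates_64qam)
print("256-QAM (Gb/s):", rates_256qam)
```

64-QAM carries 6 bits per symbol and 256-QAM carries 8, which reproduces the 6/9/15/18Gb/s and 4/6.4/8Gb/s values.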

Compared with the reported silicon-based PAs at similar frequencies in Fig. 26.6.6, our proposed PA demonstrates the highest data rate (18Gb/s 64-QAM and 8Gb/s 256-QAM), the highest peak PAE (2-stage PAE<sub>total</sub>=43.5% and 1-stage PAE<sub>PA</sub>=50%), the highest modulation PAE (18.4% for 3GSym/s 64-QAM and 16.3% for 1GSym/s 256-QAM), and a wide  $P_{out}$  1dB bandwidth (19 to 29.5GHz, 43.3%). The continuous-mode harmonically tuned output PA network only occupies one on-chip transformer footprint for an ultra-compact PA design.

### Acknowledgements:

We thank GlobalFoundries for chip fabrication.

### References:

- [1] S. Hu et al., "A 28GHz/37GHz/39GHz Multiband Linear Doherty Power Amplifier for 5G Massive MIMO Applications," *ISSCC*, pp. 32-33, Feb. 2017.
- [2] S. Shakib et al., "A Wideband 28GHz Power Amplifier Supporting 8×100MHz Carrier Aggregation for 5G in 40nm CMOS," *ISSCC*, pp. 44-45, Feb. 2017.
- [3] P. Indirayanti and P. Reynaert, "A 32 GHz 20 dBm-PSAT Transformer-Based Doherty Power Amplifier for Multi-Gb/s 5G Applications in 28 nm bulk CMOS," *IEEE RFIC*, pp. 45-48, June 2017.
- [4] M. Vigilante and P. Reynaert, "A 29-to-57GHz AM-PM Compensated Class-AB Power Amplifier for 5G Phased Arrays in 0.9V 28nm bulk CMOS," *IEEE RFIC*, pp. 116-119, June 2017.
- [5] B. Park et al., "Highly Linear mm-Wave CMOS Power Amplifier," *IEEE TMTT*, vol. 64, no. 12, pp. 4535-4544, Dec. 2016.
- [6] V. Carrubba et al., "On the Extension of the Continuous Class-F Mode Power Amplifier," *IEEE TMTT*, vol. 59, no. 5, pp. 1294-1303, May 2011.
- [7] K. Chen and D. Peroulis, "Design of Broadband Highly Efficient Harmonic-Tuned Power Amplifier Using In-Band Continuous Class-F<sup>-1</sup>/F Mode Transferring," *IEEE TMTT*, vol. 60, no. 12, pp. 4107-4116, Dec. 2012.
- [8] S. Ali et al., "A 42-46.4% PAE Continuous Class-F Power Amplifier with Cgd Neutralization at 26-34 GHz in 65nm CMOS for 5G Applications," *IEEE RFIC*, pp. 212-215, June 2017.



Figure 26.6.1: The schematic of the proposed PA and the continuous-mode harmonically tuned output matching network.

Figure 26.6.2: The simplified differential-/common-mode half circuits of the output network, and the load impedance trajectories.

Figure 26.6.3: The measured S-parameters and the CW large-signal performance vs. output power and carrier frequency.

Figure 26.6.4: The 64-QAM modulation measurement results ( $f_{carrier}=28.5GHz$ ) at 1GSym/s, 1.5GSym/s and 3GSym/s.

Figure 26.6.5: The 256-QAM modulation measurement results ( $f_{carrier}=28.5GHz$ ) at 0.5GSym/s, 0.8GSym/s and 1GSym/s.

|                                                  | This work                                               | [8] RFIC 17                                  | Sarkar T-MTT 17                               | [2] ISSCC 17                  | [1] ISSCC 17                 | [5] T-MTT 16                                 | [3] RFIC 17                                              | [4] RFIC 17       |
|--------------------------------------------------|---------------------------------------------------------|----------------------------------------------|-----------------------------------------------|-------------------------------|------------------------------|----------------------------------------------|----------------------------------------------------------|-------------------|
| Technology                                       | 130nm SiGe                                              | 65nm CMOS                                    | 130nm SiGe                                    | 40nm CMOS                     | 130nm SiGe                   | 28nm CMOS                                    | 28nm CMOS                                                | 28nm CMOS         |
| PA supply (V)                                    | 1.9                                                     | 1.1                                          | 3.6                                           | 1.1                           | 1.5                          | 2.2                                          | 1                                                        | 0.9               |
| $P_{sat}$ 1dB Frequency (GHz)                    | 19-29.5                                                 | 26-34                                        | 27-29                                         | 27-30 <sup>#</sup>            | 28-42                        | 26.5-29 <sup>#</sup>                         | 28.33 <sup>#</sup>                                       | 25.4 <sup>#</sup> |
| $P_{sat}$ 1dB Bandwidth                          | 43.3%                                                   | 26.7%                                        | 7.1%                                          | 10.5% <sup>#</sup>            | 40%                          | 10.7% <sup>#</sup>                           | 22.2% <sup>#</sup>                                       | 63% <sup>#</sup>  |
| Gain (dB)                                        | 20                                                      | 10                                           | 15.5                                          | 22.4                          | 18.2                         | 13.6                                         | 22                                                       | 20.8              |
| $P_{sat}$ (dBm)                                  | 17                                                      | 14.8                                         | 18.8                                          | 15.1                          | 16.8                         | 19.8                                         | 19.8                                                     | 16.6              |
| $P_{1dB}$ (dBm)                                  | 15.2                                                    | 13.2                                         | 15.9                                          | 13.7                          | 15.1                         | 18.6                                         | 16                                                       | 13.4              |
| PAE <sub>total</sub> (%) <sup>†</sup> (28.5GHz)  | 43.5                                                    | -                                            | -                                             | 33.7                          | 22.6                         | -                                            | 21                                                       | 24.2              |
| PAE <sub>PA</sub> (%) <sup>††</sup> (1-stage PA) | 50                                                      | 46.4                                         | 35.3                                          | -                             | -                            | 43.3                                         | -                                                        | -                 |
| PAE <sub>total</sub> @ $P_{1dB}$ (%)             | 39.2                                                    | -                                            | -                                             | 31.1                          | 21.6                         | -                                            | 12.8                                                     | 12.6              |
| PAE <sub>PA</sub> @ $P_{1dB}$ (%)                | 43                                                      | 39 <sup>#</sup>                              | 31.5                                          | -                             | -                            | 41.4                                         | -                                                        | -                 |
| Modulation scheme                                | 64-QAM                                                  | 256-QAM                                      | -                                             | 16-QAM OFDM                   | 64-QAM OFDM                  | 64-QAM WLAN                                  | 64-QAM                                                   | 64-QAM            |
| Data rate (Gb/s)                                 | 6 9 18 4 6.4 8                                          | -                                            | -                                             | 3.2 4.8                       | 6 0.48                       | 15 6                                         | -                                                        | -                 |
| EVM (dB)                                         | -27.6 -26.8 -25 -31.3 -30.5                             | -                                            | -                                             | -22 -25                       | -26.6 -27.5                  | -25 -25                                      | -                                                        | -                 |
| $P_{avg}$ @EVM (dBm)                             | 10.7 10.7 9.8 8.8 8.7                                   | -                                            | -                                             | 6.7 7.2                       | 10.97 11.7                   | 11.7 5.9                                     | -                                                        | -                 |
| PAE <sub>total</sub> @EVM (%)                    | 21.4 21.5 18.4 16.2 16.7 16.3                           | -                                            | -                                             | 11 7.5 <sup>#</sup>           | 17.3 5.75                    | 5.75 2.3                                     | -                                                        | -                 |
| PA core size (mm <sup>2</sup> )                  | 0.29                                                    | 0.12                                         | 0.27                                          | 0.23                          | 1.76 0.28                    | 0.59 0.16                                    | -                                                        | -                 |
| Topology                                         | Differential 2-stage Continuous-mode Harmonically-tuned | Single-ended 1-stage Continuous-mode Class-F | Single-ended Cascade Continuous-mode Class-AB | Single-ended 3-stage Class-AB | Single-ended 2-stage Doherty | Single-ended 2-stage Harmonically-Classified | Single-ended 2-stage Class-AB with Output Power Combiner | -                 |

<sup>†</sup>PAE<sub>total</sub> includes the DC power consumption of both the driver and the PA output stage

<sup>††</sup>PAE<sub>PA</sub> includes the DC power consumption of the PA output stage only

<sup>#</sup>Graphically estimated from reported figures

Figure 26.6.6: The comparison table of silicon-based mm-wave PAs at related operation frequencies.



Figure 26.6.7: The PA die micrograph.

## 26.7 A Coupled-RTWO-Based Subharmonic Receiver Front-End for 5G E-Band Backhaul Links in 28nm Bulk CMOS

Marco Vigilante, Patrick Reynaert

KU Leuven, Heverlee, Belgium

A fully integrated receiver for high-capacity 5G E-Band Backhaul links (71 to 76GHz and 81 to 86GHz) needs a local-oscillator (LO) distribution network with >19% tuning range (TR) and accurate quadrature phases. Further, the LO phase noise (PN) at 10MHz offset should be as low as -119dBc/Hz to impair the SNR of the received 64-QAM signal by <1dB at 10<sup>-6</sup> BER [1]. Silicon-based solutions reported so far either do not meet such stringent requirements [1-4] or demand large power consumption in the LO distribution network [5].

We propose a direct-conversion I/Q subharmonic-receiver (SHRX) architecture that leverages on-chip coupled rotary traveling-wave oscillators (RTWOs) to generate N=8 differential phases at  $f_{LO}=2f_{RF}/N$ , a quadrature-correction circuit to realize accurate I/Q phases, and current-mode passive mixers (MXs) to allow low-V<sub>DD</sub> operation and to greatly simplify the layout. The subharmonic architecture is well known and provides several benefits. (1) The oscillators enjoy superior FOM for given PN and TR [1-6]. (2) The RTWOs are able to directly drive the MXs, reducing the LO distribution network to a bare minimum and further saving power [6]. (3) There is no need for image-rejection filters and IF gain stages [5,6]. (4) Typical key limitations of direct-conversion TRXs, such as LO feedthrough and PA pulling, are greatly mitigated [2,6].
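
The LO plan implied by  $f_{LO}=2f_{RF}/N$  can be sketched numerically. The example below takes the E-Band edges (71 and 86GHz) from the text. It assumes the tuning range is defined relative to the band center and that the N differential phases are evenly spaced over 180°; both are assumptions made for illustration, as are the function names.

```python
def lo_frequency_ghz(f_rf_ghz: float, n_phases: int) -> float:
    """LO frequency of the subharmonic receiver, f_LO = 2 * f_RF / N."""
    return 2.0 * f_rf_ghz / n_phases


def tuning_range_pct(f_low_ghz: float, f_high_ghz: float) -> float:
    """Tuning range in percent, relative to the center frequency (assumed definition)."""
    return 100.0 * 2.0 * (f_high_ghz - f_low_ghz) / (f_high_ghz + f_low_ghz)


def phase_spacing_deg(n_phases: int) -> float:
    """Spacing between adjacent differential phases, assuming even spacing over 180 degrees."""
    return 180.0 / n_phases


N = 8                              # differential LO phases (quarter-harmonic operation)
rf_low, rf_high = 71.0, 86.0       # E-Band edges in GHz, from the text

lo_low = lo_frequency_ghz(rf_low, N)
lo_high = lo_frequency_ghz(rf_high, N)
required_tr = tuning_range_pct(lo_low, lo_high)
spacing = phase_spacing_deg(N)

print(f"LO range: {lo_low:.2f} to {lo_high:.2f} GHz")
print(f"Required tuning range: {required_tr:.1f}%")
print(f"LO phase spacing: {spacing:.1f} deg")
```

Because the fractional tuning range is unchanged by frequency division, covering 71 to 86GHz requires slightly more than 19% TR at the LO as well, consistent with the >19% requirement stated earlier.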

Despite the aforementioned advantages, no mm-wave SHRX implementation with sufficient performance has been reported to date in the open literature, with the remarkable exception of [6]. In [6], half-harmonic operation (i.e.  $f_{LO}=f_{RF}/2$ ) at 24GHz is reported by leveraging a double-quadrature lumped-element LC VCO. In this work, we demonstrate that 8 differential phases can be effectively generated by distributed oscillators, while circuit and layout techniques are studied to achieve superior phase accuracy. Implemented in a 28nm bulk CMOS technology without an ultra-thick top-metal option, the realized prototype SHRX front-end achieves quarter-harmonic operation (i.e.  $f_{LO}=f_{RF}/4$ ) in the E-Band and exhibits low PN and FOM and wide TR. The measured minimum noise figure (NF) is 8.3dB and varies less than 2dB over the 12.5GHz 3dB RF bandwidth.

Figure 26.7.1 shows the simplified schematic and the layout implementation of the proposed coupled RTWOs and SHMXs. Two RTWOs are coupled to achieve 3dB lower PN with negligible FOM penalty [1], while generating the required N=8 differential phases. The MXs are conveniently laid out inside the transmission-line (TL) ring, maximizing symmetry and minimizing the length of the LO interconnections, which are key features to ensure phase accuracy and low power consumption. The conversion gain (CG) of a SHMX degrades in the presence of phase error ( $\varphi_{err}$ ), and the larger N, the larger the degradation. It can be shown that when  $\varphi_{err}=8^\circ$  is considered, the CG reduces by  $\approx 0.08$ dB for N=4 and  $\approx 0.94$ dB for N=8 (worst-case scenario). The RTWOs are designed with 16 stages (see Fig. 26.7.1). Each stage employs a back-to-back inverter negative G<sub>m</sub> ( $W_N=4\times 1.06\mu m$ ,  $W_P=6\times 1.06\mu m$ , L=28nm), a 6b MOM capacitor bank, a small A-MOS varactor for continuous tuning, and a small A-MOS varactor for I/Q phase-error compensation as first proposed in [7] for LC quadrature VCOs. Thanks to the 4× lower frequency of operation, (1) the capacitive part of the TL losses is reduced and (2) a larger number of stages can be laid out in the RTWO ring. This permits the circuit to impose an almost square voltage waveform at 20GHz, further improving the CG of the MX. It is worth mentioning that the RTWOs should be coupled with differential connections in at least two points (as in Fig. 26.7.1) to avoid mode ambiguity and ensure proper quadrature operation.
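
The 3dB figure matches the textbook phase-noise scaling of coupled oscillators. The sketch below assumes M identical, ideally coupled cores with uncorrelated noise, so that phase noise improves by  $10\log_{10}M$ ; this is an idealization used for illustration, not a model of the actual RTWO coupling.

```python
import math


def coupled_pn_improvement_db(n_cores: int) -> float:
    """Ideal phase-noise improvement (dB) from coupling n identical oscillator cores."""
    return 10.0 * math.log10(n_cores)


improvement_two_cores = coupled_pn_improvement_db(2)
print(f"Two coupled cores: {improvement_two_cores:.2f} dB lower phase noise")
```

Since the DC power also doubles, the FOM is ideally unchanged under the same assumption, which is in line with the negligible FOM penalty noted above.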

The 4-stage LNA in Fig. 26.7.2 provides  $\approx 30$ dB gain to compensate for the losses of the SHMXs. Neutralized common-source (CS) amplifiers are used as the transconductance stages in the RF signal path ( $W_{CS,deg}=40\times 1.06\mu m$ ,  $W_{CS}=24\times 1.06\mu m$ , L=28nm), providing high gain and excellent reverse isolation at mm-wave [2,8]. An inductor is added at the source of the 1<sup>st</sup> stage of the LNA to lower its input impedance and realize broadband matching to 50Ω [6]. Transformer-based 4<sup>th</sup>-order filters are used extensively, serving as a balun and ESD protection at the LNA input [2] and providing gain-bandwidth enhancement and power division in the inter-stage matching networks [8].

The BB current at the SHMX output ( $W_{MX}=24\times 1.06\mu m$ , L=28nm) is converted to a voltage by a TIA (Fig. 26.7.2). The TIA provides  $\approx 56\Omega$  input impedance and is followed by a buffer to drive the 50Ω measurement equipment. For testing purposes, divide-by-4 and divide-by-16 frequency dividers are also implemented on-chip, and the RTWOs can be locked by an off-chip PLL.

The SHRX with integrated RTWOs was realized in a 28nm bulk CMOS process (Fig. 26.7.7). The core area is 1760μm×620μm. Figure 26.7.3 shows the measured DC power consumption of the coupled RTWOs and the power breakdown between the blocks. The RTWOs run at  $f_{LO}=f_{RF}/4$  and draw 67.4mA and 57.7mA from a 1.3V supply at the lowest and highest oscillation frequency, respectively. P<sub>DC,RTWOs</sub> decreases almost linearly with frequency. The transconductance stages in the RF path and the BB TIAs consume 69.8mA and 16.1mA from a 0.9V supply, respectively. Thanks to the proposed architecture, the power consumption of the LO generation and distribution network never exceeds 53% of that of the full system. The oscillators are tunable from 16.1 to 19.8GHz, corresponding to 20.5% TR. The center frequency is  $\approx 1.8$ GHz lower than expected due to an underestimation of the layout parasitics during the design phase. The PN is measured at the frequency divide-by-4 output and referred to the carrier, assuming negligible phase-noise degradation from the divider. Figure 26.7.4 shows the measured PN and FOM of the free-running coupled RTWOs at 10MHz offset from the carrier against the oscillation frequency. The PN ranges from -131.2 to -132.8dBc/Hz (1.6dB variation) over the TR, with a corresponding FOM from -176.7 to -179.5dBc/Hz. The measured and simulated |S<sub>11</sub>|, CG, and NF are reported in Fig. 26.7.5. The input power match is better than -9.3dB from 65.7 to 99GHz, and the CG is 28dB over a 12.5GHz BW<sub>3dB</sub> limited by the on-chip LO TR. The NF, measured with a SAGE STZ-12-11 E-Band noise source and an R&S spectrum analyzer, reaches a minimum of 8.3dB and varies less than 2dB over the RF BW<sub>3dB</sub>. The measured I/Q mismatch shown in Fig. 26.7.5 is  $<7.1^\circ$  over the TR and can be made negligible by the discussed on-chip phase-correction circuitry. By varying V<sub>cor,I</sub> from 0 to 1.1V and V<sub>cor,Q</sub> (in Fig. 26.7.1) from 1.1 to 0V, the I/Q phases can be corrected by  $\pm 7.5^\circ$  at the lowest and  $\pm 9.3^\circ$  at the highest oscillation frequency. The effect of the phase correction on the measured PN at 10MHz offset is <1dB. Figure 26.7.5 also reports the large-signal CW measurements at 81.4GHz, showing an ICP<sub>1dB</sub> of -25dBm.
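
The power and phase-noise figures above can be cross-checked with simple arithmetic. The sketch below uses only the currents, supplies, and phase-noise values quoted in the text. It assumes ideal frequency multiplication, so that phase noise referred to  $f_{RF}=4f_{LO}$  degrades by  $20\log_{10}4$ ; the variable and function names are illustrative.

```python
import math


def dc_power_mw(current_ma: float, supply_v: float) -> float:
    """DC power in mW from a current in mA and a supply voltage in V."""
    return current_ma * supply_v


def pn_referred_to_rf(pn_lo_dbc_hz: float, multiplication: int) -> float:
    """Phase noise referred to the RF carrier, assuming ideal frequency multiplication."""
    return pn_lo_dbc_hz + 20.0 * math.log10(multiplication)


# LO (RTWO) power at the lowest and highest oscillation frequency, 1.3V supply
p_lo_low_mw = dc_power_mw(67.4, 1.3)
p_lo_high_mw = dc_power_mw(57.7, 1.3)

# RX path: RF transconductance stages plus BB TIAs, 0.9V supply
p_rx_mw = dc_power_mw(69.8 + 16.1, 0.9)

# Worst-case LO share of the total power (LO power is highest at the low band edge)
lo_share_pct = 100.0 * p_lo_low_mw / (p_lo_low_mw + p_rx_mw)

# Phase noise at 10MHz offset referred to f_RF = 4 * f_LO
pn_rf_worst = pn_referred_to_rf(-131.2, 4)
pn_rf_best = pn_referred_to_rf(-132.8, 4)

print(f"LO power: {p_lo_low_mw:.1f} mW (low edge), {p_lo_high_mw:.1f} mW (high edge)")
print(f"RX power: {p_rx_mw:.1f} mW, worst-case LO share: {lo_share_pct:.1f}%")
print(f"PN at f_RF: {pn_rf_worst:.2f} to {pn_rf_best:.2f} dBc/Hz")
```

The results land close to the 87.6-to-75mW LO power, the 77.3mW RX power, and the equivalent phase noise at  $f_{RF}$  listed in Fig. 26.7.6, with small differences attributable to rounding.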

Experimental results are summarized and compared to recently published E-Band RXs in Fig. 26.7.6. The proposed coupled-RTWO-based SHRX significantly advances the state-of-the-art of the CMOS designs in Fig. 26.7.6 in terms of PN, TR, and FOM. Relative to the 0.13μm SiGe solution in [5], this work achieves comparable PN and TR, while saving  $>4.6\times$  power in the LO path. Moreover, the proposed SHRX achieves state-of-the-art NF and linearity, while reporting the widest RF BW<sub>3dB</sub> among the designs in the comparison table. In summary, this work proves the viability of fully integrated coupled-RTWO-based subharmonic RXs for high-capacity 5G E-Band backhaul links.

### Acknowledgements:

This work was supported by Analog Devices Inc, Limerick, Ireland. The authors wish to thank Mike Keaveney, Mike O'Shea, Niall McDermott and Andrew Cunnane from Analog Devices for providing technology access and support with measurements.

### References:

- [1] L. Iotti et al., "Insights into Phase-Noise Scaling in Switch-Coupled Multi-Core VCOs for E-Band Adaptive Modulation Links," *IEEE JSSC*, vol. 52, no. 7, pp. 1703-1718, July 2017.
- [2] D. Guermandi et al., "A 79 GHz 2 × 2 MIMO PMCW Radar SoC in 28 nm CMOS," *IEEE JSSC*, vol. 52, no. 10, pp. 2613-2626, Oct. 2017.
- [3] D. Guermandi et al., "A 79GHz binary phase-modulated continuous-wave radar transceiver with TX-to-RX spillover cancellation in 28nm CMOS," *ISSCC*, pp. 354-355, Feb. 2015.
- [4] T. Fujibayashi et al., "A 76- to 81-GHz Multi-Channel Radar Transceiver," *IEEE JSSC*, vol. 52, no. 9, pp. 2226-2241, Sept. 2017.
- [5] R. Levinger et al., "High-Performance E-Band Transceiver Chipset for Point-to-Point Communication in SiGe BiCMOS Technology," *IEEE TMTT*, vol. 64, no. 4, pp. 1078-1087, Apr. 2016.
- [6] A. Mazzanti et al., "A 24GHz Sub-Harmonic Receiver Front-End with Integrated Multi-Phase LO Generation in 65nm CMOS," *ISSCC*, pp. 216-608, Feb. 2008.
- [7] V. Szortyka et al., "A 42 mW 200 fs-Jitter 60 GHz Sub-Sampling PLL in 40 nm CMOS," *IEEE JSSC*, vol. 50, no. 9, pp. 2025-2036, Sept. 2015.
- [8] M. Vigilante and P. Reynaert, "A 68.1-to-96.4GHz Variable-Gain Low-Noise Amplifier in 28nm CMOS," *ISSCC*, pp. 360-362, Feb. 2016.



Figure 26.7.1: Simplified schematic of the coupled RTWOs and subharmonic mixers, with highlighted number of stages, tuning elements, and proposed I/Q phase-correction circuits (left). Layout details of the 28nm CMOS implementation (right).



Figure 26.7.2: Simplified block diagram of the integrated coupled-RTWO-based subharmonic receiver.



Figure 26.7.3: Measured DC power consumption of the coupled RTWOs (top) and measured power breakdown between the blocks at the tuning range edges (bottom).



Figure 26.7.4: Measured phase noise (black triangles) and figure of merit (gray triangles) at 10MHz offset from the carrier versus the oscillation frequency of the free-running coupled RTWOs. Thanks to the subharmonic architecture, the RTWOs oscillate at  $f_{LO}=f_{RF}/4$ .



Figure 26.7.5: Measured input power match, conversion gain, noise figure, I/Q phase versus frequency, and measured output power ( $P_o$ ) versus input power ( $P_{IN}$ ) at 81.4GHz.

| Reference          | This work                                     | Guermandi JSSC17 [2]             | Guermandi ISSCC15 [3]      | Fujibayashi JSSC17 [4]                              | Levinger TMTT16 [5]                       |
|--------------------|-----------------------------------------------|----------------------------------|----------------------------|-----------------------------------------------------|-------------------------------------------|
| Technology         | 28nm CMOS                                     | 28nm CMOS                        | 28nm CMOS                  | 0.13μm SiGe                                         | 0.13μm SiGe                               |
| Architecture       | Subharmonic Direct Conversion 2 Coupled RTWOs | Direct Conversion VCO + SH-QILOs | Direct Conversion SH-QILOs | Direct Conversion VCO + Quadrupler + Hybrid Coupler | Sliding-IF VCO + Quadrupler + Divide-by-2 |
| $f_c$ [GHz]        | 75.1                                          | 82.5                             | 79                         | 78.5                                                | 73 83                                     |
| Gain [dB]          | 28                                            | 31                               | 35                         | 15                                                  | 70 70                                     |
| RF-BW              | 12.5GHz*                                      | 9GHz                             | 8GHz                       | 7GHz                                                | 5GHz 5GHz                                 |
| NF [dB]            | 8.3-10                                        | 12*                              | 6.2-7                      | 7.8                                                 | 6.7 6.7                                   |
| $ICP_{1dB}$ [dBm]  | -25                                           | -28                              | -32.5                      | 1                                                   | -19.6* -21.6*                             |
| RX $P_{DC}$ [mW]   | 77.3                                          | 68                               | 59                         | 197.5                                               | 222 222                                   |
| On-chip LO?        | YES                                           | YES                              | NO                         | YES                                                 | YES                                       |
| LO freq. [GHz]     | 16.1-19.8                                     | 78-87                            | NA                         | 74-82.4                                             | 15.6-19.3                                 |
| LO TR [%]          | 20.5                                          | 11                               | NA                         | 12                                                  | 21.2                                      |
| LO $P_{DC}$ [mW]   | 87.6-75                                       | 100                              | NA                         | NA                                                  | 405*                                      |
| PN @10MHz [dBc/Hz] | -131.2/-132.8 (-119.1/-120.7)*                | -116                             | NA                         | -119/-120**                                         | -133 (-120)*                              |
| LO FOM [dBc/Hz]    | -176.7/-179.5                                 | -174.3                           | NA                         | NA                                                  | -171.5*                                   |

\*Limited by on-chip LO TR      \*\*Including flip chip assembly and module loss

\*DC power consumption of the LO frequency divide-by-2 not included      \*\*Estimated from the reported PN @1MHz offset      \*Equivalent @f<sub>RF</sub>

Figure 26.7.6: Performance summary and comparison with prior works. Designs implemented in bulk CMOS technology are highlighted.



Figure 26.7.7: Die micrograph. The core area is  $1760\mu\text{m} \times 620\mu\text{m}$ . The total area is  $1950\mu\text{m} \times 1500\mu\text{m}$ .

## 26.8 A 12mW 70-to-100GHz Mixer-First Receiver Front-End for mm-Wave Massive-MIMO Arrays in 28nm CMOS

Lorenzo Iotti, Greg LaCaille, Ali M. Niknejad

University of California, Berkeley, CA

Multi-user multiple-input multiple-output (MIMO) systems are promising enablers for high-capacity wireless access in next-generation mobile networks. Leveraging antenna arrays at the access point, narrow beams can be steered to different users simultaneously, enhancing spectral efficiency through spatial multiplexing. By employing a number of array elements,  $M$ , much larger than the number of users,  $K$ , (i.e. massive MIMO), simple linear beamforming algorithms can achieve nearly optimal operation [1]. Operating massive MIMO systems at mm-waves results in compact antenna arrays and wide channel bandwidths. Within the available spectrum, the E-Band communication bandwidth (71 to 76GHz, 81 to 86GHz, and 92 to 95GHz) has recently gained attention for both access and wireless backhaul, due to low oxygen attenuation.

Hardware implementation of mm-wave massive MIMO systems poses several challenges and opportunities. On one hand, array gains allow one to relax the performance of the individual transceiver element, e.g., RX noise figure (NF) and TX power [1]. On the other hand, per-element area and power consumption have to be minimized. RF-signal and local-oscillator (LO) phase shifting and combining are popular in single-beam mm-wave phased arrays [2]. However, in a multi-beam system  $M \times K$  mm-wave phase shifters would be required, resulting in significant area and power overhead. A baseband-combining architecture, as shown in Fig. 26.8.1, is hence more suitable for massive MIMO receivers, as compact analog or digital Cartesian beamformers can be employed [2]. As a drawback, since spatial filtering is performed after the RX front-end, high linearity is required to cope with interferers.

In this paper, we propose an E-Band mixer-first RX front-end, minimizing area and power consumption without sacrificing linearity and bandwidth. Most wideband mm-wave receivers employ magnetically or capacitively coupled multi-stage LNAs [3]. However, compensating losses in interstage networks requires significant power consumption. Since power reduction, rather than noise-figure minimization, is key in massive MIMO front-ends, a mixer-first receiver can be adopted instead. A 60GHz passive mixer presented in [4] achieves >30% power-matching bandwidth and 11-to-14dB NF. However, it employs wide MOS transistors (i.e.  $64\mu\text{m}/60\text{nm}$ ) for ideal-switch operation. At mm-waves, these require significant LO power to be driven, thus lowering the transceiver power efficiency. In [5], a low-power 5GHz mixer-first receiver is proposed, featuring small mixer switches that present a high input impedance and are matched to the antenna through a 1-to-6 transformer. The input matching network provides passive voltage gain, resulting in moderate noise figure (~5.3dB) in spite of high mixer series resistance. Unfortunately, high-turn-ratio transformers cannot be implemented at mm-waves due to low-frequency self-resonance.

The proposed RX front-end is shown in Fig. 26.8.2. A passive quadrature downconverter is followed by open-loop differential baseband amplifiers. Neglecting the mixer input capacitance, the input impedance of a capacitively loaded passive mixer ( $Z_{\text{IN,MIX}}$ ) has a peak centered around the LO frequency ( $f_{\text{LO}}$ ), whose bandwidth is set by the mixer load capacitance. The in-band impedance is proportional to the mixer-switch series resistance [5]. In the proposed mixer, small switches ( $6\mu\text{m}/30\text{nm}$ ) are employed, resulting in  $\sim 400\Omega$  in-band  $Z_{\text{IN,MIX}}$  when the mixer is driven by a  $600\text{mV}_{\text{diff,0-pk}}$  sinusoidal 80GHz quadrature LO. Matching this impedance to  $50\Omega$  would require a high-Q passive network, preventing wideband operation. Hence, frequency-translational feedback is employed to reduce  $Z_{\text{IN,MIX}}$ . Auxiliary feedback mixers are placed between the baseband amplifier output and the mixer input. Feedback switches are sized to obtain loop gain  $G_{\text{LOOP}} \approx 1$ , resulting in  $Z_{\text{IN,MIX}}$  being decreased by a factor of two within the loop bandwidth, as shown in Fig. 26.8.2. Since the baseband amplifier contributes to  $G_{\text{LOOP}}$ , the feedback switches can be considerably downsized compared to the mixer, adding only ~10% capacitance overhead. Since  $G_{\text{LOOP}} \approx 1$ , the feedback does not lead to stability concerns.
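
The factor-of-two reduction is what a first-order negative-feedback model predicts. The sketch below assumes the in-band input impedance scales as  $Z_{\text{IN,MIX}}/(1+G_{\text{LOOP}})$ ; this simple relation is an assumption made for illustration, not the paper's derivation, and the function name is hypothetical.

```python
def input_impedance_with_feedback(z_open_loop_ohm: float, loop_gain: float) -> float:
    """In-band input impedance under negative feedback, assuming Z / (1 + G_loop)."""
    return z_open_loop_ohm / (1.0 + loop_gain)


# ~400 ohm in-band mixer impedance and loop gain of ~1, both from the text
z_in_closed_loop = input_impedance_with_feedback(400.0, 1.0)
print(f"Mixer input impedance with feedback: {z_in_closed_loop:.0f} ohm")
```

Under this model, a loop gain of about 1 lowers the ~400Ω in-band impedance to roughly 200Ω, which is easier to match to 50Ω over a wide band.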

The remaining input-impedance matching is performed by a wideband transformation network, combining an L-match and an input shunt resonator, as shown in Fig. 26.8.2. The input resonator not only neutralizes the pad capacitance, but also improves the matching bandwidth. DC bias and ESD protection are placed on the negative terminal of the shunt inductance, which is AC shorted to ground. The network provides ~6dB passive voltage gain, hence amplifying the signal before it enters the noisy passive mixer. Simulated RX noise figure is ~8dB, 3dB higher than [5] due to higher losses at mm-waves, unavailability of a 25% duty cycle LO, and ~1dB noise penalty due to the feedback. The baseband amplifier is designed to provide 13dB gain with 2mA current consumption. Gate-drain neutralization is employed to reduce the input capacitance, so that the dominant pole is located at the amplifier output.
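
The ~6dB passive voltage gain is consistent with an ideal impedance transformation. The sketch below assumes a lossless matching network between the 50Ω source and a mixer input impedance of about 200Ω (the ~400Ω in-band value halved by the feedback), in which case power conservation gives a voltage gain of  $\sqrt{R_{load}/R_{source}}$ . The 200Ω value and the lossless assumption are illustrative simplifications.

```python
import math


def passive_voltage_gain_db(r_source_ohm: float, r_load_ohm: float) -> float:
    """Voltage gain (dB) of a lossless matching network between two resistances."""
    # Equal power at both ports gives V_load / V_source = sqrt(R_load / R_source)
    return 20.0 * math.log10(math.sqrt(r_load_ohm / r_source_ohm))


gain_db = passive_voltage_gain_db(50.0, 200.0)
print(f"Ideal passive voltage gain: {gain_db:.2f} dB")
```

A real network has finite loss, so the achievable gain is somewhat below this ideal bound.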

Neutralized differential LO buffers with resonant loads, shown in Fig. 26.8.3, were co-designed with the mixer to provide 7dB maximum gain. Given the small mixer size, the buffer load impedance magnitude is  $\sim 500\Omega$ , resulting in only 2mA nominal current consumption for each buffer. An LC series trap, tuned at  $\sim 85\text{GHz}$ , and a choke inductor were added to improve common-mode rejection at  $f_{\text{LO}}$ . A differential transformer-coupled quadrature hybrid is employed to generate the quadrature LO.

A prototype was realized in 28nm CMOS without ultra-thick metal (see Fig. 26.8.7). The RX core area, including input pad, LO buffers, and quadrature hybrid, is only  $0.085\text{mm}^2$ . The front-end is followed by a VGA and a matched output buffer for measurement purposes. The die was wire-bonded to a PCB, and high-frequency probes were used for the RF and LO pads. The LO was generated off-chip, and a constant  $300\text{mV}_{\text{diff,0-pk}}$  swing was provided at the LO buffer input. Figure 26.8.4 shows measured  $S_{11}$  for different values of  $f_{\text{LO}}$ . Wideband power matching with  $S_{11} < -10\text{dB}$  over 74 to 94GHz is achieved, and the bandwidth is improved to 70 to 100GHz when the LO buffer current is increased to 4mA to compensate for losses at the edge of the buffer bandwidth.

NF, conversion gain, and input-referred 1dB compression point (ICP1dB) at 100MHz offset from  $f_{\text{LO}}$  are shown in Fig. 26.8.5. Gain and ICP1dB were measured using a frequency-translational VNA, whereas a W-Band noise source and a spectrum analyzer were used to evaluate NF with the Y-factor method. The RX front-end achieves 19.5 to 25.3dB voltage gain and 8 to 12.7dB NF over a wide operation range, ultimately limited by the LO chain bandwidth. Conversion gain increases at high frequency due to a rise in the matching-network voltage gain. Linearity gets worse at the bandwidth edges, since the reduced LO swing results in soft switching in the mixer. The baseband bandwidth is 1.8GHz, when driving a  $100\text{fF}$  differential load. The flicker noise corner, set by the baseband amplifier, is at 10MHz.
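For readers unfamiliar with the Y-factor method mentioned above, the standard relation is NF = ENR − 10·log10(Y − 1), where Y is the ratio of output noise power with the noise source on and off. The sketch below assumes the cold source temperature equals the 290K reference; the ENR and power readings are hypothetical and are not measurements from this work.

```python
import math


def y_factor_noise_figure_db(enr_db: float, p_hot_dbm: float, p_cold_dbm: float) -> float:
    """Noise figure from hot/cold output noise powers (assumes T_cold = 290 K)."""
    y_linear = 10.0 ** ((p_hot_dbm - p_cold_dbm) / 10.0)
    if y_linear <= 1.0:
        raise ValueError("hot power must exceed cold power")
    return enr_db - 10.0 * math.log10(y_linear - 1.0)


if __name__ == "__main__":
    # Hypothetical readings: 12 dB ENR source, spectrum-analyzer noise powers in dBm
    nf_db = y_factor_noise_figure_db(enr_db=12.0, p_hot_dbm=-55.0, p_cold_dbm=-60.0)
    print(f"NF = {nf_db:.2f} dB")
```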

Measurement results are compared to low-power CMOS mm-wave receivers in Fig. 26.8.6. The proposed receiver covers the licensed E-Band communication spectrum with -24dBm worst-case ICP1dB, 9.5dB worst-case NF, and only 12mW power consumption including the LO buffers. Ultra-low-power operation and small footprint, together with linearity comparable with prior works in Fig. 26.8.6, suggest the use of the proposed front-end for massive MIMO mm-wave arrays.

### Acknowledgements:

This work was partly funded by the Intel Connectivity SRS program and the NSF EARS program (grant #1642920). The authors acknowledge the TSMC University Shuttle program for chip fabrication, Keysight Technologies for measurement support, and Integrand EMX for electromagnetic simulation. Thanks to Dr. Christopher Hull and Dr. Steven Callender from Intel Labs for the fruitful discussions, Suren Singh from Keysight for assistance during measurements, Prof. Francesco Svelto from University of Pavia for supporting the project, and Berkeley Wireless Research Center faculty, students, staff and sponsors.

### References:

- [1] E. Larsson et al., "Massive MIMO for Next Generation Wireless Systems," *IEEE Commun. Mag.*, vol. 52, no. 2, pp. 186-195, Feb. 2014.
- [2] S. Kundu and J. Paramesh, "A Compact, Supply-Voltage Scalable 45–66 GHz Baseband-Combining CMOS Phased-Array Receiver," *IEEE JSSC*, vol. 50, no. 2, pp. 527-542, Feb. 2015.
- [3] M. Vigilante and P. Reynaert, "On the Design of Wideband Transformer-Based Fourth Order Matching Networks for E-Band Receivers in 28-nm CMOS," *IEEE JSSC*, vol. 52, no. 8, pp. 2071-2082, Aug. 2017.
- [4] A. Moroni and D. Manstretta, "A Broadband Millimeter-Wave Passive CMOS Down-Converter," *IEEE RFIC*, pp. 507-510, June 2010.
- [5] A. Homayoun and B. Razavi, "A Low-Power CMOS Receiver for 5 GHz WLAN," *IEEE JSSC*, vol. 50, no. 3, pp. 630-643, Mar. 2015.
- [6] M. Khanpour et al., "A Wideband W-Band Receiver Front-End in 65-nm CMOS," *IEEE JSSC*, vol. 43, no. 8, pp. 1717-1730, Aug. 2008.





Figure 26.8.7: Die micrograph and detailed view of the RX front-end layout.

## 26.9 A 13<sup>th</sup>-Order CMOS Reconfigurable RF BPF with Adjustable Transmission Zeros for SAW-Less SDR Receivers

Pingyue Song, Hossein Hashemi

University of Southern California, Los Angeles, CA

Current cellular receivers often employ acoustic filters (SAW or BAW) for each communication band due to their high selectivity, low insertion loss, and small form factor. The need to support multiple communication bands, multi-input multi-output (MIMO) communications, and carrier aggregation necessitates the use of several such acoustic filters with a large overall footprint. These acoustic RF filters employ several high-Q acoustic resonators to create several poles and transmission zeros in a high-order transfer function to achieve low insertion loss within a nearly flat passband, fast roll-off, and highly attenuated stopband. The low quality factor of integrated passive components, specifically inductors, leads to a significant increase of insertion loss (proportional to the filter order and inversely proportional to the component quality factor) and poor stopband rejection for an equivalent RF filter.

On-chip N-path filters mimic the functionality of high-Q second-order bandpass resonators with a clock-controlled tunable center frequency. Single- and multi-resonator N-path tunable bandpass RF filters (Fig. 26.9.1(a) and (b)) have been reported [1]. Filtering of the same order has also been reported in the charge domain [2]. To increase the filter selectivity (steeper transition-band roll-off and higher stopband attenuation), transmission zeros (TZ) may be added to the filter transfer function. Recently, combinations of active transconductance cells and passive N-path resonators [3,4], and filtering-by-aliasing [5], have been investigated to create BPFs with transmission zeros at the cost of lower passband bandwidth. In summary, the authors have not found in the literature a reconfigurable integrated bandpass RF filter with sufficiently high passband bandwidth and selectivity. This paper presents a 13<sup>th</sup>-order monolithic reconfigurable bandpass filter prototype with tunable transmission zeros achieving 30-to-50MHz tunable BW<sub>3dB</sub> and a steep 100dB/100MHz transition-band roll-off enabling close-by-blocker rejection.

In a coupled-resonator bandpass filter, transmission zeros on either side of the passband may be introduced by adding inductors and capacitors in series with the coupled resonators (Fig. 26.9.1(c)). Given that the resonators are replaced with N-path circuits that are driven by the same clock frequency, it is important that the filter design uses resonators that have the same resonant frequency. One challenge associated with capacitively coupled N-path resonators is the undesired charge sharing between the capacitors of the coupled N-path resonators, which reduces the effective quality factor of the N-path resonators. Furthermore, the coupling network affects the passband shape and bandwidth. The proposed bandpass filter (Fig. 26.9.1(d)) utilizes three resonators that are coupled through inductive and inductive-plus-capacitive networks. The capacitor (inductor) in series with the leftmost (rightmost) resonator creates a transmission zero to the left (right) of the filter passband. The values of the passive coupling components together with all coupled resonant tanks determine the filter specifications such as bandwidth, transition-band roll-off, etc. In the implemented filter, the coupled resonators are replaced with N-path resonators while all coupling and series capacitors are variable and realized as a bank of switched capacitors enabling filter reconfigurability beyond clock-frequency tuning (Fig. 26.9.2). The values of switched capacitors in the N-path filters depend on the number of non-overlapping clock phases (4 phases in this prototype). In order to reduce the power consumption of the multi-phase clock distribution network, the multi-phase generation circuitry is local to each of the N-path resonators. Due to the isolation provided by the coupling inductors, each N-path resonator can be treated separately, and the three local 4-phase non-overlapping clock generators need not be synchronized in phase. The inductors are realized as on-chip spirals. The low quality factor of coupling inductors does not degrade the filter response.
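The placement of the transmission zeros can be illustrated with ideal lumped elements. The sketch below is a simplification, not the authors' design procedure: each resonator is modeled as an ideal shunt parallel LC tank rather than an N-path circuit, and a zero is taken to occur where the tank plus its series element forms a short. All component values are hypothetical; only the 875MHz center frequency comes from the text.

```python
import math


def tank_resonance_hz(l_h: float, c_f: float) -> float:
    """Parallel-resonance frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def zero_with_series_capacitor_hz(l_h: float, c_f: float, c_series_f: float) -> float:
    """Series resonance of (tank + series C); always below the tank resonance."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * (c_f + c_series_f)))


def zero_with_series_inductor_hz(l_h: float, c_f: float, l_series_h: float) -> float:
    """Series resonance of (tank + series L); always above the tank resonance."""
    l_parallel = l_h * l_series_h / (l_h + l_series_h)
    return 1.0 / (2.0 * math.pi * math.sqrt(l_parallel * c_f))


if __name__ == "__main__":
    f0 = 875e6                                        # center frequency from the text
    l_tank = 1e-9                                     # hypothetical 1 nH tank inductance
    c_tank = 1.0 / ((2.0 * math.pi * f0) ** 2 * l_tank)
    f_low = zero_with_series_capacitor_hz(l_tank, c_tank, 0.1 * c_tank)   # hypothetical ratio
    f_high = zero_with_series_inductor_hz(l_tank, c_tank, 12.0 * l_tank)  # hypothetical ratio
    print(f"f0 = {tank_resonance_hz(l_tank, c_tank) / 1e6:.1f} MHz")
    print(f"zero below passband = {f_low / 1e6:.1f} MHz")
    print(f"zero above passband = {f_high / 1e6:.1f} MHz")
```

In this idealized picture, tuning the series capacitor moves the lower zero, which is consistent with the switched-capacitor banks providing reconfigurability.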

The performance of an N-path system can be degraded by a few practical non-idealities. First, the clock signals that are used to drive the CMOS switches have a finite rise/fall time as determined by the power budget and technology parameters. Second, the parasitic capacitance of MOS switches contributes to unwanted charge sharing between switched capacitors of adjacent clock phases. Both effects decrease the equivalent quality factor (Q) of the N-path resonators, and subsequently degrade the response of the high-order BPF that is based on these resonators. The reduced quality factor of the N-path resonator, effectively modeled with a parallel resistor, may be enhanced by adding a negative parallel resistor realized as a cross-coupled differential pair. To maintain the desired DC bias voltage of 0.5V across each N-path filter and suppress the gain at even harmonics, complementary cross-coupled differential pairs are used. 2.5V NMOS and PMOS I/O transistors are used for the cross-coupled pairs to allow larger voltage swings and, hence, to enhance system linearity. Binary-weighted arrays of complementary cross-coupled differential pairs are employed to control the value of Q-boosting negative resistances to realize different filter responses.
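The Q-boosting idea can be sketched with the parallel-resistor model described above: a negative conductance in parallel with the loss conductance raises the net parallel resistance, and with it the resonator Q. The code below is an illustrative model only; the loss resistance, unit conductance, and code width are hypothetical values, not taken from the design.

```python
def boosted_parallel_resistance(r_loss_ohm: float, g_neg_siemens: float) -> float:
    """Net parallel resistance after adding a negative conductance of magnitude g_neg."""
    g_net = 1.0 / r_loss_ohm - g_neg_siemens
    if g_net <= 0.0:
        raise ValueError("net conductance must stay positive to avoid oscillation")
    return 1.0 / g_net


def negative_conductance_from_code(code: int, g_unit_siemens: float) -> float:
    """Binary-weighted array: the digital code scales a unit negative conductance."""
    if code < 0:
        raise ValueError("code must be non-negative")
    return code * g_unit_siemens


if __name__ == "__main__":
    r_loss = 500.0   # hypothetical equivalent parallel loss resistance, in ohms
    g_unit = 0.1e-3  # hypothetical unit-cell negative conductance, in siemens
    for code in range(0, 16, 5):  # hypothetical 4-bit control word
        g_neg = negative_conductance_from_code(code, g_unit)
        r_eff = boosted_parallel_resistance(r_loss, g_neg)
        print(f"code {code:2d}: R_eff = {r_eff:7.1f} ohm, Q boost = {r_eff / r_loss:.2f}x")
```

The guard against a non-positive net conductance reflects the usual limit of this technique: over-compensation would turn the resonator into an oscillator.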

The proposed differential filter has been fabricated in the TSMC 65nm CMOS technology (Fig. 26.9.7). To facilitate single-ended measurements, two off-chip baluns are used at the input and output. The insertion loss due to the baluns is de-embedded and measurement results are referred to the chip input and output. Figure 26.9.3 shows a representative measured filter frequency response with f<sub>LO</sub>= 875MHz, passband bandwidth of 40MHz, and transition-band roll-off slope of 100dB/100MHz. In this specific configuration, transmission zeros are placed at around 835MHz and 910MHz resulting in >17dB and >19dB blocker rejection at these two close-in frequencies, respectively. It is seen that the gain at the second harmonic is significantly suppressed thanks to the differential clocking scheme. To accommodate different filtering requirements, filter bandwidth and locations of transmission zeros can be tuned by changing discrete capacitor values and equivalent negative resistances. As shown in Fig. 26.9.3, tunable bandwidth from 30 to 50MHz is achieved while maintaining roll-off slope >100dB/100MHz and in-band input/output return loss larger than 7dB. Figure 26.9.4 demonstrates that the filter response with a bandwidth of 40MHz, a roll-off slope of 100dB/100MHz, and return loss larger than 7dB can be replicated across a wide spectrum from 800MHz to 1.1GHz. Figure 26.9.5 shows linearity measurement results for a filter centered on f<sub>LO</sub>= 875MHz, with a passband bandwidth of 40MHz. It is important to mention that maintaining a similar filter response at different center frequencies requires adjusting the values of tunable capacitors, as well as the value of Q-boosting negative resistances, as the LO frequency changes. The measured in-band IIP3 (IIP3-IB) is 25dBm, with two tones separated by 1MHz, and the in-band 1dB compression point (ICP<sub>1dB</sub>) is 7dBm as shown in Figs. 26.9.5(a) and (b), respectively. Figure 26.9.5(c) demonstrates the filter linearity performance as a function of the blocker offset frequency Δf. The blocker-induced 1dB compression point (B<sub>1dB</sub>) reaches 9dBm with a blocker only 40MHz away from the test signal. In a two-tone test, a 24dBm out-of-band IIP3 (IIP3-OOB) is measured with two blockers only 40MHz apart. Performance summary and comparison with previous works are shown in Fig. 26.9.6. Compared to prior art, this work achieves simultaneous wide passband bandwidth and sharp roll-off slope across a wide spectrum while maintaining comparable NF, system linearity, and power consumption.
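Two quantities used in this comparison are easy to reproduce: the IIP3 extrapolated from a two-tone test, and the blocker offset normalized to the filter bandwidth. The sketch below uses the standard third-order extrapolation, which assumes the IM3 product rises 3dB per dB of input power; the tone power levels are illustrative placeholders, not measured data from this work.

```python
def iip3_dbm(p_in_dbm: float, p_fund_out_dbm: float, p_im3_out_dbm: float) -> float:
    """Two-tone IIP3 extrapolation: input power plus half the fundamental-to-IM3 gap."""
    return p_in_dbm + (p_fund_out_dbm - p_im3_out_dbm) / 2.0


def normalized_offset(delta_f_hz: float, bandwidth_hz: float) -> float:
    """Blocker offset expressed in units of the filter 3 dB bandwidth."""
    if bandwidth_hz <= 0.0:
        raise ValueError("bandwidth must be positive")
    return delta_f_hz / bandwidth_hz


if __name__ == "__main__":
    # Illustrative levels only: -10 dBm per tone in, -12 dBm fundamental, -82 dBm IM3 out
    print(f"IIP3 = {iip3_dbm(-10.0, -12.0, -82.0):.1f} dBm")
    # 40 MHz blocker offset and 40 MHz bandwidth, as quoted in the text
    print(f"offset / BW = {normalized_offset(40e6, 40e6):.1f}")
```

Normalizing the offset to the bandwidth is what makes blocker results from filters of very different bandwidths comparable.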

### Acknowledgements:

This work is partially supported by DARPA through the RF FPGA program and NSF through the EARS program.

### References:

- [1] R. Chen and H. Hashemi, "Passive Coupled-Switched-Capacitor-Resonator-Based Reconfigurable RF Front-End Filters and Duplexers," *IEEE RFIC*, pp. 138-141, May 2016.
- [2] Y. Xu and P. Kinget, "A Switched-Capacitor RF Front End with Embedded Programmable High-Order Filtering," *IEEE JSSC*, vol. 51, no. 5, pp. 1154-1167, May 2016.
- [3] Y. Lien et al., "A High-Linearity CMOS Receiver Achieving +44dBm IIP3 and +13dBm B1dB for SAW-less LTE Radio," *ISSCC*, pp. 412-414, Feb. 2017.
- [4] G. Qi et al., "A 0.7 to 1 GHz Switched-LC N-Path LNA Resilient to FDD-LTE Self-Interference at ≥40 MHz Offset," *IEEE RFIC*, pp. 276-279, May 2017.
- [5] S. Hameed and S. Pamarti, "A Time-Interleaved Filtering-by-Aliasing Receiver Front-End with >70dB Suppression at <4×Bandwidth Frequency Offset," *ISSCC*, pp. 418-419, Feb. 2017.



Figure 26.9.1: (a)-(e) Evolution of the proposed filter topology along with representative corresponding transfer functions, (f) Performance summary of past monolithic CMOS RF band-pass filters.



Figure 26.9.3: (a) Representative measured filter response with a 40MHz bandwidth centered on  $f_{\text{LO}}=875\text{MHz}$ , (b) Representative measured filter shapes with bandwidth of 30, 40, and 50MHz.



Figure 26.9.5: (a) Measured output signal and IM3 power levels versus input power, (b) Measured insertion loss versus input-signal power level, (c) Measured out-of-band (OOB) IIP3 and  $B_{1\text{dB}}$  as a function of OOB blocker frequency offset  $\Delta f$ . In all plots, the filter response is centered on  $f_{\text{LO}}=875\text{MHz}$  with  $BW_{3\text{dB}}=40\text{MHz}$  and  $f_{\text{SIG}}=864\text{MHz}$ .



Figure 26.9.2: Schematic of the proposed filter.



Figure 26.9.4: Representative measured small-signal S-parameters, group delay, and NF of filters with similar frequency response centered on different frequencies.

|                                | RFIC 2016 [1] | JSSC 2016 [2]            | ISSCC 2017 [3]           | RFIC 2017 [4]              | ISSCC 2017 [5]                                  | This Work                        |
|--------------------------------|---------------|--------------------------|--------------------------|----------------------------|-------------------------------------------------|----------------------------------|
| Technology                     | 65 nm CMOS    | 40 nm CMOS               | 28 nm CMOS               | 180 nm CMOS SOI            | 65 nm CMOS                                      | <b>65 nm CMOS</b>                |
| Topology                       | N-Path        | Charge-Domain Filtering  | N-Path                   | N-Path +LNA                | Filtering by Aliasing                           | <b>N-Path</b>                    |
| Tuning Range (GHz)             | 0.5 - 1.1     | 0.1 - 0.7                | 0.1 - 2.0                | 0.7 - 1.0                  | 0.1 - 1.0                                       | <b>0.8 - 1.1</b>                 |
| Filter Order                   | 4             | 1 - 4                    | 5                        | 8                          | NA                                              | <b>13</b>                        |
| Adjustable Transmission Zero   | No            | No                       | No                       | No                         | Yes                                             | <b>Yes</b>                       |
| Roll-Off Slope (dB/100 MHz)    | 15 *          | 98 *                     | 18.3 *                   | 120 *                      | 49 * 800 *                                      | <b>100</b>                       |
| BW <sub>3dB</sub> (MHz)        | 30            | 3.2 - 4.8                | 13                       | 8 *                        | 40                                              | <b>2.5</b>                       |
| ICP <sub>-1dB</sub> (dBm)      | 5             | -27 *                    | -6 *                     | NA                         | -7 *                                            | <b>7</b>                         |
| B <sub>-1dB</sub> (dBm)        | 11            | 15<br>(Δf/BW = 6.2) | 13<br>(Δf/BW = 6.2) | 8<br>(Δf/BW = 5.0)    | 9.5 (Δf/BW = 2.0)<br>13 (Δf/BW = 4.0) | <b>9<br/>(Δf/BW = 1.0)</b>  |
| IIP3-IB (dBm)                  | 19.2          | NA                       | 5                        | NA                         | 8                                               | <b>25</b>                        |
| IIP3-OOB (dBm)                 | 26            | 24<br>(Δf/BW = 6.2) | 44<br>(Δf/BW = 6.2) | 32.3<br>(Δf/BW = 5.0) | 21<br>(Δf/BW = 1.2)                        | <b>24<br/>(Δf/BW = 1.0)</b> |
| NF (dB)                        | 3.8 - 5.8     | 6.8 - 9.7                | 4.1 - 10.5               | 5.5 - 6.4                  | 7                                               | <b>5.0 - 8.6</b>                 |
| Power (mW)                     | 15 - 25       | 59 - 105                 | 38 - 96                  | 42 - 57                    | 64 - 84                                         | <b>80 - 97</b>                   |
| Active Area (mm <sup>2</sup> ) | 0.50 *        | 2.03                     | 0.49                     | 2.2                        | 2.3                                             | <b>1.9</b>                       |

\* Estimated from figures

Figure 26.9.6: Performance summary and comparison.



Figure 26.9.7: Die micrograph.

## 26.10 A 128-Pixel 0.56THz Sensing Array for Real-Time Near-Field Imaging in 0.13 $\mu$ m SiGe BiCMOS

Philipp Hillger<sup>1</sup>, Ritesh Jain<sup>1</sup>, Janusz Grzyb<sup>1</sup>, Laven Mavarani<sup>1</sup>, Bernd Heinemann<sup>2</sup>, Gaetan Mac Grogan<sup>3</sup>, Patrick Mounaix<sup>4</sup>, Thomas Zimmer<sup>5</sup>, Ullrich Pfeiffer<sup>1</sup>

<sup>1</sup>University of Wuppertal, Wuppertal, Germany

<sup>2</sup>IHP, Frankfurt (Oder), Germany

<sup>3</sup>Institut Bergonié, Bordeaux, France

<sup>4</sup>CNRS, Talence, France

<sup>5</sup>University of Bordeaux, Talence, France

Real-time terahertz video cameras are regarded as key enabler systems for numerous applications. Unfortunately, their spatial resolution is fundamentally restricted by the diffraction limit. Near-field-scanning optical microscopy (NSOM) is used in the THz domain to break through this limit [1]. Recently reported THz near-field sensors based on silicon technology promise significant improvements compared to NSOM with respect to sensor sensitivity, system cost, and scanning time [2,3]. However, only single-pixel implementations with unmodulated CW sources have been presented so far, which limits the sensor's dynamic range (DR) due to detector 1/f noise. This paper scales up the research of near-field sensing into larger surfaces made of a plurality of super-resolution pixels with video-rate imaging capabilities. The 128-pixel 0.56THz imaging array includes all functions such as illumination, sensing, detection, and digital readout on a single silicon chip.

Figure 26.10.1 shows the scalable circuit architecture implemented in a 0.13 $\mu$ m SiGe-BiCMOS technology (IHP) with  $f_T/f_{max}=300/450$ GHz. The core of each pixel is a cross-bridged double-split-ring resonator (SRR) with a lateral resolution on the  $\mu$ m scale [2] (Fig. 26.10.2). The sensor exposes a sharply confined electric field to the top surface of the chip around two short strips. With an object placed in the sensing volume, the capacitive part of the resonator increases and the SRR resonance frequency shifts towards lower frequencies. For a fixed excitation frequency, this leads to a change in transmitted power and to a dielectric-permittivity-based imaging contrast with the maximum response caused by metallic objects [2,3].
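The sensing mechanism can be illustrated with a lumped LC model of the resonator, in which added object capacitance lowers the resonance frequency. This is a rough sketch under the assumption that the SRR behaves like a single ideal LC resonance; the inductance and the capacitance increments are hypothetical and are not extracted from the actual structure.

```python
import math


def resonance_hz(l_h: float, c_f: float) -> float:
    """Resonance frequency of an ideal lumped LC resonator."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def loaded_resonance_hz(l_h: float, c_f: float, delta_c_f: float) -> float:
    """Resonance after an object adds delta_c to the capacitive part."""
    return resonance_hz(l_h, c_f + delta_c_f)


if __name__ == "__main__":
    f_unloaded = 0.56e12  # nominal operating frequency from the text
    l_srr = 20e-12        # hypothetical 20 pH equivalent inductance
    c_srr = 1.0 / ((2.0 * math.pi * f_unloaded) ** 2 * l_srr)
    for fraction in (0.0, 0.02, 0.05, 0.10):  # hypothetical relative capacitance increase
        f_res = loaded_resonance_hz(l_srr, c_srr, fraction * c_srr)
        print(f"dC/C = {fraction:4.0%}: f_res = {f_res / 1e9:6.1f} GHz")
```

Because the excitation frequency is fixed, even a small downward shift of the resonance moves the operating point along the resonance slope and changes the transmitted power.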

The array consists of two 64-pixel-long rows of SRRs and power detectors in a vertically mirrored arrangement. Each row is divided into 16 subarrays of 4 pixels, each driven from a single chopped free-running triple-push oscillator (TPO) (Fig. 26.10.3) to provide the highest possible array density. All 4 sensors in a subarray are interconnected with a corresponding TPO by means of a 4-way equal-power splitter consisting of two cascaded, 3.9dB-insertion-loss (540GHz) modified Wilkinson power splitters [4] in a stripline configuration (Fig. 26.10.2). The subarray schematic is provided in Fig. 26.10.3. The oscillator is based on a 534-to-562GHz Colpitts TPO with 28 $\mu$ W output power presented in [2]. A current-mirror biasing scheme is used for external-reference current control ( $I_{ref}$ ) and source chopping with a 1.2V CMOS controlled base pull-down logic. The TPO oscillation is periodically turned on/off by pulling down the common DC base bias with an externally supplied chopping signal (chop). The chopped THz wave is then coupled capacitively (25fF) to a SiGe-HBT power detector with an estimated current responsivity of 0.48A/W [2]. The pixels comprise a beta-helper current-mirror biasing scheme that can be controlled with a 3.3V CMOS selection logic. The array is operated with only one TPO and one detector turned on at a time enabling a fully scalable circuit architecture with a total power consumption of 79mW. All detectors share an active PMOS load to convert the detector output current to voltage.

The detector output signal is further processed by a 0-to-42dB gain active bandpass filter (BP) and a lock-in amplifier (LIA) as shown in Fig. 26.10.1. The LIA comprises a passive CMOS mixer, a third-order switched-capacitor lowpass filter, and an instrumentation amplifier. The lowpass corner can be externally adjusted with the LPclk frequency to optimize the frame rate vs. the SNR. The LIA output is subsequently sampled with a 6b flash ADC with programmable reference voltages. The flash architecture was chosen to support fast oversampling with successive approximation algorithms and dynamic range adjustments to accommodate oscillator/detector PVT variations and varying signal strengths due to different materials. The ASIC part provides registers for reference voltage and gain settings, address decoders, multiplexers, clock dividers, and an SPI interface.
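The benefit of chopping plus lock-in detection can be shown with a small numerical sketch: multiplying the chopped detector output by a ±1 reference and averaging recovers the chopped amplitude while rejecting a static offset. The model below is idealized (noise-free, perfectly synchronous reference, averaging as the low-pass filter) and all signal values are hypothetical.

```python
def chopped_detector_output(amplitude, offset, samples_per_half_period, periods):
    """Build an on/off chopped signal riding on a static offset, plus its +/-1 reference."""
    samples, reference = [], []
    for _ in range(periods):
        samples += [amplitude + offset] * samples_per_half_period  # source on
        reference += [1.0] * samples_per_half_period
        samples += [offset] * samples_per_half_period              # source off
        reference += [-1.0] * samples_per_half_period
    return samples, reference


def lock_in_amplitude(samples, reference):
    """Mix with the reference and average over whole periods (ideal low-pass filter)."""
    if len(samples) != len(reference) or not samples:
        raise ValueError("samples and reference must be non-empty and equally long")
    mixed_mean = sum(s * r for s, r in zip(samples, reference)) / len(samples)
    return 2.0 * mixed_mean  # factor 2 undoes the 50% duty cycle of on/off chopping


if __name__ == "__main__":
    # Hypothetical values: 0.3 mV signal on a 5 mV static offset
    sig, ref = chopped_detector_output(0.3, 5.0, samples_per_half_period=8, periods=32)
    print(f"recovered amplitude = {lock_in_amplitude(sig, ref):.3f} mV")
```

The same mixing step also shifts slowly varying disturbances such as 1/f noise away from DC, where the low-pass filter removes them; this sketch only demonstrates the static-offset case.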

Full-wave EM simulations were conducted to investigate the impact of the power-splitting network and the coupling capacitor on the single sensor performance and the cross-coupling between pixels. Simulation results of the transmitted power through the 4-way power splitter and the SRRs are presented in Fig. 26.10.2 for one detector (IN-to-Det<sub>1</sub>) as a function of frequency and object material properties. Note that the slope above the resonance frequency is well preserved compared to a single sensor [2] because of the good isolation (>30dB, Fig. 26.10.2) and return loss provided by the divider network. Because of the specific row arrangement, two different coupling phenomena need to be considered. First, the coupling through the radiative and reactive near-field zones between the pixels that are not connected through the divider network was analyzed and found to be negligible (<-46dB). Second, the parasitic coupling between sensors connected to the same divider network was found to be more relevant, due to a finite variation of the power division ratios at the internal divider nodes as a function of the object loading conditions. Figure 26.10.2 presents the simulated power transmission (IN-to-Det<sub>1</sub>) in a 4-way sensor arrangement across frequency for different object loading conditions while influenced by the closest-proximity pixel (Det<sub>2</sub>) with varying material properties. The worst-case scenario occurs for a perfect electric conductor on Det<sub>2</sub> and no object on Det<sub>1</sub>. This shifts the resonance frequency down and causes a -27dB change in relative power at the detector input, referred to the oscillator power available at the resonator input. This corresponds to a dielectric permittivity uncertainty of around 0.1 for an object with a relative permittivity of 2 (Fig. 26.10.2).

The array can be operated in an analog or a digital readout mode. In analog mode, the single-pixel sensor performance was measured using setup a) as shown in Fig. 26.10.4 with the BP output signal muxed to a pad (TP1, Fig. 26.10.1). The dynamic range (DR), defined as the ratio between the maximum sensor response for a metal tip and the spot noise at a chopping frequency ( $f_{chop}$ ), is typically around 93dB at  $f_{chop}=200$ kHz. The DR variations are within 18dB due to a global biasing scheme unable to accommodate pixel-to-pixel process variations. The measured parasitic coupling between the 4 subarray pixels is smaller than -23dB. In digital mode, the sensor performance was measured with setup b) shown in Fig. 26.10.4 by analyzing the ADC output for a single pixel in the time domain. The DR of the digital readout is 37dB with 64 averaged samples per pixel (1MHz sampling clock) resulting in a frame rate of 28fps for the whole array ( $f_{chop}=200$ kHz, 2.45kHz lowpass corner). Imaging results are shown in Fig. 26.10.5 for a Ni-mesh in near-direct contact with the chip surface. The mesh exhibits a 50 $\mu$ m bar width and a 250 $\mu$ m bar pitch. The 2D image was acquired during a single 1D lateral scan at a 1 $\mu$ m step size. The sensors resolve the bar edges with 15 $\mu$ m resolution according to a 10-to-90% rising-to-falling edge criterion, which is slightly larger than the previously reported 10 to 12 $\mu$ m [2] due to the soft bar edges of the mesh.
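The effect of averaging ADC samples on the digital-mode dynamic range can be estimated with the usual rule that averaging N samples with uncorrelated noise improves the SNR by 10·log10(N). The sketch below relies on that white-noise assumption, which the source does not state; it also computes the raw acquisition time for the averaged samples, which ignores filter settling and therefore is not a frame-rate estimate.

```python
import math


def averaging_gain_db(num_samples: int) -> float:
    """SNR improvement from averaging, assuming uncorrelated (white) noise."""
    if num_samples < 1:
        raise ValueError("need at least one sample")
    return 10.0 * math.log10(num_samples)


def acquisition_time_s(num_samples: int, sample_rate_hz: float) -> float:
    """Time needed to collect the averaged samples for one pixel (no settling time)."""
    if sample_rate_hz <= 0.0:
        raise ValueError("sample rate must be positive")
    return num_samples / sample_rate_hz


if __name__ == "__main__":
    n, f_s = 64, 1e6  # 64 averaged samples at a 1 MHz sampling clock, from the text
    print(f"averaging gain = {averaging_gain_db(n):.1f} dB")
    print(f"acquisition time per pixel = {acquisition_time_s(n, f_s) * 1e6:.0f} us")
```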

Figure 26.10.6 shows a comparison of this work with prior near-field imaging systems and Fig. 26.10.7 shows the die micrograph. This work advances the prior art in Fig. 26.10.6 by showing a monolithically integrated THz super-resolution imaging SoC with real-time capability. The array provides a scanning time reduction of more than two orders of magnitude and increases the dynamic range by up to 51dB compared to previously reported single pixel sensors in [2,3].

### Acknowledgements:

This work was partially funded by the DFG priority program SPP 1857 ESSENCE and a Reinhart Koselleck project.

### References:

- [1] A. Adam, "Review of Near-Field Terahertz Measurement Methods and Their Applications," *J. Infrared, Millimeter and Terahertz Waves*, vol. 32, no. 8, pp. 976-1019, Sept. 2011.
- [2] J. Grzyb et al., "Solid-State Terahertz Superresolution Imaging Device in 130-nm SiGe BiCMOS Technology," *IEEE TMTT*, DOI: 10.1109/TMTT.2017.2684120, in press.
- [3] J. Grzyb et al., "A 0.55 THz Near-Field Sensor With a  $\mu$ m-Range Lateral Resolution Fully Integrated in 130 nm SiGe BiCMOS," *IEEE JSSC*, vol. 51, no. 12, pp. 3063-3077, Dec. 2016.
- [4] S. Horst et al., "Modified Wilkinson Power Dividers for Millimeter-Wave Integrated Circuits," *IEEE TMTT*, vol. 55, no. 11, pp. 2439-2446, Nov. 2007.
- [5] T. Mitsunaka et al., "CMOS Biosensor IC Focusing on Dielectric Relaxations of Biological Water With 120 and 60 GHz Oscillator Arrays," *IEEE JSSC*, vol. 51, no. 11, pp. 2534-2544, Nov. 2016.



Figure 26.10.1: Block diagram of the 128-pixel near-field array.



Figure 26.10.2: Illustration of the subarray arrangement and sensor simulation results.



Figure 26.10.3: Schematic of a single 4x1-pixel subarray and the shared active load.



Figure 26.10.4: Analog/digital sensor characterization: Measurement setup a) and b) results.



Figure 26.10.5: 2D imaging results for a 1D scan of a Ni-mesh (1μm step size).

| Technology   | Resolution [μm] | Frequency [THz] | Dynamic range [dB]                       | Integration level        | Number of pixels | Ref.      |
|--------------|-----------------|-----------------|------------------------------------------|--------------------------|------------------|-----------|
| NSOM         | typ. 3.3-40     | 0.2-2.5         | -                                        | external detector/source | 1                | [1]       |
| 0.13-μm SiGe | est. 10-12      | 0.534-0.562     | 42 <sup>1</sup>                          | detector/source          | 1                | [2]       |
| 0.13-μm SiGe | est. 8-10       | 0.533-0.555     | 20 <sup>2</sup>                          | detector/source          | 1                | [3]       |
| 65-nm CMOS   | -               | 0.06,0.12       | -                                        | fully integrated         | 128,192          | [5]       |
| 0.13-μm SiGe | est. 10-12      | 0.534-0.562     | 93 <sup>3</sup><br>37@28fps <sup>4</sup> | fully integrated         | 128              | This Work |

<sup>1</sup> estimated to be equal to the SNR of the image in Fig. 19 [2]

<sup>2</sup> estimated to be equal to the SNR of the image in Fig. 25.1.5 [3]

<sup>3</sup> measured at the bandpass output (TP1), referenced to a bandwidth of 1Hz

<sup>4</sup> DR of the full readout chain with 64 averaged ADC samples for every pixel

Figure 26.10.6: Comparison table for near-field imagers.



Figure 26.10.7: Die micrograph (stitched). The blacked-out region contains circuits that are not related to the array.

# Session 27 Overview: *Power-Converter Techniques*

## POWER MANAGEMENT SUBCOMMITTEE



**Session Chair:**  
**Makoto Takamiya**  
*University of Tokyo, Tokyo, Japan*



**Associate Chair:**  
**Yen Hsun Hsu**  
*MediaTek, Hsinchu, Taiwan*

### Subcommittee Chair: **Axel Thomsen**, Cirrus Logic, Austin, TX

The session on Power Converter Techniques presents improvements in power density, power efficiency, and power dissipation for switched-capacitor, hybrid, linear, and inductor-based DC-DC converters and power modulators. The first paper presents a fully integrated fine-grained rational buck-boost switched-capacitor converter. The next four papers present innovative ideas in inductor-based DC-DC converters, including capacitor-assisted hybrid DC-DC converters. These are followed by two high-frequency HPUE-capable envelope-tracking power modulators. An LDO is also presented that achieves good transient response under a Hi-Lo-Hi transient stimulus. Finally, the last paper introduces an on-chip resonant-gate-drive SC converter for near-threshold computing.



**1:30 PM**  
**27.1 A 0.22-to-2.4V-Input Fine-Grained Fully Integrated Rational Buck-Boost SC DC-DC Converter Using Algorithmic Voltage-Feed-In (AVFI) Topology Achieving 84.1% Peak Efficiency at 13.2mW/mm<sup>2</sup>**

*Y. Jiang, University of Macau, Macau, China*

In Paper 27.1, the University of Macau describes an Algorithmic Voltage-Feed-In (AVFI) topology for systematic rational VCR generation. The converter achieves 84.1% peak efficiency at 13.2mW/mm<sup>2</sup> with 0.22-to-2.4V input, demonstrating a >13× power density improvement.



**2:00 PM**  
**27.2 A 10MHz Time-Domain-Controlled Current-Mode Buck Converter with 8.5% to 93% Switching Duty Cycle**

*J.-G. Kang, Hanyang University, Seoul, Korea*

In Paper 27.2, Hanyang University presents a current-mode time-domain-controlled buck converter, which eliminates the need for a current sensor and prevents sub-harmonic oscillation without slope compensation. This 10MHz current-mode buck converter provides a wide output range from 0.15V to 1.69V from a 1.8V input, with a peak efficiency of 94.9%.



**2:30 PM**  
**27.3 An 86% Efficiency SIMO DC-DC Converter with One Boost, One Buck, and a Floating Output Voltage for Car-Radio**

*A. Salimath, University of Pavia, Pavia, Italy*

In Paper 27.3, the University of Pavia demonstrates a SIMO DC-DC converter generating three supply voltages suitable for a Class-D audio amplifier of a car-radio. The circuit withstands the 4-to-40V range of car battery variations and regulates in the range of 4.5 to 27V. Designed using a 0.11µm BCD process, switched at 2.4MHz, the SIMO offers a peak efficiency of 86% at 2.7W of output power with active area of 2.5mm<sup>2</sup>.



**3:15 PM**  
**27.4 A 97% High-Efficiency 6µs Fast-Recovery-Time Buck-Based Step-Up/Down Converter with Embedded 1/2 and 3/2 Charge-Pumps for Li-Ion Battery Management**

*M.-W. Ko, KAIST, Daejeon, Korea*

In Paper 27.4, KAIST presents a step-up/down converter IC for Li-ion battery management that embeds 1/2 and 3/2 charge pumps in a buck topology. Fabricated in a 0.18µm BCD process, it achieves 97% maximum efficiency over a wide load-current range of 0.03 to 1A, and a fast recovery time of 6µs with hysteretic control.



**3:30 PM**  
**27.5 A 95.2% Efficiency Dual-Path DC-DC Step-Up Converter with Continuous Output Current Delivery and Low Voltage Ripple**

*S-U. Shin, KAIST, Daejeon, Korea*

In Paper 27.5, KAIST describes a dual-path step-up DC-DC converter with two current paths in an inductor and a flying capacitor in different time-slots. The output ripple voltage is reduced to less than 15mV owing to the continuous output delivery current and furthermore, its right-half-plane zero is alleviated. This converter has peak efficiency of 95.2% even with a DCR up to 200mΩ.



**3:45 PM**  
**27.6 An 87.1% Efficiency RF-PA Envelope-Tracking Modulator for 80MHz LTE-Advanced Transmitter and 31dBm PA Output Power for HPUE in 0.153µm CMOS**

*C-Y. Ho, MediaTek, Hsinchu, Taiwan*

In Paper 27.6, MediaTek demonstrates an envelope-tracking modulator (ETM) in 0.153µm CMOS for an 80MHz LTE-A transmitter with a dual-mode AC feed-forward Class-AB linear amplifier. This 80MHz ETM achieves -38.1dBc ACLR at 26dBm PA output power, and its peak efficiency is 87.1% and 81.2% for 20MHz and 80MHz, respectively.



**4:15 PM**  
**27.7 A 2TX Supply Modulator for Envelope-Tracking Power Amplifier Supporting Intra- and Inter-Band Uplink Carrier Aggregation and Power Class-2 High-Power User Equipment**

*T. Nomiyama, Samsung Electronics, Hwaseong, Korea*

In Paper 27.7, Samsung demonstrates a single 2TX SM-IC supporting two independent TXs with Power Class 2. Only one buck-boost is used and shared for both TXs by swapping capacitors, and buck converters are equipped with return-to-battery switching for efficiency and noise. The SM achieves max 84.6% efficiency and -133dBm/Hz noise. The ET-PA of LTE Band 41 reaches 42.7% PAE while delivering 29.4dBm power with -38.2dBc ACLR.



**4:45 PM**  
**27.8 94% Power-Recycle and Near-Zero Driving-Dead-Zone N-Type Low-Dropout Regulator with 20mV Undershoot at Short-Period Load Transient of Flash Memory in Smart Phone**

*W-C. Chen, MediaTek, Hsinchu, Taiwan*

In Paper 27.8, MediaTek presents an LDO that features a virtual-ground-based dynamic-power-recycling buffer and anti-ringing feed-forward compensation. It achieves 20mV undershoot under the short-period H-L-H load transient characteristic of flash memory. The near-zero driving dead-zone of the buffer improves transient response while maintaining high current efficiency with 94% power recycling.



**5:00 PM**  
**27.9 An On-Chip Resonant-Gate-Drive Switched-Capacitor Converter for Near-Threshold Computing Achieving 70.2% Efficiency at 0.92A/mm<sup>2</sup> Current Density and 0.4V Output**

*M. Abdelfattah, Ohio State University, Columbus, OH*

In Paper 27.9, Ohio State University demonstrates a fully integrated switched-capacitor converter for near-threshold computing in 45nm SOI. A single 100pH on-chip inductor is used to reduce the switching losses of multiple power FETs, thus maximizing efficiency without sacrificing current density. The converter achieves 70.2% efficiency at 0.92A/mm<sup>2</sup> and 0.4V output using an area-efficient resonant-gate-drive scheme. It operates from a 1V input and occupies 0.3mm<sup>2</sup>, only 5% of which is for the inductor.

## 27.1 A 0.22-to-2.4V-Input Fine-Grained Fully Integrated Rational Buck-Boost SC DC-DC Converter Using Algorithmic Voltage-Feed-In (AVFI) Topology Achieving 84.1% Peak Efficiency at 13.2mW/mm<sup>2</sup>

Yang Jiang<sup>1</sup>, Man-Kay Law<sup>1</sup>, Pui-In Mak<sup>1</sup>, Rui P. Martins<sup>1,2</sup>

<sup>1</sup>University of Macau, Macau, China,

<sup>2</sup>Instituto Superior Tecnico/University of Lisboa, Lisbon, Portugal

Most existing switched-capacitor (SC) DC-DC converters only offer a few voltage conversion ratios (VCRs), leading to significant efficiency fluctuations under wide input/output dynamics (e.g. up to 30% in [1]). Consequently, systematic SC DC-DC converters with fine-grained VCRs (FVCRs) become attractive to achieve high efficiency over a wide operating range. Both the Recursive SC (RSC) [2,3] and Negator-based SC (NSC) [4] topologies offer systematic FVCR generations with high conductance, but their binary-switching nature fundamentally results in considerable parasitic loss. In bulk CMOS, the restriction of using low-parasitic MIM capacitors for high efficiency ultimately limits their achievable power density to <1mW/mm<sup>2</sup>. This work reports a fully integrated fine-grained buck-boost SC DC-DC converter with 24 VCRs. It features an algorithmic voltage-feed-in (AVFI) topology to systematically generate any arbitrary buck-boost rational ratio with optimal conduction loss while achieving the lowest parasitic loss compared with [2,4]. With 10 main SC cells (MCs) and 10 auxiliary SC cells (ACs) controlled by the proposed reference-selective bootstrapping driver (RSBD) for wide-range efficient buck-boost operations, the AVFI converter in 65nm bulk CMOS achieves a peak efficiency of 84.1% at a power density of 13.2mW/mm<sup>2</sup> over a wide range of input (0.22 to 2.4V) and output (0.85 to 1.2V).
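The efficiency fluctuation caused by a sparse VCR set can be illustrated with a short sketch. It relies on the common idealization that an SC converter regulating below its no-load output behaves like a linear regulator, so its efficiency is bounded by $V_{OUT}/(M \cdot V_{IN})$, where $M$ is the ideal gain of the selected ratio. The two gain sets below are hypothetical and are not the 24 VCRs of this work; conduction, parasitic and switching losses are ignored.

```python
from fractions import Fraction

def best_ideal_efficiency(v_in, v_out, gains):
    """Loss-free efficiency bound: use the smallest gain M with M * v_in >= v_out."""
    usable = [m for m in gains if float(m) * v_in >= v_out]
    if not usable:
        return None  # no available ratio can reach v_out from this v_in
    m = min(usable)
    return v_out / (float(m) * v_in)

# Hypothetical ideal gains (V_OUT / V_IN at no load), for illustration only.
coarse = [Fraction(1, 2), Fraction(1, 1), Fraction(2, 1)]
fine = [Fraction(k, 8) for k in range(4, 17)]  # 1/2 to 2 in steps of 1/8

for v_in in (0.6, 0.9, 1.3, 1.7):
    print(v_in,
          best_ideal_efficiency(v_in, 1.0, coarse),
          best_ideal_efficiency(v_in, 1.0, fine))
```

With only a few ratios, the bound drops sharply whenever $V_{IN}$ sits just past a ratio boundary; a finer ratio grid keeps the selected gain closer to the required one across the input range.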

Figure 27.1.1 shows the system diagram of the proposed AVFI topology, which takes advantage of the quasi-Dickson topology to achieve optimal conduction and parasitic losses for rational buck/boost VCR generation. It consists of  $n$  stages of cascaded unit rational cells (RC) carrying equal charge flow, leading to low intrinsic conduction loss as in RSC and NSC under the same VCR. By feeding in either  $V_{IN}$  or  $V_{OUT}$  algorithmically into each RC stage, it can realize any arbitrary VCR, from  $(n+1):1$  to  $(n+1):n$  in buck mode and from  $n:(n+1)$  to  $1:(n+1)$  in boost mode, respectively. Voltage-feed-in (VFI) coefficients  $a_i$  and  $b_i$  determine the involvement of  $V_{IN}/V_{OUT}$  in each power cell, while  $m_i$  corresponds to the power cell configuration for either Dickson or charge-path folding (QF) mode operation. Depending on the VCR requirement, the AVFI algorithm achieves a unique topology by configuring each RC into one of the 8 possible modes according to the conversion type (i.e. buck/boost) and the VFI coefficients. As shown in Fig. 27.1.1, the Dickson mode involves two cases, TT (top-in-top-out) and BB (bottom-in-bottom-out), as distinguished by the dual-phase charge flow ( $Q_{flow}$ ) path within an RC cell. In QF mode, the in/out  $Q_{flow}$  happens on different plates, denoted by TB and BT. In conventional Dickson topology, the cascaded power cells intrinsically ensure same plate inter-cell charge transfer ( $Q_{tran}$ ), exhibiting the  $C_{i,top} \rightarrow C_{i+1,top}$  or  $C_{i,bot} \rightarrow C_{i+1,bot}$  pattern with small bottom-plate switching voltage ( $\Delta V_{CB}$ ). This is in contrast to binary (including RSC and NSC) and series-parallel topologies which incorporate  $Q_{tran}$  patterns  $C_{i,top} \rightarrow C_{i+1,bot}$  and  $C_{i,bot} \rightarrow C_{i+1,top}$ , leading to sub-optimal  $\Delta V_{CB}$  and hence excessive parasitic loss. 
In the AVFI topology, RCs operating in QF mode can inherently perform a  $Q_{tran}$ -path folding function to reduce the parasitic loss due to the direct cascading of TT and BB cells. An illustrative example of a 7:4 buck AVFI converter is shown in Fig. 27.1.1.
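The VCR ranges quoted above can be enumerated with a few lines of code. This is only a reading of the stated ranges, assuming every integer step between the end points is reachable and writing each VCR as an ideal gain $V_{OUT}/V_{IN}$; the function name `avfi_gains` is hypothetical, and the sketch does not model the VFI coefficients or the cell modes.

```python
from fractions import Fraction

def avfi_gains(n):
    """Ideal V_OUT/V_IN gains for n cascaded rational cells, per the quoted ranges."""
    # Buck: V_IN:V_OUT from (n+1):1 to (n+1):n, i.e. gains 1/(n+1) ... n/(n+1).
    buck = [Fraction(k, n + 1) for k in range(1, n + 1)]
    # Boost: V_IN:V_OUT from n:(n+1) to 1:(n+1), i.e. gains (n+1)/n ... (n+1)/1.
    boost = [Fraction(n + 1, k) for k in range(n, 0, -1)]
    return buck, boost

buck, boost = avfi_gains(6)
print([str(g) for g in buck])   # includes 4/7, the 7:4 buck example
print([str(g) for g in boost])
```

Note that `Fraction` reduces each ratio to lowest terms, so gains produced by different stage counts (for example 2/4 and 1/2) compare equal.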

Figure 27.1.2 shows the theoretical analysis of the AVFI converter with 24 VCRs (11 buck + 13 boost) with comparison to 4-stage RSC (RSC-4) and 3-stage NSC (NSC-3), over the target VCR range from 2:1 to 1:7. The AVFI achieves the same conduction loss as RSC-4 and NSC-3 in both buck and boost modes. Although NSC-3 offers more VCRs via multi-coefficients feedback, many of them show higher conduction losses and can hardly contribute to efficiency improvement especially at heavy load. Due to the quasi-Dickson property of the AVFI converter, we can demonstrate a ~50% parasitic-loss-factor ( $M_{PAR}$ ) improvement in buck mode, except for 2:1 as all 3 cases result in the same topology. In boost mode, the AVFI topology shows a quasi-linear parasitic loss profile instead of exponentially increasing as in RSC-4 and NSC-3. Figure 27.1.2 also compares the theoretical performance among different topologies.  $C_{fly}$  has a bottom-plate parasitic of 8% (typical for MOSCAP), and  $C_{total}$ ,  $V_{OUT}$  and  $I_{load}$  are set to 15nF, 1V and 20mA, respectively. The AVFI converter with 24 VCRs shows the best overall efficiency, with >6% efficiency improvement in most VCRs.

Figure 27.1.3 shows the implemented AVFI converter using a partitionable power stage (10MCs + 10ACs) with a scaling ratio (SR) of 5. Unlike RSC and NSC that require weighted power cells, the uniform charge-flow property in the AVFI converter can facilitate modular implementations as shown in Fig. 27.1.1. Power cell partitioning can reduce the total number of required power cells ( $N_{CT}$ ) induced by fine-grained VCRs with complete capacitor utilization. The 10MCs+10ACs structure can theoretically realize up to 79 VCRs (40 buck + 39 boost) with a 3x  $N_{CT}$  reduction (from 60 to 20). We select a total of 24 VCRs (11 buck + 13 boost) out of 79 according to the target conversion range and power level. The cell partitioning details are summarized in Fig. 27.1.3.

Conventionally, the bootstrapping technique can only accommodate one fixed reference node across a power switch, so the gate control voltage must be connected to the absolute system high/low levels to ensure a proper switch-off state under the dynamic node conditions for FVCR. This either increases the switch on-resistance, or mandates the use of high-voltage switches. The proposed RSBD resolves this issue by adaptively selecting the proper reference node for accurate switch-off control while ensuring robust operation with low-voltage power switches to reduce the switching loss. Figure 27.1.4 details the power-cell implementation and the proposed RSBD circuit for the dual-phase control of switch  $T_3$ , which exhibits the greatest design challenge. The P/N switches ( $S_{P/N}$ ) on the top plate ( $T_{1-3}$ ) are alternately activated in buck/boost modes. With 2b external control ( $en$  and  $lv$ ), the proposed RSBD incorporates two reference-selection (RS) blocks that select the proper high/low reference control levels from the periodically varying node voltages. In addition, the 3 control blocks (i.e.  $\phi_{ON}\ ctrl.$ ,  $\phi_{OFF}\ ctrl.$  and  $\phi_{dis}\ ctrl.$ ) generate the required gate control signals for the switch on-, off- and disable-states, respectively. The  $\phi_{ON/OFF}\ ctrl.$  takes the selected level from the corresponding RS blocks as reference, then pumps up/down the system clock to generate the required switch driving voltages  $V_{GP}/V_{GN}$  for  $S_{P/N}$ . The  $\phi_{dis}\ ctrl.$  ties the  $S_{P/N}$  gate terminals to the appropriate voltage level during the disable-state. The table in Fig. 27.1.4 summarizes the corresponding  $S_{P/N}$  driving states for  $T_3$ .

The AVFI SC DC-DC converter (Fig. 27.1.7) occupies an area of 2.4mm<sup>2</sup> in 65nm bulk CMOS. The on-chip  $C_{fly}$  (MIM+MOS) and  $C_{OUT}$  (MOS) are ~8nF and ~6nF, respectively. Figure 27.1.5 (top) plots the measured power conversion efficiencies with  $V_{IN}$  varying from 0.23 to 2.3V at  $V_{OUT}=1V$ , showing high consistency with the simulation result except at high VCRs where the MOS capacitance degradation becomes significant. The peak efficiency is 84.1% at a power density of 13.2mW/mm<sup>2</sup>. The performance is similar for  $V_{IN}=0.26$  to 2.4V with  $V_{OUT}=1.2V$ , and for  $V_{IN}=0.22$  to 2.15V with  $V_{OUT}=0.85V$ . Figure 27.1.5 (middle) shows the output power delivery range versus efficiency for different VCRs, demonstrating a maximum  $I_{OUT}$  of up to 80.1mA at 2:1. Figure 27.1.5 (bottom) depicts the measured transient waveforms at an  $I_{OUT}$  step from 4 to 25mA without external capacitors using pulse-skipping modulation, indicating an output ripple voltage ( $V_{rip}$ ) of 60 and 90mV, respectively.

Benchmarked against state-of-the-art FVCR SC DC-DC converters in Fig. 27.1.6, this work achieves the most VCRs, and the highest power density and peak efficiency without using external capacitors. Compared with the RSC-based topology in [3], this work improves the power density by >13x, at a higher peak efficiency and over a wider VCR range.

### Acknowledgements:

The authors thank the Macau FDCT (FDCT069/2016/A2) and University of Macau (MYRG2015-AMSV-00140) for financial support.

### References:

- [1] C. K. Teh and A. Suzuki, "A 2-Output Step-Up/Step-Down Switched-Capacitor DC-DC Converter with 95.8% Peak Efficiency and 0.85-to-3.6V Input Voltage Range," *ISSCC*, pp. 222-223, Feb. 2016.
- [2] L. G. Salem and P. P. Mercier, "An 85%-Efficiency Fully Integrated 15-Ratio Recursive Switched-Capacitor DC-DC Converter with 0.1-to-2.2V Output Voltage Range," *ISSCC*, pp. 88-89, Feb. 2014.
- [3] D. Lutz, et al., "A 10mW Fully Integrated 2-to-13V-Input Buck-Boost SC Converter with 81.5% Peak Efficiency," *ISSCC*, pp. 224-225, Feb. 2016.
- [4] W. Jung, et al., "A Rational-Conversion-Ratio Switched-Capacitor DC-DC Converter Using Negative-Output Feedback," *ISSCC*, pp. 218-219, Feb. 2016.



Figure 27.1.1: Proposed algorithmic voltage-feed-in (AVFI) topology with rational cells for arbitrary quasi-Dickson rational buck/boost VCR generation.



Figure 27.1.2: Simulated efficiency over  $V_{IN}$  with  $C_{BP}=8\%$  (top) and parasitic loss factor (the lower the better) compared to existing topologies (bottom).



Figure 27.1.3: Proposed 10MC+10AC architecture (top) for AVFI converter with 24 VCRs (11 Buck + 13 Boost) and the cell partitioning modes (bottom).



Figure 27.1.4: Rational power cell implementation and proposed RSBD for power switches.



Figure 27.1.5: Measured efficiency versus  $V_{IN}$  range (top), efficiency over output power (middle) and load transient waveforms (bottom).

|                                               | This work                 | D. Lutz<br>ISSCC'16         | C. K. Teh<br>ISSCC'16     | M. Saadat<br>ASSCC'15 | X. Hua<br>CICC'15 | J. Jiang<br>JSSC'17 |
|-----------------------------------------------|---------------------------|-----------------------------|---------------------------|-----------------------|-------------------|---------------------|
| Technology                                    | 65nm CMOS                 | 0.35μm HVCMOS               | 65nm CMOS                 | 0.25μm CMOS           | 65nm CMOS         | 130nm CMOS          |
| Conv. type                                    | Buck-Boost                | Buck-Boost                  | Buck-Boost                | Buck-Boost            | Buck              | Buck                |
| No. of VCR                                    | 11 buck + 13 boost        | 8 buck + 9 boost            | 5 buck + 1 boost          | 4 buck + 4 boost      | 3 buck + 3 boost  | 6 buck              |
| Integrated C <sub>fly</sub>                   | MOS + MIM                 | MIM                         | MOS + off-chip 1μF        | MIM                   | N/A               | Off-chip 4μF        |
| V <sub>IN</sub> [V]                           | 0.22 ~ 2.4                | 2 ~ 13                      | 0.85 ~ 3.6                | 0.6 ~ 2.4             | 0.5 ~ 3.3         | 1.6 ~ 3.3           |
| V <sub>OUT</sub> [V]                          | 0.85 ~ 1.2                | 5                           | 0.1 ~ 1.9                 | 1.2 ~ 1.5             | 1                 | 0.5 ~ 3             |
| I <sub>OUT, MAX</sub> [mA]                    | 80.1                      | 4                           | 10                        | 0.1                   | 0.0033            | 120                 |
| η <sub>peak</sub> [%]                         | Buck: 84.1<br>Boost: 83.2 | Buck: 81.5<br>Boost: 70.9   | Buck: 95.8<br>Boost: 90.5 | 76                    | 70.4              | 91                  |
| P-den@η <sub>peak</sub> [mW/mm <sup>2</sup> ] | Buck: 13.2<br>Boost: 10.2 | *Buck: 0.96<br>*Boost: 0.15 | N/A                       | *0.062                | *0.0069           | N/A                 |

\*Estimated from the corresponding literature

Figure 27.1.6: Performance comparison with state-of-the-art designs.



- 1~2) 10 MC  $C_{fly}$  (MIM + MOS) and  $C_{out}$  (MOS) for  $0^\circ$  and  $180^\circ$  branches
- 3~4) 10 AC  $C_{fly}$  (MIM + MOS) and  $C_{out}$  (MOS) for  $0^\circ$  and  $180^\circ$  branches
- 5~7) RSBDs + digital control
- 8~9) Power switches

Figure 27.1.7: Die micrograph of the proposed AVFI SC DC-DC converter with 24 VCRs.

## 27.2 A 10MHz Time-Domain-Controlled Current-Mode Buck Converter with 8.5% to 93% Switching Duty Cycle

Jin-Gyu Kang, Min-Gyu Jeong, Jeongpyo Park, Changsik Yoo

Hanyang University, Seoul, Korea

Current-mode DC-DC converters offer various advantages over voltage-mode DC-DC converters such as much simpler frequency compensation, automatic over-current protection, and faster transient response [1,2]. For current-mode control, however, an accurate inductor current sensor is required which can be very sensitive to noise. Another concern in designing a current-mode DC-DC converter is the instability under certain operating conditions known as subharmonic oscillation. A peak-current-mode buck converter, for example, may become unstable when its switching duty cycle is larger than 50% and slope compensation is required to ensure stable operation. While both current-mode and voltage-mode DC-DC converters are conventionally controlled by voltage-domain controllers that use voltage signals as control variables, the works in [3] and [4] have shown that voltage-mode DC-DC converters can also be controlled by time-domain controllers consisting of only time-domain circuits such as voltage-controlled oscillators, voltage-controlled delay lines, and phase detectors (PD). Because time-domain controllers do not use any wide-bandwidth error amplifier, voltage comparator, and passive RC filter required for conventional voltage-domain controllers, they consume much less power and occupy smaller silicon area.

This paper describes a time-domain current-mode controller that can eliminate the need for an inductor current sensor and prevent sub-harmonic oscillation without slope compensation. A 10MHz current-mode buck converter has been implemented in a 65nm CMOS process with the proposed time-domain current-mode controller. Because the time-domain controller does not use voltage comparators, the switching duty cycle is not limited by the delay of the voltage comparator and can range from 8.5% to 93% for the 10MHz current-mode buck converter, resulting in an output voltage range of 0.15V to 1.69V from 1.8V input.

Figure 27.2.1 shows the architectures of current-mode buck converters with the conventional voltage-domain controller and with its equivalent proposed time-domain controller. The PD of the time-domain controller replaces the voltage comparator of the conventional voltage-domain controller and provides the switching signal  $V_{PWM}$  whose pulse width is proportional to the phase difference between the clocks  $CLK_{SET}$  and  $CLK_{RST}$ . The error voltage  $V_{OUT}-V_{REF}$  of the buck converter output is applied to the control ports of the voltage-controlled oscillator VCO1 and the voltage-controlled delay line VCDL. Therefore, the phase  $\phi_{SET}$  of the clock  $CLK_{SET}$  is the sum of the integral term provided by VCO1 and the proportional term provided by VCDL of the error voltage  $V_{OUT}-V_{REF}$  and can be written as:

$$\phi_{SET}(s) = [\omega_{01} - K_{VCO1} \cdot (V_{OUT} - V_{REF})]/s + K_{VCDL} \cdot (V_{OUT} - V_{REF}) \quad (1)$$

where  $\omega_{01}$  is the free-running frequency of VCO1,  $K_{VCO1}$  is the voltage-to-frequency gain of VCO1, and  $K_{VCDL}$  is the voltage-to-delay gain of VCDL. The proportional term  $K_{VCDL} \cdot (V_{OUT} - V_{REF})$  generates a zero to stabilize the feedback loop like the resistor  $R_1$  of the conventional voltage-domain controller. Because the frequency of VCO2 varies linearly with the voltage  $V_{SW}-V_{OUT}$  across the inductor, the phase  $\phi_{RST}$  of the clock  $CLK_{RST}$  is given as:

$$\phi_{RST}(s) = [\omega_{02} + K_{VCO2} \cdot (V_{SW} - V_{OUT})]/s \quad (2)$$

where  $\omega_{02}$  and  $K_{VCO2}$  are the free-running frequency and the voltage-to-frequency gain of VCO2, respectively. As can be seen in equation (2), the phase  $\phi_{RST}$  of the clock  $CLK_{RST}$  is proportional to the integral of the voltage  $V_{SW}-V_{OUT}$  across the inductor, which means the voltage-controlled oscillator VCO2 performs the inductor current sensing to replace the inductor current sensor  $R_i$  of the conventional voltage-domain controller. Figure 27.2.2 shows the operation waveforms of the clock signals  $CLK_{SET}$  and  $CLK_{RST}$  and their phases  $\phi_{SET}$  and  $\phi_{RST}$  at steady-state. Because the buck converter output  $V_{OUT}$  is equal to the reference level  $V_{REF}$  at steady state, the phase  $\phi_{SET}$  of the clock  $CLK_{SET}$  increases from 0 to  $2\pi$  with constant slope of  $\omega_{01}$  as can be seen in equation (1) and Fig. 27.2.2. The phase  $\phi_{RST}$  of the clock  $CLK_{RST}$  increases with different slopes  $S_{on}$  and  $S_{off}$  depending on the voltage level of the switch voltage  $V_{SW}$ .

With conventional voltage-domain peak current-mode control, a perturbation  $\Delta I_L(0)$  of inductor current becomes  $\Delta I_L(T_{SW})$  after one switching cycle and  $\Delta I_L(T_{SW})$  is larger than  $\Delta I_L(0)$  when the switching duty cycle is larger than 50% as shown in Fig. 27.2.3. Because the perturbation of the inductor current increases over time, this causes sub-harmonic oscillation. To prevent the sub-harmonic oscillation of conventional voltage-domain current-mode control, slope compensation is required with the external ramp signal  $V_{RAMP}$  as shown in Fig. 27.2.1. The proposed time-domain current-mode control does not have the sub-harmonic oscillation problem even without slope compensation. Assuming the perturbation  $\Delta I_L(0)$  of the inductor current appears as a variation  $\Delta\phi(0)$  of the phase  $\phi_{RST}$  as shown in Fig. 27.2.3, the variation of the phase  $\phi_{RST}$  becomes  $\Delta\phi(T_{SW})$  after one switching cycle. Denoting the change in the switch-ON time caused by the phase variation  $\Delta\phi(0)$  as  $\Delta t$ , the phase variations  $\Delta\phi(0)$  and  $\Delta\phi(T_{SW})$  can be written as  $S_{on} \cdot \Delta t$  and  $S_{off} \cdot \Delta t$ , respectively. From equation (2), the ramping slopes  $S_{on}$  and  $S_{off}$  of the phase  $\phi_{RST}$  are given as:

$$S_{on} = \omega_{02} + K_{VCO2} \cdot (V_{IN} - V_{OUT}) \quad (3)$$

$$S_{off} = \omega_{02} - K_{VCO2} \cdot V_{OUT} \quad (4)$$

Because the input  $V_{IN}$  is larger than the buck converter output  $V_{OUT}$ ,  $S_{on}$  is larger than  $S_{off}$  and  $\Delta\phi(T_{SW}) = S_{off} \cdot \Delta t$  is smaller than  $\Delta\phi(0) = S_{on} \cdot \Delta t$  even when the switching duty cycle is larger than 50%. This means the variation of the phase  $\phi_{RST}$  resulting from the perturbation of inductor current decreases over time, so sub-harmonic oscillation is prevented by the proposed time-domain current-mode control.
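The two stability arguments can be compared numerically with a minimal sketch. For the conventional case it uses the standard peak-current-mode result that, without slope compensation, a perturbation is scaled by $D/(1-D)$ in magnitude each cycle. For the time-domain case it substitutes $V_{SW}=V_{IN}$ (switch on) and $V_{SW}=0$ (switch off, assuming an ideal low-side switch) into equation (2). The values of $\omega_{02}$ and $K_{VCO2}$ are hypothetical, chosen only so that both slopes stay positive.

```python
import math

def conventional_ratio(duty):
    """|dI_L(T_SW) / dI_L(0)| for peak-current-mode control without slope compensation."""
    return duty / (1.0 - duty)

def phase_slopes(w02, k_vco2, v_in, v_out):
    """Slopes of phi_RST from Eq. (2): V_SW = V_IN while on, V_SW = 0 while off."""
    s_on = w02 + k_vco2 * (v_in - v_out)
    s_off = w02 - k_vco2 * v_out
    return s_on, s_off

def time_domain_ratio(w02, k_vco2, v_in, v_out):
    """dphi(T_SW) / dphi(0) = S_off / S_on for the time-domain controller."""
    s_on, s_off = phase_slopes(w02, k_vco2, v_in, v_out)
    return s_off / s_on

# Hypothetical VCO2 parameters: 10 MHz free-running, 2 MHz/V gain (in rad/s units).
W02 = 2 * math.pi * 10e6
K_VCO2 = 2 * math.pi * 2e6
V_IN = 1.8

for v_out in (0.5, 1.0, 1.5):
    duty = v_out / V_IN  # ideal buck relation, losses ignored
    print(f"D={duty:.2f}  conventional={conventional_ratio(duty):.2f}  "
          f"time-domain={time_domain_ratio(W02, K_VCO2, V_IN, v_out):.2f}")
```

Under these assumptions the conventional ratio exceeds 1 once the duty cycle passes 50%, while the time-domain ratio stays below 1 for every output voltage, since $S_{on} - S_{off} = K_{VCO2} \cdot V_{IN} > 0$.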

The current-mode buck converter with the proposed time-domain controller and 10MHz switching frequency has been implemented in a 65nm CMOS process. Because the proposed time-domain current-mode controller does not use a voltage comparator whose large delay easily limits the switching duty cycle, the current-mode buck converter can have a wide range of switching duty cycles. The measured switching duty cycle ranges from 8.5% to 93%, allowing the converter to regulate the output from 0.15V to 1.69V from an input voltage of 1.8V. The measured waveforms are shown in Fig. 27.2.4 for switching duty cycles of 8.5%, 57%, and 93%, and the power conversion efficiency versus the load current  $I_{OUT}$  is shown in the same figure as well. The peak power efficiency is 94.9% when the load current is 250mA and the output is 1.5V. Figure 27.2.5 shows the measured waveforms of load-transient operation and reference-tracking operation required for dynamic voltage scaling (DVS). The output voltage  $V_{OUT}$  recovers its nominal value in less than 3.5μs for the load current steps of 48mA/0.1μs. When the target level of  $V_{OUT}$  changes from 1V (0.5V) to 0.5V (1V), the output  $V_{OUT}$  reaches the desired level in 3.5μs (3μs). The performance of the time-domain controlled current-mode buck converter is compared with other works in Fig. 27.2.6. The proposed time-domain controller occupies only 0.036mm<sup>2</sup> because it does not need any passive components or amplifiers. The die micrograph is shown in Fig. 27.2.7.

### Acknowledgements:

This work was supported by the Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-IT1501-01. The CAD tools and chip fabrication were supported by the IC Design Education Center (IDEC), Korea.

### References:

- [1] Y.-H. Lee, et al., "Fast Transient (FT) Technique with Adaptive Phase Margin (APM) for Current Mode DC-DC Buck Converters," *IEEE Trans. VLSI*, vol. 20, no. 10, pp. 1781-1793, Oct. 2012.
- [2] M. Du, et al., "A 5-MHz 91%-Peak Power Efficiency Buck Regulator with Auto-Selectable Peak- and Valley-Current Control," *IEEE JSSC*, vol. 46, no. 8, pp. 1928-1939, Aug. 2011.
- [3] S. Kim, et al., "High Frequency Buck Converter Design Using Time-Domain Control Technique," *IEEE JSSC*, vol. 50, no. 4, pp. 990-1001, Apr. 2015.
- [4] S. Kim, et al., "A 1.8V 30-to-70MHz 87% Peak Efficiency 0.32mm<sup>2</sup> 4-Phase Time-Based Buck Converter Consuming 3μA/MHz Quiescent Current in 65nm CMOS," *ISSCC*, pp. 216-217, 2015.



Figure 27.2.1: Current-mode buck converter with the proposed time-domain controller.



Figure 27.2.2: Operation waveforms of the current-mode buck converter with the proposed time-domain controller.



With conventional voltage-domain control,  
 $\Delta I_L(T_{sw}) > \Delta I_L(0)$  if duty  $> 50\%$  → Sub-harmonic oscillation

With the proposed time-domain control,  
 $\Delta \Phi(T_{sw}) < \Delta \Phi(0)$  even for duty  $> 50\%$  → No sub-harmonic oscillation

Figure 27.2.3: Subharmonic oscillation issue with conventional voltage-domain control and the proposed time-domain control.



Figure 27.2.4: Measured waveforms at steady state, and power conversion efficiency.



Figure 27.2.5: Measured waveforms of load transient and reference tracking operations.

|                                                      | [1]                         | [2]                         | [3]                      | This work                |
|------------------------------------------------------|-----------------------------|-----------------------------|--------------------------|--------------------------|
| Technology                                           | 350-nm CMOS                 | 350-nm CMOS                 | 65-nm CMOS               | 65-nm CMOS               |
| Control scheme                                       | Voltage-domain current mode | Voltage-domain current mode | Time-domain voltage mode | Time-domain current mode |
| $V_{IN}$ [V]                                         | 2.7–3.6                     | 2.7–4.2                     | 1.8                      | 1.8                      |
| $V_{OUT}$ [V]                                        | 2                           | 0.5–2.6                     | 0.6–1.5                  | 0.15–1.69                |
| Maximum $I_{OUT}$ [A]                                | 0.6                         | 0.5                         | 0.6                      | 0.6                      |
| Switching frequency [MHz]                            | 1                           | 5                           | 11–25                    | 10                       |
| Inductor [ $\mu\text{H}$ ]                           | 4.7                         | 1                           | 0.22                     | 0.22                     |
| Capacitor [ $\mu\text{F}$ ]                          | 10                          | 4.7                         | 4.7                      | 4.7                      |
| Area [mm $^2$ ]: die                                 | 3.8                         | 0.54                        | 5                        | 2.118                    |
| Area [mm $^2$ ]: controller                          | 0.217*                      | 0.148*                      | 0.037                    | 0.036                    |
| Peak efficiency [%]                                  | 93                          | 91                          | 94                       | 94.9                     |
| Load transient: load current step                    | 400-mA / N.A.               | 200-mA / N.A.               | 500-mA / N.A.            | 48-mA / 0.1-μs           |
| Load transient settling time [ $\mu\text{s}$ ]: up   | 6                           | 6                           | 3                        | 3.5                      |
| Load transient settling time [ $\mu\text{s}$ ]: down | 12                          | 8                           | 3.5                      | 3.5                      |
| Reference tracking [ $\mu\text{s}$ ]: up             | N.A.                        | N.A.                        | N.A.                     | 3                        |
| Reference tracking [ $\mu\text{s}$ ]: down           | N.A.                        | N.A.                        | N.A.                     | 3.5                      |

\*Estimated from the chip microphotograph

Figure 27.2.6: Performance comparison.



Figure 27.2.7: Chip micrograph.

## 27.3 An 86% Efficiency SIMO DC-DC Converter with One Boost, One Buck, and a Floating Output Voltage for Car-Radio

Arunkumar Salimath<sup>1</sup>, Edoardo Bonizzoni<sup>1</sup>, Edoardo Botti<sup>2</sup>, Giovanni Gonano<sup>2</sup>, Paolo Cacciagran<sup>2</sup>, Davide Luigi Brambilla<sup>2</sup>, Tommaso Barbieri<sup>2</sup>, Franco Maloberti<sup>1</sup>

<sup>1</sup>University of Pavia, Pavia, Italy; <sup>2</sup>STMicroelectronics, Cornaredo, Italy

The design of Class-D audio power amplifiers [1] for car radio is challenging because of the large voltage variation of the automotive battery. During crank and dump, the 14.4V battery voltage may sharply (in less than 2ms) drop down to 4V or rise up to 40V. For proper operation, the supply voltages of the Class-D amplifier must be properly controlled for all the battery conditions. The block diagram of the Class-D power amplifier in Fig. 27.3.1 helps in defining the set of required supply voltages. The power audio stage uses  $n$  ( $n = 1, \dots, 4$ ) channels of high- and low-side switches, both made of n-type transistors. This choice optimizes the on-resistance and the gate capacitance to achieve the best efficiency, but requires a boosted voltage ( $V_{\text{boost}}$ ) to drive the high-side devices. A regulated low voltage ( $V_{\text{reg-low}}=4.5\text{V}$ ) supplies the driver of  $N_{\text{pow-LS}}$ . As the digital and analog processing are performed at  $V_{\text{bat}}/2$  to improve the system SNR, the digital core uses a floating voltage ( $V_{\text{floatH}}-V_{\text{floatL}}=1.8\text{V}$ ) across  $V_{\text{bat}}/2$ .

This paper describes a Single-Inductor Multiple-Output (SIMO) DC-DC converter [2-5] capable of generating all the required voltages, including the floating voltage across  $V_{\text{bat}}/2$ . The SIMO converter is integrated together with the Class-D amplifier for an optimal overall system efficiency and pin count.

Figure 27.3.2 illustrates the schematic diagram of the SIMO converter. In the steady state, the inductor L stores energy through  $M_P$  and  $M_{N1}$  during  $T_1$ . During  $T_2$  and  $T_3$ ,  $HV-D_N$ ,  $HV-D_B$ ,  $D_{RL}$ , and  $M_{N2}$  enable energy distribution into  $V_{\text{boost}}$  and  $V_{\text{reg-low}}$ . Then, the inductor feeds the floating load through  $M_{N3}$ ,  $M_{N4}$ , and  $HV-D_{FH,FL}$  during  $T_4$ . The inductor and the capacitors  $C_{LB}$ ,  $C_{LRL}$ , and  $C_{LF}$  are off-chip. The small external elements  $C_{\text{filter}}$  (10nF) reduce the common-mode ripple possibly caused by unbalanced charge injection during the switching of  $M_{N3}-HV-D_{FH}$  and  $M_{N4}-HV-D_{FL}$ . To drive the power devices, the circuit uses Regulated Floating Dual-Slope (RFDS) drivers. They provide the  $V_{GS}$  of 5V or 0V needed to switch the power devices on or off. For  $M_{N1}$  there is only a dual-slope driver. Depending on  $V_{\text{bat}}$  and on the number of channels used by the power stage, the SIMO converter is boost or buck dominated (Fig. 27.3.1).

The switches for the floating load, in addition to being unidirectional, have to ensure two-sided protection against HV drop during the off-state. As shown in Fig. 27.3.2, two series-connected HV n-type transistors in switch and diode configuration achieve unidirectional switching and provide HV protection, as they have drain-extended terminals at input and output (i/o). The RFDS drivers, across the gate and the floating source terminal, control the HV switches independently of the voltage at their i/o terminals. Two auxiliary linear regulators generating  $V_{\text{bat}}/2 \pm 0.8\text{V}$  supply the digital core during the start-up.

In order to avoid glitches, the sequence of loads switching and phases must ensure a continuous path for the inductor current and properly handle the charge stored on the large parasitic capacitances of the switches. The best trade-off, verified by extensive transistor level simulations, is the following sequence: to charge (i) the inductor through  $M_P$  and  $M_{N1}$ , (ii)  $V_{\text{boost}}$  through  $HV-D_B$ , (iii)  $V_{\text{reg-low}}$  by switching on  $M_{N2}$ , and, finally, (iv) the floating load by turning on  $M_{N3}$  and  $M_{N4}$ . Closing  $M_{N2}$  automatically achieves the critical off-switching of the boost load as it reverse biases  $HV-D_B$ . The large parasitic charge accumulated at  $V_{L2}$  during  $V_{\text{boost}}$  regulation discharges into  $V_{\text{reg-low}}$ . The phases used to turn off  $M_{N2}$  and on  $M_{N3}$  ( $M_{N4}$ ) are slightly overlapped to prevent charge accumulation at  $V_{L2}$  during the switching transition and its subsequent discharge into the floating load. The adopted switching sequence together with the filter capacitors  $C_{\text{filter}}$  significantly reduce the common-mode ripples at the floating output. As all the load delivery paths are unidirectional, the inductor current is prevented from going negative. Hence, the SIMO converter automatically handles any discontinuous conduction mode.

The feedback network, outlined in Fig. 27.3.2, uses HV transconductors to sense the boosted and the floating voltage values. Through scaled resistors, the current outputs are transformed into voltages and compared with a single reference level. The combination of the obtained errors serves to determine the switching duty cycles using the same method as [2] and [4]. The continuous time (CT) analog error processor computes  $E_1+E_2+E_3$ ,  $E_1+E_2-E_3$ , and  $E_1-E_2-E_3$  and compares their amplified and filtered versions with the sawtooth signal for generating the switching phases.
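
The error combination described above can be illustrated with a minimal sketch. It assumes an idealized sawtooth that rises linearly from 0 to `ramp_peak` in each period, so that a comparator stays high for a fraction of the period equal to the clamped ratio of its input to the ramp peak. The function names, the `gain` parameter, and the omission of the filter are illustrative assumptions; the mapping of each comparator output to a specific switching phase follows [2] and [4] and is not modeled here.

```python
def error_combinations(e1, e2, e3):
    """Return the three sums formed by the analog error processor."""
    return (e1 + e2 + e3, e1 + e2 - e3, e1 - e2 - e3)


def comparator_duties(e1, e2, e3, ramp_peak=1.0, gain=1.0):
    """Fraction of the period each comparator output is high.

    Assumes a sawtooth rising from 0 to ramp_peak (hypothetical values);
    the comparator is high while gain * combination exceeds the ramp.
    """
    duties = []
    for combo in error_combinations(e1, e2, e3):
        level = gain * combo
        # Clamp to [0, 1]: the comparator cannot be high for less than
        # none or more than all of the period.
        duties.append(min(max(level / ramp_peak, 0.0), 1.0))
    return tuple(duties)
```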

Figure 27.3.3 shows the schematic diagram of the RFDS driver. It consists of three sections: a Zener regulator capable of generating a bounce-free voltage of  $V_{\text{drvz}}+5\text{V}$ , a level shifter whose inputs are the complementary 0-to-5V logic signals coming from the PWM generator, and the dual-slope driver. The positive feedback loop established by  $M_4$  and  $M_5$  speeds up the circuit that level shifts within 5ns. A differential-to-single ended converter controls the dual-slope driver. The small  $M_{12}$  transistor pulls up the output until the inverter switches on  $M_{13}$  to augment the slope of the transition. A typical output waveform is shown on the top of the figure.

The circuit was fabricated in a 0.11μm BCD technology with an active area of 2.5mm<sup>2</sup>. The circuit start-up needs a proper procedure to have (i) the digital core operating for I<sup>2</sup>C communication and (ii) the boosted voltage available for the RFDS drivers before starting the normal SIMO regulation. Figure 27.3.4 shows the sequence and the measured results of the start-up phase at  $V_{\text{bat}}=14.4\text{V}$ . After turn-on, the diode  $HV-D_{\text{pull-up}}$  pre-charges  $V_{\text{boost}}$  to  $(V_{\text{bat}}-V_{\text{th}})$  while the SIMO converter is off. Two auxiliary linear regulators, supplied by  $V_{\text{bat}}$ , pre-charge the output floating nodes to  $V_{\text{bat}}/2 \pm 0.8\text{V}$ . This creates the initial condition for starting the circuit. A start-up pulse lasting 15ms establishes  $V_{\text{boost}}$  ( $=V_{\text{bat}}+6.5\text{V}$ ) while the controls of the other outputs are off. After the start-up pulse, the SIMO regulation begins and the circuit generates the required output voltages ( $V_{\text{reg-low}}=4.5\text{V}$ ,  $\Delta V_{\text{float}}=1.8\text{V}$ ). As Fig. 27.3.4 shows,  $V_{\text{boost}}$  settles to its nominal value without any ringing and the floating outputs experience a minor transient lasting 0.7ms.

Figure 27.3.5 confirms the  $V_{\text{bat}}$  tracking capability of the SIMO converter in response to sharp fluctuations during crank and dump. The battery profile of the experimental test follows the sequence  $V_{\text{bat}}=14.4\text{V}; 4.5\text{V}; 7\text{V}; 14.4\text{V}; 27\text{V}; 14.4\text{V}$ . The output voltages achieve the line regulation values annotated in the figure; during crank and dump, all of them are below 10.1mV/V, with the exception of  $V_{\text{reg-low}}$ , which is 16.2mV/V at  $V_{\text{bat}}=4.5\text{V}$ . Measurements stop at  $V_{\text{bat}}=27\text{V}$  because, for higher values, the system enters the idle state in order to protect the power section of the Class-D amplifier.
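
As a reading aid for the mV/V figures quoted above, the sketch below applies the usual definition of line regulation: the change in output voltage divided by the change in input voltage. The function name and the sample voltages are hypothetical and are not measured values from this work.

```python
def line_regulation_mv_per_v(vout_a, vout_b, vbat_a, vbat_b):
    """Line regulation in mV/V between two operating points.

    Uses the common definition |delta Vout| / |delta Vbat|; all arguments
    are in volts.
    """
    if vbat_a == vbat_b:
        raise ValueError("the two battery voltages must differ")
    return abs(vout_b - vout_a) / abs(vbat_b - vbat_a) * 1e3


# Hypothetical example: an output moving from 4.500V to 4.340V while the
# battery drops from 14.4V to 4.5V.
example = line_regulation_mv_per_v(4.500, 4.340, 14.4, 4.5)
```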

Figure 27.3.6 shows the waveforms of  $V_{L1}$  and  $V_{L2}$  measured across the inductor together with the AC-coupled output voltages. The measurement conditions are  $V_{\text{bat}}=14.4\text{V}$  and two-channel load. This condition highlights the ripples, all less than 20mV, which are mainly caused by switching the large transistors. For full load (four channels), the maximum ripple is 25mV. The SIMO efficiency exceeds 80% for two- and four-channel loads with  $V_{\text{bat}}$  ranging from 10V to 25V. The measured peak efficiency is 86%. The table compares the performance of SIMO converters with similar processes and output power. This circuit generates (i) a battery-tracking floating voltage, (ii) a battery-tracking boosted voltage, and (iii) a ground-referred buck output. Its supply voltage range is 4 to 40V, complying with the automotive class requirements, and the maximum output ripple is 25mV. The die micrograph is shown in Fig. 27.3.7.

### References:

- [1] M. Hoyerby, et al., "A 2X70W Monolithic Five-Level Class-D Audio Power Amplifier in 180nm BCD," *JSSC*, vol. 51, no. 12, pp. 2819-2829, Dec. 2016.
- [2] M. Belloni, et al., "A 4-Output Single-Inductor DC-DC Buck Converter with Self-Boosted Switch Drivers and 1.2A Total Output Current," *ISSCC*, pp. 444-445, Feb. 2008.
- [3] D. Lu, et al., "An 87%-Peak-Efficiency DVS-Capable Single-Inductor 4-Output DC-DC Buck Converter with Ripple-Based Adaptive Off-Time Control," *ISSCC*, pp. 82-83, Feb. 2014.
- [4] M. Jung, et al., "An Error-Based Controlled Single-Inductor 10-Output DC-DC Buck Converter with High Efficiency at Light Load Using Adaptive Pulse Modulation," *ISSCC*, pp. 222-223, Feb. 2015.
- [5] W. Xu, et al., "A 90% Peak Efficiency Single-Inductor Dual-Output Buck-Boost Converter with Extended PWM Control," *ISSCC*, pp. 394-395, Feb. 2011.



Figure 27.3.1: Class-D power stage block diagram and regulated supply requirements.



Figure 27.3.2: Schematic diagram of the proposed SIMO regulator.



Figure 27.3.3: Schematic diagram of the regulated floating dual-slope (RFDS) driver.



Figure 27.3.4: Measured SIMO regulator outputs during start-up.



Figure 27.3.5: Measured SIMO regulator outputs tracking the battery voltage during crank and dump.



Figure 27.3.6: Measured steady-state SIMO regulator outputs, power efficiency, and performance comparison.



Figure 27.3.7: Die micrograph.

## 27.4 A 97% High-Efficiency 6μs Fast-Recovery-Time Buck-Based Step-Up/Down Converter with Embedded 1/2 and 3/2 Charge-Pumps for Li-Ion Battery Management

Min-Woo Ko<sup>1</sup>, Ki-Duk Kim<sup>1</sup>, Young-Jin Woo<sup>2</sup>, Se-Un Shin<sup>1</sup>, Hyun-Ki Han<sup>1</sup>, Yeunhee Huh<sup>1</sup>, Gyeong-Gu Kang<sup>1</sup>, Jeong-Hyun Cho<sup>1</sup>, Sang-Jin Lim<sup>1</sup>, Se-Hong Park<sup>1</sup>, Hyung-Min Lee<sup>3</sup>, Gyu-Hyeong Cho<sup>1</sup>

<sup>1</sup>KAIST, Daejeon, Korea

<sup>2</sup>Siliconworks, Daejeon, Korea

<sup>3</sup>Korea University, Seoul, Korea

Lithium-ion batteries are generally used in mobile devices, but the voltage range of the battery varies from 2.7 to 4.2V. To provide a mid-3V-range output from the battery, a converter capable of step-up/down-conversion is necessary. For this purpose, non-inverting buck-boost topologies with multimode control [1-3] have been widely used. However, they have limited efficiency slightly higher than 90%, which comes from the fact that a main current path always encompasses two switches. To increase the efficiency in the buck mode where the converter operates for most of the usage time, a flying capacitor buck-boost (FCBB) was proposed in [4]. Despite its high power efficiency, it requires large-size LDMOS to endure a large voltage range up to 8V at switching node, resulting in cost inefficiency. Since all these topologies have a common controller that covers both buck and boost modes of operation, compensator design is challenging. Moreover, a non-minimum-phase system of boost operation makes it hard to achieve a fast loop response. In this paper, we propose a step-up/down DC-DC converter based on buck operation only over the whole input voltage range, which greatly simplifies the controller design and consequently gives fast response. Furthermore, it achieves high efficiency because of the reduced effective resistance on the main current path.

In the top left of Fig. 27.4.1, the proposed fast-response step-up/down converter (FUDC) topology is shown, which has one high-side switch  $S_1$  in the main current path between the battery and inductor like the buck converter. However, for the lower side, it has 6 sub-switches,  $S_2-S_7$ , to configure a switched-capacitor-based charge-pump for regulating sub-voltage sources from  $V_{BAT}/2$  to  $3V_{BAT}/2$ . As illustrated in Fig. 27.4.1 right, the converter has three different operating phases:  $\phi_d$  (down),  $\phi_c$  (charging) and  $\phi_u$  (up). By reconfiguring the sub-switches, a step-down or step-up mode can be adaptively selected. In the charging phase  $\phi_c$ , the inductor is energized or de-energized according to the voltage relation between  $V_{BAT}$  and output voltage  $V_{OUT}$ . At the same time, the flying capacitors  $C_{F1}$  and  $C_{F2}$ , connected in series, are each charged to  $V_{BAT}/2$ , and the switching node voltage  $V_x$  equals  $V_{BAT}$ . During the down-phase  $\phi_d$ , the two capacitors, connected in parallel, are placed in series with the inductor and discharged. In this condition, the inductor is de-energized, and  $V_x$  is  $V_{BAT}/2$ . In the up-phase  $\phi_u$ , the two parallel capacitors are connected in series with the battery and the inductor, so that  $V_x$  equals  $3V_{BAT}/2$ . The maximum voltage stress for the switches  $S_6$  and  $S_7$  is  $V_{BAT}$ , and it is  $V_{BAT}/2$  for the other switches. Hence, the topology does not suffer from voltage stress problems. The operating waveforms are illustrated in the bottom left of Fig. 27.4.1. When  $V_{BAT}$  is larger than  $V_{OUT}$ , the step-down mode is selected by operating between two phases,  $\phi_c$  and  $\phi_d$ , and the conversion gain is  $(1+D)/2$ , where the duty D is defined as the ratio of  $\phi_c$  to the switching period T. Similarly, when the step-up mode is selected between  $\phi_c$  and  $\phi_u$ ,  $V_{OUT}$  is equal to  $(3-D)/2$  times  $V_{BAT}$ . Since the proposed FUDC always has a buck converter feature in step-up as well as step-down operation, the design complexity of the controller is greatly reduced owing to loop dynamics that are consistent with those of the buck converter in both modes.
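
The two conversion gains above can be checked numerically with a short sketch. It assumes an ideal, lossless converter, takes D as the fraction of the period spent in  $\phi_c$ , and picks the mode from the sign of  $V_{BAT}-V_{OUT}$ ; the function names are illustrative and not part of the design.

```python
def fudc_gain(duty, mode):
    """Ideal conversion gain V_OUT / V_BAT for a duty D = phi_c / T."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must lie in [0, 1]")
    if mode == "down":   # V_x alternates between V_BAT and V_BAT/2
        return (1.0 + duty) / 2.0
    if mode == "up":     # V_x alternates between V_BAT and 3*V_BAT/2
        return (3.0 - duty) / 2.0
    raise ValueError("mode must be 'down' or 'up'")


def fudc_duty(v_bat, v_out):
    """Return (mode, duty) that gives v_out from v_bat in the ideal case."""
    gain = v_out / v_bat
    if gain <= 1.0:
        mode, duty = "down", 2.0 * gain - 1.0
    else:
        mode, duty = "up", 3.0 - 2.0 * gain
    if not 0.0 <= duty <= 1.0:
        raise ValueError("requested gain is outside the 1/2 to 3/2 range")
    return mode, duty
```

For instance, `fudc_duty(4.2, 3.4)` selects the step-down mode and `fudc_duty(3.0, 3.4)` selects the step-up mode, both with the same buck-like relation between D and the average of  $V_x$ .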

A switch size optimization strategy of FUDC is described in Fig. 27.4.2. To optimize the size of the switches, the total amount of charge flowing through each switch is calculated. In Fig. 27.4.2 left, step-down mode is modeled with nearly constant current assumption while step-up mode is omitted since it can be modeled in the same manner. The normalized resistance R is defined as the resistance of a single power switch in the conventional buck-boost converter. Then, by setting the resistances of  $S_1$  and other sub-switches as  $\alpha R$  and  $\beta R$ , respectively, the model can be simplified with voltage/current sources and single effective resistance. Figure 27.4.2 top right shows the current waveforms conducting through each switch. The negative (positive) area of  $I_{CF1}$  and  $I_{CF2}$  indicates the amount of charge flowing into (out of) the flying capacitor. Due to the capacitor charge balance, the two areas have the same values. Taking Li-ion battery operating voltage range into consideration, D has a value between 0.5 and 1, and the maximum charge amount conducting through  $S_2-S_7$  is  $I_L T/4$  when D is 0.5. For  $S_1$ , the amount equals  $I_L T$  when D is 1. Thus,  $\alpha$  should be a quarter of  $\beta$ . In the effective resistance graph of Fig. 27.4.2 bottom right, the effect of the switch sizing strategy is evident.
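
The sizing argument can be reproduced with a small sketch. It assumes a constant inductor current  $I_L$  and that, in step-down mode, the two flying capacitors share  $I_L$  equally while connected in parallel during  $\phi_d$ , which reproduces the  $I_L T/4$  figure at D = 0.5 quoted above. The function names are illustrative.

```python
def sub_switch_charge(i_l, period, duty):
    """Charge per cycle through one flying-capacitor branch (step-down).

    Assumption: during phi_d, lasting (1 - D) * T, each of the two
    parallel capacitors carries I_L / 2.
    """
    return i_l * (1.0 - duty) * period / 2.0


def s1_max_charge(i_l, period):
    """Worst-case charge per cycle through S1, reached at D = 1."""
    return i_l * period


def alpha_over_beta(i_l=1.0, period=1e-6, duty_min=0.5):
    """Resistance ratio alpha/beta, taken as the ratio of worst-case charges."""
    return sub_switch_charge(i_l, period, duty_min) / s1_max_charge(i_l, period)
```

With the duty limited to 0.5 through 1, the ratio evaluates to 1/4, matching the statement that  $\alpha$  should be a quarter of  $\beta$ .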

In the step-up conversion, the inductor current of the conventional buck-boost converter (CBBC) is larger than the load current because only a portion of the inductor current is delivered to the output. Thus, the output voltage ripples of the CBBC increase as the load current increases. Figure 27.4.3 represents the advantages of FUDC over CBBC. In Fig. 27.4.3 top left, the ripples in the inductor current and the output voltage are shown when  $V_{BAT}$  varies slowly from high to low, for constant  $V_{OUT}$ . The FUDC has considerably lower ripples compared to CBBC since the voltage swing of  $V_x$  is half in the FUDC. In addition, the switching loss of the switching node can be reduced. Conduction loss of the FUDC is compared with that of the existing structure and presented in Fig. 27.4.3 bottom left. Even under light load condition, FUDC can maintain high efficiency while operating in CCM since the reverse current of the inductor charges the flying capacitors, resulting in reduced RMS current on  $S_1$ , as illustrated in Fig. 27.4.3 right. Another merit is that the FUDC requires neither DCM operation nor complex zero-current-sensing circuitry.
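
The effect of the halved  $V_x$  swing on the inductor current ripple can be estimated with the standard volt-second relation. The sketch below is an idealized comparison for step-down operation only: it models the FUDC switching node as toggling between  $V_{BAT}$  and  $V_{BAT}/2$ , and uses a plain two-level node toggling between  $V_{BAT}$  and 0 as a stand-in for the conventional case. The function name and the sample values are assumptions for illustration.

```python
def inductor_ripple(v_high, v_low, v_out, period, inductance):
    """Peak-to-peak inductor current ripple for a two-level switching node.

    The node sits at v_high for a fraction D of the period and at v_low
    otherwise, with D chosen so that the average node voltage is v_out.
    """
    if not v_low <= v_out <= v_high:
        raise ValueError("v_out must lie between v_low and v_high")
    duty = (v_out - v_low) / (v_high - v_low)
    return (v_high - v_out) * duty * period / inductance


# Illustrative operating point: V_BAT = 4.2V, V_OUT = 3.4V, 1MHz, 2.2uH.
fudc_ripple = inductor_ripple(4.2, 2.1, 3.4, 1e-6, 2.2e-6)
two_level_ripple = inductor_ripple(4.2, 0.0, 3.4, 1e-6, 2.2e-6)
```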

In Fig. 27.4.4 top, the overall system architecture, including the power stage and controller, is shown. A body-switching (BS) technique is applied to  $S_1$  [5]. Also, to reduce the gate driving switching loss, switch segmentation is implemented. The FUDC is a buck-boost converter, but always operates in the buck mode. Thus, simple hysteretic control is also possible. In the control stage, an unfixed boundary hysteretic control is implemented by using a quasi-inductor-current emulator (QICE) [6]. A differential path is added to  $V_{OUT}$  to prevent unwanted sub-ringing due to the phase difference between the inductor current and  $V_{OUT}$ . The steady-state waveforms in Fig. 27.4.4 bottom demonstrate the detailed operation. In the step-down (-up) mode, the upper (lower) boundary is determined by the clock for fixed-frequency operation, and  $V_{REF}$  is the lower (upper) boundary. The current information is reset each cycle so that  $V_{FB}$  encounters  $V_{REF}$ . When  $V_{BAT}$  and  $V_{OUT}$  are similar, step-down or step-up can occur irregularly since the two boundaries become very close. In this situation, the reset cycle is doubled, so that step-up and step-down operations alternate regularly. With this simple control method, both fast transient response and smooth mode transition can be achieved.

The FUDC was fabricated in a  $0.18\mu\text{m}$  BCD process with a chip area of  $2.0 \times 2.5\text{mm}^2$ . Measurement results of transient responses at 700mA load step with  $V_{BAT}=4.2\text{V}$  and  $V_{OUT}=3.4\text{V}$  as well as steady-state waveform for different  $V_{BAT}$  are shown in Fig. 27.4.5. Undershoot/overshoot voltages are only  $100\text{mV}/150\text{mV}$ , and recovery times of the output voltages are as short as  $6\mu\text{s}/12\mu\text{s}$ . As shown in Fig. 27.4.6 top, the converter has a maximum efficiency of 97% at  $V_{BAT}=3.7\text{V}$  with a load of 200mA. The die micrograph is shown in Fig. 27.4.7.

### References:

- [1] S. Rao, et al., "A 1.2A Buck-Boost LED Driver with 13% Efficiency Improvement Using Error-Averaged SenseFET-Based Current Sensing," *ISSCC*, pp. 238-240, Feb. 2011.
- [2] P. Malcovati, et al., "A  $0.18\mu\text{m}$  CMOS 91%-Efficiency 0.1-To-2A Scalable Buck-Boost DC-DC Converter for LED Drivers," *ISSCC*, pp. 280-282, Feb. 2012.
- [3] X.-E. Hong, et al., "98.1%-Efficiency Hysteretic-Current-Mode Non-Inverting Buck-Boost DC-DC Converter with Smooth Mode Transition," *IEEE Trans. Power Electron.*, vol. 32, no. 3, pp. 2008-2017, 2016.
- [4] Y.-M. Ju, et al., "A Hybrid Inductor-Based Flying-Capacitor-Assisted Step-Up/Step-Down DC-DC Converter with 96.56% Efficiency," *ISSCC*, pp. 184-186, Feb. 2017.
- [5] P. Favrat, et al., "A High-Efficiency CMOS Voltage Doubler," *IEEE JSSC*, vol. 33, no. 3, pp. 410-416, March 1998.
- [6] S.-H. Lee, et al., "A  $0.518\text{mm}^2$  Quasi-Current-Mode Hysteretic Buck DC-DC Converter with  $3\mu\text{s}$  Load Transient Response in  $0.35\mu\text{m}$  BCDMOS," *ISSCC*, pp. 214-216, Feb. 2015.



Figure 27.4.1: Proposed topology and operating principle.



Figure 27.4.2: Switch-size optimization strategy.

Figure 27.4.3: Advantages of proposed topology.



Figure 27.4.4: System implementation.



Figure 27.4.5: Measured waveforms.



Performance Table

|                           | [1] ISSCC '11                     | [2] ISSCC '12 | [3] TPE '16 | [4] ISSCC '17                                    | This work                                      |
|---------------------------|-----------------------------------|---------------|-------------|--------------------------------------------------|------------------------------------------------|
| Process                   | 0.5µm CMOS                        | 0.18µm CMOS   | 0.35µm CMOS | 0.18µm BCD                                       | 0.18µm BCD                                     |
| Topology                  | Buck-Boost                        | Buck-Boost    | Buck-Boost  | FCBB                                             | FUDC                                           |
| Inductor/Capacitor        | 2.2µH / 10µF                      | 1µH / 33µF    | 1µH / 10µF  | 2.2µH / 10µF<br>(10µF\*)                         | 2.2µH / 10µF<br>(2x20µF\*)                     |
| Input Voltage (V)         | 3 - 5.5                           | 2.7 - 5.5     | 2.5 - 5     | 2.7 - 4.2                                        | 2.7 - 4.2                                      |
| Output Voltage (V)        | 3.6                               | 0 - 5         | 3.3         | 3.4                                              | 3.4                                            |
| Load current range (A)    | 0.6 - 1.2                         | 0.1 - 2.0     | 0.01 - 0.4  | 0.25 - 2.0                                       | 0.03 - 1.0                                     |
| Switching frequency       | 2MHz                              | 2.5MHz        | <1.66MHz    | 1MHz                                             | 1MHz                                           |
| Continuous current supply | No                                | No            | No          | No                                               | Yes                                            |
| Efficiency                | Max: 90.7%                        | Max: 91%      | Max: 96.6%  | Max: 97.0%                                       | Max: 97.0%                                     |
|                           | Min: 61%                          | Min: 61%      | Min: 84%    | Min: 84%                                         | Min: 90.4%                                     |
| Load Transient            | Recovery Time (Within 1% of Vout) | -             | -           | 112µs / -                                        | 6µs / 12µs                                     |
|                           | Undershoot / overshoot            | -             | -           | @ VIN = 4.2V<br>(10mA - 400mA)<br>(400mA - 10mA) | @ VIN = 4.2V<br>(0mA - 700mA)<br>(700mA - 0mA) |

\* Value of the flying capacitors. \*\* Measured at very low switching frequency.

Figure 27.4.6: Performance summary and comparison with state-of-the-art.



Figure 27.4.7: Die micrograph.

## 27.5 A 95.2% Efficiency Dual-Path DC-DC Step-Up Converter with Continuous Output Current Delivery and Low Voltage Ripple

Se-Un Shin<sup>1</sup>, Yeunhee Huh<sup>1</sup>, Yongmin Ju<sup>1</sup>, Sungwon Choi<sup>1</sup>, Changsik Shin<sup>1</sup>, Young-Jin Woo<sup>1</sup>, Minseong Choi<sup>1</sup>, Se-Hong Park<sup>1</sup>, Young-Hoon Sohn<sup>1</sup>, Min-Woo Ko<sup>1</sup>, Younsin Jo<sup>1</sup>, Hyunki Han<sup>1</sup>, Hyung-Min Lee<sup>2</sup>, Sung-Wan Hong<sup>3</sup>, Wanyuan Qu<sup>4</sup>, Gyu-Hyeong Cho<sup>1</sup>

<sup>1</sup>KAIST, Daejeon, Korea

<sup>2</sup>Korea University, Seoul, Korea

<sup>3</sup>Sookmyung Women's University, Seoul, Korea

<sup>4</sup>Zhejiang University, Hangzhou, China

DC-DC boost converters are widely used to increase the supply voltage in various applications, including LED drivers, energy harvesting, etc. [1-5]. The conventional boost converter (CBC) is shown in Fig. 27.5.1, where the switches  $S_1$  and  $S_2$  are turned on and off alternately at  $\phi_1$  and  $\phi_2$ , respectively, and the inductor current ( $I_L$ ) is built up and delivered to the output. There are some critical issues in CBC because the output delivery current ( $I_S$ ) is not continuous. As a result, the  $I_L$  can be much larger than the load current ( $I_{LOAD}$ ) as  $\phi_1$  becomes longer. Since a bulky-size inductor having a low parasitic DC resistance ( $R_{DCR}$ ) is not usable for mobile applications with a strictly limited space, this large  $I_L$  results in significant conduction loss in the large  $R_{DCR}$  of a small-size inductor. Another issue is that the discontinuous  $I_S$  in  $\phi_2$  causes large voltage ripple ( $\Delta V_{OUT}$ ) at the output. Moreover, switching spike voltages can cause over-voltage stress on the loading block due to large  $dI/dt$  of  $I_S$  combined with parasitic inductances of the GND path.

To solve the issues noted above, this paper proposes a dual-path step-up converter (DPUC) as a new topology. There are two paths for current flow in different time slots, resulting in continuous current delivery to the output while boosting the output voltage with a reduced level of  $I_L$ . The DPUC is composed of an inductor ( $L$ ), an output capacitor ( $C_{OUT}$ ), five power switches ( $S_1$ - $S_5$ ), and a flying capacitor ( $C_F$ ) as shown in Fig. 27.5.2. In the DPUC operation,  $S_1$  is turned on to build up  $I_L$  in  $\phi_1$ . At the same time,  $S_3$  and  $S_5$  are also turned on and  $C_F$  is connected in series with  $C_{OUT}$  (C-path) to deliver the capacitor current ( $I_C$ ) to the output. In  $\phi_2$ ,  $S_2$  and  $S_4$  are turned on, connecting  $L$  and  $C_F$  in series, and  $I_L$  is delivered to the output (L-path). The average current of the C-path is the same as that of the L-path to meet the charge balance on  $C_F$ , and the delivery current ( $I_D$ ) to the output is always continuous.  $C_F$  is charged to a DC voltage of  $V_{OUT}-V_{IN}$  and the conversion ratio  $M$  is  $(2-D)/(2-2D)$ , which is always larger than 1 as in the CBC, as  $D$ , the duty cycle, varies from 0 to 1. Since the currents of both the C-path and the L-path are delivered to the output, the average current flowing through the inductor and the switches is reduced as well, which is one of the notable strengths of the DPUC. Therefore, there is a significant reduction of overall conduction loss, as the root-mean-square (RMS) current is reduced in the large  $R_{DCR}$  and in each switch. The reduced  $I_L$  can be larger or smaller than  $I_{LOAD}$  depending on  $M$ . The other advantage of the DPUC is that the continuous  $I_D$  significantly reduces  $\Delta V_{OUT}$  and switching spikes. 
Thus, the DPUC can use a smaller-size inductor with a large  $R_{DCR}$  and a smaller-size  $C_{OUT}$  while achieving higher efficiency and smaller  $\Delta V_{OUT}$  than the CBC.
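
The relation between the conversion ratio and the average inductor current can be worked through with a short sketch. It assumes ideal, lossless operation in the 2-phase mode. The CBC expression is the textbook result  $I_L = M \cdot I_{LOAD}$ . The DPUC expression follows from the statement above that the L-path conducts  $I_L$  only during  $\phi_2$  and that the C-path carries the same average current, so  $I_{LOAD} = 2(1-D)I_L$ . Function names are illustrative.

```python
def dpuc_ratio(duty):
    """Ideal DPUC conversion ratio M = (2 - D) / (2 - 2D)."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty must lie in [0, 1)")
    return (2.0 - duty) / (2.0 - 2.0 * duty)


def dpuc_duty(ratio):
    """Invert M = (2 - D) / (2 - 2D) for D."""
    if ratio < 1.0:
        raise ValueError("the DPUC only steps up (M >= 1)")
    return (2.0 * ratio - 2.0) / (2.0 * ratio - 1.0)


def cbc_inductor_current(i_load, ratio):
    """Average inductor current of an ideal conventional boost converter."""
    return ratio * i_load


def dpuc_inductor_current(i_load, ratio):
    """Average inductor current of the ideal DPUC (2-phase assumption).

    The L-path and the C-path each supply half of the load charge, and
    the L-path conducts only for (1 - D) of the period.
    """
    duty = dpuc_duty(ratio)
    return i_load / (2.0 * (1.0 - duty))
```

Under these assumptions the DPUC inductor current simplifies to  $(M - 1/2)I_{LOAD}$ , which is lower than the CBC value by  $I_{LOAD}/2$  at the same conversion ratio.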

Figure 27.5.3 shows the top structure of the DPUC. In this paper, the peak current controller is adopted by sensing the output error voltage and the inductor current with slope compensation in generating  $\phi_1$ . The remaining time excluding  $\phi_1$  is determined as  $\phi_2$  to activate the 2-phase mode.

Here, there is a problem when  $\phi_1$  is less than 0.5Ts because  $I_C$  grows in inverse proportion to the  $\phi_1$  period and flows into the output through  $C_F$  within  $\phi_1$ . This is because, in the 2-phase mode of Fig. 27.5.2,  $I_C$  must move within  $\phi_1$  the same amount of charge that  $C_F$  exchanges during  $\phi_2$  in order to maintain the charge balance on  $C_F$ . If  $\phi_1$  is short,  $I_C$  becomes large and this increases the RMS current in the switch, which degrades the efficiency and  $\Delta V_{OUT}$ . To reduce the peak of  $I_C$  to a low level for short  $\phi_1$ ,  $\phi_3$  is inserted between  $\phi_1$  and  $\phi_2$ , as shown in the bottom left of Fig. 27.5.3. In  $\phi_3$ , the switches  $S_2$ ,  $S_3$ , and  $S_5$  are turned on simultaneously to combine the L-path and C-path to deliver the current to the output. The operation then becomes a 3-phase mode satisfying  $\phi_1 + \phi_3 = 0.5Ts$ , and a large peak  $I_C$  does not appear in  $C_F$  even though  $\phi_1$  is short, thereby lowering the RMS current significantly. In addition,  $\Delta V_{OUT}$  and switching spikes are also considerably reduced as a result. Therefore, by selecting 3-phase mode and 2-phase mode properly, the DPUC can increase efficiency and reduce  $\Delta V_{OUT}$  in the whole range of  $\phi_1$ . To obtain these advantages automatically, the phase mode selector is designed to judge whether  $\phi_1$  is larger or smaller than 0.5Ts and determine whether to operate in 2-phase mode or 3-phase mode, which is done using a simple algorithm as shown in the right side of Fig. 27.5.3.
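
The mode decision described above can be summarized in a minimal behavioral sketch. It only captures the timing rule stated in the text ( $\phi_3$  fills the gap so that  $\phi_1 + \phi_3 = 0.5Ts$ ) and is not the circuit-level algorithm of Fig. 27.5.3. The function name, the returned dictionary, and the choice of 2-phase mode when  $\phi_1$  equals exactly 0.5Ts are assumptions.

```python
def select_phase_mode(phi1, ts):
    """Split one switching period ts into phi1, phi3, and phi2 durations.

    phi1 is the on-time requested by the peak current controller.
    """
    if not 0.0 <= phi1 <= ts:
        raise ValueError("phi1 must lie within the switching period")
    if phi1 >= 0.5 * ts:
        # 2-phase mode: the remainder of the period is phi2.
        return {"mode": "2-phase", "phi1": phi1, "phi3": 0.0, "phi2": ts - phi1}
    # 3-phase mode: insert phi3 so that phi1 + phi3 = 0.5 * ts.
    phi3 = 0.5 * ts - phi1
    return {"mode": "3-phase", "phi1": phi1, "phi3": phi3, "phi2": ts - phi1 - phi3}
```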

Moreover, the DPUC provides an additional advantage with regard to the transient response compared with the CBC. It is well known that the CBC has a right-half-plane- (RHP) zero effect that worsens the transient response [2]. This is because a temporarily opposite reaction occurs at the output delivery current for an abrupt change of  $\phi_1$  when an abrupt load change occurs in the CBC. This makes the transient response slower. On the other hand, in the DPUC, this effect is alleviated by the feedforward characteristic of the C-path. The top of Fig. 27.5.4 shows  $I_D$  and  $I_L$  in a step-up load transient condition as a simulated waveform. We can see that the rapid increment of C-path current as  $\phi_1$  increases in the DPUC alleviates the RHP zero effect. Therefore, the DPUC can achieve a faster transient response than the CBC with the same PWM controller. Owing to this effect, the stability of the DPUC can easily be guaranteed by using a conventional controller as well.

In discontinuous conduction mode (DCM), the DPUC also achieves small  $\Delta V_{OUT}$  at the output. To support DCM, a shrinking-diode-time zero-current detector (SDT-ZCD) is adopted as shown in the bottom of Fig. 27.5.4 [5]. The feedback is constructed to realize accurate ZCD. The zero current of  $I_L$  is detected at  $S_2$ . The SDT-ZCD monitors whether the rising-edge pulse ( $V_{S2R}$ ) of the  $S_2$  gate voltage ( $V_{SG}$ ) and the falling-edge pulse ( $V_{XF}$ ) of the switching node ( $V_X$ ) overlap, generating an UP/DN signal. By repeating UP and DN, it realizes accurate ZCD, thus improving the efficiency in DCM.

The DPUC was fabricated in a 1P4M 0.18 $\mu m$  BCD process. The left top and middle of Fig. 27.5.5 show the measured waveforms for  $V_{OUT}=4.2V$  when  $I_{LOAD}$  is 600mA. It operates in 2-phase mode when  $V_{IN}=2.8V$  and in 3-phase mode when  $V_{IN}$  is 3.3V, where the output ripples ( $\Delta V_{OUT}$ ) are as low as 8mV and 12mV, respectively. The bottom left of Fig. 27.5.5 shows that the load transient response is improved in comparison with the CBC. The top right in Fig. 27.5.5 shows that the waveform of the CBC has a significantly larger  $\Delta V_{OUT}$  of 50mV with larger  $I_L$ , which is measured under the same conditions as the 2-phase mode of the DPUC. In addition, the middle right of Fig. 27.5.5 shows the DCM operation with  $\Delta V_{OUT}=5mV$  at  $V_{IN}=3.3V$  when  $I_{LOAD}=40mA$ . The bottom right of Fig. 27.5.5 shows the enlarged  $V_X$  node waveform, which demonstrates the operation of the SDT-ZCD.

The top left of Fig. 27.5.6 shows the efficiency plot by varying  $I_{LOAD}$  at  $V_{IN}=3V$  and  $V_{OUT}=4.2V$ . The bottom of Fig. 27.5.6 presents a comparison table with other state-of-the-art works of the CBC. Even though  $R_{DCR}$  is 200m $\Omega$ , the DPUC has a peak efficiency of 95.2%, which allows the use of a cheaper and smaller inductor than in the CBC. In DCM with decreased switching frequency, SDT-ZCD increases light load efficiency. The top right of Fig. 27.5.6 shows the measured  $\Delta V_{OUT}$  for  $V_{OUT}=4.5V$  with  $I_{LOAD}$  of 500mA by varying  $V_{IN}$  from 2.4V to 4.2V, comparing the CBC and the DPUC. In the DPUC,  $\Delta V_{OUT}$  is reduced by more than 3 times relative to the CBC in the 2-phase mode, and the 3-phase mode maintains  $\Delta V_{OUT}$  at a low level even when  $V_{IN}$  increases. The die micrograph is shown in Fig. 27.5.7.

### References:

- [1] T.-H. Kong, et al., "A 0.791 mm<sup>2</sup> On-Chip Self-Aligned Comparator Controller for Boost DC-DC Converter Using Switching Noise Robust Charge-Pump," *IEEE JSSC*, vol. 49, no. 2, pp. 502-512, Feb. 2014.
- [2] Y. K. Luo, et al., "Time-Multiplexing Current Balance Interleaved Current-Mode Boost DC-DC Converter for Alleviating the Effects of Right-Half-Plane Zero," *IEEE Trans. Power Electron.*, vol. 27, no. 9, pp. 4098-4112, Sept. 2012.
- [3] 90% Efficient Synchronous Boost Converter with 600-mA Switch, TPS61071-Q1 Datasheet, Texas Instruments Inc.
- [4] X. Jing and P.K.T. Mok, "A Fast Fixed-Frequency Adaptive-On-Time Boost Converter with Light Load Efficiency Enhancement and Predictable Noise Spectrum," *IEEE JSSC*, vol. 48, no. 10, pp. 2442-2456, Oct. 2013.
- [5] J. Kim, et al., "A DC-DC Boost Converter With Variation-Tolerant MPPT Technique and Efficient ZCS Circuit for Thermoelectric Energy Harvesting Applications," *IEEE Trans. Power Electron.*, vol. 28, no. 8, pp. 3827-3833, Aug. 2013.





Figure 27.5.7: Die micrograph.

## 27.6 An 87.1% Efficiency RF-PA Envelope-Tracking Modulator for 80MHz LTE-Advanced Transmitter and 31dBm PA Output Power for HPUE in 0.153μm CMOS

Chen-Yen Ho, Shih-Mei Lin, Che-Hao Meng, Hao-Ping Hong, Sheng-Hong Yan, Ting-Hsun Kuo, Chia-Sheng Peng, Chieh-Hsun Hsiao, Hsin-Hung Chen, Da-Wei Sung, Chien-Wei Kuan

MediaTek, Hsinchu, Taiwan

Modulation schemes employed in long-term-evolution advanced (LTE-A) services for higher data-rate with high peak-to-average power ratios (PAPR) are becoming more complicated, which degrades the efficiency of RF power amplifiers (PA). Envelope-tracking modulators (ETM) have been proposed to improve the PA efficiency and linearity by dynamically adjusting the supply voltage of the RF PA according to the envelope of the transmitted signal.

To increase the uplink data-rate, LTE-A intra-band contiguous carrier aggregation (CCA) requires a wider-bandwidth ETM to track the envelope. The hybrid ac-coupled modulator [1-6] has a notable efficiency advantage in that it couples the AC envelope-modulated signal to the ETM output. Unfortunately, the system efficiency and tracking bandwidth are still limited by the architecture selection and circuit design. To avoid the efficiency degradation due to a Class-AB amplifier, the authors in [4] presented a multilevel buck regulator to replace the Class-AB amplifier. However, achieving wider bandwidth and watt-level power from a buck regulator is challenging due to increased switching losses. For high-bandwidth and high-dynamic-range ET solutions, the authors in [5] introduced an amplifier with dual Class-AB drivers. The additional Class-AB driver, however, costs extra area. Moreover, the bandwidth is still limited to LTE-40MHz.

We demonstrate a hybrid ac-coupled ETM in Fig. 27.6.1 that achieves the widest bandwidth among published ETMs for cellular LTE applications [1-5] by employing a dual-mode high-speed AC feed-forward Class-AB linear amplifier (LA). The measured E-ACLR for LTE-80MHz (4CCA) at 26dBm PA output power is -38.1dBc. An auto-detect selection (ADS) is utilized to achieve an ET dynamic range of 13dB. Proper control of the amplitude of the coupled AC signal achieves not only higher overall system efficiency but also larger power-delivery capability. To meet these needs, both switching regulators, the buck-boost and the dual-power-line (DPL) buck converters, must offer excellent efficiency and fast settling at all output levels. The proposed Output Dependent Auxiliary Switch (ODAS) monitors the output level to select the type of power switch for optimized efficiency. In addition, a Self-Compensated Ramp Generator (SCRG) is introduced to create an artificial ramp waveform that compensates the nonlinear characteristics of the transfer function of the buck-boost converter to achieve fast settling. Finally, the DPL Buck combining both DC-DC power paths is realized to achieve high-power-user-equipment (HPUE) Power Class 2 (i.e., a maximum transmit power of 26dBm) for TD-LTE Band41.

The proposed dual-mode high-speed AC feed-forward Class-AB LA design is shown in Fig. 27.6.2. The LA can be configured in high-bandwidth mode (HBM) or high-gain mode (HGM) according to the TD-LTE/FDD-LTE application. Since receiver-band noise (RXBN) is critical for the FDD-LTE system, a high-loop-gain configuration is chosen to suppress RXBN by switching  $S_{\text{MODE}2}$  to HGM. The dominant pole of the LA is determined by the  $g_{m1/2}$  and the equivalent impedance at nodes  $V_{\text{op1/on1}}$  or  $V_{\text{op2/on2}}$ . When  $S_{\text{MODE}1}$  is switched to HBM, the bandwidth is extended to 2× that of HGM. High bandwidth is the main design consideration for intra-band CCA, and thus HBM is used for all applications above 2CCA (40MHz). The ETM adopts core devices for all the signal paths to extend the dominant pole to higher frequency and achieve a 3dB bandwidth above 100MHz. The high-speed direct AC feed-forward path M7~M9 drives a Class-AB output stage, which further enhances the LA bandwidth by 30% to 40%. In addition, to protect the core devices from overvoltage stress, cascode I/O devices are used for all amplifier stages and the Class-AB output stage. The high-speed auxiliary amplifier in Fig. 27.6.2 is utilized to drive M10 and M11. To increase the 2<sup>nd</sup> pole frequency at the output, core devices MP2 and MN2 are used to realize high transconductance as the output driver, which ensures that the  $V_{ds}$  across the core devices M12 and M13 always stays in a reliable region. In addition, the headroom limitation of the Class-AB output stage is unavoidable in the cascode structure [5]. Therefore, the ADS is applied. The ADS gate control of M10 and M11 is determined by  $V_{BB}$ . When  $V_{BB}$  is smaller than the threshold, SW1 and SW2 are on, and M10 and M11 operate as small resistors. As a result, the headroom requirement is reduced and the ET dynamic range is extended, which achieves better efficiency even at mid-range PA output power. In this work, the minimum supply voltage of  $V_{BB}$  is 1.5V.

To further improve the efficiency of the ETM, reducing resistance in the power path of DC-DC becomes an essential issue. The low battery supply, however, usually limits the performance and flexibility. A DPL-buck is employed to determine the power path,  $V_{BAT}$  or  $V_{BB}$ , based on the modulated envelope signal even for HPUE application. The dynamic body bias (DBB) and dynamic driving switch (DDS) are adopted as shown in Fig. 27.6.3. DBB automatically chooses a suitable body bias voltage level to acquire lowest on-resistance for highest efficiency while DDS always keeps at the highest power source level to secure driving capability.

Figure 27.6.3 shows the relation between  $V_{COMP}$  and  $V_{BB}$  for different  $V_{BAT}$  in each mode. The SCRG predicts the target  $V_{COMP}$  and creates an artificial ramp waveform that tracks the relationship between  $V_{BAT}$  and  $V_{BB}$ . The ramp compensates the nonlinear characteristics of the transfer function, shapes the ratio of  $V_{COMP}$  to  $\beta \cdot V_{BB}$  into a slope of 1, and avoids the time otherwise needed to charge/discharge the large internal capacitor that is designed for stability. Hence, the buck-boost converter achieves a fast step up/down of 18μs, settling within ±0.5dB, during  $V_{BB}$  configuration and Inner-Loop Power Control (ILPC) for a 3G/4G cellular system.

Figure 27.6.4 shows the proposed ETM efficiency plot with a fixed 4Ω resistor load vs. its output power. The proposed ETM is capable of delivering an output power up to 4W. The ETM peak efficiencies for 20/40/60/80MHz are 87/85.5/82.3/81.2%, respectively. The ET system was measured with an external 4G-LTE high-band (HB) PA. An LTE QPSK signal was used for measurement. The target antenna output power is 23dBm and 26dBm for Power Class 3 and 2 (HPUE), respectively, assuming a 5dB front-end loss in the transmitter path and 2dB maximum power reduction (MPR) for above 40MHz intra-band CCA. Figure 27.6.5 shows the measured DC supply current at 3.8V for Band41 LTE-20MHz vs. the RF-PA output power. Considering the HPUE case, the ET achieves 13dB dynamic range from 18dBm to 31dBm. A lowest 1.5V  $V_{BB}$  supply for the LA enables ET mode to present higher efficiency than average-power-tracking (APT) mode during the mid-range PA output power. Compared with APT mode, the ETM saves power by 34.5% at 31dBm PA output power. Figure 27.6.6 shows the measured LTE-80MHz at Band41 output spectrum and timing waveform of 26dBm PA output power (attenuator and cable loss are 11.9dB). The comparison table summarizes the performance of the ETM and compares to other state-of-the-art works [1-5]. Measured E-UTRA ACLRs for bandwidth of 40MHz (2CCA), 60MHz (3CCA), and 80MHz (4CCA) at 26dBm PA output power are -41.5, -39.9, and -38.1dBc, respectively. The die micrograph is shown in Fig. 27.6.7. The die size is 5.133mm<sup>2</sup> (2.95mm×1.74mm) in a 0.153μm CMOS process and is packaged in a 28-pin wafer-level chip-scale package (WLCSP).
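The power levels quoted above can be cross-checked with simple dB bookkeeping. The sketch below assumes that the required PA output is the antenna target plus the front-end loss, minus any MPR; the function and variable names are illustrative and do not come from the paper, and applying the 2dB MPR to the Power Class 3 target is only one possible reading of the text.

```python
def pa_output_dbm(antenna_target_dbm, front_end_loss_db, mpr_db=0.0):
    """PA output power needed for a given antenna target (assumed dB budget)."""
    return antenna_target_dbm + front_end_loss_db - mpr_db


def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000


FRONT_END_LOSS_DB = 5.0  # front-end loss assumed in the text

pc3_dbm = pa_output_dbm(23.0, FRONT_END_LOSS_DB)                  # Power Class 3
pc2_dbm = pa_output_dbm(26.0, FRONT_END_LOSS_DB)                  # Power Class 2 (HPUE)
pc3_mpr_dbm = pa_output_dbm(23.0, FRONT_END_LOSS_DB, mpr_db=2.0)  # with 2dB MPR

et_dynamic_range_db = 31.0 - 18.0  # ET range quoted for the HPUE case

print(f"PC3 PA output: {pc3_dbm:.1f} dBm")
print(f"PC2 PA output: {pc2_dbm:.1f} dBm ({dbm_to_watts(pc2_dbm):.2f} W)")
print(f"PC3 with MPR:  {pc3_mpr_dbm:.1f} dBm")
print(f"ET dynamic range: {et_dynamic_range_db:.0f} dB")
```

Under these assumptions the Power Class 2 target corresponds to the 31dBm PA output power named in the title, and the 18dBm-to-31dBm ET range gives the stated 13dB.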

### Acknowledgements:

The authors thank Yu-Hsin Lin for technical consultation and Dr. Tsung-Hsien Lin for paper discussion.

### References:

- [1] X. Liu, et al., "A 2.4V 23.9dBm 35.7%-PAE -32.1dBc-ACLR LTE-20MHz Envelope-Shaping-and-Tracking System with a Multiloop-Controlled AC-Coupling Supply Modulator and a Mode-Switching PA," *ISSCC*, pp. 38-39, Feb. 2017.
- [2] J.-S. Paek, et al., "An RF-PA Supply Modulator Achieving 83% Efficiency and -136dBm/Hz Noise for LTE-40MHz and GSM 35dBm Applications," *ISSCC*, pp. 354-355, Feb. 2016.
- [3] M. Hassan, et al., "A CMOS Dual-Switching Power-Supply Modulator with 8% Efficiency Improvement for 20MHz LTE Envelope Tracking RF Power Amplifiers," *ISSCC*, pp. 366-367, Feb. 2013.
- [4] P. Arno, et al., "Envelope Modulator for Multimode Transmitters with AC-Coupled Multilevel Regulators," *ISSCC*, pp. 296-297, Feb. 2014.
- [5] S.-C. Lee, et al., "A Hybrid Supply Modulator with 10dB ET Operation Dynamic Range Achieving a PAE of 42.6% at 27.0dBm PA Output Power," *ISSCC*, pp. 42-43, Feb. 2015.
- [6] P. Riehl, et al., "An AC-Coupled Hybrid Envelope Modulator for HSUPA Transmitters with 80% Modulator Efficiency," *ISSCC*, pp. 364-365, Feb. 2013.



Figure 27.6.1: Architecture of proposed hybrid ac-coupled ETM.



Figure 27.6.2: Schematic of dual-mode Class-AB linear amplifier with direct AC feed-forward.



Figure 27.6.3: Dynamic Body Bias & Dynamic Driving Switch (DBB & DDS) for DPL-Buck converter. Self-Compensated Ramp Generator (SCRG) and circuit implementation for Buck-Boost converter.



Figure 27.6.4: ETM efficiency versus its output power at 20MHz/40MHz/60MHz/80MHz.



Figure 27.6.5: DC supply power versus RF-PA output power (with APT-ET comparison).



Figure 27.6.6: Measured RF-PA output spectrum and timing waveform for intra-band CCA LTE-80MHz at band41 26.0dBm PA Output Power with -38.1dBc E-ACLR and performance comparison to published works.



Figure 27.6.7: Die micrograph.

## 27.7 A 2TX Supply Modulator for Envelope-Tracking Power Amplifier Supporting Intra- and Inter-Band Uplink Carrier Aggregation and Power Class-2 High-Power User Equipment

Takahiro Nomiyama<sup>1</sup>, Yongsik Youn<sup>2</sup>, Younghwan Choo<sup>1</sup>, Dongsu Kim<sup>1</sup>, Jaeyeol Han<sup>1</sup>, Junhee Jung<sup>1</sup>, Jongbeom Baek<sup>1</sup>, Sungjun Lee<sup>1</sup>, Euiyoung Park<sup>1</sup>, Jeonghyun Choi<sup>1</sup>, Ji-Seon Paek<sup>1</sup>, Jongwoo Lee<sup>1</sup>, Thomas Byunghak Cho<sup>1</sup>, Inyup Kang<sup>1</sup>

<sup>1</sup>Samsung Electronics, Hwaseong, Korea

<sup>2</sup>Samsung Semiconductor, San Jose, CA

Uplink carrier aggregation (UL-CA) and high-power user equipment (HPUE) are proposed in the recent 3GPP standard [1]. UL-CA increases data-rate by aggregating intra- or inter-band carriers, and requires a supply modulator (SM) IC to generate two independently modulated supply voltages for the separate transmitter (TX) paths. Power Class 2, a new HPUE standard intended for TD-LTE Band41, allows 26dBm output power, which is 3dB higher than typical Power Class 3. To achieve 26dBm output, an SM must provide an RF power amplifier (PA) with larger current and boosted voltage above battery range. However, conventional SM-ICs support only Power Class 3 PAs with intra-band contiguous CA up to 40MHz bandwidth in an envelope-tracking (ET) operation [2]. In order to support non-contiguous intra- and inter-band CA, a typical ET system needs two SMs with double the external components, occupying great PCB area in a cellular handset. This paper presents a single-chip SM-IC with two independently controlled TX outputs supporting Power Class 2. In this way, the SM-IC saves the BOM cost and the PCB area while achieving high system-power efficiency and low receiver-band noise.

Figure 27.7.1 shows the presented 2TX SM-IC architecture consisting of one buck-boost (BB) converter, two dual-supply buck (BK) converters, two Class-AB linear amplifiers (LA), and four average-power-tracking (APT) switches. For Power Class 2, the required PA supply voltage reaches up to 5.0V, much higher than typical 3.2-to-4.2V battery range. To meet this supply voltage requirement, BB is employed for both step-up and -down voltage generation, and the BB also supplies BKs as an additional power source [3]. For 2TX application within a single die, a straightforward solution is to double the dedicated SM set (BB, BK, LA) per each TX. But it also doubles the die area and external components, having almost no advantage over two separate SM chips. Since BB needs significant die area and a huge power inductor, in order to get the single-chip benefit, the 2TX SM-IC removed an extra BB and was designed to have a single shared BB for both the TX paths. Instead of using two big load capacitors ( $C_{BK}$ ), furthermore, one capacitor was also removed by sharing a  $C_{BK}$  for the TX paths. The sharing of BB (that is  $C_{BB}$ ) and  $C_{BK}$  are realized by the four APT switches of  $SW_{BB}$  and  $SW_{BK}$ , respectively. In typical 1TX application, the non-operating TX path is disabled, and the operating TX is supported by the BB, a BK, and an LA. The hybrid BK and LA operate jointly to provide the PA a modulated supply as in the conventional ET operation [2-4]. In 2TX application, both the TX paths are enabled for the combined modes including APT-APT, ET-ET, and mixed APT-ET. Since BB is the boosted voltage source above the battery range, in a usual assignment, it supplies the TX demanding higher voltage than the other TX.

Because the 2TX SM has a single shared BB and independently operating TXs,  $C_{BB}$  and  $C_{BK}$  need to be connected properly during the transition interval of every combined-mode change. Two mandatory guidelines must be obeyed. First, to avoid destructive damage to the four switches,  $C_{BB}$  and  $C_{BK}$  must not be tied together through the switches unless their voltage difference is close to zero. Second, to keep a seamless TX supply in any event, a TX in transition must not interrupt the other active TX. These guidelines are practically implemented with a capacitor-swapping technique between TXs as shown in Fig. 27.7.2. As an example, once the TX1 path starts its transition while TX2 keeps actively communicating, the capacitor swap detector generates a swap trigger signal when the difference between  $V_{BB}$  and  $V_{BK}$  is within the threshold. The four switches are shorted together at that moment, and then  $C_{BB}$  and  $C_{BK}$  are swapped between TXs shortly afterwards. The transition ends within 20μs to meet the transition interval specification of the LTE system.

As shown in Fig. 27.7.3, a dual-supply ( $V_{BATT}$ ,  $V_{BB}$ ) BK per TX path is also employed to drive a high-voltage PA for Power Class 2. When the required output voltage is below the battery voltage, the BK switches conventionally between ground and  $V_{BATT}$ . When the required output is above the battery voltage, the BK of [3] changes its supply to the BB output and switches between ground and  $V_{BB}$  (both cases return to ground, R2G). In this 2TX SM-IC, however, the BK switches between  $V_{BATT}$  and  $V_{BB}$  when the required output is above the battery voltage (including the typical case, both cases return to battery, R2B). Compared with R2G switching, R2B switching has several advantages. First, R2B has better efficiency. In the comparison example, R2B needs a lower duty ratio (50%) than R2G (80%) to produce the same output voltage (4.0V). Because  $V_{BB}$  is generated from  $V_{BATT}$  with BB conversion loss, drawing less current from  $V_{BB}$  improves the overall efficiency. Second, R2B has less output noise, which is crucial for the FDD system. Because the generated noise is proportional to the square of the switching voltage difference, the R2B (5-3=2V) noise is only 16% of the R2G (5-0=5V) noise. The PMOS-PMOS switching of R2B was a challenge because it also demands PMOS body switching, and the body must switch as fast as the gate is driven. As a result, the body driver is accurately synchronized with the gate driving signal in design and is incorporated into the PMOS switch itself in layout.
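The duty-ratio and noise figures of the comparison example can be reproduced with a short calculation. This is a minimal sketch assuming an ideal, lossless buck stage in steady state, where the output equals the time-weighted average of the two switched rails; the helper name `buck_duty` is illustrative, and the 3V battery and 5V  $V_{BB}$  values are those implied by the text.

```python
def buck_duty(v_out, v_low, v_high):
    """Fraction of time spent on the upper rail for an ideal buck stage
    switching between v_low and v_high (volt-second balance)."""
    return (v_out - v_low) / (v_high - v_low)


V_BATT = 3.0  # battery voltage in the example (V)
V_BB = 5.0    # buck-boost output in the example (V)
V_OUT = 4.0   # required output (V)

duty_r2g = buck_duty(V_OUT, 0.0, V_BB)     # return-to-ground: GND <-> V_BB
duty_r2b = buck_duty(V_OUT, V_BATT, V_BB)  # return-to-battery: V_BATT <-> V_BB

# Noise is stated to scale with the square of the switching amplitude.
noise_ratio = ((V_BB - V_BATT) / (V_BB - 0.0)) ** 2

print(f"R2G duty on V_BB: {duty_r2g:.0%}")
print(f"R2B duty on V_BB: {duty_r2b:.0%}")
print(f"R2B noise relative to R2G: {noise_ratio:.0%}")
```

With these inputs the sketch gives the 80% and 50% duty ratios and the 16% noise ratio quoted above.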

With a wide 2.5-to-5.0V battery range in measurements, the 2TX SM-IC provides dynamic voltage-scaled outputs from 0.4V to 5.0V for Power Class 2 PAs while supporting LTE 40MHz bandwidth in ET operation. Combined with a Bypass-LDO having 50mΩ on-resistance, the BB supplies up to 3.0A for a 2G-GSM PA. The features of the SM-IC are clearly shown in Fig. 27.7.4. Due to the capacitor swapping technique between TXs, the active TX2 keeps constant voltage without any interruption from the TX1 transition, and both the TXs have smooth transition with the combination of APT and ET. The ET-ET waveforms of different bandwidth signals also show completely independent TX operation from each other. The adaptive R2B switching changes its state dynamically between Gnd- $V_{BATT}$  and  $V_{BATT}$ - $V_{BB}$  depending on the envelope signal, and allows both less power consumption with the lower  $V_{BB}$  duty-ratio and output noise reduction with the smaller  $V_{BATT}$ - $V_{BB}$  switching amplitude.

Despite the BB having more loss and noise than the BK, by employing the highly efficient low-noise R2B switching, the SM-IC reaches maximum 84.6% efficiency as shown in Fig. 27.7.5, and achieves the low output noises of -133dBm/Hz @30MHz (LTE Band17) and -142dBm/Hz @95MHz (LTE Band3). With a commercial power amplifier module integrated duplexer (PAMiD) for Power Class 2 and LTE Band41, the PAMiD output power reaches 26.4dBm while consuming 2.04W dc power, which is equivalent to 42.7% PAE at 29.4dBm PA output assuming 3dB duplexer loss. Comparing with the APT mode, the ET mode extends its range down to about 16dBm and saves 800mW (47% of the dc supply power) at 26dBm PAMiD output power. Owing to the closed-loop highly linear output regulation of the LA, furthermore, E-UTRA ACLR is measured as 38.2dBc at the 26.4dBm PAMiD output power as shown in Fig. 27.7.6. Figure 27.7.6 also shows the overall performance summary and the comparison with prior works. The die micrograph of the 2TX SM-IC is shown in Fig. 27.7.7. Due to the area-efficient architecture, 6.0mm<sup>2</sup> (2.45mm×2.45mm) of die size is achieved with a 90nm CMOS process and 49-pin wafer-level chip-scale package. Comparing with double the commercial 1TX product having a BB [3], the presented state-of-the-art 2TX SM-IC occupies 40% less die area.
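The quoted 42.7% figure can be checked from the stated output power, duplexer loss, and dc power. The sketch below assumes the efficiency is computed as PA output power over dc power, neglecting the RF input power (so strictly it is an output-to-dc ratio); the variable names are illustrative.

```python
def dbm_to_watts(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000


PAMID_OUT_DBM = 26.4    # measured PAMiD output power
DUPLEXER_LOSS_DB = 3.0  # duplexer loss assumed in the text
P_DC_W = 2.04           # dc power consumption

pa_out_dbm = PAMID_OUT_DBM + DUPLEXER_LOSS_DB   # power at the PA output
efficiency = dbm_to_watts(pa_out_dbm) / P_DC_W  # RF input power neglected

print(f"PA output: {pa_out_dbm:.1f} dBm = {dbm_to_watts(pa_out_dbm) * 1e3:.0f} mW")
print(f"Output power / dc power: {efficiency:.1%}")
```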

### Acknowledgements:

The authors thank Jong-Ku Kim and Junseok Yang for their contribution to implement the SM-IC. Authors also thank James Haslett for his technical editing.

### References:

- [1] 3GPP TS 36.101: Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); User Equipment (UE) Radio Transmission and Reception.
- [2] J. S. Paek, et al., "An RF-PA Supply Modulator Achieving 83% Efficiency and -136dBm/Hz Noise for LTE-40MHz and GSM 35dBm Applications," *ISSCC*, pp. 354-355, Feb. 2016.
- [3] J. S. Paek, et al., "A -137 dBm/Hz Noise, 82% Efficiency AC-Coupled Hybrid Supply Modulator With Integrated Buck-Boost Converter," *IEEE JSSC*, vol. 51, no. 11, pp. 2757-2768, Nov. 2016.
- [4] M. Hassan, et al., "A CMOS Dual-Switching Power-Supply Modulator with 8% Efficiency Improvement for 20MHz LTE Envelope Tracking RF Power Amplifiers," *ISSCC*, pp. 366-367, Feb. 2013.
- [5] X. Liu, et al., "A 2.4V 23.9dBm 35.7%-PAE -32.1dBc-ACLR LTE-20MHz Envelope-Shaping-and-Tracking System with a Multiloop-Controlled AC-Coupling Supply Modulator and a Mode-Switching PA," *ISSCC*, pp. 38-39, Feb. 2017.



Figure 27.7.1: Presented 2TX Supply Modulator architecture for UL-CA and HPUE.

Fig. 27.7.2 panel labels: "Transition of one TX" (C<sub>BB</sub> and C<sub>BK</sub> swap; no cross regulation at TX2 from the TX1 transition) and "Transition of both TXs" (C<sub>BB</sub> and C<sub>BK</sub> swap). Axes: TX voltage (0V to 5V) versus time.

Figure 27.7.2: Seamless TX transitions with capacitor swapping technique.



Figure 27.7.3: Presented Return-to-Battery switching at dual-supply buck converter.



Figure 27.7.4: Measured waveforms showing the features of 2TX SM-IC.



Figure 27.7.5: SM Efficiency at ET mode and the dc power consumption with PAMiD.



Figure 27.7.6: Measured 26.4dBm PAMiD output spectrum for LTE40MHz Band41 and the comparison to prior works.



Figure 27.7.7: Die micrograph.

## 27.8 94% Power-Recycle and Near-Zero Driving-Dead-Zone N-Type Low-Dropout Regulator with 20mV Undershoot at Short-Period Load Transient of Flash Memory in Smart Phone

Wei-Chung Chen, Tzu-Chi Huang, Chao-Chang Chiu, Chih-Wei Chang, Kuo-Chun Hsu

MediaTek, Hsinchu, Taiwan

In power-management integrated circuits (PMIC) for smart phones, cascaded buck and low-dropout (LDO) regulators with N-type power MOSFETs are commonly utilized for high conversion efficiency, power quality and high-density integration as shown in Fig. 27.8.1 [1]. Long paths on the printed-circuit board (PCB) from the PMIC to the following applications result in obvious parasitic effects of large  $L_{PCB}$  and  $R_{PCB}$ , and multilayer ceramic capacitors (MLCC) placed near the application side are necessary. Complex and unpredictable PCB networks induce unexpected poles and zeros in the LDO loop so that an LDO with wide bandwidth (BW) and fast transient response is difficult to design. Furthermore, flash memory, such as universal flash storage (UFS) and embedded-multimedia cards (eMMC), has short-period heavy-to-light-to-heavy (H-L-H) load transients, which make LDO design more challenging. In the waveform shown in Fig. 27.8.1, the gate voltage of the power MOSFET ( $V_{GATE}$ ) is pulled toward 0V when overshoot of  $V_{OUT}$  is caused by a heavy-to-light load transient. Once the light-to-heavy load transient occurs at moment  $t_0$  with  $V_{OUT}$  overshoot,  $V_{OUT}$  then suffers from large undershoot because the N-type power MOSFET has a driving dead zone. The driving dead zone is defined as the region in which the gate voltage  $V_{GATE}$  is lower than the  $V_{OUT}$  level and the power MOSFET delivers no current. The power MOSFET and compensation capacitance form a heavy capacitive load so that transient performance is degraded. In prior art, the amplifier (amp) and buffer stage consume large quiescent current ( $I_Q$ ) for easier stability compensation and higher slew rate (SR). In addition, a dummy load current ( $I_{dummyload}$ ) at  $V_{OUT}$  or a complex clamping function at  $V_{GATE}$  is utilized for the short-period H-L-H load transient of flash memory. However, the efficiency and circuit complexity are sacrificed as a result.

Figure 27.8.2 shows the circuit implementation of the proposed N-type LDO, including a virtual-ground-based dynamic-power-recycling (VGDR) buffer and anti-ringing feed-forward (ARFF) compensation for meeting smart phone PMIC requirements. A VGDR buffer, which consists of MOSFETs  $M_{P5}$ ,  $M_{P6}$ , and  $M_{N7}$ , and the resistor  $R_3$ , sets  $V_{OUT}$  as virtual ground. The diode-connected  $M_{N7}$  provides low impedance at  $V_{GATE}$  for generating a high-frequency pole,  $P_{GATE}$ . The bias current of  $M_{N7}$  is determined by  $V_{GATE}$ , which has a positive correlation with load conditions. Therefore, the proposed VGDR buffer performs dynamic bias current adjustment to achieve high SR at load transient, and dynamic  $P_{GATE}$  can track the output capacitance pole during different load conditions for good stability. The constant bias  $I_B=1\mu A$  and  $R_S=100k\Omega$  provide extra pull-low ability, especially for  $M_{N7}$  to operate in the sub-threshold region under light load conditions. The current-mirror structure of  $M_{P5}$  and  $M_{P6}$  also performs dynamic current bias and generates low impedance at  $V_A$  to reduce compensation complexity.

An additional advantage of the virtual ground is that the inherent minimum level of  $V_{GATE}$  is equal to the  $V_{OUT}$  voltage level. Figure 27.8.3 (a) shows the improvement of  $\Delta V_{OUT}$  for a short-period H-L-H load transient. In a conventional N-type LDO,  $V_{GATE}$  is in the driving dead zone and the LDO is under open-loop operation during the  $V_{OUT}$  overshoot period. When a light-to-heavy load transient occurs during this period,  $V_{OUT}$  suffers from large undershoot and long settling time ( $\Delta t$ ) because the power MOSFET cannot deliver current until  $V_{GATE}$  is pulled up from 0V to a level higher than  $V_{OUT}$ . That is, a larger driving dead zone results in longer response lag and worse undershoot voltage. By contrast, the proposed VGDR buffer clamps the minimum  $V_{GATE}$  at the  $V_{OUT}$  level without extra detection circuitry during the  $V_{OUT}$  overshoot period. The LDO is guaranteed to stay in closed-loop operation because  $V_{GATE}$  has a near-zero driving dead zone. As a result, the power MOSFET can provide driving current instantly for the H-L-H load transient.

Furthermore, while the dynamic bias current in the VGDR buffer benefits low impedance and high SR, the virtual ground provides a power recycling path to  $V_{OUT}$ . Power recycle is defined as  $I_{Q,RECYCLE}/(I_Q+I_{Q,RECYCLE})$ , where  $I_Q$  and  $I_{Q,RECYCLE}$  are the quiescent currents from the controller flowing into ground and  $V_{OUT}$ , respectively. The dynamic bias  $I_1$  ( $I_{DYN}$ ) is about 0.05% of  $I_{LOAD}$  and the ratio of  $I_2$  to  $I_1$  is about 1/40. Consequently, at maximum  $I_{LOAD}=800mA$ , the proposed LDO recycles about 94% of the controller power by recycling the dynamic current  $I_1=400\mu A$ , while  $I_2$  and the constant quiescent current of the rest of the controller are  $10\mu A$  and  $15\mu A$ , respectively.
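The 94% figure follows directly from the power-recycle definition. The sketch below is a plain numerical check, assuming  $I_1$  is 0.05% of the 800mA maximum load (which gives the quoted  $400\mu A$ ) and that  $I_2$  and the constant controller current both flow to ground; the function and variable names are illustrative.

```python
def power_recycle(i_q, i_q_recycle):
    """Power-recycle ratio: I_Q,RECYCLE / (I_Q + I_Q,RECYCLE)."""
    return i_q_recycle / (i_q + i_q_recycle)


I_LOAD_MAX = 800e-3      # maximum load current (A)
I1 = 0.0005 * I_LOAD_MAX  # dynamic bias recycled into V_OUT (about 400 uA)
I2 = I1 / 40              # mirrored branch current to ground (about 10 uA)
I_CONST = 15e-6           # constant quiescent current of the rest of the controller (A)

i_q_ground = I2 + I_CONST  # total current flowing to ground (about 25 uA)
ratio = power_recycle(i_q_ground, I1)

print(f"I_Q to ground: {i_q_ground * 1e6:.0f} uA, recycled: {I1 * 1e6:.0f} uA")
print(f"Power recycle: {ratio:.1%}")
```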

On the other hand, the proposed ARFF compensation in Figure 27.8.2 provides a dominant pole for frequency compensation and enhances the transient response. For achieving high SR without sacrificing large quiescent current, two on-chip capacitors,  $C_1$  and  $C_2$ , are designed to avoid heavy capacitance load on the nodes through the main feedback path, such as nodes  $V_2$ ,  $V_3$ , and  $V_5$ . Besides,  $C_1$  acts as a feed-forward coupling path from  $V_{OUT}$  to  $V_1$  to further accelerate transient response while  $V_1$  response is limited by the low-quiescent-current requirement. In Fig. 27.8.3 (b),  $V_{OUT}$  undershoot can be fed back to  $V_1$  through the coupling path of  $C_1$ , and the power MOSFET can instantly deliver power to charge up  $V_{OUT}$  at light-to-heavy load transient. However, without  $M_F$ , the sudden charging up of  $V_{OUT}$  feeds back to  $V_1$  through  $C_1$  again so that the power delivery of the power MOSFET is withdrawn and  $V_{OUT}$  then drops again. That is,  $V_{OUT}$  has kick-back ringing. This work then uses an unbalanced push-pull buffer including large size of  $M_F$  and low bias current to achieve smooth recovery.

Figure 27.8.4 shows the frequency response considering the wide load range, PCB parasitic effects, and the MLCC with its equivalent series resistance (ESR), which generate two poles and two zeros.  $P_{PCB1}$  is approximately  $1/(2\pi C_{OUT} R_{LOAD})$ , where  $R_{LOAD}$  is the effective resistance of  $I_{LOAD}$ .  $P_{PCB2}$  is located at high frequency beyond the BW and its influence can be ignored. The two zeros,  $Z_{PCB1}$  and  $Z_{PCB2}$ , are located over a wide frequency range under different PCB conditions.  $Z_{ESR}$  is a zero generated by  $C_{OUT}$  and  $R_{ESR}$ . The proposed LDO uses  $C_1=10pF$  as Miller compensation to generate the dominant pole  $P_{COMP}$ . The  $M_{N1}$  and  $M_F$  in series with  $C_1$  contribute a left-half-plane (LHP) zero  $Z_{COMP}$  for improving the phase margin. Another auxiliary Miller capacitor  $C_2=2pF$  creates pole splitting for further pushing  $P_{GATE}$  to higher frequency. In addition, the VGDR buffer generates a low and dynamic impedance to make  $P_{GATE}$  track  $P_{PCB1}$  at different  $I_{LOAD}$  for low damping factor. Therefore, the simple structure of the proposed LDO benefits the low complexity of compensation.
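To give a feel for how far the output pole moves with load, the expression for  $P_{PCB1}$  can be evaluated numerically. This sketch assumes  $R_{LOAD}=V_{OUT}/I_{LOAD}$  and uses the  $1\mu F$  output capacitor and 1V output of the measurement setup; the 1mA and 100mA load points are hypothetical values chosen only for illustration.

```python
import math


def output_pole_hz(c_out, v_out, i_load):
    """Approximate output pole 1 / (2*pi*C_OUT*R_LOAD), with R_LOAD = V_OUT / I_LOAD."""
    r_load = v_out / i_load
    return 1.0 / (2.0 * math.pi * c_out * r_load)


C_OUT = 1e-6  # output capacitor (F)
V_OUT = 1.0   # output voltage (V)

# 800 mA is the maximum load; the lighter loads are illustrative assumptions.
for i_load in (1e-3, 100e-3, 800e-3):
    f_pole = output_pole_hz(C_OUT, V_OUT, i_load)
    print(f"I_LOAD = {i_load * 1e3:6.1f} mA -> P_PCB1 ~ {f_pole / 1e3:8.2f} kHz")
```

Because the pole frequency is proportional to  $I_{LOAD}$ , it spans several decades over the load range, which is why a  $P_{GATE}$  that tracks it is useful for stability.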

Figure 27.8.5 shows the measured load transient response for  $V_{BAT}=3.8V$ ,  $V_{SYS}=1.2V$  and  $V_{OUT}=1V$ . The load changes from 0mA to 800mA with 0.5 $\mu s$  rise time. The VGDR buffer and ARFF compensation optimize the transient response, and the undershoot voltage is 20mV whether the transient occurs at steady state or during the  $V_{OUT}$  overshoot period. The table in Fig. 27.8.5 quantifies the advantage of the VGDR buffer stage with 1 $\mu A$  quiescent current and 1 $\mu F$   $C_{OUT}$ . The conventional works (i) and (ii) without clamping  $V_{GATE}$  need large quiescent current ( $I_Q=100\mu A$  to 500 $\mu A$ ) and larger  $C_{OUT}$  to avoid large undershoot when  $V_{GATE}$  is in the driving dead zone. Figure 27.8.6 tabulates the performance of the proposed LDO compared with previous works [2-4]. The proposed LDO obtains the best figure of merit (FOM) [5] for transient response. Figure 27.8.7 shows the micrograph of the die, fabricated in UMC 0.15 $\mu m$  5V-CMOS; the die area is 0.054mm<sup>2</sup>.
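The FOM values in Fig. 27.8.6 can be recomputed from the other table entries. The sketch below assumes the commonly used transient FOM attributed to [5],  $FOM = T_R \cdot I_Q / I_{max}$  with  $T_R = C_{out} \Delta V_{out} / I_{max}$ , expressed in picoseconds and evaluated with the no-load current entry of each column; the text does not state the formula or unit, so both are assumptions, and the names are illustrative.

```python
def fom_ps(c_out, dv_out, i_q, i_max):
    """Transient FOM in picoseconds: T_R * I_Q / I_max,
    where T_R = C_out * dV_out / I_max is the response time."""
    t_r = c_out * dv_out / i_max     # response time in seconds
    return t_r * i_q / i_max * 1e12  # seconds -> picoseconds


# (C_out [F], dV_out [V], no-load current [A], I_max [A]) taken from Fig. 27.8.6
TABLE = {
    "[1]":       (1e-6, 56e-3, 14e-6,  300e-3),
    "[2]":       (1e-6, 25e-3, 135e-6, 100e-3),
    "[3]":       (1e-6, 6e-3,  9.3e-6, 50e-3),
    "[4]":       (1e-6, 24e-3, 1.6e-6, 50e-3),
    "This work": (1e-6, 20e-3, 15e-6,  800e-3),
}

for name, row in TABLE.items():
    print(f"{name:>9}: FOM = {fom_ps(*row):.2f} ps")
```

Under this assumed definition the computed values agree with the tabulated FOM row to within rounding, with the smallest (best) value belonging to this work.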

### References:

- [1] Q.-H. Duong, et al., "Multiple-Loop Design Technique for High-Performance Low-Dropout Regulator," *IEEE JSSC*, vol. 99, no. 7, pp. 1-17, July. 2017.
- [2] M. Ho, et al., "A CMOS Low-Dropout Regulator With Dominant-Pole Substitution," *IEEE Trans. Power Electron.*, vol. 31, no. 9, pp. 6362-6371, Sept. 2016.
- [3] M. Ho, et al., "A Low-Power Fast-Transient 90-nm Low-Dropout Regulator With Multiple Small-Gain Stages," *IEEE JSSC*, vol. 45, no. 11, pp. 2466-2475, Nov. 2010.
- [4] A. Maity and A. Patra, "A Hybrid-Mode Operational Transconductance Amplifier for an Adaptively Biased Low Dropout Regulator," *IEEE Trans. Power Electron.*, vol. 32, no. 2, pp. 1245-1254, Feb. 2017.
- [5] M. Al-Shyoukh, et al., "A Transient-Enhanced Low Quiescent Current Low-Dropout Regulator with Buffer Impedance Attenuation," *IEEE JSSC*, vol. 42, no. 8, pp. 1732-1742, Aug. 2007.



Figure 27.8.1: Conventional N-type LDO for flash memory in smart phone.



Figure 27.8.2: Schematic of proposed N-type LDO.



Figure 27.8.3: Transient improvement for short-period H-L-H load transient by (a) VGDR buffer and (b) ARFF compensation.



Figure 27.8.4: Frequency response considering wide load range, PCB parasitic effects, and MLCC.



Figure 27.8.5: Measured load transient response and comparison performance table of buffer stage.

|                      | [1] JSSC 2017       | [2] TPE 2016 | [3] JSSC 2010 | [4] TPE 2017       | This Work           |
|----------------------|---------------------|--------------|---------------|--------------------|---------------------|
| Technology           | 0.13μm              | 0.18μm       | 90nm          | 0.18μm             | 0.15μm              |
| Power-MOS Type       | NMOS                | PMOS         | PMOS          | PMOS               | NMOS                |
| $V_{out}$            | 1.0V                | 1.0V         | 0.9V          | 1.2V               | 1.0V                |
| $I_Q$ no load        | 14μA                | 135μA        | 9.3μA         | 1.6μA              | 15μA                |
| $I_Q$ full load      | 120μA               | 135μA        | 9.3μA         | 200μA              | 25μA                |
| $I_{dyn}$            | 0.04% of $I_{load}$ | -            | -             | 0.4% of $I_{load}$ | 0.05% of $I_{load}$ |
| Power Recycling      | -                   | -            | -             | -                  | 94%                 |
| $I_{max}$            | 300mA               | 100mA        | 50mA          | 50mA               | 800mA               |
| $\Delta I_{load}$    | 300mA               | 100mA        | 50mA          | 50mA               | 800mA               |
| $\Delta V_{out}$     | 56mV                | 25mV         | 6mV           | 24mV               | 20mV                |
| $C_{out}$            | 1μF                 | 1μF          | 1μF           | 1μF                | 1μF                 |
| FOM [5]              | 8.71                | 337.5        | 22.32         | 15.36              | 0.47                |

\*Note:  $I_Q = 25\mu A$ , and  $I_{Q,RECYCLE} = 400\mu A$  at  $I_{load} = I_{max}$

Figure 27.8.6: Performance summary and comparison with state of the art.



1. Power MOSFET
2. Main Controller
3. Compensation MIM capacitors
4. Other test Function

Figure 27.8.7: Die micrograph.

## 27.9 An On-Chip Resonant-Gate-Drive Switched-Capacitor Converter for Near-Threshold Computing Achieving 70.2% Efficiency at 0.92A/mm<sup>2</sup> Current Density and 0.4V Output

Moataz Abdelfattah<sup>1</sup>, Muhammad Swilam<sup>1</sup>, Brian Dupaix<sup>2</sup>, Shane Smith<sup>1</sup>, Ayman Fayed<sup>1</sup>, Waleed Khalil<sup>1</sup>

<sup>1</sup>Ohio State University, Columbus, OH

<sup>2</sup>Air Force Research Laboratory, Wright-Patterson AFB, OH

Near-threshold computing (NTC) is a promising approach to address the increasing demand for energy efficiency in computing platforms. In NTC, the supply voltage is scaled down to realize quadratic energy savings while degrading the operating frequency only linearly, which can be compensated by using many-core architectures. However, practical implementation of many-core NTC systems requires a large number of on-chip DC-DC converters to provide each core with independent voltages and fast dynamic voltage scaling at a reduced cost. Moreover, these converters must support heavy loads (a few hundred milliamps) to supply the current required per core, or cluster of cores, while occupying minimal area (i.e., high current density) and achieving high power-conversion efficiency at low output voltages.

Switched-capacitor (SC) converters are an attractive approach for integration; however, achieving high current density comes at the expense of low efficiency. This tradeoff can be understood by considering that increasing the load entails a larger power switch size and either a higher switching frequency ( $f_{sw}$ ) or a larger flying capacitance ( $C_{fly}$ ). The former results in a quadratic rise in switching losses, and thus a sharply degraded achievable efficiency, while the latter results in only a linear rise in switching losses but lowers the current density. At low output voltages, this tradeoff is further exacerbated as small overdrive voltages force the power switches to be excessively large to obtain sufficiently low ON-resistance. To address this tradeoff, the dominant approach in the literature has been to preserve efficiency by increasing  $C_{fly}$  and mitigating the drop in current density by either using special capacitor technologies, such as deep-trench [1] and high-density MIM capacitors [2,3], or through soft charging techniques [4]. However, special capacitor technologies entail higher economic cost, and soft charging techniques are reported only at higher output voltages than required for NTC. In contrast, this paper increases  $f_{sw}$  to preserve current density while counteracting the increase in switching losses by utilizing an area-efficient resonant gate drive (RGD) circuit.

Resonant gate drivers rely on LC tanks to recycle part of the charge consumed to drive the power switches in a converter. However, conventional implementations [5] dedicate an inductor for every power switch, or pair of switches, which is impractical for integration purposes. In this paper, an RGD scheme that allows sharing the inductor across a large number of power switches is presented. Figure 27.9.1 illustrates the operation principle of the RGD scheme in the case of a pair of power switches ( $M_P$ ,  $M_N$ ) with complementary gate-control signals ( $V_{GP}$ ,  $V_{GN}$ ). Assuming initial opposite voltages on the gates of the switches, a resonance pulse (RP) is used to initiate the charge recycling process between the gate capacitors through a small on-chip inductor and two transmission gates. By optimizing the width of RP, resonance can be stopped at the peak of the charge transfer to maximize energy savings and avoid voltage ringing across the inductor at the zero-current state. However, losses due to the finite Q-factor of the LC tank deplete the stored energy, and thus prevent the gate voltages from reaching full supply levels. Therefore, restore switches (RS) are used to pull up/down the gate voltages to appropriate levels. Accordingly, charges are taken from the supply only to replenish the lost energy, and thus, the switching losses of the power switches are reduced by a factor ( $\alpha$ ), which is a function of the Q-factor.
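As a rough illustration of how the tank Q-factor limits charge recycling, the sketch below models the two gate capacitors and the shared inductor as a series RLC loop. The model, the function name, and the example gate capacitance and series resistance are assumptions chosen for illustration; only the 100pH inductance is taken from this paper, and the loss factor computed here is not necessarily the paper's $\alpha$ (it ignores, for example, the energy spent driving the transmission gates).

```python
import math

def rgd_half_cycle(l_henry, c_gate, r_series, vdd=1.0):
    """Series-RLC model of one resonant gate transition (illustrative only).

    Two equal gate capacitors start at vdd and 0 V and exchange charge
    through an inductor; r_series lumps transmission-gate and wiring
    resistance. Returns (pulse_width_s, q_factor, v_high_after, loss_factor).
    """
    c_eq = c_gate / 2.0                      # two gate capacitors in series
    w0 = 1.0 / math.sqrt(l_henry * c_eq)     # undamped resonance (rad/s)
    q = w0 * l_henry / r_series              # tank quality factor
    zeta = 1.0 / (2.0 * q)                   # damping ratio
    if zeta >= 1.0:
        raise ValueError("tank is not underdamped; no resonance")
    wd = w0 * math.sqrt(1.0 - zeta ** 2)     # damped resonance (rad/s)
    pulse_width = math.pi / wd               # stop at the first current zero
    # Fraction of the differential gate voltage that survives the half cycle.
    k = math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta ** 2))
    v_high_after = 0.5 * vdd * (1.0 + k)     # gate that started at 0 V
    # Supply charge needed to restore that gate to vdd, relative to a
    # conventional driver that supplies c_gate * vdd per rising edge.
    loss_factor = (1.0 - k) / 2.0
    return pulse_width, q, v_high_after, loss_factor

# Hypothetical 20 pF gate capacitance per switch and 0.5 ohm loop resistance.
pulse_width, q, v_high, loss = rgd_half_cycle(100e-12, 20e-12, 0.5)
print(f"RP width ~{pulse_width * 1e12:.0f} ps, Q ~{q:.1f}, "
      f"gate reaches {v_high:.2f} V, supply energy factor ~{loss:.2f}")
```

In this model the restore switches only have to supply the part of the swing that the damped resonance fails to deliver, so a higher Q pushes the factor toward zero.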

In this RGD scheme, since the inductor is active only during gate transitions, it can be shared among multiple pairs of power switches, provided that their gate transitions do not overlap. Therefore, the scheme is extended to a 2:1 series-parallel SC topology with PMOS and NMOS power switches configured as shown in Fig. 27.9.2. The gate transitions of the pairs ( $M_{P1}$ ,  $M_{N1}$ ) and ( $M_{P2}$ ,  $M_{N2}$ ) must always be non-overlapping to prevent shoot-through current, and thus, the inductor is shared between them by dynamically reconfiguring its connection using transmission gates. It can be shown that by optimizing the power switch sizes and switching frequency, the overall converter losses (series and switching) are reduced by a factor ( $\alpha^{1/2}$ ) compared to an optimized RGD-less design.
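One simplified way to see where a square-root dependence can come from is a single-variable loss model in which series loss falls as 1/x and switching loss rises as x, with x a generic sizing variable such as switch width. This lumped model and its coefficient names are assumptions for illustration and are not the paper's full analysis.

```python
import math

def min_total_loss(a_series, b_switching, alpha=1.0):
    """Minimize P(x) = a_series / x + alpha * b_switching * x over x > 0.

    a_series / x            : series (conduction) loss, shrinking with x
    alpha * b_switching * x : switching loss, scaled by the gate-drive factor
    Returns (x_opt, p_min), the closed-form solution of dP/dx = 0.
    """
    x_opt = math.sqrt(a_series / (alpha * b_switching))
    p_min = 2.0 * math.sqrt(a_series * alpha * b_switching)
    return x_opt, p_min

# Hypothetical unit coefficients; only the ratio of the two optima matters.
x_plain, p_plain = min_total_loss(1.0, 1.0, alpha=1.0)
x_rgd, p_rgd = min_total_loss(1.0, 1.0, alpha=0.25)
ratio = p_rgd / p_plain   # equals sqrt(alpha) in this model
```

In this model the optimum moves to a larger x when switching becomes cheaper, so part of the gate-drive saving is spent on lowering series loss, and the total improves by the square root of the factor rather than by the factor itself.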

The presented RGD scheme is incorporated into a 4-phase 2:1 SC converter, as shown in Fig. 27.9.3. The multiphase design reduces the voltage ripple without requiring an output decoupling capacitor. Moreover, since all 4 phases are time-interleaved, only a single inductor is shared for all phases. Although the switching losses associated with turning ON/OFF the transmission gates of the RGD circuit represent a small portion of the overall losses, it is worth noting that in a multiphase design, these losses grow larger since the inductor must be reconfigured multiple times within a single switching cycle. However, this is offset by an increase in the tank Q-factor due to smaller power switches per phase. The output voltage is regulated by a lower-bound hysteretic controller using a 2.6GHz clocked comparator, whereas the timing control block is responsible for generating all the resonance pulses and restore signals within the RGD circuits. To quantify the impact of the resonance pulse width on the efficiency improvement, the width of the pulse is controlled through programmable-delay inverters. Finally, an on-chip resistive load is used to characterize the converter efficiency and transient response.
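The behaviour of a lower-bound hysteretic loop with interleaved phases can be sketched with a toy discrete-time model: at every comparator clock edge, the next phase fires only if the output has fallen below the reference. The lumped output capacitance, the charge delivered per event, and the function name are hypothetical; the converter described here has no dedicated output decoupling capacitor, and only the 2.6GHz comparator clock, the 0.41V reference, and the four phases come from the text.

```python
def simulate_lower_bound_hysteretic(v_ref=0.41, i_load=0.2, n_steps=2000,
                                    f_clk=2.6e9, c_out=5e-9,
                                    q_packet=2e-10, n_phases=4):
    """Toy model: fire the next interleaved phase whenever v_out < v_ref.

    c_out (lumped capacitance at the output) and q_packet (charge delivered
    per switching event) are hypothetical values, not taken from the paper.
    Returns (v_trace, fired_phases).
    """
    dt = 1.0 / f_clk
    v_out = v_ref
    phase = 0
    v_trace, fired = [], []
    for _ in range(n_steps):
        v_out -= i_load * dt / c_out        # load discharges the output
        if v_out < v_ref:                   # clocked comparator decision
            v_out += q_packet / c_out       # one phase delivers a charge packet
            fired.append(phase)
            phase = (phase + 1) % n_phases  # round-robin interleaving
        v_trace.append(v_out)
    return v_trace, fired

v_trace, fired = simulate_lower_bound_hysteretic()
```

Because switching events are issued only on demand, the effective switching frequency in such a loop scales with load current, up to the limit where a phase fires on every comparator clock.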

The design was fully integrated in a 45nm SOI technology and its active area is 0.32mm<sup>2</sup>, excluding the on-chip test load. The 100pH on-chip inductor represents only 5% of the total active area. The converter operates from a 1V input and supports up to 295mA load at a maximum switching frequency of 325MHz and an output voltage in the range of 0.35 to 0.41V. Figure 27.9.4 shows the measured converter efficiency versus load current. At the maximum current density of 0.92A/mm<sup>2</sup>, 70.2% efficiency is achieved, while the peak efficiency is 75.5% at 0.44A/mm<sup>2</sup>. To illustrate the effectiveness of the RGD approach, simulation results of the implemented converter and an optimized RGD-less converter are overlaid on the measured efficiency, showing a significant improvement in efficiency (~8%) with RGD. Note that the measured efficiency is slightly lower than the simulated results due to power-routing losses. Additionally, Fig. 27.9.4 shows that the measured efficiency changes by less than 2% across ±12% change in the width of RP. Figure 27.9.5 shows the measured load-step dynamic response of the converter with a 20ns settling time worst-case.
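The headline figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the current density from the reported load and area, and compares the measured peak efficiency with the ideal bound of a 2:1 SC converter, which is the ratio of the actual output voltage to the unloaded output $V_{in}/2$. The function names are illustrative.

```python
def current_density(i_load_a, area_mm2):
    """Load current per unit active area, in A/mm^2."""
    return i_load_a / area_mm2

def ideal_sc_efficiency(v_out, v_in, ratio=2.0):
    """Efficiency bound of an SC converter with an ideal ratio:1 step-down."""
    return v_out / (v_in / ratio)

density = current_density(0.295, 0.32)   # 295 mA over 0.32 mm^2
bound = ideal_sc_efficiency(0.41, 1.0)   # 0.41 V out from a 1 V input
print(f"{density:.2f} A/mm^2, ideal bound {bound:.0%}")
```

The 0.92A/mm<sup>2</sup> figure follows directly from 295mA over 0.32mm<sup>2</sup>, and the measured 75.5% peak sits below the 82% bound that charge sharing alone imposes at a 0.41V output.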

Figure 27.9.6 summarizes the key performance metrics of the converter, along with a comparison to state-of-the-art designs. Among converters operating at near-threshold output voltages (i.e., 0.4 to 0.6V) with a wide range of maximum current densities, the presented converter achieves better efficiency (by over 4%) and significantly higher current density (over 2x better), which makes it an attractive design for NTC applications. Figure 27.9.7 shows the die micrograph with the relevant blocks highlighted.

### References:

- [1] T. M. Andersen, et al., "A Feedforward Controlled On-Chip Switched-Capacitor Voltage Regulator Delivering 10W in 32nm SOI CMOS," *ISSCC*, pp. 1-3, Feb. 2015.
- [2] T. Souvignet, et al., "A Fully Integrated Switched-Capacitor Regulator With Frequency Modulation Control in 28-nm FDSOI," *IEEE TPE*, vol. 31, no. 7, pp. 4984-4994, July 2016.
- [3] R. Jain et al., "A 0.45-1 V Fully-Integrated Distributed Switched Capacitor DC-DC Converter With High Density MIM Capacitor in 22 nm Tri-Gate CMOS," *IEEE JSSC*, vol. 49, no. 4, pp. 917-927, April 2014.
- [4] N. Butzen and M. Steyaert, "A 1.1W/mm<sup>2</sup>-Power-Density 82%-Efficiency Fully Integrated 3:1 Switched-Capacitor DC-DC Converter in Baseline 28nm CMOS Using Stage Outphasing and Multiphase Soft-Charging," *ISSCC*, pp. 178-179, Feb. 2017.
- [5] R. Chen and F. Z. Peng, "A High-Performance Resonant Gate-Drive Circuit for MOSFETs and IGBTs," *IEEE TPE*, vol. 29, no. 8, pp. 4366-4373, Aug. 2014.
- [6] J. Jiang, et al., "A 2-/3-Phase Fully Integrated Switched-Capacitor DC-DC Converter in Bulk CMOS for Energy-Efficient Digital Circuits with 14% Efficiency Improvement," *ISSCC*, pp. 366-367, Feb. 2015.



Figure 27.9.1: Operation principle of the proposed RGD scheme, and impact on switching losses.



Figure 27.9.2: RGD scheme applied to a 2:1 SC converter, relevant waveforms, and impact on overall converter losses.



Figure 27.9.3: System overview of the 4-phase converter, and implementation of the timing control block.



Figure 27.9.4: Measured efficiency across load current ( $V_{in} = 1V$ ,  $V_{ref} = 0.41V$ ), and impact of RP width on measured efficiency at maximum load.



Figure 27.9.5: Measured load-step transient response ( $V_{in} = 1V$ ,  $V_{ref} = 0.41V$ ). Voltage ripple between 50 and 70mV, and response time  $\approx 20$ ns.

|                                         | [4]           | [2]                            | [3]                            | [6]                            | This Work      |
|-----------------------------------------|---------------|--------------------------------|--------------------------------|--------------------------------|----------------|
| Technology                              | 28nm          | 28nm FD-SOI                    | 22nm Tri-gate                  | 65nm                           | 45nm SOI       |
| Capacitor Technology                    | MOS/MOM       | High-Density MIM               | High-Density MIM               | MOS/MOM/MIM                    | MOS/MOM        |
| VCR                                     | 3:1           | 4:3, 3:2, 2:1, 3:1             | 1:1, 2:1, 3:2, 5:4             | 4:1, 3:1                       | 2:1            |
| $V_{in}$ (V)                            | 3.2           | 1.8                            | 1.23                           | 1.5 – 2.5                      | 1              |
| $V_{out}$ (V)                           | 0.95          | 0.2 – 1.2                      | 0.45 – 1                       | 0.4 – 0.7                      | 0.35 – 0.41    |
| $f_{sw}$ (MHz)                          | 267           | 100*                           | 250                            | 100                            | 325            |
| Max. $I_{load}$ (mA)                    | 129           | 250                            | 36*                            | 50*                            | 295            |
| $\eta_{peak}$                           | 82% @ 0.95 V  | 72.5% @ 0.72 V                 | 80% @ 1 V                      | 79.5% @ 0.6 V                  | 75.5% @ 0.41 V |
| Max. $I_{density}$ (A/mm <sup>2</sup> ) | 1.15 @ 0.95 V | 0.43 @ 0.72 V<br>0.43 @ 0.34 V | 0.34* @ 0.7 V<br>0.3* @ 0.55 V | 0.22* @ 0.6 V<br>0.13* @ 0.5 V | 0.92 @ 0.41 V  |
| $\eta$ @ Max. $I_{density}$             | 82%           | 72.5%<br>45.5%                 | 70%*<br>66%*                   | 67%*<br>60%*                   | 70.2%          |
| Near-Threshold Voltages                 | No            |                                |                                |                                | Yes            |

\* estimate based on graphs

Figure 27.9.6: Comparison with state-of-the-art fully integrated, high-current-density SC converters. Designs with Near-Threshold voltages are highlighted.



Figure 27.9.7: Die micrograph of the SC converter with RGD. The total active converter area is  $0.319\text{mm}^2$ .

# Session 28 Overview: *Wireless Connectivity*

## WIRELESS SUBCOMMITTEE



**Session Chair:**  
*Howard Luong*

*Hong Kong University of Science and Technology,  
Hong Kong, China*



**Associate Chair:**  
*Kyoo Hyun Lim*

*FCI, Seongnam, Korea*

**Subcommittee Chair: Stefano Pellerano**, Intel, Hillsboro, OR

Connecting things wirelessly requires optimization across multidisciplinary areas. This session will introduce state-of-the-art wireless transceivers supporting ultra-low-power IoT and connectivity solutions. In this session, a high-performance WLAN SoC supporting up to 802.11ax 1024QAM will be presented. Then, two blocker-tolerant, high-sensitivity Bluetooth Low-Energy (BLE) transceivers in 65nm and 40nm CMOS will be presented, followed by an all-digital PLL for BLE with best-in-class performance in 16nm FinFET technology, and an energy-harvesting BLE transmitter in 28nm CMOS. An ultra-low-power wakeup receiver enabling event-driven sensor nodes and an ultrasonic wake-up receiver using a precharged capacitive micro-machined ultrasound transducer will be shown. Finally, a 5.8GHz near-field radio achieving the smallest die size of 116 $\mu$ m $\times$ 116 $\mu$ m will be presented in this session.



1:30 PM

### 28.1 An 802.11ax 4x4 Spectrum-Efficient WLAN AP Transceiver SoC Supporting 1024QAM with Frequency-Dependent IQ Calibration and Integrated Interference Analyzer

*S. Kawai, Toshiba, Kawasaki, Japan*

In Paper 28.1, Toshiba presents a fully integrated 4x4 AP WLAN SoC in 28nm CMOS supporting up to 802.11ax and equipped with frequency-dependent IQ calibration for 1024QAM and an interference analyzer for reliable connection.



2:00 PM

### 28.2 An ADPLL-Centric Bluetooth Low-Energy Transceiver with 2.3mW Interference-Tolerant Hybrid-Loop Receiver and 2.9mW Single-Point Polar Transmitter in 65nm CMOS

*H. Liu, Tokyo Institute of Technology, Tokyo, Japan*

In Paper 28.2, the Tokyo Institute of Technology reports a Bluetooth Low-Energy (BLE) transceiver in 65nm CMOS. The RX consumes 2.3mW with a sensitivity of -94dBm and high blocker tolerance owing to the proposed single-channel demodulation and dynamic-range enhancement technique. The single-point polar TX consumes 2.9mW with a 1.89% FSK error.



2:30 PM

### 28.3 A 0.8V 0.8mm<sup>2</sup> Bluetooth 5/BLE Digital-Intensive Transceiver with a 2.3mW Phase-Tracking RX Utilizing a Hybrid Loop Filter for Interference Resilience in 40nm CMOS

*M. Ding, imec - Holst Centre, Eindhoven, The Netherlands*

In Paper 28.3, imec - Holst Centre and Renesas Electronics describe a 0.8V 0.8mm<sup>2</sup> Bluetooth 5/BLE digital-intensive transceiver in 40nm CMOS with a 2.3mW phase-tracking RX utilizing a hybrid loop filter for interference resilience. With an ADPLL-based digital FM interface for precise deviation frequency control, the TX delivers 1.8dBm maximum output power, whereas the RX achieves a sensitivity of -92dBm at 2Mb/s and -95dBm at 1Mb/s.



3:15 PM

### 28.4 A 0.45V Sub-mW All-Digital PLL in 16nm FinFET for Bluetooth Low-Energy (BLE) Modulation and Instantaneous Channel Hopping Using 32.768kHz Reference

*M-S. Yuan, TSMC, Hsinchu, Taiwan*

In Paper 28.4, TSMC and University College Dublin present a 0.45V sub-mW all-digital PLL for BLE modulation and <100ns instantaneous channel hopping using a 32.768kHz reference in 16nm FinFET technology. It performs channel hopping and GFSK modulation in a 2-point manner with extensive DCO calibrations after locking to the center band upon system power-up.



3:45 PM

### 28.5 A 0.2V Energy-Harvesting BLE Transmitter with a Micropower Manager Achieving 25% System Efficiency at 0dBm Output and 5.2nW Sleep Power in 28nm CMOS

*J. Yin, University of Macau, Macau, China*

In Paper 28.5, the University of Macau and Instituto Superior Técnico/University of Lisboa describe the implementation of a 0.2V energy-harvesting BLE transmitter in 28nm CMOS with a micropower manager exhibiting 25% system efficiency at 0dBm output. An ultra-low-voltage VCO with a 5.6:1 transformer and a Class-E/F2 PA with an HD-3-notching transformer are presented, together with a passive-intensive type-I PLL with a 5% duty cycle that improves the reference spurs to -47dBc.



4:15 PM

### 28.6 A -76dBm 7.4nW Wakeup Radio with Automatic Offset Compensation

*J. Moody, University of Virginia, Charlottesville, VA*

In Paper 28.6, the University of Virginia reports an ultra-low-power wakeup receiver with automatic offset compensation in 0.13μm CMOS, enabling event-driven sensor nodes that spend the majority of their time in an “asleep-yet-alert” state. The receiver consumes 7.4nW with a measured sensitivity of -76dBm and -71dBm at the 151.8MHz MURS and 433MHz ISM bands, respectively.



4:30 PM

### 28.7 A 14.5mm<sup>2</sup> 8nW -59.7dBm-Sensitivity Ultrasonic Wake-Up Receiver for Power-, Area-, and Interference-Constrained Applications

*A. S. Rekhi, Stanford University, Stanford, CA*

In Paper 28.7, Stanford University implements an ultrasonic wake-up receiver in 65nm CMOS using a precharged capacitive micro-machined ultrasonic transducer. Realized in an area of 14.5mm<sup>2</sup>, it achieves -59.7dBm sensitivity with 8nW power consumption.



4:45 PM

### 28.8 A 5.8GHz Power-Harvesting 116μm×116μm "Dielet" Near-Field Radio with On-Chip Coil Antenna

*B. Zhao, University of California, Berkeley, Berkeley, CA*

In Paper 28.8, the University of California at Berkeley presents a 5.8GHz power-harvesting 116μm×116μm “Dielet” near-field radio in 65nm CMOS with on-chip coil antenna. A hybrid two-tone IM2-IM3 technique is proposed to lock the on-chip oscillator, simultaneously improving the uplink SNR to 42dB and uplink signal-to-transmitter ratio to -28.9dBc at 20MHz.

## 28.1 An 802.11ax 4x4 Spectrum-Efficient WLAN AP Transceiver SoC Supporting 1024QAM with Frequency-Dependent IQ Calibration and Integrated Interference Analyzer

Shusuke Kawai<sup>1</sup>, Hiromitsu Aoyama<sup>2</sup>, Rui Ito<sup>3</sup>, Yutaka Shimizu<sup>3</sup>, Mitsuyuki Ashida<sup>3</sup>, Asuka Maki<sup>3</sup>, Tomohiko Takeuchi<sup>3</sup>, Hiroyuki Kobayashi<sup>3</sup>, Go Urakawa<sup>3</sup>, Hiroaki Hoshino<sup>3</sup>, Shigejito Saigusa<sup>3</sup>, Kazushi Koyama<sup>4</sup>, Makoto Morita<sup>2</sup>, Ryuichi Nihei<sup>2</sup>, Daisuke Goto<sup>2</sup>, Motoki Nagata<sup>3</sup>, Kengo Nakata<sup>3</sup>, Katsuyuki Ikeuchi<sup>1</sup>, Kentaro Yoshioka<sup>1</sup>, Ryoichi Tachibana<sup>3</sup>, Makoto Arai<sup>2</sup>, Chen-Kong Teh<sup>2</sup>, Atsushi Suzuki<sup>2</sup>, Hiroshi Yoshida<sup>2</sup>, Yosuke Hagiwara<sup>3</sup>, Takayuki Kato<sup>2</sup>, Ichiro Seto<sup>1</sup>, Tomoya Horiguchi<sup>3</sup>, Koichiro Ban<sup>1</sup>, Kyosuke Takahashi<sup>3</sup>, Hirotsugu Kajihara<sup>3</sup>, Toshiyuki Yamagishi<sup>3</sup>, Yuki Fujimura<sup>3</sup>, Kazuhisa Horiuchi<sup>3</sup>, Katsuya Nonin<sup>1</sup>, Kengo Kurose<sup>3</sup>, Hideki Yamada<sup>3</sup>, Kentaro Taniguchi<sup>1</sup>, Masahiro Sekiya<sup>1</sup>, Takeshi Tomizawa<sup>3</sup>, Daisuke Taki<sup>3</sup>, Masaaki Ikuta<sup>3</sup>, Tomoya Suzuki<sup>3</sup>, Yuki Ando<sup>3</sup>, Daisuke Yashima<sup>1</sup>, Takahisa Kaihatsu<sup>1</sup>, Hiroki Mori<sup>1</sup>, Kensuke Nakanishi<sup>3</sup>, Takeshi Kumagaya<sup>1</sup>, Yasuo Unekawa<sup>2</sup>, Tsuguhide Aoki<sup>1</sup>, Kohei Onizuka<sup>1</sup>, Toshiya Mitomo<sup>1</sup>

<sup>1</sup>Toshiba, Kawasaki, Japan

<sup>2</sup>Toshiba Electronic Devices & Storage, Kawasaki, Japan

<sup>3</sup>Toshiba Memory, Kawasaki, Japan; <sup>4</sup>Toshiba Microelectronic, Kawasaki, Japan

An exponentially increasing number of wireless-LAN (WLAN) devices in a dense environment causes a decrease in throughput owing to collisions among the devices and the lack of contiguous bandwidth. The next-generation standard of 802.11ax improves spectrum efficiency by additionally supporting 1024 (1K) QAM and OFDMA with 80+80MHz operation; these impose several challenges on silicon design. 1K-QAM demands extreme IQ balance with an IRR better than -50dB over a wide frequency range up to 80MHz to achieve an RX EVM of -35dB or better. Noise characteristics better than -44dBc LO integrated phase noise as well as 50dB isolation between each TRX chain are mandatory. To make the best use of the essential features defined in the 11ax standard, a real-time and arbitrary spectrum-resource control capability at each access point (AP) is beneficial both in terms of further improvement in spectrum efficiency and in terms of communication reliability in ISM coexisting bands and with legacy WLAN. Interference identification on the order of 10μs while incorporating everything into a strictly limited silicon area is key to launching this advanced functionality. This paper presents a fully integrated 4x4 802.11abgn/ac/ax-compliant AP transceiver SoC. The chip offers frequency-dependent IQ (FD-IQ) mismatch calibration for both amplitude and phase, low-noise TX BB and multimode LO distribution techniques, and real-time interference detection including 2.4GHz inverter-type microwave (MW) ovens.

Figure 28.1.1 shows a block diagram of the proposed SoC that contains four dual-band 802.11abgn/ac/ax RF chains with dual synthesizers. The first synthesizer provides LO signals to all 4 transceivers while the second synthesizer provides LO signals to chains 2 and 3. This architecture supports all possible configurations of 4x4, 3x3+1x1, 2x2+2x2, non-contiguous communication (80+80), and simultaneous interference detection with communications, whereas a conventional architecture was limited in this respect [1]. The highly isolated LO distribution scheme allows for simultaneous operation of both synthesizers. The RF loopback path is inserted between the output of the TX DA and the input of the RX RFAMP for the IQ calibration. The baseband loopback path is also connected between TXBB and RXBB. A pure current-mode analog BB chain was employed for better noise performance in the TX. Both the on-chip interference analyzer and calibration engine are integrated in the digital domain.

IQ mismatch is one of the critical factors that deteriorate IRR and EVM. The self-calibration scheme implemented here corrects the IQ mismatch in both the TX and RX simultaneously and independently based on two sets of loopback data. The data sets are obtained by modifying the phase characteristics of the loopback path, which is done by setting the DAO capacitor bank normally used for TX frequency channel control. This concept is beneficial compared to the previous works [1,2]: [2] can calibrate only the RX IQ mismatch and [1] requires an additional downconversion circuit. In this work, an additional DAC, which is not depicted in Fig. 28.1.1, is implemented for carrier-leakage calibration.
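To give a feel for how tight a -50dB IRR target is, the sketch below evaluates the standard image-to-signal power ratio of a quadrature path with a frequency-independent gain and phase mismatch. The function name is illustrative, and the result is expressed as a negative dB value to match the sign convention used in this paper.

```python
import math

def image_to_signal_db(gain_error, phase_error_deg):
    """Image-to-signal power ratio in dB for a quadrature path.

    gain_error      : relative I/Q gain mismatch (0.01 means 1%)
    phase_error_deg : deviation from perfect quadrature, in degrees
    """
    g = 1.0 + gain_error
    th = math.radians(phase_error_deg)
    image = 1.0 - 2.0 * g * math.cos(th) + g * g
    signal = 1.0 + 2.0 * g * math.cos(th) + g * g
    if image <= 0.0:
        return float("-inf")   # perfectly matched paths leave no image
    return 10.0 * math.log10(image / signal)

for gain, phase in [(0.01, 1.0), (0.0, 0.36), (0.003, 0.2)]:
    print(f"gain {gain:.1%}, phase {phase} deg -> "
          f"{image_to_signal_db(gain, phase):.1f} dB")
```

Under this model a phase error of roughly a third of a degree, with no gain error at all, already uses up a -50dB budget, which is why calibration across the whole 80MHz band is needed.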

The digital IQ compensator shown in Fig. 28.1.2 compensates FD-IQ amplitude and phase mismatches caused by imperfect isolation between the I and Q signals, especially in the RX. Frequency-dependent compensation of not only phase but also amplitude is mandatory to deliver a 1K-QAM 80+80MHz signal; however, this was impossible with a conventional affine transformer. The proposed key idea is to convert the frequency-dependent error that manifests itself as an error in amplitude into one that manifests itself as an error in phase, by rotating the IQ axis by an optimum angle  $\phi$ , so that both frequency dependences can be cancelled in the phase domain alone, as shown in the IQ map. The residual phase dependence can be cancelled by using a compact FIR filter through tap-coefficient tuning [2]. As a result, the proposed scheme enables the compact FIR to correct both amplitude and phase with minimum circuit overhead. Sufficient TX and RX IRRs of -56dB and -51.6dB were measured over 80MHz bandwidth. The calibration completes within 12.7μs, which is shorter than the Short Inter-Frame Space (SIFS) defined by WLAN standards, and thus inter-packet calibrations are possible in response to temperature drifts.
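The idea that an axis rotation can turn an amplitude error into a phase error can be checked on a simple frequency-independent case. The sketch below assumes a pure gain mismatch of $\pm\varepsilon$ on I and Q and a fixed 45° rotation; both are illustrative assumptions, whereas the paper uses an optimum angle $\phi$ and handles frequency-dependent errors. All function names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotate_mismatch(m, angle_deg):
    """Express the 2x2 IQ mismatch matrix m in axes rotated by angle_deg."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    r = [[c, -s], [s, c]]
    r_t = [[c, s], [-s, c]]
    return matmul(r_t, matmul(m, r))

def gain_and_phase_error(m):
    """Return (gain ratio of the two axes, quadrature error in degrees)."""
    col_i = (m[0][0], m[1][0])
    col_q = (m[0][1], m[1][1])
    n_i = math.hypot(*col_i)
    n_q = math.hypot(*col_q)
    cos_angle = (col_i[0] * col_q[0] + col_i[1] * col_q[1]) / (n_i * n_q)
    return n_i / n_q, 90.0 - math.degrees(math.acos(cos_angle))

eps = 0.02                                    # hypothetical 2% gain error
mismatch = [[1.0 + eps, 0.0], [0.0, 1.0 - eps]]
before = gain_and_phase_error(mismatch)
after = gain_and_phase_error(rotate_mismatch(mismatch, 45.0))
```

In the rotated axes the two gains are equal and the whole error appears as a loss of orthogonality of about $2\varepsilon$ radians, which is the kind of error a phase-only corrector can address.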

A pure current-mode analog TX BB, shown in Fig. 28.1.3, is another key feature to minimize noise and support 1K-QAM with a minimum silicon area. A conventional design with a voltage-mode TXBB requires an opamp-based TIA, whose large number of transistors introduces significant noise and area penalties [3]. The proposed TIA-less design is beneficial in terms of noise and area without critical IM degradation. A simple current mirror with a lowpass filter structure is suitable for the current-mode DAC and QMOD. The wideband DAC allows an LPF with a high cut-off frequency and a compact layout. The poor CMRR of the proposed circuit can be compensated by the carrier-leakage calibration, which was originally prepared for QMOD compensation. The simulation results reveal that the proposed pure current-mode architecture achieves 3dB better SNR and a 65% silicon area saving compared to a voltage-mode design.

Figure 28.1.4 shows the LO distribution network that satisfies both -50dB isolation for non-contiguous mode and reconfigurability for all the possible operation modes. Inter-chain buffers are introduced to transfer 10GHz LO signals over a maximum distance of 5mm, which is constrained by TRX placement under the unidirectional gate arrangement in the process technology. The buffers are composed of a cascade configuration, whose two stages are in different supply voltage domains, and both stages are turned off at each frequency boundary when the dual synthesizers operate simultaneously. Thanks to this design strategy, sufficient isolations better than 50dB between the chains, which are almost 10dB lower than the required EVM, were confirmed in simulation. No phase-noise degradation of SYN0 was measured while SYN1 was operating simultaneously at an offset of only 80MHz. Sufficient TX EVMs of -37.5dB at 5530MHz and -36.8dB at 5775MHz were measured as well in non-contiguous (80+80) mode.

The real-time interference analyzer shown in Fig. 28.1.5 was implemented as well for system throughput optimization. Interferers such as MW ovens and radar in the 2.4GHz/5GHz bands limit the available WLAN channels and degrade throughput; thus, time-frequency-analysis-based interference identification enables a WLAN system to adopt an optimum transmission strategy. For example, if MW oven radiation is identified, a WLAN system can recover throughput by switching to a different channel or by adjusting the packet transmission timing, since the radiation has an on-off cycle corresponding to the commercial power-supply frequency. An identification latency as short as 10μs clearly improves the throughput, but the approximately 10-to-20MHz manufacturer-dependent radiation bandwidth of inverter-type ovens and coexistence with other WLAN signals make the detection difficult. The analyzer consists of three parallel detectors for minimal latency, which will be referred to as 1) wideband detector, 2) narrowband detector and 3) MW oven detector. The wideband detector identifies WLAN signals such as DSSS/OFDM/non-contiguous (80+80)/OFDMA signals. The narrowband detector is designed for relatively smaller bandwidth signals such as radar and Bluetooth. An independent detector for MW ovens was designed to identify radiation with rapid change in amplitude. In the MW oven detector, time-domain and frequency-domain detections are independently performed as follows. The differentiator, the median filter, and the output comparator detect wideband signals with rapid amplitude change in the time domain, and filter out WLAN signals, which have stable amplitude. The differentiator resolution could be minimized to 4b for gate-count compression without detection-accuracy degradation. Frequency-domain detection is done by differentiating and filtering logic that can detect the repeatedly changing waveform. A 2D flag map is generated based on the time-frequency-domain outputs. As shown in the figure, time-frequency 2D flag maps are successfully captured for both MW oven and WLAN signals with the implemented analyzer, and the system can instantaneously allocate channels in the frequency spaces not occupied by interferers. The gate count for the MW detector was only 320K.
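The time-domain branch of the MW oven detector (differentiator, median filter, output comparator) can be sketched behaviourally as follows. The window length, threshold, sample values, and function name are hypothetical, and clipping the difference to 4 bits is only one possible reading of the 4b differentiator resolution mentioned above.

```python
from statistics import median

def time_domain_flags(envelope, window=5, threshold=2, bits=4):
    """Flag samples whose amplitude keeps changing rapidly.

    envelope  : sequence of non-negative integer amplitude samples
    window    : median-filter length (hypothetical)
    threshold : comparator level on the filtered difference (hypothetical)
    bits      : differentiator output resolution
    """
    max_code = (1 << bits) - 1
    # Differentiator: magnitude of the sample-to-sample change, clipped.
    diff = [min(abs(b - a), max_code)
            for a, b in zip(envelope, envelope[1:])]
    half = window // 2
    flags = []
    for i in range(len(diff)):
        lo = max(0, i - half)
        hi = min(len(diff), i + half + 1)
        filtered = median(diff[lo:hi])        # removes isolated edge spikes
        flags.append(filtered >= threshold)   # output comparator
    return flags

# A constant-envelope burst changes amplitude only at its two edges.
wlan_like = [0] * 10 + [12] * 20 + [0] * 10
# A rapidly fluctuating envelope changes on every sample.
oven_like = [(i % 2) * 8 for i in range(40)]
print(any(time_domain_flags(wlan_like)), all(time_domain_flags(oven_like)))
```

The median filter is what separates the two cases in this sketch: a stable-amplitude packet produces only isolated spikes at its edges, which the filter removes, while a persistently changing envelope survives filtering and trips the comparator.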

Figure 28.1.6 shows 1K-QAM TX constellation, RX IRR, and OFDMA operation results. The calibration improves RX IRR by 17.7dB down to -53.1dB on average over 80MHz bandwidth. The TX output power and received EVM in the OFDMA mode show that the multi-user signal is successfully transmitted and received.

Figure 28.1.7 shows the die micrograph and the measurement summary. The die size is 44.6mm<sup>2</sup> where 12.0mm<sup>2</sup> consists of RF analog circuitry.

### References:

- [1] T.-M. Chen, et al., "An 802.11ac Dual-Band Reconfigurable Transceiver Supporting up to Four VHT80 Spatial Streams with 116fs<sub>rms</sub>-Jitter Frequency Synthesizer and Integrated LNA/PA Delivering 256QAM 19dBm Per Stream Achieving 1.733Gb/s PHY rate," ISSCC, pp. 126-127, Feb. 2017.
- [2] S. Kawai, et al., "A 1024-QAM Capable WLAN Receiver with -56.3 dB Image Rejection Ratio Using Self-Calibration Technique," ISCAS, pp. 1346-1349, June 2017.
- [3] T.-M. Chen, et al., "A 2x2 MIMO 802.11 abgn/ac WLAN SoC with Integrated T/R Switch and On-Chip PA Delivering VHT80 256QAM 17.5dBm in 55nm CMOS," IEEE RFIC, pp. 225-228, June 2014.
- [4] S.-T. Yan, et al., "An 802.11a/b/g/n/ac WLAN Transceiver for 2x2 MIMO and Simultaneous Dual-Band Operation With +29 dBm Psat Integrated Power Amplifiers," IEEE JSSC, vol. 52, no. 7, pp. 1798-1813, July 2017.
- [5] M. He, et al., "A 40nm Dual-Band 3-Stream 802.11a/b/g/n/ac MIMO WLAN SoC with 1.1Gb/s Over-The-Air Throughput," ISSCC, pp. 350-351, Feb. 2014.



Figure 28.1.1: Block diagram of proposed SoC.



Figure 28.1.2: Frequency-dependent IQ amplitude and phase compensator.



Figure 28.1.3: Current-mode TX baseband and programmable-gain QMOD schematics.



Figure 28.1.4: LO distribution schematic, measured phase noise, measured EVM in non-contiguous (80+80).



Figure 28.1.5: Block diagram of integrated interference analyzer and measured power interference density with detection result.



Figure 28.1.6: Measured EVM with calibration, TX OFDMA downlink interference, RX OFDMA EVM.

|                                       | This work                                                                                                                              | ISSCC2017[1]                  | JSSC2017[4]                           | ISSCC2014[5]                   |
|---------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|---------------------------------------|--------------------------------|
| WLAN standards                        | <b>4x4<br/>11abgn/ac/ax</b>                                                                                                            | 4x4<br>11abgn/ac              | 2x2<br>11abgn/ac                      | 3x3<br>11abgn/ac               |
| Process [nm]                          | 28                                                                                                                                     | 40                            | 40                                    | 40                             |
| TX EVM [dB]                           | -42.5(ax,40M,1KQAM <sup>1</sup>,-5dBm)<br>-38.4(ac,80M,256QAM <sup>2</sup>,-5dBm)<br>-38.1(ax,80M,1KQAM <sup>1</sup>,-5dBm)         | NA                            | -40 (20M, Floor)<br>-41 (HT40, -5dBm) |                                |
| RX sensitivity [dBm]                  | -78.4(g,54M)<br><b>-64.2(ax,40M,1KQAM<sup>1</sup>)</b><br>-65.4(ac,80M,256QAM <sup>2</sup>)<br><b>-57.7(ax,80M,1KQAM<sup>1</sup>)</b> | -77(L,G,54M)<br>-78.3(54Mbps) | NA                                    |                                |
| RX NF [dB]                            | 2.9<br>5.0                                                                                                                             | N/A<br>N/A                    | 2.9<br>4.5                            | 3.0<br>4.3                     |
| RF power consumption [mW]             |                                                                                                                                        |                               |                                       |                                |
| TX 2.4G                               | 844(4S5+1LO,-5dBm)                                                                                                                     | 3863<br>(4S5+1LO,21dBm)       | 1460<br>(2S5+1LO,20dBm)               | 1080 <sup>4</sup><br>(3S5+1LO) |
| TX 5G                                 | 832(4S5+2LO,-5dBm)                                                                                                                     | 4164<br>(2S5+2LO,22dBm)       | 1759<br>(2S5+1LO,19dBm)               | 1520 <sup>4</sup><br>(3S5+1LO) |
| RX 2.4G                               | 354(4S5+1LO)                                                                                                                           | 297(4S5+1LO)                  | 17<br>(2S5+1LO)                       | 1170 <sup>4</sup><br>(3S5+1LO) |
| RX 5G                                 | 447(4S5+2LO)                                                                                                                           | 474(4S5+2LO)                  | 2<br>(2S5+1LO)                        | 2880 <sup>4</sup><br>(3S5+1LO) |
| Image rejection ratio after cal. [dB] | -53(RX, Ave. over 80M)<br>-59(RX, at 5MHz)<br>-61(TX, Ave. over 80M)<br>-64(TX, at 5MHz)                                               | -61(TX, at 5MHz)              | NA                                    | NA                             |
| RF chip area[mm <sup>2</sup> ]        | 12.0                                                                                                                                   | 11.4                          | 8.6                                   | 21.5 <sup>5</sup>              |
| 1K QAM                                | Yes                                                                                                                                    | No                            | No                                    | No                             |
| FD-IQ amp. mismatch calibration       | Yes                                                                                                                                    | No                            | No                                    | No                             |
| Non-contiguous                        | Yes                                                                                                                                    | Yes                           | No                                    | No                             |
| Integrated interference analyzer      | Yes                                                                                                                                    | No                            | No                                    | No                             |

<sup>1</sup> Modulation and coding scheme is MCS11    <sup>2</sup> Modulation and coding scheme is MCS9    <sup>3</sup> Modulation and coding scheme is MCS27    <sup>4</sup> SoC power    <sup>5</sup> SoC area

Figure 28.1.7: Performance comparison with previous works and die micrograph.

## 28.2 An ADPLL-Centric Bluetooth Low-Energy Transceiver with 2.3mW Interference-Tolerant Hybrid-Loop Receiver and 2.9mW Single-Point Polar Transmitter in 65nm CMOS

Hanli Liu, Zheng Sun, Dexian Tang, Hongye Huang, Tohru Kaneko, Wei Deng, Rui Wu, Kenichi Okada, Akira Matsuzawa

Tokyo Institute of Technology, Tokyo, Japan

This paper demonstrates a Bluetooth Low-Energy (BLE) transceiver (TRX) achieving ultra-low-power (ULP) operation for Internet-of-Things (IoT) applications. As more and more devices are connected to the Internet, wireless traffic in the 2.4GHz ISM band will become extremely crowded. To coexist with other wireless devices without suffering interference from co-channel and out-of-band (OB) signals, a BLE receiver (RX) should have very high adjacent-channel rejection and very high blocker tolerance. At the same time, the total power consumption should be minimized for longer battery life. In this work, the TRX utilizes a wide loop-bandwidth all-digital PLL (ADPLL) as a central component for transmitter (TX) modulation, RX analog data digitization, and phase synchronization. A single-channel demodulation method is adopted to eliminate half of the analog baseband circuits, further reducing the power consumption while maintaining high interference rejection. This BLE TRX achieves the lowest energy consumption among the state-of-the-art works in the comparison table [1-5] while satisfying all the interference requirements with sufficient margins.

A hybrid-loop RX is proposed in [1], targeting adjacent-channel rejection while eliminating two ADCs and the Q-channel to reduce power. However, the reuse of the ADPLL as an ADC consumes considerable power, which makes it less attractive. Furthermore, that RX suffers from carrier frequency mismatch between TX and RX, and its non-coherent demodulation using only the I-channel discards Q-channel information [1]. Figure 28.2.1 shows the BLE TRX block diagram. The hybrid-loop RX architecture is adopted to eliminate the ADCs and the Q-path for demodulation. The ADPLL is capable of 5MHz wide loop-bandwidth operation using a 26MHz reference, thanks to the integrated reference doubler and a loop-latency compensation technique. A loop latency of less than 0.5 reference period is achieved, which yields a phase margin of over 70 degrees at 5MHz loop bandwidth. This wide loop-bandwidth operation stabilizes the oscillator frequency while the ADPLL is used as an ADC, so the ADPLL can also serve as the local oscillator (LO) for downconversion at the mixer. The power consumption of the ADPLL is only 1.1mW in RX mode. The digitized data $D_{out}$ from the ADPLL is filtered by a 27-tap FIR filter for further noise and interference rejection. The phase and frequency information is estimated from $D_{out}$ and fed back to the LO frequency directly through the ADPLL frequency control word (FCW). This synchronization loop mitigates the phase rotation of the I and Q channels caused by frequency and phase offset, which improves the RX sensitivity. Finally, the symbol timing is recovered and the data is demodulated by the integrated data decoder. To take further advantage of the wide loop-bandwidth operation of the ADPLL, a 1Mb/s GFSK signal is modulated through the FCW, which forms a single-point modulator. This simplifies the TX, because it needs none of the calibrations, such as DCO gain and linearity calibration, required by conventional two-point polar modulators. Furthermore, the wide loop-bandwidth operation also relaxes the PA pulling effect in TX mode, which is a problem in the conventional two-point modulator when the DCO and PA share the same frequency.
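As a rough sanity check of the latency figure quoted in this paragraph, the phase lag that a pure delay adds at the loop's unity-gain frequency can be estimated directly. The sketch below is only a back-of-the-envelope model, not the authors' analysis: it assumes the doubled 26MHz reference (52MHz) sets the reference period, treats the 5MHz loop bandwidth as the unity-gain frequency, and models the rest of the loop as an ideal integrator with 90 degrees of margin before any delay. The function and constant names are illustrative.

```python
def delay_phase_lag_deg(freq_hz, delay_s):
    """Phase lag (degrees) that a pure time delay adds at freq_hz."""
    return 360.0 * freq_hz * delay_s


F_REF_HZ = 2 * 26e6          # 26MHz reference after the doubler (assumed 52MHz)
LOOP_BW_HZ = 5e6             # treated here as the unity-gain frequency (assumption)
LATENCY_REF_PERIODS = 0.5    # upper bound on loop latency quoted in the text

latency_s = LATENCY_REF_PERIODS / F_REF_HZ
lag_deg = delay_phase_lag_deg(LOOP_BW_HZ, latency_s)
margin_deg = 90.0 - lag_deg  # ideal-integrator assumption: 90 degrees before delay

print(f"latency = {latency_s * 1e9:.2f} ns")
print(f"delay-induced lag = {lag_deg:.1f} deg, remaining margin = {margin_deg:.1f} deg")
```

Under these assumptions the delay costs roughly 17 degrees and leaves a margin slightly above 70 degrees, in line with the quoted figure. The same half-period latency referred to an undoubled 26MHz reference would cost about 35 degrees, which suggests why the reference doubler matters for wide-bandwidth operation.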

Figure 28.2.2 shows the low-IF receiver [5] as one of the most common receiver architectures for ULP applications. It achieves good in-band and OB blocker performance. However, it requires both I and Q channels, which consume significant power. The proposed architecture instead uses a single-channel receiving method, in which the incoming BLE signal is downconverted to a very low IF (250kHz) in order to transform the constellation from GFSK into DPSK [1], so that it can be demodulated using only the I-channel. The very-low-IF architecture helps to solve the image-rejection issues that degrade the OB blocker performance in the sliding-IF architecture [3]. Furthermore, thanks to the lowpass filter and the lowpass transfer characteristic of the ADPLL, the overall RX system transfer function realizes more than 55dB of blocker rejection at offsets above 3MHz. The ADPLL is reused as an ADC, which converts the analog signal to its digital version. The I and Q signals are de-correlated by using a Timing Error Detector (TED) in the single receiving channel. The phase error is filtered by a digital loop filter and added to the ADPLL FCW. Owing to the wide loop-bandwidth ADPLL operation, the RX can instantaneously synchronize the LO signal to both the phase and small frequency errors of the incoming signal. Compared with the power budget of a conventional I/Q low-IF RX, around 34% of the energy can be saved thanks to the single-channel demodulation architecture and the elimination of the ADCs.
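The GFSK-to-DPSK transformation at a 250kHz IF can be illustrated with a simplified phase model. The sketch below assumes a nominal BLE modulation index of 0.5 (a frequency deviation of ±250kHz at 1Mb/s), ignores Gaussian pulse shaping and noise, and assumes ideal symbol timing and a phase-aligned LO. None of these simplifications come from the paper, and the function names are hypothetical. Under these assumptions the downconverted tone sits at either 0Hz or 500kHz, so the phase advances by either 0 or π per symbol, and the sign of the I-channel alone carries the data.

```python
import math

SYMBOL_RATE_HZ = 1e6   # BLE 1Mb/s
F_DEV_HZ = 250e3       # assumed nominal deviation (modulation index 0.5)
F_IF_HZ = 250e3        # very low IF used in the text


def phase_step(bit):
    """Phase advance over one symbol after downconversion to the 250kHz IF."""
    f_baseband = F_IF_HZ + (F_DEV_HZ if bit else -F_DEV_HZ)
    return 2.0 * math.pi * f_baseband / SYMBOL_RATE_HZ


def i_channel_samples(bits, phase0=0.0):
    """I-channel value at each symbol boundary (unfiltered FSK, no noise)."""
    phase = phase0
    samples = [math.cos(phase)]
    for bit in bits:
        phase += phase_step(bit)
        samples.append(math.cos(phase))
    return samples


def differential_demod(samples):
    """A sign flip between consecutive I samples decodes as 1, no flip as 0."""
    return [1 if samples[k] * samples[k + 1] < 0 else 0
            for k in range(len(samples) - 1)]


bits = [1, 0, 0, 1, 1, 0, 1]
recovered = differential_demod(i_channel_samples(bits))
print(recovered == bits)
```

In this toy model, an initial phase near 90 degrees would make every I sample vanish, which is one way to see why the phase and frequency synchronization loop is needed when only one channel is available.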

In order to reuse the ADPLL as an ADC, the analog input signal is first converted into a frequency offset through a varactor in an LC-VCO, and this offset is integrated as a phase offset. As shown in Fig. 28.2.3, in the conventional circuit the analog input signal $V_{PGA}(t)$ is fed directly to the varactor, and the converted phase information $\phi_{out}(t)$ is digitized by a Time-to-Digital Converter (TDC). The resulting digital phase error is filtered and fed back as a digital control code $D_{out}(n)$ of the DCO to keep the frequency and phase constant. Thus, the digital control code $D_{out}(n)$ continuously cancels the frequency drift caused by the input signal $V_{PGA}(t)$, as shown in Fig. 28.2.3, i.e., $V_{PGA}(t)$ is digitized to $D_{out}(n)$. However, the varactor is a nonlinear component, and it introduces significant SNDR degradation in this A-to-D conversion. Even if only a small-amplitude signal is applied to the varactor, the conversion still suffers from a noise-versus-distortion trade-off and results in poor SNDR. In this work, a feedback DAC is introduced to mitigate the varactor nonlinearity, as shown in Fig. 28.2.3. The feedback DAC generates a cancellation signal that is subtracted from the incoming signal $V_{PGA}(t)$, so that only the small-amplitude residue signal $V_{tune}(t)$ is applied to the varactor. Furthermore, the DAC feedback path is also used as the phase-lock path for the ADPLL. The narrow-bandwidth frequency-lock path is used for centering the DC level of the feedback DAC in RX mode. This closed-loop operation significantly improves the SNDR. A measured SNDR of 45dB is achieved, which is 18dB better than without the feedback DAC. In addition, the small amplitude of $V_{tune}(t)$ narrows the required TDC range, which saves TDC power; the total power consumption of the PLL is reduced to 1.1mW in this design.
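The benefit of driving the varactor with only a small residue can be illustrated with a generic memoryless nonlinearity. The sketch below is a toy model under stated assumptions: the varactor's voltage-to-frequency characteristic is taken as `v + a3*v**3` with a hypothetical normalized coefficient `a3 = 0.1`, and the feedback DAC is assumed to shrink the varactor swing to one eighth of the input. Neither number comes from the paper, and the model ignores noise, so it is not meant to reproduce the measured 18dB SNDR improvement. It only shows that third-harmonic distortion falls steeply with the swing applied to the nonlinear element.

```python
import math


def hd3_db(amplitude, a3, n=1024):
    """HD3 (dB relative to the fundamental) of y = v + a3*v**3 for a sine input.

    The fundamental and third-harmonic amplitudes are obtained by projecting
    one period of the output onto sin(theta) and sin(3*theta).
    """
    fund = 0.0
    third = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        v = amplitude * math.sin(theta)
        y = v + a3 * v ** 3
        fund += y * math.sin(theta)
        third += y * math.sin(3.0 * theta)
    fund *= 2.0 / n
    third *= 2.0 / n
    return 20.0 * math.log10(abs(third) / abs(fund))


A3 = 0.1               # hypothetical normalized cubic coefficient
FULL_SWING = 1.0       # input applied directly to the varactor (conventional)
RESIDUE_SWING = 0.125  # hypothetical residue swing with the feedback DAC

hd3_open = hd3_db(FULL_SWING, A3)
hd3_closed = hd3_db(RESIDUE_SWING, A3)
print(f"HD3 without feedback DAC: {hd3_open:.1f} dB")
print(f"HD3 with residue only:    {hd3_closed:.1f} dB")
```

For a cubic term the third harmonic scales with the cube of the swing while the fundamental scales linearly, so each halving of the varactor swing lowers HD3 by about 12dB in this model.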

Figure 28.2.4 shows the RF front-end design of the BLE TRX. A single-ended LNA is employed for lower power consumption, and its output is converted to a differential signal by a custom-designed on-chip balun. To reduce LNA power consumption, an additional lower-voltage supply is sometimes utilized [1,2]. To operate entirely from a single 1V supply while maintaining sufficient LNA current efficiency, a gm-cell is stacked on top of the LNA transistor, and the LNA output is coupled to the differential input of the gm-cell by the balun. The LNA bias is fed back from the gm-cell common-mode voltage, which stabilizes the LNA VDD voltage. The measured noise figure of the whole RX chain is around 6dB at the maximum gain of 68dB.

Figure 28.2.5 shows the measurement results of both TX and RX. The single-point modulation TX fully satisfies the BLE spectrum mask requirement. The measured FSK error is only 1.89%, which is the best among the state-of-the-art BLE TXs in Fig. 28.2.6. The modulator consumes 1.2mW for all BLE channels, and the PA consumes 1.9mW at -3dBm output and 3.7mW at 0dBm. The RX consumes 2.3mW for the RX front-end and ADPLL, which is the lowest among all the works listed in Fig. 28.2.6. The sensitivity level is -94dBm for a bit-error rate (BER) of 0.1%. The adjacent-channel rejection is measured by setting the desired signal to -67dBm and increasing the adjacent-channel signal power until the BER reaches 0.1%. The adjacent-channel rejection is greatly improved by enabling the DAC feedback path. For the conventional design [1], the -3MHz point does not satisfy the BLE requirement, whereas the dynamic-range enhancement technique achieves a 10dB improvement. Figure 28.2.7 shows the die micrograph. By adopting the single-channel receiving architecture and the dynamic-range enhancement technique, the proposed BLE TRX achieves the lowest power consumption while obtaining -94dBm sensitivity and satisfying all the interference requirements with a sufficient margin.
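The power figures quoted above can be turned into two derived quantities that make comparison easier: PA efficiency (RF output power over PA supply power) and RX energy per bit. The helper names below are illustrative, and the energy-per-bit figure simply divides the 2.3mW RX power by the 1Mb/s data rate. It is a convenience metric computed here, not one reported in the paper.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def pa_efficiency(pout_dbm, pdc_mw):
    """Ratio of RF output power to PA DC power consumption."""
    return dbm_to_mw(pout_dbm) / pdc_mw


def energy_per_bit_nj(power_mw, bit_rate_bps):
    """Energy spent per bit, in nanojoules."""
    return power_mw * 1e-3 / bit_rate_bps * 1e9


# Values quoted in the measurement paragraph.
eff_m3dbm = pa_efficiency(-3.0, 1.9)
eff_0dbm = pa_efficiency(0.0, 3.7)
rx_nj_per_bit = energy_per_bit_nj(2.3, 1e6)

print(f"PA efficiency at -3dBm: {eff_m3dbm * 100:.1f}%")
print(f"PA efficiency at 0dBm:  {eff_0dbm * 100:.1f}%")
print(f"RX energy per bit:      {rx_nj_per_bit:.2f} nJ/b")
```

Both PA operating points come out at roughly 26 to 27% efficiency, so the lower output power setting mainly trades range for DC power rather than efficiency.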

### Acknowledgments:

This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

### References:

- [1] H. Okuni, et al., "A 5.5mW ADPLL-Based Receiver with Hybrid-Loop Interference Rejection for BLE Application in 65nm CMOS," ISSCC, pp. 436-437, Feb. 2016.
- [2] F.-W. Kuo, et al., "A Bluetooth Low-Energy (BLE) Transceiver with TX/RX Switchable On-Chip Matching Network, 2.75mW High-IF Discrete-Time Receiver and 3.6mW All-Digital Transmitter," IEEE Symp. VLSI Circuits, pp. 64-65, June 2016.
- [3] Y.-H. Liu, et al., "A 3.7mW-RX 4.4mW-TX Fully Integrated Bluetooth Low-Energy/IEEE802.15.4/Proprietary SoC with an ADPLL-Based Fast Frequency Offset Compensation in 40nm CMOS," ISSCC, pp. 236-237, Feb. 2015.
- [4] T. Sano, et al., "A 6.3mW BLE Transceiver Embedded RX Image-Rejection Filter and TX Harmonic-Suppression Filter Reusing On-Chip Matching Network," ISSCC, pp. 240-241, Feb. 2015.
- [5] J. Prummel, et al., "A 10mW Bluetooth Low-Energy Transceiver with On-Chip Matching," ISSCC, pp. 238-239, Feb. 2015.



Figure 28.2.1: Block diagram of the proposed BLE transceiver.



Figure 28.2.2: Conceptual diagram of ADPLL-centric RX architecture with phase and frequency synchronization and ADPLL reused as ADC.



Figure 28.2.3: Conceptual diagram of reusing ADPLL as ADC with enhanced dynamic range and measured SNDR performance.



Figure 28.2.4: RX front-end and analog baseband circuits capable of 1V power supply operation.



Figure 28.2.5: Measurement results of proposed BLE transceiver.

|                                                                       | This work                   | ISSCC 16[1]                           | VLSI 16[2]                             | ISSCC 15[3]                            | ISSCC 15[4]                            | ISSCC 15[5]                            |
|-----------------------------------------------------------------------|-----------------------------|---------------------------------------|----------------------------------------|----------------------------------------|----------------------------------------|----------------------------------------|
| Technology                                                            | 65nm                        | 65nm                                  | 28nm                                   | 40nm                                   | 40nm                                   | 55nm                                   |
| Data rate & Modulation                                                | 1-Mbps GFSK                 | 1-Mbps GFSK                           | 1-Mbps GFSK                            | 1-Mbps GFSK                            | 1-Mbps GFSK                            | 1-Mbps GFSK                            |
| Integration level                                                     | RF+ADPLL+DBB                | RF+ADPLL+DBB                          | RF+ADPLL+DBB+MCU SoC                   | RF+PLL+DBB                             | RF+PLL+PMU                             | RF+PLL+DBB+PMU                         |
| RX sensitivity                                                        | -94dBm                      | -90dBm                                | -95dBm                                 | -94dBm                                 | -94.5dBm                               | -94.5dBm                               |
| RX ACR(1/2/3MHz)                                                      | 1/31/36 dB                  | N/A/24/29 dB                          | N.A.                                   | 4/25/35 dB                             | 2/32/N.A. dB                           | N.A.                                   |
| Blocker Power (30~2000MHz, 2003~2399MHz, 2484~2997MHz, 3000~12750MHz) | -1dBm, -13dBm, -12dBm, 0dBm | -20dBm, -22dBm, -25dBm, -24dBm, -7dBm | -18dBm, -20dBm, -25dBm, -24dBm, -13dBm |
| TX Architecture                                                       | Single-point polar          | N.A.                                  | 2-point polar                          | 2-point polar                          | Up conversion                          | Up conversion                          |
| TX Modulation Error                                                   | 1.89%                       | N.A.                                  | 2.67%                                  | 4.8%                                   | N.A.                                   | N.A.                                   |
| TX Output Power                                                       | -3dBm                       | N.A.                                  | 0dBm                                   | -2dBm                                  | 0dBm                                   | 0dBm                                   |
| Supply voltage                                                        | 1V                          | 0.6/1.1V                              | 0.5/1V                                 | 1V                                     | 1.1V                                   | 0.9~3.3V                               |
| RX Power Consumption                                                  | DBB 0.3mW                   | 0.5mW                                 | N.A.                                   | 0.4mW                                  | N.A.                                   | 11.2mW                                 |
|                                                                       | Analog 2.3mW                | 5.5mW                                 | 3.75mW                                 | 3.3mW                                  | 6.3mW                                  | 10.1mW                                 |
| TX Power Consumption                                                  | DBB 0.2mW                   | N.A.                                  | N.A.                                   | 0.2mW                                  | N.A.                                   | 2.9mm²                                 |
|                                                                       | Analog 2.9mW                | N.A.                                  | 4.7mW                                  | 4.2mW                                  | 7.7mW                                  | 1.64mm²                                |

Figure 28.2.6: Performance comparison table.



Figure 28.2.7: Die Micrograph.

## 28.3 A 0.8V 0.8mm<sup>2</sup> Bluetooth 5/BLE Digital-Intensive Transceiver with a 2.3mW Phase-Tracking RX Utilizing a Hybrid Loop Filter for Interference Resilience in 40nm CMOS

Ming Ding<sup>1</sup>, Xiaoyan Wang<sup>1</sup>, Peng Zhang<sup>1</sup>, Yuming He<sup>1</sup>, Stefano Traferro<sup>1</sup>, Kenichi Shibata<sup>2</sup>, Minyoung Song<sup>1</sup>, Hannu Korpela<sup>1</sup>, Keisuke Ueda<sup>2</sup>, Yao-Hong Liu<sup>1</sup>, Christian Bachmann<sup>1</sup>, Kathleen Philips<sup>1</sup>

<sup>1</sup>imec - Holst Centre, Eindhoven, The Netherlands

<sup>2</sup>Renesas Electronics, Tokyo, Japan

This paper presents a low-voltage (0.8V) ultra-low-power Bluetooth 5 (BT5)/Bluetooth Low Energy (BLE) digitally-intensive transceiver for IoT applications. Compared to BLE, BT5 offers a 2x higher data rate and 4x longer range, with packets more than 8x longer. Prior BLE designs [1-5] have made significant efforts to minimize power consumption for longer battery life, as well as chip area. However, the prior-art Cartesian BLE radios consume 6 to 10mW [1-3] to achieve a <-94dBm sensitivity, and do so at a relatively high supply voltage ($V_{DD}$ >1.0V). Operating a BLE RF transceiver at a lower $V_{DD}$ (e.g., <0.85V) not only extends the battery life by up to 50% [3] and reduces the complexity of the Power-Management Unit, but also accommodates a wider range of energy sources (e.g., harvesters). A recent single-channel phase-tracking RX [5] demonstrated the potential to reduce the chip area and the power consumption at a $V_{DD}$ down to 0.85V. However, it suffers from degraded sensitivity due to poor deviation-frequency control and an excessive loop delay, limited ACR (Adjacent-Channel Rejection) due to the digitally-controlled-oscillator (DCO) side-lobe energy, and an undefined initial carrier frequency due to the lack of a PLL/FLL, which creates a risk of locking onto an interferer. This work presents a fully-integrated 0.8V phase-domain BT5/BLE-combo transceiver, including a PHY-layer digital baseband (DBB), and addresses the above-mentioned issues by employing two key techniques: 1) a hybrid loop filter with loop-delay compensation for DCO side-lobe suppression to enhance interference tolerance, and 2) an all-digital PLL (ADPLL)-based digital FM interface, shared between RX and TX, that includes a deviation-frequency calibration and precisely defines the initial frequency. Moreover, a PHY-layer DBB that supports packet-mode phase-tracking RX operation is also demonstrated.

Figure 28.3.1 shows the block diagram of the transceiver, illustrating a phase-tracking RX, a direct frequency-modulation digital TX, an ADPLL-based digital FM interface, and a PHY-layer DBB. The single-channel zero-IF RX together with the DCO forms a phase-tracking loop, allowing direct frequency demodulation with a simple 1b comparator (CMP) at low $V_{DD}$. The digital TX consists of a DCO modulated by the digital FM interface and an energy-efficient (30%) Class-D PA with on-chip matching. The ultra-low-power dividerless snapshot ADPLL [6] avoids high-speed logic, allowing it to operate at 0.8V and consume only 415μW (excluding the DCO). In RX mode, the ADPLL is only needed at start-up to define the center frequency ($f_c$). After that, it is disabled to save power and the phase-tracking loop takes over the DCO control. Similarly, in TX mode, the ADPLL can be disabled after the initial lock to $f_c$, and the DCO frequency can be directly modulated in open loop. In both modes, the DCO gain ($K_{dco}$) error due to PVT variation (e.g., ±20%) must be calibrated to within ±5% to avoid degrading either RX sensitivity or TX modulation quality. In this work, the $K_{dco}$ estimate from the ADPLL keeps the deviation-frequency error within ±3%. The DCO frequency can also optionally be modulated using 2-point injection to provide long-term carrier frequency stability for very long packets, e.g., 17ms as in BT5. In addition to the data-aided automatic frequency calibration (AFC) adopted in [5], the presented RX implements an initial Coarse Frequency Offset (CFO) removal to ensure that the CFO is compensated during the BT5/BLE packet preamble.
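The effect of $K_{dco}$ calibration on the deviation frequency can be sketched with a linear DCO model. Everything numeric below is a hypothetical illustration rather than a value from the paper: the nominal gain of 5kHz/LSB, the two-code frequency measurement used to estimate the gain, and the assumed nominal BLE deviation of 250kHz. The paper only states that the ADPLL estimates $K_{dco}$ and that the deviation frequency ends up within ±3%. The function names are likewise made up for this sketch.

```python
def make_linear_dco(f_center_hz, kdco_hz_per_lsb):
    """Return a toy DCO model: output frequency as a function of tuning code."""
    return lambda code: f_center_hz + kdco_hz_per_lsb * code


def estimate_kdco(measure_freq_hz, code_lo, code_hi):
    """Estimate DCO gain from frequency measurements at two tuning codes."""
    return (measure_freq_hz(code_hi) - measure_freq_hz(code_lo)) / (code_hi - code_lo)


def deviation_code(f_dev_hz, kdco_hz_per_lsb):
    """Tuning-code step that best realizes the target frequency deviation."""
    return round(f_dev_hz / kdco_hz_per_lsb)


F_DEV_HZ = 250e3       # assumed nominal BLE deviation
KDCO_NOMINAL = 5e3     # hypothetical design value, Hz per LSB
KDCO_ACTUAL = 6e3      # hypothetical +20% PVT shift

dco = make_linear_dco(2.44e9, KDCO_ACTUAL)

# Without calibration the code is derived from the nominal gain.
code_uncal = deviation_code(F_DEV_HZ, KDCO_NOMINAL)
err_uncal = code_uncal * KDCO_ACTUAL / F_DEV_HZ - 1.0

# With calibration the gain is measured first (here: ideal frequency readings).
kdco_est = estimate_kdco(dco, 0, 64)
code_cal = deviation_code(F_DEV_HZ, kdco_est)
err_cal = code_cal * KDCO_ACTUAL / F_DEV_HZ - 1.0

print(f"uncalibrated deviation error: {err_uncal * 100:+.1f}%")
print(f"calibrated deviation error:   {err_cal * 100:+.1f}%")
```

In this idealized model the residual error after calibration comes only from rounding the tuning code to an integer; a real estimate would also carry frequency-measurement noise and DCO nonlinearity.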

In the phase-tracking RX, the trade-off between loop delay and interference filtering is crucial to achieve both good sensitivity and selectivity. A large total RX loop delay ($t_{loop}$) degrades the RX SNR for demodulation, and hence the sensitivity. As indicated in Fig. 28.3.2, behavioral simulations show that the SNR degrades rapidly once $t_{loop}$ exceeds half a symbol duration. The phase-tracking RX in [5] employs a high-Q elliptic analog LPF to reject adjacent-channel interference. However, the high-Q LPF increases the loop delay and degrades the RX sensitivity. In addition, the DCO side-lobe energy of the phase-tracking RX is another source of ACR degradation, and it cannot be filtered by the analog LPF. As shown in Fig. 28.3.2, the DCO side-lobe energy is relatively high when the DCO is modulated only by the 1b comparator signal, as in [5]. In the presence of interference, the product of the DCO side-lobe energy and the interference mixes with the downconverted desired signal, which limits the ACR performance. Therefore, to both optimize the loop delay and reduce the DCO side-lobe energy, a hybrid loop filter with loop-delay compensation is proposed, consisting of an analog loop filter (ALF) and a delay-compensated digital loop filter (DLF). Both the ALF and the DLF implement a notch between the 2<sup>nd</sup> and 3<sup>rd</sup> channel frequencies, which provides extra attenuation. Specifically, the ALF rejects interference before the comparator to ensure a sufficient SNR for demodulation, while the DLF suppresses both the DCO side-lobe energy and the interference residue after the ALF without introducing extra delay. The hybrid loop filter significantly improves the sensitivity (by 5dB) and the ACR performance (by 2 to 7dB) compared to [5].

Figure 28.3.3 shows the RX implementation with the proposed hybrid loop filter. A single-ended inverter-based LNA is implemented to maximize gain at low $V_{DD}$ (0.8V). A single-balanced passive mixer together with the lowpass input impedance of the transimpedance amplifier (TIA) forms a bandpass profile, which enhances the out-of-band blocker performance. The ALF employs a 3<sup>rd</sup>-order Chebyshev II topology with a pair of complex zeros that form a notch at the adjacent-channel frequency. The goal of the DLF is to suppress the DCO side-lobe energy by 2dB and 6dB at the 2<sup>nd</sup> and 3<sup>rd</sup> channels, respectively, to meet the ACR target in BLE/BT5. In addition, loop-delay compensation is implemented using three paths: Proportional, Integral, and Derivative (PID). These three paths form a highpass profile, which intrinsically provides a negative delay that compensates the delay of the following digital filter (~4T<sub>clk</sub>). Thanks to the loop-delay compensation, a 4<sup>th</sup>-order Chebyshev II digital filter can be implemented, suppressing the DCO side-lobe energy by 3 to 6dB at the adjacent channels.
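The claim that a highpass-shaped compensator provides a negative delay can be checked numerically on a simplified example. The sketch below keeps only the proportional and derivative paths, $H(z) = k_p + k_d(1 - z^{-1})$, and picks $k_p = 1$ and $k_d = 4$ so that the low-frequency group delay comes out near -4 clock periods. The paper does not give the PID coefficients or the exact structure, so the values and function names here are assumptions chosen for illustration.

```python
import cmath
import math


def pd_response(omega, kp=1.0, kd=4.0):
    """Frequency response of H(z) = kp + kd*(1 - z^-1) at omega rad/sample."""
    z_inv = cmath.exp(-1j * omega)
    return kp + kd * (1.0 - z_inv)


def group_delay_samples(response, omega, step=1e-6):
    """Group delay, -d(phase)/d(omega), by central finite difference."""
    phase_lo = cmath.phase(response(omega - step))
    phase_hi = cmath.phase(response(omega + step))
    return -(phase_hi - phase_lo) / (2.0 * step)


low_freq = 0.01  # rad/sample, well inside the signal band in this toy setup
delay = group_delay_samples(pd_response, low_freq)
gain_dc = abs(pd_response(0.0))
gain_nyquist = abs(pd_response(math.pi))

print(f"group delay at low frequency: {delay:.2f} samples")
print(f"gain at DC: {gain_dc:.1f}, gain at Nyquist: {gain_nyquist:.1f}")
```

The negative group delay is not free: in this model the gain rises from 1 at DC to 9 at Nyquist, so the compensator amplifies high-frequency content, and a following lowpass stage is needed to bring it back down.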

The chip was implemented in 40nm CMOS, with a core area of only 0.8mm<sup>2</sup> (Fig. 28.3.7), including the on-chip matching. The chip was measured with standard BLE/BT5 (1Mb/s / 2Mb/s) GFSK signals with a PRBS9 pattern. Thanks to the proposed hybrid loop filter, the RX shows a 2-to-7dB improvement in ACR compared to [5], and does not suffer from the adjacent-channel image issue seen in [4] (Fig. 28.3.4). Despite the lower $V_{DD}$, the RX has a worst-case OOB blocker tolerance of -17dBm, and does not have any OOB image as in [2,3]. To ensure that the demodulation performance is not degraded by a large DC offset during phase tracking, the offset is calibrated to within ±10mV before packets. As shown in Fig. 28.3.5, the PHY DBB performs this DC-offset calibration and assists the phase-tracking RX in detecting BLE/BT5 packets. The $K_{dco}$ error is calibrated to well within ±3% thanks to the ADPLL-facilitated frequency-deviation calibration, which leads to an excellent sensitivity of -95dBm in BLE and -92dBm in BT5. The RX, including the digital FM interface and DLF, consumes 2.3mW and 2.9mW in BLE and BT5 modes, respectively, and the digital portion of the power consumption (35%) and area (15%) can be further scaled in advanced technologies. The TX exhibits a modulation quality of 2% (BLE) and 1.4% (BT5). The PA delivers a maximum of 1.8dBm with 30% efficiency, and the 2<sup>nd</sup> and 3<sup>rd</sup> harmonics are -56dBm and -64dBm, respectively, which is significantly below the FCC limit. Figure 28.3.6 shows the benchmark against state-of-the-art phase-tracking RXs and BLE radios. This chip presents a fully integrated BT5/BLE-combo transceiver with the best RX figure-of-merit (FoM) and the lowest supply voltage among the state-of-the-art designs in the table.

### References:

- [1] J. Prummel, et al., "A 10mW Bluetooth Low-Energy Transceiver with On-Chip Matching," *ISSCC*, pp. 238-239, Feb. 2015.
- [2] T. Sano, et al., "A 6.3mW BLE Transceiver, Embedded RX Image-Rejection Filter and TX Harmonic-Suppression Filter Reusing On-Chip Matching Network," *ISSCC*, pp. 240-241, Feb. 2015.
- [3] X. Wang, et al., "A 0.9-1.2V Supplied, 2.4GHz Bluetooth Low Energy 4.0/4.2 and 802.15.4 Transceiver SoC Optimized for Battery Life," *ESSCIRC*, pp. 125-128, Feb. 2015.
- [4] H. Okuni, et al., "A 5.5mW ADPLL-Based Receiver with Hybrid-Loop Interference Rejection for BLE Application in 65nm CMOS," *ISSCC*, pp. 436-437, Feb. 2016.
- [5] Y.-H. Liu, et al., "A 770pJ/b 0.85V 0.3mm<sup>2</sup> DCO-Based Phase-Tracking RX Featuring Direct Demodulation and Data-Aided Carrier Tracking for IoT Applications," *ISSCC*, pp. 408-409, Feb. 2017.
- [6] Y. He, et al., "A 673μW 1.8-to-2.5GHz Dividerless Fractional-N Digital PLL with an Inherent Frequency-Capture Capability and a Phase-Dithering Spur Mitigation for IoT Applications," *ISSCC*, pp. 420-421, Feb. 2017.



Figure 28.3.1: Simplified block diagram of the transceiver.



Figure 28.3.2: Illustration of the phase-domain RX with interference resilience.



Figure 28.3.3: RX implementation with the hybrid loop filter and the loop-delay compensation.



Figure 28.3.4: Measured RX performance (Adjacent-Channel-Rejection, out-of-band performance, power consumption).

Figure 28.3.5: Measured RX sensitivity with  $K_{dco}$  calibration and DC offset calibration, and packet detection in real time.

|                          | This work                 |           | [5]<br>Y.H. Liu<br>ISSCC'17 | [4]<br>H. Okuni<br>ISSCC'16 | W. Yang<br>ESSCIRC'17        | [3]<br>X. Wang<br>ESSCIRC'16 | [1]<br>J. Prummel<br>ISSCC'15 | [2]<br>T. Sano<br>ISSCC'15 | F.W. Kuo<br>JSSC'17      |
|--------------------------|---------------------------|-----------|-----------------------------|-----------------------------|------------------------------|-----------------------------|-------------------------------|----------------------------|--------------------------|
| Standards                | BLE                       | BT5       | Proprietary                 | BLE                         | BT/BLE                       | BLE                         | BLE                           | BLE                        | BLE                      |
| Data rate                | 1Mbps                     | 2Mbps     | 2Mbps                       | 1Mbps                       | 2Mbps                        | 1Mbps                       | 1Mbps                         | 1Mbps                      | 1Mbps                    |
| Supply voltage           | 0.8V                      |           | 0.85V                       | 1.1V                        | 1.2V                         | 1V                          | 0.9V-3.3V                     | 1.1V                       | 1V                       |
| Technology               | 40nm                      |           | 40nm                        | 65nm                        | 55nm                         | 40nm                        | 55nm                          | 40nm                       | 28nm                     |
| Integration level        | RX/TX/partial DBB         |           | Zero-IF<br>phase-tracking   | Low-IF<br>hybrid loop       | Low-IF<br>Cartesian          | Sliding-IF<br>Cartesian     | Low-IF                        | Sliding-IF<br>Cartesian    | High-IF<br>Discrete-Time |
| RX architecture          | Zero-IF<br>phase-tracking |           | Zero-IF<br>phase-tracking   | Low-IF<br>hybrid loop       | Low-IF<br>Cartesian          | Sliding-IF<br>Cartesian     | Low-IF                        | Sliding-IF<br>Cartesian    | High-IF<br>Discrete-Time |
| Noise Figure             | 5.9dB                     |           | 6dB                         | -                           | 5.5dB                        | -                           | -                             | 6.5                        | 6.5                      |
| Image rejection          | No image                  |           | No image                    | -2dB                        | -                            | 35dB                        | No image                      | 70                         | 42                       |
| TX ACR (2nd/3rd-channel) | 18/30dB                   | 18/29.5dB | 20dB@5MHz                   | 24/29dB                     | 51/54dB                      | >17/27dB                    | -                             | 32/-dB                     | -                        |
| RX worst-case OOB        | -17dBm                    |           | -32dBm                      | -22dBm                      | 5dBm                         | -                           | -9dBm                         | -28dBm                     | -25dBm                   |
| TX max. Pout             | 1.8dBm                    |           | -                           | -                           | 8dBm                         | 1dBm                        | 2.3dBm                        | 0dBm                       | 3                        |
| TX Freq. Err.            | 2% / 1.4%                 |           | -                           | -                           | 5.4%                         | 6%                          | -                             | 2.6%                       | -                        |
| Radio area               | 0.8mm <sup>2</sup>        |           | 0.3mm <sup>2</sup>          | 2mm <sup>2</sup>            | 2.2mm <sup>2</sup>           | 1.6mm <sup>2</sup>          | 2.9mm <sup>2</sup>            | 1.1mm <sup>2</sup>         | 1.9mm <sup>2</sup>       |
| On-chip matching         | Yes                       |           | No                          | No                          | Yes                          | Yes                         | Yes                           | Yes                        | Yes                      |
| RX sensitivity           | -95dBm*                   | -92dBm*   | -87dBm*                     | -90dBm*                     | -96.8dBm*                    | -93#                        | -85#                          | -94.5                      | -94.5#                   |
| RX FOM <sub>SEN</sub> ** | 181.4dB                   | 180.4dB   | 178dB                       | 172dB                       | 176dB                        | 179dB                       | 170dB                         | 174dB                      | 172dB                    |
| Power cons.              | BLE                       | BT5       | BLE                         | BT/BLE                      | BLE                          | BLE                         | BLE                           | BLE                        | BLE                      |
| RX                       | 2.3mW                     | 2.9mW     | 1.55mW                      | 5.5mW                       | 12mW                         | 5.6mW                       | 11.2mW                        | 6.3mW                      | 2.75mW                   |
| TX                       | 6.1mW                     | 6.1mW     | -                           | -                           | 79.6mW                       | 9.4mW                       | 10.1mW                        | 7.7mW                      | 3.7mW#                   |
|                          | 0.74mW                    | 1.1mW     | -                           | -                           | 0.5mW                        | -                           | 0.6mW                         | -                          | -                        |

\* Based on BER, # Based on PER  
\*\* RX sensitivity FOM = -sensitivity - 10 × log(P<sub>DC</sub>/Data rate), with sensitivity in dBm, P<sub>DC</sub> in W and data rate in b/s  
\#\# at 0dBm output power

Figure 28.3.6: Performance summary and comparison with the state-of-the-art.
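As a quick cross-check of the footnoted figure of merit, the following minimal sketch recomputes FOM<sub>SEN</sub> for the two "This work" columns. It assumes that the sensitivity enters with a negative sign and that P<sub>DC</sub> is expressed in watts and the data rate in b/s, which is the convention that reproduces the tabulated values; the function name `rx_fom_sen` is illustrative only.

```python
import math

def rx_fom_sen(sensitivity_dbm, p_dc_w, data_rate_bps):
    """RX sensitivity FOM in dB: -sensitivity - 10*log10(P_DC / data rate)."""
    return -sensitivity_dbm - 10.0 * math.log10(p_dc_w / data_rate_bps)

# "This work" entries from the table: sensitivity, RX power, data rate.
fom_ble = rx_fom_sen(-95.0, 2.3e-3, 1e6)   # BLE, 1 Mb/s
fom_bt5 = rx_fom_sen(-92.0, 2.9e-3, 2e6)   # BT5, 2 Mb/s

print(f"BLE: {fom_ble:.1f} dB, BT5: {fom_bt5:.1f} dB")
```

Under these assumptions both values round to the table entries (181.4dB and 180.4dB).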



Figure 28.3.7: Die micrograph in 40nm CMOS.

## 28.4 A 0.45V Sub-mW All-Digital PLL in 16nm FinFET for Bluetooth Low-Energy (BLE) Modulation and Instantaneous Channel Hopping Using 32.768kHz Reference

Min-Shueh Yuan<sup>1</sup>, Chao-Chieh Li<sup>1</sup>, Chia-Chun Liao<sup>1</sup>, Yu-Tso Lin<sup>1</sup>, Chih-Hsien Chang<sup>1</sup>, Robert Bogdan Staszewski<sup>2</sup>

<sup>1</sup>TSMC, Hsinchu, Taiwan; <sup>2</sup>University College Dublin, Dublin 4, Ireland

The current paradigm of frequency synthesis for short-range wireless transceivers, such as BLE, is to use a crystal oscillator (XO) in the tens-of-MHz range as a frequency reference (FREF) to phase lock an RF oscillator [1-4]. This ensures a sufficiently wide PLL bandwidth of tens to hundreds of kHz to quickly acquire a new channel and to suppress lower-frequency phase noise (PN) of the RF oscillator. The latter requirement can be alleviated by substantially lowering the flicker PN of a digitally controlled oscillator (DCO), which allows its tuning-word updates to be frozen during receive (RX) packets and the DCO to be directly FM-modulated during transmit (TX) packets [1]. However, an all-digital PLL (ADPLL) is still needed just to quickly settle the DCO to each new channel.

Eliminating the XO would significantly cut the total energy consumed by an IoT device, since the XO drains ~100uW just to sustain itself. Hence, for low-duty-cycle operations, the XO must be periodically shut down. However, each XO restart is energy intensive (~1mW over ~1ms), so only XO shutdowns >10ms make sense [5]. In such systems, the energy drained by the XO can actually be higher than that of an RF PLL (~1mW over ~0.5ms [1,2]). Only very recently have the authors of [7] attempted to reduce the XO startup energy by dynamically adjusting the load capacitance of its Pierce oscillator.
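The >10ms shutdown threshold follows from a simple energy balance, which the sketch below works through using only the approximate figures quoted above. The variable names are illustrative, and the model deliberately ignores any leakage during shutdown.

```python
# Approximate figures quoted in the text.
XO_SUSTAIN_POWER_W = 100e-6    # ~100 uW to keep the XO running
XO_RESTART_POWER_W = 1e-3      # ~1 mW drawn while the XO restarts
XO_RESTART_TIME_S = 1e-3       # ~1 ms restart duration
PLL_POWER_W = 1e-3             # ~1 mW RF PLL
PLL_ACTIVE_TIME_S = 0.5e-3     # ~0.5 ms PLL activity

# Energy of one XO restart and of one PLL activation.
xo_restart_energy_j = XO_RESTART_POWER_W * XO_RESTART_TIME_S
pll_energy_j = PLL_POWER_W * PLL_ACTIVE_TIME_S

# Shutting the XO down only pays off if the sleep interval is long enough
# that the saved sustain energy exceeds the restart energy.
break_even_sleep_s = xo_restart_energy_j / XO_SUSTAIN_POWER_W

print(f"XO restart energy: {xo_restart_energy_j * 1e6:.2f} uJ")
print(f"PLL energy:        {pll_energy_j * 1e6:.2f} uJ")
print(f"Break-even sleep:  {break_even_sleep_s * 1e3:.1f} ms")
```

With these rounded inputs the break-even sleep interval is about 10ms, and one XO restart costs roughly twice the energy of one PLL activation.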

In this work, we demonstrate the elimination of the XO and the regular-bandwidth PLL for BLE by resorting to a 32.768kHz real-time clock (RTC) that is ubiquitous in IoT hosts for TX/RX scheduling (<20ppm frequency accuracy is now widely available in RTCs). As the resulting max PLL bandwidth is 1kHz or so, the settling time when hopping channels would be comparable to the packet length of ~0.5ms, making conventional channel settling impractical. We propose to replace the conventional channel settling with a band settling (see Figure 28.4.1 top) that is carried out only once per global device power-up. The ADPLL always stays settled to the center channel ( $CH=20$  @2440MHz) and performs the combined channel hopping and FM modulation by instantaneously offsetting the DCO resonance from the center channel via a 2-point modulation [6]. During periodic sleeps, the oscillator tuning word (OTW) registers are saved in order to restore the same center-channel frequency at wakeup. Since the band span ( $\Delta f_{band} = 80\text{MHz}$ ) is much wider than the channel span ( $\Delta f_{CH} = 2\text{MHz}$ ), the DCO nonlinearity becomes evident and must be compensated.

Details of the proposed ADPLL architecture are shown in Figure 28.4.2.  $f_T$  of this technology is in the hundreds of GHz, so the devices can operate at subthreshold to directly interface with solar cell harvesters going below 0.5V. The only exception is a 128-stage time-to-digital converter (TDC), which requires fine resolution (<20ps) so it must operate at the nominal supply of 0.75V. Consequently, a tiny voltage doubler is added. This necessitates level shifters (L2H and H2L) for the 1X-2X voltage domain transfers. The DCO variable clock (CKV) of ~2440MHz is divided by 256 to generate ~9.53MHz for the frequency modulation clock  $f_{FM}$ . Other clocks can be readily generated by integer or fractional division of CKV/2. A clock-gating (CG) circuitry cuts unnecessary edges of CKV/2 before feeding them to the TDC. To ensure the precise and instantaneous channel hopping, the DCO gain  $K_{DCO}$  should be precisely known across PVT variations and channel frequencies. We propose an LMS calibration technique [6] to normalize  $K_{DCO}$  at the center channel 20 in real time, and exploit the hopping frequency deviation ( $CH[n]-20$ ) as a stimulus to inject into the 2-point modulating ADPLL and to observe the filtered phase error  $\phi_E[n]$  as a reaction. The DCO normalization multiplier  $f_R/K_{DCO}$  is updated once per packet  $n$  according to  $f_R/K_{DCO}[n] = f_R/K_{DCO}[n-1] + \mu \cdot \phi_E[n] \cdot \text{sign}(CH[n]-20)$ , where  $\mu$  is an adaptation step size and  $f_R$  is the reference frequency. With this iterative algorithm, the reaction of filtered  $\phi_E[n]$  due to hopping perturbations diminishes and the normalized DCO gain converges (Figure 28.4.1 bottom), such that the ADPLL hopping bandwidth will become allpass [6].
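The per-packet LMS update can be illustrated with a small behavioral sketch. The update line follows the equation above; everything else is an assumption made for illustration: the phase-error model (filtered  $\phi_E$  proportional to the hop size times the residual gain error), the numeric values of the true multiplier, the initial guess and  $\mu$ , and the function names. It is not the authors' implementation.

```python
import random

def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def lms_kdco_calibration(g_true, g_init, mu, n_packets, seed=0):
    """Iterate g[n] = g[n-1] + mu * phi_E[n] * sign(CH[n] - 20).

    g stands for the DCO normalization multiplier f_R/K_DCO.
    """
    rng = random.Random(seed)          # fixed seed for a repeatable hop sequence
    g = g_init
    history = []
    for _ in range(n_packets):
        ch = rng.randint(1, 40)        # random BLE channel for this packet
        hop = ch - 20                  # hopping deviation from the center channel
        # Assumed toy model (not from the paper): the filtered phase error is
        # proportional to the hop size times the residual gain error.
        phi_e = hop * (1.0 - g / g_true)
        g += mu * phi_e * sign(hop)    # LMS update, once per packet
        history.append(g)
    return g, history

# Hypothetical numbers: true multiplier 3.0, initial estimate 20% too low.
g_final, trace = lms_kdco_calibration(g_true=3.0, g_init=2.4, mu=0.01, n_packets=500)
print(f"estimate after 500 packets: {g_final:.4f} (target 3.0000)")
```

In this toy model the product  $\phi_E \cdot \text{sign}(CH-20)$  always has the sign of the remaining gain error, so the estimate moves toward the true value on every hop away from channel 20 and is left untouched when the packet stays on channel 20.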

OTW hopping control at a channel  $CH = 1\dots40$  away from  $CH=20$  can be obtained by  $OTW_{CH=20} = [(CH-20) \cdot (\Delta f_{CH}/f_R)] \cdot (f_R/K_{DCO}) \cdot X(CH)$ , where the factor in square brackets is  $FCW_{CH=20}$ .  $X(CH)$  is a compensation factor due to a non-linear DCO gain, which is a cubic function of the resonant frequency (see Figure 28.4.3). Since the max frequency deviation ( $f_{CH40}-f_{CH20}$ ) is only 1.6%, a linear approximation is used:  $X(CH) = 1/[1 + 3(f_{CH} - f_{CH20})/f_{CH20}]$ .
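A minimal numeric sketch of the hopping control word is given below. The channel grid (2MHz spacing around 2440MHz) and the linearized  $X(CH)$  come from the text; the value used for  $f_R/K_{DCO}$  is hypothetical, and `x_cubic` is an assumed un-linearized form,  $(f_{CH20}/f_{CH})^3$ , included only to gauge the error of the linear approximation.

```python
F_CH20_HZ = 2440e6      # center channel frequency (CH = 20)
DELTA_F_CH_HZ = 2e6     # channel spacing
F_REF_HZ = 32768.0      # RTC reference frequency f_R

def channel_freq_hz(ch):
    """Carrier frequency of BLE channel index ch (1..40) on a 2 MHz grid."""
    return F_CH20_HZ + (ch - 20) * DELTA_F_CH_HZ

def x_linear(ch):
    """Linearized compensation factor X(CH) = 1 / [1 + 3*(f_CH - f_CH20)/f_CH20]."""
    rel = (channel_freq_hz(ch) - F_CH20_HZ) / F_CH20_HZ
    return 1.0 / (1.0 + 3.0 * rel)

def x_cubic(ch):
    """Assumed un-linearized factor (f_CH20 / f_CH)**3, for comparison only."""
    return (F_CH20_HZ / channel_freq_hz(ch)) ** 3

def otw_hop(ch, fr_over_kdco):
    """OTW offset from channel 20: FCW * (f_R/K_DCO) * X(CH)."""
    fcw = (ch - 20) * (DELTA_F_CH_HZ / F_REF_HZ)
    return fcw * fr_over_kdco * x_linear(ch)

FR_OVER_KDCO = 3.0      # hypothetical normalization multiplier
for ch in (1, 15, 20, 25, 40):
    print(ch, round(otw_hop(ch, FR_OVER_KDCO), 2),
          round(x_linear(ch), 5), round(x_cubic(ch), 5))
```

At the band edge (CH=40) the relative frequency deviation is about 1.6%, and under this assumed cubic model the linear and cubic factors differ by less than 0.1%, which is consistent with the choice of a linear approximation.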

Figure 28.4.3 (right) shows the proposed DCO. The transformer turns ratio N=2 and coupling coefficient ( $k=0.74$ ) are designed to provide a passive voltage gain [8] from the drain (transformer's primary) to gate (transformer's secondary) sides of the transconductor pair, thus boosting the loop gain by 50% and improving the start-up at a lower supply of 0.23V. The DCO has five switched-capacitor (sw-cap) tuning banks: PVT, coarse (COAR) and FINE for the PLL, as well as hopping (HOP) and FM for the 2-point modulation. The PVT bank is unit-weighted and provides large 10.2MHz steps. It is split into the transformer's primary and secondary to achieve the max Q-factor enhancement. To ensure enough coverage in each tuning bank, a COAR bank has a step size of 0.48MHz and the FINE bank has the finest resolution of 16kHz to eliminate the need for power-hungry  $\Sigma\Delta$  DCO dithering. To achieve the finest resolution, the FINE bank is connected to the primary coil to benefit from the capacitance transformation of  $1/N^2$ . The FM sw-cap bank with 128 unit-weighted 16kHz units covers the ±250kHz GFSK modulation range.

The HOP1/HOP2 bank with 14 binary-weighted bits is used for the fast hopping, with a tuning range of 180MHz to cover the BLE band. A binary-weighted arrangement was chosen rather than a unit-weighted one because it reduces the occupied sw-cap bank area by ~90%. However, the binary scheme produces process-related mismatch errors, so a segmentation technique is proposed (see Figure 28.4.3 top) to directly compensate them. Each controlled hopping bit has a compensating correction (derived via silicon measurements) that is stored in a look-up table and applied as shown in Figure 28.4.3. Another multiplicative correction,  $1/r_{FM,HOP}$  shown in Figure 28.4.2, is applied on the FM path to compensate for the systematic  $K_{DCO,FM} / K_{DCO,HOP2}$  gain ratio of 1.41 (=  $r_{FM,HOP}$ ), which has a standard deviation of 0.04 measured over 49 dies.

The proposed ADPLL with 2-point modulation/hopping capability for BLE is implemented in 16nm FinFET and successfully verified to meet BLE specifications through on-wafer RF probing measurement. Figure 28.4.4 shows the measurements of channel hopping and GFSK modulation: 1) nearly identical GFSK spectra at the first, middle and last channels (downconverted to baseband); 2) full-band spectrum (at maximum hold) of a 40-channel random hopping sequence; 3) zoomed-in channel-hopping (channels: 15, 20, 25) with GFSK modulation; and 4) same zoomed-in sequence but without the modulation and with the non-linear DCO gain compensation turned off/on. The hopping time is <0.1us (shown in Figure 28.4.5), as it lies well within the time resolution aperture of the available signal analyzer. The ADPLL bandwidth in all measurements is 1.3kHz.

Figure 28.4.5 further shows the measured PN. As the ADPLL bandwidth is  $32.768\text{kHz} / (2^2 \cdot 2\pi) = 1.3\text{kHz}$ , its effect is not visible at the minimum measured offset of 10kHz. However, the low  $1/f^3$  corner of 140kHz is clearly discernible. The measured jitter integrated from 100kHz to 1GHz is 1.39ps. The TDC resolution is 8.6ps at 0.4V supply to the doubler and rises to 15.6ps at 0.3V. The total ADPLL power consumption  $P_{DC}$  is 0.923mW at 0.45V; its breakdown shows that the 2.1-to-2.5GHz, -104dBc/Hz (@ 1MHz offset) DCO accounts for 70% of  $P_{DC}$ . The entire digital logic is clocked at 32.768kHz and draws only 6% of  $P_{DC}$ .

Figure 28.4.6 summarizes the performance and compares it to state-of-the-art BLE PLLs. The key breakthroughs here are: 1) ultra-low-voltage operation at ≤0.45V; 2) the elimination of the conventional XO as FREF in favor of a 32kHz RTC; and 3) near-instantaneous channel hopping, while maintaining best-in-class performance compared to those in Figure 28.4.6 at sub-mW power consumption. The die photo is shown in Figure 28.4.7.

### Acknowledgments:

The authors would like to thank Hung-Yi Kuo and Chester Kuo for all lab measurements.

### References:

- [1] F. W. Kuo, et al., "A Bluetooth Low-Energy Transceiver With 3.7-mW All-Digital Transmitter, 2.75-mW High-IF Discrete-Time Receiver, and TX/RX Switchable On-Chip Matching Network," *IEEE JSSC*, vol. 52, no. 4, pp. 1144-1162, Apr. 2017.
- [2] Y. He, et al., "A 673uW 1.8-to-2.5GHz Dividerless Fractional-N Digital PLL with an Inherent Frequency-Capture Capability and a Phase-Dithering Spur Mitigation for IoT Applications," *ISSCC*, pp. 420-421, Feb. 2017.
- [3] J. W. Lai, et al., "A 0.27mm<sup>2</sup> 13.5dBm 2.4GHz All-Digital Polar Transmitter Using 34%-Efficiency Class-D DPA in 40nm CMOS," *ISSCC*, pp. 326-327, Feb. 2013.
- [4] Y. H. Liu, et al., "A 2.7nJ/b Multi-Standard 2.3/2.4GHz Polar Transmitter for Wireless Sensor Networks," *ISSCC*, pp. 448-449, Feb. 2012.
- [5] R. Thirunarayanan, et al., "Reducing Energy Dissipation in ULP Systems: PLL-Free FBAR-Based Fast Startup Transmitters," *IEEE TMTT*, vol. 63, no. 4, pp. 1110-1117, Apr. 2015.
- [6] R. B. Staszewski, et al., "LMS-Based Calibration of an RF Digitally Controlled Oscillator for Mobile Phones," *IEEE TCAS-II*, vol. 53, no. 3, pp. 225-229, Mar. 2006.
- [7] M. Ding, et al., "A 95uW 24MHz Digitally Controlled Crystal Oscillator for IoT Applications with 36nJ Start-Up Energy and >13x Start-Up Time Reduction Using A Fully-Autonomous Dynamically-Adjusted Load," *ISSCC*, pp. 90-91, Feb. 2017.
- [8] M. Babaie and R. B. Staszewski, "A Class-F CMOS Oscillator," *IEEE JSSC*, vol. 48, no. 12, pp. 3120-3133, Dec. 2013.



Figure 28.4.1: Mechanisms of instantaneous channel hopping, DCO gain calibration through hopping perturbation, and GFSK data modulation.



Figure 28.4.2: Block diagram of the proposed ADPLL. (In bold are the newly introduced blocks w.r.t. conventional ADPLL, e.g., [1].)



Figure 28.4.3: The DCO schematic and non-linearity compensation schemes.



Figure 28.4.4: Measured spectral GFSK frequency modulation (top-left), full-band hopping (top-right), three-channel hopping (bottom-left), and settling w/ & w/o DCO compensation (bottom-right).



Figure 28.4.5: Measured hopping settling time, phase jitter, TDC resolution and power consumption.

|                                   | This work               | [1] JSSC'17 | [2] ISSCC'17  | [3] ISSCC'13 | [4] ISSCC'12  |
|-----------------------------------|-------------------------|-------------|---------------|--------------|---------------|
| Architecture                      | ADPLL TDC               | ADPLL       | ADPLL TDC+DTC | ADPLL        | Analog CP-PLL |
| Technology                        | 16nm FinFET             | 28nm        | 40nm          | 40nm         | 90nm          |
| VDD(V)                            | < 0.45                  | 1           | 1             | 1.3          | 1.2           |
| Reference(MHz)                    | 0.032                   | 5-40        | N/A           | 26           | 24            |
| Output(GHz)                       | 2.1-2.5                 | 2.05-2.55   | 1.8-2.5       | 2.4          | 1.7-2.48      |
| RMS Jitter (ps)                   | 1.39**                  | 1.23        | 1.98          | 0.98         | 2.66          |
| Power (mW)                        | 0.923                   | 1.4         | 0.67          | 4.55         | 1.1           |
| FOM*                              | -237.5                  | -236.7      | -236          | -233.6       | -231          |
| Core Area (mm <sup>2</sup> )      | 0.24                    | 0.24        | 0.18          | 0.075        | 0.75          |
| Channel Hopping Settling Time(us) | < 0.1                   | 15          | 11            | N/A          | < 40          |
| TDC Resolution(ps)                | 7.8@0.45v<br>11.8@0.35v | 12          | N/A           | 7            | N/A           |

\*FOM = 10·log[(σ<sub>jitter</sub>/1s)<sup>2</sup>·(P<sub>DC</sub>/1mW)]

\*\* Integrated from 100kHz to 1GHz

Figure 28.4.6: Performance summary and comparison with state-of-the-art.
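The jitter-power FOM in the table can be reproduced with the short sketch below. It assumes the usual reading of the footnote, with the RMS jitter normalized to 1s and the power to 1mW; the jitter and power values are copied from the table, and the function name is illustrative.

```python
import math

def pll_fom_db(rms_jitter_s, power_w):
    """Jitter-power FOM in dB: 10*log10[(sigma/1s)^2 * (P_DC/1mW)]."""
    return 10.0 * math.log10((rms_jitter_s / 1.0) ** 2 * (power_w / 1e-3))

# (RMS jitter in s, power in W, FOM as tabulated) for each column.
entries = {
    "This work": (1.39e-12, 0.923e-3, -237.5),
    "[1]":       (1.23e-12, 1.4e-3,   -236.7),
    "[2]":       (1.98e-12, 0.67e-3,  -236.0),
    "[3]":       (0.98e-12, 4.55e-3,  -233.6),
    "[4]":       (2.66e-12, 1.1e-3,   -231.0),
}

computed = {name: pll_fom_db(j, p) for name, (j, p, _) in entries.items()}
for name, (_, _, tabulated) in entries.items():
    print(f"{name:10s} computed {computed[name]:7.1f} dB, table {tabulated:7.1f} dB")
```

Under this assumption the computed values agree with the tabulated ones to within the rounding used in the table.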



Figure 28.4.7: Die micrograph of the ADPLL. The core size is 0.24mm<sup>2</sup>.

## 28.5 A 0.2V Energy-Harvesting BLE Transmitter with a Micropower Manager Achieving 25% System Efficiency at 0dBm Output and 5.2nW Sleep Power in 28nm CMOS

Jun Yin<sup>1</sup>, Shiheng Yang<sup>1</sup>, Haidong Yi<sup>1</sup>, Wei-Han Yu<sup>1</sup>, Pui-In Mak<sup>1</sup>, Rui P. Martins<sup>1,2</sup>

<sup>1</sup>University of Macau, Macau, China

<sup>2</sup>Instituto Superior Tecnico/University of Lisboa, Lisbon, Portugal

Massive deployment of wireless sensor tags (e.g. iBeacon) will only happen if batteries and their replacement effort are avoided. Self-powering by harvesting ambient energy such as indoor solar and thermal gradients is promising [1], but the unstable sub-0.5V output of such harvesters hinders their utility. Adding boost converters and regulators inevitably worsens the system efficiency and integration level. This paper reports an energy-harvesting Bluetooth Low-Energy (BLE) transmitter (TX) (Fig. 28.5.1). It features a fully integrated micropower manager ( $\mu$ PM) to limit the sleep power and tolerate variation of the energy-source voltage ( $V_{DD,EH}$ ) down to 0.2V. An ultra-low-voltage (ULV) VCO and PA, and a passive-intensive type-I PLL are proposed. Fabricated in 28nm CMOS, the TX exhibits a 25% system efficiency at 0dBm output ( $P_{out}$ ).

The TX (Fig. 28.5.1) tailors a  $\mu$ PM with 4 specific charge pumps (CP<sub>1-4</sub>) to deliver the internal power and bias. CP<sub>1</sub> can self-start from  $V_{DD,EH}=0.2V$ . It generates a 1V  $V_{DD,PM}$  to power up the bandgap reference (BGR). CP<sub>1</sub> and the BGR together offer  $V_{DD,PM}$  and a PVT-insensitive  $V_{REF,PM}$  (0.55V) to start and operate CP<sub>2-3</sub>. Since the VCO plus PA account for 99% of the active power, ULV designs allow them to operate directly under  $V_{DD,EH}$  (energy harvester + storage). Both the VCO and PA are biased with a stable  $V_{BIAS}$  (0.39V) given by the BGR to resist  $V_{DD,EH}$  variation. The passive-intensive type-I PLL draws only 54 $\mu$ A from a 0.55V  $V_{DD,PLL}$  offered by CP<sub>2</sub>. Static logic controls are under a 1V  $V_{DD,CTRL}$  given by CP<sub>3</sub>. To suppress the sleep power, a sub-nW always-on CP<sub>4</sub> is employed to generate a -0.17V  $V_{NEG}$  for power-gating the VCO and PA.

Only CP<sub>1</sub> of the  $\mu$ PM is detailed (Fig. 28.5.2). A switched-capacitor rectifier driven by a multiphase ring-VCO reduces the switching ripple at  $V_{DD,PM}$ , which otherwise calls for a big C<sub>de</sub> compromising the area and startup time. The ring-VCO, using differential bootstrapped inverters, secures an adequate output swing (ideally 3 $\times$  $V_{DD,EH}$ ) to drive the rectifier without extra buffers. To cope with  $V_{DD,PM}$  variation, the error amplifier senses  $V_{DD,PM}$  and adjusts the delay lines of the ring-VCO via  $V_{DL}$ . Thus, its oscillation frequency can be moderated to limit the excessive power when  $V_{DD,EH}$  rises. The forward diodes at  $V_{DD,PM}$  are for overdrive protection.

CP<sub>2-3</sub> (not shown) are similar to CP<sub>1</sub>, but their error amplifier directly locks the bias current of the ring-VCO, so as to regulate  $V_{DD,PLL}$  and  $V_{DD,CTRL}$  [1]. The always-on CP<sub>4</sub> is based on a negative-output rectifier driven by a kHz-range ring-VCO. In the  $\mu$ PM measurement (Fig. 28.5.2), the critical voltages are stabilized against  $V_{DD,EH}$  variation from 0.2 to 0.3V, and settle in <400 $\mu$ s, which can be overlapped with the startup time of the crystal oscillator [1].

The trifilar-coil VCO [2] features a large passive loop gain to facilitate startup at ULV. Yet, the VCO steady-state power consumption is decided by both the loss of the LC tank and transistor channel conductance ( $G_{DS}$ ). Since the trifilar-coil VCO has out-of-phase  $V_{GS}$  and  $V_{DS}$ , its cross-coupled pair is pushed into the triode region, resulting in large  $G_{DS}$  and power consumption especially at ULV. Here, our ULV VCO (Fig. 28.5.3-left) removes the drain-to-gate feedback for a constant drain voltage  $V_D$  of M<sub>1,2</sub> set by  $V_{DD,EH}$ . Differential-mode oscillation is achieved by source-to-gate magnetic cross-coupling ( $V_{S-}$  to  $V_{G+}$  and  $V_{S+}$  to  $V_{G-}$ ). Unlike [3], an auxiliary cross-coupled pair is not required at the gate, which otherwise penalizes the phase noise (PN) by ~2dB in the 1/f<sup>2</sup> region (even with a size of one-tenth of W/L<sub>M1,2</sub>). Since  $V_{G+}$  and  $V_{S+}$  ( $V_{G-}$  and  $V_{S-}$ ) are in-phase, they lead to a large  $|V_G|$  (~0.78V<sub>pp</sub>) improving the PN, and can serve as the VCO outputs. The reduced  $V_{GS}$  (~0.51V<sub>pp</sub>) helps M<sub>1,2</sub> to stay in the saturation region, avoiding Q degradation of the transformer. The transformer has a primary coil (L<sub>A11,12</sub>) stacked atop its secondary coil (L<sub>A21,22</sub>) to enlarge both the coupling factor ( $k_A \approx 0.76$ ) and turns ratio ( $N_{GS} = 5.6 = \sqrt{L_{A11}/L_{A21}}$ ). The VCO hence shows a large passive loop gain proportional to  $k_A N_{GS}$  even in the presence of source degeneration. In active mode, M<sub>1,2</sub> has a gate bias  $V_{BIAS}$  (0.39V) given by the BGR that further facilitates startup at ULV. DC isolation between  $V_{GB}$  and  $V_{DD,EH}$  improves the frequency pushing (measured 29.7MHz/V).

The 2.4GHz VCO outputs ( $V_{GS}$ ) directly drive the ULV PA (Fig. 28.5.3-right) that operates in the Class-E/F<sub>2</sub> mode to enhance the power efficiency. To meet a 0-dBm  $P_{out}$  at 0.2V, we exploit a step-down transformer (L<sub>B1,2</sub>) to reduce the drain-node resistance of M<sub>3,4</sub>, while rejecting the even-order harmonics at  $V_{out}$ . To suppress the HD<sub>3</sub> without adding an explicit LC filter, C<sub>n</sub> (0.7pF) is embedded into the secondary coil (L<sub>B2</sub>) to resonate with a part of its inductance (L<sub>n</sub>) at 3 $\times$  the 2.4GHz ISM-band frequency:  $3f_0 = 1/(2\pi\sqrt{L_n C_n})$ . As such, |Z<sub>22</sub>| of the PA presents a high impedance at 3f<sub>0</sub> to block the 3<sup>rd</sup>-harmonic current. L<sub>n</sub> is routed as the innermost 2 turns of L<sub>B2</sub> (Fig. 28.5.7), with its dimension (D<sub>1</sub>) designed to balance the HD<sub>3</sub> and PA efficiency. Simulations suggest that D<sub>1</sub>=100 $\mu$m balances HD<sub>3</sub> (-47 dBm) and PA efficiency (30.6%) at  $P_{out}=0$ dBm. Compared with the case without L<sub>n</sub>C<sub>n</sub>, HD<sub>3</sub> is suppressed by a further 19dB (Fig. 28.5.3-right). The passband loss rises by only 0.5dB, since the magnetic coupling between L<sub>B1</sub> and L<sub>B2</sub> is dominated by their outer turns. In sleep mode, the shared gate bias ( $V_{BIAS}$ ) of the VCO (M<sub>1,2</sub>) and PA (M<sub>3,4</sub>) is switched to  $V_{NEG}$  (-0.17V) optimized for leakage power reduction (measured 4.9nW). The VCO can be switched to the receiver mode by shutting down the PA via SW<sub>PA</sub>.
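The notch condition  $3f_0 = 1/(2\pi\sqrt{L_n C_n})$  fixes L<sub>n</sub> once C<sub>n</sub> is chosen. The sketch below solves it for the quoted C<sub>n</sub>=0.7pF. The mid-band value f<sub>0</sub>=2.44GHz is an assumption, and the resulting inductance is a derived estimate rather than a value reported in the paper.

```python
import math

F0_HZ = 2.44e9     # assumed mid-band carrier (not stated explicitly in the text)
C_N_F = 0.7e-12    # notch capacitor C_n quoted in the text

def notch_inductance_h(f_notch_hz, c_f):
    """Solve f = 1 / (2*pi*sqrt(L*C)) for L."""
    return 1.0 / ((2.0 * math.pi * f_notch_hz) ** 2 * c_f)

def resonant_freq_hz(l_h, c_f):
    """Resonant frequency of an ideal LC pair."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))

l_n = notch_inductance_h(3 * F0_HZ, C_N_F)
print(f"L_n for a notch at 3*f0: {l_n * 1e9:.3f} nH")
print(f"check: resonance at {resonant_freq_hz(l_n, C_N_F) / 1e9:.2f} GHz")
```

Under these assumptions L<sub>n</sub> comes out at roughly 0.7nH, a sub-nH value that is plausible for the two innermost turns of the secondary coil; parasitics and coupling to the rest of the transformer are ignored here.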

The VCO is locked by a type-I integer-N PLL with  $V_{DD,PLL}=0.55V$  and f<sub>REF</sub>=1MHz to support channel selection (Fig. 28.5.4). With a passive XOR gate + an inverter-based buffer, V<sub>x</sub> consists of rail-to-rail pulses, and hence a 0-to-0.55V  $V_{CTRL}$  range for frequency tuning is realized. Unlike [4], which uses 50%-duty-cycle  $\phi_{1,2}$  to drive the master-slave sampling filter (MSSF), here  $\phi_1$  uses a 10x smaller duty cycle (i.e. 5%) to reduce the ripple of  $V_{CTRL}$  mainly induced by the clock feedthrough from V<sub>A</sub>, which aids in suppressing the reference spur. The reduction can be quantified by analyzing the 1<sup>st</sup>-harmonic Fourier coefficient of  $V_{CTRL}$ . Meanwhile, the XOR gain varies with  $V_{CTRL}$ , and its simulated minimum gain improves from ~0.03 (50%  $\phi_1$ ) to 1.21V/rad (5%  $\phi_1$ ), which expands the loop bandwidth to better suppress the VCO PN. During open-loop FM modulation, S<sub>1</sub> is off and S<sub>2</sub> is on. Thus, C<sub>1,2</sub> are in parallel (26pF) to reduce the  $V_{CTRL}$  leakage. The swing of  $\phi_{1,2}$  is bootstrapped to reduce the size of S<sub>1,2</sub> (transmission gates). The PLL excluding the VCO dissipates 30 $\mu$ W, mainly in the divider.

The TX fabricated in 28nm CMOS satisfies the strict minimal density rules. Under open-loop modulation, the TX shows a system efficiency of 25% at  $P_{out}=0$ dBm and  $V_{DD,EH}=0.2V$ , and 27% at  $P_{out}=3.7$ dBm and  $V_{DD,EH}=0.3V$  (Fig. 28.5.5). A single-tone  $P_{out}$  of 0dBm shows HD<sub>2</sub>=-49.6dBm and HD<sub>3</sub>=-47.4dBm. The BLE spectral mask is met and FSK error is just 2.2%. The frequency drift is <5kHz when delivering a 425us BLE packet. The free-running VCO shows a FOM of 188.4dB at 1MHz offset, and a 1/f<sup>2</sup> corner at 150kHz. The PLL+VCO power efficiency is 0.29mW/GHz, and the largest spurs are -47dBc. The settling time measures 30 at an initial frequency offset of 30MHz. The complete TX has a sleep power of 5.2nW and active area of 0.53mm<sup>2</sup> (Fig. 28.5.7).
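The quoted system efficiencies translate directly into DC power and supply current. The sketch below performs that conversion; it assumes that the whole TX power is drawn from  $V_{DD,EH}$ , so the currents are derived estimates rather than reported measurements, and the function names are illustrative.

```python
def dbm_to_w(p_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10.0)

def dc_power_w(p_out_dbm, system_efficiency):
    """DC power implied by an output power and a system efficiency (0..1)."""
    return dbm_to_w(p_out_dbm) / system_efficiency

# Operating points quoted in the text: (P_out in dBm, efficiency, V_DD,EH in V).
p_dc_low = dc_power_w(0.0, 0.25)     # 0 dBm at 25 %
p_dc_high = dc_power_w(3.7, 0.27)    # 3.7 dBm at 27 %
i_low_a = p_dc_low / 0.2             # assumed: all power drawn from V_DD,EH = 0.2 V
i_high_a = p_dc_high / 0.3           # assumed: all power drawn from V_DD,EH = 0.3 V

print(f"0 dBm:   {p_dc_low * 1e3:.2f} mW, {i_low_a * 1e3:.1f} mA")
print(f"3.7 dBm: {p_dc_high * 1e3:.2f} mW, {i_high_a * 1e3:.1f} mA")
```

The 0dBm point gives 4mW, matching the TX power consumption listed in Fig. 28.5.6; at such low supply voltages this corresponds to tens of mA of harvester current under the stated assumption.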

Benchmarking with the recent art [5-7] in Fig. 28.5.6, this work achieves a BLE TX with a  $\mu$ PM that enables ULV operation down to 0.2V, while upholding a high system efficiency and full integration. [5-6] use dual supplies, and their reported performance does not include the loss, power and area of the power-management units. [7] aims at direct battery operation.

### Acknowledgements:

The authors thank Macau Science and Technology Development Fund (FDCT) - SKL Fund and University of Macau - MYRG-2015-00097-AMSV for financial support.

### References:

- [1] W.-H. Yu, et al., "A 0.18V 382 $\mu$ W Bluetooth Low-Energy (BLE) Receiver with 1.33nW Sleep Power for Energy Harvesting Applications in 28nm CMOS," ISSCC, pp.414-415, Feb. 2017.
- [2] C. C. Li, et al., "A 0.2V Trifilar-Coil DCO with DC-DC Converter in 16nm FinFET CMOS with 188dB FOM, 1.3kHz Resolution, and Frequency Pushing of 38MHz/V for Energy Harvesting Applications," ISSCC, pp.332-333, Feb. 2017.
- [3] A. W. L. Ng, et al., "A 1-V 24-GHz 17.5-mW Phase-Locked Loop in a 0.18-um CMOS Process," IEEE JSSC, vol. 41, pp. 1236-1244, June 2006.
- [4] L. Kong, et al., "A 2.4GHz 4mW Inductorless RF Synthesizer," ISSCC, pp.450-451, Feb. 2015.
- [5] M. Babaie, et al., "A Fully Integrated Bluetooth Low-Energy Transmitter in 28nm CMOS with 36% System Efficiency at 3 dBm," IEEE JSSC, vol. 51, no. 7, pp. 1547-1565, July 2016.
- [6] X. Peng, et al., "A 2.4-GHz ZigBee Transmitter Using a Function-Reuse Class-F DCO-PA and an ADPLL Achieving 22.6% (14.5%) System Efficiency at 6-dBm (0-dBm)  $P_{out}$ ," IEEE JSSC, vol. 52, pp. 1495-1508, June 2017.
- [7] Y.-H. Liu, et al., "A 3.7mW-RX 4.4mW-TX Fully Integrated Bluetooth Low-Energy/IEEE802.15.4/proprietary SoC with an ADPLL-Based Fast Frequency Offset Compensation in 40nm CMOS," ISSCC, pp. 236-237, Feb. 2015.



Figure 28.5.1: Proposed energy-harvesting BLE TX features a fully-integrated  $\mu$ PM to control the active/sleep power and tolerate variation of the energy source ( $V_{DD,EH}$ ) down to 0.2V.



Figure 28.5.2: Left: CP<sub>1</sub> of the  $\mu$ PM. Its ring-VCO is locked via tunable delay lines to track  $V_{DD,EH}$  variation. Right: Measured internal voltages against  $V_{DD,EH}$  and their startup time.



Figure 28.5.3: Left: ULV VCO with  $M_{1,2}$  always in saturation region even at  $V_{DD,EH}=0.2V$ . Right: ULV Class-E/F<sub>2</sub> PA, with its transformer-embedded  $L_nC_n$  notch to suppress HD<sub>3</sub>.



Figure 28.5.4: A 2.4GHz integer-N type-I PLL with passive XOR and MSSF. The MSSF uses a 5%-duty-cycle  $\Phi_1$  to aid suppressing the reference spurs.  $V_{DD,PLL}=0.55V$  and  $f_{ref}=1MHz$ .



Figure 28.5.5: Upper: measured  $P_{out}$ , power efficiency and HD<sub>2,3</sub>. Lower: measured modulated output spectrum, FSK error and frequency drift under open-loop operation.

| Parameters                                            | This Work                                   | JSSC'16 [5]                                                 | JSSC'17 [6]                                    | ISSCC'15 [7]                |
|-------------------------------------------------------|---------------------------------------------|-------------------------------------------------------------|------------------------------------------------|-----------------------------|
| Key Techniques                                        | $\mu$ PM + ULV VCO & PA + Type-I Analog PLL | Dual- $V_{DD}$ + Class-E/F <sub>2</sub> PA + LC-DCO + ADPLL | Dual- $V_{DD}$ + Function-Reuse DCO-PA + ADPLL | Class-D PA + LC-DCO + ADPLL |
| CMOS Technology                                       | 28 nm                                       | 28 nm                                                       | 65 nm                                          | 40 nm                       |
| Active Area (mm <sup>2</sup> )                        | 0.53 *                                      | 0.65                                                        | 0.39                                           | 0.6                         |
| O/P Matching Network                                  | Fully On-Chip                               | Fully On-Chip                                               | Partially On-Chip                              | Partially On-Chip           |
| HD <sub>2</sub> /HD <sub>3</sub> @ $P_{out}$ (dBm)    | -49.6/-47.4 @ 0 dBm                         | -50/-47 @ 0 dBm                                             | -43.2/-47.6 @ 0 dBm                            | -49/-53 @ -2 dBm            |
| Modulation Error                                      | 2.2% (GFSK)                                 | 2.7% (GFSK)                                                 | 2.29% (HS-OQPSK)                               | 4.8% (GFSK)                 |
| Supply Voltage (V)                                    | 0.2                                         | 0.5 (DCO)<br>1 (ADPLL & PA)                                 | 0.4 (DCO-PA)<br>0.7 (ADPLL)                    | 1                           |
| TX Power Consump. (mW) @ $P_{out}$                    | 4 @ 0 dBm *                                 | 3.6 @ 0 dBm                                                 | 4.4 @ 0 dBm                                    | 3.45 @ -2 dBm               |
| TX Power Efficiency (%) @ $P_{out}$                   | 25 @ 0 dBm *                                | 28 @ 0 dBm                                                  | 22.6 @ 0 dBm                                   | 18.3 @ -2 dBm               |
| Sleep Power (nW)                                      | 5.2                                         | N/A                                                         | N/A                                            | N/A                         |
| VCO PN @ 1MHz offset (dBc/Hz)                         | -119                                        | -116 to -117                                                | -116                                           | -110                        |
| VCO FoM @ 1MHz offset (dB)                            | 188.4                                       | 188 to 189                                                  | N/A                                            | 183                         |
| PLL Power Efficiency (mW/GHz)                         | 0.29                                        | 0.57                                                        | N/A                                            | 0.39                        |
| PLL FoM <sup>#</sup> (dB) normalized @ 1MHz $f_{ref}$ | -227.2                                      | -231.6                                                      | N/A                                            | -220.9                      |
| PLL Largest Spurs (dBc)                               | -47                                         | -60                                                         | -42                                            | -38                         |

\* Includes a fully-integrated $\mu$PM. [5-7] do not include the loss, power and area of the power-management units.

$$\# \text{PLL FoM} = 10\log \left[ \left( \frac{\sigma_{\text{rms}}}{1 \text{ sec}} \right)^2 \cdot \frac{\text{Power}}{1 \text{ mW}} \cdot \frac{f_{\text{REF}}}{1 \text{ MHz}} \right]$$

Figure 28.5.6: Benchmark with the recent art. This work and [5-7] use open-loop modulation.



Figure 28.5.7: Die micrograph of the 28nm CMOS BLE TX and details of transformers. Extra pads are for individual characterization of the μPM, VCO and PA in active and sleep modes.

## 28.6 A -76dBm 7.4nW Wakeup Radio with Automatic Offset Compensation

Jesse Moody, Pouyan Bassirian, Abhishek Roy, Ningxi Liu,  
Stephen Pancrazio, N. Scott Barker, Benton H. Calhoun,  
Steven M. Bowers

University of Virginia, Charlottesville, VA

Event-driven sensor nodes have applications in agriculture, infrastructure, and perimeter monitoring and are characterized by spending the vast majority of their time in an asleep-yet-alert state. In this state, the node must remain responsive to incoming RF wakeup commands from an antenna while drawing minimal dc power, because sleep-mode power dominates the total energy budget if wakeup events are sufficiently infrequent. The RF wakeup receiver (WuRX) is one critical block of the node's asleep-yet-alert state. It must maximize sensitivity at a power consumption of 10nW or less to maximize battery lifetime or even enable battery-less systems that persist on energy harvesting [1-3]. These WuRXs must reliably detect wakeup signals and reject false wakeups caused by external interferer signals or noise. Otherwise, booting the full node into its active state when it is not needed can quickly squander the power savings created by the wakeup radio in its asleep-yet-alert state.

In this work, we present a WuRX that achieves -76dBm sensitivity in the 151.8MHz MURS band and -71dBm sensitivity in the 433MHz ISM band while consuming 7.4nW dc power. This is enabled by several innovations, including a passive envelope detector (ED) that minimizes the input noise-equivalent power (NEP) of the RF front-end, a high-sensitivity amplifier-comparator chain with a fully automated offset-control algorithm that operates even when the RF channel is quiet, a digital correlator to provide additional discrimination from external interferers or wakeup signals, and a low-frequency bandpass IF path to limit noise into the comparator while providing robustness against external interference. Figure 28.6.1 shows the receiver block diagram.

The CMOS RF front-end is co-designed with a discrete tapped-capacitor transformer to maximize the signal-to-noise ratio at the output of the ED (Fig. 28.6.2). The achievable passive voltage gain of the transformer is limited by the inductor's shunt conductance, so minimizing the input capacitance of the ED by reducing the number of detector stages enables the use of larger, higher-quality-factor inductors with larger shunt resistance for optimal voltage gain. On the other hand, increasing the number of stages in the detector increases its voltage sensitivity ( $V_{out}/P_{inRF}$ ), its output impedance, and its output noise level. Device threshold voltage also enables a tradeoff between input impedance, voltage sensitivity, and bandwidth [4]. Output thermal noise levels of active envelope detectors such as common-source and common-gate architectures degrade considerably at extremely low bias currents (Fig. 28.6.2). Thus, the noise performance of passive (Dickson) detectors becomes superior when currents are restricted to less than several hundred nA. Additionally, passive detectors with zero-bias diode-connected transistors have no flicker noise, erasing the tradeoff between flicker-noise corner frequency and input impedance. This ED was optimized for the 151.8MHz band, where a 45-stage, low-threshold-voltage-device ED provided a minimum NEP of 170fW/√Hz in simulation, with an overall measured voltage sensitivity from the  $50\Omega$  input to the output of the detector of 15.2mV/nW. To demonstrate the broadband nature of the CMOS chip itself in other bands, a second transformer was designed, implemented, and measured for the 433MHz band, achieving a voltage sensitivity of 6.4mV/nW (Fig. 28.6.2). The 3dB bandwidths of the 151.8MHz and 433MHz transformers are 3MHz and 11MHz, respectively.
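As a rough sanity check on these figures, the sketch below converts the reported sensitivity levels to incident power and multiplies by the measured voltage sensitivity to estimate the ED output amplitude. It assumes the detector stays in its square-law region (output voltage proportional to input power) and pairs -76dBm with 15.2mV/nW and -71dBm with 6.4mV/nW; the function names are illustrative and not from the paper.

```python
def dbm_to_nw(p_dbm: float) -> float:
    """Convert a power level in dBm to nanowatts (0 dBm = 1 mW = 1e6 nW)."""
    return 1e6 * 10 ** (p_dbm / 10)


def ed_output_mv(p_dbm: float, sens_mv_per_nw: float) -> float:
    """Estimate ED output in mV, assuming square-law (linear-in-power) detection."""
    return sens_mv_per_nw * dbm_to_nw(p_dbm)


if __name__ == "__main__":
    # (band, sensitivity in dBm, measured voltage sensitivity in mV/nW)
    cases = [("151.8MHz", -76.0, 15.2), ("433MHz", -71.0, 6.4)]
    for band, p_dbm, sens in cases:
        p_pw = dbm_to_nw(p_dbm) * 1e3
        print(f"{band}: {p_pw:.1f} pW in -> {ed_output_mv(p_dbm, sens):.2f} mV out")
```

Under these assumptions the baseband chain sees roughly 0.4mV at the 151.8MHz sensitivity limit and roughly 0.5mV at the 433MHz limit.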

The IF signal is amplified and digitized through a high-sensitivity ground-referenced baseband amplifier followed by a clocked comparator. The baseband amplifier is a modified cascode amplifier where the input stage is replaced with a common-drain PFET to allow for a ground-referenced amplifier. It takes advantage of ac coupling between the amplifier and comparator as well as a zero-pole pair created by using self-biasing of the gates of the common-gate and load transistors to enable an analog bandpass frequency response with a low-frequency cutoff of <1Hz and a high-frequency cutoff of 0.2kHz, which can be tuned through a digitally controlled capacitor bank ( $C_L$ ) to match the frequency response of the ED while suppressing out-of-band noise (Fig. 28.6.3). One additional benefit of the IF bandpass frequency response is that it suppresses many interferer signals. The amplifier provides high simulated passband gain of 25dB while consuming 2nW.

The clocked-comparator sampler consists of a preamplifier with a PFET input stage with one input referenced to ground and the other input driven by the output of the RF front-end (Fig. 28.6.3). The cross-coupled inverter pair provides regenerative feedback. When CLK=0, the comparator is reset. When CLK=1, the comparator enters into evaluation mode where it samples the incoming signal that gets latched during the rising edge of  $\phi$ . The comparator threshold is controlled through 6b of fine-grained control bits, which are set by the automatic offset-control loop with minimum step size of -280 $\mu$ V. Three bits of coarse-grained digital control set the range of the comparator threshold. For instance, with a coarse setting of 000, the simulated comparator threshold ranges from -6mV to 35mV.

A fully-integrated automatic offset-control algorithm allows for optimal operation in the presence of on-off-keyed (OOK) RF interferer signals, and it supplies self-calibration to overcome PVT variations. One significant challenge for event-driven WuRXs is that the entirety of the offset control must be accomplished in the absence of an RF signal. For receiver power levels below 10nW, an on-chip RF calibration source also is not feasible, so the offset of the comparator must be set from information available in the RF off state. The compensation loop, shown in Fig. 28.6.4, sets the comparator offset to a level that provides a desired false-positive rate (at the output of the comparator) for a trade-off between low false-wakeup rate (at the output of the correlator) and high sensitivity. In this work, the false-positive rate is set to be 1.5% to achieve a false-wakeup event rate of <1/hr.
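The idea of steering the offset word from quiet-channel statistics alone can be sketched as follows. This is a minimal behavioral model, not the on-chip algorithm: the counting window, the one-step-per-window update rule, the Gaussian noise model, the 1mV rms noise level and the uniform 0.28mV step are all assumptions made for illustration. Only the 6b fine word and the 1.5% false-positive target come from the text.

```python
import random

TARGET_FP_RATE = 0.015   # false-positive target quoted in the text
MAX_CODE = 63            # 6b fine offset word


def update_offset(code: int, ones: int, window: int,
                  target: float = TARGET_FP_RATE) -> int:
    """Move the offset word one step toward the target false-positive rate."""
    rate = ones / window
    if rate > target:
        return min(code + 1, MAX_CODE)   # too many false positives: raise threshold
    if rate < target:
        return max(code - 1, 0)          # margin to spare: lower threshold
    return code


def run_loop(iterations: int = 60, window: int = 1000,
             step_mv: float = 0.28, noise_rms_mv: float = 1.0,
             seed: int = 0) -> int:
    """Simulate the loop on a quiet channel (noise only, no RF signal)."""
    rng = random.Random(seed)
    code = 0
    for _ in range(iterations):
        threshold_mv = code * step_mv    # assumed uniform threshold steps
        ones = sum(rng.gauss(0.0, noise_rms_mv) > threshold_mv
                   for _ in range(window))
        code = update_offset(code, ones, window)
    return code


if __name__ == "__main__":
    print("settled offset word:", run_loop())
```

In this model the word climbs until the noise crosses the threshold about 1.5% of the time and then dithers around that point. An OOK interferer that raises the comparator's ones count would push the word higher in the same way.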

An 8b shift-register-based correlator with programmable error tolerance was implemented with subthreshold logic to minimize dc power (Fig. 28.6.4). Given the asymmetry of the incoming signal that is exclusively zeros until a wakeup signal is sent, the error correction tolerates differing levels of false positives and false negatives from the comparator and still issues a wakeup signal. This helps suppress system false wakeups, which otherwise increase at lower threshold offset words. The wakeup signal is two back-to-back 8b OOK codes separated by a half-clock-cycle delay to account for possible phase mismatch between the transmitter and receiver. The on-chip clock consists of a five-stage current-starved ring-oscillator architecture. External voltage biases  $V_{BN}$  and  $V_{BP}$  control the frequency of the clock source from 0.1kHz to 10kHz and draw << 0.1nW.
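A behavioral sketch of a shift-register correlator with separate error budgets for false positives and false negatives is shown below. The 8b code word, the tolerance values and the function names are hypothetical, chosen only to illustrate the asymmetric error tolerance described above. The back-to-back transmission with a half-clock-cycle delay is not modeled.

```python
from collections import deque
from typing import Iterable, List, Sequence


def matches(window: Iterable[int], code: Sequence[int],
            max_false_pos: int, max_false_neg: int) -> bool:
    """Compare a window against the code with separate error budgets."""
    pairs = list(zip(window, code))
    false_pos = sum(1 for w, c in pairs if w == 1 and c == 0)  # extra ones
    false_neg = sum(1 for w, c in pairs if w == 0 and c == 1)  # missed ones
    return false_pos <= max_false_pos and false_neg <= max_false_neg


def correlate(bits: Iterable[int], code: Sequence[int],
              max_false_pos: int = 0, max_false_neg: int = 1) -> List[int]:
    """Return the sample indices at which a wakeup would be issued."""
    # Shift register starts all-zero, mirroring a quiet channel.
    reg = deque([0] * len(code), maxlen=len(code))
    hits = []
    for i, b in enumerate(bits):
        reg.append(b)                     # shift in the newest comparator bit
        if matches(reg, code, max_false_pos, max_false_neg):
            hits.append(i)
    return hits


if __name__ == "__main__":
    code = [1, 1, 1, 0, 0, 1, 0, 1]       # hypothetical 8b wakeup code
    received = [1, 1, 0, 0, 0, 1, 0, 1]   # one transmitted "1" was missed
    stream = [0] * 5 + received + [0] * 5
    print(correlate(stream, code, max_false_pos=0, max_false_neg=1))
```

Because the channel is all zeros most of the time, a tight false-positive budget combined with a looser false-negative budget keeps noise-induced ones from triggering a wakeup while still tolerating a weak, partially missed code.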

Wakeup sensitivity of -76dBm, with  $10^{-3}$  probability of missed detection and a false-wakeup rate of <1/hr, is achieved using the full wakeup code including the correlator with no synchronization and a symbol bit rate of 0.2kb/s (Fig. 28.6.5). The offset word is fixed to ensure a constant voltage threshold throughout this measurement. Rejection of interferers is observed with -76dBm sensitivity for a  $10^{-3}$  missed-detection rate at a <30dB carrier-to-interferer ratio (CIR) for constant-envelope interference at a 3MHz offset from the signal. The rectified CW signals produce a dc offset at the output of the envelope detector and are thus blocked by the bandpass response of the baseband amplifier before even reaching the comparator. The CIR is limited by dc offsets at the output of the envelope detector that are large enough to drive the baseband amplifier into a different biasing region, where the small wakeup signal on top of the dc offset no longer has enough gain.

A more challenging form of interference to block comes from non-constant-envelope interferers with RF and IF frequencies similar to those of the desired wakeup signal, which can pass through both the RF input filter and the IF baseband amplifier. These types of interferers will be incident on the comparator and must be blocked either through offset control at the comparator or through digital error correction after the comparator. An example of the WuRX's robustness to such non-constant-envelope interferers with automatic offset control enabled is shown in Fig. 28.6.5. A successful -75dBm wakeup signal is observed in a quiet environment. After a -68dBm 0.1kb/s OOK signal 3MHz away briefly causes an elevated false-positive rate at the comparator, the offset control automatically raises the threshold. Rather than being unusable, the WuRX remains functional the entire time, though at a lower sensitivity, as the final -72dBm wakeup signal shows, with no false wakeups occurring even under such challenging interference conditions. The dc power of the system is 7.4nW and its performance is compared to state-of-the-art WuRXs in Fig. 28.6.6, showing a 7dB improvement over prior sub-10nW WuRXs. Figure 28.6.7 shows die and PCB photos of the WuRX.

### Acknowledgements:

The authors thank Troy Olsson of the Defense Advanced Research Projects Agency (DARPA) for support under Contract No. HR0011-15-C-0139.

### References:

- [1] H. Jiang, et al., "A 4.5nW Wake-Up Radio with -69dBm Sensitivity," ISSCC, pp. 416-417, Feb. 2017.
- [2] N. E. Roberts, et al., "A 236nW -56.5dBm-Sensitivity Bluetooth Low-Energy Wakeup Receiver with Energy Harvesting in 65nm CMOS," ISSCC, pp. 450-451, Feb. 2016.
- [3] K. R. Sadagopan, et al., "A 365nW -61.5 dBm Sensitivity, 1.875 cm<sup>2</sup> 2.4 GHz Wake-Up Receiver with Rectifier-Antenna Co-Design for Passive Gain," IEEE RFIC, pp. 180-183, June 2017.
- [4] P. Bassirian, et al., "Analysis of Quadratic Dickson Based Envelope Detectors for IoT Sensor Node Applications," IEEE IMS, pp. 215-218, June 2017.



Figure 28.6.1: Block diagram of the wakeup receiver showing waveforms from RF input through digital wakeup output.



Figure 28.6.2: Comparison between active and passive envelope detectors and measured performance of RF front-end at 151.8MHz and 433MHz.



Figure 28.6.3: Schematics of baseband amplifier and comparator and their simulated gain, noise and offset response.



Figure 28.6.4: Automatic-offset-control algorithm, measurements showing automated control over comparator false-positive rate and correlator schematic.



Figure 28.6.5: Measurements of receiver sensitivity, dc power consumption and automatic-offset response to non-constant envelope interferers.



Figure 28.6.6: Summary of performance and comparison with state-of-the-art.



Figure 28.6.7: Die micrograph and PCB photo of the wakeup receiver.

## 28.7 A 14.5mm<sup>2</sup> 8nW -59.7dBm-Sensitivity Ultrasonic Wake-Up Receiver for Power-, Area-, and Interference-Constrained Applications

Angad Singh Rekhi, Amin Arbabian

Stanford University, Stanford, CA

The next generation of the Internet of Things is envisioned to include unobtrusive, distributed mm-sized nodes capable of sensing and communicating information about their surroundings. Wake-up receivers (WuRXs) – ultra-low-power receivers that monitor their environment for a wake-up signature – are an important part of this vision, as they can extend the lifetime of a wireless node by keeping it asleep until interrogated. The state of the art in WuRXs, measured in terms of sensitivity and power, has recently been advanced by streamlining the signal path to include fewer power-hungry gain stages and instead obtaining the gain at the chip-antenna interface [1,2]. This has led to excellent power-sensitivity performance, but the accompanying increase in size hampers the applicability of these techniques in size-conscious applications, such as surveillance, asset tracking, and ubiquitous sensing. Moreover, size-reduction methods based on RF antenna miniaturization are fundamentally limited by high antenna Qs at low size-to-wavelength ratios [3]; efficiently matching to these antennas requires, in turn, high-Q passives that are generally unavailable at mm-scale.

To overcome this fundamental size-sensitivity tradeoff, this work proposes an ultrasonic (US) WuRX based on a precharged capacitive micromachined US transducer (CMUT), similar to those presented in [4]. Signature detection over US rather than RF allows this work to improve upon prior art by 7.6dB in terms of a combined sensitivity-power-area metric in the following ways: the CMUT-chip interface is naturally high-impedance (~200kΩ), resulting in higher voltage for a given incident power and leading to -59.7dBm sensitivity; the high-impedance, low-frequency environment obviates the need for active gain at RF, allowing for 8nW power consumption; and the small CMUT size (on the order of the US wavelength) and absence of off-chip matching allow the WuRX to be implemented in 14.5mm<sup>2</sup>, which is significantly smaller than the active area (chip + antenna/transducer) of prior RF- and US-based works [5,6]. Using US also makes the system insensitive to RF interference, while the narrowband (NB) CMUT and on-chip signature detector reject acoustic noise and interferers, as shown by measurements. Multiple, partially-overlapping signatures allow hierarchical classes of wake-up, leading to greater flexibility in a network tree protocol. At the system level, unlike RF, US has no EIRP limit in air, enabling greater range at the same intensity by increasing transmitter aperture (ultimately limited by intensity regulations at the WuRX).

Figure 28.7.1 shows the chip block diagram. The CMUT output connects directly to a ripple-cancelling envelope detector (ED), whose DC output is compared to a pseudoresistor ladder-based reference voltage by a low-noise comparator clocked by a relaxation oscillator ( $f_{clk} = 1.344\text{kHz}$ ). The comparator output is clocked into a variable-length detector that looks for one of multiple selectable signatures and outputs a 0.5V wake-up pulse upon signature recognition. The reference ladder is calibrated once, upon startup.

To limit power consumption and area, the chip uses neither closed-loop timing recovery nor a crystal. Meanwhile, the CMUT needs ~Q cycles to ring up, during which time the ED output is not necessarily valid. The accompanying timing issues are solved by oversampling the data. By conditioning on the bit arrival time relative to the last clock pulse, the missed-detection rate (MDR) can be derived as a function of timing uncertainty (drift and jitter), signature length, and sampling rate. As shown in Fig. 28.7.1, sampling the incoming data at 4x the bit rate allows almost full use of the CMUT bandwidth for an MDR of 10<sup>-3</sup>, at timing uncertainties realizable without a crystal. The design of the Schmitt trigger-based relaxation oscillator takes advantage of 4x oversampling by being optimized for low power rather than high accuracy; an 8b capacitor bank allows frequency trimming upon startup.

To further reduce power, the CMUT output is directly fed into a low-power ED, shown in Fig. 28.7.2. One of the challenges posed by operating at such a low carrier frequency (57.7kHz) is the high ED output ripple, which limits the WuRX sensitivity if not addressed. A ripple-cancelling topology is therefore implemented, in which the outputs of a common-source and common-gate stage are combined before being fed into the comparator. Because the linear gain ( $A_{fundamental}$ ) of these stages has opposite sign while the conversion gain ( $A_{conversion}$ ) has the same sign, the carrier amplitude at the output is reduced by ~44dB (simulated) compared to an ED based on a single common-source stage. As shown in Fig. 28.7.2, this improves the frequency-ripple tradeoff to the point that the ripple becomes smaller than the noise of this stage. Ratiometric biasing in the ED mitigates the effect of PMOS pseudoresistor variation on operating point; the one bias point not set ratiometrically is adjusted with one-bit control to result in the correct sign for  $A_{conversion}$  over process variations.
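One way to see why the carrier cancels while the detected signal adds is a second-order memoryless model of each stage. This is a simplified sketch rather than the authors' analysis: it assumes the two stages have matched gain magnitudes, ignores higher-order terms and frequency dependence, and takes the common-source stage as the inverting one. With an input $v_{in} = V\cos(\omega t)$:

$$v_{CS} = -A_{fundamental}\, v_{in} + A_{conversion}\, v_{in}^2, \qquad v_{CG} = +A_{fundamental}\, v_{in} + A_{conversion}\, v_{in}^2$$

$$v_{CS} + v_{CG} = 2A_{conversion}\, v_{in}^2 = A_{conversion} V^2 \left[ 1 + \cos(2\omega t) \right]$$

In this idealized model the fundamental-frequency term vanishes and the dc conversion term doubles relative to a single stage, while a component at twice the carrier frequency remains. Any gain mismatch between the stages leaves a residual fundamental, which is consistent with a finite (~44dB simulated) rather than complete reduction.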

The output of the ED is compared to a ladder-based reference by a self-timed comparator based on a double-tail latch, a topology chosen for its low headroom and wide input common-mode range (important due to process variation in ED output bias point). As shown in Fig. 28.7.3, to avoid an incomplete comparison, only once the outputs of the comparator are valid are its tail transistors turned off, and it is reset only at the next clock pulse. Because the comparator spends the majority of each clock cycle idling, leakage power is a potential concern; this is addressed by using thick-oxide tail devices, limiting leakage to 290pW.

The comparator output is clocked into a DFF-based detector that compares sequences of every fourth bit (in accord with 4x oversampling) to one of four hard-coded signatures. To enable hierarchical wake-up networks in which security and privacy settings are built-in at the chip level, these signatures are designed to partially overlap: each child signature contains at least one parent signature, so that child nodes cannot be awoken without also waking up parent nodes within range. Proof-of-concept wireless measurements demonstrating a parent-child wake-up scenario, as well as more detail about signature hierarchy, are shown in Fig. 28.7.4.
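The every-fourth-bit comparison and the parent-child overlap can be illustrated with a short behavioral model. The two signatures below are hypothetical (the text does not list the four hard-coded signatures), and the function names are placeholders. The model checks each signature against samples spaced one bit period apart.

```python
from typing import Dict, List, Sequence, Tuple

OVERSAMPLE = 4  # samples per bit, per the 4x oversampling described in the text

# Hypothetical signatures: the child contains the parent as a prefix.
SIGNATURES: Dict[str, List[int]] = {
    "parent": [1, 0, 1, 1],
    "child": [1, 0, 1, 1, 0, 1],
}


def expand(bits: Sequence[int]) -> List[int]:
    """Model ideal 4x oversampling: each bit appears as four equal samples."""
    return [b for b in bits for _ in range(OVERSAMPLE)]


def detect(samples: Sequence[int],
           signatures: Dict[str, List[int]]) -> List[Tuple[int, str]]:
    """Return (sample_index, name) wherever a signature matches at stride 4."""
    hits = []
    for i in range(len(samples)):
        for name, sig in signatures.items():
            span = (len(sig) - 1) * OVERSAMPLE
            if i < span:
                continue  # not enough history in the register yet
            picked = [samples[i - span + k * OVERSAMPLE]
                      for k in range(len(sig))]
            if picked == sig:
                hits.append((i, name))
    return hits


if __name__ == "__main__":
    for tx in ("parent", "child"):
        woken = {name for _, name in detect(expand(SIGNATURES[tx]), SIGNATURES)}
        print(f"transmit {tx}: wakes {sorted(woken)}")
```

With these example signatures, transmitting the child sequence also matches the parent, whereas transmitting the parent alone does not match the child. This mirrors the hierarchy described above. The model reports a hit at every matching sampling phase; how the chip consolidates these into a single wake-up pulse is not modeled here.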

The proposed WuRX chip was fabricated in TSMC 65nm CMOS GP technology, runs on 0.5V, and measures 1mm×1.5mm. Figure 28.7.5 shows the wireless test setup and results. For each test, incident pressure was determined by combining the CMUT voltage with sensitivity data previously collected using a calibrated microphone (GRAS 40DP); incident power was then calculated by combining pressure with CMUT area. Synchronized BER tests establish operation at nominal sensitivity; unsynchronized MDR tests show sensitivity within ±0.5dB of BER results, indicating that 4x oversampling effectively mitigates detection errors due to timing uncertainty. Bandwidth, carrier-to-interferer (CIR), and carrier-to-noise (CNR) measurements indicate robustness of the US WuRX to close CW interferers and NB noise; slight increases in BER over a short CIR range for some values of  $\Delta f$  likely stem from the probabilistic nature of BER measurement at the chosen data sequence length. Long-range test results show wireless functionality in an outdoor setting with ambient noise and a strong 25kHz interferer. Preliminary variability measurements for 20 chips from a single lot, taken electrically at a lower data rate (62.5b/s) and slightly higher  $V_{DD}$  (0.6V) than nominal, indicate consistent performance (sensitivity:  $\mu = -58.4\text{dBm}$ ,  $\sigma = 1.6\text{dB}$ ; power:  $\mu = 7.3\text{nW}$ ,  $\sigma = 0.9\text{nW}$ ).

Figure 28.7.6 shows that the proposed WuRX improves upon the state of the art by 7.6dB, has the smallest active area among works that include an antenna/transducer, and is supported by wireless tests and interferer measurements. This work thus demonstrates that using an ultrasonic wake-up medium enables miniaturized, long-range, low-power WuRXs that are competitive with RF-based solutions. Photos of the chip die and precharged CMUT are shown in Fig. 28.7.7.

### Acknowledgments:

The authors thank Prof. B. T. Khuri-Yakub and M.-C. Ho for fabrication and provision of precharged CMUTs, and Mentor Graphics for the use of the Analog FastSPICE (AFS) Platform. This research was conducted with Government support under and awarded by DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a.

### References:

- [1] H. Jiang, et al., "A 4.5nW Wake-Up Radio with -69dBm Sensitivity," *ISSCC*, pp. 416-417, Feb. 2017.
- [2] K. R. Sadagopan, et al., "A 365nW -61.5 dBm Sensitivity, 1.875 cm<sup>2</sup> 2.4 GHz Wake-Up Receiver with Rectifier-Antenna Co-Design for Passive Gain," *IEEE RFIC*, pp. 180-183, 2017.
- [3] H. A. Wheeler, "Fundamental Limitations of Small Antennas," *Proc. IRE*, vol. 35, no. 12, pp. 1479-1484, Dec. 1947.
- [4] M. C. Ho, et al., "Long-Term Measurement Results of Pre-Charged CMUTs with Zero External Bias Operation," *IEEE Int. Ultrason. Symp.*, pp. 89-92, 2012.
- [5] K. Yadav, et al., "A 4.4-μW Wake-Up Receiver Using Ultrasound Data," *IEEE JSSC*, vol. 48, no. 3, pp. 649-660, March 2013.
- [6] H. Fuketa, et al., "A 0.3-V 1-μW Super-Regenerative Ultrasound Wake-Up Receiver With Power Scalability," *IEEE TCAS-II*, vol. 64, pp. 1027-1031, Sept. 2017.



Figure 28.7.1: US WuRX block diagram; precharged CMUT electrical properties; choice of data and sampling rates.



Figure 28.7.2: Hybrid CS-CG ripple-cancelling ED: schematic, ripple reduction, performance under process variation.



Figure 28.7.3: Self-timed comparator is off for most of clock period; thick-oxide tail devices minimize leakage.



Figure 28.7.4: Signature detector looks for one of many hierarchical signatures, as wireless tests demonstrate.



Figure 28.7.5: Wireless tests show US WuRX functionality across signatures and with noise and interferers present.



Figure 28.7.6: Comparison plot and table; latter includes recent works pushing sensitivity, power, and often area.



Figure 28.7.7: Micrograph of proposed chip sitting on a precharged CMUT; simulated chip power-draw breakdown.