

# A Recursive Switched-Capacitor House-of-Cards Power Amplifier

Loai G. Salem, *Student Member, IEEE*, James F. Buckwalter, *Senior Member, IEEE*,  
and Patrick P. Mercier, *Member, IEEE*

**Abstract**—A fully integrated CMOS power amplifier (PA) that efficiently generates high-voltage RF waveforms directly from a 4.8-V supply using only low-voltage thin-oxide transistors is introduced. High-voltage operation is achieved via implicit  $\sim 100\%$  efficient dc–dc conversion enabled by stacking and flying individual class-D PA cells in a House-of-Cards (HoC) topology. By dynamically reconfiguring digitally controlled HoC slices to support different voltage conversion ratios and capacitively coupling their outputs, a Doherty-like high-efficiency backoff profile is achieved without using any magnetic impedance inverter. A test chip, implemented in 65-nm low-power CMOS, operates directly from 4.8 V using only 1.2-V transistors, and attains above 40% battery-to-RF efficiency at both 23-dBm peak power and at 6-dB backoff at 720 MHz. When applying a 10-MHz 16-quadrature amplitude modulation signal, the PA achieves an error vector magnitude of 3.6%-rms without using any predistortion techniques with an average output power and power-added efficiency of 15.7 dBm and 26.5%.

**Index Terms**—Digital power amplifier (PA), Doherty, envelope elimination and restoration, PA, switched capacitor (SC).

## I. INTRODUCTION

DESIGN of battery-connected power amplifiers (PAs) that simultaneously achieve high output power, efficiency, and linearity in scaled CMOS is challenging, in part due to the low ( $\sim 1$  V) breakdown voltage of thin-oxide transistors. Since most modern mobile systems use lithium-ion batteries with voltages on the order of  $\sim 4$  V, the battery voltage is typically stepped down by a dc–dc converter to safely drive scaled CMOS transistors. However, delivering  $>20$  dBm of output power to a  $50\Omega$  antenna requires a  $>5$  V peak-to-peak swing. Consequently, after the thin-oxide transistors amplify the RF waveform from the stepped-down supply, the low-voltage RF waveform must be transformed back up via an impedance transformation network to drive  $50\Omega$  with sufficient power. Converting voltages down and then back up leads to cascaded losses that, in practice, limit the achievable battery-to-RF efficiency of CMOS PAs to  $<30\%$  [1]. While techniques like transistor stacking can help improve the voltage blocking capability of active PA elements [2]–[4], such techniques are often employed in conjunction with linear biasing strategies (e.g., class-B) that have inherently poor efficiency.

Manuscript received December 12, 2016; revised March 6, 2017 and April 23, 2017; accepted April 28, 2017. Date of publication May 10, 2017; date of current version June 22, 2017. This paper was approved by Guest Editor Robert Bogdan Staszewski. This work was supported by DARPA under Grant D15AP00091. (*Corresponding author: Loai G. Salem*)

L. G. Salem and P. P. Mercier are with the Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA 92093 USA (e-mail: lgsalem@ucsd.edu; pmmercier@ucsd.edu).

J. F. Buckwalter is with the Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA 93106 USA (e-mail: buckwalter@ece.ucsb.edu).

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2703781

On the other hand, switched-mode PAs, such as class-D amplifiers, can theoretically achieve high efficiency and, importantly, benefit from CMOS scaling. Since switched-mode PAs in isolation do not support the amplitude modulation required by modern spectrally efficient communication standards, they are typically combined with techniques that impart amplitude modulation, such as envelope elimination and restoration (EER) [5]–[7], outphasing [8], or pulsewidth modulation [9]. Alternative switching approaches, such as direct RF digital-to-analog converters (DACs) [10]–[14], power combining [15], [16], and switched-capacitor (SC) PAs [17]–[20], can be used, in some cases together with the aforementioned amplitude modulation techniques, to improve efficiency or linearity. However, while switched-mode PAs benefit from transistor scaling, the cascaded losses of the dc–dc converter, PA, and impedance transformation network are exacerbated as supply voltages scale downward, making the design of efficient, linear, battery-connected CMOS PAs increasingly challenging.

This paper presents the design of a digital PA that, by stacking and flying unit class-D PA cells in an SC House-of-Cards (HoC) topology, enables efficient high-output-power generation directly from a 4.8-V supply while using only thin-oxide CMOS transistors. Since the PA is modular and recursively reconfigurable among several battery-to-RF voltage ratios, high battery-to-RF efficiency is achieved at both peak and 6-dB backoff power via a voltage-mode Doherty-like capacitive power combining technique.

Initial circuit schematics and measurement results were presented in [21]. In this paper, the proposed HoC topology is introduced and contrasted to prior-art PA approaches in Section II, while Sections III and IV provide the implementation details. Section V presents the detailed measurement results, and Section VI concludes this paper.

## II. SWITCHED-CAPACITOR HOUSE-OF-CARDS POWER AMPLIFIER

As described in Section I, designing PAs in scaled CMOS processes that provide high output power with high efficiency poses multiple challenges related to low transistor breakdown voltages. In this section, we discuss the design evolution of the HoC PA topology to realize a high output-voltage swing while operating directly from a large supply voltage using scaled CMOS devices. Specifically, a charge-recycling scheme is introduced that realizes implicit high-efficiency dc-dc conversion directly from a high-voltage input power source. Then, we show how ladders of stacked PA cells can be connected in cascade to provide efficient series power combining without any magnetics. Although described for CMOS processes, the proposed architecture is also suitable for other technologies, such as bipolar junction transistors, metal-semiconductor field-effect transistors, and high-electron-mobility transistors.

Fig. 1. (a) Conventional class-G operation. An extra dc-dc converter is required to supply the 6-dB backoff. (b) Implicit 100%-efficient dc-dc conversion at 6-dB backoff via the proposed charge-recycling PA stacking.

### A. Implicit DC-DC Conversion via Stacked-Amplifier Charge Recycling

Nonconstant-envelope modulation schemes [e.g., quadrature amplitude modulation (QAM) and orthogonal frequency-division multiplexing (OFDM)] are required to better utilize the available bandwidth in modern communication standards. While constellation-point rearrangement has been shown to help reduce the peak-to-average power ratio (PAPR) of present modulation schemes [22], [23], such high-PAPR signals still require a PA with high efficiency across a wide dynamic power range. Class-G supply modulation has been demonstrated to achieve high efficiency at backoff by operating a nonlinear PA from multiple supply voltage levels, typically  $V_{in}$  and  $V_{in}/2$ , selected according to the input envelope signal in an EER scheme [7]. For example, as shown in Fig. 1(a), a second peak in the overall PA efficiency at 6-dB backoff is realized by operating the PA from a second supply,  $V_{in}/2$ , when the input signal amplitude (AM) drops below a predetermined threshold. The supply modulator can be implemented using a linear voltage regulator [6], [24], [25], or as a hybrid design that includes a linear regulator in parallel with a switching supply modulator [26], [27]. In either case, the efficiency of the employed dc-dc converter at 6-dB backoff is critical to the overall efficiency of the system. Furthermore, achieving high dc-dc efficiency typically requires off-chip or large on-chip inductors. For instance, the dc-dc converter in [28] requires two external inductors (4.7  $\mu$ H and 22 nH) and two external capacitors (0.47  $\mu$ F and 6.8 nF), and occupies  $2.52 \times 2.52$  mm<sup>2</sup> of chip area to realize 86.2% dc-dc conversion efficiency at 26.3-dBm output power.

As an alternative approach, implicit high-efficiency dc-dc downconversion at 6-dB backoff can be realized without an external dc-dc converter by stacking two half-sized class-D PA cells, PA1 and PA2, each with half of the total PA conductance, on top of each other, coupling their outputs through a flying capacitor,  $C_{fly}$ , and operating the stack from  $V_{in} = 2V_{DD}$ , as shown in Fig. 1(b). Each PA in the stack delivers half of the total output power,  $P_{out} = (2/\pi^2)V_{DD}^2/R_L$  [29], to the load  $R_L$ . The charge dumped by the top domain during half a period,  $q = \int_0^{T/2} (I_o/2) \sin(2\pi t/T)\, dt = TI_o/(2\pi)$ , where  $T$  is the RF carrier period and  $2q$  is the total output charge delivered during that half period, matches the charge absorbed by the bottom domain, so the intermediate node  $V_{int}$  is automatically balanced to  $V_{DD}$ . In a practical implementation, a small  $C_{fly}$  aligns the switching phases of PA1 and PA2 and, by reusing the PA1 and PA2 switches, establishes a 2:1 SC dc-dc converter that actively regulates  $V_{int}$ . Unlike the class-G dc-dc converter, which has to provide the total PA output power, the established 2:1 SC converter sources or sinks only a small delta current arising from the minimal charge imbalance between the stacked domains, PA1 and PA2.
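As a quick numerical check of the charge-balance argument, the sketch below integrates the half-sine current dumped by the top domain and compares it with $q = TI_o/(2\pi)$, then evaluates $P_{out} = (2/\pi^2)V_{DD}^2/R_L$. It is a minimal sketch under stated assumptions: ideal sinusoidal currents and a hypothetical operating point (720-MHz carrier, $V_{DD} = 1.2$ V, $R_L = 10\ \Omega$) in which $I_o$ is taken as the fundamental current of a 0-to- $V_{DD}$ square wave into $R_L$. The function names are illustrative only.

```python
import math

def half_cycle_charge(i_o, f_o, n=20_000):
    """Midpoint-rule integral of (i_o/2)*sin(2*pi*t/T) over one half period."""
    period = 1.0 / f_o
    dt = (period / 2) / n
    return sum((i_o / 2) * math.sin(2 * math.pi * (k + 0.5) * dt / period) * dt
               for k in range(n))

def closed_form_charge(i_o, f_o):
    """q = T * I_o / (2*pi): charge dumped by the top domain per half cycle."""
    return (1.0 / f_o) * i_o / (2 * math.pi)

def two_stack_output_power(v_dd, r_l):
    """Fundamental power of a 0..V_DD square wave into R_L: 2/pi^2 * V_DD^2 / R_L."""
    v_fund = (2 / math.pi) * v_dd      # fundamental amplitude of the square wave
    return v_fund ** 2 / (2 * r_l)

# Hypothetical operating point (not the measured chip)
f_o, v_dd, r_l = 720e6, 1.2, 10.0
i_o = (2 / math.pi) * v_dd / r_l       # fundamental output current amplitude

q_num = half_cycle_charge(i_o, f_o)
q_ref = closed_form_charge(i_o, f_o)
p_total = two_stack_output_power(v_dd, r_l)
print(f"q (numeric) = {q_num * 1e12:.3f} pC, q (closed form) = {q_ref * 1e12:.3f} pC")
print(f"P_out = {p_total * 1e3:.1f} mW total, {p_total / 2 * 1e3:.1f} mW per stacked PA")
```

The numeric and closed-form charges agree to within the integration error, which is the matching of top- and bottom-domain charge that keeps $V_{int}$ at $V_{DD}$.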

Fig. 2 shows the switch-level block diagram and operation of the example two-stack PA. The switches are controlled by the PM clock. Fig. 2(b) shows the resulting networks when the PM clock is high ( $\phi_1$ ) and low ( $\phi_2$ ). During  $\phi_1$ , the odd-numbered switches are turned on, connecting the flying capacitor,  $C_{fly}$ , between the midlevel voltage,  $V_{int}$ , and ground. Consequently,  $C_{fly}$  and  $C_1$  are connected in parallel, and charge sharing balances the voltage across  $C_1$  to  $V_{in}/2$  at steady state. Also during  $\phi_1$ ,  $R_L$  is ac-coupled to  $V_{int}$  and GND through switches  $s_3$  and  $s_1$  in parallel, while  $C_{fly}$  holds a dc voltage of approximately  $V_{DD}$ . From Fig. 2(b), during  $\phi_1$ , the top PA2 charges the intermediate node  $V_{int}$  with a half sinusoid of amplitude  $I_o/2$ . Therefore,  $V_{int}$  rises by  $\Delta V \approx TI_o/\left(2\pi(C_1 + C_2 + C_{fly})\right)$ .



Fig. 2. Example two-stack PA operation from  $V_{in} = 2 V_{DD}$ . (a) Switch-level block diagram. (b) Resulting two switched networks when the PM clock is high and low. (c) Differential operation for elimination of  $V_{int}$  capacitance.

In  $\phi_2$ , the even-numbered switches are on, connecting  $C_{fly}$  in parallel with  $C_2$  to balance the voltage across  $C_2$  to  $V_{in}/2$ . At the same time, the ac-coupled  $R_L$  is pulled up to  $V_{in}$  and  $V_{int}$  through switches  $s_4$  and  $s_2$ . During  $\phi_2$ , the charge  $q = TI_o/(2\pi)$  stored on  $C_1$ ,  $C_2$ , and  $C_{fly}$  during the prior phase is released back to supply PA1, and hence,  $V_{int}$  droops by  $\Delta V$ .

Alternating between  $\phi_1$  and  $\phi_2$ , together with the boundary condition that the voltages across  $C_1$ ,  $C_2$ , and  $C_{fly}$  are continuous at each phase transition, forces all capacitor voltages and  $V_{int}$  to  $V_{in}/2$  at steady state through the imposed Kirchhoff's voltage law (KVL) equations, irrespective of the initial voltage levels [30]. The proposed topology thereby uses the same switches to deliver power simultaneously at dc and at the RF carrier frequency  $f_o$ .
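The convergence of $V_{int}$ to $V_{in}/2$ from arbitrary initial conditions can be illustrated with an idealized charge-conservation model. The sketch below is a simplification under stated assumptions, not a circuit simulation: switches are ideal, the PA load currents are omitted (their charge cancels over a cycle at steady state), and each phase is solved by conserving charge on node $V_{int}$. The function `simulate_two_stack` is hypothetical.

```python
def simulate_two_stack(v_in, c1, c2, c_fly, v_int0, v_fly0, cycles=50):
    """Idealized charge-sharing model of the 2:1 SC ladder formed by C1, C2, C_fly.

    C1 sits between V_int and GND, C2 between V_in and V_int, so their voltages
    always sum to V_in. In phi1, C_fly is placed across C1; in phi2, across C2.
    v_fly is the voltage across C_fly ("+" plate, tied to V_in in phi2, minus "-").
    """
    c_tot = c1 + c2 + c_fly
    v_int, v_fly = v_int0, v_fly0
    for _ in range(cycles):
        # phi1: C_fly "+" plate joins node V_int, "-" plate goes to GND
        v_int = ((c1 + c2) * v_int + c_fly * v_fly) / c_tot
        v_fly = v_int
        # phi2: C_fly "+" plate goes to V_in, "-" plate joins node V_int
        v_int = ((c1 + c2) * v_int + c_fly * (v_in - v_fly)) / c_tot
        v_fly = v_in - v_int
    return v_int, v_fly

v_in = 2.4  # 2 * V_DD with V_DD = 1.2 V
for start_int, start_fly in [(0.0, 0.0), (2.4, 0.0), (0.5, 2.0)]:
    v_int, v_fly = simulate_two_stack(v_in, 1.0, 1.0, 1.0, start_int, start_fly)
    print(f"start V_int={start_int:.1f} V -> V_int={v_int:.6f} V, V_Cfly={v_fly:.6f} V")

# Without C_fly, V_int is never corrected (the underconstrained case)
print(simulate_two_stack(v_in, 1.0, 1.0, 0.0, 0.3, 0.0))
```

In this model, the error in $V_{int}$ shrinks by a factor of $(C_1 + C_2 - C_{fly})/(C_1 + C_2 + C_{fly})$ per phase (one third for equal capacitors), and setting `c_fly=0` leaves $V_{int}$ at its initial value.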

The size of  $C_1$ ,  $C_2$ , and  $C_{fly}$  determines the voltage ripple,  $\Delta V$ , on  $V_{int}$ . For 10% ripple (i.e.,  $\Delta V = 0.1V_{DD}$ ),  $C_1$ ,  $C_2$ , and  $C_{fly}$  should be assigned equal sizes, each one third of a total on-chip capacitance of  $10TI_o/(2\pi V_{DD})$ . To reduce the required capacitance, an ac virtual ground is created at  $V_{int}$  in Fig. 2(c) by tying together the  $V_{int}$  nodes of two two-stack PAs and driving them in opposite phases. Through the resulting differential operation, the current dumped into  $V_{int}$  by PA2 of one stack cancels the current drawn by PA1 of the complementary stack during  $\phi_1$ , and vice versa in  $\phi_2$ ; hence, the total capacitance required for dc balancing is nearly zero. In practice,  $C_1$  and  $C_2$  must still be large enough to supply the gate-drive charge during the brief nonoverlap time between  $\phi_1$  and  $\phi_2$ , e.g.,  $C_1 = C_2 \approx C_G$ , where  $C_G$  is the total gate capacitance of PA1 or PA2. This decoupling capacitance is typically implemented using thin-oxide gate capacitance. Unlike the power switch, which is typically implemented using multiple parallel fingers with large area overhead for the drain and source regions, the MOS capacitor can be implemented as a single transistor finger of nearly equal width and length and, therefore, more densely. The parasitic top- and bottom-plate capacitances of the decoupling capacitors sit at fixed voltages relative to ground and, therefore, do not cause parasitic switching losses. On the other hand,  $C_{fly}$  should be sized such that  $1/(\omega_o C_{fly}) < 2R_{on}$ , where  $R_{on}$  is the total equivalent output resistance  $R_{out}$  of the PA, for phase-aligned ac operation. It is important to note that the KVL equations are underconstrained when the two PAs are stacked without  $C_{fly}$ . In other words, there are too few links in the directed graph of the switched network in Fig. 2 to provide a unique solution for  $V_{int}$ . To establish a properly posed switched topology,  $C_{fly}$  is employed in Fig. 2 to enforce a unique solution for  $V_{int}$ .
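The single-ended sizing rule can be turned into numbers. The sketch below, assuming a hypothetical output-current amplitude $I_o = 0.2$ A at $f_o = 720$ MHz and $V_{DD} = 1.2$ V, computes the total $C_1 + C_2 + C_{fly}$ for 10% ripple; the result lands in the hundreds of picofarads, which illustrates why the differential virtual-ground arrangement is attractive. The helper names are illustrative.

```python
import math

def ripple_single_ended(i_o, f_o, c_total):
    """Delta V = T*I_o / (2*pi*(C1 + C2 + C_fly)) on V_int of one two-stack PA."""
    return (1.0 / f_o) * i_o / (2 * math.pi * c_total)

def c_total_for_ripple(i_o, f_o, v_dd, ripple_frac=0.1):
    """Total C1 + C2 + C_fly that keeps Delta V at ripple_frac * V_DD."""
    return (1.0 / f_o) * i_o / (2 * math.pi * ripple_frac * v_dd)

# Hypothetical values: 720-MHz carrier, I_o = 0.2 A, V_DD = 1.2 V
f_o, i_o, v_dd = 720e6, 0.2, 1.2
c_tot = c_total_for_ripple(i_o, f_o, v_dd)   # 10*T*I_o/(2*pi*V_DD) for 10 %
c_each = c_tot / 3                            # C1 = C2 = C_fly
print(f"C_total = {c_tot * 1e12:.0f} pF, i.e., {c_each * 1e12:.0f} pF each")
print(f"check: Delta V = {ripple_single_ended(i_o, f_o, c_tot) / v_dd:.2%} of V_DD")
```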

The two-stack differential PA topology provides multiple advantages for scaled CMOS technologies compared with the representative class-G system of Fig. 1(a) when operating at 6-dB backoff. First, the proposed differential topology provides the required supply,  $V_{int} = V_{DD}$ , for 6-dB backoff without any extra dc–dc converter. The stacked topology also allows the PA cells to be powered from a 2- $V_{DD}$  input without violating the breakdown voltage of the thin-oxide switches.<sup>1</sup> In addition, the stacked PA does not suffer from the cascaded losses at 6-dB backoff caused by placing a dc–dc converter in series with a PA, as in conventional class-G PA approaches (e.g., the efficiency in Fig. 1(a) is  $\eta = \eta_{dc-dc}\eta_{dc-ac}$  at 6-dB backoff). Instead, the efficiency of the two-stack PA becomes  $\eta_{dc-ac} = (1 + R_{on}/R_L)^{-1}$ , which approaches 100% when  $R_{on} \ll R_L$ . Second, the implicit high-efficiency switching dc–dc conversion implemented by stacking the two PA slices does not produce spurious output tones, because the inherent 2:1 SC converter switches at the carrier frequency  $f_o$  itself. In contrast, most PAs operated from explicit dc–dc converters produce spurs at the fundamental switching frequency of the dc–dc converter and its harmonics. Passive filtering and postregulation through cascaded high power-supply-rejection-ratio linear regulators are then required to prevent the switching products from reaching the PA output, increasing the cost and reducing the efficiency of the overall class-G solution. Finally, by stacking two PA cells, the current drawn from the supply and ground grids is reduced by a factor of 2 compared with operating the PA cells in parallel. This halves the size of the off-chip supply decoupling tree. More importantly, the lower input current  $I_{\text{in}}$  drawn by the stacked PA in Fig. 1(b) results in four times lower  $I_{\text{in}}^2 R$  loss in the PA  $V_{\text{in}}$  power supply, as will be discussed shortly, where  $R$  originates from the power transistors, filter elements, and interconnections in the  $V_{\text{in}}$  dc-dc regulator.

<sup>1</sup>A startup circuit is required to ensure that the employed devices are not overstressed during power-on. Such protection is typically implemented in high-voltage dc–dc converters via inrush-current-limiting circuitry.

Fig. 3. Implicit dc-dc conversion via charge recycling. The same total dc capacitance, i.e.,  $C_1 + C_2$  in Fig. 2, can be equally divided among the  $N - 1$  intermediate nodes for the same ripple amplitude as the two-stack PA in the case of nondifferential operation; the same applies to  $C_{\text{fly}}$ .

The proposed implicit charge-recycling dc-dc conversion can be generalized to realize a  $(2/\pi) V_{\text{DD}}$  output voltage amplitude from  $V_{\text{in}} = NV_{\text{DD}}$  using  $V_{\text{DD}}$ -rated thin-oxide devices. Instead of stepping the input battery voltage  $V_{\text{in}}$  down by  $N:1$  through a lossy and bulky dc-dc converter, the proposed approach, shown in Fig. 3, slices a nominal PA with conductance  $G_{\text{on}}$ , sized for a given output current drive capability  $i_o$ , into  $N$  PA cells, each with conductance  $G_{\text{on}}/N$ . The approach then *stacks* the  $N$  PA cells, each operating at the nominal process voltage  $V_{\text{DD}}$ , while the entire stack is powered from  $NV_{\text{DD}}$ , such that the charge *discarded* by the topmost PA cell trickles down through the  $N$ -PA stack and is recycled at each level, achieving  $\sim 100\%$  dc-dc efficiency. A ladder of  $N - 1$  flying capacitors enforces phase alignment among the  $N$  stacked PA cells and establishes a properly posed topology. In fact, this ladder, together with the switches of the PA cells, forms an  $N:1$  SC ladder dc-dc converter that actively regulates the interdomain nodes, as shown in Fig. 3. Interestingly, the same total dc capacitance, i.e.,  $C_1 + C_2$  in Fig. 2, can be equally divided among the  $N - 1$  intermediate nodes in Fig. 3 to obtain the same voltage ripple amplitude as the two-stack PA in the case of nondifferential operation. Similarly,  $C_{\text{fly}}$  in Fig. 2 is equally divided among the  $N - 1$  flying capacitors in Fig. 3 to realize the same relative alignment, given that the conductance of each PA cell is reduced by a factor of  $N$ .
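The scaling rules of the generalized stack can be collected in a small helper. This sketch encodes only what the paragraph states (cell conductance $G_{\text{on}}/N$, supply $NV_{\text{DD}}$, and the two-stack dc and flying capacitance budgets split equally over $N - 1$ elements for nondifferential operation); `plan_stack`, `StackPlan`, and the example budget are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StackPlan:
    n: int              # number of stacked PA cells
    v_in: float         # supply of the whole stack, N * V_DD
    v_domain: float     # voltage across each stacked cell, V_in / N
    g_cell: float       # conductance of each cell, G_on / N
    c_dc_node: float    # dc capacitance per intermediate node
    c_fly_each: float   # capacitance of each ladder flying capacitor

def plan_stack(n, v_dd, g_on, c_dc_total, c_fly_total):
    """Split a nominal PA and the two-stack capacitor budget across an N-cell stack."""
    if n < 2:
        raise ValueError("stacking needs at least two cells")
    v_in = n * v_dd
    return StackPlan(
        n=n,
        v_in=v_in,
        v_domain=v_in / n,
        g_cell=g_on / n,
        c_dc_node=c_dc_total / (n - 1),    # same total dc C over N - 1 nodes
        c_fly_each=c_fly_total / (n - 1),  # same total C_fly over N - 1 capacitors
    )

# Hypothetical budget: G_on = 10 S, C1 + C2 = 200 pF, C_fly = 20 pF
plan = plan_stack(4, 1.2, 10.0, 200e-12, 20e-12)
print(plan)
```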

The number of stacked amplifiers  $N$  can be reduced to maintain the target  $P_{\text{out}}$  level as the battery voltage drops over time. On the other hand, if an external voltage regulator is employed to provide a fixed PA  $V_{\text{in}}$ , the proposed  $N$ -stacked-PA approach enables a significant boost in regulator dc-dc efficiency. Note that the higher the dc-dc conversion ratio from the input battery voltage, the larger the inherent dc-dc losses [32]. Furthermore, it is preferable to deliver the required PA input dc power at a higher voltage,  $V_{\text{in}} = N \times V_{\text{DD}}$ , since the resulting  $N$ -times-lower currents reduce conduction loss and raise the dc-dc regulator efficiency. Fig. 4(a) shows the simulated dc-dc efficiency of a buck-boost converter, powered from a 4.8-V battery and optimized to supply a peak  $P_{\text{out}} = 23$  dBm at 1.2-, 2.4-, and 3.6-V supply levels to a 5-bit digital PA implemented by stacking one to three PA cells, respectively. As shown, the dc-dc converter achieves 9.1% and 15.4% higher efficiency by delivering the same required  $P_{\text{out}}$  at 3.6 V ( $I_{\text{in}}$ ) instead of 2.4 V ( $1.5I_{\text{in}}$ ) and 1.2 V ( $3I_{\text{in}}$ ), respectively. Therefore, a significant boost in the overall PA efficiency can be realized by reducing the often *forgotten* dc-dc conversion loss through PA stacking. Fig. 4(b) shows the resulting overall PA efficiency,  $\eta_{\text{dc-dc}} \times \eta_{\text{drain}}$ , when a 5-bit SC PA is implemented by stacking one to three PA cells while powering the entire stack at 1.2, 2.4, and 3.6 V, respectively, from the dc-dc converter in Fig. 4(a). The overall efficiency is enhanced by  $\sim 6.8\%$  and  $\sim 11.9\%$  across  $P_{\text{out}}$  levels down to  $-9$  dB. This results in an improvement of more than 5.8% in the average efficiency for an output modulated signal with 9-dB PAPR.
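The current argument can be reproduced with simple arithmetic. Assuming, for illustration only, 100% PA efficiency at 23 dBm and a hypothetical lumped series resistance in the supply path, the sketch below compares the input current and $I^2R$ loss for stacks of one to three 1.2-V cells. It models only conduction loss, not the full buck-boost converter of Fig. 4(a).

```python
def supply_current_and_loss(p_dc, v_dd, n_stack, r_series):
    """Input current and I^2*R loss when the stack is fed at V_in = N * V_DD."""
    v_in = n_stack * v_dd
    i_in = p_dc / v_in
    return v_in, i_in, i_in ** 2 * r_series

p_dc = 10 ** ((23 - 30) / 10)   # 23 dBm in watts; assumes 100 % drain efficiency
r_series = 0.1                  # hypothetical parasitic resistance in ohms
results = {n: supply_current_and_loss(p_dc, 1.2, n, r_series) for n in (1, 2, 3)}
_, i_ref, loss_ref = results[3]
for n, (v_in, i_in, loss) in results.items():
    print(f"N={n}: V_in={v_in:.1f} V, I_in={i_in / i_ref:.2f}x, "
          f"I^2R loss={loss / loss_ref:.2f}x (relative to N=3)")
```

Relative to the three-stack case, the one- and two-stack supplies draw 3 and 1.5 times the current and incur 9 and 2.25 times the $I^2R$ loss.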

### B. High-Voltage RF Signal Generation in Scaled CMOS Without Magnetics

The low breakdown voltages of deep-submicrometer devices limit the allowed output voltage swing and, hence, the achievable peak output power into a fixed load. To supply the high output power levels required by modern communication standards with thin-oxide devices, contemporary PA design schemes resort to impedance transformation networks [33]–[36], power combining [1], [15], [16], [37], device stacking [2], [4], [38], [39], or a combination of these. Fig. 5 shows representative power combining and device stacking schemes. The parallel power combining scheme shown in Fig. 5(a) (left) achieves an output voltage amplitude of  $(4/\pi) V_{\text{DD}}$  from  $V_{\text{in}} = 2 V_{\text{DD}}$  by placing two PA cells in parallel, powering them from a dc-dc converter that outputs  $\sim V_{\text{DD}}$ , and having each amplifier drive a 1:1 transformer whose secondary winding is connected in series with the secondary of the other PA, achieving a 1:2 impedance transformation. Unfortunately, on-chip transformers consume significant silicon area and suffer from high losses due to the low-resistivity substrate and thin metal and dielectric layers in baseline CMOS [16], [36], [40].

An alternative means to generate high output power using thin-oxide devices is to perform series power combining by transistor stacking. Essentially, such schemes dc-connect transistors in series while engineering the RF swing at the gate of each transistor stacked above the input common-source device to ensure that all the transistors operate in the safe region (i.e., without exceeding  $V_{GS}$  or  $V_{DS}/V_{DG}$  ratings) while producing a high output amplitude [Fig. 5(a) (right)], as in stacked-FET PAs [2]–[4] and high-voltage/high-power PAs [38], [41]. Unfortunately, such schemes typically require an array of input and output matching networks (often harmonically tuned) that occupy large area, degrade the overall PA efficiency, and generally limit the operational bandwidth. More importantly, such schemes require postfabrication trimming of the gate networks to ensure properly in-phase voltage swings at the gate and drain to avoid device breakdown. While differential topologies can create a virtual ac ground at the gate bias points, most prototypes demonstrated in prior art assume up to  $N$  external ideal bias sources supplied by lab equipment, whereas generating the required low-impedance, high-fidelity bias voltages on chip, as real products must, is extremely challenging [31], [42].

Fig. 4. (a) Buck-boost dc-dc converter efficiency enhancement by delivering the same required  $P_{out}$  at higher voltage levels through PA stacking. The buck-boost converter is implemented using Coilcraft 220-nH ceramic inductor HA4033 [50] and 0.25- $\mu m$  5-V power transistors. (b) Overall PA efficiency when a 5-bit SC PA of  $Q_l = 0.89$  is implemented by stacking 1–3 PA cells while operating the entire stack from 1.2, 2.4, and 3.6 V, respectively, through the input dc-dc converter in Fig. 4(a).

The stacked-transistor concept can be extended to a digital cascaded structure, as shown in Fig. 5(b). Stacking two devices practically requires an extra high-fidelity dc-dc source, as shown in Fig. 5(b) (left). Stacking more devices is challenging. For example, the upper two switches in a four-stack cascode [Fig. 5(b) (right)] have to be driven by level-shifted, 180°-phase square PM signals  $\phi_h$  and  $\phi_{2h}$  with  $V_{DD}$  and 2- $V_{DD}$  swings, respectively, so as not to violate the transistor oxide breakdown during the on-phase while not exceeding a  $V_{DS}$  of  $V_{DD}$  in the off-phase. Given finite rise/fall times, this requires perfect alignment between the PM input  $\phi$  and the level-shifted, out-of-phase PM clocks  $\phi_h$  and  $\phi_{2h}$ , not to mention the complexity of generating a 2- $V_{DD}$ -swing drive signal. Stacking  $N$  devices to realize an  $NV_{DD}$ -rated cascaded switch requires  $N - 1$  high-fidelity bias levels at  $nV_{DD}$  with progressively increasing gate swings, which significantly increases the cost of the solution.

Fig. 5. Prior schemes to realize high RF power using thin-oxide devices. (a) Parallel and series power combining schemes. (b) Digital device-stacked PAs.

To generate high RF voltages using only scaled thin-oxide CMOS transistors, an HoC topology is proposed that builds upon the stacked class-D concept presented earlier and is inspired in part by multilevel dc-ac power inverters. An example two-stack HoC is shown in Fig. 6. To achieve an amplitude of  $(4/\pi) V_{DD}$ , the supply and GND of a third PA cell, PA3, are switched with respect to the  $V_{in}$  power-source rails through the PA2 and PA1 switches, respectively, adding the output voltage of the initial (PA1 and PA2) ladder to that of PA3. The proposed topology essentially arranges the constituent PA cells in an HoC topology, where PA3 acts as a “flying domain” [43]. During  $\phi_1$  ( $\phi_2$ ), in Fig. 6(b), the odd- (even-) numbered switches are on, and hence,  $R_L$  is connected to GND ( $V_{in}$ ).
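The $(4/\pi)V_{DD}$ amplitude follows from the fundamental of a square wave toggling between GND and $V_{in} = 2V_{DD}$. The sketch below, assuming ideal instantaneous switching and a 50% duty cycle, extracts that fundamental with a single-bin DFT and compares it with a single class-D cell swinging between 0 and $V_{DD}$. The helper name is illustrative.

```python
import math

def fundamental_amplitude(samples):
    """First-harmonic amplitude of one uniformly sampled period (single-bin DFT)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k / n) for k, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k / n) for k, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

v_dd, n = 1.2, 4096
# Ideal two-stack HoC output: V_in = 2*V_DD for half the period, GND for the other half
hoc_out = [2 * v_dd if k < n // 2 else 0.0 for k in range(n)]
# A single class-D cell swinging 0..V_DD, for comparison
cell_out = [v_dd if k < n // 2 else 0.0 for k in range(n)]

a_hoc = fundamental_amplitude(hoc_out)
a_cell = fundamental_amplitude(cell_out)
print(f"HoC: {a_hoc:.4f} V vs (4/pi)V_DD = {4 / math.pi * v_dd:.4f} V")
print(f"ratio to a single class-D cell: {a_hoc / a_cell:.3f}")
```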

The equivalent output resistance,  $R_{out}$ , of such a PA is  $2R_{on}$ , where  $R_{on}$  is the ON-resistance of switches  $s_1$ ,  $s_5$ ,  $s_4$ , and  $s_6$ . The charge delivered to the output load  $R_L$  does not pass through the inner switches  $s_2$  and  $s_3$ , which are only used to balance capacitors  $C_1$  and  $C_2$  using  $C_{fly}$  during transients. As a result, switches  $s_2$  and  $s_3$  can be minimum-sized. Essentially, switches ( $s_1$  and  $s_4$ ) and ( $s_5$  and  $s_6$ ) form two class-D PA cells connected in cascade; both handle the total output current,  $i_o$ , and are therefore termed ac PA cells. Through switches  $s_2$  and  $s_3$ , nodes  $v_1$  and  $v_2$  are never left floating, unlike in a conventional cascaded switch, and hence the proposed HoC topology guarantees reliable operation without exceeding any device voltage rating, all in a self-contained solution without any bias circuitry or added complexity.

Fig. 6. (a) Example two-stack HoC PA. (b) Resulting phases when  $\phi$  is high and low.

The proposed HoC topology can be generalized to simultaneously realize a  $1:N$  voltage step-up ratio and  $N$ -PA power combining. The topology realizes an  $NV_{DD}$  swing from  $N$  ac  $V_{DD}$ -rated PA cells by flying an entire  $(N - i)$ -stacked-PA ladder through the switches of the prior  $(N - i + 1)$ -stacked-PA ladder, with the input gates of the  $(N - i)$ -ladder clamped to the intermediate nodes of the prior ladder, recursively, until the final ladder is a single PA cell, as shown in Fig. 7(a) for  $N = 3$ . The commutation of the switches adds the voltage swings of the  $N$  ac PA cells (i.e., voltage-domain combining). In addition, each flying PA ladder provides automatic dc voltage balancing of the stacked domains of the prior ladder. Each clamping capacitor, dc or flying, is automatically balanced to  $\sim V_{DD}$  at steady state. With the proposed gate connection, the cascaded PA ladders in Fig. 7(a) switch in a falling-domino fashion, with the transient states of the intermediate nodes annotated in the figure. As a result, the voltage swings at the gate and drain of each switch are perfectly aligned through the clamping capacitors, guaranteeing safe operation in a robust digital manner [Fig. 7(b)].
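One reading of the recursive construction is that an $N$-level HoC flies ladders of $N, N-1, \ldots, 1$ cells, which reproduces the three cells (PA1 to PA3) of the two-stack example in Fig. 6. The sketch below encodes that reading; the cell count it reports is an inference from the description, not a figure quoted from the paper, and the helper names are hypothetical.

```python
def hoc_ladders(n):
    """Ladder sizes from recursively flying an (N-i)-stack over an (N-i+1)-stack."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [n] if n == 1 else [n] + hoc_ladders(n - 1)

def hoc_summary(n, v_dd):
    ladders = hoc_ladders(n)
    return {
        "ladders": ladders,            # e.g. [3, 2, 1] for N = 3
        "total_cells": sum(ladders),   # N(N+1)/2 under this reading of the recursion
        "ac_cells": n,                 # cells that carry the full output current
        "output_swing": n * v_dd,      # peak-to-peak swing from N V_DD-rated ac cells
    }

for n in (1, 2, 3):
    print(n, hoc_summary(n, 1.2))
```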



Fig. 7. (a) Example three-stack HoC PA. (b) Fundamental ac PA cells and their gate/drain voltages to enable safe, aligned operation through the clamping capacitors while performing series power combining.

## III. RECURSIVE HOC AMPLIFIER ARCHITECTURE

### A. HoC Digital Power Amplifier Linearization

Among various digital linearization techniques [6]–[10], [44], the SC PA architecture [17], [18] has superior efficiency and linearity. A similar approach is followed in this paper to linearize the nonlinear HoC PA. Fig. 8(a) shows the block diagram of the implemented two-stack by two-cascade recursive HoC PA, powered directly from  $V_{in} = 4.8$  V using  $V_{DD} = 1.2$  V thin-oxide transistors in 65-nm CMOS. The input baseband signal is oversampled and raised-cosine filtered by a DSP to generate the in-phase (I) and quadrature (Q) signals. Using a CORDIC algorithm, the digital I and Q signals are converted into 5-bit envelope ( $A[4:0]$ ) and phase ( $\phi$ ) components. A square carrier at  $f_o$  is phase modulated by the resulting phase signal through a mixer.
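The CORDIC step can be sketched in floating point. The code below uses CORDIC in vectoring mode, which rotates the (I, Q) vector onto the positive real axis while accumulating the rotation angle, and then quantizes the magnitude to a 5-bit code. The iteration count, the full-scale value, and the rounding-based quantizer are assumptions; a hardware implementation would typically use fixed-point arithmetic and a precomputed angle table.

```python
import math

def cordic_vectoring(i, q, n_iter=16):
    """Floating-point CORDIC in vectoring mode: returns (magnitude, phase in rad)."""
    x, y, angle = float(i), float(q), 0.0
    # Pre-rotate by +/-90 degrees so the vector starts in the right half-plane.
    if x < 0:
        if y >= 0:
            x, y, angle = y, -x, math.pi / 2     # rotate by -90 degrees
        else:
            x, y, angle = -y, x, -math.pi / 2    # rotate by +90 degrees
    gain = 1.0
    for k in range(n_iter):
        t = 2.0 ** -k
        gain *= math.sqrt(1.0 + t * t)           # stretch of each micro-rotation
        if y > 0:   # rotate clockwise by atan(2^-k)
            x, y, angle = x + y * t, y - x * t, angle + math.atan(t)
        else:       # rotate counterclockwise by atan(2^-k)
            x, y, angle = x - y * t, y + x * t, angle - math.atan(t)
    return x / gain, angle

def quantize_envelope(mag, full_scale, bits=5):
    """Map a magnitude to an unsigned code A[bits-1:0], clipped to the valid range."""
    top = (1 << bits) - 1
    return min(top, max(0, round(mag / full_scale * top)))

mag, phase = cordic_vectoring(0.3, -0.4)
print(f"|IQ| = {mag:.5f}, phase = {phase:.5f} rad, A[4:0] = {quantize_envelope(mag, 0.5)}")
```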

The generated PM clock drives 16 PA slices, each sized to have conductance  $G_{on}/16$  and each implementing a two-stack two-cascade HoC PA. As shown in Fig. 8(b), six  $V_{DD}$ -rated class-D PA cells are used to implement each HoC slice. Each group of three class-D cells is arranged in a two-cascade HoC topology to establish two  $2 V_{DD}$ -swing PAs, HoC1 and HoC2, which are then stacked on top of each other to block  $V_{in} = 4 V_{DD}$ . The constant-envelope PM clock, switching between GND and  $V_{DD}$ , is level shifted through a star-connected shifter, discussed later, into in-phase clocks switching between  $V_{DD}$  and  $2 V_{DD}$ ,  $2 V_{DD}$  and  $3 V_{DD}$ , and  $3 V_{DD}$  and  $4 V_{DD}$ . The 16 PA slices share the same intermediate dc nodes ( $V_{int1}$ ,  $V_{int2}$ , and  $V_{int3}$ ). The output of each of the 32 two-cascade HoC PAs (16 slices with two two-cascade HoC PAs each) is coupled to the output  $V_{out}$  through a  $C_c/32$  capacitor, forming a 4-bit (16-slice) unary-sized RF-DAC.



Fig. 8. (a) Block diagram of a single-ended HoC PA; the actual implementation is differential. (b) Top-level schematic of a single HoC slice; the actual slice is differential.

Based on the required envelope amplitude, fetched at the sample rate, the 16 HoC slices are selectively enabled through a 4-bit thermometer decoder to switch the bottom plates of the unary-sized MIM capacitor array, whose total capacitance is  $C_c = 25 \text{ pF}$ , at  $f_o$  with a voltage swing of  $2V_{DD}$ . In addition, bit  $A[4]$  of the envelope sets the internal voltage gain of each HoC slice to one of two possible values, 1:1 or 1:2, as discussed later in this section. In total, the HoC PA therefore achieves 5-bit amplitude resolution. At peak power, all the HoC slices are actively switching. At backoff, the slices are gradually deactivated by connecting the bottom plates of each slice's  $C_c/32$  capacitors to GND and  $V_{int2}$ . Compared with the binary-weighted RF-DAC approach, the use of unary-sized amplifier cells enables better differential nonlinearity (DNL) and integral nonlinearity (INL) performance. In addition, it reduces glitches at the sampling instants, since only one HoC slice is deactivated or activated when the digital envelope changes by one LSB, further improving the linearity and dynamic performance of the overall PA.
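The slice-enable logic can be sketched as follows. The exact code-to-slice mapping is not given here, so the sketch assumes, purely for illustration, that $A[4] = 1$ selects the 1:2 gain and that $A[3:0] = m$ turns on the first $m + 1$ of the 16 slices. The loop checks the property used above: adjacent codes toggle exactly one slice. `decode_envelope` is a hypothetical name.

```python
def decode_envelope(a, n_slices=16):
    """Hypothetical decode of a 5-bit envelope code A[4:0].

    Assumptions (not from the paper): A[4] = 1 selects the 1:2 HoC gain, and
    A[3:0] = m enables the first m + 1 slices via a thermometer code.
    """
    if not 0 <= a < 32:
        raise ValueError("A[4:0] must be a 5-bit code")
    gain = "1:2" if (a >> 4) & 1 else "1:1"
    n_on = (a & 0xF) + 1
    enables = [1 if k < n_on else 0 for k in range(n_slices)]
    return gain, enables

# Unary coding: adjacent codes differ in exactly one slice, limiting glitches.
for m in range(15):
    _, now = decode_envelope(m)
    _, nxt = decode_envelope(m + 1)
    assert sum(u != v for u, v in zip(now, nxt)) == 1
print(decode_envelope(0b10111))
```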

An output inductive bandpass filter resonates with  $C_c$  and establishes an  $LC$  impedance transformation network. As shown in Fig. 8(a), a two-stage  $LC$  impedance transformation network transforms the  $25\text{-}\Omega$  load resistance ( $50\text{-}\Omega$  antenna through a balun) to  $10 \text{ }\Omega$  (i.e., an impedance transformation ratio of 2.5) to generate the desired 23-dBm total output power. For maximum bandwidth, each  $LC$  matching stage would therefore provide an impedance transformation ratio of  $\sqrt{2.5} \approx 1.58$ . However, the first  $LC$  stage is designed to provide a ratio of  $\sim 1.8$ , slightly larger than  $\sqrt{2.5}$ , to lower the DAC charge-sharing loss while maintaining a reasonable bandwidth, as will be discussed. Accordingly, the required loaded quality factor  $Q_l$  of the first  $LC$  matching stage is  $\sim 0.89$ , which sets  $C_c$  to  $25 \text{ pF}$  at  $0.72 \text{ GHz}$  for the white-space mobile market.
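The quoted $Q_l$ and $C_c$ can be cross-checked with the textbook L-section relation $Q = \sqrt{r - 1}$ for a resistance ratio $r$. The sketch below assumes a lossless L-section with $C_c$ as the series element at the 10-Ω port (so $Q_l = 1/(\omega_o C_c R)$ with $R = 10\ \Omega$); under that assumption it gives $Q_l \approx 0.89$ and $C_c$ close to 25 pF.

```python
import math

def l_section_q(ratio):
    """Loaded Q of a lossless L-section with resistance transformation ratio > 1."""
    if ratio <= 1:
        raise ValueError("ratio must exceed 1")
    return math.sqrt(ratio - 1.0)

f_o = 720e6
r_pa = 10.0                      # ohm seen at the HoC output
r_load = 25.0                    # ohm after the balun
ratio_total = r_load / r_pa      # 2.5
ratio_equal = math.sqrt(ratio_total)   # equal split for maximum bandwidth
ratio_1 = 1.8                          # first-stage ratio used in the text
ratio_2 = ratio_total / ratio_1        # what the second stage must supply

q_l = l_section_q(ratio_1)
# Assumes C_c is the series element at the 10-ohm port: Q_l = 1 / (w_o * C_c * R)
c_c = 1.0 / (2 * math.pi * f_o * q_l * r_pa)
print(f"equal split: {ratio_equal:.3f} per stage; used: {ratio_1} then {ratio_2:.3f}")
print(f"Q_l = {q_l:.3f}, C_c = {c_c * 1e12:.1f} pF")
```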

**1) HoC RF-DAC Drain Efficiency:** Modulating the output amplitude by controlling the number of actively switching PA slices is essentially a controllable capacitive voltage divider applied to a constant-envelope $2V_{DD}$ square wave, as shown



Fig. 9. Equivalent circuit of the implemented SC HoC PA.

in Fig. 9. Here, $K$ is the number of unary-sized slices, enabling $\log_2(K)$-bit resolution. As the envelope code $i$ increases, more capacitors are switched between GND and $2V_{DD}$ through an $iG_{on}/K$ conductive path, while $K - i$ capacitors are statically pulled down through a $(K - i)G_{on}/K$ path. Since the input port of the matching network is inductive, it can be treated as a high impedance during the fast transitions of the input square wave. Therefore, the output voltage $V_{out}$ is determined by the voltage divider in Fig. 9 as

$$V_{out} = \frac{2}{\pi} \frac{i}{K} 2 \times V_{DD}. \quad (1)$$

The series combination of the two capacitor groups, $i(K - i)C_c/K^2$, must be charged and discharged once per RF cycle; thus, the array charge-sharing loss $P_{cs}$ is

$$P_{cs} = \frac{i(K - i)}{K^2} C_c (2 \times V_{DD})^2 f_o. \quad (2)$$

By employing a series inductive reactance, the series capacitive reactance $1/(\omega_o C_c)$ can be canceled at $f_o$, enabling a significant reduction in $C_c$ to an extent that depends on the unloaded quality factor of the inductor, $Q_{ind}$. With a larger inductance $L$ at a given $R_L$, a higher loaded quality factor $Q_l = \omega_o L / R_L = 1/(\omega_o C_c R_L)$ can be realized. This results in a smaller array capacitance and hence a lower $P_{cs}$; as demonstrated in [17], the drain efficiency becomes

$$\eta_{drain} = \left( 1 + \frac{\pi}{4} \frac{(K - i)}{i} \frac{1}{Q_l} \right)^{-1}. \quad (3)$$
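As a sanity check on (3), the Python sketch below recomputes the drain efficiency directly from (1) and (2), taking $P_{out} = V_{out}^2/(2R_L)$ into a lossless network and $C_c = 1/(\omega_o Q_l R_L)$. The values $V_{DD} = 1.2$ V, $R_L = 10\ \Omega$, $K = 32$, and $Q_l = 0.5$ are illustrative assumptions. Both routes agree for every code, and at $i = K/2$ (6-dB backoff) they give $\eta_{drain} \approx 0.39$.

```python
import math


def eta_eq3(i: int, k: int, q_l: float) -> float:
    """Closed-form SC PA drain efficiency, eq. (3)."""
    return 1.0 / (1.0 + (math.pi / 4.0) * (k - i) / i / q_l)


def eta_from_losses(i: int, k: int, q_l: float,
                    v_dd: float = 1.2, f_o: float = 0.72e9,
                    r_l: float = 10.0) -> float:
    """Rebuild eq. (3) from eqs. (1) and (2) with C_c = 1/(w_o Q_l R_L)."""
    c_c = 1.0 / (2.0 * math.pi * f_o * q_l * r_l)
    v_out = (2.0 / math.pi) * (i / k) * 2.0 * v_dd                # eq. (1)
    p_out = v_out ** 2 / (2.0 * r_l)                              # into R_L
    p_cs = i * (k - i) / k ** 2 * c_c * (2.0 * v_dd) ** 2 * f_o   # eq. (2)
    return p_out / (p_out + p_cs)


K, Q_L = 32, 0.5
for i in (K // 4, K // 2, K):
    print(i, round(eta_eq3(i, K, Q_L), 3), round(eta_from_losses(i, K, Q_L), 3))
```

The supply voltage, frequency, and load all cancel out of the ratio, which is why (3) depends only on $i/K$ and $Q_l$.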



Fig. 10. Drain efficiency of a conventional 5-bit SC PA, a conventional Class-G SC PA with two supplies [ $V_{DD}$  and  $2V_{DD}$ , see (6)], the proposed HoC PA with class-G amplitude coding, and the proposed HoC PA with Doherty coding [see (5)], all at  $Q_l$  of 0.5.

Fig. 10 shows the drain efficiency of an SC PA with an array size of 32 unit capacitors. As shown, with a reasonable on-chip  $Q_{ind}$  of 10–15, the SC PA efficiency falls by 60% at 6-dB backoff. Techniques are thus required to enhance efficiency at backoff.

### B. Magnetic-Less Implicit Class-G Swapping Doherty for High Average Efficiency

To realize high efficiency at backoff in a fully integrated, magnetic-less, and reconfigurable approach, the proposed PA can reconfigure each PA slice, containing two stacked HoC cells [HoC1 and HoC2 in Fig. 11(a) (left)] with $2V_{DD}$ output swings, into a stack of four class-D PA cells whose outputs are capacitively coupled to provide $V_{DD}$ output swings. In this manner, the charge-sharing losses of the capacitor array, $P_{cs}$ in (2), are scaled by the same factor as the output power at 6-dB backoff (i.e., four times), and hence, the HoC PA realizes a second efficiency peak at 6-dB backoff that matches the peak $P_{out}$ efficiency, as shown in Fig. 10 (class-G-like HoC). Since the overall PA supply voltage, $V_{in} = 4V_{DD}$, is not changed, the reconfigurable HoC amplifier can be considered a solid-state RF impedance transformer with two voltage transformation ratios, 1:2 and 1:1, as in Fig. 11(b). The two available transformation ratios boost the achievable resolution by 1 bit, and thus, $0 \leq i \leq 2K$.

The MSB of the envelope code, $A[4]$ in Fig. 8(a), is used to set the transformer ratio. The remaining four least significant bits, $A[3:0]$, enable fine-grain amplitude resolution through the formed RF-DAC. There are two ways to accomplish fine-grain amplitude modulation, class-G-like and Doherty-like, which differ in how the inactive slices are used.

**1) Class-G-Like HoC Backoff:** At the 1:2 transformation ratio (i.e.,  $A[4] = 1$ ),  $A[3:0]$  can be employed through the decoder in Fig. 8(a) to adapt the number of actively switching slices with  $2V_{DD}$  swings,  $i - K$ , while the remaining  $2K - i$



Fig. 11. (a) Reconfiguring the HoC slice transformation ratio from 1:2 to 1:1 to achieve high efficiency at backoff. (b) Simplified equivalent circuit.

slices (where $K \leq i \leq 2K$) are statically connected low. In this case, the drain efficiency is similar to (3) but with $K$ replaced by $2K$. As shown in Fig. 10 (“Class-G-like HoC”), the HoC then suffers from a discontinuous efficiency profile near the transition point between the two transformation ratios: at the 6-dB backoff point, all the capacitors $C_c$ in the HoC array are actively switching with a $V_{DD}$ voltage swing and zero charge-sharing loss, so the PA efficiency jumps to the ideal 100% value at the $-6$ dB code. This resembles the operation of a conventional class-G PA whose $-6$ dB supply, $V_{in}/2$, is produced by a 100%-efficient dc–dc converter.

**2) Doherty-Like HoC Backoff:** To improve efficiency at backoff, when  $A[4] = 1$ , the  $2K - i$  inactive slices instead switch the bottom plate of their coupling capacitors with a swing of  $V_{DD}$  rather than being static. Essentially, the input signal is amplified through two voltage-mode PA paths, a *main* amplifier path with  $V_{DD}$ -swing and a *peaking* amplifier path with  $2V_{DD}$ -swing, as shown in Fig. 12(a). The two paths are simply combined through a programmable capacitive voltage-divider network to generate amplitudes between  $V_{DD}$  and  $2V_{DD}$ , according to  $A[3:0]$ , and hence, the output voltage becomes

$$V_o = \frac{2}{\pi} \left( \frac{i-K}{K} 2 \times V_{DD} + \frac{(2K-i)}{K} V_{DD} \right) \quad (4)$$



Fig. 12. (a) Equivalent circuit of the HoC while generating amplitudes between the 1:1 (“main”) and 1:2 (“peaking”) transformation ratios. (b) Load-pull characteristics of the HoC for $K \leq i \leq 2K$. (c) Normalized voltages and admittances of the *main* and *peaking* amplifiers. (d) Swapping Doherty illustration for maximum area utilization: (1) when the peaking PA is virtually off, (2) when the peaking PA acts as an “active load” for the main amplifier, and (3) when the peaking PA is fully on, $P_{out} = P_{max}$.

for  $K \leq i \leq 2K$ . This way, the  $K$ -capacitor array is charged and discharged through only the amplitude difference between the two amplifiers,  $V_{DD}$ , instead of  $2V_{DD}$  in the prior approach, reducing the charge-sharing losses by four times, and enhancing the efficiency profile between the two ratios to exactly follow a Doherty backoff profile.
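The Doherty-like coding can be expressed as a per-slice swing assignment. The Python sketch below is a behavioral model assuming ideal switches, equal unit capacitors, and a high-impedance output node; the helper names `doherty_swings` and `output_fundamental` are hypothetical. It shows that the output fundamental stays linear in $i$ over $0 \le i \le 2K$, reducing to $(2/\pi)(i/K)V_{DD}$, which is (4) after simplification for $K \le i \le 2K$.

```python
import math

K = 16  # unary HoC slices


def doherty_swings(i: int, k: int = K) -> list[int]:
    """Per-slice swing, in units of V_DD, under Doherty-like coding.

    0 = held low, 1 = V_DD swing (main), 2 = 2V_DD swing (peaking).
    """
    if not 0 <= i <= 2 * k:
        raise ValueError("code must satisfy 0 <= i <= 2K")
    if i <= k:
        # 1:1 ratio: i slices switch with V_DD, the rest are held low.
        return [1] * i + [0] * (k - i)
    # Between ratios: i-K peaking slices at 2V_DD, 2K-i main slices at V_DD.
    return [2] * (i - k) + [1] * (2 * k - i)


def output_fundamental(swings: list[int], v_dd: float = 1.2) -> float:
    """Equal unit capacitors, ideal switches, open-circuit output node:
    the output fundamental is the mean of the per-slice square-wave
    fundamentals, each (2/pi) * swing."""
    return (2.0 / math.pi) * v_dd * sum(swings) / len(swings)


for i in (8, 16, 24, 32):
    s = doherty_swings(i)
    print(i, s.count(2), s.count(1), round(output_fundamental(s), 3))
```

Because the main and peaking groups differ by only $V_{DD}$, the charge shared between them is set by that difference rather than by the full $2V_{DD}$ swing.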

The operation is, to a great extent, similar to the two-way Doherty configuration, in which the *main* amplifier is load-pulled (here, capacitively). Rather than acting as current sources, as in the two-way Doherty, the *main* and *peaking* amplifiers are employed as voltage sources of different amplitude levels $V_M = (2K-i)/K \times V_{DD}$ and $V_P = (i-K)/K \times 2V_{DD}$, respectively. In the proposed voltage-domain combining, the load admittance, rather than the load impedance as in a current-mode Doherty, is gradually lowered once the *peaking* amplifier turns ON, as in Fig. 12(b) and (c). Unlike classical Doherty implementations that disable the peaking amplifier at backoff, wasting silicon area, a “swapping Doherty” architecture is used, where at backoff, the peaking amplifier slices are reconfigured (i.e., swapped) to act as the *main* amplifier, realizing 100% resource utilization, as in Fig. 12(d). The efficiency under such operation becomes

$$\eta_{\text{drain|Doherty}} = \left( 1 + \frac{\pi}{4} \frac{(i-K)(2K-i)}{i^2} \frac{1}{Q_l} \right)^{-1} \quad (5)$$

for  $K \leq i \leq 2K$ . To the best of our knowledge, the continuous efficiency transition through the second amplitude coding scheme was first noted in [18] in a class-G SC PA.
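The Python sketch below checks (5) the same way (3) was checked, rebuilding it from (4) and a charge-sharing loss in which the main $(2K-i)$ and peaking $(i-K)$ groups share charge across only $V_{DD}$. $K = 16$ and $Q_l = 0.5$ are illustrative assumptions. Since $(i-K)(2K-i)/i^2$ peaks at $1/8$ for $i = 4K/3$, the efficiency between the two peaks is bounded below by $(1 + \pi/(32 Q_l))^{-1}$.

```python
import math


def eta_eq5(i: int, k: int, q_l: float) -> float:
    """Doherty-like HoC drain efficiency, eq. (5), for K <= i <= 2K."""
    return 1.0 / (1.0 + (math.pi / 4.0) * (i - k) * (2 * k - i) / i ** 2 / q_l)


def eta_eq5_from_losses(i, k, q_l, v_dd=1.2, f_o=0.72e9, r_l=10.0):
    """Rebuild eq. (5): the main (2K-i) and peaking (i-K) groups share
    charge across only their V_DD swing difference."""
    c_c = 1.0 / (2.0 * math.pi * f_o * q_l * r_l)
    v_o = (2.0 / math.pi) * ((i - k) / k * 2.0 * v_dd
                             + (2 * k - i) / k * v_dd)         # eq. (4)
    p_out = v_o ** 2 / (2.0 * r_l)
    c_series = (i - k) * (2 * k - i) / k ** 2 * c_c             # groups in series
    p_cs = c_series * v_dd ** 2 * f_o
    return p_out / (p_out + p_cs)


K, Q_L = 16, 0.5
worst = min(range(K, 2 * K + 1), key=lambda i: eta_eq5(i, K, Q_L))
print(worst, round(eta_eq5(worst, K, Q_L), 3))
```

Both endpoints ($i = K$ and $i = 2K$) return exactly 1, which is the two-peak Doherty profile in Fig. 10.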

Note that a conventional class-G PA with multiple supplies cannot achieve the Doherty efficiency profile, even with the Doherty amplitude coding discussed above, because its secondary efficiency peak at 6-dB backoff is reduced by the cascaded losses of the dc–dc converter that generates the backoff supply. Adding the normalized loss incurred in supplying the power of the *main* PA, the efficiency of such an approach is

$$\begin{aligned} \eta_{\text{drain}} = & \left( 1 + \frac{K(2K-i)}{i^2} \left( \frac{1}{\eta_{\text{dc-dc}}} - 1 \right) \right. \\ & \left. + \frac{\pi}{4} \frac{(i-K)(2K-i)}{i^2} \frac{1}{Q_l} \right)^{-1}. \end{aligned} \quad (6)$$

In contrast, through its implicit, ideally 100%-efficient dc–dc conversion, the HoC topology can realize the exact two-way Doherty efficiency profile, as shown in Fig. 10, without an extra dc–dc converter or any bulky transformer.
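A short numerical comparison of (5) and (6) makes the point concrete. In this Python sketch, the converter efficiency $\eta_{dc\text{-}dc} = 0.9$, $K = 16$, and $Q_l = 0.5$ are assumed, illustrative values. Substituting $i = K$ into (6) shows that the class-G secondary peak equals $\eta_{dc\text{-}dc}$ itself, whereas (5) returns 100% at both $i = K$ and $i = 2K$.

```python
import math


def eta_doherty(i, k, q_l):
    """Eq. (5): HoC with implicit, lossless backoff supply."""
    return 1.0 / (1.0 + (math.pi / 4.0) * (i - k) * (2 * k - i) / i ** 2 / q_l)


def eta_class_g(i, k, q_l, eta_dcdc):
    """Eq. (6): same coding, but the V_DD rail comes from a dc-dc converter."""
    supply = k * (2 * k - i) / i ** 2 * (1.0 / eta_dcdc - 1.0)
    sharing = (math.pi / 4.0) * (i - k) * (2 * k - i) / i ** 2 / q_l
    return 1.0 / (1.0 + supply + sharing)


K, Q_L, ETA_DCDC = 16, 0.5, 0.9   # illustrative values only
for i in (K, (3 * K) // 2, 2 * K):
    print(i, round(eta_doherty(i, K, Q_L), 3),
          round(eta_class_g(i, K, Q_L, ETA_DCDC), 3))
```

Since the supply term in (6) is nonnegative, the class-G curve can only match the HoC curve in the limit $\eta_{dc\text{-}dc} \to 1$.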

### C. Stacked-FET AM–AM and AM–PM Distortion

Unlike typical DACs in mixed-signal applications, SC RF-DACs provide high output power levels and, hence, require large switches to achieve small equivalent ON-state resistances and conduction losses. Unfortunately, the gate and drain parasitic capacitances of the switches increase linearly with their widths. Typically, at the minimum-loss point between the conduction and switching components, the switches' ON-conductance $iG_{\text{on}}/K$ is comparable with the series susceptance of the capacitors, $\omega_o iC_c/K$, in the employed DAC. Therefore, not only capacitor mismatch but, more importantly, switch ON-resistance mismatch affects the RF-DAC linearity. While capacitor matching within 1% is relatively easy to realize in CMOS technologies, transistor ON-resistance ratios are hard to control; hence, the DAC nonlinearity is dominated by the matching between switches.

In the proposed HoC architecture, the total conductance, $G_{\text{on}}$, and capacitance, $C_c$, seen by the output inductive bandpass filter do not change with the amplitude code, $i$; hence, ideally, the PA $R_{\text{out}}$ is constant and equals $1/G_{\text{on}}$ irrespective of the code. Unfortunately, ON-state resistance mismatch between the constituent switches of the RF-DAC results in code-dependent amplitude and phase distortion, i.e., AM–AM and AM–PM distortion. The resulting DAC nonlinearity can be evaluated from the equivalent circuit in Fig. 9. As previously discussed, to realize the $i$th code amplitude, $i$ capacitors are switched between GND and $NV_{DD}$ through the equivalent output conductance of the $i$ actively switching HoC PA slices, $i/K \times G_{\text{on}}$. Meanwhile, $K - i$ capacitors are statically held down through $K - i$ NMOS switches with a total conductance of $(K - i)/K \times G_{\text{on}}$. Each switching PA slice comprises pull-up PMOS and pull-down NMOS paths with equivalent ON-resistances $R_p$ and $R_n$, respectively. Therefore, for a 50% duty-cycle input, the equivalent PA slice output resistance is essentially the average of both resistances, $(R_p + R_n)/2$. If perfect ON-resistance matching between all the constituent PMOS and NMOS switches were realized, i.e., $R_p = R_n = R$, neither AM–AM nor AM–PM distortion would result, assuming zero capacitor mismatch. In practice, however, ON-resistance mismatch causes the voltage-division ratio to deviate from its ideal value, producing AM–AM and AM–PM distortions that depend on the amplitude code $i$.

Assuming perfect matching between the capacitors, the worst-case (peak) AM–AM and AM–PM distortion at each envelope code $i$ due to ON-resistance mismatch can be evaluated by assuming that each PMOS and NMOS in the actively switching PA slices takes on the same extreme ON-resistance deviation (e.g., $\pm 3\sigma_R$), while each of the $K - i$ pull-down NMOS switches simultaneously takes on the opposite extreme (e.g., $\mp 3\sigma_{RN}$), under a Gaussian distribution of the ON-resistance. The upper bound of the resulting AM–AM and AM–PM distortion at each code $i$ can then be evaluated analytically from the amplitude and phase of the output voltage of the capacitive divider network in Fig. 9. In this case, the equivalent output conductance of the actively switching PA slices becomes $i/K \times G_{\text{on}}(1 - 3\sigma_R/R)$, assuming the PMOS and NMOS switches in each actively switching PA slice deviate in the same direction by $3\sigma_R$ for the worst-case calculation, while that of the static pull-down NMOS switches approaches $(K - i)/K \times G_{\text{on}}(1 + 3\sigma_R/R)$, where $\sigma_{RP} = \sigma_{RN} = \sigma_R$ is assumed for simplicity.
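The worst-case divider analysis above can be sketched numerically. The Python model below is a simplified first-order version, assuming ideal capacitors, a fundamental-frequency phasor analysis, an open-circuit output node, and the worst-case conductance split just described with $3\sigma_R/R = 0.25$; the function names are hypothetical. It only evaluates codes $1 \le i < K$, where a static branch exists, and is not expected to reproduce the exact values in Fig. 13, only the trend that a larger $G_{\text{on}}$ shrinks both distortions.

```python
import cmath
import math


def divider_ratio(i, k, g_on, c_c, f_o, delta=0.0):
    """Output of the Fig. 9 divider for code 1 <= i < K, normalized to the
    ideal ratio i/K (1 + 0j means no distortion).

    Active branch: conductance (i/K) G_on (1 - delta) in series with (i/K) C_c,
    driven by the switching source. Static branch: conductance
    ((K-i)/K) G_on (1 + delta) in series with ((K-i)/K) C_c to ground.
    The output node is treated as open-circuit (inductive matching input)."""
    w = 2.0 * math.pi * f_o
    z_act = 1.0 / (i / k * g_on * (1.0 - delta)) + 1.0 / (1j * w * i / k * c_c)
    z_sta = (1.0 / ((k - i) / k * g_on * (1.0 + delta))
             + 1.0 / (1j * w * (k - i) / k * c_c))
    return (z_sta / (z_act + z_sta)) / (i / k)


def worst_case(k, g_on, c_c=25e-12, f_o=0.72e9, delta=0.25):
    """Peak |AM-AM| (dB) and |AM-PM| (degrees) over codes 1..K-1."""
    ratios = [divider_ratio(i, k, g_on, c_c, f_o, delta) for i in range(1, k)]
    am = max(abs(20.0 * math.log10(abs(r))) for r in ratios)
    pm = max(abs(math.degrees(cmath.phase(r))) for r in ratios)
    return am, pm


for g in (0.1, 1.0):
    am, pm = worst_case(16, g)
    print(f"G_on = {g} S: AM-AM {am:.3f} dB, AM-PM {pm:.2f} deg")
```

With matched switches (`delta=0`) both branch impedances scale by the same factor and the ratio is exactly $i/K$; mismatch perturbs the ratio by roughly $\Delta R$ relative to the branch impedance $R + 1/(j\omega C)$, which is why wider switches (smaller $R$) help.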

Fig. 13 shows the calculated peak values of the resulting AM–AM and AM–PM at each amplitude code $i$ for a $3\sigma_R$ of 25% of $R$, under different values of the total PA conductance $G_{\text{on}}$. The analytical expression derived from the capacitive voltage divider in Fig. 9 provides AM–AM and AM–PM distortion values within 10% of the schematic simulation results. As shown in Fig. 13, the maximum AM–AM distortion occurs at $i = 1$ and at the peak code $i = 16$. When the total PA conductance $G_{\text{on}}$ is increased ten times, the maximum AM–AM and AM–PM distortions are reduced by 37.6 times and 2.8 times, respectively. Thus, wider PA switches not only enhance the PA drain efficiency but also reduce the PA AM–AM/PM distortion. Since the PA exhibits a mild second-order nonlinearity in amplitude and phase, digital predistortion can readily be employed to realize higher linearity. Note that if the opposite ON-resistance deviation polarity is instead assumed, i.e., $i/K \times G_{\text{on}}(1 + 3\sigma_R/R)$ and $(K - i)/K \times G_{\text{on}}(1 - 3\sigma_R/R)$ in Fig. 9, the AM–AM and AM–PM nonlinearities illustrated in Fig. 13 tilt with a negative slope versus the amplitude code $i$. Using MATLAB simulations, Fig. 13(c) shows the resulting spectral regrowth of a 10-MHz 16-QAM signal due to the AM–AM and AM–PM nonlinearities in Fig. 13(b) for a 4-bit DAC in a two-way Doherty-like HoC PA with a total $G_{\text{on}}$ of $1\ \Omega^{-1}$. As shown, the AM–PM distortion dominates the out-of-band (OOB) emission and results in ~6-dB shoulder-height degradation.

On the other hand, $N$-times FET stacking in the proposed PA architecture results in a $\sqrt{N}$ times larger standard deviation $\sigma_R$ of the switches' ON-resistance under random local variations. As a result, according to the model, device stacking exacerbates the AM–AM and AM–PM distortions by more than $\sqrt{N}$ times. Fortunately, the proposed PA employs a unary-decoded architecture and also relies on multiple solid-state transformer ratios to realize higher amplitude resolution. The simulations in Fig. 13 illustrate the worst-case deviation of the output amplitude and phase from their ideal values in a unary-decoded architecture for a given switch ON-resistance sigma, $\sigma_R$. The decoding architecture alters the conductance step $\Delta G$ between two consecutive amplitude codes. It can be shown that the standard deviation of the conductance step between two consecutive codes, $\sigma(\Delta G)$, approaches $(2^N - 1)^{1/2}\sigma_R/R$ at the midscale transition of a binary-weighted DAC (with $N$ here denoting the number of bits), in contrast to $\sigma_R/R$ in a unary-decoded converter. However, increasing the conductance and capacitance segmentation in the DAC to realize higher amplitude resolution comes at the cost of an increased standard deviation in the conductance and capacitance step sizes (i.e., $\Delta G$ and $\Delta C$).
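The unary-versus-binary comparison can be checked with a small Monte Carlo experiment. The Python sketch below assumes every DAC weight is assembled from independent unit elements with a Gaussian relative spread of 5% (an illustrative value) and examines the midscale (major-carry) transition of a 4-bit binary DAC, where the $2^{N-1}$-unit MSB replaces the $2^{N-1}-1$ units of the lower bits; the helper name `step_sigma` is hypothetical.

```python
import math
import random
import statistics


def step_sigma(n_bits: int, sigma: float = 0.05,
               trials: int = 20000, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo std of the conductance step between two adjacent codes.

    Each DAC element is built from unit conductances 1 + N(0, sigma),
    normalized so that R = 1. Unary: the step adds one fresh unit.
    Binary: the midscale step 011..1 -> 100..0 enables the 2^(n-1)-unit
    MSB and disables the 2^(n-1) - 1 units of the lower bits."""
    rng = random.Random(seed)
    half = 2 ** (n_bits - 1)
    unary, binary = [], []
    for _ in range(trials):
        unary.append(rng.gauss(1.0, sigma))
        msb = sum(rng.gauss(1.0, sigma) for _ in range(half))
        lower = sum(rng.gauss(1.0, sigma) for _ in range(half - 1))
        binary.append(msb - lower)
    return statistics.pstdev(unary), statistics.pstdev(binary)


u, b = step_sigma(4)
print(f"unary {u / 0.05:.2f}x, binary {b / 0.05:.2f}x "
      f"(theory {math.sqrt(2 ** 4 - 1):.2f}x)")
```

The binary step inherits the variance of all $2^N - 1$ units that change state, while the unary step involves a single unit, which is the origin of the $(2^N - 1)^{1/2}$ factor.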



Fig. 13. Maximum distortion value due to ON-resistance mismatch in a 4-bit DAC with (a) total conductance  $G_{on}$  of  $0.1 \Omega^{-1}$  and (b)  $G_{on}$  of  $1 \Omega^{-1}$ . Here,  $C_c = 25 \text{ pF}$  and  $f_o = 0.72 \text{ GHz}$ . (c) MATLAB-simulated spectrum of a 10-MHz 16-QAM modulated signal generated through a 5-bit RHoC PA utilizing a 4-bit DAC with  $G_{on} = 1 \Omega^{-1}$ , illustrating  $\sim 6\text{-dB}$  spectral regrowth due to the AM-AM and AM-PM nonlinearities in Fig. 13(b).

## IV. CIRCUIT IMPLEMENTATION

### A. Recursive HoC Slice Architecture

The proposed PA architecture requires dynamic reconfiguration of individual slices between ratios to achieve



Fig. 14. Loading of a disabled PA cell on an active PA cell, resulting in potential device voltage rating violations.

Doherty-like backoff. It is challenging to reconfigure without exceeding device ratings or wasting area, all while maintaining the same $R_{out}$ across all reconfiguration states to avoid AM-AM and AM-PM distortion. Perhaps the most straightforward solution would be to implement two parallel HoC amplifiers, configured for the 1:1 and 1:2 ratios, respectively, and to enable one or the other to realize each ratio as in Fig. 11(a). However, this approach has three main drawbacks. First, the $V_{DS}$ rating of the devices in the disabled amplifier can be exceeded by the high-voltage output amplitude coupled through the output-side capacitor, as shown in Fig. 14. Second, the disabled switches act as diode-connected devices and, hence, load the peaks and valleys of the output amplitude, establishing nonlinear loading and compromising linearity. Third, disabling one of the amplifiers wastes almost half of the silicon area.

To realize high efficiency at backoff while maintaining high linearity, a recursive reconfiguration approach is chosen in this paper. Fig. 15(a) shows the switch diagram of the implemented slice architecture used to realize the two reconfigurable transformation ratios, 1:1 and 1:2. Although only six $V_{DD}$-rated class-D PA cells are technically necessary to implement each two-stack two-cascade HoC slice in Fig. 8(b), 12 cells are used to permit recursive reconfiguration with fixed $R_{out}$, without exceeding the device ratings, and without disabling or wasting silicon area. Each recursive HoC slice is implemented as two parallel two-stack two-cascade HoC ladders. The intermediate nodes $V_{int1}$, $V_{int2}$, and $V_{int3}$ are tied together in the two parallel HoC ladders, while the output of each of the four two-cascade HoC PAs is coupled to $V_{out}$ via a $C_c/64$ capacitor.

Each two-cascade HoC comprises six switches. Each ac switch is assigned  $G_{on}/2$  to realize an overall  $R_{out}$  of  $R_{on}$ . The dc switch of the two available is allocated  $G_{on}/8$ . The four switches  $s_{31}$ ,  $s_{33}$ ,  $s_{22}$ , and  $s_{24}$  include an extra helper switch, sized to be  $3G_{on}/8$ , to enable fixed  $R_{out}$  value across both ratios. In the 1:2 transformation ratio, all the switches are operated from the input PM clock level shifted to the corresponding stacked/flying domain on-chip, as will be discussed, while the helper switches are disabled. In the 1:1 ratio, as shown in Fig. 15(b), switches  $s_{11}$  and  $s_{31}$  in HoC1 are statically turned on to connect the class-D PA1 permanently between GND and  $V_{int1}$ , while  $s_{51}$  and  $s_{61}$  are operated through



Fig. 15. (a) Recursive architecture of an HoC slice. The output-side stacked two dc capacitors in Fig. 8(b) are not shown for clarity. (b) HoC slice in the 1:1 ratio case.

the PM clock. Similarly, switches ($s_{13}$ and $s_{33}$) in HoC3, ($s_{22}$ and $s_{42}$) in HoC2, and ($s_{24}$ and $s_{44}$) in HoC4 are used to permanently connect PA3, PA2, and PA4 between ($V_{int2}$ and $V_{int3}$), ($V_{int1}$ and $V_{int2}$), and ($V_{int3}$ and $V_{in}$), respectively. This way, the four PA cells, PA1, PA2, PA3, and PA4, are stacked on top of each other, as shown in Fig. 11(a), to provide a $V_{DD}$ output voltage swing while operating from $V_{in}$, enabling high efficiency at 6-dB backoff. By statically enabling the helper switches within $s_{31}$, $s_{33}$, $s_{22}$, and $s_{24}$ in the 1:1 ratio and disabling them in the 1:2 ratio, a fixed $R_{out}$ value (equal to $R_{on}$) can be realized across both ratios.

### B. Reconfigurable Class-D PA Cell Design

The 12 class-D PA cells used to implement an HoC slice in Fig. 15(a) are divided into three categories based on the required digital conductance programmability: nominal, segmented pull-up, and segmented pull-down. The segmented configurations include an additional pull-up/pull-down helper switch over the nominal cell. Fig. 16(a) shows the schematic of an example segmented pull-down class-D cell; the other cell configurations can be realized in a similar way. In Fig. 16(a), two nonoverlapping clocks, $\phi_1$ and $\phi_2$, are generated from the received level-shifted PM signal through three-transistor inverters [43] with feedback from the opposite phase to realize minimal dead time and eliminate shoot-through current. Clocks $\phi_1$ and $\phi_2$ are provided through a cascaded chain of buffers to drive the gate capacitance of the NMOS $M_n$ and PMOS $M_p$ switches. The helper transistor, $M_h$, is operated as a static switch controlled by the transformation-ratio control bit TX, level-shifted to the corresponding stacked domain.

The PA rms conduction losses stem from the load current flowing through the switches' ON-resistance, i.e., through the PA equivalent $R_{out}$. The second key loss component originates from charging and discharging, once per RF cycle, the parasitic capacitance of the power switches, which includes the gate, drain, and body parasitics, along with the top- and bottom-plate parasitics of the capacitors. Therefore, the total PA loss is

$$P_{\text{loss}} = P_{\text{rms}} + P_{\text{switching-transistor}} + P_{\text{switching-cap}}. \quad (7)$$

In order to realize the maximum possible PA power-added efficiency (PAE) at a given carrier frequency $f_o$, input voltage $V_{in}$, optimum resistance $R_L$, and technology, the total PA loss, $P_{\text{loss}}$, must be minimized. Fig. 17(a) shows the simulated conduction and switching loss components (at 65 °C) associated with the PA switches, including the capacitors' parasitic switching loss, versus switch size in low-power (LP) 65-nm CMOS. While a wider switch results in a smaller conduction loss, $P_{\text{rms}}$, it incurs higher parasitic switching losses; hence, the optimal switch width lies at the break-even point between both loss components [Fig. 17(a)]. However, as shown in Fig. 17(b), the switch size selected for this design is almost 1.8 times the optimal size. This realizes an overall $R_{out}$ of 1 Ω for a single-ended amplifier at 65 °C, enabling the PA to deliver above 23 dBm of total $P_{\text{out}}$ to a load of $R_L \approx 10\ \Omega$ [after impedance transformation per Fig. 8(a)] and to achieve acceptable linearity, given the dominance of ON-resistance mismatch in the PA distortion, as discussed in Section III-C. The selected switch sizes are thus 16 μm for NMOS and 41.6 μm for PMOS in an ac class-D cell with matched NMOS and PMOS ON-resistances. Fig. 17(c) shows the schematic-simulated peak-amplitude PAE under optimal switch sizing (equal conduction and switching losses) versus $f_o$ for an $R_L$ of 10 Ω. As shown, the peak PAE degrades at higher $f_o$ and reaches 44% at $f_o = 3$ GHz in schematic simulations. Furthermore, the peak $P_{\text{out}}$ decreases as $f_o$ increases, since the optimal switch size in Fig. 17(c) shrinks to lower the parasitic switching losses, which increases $R_{out}$.
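A rough power budget shows why an $R_{out}$ near 1 Ω is needed. The Python estimate below makes simplifying assumptions (lossless matching, no switching or charge-sharing loss at full code, $V_{DD} = V_{in}/4$, and an even power split between the two single-ended halves), so it is only a back-of-envelope check: it gives about 23.7 dBm with ideal switches and about 22.9 dBm with $R_{out} = 1\ \Omega$.

```python
import math


def peak_pout_dbm(v_in: float = 4.8, r_out: float = 1.0, r_l: float = 10.0) -> float:
    """Back-of-envelope peak output power of the differential PA, in dBm.

    Assumes V_DD = V_in / 4, a full-code 2V_DD square wave per eq. (1),
    R_out in series with the transformed load R_L on each single-ended
    half, a lossless matching network, and two halves combined by the balun."""
    v_dd = v_in / 4.0
    v_fund = (2.0 / math.pi) * 2.0 * v_dd        # eq. (1) at i = K
    v_load = v_fund * r_l / (r_l + r_out)        # divider with R_out
    p_half = v_load ** 2 / (2.0 * r_l)           # watts per half
    return 10.0 * math.log10(2.0 * p_half / 1e-3)


print(f"{peak_pout_dbm(r_out=0.0):.2f} dBm ideal, "
      f"{peak_pout_dbm():.2f} dBm with 1-ohm R_out")
```

Each additional ohm of $R_{out}$ both lowers the delivered power and adds conduction loss, which is why the switches are deliberately oversized relative to the loss-optimal point.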

Fig. 18(a) shows the schematic-simulated PAE (at 65 °C) at the peak amplitude of the designed PA versus  $V_{in}$ . The



Fig. 16. (a) Generic reconfigurable class-D PA cell schematic. (b) Placing each PA cell in a separate deep n-well, where the well is floated to enable a  $2 \times$  reduction in bottom parasitics.

peak $P_{out}$ and PAE degrade from 22.9 dBm and 66.7% to 17.5 dBm and 51.4% as $V_{in}$ is reduced from 4.8 to 3 V (the typical lithium-ion battery voltage range). Meanwhile, as shown in Fig. 18(b), the ON-resistances of the NMOS and PMOS switches remain matched to within 0.8% as $V_{in}$ decays.

Alternatively, a noninverting buck-boost regulator can be employed, if desired, to provide a fixed $V_{in}$ of 4.8 V to the RHoC PA and maintain 23-dBm peak output power capability throughout battery voltage decay. Fig. 19 shows the simulated dc–dc efficiency of the buck-boost regulator versus the lithium-ion battery voltage when supplying the same peak power at PA input voltages of 1.2, 2.4, and 4.8 V (employing one-, two-, and four-stacked PA-cell architectures, respectively). Supplying the required PA power at 4.8 V draws two and four times lower input current $I_{in}$ from the buck-boost converter than the 2.4- and 1.2-V cases, respectively. As shown, this lowers the conduction losses within the buck-boost converter and hence yields higher efficiency in the $V_{in} = 4.8$ V case. Since lithium-ion battery voltages remain above 3.85 V for approximately 90% of the battery discharge time, as in Fig. 19, the buck-boost converter achieves more than 5% and 11% higher dc–dc efficiency at 4.8 V than for PAs operating at 2.4 and 1.2 V, respectively. Furthermore, since the dc–dc converter maintains approximately the same $\Delta\eta_{dc/dc}$ improvement down to 9-dB backoff, as shown in Fig. 4, improvements of more than 3.3% and 7.3% (i.e., $\Delta\eta_{dc/dc} \times PAE_{avg}$) in the overall PA average efficiency can be achieved, compared with the 2.4- and 1.2-V supply cases, while generating a 9-dB PAPR modulated signal.

The power switches discharge/charge the top plate of each cell's coupling capacitor, $C_c/64$, implemented between M8 and M7 as a 2-fF/μm² MIM capacitor. This way, the bottom-plate capacitance is tuned out by the inductive bandpass filter rather than being hard charged/discharged by the PA cell. MIM capacitors are used for $C_c$ instead of denser MOS capacitors for their higher precision and linearity and, importantly, their high voltage rating of 10 V.

With process scaling, the substrate/drain diode's breakdown voltage decreases. Stacking NMOS devices with their bodies tied to the p-substrate would cause the topmost NMOS in the stack to block a large drain-to-body voltage (e.g., $V_{DB}$ of 4.8 V), exceeding the breakdown voltage in deep-submicrometer CMOS. The implemented class-D cell is instead isolated in a separate deep n-well (DNW), as in Fig. 16(b), so that the substrate p/DNW diode (which has a breakdown voltage on the order of 12 V) blocks the large output voltage instead of the NMOS substrate p/n+ diode. This enables stacking up to ten class-D PA cells for a 12-V maximum output voltage swing. Furthermore, this keeps the threshold voltage $V_{th}$ of the switches fixed, ensuring constant conductance and minimum distortion. To reduce switching losses, the DNW is left floating, while the inner p-well is shorted to its respective flying ground to prevent latch-up. As shown in Fig. 16(b), this effectively places the parasitic capacitors of the p/DNW and the p-well/DNW diodes in series, reducing the bottom-plate well parasitics by a factor of $\sim 2$. To avoid any potential yield degradation, the DNW can be biased from the cell $V_H$ through a large $\sim 1\text{ M}\Omega$ resistance, which is much higher than the reactance of the diodes' parasitic capacitance at $f_o$. A similar technique was illustrated in SC dc–dc converters [45]. The cell clamping capacitance [$C$ in Fig. 15(a)] is implemented in the same DNW using a 12.5-fF/μm² thin-oxide PMOS transistor with a breakdown voltage of 1.5 V. The nonlinearity of this capacitance does not affect the topologically defined steady-state dc voltage across each 1.2-V cell. Each cell comprises 0.6 pF ($\sim 7.8\text{ }\Omega$ ESR at 65 °C) of clamping capacitance to enable automatic voltage balancing against nonfully differential signals and to provide proper decoupling for the gate drivers of the power switches.
The total required clamping capacitance is 230.4 pF in the present differential implementation.
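The capacitor totals quoted above follow from simple bookkeeping. The Python sketch below reproduces the 230.4-pF clamping total (0.6 pF × 12 cells × 16 slices × 2 halves) and converts the stated densities into raw silicon area; the area figures are rough estimates that ignore spacing, routing, and DNW overhead, and are not reported values.

```python
C_C_PF = 25.0                  # total coupling capacitance per single-ended half
MIM_FF_PER_UM2 = 2.0           # MIM density used for the C_c/64 capacitors
CLAMP_PF_PER_CELL = 0.6        # clamping capacitance per class-D cell
MOSCAP_FF_PER_UM2 = 12.5       # thin-oxide PMOS capacitor density
CELLS_PER_SLICE, SLICES, HALVES = 12, 16, 2


def capacitor_budget() -> dict[str, float]:
    """Raw capacitance and area totals; ignores spacing and routing overhead."""
    unit_cc_pf = C_C_PF / 64                       # one C_c/64 capacitor
    clamp_total_pf = CLAMP_PF_PER_CELL * CELLS_PER_SLICE * SLICES * HALVES
    return {
        "unit_cc_pf": unit_cc_pf,
        "unit_cc_area_um2": unit_cc_pf * 1e3 / MIM_FF_PER_UM2,
        "cc_area_per_half_um2": C_C_PF * 1e3 / MIM_FF_PER_UM2,
        "clamp_total_pf": clamp_total_pf,
        "clamp_area_um2": clamp_total_pf * 1e3 / MOSCAP_FF_PER_UM2,
    }


for name, value in capacitor_budget().items():
    print(f"{name}: {value:.1f}")
```

Despite its much higher density, the clamping capacitance ends up occupying an area comparable to that of the coupling array, simply because there is roughly nine times more of it.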

### C. Interfacing Level Shifters

Besides the TX signal that controls the helper switch in a static manner, an extra bit, EN, is employed to clock gate



Fig. 17. Simulated overall loss (conduction and switching) optimization plots versus the switch size and carrier frequency  $f_o$  with  $C_c = 25 \text{ pF}$  and  $R_L = 10 \Omega$  in 65-nm LP CMOS. (a) Conduction and switching loss components at  $f_o = 720 \text{ MHz}$ . (b) Peak-amplitude PAE and  $R_{out}$  versus switch size at  $f_o = 720 \text{ MHz}$ . (c) Optimal peak-amplitude PAE versus  $f_o$ .

(i.e., PM gate) the whole HoC slice to statically hold the slice coupling capacitors low. Therefore, each recursive HoC slice in Fig. 15(a) receives two gain setting bits (EN and TX) to establish three gain states: statically holding  $C_c/16$  down (0, 0), switching with  $V_{DD}$  swing (1, 0), and switching with  $2V_{DD}$



Fig. 18. (a) Simulated peak-amplitude PAE and (b) NMOS and PMOS ON-resistance of the designed RHoC PA versus  $V_{in}$  at  $f_o = 720 \text{ MHz}$ .



Fig. 19. Simulated peak-power dc-dc efficiency of a noninverting buck-boost converter, employed to power the PA at  $V_{in}$ , versus lithium-ion battery voltage. The buck-boost converter is implemented using Coilcraft 220-nH ceramic inductor HA4033 [50] and 0.25- $\mu\text{m}$  5-V power transistors.

swing (1, 1). This requires shifting the voltage levels of the input PM clock and the enable signal of the helper switch TX to the appropriate levels needed by all 12 PA cells. Fig. 20(a) shows the proposed star-connected capacitive level shifter to achieve this. A fork-based clock tree is established through



Fig. 20. (a) Proposed balanced star-connected shifter. (b) Generating the PM clocks for PA1, PA2, PA3, and PA4 in the recursive slice shown in Fig. 15(a).



Fig. 21. Chip micrograph.

the depicted star-connected capacitor connection ($C_{sh} = 35 \text{ fF}$ using MIM) to distribute balanced in-phase PM signals to the initial four stacked domains in each HoC ladder. The star connection is similar to the capacitor connection in an SC Dickson dc–dc charge pump [46]. Conventional ladder shifters connect their capacitors in series, so the reactances between the input clock and the inputs of the different stacked domains are unequal, which can cause large skew (40 ps in simulation); in contrast, the proposed approach achieves low skew and requires three times less capacitance. A static latch provides a low-impedance path to balance the voltage across the shifter capacitors and enables robust operation against leakage and coupled glitches. A half-sized inverter is used in the latch to establish weak feedback that is easily overridden by the triggering input PM driver, thus reducing the required capacitance. The low,



Fig. 22. Measurement setup.

$V_L$ , and the high,  $V_H$ , supplies of each latch are provided through two consecutive voltage levels from the following list: GND,  $V_{int1}$ ,  $V_{int2}$ ,  $V_{int3}$ , and  $V_{in}$ , as shown in Fig. 20(a). The digital processing circuitry and the employed clock tree of cascaded buffers to distribute the PM signal are supplied from  $V_{int1}$ .

The helper enable signal, TX, can be shifted in a similar manner for each of the four switches $s_{22}$, $s_{31}$, $s_{24}$, and $s_{33}$, where the star-connected shifter operates at the envelope sample rate. The PM inputs of the flying cells PA1 and PA3 in Fig. 15(a) are provided through CMOS OR gates between (GND and $V_{int1}$) and ($V_{int2}$ and $V_{int3}$), while the inputs to PA2 and PA4 are supplied through AND gates between ($V_{int1}$ and $V_{int2}$) and ($V_{int3}$ and $4V_{DD}$), as shown in Fig. 20(b). When TX = 1, at the 1:2 ratio, the gate terminals of (PA1 and PA2) and (PA3 and PA4) are statically connected to $V_{int1}$ and $V_{int3}$, respectively. In the 1:1 ratio, the initial four-stacked PA cells in the odd ladder are statically enabled, connecting PA1 and PA3 between (GND and $V_{int1}$) and ($V_{int2}$ and $V_{int3}$), respectively, while the PM signals are allowed through the ORs to the gates



Fig. 23. (a) Measured battery-to- $P_{\text{out}}$  PAE, output power ( $P_{\text{out}}$ ), and output voltage amplitude versus the input code of the proposed RHoC PA with 50- $\Omega$  antenna ( $f_0 = 720$  MHz). (b) Measured DNL and INL of the proposed PA.

of PA1 and PA3. A similar operation follows for the even ladder. When the recursive HoC slice is deactivated through the EN signal received from the thermometer decoder in Fig. 8(a), the input PM clock is gated, enabling all the NMOS switches and statically holding the output  $C_c/64$  capacitors low. When reconfiguring between any two of the three states, the lead delay should be balanced by ensuring equal logic depth for the clock propagation in the 1:1 and 1:2 cases, to eliminate any AM-AM/PM distortion.

## V. EXPERIMENTAL RESULTS

The proposed recursive SC HoC PA is implemented in an LP 65-nm bulk CMOS process with nine metal layers.<sup>2</sup> A die photograph is shown in Fig. 21, with the 16 constituent recursive HoC slices annotated along with the differential three-level H-tree for balanced PM signal distribution. The occupied area is 1.2 mm × 1 mm. However, the design is loosely wired: the combined area of the individual blocks needed to realize 5-bit amplitude resolution is only approximately 0.83 mm × 0.58 mm. Higher resolutions can be achieved by further segmenting the same total conductance and capacitance resources into finer steps. The chip is directly mounted onto a Rogers 4003C PCB with 50- $\Omega$  transmission lines for the input and output terminals. All PA cells are implemented with thin-oxide 1.2-V transistors; nevertheless, thanks to the stacked and cascaded HoC structure, the PA is directly connected to a 4.8-V supply without violating any transistor voltage ratings.

<sup>2</sup>LP was chosen due to run availability; better performance could have been achieved in a general-purpose process.

The PA testing setup is shown in Fig. 22. A vector signal generator (Keysight N5182B) generates constant-envelope phase-modulated RF waveforms up to 1 GHz, while a field-programmable gate array (FPGA) (Xilinx Spartan 6) generates 32 bits of digital amplitude data (2 bits to set the state of each of the 16 HoC slices) at a sampling rate of up to 144 MHz. The differential PA outputs are combined via an off-chip balun and measured by a spectrum analyzer (Keysight N9020A). The wide digital bus from the FPGA to the chip has up to 5 ns of within-bus timing misalignment due to trace-length mismatch inside the FPGA and on the PCB; according to MATLAB simulations, this limits the close-in shoulder height of the resulting spectrum to about -38 dBc. The generated AM and PM signals are frequency-synchronized through a 10-MHz reference signal, while the PATT trigger signal, which marks the start of the frame fetch in the vector signal generator, aligns the frame start on the FPGA with a resolution of one sample.
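The 32-bit amplitude word (2 bits for each of the 16 slices) can be illustrated with a small Python sketch. Everything in it is a hypothetical assumption rather than the paper's decoder: the 2-bit state encoding (off, 1:1, 1:2), the bit ordering, and a slice allocation in which up to 6-dB backoff the active slices run at 1:1, while above it every slice is on and a growing number switch to 1:2. The sketch also assumes ideal capacitive combining, so the output amplitude is proportional to the sum of the slice levels.

```python
# Hypothetical encoding and allocation, for illustration only; the paper's
# actual thermometer decoder (Fig. 8(a)) and bit assignment are not reproduced.
OFF, RATIO_1TO1, RATIO_1TO2 = 0b00, 0b01, 0b10
N_SLICES = 16
# Relative amplitude each slice contributes to the capacitively combined output:
# the 1:2 ratio delivers twice the swing of the 1:1 ratio (6 dB apart).
LEVEL = {OFF: 0, RATIO_1TO1: 1, RATIO_1TO2: 2}


def slice_states(code):
    """Map an amplitude code in [0, 2 * N_SLICES] to a list of per-slice states.

    Assumed allocation: up to 6-dB backoff (code <= N_SLICES), `code` slices run
    at 1:1 and the rest are off; above it, all slices are on and
    (code - N_SLICES) of them run at 1:2.
    """
    if not 0 <= code <= 2 * N_SLICES:
        raise ValueError(f"amplitude code {code} out of range")
    if code <= N_SLICES:
        return [RATIO_1TO1] * code + [OFF] * (N_SLICES - code)
    n_high = code - N_SLICES
    return [RATIO_1TO2] * n_high + [RATIO_1TO1] * (N_SLICES - n_high)


def pack_word(states):
    """Pack the 2-bit states into one 32-bit word, slice 0 in the two LSBs."""
    word = 0
    for i, state in enumerate(states):
        word |= (state & 0b11) << (2 * i)
    return word


def relative_amplitude(states):
    """Combined output amplitude normalized to all slices at the 1:2 ratio."""
    return sum(LEVEL[s] for s in states) / (2 * N_SLICES)


print(hex(pack_word(slice_states(16))))      # 0x55555555 (all slices at 1:1)
print(relative_amplitude(slice_states(16)))  # 0.5, i.e., 6-dB backoff
```

With this assumed allocation, every code between 6-dB backoff and peak keeps all slices active at one of the two efficient ratios, which is the intuition behind the Doherty-like profile.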

When generating nonmodulated (continuous-wave) signals, the PA was measured to deliver up to 23 dBm of peak power at the 1:2 ratio with a 40.3% battery-to-RF efficiency, as shown in Fig. 23(a). The limited accuracy of the off-chip matching components results in a ~90-mV amplitude imbalance between the differential PA channels, which, together with the amplitude and phase imbalance of the employed balun, degrades the efficiency by 4%–6% due to the nondifferential dc–dc loading. Fully integrated matching could help mitigate these amplitude and phase imbalance issues. At the 1:1 ratio (6-dB backoff), the PA achieves 40.8% efficiency, demonstrating that cascaded dc–dc losses are eliminated from the power flow. This backoff efficiency is in fact slightly higher than the efficiency at peak power, due in part to the linear and quadratic scaling of the gate-drive and DNW bottom-plate parasitics of the flying domains, respectively.

Thanks to the magnetic-less Doherty-like structure, the PA achieves a nearly flat efficiency profile between the two ratios. This is unlike conventional digital Doherty implementations with high-order transformer magnetics [47], [48], which suffer from 8.2% and 4.9% lower relative efficiency at 6-dB backoff and are powered from low supply voltages of 3 and 1.5 V, respectively. Even with off-chip baluns and matching networks, a voltage-mode Doherty PA [49], which was built in the same 65-nm LP process to provide 24-dBm peak  $P_{\text{out}}$  at 900 MHz, fails to produce the ideal Doherty backoff performance; in fact, it achieves 11% lower relative efficiency at 6-dB backoff while employing two external supplies (2.4 and 1.2 V). Power-supply losses are not reported in these prior works. However, as previously discussed, delivering the required PA input dc power at higher voltage levels enables higher dc–dc conversion efficiency.

On the other hand, compared with an ideal class-B PA powered by an 80%-efficient dc–dc converter, the proposed PA achieves 8.1% and 24.8% higher efficiency at peak power and 6-dB backoff, respectively, as shown in Fig. 23(a). The measured efficiency is in good agreement with an analytical model. Thanks to the topologically defined, KVL-constrained circuit and the unary-sized array, the proposed PA achieves good static linearity: less than 0.05 LSB DNL and less than 0.5 LSB INL, as shown in Fig. 23(b).

Fig. 24. Measured dynamic characteristics of a 16-QAM signal.
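The backoff behavior of the class-B reference follows from two textbook relations: the efficiency of an ideal class-B stage on a fixed supply scales linearly with output amplitude, and the efficiencies of cascaded stages multiply. Writing $\eta_{B,\text{pk}}$ for the peak efficiency assumed for the class-B stage (not restated in this section) and $\eta_{\text{DC}} = 0.8$ for the converter, a minimal worked form is

$$
\eta_{\text{sys}} = \eta_{\text{DC}}\,\eta_{B,\text{pk}}\,\frac{V_o}{V_{o,\max}}, \qquad \left.\frac{V_o}{V_{o,\max}}\right|_{6\ \text{dB}} = 10^{-6/20} \approx 0.50
$$

so the reference efficiency roughly halves at 6-dB backoff, whereas the measured HoC efficiency stays essentially flat (40.3% at peak and 40.8% at 6-dB backoff).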

Fig. 24 shows the results of dynamic tests, where a 10-MHz 16-QAM signal was fed into the PA at a 72-MHz envelope rate. This requires strict timing alignment between the phase and amplitude paths, ideally with subnanosecond resolution. The employed measurement setup afforded an alignment accuracy of only  $\sim 13$  ns. Although not ideal, this accuracy, combined with the excellent linearity of the proposed circuit, was sufficient to achieve an error vector magnitude (EVM) of 3.6%-rms, as shown in Fig. 24 by the constellation diagram (bottom right) and the in-band power spectrum (left), all without employing any digital predistortion. Fig. 25 shows the measured transmitted power spectrum of the 10-MHz 16-QAM signal. While the PA achieves a  $-31.7$  dBc adjacent channel power ratio (ACPR), the close-in shoulder height in Fig. 25(a) could be further improved with better time alignment between the amplitude code and the PM signal. MATLAB simulations of the same 10-MHz 16-QAM signal, shown in Fig. 25(b), illustrate how the delay between the AM and PM paths affects the OOB emission and degrades the ACPR through nonideal signal restoration. Specifically, the AM signal is delayed by 0, 2 $T_o$ , 4 $T_o$ , 6 $T_o$ , 8 $T_o$ , and 10 $T_o$  relative to the PM signal, where  $T_o = 1/f_o$  is the RF cycle time. The 10 $T_o$  case corresponds to an AM/PM timing misalignment of one envelope sample time. As shown, the shoulder initially degrades by 5 dB for each additional 2 $T_o$  of delay between the AM and PM paths; the additional degradation gradually subsides at larger misalignments, reaching approximately 17.5 dB when the delay is one AM sample time. With a 6 $T_o$  delay, the achievable ACPR is approximately  $-32$  dBc, which suggests that at least 6 $T_o$  of delay exists between the AM and PM paths in the measurement setup. Therefore, taken together with the simulations presented in Section III-C, these results indicate that the timing misalignment between the AM and PM signals, rather than the PA nonlinearity products, dominates the OOB emission.
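The delay steps in Fig. 25(b) can be translated into absolute time using only the 720-MHz carrier and 72-MHz envelope rate stated above. This short Python sketch shows that 10 $T_o$ equals one envelope sample (about 13.9 ns, consistent with the ~13-ns alignment accuracy of the setup) and that the inferred 6 $T_o$ misalignment corresponds to roughly 8.3 ns; the helper names are illustrative.

```python
F_O = 720e6       # carrier frequency f_o used in the measurements, Hz
F_ENV = 72e6      # envelope (AM) sample rate, Hz
T_O = 1 / F_O     # RF cycle time T_o, s
T_ENV = 1 / F_ENV # one envelope sample, s


def delay_ns(n_cycles):
    """AM-path lag of n RF cycles, expressed in nanoseconds."""
    return n_cycles * T_O * 1e9


def delay_in_env_samples(n_cycles):
    """The same lag expressed as a fraction of one envelope sample."""
    return n_cycles * T_O / T_ENV


for n in (0, 2, 4, 6, 8, 10):
    print(f"{n:2d} T_o -> {delay_ns(n):5.2f} ns = {delay_in_env_samples(n):.1f} envelope samples")
```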

Fig. 25. (a) Measured spectrum, close-in. (b) MATLAB simulated spectrum illustrating the influence of the delay between the AM and PM signals on OOB emission for the employed 16-QAM signal. (c) Measured spectrum, far-out.

The PA achieves an average PAE of 26.5% at an average  $P_{out}$  of 15.7 dBm while generating the 10-MHz 16-QAM signal. The aliased artifacts in Fig. 25(c) are caused by sampling the amplitude at 72 MHz and can be further reduced by increasing the sampling frequency and/or using a higher order filtering function. For example, a first- or second-order hold digital filter can reconstruct the continuous-time envelope from the discrete samples through linear (or higher order) interpolation instead of holding each sample value for one sample interval (i.e., zero-order hold).
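The difference between zero-order and first-order hold can be seen in a small Python sketch that upsamples a synthetic envelope by 10 (the 720-MHz/72-MHz ratio used in the measurements) and compares each reconstruction against the underlying smooth waveform. The test envelope and function names are assumptions for illustration; the sketch shows interpolation accuracy only, not the resulting spectral images. Note that the first-order hold below is non-causal: a real-time implementation needs the next sample before it can interpolate, which adds one envelope sample of latency that the PM path would have to match.

```python
import math


def zero_order_hold(samples, factor):
    """Hold each envelope sample for `factor` output points (zero-order hold)."""
    return [s for s in samples for _ in range(factor)]


def first_order_hold(samples, factor):
    """Linearly interpolate between neighboring samples (first-order hold).

    Non-causal form: the point between samples n and n+1 needs sample n+1, so a
    real-time implementation would lag by one envelope sample.
    """
    out = []
    for a, b in zip(samples, samples[1:]):
        out.extend(a + (b - a) * k / factor for k in range(factor))
    out.append(samples[-1])
    return out


def rms_error(approx, reference):
    """RMS difference over the overlapping length of the two sequences."""
    n = min(len(approx), len(reference))
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(approx, reference)) / n)


def envelope(t):
    """Synthetic smooth envelope; t is measured in envelope-sample periods."""
    return 1.0 + 0.5 * math.sin(2 * math.pi * 0.05 * t)


FACTOR = 10  # RF cycles per envelope sample: 720 MHz / 72 MHz
N = 100
samples = [envelope(n) for n in range(N)]
reference = [envelope(i / FACTOR) for i in range((N - 1) * FACTOR)]
err_zoh = rms_error(zero_order_hold(samples, FACTOR), reference)
err_foh = rms_error(first_order_hold(samples, FACTOR), reference)
print(f"ZOH rms error: {err_zoh:.4f}, FOH rms error: {err_foh:.4f}")
```

Linear interpolation tracks the envelope far more closely than holding each sample, which is why higher order holds suppress the staircase artifacts that appear as aliased images.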

TABLE I  
COMPARISON WITH PRIOR WORK

|                                              | [37]            | [51]          | [52]               | [28]          | [53]            | [54]                    | [55]          | This work             |
|----------------------------------------------|-----------------|---------------|--------------------|---------------|-----------------|-------------------------|---------------|-----------------------|
| <b>PA technique</b>                          | Power combining | Digital polar | Switched capacitor | Env. tracking | Digital Doherty | Class-G Doherty         | Env. tracking | <b>Power Inverter</b> |
| <b>Technology</b>                            | 180nm           | 150nm         | 90nm               | 130nm         | 65nm            | 65nm                    | 130nm         | <b>65nm LP</b>        |
| <b>Frequency [GHz]</b>                       | 1.9             | 0.85/ 1.75    | 2.2                | 1.747         | 3.6             | 3.71                    | 1.747         | <b>0.72</b>           |
| <b>V<sub>BAT</sub> [V]</b>                   | N/A             | 3.3           | N/A                | 3.7           | N/A             | N/A                     | 4             | <b>4.8</b>            |
| <b>PA V<sub>DD</sub> [V]</b>                 | 1.8             | N.R.          | 1.25/ 2.5          | N.R.          | 3               | 1.65 & 3                | N.R.          | N/A                   |
| <b>Pout,max [dBm]</b>                        | 34.5            | 27.5 / 29     | 25.2               | 26.3          | 27.3            | 26.7                    | 26            | <b>23</b>             |
| <b>PAE @ Pout,max [%]</b>                    | 50              | N.R.          | 55.2               | N.R.          | 32.5            | 40.2 (drain efficiency) | 40            | <b>40.3</b>           |
| <b>PAE @ 6dB backoff [%]</b>                 | 27*             | N.R.          | 35.1               | N.R.          | 22              | 37 (drain efficiency)   | 28            | <b>40.8</b>           |
| <b>V<sub>BAT</sub>-to-Pout @ Pout,max [%]</b> | N/A            | 13.2 / 22.2   | N/A                | 39            | N/A             | N/A                     | N/A           | <b>40.3</b>           |
| <b>V<sub>BAT</sub>-to-Pout @ 6dB backoff [%]</b> | N/A         | N.R. / 11.1*  | N/A                | N.R.          | N/A             | N/A                     | N/A           | <b>40.8</b>           |
| <b>INL</b>                                   | N/A             | N/A           | <3 LSB             | N/A           | N/A             | N/A                     | N/A           | <0.5 LSB              |
| <b>DNL</b>                                   | N/A             | N/A           | <0.5 LSB           | N/A           | N/A             | N/A                     | N/A           | <0.09 LSB             |

Fig. 26. Top: measured time-domain output of the proposed PA with 32-QAM 20-MHz OFDM ( $f_o = 720$  MHz). Bottom: measured AM step response for a six-step change.

A transient test of a 20-MHz 32-QAM modulated signal, performed using a previous version of the developed test setup, is shown in Fig. 26. At a 100-MHz envelope rate, the PA responds to 3-bit AM codeword changes (observed to be the largest change in the signal) within 2.5 ns, as shown in Fig. 26 (bottom), implying that the maximum bandwidth of the proposed design is up to 400 MHz. However, we note that implementing the amplitude and phase modulators on-chip at such high data rates is challenging and can incur large area and power overheads.
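The 400-MHz figure appears to follow from treating the measured settling time as the minimum envelope update interval, a first-order estimate rather than a formal bandwidth definition:

$$
f_{\max} \approx \frac{1}{t_{\text{resp}}} = \frac{1}{2.5\ \text{ns}} = 400\ \text{MHz}
$$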

Table I summarizes the results of the proposed PA in contrast to prior art. The recursive HoC PA achieves the highest PAE amongst prior-art battery-connected CMOS PAs at both peak and 6-dB backoff power levels.

## VI. CONCLUSION

This paper has presented a new PA design that uses stacked and flying class-D cells arranged in an HoC architecture to efficiently generate high output power while using only low-voltage, thin-oxide CMOS transistors. Individual HoC cells were then made fully modular and reconfigurable to support different voltage conversion ratios and thus high efficiency at 6-dB backoff. By capacitively combining the outputs of HoC slices operating at different ratios, the PA could be dynamically configured to deliver high efficiency at intermediate backoff levels, following a Doherty-like backoff profile without requiring any magnetic components. The proposed HoC PA was implemented in 65-nm LP CMOS, operated directly from 4.8 V without any explicit dc–dc converter, and was shown to achieve >40% battery-to-RF efficiency at both peak power and 6-dB backoff while enabling linear transmission of >10-MHz 16-QAM signals.

## REFERENCES

- [1] P. Haldi, D. Chowdhury, P. Reynaert, G. Liu, and A. M. Niknejad, "A 5.8 GHz 1 V linear power amplifier using a novel on-chip transformer power combiner in standard 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1054–1063, May 2008.
- [2] T. Sowlati and D. M. W. Leenaerts, "A 2.4-GHz 0.18- $\mu$ m CMOS self-biased cascode power amplifier," *IEEE J. Solid-State Circuits*, vol. 38, no. 8, pp. 1318–1324, Aug. 2003.
- [3] J. G. McRory, G. G. Rabjohn, and R. H. Johnston, "Transformer coupled stacked FET power amplifiers," *IEEE J. Solid-State Circuits*, vol. 34, no. 2, pp. 157–161, Feb. 1999.
- [4] S. Pornpromlikit, J. Jeong, C. D. Presti, A. Scuderi, and P. M. Asbeck, "A watt-level stacked-FET linear power amplifier in silicon-on-insulator CMOS," *IEEE Trans. Microw. Theory Techn.*, vol. 58, no. 1, pp. 57–64, Jan. 2010.
- [5] L. R. Kahn, "Single-sideband transmission by envelope elimination and restoration," *Proc. IRE*, vol. 40, no. 7, pp. 803–806, Jul. 1952.
- [6] P. Reynaert and M. S. J. Steyaert, "A 1.75-GHz polar modulated CMOS RF power amplifier for GSM-EDGE," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2598–2608, Dec. 2005.
- [7] J. S. Walling, S. S. Taylor, and D. J. Allstot, "A class-G supply modulator and class-E PA in 130 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 9, pp. 2339–2347, Sep. 2009.
- [8] P. A. Godoy, S. Chung, T. W. Barton, D. J. Perreault, and J. L. Dawson, "A 2.4-GHz, 27-dBm asymmetric multilevel outphasing power amplifier in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 10, pp. 2372–2384, Oct. 2012.
- [9] J. S. Walling *et al.*, "A class-E PA with pulse-width and pulse-position modulation in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1668–1678, Jun. 2009.
- [10] A. Kavousian, D. K. Su, M. Hekmat, A. Shirvani, and B. A. Wooley, "A digitally modulated polar CMOS power amplifier with a 20-MHz channel bandwidth," *IEEE J. Solid-State Circuits*, vol. 43, no. 10, pp. 2251–2258, Oct. 2008.
- [11] C. Lu *et al.*, "A 24.7dBm all-digital RF transmitter for multimode broadband applications in 40nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 332–333.
- [12] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, and J. R. Long, "A wideband 2 × 13-bit all-digital I/Q RF-DAC," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 4, pp. 732–752, Apr. 2014.
- [13] P. E. P. Filho, M. Ingels, P. Wambacq, and J. Craninckx, "A 0.22mm<sup>2</sup> CMOS resistive charge-based direct-launch digital transmitter with -159dBc/Hz out-of-band noise," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan. 2016, pp. 250–252.
- [14] E. Bechthum, G. I. Radulov, J. Braire, G. J. G. M. Geelen, and A. H. M. van Roermund, "A wideband RF mixing-DAC achieving IMD < -82 dBc up to 1.9 GHz," *IEEE J. Solid-State Circuits*, vol. 51, no. 6, pp. 1374–1384, Jun. 2016.
- [15] I. Aoki *et al.*, "A fully-integrated quad-band GSM/GPRS CMOS power amplifier," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2747–2758, Dec. 2008.
- [16] K. H. An *et al.*, "Power-combining transformer techniques for fully-integrated CMOS power amplifiers," *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1064–1075, May 2008.
- [17] S.-M. Yoo, J. S. Walling, E. C. Woo, B. Jann, and D. J. Allstot, "A switched-capacitor RF power amplifier," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 2977–2987, Dec. 2011.
- [18] S.-M. Yoo *et al.*, "A class-G switched-capacitor RF power amplifier," *IEEE J. Solid-State Circuits*, vol. 48, no. 5, pp. 1212–1224, May 2013.
- [19] H. Jin *et al.*, "Efficient digital quadrature transmitter based on IQ cell sharing," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [20] W. Yuan and J. S. Walling, "A multiphase switched capacitor power amplifier," *IEEE J. Solid-State Circuits*, vol. 52, no. 5, pp. 1320–1330, May 2017.
- [21] L. G. Salem, J. F. Buckwalter, and P. P. Mercier, "A recursive house-of-cards digital power amplifier employing a  $\lambda/4$ -less Doherty power combiner in 65nm CMOS," in *Proc. 42nd Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2016, pp. 189–192.
- [22] E. W. McCune, "pPSK for bandwidth and energy efficiency," in *Proc. Eur. Microw. Conf.*, Oct. 2013, pp. 569–572.
- [23] E. McCune, "Signal design and figure of merit for green communication links," in *Proc. IEEE Radio Wireless Symp. (RWS)*, Jan. 2017, pp. 22–25.
- [24] D. K. Su and W. J. McFarland, "An IC for linearizing RF power amplifiers using envelope elimination and restoration," *IEEE J. Solid-State Circuits*, vol. 33, no. 12, pp. 2252–2258, Dec. 1998.
- [25] F. H. Raab, B. E. Sigmon, R. G. Myers, and R. M. Jackson, "L-band transmitter using Kahn EER technique," *IEEE Trans. Microw. Theory Techn.*, vol. 46, no. 12, pp. 2220–2225, Dec. 1998.
- [26] F. Wang *et al.*, "An improved power-added efficiency 19-dBm hybrid envelope elimination and restoration power amplifier for 802.11g WLAN applications," *IEEE Trans. Microw. Theory Techn.*, vol. 54, no. 12, pp. 4086–4099, Dec. 2006.
- [27] M. Hassan, L. E. Larson, V. W. Leung, and P. M. Asbeck, "A combined series-parallel hybrid envelope amplifier for envelope tracking mobile terminal RF power amplifier applications," *IEEE J. Solid-State Circuits*, vol. 47, no. 5, pp. 1185–1198, May 2012.
- [28] P. Arno, M. Thomas, V. Molata, and T. Jerabek, "17.6 Envelope modulator for multimode transmitters with AC-coupled multilevel regulators," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 296–297.
- [29] S. C. Cripps, *RF Power Amplifiers for Wireless Communications*, 2nd ed. Norwood, MA, USA: Artech House, 2006.
- [30] L. G. Salem and P. P. Mercier, "A recursive switched-capacitor DC-DC converter achieving  $2^N$ -1 ratios with high efficiency over a wide output voltage range," *IEEE J. Solid-State Circuits*, vol. 49, no. 12, pp. 2773–2787, Dec. 2014.
- [31] L. G. Salem, J. Warchall, and P. P. Mercier, "A 100nA-to-2mA successive-approximation digital LDO with PD compensation and sub-LSB duty control achieving a 15.1ns response time at 0.5V," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 340–341.
- [32] D. Wolaver, "Basic constraints from graph theory for dc-to-dc conversion networks," *IEEE Trans. Circuit Theory*, vol. 19, no. 6, pp. 640–648, Nov. 1972.
- [33] A. Afsahi, A. Behzad, and L. E. Larson, "A 65nm CMOS 2.4GHz 31.5dBm power amplifier with a distributed LC power-combining network and improved linearization for WLAN applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2010, pp. 452–453.
- [34] A. Afsahi and L. E. Larson, "Monolithic power-combining techniques for watt-level 2.4-GHz CMOS power amplifiers for WLAN applications," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 3, pp. 1247–1260, Mar. 2013.
- [35] A. M. Niknejad and R. G. Meyer, "Analysis, design, and optimization of spiral inductors and transformers for Si RF ICs," *IEEE J. Solid-State Circuits*, vol. 33, no. 10, pp. 1470–1481, Oct. 1998.
- [36] J. R. Long and M. A. Copeland, "The modeling, characterization, and design of monolithic inductors for silicon RF IC's," *IEEE J. Solid-State Circuits*, vol. 32, no. 3, pp. 357–369, Mar. 1997.
- [37] I. Aoki, S. D. Kee, D. B. Rutledge, and A. Hajimiri, "Fully integrated CMOS power amplifier design using the distributed active-transformer architecture," *IEEE J. Solid-State Circuits*, vol. 37, no. 3, pp. 371–383, Mar. 2002.
- [38] A. K. Ezzeddine and H. C. Huang, "The high voltage/high power FET (HiVP)," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2003, pp. 215–218.
- [39] S. Leuschner, J.-E. Mueller, and H. Klar, "A 1.8GHz wide-band stacked-cascode CMOS power amplifier for WCDMA applications in 65nm standard CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, Jun. 2011, pp. 1–4.
- [40] J. R. Long, "Monolithic transformers for silicon RF IC design," *IEEE J. Solid-State Circuits*, vol. 35, no. 9, pp. 1368–1382, Sep. 2000.
- [41] L. Wu, I. Dettmann, and M. Berroth, "A 900-MHz 29.5-dBm 0.13- $\mu$ m CMOS HiVP power amplifier," *IEEE Trans. Microw. Theory Techn.*, vol. 56, no. 9, pp. 2040–2045, Sep. 2008.
- [42] Y. Lu, Y. Wang, Q. Pan, W.-H. Ki, and C. P. Yue, "A fully-integrated low-dropout regulator with full-spectrum power supply rejection," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 3, pp. 707–716, Mar. 2015.
- [43] L. G. Salem, J. G. Louie, and P. P. Mercier, "Flying-domain DC-DC power conversion," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 2830–2842, Dec. 2016.
- [44] R. Staszewski *et al.*, "Software assisted digital RF processor (DRP) for single-chip GSM radio in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 2, pp. 276–288, Feb. 2010.
- [45] H. P. Le, J. Crossley, S. R. Sanders, and E. Alon, "A sub-ns response fully integrated battery-connected switched-capacitor voltage regulator delivering 0.19W/mm<sup>2</sup> at 73% efficiency," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 372–373.
- [46] J. F. Dickson, "On-chip high-voltage generation in MNOS integrated circuits using an improved voltage multiplier technique," *IEEE J. Solid-State Circuits*, vol. 11, no. 3, pp. 374–378, Jun. 1976.
- [47] S. Hu, S. Kousai, J. S. Park, O. L. Chlieh, and H. Wang, "Design of a transformer-based reconfigurable digital polar Doherty power amplifier fully integrated in bulk CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 5, pp. 1094–1106, May 2015.
- [48] E. Kaymaksut and P. Reynaert, "Dual-mode CMOS Doherty LTE power amplifier with symmetric hybrid transformer," *IEEE J. Solid-State Circuits*, vol. 50, no. 9, pp. 1974–1987, Sep. 2015.
- [49] V. Vorapipat, C. Levy, and P. Asbeck, "A wideband voltage mode Doherty power amplifier," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, May 2016, pp. 266–269.
- [50] Coilcraft. (2012). *Chip Inductors for DOCSIS 3.x*. [Online]. Available: <http://www.coilcraft.com/pdfs/ha4031.pdf>
- [51] T. Nakatani, D. F. Kimball, and P. M. Asbeck, "Multiband and wide dynamic range digital polar transmitter using current-mode class-D CMOS power amplifier," in *Proc. IEEE Compound Semiconductor Integr. Circuit Symp. (CSICS)*, Oct. 2013, pp. 1–4.
- [52] S. M. Yoo, J. S. Walling, E. C. Woo, and D. J. Allstot, "A switched-capacitor power amplifier for EER/polar transmitters," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2011, pp. 428–430.
- [53] S. Hu, S. Kousai, J. S. Park, O. L. Chlieh, and H. Wang, "A +27.3dBm transformer-based digital Doherty polar power amplifier fully integrated in bulk CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, Jun. 2014, pp. 235–238.
- [54] S. Hu, S. Kousai, and H. Wang, "A broadband CMOS digital power amplifier with hybrid class-G Doherty efficiency enhancement," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [55] S. C. Lee *et al.*, "A hybrid supply modulator with 10dB ET operation dynamic range achieving a PAE of 42.6% at 27.0dBm PA output power," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.



**Loai G. Salem** (S'11) received the B.Sc. degree in electronics and communication engineering from Cairo University, Giza, Egypt, in 2008, and the M.Sc. degree in microelectronics system design from Nile University, Giza, in 2011. He is currently pursuing the Ph.D. degree in electrical and computer engineering with the University of California at San Diego (UC San Diego), La Jolla, CA, USA.

His current research focuses on efficient dc-to-dc/ac power converters using innovative circuits in scaled CMOS. He has authored over 20 publications, a book chapter, one Egyptian patent, and four U.S. patent applications.

Mr. Salem received the 2016–2017 IEEE Solid-State Circuits Society Predoctoral Achievement Award, the 2016 International Solid-State Circuits Conference Analog Devices Outstanding Student Designer Award, and the ECE Departmental Fellowship at UC San Diego. He was the second runner-up in the 2012 Massachusetts Institute of Technology Arab Business Plan Competition for a silicon IP startup. He was selected as a Finalist for the Qualcomm Innovation Fellowship in 2015.



**James F. Buckwalter** (S'01–M'06–SM'13) received the Ph.D. degree in electrical engineering from the California Institute of Technology, Pasadena, CA, USA, in 2006.

From 1999 to 2000, he was a Research Scientist with Telcordia Technologies, Morristown, NJ, USA. In 2004, he was with the IBM T. J. Watson Research Center, Yorktown Heights, NY, USA. In 2006, he joined the Faculty of the University of California at San Diego, La Jolla, CA, USA, as an Assistant Professor and was promoted to Associate Professor in 2012. He is currently a Professor of Electrical and Computer Engineering with the University of California at Santa Barbara, Santa Barbara, CA, USA.

Dr. Buckwalter was the recipient of the 2004 IBM Ph.D. Fellowship, the 2007 Defense Advanced Research Projects Agency Young Faculty Award, the 2011 NSF CAREER Award, and the 2015 IEEE MTT-S Young Engineer Award.



**Patrick P. Mercier** (S'04–M'12) received the B.Sc. degree in electrical and computer engineering from the University of Alberta, Edmonton, AB, Canada, in 2006, and the S.M. and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 2008 and 2012, respectively.

He is currently an Assistant Professor in electrical and computer engineering with the University of California at San Diego (UCSD), La Jolla, CA, USA, where he is also the Co-Director of the Center for Wearable Sensors. He was a Co-Editor of *Ultra-Low-Power Short Range Radios* (Springer, 2015) and *Power Management Integrated Circuits* (CRC Press, 2016). His current research interests include the design of energy-efficient microsystems, and the design of RF circuits, power converters, and sensor interfaces for miniaturized systems and biomedical applications.

Dr. Mercier received the Natural Sciences and Engineering Research Council of Canada (NSERC) Julie Payette Fellowship in 2006, the NSERC Postgraduate Scholarships in 2007 and 2009, the Intel Ph.D. Fellowship in 2009, the 2009 IEEE International Solid-State Circuits Conference (ISSCC) Jack Kilby Award for Outstanding Student Paper at ISSCC 2010, the Graduate Teaching Award in Electrical and Computer Engineering at UCSD in 2013, the Hellman Fellowship Award in 2014, the Beckman Young Investigator Award in 2015, the DARPA Young Faculty Award in 2015, and the UCSD Academic Senate Distinguished Teaching Award in 2016. He currently serves as a member of the ISSCC Technical Program Committee (Technology Directions Subcommittee) and has served as an Associate Editor of the *IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS* since 2013. He served as an Associate Editor of the *IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS* from 2015 to 2017.