

# An Intrinsically Linear Wideband Polar Digital Power Amplifier

Mohsen Hashemi, *Student Member, IEEE*, Yiyu Shen, *Student Member, IEEE*,  
 Mohammadreza Mehrpoo, *Student Member, IEEE*, Morteza S. Alavi, *Member, IEEE*,  
 and Leo C. N. de Vreede, *Senior Member, IEEE*

**Abstract**—This paper presents an intrinsically linear wideband polar digital power amplifier (DPA) operating in semi class-E/F<sub>2</sub> mode. Without using any type of digital pre-distortion (DPD), the proposed architecture achieves high linearity by accurately controlling its AM–AM and AM–PM characteristic curves through nonlinear sizing, overdrive-voltage control, and multiphase RF clocking without compromising the achievable output power or efficiency. Measurement results of the fabricated prototype in 40-nm bulk CMOS show –46 and –40 dBc adjacent channel power ratio (ACPR) for 20- and 40-MHz orthogonal frequency-division multiplexing (OFDM) signals, respectively. The measured error vector magnitudes (EVM) are –36 dB and –33 dB, respectively. Measured results indicate a *P*<sub>SAT</sub>, peak drain efficiency (DE), and power-added efficiency (PAE) of 14.6 dBm, 44%, and 26%, respectively, using a 0.5-V supply for the output stage at 2.2 GHz.

**Index Terms**—Class-E/F<sub>2</sub>, digital power amplifier (DPA), digital pre-distortion (DPD)-less, efficient, linear, multiphase RF clocking, nonlinear sizing, overdrive-voltage control, wideband.

## I. INTRODUCTION

WHEN designing modern wireless digital transmitters (TX), system integration, output power, energy efficiency, and bandwidth are considered to be key parameters. These parameters are highly influenced by the linearity and efficiency of the digital power amplifier (DPA) as the final stage. A DPA consists of an array of tiny sub-power amplifier (PA) cells in which the TX output power is directly controlled by the number of enabled sub-PAs, which effectively changes the overall width ($W_{\text{eff}}$) of the active devices in the output stage [1]–[7]. Unfortunately, an energy-efficient DPA, normally implemented in class-E, D, or D<sup>-1</sup> [8]–[11], is typically highly nonlinear [12]. In a conventional “linearly sized” switched-mode DPA, as shown in Fig. 1(a), the effective size of the DPA is proportional to the digital amplitude control word (ACW), which leads to significant nonlinearities in its ACW-AM and ACW-phase-modulated (PM) characteristics. The most widely used approach to correct for these nonlinearities is digital pre-distortion (DPD), normally implemented as lookup tables (LUTs) [1]–[5], [13]–[18].

Manuscript received April 25, 2017; revised July 27, 2017 and July 28, 2017; accepted July 30, 2017. Date of publication September 27, 2017; date of current version November 21, 2017. This work is subsidized by the STW research program SEEDCOM (project number 13315), and the Catrene project EAST (CAT121). This paper was approved by Guest Editor Chih-Ming Hung. (*Corresponding author: Mohsen Hashemi*.)

The authors are with the Delft University of Technology, 2628CD Delft, The Netherlands (e-mail: m.hashemi@tudelft.nl).

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2737647

In a typical digital polar TX with LUT-DPDs, the input I/Q data are converted to amplitude $AM[n] = \sqrt{I[n]^2 + Q[n]^2}$ and phase $\phi[n] = \arctan(Q[n]/I[n])$ (evaluated as a four-quadrant arctangent) and then pre-distorted by two independent LUT-DPDs. At the end of the TX chain, the up-converted phase and amplitude signals are combined by the DPA, which implicitly acts as a multiplier. All of these operations are highly nonlinear and result in extensive bandwidth expansion in the amplitude and phase paths. Moreover, since a DPA operates as an RF digital-to-analog converter (RF-DAC) [19], it requires a very high sampling rate to attenuate the spectral replicas and push them away from the carrier frequency. Therefore, when aiming for large video bandwidths, a very high-speed DPD with an effective sampling rate of up to 10–20× the bandwidth of the input signal is required. When transmitting an orthogonal frequency-division multiplexing (OFDM) signal, such a high-speed LUT-DPD can consume up to 5–6× the power of the driver stages. For a Cartesian TX, using LUT-DPD is even more complicated since a 2-D LUT is typically required [6], [13], [14], [20], whereas a polar TX can use two independent 1-D LUTs [3]–[5], [15]–[18]. Hence, while a DPD might be used to linearize the DPA, it would not be an optimal solution, at least not by itself, especially for low output power applications.
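A minimal Python sketch of the polar conversion described above, for illustration only (it is not the authors' TX datapath): it maps I/Q samples to $AM[n]$ and $\phi[n]$ and recombines them. It uses the standard-library four-quadrant `math.atan2` instead of a literal $\arctan(Q/I)$, so the phase lands in the correct quadrant and $I[n] = 0$ is handled; the function names `iq_to_polar` and `polar_to_iq` are hypothetical.

```python
import math


def iq_to_polar(i_samples, q_samples):
    """Return (AM[n], phi[n]) for baseband I/Q samples; phi is in radians."""
    am = [math.hypot(i, q) for i, q in zip(i_samples, q_samples)]
    # atan2 is the four-quadrant arctangent: correct quadrant, no division by I[n]
    phi = [math.atan2(q, i) for i, q in zip(i_samples, q_samples)]
    return am, phi


def polar_to_iq(am, phi):
    """Recombine amplitude and phase (the multiplication performed by the DPA)."""
    i_out = [a * math.cos(p) for a, p in zip(am, phi)]
    q_out = [a * math.sin(p) for a, p in zip(am, phi)]
    return i_out, q_out


if __name__ == "__main__":
    i_in = [1.0, 0.0, -1.0, 0.5]
    q_in = [0.0, 1.0, -1.0, -0.5]
    am, phi = iq_to_polar(i_in, q_in)
    print([round(a, 3) for a in am])                 # [1.0, 1.0, 1.414, 0.707]
    print([round(math.degrees(p), 1) for p in phi])  # [0.0, 90.0, -135.0, -45.0]
```

The recombination in `polar_to_iq` corresponds to the multiplication the DPA performs implicitly. Because `hypot` and `atan2` are nonlinear, the AM and PM sequences occupy a wider bandwidth than the I/Q data, which is the bandwidth expansion mentioned above.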

To avoid such practical hardware and speed constraints of a DPD, an inherently linear DPA is desirable. To this end, in [21], the bias point of a class-B PA array is adaptively tuned using feedback from an analog AM replica to eliminate the DPD. In [22] and [23], linear structures are exploited for the modulator at low output power (~1 dBm) at the expense of lower drain efficiency (DE). More recently, Hashemi *et al.* [24] proposed a linear polar DPA that circumvents the DPD without sacrificing efficiency by using novel linearization techniques, namely, nonlinear sizing of the sub-PA segments, overdrive-voltage tuning, and multiphase RF clocking, as shown in Fig. 1(b). To provide a fundamental understanding of the proposed concept, the linearity of a class-E DPA is analyzed in Section II, followed by a description of the proposed linearization techniques in Section III. In Section IV, the implementation details of the main blocks are described, and the measurement results and conclusions are presented in Sections V and VI, respectively.

## II. CLASS-E DPA LINEARITY ANALYSIS

Switched-mode PAs can be driven directly by digital signals. Therefore, they are logical candidates to be used in a digital-intensive TX solution. In this paper, a segmented class-E DPA, as shown in Fig. 2(a), is adopted for the design of the DPA based on its high efficiency potential and relatively simple output matching network [8], [9], [25]. However, despite its simplicity and high efficiency, a class-E DPA shows significant nonlinearity in ACW-AM and ACW-PM conversions. In the following, the linearity of a class-E DPA is analyzed.

Fig. 1. (a) Conventional polar DPA with linear sizing resulting in ACW-AM and ACW-PM distortion. (b) Proposed polar DPA with nonlinear sizing, multiphase RF clocking, and overdrive voltage control.

Fig. 2. (a) Conventional single-ended 9-bit Class-E DPA. (b) Time-domain output waveforms showing ACW-AM and ACW-PM distortion.

### A. DC Characteristic Curve and Dynamic Load Lines

In a switched-mode DPA, unlike in an analog PA, the amplitude of the input RF signal applied to the gates of the transistors is constant for different output power levels, while the overall effective width of the switched-on devices varies according to the input ACW. The ratio of the total effective width $W_{\text{eff}}$ to the width of one unit device $W_0$ is defined as the relative sizing factor $K_w = W_{\text{eff}}/W_0$. Therefore, to analyze the dc characteristics of a class-E DPA as a function of $K_w$, $I_{DS}$ versus $V_{DS}$ curves can be plotted for different values of $K_w$ with a fixed $V_{GS}$. In Fig. 3(a), $I_{DS}$ versus $V_{DS}$ curves are plotted for a 9-bit DPA with 511 uniform NMOS switches with $V_{GS} = 1.1$ V. For each curve, two main operation regions can be distinguished: triode (for small $V_{DS}$) and saturation (for large $V_{DS}$). The dynamic load lines for a typical nonideal class-E DPA with $V_{DD} = 0.5$ V are simulated and plotted in Fig. 3(b). It can be seen that, as $K_w$ increases, the swing of the drain voltage increases and pushes the operation region from a semi-saturation region toward the triode region. At large values of $K_w$, the DPA is fully switched between the triode and off-mode regions, resulting in the typical class-E drain voltage waveform shown in Fig. 3(c). In addition, as $K_w$ increases, the drain current waveform changes from a square wave into a typical class-E waveform, as shown in Fig. 3(d). For small values of $K_w$ ($K_w < 30$), the DPA can be modeled as a digitally controlled switching current-source DPA, which is a linear DPA. For large values of $K_w$, it can be modeled as a resistive-mode class-E DPA whose on-resistance is modulated, resulting in highly nonlinear behavior. The linear range of the DPA can be extended either by increasing $V_{DD}$ (which raises reliability issues) or by decreasing $V_{GS}$, although, in practice, the overall linearity is still strongly influenced and degraded by the “nonlinear region.”

### B. Analysis of ACW-AM and ACW-PM Distortion Mechanism

The class-E DPA is implemented in a push-pull configuration to suppress even harmonics at the output. A transformer (TRF) with a 1:3 turns ratio is implemented as the balun. According to the electromagnetic simulation results at 2 GHz, the primary and secondary inductances and resistances, the input load resistance, the magnetic coupling factor, and the passive efficiency are $L_P = 0.41$ nH, $L_S = 3.1$ nH, $R_P = 0.5$ Ω, $R_S = 5.5$ Ω, $R_L = 1.73$ Ω, $K_m = 0.756$, and $\eta_{\text{passive}} = 73.3\%$, respectively. For $1.4 \text{ GHz} < f_C < 2.8 \text{ GHz}$, the input reactance at all of the higher harmonics is negative ($-6\ \Omega < X_{IN} < 0$) and comparable with the fundamental input load resistance, as required by class-E load conditions [26]. The conceptual schematic of the DPA and its lumped model are illustrated in Fig. 4(a) and (b), respectively. The odd-mode equivalent circuit is shown in Fig. 4(c). In general, a class-E PA is not a linear time-invariant (LTI) system.

Fig. 3. Simulated (a) dc curves of $I_{DS}$ versus $V_{DS}$ for different $K_W$ with a fixed $V_{GS} = 1.1$ V, (b) dynamic load lines for a typical class-E DPA with $V_{DD} = 0.5$ V, (c) drain voltage waveforms, and (d) drain current waveforms.

Fig. 4. (a) Push-pull class-E DPA, (b) its lumped model, (c) its odd-mode half circuit, and (d) its odd-mode half circuit LTI (Norton equivalent) model for simplified theoretical analysis of linearity.

Fig. 5. Simulated and calculated (a) ACW-AM conversion curves of a class-E DPA and (b) ACW-PM conversion curve of a class-E DPA.

Therefore, for theoretical simplicity, we use a Norton equivalent model in which the switching transistors are replaced by a set of parallel current sources to analyze the dependency of the output amplitude and phase on $K_W$. These current sources represent the Fourier series of the drain current [26], as shown in Fig. 4(d). To model the nonlinearity of the resulting drain current caused by the limited output resistance, a resistance that is inversely proportional to $K_W$ is included in parallel with the current source. Consequently, the output current of the transistors is given by

$$\begin{aligned} I_D &= \sum_i^N (f_i(K_W) I_{Hi} - I_{RD}) \\ &= \sum_i^N \left( f_i(K_W) I_{Hi} \left( 1 - \frac{K_W Z_{IN}}{R_{D0}} \right) \right) \end{aligned} \quad (1)$$

where $f_i(K_W)$ represents the amplitude and phase of the $i$th harmonic as a function of $K_W$, $I_{Hi}$ is the $i$th harmonic component of the current, $R_{D0}$ is the output resistance of a unit transistor, $Z_{IN}$ is the total input impedance seen by the current sources, and $I_{RD}$ is the component of the output current $I_D$ that is nonlinear with regard to $K_W$. Since the variation of the transistors' drain capacitance is small, we consider $C_D$ to be almost constant for now<sup>1</sup>. Next, we assume that the amplitude of the first harmonic of the current sources increases proportionally to $K_W$ [i.e., $f_1(K_W) = K_W$]. However, the resulting drain current $I_D$ increases nonlinearly because of the decreasing output resistance (unless it sees zero load impedance). By neglecting the higher harmonics and considering only the first harmonic of the input current, $I_{H1}$, the output voltage is calculated as (2), where $L_1$ and $L_2$ are the leakage and magnetizing inductances, respectively, $R_{LP}$ is the load resistance seen from the primary side of the transformer, and $C_D$ is the total drain capacitance. The first term in the numerator of (2) represents the linear gain of the DPA, and the remainder represents the term responsible for amplitude and phase distortion. Equation (2) can be used both for the linear operation region of the DPA, where $R_{D0}$ is large, and for the nonlinear region, where $R_{D0}$ is smaller. In Fig. 5, the simulated and calculated $K_W$-AM and $K_W$-PM curves for a class-E DPA with a total width of 2.5 mm are plotted. By assuming $K_m \approx 1$ in the transformer for theoretical simplicity, the analytical expressions for the output amplitude and phase error are obtained as (3) and (4), respectively, where $q = 1/[\omega\sqrt{L_2 C_D}]$. For the nonlinear region ($K_W > 30$), $R_{D0}$ is assumed to be one third of its value in the linear region. Although, as mentioned earlier, two different modes of operation are distinguished, in both modes the amplitude and phase distortions are mainly caused by the large variation in the effective output resistance of the transistors. The variation of $C_D$ is $\sim 0.043$ fF/$\mu$m (less than 1%), and it has a negligible effect on the normalized $K_W$-AM curve. Even with a larger variation of 0.1 fF/$\mu$m, it increases the phase distortion by only 1°, as shown in Fig. 5(b).

<sup>1</sup> $C_D = K_W(C_{DS0} + C_{GD,Triode,0}) + (K_{W,MAX} - K_W)(C_{DS0} + C_{GD,OFF,0}) + C_{PAR} + C_{EXT}$. The variation of $C_D$ is less than 5% without $C_{PAR} + C_{EXT}$, and it can be less than 1% with a large $C_{PAR} + C_{EXT}$.
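The algebra behind (2)–(4) can be checked numerically. The Python sketch below, which uses only the standard library, solves the odd-mode Norton half circuit directly (our reading of Fig. 4(d): the current source $K_W I_{H1}$ in parallel with $R_{D0}/K_W$ and $C_D$, driving $L_1$ in series with $L_2 \parallel R_{LP}$), compares the result with the closed form of (2), and then sets $L_1 = 0$ (i.e., $K_m \approx 1$) to compare with (3) and (4). The class `HalfCircuit`, the function names, and all component values are hypothetical placeholders, not the extracted transformer parameters.

```python
import cmath
import math
from dataclasses import dataclass


@dataclass
class HalfCircuit:
    """Odd-mode Norton half circuit; all values are placeholders."""
    ih1: float  # fundamental current of one unit cell (A)
    rd0: float  # output resistance of one unit transistor (ohm)
    cd: float   # total drain capacitance, assumed constant (F)
    l1: float   # transformer leakage inductance (H)
    l2: float   # transformer magnetizing inductance (H)
    rlp: float  # load resistance seen from the primary (ohm)
    w: float    # angular carrier frequency (rad/s)


def vout_nodal(kw, c):
    """Nodal solution: Kw*I_H1 into (R_D0/Kw || C_D), then L1 in series with (L2 || R_LP)."""
    z_p = (1j * c.w * c.l2 * c.rlp) / (c.rlp + 1j * c.w * c.l2)
    z_b = 1j * c.w * c.l1 + z_p            # impedance seen from the drain node
    y_d = kw / c.rd0 + 1j * c.w * c.cd     # drain-node admittance
    v_drain = kw * c.ih1 / (y_d + 1 / z_b)
    return v_drain * z_p / z_b             # divider onto the load branch


def vout_eq2(kw, c):
    """Closed form (2); the last imaginary-part term is w^2*L1*L2*C_D*R_D0 inside jw[...]."""
    num = kw * c.ih1 * c.rlp * 1j * c.w * c.l2 * c.rd0
    re = c.rd0 * c.rlp - c.w**2 * (kw * c.l1 * c.l2 + (c.l1 + c.l2) * c.cd * c.rd0 * c.rlp)
    im = c.w * (c.l2 * c.rd0 + kw * (c.l1 + c.l2) * c.rlp - c.w**2 * c.l1 * c.l2 * c.cd * c.rd0)
    return num / (re + 1j * im)


def am_eq3(kw, c):
    """Amplitude from (3), valid for K_m ~ 1 (L1 -> 0)."""
    q2 = 1 / (c.w**2 * c.l2 * c.cd)
    den = math.sqrt((q2 * c.rd0 * c.rlp - c.rd0 * c.rlp) ** 2
                    + c.w**2 * q2**2 * (c.l2 * c.rd0 + kw * c.l2 * c.rlp) ** 2)
    return kw * c.ih1 * c.rlp * (c.w * q2 * c.l2 * c.rd0) / den


def phase_eq4_deg(kw, c):
    """Output phase from (4) in degrees, for K_m ~ 1 and q > 1."""
    q2 = 1 / (c.w**2 * c.l2 * c.cd)
    return 90.0 - math.degrees(math.atan(c.w * q2 * c.l2 * (c.rd0 + kw * c.rlp)
                                         / ((q2 - 1) * c.rd0 * c.rlp)))


if __name__ == "__main__":
    c = HalfCircuit(ih1=1e-3, rd0=300.0, cd=5e-12, l1=0.0, l2=0.4e-9,
                    rlp=1.73, w=2 * math.pi * 2.2e9)
    a1 = am_eq3(1, c)
    for kw in (1, 64, 256, 511):
        # normalized gain (1.0 = no compression) and absolute output phase
        print(kw, round(am_eq3(kw, c) / (kw * a1), 3), round(phase_eq4_deg(kw, c), 2))
```

Printing the normalized gain and phase for a few $K_W$ values shows the compression and phase drift that nonlinear sizing and multiphase clocking later address. The check assumes $q > 1$, so that the arctangent in (4) stays on its principal branch.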

### C. Power and Efficiency Roll-Off

In a switched-mode amplifier, the output power is a function of  $K_W$ , which is given by

$$P_{OUT} = \frac{|V_{OUT}|^2}{2R_{LP}} = \frac{AM^2(K_W)}{2R_{LP}} \quad (5)$$

$$V_{OUT} = \frac{(K_W I_{H1} R_{LP}) \times j\omega L_2 R_{D0}}{R_{D0} R_{LP} - \omega^2 [K_W L_1 L_2 + (L_1 + L_2) C_D R_{D0} R_{LP}] + j\omega [L_2 R_{D0} + K_W (L_1 + L_2) R_{LP} - \omega^2 L_1 L_2 C_D R_{D0}]} \quad (2)$$

$$AM(K_W) \cong K_W I_{H1} R_{LP} \left( \frac{\omega q^2 L_2 R_{D0}}{\sqrt{(q^2 R_{D0} R_{LP} - R_{D0} R_{LP})^2 + \omega^2 q^4 (L_2 R_{D0} + K_W L_2 R_{LP})^2}} \right) \quad (3)$$

$$\phi_{Err}(K_W) \cong 90^\circ - \arctan \left( \frac{\omega q^2 L_2 (R_{D0} + K_W R_{LP})}{(q^2 - 1) R_{D0} R_{LP}} \right) \quad (4)$$



Fig. 6. Simulated DE of ideal class-E/F<sub>2</sub>, class-E, and class-B DPAs versus output voltage.

where  $AM(K_w)$  is given by (3). The dc power consumption of the PA can be calculated as follows:

$$P_{dc} = V_{DD}I_{dc} = P_{dc,Max}P_{NORM}(AM(K_w)) \quad (6)$$

where $P_{dc,Max} = V_{DD}I_{dc,Max} = K_p V_{DD}^2/R_{LP}$, in which $K_p$ is the class-E power scaling factor [27]. $P_{NORM}(AM)$ is a unitless, monotonically increasing function of the output voltage, normalized to the range [0, 1]. The curvature of $P_{NORM}(AM)$ depends on the impedance seen by the drain of the transistors at all harmonics, with the first and second harmonics being dominant due to the on-resistance modulation. For simplicity, if we ignore the second-harmonic impedance (which in practice means that the second harmonic sees an open circuit, as in class-F<sup>-1</sup> or class-E/F<sub>2</sub> tuning), then $P_{NORM} \cong AM(K_w)/AM_{Max}$, so the dc power and DE are given by

$$P_{dc} = V_{DD}I_{dc} = \frac{K_p V_{DD}^2}{R_{LP}AM_{Max}}AM(K_w) \quad (7)$$

$$\begin{aligned} \eta &= \frac{P_{OUT}}{P_{dc}} \cong \left( \frac{AM^2(K_w)}{2R_{LP}} \right) / \left( K_p V_{DD}^2 \frac{AM(K_w)}{R_{LP}AM_{Max}} \right) \\ &= \frac{AM_{Max}}{2K_p V_{DD}^2} AM(K_w). \end{aligned} \quad (8)$$

The simulated DE versus the normalized output AM is plotted in Fig. 6 for ideal class-E, class-E/F<sub>2</sub>, and class-B DPAs/PAs. It can be seen that the DE of a class-E/F<sub>2</sub> DPA is an almost linear function of the output amplitude, similar to that of a class-B PA, and that it achieves a higher DE at power back-off than a class-E DPA owing to its higher second-harmonic impedance.
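To illustrate (5), (7), and (8), the following Python sketch computes the DE of the simplified model in which the second-harmonic load is ignored ($P_{NORM} \cong AM/AM_{Max}$). The values are placeholders: `kp = 0.5768` is the factor commonly quoted for an ideal class-E PA, and `am_max` is chosen so that the peak DE equals 100%. The sketch only shows the trend predicted by (8); it does not reproduce the simulated curves of Fig. 6, and `de_eq8` is a hypothetical name.

```python
import math


def de_eq8(am_norm, kp, vdd, rlp, am_max):
    """DE from (5), (7), and (8), assuming P_NORM ~ AM/AM_Max (second harmonic ignored)."""
    am = am_norm * am_max
    p_out = am**2 / (2 * rlp)                  # (5)
    p_dc = kp * vdd**2 * am / (rlp * am_max)   # (7)
    return p_out / p_dc                        # (8) = AM_Max * AM / (2 * Kp * VDD^2)


if __name__ == "__main__":
    kp, vdd, rlp = 0.5768, 0.5, 1.73          # placeholder values
    am_max = math.sqrt(2 * kp) * vdd          # sets the peak DE to 100 %
    for backoff_db in (0, 3, 6, 10):
        am_norm = 10 ** (-backoff_db / 20)    # amplitude back-off
        print(backoff_db, "dB:", round(de_eq8(am_norm, kp, vdd, rlp, am_max), 3))
```

Since (8) is linear in $AM(K_w)$, the DE drops to about 50% at 6-dB amplitude back-off, which is the class-B-like behavior described above; note also that $R_{LP}$ cancels out of the ratio.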

## III. PROPOSED LINEARIZATION TECHNIQUES

### A. Nonlinear Sizing

As mentioned before, in a conventional DPA, the total effective size of active devices is a linear function of the input ACW, which we refer to as *linear sizing* or *segmentation*. However, according to (3) and the simulation results shown in Fig. 5(a), as the effective size of the DPA linearly increases, the output amplitude increases nonlinearly due to the on-resistance modulation. For simplicity, if we assume $q = 1$ in (3) (similar to class-D<sup>-1</sup> or F<sup>-1</sup>), we derive a simple equation to describe the amplitude nonlinearity as follows, which is similar to the calculated results in [3] and [28]:

$$AM(K_w) = \frac{K_w R_{D0}}{K_w R_{LP} + R_{D0}} R_{LP} I_{H1} = \frac{K_w}{K_w K_{NL} + 1} R_{LP} I_{H1} \quad (9)$$

where  $K_{NL} = R_{LP}/R_{D0}$  is defined as the nonlinearity factor. As  $K_{NL}$  increases (by lower  $R_{D0}$  or higher  $R_{LP}$ ), which is beneficial for increasing the DE, the concavity of the ACW-AM curve of a linearly sized DPA increases. In a linearly sized DPA,  $W_{eff,L} = f(ACW) = W_0 \cdot ACW$  and thus,  $K_w = ACW$  which results in a high ACW-AM distortion, as simulated and depicted in Fig. 7. However, by nonlinearly sizing the sub-PA cells, i.e., making the sizing factor  $K_w$  a nonlinear function of ACW, it is possible to linearize the ACW-AM conversion without the need to pre-distort the ACW data, as shown in Fig. 7. Thus, by assuming  $AM(K_w) = G \cdot ACW \cdot R_{LP} I_{H1}$  where  $G$  is a constant, we get

$$K_w = \frac{G \cdot ACW}{1 - G \cdot K_{NL} ACW}. \quad (10)$$

In order to have the same total effective size of the DPA as a linearly sized DPA, we should have  $G = 1/(K_{NL} ACW_{Max} + 1)$ . So, the total nonlinear effective size ( $W_{eff,NL}$ ) is given by

$$W_{eff,NL}[ACW] = \frac{W_0 \cdot ACW}{1 + K_{NL}(ACW_{Max} - ACW)}. \quad (11)$$
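The following Python sketch, under the simplifying assumption $q = 1$ behind (9), sizes the array according to (11) and verifies that the resulting ACW-AM curve is a straight line with slope $G = 1/(K_{NL} ACW_{Max} + 1)$, whereas linear sizing ($K_w = ACW$) is compressed. $K_{NL} = 0.006$ is an arbitrary placeholder, not a value from the fabricated DPA, and the helper names are hypothetical.

```python
def am_linear_sized(kw, k_nl):
    """Normalized output amplitude AM/(R_LP*I_H1) from (9), with q = 1."""
    return kw / (kw * k_nl + 1)


def weff_nl(acw, acw_max, k_nl, w0=1.0):
    """Total effective width of the nonlinearly sized DPA from (11)."""
    return w0 * acw / (1 + k_nl * (acw_max - acw))


def max_inl(acw_max, k_nl, nonlinear):
    """Largest deviation of the max-normalized ACW-AM curve from a straight line."""
    def size(acw):  # K_w = W_eff / W_0, with W_0 = 1
        return weff_nl(acw, acw_max, k_nl) if nonlinear else float(acw)

    full_scale = am_linear_sized(size(acw_max), k_nl)
    return max(abs(am_linear_sized(size(a), k_nl) / full_scale - a / acw_max)
               for a in range(acw_max + 1))


if __name__ == "__main__":
    acw_max, k_nl = 511, 0.006          # 9-bit DPA; K_NL is a placeholder
    print("linear sizing, max error   :", round(max_inl(acw_max, k_nl, False), 3))
    print("nonlinear sizing, max error:", max_inl(acw_max, k_nl, True))
```

By construction, (11) also keeps $W_{\text{eff,NL}}[ACW_{Max}] = ACW_{Max} W_0$, so both arrays have the same total size and full-scale output.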

From (9)–(11), we can calculate the extra dynamic range (DR) gained by using nonlinear sizing compared with a linearly sized DPA. Considering the DPA as an amplitude quantizer, its DR can be defined as $DR = dB(P_{OUT}|_{ACW=Max}/P_{OUT}|_{ACW=1})$. If a linearly sized DPA and a nonlinearly sized DPA have the same total size and resolution, their maximum output powers are also the same. Therefore, taking the ratio of their output amplitudes at $ACW = 1$, we get

$$\begin{aligned} \Delta DR(dB) &= dB\left(\frac{1 + K_{NL} ACW_{Max}}{1 + K_{NL}}\right) \\ &\approx dB(1 + K_{NL} ACW_{Max}). \end{aligned} \quad (12)$$

As can be seen in Fig. 5(a), for a typical class-E DPA, the output amplitude at $ACW = 1$ is $\sim 3\times$ that of a linear DPA. Thus, even when an ideal DPD is used to linearize a nonlinear DPA, the DR is still 10–12 dB lower than that of a linear DPA with the same number of bits. Compensating for this loss of DR requires at least two extra bits in the DPA, which increases the complexity of the preceding digital circuitry and the DPD block. Furthermore, as $K_{NL}$ increases, the DR benefit of nonlinear sizing at the same resolution also increases.
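As a worked example of (12), the Python sketch below evaluates the exact and approximate DR gain for a 9-bit DPA. It interprets dB(·) of the amplitude ratio as $20\log_{10}$ and converts dB to bits with the usual ~6.02 dB per bit rule; the $K_{NL}$ values are placeholders and `delta_dr_db` is a hypothetical name. With $K_{NL} = 0.006$, the gain is roughly 12 dB, or about two bits, which is in line with the estimate above.

```python
import math


def delta_dr_db(k_nl, acw_max, approx=False):
    """DR gained by nonlinear sizing, (12); dB of an amplitude ratio = 20*log10."""
    if approx:
        ratio = 1 + k_nl * acw_max                    # second line of (12)
    else:
        ratio = (1 + k_nl * acw_max) / (1 + k_nl)     # first line of (12)
    return 20 * math.log10(ratio)


if __name__ == "__main__":
    acw_max = 511                        # 9-bit DPA
    for k_nl in (0.002, 0.006, 0.02):    # placeholder nonlinearity factors
        d = delta_dr_db(k_nl, acw_max)
        print(k_nl, round(d, 2), "dB, about", round(d / 6.02, 2), "extra bits")
```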

Fig. 7. (a) Total effective size $W_{\text{eff}}$ ($\mu\text{m}$) versus ACW, (b) simulated normalized output AM versus $W_{\text{eff}}$ ($\mu\text{m}$), and (c) resulting simulated ACW-AM curves for a DPA with linear sizing, nonlinear sizing, and segmented nonlinear sizing.

Fig. 8. Simulated (a) ACW-AM and (b) output PSD of a nonlinearly sized DPA for different numbers of segments, assuming no ACW-PM or other nonidealities.

Fig. 9. Concept of overdrive-voltage tuning technique to control the linearity of ACW-AM curve.

In a practical design, where $q \neq 1$ and $K_m \neq 1$, it is easier to extract $W_{\text{eff}}$ by simulating the ACW-AM curve of a linearly sized DPA and then inverting, normalizing, and multiplying it by $ACW_{Max} W_0$. However, unlike in a linearly sized DPA, where the difference between the total effective sizes of consecutive codes is constant and equal to one LSB unit cell size ($W_0$), here the difference of the total effective size given by (11) has a different value for each $ACW > 0$, as follows:

$$\begin{aligned} W_{\text{Device,NL}}[ACW] &= \text{diff}(W_{\text{eff,NL}}(ACW)) \\ &= W_{\text{eff,NL}}[ACW] - W_{\text{eff,NL}}[ACW - 1]. \end{aligned} \quad (13)$$

This means that, in order to implement a fully nonlinearly sized $N$-bit DPA, we need $2^N - 1$ different devices, which is not only very labor intensive but would also result in high power consumption of the driver stages. In order to benefit from the commonly used binary–unary segmentation to reduce the power consumption of the drivers, *segmented nonlinear sizing* can be used instead of fully nonlinear sizing. In a segmented nonlinearly sized DPA, the $W_{\text{eff,NL}}(ACW)$ curve is divided into $N_{\text{Seg}}$ segments in which the effective size $W_{\text{eff,NL},i}$ of the activated transistors in the $i$th segment increases linearly, resulting in a piecewise-linear approximation of $W_{\text{eff,NL}}$, as shown in Fig. 7(a). Thus, $W_{\text{eff,NL},i} = W_i(ACW - P_{i-1})$ for $P_{i-1} < ACW \le P_i$, in which $W_i$ is the unit size of the $i$th segment and $P_{i-1}$ is the sum of the ACW ranges ($\Delta P_j$) of the previous segments, i.e., $P_i = \sum_{j=1}^{i} \Delta P_j$ and $P_0 = 0$. By knowing $W_{\text{eff,NL}}(ACW)$ either analytically [from (11)] or experimentally, we can calculate $W_i$ for $i > 0$ as follows:

$$W_i = \frac{W_{\text{eff,NL}}[P_i] - W_{\text{eff,NL}}[P_{i-1}]}{\Delta P_i}. \quad (14)$$

Fig. 10. Simulated (a) ACW-AM curves of a scenario showing how to correct for the process variation from TT to FF corner by controlling the overdrive voltage and (b) effect of temperature variation on normalized ACW-AM and ACW-PM conversion curves.

Fig. 11. (a) Basic concept of multiphase RF clocking. (b) Resulting simulated phase distortions of a DPA with the conventional single-phase RF clocking and multiphase RF clocking.

In order to decrease the complexity of an $N$-bit DPA array, we select the number of segments as a power of two, i.e., $N_{\text{Seg}} = 2^m$, and choose the same range for all segments, equal to $\Delta P = 2^{N-m} = 2^n$. Therefore, the array is implemented in $2^m$ rows (segments), which can be realized entirely by unary cells, binary cells, or a combination of both. Segmented nonlinear sizing results in a quasi-linear ACW-AM curve, as simulated and depicted in Fig. 8(a), whose linearity depends on the number and the range of the segments. Assuming $2^m$ segments of equal range and no ACW-PM distortion, the output power spectral density (PSD) of a nonlinearly sized DPA is plotted in Fig. 8(b) for different numbers of segments. As can be seen, eight segments are sufficient to reach an acceptable linearity while leaving enough margin for other sources of nonidealities.
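The sketch below computes, for a 9-bit DPA with $2^m = 8$ equal-range segments ($\Delta P = 64$), the per-code device widths of (13) and the per-segment unit widths of (14), and then rebuilds the piecewise-linear $W_{\text{eff}}$ curve. It assumes the ideal curve of (11) with a placeholder $K_{NL}$; the segment boundaries are taken as the codes at which each segment becomes fully on, with the last one clipped to $ACW_{Max} = 511$. In a real design, $W_{\text{eff,NL}}$ would instead come from the simulated ACW-AM curve, as described earlier. All names are hypothetical.

```python
def weff_nl(acw, acw_max, k_nl, w0=1.0):
    """Ideal nonlinear total width, (11)."""
    return w0 * acw / (1 + k_nl * (acw_max - acw))


def device_widths_full(acw_max, k_nl, w0=1.0):
    """(13): one distinct device per code step, i.e., 2^N - 1 different widths."""
    return [weff_nl(a, acw_max, k_nl, w0) - weff_nl(a - 1, acw_max, k_nl, w0)
            for a in range(1, acw_max + 1)]


def segment_unit_widths(n_bits, m, k_nl, w0=1.0):
    """(14) for 2^m equal-range segments of 2^(N-m) codes each.

    bounds[i] is the ACW at which segment i is fully on (bounds[0] = 0); the last
    boundary is clipped to ACW_Max = 2^N - 1, so the top segment spans one code less.
    """
    acw_max = 2**n_bits - 1
    dp = 2 ** (n_bits - m)
    bounds = [min(i * dp, acw_max) for i in range(2**m + 1)]
    widths = [(weff_nl(hi, acw_max, k_nl, w0) - weff_nl(lo, acw_max, k_nl, w0)) / (hi - lo)
              for lo, hi in zip(bounds[:-1], bounds[1:])]
    return bounds, widths


def weff_segmented(acw, bounds, widths):
    """Piecewise-linear total width realized by the segmented array."""
    total = 0.0
    for lo, hi, w_i in zip(bounds[:-1], bounds[1:], widths):
        if acw <= lo:
            break
        total += w_i * (min(acw, hi) - lo)   # cells enabled in this segment
    return total


if __name__ == "__main__":
    n_bits, m, k_nl = 9, 3, 0.006            # 8 segments; K_NL is a placeholder
    acw_max = 2**n_bits - 1
    bounds, widths = segment_unit_widths(n_bits, m, k_nl)
    print("segment boundaries:", bounds)
    print("unit widths / W0  :", [round(w, 3) for w in widths])
    err = max(abs(weff_segmented(a, bounds, widths) - weff_nl(a, acw_max, k_nl))
              for a in range(acw_max + 1))
    print("max |W_seg - W_NL| / W0:", round(err, 2))
```

Because (11) is convex in ACW, the unit width grows from segment to segment, and the piecewise-linear curve matches (11) exactly at the segment boundaries, with the largest error occurring inside each segment.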

### B. Overdrive Voltage Tuning for PVT Compensation

The accuracy of the nonlinear sizing technique depends on the accuracy of the calculated or simulated $W_{\text{eff}}$-AM curve of the DPA. However, as predicted by (2) and (3), in practice this curve varies with process/voltage/temperature (PVT) variations, which change $R_{D0}$. In addition, any change in the carrier frequency, the load network, or the antenna impedance, which can be modeled as a variation in $R_{LP}$, degrades the linearity of the ACW-AM curve. For small variations of $V_{DD}$ ($<10\%$), the normalized ACW-AM curve is almost unchanged. However, for larger variations, the linearity of the ACW-AM curve degrades because the ranges of the current-source-mode and resistive-mode regions change. Nonetheless, any PVT/frequency/load variation can generally be modeled as a variation in the nonlinearity factor $K_{NL}$, which can be corrected by tuning $R_{D0}$. Therefore, as predicted by the following simplified equation, we can linearize the normalized ACW-AM curve again by correcting $K_{NL}$:

$$AM_{\text{NORM}}(W_{\text{eff}}) \cong \frac{W_{\text{eff}}}{W_{\text{eff},\text{Max}}} \left( \frac{W_{\text{eff},\text{Max}} K_{NL} + W_0}{W_{\text{eff}} K_{NL} + W_0} \right). \quad (15)$$

Fig. 12. (a) Simplified LTI model of multiphase RF clocking. Phasor representation of the output signal and the currents of each segment for (b) conventional DPA, (c) DPA with multiphase RF clocking requiring positive phase-offsets, and (d) DPA with multiphase RF clocking with negative phase-offsets implementable by positive delay-offsets.

Fig. 13. (a) Flowchart of delay-offsets optimization for ACW-PM correction. (b) Simulated effect of multiphase RF clocking on ACW-AM conversion.

Fig. 14. Capacitive harmonic tuning for efficiency enhancement. (a) Circuit, and (b) power and efficiency simulation versus duty cycle.

Fig. 15. (a) Overall block diagram of the proposed DPA. (b) Circuit of sub-PA. (c) Single-ended to differential converter.

Assuming $V_{DD} < V_{OD}$, where $V_{OD} = V_{GS} - V_{TH}$ is the overdrive voltage of the DPA output transistors, $R_{D0}$ is given by $R_{D0} = [(W/L) \times K_n \times V_{OD}]^{-1}$ [29]. Thus, in order to tune $R_{D0}$ and correct $K_{NL}$, the overdrive voltage can be tuned by changing the amplitude of the RF clock applied to the gates of the transistors (i.e., $V_{GS}$). This is feasible by tuning the dc supply of the buffers driving the transistors. Fig. 9 shows the concept of the overdrive tuning technique, which controls the linearity of the ACW-AM curve by making it more concave (increasing the overdrive voltage) or more convex (decreasing the overdrive voltage). For example, suppose that the DPA is nonlinearly sized to be linear at an ambient temperature of $T_0$ in the typical-typical (TT) process corner, but the chip is fabricated in the fast-fast (FF) process corner or, during operation, the temperature is below $T_0$; then $R_{D0}$ decreases. Therefore, to correct $K_{NL}$, the $V_{DD}$ of the buffers driving the output transistors should be decreased to lower $V_{OD}$ and increase $R_{D0}$. In Fig. 10(a), the simulation results of such a scenario for a process variation from the TT to the FF corner are depicted, showing less than 0.1-dB decrease in the output power. In Fig. 10(b), the simulated effect of temperature variation is shown, which indicates a negligible impact on linearity.
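A rough Python sketch of the compensation described above, assuming the deep-triode relation $R_{D0} = [(W/L) K_n V_{OD}]^{-1}$ and a corner or temperature change that only scales $K_n$ (a threshold-voltage shift would also change $V_{OD}$ for a fixed gate swing, which this sketch ignores). It evaluates (15) for an array sized by (11) for the TT value of $K_{NL}$, and shows that scaling $V_{OD}$ by $K_{n,TT}/K_{n,FF}$ restores $K_{NL}$ and hence a straight ACW-AM line. All device values and helper names are hypothetical.

```python
def rd0_triode(w_over_l, kn, v_od):
    """Unit-device on-resistance R_D0 = [(W/L) * K_n * V_OD]^-1."""
    return 1.0 / (w_over_l * kn * v_od)


def weff_nl(acw, acw_max, k_nl, w0=1.0):
    """Nonlinear sizing from (11), designed for nonlinearity factor k_nl."""
    return w0 * acw / (1 + k_nl * (acw_max - acw))


def am_norm_eq15(weff, weff_max, k_nl, w0=1.0):
    """Normalized ACW-AM from (15) for an actual nonlinearity factor k_nl."""
    return (weff / weff_max) * (weff_max * k_nl + w0) / (weff * k_nl + w0)


def max_am_error(k_design, k_actual, acw_max=511):
    """Worst deviation from a straight ACW-AM line for an array sized for k_design."""
    weff_max = weff_nl(acw_max, acw_max, k_design)
    return max(abs(am_norm_eq15(weff_nl(a, acw_max, k_design), weff_max, k_actual)
                   - a / acw_max) for a in range(acw_max + 1))


def retuned_vod(v_od_nom, kn_nom, kn_actual):
    """V_OD that restores R_D0, and hence K_NL = R_LP/R_D0, when only K_n has shifted."""
    return v_od_nom * kn_nom / kn_actual


if __name__ == "__main__":
    rlp, w_over_l, v_od = 1.73, 10.0, 0.6      # placeholder values
    kn_tt, kn_ff = 300e-6, 360e-6              # hypothetical TT and FF K_n (A/V^2)
    k_tt = rlp / rd0_triode(w_over_l, kn_tt, v_od)
    k_ff = rlp / rd0_triode(w_over_l, kn_ff, v_od)
    v_new = retuned_vod(v_od, kn_tt, kn_ff)
    k_fix = rlp / rd0_triode(w_over_l, kn_ff, v_new)
    print("FF, untuned max error:", round(max_am_error(k_tt, k_ff), 4))
    print("FF, V_OD =", round(v_new, 3), "V, max error:", max_am_error(k_tt, k_fix))
```

The direction of the correction matches the text: the FF corner lowers $R_{D0}$, so $V_{OD}$ must be reduced to bring $K_{NL}$ back to its design value.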

### C. Multiphase RF Clocking

In a conventional DPA, all of the transistors are driven by the same modulated RF clock, whose phase is dynamically modified either digitally (by DPD) [1]–[5], [16], [17] or in the analog PM path [15] to correct for the ACW-PM distortion. This phase distortion translates into an ACW-dependent delay in the time domain, as shown in Fig. 2(b). In order to avoid modifying the phase information for each ACW, in this work, multiple RF clocks with different but fixed delay-offsets are applied to the DPA cells. Fig. 11(a) shows the basic concept of *multiphase RF clocking*. The DPA array is divided into a few segments (typically, but not necessarily, the same segments used for segmented nonlinear sizing). Each segment is driven by an RF clock with a delay-offset different from those of the other segments. In Fig. 11(b), the resulting simulated phase distortions of a DPA with conventional single-phase RF clocking and with multiphase RF clocking are depicted. Knowing the phase-offset of each segment ($\Delta\theta_i$) at a carrier frequency $f_C$, the delay-offset of that segment is calculated as $\Delta T_i = \Delta\theta_i / (360^\circ \times f_C)$.
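The phase-to-delay conversion above can be illustrated with a short Python sketch that converts a required phase lag into a delay and quantizes it to a delay-line code. The 4-bit code width loosely follows the fine delay line of Fig. 18, but the uniform 1-ps LSB is purely an assumption, and the helper names are hypothetical.

```python
def lag_to_delay_ps(lag_deg, fc_hz):
    """Delay giving a phase lag of lag_deg at fc_hz: dT = dtheta / (360 deg * f_C), in ps."""
    return lag_deg / (360.0 * fc_hz) * 1e12


def delay_code(delay_ps, lsb_ps, n_bits):
    """Quantize a delay to an n_bits delay-line code (hypothetical uniform LSB)."""
    return min(max(round(delay_ps / lsb_ps), 0), 2**n_bits - 1)


if __name__ == "__main__":
    fc = 2.2e9                                    # carrier frequency (Hz)
    for lag in (1.0, 5.0, 10.0):
        d = lag_to_delay_ps(lag, fc)
        print(lag, "deg ->", round(d, 2), "ps -> code", delay_code(d, lsb_ps=1.0, n_bits=4))
```

At 2.2 GHz, one carrier period is about 455 ps, so each degree of phase-offset corresponds to roughly 1.26 ps of delay.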

Fig. 16. Chip micrograph (core area = 1 mm × 0.45 mm).

Fig. 17. 6-bit digitally programmable on-chip LDO designed for overdrive-voltage tuning.

Equation (2) can be rewritten for the output voltage as a product of the transistors' current, modeled as current sources, and the trans-impedance seen by that current source, including the output resistance of the transistors, as follows:

$$V_{\text{OUT}}(K_w) = I_{H1} \times K_w \times \{|Z(K_w)| \angle \Phi_Z(K_w)\} \quad (16)$$

where $|Z(K_w)|$ is the magnitude of the trans-impedance function and $\angle \Phi_Z(K_w)$ is its phase response, both as functions of the sizing factor $K_w$. In a simplified LTI (Norton equivalent) model of multiphase RF clocking with $N$ segments, each segment can be replaced with a current source at the fundamental frequency with a phase-offset ($\Delta\theta_i$) and an amplitude proportional to the sizing factor of that segment ($K_{wi}$), as shown in Fig. 12(a). Thus, by using the superposition theorem, the output voltage is given by

$$\begin{aligned} V_{\text{OUT}}(K_w) &= \left\{ \sum_{i=1}^N |I_{H1}| \angle \Delta\theta_i \cdot K_{wi} \right\} [|Z(K_w)| \angle \Phi_Z(K_w)] \\ &= \sum_{i=1}^N S_i(K_w) \end{aligned} \quad (17)$$

where $K_w = \sum_{i=1}^N K_{wi}$, and $S_i(K_w) = |I_{H1}| \times K_{wi} \times |Z(K_w)| \angle [\Delta\theta_i + \Phi_Z(K_w)]$ is the output voltage phasor contributed by the $i$th segment, as depicted in Fig. 12(b)–(d). The value of $K_{wi}$ represents the effective size of the enabled transistors in the $i$th segment normalized to the unit size $W_0$.

Fig. 18. Structure of the 4-bit fine-resolution delay line and its delay cells.

Therefore, when Segment 1 is fully switched ON, the output phasor is equal to $S_1(K_w)$, whose phase is equal to $\Delta\theta_1 + \Phi_Z(K_w)$. When Segment 2 is also fully switched ON, the output phasor is equal to $S_{1-2} = |S_1(K_{w1-2})| \angle [\Delta\theta_1 + \Phi_Z(K_{w1-2})] + |S_2(K_{w1-2})| \angle [\Delta\theta_2 + \Phi_Z(K_{w1-2})]$, where $K_{w1-2} = K_{w1} + K_{w2}$. So, its phase is calculated as

$$\begin{aligned} \Delta\theta_{1-2} &= \Delta\theta_2 + \Phi_Z(K_{w1-2}) + \arctan \left( \frac{|S_1(K_{w1-2})| \sin(\Delta\theta_1 - \Delta\theta_2)}{|S_1(K_{w1-2})| \cos(\Delta\theta_1 - \Delta\theta_2) + |S_2(K_{w1-2})|} \right). \end{aligned} \quad (18)$$

Hence, in order to rectify the phase distortion, by equating  $\Delta\theta_{1-2}$  to  $\Delta\theta_1 + \Phi_Z(K_w)$ ,  $\Delta\theta_2$  is obtained as follows:

$$\begin{aligned} \Delta\theta_2 &= \Delta\theta_1 + \arcsin \left( \frac{|S_1(K_{w1-2})|}{|S_2(K_{w1-2})|} \sin[\Phi_Z(K_w) - \Phi_Z(K_{w1-2})] \right) \\ &\quad + \Phi_Z(K_w) - \Phi_Z(K_{w1-2}). \end{aligned} \quad (19)$$

In general, if  $K_{w1-i} \stackrel{\text{def}}{=} \sum_{j=1}^i K_{wj}$  and  $S_{1-i}(K_{w1-i}) \stackrel{\text{def}}{=} \sum_{j=1}^i S_j(K_{w1-i})$ , by knowing  $\Delta\theta_1$  to  $\Delta\theta_{(i-1)}$ , and applying the same procedure, the phase-offset  $\Delta\theta_i$  is calculated as follows:

$$\begin{aligned} \Delta\theta_i &= \Delta\theta_1 + \arcsin \left( \frac{|S_{1-(i-1)}(K_{w1-i})|}{|S_i(K_{w1-i})|} \right. \\ &\quad \times \left. \sin[\Phi_Z(K_w) - \Phi_{S1-(i-1)} + \Delta\theta_1] \right) \\ &\quad + \Phi_Z(K_w) - \Phi_Z(K_{w1-i}) \end{aligned} \quad (20)$$

where $\Phi_{S1-(i-1)} \stackrel{\text{def}}{=} \angle S_{1-(i-1)}(K_{w1-i})$. Since the phase-offsets calculated by (20) are positive, as shown in Fig. 12(c), they cannot be implemented by delaying the RF clocks, as they are equivalent to negative delay-offsets. In order to make the phase-offsets implementable by delay lines, the largest phase-offset should not exceed zero. Thus, for a class-E or semi class-E/F<sub>2</sub> DPA, by setting $\Delta\theta_N = 0$, as shown in Fig. 12(d), the phase-offset of the first RF clock is given by (21).

Fig. 19. AM/PM timing mismatch correction by (a) coarse delay line and (b) digital FIR filter implemented as a fractional delay.
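The recursion of (19) and (20) can be exercised on the simplified LTI model of Fig. 12(a). The Python sketch below assumes the $K_m \approx 1$ trans-impedance consistent with (3) and (4), eight equal segments, and placeholder component values. It solves (20) with $\Delta\theta_1 = 0$ and then subtracts $\Delta\theta_N$ from every offset, which enforces $\Delta\theta_N = 0$; a common phase shift of all clocks leaves a flattened ACW-PM curve flat, but this shortcut is our own and is not claimed to be the closed form of (21). The function names are hypothetical.

```python
import cmath
import math


def z_trans(kw, rd0=300.0, rlp=1.73, l2=0.4e-9, cd=5e-12, w=2 * math.pi * 2.2e9):
    """Trans-impedance V_OUT/(Kw*I_H1) for K_m ~ 1, consistent with (3)-(4); placeholder values."""
    q2 = 1.0 / (w * w * l2 * cd)
    return (1j * w * q2 * l2 * rd0 * rlp) / (
        (q2 - 1) * rd0 * rlp + 1j * w * q2 * l2 * (rd0 + kw * rlp))


def phase_offsets_deg(seg, z=z_trans):
    """Solve (20) segment by segment with dtheta_1 = 0, then shift all offsets by
    -dtheta_N so that the largest one is zero (every clock becomes a pure delay)."""
    cum = [sum(seg[: i + 1]) for i in range(len(seg))]    # K_w1-i
    target = cmath.phase(z(cum[0]))                        # dtheta_1 + Phi_Z(K_w1), radians
    th = [0.0]
    for i in range(1, len(seg)):
        zi = z(cum[i])
        # S_1-(i-1)(K_w1-i): earlier segments, loaded by the larger array K_w1-i
        s_prev = zi * sum(k * cmath.exp(1j * t) for k, t in zip(seg[:i], th))
        ratio = abs(s_prev) / abs(seg[i] * zi)             # |S_1-(i-1)| / |S_i|
        th.append(target - cmath.phase(zi)
                  + math.asin(ratio * math.sin(target - cmath.phase(s_prev))))
    return [math.degrees(t - th[-1]) for t in th]


def boundary_phases_deg(seg, offsets_deg, z=z_trans):
    """Output phase when segments 1..i are fully on, from the superposition in (17)."""
    out = []
    for i in range(len(seg)):
        i_sum = sum(k * cmath.exp(1j * math.radians(t))
                    for k, t in zip(seg[: i + 1], offsets_deg[: i + 1]))
        out.append(math.degrees(cmath.phase(i_sum * z(sum(seg[: i + 1])))))
    return out


if __name__ == "__main__":
    seg = [64] * 8                       # eight equal segments (simplification)
    offs = phase_offsets_deg(seg)
    print("phase-offsets (deg):", [round(o, 2) for o in offs])
    print("single-phase PM    :", [round(p, 2) for p in boundary_phases_deg(seg, [0.0] * 8)])
    print("multiphase PM      :", [round(p, 2) for p in boundary_phases_deg(seg, offs)])
```

In this toy model, all offsets come out non-positive (pure delays) and the output phase becomes identical at every segment boundary, while the single-phase output phase drifts by several degrees; between boundaries, the phase is only approximately flat because of the weighted averaging of adjacent segments.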

Given  $\Delta\theta_1$  from (21), the other phase-offsets can be calculated from (20). For a DPA with  $N$-phase RF clocks,  $N - 1$  steps are required to estimate all of the phase/delay-offsets. In practice, the delay-offsets can be found with the iterative algorithm shown in Fig. 13(a). In each iteration of its outer loop, the phase errors of segments 1 to  $(N - 1)$  are measured with respect to the phase of segment  $N$ , converted to delay codes, and programmed into the chip. The loop typically runs four to five times before the root-mean-square of the measured phase errors falls below  $1^\circ$ . Once the ACW-PM is flattened, the normalized ACW-AM curve is almost the same as that of a single-phase nonlinearly sized DPA; the simulated effect of multiphase RF clocking on the ACW-AM curve is shown in Fig. 13(b). Furthermore, because of the intrinsic weighted phase averaging at the output, this technique significantly reduces the total phase error inside each segment. For example, as shown in Fig. 11(b), multiphase RF clocking reduces the total phase error of Segment 3 from  $10^\circ$  to  $2^\circ$ .
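The outer loop of the Fig. 13(a) procedure can be sketched as below. This is a minimal sketch under stated assumptions, not the on-chip calibration code: `measure` is a hypothetical callback returning the measured phase error (degrees) of segments 1 to N-1 relative to segment N, the phase-to-code conversion uses the ~6.5-ps step and 15-code range of the delay line described in Section IV, and a 2.2-GHz carrier is assumed. The toy chip model, in which only 60% of each residual delay error appears in the measured phase, is invented purely to exercise the loop.

```python
import math
from typing import Callable, List, Sequence, Tuple

F_C = 2.2e9                        # assumed carrier frequency (Hz)
LSB_PS = 6.5                       # delay-line step (ps)
MAX_CODE = 15                      # 4-bit delay line with 15 cascaded cells
DEG_PER_PS = 360.0 * F_C * 1e-12   # phase shift per picosecond at F_C

def calibrate_delay_offsets(
    measure: Callable[[Sequence[int]], List[float]],
    n_segments: int,
    rms_target_deg: float = 1.0,
    max_iter: int = 10,
) -> Tuple[List[int], int, float]:
    """Adjust delay codes of segments 1..N-1; segment N is the phase reference.

    Returns (codes, number of measurements, rms error of the last measurement).
    """
    codes = [0] * (n_segments - 1)
    for it in range(1, max_iter + 1):
        errors = measure(codes)                      # degrees, vs. segment N
        rms = math.sqrt(sum(e * e for e in errors) / len(errors))
        if rms < rms_target_deg:
            return codes, it, rms
        for k, err in enumerate(errors):
            step = round(err / (DEG_PER_PS * LSB_PS))  # phase error -> code step
            codes[k] = min(max(codes[k] + step, 0), MAX_CODE)
    return codes, max_iter, rms

# Hypothetical toy "chip": ideal offsets in ps; only 60% of the residual
# delay error is visible in the measured phase of each segment.
TARGET_PS = [26.0, 19.5, 13.0, 6.5]

def toy_measure(codes: Sequence[int]) -> List[float]:
    return [0.6 * (t - LSB_PS * c) * DEG_PER_PS for t, c in zip(TARGET_PS, codes)]

codes, n_meas, rms = calibrate_delay_offsets(toy_measure, n_segments=5)
print(codes, n_meas, f"{rms:.2f} deg")
```

Because each measurement only partially reveals the residual error, the codes approach their final values over several passes; in this toy case the loop stops after four measurements with codes [4, 3, 2, 1].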

#### D. Harmonic Tuning for Efficiency Enhancement

In a typical class-E DPA, multiphase RF clocking does not degrade the average DE [for signals with a peak-to-average power ratio (PAPR)  $> 6$  dB]. However, depending on the load network conditions, it may slightly degrade the peak DE. By placing a capacitor ( $C_C$ ) between the differential drains of the push-pull DPA, as shown in Fig. 14(a), the odd- and even-mode impedances can be tuned from a typical class-E condition toward a class-E/F<sub>2</sub> condition [25]. This restores the peak DE to the level of a single-phase class-E DPA. However, the sensitivity of the power and efficiency to duty-cycle variations (and timing mismatches) may increase. In this paper, by properly optimizing  $C_C$ , not only is the peak DE enhanced, but the sensitivity of the peak  $P_{OUT}$  and DE to duty-cycle variations (and timing mismatches) is also reduced compared with a single-phase class-E DPA, as shown by the simulation results in Fig. 14(b).

Fig. 20. Measurement setup.

## IV. IMPLEMENTATION

The 9-bit linear polar DPA is designed and fabricated in 40-nm bulk CMOS with a core area of  $1 \text{ mm} \times 0.45 \text{ mm}$ . The overall block diagram of the proposed DPA is shown in Fig. 15(a), and the chip micrograph is depicted in Fig. 16. The DPA consists of two identical push-pull arrays, which are configured in an  $8\text{-row} \times (16 + 3)$-column pattern. Clock gating is applied to the row drivers to enhance the efficiency at power back-off (PBO) levels. Each row is one segment of an eight-segment nonlinearly sized DPA. Similar to a typical segmented DAC [30], each segment consists of 16 MSB cells, addressed by the four most significant bits of the column decoder, and 3 LSB cells, addressed by the two least significant bits of the column decoder. In each segment, each MSB cell is  $1/16$  of the total size of that segment and each LSB cell is  $1/64$  of it.
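The row/column split described above can be illustrated with a small decoder. The mapping ACW = 64·segment + 4·msb + lsb is an assumption consistent with the 3 + 4 + 2 bit partition in the text; the actual cell ordering on the chip is not given, and the per-segment widths (which implement the nonlinear sizing) are hypothetical values chosen only for illustration.

```python
from typing import Sequence, Tuple

N_SEG, N_MSB_CODES, N_LSB_CODES = 8, 16, 4   # 3 + 4 + 2 = 9 bits

def decode_acw(acw: int) -> Tuple[int, int, int]:
    """Split a 9-bit ACW into (segment, MSB cells on, LSB cells on).

    Assumed mapping: ACW = 64*segment + 4*msb + lsb. Segments below
    `segment` are fully on; the partial segment enables `msb` MSB cells
    (1/16 of the segment each) and `lsb` LSB cells (1/64 each).
    """
    if not 0 <= acw < N_SEG * N_MSB_CODES * N_LSB_CODES:
        raise ValueError("ACW must be a 9-bit code")
    segment, rest = divmod(acw, N_MSB_CODES * N_LSB_CODES)
    msb, lsb = divmod(rest, N_LSB_CODES)
    return segment, msb, lsb

def effective_width(acw: int, seg_widths: Sequence[float]) -> float:
    """Effective active width W_eff for a given ACW and per-segment widths."""
    segment, msb, lsb = decode_acw(acw)
    full = sum(seg_widths[:segment])
    return full + seg_widths[segment] * (msb / 16 + lsb / 64)

widths = [1.0, 1.1, 1.25, 1.45, 1.7, 2.0, 2.4, 2.9]  # hypothetical segment sizes
print(decode_acw(300), round(effective_width(300, widths), 3))
```

With equal segment widths this reduces to a linearly sized DPA ( $W_{eff} \propto$ ACW); unequal widths change the slope from one segment to the next, giving a piecewise-linear approximation of a nonlinear sizing curve.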

$$\Delta\theta_1 = -2\Phi_Z(K_{w1}) + \arctan \left( \frac{|S_N(K_{w1-N})| \sin(\Phi_Z(K_{w1-N})) + |S_{1-(N-1)}(K_{w1-N})| \sin(\Phi_{S1-(N-1)})}{|S_N(K_{w1-N})| \cos(\Phi_Z(K_{w1-N})) + |S_{1-(N-1)}(K_{w1-N})| \cos(\Phi_{S1-(N-1)})} \right) \quad (21)$$



Fig. 21. (a) Measured peak DE (%), PAE (%), and  $P_{\text{OUT}}$ (dBm) versus carrier frequency for  $V_{\text{DD}} = 0.5, 0.6$ , and  $0.7$  V. (b) Measured  $P_{\text{OUT}}$  and  $P_{\text{dc}}$  normalized to  $P_{\text{dc},\text{Max}}$ , DE, and PAE versus normalized output amplitude at  $2.2$  GHz with  $V_{\text{DD}} = 0.5$  V, showing a linear roll-off for DE similar to class-B.

The circuit of the sub-PA cells is shown in Fig. 15(b). The drivers in each sub-PA cell are sized in proportion to the output transistor.

To facilitate the overdrive tuning technique, a 6-bit digitally programmable low-dropout regulator (LDO) is implemented on-chip; as illustrated in Fig. 17, it can deliver  $50$  mA with a resolution of  $10\text{--}12$  mV and a settling time of  $\sim 300$  ns. The reference voltage of the LDO is provided by a 6-bit R-2R DAC. A single on-chip LDO supplies all the drivers in the entire push-pull DPA array, but not the delay-offset blocks. The input PM RF clock is amplified on-chip and subsequently fed to the multiphase RF clocking circuit, which generates five separate differential RF clocks with optimized delay-offsets and applies them simultaneously to the corresponding DPA segments. To compensate for the effect of PVT, frequency, and load variations on the ACW-PM correction, the phase-offset resolution should be about  $5\text{--}6^\circ$ , which translates into a delay of approximately  $6.5$  ps for the RF range of  $2\text{--}2.5$  GHz. This resolution is less than half of the absolute delay of a minimum-sized inverter in  $40\text{-nm}$  CMOS technology. To overcome this limitation, each delay-offset is implemented with a 4-bit digitally programmable fine-resolution delay line based on the relative delay of current-starved inverters [31], as shown in Fig. 18. The absolute delay of each delay cell is controlled through a single bit by enabling or disabling NMOS and PMOS transistors in series with the  $V_{\text{DD}}/\text{GND}$  paths. The RF clock passes through 15 cascaded delay cells to reach the output, resulting in a total relative delay of  $97$  ps with a resolution of  $\sim 6.5$  ps, which is sufficient to compensate for the practical variations. When the LDO setting is changed by less than 10 LSB, the delays of the smallest and the largest drivers in the DPA array change by almost the same amount. Therefore, the ACW-PM linearity remains intact and the delay-offset settings do not need to be retuned. At the output, the single-ended RF clocks are converted into differential clocks with the circuit shown in Fig. 15(c).
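The delay-line numbers above can be checked with a short sketch. It assumes that the 4-bit code is thermometer-decoded into the 15 single-bit cell enables and that every enabled cell adds the same ~6.5 ps; both are simplifying assumptions, since real current-starved cells will not be perfectly uniform.

```python
from typing import List

N_CELLS = 15      # cascaded current-starved delay cells per delay-offset
STEP_PS = 6.5     # relative delay added by one enabled cell (approx.)

def thermometer(code: int, n_cells: int = N_CELLS) -> List[int]:
    """Map a 4-bit binary code to per-cell enable bits (assumed thermometer)."""
    if not 0 <= code <= n_cells:
        raise ValueError("code out of range")
    return [1] * code + [0] * (n_cells - code)

def relative_delay_ps(code: int) -> float:
    """Relative delay of the line for a given code, in picoseconds."""
    return sum(thermometer(code)) * STEP_PS

def delay_to_phase_deg(delay_ps: float, f_hz: float) -> float:
    """Phase shift (degrees) corresponding to a delay at carrier f_hz."""
    return 360.0 * f_hz * delay_ps * 1e-12

for f in (2.0e9, 2.5e9):
    print(f"{f / 1e9:.1f} GHz: step {delay_to_phase_deg(STEP_PS, f):.2f} deg, "
          f"range {delay_to_phase_deg(relative_delay_ps(N_CELLS), f):.1f} deg")
```

One step corresponds to about 4.7° at 2 GHz and 5.9° at 2.5 GHz, in line with the 5–6° target, and the full 15-cell range (~97.5 ps) spans roughly 70° at 2 GHz.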

The ACW data are stored in an on-chip 4 K SRAM running at 625 MHz. The timing mismatch between the ACW and PM paths significantly degrades the error vector magnitude (EVM) and adjacent channel power ratio (ACPR) for wideband signals [3], [5]. To compensate for it, two techniques are provided, which can be used separately or simultaneously. The first is a 4-bit programmable delay line comprising 15 cascaded delay cells, as shown in Fig. 19(a), with a resolution of  $\sim 30$  ps and a total range of  $\sim 450$  ps; it is placed in the path of the baseband sampling clock of the ACW registers. The second is an on-chip digital 10-tap FIR filter, implemented as a fractional delay element in the digital ACW path, as shown in Fig. 19(b). The filter coefficients are given by  $h[n] = \sin[\pi(n - \Delta F_S)]/[\pi(n - \Delta F_S)]$  [32], in which  $n$  is the tap index and  $\Delta$  is the desired delay, which is not necessarily an integer multiple of  $1/F_S$ . Therefore, while the filter registers are clocked at  $F_S$ , the output codes are delayed by  $\Delta$ , a non-integer multiple of  $1/F_S$ , which equals the group delay of the digital FIR filter.
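The fractional-delay FIR can be sketched directly from the coefficient formula. The sketch assumes $F_S$ = 625 MHz (the SRAM rate) and a hypothetical target delay of 4.3 samples; it computes the truncated-sinc taps, filters a sequence, and estimates the low-frequency delay as $-\angle H(e^{j\omega})/\omega$ for small $\omega$ to confirm that the filter delays its input by about $\Delta F_S$ samples.

```python
import cmath
import math
from typing import List

def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x)."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def fractional_delay_taps(num_taps: int, delay_s: float, fs_hz: float) -> List[float]:
    """h[n] = sin[pi(n - Delta*F_S)] / [pi(n - Delta*F_S)], n = 0..num_taps-1."""
    d = delay_s * fs_hz
    return [sinc(n - d) for n in range(num_taps)]

def low_freq_delay_samples(h: List[float], omega: float = 1e-2) -> float:
    """Delay estimate -angle(H(e^{j omega})) / omega at a small omega (rad/sample)."""
    H = sum(hn * cmath.exp(-1j * omega * n) for n, hn in enumerate(h))
    return -cmath.phase(H) / omega

def apply_fir(x: List[float], h: List[float]) -> List[float]:
    """Causal FIR filtering of an ACW sequence (output length = len(x))."""
    return [sum(h[k] * x[i - k] for k in range(len(h)) if i - k >= 0)
            for i in range(len(x))]

F_S = 625e6
h = fractional_delay_taps(10, 4.3 / F_S, F_S)   # hypothetical 4.3-sample delay
print(f"delay ~ {low_freq_delay_samples(h):.2f} samples")
```

In hardware the filtered ACW would also be quantized back to integer codes, which this floating-point sketch omits. For this delay the plain truncated sinc has a dc gain of about 1.05; a practical design might window or normalize the taps, but the text does not state whether the chip does so.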

## V. MEASUREMENT RESULTS

The measurement setup is shown in Fig. 20. An analog off-chip I/Q modulator provides the PM RF clock. Since the output pads of the balun are not located at the edge of the chip, the static continuous wave (CW) measurements of output power ( $P_{\text{OUT}}$ ), DE, and power-added efficiency (PAE) are carried out by probing, avoiding the loss of long bond-wires. The dynamic measurements are performed after wire bonding. PAE accounts for the on-chip power consumption of the sub-PA drivers, digital decoders, multiphase RF clocking circuit, and LDO; the SRAM power is not included (see Table I). All measurements are conducted without any type of DPD.

### A. Static (CW) Power/Efficiency Measurements

The peak output power and efficiency are measured with CW signals at carrier frequencies from  $1.5$  to  $3$  GHz for different output-stage  $V_{\text{DD}}$ s, as plotted in Fig. 21(a). Increasing  $V_{\text{DD}}$  from  $0.5$  to  $0.7$  V raises the peak output power from  $14.3$  to  $17.3$  dBm at  $f_C = 2$  GHz and from  $14.6$  to  $17.6$  dBm at  $f_C = 2.2$  GHz. The 1-dB bandwidth of the peak  $P_{\text{OUT}}$  spans 1.5 to 2.7 GHz, a fractional bandwidth of over 57%. The peak PAE also increases from 24% to 29% at  $f_C = 2$  GHz and from 26% to 32% at  $f_C = 2.2$  GHz.

Fig. 22. Measured semi-static linearity using a triangle signal at 2 GHz and  $V_{DD} = 0.54$  V. (a) ACW-AM for various LDO settings. (b) ACW-PM after each iteration of the optimization algorithm. The numbers in the brackets show the codes of the five delay-offsets.

Fig. 23. Measured ACW-AM and ACW-PM of the DPA under load variations. (a) Before correction. (b) After correcting the LDO and delay-offsets settings.

Fig. 24. Measured spectrum and constellation diagram of (a) 20-MHz 64-QAM signal and (b) 20-MHz OFDM 64-QAM signal.
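As a quick consistency check, assume an ideal switched-mode output stage whose output power into a fixed load scales as $V_{\text{DD}}^2$. Raising $V_{\text{DD}}$ from 0.5 to 0.7 V should then add

$$\Delta P_{\text{OUT}} = 20\log_{10}\frac{0.7}{0.5} \approx 2.92 \text{ dB},$$

close to the measured 3.0-dB increase at both 2 and 2.2 GHz. Likewise, the fractional bandwidth of the 1-dB range follows from

$$\frac{2.7 - 1.5}{(2.7 + 1.5)/2} = \frac{1.2}{2.1} \approx 57.1\%.$$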

The peak DE is nearly independent of  $V_{DD}$  and reaches approximately 37% at  $f_C = 2$  GHz and 44% at  $f_C = 2.2$  GHz. The measured  $P_{OUT}$  and  $P_{dc}$  at  $f_C = 2.2$  GHz with  $V_{DD} = 0.5$  V are normalized to the maximum measured dc power and plotted versus the normalized output amplitude in Fig. 21(b), together with DE and PAE. As can be seen,  $P_{dc}$  and DE roll off almost linearly with the output amplitude, similar to class-B. As expected, PAE shows a nonlinear roll-off, since it includes the power consumption of all the other circuit blocks, which do not scale with output power. The measured and simulated power breakdown of the DPA at full power (ACW = 511) is shown in Table I.

Fig. 25. Measured spectrum and constellation diagram of (a) 40-MHz OFDM 64-QAM signal and (b) 80-MHz OFDM 64-QAM signal.

Fig. 26. Measured spectrum of a 20-MHz OFDM signal under different (a) load conditions and (b)  $V_{DD}$s.

### B. Static Linearity Measurement by Triangle Signal

Since the input signal of a DPA is digital, it is straightforward to generate a perfect (quantized) ramp or triangle as an input signal and directly measure the (semi-)static linearity. In this measurement, a 4096-sample triangle signal is generated and programmed into the on-chip SRAM, resulting in a 152.6-kHz AM signal without phase modulation. At the output, 128 periods of the signal are measured and averaged. The ACW-AM curves for different LDO settings are depicted in Fig. 22(a), which shows the effectiveness of overdrive-voltage tuning in controlling the concavity of the ACW-AM curve. The ACW-PM curves without and with multiphase clocking after each iteration of the optimization algorithm, together with the corresponding delay codes [DelaySeg1 DelaySeg2 ... DelaySeg5], are depicted in Fig. 22(b). After four iterations, the phase error from ACW = 64 to ACW = 511 is less than  $\pm 1^\circ$ . Furthermore, the effect of load variations from 25 to 100  $\Omega$  is measured and shown in Fig. 23(a). Although the linearity degrades slightly, retuning the LDO and delay-offsets restores optimal linearization of the DPA, as shown in Fig. 23(b).
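The test vector can be generated in a few lines. The sketch assumes the 4096 samples sweep ACW from 0 to 511 and back once per period (the text specifies only the length); with the 625-MHz SRAM clock from Section IV, one period lasts 4096/625 MHz, i.e., a repetition rate of about 152.6 kHz.

```python
from typing import List

F_S = 625e6  # SRAM/baseband clock rate (Hz)

def triangle_acw(n_samples: int = 4096, full_scale: int = 511) -> List[int]:
    """One period of a quantized triangle spanning ACW = 0 ... full_scale."""
    out = []
    for k in range(n_samples):
        tri = 1.0 - abs(2.0 * k / n_samples - 1.0)   # rises 0 -> 1, then falls
        out.append(round(full_scale * tri))          # quantize to integer ACW
    return out

acw = triangle_acw()
f_am = F_S / len(acw)   # repetition rate of the AM test signal (Hz)
print(len(acw), min(acw), max(acw), f"{f_am / 1e3:.1f} kHz")
```

Because every ACW value is visited on both the rising and the falling edge, averaging the output over many periods yields the semi-static ACW-AM and ACW-PM curves directly, without any phase modulation.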

### C. Modulated Signal Measurement

The dynamic performance of the DPA is measured with quadrature amplitude modulation (QAM) and OFDM signals without any type of DPD. Fig. 24(a) and (b) shows the spectrum and constellation diagram of 20-MHz 64-QAM (PAPR = 6.5 dB) and OFDM 64-QAM (PAPR = 8.1 dB) signals, measured around  $f_C = 2$  GHz with  $V_{DD} = 0.5$  V. The ACPR1/ACPR2 are as low as  $-40/-50$  and  $-46/-50$  dBc, respectively. The measured EVMs are  $-35$  and  $-36$  dB, the measured DEs are  $18\%$  and  $15.2\%$ , and the measured PAEs are  $12.6\%$  and  $10.7\%$ , respectively. The DPA is also measured with 40- and 80-MHz OFDM 64-QAM signals; the measured spectra and constellation diagrams are shown in Fig. 25(a) and (b). The ACPR1/ACPR2 are  $-40/-50$  and  $-34/-42$  dBc, and the measured EVMs are  $-33$  and  $-26$  dB, respectively.

Fig. 27. Measured spectrum of (a) 40-MHz and (b) 80-MHz OFDM signals under different  $V_{DD}$s.

TABLE I  
MEASURED AND SIMULATED POWER BREAKDOWN  
OF THE DPA AT  $f_C = 2$  GHz

|                                                    | 20MHz OFDM 64-QAM<br>based on 802.11g | CW Full Power<br>(at ACW=511) | CW Full Power<br>(at ACW=511) |
|----------------------------------------------------|---------------------------------------|-------------------------------|-------------------------------|
|                                                    | Measurement                           | Measurement                   | Simulation                    |
| Output Power (dBm)                                 | 6.2                                   | 14.3                          | 15.6                          |
| Drain DC Power (mW)                                | 27.4                                  | 71                            | 85                            |
| Buffers & Digital DC Power (mW)                    | 4.8                                   | 23                            | 21.4                          |
| Multiphase Clocking DC Power (mW)                  | 6.6                                   | 6.6                           | 6.8                           |
| LDO DC Power (mW)                                  | 0.2                                   | 1.1                           | 1.9                           |
| SRAM DC Power (mW)<br><b>(Not included in PAE)</b> | 32                                    | 26.4                          | NA                            |
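As a cross-check, the DE and PAE quoted for the 20-MHz OFDM signal can be recomputed from the Table I breakdown. The sketch assumes PAE is evaluated as the output power divided by the sum of the drain, buffer/digital, multiphase-clocking, and LDO dc power, with the SRAM excluded (as Table I states) and the RF input drive power neglected; the latter is an assumption, not something the text specifies.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert dBm to milliwatts."""
    return 10 ** (p_dbm / 10)

# 20-MHz OFDM column of Table I (dc powers in mW)
p_out = dbm_to_mw(6.2)
p_drain, p_buffers, p_clock, p_ldo = 27.4, 4.8, 6.6, 0.2

de = p_out / p_drain
# Assumption: RF input power neglected; SRAM excluded, as in Table I
pae = p_out / (p_drain + p_buffers + p_clock + p_ldo)
print(f"DE = {100 * de:.1f}%, PAE = {100 * pae:.1f}%")
```

Under these assumptions the result reproduces the reported 15.2% DE and 10.7% PAE, which suggests that the overhead blocks (about 11.6 mW here) account for most of the gap between the two figures at 8.1-dB PBO.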

Furthermore, without retuning the LDO and delay-offset settings, the 20-MHz OFDM signal is measured under different DPA loads and dc supply voltages, as shown in Fig. 26(a) and (b). As can be seen, the output spectrum easily passes the 802.11ac/g masks for  $V_{DD} = 0.5$ , 0.6, and 0.7 V (resulting in  $P_{OUT} = 6.2$ , 7.8, and 9.2 dBm, respectively). Although linearity is degraded under load variations, the output spectrum still passes the 802.11ac/g masks. In addition, the output spectra of 40- and 80-MHz OFDM signals are measured with different  $V_{DD}$s, as shown in Fig. 27(a) and (b). The results show that, for very wideband signals ( $BW > 40$  MHz), the output spectrum can pass the 802.11ac/g masks with only a slight retuning of the LDO settings (less than 5 LSB  $\approx$  60 mV). The out-of-band spectrum of a 20-MHz OFDM signal centered at 2 GHz is measured and shown in Fig. 28. The spectral sampling replicas are located at  $\pm 625$-MHz offset frequency and are attenuated by the zero-order-hold transfer function to below  $-40$  dBc.

Fig. 28. Measured out-of-band spectrum of a 20-MHz OFDM signal at  $f_C = 2$  GHz.

The measured power breakdown of the DPA at 8.1-dB PBO with a 20-MHz OFDM signal is listed in Table I. As can be seen, the total power consumption of the LDO and multiphase RF clocking is approximately 6.8 mW, while the measured power consumption of the SRAM is approximately 32 mW. Thus, compared with an LUT-DPD-based linearization technique, the techniques proposed in this paper save more than 25 mW, which significantly improves the efficiency of the DPA. Table II summarizes this work and compares it with the state of the art. Compared with prior art that targets high efficiency using nonlinear DPAs, this paper achieves comparable or higher efficiency without compromising linearity. On the other hand, compared with prior art that uses linear DPA structures, this paper achieves higher linearity, higher signal bandwidth, and higher efficiency or output power.

TABLE II  
PERFORMANCE SUMMARY AND COMPARISON WITH PRIOR ART

|                      | This Work                            |                                      |                                      | [15]<br>JSSC'16    | [16]<br>ISSCC'17 | [17]<br>ISSCC'17 | [18]<br>ISSCC'17  | [20]<br>JSSC'16 | [21]<br>JSSC'15  | [22]<br>JSSC'15   | [23]<br>JSSC'16   |
|----------------------|--------------------------------------|--------------------------------------|--------------------------------------|--------------------|------------------|------------------|-------------------|-----------------|------------------|-------------------|-------------------|
| Architecture         | Polar DPA                            |                                      |                                      | Polar DPA          | Polar VMD SC-DPA | Polar DPA        | Polar DPA         | IQ SC-DPA       | Polar DPA        | IQ QDAC           | Polar SC-DPA      |
| Bandwidth (MHz)      | <b>20</b>                            | <b>40</b>                            | <b>80</b>                            | 8                  | 10               | 20               | 20                | 10              | 20               | 20                | 2                 |
| ACPR1 (dBc)          | <b>-46</b><br>@ $f_c=2.05\text{GHz}$ | <b>-40</b><br>@ $f_c=2.04\text{GHz}$ | <b>-34</b><br>@ $f_c=2.04\text{GHz}$ | -27.9              | -45              | -35 <sup>1</sup> | -31.5             | -30.7           | -28 <sup>1</sup> | -42               | -42 <sup>1</sup>  |
| EVM (dB)             | <b>-36</b><br>(OFDM 64-QAM)          | <b>-33</b><br>(OFDM 64-QAM)          | <b>-26</b><br>(OFDM 64-QAM)          | -36.3<br>(256-QAM) | -40<br>(256-QAM) | -25<br>(64-QAM)  | -31.9<br>(64-QAM) | -28<br>(64-QAM) | -28<br>(64-QAM)  | NA                | -27.1<br>(64-QAM) |
| DPD                  | <b>NO</b>                            |                                      |                                      | <b>YES</b>         | <b>YES</b>       | <b>YES</b>       | <b>YES</b>        | <b>YES</b>      | <b>NO</b>        | <b>NO</b>         | <b>NO</b>         |
| DPA $V_{DD}$ (V)     | <b>0.5</b>                           |                                      |                                      | 3                  | 1.2/2.4          | 2.2              | 1.4               | 1.2/2.4         | 2.1              | 0.9 / 1.8         | 1                 |
| $f_c$ (GHz)          | <b>2.0</b>                           | <b>2.2</b>                           |                                      | 2.6                | 3.5              | 2.45             | 2.5               | 2               | 2.1              | 1                 | 0.93              |
| Peak $P_{OUT}$ (dBm) | <b>14.3</b>                          | <b>14.6</b>                          |                                      | 28.1               | 25.3             | 28.2             | 24.5              | 20.5            | 24               | 8 <sup>1</sup>    | 8                 |
| Peak DE (%)          | <b>37</b>                            | <b>43.8</b>                          |                                      | 40.7               | NA               | 39               | 42.7              | NA              | 35               | 27.4 <sup>1</sup> | NA                |
| Peak PAE (%)         | <b>24<sup>3</sup></b>                | <b>26<sup>3</sup></b>                |                                      | 35                 | 30.4             | NA               | NA                | 20              | NA               | 15.3 <sup>1</sup> | 45                |
| Matching Network     | <b>On-chip</b>                       |                                      |                                      | On-chip            | On-Chip          | On-Chip          | On-Chip           | On-Chip         | On-chip          | Off-chip          | Off-chip          |
| Technology (nm)      | <b>40</b>                            |                                      |                                      | 65                 | 45 SOI           | 28               | 28                | 65              | 65               | 28                | 40                |

<sup>1</sup> Calculated or estimated from the paper figures or tables.

<sup>2</sup> LUT-DPD for AM-AM, no DPD for AM-PM

<sup>3</sup> PAE includes the power consumption of all drivers, digital decoders, the multiphase RF clock generator and LDO.

## VI. CONCLUSION

While the linearity and power/efficiency of a TX stage are normally traded off against each other, this paper provides circuit-level linearization techniques for switched-mode DPAs that require neither DPD nor any compromise in output power or efficiency. Three novel circuit techniques have been introduced, namely, nonlinear sizing, overdrive-voltage tuning, and multiphase RF clocking. Their combination leads to very high TX linearity for wideband signals. They also allow digitally controlled fine tuning to manage PVT, operating-frequency, and output-load variations. Compared with prior art that uses linear DPD-less DPA structures, this paper achieves higher linearity, higher signal bandwidth, and higher efficiency or higher output power. Compared with DPAs in general (including those using DPD), this work still provides better linearity with comparable efficiency.

## ACKNOWLEDGMENT

The authors would like to thank A. Akhnoukh, W. Straver, and M. Pelk with TU Delft for their high quality support during the fabrication and measurement, and Dr. M. Babaie for the technical discussions.

## REFERENCES

- [1] R. B. Staszewski *et al.*, "All-digital PLL and transmitter for mobile phones," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2469–2482, Dec. 2005.
- [2] A. Kavousian, D. K. Su, M. Hekmat, A. Shirvani, and B. A. Wooley, "A digitally modulated polar CMOS power amplifier with a 20-MHz channel bandwidth," *IEEE J. Solid-State Circuits*, vol. 43, no. 10, pp. 2251–2258, Oct. 2008.
- [3] C. D. Presti, F. Carrara, A. Scuderi, P. M. Asbeck, and G. Palmisano, "A 25 dBm digitally modulated CMOS power amplifier for WCDMA/EDGE/OFDM with adaptive digital predistortion and efficient power control," *IEEE J. Solid-State Circuits*, vol. 44, no. 7, pp. 1883–1896, Jul. 2009.
- [4] D. Chowdhury, L. Ye, E. Alon, and A. M. Niknejad, "An efficient mixed-signal 2.4-GHz polar power amplifier in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 46, no. 8, pp. 1796–1809, Aug. 2011.
- [5] L. Ye, J. Chen, L. Kong, E. Alon, and A. M. Niknejad, "Design considerations for a direct digitally modulated WLAN transmitter with integrated phase path and dynamic impedance modulation," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3160–3177, Dec. 2013.
- [6] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, and J. R. Long, "A wideband 2 × 13-bit all-digital I/Q RF-DAC," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 4, pp. 732–752, Apr. 2014.
- [7] M. Babaie *et al.*, "A fully integrated bluetooth low-energy transmitter in 28 nm CMOS with 36% system efficiency at 3 dBm," *IEEE J. Solid-State Circuits*, vol. 51, no. 7, pp. 1547–1565, Jul. 2016.
- [8] G. D. Ewing, "High-efficiency radio-frequency power amplifiers," Ph.D. dissertation, Dept. Elect. Eng., Oregon State Univ., Corvallis, OR, USA, Jun. 1964. [Online]. Available: <http://ir.library.oregonstate.edu/xmlui/handle/1957/20196>
- [9] N. O. Sokal and A. D. Sokal, "Class E-A new class of high-efficiency tuned single-ended switching power amplifiers," *IEEE J. Solid-State Circuits*, vol. SSC-10, no. 3, pp. 168–176, Jun. 1975.
- [10] S.-A. El-Hamamsy, "Design of high-efficiency RF class-D power amplifier," *IEEE Trans. Power Electron.*, vol. 9, no. 3, pp. 297–308, May 1994.
- [11] H. Kobayashi, J. M. Hinrichs, and P. M. Asbeck, "Current mode class-D power amplifiers for high efficiency RF applications," in *IEEE MTT-S Int. Microw. Symp. Dig.*, vol. 2, May 2001, pp. 939–942.
- [12] S. C. Cripps, *RF Power Amplifiers for Wireless Communications* (Microwave Technology Library). Norwood, MA, USA: Artech House, 2006.
- [13] C. Lu *et al.*, "A 24.7 dBm all-digital RF transmitter for multimode broadband applications in 40 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 332–333.
- [14] Z. Deng *et al.*, "A dual-band digital-WiFi 802.11a/b/g/n transmitter SoC with digital I/Q combining and diamond profile mapping for compact die area and improved efficiency in 40 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan./Feb. 2016, pp. 172–173.

- [15] J. S. Park, S. Hu, Y. Wang, and H. Wang, "A highly linear dual-band mixed-mode polar power amplifier in CMOS with an ultra-compact output network," *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1756–1770, Aug. 2016.
- [16] V. Vorapipat, C. Levy, and P. Asbeck, "A class-G voltage-mode Doherty power amplifier," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 46–47.
- [17] D. Cousinard *et al.*, "A 0.23 mm<sup>2</sup> digital power amplifier with hybrid time/amplitude control achieving 22.5 dBm at 28% PAE for 802.11g," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 228–229.
- [18] J. Park, Y. Wang, S. Pellerano, C. Hull, and H. Wang, "A 24 dBm 2-to-4.3 GHz wideband digital power amplifier with built-in AM-PM distortion self-compensation," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 230–231.
- [19] S. Luschas, R. Schreier, and H.-S. Lee, "Radio frequency digital-to-analog converter," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1462–1467, Sep. 2004.
- [20] W. Yuan, V. Aparin, J. Dunworth, L. Seward, and J. S. Walling, "A quadrature switched capacitor power amplifier," *IEEE J. Solid-State Circuits*, vol. 51, no. 5, pp. 1200–1209, May 2016.
- [21] S. Zheng and H. C. Luong, "A WCDMA/WLAN digital polar transmitter with low-noise ADPLL, wideband PM/AM modulator, and linearized PA," *IEEE J. Solid-State Circuits*, vol. 50, no. 7, pp. 1645–1656, Jul. 2015.
- [22] P. E. P. Filho, M. Ingels, P. Wambacq, and J. Cranckx, "An incremental-charge-based digital transmitter with built-in filtering," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 3065–3076, Dec. 2015.
- [23] A. Ba *et al.*, "A 1.3 nJ/b IEEE 802.11ah fully-digital polar transmitter for IoT applications," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 3103–3113, Dec. 2016.
- [24] M. Hashemi *et al.*, "An intrinsically linear wideband digital polar PA featuring AM-AM and AM-PM corrections through nonlinear sizing, overdrive-voltage control, and multiphase RF clocking," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 300–301.
- [25] S. D. Kee, I. Aoki, A. Hajimiri, and D. Rutledge, "The class-E/F family of ZVS switching amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 51, no. 6, pp. 1677–1690, Jun. 2003.
- [26] F. H. Raab, "Class-E, class-C, and class-F power amplifiers based upon a finite number of harmonics," *IEEE Trans. Microw. Theory Techn.*, vol. 49, no. 8, pp. 1462–1468, Aug. 2001.
- [27] M. Acar, A. J. Annema, and B. Nauta, "Analytical design equations for class-E power amplifiers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 54, no. 12, pp. 2706–2717, Dec. 2007.
- [28] P. T. M. van Zeijl and M. Collados, "A digital envelope modulator for a WLAN OFDM polar transmitter in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 42, no. 10, pp. 2204–2211, Oct. 2007.
- [29] B. Razavi, *Design of Analog CMOS Integrated Circuits*, 1st ed. New York, NY, USA: McGraw-Hill, 2001.
- [30] C.-H. Lin and K. Bult, "A 10-b, 500-MSample/s CMOS DAC in 0.6 mm<sup>2</sup>," *IEEE J. Solid-State Circuits*, vol. 33, no. 12, pp. 1948–1958, Dec. 1998.
- [31] M. Maymandi-Nejad and M. Sachdev, "A monotonic digitally controlled delay element," *IEEE J. Solid-State Circuits*, vol. 40, no. 11, pp. 2212–2219, Nov. 2005.
- [32] A. V. Oppenheim and R. W. Schafer, *Discrete-Time Signal Processing* (Prentice-Hall Signal Processing Series). Englewood Cliffs, NJ, USA: Prentice-Hall, 1989.



**Mohsen Hashemi** (S'14) was born in Eilam, Iran. He received the B.Sc. degree in electrical engineering from Shahid Beheshti University, Tehran, Iran, in 2006, and the M.Sc. degree in microelectronics from the Sharif University of Technology, Tehran, in 2010. He is currently pursuing the Ph.D. degree in microelectronics with the ELCA Group at the Delft University of Technology, Delft, The Netherlands.

He was the Head of the RF Research Group at Baregheh Company, Tehran, where he was involved in designing wideband receivers and frequency synthesizers. His current research interests include wideband efficient transmitters, analog/digital signal conditioning for high accuracy in transceivers, and high-speed data converters.

Mr. Hashemi was a recipient of the IEEE SSCS Student Travel Grant Award in 2017. He has been serving as a reviewer for the IEEE TRANSACTIONS ON VLSI SYSTEMS and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS since 2014.



**Yiyu Shen** (S'17) received the M.S. degrees in microelectronics from Tsinghua University, Beijing, China, and Katholieke Universiteit Leuven, Leuven, Belgium, in 2014. He is currently pursuing the Ph.D. degree in electrical engineering with the Delft University of Technology, Delft, The Netherlands.

His current research interests include power amplifiers and digitally assisted RF integrated circuits and systems.



**Mohammadreza Mehrpoo** (S'10) received the B.Sc. degree in electrical engineering from Tehran University, Tehran, Iran, in 2010, and the M.Sc. (*cum laude*) degree in microelectronics from the Delft University of Technology, Delft, The Netherlands, in 2012. He is currently pursuing the Ph.D. degree in microelectronics and quantum engineering with the Delft University of Technology, Delft, The Netherlands.

From 2012 to 2015, he was an RF System and Circuit Architect with Catena Microelectronics, Delft, where he was involved in low-power CMOS circuits and systems for Internet of Things applications. His current research interests include analog and RF integrated circuits and cryogenic electronics for quantum computing.

Mr. Mehrpoo was a recipient of the Top Talent Fellowship from the Delft University of Technology and the Project Fellowship from MediaTek, Taiwan, in 2010, and the IEEE Radio Frequency Integrated Circuits Symposium Best Student Paper Award (first prize) in 2017.



**Morteza S. Alavi** (S'09–M'13) was born in Tehran, Iran. He received the B.Sc. degree in electrical engineering from the Iran University of Science and Technology, Tehran, Iran, in 2003, the M.Sc. degree in electrical engineering from the University of Tehran, Tehran, in 2006, and the Ph.D. degree in electrical engineering from the Delft University of Technology, Delft, The Netherlands, in 2014.

He was the Co-Founder and the CEO of DitIQ B.V., Delft, a local company developing energy-efficient and wideband wireless transmitters for next-generation cellular networks. Since 2016, he has been an Assistant Professor with the ELCA/ERL Group at the Delft University of Technology. His current research interests include designing RFIC transceivers for wireless and cellular communication systems as well as CMOS wireline transceivers.

Dr. Alavi serves as a reviewer for the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES and the IEEE JOURNAL OF SOLID-STATE CIRCUITS. He was a recipient of the Best Paper Award of the 2011 IEEE International Symposium on Radio-Frequency Integration Technology. He was also a recipient of the Best Student Paper Award (Second Place) of the 2013 Radio-Frequency Integrated Circuits (RFIC) Symposium. Recently, his Ph.D. student won the Best Student Paper Award (First Place) of the 2017 RFIC Symposium held in Honolulu, HI, USA.



**Leo C. N. de Vreede** (M'01–SM'04) received the Ph.D. (*cum laude*) degree from the Delft University of Technology, Delft, The Netherlands, in 1996.

In 1996, he was appointed as an Assistant Professor at the Delft University of Technology, working on the nonlinear distortion behavior of bipolar transistors at the Delft Institute of Microelectronics and Submicron Technology. He was appointed as an Associate Professor and a Full Professor at the Delft University of Technology in 1999 and 2015, respectively, where he became responsible for the Electronic Research Laboratory. He has worked on solutions for improved linearity and RF performance at the device, circuit, and system levels. He has co-authored more than 110 IEEE refereed conference and journal papers and holds several patents. His current research interests include RF transmitters and measurement systems, RF technology, and circuit/system concepts for wireless systems.

Prof. de Vreede is the Co-Founder/Advisor of Anteverta-mw, a company specialized in RF device characterization. He was a co-recipient of the IEEE Microwave Prize in 2008, the Mentor of the Else Kooi Prize awarded for Ph.D. work in 2010, and a Mentor of the Dow Energy Dissertation Prize awarded for Ph.D. work in 2011. He was a recipient of the TUD Entrepreneurial Scientist Award 2015. He co-guided several students who won the Best Paper Awards at BCTM, PRORISC, GAAS, ESSDERC, IMS, RFIT, and RFIC.