

# Design of a Fully Integrated Two-Stage Watt-Level Power Amplifier Using 28-nm CMOS Technology

Patrick Oßmann, *Student Member, IEEE*, Jörg Fuhrmann, *Student Member, IEEE*, Krzysztof Dufrêne, *Senior Member, IEEE*, Jonas Fritzin, *Member, IEEE*, José Moreira, Harald Pretl, *Senior Member, IEEE*, and Andreas Springer, *Member, IEEE*

**Abstract**—We present a linear two-stage power amplifier (PA) for UMTS terrestrial radio access (UTRA) applications. The PA has been designed using a standard 28-nm complementary metal–oxide–semiconductor process. It includes an on-chip input matching network, a predriver stage, and an on-chip output matching network. Additional process-voltage-temperature compensation circuits and electrostatic discharge protection have been implemented on-chip. A differential triple-stack transistor array acts as transconductance circuit and generates watt-level RF output power. Measured saturated output power is more than 31 dBm and peak power-added efficiency is 33% for sinusoidal operation at 1.8 GHz. When applying memoryless digital predistortion (DPD) for 3rd Generation Partnership Project (3GPP) UTRA test vectors, an adjacent-channel leakage ratio of  $\leq -33$  dBc at  $\pm 5$  MHz for 26.5-dBm output power is achieved. A corresponding error-vector magnitude of  $\leq 1.7\%$  can be measured when using memoryless DPD.

**Index Terms**—CMOS RF power amplifier (PA), 3rd Generation Partnership Project (3GPP) mobile communications handset applications, two-stage amplifier.

## I. INTRODUCTION

COMPLEMENTARY metal–oxide–semiconductor (CMOS) technology has been proven as well suited for integrating digital and analog low-power circuits on a single die by inherently providing both positive and negative charge carrier devices. Due to advances in back-end-of-line (BEOL) properties, aggressive feature size scaling according to Moore's law [1], and smart circuit and physical-design solutions [2], CMOS technology has matured to a successful competitor for integrated low-cost applications in wireless front-end products.

Manuscript received July 06, 2015; revised October 20, 2015; accepted November 14, 2015. Date of publication December 03, 2015; date of current version January 01, 2016. This work was supported in part by the Linz Center of Mechatronics (LCM) under the framework of the Austrian COMET-K2 program.

P. Oßmann and A. Springer are with the Institute for Communications Engineering and RF-Systems (NTHFS), Johannes Kepler University, 4040 Linz, Austria (e-mail: p.ossmann@nths.jku.at; a.springer@nths.jku.at).

J. Fuhrmann is with the Institute for Electronics Engineering, Friedrich-Alexander University Erlangen–Nuremberg, 91054 Erlangen–Nuremberg, Germany, and also with Danube Mobile Communications Engineering (DMCE) GmbH & Co. KG, 4040 Linz, Austria.

K. Dufrêne is with Danube Mobile Communications Engineering (DMCE) GmbH & Co. KG, 4040 Linz, Austria.

J. Fritzin and J. Moreira are with Intel Deutschland GmbH, 85579 Munich, Germany.

H. Pretl is with the Research Institute for Integrated Circuits (RIIC), Johannes Kepler University, 4040 Linz, Austria, and also with Danube Mobile Communications Engineering (DMCE) GmbH & Co. KG, 4040 Linz, Austria (e-mail: harald.pretl@jku.at).

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/TMTT.2015.2503343

Especially for low-power circuits each new technology node is beneficial in terms of integration level, chip area, and power consumption [3]. Consequently, it is desired to integrate the entire transceiver in the latest CMOS technology to benefit from those properties and achieve a low-cost single-chip solution. This means all RF front-end building blocks need to be implemented in the same technology as the digital front-end [4]. Especially for the implementation of the power amplifier (PA), however, the use of deep nanometer CMOS technology poses several challenges [5], even though a successful integration into a complete transceiver chain has been proven by Moreira *et al.* [4].

With a device gate–oxide thickness in the order of 1.5 nm and below for 28-nm CMOS technology [6], the transistor gate–drain/source breakdown voltages are reduced. This requires reduced supply voltages in order not to damage the devices. Moreover, since the transistor output power limit is proportional to the square of the device supply, the required load impedance to reach a desired power level needs to be reduced as well. Consequently, parasitic resistance-aware circuit design is becoming more critical and challenging in deeply scaled CMOS technologies. The increased resistive losses in the back-end due to scaled metal layers mainly impact the PA output power and efficiency [7].

To fulfill output power requirements for typical 3rd Generation Partnership Project (3GPP) UMTS terrestrial radio access (UTRA) operation, peak voltages of tens of volts might drop across the PA output. In order to be compliant with the reduced breakdown voltages in nanometer technologies, stacking of transistors is mandatory to handle the large voltage swings and to generate the desired RF output power without destroying the devices [8]. Therefore, various circuit techniques have been proposed to overcome the limited power capability of scaled CMOS devices [9]–[11] or to make the PA robust versus high voltage swing [12], [13]. Furthermore, due to specifications of modern communication schemes incorporating high peak-to-average ratio (PAR) signals and stringent out-of-band requirements [14], the efficiency versus linearity tradeoff in PAs becomes even more important than it has been in the past. Large back-off requirements in order to achieve the desired PA linearity generally result in poor system efficiency.

This paper is an extension of [15]–[17] and describes the design and analysis of a linear two-stage watt-level PA implemented in a 28-nm standard CMOS process.

Unlike [10], [12], and [18], no additional external components like off-chip predrivers or matching networks are required for operation. The PA has a small-signal gain of more than 28 dB; therefore, it can act as a standard building block to be integrated into modern low-power digital RF transceivers. This paper particularly addresses the challenges and design trade-offs of integrating high-power circuitry using deep-nanometer CMOS technology. Circuit techniques to overcome the problem of limited transistor power capability in scaled CMOS technology are presented and verified by measurements.

Section II presents the overall system concept and emphasizes problems to be solved for a successful PA implementation when utilizing such a deep-nanometer CMOS process. In Section III, we show the design of predriver and interstage matching circuits. A key element of the proposed PA is a triple-stack transistor array to generate watt-level RF power, which is introduced in Section IV. In Section V, the biasing concept is presented. The physical implementation of the output matching network (OMN) including details of electromagnetic (EM) analysis using the Sonnet field simulator is explained in Section VI. Sections VII and VIII show sign-off simulations, the silicon implementation, and experimental results for dc and small-signal measurements. Measurements using sinusoidal stimulus have been included and 3GPP operation has been proven using RMC12k2 test signals. A comparison with state-of-the-art implementations of recently published CMOS handset PAs is shown in Section IX, which also concludes this work.

## II. SYSTEM ARCHITECTURE

For UTRA applications 3GPP specifies maximum channel transmit (TX) power at the antenna to be 24 dBm [14]. For a generic multi-mode/multi-standard W-CDMA/UMTS transceiver architecture, as proposed by Holma and Toskala in [19], the key building blocks in the TX path are the PA followed by a duplex filter and isolator. The isolator output is attached to an antenna switch, which is the final element in the TX chain. When adding typical insertion loss (IL) of the antenna switch ( $IL_{ant} \approx 0.5$  dB), duplex filter ( $IL_{dpx} \approx 2$  dB), and isolator ( $IL_{iso} \approx 0.5$  dB), the PA has to deliver roughly 27-dBm linear RF output power to overcome the TX chain losses. At this power level it still has to fulfill 3GPP requirements. For a signal PAR of up to 5 dB, a peak power requirement of  $P_{out} \approx 31$  to 32 dBm at the PA output results [14]. This translates to a peak-to-peak voltage swing of

$$V_{out} = 2 \cdot \sqrt{2 \cdot Z_L \cdot 10^{(P_{out} + PAR)/10} \cdot 1 \text{ mW}} \quad (1)$$

across a single-ended load  $Z_L$  at the PA output. For the above equation, a peak-to-peak voltage of approximately 25 V results when using the aforementioned values and assuming a pure resistive load of  $Z_L = 50 \Omega$ .
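As a numerical check of the link budget and of (1), the following Python sketch adds the quoted insertion losses to the 24-dBm antenna requirement and evaluates the resulting peak-to-peak swing. The function and variable names are illustrative and not part of the original design flow; a purely resistive 50-Ω load and a 5-dB PAR are assumed, as in the text.

```python
import math

def required_pa_power_dbm(p_ant_dbm, losses_db):
    """PA output power needed so that p_ant_dbm remains after the TX-chain losses."""
    return p_ant_dbm + sum(losses_db)

def peak_to_peak_voltage(p_out_dbm, par_db, z_load_ohm=50.0):
    """Evaluate (1): peak-to-peak swing across a resistive load at the envelope peak."""
    p_peak_w = 1e-3 * 10 ** ((p_out_dbm + par_db) / 10)  # dBm -> W, referenced to 1 mW
    v_amplitude = math.sqrt(2 * z_load_ohm * p_peak_w)    # amplitude of the sinusoid
    return 2 * v_amplitude

# Antenna switch, duplex filter, and isolator losses as quoted in the text
p_pa = required_pa_power_dbm(24.0, [0.5, 2.0, 0.5])
v_pp = peak_to_peak_voltage(p_pa, par_db=5.0)
print(f"PA output power: {p_pa:.1f} dBm, peak-to-peak swing: {v_pp:.1f} V")
```

With these inputs the sketch reproduces the roughly 27-dBm linear output power and the approximately 25-V swing stated above.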

Besides power specification, coexistence is another crucial issue in 3GPP communications. That is, the signal transmission



Fig. 1. PA block diagram including chip boundary and reflection coefficients in impedance Smith chart. (a) Proposed PA system architecture block diagram. (b) Reflection coefficients ( $Z_0 = 50 \Omega$ ).

of the PA in a dedicated TX channel must not interfere with any adjacent or alternate channel. Therefore, 3GPP defines an adjacent-channel leakage ratio (ACLR) for UTRA applications, which must not exceed  $-33$  dBc at 5-MHz spacing from the carrier frequency [14]. In order to fulfill such stringent linearity requirements, a highly linear PA transfer characteristic is mandatory. However, due to the strongly non-linear behavior of CMOS PAs, techniques to linearize the gain stage such as analog or digital predistortion (DPD) are usually required [20] to fulfill UTRA linearity specifications.

Recently Breschel *et al.* presented a solution for a multi-standard second-generation (2G)/third-generation (3G)/fourth-generation (4G) cellular modem in 28-nm CMOS technology [21]. This single-chip radio supports multiple radio access technologies like time division duplex (TDD)/frequency division duplex (FDD) long-term evolution (LTE), TDD/FDD high-speed packet access (HSPA), and Global System for Mobile Communications (GSM)/General Packet Radio Service (GPRS); however, it does not feature an integrated PA. For a possible future merge of similar digital baseband/low-power RF architectures, a PA exhibiting high linear gain and utilizing low-power deep nanometer CMOS devices is required. A classical approach to overcome the problem of limited linear gain of CMOS PAs is applying a two-stage topology, which is proposed in Fig. 1(a).

From left to right there is a differentially driven input transformer connected to a predriver stage. The interstage matching network connects the predriver with the power ( $gm$ ) stage. This block generates more than one watt of peak RF output power and is directly attached to an on-chip output transformer. Electrostatic discharge (ESD), process-voltage-temperature (PVT) compensation, biasing, and control networks have been omitted in this visualization. In total, six different reflection coefficients can be identified from input to output, which are shown in Fig. 1(b). The largest transformation ratio is from predriver



Fig. 2. Simplified circuit diagram of the two-stage PA. The predriver amplifier (dashed-dotted frame – · –), the interstage matching (dashed frame —), and transconductance stage (dashed-dotted-dotted frame – · ·) are highlighted. In the schematic, the circuit's key parameters are annotated.



Fig. 3. Sonnet model of the input matching network used for EM simulations. Reference planes (—) and co-calibrated ports are shown. Metal trace width  $W$ , spacing  $s$ , and distance  $d$  for selected path segments are shown.

output  $\Gamma_{\text{out},\text{pre}}^{\text{opt}}$  to  $g_m$ -stage input  $\Gamma_{\text{in},g_m}^{\text{opt}}$ . Fig. 2 depicts a simplified PA circuit diagram, where ESD, PVT compensation, biasing, and control circuits again are not drawn for brevity. For future investigations on back-off and low battery voltage operation, the transistor array can be scaled down to  $1/2 \cdot W_{gm}$  and  $1/4 \cdot W_{gm}$  by means of two control bits  $b_0$  and  $b_1$  in combination with a thin-oxide 28-nm transistor  $M_0$  that is switched to either high or low impedance. Its gate voltage has been generated using an on-chip low drop-out (LDO) regulator.

## III. PREDRIVER AND INTERSTAGE MATCHING

### A. Predriver Amplifier

Fig. 3 shows an EM model for the input matching network including bump connection pads. The reference planes (dashed lines) have been chosen in such a way that the model is slightly too pessimistic since here the current distributes from the left pad edge instead of the pad center, where actually the tin-silver (SnAg) bumps are physically connected. A capacitively tuned transformer  $T_1$  acts as input impedance matching network and provides the transistor  $M_{1,\text{pre}}$  bias voltage utilizing its secondary center tap. It transforms  $\Gamma_S = 0$  to  $\Gamma_{\text{in},\text{pre}}^{\text{opt}} = 0.96 + j0.15$ , which is an optimum input impedance for maximum small-signal gain. The predriver stage optimum input impedance has been determined utilizing a small-signal analysis and then applying a conjugate matching. To ensure low IL for low and medium input power, the transformer metal



Fig. 4. Simulated input matching small-signal performance. (a) Inductance and coupling. (b) Matching and IL.

TABLE I  
SUMMARY OF INPUT TRANSFORMER PERFORMANCE AT 1.8 GHz

|       | $M$ | $L_p$ | $L_s$ | $k$  | $Q_p$ | $Q_s$ | $\eta$ | $IL$ |
|-------|-----|-------|-------|------|-------|-------|--------|------|
| Unit  | nH  | nH    | nH    | —    | —     | —     | %      | dB   |
| Value | 0.8 | 1.2   | 1.2   | 0.74 | 11    | 11    | 72     | 1.4  |

traces have been designed as wide as possible using only thick metal layers.

Fig. 4(a) shows the simulated self-inductance  $L$ , mutual inductance  $M$ , and coupling coefficient  $k$  of the input transformer's EM model, whereas matching and IL are shown in Fig. 4(b). Given the transformer IL in decibels, its efficiency  $\eta$  can be calculated as  $\eta = 10^{-IL/10}$ . The transformer performance has been summarized in Table I for operation at  $f = 1.8$  GHz.
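The conversion between insertion loss and efficiency can be sketched in a few lines of Python. The helper names are illustrative; the 1.4-dB value is the input-transformer IL from Table I.

```python
import math

def il_db_to_efficiency(il_db):
    """Power efficiency (0..1) of a passive network with insertion loss il_db in dB."""
    return 10 ** (-il_db / 10)

def efficiency_to_il_db(eta):
    """Inverse relation: insertion loss in dB for a power efficiency eta (0..1)."""
    return -10 * math.log10(eta)

eta_t1 = il_db_to_efficiency(1.4)  # IL of the input transformer from Table I
print(f"IL = 1.4 dB -> efficiency = {100 * eta_t1:.0f}%")
```

The result agrees with the 72% efficiency listed in Table I.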

A differential cascode circuit, which is shown within the dashed-dotted frame in Fig. 2, serves as an on-chip predriver. Transistor  $M_{1,\text{pre}}$  acts as a common-source (CS) device and transistor  $M_{2,\text{pre}}$  is connected as a common-gate (CG) device. The RF signal is applied to the gate of  $M_{1,\text{pre}}$ . The gate of  $M_{2,\text{pre}}$  is RF grounded with a dc value of  $V_{g2,\text{pre}} = 1.8$  V. A cascode structure has been applied since it reduces the Miller effect: the small load resistance seen by the CS stage results in an intrinsically low voltage gain from the gate of  $M_{1,\text{pre}}$  to the source node of  $M_{2,\text{pre}}$ , which lowers the coupling between input and output. Transistor geometry is  $W_{\text{pre}}/L_{\text{pre}} = 50 \times 10 \mu\text{m}/180 \text{ nm}$  to generate saturated output power of  $P_{\text{out}} \approx 30 \text{ mW} \approx 15 \text{ dBm}$ . The predriver biasing



Fig. 5. Sonnet model of interstage matching network with co-calibrated ports  $G \dots K$ . The ground plane underneath the predriver structure has been included to improve model accuracy. Metal trace width  $W$ , spacing  $s$ , and distance  $d$  for selected path segments are shown.



Fig. 6. Simulated interstage transformer small-signal performance. (a) Inductance and coupling. (b) Matching and IL.

is in a class-A configuration using  $V_{DD,\text{pre}} = 1.8 \text{ V}$  and  $V_{g1,\text{pre}} = 0.7 \text{ V}$ . The  $M_{1,\text{pre}}$  dc gate voltage has been generated on-chip by means of a source degenerated diode-connected transistor (not shown in Fig. 2 for brevity).

### B. Interstage Matching

Fig. 5 shows the physical design of the interstage matching network including a predriver ground plane. Tuning capacitors at the transformer's input and output are in the order of 1 pF and have been omitted in this visualization.

To minimize interstage matching network related IL, careful layout techniques and placement of the inductors are mandatory. To optimize the inductors' position, the interstage transformer  $T_2$  metal traces are slightly smaller when compared to the input matching transformer  $T_1$ . Therefore, the inductors' distance to each other can be reduced, which turned out to be beneficial due to increased differential coupling. Fig. 6(a) shows the simulated self-inductance  $L$ , mutual inductance  $M$ , and coupling coefficient  $k$  of the interstage transformer's EM model, whereas matching and IL are shown in Fig. 6(b). Values evaluated at  $f = 1.8 \text{ GHz}$  are summarized in Table II, where again  $\eta = 10^{-\text{IL}/10}$ .

EM simulations furthermore showed that octagonal spiral inductors are beneficial in terms of quality factor compared with square inductors. Fig. 7(a) presents the final simulation results of the coils' on-chip implementation. At the frequency of interest, a peak in the quality factor of  $Q = 12$  and an inductance of  $L =$

TABLE II  
SUMMARY OF INTERSTAGE TRANSFORMER PERFORMANCE AT 1.8 GHz

|       | $M$ | $L_p$ | $L_s$ | $k$  | $Q_p$ | $Q_s$ | $\eta$ | $IL$ |
|-------|-----|-------|-------|------|-------|-------|--------|------|
| Unit  | nH  | nH    | nH    | —    | —     | —     | %      | dB   |
| Value | 0.8 | 1.1   | 1.1   | 0.76 | 21    | 11    | 71     | 1.5  |



Fig. 7. Simulated interstage matching small-signal performance. (a) Quality factor and inductance for  $L_1$  and  $L_2$ . (b) Full interstage matching network S-parameter simulation.

4.5 nH can be noticed. Due to the high impedance transformation ratio from predriver to  $g_m$ -stage, a multistage matching architecture overcomes the problem of limited bandwidth. As indicated by the Smith chart representation given in Fig. 1(b), total impedance transformation ranges from  $\Gamma_{\text{out},\text{pre}}^{\text{opt}} = 0.94 + 0.03j$  to  $\Gamma_{\text{in},g_m}^{\text{opt}} = 0.99 - 0.01j$ . Fig. 7(b) shows the simulation results for the full interstage matching network. Whereas the 3-dB bandwidth is approximately 500 MHz, matching to the PA remains below  $-8 \text{ dB}$  in this interval. Total IL is mainly dominated by the inductors'  $Q$ -factor and by the transformer's IL.
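To give a feeling for the impedance levels behind these reflection coefficients, the following Python sketch converts them to impedances for  $Z_0 = 50 \Omega$  using the standard relation  $Z = Z_0 (1 + \Gamma)/(1 - \Gamma)$ . The function name is illustrative. Because both coefficients lie close to the edge of the Smith chart, the computed impedances are very sensitive to the rounding of the quoted values and should be read as indicative only.

```python
def gamma_to_impedance(gamma, z0=50.0):
    """Convert a reflection coefficient to an impedance for reference impedance z0."""
    return z0 * (1 + gamma) / (1 - gamma)

# Reflection coefficients as quoted in the text (rounded to two decimals)
gamma_out_pre = complex(0.94, 0.03)   # optimum predriver output
gamma_in_gm = complex(0.99, -0.01)    # optimum gm-stage input

for label, gamma in [("predriver output", gamma_out_pre),
                     ("gm-stage input", gamma_in_gm)]:
    z = gamma_to_impedance(gamma)
    print(f"{label}: Re(Z) = {z.real:.0f} ohm, Im(Z) = {z.imag:.0f} ohm")
```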

## IV. DIFFERENTIAL TRIPLE-STACK PA CORE

The core element of the proposed two-stage PA is the  $g_m$ -stage, which is shown in Fig. 8(a) in more detail (single-ended for brevity), where the drain of transistor  $M_3$  is connected to the OMN. The fundamental concept has been introduced by Leuschner *et al.* [12], [13] for a stack of four transistors. The authors call this a stacked-cascode architecture, which leads to good robustness and efficiency for gigahertz operation. It has been optimized for an implementation in a 65-nm standard CMOS process. For the proposed PA, we apply a triple-stack concept [22] utilizing a floating cascode to reduce the overall stack on-resistance. This is beneficial in terms of efficiency and RF performance to overcome the increased parasitic losses caused by scaled metals in the 28-nm CMOS process BEOL stack-up. The triple-stack transistors are realized in a p-well within a deep n-well to avoid breakdown of the drain diode. The deep n-well is biased to the supply voltage, and the bulk of the upper cascode is connected to its source. Hence, the p-well swings together with the top cascode circuit, thereby forcing the bulk-to-source potential to zero, which eliminates the inherent body effect of standard CMOS transistors. This can also be seen from (2),

$$V_{\text{th}} = V_{\text{th},0} + \gamma \left( \sqrt{V_{\text{bs}} + 2\Phi_s} - \sqrt{2\Phi_s} \right) \quad (2)$$

where  $V_{\text{th}}$  is the device threshold voltage,  $V_{\text{bs}}$  is the bulk–source voltage,  $\Phi_s$  is the surface potential,  $V_{\text{th},0}$  is the threshold voltage of a long-channel device at zero substrate bias, and  $\gamma$  is the body bias coefficient. Fig. 9 shows the impact of this effect for the proposed triple-stack architecture in comparison with a common-ground cascode.



Fig. 8. Triple-stack schematic and simulated transient waveforms. Gate voltages are dashed (—), drain voltages are solid lines (—). Gate-drain/source voltage stress must not exceed the SOA, which is 3.5-V RF stress for each device. Critical HCI transitions are highlighted. (a) Simplified triple-stack. (b) Transient terminal waveforms.



Fig. 9. Impact of the floating bulk cascode on PA performance. (a) Triple-stack performance when using ideal matching networks. (b) Bulk-source voltage for common ground cascode architecture.

Simulations utilize ideal (lossless) matching networks and no distributed parasitic effects in the transistor array. Whereas there is no difference in saturated output power for the floating bulk cascode when compared to a common ground cascode, the peak power-added efficiency (PAE) improves by roughly  $\Delta\text{PAE} = 4\%$ .
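The body-effect relation (2) can be evaluated numerically to illustrate why a zero bulk-to-source potential is attractive. The sketch below is a minimal illustration; the values chosen for  $V_{\text{th},0}$ ,  $\gamma$ , and  $\Phi_s$  are hypothetical placeholders and are not parameters of the 28-nm process used in this work.

```python
import math

def threshold_voltage(v_bs, v_th0=0.45, gamma=0.4, phi_s=0.4):
    """Evaluate (2). v_bs is the bulk-source bias as it enters (2), in volts.

    v_th0 (V), gamma (sqrt(V)), and phi_s (V) are hypothetical example values.
    """
    return v_th0 + gamma * (math.sqrt(v_bs + 2 * phi_s) - math.sqrt(2 * phi_s))

for v_bs in (0.0, 1.0, 2.0):
    print(f"V_bs = {v_bs:.1f} V -> V_th = {threshold_voltage(v_bs):.3f} V")
```

For  $V_{\text{bs}} = 0$  the bracket in (2) vanishes and the threshold voltage stays at  $V_{\text{th},0}$ , which is the situation enforced by tying the p-well to the source of the top cascode.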

Recalling the root-mean-square (RMS) output power and a typical 3GPP signal PAR, a voltage swing of up to  $V_{\text{out},pp} \approx 25$  V drops across a  $50\text{-}\Omega$  load, as demonstrated in (1). This translates to a differential voltage swing of

$$V_{d3}^+ = n_3 \cdot V_{\text{out},pp} = \frac{n_{3,p}}{n_{3,s}} \cdot V_{\text{out},pp} \quad (3)$$

at the drain of the positive transistor  $M_3^+$  with  $n_3$  being the ratio of the number of primary to secondary windings of transformer  $T_3$ . In order to comply with the transistors' breakdown voltage, which is on the order of 3.5-V RF stress between any two terminals [23], stacking of three transistors can be derived

for  $n_3 \approx 1/1.5$ . By properly choosing the capacitive part of the OMN and the value of  $C_{g3}$ , where the parasitic gate-drain capacitance of  $M_3^+$  must not be neglected, the transient waveforms according to Fig. 8(b) result.

Since reliability is crucial in CMOS circuits with large voltage swings, it is important to analyze the transient voltages for possible sources of hot-carrier injection (HCI) [24], [25]. RF stress with high drain and low gate voltage might create electron-hole pairs, which can then be accelerated in the channel electric field. This injection mechanism has been reported to be the most severe device degradation mechanism in submicrometer technology since a large number of hot electrons are injected into the gate oxide at the same time [26]. In Fig. 8(b), the most critical transitions are highlighted. Given the simulated conditions, the RF stress level is not expected to cause any HCI.

Unlike the solution proposed by Sowlati and Leenaerts [27] we apply the capacitive feedback to the gate of the upper cascode from its drain rather than its source, which facilitates the physical design of the capacitor array. For this situation, however, it is worth noting that  $R_{g3}$  and  $C_{g3}$  should be tuned in such a way that, on the one hand, the gate-drain/source stress does not exceed the safe operating area (SOA) at the top transistor, and on the other hand, the low-pass filter time constant fulfills (4) in order not to cause biasing related memory effects within the modulation bandwidth  $B_{\text{mod}}$ ,

$$\tau_3 = R_{g3}C_{g3} \ll \frac{1}{B_{\text{mod}}}. \quad (4)$$
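Condition (4) can be checked quickly for candidate component values. In the Python sketch below, the values of  $R_{g3}$  and  $C_{g3}$  are hypothetical, the modulation bandwidth is taken as the 3.84-MHz WCDMA chip rate, and "much smaller" is interpreted as a factor of at least ten; all three choices are assumptions made for illustration only.

```python
def bias_time_constant_ok(r_ohm, c_farad, b_mod_hz, margin=10.0):
    """Check (4): tau = R*C must be much smaller than 1/B_mod.

    'Much smaller' is interpreted as tau * margin <= 1/B_mod (assumed margin).
    Returns (tau in s, limit 1/B_mod in s, pass/fail flag).
    """
    tau = r_ohm * c_farad
    limit = 1.0 / b_mod_hz
    return tau, limit, tau * margin <= limit

# Hypothetical gate network: 10 kOhm and 1 pF; B_mod assumed as 3.84 MHz
tau, limit, ok = bias_time_constant_ok(10e3, 1e-12, 3.84e6)
print(f"tau = {tau * 1e9:.1f} ns, 1/B_mod = {limit * 1e9:.1f} ns, condition met: {ok}")
```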

Single-ended impedance level seen at the drain side of  $M_3$  is

$$\underline{Z}_{d3} = \left( \frac{n_{3,p}}{n_{3,s}} \right)^2 \cdot \underline{Z}_L \quad (5)$$

which translates to a minimum requirement for the transistor array size of  $W/L \geq 1000 \times 10 \mu\text{m}/180 \text{ nm}$  to generate the desired PA drain current. Gate length of the triple-stack transistors is the minimum feature size for thick-oxide devices. In the output stage, thick-oxide transistors are required since the gate-drain/source voltage RF stress would exceed the SOA for thin-oxide devices, which is much less than 3.5 V. The operating point for the CS transistor is in the class-AB configuration using  $V_{g1} = 0.55$  V. Supply voltage at the top cascode transistor is  $V_{DD} = 3$  V.

## V. BIASING AND PVT COMPENSATION

To generate the required biasing voltages and currents on-chip while remaining PVT independent, special care has been taken with the bias voltage generation of the triple-stack CS transistor. The proposed biasing scheme relies on a controlled current mirror and a PA replica stage. Fig. 10 shows a high-level conceptual view, where the actual PA stage is framed by a dashed line, and the scaled PA replica is within a dashed-dotted frame. For simplicity, the circuit is analyzed single-ended in this section.

Due to its cascoded nature, the current  $I_{d1}$  flowing through the triple-stack CS transistor  $M_1$  essentially dominates the PA operating point. A first-order approximation to describe this current is well known and repeated in (6), where the electron



Fig. 10. Proposed biasing concept. Key elements are PA core (dashed —) transistors and a scaled replica (dashed-dotted ——). Both blocks are connected to a unity current mirror and an error amplifier  $E_1$ , respectively.

mobility  $\mu_N$ , oxide capacitance  $C_{\text{ox}}$ , and the device threshold voltage  $V_{\text{th}}$  are major contributors for on-chip PVT variations,

$$I_{d1} \approx \frac{1}{2} \mu_N \cdot C_{\text{ox}} \cdot \frac{W_1}{L_1} \cdot (V_{g1} - V_{\text{th}})^2. \quad (6)$$
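The sensitivity of (6) to threshold-voltage variation can be illustrated with a short Python sketch. The gate bias of 0.55 V is the value quoted in Section IV; the nominal threshold voltage and the ±50-mV shift are hypothetical. Since only current ratios are printed, the prefactor  $\mu_N C_{\text{ox}} W_1/L_1$  cancels and is set to one.

```python
def square_law_current(v_g, v_th, k_prime=1.0, w_over_l=1.0):
    """First-order drain current from (6); k_prime stands for mu_N * C_ox."""
    v_ov = v_g - v_th  # overdrive voltage
    if v_ov <= 0:
        return 0.0     # device off in this first-order model
    return 0.5 * k_prime * w_over_l * v_ov ** 2

V_G1 = 0.55      # CS gate bias quoted in Section IV
V_TH_NOM = 0.45  # hypothetical nominal threshold voltage

i_nom = square_law_current(V_G1, V_TH_NOM)
for delta_vth in (-0.05, 0.0, 0.05):
    ratio = square_law_current(V_G1, V_TH_NOM + delta_vth) / i_nom
    print(f"V_th shift {delta_vth * 1e3:+.0f} mV -> I_d1 / I_d1,nom = {ratio:.2f}")
```

At a small overdrive voltage, even a modest threshold shift changes the quiescent current by a large factor in this simple model, which motivates a closed-loop bias generation.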

To compensate side effects of bias point fluctuations for different dies and wafers, an accurate PVT-independent reference current  $I_{\text{rep1}}$  is injected externally into the drain of a  $1 : m$  scaled CS replica transistor labeled  $X_1$ . The reference current is chosen as  $I_{\text{rep1}} = I_{DD}/(2m)$ . A major issue for compensating PVT variations is the drain-to-source voltage of  $M_1$  and  $X_1$ . To overcome this problem, a  $1 : m$  scaled PA replica stack has been introduced in close physical proximity to the actual PA stage. Thereby the assumption of equal temperature and silicon conditions for the PA and replica stage is valid. The replica stage consists of transistors  $X_1^*$ ,  $X_2^*$ , and  $X_3^*$ . Given the reference current  $I_{\text{rep1}}$ , the appropriate drain voltage at  $X_1$  will settle for nominal conditions. To compensate PVT variations at this node, a high-gain error amplifier  $E_1$  connects the drain terminals of  $X_1$  and  $X_1^*$ , which are in 1:1 current mirror configuration. In case of a divergence between the drain terminal voltages of  $X_1$  and  $X_1^*$ , the input stage of  $E_1$  adjusts the gate terminal of both transistors to compensate it accordingly. As a consequence, this method allows the control of the actual PA drain current in absence of RF stress by modifying the drain-to-source voltage of replica transistor  $X_1$  properly. To fix the bias drain voltage of the PA CS device  $M_1$ , a diode-connected transistor  $X_2$  has been introduced. In conjunction with a source resistance  $R_{X_2^*}$ , the gate voltage can be adjusted using a reference current  $I_{Vg2}$ . After unity gain buffering,  $V_{g2}$  is applied to  $M_2$  and to  $X_2^*$ . Low output impedance of the  $V_{g1}$  and  $V_{g2}$  unity gain buffers within the RF modulation bandwidth has been a major design criterion to minimize bias-related memory effects. Transistor  $X_3$  is in diode-connected configuration and directly attached to an off-chip supply voltage without being RF stressed.

Simulations show the dc operation point variation for the (un-)compensated amplifier. While under nominal conditions a quiescent current of  $I_{DD} \approx 250$  mA at  $\vartheta \approx 60^\circ\text{C}$  is desired,



Fig. 11. Simulation of biasing concept. Nominal (dotted ···), slow (dashed-dotted - · -) and fast (dashed —) corners are shown over temperature.



Fig. 12. MC simulation for the uncompensated PA for  $N = 500$  runs.



Fig. 13. MC simulation for the proposed biasing scheme for  $N = 500$  runs.

a maximum spread of roughly 290 mA, i.e., more than 100%, can be observed over worst case PVT corners. For the compensated PA, however, the operation point is nearly independent from PVT variation, which is shown in Fig. 11.

To verify that the proposed biasing solution is robust against statistical variations, the variation of  $I_{DD}$  over process corners and mismatch has been evaluated using Monte Carlo (MC) simulations. Shown for  $N = 500$  runs, Figs. 12 and 13 verify the proposed concept since, contrary to the uncompensated PA, the PA drain current is nearly constant over worst case corners and mismatch. The key parameters are summarized in Table III, which shows that the standard deviation of  $I_{DD}$  is reduced by roughly a factor of 9 (44.3 mA/5.0 mA  $\approx 8.9$ ).
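The spread figures can be recomputed directly from the Table III entries. The short Python sketch below uses only the tabulated mean and standard deviation values; the dictionary layout and variable names are illustrative.

```python
# (mean, standard deviation) of I_DD in mA, taken from Table III
mc_stats_ma = {
    "w/o compensation": (222.6, 44.3),
    "w/ compensation": (244.1, 5.0),
}

for label, (mu, sigma) in mc_stats_ma.items():
    # Relative spread (coefficient of variation) in percent
    print(f"{label}: sigma/mu = {100 * sigma / mu:.1f}%")

sigma_reduction = mc_stats_ma["w/o compensation"][1] / mc_stats_ma["w/ compensation"][1]
print(f"Standard deviation reduced by a factor of {sigma_reduction:.2f}")
```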

Due to PVT operation point stabilization the amplifier's large-signal output power and small-signal gain have been

TABLE III  
STANDARD DEVIATION  $\sigma$  AND EXPECTATION VALUE  $\mu$  OF  $I_{DD}$

| $I_{DD}$         | $\mu$ | $\sigma$ | Unit |
|------------------|-------|----------|------|
| w/o compensation | 222.6 | 44.3     | mA   |
| w/ compensation  | 244.1 | 5.0      | mA   |

TABLE IV  
PA PERFORMANCE DEGRADATION DUE TO PVT VARIATIONS

| w/o comp.       | $N_{-40}$ | $N_{120}$ | $S_{-40}$ | $S_{120}$ | $F_{-40}$ | $F_{120}$ |
|-----------------|-----------|-----------|-----------|-----------|-----------|-----------|
| $P_{sat}$ (dBm) | 30.1      | 28.0      | 29.2      | 25.8      | 30.7      | 29.1      |
| $G$ (dB)        | 16.7      | 12.7      | 9.8       | 7.7       | 19.8      | 16.4      |
| w/ comp.        | $N_{-40}$ | $N_{120}$ | $S_{-40}$ | $S_{120}$ | $F_{-40}$ | $F_{120}$ |
| $P_{sat}$ (dBm) | 31.2      | 29.4      | 30.9      | 28.8      | 31.3      | 29.9      |
| $G$ (dB)        | 20.2      | 15.3      | 19.2      | 14.7      | 21.2      | 17.2      |

stabilized as well. Table IV compares harmonic-balance simulation results for nominal (N), slow (S), and fast (F) NMOS/PMOS devices. Simulation temperature is  $-40^{\circ}\text{C}$  and  $120^{\circ}\text{C}$ , respectively. For the uncompensated PA, a deviation of  $\Delta P_{sat} \approx 4.9$  dB in terms of saturated output power and  $\Delta G \approx 12.1$  dB in terms of linear gain can be observed. These numbers reduce to  $\Delta P_{sat} \approx 2.5$  dB and  $\Delta G \approx 6.5$  dB, respectively, when applying the proposed compensation technique.
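The quoted deviations follow from the Table IV entries as the difference between the largest and smallest value across the six corner/temperature combinations. A minimal Python sketch of this bookkeeping, with illustrative names, is given below.

```python
# Table IV entries in the column order N-40, N120, S-40, S120, F-40, F120
TABLE_IV = {
    "w/o comp.": {
        "p_sat_dbm": [30.1, 28.0, 29.2, 25.8, 30.7, 29.1],
        "gain_db": [16.7, 12.7, 9.8, 7.7, 19.8, 16.4],
    },
    "w/ comp.": {
        "p_sat_dbm": [31.2, 29.4, 30.9, 28.8, 31.3, 29.9],
        "gain_db": [20.2, 15.3, 19.2, 14.7, 21.2, 17.2],
    },
}

def spread(values):
    """Worst-case deviation: maximum minus minimum over all corners."""
    return max(values) - min(values)

for case, data in TABLE_IV.items():
    print(f"{case}: delta P_sat = {spread(data['p_sat_dbm']):.1f} dB, "
          f"delta G = {spread(data['gain_db']):.1f} dB")
```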

## VI. OMN AND DIFFERENTIAL TO SINGLE-ENDED CONVERSION

By performing large-signal load-pull analysis for the proposed transistor triple-stack, the optimum output reflection coefficient of  $\Gamma_{out,gm}^{\text{opt}} = -(990+3.98j) \times 10^{-3}$  was determined, as indicated in Fig. 1(b).<sup>1</sup> This translates to a de-normalized optimum output impedance of  $(3.5 + 5j)\Omega$ . A maximum efficiency of  $49\%$  can be simulated when applying this load impedance to the transistor stack and assuming realistic OMN losses. However, implementing such a low-impedance level on-chip has several drawbacks. It requires a transformer with very small form factor in order to have a low ohmic metal resistance. A large tuning capacitor is then required since a small-sized transformer has a low self-inductance. Additionally, a higher turns ratio is required, which results in higher losses due to reduced coupling between primary and secondary winding. Therefore, we increased the real part of  $\Gamma_{out,gm}^{\text{opt}}$  such that a single-ended resistive load of  $Z_{p,SE} \approx 7\Omega$  occurs, which is sufficient to generate the desired level of output power. Of course, this comes at the cost of efficiency degradation since the triple stack is not operating at its optimum load impedance anymore. In order to match the imaginary part, a capacitive tuning network has been introduced between ports M and N according to Fig. 2. The output transformer's final physical design, which has been used for EM field analysis, is shown in Fig. 14.

Since the OMN essentially determines the overall PA efficiency, the transformer's standalone efficiency has been identified as

$$\eta = \left. \frac{V_{out}^2 / Z_s}{v_{inp} - v_{inn}} \right|_{i_{in} = 1\,\text{A}} \quad (7)$$

<sup>1</sup>For evaluation of constant efficiency contours.



Fig. 14. Sonnet model for on-chip output transformer  $T_3$ . BEOL tuning capacitors are connected between ports M/N and O/P (not shown in this visualization). Port L is the center tap supply connection.



Fig. 15. Simulated output transformer small-signal performance. (a) Efficiency and IL. (b) Coupling and quality factor.



Fig. 16. Output transformer performance metrics versus frequency. (a) Self-inductance. (b) Impedance.

where the power available from a differential source has been evaluated using a single-ended load. The transformer's IL can be obtained from the relationship  $IL = -10 \log_{10} \eta$ . Fig. 15 depicts the simulation results of the EM model for efficiency and IL, and Fig. 16 those for inductance and impedance. The primary impedance has been identified using (8). For all simulations, the secondary impedance is  $Z_s = 50\,\Omega$ , except for the determination of the primary/secondary inductance, for which we used an open circuit. The output transformer figures of merit, evaluated at  $f = 1.8$  GHz, are summarized in Table V.

$$Z_p = \Re \left\{ \left. \frac{1}{v_{inp} - v_{inn}} \right|_{i_{in} = 1\,\text{A}} \right\}. \quad (8)$$
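As a quick numerical check of the relationship $IL = -10 \log_{10} \eta$, the following Python sketch converts between transformer efficiency and insertion loss. The function names are illustrative assumptions and not part of the original design flow; the inputs are the rounded entries of Table V.

```python
import math


def efficiency_to_il_db(eta: float) -> float:
    """Insertion loss in dB for a power efficiency eta (0 < eta <= 1)."""
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must lie in the interval (0, 1]")
    return -10.0 * math.log10(eta)


def il_db_to_efficiency(il_db: float) -> float:
    """Power efficiency (0..1) for an insertion loss given in dB."""
    return 10.0 ** (-il_db / 10.0)


if __name__ == "__main__":
    # Rounded values taken from Table V: eta = 83 %, IL = 0.84 dB.
    print(f"IL for eta = 83 %: {efficiency_to_il_db(0.83):.2f} dB")
    print(f"eta for IL = 0.84 dB: {100 * il_db_to_efficiency(0.84):.1f} %")
```

With the tabulated inputs, the two conversions give roughly 0.81 dB and 82.4%, so the efficiency and IL entries of Table V agree to within a few hundredths of a decibel, which is plausibly a rounding effect.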

## VII. SIGN-OFF SIMULATIONS

During final sign-off simulations, the major contributors to performance degradation have been identified. Table VI presents the PA performance, where parasitic effects have been taken into consideration. The contributions in terms of power loss  $\Delta P$  and efficiency loss  $\Delta PAE$  have been evaluated at  $f = 1.8$  GHz. Achievable saturated output power

TABLE V  
SUMMARY OF OUTPUT TRANSFORMER PERFORMANCE AT 1.8 GHz

|       | $Z_p$    | $L_p$ | $L_s$ | $k$  | $Q_p$ | $Q_s$ | $\eta$ | $IL$ |
|-------|----------|-------|-------|------|-------|-------|--------|------|
| Unit  | $\Omega$ | nH    | nH    | —    | —     | —     | %      | dB   |
| Value | 14       | 0.93  | 2.8   | 0.76 | 0.76  | 21    | 83     | 0.84 |

TABLE VI  
SIMULATED PA PERFORMANCE DEGRADATION

|              | Input <sup>1</sup> | OMN  | Gate | SPLY | Bias | $\vartheta$ | Unit |
|--------------|--------------------|------|------|------|------|-------------|------|
| $\Delta P$   | 0.4                | 0.8  | 0.3  | 0.3  | 0.2  | 0.1         | dB   |
| $P_{sat}$    | 33.0               | 32.2 | 31.9 | 31.6 | 31.4 | 31.3        | dBm  |
| $\Delta PAE$ | 3.2                | 12.5 | 6.2  | 1    | 1.7  | 0.8         | %    |
| $PAE_{sat}$  | 56.8               | 44.3 | 38.1 | 37.1 | 35.4 | 34.6        | %    |

<sup>1</sup>Includes also coreboard losses and the interstage matching.

$P_{sat}$  and efficiency  $PAE_{sat}$  have been determined at maximum input power when using the default biasing configuration ( $V_{DD,pre} = 1.8$  V and  $V_{DD} = 3$  V), and for the nominal corner.

As expected, the OMN is the dominant contributor to IL, followed by the input stage (input and interstage matching including coreboard losses). Interestingly, an extraction of the supply networks (SPLY) shows less impact on total power losses than expected, as only approximately 0.3 dB of loss is predicted by simulations. Extraction of the biasing stages and increased temperature  $\vartheta$  hardly affect the overall performance. In total, a loss of roughly  $\Delta P \approx 2.1$  dB can be expected from coreboard input to PA output when comparing transistor-level simulations without any parasitic effects to those including parasitic effects. Efficiency reduces from  $PAE = 57\%$  to roughly 35% for equal operating conditions when including parasitic effects.
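The loss budget of Table VI can be reproduced with a short Python sketch that subtracts each contribution in turn. The starting values (33.4 dBm and 60.0%) are not stated in the text; they are an assumption obtained by adding the first loss back onto the first tabulated value, and the label `Temp` stands in for the temperature column $\vartheta$.

```python
# Per-contributor losses transcribed from Table VI.
CONTRIBUTORS = ["Input", "OMN", "Gate", "SPLY", "Bias", "Temp"]
DELTA_P_DB = [0.4, 0.8, 0.3, 0.3, 0.2, 0.1]       # power loss in dB
DELTA_PAE_PCT = [3.2, 12.5, 6.2, 1.0, 1.7, 0.8]   # PAE loss in percentage points

# Assumed starting points, inferred as (first table value + first loss).
P_SAT_START_DBM = 33.0 + 0.4
PAE_START_PCT = 56.8 + 3.2


def cumulative_budget(start, losses):
    """Return the running value after subtracting each loss in turn."""
    values = []
    current = start
    for loss in losses:
        current -= loss
        values.append(round(current, 1))
    return values


if __name__ == "__main__":
    p_sat = cumulative_budget(P_SAT_START_DBM, DELTA_P_DB)
    pae = cumulative_budget(PAE_START_PCT, DELTA_PAE_PCT)
    for name, p, e in zip(CONTRIBUTORS, p_sat, pae):
        print(f"{name:>5}: P_sat = {p:.1f} dBm, PAE_sat = {e:.1f} %")
    print(f"Total power loss: {round(sum(DELTA_P_DB), 1)} dB")
```

The running values match the $P_{sat}$ and $PAE_{sat}$ rows of Table VI, and the individual power losses add up to the 2.1 dB quoted above.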

Especially in multi-stage architectures, where high power gain is typically available, the stability of the amplifier has to be analyzed. For small-signal operation, stability can be guaranteed if and only if the amplifier has an input (output) referred stability factor  $\mu$  ( $\mu'$ ) greater than unity according to [28]. However, it is worth noting that the stability factor ensures unconditional stability if and only if a single active device is used. For a two-stage stacked transistor amplifier, there might still be oscillations between the active devices, which are not covered by this kind of analysis. Nevertheless, the stability factor is a good indicator of whether special care has to be taken at certain frequencies,

$$\mu = \frac{1 - |S_{11}|^2}{|S_{22} - S_{11}^* \Delta| + |S_{21} S_{12}|} > 1 \quad (9a)$$

$$\mu' = \frac{1 - |S_{22}|^2}{|S_{11} - S_{22}^* \Delta| + |S_{21} S_{12}|} > 1 \quad (9b)$$

where  $\Delta = \det(\mathbf{S}) = S_{11}S_{22} - S_{12}S_{21}$ . Fig. 17(a) depicts the simulated input and output referred stability factors, which have been derived from S-parameter simulations using (9). It shows the PA being unconditionally stable over the whole simulated frequency range.
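The stability factors of [28] can be evaluated per frequency point with a few lines of Python. This is a minimal sketch of the geometric stability criterion in its standard form, with a modulus around the $S_{21}S_{12}$ product; the function name and the S-parameter values in the example are hypothetical and are not taken from the simulations of Fig. 17(a).

```python
import math


def edwards_sinsky_mu(s11, s12, s21, s22):
    """Return (mu, mu_prime) for one set of two-port S-parameters."""
    s11, s12, s21, s22 = (complex(s) for s in (s11, s12, s21, s22))
    delta = s11 * s22 - s12 * s21          # determinant of the S-matrix
    loop_term = abs(s12 * s21)

    def ratio(s_a, s_b):
        numerator = 1.0 - abs(s_a) ** 2
        denominator = abs(s_b - s_a.conjugate() * delta) + loop_term
        if denominator == 0.0:
            # Degenerate case (e.g. unilateral network with a perfectly
            # matched port): treat the factor as unbounded.
            return math.inf
        return numerator / denominator

    return ratio(s11, s22), ratio(s22, s11)


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    mu, mu_prime = edwards_sinsky_mu(0.25, 0.0, 10.0, 0.5)
    print(f"mu = {mu:.2f}, mu' = {mu_prime:.2f}")
    print("unconditionally stable:", mu > 1.0 and mu_prime > 1.0)
```

In practice, the function would be applied to every frequency point of a simulated or measured S-parameter sweep, and the minimum of both factors over frequency would be compared with unity.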

Large-signal simulation results obtained with a two-tone stimulus ( $f_1 = 1.83$  GHz and  $f_2 = 1.855$  GHz) to estimate the expected PA linearity are presented in Fig. 17(b). For typical 3GPP output power, the PA has a third-order intercept point (TOI) of roughly 5 dBm input-referred and 30 dBm output-referred.
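For the two-tone stimulus above, the third-order intermodulation products and the intercept point follow from textbook relations, sketched below in Python. The tone frequencies are those of the simulation; the per-tone and IM3 power levels in the example are made-up illustration values, and the 25-dB gain is simply the difference between the quoted output and input referred TOI. The function names are hypothetical.

```python
def im3_frequencies(f1_hz, f2_hz):
    """Frequencies of the lower and upper third-order products."""
    return 2 * f1_hz - f2_hz, 2 * f2_hz - f1_hz


def output_toi_dbm(p_tone_dbm, p_im3_dbm):
    """Output referred TOI from per-tone power and IM3 power (both in dBm)."""
    return p_tone_dbm + (p_tone_dbm - p_im3_dbm) / 2.0


def input_toi_dbm(oip3_dbm, gain_db):
    """Input referred TOI for a given power gain in dB."""
    return oip3_dbm - gain_db


if __name__ == "__main__":
    lower, upper = im3_frequencies(1_830_000_000, 1_855_000_000)
    print(f"IM3 products at {lower / 1e9:.3f} GHz and {upper / 1e9:.3f} GHz")

    # Illustrative levels only: 20 dBm per tone, IM3 products at 0 dBm.
    oip3 = output_toi_dbm(20.0, 0.0)
    print(f"OIP3 = {oip3:.1f} dBm, IIP3 = {input_toi_dbm(oip3, 25.0):.1f} dBm")
```

With a tone spacing of 25 MHz, the third-order products fall at 1.805 and 1.880 GHz, i.e., directly next to the two carriers.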



Fig. 17. Sign-off simulation results for stability and linearity. (a) Stability factor. (b) Two-tone stimulus.



Fig. 18. PA die photograph. Chip size is  $A_{chip} \approx 2100 \mu m \times 1600 \mu m$ .

## VIII. IMPLEMENTATION AND EXPERIMENTAL RESULTS

### A. Silicon Implementation

This chip has been implemented in standard 28-nm CMOS technology. Only thick-oxide transistors have been used in the output stage to ensure good reliability and high ruggedness. Digital logic circuitry has been implemented using thin-oxide transistors to benefit from technology scaling. The PA has been prepared for flip-chip (FC) packaging and has been soldered directly on the printed circuit board (PCB) for evaluation. An additional benefit of the FC packaging concept is the reduced interconnect inductance of the bumps, since no bond wires are required. This also reduces simulation complexity since no special package model needs to be derived. The chip BEOL stack-up comprises five thin-metal layers, two thick-metal layers, and one aluminum layer on top. Fig. 18 shows a micrograph, where the input matching network (IMN), predriver (PRE), interstage (ITS) matching network,  $g_m$ -stage (PA), and OMN are highlighted. Also, the PA replica stage (REPL) and the placement of the biasing circuitry (BIAS) are shown.

### B. DC Measurements

Recalling that the replica current  $I_{repl}$  serves as input for the biasing stage, which determines the PA operating point, Fig. 19(a) shows the PA quiescent current being nearly independent of supply voltage variations.  $V_{DD}$  has been varied in the range of 1 V to its nominal value of 3 V. For a nominal replica current of  $I_{repl} = 550 \mu A$ , the PA drain current has a variation of only  $\Delta I_{DD} = 6 \mu A$ , as predicted by simulations in



Fig. 19. Measured dc characteristic of the proposed two-stage PA. (a) Input referred. (b) Output referred.



Fig. 20. DC drain current and small-signal gain deviation over ambient temperature and supply voltage variation.



Fig. 21. Measured S-parameter dataset for nominal biasing conditions. (a) Input reflection coefficient. (b) Output reflection coefficient. (c) Forward voltage gain. (d) Reverse voltage gain.

Section V. Fig. 19(b) shows a comparison between simulated and measured dc characteristics. Excellent matching between both datasets can be observed. The diode behavior for supply voltages below 0.5 V results from the diode-connected transistor  $M_3^{\pm}$  (from a dc perspective).

The PA dc characteristic has also been analyzed for an ambient temperature sweep of 90  $^{\circ}$ C, and for supply variations of



Fig. 22. Output power degradation versus temperature and supply voltage.



Fig. 23. Measured PA performance for pulsed sinusoidal stimulus. (a) Performance versus input power at fundamental frequency. (b) Performance versus frequency at saturated output power.



Fig. 24. Measured 3GPP output spectrum for a W-CDMA test vector. Channel power is  $P_{out} = 26.5$  dBm.

up to 1 V. As depicted in Fig. 20, a drain current deviation of  $\Delta I_{DD} = 15$  mA referred to its nominal value can be observed.

### C. Small-Signal Measurements

A comparison between simulated and measured small-signal datasets has been carried out in the frequency range between 0.5 and 5 GHz. The results are shown in Fig. 21, where the magnitudes of input/output reflection coefficients and forward/reverse voltage gains are plotted.

Whereas there is good agreement at low frequencies, a deviation in the mid- and high-frequency range can be noticed. In particular, the measured matching at 2.9 GHz for the input and at 3.6 GHz for the output reflection coefficient is not reproduced properly in simulations. In addition, there is a roughly 400-MHz mismatch between the simulated and measured frequency of maximum gain, which is shifted towards higher

TABLE VII  
PERFORMANCE COMPARISON OF RECENTLY REPORTED CMOS HANDSET PAs

| Year, Ref. | CMOS (nm) | $P_{sat}$ (dBm)   | PAE (%)         | $G_{lin}$ (dB) | ACLR (dBc)     | EVM (%)            | $V_{DD}$ (V) | $f$ (GHz) | Stages | Technique          | OMN      | Test Vector |
|-----------|---------|-------------------|-----------------|--------------|------------------|--------------------|------------|---------|--------|--------------------|----------|-------------|
| 2015 [22] | 180     | n. a.             | 26 <sup>4</sup> | 22.9         | -23              | 4.5                | 5.6        | 2.4     | 1      | triple-stack       | on-chip  | W-LAN       |
| 2008 [9]  | 130     | 35                | 51              | n. a.        | n. a.            | n. a.              | 6          | 0.9     | 1      | power-combining    | on-chip  | GPRS        |
| 2010 [10] | 130     | 29.4              | 41.4            | 14.6         | -33              | n. a.              | 6.5        | 1.9     | 1      | quad-stack         | off-chip | W-CDMA      |
| 2012 [11] | 90      | 31.8 <sup>1</sup> | 42 <sup>1</sup> | 31           | -33 <sup>3</sup> | 5.6 <sup>2,3</sup> | 3.3        | 2.5     | 2      | thin-thick cascode | on-chip  | LTE         |
| 2008 [18] | 90      | 24.3              | 27              | 8            | n. a.            | n. a.              | 1          | 5.8     | 1      | power-combining    | on-chip  | CW          |
| 2010 [12] | 65      | 31                | 61              | 27           | -33              | n. a.              | 3.6        | 0.85    | 1      | stacked-cascode    | off-chip | GSM         |
| 2011 [13] | 65      | 31                | 51              | 34           | -33              | n. a.              | 3.4        | 1.85    | 2      | stacked-cascode    | on-chip  | W-CDMA      |
| This Work | 28      | 31.2 <sup>1</sup> | 33 <sup>1</sup> | 28           | -33 <sup>3</sup> | 1.7 <sup>3</sup>   | 3.0        | 1.8     | 2      | triple-stack       | on-chip  | W-CDMA      |

<sup>1</sup>Pulsed measurements.

<sup>2</sup>For 26-dBm output power.

<sup>3</sup>With DPD.

<sup>4</sup>At 1-dB compression.

frequencies in measurements. Two explanations can be found for this behavior.

- 1) The circuit models have been extracted with a focus on large-signal operation, i.e., the lumped-element models for interconnections do not sufficiently cover small-signal effects. S-parameter simulations are especially sensitive to parasitic elements in the input and interstage matching networks, which need to be slightly adjusted according to the measurement results.
- 2) Measurements of an unpopulated PCB showed a significant frequency dependency of the input/output transmission lines and the SMA connector, which is not covered in simulations. Furthermore, the SMA connector footprint introduces a mismatch to the 50- $\Omega$  input/output transmission line, which has also not been de-embedded in simulations.

### D. Sinusoidal Measurements

For sinusoidal stimulus, a gain difference of  $\Delta G = 2.1$  dB can be noticed over temperature and supply voltage variations, as shown in Fig. 20. Output power degradation over temperature and supply changes has been observed as well. Fig. 22 shows a power drop of  $\Delta P_{out} \approx 1.7$  dB for typical 3GPP output power levels over a temperature range of  $90^\circ\text{C}$  and 0.5-V supply variation.

For circuit model validation, sinusoidal measurements have been carried out at room temperature. To eliminate undesired temperature effects of PA self-heating, a pulsed measurement setup according to [15] has been utilized, where a detailed comparison between continuous-wave (CW) and pulsed measurements is presented. For all of the following measurements, the pulse duty cycle is 10%. Pulsed measurements have been applied since, in 3GPP operation, the probability density function of the signal magnitude typically shows a probability for peak output power of less than 0.1%. Thus, performing a CW input power sweep means unrealistic temperature conditions for the PA, which can be mitigated as described in detail in [15]. For the following measurements, the PA is biased in class-AB operation, and the center frequency is  $f = 1.75$  GHz. Fig. 23 shows the measured PA output power, PAE, and power gain.

The transfer characteristic given in Fig. 23(a) has a smooth nonlinear behavior, which is typical for CMOS PAs. Saturated output power is 31.2 dBm and small-signal gain is 28 dB. In terms of efficiency, the PA has a maximum of PAE = 33% at nearly saturated output power. When operating at 28 dBm, i.e., at 3.2-dB output power back-off, the efficiency decreases to PAE = 27% for CW operation. A comparison with large-signal simulation results confirms the validity of the analog EM and RC-extracted circuit models. Although the comparison shows the circuit models to be slightly too pessimistic, good agreement between simulated and measured data has still been achieved. The deviations for low input power levels have been identified as circuit-related effects in simulations. For instance, a more complex model for the SnAg bumps needs to be derived, which might improve the accuracy of the simulated small-signal gain. The amplifier's frequency response given in Fig. 23(b) is very broadband with a 1-dB bandwidth of approximately  $B = 400$  MHz. Within this range, output power remains above 30 dBm and efficiency remains above 27%.

### E. 3GPP Operation

This chip has also been tested using different W-CDMA test vectors. In general, to achieve the required ACLR  $\leq -33$  dBc at  $\pm 5$ -MHz spacing from the carrier frequency, the use of memoryless DPD is mandatory. Fig. 24 shows the output spectrum for 3GPP test vectors prior to (gray) and with (black) the use of a fifth-order memoryless polynomial DPD. For RMC12k2 signals (3G voice, CF = 3.4 dB), the spectrum is given in Fig. 24. It is symmetric around the carrier frequency, and adjacent-channel requirements can be achieved for the specified output power. Measured drain efficiency for this operating condition ( $V_{DD} = 3$  V and  $I_{DD} = 815$  mA) is  $\eta_D = 18\%$ .
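The quoted drain efficiency can be cross-checked from the channel power of Fig. 24 and the stated supply conditions. The Python sketch below assumes that drain efficiency is defined as RF output power divided by dc supply power of the output stage; the helper names are illustrative.

```python
def dbm_to_watt(p_dbm: float) -> float:
    """Convert a power level from dBm to watts."""
    return 10.0 ** (p_dbm / 10.0) / 1000.0


def drain_efficiency(p_out_dbm: float, v_dd: float, i_dd: float) -> float:
    """Drain efficiency (0..1): RF output power over dc supply power."""
    return dbm_to_watt(p_out_dbm) / (v_dd * i_dd)


if __name__ == "__main__":
    # Operating point from the text: 26.5 dBm, V_DD = 3 V, I_DD = 815 mA.
    eta_d = drain_efficiency(26.5, 3.0, 0.815)
    print(f"P_out = {1000 * dbm_to_watt(26.5):.0f} mW")
    print(f"P_dc  = {3.0 * 0.815:.3f} W")
    print(f"eta_D = {100 * eta_d:.1f} %")
    # Saturated output power from Section VIII-D, for reference.
    print(f"P_sat = {dbm_to_watt(31.2):.2f} W")
```

A channel power of 26.5 dBm corresponds to roughly 447 mW, which, for a dc power of 2.445 W, gives a drain efficiency of about 18.3%, in line with the measured  $\eta_D = 18\%$ .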

## IX. CONCLUSION

The design and implementation of a two-stage PA using a standard 28-nm CMOS process has been presented. Table VII compares the proposed PA to the performance of recently reported CMOS implementations.

The proposed amplifier exhibits state-of-the-art linearity performance. An efficiency degradation compared to more mature technology nodes can be observed, which is the result of increased resistive losses in the back-end due to scaled metal layers. The amplifier delivers 31.2-dBm saturated output power with a small-signal gain of 28 dB. Maximum PAE is 33% at 1.75 GHz for pulsed sinusoidal stimulus. Adjacent-channel and output power requirements for 3GPP UMTS test patterns can be fulfilled when using fifth-order memoryless DPD.

This paper has shown the feasibility of using a deep-nanometer CMOS technology node to generate watt-level RF power. To the authors' best knowledge, this is the first implementation of a watt-level PA in 28-nm CMOS technology fulfilling W-CDMA/3GPP linearity and output power specifications. For the proposed solution, a differentially driven triple-stack approach in combination with a capacitive feedback path ensures that the cascode transistors remain within their safe operating area even for a voltage swing exceeding 25 V at a typical 50- $\Omega$  RF load impedance. No external active/passive circuits are required for operation, which makes this PA topology a promising candidate for integration into modern digital RF transceiver architectures.

## ACKNOWLEDGMENT

The authors would like to thank S. Leuschner, A. Saudi, V. Kisa, and other colleagues at Intel Deutschland GmbH and Danube Mobile Communications Engineering (DMCE) GmbH & Co. KG, for supportive work and constructive discussions.

## REFERENCES

- [1] G. E. Moore, "Cramming more components onto integrated circuits," *IEEE Newslett. Solid-State Circuits Soc.*, vol. 11, no. 5, pp. 33–35, Nov. 2006, reprinted from *Electronics*, vol. 38, no. 8, Apr. 19, 1965, pp. 114 ff.
- [2] Z. Boos *et al.*, "A fully digital multimode polar transmitter employing 17b RF DAC in 3G mode," in *Int. Solid-State Circuits Conf.*, 2011, pp. 376–378.
- [3] M. Horowitz, E. Alon, D. Patil, S. Naffziger, R. Kumar, and K. Bernstein, "Scaling, power, and the future of CMOS," in *Int. Electron Devices Meeting*, 2005, pp. 7–15.
- [4] J. Moreira *et al.*, "A single-chip HSPA transceiver with fully integrated 3G CMOS power amplifiers," in *Int. Solid-State Circuits Conf.*, 2015, pp. 162–164.
- [5] T. Johansson and J. Fritzin, "A review of watt-level CMOS RF power amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 1, pp. 111–124, Jan. 2014.
- [6] A. Bravaix *et al.*, "Impact of the gate-stack change from 40 nm node SiON to 28 nm high-K metal gate on the hot-carrier and bias temperature," in *Int. Reliab. Phys. Symp.*, 2013, pp. 2D.6.1–2D.6.9.
- [7] J. Scholvin, D. R. Greenberg, and J. A. del Alamo, "Fundamental power and frequency limits of deeply-scaled CMOS for RF power applications," in *Int. Electron Devices Meeting*, 2006, pp. 1–4.
- [8] P. Asbeck, "Stacked Si MOSFET strategies for microwave and mm-wave power amplifiers," in *Silicon Monolith. Integr. Circuits RF Syst.*, 2014, pp. 13–15.
- [9] I. Aoki *et al.*, "A fully integrated quad-band GSM/GPRS CMOS power amplifier," in *Int. Solid-State Circuits Conf.*, 2008, pp. 570–636.
- [10] S. Pornpromlikit, J. Jeong, C. Presti, A. Scuderi, and P. Asbeck, "A watt-level stacked-FET linear power amplifier in silicon-on-insulator CMOS," *IEEE Trans. Microw. Theory Techn.*, vol. 58, no. 1, pp. 57–64, Jan. 2010.
- [11] O. Degani *et al.*, "A 90 nm CMOS PA module for 4G applications with embedded PVT gain compensation circuit," in *Power Amplifiers Wireless Radio Appl. Conf.*, 2012, pp. 25–28.
- [12] S. Leuschner, S. Pinarello, U. Hodel, J.-E. Mueller, and H. Klar, "A 31-dBm, high ruggedness power amplifier in 65-nm standard CMOS with high-efficiency stacked-cascode stages," in *RF Integr. Circuits Symp.*, 2010, pp. 395–398.
- [13] S. Leuschner, J.-E. Mueller, and H. Klar, "A 1.8 GHz wide-band stacked-cascode CMOS power amplifier for WCDMA applications in 65 nm standard CMOS," in *RF Integr. Circuits Symp.*, 2011, pp. 1–4.
- [14] *User Equipment Radio Transmission and Reception (FDD)*, 3GPP TS 25.101, Third Generation Partnership Project.
- [15] P. Oßmann, J. Fuhrmann, J. Moreira, H. Pretl, and A. Springer, "A measurement method to mitigate temperature effects in nanometer CMOS RF power amplifiers," in *Austrochip Microelectron. Meeting*, 2014, pp. 1–5.
- [16] P. Oßmann, J. Fuhrmann, K. Dufrêne, H. Pretl, and A. Springer, "A linear watt-level power amplifier implemented in 28 nm standard CMOS technology," in *Asia-Pacific Microw. Conf.*, 2014, pp. 674–676.
- [17] P. Oßmann, J. Fuhrmann, J. Moreira, H. Pretl, and A. Springer, "A circuit technique to compensate PVT variations in a 28 nm CMOS cascode power amplifier," in *German Microw. Conf.*, 2015, pp. 131–134.
- [18] P. Haldi, D. Chowdhury, P. Reynaert, G. Liu, and A. M. Niknejad, "A 5.8 GHz 1 V linear power amplifier using a novel on-chip transformer power combiner in standard 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1054–1063, May 2008.
- [19] H. Holma and A. Toskala, *WCDMA for UMTS—HSPA Evolution and LTE*, 5th ed. New York, NY, USA: Wiley, 2010.
- [20] J. Kim, C. Park, J. Moon, and B. Kim, "Analysis of adaptive digital feedback linearization techniques," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 2, pp. 345–354, Feb. 2010.
- [21] M. Breschel *et al.*, "A multi-standard 2G/3G/4G cellular modem supporting carrier aggregation in 28 nm CMOS," in *Int. Solid-State Circuits Conf.*, 2014, pp. 190–191.
- [22] Y. Yin, X. Yu, Z. Wang, and B. Chi, "An efficiency-enhanced stacked 2.4-GHz CMOS power amplifier with mode switching scheme for WLAN applications," *IEEE Trans. Microw. Theory Techn.*, vol. 63, no. 2, pp. 672–682, Feb. 2015.
- [23] L. Larcher, D. Sanzogni, R. Bramà, A. Mazzanti, and F. Svelto, "Oxide breakdown after RF stress: Experimental analysis and effects on power amplifier operation," in *Int. Reliab. Phys. Symp.*, 2006, pp. 283–288.
- [24] J. Fritzin, C. Svensson, and A. Alvandpour, "A wideband fully integrated +30 dBm class-D outphasing RF PA in 65 nm CMOS," in *Int. Integr. Circuits Symp.*, 2011, pp. 25–28.
- [25] L. Kuang, B. Chi, H. Jia, W. Jia, and Z. Wang, "A 60-GHz CMOS dual-mode power amplifier with efficiency enhancement at low output power," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 62, no. 4, pp. 352–356, Apr. 2015.
- [26] E. Maricau and G. Gielen, *Analog IC Reliability in Nanometer CMOS*. New York, NY, USA: Springer, 2013.
- [27] T. Sowlati and D. M. W. Leenarts, "A 2.4-GHz 0.18- $\mu$ m CMOS self-biased cascode power amplifier," *IEEE J. Solid-State Circuits*, vol. 38, no. 8, pp. 1318–1324, Aug. 2003.
- [28] M. L. Edwards and J. H. Sinsky, "A new criterion for linear 2-port stability using a single geometrically derived parameter," *IEEE Trans. Microw. Theory Techn.*, vol. 40, no. 12, pp. 2303–2311, Dec. 1992.



**Patrick Oßmann** (S'14) was born in Tübingen, Germany, in 1984. He received the B.Eng. and M.Eng. degrees in electrical engineering from Hochschule Konstanz (HTWG), Konstanz, Germany, in 2010 and 2011, respectively, and is currently working toward the Ph.D. degree in microelectronics engineering at the Institute of Communications Engineering and RF-Systems (NTHFS), Johannes Kepler University Linz, Austria.

In 2011, he joined the NTHFS. His research interests are integrated circuits with a focus on RF power amplifiers for mobile communications systems.



**Jörg Fuhrmann** (S'13) was born in Nuremberg, Germany, in 1985. He received the Dipl.-Ing. (M.Sc.) degree in electrical, electronics and information technologies from the Friedrich-Alexander-University Erlangen–Nuremberg, Erlangen–Nuremberg, Germany, in 2012, and is currently working toward the Ph.D. degree at the Friedrich-Alexander-University Erlangen–Nuremberg.

In June 2012, he joined Danube Mobile Communications Engineering (DMCE) GmbH & Co. KG (majority owned by Intel Austria GmbH), Linz, Austria, in cooperation with the Institute for Electronics Engineering, Friedrich-Alexander University Erlangen–Nuremberg. His research is focused on integrated circuits for CMOS power amplifiers for fourth-generation (4G) long-term evolution (LTE).



**Krzysztof Dufrêne** (S'03–M'07–SM'13) was born in Warsaw, Poland, in 1979. He received the M.Sc. degree in telecommunications from the Warsaw University of Technology, Warsaw, Poland, in 2003, and the Ph.D. degree in electrical engineering from the University Erlangen–Nuremberg, Erlangen–Nuremberg, Germany, in 2007.

In 2007, he joined Danube Mobile Communications Engineering (DMCE) GmbH & Co. KG (majority owned by Intel Austria GmbH), Linz, Austria, where he has been involved in the development of several generations of cellular RF transceivers. His research interests are communication systems, RF integrated circuit design for wireless applications, as well as compensation and calibration techniques of RF imperfections in communications transceivers.



**Jonas Fritzin** (S'07–M'12) received the M.Sc. degree in electrical engineering from the Chalmers University of Technology, Göteborg, Sweden, in 2004, and the Ph.D. degree from Linköping University, Linköping, Sweden, in 2011.

From January 2012 to May 2013, he was with Ericsson AB, Stockholm, Sweden, where he was involved with research and development of analog/RF integrated circuits (ICs) for base stations. Since June 2013, he has been an RF Circuit Design Engineer with Intel Deutschland GmbH, Munich, Germany. His research interests include CMOS RF power amplifiers (PAs), transmitters, and predistortion.



**José Moreira** was born in Lisbon, Portugal, in 1971. He received the Electrical Engineering degree and Ph.D. degree from the Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal, in 1994 and 1999, respectively.

He is currently with Intel Deutschland GmbH, Munich, Germany, where he has been contributing to several generations of cellular RF transceivers as a Circuit Designer and Technical Lead. His research interests are in the fields of analog and mixed-signal circuit design with a current focus on UMTS/LTE transmitters and power amplifiers.



**Harald Pretl** (S'97–M'01–SM'08) was born in Linz, Austria, in 1972. He received the Dipl.-Ing. (M.Sc.) degree in electrical engineering from the Graz University of Technology, Graz, Austria, in 1997, and the Dr. techn. (Ph.D.) degree from Johannes Kepler University (JKU), Linz, Austria, in 2001.

He is currently a Principal Engineer with Danube Mobile Communications Engineering (DMCE) GmbH & Co KG (majority owned by Intel Austria GmbH), Linz, Austria, where he has been contributing to several generations of cellular RF transceivers and mobile communications platforms as an Analog Circuit Designer, Project Lead, and RF Systems Architect. Since 2015, he has also been a Full Professor with the Institute of Integrated Circuits (RIIC), JKU, where he heads the Energy-Efficient Analog Circuits Group. He has authored or coauthored more than 20 papers at international conferences in the area of RF transceivers. He holds or has filed more than 25 patents. His current research interests are focused on highly integrated GSM/UMTS/LTE/fifth-generation (5G) transceivers, integrated CMOS power amplifiers for mobile communications and Internet of Things (IoT), wireless sensor networks, and low-power RF system-on-chip (SoC).

Dr. Pretl was a member of the Technical Program Committee (TPC) of the International Solid-State Circuits Conference (ISSCC) (2010–2012).



**Andreas Springer** (S'90–A'97–M'99) received the Dr. Techn. (Ph.D.) degree and Univ.-Doz. (Habilitation) degree from Johannes Kepler University Linz (JKU), Linz, Austria, in 1996 and 2001, respectively.

From 1991 to 1996, he was with the Microelectronics Institute, JKU. In 1997, he joined the Institute for Communications and Information Engineering, JKU, where in 2005, he became a Full Professor. Since July 2002, he has been Head of the Institute for Communications Engineering and RF-Systems (formerly the Institute for Communications and Information Engineering), JKU. With the Linz Center of Mechatronics (LCM), he serves as the coordinator for the “wireless system” research area. He has authored or coauthored more than 200 papers in journals and at international conferences, one book, and two book chapters. He has been engaged in research on GaAs integrated millimeter-wave TEDs, monolithic microwave integrated circuits (MMICs), and millimeter-wave sensor systems. His current research interests are focused on wireless communication systems, single- and multi-carrier communications, architectures and algorithms for multi-band/multi-mode transceivers, UMTS/HSDPA/LTE, and recently, wireless sensor networks.

Dr. Springer is a member of the IEEE Microwave Theory and Techniques Society (IEEE MTT-S), the IEEE Communications Society, and the IEEE Vehicular Technology Society. He is a member of OVE and VDI. From 2002 to 2012, he served as chair of the IEEE Austrian Joint COM/MTT Chapter. He is a member of the Editorial Board of the *International Journal of Electronics and Communications (AEÜ)*. He serves as reviewer for a number of international journals and conferences. In 2006 he was a corecipient of the Science Prize of the German Aerospace Center (DLR).