



Figure 7.2.7: Chip photo.

## 7.3 A 0.3-to-1.2V Frequency-Scalable Fractional-N ADPLL with a Speculative Dual-Referenced Interpolating TDC

Minseob Lee<sup>1</sup>, Shinwoong Kim<sup>2</sup>, Hwasuk Cho<sup>1</sup>, Jayyun Koo<sup>1</sup>, Kwang-Hee Choi<sup>2</sup>, Jin-Hyeok Choi<sup>2</sup>, Byungsub Kim<sup>1</sup>, Hong-June Park<sup>1</sup>, Jae-Yoon Sim<sup>1</sup>

<sup>1</sup>Pohang University of Science and Technology, Pohang, Korea

<sup>2</sup>Samsung Electronics, Hwaseong, Korea

Power management with dynamic frequency control has been a key feature in battery-operated systems. By optimally profiling voltage and frequency according to the operating modes and tasks, it effectively reduces the energy consumed by microcontrollers in mobile systems and helps meet the ultra-low-power constraints of implantable system-on-a-chip wireless nodes. The essential circuit block for such applications is a phase-locked loop (PLL) whose wide frequency lock range scales with the supply voltage [1, 2]. Traditional analog PLL designs, however, are poorly suited to wide-supply-range operation because their accuracy degrades as the supply headroom shrinks. All-digital PLL implementations, on the other hand, offer inherent robustness, small size and programmability, with strong immunity to process and supply variations. While a number of integer-N digital PLLs have demonstrated wide frequency lock ranges that scale with the supply voltage, few works have reported fractional-N frequency generation. Fractional-N frequency generation imposes additional requirements on the time-to-digital converter (TDC): high linearity, a wide conversion range with fine resolution, and automatic gain calibration. These requirements are even harder to meet when the digital PLL is to be synthesized automatically. This paper presents a frequency- and voltage-scalable fractional-N digital PLL that is mostly synthesized from a register-transfer-level (RTL) behavioral description. By speculating on the time region to be converted, the TDC efficiently achieves fine resolution with dual-referenced interpolation [3]. The PLL, implemented in 0.0043mm<sup>2</sup> in 28nm CMOS, achieves a wide frequency lock range at supply voltages from 0.3 to 1.2V without any calibration or tuning.

Figure 7.3.1 shows the block diagram of the proposed PLL, which consists of a 4-phase digitally controlled oscillator (DCO) with a 1<sup>st</sup>-order delta-sigma modulator (DSM), a speculative dual-referenced interpolating TDC (DI-TDC) for phase conversion, a retiming flip-flop that synchronizes the reference edge with one of the DCO phases, differentiators for phase-to-frequency conversion, a subtractor, an accumulator, and a digital loop filter (DLF). The speculative DI-TDC contains a quadrant detector (QD). One DCO period is divided into four quadrants whose boundaries are defined by the four DCO phases, so each quadrant represents 1/4 of an integer phase. The counter output gives the integer phase, while the QD and DI-TDC outputs together give the fractional phase. Therefore, the conversion range of the DI-TDC (a concept borrowed from [3]) is only 1/4 of one DCO period. A nine-stage delay chain of 2-inverter cells provides eight delayed versions of $f_{ref}$ ($f_{rd}<7:0>$) for the DI-TDC, which slices a quadrant into eight fractions; one DCO period is thus effectively quantized into 4 × 8 = 32 steps. $f_{rd}<4>$ serves as the main time reference of the PLL. The digitized frequency obtained from the differentiators is subtracted from the target frequency code word (FCW) to generate the frequency error ($f_{error}$).
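As a rough illustration of how the counter, QD and DI-TDC outputs combine, the sketch below assembles a phase in units of one DCO period from the integer count $N$, the quadrant $M$ and a hypothetical 3-bit fine code `k` (0 to 7), then differentiates successive phases and subtracts the result from the FCW. The function names and the linear weighting $N + (8M + k)/32$ are assumptions based on the 4 × 8 = 32-step description, not the chip's actual encoding.

```python
from fractions import Fraction

STEPS_PER_QUADRANT = 8  # fine steps per quadrant (from the text)
QUADRANTS = 4           # quadrants per DCO period (from the text)

def phase(n: int, m: int, k: int) -> Fraction:
    """Phase in DCO periods from integer count n, quadrant m (0-3) and fine code k (0-7).

    The linear weighting used here is an illustrative assumption.
    """
    assert 0 <= m < QUADRANTS and 0 <= k < STEPS_PER_QUADRANT
    return n + Fraction(m * STEPS_PER_QUADRANT + k, QUADRANTS * STEPS_PER_QUADRANT)

def frequency_errors(phases, fcw):
    """Differentiate successive phases (one per reference edge) and subtract from the FCW."""
    return [fcw - (b - a) for a, b in zip(phases, phases[1:])]

if __name__ == "__main__":
    fcw = Fraction("25.6953125")  # FCW used for the phase-noise measurements
    print(frequency_errors([phase(0, 0, 0), phase(25, 2, 6)], fcw))  # [Fraction(1, 128)]
```

Because 0.6953125 × 32 = 22.25 is not an integer, a single reference cycle quantized to 1/32 of a DCO period cannot match the fractional FCW exactly; in this example the residual error is 1/128 of a DCO period.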

Figure 7.3.2 illustrates the speculative DI-TDC scheme, which comprises a counter, a QD and two 8-step sub-TDCs. The counter is driven by one of the DCO phases, $\phi_3$. $N[i]$ denotes the counter output at the $i^{th}$ rising edge of the reference ($f_{rd}<4>$). The QD finds the quadrant containing the $i^{th}$ rising edge of $f_{rd}<4>$ and outputs M = 0, 1, 2 or 3 to identify it. A 4-to-2 MUX passes the two boundaries of the quadrant of interest to the two 8-step sub-TDCs for dual-referenced interpolation. Note that the DI-TDC measures the time interval between the earlier boundary of the quadrant and the rising edge of $f_{rd}<4>$. The MUX selection must therefore be settled before that edge arrives, so the QD output, which is updated only after the rising edge of $f_{rd}<4>$, cannot be used to drive it.
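One way to picture the quadrant detection is to sample the four DCO phases at the reference edge and decode the resulting 4-bit pattern. The sketch assumes 50%-duty-cycle phases with $\phi_i$ lagging $\phi_0$ by $i/4$ of a period; the phase ordering, the lookup table and the function names are illustrative assumptions rather than the actual QD circuit.

```python
# Sampled levels (phi0, phi1, phi2, phi3) -> quadrant M.
# Assumes 50% duty cycle and phi_i lagging phi0 by i/4 of a DCO period.
PATTERN_TO_QUADRANT = {
    (1, 0, 0, 1): 0,
    (1, 1, 0, 0): 1,
    (0, 1, 1, 0): 2,
    (0, 0, 1, 1): 3,
}

def sample_phases(t_frac: float) -> tuple:
    """Levels of the four DCO phases at fractional time t_frac (0 <= t_frac < 1) of a period."""
    return tuple(1 if (t_frac - i / 4) % 1.0 < 0.5 else 0 for i in range(4))

def quadrant_detect(t_frac: float) -> int:
    """Decode the sampled pattern into the quadrant index M (0-3)."""
    return PATTERN_TO_QUADRANT[sample_phases(t_frac)]

if __name__ == "__main__":
    for t in (0.1, 0.3, 0.6, 0.9):
        print(t, sample_phases(t), quadrant_detect(t))
```

Adjacent quadrants differ in exactly one sampled bit, which keeps the decode simple; note that this result, like the real QD output, is only available after the reference edge has been sampled.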

This work incorporates a speculation scheme in the DI-TDC that predicts the quadrant in which the next rising edge of $f_{rd}<4>$ will fall. The prediction simply sums the previous TDC output $K[i]$ and the given fractional frequency code word (FCW\_frac): once the PLL is locked, the fractional phase should advance by FCW\_frac at the next reference edge, so the fractional part of the sum predicts the next fractional phase. Although this scheme predicts the quadrant reasonably well, the quadrant obtained by the prediction logic can differ from the one obtained by the QD. Since such a mismatch occurs only when the reference edge is close to a quadrant boundary, a compensation scheme can fix the speculation error: all DI-TDC output bits are inverted when a mismatch is detected. Even with compensation, the error may not be perfectly eliminated, but its effect on overall performance is negligible because it does not affect phase detection in the next reference cycle. The proposed speculative DI-TDC consumes 50% less power than the approach in [3], which uses multiple counters and sub-TDCs.
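The speculation rule can be explored with a toy model of a locked loop in which the fractional phase advances by FCW\_frac plus a small uniform jitter every reference cycle. The prediction is the quadrant of $\mathrm{frac}(K[i] + \mathrm{FCW\_frac})$; a mismatch is recorded when the jittered phase lands in a different quadrant. The jitter model, its amplitude and the function names are assumptions, and the bit-inversion compensation is not modeled.

```python
import random

def predict_quadrant(k_prev: float, fcw_frac: float) -> int:
    """Speculated quadrant of the next reference edge: frac(K[i] + FCW_frac), in DCO periods."""
    return int(((k_prev + fcw_frac) % 1.0) * 4)

def quadrant(frac_phase: float) -> int:
    """Quadrant that actually contains a fractional phase (what the QD would report)."""
    return int((frac_phase % 1.0) * 4)

def distance_to_boundary(frac_phase: float) -> float:
    """Distance (in DCO periods) from a fractional phase to the nearest quadrant boundary."""
    return min(abs(frac_phase - b) for b in (0.0, 0.25, 0.5, 0.75, 1.0))

def simulate(fcw_frac: float, n_cycles: int, jitter: float, seed: int = 0):
    """Toy locked loop: returns the predicted phases at which speculation and QD disagreed."""
    rng = random.Random(seed)
    k = 0.0
    mismatches = []
    for _ in range(n_cycles):
        predicted = predict_quadrant(k, fcw_frac)
        ideal = (k + fcw_frac) % 1.0
        actual = (ideal + rng.uniform(-jitter, jitter)) % 1.0  # assumed uniform jitter
        if quadrant(actual) != predicted:
            mismatches.append(ideal)
        k = actual
    return mismatches

if __name__ == "__main__":
    m = simulate(0.6953125, 10000, 0.01)
    print(len(m), max((distance_to_boundary(x) for x in m), default=0.0))
```

In this model every mismatch lies within the jitter amplitude of a quadrant boundary, consistent with the observation that speculation errors occur only near boundaries; with zero jitter the prediction is always correct.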

Each sub-TDC samples its DCO phase with $f_{rd}<7:0>$, with a resolution set by the 2-inverter delay of the reference delay chain. Although the conversion gain of each sub-TDC is sensitive to this resolution, summing the two sub-TDC outputs compensates the gain error to first order, leaving third-order distortion as the remaining nonlinearity [3]. The DI-TDC therefore shows good linearity with strong immunity to inverter delay variation. The DCO is likewise built from inverter stages, with 255 tristate control inputs (Fig. 7.3.3). Because the reference delay chain and the DCO are both inverter-based circuits with similar supply-sensitivity trends, the two delay quantities track each other well without any calibration of the reference delay chain.
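The 1<sup>st</sup>-order DSM in the block diagram is a standard accumulator-carry modulator: it dithers between adjacent integer codes so that the average equals a fractional value. The sketch below shows one way a control word could be split into an integer number of enabled tristate drivers (clamped to the 255 available) plus a DSM-dithered LSB. The word split, the number of fractional bits and the function names are hypothetical.

```python
MAX_TRISTATE = 255  # tristate control inputs in the DCO (from the text)

def first_order_dsm(frac_code: int, frac_bits: int, n: int) -> list:
    """Accumulator-carry DSM: n output bits whose mean is frac_code / 2**frac_bits."""
    full_scale = 1 << frac_bits
    acc, bits = 0, []
    for _ in range(n):
        acc += frac_code
        if acc >= full_scale:  # carry out: emit 1 and keep the residue
            acc -= full_scale
            bits.append(1)
        else:
            bits.append(0)
    return bits

def dco_codes(control_word: int, frac_bits: int, n: int) -> list:
    """Split a hypothetical control word into integer/fractional parts and dither the LSB."""
    integer = control_word >> frac_bits
    frac = control_word & ((1 << frac_bits) - 1)
    return [min(integer + b, MAX_TRISTATE) for b in first_order_dsm(frac, frac_bits, n)]

if __name__ == "__main__":
    codes = dco_codes((100 << 3) | 3, 3, 8)  # 100 + 3/8 enabled drivers on average
    print(codes, sum(codes) / len(codes))
```

A first-order modulator is the simplest choice; its quantization error is first-order noise-shaped and, for a constant input, periodic.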

The PLL is fabricated in a 28nm CMOS process and occupies 0.0043mm<sup>2</sup>. All blocks except the DCO and the reference delay chain were synthesized from an RTL behavioral description. The PLL operates at supply voltages from 0.3 to 1.2V with a lock range from 10MHz to 2.7GHz. Fig. 7.3.4 shows measured phase-noise plots for the same FCW of 25.6953125 at different supply voltages. The PLL dissipates 6.95mW (6.16mW in the DCO) from a 1.0V supply at 2GHz, achieving an FoM of -225dB. It consumes 7.74μW (5.34μW in the DCO) from a 0.3V supply at 20MHz, with an FoM of -203dB. Fig. 7.3.5 summarizes the measured performance versus supply voltage, showing a stable FoM better than -220dB for supply voltages above 0.6V. Fig. 7.3.6 compares the performance with previously reported state-of-the-art ring-oscillator-based fractional-N PLLs; this work achieves the smallest area and the widest voltage and frequency ranges. Fig. 7.3.7 shows a chip microphotograph.
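The FoM values in Fig. 7.3.6 follow the common jitter-power figure of merit, $10\log_{10}(\sigma^2 \cdot P/1\text{mW})$, with $\sigma$ the integrated RMS jitter in seconds. As a quick consistency check, the sketch below recomputes it from the jitter and power listed for this work; the helper name is ours.

```python
import math

def jitter_power_fom_db(rms_jitter_s: float, power_mw: float) -> float:
    """FoM = 10*log10(sigma^2 [s^2] * P [mW])."""
    return 10.0 * math.log10(rms_jitter_s ** 2 * power_mw)

# (supply [V], integrated RMS jitter [s], power [mW]) for this work, from Fig. 7.3.6
this_work = [(0.6, 9.45e-12, 0.669), (0.8, 3.26e-12, 2.984), (1.0, 2.13e-12, 6.950)]

for vdd, jitter, power in this_work:
    print(f"{vdd:.1f} V: FoM = {jitter_power_fom_db(jitter, power):.1f} dB")
# 0.6 V: FoM = -222.2 dB
# 0.8 V: FoM = -225.0 dB
# 1.0 V: FoM = -225.0 dB
```

The 0.4V entry (122ps, 0.053mW) evaluates to about -211.0dB with these inputs versus -211.1dB in the table, presumably because the tabulated jitter and power are rounded.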

#### Acknowledgements:

This work was supported by the Open Research Program of KIST under grant No. 2E26691-16-113 and by the IDEC program.

#### References:

- [1] N. August, et al., "A TDC-Less ADPLL with 200-to-3200MHz Range and 3mW Power Dissipation for Mobile SoC Clocking in 22nm CMOS," ISSCC, pp. 246-248, 2012.
- [2] J. Zhu, et al., "A 0.0021mm<sup>2</sup> 1.82mW 2.2GHz PLL Using Time-Based Integral Control in 65nm CMOS," ISSCC, pp. 338-340, 2016.
- [3] S. Kim, et al., "A 2 GHz Synthesized Fractional-N ADPLL With Dual-Referenced Interpolating TDC," IEEE JSSC, vol. 51, no. 2, pp. 391-400, 2016.
- [4] W. Deng, et al., "A 0.048mm<sup>2</sup> 3mW Synthesizable Fractional-N PLL with a Soft Injection-Locking Technique," ISSCC, pp. 252-254, 2015.
- [5] H. Cho, et al., "A 0.0047mm<sup>2</sup> Highly Synthesizable TDC- and DCO-Less Fractional-N PLL with a Seamless Lock Range of  $f_{REF}$  to 1GHz," ISSCC, pp. 154-156, 2017.
- [6] J. Liu, et al., "A 0.012mm<sup>2</sup> 3.1mW Bang-Bang Digital Fractional-N PLL with a Power-Supply-Noise Cancellation Technique and a Walking-One-Phase-Selection Fractional Frequency Divider," ISSCC, pp. 268-270, 2014.
- [7] T.-H. Tsai, et al., "A 1.22ps Integrated-Jitter 0.25-to-4GHz Fractional-N ADPLL in 16nm FinFET CMOS," ISSCC, pp. 260-262, 2015.
- [8] L. Kong, et al., "A 2.4GHz RF Fractional-N Synthesizer with 0.25f<sub>REF</sub> BW," ISSCC, pp. 330-332, 2017.



Figure 7.3.1: Overall block diagram of the proposed PLL.



Figure 7.3.2: The proposed speculative DI-TDC scheme.



Figure 7.3.3: Schematic of the digitally controlled oscillator.



Figure 7.3.4: Measured phase noise plots with FCW = 25.6953125.



Figure 7.3.5: Measured performance as supply voltage varies.

|                           | This Work | This Work  | This Work | This Work | ISSCC 2015 [4] | ISSCC 2017 [5] | ISSCC 2014 [6] | ISSCC 2015 [7] | ISSCC 2017 [8] |
|---------------------------|-----------|------------|-----------|-----------|----------------|----------------|----------------|----------------|----------------|
| Tech. [nm]                | 28        | 28         | 28        | 28        |                |                |                | 16             |                |
| Area [mm<sup>2</sup>]     | 0.0043    | 0.0043     | 0.0043    | 0.0043    | 0.048          | 0.0047         | 0.012          | 0.029          | 0.096          |
| Synthesis                 | RTL-level | RTL-level  | RTL-level | RTL-level | Gate-level     | RTL-level      | No             | No             | No             |
| Supply Voltage [V]        | 0.4       | 0.6        | 0.8       | 1         | 0.8            | 1              | 0.9            | 0.52~0.8       | 1              |
| Power [mW]                | 0.053     | 0.669      | 2.984     | 6.950     | 3              | 15.2           | 3.1            | 9.3**          | 10             |
| Output Freq. [GHz]        | 0.103     | 0.617      | 1.439     | 2.056     | 1.52           | 1              | 1.6            | 3**            | 2.42           |
| Power Efficiency [μW/MHz] | 0.51      | 1.08       | 2.07      | 3.38      | 1.97           | 15.2           | 1.94           | 3.1**          | 4.13           |
| Integ. RMS Jitter [ps]    | 122       | 9.45       | 3.26      | 2.13      | 3.6            | 3.3            | 28             | 1.22**         | 1.5            |
| FoM* [dB]                 | -211.1    | -222.2     | -225.0    | -225.0    | -224.2         | -218           | -206.1         | -228.6**       | -226.5         |

\* FoM = $10\log_{10}\left(\sigma_{\text{rms}}^{2}\,[\text{s}^{2}] \cdot P\,[\text{mW}]\right)$; \*\* Measured at 0.8V with an integer FCW



Figure 7.3.6: Performance summary and comparison with ring-oscillator-based fractional-N PLLs.



Figure 7.3.7: Chip microphotograph.

## 7.4 A 55nm Time-Domain Mixed-Signal Neuromorphic Accelerator with Stochastic Synapses and Embedded Reinforcement Learning for Autonomous Micro-Robots

Anvesha Amravati, Saad Bin Nasir, Sivaram Thangadurai, Insik Yoon, Arijit Raychowdhury

Georgia Institute of Technology, Atlanta, GA

Even as rapid advances are being made in deep neural networks (DNNs) and convolutional neural networks (CNNs), with most hardware demonstrations geared towards inference on vision-based platforms [1-5], we recognize that true autonomy in intelligent agents will emerge only when such bio-mimetic systems can learn continuously through interaction with the environment. Reinforcement learning (RL) is one such computational paradigm, inspired by behaviorist psychology, in which autonomous agents take actions in an environment to maximize a notion of cumulative reward. This concept is deeply rooted in the human brain, where dopamine-mediated neurotransmission (in the cortex, striatum and thalamus) has been shown to encourage reward-motivated behavior in our social interactions (Fig. 7.4.1). In this paper, we present a $690\mu\text{W}$ ($V_{\text{CC}}=1.2\text{V}$) neuromorphic accelerator fabricated in 55nm CMOS that (1) inherits the unique properties of stochastic neural networks, (2) leverages recent advances in Q-learning as an implementation of RL, and (3) demonstrates energy-efficient time-domain mixed-signal (TD-MS) circuit architectures, to provide autonomy to a mobile, self-driving micro-robot at the edge of the cloud, with possible applications in disaster relief, reconnaissance and personal robotics.

The feed-forward path is implemented as a three-layer neural network (input, hidden and output), with network sizes and bit widths optimized for minimum power in the target application. Fig. 7.4.1 illustrates the system diagram: three ultrasonic (US) sensors feed pulses (depth information) directly to a layer of 84 TD neurons through an array of stochastic synapses, avoiding time-to-digital conversion at the sensor interface. Each hidden-layer neuron computes a weighted sum of its inputs, applies a rectified linear unit (ReLU) activation function, and produces pulses that are retransmitted via stochastic synapses to the output-layer neurons. The output layer, followed by a winner-take-all (WTA) comparator, produces an action that moves the robot straight, left or right. Each action changes the sensor data, which are re-evaluated for continuous RL. Using backpropagation and gradient descent, the feedback circuit predicts the reward (i.e., avoiding obstacles and covering the maximum distance for area mapping), computes the loss function and updates the model (synaptic weights) for further exploration. The test chip includes full-scan, embedded timing and memory controllers, debug features, and direct interfaces to the US sensors, and it can interface with a microcontroller board for motor and sensor control.
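The feed-forward path can be sketched in a few lines: three sensor inputs feed 84 ReLU hidden neurons, whose outputs feed three output neurons, and a winner-take-all selects the action. This is an idealized integer model assuming 6b unsigned inputs and 6b signed-magnitude weights (±31); it ignores the stochastic synapses and all time-domain effects, and the action ordering and sensor readings are hypothetical.

```python
import random

ACTIONS = ("straight", "left", "right")  # output-neuron order is a hypothetical choice
N_IN, N_HIDDEN, N_OUT = 3, 84, 3         # layer sizes from the text

def relu(v: int) -> int:
    return v if v > 0 else 0

def forward(x, w_ih, w_ho):
    """x: N_IN ints; w_ih: N_HIDDEN x N_IN; w_ho: N_OUT x N_HIDDEN. Returns (action, hidden, out)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_ih]
    out = [sum(w * h for w, h in zip(row, hidden)) for row in w_ho]
    winner = max(range(N_OUT), key=lambda j: out[j])  # winner-take-all
    return ACTIONS[winner], hidden, out

def random_weights(rows, cols, rng):
    # 6b signed-magnitude weights: sign bit + 5b magnitude -> range [-31, 31]
    return [[rng.randint(-31, 31) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(1)
w_ih = random_weights(N_HIDDEN, N_IN, rng)
w_ho = random_weights(N_OUT, N_HIDDEN, rng)
depths = [40, 12, 63]  # hypothetical 6b readings from the three ultrasonic sensors
action, hidden, out = forward(depths, w_ih, w_ho)
print(action)
```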

Motivated by the need for edge devices to operate at ultra-low power, and by the observation that such systems require a low effective number of bits (4-8b) in the feed-forward data paths, we employ analog computing blocks. However, voltage-domain analog circuits require a high $V_{\text{CC}}$ to accommodate the dynamic range, and data conversion is typically expensive. Hence, we explore TD-MS circuits, where operands are represented by pulse widths, enabling large dynamic ranges even close to $V_T$. The trade-off is lower speed, which is acceptable for the applications at hand. Fig. 7.4.2 shows the components and simulations of the TD-MS neuron and the stochastic synapse. A TD-MS multiply-and-accumulate (MAC) unit, built around a 21b counter (Fig. 7.4.2), multiplies the 6b input ($x$) from a pre-synaptic neuron by the 6b weight ($w$) of the synapse. The input is a pulse of width $T=x \cdot T_0$, generated by a digital-to-pulse converter (DPC) with unit delay $T_0$. A local DCO with embedded memory (which stores the weights $w$ of the fan-in synapses) generates a frequency $F=w \cdot F_0$ and clocks an up/down counter. Because the DCO stays ON for the pulse width $T$, the counter accumulates $F \cdot T = x \cdot w \cdot F_0 T_0$ edges, computing $x \cdot w$ in situ and accumulating it. The up/down counter readily supports negative values of $w$ (down-counting) in the signed-magnitude system. The ReLU activation function of the neuron is implemented by a DPC (Fig. 7.4.2), which feeds the following stage. Such a TD-MS MAC has unique properties: (1) the energy of a MAC is proportional to the magnitude of the operands, and hence to the importance of the computation in the neural network, a feature inherent in the brain but missing in digital logic (shown in the color map); (2) 45% lower system area than a digital implementation, largely due to lower routing overhead; (3) 47% lower interconnect power, since each synaptic connection is one buffer chain that makes one $0 \rightarrow 1$ and one $1 \rightarrow 0$ transition irrespective of the operand value; and (4) 16% lower leakage power. Stochastic synapses with drop-connect prevent overfitting. Each is implemented as a buffer chain whose delay comprises (1) a scan-programmable fixed part and (2) a stochastic part, in which the $0 \rightarrow 1$ and $1 \rightarrow 0$ transition delays are randomly altered (or, for drop-connect, the pulses are randomly dropped) by a local high-speed LFSR. In every reference-clock cycle, one neuron in a layer produces pulses (Fig. 7.4.2) that are captured simultaneously by all neurons in the next layer; the controller then activates the next neuron of the layer, and so on. Because one neuron of every layer fires in each cycle, the layers operate in a pipelined fashion, increasing system throughput.

Fig. 7.4.3 illustrates the RL data flow and its key circuit components, including the reuse of compute blocks from the feed-forward path. After an action $A_t$, Q-learning provides a reward prediction to the loss-function evaluator. We implement a support vector machine (SVM) hinge loss function in 2 cycles. Next, 84 parallel units compute the hidden-layer gradients ($\Delta$) using TD macros that reuse the DCOs and DPCs of the feed-forward path. The hidden-layer outputs are multiplied by the predicted reward to produce the corresponding weight updates ($w^{(t)}$). In the input layer, the gradients $\Delta$ pass through a ReLU gradient estimator and are multiplied by the outputs of the input neurons to update the input-layer weights ($w^{(t)}$). In the final cycle, all updated weights are written back into the local memory of the neurons, and the system is ready for the next feed-forward exploration. Unique characteristics of the system architecture include (1) memory-in-logic to reduce data movement and (2) TD analog circuits for computation and communication that interface seamlessly with digital circuits (counters and memory) for storage.
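A floating-point sketch of one training step in the spirit of the flow above: a multiclass SVM hinge loss on the output scores, the hidden-layer gradient Δ obtained by back-propagating through the output weights and a ReLU gradient estimator (derivative 0 or 1), and outer-product weight updates. The text does not specify how the Q-learning reward prediction selects the target class, so the target `y` is simply passed in; the learning rate and function names are assumptions, and the chip's fixed-point time-domain implementation will differ.

```python
def hinge_loss_and_grad(scores, y, margin=1.0):
    """Multiclass SVM hinge loss sum_{j != y} max(0, s_j - s_y + margin) and d(loss)/d(scores)."""
    loss, grad = 0.0, [0.0] * len(scores)
    for j, s in enumerate(scores):
        if j == y:
            continue
        violation = s - scores[y] + margin
        if violation > 0:
            loss += violation
            grad[j] += 1.0
            grad[y] -= 1.0
    return loss, grad

def train_step(x, w_ih, w_ho, y, lr):
    """One backprop/gradient-descent step on a 2-layer ReLU network; updates weights in place."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in w_ih]
    hidden = [max(0.0, p) for p in pre]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_ho]
    loss, d_out = hinge_loss_and_grad(scores, y)
    # Hidden-layer gradient: back through w_ho (before updating it), then the ReLU estimator.
    delta = [
        sum(d_out[k] * w_ho[k][j] for k in range(len(w_ho))) * (1.0 if pre[j] > 0 else 0.0)
        for j in range(len(hidden))
    ]
    # Output-layer update: outer product of the output error and the hidden outputs.
    for k, row in enumerate(w_ho):
        for j in range(len(row)):
            row[j] -= lr * d_out[k] * hidden[j]
    # Input-layer update: outer product of delta and the input-neuron outputs.
    for j, row in enumerate(w_ih):
        for i in range(len(row)):
            row[i] -= lr * delta[j] * x[i]
    return loss

# Tiny hand-checkable example: 1 input, 2 hidden neurons, 2 outputs, target class 0.
w_ih = [[1.0], [1.0]]
w_ho = [[0.0, 0.0], [1.0, 1.0]]
first = train_step([1.0], w_ih, w_ho, y=0, lr=0.1)   # 3.0
second = train_step([1.0], w_ih, w_ho, y=0, lr=0.1)  # about 2.44
print(first, second)
```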

The $F_0$-$T_0$ design space is shown in Fig. 7.4.4, and measured results show that the system remains within bounds from 0.4 to 1.0V, avoiding both counter overflow and loss of resolution in the neuron's MAC. The measured hidden-layer $F_{\text{DCO}}$ reaches a peak of 780MHz at 1V, with INL of 1.2LSB (1.6LSB) and DNL of 1.4LSB (1.5LSB) at 1V (0.4V). The measured $T_{\text{DPC}}$ also shows good linearity, with INL of 1.3LSB (1.6LSB) and DNL of 1.5LSB (1.8LSB) at 1V (0.4V). We note that (1) the DCO and the DPC are both built from programmable buffer chains, so their delays track each other across $V_{\text{CC}}$, and (2) the DPC and the DCO are programmed through the number of buffer-delay stages, which results in high linearity. The measured stochasticity of test synapses shows a uniform distribution with a variation of $\pm 40\%$. The benefit of stochasticity is seen in the system emulation (Fig. 7.4.5), where a stochastic network reduces the loss function by more than 30% for $10^4$ training samples. Fig. 7.4.5 also shows the measured $F_{\text{MAX}}$ and power, demonstrating operation down to 0.4V and a peak power of only $690\mu\text{W}$ (at 1.2V). The peak throughput spans a wide dynamic range, catering to a variety of RL tasks. Peak energy efficiency is obtained at 0.8V ($V_T \sim 0.5\text{V}$), where we obtain $690\text{pJ/inference}$ and $1.5\text{nJ/training}$, and $1.25\text{pJ/MAC}$ (worst case). Compared with the existing literature, the chip achieves ultra-low power ($690\mu\text{W}$ at peak performance) and an average of $3.12\text{TOPS/W}$, while enabling unique neuromorphic functionality. The test chip is mounted on a mobile nano-robot for autonomous exploration and learning, and the overall distance moved by the robot as a function of the reference clock is shown in Fig. 7.4.6. The die shot and chip micrograph are shown in Fig. 7.4.7.
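The $F_0$-$T_0$ trade-off can be illustrated with an idealized count model of the TD-MS MAC: each synapse gates a DCO at $|w| \cdot F_0$ for $x \cdot T_0$, so the counter gains $\lfloor x |w| F_0 T_0 \rfloor$ edges, counting up or down with the sign of $w$. The product $F_0 T_0$ (counts per unit of $x \cdot w$) sets the balance between loss of resolution (too small) and counter overflow (too large). Treating the 21b counter as a sign bit plus a 20b magnitude and ignoring the DCO's starting phase are assumptions, and the helper names are hypothetical.

```python
import math
from fractions import Fraction

COUNTER_BITS = 21                             # counter width from the text
MAX_MAGNITUDE = 2 ** (COUNTER_BITS - 1) - 1   # assumes sign + 20b magnitude

def td_mac(xs, ws, f0_t0: Fraction) -> int:
    """Idealized TD MAC: input pulse x*T0 gates a DCO at |w|*F0; up-count if w >= 0, else down."""
    count = 0
    for x, w in zip(xs, ws):
        edges = math.floor(x * abs(w) * f0_t0)  # DCO edges counted while the input pulse is high
        count += edges if w >= 0 else -edges
        if abs(count) > MAX_MAGNITUDE:
            raise OverflowError("counter overflow: reduce F0*T0")
    return count

xs, ws = [63, 63, 63], [31, -31, 31]     # full-scale 6b inputs and weights
print(td_mac(xs, ws, Fraction(1)))       # 1953: exact sum of x*w
print(td_mac(xs, ws, Fraction(1, 100)))  # 19: coarse F0*T0 loses resolution (ideal 19.53)
```

With 84 full-scale inputs and $F_0 T_0 = 8$, the accumulated count exceeds the assumed 20b magnitude and the model raises an overflow, which is the other boundary of the design space.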

### Acknowledgements:

This work was sponsored by Qualcomm Inc., the National Science Foundation under grant 1640081, and the Nanoelectronics Research Corporation (NERC), a wholly-owned subsidiary of the Semiconductor Research Corporation (SRC), through Extremely Energy Efficient Collective Electronics (EXCEL), an SRC-NRI Nanoelectronics Research Initiative under Research Task ID 2698.002.

### References:

- [1] J. Park, et al., "Online RL NoC for Portable HD Object Recognition Processor," *CICC*, 2012.
- [2] D. Shin, et al., "DNPU: An 8.1TOPS/W Reconfigurable CNN-RNN Processor for General-Purpose Deep Neural Networks," *ISSCC*, pp. 240-241, 2017.
- [3] B. Moons, et al., "ENVISION: A 0.26-to-10TOPS/W Subword-Parallel Dynamic-Voltage-Accuracy-Frequency-Scalable Convolutional Neural Network Processor in 28nm FDSOI," *ISSCC*, pp. 246-247, 2017.
- [4] J. Sim, et al., "A 1.42TOPS/W Deep Convolutional Neural Network Recognition Processor for Intelligent IoT Systems," *ISSCC*, pp. 264-265, 2016.
- [5] Y.-H. Chen, et al., "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," *ISSCC*, pp. 262-263, 2016.



Figure 7.4.1: Motivation and system diagram for reinforcement learning as a neuromorphic computational model in autonomous mobile micro-robots.



Figure 7.4.3: Flow-chart and corresponding circuit diagrams for online RL via backpropagation and gradient descent.



Figure 7.4.5: Role of stochasticity in loss minimization and measured system performance-power trade-offs, illustrating 690μW at peak performance and a worst-case of 1.25pJ/MAC.



Figure 7.4.2: Comparison of time-domain mixed-signal (TD-MS) design vs. digital implementation, design of TD-MS neuron with ReLU, stochastic synapses and feedforward timing diagram illustrating pipelined data-propagation.



Figure 7.4.4: Design space illustrating measured operating points, measured DCO characteristics, measured DPC characteristics and measured stochasticity generated by the synapses.

| This work             | [1]                        | [2]                | [3]                 | [4]                | [5]               |
|-----------------------|----------------------------|--------------------|---------------------|--------------------|-------------------|
| ML System             | Reinforcement Learning     | Object Recognition | CNN-RNN             | CNN                | DNN               |
| Technology            | 55nm                       | 65nm               | 65nm                | 65nm               | 65nm              |
| Circuit style         | Time domain mixed-signal   | Digital            | Digital             | Digital            | Digital           |
| Area                  | 3.4mm <sup>2</sup>         | 4mm <sup>2</sup>   | 16mm <sup>2</sup>   | 3.3mm <sup>2</sup> | 16mm <sup>2</sup> |
| Learning/Training     | Online in real time        | Offline            | Offline             | Offline            | Offline           |
| Stochasticity         | Present                    | Absent             | Absent              | Absent             | Absent            |
| Resolution            | 6bit MAC / 21bit Counter   | 16b                | 16b                 | 4b-16b             | 16b               |
| Power                 | 690 μW at peak performance | 121mW              | 63mW                | 7.5-300mW          | 45mW              |
| Supply voltage        | 0.4-1V                     | 1.2V               | 0.77-1.2V           | Unavailable        | 1.2V              |
| No. inferences/sec    | 254,000                    | Not Reported       | Not Reported        | Not Reported       | Not Reported      |
| No. of training/sec   | 118,000                    | Not Reported       | Not Reported        | Not Reported       | Not Reported      |
| Performance/Watt      | 3.12TOPS/W                 | 1.24TOPS/W         | 2.1TOPS/W           | 0.26-10TOPS/W      | 0.21TOPS/W        |
| Application           | Autonomous micro-robotics  | Object Recognition | General purpose DNN | Visual recognition | CNN processor     |
| Min. Energy/inference | 690pJ                      | Not Reported       | Not Reported        | Not Reported       | Not Reported      |
| Min. energy/training  | 1.5nJ                      | Not Reported       | Not Reported        | Not Reported       | Not Reported      |




Figure 7.4.6: Comparison table and application to mobile micro-robotics illustrating the use of RL in exploration.



Figure 7.4.7: Die-photo and chip characteristics.

## 7.5 An Enhanced-Security Buck DC-DC Converter with True-Random-Number-Based Pseudo Hysteresis Controller for Internet-of-Everything (IoE) Devices

Wen-Hau Yang<sup>1</sup>, Li-Cheng Chu<sup>1</sup>, Shang-Hsien Yang<sup>1</sup>, Yan-Jiun Lai<sup>1</sup>, Shao-Qi Chen<sup>1</sup>, Ke-Horng Chen<sup>1</sup>, Ying-Hsi Lin<sup>2</sup>, Shian-Ru Lin<sup>2</sup>, Tsung-Yen Tsai<sup>2</sup>

<sup>1</sup>National Chiao Tung University, Hsinchu, Taiwan

<sup>2</sup>Realtek Semiconductor, Hsinchu, Taiwan

For Internet-of-Everything (IoE) devices, strong security and low electromagnetic interference (EMI) are key power-management design requirements for guaranteeing personal data protection. The regulator in [1] is robust against power side-channel attacks (PSCAs), but a power injection attack (PIA) limits its random-number (RN) generation, as shown in the upper left of Fig. 7.5.1. The loop randomization technique in [1] can be cracked by a PIA, because the linear feedback shift register (LFSR) sequence becomes predictable and reproducible. Moreover, the PIA narrows the spread of the LFSR-based random switching frequency ($f_{sw}$) in [1] and reduces the triangular modulation frequency ($f_{mod}$) in [2] to around $f_{sw}/N$. Consequently, the EMI noise floor fails to meet the EN 55032 Class B specification, as shown in the upper right of Fig. 7.5.1. Other techniques offer countermeasures that improve resistance against malicious attacks, but at the cost of either increased power consumption [3] or large hardware overhead [4].

In this paper, a true-random-number (TRN)-based pseudo-hysteresis controller (PHC) and an enhanced-security randomizer (ESR) (bottom of Fig. 7.5.1) are proposed to defend against PSCA and PIA simultaneously and to reduce EMI without degrading performance. To generate a truly randomly modulated $f_{sw}$, the TRN-based PHC converts the 6b digital code RN into a hysteresis window referenced to the output $V_{ea}$ of the error amplifier (EA). Since the hysteresis window is referenced to $V_{ea}$, the output voltage ripple can be suppressed. Moreover, unlike the conventional constant-$f_{sw}$ and LFSR-based random-$f_{sw}$ schemes, the ESR correctly generates an input-supply-independent RN even under simultaneous PSCA and PIA. In particular, the correlation between the output current ($I_{out}$) and the input current ($I_{in}$) is kept consistently low to strengthen security.

Figure 7.5.2 depicts the proposed TRN-based PHC, which comprises three parts: (1) an adaptive current correction (ACC) technique, (2) a TRN-to-I converter, and (3) an offset-controlled hysteresis (OCH) comparator. Under a PIA, a suitable spread spectrum of $f_{sw}$ must be ensured, since a severe disturbance $\Delta V_{in}$ on $V_{in}$ shifts the center and the spread of $f_{sw}$ (corresponding to $f_{sw(center)}$ and $f_{sw(max)} - f_{sw(min)}$ in the TRN-based PHC). To reduce the output voltage ripple and stabilize the converter, the ACC technique adjusts the reference currents $I_{ref1}$ and $I_{ref2}$ against $\Delta V_{in}$. By setting $I_{ref2} \propto I_{ref1}^2 \propto [(V_{in} - V_{out})/V_{in}]^2$, both $f_{sw(center)}$ and $f_{sw(max)} - f_{sw(min)}$ remain constant under a PIA. In the TRN-to-I converter, the hysteresis current ($I_{hys}$) is set by RN, where RN[5] is the sign bit and RN[4:0] is the magnitude of $I_{hys}$. $I_{hys}$ flows through resistor $R_1$ to shift $V_{ea}$, forming the upper bound $V_H = V_{ea0}$ and establishing the variation of $f_{sw}$. For example, when RN[5] is logic 'H' (slower $f_{sw}$), $V_{ea}$ swings within the $V_{ea}$-centered hysteresis window set by $\pm I_{hys} \times R_1$ when the comparator output $V_C$ is logic 'L' and 'H', respectively; a wider hysteresis window results in a slower $f_{sw}$. When $V_C$ is logic 'H', the OCH comparator introduces an offset voltage $V_{os}$ (due to the asymmetric differential pair $M_{in}$ and $M_{ip}$) that shifts $V_H$ (= $V_{ea0}$) down to generate the pseudo-lower bound $V_L$ (= $V_{ea0} - V_{os}$). Thus, without any extra comparator, $V_{sen}$ is bounded within the hysteresis window defined by $V_H$ and $V_L$. More specifically, $V_{os} = [(K+1)^{\beta} - 1] \times [I_{ref2}/(2\beta)]^{\alpha}$ defines $f_{sw(center)}$, where $(K+1)$ is the ratio between the asymmetric transistors $M_{ip}$ and $M_{in}$, and $\beta$ is a process parameter.
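The TRN-to-I mapping can be sketched as a sign-magnitude decode of the 6b RN: RN[5] selects a wider (slower $f_{sw}$) or narrower window, and RN[4:0] scales $I_{hys}$. The unit current, $R_1$, the nominal window `w0` and, in particular, the assumption that the switching period scales linearly with the window width are hypothetical placeholders, not values or relations taken from the design.

```python
def decode_rn(rn: int):
    """Split a 6b RN code into (sign, magnitude): RN[5] is the sign bit, RN[4:0] the magnitude."""
    if not 0 <= rn < 64:
        raise ValueError("RN is a 6-bit code")
    sign = 1 if rn & 0b100000 else -1  # RN[5] = 'H' -> wider window -> slower f_sw
    return sign, rn & 0b011111

def window_offset(rn: int, i_unit: float, r1: float) -> float:
    """Signed window offset I_hys * R1, with I_hys = magnitude * i_unit (i_unit is hypothetical)."""
    sign, mag = decode_rn(rn)
    return sign * mag * i_unit * r1

def relative_fsw(rn: int, i_unit: float, r1: float, w0: float) -> float:
    """f_sw relative to its center, ASSUMING the switching period scales with w0 + offset."""
    width = w0 + window_offset(rn, i_unit, r1)
    if width <= 0:
        raise ValueError("window width must stay positive")
    return w0 / width

if __name__ == "__main__":
    for code in (0b100000, 0b100101, 0b000101, 0b111111, 0b011111):
        print(f"RN={code:06b}: f_sw/f_center = {relative_fsw(code, 1e-6, 1e3, 0.05):.3f}")
```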

Figure 7.5.3 shows the implementation of the security-aware TRN generator (SA-TRNG) with the adaptive oscillation frequency correction (AOFC) technique. In a conventional TRNG, the $\Delta V_{in}$ caused by a PIA restricts the oscillation of the ring oscillator (RO), so a 180° phase difference is never detected, as indicated by the signal PD. The value of RN is then limited, and the converter is no longer secure. To mitigate a PIA, the attack-detection circuit within the SA-TRNG checks the state of PD at each falling edge of $V_C$. Once two successive logic-'L' values are detected, the signal AD is set to logic-'H', and the internal security-protection supply (ISPS) circuit in the ESR switches the supply voltage $V_{dd}$ from $V_{in}$ to an internal supply voltage pumped up from $V_{out}$. This ensures PIA-free operation. In normal operation, the falling edge of $V_C$ triggers the pulse generator to reset PD and the outputs of the 3-stage RO, OUT<sub>+</sub> and OUT<sub>-</sub>, so that they oscillate in phase. Due to jitter accumulation, the phase difference between OUT<sub>+</sub> and OUT<sub>-</sub> gradually approaches 180°, at which point the phase-detection circuit sets PD to logic-'H' to stop the TRNG counter. The value of RN depends on the number of times OUT<sub>+</sub> oscillates before the phase difference reaches 180°. At the next falling edge of $V_C$, RN is updated and the operation repeats. Moreover, based on AVG (the average of RN), the AOFC technique adjusts the oscillation frequency ($f_{osc}$) through local negative feedback to guarantee a truly random distribution of $f_{sw}$. When AVG falls outside the target range (30 to 34 in this work), the signal OSC is incremented (or decremented) by 1 to turn on (or off) one more switch $S_i$ in the RO cell, thereby increasing (or decreasing) $f_{osc}$ until AVG returns to the range.
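The SA-TRNG behavior can be approximated in software for intuition. The sketch models (1) one RN sample as the number of RO cycles until the accumulated phase difference between OUT<sub>+</sub> and OUT<sub>-</sub> reaches half a period, using a Gaussian random walk with Python's seeded pseudo-random generator standing in for physical jitter; (2) the attack detector that raises AD after two successive PD = 'L' samples; and (3) one AOFC update step. The jitter magnitude, the saturation of RN at 63, the latching of AD, and the direction of the OSC correction are all assumptions.

```python
import random

RN_MAX = 63              # 6b RN; saturating at 63 is an assumption
AVG_LO, AVG_HI = 30, 34  # target AVG range from the text

def rn_sample(rng: random.Random, jitter_per_cycle: float):
    """Count RO cycles until the OUT+/OUT- phase difference reaches 0.5 period (180 degrees).

    Returns (rn, pd); pd is False if 180 degrees was never reached (e.g., RO restricted by a PIA).
    """
    diff, cycles = 0.0, 0
    while abs(diff) < 0.5 and cycles < RN_MAX:
        diff += rng.gauss(0.0, jitter_per_cycle)  # accumulated jitter, in RO periods
        cycles += 1
    return cycles, abs(diff) >= 0.5

class AttackDetector:
    """Samples PD at each falling edge of V_C; two successive 'L' samples set AD (latched by assumption)."""

    def __init__(self):
        self.lows = 0
        self.ad = False

    def on_vc_falling_edge(self, pd: bool) -> bool:
        self.lows = 0 if pd else self.lows + 1
        if self.lows >= 2:
            self.ad = True  # ISPS switches V_dd from V_in to the pumped internal supply
        return self.ad

def aofc_step(osc: int, avg: float, osc_max: int) -> int:
    """One AOFC update; assumes enabling one more switch S_i raises the average count."""
    if avg < AVG_LO:
        return min(osc + 1, osc_max)
    if avg > AVG_HI:
        return max(osc - 1, 0)
    return osc

if __name__ == "__main__":
    rng = random.Random(1)
    samples = [rn_sample(rng, 0.09)[0] for _ in range(1000)]
    print(sum(samples) / len(samples))
```

For a continuous random walk with per-cycle standard deviation σ, the mean time to reach ±0.5 is 0.25/σ², roughly 31 cycles for σ = 0.09, which is why the jitter value was chosen near the 30-to-34 target range; in hardware, the AOFC loop plays that tuning role.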

Figure 7.5.4 shows the ISPS circuit with dual-path, quad-mode operation. Under a PIA, the signal AD is logic-'H', activating the pumping path. This cascaded power supply strengthens security at the cost of higher power consumption. The factor control selects the mode (1.5x, 2x or 3x) according to the present output voltage $V_{out}$, and the phase generator produces non-overlapping phases $\phi_1$ and $\phi_2$ to prevent leakage during phase exchange. Conversely, in normal operation (i.e., no attack), AD is logic-'L' and the bypass path (1x mode) is enabled: $V_{dd}$ is connected directly to $V_{in}$, and the whole ISPS except $S_B$ is disabled to reduce power.
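The quad-mode selection can be summarized as a small decision function. The text states only that the factor control picks 1.5x, 2x or 3x from $V_{out}$ when AD is high and that the 1x bypass is used otherwise; the rule below, which picks the smallest gain whose pumped voltage reaches a hypothetical minimum supply `v_dd_min`, is an illustrative assumption.

```python
PUMP_GAINS = (1.5, 2.0, 3.0)  # pumping-path modes from the text

def isps_mode(ad: bool, v_out: float, v_dd_min: float) -> float:
    """Return the ISPS conversion factor: 1x bypass when no attack, otherwise a pumping mode."""
    if not ad:
        return 1.0  # bypass path: V_dd tied directly to V_in
    for gain in PUMP_GAINS:
        if gain * v_out >= v_dd_min:
            return gain  # smallest gain meeting the (hypothetical) target supply
    return PUMP_GAINS[-1]  # best effort: highest available gain

if __name__ == "__main__":
    for v_out in (0.6, 1.0, 1.3, 1.8):
        print(v_out, isps_mode(True, v_out, 2.4))
```

Choosing the smallest sufficient gain keeps the pumped supply close to the required level, which in this sketch stands in for the power-versus-security trade-off noted above.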

The test chip was fabricated in a 55nm process. The top of Fig. 7.5.5 shows that the proposed TRN-based PHC technique provides enhanced security against both PIA and PSCA owing to the truly random distribution of $f_{sw}$. The middle of Fig. 7.5.5 shows the measured EMI spectrum: with the ESR circuit, the peak EMI noise is reduced from 89.27dBμV to 54.32dBμV. The bottom left of Fig. 7.5.5 shows the security mechanism of the SA-TRNG and the ISPS under a PIA, corresponding to the timing diagram in Fig. 7.5.3. The bottom right of Fig. 7.5.5 shows the measured converter waveforms: for a 0.2A-to-0.8A load transient, the output voltage drop ($V_{out,drop}$) is around 53mV and the recovery time ($T_{recovery}$) is around 7.3μs.

Figure 7.5.6 shows the key parameters of this work and a comparison with the state-of-the-art. Even under PIA interference of up to 1V, the measured average value of RN remains within the desired safety range (30 to 34) due to the proposed SA-TRNG with AOFC. With the help of the ESR, the correlation between $I_{in}$ and $I_{out}$ is kept low. The worst-case load regulation is 2.5% at $V_{out}=1.8V$ when the output current ranges from 0 to 1A. The peak efficiency is 92.4% in 4.5-to-3.3V conversion. Compared to [1] and [2], the TRN-based PHC technique with the ESR has the following features:

1. PIA-free spread spectrum and low EMI that meets EN 55032 Class B.
2. Defense against both PSCA and PIA for enhanced security, since the relative correlation always remains smaller than 0.01.
3. A fast transient response in comparison with [5], due to the inherent hysteretic control.

The die micrograph is shown in Fig. 7.5.7.

### References:

- [1] M. Kar, et al., "Improved Power-Side-Channel-Attack Resistance of an AES-128 Core via a Security-Aware Integrated Buck Voltage Regulator," *ISSCC*, pp. 142-144, 2017.
- [2] X. Ke, et al., "A 10MHz 3-to-40V  $V_{in}$  Tri-Slope Gate Driving GaN DC-DC Converter with 40.5dB $\mu$ V Spurious Noise Compression and 79.3% Ringing Suppression for Automotive Applications," *ISSCC*, pp. 430-432, 2017.
- [3] C. Tokunaga and D. Blaauw, "Secure AES Engine with a Local Switched-Capacitor Current Equalizer," *ISSCC*, pp. 64-66, 2009.
- [4] M. Doulcier-Verdier, et al., "A Side-Channel and Fault-Attack Resistant AES Circuit Working on Duplicated Complemented Values," *ISSCC*, pp. 274-276, 2011.
- [5] S.-H. Lee, et al., "A 0.518mm<sup>2</sup> Quasi-Current-Mode Hysteretic Buck DC-DC Converter with 3 $\mu$ s Transient Response in 0.35 $\mu$ m BCDMOS," *ISSCC*, pp. 214-216, 2015.



Figure 7.5.1: Under a PIA, analysis of security and EMI in state-of-the-art and proposed works (top); Structure of the proposed TRN-based PHC and ESR circuit (bottom).



Figure 7.5.3: Circuit Implementation of the SA-TRNG with AOFC technique and its timing diagram.



Figure 7.5.5: Measured results of transient performances under a PIA without and with the TRN-based PHC (top); conductive EMI without and with the ESR circuit (middle); security mechanism of the SA-TRNG with the AOFC and the ISPS circuit when the PIA occurs (bottom left), and load-transient response (bottom right).



Figure 7.5.2: Proposed TRN-based PHC with PIA-free spread spectrum and key operation waveforms.



Figure 7.5.4: Proposed ISPS circuit with dual-path and quad-mode operation that generates the internal security protection supply even under a PIA.



Figure 7.5.6: Performance of proposed TRN-based PHC technique, including the average of RN under the PIA interference, and efficiency, relative correlation, and load regulation under different output current conditions (top); comparison table with the state-of-the-art (bottom).



Figure 7.5.7: Die micrograph in 55nm process.

## 7.6 A Secure Camouflaged Logic Family Using Post-Manufacturing Programming with a 3.6GHz Adder Prototype in 65nm CMOS at 1V Nominal $V_{DD}$

Nail Etkin Can Akkaya, Burak Erbagci, Ken Mai

Carnegie Mellon University, Pittsburgh, PA

With the continued globalization of the IC manufacturing supply chain, securing that supply chain is becoming increasingly difficult. This opens the door to a myriad of security threats such as unauthorized production, counterfeiting, IP theft, and hardware Trojan horses. A parallel and related threat is posed by advanced reverse-engineering capabilities: even chips manufactured at the most advanced technology nodes can be de-layered, imaged, and analyzed [1]. While various manufacturing methodologies and camouflaged gates have been proposed, none fully address these threats, especially in combination. To address these concerns, we use a post-manufacturing-programmable camouflaged logic topology to simultaneously obscure the design IP from the manufacturer and combat reverse engineering. The design is based on a threshold-voltage-defined (TVD) logic gate topology that uses only different threshold-voltage implants to determine the logic gate function [2]. Every gate has an identical physical layout. After manufacturing, each gate is programmed with different threshold voltages for different Boolean functions using intentional directed hot-carrier injection (HCI). Similar intentional HCI techniques have previously been used to enhance SRAM margins, boost PUF reliability, and build TRNGs [3][4]. The design is fully compatible with standard CMOS logic processes, requiring no special layers, structures, or process steps.

Figure 7.6.1 shows our post-manufacturing programmed threshold-voltage-defined (PMP-TVD) gate. It is a pre-charged differential structure with an embedded cross-coupled-inverter positive-feedback amplifier similar to that used in sense-amplifier-based logic (SABL) [5]. The inputs (A and B) select one branch on each of the left and right sides of the gate. Depending on which side pulls more current, the amplifier structure locks into one of the output states. The logic function is programmed into the gates after manufacturing via intentional directed HCI on the final device of each three-NMOS-stack leg. A PMP-TVD gate can be either “pre-programmed” with a particular logic function or “blank” (i.e., with no manufactured logic function, nominally balanced like a traditional sense amplifier). Pre-programmed gates use a mixture of HVT and LVT devices in the legs to set the logic function and allow simpler post-manufacturing testing. Blank gates use all LVT devices and must be HCI-programmed before use.

Before logic-function programming, all gates are put in reset mode ($CLK=0$). The differential outputs of all the gates are therefore 0, which turns off all the input pull-down stacks. The bottom stress NMOSes of selected legs are turned on (via $HCI_{bar}=0$). The boosted $V_{DDH}$ (3V) is then applied, and the selected stress NMOS devices conduct HCI current in the direction opposite to normal current flow, which maximizes the $V_t$ increase of those NMOSes. During normal evaluation, the legs with stressed NMOSes pull less current than their unstressed counterparts on the opposite side. This enables several operations on a gate:

- A blank gate can be programmed.
- A pre-programmed gate can be overwritten, with a different logic function programmed in.
- A pre-programmed gate can be boosted for higher performance, reinforcing the pre-programmed function.
- A gate function can be erased, e.g., for an erase-on-tamper-detection security feature.
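The programming idea can be illustrated with a toy model of a 2-input gate in which stressing a leg raises the $V_t$ of its bottom stress NMOS and makes that leg lose the current race. The model makes several assumptions that are not the authors' circuit model:

- Each input combination selects exactly one leg per side.
- The output is 1 when the left leg wins.
- Every blank leg starts at a hypothetical `LVT` of 0.30V.
- Each stress step adds a hypothetical `delta_vt`.

```python
from itertools import product

# Toy model of a 2-input PMP-TVD gate (illustrative assumptions only):
# each input combination (a, b) selects one leg on the left (L) and one on the
# right (R); the output is 1 when the left leg pulls more current; the only
# per-leg parameter is the V_t of the leg's bottom stress NMOS.
LVT = 0.30  # hypothetical V_t of every stress NMOS in a blank gate (V)
INPUTS = list(product((0, 1), repeat=2))


def blank_gate():
    """All legs identical: nominally balanced, like a sense amplifier."""
    return {(side, a, b): LVT for side in ("L", "R") for a, b in INPUTS}


def legs_to_stress(truth_table):
    """For each input combination, select the leg that must lose the race."""
    return {("R" if out else "L", a, b) for (a, b), out in truth_table.items()}


def apply_stress(vt, legs, delta_vt=0.10):
    """HCI stress raises V_t of the selected stress NMOSes (hypothetical shift)."""
    for leg in legs:
        vt[leg] += delta_vt
    return vt


def evaluate(vt, a, b):
    """Lower V_t means more current; None means the two legs are balanced."""
    left, right = vt[("L", a, b)], vt[("R", a, b)]
    if left == right:
        return None
    return 1 if left < right else 0


NAND = {(a, b): 1 - (a & b) for a, b in INPUTS}
NOR = {(a, b): 1 - (a | b) for a, b in INPUTS}

gate = apply_stress(blank_gate(), legs_to_stress(NAND))
print([evaluate(gate, a, b) for a, b in INPUTS])  # [1, 1, 1, 0]

# Overwrite NAND with NOR: the new stress must exceed the existing V_t gap.
apply_stress(gate, legs_to_stress(NOR), delta_vt=0.20)
print([evaluate(gate, a, b) for a, b in INPUTS])  # [1, 0, 0, 0]
```

In this toy model, overwriting with the same `delta_vt` used for the original programming would only cancel the existing $V_t$ gap and leave those legs balanced. This loosely mirrors the observation that reversing a pre-programmed function needs more stress time than programming a blank gate.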

The testchip (Fig. 7.6.7) contained three prototype structures:

1. A pipelined 4b carry-select adder using 2-input pre-programmed PMP-TVD gates.
2. A pipelined 4b carry-select adder using 2-input blank PMP-TVD gates.
3. A 16b carry-select adder using 2- and 3-input fixed TVD gates (i.e., no HCI programming devices).

In the 4b adders (Fig. 7.6.2), the sum and carry generators are PMP-TVD gates, while the MUXes and the latches are standard CMOS gates. Like other dynamic logic families, PMP-TVD gates have two phases of operation (precharge and evaluate), so the adders are split into two phases. The first phase consists of the carry and sum generators, and the second phase consists of the carry-selection MUXes, with the data values latched in between. The chips were manufactured in a 9-metal-layer 65nm bulk CMOS process with a 1V nominal $V_{DD}$.

Using HCI stress, the pre-programmed PMP-TVD gates can either be “boosted” (HCI stress is used to reinforce the pre-programmed logic function) or “reversed” (HCI stress is used to program in a different logic function than pre-programmed).

Fig. 7.6.3 shows the Shmoo plots for the pre-programmed adder design under three conditions: no stress (baseline), 60 seconds of reverse-function stress from baseline, and 60 seconds of boost stress from baseline. The 60-second reverse-function stress fully alters the logic function of the pre-programmed gates. The stress voltage is 3V, resulting in a current density of $18.4\text{mA}/\mu\text{m}^2$ and a voltage drop of $2.67\text{V}$ per leg.

The pre-programmed 4b adder operates between 1.8-4.08GHz with 0.35-2.15mW power consumption over a supply range of 0.7-1.2V. At the nominal 1V $V_{DD}$, it operates at 3.21GHz with 1.14mW power consumption, 13% of which is leakage. After the same chip is HCI-stressed to boost the adder configuration, the operating range becomes 1.87-4.3GHz, with the upper limit set by the test structures. After HCI stress, the blank version achieves similar performance over the same operating frequency range. Another 4b pre-programmed adder is HCI-stressed to reverse its functionality. After the stress, the new function operates between 1.32-3.78GHz with 0.29-1.78mW power consumption at a 0.7-1.2V supply. At the nominal 1V $V_{DD}$, it operates at 2.89GHz with 0.96mW power consumption, 14% of which is leakage. The 16b adder with fixed TVD gates operates between 474MHz and 1.21GHz (0.7-1.2V $V_{DD}$) with a power consumption of 0.889-5.46mW. At nominal $V_{DD}$, it operates at 1.03GHz with a power consumption of 3.22mW, 8% of which is leakage.

Figure 7.6.4 shows the operating frequency of the blank PMP-TVD design as a function of stress time. Even with 10 seconds of stress, the blank design is sufficiently programmed to function correctly as an adder. Further stress reinforces the programming and increases the performance. The blue line shows the efficacy of using stress to reverse a pre-programmed gate, requiring at least 20 seconds of stress before the pre-programmed function is overridden.

To test the permanence of the HCI programming, we baked a test chip at 125°C for 48 hours (in two 24-hour steps) in a temperature chamber. After the first 24 hours, the structure’s maximum frequency at nominal $V_{DD}$ and room temperature decreased from a post-stress 3.74GHz to 3.52GHz. After the second 24 hours, however, the performance of the structure remained unchanged. This indicates a slight reversal of the programming from baking, followed by a plateau; the program is retained at high temperature [3].

Figure 7.6.6 compares the overhead and security of PMP-TVD gates with previously proposed camouflaged gates using dummy vias [6] and fixed TVD gates [2]. The dummy-via design addresses only reverse engineering, and only partially so, since advanced reverse engineering can typically discern real vias from dummy vias. Fixed TVD gates address reverse engineering more fully and have low side-channel leakage due to their differential gate topology. However, they do not address an untrusted fab, and their logic function cannot be erased or re-programmed. PMP-TVD gates address both untrusted-fab and reverse-engineering threats. Because they also use a differential gate topology, they also have low side-channel leakage.

### Acknowledgements:

The authors would like to thank DARPA for funding in support of this work.

### References:

- [1] R. Torrance and D. James, “Reverse Engineering in the Semiconductor Industry,” *CICC*, pp. 429-436, 2007.
- [2] B. Erbagci, et al., “A Secure Camouflaged Threshold Voltage Defined Logic Family,” *IEEE HOST*, pp. 229-235, 2016.
- [3] M. Bhargava and K. Mai, “A High Reliability PUF Using Hot Carrier Injection Based Response Reinforcement,” *CHES*, LNCS vol. 8086, pp. 90-106, 2013.
- [4] K. Miyaji, et al., “A 6T SRAM with a Carrier-injection Scheme to Pinpoint and Repair Fails That Achieves 57% Faster Read and 31% Lower Read Energy,” *ISSCC*, pp. 232-234, 2012.
- [5] K. Tiri, et al., “A Dynamic and Differential CMOS Logic with Signal Independent Power Consumption to Withstand Differential Power Analysis on Smart Cards,” *ESSCIRC*, pp. 403-406, 2002.
- [6] J. Rajendran, et al., “Security Analysis of Integrated Circuit Camouflaging,” *ACM CCS*, pp. 709-720, 2013.



Figure 7.6.1: Schematic of a 2-input PMP-TVD logic gate pre-programmed as a NAND. The stress NMOSes used to program, boost, reverse, or erase the logic function are marked in the blue dashed lines.



Figure 7.6.2: Structure and layout of the 4b PMP-TVD carry select adder.



Figure 7.6.3: Shmoo plot at room temperature for the 4b PMP-TVD pre-programmed adder: pre-programmed baseline (yellow), 1 minute of reverse stress (green), and 1 minute of boost stress (blue).



Figure 7.6.4: Frequency vs. HCI stress time plot of 4b blank PMP-TVD adder at 1V and room temperature (orange). Also, the blue line shows stress time needed to reverse pre-programmed PMP-TVD adder (20 seconds) and subsequent boosting of the reverse function.

|                   | 4-bit PMP-TVD<br>(pre-prog) | 4-bit PMP-TVD<br>(reverse) | 4-bit PMP-TVD<br>(boost) | 4-bit PMP-TVD<br>(blank) | 16-bit TVD<br>adder  |
|-------------------|-----------------------------|----------------------------|--------------------------|--------------------------|----------------------|
| Area<br>(w/ test) | 0.007mm <sup>2</sup>        |                            |                          |                          | 0.029mm <sup>2</sup> |
| Area<br>(core)    | 0.001mm <sup>2</sup>        |                            |                          |                          | 0.003mm <sup>2</sup> |
| Freq.<br>@ 1V     | 3.2GHz                      | 2.9GHz                     | 3.7GHz                   | 3.6GHz                   | 1.0GHz               |
| Power<br>@ 1V     | 1.14mW                      | 0.96mW                     | 1.09mW                   | 1.09mW                   | 3.22mW               |
| Leakage<br>@ 1V   | 0.15mW                      | 0.14mW                     | 0.14mW                   | 0.14mW                   | 0.26mW               |

Figure 7.6.5: Silicon results for 4b PMP-TVD (no stress), 4b PMP-TVD (60s reverse stress), 4b PMP-TVD (60s boost stress), 4b PMP-TVD (blank, 60s adder program), and 16b fixed TVD adder at 1V nominal  $V_{DD}$  and room temperature.

| Overhead vs. static CMOS std. cell | Dummy via [6] Power | Dummy via [6] Delay | Dummy via [6] Area | TVD [2] Power | TVD [2] Delay | TVD [2] Area | PMP-TVD (This Work) Power | PMP-TVD (This Work) Delay | PMP-TVD (This Work) Area |
|------|------|------|------|------|------|------|------|------|------|
| NAND | 6.5X | 2.6X | 5X   | 1.6X | 3.2X | 3.7X | 9.2X | 6.6X | 7.3X |
| NOR  | 6.1X | 2.1X | 5X   | 1.9X | 2.6X | 3.7X | 4X   | 5.4X | 7.3X |
| XOR  | 1.8X | 1X   | 2.2X | 1.1X | 1.7X | 1.5X | 1.8X | 3.4X | 3X   |

  

|                          | Dummy via [6] | TVD [2] | PMP-TVD (This Work) |
|--------------------------|---------------|---------|---------------------|
| Low Side-channel Leakage |               |         | ●                   |
| Untrusted Fab            |               |         | ●                   |
| Reverse Engineering      | ○             | ●       | ●                   |
| Erasable/Programmable    |               |         | ●                   |

Figure 7.6.6: Overhead comparison of dummy via [6], TVD [2], and PMP-TVD gates normalized to static CMOS standard cells, and security comparison of these gate types. Filled (●) and hollow (○) dots indicate that the gate type addresses the security threat fully or partially, respectively.



Figure 7.6.7: Die shot of the test chip. 16b TVD adder (red), 4b PMP-TVD adder (green) and 4b PMP-TVD blank structure (blue) are highlighted.

## 7.7 A PUF Scheme Using Competing Oxide Rupture with Bit Error Rate Approaching Zero

Meng-Yi Wu, Tsao-Hsin Yang, Lun-Chun Chen, Chi-Chang Lin, Hao-Chun Hu, Fang-Ying Su, Chih-Min Wang, James Po-Hao Huang, Hsin-Ming Chen, Chris Chun-Hung Lu, Evans Ching-Sung Yang, Rick Shih-Jye Shen

eMemory, Hsinchu, Taiwan

Security is critical to today's interconnected world, and hardware protection is as important as security at the network and system levels. Silicon physically unclonable functions (PUFs) are increasingly used as a hardware root of trust and an entropy source for cryptography applications. In those applications, the reliability of the PUF output is key to a successful implementation. Both weak and strong PUFs obtain their output by amplifying analog signals derived from physical properties of IC blocks. Examples include propagation delay, ring oscillators, time-controlled oxide breakdown [1], and the threshold voltage of SRAM transistors [2,3,4]. These physical measurements are by nature sensitive to environmental and device conditions, such as temperature, operating voltage, thermal/interface noise of transistors, process corners, and aging. As a result, it is difficult to obtain a stable PUF output without additional stabilization and error-correction techniques [3,4], such as:

- temporal majority voting (TMV);
- pre-burning PUF bits for end-of-life (EOL) prediction and reliability screening;
- masking algorithms;
- parity bits for an error-correcting code (ECC).

This paper presents a PUF architecture fabricated in 55nm ultra-low-power (ULP) CMOS and 55nm embedded Flash. The scheme produces reliable and uniformly random PUF output without the need for complex error correction or error-bit testing.

The scheme obtains PUF output by binarizing oxide-breakdown behavior in a MOSFET. The simplified PUF cell unit, shown in Fig. 7.7.1, is a MOS capacitor consisting of two AF gates [5] connected to a select transistor (WL). The initial state of the PUF unit, defined as "0", is high impedance with low gate leakage ($<1\text{nA}$). With the WL turned on, the process is activated by applying a high voltage of $5.5\text{V}$ within $1\mu\text{s}$ to both AF0 and AF1 to form an irreversible filament. Because of inherent differences in oxide thickness or uniformity, the stress produces different degrees of rupture in the two oxides, so breakdown first occurs on either AF0 or AF1. The two sides share a channel potential. The first breakdown therefore lowers the resistance of that (random) path and charges the shared channel, which instantly relieves the other side of stress so that it avoids breakdown. As a result, the PUF unit exhibits a strong on/off ratio of over $10^3$ in conductive current ($\mu\text{A}$ vs. $\text{nA}$). The two competing AFs are intended to capture and then amplify the physical differences between the unit paths. The competing mechanism relies on intrinsic gate-dielectric behavior and is therefore applicable to all CMOS processes. Both AF gates may break down simultaneously, but in 55nm technologies this probability is negligible (about 500ppm).
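A toy Monte Carlo can illustrate why the competition between two nominally identical oxides yields a balanced bit. The sketch makes two assumptions that are not taken from the paper:

- Each AF oxide's time-to-breakdown under the enrollment stress is Weibull distributed with the same parameters for AF0 and AF1. A Weibull distribution is the usual statistical model for oxide breakdown.
- A "simultaneous" breakdown means both times fall within a hypothetical window `sim_window`.

The shape and scale values are likewise illustrative.

```python
import random

# Toy Monte Carlo of one enrollment per cell (illustrative assumptions only).


def enroll_cell(rng, shape=1.5, scale=1.0, sim_window=0.0):
    """Return (af0_broken, af1_broken) after the enrollment stress."""
    t0 = rng.weibullvariate(scale, shape)  # alpha = scale, beta = shape
    t1 = rng.weibullvariate(scale, shape)
    if abs(t0 - t1) <= sim_window:
        return True, True  # rare "1"/"1" case: both oxides rupture
    return t0 < t1, t1 < t0  # first breakdown wins; the other side is relieved


def enroll_array(n_cells, seed=0, sim_window=0.0):
    rng = random.Random(seed)
    return [enroll_cell(rng, sim_window=sim_window) for _ in range(n_cells)]


cells = enroll_array(65536)
bits = [int(af0) for af0, _ in cells]  # single-ended read on AF0
print(sum(bits) / len(bits))  # close to 0.5
```

Because AF0 and AF1 are statistically identical in this model, each is equally likely to win, so the enrolled bits are unbiased. Mismatched parameters, for example a systematic layout asymmetry, would show up as a Hamming weight away from 50%.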

Figure 7.7.2 shows the PUF macro. It comprises a 64K array, an embedded high-voltage generator, an internal bias regulator, analog parts, and control logic circuits. The PUF state, "1" or "0", is determined by comparing the DL voltage with VREF through a voltage comparator. In the case of "1" (the breakdown side), the conductive current ($\sim\mu\text{A}$) charges the DL from floating ground to high after a signal reset, and the DL voltage is sensed as higher than VREF. Conversely, if the DL remains uncharged (no breakdown) and its voltage is lower than VREF, the state is categorized as "0". Most conventional PUF schemes use differential sensing to detect complementary electrical signal pairs. However, this approach occasionally produces ambiguous results such as "1"/"1", which are the main source of bit errors. In this paper, we instead propose single-ended sensing on either AF0 or AF1. This simplifies implementation and avoids ambiguous results if a "1"/"1" state ever occurs.
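The benefit of single-ended sensing for the rare "1"/"1" cell can be sketched as follows. Several parts of the model are assumptions:

- The differential read is modeled as a direct comparison of the two DL voltages. The text says only that differential sensing can yield "1"/"1".
- The DL levels, VREF and read-noise values are hypothetical.

```python
import random

V_REF = 0.5         # hypothetical comparator reference (V)
NOISE_SIGMA = 0.02  # hypothetical read noise (V)


def dl_voltage(broken, rng):
    """DL charges high through a ruptured oxide (~uA) and stays low otherwise."""
    nominal = 0.9 if broken else 0.05  # hypothetical sensed levels (V)
    return nominal + rng.gauss(0.0, NOISE_SIGMA)


def single_ended_read(af0_broken, rng):
    """Compare only the AF0 side against VREF."""
    return 1 if dl_voltage(af0_broken, rng) > V_REF else 0


def differential_read(af0_broken, af1_broken, rng):
    """Compare the two DLs directly; a "1"/"1" cell is then decided by noise."""
    return 1 if dl_voltage(af0_broken, rng) > dl_voltage(af1_broken, rng) else 0


rng = random.Random(42)
cell = (True, True)  # the rare both-broken case
se = {single_ended_read(cell[0], rng) for _ in range(1000)}
diff = {differential_read(*cell, rng) for _ in range(1000)}
print(se, diff)  # single-ended gives only {1}; differential gives both values
```

For a normal cell (exactly one side broken), both reads are stable in this model. Only the both-broken case separates them, and the single-ended read resolves it to a repeatable value.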

In cryptography applications, PUF responses must be reliable enough to ensure consistent bit output. Conventional PUF schemes require complex stabilization procedures, including algorithms and error correction with helper data. The proposed PUF needs only simple margin checks to guarantee functionality under varying temperature, biasing corners, and aging, including end-of-life (EOL) prediction in field applications. Built-in self-test (BIST) logic circuits are implemented to detect weak bits by voltage voting, and no bit or chip failures were detected in any of the fabricated dies tested.

On the uniqueness of the PUF output, Fig. 7.7.3 demonstrates an almost ideal inter-ID Hamming distance (HD) ($\mu/\sigma=0.499999/0.031252$) among 63 sample chips of the 64Kb PUF array (equivalent to $16\text{K}\times256$ bit-strings), fabricated in the 55nm ULP and 55nm embedded Flash processes. On the reliability of the PUF output, intra-ID HD results show that the bit-strings are always the same regardless of operating conditions (e.g., temperature ranging from $-40^\circ\text{C}$ to $150^\circ\text{C}$). Hamming weight (HW) is also analyzed along the WL and BL directions to check for layout dependency or loading correlations. The mean HW close to 50% shows that "1" and "0" bits are randomly distributed without bias. Die-to-die hardware checks, presented in a cumulative Weibull plot from 63 samples, also demonstrate a statistically ideal HW of 50%. This insensitivity to the fabrication process means that the proposed PUF structure is a flexible, stable, and almost ideal entropy source for a wide range of security applications.

Furthermore, Fig. 7.7.4 shows results of the bit-error-rate (BER) analysis under varying voltage, temperature, and aging time. The PUF code and macro still work without any bit changes even at a high temperature of $150^\circ\text{C}$. In addition, no entropy is lost in the high-temperature operating life (HTOL) reliability stress, in which the PUF units and peripheries are stressed at $125^\circ\text{C}$ for up to 1000hrs. PUF outputs remain consistently stable across device corners, supply voltages, and temperatures. Repeated readouts for intra-ID HD and BER checks were also conducted on post-burn-in samples, showing no error bits.
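The reported inter-ID statistics can be compared against the ideal case. For independent, unbiased bits, the normalized HD between two $n$-bit strings is binomial with mean 0.5 and standard deviation $\sqrt{0.25/n}$. For the 256-bit strings used here, that gives exactly 0.03125, which matches the measured $\sigma = 0.031252$. The simulated "chips" below are random bit lists, not measured data.

```python
import math
import random


def hamming_fraction(x, y):
    """Normalized Hamming distance between two equal-length bit lists."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def ideal_inter_hd_sigma(n_bits):
    """Std. dev. of normalized HD for independent, unbiased bits: sqrt(p(1-p)/n), p = 0.5."""
    return math.sqrt(0.25 / n_bits)


# Hypothetical check with random "chips": 63 chips x one 256-bit string each.
rng = random.Random(7)
chips = [[rng.getrandbits(1) for _ in range(256)] for _ in range(63)]
hds = [hamming_fraction(chips[i], chips[j])
       for i in range(63) for j in range(i + 1, 63)]
mean_hd = sum(hds) / len(hds)
print(round(mean_hd, 3), ideal_inter_hd_sigma(256))  # ideal sigma = 0.03125
```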

As shown in Fig. 7.7.5, the autocorrelation-function (ACF) study on the 64Kb of the proposed PUF finds no spatial correlations (mean value: 0; 95% confidence interval: $\pm 0.0076$). An ACF test across sample chips likewise shows a mean value of 0, with the $\pm 2\sigma$ bounds remaining stable. These ACF results indicate that the PUF output bits are independent and stable both within a single chip and among different chips. The proposed PUF passes all NIST randomness tests [6] (Fig. 7.7.6). In addition, the proposed scheme is benchmarked against mainstream PUF schemes [1,2,3] on PUF source, density, PUF unit size, ACF analysis, inter/intra-HD performance, readout power, and operating temperature, and it shows advantages in most metrics. Finally, Fig. 7.7.7 shows the top-view micrograph of the proposed PUF, including the PUF array and periphery circuits. The PUF offers easy implementation, high reliability, and insensitivity to aging and environmental factors. It also serves as an entropy source that can generate a random-number seed of up to 64Kb. Above all, the low BER makes this scheme a reliable technology for securing hardware at the SoC level.
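The quoted 95% interval is consistent with the standard white-noise bound for a sample autocorrelation, $\pm 1.96/\sqrt{N}$. For $N = 64\text{K} = 65536$ bits this is $1.96/256 \approx 0.00766$. Below is a minimal sketch of the ACF computation and that bound; the bits fed to it are random or synthetic, not PUF data.

```python
import math
import random


def acf(bits, max_lag):
    """Sample autocorrelation of a bit sequence for lags 1..max_lag."""
    x = [2 * b - 1 for b in bits]  # map {0, 1} -> {-1, +1}
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    var = sum(v * v for v in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / var
            for k in range(1, max_lag + 1)]


def acf_95_bound(n):
    """Approximate 95% bound for an uncorrelated sequence: +/- 1.96 / sqrt(n)."""
    return 1.96 / math.sqrt(n)


print(acf_95_bound(64 * 1024))  # about 0.00766

rng = random.Random(3)
random_bits = [rng.getrandbits(1) for _ in range(4096)]
print([round(r, 4) for r in acf(random_bits, 8)])
```

A strongly patterned sequence such as `[0, 1] * 50` gives a lag-1 ACF near -1, far outside the bound.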

### Acknowledgments:

The authors would like to thank our colleagues at eMemory Technology who assisted with brainstorming, designing, and testing the proposed device. Special thanks also go to the Taiwan-based leading foundries tsmc and UMC for their full support in the shuttle program and testchip fabrication.

### References:

- [1] N. Liu, et al., "OxID: On-chip One-Time Random ID Generation Using Oxide Breakdown," *IEEE Symp. VLSI Circuits*, pp. 231-232, 2010.
- [2] Y. Su, et al., "A $1.6\text{pJ/bit}$ 96% Stable Chip-ID Generating Circuit Using Process Variations," *ISSCC*, pp. 406-407, 2007.
- [3] S. K. Mathew, et al., "A  $0.19\text{pJ/b}$  PVT-Variation-Tolerant Hybrid Physically Unclonable Function Circuit for 100% Stable Secure Key Generation in 22nm CMOS," *ISSCC*, pp. 278-279, 2014.
- [4] B. Karpinskyy, et al., "Physically Unclonable Function for Secure Key Generation with a Key Error Rate of  $2\text{E-}38$  in 45nm Smart-Card Chips," *ISSCC*, pp. 158-159, Feb. 2016.
- [5] K. Yang and C. Hu. "MOS Capacitance Measurements for High-Leakage Thin Dielectrics," *IEEE TED*, vol. 46, no. 7, pp. 1500-1501, 1999.
- [6] A. Rukhin, et al., "A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications," NIST, vol. 800-22, no. rev 1a, p. 131, 2010.



Figure 7.7.1: PUF unit with schematic diagram, operation conditions and equivalent models for initialization/enrollment/readout sensing.



Figure 7.7.2: Illustrations on PUF array architecture, sensing circuit and block diagram of BIST auto code verification.



Figure 7.7.3: Inter/intra-HD and HW analysis conducted across fab corners and WL/BL directions. Weibull plot across 63 samples also demonstrates an ideal HW of 50%.



Figure 7.7.4: BER remains consistently low under varying supply voltages, temperatures, and aging time. BER on post-burn-in samples also confirms stable output bits.



Figure 7.7.5: ACF measurements reveal no spatial correlations among bit strings generated from a single chip or various corners.

| #  | Test Name                     | Stream Length | No. of Runs | Min. Pass (%) | Average P-value | P/F |
|----|-------------------------------|---------------|-------------|---------------|-----------------|-----|
| 1  | Frequency                     | 40000         | 75          | 97.33         | 0.4999          | P   |
| 2  | BlockFrequency                | 40000         | 75          | 100           | 0.5087          | P   |
| 3  | CumulativeSums Forward        | 40000         | 75          | 98.67         | 0.5084          | P   |
| 4  | CumulativeSums Reverse        | 40000         | 75          | 98.67         | 0.4946          | P   |
| 5  | Runs                          | 40000         | 75          | 100           | 0.5384          | P   |
| 6  | LongestRun                    | 40000         | 75          | 100           | 0.4783          | P   |
| 7  | Rank                          | 40000         | 75          | 97.33         | 0.4568          | P   |
| 8  | FFT                           | 40000         | 75          | 97.33         | 0.5142          | P   |
| 9  | NonOverlapping Template (m=9) | 40000         | 75          | 94.67         | 0.5060          | P   |
| 10 | Overlapping Template (m=9)    | 40000         | 75          | 100           | 0.4498          | P   |
| 11 | Universal                     | 1000000       | 3           | 100           | 0.6428          | P   |
| 12 | ApproximateEntropy (m=10)     | 40000         | 75          | 100           | 0.4245          | P   |
| 13 | RandomExcursions              | 1000000       | 3           | 100           | 0.5701          | P   |
| 14 | RandomExcursions Variant      | 1000000       | 3           | 100           | 0.4801          | P   |
| 15 | Serial                        | 40000         | 75          | 100           | 0.5387          | P   |
| 16 | LinearComplexity              | 1000000       | 3           | 100           | 0.7000          | P   |

| ISSCC07                      | VLSI 10   | ISSCC14  | This Work                        |
|------------------------------|-----------|----------|----------------------------------|
| Entropy Source               | SRAM Base | AntiFuse | SRAM Base                        |
| Node                         | 0.13um    | 65nm     | 22nm                             |
| Density                      | 128       | 128      | 100K                             |
| Unit Size (um <sup>2</sup> ) | 73.83     | 3        | 4.66                             |
| ACF 95%                      | NA        | 0.0088   | 0.00766                          |
| BER                          | 3.78%     | 0        | 0.97%                            |
|                              |           |          | <500ppm (diff.) 0 (single ended) |
| Stabilizing Method           | NA        | NA       | Burn-in/Voting Masking/ECC       |
|                              |           |          | Voltages Voting                  |
| Inter HD                     | 0.50125   | 0.499375 | 0.49                             |
| IntraHD                      | NA        | NA       | 0.0258                           |
| Op. Temp.                    | NA        | 0~85°C   | 25~50°C                          |
|                              |           |          | -40~150°C                        |

Figure 7.7.6: The presented PUF passes the NIST 800-22 randomness tests and demonstrates various advantages compared with prior art.
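For reference, the first row of the NIST table is the SP 800-22 frequency (monobit) test. It maps bits to $\pm1$, sums them to $S_n$, and computes $p = \operatorname{erfc}\!\left(|S_n|/\sqrt{2n}\right)$. A sequence passes at the suite's default significance level when $p \ge 0.01$. The sketch below implements that published formula and runs it on NIST's own 10-bit worked example rather than on PUF data.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test P-value for a list of 0/1 bits."""
    n = len(bits)
    s_n = sum(2 * b - 1 for b in bits)  # +1 for each 1, -1 for each 0
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# Worked example from SP 800-22: epsilon = 1011010101, reported P-value 0.527089
example = [int(c) for c in "1011010101"]
p = monobit_p_value(example)
print(round(p, 6), "pass" if p >= 0.01 else "fail")
```

The table's per-test statistics (minimum pass rate over many streams and average P-value) summarize many such single-stream results.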



Figure 7.7.7: Top view of the PUF micrograph shows a 64K PUF array and comprehensive circuitries with HV, analog and logic BIST.

## 7.8 A 445F<sup>2</sup> Leakage-Based Physically Unclonable Function with Lossless Stabilization Through Remapping for IoT Security

Jongmin Lee, Donghyeon Lee, Yongmin Lee, Yoonmyung Lee

Sungkyunkwan University, Suwon, Korea

With the advent of the IoT era, billions of devices are connected to networks, and assuring sufficient security at low cost is a critical concern. Physically unclonable functions (PUFs) have drawn increasing attention as key security building blocks for authentication, since each PUF circuit has unique challenge-response pairs (CRPs). This uniqueness is achieved by maximizing the effects of process variation using process-sensitive circuits, i.e., PUF cells. Recently reported PUF cell types include cells based on a two-transistor amplifier [1], a NAND gate [2], a ring oscillator [3], a current mirror [4], back-to-back connected inverters [5], and an inverter [6]. Regardless of the variation source, PUFs inevitably include CRPs that respond inconsistently when the process-induced mismatch between the compared elements is small relative to noise. For example, if the output of the two-transistor amplifier in [1] is near the switching threshold, the output can be inconsistent, resulting in bit errors and an unstable CRP. Thus, efforts have focused on stabilizing unstable CRPs. The most straightforward stabilization scheme is temporal majority voting (TMV) [1,5]. However, its improvement in bit error rate (BER) and stability is limited, since it does not directly address the instability of a given CRP. Trimming [2,3,5,6], another widely used approach, improves BER/stability by discarding unstable CRPs. Because stability evaluation is not very accurate, however, the number of discarded CRPs can be significant (up to 30% in [3]). This increases the silicon area required for additional CRP generation, making trimming prohibitive for cost-sensitive IoT applications, especially for weak PUFs. In this paper, we present a leakage-based PUF that allows lossless stabilization through remapping of unstable PUF cell pairs. It achieves BER and stability comparable to, or better than, the trimming stabilization method without discarding any CRPs.
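The limitation of TMV can be quantified with a simple binomial model. The model assumes (beyond what the text states) that repeated reads of a cell are independent and that each read is wrong with probability $p$. The majority of $n$ votes is then wrong with probability $\sum_{k > n/2} \binom{n}{k} p^k (1-p)^{n-k}$. For a well-separated cell (small $p$), more votes help quickly. For a cell whose mismatch is comparable to noise ($p$ near 0.5), the majority stays nearly as unreliable as a single read.

```python
from math import comb


def tmv_error_probability(p, n):
    """Probability that the majority of n independent reads is wrong.

    p: probability that a single read returns the wrong bit.
    n: number of votes (must be odd so that there are no ties).
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of votes")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


for p in (0.01, 0.1, 0.45):
    print(p, [round(tmv_error_probability(p, n), 4) for n in (1, 5, 15)])
```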

The leakage-based PUF cell, which generates key values according to leakage-current mismatch, is shown in Fig. 7.8.1 (top left). Initially, the PMOS gate ($G_P$) is pre-charged to $V_{DD}$ and the NMOS gate ($G_N$) is pre-discharged to GND. As the cell is evaluated, $G_N$ rises and $G_P$ falls due to sub-threshold leakage current through the PMOS/NMOS. The voltage developed at $G_N/G_P$ then forms positive feedback that accelerates the leakage current, resulting in a sharp voltage transition of $G_N$ to $V_{DD}$ and $G_P$ to GND. The time required to reach this latching event is inversely proportional to the leakage current. In the sub-$V_{th}$ operation region, the leakage current depends exponentially on the threshold voltages ($V_{th}$) of the leaking transistors. Thus, the latching delay is more sensitive to $V_{th}$ in the sub-$V_{th}$ region than in the super-$V_{th}$ region, as shown in Fig. 7.8.1. The latching delay therefore has a wide distribution with a high $\sigma/\mu$ value, which improves stability. A key value can be generated by comparing the latching delays of two randomly selected cells from a cell array. The quantity compared to produce the key also has a wide distribution (Fig. 7.8.1).
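The sensitivity argument can be checked with a first-order Monte Carlo. The sketch relies on two textbook device models, used here as assumptions:

- Sub-threshold: latching delay is taken as proportional to $1/I_{leak}$ with $I_{leak} \propto e^{-V_{th}/(nV_T)}$.
- Super-threshold comparison: an alpha-power-law drive current $I \propto (V_{GS}-V_{th})^{\alpha}$.

All parameter values (slope factor, $\alpha$, $V_{GS}$, nominal $V_{th}$ and its mismatch $\sigma$) are hypothetical and are not taken from the paper.

```python
import math
import random
import statistics

N_SLOPE = 1.5                      # subthreshold slope factor (assumed)
V_T = 0.026                        # thermal voltage near room temperature (V)
ALPHA = 1.3                        # alpha-power-law exponent (assumed)
V_GS_SUPER = 1.0                   # gate drive for the super-threshold case (assumed, V)
VTH_NOM, VTH_SIGMA = 0.40, 0.020   # assumed nominal V_th and mismatch sigma (V)


def delay_subthreshold(vth):
    """Delay ~ C*dV / I_leak with I_leak ~ exp(-V_th / (n*V_T)); constants dropped."""
    return math.exp(vth / (N_SLOPE * V_T))


def delay_superthreshold(vth):
    """Delay ~ 1 / I with I ~ (V_GS - V_th)^alpha; constants dropped."""
    return 1.0 / (V_GS_SUPER - vth) ** ALPHA


def coefficient_of_variation(delay_fn, n=20000, seed=1):
    """sigma/mu of the delay under Gaussian V_th mismatch."""
    rng = random.Random(seed)
    samples = [delay_fn(rng.gauss(VTH_NOM, VTH_SIGMA)) for _ in range(n)]
    return statistics.pstdev(samples) / statistics.fmean(samples)


cv_sub = coefficient_of_variation(delay_subthreshold)
cv_super = coefficient_of_variation(delay_superthreshold)
print(round(cv_sub, 3), round(cv_super, 3))
```

With these assumed values the sub-threshold $\sigma/\mu$ is roughly an order of magnitude larger. This follows from $\partial \ln t / \partial V_{th}$ being $1/(nV_T)$ in the sub-threshold model versus $\alpha/(V_{GS}-V_{th})$ in the super-threshold model.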

An array of leakage-based PUF cells can be constructed as in Fig. 7.8.2. To minimize area, the leaking PMOS is shared, and each PUF cell contains only a leaking NMOS. Connecting only one NMOS at a time to the shared PMOS forms a thyristor-like latching structure. The shared PMOS is made wide to minimize array-to-array variation and long to minimize its impact on the latching time, so that the latching time is more sensitive to the $V_{th}$ of the leaking NMOS than to that of the PMOS. The leaking NMOS at coordinate (i,j) is selected by connecting the i-th row gate signal ($G_{Ni}$) to $G_N$ and enabling the j-th column through the enable signal ($EN_j$) of a high-$V_{th}$ transmission gate. A second version of the leakage-based PUF cell, with a footer (Fig. 7.8.1, bottom left), minimizes the leakage current that unselected NMOS devices inject into $G_P$. In this version, the source of each unselected NMOS is set to $V_{DD}$ during evaluation to minimize undesired leakage through the unselected devices.

For latching-delay comparison, a pair of PUF cell arrays is used. First, one NMOS is selected in each array, $G_P$ is pre-charged, and $G_N$ is pre-discharged. When the pre-charge and pre-discharge are released, the latching delay of each selected cell is evaluated. Cross-connected D flip-flops then detect the transition of the $G_N$ node in each array, and the key value is determined as shown in the look-up table in Fig. 7.8.2. One advantage of the paired-array structure is its dual use, assuming N-bit addresses for each array: 1) as a strong PUF with $2^{2N}$ CRPs, maximizing the number of CRPs, and 2) as a weak PUF with $2^N$ independent CRPs, for robustness against machine-learning attacks. Another advantage is that a counter can measure the latching delay of $G_N$ in each array during enrollment, and the counted value enables stabilization schemes such as trimming and remapping.
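The sketch below shows the CRP bookkeeping for the paired arrays, assuming the 64x8 array size of the fabricated chip (Fig. 7.8.7), which gives N = 9 address bits per array. The row-major address decode, the key-bit polarity, and the weak-PUF pairing of left cell k with right cell k are hypothetical choices made for illustration. The actual polarity is defined by the look-up table in Fig. 7.8.2.

```python
ROWS, COLS = 64, 8                    # per-array size (64x8 array pair)
N = (ROWS * COLS - 1).bit_length()    # address bits per array -> 9

def decode(addr):
    """Hypothetical row-major decode of an N-bit address into (row i, column j)."""
    if not 0 <= addr < ROWS * COLS:
        raise ValueError("address out of range")
    return divmod(addr, COLS)         # (i selects G_Ni, j selects EN_j)

def strong_crp_count(n_bits):
    """Any left-array cell may be compared with any right-array cell."""
    return 2 ** (2 * n_bits)

def weak_crp_count(n_bits):
    """Each cell is used in exactly one pair, so CRPs are independent."""
    return 2 ** n_bits

def key_bit(d_left, d_right):
    """Assumed polarity: 1 if the left array's G_N latches first."""
    return 1 if d_left < d_right else 0

def weak_puf_key(delays_left, delays_right):
    """One key bit per address, pairing left cell k with right cell k (assumed)."""
    return [key_bit(dl, dr) for dl, dr in zip(delays_left, delays_right)]
```

For N = 9 this gives 2^18 = 262,144 strong-PUF CRPs and 512 independent weak-PUF CRPs. The latter agrees with the 512 independent CRPs listed in Fig. 7.8.6.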

The uniqueness and reproducibility of the implemented PUF are measured by the Hamming distance (HD) of 128b PUF words (Fig. 7.8.2). The normalized inter-HD is 0.4920/0.5000, and the intra-HD is 0.0072/0.0069 for footless/footed cells. The separation between inter-HD and intra-HD is 68.43×/72.99× for footless/footed cells, demonstrating the identifiability of the PUF. The PUF uniformity is confirmed by spatial autocorrelation bounds of 0.0226/0.0229 and 0.0221/0.0224 (footless/footed).
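The sketch below computes the HD metrics for 128b words stored as Python integers. Inter-HD averages over all pairs of different chips' reference words. Intra-HD averages over repeated reads of one chip against its reference. Treating the separation as the plain ratio inter-HD/intra-HD is an assumption, since the exact averaging behind the reported separation values is not stated here.

```python
import itertools

WORD_BITS = 128  # PUF word length used for the HD measurement

def normalized_hd(a, b, n_bits=WORD_BITS):
    """Fraction of differing bits between two n-bit words stored as ints."""
    return bin(a ^ b).count("1") / n_bits

def inter_hd(reference_words):
    """Mean normalized HD over all pairs of different chips' reference words."""
    pairs = list(itertools.combinations(reference_words, 2))
    return sum(normalized_hd(a, b) for a, b in pairs) / len(pairs)

def intra_hd(reference, re_reads):
    """Mean normalized HD between one chip's reference word and repeated reads."""
    return sum(normalized_hd(reference, r) for r in re_reads) / len(re_reads)

def separation(inter, intra):
    """Inter/intra ratio (assumed definition of the separation metric)."""
    return inter / intra
```

An ideal PUF drives inter-HD toward 0.5 and intra-HD toward 0, so the separation grows without bound as reads become perfectly reproducible.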

The remapping scheme for lossless stabilization is shown in Fig. 7.8.3. During enrollment, the key and latching-delay values of randomly selected independent CRPs are read out. Unstable CRPs can be identified as cell pairs with a small latching-delay difference. With conventional trimming, the unstable CRPs are simply discarded, resulting in CRP loss. In the remapping scheme, however, unstable CRPs are reused. For example, cell pairs (L1,R1) and (L2,R2) are both unstable due to small delay differences ($d_{L1} \approx d_{R1}$, $d_{L2} \approx d_{R2}$); however, the delays of the cells in one pair can still be very different from those of the other pair ($d_{L1}, d_{R1} \ll d_{L2}, d_{R2}$). In this case, the remapped pairs (L1,R2) and (L2,R1) can be stored as stable CRPs without the time overhead of an actual re-evaluation. This significantly improves BER and stability without reducing the CRP count.
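The remapping idea can be sketched as follows, using enrollment delays keyed by cell address. The greedy rule here is a hypothetical illustration, not the authors' algorithm. It sorts unstable pairs by mean delay and swaps right-array partners between the i-th fastest pair and the i-th pair of the slower half, as in the (L1,R2)/(L2,R1) example. The pair count is preserved. A remapped pair is not guaranteed to clear the threshold, so the caller should re-check margins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CellPair:
    left: int    # address of the cell in the left array
    right: int   # address of the cell in the right array

def margin(pair, d_left, d_right):
    """Absolute latching-delay difference measured during enrollment."""
    return abs(d_left[pair.left] - d_right[pair.right])

def remap_unstable(pairs, d_left, d_right, threshold):
    """Keep stable pairs; re-pair unstable ones across fast/slow delay levels."""
    stable = [p for p in pairs if margin(p, d_left, d_right) >= threshold]
    unstable = [p for p in pairs if margin(p, d_left, d_right) < threshold]
    # Order unstable pairs by their mean latching delay (fastest first).
    unstable.sort(key=lambda p: (d_left[p.left] + d_right[p.right]) / 2)
    half = len(unstable) // 2
    fast = unstable[:half]
    slow = unstable[len(unstable) - half:]
    middle = unstable[half:len(unstable) - half]  # one leftover pair if odd
    remapped = []
    for f, s in zip(fast, slow):
        # Swap right-array partners, as in (L1,R2) and (L2,R1).
        remapped.append(CellPair(f.left, s.right))
        remapped.append(CellPair(s.left, f.right))
    return stable + remapped + middle
```

Because the new key bits follow from the stored enrollment delays, no extra evaluation is needed at remap time. Only the new pairing has to be stored.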

The proposed PUF is fabricated in a 180nm process, and the BER/stability results are shown in Fig. 7.8.4. With 1,000 evaluations on each of 15 chips, the native BERs for footless/footed cells are 0.72%/0.69%, and the fractions of unstable cells are 6.65%/5.69%. Fig. 7.8.4 also shows BER/stability results over 1.2-1.8V and 0-80°C. The effectiveness of the stabilization scheme is evaluated over voltage and temperature with 10% of CRPs trimmed/remapped. The set of CRPs to be trimmed/remapped is determined from the delay data extracted during enrollment at the nominal condition (25°C, 1.4V). The CRPs with trimming/remapping applied are recorded, and errors in response to changes in voltage and temperature are measured on these CRPs. With stabilization applied at room temperature, the BER of the footed cells improves by 33.7× with trimming+TMV11 and by 38× with remapping+TMV11, and stability improves by 63.1× and 71.7×, respectively.
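TMV11, used above together with trimming and remapping, takes the majority of 11 reads. The sketch below gives its effect under a simplifying assumption that each read of a cell flips independently with probability `p_flip`. The post-vote error is then a binomial tail.

```python
from math import comb

def tmv_error(p_flip, n_votes=11):
    """Probability that the majority of n_votes independent reads is wrong.

    Assumes each read of a cell flips independently with probability p_flip.
    """
    if n_votes % 2 == 0:
        raise ValueError("n_votes must be odd to avoid ties")
    k_min = n_votes // 2 + 1  # at least this many wrong reads flip the vote
    return sum(
        comb(n_votes, k) * p_flip**k * (1 - p_flip) ** (n_votes - k)
        for k in range(k_min, n_votes + 1)
    )
```

Under this model TMV strongly suppresses errors on mostly stable cells. For a truly marginal cell (`p_flip = 0.5`), however, `tmv_error` stays at 0.5 for any odd `n_votes`. This is why TMV alone cannot fix CRPs whose delay difference is buried in noise, and why it is combined here with trimming or remapping.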

Figure 7.8.5 shows the BER and stability improvement vs. the percentage of CRPs trimmed/remapped. For a fair comparison, remapping is performed only among the PUF cells included in the trimmed percentage, which limits the cells available for remapping. In real usage, however, remapping can also pair stable cells with unstable ones to maximize the delay difference and hence greatly increase the stability of the remapped cells. The measurement results show that BER and stability clearly improve as the ratio of trimmed/remapped cells increases. By remapping 10% of CRPs and applying TMV11, the BER and the fraction of unstable cells are reduced to 0.004%/0.04% for footless cells and 0.019%/0.08% for footed cells. Figure 7.8.6 summarizes the results and compares them with prior art.

### Acknowledgements:

This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (2016R1C1B2009047).

### References:

- [1] K. Yang, et al., "A 553F<sup>2</sup> 2-Transistor Amplifier-Based Physically Unclonable Function (PUF) with 1.67% Native Instability," *ISSCC*, pp. 146-147, 2017.
- [2] B. Karpiński, et al., "Physically Unclonable Function for Secure Key Generation with a Key Error Rate of 2E<sup>-38</sup> in 45nm Smart-Card Chips," *ISSCC*, pp. 158-159, 2016.
- [3] K. Yang, et al., "A Physically Unclonable Function with BER<10<sup>-8</sup> for Robust Chip Authentication Using Oscillator Collapse in 40nm CMOS," *ISSCC*, pp. 254-255, 2015.
- [4] A. Alvarez, et al., "15fJ/b Static Physically Unclonable Functions for Secure Chip Identification with <2% Native Bit Instability and 140x Inter/Intra PUF Hamming Distance Separation in 65nm," *ISSCC*, pp. 258-259, 2015.
- [5] S. Mathew, et al., "A 0.19pJ/b PVT-Variation-Tolerant Hybrid Physically Unclonable Function Circuit for 100% Stable Secure Key Generation In 22nm CMOS," *ISSCC*, pp. 278-279, 2014.
- [6] S. Stanzione, et al., "CMOS Silicon Physically Unclonable Functions Based on Intrinsic Process Variability," *JSSC*, vol. 46, no. 6, pp. 1456-1463, 2011.



Figure 7.8.1: The structure and characteristics of leakage-based PUF cell.



Figure 7.8.3: Proposed remapping scheme for lossless stabilization.



Figure 7.8.5: Measured BER/stability improvement with trimming/remapping.



Figure 7.8.2: Array structure of leakage-based PUF and measurement result of Hamming distance and autocorrelation.



Figure 7.8.4: Measured BER/stability vs number of evaluations and vs voltage/temperature.

|                                    | This work                                              |                                                     | ISSCC' 17 [1]            | ISSCC' 16 [2]     | ISSCC'15 [3]     | ISSCC'15 [4]      | ISSCC'14 [5]                      | JSSC' 11 [6]     |
|------------------------------------|--------------------------------------------------------|-----------------------------------------------------|--------------------------|-------------------|------------------|-------------------|-----------------------------------|------------------|
|                                    | Footless                                               | Footed                                              |                          |                   |                  |                   |                                   |                  |
| Technology                         | 180nm                                                  | 45nm                                                | 180nm                    | 45nm              | 65nm             | 22nm              | 90nm                              |                  |
| PUF cell area / independent CRP (F²) | 445*                                                   | 890*                                                | 782(LVT)<br>553(DNW)     | 2613              | 66016**          | 6000              | 9632                              | 270062**         |
| Core area (F²)                     | 228054                                                 | 455569                                              | -                        | -                 | 528125           | -                 | -                                 | 4320988          |
| Number of Independent CRPs         | 512                                                    | -                                                   | -                        | -                 | 8                | -                 | -                                 | 16               |
| Native Unstable bits               | 6.65%                                                  | 5.62%                                               | 1.73%(LVT)<br>1.67%(DNW) | -                 | -                | 1.73%             | -30%                              | -                |
| Native BER                         | 0.72%                                                  | 0.69%                                               | 0.18%(LVT)<br>0.13%(DNW) | -                 | -9%              | -                 | 8.3%                              | -                |
| Unstable bits after stabilization  | 1.89% (TMV11)<br>0.12% (T+T)***<br>0.04% (R+T)***      | 1.76% (TMV11)<br>0.03% (T+T)***<br>0.08% (R+T)***   | 0.95%(LVT)<br>0.50%(DNW) | -                 | -                | -                 | 3%                                | 0.1%             |
| BER after stabilization            | 0.30% (TMV11)<br>0.17% (T+T)***<br>0.004% (R+T)***     | 0.25% (TMV11)<br>0.009% (T+T)***<br>0.019% (R+T)*** | 0.08%(LVT)<br>0.05%(DNW) | 0.1%              | -2%***           | -                 | 0.97%                             | -                |
| Stabilization Method               | TMV11<br>Trimming+TMV11 (T+T)<br>Remapping+TMV11 (R+T) | TMV11<br>Valid map<br>Dynamic Thresholding          | -                        | -                 | -                | -                 | TMV15<br>+ Burn-in<br>+ Dark bits | Mask             |
| Tested Condition                   | Voltage(V)<br>Temp(°C)                                 | 1.2-1.8<br>0-80                                     | 0.8-1.8<br>-40-120       | 0.7-1.2<br>-25-85 | 0.7-1.0<br>25-85 | 0.7-0.9<br>25-125 | ±10.0%<br>25-50                   | ±10.0%<br>25-125 |
| Bit rate(Mb/s)                     | 0.018                                                  | 0.018                                               | 4832                     | 1.92              | 1.6              | -                 | 2000                              | 0.00625          |
| Core energy(pJ/bit)                | 9.8                                                    | 3.6                                                 | 0.014(LVT)<br>0.011(DNW) | -                 | 17.75            | 0.015             | 0.013                             | 38               |

\*: Area for two cells = 1pair   \*\*: Area per independent CRP = Core area / # of independent CRP  
\*\*\*: 10% trimming case   \*\*\*\*: 10% remapping case   \*\*\*\*\*: Voltage variation

Figure 7.8.6: Summary of measurement results and comparison with prior-arts.



Reported PUF: Footed & Footless cells in 64x8 array pair

Figure 7.8.7: Die micrograph and layout of PUF cells.

# Session 8 Overview: *Wireless Power and Harvesting*

## POWER MANAGEMENT SUBCOMMITTEE



**Session Chair:**  
**Yuan Gao**  
*IME, A\*STAR, Singapore*



**Associate Chair:**  
**Zhiliang Hong**  
*Fudan University, Shanghai, China*

**Subcommittee Chair: Axel Thomsen, Cirrus Logic, Austin TX**

Innovations in energy harvesting continue to enhance power-conversion efficiency and to reduce circuit self-startup voltage and system static-power consumption. Various new energy sources, including a triboelectric nanogenerator and MEMS AlN-on-Si piezoelectric harvesters, have been reported. New design methodologies for wireless-power and piezoelectric-vibration-energy harvesting are improving the state of the art.



8:30 AM

### 8.1 A 960pW Co-Integrated-Antenna Wireless Energy Harvester for WiFi Backchannel Wireless Powering

K. R. Sadagopan, Oregon State University, Corvallis, OR

In Paper 8.1, Oregon State University and Texas Instruments present a 2.4GHz antenna co-integrated wireless energy harvester powered by a WiFi radio. Implemented in 65nm CMOS, the circuit consumes only 960pW and achieves a sensitivity of -36dBm in primary mode and -33dBm in cold-start mode.



9:00 AM

### 8.2 A 70W and 90% GaN-Based Class-E Wireless-Power-Transfer System with Automatic-Matching-Point-Search Control for Zero-Voltage Switching and Zero-Voltage-Derivative Switching

Y.-T. Lin, National Chiao Tung University, Hsinchu, Taiwan

In Paper 8.2, National Chiao Tung University presents a 6.78MHz GaN-based Class-E wireless-power system. A dual-loop control for automatic matching-point searching and linearization-compensation-capacitance tuning is proposed. Peak efficiency up to 90% is achieved under 70W output power.



9:30 AM

### 8.3 A Reconfigurable Cross-Connected Wireless-Power Transceiver for Bidirectional Device-to-Device Charging with 78.1% Total Efficiency

F. Mao, University of Macau, Macau, China

In Paper 8.3, the University of Macau presents a 6.78MHz cross-coupled reconfigurable wireless-power transceiver for device-to-device (D2D) charging. With the proposed near-optimum switch-timing control schemes, the peak D2D total efficiency is 78.1% and the maximum charging power is 2.7W.



9:45 AM

### 8.4 A 13.56MHz Wireless Power and Data Transfer Receiver Achieving 75.4% Effective-Power-Conversion Efficiency with 0.1% ASK Modulation Depth and 9.2mW Output Power

Y. Wang, Fudan University, Shanghai, China

In Paper 8.4, Fudan University presents a 13.56MHz wireless power- and data-transfer receiver for implantable biomedical devices. Large AM modulation depths reduce efficiency. To avoid that loss while keeping RX power consumption low, the receiver with shifted limiters is co-designed with an active rectifier that uses dynamic impedance matching and adaptive conversion-ratio tuning. An effective power-conversion efficiency of 75.4% is achieved at a 0.1% AM modulation depth.



10:15 AM

### 8.5 MISIMO: A Multi-Input Single-Inductor Multi-Output Energy Harvester Employing Event-Driven MPPT Control to Achieve 89% Peak Efficiency and a 60,000x Dynamic Range in 28nm FDSOI

S. S. Amin, University of California, San Diego, La Jolla, CA

In Paper 8.5, the University of California, San Diego, presents a multi-input single-inductor multi-output energy harvester employing event-driven MPPT control to achieve 89% peak efficiency and a 60,000$\times$ dynamic range in a 28nm FDSOI process.



10:45 AM

### 8.6 A 4.5-to-16$\mu$W Integrated Triboelectric Energy-Harvesting System Based on High-Voltage Dual-Input Buck Converter with MPPT and 70V Maximum Input Voltage

I. Park, Korea University, Seoul, Korea

In Paper 8.6, Korea University presents a triboelectric energy-harvesting system based on a high-voltage dual-input buck converter with maximum power-point tracking (MPPT). Implemented in a 0.18$\mu$m CMOS BCD process, the system accepts input voltages of up to 70V and achieves a maximum power-conversion efficiency of 51.1%.




11:00 AM

### 8.7 A Piezoelectric Energy-Harvesting Interface Circuit with Fully Autonomous Conjugate Impedance Matching, 156% Extended Bandwidth, and 0.38$\mu$W Power Consumption

Y. Cai, University of Freiburg - IMTEK, Freiburg, Germany

In Paper 8.7, the University of Freiburg presents an interface circuit for piezoelectric energy harvesting with fully autonomous conjugate-impedance matching. Implemented in a 0.35$\mu$m CMOS process, the circuit increases the bandwidth by 156% over the natural frequency of the circuit and increases the power output at the resonant frequency by 26%.



11:15 AM

### 8.8 A 30nA Quiescent 80nW-to-14mW Power-Range Shock-Optimized SECE-Based Piezoelectric Harvesting Interface with 420% Harvested-Energy Improvement

A. Morel, CEA-LETI-MINATEC, Grenoble, France

In Paper 8.8, CEA-LETI-MINATEC presents an SECE piezoelectric energy-harvesting interface circuit in 40nm CMOS. The circuit is optimized to work under shock stimulus and features a 30nA quiescent current in sleep mode and an event-driven sequencing in harvesting mode. The peak end-to-end efficiency is 94%, and the circuit can harvest vibrations in the 80nW-to-14mW power range with up to 10V input voltage.



11:45 AM

### 8.9 A Fully Integrated Split-Electrode Synchronized-Switch-Harvesting-on-Capacitors (SE-SSHc) Rectifier for Piezoelectric Energy Harvesting with Between 358% and 821% Power-Extraction Enhancement

S. Du, University of Cambridge, Cambridge, United Kingdom

In Paper 8.9, the University of Cambridge presents an inductorless split-electrode synchronized-switch-harvesting-on-capacitors (SE-SSHc) rectifier, fully integrated in 0.18$\mu$m CMOS, for piezoelectric energy harvesting. Co-integrated with a custom MEMS device comprising a split-electrode topology, it demonstrates up to 821% power-harvesting improvement with a peak power of 186$\mu$W.



12:00 PM

### 8.10 A 13.56MHz Time-Interleaved Resonant-Voltage-Mode Wireless-Power Receiver with Isolated Resonator and Quasi-Resonant Boost Converter for Implantable Systems

S-U. Shin, KAIST, Daejeon, Korea

In Paper 8.10, KAIST presents a 13.56MHz time-interleaved resonant-voltage-mode wireless-power receiver in 0.18$\mu$m CMOS. The resonant-capacitor-interleaving scheme isolates the LC tank from the output and maintains optimal power transfer regardless of the operation phase. The circuit achieves a maximum receiver efficiency of 67.8%.

## 8.1 A 960pW Co-Integrated-Antenna Wireless Energy Harvester for WiFi Backchannel Wireless Powering

Kamala Raghavan Sadagopan<sup>1</sup>, Jian Kang<sup>1</sup>, Yogesh Ramadass<sup>2</sup>, Arun Natarajan<sup>1</sup>

<sup>1</sup>Oregon State University, Corvallis, OR; <sup>2</sup>Texas Instruments, Santa Clara, CA

Leveraging the ubiquitous WiFi infrastructure to wirelessly power sensors can enable perpetually powered sensors for several monitoring and asset-tracking IoT applications. Small form factor is often desirable to ensure unobtrusive sensors. However, typical 2.4GHz WiFi output power of  $<+20\text{dBm}$  implies  $\sim-30\text{dBm}$  ( $1\mu\text{W}$ ) incident power (assuming free space path loss) at a  $\sim 3\text{m}$  range. This presents a fundamental trade-off since small antenna area can further restrict the wireless power available to the rectifier/harvester. In addition, the time-varying nature of RF wireless powering implies that the energy-harvesting approach must accommodate cold start. In this work, we address the challenge of simultaneously achieving small form factor,  $\mu\text{W}$ -scale wireless input sensitivity, and operation at relatively high frequency (2.4GHz) by co-designing the antenna, rectifier, and DC-DC converter, achieving  $-36\text{dBm}$  input sensitivity for a 0.8V output in primary operating mode and  $-33\text{dBm}$  sensitivity from cold start with overall  $1.97\text{cm}^2$  area (including antenna). In contrast to prior work, the proposed wireless harvesting approach optimally extracts energy from the wireless beacon even with  $<-30\text{dBm}$  ( $1\mu\text{W}$ ) incident power levels. The harvester consumes 960pW quiescent power while supporting cold start. The feasibility of the proposed approach is demonstrated by harvesting energy from a commercial WiFi node.
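The ~-30dBm figure follows from the Friis free-space path loss. The minimal check below assumes isotropic (0dBi) transmit and receive antennas and a 20dBm transmitter at 2.4GHz. The function names are illustrative.

```python
import math

C0 = 299_792_458.0  # speed of light, m/s

def fspl_db(distance_m, freq_hz):
    """Friis free-space path loss between isotropic antennas, in dB."""
    wavelength = C0 / freq_hz
    return 20 * math.log10(4 * math.pi * distance_m / wavelength)

def incident_power_dbm(p_tx_dbm, distance_m, freq_hz):
    """Power available to an isotropic receive antenna at the given range."""
    return p_tx_dbm - fspl_db(distance_m, freq_hz)

def dbm_to_watts(p_dbm):
    """Convert dBm to watts (0dBm = 1mW)."""
    return 1e-3 * 10 ** (p_dbm / 10)
```

`incident_power_dbm(20, 3.0, 2.4e9)` evaluates to about -29.6dBm, and `dbm_to_watts(-30)` is 1μW, matching the numbers quoted above.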

Figure 8.1.1 shows the schematic of the proposed wirelessly powered IC. An antenna-rectifier co-integration approach is adopted in which the antenna is resonated directly with the rectifier to provide passive boost [1]. The resulting higher voltage swing at the rectifier input increases the rectifier efficiency and the open-circuit voltage, $V_{\text{RECT,OC}}$, for a given RF incident power. Figure 8.1.2 plots three quantities against the power available to an isotropic antenna at the harvester location, $P_{\text{AV,ISO}}$ [1,2]: the measured $V_{\text{RECT,OC}}$, the maximum power point $V_{\text{RECT,MP}}$, and the available power from the co-designed antenna-rectifier. Prior work has focused either on directly operating sensors that load the rectifier or on operating sensors once the rectifier reaches a targeted voltage (typically $\sim 0.8$ to 1V) on the storage capacitor [1-5,7]. As a result, energy from the wireless signal is only used when $P_{\text{AV,ISO}}$ is high enough to ensure a sufficient rectifier output voltage. However, Fig. 8.1.2 suggests that a DC-DC converter capable of operating with a low $V_{\text{RECT}}$ and with $<1\text{nW}$ quiescent power can extract energy from the rectifier output at low incident power levels. This energy can then be accumulated on storage capacitors for subsequent sensor activation. Such an approach is particularly well suited to IoT applications that operate at very low duty cycles. It is advantageous if it reaches a steady-state voltage on the storage capacitor higher than $V_{\text{RECT,OC}}$ for a given incident power.

Fig. 8.1.1 also shows the architecture of the low-voltage boost converter with cold-start capability that follows the antenna-co-integrated rectifier. For cold start, the charge pump and ring oscillator are initially powered by $V_{\text{RECT}}$. As $V_{\text{RECT}}$ increases, the ring oscillator is enabled (at $V_{\text{RECT}} \sim 140\text{mV}$). This allows the 5-stage Dickson charge pump to achieve $V_{\text{CP}} > 0.8\text{V}$ (at $V_{\text{RECT}} \sim 230\text{mV}$). The pulse generator initially operates from $V_{\text{CP}}$, generating pulses $P_1$ and $P_2$ that enable charging of $C_{\text{LOAD}}$ through S1 and primarily S2. Once the voltage $V_{\text{LOAD}}$ on $C_{\text{LOAD}}$ exceeds a programmable threshold $V_{\text{CS}}$ (nominally 0.8V), the comparator asserts the cold-start disable signal, $\text{CS\_DIS}$, starting the primary mode of operation. In this mode, the DC-DC converter operates from $V_{\text{LOAD}}$, and switch S4 is disabled, disconnecting the ring oscillator and the charge pump from $V_{\text{RECT}}$. Additionally, switch S5 is disabled, disconnecting the diode-connected S2, and switch S6 is enabled, connecting the charge-pump output voltage, $V_{\text{CP}}$, to $V_{\text{LOAD}}$. Critically, the primary mode reduces the parasitic load on $V_{\text{RECT}}$. This increases the power available to the DC-DC converter, enabling further charging of $C_{\text{LOAD}}$ and an increase in $V_{\text{LOAD}}$ beyond 0.8V. If $C_{\text{LOAD}}$ is used to power a sensor, the resulting energy depletion causes $V_{\text{LOAD}}$ to decrease. The hysteresis in the cold-start comparator ensures a stable transition from cold start to primary mode, and back to cold-start mode once $V_{\text{LOAD}}$ falls below 0.5V.
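The mode control described above behaves like a hysteretic two-state machine. The sketch below models it with thresholds taken from the text (enter primary mode above $V_{\text{CS}}$ = 0.8V, return to cold start below 0.5V). The S4/S5/S6 states in primary mode are as stated. Their cold-start states are inferred as the complements, which is an assumption. All names are hypothetical.

```python
from enum import Enum

class Mode(Enum):
    COLD_START = "cold_start"
    PRIMARY = "primary"

V_CS = 0.8       # programmable cold-start threshold (nominal), V
V_RETURN = 0.5   # V_LOAD level that sends the harvester back to cold start, V

def next_mode(mode, v_load):
    """Hysteretic comparator: the two thresholds prevent chattering."""
    if mode is Mode.COLD_START and v_load > V_CS:
        return Mode.PRIMARY       # CS_DIS asserted
    if mode is Mode.PRIMARY and v_load < V_RETURN:
        return Mode.COLD_START    # CS_DIS released
    return mode

def switch_states(mode):
    """True = switch enabled. Cold-start states are assumed complements."""
    primary = mode is Mode.PRIMARY
    return {
        "S4": not primary,  # connects ring oscillator and charge pump to V_RECT
        "S5": not primary,  # connects the diode-connected S2
        "S6": primary,      # ties V_CP to V_LOAD
    }
```

Between 0.5V and 0.8V the mode depends on history. This is what lets a sensor draw $V_{\text{LOAD}}$ somewhat below 0.8V without dropping back into cold start.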

The DC-DC converter must operate within the available power in both the cold-start and primary modes to ensure charging of $C_{\text{LOAD}}$. For a DC-DC converter operating at a switching frequency $f_s$, the rectifier power constraint reduces to a per-cycle constraint: the energy consumed by the converter building blocks in each cycle must not exceed the available energy per cycle (Fig. 8.1.3). Since the maximum available power is fixed for a given input power, the available energy per cycle scales down as $f_s$ increases. The DC-DC converter switch losses are calculated assuming the converter operates at the maximum power point [6]. The quiescent power consumption is dominated by leakage power at low $f_s$ and by dynamic power at higher $f_s$. Comparing the energy available per cycle with the total energy consumed per cycle at a given $f_s$ yields the range of switching frequencies that lead to net charging of $C_{\text{LOAD}}$. Figure 8.1.3 shows the simulated trade-offs across switching frequencies for a $P_{\text{AV,ISO}}$ of $-37\text{dBm}$ and $-33\text{dBm}$. The DC-DC switch loss is the most significant fraction of the energy consumption in the DC-DC converter for a $P_{\text{IN}}$ of $-37\text{dBm}$. However, this work focuses on sensitivity. The DC-DC switches are therefore sized to maximize sensitivity under cold-start conditions, giving $1.6\text{mm} \times 320\text{nm}$ for switch S1 and $1.78\text{mm} \times 400\text{nm}$ for switch S3. Thick-oxide 2.5V switches are used to reduce leakage. To achieve acceptable conduction losses and peak inductor current, the switches are driven from 0.8V (using the charge pump) during the cold-start mode and from $V_{\text{LOAD}}$ during the primary mode. Fig. 8.1.3 shows the simulated block-level power consumption together with the measured quiescent power consumption ($P_0$) for $V_{\text{LOAD}} = V_{\text{CP}} = 0.8\text{V}$. The measured $P_0$ is below 1nW at $V_{\text{LOAD}} = 0.8\text{V}$.
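The per-cycle budget can be sketched as below. Available energy per cycle is $P_{\text{avail}}/f_s$. Consumed energy per cycle is leakage power divided by $f_s$ plus fixed per-cycle dynamic and switch energies. This loss model and all numbers in the usage note are hypothetical simplifications. The model reproduces only the upper edge of the charging range. The lower edge in Fig. 8.1.3 depends on loss terms not modeled here.

```python
def available_energy_per_cycle(f_s_hz, p_avail_w):
    """Energy the rectifier can deliver in one switching period."""
    return p_avail_w / f_s_hz

def consumed_energy_per_cycle(f_s_hz, p_leak_w, e_dyn_j, e_switch_j):
    """Leakage is a fixed power, so its per-cycle share grows as f_s falls;
    dynamic and switch losses are treated as fixed energies per cycle."""
    return p_leak_w / f_s_hz + e_dyn_j + e_switch_j

def net_charging(f_s_hz, p_avail_w, p_leak_w, e_dyn_j, e_switch_j):
    """True if the per-cycle budget leaves energy to charge C_LOAD."""
    return available_energy_per_cycle(f_s_hz, p_avail_w) > consumed_energy_per_cycle(
        f_s_hz, p_leak_w, e_dyn_j, e_switch_j
    )

def charging_frequencies(freqs_hz, p_avail_w, p_leak_w, e_dyn_j, e_switch_j):
    """Subset of candidate switching frequencies that give net charging."""
    return [f for f in freqs_hz
            if net_charging(f, p_avail_w, p_leak_w, e_dyn_j, e_switch_j)]
```

With hypothetical values of 50nW available, 1nW leakage, 0.1nJ dynamic energy, and 2nJ switch energy per cycle, 10Hz gives net charging and 100Hz does not. If the available power falls below the leakage power, no frequency works.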

The DC-DC converter can be regulated to the maximum power point shown in Fig. 8.1.2 using the comparator COMP1 and off-chip logic that adjusts the frequency and pulse width of the $P_1$ signal. A zero-current-switching scheme is implemented for the $P_2$ pulse to optimize its pulse width for the highest efficiency. The 65nm CMOS IC (Fig. 8.1.7) occupies $1.6\text{mm}^2$. The IC is packaged with an antenna on Rogers 4350B using a chip-on-board approach. All tests are performed using wireless power and a $1\mu\text{F}$ storage capacitor.

Figure 8.1.3 shows the measured  $V_{\text{LOAD}}$  for a given  $P_{\text{AV,ISO}}$  ( $-33\text{dBm}$ ) as a function of switching frequency (measured with an external clock). An  $f_s$  of 10Hz is found to be optimal to ensure the highest  $V_{\text{LOAD}}$  and fastest charging time. Lower power levels require even lower  $f_s$ . Figure 8.1.4 shows the measured transient for  $V_{\text{LOAD}}$  and  $\text{CS\_DIS}$  for  $P_{\text{AV,ISO}}$  of  $-33\text{dBm}$  and  $-30\text{dBm}$ . The wireless energy harvester achieves  $V_{\text{LOAD}} > 0.8\text{V}$  (leading to cold-start disable) for an input power of  $-33\text{dBm}$  from cold start. Once cold start is disabled, the reduced load on  $V_{\text{RECT}}$  (since the charge pump and ring-oscillator are disabled) leads to a subsequent increase in  $V_{\text{LOAD}}$  as more energy is transferred to  $C_{\text{LOAD}}$ . Figure 8.1.4 plots a similar transient for a higher  $P_{\text{AV,ISO}}$  of  $-30\text{dBm}$  achieving a higher  $V_{\text{LOAD}}$ . Figure 8.1.4 also shows the measured transient when  $P_{\text{AV,ISO}}$  is varied dynamically. Following an initial  $P_{\text{AV,ISO}}$  of  $-33\text{dBm}$  that ensures transition from the cold-start mode to the primary mode, the  $P_{\text{AV,ISO}}$  is lowered. Since the rectifier load is reduced in the primary mode, the energy harvester still charges  $C_{\text{LOAD}}$  at lower  $P_{\text{AV,ISO}}$  with a minimum  $P_{\text{AV,ISO}}$  of  $-36\text{dBm}$  for maintaining  $V_{\text{LOAD}} > 0.8\text{V}$ . Notably, the  $V_{\text{RECT,OC}}$  at this  $P_{\text{AV,ISO}}$  is 0.4V, demonstrating the ability of the harvester to operate with low input voltages and low available power.

Figure 8.1.5 shows a demonstration of the proposed application, in which a WiFi signal from an Adafruit Feather M0 WiFi (WiFi shield) transfers power to the energy harvester. The Feather M0 operates in the 2.4GHz WiFi band and is configured as a WiFi hotspot, leading to periodic SSID transmissions in Channel 3 (2.422GHz). The $\sim 1\text{nW}$ wireless energy harvester is placed at different distances from the Feather M0, and the charging transients are measured. Operating with no other external inputs, the energy harvester reaches a 1V output at a range of 0.5m and a 0.6V output at a range of 0.75m from the WiFi shield (typical output power of $15.5\text{dBm}$). This demonstrates the feasibility of powering sensors from ubiquitous WiFi routers. Importantly, the WiFi signal is not modified in this scheme, which shows that microjoules of energy can be harvested during WiFi transmission. Beamforming and higher-power WiFi radios can further increase the energy-harvesting range.
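The "microjoules" scale follows from the energy stored on the $1\mu\text{F}$ capacitor, $E = \tfrac{1}{2}CV^2$. The helper below also gives an idealized lower bound on charging time. It assumes all net harvested power reaches the capacitor, starting from 0V. The power value in the usage note is hypothetical.

```python
def stored_energy_j(c_farad, v_volt):
    """Energy on a capacitor charged from 0 V to v_volt: E = C*V^2/2."""
    return 0.5 * c_farad * v_volt ** 2

def min_charge_time_s(c_farad, v_volt, p_net_w):
    """Lower bound on charging time, assuming every watt of net harvested
    power lands on the capacitor (no converter loss)."""
    return stored_energy_j(c_farad, v_volt) / p_net_w
```

For $C_{\text{LOAD}} = 1\mu\text{F}$, 1V corresponds to 0.5μJ and 0.6V to 0.18μJ. At a hypothetical 10nW of net charging power, reaching 1V would take at least 50s.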

The performance is compared to prior art in Fig. 8.1.6. The proposed approach achieves a sensitivity of  $-36\text{dBm}$  in the primary mode for 0.8V output while ensuring optimal energy transfer from the rectifier to the storage capacitor.

### Acknowledgements:

The authors thank Intel and Texas Instruments for supporting this research and Tektronix for equipment assistance.

### References:

- [1] J. Kang, et al., "A 3.6cm<sup>2</sup> Wirelessly-Powered UWB SoC with  $-30.7\text{dBm}$  Rectifier Sensitivity and Sub-10cm Range Resolution," *IEEE RFIC*, pp. 255-258, May 2015.
- [2] M. Stoopman, et al., "Co-Design of a CMOS Rectifier and Small Loop Antenna for Highly Sensitive RF Energy harvesters," *IEEE JSSC*, vol. 49, no. 3, pp. 622-634, Mar. 2014.
- [3] G. Papotto, et al., "A 90-nm CMOS Threshold-Compensated RF Energy Harvester," *IEEE JSSC*, vol. 46, no. 9, pp. 1985-1997, Sept. 2011.
- [4] T. Le, et al., "Efficient Far-Field Radio Frequency Energy Harvesting for Passively Powered Sensor Networks," *IEEE JSSC*, vol. 43, no. 5, pp. 1287-1302, May 2008.
- [5] S. Mandal and R. Sarpeshkar, "Low-Power CMOS Rectifier Design for RFID Applications," *IEEE TCAS-I*, vol. 54, no. 6, pp. 1177-1188, June 2007.
- [6] S. Bandyopadhyay, et al., "A 1.1nW Energy-Harvesting System with 544pW Quiescent Power for Next-Generation Implants," *IEEE JSSC*, vol. 49, no. 12, pp. 2812-2824, Dec. 2014.
- [7] J. Kang, et al., "A 1.2cm<sup>2</sup> 2.4GHz self-oscillating rectifier-antenna achieving  $-34.5\text{dBm}$  sensitivity for wirelessly powered sensors," *IEEE ISSCC*, pp. 374-375, Feb. 2016.
- [8] M. Choi, et al., "A Resonant Current-Mode Wireless Power Receiver and Battery Charger With  $-32\text{ dBm}$  Sensitivity for Implantable Systems," *IEEE JSSC*, vol. 51, no. 12, pp. 2880-2892, Dec. 2016.



Figure 8.1.1: Pico-Watt wireless energy harvester for WiFi backchannel powering with co-designed high-Q antenna, rectifier, and DC-DC converter with cold-start capability.



Figure 8.1.2: Measured rectifier open-circuit voltage  $V_{RECT,OC}$  and optimum voltage  $V_{RECT,MP}$  that maximizes available power as a function of  $P_{AV,ISO}$ .



Figure 8.1.3: Simulated DC-DC converter loss across switching frequencies ( $f_s$ ). Measured  $V_{LOAD}$  across different  $f_s$ . Simulated and measured quiescent power consumption.



Figure 8.1.4: Measured  $V_{LOAD}$  transients across  $P_{AV,ISO}$  demonstrate -33dBm sensitivity in the cold-start mode and -36dBm sensitivity in the primary mode.



Figure 8.1.5: Measured  $V_{LOAD}$  transient for WiFi backchannel energy harvesting with a WiFi node configured as a hotspot.

|               | Tech.       | Antenna Area               | Sensitivity (dBm)   | $V_{RECT}$ @ Sensitivity (V) | $R_{LOAD}$ @ Sensitivity | Freq.   | Cold Start | Architecture              | Requirement         |
|---------------|-------------|----------------------------|---------------------|------------------------------|--------------------------|---------|------------|---------------------------|---------------------|
| This work     | 65nm CMOS   | 1.27cm<sup>2</sup>         | -36<br>Primary mode | 1                            | ∞                        | 2.4GHz  | No         | Rectifier-Boost converter | Deep N-Well         |
| This work     | 65nm CMOS   | 1.27cm<sup>2</sup>         | -33<br>Cold start   | 1                            | ∞                        | 2.4GHz  | Yes        | Rectifier-Boost converter | Deep N-Well         |
| ISSCC 16 [7]  | 65nm CMOS   | 1.21cm<sup>2</sup>         | -34.5               | 1.6                          | 1.8MΩ                    | 2.4GHz  | Yes        | Bootstrapped Rectifier    | Deep N-Well         |
| JSSC 16 [8]   | 0.18μm CMOS | 2.6×3.5×11.7cm<sup>2</sup> | -32.2               | -                            | -                        | 50kHz   | No         | Resonant LC               | -                   |
| RFIC 15 [1]   | 65nm CMOS   | 1.33cm<sup>2</sup>         | -30.7               | 1                            | ∞                        | 2.4GHz  | Yes        | Rectifier                 | Deep N-Well         |
| JSSC 14 [2]   | 90nm CMOS   | 12cm<sup>2</sup>           | -27                 | 1                            | ∞                        | 868MHz  | Yes        | Bootstrapped Rectifier    | Control Loop        |
| JSSC 11 [3]   | 90nm CMOS   | No antenna                 | -24                 | 1                            | ∞                        | 915MHz  | Yes        | Rectifier                 | Deep N-Well         |
| JSSC 08 [4]   | 0.25μm CMOS | 30cm<sup>2</sup>           | -22.6               | 2                            | ∞                        | 906MHz  | No         | Rectifier                 | External Pre-charge |
| TCAS-I 07 [5] | 0.18μm CMOS | 37.4cm<sup>2</sup>         | -18                 | 0.8                          | ∞                        | 970MHz  | Yes        | Rectifier                 | -                   |

Figure 8.1.6: Performance summary and comparison to prior art.



Figure 8.1.7: Die micrograph of 2.4GHz pico-Watt wireless energy harvester implemented in 65nm CMOS.

## 8.2 A 70W and 90% GaN-Based Class-E Wireless-Power-Transfer System with Automatic-Matching-Point-Search Control for Zero-Voltage Switching and Zero-Voltage-Derivative Switching

Che-Hao Yeh<sup>1</sup>, Yen-Ting Lin<sup>1</sup>, Chun-Chieh Kuo<sup>1</sup>, Chao-Jen Huang<sup>1</sup>, Cheng-Yu Xie<sup>1</sup>, Shen-Fu Lu<sup>1</sup>, Wen-Hau Yang<sup>1</sup>, Ke-Horng Chen<sup>1</sup>, Kuo-Chi Liu<sup>1</sup>, Ying-Hsi Lin<sup>2</sup>

<sup>1</sup>National Chiao Tung University, Hsinchu, Taiwan

<sup>2</sup>Realtek Semiconductor, Hsinchu, Taiwan

High-power (>50W) and high-efficiency (>90%) wireless-power-transfer (WPT) systems are increasingly in demand for portable electronic applications. As Fig. 8.2.1 shows, the power efficiency and/or output power of prior-art designs fall well below these requirements [1-5]. Frequency tuning [1,2] is simple, but the switching frequency ($f_{SW}$) deviates from 6.78MHz. Capacitor tuning [3,5] is the most intuitive approach, but the capacitor matrix occupies a large area, and the dynamically tuned compensation-capacitor bank is limited by the digital-control resolution and compensation accuracy. In addition, the duty-cycle control in [4] leaves the output voltage at the RX side unregulated. Existing impedance-matching techniques for reducing power loss are not applicable to high-power impedance matching of a GaN-based WPT system when the charging distance, loading, operating voltage, and temperature vary over time, inducing a wide range of inductive or capacitive loading effects. Because of hard-switching (HS) power loss, inductive loading degrades the efficiency of a GaN power switch by 51% and induces serious coupling effects at the gate of the GaN device. Likewise, capacitive loading degrades the efficiency by 14% because of body-diode conduction (BDC) power loss. Such large dissipation can easily break down a GaN device and even severely damage the WPT system, especially when transmitting high power. Therefore, high-power, high-efficiency WPT systems urgently need to achieve both (1) minimized HS and BDC power loss through efficient impedance matching and (2) highly reliable operation of the GaN device over a wide range of loading effects.

This paper proposes a dual-loop control scheme, consisting of an automatic-matching-point-search (AMPS) loop and an adjustable linearization-compensation-capacitance (ALCC) loop, for a 6.78MHz GaN-based Class-E WPT system. The AMPS control loop ensures both zero-voltage switching (ZVS) and zero-voltage-derivative switching (ZVDS), reducing both HS and BDC power loss. The analog ALCC control loop adjusts the compensation capacitance linearly to improve compensation accuracy. In addition, a ringing-voltage-suppression (RVS) technique in the GaN driver is proposed to improve both reliability and efficiency.

The top of Fig. 8.2.2 illustrates the complete architecture of the proposed GaN-based Class-E WPT system with the AMPS and ALCC dual control loops. Under any impedance variation, the AMPS loop automatically adjusts the biasing voltage ($V_{BIAS}$) of a discrete FET, R6030ENZ, according to the voltage $V_{SW}$, to ensure both ZVS and ZVDS for impedance matching. With its source and body terminals grounded, the R6030ENZ works as an adjustable compensation capacitance, since its internal parasitic capacitance $C_{OSS}$ is a function of $V_{BIAS}$. However, as the bottom of Fig. 8.2.2 shows, the relationship between $C_{OSS}$ and $V_{BIAS}$ is nonlinear. If the control loop used a constant $dV_{BIAS}/dt$, the resulting variable $dC_{OSS}/dt$ would disturb the AMPS loop gain and degrade the accuracy and stability of the AMPS control loop. Therefore, an additional ALCC control loop linearizes $dC_{OSS}/dt$ through a variable-slope voltage-biasing (VSB) circuit, which generates three different slopes $dV_{BIAS}/dt$: $k/m_1$, $k/m_2$, and $k/m_3$, where $m_1$, $m_2$, and $m_3$ are the piecewise-linear values of $dC_{OSS}/dV_{BIAS}$ and $k$ is a constant. On segment $i$, $dC_{OSS}/dt = m_i \cdot (k/m_i) = k$, so $C_{OSS}$ is linearized in the time domain, ensuring both loop stability and compensation accuracy. Moreover, because of parasitic inductance at high frequency and high-frequency electromagnetic-interference (EMI) noise, instantaneous current and voltage spikes at the gate voltage $V_G$ can damage the GaN power switch. The proposed RVS technique suppresses these voltage spikes and EMI noise to improve efficiency and ensure reliable operation of the GaN device in high-power, high-switching-frequency applications.
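The ALCC linearization argument can be checked with a short behavioral model. The sketch below is a minimal illustration, assuming a hypothetical three-segment piecewise-linear $C_{OSS}(V_{BIAS})$ curve (the breakpoints are invented, not R6030ENZ data) and forward-Euler integration; the helper names `segment`, `slope`, `c_oss`, and `charge_v_bias` are illustrative only. Choosing $dV_{BIAS}/dt = k/m_i$ from the slope of the present segment keeps the capacitance change per step at $k\,dt$.

```python
# Behavioral sketch of the ALCC idea: pick dV_BIAS/dt = k/m_i from the local slope
# m_i of a piecewise-linear C_OSS(V_BIAS) curve, so dC_OSS/dt = m_i * (k/m_i) = k.
# Breakpoints are hypothetical, not R6030ENZ datasheet values.

V_BP = [0.0, 5.0, 20.0, 60.0]                   # V_BIAS breakpoints (V)
C_BP = [1200e-12, 700e-12, 300e-12, 150e-12]    # C_OSS at each breakpoint (F)


def segment(v):
    """Index i of the piecewise-linear segment that contains v."""
    for i in range(len(V_BP) - 1):
        if v < V_BP[i + 1]:
            return i
    return len(V_BP) - 2                         # clamp to the last segment


def slope(i):
    """m_i = dC_OSS/dV_BIAS on segment i (negative: C_OSS falls as V_BIAS rises)."""
    return (C_BP[i + 1] - C_BP[i]) / (V_BP[i + 1] - V_BP[i])


def c_oss(v):
    """Piecewise-linear C_OSS(V_BIAS)."""
    i = segment(v)
    return C_BP[i] + slope(i) * (v - V_BP[i])


def charge_v_bias(v0, k, dt, steps):
    """Charge V_BIAS at the segment-dependent rate k/m_i (forward Euler).

    Returns the final V_BIAS and the C_OSS trajectory, which changes by k*dt per
    step except on the few steps that straddle a breakpoint.
    """
    v = v0
    trace = [c_oss(v)]
    for _ in range(steps):
        v += (k / slope(segment(v))) * dt        # slope decision from present V_BIAS
        trace.append(c_oss(v))
    return v, trace


# k < 0: C_OSS should fall by 1 pF per time unit while V_BIAS is charged.
v_end, c_trace = charge_v_bias(0.0, k=-1e-12, dt=1.0, steps=1000)
```

Steps that cross a breakpoint deviate slightly because the slope is chosen from the present $V_{BIAS}$ alone, much like a slope-decision circuit that switches at fixed thresholds; a smaller step shrinks that error.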

The top of Fig. 8.2.3 shows the circuit implementation of the AMPS control loop together with the ALCC control loop. In the time-based hysteresis window (TBHW), the signals $V_{GL}$ and $V_G$ serve as the leading and lagging bounds, respectively, where $V_G$ at the gate of the GaN device is a delayed version of $V_{GL}$ from the GaN driver. As the bottom left of Fig. 8.2.3 shows, the TBHW checks the state of the WPT system by comparing the signal $V_{SEN}$, which is the result of comparing $V_{SW}$ with ground, against $V_{GL}$ and $V_G$. According to the timing diagram in the bottom right of Fig. 8.2.3, the system is in the HS state when $V_{SEN}$ is still logic 'L' ($V_{SW} > 0$) after both $V_{GL}$ and $V_G$ rise (i.e., $V_{SEN}$ falls outside the TBHW); the output signals R and F then become logic 'H' and 'L', respectively, after $V_G$ rises. The TBHW in the AMPS control loop thus signals the VSB circuit to charge $V_{BIAS}$, decreasing the compensation capacitance to eliminate the inductive loading effect. When the system is in the ZVS and ZVDS state for impedance matching, $V_{SEN}$ becomes logic 'H' between the rising edges of $V_{GL}$ and $V_G$ (i.e., $V_{SEN}$ falls inside the TBHW), and both R and F become logic 'L' to disable the VSB circuit. The VSB circuit uses a duty-cycle-controlled boost operation ($dV_{BIAS}/dt \propto D_R^2$) to charge $V_{BIAS}$ and a sink-current operation ($dV_{BIAS}/dt \propto -I_{dis} \times D_F$) to discharge it. The ALCC control loop, which is linearized by calibration with $V_{D0}$ to $V_{D5}$ during both the charging and discharging periods, uses the slope-decision circuit to select suitable duty cycles according to the present $V_{BIAS}$ and R, realizing $dC_{OSS}/dt = k$.
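The TBHW decision reduces to comparing one edge time against a window. The function below is a minimal sketch of that logic under stated assumptions: edge times are given as plain numbers, `tbhw` is a hypothetical name, and the third branch (V_SEN rising before the window, taken here as the capacitive, BDC-loss state that asserts F) is inferred from the HS and ZVS/ZVDS cases the text describes rather than stated by it.

```python
def tbhw(t_sen_rise, t_gl_rise, t_g_rise):
    """Hypothetical TBHW decision for one switching cycle.

    t_sen_rise: time at which V_SEN goes high (V_SW has reached 0 V), or None
                if V_SW stays above 0 V until after the gate turns on.
    t_gl_rise / t_g_rise: rising edges of V_GL (leading bound) and V_G (lagging bound).
    Returns (R, F): R=1 asks the VSB circuit to charge V_BIAS, F=1 to discharge it.
    """
    if t_sen_rise is None or t_sen_rise > t_g_rise:
        return (1, 0)   # V_SEN still low after V_G rises: HS state, charge V_BIAS
    if t_sen_rise >= t_gl_rise:
        return (0, 0)   # V_SEN rises inside the window: ZVS + ZVDS, VSB disabled
    return (0, 1)       # assumed: V_SEN rises before the window (BDC state), discharge


# Example with the window spanning 0 to 5 ns (hypothetical timing):
state = tbhw(2.0, 0.0, 5.0)   # inside the window, so the VSB circuit is disabled
```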

The top left of Fig. 8.2.4 shows $L_G$ and $L_S$, which represent the bonding-wire effects of the PCB circuit components, while $C_{GD}$, $C_{GS}$, and $C_{DS}$ are the parasitic capacitances of the GaN power switch. In high-frequency, high-power applications, the bonding-wire effects and parasitic capacitances produce a large voltage spike at the gate of the GaN device. Moreover, when the WPT system operates in the HS state, the instantaneous current $I_{SW}$ flowing to ground through $L_S$ aggravates the voltage spike and can damage the GaN device, as shown in the upper right of Fig. 8.2.4. In high-power applications, the resulting efficiency degradation also limits the maximum power-transfer capability of the GaN power switch. The proposed RVS technique therefore protects the switch with two paths. The regular path, similar to prior art, clamps the upper limit of $V_{FB,G}$ at the reference voltage $V_{REF}$ with an error amplifier (EA), but it cannot react to the instantaneous spike at $V_G$ because of the limited EA bandwidth. The RVS technique therefore uses a shunt low-dropout (LDO) regulator with an additional coupling path to strengthen the suppression. The bottom right of Fig. 8.2.4 depicts the operation of the RVS technique. Depending on the level of the voltage $V_{SH}$, sampled from $V_{FB,SW}$ at the rising edge of $V_{GL}$, the coupling path generates a corresponding coupling magnitude by turning switches $S_1$ to $S_7$, connected to the on-chip coupling-capacitor array, on or off. Because the coupling path is configured in advance, it can eliminate the instantaneous spike at $V_G$ and thereby reduce EMI noise, improving efficiency and ensuring reliable operation of the GaN device.
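The level-dependent coupling path can be pictured as a small quantizer from the sampled $V_{SH}$ to the seven switch controls. This is only an illustrative sketch: the thermometer coding, the evenly spaced thresholds, the 0 to 7V range, and the name `coupling_switches` are all assumptions, as is the direction (a higher $V_{SH}$, taken to predict a larger spike, closes more switches).

```python
def coupling_switches(v_sh, v_min=0.0, v_max=7.0, n=7):
    """Map a sampled level V_SH to a thermometer code for switches S1..Sn.

    Hypothetical: n thresholds evenly spaced between v_min and v_max; each threshold
    that V_SH reaches closes one more switch, adding coupling capacitance.
    """
    step = (v_max - v_min) / (n + 1)
    count = sum(1 for j in range(1, n + 1) if v_sh >= v_min + j * step)
    return [1 if j < count else 0 for j in range(n)]   # [S1, S2, ..., Sn]


code = coupling_switches(2.0)   # two thresholds reached, so S1 and S2 close
```

A thermometer code keeps the coupling magnitude monotonic in $V_{SH}$, which is one simple way to make the capacitor array's contribution grow with the expected spike.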

The test chip was fabricated in a 0.5μm SOI CMOS process. As the top of Fig. 8.2.5 shows, with the AMPS control loop the proposed WPT system always achieves both ZVS and ZVDS for impedance matching (corresponding to regions 2, 4, and 6) under different loading effects (corresponding to $t_3$ and $t_5$). With the ALCC control loop, the measured trajectory of $C_{OSS}$ demonstrates its linearization in the time domain, which improves compensation accuracy. The bottom of Fig. 8.2.5 shows the measured $V_{SW}$ and $V_G$: with the RVS technique, the maximum spike at $V_G$ is reduced from 7.1 to 5.3V, achieving 85.7% ringing suppression. The statistical results in the top of Fig. 8.2.6 show that the power loss is reduced from 34 to 0.8W, a 42× reduction, with >85% overall efficiency over a wide range of loading variation. The bottom of Fig. 8.2.6 compares this work with prior art. The proposed WPT system realizes impedance matching over a wide range of loading effects without a complicated implementation or limited resolution. Finally, the peak output power of 70W and peak efficiency of 90% meet the high-power and high-efficiency requirements of WPT systems. The die micrograph is shown in Fig. 8.2.7.
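The two headline ratios can be checked arithmetically. The loss reduction follows directly from the quoted numbers. The 85.7% ringing suppression does not follow from the 7.1V and 5.3V peaks alone (their plain ratio gives about 25%); it is reproduced if suppression is measured on the overshoot above a 5V nominal gate level, an assumption the text does not state:

$$
\frac{P_{loss,\,before}}{P_{loss,\,after}} = \frac{34\,\text{W}}{0.8\,\text{W}} = 42.5 \approx 42\times,
\qquad
\frac{(7.1-5.0)-(5.3-5.0)}{7.1-5.0} = \frac{2.1-0.3}{2.1} \approx 85.7\%
$$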

### References:

- [1] J. Pan, et al., "An Inductively-Coupled Wireless Power-Transfer System that is Immune to Distance and Load Variations," *ISSCC*, pp. 382-383, Feb. 2017.
- [2] S. Aldhaher, et al., "Tuning Class E Inverters Applied in Inductive Links Using Saturable Reactors," *IEEE Trans. Power Electron.*, vol. 29, no. 6, pp. 2969-2978, June 2014.
- [3] Y. Lim, et al., "An Adaptive Impedance-Matching Network Based on a Novel Capacitor Matrix for Wireless Power Transfer," *IEEE Trans. Power Electron.*, vol. 29, no. 8, pp. 4403-4413, Aug. 2014.
- [4] M. Fu, et al., "Analysis and Tracking of Optimal Load in Wireless Power Transfer Systems," *IEEE Trans. Power Electron.*, vol. 30, no. 7, pp. 3952-3963, July 2015.
- [5] H. Kennedy, et al., "A Self-Tuning Resonant Inductive Link Transmit Driver Using Quadrature-Symmetric Phase-Switched Fractional Capacitance," *ISSCC*, pp. 370-371, Feb. 2017.



Figure 8.2.1: Power loss caused by hard-switching (HS) loss and body-diode conduction (BDC) loss due to impedance mismatch.



Figure 8.2.2: Dual loops, including the AMPS and the ALCC, improve impedance matching, and the RVS technique ensures the reliable operation of the GaN device under high-power and high-switching-frequency operation.



Figure 8.2.3: Circuit implementation and operation of the AMPS and ALCC control loops.



Figure 8.2.4: The proposed GaN driver with the RVS technique has the ability to deliver high power and high efficiency, since the RVS technique has two protection paths to increase the reliability of the GaN power switch.



Figure 8.2.5: ZVS and ZVDS techniques, realized with the dual-loop control, show correct impedance matching in the time domain when loading varies (top); the linearity and high efficiency are guaranteed by the ALCC control loop (middle); and the RVS technique effectively suppresses the spikes on the gate of the GaN device for higher reliability and high power (bottom).



Figure 8.2.6: Power loss and peak efficiency versus load variations and the comparison table with prior art.



Figure 8.2.7: Die micrograph.

## 8.3 A Reconfigurable Cross-Connected Wireless-Power Transceiver for Bidirectional Device-to-Device Charging with 78.1% Total Efficiency

Fangyu Mao<sup>1</sup>, Yan Lu<sup>1</sup>, Seng-Pan U<sup>1,2</sup>, Rui P. Martins<sup>1,3</sup>

<sup>1</sup>University of Macau, Macau, China

<sup>2</sup>Synopsys Macau, Macau, China

<sup>3</sup>Instituto Superior Técnico/Universidade de Lisboa, Lisbon, Portugal

Wireless power transfer (WPT) via inductive coupling is a convenient way to charge power-starved portable and wearable devices. Recently, device-to-device (D2D) wireless charging was demonstrated [1,2], expanding the range of WPT applications. Unlike traditional wireless charging, which draws virtually unlimited energy from the AC mains, D2D charging sources its power from an energy-constrained battery. Maximizing transfer efficiency is therefore a key design issue. In [1], a zero-voltage-switching (ZVS) Class-$\phi_2$ receiver with maximum-efficiency tracking was designed to improve the rectifier and coupling-link efficiencies for unidirectional D2D wireless charging, but it uses several off-chip passives. In [2], a reconfigurable wireless-power transceiver (TRX) with a maximum-current charging mode was proposed to turn a WPT receiver (RX) into a WPT transmitter (TX) with negligible additional hardware, enabling bidirectional D2D wireless charging. However, the TX-mode efficiency and maximum output power in [2] are relatively low, and its WPT distance is short.

Figure 8.3.1 shows the proposed reconfigurable cross-connected (CC) wireless-power TRX with near-optimum switch-timing-control schemes for bidirectional D2D charging. CC power switches are widely used in full-wave rectifiers to reduce switching losses, because they are driven by the LC tank with no additional drive loss [3]. In contrast, the CMOS power transistors in a traditional Class-D power amplifier (PA) are controlled by separate buffers, which incur large gate-drive switching losses. We found that the CC topology can also be applied to the differential Class-D PA to reduce its switching loss. In the TX mode, $M_{P1}$ and $M_{P2}$ are thus also cross-connected and are charged/discharged by the resonant current $I_{TX}$ when ZVS of $M_{N1,2}$ is guaranteed by optimum-switching control, significantly reducing gate-drive losses. In addition, the multiplexers on the gate-drive paths in [2] are no longer required, saving area and conduction loss.

The reconfigurable controller generates adaptive optimum switching for both modes. Figure 8.3.1 also shows the ideal critical waveforms of both modes. To achieve ZVS in the TX mode, the equivalent load impedance of the Class-D PA must be inductive, meaning that the phase of the output voltage leads that of the output current. Therefore, variable capacitors $C_{A1}$ and $C_{A2}$ are added to the RX; transformed through the resonant coupling link, they present an inductive equivalent load to the TX. In the TX mode, $M_{N2}$ turns off at $t_0$, before the $I_{TX}$ zero-crossing point. During the deadtime $t_{DT}$, $V_{TX2}$ is charged up and $V_{TX1}$ is discharged by $I_{TX}$, while $M_{P1}$ and $M_{P2}$ swap states. Then, $M_{N1}$ turns on at $t_1$ with ZVS. In the RX mode, zero-current switching (ZCS) is obtained by compensating the off-delay of $M_{N1,2}$ with a delay-locked loop (DLL) in the reconfigurable controller. Near-ZVS of $M_{N1,2}$ is achieved by comparators $CMP_{1,2}$ without any timing control.

Figure 8.3.2 shows the reconfigurable controller for $M_{N1}$ and its timing diagrams. The voltage-controlled delay line (VCDL) and charge pump are reused in both modes, while separate phase detectors, $PD_{TX}$ and $PD_{RX}$, are designed for the two modes. The turn-on timing of $V_{NT1}$ ($V_{NR1}$) is determined by detecting the rising edge of CLK ($V_{CMP1}$), whereas its falling edge (FE) is obtained from the VCDL-delayed CLK ($V_{CMP1}$). During start-up, the VCDL delay may exceed half of the operating period, causing large transients that threaten the safe operation of the power transistors in both modes. Therefore, logic gate AND<sub>1</sub> is added to limit the duty cycle of $V_{NT1}$ and $V_{NR1}$ to below 0.5. $PD_{TX}$ first compensates $V_{TX2}$ with $t_{r\_CP}$ and then detects the phase difference between $V_{NT2}$ and $V_{TX2\_D}$, where $t_{r\_CP}$ is the phase difference between $V_{NT2}$ and $V_{TX2}$ when the FE timing of $V_{NT1}$ is correct. Similarly, $PD_{RX}$ first compensates $V_{NR1}$ with $t_{r\_CP}$ and then detects the phase difference between $V_{CMP1}$ and $V_{NR1\_D}$, where $t_{r\_CP}$ is the off-delay caused by $CMP_1$. In the TX mode, if the deadtime is too long, $I_{TX}$ reverses during the deadtime, and the reverse current charges up the discharged $V_{TX2}$ again within the same half cycle. The resulting second falling edge, FE<sub>2</sub>, would mislead $PD_{TX}$, so $PD_{TX}$ must be carefully designed to ignore FE<sub>2</sub>.
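The role of AND<sub>1</sub> during start-up can be seen in a first-order behavioral DLL. The sketch below assumes an idealized linear phase detector that reports the falling-edge timing error and a charge pump modeled as a discrete integrator; the target pulse width (60ns), start-up delay (100ns), loop gain, and function names are hypothetical. Only the 6.78MHz operating frequency comes from the text.

```python
T_OP = 1 / 6.78e6        # operating period at 6.78 MHz (s)


def gated_width(vcdl_delay, period=T_OP):
    """Gate-pulse width after AND1: never longer than half the period."""
    return min(vcdl_delay, period / 2)


def dll_settle(target_width, delay0, gain=0.3, cycles=200):
    """First-order behavioral DLL.

    Each cycle, a linear PD measures the error between the desired and actual
    falling-edge timing, and the charge pump (a discrete integrator) corrects
    the VCDL delay. target_width and gain are hypothetical; 0 < gain < 1 gives
    monotonic settling.
    """
    delay = delay0
    widths = []
    for _ in range(cycles):
        width = gated_width(delay)               # AND1 clamps the duty cycle
        widths.append(width)
        delay += gain * (target_width - width)   # charge pump integrates the error
    return delay, widths


# Start-up with a VCDL delay longer than half the period (about 73.7 ns):
final_delay, widths = dll_settle(target_width=60e-9, delay0=100e-9)
```

While the VCDL delay is too long, the pulse width is pinned at half the period instead of overlapping the next half cycle; once the delay falls below $T_{OP}/2$, the loop settles geometrically onto the target.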

Figure 8.3.3 compares the typical and the proposed approaches to obtaining an inductive TX load. One solution is to operate the TX at a frequency $\omega_{OP}$ higher than the resonant frequency $\omega_{RES}$. In Fig. 8.3.3, $R_{RECT}$ is the equivalent input impedance of the rectifier, and $Z_{EQ}$ is the equivalent load impedance on the TX side. However, this solution is only effective when the coupling coefficient $k$ is lower than a critical value $k_c$; if $k \geq k_c$, $Z_{EQ}$ becomes resistive-capacitive. In our design, a capacitor $C_A$ is connected in parallel with $R_{RECT}$ and presents a capacitive load to $L_2 C_2$. This capacitive load is transformed into an inductive $Z_{EQ}$ over the whole range of $k$ at $\omega_{OP} = \omega_{RES}$. However, $C_A$ shifts the resonant frequency and reduces the current flowing into the rectifier, so the impedance of $C_A$ should be several times larger than $R_{RECT}$. Here, $C_A \approx 300\text{pF}$ is selected. In the TX mode, $C_A$ is harmful because it adds a capacitive component to $Z_{EQ}$; therefore, $C_A$ must be tunable. In this design, $C_A$ consists of two series-connected PMOS capacitors, $C_{A1}$ and $C_{A2}$, with their bodies connected to a high DC voltage, so $C_A$ can be tuned by switching its gate voltage.
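Why a capacitive RX load appears inductive at the TX can be checked with the textbook reflected-impedance formula $Z_{EQ} = Z_1 + (\omega M)^2/Z_2$: at $\omega_{OP} = \omega_{RES}$ the series tanks cancel, so $\text{Im}(Z_{EQ})$ takes the opposite sign of $\text{Im}(Z_2)$. The sketch below is a hedged illustration, assuming a series-series compensated link and a hypothetical $R_{RECT} = 20\Omega$ (chosen so that the roughly 78Ω reactance of 300pF at 6.78MHz is several times larger, per the guideline above). The coil values and $C_A$ come from the text; the function name `z_eq` is illustrative.

```python
import cmath
import math

F_OP = 6.78e6                      # operating frequency (Hz), from the text
L1 = L2 = 1.05e-6                  # coil inductance (H), from the measurement section
Q1 = Q2 = 111                      # unloaded Q at 6.78 MHz, from the measurement section
C_A = 300e-12                      # auxiliary RX capacitance (F), from the text
R_RECT = 20.0                      # hypothetical rectifier input resistance (ohm)

W = 2 * math.pi * F_OP
C1 = 1 / (W**2 * L1)               # series tuning so that omega_OP = omega_RES
C2 = 1 / (W**2 * L2)
R1, R2 = W * L1 / Q1, W * L2 / Q2  # coil loss resistances


def z_eq(k, with_ca=True):
    """TX-side impedance of a series-series link (textbook reflected-impedance model)."""
    # RX load: R_RECT alone, or R_RECT in parallel with C_A
    z_load = R_RECT / (1 + 1j * W * C_A * R_RECT) if with_ca else complex(R_RECT)
    z2 = R2 + 1j * W * L2 + 1 / (1j * W * C2) + z_load
    z1 = R1 + 1j * W * L1 + 1 / (1j * W * C1)
    m = k * math.sqrt(L1 * L2)                 # mutual inductance
    return z1 + (W * m) ** 2 / z2              # primary plus reflected secondary


for k in (0.05, 0.1, 0.2, 0.4):
    phase = math.degrees(cmath.phase(z_eq(k)))
    print(f"k={k:.2f}: angle(Z_EQ) = {phase:+.1f} deg")
```

With `with_ca=False` the reflected load is purely resistive at resonance and the phase is essentially zero; with $C_A$ the phase is positive (inductive) for every $k$ tried, matching the claim that this works over the whole range of $k$ at $\omega_{OP} = \omega_{RES}$.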

To enhance the D2D total efficiency, the coupling-link efficiency $\eta_{LINK}$ should also be optimized. The theoretical peak link efficiency $\eta_{MAX}$ can only be achieved at the optimal load resistance $R_{OPT}$ [4], which depends strongly on $k$. For maximum-efficiency-point tracking (MEPT), [1] used a boost converter following the rectifier and [5] proposed a Q-modulation technique, both to tune the RX input impedance. In our design, we found that MEPT is realized automatically if the TX and RX coils are identical and $k^2 Q_1 Q_2 > 1$, where $Q_1$ and $Q_2$ are the unloaded quality factors of $L_1$ and $L_2$, respectively. This is because $R_{RECT}$ has a simple relationship with $R_{OPT}$: $R_{RECT} = (A/\eta_{PRM}) \times R_{OPT}$, where $A$ is the voltage gain of the power link and $\eta_{PRM}$ is the TX coil efficiency. From this relationship, a lower bound on $\eta_{LINK}/\eta_{MAX}$ can be estimated: when $V_{BAT1} = 4.2\text{V}$ charges $V_{BAT2}$ from 2.5 to 4.2V with $\eta_{PRM} > 60\%$, $\eta_{LINK}/\eta_{MAX}$ stays above 0.917. Figure 8.3.3 also shows the theoretical $\eta_{MAX}$ and the simulated $\eta_{LINK}$ of our design; $\eta_{LINK}$ is very close to the theoretical $\eta_{MAX}$ at different $k$ and $V_{BAT2}$.
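For intuition on $\eta_{MAX}$ and $R_{OPT}$, the standard textbook two-coil result can be evaluated directly. This sketch assumes a series-resonant secondary driving a purely resistive load $R_L = a R_2$ with only coil losses, which is simpler than the paper's rectifier with parallel $C_A$; it does not reproduce the paper's 0.917 bound. With $x = k^2 Q_1 Q_2$, the link efficiency is $x a / ((1+a)(1+a+x))$, maximized at $a = \sqrt{1+x}$ where it equals $x/(1+\sqrt{1+x})^2$. The function names are illustrative.

```python
import math


def link_eff(x, a):
    """Link efficiency for load R_L = a * R2 and figure of merit x = k^2 * Q1 * Q2.

    Model: series-resonant secondary, coil losses R1 and R2 only.
    """
    refl = x / (1 + a)                 # reflected resistance, normalized to R1
    eta_primary = refl / (1 + refl)    # share of input power passed to the secondary
    eta_secondary = a / (1 + a)        # share of that power reaching the load
    return eta_primary * eta_secondary


def eta_max(x):
    """Closed-form peak link efficiency, reached at R_OPT = sqrt(1 + x) * R2."""
    return x / (1 + math.sqrt(1 + x)) ** 2


x = 0.1**2 * 111 * 111                 # k = 0.1 with the measured Q = 111 coils
print(f"eta_MAX = {eta_max(x):.3f}, R_OPT = {math.sqrt(1 + x):.2f} * R2")
```

For $k = 0.1$ this model gives $\eta_{MAX} \approx 0.835$ with $R_{OPT} \approx 11.1 R_2$, and the efficiency curve is flat near the optimum, which is why a load that only approximately tracks $R_{OPT}$ can keep $\eta_{LINK}/\eta_{MAX}$ close to 1.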

The reconfigurable CC wireless-power TRX has been fabricated in a $0.35\mu\text{m}$ CMOS process. $C_{A1}$ and $C_{A2}$ are placed in the spare space around the power transistors, so they do not increase the chip area. The total area is $3.92\text{mm}^2$. Two identical PCB coils with a 4cm outer diameter are used in the measurement; their inductance and unloaded Q-factor at 6.78MHz are $1.05\mu\text{H}$ and 111, respectively. Figure 8.3.4 shows the D2D total efficiency and $P_{OUT}$ versus $V_{BAT2}$ at different coupling distances. The peak total efficiency is 78.1% at $P_{OUT} = 0.8\text{W}$. The maximum $P_{OUT}$ of 2.7W, with 62.5% total efficiency, is achieved at a distance $d = 23\text{mm}$. Figure 8.3.5 shows the measured RX voltage and current waveforms when charging a $4.7\text{mF}$ capacitor from 2.5 to 4.2V. Figure 8.3.6 compares this work with prior works; among them, this work achieves the highest total efficiency and the longest transmission distance. Figure 8.3.7 shows the die micrograph.

### Acknowledgment

This work is supported by the Research Committee of the University of Macau under MYRG2015-00107-AMSV, and the Macao Science and Technology Development Fund (FDCT) SKL-AMSV-2017-2019.

### References:

- [1] N. Desai and A. Chandrakasan, "A ZVS Resonant Receiver with Maximum Efficiency Tracking for Device-to-Device Wireless Charging," *ESSCIRC*, pp. 313-316, Oct. 2016.
- [2] M. Huang, et al., "A Resonant Bidirectional Wireless Power Transceiver with Maximum-Current Charging Mode and 58.6% Battery-to-Battery Efficiency," *ISSCC*, pp. 376-377, Feb. 2017.
- [3] Y. Lu and W.-H. Ki, "A 13.56 MHz CMOS Active Rectifier with Switched-Offset and Compensated Biasing for Biomedical Wireless Power Transfer Systems," *IEEE Trans. Biomed. Circuits Syst.*, vol. 8, no. 3, pp. 334-344, June 2014.
- [4] K. van Schuylenbergh and R. Puers, *Inductive Powering: Basic Theory and Application to Biomedical Systems*, Springer, p. 96, 2009.
- [5] M. Kiani, et al., "A Q-Modulation Technique for Efficient Inductive Power Transmission," *IEEE JSSC*, vol. 50, no. 12, pp. 2839-2848, Dec. 2015.
- [6] J.-T. Hwang, et al., "An All-in-One (Qi, PMA and A4WP) 2.5W Fully Integrated Wireless Battery Charger IC for Wearable Applications," *ISSCC*, pp. 378-379, Feb. 2016.
- [7] J.-H. Choi, et al., "A Resonant Regulating Rectifier (3R) Operating at 6.78 MHz for a 6W Wireless Charger with 86% Efficiency," *ISSCC*, pp. 64-65, Feb. 2013.



Figure 8.3.1: The proposed cross-connected reconfigurable wireless-power transceiver (top), which is configured to a cross-connected differential Class-D PA in the TX mode and a cross-connected full-wave rectifier in the RX mode, and the critical operating waveforms (bottom).



Figure 8.3.2: The block diagram of the reconfigurable controller (top) and its timing diagram (bottom).



Figure 8.3.3: Comparison between a typical and the proposed solution to provide inductive TX load (top), the estimated lower limit of the power link efficiency, and the simulated power link efficiencies compared with the theoretical peak link efficiencies (bottom left).



Figure 8.3.4: D2D total efficiencies and charging power at different output voltages and transmitting distances.



Figure 8.3.5: Measured RX AC input voltage, output voltage, and charging current when charging a 4.7mF capacitor.

|                         | ISSCC'13 [7]             | ISSCC'16 [6]                | ESSCIRC'16 [1]              | ISSCC'17 [2]   | This Work      |
|-------------------------|--------------------------|-----------------------------|-----------------------------|----------------|----------------|
| WPT Direction           | Unidirectional           | Unidirectional              | Unidirectional              | Bi-Directional | Bi-Directional |
| Mode                    | Pad-to-Device            | Pad-to-Device               | Device-to-Device            | Reconf. D2D    | Reconf. D2D    |
| Process                 | 0.35μm BCD               | 0.18μm BCD                  | 0.18μm CMOS                 | 0.35μm CMOS    | 0.35μm CMOS    |
| Freq. (MHz)             | 6.78                     | 0.1-0.3, 6.78               | 6.78                        | 6.78           | 6.78           |
| $V_{OUT,MAX}$ (V)       | 5                        | 3.5                         | 4.2                         | 4.2            | 4.2            |
| $P_{OUT,MAX}$ (W)       | 6                        | 2.5                         | 0.74                        | 1.65           | 2.7            |
| $\eta_{TOTAL,MAX}$      | 55%                      | 63%                         | 52.3%                       | 58.6%          | 78.1%          |
| Distance (mm)           | NA                       | NA                          | 19                          | 6              | 23             |
| Area (mm<sup>2</sup>)    | 5.52                     | 5.83                        | 1.2                         | 3.9            | 3.92           |
| Off-Chip Components     | 5 Diodes<br>3 Capacitors | 1 Inductor<br>3 Capacitors  | 2 Inductors<br>2 Capacitors | 1 Capacitor    | 1 Capacitor    |

Figure 8.3.6: Performance summary and comparison table.



Figure 8.3.7: Die micrograph.

## 8.4 A 13.56MHz Wireless Power and Data Transfer Receiver Achieving 75.4% Effective-Power-Conversion Efficiency with 0.1% ASK Modulation Depth and 9.2mW Output Power

Yu Wang<sup>1</sup>, Dawei Ye<sup>1</sup>, Liangjian Lyu<sup>1</sup>, Yingfei Xiang<sup>1</sup>, Hao Min<sup>1</sup>, C.-J. Richard Shi<sup>1,2</sup>

<sup>1</sup>Fudan University, Shanghai, China

<sup>2</sup>University of Washington, Seattle, WA

Implantable and wearable devices in biomedical systems [1-3], e.g., neural recording applications [4], require both wireless power transfer (WPT) and wireless data transmission (WDT). Amplitude modulation (AM) is often preferred in these applications because of its low power consumption and simple (de)modulation circuitry. To achieve the required bit-error rate (BER) below $10^{-3}$, a large modulation depth (MD), typically 8% to 100%, is needed. Unfortunately, as shown in Fig. 8.4.1, during the low-amplitude interval ($t_2$) the available power on the coil is reduced in proportion to $(1 - MD)^2$. A large MD therefore leads to a low effective power-conversion efficiency (EPCE), defined as the combined efficiency of modulation and rectification in the WPT receiver. One solution is to separate the power and data links using two different frequencies [2-4]. However, this requires an extra coil or antenna, and the corresponding (de)modulator consumes more power, making it unsuitable for compact ultra-low-power designs. The challenge, which has not been addressed previously, is how to use the same coil (frequency) to transfer maximal power (and thus achieve high WPT EPCE) while achieving the required BER.
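The quadratic penalty is easy to quantify. The helper below is a minimal sketch, assuming (as the $(1 - MD)^2$ dependence implies) that MD is defined so the low-state amplitude is $(1 - MD)$ times the unmodulated carrier amplitude; `low_interval_power_fraction` is a hypothetical name.

```python
def low_interval_power_fraction(md):
    """Available coil power during the low-amplitude interval t2, relative to the
    unmodulated level, assuming the amplitude drops to (1 - MD) of the carrier."""
    return (1 - md) ** 2


for md in (1.0, 0.10, 0.08, 0.01, 0.001):
    print(f"MD = {md:6.1%}: power during t2 = {low_interval_power_fraction(md):.4f}")
```

At the conventional 8% to 100% MD, 15% to 100% of the power is lost during $t_2$, whereas at 0.1% MD the loss is about 0.2%, which is why an ultra-low-MD receiver leaves the rectifier's EPCE essentially untouched.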

This paper presents a wireless-power and data-transfer (WPDT) receiver, which addresses this challenge. It utilizes a recently introduced technique called shifted limiter (SL) [5] to support ultra-low MD for AM wireless data communication, co-designed with dynamic impedance matching (DIM) and adaptive conversion-ratio (CR) tuning for jointly maximizing the EPCE.

Figure 8.4.2 shows the system architecture of the WPDT receiver. It consists of a 13.56MHz ASK receiver and the WPT circuitry. The receiver can demodulate an ASK signal with MD as low as 0.1%. A cascaded SL chain amplifies the signal envelope on the coil. Because the SL inherently suppresses the large carrier while amplifying the small AM signal, gain saturation caused by the carrier is alleviated. No extra frequency translation is therefore needed for demodulation, which removes the need for a power-hungry local oscillator (LO). After the SL chain, the signal is down-converted by the envelope detector (ED), further amplified and filtered, and finally output at baseband. On the WPT side, an active rectifier extracts DC power from the carrier. Because of the very low MD, the signal envelope is nearly constant, and the EPCE of the active rectifier is not degraded. The rectified voltage ($V_{rec}$) is stepped up to a higher voltage by a step-up (SU) switched-capacitor power converter (SCPC) to charge an external battery through an active diode. A 100:1 current mirror copies the charging current and feeds it to the DIM, which generates six control bits for a programmable capacitor bank (CB) to achieve the best impedance matching. A step-down (SD) SCPC powered by the external battery serves as an alternative power source for the loads in case the rectified voltage is too low to provide the supply voltage. To achieve higher PCE, a conversion-ratio adapter (CRA) determines the CRs of both the SU and SD SCPCs.

Figure 8.4.3 shows the ultra-low-power (ULP) wireless data receiver. To suppress the carrier and amplify the signal before the ED, five cascaded SLs are used to support low MD. As shown in the top right corner of Fig. 8.4.3, the carrier amplitude ($V_c$) is much larger than the signal amplitude ($V_{sig}$) and also much larger than the linear range of the LNA. When $V_c + V_{sig}$ is applied as the total LNA input ($V_{in}$), the signal is clipped, degrading the receiver sensitivity. Since most of the energy of the small AM signal lies in the envelope, the SL extracts the carrier envelope to shift the linear range of the LNA to the envelope region of $V_{in}$, relaxing LNA gain compression. Moreover, since each SL amplifies only one side of the signal envelope, in opposite phase, P-type and N-type SLs are placed alternately in the SL chain to avoid clipping. The modulated signal consists of a main carrier and a sub-carrier. After the SL chain, the signal envelope is amplified, so no LO or mixer is needed to filter out the carrier. Hence, two simple EDs (ED1) down-convert the signal to the IF band, which equals the sub-carrier frequency (847kHz). The signal is then further amplified, down-converted, and filtered at baseband.
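The benefit of shifting the linear range can be shown with a peak-domain toy model. This is a hedged sketch, not the circuit from [5]: it evaluates only the carrier peaks in the two ASK states, models saturation as a hard clip, and treats the SL as an inverting amplifier whose linear range is centered on an extracted envelope level. All numeric values and function names are hypothetical.

```python
def clip(v, limit):
    """Symmetric saturation of an amplifier output at +/- limit."""
    return max(-limit, min(limit, v))


def plain_lna(v_peak, gain, v_sat):
    """Conventional LNA: linear range centered on 0 V, so the large carrier saturates it."""
    return clip(gain * v_peak, v_sat)


def shifted_limiter(v_peak, v_shift, gain, v_sat):
    """Hypothetical SL model: linear range shifted to the envelope level v_shift,
    inverting output (one side of the envelope, amplified in opposite phase)."""
    return -clip(gain * (v_peak - v_shift), v_sat)


V_C, MD, GAIN, V_SAT = 1.0, 0.001, 100.0, 0.5   # hypothetical values
v_high, v_low = V_C, V_C * (1 - MD)              # carrier peaks in the two ASK states
v_shift = (v_high + v_low) / 2                   # extracted envelope level

plain_swing = plain_lna(v_high, GAIN, V_SAT) - plain_lna(v_low, GAIN, V_SAT)
sl_swing = abs(shifted_limiter(v_high, v_shift, GAIN, V_SAT)
               - shifted_limiter(v_low, v_shift, GAIN, V_SAT))
print(plain_swing, sl_swing)   # plain LNA: modulation lost; SL: swing of GAIN * V_C * MD
```

Both ASK states drive the conventional amplifier into the same saturated level, so the 0.1% modulation disappears, while the shifted stage passes the envelope difference with full gain.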

Figure 8.4.4 depicts the power-management circuitry. The SU SCPC, which consists of three stages of voltage doublers, up-converts the rectified DC voltage for battery charging. As shown in Fig. 8.4.4, four auxiliary transistors,  $M_{N3/4}$  and  $M_{P3/4}$ , are added to the cross-coupled structure to boost the gate-source voltage of the pass transistors  $M_{N1/2}$  and  $M_{P1/2}$ , which reduces their conduction loss. To achieve higher PCE, CR is selected according to the ratio between the input voltage (the rectified voltage)  $V_{rec}$  and the target voltage (the battery voltage)  $V_{bat}$ . Three resistors, R1 to R3, with specific ratios in the CRA set the two critical ratios between  $V_{rec}$  and  $V_{bat}$  that are used to choose CR from 2 $\times$ , 3 $\times$ , and 4 $\times$ . Two comparators detect the critical ratios and feed their outputs to the SU SCPC. A 100:1 current mirror senses the battery charging current, and the DIM controller samples it alternately on two capacitors,  $C_{S1}$  and  $C_{S2}$ . The DIM controller realizes real-time impedance matching by adjusting the six control bits of the programmable CB. In each sampling clock cycle, the sampled current is compared with the previous sample to generate a voltage  $V_{cmp}$ , which indicates whether the current is increasing. If the current increases, a pilot signal  $V_{dec}$  goes high and the six-bit binary control code CB<5:0> is incremented by one, and vice versa. As a result, after tens of clock cycles, CB<5:0> fluctuates within a small range around the best impedance-matching condition. The SD SCPC down-converts the battery voltage to supply the loads when the rectified voltage is insufficient. It is also assisted by the CRA in selecting the best CR from 1/2 $\times$ , 1/3 $\times$ , and 2/3 $\times$  for the target output voltage  $V_{ref}$  (the supply voltage for the loads). 
Since the two critical ratios between  $V_{ref}$  and  $V_{bat}$  are the same as those for the SU SCPC, the SU and SD SCPCs share the resistive voltage divider and the two comparators in the CRA.
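The conversion-ratio selection and the DIM loop described above can be sketched as a behavioral model. This is not the authors' circuit, and it rests on several assumptions. The CR boundaries come from treating an ideal SC stage as able to reach CR × V_in. The 10% headroom `MARGIN` is hypothetical. The text only says that CB<5:0> is stepped by one according to whether the sampled current increased, so the sketch implements that step as a conventional perturb-and-observe hill climb: keep the step direction while the current rises, and reverse it otherwise. The current-versus-CB curve in `toy_charging_current` is invented for illustration.

```python
MARGIN = 1.1  # hypothetical headroom between ideal SC output and target voltage


def select_step_up_cr(v_rec: float, v_bat: float, margin: float = MARGIN) -> int:
    """SU SCPC: smallest CR in {2, 3, 4} whose ideal output CR*v_rec clears v_bat.
    A smaller sufficient CR keeps the ideal efficiency v_bat / (CR * v_rec) high."""
    for cr in (2, 3, 4):
        if cr * v_rec >= margin * v_bat:
            return cr
    return 4  # input too low even for 4x: fall back to the largest ratio


def select_step_down_cr(v_bat: float, v_ref: float, margin: float = MARGIN) -> float:
    """SD SCPC: smallest CR in {1/3, 1/2, 2/3} whose ideal output CR*v_bat clears v_ref.
    Ignoring the margin, its boundaries sit at v_bat/v_ref = 3 and 2, the same two
    critical ratios as the SU boundaries on v_bat/v_rec."""
    for cr in (1 / 3, 1 / 2, 2 / 3):
        if cr * v_bat >= margin * v_ref:
            return cr
    return 2 / 3


def dim_step(cb: int, i_now: float, i_prev: float, direction: int) -> tuple[int, int]:
    """One perturb-and-observe step on the 6-bit capacitor-bank code CB<5:0>."""
    if i_now <= i_prev:  # current did not increase: reverse the step direction
        direction = -direction
    cb = min(63, max(0, cb + direction))
    return cb, direction


def toy_charging_current(cb: int, best_cb: int = 40) -> float:
    """Invented single-peaked charging-current curve (arbitrary units)."""
    return 1.0 - ((cb - best_cb) / 64) ** 2


def run_dim(cb0: int = 32, steps: int = 100) -> list[int]:
    """Start from half CB (code 32) and record CB after every sampling cycle."""
    direction = +1
    i_prev = toy_charging_current(cb0)
    cb = cb0 + direction  # first perturbation
    history = []
    for _ in range(steps):
        i_now = toy_charging_current(cb)
        cb, direction = dim_step(cb, i_now, i_prev, direction)
        i_prev = i_now
        history.append(cb)
    return history
```

In this model, `run_dim()` climbs from code 32 toward the peak of the toy curve and then dithers within one code of it. This reproduces the "fluctuates within a small range" behavior the text describes.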

The WPDT receiver is fabricated in a 65nm CMOS technology, and its die micrograph is shown in Fig. 8.4.7. The data rate of the WPDT receiver is 100 to 150kb/s. The data are first modulated onto an 847kHz sub-carrier, which in turn is modulated onto the 13.56MHz main carrier. The top left of Fig. 8.4.5 shows the minimum MD at which the received bit-error rate stays below  $10^{-3}$ . At an input power level of 8.5dBm, the lowest MD of 0.1% is achieved. The top right of Fig. 8.4.5 shows the rectifier efficiency under MDs of 1% and 10% with different loads. Compared with a 10% MD, the rectifier efficiency improves by 20% under a 1% MD. Moreover, DIM enhances the peak rectifier efficiency by 30% compared with the same rectifier using a fixed matching capacitor (half CB), as shown in the bottom left of Fig. 8.4.5. The bottom right of Fig. 8.4.5 shows the efficiencies of the SU and SD SCPCs at input voltages of 1.2V and 2.5V, respectively. Both cover an output range wide enough for most application requirements. Figure 8.4.6 compares this work with prior works that support both wireless power and data transfer. Among the designs in Fig. 8.4.6, this work reports the best EPCE for AM wireless communication at the required communication quality.

### Acknowledgement:

This work is supported by the Science and Technology Commission of Shanghai Municipality under Grant 16JC1400102.

### References:

- [1] Y.-P. Lin, et al., "A Battery-Less, Implantable Neuro-Electronic Interface for Studying the Mechanisms of Deep Brain Stimulation in Rat Models," *IEEE TBioCAS*, vol. 10, no. 1, pp. 98-112, Mar. 2016.
- [2] Y.-K. Lo, et al., "A 176-Channel 0.5cm<sup>3</sup> 0.7g Wireless Implant for Motor Function Recovery after Spinal Cord Injury," *ISSCC*, pp. 382-383, Feb. 2016.
- [3] W.-M. Chen, et al., "A Fully Integrated 8-Channel Closed-Loop Neural-Prosthetic SoC for Real-Time Epileptic Seizure Control," *ISSCC*, pp. 286-287, Feb. 2013.
- [4] S. Lee, et al., "An Inductively Powered Scalable 32-Channel Wireless Neural Recording System-on-a-Chip for Neuroscience Applications," *IEEE TBioCAS*, vol. 4, no. 6, pp. 360-371, Nov. 2010.
- [5] D. Ye, et al., "An Ultra-Low-Power Receiver Using Transmitted-Reference and Shifted Limiters for In-band Interference Resilience," *ISSCC*, pp. 438-439, Feb. 2016.
- [6] N. Desai, et al., "An Actively Detuned Wireless Power Receiver with Public Key Cryptographic Authentication and Dynamic Power Allocation," *ISSCC*, pp. 366-367, Feb. 2017.



Figure 8.4.1: Concept and merits of the WPDT receiver.



Figure 8.4.2: Block diagram of the WPDT receiver system.



Figure 8.4.3: Simplified schematic of the wireless data receiver with shifted limiters.



Figure 8.4.5: Measurement results of the WPDT receiver.



Figure 8.4.4: Schematic of the power-management circuits.

|                                | TBCAS10 [4]          | ISSCC13 [3] | ISSCC16 [2] | TBCAS16 [1] | ISSCC17 [6] | This work |
|--------------------------------|----------------------|-------------|-------------|-------------|-------------|-----------|
| CMOS Technology                | 0.5µm                | 0.18µm      | 0.18µm HV   | 0.18µm      | 0.18µm      | 65nm      |
| Data link carrier (MHz)        | 915/845.5<br>915/877 | 401-406     | 20          | 10          | 6.78        | 13.56     |
| Data rate (Mb/s)               | 0.46-5.64            | 4           | 2           | 0.1         | 0.02        | 0.1-0.15  |
| Modulation mode                | PWM FSK OOK          | OOK         | DPSK        | ASK         | PWM OOK     | ASK       |
| Min. MD (in AM)                | 100%                 | 100%        | N/A         | 7%          | 100%        | 0.1%      |
| Data Rx power consumption (mW) | 3.8                  | 0.28*       | N/A         | N/A         | N/A         | 0.132     |
| Power link frequency (MHz)     | 13.56                | 13.56       | 2           | 10          | 6.78        | 13.56     |
| Peak rectifier efficiency      | 80.2%                | 84.8%       | N/A         | N/A         | 74%         | 75.4%     |
| AM Peak EPCE**                 | 40.1%                | 42.4%       | N/A         | N/A         | 37%         | 75.4%     |
| Max output power (mW)          | N/A                  | N/A         | N/A         | 10          | 520         | 9.2       |
| No. of coils/antennas          | 2                    | 2           | 2           | 1           | 1           | 1         |
| Battery charge                 | No                   | No          | No          | No          | Yes         | Yes       |

\* excluding LDO and biasing

\*\* calculated by  $EPCE = PCE \times [1 + (1 - MD)^2] / 2$  under the assumption that codes "0" and "1" are equally distributed in the data.

Figure 8.4.6: Performance summary of the WPDT receiver and comparison to prior art.
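The EPCE entries in the comparison table can be checked numerically. The sketch below reads the footnote's formula as $EPCE = PCE \times [1 + (1 - MD)^2]/2$. Under this reading, a "1" symbol passes the full carrier power and a "0" symbol passes $(1 - MD)^2$ of it, because power scales with the square of the amplitude. Where the exponent belongs is an assumption of this reading. The PCE and MD values are copied from the table.

```python
def epce(pce: float, md: float) -> float:
    """Equivalent PCE of an AM link with equiprobable '0'/'1' symbols.

    pce: peak rectifier efficiency (0..1); md: modulation depth (0..1).
    Assumes the '0' symbol amplitude is (1 - md) times the '1' amplitude,
    so its power is (1 - md)**2 times the '1' power.
    """
    return pce * (1.0 + (1.0 - md) ** 2) / 2.0


# (design, peak rectifier efficiency, minimum MD) taken from the comparison table
designs = [
    ("TBCAS10 [4]", 0.802, 1.00),
    ("ISSCC13 [3]", 0.848, 1.00),
    ("ISSCC17 [6]", 0.740, 1.00),
    ("This work", 0.754, 0.001),
]

for name, pce, md in designs:
    print(f"{name:12s} EPCE = {100 * epce(pce, md):.1f}%")
```

With 100% MD, half of the symbols carry no power, so EPCE equals PCE/2. This reproduces the 40.1%, 42.4%, and 37% entries. At 0.1% MD, the factor is about 0.999, so EPCE is essentially the rectifier PCE. The sketch evaluates this case to 75.3%, against the 75.4% listed, which is within the rounding of the quoted PCE.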



Figure 8.4.7: Die micrograph.

## 8.5 MISIMO: A Multi-Input Single-Inductor Multi-Output Energy Harvester Employing Event-Driven MPPT Control to Achieve 89% Peak Efficiency and a 60,000x Dynamic Range in 28nm FDSOI

Sally Safwat Amin, Patrick P. Mercier

University of California, San Diego, La Jolla, CA

Harvesting energy from ambient sources is an attractive way to enable net-zero-power operation in small wearables, environmental monitors, and IoT devices. However, the power available from most harvesting sources is low. It is therefore often mismatched to the instantaneous demands of wirelessly connected devices, which require up to tens of mW during radio transmission and as little as 1μW during sleep. Previous works have proposed utilizing multiple energy-harvesting sources to increase extractable power, incorporating a battery as an energy buffer, and processing energy with a single inductor to minimize device volume [1,2]. However, such prior art does not support the multiple independently regulated power domains needed by modern SoCs. Single-inductor multi-output (SIMO) converters have demonstrated multi-domain regulation [3,4], but they have not been combined with multi-input harvesting. This is partly because conventional control techniques cannot simultaneously regulate multiple input sources (for MPPT purposes) and multiple loads with a single inductor.

This paper presents a multi-input single-inductor multi-output (MISIMO) energy harvester that meets the needs of next-generation net-zero-power IoT devices by: 1) harvesting from multiple sources and delivering this energy either directly to multiple load domains or to the battery depending on instantaneous load conditions, all via a single-stage single-inductor topology to eliminate cascaded stage losses and minimize implementation footprint; 2) performing independent-domain regulation within a single-inductor switching cycle to reduce switching losses by 3×; 3) calibrating the battery discharge time to increase the inductor time allocated to energy harvesting by 10×; 4) allowing excess inductor energy to recycle back to the battery as needed to decouple input source and load regulation from each other, thereby enabling simultaneous MPPT across all sources and independent regulation of all loads, all with a single inductor; 5) dynamically adapting both the inductor on-time and switching frequency via a hysteretic event-driven control circuit; and 6) modulating power-switch sizes with load conditions to improve light-load efficiency by up to 24%, while utilizing cascaded switch structures to reduce leakage power losses by 9×.

The MISIMO buck-boost power stage is shown in Fig. 8.5.1, along with illustrations of the inductor-current switching schemes proposed for efficient, decoupled, multi-input MPPT harvesting and multi-output load regulation. The implemented MISIMO chip harvests energy from up to 3 sources simultaneously: a photovoltaic (PV) cell at 0.2 to 1V, a thermoelectric generator (TEG) at 0.1 to 0.4V, and a biofuel cell (BFC) at 0.2 to 0.5V. It also independently regulates 3 different power rails, each between 0.4 and 1.4V. When instantaneous load demands outstrip harvesting capacity, the event-driven MISIMO controller selects the battery as the source in phase  $\phi_1$  to deliver energy to one or more loads in phase  $\phi_2$ . If harvester energy is available, it is extracted in  $\phi_1$  and delivered to up to three loads and/or the battery in  $\phi_2$ . In both cases, single-cycle multi-load regulation reduces switching losses by up to 3×. Hysteretic PFM comparators regulate each source at its maximum-power-point voltage,  $V_{MPP,i}$ , and each load at its own reference voltage,  $V_{ref,i}$ . The lower hysteretic limit of each source comparator triggers the end of  $\phi_1$ , so it dynamically determines the inductor charging time,  $T_{\phi1\_Hb}$ . The upper limit triggers a new inductor switching cycle, so it dynamically defines the harvester switching frequency,  $F_{sw\_Hi}$ . During a load or source change, the load-regulation hysteresis may be satisfied while current is still flowing through the inductor. In that case, the MISIMO controller recycles this extra charge back to the battery at the end of  $\phi_2$ . This control technique makes the output ripple independent of the peak inductor current set by MPPT, which decouples source- and load-regulation conditions and resolves the trade-off in [3]. 
In addition, when the battery is selected as a source,  $T_{\phi1\_BAT}$  varies adaptively with load conditions. This adaptation serves two goals. It minimizes the charge recycled back to the battery, which reduces extra conduction losses. It also minimizes the time the inductor spends under battery power, which frees that time for energy harvesting.
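A first-order event-driven model illustrates how the two hysteretic limits set $T_{\phi1}$ and $F_{sw}$ for a harvester. The sketch relies on simplifying assumptions that do not come from the paper. The harvester delivers a constant current into its input capacitor. The input voltage stays close to $V_{MPP}$ during $\phi_1$, so the inductor current ramps linearly. The input capacitance and hysteresis window are hypothetical. Only the 10μH inductance is taken from the comparison table in Fig. 8.5.6.

```python
import math


def harvester_cycle(i_h: float, v_mpp: float = 0.5, hyst: float = 10e-3,
                    c_in: float = 1e-6, l_ind: float = 10e-6) -> tuple[float, float]:
    """Return (t_phi1, f_sw) for one hysteretic source-regulation cycle.

    i_h   : constant harvester current into the input capacitor (A), assumed
    v_mpp : source MPP voltage, assumed ~constant during phi1 (V)
    hyst  : half-width of the comparator window around v_mpp (V), hypothetical
    c_in  : input capacitance (F), hypothetical
    l_ind : inductance (H)
    """
    dq = 2 * hyst * c_in         # charge between the upper and lower limits
    a = v_mpp / (2 * l_ind)      # inductor charge drawn after time t is a * t**2
    # phi1 ends at the lower limit: a*t**2 - i_h*t = dq (harvester keeps charging)
    t_phi1 = (i_h + math.sqrt(i_h ** 2 + 4 * a * dq)) / (2 * a)
    # The source then refills from the lower to the upper limit at i_h, and
    # reaching the upper limit starts the next cycle (phi2 is assumed to
    # finish well within this refill time).
    t_refill = dq / i_h
    return t_phi1, 1.0 / (t_phi1 + t_refill)


for i_h in (50e-6, 100e-6, 200e-6):
    t_on, f_sw = harvester_cycle(i_h)
    print(f"I_h = {i_h * 1e6:5.0f} uA  T_phi1 = {t_on * 1e9:6.1f} ns  F_sw = {f_sw / 1e3:7.2f} kHz")
```

In this model, the on-time is set mostly by the hysteresis window and $V_{MPP}$. The switching frequency scales roughly linearly with the available harvester current. This corresponds to the adaptive-$T_{ON}$ and adaptive-$F_{SW}$ MPPT entry listed for this work in Fig. 8.5.6.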

A detailed schematic of the MISIMO power stage (PS) and controller is shown in Fig. 8.5.2. The PS uses stacked transistors to reduce  $\phi_3$  leakage losses by 9× at light loads. The MISIMO controller adapts  $T_{\phi1\_BAT}$  and performs switch-size modulation (SSM) on the 2b binary-weighted PS switch sizes. Both adaptations are based on the output of a load-current indicator, which counts the number of inductor switching cycles each load requires to receive sufficient energy under battery power. The goal is for each load to receive sufficient energy from the battery in a single cycle. The zero-current detector (ZCD) would nominally dominate controller losses, since it must be fast enough to avoid the efficiency degradation caused by power-switch diode conduction. It is therefore duty-cycled to operate only during  $\phi_2$ , which makes its contribution to the overall controller losses negligible.
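The $T_{\phi1\_BAT}$ adaptation can be pictured as a bang-bang calibration loop driven by the load-current indicator's cycle count. Everything quantitative in the sketch below is hypothetical: the step size, the limits, the mapping from $T_{\phi1\_BAT}$ to the 2b SSM code, and the toy load model. The toy model assumes the charge delivered per cycle is proportional to $T_{\phi1\_BAT}^2$, as it would be for a linearly ramping inductor current. None of these details are the paper's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class BatteryPhaseCal:
    t_bat: float = 1.0e-6  # T_phi1_BAT estimate (s), hypothetical starting value
    step: float = 0.1e-6   # calibration step (s), hypothetical
    t_min: float = 0.2e-6  # hypothetical lower limit (s)
    t_max: float = 4.0e-6  # hypothetical upper limit (s)

    def update(self, cycles_needed: int) -> None:
        """Adjust T_phi1_BAT from the load-current indicator's cycle count."""
        if cycles_needed > 1:
            # load was not served in one cycle: charge the inductor longer
            self.t_bat = min(self.t_max, self.t_bat + self.step)
        else:
            # one cycle sufficed: shorten phi1 to cut recycled charge and free
            # inductor time for harvesting
            self.t_bat = max(self.t_min, self.t_bat - self.step)

    def ssm_code(self) -> int:
        """Hypothetical 2b switch-size code: a longer T_phi1_BAT (heavier load)
        selects wider switches; 0 = narrowest, 3 = widest."""
        frac = (self.t_bat - self.t_min) / (self.t_max - self.t_min)
        return min(3, int(frac * 4))


def toy_cycles_needed(t_bat: float, q_load: float = 4.0, k: float = 1e12) -> int:
    """Invented load model: charge delivered per cycle = k * t_bat**2 (arbitrary units)."""
    return math.ceil(q_load / (k * t_bat ** 2))


cal = BatteryPhaseCal()
for _ in range(50):
    cal.update(toy_cycles_needed(cal.t_bat))
print(f"T_phi1_BAT ~ {cal.t_bat * 1e6:.2f} us, SSM code = {cal.ssm_code()}")
```

With the toy load, the loop settles around 2μs, the shortest on-time that serves the load in one cycle, and dithers by one step. This mirrors the stated aim of single-cycle delivery with minimal recycled charge.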

A flow chart of the MISIMO source- and load-selection and regulation algorithms is shown in Fig. 8.5.3. The source-side controller is enabled only if a load needs energy ( $L_{j\_cmp}=0$ ), or if harvester energy is available ( $H_{i\_cmp}=1$ ) and the battery is not fully charged ( $BAT_{ov}=0$ ). Otherwise, the source-side controller is disabled, which saves up to 2.3× in power. When enabled, it selects a source in  $\phi_1$  based on the harvester comparator outputs  $H_{i\_cmp}$  and the load condition. The load-side algorithm is enabled only at the end of  $\phi_1$ , and it delivers energy to one or more loads depending on real-time conditions. Cross-regulation errors can arise from sudden  $V_{ref}$  or  $I_L$  steps, or from heavy loads dominating switching cycles. To avoid them, the MISIMO load algorithm prevents a single load from receiving all of the inductor energy in consecutive cycles.
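The enable condition and the selection steps of the flow chart can be written as plain Boolean logic. The comparator conventions in this sketch follow the text: $L_{j\_cmp}=0$ means load $j$ needs energy, $H_{i\_cmp}=1$ means harvester $i$ has energy available, and $BAT_{ov}=1$ means the battery is full. Two parts are hypothetical stand-ins for the paper's exact selection policy: the fixed harvester priority order, and the rule that moves the previous cycle's sole recipient to the back of the queue.

```python
from typing import Optional, Sequence

HARVESTERS = ("PV", "TEG", "BFC")  # hypothetical fixed priority order


def source_side_enabled(l_cmp: Sequence[int], h_cmp: Sequence[int], bat_ov: int) -> bool:
    """Enabled if some load needs energy, or harvest is available and the battery is not full."""
    load_needs_energy = any(l == 0 for l in l_cmp)
    harvest_available = any(h == 1 for h in h_cmp)
    return load_needs_energy or (harvest_available and bat_ov == 0)


def select_source(l_cmp: Sequence[int], h_cmp: Sequence[int], bat_ov: int) -> Optional[str]:
    """phi1 source: a harvester with energy if any, else the battery (a load needs energy)."""
    if not source_side_enabled(l_cmp, h_cmp, bat_ov):
        return None  # source-side controller disabled to save power
    for name, h in zip(HARVESTERS, h_cmp):
        if h == 1:
            return name
    return "BAT"


def select_sinks(l_cmp: Sequence[int], last_sole: Optional[int]) -> list:
    """phi2 sink order: loads needing energy, with the previous cycle's sole recipient
    moved to the back so it cannot take all the inductor energy in consecutive
    cycles. Any remaining inductor charge is recycled to the battery last."""
    needy = [j for j, l in enumerate(l_cmp) if l == 0]
    if last_sole in needy and len(needy) > 1:
        needy.remove(last_sole)
        needy.append(last_sole)
    return needy + ["BAT"]
```

For example, with load comparators `[0, 1, 0]` and load 0 served alone in the previous cycle, `select_sinks` returns `[2, 0, "BAT"]`, so load 2 is served first this time.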

The MISIMO chip occupies 0.5mm<sup>2</sup> and is implemented in 28nm FDSOI. Measurements in Fig. 8.5.4 show a peak efficiency of 89% and an efficiency >75% across all loads over a 60,000× dynamic range (1μW to 60mW) in L3 at 1V (Fig. 8.5.4, top left). Part of this comes from the proposed event-driven dynamic  $T_{\phi1\_Hb}$ ,  $T_{\phi1\_BAT}$ , and  $F_{sw\_Hi}$  control, which improves efficiency by up to 34% at light loads (Fig. 8.5.4, top right). The rest comes from dynamic SSM, which improves efficiency by up to 17% at 1V and up to 24% at 0.6V (Fig. 8.5.4, bottom). Dynamic measurements in Fig. 8.5.5 show that the MISIMO energy-harvesting chip effectively regulates three independent loads at different voltages with dynamic on-time and PFM control. This holds both during startup (Fig. 8.5.5, top left) and during a load step (Fig. 8.5.5, top right), with negligible droop and <30mV ripple. Measured load steps (Fig. 8.5.5, bottom left) and harvesting-source steps (Fig. 8.5.5, bottom right) show that the chip simultaneously regulates input sources (to <15mV ripple for MPPT purposes) and output loads with a single inductor and without cross-regulation errors. This is enabled by the proposed dynamic inductor-charge-recycling technique. The comparison with prior work in Fig. 8.5.6 shows that the MISIMO chip is the first to perform MPPT harvesting from multiple sources while regulating multiple outputs with a single inductor. It also achieves the widest dynamic range among prior-art multi-input harvesters, all in a small die area and with competitive efficiency. A die micrograph is shown in Fig. 8.5.7.

### Acknowledgments:

The authors thank Harish Krishnamurthy, Sergio Carlo, Vaibhav Vaidya, and Christopher Schaefer for initial discussions and acknowledge support from ST Microelectronics for chip fabrication.

### References:

- [1] S. Bandyopadhyay and A. Chandrakasan, "Platform Architecture for Solar, Thermal, and Vibration Energy Combining with MPPT and Single Inductor," *IEEE JSSC*, vol. 47, no. 9, pp. 2199-2215, Sept. 2012.
- [2] G. Chowdary, et al., "An 18 nA, 87% Efficient Solar, Vibration and RF Energy-Harvesting Power Management System with a Single Shared Inductor," *IEEE JSSC*, vol. 51, no. 10, pp. 2501-2513, Oct. 2016.
- [3] K. Chew, et al., "A 400nW Single-Inductor Dual-Input-Tri-Output DC-DC Buck-Boost Converter with Maximum Power Point Tracking for Indoor Photovoltaic Energy Harvesting," *ISSCC*, pp. 68-69, Feb. 2013.
- [4] A. Shrivastava, et al., "A 1.2μW SIMO energy harvesting and power management unit with constant peak inductor current control achieving 83–92% efficiency across wide input and output voltages," *IEEE Symp. VLSI Circuits*, pp. 1-2, June 2014.



Figure 8.5.1: Power stage of the MISIMO energy harvester (top); representative inductor-switching schemes (bottom).



Figure 8.5.2: Block diagram of the MISIMO energy harvester, including detailed schematics of the power stage with power-switch width control, and of the duty-cycled zero-current detector (ZCD).



Figure 8.5.3: MISIMO controller flow chart.



Figure 8.5.5: Measured turn-on transient demonstrating automatic PFM control (top left); measured load step under battery power demonstrating independent voltage regulation across all 3 loads (top right); measured load step (bottom left) and source step (bottom right) during energy harvesting demonstrating simultaneous source regulation (for MPPT) and load regulation.



Figure 8.5.4: Measured efficiency for all three loads vs. current in load three (top left); measured total efficiency improvement with dynamic  $T_{\phi1\_BAT}$  calibration (top right); measured total efficiency improvement with dynamic switch-size modulation at 1V (bottom left) and 0.6V (bottom right).

|                                               | Bandyopadhyay,<br>JSSC'12                                              | Chew,<br>ISSCC'13                  | Shrivastava,<br>VLSI'14            | Chen,<br>ISSCC'15                  | Lu*<br>JSSC'16                    | Chowdary,<br>JSSC'16                | This Work                                                         |
|-----------------------------------------------|------------------------------------------------------------------------|------------------------------------|------------------------------------|------------------------------------|-----------------------------------|-------------------------------------|-------------------------------------------------------------------|
| Technology                                    | 0.35μm                                                                 | 0.18μm                             | 0.13μm                             | 0.5μm                              | 0.35μm                            | 0.18μm                              | 28nm                                                              |
| No. of inputs                                 | 3+input                                                                | 1+battery                          | 1+battery                          | 1+battery                          | 3+supercap                        | 3+battery                           | 3+battery                                                         |
| No. of outputs                                | 1+battery                                                              | 2+battery                          | 3+battery                          | 1+battery                          | 1+supercap                        | 3+battery                           | 3+battery                                                         |
| Converter Architecture                        | 2-stage, 1-ind Buck/Boost                                              | 1 stage, 1-ind Buck-Boost          | 2-stage, 1-ind Buck/Boost          | 1-stage, 1-ind Buck/Boost          | 1-stage, 1-ind Buck-Boost         | 2-stage, 1-ind Buck/Boost           | 1-stage, 1-ind Buck-Boost                                         |
| Energy sources (input voltage)                | PV(0.15-0.75V)<br>Piezo(0.2-0.4V)<br>TEG(0.02-0.16V)<br>Battery (3.0V) | PV (2.6V)<br>Battery (3V)          | PV (0.38-3.3V)<br>Battery (<5V)    | PV (3.6V)<br>Battery (4V)          | PV (0.03-3.6V)<br>Battery (<3.6V) | PV<br>Piezo<br>RF                   | PV (0.2-1V)<br>TEG (0.1-0.4V)<br>BFC (0.2-0.5V)<br>Battery (1.8V) |
| Load regulation mechanism                     | PFM                                                                    | PFM<br>$f_{inv}$ Control           | PFM                                | PFM                                | PFM                               | PFM                                 | PFM+PFM+SSM                                                       |
| MPPT mechanism                                | Adaptive $T_{ON}$<br>Fixed $F_{SW}$                                    | Constant $T_{ON}$<br>Vary $F_{SW}$ | Constant $T_{ON}$<br>Vary $F_{SW}$ | Constant $T_{ON}$<br>Vary $F_{SW}$ | N.R.                              | Constant $f_{inv}$<br>Vary $F_{SW}$ | Adaptive $T_{ON}$<br>Adaptive $F_{SW}$                            |
| $L$                                           | 22μH                                                                   | 10μH                               | 20μH                               | 4.7μH                              | 22μH                              | 47μH                                | 10μH                                                              |
| $C_L$                                         | 15μF                                                                   | 10μF                               | 8μF                                | 4.7μF                              | 4.7μF                             | 10μF                                | 1μF                                                               |
| Die Area (mm <sup>2</sup> )                   | ~15                                                                    | 4.62                               | 2.25                               | 0.5                                | 4                                 | 1.1                                 | 0.5                                                               |
| $V_{out}$ (V)                                 | 1.8V                                                                   | 1V, 1.8V                           | 1.2, 1.5, 3.3V                     | 1-3.3V                             | 3.6V                              | 1.2V-1.8V                           | 0.4-1.4V                                                          |
| Quiescent Power or Current (@ $V_{DD}=1.8V$ ) | 2.7μA                                                                  | 0.4μA                              | 1μA                                | 1.2 μW                             | 200nA                             | 18nA                                | 262nA<br>@ $V_{DD}=1V$                                            |
| $P_{out}$                                     | 9μW-540μW                                                              | 1μW-10mW                           | <100mW                             | 1μW-15mW                           | N.R.                              | 60nW-40μW                           | 1μW-60mW                                                          |
| Dynamic Range (DR) for $\eta>70\%$            | 60X                                                                    | 10,000X                            | 16,500X                            | 15,000X                            | N.R.                              | 667X                                | 60,000X                                                           |
| Peak Efficiency (%)                           | 90%                                                                    | 83%                                | 92%                                | 93%                                | 85%                               | 87%                                 | 89%                                                               |

\* Load is regulated by an LDO if the harvester energy is not available

N.R.: Not Reported

<sup>†</sup> Estimated from plotted data

Figure 8.5.6: Comparison of the proposed MISIMO with prior work.



- ① MISIMO Power Stage [0.48mm<sup>2</sup>]
- ② Level Shifters and Drivers [0.013mm<sup>2</sup>]
- ③ Zero Current Detector [0.0000845mm<sup>2</sup>]
- ④ Source Hysteresis Comparators [0.00004mm<sup>2</sup>]
- ⑤ Load Hysteresis Comparators [0.00004mm<sup>2</sup>]
- ⑥ MISIMO Controller [0.002156mm<sup>2</sup>]
- ⑦ Battery controlled pulse width generator Controller [0.0375mm<sup>2</sup>]

Figure 8.5.7: Micrograph of the fabricated MISIMO die in 28nm FDSOI.

## 8.6 A 4.5-to-16 $\mu$ W Integrated Triboelectric Energy-Harvesting System Based on High-Voltage Dual-Input Buck Converter with MPPT and 70V Maximum Input Voltage

Inho Park, Junyoung Maeng, Dongju Lim, Minseob Shim,  
Junwon Jeong, Chulwoo Kim

Korea University, Seoul, Korea

The triboelectric nanogenerator (TENG), a newly emerging energy source, was introduced in 2012, and various types of energy harvesters and active sensors based on it have since been developed. Although research in the materials-engineering field is active, there has been little research on TENG energy-harvesting circuits in the integrated-circuits field. From the materials-engineering viewpoint, much research focuses on applications and on the analysis of instantaneous power. From the circuit designer's viewpoint, however, topics such as the rms maximum power point (MPP), SPICE modeling, and impedance matching are more important. This paper presents a TENG energy-harvesting circuit designed as a high-voltage (HV) dual-input (DI) buck converter with MPP tracking (MPPT), based on the proposed MPP analysis for the TENG.

Figure 8.6.1 describes the characteristics of the TENG. The triboelectric (TE) effect is the phenomenon in which two materials become electrically charged through physical contact or rubbing. One factor that determines the TE potential is the relative properties of the two materials. As shown in Fig. 8.6.1, charges develop due to the friction between the two materials, and physical pressure changes the distance between the two electrodes. The resulting capacitance variation in the TE system generates power by inducing current flow. Because the TENG equivalent circuit can be modeled simply using linear electrical components, the MPP can be expressed as a fraction of the open-circuit voltage ( $V_{OC}$ ). The MPP equation derived from the equivalent circuit shows that the MPP voltage is approximately one third of the corresponding  $V_{OC}$ .

A full-bridge rectifier (FBR) has been used in piezoelectric and electrostatic energy-harvesting systems [1-3], because piezoelectric and electrostatic energy harvesters (PEHs and EEHs) produce AC output voltages. Unlike the PEH, the EEH and the TENG can have different positive and negative peak voltages, as shown in Fig. 8.6.1. If a conventional FBR is connected to the outputs of an EEH or a TENG ( $V_P$  and  $V_M$ ), the lower voltage ( $V_M$ ) cannot be harvested, because only voltage exceeding  $V_{RECT}$  can be extracted. Nonetheless, some energy-harvesting circuits neglect this inefficient extraction and use external commercial rectifiers [2,3]. The output terminal of the FBR can be switched to extract  $V_P$  and  $V_M$  separately, but this incurs additional power consumption and switching loss. To harvest the different peak voltages efficiently, a dual-output rectifier (DOR) is proposed to generate the DC-DC converter inputs, as shown in Fig. 8.6.2. The DOR is composed of four diode-connected LDMOSs and passes the charges from the two TENG output terminals to the respective input capacitors ( $C_{IN,P}$  and  $C_{IN,M}$ ). An active diode can be used to reduce the voltage drop across diode-connected transistors [1]. For HV applications, however, it requires additional level shifters and controllers that dissipate a large amount of power, which makes the active diode unsuitable for low-power applications. Because the TENG produces a high voltage (~85V), the relatively small voltage drop across the DOR (0.7V) can be neglected.

The proposed buck converter for a human-skin-based TE energy-harvesting system regulates the two HV inputs from the DOR, each at its MPP. Conventional DI DC-DC converters operate at moderate voltages [4], whereas previous HV DC-DC converters are used only in single-input applications [2,3]. As shown in Fig. 8.6.2, the conventional DI power stage can cause a critical problem in HV applications. Because  $V_P$  from the TENG is 3-to-4 times larger than  $V_M$ , one input of the DC-DC converter ( $V_{RECT,P}$ ) is regulated at a higher voltage than the other input ( $V_{RECT,M}$ ).  $V_{GS}$  of the transistor for  $V_{RECT,M}$  should be 0V whenever  $V_{RECT,P}$  is transferred to  $V_{OUT}$ . However, the drain voltage ( $V_{LX}$ ) is then extremely high compared with the other terminal voltages of the transistor for  $V_{RECT,M}$ . In this situation, as  $V_{GS}$  exceeds the rated voltage, the transistor for  $V_{RECT,M}$  is inevitably damaged, and almost all the charge from  $V_{RECT,P}$  flows into  $V_{RECT,M}$  rather than  $V_{OUT}$ . The proposed HV protector prevents this damage and transfers the harvested power to  $V_{OUT}$ .

Figure 8.6.3 shows the top-level architecture of the proposed TE energy-harvesting system. The TENG is connected to the integrated DOR so that the TE power from each terminal is extracted separately. The power-on-reset (POR) circuit compares  $V_{RECT,M}$  with a threshold voltage. When  $V_{RECT,M}$  rises above the threshold, the POR generates an enable signal for a voltage-controlled oscillator (VCO). During initial operation,  $V_{RECT,M}$  is sufficient for use as  $V_{DD}$  but not sufficient to be transferred to  $V_{OUT}$ . In this phase, therefore, only the energy stored in  $C_{IN,P}$  is transferred to  $V_{OUT}$ . The power transistor for  $V_{RECT,M}$ ,  $M_M$ , stays off until  $V_{OUT}$  becomes high enough to be used as  $V_{DD}$ . Once  $V_{OUT}$  is high enough to supply power, the  $V_{DD}$  MUX switches  $V_{DD}$  from  $V_{RECT,M}$  to  $V_{OUT}$  and generates a trigger signal that enables  $M_M$  to deliver additional power.
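The start-up sequence can be summarized as a small state machine. The text does not give the threshold values, so `V_POR` and `V_OUT_OK` below are hypothetical placeholders. Treating each transition as latching (once made, it stays) is also an assumption. The sketch tracks only the three control decisions described: VCO enable, the $V_{DD}$ MUX selection, and enabling $M_M$.

```python
from dataclasses import dataclass

V_POR = 2.0     # hypothetical POR threshold on V_RECT,M (V)
V_OUT_OK = 1.8  # hypothetical V_OUT level considered sufficient for V_DD (V)


@dataclass
class StartupState:
    vco_en: bool = False
    vdd_source: str = "V_RECT,M"  # V_DD is initially taken from V_RECT,M
    mm_enabled: bool = False      # until set, only C_IN,P energy reaches V_OUT


def startup_step(s: StartupState, v_rect_m: float, v_out: float) -> StartupState:
    """Advance the (latching) start-up controls for one observation of both voltages."""
    if not s.vco_en and v_rect_m > V_POR:
        s.vco_en = True  # POR enables the VCO
    if s.vco_en and s.vdd_source == "V_RECT,M" and v_out >= V_OUT_OK:
        s.vdd_source = "V_OUT"  # V_DD MUX hands over to V_OUT ...
        s.mm_enabled = True     # ... and triggers M_M to harvest V_RECT,M
    return s
```

In this sketch, feeding it a rising `v_rect_m` first enables the VCO. A later rise in `v_out` then moves $V_{DD}$ to $V_{OUT}$ and releases $M_M$, in the same order as the description.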

The buck converter for the TENG uses synchronous pulse-skipping modulation (PSM). Figure 8.6.3 also depicts the operation of the HV DI buck converter for the TENG. Two comparators monitor the two input voltages to regulate them at the MPP. The two reference voltages ( $V_{REF,P}$  and  $V_{REF,M}$ ) are set externally because the  $V_{OC}$  of the TENG exceeds the rated voltage of the LDMOS devices. When  $\phi_P$  is high and  $\phi_M$  is low, only  $V_{GP}$  is activated to transfer the charge in  $C_{IN,P}$  to the output. When  $\phi_P$  is low and  $\phi_M$  is high,  $V_{GM}$  controls  $M_M$  so that  $V_{RECT,M}$  tracks the MPP. In the last case, both  $\phi_P$  and  $\phi_M$  are high.  $M_P$  is turned on first, and a falling-edge detector then detects the falling edge of  $V_{GP}$  and activates  $M_M$ . If only  $V_{RECT,P}$  is delivered, a zero-current sensor (ZCS) drives the NMOS according to  $V_{GP}$ . When  $V_{RECT,P}$  and  $V_{RECT,M}$  are harvested together, the ZCS drives the NMOS according to  $V_{GM}$  of  $M_M$ , which is turned on later.
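The three PSM cases above map onto a small decision table. The sketch makes three assumptions. Each comparator output is high when its rectified input is above its reference, meaning the input holds excess charge. The references are set with the approximately $V_{OC}/3$ MPP rule from the TENG analysis. In the $\phi_M$-only case, the ZCS follows $V_{GM}$; the text states the ZCS gate only for the $V_{RECT,P}$-only and combined cases. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def mpp_reference(v_oc: float) -> float:
    """MPP voltage taken as ~1/3 of the open-circuit voltage (TENG MPP analysis)."""
    return v_oc / 3.0


def psm_cycle(v_rect_p: float, v_rect_m: float,
              v_ref_p: float, v_ref_m: float) -> Tuple[List[str], Optional[str]]:
    """Return (high-side switches fired in order, gate the ZCS follows) for one cycle."""
    phi_p = v_rect_p > v_ref_p  # assumed comparator polarity
    phi_m = v_rect_m > v_ref_m
    if phi_p and phi_m:
        # M_P first; the falling edge of V_GP then triggers M_M, the later switch
        return ["M_P", "M_M"], "V_GM"
    if phi_p:
        return ["M_P"], "V_GP"
    if phi_m:
        return ["M_M"], "V_GM"  # assumption: ZCS follows V_GM in this case
    return [], None             # pulse skipped: both inputs at or below their MPP
```

With illustrative open-circuit voltages of 60V and 18V, `mpp_reference` gives references of 20V and 6V. A cycle with both inputs above those references then fires $M_P$ followed by $M_M$.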

When  $\phi_P = V_{DD}$ , an extremely high voltage appears at  $M_M$ . To prevent  $M_M$  from breaking down, the proposed HV protector is implemented between  $M_M$  and  $V_{LX}$ . Figure 8.6.4 describes the operation of the proposed HV protector. It consists of a series-connected switch,  $M_{PRT}$ , and charging-discharging switches with level shifters (LSs).  $M_{PRT}$  is formed by connecting the two source terminals of an LDMOS with active body biasing. In state 1,  $M_P$  is turned on and  $V_{PRT}$  reaches  $V_{RECT,P}$ , which equals  $V_{LX}$ , so  $V_{GS}$  of  $M_{PRT}$  is 0V. During the transition from state 1 to state 2, in which  $M_M$  is turned on, the charge ( $\Delta Q$ ) on the parasitic capacitance ( $C_{PAR}$ ) starts to flow toward the LS, and  $V_{PRT}$  drops from  $V_{RECT,P}$ . On entering state 2, the remaining charge at node  $V_{PRT}$  flows through the LS, which causes  $V_{PRT}$  to drop to  $V_{RECT,M} - V_{DD}$ . Because  $V_{PRT}$  is then below  $V_{RECT,M}$ ,  $M_{PRT}$  can transfer power from  $V_{RECT,M}$  to  $V_{LX}$  with the accumulated  $I_L$ .

Figure 8.6.5 shows measured waveforms of the harvesting system. Each TENG output current flows into the corresponding DOR output. In the steady state,  $V_{RECT,P}$  and  $V_{RECT,M}$  are applied to  $V_{LX}$  accordingly. Figure 8.6.6 shows the measured performance of the system and a comparison table. The proposed system achieves a maximum MPPT efficiency of 97%. The measured maximum power-conversion efficiency, including the power consumption of the controller, is 51.1%.

### References:

- [1] M. Shim, et al., "Self-Powered 30  $\mu$ W to 10 mW Piezoelectric Energy Harvesting System with 9.09 ms/V Maximum Power Point Tracking Time," *IEEE JSSC*, vol. 50, no. 10, pp. 2367-2379, Oct. 2015.
- [2] S. Stanzione, et al., "A Self-Biased 5-to-60V Input Voltage and 25-to-1600 $\mu$ W Integrated DC-DC Buck Converter with Fully Analog MPPT Algorithm Reaching up to 88% End-to-End Efficiency," *ISSCC*, pp. 74-75, Feb. 2013.
- [3] S. Stanzione, et al., "A 500nW Batteryless Integrated Electrostatic Energy Harvester Interface Based on a DC-DC Converter with 60V Maximum Input Voltage and Operating From 1 $\mu$ W Available Power, Including MPPT and Cold Start," *ISSCC*, pp. 372-373, Feb. 2015.
- [4] H. Chen, et al., "An Energy-Recycling Three-Switch Single-Inductor Dual-Input Buck/Boost DC-DC Converter with 93% Peak Conversion Efficiency and 0.5mm<sup>2</sup> Active Area for Light Energy Harvesting," *ISSCC*, pp. 374-375, Feb. 2015.



Figure 8.6.1: Characteristics and analysis of TENG.

Figure 8.6.2: Conventional and proposed rectifiers and HV DI DC-DC converters.



Figure 8.6.3: Top architecture of the TE energy harvesting system and synchronous pulse-skipping modulation.



Figure 8.6.4: Schematic and waveforms of the proposed HV protector.



Figure 8.6.5: Waveforms and the power-conversion efficiency of the harvesting system.

|                               | JSSC 2015 [1] | ISSCC 2013 [2] | ISSCC 2015 [3] | This work       |
|-------------------------------|---------------|----------------|----------------|-----------------|
| Process                       | 0.35-μm BCD   | 0.25-μm BCD    | 0.25-μm BCD    | 0.18-μm BCD     |
| Harvester type                | Piezoelectric | Electrostatic  | Electrostatic  | Triboelectric   |
| Topology                      | Buck-boost    | Buck           | Buck           | Dual-input buck |
| Input voltage                 | 1-7V          | 5-60V          | <60V           | <70V            |
| Input power                   | 33μW-10mW     | 25μW-1.6mW     | 1μW-1mW        | 4.5-16μW        |
| Output voltage                | 1-8V          | 2-5V           | 0-5V           | 1-5V            |
| Maximum conversion efficiency | 80%           | 88.7%          | 86%            | 51.1%           |
| Maximum MPPT efficiency       | 99%           | 99.9%          | 99%            | 97%             |

Figure 8.6.6: Measurement system and comparison with that of conventional works.



Figure 8.6.7: Die micrograph.

## 8.7 A Piezoelectric Energy-Harvesting Interface Circuit with Fully Autonomous Conjugate Impedance Matching, 156% Extended Bandwidth, and 0.38$\mu$W Power Consumption

Yifeng Cai<sup>1</sup>, Yiannos Manoli<sup>1,2</sup>

<sup>1</sup>University of Freiburg - IMTEK, Freiburg, Germany

<sup>2</sup>Hahn-Schickard, Villingen-Schwenningen, Germany

Harvesting energy from ambient vibrations with piezoelectric transducers is an alternative solution for the growing demand for self-powered devices, such as IoT devices, condition- and structural-monitoring devices, or biomedical implants. A limiting factor for piezoelectric transducers is their narrow bandwidth, caused by the high mechanical quality factor. As a result, the output power drops significantly when the excitation frequency in a real environment deviates from the resonant frequency. One solution is to introduce time delays into active harvesting interfaces such as Synchronous Electric Charge Extraction (SECE) or Synchronized Switch Harvesting on Inductor (SSHI). This emulates a load impedance conjugate to that of the transducer, resulting in higher power extraction from vibrations at non-resonant frequencies.
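Why a conjugate impedance helps can be illustrated with the textbook maximum-power-transfer relation for a sinusoidal source. The sketch below is a generic circuit-theory illustration, not the transducer model used in this work; the function name and the source voltage and impedance values are hypothetical.

```python
def load_power(v_rms, z_source, z_load):
    """Average power delivered to z_load by a sinusoidal source (RMS voltage v_rms)
    with internal impedance z_source; impedances are complex numbers in ohms."""
    current = v_rms / (z_source + z_load)   # RMS phasor current in the series loop
    return abs(current) ** 2 * z_load.real  # only the resistive part absorbs power


# Hypothetical source impedance (not a measured PZT value).
z_s = complex(1000, 500)
p_conjugate = load_power(1.0, z_s, z_s.conjugate())   # reactances cancel
p_resistive = load_power(1.0, z_s, complex(1000, 0))  # same resistance, reactance left uncancelled
print(p_conjugate, p_resistive)
```

With the conjugate load the reactances cancel and the delivered power reaches $V^2/(4R_s)$, 250μW here, whereas a purely resistive load of the same resistance delivers less.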

Although this idea has been verified in [1-4], the real challenge in implementing such a system is to adjust its parameters automatically according to the excitation frequency while remaining self-powered with low power consumption. The concepts based on discrete components [1] or on simulations [2,3] rely on manual parameter adjustment, external power sources, or extra mechanical components and do not take power consumption into account, which makes them impractical in real applications. The chip in [4] uses look-up tables (LUTs) to achieve automatic impedance matching, but the LUTs must be reconfigured whenever the system loses power, which greatly limits its usability. This work presents both an enhanced SECE interface with fully autonomous conjugate-impedance matching after a one-time configuration and a new unbalanced-switching method that overcomes the drawbacks of the conventional voltage-doubler design.

Figure 8.7.1 shows the block diagram of the presented system and its operating principle. The peak detector (PD) senses when the output voltage ($V_{PEH}$) of the transducer (PZT) reaches its maximum. After each peak, the system waits for a time delay $t_1$ generated by the ramp generator (RG) and then closes switch S1 for a short time $t_2$ to transfer the energy stored in the PZT to the inductor L. The duration of $t_2$ depends on the polarity of $V_{PEH}$. If $V_{PEH} < 0$ before S1 is closed (negative half-cycle, NHC), then $t_2 = 72\mu s$, so that all the extracted energy is sent back to the PZT, reversing the polarity of $V_{PEH}$ without harvesting any energy. If $V_{PEH} > 0$ (positive half-cycle, PHC), $t_2$ is shorter, so that part of the energy is sent back to the PZT to increase the damping force, while the rest is sent to $C_{BUF}$ by closing S2. A comparator detects when the transfer is finished and opens S2.
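The role of $t_2$ can be illustrated with an ideal, lossless exchange between the PZT capacitance and the inductor. In this model, closing S1 for a quarter of the L-C period moves all the energy into L (conventional SECE), half a period flips the voltage and returns everything to the PZT (which is how the NHC behavior above appears in the model), and an intermediate $t_2$ splits the energy between a partial voltage reversal and the inductor, whose share S2 would then send to $C_{BUF}$. This sketch ignores switch and inductor losses; the function name and the capacitance and inductance values are hypothetical, not the chip's.

```python
import math


def s1_exchange(c_p, l, v0, t2):
    """Ideal lossless exchange while S1 connects the PZT capacitance c_p to inductor l.

    v0 is the PZT voltage when S1 closes (inductor current zero) and t2 the
    closing time in seconds. Returns (v_after, e_inductor): the PZT voltage when
    S1 opens and the energy left in the inductor at that moment.
    """
    omega = 1.0 / math.sqrt(l * c_p)                   # L-C resonance (rad/s)
    v_after = v0 * math.cos(omega * t2)                # capacitor voltage is a cosine
    e_inductor = 0.5 * c_p * (v0 ** 2 - v_after ** 2)  # energy conservation
    return v_after, e_inductor


C_P, L = 20e-9, 10e-3                    # hypothetical values
T_HALF = math.pi * math.sqrt(L * C_P)    # half of the L-C period
for frac in (0.5, 0.75, 1.0):            # 0.5 -> quarter period, 1.0 -> half period
    v, e = s1_exchange(C_P, L, 5.0, frac * T_HALF)
    print(f"t2 = {frac:.2f} x T_half: V_after = {v:+.2f} V, E_L = {e * 1e9:.1f} nJ")
```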

The different values of $t_2$ for NHC and PHC cycles (unbalanced switching) achieve AC-DC conversion without a rectifier or a voltage doubler. This automatically generates a DC offset that shifts $V_{PEH}$ to a higher potential, so that more energy is extracted during a PHC, compensating for the absence of harvesting during an NHC. Assuming that, in the conventional balanced method, $V_{PEH}$ is $V_0$ at the beginning of the energy-extraction process and $V_1$ at the end, the DC offset is $V_{OS} = (V_1 + V_0)/2$, and the analysis in Fig. 8.7.2 shows that the harvested energy is exactly the same for both methods. Compared to the presented design, the rectifier solution [2] raises the gate-driver voltage of S1 to the level of $V_{PEH}$, which increases power consumption. The voltage-doubler design [4] requires a larger chip area and isolated NMOS transistors to implement an additional switch. It also needs four times as much energy as the presented design to start up the circuit for the same threshold voltage and total buffer capacitance; it therefore harvests less energy, since it stays longer in the low-efficiency passive mode.
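The equal-energy claim can be checked numerically with an idealized model. Our assumptions, which are not taken from the analysis of Fig. 8.7.2: $V_1$ is the signed PZT voltage at the end of a balanced extraction (negative after the partial flip), the NHC flip in unbalanced switching is lossless, and the transducer's voltage swing between extractions is the same in both methods. The function names are hypothetical.

```python
def balanced_energy(c, v0, v1):
    """Energy per vibration cycle for balanced switching: one extraction in each
    half-cycle, each taking the PZT from V0 to the signed end voltage V1."""
    return 2 * 0.5 * c * (v0 ** 2 - v1 ** 2)


def unbalanced_energy(c, v0, v1):
    """Energy per cycle for unbalanced switching: the NHC is a lossless full flip
    (nothing harvested); only the PHC harvests, on a waveform shifted by V_OS."""
    v_os = (v1 + v0) / 2
    return 0.5 * c * ((v0 + v_os) ** 2 - (v1 + v_os) ** 2)


# Hypothetical example: V0 = 10 V, V1 = -6 V (signed), C = 1 nF.
print(balanced_energy(1e-9, 10.0, -6.0), unbalanced_energy(1e-9, 10.0, -6.0))
```

Both calls give 64nJ. Algebraically, $(V_0+V_{OS})^2-(V_1+V_{OS})^2 = 2(V_0^2-V_1^2)$ when $V_{OS}=(V_0+V_1)/2$, so under these assumptions the equality holds for any $V_0$ and $V_1$.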

Figure 8.7.3 shows the principle of the fully autonomous impedance matching. Under the constraints of an energy-harvesting system, it is difficult to generate the optimal $t_1/t_2$ for maximum power output automatically and accurately. However, the optimal values exhibit the following characteristics: $t_2$ is symmetric around the resonant frequency, while $t_1$ is relatively constant at frequencies far from resonance. Therefore, the frequency range can be divided into five frequency regions (FR1 to FR5), each with a $t_1/t_2$ combination that approximates the optimal $t_1/t_2$ as closely as possible. The system measures the excitation frequency to decide which combination to use. This design reduces the number of $t_1/t_2$ parameters to only 6 (Fig. 8.7.1) at the cost of lower power in certain frequency bands, and it allows the time delays to be adjusted by trimming a limited number of passive components. In the frequency range from 89.1 to 90.5Hz, the presented setting $t_1=0ms$ causes no phase shift, while the optimal setting (5.1 to 5.5ms) delays the system by nearly half a cycle, which has a similar effect.
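The region-based selection can be sketched as a table lookup on the measured frequency. The region edges and the $t_1/t_2$ values below are hypothetical placeholders, chosen only to mirror the trends described above ($t_2$ symmetric around resonance, $t_1$ constant far from it); they are not the chip's six trimmed parameters, and the names are ours.

```python
import bisect

# Hypothetical region edges (Hz): 4 edges -> 5 regions, FR1..FR5.
REGION_EDGES_HZ = [86.0, 89.0, 92.0, 95.0]
# Hypothetical (t1 in ms, t2 in us) settings per region.
REGION_SETTINGS = [
    (2.0, 40.0),  # FR1
    (2.0, 55.0),  # FR2
    (0.0, 70.0),  # FR3, around resonance
    (2.0, 55.0),  # FR4 (t2 mirrors FR2)
    (2.0, 40.0),  # FR5 (t2 mirrors FR1)
]


def select_region(f_measured_hz):
    """Map a measured excitation frequency to (region number, (t1_ms, t2_us))."""
    idx = bisect.bisect_right(REGION_EDGES_HZ, f_measured_hz)
    return idx + 1, REGION_SETTINGS[idx]


print(select_region(90.0))   # falls in FR3
```

Half a period at 89.1 to 90.5Hz is $1/(2f) \approx$ 5.5 to 5.6ms, which is why the optimal 5.1 to 5.5ms delay acts almost like a half-cycle shift.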

The parameters are generated and tuned by the RG shown in Fig. 8.7.4, which is also used for frequency measurement. The NMOS transistor M1 is connected to an off-chip $2M\Omega$ trimmer resistor $R_{SD}$ that provides source degeneration. This creates a sub-nA current $I_{SD}$ for charging the capacitors, so that delays ranging from microseconds to milliseconds can be achieved without large components. The parameters are adjusted by trimming $R_{SD}$, which can also compensate for chip-to-chip variations. For frequency measurement, the digital controller switches among three load capacitors, as illustrated in Fig. 8.7.4. This design also provides a certain degree of robustness against supply-voltage variation. Figure 8.7.4 also shows the design of S1 and the PD. By combining a PMOS with a bulk-regulation circuit, S1 can block current even when $V_{PEH}$ is negative or exceeds the gate-drive voltage ($\pm V_{BUF}$).
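Why a sub-nA charging current allows long delays with small capacitors follows from constant-current charging, $t = C\Delta V/I$. The sketch assumes an ideal constant current and a fixed threshold swing; the 1pF and 1V values and the function name are hypothetical.

```python
def ramp_delay(c_ramp, delta_v, i_charge):
    """Time for a constant current i_charge (A) to charge c_ramp (F) by delta_v (V)."""
    return c_ramp * delta_v / i_charge


# Hypothetical values: 1 pF ramp capacitor, 1 V threshold swing.
for i_sd in (0.5e-9, 100e-9):   # sub-nA current vs. a more typical 100 nA bias
    print(f"I = {i_sd * 1e9:g} nA -> delay = {ramp_delay(1e-12, 1.0, i_sd) * 1e6:.0f} us")
```

With 0.5nA the same 1pF capacitor yields a 2ms delay, in the range of the $t_1$ values discussed above, whereas 100nA yields only 10μs.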

The chip is fabricated in a 0.35μm CMOS technology and tested with a MIDE V21 transducer mounted on a shaker; the transducer's resonant frequency is tuned to 90.5Hz by adding a tip mass. In addition to the presented mode, the chip can be configured as a conventional SECE interface, or it can generate the optimal $t_1/t_2$ curve of Fig. 8.7.3 through manual adjustment. The presented system extends the 3dB bandwidth by 110% over the SECE mode, or by 156% over the natural bandwidth (3.2Hz) of the PZT (Fig. 8.7.5). Furthermore, the harvested power at the resonant frequency is 29% higher than that of SECE, since SECE reaches its maximum efficiency only when the transducer's product $k^2Q_m$ of electromechanical coupling coefficient and mechanical quality factor equals $\pi/4$. For a lower $k^2Q_m$, which is common for smaller transducers (emulated in this experiment by shunting a capacitor across the V21), the gap becomes even larger (Fig. 8.7.5). Assuming that the excitation frequency in a real environment drifts from 85 to 96Hz, the power extracted in the presented mode is 96% higher than that of SECE, a significant advantage. Compared to the manual (optimal) mode, the presented design achieves a similar bandwidth with only 8.7% less average power over the same frequency range. The conventional SECE/SSHI methods [5,6] show little or no bandwidth improvement over the natural bandwidth (the FOM in Fig. 8.7.6). Compared to similar techniques [1,4], the presented chip runs fully autonomously with a better FOM, a smaller chip area, and lower power consumption.
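For readers reproducing a measurement like that of Fig. 8.7.5, the half-power (3dB) bandwidth of a sampled power-versus-frequency curve can be extracted as below. This is a generic post-processing sketch under our own assumptions (sorted frequencies, a single contiguous band around the peak, linear interpolation at the edges), not the authors' procedure; the data are synthetic.

```python
def half_power_bandwidth(freqs, powers):
    """3dB (half-power) bandwidth of a sampled power-versus-frequency curve.

    freqs must be sorted in ascending order. The band is the contiguous region
    around the peak where power >= max/2; its edges are linearly interpolated.
    If the curve never drops below half power at an end, that end is clipped
    to the first/last sample.
    """
    if len(freqs) != len(powers) or len(freqs) < 2:
        raise ValueError("need at least two matching samples")
    peak = max(range(len(powers)), key=powers.__getitem__)
    half = powers[peak] / 2.0

    def edge(inside, outside):
        # Linear interpolation between an in-band and an out-of-band sample.
        f0, f1 = freqs[inside], freqs[outside]
        p0, p1 = powers[inside], powers[outside]
        return f0 + (half - p0) * (f1 - f0) / (p1 - p0)

    lo = peak
    while lo > 0 and powers[lo - 1] >= half:
        lo -= 1
    hi = peak
    while hi < len(powers) - 1 and powers[hi + 1] >= half:
        hi += 1
    f_lo = edge(lo, lo - 1) if lo > 0 else freqs[0]
    f_hi = edge(hi, hi + 1) if hi < len(powers) - 1 else freqs[-1]
    return f_hi - f_lo


# Synthetic, hypothetical data (not measurements from this work).
f = [88.0, 89.0, 90.0, 91.0, 92.0]
p = [0.0, 1.0, 4.0, 1.0, 0.0]
print(half_power_bandwidth(f, p))
```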

### Acknowledgement:

The German Research Foundation partially funded this work (DFG/GRK1322).

### References:

- [1] P. Hsieh, et al., "Improving the Scavenged Power of Nonlinear Piezoelectric Energy Harvesting Interface at Off-Resonance by Introducing Switching Delay," *IEEE Trans. Power Electronics*, vol. 30, no. 6, pp. 3142-3155, June 2015.
- [2] A. Badel and E. Lefevre, "Wideband Piezoelectric Energy Harvester Tuned Through its Electronic Interface Circuit," *Journal of Physics: Conference Series*, vol. 557, no. 1, pp. 1-5 (012115), 2014.
- [3] S. Zhao, et al., "Extending the Bandwidth of Piezo-Electric Energy Harvesting Through the use of Bias Flip," *IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference*, pp. 1-3, Oct. 2015.
- [4] Y. Cai and Y. Manoli, "A Piezoelectric Energy Harvester Interface Circuit with Adaptive Conjugate Impedance Matching, Self-Startup and 71% Broader Bandwidth," *ESSCIRC*, pp. 119-122, Sept. 2017.
- [5] T. Hehn, et al., "A Fully Autonomous Integrated Interface Circuit for Piezoelectric Harvesters," *IEEE JSSC*, vol. 47, no. 9, pp. 2185-2198, Sept. 2012.
- [6] D. Sanchez, et al., "A 4$\mu$W-to-1mW Parallel-SSHI Rectifier for Piezoelectric Energy Harvesting of Periodic and Shock Excitations with Inductor Sharing, Cold Start-up and up to 681% Power Extraction Improvement," *ISSCC*, pp. 366-367, Feb. 2016.



|                              | This work         | [1]                 | [4]          | [5]          | [6]          |
|------------------------------|-------------------|---------------------|--------------|--------------|--------------|
| Process technology           | 0.35 $\mu$ m      | Discrete components | 0.35 $\mu$ m | 0.35 $\mu$ m | 0.35 $\mu$ m |
| Transducer                   | V21B              | V22B                | V22B         | V22B         | V22B         |
| Scheme                       | Delayed-SECE      | Delayed-SSHI        | Delayed-SECE | SECE         | SSHI         |
| Load-independent*            | Yes               | No                  | Yes          | Yes          | No           |
| Extra mechanical components  | No                | Accelerometer       | No           | No           | No           |
| Fully-autonomous             | Yes               | No                  | No           | Yes          | No           |
| Power consumption [ $\mu$ W] | 0.38              | NA                  | 0.85         | 4.4          | NA           |
| Resonant frequency [Hz]      | 90.5              | 425.5               | 175.4        | 174          | 224          |
| FOM**                        | 156%              | 23%***              | 71%          | <40%****     | <0****       |
| Chip area [mm $^2$ ]         | 0.80/0.27(active) | NA                  | 3.57         | 1.25         | 1.17(active) |

\* Doesn't require buffer voltage regulation to reach maximum efficiency

\*\* Bandwidth of the output power over the natural bandwidth of the transducer

\*\*\* Over SSHI

\*\*\*\* Calculated from the paper

Figure 8.7.6: Comparison table.



- 1: S1
- 2: S2
- 3: Start-up circuit
- 4: Logic
- 5: Comparator
- 6: Peak detector
- 7: Ramp generators 1, 2, and 3

Figure 8.7.7: Die micrograph.

## 8.8 A 30nA Quiescent 80nW-to-14mW Power-Range Shock-Optimized SECE-Based Piezoelectric Harvesting Interface with 420% Harvested-Energy Improvement

Anthony Quelen<sup>1</sup>, Adrien Morel<sup>1</sup>, Pierre Gasnier<sup>1</sup>, Romain Gréaud<sup>1</sup>, Stéphane Monfray<sup>2</sup>, Gaël Pillonnet<sup>1</sup>

<sup>1</sup>CEA-LETI-MINATEC, Grenoble, France

<sup>2</sup>STMicroelectronics, Crolles, France

Piezoelectric energy harvesters (PEHs) are commonly used to convert mechanical energy (vibrations, shocks) into electrical energy in order to supply energy-autonomous sensor nodes in industrial, biomedical, or domotic applications. Non-linear extraction strategies such as Synchronous Electrical Charge Extraction (SECE) [1,2], energy investing [3], and Synchronized Switch Harvesting on Inductor (SSHI) [4] have been developed to maximize the energy extracted from harmonic excitations. However, in most of today's applications, vibrations are not periodic and mechanical shocks occur at unpredictable rates [4]. SSHI interfaces naturally appear to be the most appropriate candidates for harvesting shocks, as they exhibit outstanding performance under periodic excitation [4]. However, the SSHI strategy has inherent weaknesses when harvesting shocks, since the invested energy stored in the piezoelectric capacitance cannot be recovered.

In this work, we propose a self-starting, batteryless, 0.55mm<sup>2</sup> integrated energy-harvesting interface based on the SECE strategy and optimized for shock excitation. Because mechanical shocks are sporadic, implying long periods of inactivity and brief energy peaks, the average power consumption of the interface is optimized by minimizing its quiescent power. A dedicated energy-saving sequencing circuit has thus been designed, reducing the static current to 30nA and enabling energy to be extracted from a single 8μJ shock occurring every 100s. Our SECE-based circuit achieves a shock FOM 1.6× greater than that of the SSHI-based interface in [4].
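A quick sanity check of these figures: one 8μJ shock every 100s corresponds to an average available power of 80nW, which matches the 80nW lower end of the input-power range in the title, while a 30nA quiescent current at the 1.5V operating level of $C_{ASIC}$ corresponds to 45nW. This is back-of-the-envelope arithmetic on the stated numbers, ignoring conversion losses; the function names are ours.

```python
def average_power(energy_j, interval_s):
    """Average power available from one energy event of energy_j every interval_s."""
    return energy_j / interval_s


def quiescent_power(current_a, voltage_v):
    """Static power drawn by a circuit with quiescent current current_a at voltage_v."""
    return current_a * voltage_v


p_avail = average_power(8e-6, 100.0)        # one 8 uJ shock every 100 s
p_quiescent = quiescent_power(30e-9, 1.5)   # 30 nA at 1.5 V
print(f"available {p_avail * 1e9:.0f} nW, quiescent {p_quiescent * 1e9:.0f} nW")
```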

The proposed system, depicted in Fig. 8.8.1, consists of a negative voltage converter that rectifies the PEH output voltage and a SECE power path, both controlled by a sequencing circuit. The sequencing is divided into four phases, whose timing diagrams are illustrated in Fig. 8.8.2. In the sleeping mode T<sub>1</sub>, all blocks except the shock detector (SD) are turned off. During the start-up phase, energy is stored in C<sub>ASIC</sub> through a cold-start path, increasing V<sub>ASIC</sub>, which progressively turns on the SD. Next, when stress applied to the piezoelectric material increases V<sub>REC</sub>, the SD checks whether the electrical energy converted by the piezoelectric transducer is high enough to be harvested (V<sub>REC</sub>>V<sub>ASIC</sub>). By setting Flag<sub>SHOCK</sub>, the SD enables the V<sub>ASIC</sub> detection, which determines whether the cold-start path should be activated. If V<sub>ASIC</sub> is below 1.5V, the stored energy is considered insufficient to start the SECE operation, and the cold-start path remains connected to keep charging C<sub>ASIC</sub>. Otherwise, the detection block sends the Flag<sub>START</sub> signal, which disables the cold start, enables the peak detection, and starts the maximum-voltage-detection phase T<sub>2</sub>. When V<sub>REC</sub> reaches its maximum, the system enters its harvesting phase T<sub>3</sub>. V<sub>M1</sub> is set high, which connects the inductor L to the piezoelectric capacitance C<sub>P</sub>. The dual-mode comparator (DMC) operates in its zero-crossing-detection (ZCD) configuration and detects when V<sub>REC</sub> falls below V<sub>TL</sub>=-14mV, which indicates that almost all the energy previously stored in C<sub>P</sub> has been transferred to L. The system then starts its storing phase T<sub>4</sub>: N<sub>1</sub> is turned off, while P<sub>2</sub> and N<sub>2</sub> are turned on.
The instant when I<sub>L</sub> reaches zero, detected by the same DMC in its reverse-current-detection (RCD) configuration, indicates that all the energy stored in L during T<sub>3</sub> has been transferred to C<sub>STORE</sub>. The system then returns to its sleep mode T<sub>1</sub> and waits for the next energy event. N<sub>3</sub> acts as a freewheeling diode and provides a path to dissipate any remaining energy in L.
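The four-phase sequencing can be summarized as a small state machine. This is a simplified behavioral sketch under our own assumptions: the inputs are sampled values, "peak reached" is supplied by the peak detector, the 1.5V/1.4V hysteresis of the V<sub>ASIC</sub> detector is ignored, and the function and state names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    SLEEP = "T1"        # only the shock detector (SD) is powered
    PEAK_DETECT = "T2"  # waiting for V_REC to reach its maximum
    HARVEST = "T3"      # C_P discharges into L, DMC in ZCD mode
    STORE = "T4"        # L discharges into C_STORE, DMC in RCD mode


V_ASIC_MIN = 1.5    # V, minimum V_ASIC for SECE operation (from the text)
V_TL = -14e-3       # V, ZCD threshold on V_REC (from the text)


def next_phase(phase, v_rec, v_asic, peak_reached, i_l):
    """One decision step of a simplified behavioral model of the sequencer."""
    if phase is Phase.SLEEP:
        flag_shock = v_rec > v_asic
        if flag_shock and v_asic >= V_ASIC_MIN:   # Flag_START: leave sleep mode
            return Phase.PEAK_DETECT
        # No shock, or V_ASIC still too low (the cold-start path keeps charging C_ASIC).
        return Phase.SLEEP
    if phase is Phase.PEAK_DETECT:
        return Phase.HARVEST if peak_reached else Phase.PEAK_DETECT
    if phase is Phase.HARVEST:
        return Phase.STORE if v_rec < V_TL else Phase.HARVEST
    # Phase.STORE: the RCD detects when the inductor current returns to zero.
    return Phase.SLEEP if i_l <= 0.0 else Phase.STORE


# One shock event: (v_rec, v_asic, peak_reached, i_l) samples, hypothetical values.
phase = Phase.SLEEP
for sample in [(2.0, 1.6, False, 0.0), (3.0, 1.6, True, 0.0),
               (-0.02, 1.6, False, 0.01), (0.0, 1.6, False, 0.0)]:
    phase = next_phase(phase, *sample)
    print(phase.value)
```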

Figure 8.8.3 shows detailed transistor-level schematics of the DMC and of the shock and V<sub>ASIC</sub> detectors. During T<sub>3</sub>, the DMC is enabled in its ZCD configuration. M<sub>2</sub> and M<sub>4</sub> form a differential pair that compares V<sub>REC</sub> with ground. Because of M<sub>1</sub>, only 1/4 of the bias current flows through M<sub>1</sub> and M<sub>2</sub> when V<sub>REC</sub> is high. As V<sub>REC</sub> decreases (owing to the charge transfer between C<sub>P</sub> and L), the current in M<sub>2</sub> increases, which improves the detection accuracy. The circuit consumption is also reduced while V<sub>REC</sub> is high, since higher comparator performance is only needed when V<sub>REC</sub> approaches 0V. V<sub>HYST</sub> is initially high, which creates a -14mV offset at the input of the comparator. The circuit also includes a comparator that accurately implements the zero-crossing detection. During T<sub>4</sub>, the DMC switches to its RCD configuration. V<sub>HYST</sub> is set low, which removes the -14mV offset. In this phase, V<sub>REC</sub> is proportional to -I<sub>L</sub>, as N<sub>2</sub> is turned on; therefore, as I<sub>L</sub> decreases, V<sub>REC</sub> increases until it reaches 0V. Then T<sub>1</sub> starts: the DMC is disabled to avoid unnecessary energy consumption, and only the SD is powered. Therefore, during T<sub>1</sub>, the only current drawn from C<sub>ASIC</sub> is the 30nA (at 1.5V) flowing through M<sub>17</sub>, as shown in Fig. 8.8.3. When V<sub>REC</sub> increases, current starts flowing through M<sub>14</sub>, which pulls up the drain potential of M<sub>16</sub>. If V<sub>REC</sub>>V<sub>ASIC</sub>, Flag<sub>SHOCK</sub> goes high, which in turn enables the V<sub>ASIC</sub> detector by forcing M<sub>20</sub> into conduction. To avoid ringing, a resistor R<sub>HYS</sub> introduces hysteresis between the high (1.5V) and low (1.4V) thresholds. The integrated resistors R<sub>1</sub> and R<sub>2</sub> set the minimum V<sub>ASIC</sub> required for self-operation of the chip; in our case, this minimum was fixed at 1.5V. When this condition is satisfied, a two-stage comparator sets Flag<sub>START</sub> high.
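The effect of R<sub>HYS</sub> can be captured by a behavioral model of a detector with separate rising and falling thresholds: once the flag is set at 1.5V, small dips of the monitored voltage do not clear it until it falls below 1.4V, which is what suppresses ringing. This is an idealized sketch (no delay, no noise) with the two thresholds taken from the text; the class name is hypothetical.

```python
class HysteresisDetector:
    """Flag with a rising threshold v_high and a lower falling threshold v_low."""

    def __init__(self, v_high=1.5, v_low=1.4):
        self.v_high, self.v_low = v_high, v_low
        self.flag = False

    def update(self, v):
        if not self.flag and v >= self.v_high:
            self.flag = True       # set only once the high threshold is reached
        elif self.flag and v < self.v_low:
            self.flag = False      # cleared only below the low threshold
        return self.flag


det = HysteresisDetector()
flags = [det.update(v) for v in (1.45, 1.5, 1.45, 1.39, 1.45)]
print(flags)
```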

Our chip was fabricated in a 40nm CMOS technology with 10V devices and occupies a 0.55mm<sup>2</sup> core area (Fig. 8.8.7). To emulate both periodic and shock excitations, a MIDE piezoelectric generator (PPA1011) with a 5.67g moving mass and a resonant frequency of 75.4Hz was mounted on a shaker. The harvester has an intrinsic capacitance C<sub>P</sub> of 43nF. Figure 8.8.4 shows multiscale experimental waveforms of the interface circuit under shocks, with C<sub>STORE</sub> and C<sub>ASIC</sub> initially discharged. The shocks are applied every second, with accelerations ranging from 5 to 16G. The off-chip inductance L and capacitances C<sub>STORE</sub> and C<sub>ASIC</sub> are 2.2mH, 100μF, and 10μF, respectively. After the first three shocks, which charge C<sub>ASIC</sub> through the cold-start path, the system operates autonomously in its optimized mode, and energy is stored in C<sub>STORE</sub>. For test purposes, when V<sub>STORE</sub> reaches 2.8V, the energy-monitoring block intermittently connects a 1kΩ load resistor to emulate the consumption of a sensor.
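With the component values given above, the duration of the harvesting phase T<sub>3</sub> can be estimated. In an ideal, lossless SECE transfer, V<sub>REC</sub> reaches zero after a quarter of the L-C<sub>P</sub> resonance period, about 15.3μs for L = 2.2mH and C<sub>P</sub> = 43nF, at which point all of $\frac{1}{2}C_PV^2$ is stored in L. This sketch ignores switch resistance, the -14mV ZCD threshold, and comparator delay; the 5V peak voltage in the example and the function name are hypothetical.

```python
import math

L = 2.2e-3    # H, off-chip inductor (from the text)
C_P = 43e-9   # F, intrinsic harvester capacitance (from the text)


def sece_transfer(v_peak, l=L, c_p=C_P):
    """Ideal lossless SECE extraction starting from V_REC = v_peak.

    Returns (t3, i_peak, energy): the duration of the harvesting phase T3
    (a quarter of the L-C_P resonance period), the inductor current when
    V_REC reaches 0, and the energy moved from C_P into L.
    """
    t3 = 0.5 * math.pi * math.sqrt(l * c_p)
    i_peak = v_peak * math.sqrt(c_p / l)   # from 0.5*C_P*V^2 = 0.5*L*I^2
    energy = 0.5 * c_p * v_peak ** 2
    return t3, i_peak, energy


t3, i_peak, energy = sece_transfer(5.0)   # hypothetical 5 V peak
print(f"T3 = {t3 * 1e6:.1f} us, I_peak = {i_peak * 1e3:.1f} mA, E = {energy * 1e9:.0f} nJ")
```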

The power stored in C<sub>STORE</sub> was measured under shock and periodic vibrations for various V<sub>STORE</sub>, as shown in Fig. 8.8.5. From weak to strong shocks, our chip harvested 2.8 to 4.2× more energy than the maximum energy harvested with an on-chip full-bridge-rectifier interface, and it reached an FOM of 3.14 under periodic excitation. Figure 8.8.6 compares the performance of our chip with prior art. We obtained a 1.6× shock-FOM enhancement compared with a previous work [4]. Our system also shows the best FOM under periodic excitation among the SECE interfaces in [1,2]. The measured maximum end-to-end efficiency of our circuit is 94% under periodic excitation at 82μW, the highest end-to-end efficiency compared with former works [1-4]. The measured quiescent current in sleeping mode is 30nA at 1.5V, which allows self-operation of the circuit with an input power as low as 80nW. Using various PEHs, we maintained an efficiency above 70% for input powers below 14mW. Since it is implemented in a 40nm technology, the proposed IC allows harvesting functionality to be added within a microcontroller die.

### Acknowledgments:

This research was, in part, funded by the French Inter-ministerial Fund (FUI), through HEATec project, and by STMicroelectronics.

### References:

- [1] P. Gasnier, et al., "An Autonomous Piezoelectric Energy Harvesting IC Based on a Synchronous Multi-Shot Technique," *IEEE JSSC*, vol. 49, no. 7, pp. 1561–1570, 2014.
- [2] T. Hehn, et al., "A Fully Autonomous Integrated Interface Circuit for Piezoelectric Harvesters," *IEEE JSSC*, vol. 47, no. 9, pp. 2185–2198, 2012.
- [3] D. Kwon, et al., "A Single-Inductor 0.35μm CMOS Energy-Investing Piezoelectric Harvester," *ISSCC*, pp. 78–79, Feb. 2013.
- [4] D. A. Sanchez, et al., "A 4μW-to-1mW Parallel-SSHI Rectifier for Piezoelectric Energy Harvesting of Periodic and Shock Excitations with Inductor Sharing, Cold Start-up and up to 681% Power Extraction Improvement," *ISSCC*, pp. 366–367, Feb. 2016.



Figure 8.8.1: Proposed piezoelectric energy-harvesting-interface overview.



Figure 8.8.2: Waveforms, chronograms, and sequencing associated with the system and its control circuit.

Figure 8.8.3: Dual-mode-comparator schematic, its associated waveforms, and the shock and V<sub>ASIC</sub> detector schematic.

Figure 8.8.4: Measured transient waveforms of the proposed chip under shock excitation.



Figure 8.8.5: Measured harvested-power comparison between our system harvested power and that using an active standalone full-bridge rectifier (FBR) under periodic and shock excitations.

|                               | [1]                 | [2]       | [3]              | [4]                | This Work        | Unit            |
|-------------------------------|---------------------|-----------|------------------|--------------------|------------------|-----------------|
| Technology                    | 350                 | 350       | 350              | 350                | 40               | nm              |
| Chip Size                     | 3.6                 | 1.25      | 2.34             | 0.72               | 0.55             | mm <sup>2</sup> |
| Scheme Type                   | SECE                | SECE      | Energy Investing | SSHI               | SECE             | -               |
| Piezoelectric Harvester       | Murata              | MIDE V22B | MIDE V22B        | MIDE V21B & V22B   | MIDE             | -               |
| C <sub>P</sub>                | 23                  | 19.5      | 15               | 26                 | 43               | nF              |
| Excitation type               | Periodic            | Periodic  | Periodic & Shock | Periodic & Shock   | Periodic & Shock | -               |
| Operation Frequency           | 100                 | 174       | 143              | 225                | 75.4             | Hz              |
| FOM (periodic) <sup>(2)</sup> | ≈170 <sup>(1)</sup> | 206       | 360              | 681                | 314              | %               |
| FOM (shocks) <sup>(3)</sup>   | N/A                 | N/A       | -                | 269                | 420              | %               |
| Cold Startup                  | Yes                 | Yes       | No               | Yes                | Yes              | -               |
| End-to-end Efficiency         | 61                  | 85.3      | 69.2             | ≈88 <sup>(1)</sup> | 94               | %               |
| Input power range             | 10-1000             | 5-500     | -                | 4-1000             | 0.080-14000      | μW              |
| Quiescent current             | 1                   | 0.3       | 0.1              | ≈1 <sup>(1)</sup>  | 0.03             | μA              |

(1) Calculated from the paper

(2) FOM (periodic) = $\frac{\max(P_{out})}{f \cdot V_{oc}^2 \cdot C_p}$

(3) FOM (shocks) = $\frac{\max(P_{out})}{\max(P_{out,BD})}$

Figure 8.8.6: Performances comparison with prior art.
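The two figures of merit defined in the footnotes of Fig. 8.8.6 translate directly into code. We read $P_{out,BD}$ as the output power of the full-bridge (diode) rectifier reference; that reading is our assumption, suggested by the 2.8-to-4.2× comparison with the full-bridge rectifier in the text. The function names are ours.

```python
def fom_periodic(p_out_max, f, v_oc, c_p):
    """FOM (periodic) = max(P_out) / (f * V_oc^2 * C_p), per footnote (2)."""
    return p_out_max / (f * v_oc ** 2 * c_p)


def fom_shock(p_out_max, p_out_bd_max):
    """FOM (shocks) = max(P_out) / max(P_out,BD), per footnote (3)."""
    return p_out_max / p_out_bd_max


# Ratio of the tabulated shock FOMs: this work (420%) vs. [4] (269%).
print(round(420 / 269, 2))
```

With the tabulated shock FOMs, 420% / 269% ≈ 1.56, which is the 1.6× improvement quoted in the text.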

- 1. Energy Monitoring
- 2. Peak Detection
- 3. Current Reference
- 4. ZCD
- 5. NVC
- 6. SD & V<sub>ASIC</sub> Detection
- 7. Drivers
- 8. RCD
- 9. Digital



Figure 8.8.7: Die micrograph.

## 8.9 A Fully Integrated Split-Electrode Synchronized-Switch-Harvesting-on-Capacitors (SE-SSH) Rectifier for Piezoelectric Energy Harvesting with Between 358% and 821% Power-Extraction Enhancement

Sijun Du, Ashwin A. Seshia

University of Cambridge, Cambridge, United Kingdom

Along with the development of the Internet of Everything (IoE), miniaturized piezoelectric vibration-energy harvesters have attracted significant recent interest as a means of harvesting ambient kinetic energy to power wireless sensors. As the energy generated by a piezoelectric transducer (PT) cannot be used directly, an interface circuit is needed to rectify the generated power and provide a stable supply. Full-bridge rectifiers (FBRs) are widely used owing to their simplicity, despite their low energy efficiency. Recently, various interface circuits, such as the SSHI (synchronized switch harvesting on inductor) rectifier, have been reported [1-5] to improve power efficiency. However, most of these circuits require large inductors to achieve good performance, and these inductors significantly increase the system volume, counter to the requirement for miniaturization. Although a flipping-capacitor rectifier was proposed in [2] to flip voltages using on-chip capacitors, it was designed for high-frequency (>100kHz) ultrasonic energy-transfer applications and does not work with PTs that have a large internal capacitance $C_p$, since the required capacitor values are too large for on-chip implementation. Another inductorless circuit, named SSHC (synchronized switch harvesting on capacitors), was recently proposed in [1] (Fig. 8.9.1); however, its switched-capacitor (SC) values must equal $C_p$ to achieve optimal performance, which limits on-chip implementation for PTs with a large $C_p$.

In this paper, an inductorless, fully integrated split-electrode (SE) SSHC rectifier is proposed. It is used with a MEMS PT whose electrode is split into $N$ regions ($N=4$ in this work), which reduces the SC value required for optimal SSHC performance to $C_p/N^2$. The split-electrode design enables on-chip implementation of the SCs even if the $C_p$ of a PT is high, and hence enables a fully integrated implementation. Figure 8.9.1 shows the split-electrode PT, which consists of a single AlN-on-Si micromachined proof mass with the electrode layer split into four equal regions. As the electrodes are located on a common cantilever, the signals generated by the four regions during vibration have similar amplitudes, frequencies, and phases; hence, they can be electrically connected in parallel or in series. An unsplit electrode is equivalent to the four regions connected in parallel, with a total capacitance denoted $C_p$. When the four regions are connected in series, the effective capacitance of the PT becomes $C_p/16$, which is significantly smaller than that of the corresponding monolithic-electrode configuration. This series configuration can therefore be used during voltage flipping so that the required SC values are reduced by a factor of 16.
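The factor of 16 follows from elementary series/parallel combination: each of the $N$ regions holds $C_p/N$, and $N$ of them in series give $(C_p/N)/N = C_p/N^2$. The sketch below assumes identical regions and no parasitics; with the measured $C_p = 1.94$nF it predicts about 0.121nF for the series connection, somewhat below the 0.155nF measured value reported for the fabricated device. The function name is ours.

```python
def split_electrode_capacitance(c_p, n, series):
    """Capacitance of a PT electrode split into n equal regions of c_p/n each.

    c_p is the capacitance of the unsplit (all-parallel) electrode.
    """
    c_region = c_p / n
    return c_region / n if series else c_region * n


C_P = 1.94e-9   # measured parallel capacitance (from the text)
print(split_electrode_capacitance(C_P, 4, series=True))   # ideal C_p/16
```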

Figure 8.9.2 shows a micrograph of the custom MEMS PT, which is fabricated in a custom MEMS process with a thick (3μm) AlN (aluminum nitride) layer. The measured capacitance for the parallel connection is $C_p=1.94\text{nF}$; when the four regions are connected in series, the resulting capacitance of the PT is $0.155\text{nF}$. Therefore, the SCs in the SSHC rectifier must equal $0.155\text{nF}$ for optimal performance. The plot in the right-hand corner shows the theoretical voltage-flip efficiency for different numbers of SCs employed in an SSHC rectifier; in this implementation, up to 8 SCs can be employed. Figure 8.9.2 also shows the connection-configuration cycle when the voltage across the PT ($V_{PT}$) is flipped from positive to negative. When voltage flipping begins, the four electrode regions are connected in series. Seventeen pulses are then generated sequentially (assuming 8 SCs are employed), in the order $\phi_{1p}, \phi_{2p}, \phi_{3p}, \phi_{4p}, \phi_{5p}, \phi_{6p}, \phi_{7p}, \phi_{8p}, \phi_0, \phi_{8n}, \phi_{7n}, \phi_{6n}, \phi_{5n}, \phi_{4n}, \phi_{3n}, \phi_{2n}, \phi_{1n}$, to flip the voltage in three steps: sequentially storing charge in the 8 SCs with the first 8 pulses, clearing the remaining charge in the PT with the middle pulse $\phi_0$, and sequentially recovering the charge from the 8 SCs to bias the PT in the opposite polarity with the last 8 pulses. When the voltage needs to be flipped from negative to positive, the sequence of the 17 pulses is reversed. After the voltage flip is complete, the four electrode regions are reconfigured into a parallel connection.
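The benefit of adding SCs can be explored with an idealized charge-sharing model of the flip sequence: the $k$ SCs are connected one by one to the PT in the same polarity (storing), the PT is cleared by $\phi_0$, and the SCs are then reconnected in reverse order with opposite polarity (recovering). The model below is our own simplification: lossless switches, every SC equal to the series PT capacitance (the matched case), the PT normalized to 1 before each flip, and SC charge carried from one flip to the next until a steady state is reached. In this model the steady-state ratio $|V_{after}|/|V_{before}|$ works out to $k/(k+2)$; we have not checked it against the theoretical curve plotted in Fig. 8.9.2. The function name is hypothetical.

```python
def sshc_flip_ratio(k, flips=1000):
    """Steady-state |V_after| / |V_before| of an idealized k-SC SSHC voltage flip.

    Equal capacitances mean each connection simply averages the two voltages.
    SC voltages are kept in the frame of the PT polarity at the start of each flip.
    """
    sc = [0.0] * k                     # SC voltages, initially discharged
    v = 0.0
    for _ in range(flips):
        v = 1.0                        # normalized PT voltage before the flip
        for i in range(k):             # store: phi_1 .. phi_k, same polarity
            v = sc[i] = (v + sc[i]) / 2
        v = 0.0                        # phi_0 clears the remaining PT charge
        for i in reversed(range(k)):   # recover: phi_k .. phi_1, opposite polarity
            v = sc[i] = (v + sc[i]) / 2
    return v                           # flipped PT voltage, in the new polarity


for k in (1, 2, 4, 8):
    print(k, round(sshc_flip_ratio(k), 4))
```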

The system architecture of the implemented SE-SSH interface is shown in Fig. 8.9.3. The four regions of the PT are connected to the system via a connection-configuration block, which connects them in parallel when the signal $PARA$ is high and in series otherwise. A zero-crossing detection block generates a synchronous signal, $SYN$, at each zero-crossing instant of the current generated by the PT. Sequential pulse signals are then generated at each rising edge of $SYN$ in the pulse-generation block, and these pulses are sequenced in the subsequent sequencing block according to the signal $PN$, which represents the polarity of $V_{PT}$. The number of generated pulse signals depends on the number of enabled SCs: for $k$ enabled SCs, a total of $2k+1$ pulses are generated and sequenced. In this implementation, $k$ can be set up to 8, since there are 8 on-chip SCs; hence, up to 17 pulses are generated. These sequenced pulses drive the switches in the switch-control block to flip the voltage across the PT. While the pulses are generated, $PARA$ is set low to force a series connection of the four electrode regions for capacitance matching. The switch-control block contains all the switches and on-chip SCs that perform the voltage flip. While the four PT regions are connected in series during the voltage flip, the $PARA$ signal is used in this block to disconnect the FBR from the system. An on-chip voltage regulator generates a 1.5V DC supply so that the system is self-sustained. Fig. 8.9.3 also shows the measured waveforms with 1 and 8 SCs enabled, as well as zoomed-in waveforms at the voltage-flip instants. The spikes at the voltage-flip instants show the fourfold increase in $V_{PT}$ caused by the series connection of the four electrode regions.
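The pulse ordering for $k$ enabled SCs can be written as a small generator. It assumes that the enabled SCs are SC1 to SCk and uses hypothetical string labels for the phases; in the chip, the sequencing is done in hardware by the pulse-sequencing block.

```python
def pulse_sequence(k, pn):
    """Order of the 2k+1 flip pulses for k enabled SCs (assumed to be SC1..SCk).

    pn=True: flip from positive to negative (phi_1p..phi_kp, phi_0, phi_kn..phi_1n);
    pn=False: the same sequence reversed.
    """
    if not 1 <= k <= 8:
        raise ValueError("k must be between 1 and 8 (8 on-chip SCs)")
    seq = [f"phi{i}p" for i in range(1, k + 1)]      # store charge in SC1..SCk
    seq.append("phi0")                               # clear the remaining PT charge
    seq += [f"phi{i}n" for i in range(k, 0, -1)]     # recover from SCk..SC1
    return seq if pn else seq[::-1]


print(pulse_sequence(2, True))
print(len(pulse_sequence(8, True)))
```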

Figure 8.9.4 shows circuit implementations of three blocks. The connection configuration block connects the four regions of the PT to the CMOS interface in parallel or in series according to the signal $PARA$. The switches in this block use thick-oxide transistors to tolerate voltages as high as 25V, since the voltage across the PT is 4 times higher when the regions are connected in series. The zero-crossing detection block employs two continuous-time comparators and a D flip-flop to generate the synchronous signal $SYN$ and a voltage polarity signal $PN$, with waveforms as shown for $V_{PT} = V_{P4} - V_{N1}$. The reference voltage $V_{ref}$ is set slightly higher than $-V_D$ to detect the moment when either $V_{P4}$ or $V_{N1}$ begins to rise from $-V_D$, where $V_D$ is the forward voltage drop of the diodes used in the FBR. This is also the moment when the current generated by the PT crosses zero and $V_{PT}$ needs to be flipped. At this instant, a rising edge is generated in $SYN$. If $V_{PT}$ needs to be flipped from positive to negative, $PN$ is high; otherwise, it is low. The pulse generation block employs 17 pulse cells to generate up to 17 sequential pulses, whose width can be adjusted externally. The 8 SCs can be selectively enabled by signals $EN_1$ to $EN_8$, and signal $EN_0$ enables the $\phi_0$ switch.

The pulse-sequencing block (Fig. 8.9.5) consists of 8 multiplexers that sequence the (up to) 17 signals according to the level of $PN$. These signals are shifted to a higher level by an array of level shifters to fully turn the switches ON and OFF. The switch-control block consists of 33 analog switches and 8 on-chip SCs implemented as capacitor arrays. The capacitance of the 8 SCs can be adjusted externally according to the capacitance of the PT to achieve optimal voltage-flip performance. This block also disconnects the on-chip FBR from the system while the voltage flip is being performed. The FBR is implemented using four on-chip Schottky diodes, and the forward voltage drop of the diodes, $V_D$, is measured to be around 0.2V.

The chip was fabricated in a 0.18μm HV CMOS process. During the measurements, the MEMS PT was excited at its resonant frequency (219Hz) on a shaker driven by a sinusoidal signal. The measured output power under weak excitation (open-circuit zero-to-peak amplitude generated by the PT: $V_{OC} = 2.5\text{V}$) and over a wide range of excitation levels (up to $V_{OC} = 22\text{V}$, equivalent to a 3.4g acceleration level) is depicted in Fig. 8.9.6. The performance improvement of the IC compared to an FBR is up to 8.2× under weak excitation, and the chip has been tested to output up to 186μW at $V_{OC} = 22\text{V}$. The measurements demonstrate a significant performance improvement and a record output power for a MEMS PT relative to the prior art, as shown in the accompanying table in Fig. 8.9.6. A die micrograph is shown in Fig. 8.9.7.

### Acknowledgements:

The authors thank the UK Engineering and Physical Sciences Research Council (EPSRC) (Grant Numbers: EP/L010917/1 and EP/N021614/1) for financial support.

### References:

- [1] S. Du and A. Seshia, "An Inductorless Bias-Flip Rectifier for Piezoelectric Energy Harvesting," *IEEE JSSC*, vol. 52, pp. 2746-2757, Oct. 2017.
- [2] Z. Chen, et al., "A 1.7mm<sup>2</sup> Inductorless Fully Integrated Flipping-Capacitor Rectifier (FCR) for Piezoelectric Energy Harvesting with 483% Power-Extraction Enhancement," *ISSCC*, pp. 372-373, Feb. 2017.
- [3] D. Sanchez, et al., "A 4uW-to-1mW Parallel-SSH Rectifier for Piezoelectric Energy Harvesting of Periodic and Shock Excitations with Inductor Sharing, Cold Start-up and up to 681% Power Extraction Improvement," *ISSCC*, pp. 366-367, Feb. 2016.
- [4] D. Kwon and G. Rincón-Mora, "A Single-Inductor 0.35μm CMOS Energy Investing Piezoelectric Harvester," *ISSCC*, pp. 78-79, Feb. 2013.
- [5] Y. Ramadas and A. Chandrakasan, "An Efficient Piezoelectric Energy-Harvesting Interface Circuit Using a Bias-Flip Rectifier and Shared Inductor," *ISSCC*, pp. 296-297, Feb. 2009.



**Figure 8.9.1:** Circuit diagrams showing the operation of the Switched Synchronized Harvesting-on-Capacitor (SSHC) circuit, and the specific implementation for the split electrode configuration.



**Figure 8.9.2:** The design of a custom MEMS PT with a custom process for higher output power, and a schematic description of the connection configuration cycle during voltage flip.



**Figure 8.9.3:** System architecture and measured waveforms while enabling 1 capacitor and 8 capacitors, respectively.



**Figure 8.9.4:** Circuit schematics of the connection configuration block, zero-crossing block, and pulse generation block, and waveforms corresponding to the zero-crossing block.



**Figure 8.9.5:** Schematics for the pulse sequencing block, level shifters, and switch control block.



(a) Power improvement compared to an FBR.  $P_{IC}$  is the power with ICs and  $P_{FBR}$  is the power with FBRs.  
(b) Varies based on the number of switched capacitors enabled (from 1 to 8)

**Figure 8.9.6:** Measured performance and comparison table with prior art.



Figure 8.9.7: Die micrograph.

## 8.10 A 13.56MHz Time-Interleaved Resonant-Voltage-Mode Wireless-Power Receiver with Isolated Resonator and Quasi-Resonant Boost Converter for Implantable Systems

Se-Un Shin<sup>1</sup>, Minseong Choi<sup>1</sup>, Seok-Tae Koh<sup>1</sup>, Yujin Yang<sup>1</sup>, Seungchul Jung<sup>2</sup>, Young-Hoon Sohn<sup>1</sup>, Se-Hong Park<sup>1</sup>, Yongmin Ju<sup>1</sup>, Youngsin Jo<sup>1</sup>, Yeunhee Huh<sup>1</sup>, Sungwon Choi<sup>1</sup>, Sang Joon Kim<sup>2</sup>, Gyu-Hyeong Cho<sup>1</sup>

<sup>1</sup>KAIST, Daejeon, Korea

<sup>2</sup>Samsung Advanced Institute of Technology, Suwon, Korea

Wireless power transfer (WPT) has been widely adopted in applications such as biomedical implants and wireless sensors. A conventional voltage-mode receiver (VM-RX) uses a rectifier or a doubler for AC-DC conversion [1,2]. Because of the limited voltage conversion efficiency, this requires a sufficiently large input power ($P_{IN}$) to induce a large voltage ($V_{AC}$) in the LC tank of the receiver (RX). An additional DC-DC converter is also required for voltage regulation or battery charging, which reduces the overall power-conversion efficiency (PCE) because of the two-stage structure. To overcome these limitations, the resonant current-mode receiver (RCM-RX) has been proposed for direct battery charging [3] and voltage regulation [4,5]. The RCM-RX has two operation phases: a resonance phase ($PH_{RE}$), which accumulates energy in the LC tank over an optimal number of resonant cycles ($N_{OPT}$) to track the maximum efficiency [3], and a charging phase ($PH_{CH}$), which delivers the energy of the LC tank to the output when the resonant current ($I_{AC}$) is at its peak. However, the RCM-RX typically operates at a low resonant frequency $f_{RESO}$ (50kHz to 1MHz) because the intrinsic delay and offset of the comparator in the peak-timing detector make it difficult to detect the peak timing of $I_{AC}$ accurately. Operating at low $f_{RESO}$ increases the coil size, which is a burden for size-constrained implants. In addition, the LC tank of the RCM-RX stops resonating during $PH_{CH}$; this resonance-loss interval hinders optimal power transfer from the transmitter (TX) to the RX because the reactive impedance is not cancelled out but appears on the TX side. Because the LC tank and the output are not isolated during $PH_{CH}$, the power-transfer efficiency (PTE) can also be affected by load variation, such as variation of the battery voltage ($V_{BAT}$). These problems worsen as $N_{OPT}$ decreases.
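The sensitivity to comparator delay can be made concrete with one line of arithmetic: a fixed detection delay $t_d$ shifts the switching instant by a phase of $360^\circ \cdot f_{RESO} \cdot t_d$. The sketch below assumes a hypothetical 5ns total delay, which is not a value reported in the paper, and evaluates it at the edges of the RCM-RX range and at 13.56MHz.

```python
def timing_phase_error_deg(f_reso_hz, delay_s):
    """Phase error (degrees) caused by a fixed detector delay at a resonant frequency."""
    return 360.0 * f_reso_hz * delay_s


T_D = 5e-9  # hypothetical total comparator delay, not from the paper
for f in (50e3, 1e6, 13.56e6):
    print(f"{f / 1e6:6.2f} MHz -> {timing_phase_error_deg(f, T_D):6.2f} deg")
```

The same delay that is negligible at 1MHz becomes a sizable fraction of a cycle at 13.56MHz, which is why a detector whose timing does not depend on comparator delay matters at high $f_{RESO}$.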

To address the above issues, this paper proposes a resonant voltage-mode receiver (RVM-RX) that can operate at a high $f_{RESO}$ of 13.56MHz with a small RX coil for direct battery charging. The LC tank in the RVM-RX is always configured and isolated from the output, enabling continuous optimal power transfer regardless of the operation phase and $V_{BAT}$ level.

Figure 8.10.1 shows the top-level structure of the RVM-RX, with two capacitors ($C_{2A}$, $C_{2B}$), a coil ($L_2$), and an inductor ($L_{EX}$). Two switches ($S_1$, $S_2$) separate the RVM-RX into two configurations: an LC tank and a quasi-resonant boost converter (QRBC), where the capacitor $C_{2A}$ (or $C_{2B}$) is connected to $L_2$ (or $L_{EX}$) via switch $S_1$ (or $S_2$). The RVM-RX uses a voltage-peak timing detector (VPTD). When the peak timing of $V_{AC}$ is found in the LC tank after $N_{OPT}$ resonant cycles (tracking the maximum efficiency, similar to [3]), this peak voltage is stored in $C_{2A}$ or $C_{2B}$, which is then connected to the QRBC. The QRBC extracts all of the capacitor energy by finding the optimal duty cycle (D) through the optimal duty-cycle tracker (ODT).

In the conventional VM-RX, the PTE is significantly affected by the equivalent resistance of the output load [1,2]. The PTE of the RCM-RX is also affected by load variation during $PH_{CH}$. In contrast, the RVM-RX uses a time-interleaving scheme to avoid such load dependency and charges the battery independently of the $V_{BAT}$ level. Figure 8.10.2 shows the operation of the RVM-RX. In $\phi_1$, the LC tank is formed with $C_{2A}$ and the QRBC with $C_{2B}$ for $N_{OPT}$ resonant cycles, and vice versa in $\phi_2$. Owing to this scheme, the LC tank is isolated from the output and resonates continuously. Therefore, the PTE can be maintained at its optimum regardless of $V_{BAT}$ variation. Moreover, the RVM-RX enables optimal power transfer with the reactive impedance seen by the TX cancelled out at all times.

Unlike the RCM-RX, the RVM-RX adopts a passive zero-current detection method that uses the characteristic of a diode to sense the peak timing of $V_{AC}$ accurately in one shot, enabling operation at high $f_{RESO}$ without any trimming. Figure 8.10.3 shows the circuit of the VPTD and its operation with the switches $S_1$ and $S_2$. The switch configuration of $S_1$ ($S_2$) is shown in Fig. 8.10.3 (right middle). All operations of the VPTD are based on the recovered clock (CLKR), generated from the capacitor voltages $V_{C2A}$ and $V_{C2B}$. First, when resonance starts, the sub-switch $M_{11}$ of $S_1$ is turned on to form an LC tank with $L_2$ and $C_{2A}$. The energy of the LC tank then accumulates as the resonance cycles repeat, generating CLKR while $V_{AC}$ increases. The diode-mode signal (DM) goes high (DM=1) when the CLKR count exceeds $N_{OPT}$. When DM=1, $M_{11}$ changes from the ON state to diode mode ($\phi_{G4B}$). At the peak of $V_{AC}$, $I_{AC}$ becomes zero and the diode turns off, causing $V_{AC}$ to drop sharply due to parasitic resonance between $L_2$ and the parasitic capacitance of the $V_{AC}$ node. Using this phenomenon, the peak timing is sensed quickly from the falling edge of $V_{AC}$. At this time, the phase changes from $\phi_1$ to $\phi_2$ and DM goes low (DM=0). In addition, the sub-switch $M_{21}$ of $S_1$ is turned on to form the QRBC, and the LC tank is reconfigured with $L_2$ and $C_{2B}$ through $M_{12}$ of $S_2$. After the same procedure is repeated with $S_2$, the phase changes from $\phi_2$ back to $\phi_1$, and so on, as in operation phases (1) to (4) in Fig. 8.10.3. Meanwhile, the switching loss is reduced by multilevel gate driving ($\phi_{G1}$ to $\phi_{G5}$) at the phase transitions between $\phi_1$ and $\phi_2$, and a negative envelope holder generates $V_L$ to turn off $S_1$ or $S_2$. A $V_{TH}$ cancellation scheme using a capacitor ($C_D$) is also applied to reduce the conduction loss due to the diode forward voltage when DM=1 ($\phi_{G4A}$, $\phi_{G4B}$).
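The principle behind the passive detector is that, in an ideal LC tank, the capacitor current $C\,dV_{AC}/dt$ reaches zero exactly at the voltage peak, so the instant at which the forward current stops (and the diode-mode switch turns off) marks the peak. The sketch below checks this on a sampled ideal sinusoid. The sample rate, amplitude, and capacitance are arbitrary assumptions, and the function is a behavioral stand-in, not a model of the VPTD circuit.

```python
import math


def zero_current_instant(f_hz, c_farad, v_amp, fs_hz, n_samples):
    """First sample time at which the tank capacitor current turns negative.

    Ideal tank voltage v(t) = v_amp * sin(2*pi*f*t), so i(t) = C dv/dt.
    A forward-conducting diode would stop conducting at this instant.
    """
    w = 2 * math.pi * f_hz
    prev_i = None
    for n in range(n_samples):
        t = n / fs_hz
        i = c_farad * v_amp * w * math.cos(w * t)
        if prev_i is not None and prev_i > 0 >= i:
            return t
        prev_i = i
    return None


f = 13.56e6        # resonant frequency used in the paper
fs = 100 * f       # assumed sampling rate for this sketch
t_det = zero_current_instant(f, 1e-9, 1.0, fs, 200)
print(f"detected {t_det * 1e9:.2f} ns, ideal peak at {0.25 / f * 1e9:.2f} ns")
```

The detected instant lands within one sample of the quarter-period voltage peak; in the circuit, this zero-current instant is sensed directly by the diode instead of being sampled.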

Once  $C_2$  ( $C_{2A}$  or  $C_{2B}$ ) with stored energy is connected to the QRBC, it is necessary to find the optimal D that can extract all of the energy, as shown in Fig. 8.10.4. The one-shot pulses from the falling edges of both  $\phi_1$  and  $\phi_2$  are provided to the delay line, and D is generated using this delayed signal in the ODT. At each phase, the capacitor voltage checker (CVC) in the QRBC determines whether the energy of  $C_2$  is well extracted by providing a selection signal (Sel) after zero-current detection for  $S_4$ . If Sel is 0, D is increased in the next period (UP). Conversely, if Sel is 1, D is decreased in the next period (DN), and this is repeated to form the digital feedback shown in the timing diagram in Fig. 8.10.4. In the RCM-RX approach, the small inductance of the coil increases the peak of the battery charging current as  $P_{IN}$  increases, resulting in lower efficiency and stress on the battery. In contrast, the QRBC can significantly reduce this peak current, allowing efficient and stable power delivery to the output even when  $P_{IN}$  is increased.
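The ODT loop behaves as a bang-bang (up/down) tracker. The sketch below assumes a hypothetical plant in which the CVC returns Sel=1 once D has reached an unknown optimum `d_opt` and Sel=0 below it; the step size, starting point, and `track_duty` name are also assumptions. After the initial slew, D dithers within one step of `d_opt`, which is the expected steady-state behavior of such a loop.

```python
def track_duty(d_opt, d_init=0.30, step=0.01, periods=100):
    """Bang-bang duty-cycle tracker (behavioral sketch, hypothetical plant).

    Assumed CVC behavior: Sel = 1 once D >= d_opt (energy fully extracted),
    Sel = 0 while D is still too small.
    """
    d = d_init
    history = []
    for _ in range(periods):
        sel = 1 if d >= d_opt else 0
        # Sel = 0 -> UP (increase D); Sel = 1 -> DN (decrease D)
        d = d + step if sel == 0 else d - step
        d = min(max(d, 0.0), 1.0)  # keep D a valid duty cycle
        history.append(d)
    return history


trace = track_duty(d_opt=0.55)
print([round(x, 2) for x in trace[-6:]])
```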

The RVM-RX was fabricated in a 0.18μm 1P4M CMOS process. The left side of Fig. 8.10.5 shows that, in steady state at 13.56MHz, the RVM-RX delivers power to the output with a low peak current through the QRBC and senses the peak timing of $V_{AC}$ accurately. When $N_{OPT}$ is 2, 3, and 4, $P_{IN}$ is 23mW, 11mW, and 9.5mW, respectively. In particular, when $N_{OPT}$ is 2, $I_L$ operates in continuous-conduction mode (CCM). In addition, the right side of Fig. 8.10.5 shows waveforms in which the QRBC with a small 4.7μH inductor finds the optimum D through the ODT and settles in steady state, with a VPTD accuracy of up to 98% with 5 samples. Figure 8.10.6 shows the PCE versus $P_{IN}$ and a comparison table with the RCM-RX. The peak efficiencies of the RX and the QRBC are 67.8% and 95%, respectively. Since the LC tank is always configured and isolated from the output, optimal power transfer is possible regardless of the operation phase and $V_{BAT}$ level. Moreover, the RVM-RX has efficiency comparable to that of the RCM-RX despite its high $f_{RESO}$ of 13.56MHz, which enables a smaller RX coil. The die micrograph is shown in Fig. 8.10.7.

### References:

- [1] H.-M. Lee and M. Ghovanloo, "An Adaptive Reconfigurable Active Voltage Doubler/Rectifier for Extended-Range Inductive Power Transmission," ISSCC, pp. 286-287, Feb. 2012.
- [2] M. Kiani, et al., "A Power-Management ASIC with Q-Modulation Capability for Efficient Inductive Power Transmission," ISSCC, pp. 226-227, Feb. 2015.
- [3] M. Choi, et al., "A Current-Mode Wireless Power Receiver with Optimal Resonant Cycle Tracking for Implantable Systems," ISSCC, pp. 372-373, Feb. 2016.
- [4] H. Gougheri and M. Kiani, "Current-Based Resonant Power Delivery with Multi-Cycle Switching for Extended-Range Inductive Power Transmission," IEEE TCAS-I, vol. 63, pp. 1543-1552, Sept. 2016.
- [5] H. Gougheri and M. Kiani, "Adaptive Reconfigurable Voltage/Current-Mode Power Management with Self-Regulation for Extended-Range Inductive Power Transmission," ISSCC, pp. 374-375, Feb. 2017.



Figure 8.10.1: Overall structure of the RVM-RX.



Figure 8.10.2: Operation principle of the RVM-RX.



Figure 8.10.3: Operation principle and circuit implementation of the voltage-peak timing detector.



Figure 8.10.4: Operation principle and timing diagram of the quasi-resonant boost converter with the optimal duty-cycle tracker.



Figure 8.10.5: Measurement results.



Figure 8.10.6: Efficiency graph and comparison table.



Figure 8.10.7: Die micrograph.

# Session 9 Overview: *Wireless Transceivers and Techniques*

## WIRELESS SUBCOMMITTEE

**Session Chair:** *Alan Wong, EnSilica, Abingdon, United Kingdom*

**Associate Chair:** *Xin He, NXP, Eindhoven, The Netherlands*

**Subcommittee Chair:** *Stefano Pellerano, Intel, Hillsboro, OR*

Wireless technologies continue to penetrate and support a wide range of application areas. This session includes state-of-the-art wireless transceivers for automotive radar, synthetic-aperture imaging radar, RF-to-bits cellular base stations, and 60GHz dual-polarization MIMO. Furthermore, techniques to enhance wireless performance are presented, including a full-duplex self-interference-cancellation FDD transceiver, an automatic tracking 3f<sub>LO</sub>-suppression notch-filter technique for LTE HPUE, and a high-efficiency outphasing PA using a triaxial balun combiner.

8:30 AM

**9.1 A Multimode 76-to-81GHz Automotive Radar Transceiver with Autonomous Monitoring**
*B. P. Ginsburg, Texas Instruments, Dallas, TX*

In Paper 9.1, Texas Instruments describes an RF-to-bits 3TX/4RX automotive radar transceiver operating at 76 to 81GHz implemented in 45nm CMOS. The 10dBm TX has binary and linear phase modulation for MIMO and beamforming; a 15MHz BW I/Q RX has <18dB NF and a PLL chirp generator delivering <-91dBc/Hz phase noise with 4GHz ramps at up to 100MHz/μs.

9:00 AM

**9.2 A 253mW/Channel 4TX/4RX Pulsed Chirping Phased-Array Radar TRX in 65nm CMOS for X-Band Synthetic-Aperture Radar Imaging**
*L. Lou, Nanyang Technological University, Singapore*

In Paper 9.2, Nanyang Technological University describes a 4×4 phased-array transceiver implemented in 65nm CMOS for airborne/spaceborne Synthetic Aperture Radar Imaging reporting <30cm resolution. The TX has 14.7dBm output power, <2dB ripple, 1GHz chirp bandwidth and 1-to-10MHz/μs tunable chirp rate while the RX attains -37dBm IP<sub>1dB</sub> and 7.1-to-7.9dB NF.

9:30 AM

**9.3 A Highly Reconfigurable 65nm CMOS RF-to-Bits Transceiver for Full-Band Multicarrier TDD/FDD 2G/3G/4G/5G Macro Basestations**
*D. J. McLaurin, Analog Devices, Raleigh, NC*

In Paper 9.3, Analog Devices describes a wideband 65nm CMOS direct-conversion 2RX/2TX/1FBRX RF-to-bits basestation transceiver with operating bandwidths of 200/450/450MHz for 2G/3G/4G/5G macro basestations. For FDD systems the TRX achieves >90dB in-band SFDR for GSM blocker scenarios and <-85dBc non-IM3 in-band TX emissions.



10:15 AM

**9.4 A 40Gb/s 6pJ/b RX Baseband in 28nm CMOS for 60GHz Polarization MIMO**
*C. Thakkar, Intel, Hillsboro, OR*

In Paper 9.4, Intel describes a 60GHz receiver baseband that supports dual polarization MIMO. The work presents a mixed-signal 384-coefficient FFE and 2400-coefficient DFE-based area-efficient polarizer-equalizer, integrated with CDR. The prototype supports up to 40Gb/s LOS 16QAM and 14Gb/s NLOS QPSK and is implemented in 28nm CMOS.



10:45 AM

**9.5 A 27.8Gb/s 11.5pJ/b 60GHz Transceiver in 28nm CMOS with Polarization MIMO**
*S. Daneshgar, Intel, Hillsboro, OR*

In Paper 9.5, Intel describes a 60GHz dual-polarization MIMO transceiver. It utilizes orthogonal polarization modes to enable simultaneous independent data streams. Using two TX/RX with a single PCB antenna, the work demonstrates over-the-air 2x13.9Gb/s 16QAM.



11:15 AM

**9.6 A 120Gb/s 16QAM CMOS Millimeter-Wave Wireless Transceiver**
*K. K. Tokgoz, Tokyo Institute of Technology, Tokyo, Japan*

In Paper 9.6, the Tokyo Institute of Technology presents a wideband transceiver achieving a 120Gb/s 16QAM data rate. It simultaneously up/downconverts two 15GHz signal channels located at 70GHz and 105GHz. The 70GHz and 105GHz LO signals are generated by a doubler and a tripler from an external 35GHz source.



11:30 AM

**9.7 A Broadband and Deep-TX Self-Interference Cancellation Technique for Full-Duplex and Frequency-Domain-Duplex Transceiver Applications**
*K-D. Chu, University of Washington, Seattle, WA*

In Paper 9.7, the University of Washington describes a full duplex transceiver with integrated electrical balance duplexer. Together with two feedforward self-interference-cancellation signal paths, it achieves cancellation of 70dBc (40MHz BW) and 65dBc (80MHz BW), with an RX NF degradation of 1.6dB at 12.5dBm PA output power at around 1.8GHz.



11:45 AM

**9.8 A 1.4-to-2.7GHz High-Efficiency RF Transmitter with an Automatic 3F<sub>LO</sub>-Suppression Tracking-Notch-Filter Mixer Supporting HPUE in 14nm FinFET CMOS**
*Q. Liu, Samsung Electronics, Suwon, Korea*

In Paper 9.8, Samsung Electronics presents a 1.4-to-2.7GHz transmitter with an automatically tracking 3f<sub>LO</sub> notch filter at the mixer output to lower CIM3. At 5.1dBm output for LTE PC2 HPUE, it achieves -54.4dBc CIM3 while consuming 136.8mW DC power.



12:00 PM

**9.9 A High-Efficiency 28GHz Outphasing PA with 23dBm Output Power Using a Triaxial Balun Combiner**
*B. Rabet, University of California, San Diego, La Jolla, CA*

In Paper 9.9, the University of California, San Diego describes a 28GHz outphasing PA using a triaxial balun as a low-loss Chireix combiner. The SiGe PA reaches a saturated output power P<sub>sat</sub> of 23dBm from a 4V supply, with a peak PAE of 41% at 21dBm and a PAE of 34.7% at 6dB backoff.

## 9.1 A Multimode 76-to-81GHz Automotive Radar Transceiver with Autonomous Monitoring

Brian P. Ginsburg<sup>1</sup>, Karthik Subburaj<sup>2</sup>, Sreekiran Samala<sup>1</sup>, Karthik Ramasubramanian<sup>2</sup>, Jasbir Singh<sup>2</sup>, Sumeer Bhataria<sup>2</sup>, Sriram Murali<sup>2</sup>, Dan Breen<sup>1</sup>, Meysam Moallem<sup>1</sup>, Krishnanshu Dandu<sup>1</sup>, Saket Jalan<sup>2,3</sup>, Neeraj Nayak<sup>1</sup>, Rittu Sachdev<sup>2</sup>, Indu Prathapan<sup>2</sup>, Karan Bhatia<sup>1</sup>, Tim Davis<sup>1</sup>, Eunyoung Seok<sup>1</sup>, Harikrishna Parthasarathy<sup>2</sup>, Rohit Chatterjee<sup>2</sup>, Venkatesh Srinivasan<sup>1</sup>, Vito Giannini<sup>1,4</sup>, Anil Kumar<sup>2</sup>, Ross Kulak<sup>1</sup>, Shankar Ram<sup>2</sup>, Pankaj Gupta<sup>2</sup>, Zahir Parkar<sup>2</sup>, Sachin Bhardwaj<sup>2</sup>, Rakesh Y.C.<sup>2</sup>, Rajagopal K. A.<sup>2</sup>, Arun Shrimali<sup>2</sup>, Vijay Rentala<sup>1</sup>

<sup>1</sup>Texas Instruments, Dallas, TX

<sup>2</sup>Texas Instruments, Bangalore, India

<sup>3</sup>Intel, Bangalore, India

<sup>4</sup>Uhnder, Austin, TX

Radar is a key sensing technology for advanced driver-assistance systems and autonomous vehicles because of its strong detection capability, long range, and robustness to environmental variations such as inclement weather and lighting extremes. As these radars achieve increasing levels of integration and performance [1,3], it is desirable to have a single multimode radar transceiver that can address both the stringent form-factor constraints of corner radars and the wide and narrow field-of-view requirements of front radars used in urban and highway driving, respectively. This paper presents a single-chip 76-to-81GHz radar transceiver built in a 45nm CMOS technology, which uses frequency-modulated continuous-wave (FMCW) synthesis, 3 transmitters, and 4 receivers with integrated ADCs. It achieves high resolution and flexible multimode operation to address all classes of short-, medium-, and long-range radar. The design also features autonomous fault monitoring of the RF chain to support system-level functional safety.

Figure 9.1.1 shows the chip block diagram, and Fig. 9.1.2 shows the clock system. The fast-chirp FMCW signal is generated in a closed-loop 19-to-20.25GHz fractional-N frequency synthesizer, which uses a high-frequency reference clock from an integer-N clean-up PLL to suppress its quantization noise. The synthesizer output is multiplied by four and then distributed to the TX and RX channels. For MIMO radar use cases, each transmitter supports independent binary phase modulation and on/off keying. Further, each TX has a linear phase shifter with a 6° phase step for beamforming in long-range applications. A four-way power-combined PA typically delivers 12dBm to a grounded CPW waveguide on the PCB.
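One common way to use per-TX binary phase modulation for MIMO is Hadamard coding across chirps, sketched below. This is an assumption for illustration; the paper does not specify the coding scheme. With 3 TXs, each TX takes one column of a 4×4 Sylvester Hadamard matrix over 4 chirps, and each RX separates the per-TX channels by correlating with the code columns. The sketch assumes a static target (no Doppler phase change between chirps), and all function names are hypothetical.

```python
def sylvester_hadamard(n):
    """n x n Hadamard matrix with +1/-1 entries (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def bpm_encode(channels, code):
    """Received sample on each chirp: TX m contributes code[chirp][m] * channels[m]."""
    return [sum(code[c][m] * channels[m] for m in range(len(channels)))
            for c in range(len(code))]


def bpm_decode(rx, code, n_tx):
    """Separate the TX contributions by correlating with each code column."""
    n = len(code)
    return [sum(code[c][m] * rx[c] for c in range(n)) / n for m in range(n_tx)]


# Hypothetical complex channel gains from TX1..TX3 to one RX for a static target
channels = [1 + 2j, -0.5 + 0.1j, 0.3 - 0.7j]
code = sylvester_hadamard(4)   # 4 chirps; columns 0..2 are used by the 3 TXs
rx = bpm_encode(channels, code)
recovered = bpm_decode(rx, code, n_tx=3)
print(recovered)
```

Because the code columns are orthogonal, each TX's channel is recovered exactly in this idealized case; in a real radar, the Doppler phase rotation between coded chirps has to be compensated before decoding.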

While FMCW receivers typically use a real-only baseband (BB) [1,3], this work uses a complex (I/Q) BB. Besides rejecting the LNA image-band noise, the I/Q BB has several additional advantages, including detection of crossing interferers from the image-band energy versus time and digital compensation of delay errors between RX channels. The latter is especially critical for the cascaded configurations described below because of the difficulty of matching the routing to a large number of antennas. A variable-gain LNA drives I/Q passive mixers, followed by a second-order HPF, BB amplification, and 15MHz-BW delta-sigma modulators. A higher IF BW and faster ramps increase the maximum unambiguously detectable velocity. The output is decimated to a finely programmable output sampling rate of up to 18.75MHz. The image band can optionally be rejected in the digital front-end so that the output data rate over the CSI-2 interface is the same as that of a real receiver.
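The link between IF bandwidth, ramp rate, and unambiguous velocity follows from standard first-order FMCW relations: the beat frequency of a target at range $R$ is $2SR/c$ for slope $S$, so for a fixed maximum range a wider IF BW permits a steeper slope and therefore shorter chirps, and the maximum unambiguous velocity is $\lambda/(4T_c)$ for chirp period $T_c$. The sketch below evaluates these relations for a 4GHz sweep at 100MHz/μs with a 15MHz IF BW; the 79GHz carrier, zero idle time, ideal complex baseband, and the doubled-IF-BW comparison case are assumptions.

```python
C = 299_792_458.0  # speed of light in m/s


def fmcw_budget(sweep_bw_hz, slope_hz_per_s, if_bw_hz, f_carrier_hz, idle_s=0.0):
    """First-order FMCW figures (ideal complex baseband, no windowing, single TX)."""
    ramp_s = sweep_bw_hz / slope_hz_per_s                  # chirp duration
    range_res_m = C / (2.0 * sweep_bw_hz)                  # range resolution
    max_range_m = if_bw_hz * C / (2.0 * slope_hz_per_s)    # beat freq 2*S*R/c within IF BW
    wavelength_m = C / f_carrier_hz
    v_max_mps = wavelength_m / (4.0 * (ramp_s + idle_s))   # unambiguous +/- velocity
    return ramp_s, range_res_m, max_range_m, v_max_mps


# 4GHz sweep at 100MHz/us with a 15MHz IF BW; 79GHz carrier assumed
base = fmcw_budget(4e9, 100e6 / 1e-6, 15e6, 79e9)
# Hypothetical comparison: doubling the IF BW allows doubling the slope at the same range
wide = fmcw_budget(4e9, 200e6 / 1e-6, 30e6, 79e9)
for name, (t, dr, rmax, vmax) in (("15MHz IF", base), ("30MHz IF", wide)):
    print(f"{name}: ramp {t * 1e6:.0f} us, dR {dr * 100:.2f} cm, "
          f"Rmax {rmax:.1f} m, vmax {vmax:.1f} m/s")
```

With the same maximum range, the wider IF BW halves the ramp time and doubles the unambiguous velocity, which is the trade-off the paragraph above describes.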

Highly automated driving needs significantly higher angular resolution than a single chip can achieve. Cascading multiple MMICs requires (1) keeping the LO phase noise coherent, so that it is correlated across all TX/RX pairs, and (2) minimizing any delay mismatch in the LO and ADC sampling instants that would lead to angle-estimation errors. Performing the LO synchronization at ~19GHz, as shown in Fig. 9.1.2, roughly halves the board routing loss compared with synchronizing at 76GHz, allowing larger spacing or fanout to more devices. The synthesizer of one chip, configured as master, generates the LO signal and delivers it to one or both of its output buffers. After symmetric routing on the PCB, all chips, including the master, receive this LO signal through their input buffers. Two input/output pin pairs are placed on opposite edges of the package to ease PCB routing and allow cascading of up to 4 chips (including 1 master) with a single 19GHz routing layer on the PCB. Further, instead of using two input buffers or a switch to select the package edge for the input, the two input balls are joined on the package through $\lambda/4$ transmission lines. On the board, the unused input ball is simply shorted, which transforms to an open at the T-junction, giving minimum loss and reflection for the desired path.
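The $\lambda/4$ trick can be checked with the lossless transmission-line relation $Z_{in} = jZ_0\tan(\beta l)$ for a shorted line. The sketch assumes $Z_0 = 50\Omega$ and a line that is exactly $\lambda/4$ long at 19.625GHz (the middle of the 19-to-20.25GHz synthesizer range); both are assumptions, not package data. It also computes the reflection that the resulting shunt stub causes on a matched through path, $\Gamma = -Z_0/(Z_0 + 2Z_{sh})$.

```python
import math


def shorted_line_zin(z0_ohm, f_hz, f_quarter_hz):
    """Input impedance of a lossless shorted line that is lambda/4 long at f_quarter_hz."""
    beta_l = (math.pi / 2) * (f_hz / f_quarter_hz)  # electrical length in radians
    return 1j * z0_ohm * math.tan(beta_l)


def shunt_reflection(z0_ohm, z_shunt):
    """Reflection on a matched Z0 through path loaded by a shunt impedance at the junction."""
    return -z0_ohm / (z0_ohm + 2 * z_shunt)


Z0 = 50.0             # assumed line impedance
F_QUARTER = 19.625e9  # assumed: lambda/4 at the middle of the 19-to-20.25GHz range
for f in (19.0e9, 20.25e9):  # band edges
    z = shorted_line_zin(Z0, f, F_QUARTER)
    gamma = shunt_reflection(Z0, z)
    print(f"{f / 1e9:6.3f} GHz  |Zin| = {abs(z):7.1f} ohm  "
          f"|Gamma| = {20 * math.log10(abs(gamma)):6.1f} dB")
```

Even at the band edges, the stub still presents roughly $\pm j1\text{k}\Omega$ under these idealized assumptions, so the reflection on the desired path stays around -32dB.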

At the start of every frame, the master's frame sync sends a single digital edge that is routed symmetrically to all chips and synchronized to the clean-up PLL to provide a common notion of the chirp-envelope timing and ADC sampling instants, with ~1ns uncertainty. Because the frame sync and chirp sequencer are tightly coupled to the RF circuits, they also enable system-level power reduction by efficiently turning off the RF paths outside the chirp durations.
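The ~1ns sync uncertainty can be translated into a phase error on the beat signal, $\Delta\phi = 360^\circ f_{IF}\Delta t$, and then, to first order, into an angle error for an element pair with spacing $d$, $\Delta\theta \approx \Delta\phi/(2\pi (d/\lambda)\cos\theta)$. The sketch assumes the worst case at the 15MHz IF band edge, $\lambda/2$ spacing, boresight, and no digital delay compensation; it is an illustrative estimate, not a figure from the paper.

```python
import math


def beat_phase_error_deg(f_beat_hz, skew_s):
    """Phase error of a beat tone sampled with a timing skew."""
    return 360.0 * f_beat_hz * skew_s


def angle_error_deg(phase_err_deg, theta_deg=0.0, spacing_wl=0.5):
    """First-order angle error from a pair phase error.

    Pair phase difference: phi = 2*pi*(d/lambda)*sin(theta), so
    d_theta = d_phi / (2*pi*(d/lambda)*cos(theta)).
    """
    d_phi = math.radians(phase_err_deg)
    d_theta = d_phi / (2 * math.pi * spacing_wl * math.cos(math.radians(theta_deg)))
    return math.degrees(d_theta)


err = beat_phase_error_deg(15e6, 1e-9)  # assumed worst case at the 15MHz IF band edge
print(f"{err:.1f} deg beat-phase error -> {angle_error_deg(err):.2f} deg angle error")
```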

The device contains an embedded ARM MCU that autonomously monitors key RF performance parameters. A directional-coupler-based power detector at the TX output detects faults in the PA or the package (Fig. 9.1.1). The RX gain, noise figure, and phase imbalance are measured using a loopback signal (Fig. 9.1.3). The output of the TX phase shifter, which embeds a branch-line hybrid coupler and a Gilbert-cell vector modulator, is injected into each RX through a switched amplifier, a symmetrical splitter, and a capacitive coupler. This coupling allows autonomous internal test-signal generation for the RXs during monitoring. Varying the phase shift at a constant rate produces a frequency shift, which places the loopback signal at a nonzero IF after downconversion in the RX mixer; disabling the PA eliminates undesired coupling. A peak detector in the test-signal path before the capacitive coupler senses the absolute RX input power during monitoring. Digital processing is also included to synchronously collect ADC samples from all RXs during monitoring and to compute the amplitude and phase of the test-signal IF component. The RX gains are estimated from the difference between the test-signal amplitude measured at the ADC outputs and that measured by the peak detector. Because the test-signal path is symmetric across RXs, inter-RX differences in the amplitudes (phases) of the IF component at the ADC outputs are detected as inter-RX gain (phase) imbalance. To estimate the RX noise figure, the TXs and test signal are switched off and the PSD of the RX ADC output noise is computed; it is converted to an input-referred noise PSD, or noise figure, using the previously estimated RX gain. In measurements, the estimated parameters are accurate to within 1 to 2dB for gain and noise figure (Fig. 9.1.4, bottom) and to within a few degrees for phase imbalance.
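The monitoring arithmetic described above can be sketched as follows. The sketch assumes an ideal complex IF tone placed at an integer DFT bin by advancing the phase at a constant rate (phase-step quantization ignored), a peak-detector reading in the same amplitude units as the ADC output, and an ADC output-noise PSD already scaled to dBm/Hz; all function names and the 4-RX capture are hypothetical.

```python
import cmath
import math

# Thermal noise floor kT0 at 290K, in dBm/Hz (about -174)
K_T0_DBM_PER_HZ = 10 * math.log10(1.380649e-23 * 290 / 1e-3)


def loopback_tone(amp, phase_rad, k, n):
    """IF tone at DFT bin k: phase advanced at a constant rate of 2*pi*k/n per sample."""
    return [amp * cmath.exp(1j * (2 * math.pi * k * t / n + phase_rad)) for t in range(n)]


def single_bin(x, k):
    """Amplitude and phase of DFT bin k (assumes an integer number of tone cycles)."""
    n = len(x)
    acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
    return abs(acc), cmath.phase(acc)


def imbalance(records, k):
    """Gain (dB) and phase (deg) of each RX relative to RX0."""
    ref_amp, ref_ph = single_bin(records[0], k)
    result = []
    for rec in records:
        amp, ph = single_bin(rec, k)
        dph = cmath.phase(cmath.exp(1j * (ph - ref_ph)))  # wrap to (-pi, pi]
        result.append((20 * math.log10(amp / ref_amp), math.degrees(dph)))
    return result


def rx_gain_db(adc_amp, detector_amp):
    """RX gain from ADC tone amplitude vs. peak-detector amplitude (same units assumed)."""
    return 20 * math.log10(adc_amp / detector_amp)


def rx_nf_db(out_noise_dbm_per_hz, gain_db):
    """Refer the output noise PSD to the input and compare it with kT0."""
    return out_noise_dbm_per_hz - gain_db - K_T0_DBM_PER_HZ


# Hypothetical 4-RX loopback capture: (amplitude, phase) per RX
n, k = 256, 8
truth = [(1.0, 0.0), (0.9, math.radians(4.0)), (1.1, math.radians(-3.0)), (1.0, math.radians(10.0))]
records = [loopback_tone(a, p, k, n) for a, p in truth]
for g_db, ph_deg in imbalance(records, k):
    print(f"{g_db:+.2f} dB  {ph_deg:+.1f} deg")
print(round(rx_nf_db(-120.0, 40.0), 2))  # 13.98
```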

The MMIC is implemented in a 9-metal-layer 45nm bulk CMOS process. The RF transceiver portion occupies 22mm<sup>2</sup>, as shown in Fig. 9.1.7. The die is flip-chip assembled into a standard BGA package, which has low RF insertion loss and the mechanical robustness needed to meet automotive reliability requirements. Figure 9.1.4 shows the measured TX and RX performance. The TX outputs >10.8dBm across the junction temperature range of -40°C to +125°C. The noise figure is <18dB with a -7dBm 1dB compression point in the HPF stop-band, as needed to handle large reflections from the bumper covering the radar module. The clock system (Fig. 9.1.5) demonstrates -94dBc/Hz phase noise at 1MHz offset between 76 and 77GHz and <-91dBc/Hz up to 81GHz, with better than 0.1% nonlinearity for a 4GHz ramp at 100MHz/μs. Operating from 1/1.2/1.8V supplies, the radar consumes 3.5W in continuous operation with all channels active. A summary of the performance and a comparison with prior art are shown in Fig. 9.1.6. The single-chip SiGe implementation [3] does not include the baseband/ADC processing or integrated chirp synthesis. Compared with the other CMOS solutions, this TRX operates over a wider temperature range, has superior phase noise to the other integrated-PLL solutions [1,2], and can handle large reflections from the bumper, up to 20dB below the TX output power, which is much higher than the TX-to-RX spillover considered in [2]. Overall, this work demonstrates a highly integrated solution, including phase shifting, ADCs, and autonomous monitoring, that is well suited for complex multimode driving scenarios in next-generation automotive radars.

### Acknowledgments:

The authors would like to thank all the members of the RAP, SCP, and ATD organizations that contributed to this work, including C. Chi, A. Prasad, G.C. Jung, C. Kumar Y.B., S. Rao, D. Sahu, D. Fernandes, A. Karkisval, P. Narayanan, K. Rajagopalan, G. Morrison, M. Mi, M. Gupta, A. Mani, P. Kalyan, K. Kukkar, T. Altus, V. Ayyagari, N. Kodur, N. Narayanan, S. Chellappan, W. Pradeep, N. Naresh, K. Roush, E. Estacio, J. Gupta, P. Inuganti, A. Killedar, R. Sheth, S. Rangachari, M. Subramaniam, S. Polarouthu, S. Martin, V. Kulkarni, P. Saraf, Vasudeva G., D. Shetty, B. Sharma, and S. Ramakrishnan, and S. Anandwade.

### References:

- [1] T. Shimura, et al., "Multi-Channel Low-Noise Receiver and Transmitter for 76-81 GHz Automotive Radar Systems in 65 nm CMOS," *European Microwave Conf.*, pp. 596-599, 2014.
- [2] D. Guermandi, et al., "A 79-GHz 2x2 MIMO PMCW Radar SoC in 28-nm CMOS," *IEEE JSSC*, vol. 52, no. 10, pp.2613-2626, Oct. 2017.
- [3] T. Fujibayashi, et al. "A 76- to 81-GHz Packaged Single-Chip Transceiver for Automotive Radar," *IEEE BCTM*, pp. 166-169, 2016.



Figure 9.1.1: Radar transceiver conceptual block diagram.



Figure 9.1.2: Clock system diagram, with example board connections for a two-chip use case.



Figure 9.1.3: TX phase shifter and TX-to-RX loopback implementation.



Figure 9.1.4: (Top) Measured output power. (Bottom) Noise figure measured externally and via autonomous monitoring.



Figure 9.1.5: Measured phase noise at 76GHz (top) and frequency vs. time of a 77-to-81GHz ramp in 40μs after downconversion to 1-to-5GHz (bottom).

|                                    | This work         | Trotta,<br>TMTT 3/12 | Knapp,<br>Wagner,<br>RFIC 2012 | [1]                 | [2]                | [3]                  |
|------------------------------------|-------------------|----------------------|--------------------------------|---------------------|--------------------|----------------------|
| Technology                         | 45nm<br>CMOS      | 180nm SiGe<br>BiCMOS | SiGe HBT                       | 65nm<br>CMOS        | 28nm<br>CMOS       | 130nm SiGe<br>BiCMOS |
| # TX / # RX                        | 3 / 4             | 1 / 4                | 3 / 4                          | 2 / 4               | 2 / 2              | 1 / 6                |
| # of chips                         | 1                 | 2                    | 2                              | 2                   | 1                  | 1                    |
| Output power                       | 10.8 dBm          | 11 dBm               | 9.6 dBm                        | 11 dBm              | 8.5 dBm            | 13 dBm               |
| Phase shifter                      | 6°                | —                    | —                              | —                   | —                  | —                    |
| Noise figure                       | 18 dB             | 15 dB                | 16.5 dB                        | 12 dB               | 12 dB              | 13.5 dB              |
| IP1dB                              | -7 dBm            | -5 dBm               | -4 dBm                         | —                   | —                  | +3 dBm               |
| IF BW                              | 15 MHz            | —                    | —                              | 200 kHz             | 1 GHz              | —                    |
| RX ADC                             | 10.5 ENOB         | —                    | —                              | —                   | 6 ENOB             | —                    |
| Phase noise @ 1MHz offset          | -91 dBc/Hz        | -97 dBc/Hz           | -98 dBc/Hz                     | -84 dBc/Hz          | -85 dBc/Hz         | -97 dBc/Hz           |
| Ramp BW                            | 4 GHz             | 4 GHz                | 3 GHz                          | 5 GHz               | 4 GHz              | 4 GHz                |
| Ramp rate                          | 100 MHz/μs        | —                    | —                              | 250MHz/μs           | 0 (PMCW)           | —                    |
| Chirp nonlinearity                 | 0.06%             | —                    | —                              | —                   | 0 (PMCW)           | —                    |
| BIST                               | Autonomous        | Partial              | Partial                        | None                | None               | Partial              |
| Temperature (°C)                   | -40 to +125       | -40 to +125          | -40 to +125                    | -40 to +90          | 27                 | +25 to +125          |
| Area                               | 22mm<sup>2</sup>  | 18.5mm<sup>2</sup>   | 18mm<sup>2</sup>               | 33.6mm<sup>2</sup>  | 7.9mm<sup>2</sup>  | 12mm<sup>2</sup>     |
| Power dissipation, 100% duty cycle | 3.5W              | 2.54W                | 3.3W                           | 1.1W                | 1W                 | 1.8W                 |

Figure 9.1.6: Summary and comparison table. Numbers are worst case across reported frequency and temperature.



Figure 9.1.7: Die micrograph highlighting the 22mm<sup>2</sup> transceiver and select sub-blocks.

## 9.2 A 253mW/Channel 4TX/4RX Pulsed Chirping Phased-Array Radar TRX in 65nm CMOS for X-Band Synthetic Aperture Radar Imaging

Liheng Lou<sup>1</sup>, Kai Tang<sup>1</sup>, Bo Chen<sup>1</sup>, Ting Guo<sup>1</sup>, Yisheng Wang<sup>2</sup>, Wensong Wang<sup>1</sup>, Zhongyuan Fang<sup>1</sup>, Zhe Liu<sup>1</sup>, Yuanjin Zheng<sup>1</sup>

<sup>1</sup>Nanyang Technological University, Singapore, Singapore

<sup>2</sup>Singapore University of Technology and Design, Singapore, Singapore

Airborne or spaceborne synthetic aperture radar (SAR) targeted at micro-unmanned aerial vehicles (UAVs) or micro-satellites can observe large areas under all weather conditions with strong penetration, and finds many emerging applications in defence, geology, oceanography, agriculture and other areas [1]. Phased-array SAR, operating in spotlight, stripmap or scanSAR mode, can significantly improve spatial resolution and detection SNR by steering the beams toward the targets. Existing SAR systems, however, remain constrained by power, size and performance as payloads for micro-UAVs/micro-satellites. This paper describes an X-band phased-array radar TRX IC that consumes 253mW/channel and achieves a resolution of <30cm with a die size of 7.8mm<sup>2</sup>.

Conventional radar systems are narrowband, and beamforming is achieved using phase shifters. Wideband phased-array operation instead requires a true time delay in each element [2]. In addition, there is always a trade-off between large-angle beamforming and fine angular resolution. To address these issues, the transmitter employs a two-stage delay control with calibration: a DLL controls the coarse true time delay of the baseband chirp, and a compact active phase shifter in the RF path provides fine delay tuning. As shown in Fig. 9.2.1, a DLL-based multiphase synthesizer (MPS) generates multiphase delayed clocks with precise delay control [2]. Each clock serves as the reference clock of a corresponding direct digital synthesizer (DDS) that generates a baseband chirp, which a PLL then upconverts, with its phase preserved, into a wideband RF chirp. In this way, a relatively large time delay (tens of ps) in coarse steps between adjacent RF chirp signals is obtained through the MPS, and fine time-delay adjustment (<1ps) is subsequently achieved with a 6b phase shifter. For the beamforming receiver, digital beamforming offers more flexibility to apply advanced algorithms. The receiver can operate in two modes: without or with stretch processing [3]. An external reference LO generator chip is developed for both modes. In the former mode, direct downconversion is applied, and the downconverted IF signals occupy the same bandwidth as the RF chirps, requiring an ADC sampling rate of at least twice that bandwidth. In the latter mode, a time-of-return (TOR) predictor with <0.1μs delay resolution estimates the starting time of the LO chirps from the UAV/satellite altitude, and then activates a reference LO chirp with the same slope as the RF chirp to compress the signal bandwidth, significantly relaxing the requirements on the ADCs.
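As a rough sketch of why stretch processing relaxes the ADCs, the Python below uses the standard dechirp relations: a target at range R returns with delay τ = 2R/c, and mixing with a reference chirp of the same slope k leaves a beat tone f_b = k(τ - τ_ref). The function names, the 1MHz/μs slope and the 150m swath depth are illustrative assumptions; `tau_ref` stands in for the delay estimated by the TOR predictor. Without dechirping, the IF would instead span the full chirp bandwidth (up to 1GHz).

```python
C = 299_792_458.0  # speed of light (m/s)

def round_trip_delay(range_m: float) -> float:
    """Two-way propagation delay (s) to a point target at range_m."""
    return 2.0 * range_m / C

def beat_frequency(slope_hz_per_s: float, tau: float, tau_ref: float = 0.0) -> float:
    """Dechirped IF tone (Hz) for an echo delayed by tau against a reference chirp at tau_ref."""
    return slope_hz_per_s * (tau - tau_ref)

def range_from_beat(slope_hz_per_s: float, f_beat: float, tau_ref: float = 0.0) -> float:
    """Invert the dechirp relation back to range (m)."""
    tau = f_beat / slope_hz_per_s + tau_ref
    return C * tau / 2.0

def dechirped_if_bandwidth(slope_hz_per_s: float, swath_depth_m: float) -> float:
    """IF bandwidth occupied by echoes from a swath of the given depth after dechirping."""
    return slope_hz_per_s * round_trip_delay(swath_depth_m)

k = 1e6 / 1e-6                              # 1 MHz/us expressed in Hz/s
print(range_from_beat(k, 33.8e3))           # about 5.07 m for a 33.8 kHz beat tone
print(dechirped_if_bandwidth(k, 150.0))     # about 1.0 MHz for a 150 m swath
print(C * 0.1e-6 / 2.0)                     # range quantum of a 0.1 us TOR estimate, about 15 m
```

The dechirped IF bandwidth scales with the swath depth rather than with the chirp bandwidth, which is the reason the ADC requirement drops in the stretch-processing mode.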

The paper presents a 4TX/4RX pulsed chirping phased-array TRX for SAR operating at X-band. As shown in Fig. 9.2.1, the MPS, employing fractional DLLs and a Vernier delay matrix (VDM) [2], provides a clock time delay of up to 200ps in 20ps steps for coarse delay tuning, with 1.67ps RMS jitter. A delay of 174ps is needed to reach the maximum beamsteering angle of ±60° with an antenna pitch of 2λ (6cm). A compact DDS generates a tunable baseband chirp from 74.2 to 82.1MHz, with a chirp duration of 20μs to 1ms and a pulse repetition interval of 1ms. The PLLs act as closed-loop frequency multipliers to produce 10GHz-centred chirps with up to 1GHz BW and a linearity of 183kHz-to-1.1MHz RMSE; the RF chirps preserve the relative true time delays of the MPS outputs, and the 1GHz BW sets a resolution of <30cm for long-range detection. An active phase shifter (PS) provides a fine beamsteering resolution of 1°, which requires a phase-shift resolution of 5.86°, equivalent to a true time delay of 1.67ps. The TX output ripple is kept below 2dB by a saturated power amplifier (PA), and with the 0.2dB gain ripple of the RX LNA, no significant paired echo is observed. The RX baseband (RXBB) follows the mixer and interleaves filter and programmable-gain amplifier (PGA) stages to improve linearity, reaching an IP<sub>1dB</sub> of -5dBm, and to suppress antenna leakage and other IF interference by >60dB.
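The 174ps figure follows from the usual true-time-delay relation between adjacent elements, Δt = d·sinθ/c. The sketch below checks it for d = 6cm and θ = 60°, then splits a required delay into whole 20ps DLL steps plus a residual expressed as a phase at the 10GHz carrier. The round-down split and the function names are assumptions for illustration, not the chip's control algorithm.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def element_delay(pitch_m: float, angle_deg: float) -> float:
    """True time delay (s) between adjacent elements for a beam steered to angle_deg."""
    return pitch_m * math.sin(math.radians(angle_deg)) / C

def split_delay(delay_s: float, coarse_step_s: float = 20e-12, carrier_hz: float = 10e9):
    """Hypothetical coarse/fine split: whole DLL steps plus a residual phase at the carrier."""
    steps = math.floor(delay_s / coarse_step_s)
    residual_s = delay_s - steps * coarse_step_s
    residual_phase_deg = 360.0 * carrier_hz * residual_s
    return steps, residual_s, residual_phase_deg

dt = element_delay(0.06, 60.0)
print(dt)                # about 1.733e-10 s, i.e. the ~174ps quoted in the text
print(split_delay(dt))   # 8 coarse steps plus about 13.3ps (about 48 degrees at 10GHz)
```

Because the residual is always shorter than one 20ps coarse step, it corresponds to less than 72° at 10GHz, a range that a 6b phase shifter can cover.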

Figure 9.2.2 shows the key building blocks. In the 400MHz fractional DLL, a 16b 3<sup>rd</sup>-order ΔΣ modulator (DSM) selects among the outputs of 4 adjacent voltage-controlled delay cells (VCDCs) to form the required fractional delay, which is mapped onto the VDM to generate four delayed clocks for the DDSs. The DDS consists mainly of a 20b frequency/phase accumulator-based digital modulation core and a linear/nonlinear hybrid digital-to-analog converter (DAC). It achieves 10b quarter-wave resolution with a 4b coarse DAC and a 6b fine DAC, and uses 1b to generate the full sine wave. The PS, implemented as an I/Q vector-sum phase shifter after a 2-stage polyphase filter, employs a compact 6b binary-weighted current-source array (CSA) together with a 3b current-bleeding CSA for constant-current calibration and phase trimming. A 3-stage transformer-coupled PA achieves 25.1dB gain and 15.2dBm saturated output power. A programmable-gain (PG) LNA adapts to the power level of the echoes. The first-stage common-source LNA uses a 96μm/60nm transistor to reduce the NF, and cascode transistor arrays in the second stage program the LNA gain. An on-chip balun matches the single-ended LNA to the following differential neutralized dynamic current-injection-based Gilbert mixer. The RXBB is composed of a fixed gain stage (GS), coarse GSs, a 5<sup>th</sup>-order elliptic Gm-C BPF, a fine GS and a linear output buffer. These stages are interleaved to achieve a tunable frequency range of 60 to 280MHz (for both dechirping and <280MHz chirp downconversion) and a gain range of 0 to 60dB in 1dB steps.
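A phase-accumulator DDS with quarter-wave symmetry can be sketched as follows. The sketch assumes the 20b accumulator stated above, a 400MHz clock taken from the DLL frequency, and truncation of the phase to 2 quadrant bits plus a 10b quarter-wave address. The chirp is produced by ramping the frequency tuning word (FTW) every clock. The truncation widths, clock rate and linear FTW ramp are assumptions, and the floating-point table stands in for the hybrid coarse/fine DAC.

```python
import math

ACC_BITS = 20               # accumulator width stated in the text
ADDR_BITS = 10              # quarter-wave address bits (assumed)
PHASE_BITS = ADDR_BITS + 2  # plus 2 quadrant bits
# Quarter-wave table sampled at half-LSB offsets so that mirrored reads are exact.
QUARTER = [math.sin(math.pi / 2 * (i + 0.5) / (1 << ADDR_BITS)) for i in range(1 << ADDR_BITS)]

def quarter_wave_sine(phase: int) -> float:
    """sin(2*pi*(phase + 0.5) / 2**PHASE_BITS) rebuilt from the quarter-wave table."""
    quadrant = phase >> ADDR_BITS
    addr = phase & ((1 << ADDR_BITS) - 1)
    if quadrant & 1:                          # 2nd and 4th quadrants: mirror the address
        addr = (1 << ADDR_BITS) - 1 - addr
    value = QUARTER[addr]
    return -value if quadrant & 2 else value  # 3rd and 4th quadrants: negate

def dds_chirp(f_start_hz: float, f_stop_hz: float, n_samples: int, f_clk_hz: float = 400e6):
    """Linear chirp: the FTW is ramped once per clock and the accumulator wraps at 2**ACC_BITS."""
    full_scale = 1 << ACC_BITS
    ftw_start = f_start_hz / f_clk_hz * full_scale
    ftw_slope = (f_stop_hz - f_start_hz) / f_clk_hz * full_scale / max(n_samples - 1, 1)
    acc, out = 0, []
    for n in range(n_samples):
        out.append(quarter_wave_sine(acc >> (ACC_BITS - PHASE_BITS)))  # truncate to 12b phase
        acc = (acc + round(ftw_start + n * ftw_slope)) % full_scale
    return out

samples = dds_chirp(74.2e6, 82.1e6, 8000)   # 8000 clocks at 400MHz = 20us chirp
```

Only a quarter period is stored; the two quadrant bits select whether the address is mirrored and whether the output is negated, which is what makes the quarter-wave table sufficient for a full sine.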

The phased-array TRX chip was fabricated in a 65nm CMOS process (Fig. 9.2.7) and consumes 253mW/channel at 1.2V (33.8mW DLL MPS, 21.5mW DDS, 22.9mW PLL, 15.2mW PS, 111.2mW PA, 23.6mW RX RFFE, 50.2mW RXBB). The chip was mounted on a PCB for testing. The measured TX output spectrum is shown in Fig. 9.2.3, with 9.7dBm output power (14.7dBm after accounting for the 5dB loss of the PCB, cable and SMA connector), <2dB ripple, 1GHz (10%) BW, 1.69ps DLL RMS jitter and -118.8dBc/Hz PLL phase noise at 1MHz offset. The PS exhibits <1.7° phase RMSE and <0.6dB gain RMSE. In Fig. 9.2.4, the RX RFFE attains a variable gain of 15.3 to 28.6dB from 9.5 to 10.5GHz, <1.1dB ripple, -37dBm IP<sub>1dB</sub>, and 5.7-to-6.5dB DSB NF (7.1-to-7.9dB DSB NF for the RX at 28.6dB RFFE gain). The RXBB gain and frequency responses are shown at three typical settings. In a delay-line test at a chirp rate of 1MHz/μs, the dechirped pulse at the RFFE output achieves a -12.9dB peak sidelobe ratio (PSLR) for the 33.8kHz tone and 0.45° RMS phase coherence, with similar results at other frequencies. The measured -3dB main-lobe BW of 0.52kHz translates to a long-range resolution of <30cm. An 8dBi Vivaldi antenna with 6GHz bandwidth is also implemented. With the synthetic aperture formed by the phased array, an azimuth resolution of <17cm is achieved.
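Summing the per-block figures listed above gives 278.4mW rather than 253mW. One reading that reconciles the two, assumed here and not stated in the text, is that the single DLL MPS is shared by the four channels: amortizing its 33.8mW over four channels gives about 253mW per channel. The short check below makes that arithmetic explicit.

```python
# Per-block power (mW) as listed in the measurement paragraph.
BLOCKS_MW = {
    "DLL MPS": 33.8, "DDS": 21.5, "PLL": 22.9, "PS": 15.2,
    "PA": 111.2, "RX RFFE": 23.6, "RXBB": 50.2,
}
SHARED = {"DLL MPS"}   # assumption: one MPS feeds all four channels
N_CHANNELS = 4

def per_channel_mw(blocks: dict, shared: set, n_channels: int) -> float:
    """Sum block power, dividing shared blocks evenly across channels."""
    return sum(p / n_channels if name in shared else p for name, p in blocks.items())

print(f"{sum(BLOCKS_MW.values()):.1f}")                         # 278.4
print(f"{per_channel_mw(BLOCKS_MW, SHARED, N_CHANNELS):.2f}")   # 253.05
```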

Figure 9.2.5 shows a typical TRX radiation pattern measured with the 4×4 Vivaldi antenna array, which has -53dB isolation. For TX beamsteering, a chirp rate of 1 to 10MHz/μs is used; for RX beamforming, the dechirped IF signals are collected and beamformed digitally off-line. With the antennas spaced 6cm apart, the TX can beamsteer across ±60° in 1° steps, and the RX can continuously scan the same range with a resolution of <1°, which enhances the spatial resolution in SAR imaging. The performance of the phased-array prototype and prior art [3-6] is summarized in Fig. 9.2.6. X-band SAR is widely used in airborne and spaceborne applications, yet only a few IC implementations have been reported. The compact design of the phased-array radar IC makes it well suited for micro-UAV and micro-satellite applications.
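Off-line RX beamforming of this kind can be sketched as narrowband phase-shift-and-sum across the four channels. The example assumes ideal, calibrated channels, a single plane-wave source at the 10GHz centre frequency and the 6cm pitch; it scans the steering angle in 1° steps over ±60° and reports where the summed power peaks. Wideband chirps would strictly call for true-time-delay (or per-frequency-bin) weights, so treat this as a narrowband approximation rather than the processing used in the paper.

```python
import numpy as np

C = 299_792_458.0   # speed of light (m/s)
F0 = 10e9           # carrier frequency (Hz)
PITCH = 0.06        # element spacing (m), i.e. two wavelengths at 10GHz
N_ELEM = 4

def steering_vector(angle_deg: float) -> np.ndarray:
    """Narrowband per-element phasors for a plane wave arriving from angle_deg."""
    n = np.arange(N_ELEM)
    delay = n * PITCH * np.sin(np.deg2rad(angle_deg)) / C
    return np.exp(-2j * np.pi * F0 * delay)

def scan(snapshot: np.ndarray, angles_deg: np.ndarray) -> np.ndarray:
    """Phase-shift-and-sum output power for each candidate steering angle."""
    weights = np.stack([steering_vector(a) for a in angles_deg])   # (n_angles, N_ELEM)
    return np.abs(weights.conj() @ snapshot) ** 2 / N_ELEM

angles = np.arange(-60.0, 61.0, 1.0)          # 1-degree scan grid over +/-60 degrees
power = scan(steering_vector(23.0), angles)   # one echo arriving from 23 degrees
print(angles[np.argmax(power)])               # 23.0
```

Because the 6cm pitch equals two wavelengths, this idealized isotropic-element model also produces grating lobes (near -6.3° and -37.5° for a 23° source); they fall between grid points here, so the true direction still wins, but a real system has to account for them through the element pattern and processing.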

### References:

- [1] C. F. Barnes, *Synthetic Aperture Radar*, Barnes, 2015.
- [2] L. Wang, et al., "3–5 GHz 4-Channel UWB Beamforming Transmitter with 1° Scanning Resolution Through Calibrated Vernier Delay Line in 0.13-μm CMOS," *IEEE JSSC*, vol. 47, pp. 3145–3159, Dec. 2012.
- [3] J. Yu, et al., "An X-Band Radar Transceiver MMIC with Bandwidth Reduction in 0.13μm SiGe Technology," *IEEE JSSC*, vol. 49, pp. 1905–1915, Sept. 2014.
- [4] B. Ku, et al., "A 77–81-GHz 16-Element Phased-Array Receiver with ±50° Beam Scanning for Advanced Automotive Radars," *IEEE TMTT*, vol. 62, pp. 2823–2832, Nov. 2014.
- [5] P. Chen, et al., "A 94GHz 3D-Image Radar Engine with 4TX/4RX Beamforming Scan Technique in 65nm CMOS," *ISSCC*, pp.146–147, Feb. 2013.
- [6] T. Chu, et al., "A Short-Range UWB Impulse-Radio CMOS Sensor for Human Feature Detection," *ISSCC*, pp. 294–296, Feb. 2011.



Figure 9.2.1: The proposed phased array TRX IC architecture.



Figure 9.2.2: Circuit schematics of the MPS, DDS, PS, VGLNA and RXBB.



Figure 9.2.3: Measured DLL jitter (top left), PLL PN (top right), TX chirp spectrum (bottom left), and PS RMSE (bottom right).

Figure 9.2.4: Measured RX RFFE (top left), RXBB (top right), dechirped spectrum and phase coherence (bottom left), and antenna $S_{11}$ (bottom right).

Figure 9.2.5: 4TX beamsteering radiation pattern (top left), 4RX beamforming pattern (top right), and phased array radar prototype (bottom).

|                           | JSSC2014 [3]          | TMTT2014 [4]             | ISSCC2013 [5]           | ISSCC2011 [6]                                | This work                                 |
|---------------------------|-----------------------|--------------------------|-------------------------|----------------------------------------------|-------------------------------------------|
| Application               | N.A.                  | Automotive radar         | Imaging sensor          | Human Feature Detection                      | Synthetic Aperture Radar                  |
| Transceiver Architecture  | DDS+PLL+RFFE+RXBB+ADC | RFFE+PS                  | PLL+Tripler +PS+PA+RFFE | Osc+DLL +RFFE+RXBB                           | DLL+DDS +PLL+PS +RFFE+RXBB                |
| Channel Num.              | TX+RX                 | 16RX                     | 4Tx+4Rx                 | TX+RX                                        | 4Tx+4Rx                                   |
| Tx output BW              | 150MHz (1.8%)         | 0.5GHz (0.64%)           | 200MHz                  | 1.6-2.5GHz                                   | 1GHz (10%)                                |
| Modulation Method         | Chirp                 | FMCW                     | Pulsed RF               | Pulsed RF                                    | Pulsed chirp                              |
| Modulation Period         | N.A.                  | N.A.                     | 100ns                   | 100ns                                        | 200ns-1ms                                 |
| Stretch Processing        | YES                   | NO                       | NO                      | NO                                           | (Dual mode)                               |
| Center Freq.              | 8.5GHz                | 77GHz                    | 94GHz                   | 0.8-5GHz<br>N.A.<br>$\pm 60^\circ @ 1^\circ$ | 10GHz<br>N.A.<br>$\pm 60^\circ @ 1^\circ$ |
| Beamforming               | N.A.                  | $\pm 50^\circ @ 1^\circ$ | $\pm 28^\circ$          | N.A.                                         | $\pm 60^\circ @ 1^\circ$<br>Digitally     |
| TX Pout                   | N.A.                  | N.A.                     | -5dBm                   | N.A.                                         | 14.7dBm                                   |
| RX RFFE Gain              | 10.7-44.7dB           | 22.5dB                   | 27-40dB                 | 12dB                                         | 15.3-28.6dB                               |
| RX RFFE NF                | 2.4-4.5dB             | 18dB                     | N.A.                    | 4.5dB                                        | 5.7-6.5dB<br>(DSB)                        |
| RX RFFE IP<sub>1dB</sub>  | <35dBm                | N.A.                     | N.A.                    | N.A.                                         | -37dBm                                    |
| Power per Chl.            | 659mW                 | 75mW (RX only)           | 240mW                   | 695mW                                        | 253mW                                     |
| Technology                | 0.13μm SiGe           | SiGe                     | 65nm CMOS               | 0.13μm CMOS                                  | 65nm CMOS                                 |

Figure 9.2.6: Performance summary and comparison.



Figure 9.2.7: Die micrograph and layout.

## 9.3 A Highly Reconfigurable 65nm CMOS RF-to-Bits Transceiver for Full-Band Multicarrier TDD/FDD 2G/3G/4G/5G Macro Basestations

David J. McLaurin<sup>1</sup>, Kevin G. Gard<sup>1</sup>, Richard P. Schubert<sup>2</sup>, Manish J. Manglani<sup>3</sup>, Haiyang Zhu<sup>4</sup>, David Alldred<sup>5</sup>, Zhao Li<sup>5</sup>, Steven R. Bal<sup>1</sup>, Jianxun Fan<sup>1</sup>, Oliver E. Gysel<sup>1</sup>, Christopher M. Mayer<sup>2</sup>, Tony Montalvo<sup>1</sup>

<sup>1</sup>Analog Devices, Raleigh, NC

<sup>2</sup>Analog Devices, Norwood, MA

<sup>3</sup>Analog Devices, Greensboro, NC

<sup>4</sup>Analog Devices, Wilmington, MA

<sup>5</sup>Analog Devices, Toronto, Canada

This paper presents a 65nm 2-TX, 2-RX RF-to-bits basestation transceiver with 200MHz large-signal BW, 450MHz DPD synthesis/observation BW, and LO frequencies from 400MHz to 6GHz. For FDD operation, the TRX supports a low-IF mode that meets the dynamic-range requirements of GSM basestations. It provides full-band multicarrier (MC) operation in all TDD/FDD 3GPP bands for 2G/3G/4G/5G radios. The SoC includes an 8×16Gb/s SERDES interface, two receivers, two transmitters, and a digital pre-distortion (DPD) feedback RX (FBRX) (Fig. 9.3.1). The FBRX employs a “stitching” system that combines the outputs of both RX basebands to provide 450MHz of observation BW. Three PLLs provide the digital/converter/SERDES clocks, a calibration LO, and an RF LO that meets GSM TX phase-noise requirements. Digital interpolation, decimation, AGC, TX power control, and calibrations are managed by an integrated ARM Cortex-M3. Internal calibration timing is adaptable to support 3G/4G/5G subframe timing requirements. The SoC is a single-chip solution for TDD and a two-chip set for FDD; GSM requires an external LO for the RX. Power dissipation in the maximum-BW mode (2T/2R/1FBRX, 450/200/450MHz, 0dB RF attenuation, 50% TX/RX duty cycle for TDD) is 4.1W for TDD and 6.6W for FDD.

The authors of [1-3] have demonstrated zero-IF (ZIF) transceivers aimed at 3G/4G small cells, but with 100MHz RF BW they cannot provide full-band support in 200MHz TDD bands. They also lack the RX and TX SFDR needed for MC-GSM. The GSM standard specifies basestation tests with 15dB higher RX blockers and 3dB lower RX desensitization than UMTS/LTE, for a channel BW of only 180kHz [4], and the TX requirements are similarly difficult. This makes true ZIF difficult for GSM basestations.

Because the widest 3GPP FDD band has 75MHz BW, the 200MHz RF BW required for full-band TDD operation enables a low-IF approach that can support MC-GSM. As shown in Fig. 9.3.3, a low-IF architecture relaxes performance requirements because spurious components from several direct-conversion impairments fall out of band, including flicker noise, LO leakage, quadrature error and 3<sup>rd</sup>-order harmonic distortion (HD3) of in-band blockers. HD2 and the HD3 image (the portion of HD3 that falls on the same sideband because of imperfect I/Q matching of the nonlinear gain and phase) still fall in the desired sideband. To address this, the RX includes digital correction hardware that achieves >100dB rejection of in-band HD2, and the RX HD3 image is rejected by more than 30dB. The RX LO quadrature error correction (QEC) combines wideband digital correction with I/Q delay control on the local oscillator; unlike digital-only phase correction, this corrects the image at all LO harmonics and minimizes downconversion of the image of LNA/attenuator/mixer HD3 into the desired sideband. The TX uses a static calibration of the DAC current-source array and a high-Z current-source design to minimize TX HD2. With integrated QEC, low-IF GSM support is achieved without the additional antenna-filter rejection that IF-sampled MC-GSM TRXs require.
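The sideband placement described above can be checked numerically. The sketch applies the same memoryless polynomial, y = x + a2·x² + a3·x³, separately to the I and Q rails of a complex tone at +f, then reads the FFT bins at ±2f and ±3f. With matched rails, HD3 lands only at -3f (the opposite sideband), while HD2 appears at both +2f and -2f; a small mismatch in the Q-rail cubic coefficient (the hypothetical `q_mismatch` parameter) creates the HD3 image at +3f. The polynomial model and coefficient values are illustrative assumptions, not a model of this receiver.

```python
import numpy as np

def harmonic_bins(a2=0.0, a3=0.0, q_mismatch=0.0, n=1024, k=10):
    """FFT magnitudes at +/-2k and +/-3k, relative to the fundamental at +k."""
    theta = 2 * np.pi * k * np.arange(n) / n
    i_in, q_in = np.cos(theta), np.sin(theta)          # complex tone at +f
    i_out = i_in + a2 * i_in**2 + a3 * i_in**3          # nonlinearity on the I rail
    q_out = q_in + a2 * q_in**2 + a3 * (1 + q_mismatch) * q_in**3   # Q rail, cubic term mismatched
    spec = np.abs(np.fft.fft(i_out + 1j * q_out))
    fund = spec[k]
    return {
        "+2f": spec[2 * k] / fund, "-2f": spec[-2 * k] / fund,
        "+3f": spec[3 * k] / fund, "-3f": spec[-3 * k] / fund,
    }

print(harmonic_bins(a3=0.01))                    # HD3 only at -3f
print(harmonic_bins(a2=0.01))                    # HD2 equally at +2f and -2f
print(harmonic_bins(a3=0.01, q_mismatch=0.1))    # mismatch adds an HD3 image at +3f
```

Analytically, with matched rails the cubic term contributes (a3/4)·e^(-j3θ), which is why HD3 of in-band blockers moves to the opposite sideband in a low-IF plan, while the squared term contributes to both ±2f.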

The RX, shown in Fig. 9.3.2, includes a linear passive RF attenuator, a passive quarter-duty-cycle mixer, single-pole transimpedance amplifiers (TIAs) and continuous-time $\Delta\Sigma$ (CTSD) ADCs. The attenuator is divided into eight binary-scaled segments. When a segment is disabled, shunt resistors are connected across the mixer inputs and outputs to minimize the impact of impedance changes on the RF match and TIA tuning. NF and linearity increase dB-for-dB with attenuation over a 30dB gain range (Fig. 9.3.3). This is important because GSM in-band blockers can be large (-16dBm [4]) and require significant attenuation without degrading SFDR. The 4<sup>th</sup>-order CTSD ADC is clocked at up to 2GHz and has an SNR of 71dB over a 200MHz bandwidth. The receiver achieves over 90dB of in-band SFDR in low-IF mode (Fig. 9.3.3). In the system, the RX is preceded by an external LNA.
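Because NF and linearity move together with attenuation, the dynamic range stays roughly constant as attenuation is added. The sketch uses the textbook two-tone, input-referred SFDR expression, SFDR = (2/3)·(IIP3 - (-174dBm/Hz + NF + 10·log10 B)), starting from NF = 12dB and IIP3 = 12dBm (the values listed for this work in Fig. 9.3.6) and the 180kHz GSM channel bandwidth quoted earlier. The ideal dB-for-dB model is an assumption; note also that this two-tone metric is not the same quantity as the 90dB in-band SFDR quoted above.

```python
import math

def sfdr_db(nf_db: float, iip3_dbm: float, bw_hz: float) -> float:
    """Two-tone, input-referred SFDR (dB) with a thermal floor of -174 dBm/Hz."""
    noise_floor_dbm = -174.0 + nf_db + 10.0 * math.log10(bw_hz)
    return 2.0 / 3.0 * (iip3_dbm - noise_floor_dbm)

def attenuated(nf0_db: float, iip3_0_dbm: float, atten_db: float):
    """Ideal dB-for-dB model: attenuation adds equally to NF and IIP3."""
    return nf0_db + atten_db, iip3_0_dbm + atten_db

BW_HZ = 180e3   # GSM channel BW quoted in the text
for atten_db in (0, 10, 20, 30):
    nf, iip3 = attenuated(12.0, 12.0, atten_db)
    print(atten_db, round(sfdr_db(nf, iip3, BW_HZ), 1))   # 81.0 dB at every setting
```

Attenuation raises the noise floor and the intercept point by the same amount, so their difference, and hence the SFDR, is unchanged; that is what lets large blockers be attenuated without a dynamic-range penalty.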

The TX, shown in Fig. 9.3.2, comprises two 14b current-steering DACs clocked at up to 2GHz. An opamp-RC 2<sup>nd</sup>-order Butterworth TIA presents a low-impedance input to minimize DAC non-linearity and can be tuned from 50MHz to 225MHz -3dB BW. With fine tuning of the capacitors between the I and Q paths, combined with analog LO delay control, the TX guarantees 65dB undesired-sideband (USB) suppression across the maximum 200MHz large-signal BW. An opamp-based current mirror converts the TIA output to a current and mirrors it to the 64-segment upconverter. Combined with digital gain interpolation between analog steps, the upconverter provides 36dB of gain control in 0.05dB steps, with an INL of 0.1dB over any 4dB step and a ±0.04dB step error. The TX saturated output power is +7dBm. In-band noise is -148dBm/Hz and decreases dB-per-dB with increased TX attenuation. An internal loopback path to the FBRX baseband enables autonomous TX calibration. The ACLR for a -5dBm (-12dBFS rms power) 20MHz LTE carrier at a 90MHz offset from the LO is 67dB. SFDR in low-IF mode is >85dB, limited by the baseband HD3 image (Fig. 9.3.4).
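To see how tight the I/Q matching must be for 65dB of undesired-sideband suppression, the standard image-rejection formula for a quadrature modulator with amplitude ratio g and phase error φ is IRR = (1 + 2g·cosφ + g²)/(1 - 2g·cosφ + g²). The sketch evaluates it and converts a phase error into an equivalent LO timing skew; the 2GHz LO in the example is an illustrative value.

```python
import math

def image_rejection_db(gain_ratio: float, phase_err_deg: float) -> float:
    """Sideband suppression (dB) for I/Q amplitude ratio g and quadrature phase error phi."""
    phi = math.radians(phase_err_deg)
    num = 1 + 2 * gain_ratio * math.cos(phi) + gain_ratio**2
    den = 1 - 2 * gain_ratio * math.cos(phi) + gain_ratio**2
    return math.inf if den == 0 else 10 * math.log10(num / den)

def phase_to_skew_s(phase_err_deg: float, lo_hz: float) -> float:
    """LO timing skew (s) that produces the given quadrature phase error."""
    return phase_err_deg / 360.0 / lo_hz

print(round(image_rejection_db(1.0, 0.0644), 1))   # 65.0 dB from phase error alone
print(phase_to_skew_s(0.0644, 2e9))                # about 8.9e-14 s, i.e. roughly 89 fs
```

A phase error of only about 0.064°, or a 0.1% amplitude error on its own (about 66dB), already uses up the 65dB budget, which is why both capacitor trimming and LO delay control are needed.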

The FBRX supports 450MHz of RF BW for observing PA distortion. This bandwidth is achieved with an ADC stitching scheme in which the outputs of the CTSD ADCs of both receivers, one configured as lowpass and the other as bandpass, are combined into a channel with lower noise across frequency than either ADC achieves alone. This approach lets the FBRX use the same ADC design as the main RX path and allows hardware reuse in TDD systems. Figure 9.3.5 shows a block diagram of the FBRX subsystem. The receivers and FBRX use identical RF front-ends and share a common baseband path. In FBRX mode, muxes at the TIA outputs pass the baseband signal to both the RX1 and RX2 ADCs. A “channel alignment” observation and correction block, inserted after the RX QEC, is designed to flatten, linearize, and match the frequency responses of the RX1 and RX2 channels. This uses a one-time startup calibration in which a harmonically rich signal injected at the TIA inputs serves as the calibration reference. A non-adaptive channel merge filter then combines the two channels based on the expected ADC NTFs for a given BW profile. The FBRX achieves 58.5dB SNR, less than 0.2dB gain deviation and less than 1° deviation from linear phase over a 450MHz RF BW.
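The merge filter here is fixed and derived from the expected ADC NTFs, but its exact form is not given. As one plausible rule, assumed for illustration, the sketch weights the two aligned channels per frequency bin by the inverse of each channel's noise PSD, which minimizes the merged noise when the two quantization noises are independent and the signal is identical in both paths. The NTF shapes are synthetic: a lowpass ADC whose noise rises with frequency and a bandpass ADC whose noise is lowest near an arbitrary 150MHz.

```python
import numpy as np

def merge_weights(n_lp: np.ndarray, n_bp: np.ndarray):
    """Per-bin weights (w_lp, w_bp), summing to 1, proportional to inverse noise PSD."""
    total = n_lp + n_bp
    return n_bp / total, n_lp / total

def merged_noise(n_lp: np.ndarray, n_bp: np.ndarray) -> np.ndarray:
    """Noise PSD after merging, assuming independent noise in the two channels."""
    w_lp, w_bp = merge_weights(n_lp, n_bp)
    return w_lp**2 * n_lp + w_bp**2 * n_bp

f = np.linspace(0.0, 225e6, 10)                    # baseband frequencies up to 225MHz
n_lp = 1e-9 * (1.0 + (f / 60e6) ** 4)              # synthetic lowpass-ADC noise shape
n_bp = 1e-9 * (1.0 + ((f - 150e6) / 60e6) ** 4)    # synthetic bandpass-ADC noise shape
merged = merged_noise(n_lp, n_bp)
print(np.all(merged <= np.minimum(n_lp, n_bp)))    # True
```

Since the weights sum to one, an aligned signal passes with unit gain, while the merged noise equals n_lp·n_bp/(n_lp + n_bp), which is below the better of the two channels in every bin.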

The RF local oscillator is generated by a fractional-N PLL using a single VCO that tunes from 6GHz to 12GHz, followed by a programmable divider. The VCO uses four resistively coupled cores whose polarities are chosen to minimize magnetic coupling. A digital state machine monitors the VCO control voltage and servos the voltage on a dedicated varactor to keep the PLL locked over a 150°C change in temperature and to compensate for aging. Phase noise is -151dBc/Hz at a 6MHz offset, referred to a 1900MHz LO. Integrated phase noise is 0.18° rms (-50dB EVM) at 1900MHz.
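The quoted 0.18° rms and -50dB EVM are mutually consistent. For Gaussian phase noise of rms σ (radians) acting on unit-magnitude symbols, the mean squared error vector is 2·(1 - exp(-σ²/2)), which is close to σ² for small angles. A short check, assuming phase noise is the only EVM contributor:

```python
import math

def phase_noise_evm_db(rms_phase_deg: float) -> float:
    """EVM (dB) due to Gaussian phase noise alone: EVM^2 = 2 * (1 - exp(-sigma^2 / 2))."""
    sigma = math.radians(rms_phase_deg)
    evm = math.sqrt(-2.0 * math.expm1(-sigma**2 / 2.0))   # expm1 keeps precision for tiny sigma
    return 20.0 * math.log10(evm)

print(round(phase_noise_evm_db(0.18), 1))   # -50.1, close to the quoted -50dB
```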

Figure 9.3.6 summarizes the TRX performance and compares it with the prior state of the art. This TRX offers twice the RF bandwidth of prior art at similar power and supports full-band MC-GSM. The 8.1mm×7.8mm die (Fig. 9.3.7) is packaged in a 12mm×12mm 196-ball chip-scale ball-grid array and is powered from 1.3V and 1.8V supplies.

### Acknowledgements:

The authors would like to acknowledge the efforts of B. Glenn, C. Angell, J. Kornblum, L. Wu, B. Wilcox, A. Kagan, H. Dougan, D. Oates, J. Oates, J. Fernald, R. Waltman, P. Wiers, and the rest of the TPG team.

### References:

- [1] N. Klemmer, et al., “A 45nm CMOS RF-to-Bits LTE/WCDMA FDD/TDD 2×2 MIMO Base-Station Transceiver SoC with 200MHz RF Bandwidth,” *ISSCC*, pp. 164-165, Feb. 2016.
- [2] C. Mayer, et al., “A Direct-Conversion Transmitter for Small-Cell Cellular Base Stations with Integrated Digital Predistortion in 65nm CMOS,” *IEEE RFIC*, pp. 63-66, July 2016.
- [3] D. McLaurin, et al., “A Direct-Conversion Receiver for Multi-Carrier 3G/4G Small-Cell Base Stations in 65nm CMOS,” *IEEE RFIC*, pp. 71-74, July 2016.
- [4] Digital Cellular Telecom. System (Phase 2+); Radio Transmission and Reception, GSM 05.05, 1996.



Figure 9.3.1: TRX Block Diagram.



Figure 9.3.2: RX schematic (top) and TX schematic (bottom).



Figure 9.3.3: RX spectrum under blocking conditions and NF, IIP3, and IIP2 vs. attenuation (top, right).



Figure 9.3.4: TX spectrum for two 5MHz LTE carriers, two GSM carriers at the bottom of band 3, and GSM emissions.



Figure 9.3.5: FBRX schematic, simulated LP, BP, and merged ADC NTF, and measured channel flatness.

| Specification                                        | This work | Klemmer [1] | Mayer [2]     | McLaurin [3] | Unit           |
|------------------------------------------------------|-----------|-------------|---------------|--------------|----------------|
| Technology                                           | 65nm CMOS | 45nm CMOS   | 65nm CMOS     | 65nm CMOS    |                |
| Architecture                                         | ZIF RX/TX | ZIF RX/TX   | ZIF TX        | ZIF RX       |                |
| Chip area                                            | 68.7      | 49.0        | 17.2**        | 4.2*         | mm<sup>2</sup> |
| RX RF BW                                             | 200       | 100         | -             | 100          | MHz            |
| RX low-IF SFDR (75MHz BW)                            | 90        | -           | -             | -            | dB             |
| LO range                                             | 400-6000  | 400-4000    | -             | 400-6000     | MHz            |
| RX input-referred full scale                         | -10       | -15         | -             | -10          | dBm            |
| RX NF                                                | 12        | 12.5        | -             | 12           | dB             |
| RX IIP3 at edge of max BW                            | 12        | 14          | -             | 15           | dBm            |
| RX noise PSD                                         | -152      | -146.5      | -             | -152         | dBFS/Hz        |
| RX IIP2                                              | 60        | 50          | -             | 60           | dBm            |
| RX RF attenuator range for constant dynamic range    | 30        | 25          | -             | 20           | dB             |
| DPD synthesis/observation BW                         | 450       | 200         | 250           | 250          | MHz            |
| TX low-IF SFDR (75MHz BW)                            | 85        | -           | -             | -            | dB             |
| TX 20MHz LTE ACLR @ -5dBm out (band edge)            | 67        | 65          | 64            | -            | dB             |
| TX EVM @ 2.6GHz                                      | -47       | -46         | Not specified | -            | dB             |
| TX noise at min atten (-5dBm composite out)          | -143      | -154        | -143          | -            | dBc/Hz         |
| Total SoC power, TDD/FDD                             | 4.1/6.6   | 5/6.5       | 3.7***        | 2.7****      | W              |

\* Partial chip area: 3RX RF/analog + one PLL only

\*\* Partial chip area: 1Tx RF/analog only

\*\*\* 2Tx, 1FBRx, FDD Only.

\*\*\*\* 2Rx, FDD Only.

Figure 9.3.6: Performance summary and comparison to prior work.



Figure 9.3.7: Die Micrograph.

## 9.4 A 40Gb/s 6pJ/b RX Baseband in 28nm CMOS for 60GHz Polarization MIMO

Shinwon Kang<sup>1</sup>, Chintan Thakkar<sup>1</sup>, Nathan Narevsky<sup>2</sup>, Kaushik Dasgupta<sup>1</sup>, Saeid Daneshgar<sup>1</sup>, James Jaussi<sup>1</sup>, Bryan Casper<sup>1</sup>

<sup>1</sup>Intel, Hillsboro, OR; <sup>2</sup>University of California, Berkeley, CA

To meet burgeoning demand for PAN connectivity between content-rich consumer devices, the next-generation 60GHz standard [1] supports higher per-user bandwidth through frequency-channel bonding and enhances frequency reuse with MIMO. At millimeter-wave frequencies, orthogonal polarizations (pol) of propagation provide an additional degree of antenna/frequency reuse for MIMO multiplexing [2]. While the associated 2x increase in data rate is promising, the feasibility of a simultaneous dual-pol (DP) link may be limited by cross-pol interference (CPI), both same-symbol and multipath, and by co-pol ISI from multipath reflections. To efficiently mitigate CPI and ISI and enable 60GHz 2x2 DP-MIMO operation, this work demonstrates a mixed-signal pol-equalizer RX baseband with CDR (Fig. 9.4.1). The design supports equalization of QPSK/16-QAM constellations at up to 5Gsymbols/s (Gsym/s), comparable to 3x channel bonding [1].

Analysis of single-input, single-output (SISO) 60GHz ~5m PAN statistical channel models [3] shows that multipath delay spread may be ~5ns/15ns for pre/post-cursor ISI, even for directional links. At 5Gsym/s, this translates to 25/75UI of pre/post-cursor ISI. Equalizers with this many coefficients have been implemented efficiently with time-domain mixed-signal FFE and DFE [3,4]. For DP-MIMO links, however, CPI creates additional undesired multi-UI interaction between the orthogonally transmitted DP data streams. Conventional frequency-domain techniques such as OFDM and single-carrier frequency-domain equalization do not inherently support CPI mitigation.
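To illustrate what the DFE contributes, the toy model below passes a 4-PAM sequence (one rail of 16-QAM) through a synthetic baud-spaced channel with a 75UI decaying post-cursor tail and cancels the tail using past decisions. The channel taps are invented, the DFE taps are set equal to the known post-cursors instead of being adapted, and pre-cursor ISI (the FFE's job) is omitted, so this is a sketch of the principle rather than of the mixed-signal circuit.

```python
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])

def slice_pam4(x: float) -> float:
    """Nearest 4-PAM level."""
    return float(LEVELS[np.argmin(np.abs(LEVELS - x))])

def run_dfe(rx: np.ndarray, post_taps: np.ndarray) -> np.ndarray:
    """Subtract post-cursor ISI reconstructed from past decisions (taps assumed known)."""
    history = np.zeros(len(post_taps))        # history[0] is the most recent decision
    decisions = np.empty(len(rx))
    for i, sample in enumerate(rx):
        d = slice_pam4(sample - post_taps @ history)
        decisions[i] = d
        history = np.roll(history, 1)
        history[0] = d
    return decisions

rng = np.random.default_rng(0)
symbols = rng.choice(LEVELS, size=2000)
post = 0.6 * 0.7 ** np.arange(1, 76)          # synthetic 75UI decaying post-cursor tail
channel = np.concatenate(([1.0], post))       # main cursor followed by the tail
rx = np.convolve(symbols, channel)[: len(symbols)]

no_eq = np.array([slice_pam4(x) for x in rx])
with_dfe = run_dfe(rx, post)
print(np.mean(no_eq != symbols), np.mean(with_dfe != symbols))   # DFE symbol error rate is 0.0
```

Without equalization the accumulated tail easily exceeds the ±1 decision margin, while subtracting it with correct past decisions restores the main cursor exactly.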

To better understand cross-pol, specular ray-tracing simulations of a 60GHz wireless DP-MIMO transceiver were performed for an indoor environment with furniture, using Remcom's Wireless Insite® software to model multiple random instantiations of a 4-element TX/RX phased array with DP loop-antenna models. Due to variable TX-to-RX orientation and the de-polarizing effects of reflector/absorber materials, the models show considerable CPI (Fig. 9.4.5). Moreover, since the V/H streams may traverse different propagation paths, compensation at I/Q baseband requires 2x more coefficient combinations than two independent SISO equalizers. Support for 16QAM (4b/symbol) additionally requires 2x more pol-DFE coefficients than QPSK (2b/symbol). For the 4 baseband streams (H1, H2, V1, V2), this amounts to 2400 pol-DFE coefficients (75UI × 4 CPI+ISI/UI × 2b/UI × 4 streams) and 384 pol-FFE coefficients (24UI × 4 CPI+ISI/UI × 4 streams), far beyond the feasible design limits of prior mixed-signal equalizers [3-5].
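The coefficient counts follow directly from the factors listed above. The arithmetic below reproduces them, together with the per-stream figures used later and the 40Gb/s aggregate rate implied by 5Gsym/s, 16-QAM (4b/symbol) and two polarizations.

```python
UI_DFE, UI_FFE = 75, 24   # post-cursor DFE span and FFE span, in UI
TERMS_PER_UI = 4          # CPI + ISI combinations per UI
BITS_PER_UI = 2           # extra DFE factor for 16-QAM (2b per I or Q rail)
STREAMS = 4               # H1, H2, V1, V2

dfe_total = UI_DFE * TERMS_PER_UI * BITS_PER_UI * STREAMS
ffe_total = UI_FFE * TERMS_PER_UI * STREAMS
print(dfe_total, ffe_total)                          # 2400 384
print(dfe_total // STREAMS, ffe_total // STREAMS)    # 600 96 per stream

SYM_RATE = 5e9
print(SYM_RATE * 4 * 2 / 1e9)                        # 40.0 Gb/s: 4b/symbol x 2 polarizations
```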

This work uses architecture and circuit design techniques to implement a power- and area-efficient pol-equalizer. Keeping the total quantization noise of 96 pol-FFE + 600 pol-DFE coefficients per stream below thermal noise requires 9b/coefficient resolution for BER<10<sup>-3</sup>. Previous mixed-signal coefficient implementations used a current DAC (I-DAC) with large area (~3600μm<sup>2</sup> for 8b in 65nm CMOS [4]) to satisfy matching requirements, and the area penalty of implementing thousands of 9b coefficients this way is prohibitive. Fortunately, wireless channels with finite multipath reflections do not require all coefficients to operate at maximum strength simultaneously. Furthermore, reflections with longer delays have higher path loss and lower average strength. Leveraging these traits, each I-DAC is implemented with a coarse-control mirror to achieve finer resolution when operating at lower strength (Fig. 9.4.2). The matching-dominated area overhead of the fine I-DAC is relaxed by using a fully thermometer-coded design. To minimize its digital area overhead, thermometer coding is implemented hierarchically by exploiting redundancy in the combinational logic, and the 4b hierarchical encoding concept is extended recursively. With all of these optimizations, the area is only 480μm<sup>2</sup> per 9b (3b+6b) DFE I-DAC and 1916μm<sup>2</sup> per 12b (3b+9b) FFE I-DAC, including encoding logic.
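A binary-to-thermometer encoder can be built recursively so that each level reuses one shared sub-encoder for both halves of its output, which is one way to exploit redundancy in the combinational logic. The sketch below is a generic construction assumed for illustration, not the paper's gate-level encoder; it maps the 6b fine code onto 63 unit-cell enables.

```python
def thermometer(bits):
    """Binary (MSB first, ints 0/1) to thermometer code: output[i] == 1 iff value > i.

    The sub-encoder for the lower bits is computed once and shared by both halves:
    lower half = msb OR sub, middle cell = msb, upper half = msb AND sub.
    """
    if not bits:
        return []
    msb, sub = bits[0], thermometer(bits[1:])
    return [msb | s for s in sub] + [msb] + [msb & s for s in sub]

def to_bits(value: int, width: int):
    """Unsigned integer to a MSB-first list of bits."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

code = thermometer(to_bits(37, 6))
print(len(code), sum(code))   # 63 37
```

Each recursion level adds only one OR and one AND per shared output of the level below, instead of decoding every thermometer bit independently from the full binary word.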

The pol-FFE delay line is implemented as a 24-tap current-integrating switching matrix with per-leg offset compensation [3] and consumes 3.4mW bias power per stream. Cascode summing [3,4] (Fig. 9.4.3) is used to mitigate two problems: self-loading from the large number of I-DAC-based coefficients, and the routing complexity of the 4x4 interleaved CPI-cancellation summer architecture. CMFB-based PMOS current-source loads ensure a high output common-mode voltage. The 96-coefficient pol-FFE resistor-loaded summer and the 600-coefficient pol-DFE current-integrating summer consume only 1.4mW and 2.8mW bias power per stream, respectively. To support 16QAM, each pol-DFE output is quantized using 7 slicers: 3 for the flash 4PAM thresholds and 4 at the data levels for monitoring and coefficient adaptation. Flash-to-binary (3b-to-2b) conversion would require adding a MUX to the tap-1 feedback. Since tap-1 feedback is latency-sensitive (delay<UI/2 for current integration), the tap-1 coefficients are flash-encoded. This removes the MUX from the critical path and allows it to be merged with the feedback register in a latch-based design. To reuse the 16QAM circuitry for greater QPSK equalization capability, the flip-flop chain can be reconfigured from a 75UI 2b parallel chain into a 150UI 1b daisy chain.
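The sketch below shows why flash-encoding tap 1 removes the conversion step. It assumes normalized 4PAM levels {-3, -1, +1, +3}, illustrative slicer thresholds, and hypothetical function names. The three flash slicers produce a thermometer code (t0, t1, t2), and the symbol equals 2(t0 + t1 + t2) - 3. Tap-1 feedback can therefore be built from three equal branches switched directly by the slicer outputs, with no 3b-to-2b encoder or level-select MUX.

```python
LEVELS = (-3, -1, 1, 3)          # normalized 4PAM levels
THRESHOLDS = (-2.0, 0.0, 2.0)    # flash slicer thresholds (illustrative)


def slice_4pam(y: float) -> tuple[int, ...]:
    """Three flash slicers -> thermometer code (t0, t1, t2)."""
    return tuple(int(y > th) for th in THRESHOLDS)


def tap1_via_binary(t: tuple[int, ...], c1: float) -> float:
    """Reference path: 3b->2b encode, then select the level (the MUX to avoid)."""
    t0, t1, t2 = t
    b1, b0 = t1, t0 ^ t1 ^ t2
    return c1 * LEVELS[2 * b1 + b0]


def tap1_flash(t: tuple[int, ...], c1: float) -> float:
    """Flash path: each slicer bit steers one branch of weight c1 to +c1 or -c1.

    The sum equals c1 * (2*(t0 + t1 + t2) - 3), i.e. c1 times the 4PAM level.
    """
    return sum(c1 if bit else -c1 for bit in t)
```

Both paths produce identical feedback for every valid slicer output. The flash path, however, has no logic between the slicer latches and the coefficient switches.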

To avoid generating additional clock phases, the CDR PD is implemented at the baud rate and is based on decision logic (similar to [6]), as shown in Fig. 9.4.4. The optimal locking phase is achieved with adjustable, unequal early/late clock-phase updates. Since DP-MIMO can have different TX-to-RX delays on the V/H pols, the optimal sampling phases for V/H baseband data are likely not identical. This mismatch is mitigated with independent per-pol PDs and phase-interpolator (PI)-based delay adjustment. Phase mismatch between ISI and CPI coefficients is self-compensated during weight adaptation. The FFE, DFE and slicer clocks are distributed over several millimeters. Clock skew and duty-cycle distortion across these paths are adjusted with current-starved inverter-based delay lines.
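A baud-rate PD can be sketched with the classic Mueller-Müller timing function z_k = y_k·d_{k-1} - y_{k-1}·d_k. This is a stand-in only; the exact decision logic of [6] and of this receiver may differ. For random ±1 data, the mean of z_k is h[+1] - h[-1] (post-cursor minus pre-cursor), so one sample per UI is enough. The bang-bang phase update with separate delay/advance step sizes illustrates the "unequal early/late updates". The step parameters, the PI-code direction and all names are hypothetical.

```python
import random


def mm_timing(y_prev: float, y_cur: float, d_prev: int, d_cur: int) -> float:
    """Mueller-Mueller timing function z_k = y_k*d_{k-1} - y_{k-1}*d_k."""
    return y_cur * d_prev - y_prev * d_cur


def mean_mm_output(h_pre: float, h0: float, h_post: float,
                   n: int = 20000, seed: int = 0) -> float:
    """Average z_k for y_k = h_pre*d_{k+1} + h0*d_k + h_post*d_{k-1}, ideal decisions."""
    rng = random.Random(seed)
    d = [rng.choice((-1, 1)) for _ in range(n + 2)]
    y = {k: h_pre * d[k + 1] + h0 * d[k] + h_post * d[k - 1] for k in range(1, n + 1)}
    z = [mm_timing(y[k - 1], y[k], d[k - 1], d[k]) for k in range(2, n + 1)]
    return sum(z) / len(z)


def update_phase(code: int, z: float, step_delay: int = 1, step_advance: int = 1) -> int:
    """Bang-bang PI-code update; a larger code means later sampling (assumed).

    z > 0 (post-cursor > pre-cursor) indicates sampling before the peak of a
    symmetric pulse, so the clock is delayed; z < 0 advances it. The loop
    settles where step_delay * P(z > 0) ~= step_advance * P(z < 0), so
    unequal steps move the lock point off the equal-probability point.
    """
    if z > 0:
        return code + step_delay
    if z < 0:
        return code - step_advance
    return code
```

With a pre-cursor of 0.1 and a post-cursor of 0.3, the averaged output comes out near 0.2, as the h[+1] - h[-1] expectation predicts.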

The 28nm CMOS RX baseband die (Fig. 9.4.7) was wire-bonded onto a PCB. Input analog data was generated by an arbitrary waveform generator programmed with independent >16k-length random sequences convolved with DP-MIMO channel models. The line-of-sight (LOS)/non-LOS (NLOS) models and signal swings (55mV for QPSK, 165mV for 16QAM) were obtained from over-the-air 5-to-16cm DP-MIMO measurements [2] and 5m indoor-channel ray-tracing simulations (Fig. 9.4.5). Two sets of parameters were set by characterizing input voltage versus slicer offset codes: the summer gains (for optimal slicer input swing and linearity) and the per-stream 4PAM slicer thresholds. The optimal integration timing of the DFE summer relative to FFE summer settling was achieved by adjusting the DFE-to-FFE clock delay. Pol-DFE/FFE coefficients were adapted off-chip with sign-sign LMS [7]. The fully thermometer-coded coefficients ensured monotonicity and prevented incorrect convergence. The BER was measured using on-chip self-test circuits.
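A minimal sketch of sign-sign LMS adapting a two-stream pol-DFE, for example the V-pol and H-pol I components with ±1 data. Each stream has taps on its own past decisions (ISI) and on the other stream's past decisions (CPI). The update is w += sign(e)·sign(d_{k-i}), applied to integer DAC codes, so each step moves exactly one code. This is the regime in which the monotonic, thermometer-coded DAC matters. The channel values, LSB size, noise level, tap count and names are illustrative assumptions, not measured data.

```python
import random

LSB = 0.01                       # hypothetical coefficient LSB, relative to the cursor
CODE_MIN, CODE_MAX = -256, 255   # signed 9b DAC code range
N_TAPS = 2


def sgn(x: float) -> int:
    return (x > 0) - (x < 0)


def adapt_pol_dfe(h, n_symbols: int = 20000, noise: float = 0.02, seed: int = 1):
    """Sign-sign LMS for a 2-stream pol-DFE.

    h[s][src][i] is the (i+1)-th post-cursor that stream `src` leaves on
    stream `s` (src == s: ISI, src != s: CPI). Returns integer codes w with
    the same layout; ideally w ~= h / LSB.
    """
    rng = random.Random(seed)
    w = [[[0] * N_TAPS for _ in range(2)] for _ in range(2)]
    tx_hist = [[0] * N_TAPS for _ in range(2)]    # past symbols, newest first
    dec_hist = [[0] * N_TAPS for _ in range(2)]   # past decisions, newest first
    for _ in range(n_symbols):
        tx = [rng.choice((-1, 1)) for _ in range(2)]
        decisions = []
        for s in range(2):
            x = tx[s] + rng.gauss(0.0, noise) + sum(
                h[s][src][i] * tx_hist[src][i] for src in range(2) for i in range(N_TAPS))
            y = x - LSB * sum(
                w[s][src][i] * dec_hist[src][i] for src in range(2) for i in range(N_TAPS))
            d = 1 if y >= 0 else -1
            e = sgn(y - d)   # error against the expected data level (+/-1, assumed)
            for src in range(2):
                for i in range(N_TAPS):
                    code = w[s][src][i] + e * sgn(dec_hist[src][i])
                    w[s][src][i] = max(CODE_MIN, min(CODE_MAX, code))
            decisions.append(d)
        for s in range(2):
            tx_hist[s] = [tx[s]] + tx_hist[s][:-1]
            dec_hist[s] = [decisions[s]] + dec_hist[s][:-1]
    return w
```

After adaptation, each code dithers within a few LSBs of the corresponding post-cursor divided by `LSB`. This is the usual steady-state behavior of sign-sign LMS.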

The RX baseband was measured with LOS channels at up to 5Gsym/s. QPSK data had 6.4x total ISI+CPI magnitude relative to the cursor over 150UI, and 16QAM data had 3.4x ISI+CPI over 75UI. With DP-MIMO but without the pol-equalizer enabled, bits were erroneously detected (BER>0.1) even at high SNR (Fig. 9.4.5). With the CDR and all pol-DFE coefficients enabled, the baseband achieved <10<sup>-6</sup> BER for 20Gb/s QPSK at 20dB SNR, and 6x10<sup>-4</sup> BER for 40Gb/s 16QAM at 25dB SNR. With all pol-FFE coefficients additionally enabled, the baseband was measured with NLOS channels at 3.5Gsym/s QPSK with 4x total ISI+CPI over 24/150 pre/post-cursor UI. With this input, the BER was 4x10<sup>-5</sup> for 14Gb/s QPSK at 25dB SNR.

The equalizer efficiency is 2.5fJ/bit/coefficient at 40Gb/s LOS 16QAM with the pol-DFE, and 8.4fJ/bit/coefficient at 14Gb/s NLOS QPSK with both pol-FFE and pol-DFE. Compared with state-of-the-art SISO mixed-signal equalizers (Fig. 9.4.6), this work demonstrates:

- 10x-to-60x higher equalization capability in terms of the number of coefficients;
- 1.6x-to-4.9x higher total CPI+ISI cancellation magnitude;
- 3.7x-to-86x better energy efficiency per coefficient;
- 6x lower area per coefficient;
- 2.5x higher data-rate.

Measurements at symbol rates similar to 2x/3x 60GHz channel bonding demonstrate energy-efficient DP-MIMO equalization for 2x spectral efficiency.
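The headline throughput and efficiency numbers can be re-derived from the quoted symbol rates, bits per symbol, equalizer power (Fig. 9.4.6) and active coefficient counts. The sketch below is minimal and assumes each of the two polarizations carries one I/Q constellation. The `Mode` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    name: str
    gsym_per_s: float     # symbol rate per polarization
    bits_per_symbol: int  # 2 (QPSK) or 4 (16QAM)
    power_mw: float       # equalizer power, from Fig. 9.4.6
    coefficients: int     # active FFE + DFE coefficients

    @property
    def gbps(self) -> float:
        return self.gsym_per_s * self.bits_per_symbol * 2      # two polarizations

    @property
    def pj_per_bit(self) -> float:
        return self.power_mw / self.gbps                        # mW / (Gb/s) = pJ/b

    @property
    def fj_per_bit_per_coeff(self) -> float:
        return 1000.0 * self.pj_per_bit / self.coefficients     # pJ -> fJ


MODES = [
    Mode("NLOS QPSK", 3.5, 2, 328.0, 384 + 2400),
    Mode("LOS QPSK", 5.0, 2, 240.0, 2400),
    Mode("LOS 16QAM", 5.0, 4, 240.0, 2400),
]

for m in MODES:
    print(f"{m.name}: {m.gbps:.0f} Gb/s, {m.pj_per_bit:.1f} pJ/b, "
          f"{m.fj_per_bit_per_coeff:.1f} fJ/b/coefficient")
```

The per-coefficient figures evaluate to 8.4, 5.0 and 2.5fJ/b. For the NLOS case, 328mW at 14Gb/s works out to about 23.4pJ/b.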

### Acknowledgements:

The authors thank S. Hyvönen, V. Baca, A. Puglielli, T. Musah, A. Chakrabarti, C.-M. Hsu, R. Inti, E. Quijano Centeno, B. Jackson, R. Kalim, and T. Nguyen for assistance with IC design, layout and testing, and M. Haycock for management support.

### References:

- [1] IEEE 802.11ay [Online]: [http://www.ieee802.org/11/Reports/tgay\_update.htm](http://www.ieee802.org/11/Reports/tgay_update.htm).
- [2] S. Daneshgar, et al., "A 27.8Gb/s 11.5pJ/b 60GHz Transceiver in 28nm CMOS with Polarization MIMO," *ISSCC*, pp. 166-167, Feb. 2018.
- [3] C. Thakkar, et al., "A Mixed-Signal 32-Coefficient RX-FFE 100-Coefficient DFE for an 8Gb/s 60GHz Receiver in 65nm LP CMOS," *ISSCC*, pp. 238-239, Feb. 2013.
- [4] C. Thakkar, et al., "A 10 Gb/s 45 mW Adaptive 60 GHz Baseband in 65 nm CMOS," *IEEE JSSC*, vol. 47, no. 4, pp. 952-968, April 2012.
- [5] O. E. Mattia, et al., "An up to 36Gbps Analog Baseband Equalizer and Demodulator for mm-Wave Wireless Communication in 28nm CMOS," *IEEE CICC*, pp. 1-4, April 30 to May 3, 2017.
- [6] P. A. Francese, et al., "A 16 Gb/s 3.7 mW/Gb/s 8-Tap DFE Receiver and Baud-Rate CDR With 31 ppm Tracking Bandwidth," *IEEE JSSC*, vol. 49, no. 11, pp. 2490-2502, Nov. 2014.
- [7] V. Stojanović, et al., "Autonomous Dual-Mode (PAM2/4) Serial Link Transceiver with Adaptive Equalization and Data Recovery," *IEEE JSSC*, vol. 40, no. 4, pp. 1012-1026, April 2005.



Figure 9.4.1: Concept of mm-wave dual-pol MIMO, and implemented polarization equalizer baseband.



Figure 9.4.2: Coefficient DAC area reduction using hierarchical thermometer encoding (top), DAC range/resolution control using a programmable mirror (bottom).



Figure 9.4.3: Schematics of per-stream FFE, DFE summers (left), and QPSK/16QAM configuration (right).



Figure 9.4.5: Ray tracing NLOS simulation of 4TX+4RX DP-phased array in 5m indoor environment at 5 Gsym/s (left), measured BER vs. SNR (right).



Figure 9.4.4: Baud-rate CDR PD logic and transfer characteristics, and clock generation/distribution.

|                                        | [4] JSSC 2012  | [3] ISSCC 2013 | [5] CICC 2017 | This work (NLOS QPSK) | This work (LOS QPSK) | This work (LOS 16QAM) |
|----------------------------------------|----------------|----------------|---------------|-----------------------|----------------------|-----------------------|
| Technology                             | 65nm CMOS      | 65nm CMOS      | 28nm CMOS     | 28nm CMOS             | 28nm CMOS            | 28nm CMOS             |
| Polarization                           | Single         | Single         | Single        | Dual-pol MIMO         | Dual-pol MIMO        | Dual-pol MIMO         |
| On-Chip CLK Recovery                   | 2X oversampled | –              | –             | Baud-rate, per-pol    | Baud-rate, per-pol   | Baud-rate, per-pol    |
| Equalizer + Clocking Area ($\text{mm}^2$) | 0.76        | 0.96           | 0.52          | 4.9                   | 4.9                  | 4.9                   |
| Modulation                             | QPSK           | QPSK           | 16QAM         | QPSK                  | QPSK                 | 16QAM                 |
| Area / DFE DAC ($\mu\text{m}^2$)       | 2900           | 3600           | 4500          | 480                   | 480                  | 480                   |
| DFE DAC Resolution                     | 7b             | 8b             | 7b            | 9b                    | 9b                   | 9b                    |
| Channel                                | LOS            | NLOS           | LOS           | NLOS                  | LOS                  | LOS                   |
| FFE + DFE UI                           | 0 + 20         | 16 + 50        | 0 + 5         | 24 + 150              | 0 + 150              | 0 + 75                |
| FFE + DFE Coefficients                 | 0 + 80         | 64 + 200       | 0 + 40        | 384 + 2400            | 0 + 2400             | 0 + 2400              |
| ISI + CPI (w.r.t. cursor)              | 2X             | 2.5X           | 0.7X          | 4X                    | 6.5X                 | 3.4X                  |
| Symbol Rate (Gsym/s)                   | 5              | 4              | 4             | 3.5                   | 5                    | 5                     |
| Data Rate (Gbit/s)                     | 10             | 8              | 16            | 14                    | 20                   | 40                    |
| Equalizer Power (mW)                   | 14             | 66             | 138           | 328                   | 240                  | 240                   |
| Power Efficiency (pJ/b)                | 1.4            | 8.3            | 8.6           | 23.0                  | 12.0                 | 6.0                   |
| Efficiency/Coefficient (fJ/b/coeff.)   | 17.5           | 31.3           | 215.6         | 8.4                   | 5.0                  | 2.5                   |

Figure 9.4.6: Performance summary and comparison with mixed-signal 60GHz equalizer basebands.



Figure 9.4.7: Die micrograph (28nm CMOS).

## 9.5 A 27.8Gb/s 11.5pJ/b 60GHz Transceiver in 28nm CMOS with Polarization MIMO

Saeid Daneshgar, Kaushik Dasgupta, Chintan Thakkar, Anandaroop Chakrabarti, Shuhei Yamada, Debabani Choudhury, James Jaussi, Bryan Casper

Intel, Hillsboro, OR

The industry-wide impetus on user experience and immersive content for handheld/wearable consumer devices is accelerating the demand for high-speed millimeter-wave (mm-wave) PAN wireless connectivity. Next-generation 60GHz PAN standards [1] have made it mandatory to achieve >20Gb/s rates using wide (4.32GHz or higher) bandwidth. However, in order to support multiple concurrent high-speed links, it is imperative to achieve high spectral efficiency. MIMO techniques allow for such spectrum reuse by employing simultaneous spatial streams. However, unlike at low-GHz frequencies, which exhibit rich multipath scattering and therefore a high-rank TX-RX MIMO channel matrix, mm-wave propagation is fundamentally less diverse due to higher reflection/absorption.

This work harnesses polarization (pol) selectivity to demonstrate simultaneous communication on the two pols of a single mm-wave channel. Wireless channels are prone to cross-pol interference (CPI) from variable spatial orientation between the TX and RX, non-line-of-sight (NLOS) channel conditions, and the de-polarizing effects of multipath reflector/absorber materials. This work counters such non-idealities with time-domain RX pol-equalization [2] and demonstrates mm-wave dual-pol (DP) MIMO. The integration form-factor of the single-element 2x2 DP-MIMO 60GHz transceiver (TRX) (Fig. 9.5.1) is minimized by using a dual-drive Vertical (V) / Horizontal (H)-pol antenna.

Since high spectral efficiency requires high-order constellations, TX energy efficiency at output power back-off is enhanced by using a polar architecture (Fig. 9.5.2). Amplitude and phase modulation (AM, PM) are implemented digitally using a 6b digital power amplifier (DPA) and a 9b injection-locked oscillator (ILO), respectively. Wideband digital PM is implemented at the 1/3rd carrier sub-harmonic (20GHz) by dynamically switching the 9b capacitor DAC of a series injection-locked oscillator to modulate its center frequency. Modulating at the 1/3rd sub-harmonic has two benefits. Because the subsequent tripling multiplies the phase by three, the required phase-shift range drops to only  $\pm 60^\circ$  instead of a full  $\pm 180^\circ$. It also reduces LO distribution power. The ILO tank quality factor is designed to optimize the trade-off between phase-shift range and modulation bandwidth, and it maintains the required  $\pm 60^\circ$  phase shift over a simulated 16-to-22GHz input LO frequency range. The PM signal is then tripled to 60GHz by a wideband injection-locked tripler. A transformer-based balun then converts the differential tripler output to the single-ended DPA driver input. The 6b binary-weighted DPA segments are based on a single-ended 2-stacked Class-E PA topology [3]. The upper transistor serves both for digital modulation, driven by rail-to-rail CMOS inverter drivers, and for power gain. To enable antenna reuse for half-duplex operation, the TX is turned off in RX mode by turning off the DPA supply switch and turning on all DPA segments. A quarter-wave transformer to the I/O pad then presents a high impedance at the antenna.
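The polar mapping and the  $\pm 60^\circ$  argument can be sketched as follows. The sketch converts one complex baseband symbol into a 6b DPA amplitude code and a 9b ILO phase code applied at the 20GHz sub-harmonic, and it recovers the carrier phase after the x3 tripler. Several elements are assumptions: the linear code-to-amplitude and code-to-phase maps, the full-scale normalization, and the function names. The real DPA and ILO characteristics are nonlinear and are corrected by the AM/PM pre-distortion described below.

```python
import cmath
import math

AM_BITS, PM_BITS = 6, 9
AM_MAX, PM_MAX = (1 << AM_BITS) - 1, (1 << PM_BITS) - 1
SUB_RANGE_DEG = 120.0                 # -60..+60 deg at the 20GHz sub-harmonic
PM_STEP_DEG = SUB_RANGE_DEG / PM_MAX  # hypothetical linear phase step per code


def to_polar_codes(symbol: complex, full_scale: float) -> tuple[int, int]:
    """Return (am_code, pm_code) for one baseband symbol."""
    am_code = min(AM_MAX, round(abs(symbol) / full_scale * AM_MAX))
    carrier_deg = math.degrees(cmath.phase(symbol))   # in [-180, 180]
    sub_deg = carrier_deg / 3.0                       # in [-60, 60]: tripler multiplies phase by 3
    pm_code = round((sub_deg + 60.0) / PM_STEP_DEG)
    return am_code, pm_code


def carrier_phase_deg(pm_code: int) -> float:
    """Carrier phase at 60GHz after the x3 tripler for a given PM code."""
    return 3.0 * (pm_code * PM_STEP_DEG - 60.0)
```

Tripling the phase also triples the phase quantization step. The worst-case carrier phase error is therefore 1.5 times the sub-harmonic step, about 0.35°.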

High-speed digital modulation streams for the 9b PM and 6b AM are generated using an on-chip 16b-wide, 256-word SRAM array (programmable with 4096 unique bits/stream) followed by 16-to-1 serializers. The datapath delay mismatch between the AM and PM paths is compensated using digitally controlled tristate-inverter-based delay lines at the two clock inputs for these paths. Digital AM/PM pre-distortion compensates for AM/PM non-idealities. Including the serializer-to-AM/PM digital buffers, the entire TX chain (Fig. 9.5.2) consumes a measured 86mA/pol from a 1.1V supply. It has a measured output  $P_{sat}$  of 4dBm, including the simulated 1.5dB TX-mode I/O switch loss.

Similar to the TX output, the RX uses an input switch that is turned on in TX mode and quarter-wave transformed to a high impedance at the antenna. In RX mode (Fig. 9.5.2), front-end gain is provided by a cascaded 3-stage stagger-tuned LNA with inductive source-degeneration-based input matching (Fig. 9.5.3). The single-balanced mixer pair with an LO trap is biased to achieve both high gain (9dB simulated) and low power (4.2mA measured). Carrier recovery uses a baseband I/Q phase-shifter with cascode voltage-DAC-based I/Q gain control. It achieves low power (1.6mA measured) yet high resolution ( $4^\circ$  worst-case measured angle step). The RX datapath (excluding the 50Ω baseband output drivers) consumes 21mA/pol (measured) from a 1.1V supply. It achieves 30dB gain, 3GHz bandwidth and 8dB average noise figure across the bandwidth of interest, including the simulated 2dB RX-mode I/O switch loss.

LO distribution is implemented as a 1/3rd carrier-subharmonic lumped-element LC-tuned cascaded V/H, TX/RX splitter/amplifier chain that is more power-efficient than a carrier-frequency and/or wideband T-line-based design. RX I/Q LO generation is done with 7b cap-DAC-tuned sub-harmonic ILOs with  $\pm 15^\circ$  phase shifts. The following parallel-injection frequency triplers provide both high locking bandwidth (5GHz) and high swing (1.2V<sub>pp</sub>, diff.) at the RX mixers. The entire V/H, TX/RX LO distribution chain and IQ generation circuitry consumes 76mA from a 1.1V supply.
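The injection-locked tripler multiplies phase as well as frequency, so the  $\pm 15^\circ$  shifts set at the 20GHz sub-harmonic become  $\pm 45^\circ$  at the RX mixers. This is the same factor of three that limits the TX PM range to  $\pm 60^\circ$. Assuming the I and Q ILOs take opposite signs (our reading of " $\pm 15^\circ$ "), the result is the required 90° I/Q separation:

$$
\varphi_{60\,\text{GHz}} = 3\,\varphi_{20\,\text{GHz}}
\;\Rightarrow\;
\varphi_{I,Q} = 3 \times (\pm 15^\circ) = \pm 45^\circ,
\qquad
\varphi_I - \varphi_Q = 90^\circ
$$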

The 60GHz DP-MIMO TX/RX test-chip (Fig. 9.5.7) is fabricated in a 28nm CMOS process. The die is wirebonded onto a 6-layer, 22-mil PCB with RO4350B core and RO4450F prepreg materials. The bonded mm-wave I/Os are matched on-PCB to the stacked patch antenna structure (Fig. 9.5.4) with identical V/H-pol excitation ports placed orthogonally at 0/90°. A proximity-coupled back-side excitation scheme is used to achieve an 8GHz antenna bandwidth on both V and H-pol. The simulated DP radiation patterns including wirebonds, matching circuitry and antenna feeds show a 90° spatial 3dB bandwidth per pol. Although surface-wave radiation from the large PCB surface area coupled with wirebonding causes gain variation around the broadside, about 20dB of polarization isolation is achieved for both H- and V-pol.

Over-the-air (OTA) DP-MIMO measurements were performed with two PCB assemblies, each configured in either TX or RX mode on both V/H pols. Simultaneous DP unidirectional transmission was demonstrated by programming the TX chains with uncorrelated data sequences. Cross-pol degraded the RX SNR through a combination of finite near-end TX-to-TX isolation, misaligned TX-RX orientation, multipath propagation and NLOS channel conditions. This degradation was mitigated with a 2-UI RX pol-FFE and a 3-UI pol-DFE implemented off-chip. This equalization is well within the capabilities of the mixed-signal pol-FFE and pol-DFE implementation in [2].

With QPSK modulation, maximum DP data-rates of 18.8Gb/s / 16Gb/s were measured at a 7cm/10cm TX-RX distance with LOS / NLOS multipath channels (Fig. 9.5.5). These links showed -13.0dB/-13.0dB RX EVM and no errors over 4096 bits on each pol. With 16QAM modulation, maximum DP data-rates of 27.8Gb/s / 20Gb/s were measured at a 5cm/7cm TX-RX distance with a LOS/NLOS channel, with -17.8dB/-17.9dB RX EVM and BERs of 0.7E-3/0.6E-3, respectively. 16Gb/s QPSK DP-MIMO was measured up to a 13cm distance with an RX EVM of -12.5dB and 0 errors over 4096 bits. At 7cm, with relative TX/RX angular separation in both the vertical and horizontal planes, the link achieved better than -13dB EVM over 65°/90° (Fig. 9.5.4), demonstrating a wide angular range of coverage.
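The off-chip pol-FFE can be pictured as a 2x2 FIR in which each output polarization filters both received polarizations. The sketch below is minimal and assumes a hypothetical memoryless cross-pol mixing matrix. Setting tap 0 to the matrix inverse (a zero-forcing choice) undoes that mixing. The adapted 2-UI taps used in the measurements would also address ISI and are not reproduced here. All names and channel values are illustrative.

```python
def pol_ffe(x_v: list[float], x_h: list[float], c) -> tuple[list[float], list[float]]:
    """2x2 FIR: c[out][src][i] is tap i from received pol `src` (0=V, 1=H) to output `out`."""
    xs = (x_v, x_h)
    n_taps = len(c[0][0])
    outputs = []
    for o in range(2):
        y = []
        for k in range(len(x_v)):
            acc = 0.0
            for src in range(2):
                for i in range(min(n_taps, k + 1)):   # skip samples before k = 0
                    acc += c[o][src][i] * xs[src][k - i]
            y.append(acc)
        outputs.append(y)
    return outputs[0], outputs[1]


def invert_2x2(m: list[list[float]]) -> list[list[float]]:
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def zero_forcing_taps(mix: list[list[float]], n_taps: int = 2):
    """Tap 0 = inverse of a memoryless V/H mixing matrix; later taps are zero."""
    inv = invert_2x2(mix)
    return [[[inv[o][s]] + [0.0] * (n_taps - 1) for s in range(2)] for o in range(2)]
```

Feeding V/H data through a mixing matrix such as [[1.0, 0.3], [-0.2, 0.9]] and then through `pol_ffe` with `zero_forcing_taps` of that matrix returns the transmitted V and H sequences, up to floating-point rounding.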

Compared to state-of-the-art multi-Gb/s 60GHz TRXs [4-7] (Fig. 9.5.6), this work achieves the best energy efficiency per element normalized by distance and antenna/array gain (pJ/bit/element/ND). Compared to designs with integrated PCB/package antennas and/or multipath NLOS channel demonstrations [5,6], this work achieves a 4-to-6x higher data-rate at equal BER (<1E-3). The measurements demonstrate the feasibility of mm-wave DP-MIMO to achieve data-rates >20Gb/s at 60GHz and to enhance spectral efficiency by 2x for next-generation mm-wave PAN connectivity.

### Acknowledgements:

The authors thank V. Gopinathan, S. Kang, K. Datta, S. Shopov, V. Baca, E. Quijano Centeno, B. Jackson, R. Kalim, and T. Nguyen for assistance with IC design, layout and testing, and M. Haycock for management support.

### References:

- [1] IEEE 802.11ay [Online]: [http://www.ieee802.org/11/Reports/tgay\_update.htm](http://www.ieee802.org/11/Reports/tgay_update.htm).
- [2] S. Kang, et al., "A 40Gb/s 6pJ/b RX Baseband in 28nm CMOS for 60GHz Polarization MIMO," *ISSCC*, pp. 164-165, Feb. 2018.
- [3] K. Dasgupta, et al., "A 25 Gb/s 60 GHz Digital Power Amplifier in 28nm CMOS," *ESSCIRC*, pp. 207-210, Sept. 2017.
- [4] J. Pang, et al., "A 128-QAM 60GHz CMOS Transceiver for IEEE802.11ay with Calibration of LO Feedthrough and I/Q Imbalance," *ISSCC*, pp. 424-425, Feb. 2017.
- [5] G. Mangraviti, et al., "A 4-Antenna-Path Beamforming Transceiver for 60GHz Multi-Gb/s Communication in 28nm CMOS," *ISSCC*, pp. 246-247, Feb. 2016.
- [6] M. Boers, et al., "A 16TX/16RX 60GHz 802.11ad Chipset with Single Coaxial Interface and Polarization Diversity," *ISSCC*, pp. 344-345, Feb. 2014.
- [7] K. Okada, et al., "A 64-QAM 60GHz CMOS Transceiver with 4-Channel Bonding," *ISSCC*, pp. 346-347, Feb. 2014.



Figure 9.5.1: Conceptual block diagram of the implemented 2x2 dual-pol MIMO system.



Figure 9.5.2: Schematics of polar TX with ILO-based PM and DPA-based AM.



Figure 9.5.3: RX schematics with single-ended LNA, single-balanced mixer, low-power phase-shifter and sub-harmonic LO chain.



Figure 9.5.4: Dual-pol antenna and cross-section (top left), simulated gain (top right) and return loss (bottom right). Measured RX EVM for 16Gb/s DP QPSK across spatial angles (7cm LOS) (bottom left).



Figure 9.5.5: Received constellation diagrams and EVMs for LOS and NLOS channels with multipath propagation.

| Metric                                    | This Work                                                    | [4] ISSCC'17                                       | [5] ISSCC'16                                        | [6] ISSCC'14                                                           | [7] ISSCC'14                                          |
|-------------------------------------------|--------------------------------------------------------------|----------------------------------------------------|-----------------------------------------------------|------------------------------------------------------------------------|-------------------------------------------------------|
| Technology                                | 28 nm CMOS                                                   | 65 nm CMOS                                         | 28 nm CMOS                                          | 40 nm CMOS                                                             | 65 nm CMOS                                            |
| Integration                               | TX: Digital polar<br>RX: Direct conv.<br>Ant.: Wire-bond PCB | TX: Direct conv.<br>RX: Direct conv.<br>Ant.: Horn | TX: Heterodyne<br>RX: Direct conv.<br>Ant.: Package | TX: Direct conv.<br>RX: Heterodyne<br>Ant.: Direct conv.<br>Ant.: Horn | TX: Direct conv.<br>RX: Direct conv.<br>Ant.: Package |
| TX+RX<br>Ant./Array Gain (dBi)            | 7 (1Tx + 1Rx)                                                | 28 (1Tx + 1Rx)                                     | 24 (4Tx + 4Rx)                                      | 45 (16Tx + 16Rx)                                                       | 24 (4Tx + 4Rx)                                        |
| DC Power / element                        | TX + LO: 210 mW <sup>1</sup><br>RX + LO: 110 mW              | TX: 169 mW<br>RX: 139 mW                           | TX: 167 mW<br>RX: 107 mW                            | TX: 74 mW<br>RX: 60 mW                                                 | TX: 251 mW<br>RX: 220 mW                              |
| Constellation                             | QPSK                                                         | 16QAM                                              | 64QAM                                               | 16QAM                                                                  | 16QAM                                                 |
| Data Rate (Gb/s) <sup>2</sup>             | 16                                                           | 27.8                                               | 42.24                                               | 7                                                                      | 4.6                                                   |
| EVM wrt avg. (dB)                         | -12.5                                                        | -18                                                | -21.1                                               | -20                                                                    | -19.5                                                 |
| pJ / bit / element                        | 20                                                           | 11.5                                               | 7.3                                                 | 39.3                                                                   | 29.2                                                  |
| Bits / symbol                             | 4                                                            | 8                                                  | 6                                                   | 4                                                                      | 4                                                     |
| Normalized Distance <sup>3</sup> (ND, cm) | 13                                                           | 5                                                  | 0.35                                                | 14.1                                                                   | 14                                                    |
| pJ / bit / element / ND                   | 1.5                                                          | 2.3                                                | 20.5                                                | 2.8                                                                    | 2.4                                                   |
| Multipath Channel Measurements            | Yes                                                          | No                                                 | No                                                  | Yes                                                                    | No                                                    |
| Die Area (mm <sup>2</sup> ) / element     | 3.9                                                          | 7.2                                                | 3.9                                                 | 1                                                                      | 2                                                     |
| MIMO Streams                              | 2                                                            | 1                                                  | 1                                                   | 1                                                                      | 1                                                     |

<sup>1</sup> Including digital TX driver power at highest rate

<sup>2</sup> Data rate for BER < 1E-3

<sup>3</sup> Link distance (BER < 1E-3) normalized to antenna gain in this work (3.5 dBi averaged over beam-width)

Figure 9.5.6: Comparison to state-of-art 60GHz CMOS transceivers.



Figure 9.5.7: Die micrograph and wire-bonding to dual-pol patch antenna.

## 9.6 A 120Gb/s 16QAM CMOS Millimeter-Wave Wireless Transceiver

Korkut K. Tokgoz<sup>1</sup>, Shotaro Maki<sup>1</sup>, Jian Pang<sup>1</sup>, Noriaki Nagashima<sup>1</sup>, Ibrahim Abdo<sup>1</sup>, Seitaro Kawai<sup>1</sup>, Takuya Fujimura<sup>1</sup>, Yoichi Kawano<sup>2</sup>, Toshihide Suzuki<sup>2</sup>, Taisuke Iwai<sup>2</sup>, Kenichi Okada<sup>1</sup>, Akira Matsuzawa<sup>1</sup>

<sup>1</sup>Tokyo Institute of Technology, Tokyo, Japan; <sup>2</sup>Fujitsu Laboratories, Atsugi, Japan

This paper presents an ultra-wideband millimeter-wave (mm-wave) wireless transceiver (TRX) achieving a 120Gb/s data-rate, the highest among state-of-the-art mm-wave transceivers [1-4]. The 120Gb/s data-rate is realized by two 15GBaud data streams in 16QAM modulation ( $2 \times 15 \times 4b/\text{symbol} = 120\text{Gb/s}$ ). The total 30GBaud signal is up- and down-converted with 70GHz and 105GHz LO signals. These LO signals are generated from an external 35GHz source by a doubler and a tripler, with more than 29dBc and 38dBc undesired-harmonic suppression, respectively. Single-IF balanced mixers with an LO leakage-cancellation technique are employed in the TX. Hence, the TRX can transmit and receive the whole 35GHz bandwidth from 70 to 105GHz. The transmitter (TX) and receiver (RX) consume 120mW and 160mW, respectively. The total core area of the TRX is 3.2mm<sup>2</sup>.

The TRX block diagram is shown in Fig. 9.6.1. Low-band (LB) and high-band (HB) IF signals lie from 0.3 to 17.2GHz with a center frequency of 8.75GHz. A frequency doubler and a tripler generate the 70GHz and 105GHz up- and down-conversion LO signals from a 35GHz input. The doubler uses a two-stage narrow-bandwidth LO buffer working at 70GHz, and the tripler uses a three-stage narrow-bandwidth LO buffer working at 105GHz. An additional 105GHz LO buffer in the RX provides more power to the HB downconversion mixer.

In the TX, single-IF balanced upconversion mixers are used for LO leakage cancellation, and capacitive cross-coupled differential LO buffers provide the differential LO to these mixers. Wideband resistive-feedback RF amplifiers are designed for 70 to 87.5GHz (LB) and 87.5 to 105GHz (HB). Owing to the RF-to-IF impedance transformation of the mixers, these amplifiers also provide wideband matching at the LB and HB IF inputs of the TX. An additional one-stage common-source (CS) RF amplifier on the HB path keeps the upconverted LB and HB RF power levels close, since the HB mixer has more loss than the LB mixer. The PA is a six-stage design covering 70 to 105GHz. Its last five stages are positive-feedback CS (PFCS), whereas the first stage is a plain CS stage for better isolation and hence stability.

The RX has a five-stage PFCS LNA covering 70 to 105GHz. The LNA output is split into two paths, each followed by a one-stage PFCS RF amplifier: the LB RF amplifier covers 70 to 87.5GHz and the HB RF amplifier covers 87.5 to 105GHz. The gain characteristics of the TX RF amplifiers, PA, LNA and RX RF amplifiers help to suppress unwanted upconverted sidebands, such as those below 70GHz for LB and above 105GHz for HB. The RX downconversion mixers are based on single transistors with a total width of 12μm and a finger width of 2μm.
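The frequency-interleaved plan can be checked with a few lines. The sketch assumes that LB occupies the upper sideband of the 70GHz LO and HB the lower sideband of the 105GHz LO. This sideband choice is inferred from the 70-87.5GHz and 87.5-105GHz amplifier bands rather than stated explicitly. The function names are illustrative.

```python
LO_LB_GHZ, LO_HB_GHZ = 70.0, 105.0
IF_LO_GHZ, IF_HI_GHZ = 0.3, 17.2


def lb_rf(if_ghz: float) -> float:
    """LB: upper sideband of the 70GHz LO."""
    return LO_LB_GHZ + if_ghz


def hb_rf(if_ghz: float) -> float:
    """HB: lower sideband of the 105GHz LO (spectrum inverted)."""
    return LO_HB_GHZ - if_ghz


if_center = (IF_LO_GHZ + IF_HI_GHZ) / 2
print(f"IF center {if_center:.2f} GHz -> LB {lb_rf(if_center):.2f} GHz, "
      f"HB {hb_rf(if_center):.2f} GHz")
print(f"LB {lb_rf(IF_LO_GHZ):.1f}-{lb_rf(IF_HI_GHZ):.1f} GHz, "
      f"HB {hb_rf(IF_HI_GHZ):.1f}-{hb_rf(IF_LO_GHZ):.1f} GHz")
```

The IF center maps to 78.75GHz (LB) and 96.25GHz (HB), the same RF frequencies used later for the RX EVM tests. The two bands sit on either side of 87.5GHz.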

Figure 9.6.2 shows the LO generation and upconversion mixer circuitry. The doubler transistor has a total width of 12μm with a finger width of 2μm and is biased at 0.35V for the highest second-order non-linearity. The buffers in both the doubler and the tripler use the PFCS topology. They are designed with narrow bandwidth to suppress the fundamental (35GHz) and undesired harmonics, improving TX and RX EVM while avoiding cross-modulation between LB and HB. The figure details the first-stage buffer of the doubler. Simulation shows a 29dBc difference between the desired second harmonic and the undesired harmonics at the doubler output. The tripler transistor has a total width of 40μm with a finger width of 2μm and is biased at 0.2V for the highest third-order non-linearity. The simulated tripler input-to-output power characteristics show a difference of around 38dBc between the desired third harmonic and the undesired harmonics. The outputs of the doubler and tripler are matched to 25Ω, since each output is split into two 50Ω paths (one for TX, one for RX) that appear in parallel. As mentioned previously, an additional buffer after the tripler increases the LO power to the RX HB mixer.

The upconversion mixers use a single-IF balanced topology for LO leakage cancellation. Cancellation is achieved with a differential LO signal and by properly adjusting the bias nodes of the mixer and differential buffer transistors, as well as the DC offset voltage at the mixer IF ports. The DC offset voltages are adjusted with the DC current flowing through the 80Ω wideband RF resistors, which also help with wideband matching of the TX IF input ports.

Figure 9.6.3 shows the TX-only and RX-only EVM measurement results. The figure also indicates the 16QAM EVM requirement of -19.5dB. This limit is 3dB more stringent than the TX-to-RX EVM requirement (-16.5dB) for a BER of 10<sup>-3</sup>. A custom test module with a transition from the PCB to a W-band waveguide interface allows connection to external components for measurement purposes [3]. Additionally, the bandpass filtering property of the waveguide helps to suppress undesired sidebands. A 65GS/s 25GHz-BW AWG (Keysight M8195A) generates 5GBaud QPSK and 16QAM IF signals, and a 200GS/s 70GHz-BW oscilloscope (Tektronix DPO77002SX) collects the output data.

During the TX EVM measurements, the TX output is connected to a 70-to-84GHz BPF for LB and a 93-to-109GHz BPF for HB, followed by an external W-band mixer whose LO input is at 87.5GHz. The IF output of the external mixer (Millitech MXP-10-RFSFL), which has 13dB conversion loss and 13dB NF, is connected to the oscilloscope. As Fig. 9.6.3 shows, the TX achieves a best 16QAM EVM of -24.3dB at an output power of -7.4dBm for LB, and -21.1dB at an output power of -9.4dBm for HB. Note that AWG, cable and module losses limit the TX input power to -2dBm; hence, better TX EVM is possible with higher input power.

During the RX EVM measurements, the external mixer upconverts the IF signals to 78.75GHz for RF LB and 96.25GHz for RF HB, and BPFs are used before the RX. The RX output is connected to the oscilloscope. For 5GBaud 16QAM RF signals, the RX achieves a best EVM of -25.6dB at an input power of -26.3dBm for LB, and -24.7dB at an input power of -24.3dBm for HB.
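The 3dB gap between the per-block limit and the TX-to-RX limit follows if TX and RX impairments are independent. In that case their error-vector powers (EVM²) add. Independence is an assumption here, and the function names are illustrative.

```python
import math


def combine_evm_db(*evm_db: float) -> float:
    """Total EVM of independent impairments: error powers (EVM^2) add."""
    return 10.0 * math.log10(sum(10.0 ** (e / 10.0) for e in evm_db))


def per_block_budget_db(link_limit_db: float, n_blocks: int = 2) -> float:
    """Equal per-block EVM that just meets `link_limit_db` in total."""
    return link_limit_db - 10.0 * math.log10(n_blocks)


print(f"{combine_evm_db(-19.5, -19.5):.2f} dB")   # -16.49 dB: TX and RX each at -19.5dB
print(f"{per_block_budget_db(-16.5):.2f} dB")     # -19.51 dB: budget for a -16.5dB link
```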

Figure 9.6.4 shows the measured TX-to-RX EVM in QPSK and 16QAM at a distance of 0.2m for symbol rates from 4GBaud to 15GBaud. External 23dBi horn antennas are used for TX and RX. The roll-off factor is 0.35 for symbol rates from 4GBaud to 12.5GBaud, i.e., analog data bandwidths from 5.4GHz to 16.9GHz. The roll-off factor is 0.25 for 13.5GBaud (16.9GHz analog bandwidth) and 0.13 for 15GBaud (17GHz analog bandwidth). The plots also show the required TX-to-RX EVM limits of -9.8dB for QPSK and -16.5dB for 16QAM, together with results both without and with built-in software equalization. The LB TX-to-RX QPSK EVM differs by less than 0.9dB between LB-only operation and simultaneous LB and HB operation. The corresponding difference for HB is less than 0.6dB. These results indicate that cross-modulation between LB and HB contributes negligibly to TRX performance. With built-in software equalization, a 16QAM data-rate of 120Gb/s is achieved (15GBaud for LB and 15GBaud for HB simultaneously). Without any equalization, a 16QAM data-rate of 64Gb/s is achieved (8GBaud for LB and 8GBaud for HB simultaneously).
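The quoted analog bandwidths follow from the raised-cosine occupied bandwidth B = R<sub>s</sub>(1 + α). The quick check below uses an illustrative function name. It also shows that the reduced roll-off at 13.5 and 15GBaud keeps the occupied bandwidth at or below about 17GHz.

```python
def occupied_bw_ghz(baud_g: float, roll_off: float) -> float:
    """Raised-cosine occupied bandwidth B = Rs * (1 + alpha)."""
    return baud_g * (1.0 + roll_off)


SETTINGS = [(4.0, 0.35), (12.5, 0.35), (13.5, 0.25), (15.0, 0.13)]

for baud, alpha in SETTINGS:
    print(f"{baud:5.1f} GBaud, alpha = {alpha:.2f}: {occupied_bw_ghz(baud, alpha):.2f} GHz")
```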

Figure 9.6.5 shows the measured constellations and TX output spectra. As mentioned above, the 120Gb/s data-rate is achieved with a TX-to-RX EVM of -16.9dB for 15GBaud LB and -17dB for 15GBaud HB in 16QAM. The TX output power is -2dBm, excluding the 10dB module loss. Without any equalization, the TRX achieves a maximum data-rate of 72Gb/s (2×12GBaud in 8PSK); a 60Gb/s data-rate (2×6GBaud in 32QAM) is also achieved. Furthermore, a 60Gb/s data-rate (6GBaud for LB with -31.1dB EVM and 4GBaud for HB with -36.1dB EVM) is achieved in 64QAM with built-in software equalization.

Figure 9.6.6 compares this work with other high data-rate TRXs. This work achieves a 120Gb/s wireless data-rate. The TX and RX consume 120mW and 160mW, respectively, from a 1V supply.
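The comparison table does not list an energy-per-bit figure, but one follows directly from the quoted numbers. A minimal sketch, assuming the TX and RX DC power are simply summed and divided by the 120Gb/s data-rate (the function name is illustrative):

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """mW / (Gb/s) = (1e-3 J/s) / (1e9 b/s) = 1e-12 J/b, i.e. pJ/b."""
    return power_mw / data_rate_gbps


tx_mw, rx_mw = 120.0, 160.0     # DC power quoted in the text
rate_gbps = 120.0               # 2 x 15GBaud 16QAM
print(f"TX only : {energy_per_bit_pj(tx_mw, rate_gbps):.2f} pJ/b")
print(f"RX only : {energy_per_bit_pj(rx_mw, rate_gbps):.2f} pJ/b")
print(f"TX + RX : {energy_per_bit_pj(tx_mw + rx_mw, rate_gbps):.2f} pJ/b")
```

This gives roughly 2.3pJ/b for the TRX pair; it covers only the power included in the 120mW and 160mW figures.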

The TRX is fabricated in 65nm CMOS with a total area of 6mm<sup>2</sup>, as shown in Fig. 9.6.7. The total active area, excluding the SPI control circuitry, is around 3.2mm<sup>2</sup>. The area and power breakdown of the TRX is also detailed in this figure.

### Acknowledgments:

This work is partially supported by MIC, SCOPE, STAR, and VDEC in collaboration with Cadence Design Systems, Inc., Synopsys, Inc., Mentor Graphics, Inc., and Keysight Technologies Japan, Ltd. The authors thank Mr. Junetsu Hariu for his technical support on module implementation.

### References:

- [1] R. Wu, et al., "64-QAM 60-GHz CMOS Transceivers for IEEE 802.11ad/ay," *IEEE JSSC*, vol. 52, no. 11, pp. 2871-2891, Nov. 2017.
- [2] J. Pang, et al., "A 128-QAM 60GHz CMOS Transceiver for IEEE802.11ay with Calibration of LO Feedthrough and I/Q Imbalance," *ISSCC*, pp. 424-425, Feb. 2017.
- [3] K. K. Tokgoz, et al., "A 56Gb/s W-Band CMOS Wireless Transceiver," *ISSCC*, pp. 242-243, Feb. 2016.
- [4] F. Boes, et al., "Ultra-Broadband MMIC-Based Wireless Link at 240 GHz Enabled by 64 GS/s DAC," *Int. Conf. Infrared, Millimeter, and Terahertz Waves*, pp. 1-2, Sept. 2014.



Figure 9.6.1: Block diagram of frequency-interleaved CMOS transceiver.



Figure 9.6.2: LO generation and upconversion mixer circuitry.



Figure 9.6.3: Transmitter and receiver EVM measurement summary.



Figure 9.6.4: Measured TX-to-RX EVM of QPSK and 16QAM.



\*23dBi horn antennas are used for TX and RX. Communication distance is 0.2m, including the module loss.  
\*\*The roll-off factor is 0.13 in all cases above; the total occupied band is 70-105GHz; spectra are normalized for LB&HB.

Figure 9.6.5: Measured TX-to-RX constellation, spectrum and performance.

| Ref.      | Freq. [GHz]                                    | Data-Rate (Modulation)                                                                                                                    | Symb. Rate [GBaud]         | TX P<sub>out</sub> [dBm]  | Technology  | P<sub>DC</sub> [mW]  | Die Area [mm<sup>2</sup>]   | Distance [m] |
|-----------|------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|----------------------------|---------------------------|-------------|----------------------|-----------------------------|--------------|
| [5]       | 123-138                                        | 12.5Gb/s (OOK) <sup>a</sup>                                                                                                               | 12.5                       | 9.5                       | 55nm BiCMOS | TX: 59<br>RX: 38     | 3.2                         | 5**          |
| [6,7]     | 235-245                                        | 16Gb/s (QPSK) <sup>a</sup>                                                                                                                | 8                          | 0                         | 65nm CMOS   | TX: 220<br>RX: 260   | 4                           | 0.02         |
| [8]       | 148-162                                        | 20Gb/s (QPSK) <sup>a</sup>                                                                                                                | 10                         | -7                        | 45nm SOI    | TRX: 345             | 3.92                        | N/A*         |
| [9,10]    | 285-295                                        | 32Gb/s (16QAM)                                                                                                                            | 8                          | -5.5                      | 40nm CMOS   | TX: 1400<br>RX: 650  | 8.34                        | 0.01         |
| [2]       | 56-67                                          | 50.1Gb/s (64QAM)                                                                                                                          | 8.35                       | 4.8                       | 65nm CMOS   | TX: 169<br>RX: 139   | 6                           | 0.01         |
| [3]       | 74-98                                          | 56Gb/s (16QAM)                                                                                                                            | 14                         | -8.4                      | 65nm CMOS   | TX: 260<br>RX: 300   | 6                           | 0.1          |
| [4]       | 208-272                                        | 96Gb/s (8PSK)                                                                                                                             | 32                         | -3.5                      | 35nm GaAs   | —                    | 5                           | 40**         |
| This Work | 70-105<br>75-100<br>73-102<br>71-104<br>70-105 | 60Gb/s (QPSK) <sup>a</sup><br>60Gb/s (32QAM) <sup>b</sup><br>64Gb/s (16QAM) <sup>b</sup><br>72Gb/s (8PSK) <sup>a</sup><br>120Gb/s (16QAM) | 30<br>12<br>16<br>24<br>30 | -1.9                      | 65nm CMOS   | TX: 120<br>RX: 160   | 6                           | 0.2          |

\* Wireline measurements \*\* With high-gain lens ant. <sup>a</sup> Without built-in software equalization

[5] N. Dolatsha et al., ISSCC 2017 [6] S. Kang et al., JSSC 2015 [7] S. V. Thyagarajan et al., JSSC 2015  
[8] Y. Yang et al., RFIC 2014 [9] K. Takano et al., ISSCC 2017 [10] S. Hara et al., IMS 2017

Figure 9.6.6: Performance comparison of high data-rate transceivers.



Figure 9.6.7: Die micrograph, power and area breakdown.

## 9.7 A Broadband and Deep-TX Self-Interference Cancellation Technique for Full-Duplex and Frequency-Domain-Duplex Transceiver Applications

Kun-Da Chu, Mohamad Katanbaf, Tong Zhang, Chenxin Su,  
Jacques C. Rudell

University of Washington, Seattle, WA

Full-Duplex (FD) radios, capable of simultaneously transmitting and receiving on the same frequency, have emerged as one approach to address the future demand for higher data-rates. Although recent highly integrated full-duplex radios [1-4] show promise for improving spectral efficiency by >1.9x [5], mitigating transmitter (TX) self-interference (SI) for broad-bandwidth signals remains a major challenge, particularly for longer-range transceivers requiring a high-output-power PA. This paper describes a CMOS transceiver with deep TX SI cancellation over a wide bandwidth (65dBc over 80MHz and 70dBc over 40MHz).

Achieving sufficient TX SI cancellation depth and bandwidth for envisioned 5<sup>th</sup>-Generation standards with a single component is quite challenging. However, cascading several SI cancellation techniques as the signal passes down the receiver chain appears promising for achieving high TX SI attenuation, as was done with the two-point injection architecture in [3]. That approach used two feedforward cancelers, both fed from the TX output: a high-frequency multitap adaptive FIR filter supplied a cancellation signal to the LNA input, while the second cancellation path frequency-translated the signal to baseband, where a complex analog adaptive I/Q filter further suppressed the TX SI [3]. However, the BB I/Q filters require many taps and occupy significant silicon area compared to an equivalent RF filter implementation. In addition, calibrating both complex filters is challenging, and any residual imbalance between the I and Q paths limits the cancellation performance. In contrast, this radio implements three components (versus two) along the RX signal chain, each contributing to TX SI cancellation: an integrated electrical balance duplexer (EBD) between the TX, RX and antenna ports, plus a new dual-point injection feedforward canceler (two cancellation paths) between the TX and RX (Fig. 9.7.1).

The two injection points of these feedforward cancelers along the receiver chain of this dual-path architecture are the LNA input (similar to [3]) and the interface between the LNA and the RX downconversion mixers (Fig. 9.7.1). This obviates the need for the complex I/Q auxiliary downconversion mixers and the analog FIR filters used in [3]. Moreover, the second cancellation path is moved earlier in the RX chain, reducing the linearity demand on the RX downconversion mixers and the analog-baseband filters. The second cancellation path also requires a far lower fractional bandwidth than in [3], where it is 100%. Further benefits of this dual-path architecture relate to TX broadband-noise and phase-noise (PN) suppression in the receiver. Unlike [3], which suppresses the TX reciprocal-mixing products only when the TX and RX use the same carrier frequency, the pre- and post-LNA cancellation networks in this work can be set to suppress the TX carrier and/or noise in FDD mode, independent of the RX and TX carrier frequencies (Fig. 9.7.2).

The TX chain consists of an upconversion mixer with LO dividers/drivers and a Class-AB power amplifier. The RX signal path includes a transconductance ($G_m$) LNA and a passive mixer driven by a 25% duty-cycle LO, followed by a transimpedance amplifier. An integer-N synthesizer provides the LO signals for the RX. The integrated EBD and a transformer at the PA output together eliminate the need for off-chip RF components, including a circulator and RF baluns.

This transceiver employs an EBD and two low-power, compact, wideband cancelers to achieve high SI cancellation (Fig. 9.7.2). The EBD achieves high TX-to-RX isolation (~39dB) over a wide bandwidth (200MHz) [6]. The inputs of both cancelers are connected to the PA output matching network to capture the TX carrier signal, along with the phase noise and the harmonics generated by TX BB nonlinearities. The combination of the EBD and the two-path feedforward canceler achieves more than 70dB cancellation over 40MHz (Fig. 9.7.5).
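Because the EBD and the cancelers act in cascade on the same SI signal, their attenuations multiply in linear terms, i.e., add in dB. The sketch below uses this idealized rule to estimate how much of the >70dB total at 40MHz is left for the two cancelers once the ~39dB EBD isolation is accounted for. It assumes the stages act independently and that the EBD isolation holds over the same 40MHz band; the per-stage split is an estimate, not a value reported by the authors.

```python
import math


def db_to_power_ratio(db: float) -> float:
    return 10 ** (db / 10)


def cascaded_attenuation_db(stages_db) -> float:
    """Attenuations applied in cascade multiply as power ratios, i.e. add in dB."""
    total = 1.0
    for a_db in stages_db:
        total *= db_to_power_ratio(a_db)
    return 10 * math.log10(total)


ebd_isolation_db = 39.0      # EBD TX-to-RX isolation quoted in the text
total_40mhz_db = 70.1        # EBD + both RF cancelers, 40MHz OFDM signal
canceler_share_db = total_40mhz_db - ebd_isolation_db   # rough estimate only
print(f"Implied combined canceler contribution: ~{canceler_share_db:.1f} dB")
print(f"Recombined total: {cascaded_attenuation_db([ebd_isolation_db, canceler_share_db]):.1f} dB")
```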

A 2-to-4 turns-ratio EBD provides both single-ended-to-differential conversion and LNA noise matching (Fig. 9.7.3). The LNA is implemented with a current-reuse $G_m$ stage, which provides a current-mode output to drive the passive downconversion mixers. Both PMOS and NMOS tail currents are used in the LNA to provide first-order rejection of any common-mode signals generated by the EBD [6]. A 1-to-3 transformer placed between the EBD and the PA translates the 50Ω antenna impedance down to 6Ω, which is the $r_{opt}$ of the PA output.
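The 50Ω-to-6Ω translation can be checked with the ideal-transformer relation $Z_{in} = Z_L/n^2$. A minimal sketch, assuming "1-to-3" denotes the turns ratio and an ideal, lossless transformer with unity coupling (the function name is illustrative):

```python
def reflected_impedance(z_load_ohm: float, turns_ratio: float) -> float:
    """Ideal transformer: impedance seen at the primary = Z_load / n**2,
    where n = N_secondary / N_primary."""
    return z_load_ohm / turns_ratio ** 2


z_antenna = 50.0   # antenna-side impedance quoted in the text
n = 3.0            # assumed turns ratio for the "1-to-3" transformer
print(f"PA-side impedance: {reflected_impedance(z_antenna, n):.2f} ohm")
```

An ideal 1:3 transformer gives about 5.6Ω, close to the quoted 6Ω; finite coupling, winding loss, and parasitic capacitance in a real on-chip transformer would shift the actual value.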

Each RF canceler consists of a 5-tap FIR filter that emulates the inverse response of multiple TX leakage paths [3] (Fig. 9.7.3). Each tap includes an 8b (signed-magnitude) variable-$G_m$ amplifier and a push-pull interstage buffer that drives an RC-CR allpass filter with a measured time delay of 50.9ps (Fig. 9.7.5). The push-pull buffer improves the canceler IIP3 by 3dB compared to [3]. Both RF cancelers deliver a current-mode output that is summed with a current-mode signal in the RX path. Each canceler presents a high impedance at both its input (~3kΩ) and its output (~1kΩ) to avoid loading the PA output and the LNA input/output. To further improve canceler linearity, an attenuator is placed before the input of each canceler. The simulated IIP3 of the canceler is +40dBm; a direct measurement is not possible because the TX and RX paths are fully integrated, from baseband to the antenna port on the TX side and from the EBD input to baseband on the RX side. Note that the IIP3 simulation was performed with the PA output $r_{opt}$ (6Ω differential resistance) placed at the canceler input, which is significantly lower than the real part of the impedance looking into the canceler itself (~3kΩ). This simulation setup more accurately reflects the voltage levels at the canceler input when the PA operates at maximum output power.
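To illustrate how a tapped delay line with signed-magnitude weights synthesizes a cancellation response, the sketch below models one 5-tap canceler as an ideal FIR filter $H(f)=\sum_k w_k e^{-j2\pi f k\tau}$ with $\tau$ = 50.9ps. Several assumptions are made: tap k sees a delay of $k\tau$, each RC-CR allpass section is treated as a pure delay (a real allpass delay varies with frequency), the 8b code is split into one sign bit and seven magnitude bits, and the full-scale gain and example weights are hypothetical.

```python
import cmath

TAP_DELAY_S = 50.9e-12   # measured per-tap delay quoted in the text
N_TAPS = 5               # taps per RF canceler
MAG_BITS = 7             # assumption: 8b signed-magnitude = 1 sign bit + 7 magnitude bits
FULL_SCALE = 1.0         # hypothetical normalized maximum tap gain


def quantize_signed_magnitude(w: float) -> float:
    """Round a real tap weight to the nearest 8b signed-magnitude level, clipping at full scale."""
    levels = 2 ** MAG_BITS - 1                      # 127 nonzero magnitude steps
    code = round(min(abs(w), FULL_SCALE) / FULL_SCALE * levels)
    sign = -1.0 if w < 0 else 1.0
    return sign * code * FULL_SCALE / levels


def canceler_response(weights, f_hz: float) -> complex:
    """H(f) = sum_k w_k * exp(-j*2*pi*f*k*tau) for an ideal tapped delay line."""
    return sum(w * cmath.exp(-2j * cmath.pi * f_hz * k * TAP_DELAY_S)
               for k, w in enumerate(weights))


# Hypothetical unquantized weights, e.g. as produced by an adaptation loop.
ideal = [0.52, -0.31, 0.17, -0.08, 0.03]
coded = [quantize_signed_magnitude(w) for w in ideal[:N_TAPS]]
for f in (1.6e9, 1.75e9, 1.9e9):
    h = canceler_response(coded, f)
    print(f"{f / 1e9:.2f} GHz: |H| = {abs(h):.3f}, angle = {cmath.phase(h):+.3f} rad")
```

With only five taps spanning about 200ps, the response varies slowly over the 1.6-to-1.9GHz RX band, which is what lets a handful of weights match a smooth leakage response.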

This transceiver chip was fabricated in TSMC's 6L 40nm LP CMOS process with a die size of 4mm<sup>2</sup> and consumes 106mW (excluding the PA). The EBD occupies an area of 0.23mm<sup>2</sup>, while both RF cancelers together occupy 0.12mm<sup>2</sup>. An Altera Cyclone III FPGA with 14b ADCs/DACs operating at 100MHz emulates the digital BB and closes the filter adaptation loop (Fig. 9.7.4). The FPGA finds the optimal canceler weight codes in real time using a gradient-descent algorithm. The EBD was characterized using an on-chip test structure and has a measured TX-to-RX isolation of 39dB over 200MHz. The RX operates from 1.6GHz to 1.9GHz with a measured gain of 42dB and a total noise figure (NF) of 8.09dB, of which 5.6dB is contributed by the measured passive loss of the EBD. The PA has a measured output $P_{-1dB}/P_{sat}$ of 10.6dBm/12.5dBm. The integer-N synthesizer has a measured locking range of 3.52GHz to 4.28GHz and consumes 10.4mW, with a PN of -117dBc/Hz at 1MHz offset.
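The text states only that the FPGA uses gradient descent to find the canceler weights. As one illustration of such a loop, the sketch below uses the least-mean-squares (LMS) algorithm, a stochastic gradient-descent method, on a sampled, real-valued model in which one sample equals one tap delay. The leakage response and the noise level are hypothetical. In the actual system the weights are analog $G_m$ codes and the residual is observed through the RX and the 14b ADCs, so this is a behavioral sketch, not the authors' implementation.

```python
import math
import random


def lms_adapt(x, d, n_taps=5, mu=0.01):
    """LMS: stochastic gradient descent on the instantaneous squared residual e[n]**2.

    x: TX reference samples; d: SI leakage seen by the RX.
    The canceler output y is subtracted from d, so w converges toward the leakage response."""
    w = [0.0] * n_taps
    residual = []
    for n in range(n_taps - 1, len(x)):
        taps = [x[n - k] for k in range(n_taps)]        # x[n], x[n-1], ..., x[n-4]
        y = sum(wk * xk for wk, xk in zip(w, taps))
        e = d[n] - y                                    # residual SI after cancellation
        # d(e**2)/d(w_k) = -2*e*x[n-k]; step against the gradient (factor 2 folded into mu)
        w = [wk + mu * e * xk for wk, xk in zip(w, taps)]
        residual.append(e)
    return w, residual


def mean_power(v):
    return sum(s * s for s in v) / len(v)


random.seed(0)
leak = [0.6, -0.25, 0.1, 0.05, -0.02]                  # hypothetical leakage response
x = [random.gauss(0.0, 1.0) for _ in range(20000)]     # white, unit-power TX reference
# Leakage plus a small hypothetical RX noise floor.
d = [sum(h * x[n - k] for k, h in enumerate(leak) if n >= k) + random.gauss(0.0, 1e-3)
     for n in range(len(x))]

w, e = lms_adapt(x, d)
gain_db = 10 * math.log10(mean_power(d[-1000:]) / max(mean_power(e[-1000:]), 1e-30))
print("adapted weights:", [round(wk, 4) for wk in w])
print(f"residual SI reduction over last 1000 samples: {gain_db:.1f} dB")
```

In this model the achievable reduction is set by the assumed noise floor; in hardware, weight quantization and canceler nonlinearity would also limit it.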

The on-chip SI cancellation was tested by applying a 20/40/80MHz 64-QAM OFDM multicarrier signal; measured channel power differences of 72.8/70.1/65.2dB (maximum 77.6dB from a single-tone sweep) were obtained with integration bandwidths of 22/45/85MHz, respectively (Fig. 9.7.6). The NF degradation due to the TX leakage signal dropped from 8dB to 1.6dB after turning on both RF cancelers. Reciprocal mixing of the TX SI with the RX LO phase noise was also reduced by 11dB (Fig. 9.7.5), because all cancellation occurs before the RX mixers.

A performance comparison table is given in Fig. 9.7.6, and the die micrograph is shown in Fig. 9.7.7. A complete transceiver with fully integrated analog TX upconversion and RX downconversion paths demonstrates the feasibility of FD radios, achieving 72.8/70.1/65.2dB of on-chip SI cancellation over 20/40/80MHz bandwidths, respectively. This was achieved by cascading three TX-to-RX isolation blocks along the receiver signal chain.

### Acknowledgements:

This work was supported by NSF #1408575, SRC, Intel, CDADIC, Google, and Marvell. The authors thank Zhenghan Lin for assistance with lab measurements.

### References:

- [1] S. Ramakrishnan, et al., "An FD/FDD Transceiver with RX Band Thermal, Quantization, and Phase Noise Rejection and >64dB TX Signal Cancellation," *IEEE RFIC*, pp. 352-355, 2017.
- [2] N. Reiskarimian, et al., "Highly-Linear Integrated Magnetic-Free Circulator-Receiver for Full-Duplex Wireless," *ISSCC*, pp. 316-317, Feb. 2017.
- [3] T. Zhang, et al., "A 1.7-to-2.2GHz Full-Duplex Transceiver System with >50dB Self-Interference Cancellation over 42MHz Bandwidth," *ISSCC*, pp. 314-315, Feb. 2017.
- [4] B. van Liempd, et al., "Adaptive RF Front-Ends Using Electrical-Balance Duplexers and Tuned SAW Resonators," *IEEE TMTT*, no. 99, pp. 1-8, 2017.
- [5] M. Chung, et al., "Prototyping Real-Time Full Duplex Radios," *IEEE Communications Magazine*, vol. 53, no. 9, pp. 56-63, Sept. 2015.
- [6] M. Elkholy, et al., "Low-Loss Integrated Passive CMOS Electrical Balance Duplexers With Single-Ended LNA," *IEEE TMTT*, vol. 64, no. 5, pp. 1544-1559, May 2016.



Figure 9.7.1: Block-level diagram of FD/FDD transceiver architecture.



Figure 9.7.2: Conceptual diagram of TX self-interference suppression as signal passes from the transmitter through the receiver.



Figure 9.7.3: Detailed circuit diagram of: 1) RF front-end interface, 2) RF cancellers.



Figure 9.7.4: Lab bench setup allowing various modes of testing. Chip testboard and Altera Cyclone III FPGA board shown above.



Figure 9.7.5: Measurement results of TX SI mitigation: SI cancellation with CW signal, NF and reciprocal mixing, tap delay of cancellers, and SI cancellation with OFDM signal.

|                                       | RFIC' 2017 [1]           | ISSCC' 2017 [2]                | ISSCC' 2017 [3]             | TMTT' 2017 [4]                | This Work                              |
|---------------------------------------|--------------------------|--------------------------------|-----------------------------|-------------------------------|----------------------------------------|
| Architecture                          | Replica Cancellation DAC | N-Path-Filter-Based Circulator | Dual-Path + Adaptive Filter | EBD + SAW Filter              | EBD + Double-RF Adaptive Filter        |
| Technology                            | 65nm                     | 65nm                           | 40nm                        | 0.18μm SOI                    | 40nm                                   |
| Frequency Range (GHz)                 | 1-2                      | 0.61-0.975                     | 1.7-2.2                     | 0.7-1.0                       | 1.6-1.9                                |
| TX_out-to-RX_in Iso. (dB)             | N/A                      | 40°                            | N/A                         | 50°                           | 39°                                    |
| Total On-Chip SIC Depth (dB) / BW (MHz) | 64 / 20                  | 40 / 20                        | 50 / 42                     | 50 / 10 (SAW)<br>50 / 2 (EBD) | 72.8 / 20°<br>70.1 / 40°<br>65.2 / 80° |
| RX Gain (dB)                          | 35                       | 28                             | 36                          | 8.8                           | 42                                     |
| Noise Figure (dB)                     | 3.6                      | 6.3                            | 4                           | 7.6                           | 8.1 (5.6dB from EBD)                   |
| SIC NF Degradation (dB)               | 3.4 <sup>b</sup>         | 1.7                            | 1.5                         | N/A                           | 1.6                                    |
| Canceller Area (mm<sup>2</sup>)       | N/A                      | N/A                            | 0.349 (RF+BB)               | N/A                           | 0.12 (RF<sup>c</sup>)                  |
| SI Circuitry Power (mW)               | N/A                      | 36'                            | 11.5 (RF+BB)                | 0 (passive)                   | 14.3 (RF <sup>c</sup> )                |
| TX SI to RX LO Suppression (dB)       | N/A                      | N/A                            | 10                          | N/A                           | 11                                     |
| Integrated TX Upconversion Path       | No                       | No                             | No                          | No                            | Yes                                    |
| Integrated PA                         | Yes                      | No                             | Yes                         | No                            | Yes                                    |
| Integrated PLL                        | No                       | No                             | Yes                         | No                            | Yes                                    |
| Active Area (mm<sup>2</sup>)          | 6.25                     | 0.94                           | 3.5                         | 6.62                          | 4                                      |

<sup>a</sup> Measured with on-chip test structure. <sup>b</sup> Measured channel power difference with 20MHz 64QAM and 22MHz integration BW. <sup>c</sup> Measured channel power difference with 80MHz 64QAM and 85MHz integration BW.  
\* Averaged over 20MHz. <sup>d</sup> Antenna interface. <sup>e</sup> Calculated by the difference between system NF and RX NF.

Figure 9.7.6: Comparison table with state-of-art FD publications.



Figure 9.7.7: TSMC 6L-metal FD and FDD transceiver die micrograph.

## 9.8 A 1.4-to-2.7GHz High-Efficiency RF Transmitter with an Automatic 3f<sub>LO</sub>-Suppression Tracking-Notch-Filter Mixer Supporting HPUE in 14nm FinFET CMOS

Qing Liu, Daehyun Kwon, QuangDiep Bui, Jeonghyun Choi, Jaehun Lee, Sanghyun Baek, Seungchan Heo, Thomas Byunghak Cho

Samsung Electronics, Suwon, Korea

As LTE delivers an unprecedented wireless experience, people spend much more time on smartphones for internet browsing, social networking, and similar services. To further improve LTE network efficiency, Power-Class 2 (PC2) High Power User Equipment (HPUE) has recently been deployed in B41. To support HPUE, RF transmitters (TXs) need to deliver higher output power while meeting stringent emission-mask linearity requirements, especially for third-order Counter Intermodulation (CIM3) in the one-Resource-Block (1RB) case. CIM3 can be improved by several techniques, such as power mixers [1,2], Harmonic-Reject Mixers (HRM) [3], notch-filter mixers [4,5] and digital TXs [6]. Power-mixer, HRM and digital-TX solutions may not be suitable for HPUE due to power-efficiency limitations. The notch-filter mixer is based on a voltage-mode passive mixer with a 25%-duty-cycle LO, and it uses an LC trap to suppress the mixer output at 3f<sub>lo</sub>, which corresponds to the component at 3f<sub>lo</sub>-f<sub>b</sub>. However, the conventional series [4] or parallel [5] 3f<sub>lo</sub> trap causes significant insertion loss at f<sub>lo</sub>, which corresponds to the component at f<sub>lo</sub>-f<sub>b</sub>, due to the trade-off among efficiency, 3f<sub>lo</sub> suppression and tuning range. This work presents a new transformer-based notch-filter mixer in which the 3f<sub>lo</sub> suppression automatically tracks the channel frequency, enabling a wide operating frequency range of 1.4GHz to 2.7GHz, while both power efficiency and 3f<sub>lo</sub> suppression are improved compared with the conventional notch-filter mixer [4,5].

The transformer-based notch-filter mixer is shown in Fig. 9.8.1. The mixer load is a transformer with one pair of tunable capacitor banks, Ctune, whose impedance features a series resonance (zero) and a parallel resonance (pole). Compared with the conventional series [4] or parallel [5] 3f<sub>lo</sub> trap, which has only one zero or one pole at 3f<sub>lo</sub>, the extra pole in this work is placed at f<sub>lo</sub> to enhance the mixer gain, whereas conventional 3f<sub>lo</sub> traps present only a reactive load at f<sub>lo</sub>, resulting in large mixer loss. The impedance zero and pole of the transformer-based notch filter are illustrated in Fig. 9.8.1 using a simplified lumped transformer model with primary inductance L<sub>p</sub>, secondary inductance L<sub>s</sub> and mutual inductance L<sub>m</sub>. Both the zero and the pole are functions of Ctune.

Defining R as the ratio of the impedance-zero frequency to the impedance-pole frequency, R can be expressed as:

$$R = \frac{f_{zero}}{f_{pole}} = \frac{\sqrt{(L_s + L_p)(0.5L_m + L_s)}}{\sqrt{(0.5L_s L_m + L_s L_p + 0.5L_p L_m)}}. \quad (1)$$

From equation (1), R is constant irrespective of frequency or Ctune and is determined entirely by the intrinsic properties of the transformer. R is also insensitive to process and temperature variation because it depends only on ratios of L<sub>p</sub>, L<sub>s</sub> and L<sub>m</sub>, which all shift in the same direction with process and temperature. If the transformer is designed with R=3, tuning Ctune moves the impedance zero and pole frequencies together with f<sub>zero</sub>=3×f<sub>pole</sub>, from the dashed line to the dotted line in Fig. 9.8.2. Therefore, with this notch-filter mixer, the 3f<sub>lo</sub> suppression automatically tracks the channel frequency as it is tuned across cellular bands.
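Equation (1) can be evaluated directly to illustrate both properties claimed above. The sketch uses hypothetical inductance values (not from the paper) chosen so that R is close to 3, about 2.97 here. Because the numerator and denominator of (1) are both quadratic in the inductances, a common scale factor k applied to L<sub>p</sub>, L<sub>s</sub> and L<sub>m</sub>, used here as a crude model of correlated process or temperature drift, cancels exactly. The function name is illustrative.

```python
import math


def zero_pole_ratio(lp: float, ls: float, lm: float) -> float:
    """R = f_zero / f_pole from equation (1); inductances in any common unit."""
    num = (ls + lp) * (0.5 * lm + ls)
    den = 0.5 * ls * lm + ls * lp + 0.5 * lp * lm
    return math.sqrt(num / den)


# Hypothetical inductances in nH, not taken from the paper.
lp, ls, lm = 0.1, 1.0, 0.05
r = zero_pole_ratio(lp, ls, lm)
print(f"R = {r:.3f}")

# A common scale factor k (correlated PVT drift) leaves R unchanged.
for k in (0.9, 1.0, 1.1):
    print(f"k = {k}: R = {zero_pole_ratio(k * lp, k * ls, k * lm):.6f}")

# With R fixed, placing the pole at f_LO puts the notch (zero) at R * f_LO.
for f_lo_ghz in (1.4, 2.0, 2.7):
    print(f"f_LO = {f_lo_ghz} GHz -> f_zero = {r * f_lo_ghz:.2f} GHz")
```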

The first advantage of this technique is reinforced 3f<sub>lo</sub> suppression. Because the mixer gain is proportional to its load impedance at each frequency, the 3f<sub>lo</sub> suppression is determined by the impedance difference between f<sub>lo</sub> and 3f<sub>lo</sub>. Figure 9.8.2 compares the impedance with one pole and one zero (dashed line) to that of a conventional 3f<sub>lo</sub> trap with only one zero (solid line). The two impedances are close to each other at 3f<sub>lo</sub>, while the dashed line shows a higher impedance at f<sub>lo</sub>. Therefore, 3f<sub>lo</sub> suppression is reinforced by enhancing the f<sub>lo</sub> gain while maintaining the same attenuation at 3f<sub>lo</sub>. The figure also shows that the proposed notch filter has a much wider trap bandwidth than the conventional 3f<sub>lo</sub> trap; it is therefore less sensitive to PVT variation, and Ctune is fixed for each channel frequency. The second advantage is that the gain enhancement at f<sub>lo</sub> relaxes the gm and current requirements of the Driver Amplifier (DA). Compared to a conventional passive mixer loaded by the input capacitance of the DA stage, whose gain drops with increasing frequency, the mixer gain in this work remains nearly constant with increasing frequency owing to the tuned transformer. This characteristic further benefits power efficiency for the HPUE application in B41. The third advantage is automatic 3f<sub>lo</sub> suppression tracking once the transformer is designed with R=3. Because R is constant irrespective of frequency or Ctune, the tracking remains effective over a wide frequency tuning range. With this technique, one RF path can cover the Middle and High Band (MHB) from 1.4GHz to 2.7GHz.

The block diagram of the TX is shown in Fig. 9.8.3. It consists of a DAC, a Low-Pass Filter (LPF), the notch-filter mixer, a conventional passive mixer for the VGA, a slicing DA and a VGA. For 3G/LTE, it uses an I/Q direct-upconversion architecture. The 25%-duty-cycle square-wave LO signals are generated from the PLL output by a divide-by-2 and associated circuits. The notch transformer is designed and optimized for R≈3, taking into account the parasitic capacitance of the mixer and DA.
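The text does not detail the divide-by-2 and associated circuits. One common way to obtain 25%-duty-cycle quadrature LOs, sketched behaviorally below under that assumption, is to clock a divide-by-2 at 2f<sub>LO</sub> to produce 50%-duty I and Q squares 90° apart, then AND them pairwise into four non-overlapping 25% phases. The function names are hypothetical, and the model is purely logical (no rise/fall times, skew, or mismatch).

```python
def divide_by_two_quadrature(n_lo_periods: int):
    """Behavioral divide-by-2 clocked at 2*f_LO.

    Time is sampled every quarter LO period, i.e. every half period of the
    2*f_LO clock, so clock edges alternate rising/falling from slot to slot.
    I toggles on rising edges and Q on falling edges, giving two 50%-duty
    f_LO square waves 90 degrees apart."""
    i_state, q_state = 0, 0
    i_wave, q_wave = [], []
    for slot in range(4 * n_lo_periods):
        if slot % 2 == 0:    # rising edge of the 2*f_LO clock
            i_state ^= 1
        else:                # falling edge of the 2*f_LO clock
            q_state ^= 1
        i_wave.append(i_state)
        q_wave.append(q_state)
    return i_wave, q_wave


def quarter_duty_phases(i_wave, q_wave):
    """AND the I/Q squares pairwise into four non-overlapping 25%-duty phases."""
    lo_0 = [i & (1 - q) for i, q in zip(i_wave, q_wave)]
    lo_90 = [i & q for i, q in zip(i_wave, q_wave)]
    lo_180 = [(1 - i) & q for i, q in zip(i_wave, q_wave)]
    lo_270 = [(1 - i) & (1 - q) for i, q in zip(i_wave, q_wave)]
    return lo_0, lo_90, lo_180, lo_270


i_w, q_w = divide_by_two_quadrature(2)
for name, wave in zip(("0", "90", "180", "270"), quarter_duty_phases(i_w, q_w)):
    print(f"LO_{name:>3}: {wave}")
```

Exactly one phase is high in every quarter period, which is the non-overlap property a 25%-duty-cycle passive mixer relies on.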

The TX has been implemented in a 14nm FinFET CMOS process with a baseband area of 0.6mm<sup>2</sup> and an RF area of 0.44mm<sup>2</sup> (Fig. 9.8.7). From 1.4GHz to 2.7GHz, the TX provides 3f<sub>lo</sub> suppression of -38dBc or better, which meets the HPUE B41 requirement. The measured HPUE B41 output spectrum for LTE20 is shown in Fig. 9.8.4, with output powers of 5.1dBm and 6.3dBm for 1RB and Full RB, respectively. The total power consumption for B41, including the LO generation circuits, is 138.8mW for 1RB and 140.4mW for Full RB. The measured B41 CIM3 and CIM5 emissions for 1RB are -54.4dBc and -64.3dBc, respectively, and the Full-RB ACLR is -41.7dBc. For FDD operation in B1, the measured CIM3, CIM5 and ACLR are -62.6dBc, -70dBc and -44.7dBc, respectively. For 1RB and Full RB, this work consumes only 97mW and 113.2mW of DC power, including LO generation circuits, for output powers of 2.1dBm and 3.1dBm, respectively. Figure 9.8.5 shows the CIM3, CIM5 and ACLR emissions versus output power for various bands across MHB. In the measurement, DA slices are turned on or off at different output powers to sustain high efficiency. As Fig. 9.8.5 shows, all 1RB CIM3s are below -52dBc and all CIM5s are below -70dBc, while all Full-RB ACLRs are below -41dBc. Figure 9.8.6 summarizes the performance and compares it with the state of the art. This work supports PC2 HPUE B41 with 3dB higher output power and better power efficiency. Compared with power mixers [1,2] and the conventional notch mixer [4] in the B1 1RB case, this work reduces power consumption by more than one-third with very similar output power and CIM3. It also demonstrates automatic 3f<sub>lo</sub> suppression tracking, covering the wide MHB tuning range from 1.4GHz to 2.7GHz with only one RF path, whereas conventional notch-filter mixers need two or more RF paths [4,5].
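The "more than one-third" reduction can be checked against the B1 1RB DC-power entries in Fig. 9.8.6. The sketch below reads that row as ~185mW for [1], 154mW for [2] and 155.6mW for [4]; the table layout is partly garbled, so this column mapping is an assumption. Each value is compared with the 97mW quoted for this work.

```python
def reduction_percent(p_ref_mw: float, p_new_mw: float) -> float:
    """Relative DC-power reduction of p_new versus p_ref, in percent."""
    return 100.0 * (1.0 - p_new_mw / p_ref_mw)


this_work_1rb_mw = 97.0  # B1 1RB, including LO generation (from the text)
# Assumed column mapping of the 1RB DC-power row in the summary table.
refs = {"[1] (approx.)": 185.0, "[2]": 154.0, "[4]": 155.6}
for name, p_mw in refs.items():
    print(f"vs {name}: {reduction_percent(p_mw, this_work_1rb_mw):.1f}% lower")
```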

### References:

- [1] O. Oliae, et al., "A Multiband Multimode Transmitter Without Driver Amplifier," *ISSCC*, pp. 164-165, Feb. 2012.
- [2] S. Seth, et al., "A Dynamically Biased Multiband 2G/3G/4G Cellular Transmitter in 28nm CMOS," *IEEE JSSC*, vol.51, no.5, pp.1096-1108, 2016.
- [3] Y.-H Chen, et al., "An LTE SAW-Less Transmitter Using 33% Duty-Cycle LO Signals for Harmonic Suppression," *ISSCC*, pp. 172-173, Feb. 2015.
- [4] T. Kihara, et al., "A Multiband LTE SAW-less CMOS Transmitter with Source-Follower-Driven Passive Mixers, Envelope-Tracked RF-PGAs, and Marchand Baluns," *IEEE RFIC*, pp. 399-403, 2012.
- [5] B. Mohammadi, et al., "A Rel-12 2G/3G/LTE-Advanced 2CC Transmitter," *IEEE JSSC*, vol.51, no.5, pp.1080-1096, 2016.
- [6] M. Fulde, et al., "A Digital Multimode Polar Transmitter Supporting 40MHz LTE Carrier Aggregation in 28nm CMOS," *ISSCC*, pp. 218-219, Feb. 2017.



Figure 9.8.1: Circuit diagram and the principle analysis of the transformer-based notch-filter mixer.



Figure 9.8.2: The proposed notch-filter impedance and comparison with conventional 3fLO trap.



Figure 9.8.3: Transmitter block diagram with the transformer-based notch-filter mixer.



Figure 9.8.5: Measured CIM3s, CIM5s and ACLRs for various bands across MHB.



Figure 9.8.4: Measured HPUE B41 output spectrum (1RB and Full RB).

| Publication        | [1]               | [2]            | [4]                | [5]                | This work          |
|--------------------|-------------------|----------------|--------------------|--------------------|--------------------|
| CMOS Tech.         | 90 nm             | 28 nm          | 65 nm              | 40 nm              | 14 nm              |
| Architecture       | Power Mixer       | Power Mixer    | Notch Filter Mixer | Notch Filter Mixer | Notch Filter Mixer |
| Area               | mm <sup>2</sup>   | 5.0 (ABB+LMHB) | 1.45 (ABB+LMHB)    | 3.6 (ABB+LMHB)     | 2.4 (ABB+MHB)      |
| HPUE B41           | N/A               | N/A            | N/A                | N/A                | Support            |
| Band Covering      | MHB (1 path)      | MHB (1 path)   | MB&HB (2 paths)    | LB&MB&HB (3 paths) | MHB (1 path)       |
| 3fLO Auto-Tracking | No                | No             | No                 | No                 | Yes                |
| Cellular Band      | B1                | B1             | B1                 | B13                | B1                 |
| 4G 1RB             | Pout(dBm)         | 2.3            | 1.4                | 2.1                | 3                  |
|                    | CIM3&5(dBc)       | -63.58~-76     | -66.58~-70.6       | -60&N/A            | -68.6&-70          |
|                    | P<sub>DC</sub>(mW) | ~185           | 154                | 155.6              | N/A                |
| 4G Full RB         | Pout(dBm)         | 4              | 4.1                | 2.1                | 4                  |
|                    | ACLR(dBc)         | -40.3          | -41                | -42                | -47                |
|                    | P<sub>DC</sub>(mW) | 199            | 163                | 155.6              | N/A                |
|                    | RX-Noise (dBc/Hz) | -162@190MHz    | -158.8 @190MHz     | -161@N/A           | -158@80MHz         |

Figure 9.8.6: Performance summary table.



Figure 9.8.7: Die micrograph.

## 9.9 A High-Efficiency 28GHz Outphasing PA with 23dBm Output Power Using a Triaxial Balun Combiner

Bagher Rabet<sup>1</sup>, James Buckwalter<sup>2</sup>

<sup>1</sup>University of California, San Diego, La Jolla, CA

<sup>2</sup>University of California, Santa Barbara, Santa Barbara, CA

Gigabit-per-second millimeter-wave (mm-wave) access and backhaul networks at 28GHz demand high-order QAM, OFDM, and/or carrier-aggregated waveforms that force the PA to operate under a high peak-to-average power ratio (PAPR) [1]. High-PAPR signals complicate the design of mm-wave Si CMOS and SiGe BiCMOS PAs, since a linear response and high efficiency are desired simultaneously. Recent work has demonstrated mm-wave PAs with peak efficiency exceeding 30% at 28GHz for output powers above 20dBm [1-5]. However, high average efficiency with high-PAPR waveforms remains elusive. To improve average efficiency, circuit techniques based on Doherty [3] and outphasing [6] architectures have been demonstrated in mm-wave bands, but earlier work using these techniques reported average efficiencies well under 20% with QAM waveforms.

In this paper, we present a SiGe BiCMOS outphasing power amplifier (OPA) whose extremely low-loss power combiner enables substantially better performance: a peak power-added efficiency (PAE) of 41% and an average PAE of 25.3% for an 8.1dB-PAPR signal at 28GHz. The power combiner is based on a compact triaxial balun that simultaneously generates the Chireix compensating reactances at the PA output ports for load modulation and combines the RF power with low loss.

The conventional Chireix OPA is shown in Fig. 9.9.1. The PAs drive the combiner with constant-envelope signals separated by an outphasing angle ($\pm\phi$). The load seen by each PA is modulated through a non-isolating power combiner, shown here as a transformer, together with opposite-signed Chireix reactances ($\pm jX_{CH}$) at the combiner ports. Previous work investigated the Chireix OPA in mm-wave bands but had limited success in realizing high average efficiency [6]. Two significant challenges exist for CMOS/BiCMOS PAs in mm-wave bands. First, high losses in on-chip power combiners significantly reduce the gain, the output power, and any theoretical improvement in average efficiency. Second, the typical OPA requires a voltage-mode PA, which is difficult to realize at mm-wave frequencies because device parasitics present a relatively large admittance.
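The load modulation described above can be illustrated numerically. The sketch below assumes ideal voltage-source PAs driving $V e^{\pm j\phi}$ into a lossless series combiner terminated in $R_L$, a simplified textbook Chireix model rather than the triaxial-balun design of this work; the function names, the 50Ω load, and the 30° compensation angle are hypothetical. In this model each PA sees $Z_{1,2} = (R_L/2)(1 \pm j\tan\phi)$, the output power scales as $\cos^2\phi$, and adding shunt susceptances of $\pm\sin(2\phi_c)/R_L$ cancels the reactive part of both loads at the compensation angle $\phi_c$.

```python
import math

def outphasing_loads(r_load, phi):
    """Impedances seen by two ideal voltage-source PAs driving V*exp(+j*phi)
    and V*exp(-j*phi) into a lossless series combiner terminated in r_load.
    Valid for |phi| < pi/2."""
    z1 = (r_load / 2) * complex(1, math.tan(phi))
    z2 = (r_load / 2) * complex(1, -math.tan(phi))
    return z1, z2

def chireix_admittances(r_load, phi, phi_comp):
    """Add opposite-signed shunt susceptances sized to cancel the reactive
    part of each PA's load admittance at the compensation angle phi_comp."""
    z1, z2 = outphasing_loads(r_load, phi)
    b_comp = math.sin(2 * phi_comp) / r_load   # compensating susceptance (S)
    y1 = 1 / z1 + 1j * b_comp
    y2 = 1 / z2 - 1j * b_comp
    return y1, y2

def normalized_output_power(phi):
    """Output power relative to in-phase drive (phi = 0)."""
    return math.cos(phi) ** 2

if __name__ == "__main__":
    R_L = 50.0                   # hypothetical load resistance (ohm)
    phi_c = math.radians(30)     # hypothetical compensation angle
    for deg in (0, 15, 30, 45, 60):
        phi = math.radians(deg)
        y1, y2 = chireix_admittances(R_L, phi, phi_c)
        print(f"phi={deg:2d} deg  P/Pmax={normalized_output_power(phi):.3f}  "
              f"Y1={y1.real*1e3:.2f}{y1.imag*1e3:+.2f}j mS  "
              f"Y2={y2.real*1e3:.2f}{y2.imag*1e3:+.2f}j mS")
```

Working in admittances makes the Chireix idea explicit: the shunt elements only shift the imaginary part, so the conductance $(2/R_L)\cos^2\phi$, and hence the power delivered, is unchanged.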

The proposed OPA, also shown in Fig. 9.9.1, replaces the conventional Chireix combiner with a triaxial balun that combines the outputs of the two PAs while inherently producing the compensating reactances and matching the load, all with low loss. The PA cell, also illustrated in Fig. 9.9.1, is based on a 0.13μm SiGe HBT cascode with an output capacitance of roughly 85fF. The cascode base is biased through a low impedance to sweep out the carriers generated by impact ionization, which improves the breakdown voltage and allows higher output power. The transistor emitter lengths are sized to create a loadline impedance that optimizes efficiency and output power over the range of impedances presented by the triaxial balun. In post-layout simulation, the PA cell produces 20.5dBm with a maximum PAE of 47%. Because the PA cells present a large output impedance to the combiner compared with the ideal voltage source assumed in canonical outphasing, both the amplitude and the phase difference of the input signals ($S_1$ and $S_2$) are adjusted to maintain high efficiency over a range of output powers.
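To give a sense of scale for the capacitance that must be absorbed into the Chireix network, the short calculation below evaluates the 85fF output capacitance at 28GHz, using only the values quoted above. The shunt inductance that would fully resonate it is included as a reference point; it is not claimed to be the value used in this design.

```python
import math

F0 = 28e9         # operating frequency from the text (Hz)
C_OUT = 85e-15    # approximate PA cell output capacitance from the text (F)

omega = 2 * math.pi * F0
b_cout = omega * C_OUT                  # shunt susceptance of C_OUT (S)
x_cout = 1 / b_cout                     # magnitude of its reactance (ohm)
l_resonate = 1 / (omega ** 2 * C_OUT)   # shunt L that would cancel C_OUT (H)

print(f"B = {b_cout*1e3:.2f} mS, |X| = {x_cout:.1f} ohm, "
      f"L_res = {l_resonate*1e9:.3f} nH")
```

A reactance of roughly 67Ω is comparable to typical PA load resistances at this power level, which is why the capacitance cannot simply be ignored and is instead folded into the compensating network.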

The triaxial balun is conceptually illustrated in Fig. 9.9.2: two inner conductors connect to the input ports ($P_1$ and $P_2$), and an outer conductor serves as the return path (ground) for the current from the output port, which is connected to a load impedance $R_L$ [7]. The characteristic impedance $Z_{0,IN}$ between $P_1$ and $P_2$ is designed independently of the characteristic impedance $Z_{0,OUT}$ between $P_2$ and ground. As shown in the equivalent RF circuit, $Z_{0,IN}$ provides the output matching of the triaxial balun, while $Z_{0,OUT}$ produces a shunt reactance on $P_2$. The impedances produced by the triaxial balun can therefore be mapped onto the design of the Chireix outphasing scheme, as shown in Fig. 9.9.2. The output capacitance of each PA cell is absorbed into the Chireix network, while the length of the balun is chosen to produce the desired shunt inductance on one of the PA outputs. As a result, only a relatively short transmission line (e.g., $l = \lambda/16$) is required for the outphasing combiner, which keeps its loss low.
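To see why a short line suffices, the $Z_{0,OUT}$ section can be approximated as a lossless transmission line shorted at its far end, whose input reactance $Z_{0,OUT}\tan(2\pi l/\lambda)$ is inductive for $l < \lambda/4$. This shorted-stub model is a simplification assumed here, not taken from the paper, and the 50Ω value of $Z_{0,OUT}$ and the helper names are hypothetical.

```python
import math

F0 = 28e9  # operating frequency from the text (Hz)

def shorted_stub_reactance(z0, length_wavelengths):
    """Input reactance (ohm) of a lossless line of characteristic impedance z0,
    shorted at the far end, with length expressed in wavelengths."""
    beta_l = 2 * math.pi * length_wavelengths
    return z0 * math.tan(beta_l)

def stub_length_for_inductance(z0, l_target, f0=F0):
    """Length (in wavelengths, < 1/4) at which the shorted stub looks like l_target."""
    x_target = 2 * math.pi * f0 * l_target
    return math.atan(x_target / z0) / (2 * math.pi)

if __name__ == "__main__":
    Z0_OUT = 50.0                                 # hypothetical Z_{0,OUT} (ohm)
    x = shorted_stub_reactance(Z0_OUT, 1 / 16)    # the lambda/16 example from the text
    l_eq = x / (2 * math.pi * F0)                 # equivalent shunt inductance (H)
    print(f"lambda/16 stub: X = {x:.2f} ohm, L_eq = {l_eq*1e12:.0f} pH")
```

Because $\tan(\pi/8) \approx 0.41$, a $\lambda/16$ section already provides a useful fraction of $Z_{0,OUT}$ as inductive reactance, so the line (and its conductor loss) can stay short.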

The balun equivalent model accurately captures the RF behavior, but slight adjustments are needed to capture the DC behavior. Each PA cell receives its DC supply through the triaxial balun. The cell connected to $P_2$ is fed directly from the opposite end of its conductor, with an AC short provided by local bypass capacitors (not shown in the figure). The cell connected to $P_1$ can be fed through a DC-feed inductor at any point along the inner conductor; in this work, a wirebond connecting one of the DC pads to the output RF pad provides this inductor.

Figure 9.9.3 shows the triaxial balun as implemented in a planar integrated-circuit process. $P_1$ is the central conductor, $P_2$ is a shield around $P_1$ that forms a microstrip structure, and the return path is through the ground conductor on either side of $P_2$. A fabricated back-to-back test structure indicates a measured combiner insertion loss of 0.52dB around 28GHz, close to the simulated value of 0.35dB. Fig. 9.9.3 also shows the impedances seen at $P_1$ and $P_2$ as a function of outphasing angle, which follow the canonical outphasing load modulation. The outphasing impedance trajectories pass through the loads corresponding to the PA cell's peak output power and peak efficiency, denoted $R_{L,Pout}$ and $R_{L,PAE}$.
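For context, an insertion loss $IL$ in dB corresponds to a delivered power fraction of $10^{-IL/10}$. The sketch below applies this conversion to the two figures quoted above, taking them at face value as the loss of a single combiner; the function name is illustrative.

```python
def loss_db_to_transmission(loss_db):
    """Fraction of input power delivered for a given insertion loss in dB."""
    return 10 ** (-loss_db / 10)

if __name__ == "__main__":
    for label, loss_db in (("measured", 0.52), ("simulated", 0.35)):
        t = loss_db_to_transmission(loss_db)
        print(f"{label}: {loss_db} dB -> {100*t:.1f}% delivered, "
              f"{100*(1-t):.1f}% dissipated")
```

At these values roughly 89% (measured) and 92% (simulated) of the combined PA power reaches the load, which bounds how much of the PA cell efficiency survives at the OPA output.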

The simulated and measured gain, collector efficiency ($\eta$), and PAE are plotted in Fig. 9.9.4. The test input signals have equal amplitude and opposite phase, and an initial sweep is performed to determine the amplitude and phase that maximize efficiency. The measured small-signal gain is 14dB. The peak output power is 23dBm, with a corresponding PAE of 35.5%. The peak PAE of 41% is reached at 21dBm, and the collector efficiency reaches 44%. At 6dB backoff from the 23dBm maximum output power, the measured PAE is 34.7%. Simulation and measurement agree closely over the entire output power range.
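The backoff result can be put in perspective against the ideal Class-B reference plotted in Fig. 9.9.4. For an ideal Class-B stage, drain efficiency scales with output voltage swing, i.e. with $\sqrt{P_{out}/P_{max}}$, so 6dB of backoff roughly halves the efficiency. The sketch below applies that scaling to the measured 35.5% PAE at 23dBm; it is a rough, illustrative comparison that ignores drive power and gain variation, not the curve shown in the figure.

```python
import math

def class_b_efficiency_at_backoff(eff_peak, backoff_db):
    """Ideal Class-B efficiency scales with output voltage swing,
    i.e. with sqrt(P_out / P_max)."""
    return eff_peak * math.sqrt(10 ** (-backoff_db / 10))

PAE_AT_PEAK = 35.5       # % at 23dBm, from the text
PAE_6DB_MEASURED = 34.7  # % at 6dB backoff, from the text

pae_6db_class_b = class_b_efficiency_at_backoff(PAE_AT_PEAK, 6.0)
print(f"Class-B-like scaling: {pae_6db_class_b:.1f}% vs measured {PAE_6DB_MEASURED}%")
```

Under this idealized scaling a conventional stage would fall to roughly 18% at 6dB backoff, about half of the measured 34.7%, which illustrates the benefit of the outphasing load modulation.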

The OPA was tested at 28GHz with an 80MHz 64-QAM OFDM signal with a PAPR of 8.1dB, as shown in Fig. 9.9.5. Equalization is applied to the entire test setup, including the PA. The average output power with modulation is 14.3dBm, and an RMS EVM of 3% is achieved using a memoryless DPD algorithm. The average PAE for the OFDM signal is 25.3%. The adjacent-channel power, also shown in Fig. 9.9.5, is below -33dBc at 60MHz from the band edge.
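Two quick consistency checks on these numbers: RMS EVM converts to dB as $20\log_{10}(\mathrm{EVM})$, so 3% corresponds to about -30.5dB, matching the value in the Fig. 9.9.5 caption; and adding the 8.1dB PAPR to the 14.3dBm average power places the signal peaks at about 22.4dBm, just below the 23dBm peak output power. The helper names below are illustrative.

```python
import math

def evm_percent_to_db(evm_pct):
    """Convert RMS EVM from percent to dB."""
    return 20 * math.log10(evm_pct / 100)

def dbm_to_mw(p_dbm):
    """Convert power from dBm to milliwatts."""
    return 10 ** (p_dbm / 10)

AVG_POUT_DBM = 14.3   # average output power with modulation, from the text
PAPR_DB = 8.1         # PAPR of the 80MHz 64-QAM OFDM signal, from the text

peak_pout_dbm = AVG_POUT_DBM + PAPR_DB
print(f"EVM 3% = {evm_percent_to_db(3.0):.1f} dB")
print(f"Average {dbm_to_mw(AVG_POUT_DBM):.1f} mW, peak envelope {peak_pout_dbm:.1f} dBm")
```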

Figure 9.9.6 compares the proposed OPA with state-of-the-art 28GHz PAs. This work achieves high output power and peak efficiency and, most notably, the highest PAE at 6dB power backoff (34.7%) among the compared works. Furthermore, the modulation measurement demonstrates that, with the assistance of memoryless DPD, the PA achieves excellent EVM at a high average output power. The average PAE for the QAM waveform is also the highest reported average efficiency for a mm-wave PA, demonstrating the potential of fully integrated PAs to efficiently support high-PAPR waveforms at 28GHz. The die micrograph is shown in Fig. 9.9.7. The PA area, including pads and input routing, is 700μm×800μm, and the chip operates from a 4V supply.

### Acknowledgements:

The authors thank P. Maki of ONR for support of this work and N. Cahoon of GlobalFoundries for chip fabrication. Additionally, the authors thank Prof. D. Sievenpiper, Prof. P. Asbeck, H. Gheidi, and C. Levy of UCSD. We appreciate the donation of EMX software from Integrand.

### References:

- [1] S. Shakib, et al., "A Wideband 28GHz Power Amplifier Supporting 8×100MHz Carrier Aggregation for 5G in 40nm CMOS," ISSCC, pp. 44-45, Feb. 2017.
- [2] A. Sarkar, et al., "A 28-GHz SiGe BiCMOS PA with 32% Efficiency and 23-dBm Output Power," IEEE JSSC, vol. 52, no. 6, pp. 1680-1686, June 2017.
- [3] S. Hu, et al., "A 28GHz/37GHz/39GHz Multiband Linear Doherty Power Amplifier for 5G Massive MIMO Applications," ISSCC, pp. 32-33, Feb. 2017.
- [4] S. Shakib, et al., "A 28GHz Efficient Linear Power Amplifier for 5G Phased Arrays in 28nm Bulk CMOS," ISSCC, pp. 352-353, Feb. 2016.
- [5] A. Sarkar and B. A. Floyd, "A 28-GHz Harmonic-Tuned Power Amplifier in 130-nm SiGe BiCMOS," IEEE TMTT, vol. 65, no. 2, pp. 522-535, Jan. 2017.
- [6] D. Zhao, et al., "A 60GHz Outphasing Transmitter in 40nm CMOS with 15.6dBm Output Power," ISSCC, pp. 170-171, Feb. 2012.
- [7] H.-C. Park, et al., "Millimeter-Wave Series Power Combining Using Sub-Quarter-Wavelength Baluns," IEEE JSSC, vol. 49, pp. 2089-2102, 2014.



Figure 9.9.1: Block diagram comparing a conventional Chireix outphasing PA and proposed triaxial balun outphasing PA. Illustration of proposed HBT PA cell and PAE simulations.



Figure 9.9.2: Conceptual representation of the triaxial balun and equivalent RF circuit as related through design criteria to the outphasing PA.



Figure 9.9.3: Implementation of the planar triaxial balun and comparison of simulation and measurement for a back-to-back balun. Simulation of the resulting load modulation on the PA cells.



Figure 9.9.4: Comparison of simulated and measured efficiency and gain alongside an ideal Class-B response. The efficiency improvement at backoff is evident.



Figure 9.9.5: Illustration of the spectrum, ACPR, and constellation for an 80MHz 64-QAM OFDM signal with average output power of 14.3dBm. The measured EVM is -30.5dB and average PAE is 25.3%.

TABLE I

COMPARISON WITH THE STATE OF THE ART

|                                 | [4]         | [1]                  | [3]                  | This work      |
|---------------------------------|-------------|----------------------|----------------------|----------------|
| Frequency (GHz)                 | 30          | 27                   | 28                   | 28             |
| Technology                      | 28nm CMOS   | 40nm CMOS            | 130nm SiGe           | 130nm SiGe     |
| P<sub>SAT</sub> (dBm)              | 15.3        | 15.1                 | 16.8                  | <b>23</b>      |
| Peak PAE (%)                       | 36.6        | 33.7                 | 20.3                  | <b>41.4</b>    |
| PAE @ 6dB from P<sub>SAT</sub> (%) | NA          | 15.1                 | 13.9                  | <b>34.7</b>    |
| Modulation Type                    | 64-QAM OFDM | 64-QAM OFDM          | 64-QAM Single Carrier | 64-QAM OFDM    |
| PAPR (dB)                       | 9.6         | 9.7 (clipped to 8.4) | NA                   | 8.1            |
| Signal BW (MHz)                 | 250         | 800                  | 500 / 1000           | 80             |
| Pre-distortion                  | No          | No                   | No                   | Memoryless DPD |
| Average Pout (dBm)              | 5.3         | 6.7                  | 9.2 / 7.2            | <b>14.3</b>    |
| EVM (dB)                        | -25         | -25                  | -27 / -26.6          | -30.5          |
| Average PAE (%)                 | 9.6         | 11                   | 18.5* / 14.4*        | <b>25.3</b>    |

\* Collector Efficiency (not PAE)

Figure 9.9.6: Table of comparison with recent 28GHz PAs.



Figure 9.9.7: Die micrograph of the PA circuit.