

# SleepTalker: A ULV 802.15.4a IR-UWB Transmitter SoC in 28-nm FDSOI Achieving 14 pJ/b at 27 Mb/s With Channel Selection Based on Adaptive FBB and Digitally Programmable Pulse Shaping

Guerric de Strel, *Member, IEEE*, François Stas, *Student Member, IEEE*, Thibaut Gurné, *Student Member, IEEE*, François Durant, *Student Member, IEEE*, Charlotte Frenkel, *Student Member, IEEE*, Andreia Cathelin, *Senior Member, IEEE*, and David Bol, *Member, IEEE*

**Abstract**—Achieving wireless communications at 5–30 Mb/s in energy-harvesting Internet-of-Things (IoT) applications requires energy efficiencies better than 100 pJ/b. Impulse-radio ultrawideband (UWB) communications offer an efficient way to achieve high data rate at ultralow power for short-range links. We propose a digital UWB transmitter (TX) system-on-chip (SoC) designed for ultralow voltage in 28-nm FDSOI CMOS. It features a PLL-free architecture, which exploits the duty-cycling nature of impulse radio through aggressive duty cycling within the pulse modulation time slot for high energy efficiency and minimum jitter accumulation. Wide-range on-chip adaptive forward back biasing is used for threshold voltage reduction, PVT compensation, and tuning of both the carrier frequency and the output power. To ensure spectral compliance with output power regulations without the use of bulky and expensive off-chip filters, a programmable pulse-shaping functionality is integrated in the digital power amplifier based on a 7–9-GS/s, 5-b current DAC. Operated at 0.55 V, it achieves a record energy efficiency of 14 pJ/b for the TX alone and 24 pJ/b for the complete SoC with embedded power management. The TX SoC occupies a core area of 0.93 mm<sup>2</sup>.

**Index Terms**—Back biasing, dc/dc converter, FDSOI, IEEE 802.15.4a, impulse-radio ultrawideband (IR-UWB), PVT compensation, RF, system-on-chip (SoC), transmitter (TX), UWB, wireless.

Manuscript received August 8, 2016; revised October 15, 2016 and December 8, 2016; accepted December 13, 2016. Date of publication January 26, 2017; date of current version March 23, 2017. This work was supported in part by F.R.S.-F.N.R.S. and in part by the FP7 MSP Project. The work of G. de Strel and C. Frenkel was supported by F.R.S.-F.N.R.S. of Belgium. This paper was approved by Guest Editor Brian Ginsburg.

G. de Strel was with the ICTEAM Institute, Université catholique de Louvain, Belgium. He is now with imec, 3001 Leuven, Belgium (e-mail: guerric.ds@gmail.com).

F. Stas, C. Frenkel, and D. Bol are with the ICTEAM Institute, Université catholique de Louvain, 1348 Louvain-La-Neuve, Belgium (e-mail: david.bol@uclouvain.be).

T. Gurné is with Nokia Bell Labs, 2018 Antwerpen, Belgium.

F. Durant is with Verotech BVBA, Leuven, Belgium.

A. Cathelin is with STMicroelectronics, 38920 Crolles, France.

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2016.2645607



Fig. 1. Typical IoT WSN architecture.

## I. INTRODUCTION

THE deployment of the Internet-of-Things (IoT) ubiquitous sensing paradigm is constrained by the development of the four functions usually embedded in a wireless sensor node (WSN), as shown in Fig. 1. To avoid the economic cost and environmental footprint of battery replacement [1], the WSNs need to operate on an energy-harvesting basis. This introduces strong constraints on the power consumption of the sensing, data processing, and RF transceiver, which communicates the sensed data to the cloud. Energy-efficient data processing, such as feature extraction or compression, can help reduce the power consumption by lowering the required data rate [34], [35]. However, even with strong on-chip data processing, fitting the radio component inside the harvesting power budget is challenging. Many new wireless solutions tackling this have recently appeared, such as [2] or [3]. They usually solve the power problem with duty-cycled radios taking advantage of the low data rate requirement of the sensing applications. However, applications such as vision or large distributed WSN networks call for medium data rates above 5 Mb/s or for low communication latency.

Impulse-radio ultrawideband (IR-UWB) communication is a promising solution for high data rate, short-range, and low-power communication due to the duty-cycled nature of the output signal as well as the potential for low-complexity and low-power transmitter (TX) architectures [4]. These characteristics have been the driving force behind the development of the IEEE 802.15.4a standard [5] covering data rates from 0.11 to 27.24 Mb/s. Nevertheless, current full 802.15.4a TX can hardly achieve energy efficiency better than 200 pJ/b, which corresponds to a prohibitive communication power around 3.5 mW at 27 Mb/s.

There are three critical functions that need to be carefully designed in an IR-UWB TX in order to achieve high energy efficiency: the frequency synthesis, the power amplification, and the spectrum compliance. Frequency synthesis can be done with a PLL [6], [7], which is difficult to fit in the tight power budget set by the application, or with an *LC*-tank-based oscillator [8]–[12], which integrates bulky inductors. Both of these solutions also feature a long start-up time, limiting the duty-cycling capabilities. Alternatively, delay-element-based edge recombination [13]–[15] or ring oscillators (ROs) [16], [17] can achieve lower power consumption and faster settling time. Unfortunately, the delay-element tuning, usually implemented through current starving, degrades the jitter performance as well as the ability to operate with a reduced supply voltage.

Power amplification can be achieved with an analog RF amplifier [6], [13] at the cost of a large static power dissipation or with a digital power amplifier (PA) [4], [18]. However, digital implementations typically feature spectral components in the highly constrained 960–1610-MHz FCC band [20]. Spectrum compliance is thus a critical challenge for a fully integrated IR-UWB TX. Previous works rely on baluns or off-chip filters [7], [12], [13] to filter the low-frequency content generated by digital PAs, imposing large on-chip passive elements or off-chip components. In [4], [10], and [13], on-chip filtering solutions based on a finite impulse response filter integrated in the pulse generation or in the PA are proposed in order to achieve a high level of integration.

This paper presents SleepTalker: a fully integrated versatile IR-UWB TX system-on-chip (SoC) compliant with the IEEE 802.15.4a standard and implemented in 28-nm FDSOI. The key contributions of this paper are the following ones.

- 1) A PLL-free architecture is proposed with an aggressive duty cycling of the local oscillator (LO), which is enabled only within the burst position modulation (BPM) time slot to reduce jitter accumulation and power consumption.
- 2) Programmable pulse shaping is implemented by a digital PA to avoid the need for bulky off-chip filters while meeting spectral compliance with the FCC regulation [19].
- 3) High-speed ultralow-voltage (ULV) digital implementation is enabled by the 28-nm FDSOI process with wide-range $\pm 1.8$ V forward back biasing (FBB) and new retentive true single-phase clocked (TSPC) flip-flops.
- 4) On-chip adaptive FBB tunes both the carrier frequency (CF) of the back-bias-controlled oscillator (BBCO), used as LO, between 3.5 and 4.5 GHz for channel selection and the PA output power.
- 5) Continuous-time nMOS/pMOS current-matching loop through adaptive FBB with a back-gate-driven amplifier for voltage range compatibility.

- 6) Embedded power management featuring a high-density dc/dc converter for 0.55 V generation and butterfly-connected charge pumps to generate the $\pm 1.8$ V used for the FBB generation.

This paper is organized as follows. The TX architecture is presented in Section II. The design and implementation of key blocks are detailed in Section III. Section IV describes the SoC architecture and Section V presents the measurement results and performance comparison to the related state of the art.

## II. ARCHITECTURE OF THE DUTY-CYCLED PLL-FREE TX SYSTEM

IR-UWB communications stand out by their inherently duty-cycled nature, which enables high-efficiency TX design. The IEEE 802.15.4a standard [5] aims at maximizing this feature by concatenating pulses into bursts while maintaining a constant pulse repetition frequency to keep the average output power constant over a wide range of data rates from 0.11 to 27.24 Mb/s.

To take advantage of the strongly duty-cycled nature of the IEEE 802.15.4a IR-UWB signals, we aim at aggressively duty cycling the majority of the TX right from the LO with a PLL-free architecture.

Three levels of duty cycling are implemented. First, the packet-level baseband keeps the rest of the TX idle between two transmitted packets. Second, at symbol level, the TX is idle in both guard intervals as well as in the unused BPM time slot (Fig. 2). Finally, the LO is enabled (ENABLE\_LO signal) within the BPM time slot only when the pulse burst needs to be emitted, with a granularity of 16 ns, i.e., half of the reference 31.25-MHz clock period, as described in Fig. 2. In the fast 27.24-Mb/s data rate, the LO is running during 1/4 of the 64-ns symbol clock period. In slower data rates, the LO is running for at most 32 ns, except in the 0.11-Mb/s data rate, for which the LO should run for the total burst duration of 64 ns. Such an aggressive duty cycling of the LO directly results in strong power savings as the subsequent TX logic is idle when the LO output remains static.
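As a rough illustration of the third duty-cycling level, the sketch below estimates an upper bound on the fraction of each symbol during which the LO runs. It assumes that every symbol clock is an integer division of the 31.25-MHz reference; the divider values in `MODES` are inferred from the symbol rates of Table I and are not stated in the text. The function names are hypothetical.

```python
REF_CLK_HZ = 31.25e6                 # reference crystal clock (from the text)
REF_PERIOD_NS = 1e9 / REF_CLK_HZ     # 32 ns

# Mode ID -> data rate and ASSUMED reference-clock divider for the symbol clock
# (inferred from the symbol rates in Table I, e.g. 31.25 MHz / 2 = 15.6 MHz).
MODES = {
    1: {"rate_mbps": 0.85, "ref_div": 32},
    2: {"rate_mbps": 6.81, "ref_div": 4},
    3: {"rate_mbps": 27.24, "ref_div": 2},
    4: {"rate_mbps": 0.11, "ref_div": 256},
    5: {"rate_mbps": 0.85, "ref_div": 32},
    6: {"rate_mbps": 1.70, "ref_div": 16},
    7: {"rate_mbps": 6.81, "ref_div": 8},
}


def lo_on_time_ns(rate_mbps):
    """Maximum LO run time per symbol, following the rules quoted in the text."""
    if rate_mbps == 27.24:
        return REF_PERIOD_NS / 2     # 16 ns: a quarter of the 64-ns symbol
    if rate_mbps == 0.11:
        return 2 * REF_PERIOD_NS     # 64 ns: the whole burst
    return REF_PERIOD_NS             # at most 32 ns in the other data rates


def lo_duty_cycle(mode_id):
    """Upper bound on the LO duty cycle for one of the Table I modes."""
    mode = MODES[mode_id]
    symbol_ns = mode["ref_div"] * REF_PERIOD_NS
    return lo_on_time_ns(mode["rate_mbps"]) / symbol_ns


for mode_id in sorted(MODES):
    print(f"mode {mode_id}: LO on for at most {100 * lo_duty_cycle(mode_id):.2f}% of the symbol")
```

Under these assumptions, the LO duty cycle peaks at 25% in the 27.24-Mb/s mode and falls below 1% in the 0.11-Mb/s mode.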

Fig. 3 shows the TX architecture. It is built from a single 31.25-MHz reference crystal clock from which all necessary high-frequency clocks are generated in the duty-cycled frequency synthesis block. It features a 3.5–4.5-GHz LO whose frequency is divided to generate a 500-MHz internal clock with a 2-ns period corresponding to the pulse duration. The calibration of the LO frequency is performed periodically as triggered by the TRIG\_CALIB signal. The TX covers seven data rates defined in the 802.15.4a standard summarized in Table I. The symbol clock generated from the 31.25 MHz reference varies depending on the selected data rate from 122 kHz to 15.6 MHz.

Fig. 2. Aggressive LO duty-cycling principle illustrated by waveforms of the LO enabling signal for two data rates. At high data rate (top), the LO is active for a quarter of the symbol duration, which corresponds to the BPM time slot and half a reference crystal-clock cycle. At low data rate (bottom), the LO is active for one cycle of the crystal clock and the position of the pulse inside the quarter symbol duration is computed in the symbol-level baseband.

Fig. 3. Architecture of the TX. The baseband is partitioned between crystal-frequency blocks synthesized from standard cells and aggressively duty-cycled high-frequency full-custom blocks.

To perform the within-BPM LO duty cycling, the coarse burst position inside the BPM time slot is computed with a 32-ns time resolution with the output of the IEEE 802.15.4a linear feedback shift register implementing the pulse position scrambling in the symbol-level baseband. Inside this 32-ns window, the pulse-level baseband implements the fine-grain burst positioning by enabling the pulse-shaping PA through the ENABLE\_PA signal during only the burst duration ranging from 2 to 32 ns (64 ns in the 0.11-Mb/s data rate). Finally, a high-efficiency digital PA performs programmable pulse shaping for compliance with output power spectrum regulations. An SPI port is used for sending data and programming configuration registers, which include pulse-shape registers.

The packet-level baseband clocked at 31.25 MHz is synthesized from standard cells along with the SPI interface and configuration registers. The rest of the TX results from a full-custom design. The symbol-level baseband is clocked at 31.25 MHz. The pulse-level baseband uses the 500-MHz clock extracted from the LO through the frequency divider.

The design of the LO and duty-cycled frequency synthesis is described in Section III-A, the pulse-shaping digital PA in Section III-B, and the solutions for ULV digital implementation up to 4.5 GHz in Section III-C.

## III. DESIGN AND IMPLEMENTATION OF THE TX

### A. Duty-Cycled Frequency Synthesis

TABLE I  
SUPPORTED IEEE 802.15.4a DATA RATES

| Mode ID | Channels [GHz] | # Burst positions per symbol | Burst duration [ns] | Symbol rate [Ms/s] | Data rate [Mb/s] | Mean PRF [MHz] |
|---------|----------------|------------------------------|---------------------|--------------------|------------------|----------------|
| 1       | 3.5            | 32                           | 32                  | 0.98               | 0.85             | 15.6           |
|         | 4.0            |                              |                     |                    |                  |                |
|         | 4.5            |                              |                     |                    |                  |                |
| 2       | 3.5            | 32                           | 4                   | 7.8                | 6.81             | 15.6           |
|         | 4.0            |                              |                     |                    |                  |                |
|         | 4.5            |                              |                     |                    |                  |                |
| 3       | 3.5            | 32                           | 2                   | 15.6               | 27.24            | 15.6           |
|         | 4.0            |                              |                     |                    |                  |                |
| 4       | 3.5            | 128                          | 64                  | 0.12               | 0.11             | 3.90           |
|         | 4.0            |                              |                     |                    |                  |                |
|         | 4.5            |                              |                     |                    |                  |                |
| 5       | 3.5            | 128                          | 8                   | 0.98               | 0.85             | 3.90           |
|         | 4.0            |                              |                     |                    |                  |                |
|         | 4.5            |                              |                     |                    |                  |                |
| 6       | 3.5            | 128                          | 4                   | 1.95               | 1.70             | 3.90           |
|         | 4.0            |                              |                     |                    |                  |                |
|         | 4.5            |                              |                     |                    |                  |                |
| 7       | 3.5            | 128                          | 2                   | 3.90               | 6.81             | 3.90           |
|         | 4.0            |                              |                     |                    |                  |                |
|         | 4.5            |                              |                     |                    |                  |                |

The IEEE 802.15.4a bursts are modulated both by BPM and binary phase-shift keying (BPSK), which implies a strong requirement on the frequency accuracy and stability for correct demodulation. Following the analysis from [17], we can define an upper bound for the total phase noise at the end of the burst in order to ensure correct demodulation of the BPSK-encoded data. By taking some margin into account, we set the maximum phase noise allowable to $\pi/4$, which corresponds to a maximum accumulated jitter of 27.7 ps after 32 ns at 4.5 GHz. However, the aggressive duty cycling of the LO prevents the use of a PLL to improve jitter performance because of the prohibitive settling time of PLLs and forces the choice of an architecture based on a free-running LO. *LC*-based oscillators can offer good jitter performance but at the cost of high power, large area, and large settling time. To avoid these drawbacks, previous low-power TX relied on ROs duty-cycled to a quarter of the symbol period. However, in the case of low data rate, the quarter symbol period is quite long and leads to a high accumulated jitter due to the degraded phase noise of ROs compared with *LC* oscillators. Fig. 4(a) shows a high accumulated jitter for a current-starved differential RO at nominal voltage and for a run time corresponding to the worst symbol-duration case in the 0.11-Mb/s data rate of the IEEE 802.15.4a. As shown in Fig. 4(a), the within-BPM LO duty cycling proposed in Section II significantly limits the jitter accumulation. Nevertheless, it is still prohibitive for BPSK data encoding. Therefore, we further improve jitter by optimizing the LO as follows.

- 1) We avoid current starving in the LO thanks to FBB-based frequency tuning. The FBB is used to modify the  $V_T$  [Fig. 4(b)] and, therefore, the delay of the RO delay elements and tune the CF to the selected IEEE 802.15.4a channel. The LO is thus a back-bias-controlled oscillator (BBCO).
- 2) We use a differential multiple-pass RO based on feed-forward loops similar to the one proposed in [21]. The seven-stage architecture shown in Fig. 4(b) improves both the switching speed and the noise performance thanks to its fast transitions.

Fig. 4. LO. (a) Simulated accumulated jitter performances at 4.5 GHz. (b) Architecture of the proposed multiple-pass back-bias-controlled oscillator. The frequency is controlled through the FBB voltages applied to the nMOS (BBN) and the pMOS (BBP) of the delay element. The $V_T$ range is illustrated as well.

As shown in Fig. 4(a), reducing the run time of the LO from 1/4 of the symbol duration to only one cycle of the crystal clock leads to more than 10× reduction of the accumulated jitter. When reducing the supply voltage down to 0.55 V and applying 1.1 V of FBB to keep the frequency constant, the accumulated jitter is further reduced by 10×. Overall, the proposed ULV multiple-pass RO with aggressive within-symbol duty cycling achieves a total reduction of the accumulated jitter of 500×. This accumulated jitter takes into account the settling time variability that leads to a phase error and thus should also be bounded by the 27.7-ps upper limit.
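The 27.7-ps bound can be checked with a short calculation: a phase error of $\pi/4$ is one eighth of a carrier period, i.e., $1/(8 f_c)$, which is about 27.8 ps at 4.5 GHz (quoted as 27.7 ps in the text). The sketch below repeats the computation for the three channels; the function name is hypothetical.

```python
import math


def max_accumulated_jitter_s(carrier_hz, max_phase_rad=math.pi / 4):
    """Timing error equivalent to max_phase_rad of carrier phase."""
    return max_phase_rad / (2 * math.pi * carrier_hz)


for carrier_hz in (3.5e9, 4.0e9, 4.5e9):
    jitter_ps = max_accumulated_jitter_s(carrier_hz) * 1e12
    print(f"{carrier_hz / 1e9:.1f} GHz: {jitter_ps:.1f} ps")
```

The 4.5-GHz channel is the most demanding case because it has the shortest carrier period.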

The enable functionality of the proposed BBCO is implemented by slightly modifying two of the seven delay elements in the chain. These cells implement NOR and NAND functions in order to gate the oscillation in the main path as well as in the feedforward path. They are sized to feature the same delay as the regular delay elements in order to keep the maximum tuning frequency above 4.5 GHz taking PVT into account. Postlayout simulations show that the BBCO consumes 180 $\mu$W when operating at 4.5 GHz and features a phase noise of $-102$ dBc/Hz at 30-MHz offset.

Fig. 5. Block diagram of the BBCO frequency calibration loop through adaptive FBB.

The free-running LO is integrated inside a calibration loop described in Fig. 5 to periodically calibrate its frequency. The BBCO output is divided by a programmable frequency divider to generate a 500-MHz clock. Control logic compares the divided LO output to a reference crystal clock at 31.25 MHz and adapts the BB voltage command CMD\_CF to lock the LO frequency on the desired 3.5–4.5-GHz channel. A 10-b DAC generates the nMOS BB voltage (BBN<sub>LO</sub>) and a current-matching loop adapts the pMOS BB voltage (BBP<sub>LO</sub>) to keep the nMOS/pMOS ON current ratio constant as described in Section IV-C.
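A minimal behavioral sketch of such a calibration loop is given below. The divider ratios (7, 8, or 9) follow from the 500-MHz target, and a locked LO yields 500/31.25 = 16 divided-clock cycles per reference period. The text does not specify the search algorithm: the successive-approximation search used here, the tuning curve `toy_bbco_hz`, and all function names are assumptions made for illustration only.

```python
REF_CLK_HZ = 31.25e6       # crystal reference (from the text)
DIVIDED_CLK_HZ = 500e6     # target divided LO clock (from the text)
DAC_BITS = 10              # resolution of the BBN DAC (from the text)

# Divided-clock cycles expected in one reference period when the LO is locked.
CYCLES_PER_REF = DIVIDED_CLK_HZ / REF_CLK_HZ   # 16


def divider_ratio(channel_hz):
    """Programmable divider setting that maps the channel onto 500 MHz."""
    return round(channel_hz / DIVIDED_CLK_HZ)


def toy_bbco_hz(code):
    """HYPOTHETICAL monotonic tuning curve: 3.0 GHz plus 2 MHz per DAC code."""
    return 3.0e9 + code * 2.0e6


def calibrate(channel_hz, bbco=toy_bbco_hz):
    """Successive-approximation search (an assumption, not the paper's stated method).

    Returns the divider ratio and the largest DAC code whose divided LO
    frequency does not exceed 500 MHz.
    """
    ratio = divider_ratio(channel_hz)
    code = 0
    for bit in reversed(range(DAC_BITS)):
        trial = code | (1 << bit)
        # Stand-in for counting divided-clock edges against the reference clock.
        if bbco(trial) / ratio <= DIVIDED_CLK_HZ:
            code = trial
    return ratio, code


for channel_hz in (3.5e9, 4.0e9, 4.5e9):
    print(channel_hz, calibrate(channel_hz))
```

With a monotonic tuning curve, the search settles in ten comparisons, one per DAC bit.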

The calibration of the free-running BBCO is done periodically between transmissions depending on the selected data rate as described in Section IV-C.

### B. Pulse-Shaping Power Amplifier

Designing a pulse generator that avoids emissions in the 960–1610-MHz restricted band [19] without high-order off-chip filtering is challenging. As described by [4], capacitive coupling of the digital PA output does not provide a sufficient roll-off to meet the FCC regulation limit when emitting in the 3–5-GHz UWB bands.

In this paper, we propose a fully digital pulse-shaping filter integrated within the PA to improve the roll-off. Fig. 6 shows the spectral impact of shaping a rectangular pulse to a Gaussian pulse with the same duration. For an ideal Gaussian shaping, the spectral content in the 960–1610-MHz band can be reduced by more than 20 dB, which can directly be used to increase the maximum output power while meeting the FCC regulation.
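The benefit of Gaussian shaping can be illustrated numerically. The sketch below compares the worst-case spectral level in the 960–1610-MHz band, relative to the level at the carrier, for a rectangular and a Gaussian 2-ns pulse at 4 GHz. It is not a reproduction of Fig. 6: the Gaussian width (pulse spanning $\pm 3\sigma$), the integration step, and the frequency grid are assumptions, and the exact attenuation depends on them.

```python
import cmath
import math

F0 = 4.0e9            # carrier frequency, one of the channels in the text
T_PULSE = 2e-9        # pulse duration (from the text)
DT = 1e-12            # integration step (assumption)
SIGMA = T_PULSE / 6   # ASSUMED Gaussian width: the pulse spans +/- 3 sigma


def rect_env(t):
    return 1.0


def gauss_env(t):
    return math.exp(-0.5 * (t / SIGMA) ** 2)


def spectrum_mag(env, f_hz):
    """|Fourier transform| of env(t) * cos(2*pi*F0*t) over the pulse (midpoint rule)."""
    n = round(T_PULSE / DT)
    acc = 0j
    for k in range(n):
        t = -T_PULSE / 2 + (k + 0.5) * DT
        carrier = math.cos(2 * math.pi * F0 * t)
        acc += env(t) * carrier * cmath.exp(-2j * math.pi * f_hz * t)
    return abs(acc) * DT


def restricted_band_level_db(env):
    """Worst-case level in 0.96-1.61 GHz relative to the level at the carrier."""
    peak = spectrum_mag(env, F0)
    freqs = [0.96e9 + i * 25e6 for i in range(27)]
    worst = max(spectrum_mag(env, f) for f in freqs)
    return 20 * math.log10(worst / peak)


rect_db = restricted_band_level_db(rect_env)
gauss_db = restricted_band_level_db(gauss_env)
print(f"rectangular: {rect_db:.1f} dBc, Gaussian: {gauss_db:.1f} dBc")
```

Both spectra are normalized to the same level at the carrier, mirroring the "same maximum power inside the band" comparison of Fig. 6.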

Fig. 6 further shows the impact of the amplitude resolution of the pulse shaping on the resulting spectrum. It is possible to meet the restrictions in the 960–1610-MHz band with a 5-b resolution on the Gaussian shape with a minimum power and area overhead. To implement the shaping functionality, the digital PA is split into independent parallel stages to act as a 32-stage current DAC.

Fig. 7 shows the impact of the sampling rate on the output power density spectrum of the pulses. If the sampling is performed at the LO central frequency, spectral folding will cause significant power close to the restricted 960–1610-MHz band that will not be sufficiently attenuated by the first-order roll-off of the coupling capacitance. To avoid this problem, the PA and pulse shaper are operated at 2× the LO frequency, which pushes the spectral folding away from the restricted bands.
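The folding argument can be sketched with the usual sampling picture: if the envelope is updated at a rate $f_s$, replicas of its spectrum appear around $f_{LO} + k f_s$. The helper below, whose name and default replica orders are illustrative choices, lists where the first replicas land for both sampling rates.

```python
def image_frequencies_ghz(f_lo_ghz, sampling_factor, orders=(-2, -1, 1, 2)):
    """Centre frequencies |f_LO + k*fs| of the envelope replicas, with fs = factor * f_LO."""
    fs_ghz = sampling_factor * f_lo_ghz
    return sorted({abs(f_lo_ghz + k * fs_ghz) for k in orders})


print("fs = f_LO:    ", image_frequencies_ghz(4.0, 1))
print("fs = 2 x f_LO:", image_frequencies_ghz(4.0, 2))
```

Sampling at $f_{LO}$ places a replica at dc, next to the restricted band, whereas sampling at $2 \times f_{LO}$ places the nearest replica back on the carrier itself.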

The complete pulse-shaping PA architecture is described in Fig. 8. The dual ring counters are clocked on the LO positive and negative phases and generate seven to nine enable signals with duty cycles varying from 1/7 to 1/9 depending on the selected channel from 3.5 to 4.5 GHz. The symbol-level baseband generates the BPSK-encoded and scrambled data giving the phase of the pulse. The shape envelope information is contained in the pulse-shaping registers. Each of the 32 parallel stages is controlled by an 18-b word that determines whether the stage is ON or OFF for each of the 18 time slots that compose the 2-ns pulse. These 32 parallel stages thus form a 5-b DAC controlled in a thermometric fashion by the 32 control words with 18 b each for the 18 time slots. This structure allows amplitude modulation over the whole pulse duration. The ring counters generate Sel<sub>M</sub> and Sel<sub>P</sub>, which are used to cycle through the 18 pulse time slots by selecting the bit inside the control words that will conditionally enable the parallel output stages. The BPSK-encoded data select the LO phase used to modulate the pulse. The upconversion is done by using the enable signal in each stage to gate or not the LO signal before propagation to the power stage. Pulse modulation is, therefore, conducted inside each stage and buffering is inserted to drive the PA output stage, as shown in Fig. 8.
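The register organization described above can be sketched as follows: a Gaussian envelope is quantized to a number of active stages per time slot and then thermometer-coded into 32 words of 18 b. The Gaussian span, the rounding rule, and the bit ordering (bit *i* of a word corresponds to slot *i*) are assumptions, since the actual register map is not given in the text.

```python
import math

N_STAGES = 32   # parallel PA stages (from the text)
N_SLOTS = 18    # time slots of the 2-ns pulse at 9 GS/s (from the text)


def gaussian_levels(n_slots=N_SLOTS, n_stages=N_STAGES, span_sigmas=3.0):
    """Number of active stages per slot for a Gaussian envelope (ASSUMED +/- 3 sigma span)."""
    levels = []
    for slot in range(n_slots):
        x = (slot + 0.5) / n_slots * 2 - 1            # slot centre mapped to [-1, 1]
        envelope = math.exp(-0.5 * (x * span_sigmas) ** 2)
        levels.append(round(envelope * n_stages))     # 0 .. n_stages active stages
    return levels


def control_words(levels, n_stages=N_STAGES):
    """Thermometer coding: stage s is ON in a slot when the slot level exceeds s."""
    words = []
    for stage in range(n_stages):
        word = 0
        for slot, level in enumerate(levels):
            if level > stage:
                word |= 1 << slot                     # ASSUMED mapping: bit i <-> slot i
        words.append(word)
    return words


levels = gaussian_levels()
words = control_words(levels)
for stage in (0, 15, 31):
    print(f"stage {stage:2d}: {words[stage]:018b}")
```

Low-index stages are ON for most of the pulse while high-index stages switch only near its center, which is what produces the Gaussian envelope at the summed output.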

The PA output stage is a tristate inverter sized so that the 32 parallel stages with mid-range back biasing are able to drive the antenna with a peak-to-peak voltage of 350 mV at 4.5 GHz. The tristate output stages offer two advantages: they can be completely turned OFF to save power between bursts and they allow the TX to share the antenna with a receiver. Adaptive FBB is used to tune the RF output power and to compensate for PVT variations by controlling the current of the PA output stage.

Between pulses, the PA output is precharged to $V_{DD}/2$ to maximize the output pulse swing. Indeed, due to mismatch in the phase of the signals driving the pMOS and nMOS of each output stage and in the enabling signals of the stages and precharge amplifier, the output voltage at the end of the pulse can differ from $V_{DD}/2$. In order to be ready for the next pulse, the precharge amplifier should be able to bring the output node back to $V_{DD}/2$ in less than 16 ns, which is the duration of one guard interval in the fastest data rate.

Fig. 6. Left: square and Gaussian ideal pulse shapes. Right: impact of the pulse shape on the output power spectral density. For the same maximum power inside the band, the square-shaped pulse features significant power in the 960–1610-MHz restricted band. The Gaussian-shaped pulse shows a higher out-of-band filtering and increasing filtering effect for higher amplitude resolution.

Fig. 7. Impact of the pulse-shape time resolution. Left: Gaussian shape approximated with 9 ($@f_{LO}$) or with 18 ($@2 \times f_{LO}$) time steps. Right: spectral folding caused by the sampling frequency of the shape. A sampling at $f_{LO}$ leads to a folding peak at dc and spectral content in the 0.96–1.61-GHz band, which violates the FCC regulation mask. Quantization at $2 \times f_{LO}$ eliminates the folding around dc.

In a conventional class-A OTA, the slew rate is linked to the bias current and to the multiplication factor of the current mirror between the input and output stage, leading to a fundamental tradeoff between slew rate and quiescent current. In our application, the slew rate should be high enough to achieve the 17 V/$\mu$s specification to bring the dc level back to $V_{DD}/2$. Unfortunately, the quiescent current is also a critical specification of the OTA as we do not benefit from duty cycling. To meet these requirements, we designed a high-slew-rate (HSR) precharge amplifier based on a superclass-AB OTA from [25] and shown in Fig. 9. It features an adaptive biasing of the input differential pair depending on the large differential input signal to automatically boost the bias current and achieve HSR while maintaining a low quiescent current. In addition to the adaptive biasing, it features a local common-mode feedback first described in [26], implemented with two 10-k$\Omega$ matched resistors in order to boost the current efficiency. The HSR amplifier is supplied from the 1.2 V supply in order to reach the stringent slew-rate requirement. Two common-source stages upconvert the inputs from the 0.55 V domain for voltage compliance and an enabling signal EN\_HSR disconnects the HSR output from the output of the PA stages during the pulse emission, as shown in Fig. 8.
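The text does not derive the 17 V/$\mu$s figure. One reading that is consistent with the quoted numbers, assuming a worst-case excursion of $V_{DD}/2$ at the end of a pulse with $V_{DD} = 0.55$ V and the full 16-ns guard interval available for recovery, is

$$
\mathrm{SR}_{\min} = \frac{\Delta V_{\max}}{t_{\mathrm{guard}}} = \frac{V_{DD}/2}{16\ \mathrm{ns}} = \frac{0.275\ \mathrm{V}}{16\ \mathrm{ns}} \approx 17.2\ \mathrm{V}/\mu\mathrm{s}.
$$

Here $\Delta V_{\max}$ and $t_{\mathrm{guard}}$ are labels introduced for this illustration only.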

Fig. 9 shows the adaptive biasing as well as Monte Carlo postlayout simulations of the tristate outputs converging after the end of a pulse.

### C. ULV High-Speed Digital Implementation

Fig. 8. Architecture of the programmable pulse-shaping digital PA. The pulse-shaping functionality is implemented by a 5-b, 9-GS/s current DAC controlled by the ring-counter outputs. The PA output is precharged at $V_{DD}/2$ by an HSR amplifier and a dc blocking MiM capacitor filters the dc value. Right: sequencing stage-select signals generated by the ring counters and simulated RF output waveforms.

Operating digital ULV circuits above 3 GHz is challenging due to the significant speed penalty of the near-threshold MOSFET operation. The long propagation delay can be dealt with by fine-grained pipelining at the expense of an increased clock load [22]. Near-threshold operation also implies a high penalty in terms of output transition time. For a fixed operation frequency, the longer output transition time limits the maximum acceptable fan-out, as shown in Fig. 10, which cannot be improved by fine-grain pipelining. The 28-nm FDSOI technology can help to alleviate the clock distribution challenge. Compared with 65-nm GP bulk CMOS, 28-nm FDSOI CMOS is shown in Fig. 10 to achieve higher maximum fan-out, lower minimum $V_{DD}$, and lower total energy in an inverter chain with a fan-out of one (FO1) thanks to its better electrostatic characteristics and lower capacitances.

FBB allows higher frequency operation through a $V_T$ reduction [23]. Fig. 10 also shows that it offers a maximum fan-out increased by 1 compared with the $FBB = 0$ V situation and has a negligible energy overhead when switching at frequencies over 1 GHz. At 0.55 V, the maximum fan-out is increased by 2× when the FBB is increased from 0 to 1 V.

The voltage scaling impact on the maximum usable fan-out at 4.5 GHz leads to increased constraints on the clock tree. Standard scan master-slave flip-flops from the foundry library are too slow to operate the dual ring counters or the frequency divider at 3.5–4.5 GHz. The maximum frequency of operation for the frequency divider in postlayout simulations is limited to 4.98 GHz taking into account  $3\sigma$  local variability, as shown in Fig. 11, which leaves only a small design margin.

To increase the design headroom, we designed new flip-flops based on the TSPC architecture from [24]. This architecture offers a reduced variability on the clock-to-Q timing thanks to the single-phase clock, which allows removing the clock inverter stage. However, TSPC flip-flops lose their data a few hundred nanoseconds after the clock edge, as shown in Fig. 11. To circumvent this problem, we added a data retention feedback loop in the output stage of the flip-flop to guarantee data retention when the clock is low. At system level, we forced both phases of the LO to be gated at the low logic level to ensure that the frequency divider and the dual ring counters are in data retention without further additions to the TSPC flip-flop architecture.

The clock load of a single flip-flop is reduced by 38% compared with the master-slave architecture. At block level, using the new TSPC flip-flops in the frequency divider leads to a 73% improvement in maximum frequency of operation and a 24% reduction in total power, as extracted from postlayout simulations.

Finally, the 9-GS/s sampling rate of the PA is achieved by using the two phases of the 4.5-GHz clock. We thus avoid the use of an explicit doubler to obtain a 9-GHz clock from the BBCO output, which would be challenging to implement. Indeed, as shown in Fig. 10, we can only propagate a 4.5-GHz clock at 0.55 V by using the FBB; generating and propagating a 9-GHz clock at such a low supply voltage would be impossible. Instead, we use two paths clocked on the positive and negative phases of the BBCO output, as shown in Fig. 8. These two clocks feature a 180° phase shift thanks to the use of both polarities of the differential BBCO.
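The two-phase scheme can be sketched by merging the rising edges of the two LO phases: each path updates once per LO period, and the 180° offset interleaves them into a uniform grid at twice the LO frequency. The function name and the idealized, jitter-free edges are assumptions made for illustration.

```python
def sample_instants_ps(f_lo_hz, n_slots=18):
    """Update instants obtained by interleaving the rising edges of both LO phases."""
    period_ps = 1e12 / f_lo_hz
    positive = [k * period_ps for k in range(n_slots // 2)]           # positive-phase path
    negative = [(k + 0.5) * period_ps for k in range(n_slots // 2)]   # negative-phase path, 180 deg later
    return sorted(positive + negative)


instants = sample_instants_ps(4.5e9)
spacing_ps = instants[1] - instants[0]
print(f"{len(instants)} slots, {spacing_ps:.1f} ps apart ({1e3 / spacing_ps:.1f} GS/s)")
```

At 4.5 GHz, this gives 18 update instants spaced by about 111 ps over the 2-ns pulse without any clock faster than the LO itself.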

## IV. SoC INTEGRATION

The SleepTalker SoC shown in Fig. 12 requires a single 1.2 V external supply in addition to the digital I/O voltage and features the TX described in Section II, an on-chip switched-capacitor dc/dc converter to generate the core 0.55 V supply voltage, a charge-pump-based generator supplying the  $\pm 1.8$  V to the back-biasing generator, the FBB generators, and an always-on sleep controller.

Fig. 9. Schematic of the HSR amplifier along with postlayout simulations of the adaptive biasing and Monte Carlo postlayout simulations of the amplifier output, which is driven to $V_{DD}/2$ after the pulse.

In sleep mode, the dc/dc converter performs power gating on the 0.55 V blocks. The sleep controller remains on in sleep mode and is thus supplied from the external 1.2 V supply. To reduce its leakage power, the sleep controller is implemented with regular-$V_T$ devices. The transition from sleep mode to active is completed in less than 1 ms and can be done between two packet transmissions.

### A. DC/DC Converter

The 0.55 V core voltage distributed to the TX is generated by a multimode switched-capacitor network dc/dc converter integrated on-chip with MiM capacitors and supplied by the 1.2 V external supply. The dc/dc converter first presented in [27] features a low-power mode, a medium-power (MP) mode, and a power gating mode. The MP mode is used in this SoC to generate the 0.55 V core supply voltage with an output power up to 5 mW. The MP mode uses two two-way interleaved switched-capacitor networks to mitigate the output ripple and FBB is applied to increase the switch conductance. The regulation mechanism is based on pulse-skipping modulation, which allows the converter to keep a high power conversion efficiency over a wide load range. It uses a 10 $\mu\text{F}$ external capacitor.

#### B. +1.8/-1.8 V On-Chip Generation

To supply the FBB generator, we need to generate +1.8/-1.8 V supply voltages ( $V_{DD\_1V8}$  and  $V_{SS\_1V8}$ ) from the 1.2 V external supply voltage. This is done with two charge-pump circuits based on the Dickson architecture with MiM capacitors, clocked by the 31.25-MHz crystal clock. As shown in Fig. 13, pulse-skip modulation is used to regulate the two output voltages  $V_{DD\_1V8}$  and  $V_{SS\_1V8}$ . This is done by comparing these output voltages with a 0.55 V reference voltage through a resistive feedback divider. Two comparators assert the enable signals  $EN\_DD$  and  $EN\_SS$  only if  $V_{DD\_1V8}$  is below 1.8 V and  $V_{SS\_1V8}$  is above -1.8 V, respectively. In sleep mode, the charge pumps are disabled by gating their clock (GCLK signal).
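The pulse-skip regulation described above can be summarized by a minimal behavioral sketch. The thresholds and the 0.55 V reference come from the text; the function names and the direct comparison of rail voltages are simplifying assumptions, and the sensing network of the negative rail is not modeled.

```python
V_REF = 0.55      # reference voltage quoted in the text (V)
V_TARGET = 1.8    # magnitude of both regulated rails (V)

# Divider ratio mapping the +1.8-V target onto the 0.55-V reference.
# The sensing network of the negative rail is not detailed in the text and is
# not modeled here; rail voltages are compared directly against the targets.
DIVIDER_RATIO = V_REF / V_TARGET


def enable_signals(vdd_1v8, vss_1v8):
    """Comparator outputs (EN_DD, EN_SS) for the two rails."""
    en_dd = vdd_1v8 < V_TARGET    # positive rail too low: keep pumping
    en_ss = vss_1v8 > -V_TARGET   # negative rail too high: keep pumping
    return en_dd, en_ss


def pump_clocks(vdd_1v8, vss_1v8, sleep):
    """Clock gating: in sleep mode no pump receives the crystal clock."""
    if sleep:
        return False, False
    return enable_signals(vdd_1v8, vss_1v8)


print(round(DIVIDER_RATIO, 4))
print(pump_clocks(1.75, -1.82, sleep=False))
print(pump_clocks(1.75, -1.70, sleep=True))
```

A pump only receives clock pulses while its rail is out of regulation, so its switching losses scale with the load current.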

In the Dickson architecture, the output voltage and the load regulation are limited by the $V_T$ drop on the MOS diodes. To reduce the $V_T$ of the MOS diodes, we use a butterfly connection based on dynamic back biasing. As shown in Fig. 13, the pMOS (resp. nMOS) diodes in the charge pump that generates $V_{DD\_1V8}$ (resp. $V_{SS\_1V8}$) are strongly forward back biased at $V_{SS\_1V8}$ (resp. $V_{DD\_1V8}$). This does not increase the sleep-mode leakage power as both $V_{DD\_1V8}$ and $V_{SS\_1V8}$ are grounded in sleep mode when the charge pumps are disabled.

Fig. 13 shows that the load current range for a 5% output voltage drop is extended by a factor of $2.5\times$ thanks to the proposed butterfly connection.

#### C. Forward Back-Biasing Generator

The generation of the nMOS back-biasing voltage (BBN) is implemented by two capacitive DACs shown in Fig. 14 over a [0;1.65 V] voltage range. The BBN voltage of the full-custom TX is generated by a 10-b DAC to achieve fine-tuning of the CF to the selected 802.15.4a channel with 5-MHz frequency resolution. The output buffer is a Miller-compensated two-stage OTA sized to maximize PSRR to limit the impact of supply noise on the LO frequency accuracy. Capacitive DACs feature a low power consumption thanks to the absence of dc current, but their outputs drift due to the leakage current of the switches. A refresh of the DAC is thus mandatory to avoid frequency drift during transmission. To do so, the packet-level baseband performs a refresh of the DAC before each 802.15.4a packet, which is composed of at most 83 preamble symbols and 1209 data symbols. For the lowest 0.11-Mb/s data rate, the packet duration exceeds 10 ms, which imposes heavy constraints on DAC leakage. To relax these design constraints, the DAC refresh at this data rate is performed during each guard interval. The worst case regarding the DAC output drift is then obtained when transmitting at the 0.85-Mb/s data rate, where the packet duration can be as long as 1300 $\mu$s. The DAC switches are thus sized to achieve less than 1 LSB of drift over 1300 $\mu$s. We thus leverage the heavily duty-cycled nature of the system to implement refreshing and allow the use of a lower-power capacitive DAC.

The BBN applied to the output stages of the PA does not require high accuracy as it is used only to compensate the impact of PVT variations on the output power by increasing or decreasing the peak-to-peak amplitude of the pulses. To generate this BBN, a 5-b capacitive DAC is used with architecture, output buffer, and refresh pattern similar to the 10-b DAC controlling the CF.

Fig. 10. Fan-out limitation due to ULV operation at 4.5 GHz from postlayout SPICE simulations of an inverter chain. (a) Maximum fan-out at 0.55 V allowing a 10%–90% transition time lower than 20% of the period at 4.5 GHz and (b) total energy of an FO1 inverter chain switching at 4.5 GHz.

Fig. 11. TSPC flip-flop schematic featuring the data retention feedback loop. The TSPC-based frequency divider shows a 24% power reduction and a 73% increase in the $3\sigma$-worst case maximum frequency extracted from Monte Carlo postlayout SPICE simulations. The data retention feedback loop guarantees data retention when the clock is low as opposed to baseline TSPC architecture.
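The drift budget of the 10-b DAC can be made explicit with a short calculation. This is a sketch that assumes an ideal LSB equal to the full [0;1.65 V] range divided by $2^{10}$; the actual LSB of the fabricated DAC may differ.

```python
FULL_SCALE_V = 1.65       # BBN range quoted in the text (V)
BITS = 10                 # resolution of the DAC controlling the CF
WORST_PACKET_S = 1300e-6  # longest interval without refresh (0.85 Mb/s)

# Assumption: ideal LSB, i.e. full range divided by the number of codes.
lsb_v = FULL_SCALE_V / 2**BITS

# Largest average drift rate that keeps the output within 1 LSB over the
# worst-case packet duration.
max_drift_rate_v_per_s = lsb_v / WORST_PACKET_S

print(f"LSB = {lsb_v * 1e3:.3f} mV")
print(f"max drift rate = {max_drift_rate_v_per_s:.3f} V/s")
```

Under this assumption, the switch leakage must keep the average output drift below roughly 1.2 V/s.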

The FBB voltages to be applied to the pMOS (BBP) inside the full-custom TX and the digital PA output stages are generated by continuous-time current-matching loops, shown in Fig. 14, similar to [28]. Replicas of an inverter and of a PA output stage are used to determine the correct BBP that needs to be applied to balance the current between the nMOS and the pMOS, in the full-custom TX and in the PA, respectively. An output buffer supplied by $-1.8$ V is then used to drive the back-gate capacitance of all the devices of the full-custom TX. The inputs of the amplifiers driving the back gates at BBP have a 0.55 V signaling level ($V_{DD}$ in Fig. 14) while the output can be driven up to $\pm 1.8$ V, leading to a voltage compatibility challenge as even the thick-oxide 1.8 V I/O transistors cannot be exposed to a voltage difference of $0.55 - (-1.8) = 2.35$ V. To circumvent this, the output buffer is a Miller-compensated two-stage OTA with a back-gate-driven nMOS input pair. This allows voltage range compatibility between the OTA implemented with thick-oxide 1.8-V I/O transistors and the 0.55-V replica implemented with thin-oxide core devices without the use of level shifting.

Fig. 12. SoC architecture.

Fig. 13. Proposed butterfly connected regulating charge pumps to generate the $\pm 1.8$ V FBB supplies: schematic and simulated load regulation.

Fig. 14. FBB generator architecture. BBN is generated by DACs (DAC10 and DAC5) to tune the LO frequency for channel selection and the PA output power. The corresponding BBP voltages are generated from the BBNs through a current matcher structure with a back-gate-driven amplifier.

Fig. 15. SleepTalker die microphotograph. The SoC core area is $0.93\text{ mm}^2$ and the TX only occupies $0.095\text{ mm}^2$.

Fig. 16. Measured load regulation and efficiency of the dc/dc converter.

Fig. 17. Measured characteristics of the FBB generation. (a) Functionality demonstration of the 10-b DAC and the BBP generated by the current-matching loop. (b) BBN/BBP mismatch spread over eight dies. (c) 10-b DAC decay without refresh depending on the starting value. (d) INL/DNL of the 10-b DAC.

Fig. 18. (a) Left: measured LO frequency tuning range with FBB to compensate for temperature variations in the three targeted channels. Right: LO frequency variation over eight dies for FBB from 0 to 1.5 V. (b) Automatic LO frequency tuning demonstration over the three channels.

#### D. Physical Implementation

The *SleepTalker* test chip, first presented in [29], was designed and manufactured in a ten-metal 28-nm FDSOI CMOS process with the MiM capacitor option. Fig. 15 shows the $0.93\text{-mm}^2$ die microphotograph with an active area of $0.55\text{ mm}^2$ and only $0.095\text{ mm}^2$ for the TX. The fabricated dies were encapsulated in QFN40 packages with unsealed top lid for on-die measurements.

#### V. EXPERIMENTAL VALIDATION

#### A. Experimental Setup

The packaged dies were mounted on custom RF ROGERS 4350 PCB daughter boards interfaced with a custom mother board and a Nucleo STM32 microcontroller development board for communication and configuration through an SPI interface. For debug and independent measurement purposes, the output voltages from the dc/dc converter and from the charge pumps are connected to external pads. Three independent supplies are available for the synthesized and full-custom parts of the TX as well as for the PA output stage to allow quantification of the power breakdown.

Fig. 19. Measured TX output temporal and spectral characteristics. (a) Square- and Gaussian-shaped measured pulses. (b) Power spectral densities depending on the pulse shape and the FBB applied on the PA.

The BBN and BBP voltages applied to the full-custom TX are connected to dedicated pads for characterization of the 10-b DAC. The 5-b DAC BBN output and its associated BBP applied to the PA stage are connected to on-die dc probing pads visible in the center of the die in Fig. 15. In a debug and measurement mode, the output of the LO drives an RF GSG pad, also visible in the die microphotograph in Fig. 15.

The default setup, unless specified otherwise in the following paragraphs, is based on room temperature ( $\approx 25^\circ\text{C}$ ) on typical dies, with an external  $V_{DD\_1V2} = 1.2$  V.



Fig. 20. SoC and TX power breakdown based on postlayout simulation.

Fig. 16 shows the measured output voltage of the switched-capacitor dc/dc supplied at 1.2 V in MP mode. The measured peak efficiency in MP mode is 86% at 2.5-mW load power and 84% at 380  $\mu$ W.

#### B. Back-Biasing Generation

Functional validation of the BBN and BBP controlling the LO frequency is shown in Fig. 17(a). The measured BBN and BBP ranges are, respectively, [0;1.6 V] and [0;−1.7 V]. We can observe that the current-matching loop applies a stronger FBB on the pMOS compared with the nMOS FBB, which indicates a strong mismatch between nMOS and pMOS in the near-threshold domain. Such mismatch was already studied in [23]. Fig. 17(b) illustrates the nMOS/pMOS mismatch through the BBN/BBP gap for eight measured dies originating from a single wafer. Without the current-matching adaptive FBB, this nMOS/pMOS mismatch would lead to strong robustness degradation.

Fig. 17(c) shows the output drift of the 10-b DAC depending on the command value. In the worst case, the time for a 1-LSB drift is 2 ms, which allows us to perform the refresh operation between the transmitted packets except for the 0.11-Mb/s data rate. The output characteristic and DNL of the 10-b DAC are presented in Fig. 17(d). Below 0.2 V, the 10-b DAC saturates leading to poor effective resolution for the LO frequency. However, as shown in Section V-C, the TX will not operate in this range even on 802.15.4a channel 1.

#### C. Carrier Frequency Tuning With Adaptive FBB

The FBB impact on the BBCO frequency is shown in Fig. 18(a). In nominal conditions, a wide range of frequencies from 2.4 to 5.5 GHz can be achieved and the three targeted 802.15.4a channels are well centered inside the tuning range. Fig. 18(a) also shows that operation in the three channels remains possible for temperatures down to −40 °C or up to 85 °C. Fig. 18(a) presents the frequency drift over eight measured dies that is compensated by the FBB tuning loop.

The calibration loop is demonstrated in Fig. 18(b) for the three channels. The typical lock time is 2.2  $\mu$ s and the maximum lock time observed at start-up is 573  $\mu$ s.
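The principle of FBB-based channel selection can be sketched as a search over the 10-b DAC code. The snippet below is not the calibration algorithm of the chip, which is not detailed here: it assumes a successive-approximation search and a hypothetical, perfectly linear LO model spanning the nominal 2.4–5.5-GHz range. Only the channel center frequencies are those of the 802.15.4a standard.

```python
# Center frequencies of 802.15.4a channels 1 to 3 (MHz).
CHANNEL_CENTER_MHZ = {1: 3494.4, 2: 3993.6, 3: 4492.8}


def toy_lo_frequency_mhz(code):
    """Hypothetical monotonic LO model: linear from 2.4 to 5.5 GHz over the
    1024 codes. The real BBCO characteristic is not linear."""
    return 2400.0 + (5500.0 - 2400.0) * code / 1023


def tune(target_mhz, measure=toy_lo_frequency_mhz, bits=10):
    """Successive approximation: return the largest code whose measured
    frequency does not exceed the target (requires a monotonic LO)."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        if measure(trial) <= target_mhz:
            code = trial
    return code


for channel, f_target in CHANNEL_CENTER_MHZ.items():
    code = tune(f_target)
    error = f_target - toy_lo_frequency_mhz(code)
    print(f"channel {channel}: code {code}, residual error {error:.2f} MHz")
```

With this toy model, one code step corresponds to about 3 MHz, which is consistent in order of magnitude with the 5-MHz resolution quoted for the 10-b DAC.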

#### D. TX Performances

The TX functionality was assessed in the seven data rates and for the three channels from 3.5 to 4.5 GHz. We successfully measured pulses at a supply voltage of 0.55 V and validated the pulse-shaping functionality at up to 9 GS/s.
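As an illustration of programmable pulse shaping, the following sketch generates the 5-b code sequences of a square and a Gaussian envelope at 9 GS/s. The pulse length of 18 samples (2 ns), the Gaussian width, and the function names are assumptions chosen for illustration; they are not the values programmed in the measured chip.

```python
import math

FS_HZ = 9e9      # PA sampling rate quoted in the text
DAC_BITS = 5     # resolution of the current DAC
N_SAMPLES = 18   # assumed pulse length: 18 samples at 9 GS/s = 2 ns


def gaussian_codes(n=N_SAMPLES, sigma_samples=3.0, bits=DAC_BITS):
    """Gaussian envelope quantized to the DAC resolution.
    sigma_samples is a hypothetical width, expressed in samples."""
    full_scale = 2**bits - 1
    center = (n - 1) / 2
    return [
        round(full_scale * math.exp(-0.5 * ((k - center) / sigma_samples) ** 2))
        for k in range(n)
    ]


def square_codes(n=N_SAMPLES, bits=DAC_BITS):
    """Square envelope: full-scale code on every sample."""
    return [2**bits - 1] * n


pulse_duration_ns = N_SAMPLES / FS_HZ * 1e9
print(f"pulse duration: {pulse_duration_ns:.2f} ns")
print("square:  ", square_codes())
print("gaussian:", gaussian_codes())
```

The smooth ramp of the Gaussian codes removes the sharp edges of the square envelope, which is the mechanism that reduces out-of-band emissions.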

Fig. 19 shows measured output pulses with square and Gaussian shape. The pulses are asymmetrical, which leads to a mismatch in rise and fall times at the input of the PA output stage. This comes from the strong observed mismatch between nMOS and pMOS, which is not fully compensated by the current-matching adaptive FBB. Indeed, the BBP voltage saturates at −1.7 V whereas a voltage around −2.1 V would be required for full nMOS/pMOS mismatch compensation. In the PA stage, if the command of the 5-b DAC controlling the applied BBN is too high, the BBP chosen by the current-matching loop also saturates at −1.7 V leading to compressed pulses in the negative phases. This asymmetric pulse shape contains more energy in low frequencies even after the first-order attenuation of the coupling capacitance leading to a reduced maximum allowable output power to meet the FCC regulations.

TABLE II  
MEASURED PERFORMANCES COMPARED WITH THE IEEE 802.15.4a STATE OF THE ART

|                                         | [32]             | [17]         | [4]             | [33]             | [7]                          | This work                                   |
|-----------------------------------------|------------------|--------------|-----------------|------------------|------------------------------|---------------------------------------------|
| Process                                 | 90 nm            | 90 nm        | 90 nm           | 0.18 $\mu$m      | 0.13 $\mu$m                  | 28-nm FDSOI                                 |
| Data rate [Mb/s]                        | 0.11, 0.85, 6.8  | 16           | 15.6            | 27.24            | 0.11, 0.85, 1.7, 6.81, 27.24 | 0.11, 0.85, 1.7, 6.81, 27.24                |
| Active power [$\mu$W]                   | 168300           | 650          | 4360            | NA               | 5980                         | TX: 380<br>SoC: 650                         |
| Energy per bit [pJ/bit]                 | 24700 @ 6.8 Mb/s | 650 @ 1 Mb/s | 280 @ 15.6 Mb/s | 740 @ 27.24 Mb/s | 219 @ 27.24 Mb/s             | TX: 14<br>SoC: 24 @ 27.24 Mb/s              |
| Die area [$\text{mm}^2$]                | NA               | 0.066        | 0.07            | 4.5              | RF: 7.5<br>Dig. BB: 24.7     | TX: 0.095<br>SoC: 0.93                      |
| Supply voltage [V]                      | 3.3              | 1            | 1               | 1.8              | 1.2                          | External: 1.2<br>Core: 0.55*<br>BB: +/-1.8* |
| Output swing [mV<sub>pp</sub>]          | ~400             | ~100         | 165-710         | NA               | Up to 720                    | Up to 350                                   |
| LO freq. range [GHz]                    | 3.5-6.5          | 3-10         | 2.1-5.7         | 3-9              | 3.5-4.5                      | 3.5-4.5                                     |
| Transmitted output power [dBm]          | NA               | <-14.3       | -16.4           | <-5.3            | NA                           | -20                                         |
| Transmitted energy per pulse [pJ/pulse] |                  |              |                 |                  |                              |                                             |
| TX efficiency [%]                       | NA               | <5.7         | 0.53            | <0.5             | NA                           | 2.6                                         |

\* Generated on-chip.

Fig. 19 also shows how the pulse shape modifies the measured output power spectral density. With a Gaussian shape, we achieved around 8–10 dB of attenuation in the 960–1610-MHz restricted band. Varying the FBB applied on the PA output stages leads to a 5-dB tuning range on the output power. Tuning of the output power through the FBB is done manually on the test setup. The measured peak output power is $-50$ dBm/MHz while respecting the FCC mask and the total output power is $-20$ dBm. This is slightly lower than previous 802.15.4a TXs (e.g., $-16.4$ dBm in [4]) because the ULV operation of the proposed TX limits the output amplitude on the antenna. This will reduce the communication range, but we evaluated that a communication range between 5 and 30 m, depending on the selected data rate, is still achievable with a low-power noncoherent RX as in [36].
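The TX efficiency figures of Table II follow from the total output power and the active power. The short calculation below reproduces them for this work, using the 380 $\mu$W TX power reported in Section V-E, and for [4]; the helper name `dbm_to_uw` is introduced for illustration only.

```python
def dbm_to_uw(p_dbm):
    """Convert a power in dBm to microwatts (0 dBm = 1 mW = 1000 uW)."""
    return 10 ** (p_dbm / 10) * 1e3


# This work: -20 dBm radiated for 380 uW consumed by the TX.
p_out_uw = dbm_to_uw(-20.0)
efficiency_pct = 100 * p_out_uw / 380.0

# Reference [4]: -16.4 dBm radiated for 4360 uW of active power.
p_out_ref4_uw = dbm_to_uw(-16.4)
efficiency_ref4_pct = 100 * p_out_ref4_uw / 4360.0

print(f"this work: {p_out_uw:.1f} uW out, {efficiency_pct:.1f} % efficiency")
print(f"[4]: {p_out_ref4_uw:.1f} uW out, {efficiency_ref4_pct:.2f} % efficiency")
```

Although the absolute output power is lower than in [4], the much lower active power yields a higher TX efficiency.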

The simulated TX power breakdown is shown in Fig. 20. The combination of the full-custom baseband, LO, frequency divider, dual ring counters, and pulse shaping accounts for almost 60% of the total TX power. The PA output stages and their driving chains consume a power equivalent to that of the synthesized part of the TX.

#### E. Comparison to the State of the Art

The complete SoC consumes $650 \mu\text{W}$ at 27.24 Mb/s, leading to an energy per bit of $24 \text{ pJ/b}$. When considering only the TX with its synthesized and full-custom parts along with the FBB generator, the power consumption is only $380 \mu\text{W}$, which corresponds to a record-breaking $14 \text{ pJ/b}$ at 27.24 Mb/s. Fig. 21 compares the energy efficiency of several 802.15.4a or proprietary UWB TXs or transceivers depending on the data rate. *SleepTalker* achieves the lowest energy per bit over the 0.11–27.24-Mb/s data rate range and improves by $16\times$ the energy efficiency of the best 802.15.4a TX [7]. References [9], [30], and [31] achieved results closer to this work, but with proprietary UWB communication schemes.

Fig. 21. Energy efficiency comparison with recent IR-UWB TXs.
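The energy-per-bit figures quoted above follow directly from the power consumption and the data rate, as the short calculation below shows. The function name is introduced for illustration; the 219 pJ/b of [7] is taken from Table II.

```python
def energy_per_bit_pj(power_uw, rate_mbps):
    """Energy per bit in pJ/b: uW divided by Mb/s gives pJ/b directly."""
    return power_uw / rate_mbps


DATA_RATE_MBPS = 27.24

soc_pj_per_bit = energy_per_bit_pj(650.0, DATA_RATE_MBPS)
tx_pj_per_bit = energy_per_bit_pj(380.0, DATA_RATE_MBPS)

# Improvement over the best prior 802.15.4a TX [7], using the rounded
# 14 pJ/b figure of this work.
improvement = 219.0 / 14.0

print(f"SoC: {soc_pj_per_bit:.1f} pJ/b")
print(f"TX:  {tx_pj_per_bit:.1f} pJ/b")
print(f"improvement over [7]: {improvement:.1f}x")
```

The ratio of about 15.6 rounds to the $16\times$ improvement stated in the text.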

Table II compares the performances of recent IEEE 802.15.4a TX or transceivers. The proposed UWB TX is the first designed in 28-nm FDSOI CMOS. The combined use of coarse to fine-grain duty cycling, advanced CMOS process, ULV operation enabled by FBB-induced  $V_T$  reduction, and high-speed design explains the obtained ultralow-power consumption.

#### VI. CONCLUSION

We demonstrated the first ULV 802.15.4a IR-UWB TX. It is also the first RF SoC in 28-nm FDSOI and it fully exploits its forward back-bias capabilities for tuning RF performances. The TX achieves ultralow power with $380 \mu\text{W}$ at 27.24 Mb/s and the high frequency of part of the TX was obtained by $V_T$ reduction through FBB. A low-power and FBB-tunable BBCO is proposed and adaptive FBB is also used to compensate for the exacerbated PVT sensitivity at ULV as well as for precise LO frequency tuning to match one of the three channel frequencies covered by the TX. Compliance with regulation masks is ensured by programmable pulse shaping and its power overhead is mitigated by the use of a ULV digital PA. The high energy efficiency and small die area ($0.93 \text{ mm}^2$), along with the full on-chip integration and the capability of operating on energy-harvesting sources, illustrate the potential of IEEE 802.15.4a-compliant UWB communications for WSNs targeting the IoT.

#### ACKNOWLEDGMENT

The authors would like to thank STMicroelectronics for chip donation and especially A. Cathelin and P. Cathelin for valuable discussions. They would also like to thank P. Simon (UCL-WELCOME platform) for his precious help with on-wafer measurements.

#### REFERENCES

- [1] D. Bol, G. de Strel, and D. Flandre, "Can we connect trillions of IoT sensors in a sustainable way? A technology/circuit perspective (invited)," in *Proc. IEEE SOI-3D-Subthreshold Microelectron. Technol. Unified Conf. (S3S)*, 2015, pp. 1–3.
- [2] A. Paidimarri, N. Ickes, and A. P. Chandrakasan, "A +10 dBm 2.4 GHz transmitter with sub-400pW leakage and 43.7% system efficiency," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [3] P. P. Mercier, S. Bandyopadhyay, A. C. Lysaght, K. M. Stankovic, and A. P. Chandrakasan, "A sub-nW 2.4 GHz transmitter for low data-rate sensing applications," *IEEE J. Solid-State Circuits*, vol. 49, no. 7, pp. 1463–1474, Jul. 2014.
- [4] P. P. Mercier, D. C. Daly, and A. P. Chandrakasan, "An energy-efficient all-digital UWB transmitter employing dual capacitively-coupled pulse-shaping drivers," *IEEE J. Solid-State Circuits*, vol. 44, no. 6, pp. 1679–1688, Jun. 2009.
- [5] *IEEE 802.15.4a Wireless MAC and PHY Specifications for LR-WPANs*, IEEE Standard 802.15.4a, 2007. [Online]. Available: <http://www.ieee802.org/15/pub/TG4a.html>
- [6] B. Razavi *et al.*, "A UWB CMOS transceiver," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2555–2562, Dec. 2005.
- [7] S. Joo *et al.*, "A fully integrated 802.15.4a IR-UWB transceiver in 0.13  $\mu\text{m}$  CMOS with digital RRC synthesis," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2010, pp. 228–229.
- [8] M. Crepaldi, C. Li, K. Dronson, J. Fernandes, and P. Kinget, "An ultra-low-power interference-robust IR-UWB transceiver chip set using self-synchronizing OOK modulation," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2010, pp. 226–227.
- [9] S. Geng, D. Liu, Y. Li, H. Zhuo, W. Rhee, and Z. Wang, "A 13.3 mW 500 Mb/s IR-UWB transceiver with link margin enhancement technique for meter-range communications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 160–161.
- [10] M. Demirkiran and R. R. Spencer, "A pulse-based ultra-wideband transmitter in 90-nm CMOS for WPANs," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2820–2828, Dec. 2008.
- [11] Y. J. Zheng *et al.*, "A 0.92/5.3nJ/b UWB impulse radio SoC for communication and localization," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2010, pp. 230–231.
- [12] J. K. Brown, K.-K. Huang, E. Ansari, R. R. Rogel, Y. Lee, and D. D. Wentzloff, "An ultra-low-power 9.8GHz crystal-less UWB transceiver with digital baseband integrated in 0.18  $\mu\text{m}$  BiCMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3178–3189, Dec. 2013.
- [13] T. Norimatsu *et al.*, "A UWB-IR transmitter with digitally controlled pulse generator," *IEEE J. Solid-State Circuits*, vol. 42, no. 6, pp. 1300–1309, Jun. 2007.
- [14] T.-C. Lee and K.-J. Hsiao, "The design and analysis of a DLL-based frequency synthesizer for UWB application," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, pp. 1245–1252, Jun. 2006.
- [15] D. Lachartre *et al.*, "A 1.1nJ/b 802.15.4a-compliant fully integrated UWB transceiver in 0.13  $\mu\text{m}$  CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2009, pp. 312–313.
- [16] Y. Park and D. D. Wentzloff, "An all-digital 12 pJ/pulse IR-UWB transmitter synthesized from a standard cell library," *IEEE J. Solid-State Circuits*, vol. 46, no. 5, pp. 1147–1157, May 2011.
- [17] J. Ryckaert, G. V. D. Plas, V. D. Heyn, C. Desset, B. V. Poucke, and J. Cranickx, "A 0.65-to-1.4 nJ/burst 3-to-10 GHz UWB all-digital TX in 90 nm CMOS for IEEE 802.15.4a," *IEEE J. Solid-State Circuits*, vol. 42, no. 12, pp. 2860–2869, Dec. 2007.
- [18] T. Terada, S. Yoshizumi, M. Muqsith, Y. Sanada, and T. Kuroda, "A CMOS ultra-wideband impulse radio transceiver for 1-Mb/s data communications and  $\pm 2.5\text{-cm}$  range finding," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 891–898, Apr. 2006.
- [19] *First Report and Order*, document FCC 02-48, Feb. 2002.
- [20] D. D. Wentzloff and A. P. Chandrakasan, "A 47 pJ/pulse 3.1-to-5 GHz all-digital UWB transmitter in 90nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2007, pp. 118–120.
- [21] Y. A. Eken and J. P. Uyemura, "A 5.9-GHz voltage-controlled ring oscillator in 0.18- $\mu\text{m}$  CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 1, pp. 230–233, Jan. 2004.
- [22] N. Reyniers and W. Dehaene, "A 210mV 5MHz variation-resilient near-threshold JPEG encoder in 40nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 456–457.
- [23] G. de Strel and D. Bol, "Impact of back gate biasing schemes on energy and robustness of ULV logic in 28nm UTBB FDSoI technology," in *Proc. IEEE Int. Symp. Low Power Electron. Design (ISLPED)*, Sep. 2013, pp. 255–260.
- [24] J. Yuan and C. Svensson, "New single-clock CMOS latches and flipflops with improved speed and power savings," *IEEE J. Solid-State Circuits*, vol. 32, no. 1, pp. 62–69, Jan. 1997.
- [25] A. J. Lopez-Martin, S. Baswa, J. Ramirez-Angulo, and R. G. Carvajal, "Low-voltage super class AB CMOS OTA cells with very high slew rate and power efficiency," *IEEE J. Solid-State Circuits*, vol. 40, no. 5, pp. 1068–1077, May 2005.
- [26] J. Ramirez-Angulo and M. Holmes, "Simple technique using local CMFB to enhance slew rate and bandwidth of one-stage CMOS op-amps," *Electron. Lett.*, vol. 38, no. 23, pp. 1409–1411, 2002.
- [27] S. Clerc *et al.*, "A 0.33 V/–40 °C process/temperature closed-loop compensation SoC embedding all-digital clock multiplier and DC-DC converter exploiting FDSoI 28nm back-gate biasing," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [28] N. Couniot, G. de Strel, F. Botman, A. K. Lusala, D. Flandre, and D. Bol, "A 65 nm 0.5 V DPS CMOS image sensor with 17 pJ/frame/pixel and 42 dB dynamic range for ultra-low-power SoCs," *IEEE J. Solid-State Circuits*, vol. 50, no. 10, pp. 2419–2430, Oct. 2015.
- [29] G. de Strel, F. Stas, T. Gurné, F. Durant, C. Frenkel, and D. Bol, "SleepTalker: A 28nm FDSoI ULV 802.15.4a IR-UWB transmitter SoC achieving 14pJ/bit at 27Mb/s with adaptive-FBB-based channel selection and programmable pulse shape," in *Proc. IEEE Symp. Very Large Scale Integr. Circuits (VLSI)*, Jun. 2016, pp. 1–2.
- [30] J. K. Brown, K.-K. Huang, E. Ansari, R. R. Rogel, Y. Lee, and D. D. Wentzloff, "An ultra-low-power 9.8GHz crystal-less UWB transceiver with digital baseband integrated in 0.18  $\mu\text{m}$  BiCMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 442–443.
- [31] F. Padovan, A. Bevilacqua, and A. Neviani, "A 20 Mb/s, 2.76 pJ/b UWB impulse radio TX with 11.7% efficiency in 130 nm CMOS," in *Proc. Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2014, pp. 287–290.
- [32] DW1000: A Fully Integrated Single Chip Ultra Wideband (UWB) Low-Power Low-Cost Transceiver IC Compliant to IEEE802.15.4-2011, document DW1000, DecaWave, Dec. 2014.
- [33] Y. Zheng *et al.*, "A 0.18  $\mu\text{m}$  CMOS dual-band UWB transceiver," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2007, pp. 114–119.
- [34] A. Klinefelter *et al.*, "A 6.45  $\mu\text{W}$  self-powered IoT SoC with integrated energy-harvesting power management and ULP asymmetric radios," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [35] Y. Kim, I. Hong, and H.-J. Yoo, "A 0.5 V 54  $\mu\text{W}$  ultra-low-power recognition processor with 93.5% accuracy geometric vocabulary tree and 47.5% database compression," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [36] A. P. Chandrakasan *et al.*, "Low-power impulse UWB architectures and circuits," *Proc. IEEE*, vol. 97, no. 2, pp. 332–352, Feb. 2009.



**Guerric de Strel** (S'12–M'16) received the M.S. degree in electrical engineering and the Ph.D. degree in engineering science from the Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium, in 2012 and 2016, respectively.

In 2014, he was a Visiting Ph.D. Student with the Radio Frequency Integrated Circuits Group, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, where he was involved in analog and RF design in advanced FDSOI technology. Since 2012, he has been involved in the prototyping, fabrication, and testing of three mixed-signal systems-on-chip, and has been the Project Leader for a 28-nm RF system-on-chip design. He is currently with imec, Leuven, Belgium, where he is involved in millimeter-wave CMOS design for radar applications. He has authored or co-authored over 20 technical papers and conference contributions. His current research interests include ultra-low-power mixed-signal design, analog and RF design in advanced CMOS technology, and efficiency optimization of low-power digital designs.

Dr. de Strel received the Best Master Thesis Award from the UCL IEEE student branch in 2012 and the Best Student IC Design Paper Award from the IEEE FTFC Conference in 2014.



**François Stas** (S'13) received the M.S. degree in electrical engineering from the Université catholique de Louvain, Louvain-la-Neuve, Belgium, in 2013, where he conducted his Master thesis under the supervision of Prof. D. Bol. He is currently pursuing the Ph.D. degree under the supervision of Prof. D. Bol.

He joined the Electronics Circuit and System Team in 2013, where he is involved in power optimization techniques in the context of embedded digital processors and mixed-signal system-on-chip.



**Thibaut Gurné** (S'13) received the M.S. degree in electrical engineering from the Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium, in 2013, and the M.S. degree in space studies from the Katholieke Universiteit Leuven, Leuven, Belgium.

He was a part-time Researcher under the supervision of Prof. D. Bol with UCL from 2013 to 2014. His current research interests include low-power analog, RF design, and power amplification in advanced CMOS technology. He is currently with Nokia Bell Labs, Murray Hill, NJ, USA, where he is involved in the analog front-end for next-generation DSL.



**François Durant** (S'14) received the M.S. degree in électronique électrotechnique automatisme et traitement du signal, specialized in nanoelectronics from Université Joseph Fourier, Grenoble, France, in 2014.

He was an intern under the supervision of Prof. D. Bol with the Electronic Circuit and System Group. He is currently a Verotech Consultant with imec, Leuven, Belgium, as a Tape-Out Engineer for UMC and ONSEMI foundry for the Europractice Program.



**Charlotte Frenkel** (S'15) received the M.S. degree (*summa cum laude*) in electromechanical engineering from the Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium, in 2015, where she is currently pursuing the Ph.D. degree as a Research Fellow of the National Foundation for Scientific Research of Belgium, under the supervision of Prof. D. Bol and Prof. J.-D. Legat.

Her current research interests include the design of low-power and high-density neuromorphic circuits as efficient non-von Neumann architectures for real-time recognition and learning.

Ms. Frenkel received the Best Master Thesis Award from the UCL IEEE Student Branch in 2015 for her Master thesis on radiation-hardening techniques for commercial off-the-shelf FPGAs.



**Andreia Cathelin** (M'04–SM'11) started electrical engineering studies at the Polytechnic Institute of Bucharest, Bucharest, Romania, and received the Degree from the Institut Supérieur de l'Électronique du Nord, Lille, France, in 1994, the Ph.D. and Habilitation à diriger des recherches (French highest academic degree) degrees from the Université de Lille 1, Lille, in 1998 and 2013, respectively.

In 1997, she was with Info Technologies, Grignan, France, where she was involved in analog and RF communications design. Since 1998, she has been with STMicroelectronics, Crolles, France, where she is currently with Technology Research and Development, as Senior Member of the Technical Staff. She has authored or co-authored 100 technical papers and four book chapters, and holds over 25 patents. The 60-GHz transceiver resulting from joint research between STMicroelectronics and CEA-Leti, in which she actively participated, has been selected as the cover photo for the International Seventh Edition (Oxford University Press, 2016) of the famous Adel S. Sedra and Kenneth C. Smith *Microelectronic Circuits* book. Her current research interests include the area of RF/mmW/THz systems for communications and imaging.

Dr. Cathelin is an elected member of the IEEE SSCS Adcom from 2015 to 2017. She is a co-recipient of the ISSCC 2012 Jan Van Vessem Award for Outstanding European Paper and of the ISSCC 2013 Jack Kilby Award for Outstanding Student Paper, and also received the 2012 STMicroelectronics Technology Council Innovation Prize. She is serving in several IEEE conferences and committees. She has been active at ISSCC since 2011, in 2011 as a TPC member, the RF Subcommittee Chair from 2012 to 2015, and is currently the Forums Chair and also a member of the Executive Committee. She has been a member of ESSCIRC TPC since 2005. Since 2013, she has been on the Steering Committee (SC) of ESSCIRC and ESSDERC conferences, and currently the SC Chair. She is also a member of the experts team of the AERES (French Evaluation Agency for Research and Higher Education). She has been on the Technical Program Committees of the VLSI Symposium on Circuits from 2010 to 2016, serving in the last years as an Officer. She has been a Guest Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS special issue on the VLSI Symposium in 2016.



**David Bol** (S'07–M'09) received the M.Sc. degree in electromechanical engineering and the Ph.D. degree in engineering science from the Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium, in 2004 and 2008, respectively.

In 2005, he was a Visiting Ph.D. Student with the CNM National Centre for Microelectronics, Sevilla, Spain, in advanced logic design. In 2009, he was a Post-Doctoral Researcher with intoPIX, Louvain-la-Neuve, where he was involved in low-power design for JPEG2000 image processing. In 2010, he was a Visiting Post-Doctoral Researcher with the Laboratory for Manufacturing and Sustainability, University of California at Berkeley, Berkeley, CA, USA, where he was involved in life-cycle assessment of the semiconductor environmental impact. In 2015, he participated in the creation of e-peas semiconductors, Liège, Belgium. He is currently an Assistant Professor with UCL. He is also leading the Electronic Circuits and Systems Research Group of the ICTEAM institute at UCL with Prof. D. Flandre, focused on ultra-low-power design of integrated circuits for the IoT, including computing, power management, sensing, and RF communications with focuses on technology/circuit interaction in nanometer CMOS nodes, mixed-signal SoC implementation, and variability mitigation. He is also co-responsible for four M.Sc. courses in electrical engineering at UCL on digital, analog, and mixed-signal integrated circuits, and systems and sensors. He has authored or co-authored over 70 technical papers and conference contributions and holds a delivered patent.

Dr. Bol co-received three Best Paper/Poster/Design Awards in IEEE conferences, such as ICCD 2008, SOI Conference 2008, and FTFC 2014. He also serves as an Editor of MDPI Journal of Low Power Electronics and Applications, as a TPC member of the IEEE SubVt/S3S Conference, and as a reviewer of various journals and conferences, such as the IEEE JOURNAL OF SOLID-STATE CIRCUITS, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I/II. Since 2008, he has presented several invited keynote tutorials in international conferences.