

# A Nanosecond-Transient Fine-Grained Digital LDO With Multi-Step Switching Scheme and Asynchronous Adaptive Pipeline Control

Fan Yang, *Student Member, IEEE*, and Philip K. T. Mok, *Fellow, IEEE*

**Abstract**—This paper introduces a multi-step switching scheme for a digital low dropout regulator (DLDO) that emerges as a new way of achieving nanosecond-transient and fine-grained on-chip voltage regulation. The multi-step switching scheme takes advantage of the adaptive pipeline control and asynchronous clocking for area- and power-efficient digital controller utilization. It speeds up the transient response by varying the pass transistor sizing in two available lengths of coarse steps as per the perturbation, while maintaining a small output voltage ripple by toggling in a finer step at steady operation. A prototype proving the proposed concept, i.e., a 0.6–1.0-V input, 50–200-mV dropout, and 500-mA maximum loading DLDO with an on-chip 1.5-nF output capacitor, is fabricated in a 65-nm CMOS process to verify the effectiveness of this scheme. By employing the multi-step switching scheme and adaptive control, the DLDO achieved a fast transient response to nanoseconds loading current change, and a 100 mV per 10-ns reference voltage switching, as well as a resolution of 768 levels ( $\sim 9.5$  bits) with a 5-mV output ripple. The quiescent current consumed by this DLDO at steady operation is down to 300  $\mu$ A.

**Index Terms**—3-D power stage, asynchronous control, coarse-fine, digital low dropout regulator (DLDO), fine-grained, fully on-chip, multi-step switching, nanosecond transient.

## I. INTRODUCTION

With the substantial growth in system-on-a-chip (SoC) development, designing clean and responsive on-chip power management has become increasingly challenging [1]–[6]. The challenge originates from several aspects: the per-core dynamic voltage and frequency scaling scheme requires the regulators to provide a wide range of supply voltage down to the near-threshold, or even sub-threshold, region [4], [6]; the full integration of regulators leaves no budget for a large decoupling capacitor [5]; and as more on-chip modules migrate to digital, all-analog solutions may not benefit from technology scaling because of their higher chip area overhead [6].

Manuscript received October 3, 2016; revised January 6, 2017, March 22, 2017, and May 19, 2017; accepted May 19, 2017. Date of publication June 14, 2017; date of current version August 22, 2017. This paper was approved by Associate Editor Yogesh Ramadas. This work was supported in part by the Research Grant Council of Hong Kong SAR Government, China, under Project 616813, and in part by Qualcomm Technologies, Inc., under Project QUALCOMM14EG02. (Corresponding author: Philip K. T. Mok.)

F. Yang was with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong. He is now with Qualcomm, Singapore 554910 (e-mail: fyangaa@connect.ust.hk).

P. K. T. Mok is with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong (e-mail: eemok@ust.hk).

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2709311

Compared with its analog counterparts [1], [2], [7], [8], the digital controller is known to be able to achieve a higher gain and bandwidth, and has the capacity to work in a near- or sub-threshold region [4], [5]. The dynamic power in obtaining a comparable performance is usually very low on average [4] and the static loss can usually be reduced by means of power gating [9]. Another advantage is the ease and lower cost of migrating over generations of technology scaling [4], [10].

In the past decade, the digital low-dropout regulator (DLDO) has emerged as a good candidate for SoC applications. A DLDO is mainly distinguished from an analog LDO (ALDO) by its control of a piecewise power stage, instead of a single pass transistor with an all-analog control loop. According to how the regulation error is generated, there are two main categories, namely, voltage-based [4], [9] and time-based DLDOs [5], [11].

Improvements in the earlier work focus solely on one aspect, i.e., either resolution or speed [5], [9], [10]. More recently published DLDOs aim to balance both resolution and speed, such as the work in [6] and [11]–[14]. In particular, Oh and Hwang [11] use a binary-weighted power stage, such that the digital control word length is well bounded, and Bin Nasir *et al.* [12] use a limited number of power MOS sections with $1\times$/$3\times$ multiple steps for similar purposes. Such methods of manipulating different sizes can reduce the number of steps needed to achieve regulation. However, the largest step in [12] is moderate, i.e., limited to $3\times$ at maximum, while the binary-weighted power stage may suffer from undesirable overshoot and long recovery time, although the response time can be reduced [15]. In fact, when experiencing a rapid large load transient, the droop mitigation and the faster recovery in the above-mentioned work as well as in [6] are realized mainly by the turbo frequency. A combination of power strength and frequency tuning is effective, especially at handling a well-defined transient, which, unfortunately, is not always known beforehand. In this sense, a more general-purpose alternative is needed to cover a much wider range of transient conditions.

In fact, when a higher current capability is required, one has to solve the inherent problem of balancing resolution and speed. With a larger number of levels over a wider loading range, more power stage units can be grouped and switched together to obtain a faster transient response, temporarily at a coarser resolution, without compromising speed. Straightforward as this method is, the desired balance will not be achieved unless the clocking scheme is properly designed. The synchronous clocking scheme, such as in [4] and [6], is still limited by the cycle-based delay, as boosting the global clock for a faster response can lead to much larger power consumption. The asynchronous clocking scheme in [9] and [16] can largely reduce the static power by turning off unused logic. However, the asynchronous control implemented in [9] and [16] is either relatively slow or involved, and it can be further improved by adopting a different design philosophy such as in [17]. This paper is an extension of [17], in which adaptive sizing that changes the *in situ* resolution and asynchronous control are applied over a wide supply and loading range.

Fig. 1. Architecture of the proposed DLDO.

This paper is organized as follows. Section II describes the architecture of the proposed DLDO, in particular, how the power stage is organized by considering the balance in speed and resolution. In Section III, the circuit implementations, as well as the behavioral block diagram of the digital controllers, are presented with simulations. In Section IV, measurement results are given to verify the effectiveness of the proposed techniques. Finally, the conclusions are given in Section V.

## II. ARCHITECTURE AND POWER STAGE OF THE PROPOSED DLDO

When using the asynchronous control, dynamic power consumption can be reduced by replacing the global clock with a chain of event-driven local clock pulses [9], [17], [18]. However, not all digital controllers are suitable for such a change, mainly because the matching of adjacent digital cells affects the clock propagation. The phase and the pulse width of the asynchronous clock are uncertain; thus, such timing information may generate a larger error if it directly regulates the voltage. Therefore, the time-based DLDO [5], [11] is not suitable for asynchronous control. As a result, the voltage-based controller for a DLDO [4], [9] is selected as the baseline structure for this design.

### A. Architecture and Basic Operation of the Proposed DLDO

Fig. 1 shows the architecture of the proposed DLDO. The voltage regulation is constructed by sensing $V_{OUT}$ and comparing the feedback voltage $V_{FB}$ with a reference $V_{REF}$, followed by a digital controller translating the *in situ* comparison result $Q$ into a digital word $D[1 : N]$ in order to drive the power PMOS array.

Fig. 2. (a) Simplified power stage and ideal controller. (b) Illustrations of load transient waveforms at (1) smaller and (2) larger step size $n$.

The pipeline controller is used to perform such a series-to-parallel data conversion. The fundamental operation of a pipeline is similar to the bi-directional shifting in [4]. The digital code output of each pipeline unit will be 0 or 1, the bi-directional shifting of which is driven by an asynchronous clock that will be detailed later. The digital controller is governed by a finite-state machine (FSM) that determines the state for reconfiguring the pipeline and generating  $D[1 : N]$  accordingly, which will also be further discussed in Section III.
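The bi-directional shifting described above can be pictured with a small behavioral model. The Python sketch below is an illustration under stated assumptions, not the circuit: the comparator polarity ($Q = 1$ when $V_{FB} < V_{REF}$) and the helper names `comparator` and `shift_step` are assumptions introduced here, and the 0 = ON gate level follows the PMOS convention used later in the paper.

```python
def comparator(v_fb, v_ref):
    """Ideal comparator. Assumed polarity: Q = 1 when the output is too low."""
    return 1 if v_fb < v_ref else 0


def shift_step(word, q):
    """Return the thermometer word after one bi-directional shift.

    word: list of gate levels for the PMOS array, 0 = ON and 1 = OFF,
          with the ON units assumed to fill from index 0 upward.
    q:    comparator result; 1 shifts forward (one more unit ON),
          0 shifts backward (one more unit OFF).
    """
    n_on = word.count(0)
    if q == 1 and n_on < len(word):
        n_on += 1
    elif q == 0 and n_on > 0:
        n_on -= 1
    return [0] * n_on + [1] * (len(word) - n_on)


if __name__ == "__main__":
    d = [0, 0, 1, 1, 1, 1]
    d = shift_step(d, comparator(v_fb=0.88, v_ref=0.90))  # droop: one more unit ON
    print(d)
```

In this model the word saturates at all-ON or all-OFF instead of wrapping around, which is one plausible boundary behavior and is not taken from the paper.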

As the gate drive level is digital, the current delivery capability of each power stage unit is determined by $V_{IN}$ and the regulated $V_{OUT}$, at a given body biasing. This is different from an ALDO, which adjusts its output current by varying the gate voltage of a fixed-size pass transistor. To deliver a larger maximum current required by the SoC, the total size of the power stage, i.e., $W/L$, is increased. However, increasing the total size without altering the number of power stage units $N$ will result in a degraded, i.e., coarser, resolution. As a result, the resolution of the regulated voltage is proportional to the size of the power stage unit that toggles at each least significant bit (LSB) change.

### B. Design Considerations of a Multi-Step Power Stage

Assume that the sizing of an LSB unit is given, i.e., $W/L$ in Fig. 2(a). There are various methods of organizing the remaining $(N - 1)$ LSB units, according to whether a linear, binary, or other division is used. Unlike a pure search problem, finding the correct number of units to turn on/off based on $V_{out}$ actually affects $V_{out}$ itself. Any change of $V_{out}$ is caused by a load current change or by the power stage receiving a changed gate signal, as shown in the simplified power stage and its controller in Fig. 2(a). When the loading current steps up from $I_{out1}$ to $I_{out2}$, i.e., the load resistance decreases by $\Delta R$, an extra $n$ power MOS units are turned on, as shown in case (1) of Fig. 2(b).

A general load transient response of a DLDO is, therefore, analyzed with an emphasis on the effects of different step sizes, regardless of the clocking scheme, assuming that each turned-on LSB unit works in the linear region ($V_{sd} \ll V_{dd} - |V_{tp}|$), and that initially $V_{out}$ is regulated with a loading resistance of $R$ while $N_0$ LSB units are turned on. Now consider that, at time 0, the load resistance changes from $R$ to $R - \Delta R$ and the controller instantaneously reacts by turning on $n$ extra units; the following equation then holds:

$$(N_0 + n)K V_{sd} - C_{out} \frac{d(V_{dd} - V_{sd})}{dt} = \frac{V_{dd} - V_{sd}}{R - \Delta R} \quad (1)$$

where $K$ denotes $\mu C_{ox}(W/L)(V_{dd} - |V_{tp}|)$, a constant defined by the process and supply voltage, $V_{out}$ equals $V_{dd} - V_{sd}$, and $C_{out}$ denotes the output capacitor.

Equation (1) is solved with interest in how the  $V_{out}$  (i.e.,  $V_{dd} - V_{sd}$ ) reacts to the load current change by turning on  $n$  extra units. As  $V_{sd}$  is the single variable while the others are parameters or constant, a general solution for (1) can be obtained. Further consider the initial condition as in

$$N_0 K V_{sd} = \frac{V_{dd} - V_{sd}}{R} \quad (2)$$

thus  $V_{sd}$  can be solved as shown in

$$V_{sd} = \frac{\tau_1}{(R - \Delta R)C_{out}} V_{dd} + \left( \frac{1}{1 + N_0 K R} - \frac{\tau_1}{(R - \Delta R)C_{out}} \right) V_{dd} e^{-\frac{t}{\tau_1}} \quad (3)$$

where  $\tau_1$  is the time constant that is linearly proportional to time  $t_{settle}$  for  $V_{sd}$  or  $V_{out}$  to settle

$$\tau_1 = \frac{(R - \Delta R)C_{out}}{1 + (N_0 + n)K(R - \Delta R)} \approx \frac{C_{out}}{(N_0 + n)K}. \quad (4)$$

Given a load step, there exists an optimal number of extra turned-on power MOS units, $n_{opt}$, which is computed by setting the difference between the settled $V_{out}$ and the initial voltage $V_{ref}$ to 0, i.e., $\Delta V_{out} = 0$, after a settling time of $t_{settle}$. In practice, $n$ is usually either smaller or larger than $n_{opt}$. If $n$ is much smaller than $n_{opt}$, $V_{out}$ cannot recover from its droop in one step, and $\Delta V_{out} < 0$, as shown in case (1) of Fig. 2(b). More units must then be turned on in the following search steps. If $n$ is much larger than $n_{opt}$, $\Delta V_{out} > 0$. This is not preferred, however, as the long time $\Delta V_{out}$ takes to return to 0 degrades the regulation accuracy. As case (2) of Fig. 2(b) shows, the overshoot is corrected by turning off these $n$ units, assuming no action is performed until $V_{out}$ crosses the desired voltage $V_{ref}$. The time required for $\Delta V_{out}$ to return to 0, denoted as $t_{return}$, is also marked.
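As a worked step, $n_{opt}$ can be written in closed form from (2) and (3). The sketch below assumes that the settled value is the $t \to \infty$ limit of (3) and that $K$ stays constant across the step; the current-ratio form further assumes $I_{out2}/I_{out1} = R/(R - \Delta R)$, i.e., the ratio is evaluated at the same regulated $V_{out}$.

$$V_{sd}(\infty) = \frac{\tau_1}{(R - \Delta R)C_{out}} V_{dd} = \frac{V_{dd}}{1 + (N_0 + n)K(R - \Delta R)}, \qquad V_{sd}(0) = \frac{V_{dd}}{1 + N_0 K R}$$

$$\Delta V_{out} = V_{sd}(0) - V_{sd}(\infty) = 0 \;\Rightarrow\; (N_0 + n_{opt})(R - \Delta R) = N_0 R \;\Rightarrow\; n_{opt} = \frac{N_0\,\Delta R}{R - \Delta R} = N_0\left(\frac{I_{out2}}{I_{out1}} - 1\right)$$

Under these assumptions, a step size below this value leaves $\Delta V_{out} < 0$ and a step size above it gives $\Delta V_{out} > 0$, consistent with the two cases discussed above.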

An example is provided in which $V_{dd} = 1$ V and the initial condition is $V_{sd} = 0.1$ V, $R = 9\ \Omega$, and $N_0 = 1$. The relationships of $\Delta V_{out}$ versus current step-up ratio ($I_{out2}/I_{out1}$), as well as the ratio ($t_{return}/t_{settle}$) versus ($I_{out2}/I_{out1}$), are shown in Figs. 3 and 4, respectively. Each solid curve stands for the data obtained with a different step size $n$, and the ratio ($I_{out2}/I_{out1}$) ranges from 1 to 100. For example, when the step size $n$ equals 4, at a current step-up ratio ($I_{out2}/I_{out1}$) of 4, there is an overshoot $\Delta V_{out}$ of 20 mV (Fig. 3) and the ratio ($t_{return}/t_{settle}$) is 0.3 (Fig. 4). However, when ($I_{out2}/I_{out1}$) rises to 10, the step size $n$ of 4 is not sufficient, thus resulting in an undershoot $\Delta V_{out}$ of -80 mV; the ratio ($t_{return}/t_{settle}$) is not plotted in Fig. 4, as $t_{return}$ always equals 0 for undershoot cases.
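The numbers quoted in this example can be roughly reproduced from the settled value of (3). The Python sketch below is illustrative only. It assumes that one unit is initially on, that $K$ follows from the initial condition (2), and that the current step-up ratio can be modelled as $R/(R - \Delta R)$. The helper names `unit_conductance`, `settled_delta_vout`, and `tau1` are hypothetical, and any $C_{out}$ value is supplied by the caller.

```python
def unit_conductance(vdd=1.0, vsd0=0.1, r=9.0, n0=1):
    """K from the initial condition (2): N0 * K * Vsd = (Vdd - Vsd) / R."""
    return (vdd - vsd0) / (n0 * r * vsd0)


def settled_delta_vout(n, ratio, vdd=1.0, vsd0=0.1, r=9.0, n0=1):
    """Settled Vout minus initial Vout after turning on n extra units.

    Assumption: ratio = Iout2/Iout1 is modelled as R / (R - dR).
    """
    k = unit_conductance(vdd, vsd0, r, n0)
    r_new = r / ratio
    vsd_final = vdd / (1.0 + (n0 + n) * k * r_new)  # t -> infinity limit of (3)
    return vsd0 - vsd_final


def tau1(n, ratio, c_out, vdd=1.0, vsd0=0.1, r=9.0, n0=1):
    """Time constant from (4), using the exact form rather than the approximation."""
    k = unit_conductance(vdd, vsd0, r, n0)
    r_new = r / ratio
    return r_new * c_out / (1.0 + (n0 + n) * k * r_new)


if __name__ == "__main__":
    for ratio in (4, 10):
        dv = settled_delta_vout(n=4, ratio=ratio)
        print(f"n=4, Iout2/Iout1={ratio}: dVout = {dv * 1e3:+.1f} mV")
```

With these assumptions the sketch gives about +18 mV for a step-up ratio of 4 and about -82 mV for a ratio of 10, close to the 20 mV and -80 mV quoted above. It does not attempt to reproduce $t_{return}/t_{settle}$, whose exact definition depends on the plotted data.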



Fig. 3. Plot of  $\Delta V_{out}$  versus  $I_{out2}/I_{out1}$  for load transient response.



Fig. 4. Plot of  $t_{return}/t_{settle}$  versus  $I_{out2}/I_{out1}$  for load transient response.

In order to maintain a regulated  $V_{out}$  in response to the disturbance, such as a load step change, a step size that is close to  $n_{opt}$ , which minimizes  $\Delta V_{out}$ , is desired. As shown in Fig. 3, a series of  $n_{opt}$  at specific current steps are obtained by intersecting the dashed line ( $\Delta V_{out} = 0$ ) with the solid lines. In fact, the optimal step size  $n_{opt}$  can be selected for specific current steps, even though there are a variety of options in determining step sizes for other scenarios. Three cases are marked: the “medium” case uses a step size that is larger than or equal to  $n_{opt}$ ; the “small” case uses a step size that is smaller than or equal to  $n_{opt}$ , while the “large” case uses a step size that is always much larger than necessary.

It can be observed that a very small  $n$  comes with a large undershoot  $\Delta V_{out}$ , as Fig. 3 shows. If  $n$  is too large, the newly turned-on  $n$  units will cause a large overshoot instead. Thus, at a given boundary on  $\Delta V_{out}$ , e.g.,  $\pm 50$  mV, a step sizing similar to the “medium” case is preferred. The side effect of a too large  $n$  can be further observed in Fig. 4. The time  $t_{return}$  returning from a large overshoot can be tens of times of  $t_{settle}$ , which may cause an oscillating  $V_{out}$  due to the accumulated error in each cycle. For both “large” and “medium” cases shown in Fig. 4, although the ratio ( $t_{return}/t_{settle}$ ) is not large, the “large” case is not recommended mainly due to the large overshoot.

In practice, a scheme that adapts the optimal step size to a finely divided load step range is difficult to implement. A smaller number of step sizes is preferred. The power stage can be designed with three step sizes, $n_1$, $n_2$, and $n_3$, which are responsible for small, medium, and large load steps, respectively. It should be noted that although no prior knowledge of the upcoming load step is assumed, the transition between different step sizes shall be conducted by a properly designed digital controller, which will be covered in the following sections.

Fig. 5. (a) Proposed row–column-bit 3-D power stage. (b) Illustration of the row, column, and bit in a turned on/off example of the 3-D power stage.

### C. Proposed Row–Column-Bit 3-D Power Stage

For the power stage to deliver a much larger current, an increased  $N$  is generally necessary at a given resolution. In this case, the transient speed will be limited if a 1-D access is adopted [4]. The adaptive sizing technique will achieve a balance between resolution and speed, which can be understood as changing the *in situ* resolution according to the operating modes. However, the conventional adaptive sizing that is generally implemented as 2-D access, such as in [12], may not effectively cover a wider load range. In this paper, the adaptive sizing is realized by the proposed multi-step switching scheme, named the row–column-bit 3-D power stage [17], [18].

Fig. 5(a) shows how the proposed power stage is organized and connected to the address and data bus in a flattened view. The power stage contains  $M_R$  rows with  $M_C$  columns inside each row. Each column is further linearly divided into  $M_B$  bits. Thus, a total number of  $N = (M_R \times M_C \times M_B)$  minimal levels are obtained. In this paper, the numbers  $M_R$ ,  $M_C$ , and  $M_B$  are 6, 8, and 16, respectively, thus 768 levels in total, which is  $\sim 9.5$  binary bits.
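The level count above is simple arithmetic, and the same numbers give the size of one step in each dimension. The Python sketch below spells this out. The function names are hypothetical, and the fill order assumed in `split_level` (row by row, then column by column, then bit by bit) is an assumption made here for illustration.

```python
import math

M_R, M_C, M_B = 6, 8, 16  # rows, columns per row, bits per column (from the text)


def total_levels(m_r=M_R, m_c=M_C, m_b=M_B):
    """Number of minimal levels N = M_R * M_C * M_B."""
    return m_r * m_c * m_b


def equivalent_bits(m_r=M_R, m_c=M_C, m_b=M_B):
    """Binary-bit equivalent of the level count."""
    return math.log2(total_levels(m_r, m_c, m_b))


def step_sizes(m_c=M_C, m_b=M_B):
    """LSB units moved by one step in each dimension."""
    return {"row": m_c * m_b, "column": m_b, "bit": 1}


def split_level(k, m_c=M_C, m_b=M_B):
    """Split k turned-on LSB units into (full rows, full columns, bits).

    Assumption: units fill row by row, then column by column, then bit by bit.
    """
    rows, rest = divmod(k, m_c * m_b)
    cols, bits = divmod(rest, m_b)
    return rows, cols, bits


if __name__ == "__main__":
    print(total_levels(), round(equivalent_bits(), 2), step_sizes())
    print(split_level(300))
```

For the values in the text, one row step moves 128 LSB units, one column step moves 16, and one bit step moves 1; the binary equivalent evaluates to about 9.58 bits, which the text quotes as $\sim 9.5$.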

The address and data bus are functionally equivalent to the data multiplexer (MUX) in Fig. 1, which selects the output data by the address. As the power stage is organized in  $(M_R \times M_C)$  groups of columns, the address bus should contain an equally large number of addresses to make sure each column is accessible.  $(M_R + M_C)$  latches are used to store the digital gate control codes. The equivalent mapping of the  $(M_R + M_C)$  data latches and  $(M_R \times M_C)$  addresses is illustrated as the row/column address and data controller in Fig. 5(a). Also shown is an extra number of  $M_B$  latches. The  $M_B$ -bit data are used to turn on/off  $M_B$  bit units in an active column by the bit data controller.

Fig. 5(b) provides an example showing the ON/OFF status of the proposed 3-D power stage. In Fig. 5, the large array stands for the power stage, in which each square block contains the power stage units that are grouped as a column. In this example, the shaded blocks stand for the power stage units that are turned on in their corresponding dimensions. The dashed block also shows a row that is partially turned on, which actually contains both fully turned-on and turned-off columns, as well as the column that will be further finely controlled with bits.

TABLE I  
DESCRIPTION OF STATES AND MODES OF OPERATION

| State | Mode             | Control of Data                                               | Speed | Resolution |
|-------|------------------|---------------------------------------------------------------|-------|------------|
| 1     | Row Operation    | Each row of column units controlled as a group                | +++   | +          |
| 2     | Column Operation | Each column unit in a certain row controlled individually     | ++    | ++         |
| 3     | Bit Operation    | Each bit unit inside a certain column controlled individually | +     | +++        |

## III. CIRCUIT IMPLEMENTATION OF THE CORE CIRCUITS

### A. Row–Column-Bit Data Controller for Multi-Step Switching Power Stage

The power stage comprises linearly divided units, in which the LSB units (bits) can be grouped into columns and rows. For a fine resolution, the bitwise control can be realized by *in situ* comparison between $V_{FB}$ and $V_{REF}$. Alternatively, when the controller works in rowwise or columnwise mode for speed enhancement, the grouped bits are accessed together, as summarized in Table I. For a smooth transition between modes without making the logic design more involved, the power stage is organized as an array of columns. Therefore, altering the way a column is controlled equivalently selects the row, column, or bit mode.

Fig. 6(a) shows how the equivalent $(M_R \times M_C)$ row/column data matrix is formed by the $(M_R + M_C)$ row/column data latches. If the DLDO starts up in the row-mode operation, the $M_R$ rows are successively turned on until arriving at a row that should be neither fully turned on ("0") nor fully turned off ("1"). After that, the $M_C$ columns inside this specific row experience a similar process until the active column is located, i.e., the column that needs to toggle ON/OFF to maintain the regulation. Then, to achieve a finer resolution, the $M_B$ bits contained in this column are further controlled. In Fig. 6(b), the bits inside the active column experience a direct read/write following the *in situ* comparison and data multiplexing.

Fig. 6. (a) Architecture of the rowwise and columnwise address and data controller. (b) Architecture of the bitwise data controller inside an active column of (a).
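The start-up sequence described above can be compared with a purely 1-D search by counting steps. The Python sketch below is a rough behavioral estimate under several assumptions: the required number of turned-on LSB units is fixed during the search, each pipeline move counts as one step, and the extra comparison that ends each mode is ignored. The function names are hypothetical.

```python
M_R, M_C, M_B = 6, 8, 16  # rows, columns per row, bits per column (from the text)


def count_groups(target, length, unit):
    """Turn on groups of `unit` LSBs while a whole group still fits below target."""
    groups_on = 0
    while groups_on < length and (groups_on + 1) * unit <= target:
        groups_on += 1
    return groups_on


def startup_steps(target):
    """Return ((rows, cols, bits), steps) for a row -> column -> bit search."""
    rows = count_groups(target, M_R, M_C * M_B)
    rest = target - rows * M_C * M_B
    cols = count_groups(rest, M_C, M_B)
    rest -= cols * M_B
    bits = count_groups(rest, M_B, 1)
    return (rows, cols, bits), rows + cols + bits


if __name__ == "__main__":
    target = 300  # hypothetical number of LSB units needed at steady state
    position, steps = startup_steps(target)
    print(position, steps, "steps, versus", target, "steps for a 1-D search")
```

For a hypothetical target of 300 units, this model needs 2 row steps, 2 column steps, and 12 bit steps, i.e., 16 steps instead of 300 for unit-by-unit shifting.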

Using an $(M_R \times M_C)$ array of individually controlled columns for the multi-step switching is inefficient. The scale of the array of columns can be greatly reduced because the power stage is switched with a certain pattern, as discussed earlier. The thermometer code that defines the ON/OFF status is a sequence of consecutive 0's (i.e., "ON" status of the PMOS units) followed by consecutive 1's (i.e., "OFF" status); thus, an $(M_R + M_C)$ array of columns is sufficient to define the active column as in Fig. 2(a). In addition, when switched to the bit mode, with the use of a MUX, the digital controller associated with the row and column modes can also be reused, as long as the data obtained can be stored separately. A practical compact design of the rowwise, columnwise, and bitwise data controllers for the power stage should follow the guideline:

$$M_B \geq M_R + M_C + 2. \quad (5)$$

In (5), constant number 2 originates from the terminated controllers, namely, the header and the trailer as similarly defined in [9]. The number of storage units (i.e., latches) is also reduced to  $(M_R + M_C + M_B)$ , instead of  $(M_R \times M_C \times M_B)$ . It should be noted that the data-address dependence can enable fast addressing by simpler logic operations, which reduces the delay of digital control.
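The storage saving and the guideline in (5) can be checked numerically, and the way two thermometer codes define the full column matrix can be sketched in a few lines. The Python sketch below is an assumption-laden illustration: it takes 0 = ON as stated above, assumes the first "1" in the row code marks the partially turned-on row, and uses hypothetical function names.

```python
def latch_count(m_r, m_c, m_b):
    """Storage units needed when row, column, and bit data are stored separately."""
    return m_r + m_c + m_b


def meets_guideline(m_r, m_c, m_b):
    """Guideline (5): M_B >= M_R + M_C + 2."""
    return m_b >= m_r + m_c + 2


def expand_column_matrix(row_code, col_code):
    """Expand row and column thermometer codes into an M_R x M_C matrix.

    Gate levels: 0 = column fully ON, 1 = column OFF. Assumption: rows
    before the first 1 in row_code are fully ON, that first-1 row takes
    col_code, and the remaining rows are fully OFF.
    """
    active_row = row_code.index(1) if 1 in row_code else None
    matrix = []
    for r, level in enumerate(row_code):
        if level == 0:
            matrix.append([0] * len(col_code))
        elif r == active_row:
            matrix.append(list(col_code))
        else:
            matrix.append([1] * len(col_code))
    return matrix


if __name__ == "__main__":
    print(latch_count(6, 8, 16), "latches instead of", 6 * 8 * 16)
    print(meets_guideline(6, 8, 16))
    for row in expand_column_matrix([0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1, 1, 1]):
        print(row)
```

For the values used in this paper, the sketch reports 30 latches instead of 768, and the guideline is met with equality (16 = 6 + 8 + 2).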

### B. Asynchronous Adaptive Digital Pipeline Control

To drive the power stage of the proposed DLDO, the comparison result will either be stored or directly fed to the gate driver of each power stage unit according to the current state that is determined by the FSM. Supervised by the FSM, the state-oriented pipeline control is the base for successively comparing  $V_{FB}$  and  $V_{REF}$ , and updating the ON/OFF of power stage units accordingly, as shown in Fig. 7.

In Fig. 7, States 1–3 refer to the digital codes generated by the FSM that define the mode of row, column, and bit operation, respectively, as defined in Table I. A thermometer code is used to control a linearly divided array of grouped (States 1 and 2) or individual (State 3) units. The pipeline is shared at different modes and each pipeline unit is connected to two storage units. At the steady state when operating at bit mode, only the specified storage unit is active to store the *in situ* data by comparing $V_{REF}$ and $V_{FB}$, while the others are used to latch the ON/OFF status obtained from the row or column mode. The timing is asynchronous, including the comparator clock $EnQ$, which is generated by the pipeline itself. The comparison result $Q_{new}$ is latched to one active storage unit only at a clocking edge, marked as path 1 in Fig. 7. The active latched data $Q_{latch}$ are fed back to direct the pipeline and clock propagation, marked as path 2.

Fig. 7. Proposed adaptive digital pipeline controller at different states.

Fig. 8. (a) Illustration of a section of the pipeline consisting of the ALU and ALU interface. (b) Simplified block diagram of the ALU in the pipeline controller.

A closer look into the pipeline operation is shown. Fig. 8(a) demonstrates a section of the pipeline controller that consists of two adjacent arithmetic logic units (ALUs) and, more importantly, the interface unit in between. The ALU, a baseline unit [9] for the proposed pipeline controller, is sketched in Fig. 8(b). An ALU that is connected in the pipeline receives a clock pulse from  $F_{n-1}$  and generates an active pulse to be transmitted to  $F_n$  (forward) or  $B_n$  (backward) after a delay time. In this design, moving forward (backward) in the pipeline means increasing (decreasing) the number of turned-on power stage units, as well as setting the next ALU as active.

The comparator starts a new comparison as the clock signal $F_{n-1}$ arrives; and upon receiving the updated comparison result, the MUX inside the active ALU will make the decision on whether to propagate in a forward or backward direction. The decision is made by the comparison result $Q_{new}$, which is latched as $Q_n$, and the following $Q_{n+1}$ and $Q_{n+2}$. As the length of the pipeline used in each state is different, the interface unit has to be inserted between ALUs to either conduct or block the clock propagation. Note that the ALU will be reconfigured as the boundary pipeline unit, i.e., one trailer (header) for row (column) operation, which is shown as the triangular shape in Fig. 7.

Fig. 9. Connected and disconnected pipeline controller units. (a) General cases and (b) special cases when the header and the trailer are involved in row/column operation.

Fig. 10. (a) FSM of the digital controller and simplified illustration of the generation of FSM triggering signal $SteQ$. (b) Proposed dual-path steady-state guard that generates the FSM triggering signal $SteQ$.

The adjacent ALUs are “connected” if the switch  $SW_B$  and  $SW_F$  are connected, as shown in Fig. 8(a). When connected, the interface unit works by directly passing the output  $F_n$  of  $ALU\langle i \rangle$  to the input  $F_{n-1}$  of  $ALU\langle i+1 \rangle$ , and the output  $B_n$  of  $ALU\langle i+3 \rangle$  to the input  $B_{n+2}$  of  $ALU\langle i+1 \rangle$ . On the other hand, if both switches toggle to the other side, it will block the signal from the adjacent ALUs by using the preset static digital levels  $F_{n-1}$  and  $B_{n+2}$  instead. The connected and disconnected scenarios for three adjacent ALUs are also shown in Fig. 9. In particular, Fig. 9(a) shows the general cases, while Fig. 9(b) shows the boundary when the header and the trailer are involved.

If the FSM conducts the pipeline from column (State 2 in Fig. 7) to row operation (State 1), regulation is faster when the pipeline starts from the “next pipeline unit position” following the fully turned-on ones at the row operation. In fact, during the state transition, such a “next” position in the pipeline can be determined from the data latch, after which an initial clock pulse should be inserted at this position to stimulate the pipeline operation in the new state. For example, if the last row operation indicates the first three rows have been turned on, then after the column-to-row transition, the initial pulse should start from the fourth row in order to reduce unnecessary settling time. The inserted clock pulses are marked as preset signals for the disconnected pipeline section, i.e., $F_{n-1}$ or $B_{n+2}$, as shown in Fig. 9(a).

### C. FSM and Dual-Path Digital Steady-State Guard

The state transition among the proposed row, column, and bit operation is shown in Fig. 10(a). The signal that controls the FSM state transition is expressed as $SteQ$, which is the flipping flag of the steady state. When $V_{OUT}$ deviates from the regulated voltage and the controller moves in the same direction for a large number of steps, the $SteQ$ flag flips from 1 to 0, which changes the FSM from State 3 to State 2 (i.e., bit to column operation) or from State 2 to State 1 (i.e., column to row operation). Otherwise, it flips back if a steady operation is detected.

Such a procedure is referred to as the digital droop and overshoot detection. If a globally running clock is available, a counter with a programmed boundary can be applied. However, the clock pulses propagated in the asynchronous control are usually short, so using them directly for the *SteQ* generation is risky. Alternatively, notice that $Q[1 : N]$, which is stored in the data latch during the pipeline operation, holds its level for a longer time; it is therefore regarded as a more stable version of the aforementioned clock pulse, where $N$ equals the length of the pipeline operating at a particular state, i.e., $M_R$, $M_C$, or $M_B$ for row, column, or bit operation, respectively. As shown in Fig. 10(b), the latched $Q[1 : N]$ is used in counting the forward and backward movement of “0/1,” accurately reflecting the status of the pipeline operation.

Fig. 11. (a) Summarized logic operation and (b) example logic flow of the key signal in the dual-path steady-state guard (top: forward movement and bottom: backward movement). (c) Example timing diagram of forward and counter movement.

Fig. 10(b) shows a combined block diagram for counting the bi-directional movements of “0/1” in  $Q[1 : N]$ . The forward-path detection includes a cascaded OR gate, Muller-C element, and AND gate, while the backward-path detection is comprised of an AND gate, Muller-C element, and NOR gate. A Muller-C element is a storage unit that remains at the same output level until both input levels are flipped. For this reason, the signal *SteQ* Pulse, which is generated when *SteQ* flips, is inserted to reset the Muller-C element when a new forward/backward movement detection is needed. After resetting by *SteQ* Pulse or the DLDO reset signal *RST*, any forward (or backward) movement in data latch  $Q[1 : N]$  will be counted, as long as the selection signal *MovFSEL*[1 : N] (or *MovBSEL*[1 : N]) is high. Such selection signals can help distinguish whether the movements are one-directional or back-and-forth, thus correctly triggering state transition in need.

Fig. 11(a) summarizes the logic operation of $Q[i]$, the intermediate signal $QSF[i]$ ($QSB[i]$), and the selection signal *MovFSEL*[$i$] (*MovBSEL*[$i$]) at different levels of *SteQ* Pulse (or *RST*), where $i = 1 : N$. For example, upon state transition (*SteQ* Pulse = 1), both *MovFSEL*[$i$] and *MovBSEL*[$i$] are reset to 0. After the *SteQ* Pulse is released, both signals are set to 1 or 0 according to the current $Q[i]$ level. As the PMOS power stage is used, i.e., $Q[i]=0$ (1) means turning on (off) the switch, for the forward (backward) counting, only flipping the $Q[i]$ with an initial 1 (0) level will be counted.

The operation can be understood by scenarios (I)–(IV) in Fig. 11(b), which are all assumed to start with a steady state. Scenarios (I) and (III) show how the large one-directional movement is counted, while scenarios (II) and (IV) show a small number of back-and-forth variations, which will not trigger the state transition. For example, in scenario (I) of Fig. 11(b), for the transition of $Q[i]$ from 1 to 0, the selection signal *MovFSEL*[$i$] stays high until the transition is complete. In scenario (I), three forward counts are observed, after which, in scenario (II), there occurs one backward movement, which will not be counted as forward because its selection signal *MovFSEL*[$i$] is already locked at 0. Illustrative waveforms of example forward and counter movements are given in Fig. 11(c), which correspond to the operation scenarios (I) and (II) of Fig. 11(b), respectively.

After counting a given number of movements toward one direction, which is generally set as around half the pipeline length, the counter will further flip *SteQ* and trigger the FSM for a state transition. As a result, the complete closed-loop FSM-supervised pipeline control is implemented.
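The counting rule described above can be sketched as a small behavioral model. This is a simplified illustration under stated assumptions, not the gate-level circuit of Fig. 10(b): the class name `MovementCounter`, its methods, the eight-stage pattern, and the threshold of four are hypothetical, and each stage is assumed to contribute at most one count per direction between resets.

```python
class MovementCounter:
    """Behavioral sketch of one-shot forward/backward movement counting."""

    def __init__(self, q, threshold):
        self.threshold = threshold  # assumed: about half the pipeline length
        self.reset(q)

    def reset(self, q):
        """Model SteQ Pulse (or RST): clear counts, re-arm selection signals."""
        self.q = list(q)
        # PMOS stage: Q[i] = 0 means ON. A forward move is 1 -> 0, so only
        # stages starting at 1 are armed for forward counting (and stages
        # starting at 0 for backward counting).
        self.mov_f_sel = [bit == 1 for bit in q]
        self.mov_b_sel = [bit == 0 for bit in q]
        self.forward = 0
        self.backward = 0

    def update(self, new_q):
        """Register a new latch pattern; return True if a transition is due."""
        for i, (old, new) in enumerate(zip(self.q, new_q)):
            if old == 1 and new == 0 and self.mov_f_sel[i]:
                self.forward += 1
                self.mov_f_sel[i] = False  # locked until the next reset
            elif old == 0 and new == 1 and self.mov_b_sel[i]:
                self.backward += 1
                self.mov_b_sel[i] = False  # locked until the next reset
        self.q = list(new_q)
        return max(self.forward, self.backward) >= self.threshold


if __name__ == "__main__":
    start = [0, 0, 0, 0, 1, 1, 1, 1]
    toggled = [0, 0, 0, 0, 0, 1, 1, 1]
    counter = MovementCounter(start, threshold=4)
    # Back-and-forth toggling of one stage is counted only once.
    for pattern in (toggled, start, toggled, start):
        counter.update(pattern)
    print(counter.forward, counter.backward)  # 1 0
```

In this sketch, repeated toggling of the same stage never accumulates toward the threshold, whereas a sustained movement across several stages does, which mirrors the distinction between scenarios (I)/(III) and (II)/(IV).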

#### D. Wide Input-Range Locally Clocked Dynamic Comparator

To achieve a short-delayed comparison while consuming low static power, the dynamic comparator topology introduced in [19] is used. As shown in Fig. 12, the comparator operates with two phases. In the pre-charge phase, the comparator is configured as a preamplifier with both nodes VOP and VON pulled to high. Before EN rises to “1,” the input difference between VIP and VIN is accumulated in the parasitic capacitor at VOP and VON. When the rising edge of EN is asserted, the regenerative latch will rapidly amplify the small difference to full swing. Its slew rate is not limited by biasing current; thus, the comparator achieves both low static power consumption and fast speed. It is reported to have a delay time as low as tens to hundreds of picoseconds [19].



Fig. 12. Clocked comparator in the proposed DLDO.



Fig. 13. (a) Simulated comparison delay time versus input error of the clocked comparator. (b) Simulated pipeline delay time at different process corners.

Fig. 14. Simulated steady-state operation of the proposed DLDO with (a) 30-mV $V_{OUT}$ ripple at $V_{IN} = 0.85$ V and $V_{OUT} = 0.8$ V and (b) 5-mV $V_{OUT}$ ripple at $V_{IN} = 0.95$ V and $V_{OUT} = 0.9$ V.

Fig. 15. Simulated load transient response of the proposed DLDO illustrating FSM state transition, simulated at $V_{IN} = 1.0$ V and $V_{OUT} = 0.95$ V, with (a) rising load step and (b) falling load step.

The comparator delay time is affected by many factors. Fig. 13(a) plots the delay time versus the 1–10-mV input error. Inside each subplot, different curves stand for various input common-mode voltages. It is observed that the delay time increases substantially if the supply voltage $V_{DD}$ decreases from 1.0 to 0.6 V. To obtain a correct *in situ* comparison result, more waiting time is therefore needed in each pipeline unit of the DLDO when $V_{DD}$ is low. In this design, the delay variation is compensated by an all-digital delay cell, which adapts to $V_{DD}$ changes in the same manner, thus avoiding the risk of running at a constant clock frequency. Fig. 13(b) shows the simulated delay time inside a pipeline unit across the $V_{DD}$ range at different process corners and room temperature. It can be seen that the typical equivalent switching frequency is approximately within the range of 40–400 MHz.

Fig. 16. Simulated load transient response of the proposed DLDO illustrating the effects of the $C_{OUT}$, at simulated conditions of $V_{IN} = 1.0$ V, $V_{OUT} = 0.95$ V, and $C_{OUT}$ of 150 pF, 500 pF, and 1.5 nF, with (a) rising load step and (b) falling load step.

Fig. 17. (a) Chip micrograph and (b) layout graph of the power stage of the proposed DLDO.

Fig. 18. Measured dc characteristics. (a) $V_{OUT}$ versus $I_{Load}$. (b) $V_{OUT}$ versus $V_{IN}$. (c) Current efficiency versus $V_{OUT}$ at 10-mA $I_{Load}$. (d) Current efficiency versus $I_{Load}$ at 50-mV dropout.

Fig. 19. Measured load transient response. (a) $V_{IN}$: 0.65 V and $V_{OUT}$: 0.6 V and (b) $V_{IN}$: 0.85 V and $V_{OUT}$: 0.8 V, both with 100-mA on-chip current step at 2-ns edge time.

#### E. Simulated Steady-State Operation and Transient Response of the Proposed DLDO

Fig. 14(a) and (b) plots the simulated steady-state operation of the proposed DLDO. The asynchronous clock pulses $EnQ(0 : 15)$ for each pipeline stage are given, where $EnQ(i)$ stands for the received clock signal for $ALU(i)$, the same as in Fig. 8(a). For clearer illustration, some $EnQ$ pulses in the idle pipeline stages are skipped here as they do not have any level change. $State(1 : 0)$ is the binary FSM state as seen in Fig. 10(a). This DLDO is observed to have a limit cycle phenomenon [20]–[22]; the number of toggling stages (bits) is larger than one, thus creating a small 30-mV oscillation with a frequency much slower than the sampling frequency, which is not addressed in this design. However, as Fig. 14(b) shows, after the loading current rises up to use more than one column of power stage units, the ripple is reduced by intentionally triggering a very short period of toggling one adjacent column instead of forcing the DLDO into bit-mode operation only. As a result, the FSM in the digital controller will flip back to bit-mode operation filling the small droop due to the intentional short switching OFF of the adjacent column. The complete turning on/off of a large number of bit units [if staying in state “11” in Fig. 14(a)] is replaced by the short-time incomplete toggling of one adjacent column unit [if swapping between state “11” and “10” in Fig. 14(b)], which reduces the voltage ripple due to the limit cycle phenomenon in particular for these loading conditions.

Fig. 20. Measured load transient response: $V_{IN}$: 1.05 V and $V_{OUT}$: 1.0 V, with (a) 100-mA on-chip current step and (b) 400-mA on-chip current step, both at 2-ns edge time.

Fig. 15(a) and (b) plots the simulated load transient responses of the proposed DLDO when  $V_{IN}$  is 1 V and  $V_{OUT}$  is 0.95 V. The FSM state transition, as defined in Fig. 10(a), is clearly marked with the change of binary signal  $State(1 : 0)$ . Fig. 16(a) and (b) compares the effects of different  $C_{OUT}$  values of 150 pF, 500 pF, and 1.5 nF, with the same large load step. Compared with other smaller values of  $C_{OUT}$ , 1.5 nF is preferred as the output capacitance as it eliminates the undesirable large steady-state ripples and longer settling time, such as what is shown in the  $C_{OUT} = 150\text{ pF}$  case in Fig. 16(b), and more importantly, it reduces the undershoot and overshoot voltage after load transient.
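A first-order estimate helps explain why a larger $C_{OUT}$ reduces the undershoot. Before the loop reacts, the load step is supplied by the output capacitor alone, so the initial slope is $dV/dt = \Delta I / C_{OUT}$. The sketch below evaluates this slope for the three capacitor values of Fig. 16. The 100-mA step is an assumption borrowed from the measurement conditions (the step size used in Fig. 16 is not stated here), the step is idealized as instantaneous, and the function name is hypothetical.

```python
def droop_slope_mv_per_ns(delta_i_a: float, c_out_f: float) -> float:
    """Initial output slope dV/dt = delta_I / C_OUT in mV/ns, assuming the
    capacitor alone supplies the step (no regulator response yet)."""
    volts_per_second = delta_i_a / c_out_f
    return volts_per_second * 1e3 / 1e9  # V/s -> mV/ns


if __name__ == "__main__":
    delta_i = 100e-3  # assumed load step in amperes (not taken from Fig. 16)
    for c_out in (150e-12, 500e-12, 1.5e-9):
        slope = droop_slope_mv_per_ns(delta_i, c_out)
        print(f"C_OUT = {c_out * 1e12:6.0f} pF -> {slope:6.1f} mV/ns")
```

Under these assumptions the slope falls from roughly 667 mV/ns at 150 pF to roughly 67 mV/ns at 1.5 nF, so the controller has about ten times longer to respond for the same droop. The estimate is a worst-case bound, since a finite edge time and the pass transistors that are already on both reduce the actual droop.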

## IV. MEASUREMENT RESULTS

The proposed DLDO is fabricated in a 1.2-V low-leakage 65-nm CMOS process. Fig. 17(a) is the chip micrograph, which shows a total chip area of  $1000\text{ }\mu\text{m} \times 720\text{ }\mu\text{m}$ , including the I/O pads, on-chip decoupling capacitors, and testing loads. The active area of the DLDO is  $450\text{ }\mu\text{m} \times 350\text{ }\mu\text{m}$ , including the power stage, comparator, and digital controller generated by a standard digital flow.

In the layout, the power distribution and signal propagation are considered. As Fig. 17(b) shows, for balanced heat dissipation, the eight columns in all six rows are uniformly distributed. All of the control signals follow the H-tree structure. The mismatch introduced by the layout is largely cancelled, which allows the model of digital controllers to perform in a more predictable way.

TABLE II  
COMPARISON OF PERFORMANCE

|                  | Type              | Process   | $V_{IN}$ (V) | $V_{OUT}$ (V) | $I_{OUT,MAX}$ (mA) | Equiv. binary bits | $f_{SW,MAX}$ (MHz)  | $C_{OUT}$ (nF) | Edge time $\Delta t$ (ns) | Current step $\Delta I_{MAX}$ (mA) | Voltage droop $\Delta V_{OUT}$ (mV) | $I_Q$ ( $\mu A$ )    | Load reg. (mV/mA) | Active area ( $mm^2$ ) | $FOM_T^{(4)}$ (ps)  | $FOM_V^{(5)}$ (V)    |
|------------------|-------------------|-----------|--------------|---------------|--------------------|--------------------|---------------------|----------------|---------------------------|------------------------------------|-------------------------------------|----------------------|-------------------|------------------------|---------------------|----------------------|
| [8] 2012 JSSC    | ALDO              | 45 nm SOI | 1.179-1.625  | 0.9-1.1       | 42                 | —                  | —                   | 1.46           | 0.309 <sup>(1)</sup>      | 4.5 <sup>(1)</sup>                 | 7.6 <sup>(1)</sup>                  | 12000 <sup>(1)</sup> | 0.233             | 0.075                  | 62.4 <sup>(1)</sup> | 0.022                |
| [3] 2007 JSSC    | 1/2 $V_{DD}$ Gen. | 90 nm     | 2.4          | 1.2           | 1000               | 4                  | 100 <sup>(2)</sup>  | 2.4            | 0.288                     | 1000                               | 120                                 | 25700                | N/A               | 0.03                   | 7.4                 | 0.003                |
| [4] 2010 CICC    | DLDO              | 65 nm     | 0.5          | 0.45          | 0.2                | 8                  | 1 (typ.)            | 100            | N/A                       | 0.2                                | 40                                  | 2.7 (1 MHz)          | 0.65              | 0.042                  | 270000              | N/A                  |
| [10] 2011 ASSCC  | DLDO              | 40 nm     | 1.34         | 1.20          | 250                | 5.8 <sup>(2)</sup> | 1000 (typ.)         | N/A            | 5                         | 114                                | 50                                  | 10000 (1 GHz)        | N/A               | 0.057                  | 6140.4              | 0.076                |
| [9] 2013 JSSC    | Async. DLDO       | 40 nm     | 0.9-3.6      | 0.8-3.5       | 200                | 5                  | —                   | 100            | N/A                       | N/A                                | N/A                                 | 0.05 <sup>(3)</sup>  | 0.05              | 0.08                   | N/A                 | N/A                  |
| [5] 2014 JSSC    | DLDO              | 32 nm     | 0.7-1.0      | 0.5-0.9       | 5                  | N/A                | 1000 <sup>(2)</sup> | 0.1            | 10                        | 0.8                                | 150                                 | 92                   | N/A               | 0.008                  | 1150                | 0.599 <sup>(2)</sup> |
| [11] 2015 TVLSI  | DLDO              | 110 nm    | 0.6-1.2      | 0.5-0.9       | 80                 | 9                  | 1                   | 1              | 25000 <sup>(2)</sup>      | 80                                 | 53                                  | 32                   | 0.3               | 0.04                   | 0.27                | 1.840                |
| [12] 2015 ISSCC  | DLDO              | 130 nm    | 0.5-1.2      | 0.45-1.14     | 4.6                | 7                  | 400                 | 1              | N/A                       | 1.4 <sup>(2)</sup>                 | 90 <sup>(2)</sup>                   | 24-221               | 10                | 0.114                  | 8571 <sup>(2)</sup> | N/A                  |
| [18] 2015 CICC   | Async. DLDO       | 65 nm     | 0.6-1.0      | 0.55-0.95     | 500                | 9.5                | —                   | 2              | 2 <sup>(1)</sup>          | 500                                | 250                                 | 350                  | 0.15              | 0.291                  | 0.7                 | 0.001                |
| [13] 2016 TCASII | DLDO              | 65 nm     | 0.6-1.1      | 0.4-1         | 100                | 11                 | 500                 | 1              | 20                        | 100                                | 55                                  | 82                   | 0.06              | 0.010                  | 0.45                | 0.003                |
| [14] 2017 JSSC   | DLDO              | 28 nm     | 1.10         | 0.90          | 200                | 6.6                | N/A                 | 23.5           | 4000                      | 180                                | 120                                 | 110                  | N/A               | 0.021                  | 9.57                | 1.019                |
| [16] 2016 ISSCC  | Async. DLDO       | 65 nm     | 0.5-1        | 0.45-0.95     | 3.5                | N/A                | —                   | 0.4            | N/A                       | 0.4                                | 40                                  | 12.5-216             | N/A               | 0.029                  | N/A                 | N/A                  |
| [24] 2017 ISSCC  | DLDO              | 65 nm     | 0.5-1        | 0.3-0.45      | 2.0                | 7                  | 240                 | 0.4            | 1                         | 1.06                               | 40                                  | 14                   | 5.6               | 0.0023                 | 199                 | 0.002                |
| This work        | Async. DLDO       | 65 nm     | 0.6-1        | 0.55-0.95     | 500                | 9.5                | —                   | 1.5            | 2 <sup>(1)</sup>          | 100                                | 50                                  | 300                  | 0.25              | 0.158                  | 2.3                 | 0.001                |

Note: Here “—” stands for “not needed”, “N/A” stands for “data not available”.

(1) Simulated data

(2) Estimated from figure or content

(3) “Freeze mode” [9] (when DLDO is not enabled)

(4) $FOM_T = (C_{OUT} \times \Delta V_{OUT} \times I_Q) / I_{MAX}^2$ or $T_R \times I_Q / \Delta I_{MAX}$ [1] ($T_R$ given, e.g., [5], [10], [12])

(5) $FOM_V = K (\Delta V_{OUT} \times I_Q) / \Delta I_{MAX}$ [2]

Fig. 21. Measured reference voltage tracking. (a) $V_{IN}$: 0.65 V, $I_{OUT}$: 20 mA, and $V_{REF}$: 0.5–0.6 V at 10-ns edge time. (b) $V_{IN}$: 0.95 V, $I_{OUT}$: 50 mA, and $V_{REF}$: 0.8–0.9 V at 10-ns edge time.

The nominal input voltage is 0.6–1.0 V, and the dropout voltage is 50–200 mV. With the largest current capability of 500 mA, the quiescent current $I_Q$ is only around 300 $\mu A$ at normal operation. Fig. 18(a) shows the measured dc characteristics of the output voltage versus load current for a targeted output voltage of 0.5–0.9 V at 50- and 200-mV dropout voltage. The load regulation for a 100-mA load range at 50-mV dropout voltage is below 0.25 mV/mA. Fig. 18(b) shows the measured output voltage versus input voltage at 100-mA load current, and the line regulation is measured to be around 40 mV/V.

Fig. 22. Measured PSR at 10-mA $I_{OUT}$ and 50-mV dropout condition.

Fig. 18(c) shows the current efficiency versus output voltage at different dropout scenarios at a light-load condition of 10 mA, and they are all above 96.5%. Fig. 18(d) shows the current efficiency versus load current at 50-mV dropout. For the same dropout voltage, the current efficiency at a higher  $V_{OUT}$  is slightly higher and extends to a larger load current.
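The reported light-load figure can be cross-checked with the usual definition of current efficiency, $I_{Load} / (I_{Load} + I_Q)$. The sketch below is only a consistency check: it assumes a fixed $I_Q$ of 300 $\mu$A (the value quoted for normal operation), whereas the actual quiescent current may vary with operating conditions, and the function name is hypothetical.

```python
def current_efficiency(i_load_a: float, i_q_a: float) -> float:
    """Current efficiency = I_load / (I_load + I_Q), as a fraction."""
    return i_load_a / (i_load_a + i_q_a)


if __name__ == "__main__":
    i_q = 300e-6  # assumed constant quiescent current in amperes
    for i_load in (10e-3, 100e-3, 500e-3):
        eff = current_efficiency(i_load, i_q)
        print(f"I_load = {i_load * 1e3:5.0f} mA -> {eff * 100:.2f} %")
```

At 10 mA this gives about 97.1%, consistent with the measured values above 96.5%, and the efficiency approaches 100% as the load current grows.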

The measured load transient response is given in Figs. 19 and 20. An on-chip switched resistive load with around 2-ns edge time (post-layout simulated) is used.

A 1.5-nF on-chip output capacitor is connected to emulate the on-chip loads and power line parasitics. Fig. 19(a) and (b) shows the measured transient response to a 100-mA current step at 2-ns edge time for 0.6- and 0.8-V $V_{OUT}$, respectively. Fig. 20(a) and (b) shows the load transient response for a 100- and 400-mA current step at 1.0-V $V_{OUT}$. The proposed DLDO achieves a nanosecond-scale load transient response; e.g., for a 100-mA current step, it recovers in 10 ns and settles in 40 ns.

The measured reference tracking is given in Fig. 21(a) and (b). A 100-mV $V_{REF}$ step at 10-ns edge time is used in the measurement. The proposed DLDO is able to uptrack such a reference voltage step in 15–16.7 ns, and downtrack in 11.1–25 ns, at various conditions. The power supply rejection (PSR), measured by injecting a 100-mV input ripple and measuring the output ripple, is plotted from 100 kHz up to the tester limit of over 10 MHz, as shown in Fig. 22. Given that $V_{IN}$ is 50 mV higher than $V_{OUT}$, the PSR obtained is $-21.2$, $-16.3$, and $-9.6$ dB at 1 MHz at $V_{OUT}$ of 1.0, 0.8, and 0.6 V, respectively. The PSR degrades as $V_{IN}$ decreases because of the smaller gain and the lower equivalent clock frequency, as similarly predicted by a model in [23].
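To put the PSR numbers in voltage terms, a value in decibels can be converted back to an output ripple using $V_{out,ripple} = V_{in,ripple} \times 10^{PSR/20}$. The sketch below applies this standard conversion to the three reported 1-MHz values with the 100-mV injected ripple; the function name is hypothetical, and the ripple is treated simply as an amplitude ratio.

```python
def output_ripple_mv(input_ripple_mv: float, psr_db: float) -> float:
    """Output ripple implied by a PSR in dB (negative values mean
    attenuation): V_out = V_in * 10^(PSR / 20)."""
    return input_ripple_mv * 10 ** (psr_db / 20)


if __name__ == "__main__":
    # Reported PSR at 1 MHz for V_OUT = 1.0, 0.8, and 0.6 V.
    for v_out, psr_db in ((1.0, -21.2), (0.8, -16.3), (0.6, -9.6)):
        ripple = output_ripple_mv(100.0, psr_db)
        print(f"V_OUT = {v_out:.1f} V: {psr_db:6.1f} dB -> {ripple:5.1f} mV")
```

The 100-mV input ripple is thus attenuated to roughly 8.7, 15.3, and 33.1 mV, respectively, which makes the degradation at low supply voltage easier to see.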

This work is compared with the state-of-the-art ALDOs and DLDOs in Table II: an ALDO with benchmark performance [8], a 1/2-$V_{DD}$ generator [3], and other published DLDO works [4], [5], [9]–[14], [16], [18], [24]. Among the DLDOs, this work offers a voltage-regulation resolution finer than nine binary bits, which is higher than most prior art. In terms of figures of merit (FOMs), this work achieves a fast $FOM_T$ [1] and the smallest $FOM_V$ [2] among all prior art in Table II. The input voltage can be as low as 0.6 V while the DLDO still operates at high speed and low power consumption, which is another advantage over its fastest analog counterparts and over previously published DLDOs that do not target low-voltage operation.
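The $FOM_T$ entries of Table II can be reproduced from the tabulated quantities using note (4), $FOM_T = (C_{OUT} \times \Delta V_{OUT} \times I_Q) / I_{MAX}^2$. The sketch below does this for four rows. It assumes that the current-step column $\Delta I_{MAX}$ is the value used as $I_{MAX}$, which matches the tabulated results; the function name and the dictionary layout are hypothetical.

```python
def fom_t_ps(c_out_f: float, delta_v_v: float, i_q_a: float, i_max_a: float) -> float:
    """FOM_T = C_OUT * dV_OUT * I_Q / I_MAX^2, returned in picoseconds."""
    return c_out_f * delta_v_v * i_q_a / i_max_a**2 * 1e12


# (C_OUT [F], dV_OUT [V], I_Q [A], current step used as I_MAX [A]), from Table II.
ROWS = {
    "This work": (1.5e-9, 50e-3, 300e-6, 100e-3),
    "[18]": (2e-9, 250e-3, 350e-6, 500e-3),
    "[13]": (1e-9, 55e-3, 82e-6, 100e-3),
    "[11]": (1e-9, 53e-3, 32e-6, 80e-3),
}

if __name__ == "__main__":
    for name, params in ROWS.items():
        print(f"{name:10s} FOM_T = {fom_t_ps(*params):.2f} ps")
```

For this work the expression evaluates to 2.25 ps, which rounds to the 2.3 ps listed in Table II; the other three rows give about 0.70, 0.45, and 0.27 ps, again in line with the table.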

## V. CONCLUSION

A 65-nm capacitorless asynchronous DLDO with a nanosecond-range transient response over hundreds of milliamperes and $\sim 9.5$-bit resolution is proposed. It has an input range of 0.6–1.0 V, and the nominal dropout voltage is as low as 50 mV. By targeting a balance between high speed and fine resolution, enabled by a row–column-bit 3-D power stage and adaptive pipeline control, this DLDO demonstrates state-of-the-art FOMs. Key design considerations of the power stage that specifically target a balanced performance are presented. The proposed analysis framework can intuitively assist the design, and the simulated evaluation of the digital controller provides insights into further development for other fully on-chip applications.

## ACKNOWLEDGMENT

The authors would like to thank Y.-K. Teh, L. Cheng, Y. Gao, Y. Lu, and M. P. Chan for fruitful discussions, and S. F. Luk for technical support.

## REFERENCES

- [1] P. Hazucha, T. Karnik, B. A. Bloechel, C. Parsons, D. Finan, and S. Borkar, “Area-efficient linear regulator with ultra-fast load regulation,” *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 933–940, Apr. 2005.
- [2] J. Guo and K. N. Leung, “A 6- $\mu$ W chip-area-efficient output-capacitorless LDO in 90-nm CMOS technology,” *IEEE J. Solid-State Circuits*, vol. 45, no. 9, pp. 1896–1905, Sep. 2010.
- [3] P. Hazucha *et al.*, “High voltage tolerant linear regulator with fast digital control for biasing of integrated DC-DC converters,” *IEEE J. Solid-State Circuits*, vol. 42, no. 1, pp. 66–73, Jan. 2007.
- [4] Y. Okuma *et al.*, “0.5-V input digital LDO with 98.7% current efficiency and 2.7- $\mu$ A quiescent current in 65 nm CMOS,” in *Proc. IEEE Custom Integr. Circuits Conf.*, San Jose, CA, USA, Sep. 2010, pp. 1–4.
- [5] S. Gangopadhyay, D. Somasekhar, J. W. Tschanz, and A. Raychowdhury, “A 32 nm embedded, fully-digital, phase-locked low dropout regulator for fine grained power management in digital circuits,” *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2684–2693, Nov. 2014.
- [6] S. T. Kim *et al.*, “Enabling wide autonomous DVFS in a 22 nm graphics execution core using a digitally controlled hybrid LDO/switched-capacitor VR with fast droop mitigation,” in *IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2015, pp. 154–155.
- [7] F. Yang and P. K. T. Mok, “Area-efficient capacitor-less LDR with enhanced transient response for SoC in 65-nm CMOS,” in *Proc. IEEE Int. Symp. Circuits Syst.*, Melbourne, VIC, Australia, Jun. 2014, pp. 2325–2328.
- [8] J. F. Bulzacchelli *et al.*, “Dual-loop system of distributed microregulators with high DC accuracy, load response time below 500 ps, and 85-mV dropout voltage,” *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 863–874, Apr. 2012.
- [9] Y. H. Lee *et al.*, “A low quiescent current asynchronous digital-LDO with PLL-modulated fast-DVS power management in 40 nm SoC for MIPS performance improvement,” *IEEE J. Solid-State Circuits*, vol. 48, no. 4, pp. 1018–1030, Apr. 2013.
- [10] M. Onouchi *et al.*, “A 1.39-V input fast-transient-response digital LDO composed of low-voltage MOS transistors in 40-nm CMOS process,” in *Proc. IEEE Asian Solid-State Circuit Conf.*, Jeju, South Korea, Nov. 2011, pp. 37–40.
- [11] T.-J. Oh and I.-C. Hwang, “A 110-nm CMOS 0.7-V input transient-enhanced digital low-dropout regulator with 99.98% current efficiency at 80-mA load,” *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 23, no. 7, pp. 1281–1286, Jul. 2015.
- [12] S. B. Nasir, S. Gangopadhyay, and A. Raychowdhury, “A 0.13  $\mu$ m fully digital low-dropout regulator with adaptive control and reduced dynamic stability for ultra-wide dynamic range,” in *IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2015, pp. 98–99.
- [13] M. Huang, Y. Lu, S.-W. Sin, S.-P. U, and R. P. Martins, “A fully integrated digital LDO with coarse–fine-tuning and burst-mode operation,” *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 7, pp. 683–687, Jul. 2016.
- [14] Y.-J. Lee *et al.*, “A 200-mA digital low drop-out regulator with coarse–fine dual loop in mobile application processor,” *IEEE J. Solid-State Circuits*, vol. 52, no. 1, pp. 64–76, Jan. 2017.
- [15] F. Yang, Y. Lu, and P. K. T. Mok, “A comparative analysis on binary and multiple-unary weighted power stage design for digital LDO,” in *Proc. IEEE Asia Pacific Conf. Circuits Syst.*, Jeju, South Korea, Oct. 2016, pp. 41–42.
- [16] D. Kim and M. Seok, “Fully integrated low-drop-out regulator based on event-driven PI control,” in *IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers*, San Francisco, CA, USA, Jan./Feb. 2016, pp. 148–149.
- [17] F. Yang and P. K. T. Mok, “A 0.6–1 V input capacitor-less asynchronous digital LDO with fast transient response achieving 9.5b over 500 mA loading range in 65-nm CMOS,” in *Proc. IEEE Eur. Solid-State Circuit Conf.*, Graz, Austria, Sep. 2015, pp. 180–183.
- [18] F. Yang and P. K. T. Mok, “Fast-transient asynchronous digital LDO with load regulation enhancement by soft multi-step switching and adaptive timing techniques in 65-nm CMOS,” in *Proc. IEEE Custom Integr. Circuits Conf.*, San Jose, CA, USA, Sep. 2015, pp. 1–4.
- [19] M. Abbas, Y. Furukawa, S. Komatsu, J. Y. Takahiro, and K. Asada, “Clocked comparator for high-speed applications in 65 nm technology,” in *Proc. IEEE Asian Solid-State Circuit Conf.*, Beijing, China, Nov. 2010, pp. 1–4.
- [20] H. K. Khalil, *Nonlinear Systems*, 3rd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2002.

- [21] S. B. Nasir and A. Raychowdhury, “On limit cycle oscillations in discrete-time digital linear regulators,” in *Proc. IEEE Appl. Power Electron. Conf. Expo.*, Charlotte, NC, USA, Mar. 2015, pp. 371–376.
- [22] M. Huang, Y. Lu, S.-W. Sin, S.-P. U, R. P. Martins, and W.-H. Ki, “Limit cycle oscillation reduction for digital low dropout regulators,” *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 9, pp. 903–907, Sep. 2016.
- [23] S. B. Nasir and A. Raychowdhury, “All-digital linear regulators with proactive and reactive gain-boosting for supply droop mitigation in digital load circuits,” in *Proc. IEEE Int. Symp. Circuits Syst.*, Montreal, QC, Canada, May 2016, pp. 205–208.
- [24] L. G. Salem, J. Warchall, and P. P. Mercier, “A 100 nA-to-2 mA successive-approximation digital LDO with PD compensation and sub-LSB duty control achieving a 15.1 ns response time at 0.5 V,” in *IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2017, pp. 340–341.



**Fan Yang** (S'11) received the B.Eng. degree in electronic and information engineering from Zhejiang University, Hangzhou, China, in 2011, and the Ph.D. degree in electronic and computer engineering from The Hong Kong University of Science and Technology, Hong Kong, in 2016.

He is currently with Qualcomm, Singapore. His current research interests include ultra-low-power and mixed-signal power management IC design, analog and digital linear regulators, and switched-inductor dc–dc converters.

Dr. Yang was a recipient of the Hong Kong Ph.D. Fellowship from 2011 to 2015 and the Best Student Paper Award in the 2011 IEEE Student Symposium on Electron Devices and Solid-State Circuits.



**Philip K. T. Mok** (S'86–M'95–SM'02–F'14) received the B.A.Sc., M.A.Sc., and Ph.D. degrees from the University of Toronto, Toronto, ON, Canada, in 1986, 1989, and 1995, respectively, all in electrical and computer engineering.

In 1995, he joined the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, where he is currently a Professor. His current research interests include semiconductor devices, processing technologies, and circuit designs for power electronics and telecommunications applications, with current emphasis on power management integrated circuits, low-voltage analogue integrated circuits, and RF integrated circuits design.

Dr. Mok was a member of the International Technical Program Committees of the IEEE International Solid-State Circuits Conference from 2005 to 2010 and from 2015 to 2016. He received the Henry G. Acres Medal and the W.S. Wilson Medal from the University of Toronto, and the Teaching Excellence Appreciation Award three times from The Hong Kong University of Science and Technology. He was also a co-recipient of the Best Student Paper Award twice in the 2002 and 2009 IEEE Custom Integrated Circuits Conference. He served as a Distinguished Lecturer of the IEEE Solid-State Circuits Society from 2009 to 2010. He served as an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS from 2006 to 2011, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I from 2007 to 2009, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II from 2005 to 2007 and from 2012 to 2015. He has been serving as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I since 2016.