

# A Fully Passive Compressive Sensing SAR ADC for Low-Power Wireless Sensors

Wenjuan Guo, *Member, IEEE*, Youngchun Kim, Ahmed H. Tewfik, *Fellow, IEEE*,  
and Nan Sun, *Senior Member, IEEE*

**Abstract**—The compressive sensing (CS) theory states that the sparsity of a signal can be exploited to reduce the analog-to-digital converter (ADC) conversion rate and save power. However, most previous CS frameworks require dedicated analog CS encoders built with power-hungry active amplifiers, which limit the overall power saving. In contrast, this paper proposes a fully passive switched-capacitor-based CS framework that directly embeds CS into a successive-approximation-register (SAR) ADC. The proposed CS-SAR ADC can operate in two modes: the Nyquist mode and the CS mode. In the CS mode, the CS-SAR ADC quantizes the input only once every four samples, reducing the conversion rate and the circuit power by four times compared to the Nyquist mode. A prototype chip is fabricated in a 0.13-$\mu$m CMOS process. At 0.8 V and 1 MS/s, the chip consumes 19.2 $\mu$W in the Nyquist mode and 5 $\mu$W in the CS mode. Discrete-tone signals are converted and reconstructed with a peak signal-to-noise plus distortion ratio (SNDR) of 61 dB and a maximum bandwidth occupancy of 8.2%. Speech signals are also used to demonstrate the capability of the chip to compressively sense real-world signals. Compared to prior CS works, it improves the post-reconstruction SNDR by 18 dB and the energy efficiency by 13 times.

**Index Terms**—Analog-to-information conversion (AIC), charge domain analog signal processing, compressive sensing (CS), SAR analog-to-digital converter (ADC), sparsity, switched-capacitor (SC) circuit.

## I. INTRODUCTION

WIRELESS sensors are becoming ubiquitous in modern society. Since 2011, the number of interconnected devices on the planet has overtaken the number of people [1]. The proliferation of sensing devices generates enormous amounts of data that need to be captured, processed, stored, and communicated in a highly efficient way. In the past, nearly all signal acquisition protocols have been dictated by the Nyquist theorem: the data acquisition rate must be at least twice the signal bandwidth. However, Nyquist-rate sampling is an inefficient way to capture a sparse signal, because the information rate of such a signal can be much lower than its bandwidth suggests. In fact, many natural signals, including audio [2], images [3], and biological signals [4], are sparse or compressible in some domain. Specifically, audio signals generated by resonant systems mainly consist of a small number of frequency components, allowing a sparse representation in the frequency domain. Biological signals are typically concentrated in time, allowing a sparse representation in either the time or the wavelet domain. To provide a more efficient data acquisition paradigm for these signals, a groundbreaking theory called compressive sensing (CS) was proposed by Candès *et al.* [5] and Donoho [6] in 2006. It states that sparse signals can be precisely recovered from far fewer measurements than Nyquist-rate sampling requires. This implies the potential to dramatically relax the speed, power, and memory requirements of a sparse signal acquisition system.

Manuscript received December 13, 2016; revised March 10, 2017; accepted April 7, 2017. Date of publication June 26, 2017; date of current version July 20, 2017. This paper was approved by Associate Editor Seonghwan Cho. (*Corresponding author: Wenjuan Guo*)

The authors are with the Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 USA (e-mail: wjguo@utexas.edu; youngchun@utexas.edu; tewfik@austin.utexas.edu; nansun@mail.utexas.edu).

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2017.2695573

Fig. 1. Sparse signal acquisition systems. (a) Nyquist-rate ADC with subsequent digital compression. (b) Feature extraction with subsequent low-rate ADC. (c) CS encoder with subsequent low-rate ADC. (d) Proposed CS-SAR ADC.

CS operates differently from conventional sparse signal acquisition techniques. As Fig. 1(a) shows, for general sparse-signal applications, data compression is conducted in the digital domain, which still requires a front-end analog-to-digital converter (ADC) to run at the Nyquist rate. For applications that do not need the entire signal information, such as neural spike detection [7], [8], feature extraction techniques can be used to reduce the ADC conversion rate so that only the extracted features are captured from the signal [Fig. 1(b)]. However, this approach is highly application specific, and the information loss strongly depends on prior knowledge of the signal. In contrast, by directly correlating the signal with a small set of random waveforms through a CS encoder, as shown in Fig. 1(c), CS is able to compress the signal into a small number of random linear measurements without information loss. Moreover, its compression process is non-adaptive and may not need any prior knowledge of the signal except that it is sparse [9]. Since the required number of measurements is proven to be proportional to the information rate of the signal, data conversion in this scenario is also called analog-to-information conversion (AIC) [10]. Given the tight power constraints at wireless sensor nodes (WSN), the compressed data can first be stored locally and then recovered after connecting the sensor to a back-end digital signal processor (DSP) where the power constraint is relaxed. The compressed data can also be wirelessly transmitted to the cloud for recovery.

With the maturity of the CS theory, a growing number of circuit researchers have worked on bringing it into practical use and implementing it in hardware [11]–[26]. Although prior works successfully reduce the conversion rate of the ADC, the implementation of the CS encoder, whose core operation is analog matrix multiplication, usually requires active operational transconductance amplifiers (OTAs) for continuous-time (CT) integration or low-pass filtering [11]–[21]. However, OTAs are intrinsically power hungry and do not scale well. Moreover, since the OTA-based integrators still need to process the full-bandwidth signal, they must have both wide bandwidth and low noise spectral density, which necessitates high power consumption. As a result, the overall power saving considering both the ADC and the CS encoder is limited. In addition, most prior works require the signal to go through multiple parallel paths, each of which consists of a CS encoder and an ADC, and thus occupy a large area [15]–[21]. Due to these limitations, Chen *et al.* [27] even argue that a Nyquist-rate ADC plus digital CS encoders is a more energy-efficient CS framework than multiple analog CS encoders plus low-rate ADCs, because power-hungry OTA-based analog integrators can be replaced by digital accumulators. However, this deviates from the original purpose of CS, which is to perform sparse data conversion below the Nyquist rate.

To fully harvest the benefits of CS in reducing the overall system power, this paper presents a novel power-efficient CS framework. Its key difference from prior works is that it performs the CS operation (i.e., analog multiplication and integration) not in CT but in the discrete-time (DT) domain, using fully passive switched-capacitor (SC) circuits without any OTA. Specifically, the multiplication with a pseudo-random binary sequence (PRBS) is realized by chopping the differential inputs. The integration is implemented in DT as charge summation, realized by connecting capacitors in parallel. Such an SC-based analog computation scheme neither needs an OTA nor consumes any static current, and thus is highly power efficient. In addition, both switches and capacitors scale well. The SC circuit linearity also improves with CMOS scaling due to reduced channel charges. Furthermore, as shown in Fig. 1(d), this work embeds the entire SC-based CS encoder inside a successive-approximation-register (SAR) ADC and reconfigures the SC network as the SAR DAC after the CS encoding operation. This reduces the hardware cost and also removes any interface circuit between the CS encoder and the ADC, which would consume additional power and cause SNR loss. The theoretical background behind the proposed CS-SAR ADC is the random demodulator (RD) architecture proposed by Kirolos *et al.* [11]. The advantage of RD over other CS schemes is that the signal only needs to go through one CS encoder, which saves area and power compared to other CS works that require multiple paths [15]–[21]. To the best of the authors' knowledge, this paper presents the first fully passive SC-based CS-SAR ADC using RD. A prototype in 130-nm CMOS demonstrates the validity of the proposed architecture [28]. Compared to prior CS implementations, it improves the post-reconstruction (PR) signal-to-noise plus distortion ratio (SNDR) and energy efficiency [i.e., figure-of-merit (FoM)] by 18 dB and 13 times, respectively.

This paper is organized as follows. Section II introduces the CS theory. Section III reviews existing CS frameworks and presents the proposed CS-SAR ADC architecture. Section IV shows the detailed circuit implementations. Section V shows the chip measurement results. Finally, Section VI concludes this paper.

## II. REVIEW OF COMPRESSIVE SENSING THEORY

Although CS involves various sub-disciplines within the applied mathematical sciences [9], this section briefly reviews its three key concepts: sparsity, which pertains to the signals of interest; incoherence, which pertains to the sensing modality; and reconstruction, which pertains to signal recovery. For simplicity of presentation, all signals hereafter are denoted as discrete-time vectors.

### A. Sparsity

Suppose we have an input vector  $\vec{s}$  that can be expanded over an  $N \times N$  orthonormal matrix,  $\Psi = [\vec{\psi}_1, \vec{\psi}_2, \dots, \vec{\psi}_N]$  as

$$\vec{s} = \Psi \vec{\alpha} = \sum_{n=1}^N \alpha_n \vec{\psi}_n \quad (1)$$

where $\vec{\alpha} \in R^N$ is the coefficient vector of $\vec{s}$ in the $\Psi$ domain. When $\vec{\alpha}$ contains only $K \ll N$ nonzero entries, $\vec{s}$ is defined as a $K$-sparse signal in the $\Psi$ domain. More generally, if we keep the $K$ largest-magnitude entries of $\vec{\alpha}$ and set the remaining $N - K$ entries to zero to form $\vec{\alpha}_K$, then $\vec{s}$ can be approximated by the $K$-sparse signal $\vec{s}_K = \Psi \vec{\alpha}_K$ whenever $\|\vec{s} - \vec{s}_K\|_2$ is negligible. The appropriate $\Psi$ depends on the target signal. For example, if the signal is sparse in the time, frequency, or wavelet domain, $\Psi$ can be the identity, inverse discrete Fourier transform (IDFT), or inverse discrete wavelet transform (IDWT) matrix, respectively.
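The $K$-term approximation can be illustrated numerically. The following NumPy sketch is only an illustration, not the authors' code: it assumes $\Psi$ is the orthonormal $N$-point IDFT matrix (so $\vec{\alpha} = \Psi^H \vec{s}$ is the orthonormal DFT of $\vec{s}$), uses a hypothetical two-tone test signal, keeps the $K$ largest-magnitude coefficients, and reports $\|\vec{s} - \vec{s}_K\|_2$.

```python
import numpy as np

def best_k_term(s, K):
    """Return (s_K, alpha_K), the best K-term approximation of s in an IDFT basis.

    Assumes Psi is the orthonormal N-point IDFT matrix, so alpha = Psi^H s is the
    orthonormal DFT of s and s_K = Psi @ alpha_K is an orthonormal inverse DFT.
    """
    alpha = np.fft.fft(s, norm="ortho")        # coefficients of s in the Psi domain
    alpha_K = np.zeros_like(alpha)
    keep = np.argsort(np.abs(alpha))[-K:]      # indices of the K largest magnitudes
    alpha_K[keep] = alpha[keep]                # the other N - K entries stay zero
    s_K = np.fft.ifft(alpha_K, norm="ortho")   # s_K = Psi @ alpha_K
    return s_K, alpha_K

# Hypothetical test signal: two on-bin tones -> exactly 4 nonzero DFT coefficients.
N = 64
n = np.arange(N)
s = np.cos(2 * np.pi * 5 * n / N) + 0.5 * np.cos(2 * np.pi * 12 * n / N)

for K in (2, 4):
    s_K, _ = best_k_term(s, K)
    # K = 2 drops the weaker tone; K = 4 is exact up to rounding error.
    print(K, np.linalg.norm(s - s_K))
```

Because each real tone occupies two conjugate-symmetric DFT bins, this two-tone signal is 4-sparse; keeping only two coefficients leaves the weaker tone as the approximation error.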

### B. Incoherence

To compress $\vec{s}$ into a small number of measurements $\vec{y} \in R^M$, an $M \times N$ sensing matrix $\Phi = [\vec{\phi}_1; \vec{\phi}_2; \dots; \vec{\phi}_M]$ is needed for the following projection:

$$\vec{y} = \Phi \vec{s} = \Phi \Psi \vec{\alpha}. \quad (2)$$

Since the locations of the key information in  $\vec{\alpha}$  are unknown, this projection should ensure that  $\vec{y}$  keeps all the information from  $\vec{\alpha}$ . To make it possible, the sensing matrix  $\Phi$  is required to have a low coherence with the sparse representation matrix  $\Psi$ . The definition of their coherence is as follows:

$$\mu(\Phi, \Psi) = \sqrt{N} \max_{m,n} |\langle \vec{\phi}_m, \vec{\psi}_n \rangle| \quad (3)$$

where  $1 \leq m \leq M$  and  $1 \leq n \leq N$ . As can be seen from (3), the coherence measures the largest correlation between any row vector of  $\Phi$  and any column vector of  $\Psi$ .

Intuitively speaking, the sensing matrix $\Phi$ projects $\vec{s}$ into a domain where $\vec{s}$ is not sparse, so as to spread out all the key information. If $\Phi$ is maximally incoherent with $\Psi$, all entries of $A = \Phi\Psi$ have the same amplitude, $1/\sqrt{N}$. Therefore, each measurement in $\vec{y}$ receives equal contributions from all entries of $\vec{\alpha}$. The key information in $\vec{\alpha}$ is fully spread out and none of it is missed. If $\vec{\phi}_m$ is correlated with $\vec{\psi}_n$, the amplitude of $A_{m,n}$ will be larger than that of the other entries. Therefore, the $m$th measurement $y_m$ will carry a larger weight from the $n$th entry $\alpha_n$, which may cause a misinterpretation if $\alpha_n$ is not the key information. Fortunately, selecting $\Phi$ is not difficult. It has been proven that random matrices with independent identically distributed (i.i.d.) entries, such as Gaussian or $\pm 1$ binary entries, exhibit, with high probability, a very low coherence with any fixed representation matrix such as the IDFT and IDWT [5], [6]. The i.i.d. $\pm 1$ binary matrix (Bernoulli matrix) has attracted the most attention from the research community, because PRBS generation and the corresponding projection are easy to implement in hardware.
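Definition (3) is easy to evaluate for small examples. The sketch below is illustrative: the sizes $N = 256$, $M = 64$ and the random seed are arbitrary choices, rows of $\Phi$ and columns of $\Psi$ are normalized to unit norm so that $1 \le \mu \le \sqrt{N}$, and $\Psi$ is the orthonormal IDFT. It contrasts a Bernoulli $\Phi$ with a deliberately bad choice whose rows coincide with Fourier atoms.

```python
import numpy as np

def coherence(Phi, Psi):
    """Coherence mu(Phi, Psi) of eq. (3).

    Rows of Phi and columns of Psi are scaled to unit l2 norm first, so that
    1 <= mu <= sqrt(N). Entry (m, n) of Phi_u @ Psi_u is entry (m, n) of A = Phi Psi.
    """
    N = Psi.shape[0]
    Phi_u = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)
    Psi_u = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)
    return np.sqrt(N) * np.max(np.abs(Phi_u @ Psi_u))

N, M = 256, 64
rng = np.random.default_rng(1)
Psi = np.fft.ifft(np.eye(N), norm="ortho")        # orthonormal IDFT matrix
Phi_bern = rng.choice([-1.0, 1.0], size=(M, N))   # i.i.d. +/-1 (Bernoulli) rows
Phi_dft = np.conj(Psi[:, :M]).T                   # rows matched to Fourier atoms

print("Bernoulli vs. IDFT:    ", coherence(Phi_bern, Psi))        # well below sqrt(N) = 16
print("Fourier rows vs. IDFT: ", coherence(Phi_dft, Psi))         # sqrt(N) = 16 (worst case)
print("Bernoulli vs. identity:", coherence(Phi_bern, np.eye(N)))  # exactly 1 (best case)
```

The matched Fourier rows give the worst case, in which each measurement sees only one entry of $\vec{\alpha}$; the Bernoulli rows stay far from that bound, which is the property the text relies on.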

### C. Reconstruction

After acquiring the measurements  $\vec{y}$ , the next step is to recover the input signal  $\vec{s}$  based on (2). Since  $M < N$ , (2) is an under-determined equation with many solutions. However, with the knowledge that  $\vec{\alpha}$  is sparse, there is a high probability that the sparsest solution is the correct solution. Once  $\vec{\alpha}$  is solved,  $\vec{s}$  can be recovered using (1). A common method to find a sparse solution to an under-determined system is the traditional  $l_1$  minimization method which can be summarized as a convex optimization (CVX) problem

$$\min_{\vec{\alpha} \in R^N} \|\vec{\alpha}\|_{l_1} \quad \text{s.t. } \vec{y} = \Phi\Psi\vec{\alpha}. \quad (4)$$

In a real implementation, $\vec{y}$ is the output of an ADC and contains quantization noise and thermal noise. For a noisy $\vec{y}$, (4) is modified into a problem with relaxed constraints

$$\min_{\vec{\alpha} \in R^N} \|\vec{\alpha}\|_{l_1} \quad \text{s.t. } \|\vec{y} - \Phi\Psi\vec{\alpha}\|_{l_2} \leq \epsilon. \quad (5)$$

Problem (5), also known as basis pursuit denoising, is closely related to the LASSO [29]. Provided that $\Phi$ is a random matrix with i.i.d. entries, accurate recovery via (5) with overwhelming probability requires $M \geq T \cdot K \log(N/K)$, where $T$ is a constant related to $\mu(\Phi, \Psi)$. Many researchers have reported $M \approx 4K$ as an empirical rule of thumb. Since $K$ represents the information rate of a sparse signal, this lower bound shows that in a CS framework the ADC conversion rate is determined by the information rate rather than the signal bandwidth. Reference [9] further shows that the solution $\vec{\alpha}^*$ to (5) obeys

$$\|\vec{\alpha}^* - \vec{\alpha}\|_{l_2} \leq \frac{T_0 \|\vec{\alpha} - \vec{\alpha}_K\|_{l_1}}{\sqrt{K}} + T_1 \epsilon \quad (6)$$

where $T_0$ and $T_1$ are constants depending on the application. Equation (6) shows that the reconstruction error consists of two parts: the first comes from the source itself (how far $\vec{\alpha}$ is from being exactly $K$-sparse), and the second comes from measurement errors.

Besides CVX approaches, greedy methods are another common class of algorithms for recovering sparse solutions. Greedy methods generally have a lower computational complexity than CVX at the cost of robustness and accuracy [30]. Since the implementation of the reconstruction block is beyond the scope of this paper, we implement it off-chip using MATLAB on a PC. A natural question is how complex the nonlinear reconstruction block is, and how much power and area it would need if implemented on chip. This is still an active research area, and the answer depends on many factors such as the choice of algorithm, the target performance, and the signal sparsity. Several research groups have shown that for a moderate PR performance, the reconstruction block can be implemented on chip with reasonable power consumption [31]–[33]. Besides, in some applications, a full reconstruction process can be circumvented by only extracting critical information from the compressed data [24]. More importantly, because the reconstruction block is fully digital, its power and area cost will keep shrinking with process scaling.

Fig. 2. State-of-the-art CS frameworks and proposed CS framework. (a) RD. (b) RMPI/MWC. (c) NUS. (d) Proposed CS framework.
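For readers who want to experiment with greedy recovery, the sketch below is a textbook orthogonal matching pursuit (OMP) in NumPy. It is not the MATLAB code used in this paper; the example problem (a dense Bernoulli $\Phi$, an IDFT $\Psi$, $N = 128$, $M = 64$, $K = 3$, noise-free measurements, seed 0) is a hypothetical setup chosen only to show usage.

```python
import numpy as np

def omp(A, y, K):
    """Orthogonal matching pursuit: greedy K-sparse solution of y ~= A @ alpha.

    Each iteration adds to the support the column of A most correlated with the
    residual, re-fits all selected coefficients by least squares, and updates
    the residual. A may be complex (e.g., A = Phi @ Psi with an IDFT Psi).
    """
    if K < 1:
        raise ValueError("K must be at least 1")
    col_norms = np.linalg.norm(A, axis=0)
    residual = np.asarray(y, dtype=complex)
    support = []
    for _ in range(K):
        corr = np.abs(A.conj().T @ residual) / col_norms
        corr[support] = 0.0                          # never select a column twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    alpha = np.zeros(A.shape[1], dtype=complex)
    alpha[support] = coef
    return alpha

# Hypothetical noise-free example: dense Bernoulli Phi, orthonormal IDFT Psi.
rng = np.random.default_rng(0)
N, M, K = 128, 64, 3
Psi = np.fft.ifft(np.eye(N), norm="ortho")
Phi = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)
A = Phi @ Psi
alpha = np.zeros(N, dtype=complex)
idx = rng.choice(N, size=K, replace=False)
alpha[idx] = (1 + rng.random(K)) * np.exp(2j * np.pi * rng.random(K))
y = A @ alpha

alpha_hat = omp(A, y, K)
print(np.linalg.norm(alpha_hat - alpha) / np.linalg.norm(alpha))
```

OMP needs $K$ as an input, which is one reason greedy methods are less robust than CVX approaches when the sparsity is not known precisely.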

In general, CS can be viewed as an asymmetric compression scheme with economical encoding but relatively expensive recovery. Thus, CS is well suited for WSN applications, where front-end sensors are highly constrained in both energy and computational resources. Once the sensor signals are acquired and digitized, they can be saved or transmitted to a powerful back-end base station (or the cloud) for digital signal processing without tight constraints on power and computational resources. The advantage of CS is that it automatically performs data compression at the front end, and thus its required data sensing and transmission rate is much lower than that of Nyquist-rate data acquisition.

## III. EXISTING AND PROPOSED COMPRESSIVE SENSING FRAMEWORKS

### A. State-of-the-Art CS Frameworks

State-of-the-art CS frameworks can be mainly categorized into four classes: RD, random-modulation pre-integrator (RMPI), modulated wideband converter (MWC), and non-uniform sampler (NUS). RD is the first proposed CS framework validated both in theory and in hardware [11]–[14]. As shown in Fig. 2(a), RD is composed of a mixer, a low-pass filter/integrator, and a low-rate ADC. The basic principle is to demodulate the signal by multiplying it with a Nyquist-rate PRBS, which spreads the signal tones across the entire spectrum. The demodulated signal then passes through a low-pass filter/integrator, and a low-rate ADC captures the signal information. A hardware prototype of RD is implemented in [13] using discrete components to build an analog Gilbert mixer and an active Gm-C-based integrator. To further reduce the ADC conversion rate and introduce more randomization in the sensing matrix $\Phi$, the RMPI architecture was proposed, which consists of a parallel bank of RDs driven by a common input [see Fig. 2(b)] [15]–[19]. However, compared to RD, RMPI not only consumes more area and power but also needs to address more issues such as synchronization among channels. MWC is another variant of RD, with an architecture similar to that of RMPI [20], [21]. Nevertheless, MWC is used for blind acquisition of multi-band signals, whereas RMPI deals with multi-tone signals. Therefore, signals are modeled and analyzed in rather different fashions. A more detailed comparison can be found in [34]. As shown in Fig. 2(c), by directly using a PRBS to control the sampling of an ADC, NUS avoids the analog multiplication and integration required in other CS frameworks [22]–[24]. This makes NUS a simple and power-efficient architecture for CS. However, NUS usually shows more limited performance than other architectures in terms of the maximum sparsity it can handle. In addition, since NUS directly discards signal information in the time domain, it cannot be applied to time-domain sparse signals.

Fig. 3. Circuit and timing diagram for the proposed 12-bit CS-SAR ADC.

### B. Proposed CS-SAR ADC Architecture

Since most natural signals have frequencies on the order of kilohertz, the main challenges of WSN for natural signals lie in area and power consumption rather than speed, so the parallel RMPI and MWC architectures are unnecessary. As an alternative to NUS, this paper proposes another simple and power-efficient architecture: a fully passive SC-based CS framework that directly embeds RD into an SAR ADC. As shown in Fig. 2(d), the proposed CS-SAR ADC operates in DT rather than CT, so that the CT integration is replaced by a DT summation. In the actual implementation, both the multiplication and the summation are incorporated into the SC sampling network of an SAR ADC. In other words, the CS-SAR ADC is a fully passive, power- and hardware-efficient realization of RD.

Fig. 3 shows the circuit and timing diagram of a 12-bit CS-SAR ADC architecture. Although a single-ended version is shown here, the real design is fully differential. There are three major differences between a CS-SAR ADC and a conventional SAR ADC. First, the input signal $\vec{s}$ is multiplied by a PRBS $\vec{p}$ to produce a randomized result $\vec{r}$ before being sampled. For a differential input signal, this multiplication is equivalent to changing the polarity of the signal, which can be easily implemented with four switches. Second, each time it is sampled, the randomized result $\vec{r}$ is sampled onto only 1/4 of the total capacitance ($C_{\text{tot}}$). Third, quantization does not happen after every sample, but only once every four samples. The four sampling cycles are denoted as $\phi_1$–$\phi_4$, and the quantization cycle is denoted as $\phi_5$ in Fig. 3. $\phi_{1e}$–$\phi_{4e}$ are slightly earlier versions of $\phi_1$–$\phi_4$ used for bottom-plate sampling. The power of an SAR ADC can be divided into the sampling power and the quantization power. The sampling power mainly comes from the input gain stage (if required) and the input buffer driving the capacitor array. Although most ADC works in the literature do not include this power in their calculations, it is not negligible in real applications. The CS-SAR cannot reduce the input gain stage power. However, since the CS-SAR samples $\vec{r}$ onto only 1/4 of $C_{\text{tot}}$ each time, it can reduce the input buffer power by four times. Although this approach also increases the $kT/C$ noise of each sample by four times during $\phi_1$–$\phi_4$, the four $kT/(C_{\text{tot}}/4)$ noise samples are averaged into a single $kT/C_{\text{tot}}$ noise during $\phi_5$, when the top plates of all the capacitors are connected back together (the variance of the average of four independent samples is one quarter of the variance of each). Therefore, the $kT/C$ noise stays the same as in a conventional SAR. Performing four times fewer quantizations also reduces the quantization power by four times. Section IV-B further shows that the power added by the mixer and the PRBS generation is negligible. Therefore, compared to a conventional SAR ADC, the proposed CS-SAR can reduce the total power by almost four times, excluding the input gain stage power. Compared to prior RD/RMPI architectures with the same compression ratio (CR) of 4, it achieves a similar ADC power saving but avoids most of the power burned by the analog CS encoder in front of the ADC.

Fig. 4. DAC array configuration of the proposed 12-bit CS-SAR ADC in (a) sampling cycles $\phi_1 - \phi_4$ and (b) quantization cycle $\phi_5$.
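The $kT/C$ argument above can be checked numerically. This sketch is illustrative only: it assumes $T = 300$ K and a hypothetical $C_{\text{tot}} = 1$ pF (values not given in this section), treats each sampling capacitor as exactly $C_{\text{tot}}/4$, and neglects $C_5$.

```python
import numpy as np

K_B = 1.380649e-23   # Boltzmann constant (J/K)
T = 300.0            # assumed temperature (K)
C_TOT = 1e-12        # hypothetical total DAC capacitance (F), for illustration only

# Each of the four samples sees only C_TOT / 4, so its kT/C noise is 4x larger.
var_one_sample = K_B * T / (C_TOT / 4)

# Charge sharing in phi_5: V = sum_i (C_i / C_TOT) * V_i with C_i = C_TOT / 4,
# i.e., a plain average of four independent noise samples.
var_after_sharing = 4 * (1 / 4) ** 2 * var_one_sample
print(var_after_sharing / (K_B * T / C_TOT))   # approximately 1.0: same as a conventional SAR

# Monte Carlo check of the same averaging.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, np.sqrt(var_one_sample), size=(200_000, 4))
v_avg = samples.mean(axis=1)
print(np.var(v_avg) / (K_B * T / C_TOT))       # close to 1.0
```

The result does not depend on the assumed temperature or capacitance: the 4x noise increase per sample is exactly undone by the 1/4 variance reduction of averaging four independent samples.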

To explain the operating mechanism, the CS-SAR capacitor array is divided into two segments: the MSB segment, which consists of $C_1$–$C_4$, and the LSB segment, which consists of $C_5$. To simplify the illustration, $C_4$ and $C_5$ each represent a group of smaller LSB capacitors. One redundant capacitor of $32C$ is included in $C_4$, which helps absorb the digital-to-analog converter (DAC) settling error and facilitates foreground calibration of capacitor mismatches [25]. The largest MSB capacitor, $512C$, is split into $C_1$ and $C_2$, so that $C_1$–$C_4$ are all equal to $256C$ and can be used for four sampling operations with the same weight. Fig. 4 shows the DAC array configuration in $\phi_1$–$\phi_4$ and $\phi_5$, respectively. After the multiplication of $\vec{s}$ and $\vec{p}$, the CS-SAR ADC operates as follows. The randomized result $\vec{r}$ is sampled onto $C_1$–$C_4$ consecutively from $\phi_1$ to $\phi_4$ in a bottom-plate sampling fashion. Once $\vec{r}$ is sampled, the value is held until $\phi_5$ goes high. Then all four sampled values are averaged on the fly and the asynchronous quantization starts. Note that $C_5$ samples 0 during $\phi_4$, which causes only a minor attenuation of the averaged result.

Referring back to the CS theory, for an $N$-length input vector $\vec{s}$, the CS-SAR ADC outputs only an $M$-length measurement vector $\vec{y}$, where $M = N/4$. The relationship between $\vec{s}$ and $\vec{y}$ forms an under-determined equation, shown here for $N = 8$ as an example:

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} g_1 p_1 & g_2 p_2 & g_3 p_3 & g_4 p_4 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & g_1 p_5 & g_2 p_6 & g_3 p_7 & g_4 p_8 \end{bmatrix} \times [s_1 \ s_2 \ s_3 \ s_4 \ s_5 \ s_6 \ s_7 \ s_8]^T \quad (7)$$

where $g_i = C_i/(C_1 + C_2 + C_3 + C_4 + C_5)$, $i = 1, 2, 3, 4$, is the weight applied to each sample. Without capacitor mismatch, all $\{g_i\}$ are equal; otherwise, they can be calibrated along with the capacitor mismatch calibration. Note that $g_i$ ensures that the input signal $\vec{s}$ can swing rail-to-rail without $\vec{y}$ exceeding the ADC range. However, after multiplication with the PRBS, the signals sampled on different $C_i$ become uncorrelated with each other. With all $\{g_i\} = 1/4$, the power of the averaged result is therefore $4 \times (1/4)^2 = 1/4$ of the input signal power; that is, the signal power is reduced by 6 dB after averaging. According to (6), the reconstruction error consists of two terms: the source itself and the measurement errors. If the source error is dominant, the decrease in input signal power will not degrade the reconstruction performance, because any source error is also scaled by $g_i$ along with the input signal. The measurement errors mainly come from the ADC noise. If the measurement errors dominate, we can expect an SNR degradation of up to 6 dB compared to Nyquist sampling, assuming the constant $T_1 = 1$ in (6). However, when the signal is very sparse, $T_1$ can be smaller than 1, because the reconstruction process tends to pick out the sparse signal components and suppress the noise.
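The measurement model (7) and the 6-dB signal power reduction can both be reproduced with a few lines of NumPy. In this sketch, which is not taken from the paper, the weights default to the ideal $g_i = 1/4$ (matched capacitors, $C_5$ neglected), the $N = 8$ PRBS chips are arbitrary, and i.i.d. random signs stand in for the M-sequence in the power check.

```python
import numpy as np

G_IDEAL = (0.25, 0.25, 0.25, 0.25)   # g_i with matched capacitors and C_5 neglected

def cs_sar_matrix(p, g=G_IDEAL):
    """Sensing matrix Phi of eq. (7): row m holds g_i * p_n for the m-th group of 4."""
    p = np.asarray(p, dtype=float)
    N = p.size
    if N % 4:
        raise ValueError("N must be a multiple of 4")
    Phi = np.zeros((N // 4, N))
    for m in range(N // 4):
        Phi[m, 4 * m:4 * m + 4] = np.asarray(g) * p[4 * m:4 * m + 4]
    return Phi

def cs_sar_measure(s, p, g=G_IDEAL):
    """Behavioral equivalent: chop (r = p * s), sample 4 values, share charge."""
    r = np.asarray(p, dtype=float) * np.asarray(s, dtype=float)   # mixer output
    return r.reshape(-1, 4) @ np.asarray(g, dtype=float)          # weighted sum in phi_5

# N = 8 example mirroring eq. (7); the chips below are arbitrary.
p8 = np.array([1, -1, -1, 1, 1, 1, -1, 1])
s8 = np.arange(1.0, 9.0)
Phi8 = cs_sar_matrix(p8)
print(Phi8)
print(Phi8 @ s8, cs_sar_measure(s8, p8))

# Power of y relative to s: about -6 dB when the chopped samples are uncorrelated.
rng = np.random.default_rng(0)
N = 1 << 18
s = np.sin(2 * np.pi * 0.1234 * np.arange(N))   # hypothetical test tone
p = rng.choice([-1.0, 1.0], size=N)             # i.i.d. signs as a PRBS stand-in
y = cs_sar_measure(s, p)
ratio_db = 10 * np.log10(np.mean(y ** 2) / np.mean(s ** 2))
print(f"{ratio_db:.2f} dB")
```

The explicit matrix is convenient for reconstruction, while the reshaped sum mirrors the hardware sequence (chop, sample four times, share charge) and avoids building an $M \times N$ matrix for long records.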

Looking at (7), each signal sample $s_n$ contributes to only one measurement $y_m$. Since the signal information is not spread out in the time domain, the proposed CS-SAR architecture in Fig. 3 cannot be directly applied to time-domain sparse signals, a limitation it shares with NUS. All the simulations and measurements in this paper are presented with frequency-domain sparse signals, and the sparse representation matrix $\Psi$ is an $N$-point IDFT matrix. Nevertheless, the CS-SAR can be extended to a multi-channel architecture, just as RD is extended to RMPI. The CS-SAR itself can also be modified for other CRs by splitting the capacitor array accordingly. A multi-channel CS-SAR can spread out the signal information in the time domain, and thus can be applied to time-domain sparse signals.

### C. MATLAB Simulation

To compare the theoretical performance of the proposed CS-SAR ADC with prior architectures, we model the proposed CS-SAR ADC and prior architectures in MATLAB, assuming $N = 1024$ and $M = 256$. The input signals are discrete-tone signals consisting of multiple sinusoids at different frequencies. Since the MWC architecture targets multi-band signals, we do not compare with it here. We model two scenarios: 1) with 11-b effective number of bits (ENOB) ADC noise and 2) with 11-b ENOB ADC noise and 50-dB input SNR. We also use one CVX approach, SL0 [35], and one greedy method, orthogonal matching pursuit (OMP) [36], for signal recovery. The PR SNDR results for the two scenarios are shown in Figs. 5 and 6, respectively. Note that if the sparsity of a discrete-tone signal is $K$, it contains only $K/2$ different frequencies due to the conjugate symmetry of the DFT of a real signal. For each $K$ value, $K/2$ frequencies are randomly generated 20 times and the PR SNDR is averaged. To better control the signal power, each frequency tone has the same amplitude while its phase is random. Comparing Figs. 5(a) and 6(a) with Figs. 5(b) and 6(b), SL0 always achieves a 3–4 dB higher peak SNDR than OMP, which confirms the accuracy and robustness of CVX approaches. Comparing Fig. 5 with Fig. 6 verifies the SNR analysis in Section III-B. When the measurement error (ADC noise) is dominant (Fig. 5), the proposed CS-SAR ADC shows an SNDR degradation compared to RMPI and NUS. In Fig. 5(a), its peak SNDR is 65 dB, which is 3 dB lower than that of Nyquist sampling with an 11-b ENOB ADC (about 68 dB). However, when the source error (input noise) is dominant, there is no SNDR degradation for the CS-SAR. In Fig. 6(a), its peak SNDR is 53 dB, which is 3 dB higher than the input SNR. In other words, the disadvantage of the CS-SAR can be mitigated by using an ADC whose resolution is higher than the input SNR, which is the usual case in most natural signal acquisition systems. Besides, taking SNDR = 50 dB in scenario 1) and SNDR = 40 dB in scenario 2) as boundaries, the CS-SAR always outperforms NUS in terms of the maximum sparsity ($K_{\max} \geq 88$). Defining the signal bandwidth occupancy as $K/N$, $K_{\max} = 88$ translates to a maximum signal bandwidth occupancy of $88/1024 \approx 8.6\%$.

Fig. 5. MATLAB simulated average PR SNDR with 11-b ENOB ADC noise. (a) Convex optimization SL0. (b) Greedy method OMP.

Fig. 6. MATLAB simulated average PR SNDR with 11-b ENOB ADC noise and 50-dB input SNR. (a) Convex optimization SL0. (b) Greedy method OMP.
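The test-signal setup described above can be sketched as follows. This is a hypothetical re-creation, not the authors' MATLAB model: it assumes the tones lie exactly on DFT bins 1 to $N/2-1$ (so the signal is exactly $K$-sparse in the IDFT domain), scales the sum so the peak stays within $\pm 1$, and defines PR SNDR as signal power over total error power.

```python
import numpy as np

def discrete_tone_signal(N, K, rng):
    """K-sparse test signal: K/2 equal-amplitude, random-phase, on-bin tones.

    Tone bins are drawn from 1..N/2-1 (DC and Nyquist excluded), so each real tone
    occupies two conjugate-symmetric DFT bins and the signal is exactly K-sparse.
    """
    if K % 2:
        raise ValueError("K must be even for a real discrete-tone signal")
    n = np.arange(N)
    bins = rng.choice(np.arange(1, N // 2), size=K // 2, replace=False)
    phases = rng.uniform(0, 2 * np.pi, size=K // 2)
    s = sum(np.cos(2 * np.pi * b * n / N + ph) for b, ph in zip(bins, phases))
    return s / (K // 2), np.sort(bins)        # scaling keeps |s| <= 1

def sndr_db(s_ref, s_hat):
    """PR SNDR: reference signal power over total error power, in dB."""
    err = np.asarray(s_hat) - np.asarray(s_ref)
    return 10 * np.log10(np.sum(np.abs(s_ref) ** 2) / np.sum(np.abs(err) ** 2))

rng = np.random.default_rng(0)
N, K = 1024, 88
s, bins = discrete_tone_signal(N, K, rng)
spectrum = np.abs(np.fft.fft(s))
n_nonzero = int(np.sum(spectrum > 1e-6 * spectrum.max()))
print(n_nonzero, f"occupancy K/N = {100 * K / N:.2f}%")   # 88 bins, 8.59%
print(sndr_db(s, 1.1 * s))                                # 10% error -> 20 dB
```

Averaging `sndr_db` over 20 random draws per $K$, with a sensing matrix and a recovery routine plugged in between, would mirror the simulation procedure described in the text.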

## IV. CIRCUIT IMPLEMENTATION AND ANALYSIS

The CS-SAR ADC consists of the clock generator, the mixer, the DAC array, the comparator, and the asynchronous SAR logic. This section describes the detailed circuit design of each block.



Fig. 7. Clock generator circuit diagram.

### A. Clock Generator

As shown in Fig. 7, the master $clk$ is gated to generate $\phi_1$–$\phi_4$. A shift register is initialized to [1, 0, 0, 0] by a $reset$ signal to start the phase counting. The shift register is triggered by the falling edge of $clk$, so that the entire high phase of $clk$ can pass through the transmission gates controlled by the shift register outputs $Q_1$–$Q_4$. In this way, all sampling edges of $\phi_1$–$\phi_4$ are aligned with the falling edge of $clk$, minimizing the sampling skew. With a *mode* signal switching the shift register input to $V_{dd}$, the CS-SAR ADC can be reconfigured to the Nyquist mode for non-sparse signals. When all flip-flops are loaded with 1, the phase counting stops and $\phi_1$–$\phi_4$ become the same as $clk$. Foreground calibration of capacitor mismatches can also be conducted in this mode.

Fig. 8. Passive mixer. (a) Direct implementation. (b) Improved implementation.

Fig. 9. Power spectrum of a PRBS in (a) CT domain and (b) DT domain.
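The phase-generation logic can be modeled behaviorally at the cycle level. In this sketch the gating $\phi_i = clk \wedge Q_i$ and the shift on the falling edge follow the text, while the CS-mode feedback (register input taken from $Q_4$, forming a ring counter) is an assumption, since the text only specifies the Nyquist-mode input ($V_{dd}$).

```python
def clock_phases(n_cycles, nyquist_mode=False):
    """Cycle-level model of the phase generator: which of phi_1..phi_4 follow clk.

    A 4-bit shift register is reset to [1, 0, 0, 0]. During each high phase of clk,
    phi_i = clk AND Q_i; the register then shifts on the falling edge of clk.
    Assumption: in CS mode the register input is fed back from Q_4 (ring counter);
    in Nyquist mode the input is tied to Vdd (logic 1), as stated in the text.
    """
    q = [1, 0, 0, 0]                                  # state after reset
    active = []
    for _ in range(n_cycles):
        active.append([i + 1 for i, bit in enumerate(q) if bit])   # clk high
        d = 1 if nyquist_mode else q[3]               # shift-register input
        q = [d] + q[:3]                               # falling edge: shift by one
    return active

print(clock_phases(6))                      # CS mode: phi_1, phi_2, phi_3, phi_4, phi_1, ...
print(clock_phases(5, nyquist_mode=True))   # fills with 1s, then all phases follow clk
```

In the Nyquist-mode run, the register fills with ones within three falling edges, after which every $\phi_i$ equals $clk$ and the phase counting effectively stops.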

### B. Mixer

The mixer multiplies the signal by a PRBS, which amounts to changing the polarity of the signal. Fig. 8(a) shows a direct passive implementation of the mixer using four switches. The problem with Fig. 8(a) is that the mixer lies in the signal path to the whole DAC array, so large switches are required to pass the signal with high linearity. Fig. 8(b) proposes a better solution that combines the mixer switches with the sampling switches $\phi_1$–$\phi_4$, so that no extra switches are added to the signal path to affect the linearity. Besides, the PRBS generation circuit does not directly drive large sampling switches, but only drives two AND gates for each channel, which also saves power.

Although previous CS works [15], [16] also use four-switch passive mixers, they operate in the CT domain. The power spectrum of a PRBS in CT follows a sinc-squared shape, as plotted in Fig. 9(a). As can be seen, the spectrum rolls off at high frequencies, meaning that signal tones at higher frequencies contribute less power to the randomized signal. Consequently, the reconstruction performance becomes worse with increasing frequency, as demonstrated in the chip measurement results of [16]. By contrast, in the proposed CS-SAR ADC, the mixer operates in the DT domain, because the multiplied result is sampled before being averaged. If a PRBS has good correlation properties, its spectrum in the DT domain is white, as shown in Fig. 9(b). Therefore, there is no performance degradation for high-frequency signal tones in the proposed CS-SAR ADC architecture.

Fig. 10. Eleven-bit LFSR.
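The contrast between Fig. 9(a) and (b) can be reproduced with a short calculation. This sketch assumes an ideal random $\pm 1$ waveform with chip period $T_c$ for the CT case (PSD proportional to $\mathrm{sinc}^2(fT_c)$) and uses i.i.d. $\pm 1$ chips as a stand-in for a PRBS with good correlation properties in the DT case; the segment length and count are arbitrary.

```python
import numpy as np

# CT PRBS: a random +/-1 waveform with chip period T_c has a PSD proportional
# to sinc^2(f * T_c). np.sinc(x) is the normalized sinc, sin(pi x) / (pi x).
f_norm = np.array([0.0, 0.125, 0.25, 0.5])      # f * T_c; 0.5 = half the chip rate
ct_psd_db = 10 * np.log10(np.sinc(f_norm) ** 2)
for f, p_db in zip(f_norm, ct_psd_db):
    print(f"CT PRBS, f*Tc = {f:.3f}: {p_db:6.2f} dB")

# DT PRBS: averaged periodogram of i.i.d. +/-1 chips. Its expected value is flat
# at 0 dB, so every frequency bin is weighted equally.
rng = np.random.default_rng(0)
chips = rng.choice([-1.0, 1.0], size=(10_000, 64))
dt_psd = np.mean(np.abs(np.fft.fft(chips, axis=1)) ** 2, axis=0) / chips.shape[1]
dt_psd_db = 10 * np.log10(dt_psd)
print(f"DT PRBS: {dt_psd_db.min():.2f} dB to {dt_psd_db.max():.2f} dB across all bins")
```

At half the chip rate the CT spectrum is already about 3.9 dB down, whereas the DT periodogram stays within a small statistical spread of 0 dB in every bin.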

Fig. 11. One-bit DAC array for 3-bit resolution.

Fig. 12. Comparator architecture.

There are several classes of PRBSs that have been reported to have good correlation properties in communication studies, such as maximum-length, Gold, Kasami, and Hadamard sequences [30]. Maximum-length sequences (M-sequences) are the most widely used in previous CS works due to their simple generation. Although our design uses an external field-programmable gate array (FPGA) to feed a 1024-length M-sequence to the chip, the PRBS generation can be easily integrated on chip with negligible power and area. For example, Fig. 10 shows one possible implementation of the 1024-length M-sequence generator. It uses 11 flip-flops and one XOR gate to form a linear feedback shift register (LFSR). Since the all-zero state is a lock-up state for the LFSR, an $m$-bit LFSR can only produce a $(2^m - 1)$-length M-sequence, where $m$ is the number of flip-flops. Therefore, to get a 1024-length sequence, we use an 11-bit LFSR (full period $2^{11} - 1 = 2047$) and reset it to the initial state every 1024 clock cycles. At 0.8 V and 1 MS/s, the simulated power of the PRBS generator in a $0.13\text{-}\mu\text{m}$ process is 98 nW, which is negligible compared to the chip power [see Fig. 15(b)].
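The LFSR of Fig. 10 can be sketched in a few lines of Python. The tap positions are not given in the text, so both the seed and the recurrence $a_{k+11} = a_{k+2} \oplus a_k$ are assumptions. This recurrence corresponds to the primitive polynomial $x^{11} + x^2 + 1$ and needs a single XOR gate. Any primitive degree-11 polynomial gives a 2047-chip period, of which only the first 1024 chips are used before the reset.

```python
from collections import deque


def lfsr11_bits(n, seed=(1,) + (0,) * 10):
    """First n output bits of an 11-stage LFSR with one XOR in the feedback.

    Assumed recurrence a[k+11] = a[k+2] XOR a[k], i.e. the primitive polynomial
    x^11 + x^2 + 1, so any non-zero seed yields a period of 2**11 - 1 = 2047.
    """
    if not any(seed):
        raise ValueError("the all-zero state is a lock-up state")
    reg = deque(seed, maxlen=11)        # holds a[k] ... a[k+10]
    out = []
    for _ in range(n):
        out.append(reg[0])              # output the oldest stage
        reg.append(reg[2] ^ reg[0])     # XOR feedback; maxlen drops a[k]
    return out


def prbs_frame(length=1024):
    """+/-1 chips for one CS frame: the LFSR is reset every `length` cycles."""
    return [1 - 2 * b for b in lfsr11_bits(length)]


full = [1 - 2 * b for b in lfsr11_bits(2047)]
# Periodic autocorrelation of a +/-1 M-sequence: 2047 at lag 0, -1 elsewhere
print(sum(full[i] * full[(i + 5) % 2047] for i in range(2047)))   # -1
```

Truncating to 1024 chips and resetting means each frame sees the same sequence. This matches the fixed 1024-length PRBS that the reconstruction assumes.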



Fig. 13. Asynchronous SAR logic architecture.



Fig. 14. Die photograph of the fabricated CS-SAR ADC.

### C. DAC Array

By using the switching technique in [37], a 10-bit DAC array can produce a 12-bit output (see Fig. 3), which reduces the DAC switching power by four times for the same unit capacitance $C$. Fig. 11 shows an example of a 1-bit DAC array that gives a 3-bit output. As can be seen, both sides of the differential DAC array are initially charged to the sequence [gnd, $V_{\text{ref}}$]. During every comparison cycle, only one side is switched to $V_{\text{ref}}$ or gnd, which yields one more bit than the conventional SAR switching technique. The last unit capacitor uses $V_{\text{cm}}$ as an additional reference voltage to obtain the 3rd bit. The comparator input common-mode voltage finally converges to $V_{\text{cm}}$, which obviates the need for a special comparator.

Fig. 15. Power breakdown of the CS-SAR ADC chip at 0.8 V and 1 MS/s in (a) Nyquist mode and (b) CS mode.

Fig. 16. Measured static performance. (a) DNL. (b) INL.

### D. Comparator

The comparator uses a strong-arm latch architecture, as shown in Fig. 12. Since this architecture has no static bias current, its average power consumption is proportional to the conversion rate. Therefore, compared to the Nyquist mode, the comparator power is reduced by four times in the CS mode. Its simulated input-referred noise $\sigma$ is $258\ \mu\text{V}$ and its offset $\sigma$ is $4.4\text{ mV}$. Although the comparator offset does not affect the ADC linearity, it needs to be removed before reconstruction; otherwise, the reconstruction process turns it into large noise. This phenomenon can be explained as

$$\vec{y} = \Phi \vec{s} + \vec{V}_{\text{os}} = \Phi(\vec{s} + \Phi^+ \vec{V}_{\text{os}}) \quad (8)$$

where $\Phi^+$ is the pseudo-inverse of $\Phi$ (assuming $\Phi$ has full row rank, so that $\Phi\Phi^+ = I$), and $\vec{V}_{\text{os}}$ is the comparator offset. Since $\Phi$ is a random matrix, $\Phi^+$ is also a random matrix, and thus $\Phi^+ \vec{V}_{\text{os}}$ becomes large white noise added to $\vec{s}$. Fortunately, the multiplication of $\Phi$ and $\vec{s}$ inherently enables a digital background offset calibration. By taking the mean of the ADC output, the offset can be estimated as

$$V_{\text{os}} \approx \text{mean}(\vec{y}) = \text{mean}(\Phi \vec{s}) + V_{\text{os}}. \quad (9)$$

Fig. 17. Measured output spectra with a $-0.5 \text{ dBFS}$ 286.3-kHz input.

Once extracted, $V_{\text{os}}$ is subtracted from $\vec{y}$ before reconstruction. Note that the intrinsic random modulation of $\Phi \vec{s}$ decorrelates $\vec{y}$ from $\vec{s}$: $\Phi \vec{s}$ is zero-mean on average even if $\vec{s}$ itself contains a dc component. This allows a robust and accurate extraction of $V_{\text{os}}$.
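The offset mechanism in (8) and (9) can be reproduced numerically. The NumPy sketch below assumes a CR of 4, with each conversion averaging four PRBS-signed samples. It uses i.i.d. random ±1 chips as a stand-in for the M-sequence and an arbitrary two-tone test signal with a dc component. The 4.4-mV value is the simulated offset $\sigma$ quoted above, applied here as a fixed offset. The sketch checks two things. First, $\Phi^+\vec{V}_{\text{os}}$ is a sign-randomized error of magnitude $V_{\text{os}}$ on every sample. Second, the mean of $\vec{y}$ recovers $V_{\text{os}}$.

```python
import numpy as np

rng = np.random.default_rng(0)
CR = 4            # samples averaged per conversion (CS mode)
N = 1024          # samples per frame (PRBS length)
M = N // CR       # conversions per frame
vos = 4.4e-3      # comparator offset in volts (illustrative fixed value)

# Eq. (8): one frame's measurement matrix. Row i averages samples
# CR*i .. CR*i+CR-1 after multiplying them by random +/-1 chips
# (a stand-in for the M-sequence used on chip).
chips = rng.choice([-1.0, 1.0], size=N)
Phi = np.zeros((M, N))
for i in range(M):
    Phi[i, CR * i:CR * (i + 1)] = chips[CR * i:CR * (i + 1)] / CR

# Input-referred equivalent of the offset, Phi^+ V_os. The rows of Phi are
# orthogonal, so Phi^+ = CR * Phi^T and every entry is +/-V_os with a random
# sign: white error spread over all N samples.
e = np.linalg.pinv(Phi) @ np.full(M, vos)

# Eq. (9): background estimate from the mean of many conversions.
n_frames = 1000
n = np.arange(n_frames * N)
s = 0.3 * np.sin(2 * np.pi * 0.0537 * n) + 0.2 * np.cos(2 * np.pi * 0.211 * n) + 0.05
c = rng.choice([-1.0, 1.0], size=s.size)
y = (c * s).reshape(-1, CR).mean(axis=1) + vos   # quantizer input incl. offset
vos_hat = y.mean()        # random chips make mean(Phi s) ~ 0, despite the dc in s
y_cal = y - vos_hat       # offset-free data passed on to reconstruction
```

The estimation error shrinks roughly as one over the square root of the number of averaged conversions. A background calibration can therefore trade averaging time for accuracy.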

### E. Asynchronous SAR Logic

An $N$-bit synchronous SAR ADC relies on dividing a master clock into a signal tracking phase and $N$ conversion phases. Since a CS-SAR ADC also needs to divide the master clock into $\phi_1 - \phi_5$, a synchronous implementation becomes much more complex. Therefore, we implement the SAR logic asynchronously, as shown in Fig. 13. Once the comparator finishes making a decision, a *rdy* signal is raised to trigger a sequencer, which provides 13-phase clocks $sclk_1 - sclk_{13}$. In a classic asynchronous SAR ADC, the sequencer usually drives SR latches, switching logic, and temporary bit caches to store the internal comparison results [38]. In contrast, this design uses strong-arm latches to store them, which greatly reduces the logic complexity. When $sclk_i$ is low, both outputs of the $i$th latch are reset to high. When $sclk_i$ is high, the $i$th latch makes a decision based on the current differential comparator outputs. Once the decision is made, the $i$th-bit differential comparator result is stored in the $i$th latch until $sclk_i$ becomes low again. This operation exactly matches the switching scheme in Fig. 11. Therefore, the strong-arm latches can directly drive the differential DAC. The *rdy* signal also triggers a delay line to generate a *lat* signal that self-clocks the comparator.

Fig. 18. Measured SNDR/SFDR trends with (a) different input frequencies and (b) different input amplitudes.

Fig. 19. Twenty-run average PR SNDR versus the sparsity $K$ with different reconstruction algorithms.

## V. MEASUREMENT RESULTS

The proposed 12-bit CS-SAR ADC is fabricated in a $0.13\text{-}\mu\text{m}$ CMOS process, occupying a core area of $0.2\text{ mm}^2$. Fig. 14 shows its die photograph. The total DAC capacitance is $2.1\text{ pF} \times 2$ with a unit capacitor of $2\text{ fF}$. The DAC array is laid out in a segmented common-centroid way to minimize the capacitor mismatch caused by the parasitic capacitance of the routing wires [39]. The chip is tested in both the Nyquist mode and the CS mode. In the Nyquist mode, the PRBS is held at 1, so the performance of the SAR ADC itself is measured. In the CS mode, a 1024-length M-sequence is generated by an FPGA and fed into the chip after level shifting. Both discrete-tone signals and real audio signals are used to demonstrate the CS performance.

### A. Nyquist-Mode Measurement Results

At 0.8-V supply and 1 MS/s, the Nyquist-mode ADC consumes $19.2\ \mu\text{W}$, whose breakdown is shown in Fig. 15(a). The digital power comes from the clock generation and the SAR logic, the reference power comes from the DAC switching, and the rest is the comparator power. As can be seen, about 60% of the power is consumed by the digital portion, which can be greatly reduced with technology scaling. To characterize the static performance, Fig. 16 shows the measured differential non-linearity (DNL) and integral non-linearity (INL) without any calibration. The worst DNL and INL errors are 0.75 LSB and 1.6 LSB, respectively. To characterize the dynamic performance, Fig. 17 shows the output spectrum with a $-0.5\text{ dBFS}$ 286.3-kHz input. The SNDR is 65 dB and the spurious-free dynamic range (SFDR) is 77 dB. Fig. 18 shows the measured SNDR/SFDR trends with different input frequencies and amplitudes. Since the SNDR of the design is limited by noise, there is no need to apply foreground calibration to reduce the INL and DNL errors. Combining all the metrics, the CS-SAR ADC achieves a Walden FoM of $13.2\text{ fJ/conversion-step}$ in the Nyquist mode.
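The Walden FoM quoted above can be checked from the measured power, SNDR, and sampling rate. The minimal Python sketch below uses the standard conversions $\text{ENOB} = (\text{SNDR} - 1.76)/6.02$ and $\text{FoM} = P/(2^{\text{ENOB}} f_s)$.

```python
def enob(sndr_db):
    """Effective number of bits from SNDR (full-scale sine input)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, sndr_db, fs_hz):
    """Walden FoM in J/conversion-step: P / (2**ENOB * f_s)."""
    return power_w / (2 ** enob(sndr_db) * fs_hz)


fom_nyquist = walden_fom(19.2e-6, 65.0, 1e6)
print(f"{fom_nyquist * 1e15:.1f} fJ/conversion-step")   # 13.2 fJ/conversion-step
```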

### B. CS-Mode Measurement Results

Fig. 15(b) shows the CS-mode power breakdown at 0.8 V and 1 MS/s. Since the CS-SAR ADC quantizes only once every four samples, the effective conversion rate is 250 kS/s. As can be seen, the reference power and the comparator power scale down by exactly four times compared to the Nyquist-mode power in Fig. 15(a). The digital power scales down by slightly less than four times because the clock power does not scale.

Fig. 20. Time and frequency domain comparisons of the discrete-tone signals (in black) and the corresponding SL0-reconstructed signals (in red) with (a) $K/2 = 1$ and (b) $K/2 = 12$.

Fig. 21. Comparisons of the 1-s long speech signal $\vec{s}$ (in black) and the corresponding SL0-reconstructed signal $\vec{s}^*$ (in red) in (a) time domain and (b) frequency domain. The error signal (in blue) is $\vec{s} - \vec{s}^*$ in the time domain.

First, discrete-tone signals are tested. The input settings are the same as in the MATLAB simulation. The comparator offset is extracted according to (9) and subtracted from the CS-SAR output before reconstruction. The PR SNDR results reconstructed by SL0 and OMP are presented in Fig. 19. Taking 50-dB SNDR as the boundary, the maximum $K$ that can be recovered is 84 (8.2% bandwidth occupancy), which is very close to the MATLAB simulation results in Section III-C. As examples, Fig. 20 shows the time- and frequency-domain comparisons of the input signals and the SL0-reconstructed signals with $K/2 = 1$ and $K/2 = 12$. When $K/2 = 1$, the PR SNDR reaches its peak value of 61 dB with SL0 (see Fig. 19). Compared to the Nyquist mode, the peak SNDR is 4 dB lower. As explained in Section III-B, the reason is that measurement errors dominate during the test and the SNR is degraded by the gain coefficients $g_i$ in (7). The CS-SAR ADC achieves a peak Walden FoM of 5.5 fJ/conversion-step in the CS mode.
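The CS-mode figure uses the Peak FoM definition given under Table I. That definition normalizes by the Nyquist bandwidth ($2 \times \text{BW}$) rather than by the 250-kS/s effective conversion rate. The short Python sketch below reproduces the 5.5-fJ value from 5 μW, 61 dB, and BW = 500 kHz. With the table's rounded ENOB of 6.5 b, it also reproduces the 9.9-pJ entry for [19].

```python
def enob(sndr_db):
    """Effective number of bits from SNDR (full-scale sine input)."""
    return (sndr_db - 1.76) / 6.02


def peak_fom(power_w, bw_hz, sndr_db=None, enob_bits=None):
    """Table I definition: Power / (2**ENOB * 2 * BW), in J/conversion-step."""
    if enob_bits is None:
        enob_bits = enob(sndr_db)
    return power_w / (2 ** enob_bits * 2 * bw_hz)


fom_cs = peak_fom(5e-6, 500e3, sndr_db=61.0)       # this work, CS mode
fom_19 = peak_fom(1.8e-6, 1e3, enob_bits=6.5)      # [19], rounded ENOB from Table I
print(f"{fom_cs * 1e15:.1f} fJ, {fom_19 * 1e12:.1f} pJ")   # 5.5 fJ, 9.9 pJ
```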

Next, we show the capability of the CS-SAR ADC to compressively sense natural sparse signals. A 1-s long 16-kHz-sampled speech signal is taken as an example; its total length is 16 000 samples. Since the PRBS length is 1024, the speech signal is divided into multiple 1024-length frames, each of which is compressively sensed and reconstructed individually. Processing the whole signal at once would require a longer PRBS. Since the computational complexity of convex optimization methods is $O(N^3)$, where $N$ is the PRBS length, a longer PRBS would significantly increase the reconstruction complexity. To avoid severe reconstruction artifacts at the frame boundaries, all frames are 50% overlapped with each other and windowed to smooth the edges [40].

TABLE I  
COMPARISON WITH STATE-OF-THE-ART CS WORKS

| Design                              | [19]                | [23]                   | This work             |
|-------------------------------------|---------------------|------------------------|-----------------------|
| Architecture                        | RMPI                | NUS                    | RD                    |
| Need OTAs                           | Yes                 | No                     | No                    |
| On-chip PRBS generator              | Yes                 | No                     | No                    |
| CMOS technology                     | 0.13 $\mu$m         | 90 nm                  | 0.13 $\mu$m           |
| Supply                              | 0.9 V               | 0.9 V                  | 0.8 V                 |
| No. of channels                     | 64                  | 1                      | 1                     |
| Area                                | 6 mm<sup>2</sup>    | 0.15 mm<sup>2</sup>    | 0.2 mm<sup>2</sup>    |
| Bandwidth                           | 1 kHz               | 12.5 MHz               | 500 kHz               |
| Effective conversion rate           | 0.5 kS/s (CR = 4)   | 5 MS/s (CR = 5)        | 250 kS/s (CR = 4)     |
| Maximum occupancy                   | 5%                  | 4%                     | 8.2%                  |
| Resolution                          | 10 b                | 10 b                   | 12 b                  |
| Peak PR SNDR by CVX<br>(Peak ENOB)  | 40.6 dB<br>(6.5 b)  | 43 dB<br>(6.9 b)       | 61 dB<br>(9.8 b)      |
| Power                               | 1.8 $\mu$W          | 175 $\mu$W             | 5 $\mu$W              |
| Peak FoM (per conversion-step)      | 9.9 pJ              | 73.3 fJ                | 5.5 fJ                |

$\text{Peak FoM} = \text{Power} / (2^{\text{Peak ENOB}} \times 2 \times \text{BW})$

Although the original signal is sampled at 16 kHz, we treated it as if it were sampled at 1 MHz and fed it into the CS-SAR running at 250 kS/s, so that the same test bench as in the discrete-tone case could be used. The reconstruction results are shown in Fig. 21. As can be seen, the reconstructed signal matches the original signal well, and all major frequency peaks are recovered correctly. To quantify the fidelity between the input signal $\vec{s}$ and the reconstructed signal $\vec{s}^*$, we define the signal-to-reconstruction-error ratio (SRER) [30] as

$$SRER(\vec{s}, \vec{s}^*) = 20 \log_{10} \left( \frac{||\vec{s}||_2}{||\vec{s}^* - \vec{s}||_2} \right). \quad (10)$$

Based on (10), the SRER for the speech signal in Fig. 21 is 15.3 dB, which is mainly limited by the SNR of the input speech signal itself. As shown in Fig. 21(b), a real-world speech signal usually contains large wideband noise in its spectrum [41], [42]. Treating the largest 8.2% of the DFT bins as signal bins and the rest as noise bins, the calculated effective SNR of the input speech signal is only 17 dB, which limits the SRER.
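The speech evaluation steps above can be sketched in NumPy: 50%-overlapped, windowed 1024-sample frames; SRER per (10); and the effective-SNR estimate from the largest 8.2% of DFT bins. The periodic Hann window is an assumption, since the text only says the frames are windowed [40]. It was chosen because its copies shifted by half a frame sum to one. Overlap-adding unmodified frames therefore returns the input away from the two ends. The per-frame CS measurement and reconstruction are not modeled.

```python
import numpy as np

FRAME = 1024          # PRBS length = samples per CS frame
HOP = FRAME // 2      # 50% overlap


def periodic_hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly one."""
    k = np.arange(n)
    return 0.5 - 0.5 * np.cos(2 * np.pi * k / n)


def split_frames(x, frame=FRAME, hop=HOP):
    """Windowed, 50%-overlapped frames (zero-padded at the tail)."""
    n_frames = int(np.ceil(max(len(x) - frame, 0) / hop)) + 1
    padded = np.zeros((n_frames - 1) * hop + frame)
    padded[:len(x)] = x
    w = periodic_hann(frame)
    return np.stack([w * padded[i * hop:i * hop + frame] for i in range(n_frames)])


def overlap_add(frames, length, hop=HOP):
    """Overlap-add (reconstructed) frames back into one signal."""
    n_frames, frame = frames.shape
    out = np.zeros((n_frames - 1) * hop + frame)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + frame] += f
    return out[:length]


def srer_db(s, s_hat):
    """Eq. (10): 20*log10(||s|| / ||s_hat - s||)."""
    s, s_hat = np.asarray(s, float), np.asarray(s_hat, float)
    return 20 * np.log10(np.linalg.norm(s) / np.linalg.norm(s_hat - s))


def effective_snr_db(x, occupancy=0.082):
    """Largest `occupancy` fraction of DFT bins = signal, the rest = noise."""
    power = np.sort(np.abs(np.fft.fft(np.asarray(x, float))) ** 2)[::-1]
    n_sig = max(1, int(round(occupancy * power.size)))
    return 10 * np.log10(power[:n_sig].sum() / power[n_sig:].sum())
```

In a full pipeline, each row of `split_frames(x)` would be compressively sensed and reconstructed before `overlap_add`. With nothing in between, the interior samples come back unchanged, which is a quick sanity check of the framing. For a 16 000-sample signal, the framing yields 31 frames.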

### C. Chip Performance Comparison

Although many CS frameworks have been proposed in recent years, few of them have actually been implemented on chip. To the best of our knowledge, this paper presents the first fully passive RD implemented on chip. Table I summarizes its performance and compares it with the measured results of state-of-the-art CS chips. As can be seen, this paper achieves the highest ENOB and the highest maximum bandwidth occupancy with much better energy efficiency. Based on the RMPI architecture, [19] requires multichannel PRBS generation, analog multiplication, and active integration, leading to a much larger area and energy per conversion step than this paper. Compared to the NUS work in [23], our CS-SAR also achieves an 18-dB higher peak PR SNDR and 13 times better peak energy efficiency (FoM). In terms of the maximum bandwidth occupancy, referring to Figs. 5 and 6, the performance of our chip is close to the theoretical performance, while that of [19] is worse. We believe the main reason for the discrepancy is that the proposed CS-SAR is a much simpler and more linear architecture than RMPI, which ensures the accuracy and robustness of reconstruction. Consistent with Figs. 5 and 6, the NUS work in [23] shows a lower maximum occupancy than our work, especially when it uses a CR of 5 rather than 4. Note that the total power of our chip is dominated by the digital circuits (see Fig. 15). Since the prototype is implemented in a relatively old 130-nm process, significant power saving can be achieved by moving to a more advanced process. The proposed CS-SAR is well suited to such scaling because it does not require any OTA.

## VI. CONCLUSION

This paper has presented a fully passive CS-SAR ADC architecture for low-power wireless sensors. Compared to previous CS works, the proposed CS-SAR ADC does not require dedicated CS encoders and directly embeds random demodulation into an SAR ADC. The architecture needs only minor modifications to an SAR ADC, and it can be easily reconfigured between the Nyquist mode and the CS mode. In the CS mode, the CS-SAR ADC quantizes the average of every four samples, realizing a CR of four compared to the Nyquist mode. A 0.13-$\mu\text{m}$ CMOS prototype chip is fabricated to validate the effectiveness of the CS-SAR ADC architecture. Both discrete-tone signals and real-world speech signals are successfully converted and reconstructed. The chip consumes nearly four times less power in the CS mode than in the Nyquist mode (5 $\mu\text{W}$ versus 19.2 $\mu\text{W}$). The CS-SAR ADC architecture can be extended to other CRs by splitting the DAC array accordingly; the choice of CR depends on the target signal sparsity and the desired SNR. A smaller CR can handle a less sparse signal and introduces less signal power degradation, at the cost of less power saving. In short, this paper provides the first fully passive hardware realization of a random-demodulation-based CS framework, paving the way for the practical use of CS.

## REFERENCES

- [1] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, "Internet of Things (IoT): A vision, architectural elements, and future directions," *Future Generat. Comput. Syst.*, vol. 29, no. 7, pp. 1645–1660, 2013.
- [2] M. D. Plumley, T. Blumensath, L. Daudet, R. Gribonval, and M. E. Davies, "Sparse representations in audio and music: From coding to source separation," *Proc. IEEE*, vol. 98, no. 6, pp. 995–1005, Jun. 2010.
- [3] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," *IEEE Trans. Image Process.*, vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
- [4] E. G. Allstot, A. Y. Chen, A. M. R. Dixon, D. Gangopadhyay, and D. J. Allstot, "Compressive sampling of ECG bio-signals: Quantization noise and sparsity considerations," in *Proc. Biomed. Circuits Syst. Conf. (BioCAS)*, Nov. 2010, pp. 41–44.
- [5] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," *IEEE Trans. Inf. Theory*, vol. 52, no. 2, pp. 489–509, Feb. 2006.
- [6] D. L. Donoho, "Compressed sensing," *IEEE Trans. Inf. Theory*, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
- [7] V. Karkare, S. Gibson, and D. Marković, "A 130- $\mu\text{W}$ , 64-channel spike-sorting DSP chip," in *Proc. IEEE Asian Solid-State Circuits Conf. (ASSCC)*, Nov. 2009, pp. 289–292.
- [8] N. Verma, A. Shoeb, J. Bohorquez, J. Dawson, J. Guttag, and A. P. Chandrakasan, "A micro-power EEG acquisition SoC with integrated feature extraction processor for a chronic seizure detection system," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 804–816, Apr. 2010.
- [9] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling," *IEEE Signal Process. Mag.*, vol. 25, no. 2, pp. 21–30, Mar. 2008.
- [10] M. Verhelst and A. Bahai, "Where analog meets digital: Analog-to-information conversion and beyond," *IEEE Solid-State Circuits Mag.*, vol. 7, no. 3, pp. 67–80, Jun. 2015.
- [11] S. Kirolos *et al.*, "Analog-to-information conversion via random demodulation," in *Proc. IEEE Dallas/CAS Workshop Design, Appl., Integr. Softw.*, Oct. 2006, pp. 71–74.
- [12] J. N. Laska, S. Kirolos, M. F. Duarte, T. S. Ragheb, R. G. Baraniuk, and Y. Massoud, "Theory and implementation of an analog-to-information converter using random demodulation," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2007, pp. 1959–1962.
- [13] T. Ragheb, J. N. Laska, H. Nejati, S. Kirolos, R. G. Baraniuk, and Y. Massoud, "A prototype hardware for random demodulation based compressive analog-to-digital conversion," in *Proc. IEEE Midwest Symp. Circuits Syst. (MWSCAS)*, Aug. 2008, pp. 37–40.
- [14] J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg, and R. G. Baraniuk, "Beyond Nyquist: Efficient sampling of sparse band-limited signals," *IEEE Trans. Inf. Theory*, vol. 56, no. 1, pp. 520–544, Jan. 2010.
- [15] X. Chen, Z. Yu, S. Hoyos, B. M. Sadler, and J. Silva-Martinez, "A sub-Nyquist rate sampling receiver exploiting compressive sensing," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 58, no. 3, pp. 507–520, Mar. 2011.
- [16] X. Chen *et al.*, "A sub-Nyquist rate compressive sensing data acquisition front-end," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 2, no. 3, pp. 542–551, Sep. 2012.
- [17] J. Yoo, S. Becker, M. Monge, M. Loh, E. Candès, and A. Emami-Neyestanak, "Design and implementation of a fully integrated compressed-sensing signal acquisition system," in *Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP)*, Mar. 2012, pp. 5325–5328.
- [18] J. Yoo *et al.*, "A compressed sensing parameter extraction platform for radar pulse signal acquisition," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 2, no. 3, pp. 626–638, Sep. 2012.
- [19] D. Gangopadhyay, E. G. Allstot, A. M. R. Dixon, K. Natarajan, S. Gupta, and D. J. Allstot, "Compressed sensing analog front-end for bio-sensor applications," *IEEE J. Solid-State Circuits*, vol. 49, no. 2, pp. 426–438, Feb. 2014.
- [20] M. Mishali and Y. C. Eldar, "From theory to practice: Sub-Nyquist sampling of sparse wideband analog signals," *IEEE J. Sel. Topics Signal Process.*, vol. 4, no. 2, pp. 375–391, Apr. 2010.
- [21] M. Mishali, Y. C. Eldar, O. Dounaevsky, and E. Shoshan, "Xampling: Analog to digital at sub-Nyquist rates," *IET Circuits, Devices Syst.*, vol. 5, no. 1, pp. 8–20, Jan. 2011.
- [22] M. Wakin *et al.*, "A nonuniform sampler for wideband spectrally-sparse environments," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 2, no. 3, pp. 516–529, Sep. 2012.
- [23] M. Trakimas, R. D'Angelo, S. Aeron, T. Hancock, and S. Sonkusale, "A compressed sensing analog-to-information converter with edge-triggered SAR ADC core," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 5, pp. 1135–1148, May 2013.
- [24] P. V. Rajesh *et al.*, "A 172  $\mu\text{W}$  compressive sampling photoplethysmographic readout with embedded direct heart-rate and variability extraction from compressively sampled data," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2016, pp. 386–387.
- [25] W. Guo, Y. Kim, A. Sanyal, A. Tewfik, and N. Sun, "A single SAR ADC converting multi-channel sparse signals," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2013, pp. 2235–2238.
- [26] W. Guo, Y. Kim, A. Tewfik, and N. Sun, "Ultra-low power multi-channel data conversion with a single SAR ADC for mobile sensing applications," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2015, pp. 1–4.
- [27] F. Chen, A. P. Chandrakasan, and V. M. Stojanovic, "Design and analysis of a hardware-efficient compressed sensing architecture for data compression in wireless sensors," *IEEE J. Solid-State Circuits*, vol. 47, no. 3, pp. 744–756, Mar. 2012.
- [28] W. Guo and N. Sun, "A 12b-ENOB 61  $\mu\text{W}$  noise-shaping SAR ADC with a passive integrator," in *Proc. IEEE Eur. Solid-State Circuits Conf.*, Sep. 2016, pp. 405–408.
- [29] R. Tibshirani, "Regression shrinkage and selection via the lasso," *J. Roy. Statist. Soc., B (Methodol.)*, vol. 58, no. 1, pp. 267–288, 1996.
- [30] Y. Kim, W. Guo, B. V. Gowreesunker, N. Sun, and A. H. Tewfik, "Multi-channel sparse data conversion with a single analog-to-digital converter," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 2, no. 3, pp. 470–481, Sep. 2012.
- [31] C. Luo, M. A. Borkar, A. J. Redfern, and J. H. McClellan, "Compressive sensing for sparse touch detection on capacitive touch screens," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 2, no. 3, pp. 639–648, Sep. 2012.

- [32] J. Xu, E. Rohani, M. Rahman, and G. Choi, "Signal reconstruction processor design for compressive sensing," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, Jun. 2014, pp. 2539–2542.
- [33] F. Ren and D. Marković, "A configurable 12–237 kS/s 12.8 mW sparse-approximation engine for mobile data aggregation of compressively sampled physiological signals," *IEEE J. Solid-State Circuits*, vol. 51, no. 1, pp. 68–78, Jan. 2016.
- [34] M. A. Lexa, M. E. Davies, and J. S. Thompson, "Reconciling compressive sampling systems for spectrally sparse continuous-time signals," *IEEE Trans. Signal Process.*, vol. 60, no. 1, pp. 155–171, Jan. 2012.
- [35] G. Mohimani, M. Babaie-Zadeh, and C. Jutten, "A fast approach for overcomplete sparse decomposition based on smoothed  $\ell^0$  norm," *IEEE Trans. Signal Process.*, vol. 57, no. 1, pp. 289–301, Jan. 2009.
- [36] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," *IEEE Trans. Inf. Theory*, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
- [37] A. Sanyal and N. Sun, "An energy-efficient low frequency-dependence switching technique for SAR ADCs," *IEEE Trans. Circuits Syst. II, Express Briefs*, vol. 61, no. 5, pp. 294–298, May 2014.
- [38] S.-W. M. Chen and R. W. Brodersen, "A 6-bit 600-MS/s 5.3-mW asynchronous ADC in 0.13-$\mu$m CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2669–2680, Dec. 2006.
- [39] L. Chen, A. Sanyal, J. Ma, and N. Sun, "A 24- $\mu$ W 11-bit 1-MS/s SAR ADC with a bidirectional single-side switching technique," in *Proc. IEEE Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2014, pp. 219–222.
- [40] V. R. Pamula, M. Verhelst, C. van Hoof, and R. F. Yazicioglu, "Computationally-efficient compressive sampling for low-power pulseoximeter system," in *Proc. IEEE Biomed. Circuits Syst. Conf. (BioCAS)*, Oct. 2014, pp. 69–72.
- [41] S. U. N. Wood, J. Rouat, S. Dupont, and G. Pironkov, "Blind speech separation and enhancement with GCC-NMF," *IEEE/ACM Trans. Audio Speech Lang. Process.*, vol. 25, no. 4, pp. 745–755, Apr. 2017.
- [42] Y. Wang and M. Brookes, "Speech enhancement using an MMSE spectral amplitude estimator based on a modulation domain Kalman filter with a Gamma prior," in *Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP)*, Mar. 2016, pp. 5225–5229.



**Wenjuan Guo** (S'13–M'17) received the B.S. degree from the Institute of Microelectronics and Nanoelectronics, Tsinghua University, Beijing, China, in 2011, and the Ph.D. degree from The University of Texas at Austin, Austin, TX, USA, in 2016.

From 2013 to 2014, she was a Design Co-Op with Texas Instruments, Dallas, TX, USA. She is currently an Analog Engineer with Intel Corporation, Austin. Her current research interests include analog and mixed-signal integrated circuits design.

Dr. Guo was a recipient of the Texas Instruments Ph.D. Fellowship in 2014 and 2015.



**Youngchun Kim** received the B.E. and M.E. degrees in electrical engineering from Soongsil University, Seoul, South Korea, in 1997 and 1999, respectively, and the M.Sc. degree in electrical engineering from the University of Minnesota, Minneapolis, MN, USA, in 2010. He is currently pursuing the Ph.D. degree with The University of Texas at Austin, Austin, TX, USA.

He was with Samsung Electronics, Suwon, South Korea, and with the Korea Institute of Civil Engineering and Building Technology, Goyang-si, South Korea, where he was involved in intelligent traffic systems. He is currently with Intersil, Austin. His current research interests include speech and bio signal processing, efficient sampling strategies for wearable devices, and novel analog-to-digital converter architectures leveraging signal-processing techniques.



**Ahmed H. Tewfik** (S'81–M'87–SM'92–F'96) received the B.Sc. degree from Cairo University, Cairo, Egypt, in 1982, and the M.Sc., E.E., and Sc.D. degrees from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 1984, 1985, and 1987, respectively.

He is currently the Cockrell Family Regents Chair of Engineering and the Chairman with the Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA. He was with Alphatech, Inc., Burlington, MA, USA, in 1987, and was previously the E. F. Johnson Professor of Electronic Communications with the Department of Electrical Engineering, University of Minnesota, Minneapolis, MN, USA. He served as a Consultant to several companies, including MTS Systems, Inc., Eden Prairie, MN, USA, Emerson-Rosemount, Inc., Eden Prairie, CyberNova, Milpitas, CA, USA, Macrovision, Santa Clara, CA, USA, Visionaire Technology, Fremont, CA, USA, Ipsos, New York, NY, USA, InterDigital Communications, King of Prussia, PA, USA, Keyeye Communications, Sacramento, CA, USA, Transoma Medical, Arden Hills, MN, USA, and St. Jude Medical, Minnetonka, MN, USA. He was with Texas Instruments and Computing Devices International, Dallas, TX, USA. From 1997 to 2001, he was the President and the CEO of Cognicity, Inc., Minneapolis, MN, USA, an entertainment marketing software tools publisher that he co-founded. He has made seminal contributions in the past to food inspection, watermarking and multimedia signal processing, content-based retrieval, wavelet signal processing, and fractals. His research interests included low-power multimedia communications, adaptive search and data acquisition strategies for World Wide Web applications, radar and dental/medical imaging, monitoring of machinery via acoustic emissions, and industrial measurements. His current research interests include man-machine symbiosis, brain computing interfaces and brain science, and 5G wireless networks.

Prof. Tewfik was a recipient of the IEEE Third Millennium Award in 2000, the E. F. Johnson professorship of Electronic Communications in 1993, the Taylor Faculty Development Award from the Taylor Foundation in 1992, and an NSF Research Initiation Award in 1990. He was a Distinguished Lecturer of the IEEE Signal Processing Society from 1997 to 1999. He was elected to the post of President-elect of the IEEE Signal Processing Society in 2017 and was elected to be VP for Technical Directions and to the board of governors of that society in 2009 and 2005, respectively. He was invited to be a Principal Lecturer at the 1995 IEEE EMBS Summer School. He delivered plenary lectures at several IEEE and non-IEEE meetings and taught tutorials on bioinformatics, ultrawideband communications, watermarking and wavelets at major IEEE conferences. He was selected to be the first Editor-in-Chief of the IEEE SIGNAL PROCESSING LETTERS from 1993 to 1999. He was an Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING, and a Guest Editor of special issues of that journal, the IEEE TRANSACTIONS ON MULTIMEDIA and the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING. He is currently an Associate Editor of the EURASIP Journal on Bioinformatics and Systems Biology. He also served as the President of the Minnesota chapters of the IEEE signal processing and communications societies from 2002 to 2005.



**Nan Sun** (M'11–SM'16) received the B.S. degree from the Department of Electronic Engineering, Tsinghua University, Beijing, China, in 2006, and the Ph.D. degree from the School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA, in 2010.

He is currently an Associate Professor with the Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA. His current research interests include analog, mixed-signal, and RF integrated circuits, miniature spin resonance systems, magnetic sensors and image sensors, and developing micro- and nano-scale solid-state platforms (silicon ICs and beyond) to analyze biological systems for biotechnology and medicine.

Dr. Sun received the NSF CAREER Award in 2013 and the Jack Kilby Research Award from UT Austin in 2015. He was the AMD Development Chair at UT Austin. He serves on the Technical Program Committee of the IEEE Custom Integrated Circuits Conference and the Asian Solid-State Circuits Conference. He is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I, REGULAR PAPERS.