

# A Multi-Sensor and Parallel Processing SoC for Miniaturized Medical Instrumentation

Philipp Schönle, Member, IEEE, Florian Glaser, Student Member, IEEE, Thomas Burger, Member, IEEE, Giovanni Rovere, Student Member, IEEE, Luca Benini, Fellow, IEEE, and Qiuting Huang, Fellow, IEEE

**Abstract**—We report *VivoSoC*, a system-on-chip realized in 130-nm CMOS for miniaturized medical instrumentation as used in mobile health devices or implantable telemetry systems for animal experiments. It features six neural stimulation channels and acquisition circuits for 9× electrode-based recordings, 4-channel/32-LED photoplethysmography (PPG), bioimpedance, and temperature. A 34- $\mu$ W/MHz quad-core processing unit with sophisticated data, power, and clock management techniques enables on-chip feature extraction, which can dramatically reduce transmission or storage data rates in a system—crucial for overall power consumption and exemplarily demonstrated on preliminary work on implantable PPG-based vital signs monitoring.

**Index Terms**—Analog front-end (AFE), bio-impedance (BioZ), biomedical, data acquisition, electrode-based recordings (ExG), parallel processing, photoplethysmography (PPG).

## I. INTRODUCTION

MINIATURIZED medical-grade instrumentation is required in two fields of application: mobile health (mHealth) and (implantable) telemetry systems. Applications include, among others, Holter and disposable vital signs monitoring devices [1], [2], wellness gadgets [3], [4], prosthesis control [5], and research tools for animal experiments [6]–[8]. Despite their diversity, all applications involve similar technical requirements and challenges: small and power-efficient hardware, medical-grade signal acquisition, real-time processing capability, wireless connectivity, and data storage. We are working toward a common device platform [9] for such applications with *VivoSoC* [10], the multi-sensor and parallel processing system-on-chip (SoC) presented in this paper, as its cornerstone.

As illustrated with the block diagram (Fig. 1), the SoC features a nine-channel analog front-end (AFE) for electrode-based recordings (ExG) such as electrocardiography (ECG), electroencephalography (EEG), etc. In the implantable telemetry scenario, it serves the recording of peripheral

Manuscript received December 5, 2017; revised January 26, 2018; accepted March 1, 2018. Date of publication April 4, 2018; date of current version June 25, 2018. This paper was approved by Guest Editor Pieter Harpe. This work was supported in part by the SNSF through *Transient Computing Systems* under Grant 157048 and in part by the Swiss Confederation and scientifically evaluated by SNSF through Nano-Tera.ch RTD Project *WearMeSoC*. (*Corresponding author: Philipp Schönle*.)

The authors are with the Department of Information Technology and Electrical Engineering, ETH Zürich, 8092 Zürich, Switzerland (e-mail: schoenle@iis.ee.ethz.ch).

Color versions of one or more of the figures in this paper are available online at <http://ieeexplore.ieee.org>.

Digital Object Identifier 10.1109/JSSC.2018.2815653



Fig. 1. Block diagram of the *VivoSoC* (medical) multi-sensor SoC.

neural signals and subcutaneous ExG. A single-channel bio-impedance (BioZ) AFE is included for respiration monitoring while another highlight, the high-channel count photoplethysmography (PPG) subsystem enables heart rate and oxygenation monitoring, near infrared spectroscopy (NIRS), and pulse wave velocity (PWV) measurement. The latter is of particular interest in implantable vital signs monitoring [11]. The LED driver circuit is further usable in optogenetics, where genetically modified nerves express optically sensitive proteins which can either evoke an action potential or block the propagation thereof when optically activated. The SoC further comprises an on-chip temperature sensor and a six-channel circuit for electrical neural stimulation or blockage.

*VivoSoC* combines this comprehensive set of AFEs with a quad-core microcontroller unit (MCU) based on the parallel ultralow power (PULP) platform [12]. Its flexibility and ample processing power enable on-chip filtering, preprocessing, motion artifact suppression, and feature extraction, which is particularly useful where data rates for transmission or storage can be reduced, or in implantable closed-loop systems [13].

The remainder of this paper will discuss architecture and circuit design of the SoC: Sections II–V describe the AFEs for ExG, neural stimulation, BioZ, and PPG. In each section, the main challenges arising from the requirements imposed on the AFE by the respective applications are introduced first, followed by a discussion of the circuits addressing them. A special focus in Section V is power efficiency in PPG and limitations thereof. Since overall power consumption (including LEDs) reaches several milliwatts for the required signal quality, it can easily dominate over further AFEs.



Fig. 2. (a) Block diagram of the ExG AFE. (b) Power spectral density (PSD) measurement for a 20-Hz 6-mV<sub>pp</sub> sine input. (c) Recorded ECG signal (of a Fluke ProSim 8 vital signs generator).

Similarly, in Section VI, the MCU is discussed with special emphasis on the architectural design choices for power efficiency in processing as well as data management and a circuit for signal-dependent duty cycling is presented. Application results are given in Section VII, and this paper is concluded in Section VIII.

## II. ELECTRODE-BASED RECORDINGS

The contact between electrode and tissue imposes various challenges on an ExG AFE design: being a metal-to-electrolyte interface, a dc half-cell potential is superimposed on the biomedical signal of interest. This time-varying offset can exceed the signal amplitude ( $10 \mu\text{V}-1 \text{ mV}$ ) by several orders of magnitude. This implies low noise while linearity is relaxed: only a fraction of the dynamic range—required to tolerate large differential offsets—is used by the signal. The frequency dependent and likewise time-varying contact impedance ranges between ca.  $100 \Omega$  (wet electrodes and abraded skin) and  $10 \text{ M}\Omega$  (dry electrodes), demanding high input impedance and common mode rejection ratio (CMRR).

A block diagram of the ExG AFE is depicted in Fig. 2(a). The nine front-end channels consist of a chopper-stabilized instrumentation amplifier (IA) [14] followed by a low-pass filter and have configurable gain (8–2k) and bandwidth (3.2–12.8 kHz). Their outputs are time multiplexed to a shared 14-bit SAR ADC [15] with programmable per-channel sampling rates of up to 32 kHz and configurable cascaded integrator-comb (CIC) filters for downsampling—adjustable to the requirements of a given application: While ECG and EEG typically require bandwidths below 250 Hz, neural activity recording demands bandwidths of up to 10 kHz. The patient ground (PGND) circuit equalizes device and body potentials and thus defines roughly the input common mode of the current balancing instrumentation amplifiers [16], [17]. The differential input offset and IA offset can be compensated with separate current steering DACs to avoid saturation of the IA at high gain settings. This enables dc-connection



Fig. 3. (a) Ideally, the differential electrode voltage ensures unidirectional neural stimulation. (b) In a real bipolar electrode, current leakage causes a virtual cathode and thus bidirectional AP propagation. (c) Verification by ADN stimulation, known to cause blood pressure drop. (d) Supported waveforms. (e) Schematic of the H-bridge current driver circuit.

for up to  $\pm 300\text{-mV}$  differential electrode offset, which renders large off-chip components for ac-coupling superfluous. Tracking of the baseline wander and mains interference suppression is done in software on the MCU prior to further processing [17]. The AFE also features an electrode impedance measurement (EIM) circuit for contact quality assessment of each individual electrode.

Measurement results are given in Fig. 2(b) and (c) and the chip summary, Table III.
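The configurable CIC downsampling mentioned above can be illustrated with a minimal software sketch. The filter order (three stages), the decimation ratio of 64 (32 kHz down to 500 Hz), and the function name `cic_decimate` are assumptions chosen for illustration; the text only states that the CIC filters are configurable.

```python
def cic_decimate(samples, ratio, stages=3):
    """Decimate integer samples with a CIC filter (differential delay of 1).

    `stages` and `ratio` are illustrative assumptions, not chip parameters.
    """
    # Integrator cascade, running at the input sample rate.
    acc = [0] * stages
    integrated = []
    for x in samples:
        for i in range(stages):
            acc[i] += x
            x = acc[i]
        integrated.append(x)

    # Rate change: keep every `ratio`-th integrator output.
    decimated = integrated[ratio - 1::ratio]

    # Comb cascade, running at the reduced output rate.
    for _ in range(stages):
        prev = 0
        combed = []
        for x in decimated:
            combed.append(x - prev)
            prev = x
        decimated = combed

    # The dc gain of the cascade is ratio**stages; normalise it away.
    gain = ratio ** stages
    return [y / gain for y in decimated]


if __name__ == "__main__":
    # Constant ADC code of 100 at 32 kHz, decimated by 64 to 500 Hz.
    out = cic_decimate([100] * (64 * 6), ratio=64)
    print(out)
```

The first few output samples show the start-up transient of the comb stages; after `stages` output samples a constant input is reproduced exactly, which is a quick sanity check of the dc gain normalisation.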

## III. ELECTRICAL NEURAL STIMULATION AND BLOCKING

In electrical neural stimulation, the neuron transmembrane potential is locally depolarized by the application of current, evoking an action potential (AP) [Fig. 3(a)] which spreads along the axon by repetitively inducing currents, depolarizing the neighboring membrane, and thus evoking the next AP [18]. Electrode-tissue charge transfer occurs through both capacitive coupling and redox reactions. Corrosion due to the latter can damage both tissue and electrode, but can be limited to a safe level by the right choice of electrode material and by zero net-charge transfer, which is achieved with charge-balancing waveforms [19], [20]. Given the varying merits of different waveforms in efficacy, safety, and electrode corrosion [21], optimum pattern selection is an application-specific tradeoff.

The six-channel neural stimulation circuit can realize all the commonly used patterns depicted in Fig. 3(d) for repetition rates ranging from 5 Hz to 50 kHz. For simple bipolar cuff micro-electrodes [22], AP propagation is bidirectional [18] as illustrated in Fig. 3(b). Thus, stimulation is independent of waveform polarity, which allows toggling of the compound-waveform polarity. This results in virtually no charge imbalance since positive and negative currents inherently match in the chosen H-bridge topology, depicted in Fig. 3(e). For optional passive electrode discharge in-between pulses, a pair of electrodes can be shorted via an otherwise unconnected resistor. Stimulation current calibration against an external resistor is illustrated in Fig. 3(e). Circuit and waveforms were verified in a series of acute experiments with bipolar *CorTec* micro cuff electrodes in a total of nine anesthetized *Wistar* rats by stimulation of the aortic depressor nerve (ADN), Fig. 3(c), confirming that the 3.3-V end-of-life voltage of lithium-ion batteries is sufficient for stimulation with low-impedance micro-electrodes [22].
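The charge-balancing argument can be made concrete with a small numerical sketch. It assumes ideal rectangular current pulses sampled on a 1-µs grid; the amplitudes, phase durations, and helper names (`biphasic_period`, `alternate_polarity`) are hypothetical and do not describe the chip's pattern generator.

```python
def biphasic_period(amp_ua, phase_us, gap_us, period_us):
    """One period on a 1-us grid: cathodic phase, gap, anodic phase, idle."""
    idle_us = period_us - 2 * phase_us - gap_us
    if idle_us < 0:
        raise ValueError("period too short for the requested phases")
    return ([-amp_ua] * phase_us + [0] * gap_us
            + [amp_ua] * phase_us + [0] * idle_us)


def net_charge_pc(wave_ua):
    """Net charge in pC for samples spaced 1 us apart (uA x us = pC)."""
    return sum(wave_ua)


def alternate_polarity(period_wave, n_periods):
    """Repeat one period, flipping its sign every other period."""
    out = []
    for k in range(n_periods):
        sign = 1 if k % 2 == 0 else -1
        out.extend(sign * x for x in period_wave)
    return out


if __name__ == "__main__":
    # Symmetric biphasic pulse at a 50-Hz repetition rate (20 000 us period).
    balanced = biphasic_period(amp_ua=100, phase_us=200, gap_us=50,
                               period_us=20000)
    print(net_charge_pc(balanced))

    # A monophasic pulse is not balanced within one period, but toggling
    # the polarity balances it over two periods if both polarities match.
    mono = [-100] * 200 + [0] * 800
    print(net_charge_pc(mono), net_charge_pc(alternate_polarity(mono, 2)))
```

The second case mirrors the polarity toggling described above: the balance relies on the positive and negative currents having the same magnitude, which is what the shared current source of an H-bridge provides.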

## IV. BIO-IMPEDANCE

Tissue impedance is commonly measured by injecting an ac current and measuring the resulting voltage. Considering the issues encountered when interfacing tissue with electrodes discussed in Section II, a 4-lead Kelvin access structure is preferable for high accuracy. ac-coupling of the receiving electrodes is viable as tissue impedance is measured in the low kilohertz range, where signal loss is still low [23] and flicker noise, powerline, and dc artifacts do not interfere. Tissue resistivity of internal organs, skin, and muscle spans a range of 0.1–1  $\text{k}\Omega \cdot \text{cm}$  [24]. This translates into measured impedances in the sub-k $\Omega$  to multi-k $\Omega$  range [25], [26]. The application specificity of both electrode and tissue impedances demands injection current and receiver gain to be configurable. Sinusoidal injection currents are preferred over a rectangular waveform as harmonics can distort the impedance measurement in the presence of non-linearities in the signal path. The requirements on sinusoid purity are relaxed since the receive chain provides further harmonic filtering and additional means to suppress the harmonic downconversion. The design focus is on low power consumption for the given impedance measurement range as defined by the application. The straightforward implementation of a current generator is direct digital synthesis with a multi-bit current DAC. For BioZ, however, a simpler scheme with less resolution and programmability is sufficient [26], [27], which consumes less power and silicon area. Our design is based on a current DAC (iDAC) with fixed-weight nMOS and pMOS current sources to create an eight-step pseudo-sinusoid by repeating the sequence {0, 7, 10, 7, 0, -7, -10, -7}  $\times I_0$ ;  $I_0$  denoting a programmable unit current of 62.5–625 nA. 
After passing a configurable first-order low-pass filter (LPF), the current is fed into a class AB transimpedance amplifier (TIA) which acts as a current buffer with an 8 $\times$  output current multiplication, realized with a scaled replica of the output branch in feedback, Fig. 4(a). A resistive output load is provided to limit dc offset, and a second LPF can be formed by adding an external differential capacitor, chosen according to the target measurement range. Using a class AB amplifier as the output stage of the current generator has the advantage that the quiescent current can be kept substantially lower than the current under full operation, by approximately a factor of 1.7 for our design. The BioZ readout channel needs to provide high dynamic range (DR) in a small bandwidth around the received sinusoid in order to detect small impedance variations. The respiration signal in a thoracic impedance measurement is,



Fig. 4. (a) Block diagram of the BioZ AFE. (b) Circuit details of the transconductance stage. (c) Second-order biquad filter.



Fig. 5. (a) Output current ( $80 \mu\text{A}_{pp}$ , 63 kHz) spectrum. (b) Rx performance: spectrum for a high sensitivity configuration. (c) SNR and input-referred noise as functions of gain. (d) Sensitivity as a function of measurement range.

e.g., in the single-Ohm range [25]. Variable gain (and injection current) provides the possibility to adapt the receiver to a broad impedance range of  $10 \Omega$ – $35 \text{k}\Omega$ . Variable bandwidth enables the use of simple offset frequency generation between generator and receiver by frequency division from the system clock (16 MHz) with programmable dividers. The presented AFE matches these requirements with a passband (4–100 kHz) input stage and a programmable bandwidth (3.2–12.8 kHz) for the demodulated baseband, combined with a variable gain.

Fig. 4 shows the block diagram of the BioZ AFE, with more detailed illustrations of the input stage and the baseband LPF. The input transconductance stage has a resistively degenerated input differential transistor pair that can handle differential signals of up to  $60 \text{ mV}_{pp}$  at full gain. The differential current of the input branch is folded back to the input of the following TIA via two PMOS transistors. An output common-mode



Fig. 6. (a) PPG AFE block diagram. (b) Schematic of receiver circuit. The receiver can operate in resistive and capacitive feedback TIA mode to cover a wide range of input currents and features circuitry to shift the input range (DCS) and for analog ( $C_S$ ) and digital ALC.

regulation loop ensures correct TIA input operating point at all gain settings and helps to suppress low-frequency common-mode distortions such as mains interference. All tail transistors are degenerated to reduce flicker noise at low input frequencies (3-dB corner  $\approx 3$  kHz). The combination of input and TIA stage spans a gain range of 13 to 42 dB (eight settings) with the variable range of the input and TIA being 12 and 17 dB, respectively. The TIA also forms a variable first-order LPF that limits the noise bandwidth outside the passband. A two-stage RC-compensated Miller amplifier with class AB output stage is used for the TIA and for the amplifiers in the baseband LPF. The dc offset accumulation in the front-end is limited by a low-bandwidth integrator providing a high-pass filter (HPF) loop.

Downconversion of the passband signal into orthogonal I- and Q-branches, enabling *complex* impedance measurements, follows the TIA stage. A harmonic reject mixer (HRM) [28] suppresses the downconversion of noise at the third and fifth harmonic and thus improves the input-referred noise at low AFE gain. Programmable HRM input branch resistance provides further 0/6 dB of gain—in combination with the second-order baseband LPF which is implemented as a Tow–Thomas Biquad with Butterworth characteristic, Fig. 4(c).
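The principle behind the I/Q downconversion can be sketched in software. This is an idealised digital equivalent that assumes a pure sinusoidal injection current and an integer number of periods; the on-chip path is analog and uses the HRM, and the sampling rate and test impedance below are hypothetical.

```python
import cmath
import math


def iq_impedance(v_samples, i_amplitude, f_sig, f_s):
    """Estimate the complex impedance from voltage samples.

    Assumes the injected current is i(t) = i_amplitude * sin(2*pi*f_sig*t)
    and that v_samples spans an integer number of signal periods.
    """
    n = len(v_samples)
    i_sum = 0.0
    q_sum = 0.0
    for k, v in enumerate(v_samples):
        phase = 2 * math.pi * f_sig * k / f_s
        i_sum += v * math.sin(phase)   # in phase with the current
        q_sum += v * math.cos(phase)   # in quadrature to the current
    v_i = 2 * i_sum / n                # resistive voltage component
    v_q = 2 * q_sum / n                # reactive voltage component
    return complex(v_i, v_q) / i_amplitude


if __name__ == "__main__":
    f_sig, f_s = 50e3, 1.6e6           # 32 samples per period (assumed)
    i_amp = 100e-6                     # 100 uA peak injection current
    z_true = cmath.rect(500.0, math.radians(-20.0))
    v = [i_amp * abs(z_true)
         * math.sin(2 * math.pi * f_sig * k / f_s + cmath.phase(z_true))
         for k in range(3200)]
    z = iq_impedance(v, i_amp, f_sig, f_s)
    print(abs(z), math.degrees(cmath.phase(z)))
```

Averaging the two products over whole periods removes the double-frequency terms, so the in-phase branch returns the real part and the quadrature branch the imaginary part of the impedance.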

Fig. 5(a) shows the measured current generator output in the frequency domain for  $80 \mu\text{A}_{\text{pp}}$, filtered by the TIA input LPF and an external first-order LPF with 160 kHz cutoff. The third and fifth harmonics are reduced due to the choice of the pseudo-sinusoidal sequence and the spectral replicas at the seventh and ninth harmonics are reduced by the LPFs. Fig. 5(b) shows the digitized AFE output for 33-dB gain, achieving a signal-to-noise and distortion ratio (SNDR) of 65.5 dB in a 4-kHz bandwidth. A performance summary and comparison to recently published BioZ AFEs in biomedical SoCs is given in Table I. Our design compares favorably in terms of sensitivity and silicon area, and offers more degrees of freedom than [27], [29], which enables high relative sensitivity over a wide impedance range, see Fig. 5(d).
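The harmonic content of the eight-step sequence {0, 7, 10, 7, 0, -7, -10, -7} can be checked with a short calculation. The sketch assumes an ideal staircase (zero-order hold) output with perfectly matched steps and no filtering, so it gives the levels before the LPFs and is not a model of the measured spectrum.

```python
import cmath
import math

# Eight-step sequence quoted in the text, in units of I0.
SEQUENCE = [0, 7, 10, 7, 0, -7, -10, -7]


def dft_bin(seq, k):
    """Magnitude of DFT bin k for one period of the sequence."""
    n = len(seq)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(seq)))


def held_harmonic_db(seq, h):
    """Level of odd harmonic h relative to the fundamental, in dB.

    Assumes an ideal zero-order-hold output: the sampled spectrum repeats
    every len(seq) harmonics and is weighted by a sinc envelope.
    """
    n = len(seq)

    def sinc(x):
        return math.sin(math.pi * x) / (math.pi * x)

    level = dft_bin(seq, h % n) * abs(sinc(h / n))
    ref = dft_bin(seq, 1) * abs(sinc(1 / n))
    return 20 * math.log10(max(level, 1e-300) / ref)


if __name__ == "__main__":
    for h in (3, 5, 7, 9):
        print(h, round(held_harmonic_db(SEQUENCE, h), 1))
```

Under these assumptions the third and fifth harmonics stay more than 40 dB below the fundamental, because 7/10 is close to sin(45°), while the seventh and ninth harmonics are replicas of the fundamental attenuated only by factors of 1/7 and 1/9. This is consistent with the text relying on the LPFs for the latter.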

## V. PHOTOPLETHYSMOGRAPHY

Each heart stroke causes a pressure wave to travel through the arterial tree to the peripheral small arteries and arterioles.

TABLE I  
COMPARISON TO STATE-OF-THE-ART BIOZ AFEs

|                                                           | JSSC 2015 [30]        | JSSC 2015 [27]    | JSSC 2016 [29]    | This Work                                     |
|-----------------------------------------------------------|-----------------------|-------------------|-------------------|-----------------------------------------------|
| active area <sup>a</sup> [mm<sup>2</sup>]                 | 1.72                  | 2.51              | 5.1               | <b>0.96</b>                                   |
| tech. node                                                | 180nm                 | 180nm             | 180nm             | <b>130nm</b>                                  |
| <b>Tx: Current Generator</b>                              |                       |                   |                   |                                               |
| frequency [kHz]                                           | 0.1-100               | 20                | 20/40             | <b>4-100</b>                                  |
| amplitude [ $\mu\text{A}_{\text{pp}}$ ]                   | 10-400                | 27-117            | 50-200            | <b>10-100</b>                                 |
| THD [%]                                                   | < 0.2 <sup>b</sup>    | < 4               | -                 | <b>1.4<sup>c,d</sup></b>                      |
| pwr. (quies.) [ $\mu\text{W}$ ]                           | -                     | -                 | -                 | <b>56<sup>e</sup></b>                         |
| pwr. (op.) [ $\mu\text{W}$ ]                              | -                     | 32                | 119 <sup>d</sup>  | <b>95<sup>d</sup></b>                         |
| <b>Rx: Analog Front-End</b>                               |                       |                   |                   |                                               |
| gain / step [dB]                                          | 18-60 / 6             | -                 | -                 | <b>13-48</b>                                  |
| BW [kHz]                                                  | 0.1-100               | -                 | -                 | <b>3.2/6.4/9.6/12.8</b>                       |
| ir noise [nV/ $\sqrt{\text{Hz}}$ ]                        | 36                    | -                 | -                 | <b>26<sup>f</sup> / 64<sup>g</sup></b>        |
| sens. [ $\text{m}\Omega/\sqrt{\text{Hz}}$ ]               | 4.9                   | 9.8 <sup>h</sup>  | 3.0 <sup>i</sup>  | <b>0.74<sup>d,f</sup> / 2.3<sup>g,j</sup></b> |
| power [ $\mu\text{W}$ ]                                   | ca. 8900 <sup>k</sup> | 26                | 46                | <b>89<sup>n</sup></b>                         |

<sup>a</sup> one channel Tx & Rx incl. ADC    <sup>b</sup>  $200 \mu\text{A}_{\text{pp}}$     <sup>c</sup> 64kHz    <sup>d</sup>  $100 \mu\text{A}_{\text{pp}}$   
<sup>e</sup>  $10 \mu\text{A}_{\text{pp}}$     <sup>f</sup> 48dB gain    <sup>g</sup> 30dB gain    <sup>h</sup>  $27 \mu\text{A}_{\text{pp}}$     <sup>i</sup>  $50 \mu\text{A}_{\text{pp}}$   
<sup>j</sup>  $80 \mu\text{A}_{\text{pp}}$     <sup>k</sup> per channel    <sup>n</sup> w/o ADC

A steep increase in vascular resistance [31] from arterioles to capillaries hinders further spreading in venous vessels. Dilatation of (small) arterial vessels by this pressure wave causes an increase in arterial blood volume in tissue which is observed in PPG as a periodic component in the light absorbed by tissue. Commonly, LEDs are used as light sources and photodiodes (PDs) as detectors. Arterial oxygenation can be estimated by a spectrophotometric analysis of this pulsatile component from PPG recordings at multiple wavelengths (typically two)—known as *pulse oximetry* [32] and commonly used in patient monitoring and diagnostics. Heart rate can be extracted from a single-wavelength measurement as is common in fitness watches. PPG can further serve the measurement of PWV, i.e., the propagation speed of the arterial pressure wave, which is a function of blood pressure and vessel distensibility [11]. NIRS [33], [34] is a related spectrophotometric technology for tissue perfusion and oxygenation assessment with similar requirements imposed on the AFE and limitations by the optoelectronic components.
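The spectrophotometric analysis of the pulsatile component is often summarized as a "ratio of ratios" of the perfusion indices at two wavelengths. The sketch below uses that textbook formulation. The linear mapping `110 - 25 * r` is a commonly quoted approximation used here purely as a placeholder, since real oximeters rely on empirical calibration curves; the function names and test signals are hypothetical.

```python
import math


def perfusion_index(samples):
    """ac/dc ratio: peak-to-peak pulsatile part over the mean level."""
    dc = sum(samples) / len(samples)
    ac = max(samples) - min(samples)
    return ac / dc


def ratio_of_ratios(red, infrared):
    """Ratio of the perfusion indices at the two wavelengths."""
    return perfusion_index(red) / perfusion_index(infrared)


def spo2_estimate(r, a=110.0, b=25.0):
    """Placeholder linear calibration; a and b are assumed, not measured."""
    return a - b * r


if __name__ == "__main__":
    n = 100  # samples over exactly one cardiac cycle (synthetic data)
    red = [1000 + 5 * math.sin(2 * math.pi * k / n) for k in range(n)]
    infrared = [1000 + 10 * math.sin(2 * math.pi * k / n) for k in range(n)]
    r = ratio_of_ratios(red, infrared)
    print(r, spo2_estimate(r))
```

Because only ratios of ac to dc components enter the result, absolute LED and photodiode currents cancel, which is why the absolute values of transmit and receive currents do not need to be calibrated for this measurement.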

### A. Requirements and AFE Architecture

Although pulse oximetry is based on a well-known principle and usually gives accurate readings, various conditions and situational parameters can in practice reduce its accuracy [35]. The PPG AFE [Fig. 6(a)] comprises several countermeasures:

109 dB of maximum loop DR enables accurate oxygenation estimation when perfusion index<sup>1</sup> and oxygenation level are low [Fig. 8(a)] [36] or ambient light interferes: correlated double sampling-based ambient light cancellation (ALC) can suppress in-band interference by 50 dB by subtraction of a *dark sample* from the *signal sample*. However, this does not prevent the additive ambient light component from reducing the DR available to the PPG signal in the first place. The transmitter circuit supports up to 32 LEDs in a matrix and 8-bit programmable driving currents of up to 110 mA to overcome high dc absorbance, e.g., due to dark skin pigmentation. The high LED count allows spatial and spectral diversity: a two-wavelength oximeter can only distinguish two components contributing to the ac part of the absorption, which is sufficient when the spectral properties of whole blood are dominated by oxy- and deoxyhemoglobin; additional components are accounted for by empirical calibration. Whenever the arterial blood contains further, unaccounted components, they will interfere with the  $S_{\text{pO}_2}$  reading. False readings have been reported due to dysfunctional hemoglobin [37], intravenously administered dyes [38], and variant hemoglobins [39]. *Spectral diversity*, i.e., PPG recordings at more wavelengths, allows discrimination of more components (dysfunctional hemoglobin) or at least the detection of a spectral anomaly (dyes, variant hemoglobin). Additional PPG recordings at short wavelengths (e.g., green), which have much lower penetration depth into tissue than the typical 660–940 nm, can contribute to motion artifact suppression in reflective oximetry since the effect of motion is less prominent in the outermost tissue layers [40]. This effect is observed in the simultaneous recordings at four wavelengths shown in Fig. 9(b).
*Spatial diversity*, i.e., multiple LED-PD configurations, is preferable where probe placement is not well controlled to overcome physiological differences among people [29] and eases separation of signal and motion interferences since the latter differ between channels [41].

The receiver circuit, depicted in Fig. 6(b), covers a wide transimpedance range (required to support both PPG and NIRS) of 19 k $\Omega$ –90 M $\Omega$  by supporting both resistive (RTIA) and capacitive (CTIA) feedback topologies [36]. The receiver features differential dc current subtraction (DCS) from the input signal to increase DR, e.g., when dc ambient light is high. In-band interference, e.g., from ambient light or mains, is countered by subtraction of a *dark sample* (taken while all LEDs are shut off) from the *signal sample*. This can be done either in the analog domain (sampling on  $C_S$ , charge subtraction) or in the digital domain (digitization of both samples). As suppression improves for shorter intervals between the two sampling points, analog ALC achieves better suppression than digital ALC, whose interval is bounded by the digitization time. Analog ALC comes, however, at the cost of additional  $kT/C$  noise from the required charge redistribution. Amplifier offset and flicker noise are suppressed by correlated double sampling while amplifier noise is curbed by the programmable resistors  $R_{Si}$ . As illustrated in Fig. 7, the series resistors reduce the output referred amplifier noise bandwidth while the signal bandwidth is unaffected, since it is dominated by the



Fig. 7. Amplifier noise to output voltage transfer function: adding a series resistance  $R_S$  reduces the noise bandwidth, while the signal transfer function remains unaffected as long as  $R_f C_f > R_S C_S$ .

feedback ( $R_f, C_f$ ). Measurements [42] demonstrated effective amplifier excess noise curbing such that only  $kT/C$  noise remains.
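The dependence of ALC suppression on the interval between dark sample and signal sample can be estimated with a simple model. The sketch assumes a single sinusoidal interferer of constant amplitude and ideal sampling. The 100-Hz interferer (lamp flicker at twice a 50-Hz mains frequency) and the two intervals are assumptions for illustration, not chip specifications.

```python
import math


def alc_suppression_db(f_interferer_hz, delta_t_s):
    """Suppression of a sinusoidal interferer by dark-sample subtraction.

    sin(wt) - sin(w(t - dt)) has amplitude 2*|sin(pi*f*dt)|, so the
    residual relative to the interferer amplitude is that factor.
    """
    residual = 2 * abs(math.sin(math.pi * f_interferer_hz * delta_t_s))
    return -20 * math.log10(residual)


def simulated_residual(f_hz, delta_t_s, steps=10000):
    """Brute-force peak of (signal sample - dark sample) over one period."""
    peak = 0.0
    for k in range(steps):
        t = k / (steps * f_hz)
        diff = (math.sin(2 * math.pi * f_hz * t)
                - math.sin(2 * math.pi * f_hz * (t - delta_t_s)))
        peak = max(peak, abs(diff))
    return peak


if __name__ == "__main__":
    for dt in (10e-6, 100e-6):
        print(dt, round(alc_suppression_db(100.0, dt), 1))
```

In this model, shortening the interval by a factor of ten improves suppression by about 20 dB, which illustrates why subtraction in the analog domain, without an intermediate digitization, can suppress in-band interference more effectively.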

The PPG subsystem further features the option to calibrate the LED current driver against an external precision resistor, similar to the scheme used in the neural stimulation circuit Fig. 3(e). Calibration is usually not required as PPG-based measurements do not rely on absolute values, neither for the receive nor the transmit current. However, precise control over LED current may be required in optogenetics (optical neurostimulation) and eases power management.

### B. Power-DR Tradeoff

Although the indispensable optoelectronic components are commonly neglected in the characterization of PPG circuits [29], [44], [45], they may actually dominate both noise level (PD) and power consumption (LED). The latter usually ranges clearly above 5 mW in a realistic pulse oximetry scenario and thus easily dominates overall power consumption in an mHealth device. We therefore analyze the relation between power consumption and DR on a system level in the following.

In the following, signal-to-noise ratios (SNRs) are given as ratio of dc signal power to noise power and we define PPG receiver DR as the ratio between the highest resolvable dc input signal level to its input referred rms noise (measured with a dc input signal corresponding to 50% full-scale). In the loop DR, we further consider the noise contributions of the transmitter, LED, and PD

$$\text{DR}_{\text{loop}} = \left[ \frac{1}{4\text{SNR}_{\text{PD}}} + \frac{1}{\eta_q} \frac{1}{\text{SNR}_{\phi_{\text{LED}}}} + \frac{1}{\text{DR}_{\text{Rx}}} + \frac{1}{\text{SNR}_{\text{Tx}}} \right]^{-1} \quad (1)$$

with quantum efficiency  $\eta_q$ . The dominant noise source in both PD and LED is shot noise and the SNRs of photo-generated current and emitted photon flux, respectively, are

$$\text{SNR}_{\text{PD}} = \frac{I_{\text{PD}}}{2qf_{\text{BW}}} \quad \text{SNR}_{\phi_{\text{LED}}} = \frac{I_{\text{LED}}}{2\eta_{\text{ex}}qf_{\text{BW}}} \quad (2)$$

with  $\eta_{\text{ex}}$  denoting the external efficiency of the LED [42]. The noise bandwidth  $f_{\text{BW}}$  of the loop is limited by the receiver bandwidth and thus equal for LED and PD. Considering a typical transmission factor  $\Theta = I_{\text{PD}}/I_{\text{LED}}$  of 10 ppm, the PD noise clearly dominates.
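The statement that PD noise dominates can be checked numerically by evaluating (2) and comparing the first two terms of (1). The expressions are implemented as printed; the LED current, the bandwidth, and the efficiencies $\eta_{\text{ex}} = 0.2$ and $\eta_q = 0.8$ are hypothetical values chosen only to show the orders of magnitude.

```python
import math

Q_E = 1.602176634e-19  # elementary charge in coulombs


def snr_pd(i_pd, f_bw):
    """Shot-noise-limited SNR of the photo-generated current, as in (2)."""
    return i_pd / (2 * Q_E * f_bw)


def snr_led(i_led, eta_ex, f_bw):
    """SNR of the emitted photon flux, implemented as printed in (2)."""
    return i_led / (2 * eta_ex * Q_E * f_bw)


def to_db(power_ratio):
    return 10 * math.log10(power_ratio)


def pd_over_led_term(i_led, theta, eta_ex, eta_q, f_bw):
    """Ratio of the PD term to the LED term inside the brackets of (1)."""
    pd_term = 1 / (4 * snr_pd(theta * i_led, f_bw))
    led_term = 1 / (eta_q * snr_led(i_led, eta_ex, f_bw))
    return pd_term / led_term


if __name__ == "__main__":
    i_led, theta, f_bw = 50e-3, 10e-6, 20.0   # 50 mA, 10 ppm, 20 Hz
    print(round(to_db(snr_pd(theta * i_led, f_bw)), 1), "dB")
    print(pd_over_led_term(i_led, theta, eta_ex=0.2, eta_q=0.8, f_bw=f_bw))
```

With a transmission factor of 10 ppm the PD term exceeds the LED term by several orders of magnitude for any plausible efficiency values, because the ratio scales with $1/\Theta$.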

Ultimately, a PPG recording circuit is thus limited in terms of noise by PD shot noise and in terms of power by

<sup>1</sup>Perfusion index PI=ac/dc is the ratio of the pulsatile ac component to the dc component of the received photo-generated current.

TABLE II  
COMPARISON TO STATE-OF-THE-ART PPG AFEs

|                             | TBioCAS<br>2013 [47] | JSSC<br>2016 [44] | JSSC<br>2016 [29] | JSSC<br>2017 [45]   | AFE4403<br>2014 [43]   | AFE4404<br>2015 [46] | This<br>Work      |
|-----------------------------|----------------------|-------------------|-------------------|---------------------|------------------------|----------------------|-------------------|
| die size [mm <sup>2</sup> ] | 1.15 <sup>a</sup>    | 2                 | 5.04 <sup>a</sup> | 2.44 <sup>a</sup>   | 9                      | 3.75                 | 1.74 <sup>a</sup> |
| tech. node                  | 350nm                | 250nm             | 180nm             | 130nm               | -                      | -                    | 130nm             |
| supplies [V]                | 3.3                  | -                 | 3.3               | 1.2 / 1.5 / 3.3     | 3.0 / 3.3 <sup>b</sup> | 3.0                  | 1.2 / 3.3         |
| # PD / LED ch.              | 1 / 2                | 1 / 2             | 2 / 64            | 1 / 2               | 1 / 2 (3)              | 1 / 3                | 4 / 32            |
| <b>Receiver</b>             |                      |                   |                   |                     |                        |                      |                   |
| DR 0.1-20 Hz [dB]           | -                    | -                 | 94                | -                   | 106                    | 100                  | 112               |
| transimp. [ $\Omega$ ]      | -                    | -                 | 16k-2.2M          | 10k-2M              | 10k-1M                 | 10k-1M               | 19k-90M           |
| <b>Loop: Tx-LED-PD-Rx</b>   |                      |                   |                   |                     |                        |                      |                   |
| DR 0.1-20 Hz [dB]           | 72 <sup>d</sup>      | 85 <sup>c</sup>   | 93 <sup>e</sup>   | 70.2 <sup>d</sup>   | 80.5 <sup>d</sup>      | 92                   | 89 <sup>d</sup>   |
| LED d-cyc. [%]              | 4 <sup>d</sup>       | 4 <sup>c</sup>    | 6.1 <sup>e</sup>  | 0.4 <sup>d</sup>    | 2.0 <sup>d</sup>       | 5                    | 2 <sup>d</sup>    |
| LED pwr. [mW]               | 1.4 <sup>d</sup>     | 13.2 <sup>c</sup> | 20.3 <sup>e</sup> | 0.13 <sup>f,e</sup> | 0.66 <sup>f,e</sup>    | 16.5                 | 6 <sup>d</sup>    |
| IC pwr. [mW]                | 0.5 <sup>d</sup>     | 3.02              | 0.3 <sup>e</sup>  | 0.08 <sup>d</sup>   | 2.03                   | 1.94 <sup>d</sup>    | 0.34              |
| FOM [dB]                    | 74                   | 86                | 96                | 92                  | 89                     | 93                   | 101               |

$R_{TI} = 450 / 500 \text{ k}\Omega$; $f_s = 600 \text{ Hz}$; $I_{LED} = 50 \text{ mA}$; 2 LEDs. <sup>a</sup> active area; <sup>b</sup> min. LED supply for 2.5 V  $V_{LED}$; <sup>c/d/e</sup>  $f_s = 64 / 100 / 4096 \text{ Hz}$; <sup>f</sup>  $I_{LED} = 5 \text{ mA}$

the LED. The tradeoff between power and performance can be formulated in the DR to power ratio for a given signal bandwidth  $f_{BW,SIG}$

$$\text{DRPR} = \frac{4\text{SNR}_{PD}^{f_{BW,SIG}}}{P_{LED}} = \frac{4 \frac{I_{PD}}{2q f_{BW,Rx}}}{I_{LED} V_{LED} \delta} \cdot \frac{f_s}{2 f_{BW,SIG}} \quad (3)$$

with sampling frequency  $f_s$  and LED duty cycle  $\delta$ . The latter is the product of sampling frequency and LED pulsewidth,  $\delta = f_s T_P$ . Note that LED and PD currents are linked by the application and the subject-specific transmission factor  $\Theta$ , while  $T_P$  depends on receiver settling requirements, expressed with the *settling number*  $\xi = T_P / \tau$  (for  $\tau$  the time constant of the receiver low-pass characteristic). This simplifies (3) to

$$\text{DRPR} = \frac{\Theta}{q f_{BW,SIG} V_{LED}} \cdot \frac{1}{T_P f_{BW,Rx}} = \frac{4\Theta}{q f_{BW,SIG} V_{LED} \xi}. \quad (4)$$
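The step from (3) to (4) can be verified numerically. The sketch assumes that the receiver noise bandwidth is that of a first-order low-pass, $f_{BW,Rx} = 1/(4\tau)$, which is the assumption under which the second equality in (4) holds; all numerical values are hypothetical.

```python
Q_E = 1.602176634e-19  # elementary charge in coulombs


def drpr_eq3(i_led, theta, v_led, f_s, t_p, f_bw_rx, f_bw_sig):
    """DR-to-power ratio evaluated term by term as in (3), in 1/W."""
    i_pd = theta * i_led
    dr_term = 4 * i_pd / (2 * Q_E * f_bw_rx)
    duty_cycle = f_s * t_p                 # delta = f_s * T_P
    p_led = i_led * v_led * duty_cycle
    return dr_term / p_led * f_s / (2 * f_bw_sig)


def drpr_eq4(theta, v_led, xi, f_bw_sig):
    """Closed form of (4), in 1/W."""
    return 4 * theta / (Q_E * f_bw_sig * v_led * xi)


if __name__ == "__main__":
    theta, v_led, f_bw_sig = 10e-6, 2.5, 20.0
    xi, tau = 2.0, 10e-6
    t_p = xi * tau                          # settling number xi = T_P / tau
    f_bw_rx = 1 / (4 * tau)                 # first-order noise bandwidth
    for i_led, f_s in ((5e-3, 100.0), (50e-3, 600.0)):
        print(drpr_eq3(i_led, theta, v_led, f_s, t_p, f_bw_rx, f_bw_sig),
              drpr_eq4(theta, v_led, xi, f_bw_sig))
```

Both LED current and sampling frequency cancel in the ratio, so the two operating points give the same value. Only $\Theta$, $V_{LED}$, the signal bandwidth, and the settling number remain.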

Equation (4) demonstrates that power efficiency is independent of LED current. It is therefore desirable to choose a high current for better ambient light resilience while using the sampling frequency to adapt to the DR requirements [Fig. 8(a)], as illustrated in Fig. 8(b). This is possible since typical PPG signal bandwidth is just 15 Hz and the on-chip MCU offers ample processing resources for digital low-pass filters. The wide transimpedance range (9.4/150 k $\Omega$  granularity) of the receiver presents an alternative to the LED current for adaptation to the subject-specific transmission factor  $\Theta$ . As is apparent from (4), a low settling number  $\xi$  is favorable for high power efficiency. In an RTIA, a low  $\xi$  requires incomplete settling and thus well-defined resetting of the circuit in-between samples to avoid channel crosstalk and non-linearities. Fig. 8(c) shows measurements of the figure-of-merit (FoM), as proposed in [36] and given below, for different  $\xi$  and  $f_{BW,Rx}$

$$\text{FoM} = \text{DR}_{f_{BW}} + 10 \cdot \log_{10} \left( \frac{f_{BW}}{1 \text{ Hz}} \frac{1 \text{ mW}}{P_{tot}} \frac{10 \text{ ppm}}{\Theta} \right). \quad (5)$$

We observe that the measurements track the theoretical FoM limit closely up to the maximum FoM at  $\xi \approx 2$ , while it drops rapidly for lower settling numbers due to gain loss and



Fig. 8. (a) DR requirements. (b) Trading power for DR by adjusting the sampling frequency. (c) FoM measurements for different RTIA Rx bandwidths and settling numbers and (d) comparison to previous work.

increasing significance of power overhead. The CTIA topology has a fixed $\xi$ of a comparably low 2.78 [48] and thus qualifies as an alternative to the RTIA topology for sub-$\mu\text{A}$ input currents, for which its linearity is sufficient [42]. Table II compares the PPG AFE to recent publications and state-of-the-art commercial solutions; the combination of a low-noise receiver and low $\xi$ results in best-in-class FoM. As observed in Fig. 8(d), this paper as well as [43], [46] come close to the theoretical upper limit of the FoM, given by PD noise and LED power, which illustrates that these two indeed dominate overall noise and power. The PPG subsystem was verified and used in different real-life applications, such as in the prototype depicted in Fig. 9 and in first experiments toward implantable oximetry and PWV measurements in small animals [11].

## VI. MULTI-CORE PROCESSING UNIT

The multitude of flexible and powerful AFEs calls for adaptive control and (re)configuration, both on startup and



Fig. 9. (a) Demonstration of a multi-channel PPG recording and  $\text{SpO}_2$  calculation (calibrated to a *Fluke ProSim 8*). (b) Motion artifact susceptibility is wavelength dependent. Simultaneous recordings for two movements.

during operation, depending on the specific application *VivoSoC* is used for. Additionally, the energy consumption of wireless transmission or non-volatile storage of the AFEs' raw data (up to several Mbit/s) can dominate and exceed the power budget of the targeted mHealth and implantable systems. To prevent such high off-chip data rates, it is desirable to process raw data as close to the source as possible in a power-efficient way, in line with recent internet-of-things (IoT) trends [49], [50]. Maximum compression is achieved when only resulting features or calculated measurands are transmitted or stored. In biomedical applications, the required processing ranges from filtering and domain translation with medium computational demands to performance-hungry deep- or machine-learning-based detection and classification algorithms [51]–[54]. In closed-loop applications that make concurrent use of sensing and stimulation front-ends, local data analysis and stimuli calculation are crucial to guarantee reliable operation and low reaction latency. Finally, efficient communication via standard interfaces to further chips is required for data transmission or storage and for device control [9].

#### A. MCU Architecture

To address the aforementioned requirements, especially enabling computationally demanding processing and control-centered flexible system management with high energy efficiency, a quad-core MCU based on the *PULP* parallel processing platform [55]–[60] is incorporated as a central component of *VivoSoC*. The architecture of the processing subsystem is depicted in Fig. 10, highlighting the independent *MCU peripheral* and *MCU cluster* clock and power domains with associated supply voltages $V_{\text{DD\_PE}}$, $V_{\text{DD\_CL}}$ and operating frequencies $f_{\text{PE}}$, $f_{\text{CL}}$, respectively. Processing and control tasks are handled in the *MCU cluster* domain, centered around four optimized, 32-bit *openRISC* cores [61] that share a multi-bank L1 tightly coupled data memory (TCDM) with word-level interleaving for contention reduction. The latter is accessible through a single-cycle latency logarithmic interconnect [62], which enables efficient data sharing between cores for parallel processing without costly copy operations. The instruction cache [63] is based on



Fig. 10. Architecture of the multi-core processing unit, highlighting the two independent clock and power domains.

standard cell memory (SCM) [64], allowing dedicated read ports for each core at little power or area cost and consequently avoiding any contention by construction. The cache is shared between all cores to take advantage of instruction locality among them when executing highly parallel code. Instructions are fetched from the large L2 background memory, which resides in the *MCU peripheral* domain. This domain also hosts system management, the point-of-entry units (JTAG, boot-ROM, and SPI slave) and, consequently, the peripherals interfacing both off-chip components and the on-chip AFEs. The combination of boot-ROM and (quad) SPI master enables booting from an external serial flash device and consequently host-less operation of *VivoSoC*. Configuration of all AFEs is done through a memory-mapped, shared, asynchronous bus, with separate parts of the address determining the AFE and the register to be addressed, which provides transparent access for application development. Two direct memory access units (DMAs), one in each domain, are responsible for data transfers between memories and peripherals. All major components of the *MCU* subsystem are linked through a high-bandwidth (up to $> 5$ Gbit/s) 64-bit AXI interconnect system, which spreads over both domains and allows the instruction cache to refill a cache line in two cycles and DMA transfers to move two words between L2 and L1 memories per cycle. Since both DMA transfers and instruction cache refills are done in blocks, the dual-word width on the memory and cluster bus reduces the interconnect and ultimately the cluster duty cycle, while the remaining interconnect systems are configured as area- and energy-efficient 32-bit buses.

#### B. Energy-Efficient Processing

As a consequence of the shared nature of many cluster resources, which incurs complexity as well as area and power overhead, the utilization of the *MCU* must be maximized whenever any processing needs to be done. In between, aggressive duty cycling must be employed to minimize standby consumption. The importance of this approach can be clearly shown by the power characterization in Fig. 11(b) for the



Fig. 11. (a) Power trace demonstrating the effectiveness of automated cluster clock gating on the example of PWV processing. (b) Voltage and frequency scaling is illustrated for 32-bit matrix–matrix multiplication; shared L1 eases load distribution among the cores.

example of a 32-bit matrix multiplication: While the difference in (active) power consumption between the execution on a single and all cores is negligible for the MCU peripheral domain, there is a gap of only ca.  $2.1 \times$  for the cluster at an (ideal) speedup of  $4 \times$ . As a result of the speedup, the clock frequencies of both MCU domains can in turn be lowered while still being able to cope with a given computational load, ultimately allowing significant reduction of the supply voltages, offering quadratic savings as illustrated in Fig. 11(b). Operation of both MCU domains with independent supply voltages and clock frequencies is enabled by the 64-bit AXI 2V/2C FIFOs on the domain boundaries, allowing independent operating point optimization.
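The energy argument above can be checked with a few lines of arithmetic. The following Python sketch uses the cluster figures quoted in the text (ca. 2.1× power at an ideal 4× speedup). The function names, the $C V^2 f$ dynamic-power approximation, and the example voltages are illustrative assumptions rather than values or models from the paper.

```python
def energy_ratio(power_ratio, speedup):
    """Energy of the parallel run relative to the single-core run.

    Energy is power times run time, and the run time shrinks by the speedup.
    """
    return power_ratio / speedup


def dynamic_power_scale(voltage_ratio, frequency_ratio):
    """Relative dynamic power under the common C * V^2 * f approximation."""
    return voltage_ratio ** 2 * frequency_ratio


# Figures quoted in the text: ca. 2.1x cluster power at an ideal 4x speedup.
cluster_energy = energy_ratio(power_ratio=2.1, speedup=4.0)
print(f"cluster energy per workload vs. single core: {cluster_energy:.3f}")

# Hypothetical operating points (NOT from the paper): the clock is quartered
# and the supply is assumed to drop from 1.0 V to 0.6 V as a result.
scaled = dynamic_power_scale(voltage_ratio=0.6 / 1.0, frequency_ratio=0.25)
print(f"relative dynamic power after scaling: {scaled:.3f}")
```

At a fixed operating point the four-core run already needs roughly half the cluster energy per workload; the quadratic voltage term is where the larger share of the savings would come from.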

Software synchronization among cores as well as all other signaling (e.g., finished peripheral transfers) is carried out through events to avoid any busy waiting. Private per-core event and interrupt buffers and masks allow flexible and individual assignment of cores to event sources. Clock gating is not only done on a per-core basis for individual idle cores but also on the whole MCU cluster: as soon as all cores as well as the interconnect and data-transferring units are idle, the whole cluster clock tree is automatically disabled within one cycle. Events from any peripheral or incoming bus transactions suffice to automatically re-enable the clock, also within a single cycle. Besides the significant power savings during short cluster idle periods, this scheme proves advantageous to application development as it is transparent to software. The effectiveness of clock gating is demonstrated in Fig. 11(a) on the power trace of an example application (PWV estimation).

For longer idle periods of the cluster, power gating with the help of the power manager (PM) in the peripheral domain can be requested by any core. A state machine makes sure all outputs of the cluster are first isolated to defined values before requesting the shutdown of $V_{DD\_CL}$ from an external power management IC (PMIC) to prevent corruption of the interconnect systems in the peripheral domain [9]. Similar to clock-gated periods, peripheral events cause the PM to request re-powering of the cluster supply, followed by a reset and reboot of the cluster. Note that this procedure can additionally be used to change both $V_{DD\_PE}$ and $V_{DD\_CL}$ during runtime (i.e., without resetting the cluster but securely halting it) if, e.g., the required processing performance has changed. Assuming ideal conditions for clock-gated periods (immediate change of $V_{DD\_CL}$ to the lowest value providing state retention), power gating and rebooting the cluster pays off for cluster idle times longer than 40 ms, as a reboot takes only 0.43 ms and 0.26 $\mu$J.
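The break-even point between clock gating and power gating follows from a simple energy balance. The Python sketch below is a simplified model, assuming that a power-gated cluster draws nothing and that the reboot energy is the only overhead (the 0.43 ms reboot latency is ignored). The retention power of about 6.5 μW is inferred here from the two quoted numbers and is not a value reported in the text; the function name is hypothetical.

```python
def break_even_idle_time(reboot_energy_j, clock_gated_power_w, power_gated_power_w=0.0):
    """Idle time above which power gating plus reboot costs less energy
    than staying clock-gated in state retention."""
    saved_power = clock_gated_power_w - power_gated_power_w
    if saved_power <= 0:
        raise ValueError("power gating must draw less than clock gating")
    return reboot_energy_j / saved_power


REBOOT_ENERGY_J = 0.26e-6   # quoted reboot energy
BREAK_EVEN_S = 40e-3        # quoted break-even idle time

# Inferred, not reported: retention power that makes both quoted numbers consistent.
implied_retention_power_w = REBOOT_ENERGY_J / BREAK_EVEN_S
print(f"implied retention power: {implied_retention_power_w * 1e6:.1f} uW")

# Sanity check: feeding the inferred power back in reproduces the quoted 40 ms.
t = break_even_idle_time(REBOOT_ENERGY_J, implied_retention_power_w)
print(f"break-even idle time: {t * 1e3:.1f} ms")
```

If the real clock-gated draw were higher than this idealized retention figure, the break-even idle time would shorten proportionally.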

#### C. Energy-Efficient Data Management

The employed MCU extends the two-level memory hierarchy, described in Section VI-A, with a two-level DMA architecture: a dedicated, lightweight peripheral DMA (PDMA) [65], highlighted in Fig. 10, is responsible for low- to medium-bandwidth transfers between the various interfaces (both to on-chip AFEs and to external ASICs such as flash memory or radio frequency (RF) ICs) and the L2 memory. Therefore, the latter serves not only as program storage for the cores but also as an ample-sized data buffer that is shared between all peripherals. The core of the PDMA is either a single receive or transmit channel or a pair of such per interface, depending on its directionality. All channels share a high-priority (HP) port to the L2, which adds very little interconnect overhead. Round-robin arbitration between all PDMA channels provides bounded access latency; complemented with the ability to enqueue two transfers per channel, data loss is avoided under all circumstances. The PDMA also frees the cores of further code with low processing density by generating interface-specific control and command sequences.
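The bounded-latency property of round-robin arbitration can be illustrated with a small behavioral model. The Python sketch below is a generic round-robin arbiter, not the PDMA's actual implementation; the class name and interface are hypothetical. With N channels, a requesting channel is served after at most N − 1 other grants.

```python
class RoundRobinArbiter:
    """Grants one requesting channel per call, rotating priority after each grant."""

    def __init__(self, num_channels):
        self.num_channels = num_channels
        # Start so that channel 0 has the highest priority on the first call.
        self.last = num_channels - 1

    def grant(self, requests):
        """Return the granted channel index, or None if nobody requests.

        `requests` is a set of channel indices currently asking for the port.
        """
        for offset in range(1, self.num_channels + 1):
            channel = (self.last + offset) % self.num_channels
            if channel in requests:
                self.last = channel
                return channel
        return None


# All four channels request continuously: each is served once per round.
arbiter = RoundRobinArbiter(4)
grants = [arbiter.grant({0, 1, 2, 3}) for _ in range(8)]
print(grants)
```

Because the priority pointer always moves past the channel just served, no channel can be starved, which is what bounds the worst-case wait.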

The second level of the DMA approach is represented by the high-bandwidth cluster DMA (CDMA) [66], performing bulk transfers between L2 and L1 and serving two purposes: 1) hiding the L2 access latency from the cores by making received peripheral data and acquired samples single-cycle accessible and 2) providing blocks of calculated results in L2 for transmission or storage by the PDMA.

The presented two-level DMA architecture plays a vital role in enabling the cluster duty cycling concept presented in Section VI-B: during periods of pure data collection, the whole MCU cluster domain can be left clock- or even power-gated, as the PDMA resides in a different domain. Additionally, inter-memory transfers, handled by the CDMA, do not require the cluster to be active any longer than required for processing as transfers can happen in parallel, with the cores working on previously transferred data (software pipelining). The instruction cache significantly reduces the traffic to the peripheral domain originating from the cores, avoiding bus contentions between CDMA transfers and instruction fetching. Both DMAs signal finished transfers with events to the DMA-controlling core, which avoids status polling.
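The benefit of overlapping CDMA transfers with computation can be estimated with a two-stage pipeline model. The Python sketch below assumes double buffering in L1, equal-sized blocks, and input transfers only; the function names and the timing values are illustrative assumptions, not measurements from the paper.

```python
def serial_time(num_blocks, transfer_time, compute_time):
    """Cluster-active time when each block is transferred, then processed."""
    return num_blocks * (transfer_time + compute_time)


def pipelined_time(num_blocks, transfer_time, compute_time):
    """Cluster-active time when the transfer of block i+1 overlaps the
    processing of block i (double buffering, two pipeline stages)."""
    if num_blocks == 0:
        return 0.0
    steady_state = (num_blocks - 1) * max(transfer_time, compute_time)
    return transfer_time + steady_state + compute_time


# Hypothetical numbers: 4 blocks, 1 time unit per transfer, 3 per computation.
print("serial:   ", serial_time(4, 1.0, 3.0))
print("pipelined:", pipelined_time(4, 1.0, 3.0))
```

When computation dominates, as in this example, the transfers are hidden almost entirely and only the first one adds to the active time.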

This solution presents a contrast to usual data transfer approaches in microcontroller-class systems that rely on one or more DMAs as medium- and high-bandwidth data handlers, provided with access to all system memories through



Fig. 12. (a) Architecture of the *threshold units*. (b) Recorded EOG signal with corresponding cluster power consumption. Insets: power traces during (c) buffer management and (d) notification (GPIO toggle) on an eye movement.

a central interconnect system [67]–[69]. Although proven very flexible, this classic approach requires the DMA to be designed for high bandwidths. Crucially, such a design makes it difficult to duty cycle the resulting power-hungry interconnect and data-transfer units in the absence of processing load. The straightforward concept of assigning fixed per-peripheral buffers leads to low area efficiency, as it leaves valuable, inflexibly assigned memory resources partially unused.

#### D. Data-Driven Duty Cycling

Some biomedical recordings exhibit long intervals with little activity; prominent examples include electromyography (EMG) with periods of relaxed muscles or electrooculography (EOG) with a steady, center-position gaze. Often, the signal itself is of little interest during such periods, and only the segments with activity matter. Sometimes only the presence of activity or the intervals between active segments are required. In such scenarios, active and idle periods can usually be distinguished by their signal amplitude, which may be low and in a narrow band for idle segments but several times higher during active periods. Although this distinction could be handled with a simple software algorithm, doing so is inefficient: constant checking of samples requires an always-on but scarcely utilized cluster, which violates the operating principles of the MCU subsystem, whereas more efficient en bloc processing would result in significant reaction delay.

We therefore introduce lightweight, dedicated hardware in the peripheral domain for this task: The data output of each AFE is, in addition to a PDMA channel, also available to a programmable *threshold unit*, Fig. 12(a). It adds a further event source per AFE, triggered when a programmable portion of the received sample is above or below predefined thresholds. The threshold unit supports both signed and unsigned comparisons and the selectable bit range allows ignoring metadata which some AFEs include in their output. In cases where pure thresholding is not sufficient to distinguish true from false events due



Fig. 13. (a) Dynamic job dispatching at runtime and (b) semaphore mechanism to prevent multiple execution of a single job. Modified from [9].

to, e.g., motion artifacts or strong noise, the processing power of the cores can be employed for the decision. In this scheme, the cluster is only activated if an event has been detected by the threshold unit, still achieving a significant cluster duty cycle reduction compared to constant software monitoring.
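The behavior of the threshold unit described above can be modeled in a few lines. The Python sketch below is a behavioral illustration only: the word layout (16-bit sample plus a 4-bit channel tag), the function names, and the parameter names are hypothetical and not taken from the chip's register map.

```python
def extract_field(sample, lsb, width, signed):
    """Return bits [lsb, lsb + width) of a raw AFE word, optionally sign-extended."""
    value = (sample >> lsb) & ((1 << width) - 1)
    if signed and value & (1 << (width - 1)):
        value -= 1 << width
    return value


def threshold_event(sample, lsb, width, signed, lower, upper):
    """True when the selected field lies above `upper` or below `lower`."""
    value = extract_field(sample, lsb, width, signed)
    return value > upper or value < lower


# Hypothetical word layout: 16-bit sample in bits 0..15, channel tag in bits 16..19.
word_active = (0x3 << 16) | 0xFF9C  # payload 0xFF9C is -100 as a signed 16-bit value
word_idle = (0x3 << 16) | 0x0010    # payload is +16

for word in (word_active, word_idle):
    fired = threshold_event(word, lsb=0, width=16, signed=True, lower=-50, upper=50)
    print(hex(word), "event" if fired else "no event")
```

Selecting only bits 0..15 keeps the channel tag out of the comparison, which mirrors the stated purpose of the selectable bit range.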

We demonstrate the capabilities of this concept using the example of an EOG experiment with horizontal eye movements recorded by one of the ExG channels, Fig. 12(b). In EOG, the angular position of the eyes correlates linearly with the measured dc voltage. With an appropriately set threshold value, this allows the immediate awakening of a core to be triggered when the eyes move sufficiently far from the center position; in this example, the value is set high enough not to be triggered by blinking. A slight offset between upper and lower threshold values creates hysteresis for increased noise robustness. In EOG, the rise and fall times, i.e., the angular peak velocity of the eye movements, are usually of particular interest [70]. Thus, in this experiment the PDMA records the data into L2 in parallel, which requires only infrequent activity of the otherwise clock-gated cluster, which consumes just $45\ \mu\text{W}$ of average power ($0.6\ \text{V}$, $16\ \text{MHz}$). The buffer size was set to 512 samples, which at $2\ \text{kS/s}$ (256 ms of signal) is sufficient to retain enough data prior to the trigger: the solid parts of the EOG signal in Fig. 12(b) correspond to the minimum interval of automatically acquired data around an event.
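The effect of the threshold offset and the size of the pre-trigger window can be illustrated as follows. The Python sketch below shows a generic hysteresis comparator and the buffer-duration arithmetic; how the chip combines its upper and lower thresholds internally is not specified in the text, so the state machine, the function names, and the sample values are assumptions.

```python
def hysteresis_events(samples, on_threshold, off_threshold):
    """Indices at which activity starts.

    The detector arms again only after the signal has dropped below
    `off_threshold`, so noise around `on_threshold` cannot retrigger it.
    """
    active = False
    events = []
    for index, sample in enumerate(samples):
        if not active and sample > on_threshold:
            active = True
            events.append(index)
        elif active and sample < off_threshold:
            active = False
    return events


def buffer_duration_s(num_samples, sample_rate_hz):
    """Time span covered by a sample buffer."""
    return num_samples / sample_rate_hz


# Hypothetical noisy signal hovering around an "on" threshold of 10.
signal = [0, 9, 11, 9.5, 10.5, 7, 12]
print("with hysteresis:   ", hysteresis_events(signal, 10, 8))
print("without hysteresis:", hysteresis_events(signal, 10, 10))

# Buffer settings quoted in the text: 512 samples at 2 kS/s.
print("buffer covers", buffer_duration_s(512, 2000), "s")
```

In this toy signal, the single threshold produces an extra trigger from the dip to 9.5, whereas the offset thresholds treat it as part of the same movement.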

## VII. APPLICATION RESULTS

The capabilities of *VivoSoC* have been demonstrated in first real-life use cases in prosthetics [52], PPG (Fig. 9), and work toward implantable vital signs monitoring in small animals [11]. Here, the powerful processing and data management capabilities proved effective in reducing overall power consumption: the application requires the PPG front end to sample at $2 \times 10\ \text{kHz}$ (16-bit format), resulting in $0.32\ \text{Mbit/s}$ of raw data. On-chip calculation of the heart rate, the PWV, and a blood pressure estimate based thereon massively reduces the data rate for subsequent transmission or non-volatile storage to $6\ \text{Byte/s}$, i.e., three 16-bit measurands transmitted once per second. The MCU can run at just $8\ \text{MHz}$ and thus at low supply voltages of $0.85$ and $0.6\ \text{V}$ for the peripheral and cluster domains, respectively. In combination with per-core clock gating, overall MCU power consumption is only $620\ \mu\text{W}$, confirming that feature extraction comes much cheaper than raw data transmission: In our application,



Fig. 14. Chip micrograph and packaged SoC (169-pin BGA).

TABLE III  
FEATURES SUMMARY AND COMPARISON TO PREVIOUS WORK

|       | Parameter | JSSC 2016 [29] | This Work |
|-------|-----------|----------------|-----------|
|       | tech. node [nm] / die size [mm<sup>2</sup>] | 180 / 37.3 | <b>130 / 20.8</b> |
|       | supplies, AFEs [V] | 1.2 | <b>1.5 / 3.5</b> |
|       | supplies, MCU [V] | 1.2 / 1.8 | <b>0.6 / 0.8–1.2</b> |
| PPG   | # PD / LED ch.<br>FoM [dB]<br>max. samp. freq. <sup>a</sup> [kHz] | 4 / 64<br>96<br>0.128 | |
| BioZ  | # ch. / complex?<br>frequency [kHz]<br>sensitivity [mΩ/√Hz] | 1 / yes<br>20/40<br>3 | |
| ExG   | # ch. / BW [kHz]<br>IR noise <sup>b</sup> [μVRMS]<br>pwr per ch. [μW] | 1 / 0.5<br>0.6<br>50 | |
| Aux.  | # ch.: temp./aux.<br>ADC res. [bit] | n/a<br>n/a | |
| Stim. | # ch.<br>frequency [kHz]<br>DAC res. [bit]<br>pwr, all ch. [μW] | n/a<br>n/a<br>n/a<br>6 | |
| MCU   | # cores / architecture<br>memory (L1/L2) [kByte]<br>CoreMark [1/MHz]<br>max. clk. freq. [MHz]<br>pwr, act. [μW/MHz/core]<br>tot. pwr, sleep [μW] | 1 / ARM M0<br>128<br>n/a<br>20 <sup>d</sup><br>120 <sup>d</sup><br>10 <sup>d</sup> | <b>4 / PULP</b><br><b>32 / 128</b><br><b>2.66</b><br><b>110</b><br><b>33.6<sup>f</sup></b><br><b>39.1<sup>g</sup></b> |

<sup>a</sup> 2 ch.    <sup>b</sup> 150 Hz BW    <sup>c</sup> 8 kSample/s    <sup>d</sup> [27]    <sup>e</sup> 20 Hz, 200 μs  
<sup>f</sup> 32-bit matrix multiplication (full load on all cores), $V_{\text{cluster}} = 0.6\ \text{V}$, $V_{\text{periph}} = 0.8\ \text{V}$  
<sup>g</sup> cluster power gated, peripherals clock gated

a compression from 320 kbit/s to 48 bit/s reduces the power consumption of the *Bluetooth low-energy* SoC [71] from 26 mW to 360 μW [11] in continuous transmission.
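The data rates and power figures quoted for this application can be reproduced with simple arithmetic. The Python sketch below uses only numbers stated in the text; the helper name and the final side-by-side comparison (which ignores AFE power and any MCU power needed for raw streaming) are illustrative simplifications.

```python
def bit_rate(channels, rate_hz, bits_per_value):
    """Data rate in bit/s for a number of equally sampled channels."""
    return channels * rate_hz * bits_per_value


# Raw PPG stream: 2 channels at 10 kHz, 16 bit each.
raw_bps = bit_rate(2, 10_000, 16)
# Extracted features: three 16-bit measurands once per second.
feature_bps = bit_rate(3, 1, 16)
compression = raw_bps / feature_bps

# Power figures quoted in the text, in mW.
RADIO_RAW_MW = 26.0
RADIO_FEATURES_MW = 0.36
MCU_MW = 0.62

# Rough comparison only: radio plus MCU with extraction vs. radio alone for raw data.
with_extraction_mw = RADIO_FEATURES_MW + MCU_MW

print(f"raw: {raw_bps} bit/s, features: {feature_bps} bit/s")
print(f"compression factor: {compression:.0f}")
print(f"radio + MCU with extraction: {with_extraction_mw:.2f} mW vs. {RADIO_RAW_MW} mW radio only")
```

Even with the MCU power added on the feature-extraction side, the total stays well below the radio power needed for raw streaming.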

The optimized implementation of the PWV extraction algorithm illustrated in Fig. 11(a) employs core 0 to run low-level driver routines as well as a job dispatcher, Fig. 13. Whenever a part of the application is pushed as a job into a FIFO queue, the otherwise sleeping cores 1–3 are woken up with a software event and subsequently try to get a job from the queue, thereby achieving an efficient dynamic allocation of cores to jobs. A semaphore mechanism based on test-and-set hardware support in the logarithmic interconnect ensures that only one core starts working on a particular job. Upon task completion, the cores check the queue again for further outstanding jobs and go back to sleep otherwise.
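The dispatching scheme can be mimicked in software to show why the test-and-set semaphore guarantees single execution. The Python sketch below uses threads as stand-ins for cores 1–3 and a lock to emulate the atomicity that the hardware provides; the class, function, and job names are hypothetical, and the event-based wake-up of sleeping cores is not modeled.

```python
import threading
import time
from collections import deque


class AtomicFlag:
    """Emulates a test-and-set word; the lock stands in for hardware atomicity."""

    def __init__(self):
        self._value = 0
        self._atomic = threading.Lock()

    def test_and_set(self):
        """Set the flag and return its previous value (0 means acquired)."""
        with self._atomic:
            old = self._value
            self._value = 1
            return old

    def clear(self):
        with self._atomic:
            self._value = 0


def run_jobs(jobs, num_workers=3):
    """Let worker 'cores' drain a shared FIFO; returns (core, job) pairs."""
    queue = deque(jobs)
    semaphore = AtomicFlag()
    executed = []

    def worker(core_id):
        while True:
            # Acquire the semaphore guarding the queue.
            while semaphore.test_and_set() == 1:
                time.sleep(0)  # yield to other threads while waiting
            job = queue.popleft() if queue else None
            semaphore.clear()
            if job is None:
                return  # no outstanding jobs: the core goes back to sleep
            executed.append((core_id, job))  # stands in for running the job

    threads = [threading.Thread(target=worker, args=(core,))
               for core in range(1, num_workers + 1)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return executed


results = run_jobs(["filter", "peak_detect", "pwv", "bp_estimate"])
print(sorted(job for _, job in results))
```

Which core ends up with which job depends on scheduling, but every job is taken exactly once because only the core that reads 0 from the flag may touch the queue.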

## VIII. CONCLUSION

We reported a SoC (Fig. 14) featuring a unique combination of a multi-core MCU and (multi-channel) AFEs for ExG, BioZ, PPG, and temperature recording as well as neural stimulation. It extends beyond previously published work [27], [29] with respect to the envisaged application range by supporting high sampling rates for both ExG and PPG, as required by implantable telemetry. The quad-core processing subsystem not only enables control of the AFEs but also provides ample computation power for on-chip signal processing or feature extraction, which is crucial for overall power efficiency. In combination with the neural stimulation capabilities, it further enables use of the SoC for closed-loop neuroprosthetics.

Table III gives a feature summary of the SoC and compares it to [29]. More detailed and broader comparisons are given for the BioZ and PPG subsystems in Tables I and II, respectively. The latter demonstrates best-in-class PPG performance.

### ACKNOWLEDGMENT

The authors would like to thank J. Bösser, T. Kleier, and N. Brun for their help with measurements and applications, S. Fateh for his numerous contributions to the SoC, A. Pullini for his efforts in PDMA design, and Q. Wang for conducting acute animal experiments.

### REFERENCES

- [1] R. G. Haahr *et al.*, “An electronic patch for wearable health monitoring by reflectance pulse oximetry,” *IEEE Trans. Biomed. Circuits Syst.*, vol. 6, no. 1, pp. 45–53, Feb. 2012.
- [2] A. C. W. Wong, D. McDonagh, O. Omeni, C. Nunn, M. Hernandez-Silveira, and A. J. Burdett, “Sensium: An ultra-low-power wireless body sensor network platform: Design & application challenges,” in *Proc. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC)*, Sep. 2009, pp. 6576–6579.
- [3] AliveCor. *Kardia Mobile*. Accessed: Nov. 27, 2017. [Online]. Available: <https://www.alivecor.com>
- [4] Samsung. *Simband: Voice Body*. Accessed: Nov. 27, 2017. [Online]. Available: <https://www.simband.io>
- [5] S. Raspopovic *et al.*, “Restoring natural sensory feedback in real-time bidirectional hand prostheses,” *Sci. Translational Med.*, vol. 6, no. 222, p. 222ra19, Feb. 2014.
- [6] R. C. Pinnell, J. Dempster, and J. Pratt, “Miniature wireless recording and stimulation system for rodent behavioural testing,” *J. Neural Eng.*, vol. 12, no. 6, p. 066015, Aug. 2015.
- [7] D. Fan *et al.*, “A wireless multi-channel recording system for freely behaving mice and rats,” *PLoS ONE*, vol. 6, no. 7, p. e22033, Jul. 2011.
- [8] C. T. Wentz *et al.*, “A wirelessly powered and controlled device for optical neural control of freely-behaving animals,” *J. Neural Eng.*, vol. 8, no. 4, p. 046021, Jun. 2011.
- [9] F. Glaser *et al.*, “Towards a mobile health platform with parallel processing and multi-sensor capabilities,” in *Proc. Euromicro Conf. Digit. Syst. Design (DSD)*, Aug. 2017, pp. 462–469.
- [10] P. Schönle *et al.*, “A multi-sensor and parallel processing SoC for wearable and implantable telemetry systems,” in *Proc. IEEE Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2017, pp. 215–218.
- [11] P. Schönle *et al.*, “Towards an implantable telemetry system for SpO<sub>2</sub> and PWV measurement in small animals,” in *Proc. IEEE Biomed. Circuits Syst. Conf. (BioCAS)*, Oct. 2017, pp. 596–599.
- [12] ETH Zürich and University of Bologna. *PULP: Parallel Ultra Low Power*. Accessed: Oct. 16, 2017. [Online]. Available: <http://www.pulp-platform.org>
- [13] T. Denison, M. Morris, and F. Sun, “Building a bionic nervous system,” *IEEE Spectr.*, vol. 52, no. 2, pp. 33–39, Feb. 2015.
- [14] Q. Huang and C. Menolfi, “A 200 nV offset 6.5 nV/√Hz noise PSD 5.6 kHz chopper instrumentation amplifier in 1 μm digital CMOS,” in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2001, pp. 362–363.
- [15] S. Fateh, P. Schönle, L. Bettini, G. Rovere, L. Benini, and Q. Huang, “A reconfigurable 5-to-14 bit SAR ADC for battery-powered medical instrumentation,” *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 11, pp. 2685–2694, Nov. 2015.

- [16] R. Yazicioglu, P. Merken, R. Puers, and C. Van Hoof, "A 200  $\mu$ W eight-channel EEG acquisition ASIC for ambulatory EEG systems," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 3025–3038, Dec. 2008.
- [17] P. Schönle *et al.*, "A DC-connectable multi-channel biomedical data acquisition ASIC with mains frequency cancellation," in *Proc. IEEE Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2013, pp. 149–152.
- [18] C. van den Honert and J. T. Mortimer, "A technique for collision block of peripheral nerve: Single stimulus analysis," *IEEE Trans. Biomed. Eng.*, vol. BME-28, no. 5, pp. 373–378, May 1981.
- [19] N. P. Aryan, H. Kaim, and A. Rothermel, *Stimulation and Recording Electrodes for Neural Prostheses* (SpringerBriefs in Electrical and Computer Engineering). Heidelberg, Germany: Springer, 2015.
- [20] J.-J. Sit and R. Sarpeshkar, "A low-power blocking-capacitor-free charge-balanced electrode-stimulator chip with less than 6 nA DC error for 1-mA full-scale stimulation," *IEEE Trans. Biomed. Circuits Syst.*, vol. 1, no. 3, pp. 172–183, Sep. 2007.
- [21] D. R. Merrill, M. Bikson, and J. G. R. Jefferys, "Electrical stimulation of excitable tissue: Design of efficacious and safe protocols," *J. Neurosci. Methods*, vol. 141, no. 2, pp. 171–198, Feb. 2005.
- [22] P. Schönle *et al.*, "A wireless system with stimulation and recording capabilities for interfacing peripheral nerves in rodents," in *Proc. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC)*, Aug. 2016, pp. 4439–4442.
- [23] M. Oberle, "Low power systems-on-chip for biomedical applications," Ph.D. dissertation, Dept. Inf. Technol. Elect. Eng., ETH, Zürich, Switzerland, 2002.
- [24] T. J. C. Faes, H. A. van der Meij, J. C. de Munck, and R. M. Heethaar, "The electric resistivity of human tissues (100 Hz–10 MHz): A meta-analysis of review studies," *Physiol. Meas.*, vol. 20, no. 4, pp. R1–R10, 1999.
- [25] Ø. G. Martinsen, B. Nordbotten, S. Grimnes, H. Fossan, and J. Eilevstjørn, "Bioimpedance-based respiration monitoring with a defibrillator," *IEEE Trans. Biomed. Eng.*, vol. 61, no. 6, pp. 1858–1862, Jun. 2014.
- [26] S. Rodriguez, S. Ollmar, M. Waqar, and A. Rusu, "A batteryless sensor ASIC for implantable bio-impedance applications," *IEEE Trans. Biomed. Circuits Syst.*, vol. 10, no. 3, pp. 533–544, Jun. 2016.
- [27] N. Van Helleputte *et al.*, "A 345  $\mu$ W multi-sensor biomedical SoC with bio-impedance, 3-channel ECG, motion artifact reduction, and integrated DSP," *IEEE J. Solid-State Circuits*, vol. 50, no. 1, pp. 230–244, Jan. 2015.
- [28] J. A. Weldon *et al.*, "A 1.75 GHz highly-integrated narrow-band CMOS transmitter with harmonic-rejection mixers," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2001, pp. 160–161.
- [29] M. Konijnenburg *et al.*, "A multi(bio)sensor acquisition system with integrated processor, power management, 8  $\times$  8 LED drivers, and simultaneously synchronized ECG, BIO-Z, GSR, and two PPG readouts," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2584–2595, Nov. 2016.
- [30] S. Hong *et al.*, "A 4.9 m $\Omega$ -sensitivity mobile electrical impedance tomography IC for early breast-cancer detection system," *IEEE J. Solid-State Circuits*, vol. 50, no. 1, pp. 245–257, Jan. 2015.
- [31] N. Westerhof, N. Stergiopoulos, and M. Noble, *Snapshots of Hemodynamics*, 2nd ed. New York, NY, USA: Springer, 2010.
- [32] J. G. Webster, *Design of Pulse Oximeters*. Boca Raton, FL, USA: CRC Press, 1997.
- [33] A. Pellicer and M. del C. Bravo, "Near-infrared spectroscopy: A methodology-focused review," *Seminars Fetal Neonatal Med.*, vol. 16, no. 1, pp. 42–49, Feb. 2011.
- [34] F. Scholkemann *et al.*, "A review on continuous wave functional near-infrared spectroscopy and imaging instrumentation and methodology," *NeuroImage*, vol. 85, no. 1, pp. 6–27, Jan. 2014.
- [35] E. D. Chan, M. M. Chan, and M. M. Chan, "Pulse oximetry: Understanding its basic principles facilitates appreciation of its limitations," *Respiratory Med.*, vol. 107, pp. 789–799, Mar. 2013.
- [36] P. Schönle, S. Fateh, T. Burger, and Q. Huang, "A power-efficient multi-channel PPG ASIC with 112 dB receiver DR for pulse oximetry and NIRS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2017, pp. 1–4.
- [37] S. J. Barker, J. Curry, D. Redford, and S. Morgan, "Measurement of carboxyhemoglobin and methemoglobin by pulse oximetry: A human volunteer study," *Anesthesiology*, vol. 105, no. 5, pp. 892–897, Nov. 2006.
- [38] M. S. Scheller, R. J. Unger, and M. J. Kelner, "Effects of intravenously administered dyes on pulse oximetry," *Anesthesiology*, vol. 65, no. 5, pp. 550–552, Nov. 1986.
- [39] M. Verhovsek, M. P. A. Henderson, G. Cox, H.-Y. Luo, M. H. Steinberg, and D. H. K. Chui, "Unexpectedly low pulse oximetry measurements associated with variant hemoglobins: A systematic review," *Amer. J. Hematol.*, vol. 85, no. 11, pp. 882–885, Nov. 2010.
- [40] J. Lee, K. Matsumura, K.-I. Yamakoshi, P. Rolfe, S. Tanaka, and T. Yamakoshi, "Comparison between red, green and blue light reflection photoplethysmography for heart rate monitoring during motion," in *Proc. IEEE 35th Annu. Int. Conf. Eng. Med. Biol. Soc.*, Jul. 2013, pp. 1724–1727.
- [41] Y. Mendelson, D. K. Dao, and K. H. Chon, "Multi-channel pulse oximetry for wearable physiological monitoring," in *Proc. IEEE Int. Conf. Body Sensor Netw. (BSN)*, May 2013, pp. 1–6.
- [42] P. C. Schönle, "A power efficient spectrophotometry & PPG integrated circuit for mobile medical instrumentation," Ph.D. dissertation, Dept. Inf. Technol. Elect. Eng., ETH, Zürich, Switzerland, 2017.
- [43] *AFE4403 Datasheet*, document SBAS650B, Texas Instruments, May 2014.
- [44] L. Sant, A. Fant, S. Stojanović, S. Fabbro, and J. L. Cevallos, "A 13.2 b optical proximity sensor system with 130 klx ambient light rejection capable of heart rate and blood oximetry monitoring," *IEEE J. Solid-State Circuits*, vol. 51, no. 7, pp. 1674–1683, Jul. 2016.
- [45] A. Sharma *et al.*, "A Sub-60- $\mu$ A multimodal smart biosensing SoC with >80-dB SNR, 35- $\mu$ A photoplethysmography signal chain," *IEEE J. Solid-State Circuits*, vol. 52, no. 4, pp. 1021–1033, Apr. 2017.
- [46] *AFE4404 Datasheet*, document SBAS689C, Texas Instruments, Jun. 2015.
- [47] K. N. Glaros and E. M. Drakakis, "A sub-mW fully-integrated pulse oximeter front-end," *IEEE Trans. Biomed. Circuits Syst.*, vol. 7, no. 3, pp. 363–375, Jun. 2013.
- [48] G. Xu and J. Yuan, "Performance analysis of general charge sampling," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 52, no. 2, pp. 107–111, Feb. 2005.
- [49] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, "Internet of Things (IoT): A vision, architectural elements, and future directions," *Future Generat. Comput. Syst.*, vol. 29, no. 7, pp. 1645–1660, 2013.
- [50] R. Khan, S. U. Khan, R. Zaheer, and S. Khan, "Future Internet: The Internet of Things architecture, possible applications and key challenges," in *Proc. Int. Conf. Frontiers Inf. Technol. (FIT)*, Dec. 2012, pp. 257–260.
- [51] S. Benatti, F. Montagna, D. Rossi, and L. Benini, "Scalable EEG seizure detection on an ultra low power multi-core architecture," in *Proc. IEEE Biomed. Circuits Syst. Conf. (BioCAS)*, Oct. 2016, pp. 86–89.
- [52] S. Benatti *et al.*, "A sub-10mW real-time implementation for EMG hand gesture recognition based on a multi-core biomedical SoC," in *Proc. IEEE Int. Workshop Adv. Sensors Inter. (IWASI)*, Jun. 2017, pp. 139–144.
- [53] P. Chriskos *et al.*, "Automatic sleep stage classification applying machine learning algorithms on EEG recordings," in *Proc. IEEE Int. Symp. Comput.-Based Med. Syst. (CBMS)*, Jun. 2017, pp. 435–439.
- [54] K. Giannakaki, G. Giannakakis, C. Farmaki, and V. Sakkalis, "Emotional state recognition using advanced machine learning techniques on EEG data," in *Proc. IEEE Int. Symp. Comput.-Based Med. Syst. (CBMS)*, Jun. 2017, pp. 337–342.
- [55] F. Conti *et al.*, "An IoT endpoint system-on-chip for secure and energy-efficient near-sensor analytics," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 9, pp. 2481–2494, Sep. 2017.
- [56] A. Pullini, F. Conti, D. Rossi, I. Loi, M. Gautschi, and L. Benini, "A heterogeneous multi-core system-on-chip for energy efficient brain inspired computing," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, to be published.
- [57] D. Rossi *et al.*, "A –1.8 V to 0.9 V body bias, 60 GOPS/W 4-core cluster in low-power 28 nm UTBB FD-SOI technology," in *Proc. IEEE SOI-3D-Subthreshold Microelectron. Technol. Unified Conf. (S3S)*, Oct. 2015, pp. 1–3.
- [58] D. Rossi *et al.*, "Energy-efficient near-threshold parallel computing: The PULPv2 cluster," *IEEE Micro*, vol. 37, no. 5, pp. 20–31, Sep./Oct. 2017.
- [59] D. Rossi *et al.*, "A self-aware architecture for PVT compensation and power nap in near threshold processors," *IEEE Des. Test*, vol. 34, no. 6, pp. 46–53, Dec. 2017.
- [60] D. Rossi *et al.*, "PULP: A parallel ultra low power platform for next generation IoT applications," in *Proc. IEEE Hot Chips Symp. (HCS)*, Aug. 2015, pp. 21–39.
- [61] M. Gautschi *et al.*, "Tailoring instruction-set extensions for an ultra-low power tightly-coupled cluster of OpenRISC cores," in *Proc. IFIP/IEEE Int. Conf. Very Large Scale Integr. (VLSI-SoC)*, Oct. 2015, pp. 25–30.
- [62] A. Rahimi, I. Loi, M. R. Kakooee, and L. Benini, "A fully-synthesizable single-cycle interconnection network for shared-L1 processor clusters," in *Proc. Design, Autom. Test Eur. Conf. Exhibit. (DATE)*, Mar. 2011, pp. 1–6.
- [63] I. Loi *et al.*, "Exploring multi-banked shared-L1 program cache on ultra-low power, tightly coupled processor clusters," in *Proc. ACM Int. Conf. Comput. Frontiers*, May 2015, p. 64.

- [64] A. Teman, "Power, area, and performance optimization of standard cell memory arrays through controlled placement," *ACM Trans. Des. Autom. Electron. Syst.*, vol. 21, no. 4, Sep. 2016, Art. no. 59.
- [65] A. Pullini, D. Rossi, G. Haugou, and L. Benini, " $\mu$ DMA: An autonomous I/O subsystem for IoT end-nodes," in *Proc. Int. Workshop Power Timing Modeling, Optim. Simulation (PATMOS)*, Sep. 2017, pp. 1–8.
- [66] D. Rossi, I. Loi, G. Haugou, and L. Benini, "Ultra-low-latency lightweight DMA for tightly coupled multi-core clusters," in *Proc. ACM Conf. Comput. Frontiers*, May 2014, p. 15.
- [67] *32-Bit ARM Cortex-M4/M0 Flashless MCU, Product Data Sheet*, document LPC4350/30/20/10, NXP, Mar. 2016.
- [68] *STM32L4x5 and STM32L4x6 Advanced ARM-Based 32-Bit MCUs, Reference Manual*, STMicroelectronics, Geneva, Switzerland, Mar. 2017.
- [69] *SMART ARM-Based Microcontrollers—SAM L21 Family, Data Sheet*, Microchip, Chandler, AZ, USA, Feb. 2017.
- [70] D. S. Zee, L. M. Optican, J. D. Cook, D. A. Robinson, and W. K. Engel, "Slow saccades in spinocerebellar degeneration," *Arch. Neurol.*, vol. 33, no. 4, pp. 243–251, 1976.
- [71] Nordic Semiconductor, document nRF52832, Trondheim, Norway, Feb. 2017.



**Philipp Schönle** (S'13–M'17) received the M.Sc. and Ph.D. degrees in electrical engineering from the Swiss Federal Institute of Technology (ETH) Zürich, Zürich, Switzerland, in 2011 and 2017, respectively.

He is currently a Senior Design Engineer with Advanced Circuit Pursuit AG, Zollikon, Switzerland, and an Associate Researcher with the Integrated Systems Laboratory, ETH Zürich. His current research interests include analog integrated circuits for sensor front-ends, system design, and application thereof in medical research.



**Florian Glaser** (S'17) received the M.Sc. degree in electrical engineering from the Swiss Federal Institute of Technology (ETH) Zürich, Zürich, Switzerland, in 2015, where he is currently pursuing the Ph.D. degree with the Integrated Systems Laboratory.

His current research interests include low-power, mixed-signal integrated circuits and systems for biomedical applications, and energy-efficient synchronization of multi-core microcontrollers.



**Thomas Burger** (M'91) received the Dipl.Ing. and Ph.D. degrees from the Swiss Federal Institute of Technology (ETH) Zürich, Zürich, Switzerland, in 1987 and 2002, respectively.

From 1987 to 1994, he was a Development and Research Engineer with Ascom Radiocom Ltd., Mägenwil, Switzerland. In 1994, he joined the Integrated Systems Laboratory, ETH Zürich, where he currently holds the position of Research Associate. He is leading several external and internal projects, covering a wide range of circuits such as analog-to-digital and digital-to-analog converters, active RC and gm-C filters, LNAs, mixers, and more recently also dc–dc converters and PLLs.

Dr. Burger served on the ISSCC Technical Program Committee for Wireline Communications from 2004 to 2007.



**Giovanni Rovere** (S'17) received the M.S. degree in electrical engineering from the University of Padova, Padua, Italy, in 2013. He is currently pursuing the Ph.D. degree with the Integrated Systems Laboratory, ETH Zürich, Zürich, Switzerland.

He was a Visiting Student with the Institute of Neuroinformatics, UZH-ETH, Zürich. From 2013 to 2014, he was an Asynchronous Digital Designer with the Italian Institute of Technology, Genoa, Italy, where he was involved in the design of a de/serializer for an event-driven vision sensor for robotic applications. His current research interests include biomedical acquisition systems and low-power analog front-ends.



**Luca Benini** (S'94–M'97–SM'04–F'07) received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, USA, in 1997.

He holds the Chair of digital circuits and systems with ETH Zürich, Zürich, Switzerland, and is a Full Professor with the Università di Bologna, Bologna, Italy. He has authored or co-authored more than 800 peer-reviewed papers. His current research interests include energy-efficient system design for embedded and high-performance computing, energy-efficient smart sensors, and ultralow-power very large scale integration design.

Dr. Benini is a fellow of the ACM and a member of the Academia Europaea. He was a recipient of the 2016 IEEE CAS Mac Van Valkenburg Award.



**Qiuting Huang** (S'86–M'88–SM'96–F'02) received the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1987.

From 1987 to 1992, he was a Lecturer with the University of East Anglia, Norwich, U.K. Since 1993, he has been with the Integrated Systems Laboratory, Swiss Federal Institute of Technology (ETH) Zürich, Zürich, Switzerland, where he is currently a Professor of electronics. In 2007, he was also appointed as a part-time Cheung Kong Seminar Professor by the Chinese Ministry of Education and the Cheung Kong Foundation and has been affiliated with South East University, Nanjing, China. His current research interests include radio frequency, analog, mixed analog–digital as well as digital application-specific integrated circuits and systems, with an emphasis on wireless communications and biomedical applications in recent years.

Dr. Huang currently serves as the Vice Chair of the Steering Committee and the SubCommittee Chair of the Technical Program Committee of the European Solid-State Circuits Conference. He also served on the technical program and executive committees of the International Solid-State Circuits Conference from 2000 to 2010.